[GLUTEN-11683][VL] Fix SPARK-18108 and parquet-thrift compatibility #11689

Draft
baibaichen wants to merge 4 commits into apache:main from baibaichen:pr2/fix-parquet-thrift-spark18108

@baibaichen baibaichen commented Mar 3, 2026

What changes were proposed in this pull request?

Fix SPARK-18108 and parquet-thrift compatibility issues.

Changes

  1. Velox: replace the OAP INT narrowing commit with upstream Velox PR #15173 (get-velox.sh):
    Importing the upstream fix (fix reading array of row) resolves the parquet-thrift compatibility issue.

  2. Fix SPARK-18108 (SubstraitToVeloxPlan.cc):
    Exclude partition columns from HiveTableHandle.dataColumns() to prevent type validation failures when partition column types differ from file column types.

  3. Update VeloxTestSettings (spark40 + spark41):

    • Remove 2 excludes now passing: LongType→IntegerType, LongType→DateType
    • Add 2 excludes for new failures: IntegerType→ShortType (OAP removed)
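
The type pairs in item 3 can be read as conversions between integral widths. As a rough illustration only, the sketch below classifies such a pair as widening or narrowing by bit width. It assumes the arrow means "type stored in the file → type requested by the reader". `IntType`, `Conversion`, and `classify` are hypothetical names, not part of Spark, Gluten, or Velox, and LongType→DateType is not modeled because DateType is not an integral width.

```cpp
// Hypothetical model of integral types, keyed by bit width (assumption).
enum class IntType : int { kByte = 8, kShort = 16, kInt = 32, kLong = 64 };

enum class Conversion { kSame, kWidening, kNarrowing };

// Classify a read of `fileType` data as `readType` purely by bit width.
Conversion classify(IntType fileType, IntType readType) {
  const int from = static_cast<int>(fileType);
  const int to = static_cast<int>(readType);
  if (from == to) {
    return Conversion::kSame;
  }
  // A larger target can hold every source value; a smaller one cannot.
  return to > from ? Conversion::kWidening : Conversion::kNarrowing;
}
```

Under this model both LongType→IntegerType and IntegerType→ShortType are narrowing conversions, which is why their behavior is sensitive to whether a narrowing commit is applied.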

Test Results

| | #11684 | This PR |
|---|---|---|
| ✅ Passed | 21 | 21 |
| ❌ Excluded | 63 | 63 (-2 +2) |

Additionally fixed (not in TypeWideningSuite):

  • SPARK-18108 Parquet reader fails when data column types conflict with partition ones ✅
  • Read Parquet file generated by parquet-thrift ✅

Depends on #11684 (PR1).
Fixes #11683

How was this patch tested?

Local tests: TypeWideningSuite 21 pass / 63 ignored, SPARK-18108 ✅, parquet-thrift ✅.

Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with GitHub Copilot.

@github-actions github-actions bot added the CORE, BUILD, and VELOX labels Mar 3, 2026

github-actions bot commented Mar 3, 2026

Run Gluten Clickhouse CI on x86

@baibaichen baibaichen force-pushed the pr2/fix-parquet-thrift-spark18108 branch from 0c5caa5 to 0728011 Compare March 3, 2026 13:19

@baibaichen baibaichen force-pushed the pr2/fix-parquet-thrift-spark18108 branch from 0728011 to 33e87af Compare March 6, 2026 09:28

baibaichen and others added 3 commits March 10, 2026 05:34
Replace OAP commit [15173][15343] (INT narrowing) with upstream Velox
PR #15173 (fix reading array of row) to fix parquet-thrift compatibility.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…olumns

When Gluten creates HiveTableHandle, it was passing all columns (including
partition columns) as dataColumns. This caused Velox's convertType() to
validate partition column types against the Parquet file's physical types,
failing when they differ (e.g., LongType in file vs IntegerType from
partition inference).

Fix: build dataColumns excluding partition columns (ColumnType::kPartitionKey).
Partition column values come from the partition path, not from the file.
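
The idea in this commit message can be sketched in a few lines. This is a minimal, self-contained illustration and not the Gluten or Velox code: `ColumnKind`, `Column`, `dataColumnsOf`, and `typesMatchFile` are hypothetical stand-ins, and types are modeled as plain strings. It only shows why checking every column against the file can fail while checking the non-partition columns does not.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for illustration; not the Velox or Gluten types.
enum class ColumnKind { kRegular, kPartitionKey };

struct Column {
  std::string name;
  std::string type;  // e.g. "INTEGER", "BIGINT"
  ColumnKind kind;
};

// Keep only the columns whose values are read from the file itself.
std::vector<Column> dataColumnsOf(const std::vector<Column>& all) {
  std::vector<Column> out;
  for (const auto& c : all) {
    if (c.kind != ColumnKind::kPartitionKey) {
      out.push_back(c);
    }
  }
  return out;
}

// Returns false if a requested column also exists in the file with a
// different type (a simplified stand-in for a type validation step).
bool typesMatchFile(
    const std::vector<Column>& requested,
    const std::unordered_map<std::string, std::string>& fileTypes) {
  for (const auto& c : requested) {
    auto it = fileTypes.find(c.name);
    if (it != fileTypes.end() && it->second != c.type) {
      return false;
    }
  }
  return true;
}
```

With a file that stores `p` as `BIGINT` and a table that infers partition column `p` as `INTEGER`, `typesMatchFile` rejects the full column list but accepts the result of `dataColumnsOf`, since `p` is no longer compared against the file.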

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

With OAP INT narrowing commit replaced by upstream Velox PR #15173:
- Remove 2 excludes now passing: LongType->IntegerType, LongType->DateType
- Add 2 excludes for new failures: IntegerType->ShortType (OAP removed)

Exclude 63 (net unchanged: -2 +2). Test results: 21 pass / 63 ignored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@baibaichen baibaichen force-pushed the pr2/fix-parquet-thrift-spark18108 branch from 8c11878 to 2ec5651 Compare March 10, 2026 07:37

These tests regress after skipping OAP commit 8c2bd0849 (Allow reading
integers into smaller-range types). They will be re-enabled in PR3 when
Velox widening commits are applied.
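
For readers unfamiliar with the term, "reading integers into smaller-range types" means producing, say, a 16-bit value from a 32-bit stored value. The sketch below is one possible way to do that safely, by rejecting values that do not fit. It is an assumption for illustration only: `tryNarrowToShort` is a hypothetical helper, and this PR does not state how the skipped OAP commit or the later Velox commits actually handle out-of-range values.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: narrow a 32-bit value to 16 bits only if it fits.
// Returns true and writes `out` on success; returns false otherwise.
bool tryNarrowToShort(int32_t value, int16_t& out) {
  if (value < std::numeric_limits<int16_t>::min() ||
      value > std::numeric_limits<int16_t>::max()) {
    return false;  // value is outside the int16_t range
  }
  out = static_cast<int16_t>(value);
  return true;
}
```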

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@baibaichen baibaichen force-pushed the pr2/fix-parquet-thrift-spark18108 branch from 2ec5651 to 4bef8b1 Compare March 10, 2026 09:44

Development

Successfully merging this pull request may close these issues.

[VL] Support type widening in Parquet reader (SPARK-40876)
