[CORE][VL] Fix struct field binding after explode aliasing #11934

Open
alexr17 wants to merge 2 commits into apache:main from alexr17:alexr/bind-getstructfield-explode-fix

alexr17 (Contributor) commented Apr 13, 2026

What changes are proposed in this pull request?

GenerateExecTransformer can rewrite exploded output attributes to post-project aliases while preserving the original exprId. A downstream GetStructField may still refer to the original attribute name, so name-based binding fails even though the matching input attribute is present by exprId.
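
The mismatch described above can be modeled with a small, Spark-free sketch. The `Attr` class is a simplified stand-in for a Catalyst attribute, and the names `item` and `item_alias` are hypothetical; none of this is Gluten's actual code. It only shows why a lookup by name misses an attribute that a lookup by exprId still finds.

```scala
object ExplodeAliasMismatch {
  // Simplified stand-in for a Catalyst attribute (not Spark's AttributeReference).
  final case class Attr(name: String, exprId: Long)

  // Assumed input after the alias rewrite: the exploded column keeps
  // exprId 7 but now carries the hypothetical name "item_alias".
  val input: Seq[Attr] = Seq(Attr("id", 1L), Attr("item_alias", 7L))

  // The downstream struct access still refers to the original name "item".
  val wanted: Attr = Attr("item", 7L)

  // Name-based lookup: position of the first input attribute with the same name.
  def indexByName(in: Seq[Attr], a: Attr): Option[Int] = {
    val i = in.indexWhere(_.name == a.name)
    if (i >= 0) Some(i) else None
  }

  // exprId-based lookup: position of the first input attribute with the same exprId.
  def indexByExprId(in: Seq[Attr], a: Attr): Option[Int] = {
    val i = in.indexWhere(_.exprId == a.exprId)
    if (i >= 0) Some(i) else None
  }

  def main(args: Array[String]): Unit = {
    println(indexByName(input, wanted))   // None
    println(indexByExprId(input, wanted)) // Some(1)
  }
}
```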

This patch keeps the existing name-based binding path, then falls back to exprId matching when the root attribute name is no longer present. It also hardens nested struct ordinal resolution so partially matched paths do not return stale ordinals.
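
A minimal sketch of the two ideas in this paragraph, under stated assumptions: the types below are simplified stand-ins rather than Spark or Gluten classes, and `bindRoot` and `resolveOrdinals` are hypothetical helper names, not functions from the patch. The first helper tries the name and only then falls back to exprId; the second returns ordinals only when every segment of a nested path matches.

```scala
object BindWithFallback {
  // Simplified stand-ins; these are not Spark's or Gluten's real classes.
  sealed trait DataType
  case object IntType extends DataType
  final case class StructType(fields: Seq[(String, DataType)]) extends DataType
  final case class Attr(name: String, exprId: Long, dataType: DataType)

  // Try the name first; fall back to exprId only when the name is absent.
  def bindRoot(input: Seq[Attr], ref: Attr): Option[Int] = {
    val byName = input.indexWhere(_.name == ref.name)
    val idx = if (byName >= 0) byName else input.indexWhere(_.exprId == ref.exprId)
    if (idx >= 0) Some(idx) else None
  }

  // Resolve a nested field path to ordinals. The result is None unless
  // every segment matches, so a partial match never yields ordinals.
  def resolveOrdinals(root: DataType, path: List[String]): Option[List[Int]] =
    path match {
      case Nil => Some(Nil)
      case head :: tail =>
        root match {
          case StructType(fields) =>
            val i = fields.indexWhere(_._1 == head)
            if (i < 0) None
            else resolveOrdinals(fields(i)._2, tail).map(i :: _)
          case _ => None // a non-struct type cannot have child fields
        }
    }

  // Hypothetical schema: item: STRUCT<a: INT, b: STRUCT<x: INT, y: INT>>
  val inner: DataType = StructType(Seq(("x", IntType), ("y", IntType)))
  val itemType: DataType = StructType(Seq(("a", IntType), ("b", inner)))
  val input: Seq[Attr] =
    Seq(Attr("id", 1L, IntType), Attr("item_alias", 7L, itemType))
  val ref: Attr = Attr("item", 7L, itemType)

  def main(args: Array[String]): Unit = {
    println(bindRoot(input, ref))                      // Some(1)
    println(resolveOrdinals(itemType, List("b", "y"))) // Some(List(1, 1))
    println(resolveOrdinals(itemType, List("b", "z"))) // None
  }
}
```

Returning an `Option` for the whole path, rather than accumulating ordinals in mutable state, is one way to make a partially matched path unable to surface a stale ordinal; the patch itself may take a different approach.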

How was this patch tested?

Added regression coverage in MiscOperatorSuite for:

  • struct field projection after LATERAL VIEW EXPLODE
  • struct field filtering after LATERAL VIEW EXPLODE
  • nested struct field grouping after explode_outer
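
For orientation, the three cases above correspond to query shapes roughly like the following. This is a sketch only: the table `t`, its columns, and the object name are hypothetical, and these strings are not the queries used in MiscOperatorSuite.

```scala
object ExplodeQueryShapes {
  // Hypothetical table:
  //   t(id INT, items ARRAY<STRUCT<a: INT, b: STRUCT<x: INT>>>)

  // Struct field projection after LATERAL VIEW EXPLODE.
  val projectAfterExplode: String =
    """SELECT id, item.a
      |FROM t LATERAL VIEW EXPLODE(items) ex AS item""".stripMargin

  // Struct field filtering after LATERAL VIEW EXPLODE.
  val filterAfterExplode: String =
    """SELECT id
      |FROM t LATERAL VIEW EXPLODE(items) ex AS item
      |WHERE item.a > 1""".stripMargin

  // Nested struct field grouping after explode_outer.
  val groupByNestedField: String =
    """SELECT item.b.x, COUNT(*)
      |FROM (SELECT explode_outer(items) AS item FROM t) s
      |GROUP BY item.b.x""".stripMargin

  val all: Seq[String] =
    Seq(projectAfterExplode, filterAfterExplode, groupByNestedField)
}
```

In a Spark session, each string could be passed to `spark.sql(...)` once a table with a compatible schema exists.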

Ran:

  • mvn -Pbackends-velox -pl backends-velox spotless:check -DspotlessFiles=backends-velox/src/test/scala/org/apache/gluten/execution/MiscOperatorSuite.scala

Was this patch authored or co-authored using generative AI tooling?

Yes. The issue was reproduced locally, and the fix was partially generated with AI tooling.

github-actions bot added the CORE (works for Gluten Core) and VELOX labels Apr 13, 2026
@github-actions

Run Gluten Clickhouse CI on x86

alexr17 force-pushed the alexr/bind-getstructfield-explode-fix branch from f7b05ea to 3b26d4f on April 13, 2026 22:17
@github-actions

Run Gluten Clickhouse CI on x86

alexr17 force-pushed the alexr/bind-getstructfield-explode-fix branch from 3b26d4f to 560f222 on April 13, 2026 22:22
@github-actions

Run Gluten Clickhouse CI on x86

mcdull-zhang pushed a commit to mcdull-zhang/gluten that referenced this pull request Apr 15, 2026
* [GLUTEN-6887][VL] Daily Update Velox Version (dft-2026_02_06)

Upstream Velox's New Commits:
8fa3041b1 by Basit Ayantunde, feat(cudf): Implement JIT Expression Evaluator for CUDF (#16075)
6beae8654 by Pedro Eugenio Rocha Pedreira, feat(cursor): Add at() API to return cursor location (#16257)
b0585adfe by Zac Wen, misc: Improve velox selective reader E2EFilter with index setup (#16172)
886187ff2 by Ke Wang, feat: Add ioWaitWallNanos stats breakdown (#16189)
ac0bfb36b by Natasha Sehgal, feat: Add P4HyperLogLog custom type documentation
ef377ac6f by Jialiang Tan, feat: Do not allow null key in index writer (#16242)
0abf00047 by Pedro Eugenio Rocha Pedreira, feat(python): Add LocalDebuggerRunner (#16225)
442a78899 by Jonathan Hehir, fix: Return non-empty distribution function on empty QDigest (#14617)
b8db1b1ac by Heidi Han, fix(test): Flakiness in KHLL uniqueDistribution test (#16237)
76305b8db by Jialiang Tan, feat: Add dynamic output batch sizing for MergeJoin (#16052)
2ae3fddd9 by Guilherme Kunigami, Add dependency injection for remote function thrift client (#16231)
55dc73e51 by Shanyue Wan, feat: Support unknown in coerceTypeBase (#16207)
f6cde1e62 by Henry Edwin Dikeman, refactor: Remove deprecated code from velox/type/parser/ (#16219)
de2a5cd65 by Jialiang Tan, fix: Let spiller report file io stats (#16209)
6213c0591 by Jialiang Tan, fix: Fix KeyEncoder null increment carry over (#16236)
a6fa50194 by Masha Basmanova, feat: Add isDefaultNullBehavior API (#16239)
10bdc0688 by Xiaoxuan Meng, feat: Add constant bound support for between conditions in HiveIndexReader (#16228)
e3dacce2a by Zac Wen, misc: Add cursor_copy_result flag to trace replayer (#16227)
165fe367c by Jialiang Tan, feat: Add user friendly trace prompting API (#16192)
12be762d3 by Jialiang Tan, fix: Fix replayer crash at cleanup (#16191)
1ccad056a by Sergey Pershin, Fix: Fix int32 overflow in SEQUENCE() function (#16232)
9f8cb40e2 by Ping Liu, misc: Remove backward compatibility IcebergInsertTableHandle ctor (#16153)
4b29e6094 by Sergey Pershin, Fix: Fix int32 overflow in REPEAT() function (#16224)
b53358590 by PHILO-HE, feat: Add Spark map_from_entries function (apache#11934)
3177849f9 by Ping Liu, feat: Add IcebergDataSource (#16177)
a539ae3b7 by Ping Liu, feat: Support DuckDB to velox TIME conversion (#16218)
eb6d20241 by Xiaoxuan Meng, feat: Support Hive index source (#16215)
277285d1b by Karthikeyan Natarajan, refactor(cudf): Improve debug printing for cudf (#15831)
c0469b4b3 by Jacob Khaliqi, docs: Add docs for array_top_n (#15965)
43959c9aa by Pedro Eugenio Rocha Pedreira, feat(cursor): Integrate debug cursor into TaskCursor (#16206)
bc708220d by Xiaoxuan Meng, fix: Fix lookup input column naming and deduplicate input columns (#16205)
7da351de0 by Mohammad Linjawi, feat: Add ANSI mode support for Spark CAST(string as boolean) (#16059)
ec99bc775 by Peter Enescu, refactor(Cursor): Preserve vector encoded copy (#16086)
6a1154b3a by Krishna Pai, fix: Increase base machine for ubuntu resolve dependencies (#16204)
da0d2fdba by Simon Eves, fix(cudf): Check that join filter expressions can be evaluated by CUDF (#16180)
36519e096 by Jacob Khaliqi, docs: Add docs for replace_first, longest_common_prefix, and bit_length (#15947)
4599ca7fe by Zac Wen, misc: Clean up deprecated code in IndexLookupJoin (#16174)
bef494238 by Abhinav Mukherjee, Remove generic overloads from map_except and map intersect (#16178)
f91d90388 by Masha Basmanova, fix: Harden validation of output type for NestedLoop and IndexLookup join (#16200)
265d6b1a5 by aditi-pandit, fix(docs): Correct metric name for spill_writes_count (#15340)
7a3777221 by Bradley Dice, feat(cudf): Add managed async memory resource support (#16182)
b9c6617ea by Masha Basmanova, feat: Extend PlanConsistencyChecker to check for duplicate names in the output of NLJ (#16197)
35fbc84cd by Xiaoxuan Meng, feat: Support mixed index join conditions and filters in HiveIndexReader (#16193)
6fe8d019a by Xiaoxuan Meng, feat: Add hive index reader for Nimble file format (#16175)
217ed8e2c by Masha Basmanova, feat: Improve PlanConsistencyChecker error messages (#16186)
c15c79aa5 by duanmeng, fix: Fix test hang caused by missing add `TestValue::adjust` in AsyncDataCache.cpp (#16185)
a25666b54 by Christian Zentgraf, fix(build): Missing typename in Clang 15 (#16171)
874139512 by Christian Zentgraf, feat(build): Update FBOS to v2026.01.05.00 (#15967)
52a01b520 by Matt Gara, feat(cudf): Check signatures of aggregate functions before replacing with `cudf` variants (#15529)
91150b733 by Pedro Eugenio Rocha Pedreira, feat(trace): Add TaskDebuggerCursor (#16119)
e46bf5388 by beliefer, refactor: Replace new with make_shared for ThriftInternal (#16136)
b3aa142e8 by Han Yan, fix: task cleanup in ExchangeClientTest to prevent CI timeout (#16168)

Signed-off-by: glutenperfbot <glutenperfbot@glutenproject-internal.com>

* bump folly

Signed-off-by: Yuan <yuanzhou@apache.org>

* fix

Signed-off-by: Yuan <yuanzhou@apache.org>

* Revert "fix"

This reverts commit be0dd01.

* Revert "bump folly"

This reverts commit 6e05172.

---------

Signed-off-by: glutenperfbot <glutenperfbot@glutenproject-internal.com>
Signed-off-by: Yuan <yuanzhou@apache.org>
Co-authored-by: glutenperfbot <glutenperfbot@glutenproject-internal.com>
Co-authored-by: Yuan <yuanzhou@apache.org>
@github-actions

Run Gluten Clickhouse CI on x86
