[SPARK-56385][SQL][FOLLOW-UP] Fix FIELD_NOT_FOUND when remapping pushed filters after nested schema pruning by anton5798 · Pull Request #55477 · apache/spark

anton5798 · 2026-04-22T10:01:56Z

What changes were proposed in this pull request?

Wrap projectionFunc in scala.util.Try when remapping pushedFilterExpressions against the pruned scan output in V2ScanRelationPushDown.pruneColumns, and drop filters whose remap fails. The accompanying .subsetOf(AttributeSet(output)) filter is retained for the top-level-column pruning case.
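The "try the remap, drop on failure" shape can be sketched without Spark on the classpath. This is a toy model only: `ToyFilter`, `remap`, and `remapOrDrop` are hypothetical names invented for illustration, and the real change operates on Catalyst expressions inside `V2ScanRelationPushDown.pruneColumns`.

```scala
import scala.util.Try

// Toy stand-in, not a Spark class: a filter is just the nested field path it reads.
final case class ToyFilter(path: Seq[String])

object TryRemapSketch {
  // Hypothetical remap: fails when the filter reads a field that is
  // missing from the pruned struct, otherwise returns the filter unchanged.
  def remap(prunedFields: Set[String])(f: ToyFilter): ToyFilter = {
    val leaf = f.path.last
    if (!prunedFields.contains(leaf)) {
      throw new IllegalArgumentException(s"FIELD_NOT_FOUND: $leaf")
    }
    f
  }

  // Wrap each remap in Try and keep only the filters whose remap succeeded.
  def remapOrDrop(filters: Seq[ToyFilter], prunedFields: Set[String]): Seq[ToyFilter] =
    filters.flatMap(f => Try(remap(prunedFields)(f)).toOption)
}
```

With the struct pruned down to field `b`, a filter on `s.a` is dropped while a filter on `s.b` survives.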

Why are the changes needed?

After SPARK-56385, pushedFilterExpressions are remapped through ProjectionOverSchema to match the post-pruning scan output. When a pushed filter references a nested struct field that nested schema pruning has dropped, ProjectionOverSchema calls StructType.fieldIndex on the narrowed struct and throws SparkIllegalArgumentException: [FIELD_NOT_FOUND].

Repro (exercised by the new test):

Schema:  s: struct<a: int, b: int>, i: int
Query:   SELECT s.b FROM t WHERE s.a > 3   (s.a fully pushed)

Column pruning narrows s to struct<b>. The parent s is still in the output, so the existing .subsetOf guard passes, but remapping GetStructField(s, "a") through ProjectionOverSchema throws because field a is gone.

This does not crash for top-level pruning — when the pruned column is entirely absent from the output, ProjectionOverSchema.getProjection returns None and transformDown leaves the expression unchanged, which .subsetOf then drops cleanly.
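The difference between the nested and top-level cases can be illustrated with a small stand-alone model. It is a sketch under stated assumptions: `ToyStruct`, `PruningSketch`, and `remap` are hypothetical and only mimic the behaviour described above (a lookup that throws on a missing field, and a projection that returns `None` when the whole column is absent); they are not Spark's `StructType` or `ProjectionOverSchema`.

```scala
// Toy model, not Spark's classes.
final case class ToyStruct(fieldNames: Seq[String]) {
  // Mimics the described behaviour: a missing name is an error, not -1.
  def fieldIndex(name: String): Int = {
    val i = fieldNames.indexOf(name)
    if (i < 0) throw new IllegalArgumentException(s"[FIELD_NOT_FOUND] $name")
    i
  }
}

object PruningSketch {
  // Scan output after pruning: top-level column name -> its (possibly narrowed) struct.
  type Output = Map[String, ToyStruct]

  // Remap a reference `column.field` to an ordinal in the pruned output.
  // None : the column is absent entirely (top-level pruning), nothing is remapped.
  // Some : the column survived and the field was found.
  // Throws when the column survived but the field was pruned out of it.
  def remap(output: Output, column: String, field: String): Option[Int] =
    output.get(column).map(_.fieldIndex(field))
}
```

In this model, pruning `s` to `struct<b>` makes a lookup of `s.a` throw, whereas removing `s` altogether yields `None` and leaves the caller free to drop the filter by other means.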

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a unit test in DataSourceV2Suite that reproduces the crash via a new NestedSchemaDataSourceV2 + SELECT s.b WHERE s.a > 3 pattern.


yyanyy

Thank you for helping to fix this!

cloud-fan

LGTM, with one optional nit. Clean, well-targeted follow-up. The fix is localized to the one call site that actually needs to tolerate remap failures (fully-pushed filters), and correctly leaves the post-scan remap at line 820 alone — those filter references are considered by SchemaPruning.identifyRootFields, so their nested fields are preserved.

Address review feedback from Wenchen: catch only the specific `SparkIllegalArgumentException` with condition `FIELD_NOT_FOUND` thrown by `StructType.fieldIndex` when a pushed filter references a pruned nested field, instead of swallowing every non-fatal exception via `scala.util.Try`. Other failure modes (e.g., `SparkException.internalError` from `ProjectionOverSchema`'s "unmatched child schema" branches) now surface instead of being silently dropped.
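A minimal sketch of the narrower catch, again without Spark dependencies. `ConditionException` is a hypothetical stand-in for an exception that carries an error-condition name, and `remapOrDrop` is an invented helper; the merged code matches on Spark's own exception type, which is not reproduced here.

```scala
// Hypothetical stand-in for an exception that carries an error condition name.
final class ConditionException(val condition: String, msg: String)
  extends IllegalArgumentException(msg)

object NarrowCatchSketch {
  // Evaluates `remap`; returns None only for the one expected failure.
  // Any other exception is not matched by the catch and propagates to the caller.
  def remapOrDrop[A](remap: => A): Option[A] =
    try Some(remap)
    catch {
      case e: ConditionException if e.condition == "FIELD_NOT_FOUND" => None
    }
}
```

The guard on the condition name is the point of the design: an unexpected failure, such as an internal error, is no longer mistaken for a pruned field and silently discarded.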

cloud-fan · 2026-04-23T17:56:53Z

thanks, merging to master!

yyanyy approved these changes Apr 22, 2026

HyukjinKwon changed the title ~~[SPARK-56385][SQL][FOLLOWUP] Fix FIELD_NOT_FOUND when remapping pushed filters after nested schema pruning~~ [SPARK-56385][SQL][FOLLOW-UP] Fix FIELD_NOT_FOUND when remapping pushed filters after nested schema pruning Apr 23, 2026

cloud-fan approved these changes Apr 23, 2026

Comment thread ...re/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala

anton5798 force-pushed the fix-pushed-filter-nested-pruning branch from 49a1510 to 92b01b3 Compare April 23, 2026 08:38

cloud-fan closed this in 875a2f2 Apr 23, 2026

anton5798 wants to merge 2 commits into apache:master from anton5798:fix-pushed-filter-nested-pruning
