[SPARK-54419][SQL] Avoid expanding expensive alias chains in optimizer #55666

Open
201573 wants to merge 1 commit into apache:master from 201573:codex/54419-predicate-pushdown

201573 commented May 4, 2026

What changes were proposed in this pull request?

This PR avoids eagerly expanding expensive projection aliases during predicate pushdown. It also prevents CollapseProject from force-inlining expensive aliases that contain Python UDFs and are referenced more than once, when the only purpose of the inlining is to merge adjacent Python UDF projections.
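
The predicate pushdown part of the change is about ordering: check the cost of the referenced aliases first, and expand them only if the filter is actually going to move below the Project. Below is a minimal sketch of that ordering under stated assumptions. It uses a hypothetical toy expression model instead of Catalyst's classes, with expressions written as column names or (function, *args) tuples. Any call to the made-up function py_udf counts as expensive, and the condition is treated as one unit rather than split into conjuncts. The helper names (references, is_expensive, substitute, push_filter_through_project) are illustrative only.

```python
# Hypothetical toy model, not Catalyst's API: an expression is either a column
# name (str) or a tuple (function_name, *args).
from typing import Dict, Optional, Set, Union

Expr = Union[str, tuple]

# Assumption: any call to the made-up function "py_udf" counts as expensive.
EXPENSIVE_FUNCTIONS = {"py_udf"}


def references(expr: Expr) -> Set[str]:
    """Column names referenced anywhere in the expression."""
    if isinstance(expr, str):
        return {expr}
    refs: Set[str] = set()
    for arg in expr[1:]:
        refs |= references(arg)
    return refs


def is_expensive(expr: Expr) -> bool:
    """True if the expression contains a call to an expensive function."""
    if isinstance(expr, str):
        return False
    return expr[0] in EXPENSIVE_FUNCTIONS or any(is_expensive(a) for a in expr[1:])


def substitute(expr: Expr, aliases: Dict[str, Expr]) -> Expr:
    """Replace each alias reference with its definition (one level)."""
    if isinstance(expr, str):
        return aliases.get(expr, expr)
    return (expr[0],) + tuple(substitute(a, aliases) for a in expr[1:])


def push_filter_through_project(
    condition: Expr, aliases: Dict[str, Expr]
) -> Optional[Expr]:
    """Return the condition rewritten for placement below the Project, or None
    to keep the Filter above it. The cost check runs before any substitution,
    so an expensive alias is never expanded only to be thrown away."""
    used_aliases = references(condition) & set(aliases)
    if any(is_expensive(aliases[name]) for name in used_aliases):
        return None  # Filter stays above the Project; nothing was expanded
    return substitute(condition, aliases)


aliases = {"a": ("plus", "x", "y"), "b": ("py_udf", "x")}
print(push_filter_through_project(("gt", "a", "z"), aliases))  # ('gt', ('plus', 'x', 'y'), 'z')
print(push_filter_through_project(("gt", "b", "z"), aliases))  # None
```

Because the decision happens before substitution, a filter on an expensive alias costs only a scan of the alias definitions, no matter how large their expansion would be.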

This PR also adds a regression test for the deep iterative withColumn rewrite pattern reported in SPARK-54419.
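
The CollapseProject guard can be illustrated with a small standalone sketch. It rests on hypothetical assumptions: a Project is a dict from output names to expressions, an expression is a column name or a (function, *args) tuple, and only calls to the made-up function py_udf count as expensive. Spark's own notion of cheap versus expensive differs. The helper collapse_projects and its force_inline flag are illustrative, not Spark's API. force_inline=True stands in for the previous behavior of inlining anyway to merge Python UDF projections.

```python
# Hypothetical toy model, not Catalyst's API: a Project is a dict mapping output
# names to expressions; an expression is a column name (str) or (function, *args).
from collections import Counter
from typing import Dict, Optional, Union

Expr = Union[str, tuple]

# Assumption: only calls to the made-up function "py_udf" are expensive.
EXPENSIVE_FUNCTIONS = {"py_udf"}


def is_expensive(expr: Expr) -> bool:
    """True if the expression contains a call to an expensive function."""
    if isinstance(expr, str):
        return False
    return expr[0] in EXPENSIVE_FUNCTIONS or any(is_expensive(a) for a in expr[1:])


def count_refs(expr: Expr, counter: Counter) -> None:
    """Add one to counter[name] for every occurrence of a column reference."""
    if isinstance(expr, str):
        counter[expr] += 1
    else:
        for arg in expr[1:]:
            count_refs(arg, counter)


def inline(expr: Expr, producer: Dict[str, Expr]) -> Expr:
    """Substitute the lower Project's definitions into an upper expression."""
    if isinstance(expr, str):
        return producer.get(expr, expr)
    return (expr[0],) + tuple(inline(a, producer) for a in expr[1:])


def collapse_projects(
    consumer: Dict[str, Expr], producer: Dict[str, Expr], force_inline: bool = False
) -> Optional[Dict[str, Expr]]:
    """Merge the upper Project (consumer) into the lower one (producer), or
    return None if that would duplicate an expensive alias. force_inline=True
    skips the check, standing in for the old force-inline path."""
    uses: Counter = Counter()
    for expr in consumer.values():
        count_refs(expr, uses)
    if not force_inline:
        for name, definition in producer.items():
            if is_expensive(definition) and uses[name] > 1:
                return None  # keep both Projects; inlining would evaluate it more than once
    return {out: inline(expr, producer) for out, expr in consumer.items()}


producer = {"c1": ("py_udf", "c0")}
consumer = {"c2": ("py_udf", ("plus", "c1", "c1"))}
print(collapse_projects(consumer, producer))  # None
print(collapse_projects(consumer, producer, force_inline=True))
# {'c2': ('py_udf', ('plus', ('py_udf', 'c0'), ('py_udf', 'c0')))}
```

With force_inline=True the merged expression calls py_udf(c0) twice. At every level of a deep chain that duplication compounds, which is why the guard refuses the merge instead.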

Why are the changes needed?

The optimizer could blow up on deep iterative withColumn rewrites when a filter above the projection chain referenced an expensive alias. We were expanding those aliases before deciding whether the predicate could stay above the Project, and CollapseProject could then still force-inline the expensive chain while merging adjacent Python UDF projections.
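
A toy count shows why eager expansion hurts on deep chains. The chain shape here is hypothetical and is not the actual SPARK-54419 reproducer. Each step redefines the column from the previous one and references it twice, as plus(py_udf(prev), prev), loosely like repeatedly calling withColumn with an expression that uses the old column value in two places. build_chain and expanded_size are illustrative helpers. The size is memoized per alias, so the expanded tree itself is never built.

```python
# Hypothetical toy model: an expression is a column name (str) or a tuple
# (function_name, *args); "py_udf" is a made-up stand-in for a Python UDF call.
from typing import Dict, Union

Expr = Union[str, tuple]


def build_chain(depth: int) -> Dict[str, Expr]:
    """Alias chain c1..c<depth> where each step references the previous one twice."""
    return {
        f"c{i}": ("plus", ("py_udf", f"c{i - 1}"), f"c{i - 1}")
        for i in range(1, depth + 1)
    }


def expanded_size(name: str, aliases: Dict[str, Expr]) -> int:
    """Node count of the fully inlined expression for `name`, computed with a
    memo per alias so the huge expanded tree is never materialized."""
    memo: Dict[str, int] = {}

    def size(expr: Expr) -> int:
        if isinstance(expr, str):
            if expr not in aliases:
                return 1  # base column
            if expr not in memo:
                memo[expr] = size(aliases[expr])
            return memo[expr]
        return 1 + sum(size(arg) for arg in expr[1:])

    return size(name)


for n in (1, 10, 20, 30):
    print(n, expanded_size(f"c{n}", build_chain(n)))
# 1 4
# 10 3070
# 20 3145726
# 30 3221225470
```

Each of the n alias definitions has only four nodes, yet in this toy shape the fully inlined expression has 3 * 2^n - 2 nodes. Thirty chained steps already expand to more than three billion nodes.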

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Verified locally:

  • ./build/sbt "catalyst/testOnly org.apache.spark.sql.catalyst.optimizer.FilterPushdownSuite org.apache.spark.sql.catalyst.optimizer.CollapseProjectSuite"
  • git diff --check

201573 marked this pull request as ready for review May 4, 2026 11:03