[SPARK-57125][SQL] LimitPushDown should fold literal Limit+Offset sum so plan stays planable without ConstantFolding#56180
Open
rdtr wants to merge 1 commit into
Open
Conversation
… so plan stays planable without ConstantFolding
### What changes were proposed in this pull request?
The `LimitPushDown` rule in `Optimizer.scala` rewrites
`LocalLimit(le, Offset(oe, child))` to `Offset(oe, LocalLimit(Add(le,
oe), child))`,
leaving the `Add` unfolded. It relies on a subsequent
`ConstantFolding` pass to
turn `Add(Literal(N), Literal(M))` into `Literal(N + M)`.
This patch folds the sum eagerly when both operands are integer
literals, so the
rule produces a planable logical plan in a single application.
### Why are the changes needed?
If `ConstantFolding` is excluded via
`spark.sql.optimizer.excludedRules`, the
unfolded `LocalLimit(Add(Literal(N), Literal(M)), ...)` reaches
physical
planning. `BasicOperators` in `SparkStrategies` only matches
`LocalLimit(IntegerLiteral, _)`, so planning fails with
`AssertionError: No plan for LocalLimit (N + M)` wrapped as
`INTERNAL_ERROR`.
Repro (Scala):
spark.conf.set("spark.sql.optimizer.excludedRules",
"org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
spark.sql("CREATE TEMP VIEW dept AS SELECT * FROM VALUES
(10,'d1'),(20,'d2'),(30,'d3') AS t(id,name)")
spark.sql("CREATE TEMP VIEW emp AS SELECT * FROM VALUES (1,10) AS
t(id, dept_id)")
spark.sql("""
SELECT * FROM emp
WHERE EXISTS (SELECT name FROM dept WHERE id > 10 LIMIT 1 OFFSET 2)
""").show()
Self-sufficient rules also reduce the risk for downstream consumers
(custom
optimizer pipelines, plugins) that exclude folding rules for
legitimate
reasons such as forcing certain code paths during testing.
### Does this PR introduce any user-facing change?
No. Default config behavior is unchanged because `ConstantFolding`
would have
folded the `Add` anyway. Only queries that previously crashed with
`OPTIMIZER_EXCLUDED_RULES=ConstantFolding` now succeed.
### How was this patch tested?
Added a new test in `LimitPushdownSuite` that runs only
`LimitPushDown` (no
`ConstantFolding`) and verifies the output limit is already folded to
a literal.
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
LimitPushDowninOptimizer.scalarewritesLocalLimit(le, Offset(oe, child))toOffset(oe, LocalLimit(Add(le, oe), child)), leaving theAddunfolded. The rulerelies on a subsequent
ConstantFoldingpass to turnAdd(Literal(N), Literal(M))into
Literal(N + M)so thatBasicOperators(which only matchesLocalLimit(IntegerLiteral, _)) can produce a physical plan.This patch folds the sum eagerly when both operands are integer literals — the
realistic case from
LIMIT N OFFSET Mclauses — so the rule produces a planablelogical plan in a single application, without depending on a downstream rule.
Why are the changes needed?
If
ConstantFoldingis excluded viaspark.sql.optimizer.excludedRules, theunfolded
LocalLimit(Add(Literal(N), Literal(M)), ...)reaches physicalplanning.
BasicOperatorsonly matchesLocalLimit(IntegerLiteral, _), soplanning fails with:
wrapped as
[INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error.Repro (Scala):
Self-sufficient rules also reduce fragility for downstream consumers — custom optimizer pipelines and plugins that legitimately exclude folding rules (for example to force certain code paths during testing) shouldn't have queries that
crash at physical planning when a logically-equivalent plan exists.
Scope and follow-up
This patch fixes the literal-only case
LIMIT N OFFSET M, which is what the SQL parser produces for the common usage.It does not fix the case where a user writes literal arithmetic in the SQL,
e.g.
LIMIT 3 - 1 OFFSET 5 - 3, because:(
Subtract(Literal(3), Literal(1)), notLiteral(2)).BasicOperatorspatterns forLocalLimit/GlobalLimit/Offset(and thecomposite extractors
OffsetAndLimit/LimitAndOffset) all matchIntegerLiteralonly, so the unfoldedSubtractand the resultingOffset(Subtract, ...)still wouldn't plan.Fixing that broader case cleanly requires teaching
BasicOperatorsto evaluate any foldableIntegerTypeexpression at planning time (5 patterns: 3 simple + 2 composite extractors). I'd be happy to follow up with that change in a separate PR if reviewers feel the broader fix is appropriate. For now this PR targets the narrow, realistic case.Does this PR introduce any user-facing change?
No. Default config behavior is unchanged because
ConstantFoldingwould fold theAddanyway. Only queries that previously crashed whenOPTIMIZER_EXCLUDED_RULESincludedConstantFoldingnow succeed.How was this patch tested?
Added a new test in
LimitPushdownSuitethat runs onlyLimitPushDown(no subsequentConstantFolding) and verifies the output limit is already folded to a literal. Verified the test fails on master and passes with this fix.Existing
LimitPushdownSuitetests continue to pass (the existing "Push down limit 1 through Offset" test exercises the full pipeline includingConstantFoldingand remains unchanged in output).