Skip to content

[SPARK-57125][SQL] LimitPushDown should fold literal Limit+Offset sum so plan stays planable without ConstantFolding#56180

Open
rdtr wants to merge 1 commit into
apache:masterfrom
rdtr:limit-pushdown-fold-offset
Open

[SPARK-57125][SQL] LimitPushDown should fold literal Limit+Offset sum so plan stays planable without ConstantFolding#56180
rdtr wants to merge 1 commit into
apache:masterfrom
rdtr:limit-pushdown-fold-offset

Conversation

@rdtr
Copy link
Copy Markdown

@rdtr rdtr commented May 28, 2026

What changes were proposed in this pull request?

LimitPushDown in Optimizer.scala rewrites LocalLimit(le, Offset(oe, child)) to
Offset(oe, LocalLimit(Add(le, oe), child)), leaving the Add unfolded. The rule
relies on a subsequent ConstantFolding pass to turn Add(Literal(N), Literal(M))
into Literal(N + M) so that BasicOperators (which only matches
LocalLimit(IntegerLiteral, _)) can produce a physical plan.

This patch folds the sum eagerly when both operands are integer literals — the
realistic case from LIMIT N OFFSET M clauses — so the rule produces a planable
logical plan in a single application, without depending on a downstream rule.

case LocalLimit(le, Offset(oe, grandChild)) =>
  val mergedLimit = (le, oe) match {
    case (IntegerLiteral(l), IntegerLiteral(o)) => Literal(l + o, IntegerType)
    case _ => Add(le, oe)
  }
  Offset(oe, LocalLimit(mergedLimit, grandChild))

Why are the changes needed?

If ConstantFolding is excluded via spark.sql.optimizer.excludedRules, the
unfolded LocalLimit(Add(Literal(N), Literal(M)), ...) reaches physical
planning. BasicOperators only matches LocalLimit(IntegerLiteral, _), so
planning fails with:

java.lang.AssertionError: assertion failed: No plan for LocalLimit (1 + 2)
  at scala.Predef$.assert(Predef.scala:279)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
  at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:79)
  ...

wrapped as [INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error.

Repro (Scala):

val spark = SparkSession.builder().master("local[2]")
  .config("spark.sql.optimizer.excludedRules",
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
  .getOrCreate()
spark.sql("CREATE TEMP VIEW dept AS SELECT * FROM VALUES (10,'d1'),(20,'d2'),(30,'d3') AS t(id, name)")
spark.sql("CREATE TEMP VIEW emp AS SELECT * FROM VALUES (1,10) AS t(id, dept_id)")
spark.sql("""
  SELECT * FROM emp
  WHERE EXISTS (SELECT name FROM dept WHERE id > 10 LIMIT 1 OFFSET 2)
""").show()

Self-sufficient rules also reduce fragility for downstream consumers — custom optimizer pipelines and plugins that legitimately exclude folding rules (for example to force certain code paths during testing) shouldn't have queries that
crash at physical planning when a logically-equivalent plan exists.

Scope and follow-up

This patch fixes the literal-only case LIMIT N OFFSET M, which is what the SQL parser produces for the common usage.

It does not fix the case where a user writes literal arithmetic in the SQL,
e.g. LIMIT 3 - 1 OFFSET 5 - 3, because:

  • The optimizer rule's input expressions are themselves unfolded
    (Subtract(Literal(3), Literal(1)), not Literal(2)).
  • BasicOperators patterns for LocalLimit / GlobalLimit / Offset (and the
    composite extractors OffsetAndLimit / LimitAndOffset) all match
    IntegerLiteral only, so the unfolded Subtract and the resulting
    Offset(Subtract, ...) still wouldn't plan.

Fixing that broader case cleanly requires teaching BasicOperators to evaluate any foldable IntegerType expression at planning time (5 patterns: 3 simple + 2 composite extractors). I'd be happy to follow up with that change in a separate PR if reviewers feel the broader fix is appropriate. For now this PR targets the narrow, realistic case.

Does this PR introduce any user-facing change?

No. Default config behavior is unchanged because ConstantFolding would fold the Add anyway. Only queries that previously crashed when OPTIMIZER_EXCLUDED_RULES included ConstantFolding now succeed.

How was this patch tested?

Added a new test in LimitPushdownSuite that runs only LimitPushDown (no subsequent ConstantFolding) and verifies the output limit is already folded to a literal. Verified the test fails on master and passes with this fix.

Existing LimitPushdownSuite tests continue to pass (the existing "Push down limit 1 through Offset" test exercises the full pipeline including ConstantFolding and remains unchanged in output).

… so plan stays planable without ConstantFolding

  ### What changes were proposed in this pull request?

  The `LimitPushDown` rule in `Optimizer.scala` rewrites
  `LocalLimit(le, Offset(oe, child))` to `Offset(oe, LocalLimit(Add(le,
oe), child))`,
  leaving the `Add` unfolded. It relies on a subsequent
`ConstantFolding` pass to
  turn `Add(Literal(N), Literal(M))` into `Literal(N + M)`.

  This patch folds the sum eagerly when both operands are integer
literals, so the
  rule produces a planable logical plan in a single application.

  ### Why are the changes needed?

  If `ConstantFolding` is excluded via
`spark.sql.optimizer.excludedRules`, the
  unfolded `LocalLimit(Add(Literal(N), Literal(M)), ...)` reaches
physical
  planning. `BasicOperators` in `SparkStrategies` only matches
  `LocalLimit(IntegerLiteral, _)`, so planning fails with
  `AssertionError: No plan for LocalLimit (N + M)` wrapped as
`INTERNAL_ERROR`.

  Repro (Scala):
  spark.conf.set("spark.sql.optimizer.excludedRules",
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
  spark.sql("CREATE TEMP VIEW dept AS SELECT * FROM VALUES
(10,'d1'),(20,'d2'),(30,'d3') AS t(id,name)")
  spark.sql("CREATE TEMP VIEW emp AS SELECT * FROM VALUES (1,10) AS
t(id, dept_id)")
  spark.sql("""
    SELECT * FROM emp
    WHERE EXISTS (SELECT name FROM dept WHERE id > 10 LIMIT 1 OFFSET 2)
  """).show()

  Self-sufficient rules also reduce the risk for downstream consumers
(custom
  optimizer pipelines, plugins) that exclude folding rules for
legitimate
  reasons such as forcing certain code paths during testing.

  ### Does this PR introduce any user-facing change?

  No. Default config behavior is unchanged because `ConstantFolding`
would have
  folded the `Add` anyway. Only queries that previously crashed with
  `OPTIMIZER_EXCLUDED_RULES=ConstantFolding` now succeed.

  ### How was this patch tested?

  Added a new test in `LimitPushdownSuite` that runs only
`LimitPushDown` (no
  `ConstantFolding`) and verifies the output limit is already folded to
a literal.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant