[GLUTEN-11916][VL][TEST] Enable subquery/exists-subquery/exists-orderby-limit.sql with SPARK-57125 workaround by rdtr · Pull Request #12165 · apache/gluten

rdtr · 2026-05-28T07:09:14Z

Summary

Enables subquery/exists-subquery/exists-orderby-limit.sql in the spark41 SQL query tests by re-enabling ConstantFolding for this one file only, via a per-file --SET spark.sql.optimizer.excludedRules=... directive. The test was previously commented out with a TODO because it crashed with an INTERNAL_ERROR during physical planning.

Root cause

GlutenSQLQueryTestSuite excludes ConvertToLocalRelation, ConstantFolding,
and NullPropagation from the optimizer by default to force queries through
Gluten's offload paths. With ConstantFolding excluded, Spark's LimitPushDown
rule produces an unfolded Add expression that BasicOperators in
SparkStrategies cannot match, causing physical planning to fail with:

java.lang.AssertionError: assertion failed: No plan for LocalLimit (1 + 2)
+- Project [1 AS col#...]
   +- Filter (dept_id#... > 10)
      +- LocalRelation [dept_id#..., dept_name#..., state#...]

  at scala.Predef$.assert(Predef.scala:279)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
  at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:79)
  ...
  at org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.compileSubquery(...)

The test sees this assertion wrapped as: [INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error.

The trigger is the EXISTS+OFFSET pattern (query #17 in this SQL file):

SELECT * FROM emp WHERE EXISTS (
  SELECT dept.dept_name FROM dept WHERE dept.dept_id > 10 LIMIT 1 OFFSET 2
)

LimitPushDown rewrites LocalLimit(le, Offset(oe, child)) into Offset(oe, LocalLimit(Add(le, oe), child)) and relies on ConstantFolding to subsequently fold Add(Literal(1), Literal(2)) to Literal(3) so that BasicOperators (which only matches LocalLimit(IntegerLiteral, _)) can produce a physical plan.
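To make the dependency between the two rules concrete, here is a minimal, self-contained Scala sketch. It is a toy model under stated assumptions: Lit, Add, Relation, LocalLimit, Offset, limitPushDown, constantFolding and planPhysical are hypothetical stand-ins, not Spark's Catalyst or SparkStrategies APIs. As a simplification, the toy planner requires a literal for both LocalLimit and Offset.

```scala
// Toy model (hypothetical): these are NOT Spark's Catalyst classes; they only
// mirror the plan shapes named in the root-cause description above.
object LimitPushDownToy {
  sealed trait Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class LocalLimit(limit: Expr, child: Plan) extends Plan
  final case class Offset(offset: Expr, child: Plan) extends Plan

  // Mirrors the rewrite LocalLimit(le, Offset(oe, child)) =>
  // Offset(oe, LocalLimit(Add(le, oe), child)); the Add is left unfolded.
  def limitPushDown(plan: Plan): Plan = plan match {
    case LocalLimit(le, Offset(oe, child)) => Offset(oe, LocalLimit(Add(le, oe), child))
    case other                             => other
  }

  // Stand-in for ConstantFolding: collapses Add(Lit(a), Lit(b)) into Lit(a + b).
  def foldExpr(e: Expr): Expr = e match {
    case Add(l, r) =>
      (foldExpr(l), foldExpr(r)) match {
        case (Lit(a), Lit(b)) => Lit(a + b)
        case (fl, fr)         => Add(fl, fr)
      }
    case lit: Lit => lit
  }

  def constantFolding(plan: Plan): Plan = plan match {
    case LocalLimit(e, child) => LocalLimit(foldExpr(e), constantFolding(child))
    case Offset(e, child)     => Offset(foldExpr(e), constantFolding(child))
    case r: Relation          => r
  }

  // Stand-in for BasicOperators: an operator is only planned when its expression
  // is already an integer literal; anything else yields a "No plan" error.
  def planPhysical(p: Plan): Either[String, String] = p match {
    case Relation(name)            => Right(s"Scan($name)")
    case LocalLimit(Lit(n), child) => planPhysical(child).map(c => s"LocalLimitExec($n, $c)")
    case Offset(Lit(n), child)     => planPhysical(child).map(c => s"OffsetExec($n, $c)")
    case other                     => Left(s"No plan for $other")
  }
}
```

For query = LocalLimit(Lit(1), Offset(Lit(2), Relation("dept"))), planPhysical(limitPushDown(query)) returns Left("No plan for LocalLimit(Add(Lit(1),Lit(2)),Relation(dept))"), which mirrors the real assertion, while planPhysical(constantFolding(limitPushDown(query))) returns Right("OffsetExec(2, LocalLimitExec(3, Scan(dept)))").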

Fix

Add a per-file --SET directive that overrides the default exclusion list so that only ConvertToLocalRelation is excluded. This re-enables ConstantFolding, and also NullPropagation, because the test framework's --SET parser splits values by comma and cannot express a multi-rule exclusion list (see the note below). Gluten's offload paths are still exercised, while the test can now be planned.

The upstream Spark fix is tracked as SPARK-57125 (PR
apache/spark#56180), which makes
LimitPushDown produce a folded literal directly so the rule no longer depends
on ConstantFolding. Once that lands and Gluten picks up the Spark version,
the --SET directive in this file can be removed.
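The idea behind the upstream fix, as described above, can be sketched in the same toy style: when both the limit and the offset are integer literals, the rewrite emits their sum as a literal instead of an Add node. This is a hypothetical illustration, not the code in apache/spark#56180. The names are invented, and the overflow guard (falling back to Add when the sum would not fit in an Int) is an assumption of this sketch.

```scala
// Hypothetical toy model: not Spark's Catalyst classes and not the actual
// apache/spark#56180 patch.
object FoldingLimitPushDownToy {
  sealed trait Expr
  final case class Lit(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class LocalLimit(limit: Expr, child: Plan) extends Plan
  final case class Offset(offset: Expr, child: Plan) extends Plan

  // Builds the pushed-down limit expression. When both sides are literals whose
  // sum fits in an Int (this toy assumes non-negative values), the sum is emitted
  // as a literal right away, so the result no longer depends on a later folding
  // pass. Otherwise (non-literal operands, or a sum that would overflow) it falls
  // back to an unfolded Add.
  def limitPlusOffset(le: Expr, oe: Expr): Expr = (le, oe) match {
    case (Lit(l), Lit(o)) if l.toLong + o <= Int.MaxValue => Lit(l + o)
    case _                                                => Add(le, oe)
  }

  // Same rewrite shape as before: LocalLimit(le, Offset(oe, child)) =>
  // Offset(oe, LocalLimit(le + oe, child)).
  def limitPushDown(plan: Plan): Plan = plan match {
    case LocalLimit(le, Offset(oe, child)) => Offset(oe, LocalLimit(limitPlusOffset(le, oe), child))
    case other                             => other
  }
}
```

With this rewrite, LocalLimit(Lit(1), Offset(Lit(2), child)) becomes Offset(Lit(2), LocalLimit(Lit(3), child)), which a planner that only accepts integer literals can handle even when no constant-folding rule runs.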

Test framework limitation

The Gluten/Spark SQL test framework's --SET parser (SQLQueryTestHelper.scala:476-481) splits values by comma at the top level, so a multi-rule value such as excludedRules=Rule1,Rule2 cannot be specified in a single --SET; attempting it fails with StringIndexOutOfBoundsException. For now, having NullPropagation re-enabled is acceptable for this test. The parser limitation is filed as a separate follow-up (SPARK-57128, apache/spark#56184).
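The failure mode, and the key-aware splitting proposed in SPARK-57128 (split only at commas followed by something that looks like a new key=), can be sketched as follows. Both parsers are hypothetical models written for illustration; they are not the code in SQLQueryTestHelper, naiveParse and keyAwareParse are invented names, and a.RuleOne and b.RuleTwo below are placeholder rule names.

```scala
// Hypothetical models of a --SET directive parser, written for illustration;
// neither is the actual SQLQueryTestHelper code.
object SetDirectiveParserToy {
  // Cuts "key=value" at the first '='. If there is no '=', indexOf returns -1
  // and substring(0, -1) throws StringIndexOutOfBoundsException.
  private def toPair(kv: String): (String, String) = {
    val i = kv.indexOf('=')
    (kv.substring(0, i).trim, kv.substring(i + 1).trim)
  }

  // Drops the "--SET" prefix and surrounding whitespace.
  private def body(directive: String): String = directive.stripPrefix("--SET").trim

  // Naive: split on every comma, so a comma inside a value produces a fragment
  // with no '=' (e.g. "b.RuleTwo") and toPair fails on it.
  def naiveParse(directive: String): Seq[(String, String)] =
    body(directive).split(",").toSeq.map(toPair)

  // Key-aware: split only at a comma whose lookahead is a new "key=" made of
  // word characters or dots, so commas inside a value are preserved.
  def keyAwareParse(directive: String): Seq[(String, String)] =
    body(directive).split(",(?=\\s*[\\w.]+=)").toSeq.map(toPair)
}
```

Both parsers turn --SET k1=v1,k2=v2 into two pairs. On --SET spark.sql.optimizer.excludedRules=a.RuleOne,b.RuleTwo, naiveParse throws StringIndexOutOfBoundsException, while keyAwareParse returns a single pair whose value is a.RuleOne,b.RuleTwo. One trade-off of the lookahead heuristic: a comma inside a value that happens to be followed by something shaped like key= would still be treated as a separator.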

Verification

✅ Reproduced the AssertionError: No plan for LocalLimit (1 + 2) by
enabling the test without the --SET directive on the original
GlutenSQLQueryTestSuite (not a diagnostic subclass).
✅ With the --SET directive applied, the test passes.
✅ The Spark unit test added in SPARK-57125 (apache/spark#56180, "[SPARK-57125][SQL] LimitPushDown should fold literal Limit+Offset sum so plan stays planable without ConstantFolding") fails on master and passes with the upstream fix.

Regarding Spark 4.0

The same SQL test file (with the same EXISTS+OFFSET queries) is enabled and appears to pass in gluten-ut/spark40. I checked:

- Same SQL input file content
- Identical golden file (both versions expect successful results for the OFFSET queries)
- Same LimitPushDown rule in Spark 4.0.0 source (verified via the GitHub v4.0.0 tag)
- Same BasicOperators IntegerLiteral-only matchers in Spark 4.0.0 source
- Same rule exclusions (ConvertToLocalRelation, ConstantFolding, NullPropagation) in gluten-ut/spark40/.../GlutenSQLQueryTestSuite.scala
- gluten-ut/spark40 is enabled in velox_backend_x86.yml, and the spark-test-spark40-slow job runs GlutenSQLQueryTestSuite via the ExtendedSQLTest tag and passes

I couldn't identify what makes Spark 4.0.0 + Gluten avoid this code path while Spark 4.1.1 + Gluten triggers it; the ingredients of the bug look identical. If a reviewer with more context on the spark40 setup can shed light, that would be appreciated, but it doesn't block this PR. I am happy to build and test locally with 4.0 if necessary.

Test plan

- CI: spark-test-spark41-slow runs GlutenSQLQueryTestSuite with the ExtendedSQLTest tag, which exercises this file.
- Verified locally in IntelliJ by running GlutenSQLQueryTestSuite filtered to subquery/exists-subquery/exists-orderby-limit.sql.

Related: GLUTEN-11916, #12146
(first batch of Spark 4.1 TODO test fixes).

Related issue: #11916

…by-limit.sql with SPARK-57125 workaround

GlutenSQLQueryTestSuite excludes ConvertToLocalRelation, ConstantFolding and NullPropagation by default to force queries through Gluten's offload paths. However, EXISTS+OFFSET queries in exists-orderby-limit.sql hit Spark's LimitPushDown rule, which rewrites LocalLimit(le, Offset(oe, child)) into Offset(oe, LocalLimit(Add(le, oe), child)) and relies on ConstantFolding to subsequently fold `Add(Literal(N), Literal(M))` to `Literal(N + M)`. Without ConstantFolding, the unfolded Add reaches physical planning, where BasicOperators only matches LocalLimit(IntegerLiteral, _). This produces AssertionError: No plan for LocalLimit (1 + 2), wrapped as [INTERNAL_ERROR] during the planning phase.

This patch enables the test and re-enables ConstantFolding for just this SQL file via a per-file `--SET spark.sql.optimizer.excludedRules=...` directive that keeps only ConvertToLocalRelation excluded.

The upstream Spark fix is tracked as SPARK-57125 (Apache Spark PR #56180), which makes LimitPushDown produce a literal sum directly so the rule no longer depends on ConstantFolding. Once that lands and Gluten picks up the Spark version, the `--SET` directive in this file can be removed.

Note: the test framework's `--SET` parser splits values by comma, so multiple excluded rules cannot be specified in a single directive (recorded separately for a future Spark/Gluten follow-up). NullPropagation getting re-enabled is acceptable for this test.

…ve commas in config values

What changes were proposed in this pull request?

`SQLQueryTestHelper.getSparkSettings` splits `--SET` directive values on every comma, which conflicts with Spark configs whose values themselves contain commas (e.g. `spark.sql.optimizer.excludedRules` accepts a comma-separated rule list). The current parser crashes with `StringIndexOutOfBoundsException` when it encounters such a value. Change the split to only occur at commas that are immediately followed by what looks like a new `key=` (word characters or dots ending in `=`). This preserves the documented multi-setting form `--SET k1=v1,k2=v2` while allowing values to contain commas. Adds `SQLQueryTestHelperSuite` with focused unit tests.

Why are the changes needed?

The parser cannot currently express settings whose values contain commas, forcing users to scope down their SET to a single value. This was hit when trying to specify a multi-rule `excludedRules` value in Apache Gluten's spark41 SQL test workaround (apache/gluten#12165).

Does this PR introduce any user-facing change?

No. Test-framework-only change. Existing tests that rely on the documented multi-setting form continue to parse as before.

How was this patch tested?

New `SQLQueryTestHelperSuite` with 6 cases covering: single setting, multi-setting in one `--SET`, multiple `--SET` lines, comma-containing value, mixed, and non-SET comments. All pass.

github-actions Bot added the CORE works for Gluten Core label May 28, 2026

rdtr mentioned this pull request May 28, 2026

[SPARK-57128][SQL][TESTS] SQLQueryTestHelper --SET parser must preserve commas in config values apache/spark#56184

Open

rdtr wants to merge 1 commit into apache:main from rdtr:spark41-enable-exists-orderby-limit
