
[GLUTEN-11916][VL][TEST] Enable subquery/exists-subquery/exists-orderby-limit.sql with SPARK-57125 workaround#12165

Open
rdtr wants to merge 1 commit into apache:main from rdtr:spark41-enable-exists-orderby-limit


@rdtr rdtr commented May 28, 2026

Summary

Enables subquery/exists-subquery/exists-orderby-limit.sql in spark41 SQL query tests by re-enabling ConstantFolding for just this one file via a per-file --SET spark.sql.optimizer.excludedRules=... directive. The test was previously TODO-commented out because it crashed with an INTERNAL_ERROR during physical planning.

Root cause

GlutenSQLQueryTestSuite excludes ConvertToLocalRelation, ConstantFolding,
and NullPropagation from the optimizer by default to force queries through
Gluten's offload paths. With ConstantFolding excluded, Spark's LimitPushDown
rule produces an unfolded Add expression that BasicOperators in
SparkStrategies cannot match, causing physical planning to fail with:

java.lang.AssertionError: assertion failed: No plan for LocalLimit (1 + 2)
+- Project [1 AS col#...]
   +- Filter (dept_id#... > 10)
      +- LocalRelation [dept_id#..., dept_name#..., state#...]

  at scala.Predef$.assert(Predef.scala:279)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
  at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:79)
  ...
  at org.apache.spark.sql.execution.adaptive.InsertAdaptiveSparkPlan.compileSubquery(...)

wrapped as [INTERNAL_ERROR] The Spark SQL phase planning failed with an internal error.

The trigger is the EXISTS+OFFSET pattern (query #17 in this SQL file):

SELECT * FROM emp WHERE EXISTS (
  SELECT dept.dept_name FROM dept WHERE dept.dept_id > 10 LIMIT 1 OFFSET 2
)

LimitPushDown rewrites LocalLimit(le, Offset(oe, child)) into Offset(oe, LocalLimit(Add(le, oe), child)) and relies on ConstantFolding to subsequently fold Add(Literal(1), Literal(2)) to Literal(3) so that BasicOperators (which only matches LocalLimit(IntegerLiteral, _)) can produce a physical plan.
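To make the dependency between the two rules concrete, here is a minimal toy model of the rewrite. It is written in Python rather than Scala, and every class and function name is a hypothetical stand-in, not Spark's actual Catalyst API. It only mirrors the shape of the rewrite described above and a planner that accepts literal limits only.

```python
from dataclasses import dataclass
from typing import Union


# Toy stand-ins for Catalyst nodes; these are NOT Spark's real classes.
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Add]


@dataclass(frozen=True)
class Relation:
    name: str


@dataclass(frozen=True)
class Offset:
    expr: Expr
    child: "Plan"


@dataclass(frozen=True)
class LocalLimit:
    expr: Expr
    child: "Plan"


Plan = Union[Relation, Offset, LocalLimit]


def limit_push_down(plan):
    """LocalLimit(le, Offset(oe, child)) -> Offset(oe, LocalLimit(Add(le, oe), child))."""
    if isinstance(plan, LocalLimit) and isinstance(plan.child, Offset):
        off = plan.child
        return Offset(off.expr, LocalLimit(Add(plan.expr, off.expr), off.child))
    return plan


def fold(expr):
    """Collapse Add(Literal, Literal) into a single Literal, bottom-up."""
    if isinstance(expr, Add):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            return Literal(left.value + right.value)
        return Add(left, right)
    return expr


def constant_folding(plan):
    """Apply fold() to the expression of every limit/offset node."""
    if isinstance(plan, (LocalLimit, Offset)):
        return type(plan)(fold(plan.expr), constant_folding(plan.child))
    return plan


def plan_physical(plan):
    """Toy planner: like the matcher described above, it accepts only literal limits."""
    if isinstance(plan, Relation):
        return f"Scan({plan.name})"
    if isinstance(plan, Offset) and isinstance(plan.expr, Literal):
        return f"OffsetExec({plan.expr.value}, {plan_physical(plan.child)})"
    if isinstance(plan, LocalLimit) and isinstance(plan.expr, Literal):
        return f"LocalLimitExec({plan.expr.value}, {plan_physical(plan.child)})"
    raise AssertionError(f"No plan for {plan}")


# LIMIT 1 OFFSET 2 over a relation named "dept"
logical = LocalLimit(Literal(1), Offset(Literal(2), Relation("dept")))
pushed = limit_push_down(logical)
```

In this model, `plan_physical(pushed)` raises `AssertionError` because the limit is still `Add(Literal(1), Literal(2))`, while `plan_physical(constant_folding(pushed))` succeeds. This is the same ordering dependency that excluding ConstantFolding exposes.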

Fix

Add a per-file --SET directive that overrides the default exclusion list so that only ConvertToLocalRelation is excluded. This re-enables ConstantFolding, and also NullPropagation, because the test framework's --SET parser splits values on commas and a multi-rule list cannot be expressed (see "Test framework limitation" below). Gluten's offload paths remain exercised while the test is able to plan.

The upstream Spark fix is tracked as SPARK-57125 (PR
apache/spark#56180), which makes
LimitPushDown produce a folded literal directly so the rule no longer depends
on ConstantFolding. Once that lands and Gluten picks up the Spark version,
the --SET directive in this file can be removed.
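One way a rule could avoid depending on a later folding pass is to fold eagerly when both operands are already literals. The sketch below illustrates that idea in Python with hypothetical names. It is not the code from apache/spark#56180, whose actual implementation is not shown in this PR.

```python
from dataclasses import dataclass


# Toy expression nodes; hypothetical stand-ins, not Spark classes.
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def combined_limit(limit_expr, offset_expr):
    """Build the pushed-down limit, folding immediately when both sides are literals."""
    if isinstance(limit_expr, Literal) and isinstance(offset_expr, Literal):
        return Literal(limit_expr.value + offset_expr.value)
    # Non-literal operands still need a later folding pass (or stay unfolded).
    return Add(limit_expr, offset_expr)
```

With this shape, `combined_limit(Literal(1), Literal(2))` yields `Literal(3)` directly, so a literal-only planner match would succeed even with ConstantFolding excluded.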

Test framework limitation

The Gluten/Spark SQL test framework's --SET parser at
SQLQueryTestHelper.scala:476-481 splits the directive on every top-level
comma, so a multi-rule value such as excludedRules=Rule1,Rule2 cannot be
specified in a single --SET: the parser throws
StringIndexOutOfBoundsException. Leaving NullPropagation re-enabled is
acceptable for this test for now. The parser limitation is filed as a
separate follow-up.
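The failure mode can be reproduced with a small Python re-creation of a split-on-every-comma parser. This is a hypothetical sketch, not the Scala code in SQLQueryTestHelper, and `ValueError` stands in for the `StringIndexOutOfBoundsException` reported above.

```python
def parse_set_naive(line):
    """Parse '--SET k1=v1,k2=v2' by splitting on every comma (hypothetical re-creation)."""
    body = line[len("--SET "):]
    settings = []
    for fragment in body.split(","):
        key, sep, value = fragment.partition("=")
        if not sep:
            # A fragment such as "Rule2" has no '=': this is where the naive
            # approach breaks down. ValueError is this sketch's stand-in error.
            raise ValueError(f"fragment without '=': {fragment!r}")
        settings.append((key.strip(), value.strip()))
    return settings
```

A line like `--SET a=1,b=2` parses into two settings, but `--SET spark.sql.optimizer.excludedRules=Rule1,Rule2` fails on the stray `Rule2` fragment, which is why the directive in this PR lists a single rule.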

Verification

Regarding Spark 4.0

The same SQL test file (with the same EXISTS+OFFSET queries) is enabled and
appears to pass in gluten-ut/spark40. I checked:

  • Same SQL input file content
  • Identical golden file (both versions expect successful results for the
    OFFSET queries)
  • Same LimitPushDown rule in Spark 4.0.0 source (verified via the GitHub
    v4.0.0 tag)
  • Same BasicOperators IntegerLiteral-only matchers in Spark 4.0.0 source
  • Same rule exclusions (ConvertToLocalRelation, ConstantFolding,
    NullPropagation) in gluten-ut/spark40/.../GlutenSQLQueryTestSuite.scala
  • gluten-ut/spark40 is enabled in velox_backend_x86.yml and the
    spark-test-spark40-slow job runs GlutenSQLQueryTestSuite via the
    ExtendedSQLTest tag — and passes

I couldn't identify what makes Spark 4.0.0 + Gluten avoid hitting this code path while Spark 4.1.1 + Gluten triggers it. The bug ingredients look identical. If a reviewer with more context on the spark40 setup can shed light, that would be appreciated — but it doesn't block this PR. I am happy to build and test locally with 4.0 if necessary.

Test plan

  • CI: spark-test-spark41-slow runs GlutenSQLQueryTestSuite with
    ExtendedSQLTest tag, which exercises this file.
  • Verified locally in IntelliJ by running GlutenSQLQueryTestSuite
    filtered to subquery/exists-subquery/exists-orderby-limit.sql.

Related: GLUTEN-11916, #12146
(first batch of Spark 4.1 TODO test fixes).

Related issue: #11916

…by-limit.sql with SPARK-57125 workaround

GlutenSQLQueryTestSuite excludes ConvertToLocalRelation, ConstantFolding and
NullPropagation by default to force queries through Gluten's offload paths.
However, EXISTS+OFFSET queries in exists-orderby-limit.sql hit Spark's
LimitPushDown rule which rewrites

  LocalLimit(le, Offset(oe, child))

into

  Offset(oe, LocalLimit(Add(le, oe), child))

and relies on ConstantFolding to subsequently fold `Add(Literal(N), Literal(M))`
to `Literal(N + M)`. Without ConstantFolding the unfolded Add reaches physical
planning where BasicOperators only matches LocalLimit(IntegerLiteral, _),
producing

  AssertionError: No plan for LocalLimit (1 + 2)

wrapped as [INTERNAL_ERROR] during the planning phase.

This patch enables the test and re-enables ConstantFolding for just this SQL
file via a per-file `--SET spark.sql.optimizer.excludedRules=...` directive
that keeps only ConvertToLocalRelation excluded.

The upstream Spark fix is tracked as SPARK-57125 (Apache Spark PR #56180), which
makes LimitPushDown produce a literal sum directly so the rule no longer depends
on ConstantFolding. Once that lands and Gluten picks up the Spark version, the
`--SET` directive in this file can be removed.

Note: the test framework's `--SET` parser splits values by comma, so multiple
excluded rules cannot be specified in a single directive (recorded separately
for a future Spark/Gluten follow-up). NullPropagation getting re-enabled is
acceptable for this test.
@github-actions github-actions Bot added the CORE works for Gluten Core label May 28, 2026
rdtr added a commit to rdtr/spark that referenced this pull request May 28, 2026
…ve commas in config values

What changes were proposed in this pull request?

`SQLQueryTestHelper.getSparkSettings` splits `--SET` directive values on every
comma, which conflicts with Spark configs whose values themselves contain commas
(e.g. `spark.sql.optimizer.excludedRules` accepts a comma-separated rule list).
The current parser crashes with `StringIndexOutOfBoundsException` when it
encounters such a value.

Change the split to only occur at commas that are immediately followed by what
looks like a new `key=` (word characters or dots ending in `=`). This preserves
the documented multi-setting form `--SET k1=v1,k2=v2` while allowing values to
contain commas.
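The splitting strategy described above can be sketched with a lookahead regex. This is a Python illustration with hypothetical names, not the Scala change itself, and the exact pattern used in the patch is an assumption here.

```python
import re

# Split only at commas immediately followed by something shaped like "key=":
# word characters or dots, then '='. The lookahead keeps the key in the next piece.
_SETTING_BOUNDARY = re.compile(r",(?=[\w.]+=)")


def parse_set(line):
    """Parse '--SET k1=v1,k2=v2', allowing commas inside values."""
    body = line[len("--SET "):]
    settings = []
    for fragment in _SETTING_BOUNDARY.split(body):
        key, _, value = fragment.partition("=")
        settings.append((key.strip(), value.strip()))
    return settings
```

Under this pattern, `--SET k1=v1,k2=v2` still yields two settings, while `--SET spark.sql.optimizer.excludedRules=Rule1,Rule2` yields one setting whose value is `Rule1,Rule2`. A known trade-off of this heuristic is that a value containing `,name=` would still be split.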

Adds `SQLQueryTestHelperSuite` with focused unit tests.

Why are the changes needed?

The parser cannot currently express settings whose values contain commas,
forcing users to scope down their SET to a single value. This was hit when
trying to specify a multi-rule `excludedRules` value in Apache Gluten's spark41
SQL test workaround (apache/gluten#12165).

Does this PR introduce any user-facing change?

No. Test-framework-only change. Existing tests that rely on the documented
multi-setting form continue to parse as before.

How was this patch tested?

New `SQLQueryTestHelperSuite` with 6 cases covering: single setting,
multi-setting in one `--SET`, multiple `--SET` lines, comma-containing value,
mixed, and non-SET comments. All pass.