[SPARK-46446][SQL] Disable subqueries with correlated OFFSET to fix correctness bug #44401
What changes were proposed in this pull request?
Subqueries with correlation under LIMIT with OFFSET have a correctness bug. It was introduced recently, when support for correlation under OFFSET was enabled without being handled correctly, so these queries went from unsupported (the query throws an error) to returning wrong results. The bug is in the master branch and has not yet been released.
This PR disables correlated OFFSET by adding a feature flag for it, which is disabled. A follow-up PR will add correct support and re-enable it. This PR also adds a feature flag for the related LIMIT support, which is enabled.
The bug affects all types of correlated subqueries: scalar, lateral, IN, and EXISTS.
Example repro:
Correct result: empty set
Spark result: Array([2,2])
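To illustrate what "correct" means for this query shape, here is a minimal sketch of the intended semantics in plain Python: LIMIT and OFFSET inside a correlated subquery apply separately for each outer row, after the correlation filter. The tables `x` and `y`, their contents, and the helper `correlated_exists` are hypothetical and chosen only for illustration; they are not taken from this PR, and the sketch does not model Spark's decorrelation rewrite.

```python
# Hypothetical data for illustration only (not the PR's repro tables).
x = [(1, 1), (2, 2)]            # outer rows: (x1, x2)
y = [(1, 1), (1, 2), (2, 4)]    # inner rows: (y1, y2)


def correlated_exists(outer_row, inner_rows, limit, offset):
    """Model EXISTS (SELECT * FROM y WHERE y1 = x1 LIMIT limit OFFSET offset)
    evaluated for a single outer row."""
    # 1. Apply the correlated predicate y1 = x1 for this outer row.
    matches = [r for r in inner_rows if r[0] == outer_row[0]]
    # 2. Apply OFFSET, then LIMIT, to this outer row's matches only.
    window = matches[offset:offset + limit]
    # 3. EXISTS is true when at least one row survives.
    return len(window) > 0


# Outer query: SELECT * FROM x WHERE EXISTS (... LIMIT 1 OFFSET 2)
result = [row for row in x if correlated_exists(row, y, limit=1, offset=2)]
print(result)  # no outer row has more than 2 matching inner rows
```

In this model no outer row has more than two matches, so skipping two rows per outer row leaves nothing and the result is empty. An implementation that applied the OFFSET once across all inner rows, rather than once per outer row, could let rows survive that should have been skipped.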
Why are the changes needed?
To fix a correctness bug: affected queries silently return wrong results.
Does this PR introduce any user-facing change?
Yes. It disables the correlated OFFSET query shape, which was not handled correctly. (This shape was enabled on the master branch but has not yet been released.)
How was this patch tested?
Added tests.
Was this patch authored or co-authored using generative AI tooling?
No