[SPARK-55794][SQL] Always alias OuterReferences #54576

Closed
mihailotim-db wants to merge 1 commit into apache:master from mihailotim-db:mihailo-timotic_data/fix_outer_ref

mihailotim-db commented Mar 2, 2026

What changes were proposed in this pull request?

In this PR I propose that we always alias OuterReferences.

Why are the changes needed?

These changes are needed for two main reasons: to avoid potential issues with exposing raw outer references and their expression IDs, and to provide compatibility between the fixed-point and single-pass analyzers.

For example, in a query like:

table t
|> where exists (
    table other
    |> extend t.x
    |> select * except (a, b))

Before this change, the analyzed plan is:

Filter exists#x [x#1]
:  +- Project [x#1]
:     +- Project [a#3, b#4, outer(x#1)]
:        +- SubqueryAlias spark_catalog.default.other
:           +- Relation spark_catalog.default.other[a#3,b#4] json
+- PipeOperator
   +- SubqueryAlias spark_catalog.default.t
      +- Relation spark_catalog.default.t[x#1,y#2] csv

Here the output of the subquery (x#1) has exactly the same expression ID as the outer reference it comes from. This can potentially cause query failures or correctness issues, but at the moment it only presents as a compatibility issue between the fixed-point and single-pass analyzers.
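To make the ID collision concrete, here is a toy Python model of the plan above. It is only a sketch: `Attr`, `OuterRef`, `Alias`, `output_attr` and `alias_outer_refs` are hypothetical stand-ins, not Catalyst classes and not the code in this patch. It assumes that wrapping an outer reference in an alias assigns it a fresh expression ID.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical ID source standing in for "allocate a fresh expression ID".
_fresh_ids = count(100)


@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int


@dataclass(frozen=True)
class OuterRef:
    attr: Attr  # attribute that belongs to the outer query


@dataclass(frozen=True)
class Alias:
    child: object
    name: str
    expr_id: int


def output_attr(expr):
    """Attribute that a projection exposes for one of its expressions."""
    if isinstance(expr, Alias):
        return Attr(expr.name, expr.expr_id)
    if isinstance(expr, OuterRef):
        return expr.attr  # raw outer reference: the outer ID leaks out
    return expr


def alias_outer_refs(project_list):
    """Wrap every bare outer reference in an alias with a fresh ID."""
    return [
        Alias(e, e.attr.name, next(_fresh_ids)) if isinstance(e, OuterRef) else e
        for e in project_list
    ]


outer_x = Attr("x", 1)  # x#1 from table t
project_list = [Attr("a", 3), Attr("b", 4), OuterRef(outer_x)]

before = [output_attr(e) for e in project_list]
after = [output_attr(e) for e in alias_outer_refs(project_list)]

print([a.expr_id for a in before])  # [3, 4, 1]   -> subquery output reuses x#1
print([a.expr_id for a in after])   # [3, 4, 100] -> subquery output has its own ID
```

In this model the column keeps its name `x`, so `select * except (a, b)` still resolves the same column. Only the ID exposed by the subquery changes.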

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added new test cases + existing tests

Was this patch authored or co-authored using generative AI tooling?

No

mihailotim-db force-pushed the mihailo-timotic_data/fix_outer_ref branch 3 times, most recently from 5c9c9a7 to 1a995b3 (March 2, 2026 17:34)
mihailotim-db force-pushed the mihailo-timotic_data/fix_outer_ref branch from 1a995b3 to 3df9592 (March 2, 2026 18:05)

dtenedor left a comment

Looks good, thanks for making the behavior converge between the analyzers.

dtenedor commented Mar 2, 2026

LGTM, merging to master

dtenedor closed this in 4ebdc4a Mar 2, 2026