-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-46541][SQL][CONNECT] Fix the ambiguous column reference in self join #44532
[SPARK-46541][SQL][CONNECT] Fix the ambiguous column reference in self join #44532
Conversation
ad7a53e
to
a154b93
Compare
333c038
to
50939b9
Compare
cc @cloud-fan I think it is ready for the initial review |
.flatMap(resolveUnresolvedAttributeByPlan(u, _, isMetadataAccess)) | ||
if (isMetadataAccess) { | ||
// NOTE: A metadata column might appear in `output` instead of `metadataOutput`. | ||
val metadataOutputSet = child.outputSet ++ AttributeSet(child.metadataOutput) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to filter with output+metadataOutput
, otherwise metadata column's tests will fail
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
Outdated
Show resolved
Hide resolved
6e8cf9c
to
5c07a39
Compare
|
||
val isMetadataAccess = u.getTagValue(LogicalPlan.IS_METADATA_COL).isDefined | ||
private def resolveUnresolvedAttributeByPlanId( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should merge this with the other resolveUnresolvedAttributeByPlanId
. The overall algorithm should be
- bottom-up traverse the plan tree to find the matching df plan.
- resolve the column and propogate it up
- during propogation, prune the resolved columns with plan's output. It should happen for each plan node, not just the top plan node's children.
966ba48
to
c2e235c
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
Outdated
Show resolved
Hide resolved
c2e235c
to
43387e6
Compare
b093a13
to
1daf298
Compare
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
Outdated
Show resolved
Hide resolved
nit try to fix metadata column test II
try to address
try to address
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
…ysis/ColumnResolutionHelper.scala Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
2585c73
to
c91af56
Compare
thanks @cloud-fan for guide and review merged to master |
What changes were proposed in this pull request?
fix the logic of ambiguous column detection in spark connect
Why are the changes needed?
Does this PR introduce any user-facing change?
yes, fix a bug
How was this patch tested?
added ut
Was this patch authored or co-authored using generative AI tooling?
no