[SPARK-45509][SQL][3.5] Fix df column reference behavior for Spark Connect #43699

Closed

cloud-fan
Contributor

Backport of #43465 to 3.5.

What changes were proposed in this pull request?

This PR fixes several column-resolution problems in Spark Connect so that its behavior is closer to classic Spark SQL (unfortunately, some behavior differences remain in corner cases).

  1. Resolve DataFrame column references in both resolveExpressionByPlanChildren and resolveExpressionByPlanOutput. Previously this was done only in resolveExpressionByPlanChildren.
  2. When the plan id has multiple matches, fail with AMBIGUOUS_COLUMN_REFERENCE.
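
To illustrate the second point, here is a toy model of plan-id based resolution, written in plain Python with no Spark dependency. It is only a sketch: `PlanNode`, `find_by_plan_id`, `resolve_df_column` and the `AmbiguousColumnReference` exception are hypothetical names invented for this example, not Spark's analyzer code. Returning `None` when no node carries the plan id is also an assumption of the sketch, not a statement about what Spark does.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


class AmbiguousColumnReference(Exception):
    """Stand-in for the AMBIGUOUS_COLUMN_REFERENCE error (hypothetical class)."""


@dataclass
class PlanNode:
    # Hypothetical, heavily simplified plan node; real plans carry far more state.
    plan_id: Optional[int]
    output: List[str]
    children: List["PlanNode"] = field(default_factory=list)


def find_by_plan_id(node: PlanNode, plan_id: int) -> List[PlanNode]:
    """Collect the nodes tagged with plan_id, without descending below a match."""
    if node.plan_id == plan_id:
        return [node]
    matches: List[PlanNode] = []
    for child in node.children:
        matches.extend(find_by_plan_id(child, plan_id))
    return matches


def resolve_df_column(root: PlanNode, plan_id: int, name: str) -> Optional[Tuple[int, str]]:
    """Resolve a column reference against the node that carries plan_id."""
    matches = find_by_plan_id(root, plan_id)
    if len(matches) > 1:
        # The same plan id appears more than once (e.g. a self-join):
        # fail rather than silently pick one side.
        raise AmbiguousColumnReference(f"plan id {plan_id} matches {len(matches)} nodes")
    if not matches:
        # Assumption of this sketch only: signal "not found" with None.
        return None
    if name not in matches[0].output:
        raise KeyError(name)
    return (plan_id, name)


left = PlanNode(plan_id=1, output=["id", "a"])
right = PlanNode(plan_id=2, output=["id", "b"])
join = PlanNode(plan_id=3, output=["id", "a", "id", "b"], children=[left, right])
self_join = PlanNode(plan_id=4, output=["id", "a", "id", "a"], children=[left, left])

print(resolve_df_column(join, 1, "id"))
try:
    resolve_df_column(self_join, 1, "id")
except AmbiguousColumnReference as err:
    print("ambiguous:", err)
```

In the ordinary join, plan id 1 identifies exactly one child, so the reference resolves even though both sides expose a column named `id`. In the self-join the same plan id appears on both sides, which is the situation where the sketch fails with the ambiguity error.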

Why are the changes needed?

To fix behavior differences between Spark Connect and classic Spark SQL.

Does this PR introduce any user-facing change?

Yes, for the Spark Connect Scala client.

How was this patch tested?

New tests.

Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#43465 from cloud-fan/column.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
zhengruifeng pushed a commit that referenced this pull request Nov 8, 2023
…nnect

Closes #43699 from cloud-fan/backport.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@zhengruifeng
Contributor

Merged to 3.5.
