Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-42556][CONNECT] Dataset.colregex should link a plan_id when it only matches a single column. #40265

Closed
wants to merge 6 commits into from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Mar 3, 2023

What changes were proposed in this pull request?

When colregex returns a single column it should link the plans plan_id. For reference here is the non-connect Dataset code that does this:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512
This also needs to be fixed for the Python client.

Why are the changes needed?

Let the UnresolvedAttribute link plan_id if it is exist.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

New test cases.

def colRegex(colName: String): Column = Column { builder =>
builder.getUnresolvedRegexBuilder.setColName(colName)
def colRegex(colName: String): Column = {
val planId = if (plan.getRoot.hasCommon && plan.getRoot.getCommon.hasPlanId) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps put this in a helper function?

@hvanhovell
Copy link
Contributor

@beliefer can you add a test to SparkPlannerSuite or ClientE2ESuite to make sure this properly works?

@hvanhovell
Copy link
Contributor

cc @zhengruifeng can you also take a look?

Copy link
Contributor

@hvanhovell hvanhovell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhengruifeng
Copy link
Contributor

we'd better always add e2e tests, since it was added in ClientE2ESuite, I think don't need to add one in test_connect_basic

zhengruifeng pushed a commit that referenced this pull request Mar 4, 2023
… only matches a single column

### What changes were proposed in this pull request?
When colregex returns a single column it should link the plans plan_id. For reference here is the non-connect Dataset code that does this:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512
This also needs to be fixed for the Python client.

### Why are the changes needed?
Let the `UnresolvedAttribute` link plan_id if it is exist.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes #40265 from beliefer/SPARK-42556.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit c99a632)
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
@zhengruifeng
Copy link
Contributor

merged into master/branch-3.4

@beliefer
Copy link
Contributor Author

beliefer commented Mar 4, 2023

@hvanhovell @zhengruifeng Thank you!

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
… only matches a single column

### What changes were proposed in this pull request?
When colregex returns a single column it should link the plans plan_id. For reference here is the non-connect Dataset code that does this:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512
This also needs to be fixed for the Python client.

### Why are the changes needed?
Let the `UnresolvedAttribute` link plan_id if it is exist.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New test cases.

Closes apache#40265 from beliefer/SPARK-42556.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
(cherry picked from commit c99a632)
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants