-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42556][CONNECT] Dataset.colregex should link a plan_id when it only matches a single column. #40265
Conversation
… only matches a single column.
def colRegex(colName: String): Column = Column { builder => | ||
builder.getUnresolvedRegexBuilder.setColName(colName) | ||
def colRegex(colName: String): Column = { | ||
val planId = if (plan.getRoot.hasCommon && plan.getRoot.getCommon.hasPlanId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps put this in a helper function?
@beliefer can you add a test to SparkPlannerSuite or ClientE2ESuite to make sure this properly works? |
cc @zhengruifeng can you also take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
we'd better always add e2e tests, since it was added in |
… only matches a single column ### What changes were proposed in this pull request? When colregex returns a single column it should link the plans plan_id. For reference here is the non-connect Dataset code that does this: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512 This also needs to be fixed for the Python client. ### Why are the changes needed? Let the `UnresolvedAttribute` link plan_id if it is exist. ### Does this PR introduce _any_ user-facing change? 'No'. New feature. ### How was this patch tested? New test cases. Closes #40265 from beliefer/SPARK-42556. Authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org> (cherry picked from commit c99a632) Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
merged into master/branch-3.4 |
@hvanhovell @zhengruifeng Thank you! |
… only matches a single column ### What changes were proposed in this pull request? When colregex returns a single column it should link the plans plan_id. For reference here is the non-connect Dataset code that does this: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512 This also needs to be fixed for the Python client. ### Why are the changes needed? Let the `UnresolvedAttribute` link plan_id if it is exist. ### Does this PR introduce _any_ user-facing change? 'No'. New feature. ### How was this patch tested? New test cases. Closes apache#40265 from beliefer/SPARK-42556. Authored-by: Jiaan Geng <beliefer@163.com> Signed-off-by: Ruifeng Zheng <ruifengz@apache.org> (cherry picked from commit c99a632) Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
What changes were proposed in this pull request?
When colregex returns a single column it should link the plans plan_id. For reference here is the non-connect Dataset code that does this:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512
This also needs to be fixed for the Python client.
Why are the changes needed?
Let the
UnresolvedAttribute
link plan_id if it is exist.Does this PR introduce any user-facing change?
'No'.
New feature.
How was this patch tested?
New test cases.