-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40938][CONNECT] Support Alias for every type of Relation #38415
Conversation
R: @cloud-fan |
@@ -47,6 +47,13 @@ message Relation { | |||
|
|||
Unknown unknown = 999; | |||
} | |||
// Optional. Every relation might have an alias. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong opinion, but not sure which one is better. We can also follow catalyst, and add a new plan SubqueryAlias(child, alias)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is already a message, also if we want to match DataFrame API, I think it makes more sense to have a SubqueryAlias
in proto, and the call for as(alias)
in the client will just create this SubqueryAlias
, which is the same as existing DF implementation.
I have updated this PR accordingly.
Alias alias = 200; | ||
|
||
// Relation alias. | ||
message Alias { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can't we use string directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think if we need to care client call xx.as("")
? Does client side should reject this? If so we can just use a string.
It's always a matter of if we need to know a field set or not set or default value, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I thought the default value of string is null...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default is empty string...
I really don't like the way of proto for this..
Can one of the admins verify this patch? |
3dc5198
to
5f86618
Compare
message SubqueryAlias { | ||
// Required. The input relation. | ||
Relation input = 1; | ||
// Required. The alias. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can we check if it's present or not, given the default value is empty string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given our discussion on the protocol, this is a required field so we ask clients to always set it. Server side only fetch the value in this field not matter what it is (either ""
, or not)
5f86618
to
d4d2849
Compare
thanks, merging to master! |
### What changes were proposed in this pull request? In the past, Connect server can check `alias` for `Read` and `Project`. However for Spark DataFrame, every DataFrame can be chained with `as(alias: String)` thus every Relation/LogicalPlan can have an `alias`. This PR refactors to make this work. ### Why are the changes needed? Improve API coverage. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? UT Closes apache#38415 from amaliujia/every_relation_has_alias. Authored-by: Rui Wang <rui.wang@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
In the past, Connect server can check
alias
forRead
andProject
. However for Spark DataFrame, every DataFrame can be chained withas(alias: String)
thus every Relation/LogicalPlan can have analias
. This PR refactors to make this work.Why are the changes needed?
Improve API coverage.
Does this PR introduce any user-facing change?
No
How was this patch tested?
UT