-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-41169][CONNECT][PYTHON] Implement DataFrame.drop
#38686
Conversation
// (Required) columns to drop. | ||
// | ||
// Should contain at least 1 item. | ||
repeated Expression cols = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does drop actually support arbitrary expressions? Shouldn't this be a repeated unresolved attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dataset.drop takes arbitrary expressions into account
spark/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
Lines 2952 to 2957 in 3b4faaf
val expressions = (for (col <- allColumns) yield col match { | |
case Column(u: UnresolvedAttribute) => | |
queryExecution.analyzed.resolveQuoted( | |
u.name, sparkSession.sessionState.analyzer.resolver).getOrElse(u) | |
case Column(expr: Expression) => expr | |
}) |
val sparkPlan2 = sparkTestRelation.drop("id", "name") | ||
comparePlans(connectPlan2, sparkPlan2) | ||
|
||
// non-existing column |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you treat the dropped columns as expressions we need to add a negative test for unsupported expressions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked the implementation of Dataset.drop
, it supports all kinds of expressions, a expression will be just ignored if it doesn't semanticEquals the columns in current dataframe.
@@ -255,10 +255,21 @@ def distinct(self) -> "DataFrame": | |||
) | |||
|
|||
def drop(self, *cols: "ColumnOrString") -> "DataFrame": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting case where one could argue for implementing the behavior on the client side instead of the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, there will be two RPC if we implement it on the client side
1, all_cols = self.columns
to fetch the schema;
2, build the plan
with a dedicated proto mesage, we only need one RPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have built the consensus that we prefer re-using the proto than ask clients do duplicate work.
// (Required) columns to drop. | ||
// | ||
// Should contain at least 1 item. | ||
repeated Expression cols = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if the name should be more explicit like "dropped"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here follows the naming in def drop(col: Column, cols: Column*): DataFrame
assert(rel.getColsCount > 0, s"cols must contains at least 1 item!") | ||
|
||
val cols = rel.getColsList.asScala.toArray.map { expr => | ||
Column(transformExpression(expr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should verify supported types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean verify for the arrow-based collect?
since we will remove the json code path, it always fails if there are unsupported types.
I will update this PR after the arrow-based collect is fixed #38706 , otherwise, some e2e tests will fail |
c74a825
to
587e9d5
Compare
Merged to master. |
LGTM |
### What changes were proposed in this pull request? Implement `DataFrame.drop` with a proto message ### Why are the changes needed? for api coverage ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? added UT Closes apache#38686 from zhengruifeng/connect_df_drop. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? Implement `DataFrame.drop` with a proto message ### Why are the changes needed? for api coverage ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? added UT Closes apache#38686 from zhengruifeng/connect_df_drop. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? Implement `DataFrame.drop` with a proto message ### Why are the changes needed? for api coverage ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? added UT Closes apache#38686 from zhengruifeng/connect_df_drop. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Implement
DataFrame.drop
with a proto messageWhy are the changes needed?
for api coverage
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
added UT