[SPARK-13594][SQL] remove typed operations(e.g. map, flatMap) from python DataFrame #11445
Test build #52244 has finished for PR 11445 at commit
        """
        return self.rdd.mapPartitions(f, preservesPartitioning)
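The hunk under review shows that the old `DataFrame.mapPartitions` only forwarded to the underlying RDD (`return self.rdd.mapPartitions(f, preservesPartitioning)`), so its result was an RDD rather than a DataFrame. The pure-Python sketch below illustrates that shape. It assumes nothing from PySpark: `ToyRDD` and `ToyDataFrame` are hypothetical stand-ins, not real Spark classes.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: a list of partitions, each a list of rows."""

    def __init__(self, partitions: List[list]):
        self.partitions = partitions

    def mapPartitions(self, f: Callable[[Iterator], Iterable],
                      preservesPartitioning: bool = False) -> "ToyRDD":
        # Call f once per partition; the result is another ToyRDD, not a DataFrame.
        # preservesPartitioning is accepted only to mirror the signature in the hunk.
        return ToyRDD([list(f(iter(p))) for p in self.partitions])

    def collect(self) -> list:
        # Flatten all partitions into a single list of rows.
        return [row for p in self.partitions for row in p]


class ToyDataFrame:
    """Hypothetical stand-in for a DataFrame backed by a ToyRDD."""

    def __init__(self, partitions: List[list]):
        self.rdd = ToyRDD(partitions)

    def mapPartitions(self, f, preservesPartitioning=False):
        # Pre-change shape of the method: a thin forwarder to self.rdd.
        return self.rdd.mapPartitions(f, preservesPartitioning)
```

For example, `ToyDataFrame([[1, 2], [3]]).mapPartitions(lambda it: [sum(it)])` yields a `ToyRDD` whose `collect()` is `[3, 3]`: the caller has silently left the DataFrame API, which is the kind of "typed" operation this PR removes.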
Should we also remove foreach and foreachPartition?
Those are fine, since they don't return anything.
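The reviewer's point can be illustrated with a small pure-Python model: `foreach` and `foreachPartition` run a function only for its side effects and return `None`, so no new collection of arbitrary Python objects comes back. `ToyDataFrame` here is a hypothetical stand-in, not the PySpark class.

```python
from typing import Callable, Iterator, List


class ToyDataFrame:
    """Hypothetical stand-in: rows held as a list of partitions (plain Python lists)."""

    def __init__(self, partitions: List[list]):
        self.partitions = partitions

    def foreach(self, f: Callable[[object], None]) -> None:
        # Apply f to every row purely for its side effects; nothing is returned.
        for part in self.partitions:
            for row in part:
                f(row)

    def foreachPartition(self, f: Callable[[Iterator], None]) -> None:
        # Same idea, but f receives one iterator per partition.
        for part in self.partitions:
            f(iter(part))
```

With `df = ToyDataFrame([[1, 2], [3]])` and `seen = []`, `df.foreach(seen.append)` returns `None` and leaves `seen == [1, 2, 3]`; the only "output" is the side effect.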
Test build #52273 has finished for PR 11445 at commit
Test build #52285 has finished for PR 11445 at commit
Test build #52287 has finished for PR 11445 at commit
Thanks - merging this in master.
This change surprised me as a user of PySpark on the 2.0.0-SNAPSHOT. Thanks for documenting this well. Since I usually use the Scala API, it was not clear to me that PySpark didn't support the Datasets API yet (i.e., …
…thon DataFrame

## What changes were proposed in this pull request?

Remove `map`, `flatMap`, `mapPartitions` from python DataFrame, to prepare for Dataset API in the future.

## How was this patch tested?

existing tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#11445 from cloud-fan/python-clean.
@rxin
What changes were proposed in this pull request?
Remove `map`, `flatMap`, and `mapPartitions` from the Python DataFrame, to prepare for a Dataset API in the future.
How was this patch tested?
Existing tests.
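After this change, PySpark code that called these methods on a DataFrame reaches them through the underlying RDD instead, e.g. `df.rdd.map(f)` rather than `df.map(f)`, and gets an RDD back. The sketch below does not use Spark. It models a dataset as a list of partitions to show how the three removed operations differ; the helper names `toy_map`, `toy_flat_map`, and `toy_map_partitions` are hypothetical.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, List

# Hypothetical model: a dataset is a list of partitions, each a list of rows.
Partitions = List[list]


def toy_map(parts: Partitions, f: Callable) -> Partitions:
    # map: exactly one output row per input row.
    return [[f(x) for x in p] for p in parts]


def toy_flat_map(parts: Partitions, f: Callable[..., Iterable]) -> Partitions:
    # flatMap: zero or more output rows per input row, flattened.
    return [list(chain.from_iterable(f(x) for x in p)) for p in parts]


def toy_map_partitions(parts: Partitions,
                       f: Callable[[Iterator], Iterable]) -> Partitions:
    # mapPartitions: f is called once per partition with an iterator over its rows.
    return [list(f(iter(p))) for p in parts]
```

With `parts = [["a b", "c"], ["d e f"]]`, `toy_map(parts, len)` gives `[[3, 1], [5]]`, `toy_flat_map(parts, str.split)` gives `[["a", "b", "c"], ["d", "e", "f"]]`, and counting rows per partition with `toy_map_partitions` gives `[[2], [1]]`. In each case the output is an arbitrary Python collection with no schema, which is why these operations sit more naturally on an RDD (or a future typed Dataset) than on an untyped DataFrame.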