@cloud-fan
Contributor

What changes were proposed in this pull request?

Remove `map`, `flatMap`, and `mapPartitions` from the Python DataFrame, to prepare for the Dataset API in the future.
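
For anyone migrating, here is a minimal sketch of the call-site change. The removed methods were thin wrappers over the underlying RDD (the diff shows `return self.rdd.mapPartitions(f, preservesPartitioning)`), so callers go through `.rdd` explicitly. `ToyRDD` and `ToyDataFrame` are hypothetical pure-Python stand-ins, not PySpark classes. They only model the delegation so the snippet runs without a Spark installation.

```python
from itertools import chain


class ToyRDD:
    """Hypothetical stand-in for an RDD: a list of partitions of rows."""

    def __init__(self, partitions):
        self.partitions = [list(p) for p in partitions]

    def map(self, f):
        # Apply f to every row, keeping the partition layout.
        return ToyRDD([[f(row) for row in part] for part in self.partitions])

    def flatMap(self, f):
        # Apply f to every row and flatten the resulting iterables.
        return ToyRDD(
            [list(chain.from_iterable(f(row) for row in part))
             for part in self.partitions]
        )

    def mapPartitions(self, f):
        # Apply f once per partition; f receives an iterator of rows.
        return ToyRDD([list(f(iter(part))) for part in self.partitions])

    def collect(self):
        # Gather all rows in partition order.
        return list(chain.from_iterable(self.partitions))


class ToyDataFrame:
    """Hypothetical stand-in for a DataFrame after this change:
    it exposes .rdd but no map/flatMap/mapPartitions of its own."""

    def __init__(self, rows, num_partitions=2):
        # Split rows into contiguous chunks (assumed layout, for illustration).
        size = max(1, -(-len(rows) // num_partitions))
        self._rdd = ToyRDD(rows[i:i + size] for i in range(0, len(rows), size))

    @property
    def rdd(self):
        return self._rdd


df = ToyDataFrame([("a", 1), ("b", 2), ("c", 3)])

# Before this change: df.map(f), df.flatMap(f), df.mapPartitions(f)
# After this change:  go through .rdd explicitly.
doubled = df.rdd.map(lambda row: row[1] * 2).collect()
letters = df.rdd.flatMap(lambda row: [row[0]] * row[1]).collect()
sums = df.rdd.mapPartitions(lambda rows: [sum(r[1] for r in rows)]).collect()
```

With real PySpark the same rewrite applies at the call site (`df.map(f)` becomes `df.rdd.map(f)`), and the result is an RDD rather than a DataFrame.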

How was this patch tested?

existing tests

@cloud-fan
Contributor Author

cc @rxin @yhuai

@SparkQA

SparkQA commented Mar 1, 2016

Test build #52244 has finished for PR 11445 at commit 86ec0ff.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"""
return self.rdd.mapPartitions(f, preservesPartitioning)

Contributor

Should we also remove foreach and foreachPartition?

Contributor

Those are fine, since they don't return anything.

@SparkQA

SparkQA commented Mar 2, 2016

Test build #52273 has finished for PR 11445 at commit bf4b9d5.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 2, 2016

Test build #52285 has finished for PR 11445 at commit d0a69fa.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 2, 2016

Test build #52287 has finished for PR 11445 at commit 5e711e3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Mar 2, 2016

Thanks, merging this into master.

@asfgit asfgit closed this in 4dd2481 Mar 2, 2016

@mydpy

mydpy commented Mar 15, 2016

This change surprised me as a user of PySpark on the 2.0.0-SNAPSHOT. Thanks for documenting this well. Since I usually use the Scala API, it was not clear to me that PySpark didn't support the Dataset API yet (i.e., `df.rdd.flatMap(...)` returns a PythonRDD as opposed to a Dataset).

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
…thon DataFrame

## What changes were proposed in this pull request?

Remove `map`, `flatMap`, `mapPartitions` from python DataFrame, to prepare for Dataset API in the future.

## How was this patch tested?

existing tests

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#11445 from cloud-fan/python-clean.

@maver1ck
Contributor

maver1ck commented Jul 19, 2016

@rxin
As we're not planning to implement Datasets in Python, is there a plan to revert this PR?
