[SPARK-9964] [PySpark] [SQL] PySpark DataFrameReader accept RDD of String for JSON #8444

yanboliang · 2015-08-26T03:10:11Z

PySpark DataFrameReader should accept an RDD of Strings for JSON (as the Scala version does), rather than only taking a path.
If this PR is merged, the same change should be made for the other input types (not just JSON).

SparkQA · 2015-08-26T03:35:08Z

Test build #41583 has finished for PR 8444 at commit 127717a.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-08-26T04:53:12Z

Test build #41589 has finished for PR 8444 at commit f160ec4.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2015-08-26T07:25:12Z

I don't think we should rename the parameter, since that could break compatibility. How about just changing the description?

SparkQA · 2015-08-26T08:08:11Z

Test build #41611 has finished for PR 8444 at commit 3842a6b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2015-08-26T18:16:44Z

python/pyspark/sql/readwriter.py

+        elif isinstance(path, RDD):
+            return self._df(self._jreader.json(path._jrdd))
+        else:
+            raise Exception("path can be only string or RDD")


This should be a TypeError, I think: https://docs.python.org/2/library/exceptions.html#exceptions.TypeError

cc @davies
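
For illustration, the suggested change could look roughly like the sketch below. This is not the merged code: the `RDD` class is a hypothetical stand-in so the snippet runs without Spark, `json_source_kind` is a made-up helper name, and the check uses Python 3's `str` where the actual patch has to handle Python 2 and 3 string types.

```python
class RDD(object):
    """Hypothetical stand-in for pyspark.rdd.RDD (assumption, not the real class)."""

    def __init__(self, records):
        self.records = list(records)


def json_source_kind(path):
    """Classify the argument the way the reviewed branch does.

    Returns "path" for a string and "rdd" for an RDD; anything else
    raises TypeError, since the problem is the argument's type.
    """
    if isinstance(path, str):
        return "path"
    elif isinstance(path, RDD):
        return "rdd"
    else:
        raise TypeError("path can be only string or RDD")
```

Raising `TypeError` rather than a bare `Exception` lets callers catch the wrong-argument case specifically without swallowing unrelated errors.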

SparkQA · 2015-08-27T03:51:18Z

Test build #41668 has finished for PR 8444 at commit b2d072d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

rxin · 2015-08-27T05:18:51Z

Thanks - I've merged this in master.

PySpark DataFrameReader accept RDD of String for JSON

127717a

yanboliang added 2 commits August 26, 2015 11:59

merge two function into one

09f6763

fix python2&3 compatibility

f160ec4

yanboliang added 2 commits August 26, 2015 15:37

revert pathOrRdd to path

7c94f9c

fix typos

3842a6b

rxin reviewed Aug 26, 2015

change Exception to TypeError

b2d072d

asfgit closed this in ce97834 Aug 27, 2015

yanboliang deleted the spark-9964 branch August 27, 2015 10:33
