[SPARK-15615] [SQL] Support Json input from Dataset[String] #13460

pjfanning · 2016-06-02T03:21:17Z

What changes were proposed in this pull request?

[SPARK-15615] Add a new json function that takes Dataset[String] as input, and deprecate the existing RDD-based functions.

How was this patch tested?

Changed the existing unit tests

HyukjinKwon · 2016-06-02T03:38:03Z

I guess it would be nicer if the title had a form such as [SPARK-15615][SQL] ..., according to https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark.

HyukjinKwon · 2016-06-02T03:42:39Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

@@ -335,16 +336,32 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   * input once to determine the input schema.
   *
   * @param jsonRDD input RDD with one JSON object per record
-   * @since 1.4.0
+   * @since 1.4.0*


It seems there is a typo here: the trailing *.

HyukjinKwon · 2016-06-02T03:58:28Z

sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala

-  def complexFieldAndType1: RDD[String] =
-    spark.sparkContext.parallelize(
+  def complexFieldAndType1: Dataset[String] =
+    sqlContext.createDataset(


I guess it is preferred to use spark rather than sqlContext.
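For context, the suggestion above might look like the following minimal sketch. It assumes a local SparkSession standing in for the test suite's shared `spark`; the object name, the helper names, and the sample record are hypothetical, since the real contents of `complexFieldAndType1` are not shown in this thread.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

object SparkVsSqlContext {
  // Hypothetical sample record, not the actual test fixture.
  val records: Seq[String] = Seq("""{"id": 1, "tags": ["a", "b"]}""")

  // Preferred style: build the Dataset directly from the SparkSession.
  def viaSession(spark: SparkSession): Dataset[String] =
    spark.createDataset(records)(Encoders.STRING)

  // Older style: go through the SQLContext that wraps the session.
  def viaSqlContext(spark: SparkSession): Dataset[String] =
    spark.sqlContext.createDataset(records)(Encoders.STRING)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session stands in for the shared test session.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("spark-vs-sqlContext")
      .getOrCreate()

    // Both entry points produce the same data; only the entry point differs.
    assert(viaSession(spark).collect().sameElements(viaSqlContext(spark).collect()))
    spark.stop()
  }
}
```

Passing `Encoders.STRING` explicitly avoids depending on an implicits import; a test class that already imports `testImplicits._` could omit it.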

HyukjinKwon · 2016-06-02T04:29:52Z

I just fetched this PR and ran the tests. It seems the "SPARK-7565 MapType in JsonRDD" test is failing, which should probably be resolved. Also, if json(jsonRDD: JavaRDD[String]) is being deprecated, then I think its usages should be changed to the new ones, either in this PR or in a follow-up.

BTW, I guess it is arguable whether to add a new API. To me, it seems a bit questionable to add this API because there is already an RDD one, json(jsonRDD: RDD[String]). The Dataset case can easily be handled with the existing API, like below:

json(jsonDataset.rdd)

I guess APIs should not be added only for consistency.

I think we should wait for a committer's call.
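The workaround described in this comment can be sketched as follows. This is a minimal illustration, assuming a Spark version in which `DataFrameReader.json(RDD[String])` is available and a local SparkSession; the object name `JsonFromDataset`, the helper `jsonFromDataset`, and the sample records are hypothetical and are not part of Spark or of this PR.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Encoders, SparkSession}

object JsonFromDataset {
  // Hypothetical helper: read JSON from a Dataset[String] by delegating
  // to the existing RDD-based reader, as suggested in the comment.
  def jsonFromDataset(spark: SparkSession, jsonDataset: Dataset[String]): DataFrame =
    spark.read.json(jsonDataset.rdd)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("json-from-dataset")
      .getOrCreate()

    // Hypothetical sample input: one JSON object per record.
    val jsonDataset: Dataset[String] = spark.createDataset(Seq(
      """{"name": "a", "age": 1}""",
      """{"name": "b", "age": 2}"""
    ))(Encoders.STRING)

    val df = jsonFromDataset(spark, jsonDataset)
    df.printSchema()         // schema is inferred from the records
    assert(df.count() == 2)  // one row per input record
    spark.stop()
  }
}
```

Whether a dedicated `json(Dataset[String])` overload is worth adding on top of this one-line delegation is exactly the question the comment leaves to a committer.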

pjfanning · 2016-06-02T13:49:32Z

@HyukjinKwon this change of input parameter relates to #13300. There was a request there to treat Dataset[String] as the preferred input over RDD[String].

pjfanning · 2016-06-07T22:14:22Z

@HyukjinKwon all the JsonSuite tests pass for me on my laptop - would it be feasible to get this reviewed again?

AmplabJenkins · 2016-08-24T05:12:17Z

Can one of the admins verify this patch?

PJ Fanning added 3 commits May 28, 2016 08:07

[SPARK-15615] add json(Dataset[String])

f8683a2

[SPARK-15615] add json(Dataset[String])

1cc1ffc

Merge branch 'json-dataset' of https://github.com/pjfanning/spark int…

19525ff

…o json-dataset Conflicts: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala

HyukjinKwon reviewed Jun 2, 2016

pjfanning changed the title ~~Json dataset~~ [SPARK-15615] [SQL] Json dataset Jun 2, 2016

pjfanning changed the title ~~[SPARK-15615] [SQL] Json dataset~~ [SPARK-15615] [SQL] Support Json input from Dataset[String] Jun 2, 2016

PJ Fanning added 2 commits June 1, 2016 23:52

use testImplicits._ in test class

af18884

fix typo in version number

c242b14

HyukjinKwon reviewed Jun 2, 2016

PJ Fanning added 2 commits June 7, 2016 14:38

fix javadoc formatting error

7a0dd67

use spark context instead of sqlContext

aa1addb

Merge remote-tracking branch 'upstream/master' into json-dataset

6b73cfe

# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala

pjfanning closed this Nov 25, 2016

HyukjinKwon mentioned this pull request Feb 11, 2017

[SPARK-15463][SQL] Add an API to load DataFrame from Dataset[String] storing CSV #16854

Closed
