[SPARK-15615] [SQL] Support Json input from Dataset[String] #13460
…o json-dataset Conflicts: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala
I guess it would be nicer if the title has the form such as …
@@ -335,16 +336,32 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
 * input once to determine the input schema.
 *
 * @param jsonRDD input RDD with one JSON object per record
 * @since 1.4.0
 * @since 1.4.0*
It seems there is a typo here: a stray * after 1.4.0.
def complexFieldAndType1: RDD[String] =
  spark.sparkContext.parallelize(
def complexFieldAndType1: Dataset[String] =
  sqlContext.createDataset(
I guess it is preferred to use spark rather than sqlContext.
I just fetched this PR and ran the tests. It seems … BTW, I guess it is arguable whether to add a new API. To me, it is a bit questionable to add this API because there is already an RDD one, and I guess APIs would not be added only for consistency. I think we should wait for a committer's call.
@HyukjinKwon this change in the input parameter relates to #13300. There was a request there to prefer Dataset[String] over RDD[String] as the input.
@HyukjinKwon all the JsonSuite tests pass for me on my laptop - would it be feasible to get this reviewed again?
# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
Can one of the admins verify this patch?
What changes were proposed in this pull request?
[SPARK-15615] Add a new json function that takes Dataset[String] as input, and deprecate the existing RDD-based functions.
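As a rough illustration of the intended usage, here is a minimal sketch that assumes the reader gains a json overload accepting a Dataset[String], as this PR proposes. The object name JsonFromDatasetSketch, the helper readJson, and the sample records are hypothetical and only serve the example; they are not part of the patch.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object JsonFromDatasetSketch {
  // Hypothetical helper: parse a Dataset[String] holding one JSON object per record.
  // Assumes DataFrameReader.json has the Dataset[String] overload proposed in this PR.
  def readJson(spark: SparkSession, lines: Dataset[String]): DataFrame =
    spark.read.json(lines)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("json-dataset-sketch")
      .getOrCreate()
    import spark.implicits._ // supplies the Encoder[String] needed by createDataset

    // Sample input (made up for this sketch): one JSON object per element.
    val lines: Dataset[String] = spark.createDataset(Seq(
      """{"name": "a", "value": 1}""",
      """{"name": "b", "value": 2}"""
    ))

    val df = readJson(spark, lines)
    df.printSchema() // schema is inferred by scanning the input once
    spark.stop()
  }
}
```

With the existing RDD-based function, the same input would be built with spark.sparkContext.parallelize(...) and passed as an RDD[String]; the sketch only swaps the input type.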
How was this patch tested?
Changed the existing unit tests