-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-15615][SQL][BUILD][FOLLOW-UP] Replace deprecated usage of json(RDD[String]) API #17071
Conversation
Let me please cc both @cloud-fan and @srowen |
"""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil) | ||
val otherPeople = spark.read.json(otherPeopleRDD) | ||
// an Dataset[String] storing one JSON object per string | ||
val otherPeopleDataset = spark.sparkContext.makeRDD( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use spark.createDataset
here?
@@ -590,7 +590,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { | |||
val dir = Utils.createTempDir() | |||
dir.delete() | |||
val path = dir.getCanonicalPath | |||
primitiveFieldAndType.map(record => record.replaceAll("\n", " ")).saveAsTextFile(path) | |||
primitiveFieldAndType.rdd.map(record => record.replaceAll("\n", " ")).saveAsTextFile(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we switch to DataFrameWriter
with text format?
@@ -828,7 +828,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { | |||
|
|||
val mergedJsonDF = spark.read | |||
.option("prefersDecimal", "true") | |||
.json(floatingValueRecords ++ bigIntegerRecords) | |||
.json((floatingValueRecords.rdd ++ bigIntegerRecords.rdd).toDS()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Dataset.union
?
@@ -26,22 +26,17 @@ | |||
|
|||
import org.apache.hadoop.fs.FileSystem; | |||
import org.apache.hadoop.fs.Path; | |||
import org.apache.spark.sql.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opps, let me fix this one too.
Test build #73478 has finished for PR 17071 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we shouldn't use the deprecated method in non-test code. Even a lot of the test occurrences can be updated. I suppose there should be at least one test that the deprecated method still works?
Oh, no. I think I updated all instances. Let me maybe leave one case in |
Test build #73483 has finished for PR 17071 at commit
|
Test build #73484 has finished for PR 17071 at commit
|
I left single usage in |
Test build #73490 has finished for PR 17071 at commit
|
Test build #73488 has finished for PR 17071 at commit
|
Test build #73489 has finished for PR 17071 at commit
|
I like it, though, regarding still testing the deprecated method -- maybe it's best to even have a test that is explicitly just for testing the old method? that may be clearer than just picking some test from among another batch to leave with the old behavior. It might mean actually adding one new small test case in the generic JSON test suite for this purpose. What do you think? |
Sure, sounds better and I can't find a reason to not follow. Let me maybe add single small Java one somewhere because the deprecated Java one calls the deprecated Scala one. |
4cef1c6
to
6f35ee3
Compare
Test build #73506 has finished for PR 17071 at commit
|
thanks, merging to master! |
What changes were proposed in this pull request?
This PR proposes to replace the deprecated
json(RDD[String])
usage tojson(Dataset[String])
.This currently produces so many warnings.
How was this patch tested?
Fixed tests.