[SPARK-15615][SQL][BUILD][FOLLOW-UP] Replace deprecated usage of json(RDD[String]) API #17071

Closed
wants to merge 1 commit into from

HyukjinKwon
Member

What changes were proposed in this pull request?

This PR proposes to replace the deprecated json(RDD[String]) usages with json(Dataset[String]).

The deprecated calls currently produce a large number of deprecation warnings during the build.

How was this patch tested?

Updated the existing tests accordingly.
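For reference, a minimal before/after sketch of the migration, written in spark-shell style. It assumes Spark 2.2 or later (where `json(Dataset[String])` is available and `json(RDD[String])` is deprecated) and a local session; the app name and sample record are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Illustrative local session; in spark-shell, getOrCreate() returns the existing `spark`.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("json-dataset-sketch")
  .getOrCreate()
import spark.implicits._ // implicit Encoder[String] and the .toDS() syntax

val records = Seq("""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""")

// Before (deprecated; the compiler reports a deprecation warning):
// val people = spark.read.json(spark.sparkContext.makeRDD(records))

// After: wrap the same strings in a Dataset[String] and call json(Dataset[String]).
val jsonDs: Dataset[String] = records.toDS()
val people: DataFrame = spark.read.json(jsonDs)
people.printSchema()
```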

@HyukjinKwon
Member Author

Let me cc @cloud-fan and @srowen.

@HyukjinKwon HyukjinKwon changed the title [SPARK-15615][SQL][BUILD][FOLLOW-UP] Remove deprecated usage of json(RDD[String]) API [SPARK-15615][SQL][BUILD][FOLLOW-UP] Replace deprecated usage of json(RDD[String]) API Feb 26, 2017
"""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val otherPeople = spark.read.json(otherPeopleRDD)
// a Dataset[String] storing one JSON object per string
val otherPeopleDataset = spark.sparkContext.makeRDD(
Contributor

can we use spark.createDataset here?
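The suggestion could look roughly like this: `SparkSession.createDataset` builds the `Dataset[String]` directly from a local `Seq`, so the example no longer needs `sparkContext.makeRDD`. This is a sketch assuming Spark 2.2+ and the implicit `Encoder[String]` from `spark.implicits._`; the session settings are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("create-dataset-sketch")
  .getOrCreate()
import spark.implicits._ // provides the implicit Encoder[String] that createDataset needs

// A Dataset[String] storing one JSON object per string, built without an RDD.
val otherPeopleDataset = spark.createDataset(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val otherPeople = spark.read.json(otherPeopleDataset)
otherPeople.show()
```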

@@ -590,7 +590,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
val dir = Utils.createTempDir()
dir.delete()
val path = dir.getCanonicalPath
primitiveFieldAndType.map(record => record.replaceAll("\n", " ")).saveAsTextFile(path)
primitiveFieldAndType.rdd.map(record => record.replaceAll("\n", " ")).saveAsTextFile(path)
Contributor

can we switch to DataFrameWriter with text format?
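A sketch of that alternative, assuming the fixture is a `Dataset[String]`: flatten each record onto one line with `Dataset.map`, then write it through `DataFrameWriter` with the `text` data source instead of dropping back to `RDD.saveAsTextFile`. The two sample records are hypothetical stand-ins for the suite's `primitiveFieldAndType` fixture.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("json-text-writer-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical fixture: JSON records, one of which contains an embedded newline.
val records = Seq("{\"a\": 1,\n \"b\": \"x\"}", "{\"a\": 2, \"b\": \"y\"}").toDS()

// The default SaveMode (ErrorIfExists) rejects an existing path,
// so create a unique temp dir and delete it, keeping only the name.
val dir = Files.createTempDirectory("json-text-sketch").toFile
dir.delete()
val path = dir.getCanonicalPath

// One JSON object per line, written as plain text (the single string column "value").
records.map(record => record.replaceAll("\n", " ")).write.text(path)

// Reading the directory back as line-delimited JSON recovers one row per record.
val roundTrip = spark.read.json(path)
roundTrip.show()
```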

@@ -828,7 +828,7 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {

val mergedJsonDF = spark.read
.option("prefersDecimal", "true")
.json(floatingValueRecords ++ bigIntegerRecords)
.json((floatingValueRecords.rdd ++ bigIntegerRecords.rdd).toDS())
Contributor

use Dataset.union?
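With `Dataset.union`, the two `Dataset[String]` inputs can be combined directly, avoiding the `.rdd ++ .rdd` round trip and the extra `toDS()`. A sketch follows; the record contents are hypothetical stand-ins for the suite's `floatingValueRecords` and `bigIntegerRecords` fixtures.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("json-union-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical fixtures: small floating-point values and an integer too large for a Long.
val floatingValueRecords = Seq("""{"a": 3.0}""", """{"b": 0.5}""").toDS()
val bigIntegerRecords = Seq("""{"a": 12345678901234567890}""").toDS()

// Dataset.union concatenates by column position and keeps duplicates (UNION ALL semantics).
val combined = floatingValueRecords.union(bigIntegerRecords)

val mergedJsonDF = spark.read
  .option("prefersDecimal", "true")
  .json(combined)
mergedJsonDF.printSchema()
```

Because `Dataset.union` does not deduplicate, it behaves like the RDD `++` it replaces.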

@@ -26,22 +26,17 @@

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.*;
Member Author

Oops, let me fix this one too.

@SparkQA

SparkQA commented Feb 26, 2017

Test build #73478 has finished for PR 17071 at commit 626ca25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

I agree we shouldn't use the deprecated method in non-test code. Even many of the test occurrences can be updated. I suppose there should still be at least one test checking that the deprecated method works?

@HyukjinKwon
Member Author

Oh, no, I think I updated all instances. Let me leave one usage in JsonSuite.

@SparkQA

SparkQA commented Feb 26, 2017

Test build #73483 has finished for PR 17071 at commit f25d1ab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 26, 2017

Test build #73484 has finished for PR 17071 at commit d746d78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

I left a single usage in JavaSaveLoadSuite. I believe this is enough to test the deprecated APIs.

@SparkQA

SparkQA commented Feb 26, 2017

Test build #73490 has finished for PR 17071 at commit 4cef1c6.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 26, 2017

Test build #73488 has finished for PR 17071 at commit 2e89259.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 26, 2017

Test build #73489 has finished for PR 17071 at commit aa82df2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Feb 27, 2017

I like it. Regarding still testing the deprecated method, though: maybe it's best to have a test that explicitly exercises only the old method? That may be clearer than picking some test from another batch and leaving it with the old behavior. It might mean adding one new small test case to the generic JSON test suite for this purpose. What do you think?
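Such a dedicated check could be as small as the sketch below: read the same records through the deprecated `json(RDD[String])` overload and through `json(Dataset[String])`, then compare schemas and rows. This is an illustrative Scala sketch using plain `assert`, not the test that was eventually added (the author proposes a Java one); the sample records are hypothetical.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("deprecated-json-rdd-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical input: the second record lacks "age", so that row gets a null.
val records = Seq("""{"name":"Yin","age":30}""", """{"name":"Michael"}""")

// Deliberately call the deprecated overload; expect a compile-time deprecation warning.
val jsonRDD: RDD[String] = spark.sparkContext.parallelize(records)
val viaRDD = spark.read.json(jsonRDD)

// Same input through the replacement API.
val viaDataset = spark.read.json(records.toDS())

// The old path should keep behaving exactly like the new one.
assert(viaRDD.schema == viaDataset.schema)
assert(viaRDD.collect().toSet == viaDataset.collect().toSet)
```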

@HyukjinKwon
Member Author

Sure, that sounds better, and I can't find a reason not to follow it. Let me add a single small Java test somewhere, because the deprecated Java API calls the deprecated Scala one.

@HyukjinKwon
Member Author

@SparkQA

SparkQA commented Feb 27, 2017

Test build #73506 has finished for PR 17071 at commit 6f35ee3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in 8a5a585 Feb 27, 2017
@HyukjinKwon HyukjinKwon deleted the SPARK-15615-followup branch January 2, 2018 03:44