Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25955][TEST] Porting JSON tests for CSV functions #22960

Closed
wants to merge 5 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Nov 6, 2018

What changes were proposed in this pull request?

In the PR, I propose to port existing JSON tests from JsonFunctionsSuite that are applicable for CSV, and put them to CsvFunctionsSuite. In particular:

  • roundtrip from_csv to to_csv, and to_csv to from_csv
  • using schema_of_csv in from_csv
  • Java API from_csv
  • using from_csv and to_csv in exprs.

@SparkQA
Copy link

SparkQA commented Nov 6, 2018

Test build #98530 has finished for PR 22960 at commit 606be67.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 6, 2018

Test build #98531 has finished for PR 22960 at commit d863402.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

assert(out.schema == expected)
}

test("Support to_csv in SQL") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk, wouldn't the tests in csv-functions.sql be enough for SQL support test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for double check that the functions are available/(and work) from expressions in Scala. Probably we can make the test smaller.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just get rid of it. I can't imagine both functions are specifically broken alone in selectExpr.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk . Sorry, but Porting seems to be not the best way to do this.
Could you refactor this by introducing new test helper functions?
I believe you can test from_json/csv/avro with the same functions.

val df = Seq("""1,"haa"""").toDS()
checkAnswer(
df.select(
from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference are the data at line 91 and from_csv and from_json.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 7, 2018

Sorry, but Porting seems to be not the best way to do this.

I saw a bunch of common code in Csv/JsonExpressionsSuite, Csv/JsonFunctionsSuite and Csv/JsonSuite. I just didn't want to overcomplicate the tests especially in the case when there are small differences. So, passing functions (with inputs and expected result) to template functions will not make them easy to read.

Could you refactor this by introducing new test helper functions?

In any case, I will try that.

@dongjoon-hyun Just in case, do you want to see the refactoring in this PR or a separate one? I ask because we can extract common code to helper function not only from the ported tests.

# Conflicts:
#	sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala
@dongjoon-hyun
Copy link
Member

Yes. It would be great if we do that in this PR.

When I did the similar thing for ORC (port tests from Parquet to ORC, port from old ORC to new ORC). I received the same comments.

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Nov 7, 2018

Ur, maybe, I wasn't clear to the point. The refactoring scope of this PR is limited to the new tests here.

test("from_csv uses DDL strings for defining a schema - java")
test("roundtrip to_csv -> from_csv")
test("roundtrip from_csv -> to_csv")
test("infers schemas of a CSV string and pass to to from_csv")
test("Support to_csv in SQL")
test("Support from_csv in SQL")

@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98542 has finished for PR 22960 at commit 1d3a31b.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 7, 2018

Test build #98566 has finished for PR 22960 at commit 24933a9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

@asfgit asfgit closed this in ee03f76 Nov 8, 2018
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

In the PR, I propose to port existing JSON tests from `JsonFunctionsSuite` that are applicable for CSV, and put them to `CsvFunctionsSuite`. In particular:
- roundtrip `from_csv` to `to_csv`, and `to_csv` to `from_csv`
- using `schema_of_csv` in `from_csv`
- Java API `from_csv`
- using `from_csv` and `to_csv` in exprs.

Closes apache#22960 from MaxGekk/csv-additional-tests.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
@MaxGekk MaxGekk deleted the csv-additional-tests branch August 17, 2019 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants