[SPARK-25955][TEST] Porting JSON tests for CSV functions #22960
Test build #98530 has finished for PR 22960 at commit
Test build #98531 has finished for PR 22960 at commit
    assert(out.schema == expected)
  }

  test("Support to_csv in SQL") {
@MaxGekk, wouldn't the tests in `csv-functions.sql` be enough for the SQL support test?
This is only to double-check that the functions are available (and work) from expressions in Scala. We can probably make the test smaller.
I think we can just get rid of it. I can't imagine both functions are specifically broken alone in `selectExpr`.
LGTM otherwise
@MaxGekk. Sorry, but porting does not seem to be the best way to do this.
Could you refactor this by introducing new test helper functions?
I believe you can test `from_json`/`csv`/`avro` with the same functions.
    val df = Seq("""1,"haa"""").toDS()
    checkAnswer(
      df.select(
        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
The only differences are the data at line 91 and the use of `from_csv` instead of `from_json`.
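To illustrate the helper-function idea raised in this thread, here is a minimal sketch, not code from the PR. The object `ParseHelperSketch` and the helper `parseOne` are hypothetical names; only `from_csv`, `from_json`, and the sample data `1,"haa"` come from the discussion. It assumes a Spark version that ships `from_csv` (3.0 or later).

```scala
import org.apache.spark.sql.{Column, Row, SparkSession}
import org.apache.spark.sql.functions.{col, from_csv, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

object ParseHelperSketch {
  // Schema shared by both formats: a INT, b STRING.
  val schema: StructType = new StructType().add("a", IntegerType).add("b", StringType)

  // Hypothetical helper: builds a one-row Dataset from `input`, applies the
  // given parser to its `value` column, and returns the parsed struct.
  def parseOne(spark: SparkSession, input: String, parse: Column => Column): Row = {
    import spark.implicits._
    Seq(input).toDS().select(parse(col("value"))).head().getStruct(0)
  }

  // The two callers differ only in the input data and the parser function.
  def parseCsv(spark: SparkSession): Row =
    parseOne(spark, """1,"haa"""", (c: Column) => from_csv(c, schema, Map.empty[String, String]))

  def parseJson(spark: SparkSession): Row =
    parseOne(spark, """{"a":1,"b":"haa"}""", (c: Column) => from_json(c, schema))
}
```

With a helper of this shape, each format-specific test supplies only its input string and its parser, and the shared assertion logic lives in one place.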
I saw a bunch of common code in
In any case, I will try that. @dongjoon-hyun Just in case, do you want to see the refactoring in this PR or in a separate one? I ask because we can extract common code into helper functions, and not only from the ported tests.
# Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala
Yes. It would be great if we do that in this PR. When I did a similar thing for ORC (
Er, maybe I wasn't clear. The refactoring scope of this PR is limited to the new tests here.
Test build #98542 has finished for PR 22960 at commit
Test build #98566 has finished for PR 22960 at commit
Merged to master.
## What changes were proposed in this pull request?

In the PR, I propose to port existing JSON tests from `JsonFunctionsSuite` that are applicable for CSV, and put them to `CsvFunctionsSuite`. In particular:
- roundtrip `from_csv` to `to_csv`, and `to_csv` to `from_csv`
- using `schema_of_csv` in `from_csv`
- Java API `from_csv`
- using `from_csv` and `to_csv` in exprs.

Closes apache#22960 from MaxGekk/csv-additional-tests.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
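The kinds of checks listed in the commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code merged in the PR: the object name `CsvRoundtripSketch`, the method names, and the sample values are all made up for the example, and it assumes Spark 3.0 or later.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, from_csv, schema_of_csv, struct, to_csv}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

object CsvRoundtripSketch {
  val schema: StructType = new StructType().add("a", IntegerType).add("b", StringType)
  private val noOptions = Map.empty[String, String]

  // Roundtrip to_csv -> from_csv: serialize a struct, then parse it back.
  def structRoundtrip(spark: SparkSession): Row = {
    import spark.implicits._
    Seq((1, "haa")).toDF("a", "b")
      .select(struct(col("a"), col("b")).as("s"))
      .select(from_csv(to_csv(col("s")), schema, noOptions))
      .head().getStruct(0)
  }

  // Roundtrip from_csv -> to_csv: parse CSV text, then serialize it again.
  def textRoundtrip(spark: SparkSession): String = {
    import spark.implicits._
    Seq("1,haa").toDS()
      .select(to_csv(from_csv(col("value"), schema, noOptions)))
      .head().getString(0)
  }

  // Using schema_of_csv in from_csv: the schema is inferred from an example
  // record; this overload takes a java.util.Map for the options.
  def withInferredSchema(spark: SparkSession): Row = {
    import spark.implicits._
    Seq("1,haa").toDS()
      .select(from_csv(col("value"), schema_of_csv("0,abc"), new java.util.HashMap[String, String]()))
      .head().getStruct(0)
  }

  // Using to_csv in an expression string via selectExpr.
  def viaSelectExpr(spark: SparkSession): String = {
    import spark.implicits._
    Seq((1, "haa")).toDF("a", "b")
      .select(struct(col("a"), col("b")).as("s"))
      .selectExpr("to_csv(s)")
      .head().getString(0)
  }
}
```

Each method returns the value a test would compare against its expected result, which keeps the sketch independent of any particular test framework.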