New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-30592][SQL] Interval support for csv and json funtions #27317
Conversation
SELECT cast(from_csv('1, 1 day', 'a INT, b string').b as interval) struct<CAST(from_csv(1, 1 day).b AS INTERVAL):interval> 1 days
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test for spark.sql.legacy.to_json.verifyMapKeyType.enabled
is false
to be sure that the config works.
Thanks, @MaxGekk on second thought, we should support these functions to deal with intervals as we only need to prohibit interval from external storage. So I have removed that conf |
val mapType1 = MapType(CalendarIntervalType, IntegerType) | ||
val schema1 = StructType(StructField("a", mapType1) :: Nil) | ||
val struct1 = Literal.create(null, schema1) | ||
test("from_json - interval support") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we follow csv test and test from/to in one test case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
@@ -222,33 +222,24 @@ class JsonFunctionsSuite extends QueryTest with SharedSparkSession { | |||
Row("""{"_1":"26/08/2015 18:00"}""") :: Nil) | |||
} | |||
|
|||
test("to_json - key types of map don't matter") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove it now. There is no unsupported types anymore. JSON supports all the types.
parseJsonToken[CalendarInterval](parser, dataType) { | ||
case VALUE_STRING => | ||
IntervalUtils.safeStringToInterval(UTF8String.fromString(parser.getText)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shift }
to the right, it belongs to parseJsonToken
not to case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
Test build #117222 has finished for PR 27317 at commit
|
Test build #117223 has finished for PR 27317 at commit
|
Test build #117230 has finished for PR 27317 at commit
|
Test build #117232 has finished for PR 27317 at commit
|
Test build #117235 has finished for PR 27317 at commit
|
thanks, merging to master! |
SELECT from_csv('1, 1 day', 'a INT, b interval'); | ||
SELECT to_csv(named_struct('a', interval 32 month, 'b', interval 70 minute)); | ||
SELECT from_json('{"a":"1 days"}', 'a interval'); | ||
SELECT to_json(map('a', interval 25 month 100 day 130 minute)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yaooqinn Could you add round-trip test cases?
- from_csv(to_csv(....))
- to_csv(from_csv(...))
- from_json(to_json(....))
- to_json(from_json(....))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I'll make a followup.
@yaooqinn |
The changes in CSV are not user-facing |
### What changes were proposed in this pull request? Add round-trip tests for CSV and JSON functions as #27317 (comment) asked. ### Why are the changes needed? improve test coverage ### Does this PR introduce any user-facing change? no ### How was this patch tested? add uts Closes #27510 from yaooqinn/SPARK-30592-F. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request? Add round-trip tests for CSV and JSON functions as #27317 (comment) asked. ### Why are the changes needed? improve test coverage ### Does this PR introduce any user-facing change? no ### How was this patch tested? add uts Closes #27510 from yaooqinn/SPARK-30592-F. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org> (cherry picked from commit 58b9ca1) Signed-off-by: HyukjinKwon <gurwls223@apache.org>
### What changes were proposed in this pull request? Add round-trip tests for CSV and JSON functions as apache#27317 (comment) asked. ### Why are the changes needed? improve test coverage ### Does this PR introduce any user-facing change? no ### How was this patch tested? add uts Closes apache#27510 from yaooqinn/SPARK-30592-F. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>
What changes were proposed in this pull request?
In this PR, I'd propose to fully support interval for the CSV and JSON functions.
On one hand, CSV and JSON records consist of string values, in the cast logic, we can cast string from/to interval now, so we can make those functions support intervals easily.
Before this change we can only use this as a workaround.
On the other hand, we ban reading or writing intervals from CSV and JSON files. To directly read and write with external json/csv storage, you still need explicit cast, e.g.
Why are the changes needed?
for interval's future-proofing purpose
Does this PR introduce any user-facing change?
yes, the
to_json
/from_json
function can deal with intervals now. e.g.for
from_json
there is no such use case because we do not supporta interval
for
to_json
, we can use interval values nowbefore
after
How was this patch tested?
add ut