[SPARK-21954][SQL] JacksonUtils should verify MapType's value type instead of key type #19167
class JacksonUtilsSuite extends SparkFunSuite {

  test("verifySchema") {
@viirya, would you mind keeping just a simple end-to-end test in this PR? This looks like quite a simple fix to me, and I don't think it needs this many tests for this issue alone. I want to backport this, so I'd like to keep only strictly related changes here.
I think we could do something like `SELECT to_json(struct(map(interval 1 second, 'a')))` or `SELECT to_json(struct(map('a', interval 1 second)))`.
OK, I am also fine with including those test coverage improvements in this PR. Up to you.
val atomicTypes = DataTypeTestUtils.atomicTypes
val atomicArrayTypes = atomicTypes.map(ArrayType(_, containsNull = false))
val atomicMapTypes = for (keyType <- atomicTypes;
  valueType <- atomicTypes) yield MapType(keyType, valueType, false)
I'd do this as below:
val atomicMapTypes = for {
  keyType <- atomicTypes
  valueType <- atomicTypes
} yield MapType(keyType, valueType, false)
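The two spellings in this thread are equivalent; the braced form only changes layout. A minimal sketch of that equivalence, using plain strings as hypothetical stand-ins for the Spark `DataType` values (the object name and the sample list are assumptions for illustration, not code from the PR):

```scala
object ForComprehensionStyles {
  // Hypothetical stand-in for DataTypeTestUtils.atomicTypes, which holds Spark DataType values.
  val atomicTypes: Seq[String] = Seq("IntegerType", "StringType", "BooleanType")

  // Parenthesized form, as in the patch: generators are separated by ';'.
  val withParens: Seq[(String, String)] =
    for (keyType <- atomicTypes; valueType <- atomicTypes) yield (keyType, valueType)

  // Braced form, as suggested in review: one generator per line, no separators needed.
  val withBraces: Seq[(String, String)] =
    for {
      keyType <- atomicTypes
      valueType <- atomicTypes
    } yield (keyType, valueType)
}
```

Both values contain every key/value pairing in the same order, so the choice is purely about readability.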
// For MapType, its keys are treated as a string basically when generating JSON, so we only
// care if the values are valid for JSON.
val alsoValidMapTypes = for (keyType <- atomicTypes ++ invalidTypes;
Would it be possible to move these supported cases up to line 45, so that we loop twice: once for the valid types and once for the invalid ones?
Test build #81552 has finished for PR 19167 at commit
9bc8b2d to a15bdb7
@HyukjinKwon Thanks for the review. I simplified the test cases. Please take a look when you are available.
a15bdb7 to 884c533
@@ -257,6 +257,18 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
      "A type of keys and values in map() must be string, but got"))
  }

  test("SPARK-21954: JacksonUtils should verify MapType's value type instead of key type") {
@viirya, how about putting this test near `to_json unsupported type` here, and maybe using the Scala function API for consistency?
Modified as suggested.
LGTM except for a minor comment.
  .select(struct(map(lit("a"), $"a._1".cast(CalendarIntervalType)).as("col1")).as("c"))
checkAnswer(
  df2.select(to_json($"c")),
  Row("""{"col1":{"interval -3 months 7 hours":"a"}}""") :: Nil)
val df2 = baseDf
.select(struct(map($"a._1".cast(CalendarIntervalType), lit("a")).as("col1")).as("c"))
...
checkAnswer(
df2.select(to_json($"c")),
Row("""{"col1":{"interval -3 months 7 hours":"a"}}""") :: Nil)
This case looks like a supported case, though. Maybe we could make a separate test for this one.
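The interval-keyed map is accepted because, as the PR description states, keys are turned into strings by calling `toString`. A minimal sketch of that behaviour without Spark, assuming a hypothetical `Interval` class whose `toString` mimics the text in the expected row above; the helper `toJsonObject` is also an assumption and does no JSON escaping:

```scala
object MapKeyRendering {
  // Hypothetical stand-in for an interval value; not Spark's CalendarInterval.
  final case class Interval(months: Int, hours: Int) {
    override def toString: String = s"interval $months months $hours hours"
  }

  // Renders key/value pairs as a JSON object. Each key is emitted via toString,
  // so any key type works. Values are plain strings here and are not escaped.
  def toJsonObject[K](entries: Seq[(K, String)]): String =
    entries
      .map { case (k, v) => "\"" + k.toString + "\":\"" + v + "\"" }
      .mkString("{", ",", "}")
}
```

For example, `MapKeyRendering.toJsonObject(Seq(MapKeyRendering.Interval(-3, 7) -> "a"))` produces an object with the key `interval -3 months 7 hours`, which matches the shape of the inner object in the expected answer.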
OK. Moved it into a separate test.
Test build #81573 has finished for PR 19167 at commit
Test build #81575 has finished for PR 19167 at commit
Test build #81582 has finished for PR 19167 at commit
Test build #81578 has finished for PR 19167 at commit
retest this please
Test build #81583 has finished for PR 19167 at commit
…stead of key type

## What changes were proposed in this pull request?

`JacksonUtils.verifySchema` verifies if a data type can be converted to JSON. For `MapType`, it now verifies the key type. However, in `JacksonGenerator`, when converting a map to JSON, we only care about its values and create a writer for the values. The keys in a map are treated as strings by calling `toString` on the keys.

Thus, we should change `JacksonUtils.verifySchema` to verify the value type of `MapType`.

## How was this patch tested?

Added tests.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19167 from viirya/test-jacksonutils.

(cherry picked from commit 6b45d7e)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
Merged to master and branch-2.2.
Thanks @HyukjinKwon
## What changes were proposed in this pull request?

`JacksonUtils.verifySchema` verifies if a data type can be converted to JSON. For `MapType`, it currently verifies the key type. However, in `JacksonGenerator`, when converting a map to JSON, we only care about its values and create a writer for the values. The keys in a map are treated as strings by calling `toString` on the keys.

Thus, we should change `JacksonUtils.verifySchema` to verify the value type of `MapType`.

## How was this patch tested?

Added tests.
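The change described above can be illustrated with a small Spark-free model. This is only a sketch under stated assumptions: the type names (`StringT`, `IntT`, `IntervalT`, `ArrayT`, `MapT`, `StructT`) and the Boolean-returning `isJsonWritable` are hypothetical stand-ins rather than Spark classes or the real `JacksonUtils.verifySchema`, and treating the interval type as the one unsupported value type is an assumption made for the example.

```scala
object VerifySchemaSketch {
  // Hypothetical miniature of a data type hierarchy; these are not Spark classes.
  sealed trait DataType
  case object StringT extends DataType
  case object IntT extends DataType
  case object IntervalT extends DataType // assumed here to be unsupported as a JSON value
  final case class ArrayT(element: DataType) extends DataType
  final case class MapT(key: DataType, value: DataType) extends DataType
  final case class StructT(fields: Seq[(String, DataType)]) extends DataType

  // Returns true when a value of the given type can be written as JSON in this model.
  def isJsonWritable(dt: DataType): Boolean = dt match {
    case StringT | IntT  => true
    case IntervalT       => false
    case ArrayT(element) => isJsonWritable(element)
    // Keys are emitted via toString, so only the value type is checked.
    case MapT(_, value)  => isJsonWritable(value)
    case StructT(fields) => fields.forall { case (_, t) => isJsonWritable(t) }
  }

  val supported: Seq[DataType] = Seq(StringT, IntT)
  val unsupported: Seq[DataType] = Seq(IntervalT)

  // Any key type combined with a supported value type should pass.
  val validMaps: Seq[DataType] =
    for { k <- supported ++ unsupported; v <- supported } yield MapT(k, v)

  // Any key type combined with an unsupported value type should fail.
  val invalidMaps: Seq[DataType] =
    for { k <- supported ++ unsupported; v <- unsupported } yield MapT(k, v)
}
```

A test over this model can then loop twice, once asserting that every entry in `validMaps` passes and once asserting that every entry in `invalidMaps` is rejected, which mirrors the structure requested in the review.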