Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-21954][SQL] JacksonUtils should verify MapType's value type instead of key type #19167

Closed
wants to merge 4 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Sep 8, 2017

What changes were proposed in this pull request?

JacksonUtils.verifySchema verifies if a data type can be converted to JSON. For MapType, it now verifies the key type. However, in JacksonGenerator, when converting a map to JSON, we only care about its values and create a writer for the values. The keys in a map are treated as strings by calling toString on the keys.

Thus, we should change JacksonUtils.verifySchema to verify the value type of MapType.

How was this patch tested?

Added tests.


class JacksonUtilsSuite extends SparkFunSuite {

test("verifySchema") {
Copy link
Member

@HyukjinKwon HyukjinKwon Sep 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya, would you mind if I ask to leave a simple e2e test alone for this PR? This one looks quite a simple fix to me and I think this won't require such many tests alone for this issue. I want to backport this but only leave strictly related changes here.

I think we could do something like SELECT to_json(struct(map(interval 1 second, 'a'))) or SELECT to_json(struct(map('a', interval 1 second))).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I am fine with those test coverage improvement too together. Up to you.

val atomicTypes = DataTypeTestUtils.atomicTypes
val atomicArrayTypes = atomicTypes.map(ArrayType(_, containsNull = false))
val atomicMapTypes = for (keyType <- atomicTypes;
valueType <- atomicTypes) yield MapType(keyType, valueType, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do this as below:

val atomicMapTypes = for {
  keyType <- atomicTypes
  valueType <- atomicTypes
} yield MapType(keyType, valueType, false)


class JacksonUtilsSuite extends SparkFunSuite {

test("verifySchema") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I am fine with those test coverage improvement too together. Up to you.


// For MapType, its keys are treated as a string basically when generating JSON, so we only
// care if the values are valid for JSON.
val alsoValidMapTypes = for (keyType <- atomicTypes ++ invalidTypes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to move these supported cases into 45L so we loop twice, one for the vaild and one for invalid one?

@SparkQA
Copy link

SparkQA commented Sep 8, 2017

Test build #81552 has finished for PR 19167 at commit 760edad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Sep 9, 2017

@HyukjinKwon Thanks for review. I simplified the test cases. Please take a look when you are available.

@@ -257,6 +257,18 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
"A type of keys and values in map() must be string, but got"))
}

test("SPARK-21954: JacksonUtils should verify MapType's value type instead of key type") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viirya, how about puting this test around to_json unsupported type here and maybe use Scala function API for consistency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified as suggestion.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for a minor comment.

.select(struct(map(lit("a"), $"a._1".cast(CalendarIntervalType)).as("col1")).as("c"))
checkAnswer(
df2.select(to_json($"c")),
Row("""{"col1":{"interval -3 months 7 hours":"a"}}""") :: Nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    val df2 = baseDf
      .select(struct(map($"a._1".cast(CalendarIntervalType), lit("a")).as("col1")).as("c"))
    ...
    checkAnswer(
      df2.select(to_json($"c")),
      Row("""{"col1":{"interval -3 months 7 hours":"a"}}""") :: Nil)

This case looks a supported case though. We could maybe make a separate test for this one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Moved into a test.

@SparkQA
Copy link

SparkQA commented Sep 9, 2017

Test build #81573 has finished for PR 19167 at commit 9bc8b2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 9, 2017

Test build #81575 has finished for PR 19167 at commit 884c533.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 9, 2017

Test build #81582 has finished for PR 19167 at commit fbf8a88.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 9, 2017

Test build #81578 has finished for PR 19167 at commit 2cf6bbb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Sep 9, 2017

Test build #81583 has finished for PR 19167 at commit fbf8a88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Sep 9, 2017
…stead of key type

## What changes were proposed in this pull request?

`JacksonUtils.verifySchema` verifies if a data type can be converted to JSON. For `MapType`, it now verifies the key type. However, in `JacksonGenerator`, when converting a map to JSON, we only care about its values and create a writer for the values. The keys in a map are treated as strings by calling `toString` on the keys.

Thus, we should change `JacksonUtils.verifySchema` to verify the value type of `MapType`.

## How was this patch tested?

Added tests.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19167 from viirya/test-jacksonutils.

(cherry picked from commit 6b45d7e)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
@asfgit asfgit closed this in 6b45d7e Sep 9, 2017
@HyukjinKwon
Copy link
Member

Merged to master and branch-2.2.

@viirya
Copy link
Member Author

viirya commented Sep 9, 2017

Thanks @HyukjinKwon

MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…stead of key type

## What changes were proposed in this pull request?

`JacksonUtils.verifySchema` verifies if a data type can be converted to JSON. For `MapType`, it now verifies the key type. However, in `JacksonGenerator`, when converting a map to JSON, we only care about its values and create a writer for the values. The keys in a map are treated as strings by calling `toString` on the keys.

Thus, we should change `JacksonUtils.verifySchema` to verify the value type of `MapType`.

## How was this patch tested?

Added tests.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes apache#19167 from viirya/test-jacksonutils.

(cherry picked from commit 6b45d7e)
Signed-off-by: hyukjinkwon <gurwls223@gmail.com>
@viirya viirya deleted the test-jacksonutils branch December 27, 2023 18:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants