
[SPARK-24543][SQL] Support any type as DDL string for from_json's schema #21550

Closed
wants to merge 6 commits

Conversation

MaxGekk
Member

@MaxGekk MaxGekk commented Jun 13, 2018

What changes were proposed in this pull request?

In this PR, I propose to support any DataType represented as a DDL string for the from_json function. After the changes, it is possible to specify MapType in SQL like:

select from_json('{"a":1, "b":2}', 'map<string, int>')

and in Scala (similarly in other languages):

val in = Seq("""{"a": {"b": 1}}""").toDS()
val schema = "map<string, map<string, int>>"
val out = in.select(from_json($"value", schema, Map.empty[String, String]))

How was this patch tested?

Added a couple of SQL tests and modified the existing tests for Python and Scala. The latter tests were modified because it is not important for them in which format the schema for from_json is provided.

@HyukjinKwon
Member

Yea, I like this way. Did it solve your case too?

@MaxGekk
Member Author

MaxGekk commented Jun 13, 2018

Did it solve your case too?

It doesn't solve our case fully but at least it unblocks us.

@SparkQA

SparkQA commented Jun 13, 2018

Test build #91759 has finished for PR 21550 at commit 5d53ec7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jun 13, 2018

Test build #91771 has finished for PR 21550 at commit 5d53ec7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -110,6 +111,8 @@ abstract class DataType extends AbstractDataType {
 @InterfaceStability.Stable
 object DataType {

+  def fromDDL(ddl: String): DataType = CatalystSqlParser.parseDataType(ddl)
Contributor


I think it's reasonable for DataType.fromDDL to also support table-style schemas like `a int, b long`. How about we put the try/catch here, so that other places just need to call DataType.fromDDL?
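The fallback the comment above suggests could look roughly like this. This is a hedged, self-contained sketch: `parseDataType` and `parseTableSchema` below are simplified stand-ins for Catalyst's real parsers (which handle the full SQL type grammar), and `FromDDLSketch` and its toy type hierarchy are names invented here for illustration.

```scala
// Sketch of the suggested try/catch fallback in DataType.fromDDL: try the
// plain data-type grammar first, then fall back to table-schema syntax
// ("a int, b long"). The parsers here are stubs, NOT Catalyst's.
object FromDDLSketch {
  sealed trait DataType
  final case class SimpleType(ddl: String) extends DataType
  final case class StructType(fields: Seq[(String, String)]) extends DataType

  // Stand-in for CatalystSqlParser.parseDataType: rejects a top-level
  // comma-separated field list, accepts everything else verbatim.
  def parseDataType(ddl: String): DataType =
    if (ddl.contains(",") && !ddl.contains("<"))
      throw new IllegalArgumentException(s"cannot parse '$ddl' as a single data type")
    else SimpleType(ddl.trim)

  // Stand-in for CatalystSqlParser.parseTableSchema: "name type, name type, ...".
  def parseTableSchema(ddl: String): DataType =
    StructType(ddl.split(",").toSeq.map { field =>
      val Array(name, tpe) = field.trim.split("\\s+", 2)
      (name, tpe)
    })

  // The proposed shape: one public entry point, the fallback lives inside it,
  // so callers never need their own try/catch.
  def fromDDL(ddl: String): DataType =
    try parseDataType(ddl)
    catch { case _: Exception => parseTableSchema(ddl) }
}
```

With this shape, `fromDDL("map<string, int>")` parses as a single data type, while `fromDDL("a int, b long")` falls back to the table-schema form.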

@SparkQA

SparkQA commented Jun 14, 2018

Test build #91832 has finished for PR 21550 at commit af946b8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jun 14, 2018

Test build #91844 has finished for PR 21550 at commit af946b8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit asfgit closed this in b8f27ae Jun 14, 2018
@@ -354,8 +354,8 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {

 test("SPARK-24027: from_json - map<string, map<string, int>>") {
   val in = Seq("""{"a": {"b": 1}}""").toDS()
-  val schema = MapType(StringType, MapType(StringType, IntegerType))
+  val schema = "map<string, map<string, int>>"
   val out = in.select(from_json($"value", schema))
Member


A general suggestion. Create a new test case for these changes, instead of modifying the existing ones.

jzhuge pushed a commit to jzhuge/spark that referenced this pull request Mar 7, 2019
Author: Maxim Gekk <maxim.gekk@databricks.com>

Closes apache#21550 from MaxGekk/from_json-ddl-schema.
@MaxGekk MaxGekk deleted the from_json-ddl-schema branch August 17, 2019 13:33
jzhuge pushed a commit to jzhuge/spark that referenced this pull request Oct 15, 2019