[SPARK-22435][SQL] Support processing array and map type using script by jinxing64 · Pull Request #19652 · apache/spark

jinxing64 · 2017-11-03T10:27:56Z

What changes were proposed in this pull request?

Currently, using a script (e.g. Python) to process array or map types is not supported; it fails with the messages below:
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData cannot be cast to [Ljava.lang.Object
org.apache.spark.sql.catalyst.expressions.UnsafeMapData cannot be cast to java.util.Map

This PR proposes to support it by using DelimitedJSONSerDe.
This PR also fixes a bug: when an input row format is used with a script, ScriptTransformationExec produces no data.
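To illustrate what the script side could look like once array and map columns arrive as JSON text, here is a minimal sketch. It assumes that each input row is one line, that columns are separated by a tab, and that the row has exactly two columns (an array, then a map); the function names and the column layout are hypothetical and not taken from this patch.

```python
import json
import sys


def transform_line(line, sep="\t"):
    """Parse one input row and return one output row.

    Assumption: the row has two columns, an array and a map, each
    serialized as JSON text and separated by `sep`.
    """
    ids_json, attrs_json = line.rstrip("\n").split(sep)
    ids = json.loads(ids_json)       # array column -> Python list
    attrs = json.loads(attrs_json)   # map column -> Python dict
    # Emit two plain string columns: the array length and the sorted map keys.
    return sep.join([str(len(ids)), ",".join(sorted(attrs))])


def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Read rows from stream_in and write transformed rows to stream_out."""
    for line in stream_in:
        if line.strip():
            stream_out.write(transform_line(line) + "\n")
```

A real transform script would call `main()` at the bottom of the file. The output side is kept as simple delimited text, so it does not depend on any JSON deserialization support.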

How was this patch tested?

Tests added.

jinxing64 · 2017-11-03T10:29:45Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

t.getText doesn't work; we need to process the token, e.g. remove the quotes.

jinxing64 · 2017-11-03T10:35:24Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

The writer writes records into the script's input stream. Shouldn't it be initialized with the input format?

jinxing64 · 2017-11-03T10:38:31Z

sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala

DelimitedJSONSerDe doesn't support deserialization.
So use DelimitedJSONSerDe as the input SerDe and LazySimpleSerDe as the output SerDe.

jinxing64 · 2017-11-03T13:01:24Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala

TODO: build JSON strings for more types.

SparkQA · 2017-11-03T13:22:53Z

Test build #83399 has finished for PR 19652 at commit 4a99426.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-11-04T17:18:06Z

Test build #83446 has finished for PR 19652 at commit 0d706ff.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-11-04T18:00:46Z

Thanks for working on it. Will review it next week.

jinxing64 · 2017-11-13T08:36:59Z

@gatorsmile
(Very gentle ping)
Could you please give some comments when you have time :)
Thank you so much :)

SparkQA · 2018-12-13T16:10:24Z

Test build #100089 has finished for PR 19652 at commit 0d706ff.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

Support processing array and map type using script

0d706ff

jinxing64 force-pushed the SPARK-22435 branch from 4a99426 to 0d706ff on November 4, 2017 14:13

dongjoon-hyun added the SQL label Jun 14, 2019

jinxing64 closed this Jul 17, 2019

jinxing64 wants to merge 1 commit into apache:master from jinxing64:SPARK-22435

5 participants
