[SPARK-22435][SQL] Support processing array and map type using script #19652

Closed
jinxing64 wants to merge 1 commit into apache:master from jinxing64:SPARK-22435

@jinxing64

What changes were proposed in this pull request?

Currently, using a script (e.g. Python) to process array or map columns is not supported; it fails with one of the following errors:
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData cannot be cast to [Ljava.lang.Object
org.apache.spark.sql.catalyst.expressions.UnsafeMapData cannot be cast to java.util.Map

This PR proposes to support it by using DelimitedJSONSerDe.
This PR also fixes a bug: when an input row format is specified with a script, ScriptTransformationExec produces no data.
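
To illustrate what the script side could look like once array and map columns reach it as JSON, here is a minimal Python sketch. It assumes rows arrive on stdin as tab-delimited text with complex columns encoded as JSON; the three-column layout (`id`, an array, a map) and the function names are hypothetical and not taken from this patch.

```python
import json
import sys


def parse_row(line):
    """Split one tab-delimited row and decode the JSON-encoded columns.

    Assumed (hypothetical) layout: id <TAB> array-as-JSON <TAB> map-as-JSON.
    """
    row_id, tags_json, attrs_json = line.rstrip("\n").split("\t")
    return row_id, json.loads(tags_json), json.loads(attrs_json)


def transform(lines):
    """Yield one tab-delimited output row per non-blank input row."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row_id, tags, attrs = parse_row(line)
        # Emit only primitive columns: id, array length, sorted map keys.
        yield "\t".join([row_id, str(len(tags)), ",".join(sorted(attrs))])


def main():
    """Entry point when used as a transform script: stdin in, stdout out."""
    for out in transform(sys.stdin):
        print(out)
```

When used as an actual transform script, `main()` would be called under the usual `if __name__ == "__main__":` guard.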

How was this patch tested?

Tests added.

Author

t.getText doesn't work here; we need to process the token, e.g. remove the quotes.
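
As a rough illustration of the point (the real change lives in Spark's Scala parser, not in Python), the raw token text of a string literal still carries its quotes, so something like the hypothetical helper below is needed before the value can be used. This sketch only strips one pair of matching quotes and does not handle escape sequences.

```python
def unquote(token_text):
    """Strip one pair of matching surrounding quotes, if present.

    Hypothetical helper for illustration; escape sequences are left untouched.
    """
    if (
        len(token_text) >= 2
        and token_text[0] == token_text[-1]
        and token_text[0] in ("'", '"')
    ):
        return token_text[1:-1]
    return token_text
```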

Author

The writer writes records into the script's input stream. Shouldn't it be initialized with the input format?

Author

DelimitedJSONSerDe doesn't support deserialization.
Use DelimitedJSONSerDe as the input SerDe and LazySimpleSerDe as the output SerDe.
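
Since the output side is read back with a different SerDe than the input side, a script that returns complex values would presumably have to write them in a delimited form rather than echo JSON. A minimal Python sketch of that idea follows. The function names are hypothetical, and the default delimiters (`\x02` between collection items, `\x03` between a map key and its value, tab between fields) are assumptions about the output row format, so they are kept as parameters.

```python
def encode_array(items, collection_delim="\x02"):
    """Join array elements with the (assumed) collection delimiter."""
    return collection_delim.join(str(item) for item in items)


def encode_map(mapping, collection_delim="\x02", map_key_delim="\x03"):
    """Join map entries as key<map_key_delim>value, separated by collection_delim."""
    return collection_delim.join(
        f"{key}{map_key_delim}{value}" for key, value in mapping.items()
    )


def encode_row(columns, field_delim="\t"):
    """Join already-encoded column strings into one output line."""
    return field_delim.join(columns)
```

For example, `encode_row(["1", encode_array([1, 2], ","), encode_map({"a": 1}, ",", ":")])` builds a row using `,` and `:` as readable stand-ins for the control-character delimiters.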

Author

TODO: build JSON strings for more types.

@SparkQA

SparkQA commented Nov 3, 2017

Test build #83399 has finished for PR 19652 at commit 4a99426.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 4, 2017

Test build #83446 has finished for PR 19652 at commit 0d706ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Thanks for working on it. Will review it next week.

@jinxing64
Author

@gatorsmile
(Very gentle ping)
Could you please give some comments when you have time? :)
Thank you so much :)

@SparkQA

SparkQA commented Dec 13, 2018

Test build #100089 has finished for PR 19652 at commit 0d706ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
