[SPARK-32667][SQL] Script transform 'default-serde' mode should pad null value to filling column #29500

AngersZhuuuu · 2020-08-21T01:53:32Z

What changes were proposed in this pull request?

In Hive's no-serde mode, when the script output has fewer columns than the specified output schema, Hive pads the missing columns with NULL. Spark should do the same.

hive> SELECT TRANSFORM(a, b)
    >   ROW FORMAT DELIMITED
    >   FIELDS TERMINATED BY '|'
    >   LINES TERMINATED BY '\n'
    >   NULL DEFINED AS 'NULL'
    > USING 'cat' as (a string, b string, c string, d string)
    >   ROW FORMAT DELIMITED
    >   FIELDS TERMINATED BY '|'
    >   LINES TERMINATED BY '\n'
    >   NULL DEFINED AS 'NULL'
    > FROM (
    > select 1 as a, 2 as b
    > ) tmp ;
OK
1	2	NULL	NULL
Time taken: 24.626 seconds, Fetched: 1 row(s)
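The padding shown in the Hive session above can be sketched in Scala roughly as follows. This is a minimal illustration, not the code from this patch: the object `PadNullSketch`, the helper `padToSchema`, and its parameters are hypothetical names. Dropping fields beyond the declared column count is an assumption of the sketch, not something the description above states.

```scala
import java.util.regex.Pattern

// Hypothetical sketch, not the implementation in this PR.
object PadNullSketch {
  // Splits one line of script output on a literal delimiter and pads the
  // result with nulls until it has `numColumns` entries.
  def padToSchema(line: String, delimiter: String, numColumns: Int): Seq[String] = {
    // Pattern.quote treats delimiters such as "|" literally; the -1 limit
    // keeps trailing empty fields instead of discarding them.
    val fields = line.split(Pattern.quote(delimiter), -1).toSeq
    // Assumption of this sketch: fields beyond numColumns are dropped.
    fields.take(numColumns).padTo(numColumns, null: String)
  }

  def main(args: Array[String]): Unit = {
    // Two fields arrive, four columns are declared, as in the Hive example.
    println(padToSchema("1|2", "|", 4).mkString("\t"))
  }
}
```

With the input from the example (`1|2` against four declared columns), the last two entries are null, which matches the `NULL	NULL` padding in the Hive output.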

Why are the changes needed?

Keep the same behavior as Hive.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

HyukjinKwon · 2020-08-21T02:41:02Z

cc @cloud-fan

SparkQA · 2020-08-21T06:32:14Z

Test build #127712 has finished for PR 29500 at commit 4173cfb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-08-21T07:27:27Z

The patch LGTM, but I have a question about the terminology here. AFAIK no-serde in Hive means a default serde is picked. Shall we use "default-serde mode" instead of "no-serde mode"?

AngersZhuuuu · 2020-08-21T07:32:28Z

> The patch LGTM, but I have a question about the terminology here. AFAIK no-serde in Hive means a default serde is picked. Shall we use "default-serde mode" instead of "no-serde mode"?

Yea

cloud-fan · 2020-08-21T07:37:09Z

thanks, merging to master!

cloud-fan · 2020-08-21T07:38:10Z

sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala

@@ -372,6 +372,35 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestU
            'e.cast("string"))).collect())
    }
  }
+
+  test("SPARK-32667: SCRIPT TRANSFORM pad null value to fill column" +
+    " when without schema less (no-serde)") {


Please update "no-serde" in the codebase in your other TRANSFORM PRs.

> please update "no-serde" in the codebase in your other TRANSFORM PRs.

Will raise a PR to handle this together.

AngersZhuuuu added 2 commits August 21, 2020 09:45

[SPARK-32667][SQL] Scrip transform no-serde mode should pad null valu…

9340034

…e to filling column

Update BaseScriptTransformationExec.scala

4173cfb

probot-autolabeler bot added the SQL label Aug 21, 2020

AngersZhuuuu changed the title ~~Spark 32667~~ [SPARK-32667][SQL] Scrip transform no-serde mode should pad null value to filling column Aug 21, 2020

HyukjinKwon changed the title ~~[SPARK-32667][SQL] Scrip transform no-serde mode should pad null value to filling column~~ [SPARK-32667][SQL] Script transform no-serde mode should pad null value to filling column Aug 21, 2020

HyukjinKwon approved these changes Aug 21, 2020

AngersZhuuuu changed the title ~~[SPARK-32667][SQL] Script transform no-serde mode should pad null value to filling column~~ [SPARK-32667][SQL] Script transform 'default-serde' mode should pad null value to filling column Aug 21, 2020

cloud-fan closed this in c75a827 Aug 21, 2020

cloud-fan reviewed Aug 21, 2020
