
[SPARK-32667][SQL] Script transform 'default-serde' mode should pad null value to filling column #29500

Closed
wants to merge 2 commits into from


AngersZhuuuu
Contributor

What changes were proposed in this pull request?

In Hive's no-serde mode, when the script outputs fewer columns than the output schema specifies, Hive pads the missing columns with null values. Spark should do the same.

hive> SELECT TRANSFORM(a, b)
    >   ROW FORMAT DELIMITED
    >   FIELDS TERMINATED BY '|'
    >   LINES TERMINATED BY '\n'
    >   NULL DEFINED AS 'NULL'
    > USING 'cat' as (a string, b string, c string, d string)
    >   ROW FORMAT DELIMITED
    >   FIELDS TERMINATED BY '|'
    >   LINES TERMINATED BY '\n'
    >   NULL DEFINED AS 'NULL'
    > FROM (
    > select 1 as a, 2 as b
    > ) tmp ;
OK
1	2	NULL	NULL
Time taken: 24.626 seconds, Fetched: 1 row(s)
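The padding rule shown above can be sketched in plain Scala. `PadMissingColumns` and `parseRow` are hypothetical names used only for illustration; they are not Spark's actual script-transformation code. The sketch assumes one line of script output, a literal field delimiter, and a `NULL DEFINED AS` token. It also assumes that surplus fields are simply dropped, which may not match how Spark or Hive handle extra columns.

```scala
import java.util.regex.Pattern

object PadMissingColumns {
  // Hypothetical helper: split one line of script output on the field
  // delimiter, map the NULL token to a real null, then pad (or truncate)
  // the result to exactly `numOutputCols` values.
  def parseRow(line: String, fieldDelim: String, nullToken: String, numOutputCols: Int): Seq[String] = {
    // Quote the delimiter so characters like '|' are not treated as regex syntax;
    // limit -1 keeps trailing empty fields.
    val fields = line.split(Pattern.quote(fieldDelim), -1).toSeq
    val values = fields.map(f => if (f == nullToken) null else f)
    // Assumption: surplus fields are dropped; missing fields become null.
    values.take(numOutputCols).padTo(numOutputCols, null: String)
  }

  def main(args: Array[String]): Unit = {
    // The script `cat` echoes "1|2", but the output schema declares 4 columns.
    val row = parseRow("1|2", "|", "NULL", 4)
    // Render nulls the way the Hive CLI does.
    println(row.map(v => if (v == null) "NULL" else v).mkString("\t")) // prints "1	2	NULL	NULL"
  }
}
```

Padding with `padTo` after `take` keeps the row width equal to the declared schema in both directions, so downstream code can index columns without bounds checks.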

Why are the changes needed?

To keep Spark's behavior consistent with Hive's.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test.

@AngersZhuuuu AngersZhuuuu changed the title Spark 32667 [SPARK-32667][SQL] Scrip transform no-serde mode should pad null value to filling column Aug 21, 2020
@HyukjinKwon HyukjinKwon changed the title [SPARK-32667][SQL] Scrip transform no-serde mode should pad null value to filling column [SPARK-32667][SQL] Script transform no-serde mode should pad null value to filling column Aug 21, 2020
@HyukjinKwon
Member

cc @cloud-fan

@SparkQA

SparkQA commented Aug 21, 2020

Test build #127712 has finished for PR 29500 at commit 4173cfb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

The patch LGTM, but I have a question about the terminology here. AFAIK no-serde in Hive means a default serde is picked. Shall we use "default-serde mode" instead of "no-serde mode"?

@AngersZhuuuu
Contributor Author

> The patch LGTM, but I have a question about the terminology here. AFAIK no-serde in Hive means a default serde is picked. Shall we use "default-serde mode" instead of "no-serde mode"?

Yea

@AngersZhuuuu AngersZhuuuu changed the title [SPARK-32667][SQL] Script transform no-serde mode should pad null value to filling column [SPARK-32667][SQL] Script transform 'default-serde' mode should pad null value to filling column Aug 21, 2020
@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in c75a827 Aug 21, 2020
@@ -372,6 +372,35 @@ abstract class BaseScriptTransformationSuite extends SparkPlanTest with SQLTestU
'e.cast("string"))).collect())
}
}

test("SPARK-32667: SCRIPT TRANSFORM pad null value to fill column" +
" when without schema less (no-serde)") {
Contributor


please update "no-serde" in the codebase in your other TRANSFORM PRs.

Contributor Author


> please update "no-serde" in the codebase in your other TRANSFORM PRs.

Will raise a PR to handle this together.

4 participants