Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32685][SQL] When specify serde, default filed.delim is '\t' #30942

Closed
wants to merge 4 commits into from

Conversation

AngersZhuuuu
Copy link
Contributor

What changes were proposed in this pull request?

In hive script transform, when we use specified serde, the filed.delim is '\t'
image
And change to other serde and explain query plan, filed.delim is same.

In spark current code, the result is as below:
image

We should keep same as hive.

Notic:
the result's NULL value is different is another issue https://issues.apache.org/jira/browse/SPARK-32684

Why are the changes needed?

Keep same with hive serde

Does this PR introduce any user-facing change?

In script transform, is not specified, field.delim keep same with hive as \t

How was this patch tested?

UT added

@github-actions github-actions bot added the SQL label Dec 28, 2020
@AngersZhuuuu
Copy link
Contributor Author

Add UT

val query4 = sql(
      """
        |SELECT split(value, "&") FROM (
        |SELECT TRANSFORM(a, b, c)
        |  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
        |USING 'cat'
        |  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
        |  WITH SERDEPROPERTIES (
        |   'serialization.last.column.takes.rest' = 'true',
        |   'field.delim' = '&'
        |  )
        |FROM (SELECT 1 AS a, 2 AS b, 3 AS c) t
        |) temp;
      """.stripMargin)
    checkAnswer(query4, identity, Row(null) :: Nil)

Since in Hive
image

@AngersZhuuuu
Copy link
Contributor Author

FYI @maropu @cloud-fan

@@ -505,7 +505,12 @@ class SparkSqlAstBuilder extends AstBuilder {
} else {
None
}
(Seq.empty, Option(name), props.toSeq, recordHandler)
val finalProps = if (!props.contains("field.delim")) {
props.toSeq ++ Seq("field.delim" -> "\t")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behavior now? which delimiter do we use by default before this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the behavior now? which delimiter do we use by default before this PR?

No default we use '\u0001'

@@ -505,7 +505,12 @@ class SparkSqlAstBuilder extends AstBuilder {
} else {
None
}
(Seq.empty, Option(name), props.toSeq, recordHandler)
val finalProps = if (!props.contains("field.delim")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this case sensitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this case sensitive?

Yes, case sensitive
image

@SparkQA
Copy link

SparkQA commented Dec 28, 2020

Test build #133421 has finished for PR 30942 at commit 79e78f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

The code has been modified to make it easier for it to add other possible default parameters in specified Hive serde mode.
FYI @cloud-fan

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in fc508d1 Dec 28, 2020
@gatorsmile
Copy link
Member

gatorsmile commented Jan 6, 2021

@AngersZhuuuu Please update the migration guide

@AngersZhuuuu
Copy link
Contributor Author

@AngersZhuuuu Please update the migration guide

Yea

cloud-fan pushed a commit that referenced this pull request Jan 6, 2021
…ault filed.delim to '\t' when user specifies serde

### What changes were proposed in this pull request?
Update migration guide according to #30942 (comment)

### Why are the changes needed?
update migration guide.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Not need

Closes #31051 from AngersZhuuuu/SPARK-32685-FOLLOW-UP.

Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants