
SPARK-33755: Allow creating orc table when row format separator is defined #30785

Closed

StefanXiepj wants to merge 2 commits into apache:branch-2.4 from StefanXiepj:SPARK-33755


Conversation


@StefanXiepj StefanXiepj commented Dec 15, 2020

What changes were proposed in this pull request?

When creating a table like this:

```sql
create table test_orc(c1 string) row format delimited fields terminated by '002' stored as orcfile;
```

Spark throws an exception like:

```
Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'orcfile'(line 2, pos 0)
```
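For comparison, a minimal sketch of the Hive-compatible workaround: since ORC does not use the row-format delimiter (as discussed below, the clause is a no-op), dropping the clause gives an equivalent table that Spark already accepts. The table name here just mirrors the example above.

```sql
-- The delimiter clause is a no-op for ORC, so dropping it
-- produces an equivalent table that parses in Spark.
create table test_orc(c1 string) stored as orc;
```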

In this PR, we relax the rule so that creating an ORC table with ROW FORMAT DELIMITED is allowed.

Why are the changes needed?

I found this problem when migrating tasks from Hive to Spark. Hive accepts this syntax (which is not ideal, but harmless enough to ignore), so I fixed it against Spark 2.4. Although ORC doesn't need the delimiter, I don't think the syntax needs to be this strict; being lenient makes it more convenient to migrate tasks from Hive to Spark.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UTs to be added

@AmplabJenkins

Can one of the admins verify this patch?

```scala
| intField INT,
| stringField STRING
|)
|ROW FORMAT DELIMITED FIELDS TERMINATED BY '002'
```
Member

How does the ORC table work with the delimiter?

Author

@StefanXiepj StefanXiepj Dec 16, 2020

Although this is unnecessary for an ORC table, we could support an option that lets the user choose whether to ignore it. Maybe it is better to emit a warning than an error.

Member

It's better to be explicit for what Spark supports. It's a bit odd that we add no-op syntax.

Author

We have many tasks that run on Hive, and run slowly, so we want to migrate them from Hive to Spark. I think other companies want this too. If we supported ignoring this exception by setting spark.sql.orc.skipRowFormatDelimitedError=true, we could migrate Hive tasks without users having to modify their SQL scripts.

Member

Does Hive work with this delimiter specified with ORC?

Author

Hive doesn't use the delimiter either, but Hive doesn't throw an exception.

Member

Let's not fix it, then. It's odd for Spark to accept syntax that is a no-op, and that does not also work in Hive.

Author

Thanks very much, I'll close it.
