SPARK-33755: Allow creating orc table when row format separator is defined #30785
StefanXiepj wants to merge 2 commits into apache:branch-2.4 from
Conversation
Can one of the admins verify this patch?
    intField INT,
    stringField STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '002'
How does the ORC table work with the delimiter?
Although this is unnecessary for an ORC table, we can support an option that lets the user choose whether to ignore it. It may be better to throw a warning than an error.
It's better to be explicit about what Spark supports. It's a bit odd to add syntax that is a no-op.
We have many tasks that run slowly on Hive, so we want to migrate them from Hive to Spark; I think other companies want to do this too. If we support ignoring this exception by setting spark.sql.orc.skipRowFormatDelimitedError=true, we can migrate Hive tasks and users do not need to modify their SQL scripts.
Does Hive work with this delimiter specified with ORC?
Hive doesn't work with it, but Hive doesn't throw an exception either.
Let's not fix it then. It's odd for Spark to accept a syntax that is a no-op, and that also does not work in Hive.
Thanks very much, I'll close it.
What changes were proposed in this pull request?
When creating a table like this:

  create table test_orc(c1 string)
  row format delimited fields terminated by '002'
  stored as orcfile;

Spark throws an exception like:

  Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'orcfile'(line 2, pos 0)

In this PR, we support non-strict rules when creating an ORC table with ROW FORMAT DELIMITED.
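For comparison, the same table can already be created on unmodified Spark by dropping the ROW FORMAT clause, which is a no-op for ORC in any case (a sketch; the table and column names are illustrative):

```sql
-- Works on current Spark without this patch: ORC stores its own
-- schema and encoding, so the field delimiter is simply omitted.
CREATE TABLE test_orc (c1 STRING)
STORED AS ORC;
```

This is the manual workaround the migration script change would avoid: each affected Hive DDL statement would otherwise need to be edited to remove the delimiter clause.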
Why are the changes needed?
I found this problem when migrating tasks from Hive to Spark. Hive accepts this syntax (it's not good, but it's not a problem; we can ignore it), so I fixed it on Spark 2.4. Although ORC doesn't need this delimiter, I don't think we need to be so strict about the syntax. It makes migrating tasks from Hive to Spark more convenient.
Does this PR introduce any user-facing change?
No
How was this patch tested?
UTs to be added