[SPARK-33639][SQL] The external table does not specify a location by guixiaowen · Pull Request #30719 · apache/spark

guixiaowen · 2020-12-11T03:19:00Z

What changes were proposed in this pull request?

When an external table is created without declaring a location, Spark reports an error, which is not very user-friendly.
When an internal (managed) table is created, a default path is initialized for it.
If an external table without a declared location also falls back to this default path, the interaction becomes friendlier.

Why are the changes needed?

To make creating external tables more user-friendly.

Does this PR introduce any user-facing change?

Users do not need to change anything. Users who do not want this behavior can turn it off through a configuration.

How was this patch tested?

Before the change:

spark-sql> create external table if not exists external_table_spark_2(last_update STRING, col_a STRING)
> PARTITIONED BY (par_dt STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> WITH SERDEPROPERTIES (
> 'field.delim' = ',',
> 'serialization.format' = ','
> )
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> ;
Error in query:
Operation not allowed: CREATE EXTERNAL TABLE must be accompanied by LOCATION(line 1, pos 0)

== SQL ==
create external table if not exists external_table_spark_2(last_update STRING, col_a STRING)
^^^
PARTITIONED BY (par_dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'field.delim' = ',',
'serialization.format' = ','
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

After the change:

spark-sql> CREATE EXTERNAL TABLE external_table_spark_10(last_update STRING, col_a STRING)
> PARTITIONED BY (par_dt STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> WITH SERDEPROPERTIES (
> 'field.delim' = ',',
> 'serialization.format' = ','
> )
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
20/11/02 11:26:59 DEBUG log: DDL: struct external_table_spark_10 { string last_update, string col_a}
Response code
Time taken: 1.024 seconds
spark-sql>

After the change, a table created without a LOCATION clause gets the default table path as its location:

spark-sql> show create table external_table_spark_10;
createtab_stmt
CREATE EXTERNAL TABLE external_table_spark_10(last_update STRING, col_a STRING)
PARTITIONED BY (par_dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'field.delim' = ',',
'serialization.format' = ','
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://xxxx/xxxx/xxxx/xxxx/xxxx/xxxx/external_table_spark_10'
TBLPROPERTIES (
'transient_lastDdlTime' = '1604287619',
'PART_LIMIT' = '10000',
'LEVEL' = '0',
'TTL' = '60'
)

If you don't want this modification to take effect, you can set spark.sql.create.external.statement.location to true.

spark.sql.create.external.statement.location=true
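
The fallback described above can be sketched in a few lines. This is only an illustration of the decision logic, not the code in this patch: the function name, its parameters, and the warehouse path are hypothetical, and `require_location` merely stands in for setting spark.sql.create.external.statement.location to true.

```python
from typing import Optional


def resolve_table_location(
    table_name: str,
    external: bool,
    location: Optional[str],
    warehouse_dir: str,
    require_location: bool = False,
) -> str:
    """Pick the storage path for a new table (hypothetical helper, not a Spark API)."""
    # An explicitly declared LOCATION always wins.
    if location is not None:
        return location
    # Old behaviour, kept when the opt-out flag is set: external tables need a LOCATION.
    if external and require_location:
        raise ValueError("CREATE EXTERNAL TABLE must be accompanied by LOCATION")
    # Proposed behaviour: reuse the default path a managed table would get.
    return f"{warehouse_dir.rstrip('/')}/{table_name}"
```

Under these assumptions, an external table with no location and the flag left unset resolves to the same path a managed table of that name would use, while setting the flag restores the error shown in the "before" session.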

AmplabJenkins · 2020-12-11T03:49:02Z

Can one of the admins verify this patch?

HyukjinKwon · 2020-12-11T05:12:24Z

@guixiaowen, new features should target master branch. Also, I believe this syntax is from Hive. In this case, I don't think it's worthwhile fixing it.

github-actions · 2021-03-22T00:49:27Z

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

edit spark-33639

1bc607b

github-actions bot added the Stale label Mar 22, 2021

github-actions bot closed this Mar 23, 2021

guixiaowen wants to merge 1 commit into apache:branch-2.4 from guixiaowen:SPARK-33639
