[SPARK-33639][SQL] The external table does not specify a location #30719

Closed
guixiaowen wants to merge 1 commit into apache:branch-2.4 from guixiaowen:SPARK-33639

@guixiaowen (Contributor) commented Dec 11, 2020

What changes were proposed in this pull request?

Creating an external table without a LOCATION clause currently fails with an error, which is not user-friendly.
When a managed (internal) table is created, a default path is initialized for it.
This PR uses that same default path for an external table when no location is declared, so the statement succeeds instead of failing.

Why are the changes needed?

They make CREATE EXTERNAL TABLE easier to use: a statement that omits LOCATION no longer fails.

Does this PR introduce any user-facing change?

Users do not need to change anything. Users who do not want the new behavior can disable it through a configuration setting.

How was this patch tested?

State before modification:

spark-sql> create external table if not exists external_table_spark_2(last_update STRING, col_a STRING)
> PARTITIONED BY (par_dt STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> WITH SERDEPROPERTIES (
> 'field.delim' = ',',
> 'serialization.format' = ','
> )
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> ;
Error in query:
Operation not allowed: CREATE EXTERNAL TABLE must be accompanied by LOCATION(line 1, pos 0)

== SQL ==
create external table if not exists external_table_spark_2(last_update STRING, col_a STRING)
^^^
PARTITIONED BY (par_dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'field.delim' = ',',
'serialization.format' = ','
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

Modified state:

spark-sql> CREATE EXTERNAL TABLE external_table_spark_10(last_update STRING, col_a STRING)
> PARTITIONED BY (par_dt STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> WITH SERDEPROPERTIES (
> 'field.delim' = ',',
> 'serialization.format' = ','
> )
> STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
20/11/02 11:26:59 DEBUG log: DDL: struct external_table_spark_10 { string last_update, string col_a}
Response code
Time taken: 1.024 seconds
spark-sql>

After the change, the table is created with a default location, which appears in the SHOW CREATE TABLE output:

spark-sql> show create table external_table_spark_10;
createtab_stmt
CREATE EXTERNAL TABLE external_table_spark_10(last_update STRING, col_a STRING)
PARTITIONED BY (par_dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
'field.delim' = ',',
'serialization.format' = ','
)
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://xxxx/xxxx/xxxx/xxxx/xxxx/xxxx/external_table_spark_10'
TBLPROPERTIES (
'transient_lastDdlTime' = '1604287619',
'PART_LIMIT' = '10000',
'LEVEL' = '0',
'TTL' = '60'
)

If you don't want this change to take effect, set spark.sql.create.external.statement.location to true:

spark.sql.create.external.statement.location=true
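
The fallback described above can be sketched roughly as follows. This is an illustrative Python model only, not the code in this patch: the function name, the `warehouse_dir` parameter, the `require_location` argument (standing in for spark.sql.create.external.statement.location), and the `<warehouse_dir>/<table>` path layout are all assumptions made for the example.

```python
from typing import Optional


class MissingLocationError(ValueError):
    """Raised when an external table has no LOCATION and the fallback is disabled."""


def resolve_external_location(
    table: str,
    location: Optional[str],
    warehouse_dir: str,
    require_location: bool = False,
) -> str:
    """Return the path to use for an external table (hypothetical helper).

    require_location models spark.sql.create.external.statement.location:
    True keeps the old behavior (LOCATION is mandatory), False enables the
    default-path fallback. The default of False is an assumption.
    """
    # An explicit LOCATION always wins.
    if location is not None:
        return location
    # Old behavior: reject the statement when no LOCATION is given.
    if require_location:
        raise MissingLocationError(
            "CREATE EXTERNAL TABLE must be accompanied by LOCATION"
        )
    # New behavior: fall back to the default path a managed table would get
    # (assumed here to be <warehouse_dir>/<table>).
    return f"{warehouse_dir.rstrip('/')}/{table}"
```

With `require_location=False`, a call without a location returns a path under the warehouse directory; with `require_location=True`, the same call raises the error shown in the "State before modification" output.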

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

@guixiaowen, new features should target master branch. Also, I believe this syntax is from Hive. In this case, I don't think it's worthwhile fixing it.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label on Mar 22, 2021
github-actions bot closed this on Mar 23, 2021