Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-19724][SQL]create a managed table with an existed default table should throw an exception #17272

Closed
wants to merge 7 commits into from

Conversation

windpiger
Copy link
Contributor

What changes were proposed in this pull request?

This JIRA is a follow up work after SPARK-19583

As we discussed in that PR

The following DDL for a managed table with an existed default location should throw an exception:

CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
CREATE TABLE ... (PARTITIONED BY ...)

Currently there are some situations which are not consist with above logic:

  1. CREATE TABLE ... (PARTITIONED BY ...) succeed with an existed default location
    situation: for both hive/datasource(with HiveExternalCatalog/InMemoryCatalog)

  2. CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
    situation: hive table succeed with an existed default location

This PR is going to make above two situations consist with the logic that it should throw an exception
with an existed default location.

How was this patch tested?

unit test added

@@ -187,38 +187,32 @@ class InMemoryCatalog(
val db = tableDefinition.identifier.database.get
requireDbExists(db)
val table = tableDefinition.identifier.table
if (tableExists(db, table)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have check if the table exists in SessionCatalog.createtable

@SparkQA
Copy link

SparkQA commented Mar 13, 2017

Test build #74436 has finished for PR 17272 at commit a0ba419.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -164,15 +164,6 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac
assert(actual.tableType === CatalogTableType.EXTERNAL)
}

test("create table when the table already exists") {
Copy link
Contributor Author

@windpiger windpiger Mar 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since in InMemoryCatalog and HiveExternalCatalog does not check if the table exists, here move the test case to SessionCatalogSuite(using InMemoryCatalog) and HiveExternalCatalogSuite

@@ -155,6 +154,8 @@ case class CreateDataSourceTableAsSelectCommand(
} else {
table.storage.locationUri
}

sparkSession.sessionState.catalog.checkTableOrPathExists(table, ignoreIfExists = false)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because in following saveDataIntoTable will create the table path and store data in it, we should not check the path in following createTable, we check it here before saveDataIntoTable

@SparkQA
Copy link

SparkQA commented Mar 13, 2017

Test build #74437 has finished for PR 17272 at commit 65d7ea9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 13, 2017

Test build #74441 has finished for PR 17272 at commit 0e753cc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 13, 2017

Test build #74445 has finished for PR 17272 at commit c8d9b77.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74468 has finished for PR 17272 at commit c8d9b77.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74472 has finished for PR 17272 at commit cd4a091.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74493 has started for PR 17272 at commit 2ac70b4.

@windpiger
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74499 has finished for PR 17272 at commit 2ac70b4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1892,7 +1892,7 @@ def test_list_tables(self):
self.assertEquals(spark.catalog.listTables(), [])
self.assertEquals(spark.catalog.listTables("some_db"), [])
spark.createDataFrame([(1, 1)]).createOrReplaceTempView("temp_tab")
spark.sql("CREATE TABLE tab1 (name STRING, age INT) USING parquet")
spark.sql("CREATE TABLE tbl1 (name STRING, age INT) USING parquet")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python test failed

the location('file:/home/jenkins/workspace/SparkPullRequestBuilder/spark-warehouse/tab1') of table('`default`.`tab1`') already exists.;

the location of tab1 does not deleted , I have tried to found out in which test case it forget to delete it(search all test cases containing tab1 and run it), but they are all ok to delete the location of tab1, so here we change the table name to work around it.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74516 has finished for PR 17272 at commit 516c4e4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@windpiger
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74524 has finished for PR 17272 at commit 516c4e4.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 14, 2017

Test build #74535 has finished for PR 17272 at commit 739f207.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 15, 2017

Test build #74604 has finished for PR 17272 at commit 4351cd7.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

@windpiger is this still active?

@gatorsmile
Copy link
Member

cc @ltnwgl Could you please take this over? Thanks!

@gatorsmile
Copy link
Member

cc @dongjoon-hyun Are you interested in this PR? Just take it over?

@gengliangwang
Copy link
Member

@gatorsmile I will take it over :)

asfgit pushed a commit that referenced this pull request Apr 6, 2018
…le should throw an exception

## What changes were proposed in this pull request?
This PR is to finish #17272

This JIRA is a follow up work after SPARK-19583

As we discussed in that PR

The following DDL for a managed table with an existed default location should throw an exception:

CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
CREATE TABLE ... (PARTITIONED BY ...)
Currently there are some situations which are not consist with above logic:

CREATE TABLE ... (PARTITIONED BY ...) succeed with an existed default location
situation: for both hive/datasource(with HiveExternalCatalog/InMemoryCatalog)

CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
situation: hive table succeed with an existed default location

This PR is going to make above two situations consist with the logic that it should throw an exception
with an existed default location.
## How was this patch tested?

unit test added

Author: Gengliang Wang <gengliang.wang@databricks.com>

Closes #20886 from gengliangwang/pr-17272.
robert3005 pushed a commit to palantir/spark that referenced this pull request Apr 7, 2018
…le should throw an exception

## What changes were proposed in this pull request?
This PR is to finish apache#17272

This JIRA is a follow up work after SPARK-19583

As we discussed in that PR

The following DDL for a managed table with an existed default location should throw an exception:

CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
CREATE TABLE ... (PARTITIONED BY ...)
Currently there are some situations which are not consist with above logic:

CREATE TABLE ... (PARTITIONED BY ...) succeed with an existed default location
situation: for both hive/datasource(with HiveExternalCatalog/InMemoryCatalog)

CREATE TABLE ... (PARTITIONED BY ...) AS SELECT ...
situation: hive table succeed with an existed default location

This PR is going to make above two situations consist with the logic that it should throw an exception
with an existed default location.
## How was this patch tested?

unit test added

Author: Gengliang Wang <gengliang.wang@databricks.com>

Closes apache#20886 from gengliangwang/pr-17272.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants