Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-30679][SQL] REPLACE TABLE can omit the USING clause #27402

Closed
wants to merge 2 commits into from

Conversation

cloud-fan
Copy link
Contributor

What changes were proposed in this pull request?

This PR updates the parser rule to allow REPLACE TABLE statement to omit the USING clause.

This is extracted from #27345

Why are the changes needed?

CREATE TABLE can omit the USING clause, and REPLACE TABLE should be consistent.

Does this PR introduce any user-facing change?

yes, now it's easier to write REPLACE TABLE command.

How was this patch tested?

a new test

@cloud-fan
Copy link
Contributor Author

cc @rdblue @brkyvz @gengliangwang

@SparkQA
Copy link

SparkQA commented Jan 30, 2020

Test build #117562 has finished for PR 27402 at commit ac30945.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@brkyvz
Copy link
Contributor

brkyvz commented Jan 30, 2020

I actually don't like the fact that you can be accidentally changing the format your table is stored in, but if people think that's fine, I won't oppose this change

val provider = ctx.tableProvider.multipartIdentifier.getText
val defaultProvider = conf.defaultDataSourceName
val provider =
Option(ctx.tableProvider).map(_.multipartIdentifier.getText).getOrElse(defaultProvider)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default provider should not be set here. The purpose of the statement logical plans is to carry only what was expressed in SQL. Replacing with a default loses the fact that this query had no provider specified.

Instead, this needs to update the statement plan and pass provider as an option.

@rdblue
Copy link
Contributor

rdblue commented Jan 30, 2020

I'm -1 on this because it incorrectly returns a statement that fills in information not in the user's SQL query. Defaults should be set later.

@cloud-fan
Copy link
Contributor Author

cloud-fan commented Jan 31, 2020

Hi @rdblue , I just remind that, by design the default provider should be determined by the catalog implementation, not Spark.

I think you are right that the statement should hold an Option and we shouldn't set the "provider" property if USING clause is omitted when calling TableCatalog.createTable. I'll fix REPLACE TABLE here, as well as CREATE TABLE. These two are tightly coupled because of CREATE OR REPLACE TABLE.

@SparkQA
Copy link

SparkQA commented Jan 31, 2020

Test build #117619 has finished for PR 27402 at commit a91e4e5.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang
Copy link
Member

I actually don't like the fact that you can be accidentally changing the format your table is stored in, but if people think that's fine, I won't oppose this change

I am +1 with @brkyvz

@gengliangwang
Copy link
Member

retest this please

@cloud-fan
Copy link
Contributor Author

Seems no one is against allowing to omit the USING clause, but the question is what the default value should be. I see two options:

  1. let catalog implementation to decide the default value. If the catalog doesn't implement StagingTableCatalog, then Spark will simply drop and re-create the table. It's hard for the catalog to retain the provider when createTable is called, as the table is already dropped.
  2. Spark sets the default value as the provider of the existing table. This makes sure that when the USING clause is omitted, we won't accidentally change the table format.

I'll go with option 2.

@SparkQA
Copy link

SparkQA commented Jan 31, 2020

Test build #117641 has finished for PR 27402 at commit a91e4e5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -49,26 +49,6 @@ class DDLParserSuite extends AnalysisTest {
comparePlans(parsePlan(sql), expected, checkAnalysis = false)
}

test("SPARK-30098: create table without provider should " +
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default provider shouldn't be set by the parser. The newly added test case will test the default provider.

@SparkQA
Copy link

SparkQA commented Jan 31, 2020

Test build #117651 has finished for PR 27402 at commit 1874e39.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 31, 2020

Test build #117667 has finished for PR 27402 at commit 647851c.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

retest this please

1 similar comment
@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Feb 18, 2020

Test build #118619 has finished for PR 27402 at commit 647851c.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

It's not a breaking change to allow to omit the provider in REPLACE TABLE, so I'll leave it to 3.1.

The CREATE TABLE behavior should be fixed before 3.0: #27650

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label May 31, 2020
@github-actions github-actions bot closed this Jun 1, 2020
@cloud-fan cloud-fan removed the Stale label Jun 1, 2020
@cloud-fan cloud-fan reopened this Jun 1, 2020
@SparkQA
Copy link

SparkQA commented Jun 1, 2020

Test build #123361 has finished for PR 27402 at commit 647851c.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Sep 10, 2020
@cloud-fan cloud-fan removed the Stale label Sep 10, 2020
@cloud-fan cloud-fan closed this Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
7 participants