[SPARK-37878][SQL][FOLLOWUP] V1Table should always carry the "location" property #36498

Closed

cloud-fan
Contributor

What changes were proposed in this pull request?

This is a follow-up to #35204, which introduced a potential regression: it removes the "location" table property from V1Table if the table is not external. The intention was to keep ShowCreateTableExec from emitting a LOCATION clause for managed tables. However, if the v2 DESCRIBE TABLE command becomes the default in the future, this would be a behavior change: v2 DESCRIBE TABLE would no longer print the table location for managed tables.

This PR fixes the regression by taking a different approach to the SHOW CREATE TABLE issue:

  1. Introduce a new reserved table property, is_managed_location, to indicate that the location is managed by the catalog rather than given by the user.
  2. ShowCreateTableExec generates the LOCATION clause only if the "location" property is present and the location is not managed.
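
The rule in the two steps above can be sketched roughly as follows. This is an illustration only, not the actual ShowCreateTableExec code: the object and method names are hypothetical, treating the flag as set only when its value is "true" is an assumption, and quote escaping of the path is left out.

```scala
// Illustrative sketch only; not the real ShowCreateTableExec implementation.
object ShowCreateLocationSketch {
  // Property keys as named in this PR.
  val PropLocation = "location"
  val PropIsManagedLocation = "is_managed_location"

  /** Returns the LOCATION clause to print, or None when it should be omitted. */
  def locationClause(props: Map[String, String]): Option[String] = {
    // Assumption: the location counts as catalog-managed only when the flag is "true".
    val isManaged = props.get(PropIsManagedLocation).exists(_.equalsIgnoreCase("true"))
    props.get(PropLocation)
      .filterNot(_ => isManaged)        // drop the location when the catalog chose it
      .map(loc => s"LOCATION '$loc'")   // escaping of quotes in the path is omitted here
  }
}
```

With this sketch, a property map containing both "location" and is_managed_location = "true" yields None, while a map containing only "location" -> "/data/t" yields Some("LOCATION '/data/t'"). In both cases the "location" property itself stays in the map, so a DESCRIBE TABLE implementation can still print it.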

Why are the changes needed?

To avoid a potential regression.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests. We can add a test once the v2 DESCRIBE TABLE command is used by default.

@github-actions github-actions bot added the SQL label May 10, 2022
@cloud-fan
Contributor Author

cc @Peng-Lei @huaxingao @MaxGekk

case _ => true
// It's safe to set whatever table comment, so we don't make it a reserved table property.
case (PROP_COMMENT, _) => true
case (k, _) =>
Contributor

I think this duplicates the case before it. Keeping only the code you added would be simpler and clearer.

Contributor Author

We need the code above so that the error is thrown with a better message, one that gives a suggestion such as "please use CREATE EXTERNAL TABLE".

Contributor

That's where I got confused too. If we want to keep the code above to issue a better error message, why not add a parallel case, such as:

case (PROP_IS_MANAGED_LOCATION, _) if !legacyOn =>
  throw QueryParsingErrors.cannotCleanReservedTablePropertyError(
    PROP_IS_MANAGED_LOCATION, ctx, "xxxxxx")
case (PROP_IS_MANAGED_LOCATION, _) =>
  false

Contributor Author

@cloud-fan cloud-fan May 11, 2022

We may add more reserved properties in the future for which the only possible suggestion is "please remove it from the TBLPROPERTIES list", so I wrote the code this way to make it easier to add new reserved properties.
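
A minimal sketch of the structure being discussed, with a tailored case first and a generic fallback for every other reserved key. Everything here is hypothetical and simplified: the object, the exception class, the "external" key, and the exact messages are assumptions for illustration, and the real code throws through QueryParsingErrors with a parser context instead.

```scala
// Illustrative sketch only; names and messages are hypothetical.
object ReservedPropsSketch {
  val PropExternal = "external"                 // assumed key for the tailored case
  val PropComment = "comment"
  val PropIsManagedLocation = "is_managed_location"
  // Reserved keys handled by the generic fallback; new ones only need adding here.
  val GenericReserved: Set[String] = Set(PropIsManagedLocation)

  final case class ReservedPropertyException(key: String, hint: String)
    extends Exception(s"'$key' is a reserved table property. $hint")

  /** Keeps non-reserved properties; reserved ones fail, or are dropped in legacy mode. */
  def clean(props: Map[String, String], legacyOn: Boolean): Map[String, String] =
    props.filter {
      // Tailored case: this key has a more specific suggestion.
      case (PropExternal, _) if !legacyOn =>
        throw ReservedPropertyException(PropExternal, "Please use CREATE EXTERNAL TABLE.")
      case (PropExternal, _) => false
      // The table comment is safe to set, so it is not reserved.
      case (PropComment, _) => true
      // Generic fallback: same suggestion for every remaining reserved key.
      case (k, _) if GenericReserved.contains(k) =>
        if (!legacyOn) {
          throw ReservedPropertyException(k, "Please remove it from the TBLPROPERTIES list.")
        }
        false
      case _ => true
    }
}
```

The point of the fallback is that a future reserved property with no better suggestion needs only a new entry in the set, not a new pair of cases.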

Contributor

Got it. Thanks for the explanation.

* A reserved property to indicate that the table location is managed, not user-specified.
* If this property is "true", SHOW CREATE TABLE will not generate the LOCATION clause.
*/
String PROP_IS_MANAGED_LOCATION = "is_managed_location";
Contributor

nit: actually, we could add a reserved property named PROP_TABLE_TYPE instead. The type can be EXTERNAL, MANAGED, or VIEW, so it could also drive the different behavior when the type is MANAGED. PROP_TABLE_TYPE also sounds more generic, and we may add more types in the future.

Contributor Author

@cloud-fan cloud-fan May 11, 2022

I don't think we will use TableCatalog to support views. I'm adding this new property because I think it is the most precise approach: people can create an EXTERNAL table without a location, or a MANAGED table with a location. What we care about in SHOW CREATE TABLE is whether the location was generated by the catalog, not the table type.

@cloud-fan
Contributor Author

Thanks for the review, merging to master/3.3!

@cloud-fan cloud-fan closed this in fa2bda5 May 11, 2022
cloud-fan added a commit that referenced this pull request May 11, 2022
…n" property

Closes #36498 from cloud-fan/regression.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit fa2bda5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>