
[SPARK-30885][SQL] V1 table name should be fully qualified if catalog name is provided #27642

Closed

Conversation

imback82 (Contributor, Author):

What changes were proposed in this pull request?

For the following:

```
CREATE TABLE t USING json AS SELECT 1 AS i
SELECT * FROM spark_catalog.t
```

spark_catalog.t is resolved to spark_catalog.default.t assuming the current namespace is default. However, this is not consistent with V2 behavior where the namespace must be specified if the catalog name is provided. This PR proposes to fix this inconsistency.

Why are the changes needed?

To be consistent with V2 table naming scheme in SQL commands.

Does this PR introduce any user-facing change?

Yes, now the user has to specify the namespace if the catalog name is provided. For example,

```
SELECT * FROM spark_catalog.t # Will throw AnalysisException with 'Session catalog cannot have an empty namespace: spark_catalog.t'
SELECT * FROM spark_catalog.default.t # OK
```
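To make the new rule concrete, here is a minimal stand-in sketch (not Spark's actual resolver code; `split` and the catalog set are hypothetical names): once the leading name part is recognized as a catalog, the remaining parts must carry the namespace plus table name, so `spark_catalog.t` leaves the namespace empty.

```scala
// Stand-in sketch of multi-part name splitting; illustrative only.
def split(
    parts: Seq[String],
    catalogs: Set[String]): (Option[String], Seq[String], String) =
  parts match {
    case head +: rest if catalogs.contains(head) && rest.nonEmpty =>
      (Some(head), rest.init, rest.last) // catalog given: rest must carry the namespace
    case _ =>
      (None, parts.init, parts.last)     // no catalog: current namespace applies later
  }

// spark_catalog.t -> catalog resolved, but namespace empty (now an error)
assert(split(Seq("spark_catalog", "t"), Set("spark_catalog"))._2.isEmpty)
// spark_catalog.default.t -> fully qualified (OK)
assert(split(Seq("spark_catalog", "default", "t"), Set("spark_catalog"))._2 == Seq("default"))
```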

How was this patch tested?

Added new tests

```scala
    }
  }
  if (CatalogV2Util.isSessionCatalog(catalog) && ident.namespace.isEmpty) {
```
imback82 (Contributor, Author):

@cloud-fan For a session catalog, can I make the assumption that the namespace is required? I looked at `CatalogManager`, which uses `v1SessionCatalog` to set the current namespace if the current catalog is a session catalog; and `v1SessionCatalog` requires the namespace (database) to already exist.

Contributor:

Shall we do this check inside `SessionCatalogAndIdentifier`?

Contributor:

Or inside `ResolveSessionCatalog`?

Contributor:

`CatalogAndIdentifier` should just focus on extracting the catalog and identifier.

imback82 (Contributor, Author):

The reason I put it here is that `CatalogAndIdentifier` is also used here:

```scala
private val (catalog, identifier) = {
  val CatalogAndIdentifier(catalog, identifier) = tableName
  (catalog.asTableCatalog, identifier)
}
```

which means I would have to put the same check everywhere `CatalogAndIdentifier` is used?

cloud-fan (Contributor) — Feb 20, 2020:

I think `V2SessionCatalog` should stop filling in the default database. It should assume the input identifier is the final identifier, like other catalogs do, and fail if the identifier doesn't have a database part. Then we don't need to do the check here.
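A minimal sketch of this suggested direction, using stand-in types rather than Spark's real classes (`Identifier`, `AnalysisException`, and `requireQualified` here are illustrative): the session catalog treats the input identifier as final and fails fast instead of filling in `default`.

```scala
// Stand-in types, not Spark's actual classes.
case class Identifier(namespace: Seq[String], name: String)

class AnalysisException(msg: String) extends Exception(msg)

// Reject identifiers without a namespace instead of defaulting them.
def requireQualified(ident: Identifier): Identifier = {
  if (ident.namespace.isEmpty) {
    throw new AnalysisException(
      s"Session catalog cannot have an empty namespace: ${ident.name}")
  }
  ident
}

requireQualified(Identifier(Seq("default"), "t")) // OK
// requireQualified(Identifier(Nil, "t"))         // would throw
```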

imback82 (Contributor, Author):

Good point, let me explore this route. Thanks!


SparkQA commented Feb 20, 2020

Test build #118701 has finished for PR 27642 at commit 018e6f6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82 imback82 changed the title [SPARK-30885][SQL] V1 table name should be fully qualified if catalog name is provided [WIP][SPARK-30885][SQL] V1 table name should be fully qualified if catalog name is provided Feb 21, 2020
@imback82 imback82 changed the title [WIP][SPARK-30885][SQL] V1 table name should be fully qualified if catalog name is provided [SPARK-30885][SQL] V1 table name should be fully qualified if catalog name is provided Feb 21, 2020
```diff
@@ -396,7 +397,7 @@ class ResolveSessionCatalog(
     }

     case AnalyzeColumnStatement(tbl, columnNames, allColumns) =>
-      val v1TableName = parseV1Table(tbl, "ANALYZE TABLE")
+      val v1TableName = parseTempViewOrV1Table(tbl, "ANALYZE TABLE")
```
imback82 (Contributor, Author):

`AnalyzeColumnCommand` actually supports temp views.

yuchenhuo (Contributor):
Just want to check that I understand this correctly. The reason we need this change is that parseV1Table would accidentally add the current namespace to a temp view name, which is incorrect because a temp view shouldn't have a namespace?

imback82 (Contributor, Author):

yes, and this command supports temp views.
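A stand-in sketch of why qualifying temp-view names is wrong (types and names here are illustrative, not Spark's): temp views are registered under a bare, single-part name, so prepending the current namespace makes the lookup miss them.

```scala
// A toy temp-view registry keyed by bare name.
val tempViews = Map("v" -> "SELECT 1")

// Temp-view lookup only matches single-part names.
def lookupTempView(nameParts: Seq[String]): Option[String] =
  if (nameParts.length == 1) tempViews.get(nameParts.head) else None

assert(lookupTempView(Seq("v")).isDefined)          // bare name: found
assert(lookupTempView(Seq("default", "v")).isEmpty) // namespace prepended: missed
```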

```diff
@@ -415,6 +416,10 @@ class ResolveSessionCatalog(
         partition)

     case ShowCreateTableStatement(tbl, asSerde) if !asSerde =>
+      if (isTempView(tbl)) {
```
imback82 (Contributor, Author):

This check is moved from `ShowCreateTableCommand`.

```diff
@@ -1085,47 +1085,42 @@ case class ShowCreateTableCommand(table: TableIdentifier)

   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
-    if (catalog.isTemporaryTable(table)) {
```
imback82 (Contributor, Author):

The change only removes this `if` block; the diff rendering just makes it look bigger.

Contributor:

Shall we keep the `if`? It doesn't hurt.

imback82 (Contributor, Author):

reverted.

```diff
@@ -257,9 +257,10 @@ class ResolveSessionCatalog(
       case v1Table: V1Table =>
         DescribeColumnCommand(tbl.asTableIdentifier, colNameParts, isExtended)
     }.getOrElse {
+      if (isTempView(tbl)) {
```
cloud-fan (Contributor) — Feb 21, 2020:

Can we follow how we deal with `UncacheTableStatement`? Basically just call `parseTempViewOrV1Table`.

imback82 (Contributor, Author):

updated.

```diff
@@ -659,14 +669,7 @@ class ResolveSessionCatalog(
   object SessionCatalogAndTable {
     def unapply(nameParts: Seq[String]): Option[(CatalogPlugin, Seq[String])] = nameParts match {
       case SessionCatalogAndIdentifier(catalog, ident) =>
-        if (nameParts.length == 1) {
```
Contributor:

So we are going to remove the hack in `TempViewOrV1Table` in a follow-up PR?

imback82 (Contributor, Author):

OK, I will do it as a follow-up. (I couldn't just remove the check because of SPARK-30799: a temp view name can't contain a catalog name. I will think about it in the follow-up.)


SparkQA commented Feb 21, 2020

Test build #118751 has finished for PR 27642 at commit fea7909.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 21, 2020

Test build #118747 has finished for PR 27642 at commit 29c88f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 21, 2020

Test build #118752 has finished for PR 27642 at commit 95d2297.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

yuchenhuo (Contributor) left a comment:

I'm not so familiar with this, but in general looks great! Left some questions. Thanks for doing this!


```diff
 // make sure table doesn't exist
 var e = intercept[AnalysisException](spark.table("nonexistentTable")).getMessage
-assert(e.contains(expectedErrorMsg))
+assert(e.contains(s"$expectedErrorMsg nonexistentTable"))
```
imback82 (Contributor, Author) — Feb 23, 2020:

`spark.table` just creates `UnresolvedRelation(multipartIdentifier)`, so it doesn't add the current namespace.
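A stand-in sketch of that point (the case class and `table` helper below are illustrative, not Spark's actual implementation): the relation just records the name parts it was given, and no current namespace is attached at construction time, so resolution happens later against the raw multi-part name.

```scala
// Illustrative analogue of UnresolvedRelation(multipartIdentifier).
case class UnresolvedRelation(multipartIdentifier: Seq[String])

// A toy spark.table: split the name into parts, record them, resolve later.
def table(name: String): UnresolvedRelation =
  UnresolvedRelation(name.split('.').toSeq)

assert(table("nonexistentTable").multipartIdentifier == Seq("nonexistentTable"))
```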


SparkQA commented Feb 23, 2020

Test build #118825 has finished for PR 27642 at commit 9a4671c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 23, 2020

Test build #118826 has finished for PR 27642 at commit 4e80fd2.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 23, 2020

Test build #118827 has finished for PR 27642 at commit 1cf8907.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 23, 2020

Test build #118835 has finished for PR 27642 at commit ef83ba1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Feb 23, 2020

Test build #118836 has finished for PR 27642 at commit bcd9a65.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```diff
@@ -44,7 +44,7 @@ SHOW CREATE TABLE tbl
 -- !query schema
 struct<createtab_stmt:string>
 -- !query output
-CREATE TABLE `tbl` (
+CREATE TABLE `default`.`tbl` (
```
Contributor:

Not related to this PR, but a future improvement for v2 commands: since we resolve catalogs and tables during the analysis phase, it would be better to display the fully qualified table name (including the catalog name) in EXPLAIN, so users know exactly which table the command picked.

```diff
@@ -43,7 +44,7 @@ private[connector] trait TestV2SessionCatalogBase[T <: Table] extends Delegating

   protected def fullIdentifier(ident: Identifier): Identifier = {
```
Contributor:

We can remove this method completely.

imback82 (Contributor, Author):

Good catch! Removed.

```scala
val ident = if (v2Catalog.name == SESSION_CATALOG_NAME) {
  Identifier.of(nameParts.init.toArray, nameParts.last)
} else {
  Identifier.of(Array.empty, nameParts.last)
}
```
Contributor:

Can't we write `Identifier.of(nameParts.init.toArray, nameParts.last)` for the else branch as well?

imback82 (Contributor, Author):

You are right. Updated.
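A quick check of why the two branches can share one expression (the `Identifier` case class and `toIdentifier` helper below are stand-ins, not Spark's connector classes): for a single-part name, `init` is empty, so building the identifier from `(nameParts.init, nameParts.last)` already yields an empty namespace.

```scala
// Stand-in for Spark's connector Identifier.
case class Identifier(namespace: Array[String], name: String)

// The unified expression: everything but the last part is the namespace.
def toIdentifier(nameParts: Seq[String]): Identifier =
  Identifier(nameParts.init.toArray, nameParts.last)

assert(toIdentifier(Seq("t")).namespace.isEmpty)
assert(toIdentifier(Seq("default", "t")).namespace.sameElements(Array("default")))
```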


SparkQA commented Feb 24, 2020

Test build #118848 has finished for PR 27642 at commit b43890a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor):

retest this please


SparkQA commented Feb 24, 2020

Test build #118858 has finished for PR 27642 at commit b43890a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```diff
@@ -415,6 +404,10 @@ class ResolveSessionCatalog(
         partition)

     case ShowCreateTableStatement(tbl, asSerde) if !asSerde =>
+      if (isTempView(tbl)) {
+        throw new AnalysisException(
```
cloud-fan (Contributor) — Feb 24, 2020:

Actually, can we just call `parseTempViewOrV1Table` here? e.g.

```scala
val name = parseTempViewOrV1Table...
ShowCreateTableCommand(name.asTableIdentifier, ...)
```

imback82 (Contributor, Author):

OK.

That was my original implementation, but I did it this way because with `parseTempViewOrV1Table` you get "SHOW CREATE TABLE is only supported with temp views or v1 tables", whereas this way, if you pass a temp view, you get the more specific "SHOW CREATE TABLE is not supported on a temporary view".

But I will just call `parseTempViewOrV1Table` here. Thanks!


```diff
 class DataSourceV2SQLSessionCatalogSuite
   extends InsertIntoTests(supportsDynamicOverwrite = true, includeSQLOnlyTests = true)
   with AlterTableTests
   with SessionCatalogTest[InMemoryTable, InMemoryTableSessionCatalog] {

-  override protected val catalogAndNamespace = ""
+  override protected val catalogAndNamespace = "default."
```
Contributor:

Thinking about this more: can we avoid changing it by updating the test cases that check the error message?

I've already seen similar code, e.g. in `InsertIntoTests`:

```scala
val tableName = if (catalogAndNamespace.isEmpty) s"default.$t1" else t1
assert(exc.getMessage.contains(s"Cannot write to '$tableName', too many data columns"))
```

imback82 (Contributor, Author):

Yeah, makes sense. Thanks for pointing this out.


SparkQA commented Feb 24, 2020

Test build #118881 has finished for PR 27642 at commit eb0ebf1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yuchenhuo (Contributor) left a comment:

LGTM+1! Thanks for doing this!

cloud-fan closed this in 0fd4fa7, Feb 25, 2020

cloud-fan (Contributor):

thanks, merging to master/3.0!

cloud-fan pushed a commit that referenced this pull request Feb 25, 2020
… name is provided

For the following:
```
CREATE TABLE t USING json AS SELECT 1 AS i
SELECT * FROM spark_catalog.t
```
`spark_catalog.t` is resolved to `spark_catalog.default.t` assuming the current namespace is `default`. However, this is not consistent with V2 behavior where the namespace must be specified if the catalog name is provided. This PR proposes to fix this inconsistency.

To be consistent with V2 table naming scheme in SQL commands.

Yes, now the user has to specify the namespace if the catalog name is provided. For example,
```
SELECT * FROM spark_catalog.t # Will throw AnalysisException with 'Session catalog cannot have an empty namespace: spark_catalog.t'
SELECT * FROM spark_catalog.default.t # OK
```

Added new tests

Closes #27642 from imback82/disallow_spark_catalog_wihtout_db.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 0fd4fa7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Mar 4, 2020
… tables that are not fully qualified

### What changes were proposed in this pull request?

There are few V1 commands such as `REFRESH TABLE` that still allow `spark_catalog.t` because they run the commands with parsed table names without trying to load them in the catalog. This PR addresses this issue.

The PR also addresses the issue brought up in #27642 (comment).

### Why are the changes needed?

To fix a bug where for some V1 commands, `spark_catalog.t` is allowed.

### Does this PR introduce any user-facing change?

Yes, a bug is fixed and `REFRESH TABLE spark_catalog.t` is not allowed.

### How was this patch tested?

Added new test.

Closes #27718 from imback82/fix_TempViewOrV1Table.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Mar 4, 2020 (cherry picked from commit b302781, signed off by Wenchen Fan; same commit message as above).
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020 (same commit message as the merged commit above).
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020 (same commit message as the follow-up commit above).