[SPARK-30885][SQL] V1 table name should be fully qualified if catalog name is provided #27642
Conversation
```scala
  }
}
if (CatalogV2Util.isSessionCatalog(catalog) && ident.namespace.isEmpty) {
```
@cloud-fan For a session catalog, can I assume that the namespace is required? I looked at `CatalogManager`, which uses `v1SessionCatalog` for setting the current namespace if the current catalog is a session catalog; and `v1SessionCatalog` requires the namespace (database) to already exist.
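The guard being discussed can be sketched with a minimal, self-contained model. This is illustrative only: the `Ident` case class, the hard-coded `"spark_catalog"` name, and the exception type below stand in for Spark's actual `Identifier`, `CatalogV2Util`, and `AnalysisException`.

```scala
// Minimal sketch (not Spark's API) of the check discussed above: when the
// session catalog is named explicitly, the namespace must also be given.
case class Ident(namespace: Seq[String], name: String)

def checkSessionCatalogIdent(catalogName: String, ident: Ident): Unit =
  if (catalogName == "spark_catalog" && ident.namespace.isEmpty)
    throw new IllegalArgumentException(
      s"Session catalog cannot have an empty namespace: $catalogName.${ident.name}")

checkSessionCatalogIdent("spark_catalog", Ident(Seq("default"), "t")) // passes
// checkSessionCatalogIdent("spark_catalog", Ident(Nil, "t"))         // would throw
```

The error text mirrors the one this PR introduces ("Session catalog cannot have an empty namespace").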
Shall we do this check inside `SessionCatalogAndIdentifier`?
Or inside `ResolveSessionCatalog`?
`CatalogAndIdentifier` should just focus on extracting the catalog and identifier.
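As a rough sketch of what "just extracting" means, an extractor can split a multi-part name into a catalog and an identifier without doing any namespace validation. Everything below is illustrative: the registered-catalog set, the `Ident` case class, and the extractor name are made up, not Spark's API.

```scala
// Toy extractor: peel off a leading catalog name if it is registered,
// otherwise assume the session catalog. No namespace validation here —
// that responsibility is left to the caller.
case class Ident(namespace: Seq[String], name: String)

object CatalogAndIdent {
  private val registeredCatalogs = Set("spark_catalog", "testcat") // assumed

  def unapply(nameParts: Seq[String]): Option[(String, Ident)] = nameParts match {
    case head +: rest if registeredCatalogs(head) && rest.nonEmpty =>
      Some((head, Ident(rest.init, rest.last)))
    case parts if parts.nonEmpty =>
      Some(("spark_catalog", Ident(parts.init, parts.last)))
    case _ => None
  }
}

val CatalogAndIdent(cat, ident) = Seq("spark_catalog", "t")
// cat is "spark_catalog" and ident.namespace is empty — whether that is
// legal is exactly the validation question debated in this thread.
```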
The reason I put it here is that `CatalogAndIdentifier` is also used here:

spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriterV2.scala (lines 53 to 57 in 7c4ad63):

```scala
private val (catalog, identifier) = {
  val CatalogAndIdentifier(catalog, identifier) = tableName
  (catalog.asTableCatalog, identifier)
}
```

Should we do the check everywhere `CatalogAndIdentifier` is used?
I think `V2SessionCatalog` should stop filling in the default database. It should assume the input identifier is the final identifier, like other catalogs do, and fail if the identifier doesn't have a database part. Then we don't need to do the check here.
Good point, let me explore this route. Thanks!
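The two behaviors under discussion can be contrasted in a small sketch, using plain `Seq[String]` name parts rather than Spark's actual `V2SessionCatalog` code; both function names are made up for illustration.

```scala
// Legacy behavior: silently fill in the current database for a
// single-part name.
def qualifyWithCurrentDb(nameParts: Seq[String], currentDb: String): Seq[String] =
  if (nameParts.length == 1) currentDb +: nameParts else nameParts

// Proposed behavior: treat the input as the final identifier, like other
// catalogs, and fail fast if the database part is missing.
def requireQualified(nameParts: Seq[String]): Seq[String] = {
  require(nameParts.length >= 2,
    s"identifier must include a database: ${nameParts.mkString(".")}")
  nameParts
}
```

With the second variant, the empty-namespace check earlier in the thread becomes unnecessary, because an unqualified name already fails inside the catalog.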
Test build #118701 has finished for PR 27642 at commit
```diff
@@ -396,7 +397,7 @@ class ResolveSessionCatalog(
     }

     case AnalyzeColumnStatement(tbl, columnNames, allColumns) =>
-      val v1TableName = parseV1Table(tbl, "ANALYZE TABLE")
+      val v1TableName = parseTempViewOrV1Table(tbl, "ANALYZE TABLE")
```
`AnalyzeColumnCommand` actually supports temp views.
Just want to check that I understand this correctly. The reason we need this change is that `parseV1Table` would accidentally add `currentName` as the namespace for a temp view, which is incorrect because a temp view shouldn't have a namespace?
yes, and this command supports temp views.
```diff
@@ -415,6 +416,10 @@ class ResolveSessionCatalog(
       partition)

+    case ShowCreateTableStatement(tbl, asSerde) if !asSerde =>
+      if (isTempView(tbl)) {
```
This check is moved from `ShowCreateTableCommand`.
```diff
@@ -1085,47 +1085,42 @@ case class ShowCreateTableCommand(table: TableIdentifier)

   override def run(sparkSession: SparkSession): Seq[Row] = {
-    val catalog = sparkSession.sessionState.catalog
-    if (catalog.isTemporaryTable(table)) {
```
The change only removes this `if` block; the diff rendering just makes it look bigger.
Shall we keep the `if`? It doesn't hurt.
reverted.
```diff
@@ -257,9 +257,10 @@ class ResolveSessionCatalog(
       case v1Table: V1Table =>
         DescribeColumnCommand(tbl.asTableIdentifier, colNameParts, isExtended)
     }.getOrElse {
+      if (isTempView(tbl)) {
```
Can we follow how we deal with `UncacheTableStatement`? Basically, just call `parseTempViewOrV1Table`.
updated.
```diff
@@ -659,14 +669,7 @@ class ResolveSessionCatalog(
 object SessionCatalogAndTable {
   def unapply(nameParts: Seq[String]): Option[(CatalogPlugin, Seq[String])] = nameParts match {
     case SessionCatalogAndIdentifier(catalog, ident) =>
-      if (nameParts.length == 1) {
```
So we are going to remove the hack in `TempViewOrV1Table` in a follow-up PR?
OK, I will do it as a follow-up. (I couldn't just remove the check because of SPARK-30799: a temp view name can't contain a catalog name. But I will think about it as a follow-up.)
Test build #118751 has finished for PR 27642 at commit
Test build #118747 has finished for PR 27642 at commit
Test build #118752 has finished for PR 27642 at commit
I'm not so familiar with this, but in general it looks great! Left some questions. Thanks for doing this!
```diff
 // make sure table doesn't exist
 var e = intercept[AnalysisException](spark.table("nonexistentTable")).getMessage
-assert(e.contains(expectedErrorMsg))
+assert(e.contains(s"$expectedErrorMsg nonexistentTable"))
```
`spark.table` just creates `UnresolvedRelation(multipartIdentifier)`, so it doesn't add the current namespace.
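A toy model of that behavior, for illustration: the splitting below ignores quoted identifiers (which Spark's parser handles), and the case class merely mimics the shape of the real `UnresolvedRelation`.

```scala
// spark.table wraps the raw multi-part name; no current namespace is
// prepended. Resolution to a concrete table happens later, during analysis.
case class UnresolvedRelation(multipartIdentifier: Seq[String])

def table(name: String): UnresolvedRelation =
  UnresolvedRelation(name.split('.').toSeq)

table("nonexistentTable").multipartIdentifier // Seq("nonexistentTable"), no "default" added
```

This is why the test above can assert on the bare name `nonexistentTable` in the error message.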
Test build #118825 has finished for PR 27642 at commit
Test build #118826 has finished for PR 27642 at commit
Test build #118827 has finished for PR 27642 at commit
Test build #118835 has finished for PR 27642 at commit
Test build #118836 has finished for PR 27642 at commit
```diff
@@ -44,7 +44,7 @@ SHOW CREATE TABLE tbl
 -- !query schema
 struct<createtab_stmt:string>
 -- !query output
-CREATE TABLE `tbl` (
+CREATE TABLE `default`.`tbl` (
```
Not related to this PR, but a future improvement for v2 commands: since we resolve the catalog and tables during the analysis phase, it would be better to display the fully qualified table name (including the catalog name) in EXPLAIN, to let users know exactly which table was picked by the command.
```diff
@@ -43,7 +44,7 @@ private[connector] trait TestV2SessionCatalogBase[T <: Table] extends Delegating

   protected def fullIdentifier(ident: Identifier): Identifier = {
```
We can remove this method completely.
Good catch! Removed.
```scala
val ident = if (v2Catalog.name == SESSION_CATALOG_NAME) {
  Identifier.of(nameParts.init.toArray, nameParts.last)
} else {
  Identifier.of(Array.empty, nameParts.last)
```
Can't we write `Identifier.of(nameParts.init.toArray, nameParts.last)` for the else branch as well?
You are right. Updated.
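The reason the two branches collapse into one: for a single-part name, `init` is simply empty, so the same expression covers both cases. A minimal illustration, where the `Ident` case class stands in for Spark's `Identifier`:

```scala
// init/last split: the namespace is everything before the last part.
case class Ident(namespace: Seq[String], name: String)

def toIdent(nameParts: Seq[String]): Ident =
  Ident(nameParts.init, nameParts.last)

toIdent(Seq("default", "t")) // Ident(Seq("default"), "t")
toIdent(Seq("t"))            // Ident(Seq(), "t") — init of a 1-element Seq is empty
```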
Test build #118848 has finished for PR 27642 at commit
retest this please
Test build #118858 has finished for PR 27642 at commit
```diff
@@ -415,6 +404,10 @@ class ResolveSessionCatalog(
       partition)

+    case ShowCreateTableStatement(tbl, asSerde) if !asSerde =>
+      if (isTempView(tbl)) {
+        throw new AnalysisException(
```
Actually, can we just call `parseTempViewOrV1Table` here? E.g.:

```scala
val name = parseTempViewOrV1Table...
ShowCreateTableCommand(name.asTableIdentifier, ...)
```
OK. That was my original implementation, but I did it this way because with `parseTempViewOrV1Table` you get "SHOW CREATE TABLE is only supported with temp views or v1 tables", whereas with this explicit check, if you pass a temp view you get the more specific "SHOW CREATE TABLE is not supported on a temporary view". But I will just call `parseTempViewOrV1Table` here. Thanks!
```diff
 class DataSourceV2SQLSessionCatalogSuite
   extends InsertIntoTests(supportsDynamicOverwrite = true, includeSQLOnlyTests = true)
   with AlterTableTests
   with SessionCatalogTest[InMemoryTable, InMemoryTableSessionCatalog] {

-  override protected val catalogAndNamespace = ""
+  override protected val catalogAndNamespace = "default."
```
Thinking about this more: can we avoid changing it by updating the test cases that check the error message? I've already seen similar code, e.g. in `InsertIntoTests`:

```scala
val tableName = if (catalogAndNamespace.isEmpty) s"default.$t1" else t1
assert(exc.getMessage.contains(s"Cannot write to '$tableName', too many data columns"))
```
Yea makes sense. Thanks for pointing this out.
Test build #118881 has finished for PR 27642 at commit
LGTM+1! Thanks for doing this!
thanks, merging to master/3.0!
… name is provided

For the following:

```
CREATE TABLE t USING json AS SELECT 1 AS i
SELECT * FROM spark_catalog.t
```

`spark_catalog.t` is resolved to `spark_catalog.default.t`, assuming the current namespace is `default`. However, this is not consistent with V2 behavior, where the namespace must be specified if the catalog name is provided. This PR proposes to fix this inconsistency, to be consistent with the V2 table naming scheme in SQL commands.

Yes, now the user has to specify the namespace if the catalog name is provided. For example:

```
SELECT * FROM spark_catalog.t         # Will throw AnalysisException with 'Session catalog cannot have an empty namespace: spark_catalog.t'
SELECT * FROM spark_catalog.default.t # OK
```

Added new tests.

Closes #27642 from imback82/disallow_spark_catalog_wihtout_db.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 0fd4fa7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
… tables that are not fully qualified

### What changes were proposed in this pull request?

There are a few V1 commands, such as `REFRESH TABLE`, that still allow `spark_catalog.t` because they run the commands with parsed table names without trying to load them in the catalog. This PR addresses this issue. The PR also addresses the issue brought up in #27642 (comment).

### Why are the changes needed?

To fix a bug where `spark_catalog.t` is allowed for some V1 commands.

### Does this PR introduce any user-facing change?

Yes, a bug is fixed and `REFRESH TABLE spark_catalog.t` is not allowed.

### How was this patch tested?

Added new test.

Closes #27718 from imback82/fix_TempViewOrV1Table.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?

For the following:

```
CREATE TABLE t USING json AS SELECT 1 AS i
SELECT * FROM spark_catalog.t
```

`spark_catalog.t` is resolved to `spark_catalog.default.t`, assuming the current namespace is `default`. However, this is not consistent with V2 behavior, where the namespace must be specified if the catalog name is provided. This PR proposes to fix this inconsistency.

Why are the changes needed?

To be consistent with the V2 table naming scheme in SQL commands.

Does this PR introduce any user-facing change?

Yes, now the user has to specify the namespace if the catalog name is provided. For example:

```
SELECT * FROM spark_catalog.t         # Will throw AnalysisException with 'Session catalog cannot have an empty namespace: spark_catalog.t'
SELECT * FROM spark_catalog.default.t # OK
```

How was this patch tested?

Added new tests.