[SPARK-37150][SQL] Migrate DESCRIBE NAMESPACE to use V2 command by default #34429
Conversation
    } else {
      properties.toSeq.mkString("(", ", ", ")")
    }
    rows += toCatalystRow("Properties", propertiesStr)
@cloud-fan These are the differences between v1 and v2 command:
1. For extended mode, v1 command prints the "Properties" row with an empty string whereas v2 command doesn't print the row if there are no properties.
2. v1 command puts an extra space after `,` among properties: e.g., `((a,b), (c,d))` in v1 vs. `((a,b),(c,d))` in v2.
3. v1 command uses `Database Name` vs. `Namespace Name` in v2.

1) and 2) are easy to resolve (e.g., just follow v1 behavior), but I wasn't sure of the best way to address 3). Do we need to unify this property name?
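The spacing difference in 2) comes down to the separator passed to `mkString`. A minimal standalone sketch (plain Scala, not the actual Spark code) showing how both formats arise:

```scala
// Standalone sketch (not Spark code): the mkString separator produces
// the v1 vs. v2 property formatting.
val props = Seq(("a", "b"), ("c", "d"))

// v1: ", " separator puts a space between pairs
println(props.mkString("(", ", ", ")"))  // ((a,b), (c,d))

// v2: "," separator yields the tighter form
println(props.mkString("(", ",", ")"))   // ((a,b),(c,d))
```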
I think it's OK to make this behavior change (database -> namespace). People can still fall back to the v1 command if needed.
The tests need to be a bit more complicated though, to allow different results between v1 and v2 commands.
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (outdated review thread, resolved)
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144729 has finished for PR 34429 at commit
@@ -47,4 +48,21 @@ trait DescribeNamespaceSuiteBase extends QueryTest with DDLCommandTestUtils {
    // TODO: Move this to DropNamespaceSuite when the test suite is introduced.
    sql(s"DROP NAMESPACE IF EXISTS $catalog.$ns")
  }

  test("Keep the legacy output schema") {
This is now run for both v1/v2 catalogs.
@@ -43,35 +43,21 @@ trait DescribeNamespaceSuiteBase extends command.DescribeNamespaceSuiteBase {
      .where("key not like 'Owner%'") // filter for consistency with in-memory catalog
      .collect()

    val namePrefix = if (conf.useV1Command) "Database" else "Namespace"
This will be the only difference between the v1 and v2 commands.
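A minimal sketch of this branching pattern in a shared test (the `useV1Command` parameter stands in for Spark's `conf.useV1Command`; names are illustrative):

```scala
// Illustrative sketch: pick the expected row-name prefix based on
// which command version is active.
def namePrefix(useV1Command: Boolean): String =
  if (useV1Command) "Database" else "Namespace"

println(s"${namePrefix(true)} Name")   // Database Name
println(s"${namePrefix(false)} Name")  // Namespace Name
```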
Kubernetes integration test starting
Test build #144766 has finished for PR 34429 at commit
Kubernetes integration test status failure
-      val properties = metadata.asScala -- CatalogV2Util.NAMESPACE_RESERVED_PROPERTIES
+      val properties = metadata.asScala.toMap -- CatalogV2Util.NAMESPACE_RESERVED_PROPERTIES
       if (properties.nonEmpty) {
         rows += toCatalystRow("Properties", properties.toSeq.mkString("(", ",", ")"))
Adding `toMap`, similar to sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala (line 321 in 8238cdd):

    properties = metadata.asScala.toMap --
The Scala Map implementation uses a list underneath for up to 4 elements and maintains it in insertion order, so tests such as this one in sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (line 778 in 8238cdd) rely on that ordering:

    Row("Properties", "((a,a), (b,b), (c,c), (d,d))") :: Nil)

Maybe just make the tests more robust?
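For background on the ordering remark above: Scala's immutable `Map` uses specialized small-map classes for up to four entries that iterate in insertion order, while larger maps use a hash-based structure with no ordering guarantee. A plain-Scala sketch:

```scala
// Immutable Maps with 1..4 entries (Map.Map1 .. Map.Map4) iterate in
// insertion order, so formatting them is deterministic at this size.
val small = Map("a" -> "a", "b" -> "b", "c" -> "c", "d" -> "d")
println(small.toSeq.mkString("(", ", ", ")"))  // ((a,a), (b,b), (c,c), (d,d))
// Beyond four entries a hash map is used and iteration order is
// unspecified, so tests asserting on raw Map ordering are fragile.
```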
I just changed the command to sort the properties by keys (similar to `show tblproperties`).
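Sorting by key makes the printed properties deterministic regardless of the Map's iteration order; a sketch of the idea (not the exact Spark code):

```scala
// Sort entries by key before formatting so the output no longer depends
// on the Map implementation's iteration order.
val properties = Map("c" -> "3", "a" -> "1", "b" -> "2")
println(properties.toSeq.sortBy(_._1).mkString("(", ", ", ")"))  // ((a,1), (b,2), (c,3))
```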
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144768 has finished for PR 34429 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144788 has finished for PR 34429 at commit
thanks, merging to master!
What changes were proposed in this pull request?
This PR proposes to use V2 commands as default as outlined in SPARK-36588; this PR migrates `DESCRIBE NAMESPACE` to use the v2 command by default.

Why are the changes needed?
It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.
Does this PR introduce any user-facing change?
For non-session catalogs (v2 command), the following are changing:
- The `Properties` row will be present with an empty value even if there is no property (same behavior as v1).
- Properties are now printed as `((a,b), (c,d))` (new) vs. `((a,b),(c,d))` (old). This is also the same behavior as v1.

For the session catalog, now that the v2 command will be used, we will use `Namespace Name` instead of `Database Name` for the name row. The user can fall back to using the v1 command by setting `spark.sql.legacy.useV1Command` to `true`.

How was this patch tested?
Added unit tests.