[SPARK-37150][SQL] Migrate DESCRIBE NAMESPACE to use V2 command by default #34429
Conversation
    } else {
      properties.toSeq.mkString("(", ", ", ")")
    }
    rows += toCatalystRow("Properties", propertiesStr)
@cloud-fan These are the differences between v1 and v2 command:
1. For extended mode, v1 command prints the "Properties" row with an empty string whereas v2 command doesn't print the row if there are no properties.
2. v1 command puts an extra space after `,` among properties: e.g., `((a,b), (c,d))` in v1 vs. `((a,b),(c,d))` in v2.
3. v1 command uses `Database Name` vs. `Namespace Name` in v2.

1) and 2) are easy to resolve (e.g., just follow v1 behavior), but I wasn't sure of the best way to address 3). Do we need to unify this property name?
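The spacing difference in 2) comes down to the separator passed to `mkString`. A minimal standalone sketch (plain Scala, not the actual Spark code) showing how both formats arise:

```scala
// Standalone sketch (not Spark code): the mkString separator produces
// the v1 vs. v2 property formatting.
val props = Seq(("a", "b"), ("c", "d"))

// v1: ", " separator puts a space between pairs
println(props.mkString("(", ", ", ")"))  // ((a,b), (c,d))

// v2: "," separator yields the tighter form
println(props.mkString("(", ",", ")"))   // ((a,b),(c,d))
```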
I think it's OK to make this behavior change (database -> namespace). People can still fall back to the v1 command if needed.
The tests need to be a bit more complicated though, to allow different results between v1 and v2 commands.
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (outdated review thread, resolved)
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144729 has finished for PR 34429 at commit
@@ -47,4 +48,21 @@ trait DescribeNamespaceSuiteBase extends QueryTest with DDLCommandTestUtils {
    // TODO: Move this to DropNamespaceSuite when the test suite is introduced.
    sql(s"DROP NAMESPACE IF EXISTS $catalog.$ns")
  }

  test("Keep the legacy output schema") {
This is now run for both v1/v2 catalogs.
@@ -43,35 +43,21 @@ trait DescribeNamespaceSuiteBase extends command.DescribeNamespaceSuiteBase {
      .where("key not like 'Owner%'") // filter for consistency with in-memory catalog
      .collect()

    val namePrefix = if (conf.useV1Command) "Database" else "Namespace"
This will be the only difference between the v1 and v2 commands.
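A minimal sketch of this branching pattern in a shared test (the `useV1Command` parameter stands in for Spark's `conf.useV1Command`; names are illustrative):

```scala
// Illustrative sketch: pick the expected row-name prefix based on
// which command version is active.
def namePrefix(useV1Command: Boolean): String =
  if (useV1Command) "Database" else "Namespace"

println(s"${namePrefix(true)} Name")   // Database Name
println(s"${namePrefix(false)} Name")  // Namespace Name
```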
Kubernetes integration test starting
Test build #144766 has finished for PR 34429 at commit
Kubernetes integration test status failure
-      val properties = metadata.asScala -- CatalogV2Util.NAMESPACE_RESERVED_PROPERTIES
+      val properties = metadata.asScala.toMap -- CatalogV2Util.NAMESPACE_RESERVED_PROPERTIES
       if (properties.nonEmpty) {
         rows += toCatalystRow("Properties", properties.toSeq.mkString("(", ",", ")"))
Adding `toMap`, similar to sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala (line 321 in 8238cdd):

    properties = metadata.asScala.toMap --
The Scala Map implementation uses a list underneath for up to 4 elements and maintains it in insertion order, so tests such as this one in sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (line 778 in 8238cdd) rely on that ordering:

    Row("Properties", "((a,a), (b,b), (c,c), (d,d))") :: Nil)

Maybe just make the tests more robust?
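For background on the ordering remark above: Scala's immutable `Map` uses specialized small-map classes for up to four entries that iterate in insertion order, while larger maps use a hash-based structure with no ordering guarantee. A plain-Scala sketch:

```scala
// Immutable Maps with 1..4 entries (Map.Map1 .. Map.Map4) iterate in
// insertion order, so formatting them is deterministic at this size.
val small = Map("a" -> "a", "b" -> "b", "c" -> "c", "d" -> "d")
println(small.toSeq.mkString("(", ", ", ")"))  // ((a,a), (b,b), (c,c), (d,d))
// Beyond four entries a hash map is used and iteration order is
// unspecified, so tests asserting on raw Map ordering are fragile.
```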
I just changed the command to sort the properties by keys (similar to `show tblproperties`).
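Sorting by key makes the printed properties deterministic regardless of the Map's iteration order; a sketch of the idea (not the exact Spark code):

```scala
// Sort entries by key before formatting so the output no longer depends
// on the Map implementation's iteration order.
val properties = Map("c" -> "3", "a" -> "1", "b" -> "2")
println(properties.toSeq.sortBy(_._1).mkString("(", ", ", ")"))  // ((a,1), (b,2), (c,3))
```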
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144768 has finished for PR 34429 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #144788 has finished for PR 34429 at commit
thanks, merging to master!
What changes were proposed in this pull request?
This PR proposes to use V2 commands as default as outlined in SPARK-36588; this PR migrates `DESCRIBE NAMESPACE` to use the v2 command by default.

Why are the changes needed?
It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.
Does this PR introduce any user-facing change?
For non-session catalogs (v2 command), the following are changing:
- The `Properties` row will be present with an empty value even if there is no property (same behavior as v1).
- Properties are now printed as `((a,b), (c,d))` (new) vs. `((a,b),(c,d))` (old). This is also the same behavior as v1.

For the session catalog, now that the v2 command will be used, we will use `Namespace Name` instead of `Database Name` for the name row. The user can fall back to using the v1 command by setting `spark.sql.legacy.useV1Command` to `true`.

How was this patch tested?
Added unit tests.