Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37150][SQL] Migrate DESCRIBE NAMESPACE to use V2 command by default #34429

Closed
wants to merge 5 commits into from

Conversation

imback82
Copy link
Contributor

@imback82 imback82 commented Oct 29, 2021

What changes were proposed in this pull request?

This PR proposes to use V2 commands as default as outlined in SPARK-36588, and this PR migrates DESCRIBE NAMESPACE to use v2 command by default.

Why are the changes needed?

It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.

Does this PR introduce any user-facing change?

For non-session catalogs (v2 command), the followings are changing:

  1. Properties row will be present with empty value even if there is no property (same behavior as v1).
  2. There will be an empty space between each property pair: e.g. ((a,b), (c,d)) (new) vs. ((a,b),(c,d)) (old). This is also the same behavior as v1.

For the session catalog, now that v2 command will be used, we will use Namespace Name instead of Database Name for the name row. The user can fall back to using v1 command by setting spark.sql.legacy.useV1Command to true.

How was this patch tested?

Added unit tests.

@github-actions github-actions bot added the SQL label Oct 29, 2021
@imback82 imback82 changed the title [WIP][SPARK-37150][SQL] Migrate SHOW TABLES to use V2 command by default [WIP][SPARK-37150][SQL] Migrate DESCRIBE NAMESPACE to use V2 command by default Oct 29, 2021
@imback82 imback82 marked this pull request as draft October 29, 2021 01:38
} else {
properties.toSeq.mkString("(", ", ", ")")
}
rows += toCatalystRow("Properties", propertiesStr)
Copy link
Contributor Author

@imback82 imback82 Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan These are the differences between v1 and v2 command:

  1. For extended mode, v1 command prints the "Properties" row with an empty string whereas v2 command doesn't print the row if there are no properties.
  2. v1 command puts extra space after , among properties: e.g., ((a,b), (c,d)) in v1 vs. ((a,b),(c,d)) in v2.
  3. v1 command uses Database Name vs. Namespace Name in v2.

1) and 2) are easy to resolve (e.g., just follow v1 behavior), but I wasn't sure the best way to address 3). Do we need to unify this property name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to make this behavior change (database -> namespace). People can still fallback to v1 command if needed.

The tests need to be a bit more complicated though, to allow different results between v1 and v2 commands.

@SparkQA
Copy link

SparkQA commented Oct 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49198/

@SparkQA
Copy link

SparkQA commented Oct 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49198/

@SparkQA
Copy link

SparkQA commented Oct 29, 2021

Test build #144729 has finished for PR 34429 at commit 8fa5201.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82 imback82 changed the title [WIP][SPARK-37150][SQL] Migrate DESCRIBE NAMESPACE to use V2 command by default [SPARK-37150][SQL] Migrate DESCRIBE NAMESPACE to use V2 command by default Oct 29, 2021
@imback82 imback82 marked this pull request as ready for review October 29, 2021 17:56
@@ -47,4 +48,21 @@ trait DescribeNamespaceSuiteBase extends QueryTest with DDLCommandTestUtils {
// TODO: Move this to DropNamespaceSuite when the test suite is introduced.
sql(s"DROP NAMESPACE IF EXISTS $catalog.$ns")
}

test("Keep the legacy output schema") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now run for both v1/v2 catalogs.

@@ -43,35 +43,21 @@ trait DescribeNamespaceSuiteBase extends command.DescribeNamespaceSuiteBase {
.where("key not like 'Owner%'") // filter for consistency with in-memory catalog
.collect()

val namePrefix = if (conf.useV1Command) "Database" else "Namespace"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be the only difference b/w v1 and v2 command.

@SparkQA
Copy link

SparkQA commented Oct 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49235/

@SparkQA
Copy link

SparkQA commented Oct 29, 2021

Test build #144766 has finished for PR 34429 at commit 628bf4c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49235/

val properties = metadata.asScala -- CatalogV2Util.NAMESPACE_RESERVED_PROPERTIES
if (properties.nonEmpty) {
rows += toCatalystRow("Properties", properties.toSeq.mkString("(", ",", ")"))
val properties = metadata.asScala.toMap -- CatalogV2Util.NAMESPACE_RESERVED_PROPERTIES
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding toMap similar to

.

The Scala Map implementation uses a list underneath up to 4 elements and maintains it in the order it was inserted. So this make tests such as

Row("Properties", "((a,a), (b,b), (c,c), (d,d))") :: Nil)
pass.

Maybe, just make the tests more robust?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed the command to sort the properties by keys (similar to show tblproperties).

@SparkQA
Copy link

SparkQA commented Oct 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49237/

@SparkQA
Copy link

SparkQA commented Oct 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49237/

@SparkQA
Copy link

SparkQA commented Oct 30, 2021

Test build #144768 has finished for PR 34429 at commit 988187c.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49258/

@SparkQA
Copy link

SparkQA commented Oct 31, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49258/

@SparkQA
Copy link

SparkQA commented Oct 31, 2021

Test build #144788 has finished for PR 34429 at commit 8da0022.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 13c372d Nov 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants