Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37444][SQL] ALTER NAMESPACE ... SET LOCATION should handle empty location consistently across v1 and v2 command #34686

Closed
wants to merge 3 commits into from

Conversation

imback82
Copy link
Contributor

What changes were proposed in this pull request?

Currently, there is an inconsistency when handling an empty location for ALTER NAMESPACE .. SET LOCATION between v1 and v2 command. In v1 command, an empty string location will result in the IllegalArgumentException exception thrown whereas v2 uses the empty string as it is.

This PR proposes to make the behavior consistent by following the v1 command behavior.

Why are the changes needed?

To make the behavior consistent and the reason for following v1 behavior is that "Spark should be responsible to qualify the user-specified path using its spark/hadoop configs, before passing the path to v2 sources": #34610 (comment)

Does this PR introduce any user-facing change?

Yes, now the empty string location will result in the IllegalArgumentException exception thrown even for v2 catalogs.

How was this patch tested?

Added a new test

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49997/

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49997/

@imback82
Copy link
Contributor Author

cc: @cloud-fan (GA test failure is from org.apache.spark.sql.jdbc.DB2KrbIntegrationSuite)

@SparkQA
Copy link

SparkQA commented Nov 23, 2021

Test build #145525 has finished for PR 34686 at commit 28ce116.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AlterDatabaseSetLocationCommand(databaseName: String, location: URI)

@@ -349,7 +351,7 @@ case class SetNamespaceProperties(
*/
case class SetNamespaceLocation(
namespace: LogicalPlan,
location: String) extends UnaryCommand {
location: URI) extends UnaryCommand {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit weird to have URI in the logical plan, can we keep it as String type and just put the qualified path string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and can you refer to the code for qualifying the path string in v1 commands? Let's make sure v1 and v2 are truly consistent.

Copy link
Contributor Author

@imback82 imback82 Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. This is where qualified path is generated for v1:

name = dbName, locationUri = makeQualifiedDBPath(dbDefinition.locationUri)))

val fs = hadoopPath.getFileSystem(hadoopConf)
fs.makeQualified(hadoopPath).toUri
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are copied from SessionCatalog.scala

@@ -311,10 +313,12 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat
AlterNamespaceSetPropertiesExec(catalog.asNamespaceCatalog, ns, properties) :: Nil

case SetNamespaceLocation(ResolvedNamespace(catalog, ns), location) =>
val nsPath = CatalogUtils.makeQualifiedNamespacePath(
CatalogUtils.stringToURI(location), conf.warehousePath, session.sharedState.hadoopConf)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will now qualify the namespace location for v2 command.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

session.conf will contain the qualified warehouse path, but using SQLConf is fine since it will be qualified anyway (same as v1):

SharedState.setWarehousePathConf(confClone, hadoopConfClone, qualified)

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50037/

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50037/

@@ -44,7 +45,8 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap
import org.apache.spark.storage.StorageLevel
import org.apache.spark.unsafe.types.UTF8String

class DataSourceV2Strategy(session: SparkSession) extends Strategy with PredicateHelper {
class DataSourceV2Strategy(session: SparkSession) extends Strategy
with PredicateHelper with SQLConfHelper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need SQLConfHelper. We can get the conf by session.sessionState.conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

val ns = "testcat.ns1.ns2"
withNamespace(ns) {
sql(s"CREATE NAMESPACE IF NOT EXISTS $ns COMMENT " +
"'test namespace' LOCATION '/tmp/ns_test_1'")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we do the same thing for CREATE NAMESPACE? We can do it in a separated PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we should fix this. I will do it in a separate PR. Thanks!

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50042/

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50042/

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Test build #145565 has finished for PR 34686 at commit 8c0a8c9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AlterDatabaseSetLocationCommand(databaseName: String, location: String)
  • class DataSourceV2Strategy(session: SparkSession) extends Strategy

@SparkQA
Copy link

SparkQA commented Nov 24, 2021

Test build #145570 has finished for PR 34686 at commit 068473f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class DataSourceV2Strategy(session: SparkSession) extends Strategy with PredicateHelper

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 24561ca Nov 24, 2021
cloud-fan pushed a commit that referenced this pull request Nov 25, 2021
… command by default

### What changes were proposed in this pull request?

This PR proposes to use V2 commands as default as outlined in [SPARK-36588](https://issues.apache.org/jira/browse/SPARK-36588), and this PR migrates `ALTER NAMESPACE ... SET LOCATION` to use v2 command by default.

Note that the work to make v1/v2 commands consistent was done in #34686 and tests covering both v1/v2 were done in #34610.

### Why are the changes needed?

It's been a while since we introduced the v2 commands,  and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing *AlterNamespaceSetLocationSuite tests cover this PR's change.

Closes #34708 from imback82/v2_alter_ns_location.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants