-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-37444][SQL] ALTER NAMESPACE ... SET LOCATION should handle empty location consistently across v1 and v2 command #34686
Conversation
Kubernetes integration test starting |
Kubernetes integration test status failure |
cc: @cloud-fan (GA test failure is from |
Test build #145525 has finished for PR 34686 at commit
|
@@ -349,7 +351,7 @@ case class SetNamespaceProperties( | |||
*/ | |||
case class SetNamespaceLocation( | |||
namespace: LogicalPlan, | |||
location: String) extends UnaryCommand { | |||
location: URI) extends UnaryCommand { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a bit weird to have URI
in the logical plan, can we keep it as String
type and just put the qualified path string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and can you refer to the code for qualifying the path string in v1 commands? Let's make sure v1 and v2 are truly consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. This is where qualified path is generated for v1:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
Line 285 in c1029dc
name = dbName, locationUri = makeQualifiedDBPath(dbDefinition.locationUri))) |
val fs = hadoopPath.getFileSystem(hadoopConf) | ||
fs.makeQualified(hadoopPath).toUri | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are copied from SessionCatalog.scala
@@ -311,10 +313,12 @@ class DataSourceV2Strategy(session: SparkSession) extends Strategy with Predicat | |||
AlterNamespaceSetPropertiesExec(catalog.asNamespaceCatalog, ns, properties) :: Nil | |||
|
|||
case SetNamespaceLocation(ResolvedNamespace(catalog, ns), location) => | |||
val nsPath = CatalogUtils.makeQualifiedNamespacePath( | |||
CatalogUtils.stringToURI(location), conf.warehousePath, session.sharedState.hadoopConf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will now qualify the namespace location for v2 command.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
session.conf
will contain the qualified warehouse path, but using SQLConf is fine since it will be qualified anyway (same as v1):
SharedState.setWarehousePathConf(confClone, hadoopConfClone, qualified) |
Kubernetes integration test starting |
Kubernetes integration test status failure |
@@ -44,7 +45,8 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap | |||
import org.apache.spark.storage.StorageLevel | |||
import org.apache.spark.unsafe.types.UTF8String | |||
|
|||
class DataSourceV2Strategy(session: SparkSession) extends Strategy with PredicateHelper { | |||
class DataSourceV2Strategy(session: SparkSession) extends Strategy | |||
with PredicateHelper with SQLConfHelper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need SQLConfHelper
. We can get the conf by session.sessionState.conf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
val ns = "testcat.ns1.ns2" | ||
withNamespace(ns) { | ||
sql(s"CREATE NAMESPACE IF NOT EXISTS $ns COMMENT " + | ||
"'test namespace' LOCATION '/tmp/ns_test_1'") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we do the same thing for CREATE NAMESPACE? We can do it in a separated PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we should fix this. I will do it in a separate PR. Thanks!
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #145565 has finished for PR 34686 at commit
|
Test build #145570 has finished for PR 34686 at commit
|
thanks, merging to master! |
… command by default ### What changes were proposed in this pull request? This PR proposes to use V2 commands as default as outlined in [SPARK-36588](https://issues.apache.org/jira/browse/SPARK-36588), and this PR migrates `ALTER NAMESPACE ... SET LOCATION` to use v2 command by default. Note that the work to make v1/v2 commands consistent was done in #34686 and tests covering both v1/v2 were done in #34610. ### Why are the changes needed? It's been a while since we introduced the v2 commands, and it seems reasonable to use v2 commands by default even for the session catalog, with a legacy config to fall back to the v1 commands. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing *AlterNamespaceSetLocationSuite tests cover this PR's change. Closes #34708 from imback82/v2_alter_ns_location. Authored-by: Terry Kim <yuminkim@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Currently, there is an inconsistency when handling an empty location for
ALTER NAMESPACE .. SET LOCATION
between v1 and v2 command. In v1 command, an empty string location will result in theIllegalArgumentException
exception thrown whereas v2 uses the empty string as it is.This PR proposes to make the behavior consistent by following the v1 command behavior.
Why are the changes needed?
To make the behavior consistent and the reason for following v1 behavior is that "Spark should be responsible to qualify the user-specified path using its spark/hadoop configs, before passing the path to v2 sources": #34610 (comment)
Does this PR introduce any user-facing change?
Yes, now the empty string location will result in the
IllegalArgumentException
exception thrown even for v2 catalogs.How was this patch tested?
Added a new test