-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-55250][SQL] Reduce Hive client calls on CREATE NAMESPACE #54026
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
JIRA Issue Information=== Improvement SPARK-55250 === This comment was automatically generated by GitHub Actions |
| import org.apache.spark.sql.connector.catalog.SupportsNamespaces._ | ||
|
|
||
| val ns = namespace.toArray | ||
| if (!catalog.namespaceExists(ns)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need this check because createNamespace will throw NamespaceAlreadyExistsException if the namespace already exists, by contract.
spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java
Lines 121 to 131 in 9ce067c
| /** | |
| * Create a namespace in the catalog. | |
| * | |
| * @param namespace a multi-part namespace | |
| * @param metadata a string map of properties for the given namespace | |
| * @throws NamespaceAlreadyExistsException If the namespace already exists | |
| * @throws UnsupportedOperationException If create is not a supported operation | |
| */ | |
| void createNamespace( | |
| String[] namespace, | |
| Map<String, String> metadata) throws NamespaceAlreadyExistsException; |
this also makes it atomic, previously, there are chance that the namespace is created by another request between catalog.namespaceExists and catalog.createNamespace, we should delegate it to the Connector to handle that.
| val ownership = Map(PROP_OWNER -> Utils.getCurrentUserName()) | ||
| catalog.createNamespace(ns, (properties ++ ownership).asJava) | ||
| } catch { | ||
| case _: NamespaceAlreadyExistsException if ifNotExists => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you pint out where Spark throws NamespaceAlreadyExistsException? In HiveExternalCatalog or HiveClient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happens here
spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
Lines 345 to 355 in 589fedc
| override def createDatabase( | |
| database: CatalogDatabase, | |
| ignoreIfExists: Boolean): Unit = withHiveState { | |
| val hiveDb = toHiveDatabase(database, Some(userName)) | |
| try { | |
| shim.createDatabase(client, hiveDb, ignoreIfExists) | |
| } catch { | |
| case _: AlreadyExistsException => | |
| throw new DatabaseAlreadyExistsException(database.name) | |
| } | |
| } |
spark/sql/api/src/main/scala/org/apache/spark/sql/catalyst/analysis/alreadyExistException.scala
Lines 31 to 32 in 589fedc
| class DatabaseAlreadyExistsException(db: String) | |
| extends NamespaceAlreadyExistsException(Array(db)) |
What changes were proposed in this pull request?
This PR reduces Hive client calls by eliminating unnecessary
catalog.databaseExistsinCreateNamespaceExec. Now the Hive client calls ofCREATE NAMESPACE [IF NOT EXISTS] foo.bardecreased from 3 to 1.Why are the changes needed?
Improve perf by reducing RPC.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT is added.
Was this patch authored or co-authored using generative AI tooling?
No.