
@pan3793 (Member) commented Jan 28, 2026

What changes were proposed in this pull request?

This PR reduces Hive client calls by eliminating the unnecessary catalog.databaseExists check in CreateNamespaceExec. With this change, CREATE NAMESPACE [IF NOT EXISTS] foo.bar makes 1 Hive client call instead of 3.
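To illustrate the shape of the change, here is a minimal Java sketch under stated assumptions: `CountingCatalog`, `CreateNamespaceFlows`, and the exception class are hypothetical stand-ins, not Spark classes, and each catalog method call is assumed to model one metastore round trip. It models only the removed existence check, so it shows 2 calls dropping to 1 rather than reproducing the real 3-to-1 count.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for Spark's NamespaceAlreadyExistsException.
class NamespaceAlreadyExistsException extends Exception {
  NamespaceAlreadyExistsException(String ns) {
    super("Namespace '" + ns + "' already exists");
  }
}

// Hypothetical catalog; every method call models one metastore RPC.
class CountingCatalog {
  private final Set<String> namespaces = new HashSet<>();
  int calls = 0;

  boolean namespaceExists(String ns) {
    calls++;
    return namespaces.contains(ns);
  }

  void createNamespace(String ns) throws NamespaceAlreadyExistsException {
    calls++;
    if (!namespaces.add(ns)) {
      throw new NamespaceAlreadyExistsException(ns);
    }
  }
}

class CreateNamespaceFlows {
  // Before: ask whether the namespace exists, then create it.
  static void checkThenCreate(CountingCatalog catalog, String ns, boolean ifNotExists)
      throws NamespaceAlreadyExistsException {
    if (!catalog.namespaceExists(ns)) {
      catalog.createNamespace(ns);
    } else if (!ifNotExists) {
      throw new NamespaceAlreadyExistsException(ns);
    }
  }

  // After: create directly and rely on the createNamespace contract.
  static void createDirectly(CountingCatalog catalog, String ns, boolean ifNotExists)
      throws NamespaceAlreadyExistsException {
    try {
      catalog.createNamespace(ns);
    } catch (NamespaceAlreadyExistsException e) {
      if (!ifNotExists) {
        throw e; // only IF NOT EXISTS turns the error into a no-op
      }
    }
  }

  // Counts catalog calls for one CREATE NAMESPACE IF NOT EXISTS on an empty catalog.
  static int callsOnEmptyCatalog(boolean useOldFlow) {
    CountingCatalog catalog = new CountingCatalog();
    try {
      if (useOldFlow) {
        checkThenCreate(catalog, "foo.bar", true);
      } else {
        createDirectly(catalog, "foo.bar", true);
      }
    } catch (NamespaceAlreadyExistsException e) {
      throw new IllegalStateException("unexpected on an empty catalog", e);
    }
    return catalog.calls;
  }
}
```

In this model, `callsOnEmptyCatalog(true)` returns 2 and `callsOnEmptyCatalog(false)` returns 1; the semantics for an existing namespace are unchanged because the exception is still raised, just by the catalog instead of a separate check.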

Why are the changes needed?

To improve performance by reducing RPCs to the Hive metastore.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a unit test.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Jan 28, 2026

@github-actions

JIRA Issue Information

=== Improvement SPARK-55250 ===
Summary: Reduce Hive client calls on CREATE NAMESPACE
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

import org.apache.spark.sql.connector.catalog.SupportsNamespaces._

val ns = namespace.toArray
if (!catalog.namespaceExists(ns)) {
@pan3793 (Member, Author):

We don't need this check: by contract, createNamespace throws NamespaceAlreadyExistsException if the namespace already exists.

/**
 * Create a namespace in the catalog.
 *
 * @param namespace a multi-part namespace
 * @param metadata a string map of properties for the given namespace
 * @throws NamespaceAlreadyExistsException If the namespace already exists
 * @throws UnsupportedOperationException If create is not a supported operation
 */
void createNamespace(
    String[] namespace,
    Map<String, String> metadata) throws NamespaceAlreadyExistsException;

This also makes the operation atomic. Previously, another request could create the namespace between catalog.namespaceExists and catalog.createNamespace. We should delegate that case to the connector to handle.
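The connector-side half of that argument can be sketched in Java. `AtomicInMemoryCatalog` is a hypothetical connector, not Spark code, that gets its atomicity from `ConcurrentHashMap.newKeySet()`; how a real connector achieves it is outside this sketch. With no existence check on the Spark side, the catalog alone decides the race, so concurrent IF NOT EXISTS requests end with exactly one creator while the others treat NamespaceAlreadyExistsException as a no-op.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Spark's NamespaceAlreadyExistsException.
class NamespaceAlreadyExistsException extends Exception {
  NamespaceAlreadyExistsException(String ns) {
    super("Namespace '" + ns + "' already exists");
  }
}

// Hypothetical connector whose createNamespace is atomic.
class AtomicInMemoryCatalog {
  private final Set<String> namespaces = ConcurrentHashMap.newKeySet();

  void createNamespace(String ns) throws NamespaceAlreadyExistsException {
    // add() on a concurrent key set is atomic: exactly one caller gets true.
    if (!namespaces.add(ns)) {
      throw new NamespaceAlreadyExistsException(ns);
    }
  }
}

class ConcurrentCreateDemo {
  // Fires `threads` concurrent CREATE NAMESPACE IF NOT EXISTS requests for the
  // same name and returns how many of them actually created it.
  static int createdCount(int threads) {
    AtomicInMemoryCatalog catalog = new AtomicInMemoryCatalog();
    AtomicInteger created = new AtomicInteger();
    CountDownLatch start = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.submit(() -> {
        start.await(); // release all requests together to maximize contention
        try {
          catalog.createNamespace("foo.bar");
          created.incrementAndGet();
        } catch (NamespaceAlreadyExistsException e) {
          // IF NOT EXISTS: losing the race is not an error
        }
        return null;
      });
    }
    start.countDown();
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("requests did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return created.get();
  }
}
```

`createdCount(8)` returns 1 on every run, because the set's `add` is the single point of decision; no request fails.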

val ownership = Map(PROP_OWNER -> Utils.getCurrentUserName())
catalog.createNamespace(ns, (properties ++ ownership).asJava)
} catch {
case _: NamespaceAlreadyExistsException if ifNotExists =>
Contributor:

Can you point out where Spark throws NamespaceAlreadyExistsException? Is it in HiveExternalCatalog or HiveClient?

@pan3793 (Member, Author) Jan 28, 2026

It happens here, in the Hive client:

override def createDatabase(
    database: CatalogDatabase,
    ignoreIfExists: Boolean): Unit = withHiveState {
  val hiveDb = toHiveDatabase(database, Some(userName))
  try {
    shim.createDatabase(client, hiveDb, ignoreIfExists)
  } catch {
    case _: AlreadyExistsException =>
      throw new DatabaseAlreadyExistsException(database.name)
  }
}

class DatabaseAlreadyExistsException(db: String)
  extends NamespaceAlreadyExistsException(Array(db))
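The key point of this answer is the subclass relationship: a DatabaseAlreadyExistsException thrown by the Hive client is also a NamespaceAlreadyExistsException, so a catch that names only the parent type handles it, assuming the exception reaches CreateNamespaceExec unwrapped. A minimal Java sketch of that translate-then-catch chain follows; `FakeHiveClient`, `TranslationDemo`, and the three exception classes are hypothetical stand-ins modeled on the snippets above, and the sketch assumes the metastore is called with ignoreIfExists = false.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins mirroring the hierarchy quoted above; not Spark's classes.
class NamespaceAlreadyExistsException extends Exception {
  NamespaceAlreadyExistsException(String[] namespace) {
    super("Namespace '" + String.join(".", namespace) + "' already exists");
  }
}

class DatabaseAlreadyExistsException extends NamespaceAlreadyExistsException {
  DatabaseAlreadyExistsException(String db) {
    super(new String[] {db});
  }
}

// Stand-in for the metastore's AlreadyExistsException.
class AlreadyExistsException extends Exception {
  AlreadyExistsException(String message) {
    super(message);
  }
}

class FakeHiveClient {
  private final Set<String> databases = new HashSet<>();

  // Models the metastore call with ignoreIfExists = false: a duplicate is an error.
  private void metastoreCreate(String db) throws AlreadyExistsException {
    if (!databases.add(db)) {
      throw new AlreadyExistsException("Database " + db + " already exists");
    }
  }

  // Models the client method quoted above: translate the metastore error.
  void createDatabase(String db) throws DatabaseAlreadyExistsException {
    try {
      metastoreCreate(db);
    } catch (AlreadyExistsException e) {
      throw new DatabaseAlreadyExistsException(db);
    }
  }
}

class TranslationDemo {
  // Models the IF NOT EXISTS catch in CreateNamespaceExec, which names only the
  // parent type; returns true if the database was newly created.
  static boolean createIfNotExists(FakeHiveClient client, String db) {
    try {
      client.createDatabase(db);
      return true;
    } catch (NamespaceAlreadyExistsException e) {
      return false; // a DatabaseAlreadyExistsException lands here as well
    }
  }
}
```

Calling `createIfNotExists` twice with the same name returns true and then false: the second call's metastore error is translated once and then caught through its parent type.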
