[SPARK-55250][SQL][FOLLOWUP] Skip createNamespace for IF NOT EXISTS on existing namespace#56027
Closed
cloud-fan wants to merge 4 commits into
Closed
[SPARK-55250][SQL][FOLLOWUP] Skip createNamespace for IF NOT EXISTS on existing namespace#56027cloud-fan wants to merge 4 commits into
cloud-fan wants to merge 4 commits into
Conversation
…n existing namespace SPARK-55250 removed the namespaceExists short-circuit in CreateNamespaceExec. For SupportsNamespaces implementations that validate the request before checking existence, calling createNamespace on an already-existing namespace under IF NOT EXISTS can surface unrelated errors. Restore the short-circuit for the IF NOT EXISTS case only; the no-IF-NOT-EXISTS path still issues a single createNamespace call, so the hive-client-calls perf win from SPARK-55250 is preserved. Generated-by: Claude Code (Opus 4.7)
Replace the upfront namespaceExists short-circuit with a recovery path: attempt createNamespace first (1 RPC, SPARK-55250 behavior) and only consult namespaceExists when createNamespace fails under IF NOT EXISTS. This keeps the 1-RPC happy path for every catalog AND fixes catalogs that surface unrelated errors (ACLs, properties) on a pre-existing namespace before they get a chance to throw NamespaceAlreadyExistsException. The extra RPC is paid only on the recovery path that was previously a hard error, so the change is a strict Pareto improvement over SPARK-55250. Generated-by: Claude Code (Opus 4.7)
Replace the standalone CreateNamespaceExecSuite with: - ValidatingInMemoryTableCatalog: an InMemoryTableCatalog subclass whose createNamespace validates before checking existence and raises a non-NamespaceAlreadyExistsException on a pre-existing namespace. - Registration in v2.CommandSuiteBase as `validating_test_catalog`. - One SQL-level test in v2.CreateNamespaceSuite that creates a namespace and re-runs CREATE NAMESPACE IF NOT EXISTS — exercises the recovery path end-to-end through the analyzer/planner/exec. Generated-by: Claude Code (Opus 4.7)
The validating catalog is only used by the SPARK-55250 regression test. Move the catalog registration out of the shared CommandSuiteBase and into the suite that needs it, so the rest of the command tests don't pull in an unused catalog binding. Generated-by: Claude Code (Opus 4.7)
pan3793
approved these changes
May 21, 2026
Contributor
Author
|
thanks for review, merging to master/4.x/4.2! |
cloud-fan
added a commit
that referenced
this pull request
May 21, 2026
…n existing namespace ### What changes were proposed in this pull request? Follow-up to SPARK-55250. Add a recovery path to `CreateNamespaceExec.run()` so `IF NOT EXISTS` is a no-op when the namespace already exists, even if the catalog surfaces an error other than `NamespaceAlreadyExistsException`: ```scala try { val ownership = Map(PROP_OWNER -> Utils.getCurrentUserName()) catalog.createNamespace(ns, (properties ++ ownership).asJava) } catch { case _: NamespaceAlreadyExistsException if ifNotExists => logWarning(...) case NonFatal(e) if ifNotExists => val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false } if (exists) logWarning(..., e) else throw e } ``` The unconditional `createNamespace` call introduced by SPARK-55250 is preserved as the first step, so the perf win from that PR is kept on every happy path. The `namespaceExists` fallback runs only when `createNamespace` has already failed — a path that was previously an unrecoverable error. ### Why are the changes needed? SPARK-55250 changed `CREATE NAMESPACE IF NOT EXISTS foo` from "check existence first, skip if present" to "always call `createNamespace`, catch `NamespaceAlreadyExistsException`". This relies on the catalog raising `NamespaceAlreadyExistsException` rather than some other error when the namespace is pre-existing. For `SupportsNamespaces` implementations that validate the request (ACLs, properties, etc.) before checking existence, this assumption doesn't hold: the validation error surfaces first, the `NamespaceAlreadyExistsException` is never thrown, and the `IF NOT EXISTS` no-op semantic is lost. The fix asks the only question that actually matters under `IF NOT EXISTS` after a failure: "does the namespace exist now?" If yes, intent satisfied; otherwise the original error propagates. ### Does this PR introduce _any_ user-facing change? Yes. `CREATE NAMESPACE IF NOT EXISTS foo` is again a no-op when `foo` already exists, regardless of which error the catalog raised on the create attempt. This matches the pre-SPARK-55250 contract. RPC accounting (using UC as an example of a catalog that validates before existence check): | Scenario | SPARK-55250 | This PR | |---|---|---| | no `IF NOT EXISTS` | 1 | 1 | | `IF NOT EXISTS`, foo absent | 1 | 1 | | `IF NOT EXISTS`, foo exists, create succeeds-or-throws-AlreadyExists | 1 | 1 | | `IF NOT EXISTS`, foo exists, create throws something else | 1 (surfaces error ❌) | 2 (recovers ✅) | ### How was this patch tested? - New `ValidatingInMemoryTableCatalog` (an `InMemoryTableCatalog` subclass that validates before checking existence, so a pre-existing namespace raises a non-`NamespaceAlreadyExistsException`) registered as `validating_test_catalog` in `v2.CommandSuiteBase`. - New SQL-level regression test in `v2.CreateNamespaceSuite` that creates a namespace and re-runs `CREATE NAMESPACE IF NOT EXISTS` against that catalog — fails on master, passes with this PR. - Existing `org.apache.spark.sql.hive.execution.command.CreateNamespaceSuite` "hive client calls" test still asserts exactly 1 RPC for each of the three `CREATE NAMESPACE` shapes it covers — confirming SPARK-55250's perf win is preserved on the happy path. - Existing v1 / v2 `CreateNamespaceSuite` pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.7) Closes #56027 from cloud-fan/wenchen/SPARK-55250-followup. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 4131b00) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan
added a commit
that referenced
this pull request
May 21, 2026
…n existing namespace ### What changes were proposed in this pull request? Follow-up to SPARK-55250. Add a recovery path to `CreateNamespaceExec.run()` so `IF NOT EXISTS` is a no-op when the namespace already exists, even if the catalog surfaces an error other than `NamespaceAlreadyExistsException`: ```scala try { val ownership = Map(PROP_OWNER -> Utils.getCurrentUserName()) catalog.createNamespace(ns, (properties ++ ownership).asJava) } catch { case _: NamespaceAlreadyExistsException if ifNotExists => logWarning(...) case NonFatal(e) if ifNotExists => val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false } if (exists) logWarning(..., e) else throw e } ``` The unconditional `createNamespace` call introduced by SPARK-55250 is preserved as the first step, so the perf win from that PR is kept on every happy path. The `namespaceExists` fallback runs only when `createNamespace` has already failed — a path that was previously an unrecoverable error. ### Why are the changes needed? SPARK-55250 changed `CREATE NAMESPACE IF NOT EXISTS foo` from "check existence first, skip if present" to "always call `createNamespace`, catch `NamespaceAlreadyExistsException`". This relies on the catalog raising `NamespaceAlreadyExistsException` rather than some other error when the namespace is pre-existing. For `SupportsNamespaces` implementations that validate the request (ACLs, properties, etc.) before checking existence, this assumption doesn't hold: the validation error surfaces first, the `NamespaceAlreadyExistsException` is never thrown, and the `IF NOT EXISTS` no-op semantic is lost. The fix asks the only question that actually matters under `IF NOT EXISTS` after a failure: "does the namespace exist now?" If yes, intent satisfied; otherwise the original error propagates. ### Does this PR introduce _any_ user-facing change? Yes. `CREATE NAMESPACE IF NOT EXISTS foo` is again a no-op when `foo` already exists, regardless of which error the catalog raised on the create attempt. This matches the pre-SPARK-55250 contract. RPC accounting (using UC as an example of a catalog that validates before existence check): | Scenario | SPARK-55250 | This PR | |---|---|---| | no `IF NOT EXISTS` | 1 | 1 | | `IF NOT EXISTS`, foo absent | 1 | 1 | | `IF NOT EXISTS`, foo exists, create succeeds-or-throws-AlreadyExists | 1 | 1 | | `IF NOT EXISTS`, foo exists, create throws something else | 1 (surfaces error ❌) | 2 (recovers ✅) | ### How was this patch tested? - New `ValidatingInMemoryTableCatalog` (an `InMemoryTableCatalog` subclass that validates before checking existence, so a pre-existing namespace raises a non-`NamespaceAlreadyExistsException`) registered as `validating_test_catalog` in `v2.CommandSuiteBase`. - New SQL-level regression test in `v2.CreateNamespaceSuite` that creates a namespace and re-runs `CREATE NAMESPACE IF NOT EXISTS` against that catalog — fails on master, passes with this PR. - Existing `org.apache.spark.sql.hive.execution.command.CreateNamespaceSuite` "hive client calls" test still asserts exactly 1 RPC for each of the three `CREATE NAMESPACE` shapes it covers — confirming SPARK-55250's perf win is preserved on the happy path. - Existing v1 / v2 `CreateNamespaceSuite` pass. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Code (Opus 4.7) Closes #56027 from cloud-fan/wenchen/SPARK-55250-followup. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 4131b00) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Follow-up to SPARK-55250. Add a recovery path to
CreateNamespaceExec.run()soIF NOT EXISTSis a no-op when the namespace already exists, even if the catalog surfaces an error other thanNamespaceAlreadyExistsException:The unconditional
createNamespacecall introduced by SPARK-55250 is preserved as the first step, so the perf win from that PR is kept on every happy path. ThenamespaceExistsfallback runs only whencreateNamespacehas already failed — a path that was previously an unrecoverable error.Why are the changes needed?
SPARK-55250 changed
CREATE NAMESPACE IF NOT EXISTS foofrom "check existence first, skip if present" to "always callcreateNamespace, catchNamespaceAlreadyExistsException". This relies on the catalog raisingNamespaceAlreadyExistsExceptionrather than some other error when the namespace is pre-existing.For
SupportsNamespacesimplementations that validate the request (ACLs, properties, etc.) before checking existence, this assumption doesn't hold: the validation error surfaces first, theNamespaceAlreadyExistsExceptionis never thrown, and theIF NOT EXISTSno-op semantic is lost.The fix asks the only question that actually matters under
IF NOT EXISTSafter a failure: "does the namespace exist now?" If yes, intent satisfied; otherwise the original error propagates.Does this PR introduce any user-facing change?
Yes.
CREATE NAMESPACE IF NOT EXISTS foois again a no-op whenfooalready exists, regardless of which error the catalog raised on the create attempt. This matches the pre-SPARK-55250 contract.RPC accounting (using UC as an example of a catalog that validates before existence check):
IF NOT EXISTSIF NOT EXISTS, foo absentIF NOT EXISTS, foo exists, create succeeds-or-throws-AlreadyExistsIF NOT EXISTS, foo exists, create throws something elseHow was this patch tested?
ValidatingInMemoryTableCatalog(anInMemoryTableCatalogsubclass that validates before checking existence, so a pre-existing namespace raises a non-NamespaceAlreadyExistsException) registered asvalidating_test_cataloginv2.CommandSuiteBase.v2.CreateNamespaceSuitethat creates a namespace and re-runsCREATE NAMESPACE IF NOT EXISTSagainst that catalog — fails on master, passes with this PR.org.apache.spark.sql.hive.execution.command.CreateNamespaceSuite"hive client calls" test still asserts exactly 1 RPC for each of the threeCREATE NAMESPACEshapes it covers — confirming SPARK-55250's perf win is preserved on the happy path.CreateNamespaceSuitepass.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)