Skip to content

[SPARK-55250][SQL][FOLLOWUP] Skip createNamespace for IF NOT EXISTS on existing namespace#56027

Closed
cloud-fan wants to merge 4 commits into
apache:masterfrom
cloud-fan:wenchen/SPARK-55250-followup
Closed

[SPARK-55250][SQL][FOLLOWUP] Skip createNamespace for IF NOT EXISTS on existing namespace#56027
cloud-fan wants to merge 4 commits into
apache:masterfrom
cloud-fan:wenchen/SPARK-55250-followup

Conversation

@cloud-fan
Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan commented May 21, 2026

What changes were proposed in this pull request?

Follow-up to SPARK-55250. Add a recovery path to CreateNamespaceExec.run() so IF NOT EXISTS is a no-op when the namespace already exists, even if the catalog surfaces an error other than NamespaceAlreadyExistsException:

try {
  val ownership = Map(PROP_OWNER -> Utils.getCurrentUserName())
  catalog.createNamespace(ns, (properties ++ ownership).asJava)
} catch {
  case _: NamespaceAlreadyExistsException if ifNotExists =>
    logWarning(...)
  case NonFatal(e) if ifNotExists =>
    val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false }
    if (exists) logWarning(..., e) else throw e
}

The unconditional createNamespace call introduced by SPARK-55250 is preserved as the first step, so the perf win from that PR is kept on every happy path. The namespaceExists fallback runs only when createNamespace has already failed — a path that was previously an unrecoverable error.

Why are the changes needed?

SPARK-55250 changed CREATE NAMESPACE IF NOT EXISTS foo from "check existence first, skip if present" to "always call createNamespace, catch NamespaceAlreadyExistsException". This relies on the catalog raising NamespaceAlreadyExistsException rather than some other error when the namespace is pre-existing.

For SupportsNamespaces implementations that validate the request (ACLs, properties, etc.) before checking existence, this assumption doesn't hold: the validation error surfaces first, the NamespaceAlreadyExistsException is never thrown, and the IF NOT EXISTS no-op semantic is lost.

The fix asks the only question that actually matters under IF NOT EXISTS after a failure: "does the namespace exist now?" If yes, intent satisfied; otherwise the original error propagates.

Does this PR introduce any user-facing change?

Yes. CREATE NAMESPACE IF NOT EXISTS foo is again a no-op when foo already exists, regardless of which error the catalog raised on the create attempt. This matches the pre-SPARK-55250 contract.

RPC accounting (using UC as an example of a catalog that validates before existence check):

Scenario SPARK-55250 This PR
no IF NOT EXISTS 1 1
IF NOT EXISTS, foo absent 1 1
IF NOT EXISTS, foo exists, create succeeds-or-throws-AlreadyExists 1 1
IF NOT EXISTS, foo exists, create throws something else 1 (surfaces error ❌) 2 (recovers ✅)

How was this patch tested?

  • New ValidatingInMemoryTableCatalog (an InMemoryTableCatalog subclass that validates before checking existence, so a pre-existing namespace raises a non-NamespaceAlreadyExistsException) registered as validating_test_catalog in v2.CommandSuiteBase.
  • New SQL-level regression test in v2.CreateNamespaceSuite that creates a namespace and re-runs CREATE NAMESPACE IF NOT EXISTS against that catalog — fails on master, passes with this PR.
  • Existing org.apache.spark.sql.hive.execution.command.CreateNamespaceSuite "hive client calls" test still asserts exactly 1 RPC for each of the three CREATE NAMESPACE shapes it covers — confirming SPARK-55250's perf win is preserved on the happy path.
  • Existing v1 / v2 CreateNamespaceSuite pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

cloud-fan added 4 commits May 21, 2026 03:48
…n existing namespace

SPARK-55250 removed the namespaceExists short-circuit in CreateNamespaceExec.
For SupportsNamespaces implementations that validate the request before
checking existence, calling createNamespace on an already-existing namespace
under IF NOT EXISTS can surface unrelated errors. Restore the short-circuit
for the IF NOT EXISTS case only; the no-IF-NOT-EXISTS path still issues a
single createNamespace call, so the hive-client-calls perf win from
SPARK-55250 is preserved.

Generated-by: Claude Code (Opus 4.7)
Replace the upfront namespaceExists short-circuit with a recovery path:
attempt createNamespace first (1 RPC, SPARK-55250 behavior) and only consult
namespaceExists when createNamespace fails under IF NOT EXISTS. This keeps
the 1-RPC happy path for every catalog AND fixes catalogs that surface
unrelated errors (ACLs, properties) on a pre-existing namespace before they
get a chance to throw NamespaceAlreadyExistsException.

The extra RPC is paid only on the recovery path that was previously a hard
error, so the change is a strict Pareto improvement over SPARK-55250.

Generated-by: Claude Code (Opus 4.7)
Replace the standalone CreateNamespaceExecSuite with:
- ValidatingInMemoryTableCatalog: an InMemoryTableCatalog subclass whose
  createNamespace validates before checking existence and raises a
  non-NamespaceAlreadyExistsException on a pre-existing namespace.
- Registration in v2.CommandSuiteBase as `validating_test_catalog`.
- One SQL-level test in v2.CreateNamespaceSuite that creates a namespace
  and re-runs CREATE NAMESPACE IF NOT EXISTS — exercises the recovery
  path end-to-end through the analyzer/planner/exec.

Generated-by: Claude Code (Opus 4.7)
The validating catalog is only used by the SPARK-55250 regression test.
Move the catalog registration out of the shared CommandSuiteBase and into
the suite that needs it, so the rest of the command tests don't pull in an
unused catalog binding.

Generated-by: Claude Code (Opus 4.7)
@cloud-fan
Copy link
Copy Markdown
Contributor Author

thanks for review, merging to master/4.x/4.2!

@cloud-fan cloud-fan closed this in 4131b00 May 21, 2026
cloud-fan added a commit that referenced this pull request May 21, 2026
…n existing namespace

### What changes were proposed in this pull request?

Follow-up to SPARK-55250. Add a recovery path to `CreateNamespaceExec.run()` so `IF NOT EXISTS` is a no-op when the namespace already exists, even if the catalog surfaces an error other than `NamespaceAlreadyExistsException`:

```scala
try {
  val ownership = Map(PROP_OWNER -> Utils.getCurrentUserName())
  catalog.createNamespace(ns, (properties ++ ownership).asJava)
} catch {
  case _: NamespaceAlreadyExistsException if ifNotExists =>
    logWarning(...)
  case NonFatal(e) if ifNotExists =>
    val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false }
    if (exists) logWarning(..., e) else throw e
}
```

The unconditional `createNamespace` call introduced by SPARK-55250 is preserved as the first step, so the perf win from that PR is kept on every happy path. The `namespaceExists` fallback runs only when `createNamespace` has already failed — a path that was previously an unrecoverable error.

### Why are the changes needed?

SPARK-55250 changed `CREATE NAMESPACE IF NOT EXISTS foo` from "check existence first, skip if present" to "always call `createNamespace`, catch `NamespaceAlreadyExistsException`". This relies on the catalog raising `NamespaceAlreadyExistsException` rather than some other error when the namespace is pre-existing.

For `SupportsNamespaces` implementations that validate the request (ACLs, properties, etc.) before checking existence, this assumption doesn't hold: the validation error surfaces first, the `NamespaceAlreadyExistsException` is never thrown, and the `IF NOT EXISTS` no-op semantic is lost.

The fix asks the only question that actually matters under `IF NOT EXISTS` after a failure: "does the namespace exist now?" If yes, intent satisfied; otherwise the original error propagates.

### Does this PR introduce _any_ user-facing change?

Yes. `CREATE NAMESPACE IF NOT EXISTS foo` is again a no-op when `foo` already exists, regardless of which error the catalog raised on the create attempt. This matches the pre-SPARK-55250 contract.

RPC accounting (using UC as an example of a catalog that validates before existence check):

| Scenario | SPARK-55250 | This PR |
|---|---|---|
| no `IF NOT EXISTS` | 1 | 1 |
| `IF NOT EXISTS`, foo absent | 1 | 1 |
| `IF NOT EXISTS`, foo exists, create succeeds-or-throws-AlreadyExists | 1 | 1 |
| `IF NOT EXISTS`, foo exists, create throws something else | 1 (surfaces error ❌) | 2 (recovers ✅) |

### How was this patch tested?

- New `ValidatingInMemoryTableCatalog` (an `InMemoryTableCatalog` subclass that validates before checking existence, so a pre-existing namespace raises a non-`NamespaceAlreadyExistsException`) registered as `validating_test_catalog` in `v2.CommandSuiteBase`.
- New SQL-level regression test in `v2.CreateNamespaceSuite` that creates a namespace and re-runs `CREATE NAMESPACE IF NOT EXISTS` against that catalog — fails on master, passes with this PR.
- Existing `org.apache.spark.sql.hive.execution.command.CreateNamespaceSuite` "hive client calls" test still asserts exactly 1 RPC for each of the three `CREATE NAMESPACE` shapes it covers — confirming SPARK-55250's perf win is preserved on the happy path.
- Existing v1 / v2 `CreateNamespaceSuite` pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

Closes #56027 from cloud-fan/wenchen/SPARK-55250-followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 4131b00)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request May 21, 2026
…n existing namespace

### What changes were proposed in this pull request?

Follow-up to SPARK-55250. Add a recovery path to `CreateNamespaceExec.run()` so `IF NOT EXISTS` is a no-op when the namespace already exists, even if the catalog surfaces an error other than `NamespaceAlreadyExistsException`:

```scala
try {
  val ownership = Map(PROP_OWNER -> Utils.getCurrentUserName())
  catalog.createNamespace(ns, (properties ++ ownership).asJava)
} catch {
  case _: NamespaceAlreadyExistsException if ifNotExists =>
    logWarning(...)
  case NonFatal(e) if ifNotExists =>
    val exists = try catalog.namespaceExists(ns) catch { case NonFatal(_) => false }
    if (exists) logWarning(..., e) else throw e
}
```

The unconditional `createNamespace` call introduced by SPARK-55250 is preserved as the first step, so the perf win from that PR is kept on every happy path. The `namespaceExists` fallback runs only when `createNamespace` has already failed — a path that was previously an unrecoverable error.

### Why are the changes needed?

SPARK-55250 changed `CREATE NAMESPACE IF NOT EXISTS foo` from "check existence first, skip if present" to "always call `createNamespace`, catch `NamespaceAlreadyExistsException`". This relies on the catalog raising `NamespaceAlreadyExistsException` rather than some other error when the namespace is pre-existing.

For `SupportsNamespaces` implementations that validate the request (ACLs, properties, etc.) before checking existence, this assumption doesn't hold: the validation error surfaces first, the `NamespaceAlreadyExistsException` is never thrown, and the `IF NOT EXISTS` no-op semantic is lost.

The fix asks the only question that actually matters under `IF NOT EXISTS` after a failure: "does the namespace exist now?" If yes, intent satisfied; otherwise the original error propagates.

### Does this PR introduce _any_ user-facing change?

Yes. `CREATE NAMESPACE IF NOT EXISTS foo` is again a no-op when `foo` already exists, regardless of which error the catalog raised on the create attempt. This matches the pre-SPARK-55250 contract.

RPC accounting (using UC as an example of a catalog that validates before existence check):

| Scenario | SPARK-55250 | This PR |
|---|---|---|
| no `IF NOT EXISTS` | 1 | 1 |
| `IF NOT EXISTS`, foo absent | 1 | 1 |
| `IF NOT EXISTS`, foo exists, create succeeds-or-throws-AlreadyExists | 1 | 1 |
| `IF NOT EXISTS`, foo exists, create throws something else | 1 (surfaces error ❌) | 2 (recovers ✅) |

### How was this patch tested?

- New `ValidatingInMemoryTableCatalog` (an `InMemoryTableCatalog` subclass that validates before checking existence, so a pre-existing namespace raises a non-`NamespaceAlreadyExistsException`) registered as `validating_test_catalog` in `v2.CommandSuiteBase`.
- New SQL-level regression test in `v2.CreateNamespaceSuite` that creates a namespace and re-runs `CREATE NAMESPACE IF NOT EXISTS` against that catalog — fails on master, passes with this PR.
- Existing `org.apache.spark.sql.hive.execution.command.CreateNamespaceSuite` "hive client calls" test still asserts exactly 1 RPC for each of the three `CREATE NAMESPACE` shapes it covers — confirming SPARK-55250's perf win is preserved on the happy path.
- Existing v1 / v2 `CreateNamespaceSuite` pass.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

Closes #56027 from cloud-fan/wenchen/SPARK-55250-followup.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 4131b00)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants