Skip to content

fix(secrets): worker persistent-secret hygiene + DROP passthrough#658

Merged
fuziontech merged 6 commits into
mainfrom
fix/persistent-secret-ambiguity
Jun 2, 2026
Merged

fix(secrets): worker persistent-secret hygiene + DROP passthrough#658
fuziontech merged 6 commits into
mainfrom
fix/persistent-secret-ambiguity

Conversation

@fuziontech
Copy link
Copy Markdown
Member

@fuziontech fuziontech commented Jun 2, 2026

Problem

A user hit DuckDB errors that persisted for weeks:

Invalid Configuration Error: Ambiguity detected for secret name 'customer_db',
  secret occurs in multiple storage backends.
Invalid Input Error: Ambiguity found for secret name 'customer_warehouse_prod_s3',
  secret occurs in multiple storages: [local_file,memory]

Root cause. They ran CREATE PERSISTENT SECRET ... once, days/weeks ago. A bare CREATE OR REPLACE SECRET defaults to a temporary (memory) secret, and OR REPLACE only replaces within the same storage backend — so the persistent copy was never touched. It was written under the worker's DataDir (/data/.duckdb/stored_secrets, since the worker's HOME is /data), which survives across worker incarnations on a persistent/long-lived /data. Every activation re-creates the same-named in-memory secret, so the name ends up in both local_file and memory → every reference and every DROP is ambiguous.

Two compounding bugs made cleanup impossible:

  1. The fix DuckDB suggests — DROP <PERSISTENT|TEMPORARY> SECRET ... — was unrunnable: those forms don't start with "DROP SECRET", so they fell through to the PG transpiler and died with syntax error at or near "PERSISTENT".
  2. DROP SECRET ... FROM local_file is unreliable too: once the file is gone but the secret is still in DuckDB's startup-loaded registry, DuckDB throws IO Error: Failed to remove secret file ... (reproduced against real DuckDB).

Changes

  1. Passthrough fix (server/conn.go) — add DROP PERSISTENT SECRET / DROP TEMPORARY SECRET (+ IF EXISTS) to the DuckDB utility-command allowlist so they reach DuckDB instead of the PG parser.

  2. Pin the secret directory on workers (server/server.go + duckdbservice/duckdb_pair.go) — new Config.PinSecretDirectory, applied in ConfigureMainDB as SET secret_directory = '<DataDir>/secrets', set for every worker via OpenDuckDBPair (the worker-only DB factory; standalone uses openBaseDB and is untouched). Redirecting away from DuckDB's $HOME default means stale secrets in the old location are no longer loaded, so they can't collide with the in-memory secret re-created at activation — and the location is now deterministic and wipeable.

  3. Wipe persisted secrets on worker recycle (duckdbservice/service.go) — Warmup wipes both <DataDir>/secrets (pinned) and the legacy <DataDir>/.duckdb/stored_secrets on worker startup (== recycle; a shared-warm worker is single-org-bound for life). Physically clears the historical files that caused the incident so they don't linger on a persistent DataDir.

Why not allow_persistent_secrets = false

An earlier revision disabled persistent secrets outright. CI caught that it breaks the DuckLake ATTACH path (controlplane-tests + k8s-integration-tests): disabling unregisters DuckDB's local_file secret storage backend, which DuckLake depends on (Unknown secret storage found: 'local_file'). Reverted. Pinning leaves local_file registered (just relocated), so DuckLake is unaffected, while the redirect + recycle wipe still keep workers clean.

Tests

  • server/conn_test.go::TestIsDuckDBUtilityCommand — DROP variants, quoting, comments, case, word-boundary guards.
  • server/persistent_secrets_test.goTestPinSecretDirectory (a persistent secret still works and lands under the pinned dir); TestPinSecretDirectoryIgnoresLegacyDefault (a secret in the legacy location is not loaded once pinned away); TestSecretDirectory / LegacySecretDirectory derivation.
  • duckdbservice/secret_recycle_test.go::TestWipePersistedSecretsOnRecycle — full recycle contract against real DuckDB, non-vacuous control phase (secret survives a plain reopen), asserts both pinned and legacy dirs are wiped.
  • Full go test ./server/ + ./duckdbservice/, go vet, and just lint (CI-pinned golangci-lint) clean on touched files.

Why no tests/k8s unit-test for the wipe

Worker DataDir (/data) is an EmptyDir (controlplane/k8s_pool.go:827), destroyed on pod deletion — a pod-kill test would pass even with the wipe removed. The boundary the wipe guards is a process restart over a surviving DataDir (container restart, persistent volume, warm node), modeled directly by the unit test. (Note: the real DuckLake activation path is still covered by the existing controlplane/k8s integration suites, which is what caught the allow_persistent_secrets regression.)

Behavior changes to note

  • Standalone is unchanged — pinning is gated behind PinSecretDirectory, which only workers set. Standalone keeps DuckDB's $HOME default and full persistent-secret behavior, so upgrading doesn't relocate existing secrets.
  • Workers: persistent secrets still work, but live under <DataDir>/secrets and are wiped on recycle. CREATE/OR REPLACE` (in-memory) secrets — what duckgres and the user's scripts use — are unaffected.

Immediate user remediation (pre-merge)

The DROP loop won't recover from the partially-deleted state. Clear the files and recycle the worker:

kubectl exec <worker-pod> -n duckgres -- rm -rf /data/.duckdb/stored_secrets
# then restart/recycle that worker

If /data is a persistent volume, deleting the pod alone won't help. After this ships, a worker recycle does this automatically. The user can also drop the DROP ... FROM local_file loop from their setup script.

🤖 Generated with Claude Code

fuziontech and others added 3 commits June 2, 2026 08:35
A user hit DuckDB "secret occurs in multiple storage backends" errors that
persisted for weeks. Root cause: a CREATE PERSISTENT SECRET written days/weeks
ago landed in DuckDB's default secret store ($HOME/.duckdb/stored_secrets),
which survives across worker restarts on any non-ephemeral disk. Each
activation re-creates the same-named in-memory secret, so the name then exists
in both local_file and memory → every reference and every DROP is ambiguous.

Worse, the fix DuckDB itself suggests (DROP <PERSISTENT|TEMPORARY> SECRET ...)
was unrunnable: those forms don't start with "DROP SECRET", so they fell
through to the PG transpiler and died with `syntax error at or near
"PERSISTENT"`. The user could not clean up the state through duckgres at all.

Three changes:

1. server/conn.go: add DROP PERSISTENT/TEMPORARY SECRET (and IF EXISTS forms)
   to the DuckDB utility-command passthrough so they reach DuckDB untouched.

2. server/server.go: pin secret_directory to <DataDir>/secrets in
   ConfigureMainDB instead of DuckDB's $HOME default. Deterministic, per-worker,
   and wipeable. As a side benefit, redirecting the directory means stale
   secrets in the old default path stop being loaded, so they can no longer
   collide with the in-memory secrets re-created at activation.

3. duckdbservice/service.go: wipe the secret directory at worker startup in
   Warmup (duckdb-service mode only). A shared-warm worker is single-org-bound
   for its whole life, so process start == recycle; this guarantees a
   CREATE PERSISTENT SECRET from a prior incarnation cannot resurface or leak
   across tenants. Standalone mode is untouched.

Adds TestIsDuckDBUtilityCommand covering the DROP variants, quoting, comments,
case-insensitivity, and word-boundary guards.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract the Warmup wipe into wipePersistedSecrets and add a unit-level test
that exercises the full recycle contract against real DuckDB:

  1. CREATE PERSISTENT SECRET -> lands on disk under the pinned secret_directory
  2. plain reopen -> secret reloads (the bug; also proves the test is non-vacuous)
  3. wipePersistedSecrets (the exact Warmup call) -> directory gone
  4. reopen after recycle -> secret is gone

Done here rather than tests/k8s on purpose: worker DataDir is an EmptyDir, so a
pod-kill test would pass even with the wipe removed (pod replacement already
yields a fresh empty /data). The boundary the wipe actually guards is a process
restart over a surviving DataDir — container restart within a pod, or a
persistent volume / warm node — which this test models directly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the same incident: the DROP-based cleanup workaround
(DROP SECRET ... FROM local_file) is unreliable — once a persistent secret's
file has been removed out of band but the secret is still in DuckDB's
in-memory registry (loaded at startup), DuckDB throws

  IO Error: Failed to remove secret file '.../<name>.duckdb_secret',
  the file may have been removed by another duckdb instance.

which is exactly the state the user is in. Relocating secret_directory alone
also doesn't stop a user from re-creating the landmine.

Structural fix, scoped to workers:

- server: add Config.DisablePersistentSecrets; ConfigureMainDB applies
  `SET allow_persistent_secrets = false`. Verified against real DuckDB that
  this both (a) refuses CREATE PERSISTENT SECRET and (b) does not load
  pre-existing on-disk secrets — so the ambiguity class is impossible.
- duckdbservice: OpenDuckDBPair (the worker-only DB factory; standalone uses
  server.openBaseDB) sets the flag for every worker. Standalone is left able
  to use persistent secrets.
- duckdbservice: wipePersistedSecrets now also clears the legacy
  <DataDir>/.duckdb/stored_secrets path (where prod files actually live, since
  the worker's HOME is its DataDir), not just the pinned secret_directory.

Tests: server/persistent_secrets_test.go (disable refuses create + skips
loading + still allows in-memory secrets; SecretDirectory derivation);
extends TestWipePersistedSecretsOnRecycle to assert the legacy path is wiped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fuziontech fuziontech changed the title fix(secrets): stop persistent-secret storage ambiguity on workers fix(secrets): disable persistent secrets on workers + fix DROP passthrough Jun 2, 2026
fuziontech and others added 3 commits June 2, 2026 09:03
…errcheck

- ConfigureMainDB now pins secret_directory only when DisablePersistentSecrets
  is set (i.e. workers). Standalone keeps DuckDB's $HOME default, so upgrading
  no longer silently relocates an existing standalone user's persistent secrets.
- secret_recycle_test pins secret_directory directly (it needs persistent
  secrets enabled to create one to wipe, which the worker config disables).
- persistent_secrets_test: check db.Close() in defer to satisfy errcheck.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y, more tests

Follow-ups from review (none change behavior):

- conn.go: clarify in the comment that the DROP SECRET / DROP PERSISTENT SECRET
  prefixes are order-independent ("DROP SECRET" is not a string-prefix of
  "DROP PERSISTENT SECRET"), pre-empting a misread that the ordering is load-bearing.
- server.go: extract LegacySecretDirectory(cfg) colocated with SecretDirectory
  so the recycle wipe's two targets can't drift; wipePersistedSecrets and the
  recycle test now use it. Covered in TestSecretDirectory.
- persistent_secrets_test: also assert CREATE OR REPLACE PERSISTENT SECRET is
  rejected when persistent secrets are disabled (the variant most callers write).

Left as-is by design: SET secret_directory interpolates DataDir unescaped,
consistent with the sibling extension_directory/temp_directory/cache SETs — it's
trusted server config, and a SQL-escaping helper would be a separate
codebase-wide hardening change, not part of this incident fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e ATTACH

CI caught a regression: disabling persistent secrets unregisters DuckDB's
local_file secret storage backend, and the DuckLake ATTACH path depends on it:

  Failed to attach DuckLake: Invalid Input Error: ...
  Unknown secret storage found: 'local_file'

(controlplane-tests TestDuckLakeBooleanPredicatesAndMergeInControlPlane and
k8s-integration-tests both failed.) Reproduced in isolation: with
allow_persistent_secrets=false, any reference to local_file storage errors.

Revert the disable; keep the safe, sufficient pieces:
- secret_directory is pinned to <DataDir>/secrets on workers (rename the gating
  field DisablePersistentSecrets -> PinSecretDirectory to match). Pinning leaves
  local_file storage registered (just relocated), so DuckLake ATTACH is
  unaffected, while stale secrets in the old $HOME default stop being loaded.
- the recycle wipe (Warmup) still clears both the pinned and legacy dirs.
- the DROP PERSISTENT/TEMPORARY SECRET passthrough fix is unchanged.

Tests: replace TestDisablePersistentSecrets with TestPinSecretDirectory (a
persistent secret still works and lands under the pinned dir) and
TestPinSecretDirectoryIgnoresLegacyDefault (a secret in the legacy location is
not loaded once pinned away).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@fuziontech fuziontech changed the title fix(secrets): disable persistent secrets on workers + fix DROP passthrough fix(secrets): worker persistent-secret hygiene + DROP passthrough Jun 2, 2026
@fuziontech fuziontech merged commit 2e3f3f4 into main Jun 2, 2026
22 checks passed
@fuziontech fuziontech deleted the fix/persistent-secret-ambiguity branch June 2, 2026 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant