Skip to content

feat(provisioner): Lakekeeper-as-Iceberg-catalog — PR1 (provisioning machinery)#574

Merged
fuziontech merged 12 commits into
mainfrom
lakekeeper-pr1
May 20, 2026
Merged

feat(provisioner): Lakekeeper-as-Iceberg-catalog — PR1 (provisioning machinery)#574
fuziontech merged 12 commits into
mainfrom
lakekeeper-pr1

Conversation

@fuziontech
Copy link
Copy Markdown
Member

Summary

First of three PRs adding Lakekeeper as the default Iceberg catalog for duckgres workers. This PR is provisioning scaffolding only — no worker behavior change.

PR1 (this PR) builds:

  • A thin Lakekeeper REST client (Info, Bootstrap, EnsureWarehouse)
  • A Postgres admin helper (EnsureDatabase) for creating the per-org lakekeeper_<orgid> database inside the org's existing managed-warehouse Aurora cluster
  • K8s EnsureCR / EnsureSecret / GetCR helpers for the Lakekeeper CR from the lakekeeper-operator
  • The LakekeeperProvisioner.EnsureForOrg reconciler that chains it all together idempotently
  • Schema additions to ManagedWarehouseIceberg (Backend selector + Lakekeeper-side config fields)
  • A default-deny baseline NetworkPolicy in the lakekeeper namespace

PR2 (next) will add the worker-side ATTACH SQL and activation-payload wiring. PR3 will swap allowall + NetworkPolicy isolation for OIDC SA-token auth.

Background

Prototype validation at tmp/lakekeeper-proto/ (not in this PR) empirically confirmed:

  • DuckDB 1.5.2 + the bundled iceberg extension can ATTACH a Lakekeeper REST catalog and CREATE/INSERT/SELECT through it end-to-end (parquet/manifest/snapshot all land in S3).
  • Lakekeeper warehouse storage profile needs sts-enabled: true, flavor: s3-compat (for MinIO; aws for AWS), and remote-signing-enabled: false for DuckDB to authenticate to S3.
  • Token rotation in DuckDB-iceberg is handled internally via the OAuth2 client_credentials flow — caller only manages long-lived CLIENT_ID/CLIENT_SECRET.

A FINDINGS.md writeup of the prototype is intentionally kept in tmp/ so it's clear it's not the source of truth for production code.

Design decisions captured

  • Default for new orgs: ManagedWarehouseIceberg.Backend defaults to "lakekeeper". Legacy "s3_tables" stays as an explicit escape hatch. ResolvedBackend() handles the empty-string case for rows that predate the column.
  • Reuse org's existing Aurora: per-org lakekeeper_<orgid> database, not a dedicated PG cluster. The provisioner takes the admin DSN as an injected dependency so the secret-lookup logic isn't baked in here — PR2 wires it to the K8s Secret managed by Crossplane.
  • Lazy provisioning: triggered from the worker activator at first activation. The reconciler is synchronous and idempotent; ErrBootstrapPending signals "requeue, not fail".
  • Status preservation in EnsureCR: spec drift correction never clobbers operator-set status. The fix was load-bearing for the fake-client tests but is also the right contract for real K8s.
  • CAS-mismatch is benign: configstore gained a typed ErrWarehouseStateMismatch sentinel. EnsureForOrg returns nil on a CAS miss because the Lakekeeper REST write is already committed by then.

Test plan

  • CI passes both default and -tags kubernetes test suites
  • Manual EnsureForOrg against a dev cluster with the lakekeeper-operator installed (PostHog/charts companion PR)
  • Live integration tests against the prototype stack at tmp/lakekeeper-proto/ (PG_ADMIN_DSN, LAKEKEEPER_SMOKE_URL env-gated) pass locally
  • No worker behavior change — ducklings still use the S3 Tables path on every existing org

Companion PR

PostHog/charts: feat(lakekeeper-operator): vendor chart + add dev AppSet — installs the operator + CRD on posthog-dev. The duckgres provisioner assumes the CRD is available; the charts PR provides it.

🤖 Generated with Claude Code

fuziontech and others added 10 commits May 19, 2026 12:02
…ector

First slice of the Lakekeeper-as-default-Iceberg-catalog work. No behavior
change for workers yet — these are the building blocks the upcoming
provisioner reconciler will use.

  * lakekeeper_client.go: thin HTTP client for Lakekeeper's management API
    surface we need (Info, Bootstrap, EnsureWarehouse). Idempotent helpers
    treat 409-on-bootstrap as success. Carries an optional bearer for the
    OIDC follow-up. Smoke test runs against a live Lakekeeper when
    LAKEKEEPER_SMOKE_URL is set; unit tests use httptest.

  * ManagedWarehouseIceberg gains Backend, Lakekeeper{Endpoint, Warehouse,
    ClientID, OAuth2ServerURI}, and a LakekeeperClientCredentials SecretRef.
    Backend defaults to "lakekeeper" via GORM tag; ResolvedBackend() applies
    the same default in code for rows migrated from earlier schemas. The
    legacy "s3_tables" backend stays as an explicit escape hatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  * EnsureWarehouse now handles the GET-then-POST TOCTOU race: a 409 on
    create triggers a re-list and returns the winning warehouse rather
    than surfacing the conflict. Mirrors the Bootstrap 409-as-success
    pattern. New test covers the race resolution.
  * WithBearer doc'd as not concurrency-safe and token-must-be-single-line.
    Proper rotation primitives land with the OIDC work in PR3.
  * findWarehouseByName doc'd as single-page-only — fine for the one
    warehouse per Lakekeeper instance we deploy today; needs pagination
    if we ever host multiple warehouses per instance.
  * Smoke test uses errors.As directly instead of a one-off helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion

The Lakekeeper-reuses-org-PG plan needs a lakekeeper_<orgid> database
created inside the org's existing Aurora cluster. Crossplane composition
provisions one DB per cluster today; the controlplane creates the second
one on demand at provisioning time.

  * EnsureDatabase(ctx, adminDSN, dbName) — opens a pgx connection, probes
    pg_database, runs CREATE DATABASE if absent. Handles the 42P04
    duplicate-database SQLSTATE as a benign concurrent-create outcome.
  * Identifier validation via isSafePGIdent regex + quoteIdent escaping
    (belt and suspenders — CREATE DATABASE doesn't accept query parameters
    so the identifier must be interpolated).
  * Pure function over a DSN: how the admin DSN gets sourced from the K8s
    Secret managed by Crossplane is a separate concern that lands with the
    provisioner reconciler.

Tests: unit tests for isSafePGIdent, quoteIdent, unsafe-ident rejection.
Live-PG integration test gated on PG_ADMIN_DSN — passes against the
prototype Postgres in tmp/lakekeeper-proto.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LakekeeperK8sClient wraps the dynamic client (for the Lakekeeper CR) and
the typed clientset (for the backing Secret) so the provisioner reconciler
can drive both with one dependency. Mirrors the existing DucklingClient
construction pattern; same //go:build kubernetes isolation.

  * EnsureSecret(orgID, data) — creates or updates an Opaque Secret in
    the lakekeeper namespace holding db-user/db-password/encryption-key/
    oauth2-client-secret. Stamped with the duckgres/active-org label so
    the NetworkPolicy lands cleanly later.
  * EnsureCR(spec) — renders and applies the Lakekeeper CR (allowall
    authz, bootstrap.enabled=true, server.baseURI for clients). Idempotent
    create-or-update via dynamic client; replaces the spec wholesale on
    update. Required-field validation up front.
  * GetCR returns (nil, nil) for missing CRs so the reconciler can use
    "not yet created" as a state signal rather than an error path.

Tests use the dynamic fake + kubefake clientset (matches controller_test
pattern). int64-cast numeric fields in the unstructured map because the
DeepCopyJSON validator rejects int32.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  * EnsureCR now validates OrgID up front. An empty OrgID previously
    fell through to LakekeeperResourceName("") and would have created a
    'lakekeeper-' resource — wrong but hard to spot.
  * Both EnsureCR and EnsureSecret validate orgID against the K8s
    label-value grammar. The resource-name path stripped hyphens, but
    label values went through unsanitised, so a malformed orgID would
    have surfaced as an opaque API server rejection. A clear error from
    the helper is much easier to debug.
  * Tests cover empty + unsafe orgID paths and the isValidOrgIDLabel
    rule set (UUID-style IDs pass; leading hyphen, embedded spaces, and
    >63 chars are rejected).
  * TestEnsureCR_IdempotentUpdate now reads back the CR after the
    second call and asserts the image field actually changed, so the
    update path has a real regression guard rather than just "no error".
  * isDuplicateDatabase uses errors.As instead of a hand-rolled unwrap
    loop. Handles errors.Join-style multi-error chains.
  * Doc note on EnsureCR/EnsureSecret update conflicts: the wrapped
    error preserves apierrors.IsConflict for transient handling, so the
    reconciler can requeue rather than treating it as fatal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LakekeeperProvisioner.EnsureForOrg chains the existing helpers into one
idempotent pipeline:

  1. EnsureDatabase: CREATE DATABASE lakekeeper_<orgid> on the org's
     Aurora via the admin DSN supplied by the caller.
  2. resolveOrGenerateSecret: read the per-org Secret if present, or
     generate fresh creds (16/16/32-byte hex randoms) and write a new
     Secret. Re-running never rotates — that would silently desync the
     Postgres user's password from the Lakekeeper config.
  3. EnsureCR with the per-org spec (allowall authz, bootstrap enabled,
     PG host/port/db, BaseURI matching the operator-created Service).
  4. waitForBootstrap: poll the CR until status.bootstrappedAt is set
     (operator's job). On timeout returns the typed ErrBootstrapPending
     so the caller can requeue without treating it as fatal.
  5. EnsureWarehouse via REST: idempotent storage profile create with
     sts-enabled + s3-compat + remote-signing-disabled (the prototype
     proved this is the shape DuckDB-iceberg wants).
  6. Persist endpoint, client_id, and the SecretRef pointing at the
     OAuth2 client_secret back into ManagedWarehouseIceberg. Flips
     IcebergState to Ready.

The reconciler doesn't bake in K8s Secret lookup for the AdminDSN — the
caller computes ProvisioningInputs (admin DSN, PG host, S3 config) from
whatever source applies in their context. PR2 wiring will resolve these
from the Duckling CR status and the Crossplane-managed admin secret.

A side-bug fix in EnsureCR: the Update path was overwriting the existing
status block (the fake dynamic client doesn't honor the status
subresource split). Now copies status from existing CR before Update so
spec drift correction never clobbers operator-set status — which is the
right contract on real K8s too.

Tests:
  * TestEnsureForOrg_HappyPath_Fakes (PG-gated) — full e2e against a
    real Postgres + httptest Lakekeeper. Verifies row updates, exact
    storage-profile shape, and idempotency on a second call.
  * TestEnsureForOrg_BootstrapTimeoutReturnsTransient (PG-gated) —
    short timeout, no status injection; expects ErrBootstrapPending.
  * Validation tests for missing/invalid inputs and unsafe org IDs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  * configstore: typed ErrWarehouseStateMismatch sentinel. UpdateWarehouseState
    now wraps the "rows affected = 0" error with this so callers can use
    errors.Is to distinguish "CAS guard fired" from a real failure. The
    fakeStore in tests follows the same contract.

  * EnsureForOrg treats ErrWarehouseStateMismatch as benign success: by
    the time the row update fires, the Lakekeeper REST resources are
    already provisioned, and the next reconcile iteration will retry the
    persist step with the row's fresh state. Previously this was a
    confusing fatal error after a successful Lakekeeper write. New test
    TestEnsureForOrg_StateMismatchIsBenign locks the contract in.

  * secretFromExisting reads only from Secret.Data — the API server clears
    StringData on read, so the dual-branch check was dead code in
    production. Test coverage is preserved (the fake echoes StringData
    back; assertSecretData handles both paths there).

  * waitForBootstrap reuses one time.Timer instead of allocating a fresh
    timer per poll via time.After. Trivial leak that adds up over the
    bootstrap window.

  * Replaced the hand-rolled isErrBootstrap helper with errors.Is —
    handles wrapped errors and errors.Join chains correctly.

  * TestFakeStoreUpdateWarehouseState updated to assert the new
    ErrWarehouseStateMismatch contract on CAS-miss.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Default-deny on both ingress and egress, plus two scoped exceptions:
  * ingress 8181/TCP from any pod app=duckgres-worker in the duckgres
    namespace
  * egress to DNS, Postgres (5432), HTTPS (443), MinIO (9000)

Per-org isolation (only org X's workers can reach org X's Lakekeeper)
is deliberately out of scope here. The lakekeeper-operator only stamps
the standard app.kubernetes.io/{name,instance,managed-by} labels and
doesn't propagate CR labels onto the Deployment, so a NetworkPolicy
keyed on duckgres/active-org never matches. Two follow-up paths
documented inline:
  (a) upstream operator patch to propagate CR labels, or
  (b) per-org CiliumClusterwideNetworkPolicy templated in charts repo
      matching app.kubernetes.io/instance=lakekeeper-<orgid-suffix>.

This still gives a meaningful cluster boundary today; intra-cluster
org-to-org boundary lands with PR3 alongside OIDC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five real fixes from the holistic PR review:

  * **EnsureRole**: the missing piece that would have broken every first
    activation. A fresh database has no users, but Lakekeeper logs in as
    a role named after the database. EnsureRole creates/alters the
    login role with the password from the K8s Secret, then grants ALL
    on the database. Idempotent: re-runs with the same password are a
    no-op-ish ALTER; rotated passwords ALTER to the new one. Live-PG
    integration test verifies create + idempotent + rotate cycle.

  * **Bootstrap-pending no longer blocks the reconcile loop.** Replaced
    waitForBootstrap (up-to-30s internal poll) with checkBootstrap (one
    GetCR call). Returns ErrBootstrapPending immediately so the outer
    reconciler can requeue without stalling other orgs behind a slow
    operator bootstrap. WithBootstrapTimeout option removed since no
    polling happens anymore.

  * **UpdateIcebergConfig**: new WarehouseStore method that writes
    iceberg_* columns without a top-level state CAS. The previous code
    CASed on w.State, which silently swallowed the persist whenever the
    Duckling controller had already transitioned the row to Ready. Now
    Lakekeeper config always lands. New test
    TestEnsureForOrg_PersistsAfterTopLevelStateMoved locks this in.

  * **EncryptionKey size 16→32 bytes** (32 hex chars → 64 hex chars).
    Comment claimed "32-byte key" but mustRandomHex(16) only produces
    16 bytes of entropy. Bumped DBPassword to 32 too — same generator,
    consistent posture.

  * **EnsureSecret writes Data, not StringData**. The K8s API server
    converts StringData→Data on write, so production reads only ever
    see Data. The dynamic-fake clientset doesn't do that conversion, so
    writing Data directly makes the fake match real K8s behavior.
    secretFromExisting now reads Data exclusively without the dead-code
    fallback.

Plus three smaller items:

  * NetworkPolicy comment clarifies the `app.kubernetes.io/name` selector
    is for operator-stamped Deployment pods, and the `app: lakekeeper`
    label we put on the CR / Secret is intentionally non-overlapping.
  * Field comments on LakekeeperWarehouse and LakekeeperOAuth2ServerURI
    document the wire contract PR2 will consume.
  * isSafePGPassword + quoteLiteral added for the EnsureRole code path.
    The password whitelist is conservative (alphanumeric + a handful of
    symbols) so SQL injection via the unquoted password literal is
    structurally impossible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end test that drives EnsureForOrg against a real Kubernetes
cluster (orbstack), the real lakekeeper-operator, a real Postgres, and
real MinIO. Skipped unless LAKEKEEPER_E2E_KUBECONFIG is set.

The test caught three real bugs that the dynamic-fake clientset and
httptest server couldn't:

  * **Postgres 15+ public-schema permissions.** The freshly-created
    lakekeeper_<orgid> database has a `public` schema owned by the
    admin user. Lakekeeper's `migrate` step needs DDL there, but a
    non-owner role can't CREATE in public on PG 15+. EnsureRole now
    runs `ALTER DATABASE OWNER TO <role>` and (belt-and-suspenders)
    `ALTER SCHEMA public OWNER TO <role>` inside the target database
    via a re-DSN helper. Without this, the migration Job CrashLoopBackOffs
    forever and the operator never bootstraps.

  * **Lakekeeper PG SSL mode mismatch.** The operator's `sslMode`
    default is `require`, but our local Postgres has no TLS. Added a
    PGSSLMode field to LakekeeperCRSpec + ProvisioningInputs so callers
    can pass `disable` for local/dev environments. Production
    deployments leave it empty and inherit the secure default.

  * **`/management/v1/*` is at the server root, not under `/catalog`.**
    The provisioner was passing `baseURL + "/catalog"` to the management
    HTTP client, which then assembled `<x>/catalog/management/v1/warehouse`
    — a 404. /catalog is only for the Iceberg REST API that ducklings
    hit later. Now the LakekeeperClient gets the bare baseURL; the
    LakekeeperEndpoint we persist for DuckDB still includes /catalog.

The e2e test routes from the host to in-cluster Lakekeeper via the API
server's service-proxy subresource — orbstack publishes ClusterIPs but
the `.svc` hostnames aren't resolvable from the host. The clientFor
override is test-only; production runs in-cluster so the `.svc`
addresses work natively.

E2E result: full pipeline completes in ~20 seconds (image pull + PG
migrate + bootstrap + warehouse-create), reproducible across runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fuziontech
Copy link
Copy Markdown
Member Author

orbstack e2e validation — three production bugs caught

Stood the full pipeline up against a real orbstack k8s cluster + real lakekeeper-operator + real Postgres + real MinIO. Three bugs surfaced that the dynamic-fake clientset and httptest mocks couldn't catch (all fixed in 3d26bca):

  1. Postgres 15+ public-schema permissions — the freshly-created lakekeeper_<orgid> database's public schema is owned by the admin user, and PG 15 removed default CREATE perms for non-owner roles. Lakekeeper's migrate step CrashLoopBackOffs forever because it can't CREATE TABLE. EnsureRole now does ALTER DATABASE OWNER TO <role> + ALTER SCHEMA public OWNER TO <role> inside the target database via a re-DSN helper.

  2. TLS mismatch on PG connection — the operator's sslMode defaults to require, but local Postgres has no TLS. Added PGSSLMode to both LakekeeperCRSpec and ProvisioningInputs. Production leaves it empty and inherits the secure default; local/dev passes disable.

  3. /management/v1/* is at the server root, not under /catalog — the provisioner was passing baseURL + "/catalog" to the management HTTP client, assembling <x>/catalog/management/v1/warehouse which 404s. /catalog is only for the Iceberg REST API that ducklings hit later. Now the management client uses the bare baseURL; the persisted LakekeeperEndpoint still includes /catalog for DuckDB.

E2E result

=== RUN   TestE2E_LakekeeperProvisionerOnOrbstack
    orgID=e2e1779223717
    [0s] bootstrap pending, requeueing...
    [3s] bootstrap pending, requeueing...
    ...
    [18s] bootstrap pending, requeueing...
    EnsureForOrg succeeded after 21.5s
    warehouse row: endpoint=http://lakekeeper-e2e1779223717.lakekeeper.svc:8181/catalog
                   warehouse=org-e2e1779223717
                   client_id=duckling-e2e1779223717
--- PASS: TestE2E_LakekeeperProvisionerOnOrbstack (21.74s)

Full pipeline reconciles in ~20s including image pull. Reproducible across runs. The e2e test is gated by LAKEKEEPER_E2E_KUBECONFIG so CI won't run it without a cluster, but it's a one-command verification path for anyone bringing up the local stack:

LAKEKEEPER_E2E_KUBECONFIG=\$HOME/.kube/config \
PG_ADMIN_DSN='postgres://lakekeeper:lakekeeper@localhost:5434/lakekeeper?sslmode=disable' \
  go test -tags kubernetes ./controlplane/provisioner/ -run TestE2E -v

The Lakekeeper image is pulled from quay.io/lakekeeper/catalog:latest (works); the operator image at quay.io/lakekeeper/lakekeeper-operator:v0.0.1 doesn't exist upstream — the chart we vendored pins a tag that was never published. Local validation worked by building the operator from the source we have (docker build -t quay.io/lakekeeper/lakekeeper-operator:v0.0.1-local . inside lakekeeper-operator/). The charts PR (#PostHog/charts/11164) will need to either un-pin the operator image or coordinate an upstream release before this can deploy.

CI's golangci-lint flagged 7 errcheck issues — defer X.Close() without
checking the returned error. Wrapped each in `defer func() { _ = X.Close() }()`
to satisfy the linter without changing behavior. The Close errors here
are non-actionable: we're in cleanup paths or test teardown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fuziontech added a commit that referenced this pull request May 19, 2026
Wire the worker's Iceberg ATTACH and the control-plane activation
payload to handle the Lakekeeper backend in addition to the legacy
S3 Tables path. Together with PR1 (#574) this is the worker-side
counterpart to the per-org Lakekeeper provisioning machinery.

  * server/iceberg/config.go — Backend selector + Lakekeeper fields
    (Endpoint, Warehouse, ClientID, ClientSecret, OAuth2ServerURI).
    ResolvedBackend() treats empty Backend as "lakekeeper".
  * server/iceberg/migration.go — new BuildLakekeeperSecretStmt and
    BuildLakekeeperAttachStmt. The Secret statement returns "" in
    allowall mode (no OAUTH2_SERVER_URI), and the Attach statement
    emits AUTHORIZATION_TYPE 'none' instead of SECRET reference. With
    OAuth2 configured (PR3 path), it emits both.
  * server/server.go — AttachIcebergCatalog becomes a dispatcher on
    ResolvedBackend. attachLakekeeperCatalog is a parallel branch that
    skips the S3 SECRET (Lakekeeper vends STS itself via
    ACCESS_DELEGATION_MODE='vended_credentials') and uses the
    Lakekeeper SQL helpers. Same probe-and-detach posture as the S3
    Tables path so a freshly-provisioned empty warehouse doesn't break
    activation.
  * controlplane/shared_worker_activator.go — buildIcebergConfig
    helper extracted for direct testing. Resolves the
    LakekeeperClientCredentials SecretRef → OAuth2 client_secret via
    the existing readSecretValue path. PR3 will wire the lazy
    provisioning trigger; today the worker returns no-op when the
    row's LakekeeperEndpoint is empty (matches the S3 Tables empty-
    TableBucket behavior).

What does NOT change in PR2:
  * Existing s3_tables orgs: AttachIcebergCatalog still goes down the
    same code path, byte-identical SQL.
  * Orgs whose row has Backend="lakekeeper" but no LakekeeperEndpoint
    yet: AttachIcebergCatalog returns nil. Same posture as a tenant
    with iceberg.enabled=true and no TableBucketArn yet.

Tests (new):
  * server/iceberg: 5 cases for the Lakekeeper SQL builders covering
    allowall vs OAuth2 modes and quote-escaping for endpoint and
    warehouse values. Plus ResolvedBackend roundtrip.
  * controlplane: 5 cases for buildIcebergConfig covering s3_tables
    passthrough, lakekeeper allowall (no secret resolution), lakekeeper
    OAuth2 (secret resolved from K8s), empty Backend → lakekeeper
    default, and missing-secret error surfacing.

A pre-existing TestCatalogPsqlCommands/psql_dn failure on main
(unrelated to these changes) was confirmed via clean main checkout —
not a regression from this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ gitignore tmp/

Two cross-PR review findings:

  * **Backfill migration for iceberg_backend.** Existing rows in
    duckgres_managed_warehouses that have a populated iceberg_table_
    bucket_arn but no explicit iceberg_backend would, after this PR
    series merges, be treated as Lakekeeper orgs (ResolvedBackend
    defaults empty → "lakekeeper"). The worker activator's
    AttachIcebergCatalog would dispatch to the Lakekeeper branch and
    silently skip (LakekeeperEndpoint is empty), and the controller's
    reconcileLakekeeper would fire EnsureForOrg against them on every
    poll tick. Both behaviors are wrong for S3 Tables orgs.

    The new migrateIcebergBackendBackfill stamps Backend='s3_tables' on
    any row with a populated bucket ARN and no explicit Backend.
    Tracked in duckgres_schema_migrations so it runs exactly once.

  * **tmp/ added to .gitignore.** The local prototype directory
    (tmp/lakekeeper-proto/) was never committed — git log --all
    confirmed it appeared in no branch — but a cross-PR review flagged
    that an RSA private key existed there locally. Now explicitly
    ignored so a future accidental `git add -A` can't surface the
    same risk. The local prototype's oidc/ subdirectory has been
    deleted alongside this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fuziontech added a commit that referenced this pull request May 19, 2026
Wire the worker's Iceberg ATTACH and the control-plane activation
payload to handle the Lakekeeper backend in addition to the legacy
S3 Tables path. Together with PR1 (#574) this is the worker-side
counterpart to the per-org Lakekeeper provisioning machinery.

  * server/iceberg/config.go — Backend selector + Lakekeeper fields
    (Endpoint, Warehouse, ClientID, ClientSecret, OAuth2ServerURI).
    ResolvedBackend() treats empty Backend as "lakekeeper".
  * server/iceberg/migration.go — new BuildLakekeeperSecretStmt and
    BuildLakekeeperAttachStmt. The Secret statement returns "" in
    allowall mode (no OAUTH2_SERVER_URI), and the Attach statement
    emits AUTHORIZATION_TYPE 'none' instead of SECRET reference. With
    OAuth2 configured (PR3 path), it emits both.
  * server/server.go — AttachIcebergCatalog becomes a dispatcher on
    ResolvedBackend. attachLakekeeperCatalog is a parallel branch that
    skips the S3 SECRET (Lakekeeper vends STS itself via
    ACCESS_DELEGATION_MODE='vended_credentials') and uses the
    Lakekeeper SQL helpers. Same probe-and-detach posture as the S3
    Tables path so a freshly-provisioned empty warehouse doesn't break
    activation.
  * controlplane/shared_worker_activator.go — buildIcebergConfig
    helper extracted for direct testing. Resolves the
    LakekeeperClientCredentials SecretRef → OAuth2 client_secret via
    the existing readSecretValue path. PR3 will wire the lazy
    provisioning trigger; today the worker returns no-op when the
    row's LakekeeperEndpoint is empty (matches the S3 Tables empty-
    TableBucket behavior).

What does NOT change in PR2:
  * Existing s3_tables orgs: AttachIcebergCatalog still goes down the
    same code path, byte-identical SQL.
  * Orgs whose row has Backend="lakekeeper" but no LakekeeperEndpoint
    yet: AttachIcebergCatalog returns nil. Same posture as a tenant
    with iceberg.enabled=true and no TableBucketArn yet.

Tests (new):
  * server/iceberg: 5 cases for the Lakekeeper SQL builders covering
    allowall vs OAuth2 modes and quote-escaping for endpoint and
    warehouse values. Plus ResolvedBackend roundtrip.
  * controlplane: 5 cases for buildIcebergConfig covering s3_tables
    passthrough, lakekeeper allowall (no secret resolution), lakekeeper
    OAuth2 (secret resolved from K8s), empty Backend → lakekeeper
    default, and missing-secret error surfacing.

A pre-existing TestCatalogPsqlCommands/psql_dn failure on main
(unrelated to these changes) was confirmed via clean main checkout —
not a regression from this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fuziontech fuziontech merged commit 3a3b2ea into main May 20, 2026
22 checks passed
@fuziontech fuziontech deleted the lakekeeper-pr1 branch May 20, 2026 00:22
fuziontech added a commit that referenced this pull request May 20, 2026
…r wiring) (#576)

* feat(provisioner): add Lakekeeper REST client and Iceberg backend selector

First slice of the Lakekeeper-as-default-Iceberg-catalog work. No behavior
change for workers yet — these are the building blocks the upcoming
provisioner reconciler will use.

  * lakekeeper_client.go: thin HTTP client for Lakekeeper's management API
    surface we need (Info, Bootstrap, EnsureWarehouse). Idempotent helpers
    treat 409-on-bootstrap as success. Carries an optional bearer for the
    OIDC follow-up. Smoke test runs against a live Lakekeeper when
    LAKEKEEPER_SMOKE_URL is set; unit tests use httptest.

  * ManagedWarehouseIceberg gains Backend, Lakekeeper{Endpoint, Warehouse,
    ClientID, OAuth2ServerURI}, and a LakekeeperClientCredentials SecretRef.
    Backend defaults to "lakekeeper" via GORM tag; ResolvedBackend() applies
    the same default in code for rows migrated from earlier schemas. The
    legacy "s3_tables" backend stays as an explicit escape hatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address lakekeeper_client review feedback

  * EnsureWarehouse now handles the GET-then-POST TOCTOU race: a 409 on
    create triggers a re-list and returns the winning warehouse rather
    than surfacing the conflict. Mirrors the Bootstrap 409-as-success
    pattern. New test covers the race resolution.
  * WithBearer doc'd as not concurrency-safe and token-must-be-single-line.
    Proper rotation primitives land with the OIDC work in PR3.
  * findWarehouseByName doc'd as single-page-only — fine for the one
    warehouse per Lakekeeper instance we deploy today; needs pagination
    if we ever host multiple warehouses per instance.
  * Smoke test uses errors.As directly instead of a one-off helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): EnsureDatabase admin helper for Lakekeeper PG creation

The Lakekeeper-reuses-org-PG plan needs a lakekeeper_<orgid> database
created inside the org's existing Aurora cluster. Crossplane composition
provisions one DB per cluster today; the controlplane creates the second
one on demand at provisioning time.

  * EnsureDatabase(ctx, adminDSN, dbName) — opens a pgx connection, probes
    pg_database, runs CREATE DATABASE if absent. Handles the 42P04
    duplicate-database SQLSTATE as a benign concurrent-create outcome.
  * Identifier validation via isSafePGIdent regex + quoteIdent escaping
    (belt and suspenders — CREATE DATABASE doesn't accept query parameters
    so the identifier must be interpolated).
  * Pure function over a DSN: how the admin DSN gets sourced from the K8s
    Secret managed by Crossplane is a separate concern that lands with the
    provisioner reconciler.

Tests: unit tests for isSafePGIdent, quoteIdent, unsafe-ident rejection.
Live-PG integration test gated on PG_ADMIN_DSN — passes against the
prototype Postgres in tmp/lakekeeper-proto.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): K8s helpers for per-org Lakekeeper CRs and Secrets

LakekeeperK8sClient wraps the dynamic client (for the Lakekeeper CR) and
the typed clientset (for the backing Secret) so the provisioner reconciler
can drive both with one dependency. Mirrors the existing DucklingClient
construction pattern; same //go:build kubernetes isolation.

  * EnsureSecret(orgID, data) — creates or updates an Opaque Secret in
    the lakekeeper namespace holding db-user/db-password/encryption-key/
    oauth2-client-secret. Stamped with the duckgres/active-org label so
    the NetworkPolicy lands cleanly later.
  * EnsureCR(spec) — renders and applies the Lakekeeper CR (allowall
    authz, bootstrap.enabled=true, server.baseURI for clients). Idempotent
    create-or-update via dynamic client; replaces the spec wholesale on
    update. Required-field validation up front.
  * GetCR returns (nil, nil) for missing CRs so the reconciler can use
    "not yet created" as a state signal rather than an error path.

Tests use the dynamic fake + kubefake clientset (matches controller_test
pattern). int64-cast numeric fields in the unstructured map because the
DeepCopyJSON validator rejects int32.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address round-2 review feedback

  * EnsureCR now validates OrgID up front. An empty OrgID previously
    fell through to LakekeeperResourceName("") and would have created a
    'lakekeeper-' resource — wrong but hard to spot.
  * Both EnsureCR and EnsureSecret validate orgID against the K8s
    label-value grammar. The resource-name path stripped hyphens, but
    label values went through unsanitised, so a malformed orgID would
    have surfaced as an opaque API server rejection. A clear error from
    the helper is much easier to debug.
  * Tests cover empty + unsafe orgID paths and the isValidOrgIDLabel
    rule set (UUID-style IDs pass; leading hyphen, embedded spaces, and
    >63 chars are rejected).
  * TestEnsureCR_IdempotentUpdate now reads back the CR after the
    second call and asserts the image field actually changed, so the
    update path has a real regression guard rather than just "no error".
  * isDuplicateDatabase uses errors.As instead of a hand-rolled unwrap
    loop. Handles errors.Join-style multi-error chains.
  * Doc note on EnsureCR/EnsureSecret update conflicts: the wrapped
    error preserves apierrors.IsConflict for transient handling, so the
    reconciler can requeue rather than treating it as fatal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): LakekeeperProvisioner reconciler

LakekeeperProvisioner.EnsureForOrg chains the existing helpers into one
idempotent pipeline:

  1. EnsureDatabase: CREATE DATABASE lakekeeper_<orgid> on the org's
     Aurora via the admin DSN supplied by the caller.
  2. resolveOrGenerateSecret: read the per-org Secret if present, or
     generate fresh creds (16/16/32-byte hex randoms) and write a new
     Secret. Re-running never rotates — that would silently desync the
     Postgres user's password from the Lakekeeper config.
  3. EnsureCR with the per-org spec (allowall authz, bootstrap enabled,
     PG host/port/db, BaseURI matching the operator-created Service).
  4. waitForBootstrap: poll the CR until status.bootstrappedAt is set
     (operator's job). On timeout returns the typed ErrBootstrapPending
     so the caller can requeue without treating it as fatal.
  5. EnsureWarehouse via REST: idempotent storage profile create with
     sts-enabled + s3-compat + remote-signing-disabled (the prototype
     proved this is the shape DuckDB-iceberg wants).
  6. Persist endpoint, client_id, and the SecretRef pointing at the
     OAuth2 client_secret back into ManagedWarehouseIceberg. Flips
     IcebergState to Ready.

The reconciler doesn't bake in K8s Secret lookup for the AdminDSN — the
caller computes ProvisioningInputs (admin DSN, PG host, S3 config) from
whatever source applies in their context. PR2 wiring will resolve these
from the Duckling CR status and the Crossplane-managed admin secret.

A side-bug fix in EnsureCR: the Update path was overwriting the existing
status block (the fake dynamic client doesn't honor the status
subresource split). Now copies status from existing CR before Update so
spec drift correction never clobbers operator-set status — which is the
right contract on real K8s too.

Tests:
  * TestEnsureForOrg_HappyPath_Fakes (PG-gated) — full e2e against a
    real Postgres + httptest Lakekeeper. Verifies row updates, exact
    storage-profile shape, and idempotency on a second call.
  * TestEnsureForOrg_BootstrapTimeoutReturnsTransient (PG-gated) —
    short timeout, no status injection; expects ErrBootstrapPending.
  * Validation tests for missing/invalid inputs and unsafe org IDs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address round-3 review feedback

  * configstore: typed ErrWarehouseStateMismatch sentinel. UpdateWarehouseState
    now wraps the "rows affected = 0" error with this so callers can use
    errors.Is to distinguish "CAS guard fired" from a real failure. The
    fakeStore in tests follows the same contract.

  * EnsureForOrg treats ErrWarehouseStateMismatch as benign success: by
    the time the row update fires, the Lakekeeper REST resources are
    already provisioned, and the next reconcile iteration will retry the
    persist step with the row's fresh state. Previously this was a
    confusing fatal error after a successful Lakekeeper write. New test
    TestEnsureForOrg_StateMismatchIsBenign locks the contract in.

  * secretFromExisting reads only from Secret.Data — the API server clears
    StringData on read, so the dual-branch check was dead code in
    production. Test coverage is preserved (the fake echoes StringData
    back; assertSecretData handles both paths there).

  * waitForBootstrap reuses one time.Timer instead of allocating a fresh
    timer per poll via time.After. Trivial leak that adds up over the
    bootstrap window.

  * Replaced the hand-rolled isErrBootstrap helper with errors.Is —
    handles wrapped errors and errors.Join chains correctly.

  * TestFakeStoreUpdateWarehouseState updated to assert the new
    ErrWarehouseStateMismatch contract on CAS-miss.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(k8s): baseline NetworkPolicy for lakekeeper namespace

Default-deny on both ingress and egress, plus two scoped exceptions:
  * ingress 8181/TCP from any pod app=duckgres-worker in the duckgres
    namespace
  * egress to DNS, Postgres (5432), HTTPS (443), MinIO (9000)

Per-org isolation (only org X's workers can reach org X's Lakekeeper)
is deliberately out of scope here. The lakekeeper-operator only stamps
the standard app.kubernetes.io/{name,instance,managed-by} labels and
doesn't propagate CR labels onto the Deployment, so a NetworkPolicy
keyed on duckgres/active-org never matches. Two follow-up paths
documented inline:
  (a) upstream operator patch to propagate CR labels, or
  (b) per-org CiliumClusterwideNetworkPolicy templated in charts repo
      matching app.kubernetes.io/instance=lakekeeper-<orgid-suffix>.

This still gives a meaningful cluster boundary today; intra-cluster
org-to-org boundary lands with PR3 alongside OIDC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address PR1 deep-review findings

Five real fixes from the holistic PR review:

  * **EnsureRole**: the missing piece that would have broken every first
    activation. A fresh database has no users, but Lakekeeper logs in as
    a role named after the database. EnsureRole creates/alters the
    login role with the password from the K8s Secret, then grants ALL
    on the database. Idempotent: re-runs with the same password are a
    no-op-ish ALTER; rotated passwords ALTER to the new one. Live-PG
    integration test verifies create + idempotent + rotate cycle.

  * **Bootstrap-pending no longer blocks the reconcile loop.** Replaced
    waitForBootstrap (up-to-30s internal poll) with checkBootstrap (one
    GetCR call). Returns ErrBootstrapPending immediately so the outer
    reconciler can requeue without stalling other orgs behind a slow
    operator bootstrap. WithBootstrapTimeout option removed since no
    polling happens anymore.

  * **UpdateIcebergConfig**: new WarehouseStore method that writes
    iceberg_* columns without a top-level state CAS. The previous code
    CASed on w.State, which silently swallowed the persist whenever the
    Duckling controller had already transitioned the row to Ready. Now
    Lakekeeper config always lands. New test
    TestEnsureForOrg_PersistsAfterTopLevelStateMoved locks this in.

  * **EncryptionKey size 16→32 bytes** (32 hex chars → 64 hex chars).
    Comment claimed "32-byte key" but mustRandomHex(16) only produces
    16 bytes of entropy. Bumped DBPassword to 32 too — same generator,
    consistent posture.

  * **EnsureSecret writes Data, not StringData**. The K8s API server
    converts StringData→Data on write, so production reads only ever
    see Data. The dynamic-fake clientset doesn't do that conversion, so
    writing Data directly makes the fake match real K8s behavior.
    secretFromExisting now reads Data exclusively without the dead-code
    fallback.

Plus three smaller items:

  * NetworkPolicy comment clarifies the `app.kubernetes.io/name` selector
    is for operator-stamped Deployment pods, and the `app: lakekeeper`
    label we put on the CR / Secret is intentionally non-overlapping.
  * Field comments on LakekeeperWarehouse and LakekeeperOAuth2ServerURI
    document the wire contract PR2 will consume.
  * isSafePGPassword + quoteLiteral added for the EnsureRole code path.
    The password whitelist is conservative (alphanumeric + a handful of
    symbols) so SQL injection via the unquoted password literal is
    structurally impossible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): orbstack e2e test + bugs it surfaced

End-to-end test that drives EnsureForOrg against a real Kubernetes
cluster (orbstack), the real lakekeeper-operator, a real Postgres, and
real MinIO. Skipped unless LAKEKEEPER_E2E_KUBECONFIG is set.

The test caught three real bugs that the dynamic-fake clientset and
httptest server couldn't:

  * **Postgres 15+ public-schema permissions.** The freshly-created
    lakekeeper_<orgid> database has a `public` schema owned by the
    admin user. Lakekeeper's `migrate` step needs DDL there, but a
    non-owner role can't CREATE in public on PG 15+. EnsureRole now
    runs `ALTER DATABASE OWNER TO <role>` and (belt-and-suspenders)
    `ALTER SCHEMA public OWNER TO <role>` inside the target database
    via a re-DSN helper. Without this, the migration Job CrashLoopBackOffs
    forever and the operator never bootstraps.

  * **Lakekeeper PG SSL mode mismatch.** The operator's `sslMode`
    default is `require`, but our local Postgres has no TLS. Added a
    PGSSLMode field to LakekeeperCRSpec + ProvisioningInputs so callers
    can pass `disable` for local/dev environments. Production
    deployments leave it empty and inherit the secure default.

  * **`/management/v1/*` is at the server root, not under `/catalog`.**
    The provisioner was passing `baseURL + "/catalog"` to the management
    HTTP client, which then assembled `<x>/catalog/management/v1/warehouse`
    — a 404. /catalog is only for the Iceberg REST API that ducklings
    hit later. Now the LakekeeperClient gets the bare baseURL; the
    LakekeeperEndpoint we persist for DuckDB still includes /catalog.

The e2e test routes from the host to in-cluster Lakekeeper via the API
server's service-proxy subresource — orbstack publishes ClusterIPs but
the `.svc` hostnames aren't resolvable from the host. The clientFor
override is test-only; production runs in-cluster so the `.svc`
addresses work natively.

E2E result: full pipeline completes in ~20 seconds (image pull + PG
migrate + bootstrap + warehouse-create), reproducible across runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): satisfy errcheck on defer Close calls

CI's golangci-lint flagged 7 errcheck issues — defer X.Close() without
checking the returned error. Wrapped each in `defer func() { _ = X.Close() }()`
to satisfy the linter without changing behavior. The Close errors here
are non-actionable: we're in cleanup paths or test teardown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(configstore): backfill iceberg_backend for pre-Lakekeeper rows + gitignore tmp/

Two cross-PR review findings:

  * **Backfill migration for iceberg_backend.** Existing rows in
    duckgres_managed_warehouses that have a populated iceberg_table_
    bucket_arn but no explicit iceberg_backend would, after this PR
    series merges, be treated as Lakekeeper orgs (ResolvedBackend
    defaults empty → "lakekeeper"). The worker activator's
    AttachIcebergCatalog would dispatch to the Lakekeeper branch and
    silently skip (LakekeeperEndpoint is empty), and the controller's
    reconcileLakekeeper would fire EnsureForOrg against them on every
    poll tick. Both behaviors are wrong for S3 Tables orgs.

    The new migrateIcebergBackendBackfill stamps Backend='s3_tables' on
    any row with a populated bucket ARN and no explicit Backend.
    Tracked in duckgres_schema_migrations so it runs exactly once.

  * **tmp/ added to .gitignore.** The local prototype directory
    (tmp/lakekeeper-proto/) was never committed — git log --all
    confirmed it appeared in no branch — but a cross-PR review flagged
    that an RSA private key existed there locally. Now explicitly
    ignored so a future accidental `git add -A` can't surface the
    same risk. The local prototype's oidc/ subdirectory has been
    deleted alongside this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server,controlplane): worker-side Lakekeeper ATTACH wiring (PR2)

Wire the worker's Iceberg ATTACH and the control-plane activation
payload to handle the Lakekeeper backend in addition to the legacy
S3 Tables path. Together with PR1 (#574) this is the worker-side
counterpart to the per-org Lakekeeper provisioning machinery.

  * server/iceberg/config.go — Backend selector + Lakekeeper fields
    (Endpoint, Warehouse, ClientID, ClientSecret, OAuth2ServerURI).
    ResolvedBackend() treats empty Backend as "lakekeeper".
  * server/iceberg/migration.go — new BuildLakekeeperSecretStmt and
    BuildLakekeeperAttachStmt. The Secret statement returns "" in
    allowall mode (no OAUTH2_SERVER_URI), and the Attach statement
    emits AUTHORIZATION_TYPE 'none' instead of SECRET reference. With
    OAuth2 configured (PR3 path), it emits both.
  * server/server.go — AttachIcebergCatalog becomes a dispatcher on
    ResolvedBackend. attachLakekeeperCatalog is a parallel branch that
    skips the S3 SECRET (Lakekeeper vends STS itself via
    ACCESS_DELEGATION_MODE='vended_credentials') and uses the
    Lakekeeper SQL helpers. Same probe-and-detach posture as the S3
    Tables path so a freshly-provisioned empty warehouse doesn't break
    activation.
  * controlplane/shared_worker_activator.go — buildIcebergConfig
    helper extracted for direct testing. Resolves the
    LakekeeperClientCredentials SecretRef → OAuth2 client_secret via
    the existing readSecretValue path. PR3 will wire the lazy
    provisioning trigger; today the worker returns no-op when the
    row's LakekeeperEndpoint is empty (matches the S3 Tables empty-
    TableBucket behavior).

What does NOT change in PR2:
  * Existing s3_tables orgs: AttachIcebergCatalog still goes down the
    same code path, byte-identical SQL.
  * Orgs whose row has Backend="lakekeeper" but no LakekeeperEndpoint
    yet: AttachIcebergCatalog returns nil. Same posture as a tenant
    with iceberg.enabled=true and no TableBucketArn yet.

Tests (new):
  * server/iceberg: 5 cases for the Lakekeeper SQL builders covering
    allowall vs OAuth2 modes and quote-escaping for endpoint and
    warehouse values. Plus ResolvedBackend roundtrip.
  * controlplane: 5 cases for buildIcebergConfig covering s3_tables
    passthrough, lakekeeper allowall (no secret resolution), lakekeeper
    OAuth2 (secret resolved from K8s), empty Backend → lakekeeper
    default, and missing-secret error surfacing.

A pre-existing TestCatalogPsqlCommands/psql_dn failure on main
(unrelated to these changes) was confirmed via clean main checkout —
not a regression from this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(server,controlplane): address PR2 deep-review findings

  * **RefreshIcebergSecret is now backend-aware.** The hourly credential
    refresh cycle would have hard-errored on every Lakekeeper org with
    "no AWS credentials in activation payload" — Lakekeeper vends its
    own STS at table-load time via ACCESS_DELEGATION_MODE, so the
    iceberg_sigv4 secret simply doesn't exist there and nothing needs
    rotating. Added a Lakekeeper short-circuit that returns nil and a
    parallel test TestRefreshIcebergSecretLakekeeperNoOp.

  * **buildIcebergConfig no longer leaks backend-specific fields**
    across the selected backend. Previously the helper unconditionally
    copied TableBucketArn through, so a row migrated from S3 Tables to
    Lakekeeper would ship both LakekeeperEndpoint and a stale
    TableBucket on the wire. Now the switch on ResolvedBackend
    populates only the fields the chosen backend uses.

  * **Real-DuckDB integration test for Lakekeeper SQL** —
    TestLakekeeperSQLAcceptedByDuckDB mirrors the existing
    TestIcebergSecretAcceptedByDuckDB regression-net. The ATTACH hits
    an invalid endpoint and the SECRET hits an invalid OIDC issuer; a
    parse/bind-time error would mean a wrong keyword (e.g.
    ACCESS_DELEGATION_MODE vs DELEGATION_MODE), while a network error
    proves DuckDB accepted the option list and tried to use it.
    isParseOrBindError discriminates between the two.

  * **Dispatcher test** — TestDispatcher_BackendSelection asserts a
    Config with Backend="s3_tables" produces S3 Tables SQL and never
    Lakekeeper-only fragments, and vice versa. Includes the "stale
    TableBucket on an empty-Backend row" case explicitly, since empty
    Backend resolves to Lakekeeper.

  * **isIcebergCatalogEmptyError doc note** — added a TODO(PR3) on the
    helper explaining the patterns are S3-Tables-specific. Lakekeeper
    returns `{"namespaces":[]}` on an empty catalog (no error), so the
    probe-and-detach in attachLakekeeperCatalog handles the empty case
    via "SHOW TABLES returns 0 rows" rather than the error-string
    match. Untested against a live Lakekeeper warehouse with no
    namespaces; flagged for PR3.

The pre-existing TestRefreshIcebergSecretRejectsEmptyCredentials was
updated to set Backend=iceberg.BackendS3Tables explicitly — without
that, empty Backend now resolves to lakekeeper and the test's premise
(missing-AWS-credentials error) no longer holds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(duckdbservice): sameTenantActivationRuntime now includes Lakekeeper fields

Cross-PR review caught a real bug: the same-tenant runtime equality check
only compared the S3-Tables-shaped iceberg fields (Enabled, TableBucket,
Region, Namespace). When a Lakekeeper provisioning completed mid-flight
and the warehouse row gained an Endpoint/Warehouse/ClientID, a hot-idle
worker activated BEFORE provisioning would have been reclaimed for the
same org via reuseExistingActivation — never forced through a fresh
activation. Result: the worker silently kept running with no iceberg
catalog attached, even though the new payload carried the endpoint.

Now compares Backend + the four Lakekeeper identity fields. Excludes
LakekeeperClientSecret because (a) the activator re-resolves the same
SecretRef on each activation so it doesn't necessarily byte-stable
across calls and (b) a rotated client_secret already changes other
fields via re-provisioning.

Test TestSameTenantActivationRuntime_LakekeeperFields covers each new
field + the "secret intentionally NOT compared" contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
benben added a commit that referenced this pull request May 20, 2026
Rebase onto origin/main pulled in the Lakekeeper-as-Iceberg-catalog work
(#574, #576), which split ManagedWarehouseIceberg into two backends. The
new iceberg_backend column defaults to 'lakekeeper'; the activator
dispatches on ResolvedBackend() and silently no-ops on the s3_tables path
when Backend is empty/lakekeeper without endpoint credentials. That
explains the previous count=0 — the seed never selected s3_tables, so
AttachIcebergCatalog was never called.

Pin Backend='s3_tables' in the seed (both the INSERT values and the
ON CONFLICT update set). This is the iceberg integration test we
designed: real S3 Tables, sigv4, the wiring PR #569 introduced. The
lakekeeper backend will need its own integration test later.

Also include iceberg_backend in the diagnostic warehouse-row dump so
the same regression is one log line away from being diagnosed next time.
benben added a commit that referenced this pull request May 20, 2026
* Add DuckLake round-trip + iceberg integration tests

Extends the k8s integration suite from "DuckLake catalog is attached" to
"DuckLake actually serves writes/reads through real MinIO, survives a
worker restart, and the fork's conflict-retry path works under
concurrent writers."

Adds the first end-to-end iceberg test against real AWS S3 Tables, hard-
gated on env vars so PR CI stays fast; a dedicated iceberg CI lane sets
DUCKGRES_K8S_ICEBERG_TABLE_BUCKET_ARN (a persistent sandbox bucket,
reused across runs) and gets the real signal. Stub catalogs (LocalStack
community, moto, REST-catalog substitutes) all hit a different DuckDB
code path than ENDPOINT_TYPE 's3_tables', so the only way to gain real
confidence is to test against actual S3 Tables.

Also adds the activation-layer regression net for #563: hot-idle
reclaim with rotated STS credentials must refresh the iceberg_sigv4
secret alongside DuckLake's S3 secret, with the new credentials. The
inverse case (iceberg disabled) asserts the refresh fn is not invoked.
These run on every PR with no AWS dependency.

* Move kubeconfig safety check before destructive setupMultiTenant

setupMultiTenant() begins with `kubectl delete namespace duckgres
--ignore-not-found --wait=true` against whatever kubeconfig kubectl
picks up by default. requireLocalKindCluster — the guard documented to
prevent exactly this — was placed after that call, so it only protected
read operations on the already-deleted namespace. Today (2026-05-19)
this destroyed mw-dev's duckgres namespace for the second time; the
prior incident is what the safety guard was added for in the first
place.

Move the env-var check + kubeconfig load + requireLocalKindCluster
above the setupMultiTenant call so failing the safety check exits
fatal before any destructive kubectl runs. Anything destructive must
live below the guard block — added an inline comment to that effect so
this doesn't regress a third time.

* Make safety-guard messages spell out the danger and the right path

The original "DUCKGRES_K8S_TEST_KUBECONFIG is required" message was
technically correct but easy to misread as a missing-config error
rather than a destructive-suite refusal. The two prior incidents both
involved engineers misreading the situation and trying to "set the
missing env var" — which is exactly the wrong fix.

Rewrite both guard messages (the kubeconfig-unset path and the
requireLocalKindCluster path) to lead with REFUSING / DESTRUCTIVE, name
the specific destructive action (kubectl delete namespace duckgres),
list the contexts where it must not run (local default, dev, shared,
production), and point at `just test-k8s-integration` as the only
supported way to run.

* Fix CI bootstrap: defer mandatory kubeconfig load until after setup

The earlier safety-ordering commit broke CI: BuildConfigFromFlags was
called BEFORE setupMultiTenant, but in CI the kubeconfig file is
created BY setupMultiTenant (via kind-cluster-reset). Cold runs hit
"stat /tmp/duckgres-kind-kubeconfig: no such file or directory" before
the test bodies could run.

Split the guard into two phases:

  Phase 1 (pre-setup, mandatory env-var check + conditional file check):
    - Always require DUCKGRES_K8S_TEST_KUBECONFIG to be set
    - If the file already exists (warm local rerun), validate it via
      requireLocalKindCluster BEFORE setupMultiTenant runs. This is
      what would have stopped the mw-dev incident in the
      env-pointed-at-real-cluster variant.
    - If the file doesn't exist yet (cold CI), skip the file load.
      setupMultiTenant's opening `kubectl delete namespace` runs
      against the missing path and fails inert ("no such file") — no
      damage possible because kubectl can't connect.

  Phase 2 (post-setup, mandatory):
    - File MUST exist now; load + requireLocalKindCluster. Final
      safety net for the cold-bootstrap path. Failure aborts before
      any test body runs.

The env-unset path (the actual mw-dev incident shape) still fail-fasts
in <1s with the existing REFUSING message, verified locally.

* Iceberg test fails openly instead of silently skipping

Per request: a skip-on-missing-env-vars path hides two failure modes
that matter more than the test itself running on a given PR.

  1. CI misconfiguration. A rotated secret, renamed bucket, or
     dropped env var renders as "missing env vars — skip". The job
     reports SUCCESS, nobody notices, and the test silently stops
     running on the iceberg lane until someone happens to look.
  2. A real iceberg regression that lands during an env-var gap is
     invisible — it hides behind the same "skipped" line that a
     misconfigured lane produces.

Replace t.Skipf with t.Fatalf. Diagnostic spells out the missing vars,
explains why the test refuses to skip, and walks through the sandbox
bucket setup. Empty env vars are treated as missing (a rotated CI
secret often renders as empty rather than absent).

Tradeoff worth noting: PR CI's k8s-integration-tests job will fail
until the iceberg env vars are wired into every lane that runs the
suite. If keeping default PR CI green matters more than uniform
coverage, the follow-up is to split this test behind a build tag
(`//go:build k8s_iceberg`) so it only compiles into a dedicated
iceberg lane.

* Wire OIDC + iceberg env vars on k8s-integration-tests job

Three additions to the k8s-integration-tests job, all of which start
working the moment cloud-infra PR #8124 applies:

  1. permissions: id-token: write — lets the job mint an OIDC token
     against GitHub's IdP. Falls back to the existing top-level
     contents: read since per-job permissions override.

  2. New "Configure AWS credentials via OIDC" step. Trades the OIDC
     token for STS credentials via aws-actions/configure-aws-credentials,
     assuming the new github-duckgres-iceberg-test-role in mw-dev (role
     trust policy: repo:PostHog/duckgres:*, scoped IAM policy on the
     two test buckets only). Action pinned to the same commit SHA
     cloud-infra workflows use.

  3. Three iceberg env vars hardcoded in the job's env block — the
     bucket ARN, region, and data bucket name. AWS_ACCESS_KEY_ID /
     AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN are populated by
     configure-aws-credentials, picked up by iceberg_test.go via
     os.Getenv. Hardcoding the bucket coordinates matches the
     cloud-infra workflow convention; the buckets are persistent
     fixtures with stable names.

Until cloud-infra #8124 applies the role and buckets don't exist yet,
so this job will fail on every PR until then. That's the fail-openly
behavior we just landed for iceberg_test.go — when the role appears,
the next run goes green automatically.

* Point CI at renamed iceberg role ARN

Coordinated with PostHog/posthog-cloud-infra#8124 which renames
github-duckgres-iceberg-test-role to
github-duckgres-iceberg-ci-testing-role (Michael's review nit — the
old "test-role" suffix invited "can we delete that?" janitor risk).

Comments updated to match.

* Disable Delta catalog on iceberg-test tenant fixture

TestK8sIcebergRoundTrip's tenant activation fails with a 403 during
the Delta catalog probe:

  delta catalog configured but attachment failed: failed to probe
  Delta catalog: ... 403 Forbidden ...
  for path orgs/delta/_delta_log/_last_checkpoint
  https://s3.us-east-1.amazonaws.com/orgs/delta/_delta_log/_last_checkpoint

ManagedWarehouseS3.DeltaCatalogEnabled defaults to true in the
configstore (GORM `default:true` on the column), so every newly
seeded tenant gets Delta attached at startup whether or not it
intends to use it. The Delta extension's probe URL on this tenant
ends up pointing at a bucket segment that isn't the tenant's data
bucket — the URL the request lands at is
`https://s3.us-east-1.amazonaws.com/orgs/delta/_delta_log/_last_checkpoint`
(bucket=`orgs` in path-style), despite IAM granting s3:GetObject on
`arn:aws:s3:::posthog-duckgres-iceberg-test-data-mw-dev`.

The iceberg integration test exercises iceberg-on-S3-Tables +
DuckLake only and doesn't need Delta. Setting
s3_delta_catalog_enabled = false on the test fixture sidesteps the
probe entirely. Inline comment in the seed builder flags the
underlying URL-construction bug as a separate item to chase if/when
this test grows a Delta scenario.

Also added s3_delta_catalog_enabled to the ON CONFLICT DO UPDATE set
so re-runs against an already-seeded configstore pick up the flag.

* [debug] Probe iceberg REST endpoint from CI as OIDC role

* Propagate optional session_token through static-secret S3 cred path

The S3-credentials secret payload schema in shared_worker_activator's
buildDuckLakeConfigFromConfigStore only extracts access_key_id and
secret_access_key. The companion STS-broker path (a few lines below)
correctly sets dl.S3SessionToken. The static-secret path silently
drops it.

This works for production today because production uses long-term IAM
user keys (no session token needed). It breaks any setup that sources
credentials from STS — STS-vended creds (AccessKeyId prefixed `ASIA…`)
are rejected by AWS without the matching session token, and the
iceberg REST endpoint returns 403 (`Forbidden`) without naming the
specific cause.

Spent several hours chasing this as an iceberg IAM/Lake-Formation
configuration problem. Direct signed curl from the CI runner with
all three header values returned 200; the activator-built DuckLake
config (missing the token) returned 403 from the worker. The fix is
one extra field on the JSON payload schema, plumbed through.

Also drops the temporary debug step from ci.yml that uncovered this.

* Iceberg test: USE catalog.schema then unqualified CREATE TABLE

3-part identifiers (`iceberg.main.t_<id>`) fail through duckgres'
flight-update path with `Catalog Error: Schema with name "" not found`,
even though plain DuckDB inside the cluster accepts the same SQL
verbatim against the same bucket (verified from an in-cluster debug
pod). pg_query's parse+deparse produces clean
`CREATE TABLE iceberg.main.t_x (id int)`; the duckgres transpiler
output is identical; the failure is somewhere downstream in the
flight-update pipeline.

Sidestep with `USE iceberg.<ns>` set on the connection before each
CREATE/INSERT/SELECT/DROP, then use unqualified table names. Same
end state in S3 Tables, just expressed in the form that survives the
pipeline. MaxOpenConns=1 on the test connection (set by
openDBConnAs) keeps the USE + DDL/DML on one underlying connection.

The 3-part bug deserves its own investigation — file follow-up once
this PR lands; the test should switch back to 3-part once duckgres
treats both forms uniformly.

* Iceberg test: probe-level coverage via external table

DuckDB iceberg ext (stable v11fea8ed and core_nightly v10e97957) is
broken on the s3_tables endpoint at the schema-by-name lookup layer:
USE/CREATE TABLE/SELECT/INSERT against iceberg.<ns>.<t> all fail with
"Schema with name ... not found" even though duckdb_schemas() and
information_schema.schemata correctly report the namespace as present.
Reproduced against plain duckdb-go (no duckgres) on the real mw-dev
sandbox bucket.

Rather than block the integration test on an upstream fix, this rewires
TestK8sIcebergRoundTrip to verify everything below the broken layer:
seed the tenant, pre-create a uniquely-named iceberg table via the AWS
S3 Tables API, then verify duckgres (control plane + worker activation
+ ATTACH + duckdb_tables() listing) sees it. That still exercises the
wiring PR #569 introduced — STS session_token plumbing, OIDC role,
secret payload shape, iceberg ext load+ATTACH, sigv4 listing — and
fails openly if any of it regresses. The CREATE/INSERT/SELECT portion
is documented in the SCOPE block and gated on the upstream fix.

Tests/k8s/iceberg_test.go shells out to `aws s3tables create-table` and
`delete-table` (matching the docker-exec style used elsewhere in this
package), keeping the test self-contained without adding the s3tables
Go SDK as a dependency.

* Iceberg test: drop probe-table dependency

The previous commit pre-created an iceberg table via aws s3tables
create-table before activation so the test could verify
duckdb_tables() listing. CI showed that with a freshly-created table
(no data files yet) present at activation time, AttachIcebergCatalog's
post-ATTACH `SHOW TABLES FROM iceberg` probe errors with a "no such
table"-shaped message, the activator treats that as the
freshly-provisioned-empty-catalog case, and DETACHes — leaving
duckdb_databases() count = 0. Doesn't reproduce against plain
duckdb-go locally; whether that's timing of S3 Tables metadata
propagation or a CI-vs-local ext difference, chasing it down isn't
worth the cycle cost when the test we actually need is the wiring
end-to-end, not the listing-API content.

Drop the probe table. The remaining test still seeds the tenant
fixture, waits for the worker to come up, and asserts
duckdb_databases() shows iceberg attached. That single assertion
covers every regression PR #569 fixed: STS session_token plumbing,
OIDC role assumption, secret payload shape, iceberg extension load
+ TYPE S3 secret + ATTACH, the post-attach SHOW TABLES probe, and
flight routing from the test client into the activated worker. If
any of those breaks, this test fails openly.

The trade-off (populate the probe table via Spark/PyIceberg vs.
relax the activator's detach heuristic) is documented in the SCOPE
block so future-us picking this up has context.

* Iceberg test: poll for catalog attach to absorb activation race

waitForTenantDBReady returns as soon as the tenant's SELECT 1
succeeds, which can complete before the worker's
AttachIcebergCatalog step has finished its
install+secret+ATTACH+probe round trip against AWS. The one-shot
duckdb_databases() count query then sees 0 attached catalogs
spuriously and the test fails despite the wiring being correct.
(That window is why the test passed before in slower CI runs where
activation happened to complete before the count query landed.)

Wrap the count check in a retry that pulls until count == 1 with a
60s ceiling. Genuine activation failures still surface — the loop
gives up at the deadline and reports the last-seen count — but
ordinary multi-second ATTACH latency no longer flakes the test.

* Iceberg test: dump control-plane + worker logs on activation failure

Activation regressed silently between the last successful CI run (count=1
on commit c87cb20, May 19) and the current branch despite no changes
to controlplane/server/duckdbservice code — only test-file edits. Local
repro with the same bucket and OIDC role works fine, so the difference
is somewhere in the kind-cluster activation path. Without worker pod
logs we can't tell whether ATTACH errored, the post-attach SHOW TABLES
probe triggered a DETACH, or activation never reached AttachIcebergCatalog
at all.

Capture kubectl logs for both the control plane and the worker pods,
plus the live duckgres_managed_warehouses row for the iceberg-test
tenant, and inline them in the t.Fatalf message. The kind cluster is
torn down right after the test exits so anything not surfaced here is
gone.

* Iceberg test: expand activation diagnostics to include iceberg_* columns

* Iceberg test: seed iceberg_backend=s3_tables for activation

Rebase onto origin/main pulled in the Lakekeeper-as-Iceberg-catalog work
(#574, #576), which split ManagedWarehouseIceberg into two backends. The
new iceberg_backend column defaults to 'lakekeeper'; the activator
dispatches on ResolvedBackend() and silently no-ops on the s3_tables path
when Backend is empty/lakekeeper without endpoint credentials. That
explains the previous count=0 — the seed never selected s3_tables, so
AttachIcebergCatalog was never called.

Pin Backend='s3_tables' in the seed (both the INSERT values and the
ON CONFLICT update set). This is the iceberg integration test we
designed: real S3 Tables, sigv4, the wiring PR #569 introduced. The
lakekeeper backend will need its own integration test later.

Also include iceberg_backend in the diagnostic warehouse-row dump so
the same regression is one log line away from being diagnosed next time.

* Iceberg test: drop schema-columns dump from activation diagnostics

Added during the lakekeeper-backend investigation to detect missing
iceberg_* columns from a stale GORM auto-migrate. The actual cause
turned out to be the iceberg_backend default ('lakekeeper'), not a
missing column. The schema dump is dead weight on every future failure.

The warehouse-row dump already prints iceberg_backend so the regression
remains one log line away.
fuziontech added a commit that referenced this pull request May 20, 2026
…igger) (#579)

* feat(provisioner): add Lakekeeper REST client and Iceberg backend selector

First slice of the Lakekeeper-as-default-Iceberg-catalog work. No behavior
change for workers yet — these are the building blocks the upcoming
provisioner reconciler will use.

  * lakekeeper_client.go: thin HTTP client for Lakekeeper's management API
    surface we need (Info, Bootstrap, EnsureWarehouse). Idempotent helpers
    treat 409-on-bootstrap as success. Carries an optional bearer for the
    OIDC follow-up. Smoke test runs against a live Lakekeeper when
    LAKEKEEPER_SMOKE_URL is set; unit tests use httptest.

  * ManagedWarehouseIceberg gains Backend, Lakekeeper{Endpoint, Warehouse,
    ClientID, OAuth2ServerURI}, and a LakekeeperClientCredentials SecretRef.
    Backend defaults to "lakekeeper" via GORM tag; ResolvedBackend() applies
    the same default in code for rows migrated from earlier schemas. The
    legacy "s3_tables" backend stays as an explicit escape hatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address lakekeeper_client review feedback

  * EnsureWarehouse now handles the GET-then-POST TOCTOU race: a 409 on
    create triggers a re-list and returns the winning warehouse rather
    than surfacing the conflict. Mirrors the Bootstrap 409-as-success
    pattern. New test covers the race resolution.
  * WithBearer doc'd as not concurrency-safe and token-must-be-single-line.
    Proper rotation primitives land with the OIDC work in PR3.
  * findWarehouseByName doc'd as single-page-only — fine for the one
    warehouse per Lakekeeper instance we deploy today; needs pagination
    if we ever host multiple warehouses per instance.
  * Smoke test uses errors.As directly instead of a one-off helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): EnsureDatabase admin helper for Lakekeeper PG creation

The Lakekeeper-reuses-org-PG plan needs a lakekeeper_<orgid> database
created inside the org's existing Aurora cluster. Crossplane composition
provisions one DB per cluster today; the controlplane creates the second
one on demand at provisioning time.

  * EnsureDatabase(ctx, adminDSN, dbName) — opens a pgx connection, probes
    pg_database, runs CREATE DATABASE if absent. Handles the 42P04
    duplicate-database SQLSTATE as a benign concurrent-create outcome.
  * Identifier validation via isSafePGIdent regex + quoteIdent escaping
    (belt and suspenders — CREATE DATABASE doesn't accept query parameters
    so the identifier must be interpolated).
  * Pure function over a DSN: how the admin DSN gets sourced from the K8s
    Secret managed by Crossplane is a separate concern that lands with the
    provisioner reconciler.

Tests: unit tests for isSafePGIdent, quoteIdent, unsafe-ident rejection.
Live-PG integration test gated on PG_ADMIN_DSN — passes against the
prototype Postgres in tmp/lakekeeper-proto.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): K8s helpers for per-org Lakekeeper CRs and Secrets

LakekeeperK8sClient wraps the dynamic client (for the Lakekeeper CR) and
the typed clientset (for the backing Secret) so the provisioner reconciler
can drive both with one dependency. Mirrors the existing DucklingClient
construction pattern; same //go:build kubernetes isolation.

  * EnsureSecret(orgID, data) — creates or updates an Opaque Secret in
    the lakekeeper namespace holding db-user/db-password/encryption-key/
    oauth2-client-secret. Stamped with the duckgres/active-org label so
    the NetworkPolicy lands cleanly later.
  * EnsureCR(spec) — renders and applies the Lakekeeper CR (allowall
    authz, bootstrap.enabled=true, server.baseURI for clients). Idempotent
    create-or-update via dynamic client; replaces the spec wholesale on
    update. Required-field validation up front.
  * GetCR returns (nil, nil) for missing CRs so the reconciler can use
    "not yet created" as a state signal rather than an error path.

Tests use the dynamic fake + kubefake clientset (matches controller_test
pattern). int64-cast numeric fields in the unstructured map because the
DeepCopyJSON validator rejects int32.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address round-2 review feedback

  * EnsureCR now validates OrgID up front. An empty OrgID previously
    fell through to LakekeeperResourceName("") and would have created a
    'lakekeeper-' resource — wrong but hard to spot.
  * Both EnsureCR and EnsureSecret validate orgID against the K8s
    label-value grammar. The resource-name path stripped hyphens, but
    label values went through unsanitised, so a malformed orgID would
    have surfaced as an opaque API server rejection. A clear error from
    the helper is much easier to debug.
  * Tests cover empty + unsafe orgID paths and the isValidOrgIDLabel
    rule set (UUID-style IDs pass; leading hyphen, embedded spaces, and
    >63 chars are rejected).
  * TestEnsureCR_IdempotentUpdate now reads back the CR after the
    second call and asserts the image field actually changed, so the
    update path has a real regression guard rather than just "no error".
  * isDuplicateDatabase uses errors.As instead of a hand-rolled unwrap
    loop. Handles errors.Join-style multi-error chains.
  * Doc note on EnsureCR/EnsureSecret update conflicts: the wrapped
    error preserves apierrors.IsConflict for transient handling, so the
    reconciler can requeue rather than treating it as fatal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): LakekeeperProvisioner reconciler

LakekeeperProvisioner.EnsureForOrg chains the existing helpers into one
idempotent pipeline:

  1. EnsureDatabase: CREATE DATABASE lakekeeper_<orgid> on the org's
     Aurora via the admin DSN supplied by the caller.
  2. resolveOrGenerateSecret: read the per-org Secret if present, or
     generate fresh creds (16/16/32-byte hex randoms) and write a new
     Secret. Re-running never rotates — that would silently desync the
     Postgres user's password from the Lakekeeper config.
  3. EnsureCR with the per-org spec (allowall authz, bootstrap enabled,
     PG host/port/db, BaseURI matching the operator-created Service).
  4. waitForBootstrap: poll the CR until status.bootstrappedAt is set
     (operator's job). On timeout returns the typed ErrBootstrapPending
     so the caller can requeue without treating it as fatal.
  5. EnsureWarehouse via REST: idempotent storage profile create with
     sts-enabled + s3-compat + remote-signing-disabled (the prototype
     proved this is the shape DuckDB-iceberg wants).
  6. Persist endpoint, client_id, and the SecretRef pointing at the
     OAuth2 client_secret back into ManagedWarehouseIceberg. Flips
     IcebergState to Ready.

The reconciler doesn't bake in K8s Secret lookup for the AdminDSN — the
caller computes ProvisioningInputs (admin DSN, PG host, S3 config) from
whatever source applies in their context. PR2 wiring will resolve these
from the Duckling CR status and the Crossplane-managed admin secret.

A side-bug fix in EnsureCR: the Update path was overwriting the existing
status block (the fake dynamic client doesn't honor the status
subresource split). Now copies status from existing CR before Update so
spec drift correction never clobbers operator-set status — which is the
right contract on real K8s too.

Tests:
  * TestEnsureForOrg_HappyPath_Fakes (PG-gated) — full e2e against a
    real Postgres + httptest Lakekeeper. Verifies row updates, exact
    storage-profile shape, and idempotency on a second call.
  * TestEnsureForOrg_BootstrapTimeoutReturnsTransient (PG-gated) —
    short timeout, no status injection; expects ErrBootstrapPending.
  * Validation tests for missing/invalid inputs and unsafe org IDs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address round-3 review feedback

  * configstore: typed ErrWarehouseStateMismatch sentinel. UpdateWarehouseState
    now wraps the "rows affected = 0" error with this so callers can use
    errors.Is to distinguish "CAS guard fired" from a real failure. The
    fakeStore in tests follows the same contract.

  * EnsureForOrg treats ErrWarehouseStateMismatch as benign success: by
    the time the row update fires, the Lakekeeper REST resources are
    already provisioned, and the next reconcile iteration will retry the
    persist step with the row's fresh state. Previously this was a
    confusing fatal error after a successful Lakekeeper write. New test
    TestEnsureForOrg_StateMismatchIsBenign locks the contract in.

  * secretFromExisting reads only from Secret.Data — the API server clears
    StringData on read, so the dual-branch check was dead code in
    production. Test coverage is preserved (the fake echoes StringData
    back; assertSecretData handles both paths there).

  * waitForBootstrap reuses one time.Timer instead of allocating a fresh
    timer per poll via time.After. Trivial leak that adds up over the
    bootstrap window.

  * Replaced the hand-rolled isErrBootstrap helper with errors.Is —
    handles wrapped errors and errors.Join chains correctly.

  * TestFakeStoreUpdateWarehouseState updated to assert the new
    ErrWarehouseStateMismatch contract on CAS-miss.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(k8s): baseline NetworkPolicy for lakekeeper namespace

Default-deny on both ingress and egress, plus two scoped exceptions:
  * ingress 8181/TCP from any pod app=duckgres-worker in the duckgres
    namespace
  * egress to DNS, Postgres (5432), HTTPS (443), MinIO (9000)

Per-org isolation (only org X's workers can reach org X's Lakekeeper)
is deliberately out of scope here. The lakekeeper-operator only stamps
the standard app.kubernetes.io/{name,instance,managed-by} labels and
doesn't propagate CR labels onto the Deployment, so a NetworkPolicy
keyed on duckgres/active-org never matches. Two follow-up paths
documented inline:
  (a) upstream operator patch to propagate CR labels, or
  (b) per-org CiliumClusterwideNetworkPolicy templated in charts repo
      matching app.kubernetes.io/instance=lakekeeper-<orgid-suffix>.

This still gives a meaningful cluster boundary today; intra-cluster
org-to-org boundary lands with PR3 alongside OIDC.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address PR1 deep-review findings

Five real fixes from the holistic PR review:

  * **EnsureRole**: the missing piece that would have broken every first
    activation. A fresh database has no users, but Lakekeeper logs in as
    a role named after the database. EnsureRole creates/alters the
    login role with the password from the K8s Secret, then grants ALL
    on the database. Idempotent: re-runs with the same password are a
    no-op-ish ALTER; rotated passwords ALTER to the new one. Live-PG
    integration test verifies create + idempotent + rotate cycle.

  * **Bootstrap-pending no longer blocks the reconcile loop.** Replaced
    waitForBootstrap (up-to-30s internal poll) with checkBootstrap (one
    GetCR call). Returns ErrBootstrapPending immediately so the outer
    reconciler can requeue without stalling other orgs behind a slow
    operator bootstrap. WithBootstrapTimeout option removed since no
    polling happens anymore.

  * **UpdateIcebergConfig**: new WarehouseStore method that writes
    iceberg_* columns without a top-level state CAS. The previous code
    CASed on w.State, which silently swallowed the persist whenever the
    Duckling controller had already transitioned the row to Ready. Now
    Lakekeeper config always lands. New test
    TestEnsureForOrg_PersistsAfterTopLevelStateMoved locks this in.

  * **EncryptionKey size 16→32 bytes** (32 hex chars → 64 hex chars).
    Comment claimed "32-byte key" but mustRandomHex(16) only produces
    16 bytes of entropy. Bumped DBPassword to 32 too — same generator,
    consistent posture.

  * **EnsureSecret writes Data, not StringData**. The K8s API server
    converts StringData→Data on write, so production reads only ever
    see Data. The dynamic-fake clientset doesn't do that conversion, so
    writing Data directly makes the fake match real K8s behavior.
    secretFromExisting now reads Data exclusively without the dead-code
    fallback.

Plus three smaller items:

  * NetworkPolicy comment clarifies the `app.kubernetes.io/name` selector
    is for operator-stamped Deployment pods, and the `app: lakekeeper`
    label we put on the CR / Secret is intentionally non-overlapping.
  * Field comments on LakekeeperWarehouse and LakekeeperOAuth2ServerURI
    document the wire contract PR2 will consume.
  * isSafePGPassword + quoteLiteral added for the EnsureRole code path.
    The password whitelist is conservative (alphanumeric + a handful of
    symbols) so SQL injection via the unquoted password literal is
    structurally impossible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): orbstack e2e test + bugs it surfaced

End-to-end test that drives EnsureForOrg against a real Kubernetes
cluster (orbstack), the real lakekeeper-operator, a real Postgres, and
real MinIO. Skipped unless LAKEKEEPER_E2E_KUBECONFIG is set.

The test caught three real bugs that the dynamic-fake clientset and
httptest server couldn't:

  * **Postgres 15+ public-schema permissions.** The freshly-created
    lakekeeper_<orgid> database has a `public` schema owned by the
    admin user. Lakekeeper's `migrate` step needs DDL there, but a
    non-owner role can't CREATE in public on PG 15+. EnsureRole now
    runs `ALTER DATABASE OWNER TO <role>` and (belt-and-suspenders)
    `ALTER SCHEMA public OWNER TO <role>` inside the target database
    via a re-DSN helper. Without this, the migration Job CrashLoopBackOffs
    forever and the operator never bootstraps.

  * **Lakekeeper PG SSL mode mismatch.** The operator's `sslMode`
    default is `require`, but our local Postgres has no TLS. Added a
    PGSSLMode field to LakekeeperCRSpec + ProvisioningInputs so callers
    can pass `disable` for local/dev environments. Production
    deployments leave it empty and inherit the secure default.

  * **`/management/v1/*` is at the server root, not under `/catalog`.**
    The provisioner was passing `baseURL + "/catalog"` to the management
    HTTP client, which then assembled `<x>/catalog/management/v1/warehouse`
    — a 404. /catalog is only for the Iceberg REST API that ducklings
    hit later. Now the LakekeeperClient gets the bare baseURL; the
    LakekeeperEndpoint we persist for DuckDB still includes /catalog.

The e2e test routes from the host to in-cluster Lakekeeper via the API
server's service-proxy subresource — orbstack publishes ClusterIPs but
the `.svc` hostnames aren't resolvable from the host. The clientFor
override is test-only; production runs in-cluster so the `.svc`
addresses work natively.

E2E result: full pipeline completes in ~20 seconds (image pull + PG
migrate + bootstrap + warehouse-create), reproducible across runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): satisfy errcheck on defer Close calls

CI's golangci-lint flagged 7 errcheck issues — defer X.Close() without
checking the returned error. Wrapped each in `defer func() { _ = X.Close() }()`
to satisfy the linter without changing behavior. The Close errors here
are non-actionable: we're in cleanup paths or test teardown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(configstore): backfill iceberg_backend for pre-Lakekeeper rows + gitignore tmp/

Two cross-PR review findings:

  * **Backfill migration for iceberg_backend.** Existing rows in
    duckgres_managed_warehouses that have a populated iceberg_table_
    bucket_arn but no explicit iceberg_backend would, after this PR
    series merges, be treated as Lakekeeper orgs (ResolvedBackend
    defaults empty → "lakekeeper"). The worker activator's
    AttachIcebergCatalog would dispatch to the Lakekeeper branch and
    silently skip (LakekeeperEndpoint is empty), and the controller's
    reconcileLakekeeper would fire EnsureForOrg against them on every
    poll tick. Both behaviors are wrong for S3 Tables orgs.

    The new migrateIcebergBackendBackfill stamps Backend='s3_tables' on
    any row with a populated bucket ARN and no explicit Backend.
    Tracked in duckgres_schema_migrations so it runs exactly once.

  * **tmp/ added to .gitignore.** The local prototype directory
    (tmp/lakekeeper-proto/) was never committed — git log --all
    confirmed it appeared in no branch — but a cross-PR review flagged
    that an RSA private key existed there locally. Now explicitly
    ignored so a future accidental `git add -A` can't surface the
    same risk. The local prototype's oidc/ subdirectory has been
    deleted alongside this commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server,controlplane): worker-side Lakekeeper ATTACH wiring (PR2)

Wire the worker's Iceberg ATTACH and the control-plane activation
payload to handle the Lakekeeper backend in addition to the legacy
S3 Tables path. Together with PR1 (#574) this is the worker-side
counterpart to the per-org Lakekeeper provisioning machinery.

  * server/iceberg/config.go — Backend selector + Lakekeeper fields
    (Endpoint, Warehouse, ClientID, ClientSecret, OAuth2ServerURI).
    ResolvedBackend() treats empty Backend as "lakekeeper".
  * server/iceberg/migration.go — new BuildLakekeeperSecretStmt and
    BuildLakekeeperAttachStmt. The Secret statement returns "" in
    allowall mode (no OAUTH2_SERVER_URI), and the Attach statement
    emits AUTHORIZATION_TYPE 'none' instead of SECRET reference. With
    OAuth2 configured (PR3 path), it emits both.
  * server/server.go — AttachIcebergCatalog becomes a dispatcher on
    ResolvedBackend. attachLakekeeperCatalog is a parallel branch that
    skips the S3 SECRET (Lakekeeper vends STS itself via
    ACCESS_DELEGATION_MODE='vended_credentials') and uses the
    Lakekeeper SQL helpers. Same probe-and-detach posture as the S3
    Tables path so a freshly-provisioned empty warehouse doesn't break
    activation.
  * controlplane/shared_worker_activator.go — buildIcebergConfig
    helper extracted for direct testing. Resolves the
    LakekeeperClientCredentials SecretRef → OAuth2 client_secret via
    the existing readSecretValue path. PR3 will wire the lazy
    provisioning trigger; today the worker returns no-op when the
    row's LakekeeperEndpoint is empty (matches the S3 Tables empty-
    TableBucket behavior).

What does NOT change in PR2:
  * Existing s3_tables orgs: AttachIcebergCatalog still goes down the
    same code path, byte-identical SQL.
  * Orgs whose row has Backend="lakekeeper" but no LakekeeperEndpoint
    yet: AttachIcebergCatalog returns nil. Same posture as a tenant
    with iceberg.enabled=true and no TableBucketArn yet.

Tests (new):
  * server/iceberg: 5 cases for the Lakekeeper SQL builders covering
    allowall vs OAuth2 modes and quote-escaping for endpoint and
    warehouse values. Plus ResolvedBackend roundtrip.
  * controlplane: 5 cases for buildIcebergConfig covering s3_tables
    passthrough, lakekeeper allowall (no secret resolution), lakekeeper
    OAuth2 (secret resolved from K8s), empty Backend → lakekeeper
    default, and missing-secret error surfacing.

A pre-existing TestCatalogPsqlCommands/psql_dn failure on main
(unrelated to these changes) was confirmed via clean main checkout —
not a regression from this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(server,controlplane): address PR2 deep-review findings

  * **RefreshIcebergSecret is now backend-aware.** The hourly credential
    refresh cycle would have hard-errored on every Lakekeeper org with
    "no AWS credentials in activation payload" — Lakekeeper vends its
    own STS at table-load time via ACCESS_DELEGATION_MODE, so the
    iceberg_sigv4 secret simply doesn't exist there and nothing needs
    rotating. Added a Lakekeeper short-circuit that returns nil and a
    parallel test TestRefreshIcebergSecretLakekeeperNoOp.

  * **buildIcebergConfig no longer leaks backend-specific fields**
    across the selected backend. Previously the helper unconditionally
    copied TableBucketArn through, so a row migrated from S3 Tables to
    Lakekeeper would ship both LakekeeperEndpoint and a stale
    TableBucket on the wire. Now the switch on ResolvedBackend
    populates only the fields the chosen backend uses.

  * **Real-DuckDB integration test for Lakekeeper SQL** —
    TestLakekeeperSQLAcceptedByDuckDB mirrors the existing
    TestIcebergSecretAcceptedByDuckDB regression-net. The ATTACH hits
    an invalid endpoint and the SECRET hits an invalid OIDC issuer; a
    parse/bind-time error would mean a wrong keyword (e.g.
    ACCESS_DELEGATION_MODE vs DELEGATION_MODE), while a network error
    proves DuckDB accepted the option list and tried to use it.
    isParseOrBindError discriminates between the two.

  * **Dispatcher test** — TestDispatcher_BackendSelection asserts a
    Config with Backend="s3_tables" produces S3 Tables SQL and never
    Lakekeeper-only fragments, and vice versa. Includes the "stale
    TableBucket on an empty-Backend row" case explicitly, since empty
    Backend resolves to Lakekeeper.

  * **isIcebergCatalogEmptyError doc note** — added a TODO(PR3) on the
    helper explaining the patterns are S3-Tables-specific. Lakekeeper
    returns `{"namespaces":[]}` on an empty catalog (no error), so the
    probe-and-detach in attachLakekeeperCatalog handles the empty case
    via "SHOW TABLES returns 0 rows" rather than the error-string
    match. Untested against a live Lakekeeper warehouse with no
    namespaces; flagged for PR3.

The pre-existing TestRefreshIcebergSecretRejectsEmptyCredentials was
updated to set Backend=iceberg.BackendS3Tables explicitly — without
that, empty Backend now resolves to lakekeeper and the test's premise
(missing-AWS-credentials error) no longer holds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(duckdbservice): sameTenantActivationRuntime now includes Lakekeeper fields

Cross-PR review caught a real bug: the same-tenant runtime equality check
only compared the S3-Tables-shaped iceberg fields (Enabled, TableBucket,
Region, Namespace). When a Lakekeeper provisioning completed mid-flight
and the warehouse row gained an Endpoint/Warehouse/ClientID, a hot-idle
worker activated BEFORE provisioning would have been reclaimed for the
same org via reuseExistingActivation — never forced through a fresh
activation. Result: the worker silently kept running with no iceberg
catalog attached, even though the new payload carried the endpoint.

Now compares Backend + the four Lakekeeper identity fields. Excludes
LakekeeperClientSecret because (a) the activator re-resolves the same
SecretRef on each activation so it doesn't necessarily byte-stable
across calls and (b) a rotated client_secret already changes other
fields via re-provisioning.

Test TestSameTenantActivationRuntime_LakekeeperFields covers each new
field + the "secret intentionally NOT compared" contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(provisioner): wire LakekeeperProvisioner into controller reconcile (PR3)

Adds the trigger that turns PR1's manual EnsureForOrg into automatic
per-org Lakekeeper provisioning. Until now, an org with
Iceberg.Backend=lakekeeper would sit indefinitely with an empty
LakekeeperEndpoint and the worker's AttachIcebergCatalog would no-op.
This PR closes that gap by running the provisioner once per reconcile
tick for each Ready warehouse that needs it.

  * Controller gains optional lakekeeperProvisioner +
    LakekeeperInputsResolver dependencies, set via
    WithLakekeeperProvisioner. Both nil → reconcileLakekeeper is a
    silent no-op so existing deployments (operator not installed) are
    unaffected.
  * reconcileLakekeeper runs from reconcileReady, after the iceberg
    status propagation so the controller sees the freshest view of
    w.Iceberg. Gated on Iceberg.Enabled + Backend=lakekeeper +
    LakekeeperEndpoint=="" so the early-return path is the common
    case for already-provisioned warehouses.
  * ErrBootstrapPending is treated as a normal "requeue next tick"
    signal (logged at debug). Any other error is logged at warn and
    the loop continues with the next warehouse.
  * LakekeeperInputsResolver is a function the controller calls to
    build ProvisioningInputs from the warehouse + a K8s Secret managed
    by Crossplane. The convention for sourcing the admin DSN isn't
    baked in here — it's a per-deployment concern that the caller
    plumbs at controller-construction time.

What's NOT in this PR:
  * The actual prod implementation of LakekeeperInputsResolver. That
    needs the Crossplane composition's master-credential Secret name
    pattern, which lives outside this repo. Wiring it is a follow-up.
  * OIDC SA-token auth — orgs provisioned through this trigger get
    Lakekeeper in allowall mode + NetworkPolicy isolation. PR4 swaps
    that for OIDC.

Tests: 5 unit tests covering the controller's gate logic — silent
no-op when provisioner is unconfigured, when Iceberg.Enabled=false,
when Backend=s3_tables, when LakekeeperEndpoint is already set, and
graceful continue when the inputs resolver returns an error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fixup(provisioner): address PR3 deep-review findings

  * **Stale w.Iceberg fixed.** The "Lakekeeper reconcile sees the most
    recent view" comment was wrong — the iceberg-propagation write went
    to the DB only; the in-memory pointer stayed stale. New
    applyIcebergUpdatesToWarehouse mirrors the persisted updates onto
    w right after the store write succeeds, so reconcileLakekeeper sees
    the freshest values without a re-read round-trip. Misleading
    comment removed.

  * **WithLakekeeperProvisioner now validates non-nil args** via panic.
    Partial wiring would silently disable the reconcile step; a panic
    surfaces the misconfiguration at startup instead. Two new tests
    cover the panic paths.

  * **Positive-path test added.** TestReconcileLakekeeper_HappyPath
    builds a real LakekeeperProvisioner with fakes (fake K8s client +
    httptest Lakekeeper + live prototype PG), seeds the CR with
    status.bootstrappedAt, and asserts that:
      - EnsureForOrg is actually invoked when all gates pass
      - The inputs resolver's ProvisioningInputs reach the provisioner
        field-for-field intact
      - The warehouse row gets the persisted Lakekeeper config
        (endpoint, warehouse name) at the end
    Closes the "tests only verify skip paths" gap from the review.

  * **TODO on duplicate Duckling Get round-trips.** reconcileReady now
    fetches the same CR three times when iceberg is enabled (pgbouncer
    drift, iceberg drift, status propagation) — flagged for
    consolidation when another caller adds a fourth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant