feat(provisioner): Lakekeeper-as-Iceberg-catalog — PR3 (controller trigger)#579
Merged
Conversation
5 tasks
64d3b7f to
17c2df5
Compare
e42ecd8 to
d91a322
Compare
This was referenced May 20, 2026
…le (PR3)
Adds the trigger that turns PR1's manual EnsureForOrg into automatic
per-org Lakekeeper provisioning. Until now, an org with
Iceberg.Backend=lakekeeper would sit indefinitely with an empty
LakekeeperEndpoint and the worker's AttachIcebergCatalog would no-op.
This PR closes that gap by running the provisioner once per reconcile
tick for each Ready warehouse that needs it.
* Controller gains optional lakekeeperProvisioner +
LakekeeperInputsResolver dependencies, set via
WithLakekeeperProvisioner. Both nil → reconcileLakekeeper is a
silent no-op so existing deployments (operator not installed) are
unaffected.
* reconcileLakekeeper runs from reconcileReady, after the iceberg
status propagation so the controller sees the freshest view of
w.Iceberg. Gated on Iceberg.Enabled + Backend=lakekeeper +
LakekeeperEndpoint=="" so the early-return path is the common
case for already-provisioned warehouses.
* ErrBootstrapPending is treated as a normal "requeue next tick"
signal (logged at debug). Any other error is logged at warn and
the loop continues with the next warehouse.
* LakekeeperInputsResolver is a function the controller calls to
build ProvisioningInputs from the warehouse + a K8s Secret managed
by Crossplane. The convention for sourcing the admin DSN isn't
baked in here — it's a per-deployment concern that the caller
plumbs at controller-construction time.
What's NOT in this PR:
* The actual prod implementation of LakekeeperInputsResolver. That
needs the Crossplane composition's master-credential Secret name
pattern, which lives outside this repo. Wiring it is a follow-up.
* OIDC SA-token auth — orgs provisioned through this trigger get
Lakekeeper in allowall mode + NetworkPolicy isolation. PR4 swaps
that for OIDC.
Tests: 5 unit tests covering the controller's gate logic — silent
no-op when provisioner is unconfigured, when Iceberg.Enabled=false,
when Backend=s3_tables, when LakekeeperEndpoint is already set, and
graceful continue when the inputs resolver returns an error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* **Stale w.Iceberg fixed.** The "Lakekeeper reconcile sees the most
recent view" comment was wrong — the iceberg-propagation write went
to the DB only; the in-memory pointer stayed stale. New
applyIcebergUpdatesToWarehouse mirrors the persisted updates onto
w right after the store write succeeds, so reconcileLakekeeper sees
the freshest values without a re-read round-trip. Misleading
comment removed.
* **WithLakekeeperProvisioner now validates non-nil args** via panic.
Partial wiring would silently disable the reconcile step; a panic
surfaces the misconfiguration at startup instead. Two new tests
cover the panic paths.
* **Positive-path test added.** TestReconcileLakekeeper_HappyPath
builds a real LakekeeperProvisioner with fakes (fake K8s client +
httptest Lakekeeper + live prototype PG), seeds the CR with
status.bootstrappedAt, and asserts that:
- EnsureForOrg is actually invoked when all gates pass
- The inputs resolver's ProvisioningInputs reach the provisioner
field-for-field intact
- The warehouse row gets the persisted Lakekeeper config
(endpoint, warehouse name) at the end
Closes the "tests only verify skip paths" gap from the review.
* **TODO on duplicate Duckling Get round-trips.** reconcileReady now
fetches the same CR three times when iceberg is enabled (pgbouncer
drift, iceberg drift, status propagation) — flagged for
consolidation when another caller adds a fourth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…A-token auth) (#580) * feat(provisioner,worker): OIDC SA-token auth via in-process broker (PR4) Adds defense-in-depth for the duckling → Lakekeeper path. Until this PR, isolation between orgs depended entirely on NetworkPolicy + allowall on the Lakekeeper side. PR4 turns on the operator's authentication.kubernetes mode so Lakekeeper validates the duckling's projected SA token against the K8s TokenReview API before accepting any catalog request — even if a NetworkPolicy is misconfigured, only ducklings with a valid SA token signed by the cluster CA can talk to the catalog. DuckDB's iceberg extension fetches the bearer via OAuth2 (POST client_credentials), not by reading a file. The bridge is the new in-process broker that the worker runs on a loopback port. * server/lakekeeperbroker (NEW) — tiny HTTP server, loopback-only, handles POST /token by re-reading a projected SA token from disk each request and wrapping it as an OAuth2 response. Kubelet rotates the file in place; no in-process caching. Refuses to bind to non-loopback (exposing the SA token to any other caller would be a real leak). 10 unit tests cover happy path, GET rejection, file missing/empty 503s, health endpoint, double-start protection, expires_in override, non-loopback refusal. * cmd/duckgres-worker — starts the broker when DUCKGRES_LAKEKEEPER_ TOKEN_PATH is set. When unset (every existing duckling pod today), no broker starts and behavior is unchanged. * LakekeeperCRSpec gains KubernetesAuthAudiences. Non-empty populates spec.authentication.kubernetes on the CR (the operator turns this into LAKEKEEPER__K8S_AUTH_ENABLED=true + LAKEKEEPER__K8S_AUTH_AUDIENCES=<csv>). Empty omits the block entirely — Lakekeeper continues running in allowall mode. * ProvisioningInputs.KubernetesAuthAudiences threads the audience list through to the CR. When non-empty, EnsureForOrg also writes LakekeeperOAuth2ServerURI=http://127.0.0.1:9876/token (the worker-local broker) to the warehouse row, so the worker's server/iceberg ATTACH builder emits the OAuth2 secret + ATTACH instead of the AUTHORIZATION_TYPE 'none' form. What's NOT in this PR: * Pod spec / chart changes adding the projected SA volume mount — that's a follow-up in the charts repo where the duckling pod template lives. The broker is dormant until the env var + token file are wired by ops. * Changes to the controller's InputsResolver to set KubernetesAuthAudiences. The flag lives in ProvisioningInputs and callers opt in when ready; the prod resolver implementation is still on the deferred list (task #24). Tests: 4 new — 2 for the CR's authentication block (on/off), 2 for the OAUTH2_SERVER_URI population (OIDC mode → 127.0.0.1:9876; allowall mode → empty). Live-PG-gated. Plus the 10 broker unit tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fixup(provisioner,broker): address PR4 deep-review findings * **Pragma: no-cache** added to /token response. RFC 6749 §5.1 mandates both Cache-Control: no-store AND Pragma: no-cache; we only had the former. The HTTP/1.0 Pragma header is irrelevant on a loopback connection to DuckDB, but the fix is one line and the broker is otherwise spec-compliant. Test asserts both headers now. * **Cross-check on the OIDC test** — TestEnsureForOrg_PersistsOAuth2URIWhenKubernetesAuthOn now reads the Lakekeeper CR back from the fake dynamic client and asserts that spec.authentication.kubernetes.enabled is true with the right audiences IN THE SAME EnsureForOrg call. Without this, the DB row could carry the broker URL while the CR stayed in allowall mode — Lakekeeper would reject every token. A future refactor that splits or reorders the wiring would now fail the test instead of silently deploying broken auth. * **WithExpiresIn TODO** — added a TODO(PR5) noting the env-var wiring for the override is part of the same pod-spec work that lands the projected SA volume. The 60s default is intentional; the option is pre-staged for when the override actually has somewhere to live. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fixup(provisioner): doc OIDC flag-day deploy ordering on KubernetesAuthAudiences Cross-PR review caught that once any org gets a non-empty KubernetesAuthAudiences value, the provisioner writes LakekeeperOAuth2ServerURI=http://127.0.0.1:9876/token to the row, and that value is permanent (no path clears it). Ducklings whose pod spec hasn't yet been wired to (a) mount the projected SA token at DUCKGRES_LAKEKEEPER_TOKEN_PATH and (b) start the broker on 9876 will have iceberg ATTACH fail with connection refused. Documents the required deploy ordering on the struct field comment: ship the pod spec change first, then the operator chart change, then flip the audiences in the inputs resolver. Codified guardrail is a follow-up — the provisioner would need a signal that the worker image has the broker compiled in (PR4 already ensures that) AND that the runtime env has DUCKGRES_LAKEKEEPER_TOKEN_PATH set, which only the cluster operator knows. For now, the comment is the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…R5) (#583) Enables the per-org Lakekeeper provisioning branch end-to-end. The provisioning controller already had WithLakekeeperProvisioner (PR3) but nothing called it, so reconcileLakekeeper was inert in every deployment. - newLakekeeperInputsResolver resolves per-org ProvisioningInputs: 1. Crossplane Duckling CR status (prod) — the metadata-store master creds double as the admin connection that CREATEs the lakekeeper_<orgid> db/role; data-store bucket is the S3 warehouse. Admin DDL + the Lakekeeper pod target the DIRECT Aurora endpoint, never the PgBouncer pooler (transaction pooling breaks CREATE DATABASE and Lakekeeper's own migrations). S3 uses pod IRSA. 2. Env fallback (dev/orbstack + MinIO) when no usable Duckling CR. - multitenant.go wires it after NewController, gated behind DUCKGRES_LAKEKEEPER_PROVISIONER_ENABLED (off by default; best-effort if the K8s client can't be built). S3-Tables warehouses are unaffected. - KubernetesAuthAudiences left empty: this is the allowall + NetworkPolicy deployment shape. OIDC SA-token auth remains a separate flag-day change. Builds clean with -tags kubernetes; resolver unit tests cover the CR path, env fallback, CR-error fallback, and the no-source error. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
8a76b67 to
5e5ac44
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Third of the Lakekeeper-as-Iceberg-catalog series. Adds the trigger that turns PR1's manual `EnsureForOrg` into automatic per-org provisioning by wiring it into the existing `Controller` reconcile loop. Targets `lakekeeper-pr2-worker-wiring` as the base — stacked PR; merge #574 and #576 first.
Without this PR, an org with `Iceberg.Backend="lakekeeper"` would sit indefinitely with an empty `LakekeeperEndpoint` and the worker's `AttachIcebergCatalog` would no-op forever. PR3 closes that gap by running the provisioner once per reconcile tick for each `Ready` warehouse that needs it.
What's in this PR
What's NOT in this PR
Test plan
Stacked
🤖 Generated with Claude Code