feat(provisioner,worker): Lakekeeper-as-Iceberg-catalog — PR4 (OIDC SA-token auth) #580
Merged
fuziontech merged 3 commits into lakekeeper-pr3-provisioning-trigger on May 20, 2026
Adds defense-in-depth for the duckling → Lakekeeper path. Until this
PR, isolation between orgs depended entirely on NetworkPolicy +
allowall on the Lakekeeper side. PR4 turns on the operator's
authentication.kubernetes mode so Lakekeeper validates the duckling's
projected SA token against the K8s TokenReview API before accepting
any catalog request. Even if a NetworkPolicy is misconfigured, only
ducklings presenting a valid SA token issued by the cluster's API
server can talk to the catalog.
DuckDB's iceberg extension fetches the bearer via OAuth2 (POST
client_credentials), not by reading a file. The bridge is the new
in-process broker that the worker runs on a loopback port.
* server/lakekeeperbroker (NEW) — tiny HTTP server, loopback-only,
handles POST /token by re-reading a projected SA token from disk
each request and wrapping it as an OAuth2 response. Kubelet
rotates the file in place; no in-process caching. Refuses to bind
to non-loopback (exposing the SA token to any other caller would
be a real leak). 10 unit tests cover happy path, GET rejection,
file missing/empty 503s, health endpoint, double-start protection,
expires_in override, non-loopback refusal.
* cmd/duckgres-worker starts the broker when
DUCKGRES_LAKEKEEPER_TOKEN_PATH is set. When unset (every existing
duckling pod today), no broker starts and behavior is unchanged.
* LakekeeperCRSpec gains KubernetesAuthAudiences. Non-empty
populates spec.authentication.kubernetes on the CR (the operator
turns this into LAKEKEEPER__K8S_AUTH_ENABLED=true +
LAKEKEEPER__K8S_AUTH_AUDIENCES=<csv>). Empty omits the block
entirely — Lakekeeper continues running in allowall mode.
* ProvisioningInputs.KubernetesAuthAudiences threads the audience
list through to the CR. When non-empty, EnsureForOrg also writes
LakekeeperOAuth2ServerURI=http://127.0.0.1:9876/token (the
worker-local broker) to the warehouse row, so the worker's
server/iceberg ATTACH builder emits the OAuth2 secret + ATTACH
instead of the AUTHORIZATION_TYPE 'none' form.
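The opt-in behavior in the last two bullets can be sketched as a pair of pure functions. This is a minimal illustration, not the provisioner's actual code: `authenticationBlock` and `oauth2ServerURI` are hypothetical helper names, and the `audiences` key under `spec.authentication.kubernetes` is an assumption (the commit messages confirm `enabled` and mention audiences, but not the exact field spelling).

```go
package main

// brokerTokenURL is the worker-local broker endpoint named in the commit message.
const brokerTokenURL = "http://127.0.0.1:9876/token"

// authenticationBlock returns the value for spec.authentication on the
// Lakekeeper CR, or nil when the block should be omitted (allowall mode).
// Hypothetical helper; the "audiences" key name is an assumption.
func authenticationBlock(audiences []string) map[string]any {
	if len(audiences) == 0 {
		return nil
	}
	// Copy into []any so the value can be embedded in an unstructured object.
	auds := make([]any, 0, len(audiences))
	for _, a := range audiences {
		auds = append(auds, a)
	}
	return map[string]any{
		"kubernetes": map[string]any{
			"enabled":   true,
			"audiences": auds,
		},
	}
}

// oauth2ServerURI returns the value to persist on the warehouse row:
// the broker URL when Kubernetes auth is on, empty otherwise.
func oauth2ServerURI(audiences []string) string {
	if len(audiences) == 0 {
		return ""
	}
	return brokerTokenURL
}
```

Deriving both values from the same audience list keeps the CR and the warehouse row from disagreeing about which auth mode is active.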
What's NOT in this PR:
* Pod spec / chart changes adding the projected SA volume mount —
that's a follow-up in the charts repo where the duckling pod
template lives. The broker is dormant until the env var + token
file are wired by ops.
* Changes to the controller's InputsResolver to set
KubernetesAuthAudiences. The flag lives in ProvisioningInputs and
callers opt in when ready; the prod resolver implementation is
still on the deferred list (task #24).
Tests: 4 new — 2 for the CR's authentication block (on/off), 2 for
the OAUTH2_SERVER_URI population (OIDC mode → 127.0.0.1:9876; allowall
mode → empty). Live-PG-gated. Plus the 10 broker unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* **Pragma: no-cache** added to /token response. RFC 6749 §5.1 mandates
both Cache-Control: no-store AND Pragma: no-cache; we only had the
former. The HTTP/1.0 Pragma header is irrelevant on a loopback
connection to DuckDB, but the fix is one line and the broker is
otherwise spec-compliant. Test asserts both headers now.
* **Cross-check on the OIDC test** —
TestEnsureForOrg_PersistsOAuth2URIWhenKubernetesAuthOn now reads the
Lakekeeper CR back from the fake dynamic client and asserts that
spec.authentication.kubernetes.enabled is true with the right
audiences IN THE SAME EnsureForOrg call. Without this, the DB row
could carry the broker URL while the CR stayed in allowall mode —
Lakekeeper would reject every token. A future refactor that splits
or reorders the wiring would now fail the test instead of silently
deploying broken auth.
* **WithExpiresIn TODO** — added a TODO(PR5) noting the env-var wiring
for the override is part of the same pod-spec work that lands the
projected SA volume. The 60s default is intentional; the option is
pre-staged for when the override actually has somewhere to live.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…thAudiences
Cross-PR review caught that once any org gets a non-empty
KubernetesAuthAudiences value, the provisioner writes
LakekeeperOAuth2ServerURI=http://127.0.0.1:9876/token to the row, and
that value is permanent (no path clears it). Ducklings whose pod spec
hasn't yet been wired to (a) mount the projected SA token at
DUCKGRES_LAKEKEEPER_TOKEN_PATH and (b) start the broker on 9876 will
have iceberg ATTACH fail with connection refused.
Documents the required deploy ordering on the struct field comment:
ship the pod spec change first, then the operator chart change, then
flip the audiences in the inputs resolver.
Codified guardrail is a follow-up: the provisioner would need a
signal that the worker image has the broker compiled in (PR4 already
ensures that) AND that the runtime env has
DUCKGRES_LAKEKEEPER_TOKEN_PATH set, which only the cluster operator
knows. For now, the comment is the contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
db326ba into lakekeeper-pr3-provisioning-trigger (5 checks passed)
fuziontech added a commit that referenced this pull request on May 20, 2026
…A-token auth) (#580)
* feat(provisioner,worker): OIDC SA-token auth via in-process broker (PR4)
* fixup(provisioner,broker): address PR4 deep-review findings
* fixup(provisioner): doc OIDC flag-day deploy ordering on KubernetesAuthAudiences
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fourth and final PR in the Lakekeeper-as-Iceberg-catalog series. Adds defense-in-depth for the duckling → Lakekeeper path. Until now, isolation between orgs depended entirely on NetworkPolicy + allowall on the Lakekeeper side. PR4 turns on the operator's `authentication.kubernetes` mode so Lakekeeper validates the duckling's projected SA token against the K8s TokenReview API.
Targets `lakekeeper-pr3-provisioning-trigger` as the base — stacked PR; merge #574, #576, #579 first.
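For readers unfamiliar with the TokenReview check mentioned in the summary, the request body a validator submits to the Kubernetes API has roughly the shape below. This is only an illustration of the `authentication.k8s.io/v1` TokenReview object, built with the Go standard library. It is not Lakekeeper's implementation, and `newTokenReviewBody` is a hypothetical helper.

```go
package main

import "encoding/json"

// tokenReviewSpec carries the token under review and the audiences
// the caller expects it to be valid for.
type tokenReviewSpec struct {
	Token     string   `json:"token"`
	Audiences []string `json:"audiences,omitempty"`
}

// tokenReview is the minimal request shape for authentication.k8s.io/v1.
type tokenReview struct {
	APIVersion string          `json:"apiVersion"`
	Kind       string          `json:"kind"`
	Spec       tokenReviewSpec `json:"spec"`
}

// newTokenReviewBody serializes a TokenReview request for the given token.
func newTokenReviewBody(token string, audiences []string) ([]byte, error) {
	return json.Marshal(tokenReview{
		APIVersion: "authentication.k8s.io/v1",
		Kind:       "TokenReview",
		Spec:       tokenReviewSpec{Token: token, Audiences: audiences},
	})
}
```

The API server answers with a status block saying whether the token authenticated and for which user and audiences. That response is what lets the catalog reject callers that slipped past a misconfigured NetworkPolicy.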
The bridge problem
DuckDB's iceberg extension fetches its bearer via OAuth2 (POST client_credentials), not by reading a file. The duckling has a projected SA token at `/var/run/secrets/lakekeeper/token`. The bridge is a tiny in-process HTTP server that handles POST /token by reading the file and wrapping it as an OAuth2 response. Kubelet rotates the file in place; the broker re-reads on every request, so no token-rotation surgery is needed.
What's in this PR
What's NOT in this PR
Test plan
Stacked
```
main
└── lakekeeper-pr1 (#574)
└── lakekeeper-pr2-worker-wiring (#576)
└── lakekeeper-pr3-provisioning-trigger (#579)
└── lakekeeper-pr4-oidc (this PR)
```
🤖 Generated with Claude Code