feat(provisioner,worker): Lakekeeper-as-Iceberg-catalog — PR4 (OIDC SA-token auth)#580

Merged: fuziontech merged 3 commits into `lakekeeper-pr3-provisioning-trigger` from `lakekeeper-pr4-oidc` on May 20, 2026

@fuziontech
Member

Summary

Fourth and final PR in the Lakekeeper-as-Iceberg-catalog series. Adds defense-in-depth for the duckling → Lakekeeper path. Until now, isolation between orgs depended entirely on NetworkPolicy + allowall on the Lakekeeper side. PR4 turns on the operator's `authentication.kubernetes` mode so Lakekeeper validates the duckling's projected SA token against the K8s TokenReview API.

Targets `lakekeeper-pr3-provisioning-trigger` as the base — stacked PR; merge #574, #576, #579 first.

The bridge problem

DuckDB's iceberg extension obtains its bearer token through an OAuth2 `client_credentials` POST, not by reading a file, while the duckling's projected SA token lives on disk at `/var/run/secrets/lakekeeper/token`. The bridge is a tiny in-process HTTP server that handles `POST /token` by reading that file and wrapping its contents as an OAuth2 token response. Kubelet rotates the file in place and the broker re-reads it on every request, so no token-rotation handling is needed.
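
For illustration only, the bridge could look roughly like the sketch below. It is not the code in `server/lakekeeperbroker`: the names `readToken`, `fileToken`, `buildTokenResponse` and `tokenHandler`, the 405 status for non-POST requests, and the error body are all assumptions. The parts taken from this PR are the re-read on every request, the 503 for a missing or empty file, and the two cache headers required by RFC 6749 §5.1.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
	"os"
)

// tokenResponse is the OAuth2 success body described in RFC 6749 §5.1.
type tokenResponse struct {
	AccessToken string `json:"access_token"`
	TokenType   string `json:"token_type"`
	ExpiresIn   int    `json:"expires_in"`
}

// readToken (hypothetical) returns the current token bytes.
type readToken func() ([]byte, error)

// fileToken re-reads path on every call, so a kubelet rotation is picked up
// without any in-process cache.
func fileToken(path string) readToken {
	return func() ([]byte, error) { return os.ReadFile(path) }
}

// buildTokenResponse returns the HTTP status and JSON body for one token request.
// A missing or empty token yields 503; the error string is illustrative.
func buildTokenResponse(read readToken, expiresIn int) (int, []byte) {
	raw, err := read()
	tok := bytes.TrimSpace(raw)
	if err != nil || len(tok) == 0 {
		return http.StatusServiceUnavailable, []byte(`{"error":"token_unavailable"}`)
	}
	body, err := json.Marshal(tokenResponse{
		AccessToken: string(tok),
		TokenType:   "Bearer",
		ExpiresIn:   expiresIn,
	})
	if err != nil {
		return http.StatusInternalServerError, []byte(`{"error":"server_error"}`)
	}
	return http.StatusOK, body
}

// tokenHandler serves POST /token and rejects every other method.
func tokenHandler(read readToken, expiresIn int) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.Header().Set("Allow", http.MethodPost)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		status, body := buildTokenResponse(read, expiresIn)
		w.Header().Set("Content-Type", "application/json")
		w.Header().Set("Cache-Control", "no-store")
		w.Header().Set("Pragma", "no-cache")
		w.WriteHeader(status)
		_, _ = w.Write(body)
	}
}
```

Wiring it up would be something like `mux.Handle("/token", tokenHandler(fileToken(path), 60))` on a loopback listener. Injecting the reader as a function keeps the response logic testable without touching the filesystem.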

What's in this PR

  • `server/lakekeeperbroker` (new package) — HTTP broker, loopback-only. POST /token reads the SA token from disk + wraps it. Refuses to bind to non-loopback (exposing the wrapped SA token elsewhere would be a real leak). 10 unit tests covering happy path, file rotation pickup, GET rejection, missing/empty 503s, health endpoint, double-start protection, expires_in override, loopback enforcement.
  • `cmd/duckgres-worker` — starts the broker when `DUCKGRES_LAKEKEEPER_TOKEN_PATH` is set. Unset → no broker, existing duckling pods unaffected.
  • `LakekeeperCRSpec.KubernetesAuthAudiences` — non-empty populates `spec.authentication.kubernetes` on the CR (operator turns this into `LAKEKEEPER__K8S_AUTH_*`). Empty keeps Lakekeeper in allowall mode (PR1/2 deployment shape).
  • `ProvisioningInputs.KubernetesAuthAudiences` threads through to the CR. When non-empty, `EnsureForOrg` also writes `LakekeeperOAuth2ServerURI=http://127.0.0.1:9876/token` (the worker-local broker) to the warehouse row, so the worker's ATTACH builder emits the OAuth2 secret + ATTACH form instead of `AUTHORIZATION_TYPE 'none'`.
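
The loopback-only rule in the first bullet can be sketched as a check that runs before the listener is opened. This is a minimal sketch, not the package's actual code: `requireLoopback` is a hypothetical name, and rejecting hostnames such as `localhost` in favour of literal loopback IPs is an assumption made here to avoid depending on DNS.

```go
package main

import (
	"fmt"
	"net"
)

// requireLoopback (hypothetical) returns an error unless addr is host:port
// with a literal loopback IP as host. An empty host (":9876") is rejected
// because it would bind every interface.
func requireLoopback(addr string) error {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("invalid listen address %q: %w", addr, err)
	}
	ip := net.ParseIP(host)
	if ip == nil || !ip.IsLoopback() {
		return fmt.Errorf("refusing to bind %q: host must be a loopback IP", addr)
	}
	return nil
}
```

With this check, `127.0.0.1:9876` and `[::1]:9876` pass, while `0.0.0.0:9876`, `:9876` and pod IPs are refused before the wrapped SA token can be served to anything outside the pod's own network namespace.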

What's NOT in this PR

  • Pod spec / chart changes adding the projected SA volume mount. That's a follow-up in PostHog/charts where the duckling pod template lives. Until that lands, the broker is dormant.
  • Changes to the controller's `LakekeeperInputsResolver` to set `KubernetesAuthAudiences` by default. The flag lives in `ProvisioningInputs` and callers opt in when ready.

Test plan

  • `go test ./server/lakekeeperbroker/` — 10 cases, all pass
  • `go test -tags kubernetes ./controlplane/provisioner/` — 4 new cases for the CR authentication block + OAUTH2_SERVER_URI persistence in both modes
  • Full sweep + `golangci-lint` clean
  • CI
  • (follow-up) Wire pod spec, run orbstack e2e with OIDC enabled, verify the broker → Lakekeeper round-trip end to end

Stacked

```
main
└── lakekeeper-pr1 (#574)
    └── lakekeeper-pr2-worker-wiring (#576)
        └── lakekeeper-pr3-provisioning-trigger (#579)
            └── lakekeeper-pr4-oidc (this PR)
```

🤖 Generated with Claude Code

@fuziontech force-pushed the lakekeeper-pr3-provisioning-trigger branch from e42ecd8 to d91a322 on May 19, 2026 22:22
fuziontech and others added 3 commits May 19, 2026 15:22
Adds defense-in-depth for the duckling → Lakekeeper path. Until this
PR, isolation between orgs depended entirely on NetworkPolicy +
allowall on the Lakekeeper side. PR4 turns on the operator's
authentication.kubernetes mode so Lakekeeper validates the duckling's
projected SA token against the K8s TokenReview API before accepting
any catalog request — even if a NetworkPolicy is misconfigured, only
ducklings with a valid SA token signed by the cluster CA can talk to
the catalog.

DuckDB's iceberg extension fetches the bearer via OAuth2 (POST
client_credentials), not by reading a file. The bridge is the new
in-process broker that the worker runs on a loopback port.

  * server/lakekeeperbroker (NEW) — tiny HTTP server, loopback-only,
    handles POST /token by re-reading a projected SA token from disk
    each request and wrapping it as an OAuth2 response. Kubelet
    rotates the file in place; no in-process caching. Refuses to bind
    to non-loopback (exposing the SA token to any other caller would
    be a real leak). 10 unit tests cover happy path, GET rejection,
    file missing/empty 503s, health endpoint, double-start protection,
    expires_in override, non-loopback refusal.

  * cmd/duckgres-worker — starts the broker when DUCKGRES_LAKEKEEPER_
    TOKEN_PATH is set. When unset (every existing duckling pod
    today), no broker starts and behavior is unchanged.

  * LakekeeperCRSpec gains KubernetesAuthAudiences. Non-empty
    populates spec.authentication.kubernetes on the CR (the operator
    turns this into LAKEKEEPER__K8S_AUTH_ENABLED=true +
    LAKEKEEPER__K8S_AUTH_AUDIENCES=<csv>). Empty omits the block
    entirely — Lakekeeper continues running in allowall mode.

  * ProvisioningInputs.KubernetesAuthAudiences threads the audience
    list through to the CR. When non-empty, EnsureForOrg also writes
    LakekeeperOAuth2ServerURI=http://127.0.0.1:9876/token (the
    worker-local broker) to the warehouse row, so the worker's
    server/iceberg ATTACH builder emits the OAuth2 secret + ATTACH
    instead of the AUTHORIZATION_TYPE 'none' form.
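
The last two bullets hinge on one input driving two outputs: the CR's authentication block and the URI stored on the warehouse row. A rough sketch of that coupling, assuming an unstructured-style map for the CR: `authBlock` and `oauth2ServerURI` are hypothetical helpers, and the `audiences` key name is a guess (only `spec.authentication.kubernetes.enabled` is named in this PR).

```go
package main

// brokerTokenURI is the worker-local broker endpoint written to the warehouse row.
const brokerTokenURI = "http://127.0.0.1:9876/token"

// authBlock (hypothetical) returns the value for spec.authentication on the CR,
// or nil when the block should be omitted and Lakekeeper stays in allowall mode.
func authBlock(audiences []string) map[string]any {
	if len(audiences) == 0 {
		return nil
	}
	auds := make([]any, len(audiences))
	for i, a := range audiences {
		auds[i] = a
	}
	return map[string]any{
		"kubernetes": map[string]any{
			"enabled":   true,
			"audiences": auds, // key name assumed for illustration
		},
	}
}

// oauth2ServerURI (hypothetical) returns what gets persisted as
// LakekeeperOAuth2ServerURI: the broker URL in OIDC mode, empty in allowall mode.
func oauth2ServerURI(audiences []string) string {
	if len(audiences) == 0 {
		return ""
	}
	return brokerTokenURI
}
```

Deriving both values from the same slice is what keeps the row and the CR from disagreeing about which auth mode is active.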

What's NOT in this PR:
  * Pod spec / chart changes adding the projected SA volume mount —
    that's a follow-up in the charts repo where the duckling pod
    template lives. The broker is dormant until the env var + token
    file are wired by ops.
  * Changes to the controller's InputsResolver to set
    KubernetesAuthAudiences. The flag lives in ProvisioningInputs and
    callers opt in when ready; the prod resolver implementation is
    still on the deferred list (task #24).

Tests: 4 new — 2 for the CR's authentication block (on/off), 2 for
the OAUTH2_SERVER_URI population (OIDC mode → 127.0.0.1:9876; allowall
mode → empty). Live-PG-gated. Plus the 10 broker unit tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
  * **Pragma: no-cache** added to /token response. RFC 6749 §5.1 mandates
    both Cache-Control: no-store AND Pragma: no-cache; we only had the
    former. The HTTP/1.0 Pragma header is irrelevant on a loopback
    connection to DuckDB, but the fix is one line and the broker is
    otherwise spec-compliant. Test asserts both headers now.

  * **Cross-check on the OIDC test** —
    TestEnsureForOrg_PersistsOAuth2URIWhenKubernetesAuthOn now reads the
    Lakekeeper CR back from the fake dynamic client and asserts that
    spec.authentication.kubernetes.enabled is true with the right
    audiences IN THE SAME EnsureForOrg call. Without this, the DB row
    could carry the broker URL while the CR stayed in allowall mode —
    Lakekeeper would reject every token. A future refactor that splits
    or reorders the wiring would now fail the test instead of silently
    deploying broken auth.

  * **WithExpiresIn TODO** — added a TODO(PR5) noting the env-var wiring
    for the override is part of the same pod-spec work that lands the
    projected SA volume. The 60s default is intentional; the option is
    pre-staged for when the override actually has somewhere to live.
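
    For readers unfamiliar with the pattern, the pre-staged override is presumably a functional option. A minimal sketch under that assumption: the `WithExpiresIn(seconds int)` signature, the `broker` struct and `newBroker` are hypothetical, and only the option name and the 60s default come from this commit.

```go
package main

// defaultExpiresIn mirrors the 60s default mentioned above.
const defaultExpiresIn = 60

// broker is a stand-in for the real type; field names are assumptions.
type broker struct {
	tokenPath string
	expiresIn int // seconds reported as expires_in in the token response
}

// Option mutates a broker during construction.
type Option func(*broker)

// WithExpiresIn overrides the reported lifetime. Non-positive values are
// ignored so a missing or zero setting falls back to the default.
func WithExpiresIn(seconds int) Option {
	return func(b *broker) {
		if seconds > 0 {
			b.expiresIn = seconds
		}
	}
}

// newBroker applies the defaults first, then each option in order.
func newBroker(tokenPath string, opts ...Option) *broker {
	b := &broker{tokenPath: tokenPath, expiresIn: defaultExpiresIn}
	for _, opt := range opts {
		opt(b)
	}
	return b
}
```

    Under this shape, the follow-up env-var wiring would only need to parse an integer and pass `WithExpiresIn(n)`; callers that pass nothing keep the 60s default.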

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…thAudiences

Cross-PR review caught that once any org gets a non-empty
KubernetesAuthAudiences value, the provisioner writes
LakekeeperOAuth2ServerURI=http://127.0.0.1:9876/token to the row, and
that value is permanent (no path clears it). Ducklings whose pod spec
hasn't yet been wired to (a) mount the projected SA token at
DUCKGRES_LAKEKEEPER_TOKEN_PATH and (b) start the broker on 9876 will
have iceberg ATTACH fail with connection refused.

Documents the required deploy ordering on the struct field comment:
ship the pod spec change first, then the operator chart change, then
flip the audiences in the inputs resolver.

Codified guardrail is a follow-up — the provisioner would need a
signal that the worker image has the broker compiled in (PR4 already
ensures that) AND that the runtime env has DUCKGRES_LAKEKEEPER_TOKEN_PATH
set, which only the cluster operator knows. For now, the comment is
the contract.
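
The worker-side half of that contract (the broker starts only when the env var is set) can be sketched as below. `brokerPlan`, `planBroker` and `planBrokerFromEnv` are hypothetical names, and treating an empty value the same as unset is an assumption.

```go
package main

import "os"

const (
	tokenPathEnv = "DUCKGRES_LAKEKEEPER_TOKEN_PATH"
	brokerAddr   = "127.0.0.1:9876" // must match the URI persisted on the warehouse row
)

// brokerPlan (hypothetical) describes what the worker should do at startup.
type brokerPlan struct {
	Start     bool
	TokenPath string
	Addr      string
}

// planBroker decides whether to start the broker. The lookup function is
// injected so the decision can be tested without mutating the process env.
func planBroker(lookup func(string) (string, bool)) brokerPlan {
	path, ok := lookup(tokenPathEnv)
	if !ok || path == "" {
		return brokerPlan{} // dormant: existing pods are unaffected
	}
	return brokerPlan{Start: true, TokenPath: path, Addr: brokerAddr}
}

// planBrokerFromEnv reads the real process environment.
func planBrokerFromEnv() brokerPlan {
	return planBroker(os.LookupEnv)
}
```

A pod for which this returns the zero plan has nothing listening on 9876, which is exactly the connection-refused case described above once the row carries the broker URL.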

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fuziontech force-pushed the lakekeeper-pr4-oidc branch from 7102dbd to d4f3362 on May 19, 2026 22:23
@fuziontech merged commit db326ba into lakekeeper-pr3-provisioning-trigger on May 20, 2026
5 checks passed
@fuziontech deleted the lakekeeper-pr4-oidc branch on May 20, 2026 00:25
fuziontech added a commit that referenced this pull request May 20, 2026
…A-token auth) (#580)

* feat(provisioner,worker): OIDC SA-token auth via in-process broker (PR4)

* fixup(provisioner,broker): address PR4 deep-review findings

* fixup(provisioner): doc OIDC flag-day deploy ordering on KubernetesAuthAudiences
