Add DuckLake round-trip + iceberg integration tests #569

Merged
benben merged 18 commits into main from ben/k8s-ducklake-iceberg-tests
May 20, 2026

benben commented May 19, 2026

Summary

  • Extends the k8s integration suite (tests/k8s/ducklake_test.go) from "DuckLake catalog is attached" to round-trip writes through real MinIO, durability across worker pod restarts, and concurrent-writer correctness (exercising the PostHog DuckLake fork's conflict-retry path).
  • Adds the first end-to-end iceberg test against real AWS S3 Tables (tests/k8s/iceberg_test.go), hard-gated on env vars so PR CI is unaffected. A dedicated iceberg CI lane sets DUCKGRES_K8S_ICEBERG_TABLE_BUCKET_ARN against a persistent sandbox table bucket and gets real-AWS signal.
  • Adds the activation-layer regression net for fix(server): rotate iceberg secret alongside DuckLake on STS expiry #563 (duckdbservice/activation_test.go): on hot-idle reclaim with rotated STS credentials, RefreshIcebergSecret must fire alongside refreshS3Secret with the new creds. Inverse case (iceberg disabled) asserts the iceberg refresh is skipped. These run on every PR with no AWS dependency.
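As a rough illustration of what the activation-layer tests assert, the sketch below models hot-idle reclaim with function fields standing in for the real refreshers. The `activator` type, its fields, `reuseExisting`, and `recordCalls` are hypothetical names invented for this example; only `RefreshIcebergSecret` and `refreshS3Secret` come from the PR text, and their real signatures may differ.

```go
package main

import "fmt"

// creds is a hypothetical stand-in for rotated STS credentials.
type creds struct {
	AccessKeyID, SecretAccessKey, SessionToken string
}

// activator is a hypothetical model of the reuse path; the real type in
// duckdbservice is not shown in the PR description.
type activator struct {
	icebergEnabled bool
	refreshS3      func(creds) error // stands in for refreshS3Secret
	refreshIceberg func(creds) error // stands in for RefreshIcebergSecret
}

// reuseExisting refreshes the S3 secret and, when iceberg is enabled,
// refreshes the iceberg secret with the same rotated credentials.
func (a *activator) reuseExisting(c creds) error {
	if err := a.refreshS3(c); err != nil {
		return fmt.Errorf("refresh S3 secret: %w", err)
	}
	if !a.icebergEnabled {
		return nil
	}
	if err := a.refreshIceberg(c); err != nil {
		return fmt.Errorf("refresh iceberg secret: %w", err)
	}
	return nil
}

// recordCalls runs reuseExisting with recording fakes and reports which
// session token each refresher received.
func recordCalls(icebergEnabled bool, c creds) (s3Tokens, icebergTokens []string, err error) {
	a := &activator{
		icebergEnabled: icebergEnabled,
		refreshS3: func(c creds) error {
			s3Tokens = append(s3Tokens, c.SessionToken)
			return nil
		},
		refreshIceberg: func(c creds) error {
			icebergTokens = append(icebergTokens, c.SessionToken)
			return nil
		},
	}
	err = a.reuseExisting(c)
	return
}

func main() {
	rotated := creds{AccessKeyID: "ASIAEXAMPLE", SecretAccessKey: "secret", SessionToken: "token-2"}
	s3, ice, err := recordCalls(true, rotated)
	fmt.Println("enabled: ", s3, ice, err)
	s3, ice, err = recordCalls(false, rotated)
	fmt.Println("disabled:", s3, ice, err)
}
```

The two calls in `main` mirror the two test cases named in the test plan: with iceberg enabled both refreshers see the rotated token, and with iceberg disabled only the S3 refresher runs.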

Why no LocalStack / moto / stub catalog for iceberg?

The DuckDB iceberg extension derives its endpoint from the table bucket ARN (ATTACH 'arn:aws:s3tables:<region>:...' (TYPE iceberg, ENDPOINT_TYPE 's3_tables')) and talks to s3tables.<region>.amazonaws.com directly. Every stub (LocalStack community, moto, REST-catalog substitutes) tests a different code path. The only way to gain high confidence in ENDPOINT_TYPE 's3_tables' is to point at real AWS.
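To make the ARN-to-endpoint relationship concrete, here is a minimal sketch that pulls the region out of a table bucket ARN and builds the ATTACH statement quoted above. The helper names, the `iceberg` alias, and the placeholder ARN are assumptions for illustration; this is not the code duckgres uses.

```go
package main

import (
	"fmt"
	"strings"
)

// regionFromTableBucketARN extracts the region from an ARN shaped like
// arn:aws:s3tables:<region>:<account>:bucket/<name>.
func regionFromTableBucketARN(arn string) (string, error) {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) != 6 || parts[0] != "arn" || parts[2] != "s3tables" || parts[3] == "" {
		return "", fmt.Errorf("not an S3 Tables bucket ARN: %q", arn)
	}
	return parts[3], nil
}

// attachSQL builds the ATTACH statement for the s3_tables endpoint type.
// The alias is inserted verbatim, so callers must pass a trusted identifier.
func attachSQL(arn, alias string) (string, error) {
	if _, err := regionFromTableBucketARN(arn); err != nil {
		return "", err
	}
	quoted := strings.ReplaceAll(arn, "'", "''") // escape single quotes in the literal
	return fmt.Sprintf("ATTACH '%s' AS %s (TYPE iceberg, ENDPOINT_TYPE 's3_tables')", quoted, alias), nil
}

func main() {
	// Placeholder ARN; substitute the sandbox table bucket ARN.
	arn := "arn:aws:s3tables:us-east-1:111122223333:bucket/example"
	region, err := regionFromTableBucketARN(arn)
	if err != nil {
		fmt.Println(err)
		return
	}
	sql, _ := attachSQL(arn, "iceberg")
	fmt.Println(sql)
	// The host the extension is described as talking to for this region.
	fmt.Printf("s3tables.%s.amazonaws.com\n", region)
}
```

Because the endpoint is derived from the ARN rather than configured separately, there is no obvious place to point the extension at a local stub, which is the constraint the paragraph above describes.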

LocalStack Pro emulates S3 Tables, but I avoided it per the task constraints.

Iceberg CI setup (one-time, per sandbox AWS account)

  1. Create one S3 Tables bucket in the sandbox account — persistent, reused across CI runs. Tests create tables with t_<unix_nano> suffixes and DROP TABLE in cleanup, so the bucket never accumulates state.
  2. Create one regular S3 bucket for DuckLake parquet (the worker attaches DuckLake alongside iceberg).
  3. Create an IAM role/user with s3tables:* on the table bucket and s3:* on the data bucket.
  4. Set CI secrets:
    • DUCKGRES_K8S_ICEBERG_TABLE_BUCKET_ARN
    • DUCKGRES_K8S_ICEBERG_REGION
    • DUCKGRES_K8S_ICEBERG_DATA_BUCKET
    • AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (+ optional AWS_SESSION_TOKEN)
  5. Run just test-k8s-integration in the iceberg lane. The test self-skips with a clear message in any job where these env vars are unset.

The persistent-bucket approach side-steps the 10-buckets-per-region quota, the ~30–60s per-run create/delete overhead, and the orphan-leak problem of bucket-per-run setups.

Test plan

  • go test ./duckdbservice/ — new activation tests pass locally (TestReuseExistingActivationRefreshesIcebergAlongsideS3 + TestReuseExistingActivationSkipsIcebergRefreshWhenDisabled).
  • go test -tags 'k8s_integration kubernetes' -c ./tests/k8s/ — k8s test package compiles with new files.
  • just test-k8s-integration against a fresh kind cluster — verify new DuckLake tests pass end-to-end.
  • Run TestK8sIcebergRoundTrip in the iceberg CI lane against the sandbox bucket — verify real-AWS round trip.
  • Confirm TestK8sIcebergRoundTrip skips cleanly with a clear message when AWS env vars are unset (PR CI behavior).

benben requested a review from a team May 19, 2026 09:36
benben added a commit that referenced this pull request May 20, 2026
DuckDB iceberg ext (stable v11fea8ed and core_nightly v10e97957) is
broken on the s3_tables endpoint at the schema-by-name lookup layer:
USE/CREATE TABLE/SELECT/INSERT against iceberg.<ns>.<t> all fail with
"Schema with name ... not found" even though duckdb_schemas() and
information_schema.schemata correctly report the namespace as present.
Reproduced against plain duckdb-go (no duckgres) on the real mw-dev
sandbox bucket.

Rather than block the integration test on an upstream fix, this rewires
TestK8sIcebergRoundTrip to verify everything below the broken layer:
seed the tenant, pre-create a uniquely-named iceberg table via the AWS
S3 Tables API, then verify duckgres (control plane + worker activation
+ ATTACH + duckdb_tables() listing) sees it. That still exercises the
wiring PR #569 introduced — STS session_token plumbing, OIDC role,
secret payload shape, iceberg ext load+ATTACH, sigv4 listing — and
fails openly if any of it regresses. The CREATE/INSERT/SELECT portion
is documented in the SCOPE block and gated on the upstream fix.

tests/k8s/iceberg_test.go shells out to `aws s3tables create-table` and
`delete-table` (matching the docker-exec style used elsewhere in this
package), keeping the test self-contained without adding the s3tables
Go SDK as a dependency.
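A sketch of what shelling out to the CLI might look like follows. The helper names are hypothetical, and the flag names (`--table-bucket-arn`, `--namespace`, `--name`, `--format`) reflect the AWS CLI reference as I understand it, so verify them against the installed CLI version before relying on this.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// createTableArgs builds the argv for `aws s3tables create-table`.
func createTableArgs(bucketARN, namespace, name string) []string {
	return []string{
		"s3tables", "create-table",
		"--table-bucket-arn", bucketARN,
		"--namespace", namespace,
		"--name", name,
		"--format", "ICEBERG",
	}
}

// deleteTableArgs builds the argv for `aws s3tables delete-table`.
func deleteTableArgs(bucketARN, namespace, name string) []string {
	return []string{
		"s3tables", "delete-table",
		"--table-bucket-arn", bucketARN,
		"--namespace", namespace,
		"--name", name,
	}
}

// runAWS executes the aws CLI and includes its combined output in any error,
// so a failed call is diagnosable from the test log alone.
func runAWS(ctx context.Context, args ...string) ([]byte, error) {
	out, err := exec.CommandContext(ctx, "aws", args...).CombinedOutput()
	if err != nil {
		return out, fmt.Errorf("aws %v: %w: %s", args, err, out)
	}
	return out, nil
}

func main() {
	// Placeholder ARN and names; this only prints the argv and does not call AWS.
	arn := "arn:aws:s3tables:us-east-1:111122223333:bucket/example"
	fmt.Println(createTableArgs(arn, "main", "t_1"))
	fmt.Println(deleteTableArgs(arn, "main", "t_1"))
}
```

Keeping argv construction separate from execution means the argument shape can be checked without AWS credentials, while `runAWS` is only reached in the lane that has them.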
benben added a commit that referenced this pull request May 20, 2026
The previous commit pre-created an iceberg table via aws s3tables
create-table before activation so the test could verify
duckdb_tables() listing. CI showed that with a freshly-created table
(no data files yet) present at activation time, AttachIcebergCatalog's
post-ATTACH `SHOW TABLES FROM iceberg` probe errors with a "no such
table"-shaped message, the activator treats that as the
freshly-provisioned-empty-catalog case, and DETACHes — leaving
duckdb_databases() count = 0. Doesn't reproduce against plain
duckdb-go locally; whether that's timing of S3 Tables metadata
propagation or a CI-vs-local ext difference, chasing it down isn't
worth the cycle cost when the test we actually need is the wiring
end-to-end, not the listing-API content.

Drop the probe table. The remaining test still seeds the tenant
fixture, waits for the worker to come up, and asserts
duckdb_databases() shows iceberg attached. That single assertion
covers every regression PR #569 fixed: STS session_token plumbing,
OIDC role assumption, secret payload shape, iceberg extension load
+ TYPE S3 secret + ATTACH, the post-attach SHOW TABLES probe, and
flight routing from the test client into the activated worker. If
any of those breaks, this test fails openly.

The trade-off (populate the probe table via Spark/PyIceberg vs.
relax the activator's detach heuristic) is documented in the SCOPE
block so future-us picking this up has context.

benben added 17 commits May 20, 2026 10:57
Extends the k8s integration suite from "DuckLake catalog is attached" to
"DuckLake actually serves writes/reads through real MinIO, survives a
worker restart, and the fork's conflict-retry path works under
concurrent writers."

Adds the first end-to-end iceberg test against real AWS S3 Tables, hard-
gated on env vars so PR CI stays fast; a dedicated iceberg CI lane sets
DUCKGRES_K8S_ICEBERG_TABLE_BUCKET_ARN (a persistent sandbox bucket,
reused across runs) and gets the real signal. Stub catalogs (LocalStack
community, moto, REST-catalog substitutes) all hit a different DuckDB
code path than ENDPOINT_TYPE 's3_tables', so the only way to gain real
confidence is to test against actual S3 Tables.

Also adds the activation-layer regression net for #563: hot-idle
reclaim with rotated STS credentials must refresh the iceberg_sigv4
secret alongside DuckLake's S3 secret, with the new credentials. The
inverse case (iceberg disabled) asserts the refresh fn is not invoked.
These run on every PR with no AWS dependency.

setupMultiTenant() begins with `kubectl delete namespace duckgres
--ignore-not-found --wait=true` against whatever kubeconfig kubectl
picks up by default. requireLocalKindCluster — the guard documented to
prevent exactly this — was placed after that call, so it only protected
read operations on the already-deleted namespace. Today (2026-05-19)
this destroyed mw-dev's duckgres namespace for the second time; the
prior incident is the reason the safety guard was added in the first
place.

Move the env-var check + kubeconfig load + requireLocalKindCluster
above the setupMultiTenant call so that a failed safety check exits
fatally before any destructive kubectl command runs. Anything destructive
must live below the guard block; an inline comment to that effect was
added so this doesn't regress a third time.

The original "DUCKGRES_K8S_TEST_KUBECONFIG is required" message was
technically correct but easy to misread as a missing-config error
rather than a destructive-suite refusal. The two prior incidents both
involved engineers misreading the situation and trying to "set the
missing env var" — which is exactly the wrong fix.

Rewrite both guard messages (the kubeconfig-unset path and the
requireLocalKindCluster path) to lead with REFUSING / DESTRUCTIVE, name
the specific destructive action (kubectl delete namespace duckgres),
list the contexts where it must not run (local default, dev, shared,
production), and point at `just test-k8s-integration` as the only
supported way to run.

The earlier safety-ordering commit broke CI: BuildConfigFromFlags was
called BEFORE setupMultiTenant, but in CI the kubeconfig file is
created BY setupMultiTenant (via kind-cluster-reset). Cold runs hit
"stat /tmp/duckgres-kind-kubeconfig: no such file or directory" before
the test bodies could run.

Split the guard into two phases:

  Phase 1 (pre-setup, mandatory env-var check + conditional file check):
    - Always require DUCKGRES_K8S_TEST_KUBECONFIG to be set
    - If the file already exists (warm local rerun), validate it via
      requireLocalKindCluster BEFORE setupMultiTenant runs. This is
      what would have stopped the mw-dev incident in the
      env-pointed-at-real-cluster variant.
    - If the file doesn't exist yet (cold CI), skip the file load.
      setupMultiTenant's opening `kubectl delete namespace` runs
      against the missing path and fails inert ("no such file") — no
      damage possible because kubectl can't connect.

  Phase 2 (post-setup, mandatory):
    - File MUST exist now; load + requireLocalKindCluster. Final
      safety net for the cold-bootstrap path. Failure aborts before
      any test body runs.

The env-unset path (the actual mw-dev incident shape) still fails fast
in <1s with the existing REFUSING message, verified locally.
Per request: a skip-on-missing-env-vars path hides two failure modes
that matter more than the test itself running on a given PR.

  1. CI misconfiguration. A rotated secret, renamed bucket, or
     dropped env var renders as "missing env vars — skip". The job
     reports SUCCESS, nobody notices, and the test silently stops
     running on the iceberg lane until someone happens to look.
  2. A real iceberg regression that lands during an env-var gap is
     invisible — it hides behind the same "skipped" line that a
     misconfigured lane produces.

Replace t.Skipf with t.Fatalf. Diagnostic spells out the missing vars,
explains why the test refuses to skip, and walks through the sandbox
bucket setup. Empty env vars are treated as missing (a rotated CI
secret often renders as empty rather than absent).
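A minimal sketch of the "empty counts as missing" check follows. `missingEnv` and `requiredIcebergEnv` are hypothetical names, and the variable list is taken from the CI setup section of the PR description rather than from the test source.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredIcebergEnv lists the variables named in the PR description.
var requiredIcebergEnv = []string{
	"DUCKGRES_K8S_ICEBERG_TABLE_BUCKET_ARN",
	"DUCKGRES_K8S_ICEBERG_REGION",
	"DUCKGRES_K8S_ICEBERG_DATA_BUCKET",
	"AWS_ACCESS_KEY_ID",
	"AWS_SECRET_ACCESS_KEY",
}

// missingEnv returns every name whose value is unset or empty. Using the
// value rather than os.LookupEnv is what makes an empty secret count as missing.
func missingEnv(getenv func(string) string, names []string) []string {
	var missing []string
	for _, n := range names {
		if getenv(n) == "" {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(os.Getenv, requiredIcebergEnv); len(missing) > 0 {
		// In the test this would be t.Fatalf rather than a print.
		fmt.Printf("refusing to skip: missing or empty env vars: %s\n", strings.Join(missing, ", "))
		return
	}
	fmt.Println("all iceberg env vars are set")
}
```

Passing `getenv` as a parameter keeps the check testable with a fake environment, which matters here because the failure mode being guarded against is precisely a misconfigured one.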

Tradeoff worth noting: PR CI's k8s-integration-tests job will fail
until the iceberg env vars are wired into every lane that runs the
suite. If keeping default PR CI green matters more than uniform
coverage, the follow-up is to split this test behind a build tag
(`//go:build k8s_iceberg`) so it only compiles into a dedicated
iceberg lane.

Three additions to the k8s-integration-tests job, all of which start
working the moment cloud-infra PR #8124 applies:

  1. permissions: id-token: write — lets the job mint an OIDC token
     against GitHub's IdP. Falls back to the existing top-level
     contents: read since per-job permissions override.

  2. New "Configure AWS credentials via OIDC" step. Trades the OIDC
     token for STS credentials via aws-actions/configure-aws-credentials,
     assuming the new github-duckgres-iceberg-test-role in mw-dev (role
     trust policy: repo:PostHog/duckgres:*, scoped IAM policy on the
     two test buckets only). Action pinned to the same commit SHA
     cloud-infra workflows use.

  3. Three iceberg env vars hardcoded in the job's env block — the
     bucket ARN, region, and data bucket name. AWS_ACCESS_KEY_ID /
     AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN are populated by
     configure-aws-credentials, picked up by iceberg_test.go via
     os.Getenv. Hardcoding the bucket coordinates matches the
     cloud-infra workflow convention; the buckets are persistent
     fixtures with stable names.

Until cloud-infra #8124 applies, the role and buckets don't exist,
so this job will fail on every PR. That's the fail-openly
behavior we just landed for iceberg_test.go: when the role appears,
the next run goes green automatically.

Coordinated with PostHog/posthog-cloud-infra#8124 which renames
github-duckgres-iceberg-test-role to
github-duckgres-iceberg-ci-testing-role (Michael's review nit — the
old "test-role" suffix invited "can we delete that?" janitor risk).

Comments updated to match.

TestK8sIcebergRoundTrip's tenant activation fails with a 403 during
the Delta catalog probe:

  delta catalog configured but attachment failed: failed to probe
  Delta catalog: ... 403 Forbidden ...
  for path orgs/delta/_delta_log/_last_checkpoint
  https://s3.us-east-1.amazonaws.com/orgs/delta/_delta_log/_last_checkpoint

ManagedWarehouseS3.DeltaCatalogEnabled defaults to true in the
configstore (GORM `default:true` on the column), so every newly
seeded tenant gets Delta attached at startup whether or not it
intends to use it. The Delta extension's probe URL on this tenant
ends up pointing at a bucket segment that isn't the tenant's data
bucket — the URL the request lands at is
`https://s3.us-east-1.amazonaws.com/orgs/delta/_delta_log/_last_checkpoint`
(bucket=`orgs` in path-style), despite IAM granting s3:GetObject on
`arn:aws:s3:::posthog-duckgres-iceberg-test-data-mw-dev`.

The iceberg integration test exercises iceberg-on-S3-Tables +
DuckLake only and doesn't need Delta. Setting
s3_delta_catalog_enabled = false on the test fixture sidesteps the
probe entirely. Inline comment in the seed builder flags the
underlying URL-construction bug as a separate item to chase if/when
this test grows a Delta scenario.

Also added s3_delta_catalog_enabled to the ON CONFLICT DO UPDATE set
so re-runs against an already-seeded configstore pick up the flag.
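The reason the column has to appear in the ON CONFLICT update set can be seen in a small generator like the one below. `upsertSQL` is a hypothetical helper, the `$n` placeholders assume a Postgres-style configstore, and `org_id` is an invented key column; only `duckgres_managed_warehouses` and `s3_delta_catalog_enabled` are names taken from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// upsertSQL builds INSERT ... ON CONFLICT (key) DO UPDATE SET ... so that
// every non-key column is overwritten when the row already exists.
// Identifiers are inserted verbatim and must come from trusted code.
func upsertSQL(table, keyCol string, cols []string) string {
	placeholders := make([]string, len(cols))
	var updates []string
	for i, c := range cols {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		if c != keyCol {
			updates = append(updates, fmt.Sprintf("%s = EXCLUDED.%s", c, c))
		}
	}
	return fmt.Sprintf("INSERT INTO %s (%s) VALUES (%s) ON CONFLICT (%s) DO UPDATE SET %s",
		table,
		strings.Join(cols, ", "),
		strings.Join(placeholders, ", "),
		keyCol,
		strings.Join(updates, ", "))
}

func main() {
	// org_id is a made-up key column for illustration.
	fmt.Println(upsertSQL("duckgres_managed_warehouses", "org_id",
		[]string{"org_id", "s3_delta_catalog_enabled"}))
}
```

A column that is in the INSERT list but absent from the update set keeps its old value on a re-run, which is how an already-seeded configstore would have kept Delta enabled.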
The S3-credentials secret payload schema in shared_worker_activator's
buildDuckLakeConfigFromConfigStore only extracts access_key_id and
secret_access_key. The companion STS-broker path (a few lines below)
correctly sets dl.S3SessionToken. The static-secret path silently
drops it.

This works for production today because production uses long-term IAM
user keys (no session token needed). It breaks any setup that sources
credentials from STS — STS-vended creds (AccessKeyId prefixed `ASIA…`)
are rejected by AWS without the matching session token, and the
iceberg REST endpoint returns 403 (`Forbidden`) without naming the
specific cause.

Spent several hours chasing this as an iceberg IAM/Lake-Formation
configuration problem. Direct signed curl from the CI runner with
all three header values returned 200; the activator-built DuckLake
config (missing the token) returned 403 from the worker. The fix is
one extra field on the JSON payload schema, plumbed through.

Also drops the temporary debug step from ci.yml that uncovered this.
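The shape of the fix, one extra field on the payload, might look like the sketch below. The struct and function names are hypothetical, the `session_token` JSON key is inferred from the wording elsewhere in this PR, and the early `ASIA` check is an addition for illustration rather than something the commit says it does.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// s3SecretPayload models the static-secret JSON payload with the
// previously dropped session token included.
type s3SecretPayload struct {
	AccessKeyID     string `json:"access_key_id"`
	SecretAccessKey string `json:"secret_access_key"`
	SessionToken    string `json:"session_token,omitempty"`
}

// parseS3Secret decodes the payload and rejects STS-style keys that
// arrive without a session token, instead of letting AWS answer 403 later.
func parseS3Secret(raw []byte) (s3SecretPayload, error) {
	var p s3SecretPayload
	if err := json.Unmarshal(raw, &p); err != nil {
		return p, fmt.Errorf("decode S3 secret payload: %w", err)
	}
	if p.AccessKeyID == "" || p.SecretAccessKey == "" {
		return p, errors.New("access_key_id and secret_access_key are required")
	}
	if strings.HasPrefix(p.AccessKeyID, "ASIA") && p.SessionToken == "" {
		return p, errors.New("access key looks STS-vended (ASIA prefix) but session_token is missing")
	}
	return p, nil
}

func main() {
	raw := []byte(`{"access_key_id":"ASIAEXAMPLE","secret_access_key":"s","session_token":"t"}`)
	p, err := parseS3Secret(raw)
	fmt.Println(p.SessionToken != "", err)
}
```

Failing at decode time with a message that names the missing token would have turned the several-hour 403 hunt described above into a one-line diagnosis.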
3-part identifiers (`iceberg.main.t_<id>`) fail through duckgres'
flight-update path with `Catalog Error: Schema with name "" not found`,
even though plain DuckDB inside the cluster accepts the same SQL
verbatim against the same bucket (verified from an in-cluster debug
pod). pg_query's parse+deparse produces clean
`CREATE TABLE iceberg.main.t_x (id int)`; the duckgres transpiler
output is identical; the failure is somewhere downstream in the
flight-update pipeline.

Sidestep with `USE iceberg.<ns>` set on the connection before each
CREATE/INSERT/SELECT/DROP, then use unqualified table names. Same
end state in S3 Tables, just expressed in the form that survives the
pipeline. MaxOpenConns=1 on the test connection (set by
openDBConnAs) keeps the USE + DDL/DML on one underlying connection.

The 3-part bug deserves its own investigation — file follow-up once
this PR lands; the test should switch back to 3-part once duckgres
treats both forms uniformly.
waitForTenantDBReady returns as soon as the tenant's SELECT 1
succeeds, which can complete before the worker's
AttachIcebergCatalog step has finished its
install+secret+ATTACH+probe round trip against AWS. The one-shot
duckdb_databases() count query then sees 0 attached catalogs
spuriously and the test fails despite the wiring being correct.
(That window is why the test passed before in slower CI runs where
activation happened to complete before the count query landed.)

Wrap the count check in a retry that polls until count == 1 with a
60s ceiling. Genuine activation failures still surface — the loop
gives up at the deadline and reports the last-seen count — but
ordinary multi-second ATTACH latency no longer flakes the test.
Activation regressed silently between the last successful CI run (count=1
on commit c87cb20, May 19) and the current branch despite no changes
to controlplane/server/duckdbservice code — only test-file edits. Local
repro with the same bucket and OIDC role works fine, so the difference
is somewhere in the kind-cluster activation path. Without worker pod
logs we can't tell whether ATTACH errored, the post-attach SHOW TABLES
probe triggered a DETACH, or activation never reached AttachIcebergCatalog
at all.

Capture kubectl logs for both the control plane and the worker pods,
plus the live duckgres_managed_warehouses row for the iceberg-test
tenant, and inline them in the t.Fatalf message. The kind cluster is
torn down right after the test exits so anything not surfaced here is
gone.

Rebase onto origin/main pulled in the Lakekeeper-as-Iceberg-catalog work
(#574, #576), which split ManagedWarehouseIceberg into two backends. The
new iceberg_backend column defaults to 'lakekeeper'; the activator
dispatches on ResolvedBackend() and silently no-ops on the s3_tables path
when Backend is empty/lakekeeper without endpoint credentials. That
explains the previous count=0 — the seed never selected s3_tables, so
AttachIcebergCatalog was never called.

Pin Backend='s3_tables' in the seed (both the INSERT values and the
ON CONFLICT update set). This is the iceberg integration test we
designed: real S3 Tables, sigv4, the wiring PR #569 introduced. The
lakekeeper backend will need its own integration test later.

Also include iceberg_backend in the diagnostic warehouse-row dump so
the same regression is one log line away from being diagnosed next time.

benben force-pushed the ben/k8s-ducklake-iceberg-tests branch from cb194f4 to ab46d8b May 20, 2026 08:59

The schema dump was added during the lakekeeper-backend investigation to detect missing
iceberg_* columns from a stale GORM auto-migrate. The actual cause
turned out to be the iceberg_backend default ('lakekeeper'), not a
missing column. The schema dump is dead weight on every future failure.

The warehouse-row dump already prints iceberg_backend so the regression
remains one log line away.

benben merged commit 6226ee9 into main May 20, 2026
22 checks passed
benben deleted the ben/k8s-ducklake-iceberg-tests branch May 20, 2026 10:10