feat: support multiple DuckDB and DuckLake versions in k8s multi-tenant CP #462

Merged
fuziontech merged 8 commits into main from feat/multi-version-ducklake-spec on Apr 28, 2026

@fuziontech fuziontech commented Apr 27, 2026

This PR introduces support for heterogeneous worker pools where different tenants can run different DuckDB and DuckLake versions.

Key features:

  • Tenant-Specific Builds: Orgs can now be pinned to specific worker images via the image field in the config store.
  • DuckLake Version Overrides: Support for designating the target DuckLake spec version per tenant, ensuring correct metadata migrations even when running mixed worker versions.
  • Version-Aware Coordination: The control plane now respects version affinity when claiming idle workers or spawning new ones, so a tenant never claims a worker built for a different DuckDB/DuckLake version.
  • Hot-Idle Safety: Strict version checking during hot-idle reclamation to ensure workers with attached DuckLake catalogs are only reused if their build matches the tenant's current requirement.
  • Global Fallbacks: Maintainers can set global default worker images and DuckLake spec versions in the control plane configuration.

This PR enables designating specific DuckDB/DuckLake builds per tenant by adding an image field to worker_records and OrgConfig. The K8sWorkerPool now respects these designations during worker acquisition and spawning.
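For readers skimming the thread, the image precedence and version-affinity claim described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `OrgConfig` is trimmed to the single `Image` field, and `idleWorker`, `resolveImage`, and `claimIdle` are hypothetical names. The image tags use a placeholder `abc` in place of a real commit sha.

```go
package main

import "fmt"

// OrgConfig is a hypothetical, trimmed-down stand-in for the real type;
// only the Image field comes from the PR description.
type OrgConfig struct {
	Image string // per-tenant worker image; empty means "use the global default"
}

// idleWorker is an illustrative record of an idle worker and the image it runs.
type idleWorker struct {
	ID    string
	Image string
}

// resolveImage returns the org's pinned image, falling back to the global
// default when the org is nil or has no image set.
func resolveImage(org *OrgConfig, globalDefault string) string {
	if org != nil && org.Image != "" {
		return org.Image
	}
	return globalDefault
}

// claimIdle returns the first idle worker whose image matches want.
// ok is false when nothing matches and the caller should spawn a new worker.
func claimIdle(idle []idleWorker, want string) (w idleWorker, ok bool) {
	for _, cand := range idle {
		if cand.Image == want {
			return cand, true
		}
	}
	return idleWorker{}, false
}

func main() {
	idle := []idleWorker{
		{ID: "w1", Image: "duckgres-worker:abc-duckdb1.5.2"},
		{ID: "w2", Image: "duckgres-worker:abc-duckdb1.5.1"},
	}
	pinned := &OrgConfig{Image: "duckgres-worker:abc-duckdb1.5.1"}
	want := resolveImage(pinned, "duckgres-worker:abc-duckdb1.5.2")

	w, ok := claimIdle(idle, want)
	fmt.Println(w.ID, ok) // w2 true
}
```

An exact string match on the image is the simplest possible affinity rule; the real pool may compare on something richer than the tag.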
@fuziontech fuziontech changed the title from "Support per-tenant DuckLake spec version overrides" to "feat: support multiple DuckDB and DuckLake versions in k8s multi-tenant CP" Apr 28, 2026
This change enables designating target DuckLake versions per tenant, ensuring that workers running different builds correctly trigger or skip metadata migrations based on their specific requirements.
@fuziontech fuziontech force-pushed the feat/multi-version-ducklake-spec branch from 7b449ce to 0acdaff April 28, 2026 00:05
This update addresses the review concerns:
- Wired DuckLakeDefaultSpecVersion to CLI/Env/YAML.
- Explicitly retired mismatched hot-idle workers to prevent resource leaks.
- Added index to WorkerRecord.Image for performance.
- Fixed test coverage gaps for version affinity and precedence.
- Cleaned up misleading version getters.
This update addresses the deep self-review concerns:
- Refactored server/ducklake_migration.go to use a per-metadata-store cache (sync.Map), preventing cross-tenant contamination in the CP.
- Enhanced the Janitor to support per-tenant image reaping via resolveOrgConfig.
- Updated retireClaimedWorker to mark DB rows as retired immediately, preventing capacity deadlocks.
- Cleaned up K8sWorkerPoolConfig by moving DuckLake version concerns to OrgRouter/Activator.
- Fixed all build regressions in server and tests.
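To illustrate the per-metadata-store cache idea from the list above, here is a rough sketch of keying the "migration already checked" state by metadata store, so one tenant's check cannot mark another tenant's store as done. The type and method names (`migrationCache`, `needsCheck`, `markChecked`) and the store identifiers are hypothetical, and the version strings are placeholders; the actual code in server/ducklake_migration.go may be shaped quite differently.

```go
package main

import (
	"fmt"
	"sync"
)

// migrationCache records, per metadata store, the DuckLake spec version that
// was last checked. Hypothetical type, for illustration only.
type migrationCache struct {
	checked sync.Map // key: metadata store identifier (string), value: spec version (string)
}

// needsCheck reports whether a migration check is still required for this
// store at this target version.
func (c *migrationCache) needsCheck(store, version string) bool {
	v, ok := c.checked.Load(store)
	return !ok || v.(string) != version
}

// markChecked remembers that store has been checked against version.
func (c *migrationCache) markChecked(store, version string) {
	c.checked.Store(store, version)
}

func main() {
	var c migrationCache
	c.markChecked("tenant-a-store", "0.3")

	fmt.Println(c.needsCheck("tenant-a-store", "0.3")) // false: already checked
	fmt.Println(c.needsCheck("tenant-b-store", "0.3")) // true: different store
	fmt.Println(c.needsCheck("tenant-a-store", "0.4")) // true: different target version
}
```

A single process-wide "checked version" value would return false for tenant-b here, which is the cross-tenant contamination the refactor is meant to avoid.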
@fuziontech fuziontech force-pushed the feat/multi-version-ducklake-spec branch from 98e5956 to 630932a April 28, 2026 01:34
…ls, dead code

- server: remove dead DuckLakeMigrationCheckedVersion(), delete dlMigrationMu
  entries after migration check to prevent unbounded sync.Map growth
- controlplane: carry image through ManagedWorker and UpsertWorkerRecord
  DoUpdates so runtime store records retain per-worker image
- controlplane: remove pod_uid from UpsertWorkerRecord DoUpdates (never
  populated, was blanking the column on every lifecycle transition)
- creds: move duckgres_local_test.yaml to .example and add to gitignore
- test: add table-driven unit test for workerImageForOrg nil/empty cases
- gofmt org_activation_test.go (leading tab on func decl)
- add debug log when duckdb-worker container not found in pod spec
@fuziontech fuziontech merged commit 82b55c9 into main Apr 28, 2026
21 checks passed
@fuziontech fuziontech deleted the feat/multi-version-ducklake-spec branch April 28, 2026 19:38
fuziontech added a commit that referenced this pull request May 1, 2026
Adds .github/workflows/container-image-worker-cd.yml — a new CD pipeline
that publishes one duckgres-worker image per (DuckDB version × arch),
using Dockerfile.worker (PR #501).

Matrix shape:
  - DuckDB 1.5.2 (default) → duckgres-worker:<sha>-duckdb1.5.2-{arm64,amd64}
                              + multi-arch :<sha>-duckdb1.5.2 manifest
                              + :<sha> and :latest (only on default rows)
  - DuckDB 1.5.1            → duckgres-worker:<sha>-duckdb1.5.1-{arm64,amd64}
                              + multi-arch :<sha>-duckdb1.5.1 manifest
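As a reading aid, the tag set for one matrix row can be written out as a small function. This is only a sketch of the naming scheme listed above: `workerTags` is a hypothetical helper, not part of the workflow, and the sha is passed in as a plain string.

```go
package main

import "fmt"

// workerTags lists the image references published for one DuckDB version:
// two per-arch tags, the multi-arch manifest, and (for the default row only)
// the bare :<sha> and :latest tags. Hypothetical helper for illustration.
func workerTags(sha, duckdb string, isDefault bool) []string {
	base := "duckgres-worker:" + sha
	versioned := base + "-duckdb" + duckdb
	tags := []string{versioned + "-arm64", versioned + "-amd64", versioned}
	if isDefault {
		tags = append(tags, base, "duckgres-worker:latest")
	}
	return tags
}

func main() {
	for _, t := range workerTags("abc", "1.5.2", true) {
		fmt.Println(t)
	}
	for _, t := range workerTags("abc", "1.5.1", false) {
		fmt.Println(t)
	}
}
```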

Adding a DuckDB version is one new row under matrix.duckdb. The
DUCKDB_GO_VERSION / DUCKDB_BINDINGS_VERSION pair maps to duckdb-go
module versions. Both encode the DuckDB release as
`<major><minor:02d><patch:02d>` in the module's minor-version slot:
`v2.<encoded>.0` for DUCKDB_GO_VERSION and `v0.<encoded>.0` for
DUCKDB_BINDINGS_VERSION (see scripts/ducklake_version_matrix.sh for the
same mapping in test code). So DuckDB 1.5.1 → v2.10501.0 / v0.10501.0
and 1.5.2 → v2.10502.0 / v0.10502.0.
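The version mapping can be checked with a few lines of Go. This sketch assumes, based on the two examples in the commit message, that the `v2.` form belongs to DUCKDB_GO_VERSION and the `v0.` form to DUCKDB_BINDINGS_VERSION; `moduleVersions` is a hypothetical helper and is not taken from scripts/ducklake_version_matrix.sh.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// moduleVersions maps a DuckDB release such as "1.5.1" to the assumed
// duckdb-go and bindings module versions. Minor and patch are zero-padded
// to two digits, so each must be below 100.
func moduleVersions(duckdb string) (goVersion, bindingsVersion string, err error) {
	parts := strings.Split(duckdb, ".")
	if len(parts) != 3 {
		return "", "", fmt.Errorf("want major.minor.patch, got %q", duckdb)
	}
	nums := make([]int, 3)
	for i, p := range parts {
		n, convErr := strconv.Atoi(p)
		if convErr != nil || n < 0 {
			return "", "", fmt.Errorf("bad component %q in %q", p, duckdb)
		}
		nums[i] = n
	}
	if nums[1] > 99 || nums[2] > 99 {
		return "", "", fmt.Errorf("minor and patch must be below 100 in %q", duckdb)
	}
	encoded := fmt.Sprintf("%d%02d%02d", nums[0], nums[1], nums[2])
	return "v2." + encoded + ".0", "v0." + encoded + ".0", nil
}

func main() {
	for _, v := range []string{"1.5.1", "1.5.2"} {
		goV, bindV, err := moduleVersions(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(v, "->", goV, "/", bindV)
	}
}
```

Running it prints the same two pairs quoted in the commit message, which is a quick sanity check when adding a new matrix row by hand.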

The all-in-one image (.github/workflows/container-image-cd.yml) is
untouched and continues to publish the existing duckgres image. The new
pipeline ships alongside it.

Operators flip a tenant's `image` config-store column to point at a
specific suffixed worker tag (e.g. duckgres-worker:<sha>-duckdb1.5.1)
to canary that DuckDB version for that org. PR #462 (the original
multi-version control plane work) wires the image-pinning lookup into
the worker activation path.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fuziontech added a commit that referenced this pull request May 1, 2026
…ake_version (#506)

Closes the operational loop opened by PR #462 (per-tenant pinning) and PR #502
(per-DuckDB-version matrix CD): operators no longer need to run direct
`UPDATE duckgres_managed_warehouses SET image=..., ducklake_version=...` SQL
against the config store to pin a tenant to a specific worker image / DuckLake
spec version. The new endpoint goes through the same row-locked
MutateManagedWarehouse path the PUT endpoint uses, so concurrent mutators are
serialized.

Adds `ducklake_version` to managedWarehouseUpsertColumns() — it was missing,
so prior PUTs that touched this column were silently no-oping. Also adds
Image and DuckLakeVersion to the strict-decode whitelist on
managedWarehouseRequest so the existing PUT path can carry them too.

Includes a regression-guard test asserting both columns stay in the upsert
list — losing either silently breaks the matrix-build cutover.
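A regression guard of this kind can be as small as a set-membership check. The sketch below is illustrative only: `upsertColumns` is a hypothetical stand-in for managedWarehouseUpsertColumns() (the real list has more columns), and `missingColumns` is an invented helper. The actual test in the repository presumably uses Go's testing package rather than a `main` function.

```go
package main

import "fmt"

// upsertColumns is a hypothetical stand-in for managedWarehouseUpsertColumns();
// only the two columns discussed in this commit are shown.
func upsertColumns() []string {
	return []string{"image", "ducklake_version"}
}

// missingColumns returns every required column that is absent from have,
// in the order given by required.
func missingColumns(have, required []string) []string {
	present := make(map[string]struct{}, len(have))
	for _, c := range have {
		present[c] = struct{}{}
	}
	var missing []string
	for _, c := range required {
		if _, ok := present[c]; !ok {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	required := []string{"image", "ducklake_version"}
	if m := missingColumns(upsertColumns(), required); len(m) > 0 {
		fmt.Println("upsert list is missing:", m)
		return
	}
	fmt.Println("ok")
}
```

Asserting on the column list directly catches the failure mode described above, where a PUT succeeds but the column is never written.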

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>