perf(docker): cache CGO dep compile via go.sum-keyed prewarm layer#654

Merged: benben merged 2 commits into main from ben/docker-cgo-prewarm-cache on Jun 2, 2026
benben commented Jun 2, 2026

Problem

Docker image builds redo two expensive things on every PR that the
non-Docker CI jobs already cache:

  1. CGO dep compile: duckdb-go (libduckdb) + pg_query_go
    (libpg_query). actions/setup-go caches this for the test jobs
    (keyed on go.sum), but the cache doesn't reach Docker, and
    COPY . . busts the build layer on any source edit, so the
    multi-minute C/C++ compile re-runs each time.
  2. Extension downloads — 5 curls (httpfs, ducklake, json,
    postgres_scanner, iceberg) that ran after COPY . . + build, so
    they re-fetched on every source edit despite depending only on the
    version args.

Measured CD build times (GHA, native arm64 runners):

  image         time    matrix
  worker        ~367s   ×4
  main          ~288s   ×2
  controlplane  ~241s   ×2

--mount=type=cache would not help on ephemeral GHA runners — the
GHA cache backend doesn't persist BuildKit cache mounts. Layer caching
(cache-to: type=gha,mode=max, already configured on all image
workflows) does, so the fix is to put every expensive step in a layer
keyed on go.sum / version args, ahead of the source COPY.

Fix

1. CGO dep-prewarm after go mod download:

RUN CGO_ENABLED=1 go build github.com/duckdb/duckdb-go/v2 \
    github.com/pganalyze/pg_query_go/v6 2>/dev/null || true

2. Move extension downloads before COPY . . (keyed on version args).

Net: every builder layer that costs time — module download, CGO dep
compile, extension fetch — now sits before the source COPY and caches
independently of first-party code. A source-only PR (the common case)
hits cache for all of them; the final go build recompiles only changed
packages.
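
The layer ordering described above can be sketched as a single builder stage. This is a minimal illustration, not the repository's actual Dockerfile: the base image tag, the ARG names (`DUCKDB_VERSION`, `EXT_BASE_URL`), the extension URL layout, and the output paths are all assumptions made for the example. Only the prewarm `RUN` line is taken from this PR.

```dockerfile
# Sketch only: base image, ARG names, URL layout, and paths are assumptions.
FROM golang:1.24 AS builder
WORKDIR /src

# 1. Module files only. This layer and the next two are keyed on go.mod/go.sum.
COPY go.mod go.sum ./
RUN go mod download

# 2. CGO dep prewarm (as quoted in this PR): compile the C-backed
#    dependencies into the Go build cache before any source is copied.
RUN CGO_ENABLED=1 go build github.com/duckdb/duckdb-go/v2 \
    github.com/pganalyze/pg_query_go/v6 2>/dev/null || true

# 3. Extension download, keyed on the version args rather than on source.
#    The URL pattern is assumed to follow the public DuckDB extension
#    repository layout; only one of the five extensions is shown.
ARG TARGETARCH
ARG DUCKDB_VERSION=v1.1.0
ARG EXT_BASE_URL=https://extensions.duckdb.org
RUN mkdir -p /extensions \
 && curl -fsSL -o /extensions/httpfs.duckdb_extension.gz \
      "${EXT_BASE_URL}/${DUCKDB_VERSION}/linux_${TARGETARCH}/httpfs.duckdb_extension.gz"

# 4. First-party source. An edit here invalidates only this layer and below.
COPY . .
RUN CGO_ENABLED=1 go build -o /out/app .
```

With this ordering, a source-only change invalidates the cache starting at step 4, so steps 1 to 3 are restored from the layer cache as long as go.mod, go.sum, and the ARG values are unchanged.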

Per-file

  • Dockerfile: prewarm duckdb-go + pg_query_go; extension download moved up.
  • Dockerfile.controlplane: prewarm pg_query_go only (CP doesn't link libduckdb; no extensions).
  • Dockerfile.worker: copy go.mod/go.sum first → pin → download → prewarm → extension download → COPY source. Per-DuckDB-version pin correctness (the old copy-source-first layout) preserved by stashing the pinned go.mod/go.sum and restoring after COPY . . (local copy, no network). Binary-dependent bindings-verify stays after the build; arg-only version cross-check moved up with the downloads.
  • cache-proxy Dockerfile unchanged — CGO_ENABLED=0, no C deps, no extensions.

Expected: warm-cache source-only PRs drop CGO compile (~3-5 min) + 5 extension fetches to near-zero; first build after a go.sum / extension-version bump pays it once.

Not included (deferred, separate PRs)

  • Make PR-triggered image builds arm64-only (mw-dev is 29 arm64 / 2 amd64); keep the full matrix on merge-to-main.
  • Trim the worker bindings download from 5 OS/arch combinations to $TARGETARCH. This is now low-value, since the pin+download layer is itself cached, and it risks breaking go mod tidy, which resolves imports across all GOOS/GOARCH combinations.

🤖 Generated with Claude Code

benben added 2 commits June 2, 2026 11:59
Docker image builds recompile the heavy CGO dependencies — duckdb-go
(libduckdb) and pg_query_go (libpg_query) — from scratch on every PR.
The Go build cache that makes the non-Docker CI jobs fast (restored by
actions/setup-go, keyed on go.sum) does not persist inside Docker
builds, and `COPY . .` busts the build layer on any source edit, so
the multi-minute C/C++ compile re-runs every time.

Observed CD build times (GHA, native arm64 runners):
  worker image  ~367s ×4 matrix
  main image    ~288s ×2
  controlplane  ~241s ×2

Add a dep-prewarm RUN layer after `go mod download`, keyed only on
go.mod/go.sum, that compiles duckdb-go + pg_query_go into the build
cache:

  RUN CGO_ENABLED=1 go build github.com/duckdb/duckdb-go/v2 \
      github.com/pganalyze/pg_query_go/v6 2>/dev/null || true

On a source-only PR (the common case) this layer is a GHA layer-cache
hit (cache-to: type=gha,mode=max is already configured on all image
workflows), so the libduckdb/libpg_query compile is skipped and the
final `go build` recompiles only changed first-party packages.

- Dockerfile: prewarm duckdb-go + pg_query_go.
- Dockerfile.controlplane: prewarm pg_query_go only (CP doesn't link
  libduckdb).
- Dockerfile.worker: restructure to copy go.mod/go.sum first → pin →
  download → prewarm → COPY source, so the prewarm is cacheable. The
  per-DuckDB-version pin correctness the old COPY-source-first layout
  guarded is preserved by stashing the pinned go.mod/go.sum and
  restoring them after COPY . . (local copy, no network). The existing
  bindings-version verify layers still fail the build on any drift.

cache-proxy Dockerfile unchanged — it's CGO_ENABLED=0, no C deps.

Follow-ups (separate PRs): make PR-triggered image builds arm64-only
(mw-dev is arm64); trim the worker bindings download from 5 OS/arch to
the target arch.

The 5 bundled-extension downloads (httpfs, ducklake, json,
postgres_scanner, iceberg) ran AFTER `COPY . .` and the app build, so
`COPY . .` busting on any source edit re-ran all 5 curls every build.
They depend only on the extension version/tag args, not on source.

Move the download block (and, in the worker, the arg-only
extension/bindings version cross-check) ahead of `COPY . .` so the
layer is keyed on the version args alone and stays a GHA layer-cache hit
across source-only PRs. The binary-dependent bindings-pin verify stays
after the build (it inspects the compiled binary).

Pairs with the CGO dep-prewarm in the previous commit: now every
expensive builder layer — module download, CGO dep compile, extension
fetch — sits before the source COPY and caches independently of
first-party code changes.

benben merged commit c4db37d into main Jun 2, 2026
22 checks passed
benben deleted the ben/docker-cgo-prewarm-cache branch June 2, 2026 10:14

benben added a commit that referenced this pull request Jun 2, 2026
…rce (#655)

The prewarm restructure in #654 moved the per-DuckDB-version pin into a
module-files-only stage (COPY go.mod go.sum; go get; go mod tidy) ahead
of COPY . .. With no .go files present, `go mod tidy` sees zero imports
and prunes EVERY require directive, emptying go.mod. The stashed
go.mod.pinned was therefore empty too, and restoring it after COPY . .
gave the final build a go.mod with no dependencies:

  duckdbservice/appender_init.go:7:2: no required module provides
    package github.com/duckdb/duckdb-go/v2

All four worker matrix builds failed on main (container-image-worker-cd).
Main + control-plane images were unaffected — they never run tidy.

Fix: pin with `go get pkg@ver` (which records the require regardless of
imports) but do NOT tidy in the source-less stage; drop the
stash/restore. Re-run the pin + `go mod tidy` AFTER COPY . ., once the
real import set is visible, so tidy keeps every needed require. The
prewarm + module caches from the pre-COPY layers stay valid (same
versions → go get is a fast metadata no-op).

Verified locally: `go mod tidy` with only go.mod/go.sum prunes
duckdb-go to zero requires; `go get duckdb-go/v2@v2.10502.0` without
tidy keeps and pins it. Default-path worker binary builds clean.
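
The fix described in this commit message can be sketched as follows. This is an illustrative fragment, not the actual Dockerfile.worker: the base image tag and the ARG name `DUCKDB_GO_VERSION` are assumptions, and the version default is the one quoted in the local verification above.

```dockerfile
# Sketch only: base image and ARG name are assumptions for illustration.
FROM golang:1.24 AS builder
WORKDIR /src
ARG DUCKDB_GO_VERSION=v2.10502.0

# Source-less stage: pin with `go get` only. `go mod tidy` must not run
# here, because with no .go files it sees zero imports and prunes every
# require directive from go.mod.
COPY go.mod go.sum ./
RUN go get github.com/duckdb/duckdb-go/v2@${DUCKDB_GO_VERSION} \
 && go mod download

# COPY . . overwrites go.mod/go.sum with the repository's copies, so the
# pin is re-applied, and tidy now runs against the real import set.
COPY . .
RUN go get github.com/duckdb/duckdb-go/v2@${DUCKDB_GO_VERSION} \
 && go mod tidy
```

Because both `go get` calls request the same version, the second one finds the module already in the module cache populated by the earlier layer.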

benben added a commit that referenced this pull request Jun 2, 2026
The prewarm added in #654 was premised on a multi-minute CGO compile of
duckdb-go. That premise was wrong: duckdb-go-bindings ships PREBUILT
static libs (libduckdb_*.a etc, ~1.7GB in the module cache) — there is
no DuckDB C++ compile to cache. The other prewarmed dep, pg_query_go,
cold-compiles in ~4s. So the prewarm layer saved single-digit seconds
while adding a confusing extra build step.

Measured on warm main builds after #654:
  main         288s -> 264s
  controlplane 241s -> 229s

The 5-8% saving came from the LAYER REORDERING (module + extension
downloads moved ahead of COPY . ., so they cache on source-only PRs),
not from the prewarm. The dominant remaining cost is the final CGO
*link* of the prebuilt static libs (~68s), which caching can't touch
because it changes with every first-party source edit.

Remove the prewarm RUN from all three Dockerfiles. Keep the valuable
reordering and the worker pin-without-tidy fix from #655.

Net build-time wins still live here:
  - module download + extension fetch cached before COPY . .

Real PR-latency lever (separate change): build only linux/arm64 in PR
CI (mw-dev is arm64), halving the worker matrix.