Reorder Dockerfile builder: install deps before copying src #128

Open
Chr96er wants to merge 1 commit into main from fix/dockerfile-layer-caching

Contributor

@Chr96er Chr96er commented May 28, 2026

What

Move COPY src ./src in the builder stage to after the uv pip install -r requirements.txt step.

Why

The builder currently copies src/ before the dependency install:

COPY pyproject.toml requirements.txt README.md MANIFEST.in ./
COPY src ./src                                    # ← invalidates the next layer on any src edit
RUN ... uv pip install --prefix=/install -r requirements.txt && \
        uv pip install --prefix=/install --no-deps .

Docker layer caching is sequential: once a layer's inputs change, that layer and every layer after it are rebuilt. Editing any file under src/ therefore invalidates the requirements.txt install layer, so every rebuild re-resolves and reinstalls apache-beam[gcp] and its transitive dependencies even though the dependencies didn't change.

After the reorder, a source-only change re-runs only the package-install layer (the --no-deps . install), turning cached source rebuilds from ~1–2 min into seconds. The deps layer no longer depends on src/; it is keyed only on the files copied before it, such as pyproject.toml and requirements.txt. This is the same layering anchorages_pipeline already uses (its Dockerfile even documents it).
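
The reordered builder stage could look roughly like the fragment below. It is a sketch assembled from the commands quoted above, not the PR's actual diff. The `...` prefix is the elision from the description and is left as-is (so the fragment is not buildable verbatim), the FROM line and the rest of the stage are omitted, and splitting the install into two RUN instructions with the same prefix on each is an assumption made so that COPY src can sit between them.

```dockerfile
# Builder stage fragment (sketch; FROM and earlier setup omitted).

# Metadata files only: edits under src/ do not touch this layer.
COPY pyproject.toml requirements.txt README.md MANIFEST.in ./

# Expensive dependency install; stays cached until one of the files above changes.
# "..." is the elided prefix from the original command, kept as-is.
RUN ... uv pip install --prefix=/install -r requirements.txt

# Source is copied only after the dependency layer has been built.
COPY src ./src

# Cheap package install; the only install step re-run on a source-only change.
# Reusing the same "..." prefix here is an assumption.
RUN ... uv pip install --prefix=/install --no-deps .
```

With this ordering, a change to requirements.txt still rebuilds both install layers, while a change under src/ rebuilds only the final COPY and RUN.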

Safety

No behavioural change to the resulting image: the install commands are identical and only the layer ordering changed. The --no-deps . install still has everything it needs (pyproject.toml, README.md, and MANIFEST.in are copied earlier; src/ is copied just before it).

Context

This speeds up dit's auto-built worker images (it builds this Dockerfile's prod target per unreviewed run); with the current layering those rebuilds pay the full deps cost on every source change.

The builder stage copied `src/` before running the dependency install, so any
source edit invalidated the (expensive) `uv pip install -r requirements.txt`
layer -- every rebuild re-resolved/reinstalled apache-beam[gcp] + transitives.

Move `COPY src ./src` after the requirements install so a source-only change
only re-runs the package-install layer (the `--no-deps .` install), turning
cached source rebuilds from ~1-2 min into seconds. The deps layer is now keyed
on pyproject.toml / requirements.txt only. This matches the layering
anchorages_pipeline already uses.

No behavioural change to the resulting image -- same install commands, only
the layer ordering changed.