fix: use checkpoints base image from Docker Hub for CI builds#143
fix: use checkpoints base image from Docker Hub for CI builds#143marcuscollins merged 2 commits intomainfrom
Conversation
Replace curl downloads and build-context COPY of checkpoints with a single COPY --from=diffuseproject/sampleworks-checkpoints:latest so CI (ubuntu-latest) can build without needing checkpoint files locally. Also adds checkpoints/ to .dockerignore, docker-entrypoint.sh to CI path triggers, and makes run_all_models.sh use a configurable IMAGE env var.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughRefactors Docker image build to source model checkpoints from an external pre-built image. Updates .dockerignore, Dockerfile (multi-stage COPY from diffuseproject/sampleworks-checkpoints:latest, ENV keys, ENTRYPOINT/CMD), CI workflow triggers, and run_all_models.sh to parameterize the image reference. Changes
Sequence Diagram(s)sequenceDiagram
participant Dev as Developer
participant CI as CI runner
participant Docker as Docker build
participant Checkpoints as diffuseproject/sampleworks-checkpoints:latest
rect rgba(200,230,255,0.5)
Dev->>CI: Push changes (Dockerfile, .dockerignore, run_all_models.sh)
end
rect rgba(200,255,200,0.5)
CI->>Docker: Start build
Docker->>Checkpoints: COPY --from=... /checkpoints/ -> /checkpoints/
Docker-->>CI: Built image includes /checkpoints/
end
rect rgba(255,230,200,0.5)
CI->>Docker: Run container (uses ENTRYPOINT/CMD)
Docker-->>CI: Container starts with ENV checkpoint paths set
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
Dockerfile (1)
123-123: Consider pinning the checkpoints image to a specific tag or digest for reproducibility.Using
:latestmeans builds are not deterministic—if the checkpoints image is updated on Docker Hub, subsequent builds will pull different checkpoints without any code change in this repository. This could lead to subtle inconsistencies between builds.For improved reproducibility, consider:
- Using a versioned tag (e.g.,
diffuseproject/sampleworks-checkpoints:v1.0.0)- Using a digest (e.g.,
diffuseproject/sampleworks-checkpoints@sha256:...)That said, if checkpoints rarely change and this is acceptable for your workflow, the current approach is functional.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` at line 123, The Dockerfile currently copies checkpoints with the mutable image reference "COPY --from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/"; change that to a pinned reference by replacing ":latest" with a specific versioned tag (e.g., :v1.0.0) or an immutable digest (e.g., `@sha256`:...), update any build scripts or CI to use the chosen pinned identifier, and add a short comment or README note near the COPY line that documents the chosen tag/digest and when to update it.run_all_models.sh (1)
52-52: Consider quoting$IMAGEfor shell best practices.While Docker image names don't typically contain spaces, quoting variables prevents potential issues with word splitting and glob expansion.
Suggested change
docker run $DOCKER_OPTS \ --gpus "\"device=$gpus\"" \ -v /mnt/diffuse-private:/mnt/diffuse-private:ro \ -v "$RESULTS_DIR:/data/results" \ - $IMAGE \ + "$IMAGE" \ -e "$env" run_grid_search.py \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@run_all_models.sh` at line 52, The shell variable $IMAGE is unquoted in run_all_models.sh which can lead to word-splitting or glob expansion; update the Docker invocation(s) and any other usages of $IMAGE (the token "$IMAGE" in the script) to use double quotes instead, i.e., replace occurrences of $IMAGE with "$IMAGE" so the image name is treated as a single argument.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@Dockerfile`:
- Line 123: The Dockerfile currently copies checkpoints with the mutable image
reference "COPY --from=diffuseproject/sampleworks-checkpoints:latest
/checkpoints/ /checkpoints/"; change that to a pinned reference by replacing
":latest" with a specific versioned tag (e.g., :v1.0.0) or an immutable digest
(e.g., `@sha256`:...), update any build scripts or CI to use the chosen pinned
identifier, and add a short comment or README note near the COPY line that
documents the chosen tag/digest and when to update it.
In `@run_all_models.sh`:
- Line 52: The shell variable $IMAGE is unquoted in run_all_models.sh which can
lead to word-splitting or glob expansion; update the Docker invocation(s) and
any other usages of $IMAGE (the token "$IMAGE" in the script) to use double
quotes instead, i.e., replace occurrences of $IMAGE with "$IMAGE" so the image
name is treated as a single argument.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5eb7bbe1-b586-43ce-a2d0-33d08bd5b55c
📒 Files selected for processing (4)
.dockerignore.github/workflows/docker.ymlDockerfilerun_all_models.sh
There was a problem hiding this comment.
Pull request overview
This PR updates the container build and related tooling so CI can build the Docker image without requiring large checkpoint files to exist in the build context, by pulling pre-baked checkpoints from a separate Docker Hub image.
Changes:
- Switch Dockerfile checkpoint acquisition to
COPY --from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/. - Add
checkpoints/to.dockerignoreto avoid accidentally sending checkpoint files in build context. - Update CI workflow path triggers and make
run_all_models.shuse a configurableIMAGEenv var.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
Dockerfile |
Replaces local/curl checkpoint acquisition with a cross-image COPY --from from Docker Hub. |
run_all_models.sh |
Makes the Docker image configurable via IMAGE instead of hardcoding sampleworks:latest. |
.github/workflows/docker.yml |
Documents the checkpoints base-image approach and rebuild-trigger paths include docker-entrypoint.sh. |
.dockerignore |
Excludes checkpoints/ from the Docker build context. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| # cp /mnt/diffuse-private/raw/checkpoints/protenix_base_default_v0.5.0.pt checkpoints/ | ||
| COPY checkpoints/rf3_foundry_01_24_latest.ckpt /checkpoints/rf3_foundry_01_24_latest.ckpt | ||
| COPY checkpoints/protenix_base_default_v0.5.0.pt /checkpoints/protenix_base_default_v0.5.0.pt | ||
| COPY --from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/ |
There was a problem hiding this comment.
COPY --from=diffuseproject/sampleworks-checkpoints:latest ... pins to the mutable latest tag, which makes builds non-reproducible and can break unexpectedly if the checkpoints image is updated. Consider pinning to an immutable digest (or at least a versioned tag), and optionally wiring it through an ARG CHECKPOINTS_IMAGE=... so CI and local builds can override intentionally.
| @@ -46,7 +49,7 @@ run_model() { | |||
| --gpus "\"device=$gpus\"" \ | |||
There was a problem hiding this comment.
The --gpus flag value is being passed with literal quote characters ("device=..."). Docker CLI does not strip those quotes, so this can fail to parse as a valid --gpus value. Pass the value as device=$gpus (quoted by the shell if needed) without embedding quotes in the argument.
| --gpus "\"device=$gpus\"" \ | |
| --gpus "device=$gpus" \ |
Copilot suggested to put the docker image name (sampleworks:latest) in quotes; it is passed by variable $IMAGE to `docker`, so this seems a little safer. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
marcuscollins
left a comment
There was a problem hiding this comment.
LGTM, thanks for the updates!
| name: Build and Push Docker Images | ||
|
|
||
| # CI builds pull all model checkpoints (~10 GB) from Docker Hub automatically via: | ||
| # COPY --from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/ |
There was a problem hiding this comment.
I had no idea you could do this! Fantastic!
There was a problem hiding this comment.
Although, couldn't you just use that as a base image?
SHOULD ONLY MERGE AFTER #143 Adds python-semantic-release (v10) to automate version bumps, changelog generation, GitHub releases, and versioned Docker image tags. Includes commitizen pre-commit hook for commit message validation and developer documentation for the release process. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Added detailed Release Process and Commit Message (Conventional Commits) guidance. * Updated README with commit-hook install instructions and release guidance. * Introduced a CHANGELOG placeholder for future release notes. * **Chores** * Added an automated Release workflow to compute and publish releases on main. * Enhanced CI to publish Docker images on semantic tags with dynamic tags/labels and updated action versions. * Integrated commit-msg validation (commitizen) and semantic-release configuration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
COPY --from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/so that CI (ubuntu-latest) can build without needing checkpoint files locally (~10 GB). Docker automatically pulls the pre-built checkpoints layer from Docker Hub during the build.checkpoints/to.dockerignoreto prevent accidental inclusion in build contextdocker-entrypoint.shto CI workflow path triggers so changes to the entrypoint trigger a rebuilddocker.ymlexplaining the checkpoints base image patternrun_all_models.shuse a configurableIMAGEenv var (default:diffuseproject/sampleworks:latest) instead of hardcodedsampleworks:latestContext
PR #141 merged the RF3/Protenix checkpoint support but the Dockerfile still used
curldownloads for Boltz checkpoints andCOPY checkpoints/...from build context for RF3/Protenix. This caused CI to fail because:COPY checkpoints/rf3_...andCOPY checkpoints/protenix_...files don't exist on the CI runnerDOCKERHUB_USERNAMEandDOCKERHUB_TOKENsecrets were also missing (now added)The checkpoints base image (
diffuseproject/sampleworks-checkpoints:latest) was already built and pushed to Docker Hub from the GPU server, containing all 6 checkpoint items (boltz1, boltz2, ccd, mols/, rf3, protenix).Testing
DOCKERHUB_USERNAME,DOCKERHUB_TOKEN) have been added to the repoSummary by CodeRabbit
New Features
Chores