Add push_workers_to_ecr.sh script for ECR image deployment by arjunrajlab · Pull Request #141 · arjunrajlaboratory/ImageAnalysisProject

arjunrajlab · 2026-05-29T16:00:34Z

Summary

scripts/push_workers_to_ecr.sh builds NimbusImage worker Docker images for
linux/amd64 and pushes them to AWS ECR. Each worker gets two tags (short git
SHA + latest); needed base images are built for amd64 and each worker's
FROM is redirected to the ECR copy via docker buildx --build-context (no
Dockerfile edits). Full reference: scripts/PUSH_WORKERS_TO_ECR.md.

Where to run it

Two supported paths:

Laptop (macOS). Cross-builds linux/amd64 under QEMU. Fine for standard
CPU workers; needs bash 4+ (brew install bash, run with /opt/homebrew/bin/bash).
GPU/ML workers are slow/fragile under emulation — use --skip-ml for a quick
standard-only run.
Native amd64 / EC2 (recommended for bulk + GPU/ML). On a native amd64 (or
amd64/GPU) host there is no emulation — builds are fast and GPU/ML workers
build normally. Spin up an ephemeral instance with an IAM instance profile
that has ECR push permissions (so aws_credentials_prod.sh isn't needed),
run the script, then terminate it (a spot instance keeps the one-off cost
low). See "Building on a native amd64 machine" in the doc for the existing-box
and ephemeral-EC2 recipes.

GPU/ML + piscis coverage

The script now builds and pushes GPU/ML workers too (cellpose family,
sam/sam2 family, stardist, condensatenet, deconwolf, deepcell, cellori). They
build FROM public nvidia/cuda images, so they need no base redirect and
push like any other worker.
piscis (non-standard predict/ + train/ layout) is handled as two
pseudo-workers, piscis_predict and piscis_train, both built with the
piscis dir as context (matching its docker-compose.yaml).
New flags: --skip-ml (exclude nvidia/cuda workers incl. piscis) and
--only-ml (build only those, e.g. on a GPU box). --list now shows a
GPU/ML column, and the script warns before cross-building any CUDA worker
under QEMU on a non-amd64 host.
Net result: a single ./scripts/push_workers_to_ecr.sh --all on a native
amd64/GPU EC2 host builds + pushes everything (standard + GPU/ML +
piscis). build_machine_learning_workers.sh remains the local-dev build path
(local image tags, no ECR push).

Build parallelism

COMPOSE_PARALLEL_LIMIT in build_workers.sh is now env-overridable (default
1, safe on a memory-constrained laptop). Even a beefy EC2 GPU build instance
can't handle many concurrent heavy CUDA/torch builds, so 2–3 is the practical
ceiling, not higher:

COMPOSE_PARALLEL_LIMIT=3 ./build_workers.sh

Key implementation details

Discovers workers from workers/annotations/ and workers/properties/;
*_M1 arm64 dev variants are excluded entirely, and test/sample workers are
skipped by --all (still pushable by explicit name).
Per-worker build context is auto-detected (repo-root vs worker-dir); piscis's
context is pinned via an override map since it can't be inferred.
Validates AWS credentials early (sts get-caller-identity); appends -dirty
to the SHA tag on a dirty working tree; auto-creates ECR repos on first push
(with a prompt; -y to skip); re-logs-in + retries once on a push auth error;
keeps a per-worker buildx cache under .cache/buildx/ (--no-cache to
disable).

Image naming / tags

<acct>.dkr.ecr.<region>.amazonaws.com/<prefix>/<category>/<worker>:<short-git-sha>
<acct>.dkr.ecr.<region>.amazonaws.com/<prefix>/<category>/<worker>:latest

Prod registry: account 677276105390, region us-east-1, prefix nimbus.

Verified offline (syntax, --list, stubbed-aws --dry-run). An actual
GPU/ML build + push should be confirmed on a native amd64/GPU host with live
credentials.

🤖 Generated with Claude Code

Local-only workflow for building one or more worker images and pushing them to AWS ECR after sourcing AWS credentials. Discovers workers from the standard workers/annotations/<name> and workers/properties/<sub>/ <name> layouts (piscis is intentionally skipped), tags each image with both the short git SHA and 'latest', uses a per-worker local buildx cache, and creates ECR repos on first push (with a confirmation prompt). Includes --list, --all, --dry-run, --region, --prefix, and -y flags. Also adds a top-level .gitignore covering the buildx cache and the existing build.log, and documents the workflow in CLAUDE.md alongside the other build commands.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ff5dd26a6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-29T16:02:47Z

+            -t "$img_sha" \
+            -t "$img_latest" \
+            --push \
+            . 2> >(tee "$stderr_file" >&2)


Build worker-context Dockerfiles from their directories

When pushing any discovered worker whose Dockerfile was written for its own directory as the build context (for example workers/annotations/cellpose/Dockerfile:35 and workers/annotations/random_point/Dockerfile:7 use COPY ./environment.yml, and there is no repo-root environment.yml), this repo-root build context makes Docker look in the wrong place and the build fails before anything can be pushed. --all includes many of these Dockerfiles, and named pushes for those workers are broken unless the context is switched to the worker directory or those files are excluded/rewritten.

Useful? React with 👍 / 👎.

Resolved in d938a07 (before this review's commit). detect_context() now picks the build context per worker by checking where each COPY/ADD source resolves — repo-root for compose-style workers, worker-dir for ones that COPY ./environment.yml (cellpose, random_point, …). Verified: cellpose → worker dir, blob_metrics_worker → repo root; both build and push.

push_workers_to_ecr.sh always used the repo root as the build context and had no concept of base images, so: - Workers whose Dockerfile expects its own directory as context (cellpose, stardist, random_point, ...) failed to build under --all (Codex P1: no repo-root environment.yml). Now detect_context() picks repo-root vs worker-dir per worker by checking where each COPY/ADD source resolves. - Worker `FROM nimbusimage/<base>:latest` images aren't on any registry and the local copies are the host arch (arm64), so amd64 cross-builds couldn't resolve them. ensure_base_image() now builds the needed base for the target platform, pushes it to nimbus/base/<base>, and redirects FROM to the ECR copy via `docker buildx --build-context` (no worker Dockerfile edits). Add --skip-base to reuse a base already in ECR. Validated on this Mac: built+pushed amd64 nimbus/base/worker-base and nimbus/properties/blob_metrics_worker (both confirmed linux/amd64). Docs (CLAUDE.md): correct creds path, note worker names are directory names, document base-image build/redirect and --skip-base. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Document the full ECR push flow in scripts/PUSH_WORKERS_TO_ECR.md: auth via ../AWSDeploy/aws_credentials_prod.sh, prerequisites (bash 4+ on macOS), all options, image naming/tags, automatic base-image build + FROM redirect, build-context auto-detection, the per-base-image worker groups, and gotchas (QEMU emulation on macOS is slow; GPU/ML workers should be built natively on an x86_64 machine via build_machine_learning_workers.sh; test/sample and _M1 workers are not deployed). Link it from CLAUDE.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ECR repository names and Docker image tags must be lowercase, but some worker directory names aren't (blob_point_count_3D_projection_worker, the *_M1 variants). Lowercase the repo component when forming the repo and tags so these build instead of failing with "repository name must be lowercase". Also fix ensure_repo() to check the create-repository exit code and return non-zero on failure, instead of always logging "Created repo" even when the AWS call errored. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a "Building on a native amd64 machine" section covering (A) running the existing script on an amd64 Linux box you already have (no emulation, fast, GPU workers build normally) and (B) spinning up an ephemeral EC2 instance (amd64 or GPU) with an ECR IAM instance role to build + push everything, then terminate. Notes that GPU/ML workers go through build_machine_learning_workers.sh and that AWSDeploy Terraform could template the build instance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

arjunrajlab · 2026-05-29T20:22:29Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a821bed23a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-29T20:25:15Z

+    while IFS= read -r f; do
+        name="$(basename "$(dirname "$f")")"
+        printf '%s\tannotations\t%s\n' "$name" "$f"
+    done < <(find workers/annotations -mindepth 2 -maxdepth 2 -name Dockerfile -type f 2>/dev/null | sort)


Exclude M1-only worker Dockerfiles from amd64 pushes

When --all is used, this find includes directories such as workers/annotations/annulus_generator_M1, random_point_annotation_M1, and test_multiple_annotation_M1; those Dockerfiles download Miniconda3-latest-Linux-aarch64.sh, so the script's default --platform linux/amd64 build will fail once it reaches them. Filter out the _M1 worker directories (or only include the production workers) so --all can complete for the documented amd64 ECR workflow.

Useful? React with 👍 / 👎.

Fixed in b2696bf. *_M1 directories are now excluded from discovery (they hardcode the aarch64 Miniconda installer and can't build linux/amd64), so --all no longer breaks on them. Also added a denylist so --all skips the dev/test workers (random_squares, sample_interface, test_multiple_annotation, test_file_creation_worker); those stay pushable by explicit name. Confirmed --all now resolves to 49 buildable workers with no _M1.

chatgpt-codex-connector · 2026-05-29T20:25:15Z

+if ! git -C "$REPO_ROOT" diff --quiet 2>/dev/null || \
+   ! git -C "$REPO_ROOT" diff --staged --quiet 2>/dev/null; then


Include untracked files in dirty SHA tagging

This dirty check only looks at tracked unstaged/staged diffs, so an untracked file that is copied into the Docker build context (for example a newly added worker helper or entrypoint dependency) can be baked into the image while the tag remains the clean git SHA. That breaks the provenance guarantee documented for -dirty; include untracked files in the check before deciding the image corresponds to HEAD.

Useful? React with 👍 / 👎.

Fixed in b2696bf. The dirty check now uses git status --porcelain, which catches tracked changes, staged changes, and untracked (non-gitignored) files. Since the build context is usually the repo root, an untracked file can be COPYed into an image, so it must flip the tag to -dirty. .gitignored artifacts (.cache/, etc.) are still excluded.

…k untracked Address Codex review on PR #141: - Exclude `*_M1` worker dirs from discovery. Their Dockerfiles hardcode the aarch64 Miniconda installer (Apple-Silicon dev only) and cannot build for linux/amd64, so `--all` previously broke when it reached them. - `--all` now skips dev/test/sample workers (random_squares, sample_interface, test_multiple_annotation, test_file_creation_worker) per the deployment policy; they remain pushable by explicit name. - Dirty SHA check now uses `git status --porcelain`, which also catches untracked (non-gitignored) files. With the repo-root build context an untracked file can be baked into an image, so the `-dirty` provenance tag must account for it. Docs updated to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…elism push_workers_to_ecr.sh now covers GPU/ML workers (they build FROM public nvidia/cuda images, so no base redirect is needed) and piscis, which is emitted as two pseudo-workers piscis_predict/piscis_train with the build context pinned to the piscis dir (matching its docker-compose). Adds --skip-ml/--only-ml, a GPU/ML column in --list, and a QEMU emulation warning before cross-building any CUDA worker on a non-amd64 host. Make COMPOSE_PARALLEL_LIMIT in build_workers.sh env-overridable (default 1). Even an EC2 GPU build instance can't handle many concurrent heavy CUDA/torch builds, so 2-3 is the practical ceiling, not higher. Update PUSH_WORKERS_TO_ECR.md and CLAUDE.md: a single --all run on a native amd64/GPU host now builds+pushes everything (standard + GPU/ML + piscis); build_machine_learning_workers.sh remains the local-dev (no-ECR) build path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Regenerated by the update-worker-docs Claude Code hook after PR creation.

Resolve the target registry from $ECR_REGISTRY when set, defaulting to the URL derived from the authenticated account + region (which still respects --region). This is the shared env-var contract with the AWSDeploy consume side (Celery workers pull prebuilt images from the same registry); the full contract is documented in AWSDeploy's doc/Build_and_Push_Worker_Images_to_ECR.md. No new flag, no behavior change when $ECR_REGISTRY is unset. The override flows to worker SHA/latest tags, the base-image push, and the FROM --build-context redirect. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…cripts Capture the deferred consume-side link (make build_workers.sh and build_machine_learning_workers.sh pull prebuilt images from ECR when $ECR_REGISTRY is set, falling back to a source build) along with the two blockers that make it non-trivial: the unknown local image tag Girder discovery expects, and the lack of a single name mapping across worker dir / ECR repo / ML-script local tag / compose service name. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…hanges generate_worker_docs.py: prefer the concise Docker `description` label for the REGISTRY one-liner (fall back to the in-app notes only when there's no label), and make _strip_html replace tags with a space + drop a trailing "Learn more" link label. This removes the verbose/truncated rows, the leaked "Learn more" anchor text, and fused sentences (e.g. "z-slices.It") that the regenerated registry had picked up from the notes fields. update-worker-docs.sh: only regenerate + commit docs when the branch diff vs. the base actually touches workers/ or the generator, so an unrelated PR (e.g. a build-script change) no longer picks up a registry-churn commit. REGISTRY.md regenerated with the fixed generator (idempotent). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed May 29, 2026

View reviewed changes

arjunrajlab and others added 4 commits May 29, 2026 12:26

chatgpt-codex-connector Bot reviewed May 29, 2026

View reviewed changes

arjunrajlab and others added 6 commits May 29, 2026 16:31

docs: auto-update worker docs and registry.md

40ed8d4

Regenerated by the update-worker-docs Claude Code hook after PR creation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add push_workers_to_ecr.sh script for ECR image deployment#141

Add push_workers_to_ecr.sh script for ECR image deployment#141
arjunrajlab wants to merge 11 commits into
masterfrom
claude/worker-image-registry-build-M4qXH

arjunrajlab commented May 29, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Uh oh!

arjunrajlab May 29, 2026

Uh oh!

arjunrajlab commented May 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Uh oh!

arjunrajlab May 29, 2026

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Uh oh!

arjunrajlab May 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if ! git -C "$REPO_ROOT" diff --quiet 2>/dev/null \|\| \
		! git -C "$REPO_ROOT" diff --staged --quiet 2>/dev/null; then

Conversation

arjunrajlab commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Where to run it

GPU/ML + piscis coverage

Build parallelism

Key implementation details

Image naming / tags

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

arjunrajlab May 29, 2026

Choose a reason for hiding this comment

Uh oh!

arjunrajlab commented May 29, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

arjunrajlab May 29, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 29, 2026

Choose a reason for hiding this comment

Uh oh!

arjunrajlab May 29, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

arjunrajlab commented May 29, 2026 •

edited

Loading