feat(ci): add shadow-docker-build workflow for OS-49 Phase 3 #964

Merged: pimlock merged 1 commit into main from jtoelke/os-127-shadow-docker-build on Apr 24, 2026

@jtoelke2 (Collaborator)

Summary

Add .github/workflows/shadow-docker-build.yml — a dispatch-driven shadow of docker-build.yml using buildx's local driver (via the recently-merged setup-buildx driver: local) and GHA cache (type=gha, scoped per (component, arch)). Proves Docker builds can run on nv-gha-runners without reaching back to the in-cluster EKS BuildKit pods. Non-blocking; push: main + workflow_dispatch only.

Related Issue

OS-49 runner migration, Phase 3 / OS-127. This is PR 2 of the two-PR phase; PR 1 (setup-buildx driver input) already merged in #941.

Plan and decision thresholds live in OS-127 as an attached comment (Linear MCP create_document remains out of service).

Changes

  • .github/workflows/shadow-docker-build.yml: new workflow.
    • Matrix: {gateway, supervisor, cluster} × {amd64, arm64} = 6 jobs.
    • Runner: linux-{amd64,arm64}-cpu8 (native per arch, no QEMU).
    • Container: ghcr.io/nvidia/openshell/ci:latest with --privileged + docker-socket mount — mirrors docker-build.yml.
    • Buildx: ./.github/actions/setup-buildx with driver: local.
    • Cluster helm prep: helm package deploy/helm/openshell -d deploy/docker/.build/charts/ before the build (only on matrix.component == 'cluster').
    • Build: docker buildx build --platform linux/<arch> --cache-from/--cache-to type=gha,scope=<component>-<arch>,mode=max --build-arg EXTRA_CARGO_FEATURES=openshell-core/dev-settings --load --file deploy/docker/Dockerfile.images --target <component>.
    • Post-step: docker buildx du for cache-size visibility.
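
As a rough illustration of how the matrix and build flags listed above fit together, the sketch below prints (rather than runs) the command for each matrix cell. The function names `shadow_prep_cmd`, `shadow_build_cmd`, and `shadow_matrix` are hypothetical and do not appear in the workflow. The trailing `.` build context is an assumption. `mode=max` is shown only on `--cache-to`, since it is a cache-export option.

```shell
# Dry-run sketch: prints commands only, nothing is built.
# All function names here are hypothetical helpers, not part of the workflow.

# Helm prep applies only to the cluster component.
shadow_prep_cmd() {
  if [ "$1" = "cluster" ]; then
    echo "helm package deploy/helm/openshell -d deploy/docker/.build/charts/"
  fi
}

# Assemble the buildx invocation for one (component, arch) cell.
shadow_build_cmd() {
  component="$1"
  arch="$2"
  scope="${component}-${arch}"   # one GHA cache scope per (component, arch)
  echo "docker buildx build" \
    "--platform linux/${arch}" \
    "--cache-from type=gha,scope=${scope}" \
    "--cache-to type=gha,scope=${scope},mode=max" \
    "--build-arg EXTRA_CARGO_FEATURES=openshell-core/dev-settings" \
    "--load" \
    "--file deploy/docker/Dockerfile.images" \
    "--target ${component}" \
    "."   # build context: assumed to be the repository root
}

# Enumerate the 3 x 2 matrix: {gateway, supervisor, cluster} x {amd64, arm64}.
shadow_matrix() {
  for c in gateway supervisor cluster; do
    for a in amd64 arm64; do
      shadow_build_cmd "$c" "$a"
    done
  done
}
```

Calling `shadow_matrix` prints six lines, one per job. Each job gets its own cache scope, so an arm64 build never reads or overwrites amd64 cache entries.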

Deliberate divergences from docker-build.yml

All per the OS-127 plan:

  • Does not call docker-build.yml as a reusable workflow. Phase 6 folds the driver input into the reusable workflow; the shadow is self-contained.
  • No multi-arch manifest merge. Per-arch images land separately; manifest stitching enters in Phase 6.
  • No --push. Shadow measures build + cache mechanics, not publish.
  • No OPENSHELL_CARGO_VERSION injection. Binary will report 0.0.0; wall-time unaffected.
  • No SCCACHE_MEMCACHED_ENDPOINT. Won't resolve off-EKS anyway; Dockerfile falls back to local disk cache.

Risk note: --privileged on shared runners

Whether nv-gha-runners permits --privileged containers has not been confirmed. Per the OS-127 plan, the first dispatch is the test. If the policy rejects it, the job fails at container start with an explicit policy error: a clean failure, not partial execution. A Phase 3 redesign would then consider non-DinD alternatives (a docker-container driver in a sidecar, for instance). This is the same --privileged pattern that docker-build.yml uses daily on ARC.

Testing

  • mise run pre-commit: ⚠️ exit 1, but the failure is pre-existing on main and unrelated to this PR: 8 MD040 errors in architecture/podman-rootless-networking.md (from the podman driver doc in PR #904, "Openshell driver podman", which landed today). Worth a separate tiny fix PR.
  • Unit tests added/updated — N/A; workflow config change.
  • E2E tests added/updated — N/A.
  • Dispatch validation — planned immediately post-merge (gh workflow run shadow-docker-build.yml --ref main). First dispatch doubles as the --privileged-policy test.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated — N/A; plan lives on OS-127 (Linear comment due to MCP create_document outage).

Signed-off-by: Jonas Toelke <jtoelke@nvidia.com>
@jtoelke2 self-assigned this on Apr 24, 2026
@jtoelke2 requested a review from a team as a code owner on April 24, 2026 19:05
@jtoelke2 requested a review from pimlock on April 24, 2026 19:05

@pimlock (Collaborator) left a comment

LGTM

I have a fix for the markdown lint here: #965

@pimlock merged commit d331ed5 into main on Apr 24, 2026
21 of 22 checks passed
@pimlock deleted the jtoelke/os-127-shadow-docker-build branch on April 24, 2026 19:33