refactor: generalize container build tooling for Docker and Podman compatibility #968

@maxamillion

Problem Statement

All build scripts, mise tasks, and environment variables use Docker-branded naming (DOCKER_REGISTRY, DOCKER_PLATFORM, build:docker:*, etc.) despite the underlying shell abstraction (container-engine.sh) already supporting both Docker and Podman. This creates a confusing user experience where the tooling appears Docker-only even though it isn't, and makes it harder for Podman users to discover and use the generic container support that already exists.

Additionally, all build scripts hardcode Dockerfile paths via -f flags, with no ability to discover or use Containerfile — the Podman-native naming convention. Both naming conventions are OCI-compatible and should be supported transparently.

The goal is to rename Docker-specific naming to generic container terminology, add Containerfile auto-detection, and maintain full backwards compatibility with existing workflows, scripts, and muscle memory.

Technical Context

The codebase already has a well-designed container engine abstraction in tasks/scripts/container-engine.sh (~324 lines) that handles Docker/Podman switching for build, info, network, prune, buildx, context, and imagetools operations. All build scripts use this abstraction layer via ce_build, ce_push, etc. The problem is purely one of naming — env vars, file names, and task names are all Docker-branded despite the underlying logic being engine-neutral.

Every build invocation in the codebase explicitly specifies a Dockerfile path via -f or --file. No build relies on default name discovery. This means auto-detection must be added explicitly — it won't happen by accident.

Affected Components

| Component | Key Files | Role |
| --- | --- | --- |
| Container engine abstraction | tasks/scripts/container-engine.sh | Core abstraction layer; needs ce_resolve_containerfile() helper added |
| Build image script | tasks/scripts/docker-build-image.sh | Main image build script, uses ce_build but has Docker-branded env vars and hardcoded Dockerfile path |
| CI build script | tasks/scripts/docker-build-ci.sh | CI-specific build wrapper, hardcoded -f deploy/docker/Dockerfile.ci inline |
| Multi-arch publish script | tasks/scripts/docker-publish-multiarch.sh | Multi-arch publishing, already has Docker/Podman dual code paths |
| Cleanup script | scripts/docker-cleanup.sh | Resource cleanup, uses ce wrappers throughout |
| Mise task definitions | tasks/docker.toml | User-facing task names (build:docker:*, docker:cleanup) |
| Root mise config | mise.toml | Sets DOCKER_BUILDKIT=1 env var |
| VM build script | crates/openshell-vm/scripts/build-rootfs.sh | VM rootfs build, uses Docker-branded env vars |
| Benchmark script | scripts/build-benchmark/cluster-deploy-fast-test.sh | Build benchmark, passes DOCKER_BUILD_CACHE_DIR |
| Python wheel builder | tasks/python.toml | macOS wheel build, hardcoded -f deploy/docker/Dockerfile.python-wheels-macos inline |
| Fast deploy change detection | tasks/scripts/cluster-deploy-fast.sh | Change detection patterns only match Dockerfile.images, would miss Containerfile changes |

Technical Investigation

Architecture Overview

The container build tooling has a clean layered architecture:

mise tasks (tasks/docker.toml)
  └─ build scripts (tasks/scripts/docker-build-image.sh, etc.)
       └─ container-engine.sh abstraction layer
            └─ docker CLI or podman CLI

The bottom layer (container-engine.sh) is already engine-neutral. The upper layers use Docker-branded naming but functionally work with both engines. This refactor renames the upper layers and adds Containerfile auto-detection to the abstraction layer.

Code References

| Location | Description |
| --- | --- |
| tasks/scripts/container-engine.sh:1-324 | Core abstraction; handles all Docker/Podman differences. Needs ce_resolve_containerfile() helper added. |
| tasks/scripts/container-engine.sh:108 | Documents that -f is a pass-through flag; the abstraction has zero awareness of Dockerfile names today |
| tasks/scripts/docker-build-image.sh:47 | DOCKERFILE="deploy/docker/Dockerfile.images": single-point hardcoded assignment, used via variable throughout |
| tasks/scripts/docker-build-image.sh:48-49 | Existence check and error message referencing "Dockerfile" |
| tasks/scripts/docker-build-image.sh:195 | -f "${DOCKERFILE}" passed to ce_build |
| tasks/scripts/docker-build-image.sh:55-78 | DOCKER_TARGET env var used to select build target |
| tasks/scripts/docker-build-image.sh:91 | DOCKER_BUILD_CACHE_DIR env var for build cache location |
| tasks/scripts/docker-build-image.sh:97-98 | DOCKER_BUILDER env var for buildx builder name |
| tasks/scripts/docker-build-image.sh:99,187 | DOCKER_PLATFORM env var for target platform |
| tasks/scripts/docker-build-image.sh:161 | DOCKER_OUTPUT env var for build output type |
| tasks/scripts/docker-build-image.sh:164 | DOCKER_PUSH env var to control push behavior |
| tasks/scripts/docker-build-ci.sh:24 | -f deploy/docker/Dockerfile.ci hardcoded inline in ce_build call |
| tasks/scripts/docker-publish-multiarch.sh:14 | DOCKER_REGISTRY env var for registry URL |
| tasks/scripts/docker-publish-multiarch.sh:16 | DOCKER_PLATFORMS env var for multi-arch platform list |
| tasks/scripts/docker-publish-multiarch.sh:18 | EXTRA_DOCKER_TAGS env var for additional tags |
| tasks/scripts/docker-publish-multiarch.sh:32-43 | DOCKER_BUILDER, DOCKER_PLATFORM, DOCKER_PUSH env vars |
| tasks/docker.toml:1-77 | Task definitions: build:docker:*, docker:build:*, docker:cleanup |
| mise.toml:44 | DOCKER_BUILDKIT = "1": harmless for Podman (ignored), comment already notes this |
| crates/openshell-vm/scripts/build-rootfs.sh:84,90 | DOCKER_PLATFORM env var for VM build |
| crates/openshell-vm/scripts/build-rootfs.sh:297-298 | Uses -f - (stdin heredoc); immune to Containerfile naming, no changes needed |
| scripts/build-benchmark/cluster-deploy-fast-test.sh:233,255 | DOCKER_BUILD_CACHE_DIR passthrough |
| tasks/python.toml:177-178 | ce build -f deploy/docker/Dockerfile.python-wheels-macos hardcoded inline |
| tasks/scripts/cluster-deploy-fast.sh:155 | Pattern match deploy/docker/Dockerfile.images in matches_gateway() |
| tasks/scripts/cluster-deploy-fast.sh:176 | Pattern match deploy/docker/Dockerfile.images in matches_supervisor() |
| tasks/scripts/cluster-deploy-fast.sh:215 | git ls-tree for gateway fingerprint references Dockerfile.images |
| tasks/scripts/cluster-deploy-fast.sh:218 | git ls-tree for supervisor fingerprint references Dockerfile.images |

Current Behavior

All build scripts read Docker-branded env vars (DOCKER_REGISTRY, DOCKER_PLATFORM, DOCKER_PUSH, etc.) and pass them to the container-engine.sh abstraction functions (ce_build, ce_push, etc.). The abstraction layer handles all Docker/Podman differences transparently. Users and CI set DOCKER_* env vars to control build behavior.

Mise tasks carry Docker-branded names, either build:docker:* or docker:* (e.g., mise run build:docker:gateway, mise run docker:cleanup).

Every build invocation explicitly specifies a Dockerfile path via -f. The abstraction layer (ce_build) passes -f through as-is with no awareness of the file name. There is no auto-detection of Containerfile as an alternative.

What Would Need to Change

Environment Variables (9 variables):

| Current | New (generic) | Used in |
| --- | --- | --- |
| DOCKER_REGISTRY | CONTAINER_REGISTRY | publish-multiarch.sh |
| DOCKER_PLATFORM | CONTAINER_PLATFORM | build-image.sh, publish-multiarch.sh, build-rootfs.sh |
| DOCKER_PLATFORMS | CONTAINER_PLATFORMS | publish-multiarch.sh |
| DOCKER_PUSH | CONTAINER_PUSH | build-image.sh, publish-multiarch.sh |
| DOCKER_BUILDER | CONTAINER_BUILDER | build-image.sh, publish-multiarch.sh |
| DOCKER_OUTPUT | CONTAINER_OUTPUT | build-image.sh |
| DOCKER_TARGET | CONTAINER_TARGET | build-image.sh |
| DOCKER_BUILD_CACHE_DIR | CONTAINER_BUILD_CACHE_DIR | build-image.sh, cluster-deploy-fast-test.sh |
| EXTRA_DOCKER_TAGS | EXTRA_CONTAINER_TAGS | publish-multiarch.sh |

Each variable should use a fallback pattern for backwards compatibility:

CONTAINER_REGISTRY="${CONTAINER_REGISTRY:-${DOCKER_REGISTRY:-}}"
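The fallback could also be wrapped in a small helper so that all nine variables resolve the same way and the deprecation notice suggested under Risks is emitted in one place. The following is a minimal sketch, not existing code: the helper name ce_env_fallback and the warning wording are assumptions made for illustration.

```shell
# Hypothetical helper (not part of container-engine.sh today).
# Prints the value of the new variable if it is set and non-empty; otherwise
# prints the old variable's value and warns on stderr; otherwise prints the
# optional default. Variable names are passed as literals by the caller.
ce_env_fallback() {
  _new_name="$1"
  _old_name="$2"
  _default="${3:-}"

  # Indirect lookup by name; safe here because callers pass fixed names.
  eval "_new_val=\"\${${_new_name}:-}\""
  eval "_old_val=\"\${${_old_name}:-}\""

  if [ -n "${_new_val}" ]; then
    printf '%s\n' "${_new_val}"
  elif [ -n "${_old_val}" ]; then
    printf 'warning: %s is deprecated; use %s instead\n' \
      "${_old_name}" "${_new_name}" >&2
    printf '%s\n' "${_old_val}"
  else
    printf '%s\n' "${_default}"
  fi
}

# Example call, equivalent to the one-line pattern above plus the warning.
CONTAINER_REGISTRY="$(ce_env_fallback CONTAINER_REGISTRY DOCKER_REGISTRY)"
```

Like the one-line pattern, this treats an empty new variable the same as an unset one, so an explicitly empty CONTAINER_REGISTRY still falls back to DOCKER_REGISTRY.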

File Renames (5 files):

| Current | New |
| --- | --- |
| tasks/scripts/docker-build-image.sh | tasks/scripts/container-build-image.sh |
| tasks/scripts/docker-build-ci.sh | tasks/scripts/container-build-ci.sh |
| tasks/scripts/docker-publish-multiarch.sh | tasks/scripts/container-publish-multiarch.sh |
| scripts/docker-cleanup.sh | scripts/container-cleanup.sh |
| tasks/docker.toml | tasks/container.toml |

Task Name Renames (in tasks/docker.toml → tasks/container.toml):

| Current | New | Alias needed? |
| --- | --- | --- |
| build:docker:<component> | build:container:<component> | Yes |
| docker:cleanup | container:cleanup | Yes |
| docker:build | container:build | Yes |

Mise supports task aliases, so the old build:docker:* and docker:* names should remain as aliases pointing to the new build:container:* and container:* tasks.

Containerfile Auto-Detection (new capability):

| Location | Current | Change needed |
| --- | --- | --- |
| container-engine.sh | No awareness of Dockerfile names | Add ce_resolve_containerfile() helper that probes Containerfile.X then Dockerfile.X, returns the first match |
| docker-build-image.sh:47 | DOCKERFILE="deploy/docker/Dockerfile.images" | Use ce_resolve_containerfile "deploy/docker" "images" |
| docker-build-ci.sh:24 | -f deploy/docker/Dockerfile.ci hardcoded inline | Extract to variable, use ce_resolve_containerfile "deploy/docker" "ci" |
| python.toml:178 | -f deploy/docker/Dockerfile.python-wheels-macos inline | Extract to variable or shell wrapper, use helper |
| cluster-deploy-fast.sh:155,176,215,218 | Pattern matches only Dockerfile.images | Add Containerfile.images to match patterns and fingerprint inputs |
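As a rough illustration of the helper described in the table, the sketch below probes Containerfile.X before Dockerfile.X and prints the first match. Only the function name and the two-argument call shape come from this issue; the body, the log wording, and the return codes are assumptions.

```shell
# Sketch of the proposed helper. Prints the resolved path on stdout,
# returns 1 if neither naming convention is present.
ce_resolve_containerfile() {
  _dir="$1"      # e.g. deploy/docker
  _suffix="$2"   # e.g. images, ci

  # Assumed info-level note when both files exist (see Risks).
  if [ -f "${_dir}/Containerfile.${_suffix}" ] && [ -f "${_dir}/Dockerfile.${_suffix}" ]; then
    printf 'info: both Containerfile.%s and Dockerfile.%s exist in %s; using Containerfile.%s\n' \
      "${_suffix}" "${_suffix}" "${_dir}" "${_suffix}" >&2
  fi

  # Containerfile first, Dockerfile second.
  for _base in Containerfile Dockerfile; do
    _candidate="${_dir}/${_base}.${_suffix}"
    if [ -f "${_candidate}" ]; then
      printf '%s\n' "${_candidate}"
      return 0
    fi
  done

  printf 'error: no Containerfile.%s or Dockerfile.%s found in %s\n' \
    "${_suffix}" "${_suffix}" "${_dir}" >&2
  return 1
}

# Intended call site (not executed here), replacing the hardcoded assignment:
#   BUILD_FILE="$(ce_resolve_containerfile "deploy/docker" "images")" || exit 1
#   ce_build -f "${BUILD_FILE}" ...
```

Because the result is returned on stdout and logs go to stderr, callers can capture the path with command substitution without picking up the info or error messages.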

Alternative Approaches Considered

  1. Rename deploy/docker/ directory: Rejected. All Dockerfiles live here and the contents are OCI-compatible. Renaming the directory would touch every script and workflow reference for cosmetic benefit only. High churn, low value.

  2. Rename existing Dockerfiles to Containerfiles: Not needed. Both naming conventions are OCI-compatible. Instead, auto-detection supports both naming conventions transparently. Users can add Containerfile.X alongside or instead of Dockerfile.X and the tooling will find it.

  3. Generalize CI workflows: Deferred. CI runs on Docker infrastructure. The build scripts already abstract the local developer path. CI generalization adds complexity with no current benefit. Noted as future work.

  4. Abstract Rust openshell-bootstrap layer: Deferred. The bollard-based code in crates/openshell-bootstrap/src/docker.rs (~1400 LOC) is Docker-API specific. Making the gateway bootstrap work with Podman's API is a separate, larger effort. Noted as future work.

Patterns to Follow

The container-engine.sh abstraction pattern is the gold standard to follow:

  • Detect engine via CONTAINER_ENGINE env var or auto-detect
  • Provide ce_* wrapper functions that handle engine differences
  • Strip unsupported flags transparently (e.g., --provenance, --load for Podman)
  • Use conditional code paths where semantics differ (e.g., _publish_multiarch_docker vs _publish_multiarch_podman)
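To make the first bullet concrete, here is a minimal sketch of engine selection with an explicit override and a PATH probe. It is illustrative only and is not the contents of container-engine.sh: the function name example_detect_engine and the docker-before-podman probe order are assumptions.

```shell
# Illustrative engine detection; the real logic lives in container-engine.sh
# and may differ. An explicit CONTAINER_ENGINE always wins.
example_detect_engine() {
  if [ -n "${CONTAINER_ENGINE:-}" ]; then
    printf '%s\n' "${CONTAINER_ENGINE}"
    return 0
  fi

  # Assumed probe order: docker first, then podman.
  for _engine in docker podman; do
    if command -v "${_engine}" >/dev/null 2>&1; then
      printf '%s\n' "${_engine}"
      return 0
    fi
  done

  printf 'error: neither docker nor podman found on PATH\n' >&2
  return 1
}
```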

For backwards compatibility, follow the standard env var fallback pattern:

NEW_VAR="${NEW_VAR:-${OLD_VAR:-default}}"

For Containerfile auto-detection, follow the Podman convention: prefer Containerfile when present, fall back to Dockerfile. This mirrors how podman build picks a default file when -f is omitted.

Proposed Approach

Three parallel workstreams, all backwards-compatible:

  1. Env var generalization: Add generic equivalents for the nine Docker-branded env vars listed above, each with the fallback pattern. All scripts read the generic names; old names continue to work. DOCKER_BUILDKIT is left as-is.

  2. Task and file naming: Rename scripts and mise task file to use container terminology. Preserve docker:* task aliases. Update internal references.

  3. Containerfile auto-detection: Add a ce_resolve_containerfile() helper to container-engine.sh that probes for Containerfile.X then Dockerfile.X in a given directory, returning the first match. Update build scripts to use this helper instead of hardcoding Dockerfile paths. Update change-detection patterns to recognize both naming conventions.

No existing files are renamed from Dockerfile to Containerfile. No CI workflows are modified. No Rust code is changed.

Scope Assessment

  • Complexity: Low
  • Confidence: High — clear path, existing abstraction pattern to follow
  • Estimated files to change: ~10
  • Issue type: refactor

Risks & Open Questions

  • CI pipelines setting DOCKER_* env vars: The fallback pattern ensures these continue to work, but CI workflows should eventually migrate to the new names. A deprecation notice in the env var fallback (e.g., a warn log) could help surface this.
  • User muscle memory: mise run build:docker:* is established. Aliases must be maintained indefinitely or until a major version bump.
  • Documentation updates: Any docs referencing DOCKER_* env vars or build:docker:* tasks need updating to mention both old and new names.
  • DOCKER_BUILDKIT=1 in mise.toml: This is a Docker-specific env var. Podman ignores it, and the comment already notes this. Leave as-is since it's a Docker-native feature flag, not an OpenShell convention.
  • Containerfile precedence: When both Containerfile.X and Dockerfile.X exist in the same directory, which wins? Recommendation: Containerfile takes precedence (matches Podman's native behavior), with an info-level log noting the choice.
  • Change detection fingerprints: cluster-deploy-fast.sh uses git ls-tree hashes of Dockerfile.images for cache fingerprinting. If a Containerfile.images is added, the fingerprint inputs must include both files to avoid stale cache hits.
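The change-detection and fingerprint risk can be pictured with a small predicate that accepts either naming convention. This is a sketch under assumptions: the bodies of matches_gateway() and matches_supervisor() are not shown in this issue, and is_images_buildfile is a hypothetical name.

```shell
# Hypothetical predicate: succeeds when a changed path is the images build
# file under either naming convention.
is_images_buildfile() {
  case "$1" in
    deploy/docker/Dockerfile.images|deploy/docker/Containerfile.images)
      return 0
      ;;
    *)
      return 1
      ;;
  esac
}

# For the fingerprint, both paths would be passed to the same git ls-tree
# call so that a change to either file changes the hash input, e.g.:
#   git ls-tree HEAD -- deploy/docker/Dockerfile.images deploy/docker/Containerfile.images
```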

Future Work (Out of Scope)

  • CI workflow generalization: Replace bare docker commands in .github/workflows/ with mise task calls or engine-neutral scripts.
  • Rust bootstrap abstraction: Make crates/openshell-bootstrap/src/docker.rs work with Podman's REST API via bollard or a trait-based abstraction.
  • deploy/docker/ directory rename: Cosmetic rename with high churn.

Test Considerations

  • Verify all mise tasks work with both CONTAINER_* and DOCKER_* env var names
  • Verify mise run build:container:* and mise run build:docker:* (alias) both work
  • Verify the file renames don't break any script that sources or calls the old paths
  • Existing CI workflows should pass unchanged (backwards compatibility)
  • Manual test: set CONTAINER_ENGINE=podman and run a build task to confirm end-to-end Podman path still works
  • Place a Containerfile.images alongside Dockerfile.images and verify auto-detection picks it up
  • Verify that when only Dockerfile.X exists, behavior is unchanged (no regressions)
  • Verify cluster-deploy-fast.sh change detection triggers on both Dockerfile.images and Containerfile.images modifications
  • Test levels needed: integration (mise task execution), manual validation
  • No new test infrastructure needed — this is a naming refactor with auto-detection and backwards compat

Created by spike investigation. Use build-from-issue to plan and implement.
