chore: migrate base container images to nvcr.io/nvidia/base/ubuntu:noble-20251013 #244

@drew

Summary

Migrate all production container base images from Debian/Ubuntu upstream to nvcr.io/nvidia/base/ubuntu:noble-20251013 (Ubuntu 24.04 Noble). This standardizes on NVIDIA's hardened base image across all our containers.

Motivation

  • Standardize on a single, NVIDIA-maintained base image for supply chain consistency
  • Align with NVIDIA container security and compliance requirements
  • Ubuntu Noble (24.04) is LTS and provides a modern, well-supported foundation

Scope

The following Dockerfiles need their runtime/final stages migrated:

| Dockerfile | Current Base (runtime) | Change Required |
| --- | --- | --- |
| deploy/docker/Dockerfile.ci | ubuntu:24.04 | Replace with nvcr.io/nvidia/base/ubuntu:noble-20251013 |
| deploy/docker/Dockerfile.server | debian:bookworm-20260223-slim | Replace runtime stage with nvcr.io/nvidia/base/ubuntu:noble-20251013 |
| deploy/docker/sandbox/Dockerfile.base | python:3.12.13-slim-bookworm | Replace base stage with nvcr.io/nvidia/base/ubuntu:noble-20251013 + manually install Python |
| deploy/docker/Dockerfile.cluster | rancher/k3s:v1.35.2-k3s1 | Multistage build: extract k3s artifacts from rancher image, run on nvidia base |

Out of Scope

  • Builder stages (rust:1.88-slim, python:*-slim for wheel building, crazymax/osxcross) — these are build-time only and don't ship
  • scratch stages (wheel output stages) — empty base, nothing to migrate
  • Example Dockerfiles (examples/) — not production images
  • Dockerfile.nvidia — inherits from sandbox base, will automatically pick up the change
  • macOS cross-compilation Dockerfiles — build tooling only

Implementation Notes

Dockerfile.cluster (multistage approach)

The current image is FROM rancher/k3s (Alpine-based). To move to the nvidia base:

  1. Use rancher/k3s as a build stage to extract k3s binaries and supporting files
  2. Use nvcr.io/nvidia/base/ubuntu:noble-20251013 as the final runtime stage
  3. Copy k3s artifacts (binaries, scripts, etc.) from the rancher stage
  4. Ensure cgroup, iptables, and other k3s runtime dependencies are installed on Ubuntu
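
A rough sketch of the multistage layout described in the steps above. The /bin/k3s path and the apt package list are assumptions to verify against the actual rancher/k3s image, and any extra scripts, ENTRYPOINT, or CMD in the current Dockerfile.cluster would need to carry over:

```dockerfile
# Stage 1: only used as a source of k3s artifacts, never shipped.
FROM rancher/k3s:v1.35.2-k3s1 AS k3s

# Stage 2: the runtime image that actually ships.
FROM nvcr.io/nvidia/base/ubuntu:noble-20251013

# Assumed set of userland tools k3s relies on; adjust after testing.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates iptables iproute2 conntrack kmod \
    && rm -rf /var/lib/apt/lists/*

# Assumption: the k3s binary sits at /bin/k3s in the upstream image.
COPY --from=k3s /bin/k3s /usr/local/bin/k3s

ENTRYPOINT ["/usr/local/bin/k3s"]
```

The upstream image bundles its own userland helpers next to the binary, so the package list is the part most likely to need iteration once the cluster image is booted against the healthcheck.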

Dockerfile.base (sandbox)

The current image uses python:3.12.13-slim-bookworm for the runtime base. To move to nvidia base:

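  1. Install Python 3.12. Noble ships 3.12 as its default python3, so the distro packages should be enough; the deadsnakes PPA is a third-party PPA (not part of Ubuntu) and generally skips versions the release already provides, so a source build is the fallback if the exact 3.12.13 patch release is required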
  2. Ensure pip/venv support is available
  3. Verify all apt packages (iproute2, dnsutils, etc.) are available in Ubuntu repos
  4. Update NodeSource setup for Ubuntu Noble
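
A minimal sketch of the base stage, assuming the distro python3 (3.12 on Noble) is acceptable instead of a PPA or source build. The /opt/venv location is a hypothetical choice, and the NodeSource step is left out because the required Node version is not stated here:

```dockerfile
FROM nvcr.io/nvidia/base/ubuntu:noble-20251013 AS base

# Keep apt from prompting during the build.
ENV DEBIAN_FRONTEND=noninteractive

# python3 on Noble is 3.12; python3-venv and python3-pip cover step 2.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates curl \
        python3 python3-venv python3-pip \
        iproute2 dnsutils \
    && rm -rf /var/lib/apt/lists/*

# Noble marks the system interpreter as externally managed (PEP 668),
# so Python dependencies go into a virtual environment instead.
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
```

Anything that previously ran `pip install` against the system interpreter in the python:slim image will need to target the venv, since pip refuses system-wide installs on Noble by default.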

Dockerfile.server

Straightforward swap of the runtime stage from debian:bookworm-20260223-slim to the nvidia base. The runtime stage only needs ca-certificates installed.
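
For illustration, the runtime stage could look roughly like this. The `builder` stage name and the binary paths are hypothetical placeholders for whatever the existing Dockerfile.server already uses:

```dockerfile
# Runtime stage fragment; the builder stage above it is unchanged.
FROM nvcr.io/nvidia/base/ubuntu:noble-20251013 AS runtime

RUN apt-get update \
    && apt-get install -y --no-install-recommends ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Hypothetical stage name and paths; keep the ones from the current file.
COPY --from=builder /app/target/release/server /usr/local/bin/server

EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/server"]
```

One thing to check: a dynamically linked binary needs a glibc in the runtime image at least as new as the one it was built against, so the builder's base distribution matters for this swap.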

Dockerfile.ci

Already on ubuntu:24.04 — direct swap to nvidia base. All apt packages should be compatible.
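
One possible shape for the swap, sketched with a build argument so the pinned tag lives in a single place. `BASE_IMAGE` is a hypothetical name, not something the repo defines today:

```dockerfile
# An ARG declared before FROM can be used in the FROM line itself.
ARG BASE_IMAGE=nvcr.io/nvidia/base/ubuntu:noble-20251013
FROM ${BASE_IMAGE}

# Existing apt-get install and tool setup steps stay as they are.
```

With that in place, `docker build --build-arg BASE_IMAGE=ubuntu:24.04 -f deploy/docker/Dockerfile.ci .` would rebuild against the old base for a side-by-side comparison.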

Acceptance Criteria

  • All four Dockerfiles use nvcr.io/nvidia/base/ubuntu:noble-20251013 as their runtime base
  • mise run docker:build succeeds for all affected images
  • Cluster image boots and passes healthcheck
  • Sandbox image starts and supervisor process runs correctly
  • Server image starts and serves HTTP on port 8080
  • CI image has all required tools (mise, docker, aws cli, rust, etc.)
