1 change: 1 addition & 0 deletions README.md
@@ -10,6 +10,7 @@ of the bare-metal lifecycle to fast-track building next generation AI Cloud offe
## Getting Started

- Go to the [NCX Infra Controller overview](https://nvidia.github.io/ncx-infra-controller-core/) to get an overview of NICo architecture and capabilities.
- Follow the [End-to-End Installation Guide](https://nvidia.github.io/ncx-infra-controller-core/manuals/installation-guide.html) for a complete walkthrough from cluster setup to first provisioned host.
- Or jump to the [Site Setup guide](https://nvidia.github.io/ncx-infra-controller-core/manuals/site-setup.html) to start setting up your site for NICo.
- Or jump to the [Building Containers guide](https://nvidia.github.io/ncx-infra-controller-core/manuals/building_nico_containers.html) for an overview of building the containers.
- Check out [Local Development with DevSpace](dev/deployment/devspace/README.md) to run NICo locally with mock systems.
4 changes: 3 additions & 1 deletion book/src/SUMMARY.md
@@ -1,7 +1,7 @@
# NCX Infra Controller

- [Introduction](README.md)
- [Hardware Compatbility List](hcl.md)
- [Hardware Compatibility List](hcl.md)
- [Release Notes](release-notes.md)
- [FAQs](faq.md)

@@ -25,10 +25,12 @@

# Manuals

- [End-to-End Installation Guide](manuals/installation-guide.md)
- [Site Setup](manuals/site-setup.md)
- [Site Reference Architecture](manuals/site-reference-arch.md)
- [Networking Requirements](manuals/networking_requirements.md)
- [Building NICo Containers](manuals/building_nico_containers.md)
- [Tagging and Pushing Containers](manuals/pushing_containers.md)
- [Ingesting Hosts](manuals/ingesting_machines.md)
- [Updating Expected Hosts Manifest](manuals/expected_machine_update.md)
- [Host Validation](manuals/machine_validation.md)
89 changes: 78 additions & 11 deletions book/src/manuals/building_nico_containers.md
@@ -1,12 +1,35 @@
# Building NICo Containers

This section provides instructions for building the containers for NCX Infra Controller (NICo).
For the complete deployment workflow, refer to the [End-to-End Installation Guide](installation-guide.md).

## Container Image Summary

The following table lists all container images produced by this build process:

| Image Name | Dockerfile | Purpose | Architecture |
|------------|-----------|---------|-------------|
| `nico-buildcontainer-x86_64` | `dev/docker/Dockerfile.build-container-x86_64` | Intermediate build container (Rust toolchain, libraries) | x86_64 |
| `nico-runtime-container-x86_64` | `dev/docker/Dockerfile.runtime-container-x86_64` | Intermediate runtime base image | x86_64 |
| `nico` (nvmetal-carbide) | `dev/docker/Dockerfile.release-container-sa-x86_64` | Carbide API, DHCP, DNS, PXE, hardware health, SSH console | x86_64 |
| `boot-artifacts-x86_64` | `dev/docker/Dockerfile.release-artifacts-x86_64` | PXE boot artifacts for x86 hosts | x86_64 |
| `boot-artifacts-aarch64` | `dev/docker/Dockerfile.release-artifacts-aarch64` | PXE boot artifacts for DPU BFB provisioning | x86_64 (bundles aarch64 binaries) |
| `machine-validation-runner` | `dev/docker/Dockerfile.machine-validation-runner` | Machine validation / burn-in test runner | x86_64 |
| `machine-validation-config` | `dev/docker/Dockerfile.machine-validation-config` | Machine validation config (bundles runner tar) | x86_64 |
| `build-artifacts-container-cross-aarch64` | `dev/docker/Dockerfile.build-artifacts-container-cross-aarch64` | Intermediate cross-compile container for aarch64 | x86_64 |

The intermediate images (`nico-buildcontainer-x86_64`, `nico-runtime-container-x86_64`,
`build-artifacts-container-cross-aarch64`) are used during the build process and do not
need to be pushed to your registry. The remaining images must be pushed to a registry
accessible by your Kubernetes cluster.
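
As a rough illustration of the split described above, the following sketch separates the intermediate images from the ones that need to be pushed and prints (rather than runs) the matching `docker tag` and `docker push` commands. The image names come from the table; `REGISTRY`, `TAG`, and the function names are placeholders chosen for this example, not values defined by the project.

```sh
#!/bin/sh
# Dry-run sketch: REGISTRY and TAG are hypothetical placeholders.
REGISTRY="${REGISTRY:-registry.example.com/nico}"
TAG="${TAG:-dev}"

# Image names taken from the Container Image Summary table.
ALL_IMAGES="nico-buildcontainer-x86_64 nico-runtime-container-x86_64 nico \
boot-artifacts-x86_64 boot-artifacts-aarch64 machine-validation-runner \
machine-validation-config build-artifacts-container-cross-aarch64"

# Returns 0 if the image is one of the intermediate, build-only images.
is_intermediate() {
    case "$1" in
        nico-buildcontainer-x86_64|nico-runtime-container-x86_64|build-artifacts-container-cross-aarch64)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Prints the tag and push commands for every non-intermediate image.
print_push_plan() {
    for image in $ALL_IMAGES; do
        if is_intermediate "$image"; then
            continue
        fi
        echo "docker tag $image:latest $REGISTRY/$image:$TAG"
        echo "docker push $REGISTRY/$image:$TAG"
    done
}

print_push_plan
```

Printing the commands instead of executing them keeps the sketch safe to run on a machine where the images have not been built yet.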

## Installing Prerequisite Software

Before you begin, ensure you have the following prerequisites:

* An Ubuntu 24.04 host or VM with at least 150 GB of disk space (macOS is not supported)
* For REST containers: Go (refer to the `go.mod` file in the [REST repo](https://github.com/NVIDIA/ncx-infra-controller-rest) for the current required version), Docker 20.10+ with BuildKit enabled
* An [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/) account (free). Required for pulling base images such as the DOCA HBN container used in the aarch64/DPU BFB build. Sign up at [ngc.nvidia.com](https://ngc.nvidia.com) and generate an API key under **API Keys** > **Generate Personal Key**.
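
The prerequisites above can be checked with a small preflight script before starting a build. This is only a sketch: the function names are made up for this example, the disk check looks at the current directory, and the tool check only confirms that `docker` and `go` are on `PATH` (it does not verify versions or BuildKit, and `go` matters only for the REST containers).

```sh
#!/bin/sh
# Preflight sketch for the prerequisites listed above.
REQUIRED_GB=150

# Free space, in whole GB, on the filesystem holding the given directory.
free_gb() {
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Returns 0 if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints a warning for each unmet prerequisite; returns 1 if any were found.
preflight() {
    status=0
    avail=$(free_gb "${1:-.}")
    if [ "$avail" -lt "$REQUIRED_GB" ]; then
        echo "warning: only ${avail} GB free, ${REQUIRED_GB} GB or more expected"
        status=1
    fi
    for tool in docker go; do
        if ! have_cmd "$tool"; then
            echo "warning: $tool not found on PATH"
            status=1
        fi
    done
    return "$status"
}

preflight . || echo "preflight reported problems"
```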

Use the following steps to install the prerequisite software on the Ubuntu host or VM. These instructions
assume an `apt`-based distribution such as Ubuntu 24.04.
@@ -55,27 +78,34 @@ cargo make --cwd pxe --env SA_ENABLEMENT=1 build-boot-artifacts-x86-host-sa
docker build --build-arg "CONTAINER_RUNTIME_X86_64=alpine:latest" -t boot-artifacts-x86_64 -f dev/docker/Dockerfile.release-artifacts-x86_64 .
```

## Building the Machine Validation images
## Building the Machine Validation Images

```sh
docker build --build-arg CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64 -t machine-validation-runner -f dev/docker/Dockerfile.machine-validation-runner .

docker save --output crates/machine-validation/images/machine-validation-runner.tar machine-validation-runner:latest
docker build --build-arg CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64 \
  -t machine-validation-runner -f dev/docker/Dockerfile.machine-validation-runner .

// This copies `machine-validation-runner.tar` into the `/images` directory on the `machine-validation-config` container. When using a kubernetes deployment model
// this is the only `machine-validation` container you need to configure on the `carbide-pxe` pod.

docker build --build-arg CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64 -t machine-validation-config -f dev/docker/Dockerfile.machine-validation-config .
docker save --output crates/machine-validation/images/machine-validation-runner.tar \
  machine-validation-runner:latest

docker build --build-arg CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64 \
  -t machine-validation-config -f dev/docker/Dockerfile.machine-validation-config .
```

## Building nico-core container
The `machine-validation-config` container bundles `machine-validation-runner.tar` into its
`/images` directory. In a Kubernetes deployment, this is the only machine-validation
container you need to configure on the `carbide-pxe` pod.
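
Because the config image depends on the saved runner archive, it can help to confirm that the archive exists before building `machine-validation-config`. The following is a minimal sketch of such a guard; the path matches the `docker save` command above, while the function name `runner_tar_ready` is invented for this example.

```sh
#!/bin/sh
# Sketch: guard for the machine-validation-config build.
RUNNER_TAR="crates/machine-validation/images/machine-validation-runner.tar"

# Returns 0 only if the archive exists and is non-empty.
runner_tar_ready() {
    [ -s "$1" ]
}

if runner_tar_ready "$RUNNER_TAR"; then
    echo "found $RUNNER_TAR; ready to build machine-validation-config"
else
    echo "missing or empty $RUNNER_TAR; run the docker save step first" >&2
fi
```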

## Building nico-core Container

```sh
docker build --build-arg "CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64" --build-arg "CONTAINER_BUILD_X86_64=nico-buildcontainer-x86_64" -f dev/docker/Dockerfile.release-container-sa-x86_64 -t nico .
docker build \
  --build-arg "CONTAINER_RUNTIME_X86_64=nico-runtime-container-x86_64" \
  --build-arg "CONTAINER_BUILD_X86_64=nico-buildcontainer-x86_64" \
  -f dev/docker/Dockerfile.release-container-sa-x86_64 \
  -t nico .
```

## Building the AARCH64 Containers and artifacts
## Building the AARCH64 Containers and Artifacts

### Building the Cross-Compile Container

@@ -94,10 +124,47 @@ BUILD_CONTAINER_X86_URL="nico-buildcontainer-x86_64" cargo make build-cli

### Building the DPU BFB

The BFB build automatically pulls the HBN container from `nvcr.io`. You must
authenticate with NGC before building:

```sh
docker login nvcr.io -u '$oauthtoken' -p <NGC_API_KEY>
```
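
One possible variation, if you prefer not to place the key on the command line (where it can end up in shell history), is to pipe it to `docker login --password-stdin`. The sketch below assumes the key has already been exported as `NGC_API_KEY`; that variable name and the `ngc_login` function are choices made for this example.

```sh
#!/bin/sh
# Sketch: log in to nvcr.io without passing the key as an argument.
# Assumes the key was exported beforehand as NGC_API_KEY (example name).
ngc_login() {
    if [ -z "${NGC_API_KEY:-}" ]; then
        echo "NGC_API_KEY is not set" >&2
        return 1
    fi
    # The username is the literal string $oauthtoken, hence the single quotes.
    printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Call `ngc_login` once in the shell session before running the BFB build.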

```sh
cargo make --cwd pxe --env SA_ENABLEMENT=1 build-boot-artifacts-bfb-sa

docker build --build-arg "CONTAINER_RUNTIME_AARCH64=alpine:latest" -t boot-artifacts-aarch64 -f dev/docker/Dockerfile.release-artifacts-aarch64 .
```

**NOTE**: The `CONTAINER_RUNTIME_AARCH64=alpine:latest` build argument is required. The aarch64 binaries are bundled into an x86_64 container.
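
To confirm the point in the note, you can read the architecture recorded in the built image's metadata. This is a sketch only: the helper names are invented for this example, and it assumes the image was built on an x86_64 host, in which case Docker is expected to report `amd64`.

```sh
#!/bin/sh
# Sketch: report the architecture recorded in a local image's metadata.
image_arch() {
    docker image inspect --format '{{.Architecture}}' "$1"
}

# Returns 0 if the image metadata reports amd64 (Docker's name for x86_64).
is_x86_image() {
    [ "$(image_arch "$1")" = "amd64" ]
}
```

For example, `is_x86_image boot-artifacts-aarch64 && echo "x86_64 image"` checks the image built in the previous step.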

## Building REST Containers

The REST components (cloud-api, cloud-workflow, site-manager, site-agent,
db migrations, cert-manager) are built from the
[ncx-infra-controller-rest](https://github.com/NVIDIA/ncx-infra-controller-rest) repository.

```sh
cd ncx-infra-controller-rest
make docker-build IMAGE_REGISTRY=<your-registry.example.com/carbide> IMAGE_TAG=<your-version-tag>
```

### REST Image Summary

| Image | Purpose |
|-------|---------|
| `carbide-rest-api` | REST API server (port 8388) |
| `carbide-rest-workflow` | Temporal workflow worker |
| `carbide-rest-site-manager` | Site management and registry service |
| `carbide-rest-site-agent` | On-site Temporal agent |
| `carbide-rest-db` | Database migration job (runs once per upgrade) |
| `carbide-rest-cert-manager` | PKI certificate manager |
| `carbide-rla` | Rack Level Abstraction service |
| `carbide-psm` | Power Shelf Manager service |
| `carbide-nsm` | NVSwitch Manager service |

## Next Steps

After building all images, tag them and push them to your private registry.
Refer to [Tagging and Pushing Containers](pushing_containers.md) for details.