Conversation

@elezar
Member

@elezar elezar commented Apr 2, 2025

This change adds support for specifying the container runtime executable path. This can be used if, for example, there are two containerd executables and a specific one must be used.

Deploying to a k0s system with:

helm install gpu-operator -n gpu-operator --create-namespace \
  nvidia/gpu-operator $HELM_OPTIONS \
    --version=v25.3.0 \
    --set toolkit.repository=ghcr.io/nvidia \
    --set toolkit.version=ae385428-ubuntu20.04 \
    --set toolkit.env[0].name=RUNTIME_CONFIG \
    --set toolkit.env[0].value=/etc/k0s/containerd.d/nvidia.toml \
    --set toolkit.env[1].name=RUNTIME_SOCKET \
    --set toolkit.env[1].value=/run/k0s/containerd.sock \
    --set toolkit.env[2].name=RUNTIME_EXECUTABLE_PATH \
    --set toolkit.env[2].value=/var/lib/k0s/bin/containerd \
    --set toolkit.env[3].name=NVIDIA_RUNTIME_NAME \
    --set toolkit.env[3].value=nvidia

This allows the config to be extracted correctly and unblocks the deployment.
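
To illustrate why the executable path matters: on a host with more than one containerd binary, the one found on PATH may not be the one k0s runs, so any config read through it can be the wrong one. The Go sketch below shows the selection logic under that assumption. The helper name `configDumpCommand` and the fallback to the bare name `containerd` are hypothetical and are not taken from the toolkit's code; `containerd config dump` is the standard containerd subcommand for printing the effective config.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// configDumpCommand builds the command that prints containerd's effective
// config. If executablePath is empty, the bare name "containerd" is used and
// resolved via PATH. (Hypothetical helper, for illustration only.)
func configDumpCommand(executablePath string) *exec.Cmd {
	if executablePath == "" {
		executablePath = "containerd"
	}
	return exec.Command(executablePath, "config", "dump")
}

func main() {
	// RUNTIME_EXECUTABLE_PATH is the env var set in the helm command above.
	cmd := configDumpCommand(os.Getenv("RUNTIME_EXECUTABLE_PATH"))
	// Print the argument vector instead of running it, so the sketch works
	// on machines without containerd installed.
	fmt.Println(cmd.Args)
}
```

With `RUNTIME_EXECUTABLE_PATH=/var/lib/k0s/bin/containerd`, the command targets the k0s-managed binary rather than whatever `containerd` resolves to on PATH.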

Fixes #803

@elezar elezar self-assigned this Apr 2, 2025
@elezar elezar requested a review from tariq1890 April 2, 2025 15:58
@elezar elezar force-pushed the allow-runtime-path branch from 3d0b984 to 1dbba17 Compare April 2, 2025 16:03
		EnvVars: []string{"RUNTIME_CONFIG", "CONTAINERD_CONFIG", "DOCKER_CONFIG"},
	},
	&cli.StringFlag{
		Name: "container-runtime-executable-path",
Contributor

Why not just executable-path or runtime-executable-path? That would be consistent with how socket is named.

Member Author

Yes, I think this may be better. Let me take another pass.

@elezar elezar force-pushed the allow-runtime-path branch from 1dbba17 to ae38542 Compare April 3, 2025 07:18
@elezar elezar linked an issue Apr 3, 2025 that may be closed by this pull request
@elezar elezar marked this pull request as ready for review April 3, 2025 12:46
@elezar elezar requested a review from cdesiniotis April 3, 2025 12:46
@elezar
Member Author

elezar commented Apr 3, 2025

@tariq1890 do we need to backport this?

@tariq1890
Contributor

Backporting makes sense, since we are most likely releasing gpu-operator v25.3.1.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the allow-runtime-path branch from ae38542 to 3e77955 Compare April 4, 2025 08:57
This change adds support for specifying the container runtime
executable path. This can be used if, for example, there are
two containerd or crio executables and a specific one must be used.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the allow-runtime-path branch from 3e77955 to c57bdf3 Compare April 7, 2025 15:28
@elezar elezar added the must-backport The changes in PR need to be backported to at least one stable release branch. label Apr 7, 2025
@elezar elezar merged commit b4edc3e into NVIDIA:main Apr 8, 2025
16 checks passed
Development

Successfully merging this pull request may close these issues.

container-toolkit on k0s leads to unsupported config version: 3

2 participants