Overriding nvidia-container-runtime/config.toml with XDG_CONFIG_HOME #56

hholst80 opened this issue Mar 18, 2023 · 2 comments
Per the documentation:

The NVIDIA Container Runtime uses file-based configuration, with the config stored in /etc/nvidia-container-runtime/config.toml. The /etc path can be overridden using the XDG_CONFIG_HOME environment variable with the ${XDG_CONFIG_HOME}/nvidia-container-runtime/config.toml file used instead if this environment variable is set.

Where is this supposed to be injected? I assumed that nvidia-container-runtime itself reads this file when spawned by dockerd, but it does not seem to honor its promise. I copied /etc/nvidia-container-runtime to /root/.config and changed no-cgroups = true. This does not work for root, and things will fail if I keep /etc/nvidia-container-runtime/config.toml and make the changes there instead. nvidia-container-runtime will gladly ignore my file under /root/.config and use the default (no-cgroups = false).

[root@goblin docker]# cat daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime-wrapper"
        }
    }
}
[root@goblin docker]# cat /usr/local/bin/nvidia-container-runtime-wrapper
#!/bin/sh
export XDG_CONFIG_HOME=/root/.config
echo "$@" >> /tmp/wrapper.log
exec nvidia-container-runtime "$@"
[root@goblin docker]#

The logged invocations below show that XDG_CONFIG_HOME=/root/.config is present in the runtime's environment, yet the override is still ignored:
TTRPC_ADDRESS=/run/containerd/containerd.sock.ttrpc XDG_CONFIG_HOME=/root/.config PWD=/run/containerd/io.containerd.runtime.v2.task/moby/a5cacc4d2c4c1cf56b1051379016a0ddebea9fafc2ffdf8754c628f057c98697 SYSTEMD_EXEC_PID=1652 LANG=en_US.UTF-8 INVOCATION_ID=59ee96a5f2754a3aab30b521a3505398 GOMAXPROCS=4 SHLVL=0 LD_LIBRARY_PATH=/opt/containerd/lib: JOURNAL_STREAM=8:53427 XDG_DATA_DIRS=/var/lib/flatpak/exports/share:/usr/local/share/:/usr/share/ PATH=/opt/containerd/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin _=/usr/bin/env
--root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/a5cacc4d2c4c1cf56b1051379016a0ddebea9fafc2ffdf8754c628f057c98697/log.json --log-format json --systemd-cgroup delete a5cacc4d2c4c1cf56b1051379016a0ddebea9fafc2ffdf8754c628f057c98697
TTRPC_ADDRESS=/run/containerd/containerd.sock.ttrpc XDG_CONFIG_HOME=/root/.config PWD=/run/containerd/io.containerd.runtime.v2.task/moby/a5cacc4d2c4c1cf56b1051379016a0ddebea9fafc2ffdf8754c628f057c98697 SYSTEMD_EXEC_PID=1652 LANG=en_US.UTF-8 INVOCATION_ID=59ee96a5f2754a3aab30b521a3505398 GOMAXPROCS=2 SHLVL=0 LD_LIBRARY_PATH=/opt/containerd/lib: JOURNAL_STREAM=8:53427 XDG_DATA_DIRS=/var/lib/flatpak/exports/share:/usr/local/share/:/usr/share/ PATH=/opt/containerd/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin _=/usr/bin/env
--root /var/run/docker/runtime-runc/moby --log /run/containerd/io.containerd.runtime.v2.task/moby/a5cacc4d2c4c1cf56b1051379016a0ddebea9fafc2ffdf8754c628f057c98697/log.json --log-format json delete --force a5cacc4d2c4c1cf56b1051379016a0ddebea9fafc2ffdf8754c628f057c98697
hholst80 (Author) commented Mar 18, 2023

The only way I can get this to work is to specify a custom nvidia-container-cli wrapper in which I do something really hacky.

[omnicoder@goblin bin]$ cat nvidia-container-cli-wrapper
#!/bin/sh
if [ -n "$HOME" ] # HOME is not set for dockerd as system service
then
	set -- nvidia-container-cli "$@" --no-cgroups
else
	set -- nvidia-container-cli "$@"
fi
echo "$@" >> /tmp/nvidia-container-cli-wrapper.log
exec "$@"
[omnicoder@goblin bin]$

elezar (Member) commented Mar 21, 2023

Hi @hholst80. Looking at the code for handling XDG_CONFIG_HOME in the nvidia-container-runtime, the path to the config file in your case is expected to be:

/root/.config/nvidia-container-runtime/config.toml

That is to say, the environment variable replaces the /etc part of the path; it is not the directory in which config.toml itself lives.
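Concretely, a minimal sketch of the expected layout (assuming the stock config is in /etc):

# The runtime resolves ${XDG_CONFIG_HOME}/nvidia-container-runtime/config.toml,
# so the override file must sit one directory deeper than /root/.config:
mkdir -p /root/.config/nvidia-container-runtime
cp /etc/nvidia-container-runtime/config.toml /root/.config/nvidia-container-runtime/config.toml
# edit no-cgroups there, then ensure the runtime is launched with:
export XDG_CONFIG_HOME=/root/.config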

We use this environment variable in our GPU Operator to apply a specific config file, so it should work as expected.

Note that using CDI to request NVIDIA devices, instead of the traditional injection mechanisms, would support your use case (both rootful and rootless) out of the box. With CDI, the components of the NVIDIA container stack are no longer responsible for setting up cgroups for device access in the container; this is left to the low-level runtime such as runc.

We are in the process of adding support to the Docker daemon (moby/moby#45134) as well as the Docker CLI (docker/cli#4084) and these should be available as an experimental feature soon.
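Once that support lands, requesting a GPU via CDI from Docker should look roughly like the following (a sketch; the experimental feature flag and the device name are assumptions until the linked PRs are merged):

# /etc/docker/daemon.json -- enable the experimental CDI integration
{
    "features": {
        "cdi": true
    }
}

# then request a device by its CDI name
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi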

If you want to try this out for your use case beforehand, the nvidia-container-runtime can be set to cdi mode, and a CDI specification for the available devices can be generated using the nvidia-ctk cdi generate command. This requires a recent version of the NVIDIA Container Toolkit; if you are interested, I can provide more detailed instructions.
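A minimal sketch of that setup, with the mode key and output path taken as assumptions from the toolkit's defaults:

# generate a CDI specification describing the GPUs on this host
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

and in /etc/nvidia-container-runtime/config.toml:

[nvidia-container-runtime]
mode = "cdi"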

An alternative would be to use podman, which offers native support for CDI as of v4.1.0.
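With podman no runtime reconfiguration is needed; a generated spec can be consumed directly (device name assumed from the spec generated above):

podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi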
