Inject Nvidia GPUs using volume-mounts to isolate them to assigned pods #3718
Conversation
@arnaldo2792 Would really appreciate your review or someone else's from the team who has worked on this :)
On another thought, since Bottlerocket is not EKS exclusive, would it make sense to have these as settings so users can configure them according to their needs?
Thanks for the great catch @chiragjn! (yet another, nice!). As I mentioned elsewhere, we are still thinking about what changes we want to ask for in this PR since, as it is, it could break the ECS variant. Once again, thanks for the contribution, and I'll reply back soon with the suggestions!
FWIW, I was able to build a custom AMI with these changes for the 1.28-nvidia variant fairly easily. Maybe, until the contributing team decides on how to expose these in the Settings API; AFAIK the device plugin is K8s only, so the config can be safely changed. Not relevant to this discussion directly:
re: suggested implementation `{{#if K8s}}`: Unfortunately, we don't provide a handlebars helper to evaluate whether the current host is k8s or ECS. The solution that aligns best with out-of-tree builds (see #2669) is to have two different sub-packages for `nvidia-container-toolkit`.
Thanks for the heads up! I read the threads and I'll contact the author of this comment since we should align with what they are planning to do to fix this problem.
Just curious if providing a handlebars helper for the variant name being built might help, and what kind of effort that might take. I'll be happy to give it a shot (with some guidance) if that is a good approach :)
@@ -1,3 +1,6 @@
accept-nvidia-visible-devices-as-volume-mounts = true
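For context, a minimal sketch of how these toggles sit in /etc/nvidia-container-runtime/config.toml, using the option names documented for the NVIDIA container toolkit. Only the first line appears in this hunk; the second is implied by the PR description ("NVIDIA_VISIBLE_DEVICES as env var will still be considered for privileged pods"), and the surrounding Bottlerocket-specific keys are omitted here:

```toml
# Sketch only: the two relevant top-level toggles; other sections of the file omitted.

# Discover the GPUs assigned by the device plugin from volume mounts under
# /var/run/nvidia-container-devices instead of trusting the container's env var.
accept-nvidia-visible-devices-as-volume-mounts = true

# Ignore NVIDIA_VISIBLE_DEVICES coming from unprivileged containers, so an image
# that bakes in NVIDIA_VISIBLE_DEVICES=all cannot grab every GPU on the node.
accept-nvidia-visible-devices-envvar-when-unprivileged = false
```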
Hey @chiragjn, we had an internal discussion about the changes in this PR. I'll provide diffs of examples of how to accomplish what we need and explanations of why they are needed to try to ease the learning curve.
The first thing is to have two configuration files:
- `nvidia-container-toolkit-config-ecs.toml`: this should be the file as it is today
- `nvidia-container-toolkit-config-k8s.toml`: this is the new file with the configurations as you have them here.
Once you have the two files, you need to update the `nvidia-container-toolkit.spec` file to add both sources as follows:
Source0: https://%{goimport}/archive/v%{gover}/nvidia-container-toolkit-%{gover}.tar.gz
-Source1: nvidia-container-toolkit-config.toml
+Source1: nvidia-container-toolkit-config-k8s.toml
Source2: nvidia-container-toolkit-tmpfiles.conf
Source3: nvidia-oci-hooks-json
Source4: nvidia-gpu-devices.rules
+Source5: nvidia-container-toolkit-config-ecs.toml
And install them like this under the `%install` section in the spec file:
-install -m 0644 %{S:1} %{buildroot}%{_cross_factorydir}/etc/nvidia-container-runtime/config.toml
+install -m 0644 %{S:1} %{buildroot}%{_cross_factorydir}/etc/nvidia-container-runtime/
+install -m 0644 %{S:5} %{buildroot}%{_cross_factorydir}/etc/nvidia-container-runtime/
Having two files lets us avoid conditionally generating one or the other based on the variant information. But we still need a way to include either one; we will accomplish this by creating two sub-packages of `nvidia-container-toolkit`. You can do this by adding something similar to the following lines right after the last `%description` section in the spec:
%description
%{summary}.
+%package ecs
+Summary: Files specific for the ECS variants
+Requires: %{name}
+
+%description ecs
+%{summary}.
+
+%package k8s
+Summary: Files specific for the Kubernetes variants
+Requires: %{name}
+
+%description k8s
+%{summary}.
+
%prep
This will create the two subpackages: `nvidia-container-toolkit-ecs` and `nvidia-container-toolkit-k8s`. Notice the `Requires: %{name}` snippet in the diff; this will guarantee that `nvidia-container-toolkit` is installed alongside `nvidia-container-toolkit-<subpackage>`. After this, you need to include the correct file per package in the `%files` section:
%{_cross_templatedir}/nvidia-oci-hooks-json
-%{_cross_factorydir}/etc/nvidia-container-runtime/config.toml
%{_cross_tmpfilesdir}/nvidia-container-toolkit.conf
%{_cross_udevrulesdir}/90-nvidia-gpu-devices.rules
+
+%files ecs
+%{_cross_factorydir}/etc/nvidia-container-runtime/nvidia-container-toolkit-config-ecs.toml
+
+%files k8s
+%{_cross_factorydir}/etc/nvidia-container-runtime/nvidia-container-toolkit-config-k8s.toml
The last change in the spec file is to create the actual configuration file that will be used by `nvidia-container-runtime-hook`. In Bottlerocket, we use the "factory" feature of `tmpfilesd` to create certain files at `/etc`. For `/etc/nvidia-container-runtime/config.toml`, the source of the factory is the file at `%{_cross_factorydir}/etc/nvidia-container-runtime/config.toml`. Thus, to provide the file for the factory you will create a symlink at this location that points to the correct configuration file per variant. You can do this in a `%post` install script for each sub-package as follows:
+%post ecs -p <lua>
+posix.link("%{_cross_factorydir}/etc/nvidia-container-runtime/nvidia-container-toolkit-config-ecs.toml", "%{_cross_factorydir}/etc/nvidia-container-runtime/config.toml")
+
+%post k8s -p <lua>
+posix.link("%{_cross_factorydir}/etc/nvidia-container-runtime/nvidia-container-toolkit-config-k8s.toml", "%{_cross_factorydir}/etc/nvidia-container-runtime/config.toml")
+
The `%post` scripts should be placed in between the `%install` and `%files` sections.
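For readers unfamiliar with the factory mechanism: a systemd-tmpfiles `C` entry whose source argument is omitted copies the file from the matching path under /usr/share/factory. A rough sketch of what such an entry looks like; this is illustrative, and the actual contents of nvidia-container-toolkit-tmpfiles.conf in the tree may differ:

```
# tmpfiles.d sketch (assumed, not the literal Bottlerocket file).
# 'C' copies the file into place at boot if it does not exist yet; with no
# source argument, systemd-tmpfiles copies it from /usr/share/factory/<path>.
C /etc/nvidia-container-runtime/config.toml - - - -
```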
The last thing to glue the changes together is to update each `*-nvidia` variant to include the variant-specific `nvidia-container-toolkit` sub-package. You can do this by updating the `variants/*-nvidia/Cargo.toml` file as follows:
included-packages = [
"ecs-agent",
# NVIDIA support
"ecs-gpu-init",
- "nvidia-container-toolkit",
+ "nvidia-container-toolkit-ecs",
"kmod-6.1-nvidia-tesla-535",
]
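The Kubernetes variants get the analogous change. A sketch only; the exact package list (including the kmod package name) differs per variant:

```
 included-packages = [
     # NVIDIA support
-    "nvidia-container-toolkit",
+    "nvidia-container-toolkit-k8s",
 ]
```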
And that's it! If all this is too overwhelming, or you don't have the bandwidth to work on it, please let me know and I can take over your changes and drive the PR to completion 👍.
Thank you for the extremely precise diff, you practically did all the work 😅
Anyway, I have made the changes, was able to build an AMI and test it out, and can confirm they work as expected.
I also confirmed the changes using an admin container:
bash-5.1# pwd
/x86_64-bottlerocket-linux-gnu/sys-root/usr/share/factory/etc/nvidia-container-runtime
bash-5.1# ls -li
total 8
2325 -rw-r--r--. 2 root root 237 Jan 30 18:53 config.toml
2325 -rw-r--r--. 2 root root 237 Jan 30 18:53 nvidia-container-toolkit-config-k8s.toml
Just two questions:
- Do the sub-packages need to appear in Cargo.lock too? I tried `cargo generate-lockfile` but nothing changed.
- Should `config.toml` be a hard link or a soft link? If I read the docs correctly, `posix.link` by default creates a hard link.
I forgot to answer your questions, sorry!
- No, they don't; `included-packages` is a field we use in the build system.
- A hard link should be OK.
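For completeness, a small sketch of the difference in an rpm `%post` Lua scriptlet. The hard-link call mirrors the diff above; the symlink alternative is only illustrative, and the availability of `posix.symlink` in rpm's embedded Lua is an assumption worth verifying:

```lua
-- Hard link: config.toml and the variant-specific file share the same inode,
-- which matches the ls -li output above (both entries show inode 2325).
posix.link("%{_cross_factorydir}/etc/nvidia-container-runtime/nvidia-container-toolkit-config-k8s.toml",
           "%{_cross_factorydir}/etc/nvidia-container-runtime/config.toml")

-- If a symbolic link were preferred instead (assumption: posix.symlink is
-- available in the rpm build environment):
-- posix.symlink("nvidia-container-toolkit-config-k8s.toml",
--               "%{_cross_factorydir}/etc/nvidia-container-runtime/config.toml")
```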
Changes look great 🎉! Just one request: could you please squash your commits? We don't squash them when we merge the PR 😅. And FYI, it seems like the key you used to sign the last commits wasn't uploaded to GitHub, so the commits show as Unverified.
On my end, I'm just missing testing the Kubernetes variants. I'll do that first thing tomorrow. I already validated the ECS variants and things look good.
Force-pushed from 8d1adb6 to 6150a72.
No worries, I have squashed and rebased :)
@@ -30,6 +30,7 @@ included-packages = [
  # NVIDIA support
  "ecs-gpu-init",
  "nvidia-container-toolkit",
+ "nvidia-container-toolkit-ecs",
nit: I prefer to list only the most specific set of leaf packages here - you could drop "nvidia-container-toolkit" everywhere, since the `-k8s` or `-ecs` subpackage will pull that in by way of dependencies.
Done :)
Thanks for the review!
@chiragjn - thanks for the contribution! It looks pretty much ready to me, with a couple of minor nits that you could address if you have the cycles.
I confirmed the PR fixes the problem, and I can't get all the devices. Tested on the following variants:
- k8s-1.23
- k8s-1.24
- k8s-1.25
- k8s-1.26
- k8s-1.27
- k8s-1.28
- k8s-1.29
Inject Nvidia GPUs using volume-mounts to isolate them to assigned pods

Create separate container toolkit config for ECS and K8s
Apply suggestions from code review
Drop `nvidia-container-toolkit` because it is now a transitive dependency

Motivation
---
When using nodes with multiple gpus (e.g. g4dn.12xlarge), the default way of using nvidia-container-toolkit and nvidia-device-plugin leads to a problem where gpus are not exclusively isolated to the pods they are assigned to. This is because nvidia-device-plugin by default looks up NVIDIA_VISIBLE_DEVICES to decide which gpu devices to pass on to nvidia-container-toolkit / nvidia-container-cli to inject in the pod. Most nvidia cuda base images have env NVIDIA_VISIBLE_DEVICES=all baked into them, which means a pod with such an image will get access to all gpu cards instead of exclusively getting the number requested in resources.limits, yet the kubelet will only report the number in resources.limits as allocated.

E.g. on a 4 GPU node, Pod 1 requests (with NVIDIA_VISIBLE_DEVICES=all in image)

```
resources:
  limits:
    nvidia.com/gpu: 1
```

Pod 2 requests (with NVIDIA_VISIBLE_DEVICES=all in image)

```
resources:
  limits:
    nvidia.com/gpu: 2
```

In this scenario, both pods get access to all 4 cards and the node will report

```
nvidia.com/gpu Allocated: 3 Free: 1
```

This isn't good because the deployed non-privileged pods are unaware of each other and expect exclusive access to requested cards.
Force-pushed from 428fe9f to 911775f.
🚀
Issue number: NA
Motivation
When using nodes with multiple gpus (e.g. g4dn.12xlarge), the default way of using nvidia-container-toolkit and nvidia-device-plugin leads to a problem where gpus are not exclusively isolated to the pods they are assigned to.
This is because nvidia-device-plugin by default looks up `NVIDIA_VISIBLE_DEVICES` to decide which gpu devices to pass on to nvidia-container-toolkit / nvidia-container-cli to inject in the pod. Most nvidia cuda base images have env `NVIDIA_VISIBLE_DEVICES=all` baked into them, which means a pod with such an image will get access to all gpu cards instead of exclusively getting the number requested in `resources.limits`, yet the kubelet will only report the number in `resources.limits` as allocated.
E.g. on a 4 GPU node:
- Pod 1 requests `nvidia.com/gpu: 1` in `resources.limits` (with `NVIDIA_VISIBLE_DEVICES=all` in the image)
- Pod 2 requests `nvidia.com/gpu: 2` in `resources.limits` (with `NVIDIA_VISIBLE_DEVICES=all` in the image)

In this scenario, both pods get access to all 4 cards and the node will report `nvidia.com/gpu` Allocated: 3, Free: 1.
This isn't good because the deployed non-privileged pods are unaware of each other and expect exclusive access to requested cards.
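For reference, the two requests as they appear in the pod specs (reproduced from the squashed commit message above):

```yaml
# Pod 1 (image bakes in NVIDIA_VISIBLE_DEVICES=all)
resources:
  limits:
    nvidia.com/gpu: 1
```

```yaml
# Pod 2 (image bakes in NVIDIA_VISIBLE_DEVICES=all)
resources:
  limits:
    nvidia.com/gpu: 2
```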
References:
Description of changes:
We follow the guidelines in the above docs.
- `--device-list-strategy volume-mounts` to pass allocated devices as volume mounts instead of bypassing and relying on the value of `NVIDIA_VISIBLE_DEVICES` (see the device-plugin sketch after the testing notes below)
- `NVIDIA_VISIBLE_DEVICES` as an env var will still be considered for privileged pods.

Testing done:
I would need help/advice testing this out on actual Bottlerocket nodes.
EDIT: I built a custom AMI for EKS 1.28 following the docs and was able to confirm these changes work as expected.
My employer has been running this config on AL2 nodes without any issues, ensuring correct bin packing per node. I am attaching tests we did for different scenarios.
Tests on AL2 (test matrix for different `NVIDIA_VISIBLE_DEVICES` values attached)
As you can see, in non-privileged mode `NVIDIA_VISIBLE_DEVICES` will be ignored entirely.
Some of the info in the Google Docs linked above is outdated, so it doesn't align with the above results.
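Relating to the `--device-list-strategy volume-mounts` point in the description of changes: for anyone running the upstream nvidia-device-plugin themselves, the flag goes on the plugin's DaemonSet roughly like this. This is a sketch against the upstream plugin; the container name, image tag, and mounts are illustrative and not Bottlerocket's actual wiring:

```yaml
# Sketch: nvidia-device-plugin DaemonSet container spec (names/versions illustrative).
# --device-list-strategy=volume-mounts makes the plugin pass allocated GPUs to the
# runtime as volume mounts under /var/run/nvidia-container-devices, pairing with
# accept-nvidia-visible-devices-as-volume-mounts = true in the toolkit config.
containers:
  - name: nvidia-device-plugin-ctr
    image: nvcr.io/nvidia/k8s-device-plugin:v0.14.3   # illustrative tag
    args:
      - "--device-list-strategy=volume-mounts"
    volumeMounts:
      - name: device-plugin
        mountPath: /var/lib/kubelet/device-plugins
```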
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.