
GPU isolation options #45

Open
andy108369 opened this issue Jan 8, 2024 · 0 comments

We want to make sure a tenant cannot gain access to more AMD GPUs than they were allocated by setting certain environment variables (e.g. HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES).
I am not sure whether this is an issue as of today; we cannot verify it because we don't currently have a box with more than one AMD GPU.
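
To illustrate the concern, a hypothetical Pod that is allocated a single AMD GPU could try to widen its view of the host with those variables. A minimal test sketch follows; the amd.com/gpu resource name and the rocm/rocm-terminal image are assumptions based on the upstream AMD device plugin, and we have not been able to run this:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: rocm-env-isolation-test
spec:
  restartPolicy: Never
  containers:
  - name: rocm
    image: rocm/rocm-terminal
    # rocminfo should list only the allocated GPU if isolation holds
    command: ["rocminfo"]
    env:
    # Tries to make the first two host GPUs visible to the ROCm runtime,
    # even though only one device was allocated by the scheduler.
    - name: HIP_VISIBLE_DEVICES
      value: "0,1"
    - name: ROCR_VISIBLE_DEVICES
      value: "0,1"
    resources:
      limits:
        amd.com/gpu: 1
EOF

If the Pod's rocminfo output shows more than one GPU, the environment-variable escape would apply to AMD as well.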

To bring more clarity: it is possible to expose all NVIDIA GPUs on the host to a Pod via the NVIDIA_VISIBLE_DEVICES=all environment variable. Luckily, we were able to work around this by setting --set deviceListStrategy=volume-mounts for the nvdp/nvidia-device-plugin Helm chart, along with these settings in the /etc/nvidia-container-runtime/config.toml file:

accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false
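
For reference, the workaround boils down to installing the plugin with the volume-mounts strategy; a sketch of the Helm invocation (the release name and namespace here are arbitrary choices, not prescribed by the chart):

helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set deviceListStrategy=volume-mounts

With this strategy the plugin communicates the allocated device IDs to the runtime as mounts under /var/run/nvidia-container-devices rather than through NVIDIA_VISIBLE_DEVICES, and the two config.toml settings above tell the runtime to honor only the mounts, so a tenant-supplied NVIDIA_VISIBLE_DEVICES=all in an unprivileged container no longer grants extra devices.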