
GPU isolation options #45

Open
andy108369 opened this issue Jan 8, 2024 · 0 comments

We want to make sure a tenant cannot gain access to more AMD GPUs than they were allocated by setting certain environment variables (e.g. HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES).
I am not sure whether this is an issue as of today; we cannot verify it because we don't currently have a box with more than one AMD GPU.
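
To illustrate the concern, a hypothetical Pod that is allocated a single AMD GPU could try to widen its view of the host with those variables. A minimal test sketch follows; the amd.com/gpu resource name and the rocm/rocm-terminal image are assumptions based on the upstream AMD device plugin, and we have not been able to run this:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: rocm-env-isolation-test
spec:
  restartPolicy: Never
  containers:
  - name: rocm
    image: rocm/rocm-terminal
    # rocminfo should list only the allocated GPU if isolation holds
    command: ["rocminfo"]
    env:
    # Tries to make the first two host GPUs visible to the ROCm runtime,
    # even though only one device was allocated by the scheduler.
    - name: HIP_VISIBLE_DEVICES
      value: "0,1"
    - name: ROCR_VISIBLE_DEVICES
      value: "0,1"
    resources:
      limits:
        amd.com/gpu: 1
EOF

If the Pod's rocminfo output shows more than one GPU, the environment-variable escape would apply to AMD as well.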

To bring more clarity: it is possible to expose all NVIDIA GPUs on the host to a Pod via the NVIDIA_VISIBLE_DEVICES=all environment variable. Luckily, we were able to work around this by setting --set deviceListStrategy=volume-mounts for the nvdp/nvidia-device-plugin Helm chart, along with these settings in the /etc/nvidia-container-runtime/config.toml file:

accept-nvidia-visible-devices-as-volume-mounts = true
accept-nvidia-visible-devices-envvar-when-unprivileged = false
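
For reference, the workaround boils down to installing the plugin with the volume-mounts strategy; a sketch of the Helm invocation (the release name and namespace here are arbitrary choices, not prescribed by the chart):

helm upgrade --install nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set deviceListStrategy=volume-mounts

With this strategy the plugin communicates the allocated device IDs to the runtime as mounts under /var/run/nvidia-container-devices rather than through NVIDIA_VISIBLE_DEVICES, and the two config.toml settings above tell the runtime to honor only the mounts, so a tenant-supplied NVIDIA_VISIBLE_DEVICES=all in an unprivileged container no longer grants extra devices.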