
Nvidia GPU Support for Windows #19005

Closed
juliusfrost opened this issue Jun 26, 2023 · 11 comments
Labels
kind/feature (Categorizes issue or PR as related to a new feature.), machine, windows (issue/bug on Windows)

Comments

@juliusfrost

Feature request description

Hello, I was looking into Podman as an alternative to Docker for machine learning applications. As far as I can tell, GPU containers are supported on Linux systems through nvidia-container-toolkit, but I couldn't find instructions for this on Windows. It would be great if this were supported: together with Podman Desktop it would be a nice drop-in replacement for Docker Desktop, which is currently the only viable solution for GPU containers on Windows.

Suggest potential solution

It seems that the Podman virtual machine doesn't have access to the Nvidia drivers, but other machines on WSL2 do (tested by running nvidia-smi). I don't understand why, but fixing this may be a start.
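A minimal way to run that comparison from PowerShell (a sketch only; Ubuntu is just an example distribution name, substitute whatever wsl -l -v lists on your machine):

PS C:\> wsl -l -v
PS C:\> wsl -d Ubuntu nvidia-smi
PS C:\> podman machine ssh nvidia-smi

Here the second command prints the usual GPU table in a regular WSL2 distribution, while the third fails inside the podman machine.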

Have you considered any alternatives?

Not sure what the best approach is here.

Additional context

No response

@juliusfrost juliusfrost added the kind/feature label Jun 26, 2023
@Luap99 Luap99 added the machine and windows labels Jun 27, 2023
@rhatdan
Member

rhatdan commented Jul 2, 2023

@n1hility @baude WDYT?

@github-actions

github-actions bot commented Aug 2, 2023

A friendly reminder that this issue had no activity for 30 days.

@n1hility
Member

n1hility commented Aug 2, 2023

this one is still relevant

@github-actions

github-actions bot commented Sep 2, 2023

A friendly reminder that this issue had no activity for 30 days.

@rootfs

rootfs commented Oct 8, 2023

I tested Podman Desktop on my Windows 11 Pro system, with NVIDIA CUDA for WSL2 installed. I can verify the podman virtual machine has the same access to nvidia-smi as other WSL2 machines.

(screenshot: nvidia-smi output inside the podman machine)

@rootfs

rootfs commented Oct 8, 2023

In order to use the GPU, I also did the following:

SSH into the podman machine:

podman machine ssh

Inside the podman machine, install the NVIDIA Container Toolkit, following the instructions from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-with-yum-or-dnf:

# curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
# sudo yum install -y nvidia-container-toolkit
# sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# nvidia-ctk cdi list

If successful, nvidia-ctk cdi list will show all the devices.
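On a single-GPU system the output looks roughly like this (sample only; the exact devices listed depend on your GPU and driver):

# nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all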

Test it out:

(screenshot of the test run)
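For example, a quick smoke test (the CDI device injects the driver libraries and nvidia-smi into the container, so a plain ubuntu image is enough):

podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L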

@aa956

aa956 commented Jan 26, 2024

Does not seem to work anymore.

PS C:\> podman machine ssh
Connecting to vm podman-machine-default. To close connection, use `~.` or `exit`
Warning: Permanently added '[localhost]:51973' (ED25519) to the list of known hosts.
Last login: Fri Jan 26 21:01:31 2024 from ::1
[root@desktop02 ~]# nvidia-ctk cdi list
INFO[0000] Found 1 CDI devices
nvidia.com/gpu=all
[root@desktop02 ~]# nvidia-container-cli info
NVRM version:   546.33
CUDA version:   12.3

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 4060 Ti
Brand:          GeForce
GPU UUID:       GPU-47bcd798-877b-083b-5b3c-4ceae75bb8a5
Bus Location:   00000000:01:00.0
Architecture:   8.9
[root@desktop02 ~]#
logout
Connection to localhost closed.
PS C:\> podman run --device nividia.com/gpu=all --security-opt=label=disable nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark -gpu
Error: preparing container 9f627b7fe28f8765aad55445b172ffc08f22f6fc24fa6a6b24b9b883d49a3aec for attach: setting up CDI devices: unresolvable CDI devices nividia.com/gpu=all
PS C:\> podman --version
podman.exe version 4.9.0
PS C:\>

@rootfs

rootfs commented Jan 26, 2024

podman run --device nividia.com/gpu=all 

is a typo, you need to set nvidia.com/gpu=all

@aa956

aa956 commented Jan 26, 2024

is a typo, you need to set nvidia.com/gpu=all

Thank you!!!!

I feel soo dumb now!!

Works nicely.

PS C:\> podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4060 Ti (UUID: GPU-47bcd798-877b-083b-5b3c-4ceae75bb8a5)
PS C:\> podman run --device nvidia.com/gpu=all --security-opt=label=disable nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark -gpu
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 8.9 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 8.9

> Compute 8.9 CUDA device: [NVIDIA GeForce RTX 4060 Ti]
34816 bodies, total time for 10 iterations: 19.922 ms
= 608.457 billion interactions per second
= 12169.144 single-precision GFLOP/s at 20 flops per interaction
PS C:\>

@rhatdan
Member

rhatdan commented Jan 28, 2024

Looks like this works, so closing.

@rhatdan rhatdan closed this as completed Jan 28, 2024
@znmeb

znmeb commented Mar 31, 2024

Can this be added to the official documentation at the next release? This looks insanely useful!! Thanks!
