Docker and GPUs
This is a planning document for supporting GPU computing in Docker/Podman with BOINC. The eventual goal is to support
- multiple GPU types (discrete: NVIDIA, AMD; integrated: Intel, Apple Silicon, ARM Mali, Qualcomm Adreno);
- multiple API/toolkits: CUDA, OpenCL, Metal;
- multiple OSs: Windows (with WSL), Linux, macOS.
A BOINC project scientist wanting to do Docker/GPU computing would need to supply (for Intel and/or ARM):
- A Dockerfile specifying a base image and possibly some libraries.
- An executable that runs in the container. They'd build this on Linux, possibly in a container; we call this the 'build environment'.
The scientist would then create BUDA variants for each processor/GPU combination, with plan classes like 'docker_nvidia_opencl'.
If the scientist uses OpenCL, they could create a 'generic app' that can use any GPU that supports OpenCL, and maybe multicore CPU as well. (They'd use multiple BUDA variants, with the same executable). However, to maximize performance they might want to create versions for each GPU type.
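To illustrate how the set of variants could be enumerated, the sketch below prints one plan class name per GPU-vendor/API pair. The `docker_<vendor>_<api>` pattern is extrapolated from the single example 'docker_nvidia_opencl' above; the function name `plan_classes` and the vendor and API lists are assumptions, not an agreed naming scheme.

```shell
#!/bin/sh
# Hypothetical helper: print one plan class name per vendor/API pair.
# The docker_<vendor>_<api> pattern is an assumption based on 'docker_nvidia_opencl'.
plan_classes() {
    vendors="$1"   # space-separated, e.g. "nvidia amd intel"
    apis="$2"      # space-separated, e.g. "cuda opencl"
    for v in $vendors; do
        for a in $apis; do
            # CUDA only runs on NVIDIA GPUs, so skip other pairings
            if [ "$a" = "cuda" ] && [ "$v" != "nvidia" ]; then
                continue
            fi
            echo "docker_${v}_${a}"
        done
    done
}

plan_classes "nvidia amd intel" "cuda opencl"
```

Each printed name would correspond to one BUDA variant; for a generic OpenCL app, all the `*_opencl` variants could share the same executable.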
We need to
- create cookbooks showing projects how to do the above;
- figure out what changes are needed in the client and docker_wrapper.
And we also need to define the 'execution environment':
- for Windows, what does our WSL distro (boinc-buda-runner) need to contain?
- for Linux/macOS, what libraries (if any) does the volunteer need to install?
In a Debian container:
# run as root inside the container (the base image has no sudo, and its package lists start out empty)
apt-get update && apt-get install -y wget
wget https://developer.download.nvidia.com/compute/cuda/repos/debian13/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt-get update
apt-get -y install cuda-toolkit-13-2
Instructions for building test app?
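Download the Debian 13 slim tarball, e.g. debian13_slim.tar.gz, or build it yourself with Docker, e.g.:
# get the Debian 13 slim image
docker pull debian:13-slim
# wsl --import needs a root filesystem tarball, so export a container (docker save would write an image archive instead)
docker create --name debian13_tmp debian:13-slim
docker export -o debian13_slim.tar debian13_tmp
docker rm debian13_tmp
gzip debian13_slim.tar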
Now in Windows you can create a WSL distro from this small Debian 13 tarball, i.e. in an administrator Windows PowerShell run:
# to remove old wsl install:
wsl --unregister debian_slim
# NB: replace [username] with your Users\* directory
wsl --import debian_slim C:\Users\[username]\AppData\Local\wsl\debian_slim ./debian13_slim.tar.gz
# to launch this distro
wsl -d debian_slim
# we'll at least need sudo right away
apt update ; apt install -y sudo vim wget
# probably sensible to add a boinc user once in the WSL shell, e.g.:
useradd boinc --shell /bin/bash
# you probably have to make a boinc home dir:
mkdir /home/boinc
chown boinc:boinc /home/boinc
# give the boinc account sudo rights: edit /etc/sudoers (e.g. vi /etc/sudoers) and add this line after the root entry
boinc ALL=(ALL:ALL) ALL
# set password for boinc account
passwd boinc
# do the rest from this new boinc account
su - boinc
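If this setup is ever scripted (for example when building boinc-buda-runner), hand-editing /etc/sudoers could be replaced by a drop-in file. The sketch below assumes the distro's sudoers file includes /etc/sudoers.d, which is the Debian default; `add_sudoers_dropin` is a hypothetical helper name, and it would have to run as root before switching to the boinc account.

```shell
#!/bin/sh
# Sketch: scripted alternative to editing /etc/sudoers by hand.
# add_sudoers_dropin is a hypothetical helper; run it as root.
add_sudoers_dropin() {
    user="$1"
    dir="${2:-/etc/sudoers.d}"    # assumed to be included by /etc/sudoers (Debian default)
    file="$dir/$user"
    # Same rule as the line added to /etc/sudoers above
    printf '%s ALL=(ALL:ALL) ALL\n' "$user" > "$file" || return 1
    chmod 0440 "$file"
    # Validate the syntax if visudo is available (it ships with the sudo package)
    if command -v visudo >/dev/null 2>&1; then
        visudo -cf "$file" >/dev/null
    fi
}
```

Calling `add_sudoers_dropin boinc` would then take the place of the manual sudoers edit.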
Windows needs its native NVIDIA drivers, of course, but containers launched from WSL also need the NVIDIA Container Toolkit to pass CUDA calls through to the underlying operating system (so no NVIDIA drivers are needed inside WSL).
In the running WSL distro (e.g. wsl -d debian_slim), install the NVIDIA Container Toolkit and podman, following https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html:
sudo apt-get update && sudo apt-get install -y --no-install-recommends \
ca-certificates \
curl \
gnupg2
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.19.0-1
sudo apt-get install -y \
nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}
# important: generate the CDI spec so that containers started in this WSL distro can access the NVIDIA GPU
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# now install podman and run containers with CUDA GPU apps
sudo apt install -y podman
# (redundant if you installed the pinned version above; this installs the latest unpinned version)
# per https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html:
sudo apt-get install -y nvidia-container-toolkit
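Before running GPU containers it may be worth checking that the CDI spec was actually generated. The following is a minimal sketch, assuming the spec was written to /etc/cdi/nvidia.yaml as above and that it declares its device kind on a top-level `kind:` line; `check_cdi_spec` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: sanity-check a generated CDI spec before running GPU containers.
# check_cdi_spec is a hypothetical helper; the default path matches the
# --output used with 'nvidia-ctk cdi generate'.
check_cdi_spec() {
    spec="${1:-/etc/cdi/nvidia.yaml}"
    if [ ! -s "$spec" ]; then
        echo "missing or empty CDI spec: $spec" >&2
        return 1
    fi
    # A CDI spec declares its device kind on a top-level 'kind:' line
    if ! grep -q '^kind: *nvidia.com/gpu' "$spec"; then
        echo "no nvidia.com/gpu kind in $spec" >&2
        return 1
    fi
    echo "CDI spec looks OK: $spec"
}
```

A possible follow-up is `check_cdi_spec && nvidia-ctk cdi list`, which should print the device names (such as nvidia.com/gpu=all) that can be passed to podman's --device option.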
Example podman run command:
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
This pulls the stock ubuntu image; the NVIDIA Container Toolkit makes nvidia-smi and the driver libraries available inside the container. The output should be a line listing your GPU information, e.g.:
GPU 0: NVIDIA GeForce RTX 5060 Ti (UUID: GPU-0c5b2b4f-7b5e-0e50-b7b6-934235d41d3e)
You can select which GPU to use with the CUDA_VISIBLE_DEVICES environment variable:
export CUDA_VISIBLE_DEVICES=0   # 0 is the index of the GPU to run on
This can also be set on the podman (or docker) command line:
podman run --rm -e CUDA_VISIBLE_DEVICES=0 --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
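As a sketch of how a wrapper could assemble this command line for a chosen GPU, the function below validates the index and prints the resulting podman command without running it. `gpu_run_cmd` is a hypothetical helper, not part of docker_wrapper or the BOINC client.

```shell
#!/bin/sh
# Sketch: print the podman command line for running a command on one GPU.
# gpu_run_cmd is a hypothetical helper; it only prints the command (dry run).
gpu_run_cmd() {
    gpu="$1"      # GPU index, as used by CUDA_VISIBLE_DEVICES
    image="$2"    # container image to run
    shift 2       # the remaining arguments are the command to run in the container
    case "$gpu" in
        ''|*[!0-9]*)
            echo "GPU index must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf '%s\n' "podman run --rm -e CUDA_VISIBLE_DEVICES=$gpu --device nvidia.com/gpu=all --security-opt=label=disable $image $*"
}

gpu_run_cmd 0 ubuntu nvidia-smi -L
```

Note that nvidia-smi is a driver-level tool rather than a CUDA application, so its listing may not change with this variable; a CUDA program, such as one of the CUDA samples, is probably a better check that the selection took effect.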
- pull a slim debian container to run in podman
podman pull docker.io/library/debian:stable-slim
- list podman images; you can see it's a small container image, i.e. about 80 MB
podman images
- run nvidia command in the container (returns a line of GPUs installed, if any)
podman run --rm --device nvidia.com/gpu=all --name boinc_cuda docker.io/library/debian:stable-slim nvidia-smi -L
- run this container as a shell
podman run -it --rm --device nvidia.com/gpu=all --security-opt=label=disable --name boinc_cuda docker.io/library/debian:stable-slim /bin/bash
For development purposes, you can install the NVIDIA and GNU compilers into your WSL instance (note that this will blow up the install to 10-20 GB).
wget https://developer.download.nvidia.com/compute/cuda/repos/debian13/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-3
export PATH=$PATH:/usr/local/cuda-13.3/bin
Build CUDA Samples: https://github.com/NVIDIA/cuda-samples/blob/master/README.md
sudo apt install -y git cmake
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples
cmake -B build
cd build
make -j$(nproc)
If everything builds, you can also run "make install" to put the binaries in a central location: cuda-samples/release (../release relative to your build directory). You can then use "podman cp" to copy executables into a running podman container and see how NVIDIA apps run there:
cd ../release
podman run -it --rm --device nvidia.com/gpu=all --security-opt=label=disable --name boinc_cuda docker.io/library/debian:stable-slim /bin/bash
# in a second WSL shell, from cuda-samples/release:
podman cp vectorAdd boinc_cuda:/
podman cp transpose boinc_cuda:/   # etc.
# back in the container shell:
/vectorAdd
/transpose   # etc.
Note that once you close this container you lose all these programs, so you may want to save the running container as a new image to reuse later, i.e. via the "podman commit" command.
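The "podman commit" step could be wrapped as in the sketch below. `save_container` is a hypothetical helper and the image name boinc_cuda_samples is only an example; with DRY_RUN=1 the function prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: snapshot a running container as a reusable image.
# save_container is a hypothetical helper; with DRY_RUN=1 it only prints the command.
save_container() {
    container="$1"   # e.g. boinc_cuda
    image="$2"       # new image name, e.g. boinc_cuda_samples (example name)
    if [ -z "$container" ] || [ -z "$image" ]; then
        echo "usage: save_container CONTAINER IMAGE" >&2
        return 2
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "podman commit $container $image"
    else
        podman commit "$container" "$image"
    fi
}
```

Because the container was started with --rm, the commit has to happen from a second WSL shell while the container is still running, e.g. `save_container boinc_cuda boinc_cuda_samples`. The saved image can then be started with the same podman run options as before.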
Carl did the above starting with a Debian WSL distro.
- Can we use Alpine (with its musl libc)?
- If not, is there a thin glibc distro we can use?
There are a lot of steps, and this is only for CUDA. Should we do this configuration:
- Ourselves, hardwired into boinc-buda-runner? (downside: it might get big, with stuff that a particular host would never use)
- As commands from the BOINC client? (could tailor these based on GPUs present on host).
- As a script included in boinc-buda-runner? (e.g. the client would run config_cuda in the distro if an NVIDIA GPU is present).
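For the third option, a config_cuda script might look roughly like the sketch below. It is only an illustration under stated assumptions: that on WSL the Windows driver exposes nvidia-smi under /usr/lib/wsl/lib, that the NVIDIA Container Toolkit is already installed in the distro, and that the script runs as root. The function names and the skip/done/generate states are hypothetical.

```shell
#!/bin/sh
# Sketch of a hypothetical config_cuda script for boinc-buda-runner.
# Assumptions: nvidia-smi under /usr/lib/wsl/lib indicates an NVIDIA GPU on the host,
# and the NVIDIA Container Toolkit (nvidia-ctk) is already installed.
NVIDIA_SMI="${NVIDIA_SMI:-/usr/lib/wsl/lib/nvidia-smi}"
CDI_SPEC="${CDI_SPEC:-/etc/cdi/nvidia.yaml}"

# Decide what to do: "skip" (no NVIDIA GPU found), "done" (spec already
# exists), or "generate" (GPU found, spec still missing).
config_cuda_action() {
    if [ ! -x "$NVIDIA_SMI" ]; then
        echo skip
    elif [ -s "$CDI_SPEC" ]; then
        echo done
    else
        echo generate
    fi
}

# Generate the CDI spec only when it is needed.
config_cuda() {
    case "$(config_cuda_action)" in
        generate) nvidia-ctk cdi generate --output="$CDI_SPEC" ;;
        *) : ;;   # nothing to do
    esac
}
```

The client could then call `config_cuda` once per distro start. Splitting the decision from the action keeps the script cheap to re-run and lets the decision logic be checked without a GPU.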