Docker and GPUs

This is a planning document for supporting GPU computing in Docker/Podman with BOINC. The eventual goal is to support

multiple GPU types (discrete: NVIDIA, AMD; integrated: Intel, Apple Silicon, ARM Mali, Qualcomm Adreno);
multiple APIs/toolkits: CUDA, OpenCL, Metal;
multiple OSs: Windows (with WSL), Linux, MacOS.

A BOINC project scientist wanting to do Docker/GPU computing would need to supply (for Intel and/or ARM):

A Dockerfile specifying a base image and possibly some libraries.
An executable that runs in the container. They'd build this on Linux, possibly in a container; we call this the 'build environment'.
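
As an illustration of the first item, the sketch below writes a minimal Dockerfile from a shell script. The base image is the debian:stable-slim image used later in this document. The executable name my_gpu_app and the /app directory are placeholders chosen for the example, not BOINC or docker_wrapper conventions, and how docker_wrapper stages files into the container is not covered here.

```shell
#!/bin/sh
# Sketch: write a minimal Dockerfile for a GPU app.
# Assumptions: "my_gpu_app" and "/app" are hypothetical placeholder names.
write_dockerfile() {
    out_dir=$1
    cat > "$out_dir/Dockerfile" <<'EOF'
FROM docker.io/library/debian:stable-slim
WORKDIR /app
COPY my_gpu_app /app/
CMD ["./my_gpu_app"]
EOF
}

# Write the file into a scratch directory and show it.
dir=$(mktemp -d)
write_dockerfile "$dir"
cat "$dir/Dockerfile"
```

Any libraries the executable needs at run time would be added with RUN lines between FROM and COPY.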

The scientist would then create BUDA variants for each processor/GPU combination, with plan classes like 'docker_nvidia_opencl'.
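
To make the naming concrete, here is a small sketch that generates plan class names for a few processor/GPU combinations. The pattern docker_<vendor>_<api> is only inferred from the single example 'docker_nvidia_opencl'; the other names it prints are hypothetical and may not match what BOINC ends up using.

```shell
#!/bin/sh
# Sketch: derive plan class names of the form docker_<vendor>_<api>.
# Assumption: the pattern is inferred from the one example in the text;
# the other combinations are hypothetical.
plan_class() {
    vendor=$1
    api=$2
    printf 'docker_%s_%s\n' "$vendor" "$api"
}

# One BUDA variant per vendor:api pair.
for combo in nvidia:cuda nvidia:opencl amd:opencl intel:opencl; do
    plan_class "${combo%%:*}" "${combo##*:}"
done
```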

If the scientist uses OpenCL, they could create a 'generic app' that can use any GPU that supports OpenCL, and maybe multicore CPU as well. (They'd use multiple BUDA variants, with the same executable). However, to maximize performance they might want to create versions for each GPU type.

We need to

create cookbooks showing projects how to do the above;
figure out what changes are needed in the client and docker_wrapper.

And we also need to define the 'execution environment':

for Windows, what does our WSL distro (boinc-buda-runner) need to contain?
for Linux/MacOS, what libraries (if any) does the volunteer need to install?

Build environment

In a Debian container:

# run as root inside the container (the stock Debian image has no sudo)
apt-get update
apt-get install -y wget
wget https://developer.download.nvidia.com/compute/cuda/repos/debian13/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt-get update
apt-get -y install cuda-toolkit-13-2

Instructions for building test app?

Execution environment: Windows

CUDA

Download the Debian 13 Slim tarball (e.g. debian13_slim.tar.gz), or build it yourself from the Docker image:

https://hub.docker.com/layers/library/debian/13-slim/images/sha256-16299d0907c6383b73d648a0eb69a1c76b753b859e4998a490f8fe3cd909bebf

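# get the Debian 13 slim image
docker pull debian:13-slim

# wsl --import needs a root filesystem tarball, so export a container
# (docker save would write an image archive of layers instead)
docker create --name debian13_tmp debian:13-slim
docker export -o debian13_slim.tar debian13_tmp
docker rm debian13_tmp
gzip debian13_slim.tar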

Now, in Windows, you can create a WSL distro from this small Debian 13 tarball. In an administrator Windows PowerShell, run:

# to remove old wsl install:
wsl --unregister debian_slim

# NB: replace [username] with your Users\* directory
wsl --import debian_slim C:\Users\[username]\AppData\Local\wsl\debian_slim ./debian13_slim.tar.gz

# to launch this distro
wsl -d debian_slim

# inside the WSL shell (as root): we'll need sudo right away
apt update ; apt install -y sudo vim wget

# add a boinc user and create its home directory
useradd boinc --shell /bin/bash
mkdir /home/boinc
chown boinc:boinc /home/boinc

# give the boinc account sudo rights: edit /etc/sudoers (e.g. vi /etc/sudoers)
# and add this line after the root entry
boinc   ALL=(ALL:ALL) ALL

# set password for boinc account
passwd boinc

# do the rest from this new boinc account
su - boinc

Windows needs its native NVIDIA drivers, of course. Containers launched from WSL also need the NVIDIA Container Toolkit to pass CUDA calls through to the underlying operating system, so no NVIDIA drivers are needed inside WSL.

In the running WSL distro (e.g. wsl -d debian_slim), install the NVIDIA Container Toolkit and podman, following https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html:

sudo apt-get update && sudo apt-get install -y --no-install-recommends \
   ca-certificates \
   curl \
   gnupg2

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update

export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.19.0-1
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}

# important step is configuring WSL for the NVidia containers:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# now install podman and run containers with CUDA GPU apps
sudo apt install -y podman

# nvidia-container-toolkit was already installed above with a pinned version;
# use this unpinned form instead if you just want the latest version, per
# https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html:
sudo apt-get install -y nvidia-container-toolkit
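
Before running containers it can help to confirm that the CDI spec was actually written. The following is a minimal sketch that only checks for a non-empty file at the path passed to nvidia-ctk cdi generate above; the function name cdi_spec_ready is made up for this example.

```shell
#!/bin/sh
# Sketch: check that a CDI spec file exists and is non-empty.
# cdi_spec_ready is a hypothetical helper name; the default path is the
# --output path used with "nvidia-ctk cdi generate" in this document.
cdi_spec_ready() {
    spec=${1:-/etc/cdi/nvidia.yaml}
    [ -s "$spec" ]
}

if cdi_spec_ready; then
    echo "CDI spec found"
    # On a configured host, "nvidia-ctk cdi list" then shows the device
    # names, such as nvidia.com/gpu=all.
else
    echo "CDI spec missing: run nvidia-ctk cdi generate first"
fi
```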

podman example run command:

podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L

This pulls the stock ubuntu image (the GPU tools are made available inside it through the CDI device). The output should be one line per GPU, e.g.:

GPU 0: NVIDIA GeForce RTX 5060 Ti (UUID: GPU-0c5b2b4f-7b5e-0e50-b7b6-934235d41d3e)
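
If a script needs the GPU index, name and UUID separately, the listing can be split with sed. This is a sketch that assumes the line format shown above; parse_gpu_list is a hypothetical helper name, and the sample input is the output line from this document rather than a live nvidia-smi call.

```shell
#!/bin/sh
# Sketch: turn "nvidia-smi -L" lines into "index|name|uuid" records.
# Assumption: lines look like "GPU <n>: <name> (UUID: <uuid>)".
parse_gpu_list() {
    sed -n 's/^GPU \([0-9][0-9]*\): \(.*\) (UUID: \(.*\))$/\1|\2|\3/p'
}

# Sample input copied from the example output; on a real host you would
# pipe "nvidia-smi -L" into parse_gpu_list instead.
sample='GPU 0: NVIDIA GeForce RTX 5060 Ti (UUID: GPU-0c5b2b4f-7b5e-0e50-b7b6-934235d41d3e)'
printf '%s\n' "$sample" | parse_gpu_list
```

Lines that do not match the expected format are silently dropped, so an empty result means either no GPU or an unexpected output format.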

You can select the GPU to use with the environment variable CUDA_VISIBLE_DEVICES, e.g.:

export CUDA_VISIBLE_DEVICES=0    # 0 is the index of the GPU to run on

This can also be set on the podman (or docker) command line:

podman run --rm -e CUDA_VISIBLE_DEVICES=0 --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
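
A wrapper that launches jobs on a chosen GPU could assemble this command from a GPU index. The sketch below only prints the command instead of running it, so it can be tried on a machine without podman; gpu_run_cmd is a hypothetical helper name, not part of BOINC or podman.

```shell
#!/bin/sh
# Sketch: print (not run) a podman command pinned to one GPU index.
# gpu_run_cmd is a hypothetical helper; the flags are the ones used in
# the command line shown in this document.
gpu_run_cmd() {
    gpu_index=$1
    shift
    printf 'podman run --rm -e CUDA_VISIBLE_DEVICES=%s --device nvidia.com/gpu=all --security-opt=label=disable %s\n' \
        "$gpu_index" "$*"
}

# First argument: GPU index; remaining arguments: image and command.
gpu_run_cmd 0 ubuntu nvidia-smi -L
```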

Example Commands with Podman

pull a slim debian container to run in podman

podman pull docker.io/library/debian:stable-slim

list podman images (you can see it's a small container image, about 80 MB)

podman images

run nvidia command in the container (returns a line of GPUs installed, if any)

podman run --rm --device nvidia.com/gpu=all --name boinc_cuda docker.io/library/debian:stable-slim nvidia-smi -L

run this container as a shell

podman run -it --rm --device nvidia.com/gpu=all --security-opt=label=disable --name boinc_cuda docker.io/library/debian:stable-slim /bin/bash

Development of CUDA applications (C++, Python, NVidia compiler nvcc)

For development purposes, you can install the NVidia and GNU compilers into your WSL instance (note this will blow up the install to 10-20 GB).

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=13

wget https://developer.download.nvidia.com/compute/cuda/repos/debian13/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-13-3
export PATH=$PATH:/usr/local/cuda-13.3/bin

Build CUDA Samples: https://github.com/NVIDIA/cuda-samples/blob/master/README.md

sudo apt install -y git cmake
git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples
cmake -B build
cd build
make -j$(nproc)

If everything builds, you can also run "make install" to collect the binaries in one place: cuda-samples/release (../release relative to your build directory). You can then use "podman cp" to copy executables into a running podman container and see how they run there:

cd ../release

start an interactive podman container shell in one wsl window:

podman run -it --rm --device nvidia.com/gpu=all --security-opt=label=disable --name boinc_cuda docker.io/library/debian:stable-slim /bin/bash

copy to podman container

podman cp vectorAdd boinc_cuda:/
podman cp transpose boinc_cuda:/
# etc.

go to the WSL window running the podman bash shell and try to run them, e.g.:

/vectorAdd
/transpose
# etc.

Because the container was started with --rm, closing it discards these programs. To keep them, save the running container as a new image with the "podman commit" command before exiting.
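
The copy and commit steps can be combined in a small script. This is a sketch under stated assumptions: the image name boinc_cuda_samples is a placeholder, and with DRY_RUN=1 (the default here) the script only prints the podman commands, so it can be read and tried without a running container.

```shell
#!/bin/sh
# Sketch: copy sample binaries into a running container, then snapshot it.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to
# execute them. "boinc_cuda_samples" is a hypothetical image name.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

save_samples() {
    container=$1
    image=$2
    shift 2
    for exe in "$@"; do
        run podman cp "$exe" "$container:/"
    done
    run podman commit "$container" "$image"
}

save_samples boinc_cuda boinc_cuda_samples vectorAdd transpose
```

Run it from the cuda-samples release directory, in a second WSL window, while the boinc_cuda container is still open.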

to do: Python CUDA apps & containers

OpenCL

Discussion

Carl did the above starting with a Debian WSL distro.

Can we use Alpine (with its Musl libc)?
If not, is there a thin glibc distro we can use?

There are a lot of steps, and this is only for CUDA. Should we do this configuration:

Ourselves, hardwired into boinc-buda-runner? (downside: it might get big, with stuff that a particular host would never use)
As commands from the BOINC client? (could tailor these based on GPUs present on host).
As a script included in boinc-buda-runner? (e.g. the client would run config_cuda in the distro if an NVIDIA GPU is present).
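
The third option could look roughly like the sketch below: a dispatcher that maps the GPU vendors reported by the client to setup scripts in the distro. Only config_cuda is named in this document; the function name, the vendor strings, and the way the client would pass them in are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: map detected GPU vendors to per-vendor setup scripts.
# config_cuda is the script name used in the text; select_config_scripts
# and the vendor strings are hypothetical.
select_config_scripts() {
    for vendor in "$@"; do
        case $vendor in
            nvidia) echo config_cuda ;;
            *)      echo "no config script defined for $vendor" >&2 ;;
        esac
    done
}

# Prints the scripts to run on stdout; unsupported vendors go to stderr.
select_config_scripts nvidia intel
```

Keeping the mapping in one place would let the distro stay small, since a host only runs the scripts for the GPUs it actually has.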

Linux

CUDA

OpenCL

MacOS

OpenCL

Metal
