Docker and GPUs

This is a planning document for supporting GPU computing in Docker/Podman with BOINC. The eventual goal is to support

multiple GPU types (discrete: NVIDIA, AMD; integrated: Intel, Apple Silicon, ARM Mali, Qualcomm Adreno);
multiple APIs/toolkits: CUDA, OpenCL, Metal;
multiple OSs: Windows (with WSL), Linux, MacOS.

A BOINC project scientist wanting to do Docker/GPU computing would need to supply (for Intel and/or ARM):

A Dockerfile specifying a base image and possibly some libraries.
An executable that runs in the container. They'd build this on Linux, possibly in a container; we call this the 'build environment'.
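
As a rough illustration of the first item, the sketch below writes a minimal Dockerfile for a CUDA app. It is not a BUDA convention: the executable name add_test, the /cuda/add_test path and the COPY/CMD layout are assumptions for illustration, and the base image tag is simply the one used in the Podman experiments later on this page.

```shell
#!/bin/sh
# Sketch: generate a minimal Dockerfile for a CUDA app.
# Assumptions (illustrative, not BUDA requirements): the executable is
# called add_test and lives at /cuda/add_test inside the image.
write_dockerfile() {
    out_dir=$1
    cat > "$out_dir/Dockerfile" <<'EOF'
FROM nvcr.io/nvidia/cuda:13.2.0-cudnn-runtime-ubuntu24.04
# Copy the executable produced in the build environment into the image.
COPY add_test /cuda/add_test
CMD ["/cuda/add_test"]
EOF
}
```

Calling write_dockerfile . would create ./Dockerfile next to the executable; a project would likely add any extra libraries the app needs between the FROM and COPY lines.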

The scientist would then create BUDA variants for each processor/GPU combination, with plan classes like 'docker_nvidia_opencl'.
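
To make the naming pattern concrete, here is a small sketch that composes plan class names of the form docker_<vendor>_<api>. Only 'docker_nvidia_opencl' appears in this plan; the other vendor/API pairs in the list are hypothetical examples, not an agreed set.

```shell
#!/bin/sh
# plan_class VENDOR API -> docker_<vendor>_<api>
plan_class() {
    printf 'docker_%s_%s\n' "$1" "$2"
}

# Print one plan class per (vendor, API) pair.
# The pairs below are hypothetical; only nvidia/opencl comes from the plan.
list_plan_classes() {
    for pair in "nvidia cuda" "nvidia opencl" "amd opencl" "intel opencl"; do
        # Deliberately unquoted: split "vendor api" into $1 and $2.
        set -- $pair
        plan_class "$1" "$2"
    done
}
```

For example, plan_class nvidia opencl prints docker_nvidia_opencl.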

If the scientist uses OpenCL, they could create a 'generic app' that can use any GPU that supports OpenCL, and maybe multicore CPUs as well (they'd use multiple BUDA variants with the same executable). However, to maximize performance they might want to create versions for each GPU type.

We need to

create cookbooks showing projects how to do the above.
figure out what changes are needed in the client and docker_wrapper.

And we also need to define the 'execution environment':

for Windows, what does our WSL distro (boinc-buda-runner) need to contain?
for Linux/MacOS, what libraries (if any) does the volunteer need to install?

Build environment

In a Debian container (running as root, the container default, so sudo is not needed):

apt-get update
apt-get install -y wget
wget https://developer.download.nvidia.com/compute/cuda/repos/debian13/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt-get update
apt-get -y install cuda-toolkit-13-2

Instructions for building test app?
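
One possible shape for this step, as a sketch only: compile a single CUDA source file with nvcc from the toolkit installed above. The file names are hypothetical, and the hint about /usr/local/cuda/bin assumes the toolkit's usual install location.

```shell
#!/bin/sh
# build_test_app SRC OUT: compile one CUDA source file with nvcc.
# Returns 1 with a hint if nvcc is not on PATH.
build_test_app() {
    src=$1
    out=$2
    if ! command -v nvcc >/dev/null 2>&1; then
        # Assumption: the toolkit normally installs nvcc under /usr/local/cuda/bin.
        echo "nvcc not found; is /usr/local/cuda/bin on PATH?" >&2
        return 1
    fi
    nvcc -O2 -o "$out" "$src"
}
```

A call such as build_test_app add_test.cu add_test would produce the executable that is later copied into the container image.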

Execution environment: Windows

CUDA

Carl created a WSL distro as follows:

wsl.exe --install Debian
wsl.exe -d Debian
sudo apt-get update
sudo apt-get install podman
sudo apt-get install -y nvidia-container-toolkit
sudo apt-get podman (???)

Then, as described in https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html:

sudo apt-get update && sudo apt-get install -y --no-install-recommends \
   ca-certificates \
   curl \
   gnupg2

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update

export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.19.0-1
sudo apt-get install -y \
    nvidia-container-toolkit=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools=${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1=${NVIDIA_CONTAINER_TOOLKIT_VERSION}

NVIDIA suggests CDI (Container Device Interface) for Podman: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html

Check the status in WSL:

sudo systemctl status nvidia-cdi-refresh.path

Enable the NVIDIA services in WSL:

sudo systemctl enable --now nvidia-cdi-refresh.path
sudo systemctl enable --now nvidia-cdi-refresh.service

Example Podman run command:

podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L

I needed to regenerate the CDI specification manually in WSL:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

This lists my GPU, first in the WSL distro and then inside a container:

nvidia-smi -L
outputs: GPU 0: NVIDIA GeForce RTX 5060 Ti (UUID: GPU-0c5b2b4f-7b5e-0e50-b7b6-934235d41d3e)

podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi -L
outputs: GPU 0: NVIDIA GeForce RTX 5060 Ti (UUID: GPU-0c5b2b4f-7b5e-0e50-b7b6-934235d41d3e)
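
The manual comparison above could be automated. The following sketch extracts the GPU UUIDs from nvidia-smi -L output and checks that the container reports the same set as the distro. The function names are made up for illustration, and check_passthrough needs Podman and an NVIDIA GPU to run.

```shell
#!/bin/sh
# Read `nvidia-smi -L` output on stdin and print one GPU UUID per line, sorted.
gpu_uuids() {
    sed -n 's/.*(UUID: \(GPU-[0-9a-fA-F-]*\)).*/\1/p' | sort
}

# Succeeds if the container sees exactly the GPUs the distro sees.
# Uses the same podman command line as the experiment above.
check_passthrough() {
    host=$(nvidia-smi -L | gpu_uuids)
    cont=$(podman run --rm --device nvidia.com/gpu=all \
        --security-opt=label=disable ubuntu nvidia-smi -L | gpu_uuids)
    [ -n "$host" ] && [ "$host" = "$cont" ]
}
```

Comparing UUIDs rather than whole lines avoids false mismatches if the GPU index or name formatting differs between the two environments.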

View the CDI config file:

cat /etc/cdi/nvidia.yaml

You can manually edit the CDI config file to give a GPU a name:

sudo vi /etc/cdi/nvidia.yaml

cdiVersion: 0.3.0
kind: nvidia.com/gpu
devices:
    - name: gpu0
      containerEdits:
        deviceNodes:
            - path: /dev/dxg
              major: 10
              minor: 125
              fileMode: 438
              permissions: rwm

podman run --rm --device nvidia.com/gpu=gpu0 --security-opt=label=disable ubuntu nvidia-smi -L

So instead of "all" I used "gpu0". The name seems to be an arbitrary label: I changed it to "carl", and as long as podman run was given --device nvidia.com/gpu=carl it worked fine.
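
Since the device name is just whatever the CDI file declares, a script can read the file to find the valid --device values. This is a sketch that uses plain text matching rather than a YAML parser, so it assumes the simple layout shown above; the function names are hypothetical. The toolkit's own nvidia-ctk cdi list command should report the same names.

```shell
#!/bin/sh
# Print the value of each "- name:" entry in a CDI file.
# Assumption: device names are the only list items that start with "- name:".
cdi_device_names() {
    sed -n 's/^[[:space:]]*-[[:space:]]*name:[[:space:]]*//p' "$1" | tr -d '"'
}

# Print one ready-to-use --device value (kind=name) per device.
cdi_device_args() {
    kind=$(sed -n 's/^kind:[[:space:]]*//p' "$1")
    cdi_device_names "$1" | while read -r name; do
        printf '%s=%s\n' "$kind" "$name"
    done
}
```

With the file shown above, cdi_device_args /etc/cdi/nvidia.yaml prints nvidia.com/gpu=gpu0.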

Using the nvidia/cuda container in Podman:

podman run -it --rm --device nvidia.com/gpu=gpu0 nvcr.io/nvidia/cuda:13.2.0-cudnn-runtime-ubuntu24.04 nvidia-smi -L

If you go back to an /etc/cdi/nvidia.yaml that defines the "all" device and run:

podman run -it --rm --device nvidia.com/gpu=all nvcr.io/nvidia/cuda:13.2.0-cudnn-runtime-ubuntu24.04 nvidia-smi

It seems that the container then sees the GPUs by index (GPU 0 here), so BOINC or the application can choose which GPU to run on by setting an environment variable: export CUDA_VISIBLE_DEVICES=0 (where 0 is the index of the GPU to use). For example:

podman run -e CUDA_VISIBLE_DEVICES=0 -it --rm --device nvidia.com/gpu=all localhost/boinc_cuda /cuda/add_test

podman run -e CUDA_VISIBLE_DEVICES=1 -it --rm --device nvidia.com/gpu=all localhost/boinc_cuda /cuda/add_test
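
If the client or docker_wrapper ends up assigning GPUs this way, the command line could be built from the device index. The sketch below only prints the command; the function name is hypothetical, and -it is dropped on the assumption that jobs run non-interactively.

```shell
#!/bin/sh
# gpu_run_cmd INDEX IMAGE EXE: print a podman command that exposes all GPUs
# to the container but restricts CUDA to the GPU with the given index.
gpu_run_cmd() {
    gpu_index=$1
    image=$2
    exe=$3
    printf 'podman run -e CUDA_VISIBLE_DEVICES=%s --rm --device nvidia.com/gpu=all %s %s\n' \
        "$gpu_index" "$image" "$exe"
}
```

For example, gpu_run_cmd 1 localhost/boinc_cuda /cuda/add_test prints the second command above without -it.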

Discussion

Carl did the above starting with a Debian WSL distro.

Can we use Alpine (with its musl libc)?
If not, is there a thin glibc distro we can use?

There are a lot of steps, and this is only for CUDA. Should we do this configuration:

Ourselves, hardwired into boinc-buda-runner? (downside: it might get big, with stuff that a particular host would never use)
As commands from the BOINC client? (could tailor these based on GPUs present on host).
As a script included in boinc-buda-runner? (e.g. the client would run config_cuda in the distro if an NVIDIA GPU is present).
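
For the third option, the decision of whether to run config_cuda could look roughly like the sketch below. It is an assumption-laden illustration: it treats the presence of /dev/dxg plus a working nvidia-smi as "an NVIDIA GPU is present", assumes WSL exposes nvidia-smi at /usr/lib/wsl/lib/nvidia-smi, and only prints what it would do. Both paths can be overridden.

```shell
#!/bin/sh
# has_nvidia_gpu [DXG_NODE] [SMI_PATH]
# Succeeds if the GPU device node exists and nvidia-smi lists at least one GPU.
has_nvidia_gpu() {
    dxg=${1:-/dev/dxg}
    # Assumed default location of nvidia-smi inside a WSL distro.
    smi=${2:-/usr/lib/wsl/lib/nvidia-smi}
    [ -e "$dxg" ] && [ -x "$smi" ] && "$smi" -L 2>/dev/null | grep -q '^GPU '
}

# Print the action the client would take; config_cuda itself is hypothetical.
maybe_config_cuda() {
    if has_nvidia_gpu "$@"; then
        echo "run config_cuda"
    else
        echo "skip config_cuda"
    fi
}
```

Keeping the detection separate from the configuration step would let the same check be reused for other GPU types later.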

Linux

MacOS
