# GPU & CUDA Integration

This module details how NVIDIA GPU drivers, CUDA toolkits, and machine learning frameworks (PyTorch, TensorFlow, PaddlePaddle) are integrated into the LabNow Docker ecosystem.

---

## 1. CUDA Wrappers and Images (`docker_cuda`)

GPU compatibility is achieved by wrapping official NVIDIA CUDA development images with LabNow customizations.

### Wrap Hierarchy

Because NVIDIA images start from raw OS configurations, the build system wraps CUDA base images in a multi-step pipeline:

1. **Atom Wrap**: `docker_atom/atom.Dockerfile` is built using the NVIDIA base image (e.g. `nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04`) passed via `BASE_IMG`. This yields an `nvidia-cuda` atom image.
2. **Base Wrap**: `docker_base/base.Dockerfile` is built on top of the `nvidia-cuda` atom image to add Conda, Python, and base tools.
3. **CUDA Finalize**: `docker_cuda/nvidia-cuda.Dockerfile` inherits the base-wrapped image, sets `NVIDIA_DISABLE_REQUIRE=1`, updates debpython path configurations, compiles and installs GPU monitoring utilities, and cleans up.

### GPU Monitoring (`setup_nvtop`)

- Downloads and builds **`nvtop`** from source to display NVIDIA, AMD, and Intel GPU status.
- Requires CMake >= 3.18. If the base OS ships an older CMake, the Kitware APT repository is added temporarily during the build.
- Builds `nvtop` so that it binds to the host NVML libraries, then removes compile-time dependencies after installation to keep layer sizes small.

---

## 2. Machine Learning Framework Installation

When `ARG_PROFILE_PYTHON` is populated with `torch`, `tf2`, or `paddle`, the core Docker installation hook runs specialized setup procedures to configure CUDA acceleration.

### CUDA Version & Device Index Auto-Detection

The build script automatically checks whether the CUDA compiler (`nvcc`) is present:

- Evaluates `$CUDA_VERSION` to generate a shortened string `$CUDA_VER` (e.g., `12.1` -> `121`).
- Sets `$IDX` to `cu${CUDA_VER}` (e.g. `cu121`) if a GPU compiler is present; otherwise defaults to `cpu`.

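
The detection logic above can be sketched roughly as follows. This is a minimal illustration, not the project's actual build script: the function name `derive_idx` and the `HAS_NVCC` variable are hypothetical, and trimming `$CUDA_VERSION` to major.minor before removing the dot is an assumption about how a three-part version such as `12.8.1` would be shortened.

```bash
# Hypothetical helper (not the project's actual function).
# $1: CUDA version string such as "12.1" or "12.8.1"
# $2: "yes" when nvcc was found, anything else otherwise
# The body is wrapped in ( ... ) so its variables stay inside a subshell.
derive_idx() (
  cuda_version="$1"
  has_nvcc="$2"
  if [ "$has_nvcc" = "yes" ] && [ -n "$cuda_version" ]; then
    # Assumption: keep major.minor only (12.8.1 -> 12.8), then drop the dot.
    cuda_ver="$(printf '%s' "$cuda_version" | cut -d. -f1,2 | tr -d '.')"
    printf 'cu%s\n' "$cuda_ver"
  else
    # No compiler or no version information: fall back to CPU wheels.
    printf 'cpu\n'
  fi
)

# Usage: detect nvcc on PATH, then derive the index tag.
if command -v nvcc >/dev/null 2>&1; then HAS_NVCC=yes; else HAS_NVCC=no; fi
IDX="$(derive_idx "${CUDA_VERSION:-}" "$HAS_NVCC")"
echo "IDX=${IDX}"
```

Keeping the derivation in a function that takes the version and the compiler flag as arguments makes it possible to check the mapping without a GPU toolchain installed.
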
### PyTorch Setup (`torch` Profile)

- Evaluates GPU compatibility: if the CUDA version is below `11.7`, it installs PyTorch 1.x; otherwise it installs PyTorch 2.x.
- Runs `pip install` targeting the official PyTorch wheel index:

  ```bash
  pip install --no-cache-dir --root-user-action=ignore -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/${IDX}
  ```

### TensorFlow Setup (`tf` Profile)

- Installs either `tensorflow` (CPU/v2) or `tensorflow-gpu` (v1), depending on the profile version (`tf1` or `tf2`).

### PaddlePaddle Setup (`paddle` Profile)

- Checks whether `nvcc` is present to install either `paddlepaddle-gpu` or `paddlepaddle`.
- Uses the official index URL `https://www.paddlepaddle.org.cn/packages/stable/${IDX}/`.

---

## 3. NVIDIA Package Size Optimization (Crucial Step)

A major source of layer bloat in GPU images is the set of NVIDIA CUDA runtime wheels shipped via pip packages (e.g. `nvidia-cuda-runtime-cu12`, `nvidia-cudnn-cu12`). These wheels duplicate files that are already present in the host system.

To reduce image size, the build performs two steps:

1. Searches the `pip freeze` output for `nvidia-*` packages and purges them:

   ```bash
   pip freeze | awk -F= 'tolower($1) ~ /^nvidia-/ {print $1}' | xargs -r pip uninstall -y
   ```

2. Installs lightweight, system-wide C++ shared libraries instead:

   ```bash
   apt-get update && apt-get install -y --no-install-recommends libcusparselt0 libnccl2 libnccl-dev
   ```

This step typically shaves **several gigabytes** off the final GPU image layers while maintaining full PyTorch/TensorFlow execution functionality.
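
To see what the `awk` filter in the purge step selects before anything is uninstalled, it can be run against canned input. The sketch below is illustrative only: `sample_freeze` and `list_nvidia_pkgs` are hypothetical helper names, and the package versions in the sample are made-up placeholders rather than real build output.

```bash
# Hypothetical sample standing in for `pip freeze` output (versions are placeholders).
sample_freeze() {
  cat <<'EOF'
numpy==1.0.0
nvidia-cuda-runtime-cu12==0.0.1
nvidia-cudnn-cu12==0.0.2
torch==2.0.0
NVIDIA-nccl-cu12==0.0.3
EOF
}

# Same filter as the purge step: split on "=", lowercase the package name,
# and print the original name when it starts with "nvidia-".
list_nvidia_pkgs() {
  awk -F= 'tolower($1) ~ /^nvidia-/ {print $1}'
}

# Dry run: print the names that would be handed to `pip uninstall -y`.
sample_freeze | list_nvidia_pkgs
```

In a real image, piping `pip freeze` into `list_nvidia_pkgs` without the `xargs` stage gives the same kind of preview, which can be worth reviewing before the uninstall is made part of a build.
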