# GPU & CUDA Integration

This module details how NVIDIA GPU drivers, CUDA toolkits, and machine learning frameworks (PyTorch, TensorFlow, PaddlePaddle) are integrated into the LabNow Docker ecosystem.

---

## 1. CUDA Wrappers and Images (`docker_cuda`)

GPU compatibility is achieved by wrapping official NVIDIA CUDA development images with LabNow customizations.

### Wrap Hierarchy
Because NVIDIA images start from raw OS configurations, the build system wraps CUDA base images in a multi-step pipeline:
1. **Atom Wrap**: `docker_atom/atom.Dockerfile` is built using the NVIDIA base image (e.g. `nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04`) passed via `BASE_IMG`. This yields an `nvidia-cuda` atom image.
2. **Base Wrap**: `docker_base/base.Dockerfile` is built on top of the `nvidia-cuda` atom image to add Conda, Python, and base tools.
3. **CUDA Finalize**: `docker_cuda/nvidia-cuda.Dockerfile` inherits the base-wrapped image, configures `NVIDIA_DISABLE_REQUIRE=1`, updates debpython path configurations, compiles and installs GPU monitoring utilities, and cleans up.
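
The three stages above can be sketched as a dry run that only prints the `docker build` invocations. The Dockerfile paths and the `BASE_IMG` argument for the atom stage come from the description above; the image tags and the use of `BASE_IMG` for the second and third stages are assumptions made for illustration, and `plan_cuda_wrap` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the three-stage wrap: prints commands, builds nothing.
# Assumptions (not taken from the build scripts): the image tags below, and
# that the base and CUDA Dockerfiles also accept BASE_IMG as a build argument.
plan_cuda_wrap() {
  nvidia_img="$1"                        # upstream NVIDIA image
  atom_img="example/nvidia-cuda-atom"    # hypothetical tag
  base_img="example/nvidia-cuda-base"    # hypothetical tag
  final_img="example/nvidia-cuda"        # hypothetical tag

  # Each stage uses the previous stage's image as its base.
  echo "docker build -f docker_atom/atom.Dockerfile --build-arg BASE_IMG=${nvidia_img} -t ${atom_img} ."
  echo "docker build -f docker_base/base.Dockerfile --build-arg BASE_IMG=${atom_img} -t ${base_img} ."
  echo "docker build -f docker_cuda/nvidia-cuda.Dockerfile --build-arg BASE_IMG=${base_img} -t ${final_img} ."
}

plan_cuda_wrap "nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04"
```

Printing the commands instead of running them makes the chaining order easy to inspect before committing to a long GPU image build.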

### GPU Monitoring (`setup_nvtop`)
- Downloads and builds **`nvtop`** from source to display NVIDIA, AMD, and Intel GPU status.
- Requires CMake >= 3.18. If the base OS has an older CMake, it temporarily adds the Kitware APT repository during build.
- Compiles `nvtop` against the NVML libraries, then removes the build dependencies after installation to keep the layer small.

---

## 2. Machine Learning Framework Installation

When `ARG_PROFILE_PYTHON` is populated with `torch`, `tf2`, or `paddle`, the core docker installation hook runs specialized setup procedures to configure CUDA acceleration.

### CUDA Version & Device Index Auto-Detection
The build script checks whether the CUDA compiler (`nvcc`) is present and derives the wheel index suffix from the result:
- Evaluates `$CUDA_VERSION` to generate a shortened string `$CUDA_VER` (e.g., `12.1` -> `121`).
- Sets `$IDX` to `cu${CUDA_VER}` (e.g. `cu121`) if a GPU compiler is present, else defaults to `cpu`.
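
A minimal sketch of this detection logic follows. The helper name `cuda_index` and its second argument (a `1`/`0` flag for whether `nvcc` was found) are hypothetical, introduced so the logic can be exercised without a GPU toolchain. Keeping only the major and minor components of a three-part version such as `12.8.1` is also an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: derive the wheel index suffix from a CUDA version string.
# cuda_index VERSION HAS_NVCC  ->  prints "cuXYZ" or "cpu"
cuda_index() {
  cuda_version="$1"   # e.g. "12.1" or "12.8.1"; may be empty
  has_nvcc="$2"       # "1" if nvcc is available, anything else otherwise

  if [ "$has_nvcc" != "1" ] || [ -z "$cuda_version" ]; then
    echo "cpu"
    return 0
  fi

  # Keep major.minor only, then drop the dot: 12.1 -> 121
  cuda_ver=$(printf '%s\n' "$cuda_version" | cut -d. -f1,2 | tr -d '.')
  echo "cu${cuda_ver}"
}

# Typical call: pass 1 when nvcc is on PATH, 0 otherwise.
if command -v nvcc >/dev/null 2>&1; then has_nvcc=1; else has_nvcc=0; fi
IDX="$(cuda_index "${CUDA_VERSION:-}" "$has_nvcc")"
echo "IDX=${IDX}"
```

Passing the `nvcc` result in as a flag keeps the function pure, so the same code path can be checked on a CPU-only machine.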

### PyTorch Setup (`torch` Profile)
- Selects the PyTorch major version from the CUDA version: PyTorch 1.x if CUDA is older than `11.7`, otherwise PyTorch 2.x.
- Runs `pip install` targeting the official PyTorch wheel index:
  ```bash
  pip install --no-cache-dir --root-user-action=ignore -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/${IDX}
  ```
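
The version threshold described above could be expressed as below. This is a sketch, not the build script's actual code: `version_lt` and `torch_major_for_cuda` are hypothetical helper names, and the comparison relies on `sort -V`, which assumes GNU coreutils or a compatible `sort`.

```bash
#!/usr/bin/env bash
# version_lt A B: succeeds when version A sorts strictly before version B.
# Assumes a sort implementation that supports -V (version sort).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# torch_major_for_cuda CUDA_VERSION: prints 1 for CUDA < 11.7, else 2.
torch_major_for_cuda() {
  if version_lt "$1" "11.7"; then
    echo 1
  else
    echo 2
  fi
}

torch_major_for_cuda "11.6"
torch_major_for_cuda "12.1"
```

A version sort is used instead of a numeric comparison so that `11.10` is treated as newer than `11.7`.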

### TensorFlow Setup (`tf` Profile)
- Installs `tensorflow-gpu` (v1) for the `tf1` profile or `tensorflow` (CPU/v2) for the `tf2` profile.

### PaddlePaddle Setup (`paddle` Profile)
- Installs `paddlepaddle-gpu` if `nvcc` is present, otherwise `paddlepaddle`.
- Uses the official package index at `https://www.paddlepaddle.org.cn/packages/stable/${IDX}/`.
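
As a rough illustration, the package choice and index URL can be combined in a small dry-run helper. `paddle_install_cmd` is a hypothetical name, and the sketch keys off `$IDX` rather than probing `nvcc` directly, on the assumption that `IDX` is `cpu` exactly when no compiler was found. It prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch only: print the pip command for a given index suffix.
# paddle_install_cmd IDX   (IDX is "cpu" or a value such as "cu118")
paddle_install_cmd() {
  idx="$1"
  if [ "$idx" = "cpu" ]; then
    pkg="paddlepaddle"        # CPU-only build
  else
    pkg="paddlepaddle-gpu"    # CUDA build
  fi
  echo "pip install ${pkg} --index-url https://www.paddlepaddle.org.cn/packages/stable/${idx}/"
}

paddle_install_cmd "cpu"
paddle_install_cmd "cu118"
```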

---

## 3. NVIDIA Package Size Optimization (Crucial Step)

A major source of layer bloat in GPU images is the set of NVIDIA CUDA runtime wheels pulled in as pip dependencies (e.g. `nvidia-cuda-runtime-cu12`, `nvidia-cudnn-cu12`). These wheels duplicate files that are already present in the host system.

To drastically reduce image size, the build:
1. Searches the `pip freeze` output for `nvidia-*` packages and uninstalls them:
   ```bash
   pip freeze | awk -F= 'tolower($1) ~ /^nvidia-/ {print $1}' | xargs -r pip uninstall -y
   ```
2. Installs lightweight, system-wide C++ shared libraries instead:
   ```bash
   apt-get update && apt-get install -y --no-install-recommends libcusparselt0 libnccl2 libnccl-dev
   ```
This step typically shaves **several gigabytes** off the final GPU image layers while maintaining full PyTorch/TensorFlow execution functionality.
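
To see which packages the `awk` filter in step 1 would select, it can be run against sample input without touching a real environment. The package list and version numbers below are made up for illustration, and `filter_nvidia_packages` is a hypothetical wrapper around the same `awk` expression.

```bash
#!/usr/bin/env bash
# Same awk filter as the uninstall pipeline, wrapped so it reads from stdin.
# -F= splits "name==version" so that $1 is the package name.
filter_nvidia_packages() {
  awk -F= 'tolower($1) ~ /^nvidia-/ {print $1}'
}

# Illustrative pip-freeze-style input (versions are placeholders).
printf '%s\n' \
  'numpy==1.26.4' \
  'nvidia-cudnn-cu12==9.1.0.70' \
  'NVIDIA-nccl-cu12==2.21.5' \
  'torch==2.5.1' |
  filter_nvidia_packages
```

Only the two `nvidia-` entries are printed, regardless of letter case, because the name is lowercased before matching. In the real pipeline, `xargs -r` then skips `pip uninstall` entirely when the filter produces no output.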