kubernetes · ComradeProgrammer · Apr 1, 2024
diff --git a/site/content/en/docs/tutorials/nvidia.md b/site/content/en/docs/tutorials/nvidia.md
@@ -7,7 +7,7 @@ date: 2018-01-02
 
 ## Prerequisites
 
-- Linux
+- Linux or Windows with WSL2 installed
 - Latest NVIDIA GPU drivers
 - minikube v1.32.0-beta.0 or later (docker driver only)
 
@@ -31,6 +31,8 @@ date: 2018-01-02
 
 - Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) on your host machine
 
+
+
 - Configure Docker:
   ```shell
   sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
@@ -40,6 +42,41 @@ date: 2018-01-02
   minikube start --driver docker --container-runtime docker --gpus all
   ```
 {{% /tab %}}
+{{% tab Windows-WSL %}}
+## Using the docker driver on Windows (WSL2)
+
+- Ensure WSL2 is enabled and Docker Desktop for Windows is installed.
+
+- Ensure an NVIDIA driver is installed on the Windows host. If one is not installed, follow the [NVIDIA Driver Installation Guide](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html).
+
+**Note: Install the NVIDIA driver on Windows only. Do NOT install any Linux NVIDIA driver inside WSL2.**
+
+- After installing the Windows driver, check that it is visible inside WSL2 by running `nvidia-smi`. If `nvidia-smi` is not found in `PATH`, you may need to run `sudo cp /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi` and `sudo chmod ogu+x /usr/bin/nvidia-smi` inside WSL2.
+
+- Install the [CUDA Toolkit for WSL2](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local) inside WSL2. On the download page, select Linux as the target OS and WSL-Ubuntu as the distribution.
+
+- Check if `bpf_jit_harden` is set to `0` inside WSL2
+  ```shell
+  sudo sysctl net.core.bpf_jit_harden
+  ```
+  - If it's not `0`, run:
+    ```shell
+    echo "net.core.bpf_jit_harden=0" | sudo tee -a /etc/sysctl.conf
+    sudo sysctl -p
+    ```
+
+- Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) inside WSL2
+
+- Configure Docker inside WSL2:
+  ```shell
+  sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
+  ```
+- Start minikube inside WSL2:
+  ```shell
+  minikube start --driver docker --container-runtime docker --gpus all
+  ```
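+
+If minikube does not detect the GPU, the WSL2 prerequisites above can be re-checked in one pass. The script below is a minimal sketch, not a minikube tool: it assumes `bash` and `sudo` are available inside WSL2 and that `docker info` lists the runtimes registered by `nvidia-ctk`. Save it under any name (for example `wsl-gpu-check.sh`, a hypothetical name) and run it with `bash`; each check prints `ok` or a hint about which step to revisit.
+
+```shell
+#!/usr/bin/env bash
+# Sketch of a pre-flight check for GPU support inside WSL2 (not an official minikube tool).
+failures=0
+
+# 1. nvidia-smi is provided by the Windows driver under /usr/lib/wsl/lib.
+if command -v nvidia-smi >/dev/null 2>&1; then
+  echo "ok: nvidia-smi found at $(command -v nvidia-smi)"
+else
+  echo "check: nvidia-smi is not on PATH (try copying /usr/lib/wsl/lib/nvidia-smi to /usr/bin)"
+  failures=$((failures + 1))
+fi
+
+# 2. net.core.bpf_jit_harden should be 0 (reading it usually requires root).
+harden="$(sudo sysctl -n net.core.bpf_jit_harden 2>/dev/null || echo unknown)"
+if [ "$harden" = "0" ]; then
+  echo "ok: net.core.bpf_jit_harden=0"
+else
+  echo "check: net.core.bpf_jit_harden=$harden (expected 0)"
+  failures=$((failures + 1))
+fi
+
+# 3. After `nvidia-ctk runtime configure`, Docker should list an nvidia runtime.
+if docker info 2>/dev/null | grep -qi 'runtimes:.*nvidia'; then
+  echo "ok: Docker lists the nvidia runtime"
+else
+  echo "check: nvidia runtime not found in 'docker info' output"
+  failures=$((failures + 1))
+fi
+
+echo "$failures check(s) need attention"
+```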
+{{% /tab %}}
+
 {{% tab none %}}
 ## Using the 'none' driver
 
@@ -145,19 +182,150 @@ Also:
 - nvidia-docker [doesn't support
   macOS](https://github.com/NVIDIA/nvidia-docker/issues/101) either.
 
-## Why does minikube not support NVIDIA GPUs on Windows?
-
-minikube supports Windows host through Hyper-V or VirtualBox.
-
-- VirtualBox doesn't support PCI passthrough for [Windows
-  host](https://www.virtualbox.org/manual/ch09.html#pcipassthrough).
 
-- Hyper-V supports DDA (discrete device assignment) but [only for Windows Server
-  2016](https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment)
+## Hands-on example: training an ML model in a Pod on a minikube cluster
+Here is a simple example program from the [PyTorch website](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) that trains a model on the FashionMNIST dataset. Try it to confirm that minikube GPU support works.
+
+```python
+import torch
+from torch import nn
+from torch.utils.data import DataLoader
+from torchvision import datasets
+from torchvision.transforms import ToTensor
+
+# Download training data from open datasets.
+training_data = datasets.FashionMNIST(
+    root="data",
+    train=True,
+    download=True,
+    transform=ToTensor(),
+)
+
+# Download test data from open datasets.
+test_data = datasets.FashionMNIST(
+    root="data",
+    train=False,
+    download=True,
+    transform=ToTensor(),
+)
+
+batch_size = 64
+
+# Create data loaders.
+train_dataloader = DataLoader(training_data, batch_size=batch_size)
+test_dataloader = DataLoader(test_data, batch_size=batch_size)
+
+for X, y in test_dataloader:
+    print(f"Shape of X [N, C, H, W]: {X.shape}")
+    print(f"Shape of y: {y.shape} {y.dtype}")
+    break
+
+# Get cpu, gpu or mps device for training.
+device = (
+    "cuda"
+    if torch.cuda.is_available()
+    else "mps"
+    if torch.backends.mps.is_available()
+    else "cpu"
+)
+print(f"Using {device} device")
+
+# Define model
+class NeuralNetwork(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.flatten = nn.Flatten()
+        self.linear_relu_stack = nn.Sequential(
+            nn.Linear(28*28, 512),
+            nn.ReLU(),
+            nn.Linear(512, 512),
+            nn.ReLU(),
+            nn.Linear(512, 10)
+        )
+
+    def forward(self, x):
+        x = self.flatten(x)
+        logits = self.linear_relu_stack(x)
+        return logits
+
+model = NeuralNetwork().to(device)
+print(model)
+
+loss_fn = nn.CrossEntropyLoss()
+optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
+
+def train(dataloader, model, loss_fn, optimizer):
+    size = len(dataloader.dataset)
+    model.train()
+    for batch, (X, y) in enumerate(dataloader):
+        X, y = X.to(device), y.to(device)
+
+        # Compute prediction error
+        pred = model(X)
+        loss = loss_fn(pred, y)
+
+        # Backpropagation
+        loss.backward()
+        optimizer.step()
+        optimizer.zero_grad()
+
+        if batch % 100 == 0:
+            loss, current = loss.item(), (batch + 1) * len(X)
+            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
+
+def test(dataloader, model, loss_fn):
+    size = len(dataloader.dataset)
+    num_batches = len(dataloader)
+    model.eval()
+    test_loss, correct = 0, 0
+    with torch.no_grad():
+        for X, y in dataloader:
+            X, y = X.to(device), y.to(device)
+            pred = model(X)
+            test_loss += loss_fn(pred, y).item()
+            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
+    test_loss /= num_batches
+    correct /= size
+    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
+
+epochs = 5
+for t in range(epochs):
+    print(f"Epoch {t+1}\n-------------------------------")
+    train(train_dataloader, model, loss_fn, optimizer)
+    test(test_dataloader, model, loss_fn)
+print("Done!")
+torch.save(model.state_dict(), "model.pth")
+print("Saved PyTorch Model State to model.pth")
+```
+
+Start minikube with GPU support:
+```shell
+minikube start --driver docker --container-runtime docker --gpus all
+```
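+
+Before creating a Pod, you may want to confirm that the node advertises GPUs to Kubernetes. This sketch assumes that the NVIDIA device plugin minikube enables for `--gpus all` registers the `nvidia.com/gpu` resource and runs in the `kube-system` namespace; a `<none>` value in the GPU column means no GPU has been registered on that node.
+
+```shell
+# Show how many nvidia.com/gpu units each node can allocate.
+# The dot in the resource name is escaped so that kubectl treats "nvidia.com/gpu" as a single key.
+columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
+kubectl get nodes -o "custom-columns=$columns"
+
+# The device plugin Pods are assumed to run in the kube-system namespace.
+kubectl get pods -n kube-system | grep -i nvidia
+```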
+
+Create a Pod from the `pytorch/pytorch` image, which has all the necessary libraries installed, and open a shell in it:
+```shell
+kubectl run torch --image=pytorch/pytorch -it -- /bin/bash
+```
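+
+The `kubectl run` command above does not request a GPU through Kubernetes resource limits. As an alternative, the sketch below creates a Pod that asks the scheduler for one GPU via `nvidia.com/gpu`, assuming the device plugin advertises that resource. The file name `torch-gpu.yaml` and the `sleep infinity` keep-alive command are illustrative choices. Use it instead of `kubectl run`, not in addition to it, since both create a Pod named `torch`.
+
+```shell
+# Write a Pod manifest that requests one GPU from the NVIDIA device plugin.
+cat > torch-gpu.yaml <<'EOF'
+apiVersion: v1
+kind: Pod
+metadata:
+  name: torch
+spec:
+  restartPolicy: Never
+  containers:
+    - name: torch
+      image: pytorch/pytorch
+      # Keep the container running so that you can exec into it.
+      command: ["sleep", "infinity"]
+      resources:
+        limits:
+          nvidia.com/gpu: 1
+EOF
+
+kubectl apply -f torch-gpu.yaml
+# The pytorch/pytorch image is large, so allow time for the pull.
+kubectl wait --for=condition=Ready pod/torch --timeout=300s
+kubectl exec -it torch -- /bin/bash
+```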
+
+Now copy the script into the Pod and run it with `python3`. The output should include `Using cuda device`, which shows that the model is trained with NVIDIA GPU and CUDA acceleration.
+
+```
+... ...
+Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
+Shape of y: torch.Size([64]) torch.int64
+Using cuda device
+NeuralNetwork(
+  (flatten): Flatten(start_dim=1, end_dim=-1)
+  (linear_relu_stack): Sequential(
+    (0): Linear(in_features=784, out_features=512, bias=True)
+    (1): ReLU()
+    (2): Linear(in_features=512, out_features=512, bias=True)
+    (3): ReLU()
+    (4): Linear(in_features=512, out_features=10, bias=True)
+  )
+)
+Epoch 1
+... ...
+```
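+
+The copy-and-run step can be done from a second terminal (on the Linux host or inside WSL2) while the `torch` Pod is running. This sketch assumes the PyTorch example above was saved locally as `train.py` (a hypothetical file name) and uses `/tmp` inside the container as an arbitrary destination. Note that `kubectl cp` relies on `tar` being available in the container image.
+
+```shell
+SCRIPT=train.py   # hypothetical local file containing the PyTorch example
+POD=torch         # the Pod created earlier in this section
+
+# Quick sanity check: should print True followed by the number of visible GPUs.
+kubectl exec "$POD" -- python3 -c 'import torch; print(torch.cuda.is_available(), torch.cuda.device_count())'
+
+# Copy the script into the Pod and run it there.
+kubectl cp "$SCRIPT" "$POD:/tmp/$SCRIPT"
+kubectl exec -it "$POD" -- python3 "/tmp/$SCRIPT"
+```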
 
-Since the only possibility of supporting GPUs on minikube on Windows is on a
-server OS where users don't usually run minikube, we haven't invested time in
-trying to support NVIDIA GPUs on minikube on Windows.
 
-Also, nvidia-docker [doesn't support
-Windows](https://github.com/NVIDIA/nvidia-docker/issues/197) either.