
Commit

doc: update minikube gpu support on windows wsl2
ComradeProgrammer committed Apr 1, 2024
1 parent 86f5c14 commit e75bd6a
Showing 1 changed file with 151 additions and 14 deletions.
165 changes: 151 additions & 14 deletions site/content/en/docs/tutorials/nvidia.md
@@ -7,7 +7,7 @@ date: 2018-01-02

## Prerequisites

- Linux
- Linux or Windows with WSL2 installed
- Latest NVIDIA GPU drivers
- minikube v1.32.0-beta.0 or later (docker driver only)

@@ -19,6 +19,10 @@ date: 2018-01-02

- Ensure you have an NVIDIA driver installed. You can check by running `nvidia-smi`; if no driver is installed, follow the [NVIDIA Driver Installation Guide](https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html)

*Note: if you are using Windows with WSL2, install the driver only on Windows, and DO NOT install any Linux NVIDIA driver. After installing the Windows driver, you may also need to run `cp /usr/lib/wsl/lib/nvidia-smi /usr/bin/nvidia-smi` and `chmod ogu+x /usr/bin/nvidia-smi` in WSL2 (prefix both with `sudo` if you are not root), because otherwise `nvidia-smi` may not be found in PATH.*

- (Windows WSL2 users only) Install the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_local) inside WSL2. Note that you need to select Linux as the target OS and WSL-Ubuntu as the distribution.

- Check if `bpf_jit_harden` is set to `0`
```shell
sudo sysctl net.core.bpf_jit_harden
@@ -31,6 +35,8 @@ date: 2018-01-02

- Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) on your host machine



- Configure Docker:
```shell
sudo nvidia-ctk runtime configure --runtime=docker && sudo systemctl restart docker
@@ -145,19 +151,150 @@ Also:
- nvidia-docker [doesn't support
macOS](https://github.com/NVIDIA/nvidia-docker/issues/101) either.

## Why does minikube not support NVIDIA GPUs on Windows?

minikube supports Windows host through Hyper-V or VirtualBox.

- VirtualBox doesn't support PCI passthrough for [Windows
host](https://www.virtualbox.org/manual/ch09.html#pcipassthrough).

- Hyper-V supports DDA (discrete device assignment) but [only for Windows Server
2016](https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/plan/plan-for-deploying-devices-using-discrete-device-assignment)
## Hands-on: training an ML model in a Pod on a minikube Kubernetes cluster
Here is a simple example program from the [PyTorch website](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html), which trains a model on the Fashion-MNIST dataset. Try it to confirm that minikube GPU support actually works.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

# Get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f} [{current:>5d}/{size:>5d}]")

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")
torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")
```

Start minikube with GPU support:
```shell
minikube start --driver docker --container-runtime docker --gpus all
```

Create a pod using the `pytorch/pytorch` image, which has all the necessary libraries installed, and open a shell in this pod.
```shell
kubectl run torch --image=pytorch/pytorch -it -- /bin/bash
```

Now copy the file into the pod (for example with `kubectl cp`) and run it with `python3`. The output should show that the model is trained on the NVIDIA GPU with CUDA acceleration:

```
... ...
Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
Using cuda device
NeuralNetwork(
(flatten): Flatten(start_dim=1, end_dim=-1)
(linear_relu_stack): Sequential(
(0): Linear(in_features=784, out_features=512, bias=True)
(1): ReLU()
(2): Linear(in_features=512, out_features=512, bias=True)
(3): ReLU()
(4): Linear(in_features=512, out_features=10, bias=True)
)
)
Epoch 1
... ...
```
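
If the output reports `Using cpu device` instead, a much shorter check can help tell whether the pod sees the GPU at all, without downloading a dataset or training anything. The following is a minimal sketch, not part of the original tutorial: the function name `describe_gpu` is hypothetical, and the import is guarded so that the script still reports something in an image that does not ship PyTorch.

```python
import importlib.util


def describe_gpu() -> str:
    """Return a one-line summary of what PyTorch can see (hypothetical helper)."""
    # Guard the import so the check still runs where PyTorch is missing.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"

    import torch

    # False here means the container cannot see a usable CUDA device.
    if not torch.cuda.is_available():
        return "torch is installed, but no CUDA device is visible"

    # Index 0 is the first GPU exposed to the container.
    count = torch.cuda.device_count()
    name = torch.cuda.get_device_name(0)
    return f"CUDA is available: {count} device(s), first is {name}"


if __name__ == "__main__":
    print(describe_gpu())
```

If this reports that no CUDA device is visible, the problem most likely lies in the driver, container toolkit, or `minikube start` options rather than in the training script.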

Since the only possibility of supporting GPUs on minikube on Windows is on a
server OS where users don't usually run minikube, we haven't invested time in
trying to support NVIDIA GPUs on minikube on Windows.

Also, nvidia-docker [doesn't support
Windows](https://github.com/NVIDIA/nvidia-docker/issues/197) either.
