# Using a GPU with Nvidia CUDA

## Setup

If you're using Google Colaboratory:
* Go to Runtime > Change runtime type
* Set the Hardware Accelerator to GPU (not TPU, which uses a different process)

If you're creating your project locally:
* Make sure your GPU is in this list: https://developer.nvidia.com/cuda-gpus
* If it is, install PyTorch in your environment using the selector at https://pytorch.org/get-started/locally/. Make sure you select either CUDA 11.7 or CUDA 11.8 as the compute platform.
* Download and install the CUDA toolkit for your platform. It is recommended that you install the toolkit version that matches the CUDA version of the PyTorch build you installed.
    * For 11.7: https://developer.nvidia.com/cuda-11-7-0-download-archive
    * For 11.8: https://developer.nvidia.com/cuda-11-8-0-download-archive
    * Please note: if you're using Ubuntu under WSL, select Linux and then WSL-Ubuntu. A CUDA toolkit installed for Windows won't be detected by a PyTorch install inside WSL.

In [3]:
import torch
print(torch.cuda.is_available())

True


If you have CUDA configured correctly, the above command should print "True", and you're good to continue.
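If the check prints `False`, a few more attributes can help narrow down why. The sketch below is a hypothetical helper (`cuda_report` is my own name, not part of PyTorch) built only from standard calls: `torch.version.cuda` is the CUDA version the installed PyTorch build was compiled against (it is `None` for CPU-only builds), which is the version the toolkit recommendation above refers to.

```python
import torch

def cuda_report():
    """Hypothetical helper: collect basic facts about the CUDA setup (also runs on CPU-only machines)."""
    report = {
        "cuda_available": torch.cuda.is_available(),
        # CUDA version this PyTorch build was compiled against; None for CPU-only builds
        "torch_cuda_version": torch.version.cuda,
        # Number of GPUs PyTorch can see; 0 when CUDA is unavailable
        "device_count": torch.cuda.device_count(),
        "device_names": [],
    }
    if report["cuda_available"]:
        # Human-readable name of each visible GPU
        report["device_names"] = [
            torch.cuda.get_device_name(i) for i in range(report["device_count"])
        ]
    return report

for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

A `torch_cuda_version` of `None` usually means the CPU-only PyTorch build was installed, so reinstalling with a CUDA compute platform selected is the first thing to try.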

## Accelerating Training and Testing with CUDA

To accelerate training and testing, each of the following must be moved to your GPU:
* The instance of the model you're training/testing
* The inputs being passed into the model
* The targets (classes) being used for training and testing
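All three have to end up on the same device: if, say, the model is on the GPU but a batch of inputs is still on the CPU, PyTorch raises a `RuntimeError` about tensors being on different devices. A minimal sketch with a hypothetical toy model and random data (not the CNN used later), using the same device check as the next cell:

```python
import torch
from torch import nn

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hypothetical toy model and data, only to show where .to(DEVICE) goes
model = nn.Linear(4, 3).to(DEVICE)               # 1. the model instance
inputs = torch.randn(8, 4).to(DEVICE)            # 2. the inputs
targets = torch.randint(0, 3, (8,)).to(DEVICE)   # 3. the targets (class indices)

loss = nn.CrossEntropyLoss()(model(inputs), targets)
print(loss.device)  # same device as the model, inputs and targets

# Results computed on the GPU must come back to the CPU before CPU-only code
# (e.g. NumPy via .cpu().numpy()) can use them
with torch.no_grad():
    predicted_classes = model(inputs).argmax(1).cpu()
```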

In [6]:
# To write device-agnostic code (code that runs on any computer, whether or not it has a GPU), pick the device once:
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
# Now we can pass DEVICE to .to() for the model and for any tensor that should be moved to the GPU.
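The string works everywhere a device is expected, but PyTorch also has a `torch.device` object, and tensors can be created directly on the target device instead of being built on the CPU and copied. One detail worth knowing: `Tensor.to()` returns a tensor on the new device and leaves the original alone, so the result has to be assigned back, whereas `nn.Module.to()` moves the module in place (and also returns it). A small sketch; `weights` and `batch` are hypothetical names:

```python
import torch

# A torch.device object can be used anywhere the 'cuda'/'cpu' string is accepted
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create a tensor directly on the target device (no intermediate CPU copy)
weights = torch.zeros(3, 3, device=device)

# Tensor.to() returns a tensor on the target device; reassign to keep it
batch = torch.ones(2, 3)
batch = batch.to(device)

print(weights.device, batch.device)
```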

For demonstration, I'll use the CNN model and CIFAR-10 dataset from resources/models/image_classification.ipynb.

In [8]:
from torch import nn, optim
from torchvision import datasets, transforms

# Preparing the data (for a more detailed explanation, see resources/models/image_classification.ipynb)
transform = transforms.Compose([
    transforms.ToTensor(), # Convert image to tensor
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) # Normalize image
])
train_data = datasets.CIFAR10(root='./data/CIFAR10', train=True, download=True, transform=transform)
test_data = datasets.CIFAR10(root='./data/CIFAR10', train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False, num_workers=2)

# Defining the model
class CNN(nn.Module):
    def __init__(self, input_channels = 3, output_classes = 10): # Define the layers of the network
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(input_channels, 16, kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)
        self.fc1 = nn.Linear(32 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, output_classes)

    def forward(self, x): # Process an image through the network and output a prediction
        x = self.conv1(x) # Convolutional layer
        x = self.relu(x) # Activation function
        x = self.pool(x) # Pooling layer
        x = self.conv2(x) # Second convolutional layer
        x = self.relu(x) # Activation function
        x = self.pool(x) # Second pooling layer
        x = x.view(-1, 32 * 8 * 8) # Flatten the image
        x = self.fc1(x) # Fully connected layer
        x = self.relu(x) # Activation function
        x = self.fc2(x) # Second fully connected layer
        return x

Files already downloaded and verified
Files already downloaded and verified
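The `32 * 8 * 8` in `fc1` and in `x.view(...)` comes from CIFAR-10's 32x32 images being halved by each of the two pooling layers (32 -> 16 -> 8), with 32 channels coming out of `conv2`. One way to confirm numbers like this, and that the layers run on the chosen device, is to push a dummy image through the convolutional part. This sketch rebuilds that part as a hypothetical `nn.Sequential` called `features` rather than reusing the `CNN` class:

```python
import torch
from torch import nn

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Same convolution/pooling stack as CNN above, without the fully connected layers
features = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),   # 32x32 -> 32x32
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                  # 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),  # 16x16 -> 16x16
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                  # 16x16 -> 8x8
).to(DEVICE)

dummy = torch.randn(1, 3, 32, 32, device=DEVICE)  # one fake CIFAR-10-sized image
out = features(dummy)
print(out.shape)  # torch.Size([1, 32, 8, 8]), i.e. 32 * 8 * 8 = 2048 inputs to fc1
```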


In [18]:
# Defining hyperparameters
cnn_model = CNN() # Initialize the model
loss_fn = nn.CrossEntropyLoss() # Define the loss function
learning_rate = 1e-3 # Define the learning rate
optimizer = optim.SGD(cnn_model.parameters(), learning_rate) # Define the optimizer
epochs = 20 # Define the number of epochs
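In this cell the optimizer is built before the model is moved to `DEVICE` (that happens in the next cell). The `torch.optim` documentation recommends moving a model to the GPU before constructing its optimizer, so a slightly safer ordering is sketched below with a hypothetical stand-in model instead of `CNN()`:

```python
import torch
from torch import nn, optim

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hypothetical small model standing in for CNN()
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(DEVICE)  # move first...
optimizer = optim.SGD(model.parameters(), lr=1e-3)                          # ...then build the optimizer

# The optimizer now holds the parameters that actually live on DEVICE
first_param = optimizer.param_groups[0]['params'][0]
print(first_param.device)
```

For the notebook's own cells, this corresponds to writing `cnn_model = CNN().to(DEVICE)` before creating `optimizer`.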

Now, we'll train the model. This is where things start being moved to the GPU. 

In [19]:
# Move the model instance to the GPU if available; otherwise it stays on the CPU.
# (You could also move it when creating it: cnn_model = CNN().to(DEVICE))
cnn_model = cnn_model.to(DEVICE)

# A slightly modified training loop
def train(dataloader, model, loss_fn, optimizer):
  size = len(dataloader.dataset)
  model.train()
  for batch, (X, y) in enumerate(dataloader):
    X = X.to(DEVICE) # Move the data to the GPU if available (if the tensor is already on DEVICE, .to() returns it unchanged)
    y = y.to(DEVICE) # Move the training targets to the GPU if available
    prediction = model(X)
    loss = loss_fn(prediction, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    if batch % 100 == 0:
      loss, current = loss.item(), (batch + 1) * len(X)
      print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

# A slightly modified testing loop
def test(dataloader, model, loss_fn):
  model.eval()
  size = len(dataloader.dataset)
  num_batches = len(dataloader)
  test_loss, correct = 0, 0
  with torch.no_grad():
    for X, y in dataloader:
      X = X.to(DEVICE) # Move the data to the GPU if available
      y = y.to(DEVICE) # Move the testing targets to the GPU if available
      prediction = model(X)
      test_loss += loss_fn(prediction, y).item()
      correct += (prediction.argmax(1) == y).type(torch.float).sum().item()
  test_loss /= num_batches
  correct /= size
  print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
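Both loops copy every batch from CPU memory to the GPU. PyTorch has two real options aimed at that copy: `pin_memory=True` on the `DataLoader` (batches go into page-locked host memory) and `non_blocking=True` on `.to()` (the copy can overlap with other work when the source is pinned). A sketch with a hypothetical random dataset standing in for CIFAR-10; how much this helps depends on the hardware and isn't measured here:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hypothetical random data standing in for CIFAR-10 (256 images, 10 classes)
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))

# Pinned memory only helps when there is a GPU to copy to
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    pin_memory=torch.cuda.is_available())

seen = 0
for X, y in loader:
    # non_blocking=True allows an asynchronous copy from pinned memory;
    # on the CPU it behaves like a plain .to()
    X = X.to(DEVICE, non_blocking=True)
    y = y.to(DEVICE, non_blocking=True)
    # ... forward/backward pass as in train() ...
    seen += X.shape[0]
```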

With the model instance moved to the device and the modified training/testing functions, we can now train the model.

In [20]:
# Training and Testing the model

for t in range(epochs):
  print(f"Epoch {t+1}\n-------------------------------")
  train(train_loader, cnn_model, loss_fn, optimizer)
  test(test_loader, cnn_model, loss_fn)

Epoch 1
-------------------------------
loss: 2.298092  [   64/50000]
loss: 2.307607  [ 6464/50000]
loss: 2.305479  [12864/50000]
loss: 2.307073  [19264/50000]
loss: 2.297185  [25664/50000]
loss: 2.306237  [32064/50000]
loss: 2.309997  [38464/50000]
loss: 2.295903  [44864/50000]
Test Error: 
 Accuracy: 13.0%, Avg loss: 2.294235 

Epoch 2
-------------------------------
loss: 2.298394  [   64/50000]
loss: 2.292838  [ 6464/50000]
loss: 2.287094  [12864/50000]
loss: 2.287765  [19264/50000]
loss: 2.295240  [25664/50000]
loss: 2.295973  [32064/50000]
loss: 2.272044  [38464/50000]
loss: 2.283824  [44864/50000]
Test Error: 
 Accuracy: 14.1%, Avg loss: 2.282372 

Epoch 3
-------------------------------
loss: 2.275762  [   64/50000]
loss: 2.279895  [ 6464/50000]
loss: 2.287581  [12864/50000]
loss: 2.271885  [19264/50000]
loss: 2.259082  [25664/50000]
loss: 2.265243  [32064/50000]
loss: 2.264558  [38464/50000]
loss: 2.284806  [44864/50000]
Test Error: 
 Accuracy: 17.2%, Avg loss: 2.262413 


Training and testing the model were much, much faster with CUDA than on the CPU.
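To put a number on the speedup on your own machine, you could time the same work with `DEVICE` set to `'cpu'` and then `'cuda'` (rebuilding the model and optimizer for each). CUDA kernels run asynchronously, so the timer below calls `torch.cuda.synchronize()` before reading the clock; without it the measurement could stop before the GPU has finished. `time_epoch` is a hypothetical helper, and the matrix multiply is only a stand-in workload:

```python
import time
import torch

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

def time_epoch(run_epoch, device=DEVICE):
    """Hypothetical helper: wall-clock seconds for one call of run_epoch()."""
    uses_cuda = torch.device(device).type == 'cuda'
    if uses_cuda:
        torch.cuda.synchronize()  # finish queued GPU work before starting the clock
    start = time.perf_counter()
    run_epoch()
    if uses_cuda:
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
    return time.perf_counter() - start

# Stand-in workload; in this notebook it could be something like
# lambda: train(train_loader, cnn_model, loss_fn, optimizer)
a = torch.randn(512, 512, device=DEVICE)
b = torch.randn(512, 512, device=DEVICE)
elapsed = time_epoch(lambda: a @ b)
print(f"{DEVICE}: {elapsed:.4f} s")
```

The first CUDA operations in a process usually include one-time initialization, so running the workload once before timing it gives a fairer comparison.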