<a href="https://colab.research.google.com/github/jonkrohn/DLTFpT/blob/master/notebooks/deep_net_in_pytorch_with_gpu.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Neural Network in PyTorch with GPU

In this notebook, we reduce the training time of the [Deep Net in PyTorch](https://github.com/jonkrohn/DLTFpT/blob/master/notebooks/deep_net_in_pytorch.ipynb) by enabling GPU training and mixed-precision training. Mixed precision uses lower-precision data types (e.g., `float16`) for some of the calculations, which can improve performance without significantly affecting model accuracy.
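To make the precision trade-off concrete, the following minimal sketch prints the numeric limits of `float32` and `float16` using `torch.finfo`. It is an illustration only and is not part of the original notebook.

```python
import torch

# Compare the numeric limits of the two floating-point types
# that mixed-precision training switches between.
for dtype in (torch.float32, torch.float16):
    info = torch.finfo(dtype)
    print(f"{dtype}: bits={info.bits}, max={info.max:.4g}, "
          f"smallest normal={info.tiny:.4g}, eps={info.eps:.4g}")

fp16 = torch.finfo(torch.float16)
fp32 = torch.finfo(torch.float32)
```

The much smaller range and coarser spacing of `float16` are why only some operations run in reduced precision and why gradients are scaled later in the notebook.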

**In Colab, be sure to change the runtime type (Runtime > Change runtime type) to use a GPU accelerator.**

#### Load dependencies

In [1]:
import torch
import torch.nn as nn

from torchvision.datasets import MNIST
from torchvision import transforms

from torchsummary import summary

import time

#### Load data

In [2]:
train = MNIST('data', train=True, transform=transforms.ToTensor(), download=True)
test = MNIST('data', train=False, transform=transforms.ToTensor())

#### Batch data

In [3]:
train_loader = torch.utils.data.DataLoader(train, batch_size=128) 
test_loader = torch.utils.data.DataLoader(test, batch_size=128) 
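As a quick arithmetic check on the batching above: MNIST has 60,000 training and 10,000 test images, and `DataLoader` keeps the final partial batch by default (`drop_last=False`), so the batch counts are ceilings. The variable names below are illustrative only.

```python
import math

batch_size = 128
n_train, n_test = 60_000, 10_000  # standard MNIST split sizes

# With drop_last=False (the DataLoader default), the last, smaller
# batch is kept, so the batch count is the ceiling of the division.
n_train_batches = math.ceil(n_train / batch_size)
n_test_batches = math.ceil(n_test / batch_size)

# Size of the final training batch:
last_train_batch = n_train - (n_train_batches - 1) * batch_size

print(n_train_batches, n_test_batches, last_train_batch)
```

These counts match the `len(train_loader)` and `len(test_loader)` values reported later in the notebook (469 and 79).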

#### Design neural network architecture

In [4]:
n_input = 784
n_dense_1 = 64
n_dense_2 = 64
n_dense_3 = 64
n_out = 10

In [5]:
model = nn.Sequential(
    
    # first hidden layer: 
    nn.Linear(n_input, n_dense_1), 
    nn.ReLU(), 
    nn.BatchNorm1d(n_dense_1),
    
    # second hidden layer: 
    nn.Linear(n_dense_1, n_dense_2), 
    nn.ReLU(), 
    nn.BatchNorm1d(n_dense_2),
    
    # third hidden layer: 
    nn.Linear(n_dense_2, n_dense_3), 
    nn.ReLU(), 
    nn.BatchNorm1d(n_dense_3),
    nn.Dropout(),  
    
    # output layer: 
    nn.Linear(n_dense_3, n_out) 
)

In [6]:
torch.cuda.is_available()

False

In [7]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
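A model and its inputs must live on the same device. The small sketch below (with a hypothetical `layer` and input `x`, not taken from the notebook) shows the same device-selection pattern applied to both a module and a tensor, falling back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn

# Same selection logic as the notebook: use CUDA if present, else CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(4, 2).to(device)   # moves the layer's parameters
x = torch.randn(3, 4).to(device)     # moves the input tensor
out = layer(x)                       # works because both are on `device`

print(next(layer.parameters()).device, out.device)
```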

In [8]:
summary(model, input_size=(n_input,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 64]          50,240
              ReLU-2                   [-1, 64]               0
       BatchNorm1d-3                   [-1, 64]             128
            Linear-4                   [-1, 64]           4,160
              ReLU-5                   [-1, 64]               0
       BatchNorm1d-6                   [-1, 64]             128
            Linear-7                   [-1, 64]           4,160
              ReLU-8                   [-1, 64]               0
       BatchNorm1d-9                   [-1, 64]             128
          Dropout-10                   [-1, 64]               0
           Linear-11                   [-1, 10]             650
Total params: 59,594
Trainable params: 59,594
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/ba
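The parameter counts in the summary can be reproduced by hand. This sketch assumes the standard counts: a dense layer has one weight per input-output pair plus one bias per output, and a batch-norm layer has one learnable scale and one learnable shift per feature. The helper names are illustrative.

```python
n_input, n_dense, n_out = 784, 64, 10

def linear_params(n_in, n_units):
    """Weights (n_in * n_units) plus biases (n_units)."""
    return n_in * n_units + n_units

def batch_norm_params(n_features):
    """Learnable scale (gamma) and shift (beta) per feature."""
    return 2 * n_features

layer_params = {
    "Linear-1": linear_params(n_input, n_dense),
    "BatchNorm1d-3": batch_norm_params(n_dense),
    "Linear-4": linear_params(n_dense, n_dense),
    "BatchNorm1d-6": batch_norm_params(n_dense),
    "Linear-7": linear_params(n_dense, n_dense),
    "BatchNorm1d-9": batch_norm_params(n_dense),
    "Linear-11": linear_params(n_dense, n_out),
}
total_params = sum(layer_params.values())
print(layer_params, total_params)
```

ReLU and dropout layers have no parameters, which is why they show `0` in the table.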

#### Configure training hyperparameters

In [9]:
cost_fxn = nn.CrossEntropyLoss() # includes softmax activation
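The comment that the loss "includes softmax activation" can be checked directly: `nn.CrossEntropyLoss` applied to raw logits gives the same value as a log-softmax followed by negative log-likelihood loss. The logits and labels below are made-up values for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw network outputs, no softmax applied
labels = torch.tensor([3, 0, 9, 1])    # hypothetical class labels

# Built-in loss applied directly to logits:
ce = nn.CrossEntropyLoss()(logits, labels)

# Equivalent two-step computation:
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(ce.item(), manual.item())
```

This is why the model's output layer is a plain `nn.Linear` with no softmax of its own.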

In [10]:
optimizer = torch.optim.Adam(model.parameters())

#### Train

In [11]:
def accuracy_pct(pred_y, true_y):
  _, prediction = torch.max(pred_y, 1) # torch.max(tensor, dim) returns (maximum values, indices); the indices along dim 1 are the predicted classes
  correct = (prediction == true_y).sum().item()
  return (correct / true_y.shape[0]) * 100.0
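A small worked example of `accuracy_pct`, using made-up scores for a two-class problem. The function is repeated so the snippet runs on its own.

```python
import torch

def accuracy_pct(pred_y, true_y):
    _, prediction = torch.max(pred_y, 1)  # index of the largest score per row
    correct = (prediction == true_y).sum().item()
    return (correct / true_y.shape[0]) * 100.0

# Four samples, two classes; the third sample is misclassified.
scores = torch.tensor([[0.1, 0.9],
                       [0.8, 0.2],
                       [0.3, 0.7],
                       [0.6, 0.4]])
labels = torch.tensor([1, 0, 0, 0])

acc = accuracy_pct(scores, labels)
print(acc)
```

Three of the four predictions match the labels, so the function returns 75.0.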

In [12]:
n_batches = len(train_loader)
n_batches

469

In [13]:
# for mixed-precision training: 
from torch.cuda.amp import GradScaler, autocast
scaler = GradScaler()
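To see why a gradient scaler is needed, the sketch below casts a tiny value to `float16` with and without scaling. It uses a scale of 2**16, which is `GradScaler`'s default initial scale; the gradient value itself is made up, and this is a simplified illustration of the idea rather than what `GradScaler` does internally.

```python
import torch

grad = torch.tensor(1e-8)   # a tiny float32 "gradient" (hypothetical value)
scale = 65536.0             # 2**16, GradScaler's default init_scale

# Cast straight to float16: the value is below the smallest
# representable float16 magnitude and underflows to zero.
unscaled_fp16 = grad.half()

# Scale first, then cast: the value survives in float16.
scaled_fp16 = (grad * scale).half()

# Unscale in float32 to recover (approximately) the original value.
recovered = scaled_fp16.float() / scale

print(unscaled_fp16.item(), scaled_fp16.item(), recovered.item())
```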


In [14]:
n_epochs = 10 

print('Training for {} epochs. \n'.format(n_epochs))

for epoch in range(n_epochs):

  start_time = time.time()
  
  avg_cost = 0.0
  avg_accuracy = 0.0
  
  for i, (X, y) in enumerate(train_loader): # enumerate() provides count of iterations  

    # move the input data and labels to the GPU:
    X, y = X.to(device), y.to(device)
    X_flat = X.view(X.shape[0], -1)
    
    # forward propagation, now inside autocast (automatic mixed precision): 
    with autocast():
      y_hat = model(X_flat)
      cost = cost_fxn(y_hat, y)

    avg_cost += cost.item() / n_batches # .item() extracts a Python float, so the autograd graph is not kept alive
    
    # backprop and optimization via gradient descent: 
    optimizer.zero_grad() # set gradients to zero; .backward() accumulates them in buffers
    scaler.scale(cost).backward() # scale the cost so that, with low precision, the gradients don't become so small that they vanish (underflow)
    scaler.step(optimizer)
    scaler.update() # checks if overflow (inf or NaN values) occurred in previous iteration; reduces scale factor if so
    
    # calculate accuracy metric:
    accuracy = accuracy_pct(y_hat, y)
    avg_accuracy += accuracy / n_batches
    
    if (i + 1) % 100 == 0:
      print('Step {}'.format(i + 1))

  end_time = time.time()
  time_delta = end_time - start_time
    
  print('Epoch {}/{} complete. Cost: {:.3f}, Accuracy: {:.1f}%, Time: {:.2f} seconds \n'
        .format(epoch + 1, n_epochs, avg_cost, avg_accuracy, time_delta)) 

print('Training complete.')

Training for 10 epochs. 



Step 100
Step 200
Step 300
Step 400
Epoch 1/10 complete. Cost: 0.381, Accuracy: 89.8%, Time: 4.25 seconds 

Step 100
Step 200
Step 300
Step 400
Epoch 2/10 complete. Cost: 0.155, Accuracy: 95.7%, Time: 4.29 seconds 

Step 100
Step 200
Step 300
Step 400
Epoch 3/10 complete. Cost: 0.116, Accuracy: 96.7%, Time: 4.22 seconds 

Step 100
Step 200
Step 300
Step 400
Epoch 4/10 complete. Cost: 0.092, Accuracy: 97.4%, Time: 4.09 seconds 

Step 100
Step 200
Step 300
Step 400
Epoch 5/10 complete. Cost: 0.080, Accuracy: 97.7%, Time: 4.08 seconds 

Step 100
Step 200
Step 300
Step 400
Epoch 6/10 complete. Cost: 0.067, Accuracy: 98.0%, Time: 3.97 seconds 

Step 100
Step 200
Step 300
Step 400
Epoch 7/10 complete. Cost: 0.060, Accuracy: 98.3%, Time: 3.97 seconds 

Step 100
Step 200
Step 300
Step 400
Epoch 8/10 complete. Cost: 0.051, Accuracy: 98.5%, Time: 4.00 seconds 

Step 100
Step 200
Step 300
Step 400
Epoch 9/10 complete. Cost: 0.046, Accuracy: 98.6%, Time: 3.99 seconds 

Step 100
Step 200
Step 300
S
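Recent PyTorch releases deprecate `torch.cuda.amp.autocast` in favor of the device-agnostic `torch.autocast`. The following is a minimal sketch, not the notebook's own code, of the same forward/scale/step/update pattern written so that it also runs on a CPU-only machine by disabling mixed precision there. The `toy_*` names and the random data are assumptions made for illustration.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"   # assumption: only use mixed precision on a GPU

torch.manual_seed(0)
toy_model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
toy_optimizer = torch.optim.Adam(toy_model.parameters())
toy_cost_fxn = nn.CrossEntropyLoss()

# With enabled=False, the scaler's methods become pass-throughs.
toy_scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

# One random stand-in batch shaped like flattened MNIST:
X = torch.rand(128, 784, device=device)
y = torch.randint(0, 10, (128,), device=device)

costs = []
for step in range(5):
    # Forward pass; runs in reduced precision only when use_amp is True.
    with torch.autocast(device_type=device.type, enabled=use_amp):
        cost = toy_cost_fxn(toy_model(X), y)

    toy_optimizer.zero_grad()
    toy_scaler.scale(cost).backward()   # scaled backward pass
    toy_scaler.step(toy_optimizer)      # unscales gradients, then optimizer step
    toy_scaler.update()                 # adjusts the scale factor for the next step
    costs.append(cost.item())

print(costs)
```

On newer PyTorch versions, `torch.amp.GradScaler` is the suggested replacement for `torch.cuda.amp.GradScaler`; the older name is kept here so the sketch matches the notebook's import.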

#### Test model

In [15]:
n_test_batches = len(test_loader)
n_test_batches

79

In [16]:
model.eval() # inference mode: disables dropout; batch norm uses its running statistics instead of per-batch statistics

with torch.no_grad(): # disables autograd, reducing memory consumption
  
  avg_test_cost = 0.0
  avg_test_acc = 0.0
  
  for X, y in test_loader:
    
    # move the input data and labels to the GPU:
    X, y = X.to(device), y.to(device)

    # make predictions: 
    X_flat = X.view(X.shape[0], -1)
    y_hat = model(X_flat)
    
    # calculate cost: 
    cost = cost_fxn(y_hat, y)
    avg_test_cost += cost / n_test_batches
    
    # calculate accuracy:
    test_accuracy = accuracy_pct(y_hat, y)
    avg_test_acc += test_accuracy / n_test_batches

print('Test cost: {:.3f}, Test accuracy: {:.1f}%'.format(avg_test_cost, avg_test_acc))

# model.train() # 'undoes' model.eval()

Test cost: 0.111, Test accuracy: 97.1%
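The effect of `model.eval()` on the two mode-dependent layer types can be seen in isolation. This is a small sketch with made-up inputs: dropout becomes the identity in evaluation mode, and a freshly created batch-norm layer in evaluation mode normalizes with its stored running statistics (initially mean 0 and variance 1) rather than with the statistics of the current batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Dropout: random zeroing in train mode, identity in eval mode.
x = torch.ones(1000, 8)
dropout = nn.Dropout()          # default p=0.5
dropout.train()
train_out = dropout(x)          # about half zeros; survivors scaled by 1/(1-p) = 2
dropout.eval()
eval_out = dropout(x)           # unchanged

# Batch norm: running statistics in eval mode, batch statistics in train mode.
inp = torch.randn(32, 8) * 5.0 + 3.0
bn = nn.BatchNorm1d(8)
bn.eval()
bn_eval_out = bn(inp)           # running mean 0, var 1: output is close to input
bn.train()
bn_train_out = bn(inp)          # normalized with this batch's mean and variance

print(train_out.unique(), bn_train_out.mean(dim=0).abs().max().item())
```

Calling `model.train()` afterwards restores the training behavior of both layer types.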
