<a href="https://colab.research.google.com/github/ccarpenterg/LearningPyTorch1.x/blob/master/01_getting_started_with_pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Getting Started with PyTorch: Training a NN on MNIST

This is a small series of notebooks in which I introduce PyTorch, Facebook's machine learning framework. We start by training an artificial neural network on the MNIST dataset. In this example we are doing image classification: we feed an image to the neural network and it outputs one score per class, from which we can obtain the probability that the image belongs to each class.

We demonstrate the use of a GPU for training and validation.

Let's start by importing torch, which is the main library, torchvision and numpy:

In [0]:
import numpy as np

import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

import matplotlib.pyplot as plt

import statistics

PyTorch is included by default in Colab notebooks, but it's a good idea to check the installed version:

In [0]:
print('PyTorch version:', torch.__version__)
print('Torchvision version:', torchvision.__version__)

PyTorch version: 1.1.0
Torchvision version: 0.3.0


### Simple Neural Network

We will start with the basic example of a shallow NN: an input layer, a hidden layer and an output layer. We'll use dropout to reduce overfitting.

Each MNIST training example consists of a 28x28-pixel grayscale image (1 channel) that is flattened into a 784-element vector. The input layer has 784 neurons, and we have a hidden layer of 128 neurons. The output layer has 1 neuron for each of the classes, in this case 10 neurons (one per digit: 0, 1, 2, 3, etc.).
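As a quick check on these sizes, the number of trainable parameters can be worked out by hand. This is a minimal sketch in plain Python. The helper name `linear_params` is ours, not part of PyTorch, and it assumes each fully connected layer holds one weight per input-output pair plus one bias per output neuron.

```python
def linear_params(in_features, out_features):
    # one weight per (input, output) pair plus one bias per output neuron
    return in_features * out_features + out_features

input_size, hidden_size, num_classes = 28 * 28, 128, 10

hidden_layer = linear_params(input_size, hidden_size)   # 784*128 + 128
output_layer = linear_params(hidden_size, num_classes)  # 128*10 + 10

print(hidden_layer, output_layer, hidden_layer + output_layer)
# 100480 1290 101770
```

Almost all of the roughly 100,000 parameters sit in the first layer, which connects every pixel to every hidden neuron.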

To implement our neural network, we create the class **BasicNN** and inherit the methods and properties from the Module class (**nn.Module**):

In [0]:
class BasicNN(nn.Module):

    def __init__(self, input_size, hidden_size, num_classes):
        super(BasicNN, self).__init__()
        self.input_size = input_size
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, num_classes)
        self.drop = nn.Dropout(0.2)
        self.relu = nn.ReLU()

    def forward(self, x):
        # flatten each image into a vector of input_size elements
        x = x.reshape(-1, self.input_size)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.drop(x)
        # return raw class scores (logits); no softmax is applied here
        x = self.fc2(x)
        return x

We'll be using the GPU that is available in Colab notebooks, so we create a PyTorch device and send the model to that device:

In [0]:
cuda = torch.device('cuda')

model = BasicNN(28*28, 128, 10)
model.to(cuda)

BasicNN(
  (fc1): Linear(in_features=784, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
  (drop): Dropout(p=0.2)
  (relu): ReLU()
)
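The cell above assumes a GPU is present, as it is in a Colab GPU runtime. Outside Colab, a common pattern is to fall back to the CPU when CUDA is not available. The sketch below is an optional variation and not part of the original notebook; the name `device` and the stand-in `nn.Linear` layer are ours.

```python
import torch
import torch.nn as nn

# pick the GPU when CUDA is available, otherwise use the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# a small stand-in layer; the same call works for any nn.Module
layer = nn.Linear(784, 10).to(device)

# the parameters now live on the chosen device
print(device.type, next(layer.parameters()).device.type)
```

If this pattern were adopted, every later `.to(cuda)` call in the notebook would use `device` instead.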

### MNIST

The MNIST dataset is included in the torchvision module and it's really straightforward to download. Before using MNIST we need to define a couple of transformations, which are functions applied to each image as it is loaded from the dataset:


In [0]:
train_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.1307], [0.3081])
])

valid_transform = train_transform

train_set = MNIST('./data/mnist', train=True, download=True, transform=train_transform)
valid_set = MNIST('./data/mnist', train=False, download=True, transform=valid_transform)
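To make the two transformations concrete: `ToTensor` rescales 8-bit pixel values from [0, 255] to [0.0, 1.0], and `Normalize` then computes `(x - mean) / std` for each channel. The plain-Python sketch below reproduces that arithmetic for single pixel values. The helper `normalize_pixel` is hypothetical and exists only for illustration; 0.1307 and 0.3081 are the commonly quoted mean and standard deviation of the MNIST training pixels.

```python
MEAN, STD = 0.1307, 0.3081

def normalize_pixel(raw):
    scaled = raw / 255.0          # what ToTensor does to an 8-bit pixel
    return (scaled - MEAN) / STD  # what Normalize does afterwards

for raw in (0, 128, 255):
    print(raw, round(normalize_pixel(raw), 4))
```

A black pixel ends up near -0.42 and a white pixel near 2.82, so the network sees inputs that are roughly centered around zero.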

Now let's take a look at the shape of our datasets; as you can see we have a training set with 60,000 images (28x28) and a validation/test set with 10,000 images:

In [0]:
print(train_set.data.shape)
print(valid_set.data.shape)

torch.Size([60000, 28, 28])
torch.Size([10000, 28, 28])


In [0]:
train_loader = DataLoader(train_set, batch_size=128, num_workers=0, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=512, num_workers=0, shuffle=False)
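The data loaders split each dataset into mini-batches. Because `drop_last` is left at its default of `False`, the final batch of each pass may be smaller than the others. The plain-Python sketch below works out how many batches to expect per epoch; the helper `batch_layout` is ours and is not a PyTorch function.

```python
import math

def batch_layout(num_examples, batch_size):
    # number of batches per pass, and the size of the final (possibly partial) batch
    num_batches = math.ceil(num_examples / batch_size)
    last_batch = num_examples - (num_batches - 1) * batch_size
    return num_batches, last_batch

print(batch_layout(60000, 128))  # (469, 96)
print(batch_layout(10000, 512))  # (20, 272)
```

So one training epoch performs 469 parameter updates, and one validation pass runs 20 forward passes.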

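Before training, let's check that the model runs end to end. We create a 28x28 tensor of random values on the GPU, pass it through the model, and confirm that the output has one row with 10 class scores: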

In [0]:
# named `sample` to avoid shadowing Python's built-in input()
sample = torch.randn(28, 28, device=cuda)
out = model(sample)
print(out.shape)

torch.Size([1, 10])
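The ten values returned by the model are raw scores (logits), not probabilities, because `BasicNN.forward` ends with a linear layer. When probabilities are needed, softmax can be applied along the class dimension. The sketch below uses a random tensor as a stand-in for a model output, so the numbers themselves are meaningless; only the shape and the normalization matter.

```python
import torch
import torch.nn.functional as F

# stand-in for a model output: 1 example, 10 class scores
logits = torch.randn(1, 10)

# softmax along dim=1 turns each row of scores into probabilities
probs = F.softmax(logits, dim=1)

print(probs.shape)               # torch.Size([1, 10])
print(float(probs.sum()))        # approximately 1.0
print(int(probs.argmax(dim=1)))  # index of the most likely class
```

Softmax preserves the ordering of the scores, so the predicted class is the same whether it is read from the logits or from the probabilities.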


### Training the Model

**Optimizer: Stochastic Gradient Descent**

Stochastic gradient descent is our choice for the optimizer. In PyTorch we set the learning rate (lr) and momentum:


In [0]:
# https://pytorch.org/docs/stable/optim.html#torch.optim.SGD
# Stochastic gradient descent
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
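For intuition, the update that SGD with momentum applies to each parameter can be written out by hand. The sketch below follows the form given in the PyTorch documentation for `optim.SGD` with default dampening and no weight decay: a velocity buffer accumulates gradients, and the parameter moves against it. The toy objective f(w) = w² and the helper `sgd_momentum_step` are our own choices for illustration.

```python
def sgd_momentum_step(w, velocity, grad, lr=0.1, momentum=0.9):
    # accumulate the gradient into the velocity buffer
    velocity = momentum * velocity + grad
    # move the parameter against the velocity
    w = w - lr * velocity
    return w, velocity

w, velocity = 5.0, 0.0
for step in range(400):
    grad = 2.0 * w  # derivative of f(w) = w**2
    w, velocity = sgd_momentum_step(w, velocity, grad)

print(w)  # very close to 0, the minimum of f
```

With momentum 0.9 the parameter overshoots and oscillates around the minimum before settling, which is the typical behavior of this optimizer on a simple bowl-shaped loss.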

**Train function**

The train function is fairly simple:

(i) get a batch of training examples and send them to the GPU (CUDA)

(ii) set all gradients to zero

(iii) forward propagate the batch through the NN and calculate the loss

(iv) backpropagate the loss through the NN and update its parameters (weights and biases)

In [0]:

def train(model, loss_fn, optimizer):

    # https://pytorch.org/docs/stable/nn.html#torch.nn.Module.train
    # set the module in training mode (enables dropout)
    model.train()

    train_batch_losses = []

    # train_loader is the global DataLoader defined above
    for batch, labels in train_loader:

        # send the training data to the GPU
        batch = batch.to(cuda)
        labels = labels.to(cuda)

        # set all gradients to zero
        optimizer.zero_grad()

        # forward propagation
        y_pred = model(batch)

        # calculate the loss
        loss = loss_fn(y_pred, labels)

        # backpropagation
        loss.backward()

        # update the parameters (weights and biases)
        optimizer.step()

        # store the loss as a plain Python float
        train_batch_losses.append(loss.item())

    # average the per-batch losses once the epoch is finished
    mean_loss = statistics.mean(train_batch_losses)

    return mean_loss

**Validation function**

Once a pass over the training data has updated the NN's parameters, we calculate the loss on the validation set (the accuracy is computed further below):

In [0]:
def validate(model, loss_fn, optimizer):

    # set the model in evaluation mode (disables dropout)
    model.eval()

    validation_batch_losses = []

    # disable gradient tracking; no backpropagation happens here
    # (optimizer is unused and kept only to mirror train's signature)
    with torch.no_grad():

        # valid_loader is the global DataLoader defined above
        for batch, labels in valid_loader:

            # send the validation data to the GPU
            batch = batch.to(cuda)
            labels = labels.to(cuda)

            # forward propagation
            labels_pred = model(batch)

            # calculate the loss
            loss = loss_fn(labels_pred, labels)

            validation_batch_losses.append(loss.item())

    # average the per-batch losses over the whole validation set
    mean_loss = statistics.mean(validation_batch_losses)

    return mean_loss
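The calls to `model.train()` and `model.eval()` matter here because of the dropout layer. In training mode, dropout zeroes each element with probability p and scales the survivors by 1/(1-p); in evaluation mode it passes its input through unchanged. A minimal sketch on a standalone `nn.Dropout` layer, using a tensor of ones as made-up input:

```python
import torch
import torch.nn as nn

drop = nn.Dropout(0.2)
x = torch.ones(1000)

# training mode: roughly 20% of the elements become 0, the rest become 1.25
drop.train()
train_out = drop(x)

# evaluation mode: dropout is a no-op
drop.eval()
eval_out = drop(x)

print(train_out.unique())
print(torch.equal(eval_out, x))  # True
```

Forgetting to call `model.eval()` before validation would leave dropout active and make the validation loss noisier and higher than it should be.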
            
    

**Training our Neural Network**

Now we are ready to train our neural network. We start by choosing the cross entropy function as our loss function, and defining a couple of variables to keep track of the losses:

In [0]:

loss_fn = nn.CrossEntropyLoss()


train_losses = []
valid_losses = []

for epoch in range(1, 1+10):
    
    print('Epoch number ', epoch)
    
    train_loss = train(model, loss_fn, optimizer)
    train_losses.append(train_loss)
    
    print('Training loss:', train_loss)
    
    valid_loss = validate(model, loss_fn, optimizer)
    valid_losses.append(valid_loss)
    
    print('Validation loss:', valid_loss)
    
    


Epoch number  1
Training loss: 0.32757653519987806
Validation loss: 0.17104689218103886
Epoch number  2
Training loss: 0.19310535363424053
Validation loss: 0.12585099386051296
Epoch number  3
Training loss: 0.16371712965894736
Validation loss: 0.139599384739995
Epoch number  4
Training loss: 0.15241221532718077
Validation loss: 0.1351794194430113
Epoch number  5
Training loss: 0.14145297177834934
Validation loss: 0.1391471903771162
Epoch number  6
Training loss: 0.12733066283754194
Validation loss: 0.1658850120380521
Epoch number  7
Training loss: 0.12171410629823645
Validation loss: 0.1515562731772661
Epoch number  8
Training loss: 0.11769230431442194
Validation loss: 0.16119218626990914
Epoch number  9
Training loss: 0.11373686957270351
Validation loss: 0.14944909736514092
Epoch number  10
Training loss: 0.10663571729382346
Validation loss: 0.15546929985284805
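For reference, `nn.CrossEntropyLoss` combines a log-softmax over the logits with the negative log-likelihood of the correct class. The plain-Python sketch below computes that quantity for a single example. The helper `cross_entropy` is ours and ignores batching and averaging.

```python
import math

def cross_entropy(logits, label):
    # log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # negative log-probability of the correct class
    return log_sum_exp - logits[label]

# all 10 classes scored equally: the loss is ln(10)
uniform = [0.0] * 10
print(round(cross_entropy(uniform, 3), 4))  # 2.3026

# a confident, correct prediction: the loss is close to 0
confident = [0.0] * 10
confident[3] = 10.0
print(cross_entropy(confident, 3) < 0.01)   # True
```

This gives a useful yardstick for the numbers above: a network that guesses uniformly over 10 digits has a loss of about 2.30, so the first-epoch training loss of about 0.33 is already far better than chance.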


In [0]:
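# count how many validation images are classified correctly
correct = 0
total = 0

# make sure dropout is disabled while measuring accuracy
model.eval()

with torch.no_grad():
    for batch, labels in valid_loader:
        batch = batch.to(cuda)
        labels = labels.to(cuda)

        labels_pred = model(batch)

        # the predicted class is the index of the highest score in each row
        _, predicted = torch.max(labels_pred, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

# %d truncates the percentage to a whole number
print('Accuracy for test set: %d %%' % (100 * correct / total))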


Accuracy for test set: 96 %
