<a href="https://colab.research.google.com/github/ccarpenterg/LearningPyTorch1.x/blob/master/02_introduction_to_convnets_with_pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Introduction to Convolutional Neural Networks with PyTorch

Similar to traditional Neural Networks, Convolutional Neural Networks are built using neurons but instead of only fully connected layers, convolutional networks also have convolutional layers.

In general, convnets consist of two parts: a convolutional base and a fully connected classifier. The convolutional base automatically extract the features that are subsequently feed to a dense classifier, which outputs the probabilities of an image to belong to a certain class.


So let's start by importing some standard modules and the MNIST dataset module. 


In [0]:
import numpy as np

import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

from torchvision.datasets import MNIST
from torch.utils.data import DataLoader

import statistics

PyTorch is installed in Colab by default, but it's always a good practice to check what version we'll be working with. 

In [2]:
print('PyTorch version:', torch.__version__)
print('Torchvision version:', torchvision.__version__)

PyTorch version: 1.3.1
Torchvision version: 0.4.2


## Convolutional and Pooling Layers

A convolutional layer using pyTorch:



```
torch.nn.Conv2d(num_in_channels, num_out_channels, kernel_size)
```

num_in_channels is the number of channels of the input tensor. If the previous layer is the input layer, num_in_channels is the number of channels of the image (3 channels for RGB images), otherwise num_in_channels is equal to the number of feature maps of the previous layer.

num_out_channels is the number of filters (feature extractor) that this layer will apply over the image or feature maps generated by the previous layer.

So for instance, if we have an RGB image and we are going to apply 32 filters of 3x3:



```
torch.nn.Conv2d(3, 32, 3)
```





## A Simple Convolutional Neural Network

In our convnet we'll use the next structure:

*input -> convolution -> pooling-> convolution -> pooling -> convolution* (convolutional base)

*fully connected -> fully connected -> output*


In [0]:
class BasicCNN(nn.Module):
    
    def __init__(self, num_channels, num_classes):
        super(BasicCNN, self).__init__()
        self.conv1 = nn.Conv2d(num_channels, 32, 3, stride=1, padding=0)
        self.conv2 = nn.Conv2d(32, 64, 3, stride=1, padding=0)
        self.conv3 = nn.Conv2d(64, 64, 3, stride=1, padding=0)
        self.pool1 = nn.MaxPool2d(2)
        self.pool2 = nn.MaxPool2d(2)
        self.fc1 = nn.Linear(3*3*64, 64, bias=True)
        self.fc2 = nn.Linear(64, 10)
        
    def forward(self, X):
        x = F.relu(self.conv1(X))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = F.relu(self.conv3(x))
        x = x.reshape(-1, 3*3*64)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

**Convolution #1**

32 kernels of 3x3; *Width/Height:* (28 - 3 + 2x0) / 1 + 1 = 26; *Output dimensions:* (32, 26, 26)

**Max Pooling #1**

filter size = 2, stride = 2; *Width/Height:* (26 - 2) / 2 + 1 = 13; *Output dimensions:* (32, 13, 13)

**Convolution #2**

64 kernels of 3x3; *Width/Height:* (13 - 3 + 2x0) / 1 + 1 = 11; *Output dimensions:* (64, 11, 11)

**Max Pooling #2**

filter size = 2, stride = 2; *Width/Height:* (11 - 2) / 2 + 1 = 5; *Output dimensions:* (64, 5, 5)

**Convolution #3**

64 kernels of 3x3; *Width/Height:* (5 - 3 + 2x0) / 1 + 1 = 3; *Output dimensions:* (64, 3, 3)

So at the end of the last convolutional layer we get a tensor of dimension (64, 3, 3). And since now we are going to feed it to out fully connected classifier, we need to convert it into a vector, and for that we use the reshape method:



```
x = x.reshape(-1, 3*3*64)
```

## Starting Up Our Model

We'll send the model to our GPU so we need to create a CUDA device and instantiate our model:

In [4]:
cuda = torch.device('cuda')

model = BasicCNN(1, 10)
model.to(cuda)

BasicCNN(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (conv3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1))
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=576, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=10, bias=True)
)

## MNIST Datatset

In [0]:
dataset_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.1307], [0.3081])
])

train_set = MNIST('./mnist', train=True, download=True, transform=dataset_transform)
valid_set  = MNIST('./mnist', train=False, download=True, transform=dataset_transform)


#let's check the size of our tensors
print(train_set.data.shape)
print(valid_set.data.shape)

In [0]:
train_loader = DataLoader(train_set, batch_size=128, num_workers=0, shuffle=True)
valid_loader = DataLoader(valid_set, batch_size=512, num_workers=0, shuffle=False)

We now create a dummy matrix x to simulate the input of a MNIST image, and check we get the right output in terms of dimensions:

In [7]:
# https://pytorch.org/docs/stable/nn.html#conv2d
# input: (N, C_in, H, W) -> N: batch, C_in: number of channels, H: height, W: width
x = torch.randn(128, 1, 28, 28, device=cuda)
output = model(x)
print(output.shape)

torch.Size([128, 10])


### Training the Model

**Optimizer: Stochastic Gradient Descent**




In [0]:
# https://pytorch.org/docs/stable/optim.html#torch.optim.SGD
# Stochastic gradient descent optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

**Train function**

In [0]:
def train(model, loss_fn, optimizer):
    
    #set the module in training mode
    model.train()
    
    train_batch_losses = []
    
    for batch, labels in train_loader:
        
        #send the training data to the GPU
        batch = batch.to(cuda)
        labels = labels.to(cuda)
        
        #set all gradients to zero
        optimizer.zero_grad()
        
        #forward propagate
        y_pred = model(batch)
        
        #calculate the loss
        loss = loss_fn(y_pred, labels)
        
        #bachpropagate
        loss.backward()
        
        #update the parameters (weights and biases)
        optimizer.step()
        
        train_batch_losses.append(float(loss))
        
        mean_loss = statistics.mean(train_batch_losses)
        
    return mean_loss

**Validation function**

In [0]:
def validate(model, loss_fn, optimizer):
    
    # set the model in evaluation mode
    model.eval()
    
    # save predictions for later
    pedrictions = []
    
    # stop tracking the parameters for backpropagation
    with torch.no_grad():
        
        validation_batch_losses = []
        
        for batch, labels in valid_loader:
            
            # send the validation data to GPU
            batch = batch.to(cuda)
            labels = labels.to(cuda)
            
            # forward propagate
            labels_pred = model(batch)
            
            # calculate loss
            loss = loss_fn(labels_pred, labels)
            
            validation_batch_losses.append(float(loss))
            
            mean_loss = statistics.mean(validation_batch_losses)
           
    return mean_loss 

**Accuracy function**



In [0]:
def accuracy(model, loader):
    correct = 0
    total = 0
    
    model.eval()
    
    with torch.no_grad():
        for batch, labels in loader:
            batch = batch.to(cuda)
            labels = labels.to(cuda)
            
            labels_pred = model(batch)
            
            _, predicted = torch.max(labels_pred.data, 1)
        
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            
            return (100 * correct / total)

**Training Statistics function**

In [0]:
def training_stats(train_loss, train_accuracy, val_loss, val_accuracy):
    print(('training loss: {:.3f} '
           'training accuracy: {:.2f}% || '
           'val. loss: {:.3f} '
           'val. accuracy: {:.2f}%').format(train_loss, train_accuracy,
                                            val_loss, val_accuracy))

**Training our Convolutional Neural Network**

Now it's time to train our brand new convolutional neural network. We'll use the cross entropy function as our loss function (in pytorch softmax is included in the cross entropy function).

In [13]:
loss_fn = nn.CrossEntropyLoss()


train_losses = []
valid_losses = []

for epoch in range(1, 1+10):
    
    print('Epoch number', epoch)
    
    train_loss = train(model, loss_fn, optimizer)
    train_losses.append(train_loss)
    train_accuracy = accuracy(model, train_loader)
    
    valid_loss = validate(model, loss_fn, optimizer)
    valid_losses.append(valid_loss)
    valid_accuracy = accuracy(model, valid_loader)

    training_stats(train_loss, train_accuracy, valid_loss, valid_accuracy)

Epoch number 1
training loss: 0.277 training accuracy: 100.00% || val. loss: 0.079 val. accuracy: 97.27%
Epoch number 2
training loss: 0.068 training accuracy: 99.22% || val. loss: 0.065 val. accuracy: 98.05%
Epoch number 3
training loss: 0.048 training accuracy: 99.22% || val. loss: 0.042 val. accuracy: 99.22%
Epoch number 4
training loss: 0.037 training accuracy: 98.44% || val. loss: 0.083 val. accuracy: 97.66%
Epoch number 5
training loss: 0.036 training accuracy: 99.22% || val. loss: 0.046 val. accuracy: 98.63%
Epoch number 6
training loss: 0.031 training accuracy: 99.22% || val. loss: 0.042 val. accuracy: 99.22%
Epoch number 7
training loss: 0.027 training accuracy: 99.22% || val. loss: 0.049 val. accuracy: 99.02%
Epoch number 8
training loss: 0.026 training accuracy: 99.22% || val. loss: 0.057 val. accuracy: 98.24%
Epoch number 9
training loss: 0.025 training accuracy: 98.44% || val. loss: 0.048 val. accuracy: 99.02%
Epoch number 10
training loss: 0.026 training accuracy: 100.00%