This tutorial will show you how to use PyTorch to classify handwritten digits from the MNIST dataset.

The code is largely borrowed from this [`tutorial`](https://medium.com/@nutanbhogendrasharma/pytorch-convolutional-neural-network-with-mnist-dataset-4e8a4265e118), which explains the code in more detail than this notebook does.

The notebook also includes links to relevant documentation and math details.

You will need the following packages; `pip install` them if you don't have them already:

- `numpy`
- `matplotlib`
- `torch`
- `torchvision`


In [None]:
import torch
from torch import optim
from torchvision import datasets
import torch.nn as nn
from torch.autograd import Variable
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

[`ToTensor`](https://pytorch.org/vision/stable/generated/torchvision.transforms.ToTensor.html#torchvision.transforms.ToTensor) transforms an image into a tensor object that can be understood by PyTorch.

In [None]:
# Download the datasets

train_data = datasets.MNIST(
    root='data',
    train=True,                         
    transform=ToTensor(), 
    download=True,            
)

test_data = datasets.MNIST(
    root='data', 
    train=False, 
    transform=ToTensor()
)

In [None]:
# The MNIST dataset has 60,000 training images and 10,000 test images
print(train_data.targets.size())

In [None]:
# Check out some images in the dataset
# Change the index to see different images
plt.imshow(train_data.data[0], cmap='gray')
plt.show()

PyTorch's [`DataLoader`](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) serves the data in batches, which speeds up data retrieval, and reshuffles it every epoch, which helps reduce overfitting during training. Follow the link to learn more about the arguments.

In [None]:
loaders = {
    # shuffle the training data every epoch
    'train' : DataLoader(train_data,
                         batch_size=100,
                         shuffle=True,
                         num_workers=1),

    # the order of the test data does not affect evaluation
    'test'  : DataLoader(test_data,
                         batch_size=100,
                         shuffle=False,
                         num_workers=1),
}
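
To see what a `DataLoader` yields without downloading anything, the sketch below wraps random stand-in data in a `TensorDataset`. The names `fake_images`, `fake_labels`, and `fake_data` are made up for illustration; only the shapes mimic MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST-like shapes: 250 random "images" and labels
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
fake_data = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_data, batch_size=100, shuffle=True)

# Each iteration yields one batch of images and the matching labels
batch_sizes = []
for images, labels in loader:
    batch_sizes.append(images.shape[0])

print(len(loader))    # number of batches per epoch
print(batch_sizes)    # the last batch holds whatever is left over
```

With 250 examples and a batch size of 100, the last batch is smaller than the others. Passing `drop_last=True` would discard it instead.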

Next we will encounter the following PyTorch layers:

- [`Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html): The basic convolutional layer.
- [`ReLU`](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html): Applies the ReLU function element-wise. You can experiment with other activation functions, such as sigmoid, by swapping them in.
- [`MaxPool2d`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html): Pooling / summarizing layer that takes the maximum value within each kernel window.
- [`Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html): This is how PyTorch refers to a fully connected layer, like the ones in the network you saw in the video last week or earlier today in the playground.
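
As a rough illustration of these four layers, the sketch below passes one random MNIST-sized input through each of them in turn and prints the resulting shapes. The layer sizes here are chosen for the example only.

```python
import torch
import torch.nn as nn

# One random stand-in image: (batch, channels, height, width)
x = torch.rand(1, 1, 28, 28)

conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
relu = nn.ReLU()
pool = nn.MaxPool2d(kernel_size=2)
linear = nn.Linear(16 * 14 * 14, 10)

features = conv(x)                         # 16 feature maps, same height and width
activated = relu(features)                 # negative values are set to zero
pooled = pool(activated)                   # height and width are halved
scores = linear(pooled.view(1, -1))        # flatten, then map to 10 class scores

print(features.shape)
print(pooled.shape)
print(scores.shape)
```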


In [None]:
# You define all of your layers in a class
# Feel free to play with the hyperparameters
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # nn.Sequential chains the conv, ReLU, and MaxPool layers into a single module
        self.conv1 = nn.Sequential(         
            nn.Conv2d(
                in_channels=1,              
                out_channels=16,            
                kernel_size=5,              
                stride=1,                   
                padding=2,                  
            ),                              
            nn.ReLU(),                      
            nn.MaxPool2d(kernel_size=2),    
        )

        # The order of the arguments is the same as in conv1
        # Note that in_channels is now 16, corresponding to the number of out_channels
        # (filters) produced by conv1
        self.conv2 = nn.Sequential(         
            nn.Conv2d(16, 32, 5, 1, 2),     
            nn.ReLU(),                      
            nn.MaxPool2d(2),                
        )

        # Fully connected layer with 10 output classes
        # Where do these numbers come from? conv2 outputs 32 channels, and each
        # max pool halves the 28x28 image (28 -> 14 -> 7); see kernel_size, padding, and stride
        self.out = nn.Linear(32 * 7 * 7, 10)

    # The forward function defines how each input is processed by the network
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        # flatten the output of conv2 to (batch_size, 32 * 7 * 7)
        x = x.view(x.size(0), -1)       
        output = self.out(x)
        # also return the flattened features x for visualization
        return output, x    

cnn = CNN()
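
The `32 * 7 * 7` input size of the final layer can be checked by hand. For a convolution or pooling layer, the output size along one dimension is `floor((size + 2 * padding - kernel_size) / stride) + 1`. The helper `conv_output_size` below is a hypothetical name used only for this sketch, and it relies on `MaxPool2d` using a stride equal to its kernel size by default.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output size along one dimension of a conv or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28                                                        # MNIST images are 28x28
size = conv_output_size(size, kernel_size=5, stride=1, padding=2)  # conv1 keeps the size
size = conv_output_size(size, kernel_size=2, stride=2)             # first max pool halves it
size = conv_output_size(size, kernel_size=5, stride=1, padding=2)  # conv2 keeps the size
size = conv_output_size(size, kernel_size=2, stride=2)             # second max pool halves it

flattened = 32 * size * size   # 32 output channels from conv2
print(size, flattened)
```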

In [None]:
# Recall from the playground, a loss function helps us evaluate the model.
loss_func = nn.CrossEntropyLoss()   

[`CrossEntropyLoss`](https://pytorch.org/docs/stable/nn.html#loss-functions) is a commonly used loss function but it is far from the only one. Follow the link to read more about it and see what other loss functions are available.
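
One detail worth knowing is that `CrossEntropyLoss` expects raw scores (logits) and integer class labels, and applies the softmax itself. The sketch below, using made-up logits for a three-class problem, checks that it matches a log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up raw scores for 2 examples and 3 classes, with their true classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

loss = nn.CrossEntropyLoss()(logits, labels)

# The same value computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss.item(), manual.item())
```

This is why the network above ends with a plain `Linear` layer and no softmax.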

In [None]:
# Optimizers update the model's parameters to reduce the loss
# lr stands for learning rate, play around with it a bit
optimizer = optim.Adam(cnn.parameters(), lr=0.01)

Similar to the loss function, [`Adam`](https://pytorch.org/docs/stable/optim.html) is just one of many available optimizers.
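
Because optimizers share the same interface, swapping one for another is a one-line change. The sketch below minimizes a toy loss `w ** 2` with both `Adam` and `SGD`. The helper `minimize` and the starting value are made up for illustration and are not part of the model above.

```python
import torch
from torch import optim

def minimize(optimizer_class, steps=50, **kwargs):
    """Run a few optimization steps on the toy loss w ** 2 and return the final loss."""
    w = torch.tensor([3.0], requires_grad=True)
    optimizer = optimizer_class([w], **kwargs)
    for _ in range(steps):
        optimizer.zero_grad()       # clear the old gradient
        loss = (w ** 2).sum()
        loss.backward()             # compute the new gradient
        optimizer.step()            # update w
    return (w ** 2).sum().item()

adam_loss = minimize(optim.Adam, lr=0.1)
sgd_loss = minimize(optim.SGD, lr=0.1)

print(adam_loss, sgd_loss)          # both start from a loss of 9.0
```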

In [None]:
# Here we implement model training
# Adjust the num_epochs to your liking

num_epochs = 10
def train(num_epochs, cnn, loaders):
    
    cnn.train()
        
    # Train the model
    total_step = len(loaders['train'])
        
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(loaders['train']):
            
            # each iteration yields one batch of images and their labels
            # cnn returns (logits, flattened features); keep only the logits
            output = cnn(images)[0]
            loss = loss_func(output, labels)
            
            # clear gradients for this training step   
            optimizer.zero_grad()           
            
            # backpropagation, compute gradients 
            loss.backward()    
            # apply gradients             
            optimizer.step()                
            
            if (i+1) % 100 == 0:
                print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                       .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))
                       
train(num_epochs, cnn, loaders)

In [None]:
# Evaluate the model's accuracy on the test set.
def test():
    cnn.eval()
    # gradients are not needed for evaluation
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in loaders['test']:
            test_output, last_layer = cnn(images)
            # the predicted class is the index of the largest output score
            pred_y = torch.max(test_output, 1)[1]
            # accumulate over all batches, not just the last one
            correct += (pred_y == labels).sum().item()
            total += labels.size(0)

        accuracy = correct / total
        print('Test Accuracy of the model on the %d test images: %.3f' % (total, accuracy))


In [None]:
test()
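
Accuracy over the whole test set is the number of correct predictions divided by the number of examples, summed across all batches. The sketch below shows the bookkeeping on two made-up batches of scores for a two-class problem; the `batches` list is a stand-in for a real `DataLoader`.

```python
import torch

# Two made-up batches of (scores, true labels) for a two-class problem
batches = [
    (torch.tensor([[2.0, 0.1], [0.3, 1.5], [0.9, 0.2]]), torch.tensor([0, 1, 1])),
    (torch.tensor([[0.2, 0.7]]), torch.tensor([1])),
]

correct = 0
total = 0
for scores, labels in batches:
    predictions = scores.argmax(dim=1)                  # index of the largest score
    correct += (predictions == labels).sum().item()
    total += labels.size(0)

accuracy = correct / total
print(correct, total, accuracy)
```

Here the last batch alone is 100% correct, but the overall accuracy is lower, which is why the counts have to be accumulated across batches.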