# Workshop 3 - PyTorch Model Creation
DeepNeuron summer training 2020.

Create a model using PyTorch that acts as a classifier for the CIFAR-10 dataset.

**Before starting:**

1. **Don't edit this file, make a copy first:**
  * Click on File -> Save a copy in Drive

2. Also do the following:
  * Click on Runtime -> Change runtime type -> Make sure hardware accelerator is set to GPU


## Imports
Do all your imports here

In [None]:
import torch
from torch import nn
from torch import optim
from torchvision import datasets, transforms

from tqdm.notebook import tqdm

## Model Creation
Create your model here. The model stub has already been created; you will need to define the `__init__` and `forward()` methods for your class.

Your model will need:
1. Convolutional Layer (3 input channels, 7 output channels, kernel size = 5)
2. Max Pool (kernel size = 2, stride = 2)
3. Convolutional Layer (7 input channels, 14 output channels, kernel size = 5)
4. Fully Connected Layer (Figure out the input size, output size is 64)
5. Fully Connected Layer (input size is 64, output size is the number of classes)

Then, in your forward pass, the input should flow like so:
1. Convolutional Layer 1
2. ReLU
3. Max Pool
4. Convolutional Layer 2
5. ReLU
6. Max Pool (use the same max pool layer)
7. Flatten the input, so it can be passed through the fully connected layers
8. Fully Connected Layer 1
9. ReLU
10. Fully Connected Layer 2
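
To work out the input size of the first fully connected layer, it helps to track how each layer changes the spatial size of the image. The sketch below uses the standard output-size formula for convolution and pooling layers without dilation. The helper name `conv_output_size` and the example numbers are hypothetical and deliberately differ from the workshop model, so you still need to do the calculation for your own layers.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a Conv2d or MaxPool2d layer (assuming no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Hypothetical example: a 28x28 input through a 3x3 convolution,
# then a 2x2 max pool with stride 2.
size = 28
size = conv_output_size(size, kernel_size=3)            # 28 -> 26
size = conv_output_size(size, kernel_size=2, stride=2)  # 26 -> 13

channels = 8  # assumed number of output channels of the convolution
flattened_features = channels * size * size
print(flattened_features)  # 8 * 13 * 13 = 1352
```

The flattened feature count is the number of channels times the final height times the final width, and that is the value the first `Linear` layer expects as its number of inputs.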

### Hints

All layers are found in nn:
* Fully Connected Layer: `Linear(num inputs, num outputs)`
* Max Pooling: `MaxPool2d(kernel size, stride)`
* Convolutional Layer: `Conv2d(input channels, output channels, kernel size, stride)`
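* Define your layers in the `__init__` method of your model, after calling `super().__init__()`
* Flatten a PyTorch tensor while keeping the batch dimension using `.flatten(start_dim=1)`; a bare `.flatten()` also collapses the batch dimension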

The ReLU function can be found in `nn.functional.relu()`
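
As a quick illustration of the flatten and ReLU hints, the sketch below compares flattening every dimension with flattening everything except the batch dimension. The batch shape is a hypothetical stand-in, not the shape your model will produce.

```python
import torch
from torch.nn import functional as F

# Hypothetical batch: 4 images, 3 channels, 8x8 pixels.
batch = torch.randn(4, 3, 8, 8)

flat_all = batch.flatten()                    # collapses every dimension, including the batch
flat_per_sample = batch.flatten(start_dim=1)  # keeps the batch dimension

print(flat_all.shape)         # torch.Size([768])
print(flat_per_sample.shape)  # torch.Size([4, 192])

activated = F.relu(flat_per_sample)  # negative values become zero
```

A fully connected layer expects one row of features per sample, so the second form is the one to use inside `forward()`.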



In [None]:
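class MyCNN(nn.Module):
    def __init__(self, output_size):
        super().__init__()  # required before assigning any layers to self
        # TODO: define your layers here
        pass

    def forward(self, x):
        # TODO: implement the forward pass
        pass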

## Training Using Your Model
It's time to train your model!

We use a basic PyTorch training loop with the standard built-in datasets and dataloaders.

In [None]:
# Function for the training pass

def train(model, train_loader, loss_fn, optimizer, device):
    model.train() # puts the model in training mode
    running_loss = 0
    with tqdm(total=len(train_loader)) as pbar:
        for i, data in enumerate(train_loader, 0): # loops through training data
            inputs, labels = data # separate inputs and labels (outputs)
            inputs, labels = inputs.to(device), labels.to(device) # moves the data to the device (the GPU, if available)

            # forward + backward + optimize
            optimizer.zero_grad() # clear the gradients in model parameters
            outputs = model(inputs) # forward pass and get predictions
            loss = loss_fn(outputs, labels) # calculate loss
            loss.backward() # calculates the gradient of the loss w.r.t. every model parameter that has requires_grad=True
            optimizer.step() # updates every parameter with requires_grad=True using its gradient

            running_loss += loss.item() # accumulate the loss over the epoch to report later

            pbar.update(1) # increment our progress bar

    return running_loss/len(train_loader) # returns the average training loss per batch for the epoch

In [None]:
# Function for the validation pass

def validation(model, val_loader, loss_fn, device):
    model.eval() # puts the model in evaluation mode
    running_loss = 0
    total = 0
    correct = 0
    
    with torch.no_grad(): # save memory by not tracking gradients, which we don't need for validation
        with tqdm(total=len(val_loader)) as pbar:
            for images, labels in iter(val_loader):
                images, labels = images.to(device), labels.to(device) # move the data to the device (the GPU, if available)
                outputs = model(images) # passes the images to the model and gets an output: one raw score (logit) per class

                val_loss = loss_fn(outputs, labels) # calculates val_loss from model predictions and true labels
                running_loss += val_loss.item()
                _, predicted = torch.max(outputs, 1) # turns the per-class scores into class labels by taking the highest-scoring class
                total += labels.size(0) # sums the number of predictions
                correct += (predicted == labels).sum().item() # sums the number of correct predictions
        
                pbar.update(1)

        return running_loss/len(val_loader), correct/total # return average loss per batch, accuracy
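
To see how the accuracy bookkeeping in `validation()` works, here is a minimal sketch on hand-written values. The `outputs` and `labels` tensors are made up for illustration and do not come from a trained model.

```python
import torch

# Hypothetical model outputs (raw scores) for 3 samples and 4 classes.
outputs = torch.tensor([[0.1, 2.0, -1.0, 0.3],
                        [1.5, 0.2, 0.1, 0.0],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])

# torch.max along dim 1 returns (values, indices); the indices are the predicted classes.
_, predicted = torch.max(outputs, 1)

correct = (predicted == labels).sum().item()
accuracy = correct / labels.size(0)

print(predicted.tolist())  # [1, 0, 3]
print(correct)             # 2
```

The second sample is predicted as class 0 while its label is 2, so two of the three predictions count as correct.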

## Dataset
Set the paths for the dataset downloads

In [None]:
train_path = 'data/train'
valid_path = 'data/valid'

In [None]:
# Define transforms for the training and validation set
training_transforms = transforms.Compose([transforms.RandomRotation(30),
                                          transforms.RandomHorizontalFlip(),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.485, 0.456, 0.406], 
                                                               [0.229, 0.224, 0.225])])

validation_transforms = transforms.Compose([transforms.ToTensor(),
                                            transforms.Normalize([0.485, 0.456, 0.406], 
                                                                 [0.229, 0.224, 0.225])])
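
The mean and standard deviation values passed to `transforms.Normalize` above are the widely used ImageNet channel statistics. One possible experiment is to compute the statistics of your own data instead. The sketch below shows the idea on a random stand-in tensor rather than on CIFAR-10 itself, so the variable `images` and its shape are assumptions.

```python
import torch

torch.manual_seed(0)

# Stand-in for a stack of images already converted with ToTensor():
# shape (num_images, channels, height, width), values in [0, 1].
images = torch.rand(100, 3, 32, 32)

# One mean and one standard deviation per channel.
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))

# Normalize applies (x - mean) / std independently to each channel.
normalized = (images - mean[None, :, None, None]) / std[None, :, None, None]

print(mean.shape, std.shape)
print(normalized.mean(dim=(0, 2, 3)))  # close to 0 for every channel
```

After normalisation with statistics computed from the same data, each channel has a mean near 0 and a standard deviation near 1.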

In [None]:
training_dataset = datasets.CIFAR10(train_path, train=True, transform=training_transforms, download=True)
validation_dataset = datasets.CIFAR10(valid_path, train=False, transform=validation_transforms, download=True)

In [None]:
training_loader = torch.utils.data.DataLoader(training_dataset, batch_size=32, shuffle=True)
validation_loader = torch.utils.data.DataLoader(validation_dataset, batch_size=32, shuffle=False)

In [None]:
# Check what classes are in our dataset
print(training_dataset.classes)
print(validation_dataset.classes)

num_classes = len(training_dataset.classes)

## Model Instantiation

In [None]:
model = MyCNN(num_classes)

We always need to keep track of where our PyTorch tensors live, i.e. whether or not they are on the GPU.

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") # Determine whether a GPU is available
model.to(device) # send the model to the selected device (the GPU, if available)
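
A quick way to check where things live is to inspect the `.device` attribute. The sketch below uses a small `nn.Linear` layer as a stand-in for the workshop model; the names `layer` and `inputs` are illustrative only.

```python
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(4, 2).to(device)     # small stand-in for the workshop model
inputs = torch.randn(3, 4).to(device)  # tensors are not moved in place, so keep the returned value

print(next(layer.parameters()).device)
print(inputs.device)

outputs = layer(inputs)  # works because the layer and the inputs are on the same device
```

If the model and its inputs end up on different devices, the forward pass raises a runtime error, which is why the training and validation functions move every batch to `device` first.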

We now need to define our loss function and optimiser

In [None]:
loss_fn = nn.CrossEntropyLoss() # We use Cross Entropy Loss, as this is a classification task
optimizer = optim.Adam(model.parameters(), lr=0.001) # If in doubt, we use Adam as our optimiser
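
Note that `nn.CrossEntropyLoss` expects the raw outputs (logits) of the final layer, so the model should not apply a softmax itself. The sketch below checks this on made-up values by comparing the loss with an explicit log-softmax followed by negative log-likelihood; `logits` and `labels` are hypothetical.

```python
import torch
from torch import nn
from torch.nn import functional as F

torch.manual_seed(0)

logits = torch.randn(5, 10)          # hypothetical raw outputs: 5 samples, 10 classes
labels = torch.randint(0, 10, (5,))  # hypothetical integer class labels

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, labels)

# Equivalent computation: log-softmax followed by negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss, manual))  # True
```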

## Training Time!

In [None]:
total_epoch = 10 # Define how many epochs of training we want

# keep track of things we'd like to plot later
training_losses = []
validation_losses = []
accuracies = []

for epoch in range(total_epoch): # loops through number of epochs
  train_loss = train(model, training_loader, loss_fn, optimizer, device)  # train the model for one epoch
  val_loss, accuracy = validation(model, validation_loader, loss_fn, device) # after training for one epoch, run the validation() function to see how the model is doing on the validation dataset
  
  # keep track of interesting stuff
  training_losses.append(train_loss)
  validation_losses.append(val_loss)
  accuracies.append(accuracy)
  
  print("Epoch: {}/{}, Training Loss: {}, Val Loss: {}, Val Accuracy: {}".format(epoch+1, total_epoch, train_loss, val_loss, accuracy))
  print('-' * 20)

print("Finished Training")

# Save the trained model's weights (its state dict)
torch.save(model.state_dict(), 'finished')
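
To reuse the saved weights later, create a model with the same architecture and load the state dict into it. The sketch below is a minimal round trip that uses a small `nn.Linear` layer as a stand-in for `MyCNN` and a temporary directory instead of the notebook's working directory; both are assumptions made to keep the example self-contained.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for the trained workshop model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "finished")
    torch.save(model.state_dict(), path)

    restored = nn.Linear(4, 2)                  # must have the same architecture
    restored.load_state_dict(torch.load(path))  # copy the saved weights into it
    restored.eval()                             # switch to evaluation mode before inference

# Every parameter tensor should match the original exactly.
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)  # True
```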

We now want to visualise our results:

In [None]:
import matplotlib.pyplot as plt
plt.plot(training_losses, label="Training Losses")
plt.plot(validation_losses, label="Validation Losses")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.title("Loss vs Epoch")
plt.legend()
plt.show()

plt.plot(accuracies, label="Accuracy")
plt.xlabel("Epoch")
plt.ylabel("Accuracy")
plt.title("Accuracy vs Epoch")
plt.legend()
plt.show()

## Once you're done...
Even though the accuracy increases and the loss decreases, our model is not very good. This is normal. Try to improve your accuracy! Things you can change:
- Learning rate
- Number of epochs of training
- Batch size
- Different transforms
- Model structure (number of layers, convolutional layer properties, new layer types like Dropout)
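
If you try adding Dropout, keep in mind that it behaves differently in training and evaluation mode, which is one reason `train()` and `validation()` call `model.train()` and `model.eval()`. A minimal sketch on a made-up input, with `p=0.5` chosen only for illustration:

```python
import torch
from torch import nn

torch.manual_seed(0)

dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

dropout.train()
train_out = dropout(x)  # each value is either zeroed or scaled by 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)   # dropout is disabled, so the input passes through unchanged

print(train_out)
print(eval_out)
```

Which values are zeroed in training mode is random, so the first printout varies with the seed, while the second always equals the input.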

Ultimately, it takes a lot of experimentation and gut feeling to go from a basic training loop to an optimised model. There are tools to optimise these hyperparameters, but it's always useful to know a good place to start. As with most things, practice makes perfect.