## Classification of digits using MNIST

1. Data pre-processing
2. Define Model
3. Define loss function, optimizers, hyperparameters
4. Define evaluation function
5. Write up training loop

In [3]:
import typing as t

from tqdm import tqdm
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision.datasets.mnist import MNIST
from torchvision.transforms import transforms

### 1. Data preprocessing

After retrieving the dataset, we need to do the following:

1. Preprocess the data
   - Normalize the features
   - Convert PIL images into PyTorch tensors
2. Divide the data into mini-batches and shuffle it using a `DataLoader`
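
To make the preprocessing step concrete, here is a minimal sketch of what "convert to tensor, then normalize" amounts to for a single grayscale image, written in plain PyTorch. The helper name `to_normalized_tensor` is hypothetical, and the mean and standard deviation are the commonly quoted MNIST statistics, assumed here rather than computed in this notebook.

```python
import torch

# Assumed MNIST statistics (commonly quoted values, not computed in this notebook)
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def to_normalized_tensor(image_uint8: torch.Tensor) -> torch.Tensor:
    """Mimic ToTensor followed by Normalize for one grayscale image.

    Args:
        image_uint8: A (height, width) uint8 tensor with values in [0, 255].
    Returns:
        A (1, height, width) float tensor standardized with the MNIST statistics.
    """
    # Scale to [0, 1] and add the channel dimension, as ToTensor does for PIL images
    scaled = image_uint8.to(torch.float32).div(255.0).unsqueeze(0)
    # Standardize: subtract the mean, then divide by the standard deviation
    return (scaled - MNIST_MEAN) / MNIST_STD


# Random stand-in for a 28x28 MNIST digit
fake_image = torch.randint(0, 256, (28, 28), dtype=torch.uint8)
normalized = to_normalized_tensor(fake_image)
print(normalized.shape)
```

With torchvision, the same effect should be obtainable by composing `transforms.ToTensor()` with `transforms.Normalize((0.1307,), (0.3081,))`.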

In [4]:

def get_mnist_dataloader(batch_size: int,
                         is_training_dataset: bool) -> DataLoader:
    """
    Retrieve the MNIST dataloader
    Args:
        batch_size: The batch size during training and evaluation
        is_training_dataset: Set to true to retrieve the training set. Otherwise, retrieve the test set.
    Returns:
        A dataloader object
    """
    # MNIST samples are loaded as PIL (Python Imaging Library) images
    mnist_dataset = MNIST(root='.', download=True, train=is_training_dataset, transform=transforms.Compose([
        # Convert each PIL image to a float tensor and scale its pixel values
        # from [0, 255] to [0, 1] (i.e. divide every pixel by 255)
        transforms.ToTensor()
    ]))
    
    return DataLoader(mnist_dataset,
                      # Size of each mini-batch
                      batch_size=batch_size,
                      # Reshuffle the samples every epoch during training;
                      # sample order does not matter for evaluation
                      shuffle=is_training_dataset,
                      # During training, drop the last incomplete batch so the batch size stays constant.
                      # During evaluation, keep it so that every sample is counted.
                      drop_last=is_training_dataset,
                      # Pinned (page-locked) host memory speeds up host-to-device copies
                      # (the device is usually a GPU in the real world).
                      # Ideally all data would live on the GPU, but most GPUs do not have enough
                      # memory to hold an entire dataset, so batches are moved from
                      # cpu -> gpu one at a time during training
                      pin_memory=True)

In [5]:
batch_size = 32
training_dataloader = get_mnist_dataloader(batch_size, is_training_dataset=True)
test_dataloader = get_mnist_dataloader(batch_size, is_training_dataset=False)

### 2. Define model

In this section, we will build a classification model. For this task, let's build an MLP (multi-layer perceptron)
made up of fully connected layers. To do so, subclass `nn.Module` and fill in the following skeleton.

```python
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # define layers
        
    def forward(self, input_images):
        # define forward logic
        pass
```

This is a great opportunity to experiment with PyTorch features. For starters, try building your model with `nn.Linear()`. For the activation function, `LeakyReLU` is a good start (with a negative slope of 0.1 to 0.2). To make your model non-linear, add the activation function after each linear layer except the last.
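
As one possible starting point, the sketch below stacks three `nn.Linear` layers with `LeakyReLU` in between and no activation after the last layer. The class name `MLP`, the hidden size of 128, and the choice to flatten inside `forward` are all assumptions made for illustration; they are not the notebook's reference solution.

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, input_size: int = 28 * 28, hidden_size: int = 128, num_classes: int = 10):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.LeakyReLU(negative_slope=0.1),
            nn.Linear(hidden_size, hidden_size),
            nn.LeakyReLU(negative_slope=0.1),
            # No activation after the final layer: it outputs raw class scores (logits)
            nn.Linear(hidden_size, num_classes),
        )

    def forward(self, input_images: torch.Tensor) -> torch.Tensor:
        # Flatten (batch, 1, 28, 28) into (batch, 784); already-flat input passes through unchanged
        return self.layers(input_images.flatten(start_dim=1))


model = MLP()
logits = model(torch.rand(32, 1, 28, 28))
print(logits.shape)
```

Because `flatten(start_dim=1)` leaves a `(batch, 784)` tensor untouched, this model accepts both raw image batches and batches that were flattened beforehand.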

In [6]:
# Write your model code here
# For starters, build your layer in the following format
"""
(
nn.Linear,
activation,
nn.Linear,
activation,
...
nn.Linear,
# No activation after final layer
)
"""


### 3. Define Optimizers, Loss functions, etc.

For the neural network to train, we need to choose an optimizer. `Adam` is a common default, but feel free to experiment with other optimizers. Since this is a multi-class classifier, we will minimize the cross-entropy loss with respect to the weights of the neural network.

Setting the learning rate is something of an art, but `0.001` (`1e-3`) is a good place to start. Try varying the learning rate to see how the results change.
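
A minimal sketch of this setup follows. The small `nn.Sequential` network is only a stand-in so the snippet runs on its own, and the names `loss_function` and `learning_rate` are assumptions; the random tensors take the place of a real MNIST batch.

```python
import torch
import torch.nn as nn

# Stand-in model so the snippet runs on its own; swap in your own network
model = nn.Sequential(nn.Linear(28 * 28, 64), nn.LeakyReLU(0.1), nn.Linear(64, 10))

learning_rate = 1e-3
# CrossEntropyLoss expects raw logits and integer class labels
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# One optimization step on random data, to show how the pieces fit together
images = torch.rand(32, 28 * 28)
labels = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = loss_function(model(images), labels)
loss.backward()
optimizer.step()
print(f'loss: {loss.item():.4f}')
```

Note that `nn.CrossEntropyLoss` applies log-softmax internally, which is why the model's last layer has no activation.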

In [7]:
# Initialize the loss function, optimizer, etc. 
# I recommend using Adam as a starting point. Feel free to experiment with learning rate and other 
# hyperparameters

### 4. Evaluate accuracy of your model

Traditionally, we would split the training set into a training and validation set and use the validation set to assess the progress of our training or the generalization capability of our model during training. 

A commonly used technique is [k-fold Cross Validation](https://machinelearningmastery.com/k-fold-cross-validation/).

The value `k` is the number of splits or sub-groups we will have. The main idea is to shuffle the dataset and split it into k groups. One of the k groups is used as validation data; the remaining k - 1 groups are used to train the model.

We repeat this process until each of the k groups has been used as the validation dataset once.

Generally, `k = 5` or `k = 10` is a good place to start. Higher values of `k` lower the bias of the performance estimate, since each model trains on more of the data, but they cost more computation and can increase the variance of the estimate. It is also important to split the groups in such a way that each has a roughly balanced distribution of classes (stratified k-fold).
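
The splitting procedure can be sketched in plain Python as follows. The helpers `stratified_k_fold_indices` and `k_fold_splits` are hypothetical names written for this illustration, and the toy label list stands in for the real MNIST targets.

```python
import random
from collections import defaultdict


def stratified_k_fold_indices(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly balanced classes."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for class_indices in indices_by_class.values():
        rng.shuffle(class_indices)
        # Deal the shuffled indices of this class round-robin across the folds
        for position, index in enumerate(class_indices):
            folds[position % k].append(index)
    return folds


def k_fold_splits(labels, k=5, seed=0):
    """Yield (train_indices, validation_indices), one pair per fold."""
    folds = stratified_k_fold_indices(labels, k, seed)
    for held_out in range(k):
        train = [i for fold_id, fold in enumerate(folds) if fold_id != held_out for i in fold]
        yield train, folds[held_out]


labels = [i % 10 for i in range(1000)]  # toy labels: 100 samples per digit
for fold_id, (train_idx, val_idx) in enumerate(k_fold_splits(labels, k=5)):
    print(fold_id, len(train_idx), len(val_idx))
```

Each index list could then be wrapped with `torch.utils.data.Subset` to build per-fold dataloaders. In practice, a library implementation such as scikit-learn's `StratifiedKFold` is likely the more convenient choice.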

In [8]:
# Gradients are not needed during the evaluation phase,
# so disable gradient tracking to avoid needless computation
@torch.no_grad()
def get_model_accuracy(model: nn.Module, dataloader: torch.utils.data.DataLoader) -> float:
    # During evaluation, set the model's mode to eval to avoid
    # unexpected surprises with layers such as Batch Normalization and Dropout
    # since they behave differently during training and inference phase.
    model.eval()
    
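    for x_eval, y_eval in tqdm(dataloader):
        # Define your evaluation logic right here
        pass
    
    # Switch back to training mode once evaluation is done
    model.train()
    # TODO: return the accuracy as a fraction between 0 and 1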
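
For reference, one way the evaluation logic could be filled in is sketched below. The function name `get_model_accuracy_sketch` is hypothetical, and the linear model and random tensors are synthetic stand-ins so the snippet runs without downloading MNIST.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def get_model_accuracy_sketch(model: nn.Module, dataloader: DataLoader) -> float:
    """Return the fraction of correctly classified samples, between 0 and 1."""
    model.eval()
    correct, total = 0, 0
    for x_eval, y_eval in dataloader:
        logits = model(x_eval.flatten(start_dim=1))
        # The predicted class is the index of the largest logit
        predictions = logits.argmax(dim=1)
        correct += (predictions == y_eval).sum().item()
        total += y_eval.numel()
    model.train()
    return correct / total


# Synthetic stand-ins for a trained model and an MNIST dataloader
toy_model = nn.Linear(28 * 28, 10)
toy_dataset = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
toy_loader = DataLoader(toy_dataset, batch_size=32)
accuracy = get_model_accuracy_sketch(toy_model, toy_loader)
print(f'{accuracy * 100:.2f}%')
```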

### 5. Define training loop

In [9]:
epochs = 10

width = 28
height = 28
area = width * height

for epoch in tqdm(range(epochs)):
    for step, (x_train, y_train) in enumerate(tqdm(training_dataloader)):
        pass
        # TODO: write your code here
        # -----------------------------------
        # TODO: Call zero_grad() on your optimizer so that gradients do not accumulate across training steps
    
        # TODO: First, try flattening your images since
        # our input dimensions of x_train are (batch_size, 1, height, width)
        # .view(), .reshape() or .flatten(start_dim=1) should work
        # prefer reshape and flatten since they work on both contiguous and non-contiguous data
        
        # TODO: Afterwards, feed x_train to your model and obtain predictions
        
        # TODO: Good job, now calculate your loss using nn.CrossEntropyLoss 
        # For more information, look up the PyTorch documentation. You got this!
        
        # TODO: Once you have calculated your loss, do backprop on that loss. 
        # Afterwards, update the parameters of your model
                
        # TODO (Optional): Every n steps, log your loss and training progress
        
    
    # TODO: calculate Training accuracy
    training_accuracy = get_model_accuracy(model, training_dataloader)
    # Convert floating point decimal to percentage.
    # e.g. 0.895 -> 89.5
    training_accuracy = training_accuracy * 100
    print(f'Epoch {epoch + 1} model accuracy on training dataset: {training_accuracy:.2f}%')

    # TODO: Calculate your evaluation metric on the evaluation dataset. We will use accuracy, which is defined as
    # follows: total number of correct predictions / total number of samples
    # For now, we will use the test dataset as the evaluation dataset, but in the real world
    # use a separate validation set (e.g. via k-fold cross validation) and keep the test set for the final check
    # Note: In the real world, try to keep each unit of operation in its own function
    # so that it can be tested
    test_accuracy = get_model_accuracy(model, test_dataloader)
    test_accuracy = test_accuracy * 100
    # Print out accuracy on test dataset
    print(f'Epoch {epoch + 1} model accuracy on test dataset: {test_accuracy:.2f}%')

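
To show how the TODO steps in the training loop fit together, here is a compact sketch that runs end to end. It trains on random synthetic tensors rather than MNIST so that it is self-contained; the names `toy_dataloader`, `loss_function`, and `epoch_losses` are assumptions, and the loss values it prints say nothing about real MNIST performance.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for MNIST: random images with random labels
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
toy_dataloader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(28 * 28, 64), nn.LeakyReLU(0.1), nn.Linear(64, 10))
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(3):
    running_loss = 0.0
    for x_train, y_train in toy_dataloader:
        # Reset gradients left over from the previous step
        optimizer.zero_grad()
        # Flatten (batch, 1, 28, 28) into (batch, 784)
        x_flat = x_train.flatten(start_dim=1)
        # Forward pass and loss
        loss = loss_function(model(x_flat), y_train)
        # Backpropagate, then update the parameters
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    epoch_losses.append(running_loss / len(toy_dataloader))
    print(f'Epoch {epoch + 1} mean loss: {epoch_losses[-1]:.4f}')
```

Swapping `toy_dataloader` for `training_dataloader` and the stand-in network for your own model should give the loop the exercise asks for, with the accuracy calls added at the end of each epoch.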