# Notebook 3: CIFAR-10 Classification in PyTorch

In this notebook, we will train an image classifier for the CIFAR-10 dataset, which you already know from exercise 6. Today, however, we will use the PyTorch framework, which makes everything much more convenient!
We will show you how to implement the deep learning pipeline in plain PyTorch. You can also, for the first time, make use of the GPUs on Colab.

## (Optional) Mount in Google Colab

In [1]:
# Use the following lines if you want to use Google Colab
# We presume you created a folder "i2dl" within your main drive folder, and put the exercise there.
# NOTE: terminate all other colab sessions that use GPU!
# NOTE 2: Make sure the correct exercise folder (e.g exercise_07) is given.
# OPTIONAL: Enable GPU via Runtime --> Change runtime type --> GPU

"""
from google.colab import drive
import os

gdrive_path='/content/gdrive/MyDrive/i2dl/exercise_07'

# This will mount your google drive under 'MyDrive'
drive.mount('/content/gdrive', force_remount=True)
# In order to access the files in this notebook we have to navigate to the correct folder
os.chdir(gdrive_path)
# Check manually if all files are present
print(sorted(os.listdir()))
"""


### Set up PyTorch environment in colab

For your regular environment this should already have been installed in the previous notebooks.

In [2]:
# Optional: install correct libraries in google colab
# !python -m pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
# !python -m pip install torchtext==0.12.0 torchaudio==0.11.0
# !python -m pip install tensorboard==2.9.1
# !python -m pip install pytorch-lightning==1.6.0

## Imports

In [3]:
import os
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
import torchvision
import torchvision.transforms as transforms
%load_ext autoreload
%autoreload 2

os.environ['KMP_DUPLICATE_LIB_OK']='True' # To prevent the kernel from dying.

### Get Device
In this exercise, we'll use PyTorch to build an image classifier for the CIFAR-10 dataset. As you know from exercise 06, processing a large set of images is computationally expensive. Luckily, with PyTorch we can now make use of our GPU to significantly speed things up!

In case you don't have a GPU, you can run this notebook on Google Colab where you can access a GPU for free! 

Of course, you can also run this notebook on your CPU only - though this is definitely not recommended.


In [4]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cpu
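
Once `device` is set, both the model and every batch have to be moved to it before the forward pass. The following is a minimal sketch of that pattern; the toy layer and the random batch are made up for illustration and are not part of the exercise code.

```python
import torch
import torch.nn as nn

# Same device selection as in the cell above.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy layer and batch, only for illustration.
layer = nn.Linear(4, 2).to(device)    # moves the layer's parameters to the device
batch = torch.randn(8, 4).to(device)  # .to() returns a tensor on the device; it does not modify in place

out = layer(batch)                    # model and data now live on the same device
print(out.shape, out.device.type)
```

If model and data end up on different devices, PyTorch raises a runtime error, which is why the training loop further down calls `.to(device)` on every batch.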


## Setup TensorBoard
In exercise 07 you've already learned how to use TensorBoard. Let's use it again to make the debugging of our network and training process more convenient! Throughout this notebook, feel free to add further logs or visualizations to your TensorBoard!

In [5]:
# Delete previous instances of tensorboard
import shutil
tensorboard_path = os.path.abspath("logs")
if os.path.exists(tensorboard_path):
    shutil.rmtree(tensorboard_path)
os.makedirs(tensorboard_path, exist_ok=True)



## Define your Network

Do you remember the good old times when we used to implement everything in plain numpy? Luckily, these times are over and we're using PyTorch which makes everything MUCH easier!

Instead of implementing your own model, solver and dataloader, all you have to do is define an `nn.Module`.

We've prepared the class `exercise_code/MyPytorchModel` for you, which you'll now finalize to build an image classifier with PyTorch.

### 0. Dataset & Dataloaders
Check out the function `prepare_data` of the `CIFAR10DataModule` class that loads the dataset, using the class `torchvision.datasets.ImageFolder` (or the previous `MemoryImageFolder` dataset from exercise 3), which is very similar to the class `ImageFolderDataset` that you implemented earlier!

Implement a **transform** to pre-process the raw data (standardize it and convert it to tensors) and assign it to the variable `my_transform`. Note: On the submission server, the same normalization as in notebook 3 on data augmentation will be performed, so please make sure to use the same normalization! For convenience, we added the precomputed normalization values for you. All the normalization you define here is tailored to your training.

In pytorch-lightning we could also include the dataset and other classes in our model, but a more reasonable way is to define it outside since it usually is used across multiple projects. If you prefer the all-in-one solution, that is great as well, but here we put it separately.

If you want to improve your performance, you can also perform extensive **data augmentation** here!

Also check out the `DataLoader` class that is used to create  `train_dataloader` and `val_dataloader` and that is very similar to your previous implementation of the DataLoader.
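
To get a feeling for `DataLoader` without the CIFAR-10 files, here is a small self-contained sketch on synthetic data. The `TensorDataset`, the 80/20 split and the batch size are assumptions chosen for illustration; the actual `CIFAR10DataModule` may set things up differently.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for CIFAR-10: 100 random "images" with labels in [0, 10).
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Split into 80 training and 20 validation samples.
train_set, val_set = random_split(dataset, [80, 20])

# Shuffle only the training data; validation order does not matter.
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
val_loader = DataLoader(val_set, batch_size=16, shuffle=False)

batch_images, batch_labels = next(iter(train_loader))
print(batch_images.shape, batch_labels.shape)
print(len(train_loader), len(val_loader))  # number of batches per epoch
```

Note that the last validation batch is smaller than `batch_size` (4 samples here), because `drop_last` defaults to `False`.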

### 1. Define your model
Next, let's define your model. Think about a good network architecture. You're completely free here and you can come up with any network you like! (\*)

Have a look at the documentation of `torch.nn` at https://pytorch.org/docs/stable/nn.html to learn how to use this module to build your network!

Then implement your architecture: initialize it in `__init__()` and assign it to `self.model`. This is particularly easy using `nn.Sequential()`, to which you only have to pass your layers in order.

To make your model customizable and support parameter search, don't use hardcoded hyperparameters - instead, pass them as dictionary `hparams` (here, `n_hidden` is the number of neurons in the hidden layer) when initializing `MyPytorchModel`.

Here's an easy example:

```python

    class MyPytorchModel(nn.Module):

        def __init__(self, hparams):
            super().__init__()
            self.hparams = hparams
           
            self.model = nn.Sequential(
                nn.Linear(self.hparams["input_size"], self.hparams["n_hidden"]),
                nn.ReLU(),
                nn.Linear(self.hparams["n_hidden"], self.hparams["num_classes"])
            )

        def forward(self, x):
            # Forward pass
            out = self.model(x)
            return out
```

or 

```python

    class MyPytorchModel(nn.Module):
        def __init__(self, hparams):
            super().__init__()
            self.hparams = hparams
           
            # Model
            self.linear_1 = nn.Linear(self.hparams["input_size"], self.hparams["n_hidden"])
            self.activation = nn.ReLU()
            self.linear_2 = nn.Linear(self.hparams["n_hidden"], self.hparams["num_classes"])

        def forward(self, x):
            # Forward pass
            x = self.linear_1(x)
            x = self.activation(x)
            x = self.linear_2(x)
            return x
```

or 

```python

    class MyPytorchModel(nn.Module):
        def __init__(self, hparams):
            super().__init__()
            self.hparams = hparams
           
            # Model
            self.linear_1 = nn.Sequential(
                nn.Linear(self.hparams["input_size"], self.hparams["n_hidden"]),
                nn.BatchNorm1d(self.hparams["n_hidden"]),
                nn.ReLU()
            )

            self.classifier_layer = nn.Linear(self.hparams["n_hidden"], self.hparams["num_classes"])

        def forward(self, x):
            # Forward pass
            x = self.linear_1(x)
            x = self.classifier_layer(x)
            return x
```

Have a look at the forward pass in `forward(self, x)`, which is so simple that you don't need to implement it yourself. As PyTorch automatically computes the gradients, that's all we need to do! No need to manually calculate derivatives for the backward pass anymore! :)
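
To see the automatic gradient computation in isolation, here is a tiny sketch with two scalar tensors. The function `y = w * x**2` is just an example chosen so that the derivatives are easy to check by hand.

```python
import torch

# Two scalars that should receive gradients.
x = torch.tensor(3.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)

y = w * x ** 2   # forward pass: y = 2 * 3^2 = 18
y.backward()     # backward pass: autograd fills in .grad for x and w

# By hand: dy/dx = 2 * w * x = 12 and dy/dw = x^2 = 9.
print(x.grad.item(), w.grad.item())
```

`loss.backward()` in the training loop does exactly the same thing, only for all parameters of the network at once.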


____
\* *The size of your final model must be less than 20 MB, which is approximately equivalent to 5 million parameters. Note that this limit is quite lenient; you will probably need far fewer parameters!*

*Also, don't use convolutional layers, as they haven't been covered in the lecture yet. Build your network with fully connected layers (```nn.Linear()```) instead!*
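
The relation between 20 MB and 5 million parameters follows from 4 bytes per `float32` parameter. The sketch below does this arithmetic for a fully connected network; the helper names `count_mlp_params` and `size_in_mb` are hypothetical and not part of the exercise code, and the layer sizes are taken from the example hyperparameters (`n_hidden = 180`).

```python
def count_mlp_params(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def size_in_mb(n_params, bytes_per_param=4):
    """Approximate storage for float32 parameters, with 1 MB = 10**6 bytes."""
    return n_params * bytes_per_param / 1e6


# Input 3*32*32 = 3072, one hidden layer with 180 neurons, 10 classes.
n_params = count_mlp_params([3 * 32 * 32, 180, 10])
print(n_params)                # 554950
print(size_in_mb(n_params))    # roughly 2.22 MB
print(size_in_mb(5_000_000))   # 20.0 MB, the upper limit
```

Layers such as `nn.BatchNorm1d` add a few more parameters and buffers, so treat this as an estimate rather than the exact number the submission check reports.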

### 2. Training & Validation Step
Down below, we've implemented the deep learning pipeline for you. Read it carefully and see how things are implemented in PyTorch.
Read the comments that explain each step of the pipeline.

But first, let's choose our hyperparameters!

It could look something like this:

```python
hparams = {
    "batch_size": 64,
    "learning_rate": 3e-3,
    "n_hidden": 180,
    "input_size": 3 * 32 * 32,
    "num_classes": 10,
    "num_workers": 2,
    "device": device,
}
```

In [6]:
from exercise_code.MyPytorchModel import MyPytorchModel, CIFAR10DataModule
# make sure you have downloaded the Cifar10 dataset on root: "../datasets/cifar10", if not, please check exercise 03.
hparams = {}

########################################################################
# TODO: Define your hyper parameters here!                             #
########################################################################


# using the settings I obtained in Ex 6
hparams = {
    "device" : device,
    'loading_method' : 'Memory',
    "batch_size" : 1024,
    "learning_rate" : 1.0e-5,
    "n_hidden" : 800,
    "input_size" : 3 * 32 * 32,
    "num_classes" : 10,
    "num_workers" : 16,
    "epochs" : 200,
    "momentum" : 0.9, # Only for SGD
    
}

########################################################################
#                           END OF YOUR CODE                           #
########################################################################

# Make sure you have already downloaded the CIFAR10 dataset before running this
# cell: we are showcasing PyTorch's own ImageFolder dataset class, which
# doesn't automatically download our data. Check exercise 3.

# If you want to switch to the memory dataset instead of the image folder, use
# hparams["loading_method"] = 'Memory'
# The default is hparams["loading_method"] = 'Image'
# You will notice that it takes much longer to initialize the memory dataset,
# because all data points have to be loaded into memory first.

# You might get warnings below if you use too few workers. PyTorch uses
# a more sophisticated DataLoader than the one you implemented previously.
# In particular, it uses multiprocessing to have multiple cores work on
# individual data samples. You can enable more than 2 workers (default=2)
# via
# hparams['num_workers'] = 8

# Set up the data module including your implemented transforms
data_module = CIFAR10DataModule(hparams)
data_module.prepare_data()

Some tests to check whether we'll accept your model:

In [7]:
model = MyPytorchModel(hparams)
from exercise_code.Util import printModelInfo
_ = printModelInfo(model)

FYI: Your model has 3.107 params.
Model accepted!


In [8]:
# If the following does not work, try to open TensorBoard through the command line.
# %load_ext tensorboard
# %tensorboard --logdir logs --port 6006

In [9]:
from tqdm import tqdm
from exercise_code.MyPytorchModel import MyPytorchModel
from torch.utils.tensorboard import SummaryWriter
 

def create_tqdm_bar(iterable, desc):
    return tqdm(enumerate(iterable),total=len(iterable), ncols=150, desc=desc)


def train_model(model, train_loader, val_loader, loss_func, tb_logger, epochs=10, name="default"):
    """
    Train the classifier for a number of epochs.
    """
    loss_cutoff = len(train_loader) // 10
    optimizer = torch.optim.Adam(model.parameters(), hparams["learning_rate"])
    
    # The scheduler is used to change the learning rate every few "n" steps.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=int(epochs * len(train_loader) / 5), gamma=hparams.get('gamma', 0.8))
    
    for epoch in range(epochs):
        
        # Training stage, where we want to update the parameters.
        model.train()  # Set the model to training mode
        
        training_loss = []
        validation_loss = []
        
        # Create a progress bar for the training loop.
        training_loop = create_tqdm_bar(train_loader, desc=f'Training Epoch [{epoch + 1}/{epochs}]')
        for train_iteration, batch in training_loop:
            optimizer.zero_grad() # Reset the gradients - VERY important! Otherwise they accumulate.
            images, labels = batch # Get the images and labels from the batch, in the fashion we defined in the dataset and dataloader.
            images, labels = images.to(device), labels.to(device) # Send the data to the device (GPU or CPU) - it has to be the same device as the model.

            # Flatten the images to a vector. This is done because the classifier expects a vector as input.
            # Could also be done by reshaping the images in the dataset.
            images = images.view(images.shape[0], -1) 

            pred = model(images) # Stage 1: Forward().
            loss = loss_func(pred, labels) # Compute the loss over the predictions and the ground truth.
            loss.backward()  # Stage 2: Backward().
            optimizer.step() # Stage 3: Update the parameters.
            scheduler.step() # Update the learning rate.


            training_loss.append(loss.item())
            training_loss = training_loss[-loss_cutoff:]

            # Update the progress bar.
            training_loop.set_postfix(curr_train_loss = "{:.8f}".format(np.mean(training_loss)), 
                                      lr = "{:.8f}".format(optimizer.param_groups[0]['lr'])
            )

            # Update the tensorboard logger.
            tb_logger.add_scalar(f'classifier_{name}/train_loss', loss.item(), epoch * len(train_loader) + train_iteration)

        # Validation stage, where we don't want to update the parameters. Pay attention to the model.eval() line
        # and the "with torch.no_grad()" wrapper.
        model.eval()
        val_loop = create_tqdm_bar(val_loader, desc=f'Validation Epoch [{epoch + 1}/{epochs}]')
        
        with torch.no_grad():
            for val_iteration, batch in val_loop:
                images, labels = batch
                images, labels = images.to(device), labels.to(device)

                images = images.view(images.shape[0], -1) 
                pred = model(images)
                loss = loss_func(pred, labels)
                validation_loss.append(loss.item())
                # Update the progress bar.
                val_loop.set_postfix(val_loss = "{:.8f}".format(np.mean(validation_loss)))

                # Update the tensorboard logger.
                tb_logger.add_scalar(f'classifier_{name}/val_loss', loss.item(), epoch * len(val_loader) + val_iteration)
        

# Create a tensorboard logger.
# NOTE: In order to see the logs, run the following command in the terminal: tensorboard --logdir=./
# Also, in order to reset the logs, delete the logs folder MANUALLY.

path = "logs"
num_of_runs = len(os.listdir(path)) if os.path.exists(path) else 0
path = os.path.join(path, f'run_{num_of_runs + 1}')

tb_logger = SummaryWriter(path)

# Train the classifier.
labled_train_loader = data_module.train_dataloader()
labled_val_loader = data_module.val_dataloader()

epochs = hparams.get('epochs', 4)
loss_func = nn.CrossEntropyLoss() # The loss function we use for classification.
model = MyPytorchModel(hparams).to(device)
train_model(model, labled_train_loader, labled_val_loader, loss_func, tb_logger, epochs=epochs, name="Default")

print()
print("Finished training!")
print("How did we do? Let's check the accuracy of the defaut classifier on the training and validation sets:")
print(f"Training Acc: {model.getTestAcc(labled_train_loader)[1] * 100}%")
print(f"Validation Acc: {model.getTestAcc(labled_val_loader)[1] * 100}%")



Training Epoch [1/200]: 100%|██████████████████████████████████████████████| 30/30 [00:03<00:00,  9.58it/s, curr_train_loss=2.16849311, lr=0.00001000]
Validation Epoch [1/200]: 100%|██████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 14.77it/s, val_loss=2.16311898]
Training Epoch [2/200]: 100%|██████████████████████████████████████████████| 30/30 [00:03<00:00,  9.27it/s, curr_train_loss=2.07788690, lr=0.00001000]
Validation Epoch [2/200]: 100%|██████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 15.84it/s, val_loss=2.06772149]
Training Epoch [3/200]: 100%|██████████████████████████████████████████████| 30/30 [00:03<00:00,  9.91it/s, curr_train_loss=2.01206883, lr=0.00001000]
Validation Epoch [3/200]: 100%|██████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 14.19it/s, val_loss=1.99447708]
Training Epoch [4/200]: 100%|██████████████████████████████████████████████| 30/30 [00:03<00:0


Finished training!
How did we do? Let's check the accuracy of the defaut classifier on the training and validation sets:


100%|██████████| 30/30 [00:01<00:00, 15.58it/s]


Training Acc: 65.60666666666667%


100%|██████████| 10/10 [00:00<00:00, 18.95it/s]


Validation Acc: 52.12%


Now that everything is working, feel free to play around with different architectures. As you've seen, it's really easy to define your model or do changes there.

To pass this submission, you'll need **50%** accuracy.
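
For reference, accuracy is simply the fraction of samples whose highest-scoring class matches the label. The sketch below computes it from raw network outputs (logits); `accuracy_from_logits` is a hypothetical helper for illustration and not the `getTestAcc` method used above, whose implementation may differ.

```python
import torch


def accuracy_from_logits(logits, labels):
    """Fraction of rows whose argmax over the class dimension equals the label."""
    preds = logits.argmax(dim=1)
    return (preds == labels).float().mean().item()


# Four made-up samples with three classes each.
logits = torch.tensor([[2.0, 0.1, 0.3],   # predicted class 0
                       [0.2, 1.5, 0.1],   # predicted class 1
                       [0.9, 0.8, 0.7],   # predicted class 0
                       [0.1, 0.2, 3.0]])  # predicted class 2
labels = torch.tensor([0, 1, 2, 2])

acc = accuracy_from_logits(logits, labels)
print(f"Accuracy: {acc * 100}%")  # 3 of 4 predictions are correct
```

No softmax is needed here, because softmax does not change which class has the largest score.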


# Save your model & Report Test Accuracy

When you're done with your **hyperparameter tuning**, have achieved **at least 50% validation accuracy** and are happy with your final model, you can save it here.

Before that, we will check again whether the number of parameters is below 5 million and the file size is below 20 MB.

When your final model is saved, we'll lastly report the test accuracy.
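
If you want to check both limits yourself beforehand, the following sketch counts the parameters of a model and measures the size of its serialized `state_dict` in memory. This is only an approximation under stated assumptions: the example network is made up, and `test_and_save` may serialize the model differently, so the reported file size can deviate.

```python
import io

import torch
import torch.nn as nn

# Hypothetical example network; replace it with your own model.
model = nn.Sequential(
    nn.Linear(3 * 32 * 32, 180),
    nn.ReLU(),
    nn.Linear(180, 10),
)

# Total number of parameters (weights and biases).
n_params = sum(p.numel() for p in model.parameters())

# Serialize the weights into an in-memory buffer and measure its size.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
size_mb = buffer.getbuffer().nbytes / 1e6

print(f"{n_params / 1e6:.3f} million params, about {size_mb:.2f} MB")
```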

In [10]:
from exercise_code.Util import test_and_save

test_and_save(model, data_module.val_dataloader(), data_module.test_dataloader())

100%|██████████| 10/10 [00:00<00:00, 11.28it/s]


Validation Accuracy: 52.12%
FYI: Your model has 3.107 params.
Saving model...
Checking size...
Great! Your model size is less than 20 MB and will be accepted :)
Your model has been saved and is ready to be submitted. 
NOW, let's check the test accuracy:


100%|██████████| 30/30 [00:01<00:00, 17.06it/s]

Test Accuracy: 65.35666666666667%





Congrats! You've now finished your first image classifier in PyTorch! Much easier than in plain numpy, right? Time to get started with some more complex neural networks - see you at the next exercise!

To create a zip file with your submission, run the following cell:

In [11]:
from exercise_code.submit import submit_exercise

submit_exercise('../output/exercise07')

relevant folders: ['exercise_code', 'models']
notebooks files: ['3_Cifar10_Pytorch.ipynb', '2_tensorboard.ipynb', '1_pytorch.ipynb']
Adding folder exercise_code
Adding folder models
Adding notebook 3_Cifar10_Pytorch.ipynb
Adding notebook 2_tensorboard.ipynb
Adding notebook 1_pytorch.ipynb
Zipping successful! Zip is stored under: /home/timm_pop/Documents/i2dl/output/exercise07.zip


# Submission Instructions

Congratulations! You've just built your first image classifier with PyTorch! To complete the exercise, submit your final model to our submission portal - you probably know the procedure by now.

1. Go to [our submission page](https://i2dl.vc.in.tum.de/submission/), register for an account and log in. We use your matriculation number and send an email with the login details to the associated mail account. When in doubt, log in to TUMonline and check your mails there. You will get an ID which we need in the next step.
2. Log into [our submission page](https://i2dl.vc.in.tum.de/submission/) with your account details and upload the `zip` file. Once successfully uploaded, you should be able to see the submitted file selectable on the top.
3. Click on this file and run the submission script. You will get an email with your score as well as a message if you have surpassed the threshold.

# Submission Goals

- Goal: Successfully implement a fully connected NN image classifier for CIFAR-10 with PyTorch

- Passing Criteria: Similar to the last exercise, there are no unit tests that check specific components of your code. The only thing that's required to pass this optional submission is for your model to reach at least **50% accuracy** on __our__ test dataset. The submission system will show you a number between 0 and 100 which corresponds to your accuracy.

- You can make **$\infty$** submissions until the end of the semester. Remember that this exercise is an __OPTIONAL SUBMISSION__ and will __not__ be counted for the bonus. 