# Homework 04 - Image Classification

Contact: David C. Schedl (david.schedl@fh-hagenberg.at)

Note: this is the starter pack for the **Digital Imaging / Computer Vision** homework. You do not need to use the exact same template and can start from scratch as well!
Using regular Python files (.py) is also possible.

# Task 
<a name="Task-A" id="Task-A"> </a>

The goal of this assignment is to familiarize yourself with deep learning for the task of image classification.
You should pick a **dataset** of your choice and train multiple **classifiers** to distinguish between different classes of images. 


The different classifiers/techniques that you should try are:
 * *Optional: Linear Classifiers (to familiarize yourself with the concept),*
 * a simple Multi-Layer Perceptron (MLP) or NN,
 * a Convolutional Neural Network (CNN) such as LeNet,
 * a pretrained model (CNN or another architecture) with Transfer Learning (TL).

The image dataset is up to you. You can use a dataset from the [PyTorch vision datasets](https://pytorch.org/vision/stable/datasets.html) or any other dataset that you find interesting. In the course, we already worked with MNIST and CIFAR10.

Train those networks and compare the results. Optionally, you can also try to modify the architecture of the individual networks (e.g., the MLP) to improve the results.
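
To make architecture experiments with the MLP easy, the hidden layers can be generated from a list of sizes. The helper below is one possible sketch; the name `make_mlp`, the default hidden sizes, and the 28x28 grayscale input shape are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


def make_mlp(input_shape, nb_classes, hidden_sizes=(256, 128)):
    """Build an MLP: flatten, then Linear + ReLU per hidden layer, then a linear output."""
    in_features = 1
    for dim in input_shape:  # e.g., (channels, height, width)
        in_features *= dim

    layers = [nn.Flatten()]
    for hidden in hidden_sizes:
        layers += [nn.Linear(in_features, hidden), nn.ReLU()]
        in_features = hidden
    layers.append(nn.Linear(in_features, nb_classes))  # raw class scores (logits)
    return nn.Sequential(*layers)


mlp = make_mlp(input_shape=(1, 28, 28), nb_classes=10)
dummy_batch = torch.rand(8, 1, 28, 28)
mlp_out = mlp(dummy_batch)
print(mlp_out.shape)  # (batch_size, nb_classes)
```

Passing, for example, `hidden_sizes=(512, 256, 128)` gives a deeper variant, and `hidden_sizes=()` reduces the model to a linear classifier.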

You can use PyTorch or TensorFlow for this assignment.


**Hint(s):** 
- Start simple (e.g., use MNIST/CIFAR10 and a small network) and add complexity only once you are sure that everything works correctly.
- For training, try to use a fast system (e.g., with an NVIDIA GPU) or use Google Colab or Amazon SageMaker. *Note that Colab will kick you from their servers after you use too much RAM or CPU time.*
- Plan enough time for this assignment. It can be quite time-consuming to train a network on a high-resolution dataset (start simple).



## Setup

First, let's import some useful libraries.
The dataset used below will be downloaded into the `./data` folder.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import torchvision.transforms.functional as TF
import matplotlib.pyplot as plt
import time
import os
import copy
import math

## Loading a dataset

Below you can find the code to load and display a dataset with PyTorch vision.

We will use **E**MNIST as an example. It is an extension of the MNIST dataset; the `balanced` split used below has 47 classes (digits and letters).
If you want to simplify things use `split='mnist'` to only get the original MNIST (without the E) digits.

If you want to use a dataset with higher resolution images you can use `transforms.Resize` to directly downscale the images while loading the dataset ([PyTorch transformations](https://pytorch.org/vision/stable/transforms.html)).

In [None]:
# Transforms for training and validation.
# Currently both only convert the images to tensors (values scaled to [0, 1]);
# add data augmentation or normalization here if you need it.
data_transforms = {
    "train": transforms.Compose(
        [
            # transforms.Resize((32, 32)),  # <-- for large images
            transforms.ToTensor(),
        ]
    ),
    "val": transforms.Compose(
        [
            # transforms.Resize((32, 32)),  # <-- for large images
            transforms.ToTensor(),
        ]
    ),
}

batch_size = 32

image_datasets = {
    "train": torchvision.datasets.EMNIST(
        root="./data",
        split="balanced",
        train=True,
        download=True,
        transform=data_transforms["train"],
    ),
    "val": torchvision.datasets.EMNIST(
        root="./data",
        split="balanced",
        train=False,
        download=True,
        transform=data_transforms["val"],
    ),
}

dataloaders = {
    x: torch.utils.data.DataLoader(
        image_datasets[x],
        batch_size=batch_size,
        shuffle=(x == "train"),  # shuffle only the training data
        num_workers=2,
    )
    for x in ["train", "val"]
}
dataset_sizes = {x: len(image_datasets[x]) for x in ["train", "val"]}
class_names = image_datasets["train"].classes
print("number of classes: ", len(class_names))
print(dataset_sizes)
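
If you want to add `transforms.Normalize` to the pipeline, you need per-channel mean and standard deviation values for your dataset. The sketch below shows one way to estimate them from a data loader. The helper name `channel_stats` and the random stand-in data are assumptions so that the example runs on its own; with the notebook's data you would pass the training loader instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def channel_stats(loader):
    """Return the per-channel mean and std over all images yielded by `loader`."""
    n_pixels = 0
    total = 0.0
    total_sq = 0.0
    for images, _ in loader:
        # images has shape (batch, channels, height, width)
        n_pixels += images.shape[0] * images.shape[2] * images.shape[3]
        total = total + images.sum(dim=(0, 2, 3))
        total_sq = total_sq + (images ** 2).sum(dim=(0, 2, 3))
    mean = total / n_pixels
    std = (total_sq / n_pixels - mean ** 2).sqrt()  # population std: sqrt(E[x^2] - E[x]^2)
    return mean, std


torch.manual_seed(0)
fake_images = torch.rand(64, 1, 28, 28)  # stand-in for real training images
fake_labels = torch.zeros(64, dtype=torch.long)
fake_loader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=16)

mean, std = channel_stats(fake_loader)
print(mean, std)
```

The results can then be used as `transforms.Normalize(mean.tolist(), std.tolist())` after `transforms.ToTensor()`. Compute the statistics on the training set only and reuse them for validation.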

Let's visualize some images from the (training) dataset.

In [None]:
# visualize sample images of the training dataset 
def imshow(inp, title=None, ax=plt):
    """Imshow for Tensor."""
    # (C, H, W) -> (W, H, C): move the color dimension to the last axis.
    # NOTE: EMNIST images are stored transposed, so this also swaps the x and y axes!
    # For other datasets use transpose((1, 2, 0)) instead.
    inp = inp.numpy().transpose((2, 1, 0))
    inp = np.clip(inp, 0, 1)
    ax.imshow(inp[:,:,0], cmap="gray")
    if title is not None:
        plt.title(title)


# Get a batch of training data
inputs, classes = next(iter(dataloaders["train"]))

plt.figure()
# Make a grid from batch
for i in range(len(inputs)):
    ax = plt.subplot(math.ceil(len(inputs)/8), 8, i + 1, xticks=[],
                        yticks=[],)
    imshow(inputs[i], title=class_names[classes[i]], ax=ax)


## LeNet-5 in PyTorch

Below you can find the code for a (modernized) LeNet-5 architecture in PyTorch. 
Inspired by [this](https://towardsdatascience.com/implementing-yann-lecuns-lenet-5-in-pytorch-5e05a0911320) blog post.

In [None]:
# Define a model
inputs, classes = next(iter(dataloaders["train"]))
input_shape = inputs[0].shape
print("input:", input_shape)
nb_classes = len(class_names)

class CNNModel(nn.Module):

    def __init__(self, input_shape, nb_classes):
        super(CNNModel, self).__init__()

        self.act = nn.ReLU()

        self.conv1 = nn.Conv2d(input_shape[0], 20, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2),)
        self.pool1 = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        self.conv2 = nn.Conv2d(20, 50 , kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2), stride=(2, 2))
        self.flatten = nn.Flatten()
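        # two 2x2 poolings reduce height and width by a factor of 4 each;
        # the parentheses matter, since `a // 4 * b // 4` evaluates left to right
        self.fc1 = nn.Linear((input_shape[1] // 4) * (input_shape[2] // 4) * 50, 500)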
        self.fc2 = nn.Linear(500, nb_classes)

    def forward(self, x):
        x = self.conv1(x)
        x = self.act(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.act(x)
        x = self.pool2(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = self.act(x)  # without a non-linearity, fc1 and fc2 collapse into one linear layer
        x = self.fc2(x)

        return x

model = CNNModel(input_shape, nb_classes) # instantiate the model
print( "output:", model(inputs).shape ) # check the output shape of the model -> (batch_size, nb_classes)
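
The input size of the first fully connected layer follows from how each layer changes the spatial resolution. The small sketch below reproduces that arithmetic for the layer settings used in the model above (5x5 convolutions with padding 2, 2x2 max pooling with stride 2, 50 output channels); the helper names `conv_out` and `flattened_features` are made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(height, width, channels=50):
    """Number of features after conv1 -> pool1 -> conv2 -> pool2 -> flatten."""
    # (kernel, stride, padding) for conv1, pool1, conv2, pool2
    for kernel, stride, padding in [(5, 1, 2), (2, 2, 0), (5, 1, 2), (2, 2, 0)]:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
    return height * width * channels


print(flattened_features(28, 28))  # 7 * 7 * 50 = 2450
print(flattened_features(32, 32))  # 8 * 8 * 50 = 3200
```

Because the padded convolutions keep the resolution and each pooling halves it (rounding down), the result equals `(height // 4) * (width // 4) * 50`.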

## GPU

Running the code on the GPU is easy: move the model and the data to the GPU with `model.to(device)` and `data = data.to(device)`. Note that, for tensors, `.to(device)` returns a new tensor instead of modifying the original, so keep the result.
You can check if you have a GPU available with `torch.cuda.is_available()`.

When data is moved to the GPU, it is stored in the GPU's memory. CPU-side code (e.g., NumPy or Matplotlib) cannot use it directly, so you need to move it back to the CPU first with `data.cpu()`.

If you want to use a GPU in Colab, go to the menu and select **Edit** -> **Notebook settings** -> **Hardware accelerator** -> switch to **GPU**.

In [None]:
# if you want to use a GPU (recommended) use `tensor.to(device)`
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
if device.type != "cuda":
    print("Using CPU! things will be slow! :(")

model = model.to(device) # move the model to the GPU
output = model(inputs.to(device)) # move the input to the GPU and run the model
output = output.cpu() # move the output back to the CPU for further processing
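
The same device handling applies during training: every batch is moved to the device before the forward pass. The following is a minimal sketch of a single training epoch, not a complete training script. The function name `train_one_epoch`, the toy linear model, the random stand-in data, and the SGD settings are all assumptions so that the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over `loader` and return the mean training loss per sample."""
    model.train()
    running_loss = 0.0
    n_samples = 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)  # move the batch to the device
        optimizer.zero_grad()  # reset gradients from the previous step
        loss = criterion(model(inputs), labels)
        loss.backward()  # backpropagation
        optimizer.step()  # parameter update
        running_loss += loss.item() * inputs.size(0)
        n_samples += inputs.size(0)
    return running_loss / n_samples


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

# Stand-in data and model; replace them with your dataloader and network.
toy_data = TensorDataset(torch.rand(64, 1, 28, 28), torch.randint(0, 10, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16, shuffle=True)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)

criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = optim.SGD(toy_model.parameters(), lr=0.01)

losses = [train_one_epoch(toy_model, toy_loader, criterion, optimizer, device) for _ in range(5)]
print(losses)
```

Recording the loss per epoch, as done here with `losses`, makes it easy to plot learning curves and compare the different classifiers.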

## Further comments/hints:
*   You do not need to come up with super efficient implementations or a perfect classifier! It is mostly about understanding the topic and the problem.
*   Think about the options and parameters, train your models, and evaluate your solutions on the test images.
*   Summarize your findings and evaluations in the report! 
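
For the evaluation, classification accuracy on held-out images is a simple metric to report. The sketch below shows one way to compute it; the function name `evaluate_accuracy` and the tiny hand-built model and data are assumptions for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()  # no gradients are needed for evaluation
def evaluate_accuracy(model, loader, device):
    """Return the fraction of correctly classified samples in `loader`."""
    model.eval()
    correct = 0
    total = 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        predictions = model(inputs).argmax(dim=1)  # class with the highest score
        correct += (predictions == labels).sum().item()
        total += labels.size(0)
    return correct / total


device = torch.device("cpu")

# Toy model that always predicts class 0 (zero weights, bias favors class 0).
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(4, 3))
with torch.no_grad():
    toy_model[1].weight.zero_()
    toy_model[1].bias.copy_(torch.tensor([1.0, 0.0, 0.0]))

toy_images = torch.rand(4, 1, 2, 2)
toy_labels = torch.tensor([0, 0, 1, 2])  # two of the four samples belong to class 0
toy_loader = DataLoader(TensorDataset(toy_images, toy_labels), batch_size=2)

accuracy = evaluate_accuracy(toy_model, toy_loader, device)
print(accuracy)  # 0.5
```

With the notebook's data you would call it with your trained model and the validation loader, and report the value for each classifier in your comparison.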


**Have fun!** 🤖
