# Lab 6 - Intro to Image Classification with Convolutional Neural Networks


## Background - Image Classification

For this project you will be introduced to the basics of Convolutional Neural Networks (CNNs) and the PyTorch framework.  Deep learning with CNNs can be very computationally expensive and runs fastest with GPU support.  If you do not have access to a GPU on your local machine, you can use one from Google through their [Colab tool](https://colab.research.google.com).  Colab works like a Jupyter notebook, and you can upload your .ipynb file directly. If you have a Mac with an M1 or M2 processor, you can also use MPS acceleration for a slight speedup.

Image classification is the task of taking an image and assigning it a category label.  CNNs have been a leading method for image classification since they dominated the [ImageNet competition in 2012](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf).

The instructions for the lab are contained in this Jupyter notebook. **You may choose to complete the lab in this notebook, or you may choose to complete the lab in standard Python using the .py template file.**

# Part A: Dataset

You will be performing classification on the [Stanford Cars Dataset](https://www.kaggle.com/datasets/jessicali9530/stanford-cars-dataset), which consists of 16,185 images of 196 classes of cars.  The dataset is split 50-50 into a training set and testing set.  You can download the images and the needed annotation files from this [Google Drive folder](https://drive.google.com/drive/folders/1GLHQAt3KNN_3eIETkinzlY5q9HG44Blx?usp=sharing).  Each set is around 1 GB of data so please **<span style="color:red">DO NOT</span>** include the files when you upload your notebook--**just turn in the .ipynb or .py file**.  Assume the notebooks will be run with the images in folders labeled `cars_train` and `cars_test` like so:

```
.
+--proj6-image-classification.ipynb
+--cars_train
|  +--00001.jpg
|  +--00002.jpg
|  +--...
+--cars_test
|  +--00001.jpg
|  +--00002.jpg
|  +--...
+--test_annos.json
+--train_annos.json
```

The first step in any neural network method is to make sure you can read in the data.  Since there are a lot of images in this project, they may not all fit into memory at once.  This is a common problem with CNNs, and PyTorch provides a pattern that keeps only the images you need in memory at any given time.  It provides a class called `DataLoader` that acts as an iterable object.  To use `DataLoader`, you will need to implement a subclass of PyTorch's `Dataset` class.  To do so, you will need to create a class that inherits from `Dataset` and implements the methods `__getitem__` and `__len__`.  An example is given below, and PyTorch provides a [tutorial here](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html). You can also reference the Custom Image Loader found in [this tutorial](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#creating-a-custom-dataset-for-your-files):


In [None]:
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, filename):
        super(MyDataset, self).__init__()
        # TODO: implement what happens when someone calls: dataset = MyDataset(filename)
        # Pull in relevant file information, image lists, etc.

    def __getitem__(self, idx):
        # TODO: implement what happens when someone calls dataset[idx]
        # Return an image and its associated label at location idx
        raise NotImplementedError

    def __len__(self):
        # TODO: implement what happens when someone calls len(dataset)
        # Determine the number of images in the dataset
        raise NotImplementedError

training_dataset = MyDataset("train_annos.json")
loader = DataLoader(training_dataset, batch_size=1)
for im, label in loader:
    print(im.shape, label)
    # More stuff we will implement later in the lab

We provide you with two files, `test_annos.json` and `train_annos.json`.  Each file contains a dictionary mapping an image name to the class label the image belongs to.  You can use these files in your Dataset class to provide the ground truth labels.  For Part A, you will need to implement a dataset class.

**Note:** Both the images and the labels are 1-indexed. You can load the images however you choose, but the labels must be 0-indexed to work with PyTorch's loss functions. Make sure to account for this in your Dataset class.

**Note:** Make sure the images and labels that are returned are both PyTorch tensors. Specifically, the images should be returned as tensors with shape [channels, rows, cols] and values between 0 and 1. To guarantee this, I recommend using `read_image` from `torchvision.io` and dividing by 255.

**Note:** Some of the images in the dataset are grayscale while most are RGB. To prevent issues later, make sure to load all images as RGB. If you use `read_image`, you can use `ImageReadMode.RGB` to guarantee that all images have 3 channels.

**Note:** Python has a [json library](https://docs.python.org/3/library/json.html) that you can use to turn JSON files into Python dictionaries. You can look up some simple tutorials online.
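
As a rough illustration of the two notes on JSON loading and label indexing, the sketch below reads an annotation file and shifts the labels to start at 0. The file layout (`{"00001.jpg": 14, ...}`), the helper name `load_labels`, and the demo file are assumptions made for illustration; check the real annotation files before relying on this layout.

```python
import json
import os
import tempfile

def load_labels(annotation_path):
    """Return a sorted list of (image_name, zero_indexed_label) pairs."""
    with open(annotation_path) as f:
        annotations = json.load(f)  # assumed layout: {"00001.jpg": 14, ...}
    # Labels in the file are 1-indexed; subtract 1 so they start at 0.
    return [(name, int(label) - 1) for name, label in sorted(annotations.items())]

# Tiny stand-in for train_annos.json so the sketch runs without the dataset.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_annos.json")
    with open(demo_path, "w") as f:
        json.dump({"00002.jpg": 196, "00001.jpg": 1}, f)
    samples = load_labels(demo_path)

print(samples)  # [('00001.jpg', 0), ('00002.jpg', 195)]
```

A list of pairs like this is one convenient thing to store in `__init__`, since `__len__` can return its length and `__getitem__` can index into it.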

# Part B: Neural Network Architecture

The main backbone for deep learning is the actual neural network architecture.  For image classification, this will consist of some combination of `Conv2d` layers with activations--usually `ReLU`--with intermittent downsampling--usually done using `MaxPool2d`--followed by a few Linear layers.  The input to the network should be a batch of images with shape `(batch_size, channels, image_height, image_width)` (e.g. a single RGB image with dimensions 224x224 would be `(1, 3, 224, 224)`), and the network should output, for each image, a vector of shape `(num_classes,)` where the index of the largest value indicates the class label.
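
To get a feel for how convolution and pooling layers shrink an image, the sketch below applies the standard output-size formula, `floor((size + 2 * padding - kernel_size) / stride) + 1`, along one spatial dimension. The helper name `conv_output_size` is made up for this example, and the layer settings (a 7x7 convolution with stride 2 and padding 3, then a 3x3 max pool with stride 2 and padding 1) are chosen to mirror the first two downsampling layers of ResNet18.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension (dilation of 1 assumed)."""
    return (size + 2 * padding - kernel_size) // stride + 1

input_size = 224
# 7x7 convolution, stride 2, padding 3
after_conv = conv_output_size(input_size, kernel_size=7, stride=2, padding=3)
# 3x3 max pooling, stride 2, padding 1
after_pool = conv_output_size(after_conv, kernel_size=3, stride=2, padding=1)

print(after_conv, after_pool)  # 112 56
```

The same formula applies to both `Conv2d` and `MaxPool2d` with their default settings, so it can be used to sanity-check tensor shapes while debugging a forward pass.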

While we built our own network in the mini-lab, for this lab we will use one of PyTorch's pretrained networks. This has the benefit of already having learned features from training on the ImageNet classification problem. To pull in this pretrained network, we use the following line of code:


In [None]:
import torchvision

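# Note: newer torchvision releases (0.13+) deprecate `pretrained=True`
# in favor of `weights=torchvision.models.ResNet18_Weights.DEFAULT`.
model = torchvision.models.resnet18(pretrained=True)

# List the names of the top-level layers
print([name for name, _ in model.named_children()])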

The ResNet18 is a common baseline network that uses convolution layers, batch normalization, layers of residual blocks, and a fully connected layer at the end. The different layers are listed above. However, because the pretrained network was trained on ImageNet, the last layer is designed to predict 1000 classes, not 196 like in our dataset.

In [None]:
print(model.fc)

Our goal is to use the same architecture as the ResNet, but replace the last fully connected layer with a new fully connected layer that goes from 512 input features to 196 output features.

PyTorch provides a nice framework for making a neural network architecture.  A network is typically made as a class that inherits from PyTorch's `Module` class and implements the `forward` method.  A network might take the form of the example below. PyTorch also provides a simple Neural Network [tutorial here](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html); the Training a Classifier tutorial is especially helpful.


In [None]:
import torch.nn as nn
import torchvision

class MyNetwork(nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        ref_model = torchvision.models.resnet18(pretrained=True)
        self.conv1 = ref_model.conv1
        self.bn1 = ref_model.bn1
        self.relu = ref_model.relu
        # TODO: Continue network setup here
        # TODO: Replace fc layer with your own linear layer that outputs 196 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        # TODO: Continue feeding output through all layers of the network
        return x

Take all the pretrained layers from ResNet18, but then define your own last fully connected layer. Then write the appropriate forward pass function.

**Note:** ResNet was trained with images that are normalized according to the ImageNet color averages. This means you may want to include an appropriate normalization in your Dataset class if you did not already. This can easily be done with PyTorch's transform objects.

```Python
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
```
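
For intuition about what this normalization does, the framework-free sketch below applies the same per-channel arithmetic, `(value - mean) / std`, to a single RGB pixel with values in the range 0 to 1. The helper `normalize_pixel` is hypothetical and only illustrates the calculation; in your Dataset class you would apply the transform to the whole image tensor instead.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Normalize one [r, g, b] pixel channel by channel."""
    return [(v - m) / s for v, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)]

# A pixel equal to the ImageNet mean color maps to zero in every channel.
mean_pixel = normalize_pixel(IMAGENET_MEAN)
# A mid-gray pixel lands slightly above zero in every channel.
mid_gray = normalize_pixel([0.5, 0.5, 0.5])

print(mean_pixel)  # [0.0, 0.0, 0.0]
print(mid_gray)
```

Note that the normalized values are no longer restricted to the range 0 to 1, which is expected.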

# Part C: Training

Now that you can access your data and you have a network architecture set up, it's time to put things together and start training.  Training requires two major components: 1) the loss function and 2) the optimizer.  The loss function is a comparison between your results and the ground truth data.  The optimizer takes the gradients obtained by backpropagating the loss and uses them to update the network weights in an attempt to decrease the loss.  The most common loss function used for classification is [Cross Entropy](https://pytorch.org/docs/stable/nn.html#crossentropyloss), while the most commonly used optimizer is [Adam](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam).

A basic training step might take the following form:

In [None]:
import torch

training_dataset = MyDataset("train_annos.json")
train_loader = DataLoader(training_dataset, batch_size=1, shuffle=True)

model = MyNetwork()

learning_rate = 1e-3
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
loss_func = torch.nn.CrossEntropyLoss()

for images, labels in train_loader:
    optimizer.zero_grad()            # clear gradients from the previous step
    outs = model(images)             # forward pass
    loss = loss_func(outs, labels)   # compare predictions with the ground truth
    loss.backward()                  # backpropagate the error
    optimizer.step()                 # update the network weights

For deliverables on this section, modify the above code to store the loss after every few passes (use `loss.item()` to get the value without the tensor info). Then display a plot of the value of the loss over time.  If things are working, the loss should be decreasing.
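
With a batch size of 1 the recorded loss can be quite noisy, so a smoothed curve is often easier to read. The sketch below shows one possible way to keep every few loss values and smooth them with a trailing moving average. The names `log_every` and `moving_average` and the loss values are made up for illustration; in your training loop the values would come from `loss.item()`.

```python
def moving_average(values, window):
    """Average each value with up to `window - 1` values before it."""
    smoothed = []
    running_sum = 0.0
    for i, v in enumerate(values):
        running_sum += v
        if i >= window:
            running_sum -= values[i - window]  # drop the value that left the window
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed

log_every = 2  # hypothetical logging interval
fake_losses = [5.0, 4.0, 4.5, 3.0, 3.5, 2.0, 2.5, 1.0]  # stand-ins for loss.item()

# Keep only every `log_every`-th loss value.
loss_history = [loss for step, loss in enumerate(fake_losses) if step % log_every == 0]
smoothed = moving_average(loss_history, window=2)

print(loss_history)  # [5.0, 4.5, 3.5, 2.5]
print(smoothed)      # [5.0, 4.75, 4.0, 3.0]
```

Either list can then be handed to a plotting call such as matplotlib's `plt.plot` to produce the required figure.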

You may also choose to run your training loop multiple times. Each full pass of the training loop over the training data is called an **epoch**.

**Note**: This step could take several hours so you will want to look into being able to save your model to a file and load it up again.
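
One common way to save and restore a model is through its `state_dict`, as in the minimal sketch below. The small `nn.Linear` layer stands in for your network, and the file name `model_checkpoint.pt` and the temporary directory are placeholders chosen for this example.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # small stand-in for your own network

with tempfile.TemporaryDirectory() as tmp:
    checkpoint_path = os.path.join(tmp, "model_checkpoint.pt")  # hypothetical file name

    # Save only the learned parameters, not the whole Python object.
    torch.save(model.state_dict(), checkpoint_path)

    # To restore, build a model with the same architecture and load the weights.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(checkpoint_path))

weights_match = torch.equal(model.weight, restored.weight)
print(weights_match)  # True
```

In practice you would save to a permanent path, for example once per epoch, so that an interrupted run can be resumed without retraining from scratch.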

**Note**: Mac computers sometimes create `.DS_Store` files inside of directories. If your training or testing loop breaks at a random time, it may be trying to load a hidden `.DS_Store` file that it thinks is an image. Since these files are hidden in the Finder window, you will need to use the terminal to navigate to the data folder and run the command `rm .DS_Store`.
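
An alternative to deleting hidden files by hand is to skip them when you build your list of images. The sketch below is one possible approach; the helper name `list_image_files` and the accepted extensions are assumptions for illustration, and the temporary folder only simulates a data directory.

```python
import os
import tempfile

def list_image_files(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted image file names, skipping hidden files such as .DS_Store."""
    return sorted(
        name for name in os.listdir(folder)
        if not name.startswith(".") and name.lower().endswith(extensions)
    )

# Simulate a data folder that contains a stray .DS_Store file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["00002.jpg", "00001.jpg", ".DS_Store"]:
        open(os.path.join(tmp, name), "w").close()
    image_files = list_image_files(tmp)

print(image_files)  # ['00001.jpg', '00002.jpg']
```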

**Tip**: If you want a good sense of how long your code is going to take to run, use the `tqdm` class to time the `for` loop.

# Part D: Testing

One of the goals of deep learning is to make a model that generalizes to data it has never seen (e.g. new images of cars).  For this part, you will test how well your model generalizes by running it on a dataset it has not seen during training.  To do so, you will need to make sure you are not calculating any gradients by using `torch.no_grad` in a `with` statement. You will also need to put the network into evaluation mode:
```Python
model.eval()
with torch.no_grad():
    # enter testing code here
```

To put the network back into training mode, call `model.train()`.

You will compare your predictions with the ground truth, value for value.  The output of your network, however, will be a vector of length 196 (the number of possible classes for the cars dataset) with the **largest** value representing the guessed class.  You'll need to extract the guessed class number and compare it with the ground truth number for all images in the test dataset and calculate the overall accuracy.  Print out the overall accuracy your model got.
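
The arithmetic behind this comparison can be sketched without any framework, as below. The scores and labels are made up (4 images and 3 classes rather than 196), and the helper names `predicted_class` and `accuracy` are hypothetical.

```python
def predicted_class(scores):
    """Index of the largest score, i.e. the guessed class."""
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy(all_scores, true_labels):
    """Fraction of images whose guessed class matches the ground truth."""
    correct = sum(predicted_class(s) == t for s, t in zip(all_scores, true_labels))
    return correct / len(true_labels)

# Made-up network outputs for 4 images and 3 classes.
demo_scores = [
    [0.1, 2.3, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 0.9],
    [0.7, 0.6, 0.5],
]
demo_labels = [1, 0, 2, 1]

demo_accuracy = accuracy(demo_scores, demo_labels)
print(demo_accuracy)  # 0.75
```

With PyTorch tensors, the same guessed class can be obtained with `outputs.argmax(dim=1)` on a batch of network outputs.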

While high test accuracy is not the only goal in this lab, most students are able to get above 50% in their testing accuracy. If you are unable to reach this level of accuracy, it may indicate an error in your code.

In [None]:
model.eval()
with torch.no_grad():
    # enter testing code here
    pass

## Grading
Points for this assignment will be assigned as follows (100 points total):
* [30 pts] Making a Dataset class
* [10 pts] Setting up your architecture
* [30 pts] Training your model and plotting training loss
* [20 pts] Displaying the overall accuracy of your model

The last 10 points are earned through completing a subset of the following explorations:
* [10 pts] Increase the batch size of your training network. To do so, you will need to guarantee that all images in a batch have the same spatial resolution, or the batch will not run. Thus, you will need to add random crop data augmentation, but make sure the crop is not too small; otherwise you might miss parts of the car. Describe the effects an increased batch_size had on training.
* [10 pts] Modify your code to include GPU acceleration using .cuda calls in the appropriate places. If you have an M1 or M2 processor Mac, you can modify your code to have MPS acceleration. Describe the speed-ups to the training loop after implementing acceleration.
* [10 pts] Enhance your dataloader to include reflection data augmentation (i.e. double the size of your training data by taking the mirror image across the y-axis). **DO NOT** do reflection augmentation across the x-axis (we don't care to detect cars when they are upside down!). You may also add other augmentations. Describe what effects the augmentation had on testing accuracy.
* [20 pts] Generate a confusion matrix of the 196 categories. A confusion matrix shows how often a specific category is guessed as each other category (you can search for example plots online). For example, the 11th row and 34th column in the matrix should tell you how many times category 11 images were guessed to be category 34 images. Thus, a perfect predictor on the test set would have nonzero values only along the diagonal. Once you have generated the confusion matrix, you may simply plot it as a grayscale image (with interpolation turned off).
* [10 pts] Analyze the effect of learning rates on the accuracy of the network. Describe what you found and give supporting plots.
* [10 pts] Analyze the effect of epochs on the accuracy of the network. Describe what you found and give supporting plots.
* [10 pts] Analyze the effect of batch_size on the accuracy of the network. Describe what you found and give supporting plots.
* [10 pts] Analyze the effect of varying optimizers on the accuracy of the network. Describe what you found and give supporting plots. A list of optimizers in PyTorch can be found [here](https://pytorch.org/docs/stable/optim.html).
* [10 pts] Analyze the effect of different pretrained networks on the accuracy of the network. Describe what you found and give supporting plots. A list of pretrained networks in PyTorch can be found [here](https://pytorch.org/docs/stable/torchvision/models.html).
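
For the confusion matrix exploration, the bookkeeping can be sketched in plain Python as shown below. The labels are made up and use 3 classes instead of 196, and the helper name `confusion_matrix` is chosen for this example; rows are ground truth categories and columns are guessed categories, matching the description in the list above.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """matrix[t][p] counts how often category t was guessed as category p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

# Made-up ground truth and predictions for 6 images and 3 classes.
demo_true = [0, 0, 1, 2, 2, 2]
demo_pred = [0, 1, 1, 2, 2, 0]

cm = confusion_matrix(demo_true, demo_pred, num_classes=3)
for row in cm:
    print(row)
# [1, 1, 0]
# [0, 1, 0]
# [1, 0, 2]
```

The diagonal entries count the correct guesses, so their sum divided by the number of images gives the overall accuracy. The matrix could then be displayed with a call such as matplotlib's `plt.imshow(cm, cmap="gray", interpolation="nearest")`.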

You may earn up to 20 points extra credit for additional explorations you complete.

An additional 15 points of extra credit will be given to the individual with the highest test accuracy.

Please describe which explorations you completed in the markdown cell below (or in the comments on Canvas).