# Mini Project 2

By the deadline, please submit the provided Jupyter notebook with all required tasks completed and clearly solved. Make sure your code is neat and well-commented, and that all outputs are visible (run all cells before saving). Notebooks with missing tasks or unexecuted cells may receive fewer points. After you submit, you won't be able to make changes, so double-check your work and be sure to start from the provided template.

## Submission rules
As already discussed in class, we will stick to the following rules.
- Use the templates and name your files `NAME_SURNAME.ipynb` (If you have more than one name, just concatenate them). We will compare what you present with that file. 
- Code either not written in Python or not using PyTorch receives a grade of 0. Of course, you can use auxiliary packages when needed (`matplotlib`, `numpy`, ...), but for the learning part, you must use PyTorch.
- If plagiarism is suspected, the TAs and I will thoroughly investigate the situation, and we will summon the student for a face-to-face clarification of certain answers they provided. In case of plagiarism, a score reduction will be applied to everyone involved, depending on their level of involvement.
- If extensive usage of AI tools is detected, we will summon the student for a face-to-face clarification of certain answers they provided. If the student cannot adequately support those answers in person, we will apply a penalty to the evaluation, ranging from 10% to 100%.

## Image classification using a CNN

The CIFAR-10 dataset is a widely used collection of images in the field of computer vision. It consists of 60,000 32x32 color images (50,000 for training and 10,000 for testing) across 10 different classes, with each class containing 6,000 images. These classes include common objects such as airplanes, automobiles, cats, and dogs. CIFAR-10 serves as a benchmark for image classification tasks and has been instrumental in developing and evaluating machine learning algorithms for image recognition.
The task of this assignment is to classify the images.

In [1]:
import matplotlib.pyplot as plt
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch

### Task 0 (0 pts)
Understand what's going on in the two cells below. Do the following experiments: 
1. Run the first and the second cell in a row.
2. Run the first cell once, then run the second cell twice in a row.

What do you observe? We don't need any answer, but please just keep this phenomenon in mind. 

In [39]:
torch.manual_seed(0)

<torch._C.Generator at 0x7f858a1e8890>

In [42]:
torch.randint(1, 10 , (1,1))

tensor([[3]])

Now, we'll set a seed that will remain the same for all the code below.
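As a minimal sketch of how the global seed behaves (the tensor shapes and variable names here are illustrative, not part of the template): re-seeding resets PyTorch's global random generator, so the same sequence of draws is replayed from the start.

```python
import torch

# Seed once, then draw twice: the second draw continues the sequence.
torch.manual_seed(0)
first = torch.randint(1, 10, (3,))
second = torch.randint(1, 10, (3,))

# Seed again with the same value: the sequence restarts from the beginning.
torch.manual_seed(0)
replay = torch.randint(1, 10, (3,))

print(torch.equal(first, replay))
```

This is why running a seeded cell and then re-running only a later cell does not necessarily reproduce the earlier numbers.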

### Task 1 (2.5 pts)
Load the [data (you can do it directly in PyTorch)](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) and take some time to inspect the dataset. Show at least one image per class and a histogram of the class distribution for both the training and the test set.
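One possible way to obtain the numbers behind the histogram is to count the labels per class. The sketch below uses a toy label list as a stand-in for the dataset's labels (an assumption; the helper name `class_counts` is also hypothetical), so that it runs without downloading anything.

```python
import torch

def class_counts(targets, num_classes=10):
    """Count how many samples fall in each class."""
    labels = torch.as_tensor(targets, dtype=torch.long)
    return torch.bincount(labels, minlength=num_classes)

# Toy labels standing in for a dataset's label list (assumption: integers in 0..9).
toy_targets = [0, 1, 1, 2, 9, 9, 9]
counts = class_counts(toy_targets)
print(counts.tolist())
```

The resulting counts can then be passed to a bar plot, for example `plt.bar(range(10), counts.tolist())`.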

### Task 2 (2.5 pts)
Are the entries of the correct type for the DL framework in PyTorch? How can you arrive at a suitable format for your training pipeline? Be sure you have understood:
- The type of each element of the dataset 
- How we can convert it to a suitable type.
- The dimension of the image as a `torch.Tensor` object
- The meaning of each dimension of the images 

### Task 3 (10 pts)
When you arrive at this question, you should have each entry as a `torch.Tensor` of shape (3, 32, 32) and be clear about the meaning of each dimension. A good practice in DL is to work with features that have mean 0 and standard deviation 1. Convert the image dataset to this format. You can do this from scratch (not recommended) or use the function [`torchvision.transforms.Normalize`](https://pytorch.org/vision/0.8/transforms.html\#torchvision.transforms.Normalize). If you go for the second option, don't forget that we have already transformed our dataset in the previous task; hence, the function [`transforms.Compose`](https://pytorch.org/vision/0.8/transforms.html\#torchvision.transforms.Compose) could help. You can see an example in the tutorial linked above.

**Note** that the usage of the function is tricky: try to really understand what's going on.

### Task 4 (1 pt)
As you might have observed, we only have a train and test set. We need a validation set for hyperparameter tuning. Create a validation set by splitting the test set. 
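A minimal sketch of such a split using `torch.utils.data.random_split`, assuming a 50/50 ratio (the ratio is an arbitrary choice here) and a toy dataset standing in for the real test set.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset of 100 samples standing in for the test set (assumption).
toy_test = TensorDataset(torch.arange(100).float().unsqueeze(1), torch.arange(100) % 10)

# A dedicated generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(0)
val_set, test_set = random_split(toy_test, [50, 50], generator=generator)

print(len(val_set), len(test_set))
```

The two subsets are disjoint, so no validation sample is reused when reporting the final test accuracy.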

### Task 5 (20 pts)
Starting from the code provided during Lecture 6, define a ConvNet. You can **only** use:

- Convolutional layers  
- Max/Avg Pooling layers  
- Activation Functions  
- Fully connected layers  

Note that the choice `Conv - Pool - Conv` is not mandatory. Have you tried `Conv - Conv - Activ - Pool - Conv - Conv - Activ - Pool - FC`?  
For each convolutional layer, you can choose padding and stride. Be prepared to comment on the choices of the dimensions of the layers.
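When reasoning about layer dimensions, it can help to check the standard output-size formula, `floor((size + 2 * padding - kernel_size) / stride) + 1`, against what PyTorch actually returns. The sketch below does this for one convolution and one pooling layer; the channel counts and kernel sizes are arbitrary illustrative choices, and the helper `conv_output_size` is hypothetical.

```python
import torch
import torch.nn as nn

def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Illustrative layers; channel counts and kernel sizes are arbitrary choices.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.zeros(1, 3, 32, 32)  # one CIFAR-10-sized input
after_conv = conv(x)
after_pool = pool(after_conv)

print(after_conv.shape, conv_output_size(32, 3, stride=1, padding=1))
print(after_pool.shape, conv_output_size(32, 2, stride=2))
```

The same calculation gives the number of input features of the first fully connected layer: channels times height times width of the last feature map.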

### Task 6 (35 pts)
Implement the training pipeline. Make sure to code the following:

- Print and record the current training loss and accuracy every *n* steps (choose *n*)  
- Print and record the current validation loss and accuracy every *n* steps (choose *n*)  

The validation loss will help you in hyperparameter tuning. Train your model.  
With my choice of hyperparameters, the best test accuracy is above **65%**; hence, a necessary condition to get full marks is to achieve an accuracy on the test set greater than or equal to **65%**.

You may want to follow some of these hints (**They are, of course, dependent on the architecture, optimizer, ...**):

- 5 epochs should be enough  
- Batch size = 32 could be a good starting point  
- A learning rate around 0.03 should be a good tradeoff between speed and stability  
- SGD should be a good optimizer  

Be prepared to answer the following questions:
- Why that learning rate? Why not higher/smaller?  
- Number of epochs: Why that number?  
- Some comments on your architecture: why that deep? why not deeper? why not smaller?  
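The logging requirements above could be sketched roughly as follows. Everything here is a stand-in chosen for illustration: the data is random noise, the model is a single linear layer rather than a ConvNet, and `log_every`, `history`, `make_loader`, and `evaluate` are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

def make_loader(n):
    """Random data standing in for the real CIFAR-10 loaders (assumption)."""
    data = TensorDataset(torch.randn(n, 3, 32, 32), torch.randint(0, 10, (n,)))
    return DataLoader(data, batch_size=32, shuffle=True)

train_loader, val_loader = make_loader(256), make_loader(64)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder model
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.03)

def evaluate(model, loader):
    """Average loss and accuracy over a loader, without tracking gradients."""
    model.eval()
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            logits = model(inputs)
            total_loss += criterion(logits, labels).item() * labels.size(0)
            correct += (logits.argmax(dim=1) == labels).sum().item()
            count += labels.size(0)
    model.train()
    return total_loss / count, correct / count

log_every = 4  # the "n" from the task; value chosen arbitrarily here
history = {"step": [], "train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}
step, running_loss, running_correct, running_count = 0, 0.0, 0, 0

for epoch in range(2):
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        logits = model(inputs)
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()

        step += 1
        running_loss += loss.item() * labels.size(0)
        running_correct += (logits.argmax(dim=1) == labels).sum().item()
        running_count += labels.size(0)

        if step % log_every == 0:
            val_loss, val_acc = evaluate(model, val_loader)
            history["step"].append(step)
            history["train_loss"].append(running_loss / running_count)
            history["train_acc"].append(running_correct / running_count)
            history["val_loss"].append(val_loss)
            history["val_acc"].append(val_acc)
            print(f"step {step}: train loss {history['train_loss'][-1]:.3f}, "
                  f"val loss {val_loss:.3f}, val acc {val_acc:.3f}")
            # Reset the running averages so each record covers the last n steps.
            running_loss, running_correct, running_count = 0.0, 0, 0
```

Switching the model to `eval()` during validation and back to `train()` afterwards matters once layers such as Dropout or Batch Normalization are added.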

### Task 7 (4 pts)
Plot the evolution of the training and validation losses on the same graph. On the x-axis, you should preferably use the number of steps. Do you observe signs of overfitting or underfitting? Note that an improperly trained model will also result in a point deduction for the previous exercise.
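A minimal plotting sketch, assuming the losses were recorded in lists during training. The numbers below are made up purely to show the layout and are not results.

```python
import matplotlib.pyplot as plt

# Made-up values standing in for the losses recorded during training (assumption).
steps = [100, 200, 300, 400, 500]
train_losses = [2.1, 1.7, 1.4, 1.2, 1.0]
val_losses = [2.0, 1.7, 1.5, 1.45, 1.5]

# Both curves share one set of axes so they can be compared step by step.
fig, ax = plt.subplots()
ax.plot(steps, train_losses, label="training loss")
ax.plot(steps, val_losses, label="validation loss")
ax.set_xlabel("step")
ax.set_ylabel("loss")
ax.legend()
```

In a notebook the figure is rendered when the cell finishes; in a script, call `plt.show()`. A validation curve that starts rising while the training curve keeps falling, as in these made-up numbers, is the typical sign of overfitting.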

### Task 8 (20 pts)
You may have noticed that the model often overfits, which prevents it from going beyond 65% accuracy.  

Change the architecture as you like and try to increase the accuracy as much as possible. The base architecture must remain a ConvNet.  
Try any idea that comes to your mind but **justify it**.

Some hints:

- Add Dropout (Any other hyperparameter to tune?)  
- Change activation functions ([GELU](https://arxiv.org/pdf/1606.08415v3.pdf) is known to work well with images)  
- Make your CNN deeper  
- Add some regularization techniques  
- Change the optimizer  
- Data augmentation  
- Batch normalization  
- …  
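As an illustration of how a few of these hints can be combined, here is a sketch of a single convolutional block with Batch Normalization, GELU, and Dropout. The ordering of the layers, the channel counts, and the dropout probability are assumptions chosen for the example, not a recommended configuration, and `conv_block` is a hypothetical helper.

```python
import torch
import torch.nn as nn

def conv_block(in_channels, out_channels, dropout_p=0.25):
    """Conv -> BatchNorm -> GELU -> Pool -> Dropout; all sizes are illustrative."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.GELU(),
        nn.MaxPool2d(2),
        nn.Dropout(p=dropout_p),
    )

block = conv_block(3, 32)
x = torch.randn(8, 3, 32, 32)

block.train()          # dropout active, batch statistics used
train_out = block(x)
block.eval()           # dropout disabled, running statistics used
eval_out = block(x)

print(train_out.shape, eval_out.shape)
```

Because Dropout and Batch Normalization behave differently in training and evaluation mode, forgetting `model.eval()` before validation or testing changes the reported accuracy.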

#### Points distribution:

- Any successful try (test accuracy increase): 6 points  
- Accuracy on test set ≥ 70% : 7 points  
- Accuracy on test set ≥ 72% : 8 points  
- Accuracy on test set ≥ 73% : 9 points  
- …  
- Accuracy on test set ≥ 78% : 15 points  

To get the remaining 5 points, you should also be able to justify your choices (e.g., “I used Dropout because … and hence I have increased epochs as I noted …”).  

**Note:** Transfer learning is **not allowed** — otherwise you can complete this exercise easily, and it is not useful for educational purposes.

### Task 9 (10 pts)

Until now, we asked you to keep the seed fixed.  
Train **the model you defined in Task 5** five times with 5 different seeds (already specified below, **don't change them**), each for the same number of epochs.  
Don't change other hyperparameters. Observe the accuracy of the test set in each case. What can you say?  

In [4]:
for seed in range(5):
    torch.manual_seed(seed)
    print(f"Testing seed {seed}", flush = True)

Testing seed 0
Testing seed 1
Testing seed 2
Testing seed 3
Testing seed 4
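One way to summarize the five runs is to collect the test accuracies and report their mean and standard deviation. In the sketch below, `train_and_evaluate` is a hypothetical placeholder that returns a dummy number; it would need to be replaced by the actual training and evaluation of the Task 5 model.

```python
import statistics
import torch

def train_and_evaluate(seed):
    """Hypothetical placeholder: replace with real training that returns test accuracy."""
    torch.manual_seed(seed)
    return torch.rand(1).item()  # dummy number, NOT a real accuracy

# Same seeds as in the cell above; one accuracy per seed.
accuracies = [train_and_evaluate(seed) for seed in range(5)]
print(f"mean {statistics.mean(accuracies):.3f}, std {statistics.stdev(accuracies):.3f}")
```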


## Questions

During the presentation, we may ask questions to ensure you have understood the core concepts of the course. Examples include:
1. Where are the parameters that a CNN learns during training?
2. Why do we use convolutional layers instead of fully connected layers for image data?
3. What is the role of pooling layers? What would happen if we removed them?
4. What is Dropout? Batch Normalization? Do you have other regularization techniques in mind?