## Assignment 3: Dealing with overfitting

Today we work with the [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) (*hint: it is available in `torchvision`*).

Your goal for today:
1. Train a FC (fully-connected) network that achieves >= 0.885 test accuracy.
2. Cause considerable overfitting by modifying the network (e.g. increasing the number of network parameters and/or layers) and demonstrate it in an appropriate way (e.g. plot loss and accuracy on the train and validation sets w.r.t. network complexity).
3. Try to deal with overfitting (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results.

__Please write a small report describing your ideas, attempts and achieved results at the end of this file.__

*Note*: Tasks 2 and 3 are interrelated: in task 3 your goal is to make the network from task 2 less prone to overfitting. Task 1 is independent of tasks 2 and 3.

*Note 2*: We recommend using Google Colab or another machine with GPU acceleration.

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torchsummary
from IPython.display import clear_output
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
import numpy as np
import os


device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

In [None]:
# Technical function
def mkdir(path):
    # Create `path` unless it already exists
    if not os.path.exists(path):
        os.mkdir(path)
        print('Directory', path, 'is created!')
    else:
        print('Directory', path, 'already exists!')

root_path = 'fmnist'
mkdir(root_path)

In [None]:
download = True
# Convert PIL images to float tensors with values in [0, 1]
train_transform = transforms.ToTensor()
test_transform = transforms.ToTensor()


fmnist_dataset_train = torchvision.datasets.FashionMNIST(root_path, 
                                                        train=True, 
                                                        transform=train_transform,
                                                        target_transform=None,
                                                        download=download)
fmnist_dataset_test = torchvision.datasets.FashionMNIST(root_path, 
                                                       train=False, 
                                                       transform=test_transform,
                                                       target_transform=None,
                                                       download=download)

In [None]:
from torch.utils.data import random_split

# Hold out 20% of the training set for validation
train_data_size = int(.8 * len(fmnist_dataset_train))
validation_data_size = len(fmnist_dataset_train) - train_data_size

train_dataset, validate_dataset = random_split(
    fmnist_dataset_train,
    [train_data_size, validation_data_size]
)
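`random_split` draws a fresh permutation on every run, so the validation set changes between notebook restarts. A minimal sketch of a reproducible split, assuming a seeded `torch.Generator`; the helper `seeded_split` and the toy dataset are hypothetical names used only for illustration:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical stand-in for the training set: 100 fake samples.
toy_dataset = TensorDataset(
    torch.arange(100).float().unsqueeze(1),
    torch.zeros(100, dtype=torch.long),
)

def seeded_split(dataset, train_fraction=0.8, seed=0):
    """Split `dataset` into train/validation parts with a fixed random seed."""
    n_train = int(train_fraction * len(dataset))
    n_val = len(dataset) - n_train
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val], generator=generator)

train_a, val_a = seeded_split(toy_dataset)
train_b, val_b = seeded_split(toy_dataset)

# The same seed gives the same indices, and the two parts never overlap.
print(len(train_a), len(val_a))
print(list(train_a.indices) == list(train_b.indices))
```

In the notebook the same call would take `fmnist_dataset_train` instead of the toy dataset.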

In [None]:
train_loader = torch.utils.data.DataLoader(fmnist_dataset_train, 
                                           batch_size=128,
                                           shuffle=True,
                                           num_workers=2)
test_loader = torch.utils.data.DataLoader(fmnist_dataset_test,
                                          batch_size=256,
                                          shuffle=False,
                                          num_workers=2)

In [None]:
len(fmnist_dataset_test)

In [None]:
for img, label in train_loader:
    # print('img:',img)
    print(img.shape)
    # print('label',label)
    print(label.shape)
    print(type(label))
    print(label.size(0))
    break

### Task 1
Train a network that achieves $\geq 0.885$ test accuracy. It's fine to use only Linear (`nn.Linear`) layers and activations/dropout/batchnorm. Convolutional layers would be very useful here, but we will meet them a bit later.

In [None]:
class TinyNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Your network structure comes here
            nn.Linear(input_shape,400),
            nn.ReLU(),
            nn.Linear(400, 300),
            nn.Tanh(),
            nn.Dropout(p=.23,inplace= False),
            nn.Linear(300, 100),
            nn.ReLU(),
            nn.Linear(100, 10),
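            # Note: this final Tanh squashes the logits into [-1, 1], which limits how confident
            # the softmax inside CrossEntropyLoss can get and keeps the loss from approaching 0
            nn.Tanh()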
        )
        
    def forward(self, inp):       
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(TinyNeuralNetwork().to(device), (28*28,))

Your experiments come here:

In [None]:
new_train_loader = torch.utils.data.DataLoader( train_dataset, 
                                                batch_size=128,
                                                shuffle=True,
                                                num_workers=2
                                              )
new_validation_loader = torch.utils.data.DataLoader( validate_dataset, 
                                                     batch_size=128,
                                                     shuffle=True,
                                                     num_workers=2
                                                   )

In [None]:
new_validation_loader

In [31]:
from torch import optim
from torch.nn import CrossEntropyLoss, NLLLoss
from torch.utils.data import Subset, DataLoader, random_split
import time

model = TinyNeuralNetwork().to(device)
optimizer = optim.AdamW(model.parameters(), lr=3e-4)
loss_function = nn.CrossEntropyLoss()
# loss_function = NLLLoss()

# *** Train experiment ***

def exp1_train_the_model(model, train_loader, validation_loader, loss_function, batches=40):
    """Train `model` for `batches` full passes (epochs) over `train_loader`.

    Note: the weights are updated through the module-level `optimizer`, so it
    must be bound to an optimizer built from this model's parameters.
    """
    # Send every batch to whichever device the model lives on
    model_device = next(model.parameters()).device

    train_loss = []
    validation_loss = []
    validation_accuracy = []

    # One "batch" here is one full pass over the training data
    for b in range(batches):
        b_train_loss = []
        b_validation_loss = []
        b_validation_accuracy = []
        timer = time.time()

        model.train(True)
        for x, y in train_loader:
            x, y = x.to(model_device), y.to(model_device)
            training_loss = loss_function(model(x), y)
            training_loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            b_train_loss.append(training_loss.item())

        model.train(False)
        with torch.no_grad():
            for x, y in validation_loader:
                x, y = x.to(model_device), y.to(model_device)
                test_prediction = model(x)
                testing_loss = loss_function(test_prediction, y)
                # Store a Python float, not a tensor, so np.mean works on any device
                b_validation_loss.append(testing_loss.item())
                test_prediction_max = test_prediction.argmax(dim=1)
                b_validation_accuracy.append((test_prediction_max == y).to(torch.float32).mean().item())
        print(f'Batch {b + 1} of {batches} took {time.time() - timer:.3f}s')

        train_loss.append(np.mean(b_train_loss))
        validation_loss.append(np.mean(b_validation_loss))
        validation_accuracy.append(np.mean(b_validation_accuracy))

        print(f"\t  training loss: {train_loss[-1]:.6f}")
        print(f"\tvalidation loss: {validation_loss[-1]:.6f}")
        print(f"\tvalidation accuracy: {validation_accuracy[-1]:.3f}")

    return train_loss, validation_loss, validation_accuracy

# *** Plot results of Experiment ***
def plot_exp1(train_loss, validation_loss, validation_accuracy):
    fig, axes = plt.subplots(1, 2, figsize=(10, 6))

    axes[0].set_title('Loss')
    axes[0].plot(train_loss, label='Train Loss')
    axes[0].plot(validation_loss, label='Validation Loss')
    axes[0].set_xlabel('Epoch')
    axes[0].legend()

    axes[1].set_title('Validation Accuracy')
    # A label is required, otherwise legend() has nothing to show
    axes[1].plot(validation_accuracy, label='Validation Accuracy')
    axes[1].set_xlabel('Epoch')
    axes[1].legend()

num_epochs = 40  # number of full passes over the training split
train_loss, validation_loss, validation_accuracy = exp1_train_the_model(
    model,
    new_train_loader,
    new_validation_loader,
    loss_function,
    num_epochs,
)

# Your experiments, training and validation loops here

Batch 1 of 40 took 19.462s
	  training loss: 1.262543
	validation loss: 1.101728
	validation accuracy: 0.765
Batch 2 of 40 took 19.662s
	  training loss: 1.053084
	validation loss: 1.023764
	validation accuracy: 0.836
Batch 3 of 40 took 19.614s
	  training loss: 1.012431
	validation loss: 0.995610
	validation accuracy: 0.857
Batch 4 of 40 took 19.645s
	  training loss: 0.993548
	validation loss: 0.997557
	validation accuracy: 0.855
Batch 5 of 40 took 18.906s
	  training loss: 0.981815
	validation loss: 0.990174
	validation accuracy: 0.858
Batch 6 of 40 took 20.779s
	  training loss: 0.974015
	validation loss: 0.975938
	validation accuracy: 0.870
Batch 7 of 40 took 23.943s
	  training loss: 0.964891
	validation loss: 0.967490
	validation accuracy: 0.872
Batch 8 of 40 took 24.363s
	  training loss: 0.958195
	validation loss: 0.968061
	validation accuracy: 0.872
Batch 9 of 40 took 16.517s
	  training loss: 0.955048
	validation loss: 0.964873
	validation accuracy: 0.877
Batch 10 of 40 took

In [None]:
plot_exp1(train_loss, validation_loss, validation_accuracy)
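Task 1 is stated in terms of test accuracy, whereas the training loop reports validation accuracy averaged per batch. A minimal sketch of a sample-weighted accuracy measurement follows; `evaluate_accuracy` and the toy tensors are hypothetical names introduced for illustration, not part of the assignment template.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, loader):
    """Return the fraction of correctly classified samples over the whole loader."""
    model_device = next(model.parameters()).device
    was_training = model.training
    model.eval()  # disable Dropout and use BatchNorm running statistics
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(model_device), y.to(model_device)
            predictions = model(x).argmax(dim=1)
            correct += (predictions == y).sum().item()
            total += y.size(0)
    model.train(was_training)  # restore the previous mode
    return correct / total

# Toy check: an identity "classifier" on four 2-D points, one label wrong on purpose.
toy_x = torch.tensor([[1.0, 0.0], [0.0, 1.0], [2.0, 0.5], [0.1, 3.0]])
toy_y = torch.tensor([0, 1, 0, 0])
toy_model = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    toy_model.weight.copy_(torch.eye(2))

toy_loader = DataLoader(TensorDataset(toy_x, toy_y), batch_size=3)
toy_accuracy = evaluate_accuracy(toy_model, toy_loader)
print(toy_accuracy)  # 3 of 4 samples are correct -> 0.75
```

With the notebook's objects the call would be `evaluate_accuracy(model, test_loader)`. Counting correct samples, rather than averaging per-batch accuracies, avoids over-weighting a smaller final batch.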

### Task 2: Overfit it.
Build a network that will overfit to this dataset. Demonstrate the overfitting in an appropriate way (e.g. plot loss and accuracy on the train and test sets w.r.t. network complexity).

*Note:* you also might decrease the size of `train` dataset to enforce the overfitting and speed up the computations.
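One way to shrink the training set, sketched under the assumption that a random subset of fixed size is acceptable. The helper `make_small_subset` and the random toy tensors are hypothetical; in the notebook the first argument would be `train_dataset`.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

def make_small_subset(dataset, n_samples, seed=0):
    """Pick `n_samples` distinct random items from `dataset`."""
    generator = torch.Generator().manual_seed(seed)
    indices = torch.randperm(len(dataset), generator=generator)[:n_samples].tolist()
    return Subset(dataset, indices)

# Hypothetical stand-in with the same tensor shapes as Fashion-MNIST images.
toy_dataset = TensorDataset(
    torch.randn(1000, 1, 28, 28),
    torch.randint(0, 10, (1000,)),
)

small_train = make_small_subset(toy_dataset, n_samples=100)
small_loader = DataLoader(small_train, batch_size=32, shuffle=True)

x, y = next(iter(small_loader))
print(len(small_train), x.shape, y.shape)
```

Fewer training samples make it easier for a large network to memorize them, so the gap between train and validation loss should open up sooner.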

In [None]:
class OverfittingNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Your network structure comes here
            nn.Linear(input_shape, 2000),
            nn.Tanh(),
            nn.Linear(2000, 1000),
            nn.Tanh(),
            nn.Linear(1000, 500),
            nn.Tanh(),
            nn.Linear(500, 300),
            nn.Tanh(),
            nn.Linear(300, 100),
            nn.Tanh(),
            nn.Linear(100, num_classes)
        )
        
    def forward(self, inp):   
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(OverfittingNeuralNetwork().to(device), (28*28,))
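As a cross-check on the summary, the parameter count of a stack of `nn.Linear` layers can be computed by hand: each layer holds `(n_in + 1) * n_out` values (weights plus one bias per output). A small sketch, where `count_fc_parameters` is a hypothetical helper and the size lists are read off the two network definitions in this notebook:

```python
def count_fc_parameters(layer_sizes):
    """Number of weights and biases in a chain of fully-connected layers."""
    return sum(
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Layer widths of TinyNeuralNetwork and OverfittingNeuralNetwork.
tiny_sizes = [784, 400, 300, 100, 10]
overfit_sizes = [784, 2000, 1000, 500, 300, 100, 10]

tiny_params = count_fc_parameters(tiny_sizes)
overfit_params = count_fc_parameters(overfit_sizes)

print(tiny_params)     # 465410
print(overfit_params)  # 4252910
print(round(overfit_params / tiny_params, 1))  # 9.1
```

The same helper gives a convenient x-axis value when plotting loss against network complexity.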

In [None]:
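overfit_model = OverfittingNeuralNetwork().to(device)
# Optimize this model's own parameters (not those of the Task 1 `model`)
overfit_optimizer = torch.optim.Adam(overfit_model.parameters(), lr=3e-4)
# The network outputs raw logits, so use CrossEntropyLoss; NLLLoss expects log-probabilities
overfit_loss_function = nn.CrossEntropyLoss()

# exp1_train_the_model steps the module-level `optimizer`, so rebind it before training
optimizer = overfit_optimizer

# Your experiments, come here
num_epochs = 80
overfit_train_loss, overfit_validation_loss, overfit_validation_accuracy = exp1_train_the_model(
    overfit_model,
    new_train_loader,
    new_validation_loader,
    overfit_loss_function,
    num_epochs,
)
plot_exp1(overfit_train_loss, overfit_validation_loss, overfit_validation_accuracy)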
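The final layer of `OverfittingNeuralNetwork` is a plain `nn.Linear`, so the model emits raw logits. A short check of how the two losses imported earlier treat logits; the random tensors below are illustrative only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # fake model outputs for 4 samples, 10 classes
targets = torch.tensor([3, 1, 7, 0])  # fake class labels

# CrossEntropyLoss applies log-softmax internally.
ce = nn.CrossEntropyLoss()(logits, targets)

# NLLLoss gives the same value only when fed log-probabilities.
nll_on_log_probs = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# Fed raw logits, NLLLoss merely averages the negated target logits.
nll_on_logits = nn.NLLLoss()(logits, targets)

print(torch.allclose(ce, nll_on_log_probs))
print(ce.item(), nll_on_logits.item())
```

With raw logits, `NLLLoss` is unbounded below and can be driven down just by inflating the target logit, so its value is not comparable to a cross-entropy curve.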

### Task 3: Fix it.
Fix the overfitted network from the previous step (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results. 
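Dropout behaves differently in training and evaluation mode, which is why the training loop switches modes with `model.train(True)` and `model.train(False)`. A minimal sketch on a vector of ones (the tensor and the value of `p` are illustrative):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

# Training mode: each element is zeroed with probability p,
# and the survivors are scaled by 1 / (1 - p).
dropout.train()
train_out = dropout(x)

# Evaluation mode: Dropout is the identity.
dropout.eval()
eval_out = dropout(x)

print(train_out.unique())       # only 0 and 2 appear
print(torch.equal(eval_out, x))
```

Forgetting to switch to evaluation mode would make validation metrics noisy and pessimistic, because the network would still be dropping units.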

In [None]:
class FixedNeuralNetwork(nn.Module):
    """
            *** How to fix the problem of overfitting ***
            === === === === === === === === === === ===
        >>> Use fewer layers
        >>> Use a better optimizer
        >>> Use Dropout, BatchNorm
        >>> Predict on data the model has never seen --> cross-validation
    """
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Flatten(), 
            nn.Linear(input_shape, 2000),
            nn.Tanh(),
            nn.Dropout(p = .35, inplace= False),
            nn.Linear(2000, 1000),
            nn.Tanh(),
            nn.Linear(1000, 500),
            nn.BatchNorm1d(500, eps=1e-5),
            nn.Tanh(),
            nn.Dropout(p=.23,inplace=False),
            nn.Linear(500, 300),
            nn.Tanh(),
            nn.Linear(300, 100),
            nn.Tanh(),
            nn.BatchNorm1d(100, eps=1e-5),
            nn.Linear(100, num_classes)
        )
        
    def forward(self, inp):       
        out = self.model(inp)
        return out

In [None]:
torchsummary.summary(FixedNeuralNetwork().to(device), (28*28,))

In [None]:
fixed_model = FixedNeuralNetwork().to(device)
fixed_optimizer = optim.AdamW(fixed_model.parameters(), lr=3e-4)
fixed_loss_function = nn.CrossEntropyLoss()

# exp1_train_the_model steps the module-level `optimizer`, so rebind it before training
optimizer = fixed_optimizer

# Your experiments, come here
num_epochs = 40
# Run 1: train on the full training set and monitor on the test set
fixed_train_loss_1, fixed_validation_loss_1, fixed_validation_accuracy_1 = exp1_train_the_model(
    fixed_model,
    train_loader,
    test_loader,
    fixed_loss_function,
    num_epochs,
)
# Run 2: continue training the same model on the 80% split. Its validation split is part of
# `train_loader`, so the model already saw it in run 1 and these validation metrics are optimistic.
fixed_train_loss_2, fixed_validation_loss_2, fixed_validation_accuracy_2 = exp1_train_the_model(
    fixed_model,
    new_train_loader,
    new_validation_loader,
    fixed_loss_function,
    num_epochs,
)
plot_exp1(overfit_train_loss, overfit_validation_loss, overfit_validation_accuracy)
plot_exp1(fixed_train_loss_1, fixed_validation_loss_1, fixed_validation_accuracy_1)
plot_exp1(fixed_train_loss_2, fixed_validation_loss_2, fixed_validation_accuracy_2)
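Stopping once the validation loss stops improving is another common way to limit overfitting, alongside Dropout and BatchNorm. A minimal sketch, assuming a simple patience rule; the `EarlyStopping` class and the loss values are hypothetical and not part of the notebook:

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without validation improvement."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, validation_loss):
        """Record one epoch; return True when training should stop."""
        if validation_loss < self.best_loss - self.min_delta:
            self.best_loss = validation_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses: improvement until epoch 3, then a slow rise.
fake_validation_losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.75, 0.90]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(fake_validation_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)  # 6 0.7
```

In a real loop one would also save the model weights whenever `best_loss` improves and restore them after stopping.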

### Conclusions:
_Write down a small report with your conclusions and your ideas._

In [None]:
"""  
    In conclusion, here is what I have learnt from this dataset and from this introduction to deep learning:
    >>> Training on a CPU takes a lot more time than training on a GPU.
    >>> A DataLoader is used much like the underlying dataset: it wraps the dataset and is iterated over in batches.
    >>> It is easy to overfit a dataset without noticing, by --> Training the model for too long
                                                             --> Using too many hidden layers. Simple is usually best
                                                             --> Generating too many unnecessary features within the many layers
                                                             --> Leaking data by validating on data the model has seen before
    >>> To reduce the risk of overfitting we can --> Split the data into train and validation sets
                                                 --> Normalize the data (e.g. with BatchNorm between layers)
                                                 --> Use a good optimizer
                                                 --> Use fewer layers and an uncomplicated model
                                                 --> Zero out some neurons during training (Dropout) to reduce the correlation between them
                                                 --> Do cross-validation, or at least predict on data the model hasn't seen before
                                                 --> Do fewer iterations during training so the model doesn't have enough time to memorize the data
"""