## Lab 3
### Part 2: Dealing with overfitting

Today we work with the [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) (*hint: it is available in `torchvision`*).

Your goal for today:
1. Train an FC (fully-connected) network that achieves >= 0.885 test accuracy.
2. Cause considerable overfitting by modifying the network (e.g. increasing the number of network parameters and/or layers) and demonstrate it in an appropriate way (e.g. plot loss and accuracy on the train and validation sets w.r.t. network complexity).
3. Try to deal with overfitting (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results.

__Please write a small report describing your ideas, attempts and achieved results at the end of this file.__

*Note*: Tasks 2 and 3 are interrelated: in task 3 your goal is to make the network from task 2 less prone to overfitting. Task 1 is independent of tasks 2 and 3.

*Note 2*: We recommend using Google Colab or another machine with GPU acceleration.

In [1]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import torchsummary
from IPython.display import clear_output
from matplotlib import pyplot as plt
from matplotlib.pyplot import figure
import numpy as np
import os
import time


device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

In [2]:
# Helper: create the dataset directory if it does not exist yet
def mkdir(path):
    # Use the `path` argument, not the global `root_path`
    if not os.path.exists(path):
        os.mkdir(path)
        print('Directory', path, 'is created!')
    else:
        print('Directory', path, 'already exists!')
        
root_path = 'fmnist'
mkdir(root_path)

Directory fmnist is created!


In [3]:
download = True
# ToTensor converts PIL images to float tensors with values in [0, 1]
train_transform = transforms.ToTensor()
test_transform = transforms.ToTensor()


fmnist_dataset_train = torchvision.datasets.FashionMNIST(root_path, 
                                                        train=True, 
                                                        transform=train_transform,
                                                        target_transform=None,
                                                        download=download)
fmnist_dataset_test = torchvision.datasets.FashionMNIST(root_path, 
                                                       train=False, 
                                                       transform=test_transform,
                                                       target_transform=None,
                                                       download=download)

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to fmnist/FashionMNIST/raw/train-images-idx3-ubyte.gz


  0%|          | 0/26421880 [00:00<?, ?it/s]

Extracting fmnist/FashionMNIST/raw/train-images-idx3-ubyte.gz to fmnist/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to fmnist/FashionMNIST/raw/train-labels-idx1-ubyte.gz


  0%|          | 0/29515 [00:00<?, ?it/s]

Extracting fmnist/FashionMNIST/raw/train-labels-idx1-ubyte.gz to fmnist/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to fmnist/FashionMNIST/raw/t10k-images-idx3-ubyte.gz


  0%|          | 0/4422102 [00:00<?, ?it/s]

Extracting fmnist/FashionMNIST/raw/t10k-images-idx3-ubyte.gz to fmnist/FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to fmnist/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz


  0%|          | 0/5148 [00:00<?, ?it/s]

Extracting fmnist/FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to fmnist/FashionMNIST/raw



In [4]:
fmnist_dataset_train

Dataset FashionMNIST
    Number of datapoints: 60000
    Root location: fmnist
    Split: Train
    StandardTransform
Transform: ToTensor()

In [5]:
train_loader = torch.utils.data.DataLoader(fmnist_dataset_train, 
                                           batch_size=128,
                                           shuffle=True,
                                           num_workers=2)
test_loader = torch.utils.data.DataLoader(fmnist_dataset_test,
                                          batch_size=256,
                                          shuffle=False,
                                          num_workers=2)

In [6]:
len(fmnist_dataset_test)

10000

In [7]:
len(fmnist_dataset_train)

60000

In [8]:
for img, label in train_loader:
    print(img.shape)
#     print(img)
    print(label.shape)
    print(label.size(0))
    break

torch.Size([128, 1, 28, 28])
torch.Size([128])
128


### Task 1
Train a network that achieves $\geq 0.885$ test accuracy. It's fine to use only Linear (`nn.Linear`) layers and activations/dropout/batchnorm. Convolutional layers would help a lot here, but we will meet them a bit later.

In [9]:
class TinyNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()  # super(self.__class__, self) recurses forever if this class is subclassed
        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(input_shape, 200),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.BatchNorm1d(200),
            nn.Linear(200, num_classes)
        )
        
    def forward(self, inp):       
        out = self.model(inp)
        return out

In [10]:
torchsummary.summary(TinyNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                  [-1, 200]         157,000
              ReLU-3                  [-1, 200]               0
           Dropout-4                  [-1, 200]               0
       BatchNorm1d-5                  [-1, 200]             400
            Linear-6                   [-1, 10]           2,010
Total params: 159,410
Trainable params: 159,410
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.01
Params size (MB): 0.61
Estimated Total Size (MB): 0.62
----------------------------------------------------------------
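
The totals in the summary table can be cross-checked directly in PyTorch. The sketch below rebuilds the same layer stack as `TinyNeuralNetwork` so that it runs on its own; `count_parameters` is a helper name introduced here for illustration, not part of `torchsummary`.

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Return the number of trainable parameters in `model`."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Same layers as TinyNeuralNetwork above, repeated so the block is self-contained
tiny = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 200),   # 784 * 200 weights + 200 biases = 157,000
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.BatchNorm1d(200),       # 200 scales + 200 shifts = 400
    nn.Linear(200, 10),        # 200 * 10 weights + 10 biases = 2,010
)

print(count_parameters(tiny))  # 159410, the "Total params" value in the table
```

The same helper can be applied to the larger networks in Tasks 2 and 3 to compare model sizes without reading the full table.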


Your experiments come here:

In [11]:
model = TinyNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_func = nn.CrossEntropyLoss()

In [12]:
def train_model(model, train_loader, val_loader, loss_fn, opt, n_epochs: int):
    train_loss = []
    val_loss = []
    val_accuracy = []
    
    for epoch in range(n_epochs):
        ep_train_loss = []
        ep_val_loss = []
        ep_val_accuracy = []
        start_time = time.time()

        model.train(True) 
        for X_batch, y_batch in train_loader:
      
            opt.zero_grad()
            X_batch = X_batch.to(device)
            y_batch = y_batch.to(device)
            prediction = model(X_batch)
            loss = loss_fn(prediction, y_batch)
            loss.backward()
            opt.step()
            ep_train_loss.append(loss.item())

        model.train(False) 
        with torch.no_grad():
            for X_batch, y_batch in val_loader:
                # move data to target device
                X_batch = X_batch.to(device)
                y_batch = y_batch.to(device)
                
                # compute predictions
                prediction = model(X_batch)
                loss = loss_fn(prediction, y_batch)
                
                ep_val_loss.append(loss.item())
                # predicted class = index of the largest logit
                y_pred = prediction.argmax(dim=1)
                ep_val_accuracy.append((y_batch == y_pred).float().mean().item())

        print(f'Epoch {epoch + 1} of {n_epochs} took {time.time() - start_time:.3f}s')
        train_loss.append(np.mean(ep_train_loss))
        val_loss.append(np.mean(ep_val_loss))
        val_accuracy.append(np.mean(ep_val_accuracy))
        
        print(f"\t  training loss: {train_loss[-1]:.6f}")
        print(f"\tvalidation loss: {val_loss[-1]:.6f}")
        print(f"\tvalidation accuracy: {val_accuracy[-1]:.3f}")

    return train_loss, val_loss, val_accuracy
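
The loop above averages per-batch accuracies. With `batch_size=256` and 10000 test images the last batch holds only 16 samples (10000 = 39 * 256 + 16), so that average weights those 16 images like a full batch. A minimal sketch of an exact alternative follows, assuming a classifier that returns one row of logits per sample; `evaluate_accuracy` and the toy data are names and values introduced here for illustration only.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate_accuracy(model, loader, device="cpu"):
    """Fraction of correctly classified samples over the whole loader."""
    model.eval()  # note: leaves the model in evaluation mode
    correct, total = 0, 0
    for X_batch, y_batch in loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        y_pred = model(X_batch).argmax(dim=1)
        correct += (y_pred == y_batch).sum().item()
        total += y_batch.size(0)
    return correct / total


# Toy check on synthetic data (hypothetical, not Fashion-MNIST)
torch.manual_seed(0)
X = torch.randn(10, 4)
y = torch.randint(0, 3, (10,))
toy_model = nn.Linear(4, 3)
toy_loader = DataLoader(TensorDataset(X, y), batch_size=4)  # batches of 4, 4 and 2
acc = evaluate_accuracy(toy_model, toy_loader)
print(f"accuracy: {acc:.3f}")
```

Calling the same function on `train_loader` would also give a training accuracy, which `train_model` does not record.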

In [13]:
n_epochs = 10
train_loss, val_loss, val_accuracy = train_model(model, train_loader, test_loader, loss_func, opt, n_epochs)

Epoch 1 of 10 took 7.223s
	  training loss: 0.498741
	validation loss: 0.443937
	validation accuracy: 0.841
Epoch 2 of 10 took 7.135s
	  training loss: 0.402002
	validation loss: 0.396638
	validation accuracy: 0.857
Epoch 3 of 10 took 7.016s
	  training loss: 0.378182
	validation loss: 0.387565
	validation accuracy: 0.859
Epoch 4 of 10 took 7.072s
	  training loss: 0.361606
	validation loss: 0.384788
	validation accuracy: 0.861
Epoch 5 of 10 took 7.215s
	  training loss: 0.347550
	validation loss: 0.373764
	validation accuracy: 0.864
Epoch 6 of 10 took 7.076s
	  training loss: 0.337274
	validation loss: 0.368358
	validation accuracy: 0.871
Epoch 7 of 10 took 7.976s
	  training loss: 0.332326
	validation loss: 0.352879
	validation accuracy: 0.877
Epoch 8 of 10 took 7.269s
	  training loss: 0.316282
	validation loss: 0.360788
	validation accuracy: 0.871
Epoch 9 of 10 took 7.192s
	  training loss: 0.314343
	validation loss: 0.364554
	validation accuracy: 0.876
Epoch 10 of 10 took 7.914s
	

### Task 2: Overfit it.
Build a network that will overfit to this dataset. Demonstrate the overfitting in an appropriate way (e.g. plot loss and accuracy on the train and test sets w.r.t. network complexity).
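
One possible way to produce such a plot is sketched below. It assumes the three lists returned by `train_model` and plots them against the epoch; plotting against network complexity would need one training run per model size. `plot_history` is a name introduced here, and the demo values are the nine complete epochs of the Task 1 log above.

```python
from matplotlib import pyplot as plt


def plot_history(train_loss, val_loss, val_accuracy):
    """Plot train/validation loss and validation accuracy per epoch."""
    epochs = range(1, len(train_loss) + 1)
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))

    ax_loss.plot(epochs, train_loss, label="train loss")
    ax_loss.plot(epochs, val_loss, label="validation loss")
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("cross-entropy loss")
    ax_loss.legend()

    ax_acc.plot(epochs, val_accuracy, label="validation accuracy")
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()
    return fig


# Epochs 1-9 copied from the Task 1 log above
train_loss = [0.498741, 0.402002, 0.378182, 0.361606, 0.347550,
              0.337274, 0.332326, 0.316282, 0.314343]
val_loss = [0.443937, 0.396638, 0.387565, 0.384788, 0.373764,
            0.368358, 0.352879, 0.360788, 0.364554]
val_accuracy = [0.841, 0.857, 0.859, 0.861, 0.864, 0.871, 0.877, 0.871, 0.876]

fig = plot_history(train_loss, val_loss, val_accuracy)
```

A growing gap between the two loss curves, with training loss still falling while validation loss flattens or rises, is the usual visual sign of overfitting.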

*Note:* you also might decrease the size of `train` dataset to enforce the overfitting and speed up the computations.
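
Shrinking the training set can be done with `torch.utils.data.Subset`. The sketch below uses a random `TensorDataset` as a stand-in for `fmnist_dataset_train` so that it runs without a download; the subset size, batch size and seed are arbitrary values chosen for illustration.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Stand-in for fmnist_dataset_train (random data with the same image shape)
full_train = TensorDataset(torch.randn(1000, 1, 28, 28),
                           torch.randint(0, 10, (1000,)))

subset_size = 100  # hypothetical value; smaller sets overfit faster
generator = torch.Generator().manual_seed(0)  # fixed seed for a reproducible subset
indices = torch.randperm(len(full_train), generator=generator)[:subset_size].tolist()

small_train = Subset(full_train, indices)
small_loader = DataLoader(small_train, batch_size=32, shuffle=True)

print(len(small_train))  # 100
```

With the real data, replacing `full_train` by `fmnist_dataset_train` and passing `small_loader` to `train_model` should be enough; the test loader stays unchanged so results remain comparable.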

In [15]:
class OverfittingNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()  # super(self.__class__, self) recurses forever if this class is subclassed
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Your network structure comes here
            nn.Linear(input_shape, 700),
            nn.ReLU(),
            nn.Linear(700, 500),
            nn.ReLU(),
            nn.Linear(500, 250),
            nn.ReLU(),
            nn.Linear(250, 100),
            nn.ReLU(),
            nn.Linear(100, 50),
            nn.ReLU(),
            nn.Linear(50, num_classes)
            
        )
        
    def forward(self, inp):       
        out = self.model(inp)
        return out

In [16]:
torchsummary.summary(OverfittingNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                  [-1, 700]         549,500
              ReLU-3                  [-1, 700]               0
            Linear-4                  [-1, 500]         350,500
              ReLU-5                  [-1, 500]               0
            Linear-6                  [-1, 250]         125,250
              ReLU-7                  [-1, 250]               0
            Linear-8                  [-1, 100]          25,100
              ReLU-9                  [-1, 100]               0
           Linear-10                   [-1, 50]           5,050
             ReLU-11                   [-1, 50]               0
           Linear-12                   [-1, 10]             510
Total params: 1,055,910
Trainable params: 1,055,910
Non-trainable params: 0
---------------------------

In [17]:
model = OverfittingNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_func = nn.CrossEntropyLoss()

In [18]:
n_epochs = 10

train_loss, val_loss, val_accuracy = train_model(model, train_loader, test_loader, loss_func, opt, n_epochs)

Epoch 1 of 10 took 8.014s
	  training loss: 0.651608
	validation loss: 0.466400
	validation accuracy: 0.834
Epoch 2 of 10 took 7.841s
	  training loss: 0.400132
	validation loss: 0.424571
	validation accuracy: 0.850
Epoch 3 of 10 took 7.770s
	  training loss: 0.351001
	validation loss: 0.370268
	validation accuracy: 0.870
Epoch 4 of 10 took 7.807s
	  training loss: 0.322371
	validation loss: 0.365073
	validation accuracy: 0.867
Epoch 5 of 10 took 8.636s
	  training loss: 0.304510
	validation loss: 0.358697
	validation accuracy: 0.871
Epoch 6 of 10 took 7.922s
	  training loss: 0.288845
	validation loss: 0.338555
	validation accuracy: 0.878
Epoch 7 of 10 took 7.931s
	  training loss: 0.269995
	validation loss: 0.340322
	validation accuracy: 0.880
Epoch 8 of 10 took 7.805s
	  training loss: 0.262552
	validation loss: 0.354388
	validation accuracy: 0.873
Epoch 9 of 10 took 7.871s
	  training loss: 0.253161
	validation loss: 0.338805
	validation accuracy: 0.879
Epoch 10 of 10 took 7.840s
	

### Task 3: Fix it.
Fix the overfitted network from the previous step (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results. 
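
Early stopping is not named in the task, but it is one technique the "..." could cover. The sketch below is a minimal, framework-free version under that assumption; the `EarlyStopping` class and its `patience` and `min_delta` parameters are introduced here for illustration. It is replayed on the validation losses of epochs 1-9 from the Task 2 log above.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Validation losses of epochs 1-9, copied from the Task 2 log above
task2_val_loss = [0.466400, 0.424571, 0.370268, 0.365073, 0.358697,
                  0.338555, 0.340322, 0.354388, 0.338805]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(task2_val_loss, start=1):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break

print(stopper.best_epoch, stopped_at)  # 6 9
```

In a real run the model weights from the best epoch would be saved (for example with `model.state_dict()`) and restored after stopping; that part is left out of this sketch.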

In [19]:
class FixedNeuralNetwork(nn.Module):
    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):
        super().__init__()  # super(self.__class__, self) recurses forever if this class is subclassed
        self.model = nn.Sequential(
            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards
            # Your network structure comes here
            nn.Linear(input_shape, 500),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.BatchNorm1d(500),
            nn.Linear(500, 200),
            nn.ReLU(),
            nn.BatchNorm1d(200),
            # nn.Dropout(p=0.5)
            nn.Linear(200, 50),
            nn.ReLU(),
            nn.Dropout(p=0.2),
            nn.BatchNorm1d(50),
            nn.Linear(50, num_classes)
        )
        
    def forward(self, inp):       
        out = self.model(inp)
        return out

In [20]:
torchsummary.summary(FixedNeuralNetwork().to(device), (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
           Flatten-1                  [-1, 784]               0
            Linear-2                  [-1, 500]         392,500
              ReLU-3                  [-1, 500]               0
           Dropout-4                  [-1, 500]               0
       BatchNorm1d-5                  [-1, 500]           1,000
            Linear-6                  [-1, 200]         100,200
              ReLU-7                  [-1, 200]               0
       BatchNorm1d-8                  [-1, 200]             400
            Linear-9                   [-1, 50]          10,050
             ReLU-10                   [-1, 50]               0
          Dropout-11                   [-1, 50]               0
      BatchNorm1d-12                   [-1, 50]             100
           Linear-13                   [-1, 10]             510
Total params: 504,760
Trainable params:

In [21]:
model = FixedNeuralNetwork().to(device)
opt = torch.optim.Adam(model.parameters(), lr=0.006)
loss_func = nn.CrossEntropyLoss()

In [23]:
n_epochs = 10

train_loss, val_loss, val_accuracy = train_model(model, train_loader, test_loader, loss_func, opt, n_epochs)

Epoch 1 of 10 took 8.195s
	  training loss: 0.418594
	validation loss: 0.385692
	validation accuracy: 0.862
Epoch 2 of 10 took 8.016s
	  training loss: 0.413440
	validation loss: 0.394834
	validation accuracy: 0.863
Epoch 3 of 10 took 7.919s
	  training loss: 0.410541
	validation loss: 0.407491
	validation accuracy: 0.871
Epoch 4 of 10 took 8.008s
	  training loss: 0.398367
	validation loss: 0.450292
	validation accuracy: 0.871
Epoch 5 of 10 took 8.727s
	  training loss: 0.392543
	validation loss: 0.384630
	validation accuracy: 0.873
Epoch 6 of 10 took 8.685s
	  training loss: 0.391552
	validation loss: 0.383599
	validation accuracy: 0.865
Epoch 7 of 10 took 9.464s
	  training loss: 0.392969
	validation loss: 0.401866
	validation accuracy: 0.865
Epoch 8 of 10 took 9.945s
	  training loss: 0.386345
	validation loss: 0.388401
	validation accuracy: 0.871
Epoch 9 of 10 took 10.056s
	  training loss: 0.385769
	validation loss: 0.367717
	validation accuracy: 0.869
Epoch 10 of 10 took 8.169s


### Conclusions:
_Write down a small report with your conclusions and your ideas._

The first model turned out to be the best.

The second model is overfitted.

The third model has about 500k parameters (too many for 60k observations), so there is not enough data to train it. It also takes a long time to run.