<a href="https://colab.research.google.com/github/asia281/dnn2022/blob/main/Asia_of_DNN_Lab_5_Batchrnorm_and_Convnets_student_version.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (Academy of Innovative Applications of Digital Technologies). Operational Programme Digital Poland 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Project co-financed by the European Union from the European Regional Development Fund 
Operational Programme Digital Poland 2014-2020,
Priority Axis 3 "Digital competences of society", Measure 3.2 "Innovative solutions for digital activation" 
Project title: „Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)”
    </center>

Code based on https://github.com/pytorch/examples/blob/master/mnist/main.py

This exercise covers two aspects:
* In tasks 1-6 you will implement mechanisms that allow training deeper models (better initialization, batch normalization). Note that for dropout and batch norm you are expected to implement them yourself, without relying on ready-made components from PyTorch.
* In task 7 you will implement a convnet using [conv2d](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html).


Tasks:
1. Check that the given implementation reaches 95% test accuracy for
   architecture input-64-64-10 in a few thousand batches.
2. Improve initialization and check that the network learns much faster
   and reaches over 97% test accuracy. A good basic initialization scheme is the so-called Glorot initialization. For a set of weights going from a layer with $n_{in}$ neurons to a layer with $n_{out}$ neurons, it samples each weight from a normal distribution with mean $0$ and standard deviation $\sqrt{\frac{2}{n_{in}+n_{out}}}$.
3. Check that with proper initialization we can train architecture
   input-64-64-64-64-64-10, while with bad initialization it does
   not even get off the ground.
4. Add dropout implemented in PyTorch.
5. Check that with 10 hidden layers (64 units each), even with proper
    initialization, the network has a hard time starting to learn.
6. Implement batch normalization (use train mode also for testing - it should perform well enough):
    * compute batch mean and variance
    * add new variables beta and gamma
    * check that the network learns much faster for 5 layers
    * check that the network learns even for 10 hidden layers.
7. So far we worked with a fully connected network. Design and implement in PyTorch (using PyTorch functions) a simple convolutional network and achieve 99% test accuracy. The architecture is up to you, but even a few convolutional layers should be enough.

Note: stride is the step size (Polish: "krok") of the sliding window, in both convolution and pooling layers.

In [11]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.nn.parameter import Parameter
from torch.nn import init
import torchvision
import torchvision.transforms as transforms

In [12]:
class Linear(torch.nn.Module):
    def __init__(self, in_features, out_features):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Allocate uninitialized tensors; reset_parameters() fills them in.
        self.weight = Parameter(torch.empty(out_features, in_features))
        self.bias = Parameter(torch.empty(out_features))
        self.reset_parameters()

    def reset_parameters(self):
        # Baseline initialization: fixed standard deviation, independent of layer size.
        self.weight.data.normal_(mean=0, std=0.25)
        init.zeros_(self.bias)

    def forward(self, x):
        r = x.matmul(self.weight.t())
        r += self.bias
        return r

from typing import List


class Net(nn.Module):
    def __init__(self, sizes: List, linear = Linear):
        super(Net, self).__init__()
        self.fc = nn.ModuleList([linear(in_features=in_f, out_features=out_f) for in_f, out_f in zip(sizes, sizes[1:])])

    def forward(self, x):
        x = x.view(-1, 28 * 28)
        for layer in self.fc[:-1]:
            x = F.relu(layer(x))
        x = self.fc[-1](x)
        return x


Glorot initialization: for a set of weights going from a layer with $n_{in}$ neurons to a layer with $n_{out}$ neurons, each weight is sampled from a normal distribution with mean $0$ and standard deviation $\sqrt{\frac{2}{n_{in}+n_{out}}}$.
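
As a rough numeric check of this formula, the sketch below computes the Glorot standard deviation for each layer of the input-64-64-10 architecture and compares it with the fixed `std=0.25` used by the baseline `Linear`. The helper name `glorot_std` is hypothetical and introduced only for illustration.

```python
import math


def glorot_std(n_in: int, n_out: int) -> float:
    """Standard deviation sqrt(2 / (n_in + n_out)) of Glorot normal initialization."""
    return math.sqrt(2.0 / (n_in + n_out))


sizes = [784, 64, 64, 10]  # input-64-64-10, as in task 1
layer_shapes = list(zip(sizes, sizes[1:]))
stds = [glorot_std(n_in, n_out) for n_in, n_out in layer_shapes]

for (n_in, n_out), std in zip(layer_shapes, stds):
    print(f"{n_in:4d} -> {n_out:3d}: Glorot std = {std:.4f} (baseline uses 0.25)")
```

Every value is below 0.25, so the baseline starts with noticeably larger weights, most of all in the first layer, where the fan-in is 784.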

In [13]:
class LinearGlorot(Linear):
    """Linear layer with Glorot (Xavier) normal initialization."""

    def reset_parameters(self):
        # xavier_normal_ (default gain=1) samples from N(0, 2 / (fan_in + fan_out)).
        init.xavier_normal_(self.weight)
        init.zeros_(self.bias)


In [14]:
class MnistTrainer(object):
    def __init__(self, batch_size):
        transform = transforms.Compose(
                [transforms.ToTensor()])
        self.trainset = torchvision.datasets.MNIST(
            root='./data',
            download=True,
            train=True,
            transform=transform)
        self.trainloader = torch.utils.data.DataLoader(
            self.trainset, batch_size=batch_size, shuffle=True, num_workers=2)

        self.testset = torchvision.datasets.MNIST(
            root='./data',
            train=False,
            download=True, transform=transform)
        self.testloader = torch.utils.data.DataLoader(
            self.testset, batch_size=1, shuffle=False, num_workers=2)

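    def train(self, net=None):
        # Build the default network per call; a default argument would be created
        # once and then shared (and trained further) across calls.
        if net is None:
            net = Net([784, 64, 64, 10])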
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.SGD(net.parameters(), lr=0.05, momentum=0.9)

        for epoch in range(20):
            running_loss = 0.0
            for i, data in enumerate(self.trainloader, 0):
                inputs, labels = data
                optimizer.zero_grad()

                outputs = net(inputs)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()

                running_loss += loss.item()
                if i % 100 == 99:
                    print('[%d, %5d] loss: %.3f' %
                          (epoch + 1, i + 1, running_loss / 100))
                    running_loss = 0.0
            correct = 0
            total = 0
            with torch.no_grad():
                for data in self.testloader:
                    images, labels = data
                    outputs = net(images)
                    _, predicted = torch.max(outputs.data, 1)
                    total += labels.size(0)
                    correct += (predicted == labels).sum().item()

            print('Accuracy of the network on the {} test images: {} %'.format(
                total, 100 * correct / total))

In [16]:
trainer = MnistTrainer(batch_size=128)
trainer.train()

[1,   100] loss: 0.809
[1,   200] loss: 0.336
[1,   300] loss: 0.277
[1,   400] loss: 0.246
Accuracy of the network on the 10000 test images: 93.8 %
[2,   100] loss: 0.182
[2,   200] loss: 0.185
[2,   300] loss: 0.175
[2,   400] loss: 0.164
Accuracy of the network on the 10000 test images: 94.94 %
[3,   100] loss: 0.138
[3,   200] loss: 0.148
[3,   300] loss: 0.130
[3,   400] loss: 0.130
Accuracy of the network on the 10000 test images: 95.76 %
[4,   100] loss: 0.108
[4,   200] loss: 0.115
[4,   300] loss: 0.109
[4,   400] loss: 0.118
Accuracy of the network on the 10000 test images: 95.69 %
[5,   100] loss: 0.089
[5,   200] loss: 0.092
[5,   300] loss: 0.098
[5,   400] loss: 0.107
Accuracy of the network on the 10000 test images: 96.05 %
[6,   100] loss: 0.076
[6,   200] loss: 0.082
[6,   300] loss: 0.085
[6,   400] loss: 0.086
Accuracy of the network on the 10000 test images: 96.25 %
[7,   100] loss: 0.074
[7,   200] loss: 0.076
[7,   300] loss: 0.067
[7,   400] loss: 0.075
Accuracy 

In [15]:
trainer = MnistTrainer(batch_size=128)
trainer.train(Net([784, 64, 64, 10], LinearGlorot))

[1,   100] loss: 0.746
[1,   200] loss: 0.287
[1,   300] loss: 0.225
[1,   400] loss: 0.186
Accuracy of the network on the 10000 test images: 95.69 %
[2,   100] loss: 0.160
[2,   200] loss: 0.145
[2,   300] loss: 0.125
[2,   400] loss: 0.123
Accuracy of the network on the 10000 test images: 96.3 %
[3,   100] loss: 0.098
[3,   200] loss: 0.103
[3,   300] loss: 0.093
[3,   400] loss: 0.109
Accuracy of the network on the 10000 test images: 96.98 %
[4,   100] loss: 0.076
[4,   200] loss: 0.077
[4,   300] loss: 0.077
[4,   400] loss: 0.079
Accuracy of the network on the 10000 test images: 96.69 %
[5,   100] loss: 0.064
[5,   200] loss: 0.064
[5,   300] loss: 0.075
[5,   400] loss: 0.071
Accuracy of the network on the 10000 test images: 96.96 %
[6,   100] loss: 0.049
[6,   200] loss: 0.058
[6,   300] loss: 0.062
[6,   400] loss: 0.059
Accuracy of the network on the 10000 test images: 97.25 %
[7,   100] loss: 0.047
[7,   200] loss: 0.050
[7,   300] loss: 0.053
[7,   400] loss: 0.056
Accuracy 

3. Check that with proper initialization we can train architecture
   input-64-64-64-64-64-10, while with bad initialization it does
   not even get off the ground.

In [17]:
trainer = MnistTrainer(batch_size=128)
trainer.train(Net([784, 64, 64, 64, 64, 64, 10]))

[1,   100] loss: 2.363
[1,   200] loss: 0.791
[1,   300] loss: 0.555
[1,   400] loss: 0.441
Accuracy of the network on the 10000 test images: 88.93 %
[2,   100] loss: 0.360
[2,   200] loss: 0.339
[2,   300] loss: 0.317
[2,   400] loss: 0.282
Accuracy of the network on the 10000 test images: 91.61 %
[3,   100] loss: 0.260
[3,   200] loss: 0.261
[3,   300] loss: 0.238
[3,   400] loss: 0.235
Accuracy of the network on the 10000 test images: 93.11 %
[4,   100] loss: 0.223
[4,   200] loss: 0.201
[4,   300] loss: 0.212
[4,   400] loss: 0.206

KeyboardInterrupt: ignored


4. Add dropout implemented in PyTorch.
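
A minimal sketch of (inverted) dropout written by hand, assuming the intent is a module that can be placed between the layers of `Net`. The class name `Dropout` and the default `p=0.5` are illustrative choices, not part of the original notebook.

```python
import torch
import torch.nn as nn


class Dropout(nn.Module):
    """Inverted dropout: zero each activation with probability p during training."""

    def __init__(self, p: float = 0.5):
        super().__init__()
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in [0, 1)")
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # self.training is toggled by module.train() / module.eval().
        if not self.training or self.p == 0.0:
            return x
        keep_prob = 1.0 - self.p
        # Bernoulli mask: 1 with probability keep_prob, 0 otherwise.
        mask = (torch.rand_like(x) < keep_prob).to(x.dtype)
        # Rescale so the expected activation matches evaluation mode.
        return x * mask / keep_prob
```

Note that `MnistTrainer.train` never calls `net.train()` or `net.eval()`, so with such a module the test loop would also drop units unless `net.eval()` is called before evaluation and `net.train()` afterwards.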



In [None]:
trainer = MnistTrainer(batch_size=128)
trainer.train(Net([784, 64, 64, 64, 64, 64, 10], LinearGlorot))

5. Check that with 10 hidden layers (64 units each), even with proper
    initialization, the network has a hard time starting to learn.
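
One hedged way to see why depth hurts: under standard simplifying assumptions (independent zero-mean weights with standard deviation $\sigma$, zero biases, pre-activations symmetric around zero), a ReLU layer with $n_{in}$ inputs multiplies the variance of the pre-activations by roughly $n_{in}\sigma^2/2$. The sketch below evaluates this factor for the 64-to-64 hidden layers. It is a back-of-the-envelope estimate, not a measurement of the network in this notebook, and `relu_layer_gain` is a hypothetical helper.

```python
import math


def relu_layer_gain(n_in: int, weight_std: float) -> float:
    """Approximate factor by which one ReLU layer scales pre-activation variance.

    Assumes i.i.d. zero-mean weights, zero biases and symmetric pre-activations,
    so that ReLU keeps half of the second moment: gain = n_in * std^2 / 2.
    """
    return n_in * weight_std ** 2 / 2.0


n = 64
std_glorot = math.sqrt(2.0 / (n + n))  # Glorot std for a 64 -> 64 layer
std_baseline = 0.25                    # fixed std used by the baseline Linear

gain_glorot = relu_layer_gain(n, std_glorot)
gain_baseline = relu_layer_gain(n, std_baseline)

# With 10 hidden layers there are 9 hidden-to-hidden (64 -> 64) transitions.
transitions = 9
print("Glorot  : per layer", gain_glorot, "after 9 layers", gain_glorot ** transitions)
print("baseline: per layer", gain_baseline, "after 9 layers", gain_baseline ** transitions)
```

Under these assumptions the Glorot-initialized signal shrinks by a factor of about 500 over nine hidden layers, while the baseline grows by the same factor. This is consistent with the slow start the task asks you to observe, and batch normalization in task 6 counteracts it by rescaling activations at every layer.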


In [None]:
trainer = MnistTrainer(batch_size=128)
trainer.train(Net([784, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 10]))

In [None]:
trainer = MnistTrainer(batch_size=128)
trainer.train(Net([784, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 10], LinearGlorot))

6. Implement batch normalization (use train mode also for testing - it should perform well enough):
    * compute batch mean and variance
    * add new variables beta and gamma
    * check that the network learns much faster for 5 layers
    * check that the network learns even for 10 hidden layers.

In [None]:
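def batch_normalization(x, gamma, beta, moving_mean, moving_var, eps=1e-5, momentum=0.9):
    if not torch.is_grad_enabled():
        # Inference (inside torch.no_grad()): normalize with the running statistics.
        x_norm = (x - moving_mean) / torch.sqrt(moving_var + eps)
    else:
        # Training: normalize with the statistics of the current batch.
        mean = x.mean(dim=0, keepdim=True)
        var = ((x - mean) ** 2).mean(dim=0, keepdim=True)
        x_norm = (x - mean) / torch.sqrt(var + eps)
        # Update the running statistics with an exponential moving average.
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    # Scale by gamma and shift by beta (both learnable).
    x = gamma * x_norm + beta
    return x, moving_mean.detach(), moving_var.detach()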
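
A minimal sketch of how the same computation could be packaged as a module with learnable `gamma` and `beta`, assuming the simplification the task allows (batch statistics are used in both training and testing, so no running averages are kept). The class name `BatchNorm` and the argument `num_features` are illustrative, not part of the original notebook.

```python
import torch
import torch.nn as nn


class BatchNorm(nn.Module):
    """Batch normalization over the batch dimension, always using batch statistics."""

    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(num_features))   # learnable scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # learnable shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=0, keepdim=True)
        var = ((x - mean) ** 2).mean(dim=0, keepdim=True)  # biased batch variance
        x_norm = (x - mean) / torch.sqrt(var + self.eps)
        return self.gamma * x_norm + self.beta


torch.manual_seed(0)
bn = BatchNorm(64)
x = 3.0 * torch.randn(128, 64) + 5.0  # features with mean 5 and std 3
y = bn(x)
print(y.mean(dim=0).abs().max().item(), y.var(dim=0, unbiased=False).mean().item())
```

With batch statistics at test time, the test batch size matters: `MnistTrainer` evaluates with `batch_size=1`, for which the batch variance is zero and every normalized activation collapses to `beta`. A larger test batch size would be needed for this variant.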


7. So far we worked with a fully connected network. Design and implement in PyTorch (using PyTorch functions) a simple convolutional network and achieve 99% test accuracy. The architecture is up to you, but even a few convolutional layers should be enough.

In [18]:
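class ConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv2d(in_channels, out_channels, kernel_size, stride, padding)
        self.conv1 = nn.Conv2d(1, 16, 5, 1, 2)    # 1x28x28 -> 16x28x28
        self.pool = nn.MaxPool2d(kernel_size=2)   # halves height and width
        self.conv2 = nn.Conv2d(16, 32, 5, 1, 2)   # 16x14x14 -> 32x14x14
        self.fc1 = nn.Linear(32 * 7 * 7, 10)      # flattened 32x7x7 -> 10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))   # -> 16x14x14
        x = self.pool(F.relu(self.conv2(x)))   # -> 32x7x7
        x = x.view(x.size(0), -1)              # flatten to (batch, 1568)
        x = self.fc1(x)
        return x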
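
The input size `32 * 7 * 7` of the final linear layer follows from the usual output-size formula for convolution and pooling windows, $\lfloor (n + 2p - k) / s \rfloor + 1$. The sketch below traces the spatial size through the layers of `ConvNet`. The helper `conv_out` is hypothetical, and the pooling stride of 2 assumes the `nn.MaxPool2d` default, where the stride equals the kernel size.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                              # MNIST images are 28 x 28
size = conv_out(size, kernel=5, stride=1, padding=2)   # conv1 keeps 28
size = conv_out(size, kernel=2, stride=2)              # pool gives 14
size = conv_out(size, kernel=5, stride=1, padding=2)   # conv2 keeps 14
size = conv_out(size, kernel=2, stride=2)              # pool gives 7
flat_features = 32 * size * size                       # 32 channels of 7 x 7
print(size, flat_features)
```

Padding 2 with a 5x5 kernel and stride 1 preserves the spatial size, so only the two pooling layers shrink the image, from 28 to 14 to 7.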

In [None]:
trainer = MnistTrainer(batch_size=128)
trainer.train(ConvNet())

[1,   100] loss: 0.748
[1,   200] loss: 0.137
[1,   300] loss: 0.099
[1,   400] loss: 0.082
Accuracy of the network on the 10000 test images: 98.15 %
[2,   100] loss: 0.062
[2,   200] loss: 0.057
[2,   300] loss: 0.056
[2,   400] loss: 0.056
Accuracy of the network on the 10000 test images: 98.59 %
[3,   100] loss: 0.043
[3,   200] loss: 0.042
[3,   300] loss: 0.041
[3,   400] loss: 0.048
Accuracy of the network on the 10000 test images: 98.71 %
[4,   100] loss: 0.036
[4,   200] loss: 0.031
[4,   300] loss: 0.033
[4,   400] loss: 0.033
Accuracy of the network on the 10000 test images: 98.87 %
[5,   100] loss: 0.025
[5,   200] loss: 0.029
[5,   300] loss: 0.027
[5,   400] loss: 0.034
Accuracy of the network on the 10000 test images: 98.82 %
[6,   100] loss: 0.020
[6,   200] loss: 0.024
[6,   300] loss: 0.024
[6,   400] loss: 0.025
Accuracy of the network on the 10000 test images: 98.92 %
[7,   100] loss: 0.020
[7,   200] loss: 0.021
[7,   300] loss: 0.022
[7,   400] loss: 0.023
Accuracy