# Lab. 2 Multi Layered Networks

### Ładowanie danych

PyTroch, a właściwie pakiet `torchvision` udostępnia parę przydatnych rzeczy, z których skorzystamy na dzisiejszych zajęciach.

Zacznijmy od ściąganie i ładowania danych, w [`torchvision.datasets`](https://pytorch.org/docs/stable/torchvision/datasets.html) znajdziemy popularne datasety, zajmiemy się dzisiaj MNISTem.

In [1]:
import torch
from torchvision.datasets import MNIST
import torchvision

train_data = MNIST(root='.',
                   download=True,
                   transform=torchvision.transforms.ToTensor(),
                   train=True)
test_data = MNIST(root='.',
                  download=True,
                  transform=torchvision.transforms.ToTensor(),
                  train=False)
train_data

Dataset MNIST
    Number of datapoints: 60000
    Split: train
    Root Location: .
    Transforms (if any): ToTensor()
    Target Transforms (if any): None

Oprócz tego z samego `torcha` możemy skorzystać z [`DataLoadera`](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), który załatwia za nas sporo przydatnych rzeczy typu shufflowanie i batchowanie danych.

In [2]:
from torch.utils.data import DataLoader
train_loader = torch.utils.data.DataLoader(train_data, batch_size=10)
mean = 0
std = 0
for x, _ in train_loader:
    batch_size = x.size(0)
    batch_samples = x.view(batch_size, x.size(1), -1) 
    mean += batch_samples.mean(2).sum(0)
    std += batch_samples.std(2).sum(0)
    
samples_num = len(train_loader.dataset)
mean /= samples_num  
std /= samples_num 
print(mean)
print(std)


tensor([0.1307])
tensor([0.3015])


In [3]:
train_data = MNIST(root='.',
                   download=True,
                   transform = torchvision.transforms.Compose([
                                   torchvision.transforms.ToTensor(),
                                   torchvision.transforms.Normalize([mean], [std], inplace=False),
                                   torchvision.transforms.Lambda(lambda x: x.flatten()),
                               ]),
                   train=True)
test_data = MNIST(root='.',
                  download=True,
                   transform = torchvision.transforms.Compose([
                                   torchvision.transforms.ToTensor(),
                                   torchvision.transforms.Normalize([mean], [std], inplace=False),
                                   torchvision.transforms.Lambda(lambda x: x.flatten()),
                   ]),
                  train=False)

In [4]:
train_loader = torch.utils.data.DataLoader(train_data, batch_size=10)
for x, _ in train_loader:
    print(x.shape)
    break


torch.Size([10, 784])


Wygląda na to, że aż tak bardzo za darmo wszystkiego nie dostaniemy, klasa `MNIST` zwraca nam dane w postaci obiektów [PILa](https://pillow.readthedocs.io/en/stable/). Musimy coś z tym zrobić.

## Zadanie 1.
1. Za pomocą [`transformerów`](https://pytorch.org/docs/stable/torchvision/transforms.html) przerobić powyższy kod tak aby zadziałał.  
**HINT**: sprawdzić jakie argumenty przyjmuje klasa `MNIST`.
2. Policzyć średnią i odchylenie standardowe wartości pojedynczego piksela dla całego zbioru trenującego i użyć ich do znormalizowania danych trenujących.  
**HINT**: Tutaj torchvision też powinien nam to ułatwić.
3. Zmienić "kształt" jednego przykładu z `28x28` na `784`.  
**HINT**: [`Lambda`](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.Lambda)

Uwaga: zwrócić uwagę co dokładnie robią używane _transformery_!

## Zadanie 2.

Ręcznie zaimplementować prostą sieć z jedną warstwą ukrtyą. Sieć ma mieć:
1. Jedną warstwę ukrytą rozmiaru 500 z wagami zainicjalizowanymi ze standardowego rozkładu normalnego.
2. Warstwa przy obu operacjach ma mieć uczone _biasy_ zainicjalizowane na 0.

**HINT**: Do rozkładu normalnego najlepiej użyć [`torch.randn`](https://pytorch.org/docs/stable/torch.html#torch.randn). Sprawdzić jakie ważne argumenty ta funkcja przyjmuje!

Należy oprócz tego zaimplementować pętlę uczenia z użyciem PyTorchowej funkcji kosztu _cross entropy_ i optymalizatora SGD.

In [5]:
from typing import List

class CustomNetwork(object):
    """
    Simple 1-hidden layer linear neural network
    """
    def __init__(self, input_size, hidden_size, output_size):
        """
        Initialize network's weights 
        """
        
        self.weight_1: torch.Tensor = torch.randn((input_size, hidden_size), requires_grad=True)
        self.bias_1: torch.Tensor = torch.zeros((1, hidden_size), requires_grad=True) 
        
        self.weight_2: torch.Tensor = torch.randn((hidden_size, output_size), requires_grad=True) 
        self.bias_2: torch.Tensor = torch.zeros((1, output_size), requires_grad=True) 
        
    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        """
        Forward pass through the network
        """
        output_1 = torch.mm(x, self.weight_1) + self.bias_1 
        output_2 = torch.mm(output_1, self.weight_2) + self.bias_2 
        
        return output_2 
    
    def parameters(self) -> List[torch.Tensor]:
        """
        Returns all trainable parameters 
        """
        return [self.weight_1, self.bias_1, self.weight_2, self.bias_2]

In [7]:
from torch.optim import SGD
from torch.nn.functional import cross_entropy

# some hyperparams
batch_size: int = 64
epoch: int = 3
lr: float = 0.01
momentum: float = 0.9

# prepare data loaders, base don the already loaded datasets
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)

# initialize the model
model: CustomNetwork = CustomNetwork(input_size=784,
                                     hidden_size=500,
                                     output_size=10) 

# initialize the optimizer
optimizer: torch.optim.Optimizer = SGD(params=model.parameters(),
                                       lr=lr,
                                       momentum=momentum) 

# training loop
for e in range(epoch):
    for i, (x, y) in enumerate(train_loader):
        
        # reset the gradients from previouis iteration
        optimizer.zero_grad()
        # pass through the network
        output: torch.Tensor = model(x) 
        # calculate loss
        loss: torch.Tensor = torch.nn.CrossEntropyLoss()(output, y) 
        # backward pass thorught the network
        loss.backward()
        # apply the gradients
        optimizer.step()
        # log the loss value
        if (i + 1) % 100 == 0:
            print(f"Epoch {e} iter {i+1}/{len(train_data) // batch_size} loss: {loss.item()}", end="\r")
            
    # at the end of an epoch run evaluation on the test set
    with torch.no_grad():
        # initialize the number of correct predictions
        correct: int = 0 
        for i, (x, y) in enumerate(test_loader):
            # pass through the network
            output: torch.Tensor = model(x)
            # update the number of correctly predicted examples
            pred = output.max(1)[1]
            correct += int(torch.sum(pred == y))

        print(f"\nTest accuracy: {correct / len(test_data)}")

        
# this is your test
assert correct / len(test_data) > 0.8, "Subject to random seed you should be able to get >80% accuracy"

Epoch 0 iter 900/937 loss: 42.063056945800786
Test accuracy: 0.842
Epoch 1 iter 900/937 loss: 30.352207183837894
Test accuracy: 0.8372
Epoch 2 iter 900/937 loss: 15.188524246215824
Test accuracy: 0.8439


## Zadanie 3.

1. Przepisać całą sieć do PyTorcha używając [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module), [`torch.nn.Linear`](https://pytorch.org/docs/stable/nn.html#torch.nn.Linear).
2. Dodać [nieliniowe aktywacje](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) i dodatkową warstwę, tak aby wyciągnąć przynajmniej 95% testowego accuracy w 3 epoki.

In [12]:
from torch import nn
class TorchNetwork(torch.nn.Module):
    """
    Simple 2-hidden layer non-linear neural network
    """
    def __init__(self, input_size,
                 hidden_size_1,
                 hidden_size_2,
                 output_size):
        super(TorchNetwork, self).__init__()
        self.linear_layer_1 = nn.Linear(input_size, hidden_size_1)
        self.activation_1 = nn.LeakyReLU(0.1)
        self.linear_layer_2 = nn.Linear(hidden_size_1, hidden_size_2)
        self.activation_2 = nn.LeakyReLU(0.1)
        self.linear_layer_3 = nn.Linear(hidden_size_2, output_size)
        
    def forward(self, x):
        l_1 = self.linear_layer_1(x)
        a_1 = self.activation_1(l_1)
        l_2 = self.linear_layer_2(a_1)
        a_2 = self.activation_2(l_2)
        return a_2
        
        

In [13]:
from torch.optim import SGD
from torch.nn.functional import cross_entropy

# some hyperparams
batch_size: int = 64
epoch: int = 3
lr: float = 0.01
momentum: float = 0.9

# prepare data loaders, base don the already loaded datasets
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size)

# initialize the model
model: TorchNetwork = TorchNetwork(input_size=784,
                                   hidden_size_1=500,  
                                   hidden_size_2=100,
                                   output_size=10
                                   )  

# initialize the optimizer
optimizer: torch.optim.Optimizer = SGD(params=model.parameters(),
                                       lr=lr,
                                       momentum=momentum) 


# training loop
for e in range(epoch):
    for i, (x, y) in enumerate(train_loader):
        
        # reset the gradients from previouis iteration
        optimizer.zero_grad()
        # pass through the network
        output: torch.Tensor = model(x) 
        # calculate loss
        loss: torch.Tensor = torch.nn.CrossEntropyLoss()(output, y) 
        # backward pass thorught the network
        loss.backward()
        # apply the gradients
        optimizer.step()
        # log the loss value
        if (i + 1) % 100 == 0:
            print(f"Epoch {e} iter {i+1}/{len(train_data) // batch_size} loss: {loss.item()}", end="\r")
            
    # at the end of an epoch run evaluation on the test set
    with torch.no_grad():
        # initialize the number of correct predictions
        correct: int = 0 
        for i, (x, y) in enumerate(test_loader):
            # pass through the network
            output: torch.Tensor = model(x)
            # update the number of correctly predicted examples
            pred = output.max(1)[1]
            correct += int(torch.sum(pred == y))

        print(f"\nTest accuracy: {correct / len(test_data)}")
            
            
# this is your test       
assert correct / len(test_data) > 0.95, "Subject to random seed you should be able to get >95% accuracy"

Epoch 0 iter 900/937 loss: 0.06492029130458832
Test accuracy: 0.9402
Epoch 1 iter 900/937 loss: 0.03403802216053009
Test accuracy: 0.9617
Epoch 2 iter 900/937 loss: 0.024125441908836365
Test accuracy: 0.9692
