# Introduction to PyTorch

This tutorial covers the basics of the PyTorch library.

We design simple neural networks for a classification task on the MNIST dataset.

You will probably need the PyTorch documentation:

https://pytorch.org/docs/stable/index.html

and tutorials:

https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html

In [None]:
!mkdir data

1. PyTorch is based on tensor operations. First, let's try using them:
- create a simple Python list with four values and convert it to a PyTorch tensor
- create a NumPy array with random values and shape (1,3,7,7) and convert it to a PyTorch tensor
- create a PyTorch tensor with random values and shape (1,3,7,7) using a preset seed
- create a PyTorch tensor with a linear space in the range from -5 to 15 and reshape it to a tensor with shape (1,3,7,7)
- create a PyTorch tensor of zeros with shape (1,3,7,7)

Use the `print` function to display the results.
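As a rough guide, the tensor-creation steps listed above could look like the sketch below. The variable names and the seed value 42 are arbitrary choices made for illustration, not part of the exercise.

```python
import numpy as np
import torch

# Python list -> tensor
t_list = torch.tensor([1.0, 2.0, 3.0, 4.0])

# NumPy array with random values -> tensor (shares memory with the array)
arr = np.random.rand(1, 3, 7, 7)
t_numpy = torch.from_numpy(arr)

# Random tensor with a preset seed (42 is an arbitrary value)
torch.manual_seed(42)
t_rand = torch.rand(1, 3, 7, 7)

# Linear space from -5 to 15 with 1*3*7*7 = 147 points, reshaped to 4 dimensions
t_lin = torch.linspace(-5, 15, steps=1 * 3 * 7 * 7).reshape(1, 3, 7, 7)

# Tensor of zeros
t_zeros = torch.zeros(1, 3, 7, 7)

for name, t in [("list", t_list), ("numpy", t_numpy), ("rand", t_rand),
                ("linspace", t_lin), ("zeros", t_zeros)]:
    print(name, tuple(t.shape), t.dtype)
```

Note that `torch.from_numpy` keeps the NumPy dtype, so `t_numpy` is a `float64` tensor while the others are `float32`.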

In [None]:
import torch
t = ...

2. PyTorch allows computations to run on a GPU.
Check if a GPU (CUDA) is available; if so, use it as `device`, otherwise use `'cpu'`. Then move one of your tensors to the selected device.
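One common way to pick the device is sketched below. The tensor `t` is just a placeholder created for the example.

```python
import torch

# Use CUDA when it is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# .to() returns a tensor on the target device (a copy if the device differs)
t = torch.zeros(1, 3, 7, 7)
t = t.to(device)
print(t.device)
```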

In [None]:
device = ...

3. To train a network we need a dataset.

Download the `MNIST` dataset with `torchvision.datasets`.

Any ML task requires validation or testing, so create train and test datasets.

For the train dataset, also apply augmentation transforms: crop, translation and rotation.

Apply `ToTensor` to both datasets.

Next, wrap the datasets in `DataLoader`s with a batch size of 64.
Use the variable names `train_loader` and `test_loader`.

Then display the sizes of the datasets and the shapes of their elements, and show a few images with their labels.

Finally, compare the number of objects in each class in both datasets.
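The batching and class-counting steps can be sketched without downloading anything. The example below uses random stand-in tensors shaped like MNIST images instead of the real dataset (an assumption made so the snippet runs offline); with the real data you would iterate over `train_loader` and `test_loader` in the same way.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in data: 256 fake grayscale 28x28 images with labels 0..9
images = torch.rand(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))
dataset = TensorDataset(images, labels)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Dataset size and the shapes of one batch
X, y = next(iter(loader))
print(len(dataset), tuple(X.shape), tuple(y.shape))

# Count how many samples belong to each class
counts = torch.zeros(10, dtype=torch.long)
for _, y_batch in loader:
    counts += torch.bincount(y_batch, minlength=10)
print(counts.tolist())
```

Running the same counting loop on both loaders gives two vectors that can be compared directly or plotted as bar charts.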

In [None]:
import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt


...

train_loader = ...
test_loader = ...




5. Our dataset is ready, so let's create a model for the classification task.

Define the class `MLP` as a multilayer perceptron with two hidden fully connected layers with bias.

The class must inherit from `torch.nn.Module`.

Apply the following configuration:

- first layer with 512 neurons,
- second layer with 512 neurons,
- output layer sized to match the number of classes in the classification problem.

Add the parameters `input_shape` and `output_size` to the `__init__` method.

Don't forget about nonlinearities!

For the hidden layers you can use the `ReLU` module from `torch.nn`.

Apply the softmax function to the output.

Define the layer-by-layer processing in the `forward` method, whose argument is the network input tensor: a batch of images with shape (batch_size, channels, height, width), where channels = 1 for grayscale images.

Instantiate the model as the `net` object.

Layers:
- `torch.nn.Sequential`
    - container that passes the input through its
        component layers in order:
        ```
        def forward(t_in: Tensor) -> Tensor:
            t_out = t_in
            for layer in layers:
                t_out = layer(t_out)
            return t_out
        ```
        
- `torch.nn.Flatten`
    - flattens the input tensor:
    - (bs,CH,H,W) -> (bs,CH*H*W)
    
- `torch.nn.Linear(ch_in, ch_out, bias)`
    - 'classical' neural network layer - fully connected
    - ch_in is the number of input channels
    - ch_out is the number of output channels / number of neurons in the layer
    - bias - whether to use a bias parameter
    - when the input has more than 2 dimensions, it is recommended
            to put a flatten layer before the Linear layer
    - the operation implemented by this layer is a matrix-vector multiplication
        - `y = W v` or `y = W v + b`
        - W has a shape [ch_out, ch_in]
        - v has a shape [ch_in]
        - b has a shape [ch_out]
        - y has a shape [ch_out]

- `torch.nn.ReLU`
    - applies the ReLU function to the input tensor
    
- `torch.nn.Softmax(dim)`
    - applies the softmax function to the input tensor
    - `dim` - dimension along which the function is computed


For the formulas of the activation functions, see the [torch documentation](https://pytorch.org/docs/stable/index.html)
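The shapes described in the list above can be checked on a small random batch. This is only a shape check with one hidden layer, not the full `MLP`; the batch size of 8 is arbitrary.

```python
import torch
from torch import nn

x = torch.rand(8, 1, 28, 28)            # (bs, CH, H, W)

flat = nn.Flatten()(x)                  # (bs, CH*H*W) = (8, 784)
linear = nn.Linear(1 * 28 * 28, 512, bias=True)
h = nn.ReLU()(linear(flat))             # (8, 512), all values >= 0
probs = nn.Softmax(dim=1)(h)            # each row sums to 1

print(tuple(flat.shape))                # (8, 784)
print(tuple(linear.weight.shape))       # W: (512, 784) = [ch_out, ch_in]
print(tuple(linear.bias.shape))         # b: (512,)     = [ch_out]
print(tuple(probs.shape))               # (8, 512)
```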

In [None]:

class MLP(...):

    def __init__(self, ) -> None:
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass


net = ...
net = net.to(device)

6. To train the network we need to measure how good or bad its results are.
Instantiate `torch.nn.CrossEntropyLoss` as `loss_fcn`.
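A small sketch of how the loss is called may help. The score values below are made up for illustration. According to the PyTorch documentation, `CrossEntropyLoss` expects unnormalized scores (logits) and applies log-softmax internally, which the manual computation at the end reproduces.

```python
import torch
from torch import nn

loss_fcn = nn.CrossEntropyLoss()

# Made-up scores for a batch of 2 samples and 3 classes
scores = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])          # class indices, dtype long

loss = loss_fcn(scores, targets)        # scalar tensor, averaged over the batch

# Same value computed by hand: mean negative log-softmax of the target class
log_probs = torch.log_softmax(scores, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```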

7. To score the network, define an accuracy metric.
You need to decide what the final answer of the network is, given its output. For classification we can assume that the final answer is the class with the highest probability (`argmax`).

`torch.no_grad()` disables gradient tracking for the computations inside the decorated method.
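The `argmax` rule can be illustrated on a hand-made batch. The helper name `batch_accuracy` is hypothetical and exists only for this sketch; the probabilities are invented.

```python
import torch


@torch.no_grad()
def batch_accuracy(y_pred: torch.Tensor, y_ref: torch.Tensor) -> torch.Tensor:
    """Fraction of samples whose highest-scoring class equals the reference label."""
    predicted = y_pred.argmax(dim=1)              # (batch_size,)
    return (predicted == y_ref).float().mean()    # scalar tensor


# 4 samples, 3 classes; predicted classes are 1, 0, 2, 1
y_pred = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.2, 0.2, 0.6],
                       [0.3, 0.4, 0.3]])
y_ref = torch.tensor([1, 0, 1, 1])

score = batch_accuracy(y_pred, y_ref)
print(score.item())  # 0.75 (three of four predictions match)
```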

In [None]:
from abc import ABC, abstractmethod
from typing import Any


class BaseMetic(ABC):

    @abstractmethod
    def __call__(self, y_pred, y_ref) -> Any:
        raise NotImplementedError()


class AccuracyMetic(BaseMetic):

    def __init__(self) -> None:
        pass

    @torch.no_grad()
    def __call__(self, y_pred: torch.Tensor, y_ref: torch.Tensor) -> torch.Tensor:
        """
        :param y_pred: tensor of shape (batch_size, num_of_classes) type float
        :param y_ref: tensor with shape (batch_size,) and type Long
        :return: scalar tensor with accuracy metric for batch
        """
        # scalar value
        score: torch.Tensor = ...

        return score


metric = AccuracyMetic()

8. To update the network parameters, we need an optimizer object.
Instantiate `torch.optim.SGD` (with the parameters of `net`) as `optimizer`.
Use a learning rate of 0.001.
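To see what the optimizer does, here is a sketch of a single SGD step on a tiny stand-in model (a single `Linear` layer, not the `MLP` from the exercise). With plain SGD each parameter moves by `-lr * grad`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in model and data, chosen only to keep the example small
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

x = torch.rand(8, 4)
y = torch.randint(0, 2, (8,))

before = model.weight.detach().clone()

optimizer.zero_grad()                           # clear old gradients
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()                                 # compute gradients
optimizer.step()                                # w <- w - lr * grad

expected = before - 0.001 * model.weight.grad
print(torch.allclose(model.weight, expected))
```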

In [None]:
optimizer = ...

9. Now define the training / testing function:

In [None]:
from typing import Optional, Tuple
import tqdm


def train_or_test(model,
                  data_generator,
                  criterion,
                  metric: BaseMetic,
                  mode: str = 'test',
                  optimizer: Optional[torch.optim.Optimizer] = None,
                  update_period: Optional[int] = None,
                  device = torch.device('cpu')) -> Tuple[torch.nn.Module, float, float]:

    # change model mode to train or test
    if mode == 'train':
        ...

    elif mode == 'test':
        ...

    else:
        raise RuntimeError("Unsupported mode.")

    # move model to device
    ...

    # reset model parameters' gradients with optimizer
    if mode == 'train':
        ...

    total_loss = 0.0
    total_accuracy = 0.0
    samples_num = 0

    for i, (X, y) in tqdm.tqdm(enumerate(data_generator)):
        # move tensors to device
        ...

        # process by network
        y_pred = ...

        # calculate loss
        loss = ...

        # compute gradients from the loss (backward pass, train mode only)
        ...

        if mode == 'train' and (i+1) % update_period == 0:
            # update parameters with optimizer
            ...

        # calculate accuracy
        accuracy = ...

        total_loss += loss.item() * y_pred.shape[0]
        total_accuracy += accuracy.item() * y_pred.shape[0]
        samples_num += y_pred.shape[0]

    if samples_num == 0:
        return model, 0.0, 0.0

    return model, total_loss / samples_num, total_accuracy / samples_num
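The `update_period` argument in the skeleton above suggests gradient accumulation: gradients from several batches are summed before one parameter update. The following is a minimal sketch of that pattern under this assumption, using a stand-in model and random batches. Zeroing the gradients right after each step is one possible choice, not necessarily the intended solution.

```python
import torch
from torch import nn

torch.manual_seed(0)

model = nn.Linear(4, 2)                         # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
update_period = 2

# Four random batches of 8 samples each
batches = [(torch.rand(8, 4), torch.randint(0, 2, (8,))) for _ in range(4)]

model.train()
optimizer.zero_grad()
steps = 0
for i, (X, y) in enumerate(batches):
    loss = criterion(model(X), y)
    loss.backward()                             # gradients add up in .grad
    if (i + 1) % update_period == 0:
        optimizer.step()                        # update with accumulated gradients
        optimizer.zero_grad()                   # start a fresh accumulation window
        steps += 1

print(steps)  # 2 updates for 4 batches with update_period = 2
```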

10. Prepare the training loop (over epochs) function:
- adjust the maximum number of epochs to achieve satisfactory results.
- `**` additionally, implement stopping training when accuracy .

In [None]:
def training(model,
             train_loader,
             loss_fcn,
             metric,
             optimizer,
             update_period,
             epoch_max,
             device):
    loss_train = []
    loss_test = []
    acc_train = []
    acc_test = []

    for e in range(epoch_max):
      ...
    return model, {'loss_train': loss_train,
                   'acc_train': acc_train,
                   'loss_test': loss_test,
                   'acc_test': acc_test}

net, history = training(net,
                        train_loader,
                        loss_fcn,
                        metric,
                        optimizer,
                        update_period=5,
                        epoch_max=30,
                        device=device)

11. Display training history.

In [None]:
...

12. Save the model and optimizer states to files.

Use the `state_dict` method and the `torch.save` function.

In [None]:
...

13. Create a new network with the same architecture and initialize it with the saved weights. Compare the evaluation results of both networks.

Use `torch.load` and `load_state_dict`.
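A save and restore round trip can be sketched as follows. The small stand-in model and the file name `model_state.pt` are assumptions made for the example, and a temporary directory is used so that nothing is left on disk. The optimizer state can be stored the same way through `optimizer.state_dict()`.

```python
import os
import tempfile

import torch
from torch import nn


def make_model() -> nn.Module:
    # Stand-in architecture; both networks must be built identically
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))


torch.manual_seed(0)
model = make_model()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_state.pt")
    torch.save(model.state_dict(), path)        # save only the parameters

    restored = make_model()                     # fresh, differently initialized
    restored.load_state_dict(torch.load(path))  # overwrite with saved weights

# Both networks should now give identical outputs
x = torch.rand(4, 1, 28, 28)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print(same)
```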

In [None]:
net2 = ...
...

14. Define your own model and train it.

Try to achieve better results.

You can use different parameters and layers, e.g.:
- `torch.nn.Conv2d`
- `torch.nn.MaxPool2d`
- `torch.nn.BatchNorm2d`
- and more...

Save the weights to a file.
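As a possible starting point, a small convolutional network built from the layers listed above might look like this. The class name `SmallCNN`, the channel counts and the kernel sizes are arbitrary assumptions, not a recommended architecture.

```python
import torch
from torch import nn


class SmallCNN(nn.Module):
    """Hypothetical example: two conv blocks followed by a linear classifier."""

    def __init__(self, in_channels: int = 1, output_size: int = 10) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),  # 28x28 -> 28x28
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),                                       # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),           # 14x14 -> 14x14
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),                                       # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, output_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


model = SmallCNN()
out = model(torch.rand(8, 1, 28, 28))
print(tuple(out.shape))  # (8, 10)
```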

In [None]:
...