<img align="center" style="max-width: 1000px" src="figures/banner.png">

<img align="right" style="max-width: 200px; height: auto" src="figures/hsg_logo.png">

## Lab 03 - "Introduction to PyTorch"

Machine Learning, University of St. Gallen, Spring Term 2023

Objectives:

 - Become familiar with PyTorch
 - Understand the basics of automatic differentiation
 - Implement a simple linear model with `torch.nn`

## Why Use Deep Learning Libraries?

NumPy is a powerful library for numerical computing and can be used to build and train neural networks from scratch. However, there are two significant limitations to using pure NumPy for deep learning:

1. NumPy does not provide built-in support for GPU acceleration.
2. NumPy does not offer automatic differentiation out of the box.

As a result, using NumPy for deep learning is not always the most efficient or practical option. Instead, frameworks like PyTorch have been designed to address these limitations and provide developers with powerful tools for building, training, and deploying deep neural networks.
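As a preview of both points, the following minimal sketch contrasts a gradient that is derived and coded by hand in NumPy with the same gradient obtained from PyTorch's automatic differentiation, on a GPU if one is available. The toy data and the starting values `w0` and `b0` are assumptions chosen only for illustration.

```python
import numpy as np
import torch

# Toy data and starting parameters (illustrative values only)
x_np = np.array([0.4, 1.0, 2.0], dtype=np.float32)
y_np = np.array([1.0, 2.0, 3.0], dtype=np.float32)
w0, b0 = 0.2, 0.0

# NumPy: the gradient of the mean squared error is derived and coded by hand
err = x_np * w0 + b0 - y_np
grad_w_manual = float(np.mean(2.0 * err * x_np))
grad_b_manual = float(np.mean(2.0 * err))

# PyTorch: pick a GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.from_numpy(x_np).to(device)
y = torch.from_numpy(y_np).to(device)
w = torch.tensor(w0, device=device, requires_grad=True)
b = torch.tensor(b0, device=device, requires_grad=True)

# The same loss, but the gradients come from automatic differentiation
loss = torch.mean((x * w + b - y) ** 2)
loss.backward()

print("manual  :", grad_w_manual, grad_b_manual)
print("autograd:", w.grad.item(), b.grad.item())
```

The two lines should agree up to floating-point rounding; the difference is that the PyTorch version needs no hand-written derivative.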

## Which Library to Use?

Some of the major libraries that we can use for deep learning are PyTorch, JAX, TensorFlow, and Chainer.

<img align="center" style="max-width: 800px" src="figures/dl_libraries.png">

Below we compare several features in these libraries:

| Feature        | PyTorch           | JAX               | TensorFlow 2.0    | Chainer           |
|----------------|-------------------|-------------------|-------------------|-------------------|
| Main Focus     | Deep Learning     | Numerical Computing| Deep Learning     | Deep Learning     |
| Backends | PyTorch | XLA, NumPy | TensorFlow | NumPy |
| Programming Language Support | Python | Python | Python | Python |
| GPU Support | Yes         | Yes         | Yes         | Yes         |
| Computation Graph | Dynamic           | Dynamic            | Dynamic (since version 2.0)            | Dynamic            |
| Auto Differentiation | Yes             | Yes               | Yes              | Yes               |
| APIs for Model Building | `torch.nn` | Flax, Haiku | Keras, Estimators, Layers | `chainer.links`, `chainer.functions` |
| Distributed Training | Yes             | Yes                | Yes              | Yes                |
| Visualization   | TensorBoard       | No                | TensorBoard       | No                |
| Community       | Large, Active  | Small, Active   | Large, Active  | Small, Active   |
| Platform Support | Windows, Linux, macOS | Linux, macOS | Windows, Linux, macOS | Windows, Linux, macOS |
| Ease of Debugging | Good | Good | Okay | Okay |
| Mobile Deployment | PyTorch Mobile | No | TensorFlow Lite | ChainerX |
| Primary Developer | Meta AI (governed by the Linux Foundation) | Google | Google | Preferred Networks |


We will use PyTorch for our tutorials.

## An Overview of the PyTorch Library

<img align="center" style="max-width: 600px" src="figures/pytorch_packages.jpeg">


+ `autograd`: This package is used for automatic differentiation. The autograd package is essential for training neural networks using backpropagation, as it allows users to easily compute gradients of the loss function with respect to the model parameters.

+ `nn`: This package provides a high-level API for building neural networks in PyTorch. It includes the most common types of layers such as convolutional layers, pooling layers, and linear layers, as well as activation functions and loss functions. The `nn` module also provides tools for building custom layers and models using PyTorch tensors.

+ `data` (`torch.utils.data`): This package provides tools for loading and processing data in PyTorch. It includes datasets and data loaders, which can be used to load, batch, and shuffle data from various sources, such as files or databases. It also includes samplers; data augmentation is typically handled by companion libraries such as torchvision.

+ `optim`: This sub-package provides various optimization algorithms for training neural networks in PyTorch. It includes popular optimization methods such as Stochastic Gradient Descent (SGD), Adam, and Adagrad. The optim module also provides tools for customizing the learning rate and weight decay, as well as implementing learning rate schedulers.
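To see how the four packages fit together, here is a minimal sketch of a training loop that touches each of them. The synthetic dataset ($y = 2x + 1$), the batch size, the learning rate, and the number of epochs are all assumptions made up for this example.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # fixed seed so the sketch is reproducible

# data: a synthetic dataset following y = 2x + 1 (made up for this example)
inputs = torch.linspace(-1.0, 1.0, 64).unsqueeze(1)
targets = 2.0 * inputs + 1.0
loader = DataLoader(TensorDataset(inputs, targets), batch_size=16, shuffle=True)

# nn: a single linear layer and a mean-squared-error loss
model = nn.Linear(in_features=1, out_features=1)
loss_fn = nn.MSELoss()

# optim: plain stochastic gradient descent over the model parameters
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

with torch.no_grad():
    initial_loss = loss_fn(model(inputs), targets).item()

for epoch in range(50):
    for batch_x, batch_y in loader:
        optimizer.zero_grad()                     # clear gradients from the previous step
        loss = loss_fn(model(batch_x), batch_y)   # forward pass
        loss.backward()                           # autograd: compute gradients
        optimizer.step()                          # update the parameters

with torch.no_grad():
    final_loss = loss_fn(model(inputs), targets).item()

print(f"loss before: {initial_loss:.4f}, loss after: {final_loss:.6f}")
```

The individual pieces (gradients, `nn.Linear`, custom modules) are introduced step by step in the sections that follow.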


## Computational Graphs and Automatic Differentiation

#### What are computational graphs and why do we need them?

A computational graph is a directed acyclic graph (DAG) that represents the flow of information through the network. It consists of nodes that represent mathematical operations and edges that represent the flow of data between the nodes.


Let's assume we have a very simple function:

$$f(x) = x \times w + b$$

Here $x$ is the input and $w$ and $b$ are (learnable) parameters. We want to change $w$ and $b$ such that the output of the function gets as close as possible to a target output (ground-truth).

Let's say the output of $f$ for $x=0.4$ is equal to $0.7$, but we want it to be equal to $1.0$:

$$f(0.4) = 0.7 \rightarrow f(0.4) = 1.0$$

To do so, we first measure the difference between the actual output of the function and the desired output, and we call it the loss ($l$):

$$l = ||f(0.4) - 1.0||_{2}^{2}$$

Then, to estimate how much $w$ and $b$ need to change to get closer to the desired value, we compute the gradients of the loss w.r.t. the function's parameters:

$$\frac{\partial l}{\partial w}, \frac{\partial l}{\partial b}$$

And finally update $w$ and $b$ using gradient descent with learning rate $\alpha$:
$$w_{new} \leftarrow w - \alpha  \frac{\partial l}{\partial w}$$
$$b_{new} \leftarrow b - \alpha \frac{\partial l}{\partial b}$$
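
As a worked sketch of these steps for the scalar case, write the squared loss as $l = (x w + b - y)^2$ with target $y$. The chain rule gives both gradients, and plugging in the numbers from the example ($x = 0.4$, $f(0.4) = 0.7$, $y = 1.0$) together with an assumed learning rate $\alpha = 0.1$ (a value chosen only for illustration) yields:

$$\begin{aligned}
\frac{\partial l}{\partial w} &= 2\,(x w + b - y)\,x = 2 \times (0.7 - 1.0) \times 0.4 = -0.24 \\
\frac{\partial l}{\partial b} &= 2\,(x w + b - y) = 2 \times (0.7 - 1.0) = -0.6 \\
w_{new} &= w - 0.1 \times (-0.24) = w + 0.024 \\
b_{new} &= b - 0.1 \times (-0.6) = b + 0.06
\end{aligned}$$

Both parameters increase, so the new output is $f(0.4) = 0.7 + 0.4 \times 0.024 + 0.06 = 0.7696$, which is closer to the target $1.0$.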

Now let's see how we can do the same steps in PyTorch!

In [112]:
import torch

In [113]:
x = torch.tensor(0.4)  # input tensor
y = torch.tensor(1.0)  # expected output
w = torch.tensor(0.2, requires_grad=True) # requires_grad=True -> learnable parameter
b = torch.tensor(0.0, requires_grad=True) # requires_grad=True -> learnable parameter
z = x * w + b
loss = torch.norm(z - y, p=2)  # L2 norm of the error (not squared); equals |z - y| for scalars

Running the tensor operations above creates the following computational graph, which enables automatic differentiation.

<img align="center" style="max-width: 500px" src="figures/comp-graph.png">
<sup> Image adapted from: <a href="https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html">https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html</a> </sup>
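
The graph can also be inspected from Python. The sketch below rebuilds the same tensors and follows the `grad_fn` attribute, which links each result to the operation that produced it. The exact node names that get printed depend on the PyTorch version, so none are listed here.

```python
import torch

x = torch.tensor(0.4)                      # input, no gradient needed
y = torch.tensor(1.0)                      # expected output
w = torch.tensor(0.2, requires_grad=True)  # learnable parameter
b = torch.tensor(0.0, requires_grad=True)  # learnable parameter
z = x * w + b
loss = torch.norm(z - y, p=2)

# Tensors created directly by the user are leaves and have no grad_fn
print(w.is_leaf, w.grad_fn)

# Results of operations remember the operation that produced them
print(z.grad_fn)     # node for the addition that produced z
print(loss.grad_fn)  # node for the norm that produced the loss

# next_functions links a node to the nodes that produced its inputs
for fn, _ in z.grad_fn.next_functions:
    print(type(fn).__name__)
```

Walking `next_functions` from `loss.grad_fn` back to the leaves is the traversal that `.backward()` performs to apply the chain rule.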

#### Computing Gradients

Once the loss is computed and the computational graph has been built (in the background), we can compute the gradients for the learnable parameters. But first, let's check the gradient values of the learnable parameters $w$ and $b$ before running the backward pass:

In [114]:
print(w.grad)
print(b.grad)

None
None


The easiest way to compute all gradients in a computational graph is to call `.backward()` on the loss:

In [115]:
loss.backward()

Now, let's check the gradients again.

In [116]:
print(w.grad)
print(b.grad)

tensor(-0.4000)
tensor(-1.)


Voilà! The gradients are there. Remember that after calling `.backward()`, PyTorch frees the computational graph to save memory. Most applications do not need to keep the graph; it can be retained (for example with `loss.backward(retain_graph=True)`), but that is outside the scope of this tutorial.
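
With the gradients available, the gradient descent update from the previous section can be applied by hand. This is a minimal sketch of a single step; the learning rate `alpha = 0.1` is an assumed value for illustration, and in practice an optimizer from `torch.optim` would usually do this for us.

```python
import torch

x = torch.tensor(0.4)                      # input tensor
y = torch.tensor(1.0)                      # expected output
w = torch.tensor(0.2, requires_grad=True)  # learnable parameter
b = torch.tensor(0.0, requires_grad=True)  # learnable parameter
alpha = 0.1  # learning rate (illustrative value, not from the lab)

loss_before = torch.norm(x * w + b - y, p=2)
loss_before.backward()

# The loss is |z - y| with z < y, so dl/dw = -x = -0.4 and dl/db = -1
print(w.grad, b.grad)

# Update the parameters without recording the update in a new graph
with torch.no_grad():
    w -= alpha * w.grad
    b -= alpha * b.grad

# Gradients accumulate across backward() calls, so reset them before the next step
w.grad.zero_()
b.grad.zero_()

loss_after = torch.norm(x * w + b - y, p=2)
print(loss_before.item(), loss_after.item())
```

After one step the loss should be smaller than before, since both parameters moved against their gradients.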

#### Another way to compute gradients *

We can also compute partial derivatives w.r.t. particular parameters of the model directly:

In [117]:
x = torch.tensor(0.4)  # input tensor
y = torch.tensor(1.0)  # expected output
w = torch.tensor(0.2, requires_grad=True) # requires_grad=True -> learnable parameter
b = torch.tensor(0.0, requires_grad=True) # requires_grad=True -> learnable parameter
z = x * w + b
loss = torch.norm(z - y, p=2)  # L2 norm of the error (not squared); equals |z - y| for scalars


In [118]:
torch.autograd.grad(loss, [w, b])

(tensor(-0.4000), tensor(-1.))
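
One practical difference from `.backward()` is worth a quick check: `torch.autograd.grad` returns the gradients rather than storing them in the `.grad` attributes. The sketch below requests only the derivative with respect to `w` and then inspects `w.grad`.

```python
import torch

x = torch.tensor(0.4)                      # input tensor
y = torch.tensor(1.0)                      # expected output
w = torch.tensor(0.2, requires_grad=True)  # learnable parameter
b = torch.tensor(0.0, requires_grad=True)  # learnable parameter
loss = torch.norm(x * w + b - y, p=2)

# Request only the derivative with respect to w; the result is a tuple
(grad_w,) = torch.autograd.grad(loss, [w])

print(grad_w)  # gradient returned as a value
print(w.grad)  # still None: nothing was written to the .grad attribute
```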

## Building a Neural Network with `torch.nn`

To implement more advanced or complex neural networks, we need to use the `nn` package. Let's start with a simple example, a linear (or affine) mapping:

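A linear mapping takes a vector of size $n$ as input and outputs a vector of size $m$:

$$v = x A^{\top} + b$$

Here $A$ is a matrix of shape $m \times n$, $x$ has shape $1 \times n$, and $v$ and $b$ have shape $1 \times m$. This is the row-vector form of $v = A x + b$, and it matches how `nn.Linear` stores $A$ as its `weight` of shape $m \times n$.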

In [119]:
import torch.nn as nn

We use `Linear` modules from `torch.nn`:

In [95]:
linear = nn.Linear(in_features=5, out_features=3, bias=True)

Now let's feed it a random tensor.

In [120]:
x = torch.rand(1, 5)
print(x)

tensor([[0.1065, 0.0988, 0.0673, 0.1315, 0.2903]])


To apply the linear operator to the input $x$, we call the module like a function:

In [121]:
v = linear(x)
print(v)

tensor([[ 0.2129,  0.2258, -0.5205]], grad_fn=<AddmmBackward0>)


The class `Linear` takes care of all learnable parameters ($A$ and $b$). But where are the parameters in `linear`?

In [122]:
for n, p in linear.named_parameters():
    print(n, ":", p)

weight : Parameter containing:
tensor([[ 0.2935,  0.1808, -0.2188,  0.3289,  0.1805],
        [-0.1033,  0.0085,  0.3332, -0.0218, -0.0499],
        [-0.2402,  0.0237, -0.0532, -0.3007, -0.0706]], requires_grad=True)
bias : Parameter containing:
tensor([ 0.0829,  0.2309, -0.4337], requires_grad=True)
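
To confirm what the module computes, the following sketch reproduces its output by hand from the `weight` and `bias` parameters. It creates its own layer and random input, so the numbers differ from the ones printed above.

```python
import torch
import torch.nn as nn

linear = nn.Linear(in_features=5, out_features=3, bias=True)
x = torch.rand(1, 5)

# weight has shape (out_features, in_features), so it is transposed before multiplying
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(linear(x), manual, atol=1e-6))
```

The shapes show why the transpose is needed: the input has 5 features, while `weight` is stored as a 3-by-5 matrix.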


### Implementing Custom Models

We can implement each layer/transformation separately, but for more complex models we need a better way. Let's assume we want to design a model with three consecutive linear transformations for a given input $x$:

$v = A_3 (A_2 (A_1 x + b_1) + b_2) + b_3$

$v_1 = A_1 x + b_1 \;\rightarrow\; v_2 = A_2 v_1 + b_2 \;\rightarrow\; v_3 = A_3 v_2 + b_3$


We can implement our own neural network as a sequence of operations applied to an input tensor.

In [108]:
class MyLinearModel(nn.Module):    
    
    # Class constructor: called when we create an instance of the model
    def __init__(self):
        # Call super class constructor: to initialize default settings in PyTorch `nn` modules
        super().__init__()
        
        # First linear layer (transformation)
        self.linear_1 = nn.Linear(10, 20, bias=True) 
        
        # Second linear layer
        self.linear_2 = nn.Linear(20, 5, bias=True)
        
        # Third linear layer
        self.linear_3 = nn.Linear(5, 2, bias=True) 
        
    
    def forward(self, x):
        # Apply the first linear transformation
        x = self.linear_1(x)
        
        # Apply the second linear transformation
        x = self.linear_2(x)
        
        # Apply the third linear transformation
        x = self.linear_3(x)
        
        # Return the final tensor
        return x

In [109]:
my_model = MyLinearModel()

In [111]:
x = torch.randn(1, 10)
v = my_model(x)
print(v.shape)

torch.Size([1, 2])
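
For a plain chain of layers like this one, `nn.Sequential` offers a shorter alternative to writing a custom class. The sketch below builds the same three layers that way, feeds a batch of four inputs (the batch size is an arbitrary choice), and counts the learnable parameters.

```python
import torch
import torch.nn as nn

# Same layer sizes as MyLinearModel: 10 -> 20 -> 5 -> 2
sequential_model = nn.Sequential(
    nn.Linear(10, 20, bias=True),
    nn.Linear(20, 5, bias=True),
    nn.Linear(5, 2, bias=True),
)

# The first dimension is the batch dimension; each row is processed independently
x = torch.randn(4, 10)
out = sequential_model(x)

# Count weights and biases: (10*20 + 20) + (20*5 + 5) + (5*2 + 2)
num_params = sum(p.numel() for p in sequential_model.parameters())

print(out.shape)
print(num_params)
```

A custom `nn.Module` remains the more flexible option once the forward pass needs branching, skip connections, or other logic that is not a simple chain.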


## Optimizing a Neural Network

## Data Handling