# PyTorch

Snippets of neural network definitions and other essential building blocks in PyTorch.

### Boilerplate Structure of a PyTorch Program (Supervised Learning):

In [None]:
import torch

# A PyTorch model can run on a CPU or a GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = MyModel().to(device)

train_loader = torch.utils.data.DataLoader(...)
test_loader = torch.utils.data.DataLoader(...)

# Choosing an optimiser, e.g. stochastic gradient descent, Adam, etc.
# Note that parameters() is a method and must be called
optimizer = torch.optim.SGD(net.parameters(), ...)

# Training is done in epochs (numbered 1 to epochs inclusive)
for epoch in range(1, epochs + 1):
    train(params, net, device, train_loader, optimizer)
    if epoch % 10 == 0:
        test(params, net, device, test_loader)

### Defining a Model

In [None]:
# Inherits from the torch.nn.Module class 
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # The structure of the network is defined here
    
    def forward(self, input):
        # Run the input through the network and return the prediction
        raise NotImplementedError


### Defining a Custom Model

Consider the function $(x,y) \mapsto Ax\log (y) + By^2$, where $A$ and $B$ are the parameters to be learned:

In [None]:
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # The Parameter constructor needs to be given a torch tensor.
        # Here we create a tensor of size 1 and initialise it with a random Gaussian value.
        # Parameters have requires_grad=True by default, so they are tuned during training.
        self.A = torch.nn.Parameter(torch.randn(1))
        self.B = torch.nn.Parameter(torch.randn(1))

    def forward(self, input):
        # input has shape (batch, 2): column 0 holds x, column 1 holds y (y must be positive for log)
        output = self.A * input[:,0] * torch.log(input[:,1]) + self.B * input[:,1] * input[:,1]
        return output
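
As a quick check of how such a custom model behaves, the sketch below runs a small batch through it and backpropagates. The class name `LogQuadModel` and the sample batch are assumptions chosen for illustration. It shows that `A` and `B` are registered as trainable parameters and that the gradient of the summed output with respect to `A` is $\sum x\log(y)$.

```python
import torch

class LogQuadModel(torch.nn.Module):  # hypothetical name for the model above
    def __init__(self):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(1))
        self.B = torch.nn.Parameter(torch.randn(1))

    def forward(self, input):
        x, y = input[:, 0], input[:, 1]
        return self.A * x * torch.log(y) + self.B * y * y

net = LogQuadModel()
batch = torch.tensor([[1.0, 2.0], [3.0, 4.0], [0.5, 1.5]])  # rows of (x, y), with y > 0
out = net(batch)                                             # one prediction per row

# Parameters are listed in the order they were assigned in __init__
param_names = [name for name, _ in net.named_parameters()]

# Backpropagating the summed output fills in .grad for both parameters
out.sum().backward()
expected_grad_A = (batch[:, 0] * torch.log(batch[:, 1])).sum()
print(param_names, out.shape, net.A.grad, expected_grad_A)
```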

### Building a Neural Net from Individual Components:
The following network would be a suitable model for the XOR problem: a multi-layer perceptron with 2 inputs, 2 hidden nodes and 1 output.

In [None]:
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        # From an input layer with 2 nodes to a hidden layer with 2 nodes
        self.in_to_hid  = torch.nn.Linear(2, 2)
        # From a hidden layer with 2 nodes to an output layer with 1 node
        self.hid_to_out = torch.nn.Linear(2, 1)
    
    def forward(self, input):
        # Multiplies the input vector by the input layer -> hidden layer weight matrix and adds the bias
        hid_sum = self.in_to_hid(input)
        
        # Apply tanh on each component of the hidden layer's output
        hidden  = torch.tanh(hid_sum)
        
        # Multiplies the hidden layer output by the weights going into the output layer and adds the bias
        out_sum = self.hid_to_out(hidden)
        
        # Applying the sigmoid function on the final output
        output  = torch.sigmoid(out_sum)
        return output

![image.png](images/xor-network.png)
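
To see that a 2-2-1 network of this shape can represent XOR, the sketch below sets the weights by hand instead of training them. The class name `XorNet` and the specific weight values are assumptions picked for this illustration, and other weight choices work equally well.

```python
import torch

class XorNet(torch.nn.Module):  # same structure as the network above
    def __init__(self):
        super().__init__()
        self.in_to_hid = torch.nn.Linear(2, 2)
        self.hid_to_out = torch.nn.Linear(2, 1)

    def forward(self, input):
        hidden = torch.tanh(self.in_to_hid(input))
        return torch.sigmoid(self.hid_to_out(hidden))

net = XorNet()
with torch.no_grad():
    # Hidden node 1 switches on when x1 + x2 > 0.5, hidden node 2 when x1 + x2 > 1.5
    net.in_to_hid.weight.copy_(torch.tensor([[5.0, 5.0], [5.0, 5.0]]))
    net.in_to_hid.bias.copy_(torch.tensor([-2.5, -7.5]))
    # The output is high only when node 1 is on and node 2 is off
    net.hid_to_out.weight.copy_(torch.tensor([[5.0, -5.0]]))
    net.hid_to_out.bias.copy_(torch.tensor([-5.0]))

    inputs = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    predictions = net(inputs)

print(predictions.round().flatten().tolist())  # [0.0, 1.0, 1.0, 0.0]
```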

### Defining a Sequential Network:
Modules are added in the order that they are passed into the $\texttt{Sequential}$ constructor.

In [None]:
import torch
import torch.nn as nn

class MyModel(torch.nn.Module):
    def __init__(self, num_input, num_hidden, num_out):
        super(MyModel, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(num_input, num_hidden),
            nn.Tanh(),
            nn.Linear(num_hidden, num_out),
            nn.Sigmoid()
        )
    def forward(self, input):
        output = self.main(input)
        return output
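
A minimal sketch of how a `Sequential` container behaves on its own, assuming layer sizes of 2, 3 and 1 chosen purely for illustration. It shows that the modules keep the order in which they were passed in, that they can be indexed like a list, and that calling the container runs them one after another.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
main = nn.Sequential(
    nn.Linear(2, 3),
    nn.Tanh(),
    nn.Linear(3, 1),
    nn.Sigmoid(),
)

# Modules are stored in the order they were passed to the constructor
layer_names = [type(module).__name__ for module in main]
first_layer = main[0]

batch = torch.rand(4, 2)   # 4 samples with 2 features each
output = main(batch)       # shape (4, 1), each value strictly between 0 and 1
print(layer_names, output.shape)
```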

#### Sequential Components:
- Neural network layers:
    - $\texttt{nn.Linear()}$   — for linear layers
    - $\texttt{nn.Conv2d()}$   — for 2D convolutional layers
   
- Intermediate operators, placed between layers (batch normalisation is commonly applied *prior* to the activation function, dropout after it):
    - $\texttt{nn.Dropout()}$
    - $\texttt{nn.BatchNorm1d()}$, $\texttt{nn.BatchNorm2d()}$

- Activation functions:
    - $\texttt{nn.Tanh()}$
    - $\texttt{nn.Sigmoid()}$
    - $\texttt{nn.ReLU()}$
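
The components listed above can be combined as in the following sketch. The layer sizes, the 28x28 single-channel input and the ordering (batch norm before the activation, dropout after it) are assumptions for illustration, not a recommended architecture. It also shows why `train()` and `eval()` matter: dropout is only active in training mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),  # (N, 1, 28, 28) -> (N, 4, 28, 28)
    nn.BatchNorm2d(4),                          # normalises each of the 4 channels
    nn.ReLU(),
    nn.Dropout(p=0.5),                          # randomly zeroes activations in training mode
    nn.Flatten(),                               # (N, 4, 28, 28) -> (N, 4 * 28 * 28)
    nn.Linear(4 * 28 * 28, 10),
)
images = torch.randn(8, 1, 28, 28)  # a made-up batch of 8 images

net.train()
train_out = net(images)

# In evaluation mode dropout is switched off, so repeated passes agree
net.eval()
with torch.no_grad():
    eval_out_1 = net(images)
    eval_out_2 = net(images)
print(train_out.shape, torch.allclose(eval_out_1, eval_out_2))
```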

### Working With Data:
The following snippet declares the input dataset for the XOR network and the expected output predictions.

In [None]:
import torch.utils.data

input    = torch.tensor([[0., 0.],
                         [0., 1.],
                         [1., 0.],
                         [1., 1.]])
expected = torch.tensor([[0.],
                         [1.],
                         [1.],
                         [0.]])

# TensorDataset forms the training dataset from the input samples and corresponding target outputs
xdata        = torch.utils.data.TensorDataset(input, expected)
train_loader = torch.utils.data.DataLoader(xdata, batch_size=4)
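
A short sketch of what the dataset and loader hand back, assuming a batch size of 2 instead of 4 so that more than one batch is produced. Indexing the dataset returns one (input, target) pair, while iterating over the loader returns stacked batches.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

inputs = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
expected = torch.tensor([[0.0], [1.0], [1.0], [0.0]])
xdata = TensorDataset(inputs, expected)

# Indexing the dataset gives a single (input, target) pair
sample_input, sample_target = xdata[1]

# Iterating over the loader gives batches of inputs and matching targets
loader = DataLoader(xdata, batch_size=2)
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
print(len(xdata), sample_input, sample_target, batch_shapes)
```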

#### Batch size: 
*Batch size* is a hyperparameter of gradient descent. The $\texttt{batch_size}$ argument sets the number of input datapoints that are propagated through the network before the network's weights are updated.

E.g. specifying a batch size of 100 uses the first 100 datapoints from the input dataset for the first training iteration (when shuffling is off). The next iteration takes the next 100 datapoints, and so on.

Training in mini-batches:
- Requires less memory, since a massive dataset may not fit in memory all at once
- Weights are learned more quickly, since an update is made after each batch rather than a single update after the entire input dataset has been propagated through
- Gives a noisier estimate of the gradient the smaller the batch
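
The batching behaviour described above can be checked with a small sketch. The dataset of 1000 numbered datapoints is made up for this example, and `shuffle=False` is stated explicitly so that the batches follow the dataset order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 1000 made-up datapoints, numbered 0..999 so that each batch is easy to identify
data = torch.arange(1000, dtype=torch.float32).unsqueeze(1)
loader = DataLoader(TensorDataset(data), batch_size=100, shuffle=False)

# Each item from the loader is a one-element list because the dataset holds one tensor
batches = [batch for (batch,) in loader]

print(len(batches))                      # 10
print(batches[0][0], batches[0][-1])     # first batch covers datapoints 0..99
print(batches[1][0], batches[1][-1])     # second batch covers datapoints 100..199
```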

#### Epoch and Iterations:
An *epoch* is one complete pass through the *entire* dataset, forward and backward through the network.

An *iteration* is one batch propagated forward and backward through the network, followed by one weight update. The number of iterations in 1 epoch equals the number of batches.

Since gradient descent is an iterative process, running further epochs usually moves the weights closer to a minimum of the training error. Running only one epoch usually leads to underfitting, while running too many epochs usually leads to overfitting. The right number of epochs depends on the diversity of the training dataset's samples.
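
The relationship between epochs, iterations and batch size can be counted directly. The sketch below assumes a made-up dataset of 1000 samples, a batch size of 128 and 3 epochs; the last batch of each epoch is smaller because 1000 is not a multiple of 128.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(1000, 2), torch.zeros(1000, 1))  # placeholder data
loader = DataLoader(dataset, batch_size=128)

iterations_per_epoch = len(loader)       # number of batches in one epoch
epochs = 3
total_iterations = 0
last_batch_size = None
for epoch in range(epochs):
    for inputs, targets in loader:
        total_iterations += 1            # one weight update would happen here
        last_batch_size = inputs.shape[0]

print(iterations_per_epoch)              # 8, i.e. ceil(1000 / 128)
print(total_iterations)                  # 24, i.e. 3 epochs of 8 iterations
print(last_batch_size)                   # 104, i.e. 1000 - 7 * 128
```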


#### Custom Datasets

For some widely used datasets, torchvision provides classes that fetch the data online and load it for training. See a list of classic datasets provided by torchvision <a href="https://pytorch.org/docs/stable/torchvision/datasets.html">here</a>.

In [None]:
import torchvision.datasets as dsets

mnist    = dsets.MNIST(...)       # Handwritten digits dataset
cifarset = dsets.CIFAR10(...)     # Animal and vehicle images dataset
celebset = dsets.CelebA(...)      # Celebrity pictures dataset

### Resources:
- Cifar neural network: https://github.com/kuangliu/pytorch-cifar

<hr />

## Tensors:

A rank-$n$ tensor in $m$ dimensions is a mathematical object with $n$ indices and $m^n$ *components* that obeys certain transformation rules.

- "Tensor" comes from the Latin *tendere*, "to stretch".
- A vector *is a* tensor (of rank $1$)

### Continuum Mechanics:
In *continuum mechanics*, *stress* is a physical quantity that expresses the internal forces that neighbouring particles of a continuous material exert on each other.

Consider a cube in 3D space. It can be 'stretched' in 3 separate dimensions and 'sheared' in 6 directions:

<table style="width: 75%">
    <tr>
        <td>
            <img src='images/shear-cube.png'>
        </td>
        <td>
            <img src='images/stress-tensor-cube.png'>
        </td>
    </tr>
</table>


These 9 different stresses that can be applied to the cube are organised into a *stress tensor* like this: 
$
    \begin{pmatrix}
    \sigma_{11} & \sigma_{12} & \sigma_{13} \\
    \sigma_{21} & \sigma_{22} & \sigma_{23} \\
    \sigma_{31} & \sigma_{32} & \sigma_{33} \\
    \end{pmatrix}
$.
Each row and each column corresponds to a physical dimension (x, y and z).

<strong>*Rank*</strong> can be thought of as the amount of information you need to find a specific *component*. Formally, rank is the number of *basis vectors* required to fully specify a *component* of the tensor.

For example, since we can identify any $\sigma_{ij}$ by specifying the row and the column, we say that this tensor has rank $2$ and is $3$-dimensional. Note that the number of components is given by $\text{dim}^{\text{rank}}=3^2=9$.
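
In PyTorch terms, the number of indices corresponds to `ndim` and the number of components to `numel()`. The sketch below uses made-up component values, chosen so that $\sigma_{ij}$ reads as the two-digit number $ij$. Note that a `torch.Tensor` is a multi-dimensional array and does not by itself enforce the transformation rules of a physical tensor.

```python
import torch

# Made-up stress components: sigma_ij is stored as the number "ij"
stress = torch.tensor([[11.0, 12.0, 13.0],
                       [21.0, 22.0, 23.0],
                       [31.0, 32.0, 33.0]])

rank = stress.ndim             # number of indices needed to pick one component
dim = stress.shape[0]          # size along each index
components = stress.numel()    # dim ** rank

# PyTorch indices start at 0, so sigma_12 lives at position [0, 1]
sigma_12 = stress[0, 1].item()

# A rank-3 array in 3 dimensions has 3 ** 3 components
rank3 = torch.zeros(3, 3, 3)
print(rank, dim, components, sigma_12, rank3.numel())  # 2 3 9 12.0 27
```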

<img src="images/different-rank-tensors.png" width="50%">

In general, we just use index notation instead of matrix notation to specify tensors, since matrix notation breaks down beyond rank $2$.

Note that a rank-$2$ tensor is not the same as a matrix. Fundamentally, a matrix is just a data structure for numbers. A tensor, on the other hand, is a data structure that *obeys certain transformation rules*.

Tensors have a deeper physical significance. 

#### Transformation Rules:
- A tensor is *invariant* under a change in the coordinate system. During a change to the coordinate system, the components change according to a special set of equations, but the tensor itself is not affected. E.g. think of a displacement vector between two objects in 3D space: its components will certainly change if the coordinate system is rotated, but the actual displacement vector itself preserves its physical meaning.
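
A numerical sketch of this idea, assuming a rotation of 30 degrees about the z-axis and made-up component values. Under a rotation $R$, a vector's components become $v' = Rv$ and a rank-$2$ tensor's components become $\sigma' = R\sigma R^T$. The components change, but coordinate-independent quantities such as the vector's length, the trace of $\sigma$ and the scalar $v^T\sigma v$ do not.

```python
import math
import torch

theta = math.radians(30)   # assumed rotation angle about the z-axis
c, s = math.cos(theta), math.sin(theta)
R = torch.tensor([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]], dtype=torch.float64)

# Made-up components of a vector and of a symmetric rank-2 tensor
v = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
sigma = torch.tensor([[1.0, 2.0, 0.0],
                      [2.0, 3.0, 0.0],
                      [0.0, 0.0, 4.0]], dtype=torch.float64)

v_rot = R @ v                  # rank 1: one factor of R
sigma_rot = R @ sigma @ R.T    # rank 2: one factor of R per index

# The components differ, but these scalar quantities agree
length_before, length_after = torch.linalg.norm(v), torch.linalg.norm(v_rot)
trace_before, trace_after = torch.trace(sigma), torch.trace(sigma_rot)
scalar_before, scalar_after = v @ sigma @ v, v_rot @ sigma_rot @ v_rot
print(v, v_rot)
print(length_before, length_after)
```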


### Resources:
- Tensors:
    - https://medium.com/@quantumsteinke/whats-the-difference-between-a-matrix-and-a-tensor-4505fbdc576c
    