<a href="https://colab.research.google.com/github/dl-ub-summer-school/2022/blob/master/DLUB2022_Seminar1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# PyTorch Basics

<img src="https://miro.medium.com/max/1200/1*bBS_lYMoWhiyJf733Bghwg.jpeg" width="300">

PyTorch is an open-source machine learning framework based on the Torch library. It is used for applications such as computer vision and natural language processing, and is primarily developed by Meta AI.

<img src="https://miro.medium.com/max/1400/1*IsaBkifkc5P7ihRA8IKQ8Q.png" width="800">

### Agenda

1. PyTorch vs TensorFlow
2. Dynamic Computational Graph
3. PyTorch: Tensors and Autograd
4. Build Model: PyTorch Implementation of an MLP
5. Simple Self-Attention Implementation in PyTorch

## 1. PyTorch vs TensorFlow

<img src="https://www.dropbox.com/s/oq0940vz6srofhf/Comparison-between-a-a-static-computation-graph-in-TensorFlow-115-and-b-an.png?raw=1" width=800>

## 2. Dynamic Computational Graph

<img src="https://miro.medium.com/max/1400/1*5PLIVNA5fIqEC8-kZ260KQ.gif" width=800>
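A dynamic (define-by-run) graph is recorded while the Python code executes, so ordinary loops and `if` statements can change its shape from one call to the next. A small sketch, where `forward` is a hypothetical function written only for illustration:

```python
import torch

def forward(x, n_steps):
    # The loop length and the branch below are decided by plain Python at run time,
    # so autograd records a different graph on each call.
    y = x
    for _ in range(n_steps):
        y = y * 2
    if y.sum() > 10:      # data-dependent branch
        y = y - 1
    return y.sum()

x = torch.tensor([1.0, 2.0], requires_grad=True)

out = forward(x, n_steps=3)    # three multiplications, branch taken
out.backward()
grad_a = x.grad.clone()        # d(out)/dx = 2**3 = 8 for each element

x.grad = None
out2 = forward(x, n_steps=1)   # one multiplication, branch not taken
out2.backward()
grad_b = x.grad.clone()        # d(out2)/dx = 2 for each element

print(out.item(), grad_a, out2.item(), grad_b)
```

A static-graph framework would need special graph-level control-flow operations to express the same thing; here the graph simply follows whatever path the Python code took.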

#### Computational Graph Example

<img src="https://www.dropbox.com/s/ehw663j0uj68a92/024.png?raw=1" width=800>

#### Coding Time

#### Multiply Layer

<img src="https://www.dropbox.com/s/cbdvkojahfjxlsn/023.png?raw=1" width=600>

In [None]:
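class MulLayer:
    """Multiplication node: out = x * y."""
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        # cache the inputs; backward() needs them to compute the gradients
        self.x = x
        self.y = y
        out = x * y
        return out

    def backward(self, dout):
        # d(x*y)/dx = y and d(x*y)/dy = x, each scaled by the upstream gradient
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy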

#### Add Layer

<img src="https://www.dropbox.com/s/vioxvohnadt0pl7/022.png?raw=1"  width=600>

In [None]:
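class AddLayer:
    """Addition node: out = x + y."""
    def __init__(self):
        pass  # nothing to cache: the local gradients are constants

    def forward(self, x, y):
        out = x + y
        return out

    def backward(self, dout):
        # d(x+y)/dx = d(x+y)/dy = 1, so the upstream gradient passes through unchanged
        dx = dout * 1
        dy = dout * 1
        return dx, dy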

In [None]:
# params
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# layer
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()

In [None]:
# forward
apple_price = mul_apple_layer.forward(apple, apple_num)  # (1)
orange_price = mul_orange_layer.forward(orange, orange_num)  # (2)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)  # (3)
total = mul_tax_layer.forward(all_price, tax)  # (4)

In [None]:
total

In [None]:
total_grad = 1  # gradient of the output with respect to itself: d(total)/d(total) = 1
# backward
all_price_grad, tax_grad            = mul_tax_layer.backward(total_grad)  # (4)
apple_price_grad, orange_price_grad = add_apple_orange_layer.backward(all_price_grad)  # (3)
orange_grad, orange_num_grad        = mul_orange_layer.backward(orange_price_grad)  # (2)
apple_grad, apple_num_grad          = mul_apple_layer.backward(apple_price_grad)  # (1)

In [None]:
print("price:", int(total))
print("dApple:", apple_grad)
print("dApple_num:", int(apple_num_grad))
print("dOrange:", orange_grad)
print("dOrange_num:", int(orange_num_grad))
print("dTax:", tax_grad)
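One way to sanity-check the backward pass is a central finite-difference estimate, (f(x + h) - f(x - h)) / 2h, applied to each input in turn. The sketch below assumes the same prices as above; `price` and `numerical_grad` are hypothetical helpers, not part of the notebook. Because the total is linear in each single input, the estimates should match the analytic gradients (2.2, 110, 3.3, 165, 650) up to floating-point error.

```python
def price(apple, apple_num, orange, orange_num, tax):
    # the same computation as the layer graph: (apple*n_a + orange*n_o) * tax
    return (apple * apple_num + orange * orange_num) * tax

def numerical_grad(f, args, i, h=1e-4):
    """Central-difference estimate of df/d(args[i])."""
    plus = list(args)
    plus[i] += h
    minus = list(args)
    minus[i] -= h
    return (f(*plus) - f(*minus)) / (2 * h)

args = [100.0, 2.0, 150.0, 3.0, 1.1]
names = ["dApple", "dApple_num", "dOrange", "dOrange_num", "dTax"]
num_grads = {name: numerical_grad(price, args, i) for i, name in enumerate(names)}
for name, g in num_grads.items():
    print(f"{name}: {g:.4f}")
```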

Final result: <img src="https://www.dropbox.com/s/cu0ywj7abepyjaf/025.png?raw=1" width=600>

Parameter update: each value moves a small step against its gradient, scaled by the learning rate.

<img src="https://hmkcode.com/images/ai/bp_update_formula.png" width=600>

In [None]:
l_rate       = 0.1
apple        = apple - l_rate * apple_grad
apple_num    = apple_num - l_rate * apple_num_grad
orange       = orange - l_rate * orange_grad
orange_num   = orange_num - l_rate * orange_num_grad
tax          = tax - l_rate * tax_grad

print("Apple:", apple)
print("Apple_num:", apple_num)
print("Orange:", orange)
print("Orange_num:", orange_num)
print("Tax:", tax)

## 3. PyTorch: Tensors and Autograd

In [None]:
import torch
from torch import nn
import numpy as np

Reference: [`torch.Tensor` documentation](https://pytorch.org/docs/stable/tensors.html)

In [None]:
print(torch.zeros([2, 4], dtype=torch.int32))
device = torch.device('cpu')
print(torch.ones([2, 4], dtype=torch.float64, device=device))
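Every tensor carries a shape, a dtype and a device, and `.to()` converts between dtypes or devices. A minimal sketch; the CUDA branch is only taken if a CUDA-enabled PyTorch build finds a GPU:

```python
import torch

t = torch.ones([2, 4], dtype=torch.float64)
print(t.shape, t.dtype, t.device)   # torch.Size([2, 4]) torch.float64 cpu

# dtype is inferred when not given: Python ints become int64
print(torch.tensor([[1, 2], [3, 4]]).dtype)   # torch.int64

# .to() returns a tensor with the requested dtype/device (a new one when a conversion is needed)
t32 = t.to(torch.float32)
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t32.to(dev)
print(t32.dtype, t_dev.device)
```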

In [None]:
np_array = np.array([[1,1,1], [2,2,2], [3,3,3]])
tensor = torch.from_numpy(np_array)

In [None]:
tensor
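`torch.from_numpy` does not copy the data: the tensor and the NumPy array share memory, so an in-place change to one is visible through the other, whereas `torch.tensor(...)` makes an independent copy. A short demonstration:

```python
import numpy as np
import torch

np_array = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
shared = torch.from_numpy(np_array)   # shares memory with np_array
copied = torch.tensor(np_array)       # independent copy of the data

np_array[0, 0] = 100                  # modify the NumPy array in place
print(shared[0, 0].item(), copied[0, 0].item())  # 100 1

back = shared.numpy()                 # CPU tensor -> ndarray, also shares memory
```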

#### Element-wise Product

In [None]:
# This computes the element-wise product
print(f"tensor.mul(tensor) \n {tensor.mul(tensor)} \n")
# Alternative syntax:
print(f"tensor * tensor \n {tensor * tensor}")

#### Matrix Multiplication

In [None]:
print(f"tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n")
# Alternative syntax:
print(f"tensor @ tensor.T \n {tensor @ tensor.T}")
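A dot product of two 1-D tensors is the sum of their element-wise products, and each entry of a matrix product is the dot product of a row of the left matrix with a column of the right one. A small sketch using the same 3x3 values as `tensor` above (as floats):

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

dot = torch.dot(a, b)        # 1*4 + 2*5 + 3*6 = 32
same = (a * b).sum()         # element-wise product followed by a sum gives the same value

M = torch.tensor([[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]])
P = M @ M.T
entry = torch.dot(M[1], M.T[:, 2])   # row 1 of M with column 2 of M.T
print(dot, same, P[1, 2], entry)
```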

<img src="https://www.dropbox.com/s/ehw663j0uj68a92/024.png?raw=1" width=800>

In [None]:
apple = torch.tensor(100.0, requires_grad=True)
apple_num = torch.tensor(2.0, requires_grad=True)
orange = torch.tensor(150.0, requires_grad=True)
orange_num = torch.tensor(3.0, requires_grad=True)
tax = torch.tensor(1.1, requires_grad=True)

In [None]:
z_out = (apple * apple_num + orange * orange_num) * tax

In [None]:
print(z_out)

In [None]:
z_out.backward()

In [None]:
print("dApple:", apple.grad)
print("dApple_num:", apple_num.grad)
print("dOrange:", orange.grad)
print("dOrange_num:", orange_num.grad)
print("dTax:", tax.grad)

In [None]:
# For comparison: the gradients computed by hand with MulLayer/AddLayer
print("dApple:", apple_grad)
print("dApple_num:", int(apple_num_grad))
print("dOrange:", orange_grad)
print("dOrange_num:", int(orange_num_grad))
print("dTax:", tax_grad)
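Both sets of gradients should agree up to floating-point rounding. Note that `.backward()` adds into `.grad` rather than overwriting it, which is why the training loops below reset each `.grad` to `None` after every update. A minimal illustration:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
for _ in range(2):
    y = w * w        # dy/dw = 2 * w = 6; a fresh graph is built on each iteration
    y.backward()     # adds 6 to w.grad instead of overwriting it
print(w.grad)        # tensor(12.)
accumulated = w.grad.item()

w.grad = None        # reset before the next step, as the loops below do
```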

In [None]:
apple = torch.tensor(100.0, requires_grad=True)
apple_num = torch.tensor(2.0, requires_grad=True)
orange = torch.tensor(150.0, requires_grad=True)
orange_num = torch.tensor(3.0, requires_grad=True)
tax = torch.tensor(1.1, requires_grad=True)

In [None]:
learning_rate = 1e-6

In [None]:
z = torch.tensor(600.0)

In [None]:
for t in range(5):
    z_pred = (apple * apple_num + orange * orange_num) * tax
    loss = (z_pred - z).pow(2)
    # loss = z_pred - z 
    print(t, loss)
     
    loss.backward()
    
    # Disabling Gradient Tracking
    with torch.no_grad():
        apple      -= learning_rate * apple.grad
        apple_num  -= learning_rate * apple_num.grad
        orange     -= learning_rate * orange.grad
        orange_num -= learning_rate * orange_num.grad
        tax        -= learning_rate * tax.grad
        
        apple.grad      = None
        apple_num.grad  = None
        orange.grad     = None
        orange_num.grad = None
        tax.grad        = None
        
    print(f'Result: {z_pred} = ({apple} x {apple_num} + {orange} x {orange_num}) * {tax}')
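The manual update (`param -= lr * param.grad`, then `param.grad = None`) is what `torch.optim.SGD` automates through `optimizer.step()` and `optimizer.zero_grad()`. A sketch of the same five iterations rewritten with an optimizer, assuming plain SGD with no momentum:

```python
import torch

apple = torch.tensor(100.0, requires_grad=True)
apple_num = torch.tensor(2.0, requires_grad=True)
orange = torch.tensor(150.0, requires_grad=True)
orange_num = torch.tensor(3.0, requires_grad=True)
tax = torch.tensor(1.1, requires_grad=True)
z = torch.tensor(600.0)

optimizer = torch.optim.SGD([apple, apple_num, orange, orange_num, tax], lr=1e-6)

losses = []
for t in range(5):
    z_pred = (apple * apple_num + orange * orange_num) * tax
    loss = (z_pred - z).pow(2)
    optimizer.zero_grad()   # clear gradients left over from the previous iteration
    loss.backward()         # fill .grad on every parameter
    optimizer.step()        # p -= lr * p.grad for each parameter p
    losses.append(loss.item())
print(losses)
```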

In [None]:
apple = torch.tensor(100.0, requires_grad=True)
apple_num = torch.tensor(2.0, requires_grad=False)
orange = torch.tensor(150.0, requires_grad=True)
orange_num = torch.tensor(3.0, requires_grad=False)
tax = torch.tensor(1.1, requires_grad=True)
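In this second run `apple_num` and `orange_num` are created with `requires_grad=False`, so autograd treats them as constants: no `.grad` is computed for them and the loop below leaves them unchanged. A minimal sketch of that behaviour:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # trainable
c = torch.tensor(3.0, requires_grad=False)  # constant: gradients are not tracked
y = w * c
y.backward()
print(w.grad, c.grad)   # tensor(3.) None
```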

In [None]:
z = torch.tensor(600.0)

In [None]:
for t in range(10):
    z_pred = (apple * apple_num + orange * orange_num) * tax
    loss = (z_pred - z).pow(2)
    print(t, loss)
     
    loss.backward()
    
    # Disabling Gradient Tracking
    with torch.no_grad():
        apple  -= learning_rate * apple.grad
        orange -= learning_rate * orange.grad
        tax    -= learning_rate * tax.grad
        
        apple.grad  = None
        orange.grad = None
        tax.grad    = None
        
    print(f'Result: {z_pred} = ({apple} x {apple_num} + {orange} x {orange_num}) * {tax}')

## 4. Build Model: PyTorch Implementation of an MLP

<img src="https://raw.githubusercontent.com/bentrevett/pytorch-image-classification/master/assets/mlp-mnist.png" width=600>

In [None]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets
from torchvision.transforms import ToTensor
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device

In [None]:
training_data = datasets.MNIST(
    root = 'data',
    train = True,                         
    transform = ToTensor(), 
    download = True,            
)
validation_data = datasets.MNIST(
    root = 'data', 
    train = False, 
    transform = ToTensor()
)

In [None]:
print(training_data)
print(validation_data)
print(training_data.data.size())

In [None]:

plt.imshow(training_data.data[1], cmap='gray')
plt.title('%i' % training_data.targets[1])
plt.show()
torch.set_printoptions(linewidth=200)
training_data.data[1]

In [None]:
figure = plt.figure(figsize=(10, 8))
for i in range(1, 26):
    img, label = training_data[i]
    figure.add_subplot(5, 5, i)
    plt.title(label)
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

In [None]:
idxs = list(range(1000))  # use only the first 1000 images to keep the demo fast
batch_size = 50 
training_data_loader = torch.utils.data.DataLoader(training_data, 
                                          batch_size=batch_size, 
                                          num_workers=1, 
                                          sampler=SubsetRandomSampler(idxs)
                                          )

val_idxs = list(range(1000))
validation_data_loader = torch.utils.data.DataLoader(validation_data, 
                                          batch_size=batch_size, 
                                          num_workers=1,
                                          sampler=SubsetRandomSampler(val_idxs))
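Each batch from these loaders has shape `(batch_size, 1, 28, 28)`, and the training loop flattens it to `(batch_size, 784)` for the first `nn.Linear`. Because the sampler draws only 1000 indices, each loader yields 1000 / 50 = 20 batches per epoch. The sketch below checks this with random tensors standing in for MNIST (hypothetical data with the same shapes), so it runs without downloading anything:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Random stand-in data with MNIST's shapes (hypothetical, for illustration only)
images = torch.rand(2000, 1, 28, 28)
labels = torch.randint(0, 10, (2000,))
dataset = TensorDataset(images, labels)

# Draw only indices 0..999, in a new random order every epoch
loader = DataLoader(dataset, batch_size=50, sampler=SubsetRandomSampler(list(range(1000))))

X, y = next(iter(loader))
flat = X.view(X.shape[0], -1)   # (50, 1, 28, 28) -> (50, 784)
print(len(loader), X.shape, flat.shape, y.shape)
```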

In [None]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(28*28, 250)
        self.linear2 = nn.Linear(250, 100)
        self.linear3 = nn.Linear(100, 10)

    def forward(self, x):
        x = self.linear1(x)
        x = F.relu(x)
        x = self.linear2(x)
        x = F.relu(x)
        x = self.linear3(x)
        return x
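The same architecture can also be written with `nn.Sequential`; putting `nn.Flatten()` first would make the manual `X.view(X.shape[0], -1)` in the training loop unnecessary. A sketch of this alternative formulation (not the notebook's class) that also counts the trainable parameters:

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(
    nn.Flatten(),              # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 250),
    nn.ReLU(),
    nn.Linear(250, 100),
    nn.ReLU(),
    nn.Linear(100, 10),        # 10 logits, one per digit class
)
n_params = sum(p.numel() for p in mlp.parameters())
logits = mlp(torch.rand(4, 1, 28, 28))
print(n_params, logits.shape)   # 222360 torch.Size([4, 10])
```

The count is (784*250 + 250) + (250*100 + 100) + (100*10 + 10) = 222,360 weights and biases.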

In [None]:
model = MLP()
model

In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)

In [None]:
epoch = 100
losses = list()
val_losses = list()

epoch_loss_history = list()
epoch_acc_history = list()

epoch_val_loss_history = list()
epoch_val_acc_history = list()

CEL = nn.CrossEntropyLoss()


for i in range(epoch):
    training_corrects = 0.0 
    training_loss = 0.0 

    val_corrects = 0.0 
    val_loss = 0.0 

    # Training
    for X, y in training_data_loader:
        X = X.view(X.shape[0], -1)
        # forward
        y_pred = model(X)  # calling the module runs forward() and any registered hooks

        # loss calculation
        S = CEL(y_pred, y)

        # reset the gradients accumulated in the previous step
        optimizer.zero_grad()

        # backward
        S.backward()
        optimizer.step()

        losses.append(S.item())
        max_vals, max_idxs = torch.max(y_pred, 1)
        training_corrects += torch.sum(max_idxs == y).item()
        training_loss += S.item()

    epoch_loss = training_loss / len(training_data_loader)
    epoch_acc = 100.0 * training_corrects / ( len(training_data_loader) * batch_size )

    epoch_loss_history.append(epoch_loss)
    epoch_acc_history.append(epoch_acc)
    
    if i % 10 == 0:
        print("Epoch: ", i+1)
        print("Training loss: {:.4f}, Training acc: {:.4f}".format(epoch_loss, epoch_acc))
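The loop above initialises `val_loss`, `val_corrects` and the `epoch_val_*` lists but never fills them, and `validation_data_loader` is not used. A sketch of an evaluation helper (`evaluate` is a hypothetical name) that could be called once per epoch; it turns off gradient tracking and switches the model to eval mode while measuring:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, loss_fn):
    """Return (mean loss per batch, accuracy in %) on `loader` without updating the model."""
    model.eval()                      # eval mode for layers such as dropout/batchnorm
    total_loss, correct, seen = 0.0, 0, 0
    with torch.no_grad():             # no graph is recorded, so no gradients are stored
        for X, y in loader:
            X = X.view(X.shape[0], -1)
            logits = model(X)
            total_loss += loss_fn(logits, y).item()
            correct += (logits.argmax(dim=1) == y).sum().item()
            seen += y.shape[0]
    model.train()                     # back to training mode for the next epoch
    return total_loss / len(loader), 100.0 * correct / seen

# Toy usage with random data standing in for MNIST (hypothetical, for illustration only)
toy_model = nn.Sequential(nn.Linear(28 * 28, 10))
toy_data = TensorDataset(torch.randn(100, 1, 28, 28), torch.randint(0, 10, (100,)))
toy_loader = DataLoader(toy_data, batch_size=50)
val_loss, val_acc = evaluate(toy_model, toy_loader, nn.CrossEntropyLoss())
print(f"Validation loss: {val_loss:.4f}, Validation acc: {val_acc:.4f}")
```

In the training loop this would be `val_loss, val_acc = evaluate(model, validation_data_loader, CEL)` after the training pass, with the results appended to `epoch_val_loss_history` and `epoch_val_acc_history`.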

## 5. Simple Self-Attention Implementation in PyTorch

<img src="https://miro.medium.com/max/1400/1*dSwckeG028obZPWafgJrmw.png" width=800>

Steps:
- Prepare inputs
- Initialise weights
- Derive key, query and value
- Calculate attention scores
- Calculate softmax
- Attention Output


<img src="https://miro.medium.com/max/1973/1*hmvdDXrxhJsGhOQClQdkBA.png" width=800>

#### Prepare inputs

In [None]:
x = [
  [1, 0, 1, 0], # Input 1
  [0, 2, 0, 2], # Input 2
  [1, 1, 1, 1]  # Input 3
 ]
x = torch.tensor(x, dtype=torch.float32)
x

#### Initialise weights

In [None]:
w_key = [
  [0, 0, 1],
  [1, 1, 0],
  [0, 1, 0],
  [1, 1, 0]
]
w_query = [
  [1, 0, 1],
  [1, 0, 0],
  [0, 0, 1],
  [0, 1, 1]
]
w_value = [
  [0, 2, 0],
  [0, 3, 0],
  [1, 0, 3],
  [1, 1, 0]
]
w_key = torch.tensor(w_key, dtype=torch.float32)
w_query = torch.tensor(w_query, dtype=torch.float32)
w_value = torch.tensor(w_value, dtype=torch.float32)

print("Weights for key: \n", w_key)
print("Weights for query: \n", w_query)
print("Weights for value: \n", w_value)

#### Derive key, query and value

<img src="https://miro.medium.com/max/1975/1*wO_UqfkWkv3WmGQVHvrMJw.gif" width=800>

In [None]:
keys = x @ w_key
querys = x @ w_query
values = x @ w_value

#### Calculate attention scores

In [None]:
attn_scores = querys @ keys.T
print(attn_scores)

#### Calculate softmax

<img src="https://miro.medium.com/max/1973/1*jf__2D8RNCzefwS0TP1Kyg.gif" width=800>

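Take the **softmax** across these **attention scores** (blue). For the first row the exact result is approximately
```
softmax([2, 4, 4]) ≈ [0.063, 0.468, 0.468]
```
which the walkthrough rounds to `[0.0, 0.5, 0.5]` for readability.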
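Softmax exponentiates each score and divides by the sum of the exponentials. Implementations usually subtract the maximum score first, which leaves the result unchanged but keeps `exp()` from overflowing. A sketch comparing a hand-written version with `torch.softmax` on the first row of scores:

```python
import torch

scores = torch.tensor([2.0, 4.0, 4.0])
# Shifting by the max cancels out in the ratio but avoids overflow for large scores
exp = torch.exp(scores - scores.max())
manual = exp / exp.sum()
builtin = torch.softmax(scores, dim=-1)
print(manual, builtin)   # both ≈ [0.0634, 0.4683, 0.4683]
```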

In [None]:
from torch.nn.functional import softmax

attn_scores_softmax = softmax(attn_scores, dim=-1)
print(attn_scores_softmax)


# For readability, overwrite with the scores rounded to one decimal place
attn_scores_softmax = [
  [0.0, 0.5, 0.5],
  [0.0, 1.0, 0.0],
  [0.0, 0.9, 0.1]
]
attn_scores_softmax = torch.tensor(attn_scores_softmax)
print(attn_scores_softmax)

#### Attention Output

<img src="https://miro.medium.com/max/1973/1*G8thyDVqeD8WHim_QzjvFg.gif" width=800>

In [None]:
attn_scores_softmax @ values
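The steps of this section can be collected into a single function. This is a minimal single-head sketch (`self_attention` is a hypothetical helper); the optional `scale` flag, which divides the scores by the square root of the key dimension as in scaled dot-product attention, is an addition that the walkthrough above does not use:

```python
import math
import torch
from torch.nn.functional import softmax

def self_attention(x, w_query, w_key, w_value, scale=False):
    """Single-head self-attention; returns (outputs, attention weights)."""
    querys = x @ w_query          # (n, d_k)
    keys = x @ w_key              # (n, d_k)
    values = x @ w_value          # (n, d_v)
    scores = querys @ keys.T      # (n, n): scores[i, j] = query i . key j
    if scale:                     # optional, not part of the walkthrough above
        scores = scores / math.sqrt(keys.shape[-1])
    weights = softmax(scores, dim=-1)   # each row sums to 1
    return weights @ values, weights

x = torch.tensor([[1, 0, 1, 0], [0, 2, 0, 2], [1, 1, 1, 1]], dtype=torch.float32)
w_key = torch.tensor([[0, 0, 1], [1, 1, 0], [0, 1, 0], [1, 1, 0]], dtype=torch.float32)
w_query = torch.tensor([[1, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 1]], dtype=torch.float32)
w_value = torch.tensor([[0, 2, 0], [0, 3, 0], [1, 0, 3], [1, 1, 0]], dtype=torch.float32)

out, weights = self_attention(x, w_query, w_key, w_value)
print(weights)
print(out)
```

With the exact (unrounded) softmax weights the first output row comes out at roughly `[1.94, 6.68, 1.60]`, close to but not identical to the `[2.0, 7.0, 1.5]` obtained from the rounded weights.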