<p align="center">
    <img src="./assets/pytorch-logo.png" width=100>
</p>

<h2 align="center">MLH Show & Tell: Introduction to PyTorch</h2>

<br/>

<div align="center">
    This notebook gives a introduction to pytorch. We'll be discussing about tensors, usage of computational graphs to calculate gradients and build a simple linear model to get an understanding of the workflow in PyTorch.
</div>

<div>
    <h3>Topics Covered</h3>
    <ol>
        <li>Tensors</li>
        <li>Computational Graphs - Autograd</li>
        <li>Datasets & Dataloaders</li>
        <li>Linear Regression</li>
        <li>Simple Neural Network</li>
    </ol>
</div>

In [1]:
import os

import torch
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split

## 1. Tensors

### 1.1 Introduction

In [2]:
# What is a tensor?
# Difference b/w tensor and Tensor

print(torch.Tensor(), torch.tensor([1,2,3]))

a = torch.tensor([[1], [2], [3]], dtype=float, device='cpu')
b = torch.tensor([[4], [5], [6]], dtype=float, device='cpu')

a,b

tensor([]) tensor([1, 2, 3])


(tensor([[1.],
         [2.],
         [3.]], dtype=torch.float64),
 tensor([[4.],
         [5.],
         [6.]], dtype=torch.float64))

In [5]:
torch.ones((1,4,2)) # np.ones((1,4,2))
torch.zeros((1,4,2))

torch.rand(1,4)

tensor([[0.4511, 0.3047, 0.4537, 0.5734]])

In [6]:
print(a.device, a.shape, a.dtype, sep='\n')

cpu
torch.Size([3, 1])
torch.float64


### 1.2 Coversions

In [7]:
# Converting from array to tensor
torch.from_numpy(np.array([1,2,3,4], dtype=float))

# Converting from tensor to array
a.numpy()

# move tensor to device
a = a.to('cpu')

### 1.3 Tensor Operations

In [8]:
# Multiplication

print("Multiplication operator")
print(a@b.T) # 3x1 @ 1x3 -> 3x3
print()
# @, matmul

print("Multiplication matmul")
print(torch.matmul(a, b.T))
print()

print("Transpose")
print(a.T) # Transpose
print()
# a.t()

# Mean, Sum
# axis = 0 is along row and axis =1 is along column
print("Sum and Mean")
print(a.sum(axis=0), a.mean()) 
print()

print("Concat tensors")
print(torch.cat([a,b], axis=0))
print()

print(a.T) # Transpose

Multiplication operator
tensor([[ 4.,  5.,  6.],
        [ 8., 10., 12.],
        [12., 15., 18.]], dtype=torch.float64)

Multiplication matmul
tensor([[ 4.,  5.,  6.],
        [ 8., 10., 12.],
        [12., 15., 18.]], dtype=torch.float64)

Transpose
tensor([[1., 2., 3.]], dtype=torch.float64)

Sum and Mean
tensor([6.], dtype=torch.float64) tensor(2., dtype=torch.float64)

Concat tensors
tensor([[1.],
        [2.],
        [3.],
        [4.],
        [5.],
        [6.]], dtype=torch.float64)

tensor([[1., 2., 3.]], dtype=torch.float64)


## 2. Computational Graphs - Autograd

For more detailed explanation on the usage of autograd, please refer to the [official documentation](https://pytorch.org/docs/stable/notes/autograd.html).

<div align='center'>
    <font size="5">$y = (a+b) * c$</font>
</div>


In [9]:
a = torch.tensor(
    [2],
    dtype=float,
    device='cpu',
    requires_grad=True
)

b = torch.tensor(
    [5],
    dtype=float,
    device='cpu',
    requires_grad=True
)

c = torch.tensor(
    [3], 
    dtype=float,
    device='cpu',
    requires_grad=True
)

In [11]:
y = (a+b)*c
y

tensor([21.], dtype=torch.float64, grad_fn=<MulBackward0>)

In [12]:
y.backward()

In [13]:
a.grad, b.grad, c.grad

(tensor([3.], dtype=torch.float64),
 tensor([3.], dtype=torch.float64),
 tensor([7.], dtype=torch.float64))

## 3. Datasets & Data Loaders

For the sake of simplicity, we use a very small subset of a dataset. 

<b>Goal:</b> Predict the yield of apples and oranges given the temperature, rainfall and humidty.

For any given task in PyTorch, its always a good practice to great a dataset class and use a data loader to batch inputs. 

1. Create a dataset class
2. Use a data loader to batch the inputs


In [14]:
# Features: (temp, rainfall, humidity)
inputs = np.array([
    [73, 67, 43],  [91, 88, 64], [87, 134, 58], 
    [102, 43, 37], [69, 96, 70], [73, 67, 43], 
    [91, 88, 64], [87, 134, 58], [102, 43, 37], 
    [69, 96, 70], [73, 67, 43], [91, 88, 64], 
    [87, 134, 58], [102, 43, 37], [69, 96, 70]], 
    dtype='float32'
)

# Targets (apples, oranges)
targets = np.array([
        [56, 70], [81, 101], [119, 133], [22, 37], [103, 119], 
        [56, 70], [81, 101], [119, 133], [22, 37], [103, 119], 
        [56, 70], [81, 101], [119, 133], [22, 37], [103, 119]
    ],
    dtype='float32'
)

x_train, x_test, y_train, y_test = train_test_split(inputs, targets)

In [15]:
len(inputs), len(targets)

(15, 15)

### 3.1 Create Dataset

In [17]:
class Dataset:
    
    def __init__(self, features, targets):
        '''
        Initialize all the features and targets
        '''
        self.features = features
        self.targets = targets
    
    def __len__(self,):
        '''
        return length of the dataset
        '''
        return self.features.shape[0]
    
    def __getitem__(self,index):
        '''
        return sample corresponding to the index
        '''
        
        features = self.features[index]
        target = self.targets[index]
        
        return {
            "features": torch.tensor(
                features,
                dtype=torch.float,
            ),
            "target": torch.tensor(
                target,
                dtype=torch.float,
            )
            
        }

In [18]:
train_dataset = Dataset(x_train, y_train)
test_dataset = Dataset(x_test, y_test)

In [20]:
train_dataset[0], test_dataset[0]

({'features': tensor([91., 88., 64.]), 'target': tensor([ 81., 101.])},
 {'features': tensor([73., 67., 43.]), 'target': tensor([56., 70.])})

### 3.2 Data Loader

In [22]:
print(torch.utils.data.DataLoader.__doc__)


    Data loader. Combines a dataset and a sampler, and provides an iterable over
    the given dataset.

    The :class:`~torch.utils.data.DataLoader` supports both map-style and
    iterable-style datasets with single- or multi-process loading, customizing
    loading order and optional automatic batching (collation) and memory pinning.

    See :py:mod:`torch.utils.data` documentation page for more details.

    Args:
        dataset (Dataset): dataset from which to load the data.
        batch_size (int, optional): how many samples per batch to load
            (default: ``1``).
        shuffle (bool, optional): set to ``True`` to have the data reshuffled
            at every epoch (default: ``False``).
        sampler (Sampler or Iterable, optional): defines the strategy to draw
            samples from the dataset. Can be any ``Iterable`` with ``__len__``
            implemented. If specified, :attr:`shuffle` must not be specified.
        batch_sampler (Sampler or Iterable, opti

In [23]:
train_dataloader = torch.utils.data.DataLoader(
    train_dataset, 
    batch_size=2,
    num_workers=2,
)

test_dataloader = torch.utils.data.DataLoader(
    test_dataset, 
    batch_size=2,
    num_workers=2,
)

## 4. Linear Regression

In [24]:
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

lr_rate = 0.001
epochs = 20


def model(x):
    return x @ w.t() + b
    

In [25]:
model(torch.from_numpy(inputs))

tensor([[ -67.8244, -158.7126],
        [ -89.6989, -212.7721],
        [ -76.8156, -294.7608],
        [ -88.7237, -113.3412],
        [ -76.1306, -228.9728],
        [ -67.8244, -158.7126],
        [ -89.6989, -212.7721],
        [ -76.8156, -294.7608],
        [ -88.7237, -113.3412],
        [ -76.1306, -228.9728],
        [ -67.8244, -158.7126],
        [ -89.6989, -212.7721],
        [ -76.8156, -294.7608],
        [ -88.7237, -113.3412],
        [ -76.1306, -228.9728]], grad_fn=<AddBackward0>)

In [26]:
def mse(t1, t2, ):
    diff = t1 - t2
    mse_loss = torch.sum(diff * diff) / diff.numel()
    reg_loss = w.sum() * (0.01/(2*diff.numel()))
    
    loss = mse_loss + reg_loss
    
    return loss 

In [32]:
for epoch in range(epochs):
    
    epoch_loss=0 
    for sample in train_dataloader:
        
        x = sample['features']
        y = sample['target']
        
        output = model(x)
        
        loss = mse(output, y)
        loss.backward()
        
        epoch_loss += loss.item()
        
        with torch.no_grad():
            w -= w.grad * 1e-5
            b -= b.grad * 1e-5
            w.grad.zero_()
            b.grad.zero_()
    
    print(epoch, epoch_loss)
    

0 139.1218023300171
1 133.55698013305664
2 128.40527248382568
3 123.63406372070312
4 119.21163940429688
5 115.10968017578125
6 111.30192852020264
7 107.76474475860596
8 104.47573709487915
9 101.41487169265747
10 98.5640115737915
11 95.90546703338623
12 93.42444324493408
13 91.10596132278442
14 88.937180519104
15 86.90607404708862
16 85.00155067443848
17 83.21338033676147
18 81.53253245353699
19 79.95018172264099


In [33]:
w, b

(tensor([[-0.3248,  1.0848,  0.2464],
         [-0.2468,  0.7819,  0.8386]], requires_grad=True),
 tensor([-0.4590,  0.8789], requires_grad=True))

In [34]:
for test_inputs in test_dataloader:

    x = sample['features']
    y = sample['target']
    
    print(sample, '\n')
    print(f'True value: {y.numpy()[0]}; Prediction: {model(x).detach().numpy()[0]}')
    break
    
    

{'features': tensor([[69., 96., 70.]]), 'target': tensor([[103., 119.]])} 

True value: [103. 119.]; Prediction: [ 98.52357  117.614075]


In [None]:
len(test_dataloader)

## 5. Neural Networks

In [35]:
import torch.nn.functional as F

In [38]:

class NN(torch.nn.Module):
    
    def __init__(self):
        
        super().__init__()
        
        self.linear1 = torch.nn.Linear(3, 3)
        self.act1 = torch.nn.ReLU() # Activation function
        self.linear2 = torch.nn.Linear(3, 2)
    
    def forward(self, x):
        x = self.linear1(x)
        x = self.act1(x)
        x = self.linear2(x)
        return x

model = NN()

In [39]:
model.linear2.weight, model.linear2.bias

(Parameter containing:
 tensor([[-0.0911, -0.0635,  0.3031],
         [ 0.1999,  0.1713, -0.2428]], requires_grad=True),
 Parameter containing:
 tensor([0.0305, 0.1546], requires_grad=True))

In [40]:
loss_fn = F.mse_loss
opt = torch.optim.SGD(model.parameters(), lr=1e-5)


In [41]:
for epoch in range(epochs):
    
    for sample in train_dataloader:
        
        x,y = sample['features'], sample['target']
        
        output = model(x)
        
        loss = loss_fn(output, y)
        
        loss.backward()
        opt.step()
        opt.zero_grad()
        
    print('Training loss: ', loss.detach().numpy())
    

Training loss:  7911.151
Training loss:  1954.1641
Training loss:  288.09064
Training loss:  136.4493
Training loss:  111.762314
Training loss:  100.35274
Training loss:  91.382904
Training loss:  83.41643
Training loss:  76.1566
Training loss:  69.504456
Training loss:  63.40851
Training loss:  57.831757
Training loss:  52.742447
Training loss:  48.10897
Training loss:  43.901104
Training loss:  40.08901
Training loss:  36.642956
Training loss:  33.53417
Training loss:  30.734558
Training loss:  28.21765


In [42]:
for test_inputs in test_dataloader:

    x = sample['features']
    y = sample['target']
    
    print(sample, '\n')
    print(f'True value: {y.numpy()[0]}; Prediction: {model(x).detach().numpy()[0]}')
    break
    
    

{'features': tensor([[69., 96., 70.]]), 'target': tensor([[103., 119.]])} 

True value: [103. 119.]; Prediction: [ 98.921585 116.973206]


## Common Problems

1. Always move all the inputs to the same device (`cpu` or `gpu`)
    ```python
    a.to('cpu')
    ```
2. `TypeError: unsupported operand type(s) for *: 'NoneType' and 'float'`
    
   Make sure that `requires_grad=True`


## Tips

1. Don't do lots of courses without actually practicing anything.


2. Try to work on a new project every month with a new task (Regression, classification, clustering, recommendation, etc ...)


3. Particpiate in kaggle competitions. Read and understand other notebooks and methods.


4. Read review papers


-----