# Basics of Deep Learning using PyTorch

This notebook covers tensors, automatic gradients, neural networks and optimisers.

## Tensors

#### Making Tensors

Check out this [link](https://pytorch.org/tutorials/beginner/introyt/tensors_deeper_tutorial.html) for more information on Tensors


In [1]:
import torch
import math

In [5]:
zeros = torch.zeros(2, 3)
print("Zero tensor \n",zeros)

ones = torch.ones(2, 3)
print("One tensor \n",ones)

torch.manual_seed(1729)
random = torch.rand(2, 3)
print("Uniformly distributed random tensor \n",random)

Zero tensor 
 tensor([[0., 0., 0.],
        [0., 0., 0.]])
One tensor 
 tensor([[1., 1., 1.],
        [1., 1., 1.]])
Uniformly distributed random tensor 
 tensor([[0.3126, 0.3791, 0.3087],
        [0.0736, 0.4216, 0.0691]])
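Note that `torch.rand` samples uniformly from [0, 1), while `torch.randn` samples from a standard normal distribution. A minimal sketch to check the difference empirically; the seed and the sample size are arbitrary choices made for this example:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, chosen only for repeatability

uniform = torch.rand(10_000)    # samples from U[0, 1)
normal = torch.randn(10_000)    # samples from N(0, 1)

# Uniform samples never leave [0, 1); normal samples can be negative.
print("rand  min/max :", uniform.min().item(), uniform.max().item())
print("randn mean/std:", normal.mean().item(), normal.std().item())
```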


#### Tensor Shapes and Types

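In deep learning, the first dimension of a tensor is usually the batch dimension, and the remaining dimensions describe a single sample.

For example, a batch of images is usually stored as
(batch_size, channels, height, width)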

In [9]:
x = torch.ones(2, 2, 3)
print(x.shape)
print(x)

zeros_like_x = torch.zeros_like(x)
print(zeros_like_x.shape)

some_constants = torch.tensor([[3.14,3, 2.71], [1.61,4, 0.007]],  dtype=torch.int16)
print(some_constants.shape)
print(some_constants)

torch.Size([2, 2, 3])
tensor([[[1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.]]])
torch.Size([2, 2, 3])
torch.Size([2, 3])
tensor([[3, 3, 2],
        [1, 4, 0]], dtype=torch.int16)
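As the output above shows, storing floating point values in an integer dtype drops the fractional part. The sketch below shows the same effect with an explicit conversion using `Tensor.to`; the variable names are only illustrative:

```python
import torch

floats = torch.tensor([3.14, 2.71, 1.61])
print(floats.dtype)                  # default floating point dtype

as_int = floats.to(torch.int16)      # fractional part is dropped
print(as_int)

back = as_int.to(torch.float32)      # the lost fraction does not come back
print(back)
```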


#### Tensor Broadcasting

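Broadcasting in PyTorch follows the same rules as NumPy broadcasting. Here we look at it in the context of matrix multiplication.
The @ operator is shorthand for torch.matmul, so the two behave identically. Both broadcast only the leading (batch) dimensions; the last two dimensions must still be compatible for matrix multiplication, that is (n, k) @ (k, m).
The cell below therefore fails: a has shape (2, 3) and b has shape (1, 2), so the inner dimensions 3 and 1 do not match, and broadcasting cannot fix that.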

In [15]:
# Showing torch @
a = torch.randn(2, 3)
print(a)
b = torch.randn(1, 2)
print(b)
c = a @ b
print(c)
print(a.shape, b.shape, c.shape)



tensor([[-0.0310, -0.1537,  0.8066],
        [-0.3339, -1.0741, -0.5760]])
tensor([[0.3243, 1.3390]])


RuntimeError: mat1 and mat2 shapes cannot be multiplied (2x3 and 1x2)
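For contrast, here is a sketch of shapes that do work. It assumes a second operand of shape (3, 4), which is not in the cell above, and also shows batch broadcasting in `@` and ordinary element-wise broadcasting:

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)

# Inner dimensions match (3 and 3), so the product has shape (2, 4).
c = a @ b
print(c.shape)
print(torch.equal(c, torch.matmul(a, b)))  # @ and torch.matmul agree

# Batched case: b is broadcast across the leading dimension of `batch`.
batch = torch.randn(5, 2, 3)
d = batch @ b
print(d.shape)

# Element-wise broadcasting: the (1, 3) row is stretched across both rows of a.
row = torch.randn(1, 3)
e = a * row
print(e.shape)
```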

#### Moving to GPUs

By default, new tensors are created on the CPU, so we have to specify when we want to create our tensor on the GPU with the optional device argument. You can see when we print the new tensor, PyTorch informs us which device it’s on (if it’s not on CPU).

In [16]:
if torch.cuda.is_available():
    my_device = torch.device('cuda')
else:
    my_device = torch.device('cpu')
print('Device: {}'.format(my_device))

x = torch.rand(2, 2, device=my_device)
print(x)

Device: cpu
tensor([[0.9413, 0.4460],
        [0.9289, 0.6293]])
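An existing tensor can also be moved between devices with `Tensor.to`. A minimal sketch, which falls back to the CPU when no GPU is available:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

cpu_tensor = torch.rand(2, 2)      # created on the CPU
moved = cpu_tensor.to(device)      # moved to the target device (unchanged if already there)

# Tensors must live on the same device before they can be combined.
result = moved + torch.ones(2, 2, device=device)
print(result.device)

# Bring the result back to the CPU, e.g. before converting it to NumPy.
back_on_cpu = result.cpu()
print(back_on_cpu.device)
```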


#### Changing Tensor Shapes

Suppose a model works on 3 x 226 x 226 images, that is, 226-pixel squares with 3 color channels. When you load and transform a single image, you get a tensor of shape (3, 226, 226). The model, though, expects input of shape (N, 3, 226, 226), where N is the number of images in the batch. We unsqueeze the tensor to add a dimension of size 1 at index 0.

In [17]:
a = torch.rand(3, 226, 226)
b = a.unsqueeze(0)

print(a.shape)
print(b.shape)

torch.Size([3, 226, 226])
torch.Size([1, 3, 226, 226])


unsqueeze adds a dimension of size 1 to a tensor, and squeeze removes a dimension of size 1, for example to drop the batch dimension and work with a single sample. squeeze leaves a dimension unchanged if its size is not 1.

In [18]:
a = torch.rand(1, 20)
print(a.shape)
print(a)

b = a.squeeze(0)
print(b.shape)
print(b)

# Will not be squeezed
c = torch.rand(2, 2)
print(c.shape)

d = c.squeeze(0)
print(d.shape)

torch.Size([1, 20])
tensor([[0.6191, 0.9935, 0.1844, 0.6138, 0.6854, 0.0438, 0.0636, 0.2884, 0.4362,
         0.2368, 0.1394, 0.1721, 0.1751, 0.3851, 0.0732, 0.3118, 0.9180, 0.7293,
         0.5351, 0.5078]])
torch.Size([20])
tensor([0.6191, 0.9935, 0.1844, 0.6138, 0.6854, 0.0438, 0.0636, 0.2884, 0.4362,
        0.2368, 0.1394, 0.1721, 0.1751, 0.3851, 0.0732, 0.3118, 0.9180, 0.7293,
        0.5351, 0.5078])
torch.Size([2, 2])
torch.Size([2, 2])


In [21]:
output3d = torch.rand(6, 20, 20)
print(output3d.shape)

input1d = output3d.reshape(1, 6 * 20 * 20)
print(input1d.shape)

input1d = output3d.reshape(-1, 2)
print(input1d.shape)

torch.Size([6, 20, 20])
torch.Size([1, 2400])
torch.Size([1200, 2])


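In what order do the elements of the (1200, 2) tensor appear? reshape keeps the elements in row-major order (the last index varies fastest), so the first row holds output3d[0, 0, 0] and output3d[0, 0, 1], the second row holds output3d[0, 0, 2] and output3d[0, 0, 3], and so on.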
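One way to check the element order after a reshape is to fill a small tensor with `torch.arange`, so that each value equals its position in memory. The (2, 3, 4) shape below is a small stand-in chosen for this sketch:

```python
import torch

# Small stand-in for output3d: each value doubles as its flat position.
small = torch.arange(2 * 3 * 4).reshape(2, 3, 4)

pairs = small.reshape(-1, 2)
print(pairs.shape)
print(pairs[:3])

# Row i of `pairs` holds flat elements 2*i and 2*i + 1.
print(torch.equal(pairs.flatten(), torch.arange(24)))
```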

## Automatic Gradients

If you are not yet familiar with what autograd is and why we use it, check out this [link](https://pytorch.org/tutorials/beginner/introyt/autogradyt_tutorial.html) for more information on automatic gradients.

Here we will focus on the code.

In [22]:
import torch

# Input tensor
x = torch.ones(5)  
# Expected output
y = torch.zeros(3)

# Parameters to get the gradients for
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w)+b
# loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss = 0.5*(z - y).pow(2).sum()

print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)

# The backward() call will compute the gradient of loss with respect to all Tensors with requires_grad=True.
loss.backward()
print(w.grad)
print(b.grad)

Gradient function for z = <AddBackward0 object at 0x10dd0f640>
Gradient function for loss = <MulBackward0 object at 0x11ae0c940>
tensor([[ 2.0819,  4.3074, -3.6664],
        [ 2.0819,  4.3074, -3.6664],
        [ 2.0819,  4.3074, -3.6664],
        [ 2.0819,  4.3074, -3.6664],
        [ 2.0819,  4.3074, -3.6664]])
tensor([ 2.0819,  4.3074, -3.6664])
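Every row of `w.grad` above equals `b.grad`. This follows from the chain rule: for loss = 0.5 * sum((z - y)^2) with z = x w + b, the gradient with respect to b is z - y, and the gradient with respect to w[i, j] is x[i] * (z - y)[j]; since x is all ones, each row is z - y. A sketch that checks autograd against these hand-derived formulas (the seed is arbitrary):

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = 0.5 * (z - y).pow(2).sum()
loss.backward()

# Hand-derived gradients: d(loss)/dz = z - y
residual = (z - y).detach()
expected_b_grad = residual                   # d(loss)/db = z - y
expected_w_grad = torch.outer(x, residual)   # d(loss)/dw[i, j] = x[i] * (z - y)[j]

print(torch.allclose(b.grad, expected_b_grad))
print(torch.allclose(w.grad, expected_w_grad))
```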


Autograd also comes with a handy profiler and higher-level APIs for computing Jacobians, Hessians, etc.
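As a small illustration of the higher-level API, the sketch below uses `torch.autograd.functional.jacobian` and `hessian` on two toy functions. The functions `squares` and `sum_of_cubes` are made up for this example:

```python
import torch
from torch.autograd.functional import jacobian, hessian

def squares(v):
    # Element-wise square: output i depends only on input i
    return v ** 2

def sum_of_cubes(v):
    # Scalar-valued function, as required by hessian()
    return (v ** 3).sum()

point = torch.tensor([1.0, 2.0, 3.0])

J = jacobian(squares, point)       # analytically: diag(2 * point)
H = hessian(sum_of_cubes, point)   # analytically: diag(6 * point)

print(J)
print(H)
```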

## Neural Networks

Making simple neural networks using PyTorch

More info at this [link](https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html)

### A simple Multi Layer Perceptron

The NN layers come with a default initialisation, but you can also specify your own.

In [35]:
import torch
import torch.nn as nn
import torch.nn.init as init

class TinyModel(torch.nn.Module):

    def __init__(self, input_size, output_size, hidden_size):
        # Initialize the superclass and store the parameters
        super(TinyModel, self).__init__()

        # Define the layers of the model
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.activation = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, output_size)
        self.softmax = nn.Softmax(dim=-1)  # normalise over the class dimension

        # Initialize layers using Xavier initialization
        init.xavier_normal_(self.linear1.weight)
        init.xavier_normal_(self.linear2.weight)

    def forward(self, x):
        # This function defines the forward pass of the model
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.softmax(x)
        return x

torch.manual_seed(1729)
tinymodel = TinyModel(input_size = 100, output_size = 10, hidden_size = 50)

# Checking the Model
print('The model:')
print(tinymodel)

print('\nJust one layer:')
print(tinymodel.linear2)

The model:
TinyModel(
  (linear1): Linear(in_features=100, out_features=50, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=50, out_features=10, bias=True)
  (softmax): Softmax(dim=-1)
)

Just one layer:
Linear(in_features=50, out_features=10, bias=True)


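Here we print a summary of the model. Note that summary is given the shape of a single sample, (100,), with no batch dimension, and the model definition does not mention a batch size either. It does not have to: nn.Linear acts on the last dimension and keeps any leading batch dimension, so we write the model as if a single data point were passed through it and can then feed it a batch such as (5, 100). In the summary, -1 in the output shape stands for the batch size.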

### Important question, batch sizes and batch normalisation

In [49]:
from torchsummary import summary
summary(tinymodel, (100,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 50]           5,050
              ReLU-2                   [-1, 50]               0
            Linear-3                   [-1, 10]             510
           Softmax-4                   [-1, 10]               0
Total params: 5,560
Trainable params: 5,560
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.02
Estimated Total Size (MB): 0.02
----------------------------------------------------------------
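To see the batch dimension carried through in practice, the sketch below passes a batch of 5 samples through a stand-in model with the same layer sizes (100 -> 50 -> 10). The `nn.Sequential` model is an assumption made to keep the example self-contained:

```python
import torch
import torch.nn as nn

# Stand-in with the same layer sizes as the MLP above
mlp = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Linear(50, 10),
    nn.Softmax(dim=-1),
)

batch = torch.rand(5, 100)    # 5 samples, 100 features each
probs = mlp(batch)

print(probs.shape)            # one row of 10 class probabilities per sample
print(probs.sum(dim=-1))      # each row sums to 1 (up to rounding)
```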


### A convolutional Neural Network

The model takes grayscale images, so in_channels = 1. For color (RGB) images, in_channels = 3.

In [41]:
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # nn.Conv2d expects input of shape (batch, channels, height, width).
        # The final Linear layer (2048 = 32 * 8 * 8) assumes 1 x 32 x 32 images.

        self.conv1 = nn.Conv2d(in_channels=1,out_channels=16,kernel_size=5,stride=1,padding=2)
        self.conv2 = nn.Conv2d(in_channels=16,out_channels=32,kernel_size=5,stride=1,padding=2)
        self.relu = nn.ReLU()
        self.max_pool = nn.MaxPool2d(kernel_size=2)
        self.out = nn.Linear(2048,10)

    def forward(self, x):

        # Convolve, relu, max pool
        x = self.conv1(x)
        x = self.relu(x)
        x = self.max_pool(x)

        # Convolve, relu, max pool
        x = self.conv2(x)
        x = self.relu(x)
        x = self.max_pool(x)

        # Flatten the tensor, The size of the tensor is (batch_size, 32, 8, 8) 32*8*8 = 2048
        x = x.view(x.size(0), -1)
        output = self.out(x)
        return output

Note that summary is given the shape of a single sample, (1, 32, 32), without a batch dimension, while the forward pass below uses a batch of shape (10, 1, 32, 32). Both work.

In [51]:
from torchsummary import summary
cnn = CNN()
summary(cnn, (1 ,32, 32))

input_data = torch.rand(10, 1, 32, 32)
print(cnn(input_data).shape)

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Conv2d-1           [-1, 16, 32, 32]             416
              ReLU-2           [-1, 16, 32, 32]               0
         MaxPool2d-3           [-1, 16, 16, 16]               0
            Conv2d-4           [-1, 32, 16, 16]          12,832
              ReLU-5           [-1, 32, 16, 16]               0
         MaxPool2d-6             [-1, 32, 8, 8]               0
            Linear-7                   [-1, 10]          20,490
Total params: 33,738
Trainable params: 33,738
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.42
Params size (MB): 0.13
Estimated Total Size (MB): 0.55
----------------------------------------------------------------
torch.Size([10, 10])
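The 2048 input features of the final Linear layer depend on the image size. The sketch below recomputes them from the standard output-size formula floor((size + 2 * padding - kernel) / stride) + 1; `conv_out` and `flat_features` are hypothetical helpers written for this example:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output length along one spatial dimension of a convolution or pooling layer
    return (size + 2 * padding - kernel) // stride + 1

def flat_features(side):
    # Mirrors the CNN above: two rounds of
    # (5x5 conv, padding 2) followed by (2x2 max pool, stride 2)
    for _ in range(2):
        side = conv_out(side, kernel=5, stride=1, padding=2)   # conv keeps the size
        side = conv_out(side, kernel=2, stride=2)              # pooling halves it
    return 32 * side * side    # 32 output channels after conv2

print(flat_features(32))   # 32 * 8 * 8
print(flat_features(28))   # 32 * 7 * 7
```

For 28 x 28 inputs such as MNIST, the last layer would therefore need 1568 input features instead of 2048.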


Similarly, we can make other models

## Optimisers

Here, we show how to use an optimiser (and learning rate schedulers) to train your network or model.

For further info into Optimisers, check out this [link](https://pytorch.org/docs/stable/optim.html),
and for Learning Rate Schedulers, check out this [link](https://www.kaggle.com/code/isbhargav/guide-to-pytorch-learning-rate-scheduling)
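As a brief illustration of a learning rate scheduler, the sketch below attaches `StepLR` to an SGD optimiser. The toy model, the step size of 2 and the decay factor of 0.1 are arbitrary choices for this example:

```python
import torch
import torch.nn as nn

toy_model = nn.Linear(4, 2)    # toy model, only here to supply parameters
optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)
# Multiply the learning rate by 0.1 every 2 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)

lrs = []
for epoch in range(5):
    lrs.append(optimizer.param_groups[0]["lr"])

    # One dummy training step, so that optimizer.step() precedes scheduler.step()
    loss = toy_model(torch.rand(8, 4)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    scheduler.step()           # update the learning rate once per epoch

print(lrs)
```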

In [52]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets

We train a simple MLP on the MNIST dataset.

In [55]:
# Getting the data
train_dataset = dsets.MNIST(root='./data', train=True, transform=transforms.ToTensor(), download=True)
test_dataset = dsets.MNIST(root='./data', train=False, transform=transforms.ToTensor())

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)


In [62]:
print(len(train_loader))

first_batch = next(iter(train_loader))
print(first_batch[0].shape, first_batch[1].shape)
# Shape of X and y of first batch, 600 such batches

600
torch.Size([100, 1, 28, 28]) torch.Size([100])


In [54]:
class Model(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(Model, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out

input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = Model(input_dim, hidden_dim, output_dim)

# Checking the Model
summary(model, (28*28,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                  [-1, 100]          78,500
              ReLU-2                  [-1, 100]               0
            Linear-3                   [-1, 10]           1,010
Total params: 79,510
Trainable params: 79,510
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.30
Estimated Total Size (MB): 0.31
----------------------------------------------------------------


Steps:
1) Load a batch of images and flatten each one into a vector of 28*28 values.
2) Apply the forward pass and get the model outputs.
3) Compute the loss tensor from the model outputs and the true labels.
4) Call loss.backward() to compute the gradients of the loss with respect to all model parameters, since they all have requires_grad=True.
5) Call optimizer.step() to update every parameter registered with the optimiser, using those gradients according to the optimiser's algorithm.
6) Zero the gradients, because PyTorch accumulates gradients and we do not want the old ones to carry over into the next iteration.
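The steps above can be sketched on a tiny synthetic problem before applying them to MNIST. The regression data, the single Linear layer and the hyperparameters below are assumptions made to keep the example small and fast:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed for repeatability

# 1) Synthetic regression data (not MNIST)
inputs = torch.randn(64, 3)
true_weights = torch.tensor([[1.0], [-2.0], [0.5]])
targets = inputs @ true_weights

net = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

losses = []
for step in range(50):
    outputs = net(inputs)                 # 2) forward pass
    loss = criterion(outputs, targets)    # 3) loss tensor
    loss.backward()                       # 4) gradients for all parameters
    optimizer.step()                      # 5) parameter update
    optimizer.zero_grad()                 # 6) clear accumulated gradients
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss at the last step should be well below the loss at the first step, which is a quick sanity check that the loop is wired correctly.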

In [65]:
# Number of epochs
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

# Loss Function
criterion = nn.CrossEntropyLoss()

# Optimiser that updates the model parameters from their gradients
learning_rate = 0.1
model = Model(input_dim, hidden_dim, output_dim)
optimizer_SGD = torch.optim.SGD(model.parameters(), lr=learning_rate)

def process(optimizer):
    # Trains the global `model` with the given optimizer
    iteration = 0
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
            images = images.view(-1, 28*28)                 # 1)
            outputs = model(images)                         # 2)

            loss = criterion(outputs, labels)               # 3)

            loss.backward()                                 # 4)
            optimizer.step()                                # 5)

            optimizer.zero_grad()                           # 6)
            iteration += 1

            # Every 500 iterations, print the loss and the test accuracy
            if iteration % 500 == 0:
                correct = 0
                total = 0
                with torch.no_grad():  # no gradients needed for evaluation
                    for test_images, test_labels in test_loader:
                        test_images = test_images.view(-1, 28*28)
                        test_outputs = model(test_images)
                        _, predicted = torch.max(test_outputs, 1)
                        total += test_labels.size(0)
                        correct += (predicted == test_labels).sum().item()
                accuracy = 100 * correct / total

                print('Iteration: {}. Loss: {:.3f}. Accuracy: {:.3f}'.format(iteration, loss.item(), accuracy))


In [66]:
# Create a fresh model and an SGD optimiser for its parameters, then train
learning_rate = 0.1
model = Model(input_dim, hidden_dim, output_dim)
optimizer_SGD = torch.optim.SGD(model.parameters(), lr=learning_rate)
process(optimizer_SGD)

Iteration: 500. Loss: 0.364. Accuracy: 91.420
Iteration: 1000. Loss: 0.314. Accuracy: 92.560
Iteration: 1500. Loss: 0.275. Accuracy: 93.820
Iteration: 2000. Loss: 0.152. Accuracy: 94.490
Iteration: 2500. Loss: 0.257. Accuracy: 94.990
Iteration: 3000. Loss: 0.101. Accuracy: 95.590
