
## CVIT Workshop Day 3

In [None]:
# implement your linear regression code here.
# using numpy!

# Hello PyTorch!

In [None]:
import torch

## Tensors!

A tensor is the fundamental data structure underlying most deep learning computations. Understanding tensors is key to running operations on them.


But what is a tensor?

A tensor is very similar to a matrix, but it can have more than two dimensions and carries a couple of extra properties (such as a data type and a device).

### Tensor Initialization:

There are various ways in which you can initialize a Tensor:

#### 1. From a Python *List*

In [None]:
# let's define a Python list
x = [1,2,3,4]
x = torch.tensor(x)

In [None]:
x.shape

torch.Size([4])

In [None]:
x.dtype

torch.int64

Every tensor has a data type, and every element in a tensor has that same data type.

In the above example, all elements are ints. Even though I did not tell `torch` that I wanted `int` as the data type of the tensor `x`, it inferred that because all of the elements were ints in the first place.


However, let's see what happens if I place one float.

In [None]:
x = [1.0,2,3,4]
x = torch.tensor(x)

In [None]:
x.shape

torch.Size([4])

In [None]:
x.dtype

torch.float32

`x` is now of type float! If the data type is not specified, Torch infers which one to use.

You can also specify the data type explicitly. Note that in the example below, 1.5 is truncated to 1 when stored as `int32`.

In [None]:
x = [1.5,2,3,4]
x = torch.tensor(x,dtype=torch.int32)

In [None]:
x

tensor([1, 2, 3, 4], dtype=torch.int32)

In [None]:
x.dtype

torch.int32

**Multiple Dimensions**

In [None]:
x = [[1,2,3],[4,5,6]]

In [None]:
x = torch.tensor(x)
x.shape

torch.Size([2, 3])

In [None]:
x.dtype

torch.int64

You can get a float tensor from an int tensor with `tensor.float()`, and an int tensor from a float tensor with `tensor.int()`.

In [None]:
x.float()

tensor([[1., 2., 3.],
        [4., 5., 6.]])

It returns a new tensor with the new data type; it does not convert `x` itself.

In [None]:
x.dtype

torch.int64

In [None]:
x = [[1,2,3],[1,2,4]]
x = torch.tensor(x)

#### 2. From a NumPy Array

Torch and NumPy are highly compatible.

In [None]:
import numpy as np

# Initialize a tensor from a NumPy array
data = np.ones((3,5))
ndarray = np.array(data)
type(ndarray)

numpy.ndarray

In [None]:
ndarray

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [None]:
ndarray = torch.from_numpy(ndarray)

In [None]:
ndarray

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]], dtype=torch.float64)

In [None]:
type(ndarray)

torch.Tensor
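One detail worth knowing, sketched below with illustrative variable names: `torch.from_numpy` shares memory with the source array instead of copying it, whereas `torch.tensor` makes a copy. This is a small demonstration of that behaviour, not part of the original notebook.

```python
import numpy as np
import torch

arr = np.ones((2, 3))          # float64 NumPy array
t = torch.from_numpy(arr)      # shares memory with arr and keeps float64

arr[0, 0] = 5.0                # modify the NumPy array in place
print(t[0, 0])                 # the tensor sees the change

t_copy = torch.tensor(arr)     # torch.tensor() copies the data instead
arr[0, 1] = 7.0
print(t_copy[0, 1])            # the copy is unaffected by later changes
```

If you need an independent tensor, prefer `torch.tensor(arr)` or call `.clone()` on the result of `torch.from_numpy`.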

#### 3. From a Tensor
We can also initialize a tensor from another tensor, using the following methods:

- `torch.ones_like(old_tensor)`: Initializes a tensor of 1s.
- `torch.zeros_like(old_tensor)`: Initializes a tensor of 0s.
- `torch.rand_like(old_tensor)`: Initializes a tensor where all the elements are sampled from a uniform distribution between 0 and 1.
- `torch.randn_like(old_tensor)`: Initializes a tensor where all the elements are sampled from a normal distribution.

All of these methods preserve the tensor properties of the original tensor passed in, such as the shape and device, which we will cover in a bit.

In [None]:
# Initialize a base tensor
x = torch.tensor([[1., 2.], [3., 4.]])
x

tensor([[1., 2.],
        [3., 4.]])

In [None]:
x.dtype

torch.float32

In [None]:
x.shape

torch.Size([2, 2])

In [None]:
# Initialize a tensor of 0s
x_zeros = torch.zeros_like(x)
x_zeros

tensor([[0., 0.],
        [0., 0.]])

In [None]:
x_zeros.dtype

torch.float32

In [None]:
x_zeros.shape

torch.Size([2, 2])

In [None]:
# Initialize a tensor of 1s
x_ones = torch.ones_like(x)
x_ones

tensor([[1., 1.],
        [1., 1.]])

In [None]:
# Initialize a tensor where each element is sampled from a uniform distribution
# between 0 and 1
x_rand = torch.rand_like(x)
x_rand

tensor([[0.4749, 0.6057],
        [0.8703, 0.8111]])

In [None]:
# Initialize a tensor where each element is sampled from a normal distribution
x_randn = torch.randn_like(x)
x_randn

tensor([[ 0.3906, -0.2053],
        [ 1.2877,  0.1631]])

#### 4. By Specifying a Shape
We can also instantiate tensors by specifying their shapes (which we will cover in more detail in a bit). The methods we could use follow the ones in the previous section:

- `torch.zeros()`
- `torch.ones()`
- `torch.rand()`
- `torch.randn()`

In [None]:
# Initialize a 4x2x2 tensor of 0s
shape = (4, 2, 2)
x_zeros = torch.zeros(shape) # x_zeros = torch.zeros(4, 2, 2) is an alternative
x_zeros

tensor([[[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]],

        [[0., 0.],
         [0., 0.]]])

In [None]:
shape = (4, 2, 2)
x = torch.ones(shape) # x = torch.ones(4, 2, 2) is an alternative
x

tensor([[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]])

In [None]:
shape = (4, 2, 2)
x = torch.rand(shape) # x = torch.rand(4, 2, 2) is an alternative
x

tensor([[[0.2455, 0.5214],
         [0.9165, 0.7290]],

        [[0.5079, 0.1937],
         [0.6339, 0.6172]],

        [[0.4171, 0.1056],
         [0.1445, 0.1999]],

        [[0.9650, 0.3800],
         [0.0233, 0.7363]]])

In [None]:
shape = (4, 2, 2)
x = torch.randn(shape) # x = torch.randn(4, 2, 2) is an alternative
x

tensor([[[-0.1560, -0.3177],
         [-1.5393,  1.4627]],

        [[ 1.9593,  0.3254],
         [ 0.1983, -1.8595]],

        [[-0.8520, -0.7893],
         [ 1.4767, -1.2468]],

        [[-0.9602,  0.5018],
         [-0.0831,  0.0255]]])

#### 5. With `torch.arange()`
We can also create a tensor with `torch.arange(end)`, which returns a 1-D tensor with elements ranging from 0 to end-1. We can use the optional `start` and `step` parameters to create tensors with different ranges.

Read the documentation for more!

In [None]:
# Create a tensor with values 0-9
x = torch.arange(10)
x

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
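As a quick sketch of the optional `start` and `step` parameters (the values are chosen only for illustration):

```python
import torch

evens = torch.arange(0, 10, 2)          # start=0, end=10 (exclusive), step=2
print(evens)                            # tensor([0, 2, 4, 6, 8])

quarters = torch.arange(1.0, 2.0, 0.25) # float arguments give a float tensor
print(quarters)                         # tensor([1.0000, 1.2500, 1.5000, 1.7500])
```

As with Python's `range`, the `end` value itself is never included.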

### What You Cannot Do

In [None]:
# The dimensions of a tensor have to be uniform.
# You CANNOT define a tensor from a ragged list like this one:

x = [[1,2,3],[4,5]]

In [None]:
x = torch.tensor(x)

ValueError: ignored

### Tensor Properties
Tensors have a few properties that are important for us to cover, namely `dtype`, `shape`, and `device`.

We have seen the `dtype` and `shape` properties. Let's see the `device` property.

In [None]:
x = torch.randn(size=(2,3))

In [None]:
x

tensor([[ 0.1174,  1.1425, -0.4018],
        [-0.9724, -1.5218, -0.9902]])

In [None]:
x.shape

torch.Size([2, 3])

In [None]:
x.dtype

torch.float32

In [None]:
x.device

device(type='cpu')

In [None]:
# In Colab, change the runtime type to GPU, then move the tensor to it.
if torch.cuda.is_available():
  print("Got a CUDA device")
  x = x.to('cuda') 

Got a CUDA device


In [None]:
x.device

device(type='cuda', index=0)

#### Shape Values

In [None]:
# hide these cells
shape1 = (1,)   # a one-element tuple needs the trailing comma; (1) is just the int 1
shape12 = (4,)
shape2 = (2,3)
shape3 = (2,3,5)

#### Determining Shape

In [None]:
# You can read off the shape by going from the outermost dimension to the innermost dimension.

x = torch.randn(shape1)

In [None]:
x

tensor([-0.6860])

In [None]:
x = torch.randn(shape12)

In [None]:
x

tensor([ 0.3179, -0.7171,  0.1156, -1.0228])

In [None]:
#x.shape

In [None]:
x = torch.randn(shape2)

In [None]:
x

tensor([[-1.2459, -0.9572, -0.3870],
        [ 0.4503,  0.5461, -1.6117]])

In [None]:
x = torch.randn(shape3)
x

tensor([[[-0.5385, -0.2254,  0.0057, -0.8902,  0.4071],
         [-0.1815,  1.5489,  1.3368,  0.6548, -0.9248],
         [ 1.2217,  0.0640, -0.8949,  1.4406, -0.2800]],

        [[ 0.0112,  0.9876, -0.6032,  0.0857,  0.5904],
         [-0.4381, -0.8598, -1.1076, -0.8448, -0.1673],
         [ 0.5297,  0.0592,  0.5812,  2.1803, -1.7110]]])

In [None]:
# number of dimensions (the length of the shape)
dims = len(x.shape)

In [None]:
x.shape

torch.Size([2, 3, 5])

In [None]:
dims

3

In [None]:
x.size(0)

2

In [None]:
x.size(1)

3

In [None]:
x.size(2)

5

### `tensor.reshape()` 

In [None]:

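x = torch.randn(12)

In [None]:
x

tensor([-0.8982,  0.3541,  0.6604,  0.5009, -0.1083, -0.3851, -1.7990, -0.7907,
        -1.8495, -0.5345,  0.3196,  1.5039])

In [None]:
x.reshape(2,6)

tensor([[-0.8982,  0.3541,  0.6604,  0.5009, -0.1083, -0.3851],
        [-1.7990, -0.7907, -1.8495, -0.5345,  0.3196,  1.5039]])

In [None]:
# -1 asks PyTorch to infer that dimension from the number of elements.
# Here x is re-created with 155 elements, so the result has shape (5, 31).
x = torch.randn(155)
x.reshape(5,-1)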

tensor([[-0.3621,  0.9109,  0.6584, -1.2500,  1.5489, -0.9120, -1.1807,  0.2433,
          0.4927,  1.1922,  1.3708, -0.8062, -0.4534, -0.7589,  0.1743, -0.3907,
         -0.3085, -0.4110,  0.0082, -0.5992,  1.1541, -1.1078,  0.0622, -1.1797,
          0.2211, -0.1570, -0.2956,  2.4717,  0.5887,  2.3555, -0.5770],
        [ 0.9688, -0.1223, -0.3303, -0.1007, -2.5454, -1.3075, -1.0068,  1.0892,
         -0.7563, -1.5089, -0.3021, -0.4362,  0.5278,  0.2542, -0.1601,  0.0227,
         -0.4220, -1.0283, -0.3224, -0.9727, -1.3117, -0.3555, -0.1943,  0.3302,
          0.8775, -1.4098, -0.9370, -0.1304, -0.0766,  0.1323, -1.5051],
        [-1.8275, -0.9748, -0.9021, -0.8901, -0.5582, -0.7783,  0.2113, -0.9340,
         -0.0612, -0.3427,  1.3981,  0.7000,  0.5418, -2.3647,  0.1408,  0.4784,
         -0.4502, -1.1270, -0.2659, -1.8643, -1.1140,  0.3646, -1.1609,  0.8316,
          2.3889, -1.1783,  0.4914,  0.1187, -1.1433,  0.7257, -1.1227],
        [-0.0502, -0.5351,  1.3842,  0.4397,  0.4220
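To make the role of `-1` in `reshape` concrete, here is a minimal sketch on a small tensor (the sizes are assumptions chosen so the arithmetic is easy to follow):

```python
import torch

x = torch.arange(12)           # 12 elements: 0..11
a = x.reshape(3, 4)            # explicit shape
b = x.reshape(2, -1)           # -1 is inferred as 12 / 2 = 6
c = x.reshape(-1, 2, 3)        # -1 is inferred as 12 / (2 * 3) = 2

print(a.shape, b.shape, c.shape)

# A shape whose element count does not match raises a RuntimeError
try:
    x.reshape(5, -1)           # 12 is not divisible by 5
except RuntimeError as err:
    print("reshape failed:", err)
```

Only one dimension can be `-1`, since PyTorch needs all the other sizes to work out the missing one.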

### Torch Indexing

In PyTorch we can index tensors, similar to NumPy.



In [None]:
# Initialize an example tensor
x = torch.Tensor([
                  [[1, 2], [3, 4]],
                  [[5, 6], [7, 8]], 
                  [[9, 10], [11, 12]] 
                 ])
x

tensor([[[ 1.,  2.],
         [ 3.,  4.]],

        [[ 5.,  6.],
         [ 7.,  8.]],

        [[ 9., 10.],
         [11., 12.]]])

In [None]:
# Access the 0th element along the first dimension, which is the first 2x2 matrix
x[0] # Equivalent to x[0, :, :]

tensor([[1., 2.],
        [3., 4.]])

In [None]:
# Get the top left element of each 2x2 matrix in our tensor
x[:, 0, 0]


tensor([1., 5., 9.])

In [None]:
x[1,1,1]

tensor(8.)

### Operations!

In [None]:
# Create an example tensor
x = torch.ones((3,2,2))
x

tensor([[[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]],

        [[1., 1.],
         [1., 1.]]])

In [None]:
# Perform elementwise addition
# Use - for subtraction
x + 2

tensor([[[3., 3.],
         [3., 3.]],

        [[3., 3.],
         [3., 3.]],

        [[3., 3.],
         [3., 3.]]])

In [None]:
# Perform elementwise multiplication
# Use / for division
x * 2

tensor([[[2., 2.],
         [2., 2.]],

        [[2., 2.],
         [2., 2.]],

        [[2., 2.],
         [2., 2.]]])

We can apply the same operations between different tensors of compatible sizes.



In [None]:
# Create a 4x3 tensor of 6s
a = torch.ones((4,3)) * 6
a

tensor([[6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.]])

In [None]:
# Create a 1D tensor of 2s
b = torch.ones(3) * 2
b

tensor([2., 2., 2.])

In [None]:
# Divide a by b
a / b


tensor([[3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.],
        [3., 3., 3.]])
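"Compatible sizes" here refers to broadcasting: shapes are aligned from the last dimension backwards, and a dimension of size 1 (or a missing dimension) is stretched to match. A small sketch, with tensors made up for illustration:

```python
import torch

a = torch.ones(4, 3) * 6                       # shape (4, 3)
b = torch.tensor([1., 2., 3.])                 # shape (3,), broadcast across the 4 rows
col = torch.tensor([[1.], [2.], [3.], [6.]])   # shape (4, 1), broadcast across the 3 columns

print(a / b)     # every row becomes [6., 3., 2.]
print(a / col)   # row i is divided by col[i]

# Shapes that cannot be aligned raise a RuntimeError
try:
    a / torch.ones(4)   # last dimensions are 3 and 4, which do not match
except RuntimeError as err:
    print("broadcast failed:", err)
```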

We can use `tensor.matmul(other_tensor)` for matrix multiplication and `tensor.T` for transpose. Matrix multiplication can also be performed with `@`.

In [None]:
# a @ b is an alternative to a.matmul(b)
# a @ b.T returns the same result since b is a 1D tensor and the 2nd dimension
# is inferred
a @ b 

tensor([36., 36., 36., 36.])

There are other useful functions such as `torch.mean()` and `torch.sum()`; you can explore them in your own time.

In [None]:
# define a tensor of shape (4,3) with all ones, and find mean and sum along dimensions 0 and 1
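One possible way to approach this exercise is sketched below. The `dim` argument names the dimension that gets collapsed, so `dim=0` reduces over rows and `dim=1` reduces over columns.

```python
import torch

t = torch.ones(4, 3)

sum0 = t.sum(dim=0)     # collapses the 4 rows    -> shape (3,), each entry 4.
sum1 = t.sum(dim=1)     # collapses the 3 columns -> shape (4,), each entry 3.
mean0 = t.mean(dim=0)   # shape (3,), each entry 1.
mean1 = t.mean(dim=1)   # shape (4,), each entry 1.
total = t.sum()         # no dim given: sum over all 12 elements

print(sum0, sum1, mean0, mean1, total)
```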

**Note:** Most operations in PyTorch are not in place. However, PyTorch offers in-place versions of many operations, available by adding an underscore (`_`) at the end of the method name.

What do I mean by this?

In [None]:
# Print our tensor
a

tensor([[6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.]])

In [None]:
# add() is not in place
a.add(a)

tensor([[12., 12., 12.],
        [12., 12., 12.],
        [12., 12., 12.],
        [12., 12., 12.]])

In [None]:
a

tensor([[6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.],
        [6., 6., 6.]])

In [None]:
# add_() is in place
a.add_(a)
a


tensor([[12., 12., 12.],
        [12., 12., 12.],
        [12., 12., 12.],
        [12., 12., 12.]])

## Autograd!

`PyTorch` and other machine learning libraries are known for their **automatic differentiation** feature.

That is, once we have defined the set of operations to be performed, the framework itself can figure out how to compute the gradients.

We can call the `backward()` method to ask PyTorch to calculate the gradients, which are then stored in the `.grad` attribute.

In [None]:
# tensor has the following properties
# dtype
# shape
# data itself.
# device


In [None]:
# Create an example tensor
# requires_grad tells PyTorch to track this tensor so its gradient can be computed
# by default it's False.
x = torch.tensor([2.], requires_grad=True)

In [None]:
x

tensor([2.], requires_grad=True)

In [None]:
x.grad

In [None]:
print(x.grad)

None


In [None]:
# Calculating the gradient of y with respect to x
y = x * x * 3 # 3x^2
y.backward()
# d(y)/d(x) = d(3x^2)/d(x) = 6x = 12

In [None]:
x.grad

tensor([12.])

Let's run backprop from a different tensor again to see what happens.

In [None]:
z = x * x * 3 # 3x^2
z.backward()
x.grad

tensor([24.])

Gradients are accumulating! This may not be desirable. Remember this! We will see how to deal with it at a later stage.
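As a small preview, here is a minimal sketch of one way to stop the accumulation by hand: reset `x.grad` in place before the next backward pass. Optimizers offer `zero_grad()` for the same purpose.

```python
import torch

x = torch.tensor([2.], requires_grad=True)

y = x * x * 3
y.backward()
first = x.grad.clone()   # gradient after the first backward pass: 6x = 12

x.grad.zero_()           # reset the accumulated gradient in place

z = x * x * 3
z.backward()
second = x.grad.clone()  # 12 again, because nothing was left over to add to

print(first, second)
```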

It's time to revise what has been taught. Go read through all the cells. 

Quickly:

1. Create a 2D list with 4 rows and 2 cols and then convert it into a tensor.
2. Create a tensor from another tensor using the `torch.zeros_like()` function.
3. Initialize 1-dimensional tensors `x` and `y`, where `x` has value 2 and `y` has value 3, and define `c = x*y`. Compute the gradient of `c` w.r.t. both `x` and `y`.
4. Use `torch.sum` along a given dimension.

In [None]:
x = torch.tensor([2.],requires_grad=True)
y = torch.tensor([3.],requires_grad=True)
c = x*y
c.backward()

In [None]:
x.grad

tensor([3.])

In [None]:
y.grad

tensor([2.])

# Neural Networks!

Check the documentation.

Introduction to the `nn` module.

## nn.Linear()

**Linear Layer**

We can use `nn.Linear(H_in, H_out)` to create a linear layer.

In [None]:
import torch.nn as nn

In [None]:
input = torch.ones(4)
input

tensor([1., 1., 1., 1.])

In [None]:
linear = nn.Linear(in_features=4, out_features= 2)


In [None]:
out = linear(input)
out

tensor([-0.5332,  0.5051], grad_fn=<AddBackward0>)

But sometimes you need to train in batches! How do we account for that?

In [None]:
input = torch.randn(5,4)
input

tensor([[-0.2612, -0.9544,  1.1082,  0.8254],
        [ 0.3337,  0.4326, -0.5582, -1.0146],
        [-0.0979, -0.5938,  1.4627, -1.4629],
        [ 0.6459,  0.2611,  1.9378, -1.8060],
        [-0.5881, -0.9418, -0.3386, -1.2727]])

In [None]:
out = linear(input)
out

tensor([[-0.8506,  0.0035],
        [-0.1125,  0.3432],
        [-0.6805,  0.6350],
        [-0.6626,  1.0627],
        [-0.3428,  0.0526]], grad_fn=<AddmmBackward0>)

In general, we can use `nn.Linear(H_in, H_out)` to create a linear layer. This takes a tensor of `(N, *, H_in)` dimensions and outputs a tensor of `(N, *, H_out)`. The `*` denotes that there can be an arbitrary number of dimensions in between. The linear layer performs the operation Ax+b, where A and b are **initialized randomly** (and these are typically what we want to learn).

If we don't want the linear layer to learn the bias parameters, we can initialize our layer with bias=False.

In [None]:
# Create the inputs
input = torch.ones(2,3,4)

# Make a linear layer transforming N,*,H_in dimensional inputs to N,*,H_out
# dimensional outputs
linear = nn.Linear(in_features = 4, out_features = 2)
linear_output = linear(input) # forward prop
linear_output

tensor([[[ 0.6118, -1.2677],
         [ 0.6118, -1.2677],
         [ 0.6118, -1.2677]],

        [[ 0.6118, -1.2677],
         [ 0.6118, -1.2677],
         [ 0.6118, -1.2677]]], grad_fn=<AddBackward0>)
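To see that the layer really computes Ax+b, the sketch below reproduces its output by hand from `linear.weight` and `linear.bias`, and also shows the `bias=False` option. The seed and input are assumptions used only to make the run repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                         # only to make the run repeatable
linear = nn.Linear(in_features=4, out_features=2)
x = torch.randn(2, 3, 4)                     # (N, *, H_in)

out = linear(x)                              # (N, *, H_out)
manual = x @ linear.weight.T + linear.bias   # weight has shape (H_out, H_in)

print(out.shape)                             # torch.Size([2, 3, 2])
print(torch.allclose(out, manual, atol=1e-6))

no_bias = nn.Linear(4, 2, bias=False)        # layer without a bias term
print(no_bias.bias)                          # None
```

Note that the weight is stored as `(H_out, H_in)`, which is why the manual version multiplies by its transpose.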

Okay! We have seen a single linear layer. But how do we create multiple layers?

## `nn.Sequential()`

In [None]:
mlp = nn.Sequential(
    nn.Linear(in_features = 4,out_features = 10),
    nn.Linear(in_features =10,out_features = 2)
)

In [None]:
mlp(input)

tensor([[[ 0.4115, -0.3507],
         [ 0.4115, -0.3507],
         [ 0.4115, -0.3507]],

        [[ 0.4115, -0.3507],
         [ 0.4115, -0.3507],
         [ 0.4115, -0.3507]]], grad_fn=<AddBackward0>)

## Activation Layers

If one wants to use the `sigmoid` activation, one can call `nn.Sigmoid()` in PyTorch. Note that a sigmoid layer does not have any parameters. In fact, activation layers generally do not have any learnable parameters.

In [None]:
block = nn.Sequential(
    
    nn.Linear(4, 10),
    nn.Sigmoid(),
    nn.Linear(10,5),
    nn.Sigmoid() # nn.ReLU() or nn.Tanh() could be used instead
)

input = torch.randn(2,3,4)
output = block(input)
output

tensor([[[0.6012, 0.4092, 0.4407, 0.5507, 0.5942],
         [0.4268, 0.5499, 0.4464, 0.6198, 0.5227],
         [0.4801, 0.5090, 0.5333, 0.5731, 0.5394]],

        [[0.3994, 0.5761, 0.4174, 0.6435, 0.5159],
         [0.6069, 0.6464, 0.4268, 0.5029, 0.6460],
         [0.3911, 0.5771, 0.3574, 0.6696, 0.5172]]],
       grad_fn=<SigmoidBackward0>)
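A quick way to check the claim that a sigmoid layer has nothing to learn is to list its parameters. This is a minimal sketch; the input values are arbitrary.

```python
import torch
import torch.nn as nn

sigmoid = nn.Sigmoid()
params = list(sigmoid.parameters())
print(params)                       # [] : there is nothing to learn

x = torch.tensor([-2.0, 0.0, 2.0])
by_layer = sigmoid(x)
by_formula = 1 / (1 + torch.exp(-x))   # the sigmoid formula written out
print(torch.allclose(by_layer, by_formula))
```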

## Custom Models

### Introduction to Python Classes:

In [None]:
class student:
  def __init__(self,name:str,batch:int): # type hints are optional annotations (available since Python 3.5)
    
    self.name = name
    self.batch = batch
  
  # define your functions

  def get_reverse_name(self):

    return self.name[::-1]

  def get_grad_year(self,course):
    return self.batch+5 if course=="DD" else self.batch+4


In [None]:
BVK = student("BVK",2018)

In [None]:
# accessing data members
BVK.name

'BVK'

In [None]:
BVK.batch

2018

In [None]:
# calling functions
BVK.get_reverse_name() #notice that we do not pass self. instance.function() automatically does it.

'KVB'

In [None]:
# calling functions with parameters.
BVK.get_grad_year("DD")

2023

In [None]:
# good practice:
BVK.get_grad_year(course="DD")

2023

Things can get a bit complex here! Pay attention.

Instead of using the predefined modules, we can also build our own by extending the `nn.Module` class. For example, we can build `nn.Linear` (which also extends `nn.Module`) on our own using the tensors introduced earlier! We can also build new, more complex modules, such as a custom neural network. You will be practicing these in the later assignment.

To create a custom module, the first thing we have to do is to extend or inherit `nn.Module`. We can then initialize our parameters in the `__init__` function, starting with a call to the `__init__` function of the superclass. All the class attributes we define that are `nn` module objects are registered as submodules, and their parameters can be learned during training.

**Tensors are not parameters, but they can be turned into parameters by wrapping them in the `nn.Parameter` class.**

All classes extending/inheriting `nn.Module` are also expected to implement a `forward(x)` function, where `x` is a tensor. This is the function that is called when an input is passed to our module, such as in `model(x)`.

Want to know more? Read up on Python's `__call__` method, which is what makes `model(x)` run `forward(x)`.
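Here is a rough sketch of a hand-written linear layer built from plain tensors wrapped in `nn.Parameter`. The class name `MyLinear` and the simple random initialization are assumptions made for illustration; the real `nn.Linear` uses a different initialization scheme.

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features, out_features):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them as learnable
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # A plain tensor attribute is NOT registered as a parameter
        self.scale = torch.ones(1)

    def forward(self, x):
        return x @ self.weight.T + self.bias

layer = MyLinear(4, 2)
out = layer(torch.ones(5, 4))   # layer(x) goes through __call__, which runs forward(x)
names = [name for name, _ in layer.named_parameters()]

print(out.shape)                # torch.Size([5, 2])
print(names)                    # ['weight', 'bias']
```

Notice that `scale` does not show up in `named_parameters()`, so an optimizer would never update it.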

In [None]:
class MultilayerPerceptron(nn.Module):

  def __init__(self, input_size, hidden_size):
    # Call to the __init__ function of the super class
    super(MultilayerPerceptron, self).__init__()

    # Bookkeeping: Saving the initialization parameters
    self.input_size = input_size 
    self.hidden_size = hidden_size 

    # Defining of our model
    # There isn't anything specific about the naming of `self.model`. It could
    # be something arbitrary.
    self.model = nn.Sequential(
        nn.Linear(self.input_size, self.hidden_size),
        nn.ReLU(),
        nn.Linear(self.hidden_size, self.input_size),
        nn.Sigmoid()
    )
    
  def forward(self, x):
    output = self.model(x)
    return output

Here is an alternative way to define the **same class**. You can see that we can replace `nn.Sequential` by defining the individual layers in the `__init__` method and connecting them in the `forward` method.

In [None]:
class MultilayerPerceptron(nn.Module):

  def __init__(self, input_size, hidden_size):
    # Call to the __init__ function of the super class
    super(MultilayerPerceptron, self).__init__()

    # Bookkeeping: Saving the initialization parameters
    self.input_size = input_size 
    self.hidden_size = hidden_size 

    # Defining of our layers
    self.linear = nn.Linear(self.input_size, self.hidden_size)
    self.relu = nn.ReLU()
    self.linear2 = nn.Linear(self.hidden_size, self.input_size)
    self.sigmoid = nn.Sigmoid()
    
  def forward(self, x):
    linear = self.linear(x)
    relu = self.relu(linear)
    linear2 = self.linear2(relu)
    output = self.sigmoid(linear2)
    return output

Now that we have defined our class, we can instantiate it and see what it does.

In [None]:
# Make a sample input
input = torch.randn(2, 5)

# Create our model
model = MultilayerPerceptron(5, 3)

# Pass our input through our model
model(input)

tensor([[0.4263, 0.4642, 0.4540, 0.4476, 0.4292],
        [0.4332, 0.4724, 0.4539, 0.4469, 0.4245]], grad_fn=<SigmoidBackward0>)

We can inspect the parameters of our model with `named_parameters()` and `parameters()` methods.

In [None]:
# model.parameters()

In [None]:
list(model.named_parameters())


[('linear.weight', Parameter containing:
  tensor([[ 0.1806, -0.1759,  0.3249,  0.1463, -0.1434],
          [ 0.0612, -0.1147,  0.3365,  0.0882, -0.0049],
          [-0.0618, -0.4343,  0.0288,  0.0238, -0.3199]], requires_grad=True)),
 ('linear.bias', Parameter containing:
  tensor([0.0682, 0.0052, 0.2081], requires_grad=True)),
 ('linear2.weight', Parameter containing:
  tensor([[ 0.4818, -0.0911, -0.4814],
          [-0.0232, -0.0403, -0.0628],
          [ 0.5435,  0.2732, -0.2599],
          [-0.1605,  0.2896,  0.5508],
          [ 0.3895,  0.0326,  0.0896]], requires_grad=True)),
 ('linear2.bias', Parameter containing:
  tensor([ 0.0118, -0.3315, -0.5369, -0.3529,  0.2460], requires_grad=True))]

You can inspect the model's layers and their sizes by printing the model.

In [None]:
model # important for debugging

MultilayerPerceptron(
  (linear): Linear(in_features=5, out_features=3, bias=True)
  (relu): ReLU()
  (linear2): Linear(in_features=3, out_features=5, bias=True)
  (sigmoid): Sigmoid()
)

Another useful library is `torchsummary`.

In [None]:
from torchsummary import summary

In [None]:
summary(model,input_size=(10,5))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                [-1, 10, 3]              18
              ReLU-2                [-1, 10, 3]               0
            Linear-3                [-1, 10, 5]              20
           Sigmoid-4                [-1, 10, 5]               0
Total params: 38
Trainable params: 38
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


We shall understand the terms a bit later.
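In the meantime, the parameter counts in the summary can be reproduced by hand: a `Linear(in, out)` layer holds `in*out` weights plus `out` biases. The sketch below rebuilds the same layer sizes with `nn.Sequential` so that it runs on its own.

```python
import torch.nn as nn

# Same layer sizes as MultilayerPerceptron(5, 3), rebuilt here so the block is self-contained
model = nn.Sequential(
    nn.Linear(5, 3), nn.ReLU(),
    nn.Linear(3, 5), nn.Sigmoid(),
)

# Number of elements in each parameter tensor, keyed by its name
per_tensor = {name: p.numel() for name, p in model.named_parameters()}
total = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(per_tensor)   # {'0.weight': 15, '0.bias': 3, '2.weight': 15, '2.bias': 5}
print(total)        # 38 = (5*3 + 3) + (3*5 + 5)
```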

## Dummy Data:

In [None]:
# Create the y data
y = torch.ones(10, 5)

# Add some noise to our goal y to generate our x
# We want our model to predict our original data despite the noise
x = y + torch.randn_like(y)
x

tensor([[ 1.2615, -0.2632,  0.5419,  0.6176,  0.4636],
        [ 2.1608,  1.2009,  0.2483,  2.2899,  2.0323],
        [ 2.2785,  0.6051,  1.1722,  1.8934,  1.2985],
        [ 0.2747, -0.5296,  1.5358,  2.0892,  0.6290],
        [ 2.0828,  0.6666,  0.1905,  1.1817,  0.4504],
        [-0.1284,  0.4848,  1.5273,  1.8551,  0.6730],
        [-0.7221, -0.3960,  1.0732, -1.6507,  1.4448],
        [-0.3188,  1.7094, -0.6547,  0.8384,  0.9052],
        [ 0.0306, -0.0467,  0.7097,  1.0045,  1.0495],
        [ 1.5080,  1.4703,  2.3015,  0.4347, -0.5780]])

In [None]:
y

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

We want to learn! Let's see our other steps.

## Optimization

We have shown how gradients are calculated with the `backward()` function. Having the gradients isn't enough for our models to learn. We also need to know how to update the parameters of our models. This is where optimizers come in. The `torch.optim` module contains several optimizers that we can use. Some popular examples are `optim.SGD` and `optim.Adam`. When initializing optimizers, **we pass our model parameters**, which can be accessed with `model.parameters()`, telling the optimizer which values it will be optimizing.

Optimizers also have a learning rate (`lr`) parameter, which determines how big an update is made in every step. Different optimizers have different hyperparameters as well.
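To make the role of the learning rate concrete, here is a minimal sketch with a single parameter and plain `optim.SGD` (no momentum). The toy loss and the values are assumptions for illustration; the point is that one step applies `w <- w - lr * grad`.

```python
import torch
import torch.optim as optim

w = torch.tensor([1.0], requires_grad=True)
lr = 0.1
sgd = optim.SGD([w], lr=lr)

loss = (w - 3.0) ** 2        # toy loss with its minimum at w = 3
loss.backward()              # d(loss)/dw = 2 * (w - 3) = -4
expected = w.item() - lr * w.grad.item()   # plain SGD rule applied by hand

sgd.step()                   # the optimizer applies the same update
print(w.item(), expected)    # both approximately 1.4
```

A larger `lr` would move `w` further in one step; other optimizers such as Adam use the gradient differently, so this hand calculation applies to plain SGD only.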

In [None]:
import torch.optim as optim

In [None]:
# Instantiate the model
model = MultilayerPerceptron(5, 3)

In [None]:
# Define the optimizer
adam = optim.Adam(model.parameters(), lr=1e-1)

In [None]:
# Define loss using a predefined loss function
loss_function = nn.BCELoss()


In [None]:
# Calculate how our model is doing now
y_pred = model(x)
loss_function(y_pred, y).item()

0.6882650852203369

In [None]:
# Set the number of epochs, which determines the number of training iterations
n_epoch = 10 

for epoch in range(n_epoch):
  # Set the gradients to 0
  adam.zero_grad() # remember that gradients accumulate across backward() calls, which is not desirable here

  # Get the model predictions
  y_pred = model(x)

  # Get the loss
  loss = loss_function(y_pred, y)

  # Print stats
  print(f"Epoch {epoch}: training loss: {loss}")

  # Compute the gradients
  loss.backward()

  # Take a step to optimize the weights
  adam.step()

Epoch 0: training loss: 0.6882650852203369
Epoch 1: training loss: 0.6040355563163757
Epoch 2: training loss: 0.5332837104797363
Epoch 3: training loss: 0.44752487540245056
Epoch 4: training loss: 0.3415060043334961
Epoch 5: training loss: 0.238810196518898
Epoch 6: training loss: 0.15351499617099762
Epoch 7: training loss: 0.09175579249858856
Epoch 8: training loss: 0.05224194377660751
Epoch 9: training loss: 0.029199518263339996


You can see that our loss is decreasing. Let's check the predictions of our model now and see if they are close to our original y, which was all 1s.

In [None]:
# See how our model performs on the training data
y_pred = model(x)
y_pred

tensor([[0.9999, 0.9998, 0.9998, 0.9999, 0.9993],
        [0.9960, 0.9938, 0.9992, 0.9970, 0.9985],
        [1.0000, 1.0000, 0.9998, 1.0000, 0.9990],
        [0.9991, 0.9986, 0.9927, 0.9990, 0.9804],
        [1.0000, 1.0000, 1.0000, 1.0000, 0.9999],
        [0.9999, 0.9999, 0.9995, 0.9999, 0.9977],
        [0.9996, 0.9993, 0.9958, 0.9995, 0.9868],
        [0.9996, 0.9993, 0.9993, 0.9997, 0.9979],
        [0.9998, 0.9996, 0.9981, 0.9997, 0.9935],
        [0.9995, 0.9991, 0.9991, 0.9995, 0.9973]], grad_fn=<SigmoidBackward0>)

Looks like the model has learnt! Remember, we created y with all ones.

That is it! You now know how to write code in PyTorch. There are a few more things to learn, but you know the fundamental idea.

Let's revise!


Steps:

1. Define your problem statement.
2. Look at your data (more on this later).
3. Define your model:
   - Inherit from `nn.Module`.
   - Cross-check the dimensions and other details using `print(model)` or `torchsummary`'s `summary(model, input_size)`.
4. Instantiate your model.
5. We want to learn the model parameters, hence we need an optimizer. Define an optimizer and pass it `model.parameters()` and a learning rate.
6. Define the loss function.
7. In each iteration:
   - Set gradients to zero: `optimizer.zero_grad()`
   - Forward pass: `model(input)`
   - Compute the loss: `loss_function(prediction, target)`
   - Backward pass: `loss.backward()`
   - Update the weights: `optimizer.step()`
8. That's it! Run this for multiple epochs, and hopefully your model converges.
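The steps above can be sketched end to end on a toy problem. The data (learning `y = 2x`), the single linear layer, and the hyperparameters are all assumptions chosen to keep the example tiny; the structure of the loop is what matters.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)                      # only for repeatability

# Toy data (assumed for illustration): learn y = 2x from 32 points
x = torch.linspace(-1, 1, 32).unsqueeze(1)
y = 2 * x

model = nn.Linear(1, 1)                              # steps 3-4: define and instantiate
optimizer = optim.SGD(model.parameters(), lr=0.1)    # step 5
loss_fn = nn.MSELoss()                               # step 6

losses = []
for epoch in range(200):                  # steps 7-8
    optimizer.zero_grad()                 # clear old gradients
    loss = loss_fn(model(x), y)           # forward pass and loss
    loss.backward()                       # backward pass
    optimizer.step()                      # update the weights
    losses.append(loss.item())

print(losses[0], losses[-1])              # the loss should shrink
print(model.weight.item())                # should approach 2
```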

**Note:** Recall that activation layers (usually) DO NOT have any learnable parameters, and therefore we can use them directly in the `forward()` function. However, all layers that have learnable parameters MUST be defined in the `__init__()` function for your parameters to get updated.

You now have everything you need to write your own neural nets. Practice!

## Computer Vision!

In [None]:
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor

This is your Hello Computer Vision Moment! You are going to write your first classifier! Buckle up!

We are going to take up PyTorch's Quickstart code and see what exactly is happening there. 

PyTorch has two primitives to work with data: `torch.utils.data.DataLoader` and `torch.utils.data.Dataset`. Dataset stores the samples and their corresponding labels, and `DataLoader` wraps an iterable around the Dataset.

More on this a little later.
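Until then, here is a minimal sketch of the two primitives on made-up data, using `TensorDataset` (a ready-made `Dataset` that wraps tensors). The sizes are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data (assumed): 10 samples with 4 features each, and integer labels
features = torch.randn(10, 4)
labels = torch.arange(10)

dataset = TensorDataset(features, labels)   # a Dataset: indexable and has a length
print(len(dataset))                          # 10
sample_x, sample_y = dataset[0]              # one (sample, label) pair

loader = DataLoader(dataset, batch_size=4)   # iterates over the dataset in batches
batch_shapes = [tuple(xb.shape) for xb, yb in loader]
print(batch_shapes)                          # [(4, 4), (4, 4), (2, 4)]
```

The last batch is smaller because 10 is not a multiple of 4; `DataLoader` keeps it unless `drop_last=True` is passed.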

In [None]:
# Download training data from open datasets.
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

In [None]:
# Download test data from open datasets.
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)

In [None]:
# Visualize the data:

In [None]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64


## Define your Model

In [None]:
# Get cpu or gpu device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device) # move your model to the selected device
print(model)

Using cpu device
NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


## Define your loss function

In [None]:
loss_fn = nn.CrossEntropyLoss()
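Since the model above returns raw logits, it helps to see what `nn.CrossEntropyLoss` expects: unnormalized scores of shape `(N, C)` and integer class indices of shape `(N,)`. The sketch below uses made-up numbers and checks the result against a hand computation with log-softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # (N=2, C=3) raw scores, no softmax applied
targets = torch.tensor([0, 2])              # class indices, dtype int64

loss = nn.CrossEntropyLoss()(logits, targets)

# Same value by hand: mean negative log-softmax of the true class
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), targets].mean()
print(loss.item(), manual.item())
```

This is why the model does not end with a softmax layer: the loss applies it internally.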

## Define your optimizer

In [None]:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In [None]:
# Recap: torch.utils.data provides DataLoader and Dataset;
# torchvision.datasets provides ready-made datasets such as MNIST and FashionMNIST.

In [None]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        X, y = X.to(device), y.to(device)

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

In [None]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

In [None]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.311933  [    0/60000]
loss: 2.305371  [ 6400/60000]
loss: 2.284497  [12800/60000]
loss: 2.267558  [19200/60000]
loss: 2.262216  [25600/60000]
loss: 2.227860  [32000/60000]
loss: 2.229903  [38400/60000]
loss: 2.202518  [44800/60000]
loss: 2.192530  [51200/60000]
loss: 2.160078  [57600/60000]
Test Error: 
 Accuracy: 44.9%, Avg loss: 2.161226 

Epoch 2
-------------------------------
loss: 2.173546  [    0/60000]
loss: 2.168222  [ 6400/60000]
loss: 2.111716  [12800/60000]
loss: 2.118853  [19200/60000]
loss: 2.083331  [25600/60000]
loss: 2.015227  [32000/60000]
loss: 2.041491  [38400/60000]
loss: 1.970131  [44800/60000]
loss: 1.965418  [51200/60000]
loss: 1.896813  [57600/60000]
Test Error: 
 Accuracy: 57.5%, Avg loss: 1.903381 

Epoch 3
-------------------------------
loss: 1.933541  [    0/60000]
loss: 1.911707  [ 6400/60000]
loss: 1.798238  [12800/60000]
loss: 1.831991  [19200/60000]
loss: 1.726008  [25600/60000]
loss: 1.671614  [32000/600

64%? Seriously? That is what we have achieved? Is this deep learning? What is all the hype about?

You don't have to worry! Tomorrow you will be introduced to CNNs, a powerful model architecture that will allow you to go above 90% accuracy (and even higher)!

Coming back to the main point: everything we discuss from here on is about different ways to model the problem. The pipeline remains fixed.

## How about Custom Datasets? 

Okay, nice, PyTorch has FashionMNIST. Are we restricted to training models only on datasets that are available in PyTorch? Absolutely NOT! 

So, how can I use my own custom dataset? Also, is there any sort of wrapper that will allow me to use the automatic batching we used above? Yes! You can write custom datasets that work with PyTorch's `DataLoader`.
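As a generic illustration, a custom dataset only needs to subclass `Dataset` and implement `__len__` and `__getitem__`. The `SquaresDataset` class below is a hypothetical toy example made up for this sketch; a real dataset would load an image and its label inside `__getitem__`.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):   # hypothetical toy dataset, for illustration only
    """Returns (n, n*n) pairs for n in [0, size)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # DataLoader uses this to know how many samples exist
        return self.size

    def __getitem__(self, idx):
        # Return one (input, label) pair for the given index
        x = torch.tensor([float(idx)])
        y = torch.tensor([float(idx * idx)])
        return x, y

ds = SquaresDataset(10)
loader = DataLoader(ds, batch_size=4, shuffle=False)
shapes = [tuple(xb.shape) for xb, yb in loader]
print(shapes)                    # batches of 4, 4 and 2 samples
```

Once the class exists, batching, shuffling, and iteration come from `DataLoader` without any extra code.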

I will show you a snippet of a Malayalam character recognizer.

Things can get more and more complicated! Check out wandb (Weights & Biases) for tracking experiments.