<a href="https://colab.research.google.com/github/Ananth-pinacalabs/Machine-Learning/blob/main/pytorch/pytorch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align = "center"><b>PyTorch Crash Course</b></h1>

This notebook was created by coding along with the video <a href = "https://youtu.be/OIenNRt2bjg?si=xmAPi3_VMLWqod0i" alt = "link to youtube video"> PyTorch Crash Course - Getting Started with Deep Learning</a>
by AssemblyAI. Please check out the video!

<h2 align = "center"><b> Overview </b></h2>

The notebook covers the following topics, in the order listed:

* Tensor Basics
* Autograd
* Training Loop: Model, Loss, and Optimizer
* Neural Network
* Convolutional Network with PyTorch


In [30]:
import torch
import numpy as np
from torch import nn

### **Understanding Tensors**

In [2]:
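# torch.empty(size) returns an uninitialised tensor: its values are whatever
# happened to be in memory, so never rely on them.

print("scalar:", "\n")

print(torch.empty(1))        # shape (1,): a 1-D tensor holding a single value
print("\nvector:\n")
print(torch.empty(1, 5))     # shape (1, 5): a row vector stored as a 2-D tensor
print("\nmatrix:\n")
print(torch.empty(2, 3, 3))  # shape (2, 3, 3): a 3-D tensor (two stacked 3x3 matrices)

# torch.ones(size)  -> tensor filled with 1
# torch.zeros(size) -> tensor filled with 0
# torch.rand(size)  -> tensor of uniform random numbers in [0, 1)


print("random:\n")
print(torch.rand(5, 3))

print("\nones:\n")
print(torch.ones(5, 3))
print("\nzeros:\n")
torch.zeros(5, 3)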





scalar: 

tensor([-3.3540e+36])

vector:

tensor([[ 1.7511e+38,  3.3348e-41,  7.0065e-44,  6.7871e+32, -6.7169e-10]])

matrix:

tensor([[[0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 1.4013e-45]],

        [[0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00]]])
random:

tensor([[0.5516, 0.4826, 0.5740],
        [0.5268, 0.2737, 0.0377],
        [0.4083, 0.0869, 0.9821],
        [0.2937, 0.5923, 0.6108],
        [0.3585, 0.9355, 0.4061]])

ones:

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

zeros:



tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
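A quick way to check what kind of tensor a factory call produced is to inspect `.dim()` and `.shape`. The sketch below is an illustration added to these notes, not code from the video; the variable names are arbitrary, and `torch.zeros_like` is used to show how to build a tensor with the same shape as an existing one.

```python
import torch

scalar = torch.tensor(3.0)      # 0-D tensor: shape torch.Size([])
vector = torch.zeros(3)         # 1-D tensor: shape (3,)
matrix = torch.ones(2, 3)       # 2-D tensor: shape (2, 3)
tensor3d = torch.rand(2, 3, 3)  # 3-D tensor: shape (2, 3, 3), values in [0, 1)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("3-D", tensor3d)]:
    print(f"{name}: dim={t.dim()}, shape={tuple(t.shape)}")

# *_like helpers create a new tensor with the same shape and dtype as an existing one
template = torch.empty(2, 3)    # uninitialised: contents are arbitrary
zeros = torch.zeros_like(template)
print(zeros.shape, zeros.dtype)
```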

In [3]:
x = torch.ones(2, 3)
print(x.dtype) # the default dtype is float32.
x = torch.ones(2, 3, dtype = torch.float16) # manually setting the datatype.
print(x.dtype)

torch.float32
torch.float16
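Besides setting `dtype` at creation time, an existing tensor can be converted afterwards. A small sketch, assuming only the standard `Tensor.to(dtype)` conversion and `torch.get_default_dtype()`:

```python
import torch

x = torch.ones(2, 3)                     # default floating-point dtype (float32)
y = torch.ones(2, 3, dtype=torch.int64)

# .to(dtype) returns a new tensor with the requested dtype; the original is unchanged
x_half = x.to(torch.float16)
y_float = y.to(torch.float32)

print(x.dtype, x_half.dtype, y.dtype, y_float.dtype)
# the dtype factory functions use for floating-point tensors
print(torch.get_default_dtype())
```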


In [4]:
# create a tensor from other data types.
torch.tensor([1, 3]) # tensor from a Python list

tensor([1, 3])
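`torch.tensor()` infers the dtype from the Python data it is given, and `.tolist()` / `.item()` convert back. A short sketch (the variable names are illustrative):

```python
import torch

ints = torch.tensor([1, 3])               # Python ints   -> torch.int64
floats = torch.tensor([1.0, 3.0])         # Python floats -> torch.float32 (default dtype)
nested = torch.tensor([[1, 2], [3, 4]])   # nested lists  -> 2-D tensor
from_tuple = torch.tensor((1.5, 2.5))     # tuples work too

print(ints.dtype, floats.dtype, nested.shape, from_tuple)

# convert back to plain Python objects
print(nested.tolist(), ints[0].item())
```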

Setting the `requires_grad` parameter to `True` tells autograd to record the operations performed on this tensor in the computational graph, so that gradients with respect to it can be computed later.

In [5]:
x = torch.tensor([1, 2, 2, 1], dtype = torch.float32, requires_grad = True )
print(x)

tensor([1., 2., 2., 1.], requires_grad=True)
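Only floating-point (and complex) tensors can require gradients, which is why the cell above passes `dtype = torch.float32`. A sketch showing that a freshly created tensor is a leaf with no gradient yet, and that an integer tensor is rejected (the `raised` flag is just for illustration):

```python
import torch

x = torch.tensor([1., 2., 2., 1.], requires_grad=True)
print(x.is_leaf, x.grad)   # True None: no backward pass has run yet

raised = False
try:
    torch.tensor([1, 2, 2, 1], requires_grad=True)   # int64 data
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```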


#### **Arithmetic operations on tensors**

In [6]:


x = torch.tensor([1.])
y = torch.tensor([2.])
print("x: ", x, "\ny: ", y)

print("\nusing operator: ", x+y)
print("using function: ", torch.add(x, y))

# in-place operation: methods ending in "_" modify the tensor itself.
y.add_(x)
print("\ny after in-place operation:", y)


# other element-wise operations
print("\nother operations:")
print("subtraction: ", torch.subtract(x, y))
print("multiplication: ", torch.mul(x, y))
print("division: ", torch.div(x, y))


x:  tensor([1.]) 
y:  tensor([2.])

using operator:  tensor([3.])
using function:  tensor([3.])

y after in-place operation: tensor([3.])

other operations:
subtraction:  tensor([-2.])
multiplication:  tensor([3.])
division:  tensor([0.3333])
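For tensors with more than one element, note that `*` is element-wise while `@` is matrix multiplication, and that shapes which differ can still be combined through broadcasting. A minimal sketch with arbitrary 2x2 values:

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[10., 20.], [30., 40.]])

elementwise = a * b                          # element-wise product, same as torch.mul(a, b)
matprod = a @ b                              # matrix product, same as torch.matmul(a, b)
broadcast = a + torch.tensor([100., 200.])   # the 1-D tensor is added to every row

print(elementwise)
print(matprod)
print(broadcast)

# in-place variants end with an underscore and overwrite the tensor
a.mul_(2)
print(a)
```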


In [7]:
torch.randn(4, 4) # samples from the standard normal distribution N(0, 1)

tensor([[ 1.0033,  0.6548,  0.2890,  1.2536],
        [ 0.5166,  0.4802,  0.1275, -1.0205],
        [ 1.0178,  0.4333, -1.2922,  1.1745],
        [-2.1423, -1.4233, -1.8409,  1.2337]])
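Random tensors change on every run. Seeding the generator with `torch.manual_seed` makes them reproducible; the seed value `0` below is an arbitrary choice for illustration.

```python
import torch

torch.manual_seed(0)            # fix the random number generator
first = torch.randn(4, 4)       # samples from N(0, 1)
torch.manual_seed(0)            # same seed again...
second = torch.randn(4, 4)      # ...gives the same numbers
print(torch.equal(first, second))

# with many samples, the empirical mean and std approach 0 and 1
big = torch.randn(100_000)
print(big.mean().item(), big.std().item())
```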

#### **Reshape tensors with tensor.view() and tensor.reshape()**

In [8]:
x = torch.randn(4, 4)
print(x.view(16), "\n")        # flatten the 4x4 tensor into 16 elements
print(x.reshape(-1, 8), "\n")  # -1 lets PyTorch infer that dimension (here 2)
print(x.shape, x.size()) # two different ways to get the shape of the tensor

tensor([ 0.1197, -0.7843, -0.6337,  0.9130,  0.3418, -0.2074,  2.7107, -0.4821,
         1.1647,  0.1048, -0.6128,  0.8392, -0.9973, -0.5111,  0.9537, -0.7123]) 

tensor([[ 0.1197, -0.7843, -0.6337,  0.9130,  0.3418, -0.2074,  2.7107, -0.4821],
        [ 1.1647,  0.1048, -0.6128,  0.8392, -0.9973, -0.5111,  0.9537, -0.7123]]) 

torch.Size([4, 4]) torch.Size([4, 4])
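The two reshaping methods differ in one practical way: `view()` always shares memory with the original tensor and only works when the memory layout allows it, whereas `reshape()` falls back to copying when a view is impossible. A sketch using a transpose, which produces a non-contiguous tensor:

```python
import torch

x = torch.arange(16.).view(4, 4)
v = x.view(16)            # v shares memory with x
v[0] = 100.
print(x[0, 0])            # tensor(100.): changing the view changed x

t = x.t()                 # transposing makes the tensor non-contiguous
print(t.is_contiguous())  # False

view_failed = False
try:
    t.view(16)
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

flat = t.reshape(16)      # reshape copies here because a view is impossible
print(flat[:4])
```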


#### **Tensor from NumPy and vice versa**

In [9]:
x = np.ones((1, 2))

# Two ways to create a tensor from a NumPy array.
y = torch.from_numpy(x)
z = torch.tensor(x)
print("using from_numpy() method: ", y)
print("using tensor() method: ", z)

# Caveat: a tensor created with torch.from_numpy() shares memory with the NumPy array,
# so modifying the array in place also changes the tensor. torch.tensor() makes a copy.
print()
x+=1
print("y after modification: ", y)
print("z after modification: ", z)



using from_numpy() method:  tensor([[1., 1.]], dtype=torch.float64)
using tensor() method:  tensor([[1., 1.]], dtype=torch.float64)

y after modification:  tensor([[2., 2.]], dtype=torch.float64)
z after modification:  tensor([[1., 1.]], dtype=torch.float64)
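The same memory sharing applies in the other direction: `Tensor.numpy()` on a CPU tensor returns an array backed by the tensor's memory. A sketch of the common patterns, assuming CPU tensors:

```python
import numpy as np
import torch

t = torch.ones(3)
arr = t.numpy()            # shares memory with t
t.add_(1)                  # an in-place change on the tensor...
print(arr)                 # ...is visible in the NumPy array: [2. 2. 2.]

safe = t.clone().numpy()   # clone first to get an independent copy
t.add_(1)
print(safe)                # still [2. 2. 2.]

w = torch.ones(2, requires_grad=True)
w_np = w.detach().numpy()  # a tensor that requires grad must be detached first
print(w_np)
```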


#### **GPU support**

By default, all tensors are created on the CPU, but you can move them to a GPU (when one is available).


In [10]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.rand(2, 3).to(device) # move an existing tensor to the device (the GPU if available).
x = torch.rand(2, 2, device = device) # create the tensor directly on the device
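Two rules follow from this: all operands of an operation must live on the same device, and NumPy can only read CPU memory, so results are moved back with `.cpu()` before calling `.numpy()`. A sketch that runs on either device:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.rand(2, 3, device=device)
y = torch.ones(2, 3).to(device)   # operands must be on the same device
z = x + y
print(z.device)

# NumPy only works with CPU memory, so move the result back first
z_np = z.cpu().numpy()
print(z_np.shape)
```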

<h3> Autograd</h3>


The autograd package provides automatic differentiation for operations on tensors. Generally speaking, *torch.autograd* is an engine for computing vector-Jacobian products: it computes partial derivatives by applying the chain rule.


It records every operation performed on tensors that have `requires_grad` set to `True`, and when `.backward()` is called it computes the partial derivatives with respect to all of these tensors.
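The vector-Jacobian product becomes visible when the output is not a scalar: `backward()` then needs a vector `v`, and autograd stores `v^T J` in `.grad`. A sketch with an element-wise square, whose Jacobian is diagonal (the values of `x` and `v` are arbitrary):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2   # element-wise, so the Jacobian dy/dx is diag(2x)

# y is not a scalar, so backward() needs a vector v; autograd computes v^T J
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(v)
print(x.grad)   # v * 2x, approximately [2.0, 0.4, 0.06]
```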

<h3><b>IMPORTANT</b></h3>


So whenever you want to perform operations on these tensors that should not be tracked for gradient computation, you must first detach the tensor from the computational graph (or disable tracking) and then perform the operation.

In [11]:
x = torch.randn(3, requires_grad = True)
y = x + 2
z = y.mean()
print(x)
print(y)
print(z)


print("\nThe gradient functions")
print(y.grad_fn)
print(z.grad_fn)


tensor([ 0.5649, -1.0556,  0.9222], requires_grad=True)
tensor([2.5649, 0.9444, 2.9222], grad_fn=<AddBackward0>)
tensor(2.1438, grad_fn=<MeanBackward0>)

The gradient functions
<AddBackward0 object at 0x7a3d0d2ca740>
<MeanBackward0 object at 0x7a3d0d2cad10>


<h2> Computing gradients and backpropagation </h2>

* Every tensor produced by an operation on tracked tensors now has a gradient function (`grad_fn`) attached to it in the computational graph.

* When you perform backpropagation, you compute the gradients of one specific tensor with respect to the tensors that produced it.

Let's say you want to backpropagate from `z`. Autograd then computes the partial derivatives:
<br>
<br>
$$
 \text{partial derivative w.r.t. } y = \frac{\partial z}{\partial y}
$$

<br>
<br>

$$
\text{partial derivative w.r.t. } x = \frac{\partial z}{\partial x}
$$
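For the earlier cell (`y = x + 2`, `z = y.mean()` over three elements), the chain rule gives the gradient by hand, which is the value autograd should reproduce:

$$
z = \frac{1}{3}\sum_{i=1}^{3} y_i, \qquad y_i = x_i + 2
$$

$$
\frac{\partial z}{\partial x_i} = \frac{\partial z}{\partial y_i}\,\frac{\partial y_i}{\partial x_i} = \frac{1}{3}\cdot 1 = \frac{1}{3}
$$

So every entry of `x.grad` is expected to be 1/3 ≈ 0.3333, independent of the random values in `x`.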

In [12]:
# x.grad holds the gradient computed by backpropagation; it is None until backward() is called.
# Calling z.backward() computes dz/dx for every leaf tensor that has requires_grad=True.
print(x.grad)
z.backward()
print(x.grad)

# the gradients keep accumulating in the .grad attribute each time you call the backward method

None
tensor([0.3333, 0.3333, 0.3333])


The `.grad` attribute accumulates gradients: every call to `.backward()` (for example, once per epoch) adds to the values already stored, so you need to reset them before each new backward pass, e.g. with `x.grad.zero_()` or `optimizer.zero_grad()`.
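The accumulation is easy to see by calling `backward()` several times on the same graph structure without resetting. A sketch with a toy function `out = sum(3 * w)`, whose gradient is 3 per element (the `history` list is only there to record the values):

```python
import torch

w = torch.ones(3, requires_grad=True)
history = []

for epoch in range(3):
    out = (w * 3).sum()           # d(out)/dw = 3 for every element
    out.backward()
    history.append(w.grad.clone())
    print(epoch, w.grad)          # 3s, then 6s, then 9s: gradients are summed, not replaced

w.grad.zero_()                    # reset in place before the next backward pass
print(w.grad)                     # tensor([0., 0., 0.])
```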

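Some computations must be performed without gradient tracking, such as the weight-update step after the gradients have been computed. You can do this in the following ways:

* `x.requires_grad_(False)`: switch tracking off in place
* `x.detach()`: return a new tensor that shares the same data but is not part of the computational graph
* wrap the code in `with torch.no_grad():`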


### `.requires_grad_()`

In [13]:
a = torch.randn(2, 2)
b = (a * a).sum()
print(a.requires_grad)
print(b.grad_fn)

a.requires_grad_(True) # inplace
b = (a * a).sum()
print(a.requires_grad)
print(b.grad_fn)


False
None
True
<SumBackward0 object at 0x7a3d0d2cb700>


In [14]:
a = torch.randn(2, 2, requires_grad= True)
b = a.detach()

print(a.requires_grad)
print(b.requires_grad)

True
False


In [15]:
a = torch.randn(2, 2, requires_grad= True)
print(a.requires_grad)
with torch.no_grad():
  b = a ** 2
  print(b.requires_grad)


True
False


### **Gradient Descent with Autograd**

Linear regression on data generated by `f(x) = 2x`: we learn the weight `w` of the model `f(x) = w * x`.
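For the mean squared error used below, the gradient with respect to `w` has a simple closed form, `dL/dw = mean(2 * x * (w*x - y))`. A sketch that checks autograd against this formula at the starting weight `w = 0`:

```python
import torch

x = torch.tensor([1., 2., 3., 4., 5., 6., 7.])
y = 2 * x
w = torch.tensor(0.0, requires_grad=True)

loss = ((w * x - y) ** 2).mean()
loss.backward()

# analytic gradient of the MSE loss: dL/dw = mean(2 * x * (w*x - y))
manual = (2 * x * (w.detach() * x - y)).mean()
print(w.grad.item(), manual.item())   # both -80.0
```

The gradient descent update is then `w <- w - learning_rate * dL/dw`, which is exactly what the training loop below does with `w.grad`.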

In [23]:
x = torch.tensor([1, 2, 3, 4, 5, 6, 7], dtype = torch.float32)
y = 2 * x  # targets generated by f(x) = 2x

w = torch.tensor(0.0, dtype = torch.float32, requires_grad = True)

def forward(x):
  return w * x

def loss(y, y_pred):
  return ((y_pred - y)**2).mean()

x_test = 5.0


In [25]:
# setting the model hyperparameters
learning_rate = 0.01
n_epochs = 100

# running the training for 100 epochs
for epoch in range(n_epochs):
  y_pred = forward(x)
  l = loss(y, y_pred)
  l.backward()

  # updating the weights outside the computational graph
  with torch.no_grad():
    w -= learning_rate * w.grad

  # reset the accumulated gradient before the next epoch
  w.grad.zero_()

  # print progress
  if (epoch + 1) % 10 == 0:
    print("epoch {0} w = {1}  loss = {2}".format(epoch + 1, w.item(), l.item()))






epoch 10 w = 1.9999269247055054  loss = 2.9671917900486733e-07
epoch 20 w = 1.9999995231628418  loss = 9.46036912002901e-12
epoch 30 w = 2.0  loss = 0.0
epoch 40 w = 2.0  loss = 0.0
epoch 50 w = 2.0  loss = 0.0
epoch 60 w = 2.0  loss = 0.0
epoch 70 w = 2.0  loss = 0.0
epoch 80 w = 2.0  loss = 0.0
epoch 90 w = 2.0  loss = 0.0
epoch 100 w = 2.0  loss = 0.0


In [29]:
print("test output :", forward(x_test))
print("final weights: ", w)


test output : tensor(10., grad_fn=<MulBackward0>)
final weights:  tensor(2., requires_grad=True)


### **Model, Loss, Optimizer**
A typical PyTorch pipeline looks like this:

1. Design a model
2. Construct the loss and optimizer
3. Training loop
    * **Forward** - compute prediction and loss
    * **Backward** - Compute gradients.  
    * Update weights.
    

### **Building Linear Regression with PyTorch Modules**

In [44]:
import torch
import torch.nn as nn

# Linear regression
# f = w * x
# here : f = 2 * x

# 0) Training samples, watch the shape!
x = torch.tensor([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=torch.float32)
y = torch.tensor([[2], [4], [6], [8], [10], [12], [14], [16]], dtype=torch.float32)

n_samples, n_features = x.shape
print(f'n_samples = {n_samples}, n_features = {n_features}')

# 0) create a test sample
X_test = torch.tensor([5], dtype=torch.float32)

n_samples = 8, n_features = 1


In [45]:
class LinearRegression(nn.Module):
  def __init__(self, input_dim, output_dim):
    super(LinearRegression, self).__init__()
    self.lin = nn.Linear(input_dim, output_dim)

  def forward(self, x):
    return self.lin(x)

input_size, output_size = n_features, n_features

model = LinearRegression(input_size, output_size)

learning_rate = 0.01
n_epochs = 100

loss = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)

for epoch in range(n_epochs):

  # forward
  y_pred = model(x)

  # loss: nn.MSELoss expects (prediction, target)
  l = loss(y_pred, y)

  # computing the gradients
  l.backward()

  # updating the parameters
  optimizer.step()

  # zero the gradients after updating the parameters.
  optimizer.zero_grad()

  if (epoch + 1) % 10 == 0:
    w, b = model.parameters() # unpacking the parameters
    print("epoch: ", epoch + 1, "w:", w[0][0].item(), "loss: ", l.item())




epoch:  10 w: 2.069911241531372 loss:  0.033120978623628616
epoch:  20 w: 2.068133592605591 loss:  0.03046197071671486
epoch:  30 w: 2.065462350845337 loss:  0.028119731694459915
epoch:  40 w: 2.0628952980041504 loss:  0.025957679376006126
epoch:  50 w: 2.060429096221924 loss:  0.023961806669831276
epoch:  60 w: 2.0580594539642334 loss:  0.02211938239634037
epoch:  70 w: 2.0557825565338135 loss:  0.02041861228644848
epoch:  80 w: 2.0535950660705566 loss:  0.01884862221777439
epoch:  90 w: 2.0514934062957764 loss:  0.017399374395608902
epoch:  100 w: 2.0494742393493652 loss:  0.016061486676335335
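Two related conveniences: `named_parameters()` yields each parameter together with its name (instead of unpacking `w, b = model.parameters()` by position), and predictions made only for evaluation can run under `torch.no_grad()` so no graph is built. A sketch using a bare `nn.Linear(1, 1)`, the same layer the `LinearRegression` class wraps:

```python
import torch
from torch import nn

model = nn.Linear(1, 1)

# named_parameters() yields (name, tensor) pairs
names = []
for name, param in model.named_parameters():
    names.append(name)
    print(name, tuple(param.shape))    # weight (1, 1), then bias (1,)

# evaluation only: skip building the computational graph
with torch.no_grad():
    pred = model(torch.tensor([[5.0]]))
print(pred.shape, pred.requires_grad)
```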


In [46]:
import torch
import torch.nn as nn

# Linear regression
# f = w * x
# here : f = 2 * x

# 0) Training samples, watch the shape!
X = torch.tensor([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=torch.float32)
Y = torch.tensor([[2], [4], [6], [8], [10], [12], [14], [16]], dtype=torch.float32)

n_samples, n_features = X.shape
print(f'n_samples = {n_samples}, n_features = {n_features}')

# 0) create a test sample
X_test = torch.tensor([5], dtype=torch.float32)

n_samples = 8, n_features = 1


In [47]:
# 1) Design Model, the model has to implement the forward pass!

# Here we could simply use a built-in model from PyTorch
# model = nn.Linear(input_size, output_size)

class LinearRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegression, self).__init__()
        # define different layers
        self.lin = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.lin(x)


input_size, output_size = n_features, n_features

model = LinearRegression(input_size, output_size)

print(f'Prediction before training: f({X_test.item()}) = {model(X_test).item():.3f}')

# 2) Define loss and optimizer
learning_rate = 0.01
n_epochs = 100

loss = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# 3) Training loop
for epoch in range(n_epochs):
    # predict = forward pass with our model
    y_predicted = model(X)

    # loss
    l = loss(y_predicted, Y)  # nn.MSELoss expects (prediction, target)

    # calculate gradients = backward pass
    l.backward()

    # update weights
    optimizer.step()

    # zero the gradients after updating
    optimizer.zero_grad()

    if (epoch+1) % 10 == 0:
        w, b = model.parameters() # unpack parameters
        print('epoch ', epoch+1, ': w = ', w[0][0].item(), ' loss = ', l.item())

print(f'Prediction after training: f({X_test.item()}) = {model(X_test).item():.3f}')

Prediction before training: f(5.0) = 1.363
epoch  10 : w =  2.078075647354126  loss =  0.04108716547489166
epoch  20 : w =  2.075925350189209  loss =  0.037827540189027786
epoch  30 : w =  2.072948694229126  loss =  0.03491901978850365
epoch  40 : w =  2.0700881481170654  loss =  0.032234083861112595
epoch  50 : w =  2.0673396587371826  loss =  0.0297556109726429
epoch  60 : w =  2.0646989345550537  loss =  0.027467746287584305
epoch  70 : w =  2.062161922454834  loss =  0.025355709716677666
epoch  80 : w =  2.0597243309020996  loss =  0.023406121879816055
epoch  90 : w =  2.057382345199585  loss =  0.021606450900435448
epoch  100 : w =  2.0551321506500244  loss =  0.019945137202739716
Prediction after training: f(5.0) = 9.966
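The comment in the model cell notes that the custom class could be replaced by the built-in `nn.Linear`. A minimal sketch of that variant follows; the fixed seed and the 2000 epochs (instead of 100) are choices made here for illustration so the fit ends up close to `f(x) = 2x`, not settings from the video.

```python
import torch
from torch import nn

torch.manual_seed(0)   # illustrative: fixed seed so the run is repeatable

X = torch.arange(1, 9, dtype=torch.float32).view(-1, 1)   # samples 1..8, shape (8, 1)
Y = 2 * X

model = nn.Linear(1, 1)   # built-in layer instead of the custom class
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(2000):   # illustrative: more epochs than the 100 used above
    optimizer.zero_grad()
    l = loss_fn(model(X), Y)
    l.backward()
    optimizer.step()

with torch.no_grad():
    pred = model(torch.tensor([[5.0]])).item()
print(f"Prediction after training: f(5.0) = {pred:.3f}")
```

Calling `optimizer.zero_grad()` at the start of each iteration rather than at the end is equivalent; what matters is that it runs once per update step.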
