
Okay, let's do this: learn some torch.

> 30th April 9:40 IST

In [40]:
import torch
import numpy as np

In [3]:
# Checking GPU and cuda support
torch.cuda.is_available()

False

## 1. Tensor Basics

In PyTorch, everything is based on tensors.

In [9]:
# Creating an empty tensor (memory is uninitialized, so the values are arbitrary)
x = torch.empty(3, 2, 2, 3)
print(x)

tensor([[[[ 3.5816e-35,  0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  1.4013e-45,  0.0000e+00]],

         [[ 0.0000e+00,  0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00]]],


        [[[ 3.7210e-35,  0.0000e+00,  1.6732e-35],
          [ 0.0000e+00, -2.4286e-30,  4.5874e-41]],

         [[ 2.8026e-45,  0.0000e+00,  2.8026e-45],
          [ 0.0000e+00,  4.2039e-45,  0.0000e+00]]],


        [[[ 0.0000e+00,  0.0000e+00,  0.0000e+00],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00]],

         [[ 4.2039e-45,  0.0000e+00,  1.4013e-45],
          [ 0.0000e+00,  0.0000e+00,  0.0000e+00]]]])


In [11]:
#  Random Tensor
x = torch.rand(2, 2)
print(x)

tensor([[0.6204, 0.0075],
        [0.5981, 0.6651]])


In [13]:
# Zeros tensor
x = torch.zeros(2, 2)
print(x)

tensor([[0., 0.],
        [0., 0.]])


In [14]:
# ones tensor
x = torch.ones(2,2, dtype=torch.float16)
print(x)
print(x.dtype)
print(x.size())

tensor([[1., 1.],
        [1., 1.]], dtype=torch.float16)
torch.float16
torch.Size([2, 2])


In [15]:
# Creating tensor from python list
x = torch.tensor([1,2])
print(x)

tensor([1, 2])


In [20]:
# Basic operations
x = torch.rand(2, 2)
y = torch.rand(2, 2)
print(x)
print(y)

tensor([[0.3010, 0.4266],
        [0.5522, 0.8659]])
tensor([[0.6596, 0.6837],
        [0.0585, 0.6424]])


In [23]:
z = torch.add(x, y)
print(z)

tensor([[0.9605, 1.1103],
        [0.6107, 1.5083]])


In [24]:
# In place addition
y.add_(x) 

tensor([[0.9605, 1.1103],
        [0.6107, 1.5083]])
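
A quick side-by-side of the two additions above, as a minimal sketch with fixed values instead of random ones (the variable `y_before` is only there for illustration). `torch.add` returns a new tensor, while the trailing underscore in `add_` means the tensor itself is modified.

```python
import torch

x = torch.ones(2, 2)
y = torch.ones(2, 2)
y_before = y.clone()

# Out-of-place: returns a new tensor, y is untouched
z = torch.add(x, y)
print(torch.equal(y, y_before))  # True

# In-place: the trailing underscore means y itself is modified
y.add_(x)
print(torch.equal(y, z))  # True
```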

In [26]:
# Slicing
x = torch.rand(5, 3)
print(x)
print(x[:, 0])

tensor([[0.7905, 0.1205, 0.1779],
        [0.1668, 0.7189, 0.0564],
        [0.5438, 0.8737, 0.2739],
        [0.7827, 0.5072, 0.9777],
        [0.4128, 0.2837, 0.5488]])
tensor([0.7905, 0.1668, 0.5438, 0.7827, 0.4128])


In [27]:
print(x[1, :])

tensor([0.1668, 0.7189, 0.0564])


In [28]:
# Reshaping a tensor
x = torch.rand(4, 4)
print(x)
y = x.view(16)
print(y)

tensor([[0.6978, 0.8248, 0.8967, 0.2190],
        [0.9712, 0.4313, 0.0503, 0.6076],
        [0.5150, 0.5144, 0.3651, 0.3434],
        [0.1204, 0.7721, 0.1049, 0.0760]])
tensor([0.6978, 0.8248, 0.8967, 0.2190, 0.9712, 0.4313, 0.0503, 0.6076, 0.5150,
        0.5144, 0.3651, 0.3434, 0.1204, 0.7721, 0.1049, 0.0760])
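
Two more things about `view` that seem worth noting, sketched here with `torch.arange` so the values are predictable: passing `-1` lets PyTorch infer one dimension, and the result shares memory with the original tensor.

```python
import torch

x = torch.arange(16, dtype=torch.float32).view(4, 4)

# -1 tells PyTorch to infer this dimension from the element count (16 / 8 = 2)
y = x.view(-1, 8)
print(y.size())

# view does not copy: y shares storage with x,
# so an in-place change to y is visible through x
y[0, 0] = 100.0
print(x[0, 0])
```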


In [33]:
# Numpy to torch and vice versa
a = torch.ones(5)
print(a)
b = a.numpy()
print(b)
print(type(b))

tensor([1., 1., 1., 1., 1.])
[1. 1. 1. 1. 1.]
<class 'numpy.ndarray'>


In [34]:
a.add_(1)

tensor([2., 2., 2., 2., 2.])

In [37]:
# On the CPU, a and b point to the same memory location, so changing one changes the other
print(a)
print(b)

tensor([2., 2., 2., 2., 2.])
[2. 2. 2. 2. 2.]


In [38]:
a = np.ones(5)
print(a)
b = torch.from_numpy(a)
print(type(b))
print(b)

[1. 1. 1. 1. 1.]
<class 'torch.Tensor'>
tensor([1., 1., 1., 1., 1.], dtype=torch.float64)


In [39]:
a += 1
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
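
If the shared memory is not wanted, one option is to copy instead. A small sketch (the names `shared` and `copied` are just for illustration): `torch.from_numpy` shares memory with the array, while `torch.tensor` copies the data.

```python
import numpy as np
import torch

a = np.ones(5)
shared = torch.from_numpy(a)  # shares memory with a
copied = torch.tensor(a)      # copies the data

a += 1
print(shared)  # follows the change to a
print(copied)  # keeps the original values
```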


In [2]:
# Switched to GPU to check the above scenario
if torch.cuda.is_available():
  device = torch.device("cuda")
  x = torch.ones(5, device=device)
  y = torch.ones(5)
  Y = y.to(device)
  z = x + Y

In [3]:
print(z)

tensor([2., 2., 2., 2., 2.], device='cuda:0')


In [7]:
z = z.numpy()

TypeError: ignored

In [8]:
# NumPy only works with CPU tensors; z was created on the GPU, so move it to the CPU first
z = z.to("cpu")
z.numpy()

array([2., 2., 2., 2., 2.], dtype=float32)
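
The same cells can be written so they run with or without a GPU. This is only a sketch of a common pattern, not something the notebook above does: pick the device once, and call `.cpu()` before `.numpy()`.

```python
import torch

# Falls back to the CPU when no GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(5, device=device)
y = torch.ones(5).to(device)
z = x + y

# .cpu() returns the tensor itself if it is already on the CPU,
# so this line works on both kinds of machine
z_np = z.cpu().numpy()
print(z_np)
```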

In [41]:
x = torch.ones(5, requires_grad=True) # Tells PyTorch that gradients need to be calculated for this tensor later, in the optimization step
print(x)

tensor([1., 1., 1., 1., 1.], requires_grad=True)


> Whenever a variable needs to be optimized, create it with `requires_grad=True`

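### Tensor Basics summary

1. Tensors behave much like NumPy arrays
2. `view` reshapes a tensor
3. `torch.<op>(x, y)` for math operations, e.g. `torch.add(x, y)`
4. `x.<op>_()` (trailing underscore) for in-place math operations, e.g. `y.add_(x)`
5. `requires_grad=True` marks tensors that need gradients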

## Autograd

Autograd is the PyTorch package for gradient computation.
It records the operations performed on tensors and calculates the gradients for us.

In [42]:
x = torch.randn(3)
print(x)

tensor([ 2.1019, -0.8781, -0.3495])


In [43]:
# To calculate the gradient of some function with respect to x later, initialize x with requires_grad=True
x = torch.randn(3, requires_grad=True)
print(x)

tensor([ 0.0129,  1.6865, -0.4534], requires_grad=True)


In [44]:
y = x + 2

In [45]:
print(y) # The output shows a gradient function; since the operation is addition, grad_fn is AddBackward

tensor([2.0129, 3.6865, 1.5466], grad_fn=<AddBackward0>)


In [48]:
z = y*y*2 # For multiplication
print(z)

tensor([ 8.1032, 27.1808,  4.7840], grad_fn=<MulBackward0>)


In [49]:
z = z.mean()
print(z)

tensor(13.3560, grad_fn=<MeanBackward0>)


In [50]:
# To calculate the gradient just call backward
z.backward() # dz/dx
print(x.grad)

tensor([2.6838, 4.9154, 2.0621])
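
To see where these numbers come from: here z = mean(2 * (x + 2)^2), so by hand dz/dx_i = 4 * (x_i + 2) / N with N = 3. A small check of that derivation against autograd (`expected` is just an illustrative name):

```python
import torch

x = torch.randn(3, requires_grad=True)
z = (2 * (x + 2) ** 2).mean()
z.backward()

# Hand-derived gradient: dz/dx_i = 4 * (x_i + 2) / N
expected = 4 * (x.detach() + 2) / x.numel()

print(x.grad)
print(expected)
print(torch.allclose(x.grad, expected))
```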


In [54]:
# Gradient calculation works for a scalar; what happens with a vector?
x = torch.randn(2, requires_grad=True)
y = x + 2
z = y*y*2
z.backward()
print(x.grad)

RuntimeError: ignored

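This won't work: `backward()` without arguments only works when the output is a scalar. Under the hood the gradient calculation is a vector-Jacobian product, so for a vector output we need to pass a vector of the same shape as `z`.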

In [55]:
# Vector for the vector-Jacobian product (same shape as z)
v = torch.tensor([0.1,1.0], dtype=torch.float32)
z.backward(v)
print(x.grad)

tensor([ 0.0924, 11.4316])
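
One way to read `z.backward(v)`: it gives the same gradient as weighting `z` by `v`, summing to a scalar, and calling `backward()` on that. A minimal sketch comparing the two (the names `grad_vjp` and `grad_scalar` are made up for this example):

```python
import torch

x = torch.randn(2, requires_grad=True)
v = torch.tensor([0.1, 1.0])

# Option 1: pass v to backward()
z = (x + 2) ** 2 * 2
z.backward(v)
grad_vjp = x.grad.clone()

# Option 2: reduce to a scalar with the same weights, then call backward()
x.grad.zero_()
z = (x + 2) ** 2 * 2
(z * v).sum().backward()
grad_scalar = x.grad.clone()

print(grad_vjp)
print(grad_scalar)
```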


Sometimes we need to stop PyTorch from tracking the history and setting the `grad_fn` attribute.
For example, updating the weights shouldn't be part of the gradient computation.
Three options to avoid this are
1. x.requires_grad_(False)
2. x.detach()
3. with torch.no_grad():

In [56]:
x.requires_grad_(False)
print(x)

tensor([-1.7690,  0.8579])


In [61]:
x.requires_grad_(True)
y = x.detach()
print(y)

tensor([-1.7690,  0.8579])


In [62]:
with torch.no_grad():
  y = x + 2
  print(y)

tensor([0.2310, 2.8579])


In [63]:
y = x + 2
print(y)

tensor([0.2310, 2.8579], grad_fn=<AddBackward0>)
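
Why this matters for weight updates: PyTorch refuses an in-place update on a leaf tensor that requires grad, unless tracking is switched off. A small sketch (the `raised` flag is only there to make the behaviour visible):

```python
import torch

w = torch.ones(3, requires_grad=True)

raised = False
try:
    w -= 0.1  # in-place update of a leaf tensor that requires grad
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Inside no_grad the update is not tracked, so it is allowed
with torch.no_grad():
    w -= 0.1

print(w)
print(w.requires_grad)  # w still requires grad afterwards
```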


Whenever `.backward()` is called, the gradient for the tensor is calculated and accumulated in its `.grad` attribute.

In [68]:
weights = torch.ones(4, requires_grad=True)

# Training loop
for epoch in range(3):
  model_output = (weights * 3).sum()

  # Calculating gradient
  model_output.backward()
  print(weights.grad)

  # Resetting the gradients before the next epoch
  weights.grad.zero_()

tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
tensor([3., 3., 3., 3.])
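
For comparison, here is the same loop without `weights.grad.zero_()`, as a sketch of what the accumulation looks like (the list `seen` is just for collecting the gradients):

```python
import torch

weights = torch.ones(4, requires_grad=True)
seen = []

for epoch in range(3):
    model_output = (weights * 3).sum()
    model_output.backward()
    # No zero_() here, so each backward() adds onto the previous gradient
    seen.append(weights.grad.clone())

for g in seen:
    print(g)
```

The gradients come out as 3, then 6, then 9 per element, which is wrong for a training loop where each epoch should see a fresh gradient.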


### Autograd summary:

* Set `requires_grad=True` on any tensor whose gradient needs to be calculated
* Calculate the gradient by calling `.backward()`
* Empty the gradients before the next iteration or epoch
* Keep weight updates out of the gradient computation using `detach()` or `torch.no_grad()`

## Backpropagation

* The gradient is calculated using the chain rule, from output to input in the backward direction
* Computation graph - for every operation we do with tensors, PyTorch creates a graph

*Whole concept consists of three steps:*

1. Forward pass: Compute loss
2. Each node - compute local gradients
3. Backward pass: Compute dLoss / dWeights using the chain rule

In [69]:
x = torch.tensor(1.0)
y = torch.tensor(2.0)

w = torch.tensor(1.0, requires_grad=True)

In [70]:
# Forward pass and compute the loss
y_hat = w * x
loss = (y_hat - y) ** 2
print(loss)

tensor(1., grad_fn=<PowBackward0>)


In [71]:
# Backward pass
# PyTorch automatically computes the local gradients and the backward pass
loss.backward()
print(w.grad)

tensor(-2.)
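
The same three steps done by hand in plain Python, to show where the `-2` comes from. The intermediate name `s` and the `d..._d...` variables are only illustrative:

```python
x, y, w = 1.0, 2.0, 1.0

# 1. Forward pass: compute the loss
y_hat = w * x
s = y_hat - y
loss = s ** 2

# 2. Local gradients at each node
dloss_ds = 2 * s   # d(s^2)/ds
ds_dy_hat = 1.0    # d(y_hat - y)/dy_hat
dy_hat_dw = x      # d(w * x)/dw

# 3. Backward pass: multiply the local gradients (chain rule)
dloss_dw = dloss_ds * ds_dy_hat * dy_hat_dw

print(loss, dloss_dw)  # 1.0 -2.0
```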


### Backpropagation summary

1. forward pass
2. compute local gradients
3. backward pass

## Linear regression

Each step starts out manual and is then replaced by its PyTorch counterpart:

1. Prediction: manual, then PyTorch model
2. Gradients computation: manual, then autograd
3. Loss computation: manual, then PyTorch loss
4. Parameter updates: manual, then PyTorch optimizer

### All manual

In [106]:
# f = w * x
# f = 2 * x
X = np.array([1, 2, 3, 4], dtype=np.float32)
Y = np.array([2, 4, 6, 8], dtype=np.float32)

In [107]:
# Initial weight
w = 0.0

In [108]:
# Model prediction
def forward(x):
  return w * x

# loss = MSE
def loss(y, y_pred):
  return ((y_pred - y) ** 2).mean()

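# Gradient
# MSE: J = 1/N * sum((w*x - y) ** 2)
# dJ/dw = 1/N * sum(2*x * (w*x - y))
# Note: np.dot already sums over the samples and returns a scalar, so the
# trailing .mean() has no effect and this returns N * dJ/dw (the summed gradient)
def gradient(x, y, y_pred):
  return np.dot(2*x, y_pred - y).mean()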

In [109]:
print(f"Prediction before training: f(5) = {forward(5):.3f}")

Prediction before training: f(5) = 0.000


In [110]:
# Training
learning_rate = 0.01
n_iters = 10

In [111]:
# Training loop
for epoch in range(n_iters):
  # prediction = forward_pass
  y_pred = forward(X)
  # Loss
  l = loss(Y, y_pred)
  # gradients
  dw = gradient(X, Y, y_pred)

  # Update weights
  w -= learning_rate * dw

  if epoch % 1 == 0:
    print(f"Epoch: {epoch + 1}, weight: {w:.3f}, loss: {l:.8f}")

print(f"Prediction after training: f(5) = {forward(5):.3f}")

Epoch: 1, weight: 1.200, loss: 30.00000000
Epoch: 2, weight: 1.680, loss: 4.79999924
Epoch: 3, weight: 1.872, loss: 0.76800019
Epoch: 4, weight: 1.949, loss: 0.12288000
Epoch: 5, weight: 1.980, loss: 0.01966083
Epoch: 6, weight: 1.992, loss: 0.00314574
Epoch: 7, weight: 1.997, loss: 0.00050331
Epoch: 8, weight: 1.999, loss: 0.00008053
Epoch: 9, weight: 1.999, loss: 0.00001288
Epoch: 10, weight: 2.000, loss: 0.00000206
Prediction after training: f(5) = 9.999


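The weight converges to `2.000`, so f(5) approaches `10`, as expected for f = 2 * x. Note that `np.dot` already sums over the samples and returns a scalar, so the trailing `.mean()` in `gradient` has no effect: the function returns the summed gradient, which is N = 4 times the true MSE gradient.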
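
A worked first step in plain Python, comparing the summed gradient (what `np.dot(2*x, y_pred - y)` produces) with the averaged one from the MSE formula. The names `grad_sum`, `grad_mean`, `w_sum` and `w_mean` are made up for this sketch:

```python
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]
w = 0.0
learning_rate = 0.01

# Per-sample terms 2 * x * (w * x - y)
terms = [2 * x * (w * x - y) for x, y in zip(X, Y)]

grad_sum = sum(terms)             # what np.dot(2*x, y_pred - y) returns
grad_mean = grad_sum / len(X)     # the actual MSE gradient dJ/dw

w_sum = w - learning_rate * grad_sum
w_mean = w - learning_rate * grad_mean

print(grad_sum, grad_mean)
print(round(w_sum, 3), round(w_mean, 3))
```

The summed gradient gives the `1.200` seen after epoch 1 above; the averaged gradient would give `0.300` instead.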

### Let's implement all the steps using PyTorch

In [112]:
X = torch.tensor([1,2,3,4], dtype=torch.float32)
Y = torch.tensor([2,4,6,8], dtype=torch.float32)

w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

for epoch in range(n_iters):
  y_pred = forward(X)

  l = loss(Y, y_pred)

  # Replacing gradient calculation with pytorch
  l.backward() # dl/dw

  # Update weights
  with torch.no_grad():
    w -= learning_rate * w.grad

  # Zero gradients
  w.grad.zero_()


  if epoch % 1 == 0:
    print(f"Epoch: {epoch + 1}, weight: {w:.3f}, loss: {l:.8f}")

print(f"Prediction after training: f(5) = {forward(5):.3f}")

Epoch: 1, weight: 0.300, loss: 30.00000000
Epoch: 2, weight: 0.555, loss: 21.67499924
Epoch: 3, weight: 0.772, loss: 15.66018772
Epoch: 4, weight: 0.956, loss: 11.31448650
Epoch: 5, weight: 1.113, loss: 8.17471695
Epoch: 6, weight: 1.246, loss: 5.90623236
Epoch: 7, weight: 1.359, loss: 4.26725292
Epoch: 8, weight: 1.455, loss: 3.08308983
Epoch: 9, weight: 1.537, loss: 2.22753215
Epoch: 10, weight: 1.606, loss: 1.60939169
Prediction after training: f(5) = 8.031


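Gradient computation has now moved to autograd. Next: the loss and the parameter updates. The weight converges more slowly here than in the manual run because `l.backward()` gives the true mean gradient, while the manual `gradient` returned the sum over the 4 samples, which made each step 4 times larger.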
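
A possible sketch of that next step, assuming `torch.nn.MSELoss` for the loss and `torch.optim.SGD` for the updates (this is one reasonable choice, not the only one; `loss_fn` and `optimizer` are names picked for this example). The prediction is still the manual `forward`.

```python
import torch
import torch.nn as nn

X = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
Y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)
w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

def forward(x):
    return w * x

loss_fn = nn.MSELoss()                      # replaces the manual loss()
optimizer = torch.optim.SGD([w], lr=0.01)   # replaces the manual weight update

for epoch in range(10):
    y_pred = forward(X)
    l = loss_fn(y_pred, Y)

    l.backward()            # dl/dw via autograd
    optimizer.step()        # w -= lr * w.grad
    optimizer.zero_grad()   # reset the gradient for the next epoch

    print(f"Epoch: {epoch + 1}, weight: {w.item():.3f}, loss: {l.item():.8f}")

print(f"Prediction after training: f(5) = {forward(5).item():.3f}")
```

With the same learning rate and 10 iterations this should follow the same trajectory as the autograd-only loop, since plain SGD performs the same update.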