![pic](https://i.pinimg.com/736x/a4/f4/e8/a4f4e866ab56273a43f8422ccd09c27c.jpg)

- Lecture: https://youtu.be/BECZ0UB5AR0?si=ftKzYUnpaBVO9SaI
- My PyTorch Repo: https://github.com/Rudra-G-23/deep-learning-using-pytorch

- In this notebook, we learn why autograd is so important
- First, we compute a simple derivative by hand
- In a deep neural network, the output is a composition of many functions
- So its derivative has to be computed with the chain rule
- Training needs the derivative of the loss with respect to every weight and bias

In [5]:
import math
import torch 

# Without Autograd 

## Ex-1

The equation is: 

$$y = x^2$$

The derivative of \(y\) with respect to \(x\) is:

$$\frac{dy}{dx} = 2x$$


- Without autograd, we have to derive this formula by hand and code it ourselves

In [1]:
def dy_dx(x):
    return 2*x

In [2]:
dy_dx(3)

6

## Ex-2

We have:

$$y = x^2$$

$$z = \sin(y)$$

The derivative of \(y\) with respect to \(x\) is:

$$\frac{dy}{dx} = 2x$$

By the chain rule, the derivative of \(z\) with respect to \(x\) is:

$$
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = \cos(y) \cdot 2x = 2x \cos(x^2)
$$


- After calculating the derivative manually
- We create a function for it

In [8]:
def dz_dx(x):
    return 2*x * math.cos(x**2)

In [9]:
# What is the value of dz/dx if x is 4
dz_dx(4)

-7.661275842587077
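
A hand-derived formula like this can be sanity-checked numerically. The sketch below compares the chain-rule result with a central finite difference. The helper name `central_difference` and the step size `h = 1e-5` are assumptions made for this illustration, not part of the original notebook.

```python
import math

def z_of_x(x):
    """z = sin(x^2), the composite function from Ex-2."""
    return math.sin(x ** 2)

def dz_dx(x):
    """Analytic derivative from the chain rule: 2x * cos(x^2)."""
    return 2 * x * math.cos(x ** 2)

def central_difference(f, x, h=1e-5):
    """Numerical estimate of f'(x): (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 4.0
analytic = dz_dx(x0)
numeric = central_difference(z_of_x, x0)

# The two values should agree to several decimal places
print(analytic, numeric)
```

If the two numbers disagree noticeably, the hand derivation is the first place to look for a mistake.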

# With Autograd

## Ex-1

In [62]:
# y = x^2

In [55]:
x = torch.tensor(3.0, requires_grad=True)
y = x**2

In [56]:
print(x)
print(y)

tensor(3., requires_grad=True)
tensor(9., grad_fn=<PowBackward0>)


In [57]:
y.backward() # compute dy/dx and store it in x.grad

In [58]:
x.grad # the gradient dy/dx = 2x at x = 3

tensor(6.)

## Ex-2
- y = x^2
- z = sin(y)

In [68]:
x = torch.tensor(4.0, requires_grad=True)
y = x ** 2
z = torch.sin(y)

In [69]:
print(x)
print(y)
print(z)

tensor(4., requires_grad=True)
tensor(16., grad_fn=<PowBackward0>)
tensor(-0.2879, grad_fn=<SinBackward0>)


In [70]:
z.backward() # autograd applies the chain rule through z and y

In [71]:
x.grad # the gradient dz/dx at x = 4

tensor(-7.6613)
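
To make the comparison with the manual result explicit, the following sketch recomputes the autograd gradient and the hand-derived formula side by side. The variable name `manual` is introduced here only for illustration.

```python
import math
import torch

# Autograd: build z = sin(x^2) and let PyTorch differentiate it
x = torch.tensor(4.0, requires_grad=True)
z = torch.sin(x ** 2)
z.backward()

# Manual: the chain-rule formula 2x * cos(x^2) evaluated at x = 4
manual = 2 * 4.0 * math.cos(4.0 ** 2)

# Both should match up to float32 precision
print(x.grad.item(), manual)
```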

# Apply on Perceptron (w/o Autograd)

**Training Process**
1. Forward pass
2. Calculate loss
3. Backward pass
4. Update the weights and bias

1. Linear transformation:

$$y = \mathbf{w} \cdot \mathbf{x} + b$$

2. Activation (sigmoid):

$$\sigma(y) = \frac{1}{1 + e^{-y}}$$

3. Binary cross-entropy loss:

$$\mathcal{L} = - \Big[ y_{\text{target}} \log(y_{\text{pred}}) + (1 - y_{\text{target}}) \log(1 - y_{\text{pred}}) \Big]$$
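
The three formulas above can be chained to obtain the gradients needed for training. The derivation below is a sketch that uses the same symbols, with \(y\) as the linear output (called `z` in the code cells) and \(y_{\text{pred}} = \sigma(y)\).

$$\frac{\partial \mathcal{L}}{\partial y_{\text{pred}}} = \frac{y_{\text{pred}} - y_{\text{target}}}{y_{\text{pred}} (1 - y_{\text{pred}})}$$

$$\frac{\partial y_{\text{pred}}}{\partial y} = y_{\text{pred}} (1 - y_{\text{pred}})$$

$$\frac{\partial y}{\partial \mathbf{w}} = \mathbf{x}, \qquad \frac{\partial y}{\partial b} = 1$$

Multiplying the factors cancels the \(y_{\text{pred}} (1 - y_{\text{pred}})\) term:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = (y_{\text{pred}} - y_{\text{target}}) \, \mathbf{x}, \qquad \frac{\partial \mathcal{L}}{\partial b} = y_{\text{pred}} - y_{\text{target}}$$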


## 1. Linear Transformation 

In [10]:
x = torch.tensor(6.7) # input feature
y = torch.tensor(0.0) # label
w = torch.tensor(1.0) # weight
b  = torch.tensor(0.0) # bias

## 2. Binary Cross-Entropy Loss 

In [11]:
def BCE_loss(prediction, target):
    epsilon = 1e-8  # small positive constant to prevent log(0)
    prediction = torch.clamp(prediction, epsilon, 1-epsilon)
    return -(target * torch.log(prediction) + (1-target) * torch.log(1-prediction))

In [12]:
# forward pass
z = (w * x) + b # weighted sum
y_pred = torch.sigmoid(z) # prediction probability
loss = BCE_loss(y_pred, y) # compute the loss

In [15]:
z

tensor(6.7000)

In [13]:
y_pred

tensor(0.9988)

In [14]:
loss

tensor(6.7012)

## 3. Derivatives 

### 1. dL/d(y_pred)
> Derivative of the loss with respect to the prediction y_pred

In [18]:
dloss_dy_pred = (y_pred - y) / (y_pred * (1 - y_pred))

###  2. dy_pred/dz

> Derivative of the prediction (y_pred) with respect to z (sigmoid derivative)


In [19]:
dy_pred_dz = y_pred * (1 - y_pred)

### 3. dz/dw and dz/db
> Derivative of z with respect to w and b

In [22]:
dz_dw = x
dz_db = 1

### 4. Combine with the chain rule: dL/dw and dL/db

In [23]:
dL_dw = dloss_dy_pred * dy_pred_dz * dz_dw
dL_db = dloss_dy_pred * dy_pred_dz * dz_db

In [25]:
print(f"Manual gradient of loss wrt weight: {dL_dw:.4f}")
print(f"Manual gradient of loss wrt bias: {dL_db:.4f}")

Manual gradient of loss wrt weight: 6.6918
Manual gradient of loss wrt bias: 0.9988


# Apply on Perceptron (w/ Autograd)

- Now we let autograd compute the gradients of the perceptron
- There is no need to derive the formulas by hand
- Or to wrap each derivative in code
- We only run the forward pass
- Then we call `backward()` on the loss for the backward pass

In [32]:
# define the x and y
x = torch.tensor(6.7)
y = torch.tensor(0.0)

# Track gradients for the weight and bias (same values as the manual example)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

In [33]:
z = (w*x) + b
y_pred = torch.sigmoid(z)
loss = BCE_loss(y_pred, y)

In [34]:
# Calling backward() on the loss applies the chain rule
# through the recorded operations and stores dL/dw and dL/db
# in w.grad and b.grad. It does not change w or b.

loss.backward()

In [35]:
# .grad holds the gradients, not updated parameter values
print(w.grad)
print(b.grad)

tensor(6.6918)
tensor(0.9988)
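
As a rough sketch of step 4 of the training process (updating the weights and bias), the gradients from `backward()` can drive one gradient-descent step. The learning rate `lr = 0.1` is an assumed value chosen only for illustration, and the loss is written inline instead of calling `BCE_loss` so that the block runs on its own.

```python
import torch

x = torch.tensor(6.7)   # input feature
y = torch.tensor(0.0)   # label
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
lr = 0.1                # assumed learning rate, for illustration only

# Forward pass and loss (binary cross-entropy written inline)
z = w * x + b
y_pred = torch.sigmoid(z)
loss = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred))

# Backward pass: fills w.grad and b.grad
loss.backward()

# Parameter update: done under no_grad so the update itself is not tracked
with torch.no_grad():
    w -= lr * w.grad
    b -= lr * b.grad

# Clear the gradients so the next backward() starts from zero
w.grad.zero_()
b.grad.zero_()

print(w.item(), b.item())
```

Because the label is 0 and the prediction is close to 1, both gradients are positive, so the step lowers `w` and `b`.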


# Disable gradient tracking

- The methods below turn off gradient tracking when we no longer need it

In [36]:
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x

tensor([1., 2., 3.], requires_grad=True)

In [42]:
y = (x**2).mean()
y

tensor(4.6667, grad_fn=<MeanBackward0>)

In [43]:
y.backward()

In [44]:
x.grad

tensor([0.6667, 1.3333, 2.0000])

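- If the cell below is run again, for example in a loop
- Then `x.grad` becomes larger than the value shown above
- Because every call to `backward()` adds the new gradient to the one already stored in `x.grad`
- Stored gradients can be cleared with `x.grad.zero_()`
- To turn OFF gradient tracking
- We use
    -  `requires_grad_(False)`
    -  `detach()`
    -  `torch.no_grad()`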

In [None]:
y = (x**2).mean()
y.backward()
x.grad
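
The accumulation described above can be seen by calling `backward()` several times on the same tensor. This is a minimal sketch; the list `grads` is introduced here only to record the value of `x.grad` after each call.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

grads = []
for _ in range(3):
    y = (x ** 2).mean()           # rebuild the graph on every iteration
    y.backward()                  # adds dy/dx = 2x/3 to x.grad
    grads.append(x.grad.clone())  # snapshot, since x.grad is updated in place

for step, g in enumerate(grads, start=1):
    print(step, g)
```

Each snapshot is a multiple of the first one, which is why training loops normally reset the gradients before every backward pass.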

In [46]:
x = torch.tensor(2.0, requires_grad=True)
print(x)

tensor(2., requires_grad=True)


In [47]:
y = (x**2).mean()
y.backward()
x.grad

tensor(4.)

## Option 1

In [48]:
# once we are done computing gradients, turn off tracking in place
x.requires_grad_(False)

tensor(2.)

In [49]:
# x is no longer tracked, so y has no grad_fn
# and calling y.backward() raises the error shown below
y = (x**2).mean()

# y.backward() # raises RuntimeError

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

## Option 2

In [51]:
x = torch.tensor(2.0, requires_grad=True)
y = (x**2).mean()
y.backward()
x.grad

tensor(4.)

- When we are done, we can use `detach()` instead
- It returns a new tensor with the same value that autograd does not track
- The original `x` is still tracked

In [52]:
z = x.detach()
z

tensor(2.)

In [53]:
y = (x**2).mean()
y.backward()
x.grad

tensor(8.)

In [54]:
z

tensor(2.)

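- Notice that `z` is unchanged and has no gradient
- While `x.grad` went from 4 to 8 because `x` is still tracked and its gradient accumulated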
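
A small sketch of what `detach()` changes: the detached tensor reports `requires_grad=False`, shares its memory with the original, and is treated as a constant during the backward pass. The expression `x**2 + z**2` is an assumed example chosen to show that only the tracked branch contributes to the gradient.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
z = x.detach()  # same value, not tracked by autograd

print(z.requires_grad)                # False
print(z.data_ptr() == x.data_ptr())   # True: both share the same memory

# z is treated as a constant, so only the x**2 term contributes
y = x ** 2 + z ** 2
y.backward()
print(x.grad)  # tensor(4.)
```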

## Option 3

In [59]:
# clear the accumulated gradient in place with zero_()
x = torch.tensor(2.0, requires_grad=True)
y = (x**2).mean()
y.backward()
x.grad

tensor(4.)

In [60]:
x.grad.zero_()

tensor(0.)
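
The third method listed earlier, `torch.no_grad()`, is not demonstrated above. The following is a minimal sketch of how it behaves as a context manager; the names `y` and `y_tracked` are chosen here for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Inside the block, operations are not recorded for autograd
with torch.no_grad():
    y = x ** 2

print(y.requires_grad)  # False
print(y.grad_fn)        # None

# Outside the block, tracking works as usual
y_tracked = x ** 2
print(y_tracked.requires_grad)  # True
```

Unlike `requires_grad_(False)`, this leaves `x` itself unchanged, so tracking resumes as soon as the block ends.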