<h1 align="center">PyTorch Autograd</h1>

- Autograd is a core component of PyTorch that provides automatic differentiation for tensor operations.
- It enables gradient computation, which is essential for training models with optimization algorithms such as gradient descent.
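Here is a minimal sketch, not taken from the original notebook, that connects autograd to gradient descent. It minimizes a hypothetical toy function f(w) = (w - 3)^2. The learning rate and the number of steps are arbitrary choices.

```python
import torch

# Hypothetical toy problem: find w that minimizes f(w) = (w - 3) ** 2
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1  # arbitrary choice for this sketch

for step in range(100):
    loss = (w - 3) ** 2       # forward pass records the graph
    loss.backward()           # autograd fills w.grad with d(loss)/dw = 2 * (w - 3)
    with torch.no_grad():     # the parameter update itself must not be recorded
        w -= learning_rate * w.grad
    w.grad.zero_()            # gradients accumulate, so reset them every step

print(w)  # should be very close to 3.0
```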

In [1]:
import torch

In [2]:
x = torch.tensor(10, requires_grad=True, dtype=torch.float16)
y = x ** 2

print(x)

tensor(10., dtype=torch.float16, requires_grad=True)


In [3]:
print(y)

tensor(100., dtype=torch.float16, grad_fn=<PowBackward0>)


In [4]:
y.backward()

In [5]:
x.grad

tensor(20., dtype=torch.float16)

- Forward propagation: x --(x ** 2)--> y
    - 10 --(10 ** 2)--> 100
- Backward propagation: dy/dx = d(x^2)/dx = 2 * x
    - 10 --(2 * 10)--> 20
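Autograd applies the chain rule in the same way through longer chains of operations. The small sketch below uses an arbitrarily chosen composite function z = sin(x^2). It compares autograd's result with the hand-derived derivative cos(x^2) * 2x.

```python
import math
import torch

# Composite function z = sin(u) with u = x ** 2 (chosen only for illustration).
# Chain rule by hand: dz/dx = dz/du * du/dx = cos(x ** 2) * 2 * x
x_value = 1.5
x = torch.tensor(x_value, requires_grad=True)
u = x ** 2
z = torch.sin(u)
z.backward()  # autograd walks the graph backwards: SinBackward0 -> PowBackward0

manual = math.cos(x_value ** 2) * 2 * x_value
print(x.grad, manual)  # the two values should agree up to float32 precision
```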

In [110]:
# Simple NN with no hidden layer
# Sigmoid activation function and Binary Cross Entropy Loss Fn

torch.manual_seed(42)

w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
w, b

(tensor(1., requires_grad=True), tensor(0., requires_grad=True))

In [111]:
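def binary_cross_entropy(prediction, target):
    # keep predictions strictly inside (0, 1) so torch.log never receives 0;
    # epsilon = 1e-7 because 1 - 1e-8 rounds to exactly 1.0 in float32
    epsilon = 1e-7
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))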
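As a sanity check, you can compare the hand-written loss with PyTorch's built-in `torch.nn.functional.binary_cross_entropy`. The sketch repeats the helper so that it runs on its own. The prediction and target values are made up for illustration. `reduction="none"` keeps the per-element losses so the two results can be compared directly.

```python
import torch
import torch.nn.functional as F

def binary_cross_entropy(prediction, target):
    # same helper as above: clamp to avoid log(0), then apply the BCE formula
    epsilon = 1e-7
    prediction = torch.clamp(prediction, epsilon, 1 - epsilon)
    return -(target * torch.log(prediction) + (1 - target) * torch.log(1 - prediction))

# made-up probabilities and labels for illustration
prediction = torch.tensor([0.9, 0.2, 0.6])
target = torch.tensor([1.0, 0.0, 0.0])

ours = binary_cross_entropy(prediction, target)                          # per-element losses
builtin = F.binary_cross_entropy(prediction, target, reduction="none")   # built-in equivalent
print(ours, builtin)
```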

In [112]:
y = torch.tensor(0.0)  # true label (class 0)

In [120]:
X = torch.tensor(6.7)
print("X: ", X)

# FORWARD PROPAGATION
# multiply the input by the weight and add the bias (a dot product for a single feature)
z = w * X + b
print("Z: ", z)

# apply the sigmoid activation function, since this is binary classification
y_pred = torch.sigmoid(z)
print("Y_Pred: ", y_pred)

# use BCE loss, since this is binary classification
loss = binary_cross_entropy(y_pred, y)
print("Loss: ", loss)

# BACKWARD PROPAGATION
# compute the gradients of the loss with respect to w and b
loss.backward()

X:  tensor(6.7000)
Z:  tensor(6.7000, grad_fn=<AddBackward0>)
Y_Pred:  tensor(0.9988, grad_fn=<SigmoidBackward0>)
Loss:  tensor(6.7012, grad_fn=<NegBackward0>)


Computation graph: (w * X) + b --> z --> sigmoid(z) --> y_pred --> BCE(y_pred, y) --> Loss
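For this single-neuron model, you can also derive the gradients by hand. Let p = sigmoid(z). The sigmoid and BCE derivatives combine to dL/dz = p - y, so dL/dw = (p - y) * X and dL/db = p - y. With X = 6.7, y = 0 and p ≈ 0.9988, these are roughly 6.6918 and 0.9988, which match the `w.grad` and `b.grad` values shown below. The minimal sketch below checks the derivation against autograd. It writes BCE out without clamping, on the assumption that this input is far from saturating the sigmoid.

```python
import torch

X = torch.tensor(6.7)   # input
y = torch.tensor(0.0)   # target label
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)

z = w * X + b
y_pred = torch.sigmoid(z)
# BCE written out directly (no clamping needed for this input)
loss = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred))
loss.backward()

# Hand-derived gradients: dL/dz = y_pred - y, so
# dL/dw = (y_pred - y) * X and dL/db = y_pred - y
with torch.no_grad():
    manual_dw = (y_pred - y) * X
    manual_db = y_pred - y

print(w.grad, manual_dw)
print(b.grad, manual_db)
```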

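If you rerun the cell above and then run the next cell, the gradients of w and b will no longer be 6.6918 and 0.9988. Every call to `backward()` adds the new gradients to the existing `.grad` values, so they grow with each run. To avoid this, reset the gradients to zero in place with `w.grad.zero_()` and `b.grad.zero_()` before the next backward pass. Note that `with torch.no_grad()` does not reset gradients; it only stops operations from being recorded (see below).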
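Gradient accumulation is easier to see in isolation. In the small sketch below, the loss 3 * w is a hypothetical example whose derivative is always 3.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# Two backward passes, each over a freshly built graph.
for _ in range(2):
    loss = 3 * w            # d(loss)/dw = 3
    loss.backward()         # adds 3 to whatever is already stored in w.grad
after_two = w.grad.clone()  # snapshot of the accumulated value: 3 + 3
print(after_two)            # tensor(6.)

w.grad.zero_()              # reset in place before the next pass
loss = 3 * w
loss.backward()
print(w.grad)               # tensor(3.)
```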

In [121]:
w.grad, b.grad

(tensor(6.6918), tensor(0.9988))

In [122]:
w.grad.zero_(), b.grad.zero_()

(tensor(0.), tensor(0.))

After the model is trained, prediction does not need backpropagation, so you can turn off gradient tracking. There are three ways to do this.

In [123]:
x = torch.tensor(1.0, requires_grad=True)
y = x ** 2
y.backward()

In [None]:
# option 1: set requires_grad to False (in place)
x.requires_grad_(False)

In [125]:
# option 2: detach()
x_new = x.detach()
print(x_new)

y = x_new ** 2
y.backward() # Error: x_new is not tracked by autograd

tensor(1.)


RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
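`detach()` does not copy data. The returned tensor shares memory with the original and is simply excluded from autograd. The small sketch below illustrates this. If you need an independent copy, the usual idiom is `x.detach().clone()`.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
x_new = x.detach()

print(x_new.requires_grad)                # False: not tracked by autograd
print(x_new.data_ptr() == x.data_ptr())   # True: same underlying memory

# Because memory is shared, an in-place change to x_new is visible in x.
x_new[0] = 10.0
print(x)
```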

In [126]:
# option 3: torch.no_grad()

x = torch.tensor(6.7, requires_grad=True)

with torch.no_grad():
    y = x ** 2

y.backward() # Error

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
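At prediction time, the same idea looks like this. The minimal sketch reuses the parameter names `w` and `b` from earlier, but with placeholder values standing in for trained ones. `X_new` is a hypothetical batch of new inputs, and inference runs inside `torch.no_grad()`.

```python
import torch

# placeholder values standing in for trained parameters
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
X_new = torch.tensor([6.7, -2.0, 0.5])   # hypothetical inputs to predict on

# Inference: no graph is recorded, so the outputs carry no grad_fn.
with torch.no_grad():
    probs = torch.sigmoid(w * X_new + b)

print(probs.requires_grad)   # False
print(probs.grad_fn)         # None

# Outside the block, tracking resumes as usual.
z = w * X_new + b
print(z.requires_grad)       # True
```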