
# Automatic Differentiation with `torch.autograd`

Official tutorial: https://pytorch.org/tutorials/beginner/basics/autogradqs_tutorial.html#

API reference for `torch.autograd.Function`: https://pytorch.org/docs/stable/autograd.html#function

```python
import torch

x = torch.ones(5)                            # input
w = torch.randn(5, 3, requires_grad=True)    # weights
b = torch.randn(3, requires_grad=True)       # bias
y = torch.zeros(3)                           # expected output

z = torch.matmul(x, w) + b                   # logits

loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss
```

```
tensor(0.1110, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)
```
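As a sanity check, the loss above can be recomputed from the definition of binary cross-entropy on logits. This is a minimal sketch that assumes the function's default `reduction='mean'`; the fixed seed and the names `manual` and `builtin` are illustrative additions, not part of the tutorial. It uses the identity log(1 − σ(z)) = log σ(−z) so that both logarithms stay numerically stable.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # illustrative seed so the check is reproducible
x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
y = torch.zeros(3)
z = torch.matmul(x, w) + b

# BCE on logits: -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))], averaged over elements
manual = -(y * F.logsigmoid(z) + (1 - y) * F.logsigmoid(-z)).mean()
builtin = F.binary_cross_entropy_with_logits(z, y)

print(manual.item(), builtin.item())
```

Because `manual` is built only from differentiable operations on `z`, it also carries a `grad_fn` and could be backpropagated just like the built-in loss.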

## Tensors, Functions, Computational Graph

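In this network, `w` and `b` are the parameters to optimize, so they are created with `requires_grad=True`. You can set the value of `requires_grad` when creating a tensor, or later with the `x.requires_grad_(True)` method.

A function applied to tensors to build the computational graph is an object of class `Function`: it knows how to compute the result in the forward direction and how to compute its derivative during the backward pass. A reference to the backward function is stored in the `grad_fn` property of the resulting tensor.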

```python
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")
```

```
Gradient function for z = <AddBackward0 object at 0x7f785ddc6dd0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f785e497ee0>
```
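The `requires_grad_(True)` route mentioned above can be sketched as follows. The names `t` and `s` are illustrative: a tensor created without tracking becomes a tracked leaf in place (a leaf has no `grad_fn` of its own), and operations applied to it afterwards are recorded with a `grad_fn`.

```python
import torch

t = torch.ones(3)                 # created without gradient tracking
print(t.requires_grad)            # False

t.requires_grad_(True)            # enable tracking in place on the existing tensor
s = (t * 2).sum()                 # this operation is now recorded in the graph

print(t.is_leaf, t.grad_fn)       # user-created leaf tensors have no grad_fn
print(s.grad_fn)                  # s was produced by an operation, so it has one

s.backward()
print(t.grad)                     # d(sum(2 * t))/dt is 2 for every element
```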


## Computing Gradients

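To optimize the parameters, we need the derivatives of the loss with respect to them, i.e. ∂loss/∂w and ∂loss/∂b for fixed values of `x` and `y`. Calling `loss.backward()` computes these derivatives, which can then be read from `w.grad` and `b.grad`:

```python
loss.backward()
print(w.grad)
print(b.grad)
```

```
tensor([[0.0035, 0.0035, 0.0893],
        [0.0035, 0.0035, 0.0893],
        [0.0035, 0.0035, 0.0893],
        [0.0035, 0.0035, 0.0893],
        [0.0035, 0.0035, 0.0893]])
tensor([0.0035, 0.0035, 0.0893])
```

Only leaf tensors with `requires_grad=True` get their `.grad` attribute populated. By default, `backward()` can be called only once on a given graph; pass `retain_graph=True` if you need to call it again on the same graph.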
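The printed gradients can be checked by hand. With the default mean reduction over the 3 outputs, ∂loss/∂z = (σ(z) − y) / 3; because z = x·w + b, the chain rule gives ∂loss/∂b = ∂loss/∂z and ∂loss/∂w[i, j] = x[i] · ∂loss/∂z[j], an outer product. Since every entry of `x` is 1, each row of `w.grad` equals `b.grad`, which matches the output above. A minimal sketch of the check, with an illustrative fixed seed:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # illustrative seed, not from the original notebook
x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
y = torch.zeros(3)

z = torch.matmul(x, w) + b
loss = F.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Hand-derived gradients, computed without recording a graph
with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()   # d(loss)/dz for the mean reduction
    expected_b = dz                           # dz/db is the identity
    expected_w = torch.outer(x, dz)           # d(loss)/dw[i, j] = x[i] * dz[j]

print(torch.allclose(w.grad, expected_w), torch.allclose(b.grad, expected_b))
```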


## Disabling Gradient Tracking

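By default, every tensor with `requires_grad=True` tracks its computational history and supports gradient computation. When we only need the forward computation, for example when applying an already trained model to new input data, we can stop tracking by wrapping the code in a `torch.no_grad()` block:

```python
z = torch.matmul(x, w) + b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)
```

```
True
False
```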
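A common place where `torch.no_grad()` is required is the parameter update itself: `w` and `b` must be changed in place without that change being recorded in the graph. The loop below is a hand-written gradient-descent sketch; the learning rate, number of steps and seed are arbitrary illustrative choices, and in practice an optimizer from `torch.optim` performs this step.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # illustrative seed
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
lr = 0.1              # illustrative learning rate
losses = []

for step in range(20):
    loss = F.binary_cross_entropy_with_logits(torch.matmul(x, w) + b, y)
    losses.append(loss.item())
    loss.backward()              # fills w.grad and b.grad
    with torch.no_grad():        # without this, in-place ops on these leaves raise a RuntimeError
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()               # gradients accumulate, so reset them every step
    b.grad.zero_()

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

After the loop, `w` and `b` are still leaf tensors that require gradients, so the next forward pass builds a fresh graph from them.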


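Another way to achieve the same result is the `detach()` method:

```python
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)
```

```
False
```

Typical reasons to disable gradient tracking are to mark some parameters as frozen (for example when fine-tuning a pretrained network) and to speed up forward-only computations, since operations on tensors that do not track gradients are more efficient.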
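`detach()` returns a tensor that shares storage with the original but is cut out of the graph, so no gradient flows through it. A small sketch, with values chosen for easy arithmetic, contrasting a product in which both factors are tracked with one in which the first factor is detached:

```python
import torch

a = torch.tensor(3.0, requires_grad=True)

full = a * a                  # d(full)/da = 2 * a = 6
full.backward()
grad_full = a.grad.clone()

a.grad.zero_()
partial = a.detach() * a      # detached factor is treated as a constant: d/da = 3
partial.backward()
grad_partial = a.grad.clone()

d = a.detach()
print(grad_full.item(), grad_partial.item())            # 6.0 3.0
print(d.data_ptr() == a.data_ptr(), d.requires_grad)    # True False
```

Because the storage is shared, modifying a detached tensor in place also changes the original, which is worth keeping in mind when using it for anything other than reading values.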


## More on Computational Graphs

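Conceptually, autograd keeps a record of the data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of `Function` objects. In this DAG, the leaves are the input tensors and the roots are the output tensors; by tracing the graph from the roots to the leaves, autograd computes the gradients automatically using the chain rule.

In a forward pass, autograd does two things simultaneously:

1. runs the requested operation to compute a resulting tensor;

2. maintains the operation’s gradient function in the DAG.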
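The DAG recorded during the forward pass can be inspected directly. Each `grad_fn` exposes `next_functions`, a tuple of `(node, input_index)` pairs pointing one step closer to the leaves (`None` for inputs that do not require gradients), and leaves that do require gradients are reached through `AccumulateGrad` nodes, which expose the leaf tensor as `.variable`. The sketch below rebuilds the loss from the first cell and walks that structure; it relies on these introspection attributes, which the tutorial itself does not cover, and the helper `walk` is hypothetical. It does not deduplicate nodes shared between branches.

```python
import torch
import torch.nn.functional as F

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
loss = F.binary_cross_entropy_with_logits(torch.matmul(x, w) + b, y)

def walk(fn, depth=0, leaves=None):
    """Print the backward graph below `fn` and collect the leaf tensors it reaches."""
    if leaves is None:
        leaves = []
    if fn is None:                    # input that does not require grad (x or y here)
        return leaves
    print("  " * depth + type(fn).__name__)
    if hasattr(fn, "variable"):       # AccumulateGrad node wrapping a leaf tensor
        leaves.append(fn.variable)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, leaves)
    return leaves

leaves = walk(loss.grad_fn)
```

The exact intermediate node names depend on how `torch.matmul` dispatches for a 1-D by 2-D product, but the walk should always end at the `AccumulateGrad` nodes for `w` and `b`.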

The backward pass kicks off when `.backward()` is called on the DAG root. `autograd` then:

1. computes the gradients from each `.grad_fn`;

2. accumulates them in the respective tensor’s `.grad` attribute;

3. using the chain rule, propagates all the way to the leaf tensors.
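Step 2 says gradients are *accumulated* in `.grad`, not overwritten. A related detail is that, by default, the tensors saved in the graph are freed after one backward pass, so a second pass over the same graph needs `retain_graph=True`. A minimal sketch on a single scalar:

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
out = a ** 3                          # d(out)/da = 3 * a**2 = 12

out.backward(retain_graph=True)       # keep the saved tensors for another pass
first = a.grad.item()

out.backward()                        # 12 is added to the existing .grad; buffers are then freed
second = a.grad.item()

try:
    out.backward()                    # the graph's saved tensors are gone
    raised = False
except RuntimeError:
    raised = True

print(first, second, raised)          # 12.0 24.0 True
```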

## Tensor Gradients and Jacobian Products

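When the output is a scalar, as with `loss` above, `backward()` needs no argument. When the output is an arbitrary tensor, the gradient of `y = f(x)` with respect to `x` is a Jacobian matrix `J`. Rather than computing `J` itself, PyTorch computes the vector-Jacobian product vᵀ · J for a given `v`, which is passed as the first argument of `backward()` and must have the same shape as the tensor `backward()` is called on.

Here `out` is `(inp + 1)²` transposed, so with `v` set to ones the result is the elementwise derivative `2 · (inp + 1)`: 4 on the diagonal of `inp` and 2 elsewhere. Calling `backward()` a second time with the same argument accumulates into `inp.grad`, which is why the values double; `inp.grad.zero_()` resets them. In real training code, an optimizer typically takes care of zeroing gradients.

```python
inp = torch.eye(4, 5, requires_grad=True)
out = (inp + 1).pow(2).t()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"First call\n{inp.grad}")
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nSecond call\n{inp.grad}")
inp.grad.zero_()
out.backward(torch.ones_like(out), retain_graph=True)
print(f"\nCall after zeroing gradients\n{inp.grad}")
```

```
First call
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])

Second call
tensor([[8., 4., 4., 4., 4.],
        [4., 8., 4., 4., 4.],
        [4., 4., 8., 4., 4.],
        [4., 4., 4., 8., 4.]])

Call after zeroing gradients
tensor([[4., 2., 2., 2., 2.],
        [2., 4., 2., 2., 2.],
        [2., 2., 4., 2., 2.],
        [2., 2., 2., 4., 2.]])
```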
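For a non-scalar output, `out.backward(v)` yields the vector-Jacobian product vᵀ · J without ever building `J`. To make that concrete, the sketch below builds the full Jacobian of the same function with `torch.autograd.functional.jacobian` and checks that contracting it with a random `v` gives the same result as `backward(v)`. The helper name `f` and the random `v` are illustrative choices.

```python
import torch
from torch.autograd.functional import jacobian

def f(t):
    return (t + 1).pow(2).t()

inp = torch.eye(4, 5, requires_grad=True)
out = f(inp)                                    # shape (5, 4)
v = torch.randn_like(out)                       # arbitrary "upstream" gradient

out.backward(v)                                 # inp.grad = v^T . J, without building J
vjp = inp.grad.clone()

J = jacobian(f, inp)                            # full Jacobian, shape (5, 4, 4, 5)
vjp_from_J = torch.einsum("ij,ijkl->kl", v, J)  # contract over the output dimensions

print(J.shape, torch.allclose(vjp, vjp_from_J))
```

The full Jacobian has one entry per (output element, input element) pair, 400 values here, while the vector-Jacobian product only ever needs a tensor the size of the input; that difference is why `backward()` works with products rather than explicit Jacobians.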
