# AUTOMATIC DIFFERENTIATION WITH TORCH.AUTOGRAD 

- Backpropagation is the most widely used algorithm for training neural networks.
- It adjusts the model weights (parameters) according to the gradient of the loss function with respect to each parameter.
- PyTorch provides a built-in differentiation engine called `torch.autograd`.
- This engine computes gradients automatically for any computational graph built from tensor operations.
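As a minimal illustration (our own example, separate from the network below), autograd can differentiate a scalar function of one variable. Here f(a) = a³ is evaluated at a = 2, so the expected derivative is 3a² = 12.

```python
import torch

# A leaf tensor that autograd should track
a = torch.tensor(2.0, requires_grad=True)

f = a ** 3      # forward pass: builds the computational graph
f.backward()    # backward pass: computes df/da and stores it in a.grad

print(a.grad)   # tensor(12.)
```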

**Consider the simplest one-layer neural network, with input `x`, parameters `w` and `b`, and a binary cross-entropy loss. It can be defined in PyTorch as follows:**

```python
import torch

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights, tracked by autograd
b = torch.randn(3, requires_grad=True)  # bias, tracked by autograd
z = torch.matmul(x, w) + b  # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
```
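In equation form, the cell above computes the following, with x ∈ ℝ⁵, w ∈ ℝ⁵ˣ³, b ∈ ℝ³ and σ the logistic sigmoid. The loss is averaged over the three outputs because the `reduction` argument of `binary_cross_entropy_with_logits` defaults to `"mean"`.

$$
z = x\,w + b,
\qquad
\text{loss} = \frac{1}{3} \sum_{j=1}^{3} \Big[ -y_j \log \sigma(z_j) - (1 - y_j) \log\big(1 - \sigma(z_j)\big) \Big],
\qquad
\sigma(t) = \frac{1}{1 + e^{-t}}
$$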

## Tensors, Functions and Computational graph

- To construct the computational graph in PyTorch, we apply functions to tensors.
- Each of these functions is an object of the class `Function`.
- A `Function` object knows how to compute its output in the forward direction and how to compute its derivative during backward propagation.
- When the forward computation runs, a reference to the backward propagation function is stored in the **grad_fn** property of the resulting tensor.
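To make the idea of a `Function` concrete, here is a sketch of a custom operation built by subclassing `torch.autograd.Function`; the name `Square` is our own illustrative choice, not a PyTorch built-in. Its static `forward` computes the result and saves what the backward pass will need, and its static `backward` returns the incoming gradient multiplied by the local derivative 2·input.

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example op: elementwise x**2 with a hand-written derivative."""

    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)  # keep the input for the backward pass
        return inp * inp

    @staticmethod
    def backward(ctx, grad_output):
        (inp,) = ctx.saved_tensors
        return grad_output * 2 * inp  # chain rule: upstream grad * d(x^2)/dx

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
sq = Square.apply(a)           # custom Functions are invoked via .apply
print(sq.grad_fn is not None)  # True: the output remembers how to go backward
sq.sum().backward()
print(a.grad)                  # tensor([2., 4., 6.])
```

Built-in operations such as `torch.matmul` and `+` follow the same pattern, which is why `z` and `loss` carry a `grad_fn`.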

```python
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")
```

```
Gradient function for z = <AddBackward0 object at 0x000001CB0D863C40>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x000001CB0D863D00>
```
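Each `grad_fn` also links to the nodes that produced its inputs through its `next_functions` attribute, so the whole graph can be walked backward starting from `loss`. The sketch below rebuilds the same network and prints every node; leaf tensors such as `w` and `b` appear as `AccumulateGrad` nodes, which is where their `.grad` gets filled in. The helper `walk` is hypothetical, and the names of intermediate nodes (for example those produced by `torch.matmul`) may differ between PyTorch versions.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

def walk(fn, depth=0):
    """Hypothetical helper: print a grad_fn node, then recurse into its inputs."""
    if fn is None:  # inputs that do not require grad (like x and y) have no node
        return
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1)

walk(loss.grad_fn)
print(w.grad_fn, b.grad_fn)  # None None: leaf tensors are not produced by a Function
```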


## Computing Gradients

- To optimize the weights of the network, we need the derivatives of the loss function with respect to the parameters `w` and `b`.
- Specifically, we need ∂loss/∂w and ∂loss/∂b for fixed values of `x` and `y`.
- To compute these derivatives, we call `loss.backward()`.
- PyTorch then stores the results in `w.grad` and `b.grad`. Only leaf tensors of the graph that were created with `requires_grad=True`, such as `w` and `b`, have their `.grad` attribute populated this way.
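For this particular model the derivatives can also be worked out by hand. The derivative of binary cross-entropy with logits with respect to a logit z_j is σ(z_j) − y_j, where σ is the logistic sigmoid; averaging over the three outputs and applying the chain rule through z = x w + b gives:

$$
\frac{\partial\,\text{loss}}{\partial b_j} = \frac{\sigma(z_j) - y_j}{3},
\qquad
\frac{\partial\,\text{loss}}{\partial w_{ij}} = x_i \,\frac{\sigma(z_j) - y_j}{3}
$$

Because every entry of `x` is 1, each row of ∂loss/∂w equals ∂loss/∂b, which is the pattern visible in the printed gradients of the next cell.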

```python
loss.backward()
print(w.grad)
print(b.grad)
```

```
tensor([[0.0814, 0.0974, 0.0037],
        [0.0814, 0.0974, 0.0037],
        [0.0814, 0.0974, 0.0037],
        [0.0814, 0.0974, 0.0037],
        [0.0814, 0.0974, 0.0037]])
tensor([0.0814, 0.0974, 0.0037])
```
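The gradients autograd produces can be checked against the hand-derived formulas ∂loss/∂b_j = (σ(z_j) − y_j)/3 and ∂loss/∂w_ij = x_i (σ(z_j) − y_j)/3. The sketch below rebuilds the network with a fixed random seed (our addition, for reproducibility), computes both formulas directly, and compares them with `b.grad` and `w.grad`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so the run is reproducible
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
loss = F.binary_cross_entropy_with_logits(z, y)  # reduction="mean" by default
loss.backward()

# Hand-derived gradients; plain arithmetic, so no graph is needed
with torch.no_grad():
    dz = (torch.sigmoid(z) - y) / z.numel()  # d(loss)/dz_j = (sigmoid(z_j) - y_j) / 3
    expected_b_grad = dz                     # d(loss)/db_j = d(loss)/dz_j
    expected_w_grad = torch.outer(x, dz)     # d(loss)/dw_ij = x_i * d(loss)/dz_j

print(torch.allclose(b.grad, expected_b_grad))  # True
print(torch.allclose(w.grad, expected_w_grad))  # True
```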


## Disabling Gradient Tracking


- By default, tensors with `requires_grad=True` track their computational history and support gradient computation.
- Sometimes this is unnecessary, for example when we have already trained the model and only want to apply it to input data, i.e. perform forward computations only. In that case we can stop tracking by wrapping the computation in a `torch.no_grad()` block.
- Inside the block, PyTorch does not record operations in the computational graph, which saves memory and computation time.
- This makes `torch.no_grad()` well suited to inference and evaluation, where the model parameters are not updated.

```python
z = torch.matmul(x, w) + b  # tracked: z is part of the graph
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w) + b  # not recorded in the graph
print(z.requires_grad)
```

```
True
False
```
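`torch.no_grad()` can also be used as a function decorator, which is convenient for wrapping an inference routine. In this sketch, `predict` is a hypothetical helper that applies a one-layer model like the one above followed by a sigmoid; nothing inside it is recorded in the graph, so its output has no `grad_fn` even though `w` and `b` require gradients.

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

@torch.no_grad()  # every call to predict runs with gradient tracking disabled
def predict(x):
    """Hypothetical inference helper: forward pass only, no graph is recorded."""
    return torch.sigmoid(torch.matmul(x, w) + b)

p = predict(torch.ones(5))
print(p.requires_grad, p.grad_fn)  # False None
```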


*Another way to achieve the same result is to call the `detach()` method, which returns a new tensor that shares data with the original but is excluded from gradient tracking:*

```python
z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)
```

```
False
```
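Because the detached tensor shares storage with the original, `detach()` is cheap, and any computation that uses the detached tensor is treated as a constant by autograd. In this small sketch (values chosen for illustration), `out` depends on `w` through both `z` and `z_det`, but only the `z` path contributes to the gradient: ∂out/∂w_i = 3 · z_det_i = 9 w_i, rather than the 18 w_i we would get without `detach()`.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
z = w * 3           # tracked: z depends on w
z_det = z.detach()  # same values, but cut off from the graph

# detach() does not copy: both tensors point at the same memory
print(z_det.data_ptr() == z.data_ptr())  # True

out = (z * z_det).sum()  # z_det acts as a constant here
out.backward()
print(w.grad)  # values 9. and 18., i.e. 9 * w
```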
