# PyTorch Autograd Tutorial

Autograd is the automatic differentiation system in PyTorch that simplifies the computation of gradients for backpropagation during the training of neural networks. It tracks the operations performed on tensors to build a dynamic computational graph, which is then used to compute gradients with respect to the model's parameters.

Here are some key concepts related to Autograd:

## 1. Computational Graph
A directed graph that represents the sequence of operations performed on tensors during a forward pass. Each node in the graph corresponds to an operation, and the edges represent the flow of tensor data between the operations.
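
As a minimal sketch of how this graph can be inspected, the snippet below builds a two-operation chain and looks at the `grad_fn` attribute that each result carries. The tensor names `a`, `b`, and `c` are illustrative and do not appear elsewhere in this tutorial.

```python
import torch

# A leaf tensor: created directly by the user, not by an operation
a = torch.tensor([2.0], requires_grad=True)

# Each operation adds a node to the graph and records it in grad_fn
b = a * 3   # multiplication node
c = b + 1   # addition node

print(a.is_leaf, a.grad_fn)        # leaf tensors have no grad_fn
print(type(b.grad_fn).__name__)    # name of the node that produced b
print(type(c.grad_fn).__name__)    # name of the node that produced c

# next_functions lists the nodes that feed into the addition node
print(c.grad_fn.next_functions)
```

The graph is rebuilt on every forward pass, which is why it is described as dynamic.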

## 2. requires_grad
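A boolean attribute of tensors that indicates whether autograd should record operations on the tensor so that gradients can be computed for it. Tensors you create directly have requires_grad=False by default, while model parameters (nn.Parameter objects, e.g., the weights and biases of a neural network layer) have requires_grad=True. Only floating-point and complex tensors can require gradients.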

In [None]:
import torch

x = torch.tensor([2.0], requires_grad=True)
y = torch.tensor([3.0], requires_grad=True)
print(f'x requires_grad: {x.requires_grad}')
print(f'y requires_grad: {y.requires_grad}')
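
The defaults described above can be checked directly. This is a small sketch; `data` and `layer` are illustrative names, and `nn.Linear` is used only as a convenient source of parameters.

```python
import torch
from torch import nn

# A plain tensor does not track gradients unless asked to
data = torch.tensor([1.0, 2.0])
print(f'data requires_grad: {data.requires_grad}')

# Parameters of a layer track gradients by default
layer = nn.Linear(2, 1)
for name, param in layer.named_parameters():
    print(f'{name} requires_grad: {param.requires_grad}')

# requires_grad_() switches tracking on in place for an existing tensor
data.requires_grad_(True)
print(f'data requires_grad after requires_grad_(): {data.requires_grad}')
```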

## 3. backward()
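A tensor method that computes the gradients of that tensor with respect to the leaf tensors in its computational graph that have requires_grad=True. It can be called without arguments only on a scalar (single-element) tensor; for a non-scalar tensor, you must pass a gradient argument of the same shape. Typically, this method is called on the scalar output of a loss function after the forward pass. It accumulates gradients in the .grad attribute of the leaf tensors involved in the computation.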

In [None]:
# z = x^2 * y, with x = 2 and y = 3 from the cell above
z = (x ** 2) * y
z.backward()
# dz/dx = 2 * x * y = 12
print(f'dz/dx: {x.grad}')
# dz/dy = x^2 = 4
print(f'dz/dy: {y.grad}')
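
When the output is not a scalar, backward() needs a gradient argument that weights each output element. The sketch below uses a vector of ones, which gives the same result as summing the output first; `v` and `out` are illustrative names.

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = v ** 2  # non-scalar output with three elements

# Calling out.backward() with no arguments would raise an error here.
# Passing ones computes the gradient of out.sum() with respect to v.
out.backward(gradient=torch.ones_like(out))

# d(sum(v^2))/dv = 2 * v
print(f'v.grad: {v.grad}')
```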

## 4. Gradient accumulation
Calling backward() multiple times without zeroing gradients adds each new gradient to the value already stored in .grad. This is useful for certain optimization techniques, such as accumulating gradients over several mini-batches before an update, but can lead to unexpected behavior if not handled properly. To prevent unwanted accumulation, call the .zero_grad() method of an optimizer, or reset the .grad attribute of each tensor manually.

In [None]:
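# The earlier backward() call freed the graph's saved tensors, so rebuild z
# before calling backward() again (or pass retain_graph=True to the first call)
z = (x ** 2) * y
z.backward()
print(f'Accumulated dz/dx: {x.grad}')  # twice the earlier value
print(f'Accumulated dz/dy: {y.grad}')
# Reset the gradients in place so later backward() calls start from zero
x.grad.zero_()
y.grad.zero_()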
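
In a training loop, the reset is usually done through the optimizer. The following is a minimal sketch with a single hypothetical parameter `w` and a toy loss, using `torch.optim.SGD`; it records the gradient seen at each step to show that no accumulation takes place.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

grads = []
for step in range(3):
    optimizer.zero_grad()      # clear gradients left over from the previous step
    loss = (w * 2).sum()       # toy loss with d(loss)/dw = 2
    loss.backward()
    grads.append(w.grad.item())
    optimizer.step()           # w <- w - lr * grad

# Each step sees the same gradient instead of a running sum
print(grads)
```

Removing the `optimizer.zero_grad()` call would make the recorded gradients grow by 2 on every iteration.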

## 5. .detach()
A method that creates a new tensor without gradient tracking, sharing the same data as the original tensor. This is useful when you want to use a tensor's data for computations that should not be part of the computational graph.

In [None]:
detached_x = x.detach()
print(f'detached_x requires_grad: {detached_x.requires_grad}')
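
The sketch below illustrates both properties of a detached tensor: it shares storage with the original, and operations on it contribute nothing to the gradient. The names `a`, `d`, and `out` are illustrative.

```python
import torch

a = torch.tensor([1.0, 2.0], requires_grad=True)
d = a.detach()

# Both tensors point at the same underlying memory
print(f'shares storage: {d.data_ptr() == a.data_ptr()}')

# Only the tracked branch (a * 2) contributes to the gradient;
# the detached branch (d * 3) is treated as a constant
out = (a * 2 + d * 3).sum()
out.backward()
print(f'a.grad: {a.grad}')
```

Because the storage is shared, an in-place change to the detached tensor also changes the original, so it is safer to call `.clone()` on the detached tensor when an independent copy is needed.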

## 6. torch.no_grad()
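A context manager that temporarily disables gradient tracking for every operation executed inside its block. Results computed there have requires_grad=False, even when their inputs require gradients. This is useful when evaluating a model, as gradients are not needed and disabling tracking can improve performance and save memory.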

In [None]:
with torch.no_grad():
    y = x * 2
print(f'y requires_grad: {y.requires_grad}')
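
A typical evaluation pass might look like the sketch below. The model is a stand-in `nn.Linear` layer and the inputs are random; both are assumptions made for illustration only.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)       # stand-in for a trained model
inputs = torch.randn(4, 3)    # stand-in for evaluation data

model.eval()
with torch.no_grad():
    preds = model(inputs)     # no graph is recorded for this forward pass

print(f'preds requires_grad: {preds.requires_grad}')
print(f'preds grad_fn: {preds.grad_fn}')

# Tracking is restored once the block exits
tracked = model(inputs)
print(f'tracked requires_grad: {tracked.requires_grad}')
```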

Now, let's see an example of using Autograd to compute gradients for a simple function:

In [None]:
# For y = x^2, dy/dx = 2x, so the gradient at x = 3 should be 6
x = torch.tensor([3.0], requires_grad=True)
y = x ** 2
y.backward()
print(f'dy/dx: {x.grad}')
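
One way to gain confidence in a gradient is to compare it with a numerical estimate. The sketch below repeats the same function with a central finite difference; the helper `f`, the step size `eps`, and the use of float64 are choices made for this illustration.

```python
import torch

def f(t):
    # Same function as in the example above
    return t ** 2

x0 = torch.tensor([3.0], dtype=torch.float64, requires_grad=True)
f(x0).backward()

# Central difference: (f(x + eps) - f(x - eps)) / (2 * eps)
eps = 1e-6
with torch.no_grad():
    numeric = (f(x0 + eps) - f(x0 - eps)) / (2 * eps)

print(f'autograd: {x0.grad}')
print(f'numeric:  {numeric}')
```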

By understanding these concepts, you'll be able to leverage PyTorch's Autograd system to simplify gradient computations for your machine learning models.