### PyTorch Autograd and Backpropagation

Understanding PyTorch’s autograd and backpropagation is crucial for efficiently designing and training models in machine learning. This chapter introduces these essential concepts with straightforward explanations and examples.

#### Introduction to Autograd

PyTorch provides an automatic differentiation system called **autograd**. It records the operations performed on `torch.Tensor` objects whose `requires_grad` attribute is set to `True` and builds a **computational graph** from them. Traversing that graph backward yields the gradients needed to optimize models via backpropagation, which is one reason PyTorch is widely used for deep learning.
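As a minimal sketch of what "tracking" means in practice, the snippet below compares a tracked and an untracked tensor. The tensor names and values are illustrative assumptions, not part of the examples later in this chapter.

```python
import torch

# Tensors are not tracked by default.
a = torch.tensor([1.0, 2.0, 3.0])

# Opt in to tracking with requires_grad=True.
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

# A result computed from a tracked tensor is tracked too, and it
# remembers the operation that produced it in its grad_fn attribute.
out = (a * w).sum()

print(a.requires_grad)          # False
print(out.requires_grad)        # True
print(out.grad_fn is not None)  # True

# Inside torch.no_grad(), operations are not recorded at all.
with torch.no_grad():
    untracked = (a * w).sum()

print(untracked.requires_grad)  # False
```

Wrapping code in `torch.no_grad()` is a common way to skip graph construction when gradients are not needed, for example during evaluation.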

#### Computational Graphs

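In PyTorch, each operation on a tensor with `requires_grad=True` adds a node to a computational graph, so the graph captures the entire history of the computation. The graph is dynamic: it is built on the fly as the code runs and rebuilt on every forward pass, so ordinary Python control flow (loops, conditionals) can change its shape from one iteration to the next. By default, the graph is freed once `.backward()` has been called on it. This define-by-run approach is often easier to write and debug than static graph systems, which require the graph to be declared before execution.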
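The following sketch illustrates the dynamic nature of the graph. It assumes a hypothetical function `f`, introduced here only for illustration, whose branch depends on the input value, so a different graph is recorded on each call.

```python
import torch

def f(x):
    # Ordinary Python control flow decides which operations run,
    # so the recorded graph can differ from call to call.
    if x.item() > 0:
        return x ** 2   # derivative: 2x
    return -3 * x       # derivative: -3

x_pos = torch.tensor(2.0, requires_grad=True)
x_neg = torch.tensor(-2.0, requires_grad=True)

y_pos = f(x_pos)
y_neg = f(x_neg)

# Tensors created directly by the user are leaves of the graph;
# results of operations are not.
print(x_pos.is_leaf, y_pos.is_leaf)  # True False

# Each result carries the node for the operation that produced it.
print(type(y_pos.grad_fn).__name__)
print(type(y_neg.grad_fn).__name__)

y_pos.backward()
y_neg.backward()
print(x_pos.grad)  # tensor(4.)
print(x_neg.grad)  # tensor(-3.)
```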


### Example: Simple Autograd Usage

Let’s see a basic example of using autograd in PyTorch:


In [1]:
import torch

# Create a tensor and set requires_grad=True to track its operations
x = torch.tensor(3.0, requires_grad=True)

# Perform some operations
y = x**2  # y = x^2
z = 2 * y + 3  # z = 2 * x^2 + 3

# Perform backpropagation to compute gradients
z.backward()  # Compute dz/dx

# Print the gradient of x
print(f"Gradient of x (dz/dx): {x.grad.item()}")


Gradient of x (dz/dx): 12.0


In [5]:
import torch

# Create tensors
x = torch.randn(3, requires_grad=True)
print(f'{x=}')
y = x * 2
z = y.mean()
print(f'{z=}')

# Backpropagation
z.backward()

# Gradients
print(f'\ngrad dz/dx = {x.grad}')

x=tensor([-2.3873,  1.7180,  0.2499], requires_grad=True)
z=tensor(-0.2796, grad_fn=<MeanBackward0>)

grad dz/dx = tensor([0.6667, 0.6667, 0.6667])


### Explanation of the Example

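In both examples, we:
- Created a tensor `x` with `requires_grad=True`.
- Performed operations on `x` to build a computational graph.
- Called `backward()` on the scalar `z` to compute the gradient of `z` with respect to `x`.
- Read `x.grad` to view the computed gradient.

The results match the derivatives worked out by hand. For `z = 2 * x**2 + 3`, dz/dx = 4x, which is 12 at x = 3. For `z = mean(2 * x)` over three elements, the gradient with respect to each element is 2/3 ≈ 0.6667, regardless of the random values in `x`.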

### Backpropagation Explained

**Backpropagation** is an algorithm used to compute the gradient of a loss function with respect to all weights in a neural network. It works by applying the chain rule of differentiation across the layers in the network, allowing for efficient gradient computation needed for optimization algorithms like gradient descent.
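To make the chain rule concrete, here is a small sketch that differentiates a three-step composition by hand and compares the result with autograd. The particular functions (`3 * x + 1`, `sin`, squaring) are arbitrary choices made for illustration.

```python
import torch

x = torch.tensor(1.5, requires_grad=True)

# Forward pass through a composition: x -> u -> v -> loss
u = 3 * x + 1      # du/dx    = 3
v = torch.sin(u)   # dv/du    = cos(u)
loss = v ** 2      # dloss/dv = 2v

# Autograd applies the chain rule for us.
loss.backward()

# The same chain rule by hand, multiplying local derivatives
# from the output back toward the input.
with torch.no_grad():
    dloss_dv = 2 * v
    dv_du = torch.cos(u)
    du_dx = 3.0
    manual = dloss_dv * dv_du * du_dx

print(torch.allclose(x.grad, manual))  # True
```

Each factor in `manual` is the local derivative of one step. Backpropagation computes the same product, reusing intermediate results so that every parameter's gradient is obtained in a single backward traversal.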

#### How Backpropagation Works:
1. **Forward Pass**: During the forward pass, the inputs propagate through the network, and outputs are computed based on the current weights of the network.
2. **Loss Calculation**: A loss function compares the predicted output with the actual target and reduces the discrepancy to a single scalar value, the loss.
3. **Backward Pass**: During backpropagation, the gradient of the loss function with respect to each parameter (weight) is computed. The gradients are calculated using the chain rule by propagating gradients backward from the output layer to the input layer.

This allows for efficient updating of weights during training, using optimization techniques like stochastic gradient descent (SGD).
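The three steps above, followed by a weight update, can be sketched as a tiny training loop. This is an illustrative example under stated assumptions: the toy data (`y = 2x + 1`), the learning rate, and the step count are made up, and the loop uses plain full-batch gradient descent written by hand rather than an optimizer class.

```python
import torch

# Toy data for the line y = 2x + 1 (illustrative values)
inputs = torch.linspace(-1, 1, 20).unsqueeze(1)
targets = 2 * inputs + 1

# Parameters to learn, both starting at zero
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1  # learning rate (assumed value)

for step in range(200):
    preds = inputs * w + b                  # 1. forward pass
    loss = ((preds - targets) ** 2).mean()  # 2. loss (mean squared error)
    loss.backward()                         # 3. backward pass

    # Gradient descent update; no_grad keeps the update itself
    # out of the computational graph.
    with torch.no_grad():
        w -= lr * w.grad
        b -= lr * b.grad

    # Clear the gradients so they do not accumulate into the next step.
    w.grad.zero_()
    b.grad.zero_()

print(f"w = {w.item():.3f}, b = {b.item():.3f}")  # w = 2.000, b = 1.000
```

In practice, the manual update and gradient reset are usually delegated to an optimizer such as `torch.optim.SGD`, but writing them out shows where the gradients from `backward()` are consumed.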

#### PyTorch and Backpropagation:

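In PyTorch, when you call `.backward()` on the final output tensor (typically a scalar loss), PyTorch automatically:
- Traverses the computational graph in reverse (from output back to input).
- Computes gradients and stores them in the `.grad` attribute of every leaf tensor in the graph that has `requires_grad=True`, such as model weights. Gradients of intermediate (non-leaf) tensors are not retained by default.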

These gradients can then be used by optimization algorithms to update the weights of a neural network and minimize the loss function.
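Two details of `.backward()` are worth seeing in code: gradients are kept only on leaf tensors unless `retain_grad()` is called on an intermediate tensor, and repeated backward passes add to `.grad` rather than overwrite it. The sketch below uses made-up values to show both.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # leaf tensor
h = w * 3                                  # intermediate (non-leaf) tensor
h.retain_grad()                            # opt in to keeping h.grad
loss = h ** 2                              # loss = 9 * w^2

loss.backward()
print(w.grad)  # tensor(36.)  since d(9w^2)/dw = 18w
print(h.grad)  # tensor(12.)  since d(h^2)/dh = 2h
first_grad = w.grad.clone()

# A second backward pass on a freshly built graph accumulates
# into w.grad instead of replacing it.
loss2 = (w * 3) ** 2
loss2.backward()
print(w.grad)  # tensor(72.)
accumulated_grad = w.grad.clone()

# Reset before the next pass to avoid mixing gradients.
w.grad.zero_()
print(w.grad)  # tensor(0.)
```

This accumulation is why training loops reset gradients, for example with `optimizer.zero_grad()`, before each backward pass.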
