# [PyTorch - Learning the Basics](https://pytorch.org/tutorials/beginner/basics/intro.html)

In part five we'll cover automatic differentiation.

## Automatic differentiation with `torch.autograd`

**Back propagation** is the most frequently used algorithm when training neural networks. In back propagation, the parameters (model weights) are adjusted according to the **gradient** of the loss function with respect to each parameter. Here's a quick overview of some terms for clarification:

- The **loss function** is a formula that measures **how bad the model's prediction is** compared to the actual target. It's essentially a score that we want to be low, so the goal of training is to minimize it.
- The **gradient** is the **slope of the loss function** with respect to the model's parameters (weights and biases). It tells us **how to change the parameters** to reduce the loss: if the gradient with respect to a weight is positive, we want to decrease that weight, and if it's negative, we want to increase it. It's calculated using **calculus** (specifically, partial derivatives).
- **Back propagation** is the algorithm used to **efficiently compute all gradients** of the loss with respect to every weight in the network. It consists of a **forward pass**, which computes the predictions and the loss, and a **backward pass**, which applies the chain rule to propagate gradients from output to input. The gradients returned from back propagation are then used to **update the weights**, typically via gradient descent.
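
The three terms above can be tied together in a minimal sketch of a single training step. The one-parameter model, the squared-error loss, and the `learning_rate` value are assumptions chosen only to keep the arithmetic easy to follow; they are not part of the tutorial's example.

```python
import torch

# Hypothetical one-parameter model: prediction = w * x, with a squared-error loss.
x = torch.tensor(2.0)
target = torch.tensor(6.0)
w = torch.tensor(1.0, requires_grad=True)
learning_rate = 0.1  # assumed value, for illustration only

# Forward pass: compute the prediction and the loss.
loss = (w * x - target) ** 2          # (1*2 - 6)^2 = 16

# Backward pass: autograd applies the chain rule and stores d(loss)/dw in w.grad.
loss.backward()                       # 2 * (w*x - target) * x = 2 * (-4) * 2 = -16

# Gradient descent update: the gradient is negative, so the weight increases.
with torch.no_grad():
    w_new = w - learning_rate * w.grad   # 1 - 0.1 * (-16), roughly 2.6

new_loss = (w_new * x - target) ** 2     # roughly (5.2 - 6)^2 = 0.64
print(w.grad.item(), w_new.item(), new_loss.item())
```

Because the gradient here is negative, the update moves `w` upward, and the loss after the step is lower than before.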

In PyTorch, we use the built-in differentiation engine `torch.autograd` to compute the gradients. It supports automatic computation of gradients for any computational graph.
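
As a small illustration of "any computational graph", the sketch below differentiates a function that contains ordinary Python control flow. The function `f` is a made-up example; `torch.autograd.grad` is used here instead of `.backward()` only because it returns the gradient directly.

```python
import torch

def f(t):
    # Hypothetical piecewise function: the graph is built from whichever branch runs.
    if t > 0:
        return t ** 3
    return -t

t = torch.tensor(2.0, requires_grad=True)

# torch.autograd.grad returns a tuple with one gradient per input.
(grad,) = torch.autograd.grad(f(t), t)
print(grad)  # derivative of t^3 at t = 2 is 3 * 2^2 = 12
```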

As an example, let's consider the simplest one-layer neural network, with input `x`, parameters `w` and `b`, and some loss function. It can be defined in PyTorch like so:

```python
import torch

x = torch.ones(5)       # input tensor
y = torch.zeros(3)      # expected output
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
```
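
To check what this network produces, the following sketch rebuilds the same tensors, inspects their shapes, and compares the gradients from `loss.backward()` against a hand-derived formula. The fixed seed and the analytic expression are additions for illustration; the formula assumes the default `reduction="mean"` of `binary_cross_entropy_with_logits`.

```python
import torch

torch.manual_seed(0)  # assumed seed, only to make the run repeatable

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

print(z.shape)     # torch.Size([3]): one logit per output
print(loss.shape)  # torch.Size([]): the loss is a scalar

# Backward pass: fills w.grad and b.grad.
loss.backward()

# Hand-derived gradient of the mean BCE-with-logits loss w.r.t. each logit.
dz = (torch.sigmoid(z.detach()) - y) / y.numel()

# Chain rule: z = x @ w + b, so d(loss)/db = dz and d(loss)/dw = outer(x, dz).
print(torch.allclose(b.grad, dz))
print(torch.allclose(w.grad, torch.outer(x, dz)))
```

Both comparisons should agree, since autograd carries out the same chain-rule steps automatically.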

### Tensors, functions, and computational graph

The code above defines the following **computational graph**:

```mermaid
flowchart LR
 subgraph s1["Parameters"]
        n7["w"]
        n8["b"]
  end
    A["x"] --> n1["times"]
    n1 --> n2["plus"]
    n2 --> n3["z"]
    n3 --> n4["CE"]
    n4 --> n5["loss"]
    n6["y"] --> n4
    n7 --> n1
    n8 --> n2
    n7@{ shape: rounded}
    n8@{ shape: rounded}
    A@{ shape: rounded}
    n1@{ shape: rounded}
    n2@{ shape: rounded}
    n3@{ shape: rounded}
    n4@{ shape: rounded}
    n5@{ shape: rounded}
    n6@{ shape: rounded}
```
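
The graph in the diagram can also be inspected from code. The sketch below is a rough way to see it: tensors produced by an operation carry a `grad_fn` that points at the backward function of that operation, while tensors created directly (such as `w` and `b`) are leaves with no `grad_fn`. The exact class names printed may vary between PyTorch versions.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

# Leaf tensors (the "Parameters" box in the diagram) have no grad_fn.
print(w.is_leaf, w.grad_fn)  # True None
print(b.is_leaf, b.grad_fn)  # True None

# Results of operations record the function that created them.
print(type(z.grad_fn).__name__)     # backward node for the "plus" step
print(type(loss.grad_fn).__name__)  # backward node for the "CE" step

# next_functions links each node to the nodes that feed into it:
# for z, that is the "times" step and the parameter b.
for fn, _ in z.grad_fn.next_functions:
    print(type(fn).__name__)
```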

