
# Automatic Differentiation with torch.autograd

When training neural networks, the most frequently used algorithm is backpropagation.
In this algorithm, parameters (model weights) are adjusted according to the gradient of the loss function with respect to each parameter.

PyTorch has a built-in differentiation engine called torch.autograd to compute those gradients.
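As a rough illustration of what "adjusting parameters according to the gradient" means, here is a minimal sketch of a single gradient descent step on a toy scalar problem. The names `w_demo`, `x_demo`, `y_demo`, the learning rate `lr` and the squared-error loss are assumptions chosen for this example only.

```python
import torch

# Hypothetical toy problem: find w_demo so that w_demo * 2.0 is close to 6.0
w_demo = torch.tensor(1.0, requires_grad=True)
x_demo = torch.tensor(2.0)
y_demo = torch.tensor(6.0)
lr = 0.1  # learning rate (assumed value)

loss_demo = (w_demo * x_demo - y_demo) ** 2  # squared error
loss_demo.backward()                         # autograd fills w_demo.grad

# By hand: d(loss)/dw = 2 * (w*x - y) * x = 2 * (2 - 6) * 2 = -16
print(w_demo.grad)

with torch.no_grad():           # the update itself should not be tracked
    w_demo -= lr * w_demo.grad  # w <- w - lr * grad

print(w_demo)
```

The gradient is negative, so the step increases `w_demo`, which moves the prediction closer to the target.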

Consider a simple one-layer neural network, with input x, parameters w and b, and some loss function. It can be defined in PyTorch in the following manner:


In [None]:
import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # requires_grad=True: gradients will be computed for this tensor
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b  # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)


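In this network, w and b are parameters that we need to optimize. To do so, we need to compute the gradients of the loss function with respect to those variables, which is why we set ``requires_grad=True`` on them.

Each tensor produced by a tracked operation stores a reference to its backward function in its grad_fn property: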

In [None]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x79b61dd1ba30>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x79b61c0a17e0>
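To make the roles of requires_grad and grad_fn a little more concrete, the following sketch checks which tensors are leaves of the graph, and shows that gradient tracking can also be switched on after creation with `requires_grad_()`. The `_leaf` tensor names are illustrative and kept separate from the tensors used in the rest of this tutorial.

```python
import torch

x_leaf = torch.ones(5)
w_leaf = torch.randn(5, 3, requires_grad=True)
b_leaf = torch.randn(3)        # created without gradient tracking...
b_leaf.requires_grad_(True)    # ...and switched on afterwards, in place

z_leaf = torch.matmul(x_leaf, w_leaf) + b_leaf

# Tensors created directly by the user are leaves and have no grad_fn
print(w_leaf.is_leaf, w_leaf.grad_fn)
print(b_leaf.requires_grad)

# z_leaf is the result of tracked operations, so it is not a leaf
print(z_leaf.is_leaf, z_leaf.requires_grad)
```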


### Computing Gradients
To optimize the weights of the parameters in the neural network, we need to compute the derivatives of our loss function with respect to the parameters.
To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad.

In [None]:
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.0038, 0.0018, 0.0311],
        [0.0038, 0.0018, 0.0311],
        [0.0038, 0.0018, 0.0311],
        [0.0038, 0.0018, 0.0311],
        [0.0038, 0.0018, 0.0311]])
tensor([0.0038, 0.0018, 0.0311])
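The identical rows in w.grad are not a coincidence. Assuming the default mean reduction of binary_cross_entropy_with_logits, the derivative of the loss with respect to the logits is (sigmoid(z) - y) / 3, and because every entry of x is 1, each row of the gradient for w equals the gradient for b. The sketch below re-creates the setup with fresh tensors (the `_chk` names and the seed are illustrative) and compares autograd against this hand-derived formula.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
x_chk = torch.ones(5)
y_chk = torch.zeros(3)
w_chk = torch.randn(5, 3, requires_grad=True)
b_chk = torch.randn(3, requires_grad=True)

z_chk = torch.matmul(x_chk, w_chk) + b_chk
loss_chk = torch.nn.functional.binary_cross_entropy_with_logits(z_chk, y_chk)
loss_chk.backward()

# Hand-derived gradient with respect to the logits (mean over the 3 outputs)
dz = (torch.sigmoid(z_chk.detach()) - y_chk) / y_chk.numel()
manual_b_grad = dz                      # d z_j / d b_j = 1
manual_w_grad = torch.outer(x_chk, dz)  # d z_j / d w_ij = x_i

print(torch.allclose(w_chk.grad, manual_w_grad, atol=1e-6))
print(torch.allclose(b_chk.grad, manual_b_grad, atol=1e-6))
```

Both comparisons should print True, up to floating-point tolerance.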


### Disable Gradient Tracking

By default, all tensors with requires_grad=True track their computational history and support gradient computation. However, there are cases where we do not need this, for example when we have trained the model and just want to apply it to some input data, so that only forward computations are required.

In the next example, we stop tracking computations by wrapping the code in a torch.no_grad() block:

In [None]:
z = torch.matmul(x, w) + b
print(z.requires_grad)  # z depends on w and b, so it is tracked

with torch.no_grad():  # operations inside this block are not recorded
    z = torch.matmul(x, w) + b
print(z.requires_grad)

True
False


In [None]:
# Another way to achieve the same result is to call detach() on the tensor

z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
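The two approaches differ in scope: torch.no_grad() affects every operation executed inside the block, while detach() acts on a single tensor and leaves the original attached to the graph. The sketch below, using illustrative `_d` tensor names, shows that the detached tensor has no grad_fn and shares memory with the original rather than copying it.

```python
import torch

w_d = torch.randn(5, 3, requires_grad=True)
z_d = torch.matmul(torch.ones(5), w_d)
z_d_det = z_d.detach()

# The original stays tracked; only the detached view is cut from the graph
print(z_d.requires_grad, z_d_det.requires_grad)
print(z_d_det.grad_fn)

# Same underlying memory: detach() does not copy the data
print(z_d_det.data_ptr() == z_d.data_ptr())
```

Because the memory is shared, in-place changes to the detached tensor are also visible through the original, so it is usually safer to treat the detached tensor as read-only.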


Why disable gradient tracking?

- To mark some parameters in your neural network as frozen parameters
- To speed up computation when you are only doing the forward pass (computations on tensors that do not track gradients are more efficient)
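Both use cases can be sketched with a small model. The two-layer `model` below is hypothetical and exists only for illustration: its first layer is frozen by turning off requires_grad on its parameters, and a forward-only pass is then run under torch.no_grad().

```python
import torch
from torch import nn

# Hypothetical two-layer model used only for illustration
model = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 3))

# Freeze the first layer: its parameters will not receive gradients
for param in model[0].parameters():
    param.requires_grad_(False)

out = model(torch.ones(1, 5))
out.sum().backward()

print(model[0].weight.grad)              # None, the layer is frozen
print(model[2].weight.grad is not None)  # the last layer is still trainable

# Forward pass only, without building a graph
with torch.no_grad():
    pred = model(torch.ones(1, 5))
print(pred.requires_grad)
```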

### More on Computational Graphs

Autograd keeps a record of data (tensors) and all executed operations in a directed acyclic graph (DAG).

In a forward pass, autograd does two things simultaneously:
- runs the requested operation to compute a result tensor
- maintains the operation's gradient function in the DAG
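The graph built during the forward pass can be inspected by following the next_functions attribute of each grad_fn node. The helper `walk` below is a hypothetical utility written for this sketch, not part of PyTorch, and the exact intermediate node names it prints may vary between PyTorch versions.

```python
import torch

x_g = torch.ones(5)
w_g = torch.randn(5, 3, requires_grad=True)
b_g = torch.randn(3, requires_grad=True)
z_g = torch.matmul(x_g, w_g) + b_g

def walk(fn, depth=0, names=None):
    """Print and collect the type names of all nodes reachable from fn."""
    if names is None:
        names = []
    if fn is None:  # inputs that do not require gradients have no node
        return names
    names.append(type(fn).__name__)
    print("  " * depth + type(fn).__name__)
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names

node_names = walk(z_g.grad_fn)
```

The leaf tensors w_g and b_g appear at the ends of the graph as AccumulateGrad nodes, which is where the computed gradients are written into .grad.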

The backward pass kicks off when .backward() is called on the DAG root. Autograd then:
- computes the gradients from each .grad_fn,
- accumulates them in the respective tensor's .grad attribute,
- using the chain rule, propagates all the way to the leaf tensors.
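The word "accumulates" matters in practice: each backward pass adds to whatever is already stored in .grad instead of overwriting it. A minimal sketch, assuming a toy tensor `w_acc` introduced only for this example:

```python
import torch

w_acc = torch.tensor([1.0, 2.0], requires_grad=True)

(w_acc * 3).sum().backward()
first_grad = w_acc.grad.clone()
print(first_grad)    # tensor([3., 3.])

# A second backward pass adds to the stored gradient
(w_acc * 3).sum().backward()
second_grad = w_acc.grad.clone()
print(second_grad)   # tensor([6., 6.])

w_acc.grad.zero_()   # reset before the next pass
print(w_acc.grad)    # tensor([0., 0.])
```

This is why training loops typically zero the gradients before each backward pass.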