- PyTorch has a built-in differentiation engine, `torch.autograd`, which computes the gradients that backpropagation depends on.
- It supports automatic computation of gradients for any computational graph.

In [1]:
import torch
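# input tensor
x = torch.ones(5)
# expected output
y = torch.zeros(3)
# parameters to optimize, so autograd must track them
w = torch.rand(5, 3, requires_grad=True)
b = torch.rand(3, requires_grad=True)
# forward pass: logits, then the loss
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)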


In this network, `w` and `b` are parameters, which we need to optimize. Thus, **we need to be able to compute the gradients of the loss function with respect to those variables**. In order to do that, we set the `requires_grad` property of those tensors.

You can set the value of `requires_grad` when creating a tensor, or later by using the `x.requires_grad_(True)` method.
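
As a minimal sketch of the two options just described, the snippet below creates one tensor with tracking enabled from the start and switches it on afterwards for another. The helper names `tracked_before` and `tracked_after` are only illustrative.

```python
import torch

# Option 1: request gradient tracking at creation time.
w = torch.rand(5, 3, requires_grad=True)

# Option 2: create the tensor first, then switch tracking on in place.
b = torch.rand(3)
tracked_before = b.requires_grad   # tensors do not track gradients by default
b.requires_grad_(True)             # the trailing underscore marks an in-place method
tracked_after = b.requires_grad

print(w.requires_grad, tracked_before, tracked_after)
```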

- A function that we apply to tensors to construct the computational graph is in fact an object of class `Function`. This object knows:
    - how to compute the function in the forward direction
    - how to compute its derivative during the backward propagation step
- A reference to the backward propagation function is stored in the `grad_fn` property of a tensor.

In [2]:
print(z.grad_fn)
print(loss.grad_fn)

<AddBackward0 object at 0x000001EE67D4CE20>
<BinaryCrossEntropyWithLogitsBackward0 object at 0x000001EE67D52850>


To optimize these weights and biases, we need the gradients (derivatives) of the loss with respect to these variables, keeping the input `x` and the expected output `y` fixed.

These gradients are computed by calling `loss.backward()`; afterwards we read their values from `w.grad` and `b.grad`.

In [3]:
loss.backward()

In [4]:
print(w.grad)
print(b.grad)

tensor([[0.2863, 0.2945, 0.3069],
        [0.2863, 0.2945, 0.3069],
        [0.2863, 0.2945, 0.3069],
        [0.2863, 0.2945, 0.3069],
        [0.2863, 0.2945, 0.3069]])
tensor([0.2863, 0.2945, 0.3069])


We can read the `.grad` property only on the leaf nodes of the graph that have `requires_grad=True`, such as `w` and `b`, which we want to optimize. For all other nodes, `.grad` is not populated by default.
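
The leaf versus non-leaf distinction can be checked directly. The sketch below uses `z.sum()` as a stand-in scalar loss (an assumption made only to keep the numbers simple) and calls `retain_grad()` to ask autograd to keep the gradient of the intermediate tensor `z`, which it would otherwise discard.

```python
import torch

x = torch.ones(5)
w = torch.rand(5, 3, requires_grad=True)
b = torch.rand(3, requires_grad=True)

z = torch.matmul(x, w) + b   # intermediate (non-leaf) tensor
z.retain_grad()              # ask autograd to keep z.grad after backward()
loss = z.sum()               # stand-in scalar loss, chosen only for illustration
loss.backward()

# w and b were created by the user (leaves); z was created by an operation.
print(w.is_leaf, b.is_leaf, z.is_leaf)
# z.grad is populated only because retain_grad() was called above.
print(z.grad)
```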

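Every row of `w.grad` is identical because every element of the input `x` is 1: the gradient with respect to `w[i][j]` is `x[i]` times the gradient with respect to `z[j]`, which is also why each row equals `b.grad`. For performance reasons, we can call `backward()` only once on a given graph. If we need several `backward()` calls on the same graph, we pass `retain_graph=True` to the call.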
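
The following sketch rebuilds the same small network and checks two things: that `w.grad` matches a hand-derived gradient, and that a second `backward()` call on a retained graph adds to `.grad` rather than overwriting it. The formula in the comment assumes the default mean reduction of `binary_cross_entropy_with_logits`; the names `first_w_grad`, `dz`, and `manual_w_grad` are illustrative.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.rand(5, 3, requires_grad=True)
b = torch.rand(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

# First backward pass; retain_graph=True keeps the graph usable afterwards.
loss.backward(retain_graph=True)
first_w_grad = w.grad.clone()

# Hand-derived gradient for the mean reduction:
# dloss/dz_j = (sigmoid(z_j) - y_j) / 3 and dloss/dw_ij = x_i * dloss/dz_j.
dz = (torch.sigmoid(z.detach()) - y) / y.numel()
manual_w_grad = torch.outer(x, dz)
matches_manual = torch.allclose(first_w_grad, manual_w_grad, atol=1e-6)

# Second backward pass on the same graph: gradients are added to .grad.
loss.backward()
accumulated = torch.allclose(w.grad, 2 * first_w_grad, atol=1e-6)

print(matches_manual, accumulated)
```

Because gradients accumulate, training loops usually reset them between iterations, for example by setting `w.grad = None`.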

## Disabling Gradient Tracking

When testing a model, we do not update the weights; we only run forward passes on the test inputs, so there is no need to track gradients for those computations.

In such cases, we can stop autograd from tracking operations by wrapping the computation in a `torch.no_grad()` block. Tensors produced inside the block have `requires_grad=False`; the parameters themselves are left unchanged.

In [5]:
z = torch.matmul(x,w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x,w) + b
print(z.requires_grad)

True
False


In [6]:
# another way to get the same result is the `detach()` method

z = (torch.matmul(x,w)+b).detach()
print(z.requires_grad)

False


## Process of computational graphs

- autograd keeps a record of both **the data (tensors) and all executed operations (along with the resulting new tensors)**
- they are stored in a DAG, a **Directed Acyclic Graph**
- in this DAG, the leaves are the input tensors and the roots are the output tensors
- by tracing the graph from the roots to the leaves, autograd automatically computes the gradients using the chain rule
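
One way to look at this graph is to start at the root's `grad_fn` and follow `next_functions` toward the leaves. The `walk` helper below is a hypothetical illustration, not part of PyTorch; the exact node names it prints can differ between PyTorch versions, so treat the output as indicative only.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.rand(5, 3, requires_grad=True)
b = torch.rand(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)


def walk(fn, depth=0, names=None):
    """Print the backward graph below `fn` and return the node type names."""
    if names is None:
        names = []
    if fn is None:  # inputs that need no gradient have no node
        return names
    name = type(fn).__name__
    names.append(name)
    print("  " * depth + name)
    # Each entry is a (node, input index) pair pointing one step closer to the leaves.
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, names)
    return names


node_names = walk(loss.grad_fn)
```

The traversal ends in `AccumulateGrad` nodes, which are the points where gradients get written into the `.grad` of leaf tensors such as `w` and `b`.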

### In a forward pass, autograd does two things simultaneously:

- run the requested operation to compute a resulting tensor

- maintain the operation’s gradient function in the DAG.

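### Backward pass

- the backward pass starts when `.backward()` is called on the DAG root
- autograd then computes the gradients from each `.grad_fn`, accumulates them in the respective tensor's `.grad` attribute, and uses the chain rule to propagate all the way to the leaf tensors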
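
To make the chain rule concrete, here is a small sketch on a two-step scalar computation, unrelated to the network above. It queries each local derivative with `torch.autograd.grad` and then compares their product with what a full `backward()` call stores in `x.grad`. The values and variable names are chosen only for illustration.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
a = x ** 2        # da/dx = 2x = 4
y = 3 * a         # dy/da = 3

# Local derivatives, queried one edge of the graph at a time.
(dy_da,) = torch.autograd.grad(y, a, retain_graph=True)
(da_dx,) = torch.autograd.grad(a, x, retain_graph=True)

# Full backward pass from the root: the result is accumulated into x.grad.
y.backward()

print(dy_da.item(), da_dx.item(), x.grad.item())  # prints 3.0 4.0 12.0
```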
    
**Note** - DAGs are dynamic in PyTorch. An important thing to note is that the graph is recreated from scratch: after each `.backward()` call, autograd starts populating a new graph. This is exactly what allows you to use control flow statements in your model; you can change the shape, size, and operations at every iteration if needed.
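
A rough sketch of what this dynamism allows: the toy `forward` function below (a hypothetical example, not from any library) runs a loop whose length is chosen at call time, so each call builds a graph of a different depth, and `backward()` still yields the correct gradient.

```python
import torch

w = torch.tensor(1.5, requires_grad=True)


def forward(x, n_steps):
    """Toy model: the number of multiplications is decided at call time."""
    out = x
    for _ in range(n_steps):
        out = out * w
    return out


results = {}
for n_steps in (1, 3):
    w.grad = None                      # clear the gradient from the previous graph
    out = forward(torch.tensor(2.0), n_steps)
    out.backward()                     # a fresh graph was built by this forward call
    # Analytic check: d(x * w**n)/dw = n * x * w**(n - 1)
    expected = n_steps * 2.0 * 1.5 ** (n_steps - 1)
    results[n_steps] = (w.grad.item(), expected)
    print(n_steps, w.grad.item(), expected)
```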
    