**torch.autograd**: PyTorch's built-in differentiation engine for computing gradients. It supports automatic gradient computation for any computational graph.

We set **requires_grad=True** on the tensors we want to optimize. This lets us compute the gradient of the loss function with respect to those tensors.
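As a small sketch of the flag itself: `requires_grad` can be passed at creation time or switched on later with the in-place method `requires_grad_()`. The tensor name `w_demo` is made up for this illustration and is not one of the tensors used in the cells here.

```python
import torch

# Hypothetical tensor for illustration only.
w_demo = torch.randn(5, 3)           # requires_grad defaults to False
flag_before = w_demo.requires_grad

w_demo.requires_grad_(True)          # trailing underscore: modifies the tensor in place
flag_after = w_demo.requires_grad

print(flag_before, flag_after)       # False True
```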

In [19]:
import torch

x = torch.ones(5)
y = torch.zeros(3)

w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

x, y, w, b

(tensor([1., 1., 1., 1., 1.]),
 tensor([0., 0., 0.]),
 tensor([[ 1.4352, -0.1994, -1.6113],
         [-1.1986, -0.1369,  0.3532],
         [ 0.9336,  0.0817,  0.0554],
         [-1.6064,  0.3219, -1.8778],
         [ 0.0423,  0.1048,  0.9700]], requires_grad=True),
 tensor([-0.1757,  0.6317,  0.8846], requires_grad=True))

In [20]:
x.shape, y.shape, w.shape, b.shape

(torch.Size([5]), torch.Size([3]), torch.Size([5, 3]), torch.Size([3]))

In [21]:
z = torch.matmul(x, w)+b
z.shape, z

(torch.Size([3]), tensor([-0.5698,  0.8037, -1.2261], grad_fn=<AddBackward0>))

In [22]:
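# Note: the signature is binary_cross_entropy_with_logits(input, target).
# This call passes y as the input (logits) and z as the target.
loss = torch.nn.functional.binary_cross_entropy_with_logits(y, z)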
loss

tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>)

## Computing Gradients
- Call loss.backward(), then read the gradient values from w.grad and b.grad.
- .grad is only populated for leaf tensors of the graph that have requires_grad=True; it is not available for the other tensors in the graph.

- For performance reasons, .backward() can only be called once on a given graph. To make several .backward() calls on the same graph, pass **retain_graph=True** to the .backward() call.
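The points above can be sketched on a toy graph. The names `x_demo`, `w_demo`, `z_demo` and `loss_demo` are made up for this example, and the loss is a plain sum so the gradients are easy to check by hand. Note that repeated backward calls add to `.grad` rather than overwrite it.

```python
import torch

x_demo = torch.ones(5)
w_demo = torch.randn(5, 3, requires_grad=True)   # leaf tensor
z_demo = x_demo @ w_demo                         # non-leaf (result of an operation)
loss_demo = z_demo.sum()

print(w_demo.is_leaf, z_demo.is_leaf)            # True False

# Keep the graph's buffers alive so backward can be called again.
loss_demo.backward(retain_graph=True)
first = w_demo.grad.clone()                      # all ones: d(sum(x @ w))/dw = x broadcast

# Second call on the same graph; gradients accumulate into .grad.
loss_demo.backward()
second = w_demo.grad.clone()                     # all twos

# Reset before the next optimization step to avoid stale accumulation.
w_demo.grad.zero_()
```

Without `retain_graph=True` on the first call, the second `backward()` would raise a `RuntimeError` because the saved intermediate values have already been freed.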

In [23]:
loss.backward()

In [24]:
w.grad, b.grad

(tensor([[-0., -0., -0.],
         [-0., -0., -0.],
         [-0., -0., -0.],
         [-0., -0., -0.],
         [-0., -0., -0.]]),
 tensor([-0., -0., -0.]))
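The all-zero gradients above follow from the argument order in the loss call. `binary_cross_entropy_with_logits` expects `(input, target)`, so calling it with `(y, z)` treats the zeros in `y` as the logits. With all-zero logits the loss is ln 2 ≈ 0.6931 whatever `z` is, and the gradient that reaches `w` and `b` is zero. Here is a minimal sketch with the logits first, using freshly created tensors with a `_demo` suffix and an arbitrary seed, so the numbers will not match the cells above:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                      # arbitrary seed, only for repeatability
x_demo = torch.ones(5)
y_demo = torch.zeros(3)
w_demo = torch.randn(5, 3, requires_grad=True)
b_demo = torch.randn(3, requires_grad=True)

z_demo = torch.matmul(x_demo, w_demo) + b_demo
loss_demo = F.binary_cross_entropy_with_logits(z_demo, y_demo)  # (input, target)
loss_demo.backward()

# With mean reduction and y = 0, d(loss)/dz = sigmoid(z) / 3.
expected_b_grad = torch.sigmoid(z_demo.detach()) / 3
print(torch.allclose(b_demo.grad, expected_b_grad))   # True
print(w_demo.grad)                                    # nonzero; every row equals b_demo.grad since x is all ones
```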

## Disabling gradient tracking

Use **with torch.no_grad():** to stop autograd from tracking the operations inside the block.

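Useful when:
- running evaluation, where only forward passes are needed
- marking some parameters as frozen
- speeding up computations, since tensors that do not track gradients are cheaper to work with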

In [27]:
z = torch.matmul(x, w) + b
z, z.requires_grad

(tensor([-0.5698,  0.8037, -1.2261], grad_fn=<AddBackward0>), True)

In [26]:
with torch.no_grad():
    z = torch.matmul(x, w) + b

z, z.requires_grad

(tensor([-0.5698,  0.8037, -1.2261]), False)
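Two related options, sketched here with made-up `_demo` tensors: `detach()` returns a tensor that shares the same values but has no gradient history, and `requires_grad_(False)` on a leaf tensor is one way to freeze a parameter.

```python
import torch

x_demo = torch.ones(5)
w_demo = torch.randn(5, 3, requires_grad=True)
b_demo = torch.randn(3, requires_grad=True)

z_demo = torch.matmul(x_demo, w_demo) + b_demo
z_det = z_demo.detach()                  # same values, cut off from the graph
print(z_det.requires_grad, z_det.grad_fn)   # False None

# Freeze b_demo: later graphs no longer track gradients for it.
b_demo.requires_grad_(False)
z_frozen = torch.matmul(x_demo, w_demo) + b_demo
z_frozen.sum().backward()

print(w_demo.grad is not None)           # True: w_demo is still trainable
print(b_demo.grad)                       # None: b_demo is frozen
```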

**autograd keeps a record of the data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects.**

- DAG leaves: input tensors
- DAG roots: output tensors

Computing gradients => tracing the graph from the roots to the leaves, applying the chain rule along the way.
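One way to look at this graph is to start at an output's `grad_fn` and follow `next_functions` back toward the leaves. The helper `walk` below is a hypothetical name written for this sketch, not part of PyTorch. The intermediate node names depend on the PyTorch version, so only the root and the leaf-side `AccumulateGrad` nodes are relied on here.

```python
import torch

x_demo = torch.ones(5)
w_demo = torch.randn(5, 3, requires_grad=True)
b_demo = torch.randn(3, requires_grad=True)
z_demo = torch.matmul(x_demo, w_demo) + b_demo

def walk(fn, depth=0):
    """Return (depth, node class name) pairs, going from the root toward the leaves."""
    if fn is None:                       # inputs that do not require grad have no node
        return []
    nodes = [(depth, type(fn).__name__)]
    for next_fn, _ in fn.next_functions:
        nodes.extend(walk(next_fn, depth + 1))
    return nodes

nodes = walk(z_demo.grad_fn)
for depth, name in nodes:
    print("  " * depth + name)           # root first; AccumulateGrad marks a leaf tensor
```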

**The graph is recreated from scratch after each .backward() call (so the DAG is dynamic).**
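Because the graph is rebuilt on every forward pass, ordinary Python control flow can change which operations are recorded from one iteration to the next. A minimal sketch with a made-up scalar parameter `w_demo`:

```python
import torch

w_demo = torch.tensor(2.0, requires_grad=True)
grads = []

for step in range(2):
    # A fresh graph is recorded on each pass, so its structure may differ.
    if step == 0:
        out = w_demo * 3                 # d(out)/dw = 3
    else:
        out = w_demo ** 2                # d(out)/dw = 2 * w = 4
    out.backward()
    grads.append(w_demo.grad.item())
    w_demo.grad.zero_()                  # clear the accumulated gradient before the next pass

print(grads)                             # [3.0, 4.0]
```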