The most widely used algorithm for training neural networks is backpropagation.

Parameters (model weights) are adjusted based on the gradient of the loss function with respect to each parameter.

PyTorch provides a built-in automatic differentiation engine, torch.autograd, to calculate these gradients.
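Here is a minimal sketch of what "adjusting a parameter based on its gradient" means: one step of plain gradient descent, w_new = w - lr * dL/dw, on a single scalar weight. The toy loss L = (w*x - y)^2, the values, and the learning rate lr are illustrative assumptions. Real training loops usually hand the update to an optimizer from torch.optim.

```python
import torch

# Toy setup (illustrative values): one weight, one input, one target
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
y = torch.tensor(4.0)
lr = 0.01  # learning rate (assumed)

loss = (w * x - y) ** 2   # squared error: (2*3 - 4)^2 = 4
loss.backward()           # autograd computes dL/dw = 2 * x * (w*x - y) = 12
print(w.grad)             # tensor(12.)

# Update the weight in place; no_grad() keeps the update itself out of the graph
with torch.no_grad():
    w -= lr * w.grad      # 2.0 - 0.01 * 12 = 1.88
w.grad.zero_()            # clear the gradient before the next step

print(w)
```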

In [1]:
import torch

In [2]:
# example: a simple one-layer neural network
x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # we need the gradient of the loss w.r.t. w, so we set requires_grad=True
b = torch.randn(3, requires_grad=True)  # same for the bias
z = torch.matmul(x, w) + b  # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

In [3]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

Gradient function for z = <AddBackward0 object at 0x00000213B85EC6A0>
Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x00000213B85C5C10>


In [4]:
# compute the gradients of the loss w.r.t. w and b; they are stored in w.grad and b.grad
loss.backward()
print(w.grad)
print(b.grad)

tensor([[0.3206, 0.0100, 0.0636],
        [0.3206, 0.0100, 0.0636],
        [0.3206, 0.0100, 0.0636],
        [0.3206, 0.0100, 0.0636],
        [0.3206, 0.0100, 0.0636]])
tensor([0.3206, 0.0100, 0.0636])
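These gradients can be checked by hand. With the default reduction="mean", the loss is the average over the 3 outputs of binary cross-entropy with logits, whose derivative with respect to each logit is sigmoid(z) - y. Since z = x @ w + b, the gradient for b equals dL/dz, and the gradient for w is the outer product of x and dL/dz. Because x is all ones, every row of w.grad equals b.grad, which matches the identical rows printed above. The seed below is an arbitrary choice; the check holds for any random w and b.

```python
import torch

torch.manual_seed(0)  # arbitrary seed; the check does not depend on the values
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

# Manual gradients (mean reduction divides by the number of outputs)
with torch.no_grad():
    dloss_dz = (torch.sigmoid(z) - y) / z.numel()
    dloss_db = dloss_dz                  # dz/db is the identity
    dloss_dw = torch.outer(x, dloss_dz)  # dz_j/dw_ij = x_i

print(torch.allclose(w.grad, dloss_dw))
print(torch.allclose(b.grad, dloss_db))
```

Both checks should print True.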


In [5]:
# every tensor with requires_grad=True tracks its computational history and supports gradient computation
# if we don't need that (e.g. the model is already trained and we only want to run forward passes), we can stop tracking with a torch.no_grad() block
z = torch.matmul(x, w)+b
print(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print(z.requires_grad)

True
False


In [6]:
# the detach() method also works: it returns a tensor that shares the same data but is cut off from the graph
z = torch.matmul(x, w)+b
z_det = z.detach()
print(z_det.requires_grad)

False
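Because a detached tensor is treated as a constant, detach() can also block gradient flow through one branch of a computation. In the sketch below (values are illustrative), c depends on w through two paths; without detach() the gradient would be 3*2 + 2 = 8, but only the non-detached path contributes. It also shows that detach() does not copy the data.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
a = 2 * w                      # da/dw = 2

# The detached path is treated as a constant during backward
c = 3 * a.detach() + a
c.backward()
print(w.grad)                  # tensor(2.): only the non-detached path contributes

# detach() does not copy: the result shares storage with the original tensor
print(a.detach().data_ptr() == a.data_ptr())   # True
```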


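Disabling gradient tracking (with torch.no_grad() or detach()) allows you to:
- freeze parameters in your network that you don't want to change during further training
- speed up computations when you only need the forward pass, since operations on untracked tensors are not recorded for backward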
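A minimal sketch of the first point, assuming a hypothetical two-layer nn.Sequential model in which only the last layer should keep training. Here the first layer is frozen by setting requires_grad_(False) on its parameters, which is one common way to do it; the layer sizes, data, and learning rate are illustrative.

```python
import torch
from torch import nn

# Hypothetical two-layer model; only the last Linear layer should keep training
model = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 3))

# Freeze the first layer: its weight and bias stop tracking gradients
for p in model[0].parameters():
    p.requires_grad_(False)

# Give the optimizer only the parameters that still require gradients
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)

x = torch.ones(2, 5)
y = torch.zeros(2, 3)
loss = nn.functional.binary_cross_entropy_with_logits(model(x), y)
loss.backward()

print(model[0].weight.grad)          # None: frozen, no gradient computed
print(model[2].weight.grad.shape)    # torch.Size([3, 4])
optimizer.step()                     # updates only the last layer
```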

Conceptually, autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors, roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.
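The graph can be inspected directly: each non-leaf tensor's .grad_fn exposes .next_functions, a tuple of (node, index) pairs pointing one step closer to the leaves. The sketch below rebuilds the example network and walks that graph depth-first from the loss (the root). Leaf tensors that require grad (w and b) show up as AccumulateGrad nodes, which write into .grad. The helper walk() is hypothetical, and the intermediate node names produced by torch.matmul can differ between PyTorch versions, so no exact output is listed.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

def walk(fn, depth=0):
    """Print the backward graph from a root grad_fn towards the leaves.
    Returns the node class names in the order they were visited."""
    if fn is None:  # branches leading to tensors that don't require grad (e.g. x, y)
        return []
    print("  " * depth + type(fn).__name__)
    names = [type(fn).__name__]
    for next_fn, _ in fn.next_functions:
        names += walk(next_fn, depth + 1)
    return names

names = walk(loss.grad_fn)
```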

In a forward pass, autograd does two things simultaneously:

- run the requested operation to compute a resulting tensor

- maintain the operation’s gradient function in the DAG.
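A small sketch of the second point: the result of a tracked operation gets its .grad_fn at the moment it is computed, while tensors created directly by the user are leaves with grad_fn set to None. The tensor names here are illustrative.

```python
import torch

w = torch.randn(5, 3, requires_grad=True)   # leaf created by the user
x = torch.ones(5)

h = torch.matmul(x, w)          # the forward value is computed immediately...
out = h.sum()                   # ...and each op records its backward node

print(w.is_leaf, w.grad_fn)     # True None  (leaves have no grad_fn)
print(h.is_leaf, out.is_leaf)   # False False
print(type(out.grad_fn).__name__)  # SumBackward0
```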

The backward pass kicks off when .backward() is called on the DAG root. autograd then:

- computes the gradients from each .grad_fn,

- accumulates them in the respective tensor’s .grad attribute, and

- using the chain rule, propagates all the way to the leaf tensors.
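The chain rule and accumulation steps can be checked on a scalar example (the function sin(w^2) and the value 1.5 are arbitrary choices). Autograd's result matches the hand-applied chain rule, and a second backward pass adds to .grad instead of overwriting it, which is why training loops reset gradients between steps.

```python
import torch

w = torch.tensor(1.5, requires_grad=True)

# Chain rule: f(w) = sin(w^2)  =>  df/dw = cos(w^2) * 2w
f = torch.sin(w ** 2)
f.backward()
expected = torch.cos(w.detach() ** 2) * 2 * w.detach()
chain_ok = torch.allclose(w.grad, expected)
print(chain_ok)   # True

# A second backward pass adds to w.grad rather than replacing it
g = torch.sin(w ** 2)          # rebuild the graph (backward freed the first one)
g.backward()
accum_ok = torch.allclose(w.grad, 2 * expected)
print(accum_ok)   # True

# Reset before the next training step
w.grad = None     # or w.grad.zero_()
```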