# PyTorch : Autograd

Autograd is PyTorch's reverse-mode automatic differentiation system. Conceptually, as you execute operations, autograd records a graph of all the operations that created the data, giving you a directed acyclic graph whose leaves are the input tensors and whose roots are the output tensors. Tracing this graph from the roots back to the leaves lets it compute gradients with the chain rule.
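To make the graph description concrete, here is a minimal sketch that walks the recorded graph from a root back toward its leaves. The helper `walk` is a hypothetical name introduced only for this illustration; it relies on the `grad_fn` and `next_functions` attributes that autograd attaches to tracked tensors.

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf (input tensor)
out = (x * x).sum()                               # root (output tensor)

def walk(node, depth=0):
    """Collect node type names, walking from the root toward the leaves."""
    if node is None:
        return []
    names = ["  " * depth + type(node).__name__]
    for parent, _ in node.next_functions:
        names.extend(walk(parent, depth + 1))
    return names

names = walk(out.grad_fn)
print("\n".join(names))
print(x.is_leaf, out.is_leaf)
```

The walk starts at the node that produced `out` and ends at the nodes that accumulate gradients into the leaf `x`.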

In [1]:
# Importing library
import torch

Operations on tensors with `requires_grad=True` cause PyTorch to build a computational graph.
<br>`Computational graph`: a data structure that records the operations applied to tensors so that gradients can be computed during back-propagation.
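As a small illustration of the point above, the sketch below uses scalar tensors (the names `a`, `k`, `b`, `c` are chosen only for this example) to show that results derived from a tensor with `requires_grad=True` are tracked, while results derived only from plain tensors are not.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)  # tracked leaf tensor
k = torch.tensor(3.0)                      # plain tensor, not tracked

b = a * k      # recorded in the graph because `a` requires grad
c = b.pow(2)   # recorded as well; c = (3a)^2

print(a.is_leaf, a.grad_fn)        # leaf tensors have no grad_fn
print(c.requires_grad, c.grad_fn)  # derived tensors carry a grad_fn
print((k * 2).requires_grad)       # nothing tracked without requires_grad

c.backward()   # back-propagate through the recorded graph
print(a.grad)  # dc/da = 18 * a = 36 at a = 2
```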

In [2]:
N, D_in, H, D_out = 64, 1000, 100, 10

# creating tensors that are used in the network
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Creating tensors with requires_grad = True enables Autograd
w1 = torch.randn(D_in, H, requires_grad = True)
w2 = torch.randn(H, D_out, requires_grad = True)

learning_rate = 1e-6

The cell above defines the tensors used in the network.<br>
The batch contains 64 input samples, each with 1000 features (`D_in`)<br>
Neurons in the hidden layer = 100 (`H`)<br>
Output values in the final layer = 10 (`D_out`)<br>
<br>
We will run 1000 epochs, each consisting of one forward and one backward propagation, updating the weights at every epoch so that the network improves its performance by learning.
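The dimensions described above can be checked by running a single forward pass step by step. This is a sketch only: the intermediate names `h` and `h_relu` are introduced here for readability and do not appear in the notebook.

```python
import torch

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

h = x.mm(w1)             # (64, 1000) @ (1000, 100) -> (64, 100)
h_relu = h.clamp(min=0)  # ReLU: negative activations are set to 0
y_pred = h_relu.mm(w2)   # (64, 100) @ (100, 10) -> (64, 10)

print(h.shape, h_relu.shape, y_pred.shape)
```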

In [3]:
for epoch in range(1000):
    # Forward propagation: linear layer, ReLU (clamp at 0), linear layer
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    # Sum of squared errors between predictions and targets
    loss = (y_pred - y).pow(2).sum()
    if epoch == 0:
        print(loss)

    # Backward propagation
    # Computes gradients of the loss with respect to all tensors that have requires_grad=True
    loss.backward()

    # After the step above, the gradients are stored in w1.grad and w2.grad
    # no_grad() keeps the weight updates themselves out of the graph
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Reset w1.grad and w2.grad to 0 for the next epoch,
        # because backward() accumulates into .grad
        w1.grad.zero_()
        w2.grad.zero_()

tensor(37507552., grad_fn=<SumBackward0>)
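The reason the loop zeroes `w1.grad` and `w2.grad` is that `backward()` adds to any gradient already stored in `.grad` rather than overwriting it. A minimal sketch with a single scalar weight (the names `w`, `first`, `second`, `third` are illustrative only):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(w * 2).backward()
first = w.grad.clone()   # gradient of 2*w is 2

(w * 2).backward()
second = w.grad.clone()  # accumulated: 2 + 2 = 4

w.grad.zero_()           # reset, as done at the end of each epoch
(w * 2).backward()
third = w.grad.clone()   # back to 2

print(first, second, third)
```

Without the reset, each epoch's update would use the sum of all gradients computed so far rather than the current one.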


In [4]:
loss

tensor(5.3382e-06, grad_fn=<SumBackward0>)

Comparing the two values, the loss started at roughly 3.8e+7 and was reduced to roughly 5.3e-6 by the end. This means the neural network was able to update its weights and improve its performance over the epochs it ran.
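To see the trend rather than only the first and last values, the loss can be recorded at every epoch. The sketch below repeats the same training loop with two assumptions that are not part of the notebook: a fixed seed (for repeatability only) and a shorter run of 300 epochs. The `history` list is likewise introduced just for this illustration, and the exact numbers will differ from the outputs shown above.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only to make runs repeatable
N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)
learning_rate = 1e-6

history = []  # one Python float per epoch
for epoch in range(300):  # assumption: shorter run than the 1000 epochs above
    loss = (x.mm(w1).clamp(min=0).mm(w2) - y).pow(2).sum()
    history.append(loss.item())  # .item() detaches the value from the graph
    loss.backward()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

for epoch in (0, 100, 200, 299):
    print(epoch, history[epoch])
```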