# Autograd 

## PyTorch: Tensors and Autograd 

Implementing the forward and backward passes by hand is manageable for a two-layer network, but it quickly becomes error-prone for large, complex networks.

PyTorch's **automatic differentiation** package, autograd, automates the computation of backward passes in neural networks.

When using autograd, the forward pass of your network will define a **computational graph**; nodes in the graph will be Tensors, and edges will be functions that produce output Tensors from input Tensors. Backpropagating through this graph then allows you to easily compute gradients.
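
To make the graph idea concrete, here is a minimal sketch on a three-element tensor. The names `x`, `w`, `z`, and `loss` are chosen only for this illustration and are unrelated to the network defined later. It shows that results of tracked operations carry a `grad_fn` pointing at the function that produced them, and that calling `backward()` fills in `.grad` for the tracked leaf.

```python
import torch

# Two leaf tensors; only `w` is tracked by autograd.
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

# Forward pass: each operation adds a node to the graph.
z = x * w        # elementwise product
loss = z.sum()   # scalar output

# Non-leaf results remember the function that created them.
print(z.grad_fn)
print(loss.grad_fn)

# Backward pass: d(loss)/dw equals x for this particular loss.
loss.backward()
print(w.grad)  # tensor([1., 2., 3.])
print(x.grad)  # None, because x does not require gradients
```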

In [None]:
import torch

In [None]:
dtype = torch.float

# Use the first GPU if one is available; otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

In [None]:
# N is the batch size, D_in is input dimension
# H is the hidden dimension, D_out is output dimension

N, D_in, H, D_out = 64, 1000, 100, 10

In [None]:
# Create random input and output data.
# requires_grad defaults to False, so autograd does not compute gradients
# with respect to these Tensors during the backward pass.

x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

In [None]:
# Create random weights.
# Setting requires_grad=True tells autograd to compute gradients
# with respect to these Tensors during the backward pass.

w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
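
The effect of the `requires_grad` flag can be checked directly. The following is a small sketch with made-up shapes; `x_demo`, `w_demo`, and `out` are hypothetical names used only here, so they do not clash with the notebook's tensors.

```python
import torch

x_demo = torch.randn(4, 3)                      # requires_grad defaults to False
w_demo = torch.randn(3, 2, requires_grad=True)  # tracked by autograd

print(x_demo.requires_grad)  # False
print(w_demo.requires_grad)  # True
print(w_demo.grad)           # None: no backward pass has run yet

# Any result computed from a tracked tensor is tracked as well.
out = x_demo.mm(w_demo)
print(out.requires_grad)     # True

# After backward(), only the tracked leaf receives a gradient.
out.sum().backward()
print(w_demo.grad.shape)     # torch.Size([3, 2])
print(x_demo.grad)           # None
```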

In [None]:
# Set the learning rate for gradient descent.

learning_rate = 1e-6
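
Two details of the training loop are easier to see on a tiny tensor first: gradients accumulate in `.grad` across calls to `backward()`, and an in-place update of a tracked leaf has to happen inside `torch.no_grad()`. The sketch below uses a hypothetical two-element `w` and an arbitrary step size of 0.1, both chosen only for illustration.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d/dw sum(3 * w) is 3 for each element.
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one.
(3 * w).sum().backward()
second = w.grad.clone()
print(first, second)  # tensor([3., 3.]) tensor([6., 6.])

# Outside no_grad, autograd refuses an in-place update of a tracked leaf.
rejected = False
try:
    w -= 0.1 * w.grad
except RuntimeError as err:
    rejected = True
    print("in-place update rejected:", err)

# Inside no_grad the update is allowed and is not recorded in the graph.
with torch.no_grad():
    w -= 0.1 * w.grad
    w.grad.zero_()  # reset so the next backward pass starts from zero

print(w, w.grad)
```

This is why the loop zeroes `w1.grad` and `w2.grad` after every update: without that step, each iteration would apply the sum of all gradients computed so far.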

In [None]:
for t in range(500):
    
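    # Forward pass: compute predicted y using operations on Tensors.
    # We do not need to keep references to intermediate values, because
    # autograd records them and computes the backward pass for us.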
    
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # Compute and print loss:
    
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())
        
    # Backward pass: use autograd to compute the gradient of the loss with
    # respect to every Tensor that has requires_grad=True.
    # After this call, w1.grad and w2.grad hold the gradients of the loss
    # with respect to w1 and w2, respectively.
    
    loss.backward()
    
    # Manually update the weights using gradient descent.
    # Wrap the update in torch.no_grad() because the weights have
    # requires_grad=True, but we don't need autograd to track the update itself.
    
    with torch.no_grad():  
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        
        # Zero the gradients after the update; otherwise the next
        # backward() call would add to them.
        w1.grad.zero_()
        w2.grad.zero_()
        
    # An alternative way is to operate on weight.data and weight.grad.data.
    # Recall that tensor.data gives a tensor that shares the storage with
    # tensor, but doesn't track history.
    # You can also use torch.optim.SGD to achieve this.
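
As the last comment suggests, the manual update can be handed to `torch.optim.SGD`. The following is a sketch of one way to do that, not the original author's code: it reuses the same sizes and learning rate, picks the device automatically, and adds a fixed seed and a `losses` list (both assumptions made here) so the run is repeatable and easy to inspect.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed only to make the sketch repeatable
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

N, D_in, H, D_out = 64, 1000, 100, 10
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

# The optimizer holds references to the tensors it should update.
optimizer = torch.optim.SGD([w1, w2], lr=1e-6)

losses = []
for t in range(500):
    # Forward pass and loss, exactly as in the manual version.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    losses.append(loss.item())

    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()        # populate w1.grad and w2.grad
    optimizer.step()       # apply w -= lr * w.grad to each parameter

print(losses[0], losses[-1])
```

Here `optimizer.step()` replaces the `torch.no_grad()` block, and `optimizer.zero_grad()` replaces the explicit `zero_()` calls.
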

-- by HanaRo, 2020/09/08