# Autograd

## Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors. The **forward** function computes output Tensors from input Tensors. The **backward** function receives the gradient of the output Tensors with respect to some scalar value, and computes the gradient of the input Tensors with respect to that same scalar value.
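To make the two-function idea concrete, here is a toy sketch in plain Python (no PyTorch). Each operator is a forward/backward pair, and a dict stands in for the context object. `Square`, `Scale`, and the dict-based `ctx` are hypothetical illustrations, not PyTorch classes.

```python
# Toy, framework-free illustration: each "operator" is a forward/backward pair.
# Square and Scale are hypothetical names, not part of PyTorch.

class Square:
    """y = x**2, so dy/dx = 2*x."""

    @staticmethod
    def forward(ctx, x):
        ctx["x"] = x  # stash the input, because backward needs it
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        # Chain rule: dL/dx = dL/dy * dy/dx
        return grad_output * 2 * ctx["x"]


class Scale:
    """y = 3*x, so dy/dx = 3 (nothing needs to be stashed)."""

    @staticmethod
    def forward(ctx, x):
        return 3 * x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 3


# Forward pass: L = Scale(Square(x)) = 3 * x**2
x = 2.0
ctx_sq, ctx_sc = {}, {}
h = Square.forward(ctx_sq, x)
loss = Scale.forward(ctx_sc, h)

# Backward pass: start from dL/dL = 1 and visit the operators in reverse order
grad_h = Scale.backward(ctx_sc, 1.0)
grad_x = Square.backward(ctx_sq, grad_h)

print(loss, grad_x)  # 12.0 12.0  (dL/dx = 6*x = 12 at x = 2)
```

Each backward only needs its own local derivative and the incoming gradient. Chaining them in reverse reproduces the full derivative.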

In PyTorch we can easily define our own autograd operator by subclassing `torch.autograd.Function` and implementing the `forward` and `backward` static methods. We then use the new operator by calling its `apply` method like a function, passing Tensors containing input data.
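The pattern is sketched below, assuming a recent PyTorch release. The hypothetical `Cube` operator computes `x**3` in `forward` and saves its input. In `backward` it applies the chain rule, and it is invoked through `Cube.apply`.

```python
import torch


class Cube(torch.autograd.Function):
    """Hypothetical example operator: y = x**3, so dy/dx = 3 * x**2."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep the input for the backward pass
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return grad_output * 3 * x ** 2  # chain rule


cube = Cube.apply  # call through .apply rather than instantiating Cube

x = torch.tensor([1.0, 2.0, -1.5], requires_grad=True)
y = cube(x).sum()
y.backward()
print(x.grad)  # expected values 3 * x**2: 3.0, 12.0, 6.75
```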

```python
import torch
```

```python
# Define a ReLU function

class MyReLU(torch.autograd.Function):
    """
    We can implement our own custom autograd Functions by subclassing
    torch.autograd.Function and implementing the forward and backward passes
    which operate on Tensors.
    """

    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for the backward computation: Tensors should be
        saved with ctx.save_for_backward, while other (non-Tensor) objects can
        be stored directly as attributes on ctx.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the loss
        with respect to the output, and we need to compute the gradient of the loss
        with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0  # ReLU passes no gradient where the input was negative
        return grad_input
```
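One way to sanity-check a hand-written `backward` is `torch.autograd.gradcheck`, which compares it against finite differences. The sketch below repeats the `MyReLU` definition so it runs on its own. The input values are arbitrary and kept away from 0, where ReLU is not differentiable.

```python
import torch


class MyReLU(torch.autograd.Function):
    # Same forward/backward as above, repeated so this block is self-contained.
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input


# gradcheck should be run in double precision; inputs avoid ReLU's kink at 0.
inp = torch.tensor([-2.0, -0.5, 0.3, 1.5], dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(MyReLU.apply, (inp,), eps=1e-6, atol=1e-4)
print(ok)  # True (by default gradcheck raises an error if the gradients disagree)

# The custom gradient should also match PyTorch's built-in ReLU.
a = inp.detach().clone().requires_grad_()
b = inp.detach().clone().requires_grad_()
MyReLU.apply(a).sum().backward()
torch.relu(b).sum().backward()
print(torch.equal(a.grad, b.grad))  # True
```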

```python
dtype = torch.float

# Run on the first GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```

```python
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.

N, D_in, H, D_out = 64, 1000, 100, 10
```
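With these sizes, the two matrix multiplications in the forward pass chain together as shown below. This is a shape-only sketch on the CPU with random data, not part of the training code.

```python
import torch

# Same sizes as above
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in)
w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

h = x.mm(w1)        # (N, D_in) @ (D_in, H)  -> (N, H)
y_pred = h.mm(w2)   # (N, H)    @ (H, D_out) -> (N, D_out)
print(tuple(h.shape), tuple(y_pred.shape))  # (64, 100) (64, 10)

# Trainable parameters: every entry of w1 and w2 (this model has no biases).
num_params = w1.numel() + w2.numel()
print(num_params)  # 1000*100 + 100*10 = 101000
```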

```python
# Create random input and output data.
# requires_grad defaults to False, which tells autograd that we do not need
# gradients with respect to these Tensors during the backward pass.

x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)
```

```python
# Create random weights.
# Setting requires_grad=True tells autograd to compute gradients with respect
# to these Tensors during the backward pass.

w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)
```
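The effect of `requires_grad` shows up in a small example (toy sizes, CPU, arbitrary random values). After `backward()`, only the Tensor created with `requires_grad=True` has a populated `.grad`.

```python
import torch

x = torch.randn(4, 3)                      # requires_grad defaults to False
w = torch.randn(3, 2, requires_grad=True)  # leaf Tensor we want gradients for
print(x.requires_grad, w.requires_grad)    # False True

loss = x.mm(w).pow(2).sum()
loss.backward()

# Gradients are accumulated only into leaf Tensors with requires_grad=True.
print(x.grad is None)   # True
print(w.grad.shape)     # torch.Size([3, 2])
```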

```python
# Initialize the learning rate

learning_rate = 1e-6
```

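```python
for t in range(500):
    # Use the Function.apply method to alias MyReLU as "relu"
    relu = MyReLU.apply

    # Forward pass: compute the predicted y using our custom ReLU
    y_pred = relu(x.mm(w1)).mm(w2)

    # Calculate the loss (sum of squared errors) and print it every 100 steps
    loss = (y - y_pred).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass; this fills w1.grad and w2.grad
    loss.backward()

    # Update the weights with gradient descent. The update runs inside
    # torch.no_grad() because it should not be tracked by autograd.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # Manually zero the gradients after updating the weights
        w1.grad.zero_()
        w2.grad.zero_()
```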

```
99 390.6264343261719
199 1.2671489715576172
299 0.007072353269904852
399 0.00017808620759751648
499 3.3400647225789726e-05
```
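Two details of the update step in the loop above can be checked in isolation with toy tensors (the values are arbitrary). First, the in-place weight update must run under `torch.no_grad()`. Second, `.backward()` accumulates into `.grad` rather than overwriting it, which is why the gradients are zeroed after each step.

```python
import torch

# Part 1: why the weight update is wrapped in torch.no_grad().
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
try:
    w -= 0.1  # in-place op on a leaf that requires grad, while autograd is recording
    update_rejected = False
except RuntimeError:
    update_rejected = True
print(update_rejected)  # True

with torch.no_grad():
    w -= 0.1  # allowed: operations inside no_grad() are not recorded

# Part 2: why the gradients are zeroed after each step.
v = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(v ** 2).sum().backward()
first = v.grad.clone()   # d/dv sum(v**2) = 2*v = [2., 4., 6.]
(v ** 2).sum().backward()
second = v.grad.clone()  # accumulated, not replaced: [4., 8., 12.]
v.grad.zero_()           # reset, as the training loop does
print(first, second, v.grad)
```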


-- by HanaRo, 2020/09/09