The **autograd** package defines a *computational graph* during the *forward pass* of the network. **`nodes`** in the graph are *`Tensors`*, and **`edges`** are *`functions`* that produce output *`Tensors`* from input *`Tensors`*. Backpropagating through this graph yields the gradients.
- If `x` is a `Variable`, then `x.data` is a `Tensor` and `x.grad` is another `Variable` holding the `gradient of some scalar value (typically the loss) with respect to x`
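
To make the relationship between a value and its gradient concrete, here is a minimal sketch. It assumes a PyTorch release from 0.4 onward, where `Variable` and `Tensor` are merged and `requires_grad` is set directly on the tensor; the tensor values are made up for illustration.

```python
import torch

# Leaf tensor that autograd should track (plays the role of
# Variable(..., requires_grad=True) in the cells of this notebook).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass builds the graph: square each element, then sum to a scalar.
loss = (x ** 2).sum()

# Backward pass fills x.grad with d(loss)/dx = 2 * x, i.e. [2, 4, 6].
loss.backward()

print(x.grad)
print(loss.grad_fn)  # the function (graph edge) that produced loss
print(x.grad_fn)     # None: x is a leaf and was not produced by any function
```

The `grad_fn` attribute is one way to see the graph: tensors created by the user have none, while results of operations point back to the function that produced them.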

In [1]:
# Imports
import torch
from torch.autograd import Variable

In [2]:
# Run on GPU (requires CUDA); to run on CPU use dtype = torch.FloatTensor instead
dtype = torch.cuda.FloatTensor
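
If the notebook may run on a machine without a GPU, one possible approach is to pick the tensor type at runtime. This is only a sketch: it reuses the `dtype` name from the cell above, and `sample` is a hypothetical tensor added for illustration.

```python
import torch

# Fall back to the CPU tensor type when CUDA is not available.
if torch.cuda.is_available():
    dtype = torch.cuda.FloatTensor
else:
    dtype = torch.FloatTensor

# Any tensor converted with .type(dtype) now lives on the chosen device.
sample = torch.randn(2, 3).type(dtype)
print(sample.is_cuda)
```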

In [3]:
# N: batch size, D_in: input dimension, H: hidden dimension, D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

In [4]:
# Create random Tensors to hold input and output and wrap them in Variables
# Setting requires_grad=False -> Do not compute gradients wrt these Variables during backward pass
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

In [5]:
# Create random Tensors for weights and wrap them in Variables
# Setting requires_grad=True -> Compute gradients wrt these Variables during backward pass
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

In [6]:
# Simple Neural Network

learning_rate = 1e-6

for i in range(500):
    # Forward pass: compute predicted y using operations on Variables
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute the total squared-error loss using operations on Variables
    # loss is a Variable of shape (1,); loss.data is a Tensor of shape (1,);
    # loss.data[0] is a scalar value
    loss = (y_pred - y).pow(2).sum()
    if i % 50 == 0:
        print(i, loss.data[0])
        
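    # Use autograd to compute the backward pass:
    # gradient of loss wrt all Variables with requires_grad=True
    loss.backward()

    # Update the weights using gradient descent
    # NOTE: w<k>.data is Tensor, w<k>.grad is Variable, w<k>.grad.data is Tensor
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after the update: backward() accumulates
    # into .grad, and .grad may not exist before the first backward() call.
    # Without this step the loss can diverge to nan, as in the output recorded
    # below, which appears to come from a run with the zeroing disabled.
    w1.grad.data.zero_()
    w2.grad.data.zero_()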

0 30002020.0
50 675475.3125
100 614328.4375
150 3440276.0
200 68483.2890625
250 nan
300 nan
350 nan
400 nan
450 nan
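
The jump in the loss at step 150 and the `nan` values that follow are consistent with gradients piling up across iterations, because `backward()` adds to `.grad` instead of overwriting it. A small sketch of that behaviour, assuming PyTorch 0.4 or later (tensors with `requires_grad`) and made-up values:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: d(sum(3 * w))/dw = [3, 3].
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one.
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Zero in place, then run backward again: the gradient is [3, 3] once more.
w.grad.zero_()
(3 * w).sum().backward()
after_zero = w.grad.clone()

print(first, accumulated, after_zero)
```

In a training loop the accumulated gradient grows with every step, so the effective step size grows too, which is one way a loss that was decreasing can blow up.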


**Defining Custom Autograd Functions**
- Each primitive autograd operator is really two functions that operate on Tensors
    - **`Forward`** function: Computes output Tensors from input Tensors
    - **`Backward`** function: Receives the gradient of some `scalar value` (typically the loss) with respect to the output Tensors, and computes the gradient of that same `scalar value` with respect to the input Tensors
- Define a custom autograd operator by subclassing `torch.autograd.Function` and implementing the `forward` and `backward` functions
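
The forward/backward pairing can be sketched with a deliberately simple operator. `Square` is a hypothetical example, not part of PyTorch, and the sketch assumes a recent PyTorch release in which `forward` and `backward` are static methods that receive a context object `ctx` and the operator is invoked through `.apply`.

```python
import torch

class Square(torch.autograd.Function):
    """Hypothetical example operator: output = input ** 2."""

    @staticmethod
    def forward(ctx, input):
        # Cache the input; the backward pass needs it for the local derivative.
        ctx.save_for_backward(input)
        return input * input

    @staticmethod
    def backward(ctx, grad_output):
        # Chain rule: d(loss)/d(input) = d(loss)/d(output) * d(output)/d(input)
        (input,) = ctx.saved_tensors
        return grad_output * 2 * input

x = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
loss = Square.apply(x).sum()
loss.backward()
print(x.grad)  # 2 * x, i.e. [2, -4, 6]
```

Here `grad_output` is the gradient of the scalar loss with respect to the operator's output, and the returned value is the gradient of the same loss with respect to its input.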

In [10]:
class CustomReLU(torch.autograd.Function):
    """
    Implementing custom autograd Functions by subclassing 
    torch.autograd.Function, and implementing forward and 
    backward passes which operate on Tensors
    
    ReLU variant with a floor of 0.5: output = max(0.5, input)
    """
    def forward(self, input):
        """
        Forward pass receives a Tensor containing the input 
        and returns a Tensor containing output.
        
        NOTE: Cache arbitrary Tensors for use in the backward 
        pass using `save_for_backward` method.
        """
        self.save_for_backward(input)
        return input.clamp(min=0.5)
    
    def backward(self, grad_output):
        """
        Backward pass receives a Tensor containing the gradient 
        of the loss with respect to the output, and computes and 
        returns the gradient of the loss with respect to the input.
        """
        input, = self.saved_tensors  # saved_tensors is a tuple; unpack it
        grad_input = grad_output.clone()
        grad_input[input < 0.5] = 0  # No gradient where the output was clamped to 0.5
        return grad_input

In [8]:
# N: batch size, D_in: input dimension, H: hidden dimension, D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and output and wrap them in Variables
# Setting requires_grad=False -> Do not compute gradients wrt these Variables during backward pass
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

In [9]:
# Create random Tensors for weights and wrap them in Variables
# Setting requires_grad=True -> Compute gradients wrt these Variables during backward pass
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

In [14]:
# Simple Neural Network with custom autograd function

learning_rate = 1e-6

for i in range(500):
    # Create an instance of the CustomReLU class to use in the network
    relu = CustomReLU()

    # Forward pass: compute predicted y using operations on Variables
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute the total squared-error loss using operations on Variables
    # loss is a Variable of shape (1,); loss.data is a Tensor of shape (1,);
    # loss.data[0] is a scalar value
    loss = (y_pred - y).pow(2).sum()
    if i % 50 == 0:
        print(i, loss.data[0])
        
    # Use autograd to compute the backward pass:
    # gradient of loss wrt all Variables with requires_grad=True
    loss.backward()

    # Update the weights using gradient descent
    # NOTE: w<k>.data is Tensor, w<k>.grad is Variable, w<k>.grad.data is Tensor
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after the update: backward() accumulates
    # into .grad, and .grad may not exist before the first backward() call
    w1.grad.data.zero_()
    w2.grad.data.zero_()

0 37390400.0


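TypeError: indexing a tensor with an object of type bool. The only supported types are integers, slices, numpy scalars and torch.cuda.LongTensor or torch.cuda.ByteTensor as the only argument.

This error was recorded with `input = self.saved_tensors` in `backward`. `saved_tensors` is a tuple, so under Python 2 the comparison `input < 0` evaluates to a plain `bool` rather than a mask Tensor, and a `bool` cannot be used to index `grad_input`. Unpacking the tuple with `input, = self.saved_tensors` gives the comparison a Tensor to work on.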
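
A minimal sketch of the unpacking pattern follows. It is written in the static-method style of `torch.autograd.Function` and assumes a recent PyTorch release, so it is not the API used in the cells above. `ClampFloor` is a hypothetical name, and the sketch assumes the intended operation is `max(0.5, input)`, which is what the forward pass of `CustomReLU` computes.

```python
import torch

class ClampFloor(torch.autograd.Function):
    """Hypothetical operator: output = max(0.5, input)."""

    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0.5)

    @staticmethod
    def backward(ctx, grad_output):
        saved = ctx.saved_tensors
        # saved_tensors is a tuple even when a single Tensor was saved.
        assert isinstance(saved, tuple)
        (input,) = saved
        grad_input = grad_output.clone()
        # Clamped positions do not depend on the input, so their gradient is 0.
        grad_input[input < 0.5] = 0
        return grad_input

x = torch.tensor([-1.0, 0.2, 0.7, 3.0], requires_grad=True)
out = ClampFloor.apply(x)
out.sum().backward()
print(out)     # values [0.5, 0.5, 0.7, 3.0]
print(x.grad)  # values [0, 0, 1, 1]
```

The mask `input < 0.5` is a Tensor of the same shape as `grad_input`, which is what the indexing operation expects.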