# PyTorch - Revisited

In the previous lab you implemented backpropagation for a simple neural network using only NumPy. In this lab you will learn how PyTorch could have saved you a lot of time.


In [0]:
import torch
import numpy as np

### PyTorch Autograd
To compute the gradient of the loss function w.r.t. all the model parameters you had to manually check how these parameters were involved in computing the neural network's output. You saw how computing these gradients basically came down to applying the chain rule. Recall your sigmoid implementation from the previous lab assignment:




In [0]:
def sigmoid(X):
    return 1 / (1 + np.exp(-X))


def dsigmoid(X):
    sig = sigmoid(X)
    return sig * (1 - sig)


In PyTorch, sigmoid is already defined. Written as a custom autograd `Function`, it would look something like this:

In [0]:
from torch.autograd import Function

class SigmoidFunction(Function):

    @staticmethod
    def forward(ctx, x):
        sigmoid = 1 / (1 + torch.exp(-x))
        ctx.save_for_backward(sigmoid)
        return sigmoid

    @staticmethod
    def backward(ctx, grad_output):
        sigmoid, = ctx.saved_tensors
        return sigmoid * (1 - sigmoid) * grad_output


It contains a function that defines what happens during the forward pass, and a function that defines how to compute the gradient during the backward pass. It also stores relevant information computed during the forward pass so that it can be reused during the backward pass. In the NumPy example, `sigmoid(X)` had to be recomputed when computing the gradient. Here it is saved on the context object `ctx`.
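As a rough check of the forward/backward pairing described above, the sketch below repeats the `SigmoidFunction` class so that it runs on its own, calls it through `.apply`, and compares the result of `backward` with the analytic derivative and with `torch.autograd.gradcheck`. The input size and the use of double precision are choices made for this illustration only.

```python
import torch
from torch.autograd import Function


class SigmoidFunction(Function):
    """Same custom sigmoid as above, repeated so this cell runs on its own."""

    @staticmethod
    def forward(ctx, x):
        sigmoid = 1 / (1 + torch.exp(-x))
        ctx.save_for_backward(sigmoid)  # cache the output for the backward pass
        return sigmoid

    @staticmethod
    def backward(ctx, grad_output):
        sigmoid, = ctx.saved_tensors
        return sigmoid * (1 - sigmoid) * grad_output


# Custom Functions are called through .apply, not by instantiating the class
x = torch.randn(4, dtype=torch.double, requires_grad=True)
y = SigmoidFunction.apply(x)
y.sum().backward()

# The gradient from backward() should match sigmoid'(x) = s * (1 - s)
s = torch.sigmoid(x.detach())
print(torch.allclose(x.grad, s * (1 - s)))

# gradcheck compares backward() against finite differences (double precision)
x_check = torch.randn(4, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(SigmoidFunction.apply, (x_check,)))
```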

Many commonly used functions have already been defined in PyTorch, and the code you write is usually just a composition of these functions. PyTorch therefore already knows how to compute the gradients for the models you build. You will rarely have to define the gradient of a function yourself.
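To illustrate, here is a minimal sketch in which a small composition of built-in operations is differentiated by autograd and compared with the chain rule worked out by hand. The function and the input values are arbitrary and chosen only for this example.

```python
import torch

# A small composition of built-in operations: f(w) = sum(sigmoid(w * x)^2)
x = torch.tensor([0.5, -1.0, 2.0])
w = torch.tensor([1.0, 2.0, -0.5], requires_grad=True)

out = torch.sigmoid(w * x)
f = (out ** 2).sum()
f.backward()  # autograd applies the chain rule through **2, sigmoid and *

# Chain rule by hand: df/dw = 2 * s * s * (1 - s) * x, with s = sigmoid(w * x)
with torch.no_grad():
    s = torch.sigmoid(w * x)
    manual_grad = 2 * s * s * (1 - s) * x

print(torch.allclose(w.grad, manual_grad))
```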


Let's look at an example. Suppose that, for simplicity, we take our model to be a single neuron of fixed size:

In [0]:
class Neuron:

  def __init__(self):
    self.weights = torch.randn(5)
    self.bias = torch.randn(1)
  
  def forward(self, x):
    return torch.sigmoid(torch.sum(self.weights * x) + self.bias)


neuron = Neuron()

Now, given some labeled data, PyTorch can compute the gradient of a loss function w.r.t. the model parameters without you having to define explicitly how this needs to be done.

In [51]:
# Define a loss function
mse = torch.nn.MSELoss()

# Create a random (data, label) pair
data, label = torch.randn(5), torch.randn(1)

# Compute the loss
loss = mse(neuron.forward(data), label)

# Perform a backward pass to compute the gradients
loss.backward()

# Print the gradients corresponding to the model weights
neuron.weights.grad


tensor([-0.0117,  0.0813, -0.0145,  0.1313,  0.0212])

With the model as defined above, this cell raises an error! (The output shown comes from re-running the cell with the corrected model below.) This is because PyTorch does not know which variables are model parameters for which the gradient should be computed. This has to be indicated explicitly, otherwise PyTorch would have to keep track of too many redundant computations. To do this we have to change our model to:

In [0]:
class Neuron:

  def __init__(self):
    self.weights = torch.randn(5, requires_grad=True)  # When creating the model weights, explicitly mention they need a gradient
    self.bias = torch.randn(1, requires_grad=True)
  
  def forward(self, x):
    return torch.sigmoid(torch.sum(self.weights * x) + self.bias)
  
  def parameters(self):
    return [self.weights, self.bias]


neuron = Neuron()

Now, with the new model definition, try running the same cell again. The gradients will now be computed.
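The effect of `requires_grad` can also be seen in isolation. The sketch below uses two stand-alone tensors (hypothetical names `w_plain` and `w_tracked`, not part of the lab code) and a deliberately simple loss, so that the expected gradient is easy to read off.

```python
import torch

x = torch.randn(5)

# Without requires_grad, no graph is recorded and backward() cannot run
w_plain = torch.randn(5)
loss_plain = torch.sum(w_plain * x)
try:
    loss_plain.backward()
    raised = False
except RuntimeError as err:
    raised = True
    print("backward() failed:", err)

# With requires_grad=True the operations are recorded and .grad is filled in
w_tracked = torch.randn(5, requires_grad=True)
loss_tracked = torch.sum(w_tracked * x)
loss_tracked.backward()
print(w_tracked.grad)  # d(sum(w * x))/dw = x
```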

### PyTorch Optimizer
The computed gradients can be used to optimize the model parameters:

In [0]:
optimizer = torch.optim.SGD(neuron.parameters(), lr=0.1)

print(neuron.weights)
optimizer.step()
print(neuron.weights)
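A single `optimizer.step()` applies one update. In practice the forward pass, backward pass, and step are repeated in a loop. The following is a minimal sketch of such a loop for one (data, label) pair. It uses stand-alone `weights` and `bias` tensors instead of the `Neuron` class so that it runs on its own, and it assumes a label drawn from [0, 1) so that the sigmoid output can reach it. The seed, learning rate, and number of steps are arbitrary.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable

# Stand-alone copies of the neuron's parameters and one (data, label) pair
weights = torch.randn(5, requires_grad=True)
bias = torch.randn(1, requires_grad=True)
data, label = torch.randn(5), torch.rand(1)

optimizer = torch.optim.SGD([weights, bias], lr=0.1)
mse = torch.nn.MSELoss()

losses = []
for step in range(100):
    optimizer.zero_grad()   # gradients accumulate, so clear them each step
    prediction = torch.sigmoid(torch.sum(weights * data) + bias)
    loss = mse(prediction, label)
    loss.backward()         # fill .grad for weights and bias
    optimizer.step()        # w <- w - lr * w.grad
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Calling `optimizer.zero_grad()` at the start of each iteration matters because `backward()` adds to any gradient already stored in `.grad`.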


### PyTorch Module

To make things even easier, PyTorch contains a `Module` class that keeps track of which attributes are model parameters. This becomes useful when you have to build larger models. Calling the `.parameters()` method returns all model parameters. Modules that are attributes of another `Module` are automatically registered as part of the model. The following two implementations behave the same as the one above; note that `Linear` uses its own default weight initialization rather than `torch.randn`.

In [0]:
from torch.nn import Module, Parameter

class Neuron(Module):

  def __init__(self):
    super(Neuron, self).__init__()
    self.weights = Parameter(torch.randn(5))  # Parameters are automatically assumed to require a gradient
    self.bias = Parameter(torch.randn(1))
  
  def forward(self, x):
    return torch.sigmoid(torch.sum(self.weights * x) + self.bias)


neuron = Neuron()


In [0]:
from torch.nn import Module, Linear

class Neuron(Module):

  def __init__(self):
    super(Neuron, self).__init__()
    # Linear is a Module for dense layers. Its weights are automatically
    # registered as parameters and assumed to require a gradient.
    self.linear = Linear(5, 1, bias=True)
  
  def forward(self, x):
    return torch.sigmoid(self.linear(x))


neuron = Neuron()

In [43]:
torch.nn.ParameterList(neuron.parameters())

ParameterList(
    (0): Parameter containing: [torch.FloatTensor of size 5]
    (1): Parameter containing: [torch.FloatTensor of size 1]
)
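The automatic registration of nested modules mentioned above can be sketched as follows. `TwoNeurons` is a hypothetical container introduced only for this illustration; it holds two copies of the `Linear`-based `Neuron`, which is repeated here so that the cell runs on its own.

```python
import torch
from torch.nn import Module, Linear


class Neuron(Module):
    def __init__(self):
        super().__init__()
        self.linear = Linear(5, 1, bias=True)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))


class TwoNeurons(Module):  # hypothetical container, for illustration only
    def __init__(self):
        super().__init__()
        # Modules stored as attributes are registered as submodules
        self.first = Neuron()
        self.second = Neuron()

    def forward(self, x):
        return self.first(x) + self.second(x)


model = TwoNeurons()

# named_parameters() walks the submodules recursively
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

# The collected parameters can be handed straight to an optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```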