In [3]:
import torch
import math

In the previous examples, we had to manually implement both the forward and backward\
passes of our neural network. Manually implementing the backward pass is not a big\
deal for a small two-layer network, but can quickly get very hairy for large\
complex networks.

Thankfully, we can use **automatic differentiation** to automate the computation of\
backward passes in neural networks. The autograd package in PyTorch provides exactly\
this functionality. When using **autograd**:

1- The forward pass of your network will define a computational graph;\
nodes in the graph will be Tensors, and edges will be functions that\
produce output Tensors from input Tensors.

2- Backpropagating through this graph then allows you to easily compute gradients.\
Each Tensor represents a node in a computational graph. If x is a Tensor that\
has x.requires_grad=True then x.grad is another Tensor holding the gradient of\
some scalar value with respect to x.
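The two points above can be illustrated with a minimal sketch. The tensor values and the names `y` and `s` are chosen only for illustration; they are not part of the sine-fitting example.

```python
import torch

# A leaf Tensor that autograd should track (illustrative values).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# The forward pass builds the graph: each result records the function
# that produced it in its grad_fn attribute.
y = x ** 2
s = y.sum()  # scalar value: s = x_1^2 + x_2^2 + x_3^2

print(y.grad_fn is not None)  # y was produced by a tracked operation
print(x.grad)                 # None, because no backward pass has run yet

# Backpropagate from the scalar s; afterwards x.grad holds ds/dx = 2 * x.
s.backward()
print(x.grad)
```

Here `x` is a leaf of the graph, so it has no `grad_fn` of its own, while `y` and `s` do.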


Here we use PyTorch Tensors and autograd to implement our example of fitting a sine wave\
with a third-order polynomial. We no longer need to manually implement\
the backward pass through the network:

In [4]:
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.set_default_device(device)

In [5]:
# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, dtype=torch.float)
y = torch.sin(x)

In [6]:
# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), dtype=torch.float, requires_grad=True)
b = torch.randn((), dtype=torch.float, requires_grad=True)
c = torch.randn((), dtype=torch.float, requires_grad=True)
d = torch.randn((), dtype=torch.float, requires_grad=True)

To use **autograd** we call the `backward()` method on the scalar result of a computation (here, the loss) that is based on\
tensors that have `requires_grad` set to `True`. After calling the method,\
the gradient for each of those tensors can be read from `tensor.grad`.

Gradients accumulate across `backward()` calls, so after each `backward()` step we have to reset the `tensor.grad` values to `None`.
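A small sketch of why the reset matters: without it, each `backward()` call adds its gradient to whatever is already stored in `.grad`. The tensor `w` and its value are hypothetical and used only for this demonstration.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 6.
(w ** 2).backward()
first = w.grad.clone()

# Second backward pass without a reset: the new gradient is added to the old one.
(w ** 2).backward()
accumulated = w.grad.clone()

# Reset, then run again: only the fresh gradient remains.
w.grad = None
(w ** 2).backward()
fresh = w.grad.clone()

print(first.item(), accumulated.item(), fresh.item())  # 6.0 12.0 6.0
```

In a training loop, a stale accumulated gradient would make every update step use the sum of all past gradients rather than the current one.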

In [None]:

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a 0-dimensional Tensor of shape ()
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad, c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually reset the gradients (to None) after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None


In [None]:
print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
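As a sanity check, the gradients autograd produces can be compared with the ones obtained by applying the chain rule to this loss by hand. The sketch below is an illustration under stated assumptions rather than part of the original notebook: it uses double precision (instead of `torch.float`) so the comparison is tight, a fixed seed for reproducibility, and the helper names `grad_y_pred`, `manual` and `matches` are made up for this example.

```python
import math
import torch

# Assumption: float64 is used here only to make the comparison numerically tight.
dtype = torch.float64
x = torch.linspace(-math.pi, math.pi, 2000, dtype=dtype)
y = torch.sin(x)

torch.manual_seed(0)
a = torch.randn((), dtype=dtype, requires_grad=True)
b = torch.randn((), dtype=dtype, requires_grad=True)
c = torch.randn((), dtype=dtype, requires_grad=True)
d = torch.randn((), dtype=dtype, requires_grad=True)

# Same forward pass and loss as in the training loop.
y_pred = a + b * x + c * x ** 2 + d * x ** 3
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Hand-derived gradients of the squared-error loss via the chain rule.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual = {
        "a": grad_y_pred.sum(),
        "b": (grad_y_pred * x).sum(),
        "c": (grad_y_pred * x ** 2).sum(),
        "d": (grad_y_pred * x ** 3).sum(),
    }

# Compare each autograd gradient with its hand-derived counterpart.
params = {"a": a, "b": b, "c": c, "d": d}
matches = {name: torch.allclose(p.grad, manual[name]) for name, p in params.items()}
print(matches)
```

If every entry in `matches` is `True`, autograd reproduces the manual backward pass for one step without any hand-written gradient code.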