A fully-connected ReLU network with one hidden layer, trained to predict y from x by minimizing squared Euclidean distance.

This implementation uses the nn package from PyTorch to build the network. PyTorch autograd makes it easy to define computational graphs and take gradients, but raw autograd can be a bit too low-level for defining complex neural networks; this is where the nn package can help. The nn package defines a set of Modules; you can think of each Module as a neural network layer that produces output from input and may have some trainable weights.
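
To make the idea of a Module concrete, here is a minimal sketch using a single torch.nn.Linear layer. The sizes (3 inputs, 2 outputs, batch of 4) and the variable names are illustrative assumptions, not part of the network built below.

```python
import torch

# A single Linear Module mapping 3 input features to 2 output features
# (sizes chosen only for illustration).
layer = torch.nn.Linear(3, 2)

# Calling the Module on a batch of 4 inputs produces a batch of 4 outputs.
inp = torch.randn(4, 3)
out = layer(inp)
out_shape = tuple(out.shape)

# The trainable weights are exposed through named_parameters().
param_shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}

# A Linear Module computes inp @ weight.T + bias; check that by hand.
manual = inp @ layer.weight.T + layer.bias
matches_manual = torch.allclose(out, manual, atol=1e-5)
```

The weight is stored with shape (out_features, in_features), which is why the manual computation transposes it.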

In [6]:
import torch
from collections import OrderedDict

In [4]:
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

In [7]:
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules, and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(OrderedDict([
    ('linear1', torch.nn.Linear(D_in, H)),
    ('relu1', torch.nn.ReLU()),
    ('linear2', torch.nn.Linear(H, D_out))
]))
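
Because the layers are passed in as an OrderedDict, each one can be looked up by name afterwards. The sketch below rebuilds the same model so that it runs on its own and inspects it; the helper variable names (first, param_names, n_params) are illustrative.

```python
import torch
from collections import OrderedDict

D_in, H, D_out = 1000, 100, 10
model = torch.nn.Sequential(OrderedDict([
    ('linear1', torch.nn.Linear(D_in, H)),
    ('relu1', torch.nn.ReLU()),
    ('linear2', torch.nn.Linear(H, D_out)),
]))

# Named sub-Modules are available as attributes of the Sequential container.
first = model.linear1
weight_shape = tuple(first.weight.shape)  # (H, D_in)

# Parameter names are prefixed with the name of the layer that owns them.
param_names = [name for name, _ in model.named_parameters()]

# Total number of trainable scalars: weights and biases of both Linear layers.
n_params = sum(p.numel() for p in model.parameters())
```

The ReLU layer contributes no parameters, so only the two Linear layers appear in param_names.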

In [8]:
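# The nn package also contains definitions of popular loss functions; in this
# case we will use Mean Squared Error (MSE) as our loss function. With
# reduction='sum' the squared errors are summed rather than averaged, which
# gives the squared Euclidean distance between prediction and target.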
loss_fn = torch.nn.MSELoss(reduction='sum')
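
The effect of reduction='sum' can be checked directly. This is a small sketch on random stand-in tensors (not the model's actual predictions) comparing the summed loss with a hand-computed sum of squared differences and with the default reduction='mean'.

```python
import torch

torch.manual_seed(0)

# Stand-in tensors with the same shape as y_pred and y in this notebook.
y_pred = torch.randn(64, 10)
y = torch.randn(64, 10)

sum_loss = torch.nn.MSELoss(reduction='sum')(y_pred, y)
mean_loss = torch.nn.MSELoss(reduction='mean')(y_pred, y)

# reduction='sum' is the sum of squared element-wise differences.
manual_sum = ((y_pred - y) ** 2).sum()

# reduction='mean' divides that sum by the number of elements (64 * 10 here).
rescaled_mean = mean_loss * y.numel()
```

Since the summed loss is larger than the mean by a factor of N * D_out, the gradients are larger by the same factor, so a learning rate tuned for one reduction generally needs adjusting for the other.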

In [9]:
learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model. Module objects
    # override the __call__ operator so you can call them like functions. When
    # doing so you pass a Tensor of input data to the Module and it produces
    # a Tensor of output data.
    y_pred = model(x)

    # Compute and print loss. We pass Tensors containing the predicted and true
    # values of y, and the loss function returns a Tensor containing the
    # loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Zero the gradients before running the backward pass, because backward()
    # accumulates into each parameter's .grad rather than overwriting it.
    model.zero_grad()

    # Backward pass: compute gradient of the loss with respect to all the learnable
    # parameters of the model. Internally, the parameters of each Module are stored
    # in Tensors with requires_grad=True, so this call will compute gradients for
    # all learnable parameters in the model.
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor, so
    # we can read its gradient from param.grad. The update is wrapped in
    # torch.no_grad() so that autograd does not track it.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

0 693.1672973632812
1 643.5679931640625
2 600.2606201171875
3 562.1929931640625
4 528.158935546875
5 497.18572998046875
6 468.82666015625
7 442.555419921875
8 417.99591064453125
9 395.1479187011719
10 373.65032958984375
11 353.68707275390625
12 334.8423156738281
13 317.01519775390625
14 299.9827575683594
15 283.7430725097656
16 268.2907409667969
17 253.5657196044922
18 239.5972442626953
19 226.29335021972656
20 213.65621948242188
21 201.63499450683594
22 190.1625213623047
23 179.2407989501953
24 168.88064575195312
25 159.0170135498047
26 149.64932250976562
27 140.77862548828125
28 132.38076782226562
29 124.44772338867188
30 116.95597839355469
31 109.87602233886719
32 103.19412231445312
33 96.89482879638672
34 90.96257781982422
35 85.37261199951172
36 80.1072769165039
37 75.1681137084961
38 70.526123046875
39 66.158935546875
40 62.06158447265625
41 58.218292236328125
42 54.62055206298828
43 51.246437072753906
44 48.08297348022461
45 45.124732971191406
46 42.351593017578125
47 39.7586936