
In [0]:
%matplotlib inline


PyTorch: optim
--------------

A fully-connected ReLU network with one hidden layer, trained to predict y from x
by minimizing squared Euclidean distance.

This implementation uses the nn package from PyTorch to build the network.
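
As a small illustration of the squared Euclidean distance mentioned above, the sketch below compares `torch.nn.MSELoss(reduction='sum')` with the sum of squared differences written out by hand. The tensor shapes and the seed are arbitrary choices made for this example only.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the example is repeatable

# Small stand-ins for predictions and targets (shapes chosen for illustration).
y_pred = torch.randn(4, 3)
y = torch.randn(4, 3)

# reduction='sum' adds up the squared differences over every element.
loss_sum = torch.nn.MSELoss(reduction='sum')(y_pred, y)

# The same quantity written out by hand: squared Euclidean distance.
manual = ((y_pred - y) ** 2).sum()

# reduction='mean' (the default) divides that sum by the number of elements.
loss_mean = torch.nn.MSELoss(reduction='mean')(y_pred, y)

print(torch.allclose(loss_sum, manual))
print(torch.allclose(loss_mean, manual / y.numel()))
```

With `reduction='sum'` the loss, and therefore the gradients, scale with the batch size, which is worth keeping in mind when choosing a learning rate.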

Rather than manually updating the weights of the model as we have been doing,
we use the optim package to define an Optimizer that will update the weights
for us. The optim package defines many optimization algorithms that are commonly
used for deep learning, including SGD+momentum, RMSProp, Adam, etc.
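
Because every optimizer in `torch.optim` exposes the same `zero_grad()` / `step()` interface, switching algorithms usually means changing only the constructor call. The sketch below is one way to show this; `make_optimizer` is a hypothetical helper written for this example (it is not part of PyTorch), and the learning rates are illustrative values rather than recommendations.

```python
import torch

def make_optimizer(name, params):
    """Hypothetical helper: build one of three torch.optim optimizers by name."""
    if name == "sgd_momentum":
        return torch.optim.SGD(params, lr=1e-2, momentum=0.9)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=1e-3)
    if name == "adam":
        return torch.optim.Adam(params, lr=1e-4)
    raise ValueError(f"unknown optimizer: {name}")

torch.manual_seed(0)  # arbitrary seed for repeatability
x = torch.randn(8, 3)
y = torch.randn(8, 1)

changed = {}
for name in ("sgd_momentum", "rmsprop", "adam"):
    model = torch.nn.Linear(3, 1)  # tiny model, for illustration only
    optimizer = make_optimizer(name, model.parameters())
    before = model.weight.detach().clone()

    # The training step is identical no matter which optimizer was built.
    loss = torch.nn.functional.mse_loss(model(x), y, reduction='sum')
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Record whether a single step moved the weights.
    changed[name] = not torch.equal(before, model.weight.detach())

print(changed)
```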



In [2]:
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the Tensors it will update (which are the learnable
    # weights of the model). This is necessary because, by default, gradients
    # are accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer updates the parameters it was
    # given, using the gradients computed by the backward pass.
    optimizer.step()

0 668.0164184570312
1 650.7867431640625
2 633.9840087890625
3 617.70947265625
4 601.9817504882812
5 586.7880859375
6 572.0440063476562
7 557.73876953125
8 543.8649291992188
9 530.4135131835938
10 517.407470703125
11 504.89251708984375
12 492.7296447753906
13 480.9434509277344
14 469.4600830078125
15 458.28765869140625
16 447.46075439453125
17 436.9723815917969
18 426.739501953125
19 416.8033447265625
20 407.1033935546875
21 397.701904296875
22 388.56695556640625
23 379.6774597167969
24 371.00537109375
25 362.5210876464844
26 354.22625732421875
27 346.1149597167969
28 338.2031555175781
29 330.5519714355469
30 323.0850830078125
31 315.7976379394531
32 308.6966552734375
33 301.7493591308594
34 294.9686584472656
35 288.33319091796875
36 281.889892578125
37 275.5746154785156
38 269.4440002441406
39 263.4324645996094
40 257.52984619140625
41 251.72418212890625
42 246.00758361816406
43 240.38504028320312
44 234.87759399414062
45 229.50186157226562
46 224.23619079589844
47 219.0615234375
48 21