In [1]:
%matplotlib inline


PyTorch: Variables and autograd
-------------------------------

A fully-connected ReLU network with one hidden layer and no biases, trained to
predict y from x by minimizing squared Euclidean distance.

This implementation computes the forward pass using operations on PyTorch
Variables, and uses PyTorch autograd to compute gradients.

A PyTorch Variable is a wrapper around a PyTorch Tensor, and represents a node
in a computational graph. If x is a Variable then x.data is a Tensor giving its
value, and x.grad is another Variable holding the gradient of some scalar value
with respect to x.

PyTorch Variables have the same API as PyTorch Tensors: (almost) any operation
you can do on a Tensor you can also do on a Variable. The difference is that
operations on Variables are recorded in a computational graph, which lets
autograd compute gradients automatically.
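
To make the idea of "a value plus a node in a computational graph" concrete, here is a minimal pure-Python sketch of scalar reverse-mode differentiation. It is not how PyTorch is implemented. The class name `Node` and its `_parents` field are hypothetical, chosen only for illustration; only `data`, `grad`, and `backward()` mirror names used in this tutorial.

```python
class Node:
    """Toy scalar stand-in for a Variable (hypothetical, not PyTorch code)."""

    def __init__(self, data, parents=()):
        self.data = float(data)   # the wrapped value, like x.data
        self.grad = 0.0           # d(output)/d(self), filled in by backward()
        self._parents = parents   # pairs of (input node, local derivative)

    def __add__(self, other):
        return Node(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.data * other.data,
                    ((self, other.data), (other, self.data)))

    def relu(self):
        slope = 1.0 if self.data > 0 else 0.0
        return Node(max(self.data, 0.0), ((self, slope),))

    def backward(self):
        # Post-order walk: every node is listed after all of its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # seed: d(self)/d(self)
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in node._parents:
                parent.grad += local * node.grad  # accumulates, never overwrites


x, w, b = Node(3.0), Node(-2.0), Node(7.0)

y = (x * w + b).relu()        # forward pass builds the graph
y.backward()                  # backward pass fills in .grad on the inputs
first_pass = (x.grad, w.grad, b.grad)

y = (x * w + b).relu()        # a second forward/backward pass without zeroing
y.backward()
second_pass = (x.grad, w.grad, b.grad)

for leaf in (x, w, b):        # the analogue of w1.grad.data.zero_()
    leaf.grad = 0.0

print(first_pass, second_pass)
```

In this sketch the second backward pass adds to the gradients left by the first. The training loop in this tutorial zeroes `w1.grad` and `w2.grad` after every update for the same reason.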



In [1]:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
x = Variable(torch.randn(N, D_in).type(dtype), requires_grad=False)
y = Variable(torch.randn(N, D_out).type(dtype), requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
w1 = Variable(torch.randn(D_in, H).type(dtype), requires_grad=True)
w2 = Variable(torch.randn(H, D_out).type(dtype), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print loss using operations on Variables.
    # Now loss is a Variable of shape (1,) and loss.data is a Tensor of shape
    # (1,); loss.data[0] is a scalar value holding the loss.
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.data[0])

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Variables with requires_grad=True.
    # After this call w1.grad and w2.grad will be Variables holding the gradient
    # of the loss with respect to w1 and w2 respectively.
    loss.backward()

    # Update weights using gradient descent; w1.data and w2.data are Tensors,
    # w1.grad and w2.grad are Variables and w1.grad.data and w2.grad.data are
    # Tensors.
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data

    # Manually zero the gradients after updating weights; backward() adds to
    # the existing .grad values rather than overwriting them.
    w1.grad.data.zero_()
    w2.grad.data.zero_()

0 36210464.0
1 31809348.0
2 29183126.0
3 24476958.0
4 17674240.0
5 11104495.0
6 6487892.5
7 3824445.75
8 2417374.0
9 1669795.0
10 1245353.625
11 980070.375
12 797994.9375
13 663636.125
14 559615.5
15 476379.53125
16 408381.5
17 352073.1875
18 304962.625
19 265293.6875
20 231672.25
21 203016.5
22 178484.5625
23 157398.796875
24 139213.0625
25 123493.015625
26 109819.8984375
27 97891.171875
28 87452.2734375
29 78280.0546875
30 70200.859375
31 63074.75
32 56771.4453125
33 51183.23046875
34 46209.18359375
35 41780.015625
36 37826.625
37 34294.0390625
38 31130.28515625
39 28294.8203125
40 25746.705078125
41 23453.7265625
42 21386.8125
43 19516.47265625
44 17826.96484375
45 16299.1337890625
46 14915.7080078125
47 13660.8857421875
48 12522.333984375
49 11487.607421875
50 10546.3583984375
51 9689.4912109375
52 8908.919921875
53 8197.23828125
54 7547.54736328125
55 6953.7626953125
56 6411.3662109375
57 5915.81982421875
58 5461.85986328125
59 5045.66162109375
60 4663.91845703125
61 4313.30419921

388 0.00016907844110392034
389 0.0001651666680118069
390 0.00016137119382619858
391 0.00015698473725933582
392 0.00015386499580927193
393 0.0001498531346442178
394 0.0001462782092858106
395 0.00014349546108860523
396 0.00014041789108887315
397 0.00013662436685990542
398 0.00013369924272410572
399 0.0001304556499235332
400 0.00012735028576571494
401 0.00012459605932235718
402 0.00012237450573593378
403 0.00011961325799347833
404 0.00011713507410604507
405 0.00011436497152317315
406 0.00011247371003264561
407 0.00010972344898618758
408 0.00010761333396658301
409 0.00010518972703721374
410 0.00010336471314076334
411 0.00010109140566783026
412 9.894711547531188e-05
413 9.693757601780817e-05
414 9.513758413959295e-05
415 9.342043631477281e-05
416 9.16521021281369e-05
417 8.961030107457191e-05
418 8.824776159599423e-05
419 8.646755304653198e-05
420 8.517709648003802e-05
421 8.366642578039318e-05
422 8.178655843948945e-05
423 8.025552961044014e-05
424 7.908533734735101e-05
425 7.7287986641749
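
For reference, the gradients that `loss.backward()` produces for `w1` and `w2` can also be derived by hand with the chain rule. The sketch below does this in plain Python on a much smaller problem and compares the result with central finite differences. The helper names (`matmul`, `transpose`, `loss_fn`, `grads`, `numeric`, `max_abs_diff`), the tiny dimensions, and the step size `eps` are all assumptions made for illustration and are not part of the tutorial code.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def loss_fn(x, y, w1, w2):
    # Same forward pass as x.mm(w1).clamp(min=0).mm(w2), then squared error.
    h_relu = [[max(v, 0.0) for v in row] for row in matmul(x, w1)]
    y_pred = matmul(h_relu, w2)
    return sum((p - t) ** 2
               for p_row, t_row in zip(y_pred, y)
               for p, t in zip(p_row, t_row))


def grads(x, y, w1, w2):
    # Forward pass, keeping the intermediates the backward pass needs.
    h = matmul(x, w1)
    h_relu = [[max(v, 0.0) for v in row] for row in h]
    y_pred = matmul(h_relu, w2)
    # Backward pass: apply the chain rule layer by layer.
    grad_y_pred = [[2.0 * (p - t) for p, t in zip(p_row, t_row)]
                   for p_row, t_row in zip(y_pred, y)]
    grad_w2 = matmul(transpose(h_relu), grad_y_pred)
    grad_h_relu = matmul(grad_y_pred, transpose(w2))
    grad_h = [[g if v > 0 else 0.0 for g, v in zip(g_row, h_row)]
              for g_row, h_row in zip(grad_h_relu, h)]
    grad_w1 = matmul(transpose(x), grad_h)
    return grad_w1, grad_w2


random.seed(0)
N, D_in, H, D_out = 4, 5, 3, 2  # tiny sizes so pure Python stays fast


def rand(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


x, y = rand(N, D_in), rand(N, D_out)
w1, w2 = rand(D_in, H), rand(H, D_out)
grad_w1, grad_w2 = grads(x, y, w1, w2)

eps = 1e-6


def numeric(w, i, j):
    # Central difference of the loss with respect to the single entry w[i][j].
    old = w[i][j]
    w[i][j] = old + eps
    up = loss_fn(x, y, w1, w2)
    w[i][j] = old - eps
    down = loss_fn(x, y, w1, w2)
    w[i][j] = old
    return (up - down) / (2 * eps)


def max_abs_diff(w, grad):
    return max(abs(numeric(w, i, j) - grad[i][j])
               for i in range(len(w)) for j in range(len(w[0])))


err_w1 = max_abs_diff(w1, grad_w1)
err_w2 = max_abs_diff(w2, grad_w2)
print(err_w1, err_w2)
```

If the two errors are small, the hand-derived formulas agree with the numerical estimate. With autograd these formulas never have to be written out, which is why the training loop above needs only the single call `loss.backward()`.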