# Chapter 1


## What is PyTorch


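PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can run on GPUs
- It can be used to define deep learning models and to train and use them flexibly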

### Tensors


A Tensor is similar to NumPy's ndarray, but Tensor operations can be accelerated on a GPU.


In [1]:
import torch

Construct an uninitialized 5x3 matrix (its values are whatever happened to be in that memory):

In [2]:
x = torch.empty(5,3)
x

tensor([[9.2755e-39, 1.0561e-38, 5.1429e-39],
        [4.5000e-39, 4.9592e-39, 4.2246e-39],
        [1.0286e-38, 1.0653e-38, 1.0194e-38],
        [8.4490e-39, 1.0469e-38, 9.3674e-39],
        [9.9184e-39, 8.7245e-39, 9.2755e-39]])

Construct a randomly initialized matrix (values drawn uniformly from [0, 1)):

In [3]:
x = torch.rand(5,3)
x

tensor([[0.5692, 0.4592, 0.9475],
        [0.3835, 0.9670, 0.0272],
        [0.7540, 0.2939, 0.0618],
        [0.6145, 0.9561, 0.9872],
        [0.3975, 0.9880, 0.4428]])

Construct a matrix filled with zeros, with dtype long:

In [4]:
x = torch.zeros(5,3,dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [5]:
x = torch.zeros(5,3).long()
x.dtype

torch.int64
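`.long()` is shorthand for converting to `torch.int64`. A minimal sketch of a few equivalent conversions; note that converting floating-point values to an integer dtype truncates toward zero instead of rounding.

```python
import torch

x = torch.tensor([1.7, -1.7, 2.0])   # float32 by default

a = x.long()                 # shorthand for x.to(torch.int64)
b = x.to(torch.int64)        # explicit dtype conversion
c = x.to(dtype=torch.long)   # torch.long is an alias of torch.int64

print(a.tolist())            # [1, -1, 2]: truncated toward zero, not rounded
print(a.dtype)               # torch.int64
print(x.double().dtype)      # torch.float64
```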

Construct a tensor directly from data:

In [6]:
x = torch.tensor([5.5,3])
x

tensor([5.5000, 3.0000])
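`torch.tensor` infers the dtype from the Python values it receives. A short sketch, assuming the default floating-point dtype has not been changed with `torch.set_default_dtype`:

```python
import torch

print(torch.tensor([5.5, 3]).dtype)                     # torch.float32: any float makes it floating point
print(torch.tensor([5, 3]).dtype)                       # torch.int64: all integers
print(torch.tensor([True, False]).dtype)                # torch.bool
print(torch.tensor([5, 3], dtype=torch.float64).dtype)  # an explicit dtype wins
```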

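You can also build a tensor from an existing tensor. These methods reuse the properties of the input tensor (for example, its dtype) unless new values are explicitly provided.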

In [7]:
x = x.new_ones(5,3, dtype=torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [8]:
x = torch.randn_like(x, dtype=torch.float)
x

tensor([[ 0.5531, -0.7920, -1.0496],
        [-1.5054, -0.5829,  2.1198],
        [ 0.0652, -0.7518,  0.8118],
        [-1.0592,  0.6627,  1.5710],
        [ 1.4802,  1.4611,  0.7231]])
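In the cells above, `new_ones` kept the dtype of the tensor it was called on unless `dtype` was passed, and `randn_like` kept the shape. A small sketch that makes the inherited properties visible (the tensor `base` is just an illustrative input):

```python
import torch

base = torch.tensor([[1, 2, 3]], dtype=torch.int32)

a = base.new_zeros(2, 2)                        # new shape, inherits int32 (and the device)
b = base.new_ones(2, 2, dtype=torch.float64)    # an explicit dtype overrides the inherited one
c = torch.zeros_like(base)                      # same shape and dtype as base
d = torch.rand_like(base, dtype=torch.float32)  # same shape; random values need a float dtype

print(a.dtype, b.dtype, c.shape, d.dtype)
```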

Get the shape of a tensor:

In [9]:
x.shape

torch.Size([5, 3])
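`x.shape` and `x.size()` return the same `torch.Size`, which behaves like a tuple. A few related queries, sketched on a zero tensor:

```python
import torch

x = torch.zeros(5, 3)
print(x.shape)      # torch.Size([5, 3])
print(x.size())     # torch.Size([5, 3]), same as x.shape
print(x.size(0))    # 5: the size of a single dimension
print(x.dim())      # 2: the number of dimensions
print(x.numel())    # 15: the total number of elements
```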

Tensor operations



In [10]:
y = torch.rand(5,3)
y

tensor([[0.7886, 0.4058, 0.5671],
        [0.3653, 0.6046, 0.0958],
        [0.3161, 0.6384, 0.0770],
        [0.0357, 0.7102, 0.3487],
        [0.1268, 0.7710, 0.3853]])

In [11]:
x + y

tensor([[ 1.3417, -0.3862, -0.4825],
        [-1.1401,  0.0217,  2.2156],
        [ 0.3814, -0.1134,  0.8888],
        [-1.0236,  1.3729,  1.9198],
        [ 1.6070,  2.2321,  1.1084]])

Another way to write addition:


In [12]:
torch.add(x, y)

tensor([[ 1.3417, -0.3862, -0.4825],
        [-1.1401,  0.0217,  2.2156],
        [ 0.3814, -0.1134,  0.8888],
        [-1.0236,  1.3729,  1.9198],
        [ 1.6070,  2.2321,  1.1084]])

Addition, writing the result into a preallocated output tensor (`out=`):

In [13]:
result = torch.empty(5,3)
torch.add(x, y, out=result)
# result = x + y
result

tensor([[ 1.3417, -0.3862, -0.4825],
        [-1.1401,  0.0217,  2.2156],
        [ 0.3814, -0.1134,  0.8888],
        [-1.0236,  1.3729,  1.9198],
        [ 1.6070,  2.2321,  1.1084]])

In-place addition
> In-place operations: methods whose names end with an underscore `_` modify the tensor they are called on.

In [14]:
y.add_(x)
y

tensor([[ 1.3417, -0.3862, -0.4825],
        [-1.1401,  0.0217,  2.2156],
        [ 0.3814, -0.1134,  0.8888],
        [-1.0236,  1.3729,  1.9198],
        [ 1.6070,  2.2321,  1.1084]])
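One way to see the difference between in-place and out-of-place methods is to check whether the result lives in the same memory; `data_ptr()` returns the address of a tensor's first element. A minimal sketch:

```python
import torch

y = torch.zeros(3)
x = torch.ones(3)
storage = y.data_ptr()     # address of y's underlying memory

z = y.add(x)               # out-of-place: returns a new tensor, y is unchanged
y.add_(x)                  # in-place: writes the result into y's own memory
y.add_(x).mul_(2)          # in-place methods return the tensor itself, so they chain

print(z.tolist())          # [1.0, 1.0, 1.0]
print(y.tolist())          # [4.0, 4.0, 4.0]
print(y.data_ptr() == storage, z.data_ptr() == storage)   # True False
```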

All kinds of NumPy-style indexing work on PyTorch tensors.

In [15]:
x[1:, 2:]
# Indexing works like list slicing.
# The two ranges inside [ ] select rows (first) and columns (second).
# [1:, 2:] takes rows 2 through the last and columns 3 through the last of x.
# Indices start at 0.

tensor([[2.1198],
        [0.8118],
        [1.5710],
        [0.7231]])
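Because `randn` output changes on every run, a deterministic tensor from `torch.arange` makes the indexing rules easier to check. A short sketch of slicing, single-column selection, negative indices, and boolean masks:

```python
import torch

x = torch.arange(12).reshape(3, 4)
# x = [[ 0,  1,  2,  3],
#      [ 4,  5,  6,  7],
#      [ 8,  9, 10, 11]]

block = x[1:, 2:]          # rows 1.., columns 2..
column = x[:, 2]           # one column; that dimension is dropped
last_row = x[-1]           # negative indices count from the end
selected = x[x > 8]        # boolean mask -> 1-D tensor of matching elements

print(block.tolist())      # [[6, 7], [10, 11]]
print(column.tolist())     # [2, 6, 10]
print(last_row.tolist())   # [8, 9, 10, 11]
print(selected.tolist())   # [9, 10, 11]

block[0, 0] = -1           # basic slicing returns a view that shares memory with x
print(x[1, 2].item())      # -1
```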

Resizing: to change the shape of a tensor (resize/reshape), use the tensor method `view`, e.g. `x.view(...)`:

In [16]:
x = torch.randn(4,4) 
# x.view(16)
# x.view(2,8)
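x.view(2,-1) # -1 means "infer this dimension from the others"
# -1 may appear at most once, and the other sizes must divide the total number of elements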

tensor([[-0.7106, -0.4528,  1.1421, -0.4061, -1.3812,  0.6637, -1.6359,  1.1662],
        [-0.6864, -0.8400, -0.6951,  0.2994,  0.6903, -3.1923,  0.7903,  1.1523]])
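The rules in the comments above can be checked directly. A sketch showing that `view` shares memory, which shapes it rejects, and why a transposed tensor needs `reshape` (or `contiguous()`) instead:

```python
import torch

x = torch.arange(16).reshape(4, 4)

v = x.view(2, -1)          # -1 is inferred: 16 elements / 2 rows = 8 columns
print(v.shape)             # torch.Size([2, 8])

v[0, 0] = 100              # a view shares memory with the tensor it came from
print(x[0, 0].item())      # 100

rejected = []
for shape in [(-1, -1), (3, -1)]:   # two -1s; 16 is not divisible by 3
    try:
        x.view(*shape)
    except RuntimeError:
        rejected.append(shape)
print(rejected)            # [(-1, -1), (3, -1)]

t = x.t()                  # a transpose is not contiguous in memory
print(t.is_contiguous())   # False
try:
    t.view(16)
except RuntimeError:
    print("view() needs compatible strides; use reshape() or contiguous()")
flat = t.reshape(16)       # reshape() copies when a view is impossible
print(flat[:4].tolist())   # [100, 4, 8, 12]
```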

For a tensor with a single element, `.item()` returns its value as a Python number.

In [17]:
x = torch.randn(1)
x

tensor([0.0613])

In [22]:
x.item()

0.06132584810256958
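`.item()` only works when the tensor holds exactly one element; for several values `.tolist()` is the usual alternative. A small sketch (older PyTorch versions may raise `ValueError` instead of `RuntimeError`, so both are caught):

```python
import torch

one = torch.tensor([0.25])
value = one.item()             # a plain Python float
print(type(value), value)      # <class 'float'> 0.25

many = torch.tensor([1.0, 2.0])
item_failed = False
try:
    many.item()                # more than one element: not convertible to a scalar
except (RuntimeError, ValueError):
    item_failed = True
print(item_failed)             # True
print(many.tolist())           # [1.0, 2.0]
```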

**More reading**


  Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described in the
  [PyTorch](https://pytorch.org/docs/torch) documentation.

## Converting between NumPy and Tensor


Converting between a Torch Tensor and a NumPy array is very easy.

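A CPU Torch Tensor and the NumPy array obtained from it (via `.numpy()` or `torch.from_numpy`) share the same memory, so changing one also changes the other.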

Convert a Torch Tensor to a NumPy array:


In [23]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [24]:
b = a.numpy()
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array:

In [25]:
b[1] = 2
b

array([1., 2., 1., 1., 1.], dtype=float32)

In [26]:
a

tensor([1., 2., 1., 1., 1.])

Convert a NumPy ndarray to a Torch Tensor:

In [27]:
import numpy as np

In [28]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)

[2. 2. 2. 2. 2.]


In [29]:
b

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All tensors on the CPU support conversion to NumPy and back.
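Two details of the NumPy bridge are easy to miss: `torch.tensor(array)` always copies, while `torch.from_numpy` shares memory, and `.numpy()` refuses tensors that track gradients. A minimal sketch:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a (dtype float64)
copied = torch.tensor(a)       # always copies the data

a[0] = 5.0
print(shared[0].item(), copied[0].item())   # 5.0 1.0

w = torch.ones(2, requires_grad=True)
numpy_failed = False
try:
    w.numpy()                  # not allowed while the tensor tracks gradients
except RuntimeError:
    numpy_failed = True
arr = w.detach().numpy()       # detach() drops the autograd history first
print(numpy_failed, arr)       # True [1. 1.]
```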

CUDA Tensors
------------

Tensors can be moved to another device with the ``.to`` method.



In [30]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)
    x = x.to(device)
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))
    

tensor([1.0613], device='cuda:0')
tensor([1.0613], dtype=torch.float64)


In [31]:
# Two equivalent ways to copy y to the CPU and convert it to a NumPy array
y.to("cpu").detach().numpy()
y.cpu().detach().numpy()

array([1.], dtype=float32)

In [32]:
# Move a model to CUDA
# model = model.cuda()
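A common device-agnostic pattern picks the device once and uses it for both the model and the data, so the same code runs with or without a GPU. A sketch, where the tiny `nn.Linear(3, 1)` model is only an illustration:

```python
import torch
import torch.nn as nn

# Pick the GPU when present, otherwise run the same code on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(3, 1).to(device)    # moves the parameters; returns the module
x = torch.randn(4, 3, device=device)  # create the data directly on the device
y = model(x)                          # inputs and parameters must be on the same device

result = y.detach().to("cpu", torch.float64)   # back to the CPU as float64
print(result.device, result.dtype, result.shape)
```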



Warm-up: a two-layer neural network in NumPy
--------------

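A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x with an L2 loss (each row of $X$ is one sample, matching the code):
- $h = XW_1$
- $a = \max(0, h)$
- $\hat{y} = aW_2$
- $L = \sum (\hat{y} - y)^2$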

This implementation uses NumPy alone to compute:
- the forward pass
- the loss
- the backward pass
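The backward pass in the code applies the chain rule to the network definition. Written with the same row-vector convention as the code (each row of $X$ is one sample, so $h = XW_1$ and $\hat{y} = aW_2$, with $L = \sum (\hat{y} - y)^2$), the gradients are:

$$
\begin{aligned}
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= a^\top \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} W_2^\top \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^\top \frac{\partial L}{\partial h}
\end{aligned}
$$

These correspond line by line to `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h`, and `grad_w1`. The code zeroes `grad_h` where `h < 0`, which differs from the indicator only at `h == 0`, where ReLU is not differentiable anyway.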

A NumPy ndarray is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.



In [33]:
import numpy as np
N, D_in, H, D_out = 64, 1000, 100, 10
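# N: batch size (64 samples); D_in: input dimension (1000)
# H: hidden dimension (100)
# D_out: output dimension (10)

# Create random training data
# np.random.randn returns samples drawn from the standard normal distribution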
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    # For 2-D arrays, dot() performs matrix multiplication
    h = x.dot(w1)    # N x H
    h_relu = np.maximum(h, 0) # N x H
    y_pred = h_relu.dot(w2) # N x D_out
    
    # compute loss 
    # sum of squared errors
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 30929090.770908855
1 28550384.58645065
2 28717286.132209916
3 27472413.25213855
4 22745366.989458382
5 15933113.133631997
6 9644418.840859853
7 5495341.427163018
8 3196020.2484077364
9 2019424.7213925761
10 1406983.895160383
11 1064543.7524391043
12 852222.5676911637
13 706430.9050180643
14 598076.9441031637
15 513149.4116471554
16 444264.0243081858
17 387154.9126507777
18 339142.47033004777
19 298506.84089289285
20 263778.91905184963
21 233994.04353553028
22 208278.5141506013
23 185986.89091403305
24 166575.52155426683
25 149611.8789440797
26 134718.92851913502
27 121612.50513729635
28 110046.67039525774
29 99826.50241708834
30 90744.53107636681
31 82655.9003377166
32 75424.77295013565
33 68951.46239074336
34 63141.23984564561
35 57917.235838168046
36 53209.96064245373
37 48960.69161426388
38 45120.91967548371
39 41640.53774992483
40 38477.845404593165
41 35600.774726939315
42 32979.28017746621
43 30585.798849629435
44 28397.778738119127
45 26394.10080759773
46 24559.809237411468
...

366 0.04437797005974571
367 0.04282404009180146
368 0.041324940485993736
369 0.0398782608225544
370 0.038482558354755364
371 0.03713627733793123
372 0.03583686213563789
373 0.03458356106385278
374 0.0333739327207703
375 0.032206509528950475
376 0.031080122215183657
377 0.02999332972574326
378 0.0289446127047379
379 0.027932643562601688
380 0.026956244017395325
381 0.026014441974364675
382 0.025105216783144726
383 0.024227916283546452
384 0.023381470640611822
385 0.022564525650252604
386 0.021776347734776286
387 0.021015853032609836
388 0.020281895381442358
389 0.01957398956252309
390 0.01889043908931467
391 0.018230839230867414
392 0.017594373802928512
393 0.01698015813330916
394 0.01638745183052057
395 0.01581550675024498
396 0.01526369475579082
397 0.014731314570207685
398 0.014217358623040416
399 0.013721339306380916
400 0.013242666555214076
401 0.012780771377767505
402 0.012334955282331769
403 0.01190473887825167
404 0.011489681092003493
405 0.011089193579504853
406 0.0107025509361
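A hand-written backward pass is easy to get subtly wrong, so a numerical gradient check is a useful sanity test. A sketch that reuses the warm-up's forward/backward code on tiny random inputs and compares it with central finite differences; the helper names `loss_and_grads` and `numerical_grad` are hypothetical, introduced only for this check:

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    """Forward and backward pass, same computation as the warm-up cell."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)      # dot() returns a new array, safe to modify
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2

def numerical_grad(f, w, eps=1e-6):
    """Central finite differences: perturb each entry of w in place."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        original = w[idx]
        w[idx] = original + eps
        loss_plus = f()
        w[idx] = original - eps
        loss_minus = f()
        w[idx] = original               # restore the weight
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))         # tiny sizes keep the check fast
y = rng.standard_normal((4, 2))
w1 = rng.standard_normal((5, 3))
w2 = rng.standard_normal((3, 2))

_, g1, g2 = loss_and_grads(x, y, w1, w2)
f = lambda: loss_and_grads(x, y, w1, w2)[0]
ok_w1 = np.allclose(g1, numerical_grad(f, w1), atol=1e-5)
ok_w2 = np.allclose(g2, numerical_grad(f, w2), atol=1e-5)
print(ok_w1, ok_w2)
```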


PyTorch: Tensors
----------------

This time we use PyTorch tensors to implement the forward pass, the loss, and the backward pass.

A PyTorch Tensor is very similar to a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To compute on the GPU, the Tensor has to be moved to a CUDA device.
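The PyTorch version below is a near line-by-line translation of the NumPy one. A sketch checking that each NumPy call and the PyTorch method used in its place produce the same values on random float64 inputs:

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
a_np = rng.standard_normal((3, 4))
b_np = rng.standard_normal((4, 2))
a, b = torch.from_numpy(a_np), torch.from_numpy(b_np)   # float64 tensors

same_matmul = np.allclose(a_np.dot(b_np), a.mm(b).numpy())            # dot     -> mm
same_relu = np.allclose(np.maximum(a_np, 0), a.clamp(min=0).numpy())  # maximum -> clamp
same_transpose = np.allclose(a_np.T, a.t().numpy())                   # .T      -> .t()

c = a.clone()          # copy() -> clone(): an independent copy
c[0, 0] = 100.0
independent = a[0, 0].item() != 100.0
print(same_matmul, same_relu, same_transpose, independent)
```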


In [38]:
N, D_in, H, D_out = 64, 1000, 100, 10

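# Create random training data
# Use the first GPU if one is available; otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)

w1 = torch.randn(D_in, H, device=device)
w2 = torch.randn(H, D_out, device=device)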

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1) # N * H
    h_relu = h.clamp(min=0) # N * H
    y_pred = h_relu.mm(w2) # N * D_out
    
    # compute loss
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 30250086.0
1 28872602.0
2 31828720.0
3 33688052.0
4 30448412.0
5 21634968.0
6 12400173.0
7 6225164.5
8 3178788.0
9 1825991.25
10 1218190.5
11 912062.25
12 732123.625
13 610064.0
14 518731.25
15 446301.40625
16 387034.9375
17 337680.1875
18 296225.125
19 261039.359375
20 230955.421875
21 205083.84375
22 182737.390625
23 163351.71875
24 146494.515625
25 131771.6875
26 118844.15625
27 107451.5625
28 97388.515625
29 88468.859375
30 80539.484375
31 73469.5078125
32 67145.8125
33 61473.0234375
34 56371.78125
35 51782.8828125
36 47644.37890625
37 43897.9375
38 40500.3359375
39 37415.5234375
40 34608.921875
41 32050.83984375
42 29715.56640625
43 27580.7265625
44 25630.919921875
45 23846.7578125
46 22210.37890625
47 20705.72265625
48 19320.408203125
49 18042.705078125
50 16863.296875
51 15773.8984375
52 14766.0107421875
53 13832.392578125
54 12966.85546875
55 12163.8916015625
56 11418.001953125
57 10724.654296875
58 10079.740234375
59 9478.896484375
60 8919.18359375
61 8398.27734375
62 7911.7

383 0.018374446779489517
384 0.017748259007930756
385 0.017131933942437172
386 0.016541294753551483
387 0.015969909727573395
388 0.015424471348524094
389 0.014895631931722164
390 0.014386126771569252
391 0.013907031156122684
392 0.013425618410110474
393 0.012973438948392868
394 0.012528006918728352
395 0.012101845815777779
396 0.011687804944813251
397 0.011300770565867424
398 0.010916613973677158
399 0.010548194870352745
400 0.010189925320446491
401 0.009846732951700687
402 0.009515131823718548
403 0.009196916595101357
404 0.008887029252946377
405 0.008587455376982689
406 0.008299765177071095
407 0.008023882284760475
408 0.007760077714920044
409 0.007503504864871502
410 0.007252664305269718
411 0.007012003567069769
412 0.006783957593142986
413 0.006564436014741659
414 0.006345140747725964
415 0.006140252109616995
416 0.005936454050242901
417 0.005743416026234627
418 0.00555788166821003
419 0.0053766462951898575
420 0.005202390253543854
421 0.005036048591136932
422 0.004873798228800297


Simple autograd

In [35]:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w*x + b # y = 2*1+3

y.backward()

# dy/dw = x = 1, dy/dx = w = 2, dy/db = 1
print(w.grad)
print(x.grad)
print(b.grad)


tensor(1.)
tensor(2.)
tensor(1.)
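`backward()` adds into `.grad` rather than overwriting it, which is why the training loops that follow reset the gradients every iteration. A minimal sketch of the accumulation:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(1.0)

for _ in range(2):
    y = w * x + 3
    y.backward()                 # each call ADDS dy/dw = x = 1 into w.grad

accumulated = w.grad.item()      # 2.0, not 1.0
w.grad.zero_()                   # reset before the next backward pass
print(accumulated, w.grad.item())   # 2.0 0.0
```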



PyTorch: Tensor和autograd
-------------------------------

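One of PyTorch's most important features is autograd: once you define the forward pass and compute the loss, PyTorch automatically computes the gradients of the loss with respect to all model parameters.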
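To see that autograd computes the same gradients as the hand-written backward pass, the sketch below runs both on the same small two-layer network (float64 and tiny sizes are choices made for the comparison, not requirements):

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 5, dtype=torch.float64)
y = torch.randn(4, 2, dtype=torch.float64)
w1 = torch.randn(5, 3, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(3, 2, dtype=torch.float64, requires_grad=True)

# Forward pass and loss; autograd records the computation graph
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()                      # fills w1.grad and w2.grad

# Manual backward pass from the previous section, without autograd
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

print(torch.allclose(w1.grad, grad_w1), torch.allclose(w2.grad, grad_w2))
```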

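A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then after calling ``backward()`` on some scalar (usually the loss), ``x.grad`` is another Tensor holding the gradient of that scalar with respect to ``x``.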
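Only leaf tensors created with `requires_grad=True` receive a `.grad`, and updating such a tensor in place must happen inside `torch.no_grad()`, as the training loop below does. A sketch of both points:

```python
import torch

w = torch.randn(3, requires_grad=True)   # leaf tensor created by the user
x = torch.tensor([1.0, 2.0, 3.0])        # plain data, no gradient tracking
loss = (w * x).sum()                      # non-leaf: produced by an operation
loss.backward()

print(w.is_leaf, loss.is_leaf)   # True False
print(loss.grad_fn is not None)  # True: loss remembers how it was computed
print(w.grad)                    # d loss / d w equals x
print(x.grad)                    # None: x does not require grad

inplace_failed = False
try:
    w -= 0.1 * w.grad            # in-place update of a leaf that requires grad
except RuntimeError:
    inplace_failed = True
with torch.no_grad():
    w -= 0.1 * w.grad            # allowed: not recorded in the graph
print(inplace_failed, w.requires_grad)   # True True
```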


In [104]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

0 30471860.0
1 26228534.0
2 23250712.0
3 19323444.0
4 14492299.0
5 9863642.0
6 6322839.0
7 3993820.5
8 2593148.75
9 1772902.75
10 1285633.625
11 982769.125
12 783322.875
13 643465.75
14 539724.875
15 459373.625
16 395244.0
17 342868.28125
18 299300.59375
19 262557.96875
20 231325.5
21 204574.59375
22 181519.578125
23 161545.640625
24 144163.359375
25 128993.734375
26 115730.0859375
27 104073.953125
28 93788.8828125
29 84687.84375
30 76613.078125
31 69429.625
32 63025.64453125
33 57302.21875
34 52180.94921875
35 47590.734375
36 43471.87890625
37 39764.48046875
38 36424.37890625
39 33403.87109375
40 30667.91796875
41 28185.20703125
42 25927.62109375
43 23873.6484375
44 22001.40234375
45 20293.447265625
46 18732.90625
47 17305.96484375
48 16000.0126953125
49 14803.1748046875
50 13705.9853515625
51 12698.6142578125
52 11772.5693359375
53 10921.0810546875
54 10137.337890625
55 9415.0673828125
56 8749.2607421875
57 8135.1337890625
58 7568.26171875
59 7044.49169921875
60 6560.39404296875
...

426 0.0005005761049687862
427 0.0004886656533926725
428 0.0004766747879330069
429 0.00046420813305303454
430 0.0004530760634224862
431 0.00044140161480754614
432 0.00043154825107194483
433 0.00042091053910553455
434 0.00041044564568437636
435 0.0004003559588454664
436 0.000391183712054044
437 0.0003828379267361015
438 0.00037331454223021865
439 0.0003651838924270123
440 0.0003571343549992889
441 0.00034884311025962234
442 0.00034054124262183905
443 0.0003324503777548671
444 0.00032492668833583593
445 0.0003185669193044305
446 0.00031215840135701
447 0.0003047320060431957
448 0.000297665799735114
449 0.00029143612482585013
450 0.0002847549912985414
451 0.0002797916531562805
452 0.0002737375907599926
453 0.0002680768957361579
454 0.0002621157036628574
455 0.0002568378404248506
456 0.0002514065126888454
457 0.0002464077842887491
458 0.00024145790666807443
459 0.0002364566025789827
460 0.00023189335479401052
461 0.0002266272495035082
462 0.00022262055426836014
463 0.00021841854322701693
...


PyTorch: nn
-----------


This time we build the network with PyTorch's nn library.
PyTorch autograd still builds the computation graph
and computes the gradients for us automatically.
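In the cell below, each `nn.Linear(..., bias=False)` plays the role of one weight matrix from the previous sections. One difference: `nn.Linear` stores its weight as `(out_features, in_features)` and computes `x @ W.T`. A small sketch checking that the `Sequential` model matches the manual matrix products (sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D_in, H, D_out = 5, 4, 2
model = nn.Sequential(
    nn.Linear(D_in, H, bias=False),
    nn.ReLU(),
    nn.Linear(H, D_out, bias=False),
)
x = torch.randn(3, D_in)

W1 = model[0].weight      # shape (H, D_in): (out_features, in_features)
W2 = model[2].weight      # shape (D_out, H)
manual = x.mm(W1.t()).clamp(min=0).mm(W2.t())

print(W1.shape, W2.shape)                      # torch.Size([4, 5]) torch.Size([2, 4])
print(torch.allclose(model(x), manual, atol=1e-6))
```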




In [114]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # x @ W_1^T (no bias term)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        for param in model.parameters(): # param (tensor, grad)
            param -= learning_rate * param.grad
            
    model.zero_grad()

0 28937890.0
1 23708218.0
2 25900276.0
3 31564730.0
4 36221696.0
5 34548928.0
6 25147132.0
7 13910646.0
8 6474728.0
9 3005202.25
10 1618558.5
11 1055975.625
12 795853.375
13 649322.4375
14 550674.4375
15 475871.40625
16 415458.09375
17 365069.0
18 322357.28125
19 285773.0625
20 254263.21875
21 226923.859375
22 203099.890625
23 182238.15625
24 163887.015625
25 147698.84375
26 133395.546875
27 120695.09375
28 109397.8125
29 99327.9296875
30 90320.203125
31 82259.09375
32 75033.1796875
33 68532.6640625
34 62677.9140625
35 57397.265625
36 52616.3359375
37 48279.73046875
38 44344.37109375
39 40770.19140625
40 37517.8203125
41 34553.5859375
42 31851.201171875
43 29383.748046875
44 27127.37109375
45 25062.505859375
46 23171.8984375
47 21438.802734375
48 19848.447265625
49 18388.103515625
50 17046.625
51 15812.09765625
52 14676.0859375
53 13629.376953125
54 12664.22265625
55 11773.703125
56 10951.4765625
57 10192.1376953125
58 9490.6220703125
59 8841.31640625
60 8241.046875
61 7685.80126953125

415 0.0009464324684813619
416 0.0009199553169310093
417 0.0008938985411077738
418 0.0008681747131049633
419 0.0008440786623395979
420 0.0008184001198969781
421 0.0007966457051225007
422 0.0007744226022623479
423 0.0007520546205341816
424 0.0007311642402783036
425 0.0007106095436029136
426 0.0006915915291756392
427 0.0006718204822391272
428 0.0006538843153975904
429 0.0006378090474754572
430 0.0006195709574967623
431 0.000604025786742568
432 0.000588531605899334
433 0.0005726809613406658
434 0.0005578958080150187
435 0.0005443562404252589
436 0.0005287613021209836
437 0.0005157238338142633
438 0.0005042870179750025
439 0.0004913591546937823
440 0.0004785044293384999
441 0.0004674119991250336
442 0.00045650688116438687
443 0.00044453126611188054
444 0.0004345918423496187
445 0.00042317615589126945
446 0.00041341138421557844
447 0.0004029229166917503
448 0.0003939396410714835
449 0.0003840120625682175
450 0.000375825387891382
451 0.0003669420548249036
452 0.0003586196980904788
453 0.00035

In [113]:
model[0].weight

Parameter containing:
tensor([[-0.0218,  0.0212,  0.0243,  ...,  0.0230,  0.0247,  0.0168],
        [-0.0144,  0.0177, -0.0221,  ...,  0.0161,  0.0098, -0.0172],
        [ 0.0086, -0.0122, -0.0298,  ..., -0.0236, -0.0187,  0.0295],
        ...,
        [ 0.0266, -0.0008, -0.0141,  ...,  0.0018,  0.0319, -0.0129],
        [ 0.0296, -0.0005,  0.0115,  ...,  0.0141, -0.0088, -0.0106],
        [ 0.0289, -0.0077,  0.0239,  ..., -0.0166, -0.0156, -0.0235]],
       requires_grad=True)
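The weights printed above are small (all within about ±0.03) because they come from `nn.Linear`'s default initialization, which draws from a uniform distribution on $[-1/\sqrt{\text{in\_features}}, 1/\sqrt{\text{in\_features}}]$; after `nn.init.normal_` their standard deviation is about 1. A sketch listing the parameters and checking both cases:

```python
import math
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1000, 100, bias=False),
    nn.ReLU(),
    nn.Linear(100, 10, bias=False),
)

shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
print(shapes)        # {'0.weight': (100, 1000), '2.weight': (10, 100)}

bound = 1 / math.sqrt(1000)   # default init bound for in_features = 1000
within_bound = model[0].weight.abs().max().item() <= bound + 1e-6
print(within_bound)

nn.init.normal_(model[0].weight)   # what the training cell does
print(round(model[0].weight.std().item(), 1))
```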


PyTorch: optim
--------------

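This time we no longer update the model weights by hand; instead, the optim package updates the parameters for us.
The optim package provides many optimization methods, including SGD+momentum, RMSProp, Adam, and more.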
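For plain SGD (no momentum or weight decay), `optimizer.step()` performs the same update as the manual `param -= learning_rate * param.grad` loop in the previous section. A sketch verifying this on a tiny model (the sizes and learning rate are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1, bias=False)
x, y = torch.randn(4, 3), torch.randn(4, 1)
lr = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

optimizer.zero_grad()                       # clear gradients from earlier steps
loss = nn.MSELoss(reduction="sum")(model(x), y)
loss.backward()                             # fills model.weight.grad

expected = model.weight.detach() - lr * model.weight.grad   # manual SGD update
optimizer.step()                            # updates the parameters in place
print(torch.allclose(model.weight.detach(), expected))
```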


In [118]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # x @ W_1^T (no bias term)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 24436214.0
1 20115276.0
2 18840850.0
3 18223790.0
4 17027580.0
5 14675071.0
6 11567663.0
7 8366201.5
8 5720385.0
9 3799774.75
10 2535152.0
11 1735162.5
12 1236944.25
13 922564.25
14 718680.1875
15 580795.125
16 483214.65625
17 410988.375
18 355345.84375
19 310880.21875
20 274325.625
21 243722.59375
22 217734.828125
23 195305.78125
24 175787.5625
25 158686.078125
26 143600.5
27 130217.0390625
28 118322.4765625
29 107720.890625
30 98256.671875
31 89756.2734375
32 82104.359375
33 75197.3125
34 68949.78125
35 63292.28515625
36 58161.140625
37 53495.71484375
38 49262.35546875
39 45406.51171875
40 41886.3671875
41 38671.0625
42 35729.078125
43 33036.390625
44 30567.708984375
45 28301.845703125
46 26222.076171875
47 24308.93359375
48 22548.6953125
49 20927.591796875
50 19433.642578125
51 18058.23046875
52 16788.662109375
53 15616.2177734375
54 14533.13671875
55 13530.798828125
56 12604.1884765625
57 11745.923828125
58 10950.625
59 10213.337890625
60 9529.8671875
61 8895.59375
62 8306.091796

438 0.0002393243630649522
439 0.00023363585933111608
440 0.00022893572167959064
441 0.00022382299357559532
442 0.00021802390983793885
443 0.0002135633840225637
444 0.00020877565839327872
445 0.00020438502542674541
446 0.000200132533791475
447 0.00019636568322312087
448 0.00019197550136595964
449 0.00018845757585950196
450 0.00018516821728553623
451 0.0001812220725696534
452 0.00017768006364349276
453 0.00017394236056134105
454 0.00017036692588590086
455 0.0001669702905928716
456 0.000163633594638668
457 0.0001608784223208204
458 0.00015729425649624318
459 0.00015425201854668558
460 0.0001512485760031268
461 0.00014837279741186649
462 0.00014571723295375705
463 0.00014315942826215178
464 0.0001404505455866456
465 0.00013795308768749237
466 0.00013533096353057772
467 0.00013275298988446593
468 0.0001303627504967153
469 0.00012791437620762736
470 0.00012587543460540473
471 0.00012379918189253658
472 0.000121756260341499
473 0.00011986290337517858
474 0.00011718282621586695
475 0.000115406


PyTorch: 自定义 nn Modules
--------------------------

We can define a model as a subclass of nn.Module. Whenever a model is more complex than what a Sequential model can express, define it as a custom nn.Module.
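As an illustration of "more complex than Sequential", here is a hypothetical module (`SharedHiddenNet` is not from this tutorial) whose `forward` takes an extra argument and reuses one hidden layer a variable number of times. `Sequential` only passes a single input through a fixed chain, so it cannot express this.

```python
import torch
import torch.nn as nn

class SharedHiddenNet(nn.Module):
    """Hypothetical example: one hidden layer reused a caller-chosen number of times."""

    def __init__(self, d_in, h, d_out):
        super().__init__()
        self.inp = nn.Linear(d_in, h)
        self.hidden = nn.Linear(h, h)   # a single set of weights, shared across repeats
        self.out = nn.Linear(h, d_out)

    def forward(self, x, n_repeats=1):
        h = self.inp(x).clamp(min=0)
        for _ in range(n_repeats):      # control flow chosen at call time
            h = self.hidden(h).clamp(min=0)
        return self.out(h)

model = SharedHiddenNet(d_in=5, h=4, d_out=2)
x = torch.randn(3, 5)
out1 = model(x, n_repeats=1)
out3 = model(x, n_repeats=3)            # same parameters, deeper computation
n_params = sum(p.numel() for p in model.parameters())
print(out1.shape, out3.shape, n_params)   # torch.Size([3, 2]) torch.Size([3, 2]) 54
```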



In [122]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 675.5318603515625
1 659.0227661132812
2 642.9276123046875
3 627.2772216796875
4 612.0651245117188
5 597.2562255859375
6 582.8500366210938
7 568.887451171875
8 555.327880859375
9 542.1878662109375
10 529.456787109375
11 517.1315307617188
12 505.17462158203125
13 493.5650329589844
14 482.40338134765625
15 471.6194152832031
16 461.161865234375
17 450.9877624511719
18 441.0876159667969
19 431.423828125
20 421.9713134765625
21 412.79229736328125
22 403.83502197265625
23 395.1403503417969
24 386.673583984375
25 378.3887939453125
26 370.2898254394531
27 362.3497009277344
28 354.5650634765625
29 346.9619140625
30 339.5443115234375
31 332.314208984375
32 325.2721252441406
33 318.3789978027344
34 311.65435791015625
35 305.1058349609375
36 298.7152099609375
37 292.43585205078125
38 286.27581787109375
39 280.24542236328125
40 274.3594055175781
41 268.6078796386719
42 262.9523010253906
43 257.3995666503906
44 251.9408721923828
45 246.5938262939453
46 241.3463592529297
47 236.2169647216797
...

360 0.0006928329239599407
361 0.0006594658480025828
362 0.0006276500644162297
363 0.0005973356892354786
364 0.0005684461793862283
365 0.0005409115692600608
366 0.0005146677722223103
367 0.0004896665341220796
368 0.0004658424004446715
369 0.0004431433626450598
370 0.0004215217486489564
371 0.00040092665585689247
372 0.00038129440508782864
373 0.0003626033430919051
374 0.00034480434260331094
375 0.00032785360235720873
376 0.0003117044398095459
377 0.0002963263541460037
378 0.000281691609416157
379 0.00026776010054163635
380 0.000254485901677981
381 0.00024186100927181542
382 0.00022984233510214835
383 0.0002183937467634678
384 0.0002075097436318174
385 0.00019714680092874914
386 0.00018728163558989763
387 0.00017789709090720862
388 0.00016896944725885987
389 0.00016047449025791138
390 0.00015239401545841247
391 0.00014470981841441244
392 0.00013740228314418346
393 0.0001304488250752911
394 0.00012383922876324505
395 0.00011755021841963753
396 0.00011157765402458608
397 0.0001058938141795