# Lesson 1


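What is PyTorch?
================

PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can also run on GPUs
- It can be used to define deep learning models, and to train and use them flexibly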

Tensors
---------------


A Tensor is similar to NumPy's ndarray; the key difference is that a Tensor can also use a GPU to accelerate computation.


In [8]:
import torch

Construct an uninitialized 5x3 matrix:

In [2]:
x = torch.empty(5,3)
x

tensor([[6.0544e+26, 7.5110e-43, 1.4123e+20],
        [1.9324e-42, 0.0000e+00, 0.0000e+00],
        [2.1019e-44, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])

Construct a randomly initialized matrix:

In [3]:
x = torch.rand(5,3)
x

tensor([[0.7097, 0.8390, 0.7657],
        [0.2429, 0.6576, 0.5448],
        [0.7818, 0.3429, 0.2571],
        [0.7865, 0.6909, 0.9925],
        [0.9803, 0.8722, 0.6800]])

Construct a matrix filled with zeros and of dtype long:

In [4]:
x = torch.zeros(5,3,dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [5]:
x = torch.zeros(5,3).long()
x.dtype

torch.int64

Construct a tensor directly from data:

In [12]:
x = torch.tensor([5.5,3])
x

tensor([5.5000, 3.0000])

A tensor can also be built from an existing tensor. These methods reuse the properties of the original tensor, such as its dtype, unless new values are provided.

In [16]:
x = x.new_ones(5,3, dtype=torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [6]:
x = torch.randn_like(x, dtype=torch.float)
x

tensor([[ 0.3810, -1.4158,  1.5123],
        [ 1.2750, -0.8045,  0.9899],
        [-0.7283, -1.0833, -0.4748],
        [ 0.0238,  0.6993, -0.8439],
        [ 0.5793,  0.3279, -0.6226]])
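The property reuse described above can be checked directly. The following is a minimal sketch; the tensor names `base`, `a`, `b`, and `c` are made up for illustration.

```python
import torch

base = torch.zeros(2, 2, dtype=torch.float64)

a = base.new_ones(3)                              # new shape, dtype inherited from base
b = torch.zeros_like(base)                        # shape and dtype inherited from base
c = torch.rand_like(base, dtype=torch.float32)    # shape inherited, dtype overridden

print(a.dtype, b.shape, c.dtype)
```

Only the properties that are passed explicitly are overridden; everything else is taken from the source tensor.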

Get the shape of a tensor:

In [7]:
x.shape

torch.Size([5, 3])

<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is in fact a tuple, so it supports all tuple operations.</p></div>
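Because ``torch.Size`` behaves like a tuple, a shape can be unpacked or passed on wherever a shape is expected. A small sketch, with illustrative variable names:

```python
import torch

x = torch.zeros(5, 3)

rows, cols = x.shape                 # tuple unpacking works on torch.Size
same_shape = torch.ones(x.shape)     # a Size can be used as a shape argument

print(isinstance(x.shape, tuple), rows, cols, x.numel())
```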

Operations


There are many kinds of tensor operations. We start with addition.



In [8]:
y = torch.rand(5,3)
y

tensor([[0.0928, 0.1498, 0.5364],
        [0.3903, 0.0997, 0.2491],
        [0.2245, 0.7477, 0.8506],
        [0.8338, 0.6930, 0.8916],
        [0.5807, 0.4412, 0.5481]])

In [9]:
x + y

tensor([[ 0.4738, -1.2661,  2.0486],
        [ 1.6653, -0.7047,  1.2390],
        [-0.5037, -0.3356,  0.3758],
        [ 0.8576,  1.3923,  0.0476],
        [ 1.1600,  0.7691, -0.0744]])

Another way to write addition:


In [10]:
torch.add(x, y)

tensor([[ 0.4738, -1.2661,  2.0486],
        [ 1.6653, -0.7047,  1.2390],
        [-0.5037, -0.3356,  0.3758],
        [ 0.8576,  1.3923,  0.0476],
        [ 1.1600,  0.7691, -0.0744]])

Addition: passing an output tensor as an argument

In [11]:
result = torch.empty(5,3)
torch.add(x, y, out=result)
# result = x + y
result

tensor([[ 0.4738, -1.2661,  2.0486],
        [ 1.6653, -0.7047,  1.2390],
        [-0.5037, -0.3356,  0.3758],
        [ 0.8576,  1.3923,  0.0476],
        [ 1.1600,  0.7691, -0.0744]])

In-place addition:

In [12]:
y.add_(x)
y

tensor([[ 0.4738, -1.2661,  2.0486],
        [ 1.6653, -0.7047,  1.2390],
        [-0.5037, -0.3356,  0.3758],
        [ 0.8576,  1.3923,  0.0476],
        [ 1.1600,  0.7691, -0.0744]])

<div class="alert alert-info"><h4>Note</h4><p>Any in-place operation ends with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` both modify ``x``.</p></div>

All the usual NumPy-style indexing works on PyTorch tensors.


In [13]:
x[1:, 1:]

tensor([[-0.8045,  0.9899],
        [-1.0833, -0.4748],
        [ 0.6993, -0.8439],
        [ 0.3279, -0.6226]])

Resizing: to resize or reshape a tensor, use its ``view`` method (``torch.Tensor.view``):

In [14]:
x = torch.randn(4,4)
y = x.view(16) # a 1-D tensor with 16 elements
z = x.view(-1,8) # -1 is inferred from the other dimensions; here it becomes 2
z

tensor([[ 0.6270, -0.8419,  0.0040,  0.3074,  0.1586,  0.0394,  0.4364,  0.6716],
        [-0.5896, -1.1069, -1.2116, -1.2828, -0.8017, -1.6660,  1.5386,  0.6995]])
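One detail worth knowing about ``view`` is that it never copies data: the result shares storage with the original tensor, and it fails when the requested shape cannot be expressed on the existing memory layout. The sketch below illustrates this and uses ``reshape`` as the fallback; the variable names are illustrative.

```python
import torch

x = torch.arange(16.0).view(4, 4)

flat = x.view(16)          # no copy: flat and x share the same storage
flat[0] = 100.0            # so this write is visible through x as well

xt = x.t()                 # transposing yields a non-contiguous tensor
view_failed = False
try:
    xt.view(16)            # view cannot flatten this layout without copying
except RuntimeError:
    view_failed = True

r = xt.reshape(16)         # reshape copies when a view is not possible

print(x[0, 0].item(), view_failed, r.shape)
```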

If a tensor has exactly one element, ``.item()`` returns its value as a Python number.

In [40]:
x = torch.randn(1)
x

tensor([-1.1493])

In [16]:
#dir(x)
#x.grad # the gradient

In [44]:
x.item()

-1.1493233442306519

In [48]:
z.transpose(1,0)

tensor([[-0.5683, -0.2612],
        [ 1.3885, -0.4682],
        [-2.0829, -1.0596],
        [-0.7613,  0.7447],
        [-1.9115,  0.7603],
        [ 0.3732, -0.4281],
        [-0.2055,  0.5495],
        [-1.2300,  0.1025]])

**Further reading**


  More Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described at
  <https://pytorch.org/docs/torch>.

Converting between NumPy and Tensor
-----------------------------------

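Converting between a Torch Tensor and a NumPy array is straightforward.

The Torch Tensor and the NumPy array share the same underlying memory (when the Tensor is on the CPU), so changing one also changes the other.

Converting a Torch Tensor to a NumPy array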


In [17]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [18]:
b = a.numpy()
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array.

In [19]:
b[1] = 2
b

array([1., 2., 1., 1., 1.], dtype=float32)

In [20]:
a   # changed as well, because a and b share memory

tensor([1., 2., 1., 1., 1.])

Converting a NumPy ndarray to a Torch Tensor

In [2]:
import numpy as np

In [22]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)

[2. 2. 2. 2. 2.]


In [56]:
b
# With a = a + 1 instead, b would stay unchanged: that creates a new array and rebinds the name a

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All Tensors on the CPU support conversion to NumPy and back.
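Memory sharing applies to ``torch.from_numpy``; by contrast, ``torch.tensor`` always copies its input. A short sketch of the difference, with illustrative names:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # makes an independent copy of the data

a += 1                         # in-place update of the NumPy array

print(shared)                  # reflects the update
print(copied)                  # still holds the original values
```

Use ``torch.tensor`` when later changes to the array must not leak into the tensor.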

CUDA Tensors
------------

Tensors can be moved to another device with the ``.to`` method.



In [23]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)
    x = x.to(device)
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))
    

In [None]:
# A GPU tensor has to be moved to the CPU before calling .numpy(); these two lines are equivalent
y.to("cpu").data.numpy()
y.cpu().data.numpy()

In [None]:
# For an nn.Module (defined later in this lesson), .cuda() moves all of its parameters to the GPU
model = model.cuda()
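A common pattern, sketched here under the assumption that the same script should run with or without a GPU, is to pick the device once and create or move tensors accordingly. The names `w`, `out`, and `result` are illustrative.

```python
import torch

# Fall back to the CPU when no CUDA device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 3, device=device)   # create directly on the chosen device
w = torch.randn(3, 2).to(device)       # or move an existing tensor there
out = x @ w                            # both operands live on the same device

result = out.cpu().numpy()             # .numpy() requires a CPU tensor
print(out.device, result.shape)
```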



Warm-up: a two-layer neural network in NumPy
--------------------------------------------

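A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x using an L2 loss.
- $h = W_1 X$
- $a = \max(0, h)$
- $\hat{y} = W_2 a$

This implementation uses only NumPy to compute the forward pass, the loss, and the backward pass.
- forward pass
- loss
- backward pass

A NumPy ndarray is a plain n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is just a data structure for numerical computation.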
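The backward pass in the NumPy code for this section follows from the chain rule. The derivation below is a sketch written in the row-per-sample convention the code uses (`h = x.dot(w1)`, so $h = XW_1$, the transpose of the $h = W_1X$ notation), with $L$ denoting the summed squared error:

$$
\begin{aligned}
L &= \sum_{i,j} (\hat{y}_{ij} - y_{ij})^2, \\
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y), \\
\frac{\partial L}{\partial W_2} &= a^\top \frac{\partial L}{\partial \hat{y}}, \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} W_2^\top, \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h \ge 0], \\
\frac{\partial L}{\partial W_1} &= X^\top \frac{\partial L}{\partial h}.
\end{aligned}
$$

These six lines correspond, in order, to `loss`, `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h` (where `grad_h[h<0] = 0` applies the indicator), and `grad_w1`.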



In [6]:
N, D_in, H, D_out = 64, 1000, 100, 10
# batch of 64 samples, 1000-dim input, 100-dim hidden layer, 10-dim output
# Create random training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.dot(w1) # N * H
    h_relu = np.maximum(h, 0) # N * H
    y_pred = h_relu.dot(w2) # N * D_out
    
    # compute loss
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0   # ReLU passes no gradient where h < 0
    grad_w1 = x.T.dot(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 26260518.479507644
1 20258679.056001905
2 17801338.348818142
3 16243557.548461802
4 14446199.501138322
5 12040192.83698256
6 9343515.93041677
7 6775539.127125524
8 4712715.7087360285
9 3212794.619045986
10 2206942.6891931323
11 1550994.775646038
12 1129272.5978639415
13 853975.0693235979
14 669682.2262716827
15 541625.3938003681
16 449140.4722563087
17 379660.5018198945
18 325544.59499496233
19 282119.2200693574
20 246510.94077322196
21 216736.2930860414
22 191491.7175115259
23 169896.98794850317
24 151235.19802019914
25 135011.5166086263
26 120835.4673937675
27 108395.48877197046
28 97450.14592517677
29 87775.92949265646
30 79206.35818883323
31 71602.80958897714
32 64840.90844493135
33 58802.638592075746
34 53400.30222929719
35 48560.46447970312
36 44214.197939120655
37 40305.35676815151
38 36781.64362404117
39 33601.681337603615
40 30727.1124692268
41 28124.951282969774
42 25767.71171031759
43 23628.00263577388
44 21683.801897013873
45 19914.89061400213
46 18304.738441893227
47 168

406 3.575589573314604e-05
407 3.406247499452774e-05
408 3.245073026746582e-05
409 3.091439209390789e-05
410 2.9451032995389552e-05
411 2.8057220495754328e-05
412 2.672968128728712e-05
413 2.5466085766337747e-05
414 2.4261580705175413e-05
415 2.3114244748222312e-05
416 2.2021370559109786e-05
417 2.0980417143365408e-05
418 1.9989562313724553e-05
419 1.904495889860647e-05
420 1.8145159628202742e-05
421 1.7288081793904154e-05
422 1.6471598914932146e-05
423 1.569438461560476e-05
424 1.495340280852953e-05
425 1.4247530056993007e-05
426 1.3575092072183875e-05
427 1.2934562587572247e-05
428 1.2324759800959744e-05
429 1.1743366056137079e-05
430 1.118949230845728e-05
431 1.0661829388153352e-05
432 1.0159155263136402e-05
433 9.68059889834199e-06
434 9.224311499465546e-06
435 8.789601906180459e-06
436 8.375483931288158e-06
437 7.981050177311488e-06
438 7.605394958254384e-06
439 7.2471998678674146e-06
440 6.905929043016728e-06
441 6.580777741368997e-06
442 6.271010283627686e-06
443 5.97605962258598


PyTorch: Tensors
----------------

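This time we use PyTorch tensors to implement the forward pass, the loss computation, and the backward pass.

A PyTorch Tensor is very similar to a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To run on the GPU, the Tensor has to be moved to a CUDA device.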


In [9]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1) # N * H
    h_relu = h.clamp(min=0) # N * H
    y_pred = h_relu.mm(w2) # N * D_out
    
    # compute loss
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 34731688.0
1 30377002.0
2 27971616.0
3 23638750.0
4 17431602.0
5 11184127.0
6 6655522.5
7 3942436.0
8 2478214.5
9 1694665.625
10 1254510.625
11 985414.0
12 804553.9375
13 672981.125
14 571692.625
15 490613.4375
16 424187.0625
17 368879.4375
18 322421.40625
19 283021.0625
20 249408.578125
21 220612.0
22 195763.0625
23 174238.421875
24 155498.0
25 139133.375
26 124862.8359375
27 112323.4765625
28 101268.3046875
29 91484.4453125
30 82814.609375
31 75107.9765625
32 68234.2734375
33 62089.91796875
34 56590.2578125
35 51654.1875
36 47214.9296875
37 43214.96875
38 39603.88671875
39 36339.21484375
40 33382.46875
41 30701.982421875
42 28270.013671875
43 26061.02734375
44 24050.01953125
45 22215.837890625
46 20541.65625
47 19010.291015625
48 17609.095703125
49 16325.552734375
50 15147.8671875
51 14069.2685546875
52 13081.8251953125
53 12173.0478515625
54 11335.9619140625
55 10564.5166015625
56 9852.2275390625
57 9194.3408203125
58 8586.0595703125
59 8023.18212890625
60 7501.849609375
61 7018.8

380 0.016018245369195938
381 0.015476448461413383
382 0.014951464720070362
383 0.014441912062466145
384 0.013956673443317413
385 0.013486040756106377
386 0.013026065193116665
387 0.01258719153702259
388 0.012163025327026844
389 0.011754828505218029
390 0.011355024762451649
391 0.010977594181895256
392 0.01060868427157402
393 0.01025715284049511
394 0.009912313893437386
395 0.009589200839400291
396 0.009274404495954514
397 0.008967535570263863
398 0.008663455955684185
399 0.008379526436328888
400 0.008097177371382713
401 0.007831125520169735
402 0.007572998758405447
403 0.00732583599165082
404 0.007083498407155275
405 0.006851179525256157
406 0.006627289578318596
407 0.006414034403860569
408 0.006202825345098972
409 0.005998758599162102
410 0.005812093149870634
411 0.0056219627149403095
412 0.005437853746116161
413 0.00526835722848773
414 0.00509529747068882
415 0.004936501383781433
416 0.004780967719852924
417 0.004625502973794937
418 0.004480442032217979
419 0.00434182770550251
420 0.

A simple autograd example

In [72]:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w*x + b # y = 2*1+3

y.backward()

print(w.grad)  # dy/dw = x = 1
print(x.grad)  # dy/dx = w = 2
print(b.grad)  # dy/db = 1


tensor(1.)
tensor(2.)
tensor(1.)



PyTorch: Tensors and autograd
-------------------------------

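One of PyTorch's key features is autograd: once the forward pass is defined and the loss is computed, PyTorch can differentiate automatically to obtain the gradients of all model parameters.

A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then ``x.grad`` is another Tensor that holds the current gradient of some scalar (usually the loss) with respect to ``x``.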
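One consequence of storing gradients in ``.grad`` is that repeated ``backward()`` calls add to the stored value rather than replace it. The sketch below, with illustrative names `first` and `second`, shows why training loops reset gradients on every iteration.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()      # d(3w)/dw = 3

(3 * w).backward()
second = w.grad.item()     # accumulated: 3 + 3 = 6

w.grad.zero_()             # reset before the next backward pass
print(first, second, w.grad.item())
```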


In [10]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    # clamp restricts values to [min, max]; clamp(min=0) implements ReLU
    
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
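    with torch.no_grad(): # the weight updates should not be tracked in the computation graph
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_() # reset the gradients, which would otherwise accumulate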
        w2.grad.zero_()

0 32960512.0
1 29188500.0
2 28438578.0
3 26402736.0
4 21384986.0
5 14806375.0
6 9017941.0
7 5177371.5
8 3013784.25
9 1875331.5
10 1273282.0
11 936571.5625
12 731784.25
13 595563.6875
14 497064.5
15 421983.125
16 362465.65625
17 313918.28125
18 273591.0
19 239660.28125
20 210855.84375
21 186258.21875
22 165123.828125
23 146856.09375
24 131007.2890625
25 117208.375
26 105146.2109375
27 94551.4765625
28 85213.46875
29 76961.1640625
30 69646.53125
31 63148.8203125
32 57360.328125
33 52186.21875
34 47553.2421875
35 43401.5
36 39676.4296875
37 36320.96875
38 33291.2578125
39 30550.796875
40 28069.7265625
41 25817.955078125
42 23772.654296875
43 21911.853515625
44 20217.92578125
45 18672.6640625
46 17259.91796875
47 15968.1181640625
48 14784.634765625
49 13699.275390625
50 12702.2109375
51 11785.68359375
52 10942.537109375
53 10166.63671875
54 9451.8134765625
55 8792.091796875
56 8183.12060546875
57 7620.6083984375
58 7100.43798828125
59 6619.15478515625
60 6173.48779296875
61 5760.7373046875

374 0.0020653975661844015
375 0.0019925590604543686
376 0.0019236052175983787
377 0.0018579961033537984
378 0.001791852992027998
379 0.0017311401898041368
380 0.0016731239156797528
381 0.0016177863581106067
382 0.001560962526127696
383 0.0015100686578080058
384 0.0014613399980589747
385 0.0014123055152595043
386 0.0013675085501745343
387 0.0013217814266681671
388 0.0012786616571247578
389 0.0012371735647320747
390 0.0011988658225163817
391 0.0011587187182158232
392 0.0011230671079829335
393 0.0010875911684706807
394 0.0010540611110627651
395 0.0010209353640675545
396 0.0009897075360640883
397 0.000957503158133477
398 0.0009302443941123784
399 0.000901674444321543
400 0.000873816548846662
401 0.0008489174651913345
402 0.0008237242582254112
403 0.0007994757615961134
404 0.0007755120750516653
405 0.0007538145873695612
406 0.0007312081288546324
407 0.0007095684995874763
408 0.0006886759074404836
409 0.0006687747081741691
410 0.0006512730615213513
411 0.0006336100632324815
412 0.00061538867
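As a sanity check, the hand-written backward pass from the earlier sections and ``loss.backward()`` should produce the same gradients. The sketch below compares them on a small problem; the reduced sizes, the fixed seed, and the use of float64 are assumptions made to keep the comparison quick and numerically tight.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 20, 10, 5   # small illustrative sizes
x = torch.randn(N, D_in, dtype=torch.float64)
y = torch.randn(N, D_out, dtype=torch.float64)
w1 = torch.randn(D_in, H, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.float64, requires_grad=True)

# Forward pass and autograd backward pass
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Manual backward pass, same formulas as the hand-written version
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

print(torch.allclose(w1.grad, grad_w1), torch.allclose(w2.grad, grad_w2))
```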


PyTorch: nn
-----------


This time we use PyTorch's nn package to build the network.
PyTorch autograd still builds the computation graph
and computes the gradients for us automatically.




In [11]:
import torch.nn as nn #neural network

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # w_1 * x 
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight) # initialize the weights from a standard normal distribution
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        for param in model.parameters(): # param (tensor, grad)
            param -= learning_rate * param.grad
            
    model.zero_grad() # clear the gradients before the next backward pass

0 25838628.0
1 18728948.0
2 15530906.0
3 13685715.0
4 12187200.0
5 10555952.0
6 8781750.0
7 6967601.0
8 5331275.0
9 3962857.75
10 2911013.5
11 2135065.75
12 1583214.625
13 1194474.0
14 921617.0
15 727984.0625
16 588416.3125
17 485569.21875
18 408068.6875
19 348146.4375
20 300773.09375
21 262421.15625
22 230790.390625
23 204268.046875
24 181759.203125
25 162417.046875
26 145661.03125
27 131029.28125
28 118178.9765625
29 106853.359375
30 96809.8828125
31 87883.8984375
32 79942.0
33 72844.0546875
34 66483.125
35 60770.84765625
36 55622.07421875
37 50975.71875
38 46779.8359375
39 42982.74609375
40 39538.83203125
41 36413.05078125
42 33569.5390625
43 30986.255859375
44 28637.783203125
45 26490.330078125
46 24525.662109375
47 22726.7421875
48 21077.177734375
49 19562.30078125
50 18169.234375
51 16887.091796875
52 15707.9775390625
53 14620.9248046875
54 13617.5634765625
55 12690.2890625
56 11832.76953125
57 11039.4384765625
58 10304.7060546875
59 9623.4462890625
60 8991.3349609375
61 8404.833

373 0.001068723271600902
374 0.0010309848003089428
375 0.0009957392467185855
376 0.0009605677332729101
377 0.0009263763786293566
378 0.0008968023466877639
379 0.000863698311150074
380 0.0008349126437678933
381 0.0008052863995544612
382 0.0007805170607753098
383 0.0007546487031504512
384 0.0007292722002603114
385 0.0007066510734148324
386 0.0006829076446592808
387 0.0006615056772716343
388 0.0006401198334060609
389 0.0006185690872371197
390 0.0006001098663546145
391 0.000580713851377368
392 0.000564343761652708
393 0.0005465850117616355
394 0.0005295487935654819
395 0.0005139454733580351
396 0.0004990484449081123
397 0.00048383043031208217
398 0.0004695757816080004
399 0.0004563727125059813
400 0.00044244242599233985
401 0.00042923897854052484
402 0.00041678015259094536
403 0.00040548015385866165
404 0.00039350465522147715
405 0.0003817620745394379
406 0.00037165347021073103
407 0.000360832636943087
408 0.00035103256232105196
409 0.00034128365223295987
410 0.00033162691397592425
411 0.0

In [13]:
model[0].weight

Parameter containing:
tensor([[-3.6821e-02,  1.5798e-01, -2.2564e-01,  ...,  1.7312e-03,
         -3.7531e-02, -5.3610e-01],
        [-2.0635e+00,  1.1830e+00, -7.2006e-01,  ...,  1.2211e+00,
         -6.8314e-01,  3.8206e-01],
        [-1.7063e+00, -1.9049e-01,  4.5881e-01,  ...,  2.8608e-01,
         -6.0220e-01,  4.8516e-02],
        ...,
        [-5.8931e-01, -1.0688e+00, -1.4994e+00,  ...,  1.8958e+00,
          7.3361e-01, -2.0974e+00],
        [ 3.9113e-01, -2.6700e+00, -1.4424e+00,  ...,  7.8076e-01,
          4.7368e-01,  8.6743e-01],
        [ 1.4691e+00, -5.8205e-01,  3.3722e-01,  ...,  3.2276e-01,
         -3.1056e-01,  2.1613e-01]], requires_grad=True)
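The printed weight has one row per output feature: ``nn.Linear(D_in, H)`` stores its weight with shape ``(H, D_in)`` and computes ``x @ W^T``. A quick check, sketched in float64 so that the comparison is tight (`layer` and `manual` are illustrative names):

```python
import torch

D_in, H = 1000, 100
layer = torch.nn.Linear(D_in, H, bias=False).double()
x = torch.randn(64, D_in, dtype=torch.float64)

manual = x.mm(layer.weight.t())   # the same computation written out by hand

print(layer.weight.shape)
print(torch.allclose(layer(x), manual))
```

This is why the hand-written versions use ``w1`` of shape ``(D_in, H)`` while ``model[0].weight`` is its transpose in layout.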


PyTorch: optim
--------------

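This time we no longer update the model's weights by hand; instead we use the optim package to update the parameters.
The optim package provides a range of optimization methods, including SGD with momentum, RMSProp, and Adam.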
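For plain SGD without momentum, ``optimizer.step()`` performs the same update that the earlier sections wrote by hand, ``param -= lr * param.grad``. The sketch below checks this on a toy parameter; the names `w`, `w_manual`, and the quadratic loss are illustrative.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
w_manual = w.detach().clone()      # independent copy for the manual update
lr = 0.1

optimizer = torch.optim.SGD([w], lr=lr)
loss = (w ** 2).sum()
optimizer.zero_grad()
loss.backward()
grad = w.grad.clone()
optimizer.step()                   # w <- w - lr * w.grad

w_manual -= lr * grad              # the hand-written equivalent
print(torch.allclose(w.detach(), w_manual))
```

Other optimizers such as Adam keep extra per-parameter state, so their steps do not reduce to this one-line update.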


In [14]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # w_1 * x (no bias term, since bias=False)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4   # 1e-4 is a common learning rate for Adam
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 28643966.0
1 22942542.0
2 21245074.0
3 20417300.0
4 18791420.0
5 15755221.0
6 11884029.0
7 8162179.0
8 5286388.5
9 3355601.0
10 2170669.5
11 1466577.875
12 1047371.625
13 789463.625
14 622841.5
15 508720.28125
16 425932.75
17 362807.84375
18 312861.75
19 272148.46875
20 238248.953125
21 209602.640625
22 185184.578125
23 164189.578125
24 146076.0625
25 130333.4765625
26 116591.21875
27 104554.5
28 93980.2578125
29 84664.2421875
30 76441.234375
31 69155.765625
32 62687.03515625
33 56924.4453125
34 51780.859375
35 47181.7734375
36 43061.9375
37 39362.92578125
38 36032.85546875
39 33029.96484375
40 30317.93359375
41 27864.3203125
42 25641.2890625
43 23624.7109375
44 21787.55859375
45 20116.298828125
46 18593.755859375
47 17204.470703125
48 15935.435546875
49 14774.4609375
50 13711.4931640625
51 12736.818359375
52 11841.966796875
53 11019.6142578125
54 10262.958984375
55 9566.1806640625
56 8923.7392578125
57 8330.8974609375
58 7783.23681640625
59 7277.04296875
60 6808.53955078125
61 6374.

378 0.006758278701454401
379 0.0065166824497282505
380 0.006276507396250963
381 0.006046999711543322
382 0.005817735567688942
383 0.0056042904034256935
384 0.005399123299866915
385 0.005203564185649157
386 0.005014677997678518
387 0.0048336489126086235
388 0.004659057594835758
389 0.0044856807217001915
390 0.004322004970163107
391 0.004170745145529509
392 0.00402364507317543
393 0.0038795594591647387
394 0.0037459838204085827
395 0.0036099234130233526
396 0.0034790041390806437
397 0.0033591415267437696
398 0.0032369722612202168
399 0.0031255106441676617
400 0.0030163524206727743
401 0.0029137427918612957
402 0.002813653787598014
403 0.002713663736358285
404 0.0026211889926344156
405 0.002534805564209819
406 0.0024492761585861444
407 0.0023695528507232666
408 0.002287674229592085
409 0.002208961173892021
410 0.0021381250116974115
411 0.00206698733381927
412 0.0019968408159911633
413 0.0019324992317706347
414 0.0018690747674554586
415 0.0018087386852130294
416 0.0017515389481559396
417 0


PyTorch: Custom nn Modules
--------------------------

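We can define a model as a class that inherits from nn.Module. Whenever a model is more complex than what Sequential can express, it needs to be defined as an nn.Module subclass.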



In [15]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):  # inherit from nn.Module
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 641.4818115234375
1 624.8438110351562
2 608.7247314453125
3 593.102783203125
4 577.921875
5 563.0911865234375
6 548.759521484375
7 534.8473510742188
8 521.3331909179688
9 508.1829833984375
10 495.3949279785156
11 483.0735778808594
12 471.0773010253906
13 459.41546630859375
14 448.04290771484375
15 437.0804748535156
16 426.4126281738281
17 415.96649169921875
18 405.809814453125
19 395.948974609375
20 386.4199523925781
21 377.21453857421875
22 368.2930603027344
23 359.65032958984375
24 351.2679748535156
25 343.07952880859375
26 335.0801086425781
27 327.2890319824219
28 319.73028564453125
29 312.3487854003906
30 305.1401062011719
31 298.08795166015625
32 291.19989013671875
33 284.48931884765625
34 277.9595947265625
35 271.5782470703125
36 265.3182373046875
37 259.176513671875
38 253.17208862304688
39 247.30111694335938
40 241.55137634277344
41 235.90223693847656
42 230.35887145996094
43 224.92999267578125
44 219.6110076904297
45 214.39511108398438
46 209.2941436767578
47 204.29656982421

350 9.92795662568824e-07
351 8.8498171635365e-07
352 7.891184736763535e-07
353 7.019068561930908e-07
354 6.248518502616207e-07
355 5.55952908598556e-07
356 4.945455316374137e-07
357 4.402316449159116e-07
358 3.909035513061099e-07
359 3.4757212574731966e-07
360 3.0902501180207764e-07
361 2.746351128735114e-07
362 2.4382450192206306e-07
363 2.1684778062081023e-07
364 1.9249166882673308e-07
365 1.7089445236706524e-07
366 1.5174757095337554e-07
367 1.347842442100955e-07
368 1.1968988644639467e-07
369 1.0628213686914023e-07
370 9.431405345594612e-08
371 8.386479066757602e-08
372 7.437996174530781e-08
373 6.615483982841397e-08
374 5.8797517965558654e-08
375 5.230256405752698e-08
376 4.646130591368092e-08
377 4.1304730302726966e-08
378 3.680252902427128e-08
379 3.278962523722839e-08
380 2.9187445349521113e-08
381 2.6057561441916732e-08
382 2.3270434468258827e-08
383 2.0708776915512317e-08
384 1.863616461150741e-08
385 1.659472914639082e-08
386 1.4851767815571293e-08
387 1.3280308408525343e-08
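To illustrate a model that a plain Sequential stack cannot express, the sketch below adds a skip connection inside ``forward``. ``ResidualNet`` and its layer names are hypothetical and not part of the lesson; the point is only that submodules assigned in ``__init__`` are registered automatically, so ``model.parameters()`` picks them up for the optimizer.

```python
import torch

class ResidualNet(torch.nn.Module):  # hypothetical example model
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, H)
        self.out = torch.nn.Linear(H, D_out)

    def forward(self, x):
        h = torch.relu(self.linear1(x))
        h = h + torch.relu(self.linear2(h))   # skip connection around linear2
        return self.out(h)

model = ResidualNet(1000, 100, 10)
y_pred = model(torch.randn(64, 1000))

# Parameters of all registered submodules, in definition order
names = [name for name, _ in model.named_parameters()]
print(y_pred.shape, names)
```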