# What is PyTorch?

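PyTorch is a Python-based scientific computing library with the following features:\
(1) It is similar to NumPy, but it can run on GPUs.\
(2) It can define deep learning models and makes training and using them flexible.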

Tensors\
A Tensor is similar to NumPy's ndarray; the main difference is that a Tensor can run on a GPU to accelerate computation.

# Part 1: A First Look at PyTorch

In [1]:
import torch
import torchvision

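Constructing tensors:
- Construct an uninitialized 5$\times$3 matrix
- Construct a randomly initialized matrix
- Construct a matrix of all zeros with dtype long
- Construct a tensor directly from data
- Construct a tensor from an existing tensor. These methods reuse properties of the original tensor, such as its dtype
- Create a tensor with the same shape as the original data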

In [2]:
x = torch.empty(5,3)
x = torch.rand(5,3)
x = torch.zeros(5,3,dtype=torch.long) # x = torch.zeros(5,3).long()
x = torch.tensor([5.5,3])
y = x.new_ones(5,3)
y = torch.randn_like(x) # x.shape

Tensor operations:
- Addition (note: every in-place operation ends with _, for example x.copy_(y))
- NumPy-style indexing works on tensors
- Reshaping with .view()
- For a tensor with a single element, .item() returns its value as a Python number

In [3]:
x = torch.rand(4,4)
y = torch.rand(4,4)
# for add
x + y # torch.add(x,y)
y.add_(x) # in-place addition
# for index
x[1:,1:]
# for resize
z = x.view(16)
z = x.view(2,-1)
# pick up the value
single_x = torch.randn(1)
single_x.item()

0.4365122318267822
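
The points about in-place operations, .view() and .item() can be illustrated with a minimal sketch. The variable names here are chosen only for illustration; the sketch assumes a contiguous CPU tensor, in which case .view() returns a tensor that shares storage with the original.

```python
import torch

x = torch.zeros(2, 3)
z = x.view(6)          # same storage as x, different shape
z[0] = 1.0             # writing through the view also changes x
x.add_(1.0)            # in-place: modifies x (and therefore z) directly

w = x + 1.0            # out-of-place: allocates a new tensor, x is unchanged
value = z[0].item()    # a one-element tensor converted to a Python float
```

Because z is a view rather than a copy, the in-place update on x is visible through z as well.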

Converting between NumPy arrays and Tensors

In [4]:
import numpy as np

# tensor to numpy
a = torch.ones(5)
b = a.numpy()

# numpy to tensor
a = np.ones(5)
b = torch.from_numpy(a)
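
One detail worth knowing about these conversions: for CPU tensors, both .numpy() and torch.from_numpy() share memory with the source rather than copying it. The sketch below, with illustrative variable names, shows this and contrasts it with torch.tensor(), which copies.

```python
import numpy as np
import torch

a = torch.ones(5)
b = a.numpy()            # NumPy view of the same CPU memory
a.add_(1)                # in-place update on the tensor is visible in b

c = np.ones(5)
d = torch.from_numpy(c)  # tensor backed by the NumPy buffer (dtype float64)
np.add(c, 1, out=c)      # in-place update on the array is visible in d

e = torch.tensor(c)      # torch.tensor copies, so e is independent of c
c[0] = 100.0             # changes d[0] but not e[0]
```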

# Part 2: A Two-Layer Neural Network in NumPy

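This program serves as a review of the backpropagation (BP) algorithm.\
It is a fully connected ReLU network with one hidden layer and no bias, trained with an L2 loss:
- $h = XW_1$
- $a = \max(0,h)$
- $\hat{y} = aW_2$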

A NumPy ndarray is a plain n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.

In [5]:
N,D_in,H,D_out = 64, 1000, 100, 10 # N: batch size, D_in:input size, H:hidden size, D_out: output size
# randomly initialize some training data
x = np.random.randn(N,D_in) # random initialize some data
y = np.random.randn(N,D_out) # random initialize the label of the data
w1 = np.random.randn(D_in,H) # initialize the weights
w2 = np.random.randn(H,D_out)
learning_rate = 1e-6
for it in range(500):
    # forward pass, compute the output
    h = x.dot(w1) # N * H
    h_relu = np.maximum(h,0) # N * H
    y_pred = h_relu.dot(w2) # N * D_out
    # compute loss
    loss = np.square(y_pred-y).sum()
    print(it,loss)
    # BP - compute the gradient
    grad_y_pred = 2.0 * (y_pred-y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2   

0 29815211.97280951
1 32430052.168707892
2 41960480.7684306
3 50538989.69902974
4 46927826.70114848
5 29457594.81639556
6 12589534.05255913
7 4630010.327981381
8 2083698.3171426926
9 1297971.6185855288
10 988173.6698992307
11 814303.1639805045
12 690701.6832366292
13 593250.6522803409
14 513342.34551331913
15 446787.19380660413
16 390820.46165329876
17 343336.28486629005
18 302866.09728740295
19 268171.0055956651
20 238334.85384275578
21 212510.08094142424
22 190065.87219428248
23 170477.34204914363
24 153324.89267491596
25 138243.4205611129
26 124941.54096124342
27 113161.81178408166
28 102708.8857205775
29 93410.5579973696
30 85116.4302100552
31 77702.82644016932
32 71058.98281743022
33 65093.80453212628
34 59729.10314163721
35 54887.62467379314
36 50508.50358723455
37 46539.723242484586
38 42939.923548629435
39 39668.639106456416
40 36689.172384854224
41 33973.59940964919
42 31492.81561369486
43 29222.644429470405
44 27143.59181777021
45 25239.51116357791
46 23490.70465551441
47 218

427 0.001070973361446263
428 0.0010292029413278038
429 0.0009890626415247435
430 0.0009504951863454966
431 0.0009134238482259158
432 0.000877799232198885
433 0.0008435656083360301
434 0.0008106792852440734
435 0.000779079875029474
436 0.0007487049656158387
437 0.0007195155386156652
438 0.0006914650882473062
439 0.0006645062881636657
440 0.0006386055990476804
441 0.0006137254279927941
442 0.0005898034452199515
443 0.0005668145961534457
444 0.0005447328217862088
445 0.000523516859730152
446 0.0005031168390736576
447 0.00048352136863154683
448 0.00046468026425756427
449 0.0004465756552783626
450 0.00042918172364730554
451 0.0004124618266588898
452 0.00039639519976905114
453 0.00038096338539798035
454 0.0003661277995658667
455 0.0003518678794245967
456 0.0003381662942413371
457 0.00032499679481036165
458 0.00031234132824151247
459 0.0003001849567657227
460 0.0002884990652741964
461 0.0002772670579235523
462 0.0002664723533220963
463 0.00025609964361466886
464 0.00024613103759472184
465 0.0
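
A hand-derived backward pass is easy to get subtly wrong, so it can help to compare it against finite differences. The following is a sketch of such a check, not part of the original notebook: the small layer sizes, the random seed and the helper names loss_fn and numeric_grad are assumptions made for illustration, while the analytic gradient formulas are the ones used in the training loop.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2   # small sizes so the check runs quickly
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn():
    """Sum-of-squares loss of the two-layer ReLU network at the current weights."""
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradients, same formulas as in the training loop
h = x.dot(w1)
h_relu = np.maximum(h, 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

def numeric_grad(w, eps=1e-6):
    """Central finite differences, perturbing one entry of w at a time."""
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = loss_fn()
        w[idx] = old - eps
        f_minus = loss_fn()
        w[idx] = old          # restore the original value
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

max_err = max(np.abs(numeric_grad(w1) - grad_w1).max(),
              np.abs(numeric_grad(w2) - grad_w2).max())
```

If the analytic formulas are right, max_err should be tiny compared with the size of the gradients themselves.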

In [6]:
x = torch.from_numpy(x)
y = x.new_ones(5,3)
y.dtype

torch.float64

# Part 3: A Two-Layer Neural Network with Tensors

Use tensors to build the forward pass, compute the loss, and run backpropagation.

Tensor: a PyTorch Tensor is much like a NumPy ndarray, but the biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To compute on the GPU, the Tensor has to be moved to a CUDA device.
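
As a rough sketch of what moving a Tensor to the GPU looks like in practice, the snippet below picks a device at runtime and falls back to the CPU when no CUDA device is available. The variable names are illustrative and not taken from the notebook.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(64, 1000)
w = torch.randn(1000, 100, device=device)  # created directly on the device

x_dev = x.to(device)   # copies to the device (returns x itself if already there)
h = x_dev.mm(w)        # both operands must live on the same device
h_cpu = h.cpu()        # bring the result back, e.g. before calling .numpy()
```

Writing the code against a device variable like this means the same script runs unchanged on machines with and without a GPU.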

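Autograd: one of PyTorch's key features is autograd. Once the forward pass is defined and the loss is computed, PyTorch can automatically differentiate and compute the gradients of all model parameters.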

To compute the gradient of a Tensor x, set x.requires_grad = True; after a backward pass, x.grad holds the current gradient of x.

# Subsection 1: Replacing NumPy Directly with Tensors

In [7]:
N,D_in,H,D_out = 64, 1000, 100, 10 # N: batch size, D_in:input size, H:hidden size, D_out: output size
# randomly initialize some training data
x = torch.randn(N,D_in) # x = np.random.randn(N,D_in)
y = torch.randn(N,D_out) # y = np.random.randn(N,D_out)
w1 = torch.randn(D_in,H) # w1 = np.random.randn(D_in,H)
w2 = torch.randn(H,D_out) # w2 = np.random.randn(H,D_out)
learning_rate = 1e-6
for it in range(500):
    # forward pass
    h = x.mm(w1) # N * H      h = x.dot(w1)
    h_relu = h.clamp(min=0) # N * H     np.maximum(h,0)
    y_pred = h_relu.mm(w2) # N * D_out     h_relu.dot(w2)  
    # compute loss
    loss = (y_pred - y).pow(2).sum() # np.square(y_pred-y).sum()
    print(it,loss.item()) #  print(it,loss)    
    # BP - compute the gradient
    grad_y_pred = 2.0 * (y_pred-y)
    grad_w2 = h_relu.t().mm(grad_y_pred) # h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())  # grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.clone() # grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h) # x.T.dot(grad_h)    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 34692652.0
1 32124486.0
2 32528580.0
3 30109726.0
4 23434794.0
5 14842870.0
6 8211364.5
7 4353117.5
8 2472183.5
9 1578426.25
10 1131335.875
11 879924.3125
12 719088.3125
13 604480.8125
14 516621.9375
15 446397.1875
16 388611.65625
17 340140.53125
18 299026.03125
19 263876.8125
20 233644.84375
21 207494.421875
22 184783.28125
23 165000.21875
24 147684.3125
25 132469.890625
26 119064.1875
27 107223.921875
28 96732.3828125
29 87423.3515625
30 79143.4375
31 71761.78125
32 65162.86328125
33 59253.94140625
34 53950.2421875
35 49186.32421875
36 44901.84765625
37 41038.67578125
38 37550.67578125
39 34398.90625
40 31546.53515625
41 28960.396484375
42 26613.125
43 24480.58984375
44 22542.02734375
45 20775.8984375
46 19163.39453125
47 17692.048828125
48 16347.5947265625
49 15117.8515625
50 13991.8408203125
51 12959.912109375
52 12014.29296875
53 11145.19921875
54 10349.2578125
55 9617.5361328125
56 8944.138671875
57 8323.9775390625
58 7752.09326171875
59 7224.11865234375
60 6736.14794921875
61 
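
The hand-written backward pass above can be cross-checked against autograd. The sketch below assumes smaller layer sizes and a fixed seed purely to keep the check quick and repeatable; it computes the gradients both ways for a single forward pass and compares them.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 8, 20, 10, 4   # small sizes, chosen for illustration
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# Forward pass, then let autograd fill w1.grad and w2.grad
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Hand-written gradients, computed outside the graph
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

same_w1 = torch.allclose(grad_w1, w1.grad, rtol=1e-4, atol=1e-4)
same_w2 = torch.allclose(grad_w2, w2.grad, rtol=1e-4, atol=1e-4)
```

Both flags should come out True, which is a useful sanity check before deleting the manual gradient code.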

# Subsection 2: Computing Gradients with Tensor Autograd

In [8]:
# a simple autograd example
x = torch.tensor(1.,requires_grad=True)
w = torch.tensor(2.,requires_grad=True)
b = torch.tensor(3.,requires_grad=True)
y = w*x + b
y.backward()
print(w.grad)

tensor(1.)
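
The printed value follows from the chain rule: for y = w*x + b, dy/dw = x = 1. As a small extension of the same example, the sketch below reads all three gradients after one backward call.

```python
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b      # y = 2 * 1 + 3 = 5
y.backward()       # fills .grad on every leaf tensor that requires grad

# dy/dx = w = 2, dy/dw = x = 1, dy/db = 1
grads = (x.grad.item(), w.grad.item(), b.grad.item())
```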


Tensors can run backpropagation for us automatically

In [9]:
# compute the gradients directly with Tensor backpropagation

N,D_in,H,D_out = 64, 1000, 100, 10 # N: batch size, D_in:input size, H:hidden size, D_out: output size
# randomly initialize some training data
x = torch.randn(N,D_in)
y = torch.randn(N,D_out)

w1 = torch.randn(D_in,H,requires_grad=True)# torch.randn(D_in,H)
w2 = torch.randn(H,D_out,requires_grad=True)# torch.randn(H,D_out)

learning_rate = 1e-6

for it in range(500):
    # forward pass
    #h = x.mm(w1) # N * H     
    #h_relu = h.clamp(min=0) # N * H     
    #y_pred = h_relu.mm(w2) # N * D_out 
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph np.square(y_pred-y).sum()
    print(it,loss.item()) #  print(it,loss)
    
    # BP - compute the gradient
    loss.backward()
    # grad_y_pred = 2.0 * (y_pred-y)
    # grad_w2 = h_relu.t().mm(grad_y_pred) 
    # grad_h_relu = grad_y_pred.mm(w2.t())  
    # grad_h = grad_h_relu.clone() 
    # grad_h[h<0] = 0
    # grad_w1 = x.t().mm(grad_h) 
    
    # update weights of w1 and w2
    with torch.no_grad(): # keep the updates of w1 and w2 out of the computation graph
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

0 26689344.0
1 20897416.0
2 18159194.0
3 16013519.0
4 13596226.0
5 10840989.0
6 8120386.0
7 5796864.5
8 4031031.25
9 2791336.0
10 1961377.75
11 1416067.625
12 1056644.625
13 815426.8125
14 648785.375
15 529893.5
16 442047.1875
17 375009.90625
18 322377.78125
19 279937.6875
20 245019.96875
21 215860.4375
22 191139.59375
23 169963.75
24 151683.640625
25 135816.1875
26 121957.640625
27 109794.0859375
28 99089.4921875
29 89622.921875
30 81256.0
31 73810.8046875
32 67162.5
33 61216.40625
34 55877.9453125
35 51079.1953125
36 46754.83984375
37 42847.453125
38 39316.06640625
39 36116.5234375
40 33214.5078125
41 30581.505859375
42 28184.857421875
43 26000.453125
44 24006.408203125
45 22181.142578125
46 20512.06640625
47 18985.28125
48 17584.0
49 16299.0869140625
50 15118.486328125
51 14034.10546875
52 13036.671875
53 12117.9921875
54 11270.1015625
55 10487.4091796875
56 9764.626953125
57 9096.8447265625
58 8479.2099609375
59 7907.02685546875
60 7377.0205078125
61 6885.6650390625
62 6428.4316406

384 0.00046849827049300075
385 0.0004534596810117364
386 0.0004408633103594184
387 0.00042738759657368064
388 0.000413730856962502
389 0.0004017611499875784
390 0.00039090978680178523
391 0.0003786002635024488
392 0.00036797919892705977
393 0.000357355602318421
394 0.0003471036325208843
395 0.00033735379111021757
396 0.00032782641937956214
397 0.000318954698741436
398 0.00030966027406975627
399 0.000300940009765327
400 0.0002932333154603839
401 0.00028514291625469923
402 0.00027719466015696526
403 0.0002699899487197399
404 0.00026356481248512864
405 0.00025681176339276135
406 0.0002502826973795891
407 0.0002433116897009313
408 0.0002368032728554681
409 0.00023103045532479882
410 0.00022579100914299488
411 0.00021979794837534428
412 0.0002146057377103716
413 0.0002088173059746623
414 0.00020406702242325991
415 0.00019964123202953488
416 0.00019463355420157313
417 0.00018952053505927324
418 0.00018514692783355713
419 0.00018122841720469296
420 0.00017709503299556673
421 0.000172346539329

In [10]:
# the following code shows why the gradients must be zeroed
x = torch.randn(N,D_in)
y = torch.randn(N,D_out)
w1 = torch.randn(D_in,H,requires_grad=True)# torch.randn(D_in,H)
w2 = torch.randn(H,D_out,requires_grad=True)# torch.randn(H,D_out)

In [11]:
# the following code shows why the gradients must be zeroed
y_pred = x.mm(w1).clamp(min=0).mm(w2)
loss = (y_pred - y).pow(2).sum()
#w1.grad.zero_()
#w2.grad.zero_()
loss.backward()
w1.grad

tensor([[-8753.1465,  4495.6226, -7436.8481,  ...,  1600.6499,   530.2744,
         11690.1016],
        [ 7450.5708, -5487.0317,  2640.8167,  ...,   310.9961, -1132.8789,
         -8411.6816],
        [ 5904.0796,  3767.0496, -6358.5000,  ..., -2752.0674,  -389.3004,
           271.7637],
        ...,
        [ 5281.3955,  2982.5105, -1258.2209,  ...,  1631.7744,  3283.2441,
         -9090.7217],
        [ 4729.3433, -3386.1663,  -209.6577,  ...,  7784.6450, -1404.1782,
          3457.7529],
        [-1313.4174,  7739.9727, -5644.2974,  ...,    69.7607,  6458.7349,
          4796.4189]])
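
The reason gradients must be zeroed is that backward() adds to .grad instead of overwriting it. A scalar example makes the accumulation easy to see; this is a minimal sketch with an illustrative variable w, separate from the network above.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w * w).backward()           # d(w^2)/dw = 2w = 6
first = w.grad.item()

(w * w).backward()           # same gradient again, added to the stored one
accumulated = w.grad.item()  # 6 + 6 = 12

w.grad.zero_()               # reset before the next backward pass
(w * w).backward()
after_reset = w.grad.item()  # back to 6
```

In the notebook cell above, running the cell repeatedly without the two zero_() calls has the same effect: each run adds another copy of the gradient to w1.grad.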

# Subsection 3: Building the Network with PyTorch's nn Library

In [12]:
import torch.nn as nn

N,D_in,H,D_out = 64, 1000, 100, 10 # N: batch size, D_in:input size, H:hidden size, D_out: output size
# randomly initialize some training data
x = torch.randn(N,D_in)
y = torch.randn(N,D_out)
model = nn.Sequential(
    nn.Linear(D_in,H), # with bias
    nn.ReLU(),
    nn.Linear(H,D_out) 
)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-6
for it in range(500):
    # forward pass
    y_pred = model(x) # y_pred = x.mm(w1).clamp(min=0).mm(w2)
    # compute loss
    loss = loss_fn(y_pred,y) # (y_pred - y).pow(2).sum()
    print(it,loss.item()) 
    # BP - compute the gradient
    loss.backward()    
    # update the model parameters
    with torch.no_grad(): # keep the parameter updates out of the computation graph
        for param in model.parameters():
            param -= learning_rate * param.grad
    model.zero_grad()

0 638.028076171875
1 637.5271606445312
2 637.0269165039062
3 636.5285034179688
4 636.0314331054688
5 635.5350952148438
6 635.0394287109375
7 634.5447998046875
8 634.0511474609375
9 633.5579833984375
10 633.0654907226562
11 632.5736694335938
12 632.0824584960938
13 631.591796875
14 631.101806640625
15 630.6126708984375
16 630.1243286132812
17 629.6365966796875
18 629.1497802734375
19 628.6636352539062
20 628.17822265625
21 627.6937255859375
22 627.2106323242188
23 626.7283325195312
24 626.24658203125
25 625.7655029296875
26 625.2849731445312
27 624.8053588867188
28 624.32666015625
29 623.8487548828125
30 623.3713989257812
31 622.89453125
32 622.4183349609375
33 621.9427490234375
34 621.4678344726562
35 620.9934692382812
36 620.5198364257812
37 620.046875
38 619.5745849609375
39 619.1028442382812
40 618.6320190429688
41 618.1617431640625
42 617.6927490234375
43 617.2243041992188
44 616.7564086914062
45 616.2891235351562
46 615.8231811523438
47 615.3580322265625
48 614.8936767578125
49 61

493 460.33111572265625
494 460.0656433105469
495 459.8004150390625
496 459.5353088378906
497 459.2703857421875
498 459.0057067871094
499 458.7410583496094
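
To connect nn.Linear back to the hand-written x.mm(w1) version, it may help to look at what a single layer stores and computes. The sketch below uses a standalone layer with the same sizes as the first layer of the model; the names layer and manual are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(1000, 100)              # in_features=1000, out_features=100
x = torch.randn(64, 1000)

weight_shape = tuple(layer.weight.shape)  # stored as (out_features, in_features)
bias_shape = tuple(layer.bias.shape)

# The layer computes x @ W^T + b
manual = x.mm(layer.weight.t()) + layer.bias
matches = torch.allclose(layer(x), manual, atol=1e-4)
```

Note that the weight is stored transposed relative to w1 in the earlier cells, and that the layer adds a bias term the hand-written network did not have.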


# Subsection 4: Updating Parameters with PyTorch's optim

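Instead of updating the model weights by hand, use optim to update the parameters. The optim package provides a variety of optimization algorithms, including SGD, SGD with momentum, and Adam.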
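
Because every optimizer exposes the same zero_grad() / step() interface, switching algorithms only changes the line that constructs the optimizer. The sketch below illustrates this on a tiny linear regression problem; the helper run, the problem sizes and the learning rate are assumptions chosen for illustration, not values from the notebook.

```python
import torch
import torch.nn as nn

def run(make_optimizer, steps=50):
    """Train a tiny linear model and return (initial loss, final loss)."""
    torch.manual_seed(0)                 # same data and init for every optimizer
    model = nn.Linear(3, 1)
    x = torch.randn(8, 3)
    y = torch.randn(8, 1)
    loss_fn = nn.MSELoss(reduction='sum')
    optimizer = make_optimizer(model.parameters())
    first = loss_fn(model(x), y).item()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()                 # the only optimizer-specific work
    return first, loss_fn(model(x), y).item()

# The training loop is identical; only the constructor differs
optimizers = {
    'sgd': lambda p: torch.optim.SGD(p, lr=1e-3),
    'momentum': lambda p: torch.optim.SGD(p, lr=1e-3, momentum=0.9),
    'adam': lambda p: torch.optim.Adam(p, lr=1e-3),
}
results = {name: run(make) for name, make in optimizers.items()}
```

Each entry of results holds the loss before and after training, which gives a quick way to compare how the algorithms behave on the same problem.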

In [13]:

import torch.nn as nn

N,D_in,H,D_out = 64, 1000, 100, 10 # N: batch size, D_in:input size, H:hidden size, D_out: output size
# randomly initialize some training data
x = torch.randn(N,D_in)
y = torch.randn(N,D_out)
model = nn.Sequential(
    nn.Linear(D_in,H), # with bias
    nn.ReLU(),
    nn.Linear(H,D_out) 
)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)
for it in range(500):
    # forward pass
    y_pred = model(x) 
    # compute loss
    loss = loss_fn(y_pred,y)
    print(it,loss.item())     
    # BP - compute the gradient
    loss.backward()     
    # update weights of parameters    
    optimizer.step()    
    optimizer.zero_grad()
    #with torch.no_grad(): # keep the parameter updates out of the computation graph
    #    for param in model.parameters():
    #        param -= learning_rate * param.grad
    #model.zero_grad()

0 646.9700317382812
1 630.3114013671875
2 614.1044921875
3 598.3280639648438
4 582.9887084960938
5 568.0324096679688
6 553.549560546875
7 539.4805908203125
8 525.8538208007812
9 512.6001586914062
10 499.7048645019531
11 487.1651916503906
12 474.9773254394531
13 463.1817932128906
14 451.6817626953125
15 440.5172424316406
16 429.7248229980469
17 419.26153564453125
18 409.07916259765625
19 399.13238525390625
20 389.40155029296875
21 379.90570068359375
22 370.6490173339844
23 361.61639404296875
24 352.803466796875
25 344.1985778808594
26 335.7839660644531
27 327.5777893066406
28 319.5711364746094
29 311.7518615722656
30 304.142822265625
31 296.7071533203125
32 289.43310546875
33 282.3287353515625
34 275.3757629394531
35 268.5796203613281
36 261.9247741699219
37 255.41079711914062
38 249.03871154785156
39 242.82159423828125
40 236.73411560058594
41 230.7714385986328
42 224.93362426757812
43 219.2354278564453
44 213.65280151367188
45 208.18984985351562
46 202.8541717529297
47 197.63201904296

405 4.159167929174146e-06
406 3.921184998034732e-06
407 3.695772875289549e-06
408 3.4841127671825234e-06
409 3.283226078565349e-06
410 3.093363375228364e-06
411 2.914439164669602e-06
412 2.7449575554783223e-06
413 2.585567244750564e-06
414 2.435475153106381e-06
415 2.2927697500563227e-06
416 2.1591588392766425e-06
417 2.032498741755262e-06
418 1.913782625706517e-06
419 1.8014524130194332e-06
420 1.694872253210633e-06
421 1.5944915503496304e-06
422 1.5003826092652162e-06
423 1.4115133808445535e-06
424 1.3276131767270272e-06
425 1.2485913885029731e-06
426 1.1740164609364001e-06
427 1.1037500371458009e-06
428 1.0380379080743296e-06
429 9.757026191437035e-07
430 9.172006230073748e-07
431 8.619160780654056e-07
432 8.09665948509064e-07
433 7.608621217514155e-07
434 7.147507403715281e-07
435 6.715312110827654e-07
436 6.304312023530656e-07
437 5.917364660490421e-07
438 5.557166105063516e-07
439 5.214500333750038e-07
440 4.896556902167504e-07
441 4.59495652194164e-07
442 4.309968346660753e-07
4

# Subsection 5: Building the Network as a Class

In [14]:
import torch.nn as nn

N,D_in,H,D_out = 64, 1000, 100, 10 # N: batch size, D_in:input size, H:hidden size, D_out: output size
x = torch.randn(N,D_in)
y = torch.randn(N,D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet,self).__init__()
        self.linear1 = nn.Linear(D_in,H)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(H,D_out)
    def forward(self,x):
        y_pred = self.linear2(self.relu(self.linear1(x)))
        return y_pred
model = TwoLayerNet(D_in,H,D_out)
#model = model.cuda()
#device
#model = nn.Sequential(
#    nn.Linear(D_in,H), # with bias
#    nn.ReLU(),
#    nn.Linear(H,D_out) 
#)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)
for it in range(500):
    # forward pass
    y_pred = model(x) 
    # compute loss
    loss = loss_fn(y_pred,y)
    print(it,loss.item())     
    optimizer.zero_grad()
    # BP - compute the gradient
    loss.backward()     
    # update weights of parameters    
    optimizer.step()

0 675.923583984375
1 658.5440673828125
2 641.6889038085938
3 625.4530029296875
4 609.729736328125
5 594.4407958984375
6 579.5232543945312
7 565.1015014648438
8 551.1636352539062
9 537.6224975585938
10 524.482177734375
11 511.78265380859375
12 499.50042724609375
13 487.60107421875
14 476.1173095703125
15 465.1091613769531
16 454.49468994140625
17 444.1859130859375
18 434.1778564453125
19 424.42205810546875
20 414.8888244628906
21 405.6813049316406
22 396.7059631347656
23 387.9710693359375
24 379.46392822265625
25 371.1733703613281
26 363.09930419921875
27 355.24365234375
28 347.60394287109375
29 340.1535339355469
30 332.8658142089844
31 325.7673645019531
32 318.80084228515625
33 311.9539489746094
34 305.29534912109375
35 298.77288818359375
36 292.3866882324219
37 286.1257019042969
38 279.970947265625
39 273.9219665527344
40 267.98797607421875
41 262.15802001953125
42 256.41632080078125
43 250.7698211669922
44 245.21511840820312
45 239.74855041503906
46 234.4093017578125
47 229.177017211

371 0.00024070213839877397
372 0.0002287725219503045
373 0.00021743641991633922
374 0.00020666180353146046
375 0.00019641860853880644
376 0.0001866793172666803
377 0.00017742099589668214
378 0.00016862407210282981
379 0.00016025757940951735
380 0.0001523283717688173
381 0.00014474154158961028
382 0.00013755657710134983
383 0.0001307233760599047
384 0.0001242270809598267
385 0.00011805166286649182
386 0.0001121808381867595
387 0.0001065958131221123
388 0.00010129123256774619
389 9.624118683859706e-05
390 9.144146315520629e-05
391 8.688020170666277e-05
392 8.254234126070514e-05
393 7.841959450161085e-05
394 7.449988333974034e-05
395 7.077474583638832e-05
396 6.72310998197645e-05
397 6.386290624504909e-05
398 6.065960042178631e-05
399 5.761788270319812e-05
400 5.472278644447215e-05
401 5.1971448556287214e-05
402 4.935835750075057e-05
403 4.686825559474528e-05
404 4.450848791748285e-05
405 4.226114469929598e-05
406 4.0126880776369944e-05
407 3.809859117609449e-05
408 3.617152833612636e-05


In [15]:
import torch
import torch.nn as nn
N,D_in,H,D_out = 64, 1000, 100, 10 
x = torch.randn(N,D_in)
y = torch.randn(N,D_out)
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet,self).__init__()
        self.linear1 = nn.Linear(D_in,H)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(H,D_out)
    def forward(self,x):
        y_pred = self.linear2(self.relu(self.linear1(x)))
        return y_pred
model = TwoLayerNet(D_in,H,D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(),lr=learning_rate)
for it in range(500):
    y_pred = model(x) 
    loss = loss_fn(y_pred,y)
    print(it,loss.item())     
    optimizer.zero_grad()
    loss.backward()     
    optimizer.step()

0 658.7977294921875
1 641.6973876953125
2 625.0911865234375
3 608.9293823242188
4 593.2158203125
5 578.0084228515625
6 563.2932739257812
7 549.0132446289062
8 535.2071533203125
9 521.7998657226562
10 508.8052062988281
11 496.20977783203125
12 484.00714111328125
13 472.1374206542969
14 460.6215515136719
15 449.5301818847656
16 438.752197265625
17 428.2802429199219
18 418.0882568359375
19 408.18988037109375
20 398.5968933105469
21 389.26715087890625
22 380.216552734375
23 371.365234375
24 362.69903564453125
25 354.2012023925781
26 345.90057373046875
27 337.7982177734375
28 329.8778381347656
29 322.1277160644531
30 314.62701416015625
31 307.301025390625
32 300.1055908203125
33 293.0993347167969
34 286.2662353515625
35 279.583984375
36 273.064208984375
37 266.6877746582031
38 260.4410705566406
39 254.31776428222656
40 248.33193969726562
41 242.4794464111328
42 236.73109436035156
43 231.09706115722656
44 225.5972900390625
45 220.2174072265625
46 214.9424285888672
47 209.76171875
48 204.6956

432 0.00012596107262652367
433 0.00011969226761721075
434 0.00011371451546438038
435 0.0001080264919437468
436 0.00010260855196975172
437 9.745733404997736e-05
438 9.254990436602384e-05
439 8.787932893028483e-05
440 8.343705849256366e-05
441 7.920294592622668e-05
442 7.517950143665075e-05
443 7.135176565498114e-05
444 6.771265907445922e-05
445 6.425115134334192e-05
446 6.0957543610129505e-05
447 5.782699736300856e-05
448 5.485139990923926e-05
449 5.202380270930007e-05
450 4.933502714266069e-05
451 4.678054028772749e-05
452 4.4349584641167894e-05
453 4.204360084258951e-05
454 3.985356670455076e-05
455 3.77671021851711e-05
456 3.579130134312436e-05
457 3.3913231163751334e-05
458 3.21275474561844e-05
459 3.0436794986599125e-05
460 2.8830463634221815e-05
461 2.7303640308673494e-05
462 2.5856425054371357e-05
463 2.4483539164066315e-05
464 2.3178594346973114e-05
465 2.1942889361525886e-05
466 2.0771256458829157e-05
467 1.9655772121041082e-05
468 1.8600954717840068e-05
469 1.7599504644749686e
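
One thing the class-based version relies on is that layers assigned as attributes in __init__ are registered automatically, which is why model.parameters() can be handed straight to the optimizer. The sketch below repeats the TwoLayerNet definition so that it runs on its own, then inspects the registered parameters and runs inference without building a graph.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        return self.linear2(self.relu(self.linear1(x)))

model = TwoLayerNet(1000, 100, 10)

# Layers assigned as attributes are registered automatically
names = [name for name, _ in model.named_parameters()]
n_params = sum(p.numel() for p in model.parameters())  # 1000*100 + 100 + 100*10 + 10

# Inference without building a computation graph
with torch.no_grad():
    out = model(torch.randn(64, 1000))
```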

# Extension Experiments

## Four Ways to Create a PyTorch Tensor from Data

In [16]:
t = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(torch.Tensor(t))           # class constructor
print(torch.tensor(t))           # factory function
print(torch.as_tensor(t))        # factory function
print(torch.from_numpy(t))       # factory function


tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
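
The output shows one difference between the four methods (torch.Tensor converts to the default float dtype). A second difference is whether the tensor copies the array or shares its memory. The following sketch, with illustrative names t1 to t4, modifies the source array after conversion to show which tensors see the change.

```python
import numpy as np
import torch

t = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

t1 = torch.Tensor(t)       # copy, converted to the default dtype (float32)
t2 = torch.tensor(t)       # copy, keeps the NumPy dtype
t3 = torch.as_tensor(t)    # shares memory with t when dtype and device allow
t4 = torch.from_numpy(t)   # shares memory with t

t[0, 0] = 100              # modify the source array afterwards

first_elements = [t1[0, 0].item(), t2[0, 0].item(),
                  t3[0, 0].item(), t4[0, 0].item()]
```

Only the two sharing variants pick up the new value, so torch.tensor is the safer default when the source array may be modified later, and as_tensor or from_numpy avoid a copy when it will not.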


## Fitting a Degree-4 Polynomial with a Fully Connected Layer

In [17]:
from itertools import count

import torch
import torch.nn.functional as F

POLY_DEGREE = 4
W_target = torch.randn(POLY_DEGREE, 1) * 5
b_target = torch.randn(1) * 5


def make_features(x):
    """Builds features i.e. a matrix with columns [x, x^2, x^3, x^4]."""
    x = x.unsqueeze(1)
    return torch.cat([x ** i for i in range(1, POLY_DEGREE+1)], 1)


def f(x):
    """Approximated function."""
    return x.mm(W_target) + b_target.item()


def poly_desc(W, b):
    """Creates a string description of a polynomial."""
    result = 'y = '
    for i, w in enumerate(W):
        result += '{:+.2f} x^{} '.format(w, i + 1)
    result += '{:+.2f}'.format(b[0])
    return result


def get_batch(batch_size=32):
    """Builds a batch i.e. (x, f(x)) pair."""
    random = torch.randn(batch_size)
    x = make_features(random)
    y = f(x)
    return x, y


# Define model
fc = torch.nn.Linear(W_target.size(0), 1)

for batch_idx in count(1):
    # Get data
    batch_x, batch_y = get_batch()

    # Reset gradients
    fc.zero_grad()

    # Forward pass
    output = F.smooth_l1_loss(fc(batch_x), batch_y)
    loss = output.item()

    # Backward pass
    output.backward()

    # Apply gradients
    for param in fc.parameters():
        param.data.add_(-0.1 * param.grad)

    # Stop criterion
    if loss < 1e-3:
        break

print('Loss: {:.6f} after {} batches'.format(loss, batch_idx))
print('==> Learned function:\t' + poly_desc(fc.weight.view(-1), fc.bias))
print('==> Actual function:\t' + poly_desc(W_target.view(-1), b_target))

Loss: 0.000914 after 339 batches
==> Learned function:	y = -2.90 x^1 +3.22 x^2 -4.01 x^3 -5.89 x^4 -5.23
==> Actual function:	y = -2.90 x^1 +3.40 x^2 -4.01 x^3 -5.95 x^4 -5.31
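
The key step in this experiment is make_features, which turns a polynomial fit into a linear one: each input x becomes the row [x, x^2, x^3, x^4], so a single Linear layer can learn the coefficients. The sketch below checks this on a small deterministic input; the coefficients W and b are hypothetical values picked for illustration, not the random targets from the run above.

```python
import torch

POLY_DEGREE = 4

def make_features(x):
    """Columns [x, x^2, x^3, x^4] for a 1-D input tensor."""
    x = x.unsqueeze(1)
    return torch.cat([x ** i for i in range(1, POLY_DEGREE + 1)], 1)

features = make_features(torch.tensor([1.0, 2.0, 3.0]))
# features has shape (3, 4); row k holds the powers of the k-th input

W = torch.tensor([[1.0], [0.0], [0.0], [2.0]])  # hypothetical coefficients
b = 0.5                                          # hypothetical intercept
y = features.mm(W) + b                           # y = x + 2 x^4 + 0.5 per input
```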
