# Lesson 1

# What is PyTorch?
================

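PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can use GPUs
- It can be used to define deep learning models, and to train and run them flexibly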

Tensors
---------------


A Tensor is similar to NumPy's ndarray; the key difference is that a Tensor can also run on a GPU to accelerate computation.


In [2]:
import torch

In [4]:
x = torch.empty(5,3)
x

tensor([[-1.2810e-38,  6.9645e-43, -1.2810e-38],
        [ 6.9645e-43, -1.2808e-38,  6.9645e-43],
        [-1.2808e-38,  6.9645e-43, -1.2809e-38],
        [ 6.9645e-43, -1.2809e-38,  6.9645e-43],
        [-1.2809e-38,  6.9645e-43, -1.2809e-38]])

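The cell above constructs an uninitialized 5x3 matrix with ``torch.empty``; its entries are whatever values happened to be in memory. Construct a randomly initialized matrix: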

In [5]:
x = torch.rand(5,3)
x

tensor([[0.9997, 0.7619, 0.0929],
        [0.5685, 0.0303, 0.4391],
        [0.9027, 0.4066, 0.5937],
        [0.9048, 0.8424, 0.0759],
        [0.3139, 0.3125, 0.5240]])

Construct a matrix filled with zeros and of dtype long:

In [8]:
x = torch.zeros(5,3,dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

The same all-zero long matrix can be built by converting a float tensor with ``.long()``:

In [16]:
x = torch.zeros(5,3).long()
x.dtype

torch.int64

Construct a tensor directly from data:

In [18]:
x = torch.tensor([5.5,3])
x

tensor([5.5000, 3.0000])

You can also create a tensor from an existing tensor. These methods reuse properties of the input tensor, such as its dtype, unless new values are provided.

In [20]:
x = x.new_ones(5,3,dtype = torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [25]:
x = torch.randn_like(x, dtype=torch.float)
x

tensor([[-0.0405,  3.2234, -0.0687],
        [ 0.1203,  0.2263, -0.1558],
        [ 0.5672, -0.0601,  1.0564],
        [ 0.7290,  0.3666,  1.4013],
        [ 0.1513, -0.5786,  0.1419]])
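To make the "reuse properties" rule concrete, here is a minimal sketch (the variable names `base`, `ones`, and `noise` are illustrative, not from the lesson). It shows that `new_ones` inherits the dtype while taking a new size, and that `randn_like` inherits the shape while allowing the dtype to be overridden.

```python
import torch

base = torch.zeros(2, 3, dtype=torch.double)

# new_* methods take an explicit size but inherit dtype and device from `base`.
ones = base.new_ones(4, 2)

# *_like functions also inherit the shape; the dtype can still be overridden.
noise = torch.randn_like(base, dtype=torch.float)

print(ones.dtype, ones.shape)    # torch.float64 torch.Size([4, 2])
print(noise.dtype, noise.shape)  # torch.float32 torch.Size([2, 3])
```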

Get the shape of a tensor:

In [27]:
x.size()

torch.Size([5, 3])

<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is a subclass of tuple, so it supports all tuple operations.</p></div>
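Because the size behaves like a tuple, it can be unpacked and compared directly. A small sketch, with `rows` and `cols` as illustrative names:

```python
import torch

x = torch.rand(5, 3)
size = x.size()

rows, cols = size                 # tuple unpacking works on torch.Size
print(isinstance(size, tuple))    # True
print(rows * cols == x.numel())   # True: 5 * 3 elements
print(x.shape == size)            # True: .shape and .size() agree
```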

Operations


There are many tensor operations. We start with addition.



In [29]:
y = torch.rand(5,3)
y

tensor([[0.5344, 0.7352, 0.9436],
        [0.2376, 0.3519, 0.0327],
        [0.0640, 0.1556, 0.3407],
        [0.2114, 0.4380, 0.1177],
        [0.9235, 0.0059, 0.9402]])

In [30]:
x + y

tensor([[ 0.4939,  3.9585,  0.8749],
        [ 0.3579,  0.5782, -0.1231],
        [ 0.6312,  0.0955,  1.3971],
        [ 0.9403,  0.8046,  1.5191],
        [ 1.0748, -0.5728,  1.0821]])

Another way to write addition:


In [31]:
torch.add(x,y)

tensor([[ 0.4939,  3.9585,  0.8749],
        [ 0.3579,  0.5782, -0.1231],
        [ 0.6312,  0.0955,  1.3971],
        [ 0.9403,  0.8046,  1.5191],
        [ 1.0748, -0.5728,  1.0821]])

Addition: providing an output tensor as an argument

In [34]:
result = torch.empty(5,3)
torch.add(x,y, out = result)  # writes the sum into result
result

tensor([[ 0.4939,  3.9585,  0.8749],
        [ 0.3579,  0.5782, -0.1231],
        [ 0.6312,  0.0955,  1.3971],
        [ 0.9403,  0.8046,  1.5191],
        [ 1.0748, -0.5728,  1.0821]])

In-place addition:

In [35]:
y.add_(x)
y

tensor([[ 0.4939,  3.9585,  0.8749],
        [ 0.3579,  0.5782, -0.1231],
        [ 0.6312,  0.0955,  1.3971],
        [ 0.9403,  0.8046,  1.5191],
        [ 1.0748, -0.5728,  1.0821]])

<div class="alert alert-info"><h4>Note</h4><p>Any operation that mutates a tensor in place ends with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` will change ``x``.</p></div>
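The difference between `add` and `add_` can be checked by looking at the underlying storage. This is a minimal sketch; using `data_ptr()` to inspect the memory address is a choice made here for illustration.

```python
import torch

x = torch.ones(3)
y = torch.full((3,), 2.0)
before = y.data_ptr()             # address of y's first element

z = y.add(x)                      # out-of-place: returns a new tensor
unchanged = torch.equal(y, torch.full((3,), 2.0))

y.add_(x)                         # in-place: writes into y's own memory

print(unchanged)                  # True: y.add(x) left y alone
print(z.data_ptr() != before)     # True: z lives in different memory
print(y.data_ptr() == before)     # True: y was modified where it is
print(y)                          # tensor([3., 3., 3.])
```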

All the usual NumPy-style indexing works on PyTorch tensors.


In [44]:
x[:,1:]

tensor([[ 3.2234, -0.0687],
        [ 0.2263, -0.1558],
        [-0.0601,  1.0564],
        [ 0.3666,  1.4013],
        [-0.5786,  0.1419]])

In [41]:
x

tensor([[-0.0405,  3.2234, -0.0687],
        [ 0.1203,  0.2263, -0.1558],
        [ 0.5672, -0.0601,  1.0564],
        [ 0.7290,  0.3666,  1.4013],
        [ 0.1513, -0.5786,  0.1419]])

Resizing: to resize or reshape a tensor, use ``Tensor.view``:

In [66]:
x = torch.randn(4,4)
y = x.view(16)
z = x.view(-1,8)
z

tensor([[ 0.9319, -0.3206,  0.2920, -0.3281, -0.7416,  0.4262,  1.7801, -0.1777],
        [ 0.9868, -0.3992, -1.6991,  0.2019,  1.0307,  0.3333,  1.2568,  0.6295]])
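Two properties of `view` are worth seeing in isolation: the `-1` dimension is inferred from the others, and the result shares memory with the original tensor. The sketch below also shows `reshape` as a fallback for non-contiguous tensors, where `view` would raise an error.

```python
import torch

x = torch.arange(16.0).view(4, 4)
z = x.view(-1, 8)        # -1 is inferred as 16 / 8 = 2
z[0, 0] = 100.0          # a view shares memory, so this writes into x too

print(z.shape)           # torch.Size([2, 8])
print(x[0, 0].item())    # 100.0

t = x.t()                # transposing makes the tensor non-contiguous
print(t.is_contiguous()) # False
r = t.reshape(16)        # reshape copies when a view is not possible
print(r.shape)           # torch.Size([16])
```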

If a tensor has exactly one element, ``.item()`` returns its value as a Python number.

In [59]:
x = torch.randn(1)
x

tensor([-0.2440])

In [62]:
x.grad

In [64]:
x.item()

-0.2440228909254074

In [67]:
z.transpose(1,0)

tensor([[ 0.9319,  0.9868],
        [-0.3206, -0.3992],
        [ 0.2920, -1.6991],
        [-0.3281,  0.2019],
        [-0.7416,  1.0307],
        [ 0.4262,  0.3333],
        [ 1.7801,  1.2568],
        [-0.1777,  0.6295]])

**Further reading**


  Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are documented at
  `<https://pytorch.org/docs/torch>`.

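Converting between NumPy and Tensors
------------------------------------

Converting between a Torch Tensor and a NumPy array is easy.

A CPU Torch Tensor and the corresponding NumPy array share the same underlying memory, so changing one changes the other.

Converting a Torch Tensor to a NumPy array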


In [68]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [69]:
b = a.numpy()
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array:

In [70]:
b[1] = 2
b

array([1., 2., 1., 1., 1.], dtype=float32)

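Converting a NumPy ndarray to a Torch Tensor

In [4]:
import torch
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
# np.add(a,1,out = a) would modify a in place, and b would change with it
a = a + 1  # creates a new array and rebinds the name a; b keeps the old memory
print(a)

[2. 2. 2. 2. 2.]

Because ``a = a + 1`` does not write into the shared memory, ``b`` still holds ones:

In [81]:
b

tensor([1., 1., 1., 1., 1.], dtype=torch.float64)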


All Tensors on the CPU support conversion to NumPy and back.
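The memory sharing works in both directions, but only for in-place updates. The following sketch (variable names are illustrative) contrasts an in-place update, which is visible through the other object, with rebinding a name, which is not.

```python
import numpy as np
import torch

# Tensor -> NumPy: an in-place update of the tensor is visible in the array.
t = torch.ones(3)
n = t.numpy()
t.add_(1)
print(n)                 # [2. 2. 2.]

# NumPy -> Tensor: an in-place update of the array is visible in the tensor.
a = np.ones(3)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(b)                 # tensor([2., 2., 2.], dtype=torch.float64)

# Rebinding creates a new array, so the tensor no longer follows it.
a = a + 1
print(a)                 # [3. 3. 3.]
print(b)                 # tensor([2., 2., 2.], dtype=torch.float64)
```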

CUDA Tensors
------------

Tensors can be moved to another device with the ``.to`` method.



In [None]:
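if torch.cuda.is_available():
    device = torch.device("cuda")           # a CUDA device object
    y = torch.ones_like(x,device = device)  # create a tensor directly on the GPU
    x = x.to(device)                        # or move an existing tensor with .to
    z = x+y
    print(z)
    print(z.to("cpu",torch.double))         # .to can also change the dtype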

In [None]:
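# a GPU tensor must be copied to the CPU before converting to NumPy;
# the two lines below are equivalent
y.to("cpu").data.numpy()
y.cpu().data.numpy()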
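A common way to keep code runnable on machines with or without a GPU is to pick the device once and pass it around. This is a sketch of that pattern, not the lesson's own code; `detach()` is used here in place of `.data` before converting to NumPy.

```python
import torch

# Fall back to the CPU when CUDA is not available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(2, 3)
y = torch.ones_like(x, device=device)   # created on the chosen device
z = x.to(device) + y                    # both operands must be on the same device

# Copy back to the CPU before converting to a NumPy array.
z_np = z.detach().cpu().numpy()
print(z.device.type, z_np.shape)
```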

In [85]:
N, D_in, H, D_out = 64, 1000, 100, 10
x =  np.random.randn(N, 1000)
x

array([[-0.01372288, -0.29289175,  0.93514339, ...,  1.39225452,
         0.31225805, -0.91067108],
       [-0.21245621, -1.02141537, -1.57759355, ..., -1.04539356,
         1.36380161, -0.77640271],
       [-0.72229352, -0.70572455,  0.54844382, ...,  0.69908531,
         0.843915  ,  0.6811373 ],
       ...,
       [ 0.29916378,  1.54407059, -1.53596855, ...,  0.17935494,
         0.81565509,  0.03827301],
       [ 0.55192973,  2.09738736, -1.07479948, ..., -0.31945091,
         0.74205682, -0.23142615],
       [ 0.05565796,  1.41255843,  0.17214299, ...,  0.17216836,
        -0.14213673, -1.7318212 ]])


Warm-up: a two-layer neural network in NumPy
--------------------------------------------

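A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x with an L2 loss.
- $h = W_1X$
- $a = \max(0, h)$
- $\hat{y} = W_2a$

This implementation uses only NumPy to compute the forward pass, the loss, and the backward pass.
- forward pass
- loss
- backward pass

A NumPy ndarray is a plain n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is just a data structure for numerical computation.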
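The backward pass follows from the chain rule. As a sketch, the derivation below uses the row-per-sample convention of the NumPy code. That convention is an assumption relative to the formulas above: here $X$ is $N \times D_{in}$, so $h = XW_1$ and $\hat{y} = aW_2$. The loss is $L = \sum (\hat{y} - y)^2$, $\odot$ is element-wise multiplication, and $\mathbb{1}[h > 0]$ is 1 where $h$ is positive and 0 elsewhere.

$$
\begin{aligned}
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= a^{\top} \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} W_2^{\top} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h > 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top} \frac{\partial L}{\partial h}
\end{aligned}
$$

These five lines correspond to `grad_y_pred`, `grad_w2`, `grad_h_relu`, `grad_h`, and `grad_w1` in the code.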



In [90]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create training data
x =  np.random.randn(N, D_in)
y =  np.random.randn(N, D_out)

# Randomly initialize the weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    #Forward pass
    h = x.dot(w1) # (64, 1000) x (1000, 100) -> (64, 100), i.e. N * H
    h_relu = np.maximum(h,0) # ReLU keeps the shape: (64, 100)
    y_pred = h_relu.dot(w2) # (64, 100) x (100, 10) -> (64, 10), i.e. N * D_out
    
    #compute loss
    loss = np.square(y_pred - y).sum()
    print(it,loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y) # (64, 10)
    grad_w2 = h_relu.T.dot(grad_y_pred) # (100, 64) x (64, 10) -> (100, 10)
    grad_h_relu = grad_y_pred.dot(w2.T) # (64, 10) x (10, 100) -> (64, 100)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h) # (1000, 64) x (64, 100) -> (1000, 100)
    
    #update weight of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
    

0 30551527.825239252
1 30023944.209300306
2 32453485.547485694
3 32099334.532128327
4 26390499.438597858
5 17146335.673600487
6 9331097.629885877
7 4711693.55747799
8 2525803.5890038507
9 1538655.3314006797
10 1071380.5151944356
11 820403.8284598
12 664788.2984036456
13 555675.8538289461
14 472792.0440079676
15 406708.39658133825
16 352540.1744951883
17 307451.8996329936
18 269461.9197243368
19 237178.13764996215
20 209584.64382979594
21 185863.64477925087
22 165378.56203879497
23 147602.0332003029
24 132112.95787504048
25 118563.75067789965
26 106680.64579338375
27 96221.2814221456
28 86984.79628604272
29 78804.61743025237
30 71552.93815607345
31 65090.92155475961
32 59315.07858558921
33 54144.02409181126
34 49504.222101126375
35 45328.51571080658
36 41567.59138497697
37 38172.86340969414
38 35104.684443317834
39 32324.862448127853
40 29801.979273086294
41 27508.86303215069
42 25421.208121081225
43 23517.840618321243
44 21779.203454185394
45 20189.993167555902
46 18734.9530828092
47 1

420 0.0008072470793297144
421 0.0007753044189621771
422 0.0007446319246098381
423 0.0007151770210018292
424 0.0006869019847863585
425 0.0006597517601382422
426 0.0006336737467083021
427 0.0006086288460182809
428 0.0005845782731026153
429 0.000561482277601842
430 0.0005393056013930661
431 0.0005180082519392
432 0.0004975556736458194
433 0.00047791288331910687
434 0.00045905163048704835
435 0.0004409400018807113
436 0.00042355023266922023
437 0.00040684307243014445
438 0.00039079903348193743
439 0.0003753892990271348
440 0.00036058962574948275
441 0.0003463767274758682
442 0.0003327259024049605
443 0.00031961671352689474
444 0.00030702545263434753
445 0.0002949327259493741
446 0.00028332099786564893
447 0.0002721700147464095
448 0.0002614562291511229
449 0.00025116583404570926
450 0.00024128192818564129
451 0.0002317890927743651
452 0.0002226713844217438
453 0.00021391328238459258
454 0.00020550114894946623
455 0.00019742210254954936
456 0.0001896621426465122
457 0.00018221222734855948
4

In [92]:
x

array([[-0.81756989,  0.77957654,  0.20980914, ...,  0.38272713,
         0.73197843,  2.58095912],
       [-0.44349724,  1.08206152,  0.64265964, ...,  0.25258146,
        -2.44690219, -1.39414466],
       [-1.47378768,  1.23454402,  1.60597067, ..., -0.20972761,
        -0.80346951, -0.71388885],
       ...,
       [-0.14095537, -0.24309022, -0.58861802, ..., -1.02087177,
         0.43289475,  2.33867111],
       [-0.35099278,  1.20978685,  0.37698557, ...,  0.96516157,
        -0.36943134, -0.65502581],
       [-0.53798034, -0.4515874 ,  0.00719865, ..., -0.1414127 ,
        -0.72468542, -1.22918663]])

In [94]:
h = x.dot(w1) #N * H
h_relu = np.maximum(h,0) #N * H
y_pred = h_relu.dot(w2) #N * D_out

In [95]:
y_pred - y

array([[-1.70154744e-04, -2.47555359e-04, -4.22490188e-04,
         4.92379818e-04, -2.85200064e-04,  3.98119343e-04,
        -6.43929027e-05, -7.59607648e-04,  5.81418260e-04,
         2.93020528e-04],
       [ 6.63485419e-05,  3.60516027e-05,  3.68763577e-05,
        -7.41999813e-05,  7.67316965e-05, -4.52075128e-05,
        -9.46576448e-06,  1.07778900e-04, -1.27805936e-04,
        -4.82682556e-05],
       [ 4.29927937e-05,  3.37856855e-05,  6.00904251e-05,
        -2.66602939e-05, -2.17094441e-05, -2.66810162e-05,
        -6.27669572e-07,  1.02780152e-04, -2.45545646e-05,
        -2.39649823e-05],
       [-2.66301711e-04, -9.58456061e-05,  2.09162359e-04,
         2.52656405e-05, -4.11834945e-04,  4.41056944e-05,
         2.60503326e-04,  1.21003137e-04, -1.30652674e-04,
         8.86990572e-05],
       [ 3.60520171e-05,  2.58379717e-05,  1.51372936e-05,
        -2.07310834e-05,  3.03430008e-05, -4.67327260e-05,
         7.43582795e-06,  2.95287750e-05, -3.94883107e-05,
        -1.


PyTorch: Tensors
----------------

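This time we use PyTorch tensors to implement the forward pass, compute the loss, and run backpropagation.

A PyTorch Tensor is much like a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To compute on the GPU, move the Tensor to a CUDA device.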


In [96]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create training data
x =  torch.randn(N, D_in)
y =  torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    #Forward pass
    h = x.mm(w1) #N * H
    h_relu = h.clamp(min = 0) #N * H
    y_pred = h_relu.mm(w2) #N * D_out
    
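    #compute loss
    # Sum of squared errors. The log printed below was recorded with an extra
    # np.square(...) around the difference, so its values are sums of fourth powers.
    loss = (y_pred - y).pow(2).sum().item()
    print(it,loss)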
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    #update weight of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 4940065931264.0
1 3911241170944.0
2 4067188539392.0
3 3909724930048.0
4 2875357396992.0
5 1571263873024.0
6 621849346048.0
7 215734583296.0
8 66393522176.0
9 25123129344.0
10 9753358336.0
11 5438758400.0
12 2954146816.0
13 2067238016.0
14 1351061888.0
15 1014957568.0
16 723222336.0
17 555966656.0
18 414422304.0
19 322097216.0
20 246531824.0
21 193363360.0
22 150748240.0
23 119306128.0
24 94351768.0
25 75348488.0
26 60267916.0
27 48535220.0
28 39177320.0
29 31793910.0
30 25892144.0
31 21176658.0
32 17376940.0
33 14309264.0
34 11816818.0
35 9788177.0
36 8134588.5
37 6779063.0
38 5664029.0
39 4743439.0
40 3982805.75
41 3351110.25
42 2825693.0
43 2387001.5
44 2020567.75
45 1713171.5
46 1455019.375
47 1237980.625
48 1054806.125
49 900246.8125
50 769303.25
51 658328.75
52 564072.375
53 483942.0
54 415695.25
55 357477.96875
56 307768.46875
57 265258.9375
58 228795.875
59 197579.796875
60 170746.15625
61 147712.9375
62 127894.1796875
63 110814.828125
64 96100.2265625
65 83399.1328125
66 7244

443 1.4473327918096413e-11
444 1.4075240972744041e-11
445 1.3718260902240154e-11
446 1.3260881975840633e-11
447 1.2959250859923e-11
448 1.2381437021480313e-11
449 1.2200483678892482e-11
450 1.1929895439577454e-11
451 1.1599970983766639e-11
452 1.125001567597872e-11
453 1.0898517331658919e-11
454 1.0695220750700507e-11
455 1.0411028809886869e-11
456 1.008349480302595e-11
457 9.90519732207984e-12
458 9.549765492278262e-12
459 9.308010959219182e-12
460 9.06057433941454e-12
461 8.792092921761085e-12
462 8.515490396154846e-12
463 8.251347521914809e-12
464 8.104967218203196e-12
465 7.952890949958213e-12
466 7.726620558645703e-12
467 7.54758235077535e-12
468 7.391349685081927e-12
469 7.2456940621024035e-12
470 7.0046347194763214e-12
471 6.8371731887229e-12
472 6.735683625441746e-12
473 6.593682197464368e-12
474 6.505123696654014e-12
475 6.2575786566321234e-12
476 6.208531518753224e-12
477 5.947355455337977e-12
478 5.929869442700131e-12
479 5.8121701895785804e-12
480 5.6228533114321966e-12
481

A simple autograd example

In [98]:
x = torch.tensor(1.,requires_grad = True)
w = torch.tensor(2.,requires_grad = True)
b = torch.tensor(2.,requires_grad = True)

y = w*x + b # y = 2*1+2

y.backward()

#dy/dw = x = 1, dy/dx = w = 2, dy/db = 1
print(w.grad)
print(x.grad)
print(b.grad)

tensor(1.)
tensor(2.)
tensor(1.)
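One detail that matters for the training loops later on is that `backward()` accumulates into `.grad` rather than overwriting it. A minimal sketch reusing the same `x`, `w`, and `b` values:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)

(w * x + b).backward()
first = w.grad.item()     # dy/dw = x = 1.0

(w * x + b).backward()    # a second backward pass adds to the stored gradient
second = w.grad.item()    # 1.0 + 1.0 = 2.0

w.grad.zero_()            # reset, as the training loops do after each update
(w * x + b).backward()
third = w.grad.item()     # back to 1.0

print(first, second, third)   # 1.0 2.0 1.0
```

This is why the training loops call `zero_()` or `zero_grad()` on the gradients once per iteration.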



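PyTorch: Tensors and autograd
-----------------------------

A key feature of PyTorch is autograd: once the forward pass is defined and the loss computed, PyTorch can automatically compute the gradients of the loss with respect to all model parameters.

A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then ``x.grad`` is another Tensor holding the gradient of some scalar (usually the loss) with respect to ``x``.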
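To see that autograd computes the same quantities as a hand-written backward pass, the sketch below runs both on a small two-layer network and compares the results. The layer sizes and the seed are arbitrary choices made to keep the example fast.

```python
import torch

torch.manual_seed(0)
N, D_in, H, D_out = 4, 6, 5, 3          # small sizes, chosen for illustration
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

# Forward pass and loss; autograd records the graph.
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()                          # fills w1.grad and w2.grad

# The same gradients computed by hand with the chain rule.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

print(torch.allclose(w1.grad, grad_w1, atol=1e-4))
print(torch.allclose(w2.grad, grad_w2, atol=1e-4))
```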


In [108]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create training data
x =  torch.randn(N, D_in)
y =  torch.randn(N, D_out)

w1 = torch.randn(D_in, H,requires_grad = True)
w2 = torch.randn(H, D_out,requires_grad = True)

learning_rate = 1e-6
for it in range(500):
    #Forward pass
    #h = x.mm(w1) #N * H
    #h_relu = h.clamp(min = 0) #N * H
    #y_pred = h_relu.mm(w2) #N * D_out
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    #compute loss
    loss = (y_pred - y).pow(2).sum() #computation graph
    print(it,loss.item())
    
    # Backward pass
    loss.backward()
    
    #update weight of w1 and w2
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

0 30985066.0
1 27305224.0
2 28303392.0
3 29193510.0
4 26725862.0
5 20251818.0
6 12763282.0
7 7059876.5
8 3825328.75
9 2203477.5
10 1419031.75
11 1017013.625
12 788747.75
13 642744.4375
14 539250.5
15 460469.09375
16 397419.90625
17 345591.25
18 302287.09375
19 265675.40625
20 234450.25
21 207655.4375
22 184520.484375
23 164479.75
24 147038.03125
25 131813.359375
26 118452.8515625
27 106690.25
28 96299.3515625
29 87096.46875
30 78933.5703125
31 71676.03125
32 65196.59765625
33 59404.328125
34 54204.515625
35 49528.1171875
36 45316.51171875
37 41515.0234375
38 38080.8984375
39 34969.58984375
40 32148.46484375
41 29583.59375
42 27248.943359375
43 25128.1875
44 23192.48828125
45 21423.67578125
46 19805.037109375
47 18321.88671875
48 16961.6328125
49 15712.6103515625
50 14565.1826171875
51 13509.712890625
52 12538.23046875
53 11642.9375
54 10817.498046875
55 10055.498046875
56 9351.96875
57 8701.6279296875
58 8100.20849609375
59 7543.54296875
60 7028.3115234375
61 6550.80126953125
62 6108.2

413 0.00011359740165062249
414 0.00011067802552133799
415 0.0001083162787836045
416 0.00010571021994110197
417 0.00010359719453845173
418 0.00010170918540097773
419 9.953570406651124e-05
420 9.74594586296007e-05
421 9.518861770629883e-05
422 9.312803013017401e-05
423 9.087581565836444e-05
424 8.921628614189103e-05
425 8.74101315275766e-05
426 8.543151488993317e-05
427 8.397391502512619e-05
428 8.218555012717843e-05
429 8.047165465541184e-05
430 7.909603300504386e-05
431 7.70574260968715e-05
432 7.603519770782441e-05
433 7.427587843267247e-05
434 7.3219183832407e-05
435 7.149664452299476e-05
436 7.018528413027525e-05
437 6.856030086055398e-05
438 6.738928641425446e-05
439 6.627924449276179e-05
440 6.497212598333135e-05
441 6.377488898579031e-05
442 6.258565554162487e-05
443 6.166693492559716e-05
444 6.048585055395961e-05
445 5.94855155213736e-05
446 5.848223008797504e-05
447 5.699686153093353e-05
448 5.6056753237498924e-05
449 5.526830500457436e-05
450 5.439458982436918e-05
451 5.329243


PyTorch: nn
-----------


This time we use PyTorch's nn package to build the network.
The model still relies on PyTorch autograd, which builds the computation graph
and computes the gradients for us automatically.
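Two conventions of the nn package are easy to trip over: `nn.Linear` stores its weight with shape `(out_features, in_features)`, and `nn.MSELoss(reduction='sum')` is the same sum of squared errors written by hand in the earlier sections. A small sketch that checks both, with arbitrary sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 6)
y = torch.randn(4, 3)

linear = nn.Linear(6, 3, bias=False)
# The weight has shape (out_features, in_features), so the layer computes x @ W^T.
manual = x.mm(linear.weight.t())
same_output = torch.allclose(linear(x), manual, atol=1e-6)

loss_fn = nn.MSELoss(reduction='sum')
same_loss = torch.allclose(loss_fn(manual, y), (manual - y).pow(2).sum(), rtol=1e-4)

print(tuple(linear.weight.shape), same_output, same_loss)
```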




In [116]:
import torch.nn as nn
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create training data
x =  torch.randn(N, D_in)
y =  torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias = False), # x @ w_1^T, no bias term
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias = False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)
# model = model.cuda()
loss_fn = nn.MSELoss(reduction = 'sum')

learning_rate = 1e-6
for it in range(500):
    #Forward pass
    y_pred = model(x) # model.forward()
    
    #compute loss
    #loss = (y_pred - y).pow(2).sum()
    loss = loss_fn(y_pred , y) #computation graph
    print(it,loss.item())
    
    # Backward pass
    loss.backward()
    
    #update weight of w1 and w2
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
    model.zero_grad()

0 35976576.0
1 33627300.0
2 32823382.0
3 28503330.0
4 20755568.0
5 12552720.0
6 6928441.0
7 3844890.75
8 2346761.75
9 1603154.5
10 1201395.5
11 955295.5
12 786485.4375
13 660621.8125
14 561923.875
15 482156.21875
16 416336.0625
17 361328.5625
18 315093.84375
19 275892.4375
20 242461.75
21 213820.5625
22 189150.734375
23 167778.625
24 149233.734375
25 133098.390625
26 119017.859375
27 106668.7578125
28 95801.0625
29 86214.9375
30 77737.640625
31 70218.515625
32 63533.0234375
33 57575.4453125
34 52267.09765625
35 47525.546875
36 43276.03515625
37 39460.28125
38 36023.1328125
39 32926.234375
40 30129.91796875
41 27601.11328125
42 25309.998046875
43 23231.728515625
44 21345.646484375
45 19630.3125
46 18069.26171875
47 16644.541015625
48 15344.974609375
49 14158.7763671875
50 13073.4599609375
51 12081.248046875
52 11172.927734375
53 10339.7197265625
54 9575.65625
55 8873.43359375
56 8228.0048828125
57 7634.017578125
58 7087.06787109375
59 6583.29052734375
60 6118.52392578125
61 5689.7773437

427 0.0005582190933637321
428 0.0005441140383481979
429 0.0005317627219483256
430 0.0005190843949094415
431 0.0005072311614640057
432 0.0004952414310537279
433 0.00048497726675122976
434 0.0004741216544061899
435 0.0004623241547960788
436 0.0004512745072133839
437 0.00044077870552428067
438 0.0004310616059228778
439 0.00042123778257519007
440 0.00041166678420268
441 0.0004021768399979919
442 0.00039277743780985475
443 0.0003847598854918033
444 0.0003766242880374193
445 0.00036834037746302783
446 0.0003604974190238863
447 0.00035385246155783534
448 0.00034578604390844703
449 0.0003388909390196204
450 0.0003312654444016516
451 0.00032403459772467613
452 0.00031741790007799864
453 0.0003101934271398932
454 0.0003039841540157795
455 0.0002976985415443778
456 0.0002920374390669167
457 0.00028616131749004126
458 0.0002809765574056655
459 0.0002751834108494222
460 0.0002690095570869744
461 0.00026390113634988666
462 0.00025884530623443425
463 0.0002540227142162621
464 0.00024915579706430435
4

In [115]:
model[0].weight

Parameter containing:
tensor([[ 0.0189,  0.0211, -0.0144,  ..., -0.0286, -0.0072,  0.0149],
        [-0.0071,  0.0031,  0.0112,  ..., -0.0170,  0.0054, -0.0284],
        [ 0.0111,  0.0052,  0.0021,  ...,  0.0145, -0.0083, -0.0149],
        ...,
        [-0.0137,  0.0052, -0.0046,  ..., -0.0254,  0.0240,  0.0281],
        [-0.0324, -0.0172, -0.0127,  ..., -0.0076, -0.0252, -0.0250],
        [-0.0306,  0.0014, -0.0071,  ..., -0.0128,  0.0087,  0.0175]],
       requires_grad=True)


PyTorch: optim
--------------

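This time we no longer update the model's weights by hand; instead, the optim package updates the parameters for us.
The optim package provides many optimization algorithms, including SGD with momentum, RMSProp, and Adam.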
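An optimizer step replaces the manual `param -= learning_rate * param.grad` update. As a sketch, plain SGD (chosen here because its update rule is the simplest to verify, unlike Adam) gives exactly the hand-written result:

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
w_manual = w.detach().clone()        # independent copy for the manual update
lr = 0.1
optimizer = torch.optim.SGD([w], lr=lr)

loss = (w ** 2).sum()
optimizer.zero_grad()
loss.backward()
grad = w.grad.clone()                # gradient of sum(w^2) is 2 * w
optimizer.step()                     # w <- w - lr * grad

w_manual -= lr * grad                # the same update written by hand
print(torch.allclose(w.detach(), w_manual))
```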


In [128]:
import torch.nn as nn
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create training data
x =  torch.randn(N, D_in)
y =  torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias = False), # x @ w_1^T, no bias term
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias = False),
)

# model = model.cuda()
loss_fn = nn.MSELoss(reduction = 'sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
for it in range(500):
    #Forward pass
    y_pred = model(x) # model.forward()
    
    #compute loss
    #loss = (y_pred - y).pow(2).sum()
    loss = loss_fn(y_pred , y) #computation graph
    print(it,loss.item())
    
    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    #update model parameters
    optimizer.step()

0 670.28271484375
1 653.2674560546875
2 636.701904296875
3 620.6353149414062
4 604.9697875976562
5 589.724853515625
6 574.9131469726562
7 560.5047607421875
8 546.5084838867188
9 532.9190673828125
10 519.6795654296875
11 506.774658203125
12 494.2572937011719
13 482.1623840332031
14 470.4052734375
15 459.0059814453125
16 447.9474792480469
17 437.2471008300781
18 426.91534423828125
19 416.9652099609375
20 407.26171875
21 397.7615051269531
22 388.4757995605469
23 379.4512023925781
24 370.7007751464844
25 362.13531494140625
26 353.7283630371094
27 345.51910400390625
28 337.5329284667969
29 329.72039794921875
30 322.08001708984375
31 314.61273193359375
32 307.2954406738281
33 300.1372985839844
34 293.1665954589844
35 286.3602294921875
36 279.70452880859375
37 273.1783447265625
38 266.7779541015625
39 260.50146484375
40 254.40106201171875
41 248.45089721679688
42 242.6266326904297
43 236.93707275390625
44 231.383544921875
45 225.9230499267578
46 220.58612060546875
47 215.34007263183594
48 210


PyTorch: custom nn Modules
--------------------------

We can define a model as a class that inherits from nn.Module. Models more complex than a Sequential model need to be defined as an nn.Module subclass.
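As an example of something a plain Sequential model cannot express, the sketch below adds the input back to the output (a skip connection). The class `ResidualNet` and its sizes are hypothetical, introduced only for illustration. Layers assigned as attributes in `__init__` are registered automatically, so `model.parameters()` finds their weights.

```python
import torch
import torch.nn as nn

class ResidualNet(nn.Module):  # hypothetical example model
    def __init__(self, dim, hidden):
        super().__init__()
        # Submodules assigned as attributes are registered automatically.
        self.linear1 = nn.Linear(dim, hidden, bias=False)
        self.linear2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # The input is reused after the two layers, which Sequential cannot do.
        return x + self.linear2(self.linear1(x).clamp(min=0))

model = ResidualNet(dim=8, hidden=4)
out = model(torch.randn(2, 8))       # calling the model runs forward()
names = [name for name, _ in model.named_parameters()]
print(out.shape, names)
```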



In [129]:
import torch.nn as nn
N, D_in, H, D_out = 64, 1000, 100, 10

# Randomly create training data
x =  torch.randn(N, D_in)
y =  torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        #define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias = False)
        self.linear2 = torch.nn.Linear(H, D_out, bias = False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred
    
model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction = 'sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)

for it in range(500):
    #Forward pass
    y_pred = model(x) # calling the model runs model.forward(x)
    
    #compute loss
    loss = loss_fn(y_pred , y) #computation graph
    print(it,loss.item())
    
    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    #update model parameters
    optimizer.step()

0 635.7352294921875
1 619.595703125
2 603.8748779296875
3 588.588134765625
4 573.8768920898438
5 559.69921875
6 545.8568725585938
7 532.421142578125
8 519.4111328125
9 506.75469970703125
10 494.425537109375
11 482.4440002441406
12 470.81011962890625
13 459.50860595703125
14 448.51483154296875
15 437.7812805175781
16 427.3119812011719
17 417.1290588378906
18 407.22442626953125
19 397.5800476074219
20 388.2611083984375
21 379.1764831542969
22 370.309814453125
23 361.6200256347656
24 353.1167907714844
25 344.8515319824219
26 336.77587890625
27 328.8753662109375
28 321.1671142578125
29 313.6334533691406
30 306.2353515625
31 298.9990539550781
32 291.93536376953125
33 285.02032470703125
34 278.2306823730469
35 271.596435546875
36 265.1070556640625
37 258.729248046875
38 252.48605346679688
39 246.37570190429688
40 240.369384765625
41 234.49746704101562
42 228.75302124023438
43 223.11997985839844
44 217.58287048339844
45 212.15371704101562
46 206.8564910888672
47 201.66400146484375
48 196.5764

397 3.2507814466953278e-06
398 3.018275947397342e-06
399 2.800705487970845e-06
400 2.598434548417572e-06
401 2.4115831820381572e-06
402 2.237325361420517e-06
403 2.0756374397024047e-06
404 1.9246449483034667e-06
405 1.7852219116321066e-06
406 1.6547564882785082e-06
407 1.5342892538683373e-06
408 1.4219443755791872e-06
409 1.3183762348489836e-06
410 1.221763227476913e-06
411 1.1322222235321533e-06
412 1.0488799944141647e-06
413 9.7137115062651e-07
414 8.998435987450648e-07
415 8.333087180290022e-07
416 7.714391472291027e-07
417 7.145475251491007e-07
418 6.616515975110815e-07
419 6.122661488916492e-07
420 5.666951210514526e-07
421 5.244687599770259e-07
422 4.854053372582712e-07
423 4.4906940388500516e-07
424 4.1540326378708414e-07
425 3.8418258441197395e-07
426 3.5517132346285507e-07
427 3.2832812735250627e-07
428 3.036265070477384e-07
429 2.8073725388821913e-07
430 2.5945669790417014e-07
431 2.3975800900188915e-07
432 2.2162689106153266e-07
433 2.048563203516096e-07
434 1.89198956945801