# PyTorch

What is PyTorch?
================

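PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can also use GPUs
- It can be used to define deep learning models, and to train and use those models flexibly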

Tensors
---------------


A Tensor is similar to NumPy's ndarray. The key difference is that a Tensor can also run on a GPU to accelerate computation.


In [1]:
import torch

Construct an uninitialized 5x3 matrix:

In [3]:
x = torch.empty(5,3)
x

tensor([[ 0.0000e+00, -8.5899e+09,  6.1755e+06],
        [-1.5849e+29,  1.1210e-44,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00]])

Construct a randomly initialized matrix:

In [5]:
x = torch.rand(5,3)
x

tensor([[0.6810, 0.3265, 0.8178],
        [0.6543, 0.3534, 0.2280],
        [0.3134, 0.5683, 0.2908],
        [0.8424, 0.9082, 0.4896],
        [0.7684, 0.7449, 0.5811]])

Construct a matrix filled with zeros and of dtype long:

In [8]:
x = torch.zeros(5,3,dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [12]:
x = torch.zeros(5,3).long()
x.dtype

torch.int64

Construct a tensor directly from data:

In [13]:
x = torch.tensor([5.5,3])
x

tensor([5.5000, 3.0000])

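You can also create a tensor from an existing tensor. These methods reuse the properties of the original tensor, such as its dtype, unless new values are provided.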

In [14]:
x = x.new_ones(5,3, dtype=torch.double)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)

In [23]:
x = torch.randn_like(x, dtype=torch.float)
x

tensor([[-0.8010,  0.8457, -1.4586],
        [ 1.1330,  1.4742, -1.0887],
        [-1.0819, -2.2549, -0.3417],
        [-0.5484, -1.0932,  0.1838],
        [ 0.4091, -1.0085,  0.3517]])
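As a small sketch of what "reusing the properties" means, the snippet below checks the dtype of tensors created with ``new_ones`` and ``randn_like``. The variable names (``base``, ``same_dtype``, ``other_dtype``, ``like``) are illustrative and do not appear in the original cells.

```python
import torch

base = torch.zeros(2, 2, dtype=torch.double)

# new_ones inherits the dtype (and device) of base unless told otherwise
same_dtype = base.new_ones(5, 3)
# passing dtype explicitly overrides the inherited value
other_dtype = base.new_ones(5, 3, dtype=torch.int32)
# randn_like copies both the shape and the dtype of base
like = torch.randn_like(base)

print(same_dtype.dtype, other_dtype.dtype, like.dtype, like.shape)
```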

Get the shape of a tensor:

In [24]:
x.shape

torch.Size([5, 3])

<div class="alert alert-info"><h4>Note</h4><p>``torch.Size`` is a tuple (more precisely, a subclass of ``tuple``).</p></div>
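Because the shape is a tuple, the usual tuple operations apply to it. A minimal sketch, with illustrative variable names:

```python
import torch

x = torch.zeros(5, 3)

rows, cols = x.shape                    # tuple unpacking
is_tuple = isinstance(x.shape, tuple)   # torch.Size subclasses tuple
num_dims = len(x.shape)                 # number of dimensions

print(rows, cols, is_tuple, num_dims)
```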

Operations
----------


There are many tensor operations. We start with addition.



In [25]:
y = torch.rand(5,3)
y

tensor([[0.8180, 0.3385, 0.9888],
        [0.5876, 0.7394, 0.5713],
        [0.1612, 0.4179, 0.5265],
        [0.4128, 0.7043, 0.4860],
        [0.6781, 0.3320, 0.2424]])

In [26]:
x + y

tensor([[ 0.0170,  1.1842, -0.4698],
        [ 1.7206,  2.2136, -0.5174],
        [-0.9206, -1.8370,  0.1848],
        [-0.1356, -0.3889,  0.6699],
        [ 1.0871, -0.6764,  0.5940]])

Another way to write addition:


In [27]:
torch.add(x, y)

tensor([[ 0.0170,  1.1842, -0.4698],
        [ 1.7206,  2.2136, -0.5174],
        [-0.9206, -1.8370,  0.1848],
        [-0.1356, -0.3889,  0.6699],
        [ 1.0871, -0.6764,  0.5940]])

Addition: passing an output tensor as an argument

In [28]:
result = torch.empty(5,3)
torch.add(x, y, out=result)
# result = x + y
result

tensor([[ 0.0170,  1.1842, -0.4698],
        [ 1.7206,  2.2136, -0.5174],
        [-0.9206, -1.8370,  0.1848],
        [-0.1356, -0.3889,  0.6699],
        [ 1.0871, -0.6764,  0.5940]])

In-place addition:

In [31]:
y.add_(x)
y

tensor([[-0.7840,  2.0298, -1.9284],
        [ 2.8536,  3.6878, -1.6061],
        [-2.0025, -4.0918, -0.1569],
        [-0.6840, -1.4821,  0.8537],
        [ 1.4962, -1.6849,  0.9457]])

<div class="alert alert-info"><h4>Note</h4><p>Any operation that modifies a tensor in place ends with ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` both change ``x``.</p></div>
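The difference between the two forms can be seen in a short sketch (variable names are illustrative):

```python
import torch

x = torch.ones(3)
y = torch.ones(3)

z = y.add(x)           # out-of-place: returns a new tensor, y is unchanged
y_before = y.clone()   # snapshot of y before the in-place call
y.add_(x)              # in-place: y itself now holds the sum

print(y_before, y, z)
```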

All the usual NumPy-style indexing works on PyTorch tensors.


In [30]:
x[1:, 1:]

tensor([[ 1.4742, -1.0887],
        [-2.2549, -0.3417],
        [-1.0932,  0.1838],
        [-1.0085,  0.3517]])

Resizing: to resize or reshape a tensor, use the ``view`` method (``Tensor.view``):

In [32]:
x = torch.randn(4,4)
y = x.view(16)
z = x.view(-1,8)
z

tensor([[ 1.6900, -1.7227, -0.6280,  1.3029,  0.6015,  0.6181,  0.1487, -0.8255],
        [-0.8715,  0.8789,  0.1611, -0.1951, -0.3434, -1.0511,  1.0028,  1.5693]])
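Two details of ``view`` are worth a sketch: a ``-1`` dimension is inferred from the others, and the result shares data with the original tensor rather than copying it. The example also uses ``reshape``, a related method that falls back to copying when a view is not possible (for instance after a transpose). Variable names are illustrative.

```python
import torch

x = torch.arange(16.0).view(4, 4)
z = x.view(-1, 8)          # -1 is inferred as 16 / 8 = 2

z[0, 0] = 100.0            # writing through the view ...
first = x[0, 0].item()     # ... is visible in the original tensor

flat = x.t().reshape(16)   # reshape also handles non-contiguous tensors

print(z.shape, first, flat.shape)
```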

If you have a one-element tensor, use the ``.item()`` method to get its value as a Python number.

In [33]:
x = torch.randn(1)
x

tensor([-0.9940])

In [34]:
x.item()

-0.9939664006233215

In [35]:
z.transpose(1,0)

tensor([[ 1.6900, -0.8715],
        [-1.7227,  0.8789],
        [-0.6280,  0.1611],
        [ 1.3029, -0.1951],
        [ 0.6015, -0.3434],
        [ 0.6181, -1.0511],
        [ 0.1487,  1.0028],
        [-0.8255,  1.5693]])

**Read more**


  Many more Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described at
  `<https://pytorch.org/docs/torch>`.

Converting between NumPy arrays and Tensors
--------------------------------------------

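Converting between a Torch Tensor and a NumPy array is easy.

The Torch Tensor and the NumPy array share their underlying memory, so changing one also changes the other.

Converting a Torch Tensor to a NumPy array: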


In [36]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [37]:
b = a.numpy()
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array:

In [38]:
b[1] = 2
b

array([1., 2., 1., 1., 1.], dtype=float32)

In [39]:
a

tensor([1., 2., 1., 1., 1.])

Converting a NumPy ndarray to a Torch Tensor:

In [41]:
import numpy as np

In [42]:
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)

[2. 2. 2. 2. 2.]


In [43]:
b

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

All Tensors on the CPU support conversion to NumPy and back.
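The memory sharing applies to ``torch.from_numpy`` (and ``Tensor.numpy``). If an independent copy is wanted, ``torch.tensor`` copies the data instead. A minimal sketch with illustrative names:

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # copies the data out of a

a[0] = 5.0                     # modify the NumPy array in place

print(shared)                  # reflects the change
print(copied)                  # keeps the original values
```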

CUDA Tensors
------------

Tensors can be moved to another device with the ``.to`` method.



In [46]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)
    x = x.to(device)
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))

In [45]:
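# A CUDA tensor must be moved to the CPU before it can be converted to NumPy.
# The two lines below are equivalent ways of doing that.
y.to("cpu").data.numpy()
y.cpu().data.numpy()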

array([ 1.6899612 , -1.7227192 , -0.6280245 ,  1.3029416 ,  0.6015244 ,
        0.61805296,  0.14865094, -0.82548445, -0.87150854,  0.8788831 ,
        0.16114956, -0.1951438 , -0.34339398, -1.0510595 ,  1.0027655 ,
        1.5692511 ], dtype=float32)
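A common pattern, sketched here under the assumption that the same script should run with or without a GPU, is to pick the device once and pass it along. The names ``device``, ``x``, ``y`` and ``z`` are reused only for illustration.

```python
import torch

# Fall back to the CPU when no CUDA device is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(4, 4, device=device)   # created directly on the chosen device
y = torch.ones_like(x)                 # ones_like inherits the device of x
z = (x + y).cpu().numpy()              # move back to the CPU before converting

print(device, z.shape)
```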

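A model can be moved to the GPU in a similar way with ``model = model.cuda()``. This requires that ``model`` has already been defined; calling it before then raises a ``NameError``.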


Warm-up: a two-layer neural network in NumPy
---------------------------------------------

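A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x with an L2 loss.
- $h = W_1 X$
- $a = \max(0, h)$
- $\hat{y} = W_2 a$

The code below stores one sample per row, so it computes the equivalent products as ``x.dot(w1)`` and ``h_relu.dot(w2)``.

This implementation uses only NumPy to compute the forward pass, the loss, and the backward pass.
- forward pass
- loss
- backward pass

A NumPy ndarray is a plain n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is just a data structure for numerical computation.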



In [48]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.dot(w1) # N * H
    h_relu = np.maximum(h, 0) # N * H
    y_pred = h_relu.dot(w2) # N * D_out
    
    # compute loss
    loss = np.square(y_pred - y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 24384482.14371205
1 19059805.49289798
2 18034374.8630774
3 18662473.798492998
4 19303319.569653958
5 18535229.859591052
6 15906280.69537563
7 12003687.405007564
8 8117287.816758016
9 5087255.277786513
10 3096115.954114371
11 1904442.447443176
12 1222094.314980349
13 831788.1684918577
14 602748.8411453317
15 461530.9533488357
16 369058.5342357412
17 304555.4725390519
18 256821.1522600044
19 219781.09648833825
20 190042.20159645443
21 165509.73662128946
22 144908.6828892689
23 127399.64971904151
24 112375.40280014066
25 99427.54925781919
26 88214.06915295873
27 78452.36294916696
28 69939.56122515562
29 62484.218852655315
30 55928.75251508532
31 50153.577052188935
32 45049.472681930834
33 40536.043306527674
34 36535.6211116328
35 32980.148758412775
36 29808.927438943556
37 26976.536151415345
38 24443.8779491549
39 22176.441753753457
40 20142.599897801723
41 18316.48101959226
42 16673.293786171816
43 15194.291578579428
44 13859.368379717747
45 12653.046642417408
46 11562.397296519595
...
401 5.870239609845616e-06
402 5.560355718681156e-06
403 5.267211078481371e-06
404 4.989285568898259e-06
405 4.726018751973308e-06
406 4.476662862036842e-06
407 4.240677246996758e-06
408 4.016959375579613e-06
409 3.8050922878890495e-06
410 3.604436481898322e-06
411 3.4145021632136933e-06
412 3.2344376362397955e-06
413 3.0639059275772343e-06
414 2.902386565583021e-06
415 2.7495593626482187e-06
416 2.604641795615457e-06
417 2.467373763033352e-06
418 2.337342938496234e-06
419 2.2142777613428707e-06
420 2.0976112189132267e-06
421 1.987108393361011e-06
422 1.8824288768738431e-06
423 1.7833547207003406e-06
424 1.6894299138143816e-06
425 1.6004628885075811e-06
426 1.5161888593198963e-06
427 1.4364227470813827e-06
428 1.3607998778130701e-06
429 1.2891586938686508e-06
430 1.2213075924184903e-06
431 1.157069043509599e-06
432 1.096166305960946e-06
433 1.0384785185977176e-06
434 9.83838195467848e-07
435 9.321163546946076e-07
436 8.830717397612797e-07
437 8.366156878394616e-07
...
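A hand-written backward pass is easy to get wrong, so it can help to compare it against finite differences. The sketch below is not part of the original notebook: it uses much smaller sizes, a fixed seed, and a hypothetical helper ``loss_fn``, and it checks only the gradient with respect to ``w1``.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2            # tiny sizes keep the check fast
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    """Sum-of-squares loss of the two-layer ReLU network."""
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradient, using the same formulas as the training loop
h = x.dot(w1)
h_relu = np.maximum(h, 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

# Numerical gradient by central finite differences
eps = 1e-6
num_grad_w1 = np.zeros_like(w1)
for i in range(D_in):
    for j in range(H):
        w_plus, w_minus = w1.copy(), w1.copy()
        w_plus[i, j] += eps
        w_minus[i, j] -= eps
        num_grad_w1[i, j] = (loss_fn(w_plus, w2) - loss_fn(w_minus, w2)) / (2 * eps)

max_err = np.abs(num_grad_w1 - grad_w1).max()
print(max_err)   # should be very small if the backward pass is correct
```

The same loop over ``w2`` would check ``grad_w2``. The check can be unreliable if a pre-activation value happens to sit extremely close to zero, where ReLU is not differentiable.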


PyTorch: Tensors
----------------

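This time we use PyTorch tensors to implement the forward pass, compute the loss, and run backpropagation.

A PyTorch Tensor is very much like a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To run on the GPU, the Tensor has to be moved to a CUDA device.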
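The PyTorch loop that follows mirrors the NumPy version almost line by line. As a rough guide, the sketch below pairs each NumPy call with the PyTorch call that replaces it, on a small hand-made matrix (the values and variable names are illustrative only).

```python
import numpy as np
import torch

h_np = np.array([[-1.0, 2.0], [3.0, -4.0]])
w_np = np.array([[1.0, 0.5], [-2.0, 1.5]])
h_t = torch.from_numpy(h_np)
w_t = torch.from_numpy(w_np)

prod_np = h_np.dot(w_np)         # matrix product in NumPy
prod_t = h_t.mm(w_t)             # matrix product in PyTorch

relu_np = np.maximum(h_np, 0)    # ReLU in NumPy
relu_t = h_t.clamp(min=0)        # ReLU in PyTorch

tr_np = h_np.T                   # transpose in NumPy
tr_t = h_t.t()                   # transpose in PyTorch

copy_np = h_np.copy()            # independent copy in NumPy
copy_t = h_t.clone()             # independent copy in PyTorch
```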


In [49]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1) # N * H
    h_relu = h.clamp(min=0) # N * H
    y_pred = h_relu.mm(w2) # N * D_out
    
    # compute loss
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 27552600.0
1 22831500.0
2 21362678.0
3 20149068.0
4 17736432.0
5 14087596.0
6 10045004.0
7 6608660.5
8 4163358.0
9 2618555.25
10 1693753.0
11 1149169.25
12 822687.625
13 619761.375
14 486789.71875
15 394854.6875
16 327775.3125
17 276598.1875
18 236175.453125
19 203422.96875
20 176344.21875
21 153640.03125
22 134405.609375
23 117988.453125
24 103889.1640625
25 91723.203125
26 81191.328125
27 72036.90625
28 64059.60546875
29 57078.66796875
30 50951.234375
31 45559.4921875
32 40806.6953125
33 36607.88671875
34 32892.00390625
35 29596.123046875
36 26665.529296875
37 24055.271484375
38 21726.935546875
39 19646.58203125
40 17778.990234375
41 16106.2607421875
42 14606.61328125
43 13259.3623046875
44 12047.55859375
45 10958.0732421875
46 9975.599609375
47 9088.96875
48 8288.0234375
49 7563.56591796875
50 6907.91259765625
51 6314.69384765625
52 5777.55615234375
53 5289.7978515625
54 4846.4189453125
55 4443.14404296875
56 4076.10302734375
57 3741.80419921875
58 3436.989501953125
59 3158.839111
...
404 6.249840225791559e-05
405 6.146811210783198e-05
406 6.023237438057549e-05
407 5.925857840338722e-05
408 5.822923412779346e-05
409 5.720651097362861e-05
410 5.63364228582941e-05
411 5.562525984714739e-05
412 5.501155828824267e-05
413 5.390182923292741e-05
414 5.2765550208278e-05
415 5.2291576139396057e-05
416 5.1133480155840516e-05
417 5.0601749535417184e-05
418 4.979971708962694e-05
419 4.924544919049367e-05
420 4.8309422709280625e-05
421 4.7648973122704774e-05
422 4.6902849135221913e-05
423 4.627078305929899e-05
424 4.552870086627081e-05
425 4.478705159272067e-05
426 4.41436204710044e-05
427 4.3751413613790646e-05
428 4.299344072933309e-05
429 4.220712435198948e-05
430 4.1665050957817584e-05
431 4.129927037865855e-05
432 4.056377656525001e-05
433 3.998534884885885e-05
434 3.9447091694455594e-05
435 3.883754834532738e-05
436 3.83113874704577e-05
437 3.785230001085438e-05
438 3.7338420952437446e-05
439 3.6733024899149314e-05
440 3.638087582658045e-05
441 3.5974764614365995e-05
...

A simple autograd example:

In [50]:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w*x + b # y = 2*1+3

y.backward()

print(w.grad)  # dy/dw = x = 1
print(x.grad)  # dy/dx = w = 2
print(b.grad)  # dy/db = 1


tensor(1.)
tensor(2.)
tensor(1.)
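One behaviour that matters in training loops is that ``backward()`` adds to ``.grad`` rather than overwriting it, which is why gradients are zeroed after each update. A minimal sketch with illustrative names:

```python
import torch

x = torch.tensor(1.0)
w = torch.tensor(2.0, requires_grad=True)

(w * x).backward()
first = w.grad.item()     # dy/dw = x = 1

(w * x).backward()        # a second backward pass adds to w.grad
second = w.grad.item()    # 1 + 1 = 2

w.grad.zero_()            # reset the accumulated gradient
third = w.grad.item()

print(first, second, third)
```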



PyTorch: Tensors and autograd
-----------------------------

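An important feature of PyTorch is autograd: once the forward pass is defined and the loss is computed, PyTorch can automatically compute the gradients of the loss with respect to all model parameters.

A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then after a backward pass ``x.grad`` is another Tensor holding the gradient of some scalar (usually the loss) with respect to ``x``.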
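The sketch below illustrates these points on a tiny example: ``.grad`` is empty until ``backward()`` runs, the result matches the hand-derived gradient, and operations inside ``torch.no_grad()`` are not tracked. Variable names are illustrative.

```python
import torch

w = torch.randn(3, requires_grad=True)
loss = (w ** 2).sum()             # scalar built from w, so it joins the graph

tracked = loss.requires_grad      # True: loss depends on a tracked tensor
grad_before = w.grad              # None: no backward pass has run yet

loss.backward()
# d(sum w^2)/dw = 2w, so the stored gradient should match 2 * w
grad_ok = torch.allclose(w.grad, 2 * w.detach())

with torch.no_grad():
    v = w * 2                     # not recorded in the graph
untracked = not v.requires_grad

print(tracked, grad_before, grad_ok, untracked)
```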


In [51]:
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()

0 28299048.0
1 23773848.0
2 22744752.0
3 21905120.0
4 19649854.0
5 15638536.0
6 11072706.0
7 7138689.5
8 4427779.5
9 2753997.25
10 1788518.25
11 1231345.5
12 901943.1875
13 695891.625
14 558676.6875
15 461274.40625
16 388331.75
17 331332.625
18 285398.9375
19 247551.453125
20 215888.609375
21 189121.125
22 166300.453125
23 146718.046875
24 129826.359375
25 115199.1171875
26 102478.0
27 91368.734375
28 81641.3984375
29 73104.234375
30 65590.7421875
31 58956.76953125
32 53084.37890625
33 47891.0234375
34 43277.296875
35 39167.4453125
36 35500.9375
37 32221.259765625
38 29285.00390625
39 26650.982421875
40 24283.328125
41 22152.609375
42 20233.28125
43 18500.203125
44 16933.744140625
45 15516.5244140625
46 14236.875
47 13074.3955078125
48 12017.85546875
49 11056.7421875
50 10181.212890625
51 9382.912109375
52 8654.2939453125
53 7988.30078125
54 7379.11328125
55 6821.57470703125
56 6310.43408203125
57 5841.67041015625
58 5411.4658203125
59 5016.1220703125
60 4652.5615234375
61 4317.8793945
...
371 0.000430035637691617
372 0.00041690855869092047
373 0.0004039099731016904
374 0.00039074532105587423
375 0.00037839720607735217
376 0.00036655631265603006
377 0.00035635297535918653
378 0.00034505853545852005
379 0.0003353143692947924
380 0.0003256569616496563
381 0.00031681396649219096
382 0.00030688545666635036
383 0.00029813003493472934
384 0.00028884338098578155
385 0.00028048918466083705
386 0.00027256313478574157
387 0.0002644407795742154
388 0.0002581024600658566
389 0.000250981334829703
390 0.0002443348348606378
391 0.000237829823163338
392 0.00023110542679205537
393 0.00022483438078779727
394 0.00021909073984716088
395 0.00021301768720149994
396 0.00020745050278492272
397 0.00020246249914634973
398 0.00019726429309230298
399 0.0001921433868119493
400 0.00018728544819168746
401 0.0001826612133299932
402 0.00017783776274882257
403 0.0001731167285470292
404 0.00016890570987015963
405 0.00016485506785102189
406 0.00016081782814580947
407 0.00015721323143225163
408 0.0001534108


PyTorch: nn
-----------


This time we use PyTorch's ``nn`` package to build the network.
We still rely on PyTorch autograd to build the computation graph,
and PyTorch computes the gradients for us automatically.




In [52]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # computes x @ w_1.T (no bias term)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)  # initialization 
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()   if using gpu

loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        for param in model.parameters(): # param (tensor, grad)
            param -= learning_rate * param.grad
            
    model.zero_grad()

0 23218340.0
1 15432958.0
2 11853179.0
3 10020764.0
4 8970540.0
5 8208570.5
6 7487490.5
7 6685167.0
8 5799485.0
9 4871694.0
10 3974821.0
11 3162712.5
12 2472715.0
13 1910447.75
14 1469828.25
15 1130703.125
16 874145.5
17 681080.5625
18 536303.875
19 427299.90625
20 344746.8125
21 281665.34375
22 232997.5
23 194964.140625
24 164897.21875
25 140824.5
26 121315.2265625
27 105312.5703125
28 92054.5078125
29 80938.109375
30 71540.6484375
31 63518.359375
32 56619.83984375
33 50639.734375
34 45428.33984375
35 40862.44921875
36 36845.39453125
37 33298.2265625
38 30148.783203125
39 27344.07421875
40 24840.05859375
41 22597.23046875
42 20583.4921875
43 18772.28515625
44 17139.5859375
45 15665.26171875
46 14331.98828125
47 13124.224609375
48 12027.958984375
49 11032.3916015625
50 10127.619140625
51 9303.9150390625
52 8553.17578125
53 7868.5791015625
54 7244.4111328125
55 6674.6123046875
56 6153.46533203125
57 5676.36572265625
58 5239.31005859375
59 4838.67236328125
60 4471.1640625
61 4133.8398437
...
370 0.0003891066589858383
371 0.0003783211577683687
372 0.0003680475929286331
373 0.0003573755093384534
374 0.00034712901106104255
375 0.0003365539596416056
376 0.0003268725995440036
377 0.00031859398586675525
378 0.0003101077163591981
379 0.0003017225826624781
380 0.0002943536383099854
381 0.00028593529714271426
382 0.0002775838947854936
383 0.0002709585824050009
384 0.0002634820411913097
385 0.00025676918448880315
386 0.0002497597015462816
387 0.00024287670385092497
388 0.00023676559794694185
389 0.0002311005664523691
390 0.00022483730572275817
391 0.00021936447592452168
392 0.00021399851539172232
393 0.00020864466205239296
394 0.00020373743609525263
395 0.00019801179587375373
396 0.00019345644977875054
397 0.0001886338141048327
398 0.00018443366570863873
399 0.0001796508440747857
400 0.0001758566068019718
401 0.0001717096020001918
402 0.00016780183068476617
403 0.0001635204826015979
404 0.0001601826079422608
405 0.00015616683231201023
406 0.00015219485794659704
407 0.000149223269545

In [113]:
model[0].weight

Parameter containing:
tensor([[-0.0218,  0.0212,  0.0243,  ...,  0.0230,  0.0247,  0.0168],
        [-0.0144,  0.0177, -0.0221,  ...,  0.0161,  0.0098, -0.0172],
        [ 0.0086, -0.0122, -0.0298,  ..., -0.0236, -0.0187,  0.0295],
        ...,
        [ 0.0266, -0.0008, -0.0141,  ...,  0.0018,  0.0319, -0.0129],
        [ 0.0296, -0.0005,  0.0115,  ...,  0.0141, -0.0088, -0.0106],
        [ 0.0289, -0.0077,  0.0239,  ..., -0.0166, -0.0156, -0.0235]],
       requires_grad=True)
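To see what ``model.parameters()`` iterates over, the sketch below lists the parameter names and shapes of the same ``Sequential`` architecture. Note that ``nn.Linear`` stores its weight with shape ``(out_features, in_features)``, so the forward pass multiplies by the transpose. The batch size of 4 and the variable names are illustrative.

```python
import torch

D_in, H, D_out = 1000, 100, 10
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

# Parameter names are prefixed with the index of the layer in the Sequential
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
print(shapes)

# The whole model is equivalent to applying the layers by hand
x = torch.randn(4, D_in)
by_hand = model[2](torch.relu(x @ model[0].weight.t()))
same = torch.allclose(model(x), by_hand, atol=1e-5)
```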


PyTorch: optim
--------------

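This time, instead of updating the model's weights by hand, we use the ``optim`` package to update the parameters.
The ``optim`` package provides a variety of optimization algorithms, including SGD with momentum, RMSProp, and Adam.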
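For plain SGD without momentum, ``optimizer.step()`` applies the same rule as the manual update used in the earlier sections, ``w <- w - lr * grad``. The sketch below checks this on a single small parameter; the tensor ``w``, the learning rate, and the toy loss are assumptions made for the illustration.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
lr = 0.1

optimizer = torch.optim.SGD([w], lr=lr)

loss = (w ** 2).sum()
optimizer.zero_grad()
loss.backward()

# What the manual update rule would produce
expected = (w - lr * w.grad).detach().clone()

optimizer.step()                  # let the optimizer update w in place
matches = torch.allclose(w.detach(), expected)
print(matches)
```

Other optimizers such as Adam keep extra per-parameter state, so their steps do not reduce to this one-line rule.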


In [55]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # computes x @ w_1.T (no bias term)
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 28732992.0
1 27389154.0
2 28437134.0
3 27870268.0
4 23446660.0
5 16457074.0
6 9840677.0
7 5417547.0
8 3003722.0
9 1804423.5
10 1207240.125
11 889561.125
12 701820.0625
13 577609.5
14 487402.8125
15 417395.15625
16 360832.78125
17 313975.4375
18 274600.34375
19 241237.046875
20 212677.890625
21 188156.5625
22 166976.671875
23 148599.890625
24 132620.0625
25 118660.75
26 106402.375
27 95612.7890625
28 86085.828125
29 77645.0859375
30 70151.859375
31 63500.1875
32 57568.6953125
33 52269.21875
34 47526.6015625
35 43271.29296875
36 39445.796875
37 36006.8203125
38 32903.65234375
39 30100.4375
40 27564.453125
41 25265.671875
42 23183.05078125
43 21290.37109375
44 19568.66015625
45 18000.998046875
46 16571.859375
47 15267.4697265625
48 14075.3896484375
49 12985.5185546875
50 11987.9521484375
51 11073.775390625
52 10234.728515625
53 9464.603515625
54 8757.29296875
55 8106.9130859375
56 7508.60009765625
57 6957.7998046875
58 6450.234375
59 5982.16748046875
60 5550.45751953125
61 5152.01855468
...
372 0.00022275075025390834
373 0.00021627926616929471
374 0.00021043707965873182
375 0.0002045500441454351
376 0.00019822017929982394
377 0.00019317270198371261
378 0.00018814984650816768
379 0.0001830885885283351
380 0.00017781062342692167
381 0.00017323947395198047
382 0.0001682989386608824
383 0.00016471804701723158
384 0.0001607760787010193
385 0.00015632288705091923
386 0.0001517540222266689
387 0.00014838622882962227
388 0.00014435607590712607
389 0.00014082089182920754
390 0.00013746944023296237
391 0.00013425663928501308
392 0.00013135006884112954
393 0.00012777149095200002
394 0.00012536850408650935
395 0.00012254458852112293
396 0.00012027286720694974
397 0.0001173562923213467
398 0.0001146419090218842
399 0.00011184575851075351
400 0.00010930215648841113
401 0.00010641732660587877
402 0.00010399378516012803
403 0.00010168743756366894
404 9.944023622665554e-05
405 9.738239168655127e-05
406 9.538907033856958e-05
407 9.31437753024511e-05
408 9.116100409300998e-05
...


PyTorch: custom nn Modules
--------------------------

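We can define a model as a class that inherits from ``nn.Module``. When you need a model that is more complex than a ``Sequential`` model, define your own ``nn.Module`` subclass.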



In [56]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 732.715576171875
1 714.9182739257812
2 697.7252807617188
3 681.110595703125
4 664.9786376953125
5 649.2716064453125
6 634.0565185546875
7 619.1920166015625
8 604.7763061523438
9 590.8048706054688
10 577.2684936523438
11 564.0919189453125
12 551.289306640625
13 538.880126953125
14 526.8214721679688
15 515.2015380859375
16 504.00421142578125
17 493.0967102050781
18 482.48114013671875
19 472.1227111816406
20 461.9892272949219
21 452.1866760253906
22 442.6809387207031
23 433.42724609375
24 424.4046936035156
25 415.640869140625
26 407.04241943359375
27 398.6216735839844
28 390.43682861328125
29 382.4599609375
30 374.6800842285156
31 367.06781005859375
32 359.6200866699219
33 352.32598876953125
34 345.22357177734375
35 338.2963562011719
36 331.4851989746094
37 324.7981872558594
38 318.230712890625
39 311.81103515625
40 305.5078125
41 299.328125
42 293.2718200683594
43 287.34710693359375
44 281.5329895019531
45 275.820068359375
46 270.1873779296875
47 264.6455993652344
48 259.2001037597656
...

367 0.001038744580000639
368 0.001003346755169332
369 0.0009692653547972441
370 0.0009364671423099935
371 0.0009048557840287685
372 0.0008744219085201621
373 0.000845091650262475
374 0.0008168242638930678
375 0.0007895745802670717
376 0.0007632998749613762
377 0.0007379769231192768
378 0.000713529996573925
379 0.0006899612490087748
380 0.0006672180024906993
381 0.0006452657398767769
382 0.0006240763468667865
383 0.0006036179256625473
384 0.0005838656215928495
385 0.0005647919024340808
386 0.0005463658017106354
387 0.0005285703809931874
388 0.000511368562001735
389 0.0004947558045387268
390 0.0004786926438100636
391 0.0004631750052794814
392 0.00044817355228587985
393 0.00043366695172153413
394 0.0004196464142296463
395 0.00040608819108456373
396 0.0003929724043700844
397 0.00038029448478482664
398 0.00036802756949327886
399 0.0003561547491699457
400 0.0003446804767008871
401 0.0003335759392939508
402 0.00032283057225868106
403 0.00031243113335222006
404 0.0003023733734153211
405 0.0002