
A two-layer neural network in numpy
-----------------------------------

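A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x with an L2 loss. Each row of $X$ is one sample, matching the code below.
- $h = XW_1$
- $a = \max(0, h)$
- $\hat{y} = aW_2$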

This implementation uses only numpy to compute the forward pass, the loss, and the backward pass.
- forward pass
- loss
- backward pass
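
For reference, the gradients used in the backward pass can be derived with the chain rule. The sketch below assumes the row-per-sample convention of the code ($h = XW_1$, with $X$ of shape $N \times D_{in}$) and a summed squared-error loss; $\odot$ is the elementwise product, and the indicator $\mathbb{1}[h \ge 0]$ mirrors the code, which zeroes the gradient wherever $h < 0$.

$$
\begin{aligned}
L &= \sum_{i,j} (\hat{y}_{ij} - y_{ij})^2 \\
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= a^{T} \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} W_2^{T} \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{T} \frac{\partial L}{\partial h}
\end{aligned}
$$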

A numpy ndarray is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is simply a data structure for numerical computation.
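
Because numpy provides no gradients, a hand-written backward pass is easy to get wrong. One common sanity check is to compare it against central finite differences on a tiny problem. The following is a minimal sketch, not part of the original notebook: the helper names `loss_and_grads` and `numerical_grad`, the small shapes, and the step size `eps` are all assumptions chosen for illustration.

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    """Forward and backward pass of the bias-free two-layer ReLU network."""
    h = x.dot(w1)
    a = np.maximum(h, 0)
    y_pred = a.dot(w2)
    loss = np.square(y_pred - y).sum()
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = a.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0  # ReLU passes no gradient where h < 0
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2

def numerical_grad(f, w, eps=1e-6):
    """Central finite differences of the scalar f() with respect to each entry of w."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Tiny hypothetical problem so the entry-by-entry loop stays cheap
rng = np.random.default_rng(0)
x, y = rng.standard_normal((4, 5)), rng.standard_normal((4, 2))
w1, w2 = rng.standard_normal((5, 3)), rng.standard_normal((3, 2))

_, grad_w1, grad_w2 = loss_and_grads(x, y, w1, w2)
num_w1 = numerical_grad(lambda: loss_and_grads(x, y, w1, w2)[0], w1)
num_w2 = numerical_grad(lambda: loss_and_grads(x, y, w1, w2)[0], w2)

max_err_w1 = np.abs(grad_w1 - num_w1).max()
max_err_w2 = np.abs(grad_w2 - num_w2).max()
print(max_err_w1, max_err_w2)
```

If the analytic gradients are correct, both printed errors should be very small compared with the gradient values themselves.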


In [3]:
import numpy as np

N, D_in, H, D_out = 64,1000, 100, 10

x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6

for i in range(500):
    # Forward pass
    h = x.dot(w1)                       # (N, H)
    a = np.maximum(h, 0)                # ReLU, (N, H)
    y_pred = a.dot(w2)                  # (N, D_out)

    # Loss: sum of squared errors
    loss = np.square(y_pred - y).sum()

    # Backward pass: gradients of the loss with respect to w1 and w2
    grad_y_pred = 2 * (y_pred - y)      # (N, D_out)
    grad_w2 = a.T.dot(grad_y_pred)      # (H, D_out)
    grad_a = grad_y_pred.dot(w2.T)      # (N, H)
    grad_h = grad_a.copy()
    grad_h[h < 0] = 0                   # ReLU passes no gradient where h < 0
    grad_w1 = x.T.dot(grad_h)           # (D_in, H)

    # Gradient descent update
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
    print(i + 1, '   loss: %f' % loss)

1    loss: 32419044.420235
2    loss: 28803322.954488
3    loss: 25447912.996801
4    loss: 20299466.646728
5    loss: 14334010.358585
6    loss: 9100750.472797
7    loss: 5533105.146304
8    loss: 3406446.882498
9    loss: 2223038.949741
10    loss: 1558951.155826
11    loss: 1168941.706120
12    loss: 922449.728296
13    loss: 753947.068483
14    loss: 630579.139168
15    loss: 535479.054370
16    loss: 459587.778781
17    loss: 397482.368722
18    loss: 345769.850383
19    loss: 302260.658377
20    loss: 265383.392667
21    loss: 233881.116057
22    loss: 206809.473399
23    loss: 183437.411048
24    loss: 163175.493572
25    loss: 145539.877877
26    loss: 130142.055450
27    loss: 116645.268470
28    loss: 104781.081559
29    loss: 94310.401773
30    loss: 85048.498345
31    loss: 76838.949862
32    loss: 69545.902075
33    loss: 63052.066269
34    loss: 57255.775553
35    loss: 52077.335237
36    loss: 47436.875231
37    loss: 43274.483364
38    loss: 39529.531924
39    loss: 361

395    loss: 0.014437
396    loss: 0.013950
397    loss: 0.013480
398    loss: 0.013025
399    loss: 0.012586
400    loss: 0.012161
401    loss: 0.011751
402    loss: 0.011355
403    loss: 0.010972
404    loss: 0.010602
405    loss: 0.010245
406    loss: 0.009900
407    loss: 0.009566
408    loss: 0.009244
409    loss: 0.008933
410    loss: 0.008632
411    loss: 0.008341
412    loss: 0.008060
413    loss: 0.007788
414    loss: 0.007526
415    loss: 0.007273
416    loss: 0.007028
417    loss: 0.006791
418    loss: 0.006563
419    loss: 0.006342
420    loss: 0.006128
421    loss: 0.005922
422    loss: 0.005723
423    loss: 0.005530
424    loss: 0.005344
425    loss: 0.005164
426    loss: 0.004991
427    loss: 0.004823
428    loss: 0.004661
429    loss: 0.004504
430    loss: 0.004352
431    loss: 0.004206
432    loss: 0.004065
433    loss: 0.003928
434    loss: 0.003796
435    loss: 0.003668
436    loss: 0.003545
437    loss: 0.003426
438    loss: 0.003311
439    loss: 0.003199
440    los


PyTorch: Tensors
----------------

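This time we use PyTorch Tensors for the forward pass, the loss computation, and the backward pass.

A PyTorch Tensor is very similar to a numpy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To compute on the GPU, the Tensor has to be moved to a CUDA device.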
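
As a small illustration of the CPU/GPU point, the sketch below picks a device at runtime and moves a Tensor to it. It falls back to the CPU when CUDA is unavailable; the variable names and shapes are arbitrary and not taken from the notebook.

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.randn(3, 4)                  # created on the CPU by default
x_dev = x.to(device)                   # moved (or left in place) on the chosen device
w = torch.randn(4, 2, device=device)   # created directly on the chosen device

out = x_dev.mm(w)                      # both operands must live on the same device
print(out.device, out.shape)

# Converting to numpy requires a CPU tensor
out_np = out.cpu().numpy()
```

Creating tensors directly on the target device avoids an extra host-to-device copy.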

In [None]:
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    h = x.mm(w1) # N * H
    h_relu = h.clamp(min=0) # N * H
    y_pred = h_relu.mm(w2) # N * D_out
    
    # compute loss
    loss = (y_pred - y).pow(2).sum().item()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update weights of w1 and w2
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2


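PyTorch: Tensors and autograd
-----------------------------

A key feature of PyTorch is autograd: once the forward pass is defined and the loss is computed, PyTorch can differentiate automatically to obtain the gradients of all model parameters.

A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then after ``backward()`` has been called, ``x.grad`` is another Tensor holding the gradient of some scalar (usually the loss) with respect to ``x``.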
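
The relationship between ``requires_grad``, ``backward()`` and ``.grad`` can be seen on a toy scalar. This is an illustrative sketch with made-up values: for $L = \sum_i x_i^2$ the gradient is $2x$.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

loss = (x ** 2).sum()   # scalar built from x, so autograd records the graph
print(x.grad)           # None: no backward pass has run yet

loss.backward()         # fills x.grad with d(loss)/dx
print(x.grad)           # tensor([2., 4., 6.])
```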

In [None]:
import torch

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # compute loss
    loss = (y_pred - y).pow(2).sum() # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
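
The calls to ``grad.zero_()`` in the loop above matter because ``backward()`` adds to ``.grad`` rather than overwriting it. A minimal sketch of that behaviour, using a made-up three-element weight vector:

```python
import torch

w = torch.ones(3, requires_grad=True)

(2 * w).sum().backward()
first = w.grad.clone()          # tensor([2., 2., 2.])

(2 * w).sum().backward()        # second backward without zeroing
accumulated = w.grad.clone()    # tensor([4., 4., 4.]): gradients were summed

w.grad.zero_()                  # reset in place
(2 * w).sum().backward()
after_zero = w.grad.clone()     # tensor([2., 2., 2.]) again

print(first, accumulated, after_zero)
```

Without the reset, every update step would use the sum of all gradients computed so far.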


PyTorch: nn
-----------


This time we build the network with PyTorch's nn package.
autograd still builds the computation graph,
and PyTorch computes the gradients for us automatically.
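
One detail worth knowing before reading the nn code: ``nn.Linear`` stores its weight with shape ``(out_features, in_features)``, so the layer computes ``x @ weight.T``. The sketch below uses deliberately small, made-up sizes to show the parameter names and shapes that ``nn.Sequential`` registers.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 3, bias=False),
    nn.ReLU(),
    nn.Linear(3, 2, bias=False),
)

# Parameters are registered automatically and named after their position
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
print(shapes)  # {'0.weight': (3, 4), '2.weight': (2, 3)}

# The first layer is equivalent to a matrix product with the transposed weight
x = torch.randn(5, 4)
manual = x.mm(model[0].weight.t())
same = torch.allclose(model[0](x), manual, atol=1e-6)
print(same)
```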



In [None]:
import torch
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H, bias=False), # x @ W_1^T, no bias term
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out, bias=False),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

# model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())
    
    # Backward pass
    loss.backward()
    
    # update weights of w1 and w2
    with torch.no_grad():
        for param in model.parameters(): # param (tensor, grad)
            param -= learning_rate * param.grad
            
    model.zero_grad()


PyTorch: optim
--------------

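This time we no longer update the model's weights by hand; instead we use the optim package to update the parameters for us.
The optim package provides a range of optimization algorithms, including SGD with momentum, RMSProp, Adam, and others.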
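
Because every optimizer in ``torch.optim`` exposes the same ``zero_grad()`` / ``step()`` interface, switching algorithms only changes one line. The sketch below illustrates this on a small made-up regression problem. The helper ``train``, the layer sizes, the learning rates, and the use of the default mean-reduced ``MSELoss`` are all assumptions for illustration and are not tuned.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x, y = torch.randn(16, 8), torch.randn(16, 2)

def train(make_optimizer, steps=200):
    """Train a fresh small network and return the loss recorded at every step."""
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
    optimizer = make_optimizer(model.parameters())
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        loss = loss_fn(model(x), y)
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return losses

# Only the optimizer constructor differs between runs
results = {
    "SGD+momentum": train(lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9)),
    "RMSprop": train(lambda p: torch.optim.RMSprop(p, lr=1e-3)),
    "Adam": train(lambda p: torch.optim.Adam(p, lr=1e-2)),
}
for name, losses in results.items():
    print(name, "first loss:", losses[0], "last loss:", losses[-1])
```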

In [None]:
import torch.nn as nn
import torch

x = torch.randn(64, 1000)
y = torch.randn(64, 10)

model = nn.Sequential(
    nn.Linear(1000, 100, bias = False),
    nn.ReLU(),
    nn.Linear(100, 10, bias = False)
)
torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr = learning_rate)
loss_fn = nn.MSELoss(reduction = 'sum')
for it in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)
    print(it+1,loss.item())
    
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    


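PyTorch: custom nn Modules
--------------------------

We can define a model as a class that inherits from nn.Module. Whenever a model is more complex than what Sequential can express, it needs to be written as an nn.Module subclass.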
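
As an example of something a plain ``Sequential`` cannot express directly, the sketch below adds a skip connection, where the input of a layer is added back to its output. ``ResidualNet`` and its layer names are hypothetical and only illustrate the pattern; they are not part of the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualNet(nn.Module):
    """Hypothetical example: a hidden layer wrapped in a skip connection."""

    def __init__(self, D_in, H, D_out):
        super().__init__()
        self.input_layer = nn.Linear(D_in, H)
        self.hidden_layer = nn.Linear(H, H)
        self.output_layer = nn.Linear(H, D_out)

    def forward(self, x):
        h = F.relu(self.input_layer(x))
        # The branch below reuses h twice, which a linear chain of layers cannot do
        h = h + F.relu(self.hidden_layer(h))
        return self.output_layer(h)

model = ResidualNet(1000, 100, 10)
y_pred = model(torch.randn(64, 1000))
print(y_pred.shape)  # torch.Size([64, 10])
```

Any Python control flow can appear in ``forward``, and autograd records whatever path is actually executed.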



In [1]:
import torch.nn as nn
import torch
import torch.nn.functional as F
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = self.linear2(x)
        return x

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for it in range(500):
    # Forward pass
    y_pred = model(x) # model.forward() 
    
    # compute loss
    loss = loss_fn(y_pred, y) # computation graph
    print(it, loss.item())

    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    
    # update model parameters
    optimizer.step()


0 708.2902221679688
1 690.6024780273438
2 673.4022216796875
3 656.6740112304688
4 640.3989868164062
5 624.6063842773438
6 609.2277221679688
7 594.2924194335938
8 579.7576293945312
9 565.6204833984375
10 551.8333129882812
11 538.4389038085938
12 525.4700927734375
13 512.8477783203125
14 500.6307373046875
15 488.703369140625
16 477.16925048828125
17 465.9186096191406
18 454.9781799316406
19 444.3111877441406
20 433.9541015625
21 423.9173889160156
22 414.1399841308594
23 404.56536865234375
24 395.1788024902344
25 386.0086975097656
26 377.04425048828125
27 368.3394470214844
28 359.90484619140625
29 351.6649169921875
30 343.64385986328125
31 335.83502197265625
32 328.1756591796875
33 320.7092590332031
34 313.4267272949219
35 306.3157958984375
36 299.377685546875
37 292.6019287109375
38 285.9645690917969
39 279.4676208496094
40 273.1219787597656
41 266.9126281738281
42 260.8498840332031
43 254.896240234375
44 249.0304412841797
45 243.2618865966797
46 237.6168212890625
47 232.09451293945312
4

359 0.001607690704986453
360 0.001545189879834652
361 0.0014850537991151214
362 0.001427192590199411
363 0.001371570280753076
364 0.0013180422829464078
365 0.0012665791437029839
366 0.0012170739937573671
367 0.0011694631539285183
368 0.0011236702557653189
369 0.0010796217247843742
370 0.001037268782965839
371 0.000996532035060227
372 0.0009573434945195913
373 0.0009196847677230835
374 0.0008834534673951566
375 0.0008486151928082108
376 0.000815116218291223
377 0.0007829057285562158
378 0.0007519570644944906
379 0.000722153636161238
380 0.000693541660439223
381 0.000666032254230231
382 0.0006395739037543535
383 0.0006141540943644941
384 0.0005896971561014652
385 0.0005662099574692547
386 0.0005436079809442163
387 0.0005218963488005102
388 0.0005010328604839742
389 0.00048097403487190604
390 0.00046169120469130576
391 0.0004431648994795978
392 0.0004253685474395752
393 0.0004082614032085985
394 0.0003918131405953318
395 0.0003760193649213761
396 0.00036083365557715297
397 0.0003462542372