In [1]:
%matplotlib inline


Warm-up: numpy
--------------

A fully connected ReLU network with one hidden layer and no bias terms, trained to predict y from x by minimizing the squared Euclidean distance between the prediction and the target.

This implementation uses numpy to compute the forward pass, the loss, and the backward pass by hand.
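
For reference, the three hand-computed steps can be written out as equations. This is a sketch that mirrors the code in this section; the symbols ($X$ for x, $Y$ for y, $W_1$ and $W_2$ for w1 and w2, $L$ for the loss) are chosen here for illustration and do not appear in the original.

```latex
\begin{aligned}
% Forward pass
H &= X W_1, \qquad H_{\mathrm{relu}} = \max(H, 0), \qquad \hat{Y} = H_{\mathrm{relu}} W_2 \\
% Loss: sum of squared errors over all entries
L &= \sum_{i,j} \bigl(\hat{Y}_{ij} - Y_{ij}\bigr)^2 \\
% Backward pass (chain rule, applied from the output back to the input)
\frac{\partial L}{\partial \hat{Y}} &= 2\,(\hat{Y} - Y) \\
\frac{\partial L}{\partial W_2} &= H_{\mathrm{relu}}^{\top}\, \frac{\partial L}{\partial \hat{Y}} \\
\frac{\partial L}{\partial H_{\mathrm{relu}}} &= \frac{\partial L}{\partial \hat{Y}}\, W_2^{\top} \\
\frac{\partial L}{\partial H} &= \frac{\partial L}{\partial H_{\mathrm{relu}}} \odot \mathbf{1}[H \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^{\top}\, \frac{\partial L}{\partial H}
\end{aligned}
```

The indicator $\mathbf{1}[H \ge 0]$ matches the code, which zeroes the gradient only where `h < 0`; at exactly zero ReLU is not differentiable, and either choice is a valid subgradient.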

A numpy array is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computational graphs; it is simply a container for general-purpose numeric computation.
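
A small illustration of that point, as a sketch: the array below supports generic operations such as matrix products and elementwise maxima, but carries no gradient bookkeeping. The variable names are hypothetical, and the attribute names `grad` and `backward` are checked only as examples of what an autograd-aware tensor type would typically expose.

```python
import numpy as np

# A 2-D array: numpy stores only the numbers, their shape, and their dtype.
a = np.arange(6, dtype=np.float64).reshape(2, 3)
b = np.ones((3, 4))

# Generic numeric operations: a matrix product and an elementwise maximum.
c = a.dot(b)                 # shape (2, 4)
d = np.maximum(a - 2.5, 0)   # same shape as a, negatives clipped to 0

print(a.ndim, a.shape, c.shape)

# The array object itself has no notion of gradients or a backward pass.
print(hasattr(c, "grad"), hasattr(c, "backward"))
```

Any gradient therefore has to be derived and coded by hand, which is what the training loop in this section does.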


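Terminology: the "biases" omitted from this network are the additive offset terms of each layer. They are distinct from the statistical notions of bias, error, and variance.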

In [3]:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
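
Because the data and weights above are drawn without a seed, the printed arrays and loss values differ on every run. A minimal sketch of a reproducible variant follows, assuming numpy's legacy global seeding; the seed value 0 is arbitrary and not part of the original. The assertions check that the shapes chain correctly through the two matrix products.

```python
import numpy as np

np.random.seed(0)  # assumption: arbitrary seed, added only for reproducibility

N, D_in, H, D_out = 64, 1000, 100, 10
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

# Shapes must chain: (N, D_in) @ (D_in, H) -> (N, H), then @ (H, D_out) -> (N, D_out)
hidden = x.dot(w1)
assert hidden.shape == (N, H)
assert np.maximum(hidden, 0).dot(w2).shape == y.shape
```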

In [4]:
x

array([[ 0.28035856,  0.75160063,  1.90459357, ...,  1.5564473 ,
         0.39803313, -1.24833685],
       [ 1.39040605, -0.34029133, -0.18968654, ...,  1.52399182,
        -0.45284732,  0.66065498],
       [-0.80763839,  0.33872613,  1.01897088, ...,  0.56202622,
        -2.02279023,  1.16422272],
       ..., 
       [ 1.21436558, -1.81938721, -1.16691364, ...,  1.99576436,
        -1.08719709, -0.92796977],
       [-0.20005929,  1.61603632, -0.60540364, ..., -1.16503565,
        -0.96092737,  0.04606042],
       [ 0.55172781, -0.38103249,  0.41836932, ...,  0.27550859,
         0.25122383,  1.61432648]])

In [5]:
y

array([[  1.61192973e+00,   9.71551163e-01,   1.26621729e+00,
         -1.36719475e+00,  -1.11679286e+00,   7.45439838e-01,
          1.66228816e+00,  -5.77675548e-02,   1.43295207e+00,
         -1.57492541e+00],
       [  2.26106536e-01,   1.42528997e-01,  -1.05286092e-01,
         -1.98281143e-01,   7.15744311e-01,   6.35861554e-01,
         -9.30363699e-02,  -1.01151706e+00,  -5.26576953e-01,
          3.46786096e-01],
       [ -2.14348050e+00,   7.14387054e-01,   3.90858182e-01,
          6.96021734e-01,  -5.31009004e-02,  -3.88163888e-01,
         -1.38436690e+00,  -4.17640381e-01,  -3.03364632e-01,
         -1.46818300e+00],
       [  6.94604369e-02,  -5.62357833e-01,  -1.56671691e+00,
          3.05048111e+00,  -2.44733945e+00,   6.81682552e-01,
          5.51361849e-01,  -2.62004359e-01,  -1.17562925e+00,
          1.65167208e+00],
       [  1.74664791e-01,   3.69964875e-02,   2.16628664e+00,
          1.25151405e-01,  -4.41861716e-01,  -1.15715539e+00,
         -2.44011889e-01

In [2]:
learning_rate = 1e-6

for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

0 32332818.1694
1 26433248.3831
2 24237993.2033
3 22049061.4902
4 18279205.2206
5 13605302.5579
6 9090423.80891
7 5744863.8775
8 3565824.15901
9 2284373.4375
10 1541265.23183
11 1107446.63989
12 841755.175741
13 669466.762314
14 550093.861678
15 462420.792084
16 394893.802163
17 341063.660175
18 296993.637421
19 260240.936845
20 229206.167577
21 202750.39503
22 180044.664147
23 160419.206366
24 143378.65122
25 128505.97671
26 115467.354403
27 104013.380983
28 93910.2580694
29 84975.731323
30 77048.3711029
31 70000.7571837
32 63717.7643609
33 58095.689354
34 53051.8938647
35 48518.380091
36 44436.2241761
37 40755.4418437
38 37432.839262
39 34426.6371793
40 31698.6416011
41 29221.5261839
42 26966.1804685
43 24900.4355946
44 23016.3063997
45 21294.8264781
46 19719.8247795
47 18271.8376938
48 16944.6724794
49 15726.4109478
50 14607.2612998
51 13578.3056591
52 12632.4197364
53 11760.5258065
54 10956.5166143
55 10213.8577717
56 9527.6048249
57 8892.98428108
58 8305.33353851
59 7760.89991565
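
A falling loss suggests the hand-written gradients are right, but it does not prove it. One common sanity check is to compare them against central finite differences on a tiny problem. The sketch below is an illustration under stated assumptions: the helper names `loss_fn`, `manual_grads`, and `numeric_grad`, the small layer sizes, the seed, and the step `eps` are all hypothetical choices, not part of the original notebook.

```python
import numpy as np

rng = np.random.RandomState(0)      # fixed seed, chosen arbitrarily
N, D_in, H, D_out = 4, 5, 3, 2      # tiny sizes keep the check fast
x = rng.randn(N, D_in)
y = rng.randn(N, D_out)
w1 = rng.randn(D_in, H)
w2 = rng.randn(H, D_out)

def loss_fn(w1, w2):
    """Sum-of-squares loss of the two-layer ReLU network."""
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

def manual_grads(w1, w2):
    """Same backward pass as the training loop, returning (grad_w1, grad_w2)."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    return x.T.dot(grad_h), grad_w2

def numeric_grad(f, w, eps=1e-6):
    """Central-difference gradient of f() with respect to w, perturbing w in place."""
    grad = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old               # restore the original entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

grad_w1, grad_w2 = manual_grads(w1, w2)
num_w1 = numeric_grad(lambda: loss_fn(w1, w2), w1)
num_w2 = numeric_grad(lambda: loss_fn(w1, w2), w2)

# Largest absolute disagreement; both should be close to zero.
err_w1 = np.abs(grad_w1 - num_w1).max()
err_w2 = np.abs(grad_w2 - num_w2).max()
print(err_w1, err_w2)
```

The check can fail spuriously if a hidden pre-activation lies within about `eps` of zero, where ReLU has a kink; with random Gaussian data this is very unlikely.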
