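# Implementing a Simple Neural Network by Hand
In this section we implement a simple neural network by hand, without using any deep learning framework. The goal is to **understand the principles of deep learning**, including the forward and backward passes. The backward pass uses gradient descent to update the parameters.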

In [1]:
## Load the dataset
from tensorflow.keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

In [2]:
x_train.shape, y_train.shape, x_test.shape, y_test.shape

((404, 13), (404,), (102, 13), (102,))

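## Building the parameters
Here we define only an input layer and an output layer, with no hidden layer, so the model is equivalent to multivariate linear regression.
1. The input has 13 features, so there are 13 weights w.
2. The output has a single dimension, so w has shape 13x1, plus one bias b.

That gives 14 parameters in total, with no activation layer. The boston_housing dataset predicts house prices from 13 features, which is a typical regression problem.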


In [3]:
import numpy as np
import matplotlib.pyplot as plt

In [4]:
# Initialize the parameters
w = np.random.rand(13, 1)
b = np.random.rand(1)

In [5]:
w.shape

(13, 1)

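## Forward pass

The forward pass multiplies the inputs by the weights and adds the bias to produce the final prediction:

$$\hat y = wx+b$$

where $\hat y$ is the model output. Next we compute the error, using the MSE (mean squared error). The total error $E$ is the average of the per-sample squared errors, with an extra factor of $\frac 1 2$ that simplifies the derivative:

$$ E = \frac {1} {2m} \sum_{i=1}^m (\hat y_i - y_i) ^2 $$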

In [6]:
hat_y = np.dot(x_train, w) + b # b broadcasts over all rows; hat_y.shape = (404, 1)

In [7]:
error = hat_y - y_train.reshape(-1, 1) # (404, 1)

In [8]:
error.shape

(404, 1)
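The cells above stop at the error vector and do not evaluate $E$ itself. A minimal sketch of the loss formula follows; the helper name `mse_loss` and the toy arrays are made up for illustration and are not part of the notebook.

```python
import numpy as np

def mse_loss(hat_y, y):
    """Half mean squared error: E = 1/(2m) * sum((hat_y - y)^2)."""
    # Reshape y to a column so the subtraction is element-wise.
    # Subtracting a (m,) array from a (m, 1) array would broadcast to (m, m).
    error = hat_y - y.reshape(-1, 1)
    m = error.shape[0]
    return float((error ** 2).sum() / (2 * m))

# Toy data (hypothetical): 3 samples, 2 features
x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([[0.5], [0.5]])
b = np.array([0.0])

hat_y = np.dot(x, w) + b  # shape (3, 1)
print(mse_loss(hat_y, y))
```

The `reshape(-1, 1)` matters: without it NumPy silently broadcasts the two arrays into a square matrix instead of raising an error.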

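## Backward pass

We want the values of $w$ and $b$ that minimize the total error $E$. We use gradient descent to find them, where $\gamma$ is the learning rate: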

$$ \Delta w = \gamma \frac{\partial E}{\partial w} $$
$$ \Delta b = \gamma \frac{\partial E}{\partial b} $$

$$ w \leftarrow w -  \Delta w $$
$$ b \leftarrow b -  \Delta b $$
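To see the update rule in isolation, here is a small sketch on a one-dimensional toy problem. The function `gradient_descent_1d` and the quadratic $E(w) = (w - 3)^2$ are assumptions chosen for illustration; they do not appear in the notebook.

```python
def gradient_descent_1d(grad, w0, lr, steps):
    """Repeatedly apply w <- w - lr * dE/dw, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Toy objective: E(w) = (w - 3)^2, so dE/dw = 2 * (w - 3) and the minimum is at w = 3
w_final = gradient_descent_1d(lambda w: 2.0 * (w - 3.0), w0=0.0, lr=0.1, steps=100)
print(w_final)  # approaches 3.0
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a modest number of iterations is enough on this toy problem.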


The key to gradient descent is computing the gradients, so we derive them next.

$$ \frac{\partial E}{\partial w_j} = \sum_{i=1} ^m \frac{\partial E}{\partial \hat y_i} \frac{\partial \hat y_i}{\partial w_j} $$


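$$\frac{\partial E}{\partial \hat y_i} = \frac 1 m (\hat y_i - y_i)$$
$$ \frac{\partial \hat y_i}{\partial w_j} = x_{ij} $$

where $x_{ij}$ is the $j$-th feature of the $i$-th sample. Substituting both into the chain rule gives

$$ \frac{\partial E}{\partial w_j} = \frac 1 m \sum_{i=1}^m (\hat y_i - y_i) x_{ij} $$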

Similarly, because $\frac{\partial \hat y_i}{\partial b} = 1$:

$$ \frac{\partial E}{\partial b} =  \frac 1 m \sum_{i=1}^m (\hat y_i - y_i) $$
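One way to gain confidence in the derivation is to compare the analytic gradients with central finite differences. The sketch below does this on random synthetic data; the helper names `loss` and `gradients` and the data sizes are assumptions for illustration only.

```python
import numpy as np

def loss(x, y, w, b):
    """E = 1/(2m) * sum((x @ w + b - y)^2)."""
    error = np.dot(x, w) + b - y.reshape(-1, 1)
    return float((error ** 2).sum() / (2 * x.shape[0]))

def gradients(x, y, w, b):
    """Analytic gradients of E with respect to w and b, following the formulas above."""
    m = x.shape[0]
    error = np.dot(x, w) + b - y.reshape(-1, 1)  # shape (m, 1)
    grad_w = np.dot(x.T, error) / m              # shape (n, 1)
    grad_b = error.sum() / m                     # scalar
    return grad_w, grad_b

# Synthetic data (hypothetical): 20 samples, 3 features
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = rng.normal(size=20)
w = rng.normal(size=(3, 1))
b = np.array([0.5])

grad_w, grad_b = gradients(x, y, w, b)

# Central finite differences, one parameter at a time
eps = 1e-6
num_grad_w = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    num_grad_w[j, 0] = (loss(x, y, w + step, b) - loss(x, y, w - step, b)) / (2 * eps)
num_grad_b = (loss(x, y, w, b + eps) - loss(x, y, w, b - eps)) / (2 * eps)

print(np.abs(grad_w - num_grad_w).max(), abs(grad_b - num_grad_b))
```

Both printed differences should be close to zero. Note that `np.dot(x.T, error)` computes the sum over samples for all 13 weights at once, which is the vectorized form of the per-weight formula.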

In [9]:
epoches = 500000
lr = 1e-6
m = x_train.shape[0] # number of samples

In [10]:
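for epoch in range(epoches):
    # Gradient step: X^T · error is m times the gradient of E with respect to w
    delta_w = lr * np.dot(x_train.T, error)
    delta_b = lr * error.sum()

    w = w - delta_w / m
    b = b - delta_b / m

    # Forward pass with the updated parameters; b broadcasts over all rows
    hat_y = np.dot(x_train, w) + b
    error = hat_y - y_train.reshape(-1, 1)

    # Per-sample half squared error; the sum printed below equals m * E
    loss = 0.5 * error ** 2

    if epoch % 1000 == 0:
        print(f"epoch: {epoch} \t loss: {loss.sum()}")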

epoch: 0 	 loss: 10295627.992053345
epoch: 1000 	 loss: 45310.45611735715
epoch: 2000 	 loss: 36517.299120095224
epoch: 3000 	 loss: 31457.156618423713
epoch: 4000 	 loss: 28236.034005940983
epoch: 5000 	 loss: 26006.0726613328
epoch: 6000 	 loss: 24341.45202293557
epoch: 7000 	 loss: 23016.546603254064
epoch: 8000 	 loss: 21908.29141260578
epoch: 9000 	 loss: 20947.89059524959
epoch: 10000 	 loss: 20095.70479014146
epoch: 11000 	 loss: 19327.932217213856
epoch: 12000 	 loss: 18629.485381561164
epoch: 13000 	 loss: 17990.16195125253
epoch: 14000 	 loss: 17402.575518984006
epoch: 15000 	 loss: 16861.02891043295
epoch: 16000 	 loss: 16360.893412253867
epoch: 17000 	 loss: 15898.260368855088
epoch: 18000 	 loss: 15469.740101531923
epoch: 19000 	 loss: 15072.341118734606
epoch: 20000 	 loss: 14703.393619605496
epoch: 21000 	 loss: 14360.497901221157
epoch: 22000 	 loss: 14041.487174680358
epoch: 23000 	 loss: 13744.3990643156
epoch: 24000 	 loss: 13467.45262628786
epoch: 25000 	 loss: 1320
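The loop above relies on `w`, `b`, and `error` left over from earlier cells. As a rough sketch, the same procedure can be wrapped in a self-contained function that recomputes the error at the start of every step. The function name `train`, its default hyperparameters, and the synthetic data are assumptions for illustration; the data here is noiseless and already on a unit scale, which is why a much larger learning rate works than with the raw housing features.

```python
import numpy as np

def train(x, y, lr=0.1, epochs=500, seed=0):
    """Gradient descent for y ≈ x @ w + b; returns w, b and the history of E."""
    rng = np.random.default_rng(seed)
    m, n = x.shape
    w = rng.random((n, 1))
    b = rng.random(1)
    y_col = y.reshape(-1, 1)
    history = []
    for _ in range(epochs):
        error = np.dot(x, w) + b - y_col                     # forward pass
        history.append(float((error ** 2).sum() / (2 * m)))  # E before the update
        w = w - lr * np.dot(x.T, error) / m                  # w <- w - lr * dE/dw
        b = b - lr * error.sum() / m                         # b <- b - lr * dE/db
    return w, b, history

# Synthetic, noiseless data (hypothetical) generated from known parameters
rng = np.random.default_rng(1)
x_demo = rng.normal(size=(200, 3))
true_w = np.array([[1.5], [-2.0], [0.5]])
y_demo = (np.dot(x_demo, true_w) + 4.0).ravel()

w_fit, b_fit, history = train(x_demo, y_demo)
print(history[0], history[-1])
print(w_fit.ravel(), b_fit)
```

Recording the averaged loss $E$ rather than the summed loss keeps the numbers comparable across datasets of different sizes.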

In [11]:
# Return (prediction, ground truth) for test sample number `num`
test_num = lambda num: (np.dot(x_test[num, :], w) + b, y_test[num])

In [12]:
test_num(2)

(array([22.87754994]), 19.0)

In [13]:
test_num(4)

(array([24.01166513]), 22.2)

In [14]:
test_num(63)

(array([21.23986411]), 21.2)

In [15]:
test_num(77)

(array([21.40267385]), 28.1)
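Spot checks on individual samples give only a rough impression of quality. A minimal sketch of scoring every sample at once is shown below; the helper `evaluate` and the toy arrays are hypothetical stand-ins, and in the notebook one would pass `x_test`, `y_test`, `w`, and `b` instead.

```python
import numpy as np

def evaluate(x, y, w, b):
    """Return (E, mean absolute error) over all rows of x."""
    pred = np.dot(x, w) + b               # shape (m, 1)
    error = pred - y.reshape(-1, 1)
    m = x.shape[0]
    half_mse = float((error ** 2).sum() / (2 * m))  # same E as in training
    mae = float(np.abs(error).mean())               # average absolute miss
    return half_mse, mae

# Toy stand-ins (hypothetical): 4 samples, 2 features
x_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_demo = np.array([3.0, 2.0, 4.0, 8.0])
w_demo = np.array([[2.0], [1.0]])
b_demo = np.array([1.0])

print(evaluate(x_demo, y_demo, w_demo, b_demo))  # → (0.5, 0.5)
```

The mean absolute error is in the same units as the target, so it is often easier to interpret than the squared error.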