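#### 1. Background

##### 1.1 What is wrong with computing gradients by numerical differentiation?

- It is slow: a finite-difference gradient needs extra forward evaluations for every single parameter, so the cost grows with the number of parameters.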
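
To make the cost concrete, here is a minimal sketch of a central-difference gradient that counts how often the function is evaluated. The function `f`, the helper `numerical_gradient`, and the `counter` dictionary are illustrative assumptions, not code from these notes.

```python
import numpy as np

counter = {"calls": 0}

def f(x):
    # Toy function f(x) = sum(x_i^2); every call is counted.
    counter["calls"] += 1
    return np.sum(x ** 2)

def numerical_gradient(f, x, h=1e-4):
    # Central difference: two evaluations of f per parameter.
    grad = np.zeros_like(x)
    for i in range(x.size):
        original = x[i]
        x[i] = original + h
        f_plus = f(x)
        x[i] = original - h
        f_minus = f(x)
        x[i] = original  # restore the parameter
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad

x = np.array([3.0, 4.0, 5.0])
grad = numerical_gradient(f, x)
print(grad)              # close to [6, 8, 10]
print(counter["calls"])  # 6 evaluations for 3 parameters
```

With millions of parameters, each evaluation being a full forward pass, this quickly becomes impractical.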

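#### 2. Computational Graphs

##### 2.1 Modeling a real problem as a computational graph

- You buy 2 apples and 3 oranges at the supermarket. Apples cost 100 yuan each and oranges cost 150 yuan each. With a 10% consumption tax, compute the amount to pay.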

- ![](./attachements/基于计算图的建模过程.png)
- ![](./attachements/反向传播计算实例.png)
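
As a quick check of the problem above, the payment can be computed directly, one graph node at a time. This is only a sketch; the variable names are chosen here for illustration.

```python
import math

apple_price, apple_count = 100, 2
orange_price, orange_count = 150, 3
tax_rate = 1.1  # 10% consumption tax

apple_total = apple_price * apple_count      # multiplication node: 200
orange_total = orange_price * orange_count   # multiplication node: 450
subtotal = apple_total + orange_total        # addition node: 650
payment = subtotal * tax_rate                # multiplication node: about 715

print(apple_total, orange_total, subtotal)
print(math.isclose(payment, 715.0))  # True (floating point, so compare with a tolerance)
```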

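##### 2.2 What do forward and backward propagation propagate?
- Forward propagation: activations (the intermediate results of the computation)
- Backward propagation: gradient information (the gradient of the loss with respect to each node)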


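##### 2.3 Why use a computational graph?
- The main reason is that backpropagation on the graph computes derivatives efficiently: a single backward pass reuses the intermediate results and yields the derivative with respect to every input.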

#### 3. The Chain Rule


##### 3.1 What is the chain rule?

- Definition: if a function is expressed as a composite function, its derivative can be written as the product of the derivatives of the functions that make up the composite.
- Example:

$$
\begin{aligned}
z &= t^2 \\
t &= x + y
\end{aligned}
\quad \longrightarrow \quad
\frac{\partial z}{\partial x}
= \frac{\partial z}{{\color{Red} \partial t}} \frac{{\color{Red} \partial t}}{\partial x}
= 2t \cdot 1 = 2(x + y)
$$
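
The result can be checked numerically. The sketch below compares the chain-rule value 2(x + y) with a central-difference estimate; the sample point and the helper `z_of` are assumptions made for illustration.

```python
def z_of(x, y):
    # z = t^2 with t = x + y
    t = x + y
    return t ** 2

x, y = 1.5, 2.0
analytic = 2 * (x + y)  # chain rule: dz/dt * dt/dx = 2t * 1

h = 1e-4
numeric = (z_of(x + h, y) - z_of(x - h, y)) / (2 * h)

print(analytic)                       # 7.0
print(abs(analytic - numeric) < 1e-6) # True
```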

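##### 3.2 How does the chain rule implement backpropagation on a computational graph?

- Each node multiplies the incoming (upstream) gradient by its local derivative and passes the result on, in the direction opposite to the forward pass.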
![](./attachements/反向传播在计算图中沿着与正方向相反的方向乘上局部导数后传递-上.png)
![](./attachements/反向传播在计算图中沿着与正方向相反的方向乘上局部导数后传递-下.png)
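
A minimal sketch of this rule for the graph z = t², t = x + y, written out step by step. The input values are arbitrary and chosen only for illustration.

```python
x, y = 1.5, 2.0

# Forward pass: left to right through the graph.
t = x + y       # addition node
z = t ** 2      # square node

# Backward pass: right to left, multiplying by each local derivative.
dz = 1.0            # dz/dz, the starting gradient
dt = dz * (2 * t)   # square node: local derivative is 2t
dx = dt * 1.0       # addition node: local derivative is 1
dy = dt * 1.0

print(z, dt, dx, dy)  # 12.25 7.0 7.0 7.0
```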



#### 4. Backpropagation

##### 4.1 How the multiplication layer (MulLayer) works

$$
z = xy 

\longrightarrow 

\left\{\begin{matrix}
  \frac{\partial z}{\partial x}  = y& \\
  \frac{\partial z}{\partial y}  = x&
\end{matrix}\right.
$$

![](./attachements/乘法节点的正反向传播.png)


In [3]:
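class MulLayer:
    def __init__(self) -> None:
        # Inputs are cached in forward() because backward() needs them.
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y

        out = x * y
        return out

    def backward(self, dout):
        # dz/dx = y and dz/dy = x: multiply the upstream gradient by the other input.
        dx = dout * self.y
        dy = dout * self.x

        return dx, dy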
    

apple = 100
apple_num = 2
tax = 1.1

mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

apple_price = mul_apple_layer.forward(apple, apple_num)
final_price = mul_tax_layer.forward(apple_price, tax)

print(final_price)

220.00000000000003
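
The cell above only runs the forward pass. As a sketch, the backward pass for the same apple example could look like this; the class is repeated so the snippet runs on its own, and the names `dapple`, `dapple_num`, and `dtax` are chosen here for illustration.

```python
class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y

    def backward(self, dout):
        # Multiply the upstream gradient by the other input.
        return dout * self.y, dout * self.x

mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# Forward: 2 apples at 100 each, then a tax factor of 1.1.
apple_price = mul_apple_layer.forward(100, 2)
final_price = mul_tax_layer.forward(apple_price, 1.1)

# Backward: call the layers in reverse order.
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

print(dapple, dapple_num, dtax)  # about 2.2, 110 and 200
```

Read as sensitivities: raising the apple price by 1 raises the payment by about 2.2.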




##### 4.2 How the addition layer (AddLayer) works

$$
z = x + y 

\longrightarrow 

\left\{\begin{matrix}
  \frac{\partial z}{\partial x}  = 1& \\
  \frac{\partial z}{\partial y}  = 1&
\end{matrix}\right.
$$

![](./attachements/加法节点的正反向传播.png)


In [5]:
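class AddLayer:

    def __init__(self) -> None:
        # Addition needs no cached inputs: its local derivatives are constant.
        pass

    def forward(self, x, y):
        out = x + y
        return out

    def backward(self, dout):
        # dz/dx = dz/dy = 1, so the upstream gradient passes through unchanged.
        dx = dout * 1
        dy = dout * 1
        return dx, dy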
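
A short usage sketch for the addition layer, assuming the apple and orange subtotals (200 and 450) as inputs and an upstream gradient of 1.1. The class is repeated so the snippet is self-contained.

```python
class AddLayer:
    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        # Both inputs receive the upstream gradient unchanged.
        return dout * 1, dout * 1

add_layer = AddLayer()
total = add_layer.forward(200, 450)
dx, dy = add_layer.backward(1.1)

print(total)   # 650
print(dx, dy)  # 1.1 1.1
```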



##### 4.3 Computational graph and code for buying 2 apples and 3 oranges

![](./attachements/反向传播计算实例.png)


In [7]:
# I find that this graph-style way of programming suits me very well
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# Each node is one computation layer
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()


# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
orange_price = mul_orange_layer.forward(orange, orange_num)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)
price = mul_tax_layer.forward(all_price, tax)


#backward
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)


print(price)
print(dorange, dorange_num, dapple, dapple_num, dapple_price, dorange_price,  dall_price, dtax)



715.0000000000001
3.3000000000000003 165.0 2.2 110.00000000000001 1.1 1.1 1.1 650
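
One way to gain confidence in these numbers is a gradient check: compare the backpropagated values printed above with central-difference estimates. The sketch below assumes a plain function `payment` that mirrors the graph; it and `numerical_grad` are illustrative helpers, not part of the original code.

```python
def payment(apple, apple_num, orange, orange_num, tax):
    # Same computation as the graph: (apples + oranges) * tax
    return (apple * apple_num + orange * orange_num) * tax

inputs = {"apple": 100.0, "apple_num": 2.0,
          "orange": 150.0, "orange_num": 3.0, "tax": 1.1}

# Gradients obtained by backpropagation (values from the output above, rounded).
backprop = {"apple": 2.2, "apple_num": 110.0,
            "orange": 3.3, "orange_num": 165.0, "tax": 650.0}

def numerical_grad(name, h=1e-4):
    # Central difference with respect to one named input.
    plus = dict(inputs)
    minus = dict(inputs)
    plus[name] += h
    minus[name] -= h
    return (payment(**plus) - payment(**minus)) / (2 * h)

errors = {name: abs(numerical_grad(name) - value)
          for name, value in backprop.items()}
print(max(errors.values()) < 1e-6)  # True
```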


##### 4.4 Computational graphs and code for activation function layers



$$
y = \left\{\begin{matrix}
  x&(x > 0) \\
  0&(x \le 0)
\end{matrix}\right.

\longrightarrow 

\frac{\partial y}{\partial x} = \left\{\begin{matrix}
  1&(x > 0) \\
  0&(x \le 0)
\end{matrix}\right.
$$

![](./attachements/ReLU层的计算图.png)

In [8]:
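# ReLU layer
class ReLU:
    def __init__(self) -> None:
        # Boolean array marking the positions where the input was <= 0.
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0

        return out

    def backward(self, dout):
        # The gradient is blocked wherever the forward input was <= 0.
        # Note: this writes into dout in place.
        dout[self.mask] = 0
        dx = dout

        return dx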

import numpy as np
x = np.array([[1.0, -0.5], [-2.0, 3.0]])
print(x)
mask = (x <= 0)
print(mask)


[[ 1.  -0.5]
 [-2.   3. ]]
[[False  True]
 [ True False]]
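
As a sketch of how the mask is used, the snippet below runs a forward and a backward pass on the same `x`. The class is repeated so the snippet is self-contained, and the all-ones upstream gradient is an assumption for illustration.

```python
import numpy as np

class ReLU:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        dout[self.mask] = 0  # writes into dout in place
        return dout

x = np.array([[1.0, -0.5], [-2.0, 3.0]])
relu = ReLU()
out = relu.forward(x)

dout = np.ones_like(x)           # assumed upstream gradient
dx = relu.backward(dout.copy())  # pass a copy to keep dout intact

print(out)  # [[1. 0.] [0. 3.]]
print(dx)   # [[1. 0.] [0. 1.]]
```

Because `backward` modifies its argument in place, passing a copy is a safe choice when the upstream array is needed afterwards.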


$$
\sigma(x)  = \frac{1}{1 + \exp(-x)} = \frac{1}{1 + e^{-x}}  

\longrightarrow 

\sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))
$$

![](./attachements/sigmoid层的计算图.png)


In [9]:
# Sigmoid layer
class Sigmoid:
    def __init__(self) -> None:
        self.out = None

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        
        return out
    
    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out

        return dx
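
The backward formula relies on the identity σ'(x) = σ(x)(1 − σ(x)). A minimal numerical check, with sample points chosen arbitrarily for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
out = sigmoid(x)

analytic = out * (1.0 - out)  # the expression used in Sigmoid.backward
h = 1e-4
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(analytic[1])                                 # 0.25 at x = 0
print(np.allclose(analytic, numeric, atol=1e-8))   # True
```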
    



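##### 4.5 The Affine layer: the matrix product used in a neural network's forward pass
> In geometry, an affine transformation consists of one linear transformation followed by one translation; these correspond to the weighted sum and the bias addition in a neural network, respectively.

$$
\begin{gathered}
\boldsymbol{Y} = \boldsymbol{X} \cdot \boldsymbol{W} + \boldsymbol{B} \\
\text{given } \frac{\partial L}{\partial \boldsymbol{Y}}: \quad
\left\{\begin{matrix}
  \frac{\partial L}{\partial \boldsymbol{X}} =  \frac{\partial L}{\partial \boldsymbol{Y}} \cdot \boldsymbol{W}^T \\
  \frac{\partial L}{\partial \boldsymbol{W}} = \boldsymbol{X}^T \cdot \frac{\partial L}{\partial \boldsymbol{Y}}
\end{matrix}\right.
\end{gathered}
$$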

![](./attachements/Affine层的反向传播.png)
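
A useful sanity check for these formulas is that each gradient has the same shape as the quantity it belongs to. The sketch below uses assumed shapes (one sample with 2 features, 3 outputs) and random values purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1, 2))  # one sample, 2 features (assumed shape)
W = rng.standard_normal((2, 3))  # 2 inputs -> 3 outputs (assumed shape)
B = rng.standard_normal(3)

Y = X @ W + B
dY = np.ones_like(Y)  # stand-in for the upstream gradient dL/dY

dX = dY @ W.T   # dL/dX = dL/dY . W^T
dW = X.T @ dY   # dL/dW = X^T . dL/dY

print(dX.shape == X.shape)  # True
print(dW.shape == W.shape)  # True
```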


##### 4.6 Computational graph of the batch version of the Affine layer

![](./attachements/批版本的Affine层计算图.png)

In [10]:
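# Implementation of the Affine layer

class Affine:
    def __init__(self, W, b) -> None:
        self.W = W      # weights
        self.b = b      # bias
        self.x = None   # input cached for backward()
        self.dW = None  # gradient of the loss w.r.t. W
        self.db = None  # gradient of the loss w.r.t. b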


    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b

        return out
    
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)

        return dx
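
In the batch version the bias is broadcast to every row in the forward pass, so its gradient is the sum of the upstream gradient over the batch axis. A small sketch with hand-picked numbers, assuming the layer is constructed with `W` and `b`:

```python
import numpy as np

class Affine:
    def __init__(self, W, b):
        self.W, self.b = W, b
        self.x = self.dW = self.db = None

    def forward(self, x):
        self.x = x
        return np.dot(x, self.W) + self.b  # b is broadcast over the batch

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)     # sum over the batch axis
        return dx

W = np.full((2, 3), 5.0)
b = np.array([1.0, 2.0, 3.0])
layer = Affine(W, b)

x = np.array([[0.0, 0.0], [1.0, 1.0]])  # batch of 2 samples
out = layer.forward(x)
dout = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
dx = layer.backward(dout)

print(out)       # [[ 1.  2.  3.] [11. 12. 13.]]
print(layer.db)  # [5. 7. 9.]
```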
    


##### 4.7 Computational graph and code for the Softmax-with-Loss layer
![](./attachements/Softmax-with-Loss层的计算图.png)


In [None]:
# SoftmaxWithLoss


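def softmax(x):
    # Normalize along the last axis so that batched input of shape (N, C) is handled row by row.
    c = np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x - c)  # subtract the max for numerical stability
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)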

def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    batch_size = y.shape[0]
    return -np.sum(t * np.log(y + 1e-7)) / batch_size 

class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None # loss
        self.y = None # output of softmax
        self.t = None # supervision labels (one-hot vector)
    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        self.loss = cross_entropy_error(self.y, self.t)
        return self.loss
    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size
        return dx
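
The backward pass returns (y − t) / batch_size. The sketch below checks that expression against a central-difference gradient of the loss. It assumes a softmax that normalizes each row of a batch, and the input values are arbitrary; `loss_fn` is a helper introduced only for this check.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax for input of shape (N, C).
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def loss_fn(x, t):
    # Mean cross-entropy of softmax(x) against one-hot labels t.
    y = softmax(x)
    return -np.sum(t * np.log(y + 1e-7)) / x.shape[0]

x = np.array([[0.3, 2.9, 4.0], [1.0, 0.5, -1.0]])
t = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])

analytic = (softmax(x) - t) / x.shape[0]

numeric = np.zeros_like(x)
h = 1e-4
for idx in np.ndindex(x.shape):
    original = x[idx]
    x[idx] = original + h
    loss_plus = loss_fn(x, t)
    x[idx] = original - h
    loss_minus = loss_fn(x, t)
    x[idx] = original  # restore
    numeric[idx] = (loss_plus - loss_minus) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-5))  # True
```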

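#### 5. The Error Backpropagation Algorithm

##### 5.1 The overall picture of neural network training

- Step-01: (mini-batch) Select a subset of the training data
- Step-02: (compute gradients) Compute the gradient of the loss function with respect to each weight parameter
- Step-03: (update parameters) Update the weight parameters by a small amount in the direction opposite to the gradient
- Step-04: (repeat) Repeat steps 1, 2, and 3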
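
The four steps can be sketched on a toy problem. The example below is not the MNIST network from these notes: it fits a line to synthetic data with mini-batch gradient descent, and the data, model, and hyperparameters are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): t = 2x + 1 without noise.
x_train = rng.uniform(-1.0, 1.0, size=200)
t_train = 2.0 * x_train + 1.0

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 20

for _ in range(500):                               # Step 4: repeat
    idx = rng.choice(x_train.size, batch_size)     # Step 1: mini-batch
    x_batch, t_batch = x_train[idx], t_train[idx]

    err = w * x_batch + b - t_batch                # Step 2: gradient of the
    grad_w = np.mean(2.0 * err * x_batch)          #         mean squared error
    grad_b = np.mean(2.0 * err)

    w -= learning_rate * grad_w                    # Step 3: step against
    b -= learning_rate * grad_b                    #         the gradient

print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```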


##### 5.2 A neural network implementation using error backpropagation

In [11]:
# coding: utf-8
import sys, os

sys.path.append(os.pardir)

import numpy as np
from NeuralNetwork.Datasets.minist import load_mnist
from NeuralNetwork.BackwardPropagation.TwoLayerNet import TwoLayerNet

# Load the data
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

iters_num = 10000
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

iter_per_epoch = max(train_size // batch_size, 1)  # integer number of iterations per epoch

for i in range(iters_num):
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Gradient (backpropagation; the numerical version is kept for comparison)
    # grad = network.numerical_gradient(x_batch, t_batch)
    grad = network.gradient(x_batch, t_batch)

    # Update
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]

    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(train_acc, test_acc)


0.11705 0.1154
0.9051666666666667 0.9094
0.9243166666666667 0.9265
0.9382833333333334 0.9381
0.9474 0.9467
0.9538 0.9526
0.9592166666666667 0.9578
0.9624 0.9592
0.9654833333333334 0.9621
0.9675166666666667 0.963
0.97125 0.9666
0.9738166666666667 0.9695
0.9747333333333333 0.9682
0.9771666666666666 0.9711
0.9783 0.9717
0.9791833333333333 0.9722
0.9800333333333333 0.9728
