# A Three-Layer Fully Connected Neural Network with Binary Cross-Entropy

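A three-layer fully connected network consists of an input layer, a hidden layer, and an output layer. The hidden layer uses sigmoid as its activation function, and the output layer uses sigmoid as a binary classifier.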

In [141]:
import numpy as np

The sigmoid function is defined as:
$f(x) = \displaystyle\frac{1}{1+e^{-x}}$

In [142]:
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

The derivative of the sigmoid function is: $f'(x) = \displaystyle\frac{1}{1+e^{-x}} \cdot \left(1 - \frac{1}{1+e^{-x}}\right) = f(x) \cdot (1 - f(x))$

In [143]:
def sigmoid_deriv(x):
    y = sigmoid(x)
    return y * (1 - y)
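
The identity $f'(x) = f(x)(1 - f(x))$ can be sanity-checked against a central finite difference. The sketch below is not part of the original notebook: the sample points and the step size `h` are arbitrary choices for illustration, and the two functions are repeated only so that the block runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    y = sigmoid(x)
    return y * (1 - y)

# Compare the analytic derivative with a central finite difference.
x = np.linspace(-5, 5, 11)   # illustrative sample points
h = 1e-5                     # illustrative step size
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
max_err = np.max(np.abs(numeric - sigmoid_deriv(x)))
print(max_err < 1e-8)
```

The derivative peaks at $x = 0$, where $f(0) = 0.5$ and therefore $f'(0) = 0.25$.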

Cross entropy is used as the loss function here. With $y$ as the label and $\hat{y}$ as the prediction:\
$\displaystyle L(y, \hat{y})=-(y\log\hat{y} + (1-y)\log(1-\hat{y}))$ \
For a batch of training samples the per-sample losses are averaged, which gives:\
$\displaystyle\varepsilon =\frac{1}{m}\sum_{i=1}^{m}L(y_{i}, \hat{y}_{i})$

In [144]:
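def cross_entropy(y_hat, y):
    # The leading minus sign applies to both terms of the sum.
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def loss_func(y_hat, y):
    # Average the per-sample losses over the batch.
    return cross_entropy(y_hat, y).mean()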
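
Binary cross entropy, $-(y\log\hat{y} + (1-y)\log(1-\hat{y}))$, involves `np.log`, which returns `-inf` when a prediction is exactly 0 or 1. One common guard, shown here as a sketch only, is to clip the predictions first. The helper name `safe_cross_entropy`, the `eps` value, and the sample arrays are assumptions made for illustration and do not appear in the original notebook.

```python
import numpy as np

def safe_cross_entropy(y_hat, y, eps=1e-12):
    # Hypothetical helper: clip predictions away from 0 and 1 so np.log stays finite.
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])       # illustrative labels
y_hat = np.array([0.9, 0.1, 1.0])   # the last prediction is exactly 1
losses = safe_cross_entropy(y_hat, y)
print(losses)          # per-sample losses
print(losses.mean())   # batch average
```

The first two samples are equally confident and correct, so they get the same loss, $-\log 0.9$.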

The derivative of $L(y, \hat{y})$ with respect to $\hat{y}$ is:
$\displaystyle \frac{\partial L}{\partial \hat{y}} = -\frac{y}{\hat{y}} + \frac{1 - y}{1 - \hat{y}}$

In [145]:
def cross_entropy_deriv(y_hat, y):
    eps = 1e-12  # small constant that keeps the divisions away from zero
    return -np.divide(y, y_hat + eps) + np.divide(1 - y, 1 - y_hat + eps)
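
When the prediction comes from a sigmoid, $\hat{y} = sigmoid(o)$, multiplying this derivative by $sigmoid'(o) = \hat{y}(1-\hat{y})$ cancels both denominators: $\left(-\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}\right)\hat{y}(1-\hat{y}) = \hat{y} - y$. The sketch below checks that identity numerically. The values of `o` and `y` are made up for illustration, and the functions are repeated only so the block runs on its own.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(x):
    y = sigmoid(x)
    return y * (1 - y)

def cross_entropy_deriv(y_hat, y):
    eps = 1e-12
    return -np.divide(y, y_hat + eps) + np.divide(1 - y, 1 - y_hat + eps)

o = np.array([[-2.0], [0.5], [3.0]])   # illustrative pre-activation values
y = np.array([[0.0], [1.0], [1.0]])    # illustrative labels
y_hat = sigmoid(o)

chain = cross_entropy_deriv(y_hat, y) * sigmoid_deriv(o)  # two-factor chain rule
simplified = y_hat - y                                    # closed form
print(np.allclose(chain, simplified))
```

The closed form needs no division, so it would not need the `eps` guard at all.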

Define a class SimpleNeuralNetwork whose parameters are randomly initialized to values in the interval \[0, 1).

Let the input be $x$ and the label be $y$; the hidden-layer result is $\hat{h}$ and the output is $\hat{y}$. Each layer of neurons computes
$\displaystyle w^{T}x+b$, so the three-layer network is:\
$\displaystyle h = w_{1}^{T}x+b_{1}$  \
$\displaystyle \hat{h} = sigmoid(h)$ \
$\displaystyle o = w_{2}^{T}\hat{h}+b_{2}$   \
$\hat{y} = sigmoid(o)$

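Backpropagation computes the derivatives of the loss function with respect to the parameters $w$ and $b$, then uses those derivatives and the learning rate to update the parameters.\
$\displaystyle\frac{\partial L }{\partial w_{2}} = \frac{\partial L}{\partial \hat{y}}\frac{\partial\hat{y}}{\partial o}\frac{\partial o}{\partial w_{2}} = \frac{(crossentropy'(\hat{y}, y) * sigmoid'(o))^{T}\hat{h}}{m}$\
$\displaystyle\frac{\partial L }{\partial b_{2}} = \frac{\partial L}{\partial \hat{y}}\frac{\partial\hat{y}}{\partial o}\frac{\partial o}{\partial b_{2}} = \frac{\sum(crossentropy'(\hat{y}, y) * sigmoid'(o))}{m}$ \
Batch training is used here: $x$ holds a batch of data, one sample per row. The matrix product that computes $w_{2}'$ sums over the samples and so removes the batch dimension, which is why $w_{2}'$ is divided by $m$ (the batch size). The result for $b_{2}'$ still contains one row per sample, so it is averaged over the batch. The computations of $w_{1}'$ and $b_{1}'$ follow the same pattern.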
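
Carrying the chain rule one layer further gives the hidden-layer gradients. This is a sketch that assumes the one-sample-per-row layout described above and a $w_{2}$ stored with one row per output unit, as in `np.random.random((output, hidden))`. Here $*$ is an elementwise product, and the product with $w_{2}$ is a matrix product:\
$\displaystyle\frac{\partial L}{\partial o} = \frac{\partial L}{\partial \hat{y}}\frac{\partial \hat{y}}{\partial o} = crossentropy'(\hat{y}, y) * sigmoid'(o)$ \
$\displaystyle\frac{\partial L}{\partial \hat{h}} = \frac{\partial L}{\partial o}\, w_{2}$ \
$\displaystyle\frac{\partial L}{\partial h} = \frac{\partial L}{\partial \hat{h}} * sigmoid'(h)$ \
$\displaystyle\frac{\partial L}{\partial w_{1}} = \frac{\left(\frac{\partial L}{\partial h}\right)^{T} x}{m}$ \
$\displaystyle\frac{\partial L}{\partial b_{1}} = \frac{\sum \frac{\partial L}{\partial h}}{m}$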


In [146]:
class SimpleNeuralNetwork:
    def __init__(self, input1, hidden, output):
        # Weights and biases are drawn uniformly from [0, 1).
        self.w1 = np.random.random((hidden, input1))
        self.w2 = np.random.random((output, hidden))
        self.b1 = np.random.random((hidden, 1))
        self.b2 = np.random.random((output, 1))

    def feed_forward(self, x):
        # x holds one sample per row: shape (m, input1).
        self.x = x
        self.h = np.dot(x, self.w1.T) + self.b1.T
        self.h_hat = sigmoid(self.h)
        # Output layer.
        self.o = np.dot(self.h_hat, self.w2.T) + self.b2.T
        self.y_hat = sigmoid(self.o)
        return self.y_hat

    def feed_backward(self, y, lr):
        # Gradient of the loss with respect to o, shape (m, output).
        d_o = cross_entropy_deriv(self.y_hat, y) * sigmoid_deriv(self.o)
        dw2 = np.dot(d_o.T, self.h_hat) / d_o.shape[0]
        db2 = d_o.mean(axis=0).reshape(-1, 1)
        # Propagate the gradient back through w2 (matrix product, valid for any output size).
        dh_hat = np.dot(d_o, self.w2)
        # Hidden layer.
        d_h = dh_hat * sigmoid_deriv(self.h)
        dw1 = np.dot(d_h.T, self.x) / d_h.shape[0]
        db1 = d_h.mean(axis=0).reshape(-1, 1)

        # Gradient descent step.
        self.w1 -= lr * dw1
        self.w2 -= lr * dw2
        self.b1 -= lr * db1
        self.b2 -= lr * db2
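
A numerical gradient check is one way to gain confidence in backpropagation formulas like these. The sketch below is not part of the original notebook. It rebuilds a 2x3x1 forward pass with the hypothetical helpers `forward` and `mean_loss`, uses seeded random data chosen only for illustration, and compares the backpropagated gradient for `w1` with central differences of the mean loss.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x, w1, b1, w2, b2):
    # Same layout as the class: one sample per row.
    h_hat = sigmoid(np.dot(x, w1.T) + b1.T)
    y_hat = sigmoid(np.dot(h_hat, w2.T) + b2.T)
    return h_hat, y_hat

def mean_loss(x, y, w1, b1, w2, b2):
    _, y_hat = forward(x, w1, b1, w2, b2)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)).mean()

rng = np.random.default_rng(0)  # seed chosen only for illustration
m = 5
x = rng.random((m, 2))
y = rng.integers(0, 2, size=(m, 1)).astype(float)
w1, b1 = rng.random((3, 2)), rng.random((3, 1))
w2, b2 = rng.random((1, 3)), rng.random((1, 1))

# Analytic gradient of the mean loss with respect to w1 (backpropagation).
h_hat, y_hat = forward(x, w1, b1, w2, b2)
d_o = (-y / y_hat + (1 - y) / (1 - y_hat)) * y_hat * (1 - y_hat)
d_h = np.dot(d_o, w2) * h_hat * (1 - h_hat)
dw1 = np.dot(d_h.T, x) / m

# Numerical gradient by central differences, one weight at a time.
step = 1e-6
num_dw1 = np.zeros_like(w1)
for idx in np.ndindex(*w1.shape):
    w_plus, w_minus = w1.copy(), w1.copy()
    w_plus[idx] += step
    w_minus[idx] -= step
    num_dw1[idx] = (mean_loss(x, y, w_plus, b1, w2, b2)
                    - mean_loss(x, y, w_minus, b1, w2, b2)) / (2 * step)

print(np.max(np.abs(dw1 - num_dw1)))  # largest disagreement between the two
```

If the two gradients disagree by much more than the finite-difference error, the backward pass is the first place to look.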

Here a 2x3x1 network is defined for training.


In [164]:
def train(data, label):
    model = SimpleNeuralNetwork(2, 3, 1)
    learn_rate = 0.1
    epochs = 10000

    for epoch in range(epochs):
        y_hat = model.feed_forward(data)
        loss = loss_func(y_hat, label)
        model.feed_backward(label, learn_rate)

        # print(f"Epoch {epoch} loss: {loss}")

    return model
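
Because `SimpleNeuralNetwork` draws its initial parameters from `np.random.random`, two training runs can start from different weights. A minimal sketch of making that initialization repeatable is to seed NumPy's global generator before constructing the model. The seed value 42 is arbitrary, and seeding is an addition that the original notebook does not do.

```python
import numpy as np

np.random.seed(42)                  # arbitrary seed, chosen for illustration
first = np.random.random((3, 2))    # same shape as w1 in a 2x3x1 network
np.random.seed(42)                  # reseeding restarts the same sequence
second = np.random.random((3, 2))
print(np.array_equal(first, second))
```

In `train`, the equivalent step would be a `np.random.seed(...)` call placed before `SimpleNeuralNetwork(2, 3, 1)` is constructed.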

Test with the watermelon dataset.

In [163]:
def main():
    X = np.array([[0.607, 0.460],
                  [0.774, 0.376],
                  [0.634, 0.264],
                  [0.608, 0.318],
                  [0.556, 0.215],
                  [0.403, 0.237],
                  [0.481, 0.149],
                  [0.437, 0.211],
                  [0.666, 0.091],
                  [0.243, 0.267],
                  [0.245, 0.057],
                  [0.343, 0.099],
                  [0.639, 0.161],
                  [0.657, 0.198],
                  [0.360, 0.370],
                  [0.593, 0.042],
                  [0.719, 0.103],
                  ])
    y = np.array([1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0])

    model = train(X, y.reshape(-1,1))

    wm1 = np.array([0.719, 0.103]).reshape(1, -1)
    wm2 = np.array([0.607, 0.406]).reshape(1, -1)
    print(f"wm1: {np.squeeze(model.feed_forward(wm1)):.3f}")
    print(f"wm2: {np.squeeze(model.feed_forward(wm2)):.3f}")

if __name__ == "__main__":
    main()

wm1: 0.266
wm2: 0.885
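
The network outputs a value between 0 and 1 rather than a class label. A minimal sketch of turning the two printed outputs into labels follows. The 0.5 threshold is a common convention assumed here, and the original notebook does not state one.

```python
import numpy as np

probs = np.array([0.266, 0.885])   # the two outputs printed above
threshold = 0.5                    # assumed decision threshold
labels = (probs >= threshold).astype(int)
print(labels)   # [0 1]
```

Under that assumption, `wm1` is assigned to class 0 and `wm2` to class 1.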
