# A One-Article Guide to Implementing a Neural Network in 11 Lines of Python

## Very Simple Neural Network

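The input X is a 4×3 matrix: 4 samples with 3 features each. Every sample has one corresponding true value, so y is a 4×1 vector. Given the input, we want the network to produce the output with the smallest loss relative to y.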

<div>
$$
    X = \begin{bmatrix}
        0 & 0 & 1 \\
        1 & 1 & 1 \\
        1 & 0 & 1 \\
        0 & 1 & 1
    \end{bmatrix},
    \;\;\;\;
    y = \begin{bmatrix}
        0 \\
        1 \\
        1 \\
        0
    \end{bmatrix}
$$
</div>

## Two Layer Neural Network

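First consider the simplest neural network, in which the first layer is the input layer and the second layer is the output layer. The input layer has 3 neurons (one for each of the 3 features), the output is a single value, and w1, w2, w3 are the weights. The output is: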

<div>
$$
    f(w_1 \cdot x_1+w_2 \cdot x_2+w_3 \cdot x_3)
$$
</div>

Here f is the sigmoid function:

<div>
$$
    f(x)=\frac{1}{1+e^{-x}}
$$
</div>
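
As a minimal sketch of the output formula and the sigmoid above, the following computes the output of the single neuron for one sample. The weight values and the helper name `neuron_output` are made up for illustration; they are not the trained weights from this article.

```python
import numpy as np

def sigmoid(x):
    """Logistic function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(x, w):
    """Weighted sum of the inputs passed through the sigmoid."""
    return sigmoid(np.dot(w, x))

# Hypothetical weights, chosen only for illustration
w = np.array([0.5, -0.5, 0.0])
x = np.array([1, 1, 1])      # second row of X

out = neuron_output(x, w)    # weighted sum is 0, so the output is f(0) = 0.5
print(out)
```

With these weights the weighted sum is zero, so the neuron sits exactly at the midpoint of the sigmoid.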

The derivative of the sigmoid is:

<div>
$$
    \frac{df}{dx}=f(x)(1-f(x))
$$
</div>
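
The derivative formula can be sanity-checked numerically. The sketch below compares the analytic expression with a central finite difference; the step size `h`, the sample points, and the helper name `sigmoid_grad` are arbitrary choices of mine.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Analytic derivative f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.linspace(-5.0, 5.0, 11)
h = 1e-5
# Central finite difference as an independent estimate of df/dx
numeric = (sigmoid(xs + h) - sigmoid(xs - h)) / (2.0 * h)
max_diff = np.max(np.abs(numeric - sigmoid_grad(xs)))
print(max_diff)
```

The two estimates should agree to many decimal places, and the slope is largest at x = 0, where it equals 0.25.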

The optimization process of a neural network is:
1. Forward propagation to compute the loss
2. Backpropagation to update w
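
To make step 2 concrete, assume the loss is the squared error. The code in this article never names its loss, but its weight update matches this choice with a learning rate of 1. Writing the network output for sample i as a prediction with a hat:

<div>
$$
    E(w) = \frac{1}{2}\sum_{i}\left(\hat{y}_i - y_i\right)^2,
    \qquad
    \hat{y}_i = f(w_1 x_{i1} + w_2 x_{i2} + w_3 x_{i3})
$$
</div>

Applying the chain rule together with the sigmoid derivative f(x)(1-f(x)) gives the gradient and the update:

<div>
$$
    \frac{\partial E}{\partial w_j} = \sum_{i}\left(\hat{y}_i - y_i\right)\hat{y}_i\left(1-\hat{y}_i\right)x_{ij},
    \qquad
    w_j \leftarrow w_j - \frac{\partial E}{\partial w_j}
$$
</div>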

#### Code:

In [1]:
import numpy as np

In [18]:
# sigmoid function

# deriv=True returns the derivative; x must then already be a sigmoid output

def sigmoid(x,deriv=False):

    if(deriv==True):

        return x*(1-x)

    return 1/(1+np.exp(-x))

In [72]:
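# ReLU function

# deriv=True returns the derivative instead of the activation.
# Note: the training functions in this article pass the layer output,
# which is never negative, so the derivative branch then evaluates to 1 everywhere.

def relu(x,deriv=False):

    if(deriv==True):

        # slope is 1 where x >= 0 and 0 elsewhere
        return np.where(x >= 0, 1.0, 0.0)

    # pass x through where x >= 0, output 0 elsewhere (works for any array shape)
    return np.where(x >= 0, x, 0.0)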

In [57]:
def train_by_2_layers(X,y,function):

    # seed random numbers to make calculation deterministic

    np.random.seed(1)

    # initialize weights randomly with mean 0

    syn0 = 2*np.random.random((3,1)) - 1
    print("syn0=")
    print(syn0)

    n = 10000

    for _ in range(n):

        # forward propagation

        # l0 is the input layer

        l0 = X

        l1 = function(np.dot(l0,syn0))

        # how much did we miss?

        l1_error = l1 - y

        # multiply how much we missed by the slope of the activation function at the values in l1

        l1_delta = - l1_error * function(l1,True)

        # update weights

        syn0 += np.dot(l0.T,l1_delta)

    print("Output After Training:")
    print("syn0=")
    print(syn0)
    print("l1=")
    print(l1)

In [11]:
# input dataset

X = np.array([  [0,0,1],

                [1,1,1],

                [1,0,1],

                [0,1,1] ])

# output dataset            

y = np.array([[0,1,1,0]]).T

print("X=")
print(X)
print("y=")
print(y)

X=
[[0 0 1]
 [1 1 1]
 [1 0 1]
 [0 1 1]]
y=
[[0]
 [1]
 [1]
 [0]]


In [58]:
train_by_2_layers(X,y,sigmoid)

syn0=
[[-0.16595599]
 [ 0.44064899]
 [-0.99977125]]
Output After Training:
syn0=
[[ 9.67299303]
 [-0.2078435 ]
 [-4.62963669]]
l1=
[[ 0.00966449]
 [ 0.99211957]
 [ 0.99358898]
 [ 0.00786506]]


In [59]:
train_by_2_layers(X,y,relu)

syn0=
[[-0.16595599]
 [ 0.44064899]
 [-0.99977125]]
Output After Training:
syn0=
[[-0.11407714]
 [-1.93882287]
 [-5.39385291]]
l1=
[[ 0.        ]
 [ 3.28455359]
 [ 2.93882287]
 [ 0.        ]]


## Three Layer Neural Network

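As we know, a two-layer neural network is essentially a small perceptron. It can only handle linearly separable data; when the data is not linearly separable, it performs poorly.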
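
The difference between the dataset used so far and the one used in this section can be checked directly. The sketch below (the variable names `X1` and `X2` are mine) confirms that in the first dataset y is just a copy of the first feature, while in the second dataset y is the XOR of the first two features, the classic example of a problem that is not linearly separable.

```python
import numpy as np

# First dataset (Two Layer section) and second dataset (this section)
X1 = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
X2 = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([0, 1, 1, 0])

# In the first dataset the label is simply the first feature
first_is_copy = np.array_equal(y, X1[:, 0])

# In the second dataset the label is the XOR of the first two features
second_is_xor = np.array_equal(y, np.bitwise_xor(X2[:, 0], X2[:, 1]))

print(first_is_copy, second_is_xor)
```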

In [60]:
# input dataset

X = np.array([  [0,0,1],

                [0,1,1],

                [1,0,1],

                [1,1,1] ])

# output dataset            

y = np.array([[0,1,1,0]]).T

print("X=")
print(X)
print("y=")
print(y)

X=
[[0 0 1]
 [0 1 1]
 [1 0 1]
 [1 1 1]]
y=
[[0]
 [1]
 [1]
 [0]]


In [61]:
train_by_2_layers(X,y,sigmoid)

syn0=
[[-0.16595599]
 [ 0.44064899]
 [-0.99977125]]
Output After Training:
syn0=
[[  2.08166817e-16]
 [  2.22044605e-16]
 [ -3.05311332e-16]]
l1=
[[ 0.5]
 [ 0.5]
 [ 0.5]
 [ 0.5]]


In [62]:
train_by_2_layers(X,y,relu)

syn0=
[[-0.16595599]
 [ 0.44064899]
 [-0.99977125]]
Output After Training:
syn0=
[[ 0.]
 [ 0.]
 [ 2.]]
l1=
[[ 0.]
 [ 0.]
 [ 0.]
 [ 0.]]
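
The all-0.5 sigmoid output above is consistent with a simple observation: with all weights at zero, the update computed on this dataset is zero, so training has nothing left to change. The sketch below reuses the update rule of `train_by_2_layers`; the standalone `sigmoid` without a `deriv` flag is a simplification of mine.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
y = np.array([[0, 1, 1, 0]]).T

w = np.zeros((3, 1))                    # all-zero weights
l1 = sigmoid(X.dot(w))                  # every output is 0.5
l1_delta = (y - l1) * l1 * (1 - l1)     # same delta as in train_by_2_layers
update = X.T.dot(l1_delta)              # the step that would be added to syn0

print(l1.ravel())
print(update.ravel())
```

The positive and negative errors cancel in every column of X, which is why the weights printed above stay at values on the order of 1e-16.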


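Because the data is not linearly separable, this is a nonlinear problem. The strength of neural networks is that more layers can be stacked to handle nonlinear problems. Below I build a hidden layer with 5 neurons. Keep the dimensions of w straight: the weights from the first layer to the second are 3×5, and the weights from the second layer to the third are 5×1. The same two steps still apply: a forward pass to compute the error, then backpropagation to update w.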
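
The dimensions can be traced with a short sketch of the forward pass alone. The random generator and its seed are arbitrary choices for this illustration and differ from the seeding used in the training code.

```python
import numpy as np

rng = np.random.default_rng(0)           # seed chosen arbitrarily for this sketch

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]])
syn0 = 2 * rng.random((3, 5)) - 1        # layer 1 -> layer 2 weights
syn1 = 2 * rng.random((5, 1)) - 1        # layer 2 -> layer 3 weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

l1 = sigmoid(X.dot(syn0))                # (4, 3) @ (3, 5) -> (4, 5)
l2 = sigmoid(l1.dot(syn1))               # (4, 5) @ (5, 1) -> (4, 1)

print(X.shape, syn0.shape, l1.shape, syn1.shape, l2.shape)
```

Each of the 4 samples gets 5 hidden activations and one final output, so the result has the same shape as y.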

#### Code:

In [63]:
import numpy as np

In [64]:
X = np.array([[0,0,1],

            [0,1,1],

            [1,0,1],

            [1,1,1]])

y = np.array([[0],

            [1],

            [1],

            [0]])

In [69]:
def train_by_3_layers(X,y,function):
    
    np.random.seed(1)

    # randomly initialize our weights with mean 0

    syn0 = 2*np.random.random((3,5)) - 1

    syn1 = 2*np.random.random((5,1)) - 1

    for j in range(60000):

        # Feed forward through layers 0, 1, and 2

        l0 = X

        l1 = function(np.dot(l0,syn0))

        l2 = function(np.dot(l1,syn1))

        # how much did we miss the target value?

        l2_error = y - l2

        if (j % 10000) == 0:

            print("Error:" + str(np.mean(np.abs(l2_error))))

        # in what direction is the target value?

        # were we really sure? if so, don't change too much.

        l2_delta = l2_error * function(l2,deriv=True)

        # how much did each l1 value contribute to the l2 error (according to the weights)?

        l1_error = l2_delta.dot(syn1.T)

        # in what direction is the target l1?

        # were we really sure? if so, don't change too much.

        l1_delta = l1_error * function(l1,deriv=True)

        syn1 += l1.T.dot(l2_delta)

        syn0 += l0.T.dot(l1_delta)

    print(l2)

In [70]:
train_by_3_layers(X,y,sigmoid)

Error:0.500628229093
Error:0.00899024507125
Error:0.0060486255435
Error:0.00482794013965
Error:0.00412270116481
Error:0.00365084766242
[[ 0.00225305]
 [ 0.99723356]
 [ 0.99635205]
 [ 0.00456238]]
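
To read the final output as class labels, one option is to threshold it. The sketch below copies the printed l2 values from above; the 0.5 cutoff is an assumption of mine and is not part of the original training code.

```python
import numpy as np

# Final l2 values copied from the training output above
l2 = np.array([[0.00225305],
               [0.99723356],
               [0.99635205],
               [0.00456238]])
y = np.array([[0], [1], [1], [0]])

# Threshold at 0.5 (an assumed cutoff, not part of the original code)
predictions = (l2 > 0.5).astype(int)
matches = np.array_equal(predictions, y)
print(predictions.ravel(), matches)
```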
