# Grokking Deep Learning (深度学习图解)

1. [web book](https://www.manning.com/books/grokking-deep-learning)
2. [github](https://github.com/iamtrask/Grokking-Deep-Learning)



Overview

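* Chapter 3: Forward propagation
    - Math tools: vectors, matrices
* Chapter 4: Gradient descent
    - Math tool: derivatives from calculus
    - Error as a function of the weight: error=(pred-goal)<sup>2</sup>=(input x weight - goal)<sup>2</sup> (input and goal are fixed)
    - Weight delta: weight_delta=input x delta=input x (pred - goal), which is the derivative of the error with respect to the weight (up to a constant factor of 2)
    - When updating a weight, scale the delta by alpha (0.1, 0.01, 0.001, ...) so that oversized updates do not make the error diverge
    - For an upward-opening parabola: where y'<0 (negative slope), x must move in the positive direction to lower the error, and vice versa
* Chapter 5: Learning multiple weights at once
    - Networks with multiple inputs (one output delta; each weight's delta is that delta times its own input)
    - Multiple outputs (equivalent to several 1-input, 1-output networks side by side, independent of each other)
    - Handwriting recognition: the MNIST dataset; visualizing the weights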

# Chapter 3: Forward propagation

In [2]:
## 3.1 The simplest neural network: predicting from a single input

weight=0.1

def neural_network(input, weight):
    prediction=input*weight
    return prediction

#
number_of_toes=[8.5,9.5,10,9]

input=number_of_toes[0]
neural_network(input, weight)

0.8500000000000001

In [4]:
## 3.2 Predicting with multiple inputs

# Weighted sum, also known as the dot product of two vectors.
def w_sum(a,b):
    assert(len(a)==len(b))
    n=len(a)
    result=0
    for i in range(n):
        result+=a[i]*b[i]
    return result


weight=[0.1, 0.2, 0]

def neural_network2(input, weight):
    prediction=w_sum(input, weight)
    return prediction

##
toes=[8.5,9.5,10,9] # number of toes
wlrec=[0.65,0.8,0.8,0.9] # win/loss record (fraction of games won)
nfans=[1.2,1.3,0.5,1] # number of fans (millions)

input=[ toes[0], wlrec[0], nfans[0]]
pred=neural_network2(input, weight)
pred

0.9800000000000001

In [5]:
## Vector operations
def elementwise_multiplication(vec_a, vec_b): # element-wise product
    assert(len(vec_a)==len(vec_b))
    output=[]
    for i in range(len(vec_a)):
        output.append( vec_a[i] * vec_b[i] )
    return output

# test
elementwise_multiplication([1,2,3], [10,20,30])

[10, 40, 90]

In [6]:
### Element-wise addition
def elementwise_addition(vec_a, vec_b): 
    assert(len(vec_a)==len(vec_b))
    output=[]
    for i in range(len(vec_a)):
        output.append( vec_a[i] + vec_b[i] )
    return output

# test
elementwise_addition([1,2,3], [10,20,30])

[11, 22, 33]

In [8]:
### Sum of a vector's elements
def vector_sum(vec_a):
    output=0
    for i in range(len(vec_a)):
        output+=vec_a[i]
    return output

#test
vector_sum([10,20,30])

60

In [12]:
### Mean of a vector
def vector_average(vec_a):
    assert( len(vec_a)>0 )
    output=vector_sum(vec_a)/ len(vec_a)
    return output

#test
vector_average([10,20,60])

30.0

In [14]:
### Dot product of two vectors
def weighted_sum(a, b):
    assert( len(a) == len(b) )
    c=elementwise_multiplication(a,b)
    return vector_sum(c)

## test
weighted_sum([0.1, 0.2, 0], [8.5, 0.65, 1.2])

0.9800000000000001

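> A positive weight behaves loosely like AND (the input must be present to raise the score), while a negative weight behaves like NOT (the input pushes the score down).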
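For comparison, a minimal sketch of the same vector operations in NumPy (assuming NumPy is installed). Each line should agree with the hand-written helper named in its comment on the same test inputs:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

prod = a * b                  # element-wise product, like elementwise_multiplication
total = a + b                 # element-wise addition, like elementwise_addition
s = b.sum()                   # sum of elements, like vector_sum
avg = np.mean([10, 20, 60])   # mean, like vector_average
dot = np.dot([0.1, 0.2, 0], [8.5, 0.65, 1.2])  # dot product, like weighted_sum / w_sum

print(prod, total, s, avg, dot)
```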

# Implementing the same with NumPy

In [16]:
import numpy as np
weights=np.array([0.1,0.2,0])

def neural_network3(input, weights):
    pred=input.dot(weights)
    return pred

#test
toes=np.array([8.5,9.5,10,9]) # number of toes
wlrec=np.array([0.65,0.8,0.8,0.9]) # win/loss record (fraction of games won)
nfans=np.array([1.2,1.3,0.5,1]) # number of fans (millions)

input=np.array([toes[0], wlrec[0], nfans[0]])
pred=neural_network3(input, weights)
pred

0.9800000000000001

## Predicting multiple outputs

In [18]:
# Easier to implement: from the win/loss record alone, predict hurt? win? sad? (three independent outputs)

def ele_multi_vector(ele, vect):
    output=[]
    for i in range(len(vect)):
        output.append(ele * vect[i])
    return output

# test
ele_multi_vector(0.65, [0.3,0.2,0.9])

[0.195, 0.13, 0.5850000000000001]

In [19]:
def neural_network4(input, weights):
    pred=ele_multi_vector(input, weights)
    return pred

wlrec=[0.65, 0.8, 0.8, 0.9]
input=wlrec[0]
weights=[0.3,0.2,0.9]
pred=neural_network4(input, weights)
pred

[0.195, 0.13, 0.5850000000000001]

## Predicting with multiple inputs and multiple outputs

In [23]:
# A vector of inputs [toes, win/loss record, fans], a vector of outputs [hurt?, win?, sad?], and a weight matrix in between
def w_sum(a,b): # dot product
    assert( len(a) == len(b) )
    output=0
    for i in range(len(a)):
        output+=a[i]*b[i]
    return output

def vect_mult_mat(vect, matrix):
    # each row of `matrix` holds the weights of one output; w_sum checks the lengths
    output=[]
    
    for i in range(len(matrix)):
        output.append(  w_sum(vect, matrix[i])  )
    return output

def neural_network5(input, weights):
    pred=vect_mult_mat(input, weights)
    return pred

###
toes=[8.5,9.5,9.9,9]
wlrec=[0.65,0.8,0.8,0.9]
nfans=[1.2,1.3,0.5,1]

input=[ toes[0], wlrec[0], nfans[0]]

weights=[ [0.1,0.1,-0.3], # hurt?
         [0.1,0.2,0], #win?
         [0,1.3,0.1]] #sad?

pred=neural_network5(input, weights)
pred

[0.555, 0.9800000000000001, 0.9650000000000001]

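> Writing each output's weights as a column vector gives the weight matrix $W$, where $w_{ij}$ connects input $i$ to output $j$. For the input row vector $[\text{in}_1, \dots, \text{in}_n]$, output $j$ is $\text{out}_j = [\text{in}_1, \dots, \text{in}_n]\,[w_{1j}, w_{2j}, \dots, w_{nj}]^T = \sum_{i=1}^{n} \text{in}_i w_{ij}$, so all outputs together are $\text{out} = \text{in}\,W$. The `weights` list in the cell above stores one output per row, i.e. it holds $W^T$.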
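A minimal NumPy sketch of this matrix form, assuming the same first-game input and the row-per-output `weights` from the cell above: transposing gives $W$ with one column per output, and `x.dot(W)` should reproduce `[0.555, 0.98, 0.965]`.

```python
import numpy as np

x = np.array([8.5, 0.65, 1.2])                 # [toes, wlrec, nfans] of the first game
weights_rows = np.array([[0.1, 0.1, -0.3],     # hurt? (one row per output, as above)
                         [0.1, 0.2, 0.0],      # win?
                         [0.0, 1.3, 0.1]])     # sad?

W = weights_rows.T              # column j now holds the weights of output j
pred = x.dot(W)                 # out_j = sum_i x_i * W[i, j]

# The same thing written as one dot product per output row
pred_rows = np.array([x.dot(row) for row in weights_rows])
print(pred)
```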

## Neural networks can be stacked (with one hidden layer)

In [25]:
import numpy as np
# Input-to-hidden weights
ih_wgt=np.array([
    [0.1,0.2,-0.1], # output: hid[0]
    [-0.1,0.1,0.9], # output: hid[1]
    [0.1,0.4,0.1] #output: hid[2]
]).T # transpose so that each hidden unit's weights form a column

# Hidden-to-output weights
ho_wgt=np.array([
    [0.3,1.1,-0.3], #output: hurt?
    [0.1,0.2,0], #output: win?
    [0,1.3,0.1] #output: sad?
]).T

weights=[ih_wgt, ho_wgt]

def neural_network6(input, weights):
    hid=input.dot(weights[0])
    pred=hid.dot(weights[1])
    return pred

##
toes=np.array([8.5,9.5,9.9,9])
wlrec=np.array([0.65,0.8,0.8,0.9])
nfans=np.array([1.2,1.3,0.5,1])

input=np.array([toes[0], wlrec[0], nfans[0]])
pred=neural_network6(input, weights)
pred

array([0.2135, 0.145 , 0.5065])

# Chapter 4: Gradient descent (how weights are adjusted)

In [27]:
# Error example: comparing the prediction with the goal yields an error, which measures how accurate the prediction is
knob_weight=0.5
input=0.5

goal_pred=0.8

pred=input*knob_weight
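error=(pred-goal_pred)**2 # squaring makes the error non-negative and magnifies large errors (>1) while shrinking small ones (<1), so we focus on the errors that matter most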
error

0.30250000000000005

In [35]:
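# Perturb the weight in both directions, see which one lowers the error, and use that to pick the direction of adjustment
weight=0.1 # weight
lr=0.01 # perturbation step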

def neural_network(input, weight):
    pred=input*weight
    return pred

# input data
number_of_toes=[8.5]
win_or_loss=[1] #win

input=number_of_toes[0]
true=win_or_loss[0]

# predict once
pred=neural_network(input, weight)
# prediction error
error=(pred-true)**2
print(error)

0.022499999999999975


In [36]:
# perturb the weight up and down
pred_up=neural_network(input, weight+lr)
error_up=(pred_up-true)**2
print(error_up)
#
pred_dn=neural_network(input, weight-lr)
error_dn=(pred_dn-true)**2
print(error_dn)

0.004224999999999993
0.05522499999999994


In [37]:
# decide which way to move the weight
if error>error_up or error>error_dn:
    if error_up>error_dn:
        weight-=lr
    else:
        weight+=lr
print(weight)

0.11


## Hot and cold learning (nudge the weight up and down and watch how the error changes)

In [63]:
### Put the code above in a loop to search for the best weight

# given
weight=0.5 # weight
step_amout=0.001 # perturbation step

input=0.5
goal_pred=0.8
#
def neural_network(input, weight):
    pred=input*weight
    return pred

#
i=0
while True:
    i+=1
    if i>2000:
        break
    #
    # predict once
    pred=neural_network(input, weight)
    # prediction error
    error=(pred-goal_pred)**2
    #print(error)
    
    # perturb the weight up and down
    pred_up=neural_network(input, weight+step_amout)
    error_up=(pred_up-goal_pred)**2
    #print(error_up)
    #
    pred_dn=neural_network(input, weight-step_amout)
    error_dn=(pred_dn-goal_pred)**2
    #print(error_dn)
    
    # decide which way to move the weight; stop when neither direction lowers the error
    if error>error_up or error>error_dn:
        if error_up>error_dn:
            weight-=step_amout
        else:
            weight+=step_amout
    else:
        print(weight)
        break
    
    # print progress
    print(i, "Erorr:%f (dn:%f, up:%f); Prediction:%0.8f; NextWeight:%0.8f;" % (error, error_dn, error_up, pred, weight) )

1 Erorr:0.302500 (dn:0.303050, up:0.301950); Prediction:0.25000000; NextWeight:0.50100000;
2 Erorr:0.301950 (dn:0.302500, up:0.301401); Prediction:0.25050000; NextWeight:0.50200000;
3 Erorr:0.301401 (dn:0.301950, up:0.300852); Prediction:0.25100000; NextWeight:0.50300000;
4 Erorr:0.300852 (dn:0.301401, up:0.300304); Prediction:0.25150000; NextWeight:0.50400000;
5 Erorr:0.300304 (dn:0.300852, up:0.299756); Prediction:0.25200000; NextWeight:0.50500000;
6 Erorr:0.299756 (dn:0.300304, up:0.299209); Prediction:0.25250000; NextWeight:0.50600000;
7 Erorr:0.299209 (dn:0.299756, up:0.298662); Prediction:0.25300000; NextWeight:0.50700000;
8 Erorr:0.298662 (dn:0.299209, up:0.298116); Prediction:0.25350000; NextWeight:0.50800000;
9 Erorr:0.298116 (dn:0.298662, up:0.297570); Prediction:0.25400000; NextWeight:0.50900000;
10 Erorr:0.297570 (dn:0.298116, up:0.297025); Prediction:0.25450000; NextWeight:0.51000000;
11 Erorr:0.297025 (dn:0.297570, up:0.296480); Prediction:0.25500000; NextWeight:0.5110000

657 Erorr:0.049284 (dn:0.049506, up:0.049062); Prediction:0.57800000; NextWeight:1.15700000;
658 Erorr:0.049062 (dn:0.049284, up:0.048841); Prediction:0.57850000; NextWeight:1.15800000;
659 Erorr:0.048841 (dn:0.049062, up:0.048620); Prediction:0.57900000; NextWeight:1.15900000;
660 Erorr:0.048620 (dn:0.048841, up:0.048400); Prediction:0.57950000; NextWeight:1.16000000;
661 Erorr:0.048400 (dn:0.048620, up:0.048180); Prediction:0.58000000; NextWeight:1.16100000;
662 Erorr:0.048180 (dn:0.048400, up:0.047961); Prediction:0.58050000; NextWeight:1.16200000;
663 Erorr:0.047961 (dn:0.048180, up:0.047742); Prediction:0.58100000; NextWeight:1.16300000;
664 Erorr:0.047742 (dn:0.047961, up:0.047524); Prediction:0.58150000; NextWeight:1.16400000;
665 Erorr:0.047524 (dn:0.047742, up:0.047306); Prediction:0.58200000; NextWeight:1.16500000;
666 Erorr:0.047306 (dn:0.047524, up:0.047089); Prediction:0.58250000; NextWeight:1.16600000;
667 Erorr:0.047089 (dn:0.047306, up:0.046872); Prediction:0.58300000; 

In [61]:
pred=neural_network(input, weight) # perturbing by ±step_amout would raise the error to about 1e-7
# prediction error
error=(pred-goal_pred)**2
print(pred, error)

0.7999999999999672 1.0799505792475652e-27


In [60]:
weight # so this is the best weight (goal_pred/input = 0.8/0.5 = 1.6, up to floating-point error)

1.5999999999999344

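> Drawbacks: 1) Inefficient: every step needs three predictions (current, up, down). 2) An exact prediction is sometimes impossible: we know which direction to move the weight, but not by how much, so a fixed step can overshoot the goal and end up oscillating around it.

> How can a weight update take both direction and magnitude into account?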

## Adjusting the weight based on the error

In [75]:
## Adjusting the weight based on the error
weight=0.5
goal_pred=0.8
input=0.5

i=0
while True:
    i+=1
    if i>200:
        break
    #
    #预测
    pred=input*weight
    error=(pred-goal_pred)**2
    
    ## Three key edge cases: scaling, negative reversal, and stopping
    direction_and_amout=(pred-goal_pred)*input # key point: pure error * input
    if abs(error)<1e-20:
        print(weight)
        break
    weight=weight-direction_and_amout
    
    print(i, "Error:%0.8f; Prediction: %0.8f; NextWeight:%0.8f" % (error, pred, weight) )

1 Error:0.30250000; Prediction: 0.25000000; NextWeight:0.77500000
2 Error:0.17015625; Prediction: 0.38750000; NextWeight:0.98125000
3 Error:0.09571289; Prediction: 0.49062500; NextWeight:1.13593750
4 Error:0.05383850; Prediction: 0.56796875; NextWeight:1.25195312
5 Error:0.03028416; Prediction: 0.62597656; NextWeight:1.33896484
6 Error:0.01703484; Prediction: 0.66948242; NextWeight:1.40422363
7 Error:0.00958210; Prediction: 0.70211182; NextWeight:1.45316772
8 Error:0.00538993; Prediction: 0.72658386; NextWeight:1.48987579
9 Error:0.00303184; Prediction: 0.74493790; NextWeight:1.51740685
10 Error:0.00170541; Prediction: 0.75870342; NextWeight:1.53805513
11 Error:0.00095929; Prediction: 0.76902757; NextWeight:1.55354135
12 Error:0.00053960; Prediction: 0.77677068; NextWeight:1.56515601
13 Error:0.00030353; Prediction: 0.78257801; NextWeight:1.57386701
14 Error:0.00017073; Prediction: 0.78693350; NextWeight:1.58040026
15 Error:0.00009604; Prediction: 0.79020013; NextWeight:1.58530019
16 E

In [73]:
weight

1.5999999998022254

```
# Three key edge cases: scaling, negative reversal, and stopping
direction_and_amout=(pred-goal_pred)*input # key point: pure error * input
weight=weight-direction_and_amout
```

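- Stopping: if the radio is off, turning the volume knob does nothing. When the input is 0, the weight delta is 0, so the weight is not adjusted.
- Negative reversal: tricky to grasp; worth sketching on paper.
    * input>0: if pred=input*weight is too large, the input cannot change, so the weight must decrease;
    * input<0: if pred=input*weight is too large, the weight must increase (multiplying by the negative input flips the sign of the delta).
- Scaling: the larger the input, the larger the weight adjustment. This can easily get out of control, which is handled next with alpha.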
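The three cases can be checked with a few hand-picked numbers. A minimal sketch, where `weight_delta` is a hypothetical helper implementing the "pure error * input" rule above and all inputs and weights are illustrative values chosen so the prediction is too high (1.0 vs. a goal of 0.8):

```python
def weight_delta(inp, weight, goal_pred):
    """Pure error times input, as in the update rule above (hypothetical helper)."""
    pred = inp * weight
    return (pred - goal_pred) * inp

# Stopping: a zero input gives a zero delta, so the weight is left alone
d_zero = weight_delta(0.0, 0.5, 0.8)

# Negative reversal: both predictions are 1.0, i.e. too high
d_pos = weight_delta(0.5, 2.0, 0.8)    # > 0, so weight -= d_pos decreases the weight
d_neg = weight_delta(-0.5, -2.0, 0.8)  # < 0, so weight -= d_neg increases the weight

# Scaling: same error, 10x the input -> 10x the delta
d_big = weight_delta(5.0, 0.2, 0.8)

print(d_zero, d_pos, d_neg, d_big)
```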

## Learning means reducing error

```
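error=(pred - goal_pred)**2
error=(input*weight - goal_pred)**2
# substitute the known values input=0.5, goal_pred=0.8
error=(0.5*weight - 0.8)**2 # a parabola: the weight is the variable, the error is the result
# At the bottom of the bowl error=0; at any other point on the curve, stepping against the slope moves toward the bottom. The network can use this slope to reduce its error.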

```
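A quick numerical check of the bowl picture, assuming NumPy and the values input=0.5, goal_pred=0.8 from above: scanning weights on a grid should put the minimum at goal_pred/input = 1.6, and the analytic slope 2·input·(input·weight − goal_pred) is negative left of the bottom and positive right of it, so stepping against the slope always heads downhill.

```python
import numpy as np

inp, goal_pred = 0.5, 0.8
ws = np.linspace(0.0, 3.2, 321)           # candidate weights in steps of 0.01
errors = (inp * ws - goal_pred) ** 2      # error as a function of the weight (the bowl)
best_w = ws[np.argmin(errors)]            # lowest point of the bowl on this grid

slope = 2 * inp * (inp * ws - goal_pred)  # d(error)/d(weight) at each candidate
print(best_w, slope[0], slope[-1])
```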

In [81]:
## Reviewing the learning process
weight=0
goal_pred=0.8
input=1.1

for i in range(100):
    print("----%d\nWeight:"%i, weight)
    
    pred=input*weight
    error=(pred-goal_pred)**2
    if error<1e-30:
        break
    
    delta=(pred-goal_pred)
    weight_delta=delta*input
    
    weight -= weight_delta
    
    print("Error:%0.5f; Prediction:%0.5f" %(error, pred))
    print("Delta:%0.5f; Weight_delta:%0.5f" %(delta, weight_delta))

----0
Weight: 0
Error:0.64000; Prediction:0.00000
Delta:-0.80000; Weight_delta:-0.88000
----1
Weight: 0.8800000000000001
Error:0.02822; Prediction:0.96800
Delta:0.16800; Weight_delta:0.18480
----2
Weight: 0.6951999999999999
Error:0.00124; Prediction:0.76472
Delta:-0.03528; Weight_delta:-0.03881
----3
Weight: 0.734008
Error:0.00005; Prediction:0.80741
Delta:0.00741; Weight_delta:0.00815
----4
Weight: 0.72585832
Error:0.00000; Prediction:0.79844
Delta:-0.00156; Weight_delta:-0.00171
----5
Weight: 0.7275697528
Error:0.00000; Prediction:0.80033
Delta:0.00033; Weight_delta:0.00036
----6
Weight: 0.727210351912
Error:0.00000; Prediction:0.79993
Delta:-0.00007; Weight_delta:-0.00008
----7
Weight: 0.7272858260984799
Error:0.00000; Prediction:0.80001
Delta:0.00001; Weight_delta:0.00002
----8
Weight: 0.7272699765193192
Error:0.00000; Prediction:0.80000
Delta:-0.00000; Weight_delta:-0.00000
----9
Weight: 0.7272733049309429
Error:0.00000; Prediction:0.80000
Delta:0.00000; Weight_delta:0.00000
----1

In [83]:
0.8/1.1 # optimal weight = goal/input; at the bottom of the parabola, error=0

0.7272727272727273

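> The weight delta is the derivative of the error with respect to the weight, up to a constant factor: d(error)/d(weight) = 2 x input x (pred - goal_pred) = 2 x weight_delta.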
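To check this numerically, a sketch using the values from the run above (input=1.1, goal_pred=0.8, starting weight 0): a central finite difference of the error should match 2 × weight_delta, and weight_delta itself should be the -0.88 printed in step 0. The helper names `error` and `weight_delta` are illustrative; the factor of 2 is simply absorbed into the step size (and later into alpha).

```python
def error(weight, inp=1.1, goal_pred=0.8):
    return (inp * weight - goal_pred) ** 2

def weight_delta(weight, inp=1.1, goal_pred=0.8):
    return inp * (inp * weight - goal_pred)   # delta * input, as in the loop above

w, h = 0.0, 1e-6
numeric = (error(w + h) - error(w - h)) / (2 * h)   # central-difference slope
print(numeric, 2 * weight_delta(w))
```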

### Breaking gradient descent

In [90]:
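## Once the input grows past a certain size, the whole system misbehaves: the weight keeps flipping sign while its absolute value grows, and the error keeps increasing.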
weight=0
goal_pred=0.8
input=1.1*6

for i in range(100):
    print("----%d\nWeight:"%i, weight)
    
    pred=input*weight
    error=(pred-goal_pred)**2
    if error<1e-30:
        break
    
    delta=(pred-goal_pred)
    weight_delta=delta*input
    
    weight -= weight_delta
    
    print("Error:%0.5f; Prediction:%0.5f" %(error, pred))
    print("Delta:%0.5f; Weight_delta:%0.5f" %(delta, weight_delta))

----0
Weight: 0
Error:0.64000; Prediction:0.00000
Delta:-0.80000; Weight_delta:-5.28000
----1
Weight: 5.280000000000001
Error:1159.26630; Prediction:34.84800
Delta:34.04800; Weight_delta:224.71680
----2
Weight: -219.43680000000012
Error:2099841.19311; Prediction:-1448.28288
Delta:-1449.08288; Weight_delta:-9563.94701
----3
Weight: 9344.510208000007
Error:3803554904.56646; Prediction:61673.76737
Delta:61672.96737; Weight_delta:407041.58466
----4
Weight: -397697.07445248036
Error:6889582869184.11328; Prediction:-2624800.69139
Delta:-2624801.49139; Weight_delta:-17323689.84315
----5
Weight: 16925992.768697564
Error:12479470732594976.00000; Prediction:111711552.27340
Delta:111711551.47340; Weight_delta:737296239.72447
----6
Weight: -720370246.9557683
Error:22604734237580550144.00000; Prediction:-4754443629.90807
Delta:-4754443630.70807; Weight_delta:-31379327962.67327
----7
Weight: 30658957715.717506
Error:40945166738284806668288.00000; Prediction:202349120923.73557
Delta:202349120922.9355

OverflowError: (34, 'Result too large')

### Why does the weight get overcorrected?

```
weight_delta=input * (pred - goal)
weight=weight - weight_delta
```

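Even when the error is small, a large input makes the absolute value of weight_delta large, possibly larger than the weight itself. The update then overshoots the optimal weight and lands farther from it than it started, so each correction is bigger than the last and the optimization spirals out of control.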
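One way to see where the threshold lies, under the update rule used above (weight -= alpha * input * (pred - goal_pred)): the distance from the weight to the optimum goal_pred/input is multiplied by (1 - alpha·input²) at every step, so training converges only while |1 - alpha·input²| < 1. The hypothetical `step` helper below measures that factor for alpha = 1: it should be about -0.21 for input=1.1 (oscillating but shrinking, matching the alternating signs of Delta in the earlier run) and about -42.56 for input=6.6 (diverging).

```python
def step(weight, inp, goal_pred, alpha=1.0):
    """One gradient-descent update: weight -= alpha * input * (pred - goal_pred)."""
    pred = inp * weight
    return weight - alpha * inp * (pred - goal_pred)

goal_pred = 0.8
factors = {}
for inp in (1.1, 6.6):
    w_star = goal_pred / inp        # optimal weight, where error = 0
    w0 = 0.0
    w1 = step(w0, inp, goal_pred)
    # ratio of the new distance to the optimum over the old one
    factors[inp] = (w1 - w_star) / (w0 - w_star)
    print(inp, factors[inp], 1 - inp ** 2)
```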


### 避免过度修正权重

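The fix is to multiply weight_delta by a coefficient alpha between 0 and 1.
- It is mostly set by hand, usually by trying powers of ten (1, 0.1, 0.01, 0.001, 0.0001, ...);
- if the error starts to diverge, decrease alpha;
- if training is too slow, increase alpha.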
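A small experiment with these tips, assuming the same setup as the diverging example (input=6.6, goal_pred=0.8, starting weight 0). The hypothetical `train` helper runs 50 steps for several alpha values and reports the final error, or `inf` if the weight blows up. With these numbers, alpha=1 and alpha=0.1 should diverge, 0.01 should converge, and 0.001 should still be noticeably off after 50 steps (too slow).

```python
def train(alpha, inp=6.6, goal_pred=0.8, steps=50):
    """Plain gradient descent; returns the final error, or inf if the weight blows up."""
    weight = 0.0
    for _ in range(steps):
        pred = inp * weight
        weight -= alpha * inp * (pred - goal_pred)
        if abs(weight) > 1e6:          # diverged: stop before floats overflow
            return float("inf")
    return (inp * weight - goal_pred) ** 2

results = {alpha: train(alpha) for alpha in (1, 0.1, 0.01, 0.001)}
for alpha, err in results.items():
    print(f"alpha={alpha}: final error {err:.3g}")
```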

In [92]:
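## With a large input the system misbehaves: the weight keeps flipping sign while growing, and the error keeps increasing.
# Adding alpha prevents the weight from being overcorrected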
weight=0
goal_pred=0.8
input=1.1*6

alpha=0.01 # keeps the weight from being overcorrected

for i in range(100):
    print("----%d\nWeight:"%i, weight)
    
    pred=input*weight
    error=(pred-goal_pred)**2
    if error<1e-30:
        break
    
    # derivative of the error w.r.t. the weight (up to a factor of 2)
    derivative=input*(pred-goal_pred)
    # the update is the derivative scaled by alpha
    weight -= derivative*alpha
    
    print("Error:%0.5f; Prediction:%0.5f" %(error, pred))

----0
Weight: 0
Error:0.64000; Prediction:0.00000
----1
Weight: 0.052800000000000014
Error:0.20387; Prediction:0.34848
----2
Weight: 0.08260032
Error:0.06494; Prediction:0.54516
----3
Weight: 0.099419620608
Error:0.02069; Prediction:0.65617
----4
Weight: 0.10891243387115519
Error:0.00659; Prediction:0.71882
----5
Weight: 0.11427017767688
Error:0.00210; Prediction:0.75418
----6
Weight: 0.11729408828083107
Error:0.00067; Prediction:0.77414
----7
Weight: 0.11900078342570106
Error:0.00021; Prediction:0.78541
----8
Weight: 0.11996404216546568
Error:0.00007; Prediction:0.79176
----9
Weight: 0.12050770539818884
Error:0.00002; Prediction:0.79535
----10
Weight: 0.12081454892673778
Error:0.00001; Prediction:0.79738
----11
Weight: 0.1209877314142508
Error:0.00000; Prediction:0.79852
----12
Weight: 0.12108547561020315
Error:0.00000; Prediction:0.79916
----13
Weight: 0.12114064243439866
Error:0.00000; Prediction:0.79953
----14
Weight: 0.1211717785899746
Error:0.00000; Prediction:0.79973
----15
Weig

In [93]:
goal_pred/input

0.12121212121212122

# Chapter 5: Learning multiple weights at once

In [111]:
## Multiple inputs: gradient descent works here too

# weighted sum of two vectors
def w_sum(a,b):
    assert(len(a)==len(b))
    output=0
    for i in range(len(a)):
        output+=a[i]*b[i]
    return output

# multiply a vector by a scalar
def ele_mult_vector(num, vec):
    output=[]
    for i in range(len(vec)):
        output.append( num*vec[i] )
    return output


def neural_network(input, weights):
    pred=w_sum(input, weights)
    return pred

weights=[0.1,0.2,-0.1]

input=[8.5, 1.65, 1.2]
goal=1

alpha=0.02

for i in range(200):
    pred=neural_network(input, weights)
    error=(pred-goal)**2
    
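    if error<1e-5:
        break
    
    # compute the weight deltas: one per weight, delta times that weight's own input
    delta=pred-goal
    # fix: the original called ele_mult_vector(delta, weights); the output printed below comes from that buggy version
    weight_deltas=ele_mult_vector(delta, input)
    
    # update the weights
    for j in range(len(weights)):
        weights[j] -= alpha*weight_deltas[j]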
    
    print("---%d\nWeights:" %i, weights)
    print("Weights Deltas:"+str(weight_deltas))
    print("Error:", error)

---0
Weights: [0.09988000000000001, 0.19976000000000002, -0.09988000000000001]
Weights Deltas:[0.006000000000000005, 0.01200000000000001, -0.006000000000000005]
Error: 0.0036000000000000064
---1
Weights: [0.09976268494720002, 0.19952536989440003, -0.09976268494720002]
Weights Deltas:[0.005865752640000012, 0.011731505280000025, -0.005865752640000012]
Error: 0.003448977984000013
---2
Weights: [0.09964798886487466, 0.19929597772974933, -0.09964798886487466]
Weights Deltas:[0.005734804116267445, 0.01146960823253489, -0.005734804116267445]
Error: 0.003304463192114736
---3
Weights: [0.09953584764499156, 0.1990716952899831, -0.09953584764499156]
Weights Deltas:[0.005607060994155697, 0.011214121988311393, -0.005607060994155697]
Error: 0.003166164570378952
---4
Weights: [0.0994261989850131, 0.1988523979700262, -0.0994261989850131]
Weights Deltas:[0.005482432998922325, 0.01096486599784465, -0.005482432998922325]
Error: 0.0030338047516662693
---5
Weights: [0.0993189823272566, 0.1986379646545132, 
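Each weight's delta should be delta times that weight's own input, not times the weight. A vectorized NumPy sketch of this update, assuming the same input, goal and alpha as the cell above. Because these inputs are large (their squared length is about 76), alpha has to stay small; with alpha=0.02 the error should drop below 1e-5 within a few iterations.

```python
import numpy as np

inp = np.array([8.5, 1.65, 1.2])       # same inputs as the cell above
weights = np.array([0.1, 0.2, -0.1])
goal, alpha = 1.0, 0.02

for i in range(200):
    pred = inp.dot(weights)
    error = (pred - goal) ** 2
    if error < 1e-5:
        break
    delta = pred - goal
    weight_deltas = delta * inp        # one delta per weight: delta times its own input
    weights -= alpha * weight_deltas

print(i, error, weights)
```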

### Multiple outputs: gradient descent works too

In [121]:
## 1 input, multiple outputs: each weight gets its own weight delta

input=0.65 # win/loss record of past games

# predict three outcomes: hurt, win, sad
goal_pred=[0.1,1,0.1] # ground truth

# initial weight for each outcome
weights=[0.3,0.2,0.9]

# multiply a scalar by a vector
def scalar_multi_vec(scalar, vec):
    out=[]
    for i in range(len(vec)):
        out.append(scalar * vec[i])
    return out

# define the network
def neural_network(input, weights):
    pred=scalar_multi_vec(input, weights)
    return pred

alpha=0.1

# gradient descent on the 3 weights
for i in range(105):
    # predict: returns a vector with the three outputs
    pred=neural_network(input, weights)
    
    # three outputs, three weights: compute each error and update each weight separately
    errors=[]
    for j in range(len(pred)):
        errors.append( (pred[j] - goal_pred[j])**2 ) # error
        delta=pred[j] - goal_pred[j]
        weights_delta=delta*input
        weights[j] -= weights_delta*alpha # update the weight
    #
    print("---%d\nError:" %(i), errors )
    print("Weights:", weights )
    print("Prediction:", pred )

---0
Error: [0.009025, 0.7569, 0.2352250000000001]
Weights: [0.293825, 0.25655, 0.868475]
Prediction: [0.195, 0.13, 0.5850000000000001]
---1
Error: [0.008278497689062499, 0.69429306380625, 0.21576837882656252]
Weights: [0.28791089375, 0.3107107625, 0.83828193125]
Prediction: [0.19098625, 0.1667575, 0.56450875]
---2
Error: [0.007593742270117802, 0.6368646564268324, 0.19792111085744712]
Weights: [0.2822466584890625, 0.362583232784375, 0.8093645196546875]
Prediction: [0.1871420809375, 0.20196199562500003, 0.5448832553125]
---3
Error: [0.0069656263528539005, 0.5841864361745281, 0.18155007854294272]
Weights: [0.2768217371678996, 0.4122640911992351, 0.781668868699277]
Prediction: [0.18346032801789064, 0.23567910130984374, 0.5260869377755468]
---4
Error: [0.006389465004429233, 0.5358654916180042, 0.16653317514314317]
Weights: [0.2716260187725558, 0.45984593334606744, 0.7551433589967326]
Prediction: [0.17993412915913473, 0.26797165927950284, 0.50808476465453]
---5
Error: [0.005860960805929433,

### Multiple inputs, multiple outputs

In [133]:
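# inputs and goals
input =[8.5, 0.65, 1.2] # three inputs: toes, wlrec, nfans
goal=[0.1,1,0.1] # three outputs: hurt, win, sad

alpha=0.01 # scaling factor for the weight deltas

# weights: each row holds one output's three weights, which are updated together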
weights=[
    #toes  wlrec fans
    [0.1, 0.1, -0.3],#out: hurt
    [0.1, 0.2, 0], #out: win
    [0, 1.3, 0.1] #out: sad
]

# dot product (weighted sum)
def vec_dot(a,b):
    assert(len(a)==len(b))
    out=0
    for i in range(len(a)):
        out+=a[i]*b[i]
    return out

# vector-matrix product: each row of mat holds one output's weights
def vec_mult_mat(vec, mat):
    out=[]
    for i in range(len(mat)):
        out.append( vec_dot(vec, mat[i]) )
    return out

# define the network
def neural_network(input, weights):
    pred=vec_mult_mat(input, weights)
    return pred
#

# update the weights, one output (one row) at a time
for i in range(100):
    print("---%d\nWeights:" %i, weights)
    # predict
    pred=neural_network(input, weights)
    # compute the errors
    error=[]
    for j in range(len(pred)):
        error.append( (pred[j]-goal[j])**2 )
        delta=pred[j]-goal[j]
        # update the weights of one output (one row)
        for k in range( len(weights[j]) ):
            weights_delta= delta*input[k]
            weights[j][k] -= alpha*weights_delta
    print("Error:", error)
    print("Prediction:", pred)
    
    # stop once every output's error is small enough
    # (the original reused `i` for this inner loop, shadowing the outer loop variable)
    if all(e <= 1e-6 for e in error):
        break

---0
Weights: [[0.1, 0.1, -0.3], [0.1, 0.2, 0], [0, 1.3, 0.1]]
Error: [0.20702500000000007, 0.0003999999999999963, 0.7482250000000001]
Prediction: [0.555, 0.9800000000000001, 0.9650000000000001]
---1
Weights: [[0.061325, 0.0970425, -0.30546], [0.1017, 0.20013, 0.00023999999999999887], [-0.07352500000000001, 1.2943775, 0.08962]]
Error: [0.013874042391015624, 2.680650625000059e-05, 0.05014324534726555]
Prediction: [0.217788125, 0.9948224999999999, 0.32392687499999984]
---2
Weights: [[0.051313009374999996, 0.0962768771875, -0.3068734575], [0.1021400875, 0.20016365375, 0.0003021299999999995], [-0.092558784375, 1.2929219753124999, 0.0869328775]]
Error: [0.0009297865101688144, 1.7964719433280777e-06, 0.0033604130494918946]
Prediction: [0.13049240085937502, 0.9986596746875, 0.15796906976562497]
---3
Weights: [[0.04872115530195312, 0.09607867658191406, -0.3072393663103125], [0.1022540151515625, 0.20017236586453124, 0.00031821390374999885], [-0.09748615530507812, 1.2925451763590232, 0.086237248

In [132]:
weights

[[0.05603864734299517, 0.05603864734299517, -0.34396135265700484],
 [0.10188069247517811, 0.2018806924751782, 0.0018806924751781297],
 [-0.08349032551430621, 1.216509674485694, 0.016509674485693838]]
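The same update can be written without the inner loops. A NumPy sketch, assuming the inputs, goals, alpha and initial weights of the loop cell above: all predictions are computed first, and since row j of the weights only uses delta[j], the per-element update `delta[j] * input[k]` is exactly the outer product `np.outer(delta, inp)`.

```python
import numpy as np

inp = np.array([8.5, 0.65, 1.2])          # toes, wlrec, nfans
goal = np.array([0.1, 1.0, 0.1])          # hurt, win, sad
weights = np.array([[0.1, 0.1, -0.3],     # one row per output, as in the loop above
                    [0.1, 0.2, 0.0],
                    [0.0, 1.3, 0.1]])
alpha = 0.01

for i in range(100):
    pred = weights.dot(inp)               # one prediction per output
    delta = pred - goal
    if np.all(delta ** 2 <= 1e-6):        # every output's error is small enough
        break
    weights -= alpha * np.outer(delta, inp)  # weight_deltas[j][k] = delta[j] * inp[k]

print(i, pred)
```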

### Handwritten digit recognition

In [None]:
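### MNIST handwritten digits: 28*28=784 pixels as input, 10 digit labels as output
# http://yann.lecun.com/exdb/mnist/
# For each image, the network computes a score for each of the 10 labels and picks the highest as its answer

# 784 inputs, 10 outputs
# Input preprocessing: flatten the 2-D image into a 1-D vector.


## Visualizing the weights as images.
# The weights for output 2 show a faint "2" shape: bright pixels are high weights, dark pixels are low or even negative weights.
### Likewise, reshaping the weights for 0 back into 2-D gives something that looks like a 0: the network measures how similar the input is to the weights, and the more similar, the higher the score.
### This follows from the definition of the dot product: multiply corresponding elements, then sum.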
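The shapes involved can be sketched without downloading the dataset. In the block below, `image` and `weights` are random hypothetical stand-ins (not real MNIST data or trained weights), so the guessed label is meaningless; the point is only the flatten, dot product and argmax pipeline, plus reshaping one column of weights back to 28x28 for visualization.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))                       # hypothetical stand-in for one MNIST image
weights = rng.normal(scale=0.01, size=(784, 10))   # hypothetical untrained weights: 784 inputs -> 10 labels

x = image.reshape(784)              # flatten the 2-D pixels into a 784-element input vector
scores = x.dot(weights)             # one weighted sum (score) per digit label
guess = int(np.argmax(scores))      # label with the highest score

# Restore the weights of label 2 to a 28x28 grid; with trained weights
# (e.g. shown via an image plot) this grid would look like a faint "2".
w2_image = weights[:, 2].reshape(28, 28)
print(scores.shape, guess, w2_image.shape)
```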

In [138]:
import numpy as np
np.zeros([2,3,4])

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])