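A TensorFlow-based neural network (NN) uses tensors to represent data, a computation graph to define the network, and a session to execute the graph. Training optimizes the weights (parameters) on the graph's connections to produce the model.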

### Tensor: a multi-dimensional array (list); rank: the number of dimensions of a tensor
Tensors represent arrays of rank 0 through rank n.

| Dimensions | Rank | Name   | Example                               |
| ---------- | ---- | ------ | ------------------------------------- |
| 0-D        | 0    | scalar | s = 123                               |
| 1-D        | 1    | vector | v=[1,2,3]                             |
| 2-D        | 2    | matrix | m=[[1,2,3],[4,5,6],[7,8,9]]           |
| n-D        | n    | tensor | t=[[[[.....]]]] (n levels of nesting) |
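As a rough illustration of the table above, the rank of a value can be read off as its bracket nesting depth. The sketch below uses plain Python lists instead of TensorFlow tensors; the helper names `rank` and `shape` are hypothetical, and both assume non-empty, regularly nested lists.

```python
def rank(value):
    """Return the nesting depth of a regular nested list; a bare number has rank 0."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # assumption: lists are non-empty and regularly nested
    return depth


def shape(value):
    """Return the length of each nesting level as a tuple."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0]
    return tuple(dims)


s = 123
v = [1, 2, 3]
m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

for name, value in [("s", s), ("v", v), ("m", m)]:
    print(name, "rank:", rank(value), "shape:", shape(value))
```

The scalar has rank 0 and an empty shape, the vector rank 1, and the matrix rank 2 with shape (3, 3).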


#### Data types: tf.float32, tf.int32

In [1]:
import tensorflow as tf
a = tf.constant([1.0,2.0])
b = tf.constant([3.0,4.0])

result = a+b
print(result)

Tensor("add:0", shape=(2,), dtype=float32)


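### Session
A session executes the graph. Printing `result` above shows only the tensor's description (name, shape, dtype); running it in a session produces the actual values.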

In [2]:
sess = tf.Session()
print(sess.run(result))

[4. 6.]


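### Computation graph
y = wx + b (the example below computes only the matrix product `tf.matmul(x, w)`; the bias b is omitted)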

In [3]:
x = tf.constant([[1.0,2.0]])
w= tf.constant([[3.0],[4.0]])
y = tf.matmul(x,w)
print(sess.run(y))

[[11.]]
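For comparison, here is a minimal plain-Python sketch of the full expression y = wx + b for a single neuron, using the same x and w as the cell above. The bias value `bias = 0.5` is a made-up number for illustration only; it does not appear in the notes.

```python
x = [1.0, 2.0]   # same input values as the cell above
w = [3.0, 4.0]   # same weights as the cell above
bias = 0.5       # hypothetical bias, chosen only for illustration

# Weighted sum of the inputs: this is what tf.matmul(x, w) computed above
weighted_sum = sum(xi * wi for xi, wi in zip(x, w))

# Adding the bias completes y = wx + b
y = weighted_sum + bias

print(weighted_sum)  # 11.0
print(y)             # 11.5
```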


### Forward propagation
![image.png](./img/params.png)

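### Implementing a neural network
1. Prepare the dataset and extract features to feed into the network as input.
2. Build the NN structure from input to output (build the computation graph first, then execute it with a session). NN forward propagation -> compute the output.
3. Feed large amounts of feature data into the NN and iteratively optimize its parameters. NN backpropagation -> optimize the parameters and train the model.
4. Use the trained model for prediction and classification.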

### Forward propagation

In [11]:
x = tf.constant([[0.7,0.5]])
w1 = tf.Variable(tf.random_normal([2,3], stddev=1,seed=1))
w2 = tf.Variable(tf.random_normal([3,1], stddev=1,seed=1))

# Define the forward propagation
a = tf.matmul(x,w1)
y = tf.matmul(a,w2)

# Compute the result in a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('x:',sess.run(x),sess.run(x).shape,'\n')
    print('w1:',sess.run(w1),',w1 shape',sess.run(w1).shape,'\n')
    print('w2:',sess.run(w2),',w2 shape',sess.run(w2).shape,'\n')
    print('a:',sess.run(a),', a shape:',sess.run(a).shape,'\n')
    print('y:',sess.run(y),', y shape:',sess.run(y).shape,'\n')

x: [[0.7 0.5]] (1, 2) 

w1: [[-0.8113182   1.4845988   0.06532937]
 [-2.4427042   0.0992484   0.5912243 ]] ,w1 shape (2, 3) 

w2: [[-0.8113182 ]
 [ 1.4845988 ]
 [ 0.06532937]] ,w2 shape (3, 1) 

a: [[-1.7892749   1.0888433   0.34134272]] , a shape: (1, 3) 

y: [[3.0904665]] , y shape: (1, 1) 
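The forward pass above can be checked by hand. The sketch below redoes the two matrix products in plain Python, with the weight values copied from the printed output; `matmul` here is a small hypothetical helper, not the TensorFlow function.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


x = [[0.7, 0.5]]                                  # shape (1, 2)
w1 = [[-0.8113182, 1.4845988, 0.06532937],
      [-2.4427042, 0.0992484, 0.5912243]]         # shape (2, 3), copied from the output above
w2 = [[-0.8113182], [1.4845988], [0.06532937]]    # shape (3, 1), copied from the output above

a = matmul(x, w1)   # hidden layer, shape (1, 3)
y = matmul(a, w2)   # output layer, shape (1, 1)

print("a:", a)
print("y:", y)
```

Up to floating-point rounding, `a` and `y` agree with the values printed by the session.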



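#### A simple two-layer, fully connected neural network
##### Define the input with a placeholder and feed one sample through sess.run()

In [15]:
# Define the input with a placeholder and feed one sample through sess.run()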
x = tf.placeholder(tf.float32, shape=(1,2))
w1 = tf.Variable(tf.random_normal([2,3],stddev=1,seed=1))
w2 = tf.Variable(tf.random_normal([3,1],stddev=1,seed=1))

# Define the forward propagation
a = tf.matmul(x,w1)
y = tf.matmul(a,w2)

# Compute the result in a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('y:',sess.run(y, feed_dict={x:[[0.7,0.5]]}),'\n')

y: [[3.0904665]] 



##### Define the input with a placeholder and feed several samples through sess.run()

In [16]:
# Define the input with a placeholder and feed several samples through sess.run()
x = tf.placeholder(tf.float32, shape=[None,2])
w1 = tf.Variable(tf.random_normal([2,3],stddev=1,seed=1))
w2 = tf.Variable(tf.random_normal([3,1],stddev=1,seed=1))

# Define the forward propagation
a = tf.matmul(x,w1)
y = tf.matmul(a,w2)

# Compute the result in a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('y:',sess.run(y, feed_dict={x:[[0.7,0.5],[0.2,0.3],[0.3,0.4],[0.4,0.5]]}),'\n')
    print('w1:',sess.run(w1),',w1 shape',sess.run(w1).shape,'\n')
    print('w2:',sess.run(w2),',w2 shape',sess.run(w2).shape,'\n')

y: [[3.0904665]
 [1.2236414]
 [1.7270732]
 [2.2305048]] 

w1: [[-0.8113182   1.4845988   0.06532937]
 [-2.4427042   0.0992484   0.5912243 ]] ,w1 shape (2, 3) 

w2: [[-0.8113182 ]
 [ 1.4845988 ]
 [ 0.06532937]] ,w2 shape (3, 1) 



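### Backpropagation
Backpropagation trains the model parameters: gradient descent is applied to all parameters so that the NN's loss on the training data is minimized.

**Mean squared error (MSE)** loss = tf.reduce_mean(tf.square(y-y_predict))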

**Cross-entropy**   loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(y_,1),logits=y))
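To make the two losses concrete, the following plain-Python sketch computes them on tiny made-up inputs. The function names and sample numbers are illustrative assumptions. The cross-entropy version takes an integer class index as its label, which is what `tf.argmax(y_, 1)` produces from one-hot labels.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(label_index, logits):
    """Negative log-probability that softmax assigns to the correct class."""
    return -math.log(softmax(logits)[label_index])


# Made-up sample values, for illustration only
mse_value = mse([1.0, 0.0], [0.8, 0.3])
ce_value = cross_entropy(0, [2.0, 1.0, 0.1])

print("MSE:", mse_value)
print("cross-entropy:", ce_value)
```

Both losses shrink as the prediction approaches the target: the MSE through smaller differences, the cross-entropy through a higher probability on the correct class.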

**Backpropagation training method, with reducing the loss value as the optimization objective**

train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
#### Example code

In [24]:
import tensorflow as tf
import numpy as np

batch_size = 8

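# Random number generator with a fixed seed
rng = np.random.RandomState(2019)
# Draw a (32, 2) matrix of standard-normal random numbers
X = rng.randn(32,2)
# For each row of X: label Y=1 if x0+x1 < 1, otherwise Y=0
Y = [[int(x0+x1 < 1)] for (x0,x1) in X]

# Define the network's inputs, parameters and output, and the forward propagation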
x = tf.placeholder(tf.float32, shape=[None, 2])
y_ = tf.placeholder(tf.float32, shape=[None, 1])

w1 = tf.Variable(tf.random_normal([2,3], stddev=1, seed=1))
w2 = tf.Variable(tf.random_normal([3,1], stddev=1, seed=1))

a = tf.matmul(x,w1)
y = tf.matmul(a,w2)

# Define the loss and the backpropagation method
loss = tf.reduce_mean(tf.square(y-y_))
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)

# Run the computation in a session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('Parameters before training:\n')
    print('w1:',sess.run(w1),',w1 shape',sess.run(w1).shape,'\n')
    print('w2:',sess.run(w2),',w2 shape',sess.run(w2).shape,'\n')
    print('Training the model...')
    for i in range(3001):
        start = (i*batch_size) % 32
        end = start + batch_size
        sess.run(train_step, feed_dict={x:X[start:end], y_:Y[start:end]})
        if i%500==0:
            total_loss = sess.run(loss, feed_dict={x:X, y_:Y})
            print('After %d training step(s), loss on all data is %g:' %(i, total_loss))

    print('\nParameters after training:\n')
    print('w1:',sess.run(w1),',w1 shape',sess.run(w1).shape,'\n')
    print('w2:',sess.run(w2),',w2 shape',sess.run(w2).shape,'\n')

Parameters before training:

w1: [[-0.8113182   1.4845988   0.06532937]
 [-2.4427042   0.0992484   0.5912243 ]] ,w1 shape (2, 3) 

w2: [[-0.8113182 ]
 [ 1.4845988 ]
 [ 0.06532937]] ,w2 shape (3, 1) 

Training the model...
After 0 training step(s), loss on all data is 17.7311:
After 500 training step(s), loss on all data is 3.64612:
After 1000 training step(s), loss on all data is 0.844747:
After 1500 training step(s), loss on all data is 0.696087:
After 2000 training step(s), loss on all data is 0.693902:
After 2500 training step(s), loss on all data is 0.693887:
After 3000 training step(s), loss on all data is 0.693887:

Parameters after training:

w1: [[-0.31591475  0.92382187  1.2035394 ]
 [-2.1698406  -0.19345604  1.0057589 ]] ,w1 shape (2, 3) 

w2: [[-0.38026863]
 [ 0.9284347 ]
 [-0.9417927 ]] ,w2 shape (3, 1) 
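The training loop above walks through the 32 samples in mini-batches of 8 and wraps around with the modulo operator. A small sketch of just that indexing logic follows; the helper name `batch_bounds` is hypothetical.

```python
batch_size = 8
num_samples = 32


def batch_bounds(step):
    """Return the [start, end) slice indices used at a given training step."""
    start = (step * batch_size) % num_samples
    end = start + batch_size
    return start, end


for step in range(6):
    print(step, batch_bounds(step))
# 0 (0, 8)
# 1 (8, 16)
# 2 (16, 24)
# 3 (24, 32)
# 4 (0, 8)
# 5 (8, 16)
```

Every four steps the loop has seen the whole dataset once and starts over from the first sample.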



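### Learning rate

Learning rate too large: the loss oscillates and does not converge.

Learning rate too small: convergence is slow.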
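Both failure modes can be seen on a toy problem. The sketch below minimizes loss = (w + 1)^2, starting from w = 5, with plain gradient descent. This is a simplification chosen for illustration (the notebook cells use `AdamOptimizer`), and the three learning rates are arbitrary.

```python
def descend(learning_rate, steps=40, w=5.0):
    """Run plain gradient descent on loss = (w + 1)^2 and return the final w."""
    for _ in range(steps):
        grad = 2 * (w + 1)          # derivative of (w + 1)^2
        w -= learning_rate * grad
    return w


for lr in (1.0, 0.2, 0.001):
    print("learning rate", lr, "-> w after 40 steps:", descend(lr))
```

With a learning rate of 1.0, w jumps back and forth between 5 and -7 and never approaches the minimum at w = -1. With 0.2 it converges to about -1. With 0.001 it is still above 4.5 after 40 steps.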

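**Exponentially decaying learning rate**

learning_rate = learning_rate_base (initial learning rate) * learning_rate_decay (decay rate) ^ (global_step / learning_rate_step), where global_step counts the batches run so far and learning_rate_step is the number of batches between learning-rate updates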
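The decay schedule is base * decay_rate ^ (global_step / decay_steps); with `staircase=True` the exponent is truncated to an integer, so the rate drops in discrete steps. A plain-Python sketch of that formula follows; the function below is an illustrative re-implementation, not the TensorFlow code itself.

```python
def exponential_decay(base, decay_rate, global_step, decay_steps, staircase=False):
    """Decayed learning rate: base * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        exponent = global_step // decay_steps   # integer division: piecewise-constant rate
    else:
        exponent = global_step / decay_steps    # smooth decay
    return base * decay_rate ** exponent


# Same settings as the notebook cell below: base 0.1, decay 0.99, update every step
for step in (1, 2, 3):
    print(step, round(exponential_decay(0.1, 0.99, step, 1, staircase=True), 6))
# 1 0.099
# 2 0.09801
# 3 0.09703
```

With `decay_steps=1` the smooth and staircase variants coincide; they differ only when several batches pass between updates.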

In [None]:
tf.train.exponential_decay(
    learning_rate,
    global_step,
    decay_steps,
    decay_rate,
    staircase=False,
    name=None,
)

In [27]:
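LEARNING_RATE_BASE = 0.1 # initial learning rate
LEARNING_RATE_DECAY = 0.99 # learning-rate decay factor
LEARNING_RATE_STEP = 1 # number of batches fed between learning-rate updates; usually total number of samples / batch_size

# Counter of how many batches have been run; starts at 0 and is not trainable
global_step = tf.Variable(0, trainable=False)
# Define the exponentially decaying learning rate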
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE,
                                          global_step,
                                          LEARNING_RATE_STEP,
                                          LEARNING_RATE_DECAY,
                                          staircase=True,
                                          name='learning_rate')

# Define the parameter to optimize, with initial value 5
w = tf.Variable(tf.constant(5,tf.float32), name='weights')
# Define the loss
loss = tf.square(w+1)
# Define the backpropagation method
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

# Create a session and train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('Training the model...')
    for i in range(41):
        sess.run(train_step)
        w_val = sess.run(w)
        learning_rate_val = sess.run(learning_rate)
        global_step_val = sess.run(global_step)
        loss_val = sess.run(loss)
        print('After %d step(s), global_step is %f, w is %f, learning_rate is %f, loss is %f' %(i, global_step_val,w_val,learning_rate_val,loss_val))

Training the model...
After 0 step(s), global_step is 1.000000, w is 4.900000, learning_rate is 0.099000, loss is 34.810001
After 1 step(s), global_step is 2.000000, w is 4.801047, learning_rate is 0.098010, loss is 33.652149
After 2 step(s), global_step is 3.000000, w is 4.703162, learning_rate is 0.097030, loss is 32.526058
After 3 step(s), global_step is 4.000000, w is 4.606363, learning_rate is 0.096060, loss is 31.431309
After 4 step(s), global_step is 5.000000, w is 4.510670, learning_rate is 0.095099, loss is 30.367481
After 5 step(s), global_step is 6.000000, w is 4.416100, learning_rate is 0.094148, loss is 29.334135
After 6 step(s), global_step is 7.000000, w is 4.322670, learning_rate is 0.093207, loss is 28.330811
After 7 step(s), global_step is 8.000000, w is 4.230396, learning_rate is 0.092274, loss is 27.357040
After 8 step(s), global_step is 9.000000, w is 4.139294, learning_rate is 0.091352, loss is 26.412340
After 9 step(s), global_step is 10.000000, w is 4.049376, learning_rate is

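### Moving average
A moving average records the average of each parameter's past values over a period of time, which improves the model's generalization.

It applies to all parameters w and b. It is like giving each parameter a shadow: when the parameter changes, the shadow follows slowly.

decay = min{MOVING_AVERAGE_DECAY, (1 + step) / (10 + step)}

shadow = decay * shadow + (1 - decay) * parameter, with the shadow initialized to the parameter's initial value

Example: MOVING_AVERAGE_DECAY=0.99, w1=0, global_step=0, so the moving average of w1 is 0.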

When w1 is updated to 1:
- moving average of w1 = min(0.99, 1/10) * 0 + (1 - min(0.99, 1/10)) * 1 = 0.9

When w1 is updated to 10 and global_step is updated to 100:
- moving average of w1 = min(0.99, 101/110) * 0.9 + (1 - min(0.99, 101/110)) * 10 = 1.64
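The worked numbers above can be reproduced with a few lines of plain Python. The helper `ema_update` is a hypothetical name; it applies the decay and shadow formulas from this section directly.

```python
def ema_update(shadow, value, base_decay, step):
    """One moving-average update with the step-dependent decay described above."""
    decay = min(base_decay, (1 + step) / (10 + step))
    return decay * shadow + (1 - decay) * value


MOVING_AVERAGE_DECAY = 0.99

# Worked example from the text: the shadow starts at w1's initial value, 0
shadow = 0.0
shadow = ema_update(shadow, 1.0, MOVING_AVERAGE_DECAY, step=0)
print(round(shadow, 2))   # 0.9
shadow = ema_update(shadow, 10.0, MOVING_AVERAGE_DECAY, step=100)
print(round(shadow, 2))   # 1.64

# Same updates with an initial value of 5, as in the notebook cell further down
shadow_from_5 = 5.0
shadow_from_5 = ema_update(shadow_from_5, 1.0, MOVING_AVERAGE_DECAY, step=0)
print(round(shadow_from_5, 4))   # 1.4
shadow_from_5 = ema_update(shadow_from_5, 10.0, MOVING_AVERAGE_DECAY, step=100)
print(round(shadow_from_5, 4))   # 2.1036
```

Early on the decay is small (1/10 at step 0), so the shadow tracks the parameter closely; as the step count grows the decay approaches 0.99 and the shadow moves more slowly.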

In [None]:
tf.train.ExponentialMovingAverage(
    decay,
    num_updates=None,
    zero_debias=False,
    name='ExponentialMovingAverage',
)

ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,global_step)
ema_op = ema.apply(tf.trainable_variables()) # each run of this op updates the moving averages of all trainable parameters
with tf.control_dependencies([train_step, ema_op]):
    train_op = tf.no_op(name='train')

#ema.average(variable) # look up the moving average of a given variable

In [29]:
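MOVING_AVERAGE_DECAY = 0.99 # moving-average decay rate

# Counter of how many batches have been run; starts at 0 and is not trainable
global_step = tf.Variable(0, trainable=False)
# Define the parameter to optimize, with initial value 5
w = tf.Variable(tf.constant(5,tf.float32), name='weights')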

# Instantiate the moving-average class with decay rate 0.99 and the current step count global_step
ema = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY,global_step)
# The argument to ema.apply is the update list; each sess.run(ema_op) updates the moving average of every element in it
# In practice, tf.trainable_variables() is used to collect all trainable parameters into a list
# ema_op = ema.apply([w])
ema_op = ema.apply(tf.trainable_variables())

# Inspect how the variable and its moving average change across iterations
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('Initial w=5:',sess.run([w,ema.average(w)]),'\n')

    # Assign 1 to the parameter w
    print('w=1:')
    sess.run(tf.assign(w,1))
    sess.run(ema_op)
    print(sess.run([w,ema.average(w)]),'\n')

    # Update global_step and w to simulate w=10 after 100 steps
    print('w=10,global_step=100:')
    sess.run(tf.assign(global_step,100))
    sess.run(tf.assign(w,10))
    sess.run(ema_op)
    print(sess.run([w,ema.average(w)]),'\n')

    sess.run(ema_op)
    print(sess.run([w,ema.average(w)]),'\n')

    sess.run(ema_op)
    print(sess.run([w,ema.average(w)]),'\n')

    sess.run(ema_op)
    print(sess.run([w,ema.average(w)]),'\n')

    sess.run(ema_op)
    print(sess.run([w,ema.average(w)]),'\n')

Initial w=5: [5.0, 5.0] 

w=1:
[1.0, 1.4000001] 

w=10,global_step=100:
[10.0, 2.1036363] 

[10.0, 2.7497022] 

[10.0, 3.3429084] 

[10.0, 3.8875794] 

[10.0, 4.3876867] 

