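# Lecture 2: Neural Network Optimization

## 1 Neural Network Complexity

### 1.1 Time Complexity

### 1.2 Space Complexity

## 2 Learning Rate Strategies

### 2.1 Exponential Decay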

TensorFlow API: tf.keras.optimizers.schedules.ExponentialDecay
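
The schedule named above computes `initial_lr * decay_rate ** (step / decay_steps)`; with `staircase=True` the exponent is floored, so the rate drops in discrete jumps. The following is a minimal pure-Python sketch of that formula, not the library implementation. The helper name `exponential_decay` and the sample values are assumptions chosen for illustration.

```python
import math

def exponential_decay(initial_lr, decay_rate, decay_steps, step, staircase=False):
    """Hypothetical helper: initial_lr * decay_rate ** (step / decay_steps)."""
    exponent = step / decay_steps
    if staircase:
        # Floor the exponent so the rate only changes every `decay_steps` steps.
        exponent = math.floor(exponent)
    return initial_lr * decay_rate ** exponent

steps = list(range(0, 31, 5))
smooth = [exponential_decay(0.5, 0.9, 10, s) for s in steps]
stepped = [exponential_decay(0.5, 0.9, 10, s, staircase=True) for s in steps]
for s, a, b in zip(steps, smooth, stepped):
    print(f"step {s:2d}: smooth={a:.4f}  staircase={b:.4f}")
```

The smooth variant decays a little at every step, while the staircase variant holds each value for `decay_steps` steps.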

### 2.2 Piecewise Constant Decay

TensorFlow API: tf.keras.optimizers.schedules.PiecewiseConstantDecay
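
A piecewise constant schedule holds the learning rate fixed within each interval of training steps and switches value at the interval boundaries. The sketch below is a pure-Python illustration of that idea, not the TensorFlow implementation. It assumes the convention that `values[i]` applies while `step <= boundaries[i]`. The helper name `piecewise_constant` and the sample boundaries and values are hypothetical.

```python
import bisect

def piecewise_constant(boundaries, values, step):
    """Hypothetical helper: pick the learning rate for `step`.

    `boundaries` must be sorted in ascending order, and `values` must
    contain exactly one more element than `boundaries`.
    """
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have one more element than boundaries")
    # bisect_left counts the boundaries strictly below `step`, so a step
    # equal to a boundary still uses the earlier value.
    return values[bisect.bisect_left(boundaries, step)]

boundaries = [20, 60]           # illustrative step boundaries
values = [0.2, 0.1, 0.05]       # illustrative learning rates
for step in (0, 20, 21, 60, 61):
    print(step, piecewise_constant(boundaries, values, step))
```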

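## 3 Activation Functions

### 3.1 sigmoid

### 3.2 tanh

### 3.3 ReLU

### 3.4 Leaky ReLU

### 3.5 softmax

### 3.6 Recommendations

## 4 Loss Functions

### 4.1 Mean Squared Error Loss

### 4.2 Cross-Entropy Loss

### 4.3 Custom Loss Functions

## 5 Underfitting and Overfitting

## 6 Optimizers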

### 6.1 SGD

#### 6.1.1 vanilla SGD

#### 6.1.2 SGD with Momentum
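
As a rough illustration, momentum can be sketched on the toy loss `(w + 1)^2` used in the appendix. This sketch assumes one common formulation, in which the momentum term is an exponential moving average of the gradients (`m = beta * m + (1 - beta) * g`); other texts use a slightly different scaling. The function names and the hyperparameters `lr`, `beta`, and `steps` are illustrative assumptions.

```python
def grad(w):
    # Derivative of the toy loss (w + 1)^2 with respect to w.
    return 2.0 * (w + 1.0)

def sgd_momentum(w, lr=0.1, beta=0.9, steps=300):
    """Hypothetical helper: gradient descent with first-order momentum."""
    m = 0.0  # running average of past gradients
    for _ in range(steps):
        g = grad(w)
        m = beta * m + (1.0 - beta) * g  # blend the new gradient into the momentum
        w = w - lr * m                   # step along the smoothed direction
    return w

w_final = sgd_momentum(5.0)
print("w after training:", w_final)  # approaches the minimizer w = -1
```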

#### 6.1.3 SGD with Nesterov Acceleration

### 6.2 AdaGrad

### 6.3 RMSProp

### 6.4 AdaDelta

### 6.5 Adam
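
The sketch below applies the standard Adam update (first and second moment estimates with bias correction) to the toy loss `(w + 1)^2` used in the appendix. It is a minimal scalar illustration rather than a library implementation. The function name `adam_minimize` and the hyperparameter values are assumptions chosen for the example.

```python
import math

def adam_minimize(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Hypothetical helper: minimize (w + 1)^2 with scalar Adam updates."""
    m = 0.0  # first moment: moving average of gradients
    v = 0.0  # second moment: moving average of squared gradients
    for t in range(1, steps + 1):
        g = 2.0 * (w + 1.0)                   # gradient of the toy loss
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)        # bias correction for the zero initialization
        v_hat = v / (1.0 - beta2 ** t)
        w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_adam = adam_minimize(5.0)
print("w after training:", w_adam)  # approaches the minimizer w = -1
```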

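### 6.6 Choosing an Optimizer

### 6.7 Common Tricks for Optimization Algorithms

## References

## Appendix: Common TensorFlow APIs and Code Implementations

### Learning Rate Strategies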

In [16]:
import tensorflow as tf
w = tf.Variable(tf.constant(5, dtype=tf.float32))

EPOCHS = 100
LR_BASE = 0.2  # initial learning rate
LR_DECAY = 0.99  # learning-rate decay rate
LR_STEP = 1  # number of batches fed before the learning rate is updated
# Outer loop over the dataset. In this example the "dataset" is the single
# variable w, initialized to 5, and the loop runs for 100 iterations.
for epoch in range(EPOCHS):
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)
    with tf.GradientTape() as tape:  # the with block records the computation to differentiate
        loss = tf.square(w + 1)
    grads = tape.gradient(loss, w)  # gradient of loss with respect to w

    w.assign_sub(lr * grads)  # in-place subtraction: w = w - lr * grads
    print("After %s epoch ,w is %f, loss is %f, lr is %f" % (epoch, w.numpy(), loss, lr))

After 0 epoch ,w is 2.600000, loss is 36.000000, lr is 0.200000
After 1 epoch ,w is 1.174400, loss is 12.959999, lr is 0.198000
After 2 epoch ,w is 0.321948, loss is 4.728015, lr is 0.196020
After 3 epoch ,w is -0.191126, loss is 1.747547, lr is 0.194060
After 4 epoch ,w is -0.501926, loss is 0.654277, lr is 0.192119
After 5 epoch ,w is -0.691392, loss is 0.248077, lr is 0.190198
After 6 epoch ,w is -0.807611, loss is 0.095239, lr is 0.188296
After 7 epoch ,w is -0.879339, loss is 0.037014, lr is 0.186413
After 8 epoch ,w is -0.923874, loss is 0.014559, lr is 0.184549
After 9 epoch ,w is -0.951691, loss is 0.005795, lr is 0.182703
After 10 epoch ,w is -0.969167, loss is 0.002334, lr is 0.180876
After 11 epoch ,w is -0.980209, loss is 0.000951, lr is 0.179068
After 12 epoch ,w is -0.987226, loss is 0.000392, lr is 0.177277
After 13 epoch ,w is -0.991710, loss is 0.000163, lr is 0.175504
After 14 epoch ,w is -0.994591, loss is 0.000069, lr is 0.173749
After 15 epoch ,w is -0.996452, loss

In [1]:
import tensorflow as tf
import matplotlib.pyplot as plt 

N = 100
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    0.5,
    decay_steps=10,
    decay_rate=0.9,
    staircase=False)
y = []
for global_step in range(N):
    lr = lr_schedule(global_step)
    y.append(lr)
x = range(N)
plt.figure(figsize=(8,6))
plt.plot(x, y, 'r-')
plt.ylim([0,max(plt.ylim())])
plt.xlabel('Step')
plt.ylabel('Learning Rate')
plt.title('ExponentialDecay')
plt.show()


### Activation Functions
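
The activation functions listed in Section 3 can be sketched in NumPy as follows. This is an illustrative reference, not the TensorFlow implementation; in TensorFlow the corresponding calls are `tf.nn.sigmoid`, `tf.math.tanh`, `tf.nn.relu`, `tf.nn.leaky_relu`, and `tf.nn.softmax`. The slope `alpha=0.2` and the sample input are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.2):
    # Negative inputs keep a small slope `alpha` instead of being zeroed.
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    # Subtract the maximum first for numerical stability.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

x = np.array([-2.0, 0.0, 3.0])
print("sigmoid:   ", sigmoid(x))
print("tanh:      ", tanh(x))
print("relu:      ", relu(x))
print("leaky_relu:", leaky_relu(x))
print("softmax:   ", softmax(x))
```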

### Loss Functions
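
The loss functions listed in Section 4 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the TensorFlow implementation. The asymmetric loss in particular is a hypothetical example of a custom loss, in which under-prediction and over-prediction are penalized differently; its `cost` and `profit` weights are made up for the example.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return np.mean(np.square(y_true - y_pred))

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

def asymmetric_loss(y_true, y_pred, cost=1.0, profit=99.0):
    """Hypothetical custom loss: over-prediction costs `cost` per unit,
    under-prediction costs `profit` per unit."""
    diff = y_pred - y_true
    return np.sum(np.where(diff > 0, cost * diff, profit * (-diff)))

print(mse(np.array([1.0, 2.0]), np.array([1.0, 4.0])))
print(categorical_cross_entropy(np.array([1.0, 0.0]), np.array([0.6, 0.4])))
print(asymmetric_loss(np.array([10.0, 10.0]), np.array([12.0, 9.0])))
```

The element-wise selection in `asymmetric_loss` follows the same pattern as `tf.where`, which is covered later in this appendix.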

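### Miscellaneous

#### tf.cast
Casts a tensor to a different data type. Casting floating-point values to an integer type discards the fractional part.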

In [2]:
import tensorflow as tf
x = tf.constant([1.8, 2.2], dtype=tf.float32)
print(tf.cast(x, tf.int32))

tf.Tensor([1 2], shape=(2,), dtype=int32)


#### tf.random.normal
Generates random values drawn from a normal distribution (mean 0 and standard deviation 1 by default).

In [3]:
import tensorflow as tf
tf.random.normal([3, 5])

<tf.Tensor: shape=(3, 5), dtype=float32, numpy=
array([[-0.06236406,  0.17700054,  1.7307663 ,  0.99779886, -0.94573677],
       [ 0.5334611 , -0.11980411, -1.006372  , -1.3327197 , -1.8125427 ],
       [ 0.22657698, -0.22983427, -0.8714001 ,  0.6743138 ,  1.4302018 ]],
      dtype=float32)>

#### tf.where
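Selects elements from x or y according to condition: where the condition is True, the element at that position is taken from x; where it is False, it is taken from y.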

In [4]:
import tensorflow as tf
tf.where([True, False, True, False], [1, 2, 3, 4], [5, 6, 7, 8])

<tf.Tensor: shape=(4,), dtype=int32, numpy=array([1, 6, 3, 8])>

In [7]:
a = tf.constant([1, 2, 3 ,1, 1])
b = tf.constant([0, 1, 3, 4, 5])
c = tf.where(tf.greater(a, b), a, b)
c

<tf.Tensor: shape=(5,), dtype=int32, numpy=array([1, 2, 3, 4, 5])>

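#### np.random.RandomState
Generates reproducible pseudo-random numbers from a fixed seed. `rand()` returns samples drawn uniformly from [0, 1).

In [8]: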
import numpy as np
rdm = np.random.RandomState(seed=1)
a = rdm.rand()
b = rdm.rand(2, 3)
print("a:", a)
print("b:", b)

a: 0.417022004702574
b: [[7.20324493e-01 1.14374817e-04 3.02332573e-01]
 [1.46755891e-01 9.23385948e-02 1.86260211e-01]]


#### vstack
Stacks arrays vertically (row-wise).

In [11]:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.vstack((a, b))
print("c:", c)

c: [[1 2 3]
 [4 5 6]]


#### mgrid
Returns dense coordinate grids. Each `start:stop:step` slice excludes `stop`.

In [12]:
import numpy as np
import tensorflow as tf

# Generate evenly spaced grid points
x, y = np.mgrid[1:3:1, 2:4:0.5]
# Flatten x and y, then pair them column-wise into a 2-D array of coordinate points
grid = np.c_[x.ravel(), y.ravel()]
print("x:", x)
print("y:", y)
print("x.ravel():", x.ravel())
print("y.ravel():", y.ravel())
print("grid:", grid)

x: [[1. 1. 1. 1.]
 [2. 2. 2. 2.]]
y: [[2.  2.5 3.  3.5]
 [2.  2.5 3.  3.5]]
x.ravel(): [1. 1. 1. 1. 2. 2. 2. 2.]
y.ravel(): [2.  2.5 3.  3.5 2.  2.5 3.  3.5]
grid: [[1.  2. ]
 [1.  2.5]
 [1.  3. ]
 [1.  3.5]
 [2.  2. ]
 [2.  2.5]
 [2.  3. ]
 [2.  3.5]]
