# 
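Goal: learn how neural network optimization works, use regularization to reduce overfitting, and use optimizers to update network parameters.
+ Basics
+ Neural network complexity
+ Exponentially decaying learning rate
+ Activation functions
+ Loss functions
+ Underfitting and overfitting
+ Regularization to reduce overfitting
+ Optimizers for updating network parameters
  + SGD
  + SGDM
  + Adagrad
  + RMSProp
  + Adam (more computation per step, but typically converges in fewer epochs)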

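|Function|Purpose|
| ---- | ---- |
|`tf.where(condition, A, B)`|Element-wise selection: takes the element from A where the condition is true, otherwise from B|
|`np.random.RandomState.rand(d0, d1, ...)`|Returns random numbers drawn uniformly from [0, 1), with the given shape|
|`np.vstack((array1, array2))`|Stacks the two arrays vertically (row-wise)|
|`np.mgrid[start1:stop1:step1, start2:stop2:step2, ...]`|**In the 2-D case, the two returned matrices have the same shape. The first varies along axis 0 (each row is constant) and the second varies along axis 1 (each column is constant). The stop value is excluded.**|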


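## Basics
### tf.where()
Element-wise selection: where the condition is true the result takes the element from A, otherwise the element from B.
`tf.where(condition, A, B)`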


```python
import tensorflow as tf

a = tf.constant([1, 2, 3, 1, 1])
b = tf.constant([0, 1, 2, 4, 5])
# tf.greater(a, b) compares a and b element by element
c = tf.where(tf.greater(a, b), a, b)
print(c)
```

```
tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)
```
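
NumPy's `np.where` follows the same element-wise selection rule, so it can serve as a quick way to check the logic without TensorFlow. The sketch below mirrors the example above; the scalar case at the end is an added illustration and is not part of the original notes.

```python
import numpy as np

a = np.array([1, 2, 3, 1, 1])
b = np.array([0, 1, 2, 4, 5])

# Take the element from a where a > b, otherwise the element from b
c = np.where(np.greater(a, b), a, b)
print(c)

# A or B may also be a scalar, which is broadcast against the condition
d = np.where(a > 1, a, 0)
print(d)
```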


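### np.random.RandomState.rand()
Returns random numbers drawn uniformly from [0, 1). With no arguments it returns a scalar; with dimensions it returns an array of that shape.
`np.random.RandomState.rand(d0, d1, ...)`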


```python
import numpy as np
# With a fixed seed, the same random numbers are generated on every run
rdm = np.random.RandomState(seed=1)
a = rdm.rand()      # a random scalar
b = rdm.rand(2, 3)  # a random 2x3 matrix
print(f'a:{a}\n')
print(f'b:{b}')
```

```
a:0.417022004702574

b:[[7.20324493e-01 1.14374817e-04 3.02332573e-01]
 [1.46755891e-01 9.23385948e-02 1.86260211e-01]]
```
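
To make the effect of the seed concrete, the following minimal sketch creates two generators with the same seed and checks that they produce identical draws, then checks that a larger sample stays inside [0, 1). The variable names and the sample size of 1000 are illustrative choices.

```python
import numpy as np

# Two independent generators built from the same seed
rdm1 = np.random.RandomState(seed=1)
rdm2 = np.random.RandomState(seed=1)

# Same seed and same sequence of calls, so the draws match exactly
same = np.array_equal(rdm1.rand(2, 3), rdm2.rand(2, 3))

# Every value falls in the half-open interval [0, 1)
x = np.random.RandomState(seed=2).rand(1000)
in_range = bool(np.all((x >= 0) & (x < 1)))

print(same, in_range)
```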


### np.vstack()
Stacks arrays vertically (row-wise). The arrays are passed together as a single tuple.
`np.vstack((array1, array2))`

```python
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.vstack((a, b))
print(f'c is {c}')
```

```
c is [[1 2 3]
 [4 5 6]]
```


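### np.mgrid[], .ravel(), np.c_[]
`np.mgrid[start1:stop1:step1, start2:stop2:step2, ...]`
**In the 2-D case, the two returned matrices have the same shape. The first varies along axis 0 (each row is constant) and the second varies along axis 1 (each column is constant). The stop value is excluded.**
`.ravel()` flattens an array to 1-D, and `np.c_[]` joins 1-D arrays as columns, so together they turn the grid into a list of coordinate pairs.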

```python
a = np.mgrid[1:6:1, 1:6:2]  # shape (2, 5, 3): two 5x3 grids stacked together
x, y = np.mgrid[1:3:1, 2:4:0.5]
print(f'x is \n{x}\n')
print(f'y is \n{y}\n')
print(f'x.ravel() is \n {x.ravel()}')
print(f'y.ravel() is \n {y.ravel()}')
print(f'np.c_[x.ravel(), y.ravel()] is\n {np.c_[x.ravel(), y.ravel()]}')
```

```
x is 
[[1. 1. 1. 1.]
 [2. 2. 2. 2.]]

y is 
[[2.  2.5 3.  3.5]
 [2.  2.5 3.  3.5]]

x.ravel() is 
 [1. 1. 1. 1. 2. 2. 2. 2.]
y.ravel() is 
 [2.  2.5 3.  3.5 2.  2.5 3.  3.5]
np.c_[x.ravel(), y.ravel()] is
 [[1.  2. ]
 [1.  2.5]
 [1.  3. ]
 [1.  3.5]
 [2.  2. ]
 [2.  2.5]
 [2.  3. ]
 [2.  3.5]]
```
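
One common use of this combination is to evaluate a function at every point of a grid, for example before drawing a contour plot. The sketch below is an illustration under assumptions: the range, the step, and the function `f` are hypothetical stand-ins (in practice `f` might be a trained model's prediction).

```python
import numpy as np

# Illustrative grid: 5 values per axis, from -1 to 1 in steps of 0.5
xx, yy = np.mgrid[-1:1.5:0.5, -1:1.5:0.5]

# Flatten both grids and pair them up: one (x, y) coordinate per row
grid = np.c_[xx.ravel(), yy.ravel()]

def f(points):
    """Hypothetical function to evaluate; here the squared distance from the origin."""
    return points[:, 0] ** 2 + points[:, 1] ** 2

# Evaluate on all points at once, then restore the 2-D grid layout
values = f(grid).reshape(xx.shape)
print(grid.shape, values.shape)
```

Reshaping back to `xx.shape` keeps each value aligned with its grid position, which is the layout that plotting functions such as contour plots typically expect.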


![NN complexity](https://cdn.staticaly.com/gh/SisyphusTang/Picture-bed@master/20230826/NN复杂度.4p7scu3y93c0.webp)

![Exponentially decaying learning rate](https://cdn.staticaly.com/gh/SisyphusTang/Picture-bed@master/20230826/指数衰减学习率.3qh72hlvep40.webp)
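
A minimal sketch of an exponentially decaying learning rate, assuming the commonly used form `lr = lr_base * decay_rate ** (epoch / decay_steps)`. The constant names and values below are illustrative assumptions, not taken from the notes.

```python
# Illustrative hyperparameters (assumptions for this sketch)
LR_BASE = 0.2    # initial learning rate
LR_DECAY = 0.99  # decay rate applied every LR_STEP epochs
LR_STEP = 1      # number of epochs between decay steps

def decayed_lr(epoch, lr_base=LR_BASE, decay_rate=LR_DECAY, decay_steps=LR_STEP):
    """Learning rate for the given epoch under exponential decay."""
    return lr_base * decay_rate ** (epoch / decay_steps)

lrs = [decayed_lr(epoch) for epoch in range(5)]
for epoch, lr in enumerate(lrs):
    print(f'epoch {epoch}: lr = {lr:.6f}')
```

The idea is to start with a relatively large learning rate to make fast progress, then shrink it so that later updates are smaller and training can settle.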

![Parameter optimizers](https://cdn.staticaly.com/gh/SisyphusTang/Picture-bed@master/20230826/参数优化器.53m04ubwz040.webp)

![SGD](https://cdn.staticaly.com/gh/SisyphusTang/Picture-bed@master/20230826/SGD.b6mw8gf0qz4.webp)

```python
# SGD
w1.assign_sub(lr * grads[0])
b1.assign_sub(lr * grads[1])
```

![SGDM](https://cdn.staticaly.com/gh/SisyphusTang/Picture-bed@master/20230826/SGDM.5um4zy6nwvo0.webp)

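```python
# SGDM: m_t = β * m_(t-1) + (1 - β) * g_t

# Initialize the first-order momentum once, before the training loop
m_w, m_b = 0, 0
beta = 0.9

# Inside the training loop, after computing grads:
m_w = beta * m_w + (1 - beta) * grads[0]
m_b = beta * m_b + (1 - beta) * grads[1]

w1.assign_sub(lr * m_w)
b1.assign_sub(lr * m_b)
```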

![Adagrad](https://cdn.staticaly.com/gh/SisyphusTang/Picture-bed@master/20230826/Adagrad.4plh2jamkf40.webp)

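```python
# Adagrad
# Initialize the accumulated squared gradients once, before the training loop
v_w, v_b = 0, 0

# Inside the training loop, after computing grads:
v_w += tf.square(grads[0])
v_b += tf.square(grads[1])
# Scale each step by the root of the accumulated squared gradients
w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))
```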

![RMSProp](https://cdn.staticaly.com/gh/SisyphusTang/Picture-bed@master/20230826/RMSProp.5y4db4vpblo0.webp)

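```python
# RMSProp
# Initialize the second-order momentum once, before the training loop
v_w, v_b = 0, 0
beta = 0.9

# Inside the training loop, after computing grads:
# v is an exponential moving average of the squared gradients
v_w = beta * v_w + (1 - beta) * tf.square(grads[0])
v_b = beta * v_b + (1 - beta) * tf.square(grads[1])
w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))
```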

![Adam](https://cdn.staticaly.com/gh/SisyphusTang/Picture-bed@master/20230828/Adam.64ymm4c20mw0.webp)

```python
# Adam
# Initialize once, before the training loop
m_w, m_b = 0, 0
v_w, v_b = 0, 0
beta1, beta2 = 0.9, 0.999
global_step = 0

# Inside the training loop, after computing grads:
# count steps from 1, otherwise 1 - beta ** 0 == 0 in the bias corrections
global_step += 1

m_w = beta1 * m_w + (1 - beta1) * grads[0]
m_b = beta1 * m_b + (1 - beta1) * grads[1]
v_w = beta2 * v_w + (1 - beta2) * tf.square(grads[0])
v_b = beta2 * v_b + (1 - beta2) * tf.square(grads[1])

# Bias correction for the zero-initialized moving averages
m_w_correction = m_w / (1 - tf.pow(beta1, int(global_step)))
m_b_correction = m_b / (1 - tf.pow(beta1, int(global_step)))
v_w_correction = v_w / (1 - tf.pow(beta2, int(global_step)))
v_b_correction = v_b / (1 - tf.pow(beta2, int(global_step)))

w1.assign_sub(lr * m_w_correction / tf.sqrt(v_w_correction))
b1.assign_sub(lr * m_b_correction / tf.sqrt(v_b_correction))
```
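
The five update rules can be compared side by side on a toy problem. The following is a sketch under assumptions, written in plain NumPy so that it runs without TensorFlow: the data (`y = 2x + 1`), the `train` helper, the per-optimizer learning rates, the step count, and the small `eps` added to the denominators for numerical stability are all illustrative choices and do not come from the notes.

```python
import numpy as np

# Toy data for fitting y = w * x + b, with true values w = 2 and b = 1
x = np.linspace(-1.0, 1.0, 20)
y = 2.0 * x + 1.0

def gradients(w, b):
    """Gradients of the mean squared error with respect to w and b."""
    err = w * x + b - y
    return np.array([2.0 * np.mean(err * x), 2.0 * np.mean(err)])

def train(optimizer, lr, steps=2000, beta1=0.9, beta2=0.999, eps=1e-7):
    params = np.zeros(2)  # [w, b]
    m = np.zeros(2)       # first-order momentum
    v = np.zeros(2)       # second-order momentum
    for t in range(1, steps + 1):  # t starts at 1 for Adam's bias correction
        g = gradients(*params)
        if optimizer == "sgd":
            step = g
        elif optimizer == "sgdm":
            m = beta1 * m + (1 - beta1) * g
            step = m
        elif optimizer == "adagrad":
            v = v + g ** 2  # running sum of squared gradients
            step = g / (np.sqrt(v) + eps)
        elif optimizer == "rmsprop":
            v = beta1 * v + (1 - beta1) * g ** 2  # moving average instead of a sum
            step = g / (np.sqrt(v) + eps)
        elif optimizer == "adam":
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g ** 2
            m_hat = m / (1 - beta1 ** t)
            v_hat = v / (1 - beta2 ** t)
            step = m_hat / (np.sqrt(v_hat) + eps)
        else:
            raise ValueError(f"unknown optimizer: {optimizer}")
        params = params - lr * step
    return params

# Illustrative learning rates; smaller ones for the methods that normalize the step
LEARNING_RATES = {"sgd": 0.1, "sgdm": 0.1, "adagrad": 0.1, "rmsprop": 0.01, "adam": 0.01}

results = {name: train(name, lr) for name, lr in LEARNING_RATES.items()}
for name, (w, b) in results.items():
    print(f"{name:8s} w = {w:.4f}, b = {b:.4f}")
```

Unlike the per-step snippets above, the momentum variables here are created once and carried across iterations, which is what the update rules require. With a fixed learning rate, RMSProp normalizes each step by the recent gradient magnitude, so it may hover close to the optimum rather than settle on it exactly; that is why it is given a smaller learning rate in this sketch.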