$$w_{t+1} = w_t - \text{learning_rate} \cdot \nabla$$

 - $w_{t+1}$: parameter after the update
 - $w_t$: current parameter
 - $\text{learning_rate}$: learning rate
 - $\nabla$: gradient of the loss function

Example: for the loss function $loss = (w + 1)^2$, the gradient is $\nabla = 2(w + 1)$. With $w$ initialized to 5 and a learning rate of 0.2, the update rule gives:

 - $2.6 = 5 - 0.2 * (2 * 5 + 2)$
 - $1.16 = 2.6 - 0.2 * (2 * 2.6 + 2)$
 - $\cdots$
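The steps above can be reproduced in plain Python as a minimal sketch, without TensorFlow. The names `grad` and `gradient_descent` are hypothetical helpers for this illustration, and the gradient is hard-coded for the example loss $(w + 1)^2$.

```python
def grad(w):
    # Gradient of the example loss (w + 1)^2
    return 2 * (w + 1)


def gradient_descent(w0, learning_rate, steps):
    """Repeatedly apply w <- w - learning_rate * grad(w); return every iterate."""
    w = w0
    history = []
    for _ in range(steps):
        w = w - learning_rate * grad(w)
        history.append(w)
    return history


# Same setting as the example: w starts at 5, learning rate 0.2
print([round(w, 4) for w in gradient_descent(5.0, 0.2, 3)])  # [2.6, 1.16, 0.296]
```

Calling `gradient_descent(5.0, 1, 4)` instead returns `[-7.0, 5.0, -7.0, 5.0]`, the same oscillation that the TensorFlow run with learning rate 1 produces further down.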

#### 1 - Appropriate learning rate

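```python
import tensorflow.compat.v1 as tf  # TF1 graph-mode API; the compat.v1 import also runs on TensorFlow 2.x
tf.disable_v2_behavior()

# Parameter to optimize, initialized to 5
w = tf.Variable(tf.constant(5, tf.float32))
# Loss (w + 1)^2, minimized at w = -1
loss = tf.square(w + 1)
# One gradient-descent step with learning rate 0.2 per run of `train`
train = tf.train.GradientDescentOptimizer(0.2).minimize(loss)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for _ in range(40):
        sess.run(train)
        print(sess.run(w), end=" ")
```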

2.6 1.16 0.296 -0.22240001 -0.53344 -0.720064 -0.8320384 -0.899223 -0.9395338 -0.9637203 -0.9782322 -0.9869393 -0.9921636 -0.99529815 -0.9971789 -0.99830735 -0.9989844 -0.99939066 -0.9996344 -0.99978065 -0.9998684 -0.999921 -0.9999526 -0.99997157 -0.99998295 -0.99998975 -0.99999386 -0.9999963 -0.9999978 -0.9999987 -0.9999992 -0.9999995 -0.9999997 -0.9999998 -0.9999999 -0.99999994 -0.99999994 -0.99999994 -0.99999994 -0.99999994 

#### 2 - Learning rate too large

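```python
import tensorflow.compat.v1 as tf  # TF1 graph-mode API; the compat.v1 import also runs on TensorFlow 2.x
tf.disable_v2_behavior()

w = tf.Variable(tf.constant(5, tf.float32))
loss = tf.square(w + 1)
# Learning rate 1: every step overshoots the minimum at w = -1
train = tf.train.GradientDescentOptimizer(1).minimize(loss)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for _ in range(40):
        sess.run(train)
        print(sess.run(w), end=" ")
```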

-7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 -7.0 5.0 

#### 3 - Learning rate too small

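```python
import tensorflow.compat.v1 as tf  # TF1 graph-mode API; the compat.v1 import also runs on TensorFlow 2.x
tf.disable_v2_behavior()

w = tf.Variable(tf.constant(5, tf.float32))
loss = tf.square(w + 1)
# Learning rate 0.0001: steps are tiny, so w barely moves in 40 iterations
train = tf.train.GradientDescentOptimizer(0.0001).minimize(loss)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for _ in range(40):
        sess.run(train)
        print(sess.run(w), end=" ")
```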

4.9988 4.9976 4.9964004 4.995201 4.994002 4.992803 4.9916043 4.990406 4.9892077 4.98801 4.986812 4.985615 4.9844174 4.9832206 4.9820237 4.9808273 4.979631 4.978435 4.977239 4.9760437 4.9748483 4.9736533 4.9724584 4.971264 4.9700694 4.9688754 4.9676814 4.966488 4.9652944 4.9641013 4.9629083 4.9617157 4.960523 4.959331 4.958139 4.9569473 4.9557557 4.9545646 4.9533734 4.952183 
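Substituting this example's gradient into the update rule explains the three runs above (learning rates 0.2, 1 and 0.0001). Writing $\eta$ as shorthand for $\text{learning_rate}$ (notation introduced only here), each step scales the distance to the minimum $w = -1$ by a constant factor:

$$
\begin{aligned}
w_{t+1} &= w_t - \eta \cdot 2(w_t + 1) \\
w_{t+1} + 1 &= (1 - 2\eta)(w_t + 1) \\
w_t + 1 &= (1 - 2\eta)^t \, (w_0 + 1)
\end{aligned}
$$

With $w_0 = 5$, so $w_0 + 1 = 6$:

 - $\eta = 0.2$: factor $0.6$, so $w_1 = -1 + 3.6 = 2.6$, $w_2 = -1 + 2.16 = 1.16$, and $w$ converges to $-1$.
 - $\eta = 1$: factor $-1$, so $w + 1$ alternates between $-6$ and $6$ and $w$ jumps between $-7$ and $5$ forever.
 - $\eta = 0.0001$: factor $0.9998$, so $w_1 = -1 + 5.9988 = 4.9988$ and $w$ creeps toward $-1$ very slowly.

For this particular loss, the iteration converges only when $|1 - 2\eta| < 1$, i.e. $0 < \eta < 1$.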

**Exponentially decaying learning rate**

$$\text{learning_rate} = \text{LEARNING_RATE_BASE} * \text{LEARNING_RATE_DECAY}^{\frac{\text{global_step}}{\text{LEARNING_RATE_STEP}}}$$

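 - $\text{LEARNING_RATE_BASE}$: initial (base) learning rate
 - $\text{LEARNING_RATE_DECAY}$: decay rate of the learning rate
 - $\text{global_step}$: number of batches (training steps) run so far
 - $\text{LEARNING_RATE_STEP}$: how many batches are fed before the learning rate is updated once; typically the total number of samples divided by the batch size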
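A minimal plain-Python sketch of the formula above, assuming the same constants as the TensorFlow code below; `decayed_learning_rate` is a hypothetical helper, not a TensorFlow function. Its `staircase` flag mirrors the argument of the same name passed to `tf.train.exponential_decay`: when it is `True`, the exponent is rounded down to an integer, so the rate only drops once every `LEARNING_RATE_STEP` steps. The loop replays the training with loss $(w - 1)^2$, using the rate for the current `global_step` and then incrementing it.

```python
import math


def decayed_learning_rate(base, decay_rate, decay_steps, global_step, staircase=False):
    """base * decay_rate ** (global_step / decay_steps), optionally with a floored exponent."""
    exponent = global_step / decay_steps
    if staircase:
        # Integer exponent: the rate changes only once every decay_steps steps
        exponent = math.floor(exponent)
    return base * decay_rate ** exponent


LEARNING_RATE_BASE = 0.1
LEARNING_RATE_DECAY = 0.99
LEARNING_RATE_STEP = 10

# Replay 11 training steps on loss = (w - 1)^2, starting from w = 5
w, global_step = 5.0, 0
for _ in range(11):
    lr = decayed_learning_rate(LEARNING_RATE_BASE, LEARNING_RATE_DECAY,
                               LEARNING_RATE_STEP, global_step, staircase=True)
    w -= lr * 2 * (w - 1)  # gradient of (w - 1)^2 is 2(w - 1)
    global_step += 1
    print(global_step, lr, w)
```

Here the printed `lr` is the rate actually used for that update, so the row for global_step 10 still uses 0.1. The TensorFlow log below evaluates `learning_rate` after `global_step` has already been incremented, which is why it shows 0.099 on that row. The `w` values agree with the log up to float32 rounding (about 1.42950 at step 10 and 1.34446 at step 11).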

```python
import tensorflow.compat.v1 as tf  # TF1 graph-mode API; the compat.v1 import also runs on TensorFlow 2.x
tf.disable_v2_behavior()

LEARNING_RATE_BASE = 0.1    # initial learning rate
LEARNING_RATE_DECAY = 0.99  # decay rate
LEARNING_RATE_STEP = 10     # update the learning rate every 10 batches

# Counts batches run so far; not trainable, incremented by minimize()
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, LEARNING_RATE_STEP,
                                           LEARNING_RATE_DECAY, staircase=True)
w = tf.Variable(tf.constant(5, tf.float32))
loss = tf.square(w - 1)  # minimized at w = 1
train = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for _ in range(40):
        sess.run(train)
        global_step_val = sess.run(global_step)
        learning_rate_val = sess.run(learning_rate)
        w_val = sess.run(w)
        loss_val = sess.run(loss)
        print("global_step=", global_step_val, "learning_rate=", learning_rate_val, "w=", w_val, "loss_val=", loss_val)
```

global_step= 1 learning_rate= 0.1 w= 4.2 loss_val= 10.239999
global_step= 2 learning_rate= 0.1 w= 3.56 loss_val= 6.5536
global_step= 3 learning_rate= 0.1 w= 3.0479999 loss_val= 4.1943035
global_step= 4 learning_rate= 0.1 w= 2.6383998 loss_val= 2.684354
global_step= 5 learning_rate= 0.1 w= 2.31072 loss_val= 1.7179868
global_step= 6 learning_rate= 0.1 w= 2.0485759 loss_val= 1.0995114
global_step= 7 learning_rate= 0.1 w= 1.8388608 loss_val= 0.70368737
global_step= 8 learning_rate= 0.1 w= 1.6710886 loss_val= 0.45035988
global_step= 9 learning_rate= 0.1 w= 1.5368708 loss_val= 0.2882303
global_step= 10 learning_rate= 0.099 w= 1.4294966 loss_val= 0.18446738
global_step= 11 learning_rate= 0.099 w= 1.3444563 loss_val= 0.11865015
global_step= 12 learning_rate= 0.099 w= 1.2762539 loss_val= 0.07631624
global_step= 13 learning_rate= 0.099 w= 1.2215557 loss_val= 0.049086932
global_step= 14 learning_rate= 0.099 w= 1.1776876 loss_val= 0.0315729
global_step= 15 learning_rate= 0.099 w= 1.1425055 loss_va