# Optimizer

```
This post explores how to use TensorFlow's optimizers to solve optimization problems.
```

In [1]:
import tensorflow as tf

Define the objective function $goal = (x-3)^2$ and find the value of x that minimizes goal:

In [2]:
# x = tf.placeholder(tf.float32)
# x has to be a Variable (not a placeholder) so that the optimizer can update it
x = tf.Variable(tf.truncated_normal([1]))
goal = tf.pow(x - 3, 2)

In [3]:
with tf.Session() as sess:
    x.initializer.run()
    print(x.eval())
    print(goal.eval())

[-0.09843425]
[ 9.60029411]


Solve the problem with the gradient descent optimizer.

## 1. Using minimize()

In [4]:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_step = optimizer.minimize(goal)

In [5]:
def train():
    with tf.Session() as sess:
        # x is re-initialized on every call, so each run starts from a new random value
        x.initializer.run()
        for i in range(10):
            print("x: ", x.eval())
            train_step.run()
            print("goal: ", goal.eval())
train()

x:  [-1.07398713]
goal:  [ 10.62231827]
x:  [-0.25918981]
goal:  [ 6.79828358]
x:  [ 0.39264813]
goal:  [ 4.3509016]
x:  [ 0.91411847]
goal:  [ 2.78457713]
x:  [ 1.33129478]
goal:  [ 1.78212929]
x:  [ 1.66503584]
goal:  [ 1.14056277]
x:  [ 1.93202865]
goal:  [ 0.72996008]
x:  [ 2.14562297]
goal:  [ 0.46717459]
x:  [ 2.31649828]
goal:  [ 0.29899168]
x:  [ 2.45319867]
goal:  [ 0.19135472]
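
The printed values can be checked by hand. The derivation below is a short sketch, assuming plain gradient descent with the learning rate $\eta = 0.1$ from In [4]; the symbols $\eta$ and $x_k$ (the value of x before step k) are introduced here only for illustration.

$$
\frac{d\,goal}{dx} = 2(x-3), \qquad x_{k+1} = x_k - \eta \cdot 2(x_k - 3)
$$

$$
x_{k+1} - 3 = (1 - 2\eta)(x_k - 3) = 0.8\,(x_k - 3), \qquad goal_{k+1} = 0.8^2\, goal_k = 0.64\, goal_k
$$

Each step therefore shrinks the distance to the minimum at x = 3 by a factor of 0.8. For example, 10.62231827 × 0.64 ≈ 6.79828, which agrees with the second goal value printed above.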


## 2. minimize() = compute_gradients() + apply_gradients()

minimize() can be split into two steps: computing the gradients and applying them.

In [6]:
# compute_gradients returns a list of (gradient, variable) pairs
gra_and_var = optimizer.compute_gradients(goal)
train_step = optimizer.apply_gradients(gra_and_var)
train()

x:  [-0.91358572]
goal:  [ 9.80233669]
x:  [-0.13086852]
goal:  [ 6.27349615]
x:  [ 0.49530518]
goal:  [ 4.01503706]
x:  [ 0.99624419]
goal:  [ 2.56962395]
x:  [ 1.39699531]
goal:  [ 1.64455926]
x:  [ 1.71759629]
goal:  [ 1.05251801]
x:  [ 1.97407699]
goal:  [ 0.6736114]
x:  [ 2.17926168]
goal:  [ 0.43111134]
x:  [ 2.3434093]
goal:  [ 0.2759113]
x:  [ 2.47472739]
goal:  [ 0.17658314]
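
To make the split concrete, here is a minimal plain-Python sketch of the same two steps for this particular goal. It is not TensorFlow code: the function names only mirror the optimizer methods, and the `params` dictionary and its `"x"` key are hypothetical stand-ins for the variable.

```python
# Plain-Python sketch of the compute/apply split for goal = (x - 3)^2.
# All names are illustrative assumptions; nothing here calls TensorFlow.


def compute_gradients(params):
    """Return a list of (gradient, name) pairs, one per parameter."""
    # d/dx (x - 3)^2 = 2 * (x - 3)
    return [(2.0 * (value - 3.0), name) for name, value in params.items()]


def apply_gradients(grads_and_vars, params, learning_rate=0.1):
    """Return a new parameter dict after one gradient descent step."""
    updated = dict(params)
    for grad, name in grads_and_vars:
        updated[name] = params[name] - learning_rate * grad
    return updated


params = {"x": -0.91358572}        # first x printed above
pairs = compute_gradients(params)  # gradients could be inspected or modified here
params = apply_gradients(pairs, params)
print(params["x"])                 # close to the second x printed above
```

The point of the split is the gap between the two calls: the (gradient, variable) pairs are ordinary values that can be transformed before they are applied.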


## 3. Going further

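- clip_by_global_norm: rescales the gradients and is used to control the exploding gradient problem. Exploding and vanishing gradients have the same cause: the chain rule multiplies many factors together, so the gradient can grow (exploding) or shrink (vanishing) exponentially. To avoid exploding gradients, the gradients are clipped.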

In [7]:
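# split the (gradient, variable) pairs into two tuples
gradients, variables = zip(*optimizer.compute_gradients(goal))
# rescale the gradients so that their global norm is at most 1.25
gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
train_step = optimizer.apply_gradients(zip(gradients, variables))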
train()

x:  [-0.06642474]
goal:  [ 8.6519804]
x:  [ 0.05857526]
goal:  [ 7.93224907]
x:  [ 0.18357527]
goal:  [ 7.24376774]
x:  [ 0.30857527]
goal:  [ 6.58653641]
x:  [ 0.43357527]
goal:  [ 5.96055555]
x:  [ 0.55857527]
goal:  [ 5.36582375]
x:  [ 0.68357527]
goal:  [ 4.80234289]
x:  [ 0.80857527]
goal:  [ 4.27011156]
x:  [ 0.93357527]
goal:  [ 3.76913023]
x:  [ 1.05857527]
goal:  [ 3.2993989]
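
In the output above, x advances by exactly 0.125 per step. A small plain-Python sketch of the clipping rule shows why: each gradient is scaled by `clip_norm / max(global_norm, clip_norm)`, so while the raw gradient is larger than 1.25 the step is 0.1 × 1.25 = 0.125. The helper below is an illustrative re-implementation for lists of scalars, not the TensorFlow function itself.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients so that their joint L2 norm is at most clip_norm.

    Illustrative stand-in for scalar gradients; returns the clipped list
    and the global norm measured before clipping.
    """
    global_norm = math.sqrt(sum(g * g for g in grads))
    # When global_norm <= clip_norm the scale is 1 and nothing changes.
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm


x = -0.06642474                    # first x printed above
grad = 2.0 * (x - 3.0)             # raw gradient, about -6.13
clipped, norm = clip_by_global_norm([grad], 1.25)
x_next = x - 0.1 * clipped[0]      # learning rate 0.1, as in In [4]
print(clipped[0], x_next)
```

Because the scale factor is shared by all gradients, clipping by global norm preserves the direction of the update and only limits its length.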


- Add learning rate decay:

In [8]:
# global_step counts how many batches (training steps) have been applied so far;
# it is marked non-trainable so the optimizer does not treat it as a parameter
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    3.0, global_step, 3, 0.3, staircase=True)
optimizer2 = tf.train.GradientDescentOptimizer(learning_rate)
gradients, variables = zip(*optimizer2.compute_gradients(goal))
gradients, _ = tf.clip_by_global_norm(gradients, 1.25)
train_step = optimizer2.apply_gradients(zip(gradients, variables),
                                        global_step=global_step)
with tf.Session() as sess:
    global_step.initializer.run()
    x.initializer.run()
    for i in range(10):
        print("x: ", x.eval())
        train_step.run()
        print("goal: ", goal.eval())

x:  [ 0.43469602]
goal:  [ 1.40350509]
x:  [ 4.1846962]
goal:  [ 6.58078384]
x:  [ 0.4346962]
goal:  [ 1.40350509]
x:  [ 4.1846962]
goal:  [ 0.00356364]
x:  [ 3.0596962]
goal:  [ 0.00228072]
x:  [ 2.95224309]
goal:  [ 0.00145967]
x:  [ 3.03820562]
goal:  [ 0.00030886]
x:  [ 3.01757455]
goal:  [  6.53558600e-05]
x:  [ 3.0080843]
goal:  [  1.38298683e-05]
x:  [ 3.00371885]
goal:  [  9.71175723e-06]
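
The run above can be reproduced without TensorFlow. With `staircase=True` the schedule is `initial_lr * decay_rate ** floor(global_step / decay_steps)`, so the learning rate is 3.0 for steps 0 to 2, 0.9 for steps 3 to 5, 0.27 for steps 6 to 8, and so on. The sketch below is a plain-Python approximation that assumes the learning rate is read before `global_step` is incremented, which is what the printed values suggest; the function names are illustrative.

```python
import math


def exponential_decay(initial_lr, global_step, decay_steps, decay_rate,
                      staircase=True):
    """Learning rate after global_step steps (illustrative re-implementation)."""
    exponent = float(global_step) / decay_steps
    if staircase:
        exponent = math.floor(exponent)  # decay in discrete jumps
    return initial_lr * decay_rate ** exponent


def clip(grad, clip_norm):
    """Global-norm clipping for a single scalar gradient."""
    return grad * clip_norm / max(abs(grad), clip_norm)


x = 0.43469602                     # first x printed above
xs = [x]
for step in range(10):
    lr = exponential_decay(3.0, step, 3, 0.3)
    grad = clip(2.0 * (x - 3.0), 1.25)
    x = x - lr * grad
    xs.append(x)
    print("step", step, "lr", lr, "x", x, "goal", (x - 3.0) ** 2)
```

During the first three steps the update is 3.0 × 1.25 = 3.75, which overshoots the minimum, so x bounces between roughly 0.43 and 4.18. Once the rate drops to 0.9 the iterates settle quickly around x = 3.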
