# 1. Optimization Algorithms

TensorFlow provides a variety of optimization algorithms:
- tf.train.GradientDescentOptimizer
- tf.train.AdadeltaOptimizer
- tf.train.AdagradOptimizer
- tf.train.AdagradDAOptimizer
- tf.train.MomentumOptimizer
- tf.train.AdamOptimizer
- tf.train.FtrlOptimizer
- tf.train.ProximalGradientDescentOptimizer
- tf.train.ProximalAdagradOptimizer
- tf.train.RMSPropOptimizer

Reference: https://www.tensorflow.org/api_guides/python/train#Optimizers

Of these, `AdamOptimizer` is one of the most commonly used. It is used as follows:
```
# cross_entropy is the loss function
train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)
```
For a detailed API description, see: https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer
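
To give a feel for what `AdamOptimizer` does at each step, here is a minimal pure-Python sketch of the Adam update rule in its textbook form, applied to a function of one variable. It is an illustration, not TensorFlow's implementation: the function name `adam_minimize` and the toy objective are assumptions made for this example, while the default values of `learning_rate`, `beta1`, `beta2`, and `epsilon` mirror the documented defaults of `tf.train.AdamOptimizer`.

```python
import math


def adam_minimize(grad_fn, x0, learning_rate=0.001, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=1000):
    """Minimize a one-variable function with the textbook Adam update rule.

    grad_fn returns the gradient of the objective at a given point.
    """
    x = x0
    m = 0.0  # running average of the gradient (first moment)
    v = 0.0  # running average of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Correct the bias caused by initializing m and v at zero
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x


# Toy objective (an assumption for this sketch): f(x) = (x - 3)^2,
# whose gradient is 2 * (x - 3)
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0, learning_rate=0.1, steps=2000)
print(x_min)
```

The result should end up close to 3, the minimizer of the toy objective. The step size is scaled by the running gradient statistics, which is why Adam is often less sensitive to the raw learning rate than plain gradient descent.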


**Exercise**: Replace the optimizer in the code below with `AdamOptimizer`, then check the accuracy in TensorBoard.

In [2]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

def add_layer(inputs, in_size, out_size, activation_function=None):
    # add one more layer and return the output of this layer
    with tf.name_scope('layer'):
        with tf.name_scope('weights'):
            Weights = tf.Variable(tf.truncated_normal([in_size, out_size]), name='W')
            
        with tf.name_scope('biases'):
            biases = tf.Variable(tf.constant(0.1, shape=[1, out_size]), name='b')
            
        with tf.name_scope('Wx_plus_b'):
            Wx_plus_b = tf.add(tf.matmul(inputs, Weights), biases)
            
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
    
    tf.summary.histogram('weights', Weights)
    tf.summary.histogram('biases', biases)
    
    return outputs

with tf.Graph().as_default() as g:
    with tf.name_scope('input'):
        # X: input
        X = tf.placeholder(tf.float32, [None, 784], name="X")
        # Y_: labels
        Y_ = tf.placeholder(tf.float32, [None, 10])

    # L1: 200 neurons
    Y1 = add_layer(X, 784, 200, tf.nn.sigmoid)

    # L2: 100 neurons
    Y2 = add_layer(Y1, 200, 100, tf.nn.sigmoid)

    # L3: 60 neurons
    Y3 = add_layer(Y2, 100, 60, tf.nn.sigmoid)

    # L4: 30 neurons
    Y4 = add_layer(Y3, 60, 30, tf.nn.sigmoid)

    # L5: 10 neurons (sigmoid is applied here too, so Ylogits is bounded in (0, 1))
    Ylogits = add_layer(Y4, 30, 10, tf.nn.sigmoid)

    # Output
    Y = tf.nn.softmax(Ylogits)

    # Loss function
    with tf.name_scope('loss'):
        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y_, logits=Ylogits))
        
    tf.summary.scalar('cross_entropy', cross_entropy)
        
    # Optimizer
    with tf.name_scope('train'):
        train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
        
    # Compute accuracy
    with tf.name_scope('accuracy'):
        correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    tf.summary.scalar('accuracy', accuracy)
    
    merged = tf.summary.merge_all()
    
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        # Save the computation graph of the current session
        writer = tf.summary.FileWriter("logs/", sess.graph)

        for i in range(10000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            summary, _ = sess.run([merged, train_step], feed_dict={X: batch_xs, Y_: batch_ys})

            writer.add_summary(summary, i)
            
            if i % 100 == 0:
                print(accuracy.eval({X: mnist.test.images, Y_: mnist.test.labels}))

        # Flush pending events to disk and close the log file
        writer.close()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.0892
0.0879
0.0876
0.0884
0.1222
0.1467
0.1998
0.4163
0.5933
0.6245
0.6468
0.6579
0.6722
0.6787
0.7095
0.7588
0.775
0.7849
0.793
0.7989
0.8062
0.8095
0.815
0.8197
0.821
0.8273
0.8296
0.8339
0.8376
0.8434
0.8442
0.8481
0.8509
0.8537
0.8575
0.8592
0.8621
0.8634
0.8666
0.8676
0.8719
0.8715
0.8737
0.8748
0.876
0.8778
0.8784
0.8788
0.8808
0.8817
0.8814
0.8832
0.8858
0.887
0.8866
0.8888
0.891
0.8904
0.8914
0.8905
0.8928
0.8939
0.8949
0.8948
0.8954
0.8957
0.8949
0.8968
0.898
0.8981
0.8978
0.8984
0.8995
0.8998
0.8991
0.9002
0.9007
0.9019
0.9031
0.9035
0.9022
0.9025
0.9028
0.903
0.904
0.9049
0.9048
0.9058
0.9048
0.9045
0.9062
0.9059
0.9074
0.9074
0.9075
0.9094
0.9084
0.9099
0.9094
0.9095


# 2. Hyperparameter: Learning Rate

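In the context of machine learning, a hyperparameter is a parameter whose value is set before the learning process begins, as opposed to a parameter whose value is learned during training. Hyperparameters usually need to be tuned: choosing a good set of them for the learner improves its performance and results. In our MNIST task, for example, the learning rate of `AdamOptimizer` is one such value that has yet to be determined.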

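To compare model performance under different hyperparameters, a simple approach is to measure the model's performance for each setting and compare the results. TensorBoard can display data from several hyperparameter settings at once: just write the TensorBoard data produced under each setting into a separate subdirectory.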
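
The compare-every-setting approach described above can be sketched without TensorFlow. The example below is a toy illustration under stated assumptions: it runs plain gradient descent on the one-variable function $f(x)=(x-3)^2$ (chosen here only for illustration), records the final loss for a few hypothetical learning rates, and picks the best one. The names `final_loss` and `candidate_rates` are made up for this sketch.

```python
def final_loss(learning_rate, steps=50):
    """Run plain gradient descent on f(x) = (x - 3)^2 and return the final loss."""
    x = 0.0
    for _ in range(steps):
        gradient = 2 * (x - 3)
        x -= learning_rate * gradient
    return (x - 3) ** 2


# Hypothetical candidate values for the hyperparameter
candidate_rates = [0.01, 0.1, 0.5, 1.1]

# Measure the result once per setting, then compare
results = {lr: final_loss(lr) for lr in candidate_rates}
best_rate = min(results, key=results.get)

for lr in candidate_rates:
    print("lr={0}: final loss={1:.6g}".format(lr, results[lr]))
print("best learning rate:", best_rate)
```

On this toy problem a rate that is too small (0.01) converges slowly, while one that is too large (1.1) diverges, which is the kind of difference the per-setting comparison is meant to expose.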

Below is a very simple example that computes $y=x+step$ and writes the data for each value of `step` into a separate subdirectory of `logs`.

In [1]:
import tensorflow as tf

for step in range(10):
    with tf.Graph().as_default() as g:
        x = tf.placeholder(tf.float32, shape=(), name='x')
        y = x + step
        tf.summary.scalar('y', y)

        merged = tf.summary.merge_all()

        with tf.Session() as sess:
            # Each value of step gets its own subdirectory under logs/
            writer = tf.summary.FileWriter('logs/step_{0}/'.format(step))

            for i in range(100):
                summary, _ = sess.run([merged, y], feed_dict={x: i/100})

                writer.add_summary(summary, i)

            # Flush pending events to disk before moving on to the next run
            writer.close()



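**Exercise**: Building on the code above, explore how the model's performance changes with the learning rate. Save the log files under the `logs/adm_lr_{value}/` subdirectory, where `value` is the specific learning rate. For example, with lr=0.001 the TensorFlow events are written to `logs/adm_lr_0.001/` (hint: change the argument passed to `tf.summary.FileWriter`).
The learning rate is specified in `AdamOptimizer`; it ranges from 0.001 to 0.01 in steps of 0.001.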

Once the runs finish, TensorBoard displays something like the following:
![image.png](http://p811pjpxl.bkt.clouddn.com/16-1.png)
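
The learning-rate values and directory names described in the exercise can be generated with a small helper. This is only a sketch: the function names `learning_rates` and `log_dir_for` are hypothetical, and the `adm_lr` prefix follows the directory pattern given in the exercise. Each resulting path would be passed to `tf.summary.FileWriter` for the corresponding run.

```python
def learning_rates(start=0.001, stop=0.01, step=0.001):
    """Return the values start, start + step, ..., stop (inclusive)."""
    count = int(round((stop - start) / step)) + 1
    # Round each value so floating-point noise does not leak into directory names
    return [round(start + k * step, 6) for k in range(count)]


def log_dir_for(lr, prefix="adm_lr"):
    """Build the TensorBoard log subdirectory for one learning rate."""
    return "logs/{0}_{1}/".format(prefix, lr)


for lr in learning_rates():
    print(lr, "->", log_dir_for(lr))
```

Generating the values by index and rounding them, rather than adding the step repeatedly, keeps names such as `logs/adm_lr_0.003/` free of accumulated floating-point error.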

In [1]:
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

def add_layer(inputs, in_size, out_size, activation_function=None):
    # add one more layer and return the output of this layer
    with tf.name_scope('layer'):
        with tf.name_scope('weights'):
            Weights = tf.Variable(tf.truncated_normal([in_size, out_size]), name='W')
            
        with tf.name_scope('biases'):
            biases = tf.Variable(tf.constant(0.1, shape=[1, out_size]), name='b')
            
        with tf.name_scope('Wx_plus_b'):
            Wx_plus_b = tf.add(tf.matmul(inputs, Weights), biases)
            
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
    
    tf.summary.histogram('weights', Weights)
    tf.summary.histogram('biases', biases)
    
    return outputs

with tf.Graph().as_default() as g:
    with tf.name_scope('input'):
        # X: input
        X = tf.placeholder(tf.float32, [None, 784], name="X")
        # Y_: labels
        Y_ = tf.placeholder(tf.float32, [None, 10])

    # L1: 200 neurons
    Y1 = add_layer(X, 784, 200, tf.nn.sigmoid)

    # L2: 100 neurons
    Y2 = add_layer(Y1, 200, 100, tf.nn.sigmoid)

    # L3: 60 neurons
    Y3 = add_layer(Y2, 100, 60, tf.nn.sigmoid)

    # L4: 30 neurons
    Y4 = add_layer(Y3, 60, 30, tf.nn.sigmoid)

    # L5: 10 neurons (sigmoid is applied here too, so Ylogits is bounded in (0, 1))
    Ylogits = add_layer(Y4, 30, 10, tf.nn.sigmoid)

    # Output
    Y = tf.nn.softmax(Ylogits)

    # Loss function
    with tf.name_scope('loss'):
        cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y_, logits=Ylogits))
        
    tf.summary.scalar('cross_entropy', cross_entropy)
        
    # Optimizer
    with tf.name_scope('train'):
        train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)
        
    # Compute accuracy
    with tf.name_scope('accuracy'):
        correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    tf.summary.scalar('accuracy', accuracy)
    
    merged = tf.summary.merge_all()
    
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        # Save the computation graph of the current session
        writer = tf.summary.FileWriter("logs/", sess.graph)

        for i in range(10000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            summary, _ = sess.run([merged, train_step], feed_dict={X: batch_xs, Y_: batch_ys})

            writer.add_summary(summary, i)
            
            if i % 100 == 0:
                print(accuracy.eval({X: mnist.test.images, Y_: mnist.test.labels}))

        # Flush pending events to disk and close the log file
        writer.close()

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
0.1414
0.8695
0.9122
0.9173
0.9263
0.9369
0.937
0.94
0.9432
0.9348
0.9453
0.9277
0.9431
0.9399
0.9454
0.9412
0.9507
0.9482
0.9462
0.9449
0.9453
0.9518
0.948
0.9528
0.9516
0.952
0.9489
0.9521
0.953
0.9442
0.9536
0.9514
0.9536
0.9419
0.9503
0.9573
0.9516
0.9537
0.9545
0.9544
0.9572
0.9447
0.9563
0.9516

# 3. Comparing the Performance of Different Algorithms (Optional)

**Exercise**: Compare the performance of `GradientDescentOptimizer` and `tf.train.AdamOptimizer`.
Hint: with `GradientDescentOptimizer`, sweep the learning rate from 0.1 to 1.0 and write the output to the subdirectory `logs/gd_lr_{value}`.

# 4. Installing TensorFlow Locally

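First set up a Python 3 environment, then install the packages with pip. Note that the code in this notebook uses the TensorFlow 1.x API (for example `tf.placeholder` and `tf.train.AdamOptimizer`), which TensorFlow 2.x does not expose under these names.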
```
pip install tensorflow
pip install jupyter
```
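
Before starting the notebook, it can be useful to confirm that the packages are importable from the Python interpreter you intend to use. The following is a small sketch using only the standard library; the helper name `is_installed` is an assumption made for this example, and `notebook` is checked because it is the package that provides the `jupyter notebook` command.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    return importlib.util.find_spec(package_name) is not None


for name in ("tensorflow", "notebook"):
    status = "found" if is_installed(name) else "missing"
    print("{0}: {1}".format(name, status))
```

If a package is reported as missing, the pip command was most likely run against a different Python environment than the one executing this check.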

Once installation finishes, start the Jupyter server:
```
jupyter notebook
```