# Batch Normalization – Practice

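Batch normalization is most useful when building deep neural networks. To demonstrate this, we'll create a convolutional neural network with <u>**19 convolutional layers**</u>, followed by <u>**1 fully connected layer**</u>. We'll use it to classify handwritten digits in the *MNIST* dataset, which you should be familiar with by now.

This is **not** a good network for classifying *MNIST* digits. You could create a **simpler** network and get **better** results. However, to give you hands-on experience with batch normalization, we had to make an example that was:
1. Complicated enough that training would benefit from batch normalization.
2. Simple enough that it would train quickly, since this is meant to be a short exercise just to give you some practice adding batch normalization.
3. Simple enough that the architecture would be easy to understand without additional resources.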

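This notebook includes two versions of the network that you can edit. The first uses higher-level functions from the `tf.layers` package. The second is the same network, but uses only lower-level functions from the `tf.nn` package.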

1. [Batch Normalization with `tf.layers.batch_normalization`](#example_1)
2. [Batch Normalization with `tf.nn.batch_normalization`](#example_2)

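The following cell loads TensorFlow, downloads the *MNIST* dataset if necessary, and loads it into an object named `mnist`. You'll need to run this cell before running anything else in the notebook.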

In [1]:
%%time
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True, reshape=False)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
CPU times: user 13.6 s, sys: 830 ms, total: 14.4 s
Wall time: 14.5 s


# Batch Normalization using `tf.layers.batch_normalization`<a id="example_1"></a>

$$\color{red}{Attention}$$    
This version of the network uses `tf.layers` for almost everything, and expects you to use [`tf.layers.batch_normalization`](https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization).
***

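We'll use the following function to create fully connected layers in our network. We'll create them with the specified number of neurons and a ReLU activation function.

This version of the function does not include *batch normalization*.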

In [2]:
"""
DO NOT MODIFY THIS CELL
"""
def fully_connected(prev_layer, num_units):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, activation=tf.nn.relu)
    return layer

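We'll use the following function to create convolutional layers in our network. They are very basic: we always use a 3x3 kernel and a ReLU activation function, with a 2x2 stride on layers whose depth is a multiple of 3 and a 1x1 stride on all other layers. We don't use any pooling layers in this network.

This version of the function does not include *batch normalization*.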
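
To see what the stride rule does to the feature-map size, here is a small pure-Python sketch (no TensorFlow needed) that walks the 19 layers created by the `range(1, 20)` loop in `train`. It assumes `'same'` padding, for which the spatial output size is `ceil(input / stride)`; the helper name `conv_output_shapes` is hypothetical.

```python
import math

def conv_output_shapes(input_size=28, num_layers=19):
    """Trace (layer_depth, stride, height/width, channels) through the conv stack.

    Mirrors the rule in conv_layer: stride 2 when layer_depth % 3 == 0,
    stride 1 otherwise, and layer_depth * 4 output feature maps.
    With 'same' padding the spatial output size is ceil(size / stride).
    """
    size = input_size
    shapes = []
    for layer_depth in range(1, num_layers + 1):
        strides = 2 if layer_depth % 3 == 0 else 1
        size = math.ceil(size / strides)
        shapes.append((layer_depth, strides, size, layer_depth * 4))
    return shapes

# Print only the layers that downsample.
for depth, stride, size, channels in conv_output_shapes():
    if stride == 2:
        print(f"layer {depth:2d}: stride 2 -> {size}x{size}x{channels}")

# The final conv output is flattened before the fully connected layer.
_, _, final_size, final_channels = conv_output_shapes()[-1]
print("flattened size:", final_size * final_size * final_channels)  # → flattened size: 76
```

Under these assumptions the 28x28 input is already down to 1x1 by layer 15, so the last few layers only mix channels, one symptom of the deliberately ad hoc stride rule.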

In [3]:
"""
DO NOT MODIFY THIS CELL
"""
def conv_layer(prev_layer, layer_depth):
    """
    Create a convolutional layer with the given layer as input.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1   # (not good)
    conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', activation=tf.nn.relu)
    return conv_layer

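**Run the following cell**, along with the earlier cells (to load the dataset and define the necessary functions).

This cell builds the network **without** batch normalization, then trains it on the *MNIST* dataset. It displays loss and accuracy data periodically while training.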

In [4]:
"""
DO NOT MODIFY THIS CELL
"""
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels 
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])
    
    # Feed the inputs into a series of 19 convolutional layers (range(1, 20))
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i)

    # Flatten the output from the convolutional layers 
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100)

    # Create the output layer with 1 node for each of the 10 digit classes
    logits = tf.layers.dense(layer, 10)
    
    # Define loss and training operations
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
    
    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys})
            
            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels})
                print('Batch: {:>4}/ {:>4}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'
                      .format(batch_i, num_batches, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys})
                print('Batch: {:>4}/ {:>4}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'
                      .format(batch_i, num_batches, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels})
        print('Final test accuracy: {:>3.5f}'.format(acc))
        
        # Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
                                                    labels: [mnist.test.labels[i]]})

        print("Accuracy on 100 samples:", correct/100)


num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)

Batch:  0: Validation loss: 0.69101, Validation accuracy: 0.10700
Batch: 25: Training loss: 0.49650, Training accuracy: 0.04688
Batch: 50: Training loss: 0.32611, Training accuracy: 0.10938
Batch: 75: Training loss: 0.32385, Training accuracy: 0.14062
Batch: 100: Validation loss: 0.32515, Validation accuracy: 0.11260
Batch: 125: Training loss: 0.32502, Training accuracy: 0.14062
Batch: 150: Training loss: 0.32563, Training accuracy: 0.06250
Batch: 175: Training loss: 0.32641, Training accuracy: 0.09375
Batch: 200: Validation loss: 0.32502, Validation accuracy: 0.11260
Batch: 225: Training loss: 0.32480, Training accuracy: 0.07812
Batch: 250: Training loss: 0.32697, Training accuracy: 0.07812
Batch: 275: Training loss: 0.32777, Training accuracy: 0.06250
Batch: 300: Validation loss: 0.32531, Validation accuracy: 0.09760
Batch: 325: Training loss: 0.32744, Training accuracy: 0.04688
Batch: 350: Training loss: 0.32578, Training accuracy: 0.12500
Batch: 375: Training loss: 0.32504, Trainin

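With this many layers, it's going to take a lot of iterations for this network to learn. By the time you're done training these 800 batches, your final test and validation accuracies probably won't be much better than 10%. (It will be different each time, but will most likely be less than 15%.)

Using batch normalization, you'll be able to train this same network to over 90% accuracy in the same number of batches.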
     
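# Add batch normalization
We've copied the previous three cells to get you started. **Edit these cells** to add batch normalization to the network. For this exercise, you should use [`tf.layers.batch_normalization`](https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization) to handle most of the math, but you'll need to make a few other changes to your network to integrate batch normalization. You may want to refer back to the lesson notebook to remind yourself of important details, such as the fact that your graph operations need to know whether you are performing training or inference.

If you get stuck, you can check out the `Batch_Normalization_Solutions` notebook to see how we did things.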

**TODO:** Modify `fully_connected` to add batch normalization to the fully connected layers it creates. Feel free to change the function's parameters if it helps.

In [5]:
def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new fully connected layer
    """
    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)
    layer = tf.layers.batch_normalization(layer, training=is_training)
    layer = tf.nn.relu(layer)
    return layer
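
For intuition about what `tf.layers.batch_normalization` computes in training mode for a dense layer, here is a minimal pure-Python sketch of the forward pass: each unit (column) is normalized with the mean and variance of the current batch, then scaled by `gamma` and shifted by `beta`. The `epsilon=1e-3` value matches the default of `tf.layers.batch_normalization`; the function name `batch_norm_train` is hypothetical, and the real layer also learns `gamma`/`beta` and tracks population statistics, which this sketch omits.

```python
import math

def batch_norm_train(batch, gamma, beta, epsilon=1e-3):
    """Normalize a batch (list of rows) per feature using batch statistics.

    batch: list of N rows, each a list of D feature values.
    gamma, beta: lists of D scale and shift parameters.
    Returns the normalized batch plus the per-feature mean and variance.
    """
    n = len(batch)
    num_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(num_features)]
    # Biased (divide-by-N) variance of the batch for each feature.
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(num_features)]
    out = [[gamma[j] * (row[j] - means[j]) / math.sqrt(variances[j] + epsilon) + beta[j]
            for j in range(num_features)]
           for row in batch]
    return out, means, variances

def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]

# Pre-activation outputs of a 2-unit dense layer (no bias) for a batch of 4 samples.
pre_activation = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normalized, means, variances = batch_norm_train(pre_activation, gamma=[1.0, 1.0], beta=[0.0, 0.0])
activations = relu(normalized)  # normalization happens before the ReLU, as in fully_connected
print(means, variances)  # → [2.5, 25.0] [1.25, 125.0]
```

Because the statistics come from the other samples in the batch, the same input row can produce different outputs depending on what it is batched with; that is why inference has to switch to stored population statistics.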

**TODO:** Modify `conv_layer` to add batch normalization to the convolutional layers it creates. Feel free to change the function's parameters if it helps.

1. **Alternative solution that uses no bias before normalization and adds batch normalization before the ReLU activation function.** <font color=blue>conv2d -> bn -> relu</font>

In [6]:
# def conv_layer(prev_layer, layer_depth, is_training):
#     """
#     Create a convolutional layer with the given layer as input.
    
#     :param prev_layer: Tensor
#         The Tensor that acts as input into this layer
#     :param layer_depth: int
#         We'll set the strides and number of feature maps based on the layer's depth in the network.
#         This is *not* a good way to make a CNN, but it helps us create this example with very little code.
#     :param is_training: bool or Tensor
#         Indicates whether or not the network is currently training, which tells the batch normalization
#         layer whether or not it should update or use its population statistics.
#     :returns Tensor
#         A new convolutional layer
#     """
#     strides = 2 if layer_depth % 3 == 0 else 1
#     conv_layer = tf.layers.conv2d(prev_layer, layer_depth*4, 3, strides, 'same', use_bias=False, activation=None)
#     conv_layer = tf.layers.batch_normalization(conv_layer, training=is_training)
#     conv_layer = tf.nn.relu(conv_layer)

#     return conv_layer

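2. **Alternative solution that uses a bias in the convolutional layer but still adds batch normalization before the ReLU activation function:**
<font color=blue>conv2d(bias) -> bn -> relu</font>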

In [34]:
def conv_layer(prev_layer, layer_num, is_training):
    """
    Create a convolutional layer with the given layer as input.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_num: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_num % 3 == 0 else 1
    conv_layer = tf.layers.conv2d(prev_layer, layer_num*4, 3, strides, 'same', use_bias=True, activation=None)
    conv_layer = tf.layers.batch_normalization(conv_layer, training=is_training)
    conv_layer = tf.nn.relu(conv_layer)
    return conv_layer

3. **Alternative solution that uses a bias and applies the ReLU activation function before batch normalization:** <font color=blue>conv2d(bias) -> relu -> bn</font>

In [35]:
# def conv_layer(prev_layer, layer_num, is_training):
#     """
#     Create a convolutional layer with the given layer as input.
    
#     :param prev_layer: Tensor
#         The Tensor that acts as input into this layer
#     :param layer_num: int
#         We'll set the strides and number of feature maps based on the layer's depth in the network.
#         This is *not* a good way to make a CNN, but it helps us create this example with very little code.
#     :param is_training: bool or Tensor
#         Indicates whether or not the network is currently training, which tells the batch normalization
#         layer whether or not it should update or use its population statistics.
#     :returns Tensor
#         A new convolutional layer
#     """
#     strides = 2 if layer_num % 3 == 0 else 1
#     conv_layer = tf.layers.conv2d(prev_layer, layer_num*4, 3, strides, 'same', use_bias=True, activation=tf.nn.relu)
#     conv_layer = tf.layers.batch_normalization(conv_layer, training=is_training)
#     return conv_layer

4. **Alternative solution that applies the ReLU activation function before normalization, without a bias.** <font color=blue>conv2d -> relu -> bn</font>

In [36]:
# def conv_layer(prev_layer, layer_num, is_training):
#     """
#     Create a convolutional layer with the given layer as input.
    
#     :param prev_layer: Tensor
#         The Tensor that acts as input into this layer
#     :param layer_num: int
#         We'll set the strides and number of feature maps based on the layer's depth in the network.
#         This is *not* a good way to make a CNN, but it helps us create this example with very little code.
#     :param is_training: bool or Tensor
#         Indicates whether or not the network is currently training, which tells the batch normalization
#         layer whether or not it should update or use its population statistics.
#     :returns Tensor
#         A new convolutional layer
#     """
#     strides = 2 if layer_num % 3 == 0 else 1
#     conv_layer = tf.layers.conv2d(prev_layer, layer_num*4, 3, strides, 'same', use_bias=False, activation=tf.nn.relu)
#     conv_layer = tf.layers.batch_normalization(conv_layer, training=is_training)
#     return conv_layer

$$\color{red}{Summary}$$

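Batch normalization is still a new enough idea that researchers are still discovering how best to use it. In general, people seem to agree that you should remove the layer's bias (since batch normalization already has terms for scaling and shifting) and add batch normalization *before* the layer's nonlinear activation function. However, for some networks it will work well in other ways, too.

To demonstrate this point, the three commented-out versions of `conv_layer` above (alternatives 1, 3 and 4) show other ways to implement batch normalization; alternative 2 is the active one. If you try running with any of these versions of the function, they should all still work fine (although some versions may still work better than others).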
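
The argument for dropping the bias can be checked numerically: adding a constant per-feature bias `b` shifts that feature's batch mean by exactly `b`, so subtracting the mean cancels it and the normalized output is unchanged (the learned `beta` takes over the role of the bias). Here is a minimal pure-Python sketch for a single feature with `gamma=1` and `beta=0`; the helper `normalize_feature` and the sample values are hypothetical.

```python
import math

def normalize_feature(values, epsilon=1e-3):
    """Batch-normalize one feature (gamma=1, beta=0) using batch statistics."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + epsilon) for v in values]

conv_outputs = [0.3, -1.2, 2.5, 0.9]   # hypothetical pre-activation values of one feature map
bias = 4.0
without_bias = normalize_feature(conv_outputs)
with_bias = normalize_feature([v + bias for v in conv_outputs])

# The bias is removed by the mean subtraction (up to floating-point rounding).
print(max(abs(a - b) for a, b in zip(without_bias, with_bias)))
```

This only shows that the bias is redundant when normalization comes *directly* after the biased layer (alternatives 1 and 2). When a ReLU sits in between (alternatives 3 and 4), the bias changes which values the ReLU clips, so it is no longer exactly cancelled.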

***

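**TODO:** Edit the `train` function to support batch normalization. You'll need to make sure the network knows whether or not it is training, and you'll need to make sure it updates and uses its population statistics correctly.

To modify `train`, we did the following:
1. Added `is_training`, a placeholder that stores a Boolean value indicating whether or not the network is training.
2. Passed `is_training` to the `conv_layer` and `fully_connected` functions.
3. Each time we call `run` on the session, we add the appropriate value for `is_training` to the `feed_dict`.
4. Moved the creation of `train_opt` inside a `with tf.control_dependencies...` statement. This is necessary to get the normalization layers created with `tf.layers.batch_normalization` to update their population statistics, which we need when performing inference.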
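
The update ops collected in `tf.GraphKeys.UPDATE_OPS` maintain the population statistics as an exponential moving average, `moving = moving * momentum + batch * (1 - momentum)`, with `momentum=0.99` by default in `tf.layers.batch_normalization` (the same form as the `decay = 0.99` update in the `tf.nn` version later in this notebook). The pure-Python sketch below assumes, for illustration only, that every batch has the same mean of 5.0, and shows how gradually the estimate moves away from its initial value of 0. If the update ops never run, it simply stays at 0 and inference normalizes with the wrong statistics.

```python
def update_population_mean(pop_mean, batch_mean, momentum=0.99):
    """One exponential-moving-average step, applied once per training batch."""
    return pop_mean * momentum + batch_mean * (1 - momentum)

pop_mean = 0.0      # the moving mean starts at zero
true_mean = 5.0     # assumption: every batch has this mean
for step in range(1, 801):  # 800 training batches, as in this notebook
    pop_mean = update_population_mean(pop_mean, true_mean)
    if step in (100, 300, 800):
        print(f"after {step:3d} batches: pop_mean = {pop_mean:.4f}")
```

With a momentum of 0.99, the average effectively spans roughly the last 1 / (1 - 0.99) = 100 batches, so it follows recent batches while smoothing out batch-to-batch noise.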

In [37]:
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels 
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Add placeholder to indicate whether or not we're training the model
    is_training = tf.placeholder(tf.bool) 

    # Feed the inputs into a series of 19 convolutional layers (range(1, 20))
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers 
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each of the 10 digit classes
    logits = tf.layers.dense(layer, 10)
    
    # Define loss and training operations
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    
    # Tell TensorFlow to update the population statistics while training
    # Wrapper for `Graph.control_dependencies()` using the default graph.
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): 
        train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
    
    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})
            
            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels, 
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))
        
        # Score the first 100 test images individually. This won't work if batch normalization isn't implemented correctly.
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
                                                    labels: [mnist.test.labels[i]],
                                                    is_training: False})

        print("Accuracy on 100 samples:", correct/100)


num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)

Batch:  0: Validation loss: 0.69115, Validation accuracy: 0.11260
Batch: 25: Training loss: 0.58017, Training accuracy: 0.14062
Batch: 50: Training loss: 0.46078, Training accuracy: 0.14062
Batch: 75: Training loss: 0.39983, Training accuracy: 0.06250
Batch: 100: Validation loss: 0.36252, Validation accuracy: 0.09760
Batch: 125: Training loss: 0.34913, Training accuracy: 0.12500
Batch: 150: Training loss: 0.33554, Training accuracy: 0.09375
Batch: 175: Training loss: 0.32542, Training accuracy: 0.23438
Batch: 200: Validation loss: 0.28640, Validation accuracy: 0.31160
Batch: 225: Training loss: 0.23886, Training accuracy: 0.45312
Batch: 250: Training loss: 0.24569, Training accuracy: 0.50000
Batch: 275: Training loss: 0.16317, Training accuracy: 0.64062
Batch: 300: Validation loss: 0.05708, Validation accuracy: 0.90580
Batch: 325: Training loss: 0.04002, Training accuracy: 0.92188
Batch: 350: Training loss: 0.09054, Training accuracy: 0.85938
Batch: 375: Training loss: 0.06187, Trainin

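With batch normalization, you should now get an accuracy over 90%. Notice also the last line of the output: `Accuracy on 100 samples`. If this value is low while everything else looks good, that means you did not implement batch normalization correctly. Specifically, it means you either **did not calculate the population mean and variance while training**, or **are not using those values during inference**.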
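
To see why the single-sample check catches mistakes, consider what happens if inference wrongly uses *batch* statistics on a batch of one image: the batch mean equals the sample itself and the batch variance is 0, so every normalized value becomes 0 and the layer outputs `beta` regardless of the input. Here is a pure-Python sketch for one feature; the helpers `bn_apply` and `bn_with_batch_stats` and the population statistics `pop_mean=2.0`, `pop_variance=4.0` are hypothetical.

```python
import math

EPSILON = 1e-3

def bn_apply(x, mean, variance, gamma=1.0, beta=0.0, epsilon=EPSILON):
    """Normalize a single value with the given statistics, then scale and shift."""
    return gamma * (x - mean) / math.sqrt(variance + epsilon) + beta

def bn_with_batch_stats(batch, gamma=1.0, beta=0.0):
    """Wrong at inference time: normalizes with statistics of the batch itself."""
    mean = sum(batch) / len(batch)
    variance = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [bn_apply(v, mean, variance, gamma, beta) for v in batch]

# Hypothetical population statistics gathered during training.
pop_mean, pop_variance = 2.0, 4.0

for sample in (0.5, 2.0, 7.0):
    wrong = bn_with_batch_stats([sample])[0]           # batch of one: always beta (0.0 here)
    right = bn_apply(sample, pop_mean, pop_variance)   # depends on the input
    print(f"input {sample}: batch stats -> {wrong:.3f}, population stats -> {right:.3f}")
```

When every normalized layer emits a constant, the network predicts the same class for every image, so the single-sample accuracy collapses even though the large validation and test batches, which have stable batch statistics, can still score well.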

# Batch Normalization using `tf.nn.batch_normalization`<a id="example_2"></a>

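Most of the time you will be able to stick with higher-level functions, but sometimes you may want to work at a lower level. For example, if you ever want to implement a new feature (something new enough that TensorFlow does not already include a high-level implementation of it, like batch normalization in an LSTM), then you may need to know these sorts of things.

This version of the network uses `tf.nn` for almost everything, and expects you to use [`tf.nn.batch_normalization`](https://www.tensorflow.org/api_docs/python/tf/nn/batch_normalization).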

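**Optional TODO:** You can run the next three cells before you edit them just to see how the network performs without batch normalization. However, the results should be pretty much the same as you saw with the previous example before you added batch normalization.

**TODO:** Modify `fully_connected` to add batch normalization to the fully connected layers it creates. Feel free to change the function's parameters if it helps.

**Note:** For convenience, we continue to use `tf.layers.dense` for the `fully_connected` layer. By this point in the class, you should have no problem replacing it with matrix operations between `prev_layer` and explicit weights and biases variables.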
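
As a rough illustration of that replacement, a dense layer without a bias is just a matrix product between the inputs and a weight matrix. The pure-Python sketch below uses nested lists in place of tensors, and the helper `dense_no_bias` is hypothetical; in the TensorFlow graph the equivalent would be a `tf.matmul` against a `tf.Variable` weight matrix. The bias is left out because batch normalization's `beta` takes over that role.

```python
import random

def dense_no_bias(inputs, weights):
    """Compute inputs @ weights with plain lists (no bias term).

    inputs:  N x D list of rows (a batch of N samples with D features).
    weights: D x U list of rows (D inputs, U units).
    Returns an N x U list of pre-activation values.
    """
    num_units = len(weights[0])
    return [[sum(x * weights[d][u] for d, x in enumerate(row)) for u in range(num_units)]
            for row in inputs]

# Hypothetical initialization: small Gaussian values (stddev 0.05),
# similar in spirit to, but not the same as, tf.truncated_normal.
random.seed(0)
num_inputs, num_units = 3, 2
weights = [[random.gauss(0.0, 0.05) for _ in range(num_units)] for _ in range(num_inputs)]

batch = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0]]
pre_activation = dense_no_bias(batch, weights)  # 2 x 2, ready for batch normalization
print(pre_activation)
```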
***
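This implementation of `fully_connected` is much more involved than the one that uses `tf.layers`. However, if you went through the `Batch_Normalization_Lesson` notebook, things should look pretty familiar. To add batch normalization, we did the following:

1. Added the `is_training` parameter to the function signature so we can pass that information to the batch normalization layer.
2. Removed the bias and activation function from the `dense` layer.
3. Added `gamma`, `beta`, `pop_mean`, and `pop_variance` variables.
4. Used `tf.cond` to handle training and inference differently.
5. When training, we use `tf.nn.moments` to calculate the batch mean and variance. Then we update the population statistics and use `tf.nn.batch_normalization` to normalize the layer's output using the batch statistics. Notice the `with tf.control_dependencies...` statement: it is required to force TensorFlow to run the operations that update the population statistics.
6. During inference (i.e. when not training), we use `tf.nn.batch_normalization` to normalize the layer's output using the population statistics we calculated during training.
7. Passed the normalized values into a ReLU activation function.

If any of this code is unclear, it is almost identical to what we showed in the `fully_connected` function in the `Batch_Normalization_Lesson` notebook. Please see that notebook for extensive comments.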
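
The steps above can be mirrored in a pure-Python sketch for a single feature, which may help when reading the `tf.cond` version: the training branch normalizes with batch statistics *and* updates the population statistics, while the inference branch only reads them. The class name `BatchNorm1D` is hypothetical; `decay=0.99` and `epsilon=1e-3` match the values in the cell below, and learning `gamma`/`beta` by gradient descent is omitted.

```python
import math

class BatchNorm1D:
    """Batch normalization plus ReLU for one feature, mirroring the tf.nn version."""

    def __init__(self, decay=0.99, epsilon=1e-3):
        self.gamma, self.beta = 1.0, 0.0              # trainable in the real layer
        self.pop_mean, self.pop_variance = 0.0, 1.0   # trainable=False in the real layer
        self.decay, self.epsilon = decay, epsilon

    def _normalize(self, x, mean, variance):
        return self.gamma * (x - mean) / math.sqrt(variance + self.epsilon) + self.beta

    def __call__(self, batch, is_training):
        if is_training:
            # Training branch: batch statistics, plus population-statistic updates.
            mean = sum(batch) / len(batch)
            variance = sum((v - mean) ** 2 for v in batch) / len(batch)
            self.pop_mean = self.pop_mean * self.decay + mean * (1 - self.decay)
            self.pop_variance = self.pop_variance * self.decay + variance * (1 - self.decay)
        else:
            # Inference branch: only the stored population statistics are used.
            mean, variance = self.pop_mean, self.pop_variance
        # ReLU after normalization (step 7).
        return [max(0.0, self._normalize(v, mean, variance)) for v in batch]

norm_layer = BatchNorm1D()
for _ in range(500):
    norm_layer([1.0, 3.0, 5.0, 7.0], is_training=True)   # batch mean 4.0, variance 5.0
print(round(norm_layer.pop_mean, 3), round(norm_layer.pop_variance, 3))
print(norm_layer([4.0], is_training=False))  # a single sample is fine at inference
```

Calling the layer with `is_training=False` leaves the population statistics untouched, which corresponds to `tf.cond` selecting `batch_norm_inference` in the graph.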

In [40]:
def fully_connected(prev_layer, num_units, is_training):
    """
    Create a fully connected layer with the given layer as input and the given number of neurons.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param num_units: int
        The size of the layer. That is, the number of units, nodes, or neurons.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new fully connected layer
    """

    layer = tf.layers.dense(prev_layer, num_units, use_bias=False, activation=None)

    gamma = tf.Variable(tf.ones([num_units]))
    beta = tf.Variable(tf.zeros([num_units]))

    pop_mean = tf.Variable(tf.zeros([num_units]), trainable=False)
    pop_variance = tf.Variable(tf.ones([num_units]), trainable=False)

    epsilon = 1e-3
    
    def batch_norm_training():
        batch_mean, batch_variance = tf.nn.moments(layer, [0])

        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)
 
    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)

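**TODO:** Modify `conv_layer` to add batch normalization to the convolutional layers it creates. Feel free to change the function's parameters if it helps.

**Note:** Unlike in the previous example that used `tf.layers`, adding batch normalization to these convolutional layers *does* require some slight differences from what you did in `fully_connected`.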
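
The main difference is which axes the statistics are taken over. For an NHWC tensor, `tf.nn.moments(layer, [0, 1, 2])` averages over the batch, height and width axes, giving one mean and variance per feature map (channel), whereas `[0]` (as in `fully_connected`) would give a separate statistic for every spatial position. Here is a pure-Python sketch of the per-channel version, using nested lists in place of a tensor; the helper `channel_moments` and the toy batch are hypothetical.

```python
def channel_moments(tensor):
    """Per-channel mean and (biased) variance of an NHWC nested list.

    tensor[n][h][w][c]: batch, height, width, channel.
    Averages over axes 0, 1 and 2, like tf.nn.moments(x, [0, 1, 2]).
    """
    values_per_channel = {}
    for image in tensor:
        for row in image:
            for pixel in row:
                for c, v in enumerate(pixel):
                    values_per_channel.setdefault(c, []).append(v)
    means, variances = [], []
    for c in sorted(values_per_channel):
        vals = values_per_channel[c]
        mean = sum(vals) / len(vals)
        means.append(mean)
        variances.append(sum((v - mean) ** 2 for v in vals) / len(vals))
    return means, variances

# Batch of 2 images, each 2x2 pixels with 3 channels.
x = [
    [[[1, 10, 0], [2, 20, 0]], [[3, 30, 0], [4, 40, 0]]],
    [[[5, 50, 0], [6, 60, 0]], [[7, 70, 0], [8, 80, 0]]],
]
means, variances = channel_moments(x)
print(means)      # → [4.5, 45.0, 0.0]
print(variances)  # → [5.25, 525.0, 0.0]
```

Normalizing per channel matches the way a convolution shares one set of weights across every position of a feature map, which is why `gamma`, `beta`, `pop_mean` and `pop_variance` in the cell below all have shape `[out_channels]`.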

In [41]:
def conv_layer(prev_layer, layer_depth, is_training):
    """
    Create a convolutional layer with the given layer as input.
    
    :param prev_layer: Tensor
        The Tensor that acts as input into this layer
    :param layer_depth: int
        We'll set the strides and number of feature maps based on the layer's depth in the network.
        This is *not* a good way to make a CNN, but it helps us create this example with very little code.
    :param is_training: bool or Tensor
        Indicates whether or not the network is currently training, which tells the batch normalization
        layer whether or not it should update or use its population statistics.
    :returns Tensor
        A new convolutional layer
    """
    strides = 2 if layer_depth % 3 == 0 else 1
    
    in_channels = prev_layer.get_shape().as_list()[3]
    out_channels = layer_depth*4
    
    weights = tf.Variable(
        tf.truncated_normal([3, 3, in_channels, out_channels], stddev=0.05))
    
    layer = tf.nn.conv2d(prev_layer, weights, strides=[1,strides, strides, 1], padding='SAME')

    gamma = tf.Variable(tf.ones([out_channels]))
    beta = tf.Variable(tf.zeros([out_channels]))

    pop_mean = tf.Variable(tf.zeros([out_channels]), trainable=False)
    pop_variance = tf.Variable(tf.ones([out_channels]), trainable=False)

    epsilon = 1e-3
    
    def batch_norm_training():
        # Important to use the correct dimensions here to ensure the mean and variance are calculated 
        # per feature map instead of for the entire layer
        batch_mean, batch_variance = tf.nn.moments(layer, [0,1,2], keep_dims=False)

        decay = 0.99
        train_mean = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
        train_variance = tf.assign(pop_variance, pop_variance * decay + batch_variance * (1 - decay))

        with tf.control_dependencies([train_mean, train_variance]):
            return tf.nn.batch_normalization(layer, batch_mean, batch_variance, beta, gamma, epsilon)
 
    def batch_norm_inference():
        return tf.nn.batch_normalization(layer, pop_mean, pop_variance, beta, gamma, epsilon)

    batch_normalized_output = tf.cond(is_training, batch_norm_training, batch_norm_inference)
    return tf.nn.relu(batch_normalized_output)

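**TODO:** Edit the `train` function to support batch normalization. You'll need to make sure the network knows whether or not it is training. Unlike the `tf.layers` version, `train_opt` does not need to be wrapped in `tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS))` here: the operations that update the population statistics are already attached, through `tf.control_dependencies`, to the training branch of the `tf.cond` inside `fully_connected` and `conv_layer`, so they run whenever `train_opt` runs with `is_training: True`.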

In [44]:
def train(num_batches, batch_size, learning_rate):
    # Build placeholders for the input samples and labels 
    inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])
    labels = tf.placeholder(tf.float32, [None, 10])

    # Add placeholder to indicate whether or not we're training the model
    is_training = tf.placeholder(tf.bool)

    # Feed the inputs into a series of 19 convolutional layers (range(1, 20))
    layer = inputs
    for layer_i in range(1, 20):
        layer = conv_layer(layer, layer_i, is_training)

    # Flatten the output from the convolutional layers 
    orig_shape = layer.get_shape().as_list()
    layer = tf.reshape(layer, shape=[-1, orig_shape[1] * orig_shape[2] * orig_shape[3]])

    # Add one fully connected layer
    layer = fully_connected(layer, 100, is_training)

    # Create the output layer with 1 node for each of the 10 digit classes
    logits = tf.layers.dense(layer, 10)
    
    # Define loss and training operations
    model_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=logits, labels=labels))
    train_opt = tf.train.AdamOptimizer(learning_rate).minimize(model_loss)
    
    # Create operations to test accuracy
    correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    
    # Train and test the network
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for batch_i in range(num_batches):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

            # train this batch
            sess.run(train_opt, {inputs: batch_xs, labels: batch_ys, is_training: True})
            
            # Periodically check the validation or training loss and accuracy
            if batch_i % 100 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: mnist.validation.images,
                                                              labels: mnist.validation.labels,
                                                              is_training: False})
                print('Batch: {:>2}: Validation loss: {:>3.5f}, Validation accuracy: {:>3.5f}'.format(batch_i, loss, acc))
            elif batch_i % 25 == 0:
                loss, acc = sess.run([model_loss, accuracy], {inputs: batch_xs, labels: batch_ys, is_training: False})
                print('Batch: {:>2}: Training loss: {:>3.5f}, Training accuracy: {:>3.5f}'.format(batch_i, loss, acc))

        # At the end, score the final accuracy for both the validation and test sets
        acc = sess.run(accuracy, {inputs: mnist.validation.images,
                                  labels: mnist.validation.labels, 
                                  is_training: False})
        print('Final validation accuracy: {:>3.5f}'.format(acc))
        acc = sess.run(accuracy, {inputs: mnist.test.images,
                                  labels: mnist.test.labels,
                                  is_training: False})
        print('Final test accuracy: {:>3.5f}'.format(acc))
        
        # Score the first 100 test images individually, just to make sure batch normalization really worked
        correct = 0
        for i in range(100):
            correct += sess.run(accuracy,feed_dict={inputs: [mnist.test.images[i]],
                                                    labels: [mnist.test.labels[i]],
                                                    is_training: False})

        print("Accuracy on 100 samples:", correct/100)


num_batches = 800
batch_size = 64
learning_rate = 0.002

tf.reset_default_graph()
with tf.Graph().as_default():
    train(num_batches, batch_size, learning_rate)

Batch:  0: Validation loss: 0.69088, Validation accuracy: 0.09900
Batch: 25: Training loss: 0.57144, Training accuracy: 0.15625
Batch: 50: Training loss: 0.46165, Training accuracy: 0.10938
Batch: 75: Training loss: 0.39725, Training accuracy: 0.06250
Batch: 100: Validation loss: 0.36217, Validation accuracy: 0.09240
Batch: 125: Training loss: 0.34594, Training accuracy: 0.10938
Batch: 150: Training loss: 0.34605, Training accuracy: 0.12500
Batch: 175: Training loss: 0.36240, Training accuracy: 0.06250
Batch: 200: Validation loss: 0.36065, Validation accuracy: 0.09240
Batch: 225: Training loss: 0.37682, Training accuracy: 0.14062
Batch: 250: Training loss: 0.38396, Training accuracy: 0.10938
Batch: 275: Training loss: 0.41749, Training accuracy: 0.04688
Batch: 300: Validation loss: 0.39758, Validation accuracy: 0.14540
Batch: 325: Training loss: 0.43828, Training accuracy: 0.15625
Batch: 350: Training loss: 0.55839, Training accuracy: 0.10938
Batch: 375: Training loss: 0.50879, Trainin

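Once again, the model with batch normalization should reach an accuracy over 90%. There are plenty of details that can go wrong when implementing at this low level, so if you got it working, great job! If not, don't worry; just look at the `Batch_Normalization_Solutions` notebook to see what went wrong.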
