# `Scopes` in TensorFlow-Slim
*by Marvin Bertin*
<img src="../images/tensorflow.png" width="400">

## Scopes

Native TensorFlow already provides two scoping mechanisms, `name_scope` and `variable_scope`, that help organize and simplify your code.

TF-Slim adds a new scope called `arg_scope`, which lets you specify default arguments for specific operations within that scope. This is especially useful for very deep neural networks, where many layers share the same arguments.

## Deep CNNs
<img src="../images/VGG.jpg" width="800">

<img src="../images/inception.png" width="1000">

**Other very deep neural network architectures**
- [Residual Networks](https://arxiv.org/abs/1512.03385)
- [Dense Networks](https://arxiv.org/abs/1608.06993)

The following methods will be covered in this notebook:
```
slim.arg_scope
slim.l2_regularizer
slim.flatten
slim.utils.convert_collection_to_dict
slim.batch_norm
```

## Import TensorFlow

```python
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
slim = tf.contrib.slim  # TF 1.x only: tf.contrib was removed in TF 2.x
```

## Import Helper Functions

```python
import sys
sys.path.append("../")  # make the notebook's local utils package importable

from utils.pretty_printer import inspect_variables, inspect_layers
```

## Argument Scope

    => slim.arg_scope(*args, **kwds)
    Docstring:
    Stores the default arguments for the given set of list_ops.

    Args:
      list_ops_or_scope: List or tuple of operations to set argument scope for or
        a dictionary containing the current scope.
      **kwargs: keyword=value that will define the defaults for each op in
                list_ops. All the ops need to accept the given set of arguments.

    Yields:
      the current_scope, which is a dictionary of {op: {arg: value}}

## Define Slim model
Using argument scoping, an 11-layer neural net can be made cleaner, simpler, and easier to maintain. Argument values specified in a scope can be overridden in two ways:
- locally: pass the argument directly in the function call.
- nested scopes: define multiple `arg_scope`s within each other; the inner scope takes precedence.
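
The precedence rules can be checked with plain dictionaries. This sketch assumes, as the two rules above imply, that an inner scope overrides an outer one and that a local argument overrides both. The values mirror `slim_model`, but the string `"relu"` and the keys `biases_init` and `l2_scale` are hypothetical stand-ins for the real initializer and regularizer objects.

```python
# Defaults from the outer and inner scopes of slim_model (simplified stand-ins).
outer = {
    "conv2d": {"activation_fn": "relu", "biases_init": 1.0, "l2_scale": 0.05},
    "fully_connected": {"activation_fn": "relu", "biases_init": 1.0, "l2_scale": 0.05},
}
inner = {"conv2d": {"kernel_size": [3, 3], "padding": "SAME", "biases_init": 0.0}}


def effective_args(op, scopes, **local):
    """Merge scope defaults outermost-first, then apply local arguments last."""
    merged = {}
    for scope in scopes:
        merged.update(scope.get(op, {}))
    merged.update(local)
    return merged


print(effective_args("conv2d", [outer, inner]))           # biases_init 0.0 from inner scope
print(effective_args("fully_connected", [outer, inner]))  # biases_init 1.0 from outer scope
print(effective_args("fully_connected", [outer, inner], activation_fn=None))  # 'linear' layer
```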


```python
def slim_model(inputs):

    # default parameters for conv2d and fully_connected layers
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.truncated_normal_initializer(stddev=0.1),
                        biases_initializer=tf.constant_initializer(1.0),
                        weights_regularizer=slim.l2_regularizer(0.05)):
        # biases_regularizer is left as None: regularizing biases is less
        # common/useful, assuming proper data preprocessing/mean subtraction

        # default parameters for conv2d (overrides the outer scope's biases_initializer)
        with slim.arg_scope([slim.conv2d],
                            kernel_size=[3, 3],
                            padding='SAME',
                            biases_initializer=tf.constant_initializer(0.0)):
            # zero bias initialization is more common;
            # symmetry breaking comes from the random initialization of the weights

            # default parameters for max_pool2d (stride keeps its default of 2)
            with slim.arg_scope([slim.max_pool2d],
                                kernel_size=[2, 2],
                                padding='SAME'):
                # conv1: two stacked conv layers (conv1/conv1_1, conv1/conv1_2)
                net = slim.repeat(inputs, 2, slim.conv2d, 64, scope='conv1')
                net = slim.max_pool2d(net, scope='pool1')
                # conv2
                net = slim.repeat(net, 2, slim.conv2d, 128, scope='conv2')
                net = slim.max_pool2d(net, scope='pool2')
                # reshape the 4D tensor to a 2D [batch, features] matrix
                net = slim.flatten(net)
                # fc3 (slim.dropout's second argument is keep_prob;
                # its is_training argument defaults to True)
                net = slim.fully_connected(net, 1024, scope='fc3')
                net = slim.dropout(net, 0.5, scope='dropout3')
                # fc4
                net = slim.fully_connected(net, 256, scope='fc4')
                net = slim.dropout(net, 0.5, scope='dropout4')
                # linear prediction: override the activation_fn set in the scope
                net = slim.fully_connected(net, 10, activation_fn=None, scope='linear')
                # softmax4: categorical probability distribution over the 10 classes
                outputs = slim.softmax(net, scope='softmax4')
                return outputs
```
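
The `weights_regularizer=slim.l2_regularizer(0.05)` default attaches an L2 penalty to every conv and fully connected weight tensor. In TF 1.x this regularizer computes `scale * tf.nn.l2_loss(weights)`, where `tf.nn.l2_loss(t)` is `sum(t ** 2) / 2`. A NumPy sketch of that arithmetic, using a hypothetical kernel filled with 0.1 in the shape of `conv1_1`:

```python
import numpy as np


def l2_penalty(weights, scale=0.05):
    """NumPy version of scale * tf.nn.l2_loss(weights) = scale * sum(w**2) / 2."""
    return scale * np.sum(np.square(weights)) / 2.0


# Hypothetical 3x3x3x64 kernel (1728 entries), all equal to 0.1.
w = np.full((3, 3, 3, 64), 0.1)
print(l2_penalty(w))  # 0.05 * (1728 * 0.1**2) / 2, roughly 0.432
```

The penalties are only collected by the layers; they still have to be added to the training loss (for example via `tf.losses.get_regularization_losses()` in TF 1.x). Biases get no penalty here because `biases_regularizer` is left unset.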

## Add model to Graph

```python
g = tf.Graph()
with g.as_default():

    # 4D Tensor placeholder for input images: [batch, height, width, channels]
    inputs = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name="images")

    # pass inputs as `values=`; the second positional argument is default_name
    with tf.variable_scope("TF-Slim", values=[inputs]):
        # add model to graph
        outputs = slim_model(inputs)
```


## Inspect Variables

```python
with g.as_default():
    print("Parameters:")
    inspect_variables(slim.get_variables(scope="TF-Slim"))
    print("Inputs/Outputs:")
    inspect_variables([inputs, outputs])
```

```
Parameters:
name = TF-Slim/conv1/conv1_1/weights:0                         shape = (3, 3, 3, 64)
name = TF-Slim/conv1/conv1_1/biases:0                          shape = (64,)
name = TF-Slim/conv1/conv1_2/weights:0                         shape = (3, 3, 64, 64)
name = TF-Slim/conv1/conv1_2/biases:0                          shape = (64,)
name = TF-Slim/conv2/conv2_1/weights:0                         shape = (3, 3, 64, 128)
name = TF-Slim/conv2/conv2_1/biases:0                          shape = (128,)
name = TF-Slim/conv2/conv2_2/weights:0                         shape = (3, 3, 128, 128)
name = TF-Slim/conv2/conv2_2/biases:0                          shape = (128,)
name = TF-Slim/fc3/weights:0                                   shape = (8192, 1024)
name = TF-Slim/fc3/biases:0                                    shape = (1024,)
name = TF-Slim/fc4/weights:0                                   shape = (1024, 256)
name = TF-Slim/fc4/biases:0                                    shape = (256,)
name = T
```
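
The listed shapes also give the parameter count of each layer. A small sketch that multiplies out the weight and bias shapes shown above (the `linear` layer is left out because the output is truncated):

```python
from math import prod

# Shapes copied from the listing above; the linear layer is omitted (output truncated).
shapes = {
    "conv1/conv1_1": [(3, 3, 3, 64), (64,)],
    "conv1/conv1_2": [(3, 3, 64, 64), (64,)],
    "conv2/conv2_1": [(3, 3, 64, 128), (128,)],
    "conv2/conv2_2": [(3, 3, 128, 128), (128,)],
    "fc3": [(8192, 1024), (1024,)],
    "fc4": [(1024, 256), (256,)],
}

# weights + biases per layer
counts = {layer: sum(prod(s) for s in params) for layer, params in shapes.items()}
for layer, n in counts.items():
    print(f"{layer:15s} {n:>10,d}")
print("total", sum(counts.values()))
```

The first fully connected layer `fc3` accounts for over 90% of these parameters, which is typical when a large flattened feature map feeds a dense layer.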

## Examine Layer Structure
As neural networks get deeper and more complicated, it is important to keep track of the shape of your data as it passes through each layer operation in the graph. Argument scoping and other slim functions can help us keep track of these layer transformations.
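
As a check on what the shapes should be, here is a pure-Python sketch that traces one 32x32x3 image through `slim_model`'s spatial layers. It assumes slim's default strides (1 for `conv2d`, 2 for `max_pool2d`) together with the `'SAME'` padding set in the arg scopes, so each output size is `ceil(size / stride)`; the batch dimension is omitted.

```python
import math


def same_out(size, stride):
    """Spatial output size for 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def trace_shapes(height=32, width=32, channels=3):
    # (name, stride, output channels or None to keep them) for slim_model's spatial layers
    layers = [
        ("conv1/conv1_1", 1, 64), ("conv1/conv1_2", 1, 64), ("pool1", 2, None),
        ("conv2/conv2_1", 1, 128), ("conv2/conv2_2", 1, 128), ("pool2", 2, None),
    ]
    shapes = []
    for name, stride, out_channels in layers:
        height, width = same_out(height, stride), same_out(width, stride)
        channels = out_channels or channels
        shapes.append((name, (height, width, channels)))
    shapes.append(("flatten", (height * width * channels,)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

The flattened size, 8 * 8 * 128 = 8192, is exactly the first dimension of the `fc3` weights in the parameter listing.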

---
    => slim.utils.convert_collection_to_dict(collection)
    Docstring:
    Returns an OrderedDict of Tensors using get_tensor_alias as key.

    Args:
      collection: A collection.

    Returns:
      An OrderedDict of {get_tensor_alias(tensor): tensor}
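
A pure-Python sketch of what this conversion amounts to: the layers append their outputs to a named collection, and the collection is turned into an ordered mapping from alias to tensor. `FakeTensor` and `convert_collection_to_dict_sketch` are hypothetical stand-ins, and the alias format (the layer's scope name, such as `TF-Slim/pool1`) is an assumption about how slim tags layer outputs.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-in for a tf.Tensor that a layer has tagged with an alias.
FakeTensor = namedtuple("FakeTensor", ["alias", "shape"])


def convert_collection_to_dict_sketch(collection):
    """Key each collected tensor by its alias, preserving the order layers were added."""
    return OrderedDict((t.alias, t) for t in collection)


# What an end-points collection might hold after the first few layers of slim_model.
end_points_collection = [
    FakeTensor("TF-Slim/conv1/conv1_1", (None, 32, 32, 64)),
    FakeTensor("TF-Slim/conv1/conv1_2", (None, 32, 32, 64)),
    FakeTensor("TF-Slim/pool1", (None, 16, 16, 64)),
]
end_points = convert_collection_to_dict_sketch(end_points_collection)
print(list(end_points))
print(end_points["TF-Slim/pool1"].shape)
```

Because the result is ordered, iterating over it walks the network from input to output, which is what makes it convenient for printing a layer-by-layer summary.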
      
## Argument Scoping for Collecting Layer Endpoints

```python
g = tf.Graph()
with g.as_default():

    inputs = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name="images")

    # pass inputs as `values=`; the second positional argument is default_name
    with tf.variable_scope('TF-Slim', values=[inputs]) as vs:
        end_points_collection = vs.original_name_scope + '_end_points'

        # Collect outputs/endpoints for conv2d, fully_connected and max_pool2d.
        with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                            outputs_collections=end_points_collection):
            # add model to graph
            outputs = slim_model(inputs)

            # get the layer endpoints
            end_points = slim.utils.convert_collection_to_dict(end_points_collection)
```

## Inspect Parameters and Layers

```python
with g.as_default():
    print("Parameters:")
    inspect_variables(slim.get_variables(scope="TF-Slim"))
    print("Layers:")
    inspect_layers(end_points)
```

```
Parameters:
name = TF-Slim/conv1/conv1_1/weights:0                         shape = (3, 3, 3, 64)
name = TF-Slim/conv1/conv1_1/biases:0                          shape = (64,)
name = TF-Slim/conv1/conv1_2/weights:0                         shape = (3, 3, 64, 64)
name = TF-Slim/conv1/conv1_2/biases:0                          shape = (64,)
name = TF-Slim/conv2/conv2_1/weights:0                         shape = (3, 3, 64, 128)
name = TF-Slim/conv2/conv2_1/biases:0                          shape = (128,)
name = TF-Slim/conv2/conv2_2/weights:0                         shape = (3, 3, 128, 128)
name = TF-Slim/conv2/conv2_2/biases:0                          shape = (128,)
name = TF-Slim/fc3/weights:0                                   shape = (8192, 1024)
name = TF-Slim/fc3/biases:0                                    shape = (1024,)
name = TF-Slim/fc4/weights:0                                   shape = (1024, 256)
name = TF-Slim/fc4/biases:0                                    shape = (256,)
name = T
```

## Batch Normalization

**Problems with training deeper neural networks**
- *Internal Covariate Shift*
    - the distribution of inputs to internal layers changes during training, as the parameters of earlier layers change.
    - the shift is amplified as it propagates down the network.
- *Vanishing Gradient*
    - nonlinearities like tanh or sigmoid tend to get stuck in their saturation regions as the network grows deeper.
    - common remedies: ReLU (which does not saturate for positive inputs), smaller learning rates, and careful initialization.
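
The saturation problem can be seen numerically. The sigmoid's derivative `s(x) * (1 - s(x))` peaks at 0.25 and collapses once a unit saturates, and backpropagation multiplies one such factor per layer. This sketch ignores the weight terms in the chain rule, so `0.25 ** 10` is only an upper bound on the sigmoid factors for a 10-layer chain.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative is largest at x = 0 and shrinks quickly in the saturation region.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:4.1f}  sigmoid'(x) = {sigmoid_grad(x):.6f}")

# Upper bound on the product of sigmoid factors through 10 layers (weights ignored).
print(0.25 ** 10)
```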
    
**What is Batch Normalization?**

Batch Normalization is a widely used method that addresses several of the issues above related to training deep neural networks. By making normalization part of the neural network architecture itself, it has led to significant performance improvements in state-of-the-art models.

<br>
<div style="float:left;margin-right:5px;">
    <img src="../images/BN3.png" width="400" />
    <p style="text-align:center;">*Batch Normalization Implementation*</p>
</div>
<div style="float:center;margin-right:5px;">
    <img src="../images/BN1.png" width="400" />
    <p style="text-align:center;">*Validation accuracy of Inception and its batch-normalized variants vs. number of training steps*</p>
</div>
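
The batch-normalizing transform from the original paper (mini-batch mean, mini-batch variance, normalize, then scale by `gamma` and shift by `beta`) fits in a few lines of NumPy. This is a training-mode sketch for 2D inputs `[batch, features]`; for conv feature maps the statistics would be taken over the batch and both spatial axes. The `eps=1e-3` value mirrors the default `epsilon` of `slim.batch_norm`.

```python
import numpy as np


def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Batch-normalizing transform over the batch axis (training mode)."""
    mu = x.mean(axis=0)                    # mini-batch mean, per feature
    var = x.var(axis=0)                    # mini-batch (biased) variance, per feature
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta            # scale and shift (learned parameters)


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(128, 4))  # batch of 128 samples, 4 features
y = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(6), y.std(axis=0).round(3))  # per-feature mean ~0, std ~1
```

With `gamma = 1` and `beta = 0` each feature comes out with roughly zero mean and unit variance; learning `gamma` and `beta` lets the network undo the normalization where that helps.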


**Advantages of Batch Normalization**
- Faster convergence, since higher learning rates can be used.
- More stable input distributions to internal layers (reduced covariate shift).
- Training gradients are less sensitive to the scale of inputs/parameters and to initialization.
- Regularizes the model and reduces the need for dropout and L2/L1 regularization.
- Makes it possible to train with saturating nonlinearities (mitigates vanishing gradients).
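
The reduced sensitivity to parameter scale can be illustrated numerically: as the original paper notes, normalizing `W u` gives the same result as normalizing `(a W) u` for a scalar `a`. In this NumPy sketch (random data, `a = 10`, learned scale and shift omitted) the only thing that breaks exact invariance is the small `eps` added to the variance.

```python
import numpy as np


def normalize(z, eps=1e-3):
    """Per-feature batch normalization without the learned scale/shift."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)


rng = np.random.default_rng(1)
u = rng.normal(size=(64, 8))   # layer inputs, batch of 64
W = rng.normal(size=(8, 4))    # layer weights

out = normalize(u @ W)
out_scaled = normalize(u @ (10.0 * W))  # same weights, rescaled by a = 10
print(np.abs(out - out_scaled).max())   # small: the scale of W barely matters
```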

<img src="../images/BN2.png" width="800" align=center>
<center>*(a) The test accuracy of the MNIST network trained with and without Batch Normalization, vs. the number of training steps. Batch Normalization helps the network train faster and achieve higher accuracy. (b, c) The evolution of input distributions to a typical sigmoid, over the course of training, shown as {15, 50, 85}th percentiles.*</center>

**Want to learn more?**
- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167v3.pdf) (the original paper)
- [Batch Normalization in Neural Network](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=8&cad=rja&uact=8&ved=0ahUKEwj5s430ge3QAhUP3mMKHaFaCdkQFghOMAc&url=https%3A%2F%2Fbcourses.berkeley.edu%2Ffiles%2F66022277%2Fdownload%3Fdownload_frd%3D1%26verifier%3DoaU8pqXDDwZ1zidoDBTgLzR8CPSkWe6MCBKUYan7&usg=AFQjCNGHyy4qhRcwLLU2RunAgM3iqFMcjQ&sig2=topE-0sw8Qb08mPfdBXO4A&bvm=bv.141320020,d.cGw)
- [Understanding the backward pass through Batch Normalization Layer](https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html)

## Define Slim Model with Batch Normalization

```python
def slim_model_BN(inputs):

    # default parameters for conv2d and fully_connected layers
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.truncated_normal_initializer(stddev=0.1),
                        biases_initializer=tf.constant_initializer(1.0),
                        weights_regularizer=slim.l2_regularizer(0.05)):

        # default parameters for conv2d (overrides the outer scope)
        with slim.arg_scope([slim.conv2d],
                            kernel_size=[3, 3],
                            padding='SAME',
                            # conv -> batch_norm -> relu
                            normalizer_fn=slim.batch_norm,
                            # no effect here: conv2d creates no biases when normalizer_fn is set
                            biases_initializer=tf.constant_initializer(0.0)):

            # default parameters for max_pool2d (stride keeps its default of 2)
            with slim.arg_scope([slim.max_pool2d],
                                kernel_size=[2, 2],
                                padding='SAME'):
                # conv1
                net = slim.repeat(inputs, 2, slim.conv2d, 64, scope='conv1')
                net = slim.max_pool2d(net, scope='pool1')
                # conv2
                net = slim.repeat(net, 2, slim.conv2d, 128, scope='conv2')
                net = slim.max_pool2d(net, scope='pool2')
                # reshape the 4D tensor to a 2D [batch, features] matrix
                net = slim.flatten(net)
                # fc3
                net = slim.fully_connected(net, 1024, scope='fc3')
                net = slim.dropout(net, 0.5, scope='dropout3')
                # fc4
                net = slim.fully_connected(net, 256, scope='fc4')
                net = slim.dropout(net, 0.5, scope='dropout4')
                # linear prediction: override the activation_fn set in the scope
                net = slim.fully_connected(net, 10, activation_fn=None, scope='linear')
                # softmax4
                outputs = slim.softmax(net, scope='softmax4')
                return outputs
```
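
Dropping the conv biases loses nothing: batch normalization subtracts the per-channel batch mean, which cancels any constant per-channel bias, and the batch-norm offset `beta` takes over its role (the parameter listing below shows `BatchNorm/beta` but no `biases` for the conv layers). A NumPy sketch of the cancellation, with random pre-activations and a random bias:

```python
import numpy as np


def normalize(z, eps=1e-3):
    """Per-channel batch normalization without the learned scale/shift."""
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)


rng = np.random.default_rng(2)
z = rng.normal(size=(32, 16))   # pre-activations: batch of 32, 16 channels
bias = rng.normal(size=16)      # an arbitrary per-channel bias

# Mean subtraction removes the bias; the variance is unchanged by a constant shift.
print(np.abs(normalize(z) - normalize(z + bias)).max())
```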

## Build Graph

```python
g = tf.Graph()
with g.as_default():

    inputs = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name="images")

    # pass inputs as `values=`; the second positional argument is default_name
    with tf.variable_scope('TF-Slim', values=[inputs]) as vs:
        end_points_collection = vs.original_name_scope + '_end_points'

        # Collect outputs/endpoints for conv2d, fully_connected and max_pool2d.
        with slim.arg_scope([slim.conv2d, slim.fully_connected, slim.max_pool2d],
                            outputs_collections=end_points_collection):
            # add model to graph
            outputs = slim_model_BN(inputs)

            # get the layer endpoints
            end_points = slim.utils.convert_collection_to_dict(end_points_collection)
```

## Inspect Parameters and Layers

```python
with g.as_default():
    print("Parameters:")
    inspect_variables(slim.get_variables(scope="TF-Slim"))
    print("Layers:")
    inspect_layers(end_points)
```

```
Parameters:
name = TF-Slim/conv1/conv1_1/weights:0                         shape = (3, 3, 3, 64)
name = TF-Slim/conv1/conv1_1/BatchNorm/beta:0                  shape = (64,)
name = TF-Slim/conv1/conv1_1/BatchNorm/moving_mean:0           shape = (64,)
name = TF-Slim/conv1/conv1_1/BatchNorm/moving_variance:0       shape = (64,)
name = TF-Slim/conv1/conv1_2/weights:0                         shape = (3, 3, 64, 64)
name = TF-Slim/conv1/conv1_2/BatchNorm/beta:0                  shape = (64,)
name = TF-Slim/conv1/conv1_2/BatchNorm/moving_mean:0           shape = (64,)
name = TF-Slim/conv1/conv1_2/BatchNorm/moving_variance:0       shape = (64,)
name = TF-Slim/conv2/conv2_1/weights:0                         shape = (3, 3, 64, 128)
name = TF-Slim/conv2/conv2_1/BatchNorm/beta:0                  shape = (128,)
name = TF-Slim/conv2/conv2_1/BatchNorm/moving_mean:0           shape = (128,)
name = TF-Slim/conv2/conv2_1/BatchNorm/moving_variance:0       shape = (128,)
name = TF-Slim/conv2/conv2_2/weigh
```
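
The `moving_mean` and `moving_variance` variables are exponential moving averages of the batch statistics, used in place of them at inference time. This NumPy sketch assumes the usual update `moving = decay * moving + (1 - decay) * batch_stat`, with the zero/one initial values and slim's default `decay=0.999` as the constructor default; the demo itself uses a hypothetical `decay=0.99` so that it converges quickly. Since slim's `scale` defaults to `False`, there is no `gamma` here, only `beta`.

```python
import numpy as np


class MovingStats:
    """Exponential moving averages of per-feature batch mean and variance."""

    def __init__(self, num_features, decay=0.999):
        self.decay = decay
        self.mean = np.zeros(num_features)  # moving_mean starts at 0
        self.var = np.ones(num_features)    # moving_variance starts at 1

    def update(self, batch):
        d = self.decay
        self.mean = d * self.mean + (1.0 - d) * batch.mean(axis=0)
        self.var = d * self.var + (1.0 - d) * batch.var(axis=0)

    def normalize_inference(self, x, beta, eps=1e-3):
        """Inference mode: use the moving statistics instead of batch statistics."""
        return (x - self.mean) / np.sqrt(self.var + eps) + beta


rng = np.random.default_rng(0)
stats = MovingStats(num_features=3, decay=0.99)  # faster decay for the demo
for _ in range(2000):
    stats.update(rng.normal(loc=2.0, scale=0.5, size=(64, 3)))
print(stats.mean.round(2), stats.var.round(2))  # should be close to 2.0 and 0.25
```

In TF 1.x, `slim.batch_norm` defaults to `is_training=True` and, by default, places its moving-average updates in the `tf.GraphKeys.UPDATE_OPS` collection, so those update ops must be run together with the training op, and `is_training=False` must be passed at evaluation time for the moving statistics to be used. `slim_model_BN` above never sets `is_training`, so it always normalizes with batch statistics.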

## Next Lesson
### Multi-layer Feedforward Neural Network in TensorFlow-Slim
- Define a feedforward network for a classification task on a synthetic dataset.

<img src="../images/divider.png" width="100">