Copyright (c) 2020 Graphcore Ltd. All rights reserved.

# TensorFlow 2: Tensor Inspection Techniques

In this tutorial you will train a selection of simple fully connected models
on the MNIST numeral data set and see how tensors (containing activations
and gradients) can be returned to the host via outfeeds for inspection.

An outfeed is the counterpart to an infeed and manages the transfer of data 
(like tensors, tuples or dictionaries of tensors) from the IPU to the host. 
To learn more about using outfeeds, see [outfeed queues](https://docs.graphcore.ai/projects/tensorflow-user-guide/en/latest/api.html#outfeed-queue).

Outfeeds can be useful for debugging, but can significantly increase the amount
of memory required on the IPU(s). When pipelining, you could use a smaller
value for the gradient accumulation count to mitigate this. Also consider using
a small number of [steps per execution](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L537)
to reduce memory footprint. 

In this demo, filters select which activations and gradients are returned, so
that only a subset is transferred. The outfed data can be assigned to a
variable or printed to the standard output. The callback in [`outfeed_callback.py`](https://github.com/graphcore/tutorials/blob/master/feature_examples/tensorflow2/inspecting_tensors/outfeed_callback.py)
prints it to the standard output.

## How to use this demo

### File structure and local imports

* `mnist.py` The main Python script.
* `mnist_code_only.py` Autogenerated script without any comments.
* `mnist.ipynb` Autogenerated interactive Jupyter Notebook tutorial.
* `outfeed_callback.py` Contains a custom callback that dequeues an outfeed 
  queue at the end of every epoch.
* `outfeed_layers.py` Custom layers that (selectively) add the inputs 
  (for example, activations from the previous layer) to a dict that will be 
  enqueued on an outfeed queue.
* `outfeed_optimizer.py` Custom optimizer that outfeeds the gradients generated
  by a wrapped optimizer.
* `outfeed_wrapper.py` Contains the `MaybeOutfeedQueue` class, see below.
* `README.md` Autogenerated Markdown file.
* `requirements.txt` Required packages for this tutorial.
* `tests` Subdirectory containing test scripts.

### Custom class descriptions

This tutorial uses the following classes, which are implemented in separate
[Python files](https://github.com/graphcore/tutorials/tree/master/feature_examples/tensorflow2/inspecting_tensors):

* `outfeed_wrapper.MaybeOutfeedQueue` - a wrapper for an `IPUOutfeedQueue` that
  allows key-value pairs to be selectively added to a dictionary that can then
  be enqueued.
* `outfeed_optimizer.OutfeedOptimizer` - a custom optimizer that enqueues
  gradients using a `MaybeOutfeedQueue`, with the choice of whether to enqueue
  the gradients after they are computed (the pre-accumulated gradients) or
  before they are applied (the accumulated gradients).
* `outfeed_layers.Outfeed` - a Keras layer that puts the inputs into
  a dictionary and enqueues it on an `IPUOutfeedQueue`.
* `outfeed_layers.MaybeOutfeed` - a Keras layer that uses a `MaybeOutfeedQueue`
  to selectively put the inputs into a dictionary and optionally enqueues the
  dictionary. At the moment, this layer cannot be used with non-pipelined
  Sequential models.
* `outfeed_callback.OutfeedCallback` - a Keras callback to dequeue an outfeed
  queue at the end of every epoch, printing some statistics about the tensors.
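
The idea of selectively adding key-value pairs before enqueuing can be illustrated with a small pure-Python stand-in. This is only a sketch and not the code in `outfeed_wrapper.py`: the class name `SketchMaybeQueue`, its method names, and the substring matching rule are all assumptions made for illustration.

```python
class SketchMaybeQueue:
    """Hypothetical stand-in for a selective outfeed queue (runs on the host only)."""

    def __init__(self, filters=None):
        self._filters = filters  # None means "accept every key"
        self._pending = {}       # dictionary being built for the next enqueue
        self._queue = []         # dictionaries that have been enqueued

    def maybe_add(self, key, value):
        # Assumption: a key passes when any filter string is a substring of it.
        if self._filters is None or any(f in key for f in self._filters):
            self._pending[key] = value

    def enqueue(self):
        # Only enqueue when at least one key passed the filters.
        if self._pending:
            self._queue.append(self._pending)
            self._pending = {}

    def dequeue(self):
        # Return everything enqueued so far and empty the queue.
        items, self._queue = self._queue, []
        return items


queue = SketchMaybeQueue(filters=["Dense_128"])
queue.maybe_add("Dense_128/kernel:0_grad", [0.1, -0.2])
queue.maybe_add("Dense_10/kernel:0_grad", [0.3])  # filtered out
queue.enqueue()
dequeued = queue.dequeue()
print(dequeued)
```

In this sketch only the `Dense_128` entry reaches the queue, which mirrors how a filter restricts the data transferred back to the host.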

### Environment preparation

Install the Poplar SDK following the instructions in the [Getting Started](https://docs.graphcore.ai/en/latest/getting-started.html)
guide for your IPU system. Make sure to run the `enable.sh` scripts for Poplar
and PopART and activate a Python 3 virtualenv with the Graphcore TensorFlow 2
wheel installed.
Then install the package requirements:
```bash
pip install -r requirements.txt
```

### Required imports
>**Note**
>The Graphcore TensorFlow 2 wheel is bundled with the Graphcore Poplar SDK.
>Please ensure you install this wheel rather than the default public wheel, as
>it contains IPU-specific functionality in the `ipu` submodule.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.python import ipu

from outfeed_callback import OutfeedCallback
from outfeed_optimizer import OutfeedOptimizer, OutfeedOptimizerMode
import outfeed_layers
from outfeed_wrapper import MaybeOutfeedQueue

## General approach to code in this tutorial

You will notice that a lot of code has been extracted into functions. This is
mainly because, in a Jupyter notebook, a Python context manager cannot span
more than one cell, so all the code that must run inside the TensorFlow IPU
context has to sit in a single cell. To avoid one giant cell, the helper
functions are defined first and are only invoked later, inside that context.

## Dataset preparation

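The helper function below loads the MNIST images and labels, scales the pixel
values to the range [0, 1], adds a channels dimension, and returns a shuffled
training dataset in batches of 32, with images and labels cast to `float32`.
It is intended to be called inside the IPU context.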

In [None]:
def create_dataset():
    mnist = keras.datasets.mnist

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Add a channels dimension.
    x_train = tf.expand_dims(x_train, -1)
    x_test = tf.expand_dims(x_test, -1)

    train_ds = tf.data.Dataset \
        .from_tensor_slices((x_train, y_train)) \
        .shuffle(len(x_train)) \
        .batch(32, drop_remainder=True)

    train_ds = train_ds.map(
        lambda d, l: (tf.cast(d, tf.float32), tf.cast(l, tf.float32))
    )
    return train_ds
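
As a quick check of how many batches this dataset yields, the sketch below computes the batch count for the 60,000 MNIST training images with a batch size of 32. The helper name `batches_per_epoch` is hypothetical and only used for illustration. Because 60,000 divides evenly by 32, `drop_remainder=True` does not discard any samples here; the second call shows a case where it would matter.

```python
import math


def batches_per_epoch(num_samples, batch_size, drop_remainder=True):
    """Number of batches produced by one pass over the dataset."""
    if drop_remainder:
        # A final partial batch is discarded.
        return num_samples // batch_size
    # A final partial batch is kept.
    return math.ceil(num_samples / batch_size)


mnist_batches = batches_per_epoch(60000, 32)
partial_example = batches_per_epoch(100, 32, drop_remainder=False)
print(mnist_batches)
print(partial_example)
```

The first value is what a `steps_per_epoch` of `None` would correspond to for this dataset and batch size.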

## General description of the model

By default, the tutorial runs a three-layer fully connected model, pipelined
over two IPUs. Gradients for one of the layers, and activations for two of
the layers, are returned for inspection on the host. You can change this with
the configuration variables described in "Configuring the demo".

The gradient accumulation count (`gradient_accumulation_steps_per_replica`)
determines the pipeline depth, so the number of activations and gradients 
added to the outfeed queues will be proportional to the gradient accumulation 
value. Additionally, the outfeed callback is called at the end of the epoch, 
so the number of steps per epoch will also affect the amount of data 
in the queues.
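
To make the relationship concrete, here is a rough sketch of how many entries one epoch might add to a queue, using the values from this tutorial (500 steps per epoch and a gradient accumulation count of 4). The function name is hypothetical, and the model behind it is an assumption: accumulated gradients are taken to be enqueued once per weight update, while pre-accumulated gradients and activations are taken to be enqueued once per step.

```python
def entries_per_epoch(steps_per_epoch, accumulation_count, accumulated):
    """Estimate the number of queue entries produced in one epoch."""
    # One weight update happens after `accumulation_count` steps.
    weight_updates = steps_per_epoch // accumulation_count
    # Assumption: one entry per update when accumulated, otherwise one per step.
    per_update = 1 if accumulated else accumulation_count
    return weight_updates * per_update


accumulated_entries = entries_per_epoch(500, 4, accumulated=True)
pre_accumulated_entries = entries_per_epoch(500, 4, accumulated=False)
print(accumulated_entries)
print(pre_accumulated_entries)
```

The accumulated value of 125 agrees with the leading dimension of the gradient shapes in the example output at the end of this tutorial.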

In [None]:
def create_pipeline_sequential_model(multi_activations_outfeed_queue):
    seq_model = keras.Sequential([
        keras.layers.Flatten(),
        keras.layers.Dense(256, activation='relu', name="Dense_256"),
        keras.layers.Dense(128, activation='relu', name="Dense_128"),
        outfeed_layers.MaybeOutfeed(multi_activations_outfeed_queue,
                                    final_outfeed=False,
                                    name="Dense_128_acts"),
        keras.layers.Dense(10, activation='softmax', name="Dense_10"),
        outfeed_layers.MaybeOutfeed(multi_activations_outfeed_queue,
                                    final_outfeed=True,
                                    name="Dense_10_acts")
    ])
    seq_model.set_pipelining_options(gradient_accumulation_steps_per_replica=4)
    seq_model.set_pipeline_stage_assignment([0, 0, 1, 1, 1, 1])
    return seq_model

## Configuring the demo

Choose values for the following configuration variables.
If you change them while experimenting in a Jupyter notebook, re-run this cell
and all the cells below it.

In [None]:
# [boolean] Should the code outfeed the pre-accumulated gradients, rather than
# accumulated gradients? Only makes a difference when using gradient
# accumulation, which is always the case when pipelining is enabled.
outfeed_pre_accumulated_gradients = False

# Number of steps to run per execution, that is, the number of batches
# processed in each TensorFlow function call. At most one full epoch.
steps_per_execution = 500

# Number of steps per epoch. The number of steps (batches of samples) that
# make up one epoch before the next one starts. The default, `None`, is
# equal to the number of samples divided by the batch size.
steps_per_epoch = steps_per_execution

# Number of epochs
epochs = 3

# [List] String values representing which gradients to add to the dictionary
# that is enqueued on the outfeed queue. Pass `['none']` to disable filtering.
gradients_filters = ['Dense_128']

# [List] Activation filters - strings representing which activations in the
# second `PipelineStage` to add to the dictionary that is enqueued on the
# outfeed queue. Pass `['none']` to disable filtering. Applicable only for
# pipelined models.
activations_filters = ['none']

The outfeed optimizer mode follows from `outfeed_pre_accumulated_gradients`:
`True` selects `AFTER_COMPUTE` (the pre-accumulated gradients) and `False`
selects `BEFORE_APPLY` (the accumulated gradients). You can read more about this in [outfeed_optimizer](https://github.com/graphcore/tutorials/blob/master/feature_examples/tensorflow2/inspecting_tensors/outfeed_optimizer.py).

In [None]:
if outfeed_pre_accumulated_gradients:
    outfeed_optimizer_mode = OutfeedOptimizerMode.AFTER_COMPUTE
else:
    outfeed_optimizer_mode = OutfeedOptimizerMode.BEFORE_APPLY
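
The difference between the two modes can be pictured with plain numbers. The sketch below is not how `OutfeedOptimizer` is implemented: `SketchMode` and `outfed_gradients` are hypothetical names, and summing the micro-batch gradients is an assumption (accumulation could equally be an average, depending on the optimizer configuration).

```python
from enum import Enum


class SketchMode(Enum):
    """Hypothetical stand-in for OutfeedOptimizerMode."""
    AFTER_COMPUTE = "after_compute"
    BEFORE_APPLY = "before_apply"


def outfed_gradients(micro_batch_grads, mode):
    """Return what would be enqueued for one weight update."""
    if mode is SketchMode.AFTER_COMPUTE:
        # Pre-accumulated: one entry per micro-batch gradient.
        return list(micro_batch_grads)
    # Accumulated: a single entry (assumed here to be the sum).
    return [sum(micro_batch_grads)]


grads = [0.5, -0.25, 0.25, 0.5]  # gradient accumulation count of 4
pre_accumulated = outfed_gradients(grads, SketchMode.AFTER_COMPUTE)
accumulated = outfed_gradients(grads, SketchMode.BEFORE_APPLY)
print(pre_accumulated)
print(accumulated)
```

Under these assumptions, the pre-accumulated mode produces four entries per weight update and the accumulated mode produces one.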

Define a helper function to parse user input for filters:

In [None]:
def process_filters(filters_input):
    if len(filters_input) == 1 and filters_input[0].lower() == "none":
        return None
    return filters_input
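
The following self-contained snippet repeats `process_filters` and shows how it treats a few inputs. The check is case-insensitive and only applies when the list holds a single item, so a list that mixes `"none"` with other names is returned unchanged.

```python
def process_filters(filters_input):
    # A single "none" entry (any capitalisation) disables filtering.
    if len(filters_input) == 1 and filters_input[0].lower() == "none":
        return None
    return filters_input


no_filter = process_filters(["none"])
no_filter_upper = process_filters(["None"])
one_filter = process_filters(["Dense_128"])
mixed = process_filters(["none", "Dense_128"])
print(no_filter, no_filter_upper, one_filter, mixed)
```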

Next we define a helper function to create the Keras model with callbacks.
Inside, two outfeed queues are created, one for gradients and one for
activations, each filtered by the user-supplied lists in `gradients_filters`
and `activations_filters`, together with a callback for each queue.

In [None]:
def model_with_callbacks(gradients_filters, activations_filters):
    optimizer_q = MaybeOutfeedQueue(filters=process_filters(gradients_filters))
    act_q = MaybeOutfeedQueue(filters=process_filters(activations_filters))

    gradients_cb = OutfeedCallback(outfeed_queue=optimizer_q,
                                   name="Gradients callback")
    multi_layer_cb = OutfeedCallback(outfeed_queue=act_q,
                                     name="Multi-layer activations callback")

    callbacks = [gradients_cb, multi_layer_cb]
    seq_model = create_pipeline_sequential_model(act_q)
    return seq_model, callbacks, optimizer_q

Initialise the IPU configuration, requesting the two IPUs that the pipelined model needs. More details are available [here](https://docs.graphcore.ai/projects/tensorflow-user-guide/en/latest/api.html#tensorflow.python.ipu.config.IPUConfig).

In [None]:
cfg = ipu.config.IPUConfig()
cfg.auto_select_ipus = 2
cfg.configure_ipu_system()

## Training the model on an IPU

If you are using Keras, you must instantiate your Keras model inside
a strategy scope, which is a Python context manager.
More details about the `IPUStrategy` API can be found [here](https://docs.graphcore.ai/projects/tensorflow-user-guide/en/latest/targeting_tf2.html#ipustrategy).

In [None]:
strategy = ipu.ipu_strategy.IPUStrategy()

Use the `strategy.scope()` context to ensure that everything within that
context is compiled for the IPU device. Use it instead of the `tf.device`
context that is used in TensorFlow 1.

The outfeed queues created in `model_with_callbacks` are shared between the
model layers (activations), the `OutfeedOptimizer` (gradients) and the
callbacks that dequeue them at the end of every epoch.

In [None]:
with strategy.scope():
    seq_model, callbacks, optimizer_outfeed_queue = \
        model_with_callbacks(gradients_filters, activations_filters)

    # Build the graph passing an OutfeedOptimizer to enqueue selected gradients
    seq_model.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(),
        optimizer=OutfeedOptimizer(
            wrapped_optimizer=keras.optimizers.SGD(),
            outfeed_queue=optimizer_outfeed_queue,
            outfeed_optimizer_mode=outfeed_optimizer_mode,
            model=seq_model
        ),
        steps_per_execution=steps_per_execution
    )

    # Train the model passing the callbacks to see the gradients
    # and activations stats
    seq_model.fit(
        create_dataset(),
        callbacks=callbacks,
        steps_per_epoch=steps_per_epoch,
        epochs=epochs
    )

The output printed by the callbacks looks similar to this:
```
Gradients callback
key: Dense_128/bias:0_grad shape: (125, 128)
key: Dense_128/kernel:0_grad shape: (125, 256, 128)
Epoch 3 - Summary Stats
Index Name                         Mean         Std          Minimum      Maximum      NaNs    infs   
0     Dense_128/bias:0_grad        -0.000663    0.021037     -0.108830    0.111534     False   False  
1     Dense_128/kernel:0_grad      -0.000120    0.012575     -0.186476    0.183576     False   False  

Single layer activations callback
No data enqueued
```
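
The columns in this summary can be reproduced on the host for any list of values. The sketch below is a pure-Python approximation, not the code in `outfeed_callback.py`: the function name `summary_stats` is hypothetical, and computing the mean, standard deviation, minimum and maximum over the finite values only (using the population standard deviation) is an assumption.

```python
import math
import statistics


def summary_stats(values):
    """Compute summary statistics similar to the columns shown above."""
    # Assumption: NaN and infinite values are reported but excluded from the stats.
    finite = [v for v in values if math.isfinite(v)]
    return {
        "mean": statistics.fmean(finite),
        "std": statistics.pstdev(finite),  # population standard deviation
        "min": min(finite),
        "max": max(finite),
        "nans": any(math.isnan(v) for v in values),
        "infs": any(math.isinf(v) for v in values),
    }


stats = summary_stats([0.5, -0.5, 1.5, -1.5])
stats_with_nan = summary_stats([1.0, float("nan")])
print(stats)
print(stats_with_nan["nans"])
```

A `True` in the NaNs or infs column is usually the first thing to look for when inspecting gradients, since it points to numerical problems during training.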