# Training a Neural Network Model with FFBP
This notebook provides instructions on how to create, train, test, and analyze a feedforward neural network model using pdpyflow's FFBP package and Tensorflow. If you don't have any experience with Tensorflow, try the [getting started tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/getting_started/getting_started.ipynb) or visit the [website](https://www.tensorflow.org/get_started/) to get more information. The FFBP package is intended to simplify the process of constructing a [Tensorflow Graph](https://www.tensorflow.org/programmers_guide/graphs) for neural network modeling. A Tensorflow graph is a computational structure that *describes* the flow of data (tensors) through various computational operations. Thus, the processes of constructing a graph and running it are separate, as is the analysis of data generated by running a model.

Training, testing, and analyzing a neural network therefore involves the following steps:
- Preparing input data ([tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/prepare_input.ipynb)) [&#x21F1;](#step1)
- Constructing a model ([tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/build_model.ipynb)) [&#x21F1;](#step2)
- Running the model [&#x21F1;](#step3)
- Accessing and analyzing data ([tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/analyzing_data.ipynb)) [&#x21F1;](#step4)

This tutorial gives an overview of these steps; the linked tutorials cover some of them in more detail, so consult them if you need more background on what is laid out below. To illustrate how a neural network model can be created, trained, tested, and analyzed, we use as an example the model of [semantic cognition](https://stanford.edu/~jlmcc/papers/RogersMcC08BBSFinalProof.pdf) by Rogers and McClelland. The model is a simple feedforward neural network illustrated in the figure below.

<img src="materials/semantic_network.png" width=60%>

As you can see in this figure, the network consists of two input layers (one called item and the other called relation), two hidden layers (one called representation and the other called just hidden), and one output layer (called attribute). Where there are arrows between layers, they represent a full matrix of connections from every sending unit in the pool on the left to every receiving unit in the pool on the right.

We begin by importing the packages required for this tutorial and creating a Tensorflow graph. We want to make sure that we are adding the various network and data [enqueueing](https://www.tensorflow.org/versions/r0.12/api_docs/python/io_ops/queues) elements to this graph. To achieve this, we will use the graph's handle with Python's **`with`** statement, thereby creating a [context manager](https://docs.python.org/3/reference/datamodel.html#context-managers) within which new elements are added to the graph.

In [None]:
import tensorflow as tf
import FFBP

FFBP_GRAPH = tf.Graph()
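
To make the role of the `with` statement more concrete, here is a framework-free analogy. It is only a sketch: `as_default`, `current_default`, and `add_component` are hypothetical helpers written for this illustration and say nothing about how Tensorflow implements its default-graph mechanism. The point is simply that components created inside the `with` block are attached to the chosen graph, while anything created outside falls back to the global default.

```python
from contextlib import contextmanager

# Hypothetical stand-in for a "default graph" stack (not Tensorflow's implementation)
_default_stack = ["global_default"]

@contextmanager
def as_default(name):
    """Make `name` the current default inside the with-block, then restore the previous one."""
    _default_stack.append(name)
    try:
        yield name
    finally:
        _default_stack.pop()

def current_default():
    """Return the graph that new components would currently be attached to."""
    return _default_stack[-1]

registered = []  # (component, graph) pairs recorded at creation time

def add_component(component):
    """Attach a component to whichever graph is the default right now."""
    registered.append((component, current_default()))

with as_default("FFBP_GRAPH"):
    add_component("item_inp")      # created inside the block
add_component("stray_op")          # created outside the block

print(registered)
```

This is why every cell that adds data or network elements in this tutorial opens with `with FFBP_GRAPH.as_default():`.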

<a id='step1'> </a>
## Preparing input data
Detailed documentation of the `FFBP.InputData` class is provided in a separate [tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/prepare_input.ipynb). Here, we create two instances of the `FFBP.InputData` class, one for training (`TRAIN_DATA`) and one for testing (`TEST_DATA`). The `shuffle_seed` argument is given only for the training data set; we don't want to shuffle the test items, so we omit it for `TEST_DATA`. In the cell below it is set to `-1`, which requests a random seed; a positive value such as `1` would make the shuffling reproducible. Note that the `inp_size` argument is a tuple of two integers corresponding to the sizes of the item and relation layers, respectively. Although we are using the same data set for training and testing, it is possible to use different ones.

In this case, the training and testing data both consist of the same set of 32 training cases (so `data_len = 32`), each consisting of a one-hot item input vector and a one-hot relation input vector (for example specifying that the item is "canary" and the relation is "can", as shown in the figure) and a vector of associated outputs (in this case the set of things a canary can do).
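
As a rough illustration of what such input patterns look like, the sketch below builds one-hot item and relation vectors in plain Python. It assumes that the 32 cases are exactly the 8 x 4 item-relation pairs and that items and relations are numbered in an arbitrary order; neither the ordering nor the `one_hot` helper comes from FFBP or from the data file.

```python
N_ITEMS, N_RELATIONS = 8, 4  # sizes of the item and relation layers (inp_size)

def one_hot(index, size):
    """Return a list of `size` zeros with a single 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

# Assumption: every item paired with every relation gives the full pattern set
patterns = [
    (one_hot(item, N_ITEMS), one_hot(rel, N_RELATIONS))
    for item in range(N_ITEMS)
    for rel in range(N_RELATIONS)
]

print(len(patterns))  # 32

# Inspect one pattern: the pair (item 1, relation 1) in this made-up ordering
item_vec, relation_vec = patterns[5]
print(item_vec)
print(relation_vec)
```

Each real training case additionally carries a 36-element target vector marking the attributes that hold for that item-relation pair.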

In [None]:
with FFBP_GRAPH.as_default():
    
    # Create data for training
    TRAIN_DATA = FFBP.InputData(
        path_to_data_file = 'materials/semantic_data.txt',
        batch_size        = 1,
        inp_size          = (8,4), # same as the sizes of item and relation layers
        targ_size         = 36,    # same as the size of the attribute layer
        data_len          = 32,    # number of training examples
        shuffle_seed      = -1     # -N for random seed, N for non-random seed, None to turn off randomization
    )
    
    # Create data for testing
    TEST_DATA = FFBP.InputData(
        path_to_data_file = 'materials/semantic_data.txt',
        batch_size        = 1,
        inp_size          = (8,4), 
        targ_size         = 36,
        data_len          = 32
    )

<a id='step2'> </a>
## Constructing a model
More thorough documentation of the objects specified below, including `FFBP.BasicLayer` and `FFBP.Model`, is provided in a separate [tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/build_model.ipynb). Below is an example of how to outline the network structure and specify the flow of data for the semantic network. Every layer in the cell below receives the same seed, `s = -1`, which makes the weight initialization random; fixed seeds (e.g. `2`, `3`, and `4` for the three layers) can be given instead for reproducibility. Note that `HIDDEN_LAYER` takes the concatenation of the tensors `REPRESENTATION_LAYER.output` and `RELATION_INP` as input, so the `layer_input` argument to this layer is given as a tuple containing the two.

In [None]:
wr = .1    # weight initialization range will be set between -.1 and .1
s = -1  # (pseudo)randomization seed for weight init will be random

# Add network components to the graph
with FFBP_GRAPH.as_default():
    
    MODEL_NAME = 'semantic_net'
    with tf.name_scope(MODEL_NAME):
        
        # Input placeholders (input layers)
        ITEM_INP     = tf.placeholder(dtype = tf.float32, shape=[None, 8], name='item_inp')
        RELATION_INP = tf.placeholder(dtype = tf.float32, shape=[None, 4], name='relation_inp')
        
        # First hidden layer (representation)
        REPRESENTATION_LAYER = FFBP.BasicLayer(
            layer_name  = 'representation_layer', 
            layer_input = ITEM_INP, 
            size        = 8, 
            wrange      = [-wr, wr], 
            nonlin      = tf.nn.sigmoid, 
            seed        = s
        )
        
        # Second hidden layer
        HIDDEN_LAYER = FFBP.BasicLayer(
            layer_name  = 'hidden_layer', 
            layer_input = (REPRESENTATION_LAYER.output, RELATION_INP),
            size        = 15, 
            wrange      = [-wr, wr], 
            nonlin      = tf.nn.sigmoid, 
            seed        = s
        )
        
        # Output layer
        ATTRIBUTE_LAYER = FFBP.BasicLayer(
            layer_name  = 'attribute_layer', 
            layer_input = HIDDEN_LAYER.output, 
            size        = 36, 
            wrange      = [-wr, wr], 
            nonlin      = tf.nn.sigmoid, 
            seed        = s
        )
        
        # Target placeholder
        TARGET = tf.placeholder(dtype = tf.float32, shape=[None, 36], name='targets')
        
        # Optimization specs
        OPTIMIZER = tf.train.GradientDescentOptimizer(learning_rate=.1, name='SGD_optimizer')
        LOSS = tf.reduce_sum(tf.squared_difference(TARGET, ATTRIBUTE_LAYER.output), name='loss')
        
        # Model
        MODEL = FFBP.Model(
            name       = MODEL_NAME,
            layers     = [REPRESENTATION_LAYER, HIDDEN_LAYER, ATTRIBUTE_LAYER],
            train_data = TRAIN_DATA, 
            test_data  = TEST_DATA,
            inp        = [ITEM_INP, RELATION_INP],
            targ       = TARGET,
            loss       = LOSS,
            optimizer  = OPTIMIZER,
        )
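
The data flow defined above can also be traced without Tensorflow. The following is a minimal sketch in plain Python, not FFBP's implementation: `make_layer` and `forward` are hypothetical helpers, the biases are assumed to start at zero, and the input and target vectors are made up. It shows the three layer sizes, the concatenation feeding the hidden layer, and the summed squared-difference loss.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, wrange, rng):
    """Weights (n_out rows of n_in values) drawn uniformly from wrange; zero biases (an assumption)."""
    low, high = wrange
    weights = [[rng.uniform(low, high) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layer, inputs):
    """Compute sigmoid(W x + b) for one input pattern."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(0)
wr = 0.1
representation = make_layer(8, 8, (-wr, wr), rng)
hidden = make_layer(8 + 4, 15, (-wr, wr), rng)  # representation output + relation input
attribute = make_layer(15, 36, (-wr, wr), rng)

item = [1.0] + [0.0] * 7          # made-up one-hot item
relation = [0.0, 1.0, 0.0, 0.0]   # made-up one-hot relation
target = [0.0] * 36               # made-up target pattern

rep_out = forward(representation, item)
hid_out = forward(hidden, rep_out + relation)  # list concatenation mirrors the tuple input
att_out = forward(attribute, hid_out)
loss = sum((t - o) ** 2 for t, o in zip(target, att_out))  # sum of squared differences

print(len(rep_out), len(hid_out), len(att_out))  # 8 15 36
```

Note that the hidden layer has 12 sending units (8 representation units plus 4 relation units), which is what the tuple passed as `layer_input` expresses.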

<a id='step3'> </a>
## Running the model
The model is run inside two for-loops (one nested inside the other): the (inner) train loop and the (outer) run loop. In a single iteration of the run loop, model parameters are initialized either randomly or by restoration from an existing checkpoint directory. A single iteration of the inner train loop corresponds to a single epoch of training/testing. Thus, minimally, a valid run routine would look something like this:
```python
# start outer run loop
for run_ind in range(NUM_RUNS):
    
    # open new session for existing graph
    with tf.Session(graph=FFBP_GRAPH) as sess:
        
        # initialize variables and start queue
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        coordinator = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coordinator)
        
        # start inner epoch loop
        for i in range(NUM_EPOCHS):
            test_loss, snapshot = MODEL.test_epoch(session=sess, verbose=True)
            train_loss, enum = MODEL.train_epoch(session=sess, verbose=False)
        
        # stop queues
        coordinator.request_stop()
        coordinator.join(threads)

```
Here, for each of the `NUM_RUNS` runs of the model, we start a new Tensorflow session. Within each session, we randomly initialize all global and local variables (this reinitializes model parameters and local queue variables), start the queues, and run the train loop. Inside the train loop we test and train the model `NUM_EPOCHS` times.

In order to store test data, as well as to [save/restore](https://www.tensorflow.org/programmers_guide/saved_model) the model parameters we need to make a few additions. For this the `FFBP.ModelSaver` class is available. `FFBP.ModelSaver`'s constructor takes up to two arguments:
- **`restore_from`** : (*default*=`None`) a string path to a restoration directory which contains model checkpoint files. The path can be either absolute (e.g. `/Users/username/path/to/file`) or relative to the working directory of the current notebook.
- **`logdir`** : (*default*=`None`) a string path to the log directory where model parameters and test data will be saved. If `None` a new directory (e.g. `/logdirs/logdir_000`) is automatically created in the same location as the current notebook. Depending on what one decides to save, this directory will contain a checkpoints subdirectory from which model parameters could be restored and `runlog.pkl` file(s) containing test data from a single model run. Detailed instructions on how to access data from runlogs can be found [here](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/analyze_data.ipynb).

The saver includes two methods relating to saving and/or restoring model parameters, and two methods for storing model-generated data:
- **`init_model`**`(`*`session, init_epoch=0`*`)` : initializes model parameters. If the saver was given a restore path upon construction, the parameters will be restored from the existing checkpoint directory. Otherwise, the parameters will be initialized anew, based on the value of `wrange` for each individual layer. The `session` argument is required and should be the current session of the run. The user has control over indexing the initial epoch via the `init_epoch` parameter. The value of `init_epoch` is returned if the model is initialized from scratch, and a restored value is returned if the model is restored from a checkpoint.
- **`save_model`**`(`*`session, model, run_ind`*`)` : saves a given *`model`* to the saver's `logdir`, indexed by *`run_ind`*, and returns the string path of the log directory.
- **`log_test`**`(`*`snap, run_ind`*`)` : logs a given snapshot *`snap`*[&#x21F1;](#step4) by appending it to the corresponding runlog indexed by *`run_ind`* and returns the string path of the log directory.
- **`log_loss`**`(`*`loss, enum, run_ind`*`)` : logs a given *`loss`* and `enum` by appending them to the corresponding runlog indexed by *`run_ind`* and returns the string path of the log directory.

Below we provide the run routine that executes one run of training, with occasional pre-training tests. The training will proceed for the given number of epochs or until the train loss goes below the value of `ECRIT`. Regardless of when the training ends, the model will be tested one final time and saved (if `SAVE_FINAL=True`). Model parameters and test data will be stored at a new log directory, producing a new file indexed by `run_ind` (e.g. `checkpoints_directory_0` and `runlog_0.pkl`).

All of the run parameters are brought to the top of the cell for presentation. We specify `TEST_EPOCHS` outside the run loop as a list of integer values. If the user wants to checkpoint the model occasionally, `SAVE_EPOCHS` should be a similar list specifying when the model parameters are to be checkpointed. Although the code below does not checkpoint the model at any point, the corresponding run parameters (`SAVE_EPOCHS=[None]` and `SAVE_FINAL=False`) are provided for illustration. Running the following cell will print out the test losses, indexed by the corresponding epoch.

In [None]:
%%bash
rm -rf logdirs

In [None]:
# Set up run parameters
NUM_RUNS    = 1
NUM_EPOCHS  = 3000
TEST_EPOCHS = [i for i in range(0,NUM_EPOCHS,500)]
SAVE_EPOCHS = [None]
SAVE_FINAL  = False
ECRIT       = 0.05

# Create a model saver
saver = FFBP.ModelSaver(restore_from=None, logdir=None)

# Start run loop
for run_ind in range(NUM_RUNS):
    print('>>> RUN {}'.format(run_ind))
    
    with tf.Session(graph=FFBP_GRAPH) as sess:

        # restore or initialize FFBP_GRAPH variables:
        start_epoch = saver.init_model(session=sess)

        # create coordinator and start queue runners
        coordinator = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coordinator)
        
        # Start train loop
        for i in FFBP.prog_bar(
            sequence=range(start_epoch, start_epoch + NUM_EPOCHS), 
            name='Run {}/{}, Epoch'.format(run_ind,NUM_RUNS),
            every=1):
            
            # Test model occasionally
            if any([i==test_epoch for test_epoch in TEST_EPOCHS]):
                test_loss, snapshot = MODEL.test_epoch(session=sess, verbose=True)
                saver.log_test(snapshot, run_ind)

            # Run one training epoch
            train_loss, enum = MODEL.train_epoch(session=sess, verbose=False)
            saver.log_loss(train_loss, enum, run_ind)

            # Save model occasionally
            if any([i==save_epoch for save_epoch in SAVE_EPOCHS]):
                saver.save_model(session=sess, model=MODEL, run_ind=run_ind)

            # Do final test, stop queues, and break out from training loop
            if train_loss < ECRIT or i == start_epoch + (NUM_EPOCHS - 1): 
                print('Final test ({})'.format(
                    'loss < ecrit' if train_loss < ECRIT else 'reached last epoch'))

                test_loss, snapshot = MODEL.test_epoch(session=sess, verbose=True)
                saver.log_test(snapshot, run_ind)

                coordinator.request_stop()
                coordinator.join(threads)

                if SAVE_FINAL: saver.save_model(session=sess, model=MODEL, run_ind=run_ind)
                break
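
The control flow of the cell above (scheduled tests, the `ECRIT` stopping rule, and the final test) can be checked in isolation. The sketch below replaces training with a made-up, steadily shrinking loss; `fake_train_loss` is a hypothetical stand-in for `MODEL.train_epoch`, and no FFBP or Tensorflow calls are involved.

```python
NUM_EPOCHS = 3000
TEST_EPOCHS = list(range(0, NUM_EPOCHS, 500))
ECRIT = 0.05

def fake_train_loss(epoch):
    """Made-up loss curve that decays geometrically; stands in for MODEL.train_epoch."""
    return 10.0 * 0.997 ** epoch

tested_at = []      # epochs at which a test would have been run
stop_reason = None

for i in range(NUM_EPOCHS):
    # Scheduled test before training on this epoch
    if i in TEST_EPOCHS:
        tested_at.append(i)

    train_loss = fake_train_loss(i)

    # Final test and early exit once the loss criterion or the last epoch is reached
    if train_loss < ECRIT or i == NUM_EPOCHS - 1:
        stop_reason = 'loss < ecrit' if train_loss < ECRIT else 'reached last epoch'
        tested_at.append(i)
        break

print(tested_at, stop_reason)
```

With this particular loss curve the loop exits well before epoch 3000, so the scheduled tests at 2000 and 2500 never happen and the runlog ends with the final test instead.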

<a id='step4'> </a>
## Accessing and analyzing the results

In the previous section, an instance of `FFBP.ModelSaver()` was used to store two different kinds of data: (1) loss values accumulated over each training epoch and (2) network snapshots produced by testing the model occasionally. Both of these data structures were stored into a single runlog file, which can be accessed independently of running the model. FFBP provides a couple of functions for viewing these data (see related [tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/analyze_data.ipynb)).

Here we will explain the structure of the runlog file and provide an example of how data from several such files can be accessed programmatically for analysis and visualization. We will accomplish the latter by looking into one of FFBP's visualization functions, `FFBP.view_progress()`.

Let us first understand what runlogs are. Information from different runs is captured in separate runlog files. The saver creates a new runlog whenever it encounters a new run index in one of its saving methods. If the file for the current run exists, the saver will append new information to it. The manner in which information is added depends on which method was used. 

Since there are only two methods for logging data, a runlog is a Python dictionary (dict) object that can have up to two keys: `'loss_data'` and `'test_data'`. These string keys point to different structures that can be understood separately.

### Loss data
The data structure `runlog['loss_data']` is itself a dict that stores two lists of numeric values, indexed, respectively, by keys `'vals'` and `'enums'`. The former list contains the loss value(s) accumulated over one epoch of training, and the latter contains the corresponding epoch number(s). Note that the epoch number is logged *after* being incremented due to the preceding training epoch. It helps to think of the epoch number as the number of times the network has experienced (i.e. learned from) every example in the training set. These two lists are always the same length within a single run, but both can differ from similar lists across runs.
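
The layout just described can be mimicked with a toy logger. This is a sketch of the data structure only: the `log_loss` function below is a hypothetical stand-in written for this example, not `FFBP.ModelSaver.log_loss`, and the loss values are made up.

```python
def log_loss(runlog, loss, enum):
    """Append one loss value and its epoch number to runlog['loss_data'] (toy version)."""
    loss_data = runlog.setdefault('loss_data', {'vals': [], 'enums': []})
    loss_data['vals'].append(loss)
    loss_data['enums'].append(enum)

runlog = {}
enum = 0
for fake_loss in [9.1, 8.7, 8.2]:      # made-up per-epoch loss values
    enum += 1                          # the epoch counter is incremented by training...
    log_loss(runlog, fake_loss, enum)  # ...and only then logged

print(runlog['loss_data'])
# {'vals': [9.1, 8.7, 8.2], 'enums': [1, 2, 3]}
```

Because the epoch number is logged after the increment, the first logged value is paired with epoch 1 rather than epoch 0.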

### Test data
The data structure `runlog['test_data']` is a bit more elaborate, but nearly as simple. It is a list of individual network state "snapshots" in the order in which they were appended. Each snapshot is a dict that contains (1) model-level data and (2) layer-level data for each layer in the network. The model-level data keys point to numpy arrays of various shapes. The keys, types, shapes, and descriptions of these arrays are provided in **Table 1** below; `DATA_LEN` and `M_INP_SIZE` correspond to the number of patterns in the testing set and the sum of sizes of all input layers (placeholders) of the model:

<br><center><b>Table 1. Model-level data</b></center>

| Key      |     Type      | Shape                  | Description                                         |
|----------|:-------------:|------------------------|-----------------------------------------------------|
| 'enum'   |  numpy.int32  | ()                     | epoch number of test                                |
| 'loss'   | numpy.ndarray | (DATA_LEN,)            | a numpy array of per pattern loss values            |
| 'labels' | numpy.ndarray | (DATA_LEN,)            | a numpy array of pattern labels (encoded in binary) |
| 'input'  | numpy.ndarray | (DATA_LEN, M_INP_SIZE) | a numpy array of test input patterns                |
| 'target' | numpy.ndarray | (DATA_LEN, TARG_SIZE)  | a numpy array of target patterns (TARG_SIZE is the size of the target placeholder) |
<br>

Other keys found in a snapshot point to a deeper structure and index layer-level data. For example, test information collected for the hidden layer in the semantic network example could be obtained by `snapshot['HIDDEN_LAYER']`. The object returned is a dict of layer-level data collected from the hidden layer of the network. This sub-dict's keys are also strings, and their values are also numpy arrays of various shapes. See the summary in **Table 2**. `L_INP_SIZE` is the number of sending units feeding into the layer, and `LAYER_SIZE` is the number of receiving units in the layer:


<br><center><b>Table 2. Layer-level data</b></center>

| Key          | Shape                              | Description                                                                                                                                                                      |
|--------------|------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 'input_'     | (DATA_LEN, L_INP_SIZE)             | Array of inputs to the layer. `axis=0` (first dimension) is used to index test items, `axis=1` indexes the sending units.                                                        |
| 'weights'    | (LAYER_SIZE, L_INP_SIZE)           | Layer weight matrix. `axis=0` indexes receiving units, `axis=1` indexes sending units.                              |
| 'biases'     | (LAYER_SIZE, )                     | Layer biases. `axis=0` indexes receiving units.                                                                                                                                  |
| 'net_input'  | (DATA_LEN, LAYER_SIZE)             | Array of net input values across the layer's units. `axis=0` indexes test items, `axis=1` indexes layer units.                                                                   |
| 'output'     | (DATA_LEN, LAYER_SIZE)             | Array of activation (output) values across the layer's units. `axis=0` indexes test items, `axis=1` indexes layer units.                                                         |
| 'gweights'   | (DATA_LEN, LAYER_SIZE, L_INP_SIZE) | Array of the gradient matrices with respect to weights indexed by test items. `axis=0` indexes test item, `axis=1` indexes receiving units, and `axis=2` indexes sending units.  |
| 'gbiases'    | (DATA_LEN, LAYER_SIZE)             | Array of gradients with respect to biases indexed by test items. `axis=0` indexes test item, `axis=1` indexes layer's units.                                                     |
| 'gnet_input' | (DATA_LEN, LAYER_SIZE)             | Array of gradients with respect to net input values indexed by test items. `axis=0` indexes test item, `axis=1` indexes layer's units.                                           |
| 'goutput'    | (DATA_LEN, LAYER_SIZE)             | Array of gradients with respect to activation values indexed by test items. `axis=0` indexes test item, `axis=1` indexes layer's units.                                          |
| 'sgweights'  | (LAYER_SIZE, L_INP_SIZE)           | The (reduced) sum of 'gweights' along `axis=0`.                                                                                                                                  |
| 'sgbiases'   | (LAYER_SIZE, )                     | The (reduced) sum of 'gbiases' along `axis=0`.                                                                                                                                   |
<br>
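
To see what "the (reduced) sum along `axis=0`" means for the `'sgweights'` entry, the sketch below uses nested Python lists with tiny made-up sizes in place of numpy arrays. It assumes the per-item gradients are laid out as (test items, receiving units, sending units), as the axis descriptions in the table state; the values themselves are arbitrary.

```python
DATA_LEN, LAYER_SIZE, L_INP_SIZE = 3, 2, 4  # tiny made-up sizes

# gweights[item][receiver][sender]: one gradient matrix per test item
gweights = [
    [[float(item + recv + send) for send in range(L_INP_SIZE)]
     for recv in range(LAYER_SIZE)]
    for item in range(DATA_LEN)
]

# Sum over test items (axis 0), leaving a (LAYER_SIZE, L_INP_SIZE) matrix
sgweights = [
    [sum(gweights[item][recv][send] for item in range(DATA_LEN))
     for send in range(L_INP_SIZE)]
    for recv in range(LAYER_SIZE)
]

print(len(sgweights), len(sgweights[0]))  # 2 4
print(sgweights[0][0], sgweights[1][3])   # 3.0 15.0
```

With numpy arrays the same reduction is written `gweights.sum(axis=0)`, and `'sgbiases'` is obtained from `'gbiases'` in the same way.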

With the runlog structure under our belt, we can do something useful with it. First, let us do a very basic visualization of our example network's learning over time. For this we will need to load the stored runlog file, access the appropriate values, and use a plotting function to view the graph. To load the runlog, we will use the [`pickle`](https://docs.python.org/3/library/pickle.html) library, and to plot the data we will use [`pyplot`](https://matplotlib.org/users/pyplot_tutorial.html), so let's import these tools first.


In [None]:
import pickle
import matplotlib.pyplot as plt

Now, let's define a couple of functions for loading and plotting the data:

In [None]:
def load_runlog(runlog_path):
    ''' returns the unpickled runlog stored at runlog_path '''
    
    with open(runlog_path, 'rb') as runlog_file:
        runlog = pickle.load(runlog_file)
    return runlog

def plot_loss(runlog):
    ''' plots the learning curve for data stored in runlog'''
    
    # get training loss and epochs data
    data, enums = runlog['loss_data']['vals'], runlog['loss_data']['enums']
    
    # plot training data
    plt.plot(enums, data)
    plt.title('Plot of loss over time')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.grid()
    plt.show()
    

Using these functions we can view the training loss over time:

In [None]:
path = saver.logdir + '/' + 'runlog_0.pkl'
runlog = load_runlog(path)

plot_loss(runlog)

Let us now inspect the `'test_data'` part of the runlog. Below, for each key-value pair of the second (`[1]`) snapshot in the `test_data` log, we print out the key and the shape of its value. However, if the value is a `dict` (i.e. if there's a deeper layer-level structure), we start a similar inner loop to print out the contents of this `dict`.

In [None]:
test_data = runlog['test_data']

# test_data is a list of individual snapshots
print('test_data is an instance of {} of length {}\n'.format(type(test_data), len(test_data)))

# We can iterate over an arbitrary snapshot, summarizing each of its values:

# loop through key, value pairs of the second entry in test_data
for k, v in test_data[1].items():
    
    # if the value is a dict instance, start an inner loop that goes through that dict's items
    if isinstance(v, dict):
        
        # print the name of the layer
        print(k+': {')
        
        # for each key and value, print the key and the shape of the value
        for sub_k, sub_v in v.items():
            text = '\t {:10} : {}'.format(sub_k, sub_v.shape)
            print(text)
        print('}')
            
    # if test_data value is not a dict, just print its key and shape
    else:
        print('{:10} : {}'.format(k, v.shape))
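
Beyond printing shapes, the model-level keys make it easy to summarize how test performance changes across snapshots. The following sketch is one possible way to do that: `summarize_snapshots` is a hypothetical helper, not part of FFBP, and the toy snapshots use plain lists with made-up values where a real runlog holds numpy arrays.

```python
def summarize_snapshots(test_data):
    """Return (epoch number, total loss over all test patterns) for each snapshot."""
    return [(int(snap['enum']), float(sum(snap['loss']))) for snap in test_data]

# Made-up snapshots: plain lists stand in for the numpy arrays of a real runlog
toy_test_data = [
    {'enum': 0,   'loss': [0.5, 1.5, 1.0]},
    {'enum': 500, 'loss': [0.25, 0.5, 0.25]},
]

for enum, total in summarize_snapshots(toy_test_data):
    print('epoch {:4d} : total test loss {:.3f}'.format(enum, total))
```

The same function should accept `runlog['test_data']` directly, since the built-in `sum` also iterates over one-dimensional numpy arrays.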