# Training a Neural Network Model with FFBP
This notebook shows how to train a feedforward neural network using pdpyflow's FFBP package and Tensorflow. The FFBP package is intended to simplify the process of constructing a [Tensorflow Graph](https://www.tensorflow.org/programmers_guide/graphs) for neural network modeling. A Tensorflow graph is a computational structure that *describes* the flow of data (tensors) through various computational operations. Thus, constructing a graph and running it are separate steps.

In order to train or test a neural network we need to follow three steps:
- Prepare input data ([tutorial notebook](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/prepare_input.ipynb)) [&#x21F1;](#step1)
- Construct model ([tutorial notebook](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/build_model.ipynb)) [&#x21F1;](#step2)
- Run model [&#x21F1;](#step3)

We begin by importing the packages required for this tutorial and creating a [Tensorflow Graph](https://www.tensorflow.org/programmers_guide/graphs). All network and data [queue](https://www.tensorflow.org/versions/r0.12/api_docs/python/io_ops/queues) elements must be added to this graph, so the cells below build them inside `with FFBP_GRAPH.as_default():` blocks.

In [None]:
import tensorflow as tf
import FFBP
tf.logging.set_verbosity(tf.logging.ERROR) # Prevent unwanted logging messages by tensorflow

FFBP_GRAPH = tf.Graph()

<a id='step1'> </a>
## Prepare Input Data
Next, we need to create `FFBP.InputData` objects for training and testing. Refer to the corresponding [notebook tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/prepare_input.ipynb) to learn more about creating an `InputData` object. In the cell below we create two `InputData` objects, `TRAIN_DATA` and `TEST_DATA`, which will be used for training and testing, respectively. Note that, by convention, we capitalize the names of variables that are referenced across notebook cells (e.g. `FFBP_GRAPH`, `NUM_EPOCHS`).

In [None]:
NUM_EPOCHS = 1000

with FFBP_GRAPH.as_default():
    
    # Create data for training
    TRAIN_DATA = FFBP.InputData(
        path_to_data_file = 'materials/auto_data_train.txt',
        num_epochs = NUM_EPOCHS,
        batch_size = 4,
        data_len = 8,
        inp_size = 8, 
        targ_size = 8,
        shuffle_seed = None
    )
    # Create data for testing
    TEST_DATA = FFBP.InputData(
        path_to_data_file = 'materials/auto_data_test.txt',
        num_epochs = NUM_EPOCHS,
        batch_size = 1,
        inp_size = 8, 
        targ_size = 8,
        data_len = 15
    )
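
As a rough check on the `batch_size` and `data_len` settings above, the sketch below counts how many batches make up one epoch. It assumes that an epoch presents every pattern in the data file exactly once; `batches_per_epoch` is a hypothetical helper written for illustration and is not part of FFBP.

```python
import math

def batches_per_epoch(data_len, batch_size):
    """Number of batches needed to present every pattern once (assumed epoch definition)."""
    return math.ceil(data_len / batch_size)

# Values taken from the TRAIN_DATA and TEST_DATA definitions above
train_batches = batches_per_epoch(data_len=8, batch_size=4)
test_batches = batches_per_epoch(data_len=15, batch_size=1)
print(train_batches, test_batches)
```

Under that assumption, training takes two batches of four patterns per epoch, while testing presents the fifteen test patterns one at a time.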

<a id='step2'> </a>
## Construct Model
Next, let's outline the network structure and specify the flow of data through it. A more detailed description of how this is done is provided in a separate [tutorial](https://github.com/alex-ten/pdpyflow/blob/master/tutorials/building_models/build_model.ipynb). Here we set the same weight initialization range for the hidden and output layers by defining and referencing a single weight range (`wr`) variable.

In [None]:
wr = 0.25
lr = .4
m = 0

# Add network components to the graph
with FFBP_GRAPH.as_default():
    model_name = 'autoencoder'
    with tf.name_scope(model_name):
        
        # Create input and target placeholders
        input_  = tf.placeholder(dtype = tf.float32, shape=[None, 8], name='model_inp')
        target = tf.placeholder(dtype = tf.float32, shape=[None, 8], name='targets')
        
        # Create first hidden layer
        hidden_layer = FFBP.BasicLayer(
            layer_name = 'hidden_layer', 
            layer_input = input_, 
            size = 3, 
            wrange = [-wr, wr], 
            nonlin = tf.nn.sigmoid, 
            bias = True, 
            seed = None
        )
        
        # Create output layer, which takes the hidden layer's output as its input
        output_layer = FFBP.BasicLayer(
            layer_name = 'output_layer', 
            layer_input = hidden_layer.output, 
            size = 8, 
            wrange = [-wr, wr], 
            nonlin = tf.nn.sigmoid, 
            bias = True, 
            seed = None
        )
        
        MODEL = FFBP.Model(
            name = model_name,
            layers = [hidden_layer, output_layer],
            train_data = TRAIN_DATA, 
            inp        = input_,
            targ       = target,
            loss       = tf.squared_difference(target, output_layer.output, name='loss_function'),
            optimizer  = tf.train.MomentumOptimizer(lr, m),
            test_data  = TEST_DATA
        )
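
To make the data flow through this 8-3-8 network concrete, here is a minimal pure-Python sketch of a single forward pass. It is an illustration only and not how FFBP or Tensorflow implement it. Two things are assumptions: that weights and biases are both drawn uniformly from `[-wr, wr]`, and that the input pattern is one-hot with the target equal to the input. The helpers `init_layer` and `forward` are hypothetical names.

```python
import math
import random

def init_layer(n_in, n_out, wrange, rng):
    """Draw weights and biases uniformly from wrange (assumed initialization scheme)."""
    lo, hi = wrange
    weights = [[rng.uniform(lo, hi) for _ in range(n_out)] for _ in range(n_in)]
    biases = [rng.uniform(lo, hi) for _ in range(n_out)]
    return weights, biases

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(pattern, weights, biases):
    """Compute sigmoid(pattern . weights + biases) for a single input pattern."""
    return [
        sigmoid(sum(pattern[i] * weights[i][j] for i in range(len(pattern))) + biases[j])
        for j in range(len(biases))
    ]

rng = random.Random(0)
wr = 0.25
hidden_w, hidden_b = init_layer(8, 3, [-wr, wr], rng)
output_w, output_b = init_layer(3, 8, [-wr, wr], rng)

# Assumed one-hot input; for an autoencoder the target equals the input
pattern = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
hidden = forward(pattern, hidden_w, hidden_b)
output = forward(hidden, output_w, output_b)

# Element-wise squared difference, analogous to tf.squared_difference(target, output)
loss = [(t - o) ** 2 for t, o in zip(pattern, output)]
print(len(hidden), len(output), sum(loss))
```

The three hidden activations are the bottleneck through which all eight input units must be reconstructed, which is what training has to learn.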

<a id='step3'> </a>
## Run Model
The model is run inside two for-loops (one nested inside the other): the (inner) train loop and the (outer) run loop. In a single iteration of the run loop, model parameters are initialized either randomly or by restoration from an existing checkpoint directory. A single iteration of the inner train loop corresponds to a single epoch of training/testing. Thus, minimally, a valid run routine would look something like this (in pseudocode):
```python
# start outer run loop
for run_ind in range(NUM_RUNS):
    
    # open new session for existing graph
    with tf.Session(graph=FFBP_GRAPH) as sess:
        
        # initialize variables and start queue
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        coordinator = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coordinator)
        
        # start inner epoch loop
        for i in range(NUM_EPOCHS):
            testloss, snap = MODEL.test_epoch(session=sess, verbose=True)
            loss = MODEL.train_epoch(session=sess, verbose=False)
        
        # stop queues
        coordinator.request_stop()
        coordinator.join(threads)

```
Here, we iterate over `range(NUM_RUNS)` and start a new Tensorflow session for each run. Within each session we initialize global and local variables (this reinitializes model parameters and local queue variables), start the queues, and run the train loop. Inside the train loop we test and then train the model `NUM_EPOCHS` times.

To store test data and to [save/restore](https://www.tensorflow.org/programmers_guide/saved_model) the model, we need to make a few additions. The `FFBP.ModelSaver` class is available for this purpose. Its constructor takes up to two arguments:
- **`restore_from`** : (*default*=`None`) a string path to a restoration directory which contains model checkpoint files. The path can be either absolute (e.g. `/Users/username/path/to/file`) or relative to the working directory of the current notebook.
- **`logdir`** : (*default*=`None`) a string path to the log directory where model parameters and test data will be saved. If `None` a new directory (e.g. `/logdirs/logdir_000`) is automatically created in the same location as the current notebook. Depending on what one decides to save, this directory will contain a checkpoints subdirectory from which model parameters could be restored and `runlog.pkl` file(s) containing test data from a single model run. Some instructions on how to access data from runlogs can be found [here](link).

The saver includes several methods for saving and/or restoring model parameters as well as for storing test data:
- **`init_model`**`(`*`session, init_epoch=0`*`)` : initializes model parameters. If the saver contains a restore path, the parameters will be restored from the existing checkpoint directory. Otherwise, the parameters will be initialized anew based on the value of `wrange` for each individual layer. The `session` argument is required and should be the current session of the run. The user has control over indexing the initial epoch through the `init_epoch` parameter. The value of `init_epoch` is returned if the model is initialized from scratch, and a restored value is returned if the model is restored from a checkpoint.
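- **`save_model`**`(`*`session, model`*`)` : saves a given *`model`* to the saver's `logdir`. The run routine below additionally passes `run_ind`, which indexes the checkpoint directory for that run.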
- **`save_test`**`(`*`snap, run_ind`*`)` : saves a given test snapshot (*`snap`*) to the corresponding runlog indexed by *`run_ind`*.
- **`save_loss`**`(`*`loss, run_ind`*`)` : saves a given *`loss`* to the corresponding runlog indexed by *`run_ind`*.

Below we provide the run routine that executes `3` runs of training, interspersed with occasional tests. Training runs for the given number of epochs or until the train loss falls below `ECRIT`. Regardless of when the training ends, the model will be tested one final time and saved. Model parameters and test data go to the corresponding log directory, each run producing new files indexed by `run_ind` (e.g. for `run_ind=0`, the model is saved to `checkpoints_directory_0`, and test data is written to `runlog_0.pkl`).

We define `TEST_EPOCHS` beforehand as a list of integer values: `[i for i in range(0,NUM_EPOCHS,100)]` (if you are not familiar with Python's list comprehensions, consult this [page](http://www.pythonforbeginners.com/basics/list-comprehensions-in-python) or search for 'python list comprehensions'). Thus, the model will be tested every hundredth epoch, starting with epoch 0. `SAVE_EPOCHS` is a similar list with a single element corresponding to the last epoch. Try out the code below, tweaking parameters as you please. Take note of new files that get added to this notebook's working directory in the process.
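
To see what these lists contain for `NUM_EPOCHS = 1000`, the short standalone snippet below evaluates them outside of any Tensorflow context.

```python
NUM_EPOCHS = 1000

# Same values as the list comprehension described above
TEST_EPOCHS = [i for i in range(0, NUM_EPOCHS, 100)]
SAVE_EPOCHS = [NUM_EPOCHS - 1]

print(TEST_EPOCHS)  # [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
print(SAVE_EPOCHS)  # [999]
```

Note that the last epoch (`999`) is not in `TEST_EPOCHS`, which is why the run routine performs the final test under a separate condition.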

In [None]:
# Set up run parameters
NUM_RUNS = 3
TEST_EPOCHS = [i for i in range(0,NUM_EPOCHS,100)]
SAVE_EPOCHS = [NUM_EPOCHS-1]
ECRIT = 0.01

# Create ModelSaver
saver = FFBP.ModelSaver(restore_from=None, logdir=None)

for run_ind in range(NUM_RUNS):
    print('>>> RUN {}'.format(run_ind))
    
    with tf.Session(graph=FFBP_GRAPH) as sess:

        # restore or initialize FFBP_GRAPH variables:
        start_epoch = saver.init_model(session=sess)

        # create coordinator and start queue runners
        coordinator = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(coord=coordinator)

        for i in range(start_epoch, start_epoch + NUM_EPOCHS):
            # Test model occasionally
            if i in TEST_EPOCHS:
                testloss, snap = MODEL.test_epoch(session=sess, verbose=True)
                saver.save_test(snap, run_ind)

            # Run one training epoch
            trainloss = MODEL.train_epoch(session=sess, verbose=False)
            saver.save_loss(trainloss, run_ind)

            # Save model occasionally
            if i in SAVE_EPOCHS:
                saver.save_model(session=sess, model=MODEL, run_ind=run_ind)

            # Do final test, stop queues, and break out from training loop
            if trainloss < ECRIT or i == start_epoch + (NUM_EPOCHS - 1): 
                print('Final test ({})'.format(
                    'loss < ecrit' if trainloss < ECRIT else 'num_epochs reached'))

                testloss, snap = MODEL.test_epoch(session=sess, verbose=True)
                saver.save_test(snap, run_ind)

                coordinator.request_stop()
                coordinator.join(threads)

                saver.save_model(session=sess, model=MODEL, run_ind=run_ind)
                break
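
The stopping logic of the train loop can be examined without Tensorflow. The sketch below replays the same conditions on a precomputed list of losses. `simulate_run` and the geometrically decaying loss curve are hypothetical stand-ins for `MODEL.train_epoch` and real training, used only to show when the loop exits.

```python
def simulate_run(losses, num_epochs, ecrit, test_epochs, start_epoch=0):
    """Replay the train loop's test and stop conditions on precomputed losses."""
    tested = []
    for i in range(start_epoch, start_epoch + num_epochs):
        # Occasional test
        if i in test_epochs:
            tested.append(i)

        # Stand-in for one training epoch
        trainloss = losses[i - start_epoch]

        # Final test and exit, as in the run routine above
        if trainloss < ecrit or i == start_epoch + (num_epochs - 1):
            reason = 'loss < ecrit' if trainloss < ecrit else 'num_epochs reached'
            tested.append(i)
            return i, reason, tested

# Hypothetical loss curve: starts at 0.5 and shrinks by 10% per epoch
fake_losses = [0.5 * 0.9 ** epoch for epoch in range(1000)]
last_epoch, reason, tested = simulate_run(
    fake_losses, num_epochs=1000, ecrit=0.01, test_epochs=range(0, 1000, 100))
print(last_epoch, reason, tested)
```

With this made-up curve the loss drops below `ECRIT` long before `NUM_EPOCHS` is reached, so the run ends early with a final test. A loss that never reaches the criterion would instead exit on the last epoch with `'num_epochs reached'`.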