This notebook is an introduction to using the Tensorflow framework.

We will cover the components needed for a simple feedforward neural network.

Key concepts & Tensorflow paradigms covered:
- Placeholders
- Variables
- Initializers
- Sessions

In [17]:
import numpy as np
import tensorflow as tf

## Tensorflow Graph Construction

In [18]:
#  we use a set of tuples to define the network structure
input_shape = (4,)  #  the length of the input numpy array
input_nodes = (6,)  #  the number of nodes in the input layer
hidden_nodes = (8,)  #  the number of nodes in the hidden layer
output_shape = (4,)  #  num. nodes in output layer and the length of the output numpy array

The first Tensorflow paradigm is the **Placeholder**.

Tensorflow uses placeholders to feed data into the network.

The placeholder is fed using numpy arrays.

The first dimension is the number of samples in the batch - we use None to be able to input any batch size we want.

The second dimension is the shape of one sample - i.e. the length of the input numpy array.

In [3]:
network_input = tf.placeholder(tf.float32, shape=(None, *input_shape), name='network_input')
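
To make the role of `None` concrete, here is a minimal NumPy sketch of the shape rule described above. The helper `matches_placeholder` is hypothetical (it is not part of Tensorflow); it only imitates the idea that `None` accepts any size in that dimension while every other dimension must match exactly.

```python
import numpy as np


def matches_placeholder(array, placeholder_shape):
    """Hypothetical helper: True if the array fits a shape where None means 'any size'."""
    if array.ndim != len(placeholder_shape):
        return False
    return all(expected is None or expected == actual
               for expected, actual in zip(placeholder_shape, array.shape))


placeholder_shape = (None, 4)  # same shape as network_input

single_sample = np.zeros((1, 4))   # a batch of one sample
large_batch = np.zeros((128, 4))   # a batch of 128 samples
wrong_width = np.zeros((10, 3))    # samples have the wrong length
flat_sample = np.zeros(4)          # missing the batch dimension

print(matches_placeholder(single_sample, placeholder_shape))  # True
print(matches_placeholder(large_batch, placeholder_shape))    # True
print(matches_placeholder(wrong_width, placeholder_shape))    # False
print(matches_placeholder(flat_sample, placeholder_shape))    # False
```

Note that a single sample still needs a batch dimension of size one, which is why `flat_sample` does not fit.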

The next two Tensorflow paradigms are the **Variable** and **Initializer**.  

Variable objects hold tensors whose values Tensorflow can change during training.  Both weights and biases are tf.Variables.

We also need to tell Tensorflow what initial values we want our variables to have.

To do the initialization we pass a tf.random_normal tensor as the initial value.  The shape of the Variable is the shape given to tf.random_normal.

The * operation is used to unpack the tuples.

In [4]:
#  the name argument belongs to tf.Variable, not to tf.random_normal
input_weights = tf.Variable(tf.random_normal(shape=(*input_shape, *input_nodes)), name='input_weights')
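
The tuple unpacking used in the shapes can be checked in plain Python, without Tensorflow. This small sketch reuses the `input_shape` and `input_nodes` tuples defined earlier; the names `placeholder_shape`, `weights_shape` and `bias_shape` are only introduced here for illustration.

```python
input_shape = (4,)
input_nodes = (6,)

#  unpacking splices the contents of each tuple into a new tuple
placeholder_shape = (None, *input_shape)
weights_shape = (*input_shape, *input_nodes)
bias_shape = (*input_nodes,)

print(placeholder_shape)  # (None, 4)
print(weights_shape)      # (4, 6)
print(bias_shape)         # (6,)
```

Keeping the sizes in tuples means the same code would still work if a sample had more than one dimension.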

We use the same pattern for the biases, which we initialize to zeros.

In [5]:
input_bias = tf.Variable(tf.zeros(shape=(*input_nodes,)), name='input_bias')

Now we can form the input layer using matrix multiplication between the input & weights, then add the biases.

In [6]:
pre_activation = tf.add(tf.matmul(network_input, input_weights), input_bias)

Finally we can form the layer by passing the result through a rectified linear unit (ReLU).

In [7]:
input_layer = tf.nn.relu(pre_activation, name='input_layer_output')
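
The same layer can be sketched in NumPy to see the shapes involved. This is an illustration of the arithmetic only, not how Tensorflow executes it; the random values and the batch size of 3 are assumptions made for the example.

```python
import numpy as np

rng = np.random.RandomState(0)  # fixed seed, only so the example is repeatable

batch = rng.rand(3, 4)       # 3 samples, each of length 4 (input_shape)
weights = rng.randn(4, 6)    # shape (*input_shape, *input_nodes)
bias = np.zeros(6)           # shape (*input_nodes,)

#  matrix multiplication, then the bias is broadcast across the batch
pre_activation = batch @ weights + bias

#  the rectified linear unit replaces negative values with zero
layer_output = np.maximum(pre_activation, 0.0)

print(pre_activation.shape)        # (3, 6)
print(layer_output.min() >= 0.0)   # True
```

Each row of `layer_output` is the activation of the six input-layer nodes for one sample.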

We create a hidden layer using the same logic.

In [8]:
h_w = tf.Variable(tf.random_normal((*input_nodes, *hidden_nodes)), name='hidden_weights')
h_b = tf.Variable(tf.zeros((*hidden_nodes, )), name='hidden_bias')
hidden_layer = tf.nn.relu(tf.add(tf.matmul(input_layer, h_w), h_b))

The output layer has no activation function (aka a linear activation function).  This allows the network to output negative values.

In [9]:
o_w = tf.Variable(tf.random_normal((*hidden_nodes, *output_shape)), name='output_weights')
o_b = tf.Variable(tf.zeros((*output_shape, )), name='output_bias')
output = tf.add(tf.matmul(hidden_layer, o_w), o_b)
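
As a quick sanity check on the network we have just built, we can count its trainable parameters by hand. The helper `count_parameters` is hypothetical and written only for this illustration; the layer sizes come from the tuples defined at the top of the notebook.

```python
#  input length, input layer nodes, hidden layer nodes, output layer nodes
layer_sizes = [4, 6, 8, 4]


def count_parameters(sizes):
    """Hypothetical helper: weights plus biases for a stack of dense layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out  # one weight per input/output pair
        total += n_out         # one bias per output node
    return total


print(count_parameters(layer_sizes))  # 122
```

The three layers contribute 30, 56 and 36 parameters respectively.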

All of the code above allows us to make predictions with our network - i.e. we can do forward passes across the network.

Below we will set up the code for training.

We first need another placeholder.  This will serve as the target value for the network (aka y_train) - what our network should be outputting.

In [10]:
target = tf.placeholder(tf.float32, shape=(None, *output_shape), name='target')

In [11]:
loss = tf.losses.mean_squared_error(target, output, scope='loss_function')
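
The mean squared error is the average of the squared differences between target and output. Here is a minimal NumPy sketch with hand-picked numbers; it assumes the loss is averaged over every element, which as far as we know is what tf.losses.mean_squared_error does with its default arguments.

```python
import numpy as np

target = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
output = np.array([[1.0, 1.0],
                   [1.0, 1.0]])

#  element-wise squared differences: [[0, 1], [4, 9]]
squared_error = (target - output) ** 2

#  average over all four elements: (0 + 1 + 4 + 9) / 4
loss_value = squared_error.mean()

print(loss_value)  # 3.5
```

Because the errors are squared, the largest miss (3 vs 1 on the last element) dominates the loss.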

We need an optimizer to do the heavy lifting of training the network.  

Here we use the Adam optimizer.  Note that the learning rate is one of the most important hyperparameters in training any neural network.

In [12]:
optimizer = tf.train.AdamOptimizer(learning_rate=0.001)

And finally a Tensorflow operation to actually do the training.

In [13]:
train_op = optimizer.minimize(loss)
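
To get a feel for what the optimizer does with the learning rate, here is a sketch of one Adam update in plain Python, following the textbook form of the algorithm. It is not Tensorflow's implementation, which may differ in small details such as where epsilon is applied. The function `adam_step` and the toy loss `x ** 2` are assumptions made for this example.

```python
import math


def adam_step(param, grad, m, v, step, learning_rate=0.001,
              beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Hypothetical helper: one textbook Adam update for a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad        # running average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # running average of the squared gradient
    m_hat = m / (1 - beta1 ** step)           # bias correction, step counts from 1
    v_hat = v / (1 - beta2 ** step)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v


#  minimise the toy loss x ** 2, starting from x = 1.0
x, m, v = 1.0, 0.0, 0.0
for step in range(1, 101):
    grad = 2 * x  # gradient of x ** 2
    x, m, v = adam_step(x, grad, m, v, step)

print(x)  # a little above 0.9 after 100 steps
```

On the first step the parameter moves by almost exactly the learning rate, whatever the size of the gradient. With a learning rate of 0.001, a hundred steps therefore move each parameter by about 0.1 at most, which is consistent with the slow, steady fall of the loss in the training output further down.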

Now all of the machinery to do forward passes and to back propagate error is in place.  

We add a few Tensorflow summary operations to track what is going on in our network.

In [19]:
tf.summary.tensor_summary('input', network_input)
tf.summary.histogram('input_weights', input_weights)
tf.summary.histogram('output_bias', o_b)
tf.summary.scalar('loss', loss)
merged = tf.summary.merge_all()

## Running a Tensorflow Session

The first thing our model needs is data.  The function below generates training data for a simple function - the target is the input plus 10.

In [15]:
def generate_data(num_samples):
    """
    Generates training data for our network.

    args
        num_samples (int)

    returns
        net_in (np.array)  fed into the network (aka features or x)
        net_out (np.array)  aka target or y - the value we are trying to approx.
    """
    net_in = np.random.rand(num_samples, *input_shape)
    net_out = net_in + 10

    return net_in, net_out

#  run the function to get data to train with
inputs, targets = generate_data(100)

We now introduce another key Tensorflow paradigm - the Session.

You can think of a Session as one instance of the model.  

In [16]:
with tf.Session() as sess:
    
    #  first a necessary bit of Tensorflow boilerplate - initializing variables
    #  we do this operation using the Session
    sess.run(tf.global_variables_initializer())
    
    #  we create a FileWriter object to write out summaries to a file for Tensorboard to read
    writer = tf.summary.FileWriter('./logs', graph=sess.graph)
    
    #  now the training loop
    #  we are not splitting our training data into batches
    
    for train_step in range(100):
        #  now we run the tensorflow graph
        #  the graph is run by calling the .run method on the session
        #  the run method takes two inputs:
        #   fetches = the tf operations to run
        #   feed_dict = values for the placeholders

        #  here we fetch three operations
        #   loss - the current value of the loss function
        #   train_op - the operation to train the network
        #   merged - the summary operations for the graph
        fetches = [loss, train_op, merged]

        #  the feed_dict is a dictionary with
        #   keys = the placeholders
        #   values = the numpy arrays
        #   note that we feed in multiple samples
        feed_dict = {network_input: inputs,
                     target: targets}
        #  finally we run the session using the fetches and feed_dict
        loss_value, _, summary = sess.run(fetches, feed_dict)
        #  the operation to add the summary to the tensorboard output file
        writer.add_summary(summary, train_step)
        print('step {} loss {}'.format(train_step, loss_value))        
        
    #  now training is done
    #  generate a test set 
    test_in, test_out = generate_data(5)

    #  here we get predictions from our network - we don't train
    pred = sess.run(output, {network_input: test_in})
    print(pred)
    print(test_out)

    #  close the FileWriter so that all summaries are flushed to disk
    writer.close()

step 0 loss 109.13784790039062
step 1 loss 108.73883056640625
step 2 loss 108.33975219726562
step 3 loss 107.94058227539062
step 4 loss 107.54119873046875
step 5 loss 107.14191436767578
step 6 loss 106.74327850341797
step 7 loss 106.34396362304688
step 8 loss 105.94388580322266
step 9 loss 105.5434341430664
step 10 loss 105.14512634277344
step 11 loss 104.74711608886719
step 12 loss 104.3487319946289
step 13 loss 103.94924926757812
step 14 loss 103.54931640625
step 15 loss 103.14877319335938
step 16 loss 102.74767303466797
step 17 loss 102.34693145751953
step 18 loss 101.94611358642578
step 19 loss 101.54527282714844
step 20 loss 101.1436996459961
step 21 loss 100.74148559570312
step 22 loss 100.33831024169922
step 23 loss 99.93470001220703
step 24 loss 99.53063201904297
step 25 loss 99.12427520751953
step 26 loss 98.71720886230469
step 27 loss 98.30912017822266
step 28 loss 97.89996337890625
step 29 loss 97.49026489257812
step 30 loss 97.0799331665039
step 31 loss 96.66816711425781
st
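
The training loop above feeds all 100 samples into the network on every step. If we did want to split the data into batches, one way to do it is sketched below in NumPy. The generator `iterate_minibatches` and the batch size of 32 are assumptions for this example and are not used anywhere else in the notebook.

```python
import numpy as np


def iterate_minibatches(inputs, targets, batch_size, rng):
    """Hypothetical helper: yields shuffled (inputs, targets) batches covering the data once."""
    indices = rng.permutation(len(inputs))
    for start in range(0, len(inputs), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield inputs[batch_idx], targets[batch_idx]


rng = np.random.RandomState(0)  # fixed seed, only so the example is repeatable
inputs = rng.rand(100, 4)
targets = inputs + 10

batch_shapes = [batch_in.shape
                for batch_in, _ in iterate_minibatches(inputs, targets, 32, rng)]
print(batch_shapes)  # [(32, 4), (32, 4), (32, 4), (4, 4)]
```

Each batch could then be passed to the session through the feed_dict in place of the full arrays.  The last batch is smaller, which is fine because the first dimension of both placeholders is None.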

To view the data logged to tensorboard, run

```
$ cd dsr_rl/practical/generic_lessons
$ tensorboard --logdir='./logs'
```

Now open a browser and go to
`http://localhost:6006/`

You should be able to see the tensorboard server.