# Working with TensorFlow

**Objective**: Build a deep learning model that can learn the alphabet.

**Agenda**
1. Defining the problem
2. An overview of the end model
3. Short theoretical recap/overview
4. Model implementation
   - Encoding the data
   - Generating the training data
   - Building the computational graph
   - Training
5. Using `TensorBoard` to overview model graph and `loss`/`accuracy` evolution
   - Structuring the graph with name scopes
   - Adding model summaries
   - Bonus: debugging data with `text summary`

## Defining the problem

The objective is to **train a deep learning model to learn the alphabet**, but this alone doesn't shed any light on how we can tackle the problem.

Since deep learning models have proven to be very good at classification tasks, we need to 'reshape' the problem as a classification problem.

The new 'shape' of the problem looks like this:

> Create a deep learning model that when given a sequence of consecutive letters will output, for each letter of the alphabet, the probability of it being the next letter in the sequence.


**Example**
Given `KLM` as input, the model would output something like this:
```
   Letter  Probability
  ---------------------
   A        0.07560498
   B        0.01971263
   C        0.01407314
   D        0.00286496
   E        0.01043301
   F        0.01803329
   G        0.03739211
   H        0.00691894
   I        0.0135167
   J        0.08230913
   K        0.02166412
   L        0.01301833
   M        0.00820917
   N        0.31165153
   O        0.01616119
   P        0.05539528
   Q        0.00634293
   R        0.01654692
   S        0.06636301
   T        0.00361082
   U        0.02698876
   V        0.00770648
   W        0.07386798
   X        0.05238049
   Y        0.01679033
   Z        0.0224438
```
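
Reading the table above amounts to picking the entry with the highest probability. A minimal sketch in plain Python, using the values listed above (the variable names are illustrative and not part of the workshop code):

```python
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Probabilities copied from the example output, in alphabetical order.
probabilities = [
    0.07560498, 0.01971263, 0.01407314, 0.00286496, 0.01043301,
    0.01803329, 0.03739211, 0.00691894, 0.0135167, 0.08230913,
    0.02166412, 0.01301833, 0.00820917, 0.31165153, 0.01616119,
    0.05539528, 0.00634293, 0.01654692, 0.06636301, 0.00361082,
    0.02698876, 0.00770648, 0.07386798, 0.05238049, 0.01679033,
    0.0224438,
]

# The predicted letter is the one with the highest probability.
best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
predicted_letter = alphabet[best_index]
print(predicted_letter)  # N
```

For the input `KLM` the most likely next letter is `N`, which is the expected answer.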

## An overview of the end model

Before digging into the code, it's good to pause and consider the model architecture.

The model is quite simple: at its core it has just two deep learning components:
1. A **Long Short-Term Memory** (`LSTM`) cell followed by
2. A **fully connected** (`dense`) layer

Although these two components fully define the model we'll be building, they don't define the full computational graph. To have a full (and *functional*) computational graph we'll also need the following components:
- A **placeholder** node which will feed the input data to the model
- Another **placeholder** that will receive the labeled data in training
- A node to measure the **loss** during training, that is, *how far the predicted output is from the expected output*
- A node to measure the **accuracy** of the model, also during training

An overview of the graph is in the image below.

![Model architecture](./img/model.png)

**Remarks**:
1. The image contains an additional node `predictions`, which is in effect the output of the `dense` layer and therefore part of it. However, it's better to keep it as a separate node in order to have a better view of how the data flows through the graph.
2. The *real* computational graph contains a lot more nodes but those nodes pertain more to the inner workings of TensorFlow than to our objective so we won't concentrate on them.

## Short theoretical recap/overview

The theoretical foundations of deep learning are outside the scope of this workshop, but we still need some notion of theory in order to build our model. So, let's take a quick dive into it.

**Fully connected a.k.a. dense layer**
![Dense layer](./img/dense.png)

- A fully connected layer is a layer in which every input node is connected to every output/hidden node
- Each hidden/output node calculates its value as the weighted sum of the values of the input nodes $h_j=\sum_i{x_i\cdot{w_{i,j}}}$
- This operation is nothing else but a **matrix multiplication** of the *weights matrix* $\pmb{W}$ with the *column vector* $\pmb{x}$
- The *actual* output of the network is the value from the output nodes passed through the *activation function* $y_j=\sigma{(h_j)}$
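
The weighted sum and the activation described above can be sketched in NumPy. The layer sizes, the weight values, and the choice of the logistic function for $\sigma$ are assumptions made purely for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, used here as one possible choice of sigma."""
    return 1.0 / (1.0 + np.exp(-z))

# Three input nodes and two output nodes; W[i, j] links input i to output j.
x = np.array([1.0, 2.0, 3.0])
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])

# Weighted sum written out node by node ...
h_loop = np.array([sum(x[i] * W[i, j] for i in range(3)) for j in range(2)])
# ... and the same computation as a single matrix product.
h = W.T @ x

# The layer output is the activation applied to each weighted sum.
y = sigmoid(h)
print(h)  # approximately [2.2 2.8]
```

Both ways of computing `h` give the same result; the matrix form is the one deep learning libraries use because it maps well onto optimized linear algebra routines.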

**Long Short-Term Memory networks**
![LSTM network](./img/lstm-chain.png)

- LSTMs are a special type of Recurrent Neural Network capable of learning long-term dependencies
- They do so by passing the input through several gates (formulas are here just for fun):

  $$
    f_t=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)
  $$

  $$
    i_t=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)
  $$

  $$
    \widetilde{C}_t=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)
  $$

- Afterwards the cell state is updated:

  $$
    C_t=f_t*C_{t-1}+i_t*\widetilde{C}_t
  $$

- Then the outputs are calculated:
  $$
    o_t=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)
  $$

  $$
    h_t=o_t*\tanh(C_t)
  $$

- Although theoretically a Recurrent Neural Network can process sequences of arbitrary length, in practice the network is unrolled into a (parameterized) number of concatenated cells
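
To make the gate equations above more concrete, here is a minimal NumPy sketch of a single LSTM step, unrolled over a short sequence. It is not how TensorFlow implements its cells; the helper `random_params`, the hidden size of 4, and the input values are hypothetical choices for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the gate equations above."""
    hx = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(p['W_f'] @ hx + p['b_f'])       # forget gate
    i_t = sigmoid(p['W_i'] @ hx + p['b_i'])       # input gate
    c_tilde = np.tanh(p['W_C'] @ hx + p['b_C'])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde            # updated cell state
    o_t = sigmoid(p['W_o'] @ hx + p['b_o'])       # output gate
    h_t = o_t * np.tanh(c_t)                      # new hidden state
    return h_t, c_t

def random_params(input_size, hidden_size, seed=2018):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.RandomState(seed)
    params = {}
    for gate in ('f', 'i', 'C', 'o'):
        params['W_' + gate] = 0.1 * rng.randn(hidden_size, hidden_size + input_size)
        params['b_' + gate] = np.zeros(hidden_size)
    return params

# Unroll the same cell (same weights) over a sequence of 5 scalar inputs.
hidden_size = 4
params = random_params(input_size=1, hidden_size=hidden_size)
h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for value in [0.1, 0.2, 0.3, 0.4, 0.5]:
    h, c = lstm_step(np.array([value]), h, c, params)

print(h.shape)  # (4,)
```

The loop is the "unrolling" mentioned above: the same weights are reused at every step, and only the hidden state `h` and the cell state `c` carry information forward.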

## Model implementation

Let's start with importing the required modules: `numpy` for numeric manipulation and obviously `tensorflow` for the model.

In [None]:
import numpy as np
import tensorflow as tf

In TensorFlow 1.x, LSTM cells are defined in `tensorflow.contrib.rnn`, so let's import that:

In [None]:
from tensorflow.contrib import rnn

We'll also need some utility functions:

In [None]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

Let's set the random seed in order to obtain reproducible results:

In [None]:
np.random.seed(2018)

And let's define a string to hold the letters of the alphabet:

In [None]:
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

### Encoding the data

Since we'll be doing a bunch of matrix multiplications, we cannot feed the model a sequence of letters directly; we need to encode the sequence into a numerical form.

To do so, we'll take the easiest approach and just assign each letter a zero-based index.

We'll create a class that holds the mappings between each letter and its index and between each index and its associated letter.

The class exposes three methods:
- `encode_sequence` will receive a sequence of letters and will output the sequence of corresponding indices
- `encode_letter` will receive a single letter and will output the corresponding index
- `decode_letter` will receive the predicted index and will return the corresponding letter

In [None]:
class Encoding():
    def __init__(self):
        self._char_to_int = dict((c, i) for i, c in enumerate(alphabet))
        self._int_to_char = dict((i, c) for i, c in enumerate(alphabet))

    def encode_sequence(self, sequence):
        return [self._char_to_int[char] for char in sequence]

    def encode_letter(self, letter):
        return self._char_to_int[letter]

    def decode_letter(self, value):
        return self._int_to_char[value]


Let's do a small test:

In [None]:
e = Encoding()
print('DEF->{}'.format(e.encode_sequence('DEF')))
print('A->{}'.format(e.encode_letter('A')))
print('24->{}'.format(e.decode_letter(24)))

### Generating the training data

Before building a model we need to have a data set on which to train the model.

Again, since the model is quite simple we can just generate random sequences of consecutive letters as the input data and grab the next letter in the alphabet as the label.

To do so, we'll define a new class called `Dataset`:

In [None]:
class Dataset():
    def __init__(self, size=1000, seq_length=5, print_data=True):
        self._size = size
        self._sequence_length = seq_length
        self._inputs = []
        self._labels = []
        self._print_data = print_data
        self._encoding = Encoding()

    @property
    def inputs(self):
        return self._inputs

    @property
    def labels(self):
        return self._labels

    @property
    def num_classes(self):
        return len(alphabet)

    @property
    def sequence_length(self):
        return self._sequence_length

    def initialize(self):
        self._generate_random_data()
        self._reshape_inputs()
        self._normalize_inputs()
        self._normalize_labels()

    def shuffle(self):
        perm = np.arange(self._size)
        np.random.shuffle(perm)
        self._inputs = self._inputs[perm]
        self._labels = self._labels[perm]


    def _normalize_labels(self):
        self._labels = to_categorical(self._labels, num_classes=self.num_classes)
        self._labels = np.reshape(self._labels, (self._size, 1, self.num_classes))

    def _reshape_inputs(self):
        self._inputs = pad_sequences(self._inputs,
                                     maxlen=self._sequence_length,
                                     dtype='float32')
        self._inputs = np.reshape(self._inputs, (self._size, self._sequence_length, 1))

    def _normalize_inputs(self):
        self._inputs = self._inputs / float(self.num_classes)

    def _generate_random_data(self):
        for i in range(self._size):
            start = np.random.randint(self.num_classes - 2)
            end = np.random.randint(start, min(start + self._sequence_length, self.num_classes - 1))
            input_seq = alphabet[start:end + 1]
            output_seq = alphabet[end + 1]

            if self._print_data:
                print("{}->{}".format(input_seq, output_seq))

            sample = self._encoding.encode_sequence(input_seq)
            label = self._encoding.encode_letter(output_seq)

            self._inputs.append(sample)
            self._labels.append(label)

In God we trust; the rest we test. Let's see if our `Dataset` class behaves as expected.

In [None]:
dataset = Dataset(size=5)
dataset.initialize()
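
To see what the `Dataset` class does to a single sample, the sketch below walks through the same steps by hand with plain NumPy. It mimics the default left-padding of `pad_sequences` and the one-hot encoding of `to_categorical` rather than calling them; the sample `KLM` followed by `N` is chosen for illustration.

```python
import numpy as np

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
seq_length = 5

input_seq, next_letter = 'KLM', 'N'

# Encode letters as zero-based indices.
encoded = [alphabet.index(c) for c in input_seq]          # [10, 11, 12]

# Left-pad with zeros up to seq_length, as pad_sequences does by default.
padded = [0] * (seq_length - len(encoded)) + encoded      # [0, 0, 10, 11, 12]

# Reshape to (seq_length, 1) and scale by the number of classes.
sample = np.array(padded, dtype='float32').reshape(seq_length, 1) / len(alphabet)

# One-hot encode the label and give it shape (1, num_classes).
label = np.zeros((1, len(alphabet)), dtype='float32')
label[0, alphabet.index(next_letter)] = 1.0

print(sample.shape, label.shape)  # (5, 1) (1, 26)
```

One thing the sketch makes visible: the letter `A` is encoded as 0, which is also the value used for padding, so a padded position and an `A` look the same to the model.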

### Building the computational graph

To build the computational graph of the model we just have to link together the operations on tensors.

Each graph has one or more entry points into which the data is fed. In our case we need to define two of these points: one for the input data and another one for the labels of the training samples.

To define our entry points we'll declare two [placeholders](https://www.tensorflow.org/api_docs/python/tf/placeholder) upon which we'll subsequently build our graph:

In [None]:
x = tf.placeholder('float32', shape=(dataset.sequence_length, 1), name='inputs')
y = tf.placeholder('float32', shape=(1, dataset.num_classes), name="labels")

Before creating the LSTM node we need to make sure it will receive input of the proper shape. The proper shape in this case is a list with one tensor per time step, each of shape `[1, 1]` (a batch of one sample with one feature). We can produce it from the input tensor with the [`tf.split`](https://www.tensorflow.org/api_docs/python/tf/split) operation.

In [None]:
x1 = tf.split(x, dataset.sequence_length)
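
The effect of the split is easy to check outside the graph. The sketch below uses `np.split` as a NumPy analogue of `tf.split` (both split along the first axis by default); the array `x` here is a stand-in for one input sample, not the placeholder itself.

```python
import numpy as np

# Stands in for one input sample of shape (sequence_length, 1).
x = np.arange(5, dtype='float32').reshape(5, 1)

# Split along the first axis into 5 equal pieces, one per time step.
steps = np.split(x, 5)

print(len(steps), steps[0].shape)  # 5 (1, 1)
```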

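Next, we'll build the LSTM node. As noted in the recap, a recurrent network is in practice unrolled over a fixed number of steps, so we cannot have infinite sequence length. Here `static_rnn` unrolls the network over the 5 inputs in `x1`, which accommodates the maximum length of the sequences from the dataset. Note that `MultiRNNCell` stacks the cells it receives as layers, so the code below builds a stack of 5 LSTM layers and applies that stack at every time step.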

We need to specify the number of units of the LSTM cell; normally this would be a hyperparameter but for now we can leave it at 32.

In [None]:
inner_cells = [rnn.BasicLSTMCell(num_units=32) for _ in range(dataset.sequence_length)]
rnn_cell = rnn.MultiRNNCell(inner_cells)
outputs, states = rnn.static_rnn(rnn_cell, x1, dtype=tf.float32)

Now that we have the outputs of the LSTM layer, one tensor per time step (we're not interested in intermediary states), we need to take the last output and reshape it to a tensor of shape `[1, 32]`.

To do so, we employ [`tf.nn.embedding_lookup`](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) to get the last output and [`tf.reshape`](https://www.tensorflow.org/api_docs/python/tf/reshape) to reshape.

In [None]:
output = tf.nn.embedding_lookup(outputs, dataset.sequence_length - 1)
output = tf.reshape(output, [1, 32])

Not really necessary in our case but for demonstrative purposes let's implement the famous equation
$$
\hat{\pmb{y}}=\pmb{W}\pmb{x}+\pmb{b}
$$

To do so we need to declare the square matrix $\pmb{W}$ and bias vector $\pmb{b}$ as two variables initialized with random values from a normal probability distribution.

In [None]:
W = tf.Variable(tf.random_normal([32, 32]), name="W")
b = tf.Variable(tf.random_normal([1, 32]), name="b")

Now, the output becomes:

In [None]:
output = tf.matmul(output, W) + b

Now we can link the output of the LSTM layer to a dense layer created with [`tf.layers.dense`](https://www.tensorflow.org/api_docs/python/tf/layers/Dense):

In [None]:
dense = tf.layers.dense(inputs=output, units=dataset.num_classes)

To get the class probabilities from the output of the `dense` layer we need to apply [`tf.nn.softmax`](https://www.tensorflow.org/api_docs/python/tf/nn/softmax) to it, and to get the predicted letter we apply [`tf.argmax`](https://www.tensorflow.org/api_docs/python/tf/argmax), which returns the index of the letter.

In [None]:
probs = tf.nn.softmax(dense)
pred = tf.argmax(probs, 1)
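
What these two nodes compute can be sketched in NumPy. The three-class logits below are made-up values for illustration; the real `dense` output has one value per letter.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)

logits = np.array([[1.0, 3.0, 0.5]])   # shape (1, 3): one sample, three classes
probs = softmax(logits)                # non-negative values that sum to 1
pred = np.argmax(probs, axis=1)        # index of the most probable class

print(pred)  # [1]
```

Because softmax preserves the ordering of its inputs, taking the argmax of the probabilities selects the same index as taking the argmax of the raw logits.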

This is how the data will flow through the graph. But to get meaningful and correct results from the graph we need to add the nodes which will be responsible for minimizing the loss and measuring accuracy during training.

To minimize the loss, we'll define a cost node using [`tf.reduce_mean`](https://www.tensorflow.org/api_docs/python/tf/reduce_mean) and [`tf.nn.softmax_cross_entropy_with_logits_v2`](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2). Afterwards we'll create an `optimizer` node ([`tf.train.AdamOptimizer`](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) in our case) to minimize the cost.

In [None]:
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=dense, labels=y))
optimizer = tf.train.AdamOptimizer().minimize(cost)
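
The quantity behind the cost node is the softmax cross-entropy between the logits and the one-hot label. A minimal NumPy sketch of that formula follows; the function name and the two sets of logits are illustrative assumptions, not TensorFlow code.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Cross-entropy between one-hot labels and softmax(logits), per sample."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(labels * log_probs, axis=-1)

num_classes = 26
labels = np.zeros((1, num_classes))
labels[0, 13] = 1.0                       # the expected letter is N

untrained = np.zeros((1, num_classes))    # all letters equally likely
confident = np.zeros((1, num_classes))
confident[0, 13] = 10.0                   # strongly favours N

loss_untrained = np.mean(softmax_cross_entropy(untrained, labels))
loss_confident = np.mean(softmax_cross_entropy(confident, labels))
print(loss_untrained, loss_confident)
```

A model that considers all 26 letters equally likely has a loss of $\ln 26$, so a loss well below that value during training is a sign that the model is learning something.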

To measure the accuracy we create new nodes using [`tf.metrics.accuracy`](https://www.tensorflow.org/api_docs/python/tf/metrics/accuracy) to which we pass the predictions and the expected index from the label.

In [None]:
accuracy, accuracy_update = tf.metrics.accuracy(labels=tf.argmax(y, 1), predictions=pred, name='acc')
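
`tf.metrics.accuracy` returns two nodes: one that reads the current value and one that updates it. According to its documentation, the value is a running ratio kept in two local variables, `total` and `count`. The plain-Python class below is a hypothetical stand-in that shows the idea; it is not the TensorFlow implementation.

```python
class StreamingAccuracy:
    """Illustrative stand-in for a running accuracy metric."""

    def __init__(self):
        self.total = 0.0   # number of correct predictions seen so far
        self.count = 0.0   # number of predictions seen so far

    def update(self, label, prediction):
        """Record one prediction and return the updated accuracy."""
        self.total += 1.0 if label == prediction else 0.0
        self.count += 1.0
        return self.value()

    def value(self):
        """Current accuracy: correct predictions divided by all predictions."""
        return self.total / self.count if self.count else 0.0

metric = StreamingAccuracy()
for label, prediction in [(13, 13), (4, 7), (25, 25)]:
    metric.update(label, prediction)

print(metric.value())  # 2 of 3 predictions were correct
```

Since the counters are never reset on their own, the reported accuracy reflects every sample seen since the local variables were last initialized.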

The model is ready to be trained now.

### Training the model

Before training the model, let's generate a larger amount of training data.

In [None]:
dataset = Dataset(size=1000, print_data=False)
dataset.initialize()

To train the model we must first initialize both global and local variables (the running totals behind `tf.metrics.accuracy` are stored in local variables).

In [None]:
init = tf.global_variables_initializer()
init_locals = tf.local_variables_initializer()

Then we'll follow the usual drill: initialize a session, run the variable initializers, and train the model for 500 epochs (another hyperparameter).

At each epoch we will shuffle the dataset and feed it to the optimizer one sample at a time. Afterwards we'll run the accuracy and loss nodes over the whole dataset and print the last values: the accuracy is a running value that keeps accumulating across samples and epochs, while the loss is that of the last sample evaluated.

In [None]:
session = tf.Session()
session.run(init)
session.run(init_locals)
epoch = 0
while epoch < 500:
    print("Training epoch {:<4d}".format(epoch), end='\t')
    dataset.shuffle()
    # Train the model
    for instance, label in zip(dataset.inputs, dataset.labels):
        session.run(optimizer, feed_dict={x: instance,
                                          y: label})
    # Calculate accuracy and loss
    for instance, label in zip(dataset.inputs, dataset.labels):
        acc, update_op, loss = session.run([accuracy, accuracy_update, cost],
                                           feed_dict={x: instance,
                                                      y: label})
    print("Accuracy: {:.8f} \tLoss: {:.8f}".format(acc, loss))
    epoch = epoch + 1


Now, let's put our model to the test.

In [None]:
seq = input('Enter a sequence of max 5 consecutive letters:')
seq = seq.upper()
print("You entered {}".format(seq))

Once we have the input sequence, we need to encode, pad, reshape and normalize it in the same way as the training data.

In [None]:
seq = e.encode_sequence(seq)
seq = pad_sequences([seq], dataset.sequence_length)
seq = np.reshape(seq, (dataset.sequence_length, 1))
seq = seq / float(dataset.num_classes)

Then, send it to the trained model.

In [None]:
result = session.run(pred, feed_dict={x: seq})
letter = result[0]
print("The next letter is: {}".format(e.decode_letter(letter)))

Finally, close the session.

In [None]:
session.close()

## References

* Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
* [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
* [TensorFlow API](https://www.tensorflow.org/api_docs/python/tf)