# Working with TensorFlow

**Objective**: Build a deep learning model that can learn the alphabet.

**Agenda**
1. Defining the problem
2. An overview of the end model
3. Short theoretical recap/overview
4. Model implementation
   - Encoding the data
   - Generating the training data
   - Building the model
   - Training
5. Using `TensorBoard` to inspect the model graph and how `loss`/`accuracy` evolve
   - Structuring the graph with name scopes
   - Adding model summaries
   - Bonus: debugging data with `text summary`

## Defining the problem

Although the objective is to **train a deep learning model to learn the alphabet**, this statement alone doesn't tell us how to tackle the problem.

Since deep learning models have proven to be very good at classification tasks, we need to 'reshape' the problem as a classification problem.

The new 'shape' of the problem looks like this:

> Create a deep learning model that when given a sequence of consecutive letters will output, for each letter of the alphabet, the probability of it being the next letter in the sequence.


**Example**
Given `KLM` as input, the model would output something like this:
```
   Letter  Probability
  ---------------------
   A        0.07560498
   B        0.01971263
   C        0.01407314
   D        0.00286496
   E        0.01043301
   F        0.01803329
   G        0.03739211
   H        0.00691894
   I        0.0135167
   J        0.08230913
   K        0.02166412
   L        0.01301833
   M        0.00820917
   N        0.31165153
   O        0.01616119
   P        0.05539528
   Q        0.00634293
   R        0.01654692
   S        0.06636301
   T        0.00361082
   U        0.02698876
   V        0.00770648
   W        0.07386798
   X        0.05238049
   Y        0.01679033
   Z        0.0224438
```
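Once the model outputs one probability per letter, its prediction is simply the letter with the highest probability. A minimal NumPy sketch using the numbers from the table above, assuming the 26 probabilities are ordered from `A` to `Z`:

```python
import numpy as np

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Probabilities from the example table above (input: 'KLM'), ordered A..Z
probabilities = np.array([
    0.07560498, 0.01971263, 0.01407314, 0.00286496, 0.01043301, 0.01803329,
    0.03739211, 0.00691894, 0.0135167, 0.08230913, 0.02166412, 0.01301833,
    0.00820917, 0.31165153, 0.01616119, 0.05539528, 0.00634293, 0.01654692,
    0.06636301, 0.00361082, 0.02698876, 0.00770648, 0.07386798, 0.05238049,
    0.01679033, 0.0224438,
])

# The predicted next letter is the one with the highest probability
predicted_index = int(np.argmax(probabilities))
predicted_letter = alphabet[predicted_index]
print(predicted_letter)  # N
```

The probabilities add up to (approximately) 1, so they form a distribution over the 26 classes; `argmax` turns that distribution into a single answer.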

## An overview of the end model

Before digging into the code, it's worth pausing to consider the model architecture.

Since the model is quite simple, at its core it has just two deep learning components:
1. A **Long Short-Term Memory** (`LSTM`) cell, followed by
2. A **fully connected** (`dense`) layer

Although these two components fully define the model, they don't define the full computational graph. To have a complete (and *functional*) computational graph we also need the following components:
- A **placeholder** node that feeds the input data to the model
- Another **placeholder** that receives the labels during training
- A node that measures, during training, the **loss**, i.e. *how far the predicted output is from the expected output*
- A node that measures, also during training, the **accuracy** of the model
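To make the loss and accuracy nodes less abstract, here is a NumPy sketch of what they might compute for a batch of predictions. It assumes softmax cross-entropy as the loss, a common choice for classification; the notebook hasn't specified its loss at this point, so treat that as an assumption. The toy logits and labels are made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy_loss(logits, one_hot_labels):
    # Mean over the batch of -sum(label * log(probability))
    probs = softmax(logits)
    return float(-np.mean(np.sum(one_hot_labels * np.log(probs), axis=-1)))

def accuracy(logits, one_hot_labels):
    # Fraction of samples whose highest-scoring class matches the label
    predicted = np.argmax(logits, axis=-1)
    expected = np.argmax(one_hot_labels, axis=-1)
    return float(np.mean(predicted == expected))

# Toy batch: 2 samples, 3 classes (hypothetical values)
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.3, 1.5]])
labels = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0]])

print(accuracy(logits, labels))  # 0.5
print(cross_entropy_loss(logits, labels))
```

The first sample is classified correctly and the second isn't, hence an accuracy of 0.5; the loss additionally penalizes *how* confident the wrong prediction is.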

An overview of the graph is in the image below.

![Model architecture](./img/model.png)

**Remarks**:
1. The image contains an additional node, `predictions`, which is effectively the output of the `dense` layer and therefore part of it. Keeping it as a separate node, however, gives a clearer view of how the data flows through the graph.
2. The *real* computational graph contains many more nodes, but they pertain to the inner workings of TensorFlow rather than to our objective, so we won't focus on them.

## Short theoretical recap/overview

The theoretical foundations of deep learning are outside the scope of this workshop, but we still need some theory in order to build our model, so let's take a quick dive into it.

**Fully connected (a.k.a. dense) layer**
![Dense layer](./img/dense.png)

- A fully connected layer is a layer in which every input node is connected to every output/hidden node
- Each hidden/output node $j$ computes its value as the weighted sum of the input nodes: $h_j=\sum_i w_{j,i}\,x_i$ (in practice a bias term $b_j$ is usually added as well)
- This operation is nothing else but a **matrix multiplication** of the *weights matrix* $\pmb{W}$ with the *column vector* $\pmb{x}$: $\pmb{h}=\pmb{W}\pmb{x}$
- The *actual* output of the layer is the value of each output node passed through an *activation function*: $y_j=\sigma(h_j)$
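The claim that a dense layer is "just" a matrix multiplication can be checked directly. A small NumPy sketch (bias omitted, sigmoid assumed as the activation), with the weights stored as an (outputs × inputs) matrix `W` so that `W[j, i]` connects input `i` to output `j`; the weight values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_loop(W, x):
    # h_j = sum_i W[j, i] * x[i], computed one output node at a time
    h = np.zeros(W.shape[0])
    for j in range(W.shape[0]):
        for i in range(W.shape[1]):
            h[j] += W[j, i] * x[i]
    return h

def dense_matmul(W, x):
    # The same computation as a single matrix-vector product
    return W @ x

W = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])  # 3 output nodes, 2 input nodes
x = np.array([1.0, 1.0])

h = dense_matmul(W, x)      # 3, 7 and 11
y = sigmoid(h)              # activations y_j, each in (0, 1)
print(h, y)
```

Both functions produce the same values; the matrix form is what frameworks actually execute, because it maps onto fast linear-algebra routines.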

**Long Short-Term Memory networks**
![LSTM network](./img/lstm-chain.png)

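- LSTMs are a special type of Recurrent Neural Network (RNN) capable of learning long-term dependencies
- They do so by passing the input through several gates (the formulas are here just for fun). First, the forget gate $f_t$, the input gate $i_t$ and the candidate cell state $\widetilde{C}_t$ are computed, where $\sigma$ is the sigmoid function and $[h_{t-1},x_t]$ is the concatenation of the previous hidden state with the current input: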

  $$
    f_t=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)
  $$

  $$
    i_t=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)
  $$

  $$
    \widetilde{C}_t=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)
  $$

- Afterwards the cell state is updated (here $*$ denotes element-wise multiplication):

  $$
    C_t=f_t*C_{t-1}+i_t*\widetilde{C}_t
  $$

- Finally, the output gate $o_t$ and the new hidden state $h_t$ (the cell's output) are calculated:

  $$
    o_t=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)
  $$

  $$
    h_t=o_t*\tanh(C_t)
  $$

- In theory a Recurrent Neural Network can process sequences of arbitrary length; in practice the network is unrolled into a fixed (configurable) number of chained cells, one per time step.
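The gate equations above translate almost line by line into code. Below is a NumPy sketch of a single LSTM step plus an unrolled loop over a short sequence. It assumes one-dimensional inputs (one number per letter, as used later in this notebook), small random weights and zero biases; the helper names `lstm_step` and `run_lstm` are hypothetical, and TensorFlow's own LSTM cell differs in details such as initialization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following the gate equations above."""
    W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o = params
    concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ concat + b_f)        # forget gate
    i_t = sigmoid(W_i @ concat + b_i)        # input gate
    c_tilde = np.tanh(W_C @ concat + b_C)    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # element-wise cell-state update
    o_t = sigmoid(W_o @ concat + b_o)        # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t

def run_lstm(inputs, num_units, seed=0):
    """Unroll the cell over inputs of shape (seq_length, input_dim)."""
    rng = np.random.default_rng(seed)
    input_dim = inputs.shape[1]
    params = []
    for _ in range(4):  # weights and biases for f, i, C and o, in that order
        params.append(rng.normal(scale=0.1, size=(num_units, num_units + input_dim)))
        params.append(np.zeros(num_units))
    h = np.zeros(num_units)
    c = np.zeros(num_units)
    for x_t in inputs:  # one (shared-weight) cell per time step
        h, c = lstm_step(x_t, h, c, params)
    return h, c

# 'KLM' encoded as indices 10, 11, 12 and scaled by the alphabet size
seq = np.array([[10 / 26], [11 / 26], [12 / 26]])
h, c = run_lstm(seq, num_units=4)
print(h.shape, c.shape)  # (4,) (4,)
```

Note that the same `params` are reused at every step: unrolling copies the cell along the time axis, not its weights.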

## Model implementation

Let's start by importing the required modules: `numpy` for numerical manipulation and, obviously, `tensorflow` for the model.

```python
import numpy as np
import tensorflow as tf
```

LSTM cells are defined in `tensorflow.contrib.rnn`, so let's import that (note that `tf.contrib` is only available in TensorFlow 1.x):

```python
from tensorflow.contrib import rnn
```

We'll also need two utility functions from Keras, `pad_sequences` and `to_categorical`:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
```

Let's set NumPy's random seed in order to generate reproducible training data:

```python
np.random.seed(2018)
```

And let's define a string that holds the letters of the alphabet:

```python
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
```

### Encoding the data

Since the model performs matrix multiplications, we can't feed it a sequence of letters directly; we first need to encode the sequence in numerical form.

To do so, we'll take the easiest approach and assign each letter a zero-based index (`A` is 0, `B` is 1, ..., `Z` is 25).

We'll create a class that holds the mapping from each letter to its index and the mapping from each index back to its letter.

Afterwards we'll create two methods:
- `encode` receives either a sequence or a single letter and returns the list of corresponding indices or the corresponding index, respectively
- `decode` receives a predicted index and returns the corresponding letter

```python
class Encoding():
    def __init__(self):
        # Letter -> zero-based index, and index -> letter
        self._char_to_int = {c: i for i, c in enumerate(alphabet)}
        self._int_to_char = {i: c for i, c in enumerate(alphabet)}

    def encode(self, sequence):
        # A single letter is encoded as a plain integer...
        if len(sequence) == 1:
            return self._char_to_int[sequence[0]]
        # ...and a longer sequence as a list of integers
        return [self._char_to_int[char] for char in sequence]

    def decode(self, value):
        return self._int_to_char[value]
```


Let's do a small test:

```python
e = Encoding()
print('DEF->{}'.format(e.encode('DEF')))
print('A->{}'.format(e.encode('A')))
print('24->{}'.format(e.decode(24)))
```

```
DEF->[3, 4, 5]
A->0
24->Y
```


### Generating the training data

Before building the model, we need a dataset to train it on.

Again, since the model is quite simple, we can generate random sequences of consecutive letters as the input data and use the letter that follows each sequence in the alphabet as its label.

To do so, we'll define a new class called `Dataset`:

```python
class Dataset():
    def __init__(self, size=1000, seq_length=5, print_data=True):
        self._size = size
        self._sequence_length = seq_length
        self._inputs = []
        self._labels = []
        self._print_data = print_data
        self._encoding = Encoding()

    @property
    def inputs(self):
        return self._inputs

    @property
    def labels(self):
        return self._labels

    @property
    def num_classes(self):
        return len(alphabet)

    @property
    def sequence_length(self):
        return self._sequence_length

    def initialize(self):
        self._generate_random_data()
        self._reshape_inputs()
        self._normalize_inputs()
        self._normalize_labels()

    def shuffle(self):
        # Apply the same random permutation to inputs and labels
        perm = np.arange(self._size)
        np.random.shuffle(perm)
        self._inputs = self._inputs[perm]
        self._labels = self._labels[perm]

    def _normalize_labels(self):
        # One-hot encode the labels: shape (size, 1, num_classes)
        self._labels = to_categorical(self._labels, num_classes=self.num_classes)
        self._labels = np.reshape(self._labels, (self._size, 1, self.num_classes))

    def _reshape_inputs(self):
        # Left-pad every sequence with zeros to seq_length: shape (size, seq_length, 1)
        self._inputs = pad_sequences(self._inputs,
                                     maxlen=self._sequence_length,
                                     dtype='float32')
        self._inputs = np.reshape(self._inputs, (self._size, self._sequence_length, 1))

    def _normalize_inputs(self):
        # Scale the indices into the [0, 1) range
        self._inputs = self._inputs / float(self.num_classes)

    def _generate_random_data(self):
        for _ in range(self._size):
            # Pick a run of 1..seq_length consecutive letters that is always
            # followed by at least one more letter (the label)
            start = np.random.randint(self.num_classes - 2)
            end = np.random.randint(start, min(start + self._sequence_length, self.num_classes - 1))
            input_seq = alphabet[start:end + 1]
            output_seq = alphabet[end + 1]

            if self._print_data:
                print("{}->{}".format(input_seq, output_seq))

            # Encode letter by letter so that even a one-letter sequence
            # becomes a list, as pad_sequences expects
            sample = [self._encoding.encode(char) for char in input_seq]
            label = self._encoding.encode(output_seq)

            self._inputs.append(sample)
            self._labels.append(label)
```

In God we trust; the rest we test. Let's see if our `Dataset` class behaves as expected.

```python
dataset = Dataset(size=5)
dataset.initialize()
```

```
GHI->J
JKLMN->O
UV->W
XY->Z
PQRS->T
```
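To see what `initialize()` does to a single sample, here is a NumPy-only sketch that follows the same steps for `GHI->J`. The helpers `pre_pad` and `one_hot` are hypothetical stand-ins for `pad_sequences` (with its default front padding and truncation, and padding value 0) and `to_categorical`, used here so the example runs without TensorFlow.

```python
import numpy as np

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
num_classes = len(alphabet)
seq_length = 5

def pre_pad(indices, maxlen, value=0.0):
    # Stand-in for pad_sequences' defaults: truncate and pad at the front
    indices = list(indices)[-maxlen:]
    return [value] * (maxlen - len(indices)) + indices

def one_hot(index, n):
    # Stand-in for to_categorical applied to a single integer label
    vec = np.zeros(n, dtype='float32')
    vec[index] = 1.0
    return vec

input_seq, output_seq = 'GHI', 'J'
sample = [alphabet.index(c) for c in input_seq]             # [6, 7, 8]
padded = np.array(pre_pad(sample, seq_length), dtype='float32')
inputs = np.reshape(padded / num_classes, (1, seq_length, 1))
labels = np.reshape(one_hot(alphabet.index(output_seq), num_classes),
                    (1, 1, num_classes))

print(inputs[0, :, 0])  # two zeros of padding, then 6/26, 7/26, 8/26
print(inputs.shape)     # (1, 5, 1)
print(labels.shape)     # (1, 1, 26)
```

One consequence worth noticing: since `A` is encoded as 0, a padded position looks exactly like an `A` to the model.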


## References

* Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. *Neural Computation*, 9(8), 1735-1780.
* [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
* [TensorFlow Python API](https://www.tensorflow.org/api_docs/python/tf)