# Using Keras on top of TensorFlow for fast prototyping

**Objective**: Build a deep learning model that can learn the alphabet using Keras, and compare it with the TensorFlow implementation

**Agenda**
1. Recap of the previous chapter
2. What is Keras and why should I care?
3. Implementing the same model with Keras
   - Building the computational graph
   - Setting up TensorBoard visualization
   - Training and evaluating the model
4. Conclusions on the difference between TensorFlow and Keras

## Recap of the previous chapter

In [Chapter 2](ch02-working-with-tensorflow.ipynb) of the workshop we built a deep learning model capable of predicting the next letter of the alphabet based on an input sequence of consecutive letters.

We did so by framing the task as a classification problem: each letter of the alphabet is an output class, and the model learns to predict, for a given input sequence, the probability of each letter being the next one in the alphabet.
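
To make this framing concrete, here is a minimal plain-Python sketch (no Keras involved) of one training pair and of how a probability vector over the 26 classes is turned back into a letter. The helpers `make_pair` and `decode_prediction` are hypothetical and only illustrate the idea.

```python
import string

ALPHABET = string.ascii_uppercase  # 'A'..'Z', one output class per letter


def make_pair(sequence):
    """Hypothetical helper: map an input sequence to (input codes, target class)."""
    codes = [ALPHABET.index(ch) for ch in sequence]
    target = ALPHABET.index(sequence[-1]) + 1  # the next letter is the class to predict
    return codes, target


def decode_prediction(probabilities):
    """Pick the class with the highest probability and map it back to a letter."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return ALPHABET[best]


codes, target = make_pair("ABC")
print(codes, target)  # [0, 1, 2] 3

# A made-up probability vector that puts most of its mass on class 3 ('D')
probs = [0.01] * 26
probs[3] = 0.75
print(decode_prediction(probs))  # D
```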

We then built the model shown in the image below, using LSTM cells followed by a fully connected layer.

![Model architecture](./img/model.png)

In this chapter we will tackle the same task, but this time using **Keras**, a powerful library that lets us build and validate models quickly.

But first, a few words about Keras.

## What is Keras and why should I care?

### What is Keras?

As the [documentation](https://keras.io/) specifies,

> Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a **focus on enabling fast experimentation**. Being able to go from idea to result with the least possible delay is key to doing good research.

### Installation and configuration

To install Keras in your Python environment, run:
```
pip install keras
```

After installation you can [configure Keras to run on a specific backend](https://keras.io/backend/#switching-from-one-backend-to-another) by changing (or creating) the configuration file located at
- `$HOME/.keras/keras.json` on **Linux** or
- `%USERPROFILE%/.keras/keras.json` on **Windows**.

The default configuration file looks like this:
```
{
	"image_data_format": "channels_last",
	"epsilon": 1e-07,
	"floatx": "float32",
	"backend": "tensorflow"
}
```

Since TensorFlow is the default backend for Keras, there's nothing for us to configure.
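
If you want to check which backend a given machine is set up for, the sketch below reads `keras.json` with the standard library and falls back to the defaults shown above when the file is missing. `os.path.expanduser('~')` resolves to `$HOME` on Linux and typically to `%USERPROFILE%` on Windows, and the `KERAS_BACKEND` environment variable, when set, overrides the file's `backend` entry. The function `read_keras_config` is hypothetical; Keras performs its own lookup internally.

```python
import json
import os

# Defaults as listed in the configuration file above
DEFAULT_CONFIG = {
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow",
}


def read_keras_config(config_path=None):
    """Hypothetical helper: load keras.json, falling back to the defaults."""
    if config_path is None:
        # Resolves to $HOME on Linux and (typically) %USERPROFILE% on Windows
        config_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
    config = dict(DEFAULT_CONFIG)
    if os.path.exists(config_path):
        with open(config_path) as f:
            config.update(json.load(f))
    # The KERAS_BACKEND environment variable overrides the file's "backend" entry
    if "KERAS_BACKEND" in os.environ:
        config["backend"] = os.environ["KERAS_BACKEND"]
    return config


print(read_keras_config()["backend"])
```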

### The two flavors of Keras

Keras comes in two flavors:
- The `Sequential` model which is **a linear stack of layers**
- The `Functional API`, which allows building **more complex models**, such as models with multiple inputs or non-linear (branching) graphs

In our exercise we will be using the `Sequential` model. The answer to "why should I care?" will hopefully become self-evident after the implementation; if not, you'll find it in the conclusions.

## Implementing the same model with Keras

Let's build the same model as in the previous chapter, this time with Keras, to see how the two implementations differ.

As usual, let's start with the imports:

```python
import numpy as np
from keras.utils.np_utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
```

As stated earlier, we will be using the `Sequential` API, and our model consists of an `LSTM` layer and a `Dense` layer; let's import those:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
```

We want to inspect the model graph and the training statistics in `TensorBoard`, but before setting that up we need a few more imports:

```python
from keras.callbacks import TensorBoard
import os.path as path
import datetime
import tempfile
```

Again, for reproducibility, let's set the random seed:

```python
np.random.seed(2018)
```

We also need the utility classes from the previous chapter, so let's redefine them here. First the `alphabet` and the `Encoding` class:

```python
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
```

```python
class Encoding():
    def __init__(self):
        self._char_to_int = dict((c, i) for i, c in enumerate(alphabet))
        self._int_to_char = dict((i, c) for i, c in enumerate(alphabet))

    def encode_sequence(self, sequence):
        return [self._char_to_int[char] for char in sequence]

    def encode_letter(self, letter):
        return self._char_to_int[letter]

    def decode_letter(self, value):
        return self._int_to_char[value]
```
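
A quick round trip shows what the two dictionaries hold. The sketch below rebuilds the same mappings with dictionary comprehensions (an equivalent form of the `dict(...)` calls above) so that it runs on its own.

```python
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

# Same mappings as in Encoding.__init__, written as dict comprehensions
char_to_int = {c: i for i, c in enumerate(alphabet)}
int_to_char = {i: c for i, c in enumerate(alphabet)}

encoded = [char_to_int[ch] for ch in 'KLMNO']
print(encoded)                                   # [10, 11, 12, 13, 14]
print(''.join(int_to_char[i] for i in encoded))  # KLMNO
```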


Then the `Dataset` class, with one small change in the `_normalize_labels` method: we no longer need to reshape the labels to `(1000, 1, 26)`, because the Keras model accepts one-hot labels of shape `(1000, 26)`.

```python
class Dataset():
    def __init__(self, size=1000, seq_length=5, print_data=True):
        self._size = size
        self._sequence_length = seq_length
        self._inputs = []
        self._labels = []
        self._print_data = print_data
        self._encoding = Encoding()

    @property
    def inputs(self):
        return self._inputs

    @property
    def labels(self):
        return self._labels

    @property
    def num_classes(self):
        return len(alphabet)

    @property
    def sequence_length(self):
        return self._sequence_length

    def initialize(self):
        self._generate_random_data()
        self._reshape_inputs()
        self._normalize_inputs()
        self._normalize_labels()

    def shuffle(self):
        perm = np.arange(self._size)
        np.random.shuffle(perm)
        self._inputs = self._inputs[perm]
        self._labels = self._labels[perm]

    def _normalize_labels(self):
        # One-hot encode the labels: shape (size, 26), no extra reshape needed for Keras
        self._labels = to_categorical(self._labels, num_classes=self.num_classes)

    def _reshape_inputs(self):
        # Pre-pad shorter sequences with zeros up to sequence_length
        self._inputs = pad_sequences(self._inputs,
                                     maxlen=self._sequence_length,
                                     dtype='float32')
        # LSTM input shape: (samples, time steps, features per step)
        self._inputs = np.reshape(self._inputs, (self._size, self._sequence_length, 1))

    def _normalize_inputs(self):
        # Scale the letter codes into [0, 1)
        self._inputs = self._inputs / float(self.num_classes)

    def _generate_random_data(self):
        for _ in range(self._size):
            # Pick 1..sequence_length consecutive letters ending before 'Z'
            start = np.random.randint(self.num_classes - 2)
            end = np.random.randint(start, min(start + self._sequence_length, self.num_classes - 1))
            input_seq = alphabet[start:end + 1]
            output_seq = alphabet[end + 1]

            if self._print_data:
                print("{}->{}".format(input_seq, output_seq))

            sample = self._encoding.encode_sequence(input_seq)
            label = self._encoding.encode_letter(output_seq)

            self._inputs.append(sample)
            self._labels.append(label)
```
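
To see what `initialize` does to a single sample, here is a NumPy-only sketch that mimics the pipeline for the pair `XY -> Z`. It assumes `pad_sequences` with its default settings, which pad at the start of the sequence with the value 0, and it reimplements the padding and the one-hot encoding by hand instead of calling Keras. Note that 0 is also the code of the letter `A`, so after padding a leading `A` looks the same as padding.

```python
import numpy as np

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
seq_length, num_classes = 5, len(alphabet)

codes = [alphabet.index(c) for c in 'XY']  # [23, 24]
label = alphabet.index('Z')                 # 25

# Pre-padding with zeros, as pad_sequences does by default
padded = np.zeros(seq_length, dtype='float32')
padded[seq_length - len(codes):] = codes    # [0, 0, 0, 23, 24]

# One sample, five time steps, one feature per step, scaled by 1/26
x = padded.reshape(1, seq_length, 1) / float(num_classes)

# One-hot label of shape (1, 26), the same values to_categorical([label], 26) yields
y = np.zeros((1, num_classes), dtype='float32')
y[0, label] = 1.0

print(x.shape, y.shape)  # (1, 5, 1) (1, 26)
print(x.ravel())
```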

With the helper classes defined, let's create the required instances and initialize the dataset:

```python
e = Encoding()
dataset = Dataset(print_data=False)
dataset.initialize()
```

### Building the computational graph

The first thing we need to do is to declare that this is a `Sequential` model:

```python
model = Sequential()
```

Now let's add the layers:
- An `LSTM` layer with 32 units and input shape `(5, 1)`, i.e. 5 time steps with 1 feature each
- A `Dense` layer with 26 units and `softmax` activation

```python
model.add(LSTM(32, input_shape=(dataset.sequence_length, 1)))
model.add(Dense(dataset.num_classes, activation='softmax'))
```
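
To make the data flow through these two layers concrete, the NumPy sketch below runs one forward pass with random weights. The `LSTM` layer reads the five time steps one by one and only its final hidden state (32 values) is passed on, since `return_sequences` defaults to `False`; the `Dense` layer maps those 32 values to 26 scores and softmax turns them into probabilities. It assumes the textbook LSTM equations with sigmoid gates and tanh activations and lays the weights out as one kernel, one recurrent kernel and one bias holding the input, forget, cell and output gates side by side; it is an illustration, not Keras's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, input_dim, units, num_classes = 5, 1, 32, 26

# LSTM weights: gates i, f, c, o stacked along the last axis
kernel = rng.normal(scale=0.1, size=(input_dim, 4 * units))
recurrent_kernel = rng.normal(scale=0.1, size=(units, 4 * units))
bias = np.zeros(4 * units)

# Dense weights
dense_w = rng.normal(scale=0.1, size=(units, num_classes))
dense_b = np.zeros(num_classes)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    ez = np.exp(z)
    return ez / ez.sum(axis=-1, keepdims=True)


def forward(x):
    """x has shape (batch, timesteps, input_dim); returns (batch, num_classes)."""
    batch = x.shape[0]
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    for t in range(x.shape[1]):
        z = x[:, t, :] @ kernel + h @ recurrent_kernel + bias
        i, f, g, o = np.split(z, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    # Only the last hidden state reaches the Dense layer
    return softmax(h @ dense_w + dense_b)


x = rng.random((2, timesteps, input_dim))
probs = forward(x)
print(probs.shape)        # (2, 26)
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
```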

And that's it: our model is built.

Let's print a summary of it with the model's `summary` method, which relies on [`keras.utils.print_summary`](https://keras.io/utils/#print_summary):

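```python
# summary() prints the table itself and returns None,
# so wrapping it in print() would only add a stray "None"
model.summary()
```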
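
The parameter counts that the summary reports can be checked by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel and a bias, so with `units` units and `input_dim` features it holds `4 * (units * (input_dim + units) + units)` parameters; a `Dense` layer holds one weight per input/output pair plus one bias per output. A small sketch of that arithmetic for our model, assuming the default `use_bias=True`:

```python
def lstm_params(units, input_dim):
    # kernel (input_dim x 4*units) + recurrent kernel (units x 4*units) + bias (4*units)
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    # weight matrix (input_dim x units) + one bias per unit
    return input_dim * units + units


lstm = lstm_params(units=32, input_dim=1)     # 4352
dense = dense_params(units=26, input_dim=32)  # 858
print(lstm, dense, lstm + dense)              # 4352 858 5210
```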

### Setting up TensorBoard visualization

Although the summary above gives a quick overview of the model, it can't compete with the visualizations TensorBoard offers, so let's plug TensorBoard into the training.

To do so, we pass a [`Callback`](https://keras.io/callbacks/) to the model's `fit` method that writes the data to the summary files. Luckily for us, Keras already provides such a callback, called [`TensorBoard`](https://keras.io/callbacks/#tensorboard). Let's set it up.
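
Conceptually, a callback is an object whose hook methods the training loop calls at fixed points; Keras's `Callback` class exposes hooks such as `on_epoch_end(epoch, logs)`, and the `TensorBoard` callback uses hooks like these to write the logged metrics to disk. The pure-Python toy below imitates only that hook pattern: `toy_fit` and `MetricsRecorder` are hypothetical stand-ins, not Keras code, and the loss values are synthetic.

```python
class MetricsRecorder:
    """Toy stand-in for a callback: records the logs passed at each epoch end."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, logs=None):
        self.history.append((epoch, dict(logs or {})))


def toy_fit(epochs, callbacks=()):
    """Hypothetical training loop that only shows *when* callbacks are invoked."""
    for epoch in range(epochs):
        logs = {"loss": 1.0 / (epoch + 1)}  # synthetic value, no real training here
        for cb in callbacks:
            cb.on_epoch_end(epoch, logs)


recorder = MetricsRecorder()
toy_fit(3, callbacks=[recorder])
print(recorder.history)
```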

Good practice is to write the summary files of each training run to a separate folder and to start TensorBoard on the parent directory, so that different runs can be compared. Let's do that:

```python
temp_dir = tempfile.gettempdir()
run_dir = datetime.datetime.now().strftime('%Y-%m-%d-%H%M')
log_dir = path.join(temp_dir, 'tensorflow-workshop', run_dir)
```
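
The timestamp above has minute resolution, so two runs started within the same minute would write into the same folder. If that matters to you, a variant such as the hypothetical `make_run_dir` below adds seconds to the folder name, which makes collisions less likely, and creates the folder up front.

```python
import datetime
import os
import tempfile


def make_run_dir(root=None, project='tensorflow-workshop'):
    """Hypothetical helper: one timestamped log folder per training run."""
    root = root or tempfile.gettempdir()
    run_name = datetime.datetime.now().strftime('%Y-%m-%d-%H%M%S')
    log_dir = os.path.join(root, project, run_name)
    os.makedirs(log_dir, exist_ok=True)  # no error if the folder already exists
    return log_dir


print(make_run_dir())
```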

Now that we have set up the output directory for the run let's instantiate the callback:

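```python
tensorboardDisplay = TensorBoard(log_dir=log_dir,
                                 histogram_freq=0,   # 0 = do not compute histograms
                                 write_graph=True,   # write the model graph
                                 write_images=True,  # write model weights as images
                                 write_grads=True,   # only has an effect when histogram_freq > 0
                                 batch_size=1)       # batch size used for histogram computation
```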

### Training and evaluating the model

Although our model is defined, we cannot train it yet because it still lacks an optimizer and a loss function. To prepare the model for training we call the [`compile`](https://keras.io/models/sequential/#compile) method and pass it the optimizer, the loss and the metrics we want to track:

```python
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```
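
For a one-hot label, categorical cross-entropy reduces to the negative log of the probability the model assigns to the correct letter, and the accuracy metric checks whether the most probable letter is the correct one. A NumPy sketch of both (the clipping constant `eps` is an assumption that guards against `log(0)`) shows a useful reference point: a model that spreads probability evenly over the 26 letters has a loss of about ln(26) ≈ 3.26.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))


y_true = np.zeros((1, 26))
y_true[0, 3] = 1.0                  # the next letter is 'D'
uniform = np.full((1, 26), 1 / 26)  # every letter equally likely

print(categorical_crossentropy(y_true, uniform))  # about 3.258, i.e. ln(26)
```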

Now let's do the actual training, plugging in the TensorBoard callback to see how the loss and accuracy evolve:

```python
model.fit(dataset.inputs, dataset.labels, epochs=500, batch_size=1,
          callbacks=[tensorboardDisplay])

# Start TensorBoard on the parent folder that holds all runs
root_dir = path.join(temp_dir, 'tensorflow-workshop')
print("Training finished.\nRun 'tensorboard --logdir {}' to visualize the model.".format(root_dir))
```
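
With `batch_size=1`, Keras performs one weight update per sample, so this run makes 1000 × 500 = 500,000 updates, which is why it takes a while. A quick sketch of the arithmetic for other batch sizes (a partial last batch still counts as one update); larger batches mean fewer updates per epoch, so they may need more epochs to reach the same accuracy:

```python
import math


def num_updates(num_samples, batch_size, epochs):
    """Gradient updates performed by fit: one per (possibly partial) batch per epoch."""
    return math.ceil(num_samples / batch_size) * epochs


for batch_size in (1, 32, 128):
    print(batch_size, num_updates(1000, batch_size, 500))
```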

Let's put the model to the test:

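```python
seq = input('Enter a sequence of at most 5 consecutive letters: ')
seq = seq.strip().upper()
print("You entered {}".format(seq))
# Apply the same preprocessing as for the training data
seq = e.encode_sequence(seq)                         # letters -> integer codes
seq = pad_sequences([seq], dataset.sequence_length)  # pre-pad with zeros to length 5
seq = np.reshape(seq, (1, dataset.sequence_length, 1))
seq = seq / float(dataset.num_classes)               # scale like the training inputs
prediction = model.predict(seq)                      # shape (1, 26): one probability per letter
letter = int(np.argmax(prediction))
print("The next letter is: {}".format(e.decode_letter(letter)))
```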
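
The cell above assumes the input is valid. A hypothetical helper such as `prepare_sequence` below bundles the same preprocessing (encode, pre-pad with zeros, reshape to `(1, 5, 1)`, scale by 1/26) and rejects input the model never saw during training: anything that is not 1 to 5 consecutive letters ending before `Z`. It uses plain NumPy padding in place of `pad_sequences`, so it runs without Keras.

```python
import numpy as np

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def prepare_sequence(text, seq_length=5):
    """Hypothetical helper: validate a user sequence and shape it like the training inputs."""
    text = text.strip().upper()
    if not 1 <= len(text) <= seq_length:
        raise ValueError('expected 1 to {} letters'.format(seq_length))
    if any(ch not in alphabet for ch in text):
        raise ValueError('only the letters A-Z are allowed')
    codes = [alphabet.index(ch) for ch in text]
    if any(b - a != 1 for a, b in zip(codes, codes[1:])):
        raise ValueError('letters must be consecutive')
    if codes[-1] == len(alphabet) - 1:
        raise ValueError("'Z' has no next letter")
    padded = np.zeros(seq_length, dtype='float32')
    padded[seq_length - len(codes):] = codes  # pre-padding, like pad_sequences' default
    return padded.reshape(1, seq_length, 1) / float(len(alphabet))


x = prepare_sequence(' def ')
print(x.shape)  # (1, 5, 1)
```

The result can be passed straight to `model.predict`, and the prediction decoded with `np.argmax` exactly as in the cell above.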

## Conclusions on the difference between TensorFlow and Keras

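- The Keras API is shorter and more expressive than plain TensorFlow: defining the model above took two `add` calls, and preparing and running the training took one `compile` and one `fit` call. This allows fast prototyping and shortens the time needed to build and try out a model.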

## References

* [Keras documentation](https://keras.io)