# Working with TensorFlow

**Objective**: Build a deep learning model that can learn the alphabet.

**Agenda**
1. Defining the problem
2. An overview of the end model
3. Short theoretical recap/overview
4. Implementation
   - Encoding the data
   - Building the model
   - Training
5. Using `TensorBoard` to inspect the model graph and the evolution of `loss`/`accuracy`
   - Structuring the graph with name scopes
   - Adding model summaries
   - Bonus: debugging data with `text summary`

## Defining the problem

Although the objective is to **train a deep learning model to learn the alphabet**, this statement alone doesn't tell us how to tackle the problem.

Since deep learning models have proven to be very good at classification tasks, we need to 'reshape' the problem as a classification problem.

The new 'shape' of the problem looks like this:

> Create a deep learning model that, when given a sequence of consecutive letters, outputs for each letter of the alphabet the probability of it being the next letter in the sequence.


**Example**

Given `KLM` as input, the model would output something like this:
```
   Letter  Probability
  ---------------------
   A        0.07560498
   B        0.01971263
   C        0.01407314
   D        0.00286496
   E        0.01043301
   F        0.01803329
   G        0.03739211
   H        0.00691894
   I        0.0135167
   J        0.08230913
   K        0.02166412
   L        0.01301833
   M        0.00820917
   N        0.31165153
   O        0.01616119
   P        0.05539528
   Q        0.00634293
   R        0.01654692
   S        0.06636301
   T        0.00361082
   U        0.02698876
   V        0.00770648
   W        0.07386798
   X        0.05238049
   Y        0.01679033
   Z        0.0224438
```
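To make the classification framing concrete, here is a minimal sketch in plain Python (no TensorFlow involved) of how the alphabet could be turned into `(sequence, next letter)` examples, and how a probability table like the one above could be read back as a prediction. The helper names `make_examples` and `most_probable_letter`, and the sequence length of 3 (taken from the `KLM` example), are assumptions made for illustration only; the encoding actually used by the model is covered in the implementation section.

```python
import string

ALPHABET = string.ascii_uppercase  # 'ABC...XYZ'


def make_examples(seq_length=3):
    """Return (input_sequence, next_letter) pairs, e.g. ('KLM', 'N').

    seq_length=3 is an assumption based on the 'KLM' example.
    """
    examples = []
    for start in range(len(ALPHABET) - seq_length):
        sequence = ALPHABET[start:start + seq_length]
        next_letter = ALPHABET[start + seq_length]
        examples.append((sequence, next_letter))
    return examples


def most_probable_letter(probabilities):
    """Map 26 probabilities (ordered A..Z) to the most likely letter."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return ALPHABET[best_index]


examples = make_examples(seq_length=3)
print(examples[:3])   # first few (sequence, label) pairs
print(len(examples))  # number of classification examples
```

Each pair is one classification example: the sequence is the input and the next letter is the class the model should assign the highest probability to. For the table above, the most probable letter is `N`, which is indeed the letter that follows `KLM`.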

## An overview of the end model

Before digging into the code, it's worth pausing to consider the model architecture.

The model is quite simple: at its core it has just two deep learning components:
1. A **Long Short-Term Memory** (`LSTM`) cell, followed by
2. A **fully connected** (`dense`) layer

Although these two components fully define the model we'll be building, they don't define the full computational graph. To have a full (and *functional*) computational graph, we'll also need the following components:
- A **placeholder** node that feeds the input data to the model
- Another **placeholder** that receives the labeled data during training
- A node that measures the **loss** during training, that is, *how far the predicted output is from the expected output*
- A node that measures the **accuracy** of the model, also during training
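To give a feel for what the loss and accuracy nodes compute, here is a framework-free sketch in plain Python. It assumes cross-entropy as the loss, which is a common choice for classification but is an assumption here, not something fixed by the text above. The toy batch values and the function names are hypothetical, and this is not the TensorFlow graph itself.

```python
import math


def cross_entropy(predicted, expected_index):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(predicted[expected_index])


def accuracy(batch_predictions, expected_indices):
    """Fraction of examples whose most probable class is the expected one."""
    correct = 0
    for predicted, expected in zip(batch_predictions, expected_indices):
        guess = max(range(len(predicted)), key=predicted.__getitem__)
        correct += int(guess == expected)
    return correct / len(expected_indices)


# Toy batch over 3 classes (hypothetical values, not real model output).
batch = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]]
labels = [0, 2, 1]

mean_loss = sum(cross_entropy(p, y) for p, y in zip(batch, labels)) / len(labels)
print(mean_loss, accuracy(batch, labels))
```

The loss shrinks as the probability assigned to the expected class approaches 1, while the accuracy only checks whether the most probable class matches the label. That is why it is useful to track both during training.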

An overview of the graph is shown in the image below.

![Model architecture](./img/model.png)

**Remarks**:
1. The image contains an additional node, `predictions`, which is in practice the output of the `dense` layer and therefore part of it. However, it's better to keep it as a separate node to get a clearer view of how the data flows through the graph.
2. The *real* computational graph contains many more nodes, but those pertain more to the inner workings of TensorFlow than to our objective, so we won't concentrate on them.

## Short theoretical recap/overview

```python
import numpy as np
import tensorflow as tf
```

[LSTM]:Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.