# Goal
This notebook is my live notes on figuring out how to use TensorFlow's Estimator API.
I read the [official docs](https://www.tensorflow.org/guide/estimators) but they were a bit vague for my taste, i.e. I couldn't read them and know what to do.

For context, I have a toy dataset of numbers paired with their German words. I want to do two toy tasks: one is to classify whether a given word is even or odd, the other is to convert from word to number using [NALU](https://arxiv.org/abs/1808.00508). It's 20:30 now and I have a baby who's switching between crying, feeding and sleeping. Let's see how far he lets us get.

# Step one - "Literature Review"
The official docs point to a collection of [official models](https://github.com/tensorflow/models/tree/master/official) that are well maintained and serve as references for the high-level APIs like Estimator. This is great. Even better, they have an implementation of the [Transformer model](https://github.com/tensorflow/models/tree/master/official/transformer), which is in the realm of NLP. That's particularly important because data loading in NLP is a black art, and getting it to play nice with a new API will be blacker than black, so it's nice to have a reference to copy-paste from.
![funny](https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1457364208i/29437996._UY630_SR1200,630_.jpg)

## A Caveat - The Transformer implementation is too good for me
The implementation in the Transformer model is a bit too good. It covers a lot of things I don't really need right now, like BLEU scores, running on TPUs and distributed training. Yes, these are highlights of the Estimator API, but they are highlights I don't need now. So really the first thing I'll do is mark what I need to keep and what to delete.

## What we know

So basically the Estimator API says: "Give me a function that returns data and another function that returns a model, and I'll combine them, run them, calculate the metrics, save checkpoints, distribute it all across nodes and make you coffee."


So what we need to figure out is:
* How to write a model function
* What the specifications are for the data function
    * Can we use feed dicts or only TFRecords ?
    * It looks like we need to separate examples and labels for the API, where should we do that ?
    * Can we preprocess the text in Python ?
    
  

## How to write a model function 
So the Transformer's model function is [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L69).
Notably, its signature is
```python
def model_fn(features, labels, mode, params):
    """Builds the graph for `mode` and returns a tf.estimator.EstimatorSpec."""
```
I guess features is the inputs and labels is the labels, and I know that mode is one of [Train, Test, Predict] or something semantically equivalent (it turns out to be `tf.estimator.ModeKeys.TRAIN`, `EVAL` and `PREDICT`).
If you look [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L81-L89) you'll see that the model returns one thing when mode == PREDICT, and [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L125-L138) something else when mode == TRAIN.

Cool, so the first thing we know is that it should do different things based on mode. This is actually super duper awesome, because before this there wasn't a canonical way to make that separation and every project reinvented the wheel. So while it's a bit complex, it's great.

### What is the model_fn returning
So when we look at what the model function is returning, we see it returns an [EstimatorSpec](https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec):
```python
      return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

```
Just looking at how it's called here, this thing actually makes sense.
The docs say:

>For mode == ModeKeys.TRAIN: required fields are loss and train_op.

>For mode == ModeKeys.EVAL: required field is loss.

>For mode == ModeKeys.PREDICT: required fields are predictions.


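So that tells us the minimum we need to pass in. I wondered why we need to pass the loss in during training, since I assume it's implied by the training op that minimizes it. As far as I can tell, the answer is bookkeeping: the Estimator logs the loss, writes it to TensorBoard and checks it for NaNs, none of which it can get from the train_op alone.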
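Putting the mode branching and the required EstimatorSpec fields together, a model_fn skeleton might look like the following. It's a minimal sketch, not the Transformer's code: the linear model, the `num_inputs` / `num_classes` / `learning_rate` params and the plain gradient descent optimizer are all assumptions. It uses `tf.compat.v1` calls so it should run on the TensorFlow versions that still ship `tf.estimator` (1.15 through 2.15).

```python
import tensorflow as tf  # assumes TF 1.15 to 2.15 (tf.estimator was removed in 2.16)


def model_fn(features, labels, mode, params):
    """Hypothetical linear classifier showing the per-mode EstimatorSpec pattern."""
    # features is assumed to be a float32 Tensor of shape [batch, num_inputs].
    w = tf.compat.v1.get_variable(
        "w", shape=[params["num_inputs"], params["num_classes"]])
    b = tf.compat.v1.get_variable(
        "b", shape=[params["num_classes"]],
        initializer=tf.compat.v1.zeros_initializer())
    logits = tf.matmul(features, w) + b
    predicted_class = tf.argmax(logits, axis=1)

    # PREDICT: only `predictions` is required, so return before building a loss.
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode=mode, predictions={"class": predicted_class, "logits": logits})

    # EVAL and TRAIN both require a loss.
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    if mode == tf.estimator.ModeKeys.EVAL:
        accuracy = tf.compat.v1.metrics.accuracy(labels=labels, predictions=predicted_class)
        return tf.estimator.EstimatorSpec(
            mode=mode, loss=loss, eval_metric_ops={"accuracy": accuracy})

    # TRAIN: the only mode that builds an optimizer and a train_op.
    optimizer = tf.compat.v1.train.GradientDescentOptimizer(params["learning_rate"])
    train_op = optimizer.minimize(loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
```

Each mode returns as soon as it has what its EstimatorSpec requires, so the train-only pieces (optimizer, train_op) are never built for evaluation or prediction.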

### Where did loss and train_op come from ?

So we saw in that little Python snippet that we pass loss and train_op, but where did they come from?
At the beginning of the model_fn they do
```python
    model = transformer.Transformer(params, mode == tf.estimator.ModeKeys.TRAIN)

    logits = model(inputs, targets)
```
Where transformer is an import from some other directory. 
Then, once they've checked they are not in prediction mode, they calculate the loss
```python
    xentropy, weights = metrics.padded_cross_entropy_loss(
        logits, targets, params["label_smoothing"], params["vocab_size"])
    loss = tf.reduce_sum(xentropy) / tf.reduce_sum(weights)
```
Then they check if they are in eval or train mode, and if they are in train mode they also set up the train op 
```python
      train_op, metric_dict = get_train_op_and_metrics(loss, params)
```
Where get_train_op_and_metrics is defined in the file. 
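The Transformer's `get_train_op_and_metrics` does more than a toy problem needs. As a much simpler, hypothetical stand-in (the name `build_train_op_and_metrics`, the exponential-decay schedule, Adam, and the `learning_rate` / `decay_steps` params are my assumptions), such a helper only has to return a train_op that advances the global step, plus a dict of scalars worth logging:

```python
import tensorflow as tf  # assumes TF 1.15 to 2.15, in graph mode (as inside a model_fn)


def build_train_op_and_metrics(loss, params):
    """Hypothetical helper: returns (train_op, metric_dict) for a scalar loss."""
    global_step = tf.compat.v1.train.get_or_create_global_step()
    # Assumed schedule: the rate shrinks smoothly by a factor of 0.96 every `decay_steps` steps.
    learning_rate = tf.compat.v1.train.exponential_decay(
        params["learning_rate"], global_step,
        decay_steps=params["decay_steps"], decay_rate=0.96)
    optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate)

    # Split minimize() into its two halves so the gradients can be logged too.
    grads_and_vars = optimizer.compute_gradients(loss)
    train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)
    gradient_norm = tf.linalg.global_norm(
        [grad for grad, _ in grads_and_vars if grad is not None])

    metric_dict = {"learning_rate": learning_rate,
                   "global_norm/gradient_norm": gradient_norm}
    return train_op, metric_dict
```

Returning the metrics as a plain dict keeps the helper independent of how (or whether) they end up in TensorBoard.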

### Two patterns emerge
The first pattern to emerge is that they calculate only what is needed. Instead of "calculate" we can say they only set up the graph ops that are needed, e.g. they don't make the train_op if they don't need it. I wonder if this is just good engineering or serves a more "practical purpose", e.g. leaving space on the GPU or something. Part of the answer, at least, is practical: in PREDICT mode the Estimator calls model_fn with `labels=None`, so there is nothing to compute a loss from.

The second pattern to emerge, which is **more important**, is their use of imports and helper functions. Obviously this is good practice, and I mention it because I feel that when I copy and paste from someone else's code I'm not always confident about what "style" of programming I can use. This holds especially true in ML and frontend code, where software engineering chops vary wildly within the respective communities. But I digress.


# Who calls the model_fn ?
So we've seen that the model_fn they define returns an EstimatorSpec when it gets data and a mode. Who calls the model_fn, and where does it get the data from?
Well, [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L499-L502) is an example. Let's copy-paste it!

```python
    return tf.estimator.Estimator(
        model_fn=model_fn, model_dir=flags_obj.model_dir, params=params,
        config=tf.estimator.RunConfig(train_distribute=distribution_strategy))
```

Cool. So it makes sense that the Estimator class will call the model function, which will in turn return an EstimatorSpec based on the mode. The thing is, model_fn is called with features and labels, so where did they come from? It's not obvious just from looking at this function. I could only assume that it's somehow specified in params. I vaguely remembered something about an input_fn, so I searched the code for it and found
[this morsel](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L334-L337):
```python
    estimator.train(
        dataset.train_input_fn,
        steps=schedule_manager.single_iteration_train_steps,
        hooks=train_hooks)
```
And so I realized that the estimator instance returned from instantiating the Estimator class with our model_fn has a method called train that accepts an input function. Shoutout to the timeless [Execution in the Kingdom of Nouns](https://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html).
It's actually very sensible: an instance of an Estimator has a bunch of methods, like train, evaluate and predict, which happen to correspond to the things we'd like to do with a model. In each of these cases we need to provide data to our model, which is done through an input_fn. We can do a bunch of extra fancy things, which we'll get to later.
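To see train, evaluate and predict side by side, here is a minimal sketch on a made-up regression problem (y = 2x). The model, the data and the `learning_rate` param are assumptions, not anything from the Transformer code. Note that each method receives the input function itself rather than its result, so a `lambda` is a handy way to bind arguments:

```python
import tempfile

import numpy as np
import tensorflow as tf  # assumes TF 1.15 to 2.15 (tf.estimator was removed in 2.16)


def model_fn(features, labels, mode, params):
    # Hypothetical one-weight model: prediction = w * x.
    w = tf.compat.v1.get_variable(
        "w", shape=[], initializer=tf.compat.v1.zeros_initializer())
    prediction = w * features
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={"y": prediction})
    loss = tf.reduce_mean(tf.square(prediction - labels))
    train_op = None
    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.compat.v1.train.GradientDescentOptimizer(params["learning_rate"])
        train_op = optimizer.minimize(loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)


def input_fn(repeat=True):
    # Made-up data where y = 2x, served as (features, labels) minibatches of 8.
    x = np.linspace(-1.0, 1.0, 64).astype(np.float32)
    dataset = tf.data.Dataset.from_tensor_slices((x, 2.0 * x)).batch(8)
    return dataset.repeat() if repeat else dataset


estimator = tf.estimator.Estimator(
    model_fn=model_fn, model_dir=tempfile.mkdtemp(), params={"learning_rate": 0.5})
estimator.train(input_fn=lambda: input_fn(repeat=True), steps=200)
metrics = estimator.evaluate(input_fn=lambda: input_fn(repeat=False))
predictions = list(estimator.predict(
    input_fn=lambda: tf.data.Dataset.from_tensor_slices(np.array([3.0], np.float32)).batch(1)))
print(metrics["loss"], predictions[0]["y"])
```

Because `evaluate` gets a non-repeating dataset and no `steps`, it runs until the data runs out; `predict` gets features only, since PREDICT mode has no labels.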

## A question is answered! We now know the constraints on the data input
We can now go look at the docs for the Estimator's [train](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator#train) method and see what we can pass around in the input_fn. To quote the docs:

>input_fn: A function that provides input data for training as minibatches. See Premade Estimators for more information. The function should construct and return one of the following:
>* A tf.data.Dataset object: Outputs of Dataset object must be a tuple (features, labels) with same constraints as below.
>* A tuple (features, labels): Where features is a tf.Tensor or a dictionary of string feature name to Tensor and labels is a Tensor or a dictionary of string label name to Tensor.
>
>Both features and labels are consumed by model_fn. They should satisfy the expectation of model_fn from inputs.
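Both return shapes the docs allow are easy to write. A minimal sketch with made-up arrays (the function names and the data are hypothetical):

```python
import numpy as np
import tensorflow as tf  # assumes TF 1.15 to 2.15

# Made-up toy data: three examples with two features each.
FEATURES = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], dtype=np.float32)
LABELS = np.array([1, 1, 0], dtype=np.int32)


def dataset_input_fn():
    # Option 1: a Dataset whose elements are (features, labels) tuples.
    dataset = tf.data.Dataset.from_tensor_slices((FEATURES, LABELS))
    return dataset.shuffle(buffer_size=3).repeat().batch(2)


def tuple_input_fn():
    # Option 2: a plain (features, labels) tuple of Tensors, i.e. one fixed batch.
    return tf.constant(FEATURES), tf.constant(LABELS)
```

Option 2 hands the model the same tensors on every step, so it only really suits small in-memory data; the Dataset version is what gives you shuffling, repeating and batching.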

When we set out on this adventure I asked:
* Can we use feed dicts or only TFRecords ?
* It looks like we need to separate examples and labels for the API, where should we do that ?
* Can we preprocess the text in Python ?

Let's answer them.

### Can we use feed dicts or only TFRecords ? 
Apparently there's no way to use feed dicts. But we don't need to use TFRecords either; we just need a function that reads data on the fly and returns tensors.
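For data produced by plain Python at run time, `tf.data.Dataset.from_generator` wraps a generator into a Dataset that an input_fn can return. A minimal sketch; the tiny `NUM2WORD` dict is a hypothetical stand-in for the output of the `createNum2WordDict` program that shows up later in these notes:

```python
import tensorflow as tf  # assumes TF 1.15 to 2.15

# Hypothetical stand-in for createNum2WordDict's output.
NUM2WORD = {1: "eins", 2: "zwei", 3: "drei", 4: "vier", 5: "fünf", 6: "sechs"}


def generate_examples():
    # Ordinary Python, run each time the Dataset is iterated.
    for number, word in NUM2WORD.items():
        yield word, (number + 1) % 2  # 1 means even, 0 means odd


def generator_input_fn(batch_size=4):
    dataset = tf.data.Dataset.from_generator(
        generate_examples,
        output_types=(tf.string, tf.int32),
        output_shapes=(tf.TensorShape([]), tf.TensorShape([])))
    return dataset.batch(batch_size)
```

The trade-off is that the generator runs in the Python interpreter (holding the GIL), so it won't be as fast as native tf.data ops, but for a toy problem that hardly matters.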
### It looks like we need to separate examples and labels for the API, where should we do that ?
Well, the docs say the input function returns a Dataset object whose elements are (features, labels) tuples. Beyond each half being a Tensor or a dict of Tensors, they aren't very specific about what goes in that tuple.

I'm not being nitpicky: I like to pass a lot of tensors in dictionaries. For example, I want to pass my examples in one tensor and their lengths in another, and put both of those in a dict. Apparently I can, so long as I return a tuple of two dicts.

I'll guesstimate that the TensorFlow folks were trying to avoid stuff like this when designing the API. Probably the logic is that if you follow it to the letter you'll have super portable models whose parts you can share and swap out. My way, with the dicts, means your model_fn now needs to parse dicts. Works for me.
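Concretely, here's a sketch of that layout: features are a dict holding padded character ids and the true lengths, and labels are a dict with the parity. The key names, the shapes and the pre-encoded toy ids are all assumptions for illustration:

```python
import numpy as np
import tensorflow as tf  # assumes TF 1.15 to 2.15

# Made-up, already-encoded words: character ids right-padded with 0 to length 5.
CHAR_IDS = np.array([[1, 2, 3, 4, 0], [5, 6, 1, 2, 0], [7, 8, 2, 2, 9]], dtype=np.int32)
LENGTHS = np.array([4, 4, 5], dtype=np.int32)
PARITY = np.array([0, 1, 0], dtype=np.int32)


def dict_input_fn(batch_size=2):
    features = {"chars": CHAR_IDS, "length": LENGTHS}
    labels = {"parity": PARITY}
    # Each element is a (features_dict, labels_dict) tuple; slicing happens per key.
    return tf.data.Dataset.from_tensor_slices((features, labels)).batch(batch_size)
```

Inside the model_fn, the dicts are then unpacked by key, e.g. `features["chars"]`, `features["length"]` and `labels["parity"]`.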

### Can we preproccess the text in Python ?
I think we can. We'll try to do that in a bit.
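Preprocessing in Python is just ordinary code, e.g. turning each word into a list of character ids plus its length, padded to a common width. A minimal sketch (the vocabulary scheme, with 0 reserved for padding, is an assumption):

```python
def build_char_vocab(words):
    """Map every character seen in `words` to an id; 0 is reserved for padding."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode(words, vocab):
    """Return (padded_ids, lengths): one id list per word, right-padded with 0."""
    ids = [[vocab[ch] for ch in word] for word in words]
    lengths = [len(seq) for seq in ids]
    max_len = max(lengths)
    padded = [seq + [0] * (max_len - len(seq)) for seq in ids]
    return padded, lengths


words = ["elf", "acht", "fünf"]
vocab = build_char_vocab(words)
padded, lengths = encode(words, vocab)
print(lengths)  # [3, 4, 4]
```

A real version would also need an id for characters that weren't seen when building the vocabulary, since `encode` raises a `KeyError` on them.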


# Setting up for TensorBoard
Let's face it, the best part of doing deep learning is watching the loss go down on TensorBoard. While the Estimator API promises to let us do that for free, we haven't seen how yet.

So actually, in the Transformer example they have this cool function that takes a loss and some params and returns both the train op and the metrics we want in TensorBoard. It's [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L166-L194). Basically they have a dict called metric_dict that maps scalar names to scalar tensors. Then they run a function, [record_scalars](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L141), which sets up those scalars for TensorBoard.

If you dig around, they do the same thing in slightly different ways, but really there is no magic here. You call tf.summary.X (or tf.contrib.summary.X) and the Estimator takes care of the rest. Amen.
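A `record_scalars`-style helper needs little more than a loop over summary calls. This sketch uses the `tf.compat.v1.summary` API (the Transformer's own helper may use a different summary API). As far as I can tell, the Estimator's default summary hook then writes everything registered this way to `model_dir` every `save_summary_steps` steps:

```python
import tensorflow as tf  # assumes TF 1.15 to 2.15, in graph mode (as inside a model_fn)


def record_scalars(metric_dict):
    """Register one TensorBoard scalar summary per (name, scalar tensor) pair."""
    for name, value in metric_dict.items():
        # Adds the summary op to the graph's SUMMARIES collection.
        tf.compat.v1.summary.scalar(name, value)
```

Inside a model_fn this would be called with a metric dict (for example, the one a train-op helper returns) right before building the TRAIN EstimatorSpec.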

# Pre-Summary - Setting up some context
As I mentioned, I have a toy task! It consists of taking a German word that represents a number and predicting whether the number is even or odd. Conveniently, I have a program that gives me dicts whose keys are numbers and whose values are their German word equivalents. Check it out (disclaimer: I wrote this program and my German is shameful, so it may be wrong):

```python
from utils.numtoWord import createNum2WordDict
createNum2WordDict(size=10, high=100)
```

```
{8: 'acht',
 11: 'elf',
 14: 'vierzehn',
 20: 'undzwanzig',
 54: 'vierundfünfzig',
 62: 'zweiundsechzig',
 63: 'dreiundsechzig',
 72: 'zweiundsiebzig',
 82: 'zweiundachtzig',
 91: 'einundneunzig'}
```

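Cool! (Case in point for the disclaimer: 20 comes out as 'undzwanzig' where it should be 'zwanzig', and 90 below has the same problem.)
Now, let's turn that into something that maps each number to its word and an even/odd label (0 means odd, 1 means even).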

```python
d = createNum2WordDict(size=10, high=100)
# Each value becomes (word, 1 if the number is even else 0).
d = {key: (val, (key + 1) % 2) for key, val in d.items()}
d
```

```
{1: ('eins', 0),
 3: ('drei', 0),
 44: ('vierundvierzig', 1),
 52: ('zweiundfünfzig', 1),
 59: ('neunundfünfzig', 0),
 69: ('neunundsechzig', 0),
 71: ('einundsiebzig', 0),
 79: ('neunundsiebzig', 0),
 90: ('undneunzig', 1),
 93: ('dreiundneunzig', 0)}
```

Cool, so now that I have my data, here is what I want:
1. A model that
    * Is an LSTM
    * Reads the words character by character
    * Predicts whether they are even or odd
2. An input function that
    * Calls my fancy function above and returns tensors in the proper format
3. An Estimator that
    * Uses my model (via a model_fn) and my input_fn to train and evaluate
    * Shows progress and accuracy in TensorBoard
    * As a bonus, can be deployed to Google Cloud ML with one line

# Summary
So now I can summarize what I've learnt in light of what I want to do, i.e. derive a recipe.
Basically:
1. Find your data
2. Write an input function that returns a Dataset object whose elements are (features, labels) tuples
3. Define your model somewhere, as a function that returns logits / predictions
4. Write a model_fn that
    * Takes as input
        * features and labels, which are the tuple from your input function
        * mode, which is one of the values of [ModeKeys](https://www.tensorflow.org/api_docs/python/tf/estimator/ModeKeys)
        * params, the dict passed as `params=` when constructing the Estimator (we haven't really discussed it)
    * Returns an instance of an EstimatorSpec
        * That does what needs to be done based on the mode (e.g. trains, or just predicts)
5. Instantiate an Estimator with the model_fn
6. Call estimator.train/evaluate/predict with the relevant input_fn
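Here is the recipe applied to the even/odd toy task, as a sketch under several assumptions: a hand-written `NUM2WORD` dict stands in for `createNum2WordDict`, and, to keep it short and version-robust, the planned LSTM is swapped for a simpler model that averages character embeddings over each word's real (unpadded) length. All hyperparameters are arbitrary.

```python
import tempfile

import numpy as np
import tensorflow as tf  # assumes TF 1.15 to 2.15 (tf.estimator was removed in 2.16)

# 1. Find your data (hypothetical stand-in for createNum2WordDict).
NUM2WORD = {1: "eins", 2: "zwei", 3: "drei", 4: "vier", 5: "fünf",
            6: "sechs", 7: "sieben", 8: "acht", 9: "neun", 10: "zehn"}
# Character vocabulary; id 0 is reserved for padding.
VOCAB = {ch: i + 1 for i, ch in enumerate(sorted({c for w in NUM2WORD.values() for c in w}))}


# 2. An input_fn returning a Dataset of (features_dict, labels) tuples.
def input_fn(repeat):
    words = list(NUM2WORD.values())
    max_len = max(len(w) for w in words)
    chars = np.array([[VOCAB[c] for c in w] + [0] * (max_len - len(w)) for w in words], np.int32)
    lengths = np.array([len(w) for w in words], np.int32)
    labels = np.array([(n + 1) % 2 for n in NUM2WORD], np.int32)  # 1 = even, 0 = odd
    dataset = tf.data.Dataset.from_tensor_slices(({"chars": chars, "length": lengths}, labels))
    return dataset.repeat().batch(5) if repeat else dataset.batch(5)


# 3. The model: masked mean of character embeddings -> 2 logits (stand-in for an LSTM).
def build_logits(features, params):
    embeddings = tf.compat.v1.get_variable("embeddings", [len(VOCAB) + 1, params["embed_dim"]])
    embedded = tf.nn.embedding_lookup(embeddings, features["chars"])
    mask = tf.sequence_mask(features["length"], tf.shape(features["chars"])[1], dtype=tf.float32)
    summed = tf.reduce_sum(embedded * tf.expand_dims(mask, -1), axis=1)
    mean = summed / tf.expand_dims(tf.cast(features["length"], tf.float32), -1)
    w = tf.compat.v1.get_variable("w", [params["embed_dim"], 2])
    return tf.matmul(mean, w)


# 4. The model_fn, branching on mode.
def model_fn(features, labels, mode, params):
    logits = build_logits(features, params)
    predicted = tf.argmax(logits, axis=1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={"is_even": predicted})
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss, eval_metric_ops={
            "accuracy": tf.compat.v1.metrics.accuracy(labels, predicted)})
    train_op = tf.compat.v1.train.AdamOptimizer(params["learning_rate"]).minimize(
        loss, global_step=tf.compat.v1.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)


# 5. Instantiate an Estimator with the model_fn.
estimator = tf.estimator.Estimator(
    model_fn, model_dir=tempfile.mkdtemp(), params={"embed_dim": 8, "learning_rate": 0.05})
# 6. Call train / evaluate / predict with the relevant input_fn.
estimator.train(lambda: input_fn(repeat=True), steps=50)
print(estimator.evaluate(lambda: input_fn(repeat=False)))
print([p["is_even"] for p in estimator.predict(lambda: input_fn(repeat=False))])
```

Because the model is isolated in `build_logits`, swapping in the LSTM later should only touch step 3.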

It is now 22:07, so it took me about an hour and forty minutes to figure this out. My child did not interfere much, so this was more or less continuous.

Armed with this new knowledge, I'm going to walk the dogs and then actually do this.

