# Goal
This notebook is my live notes for figuring out how to use TensorFlow's Estimator API. 
I read the [official docs](https://www.tensorflow.org/guide/estimators) but they were a bit vague for my taste, i.e. I couldn't read them and know what to do. 

For context, I have a toy dataset that pairs numbers with their German words. I want to do two toy problems: one is to classify whether a given word is even or odd, the other is to convert from word to number using [NALU](https://arxiv.org/abs/1808.00508). It's 20:30 now and I have a baby that's switching between crying, feeding and sleeping. Let's see how far he lets us get. 

# Step one - "Literature Review"
The official docs point to a collection of [official models](https://github.com/tensorflow/models/tree/master/official) that are well maintained and serve as references for the high level APIs like Estimator. This is great. Even better, they have an implementation of the [Transformer model](https://github.com/tensorflow/models/tree/master/official/transformer) which is in the realm of NLP. That's particularly important because data loading in NLP is a black art and getting it to play nice with a new API will be blacker than black, so it's nice to have a reference to copy-paste from.
![funny](https://i.gr-assets.com/images/S/compressed.photo.goodreads.com/books/1457364208i/29437996._UY630_SR1200,630_.jpg)

## A Caveat - The Transformer implementation is too good for me
The implementation in the Transformer model is a bit too good. It covers a lot of things I don't really need now, like BLEU scores, running on TPUs and distributed training. Yes, these are highlights of the Estimator API, but they are highlights I don't need now. So really the first thing I'll do is mark what I need to keep and what to delete.

## What we know

So basically the Estimator API says "Give me a function that returns data, and another function that returns a model, and I'll combine them, run them, calculate the metrics, save checkpoints, distribute it across nodes and make you coffee."


So what we need to figure out is 
* How to write a model function
* What are the specifications for the data function
    * Can we use feed dicts or only TFRecords?
    * It looks like we need to separate examples and labels for the API, where should we do that?
    * Can we preprocess the text in Python?
    
  

## How to write a model function 
So the Transformer model function is [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L69).
Notably, its signature is 
```python
def model_fn(features, labels, mode, params):

```
I guess features is inputs, labels is labels, and I know that mode is one of TRAIN, EVAL or PREDICT (the values of `tf.estimator.ModeKeys`). 
If you look [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L81-L89) you'll see that the model returns something in the case that mode==PREDICT and then [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L125-L138) something else if mode==TRAIN.

Cool. So the first thing we know is that it should do different things based on mode. This is actually super duper awesome because before this there wasn't a canonical way to make that separation and every project reinvented the wheel. So while a bit complex, it's great. 

### What is the model_fn returning
So when we look at what the model function is returning we see it returns an [EstimatorSpec](https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec)  
```python
      return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

```
Just looking at how it's called here this thing actually makes sense. 
The docs say

>For mode == ModeKeys.TRAIN: required fields are loss and train_op.

>For mode == ModeKeys.EVAL: required field is loss.

>For mode == ModeKeys.PREDICT: required fields are predictions.


So that tells us the minimum we need to pass in. I wonder why we need to pass the loss into the estimator during training, since I assume it's implied in the training op that is minimizing the loss. But whatever.
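
To keep that rule straight in my head, here is a minimal sketch of it as plain Python. Nothing here is TensorFlow: `REQUIRED_FIELDS` and `missing_fields` are names I made up for illustration. The only real facts are the rule quoted from the docs and the string values behind `tf.estimator.ModeKeys` (`"train"`, `"eval"` and `"infer"`).

```python
# Hypothetical helper, not part of TensorFlow: it only mirrors the rule
# quoted from the EstimatorSpec docs.
REQUIRED_FIELDS = {
    "train": ("loss", "train_op"),   # tf.estimator.ModeKeys.TRAIN == "train"
    "eval": ("loss",),               # tf.estimator.ModeKeys.EVAL == "eval"
    "infer": ("predictions",),       # tf.estimator.ModeKeys.PREDICT == "infer"
}


def missing_fields(mode, **spec_fields):
    """Return the required field names that are absent (or None) for `mode`."""
    return [name for name in REQUIRED_FIELDS[mode]
            if spec_fields.get(name) is None]


# A training spec that forgot its train_op
print(missing_fields("train", loss=0.5))
# A prediction spec needs neither a loss nor a train_op
print(missing_fields("infer", predictions=[1, 0, 1]))
```

The real `EstimatorSpec` does this kind of validation itself, so this is only a way to read the rule, not something you need to write.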

### Where did loss and train_op come from ?

So we saw in that little python snippet that we pass loss and train_op but where did they come from ?
So in the beginning of the model_fn they do 
```python
    model = transformer.Transformer(params, mode == tf.estimator.ModeKeys.TRAIN)

    logits = model(inputs, targets)
```
Where transformer is an import from some other directory. 
Then, once they've checked they are not in prediction mode, they calculate the loss
```python
    xentropy, weights = metrics.padded_cross_entropy_loss(
        logits, targets, params["label_smoothing"], params["vocab_size"])
    loss = tf.reduce_sum(xentropy) / tf.reduce_sum(weights)
```
Then they check if they are in eval or train mode, and if they are in train mode they also set up the train op 
```python
      train_op, metric_dict = get_train_op_and_metrics(loss, params)
```
Where get_train_op_and_metrics is defined in the file. 

### Two patterns emerge
The first pattern to emerge is that they calculate only what is needed. Or rather than "calculate", they only set up the graph ops that are needed, e.g. they don't make the train_op if they don't need it. I wonder if this is just good engineering or serves a more "practical purpose", e.g. to leave space on the GPU or something. 

The second pattern to emerge, which is **more important**, is their use of imports and helper functions. Obviously this is good practice, and I mention it because I feel that when I copy and paste from X I'm not always confident about what "style" of programming I can use. This holds especially true in ML and frontend code, where the software engineering chops vary wildly within the respective communities. But I digress.


# Who calls the model_fn ?
So we've seen that the model_fn they define returns an EstimatorSpec when it gets data and a mode. Who calls the model_fn and where does it get the data from? 
Well, [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L499-L502) is an example. Let's copy paste it!

```python
    return tf.estimator.Estimator(
        model_fn=model_fn, model_dir=flags_obj.model_dir, params=params,
        config=tf.estimator.RunConfig(train_distribute=distribution_strategy))
```

Cool. So it makes sense that the Estimator class will call the model function, which will in turn return an EstimatorSpec based on the mode. The thing is, model_fn is called with features and labels, so where did they come from? It's not obvious just from looking at this function. I can only assume that it's somehow specified in params. I vaguely remembered something about an input_fn so I searched the code for it and found 
[this morsel](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L334-L337) 
```python
    estimator.train(
        dataset.train_input_fn,
        steps=schedule_manager.single_iteration_train_steps,
        hooks=train_hooks)
```
And so, I realize that the estimator instance returned from instantiating the Estimator class with our model_fn has a method on it called train that accepts an input function. Shoutout to the timeless [Execution in the Kingdom of Nouns](https://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html).
It's actually very sensible, an instance of an estimator has a bunch of methods, like train, evaluate and predict, which happen to correspond to the things we'd like to do with a model. In any of these cases, we need to provide data to our model, which is done through an input_fn. We can do a bunch of extra fancy things which we'll get to later. 
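
To convince myself I understand who calls whom, here is a toy stand-in written in plain Python. It is a sketch under loud assumptions: `ToyEstimator`, `toy_input_fn` and `toy_model_fn` are all hypothetical names, the "model" is a single multiplication, and the real `tf.estimator.Estimator` additionally builds graphs, runs sessions, writes checkpoints and so on. The only thing it tries to show is the division of labour: the estimator owns the loop, calls the input_fn for `(features, labels)` and hands them to the model_fn together with a mode and params.

```python
class ToyEstimator:
    """Hypothetical stand-in for tf.estimator.Estimator; no TensorFlow involved."""

    def __init__(self, model_fn, params=None):
        self.model_fn = model_fn
        self.params = params or {}

    def _run(self, input_fn, mode):
        # The estimator, not the user, calls both functions
        features, labels = input_fn()
        return self.model_fn(features, labels, mode, self.params)

    def train(self, input_fn):
        return self._run(input_fn, "train")

    def evaluate(self, input_fn):
        return self._run(input_fn, "eval")

    def predict(self, input_fn):
        return self._run(input_fn, "infer")


def toy_input_fn():
    # features is a dict of named inputs, labels is a plain list
    return {"x": [1.0, 2.0, 3.0]}, [2.0, 4.0, 6.0]


def toy_model_fn(features, labels, mode, params):
    predictions = [params["scale"] * x for x in features["x"]]
    if mode == "infer":
        # Prediction needs no labels and no loss
        return {"predictions": predictions}
    loss = sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)
    return {"loss": loss}


toy_estimator = ToyEstimator(toy_model_fn, params={"scale": 2.0})
print(toy_estimator.evaluate(toy_input_fn))
print(toy_estimator.predict(toy_input_fn))
```

Note that the model_fn never sees the estimator and the input_fn never sees the model, which is what lets the same model_fn be reused for training, evaluation and prediction.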

## A question is answered! We now know the constraints on the data input
We can now go look at the docs for the estimators [train](https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator#train) method and see what we can pass around in the input_fn. To quote the docs

        input_fn: A function that provides input data for training as minibatches. See Premade Estimators for more information. The function should construct and return one of the following: 
        * A tf.data.Dataset object: Outputs of Dataset object must be a tuple (features, labels) with same constraints as below. 
        * A tuple (features, labels): Where features is a tf.Tensor or a dictionary of string feature name to Tensor and labels is a Tensor or a dictionary of string label name to Tensor. 
        Both features and labels are consumed by model_fn. They should satisfy the expectation of model_fn from inputs.

When we set out on this adventure I asked 
* Can we use feed dicts or only TFRecords?
* It looks like we need to separate examples and labels for the API, where should we do that?
* Can we preprocess the text in Python?

Let's answer them

### Can we use feed dicts or only TFRecords ? 
Apparently there is no way to use feed dicts. But we don't need to use TFRecords either, we just need a function that reads data on the fly and returns tensors.
### It looks like we need to separate examples and labels for the API, where should we do that? 
Well, the docs say the input function should return a Dataset object whose elements are a (features, labels) tuple, or such a tuple directly. So the separation happens inside the input_fn.

I'm not being nitpicky here. I like to pass a lot of tensors in dictionaries, for example, I want to pass my examples in a tensor and another tensor with their lengths and put both of those in a dict. Apparently I can, so long as the dict sits inside the (features, labels) tuple: the docs allow each of features and labels to be either a single Tensor or a dict of named Tensors.

I'll guesstimate that the TensorFlow folks were trying to avoid stuff like this when designing the API. Probably the logic is that if you follow it to the letter you'll have super portable models that you can share and swap out parts. In my way, with the dicts, your model_fn now needs to parse dicts. Works for me. 

### Can we preprocess the text in Python?
I think we can. We'll try to do that in a bit


# Setting up for Tensorboard
Let's face it, the best part of doing deep learning is watching the loss go down on Tensorboard. While the estimator API promises to let us do that for free, we haven't seen how. 

So actually, in the Transformer example they have this cool function that gets a loss and some params and returns both the train op and the metrics we want to see in Tensorboard. It's [here](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L166-L194). Basically they have a dict called metric_dict that maps names of scalars to the scalars themselves. Then they run a function, [record_scalars](https://github.com/tensorflow/models/blob/master/official/transformer/transformer_main.py#L141), and that sets up the scalars for Tensorboard. 

If you dig around, they do the same thing in slightly different ways, but really there is no magic here. You call tf.summary.X (or tf.contrib.summary.X) and Estimator will take care of the rest. Amen.

# Pre-Summary - Setting up some context
As I mentioned, I have a toy task! It consists of taking a word in German that represents a number and predicting if it is even or odd. Conveniently, I have a program that gives me dicts whose keys are numbers and whose values are their German word equivalents. Check it out (disclaimer: I wrote this program and my German is shameful, so maybe it's wrong).

In [1]:
from utils.numtoWord import createNum2WordDict
createNum2WordDict(size=10,high=100)

{15: 'fünfzehn',
 25: 'fünfundzwanzig',
 30: 'unddreiβig',
 38: 'achtunddreiβig',
 43: 'dreiundvierzig',
 48: 'achtundvierzig',
 56: 'sechsundfünfzig',
 57: 'siebenundfünfzig',
 60: 'undsechzig',
 62: 'zweiundsechzig'}

Cool! 
Now, let's see how we make that into something that maps each word to a parity label (0 means odd, 1 means even).

In [2]:
d = createNum2WordDict(size=10,high=100)
d = {key: (val,(key+1)%2) for key,val in d.items()}
d

{0: ('', 1),
 12: ('zwölf', 1),
 20: ('undzwanzig', 1),
 23: ('dreiundzwanzig', 0),
 36: ('sechsunddreiβig', 1),
 51: ('einundfünfzig', 0),
 56: ('sechsundfünfzig', 1),
 71: ('einundsiebzig', 0),
 84: ('vierundachtzig', 1),
 97: ('siebenundneunzig', 0)}
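
Here is a tiny sketch of the same labelling trick on a hand-written dict, so it's easy to check by eye. The `sample` dict and the `labelled` name are made up for illustration; the `(number + 1) % 2` expression is the one from the cell above. I key the result by word rather than by number here, which is only my assumption about the "word in, label out" shape a classifier would want.

```python
# Hand-written stand-in for the output of createNum2WordDict
sample = {12: "zwölf", 23: "dreiundzwanzig", 56: "sechsundfünfzig"}

# (number + 1) % 2 is 1 for even numbers and 0 for odd ones
labelled = {word: (number + 1) % 2 for number, word in sample.items()}

for word, label in labelled.items():
    print(word, "even" if label == 1 else "odd")
```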

Cool, so now that I have my data here is what I want
1. A model that 
    * Is an LSTM
    * Reads the words character by character
    * Predicts if they are even or odd
2. An input function that 
    * Calls my fancy function above and returns tensors in the proper format
3. An Estimator that
    * Uses my model via a model_fn and my input_fn to train and evaluate 
    * Lets me see progress and accuracy in Tensorboard
    * As a bonus, does a one-line deploy of this to Google-ml

# Summary
So now I can summarize what I've learnt in light of what I want to do, i.e. derive a recipe. 
Basically
1. Find your data
2. Write a function that returns a Dataset object which in itself returns a tuple
3. Define your model somewhere, as a function that returns logits / predictions
4. Write a model_fn that
    * Takes as input
        * features and labels, which are the tuple from your input function
        * mode, which is one of the values of [ModeKeys](https://www.tensorflow.org/api_docs/python/tf/estimator/ModeKeys)
        * params, which are parameters we haven't discussed 
    * Returns an instance of an EstimatorSpec
        * That does what needs to be done based on the mode (e.g. trains, or just predicts) 
5. Instantiate an Estimator with the model_fn
6. Call estimator.train/eval/predict with the relevant input_fn

It is now 22:07, so it took me an hour and forty to figure this out. My child did not interfere much so this was more or less continuous. 

Armed with this new knowledge, I'm going to walk the dogs and then actually do this.



In [3]:
import tensorflow as tf




In [4]:
def nalu(input_layer, num_outputs):
    """ Neural Arithmetic Logic Unit TensorFlow layer
    Arguments:
    input_layer - A Tensor representing the previous layer
    num_outputs - number of output units 
    Returns:
    A tensor representing the output of NALU
    """

    shape = (int(input_layer.shape[-1]), num_outputs)

    # define variables
    W_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02,dtype=tf.float64))
    M_hat = tf.Variable(tf.truncated_normal(shape, stddev=0.02,dtype=tf.float64))
    G = tf.Variable(tf.truncated_normal(shape, stddev=0.02,dtype=tf.float64))

    # operations according to paper
    W = tf.tanh(W_hat) * tf.sigmoid(M_hat)
    m = tf.exp(tf.matmul(tf.log(tf.abs(input_layer) + 1e-7), W))
    g = tf.sigmoid(tf.matmul(input_layer, G))
    a = tf.matmul(input_layer, W)
    out = g * a + (1 - g) * m

    return out
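
To see what those lines are supposed to buy us, here is a plain-Python sketch of the same forward pass for a single output unit. It is not the TensorFlow layer above: `nalu_scalar` and `sigmoid` are hypothetical helpers, and the weights are hand-picked rather than learned. The assumption being illustrated is that when `W` saturates near 1, the gate `g` chooses between adding the inputs (`g` near 1) and multiplying them (`g` near 0).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nalu_scalar(x, w_hat, m_hat, g_weights, eps=1e-7):
    """NALU forward pass for one output unit on a plain list of floats."""
    # W = tanh(W_hat) * sigmoid(M_hat), pushed towards -1, 0 or 1
    w = [math.tanh(wh) * sigmoid(mh) for wh, mh in zip(w_hat, m_hat)]
    # Additive path: a weighted sum of the inputs
    a = sum(xi * wi for xi, wi in zip(x, w))
    # Multiplicative path: the same weighted sum, but in log space
    m = math.exp(sum(math.log(abs(xi) + eps) * wi for xi, wi in zip(x, w)))
    # Gate that mixes the two paths
    g = sigmoid(sum(xi * gi for xi, gi in zip(x, g_weights)))
    return g * a + (1.0 - g) * m


x = [2.0, 3.0]
saturated = [20.0, 20.0]  # makes every entry of W very close to 1

added = nalu_scalar(x, saturated, saturated, g_weights=[50.0, 50.0])
multiplied = nalu_scalar(x, saturated, saturated, g_weights=[-50.0, -50.0])
print(added, multiplied)  # close to 2 + 3 and 2 * 3 respectively
```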


In [5]:

def model(inputs,labels,params):
    '''
        inputs: a dict of tensors {"sequences":[?,?],"lengths":[?]}
        labels: a tensor of shape [?] (batch size); not used here
        returns logits of shape [?] (batch size)
    '''
    lengths = inputs["lengths"]
    sequences = inputs["sequences"]
    # One trainable embedding vector per character id
    char_embeddings = tf.get_variable("char_embeddings",[params['vocab_size'], params['hidden_size']],dtype=tf.float64)
    embedded = tf.nn.embedding_lookup(char_embeddings, sequences)
    
    cell = tf.nn.rnn_cell.LSTMCell(num_units=params['hidden_size'],dtype=tf.float64)
    # sequence_length lets the RNN stop at each word's real length and skip the padding
    outputs, states = tf.nn.bidirectional_dynamic_rnn(cell,cell, embedded,sequence_length=lengths,
                                   dtype=tf.float64)
    # states is (forward, backward); each is an LSTMStateTuple (c, h), so [0] picks the cell state c
    state = tf.concat([states[0][0],states[1][0]],axis=1)
    logits = nalu(state,1)
    # Drop the size-1 dimension: [batch, 1] -> [batch]
    return tf.squeeze(logits)
    
    

    
    
    


In [6]:
from utils.numtoWord import createNum2WordDict, vocab
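def generator_function(params):
    # Endless stream of (char ids, word length, number) examples
    while True:
        d = createNum2WordDict(size=100,high=params['max_num'])
        for value,word in d.items():
            if value==0:
                # zero comes back as an empty word (see the output above), so skip it
                continue
            ids = [vocab[char] for char in word]
            length = len(word)
            yield (ids,length,value )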
        

In [7]:
def input_fn(params):
    generator = lambda : generator_function(params)
    # Each element is (char ids [variable length], word length [scalar], number [scalar])
    dataset = tf.data.Dataset.from_generator(
        generator=generator,
        output_types=(tf.int64,tf.int64,tf.double),
        output_shapes=(tf.TensorShape([None]),tf.TensorShape([]),tf.TensorShape([]))
    )
    # Pad the id sequences in each batch up to the longest word in that batch
    dataset = dataset.padded_batch(
        params['batch_size'],
        padded_shapes=(tf.TensorShape([None]),tf.TensorShape([]),tf.TensorShape([]))
    )
    
    # Repackage as the (features, labels) tuple that model_fn receives
    dataset = dataset.map(lambda x,y,z: ({"sequences":x,"lengths":y},z))
    return dataset
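
If the padding and repackaging above feel opaque, here is roughly what one batch looks like, mimicked with plain Python lists. This is a sketch, not what `tf.data` does internally: `pad_batch` is a hypothetical helper, and the character ids in the example are invented. I'm assuming the default padding value of 0 for numeric types, which also means a real character with id 0 would be indistinguishable from padding, hence passing the lengths along.

```python
def pad_batch(examples, pad_id=0):
    """Mimic padded_batch plus the final map for a list of (ids, length, value)."""
    max_len = max(length for _, length, _ in examples)
    # Right-pad every id sequence to the longest one in this batch
    sequences = [ids + [pad_id] * (max_len - len(ids)) for ids, _, _ in examples]
    lengths = [length for _, length, _ in examples]
    labels = [float(value) for _, _, value in examples]
    # Same (features, labels) layout that input_fn hands to model_fn
    return {"sequences": sequences, "lengths": lengths}, labels


# Invented ids for two words of different length, labelled with their numbers
batch = [([3, 1], 2, 12), ([5, 2, 4, 1], 4, 23)]
features, labels = pad_batch(batch)
print(features["sequences"])
print(features["lengths"], labels)
```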
        

# Side hack, try the model out

In [8]:
# sess = tf.InteractiveSession()
# with tf.variable_scope("new18"):
#     ds =input_fn(params)
#     it = ds.make_one_shot_iterator()
#     next_el = it.get_next()  # (features dict, labels)
#     logits = model(*next_el,params=params)
#     opt = tf.train.AdamOptimizer(0.001)
#     loss = tf.nn.sigmoid_cross_entropy_with_logits(logits=logits,labels=next_el[1])
#     loss = tf.reduce_mean(loss)
#     train = opt.minimize(loss)
#     # Initialize only after the model and optimizer variables exist
#     sess.run(tf.global_variables_initializer())




In [9]:
# for i in range(100):
#     _,l = sess.run([train,loss])
#     if i%50 ==0:
#         print(l)
    

# Back to making an estimator

In [10]:
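def model_fn(features,labels,mode,params):
    with tf.variable_scope("model"):
        logits = model(features,labels,params)
        if mode == tf.estimator.ModeKeys.PREDICT:
            # No labels at prediction time, so return before building the loss
            return tf.estimator.EstimatorSpec(mode=mode, predictions={"value": logits})
        # NALU regresses the number itself, so use a squared-error loss.
        # tf.nn.l2_loss returns sum(t ** 2) / 2 over the whole batch (a scalar),
        # which is why the logged loss values are so large.
        loss = tf.nn.l2_loss(logits - labels)
        tf.summary.scalar("loss",loss)
        if mode == tf.estimator.ModeKeys.EVAL:
            return tf.estimator.EstimatorSpec(mode=mode, loss=loss)
        opt = tf.train.AdamOptimizer(0.0001)
        train = opt.minimize(loss,global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train)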

params = {
    "max_num":5000,
    "batch_size":32,
    "hidden_size":128,
    "vocab_size":len(vocab)
}


        
estimator = tf.estimator.Estimator(model_fn=model_fn,params=params)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_num_worker_replicas': 1, '_evaluation_master': '', '_task_id': 0, '_task_type': 'worker', '_global_id_in_cluster': 0, '_keep_checkpoint_max': 5, '_save_summary_steps': 100, '_service': None, '_num_ps_replicas': 0, '_is_chief': True, '_log_step_count_steps': 100, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f97153b3ac8>, '_train_distribute': None, '_master': '', '_model_dir': '/tmp/tmpqhajrxkz', '_tf_random_seed': None, '_session_config': None, '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_save_checkpoints_secs': 600, '_device_fn': None}


In [11]:
estimator.train(input_fn=input_fn)

INFO:tensorflow:Calling model_fn.
Instructions for updating:
seq_dim is deprecated, use seq_axis instead
Instructions for updating:
batch_dim is deprecated, use batch_axis instead
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpqhajrxkz/model.ckpt.
INFO:tensorflow:loss = 172930990.04924768, step = 1
INFO:tensorflow:global_step/sec: 3.79334
INFO:tensorflow:loss = 100603741.11380596, step = 101 (26.364 sec)
INFO:tensorflow:global_step/sec: 4.17133
INFO:tensorflow:loss = 44853105.048112184, step = 201 (23.973 sec)
INFO:tensorflow:global_step/sec: 4.0366
INFO:tensorflow:loss = 34108714.80213587, step = 301 (24.772 sec)
INFO:tensorflow:global_step/sec: 3.74638
INFO:tensorflow:loss = 24332296.444560308, step = 401 (26.695 sec)
INFO:tensorflow:global_step/sec: 4.15361
...

INFO:tensorflow:loss = 18035358.66029378, step = 6601 (22.290 sec)
INFO:tensorflow:global_step/sec: 4.53726
INFO:tensorflow:loss = 19765387.89220039, step = 6701 (22.040 sec)
INFO:tensorflow:global_step/sec: 4.46528
INFO:tensorflow:loss = 22984159.44177398, step = 6801 (22.395 sec)
INFO:tensorflow:global_step/sec: 4.54287
INFO:tensorflow:loss = 20130895.669376504, step = 6901 (22.012 sec)
INFO:tensorflow:global_step/sec: 4.48387
INFO:tensorflow:loss = 10070395.73918238, step = 7001 (22.302 sec)
INFO:tensorflow:global_step/sec: 4.46667
INFO:tensorflow:loss = 14000603.525941122, step = 7101 (22.388 sec)
INFO:tensorflow:global_step/sec: 4.5181
INFO:tensorflow:loss = 10057452.830848247, step = 7201 (22.133 sec)
INFO:tensorflow:global_step/sec: 4.55668
INFO:tensorflow:loss = 11473754.58756508, step = 7301 (21.946 sec)
INFO:tensorflow:global_step/sec: 4.48553
INFO:tensorflow:loss = 16430049.930695642, step = 7401 (22.294 sec)
INFO:tensorflow:global_step/sec: 4.37824
...

INFO:tensorflow:global_step/sec: 4.9764
INFO:tensorflow:loss = 13707426.255418804, step = 14001 (20.095 sec)
INFO:tensorflow:global_step/sec: 5.05927
INFO:tensorflow:loss = 16972811.071351968, step = 14101 (19.766 sec)
INFO:tensorflow:global_step/sec: 5.0693
INFO:tensorflow:loss = 10454342.111710288, step = 14201 (19.727 sec)
INFO:tensorflow:global_step/sec: 5.00603
INFO:tensorflow:loss = 6456774.617390536, step = 14301 (19.976 sec)
INFO:tensorflow:global_step/sec: 5.02339
INFO:tensorflow:loss = 14397656.943593943, step = 14401 (19.908 sec)
INFO:tensorflow:global_step/sec: 5.01361
INFO:tensorflow:loss = 35017580.18803251, step = 14501 (19.944 sec)
INFO:tensorflow:global_step/sec: 5.01672
INFO:tensorflow:loss = 23448378.989832655, step = 14601 (19.934 sec)
INFO:tensorflow:global_step/sec: 5.04376
INFO:tensorflow:loss = 18182696.682309024, step = 14701 (19.827 sec)
INFO:tensorflow:global_step/sec: 5.00765
INFO:tensorflow:loss = 13432733.762523215, step = 14801 (19.969 sec)
...

INFO:tensorflow:global_step/sec: 5.02971
INFO:tensorflow:loss = 3121797.2799081393, step = 21401 (19.882 sec)
INFO:tensorflow:global_step/sec: 5.03607
INFO:tensorflow:loss = 1212325.0360055522, step = 21501 (19.857 sec)
INFO:tensorflow:global_step/sec: 5.01061
INFO:tensorflow:loss = 4730989.052578319, step = 21601 (19.958 sec)
INFO:tensorflow:global_step/sec: 5.05824
INFO:tensorflow:loss = 4877788.169050291, step = 21701 (19.770 sec)
INFO:tensorflow:global_step/sec: 5.05989
INFO:tensorflow:loss = 4086675.24030818, step = 21801 (19.763 sec)
INFO:tensorflow:global_step/sec: 5.04925
INFO:tensorflow:loss = 4738457.444300904, step = 21901 (19.805 sec)
INFO:tensorflow:global_step/sec: 5.00312
INFO:tensorflow:loss = 2122916.587268344, step = 22001 (19.988 sec)
INFO:tensorflow:global_step/sec: 5.05435
INFO:tensorflow:loss = 2929984.915270215, step = 22101 (19.785 sec)
INFO:tensorflow:global_step/sec: 5.02353
INFO:tensorflow:loss = 1229636.0015456167, step = 22201 (19.906 sec)
...

INFO:tensorflow:global_step/sec: 3.70044
INFO:tensorflow:loss = 1476854.8872257378, step = 28701 (27.022 sec)
INFO:tensorflow:global_step/sec: 4.42175
INFO:tensorflow:loss = 981097.1426545212, step = 28801 (22.616 sec)
INFO:tensorflow:global_step/sec: 4.09699
INFO:tensorflow:loss = 814676.4519285181, step = 28901 (24.408 sec)
INFO:tensorflow:global_step/sec: 4.22845
INFO:tensorflow:loss = 2608955.857022923, step = 29001 (23.650 sec)
INFO:tensorflow:global_step/sec: 4.37772
INFO:tensorflow:loss = 2663378.3498398354, step = 29101 (22.843 sec)
INFO:tensorflow:global_step/sec: 4.66072
INFO:tensorflow:loss = 682997.8065809604, step = 29201 (21.456 sec)
INFO:tensorflow:global_step/sec: 4.57856
INFO:tensorflow:loss = 2420400.367247807, step = 29301 (21.841 sec)
INFO:tensorflow:global_step/sec: 4.54712
INFO:tensorflow:loss = 1371179.179428639, step = 29401 (21.992 sec)
INFO:tensorflow:global_step/sec: 4.45095
INFO:tensorflow:loss = 759161.6108873808, step = 29501 (22.467 sec)
...

INFO:tensorflow:global_step/sec: 4.66059
INFO:tensorflow:loss = 825294.6583547059, step = 36001 (21.457 sec)
INFO:tensorflow:global_step/sec: 4.59952
INFO:tensorflow:loss = 1192631.7944655104, step = 36101 (21.741 sec)
INFO:tensorflow:global_step/sec: 4.53713
INFO:tensorflow:loss = 526655.112480223, step = 36201 (22.041 sec)
INFO:tensorflow:global_step/sec: 4.52607
INFO:tensorflow:loss = 1429311.5499446164, step = 36301 (22.094 sec)
INFO:tensorflow:global_step/sec: 4.61135
INFO:tensorflow:loss = 665120.5906262724, step = 36401 (21.686 sec)
INFO:tensorflow:global_step/sec: 4.57875
INFO:tensorflow:loss = 432820.7835214502, step = 36501 (21.840 sec)
INFO:tensorflow:global_step/sec: 4.69818
INFO:tensorflow:loss = 997091.6554707349, step = 36601 (21.285 sec)
INFO:tensorflow:global_step/sec: 4.54811
INFO:tensorflow:loss = 693162.866280966, step = 36701 (21.989 sec)
INFO:tensorflow:global_step/sec: 4.55875
INFO:tensorflow:loss = 548039.7463813815, step = 36801 (21.934 sec)
...

KeyboardInterrupt: 

In [None]:
d

In [None]:
createNum2WordDict(size=10,high=100000)