# Intro 

This module will get us familiar with the basics of deep learning for NLP in TensorFlow.
We'll cover using embeddings, defining models, and some of the tools that TensorFlow offers us.

## Boilerplate
Neural networks come with a lot of boilerplate. Throughout these modules, we'll provide you with most of it so that you can focus on key concepts. As we progress, we'll learn to roll more of it ourselves.

For the duration of this notebook, we'll focus on a single problem: given a number represented as a German word, decide whether it is even or odd using a neural network.

During each exercise in this notebook, you'll implement only one function, called `model`. It looks like this:
```python
def model(inputs,labels,params):
    '''
        inputs: a dict of tensors {"sequences":[?,?],"lengths":[?]}
        labels: a tensor of shape [?] (batch size)
        returns: logits of shape [?] (batch size)
    '''
    lengths = inputs["lengths"]
    sequences = inputs["sequences"]
    #DEEP LEARNING MAGIC
```

Once you've defined it, you'll run it with the following code:
```python
    estimator = estimator_factory(model,params)
    estimator.train(input_fn=input_fn,steps=100000)
```

### What is that?

Those two lines leverage TensorFlow's Estimator API and Dataset API to make your life easy. You define a model that outputs logits, and they do things like:
* Get data and feed it to the model
* Calculate the loss and run the backpropagation
* Save checkpoints of your model to disk
* Visualize metrics for you in TensorBoard

As we progress, we'll go into some of those functions to see how we can expand on them.

## What data will go into your model 

Your model will receive three tensors:
* `sequences` (part of the inputs) consists of sequences of character ids, each representing a word.
* `lengths` specifies how long each sequence is.
* `labels` specifies whether each example is odd (1) or even (0).
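
To make those shapes concrete, here is a minimal sketch in plain Python of how such a batch might be put together. The toy vocabulary, the `PAD_ID` value, and the `build_vocab` and `encode_batch` helpers are assumptions made up for illustration; the real vocabulary comes from `utils.numtoWord` and the real batching happens inside `input_fn`, which may work differently.

```python
# Hypothetical illustration only: the notebook's real vocab and input_fn may differ.
PAD_ID = 0  # assumed padding id


def build_vocab(words):
    """Map each character to an integer id, reserving 0 for padding."""
    chars = sorted({ch for word in words for ch in word})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_batch(numbers_and_words, char_to_id):
    """Turn (number, word) pairs into padded id sequences, lengths and labels."""
    lengths = [len(word) for _, word in numbers_and_words]
    max_len = max(lengths)
    sequences = [
        [char_to_id[ch] for ch in word] + [PAD_ID] * (max_len - len(word))
        for _, word in numbers_and_words
    ]
    labels = [number % 2 for number, _ in numbers_and_words]  # odd -> 1, even -> 0
    return {"sequences": sequences, "lengths": lengths}, labels


examples = [(1, "eins"), (2, "zwei"), (3, "drei"), (11, "elf")]
char_to_id = build_vocab(word for _, word in examples)
inputs, labels = encode_batch(examples, char_to_id)

print(inputs["lengths"])  # [4, 4, 4, 3]
print(labels)             # [1, 0, 1, 1]
```

Note that "elf" is shorter than the other words, so its sequence is padded at the end; `lengths` is what lets the model tell real characters from padding.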

In [1]:
import tensorflow as tf
from utils.numtoWord import vocab # Our data's vocabulary
from utils.dnn.estimatorTools import estimator_factory, input_fn #Magic
params = { # Parameters for our model. You can change these to see what happens
    "max_num":5000,
    "batch_size":32,
    "hidden_size":128,
    "vocab_size":len(vocab)
}





In [4]:
def model(inputs,labels,params):
    '''
        Use this template to solve all of the exercises ahead.
        inputs: a dict of tensors {"sequences":[?,?],"lengths":[?]}
        labels: a tensor of shape [?] (batch size)
        returns: logits of shape [?] (batch size)
    '''
    lengths = inputs["lengths"]
    sequences = inputs["sequences"]
    '''
        YOU DO WORK HERE
    '''
    return logits
estimator = estimator_factory(model,params)
estimator.train(input_fn=input_fn,steps=100000)

# Exercise 1 - Embeddings
The highlight of deep learning for NLP is that we can use vector representations of words or letters. Our inputs are sequences of ids, which we want to convert to sequences of vectors. 
Your job is to write a model that:
1. Takes the ids provided in `sequences` and converts them to vectors with the [embedding_lookup](https://www.tensorflow.org/api_docs/python/tf/nn/embedding_lookup) function. (You'll need some Google-fu for this task.)
2. Reduces each sequence by summing it with [reduce_sum](https://www.tensorflow.org/api_docs/python/tf/math/reduce_sum)
3. Converts the resulting sum into a logit using [tf.layers.dense](https://www.tensorflow.org/api_docs/python/tf/layers/dense)
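
Before reaching for the TensorFlow functions, it may help to see the first two steps spelled out in plain Python. This is only a conceptual sketch, not the exercise solution: the tiny `embedding_table` and its values are invented, the helper names are hypothetical, and in your model the table would be a trainable variable rather than a hard-coded list.

```python
# Conceptual sketch: the table values are invented for illustration.
embedding_table = [
    [0.0, 0.0],   # vector for id 0
    [1.0, 0.5],   # vector for id 1
    [-0.5, 2.0],  # vector for id 2
]


def embedding_lookup(table, ids):
    """Replace every id with the matching row of the table."""
    return [table[i] for i in ids]


def sum_over_time(vectors):
    """Add the vectors of one sequence element-wise, leaving a single vector."""
    return [sum(dim) for dim in zip(*vectors)]


sequence = [1, 2, 2]                                   # one sequence of ids
vectors = embedding_lookup(embedding_table, sequence)  # 3 vectors of size 2
summed = sum_over_time(vectors)                        # 1 vector of size 2

print(summed)  # [0.0, 4.5]
```

In TensorFlow the same idea runs on a whole batch at once, so the lookup yields a `[batch, time, embedding]` tensor and the sum is taken over the time axis.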

In [None]:
# DO WORK HERE

# Exercise 1.1 - Bonus questions
If that was easy, answer the following:
1. Why won't your model's loss converge?
2. If you take the average instead of the sum, will it help?
3. Try it.
4. We never specified the vectors' values. Where are they coming from?


# Exercise 2 - RNNs
This exercise introduces us to the workhorse of NLP: recurrent neural networks and their variants. In particular, we'll look at some useful tools TensorFlow provides that make working with sequences easier.
Your job is to write a model that:
1. As before, embeds the input
2. Uses TensorFlow's [dynamic_rnn](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn) to process the sequence
3. Converts the final state of the RNN to a logit using [tf.layers.dense](https://www.tensorflow.org/api_docs/python/tf/layers/dense)

If you did it well, your model should converge.
## Things to consider
1. Use a GRUCell instead of a basic RNN or LSTM cell. Its interface is slightly simpler.
2. How does your model behave when you specify `sequence_length` versus when you don't? What is happening?
3. What happens if you add another hidden layer between the RNN and the logits?
4. Change the parameter `max_num` to 10000000; this will make the words longer. What changes in your model's behaviour?
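
To reason about point 2, here is a toy recurrence in plain Python. The `toy_cell` update rule and the `run_rnn` helper are invented for illustration and are far simpler than a GRU; the only thing borrowed from TensorFlow is the idea, described in the `dynamic_rnn` docs, that the state is copied through once a sequence's length has been reached.

```python
# Toy sketch: toy_cell is a made-up update rule, not a real GRU.
def toy_cell(state, x):
    """One recurrent step: decay the old state and add the new input."""
    return 0.5 * state + x


def run_rnn(sequence, length=None):
    """Run toy_cell over a padded sequence and return the final state.

    If length is given, steps at or beyond it leave the state untouched.
    """
    state = 0.0
    for t, x in enumerate(sequence):
        if length is not None and t >= length:
            continue  # padding step: copy the state through unchanged
        state = toy_cell(state, x)
    return state


padded = [1.0, 2.0, 0.0, 0.0]  # real length 2, padded to 4 with zeros

print(run_rnn(padded, length=2))  # 2.5
print(run_rnn(padded))            # 0.625
```

Without the length, the padding steps keep changing the state, so the "final state" no longer describes the end of the actual word.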



# Exercise 3 - Bidirectional RNNs
A standard RNN reads the input from left to right. It often helps if the model can read the input both left to right and right to left. That's where bidirectional RNNs come in.
Implementing them used to be horrible. Now it's easy: just use [bidirectional_dynamic_rnn](https://www.tensorflow.org/api_docs/python/tf/nn/bidirectional_dynamic_rnn).

So for exercise 3, repeat exercise 2 with bidirectional RNNs. Make sure you read the docs thoroughly.
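
As a mental model for what "reading in both directions" means, here is a toy sketch in plain Python. The `toy_cell` rule and both helpers are hypothetical stand-ins for a real RNN cell; the sketch assumes the padding is ignored and only the valid part of the sequence is reversed for the backward pass.

```python
# Toy sketch: toy_cell is a made-up update rule standing in for a real RNN cell.
def toy_cell(state, x):
    """One recurrent step: decay the old state and add the new input."""
    return 0.5 * state + x


def final_state(steps):
    """Run toy_cell over the given steps and return the last state."""
    state = 0.0
    for x in steps:
        state = toy_cell(state, x)
    return state


def bidirectional_final_states(sequence, length):
    """Return the final states of a forward and a backward pass."""
    valid = sequence[:length]             # drop the padding
    forward = final_state(valid)          # reads left to right
    backward = final_state(valid[::-1])   # reads right to left
    return forward, backward


print(bidirectional_final_states([1.0, 2.0, 4.0, 0.0], length=3))  # (5.25, 3.0)
```

`bidirectional_dynamic_rnn` likewise hands back the forward and backward results as a pair; how you combine them before the dense layer (for example by concatenating) is up to you.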

## Bonus
1. Try using an LSTM cell now. What changes with the last state? Why is that?



In [1]:
import tensorflow as tf



In [None]:
tf.nn.sparse_softmax_cross_entropy_with_logits