# Character level language model - Dinosaurus land

**This problem comes from Andrew Ng's Coursera course. Instead of using their Keras-based solution, we will solve it with TensorFlow.**

Imagine that leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely!

<table>
<td>
<img src="images/dino.jpg" style="width:250;height:300px;">

</td>

</table>

Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this [dataset](dinos.txt). (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. 


By completing this assignment you will learn:

- How to store text data for processing using an RNN 
- How to synthesize data by sampling a prediction at each time step and passing it to the next RNN cell
- How to build a character-level text generation recurrent neural network




In [1]:
import numpy as np
#from utils import *
import random
import math
import tensorflow as tf
import os
import time



## 1 - Problem Statement

### 1.1 - Dataset and Preprocessing

Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size. 

In [2]:
data = open('dinos.txt', 'r').read()
data= data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))

del data

There are 19912 total characters and 27 unique characters in your data.


The characters are a-z (26 characters) plus the "\n" (newline) character, which in this assignment plays a role similar to the `<EOS>` ("end of sentence") token discussed in lecture, except that here it marks the end of a dinosaur name rather than the end of a sentence.
In the cell below, we create a Python dictionary (i.e., a hash table) that maps each character to an index from 0 to 26. We also create a second Python dictionary that maps each index back to the corresponding character. This will help you figure out which index corresponds to which character in the probability distribution output by the softmax layer. Below, `char_to_ix` and `ix_to_char` are the Python dictionaries.

In [3]:
char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)

{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
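
As a quick check of the two mappings, the sketch below encodes a name into indices and decodes it back. It rebuilds the 27-character vocabulary directly instead of reading `dinos.txt`, and the `encode` and `decode` helpers are hypothetical names introduced only for this illustration.

```python
import string

# Rebuild the same vocabulary as above: newline plus the letters a-z.
chars = ["\n"] + list(string.ascii_lowercase)
char_to_ix = {ch: i for i, ch in enumerate(sorted(chars))}
ix_to_char = {i: ch for i, ch in enumerate(sorted(chars))}

def encode(name):
    """Map a name to a list of character indices (hypothetical helper)."""
    return [char_to_ix[ch] for ch in name.lower()]

def decode(indices):
    """Map a list of indices back to a string (hypothetical helper)."""
    return "".join(ix_to_char[i] for i in indices)

encoded = encode("Trex\n")
print(encoded)                # [20, 18, 5, 24, 0]
print(repr(decode(encoded)))  # 'trex\n'
```

The newline sorts before the letters, which is why it gets index 0.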


## 2 - Building the language model 

It is time to build the character-level language model for text generation. 





### 2.1 - Build training data

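Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. The following function `build_training_data()` returns `X_train` (the character indices of each name, padded to a fixed length), `Y_train` (the same indices shifted left by one position, followed by the newline index) and `seqlen` (the unpadded length of each name).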

In [4]:
def build_training_data( filename, name_length=30 ):

    with open(filename) as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]    
    
    data_ix = []   
    for name in examples:
        data_ix.append([char_to_ix[ch] for ch in name])

    seqlen = []
    X_train = []
    Y_train = []
    for name_ix in data_ix:  
        seqlen.append( len(name_ix) )
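        x = name_ix.copy()
        # Pad with -1 so the arrays stay integer-typed; tf.one_hot maps
        # out-of-range indices such as -1 to an all-zero vector.
        x.extend([-1]*(name_length-len(name_ix)))
        X_train.append( x )
        y = name_ix[1:]+[char_to_ix["\n"]]
        y.extend([-1]*(name_length-len(name_ix)))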
        Y_train.append( y )
    
    X_train, Y_train, seqlen = np.array(X_train), np.array(Y_train), np.array(seqlen)
    
    return X_train, Y_train, seqlen
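
To make the input/target relationship concrete, here is a minimal sketch that builds one training example by hand. It assumes the 27-character vocabulary from section 1.1, uses a shorter padding length of 8 for readability, and marks padded positions with `None` as a stand-in for the padding value; `make_example` is a hypothetical helper, not part of the notebook.

```python
import string

# Vocabulary from section 1.1: newline (index 0) followed by a-z (1..26).
chars = ["\n"] + list(string.ascii_lowercase)
char_to_ix = {ch: i for i, ch in enumerate(chars)}

PAD = None  # stand-in for the padding value used by the function above

def make_example(name, name_length=8):
    """Return (x, y, seqlen) for one name, padded to name_length (hypothetical helper)."""
    ix = [char_to_ix[ch] for ch in name]
    pad = [PAD] * (name_length - len(ix))
    x = ix + pad                            # input characters
    y = ix[1:] + [char_to_ix["\n"]] + pad   # targets: the next character, then newline
    return x, y, len(ix)

x, y, seqlen = make_example("rex")
print(x)       # [18, 5, 24, None, None, None, None, None]
print(y)       # [5, 24, 0, None, None, None, None, None]
print(seqlen)  # 3
```

At every real time step the target is the character that follows the input character, and the last real target is the newline that ends the name.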

### 2.2 - Build the model 

Create placeholders X_tr, Y_tr, Inits_tr, and Seq_len_tr for training data. In addition, create placeholders X_te, Y_te, Inits_te, and Seq_len_te. 


In [5]:
def create_placeholders(n_a, n_layers):
    
    # create placeholder for switch
    is_training = tf.placeholder_with_default( tf.constant(True), [], name='is_training' )

    ### START CODE HERE ###
    # Use tf.placeholder().
    # create placeholder for X_tr. dtype:tf.int32, shape:[None,None] (actually [M, n_steps]), name='X_tr'
    X_tr = tf.placeholder( tf.int32, [None, None], name='X_tr' ) 
    # create placeholder for Y_tr. dtype:tf.int32, shape:[None,None] (actually [M, n_steps]), name='Y_tr'
    Y_tr = tf.placeholder( tf.int32, [None, None], name='Y_tr') 
    # create placeholder for Inits_tr. dtype:tf.float32, shape:[None,n_a*n_layers] (actually [M,n_a*n_layers]), name='Inits_tr'
    Inits_tr = tf.placeholder( tf.float32, [None, n_a*n_layers], name='Inits_tr')  
    # create placeholder for Seq_len_tr. dtype:tf.int32, shape:[None] (actually for [M]), name='Seq_len_tr'
    Seq_len_tr = tf.placeholder( tf.int32, [None], name='Seq_len_tr' ) 
    ### END CODE HERE ###
    
    # create placeholders for testing
    # X_te:(1, 1), Inits_te:(1, n_a*n_layers), Seq_len_te:(1,)
    X_te = tf.placeholder_with_default( tf.constant([[1]], dtype=tf.int32), [1, 1], name='X_te' ) 
    Y_te = tf.placeholder_with_default( tf.constant([[1]], dtype=tf.int32), [None, None], name='Y_te')
    Inits_te = tf.placeholder_with_default( tf.zeros([1, n_a*n_layers], dtype=tf.float32), [1, n_a*n_layers], name='Inits_te')  
    Seq_len_te = tf.placeholder_with_default( tf.constant([1], dtype=tf.int32), [1], name='Seq_len_te' ) 
    
    return (is_training, X_tr, Y_tr, Inits_tr, Seq_len_tr, X_te, Y_te, Inits_te, Seq_len_te)

In [6]:
(is_training, X_tr, Y_tr, Inits_tr, Seq_len_tr, X_te, Y_te, Inits_te, Seq_len_te) = create_placeholders(10, 2)
print ("X_tr = " + str(X_tr))
print ("Y_tr = " + str(Y_tr))
print ("Inits_tr = " + str(Inits_tr))
print ("Seq_len_tr = " + str(Seq_len_tr))

X_tr = Tensor("X_tr:0", shape=(?, ?), dtype=int32)
Y_tr = Tensor("Y_tr:0", shape=(?, ?), dtype=int32)
Inits_tr = Tensor("Inits_tr:0", shape=(?, 20), dtype=float32)
Seq_len_tr = Tensor("Seq_len_tr:0", shape=(?,), dtype=int32)


**Expected Output**

<table> 
<tr>
<td>
    X_tr = Tensor("X_tr_1:0", shape=(?, ?), dtype=int32)
</td>
</tr>
<tr>
<td>
    Y_tr = Tensor("Y_tr_1:0", shape=(?, ?), dtype=int32)
</td>
</tr>
<tr>
<td>
    Inits_tr = Tensor("Inits_tr_1:0", shape=(?, 20), dtype=float32)
</td>
</tr>
<tr>
<td>
    Seq_len_tr = Tensor("Seq_len_tr_1:0", shape=(?,), dtype=int32)
</td>
</tr>
</table>

Build the forward_propagation graph. 

In [7]:
def forward_propagation(Xo, n_layers, n_a, inits, seq_len, n_y):

    ### START CODE HERE ###
    # Create the list of GRUCells using tf.nn.rnn_cell.GRUCell. 
    # The number of hidden nodes is n_a and the number of layers is n_layers.
    basic_cell = [tf.nn.rnn_cell.GRUCell(n_a) for _ in range(n_layers)]
    # Create multi-RNN cell using tf.nn.rnn_cell.MultiRNNCell. Set state_is_tuple to False. (This should be updated.)
    basic_cell = tf.nn.rnn_cell.MultiRNNCell( basic_cell, state_is_tuple=False ) 
    # Build dynamic RNNs using tf.nn.dynamic_rnn(). Be careful with the arguments initial_state, dtype (tf.float32) and sequence_length.
    # It returns outputs and states, where outputs.shape = [batch_size, n_steps, n_a] and states.shape = [batch_size, n_a*n_layers]
    outputs, states = tf.nn.dynamic_rnn( basic_cell, Xo, initial_state=inits, dtype = tf.float32, sequence_length=seq_len )
    ### END CODE HERE ###

    # Yflat shape : [ batch_size x n_steps, n_a ]
    Yflat = tf.reshape(outputs, [-1, n_a])    
    # Ylogits shape : [ batch_size x n_steps, vocab_size ] 
    Ylogits = tf.contrib.layers.fully_connected( Yflat, n_y, activation_fn = None )
    
    return Ylogits, states

In [8]:
tf.reset_default_graph()
np.random.seed(1)
Xo = tf.one_hot([[1, 2, 0, 0]], 3)
Z3 = forward_propagation( Xo, 1, 5, tf.constant([[0.0,0.0,0.0,0.0,0.0]],dtype=tf.float32), tf.constant([2],dtype=tf.int32), 3 )
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    logit_out, states_out = sess.run(Z3)
print("logit_out.shape: ", logit_out.shape)
print("states_out.shape:", states_out.shape)

logit_out.shape:  (4, 3)
states_out.shape: (1, 5)


**Expected Output**

<table> 
<tr>
<td>
    logit_out.shape:  (4, 3)
</td>
</tr>
<tr>
<td>
    states_out.shape: (1, 5)
</td>
</tr>
</table>
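
The flattening step in `forward_propagation` merges the batch and time axes so that one dense layer can be applied to every time step. The sketch below mimics that row-major reshape with plain Python lists; `flatten_steps` is a hypothetical helper and the tiny sizes are chosen only for illustration.

```python
def flatten_steps(batch):
    """Flatten [batch_size, n_steps, n_features] into [batch_size * n_steps, n_features]."""
    return [step for example in batch for step in example]

batch_size, n_steps, n_a = 2, 3, 2
# Label each feature vector by (example, step) so the ordering is visible.
outputs = [[[float(b), float(t)] for t in range(n_steps)] for b in range(batch_size)]

flat = flatten_steps(outputs)
print(len(flat), len(flat[0]))  # 6 2
print(flat[1 * n_steps + 2])    # [1.0, 2.0]
```

Row `b * n_steps + t` of the flattened result holds step `t` of example `b`, so the labels have to be flattened in the same order before they are compared with the logits.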

Compute the loss. `Ylogits` has the shape [batch_size*n_steps, n_y], so `Yo` must be reshaped to the same shape. Use tf.reshape().

In [9]:
def compute_loss(Ylogits, Yo, n_y):
    
    ### START CODE HERE ###
    # Yflat_ shape : [ batch_size x n_steps, n_y ]
    # Use tf.reshape() with [-1, n_y]
    Yflat_ = tf.reshape(Yo, [-1, n_y])     
    ### END CODE HERE ###
    
    # Compute loss. After the first line below, loss has the shape [ batch_size x n_steps ]
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Yflat_)  
    loss = tf.reduce_mean(loss)    
    
    return loss

In [10]:
tf.reset_default_graph()
np.random.seed(1)
logits = tf.constant([0.8,-0.1], dtype=tf.float32)
Y = tf.constant([1,0], dtype=tf.float32)
loss = compute_loss( logits, Y, 2 )
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    loss_out = sess.run(loss)
print("loss_out: ", loss_out)

loss_out:  0.34115392


**Expected Output**

<table> 
<tr>
<td>
    loss_out:  0.34115392
</td>
</tr>
</table>
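
The expected value can be checked by hand. The following sketch computes the softmax cross-entropy for the same logits and one-hot label in plain Python; `softmax_cross_entropy` is a hypothetical helper written for this check, not the TensorFlow function.

```python
import math

def softmax_cross_entropy(logits, labels):
    """Cross-entropy between one-hot labels and softmax(logits) for one example."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    return -sum(l * lp for l, lp in zip(labels, log_probs))

loss = softmax_cross_entropy([0.8, -0.1], [1.0, 0.0])
print(round(loss, 5))  # 0.34115
```

With only one example, the mean over the batch equals this single value.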

Build the model by combining the functions you built above.

In [11]:
# GRADED FUNCTION: model

def model( ix_to_char, char_to_ix, num_epochs = 35000, n_a = 100, 
          dino_names = 30, vocab_size = 27, batch_size=100, learning_rate=0.001, max_len_name=30, n_layers=2):
    """
    Trains the model and generates dinosaur names. 
    
    Arguments:
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
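    num_epochs -- number of epochs (full passes over the training set) to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- maximum number of characters to sample each time a name is generated
    vocab_size -- number of unique characters found in the text, size of the vocabulary
    batch_size -- number of training examples per minibatch
    learning_rate -- learning rate for the Adam optimizer
    max_len_name -- maximum name length (in characters) used for padding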
    n_layers -- number of layers of RNNs
    
    Returns:
    None
    """
    tf.reset_default_graph()
    
    # Retrieve n_x and n_y from vocab_size (=27)
    n_x, n_y = vocab_size, vocab_size
    
    # Build training data with padding
    # X_train shape : [m, n_steps]
    # Y_train shape : [m, n_steps]
    X_train, Y_train, seqlen_train = build_training_data( "dinos.txt", name_length=max_len_name )
    m = X_train.shape[0]
    
    # Create placeholder
    (is_training, X_tr, Y_tr, Inits_tr, Seq_len_tr, X_te, Y_te, Inits_te, Seq_len_te) = create_placeholders( n_a, n_layers )

    # Create training dataset and its iterator
    train_dataset = tf.data.Dataset.from_tensor_slices((X_tr, Y_tr, Inits_tr, Seq_len_tr))
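    # Dataset transformations return new datasets, so shuffling is applied here in the pipeline
    train_dataset = train_dataset.shuffle(buffer_size=100000).repeat().batch(batch_size)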
    train_iterator = train_dataset.make_initializable_iterator()
    train_iter_init_op = train_iterator.make_initializer(train_dataset, name="train_iter_init_op")
    
    # Create test dataset and its iterator
    test_dataset = tf.data.Dataset.from_tensor_slices((X_te, Y_te, Inits_te, Seq_len_te))
    test_dataset = test_dataset.repeat().batch(1)
    test_iterator = test_dataset.make_initializable_iterator()
    test_iter_init_op = test_iterator.make_initializer(test_dataset, name="test_iter_init_op")
    
    # Execute conditionally
    (x, y, inits, seq_len) = tf.cond(tf.equal(is_training, tf.constant(True, dtype=tf.bool)), 
                                     lambda: train_iterator.get_next(), 
                                     lambda: test_iterator.get_next())
    
    # Change x and y into one-hot codes
    Xo = tf.one_hot( x, n_x ) #[m, n_steps, n_x]
    Yo = tf.one_hot( y, n_y ) #[m, n_steps, n_y]        
    
    # Build the model for forward propagation
    # Ylogits's shape : [ batch_size x n_steps, n_y=vocab_size ] 
    Ylogits, states = forward_propagation(Xo, n_layers, n_a, inits, seq_len, n_y)
    
    # compute loss
    loss = compute_loss(Ylogits, Yo, n_y)
    
    # optimize the model
    train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
    
    with tf.Session() as sess:
        
        # compute the number of minibatches
        num_minibatches = np.ceil(m / batch_size).astype(int)         

        # Run the initialization
        states_zero = np.zeros((X_train.shape[0], n_a*n_layers))
        print("X_train:{}, Y_train:{}, Inits:{}, Seq_len:{}"
              .format(X_train.shape, Y_train.shape, states_zero.shape, seqlen_train.shape))

        sess.run([tf.global_variables_initializer(), train_iter_init_op, test_iter_init_op], 
                 feed_dict={X_tr:X_train, Y_tr:Y_train, Inits_tr:states_zero, Seq_len_tr:seqlen_train })

        # Do the training loop
        for epoch in range(num_epochs):
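            # Dataset.shuffle() returns a new dataset instead of shuffling in place, so it has to be
            # applied where the pipeline is built; a bare train_dataset.shuffle(...) call here has no effect.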
            minibatch_loss = 0.            
            for i in range(num_minibatches):
                # run session. use sess.run()
                ### START CODE HERE ### (1 line)
                _ , temp_loss = sess.run( [train_op, loss] )
                ### END CODE HERE ###

                minibatch_loss += temp_loss
            minibatch_loss /= num_minibatches
            
            # Print the cost every epoch
            if epoch % 100 == 0:
                print ("Cost after epoch %i: %f" % (epoch, minibatch_loss))

            if epoch % 100 == 0:
                ry = np.array([[np.random.randint(0, vocab_size)]])
                rh = np.zeros([1, n_a*n_layers])
                for name in range(dino_names):
                    sess.run(test_iter_init_op, feed_dict={X_te: ry, Inits_te: rh, Seq_len_te:[1], is_training:False} )
                    ry_logits, rh, x_out, is_training_out, X_te_out = sess.run([Ylogits, states, x, is_training, X_te],
                                                feed_dict={X_te: ry, Inits_te: rh, Seq_len_te:[1], is_training:False})
                    ry = np.argmax(ry_logits, axis=1)
                    if ix_to_char[ry[0]]=='\n':
                        break
                    else:
                        print(ix_to_char[ry[0]], end = ' ')
                        ry = np.reshape(ry, (1,1))
                print('\n')

    return None


Run the following cell. At the first iteration, the model should output random-looking characters. After a few thousand iterations (minibatch updates), it should learn to generate reasonable-looking names; 100 or 200 epochs may be sufficient.

In [None]:
model( ix_to_char, char_to_ix)

X_train:(1539, 30), Y_train:(1539, 30), Inits:(1539, 200), Seq_len:(1539,)
Cost after epoch 0: 1.254533
s u u 
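
The generation loop in `model()` picks the most likely next character with `np.argmax`, so the output is fully determined by the starting character. A common alternative is to draw the next index at random from the softmax distribution over the logits. The sketch below shows that idea in plain Python; `softmax`, `sample_index` and the example logits are assumptions made for illustration and are not part of the notebook.

```python
import math
import random

def softmax(logits):
    """Convert logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(logits, rng):
    """Draw one index at random, weighted by softmax(logits) (hypothetical helper)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

rng = random.Random(0)     # fixed seed so the run is repeatable
logits = [2.0, 1.0, 0.1]   # made-up logits for a 3-character vocabulary

greedy = max(range(len(logits)), key=lambda i: logits[i])
draws = [sample_index(logits, rng) for _ in range(1000)]

print(greedy)  # 0
print([draws.count(i) for i in range(len(logits))])
```

Greedy decoding always returns index 0 here, while sampling visits the other indices in proportion to their probabilities, which gives more varied names.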



## Conclusion

You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results. Our implementation generated some really cool names like `maconucon`, `marloralus` and `macingsersaurus`. Your model hopefully also learned that dinosaur names tend to end in `saurus`, `don`, `aura`, `tor`, etc.


**References**:
- This exercise took inspiration from Andrej Karpathy's implementation: https://gist.github.com/karpathy/d4dee566867f8291f086. To learn more about text generation, also check out Karpathy's [blog post](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
- For the Shakespearian poem generator, our implementation was based on the implementation of an LSTM text generator by the Keras team: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py 