# Character level language model - Dinosaurus land

Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task. Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely! 

<table>
<td>
<img src="images/dino.jpg" style="width:250;height:300px;">

</td>

</table>

Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this [dataset](dinos.txt). (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath! 

By completing this assignment you will learn:

- How to store text data for processing using an RNN 
- How to synthesize data by sampling predictions at each time step and passing them to the next RNN cell
- How to build a character-level text generation recurrent neural network
- Why clipping the gradients is important

We will begin by loading in some functions that we have provided for you in `utils`. Specifically, you have access to functions such as `rnn_forward` and `rnn_backward`, which are equivalent to those you implemented in the previous assignment. 

In [None]:
import numpy as np
from utils import *
import random

## 1 - Problem Statement

### 1.1 - Dataset and Preprocessing

Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size. 

In [None]:
data = open('dinos.txt', 'r').read()
data = data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))

The characters are a-z (26 characters) plus the "\n" (or newline character), which in this assignment plays a role similar to the `<EOS>` (or "End of sentence") token we discussed in lecture, only here it indicates the end of the dinosaur name rather than the end of a sentence. In the cell below, we create a python dictionary (i.e., a hash table) to map each character to an index from 0-26. We also create a second python dictionary that maps each index back to the corresponding character. This will help you figure out what index corresponds to what character in the probability distribution output of the softmax layer. Below, `char_to_ix` and `ix_to_char` are the python dictionaries. 

In [None]:
char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)
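
As a quick illustration of how the two dictionaries are used, the sketch below round-trips a name through indices. It builds its own small vocabulary from a made-up string rather than from `dinos.txt`, so the sample text and the `toy_*` names are assumptions for illustration only.

```python
# Toy corpus standing in for dinos.txt (hypothetical sample, not the real dataset)
toy_data = "rex\nraptor\n"
toy_chars = sorted(set(toy_data))

# Same construction as above: sorted characters get consecutive indices
toy_char_to_ix = {ch: i for i, ch in enumerate(toy_chars)}
toy_ix_to_char = {i: ch for i, ch in enumerate(toy_chars)}

# Encode a name as a list of indices, then decode it back to text
encoded = [toy_char_to_ix[ch] for ch in "rex\n"]
decoded = "".join(toy_ix_to_char[i] for i in encoded)
print(encoded, repr(decoded))
```

Because "\n" sorts before the letters, it receives index 0 in this toy vocabulary, which is why a sampled index of 0 marks the end of a name.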

### 1.2 - Overview of the model

Your model will have the following structure: 

- Initialize parameters 
- Run the optimization loop
    - Forward propagation to compute the loss function
    - Backward propagation to compute the gradients of the loss function with respect to the parameters
    - Clip the gradients to avoid exploding gradients
    - Using the gradients, update your parameters with the gradient descent update rule.
- Return the learned parameters 
    
<img src="images/rnn.png" style="width:450;height:300px;">
<caption><center> **Figure 1**: Recurrent Neural Network, similar to what you had built in the previous notebook "Building a RNN - Step by Step".  </center></caption>

At each time-step, the RNN tries to predict what is the next character given the previous characters. The dataset $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters in the training set, while $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is such that at every time-step $t$, we have $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$. 
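
To make the relation between inputs, targets, and the loss concrete, here is a small sketch. It assumes the loss is the cross-entropy summed over time steps (the `rnn_forward` docstring in section 3.1 only says it computes a cross-entropy loss). The helper name `sequence_cross_entropy` and the toy numbers are hypothetical.

```python
import numpy as np

def sequence_cross_entropy(probs, targets):
    """Sum of -log p(target) over time steps.

    probs   -- array of shape (T, vocab_size); row t is the predicted distribution at step t
    targets -- list of T integer indices (the next character at each step)
    """
    return -sum(np.log(probs[t, y]) for t, y in enumerate(targets))

# Toy example: 3 time steps, vocabulary of 4 characters
x_indices = [2, 0, 3]
y_indices = x_indices[1:] + [1]      # targets are the inputs shifted left, plus one final character
uniform = np.full((3, 4), 0.25)      # an untrained model guessing uniformly

loss = sequence_cross_entropy(uniform, y_indices)
print(y_indices, loss)               # for uniform predictions the loss is 3 * log(4)
```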

## 2 - Building blocks of the model

In this part, you will build two important blocks of the overall model:
- Gradient clipping: to avoid exploding gradients
- Sampling: a technique used to generate characters

You will then apply these two functions to build the model.

### 2.1 - Clipping the gradients in the optimization loop

In this section you will implement the `clip` function that you will call inside of your optimization loop. Recall that your overall loop structure usually consists of a forward pass, a cost computation, a backward pass, and a parameter update. Before updating the parameters, you will perform gradient clipping when needed to make sure that your gradients are not "exploding," meaning taking on overly large values. 

In the exercise below, you will implement a function `clip` that takes in a dictionary of gradients and returns a clipped version of gradients if needed. There are different ways to clip gradients; we will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie within some range [-N, N]. More generally, you will provide a `maxValue` (say 10). In this example, if any component of the gradient vector is greater than 10, it would be set to 10; and if any component of the gradient vector is less than -10, it would be set to -10. If it is between -10 and 10, it is left alone. 
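
A minimal illustration of element-wise clipping with `np.clip`, using made-up gradient values and a threshold of 10. The variable names are hypothetical and this is not the graded function.

```python
import numpy as np

grad = np.array([[-25.0, 3.5], [12.0, -0.2]])   # made-up gradient values
max_value = 10

# By default np.clip returns a new array and leaves the input untouched
clipped = np.clip(grad, -max_value, max_value)
print(clipped)

# Passing out=grad writes the clipped values back into the same array (in place)
np.clip(grad, -max_value, max_value, out=grad)
print(grad)
```

Only the two entries outside [-10, 10] change; the values 3.5 and -0.2 are left alone.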

<img src="images/clip.png" style="width:400;height:150px;">
<caption><center> **Figure 2**: Visualization of gradient descent with and without gradient clipping, in a case where the network is running into slight "exploding gradient" problems. </center></caption>

**Exercise**: Implement the function below to return the clipped gradients of your dictionary `gradients`. Your function takes in a maximum threshold and returns the clipped versions of your gradients. You can check out this [hint](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.clip.html) for examples of how to clip in numpy. You can use the argument `out = ...` to clip in place.

In [None]:
### GRADED FUNCTION: clip

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.
    
    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue
    
    Returns: 
    gradients -- a dictionary with the clipped gradients.
    '''
    
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
   
    ### START CODE HERE ###
    # clip to mitigate exploding gradients, loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for gradient in gradients:
        gradients[gradient] = np.clip(gradients[gradient], a_min=-maxValue, a_max=maxValue)
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
    ### END CODE HERE ###
    
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    
    return gradients

In [None]:
np.random.seed(3)
dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])

**Expected output:**

<table>
<tr>
    <td> 
    **gradients["dWaa"][1][2]**
    </td>
    <td> 
    10.0
    </td>
</tr>

<tr>
    <td> 
    **gradients["dWax"][3][1]**
    </td>
    <td> 
    -10.0
    </td>
</tr>
<tr>
    <td> 
    **gradients["dWya"][1][2]**
    </td>
    <td> 
0.29713815361
    </td>
</tr>
<tr>
    <td> 
    **gradients["db"][4]**
    </td>
    <td> 
[ 10.]
    </td>
</tr>
<tr>
    <td> 
    **gradients["dby"][1]**
    </td>
    <td> 
[ 8.45833407]
    </td>
</tr>

</table>

### 2.2 - Sampling

Now assume that your model is trained. You would like to generate new text (characters). The process of generation is explained in the picture below:

<img src="images/dinos3.png" style="width:500;height:300px;">
<caption><center> **Figure 3**: In this picture, we assume the model is already trained. We pass in $x^{\langle 1\rangle} = \vec{0}$ at the first time step, and have the network then sample one character at a time. </center></caption>

**Exercise**: Implement the `sample` function below to sample characters. You need to carry out 4 steps:

- **Step 1**: Pass the network the first "dummy" input $x^{\langle 1 \rangle} = \vec{0}$ (the vector of zeros). This is the default input before we've generated any characters. We also set $a^{\langle 0 \rangle} = \vec{0}$

- **Step 2**: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:

$$ a^{\langle t+1 \rangle} = \tanh(W_{ax}  x^{\langle t \rangle } + W_{aa} a^{\langle t \rangle } + b)\tag{1}$$

$$ z^{\langle t + 1 \rangle } = W_{ya}  a^{\langle t + 1 \rangle } + b_y \tag{2}$$

$$ \hat{y}^{\langle t+1 \rangle } = softmax(z^{\langle t + 1 \rangle })\tag{3}$$

Note that $\hat{y}^{\langle t+1 \rangle }$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by "i" is the next character.  We have provided a `softmax()` function that you can use.

- **Step 3**: Carry out sampling: Pick the next character's index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle }$. This means that if $\hat{y}^{\langle t+1 \rangle }_i = 0.16$, you will pick the index "i" with 16% probability. To implement it, you can use [`np.random.choice`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.choice.html).

Here is an example of how to use `np.random.choice()`:
```python
np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())
```
This means that you will pick the `index` according to the distribution: 
$P(index = 0) = 0.1, P(index = 1) = 0.0, P(index = 2) = 0.7, P(index = 3) = 0.2$.

- **Step 4**: The last step to implement in `sample()` is to overwrite the variable `x`, which currently stores $x^{\langle t \rangle }$, with the value of $x^{\langle t + 1 \rangle }$. You will represent $x^{\langle t + 1 \rangle }$ by creating a one-hot vector corresponding to the character you've chosen as your prediction. You will then forward propagate $x^{\langle t + 1 \rangle }$ in Step 2 and keep repeating the process until you get a "\n" character, indicating you've reached the end of the dinosaur name. 
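
The four steps can be sketched for a single time step as follows. This is a toy illustration, not the graded solution: the parameters are random, the sizes are made up, and `softmax_sketch` is a hypothetical stand-in for the provided `softmax()`.

```python
import numpy as np

def softmax_sketch(z):
    """Numerically stable softmax over a column vector (stand-in for the provided softmax)."""
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

rng = np.random.RandomState(0)
vocab_size, n_a = 4, 3                      # toy sizes, chosen for illustration
Wax = rng.randn(n_a, vocab_size)
Waa = rng.randn(n_a, n_a)
Wya = rng.randn(vocab_size, n_a)
b, by = np.zeros((n_a, 1)), np.zeros((vocab_size, 1))

# Step 1: zero input and zero hidden state
x = np.zeros((vocab_size, 1))
a_prev = np.zeros((n_a, 1))

# Step 2: equations (1)-(3)
a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
y_hat = softmax_sketch(np.dot(Wya, a) + by)

# Step 3: sample an index according to the distribution y_hat
idx = rng.choice(vocab_size, p=y_hat.ravel())

# Step 4: the sampled character becomes the next one-hot input
x = np.zeros((vocab_size, 1))
x[idx] = 1
a_prev = a
```

With zero biases, as in this sketch, the very first distribution is uniform, because the zero input and zero hidden state give $a^{\langle 1 \rangle} = \vec{0}$. With nonzero biases the first step already prefers some characters over others.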

In [None]:
# GRADED FUNCTION: sample

def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b. 
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """
    
    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]
    n_x = Wax.shape[1]
    
    ### START CODE HERE ###
    # Step 1: Create the zero input vector x for the first time step (initializing the sequence generation). (≈1 line)
    x = np.zeros((n_x, 1))
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((n_a, 1))
    
    # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
    indices = []
    
    # Idx is a flag to detect a newline character, we initialize it to -1
    idx = -1 
    
    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append 
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well 
    # trained model), which helps debugging and prevents entering an infinite loop. 
    counter = 0
    newline_character = char_to_ix['\n']
    
    while (idx != newline_character and counter != 50):
        
        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.dot(Waa, a_prev) + np.dot(Wax, x) + b)
        z = np.dot(Wya, a) + by
        y = softmax(z)
        
        # for grading purposes
        np.random.seed(counter+seed) 
        
        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        idx = np.random.choice(range(0, vocab_size), p = y.ravel())
        # Append the index to "indices"
        indices.append(idx)
        
        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        x = np.zeros((n_x, 1))
        x[idx] = 1
        
        # Update "a_prev" to be "a"
        a_prev = a
        
        # for grading purposes
        seed += 1
        counter +=1
        
    ### END CODE HERE ###

    if (counter == 50):
        indices.append(char_to_ix['\n'])
    
    return indices

In [None]:
np.random.seed(2)
_, n_a = 20, 100
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}


indices = sample(parameters, char_to_ix, 0)
print("Sampling:")
print("list of sampled indices:", indices)
print("list of sampled characters:", [ix_to_char[i] for i in indices])

**Expected output:**
<table>
<tr>
    <td> 
    **list of sampled indices:**
    </td>
    <td> 
    [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, <br>
    7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0]
    </td>
    </tr><tr>
    <td> 
    **list of sampled characters:**
    </td>
    <td> 
    ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', <br>
    'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', <br>
    'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'e', 'f', 'l', 'y', '\n', '\n']
    </td>
    
        
    
</tr>
</table>

## 3 - Building the language model 

It is time to build the character-level language model for text generation. 


### 3.1 - Gradient descent 

In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN:

- Forward propagate through the RNN to compute the loss
- Backward propagate through time to compute the gradients of the loss with respect to the parameters
- Clip the gradients if necessary 
- Update your parameters using gradient descent 

**Exercise**: Implement this optimization process (one step of stochastic gradient descent). 

We provide you with the following functions: 

```python
def rnn_forward(X, Y, a_prev, parameters):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
    ....
    return loss, cache
    
def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It also returns all the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters
```
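
The body of `update_parameters` is elided above. As a rough sketch only, a plain gradient descent update subtracts `learning_rate` times each gradient from the matching parameter. The function name `sgd_update_sketch` is hypothetical, and pairing each parameter `name` with the gradient key `"d" + name` is an assumption based on the dictionary keys used in this notebook, not the provided implementation.

```python
import numpy as np

def sgd_update_sketch(parameters, gradients, learning_rate):
    """Return new parameters after one gradient descent step: p <- p - lr * dp."""
    return {name: value - learning_rate * gradients["d" + name]
            for name, value in parameters.items()}

# Toy usage with a single bias vector
params = {"b": np.array([[1.0], [2.0]])}
grads = {"db": np.array([[10.0], [-10.0]])}
new_params = sgd_update_sketch(params, grads, learning_rate=0.01)
print(new_params["b"].ravel())
```

This sketch returns a new dictionary and leaves `params` unchanged. The provided helper may instead update the arrays in place.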

In [None]:
# GRADED FUNCTION: optimize

def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    """
    Execute one step of the optimization to train the model.
    
    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.
    
    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """
    
    ### START CODE HERE ###
    
    # Forward propagate through time (≈1 line)
    loss, cache = rnn_forward(X, Y, a_prev, parameters)
    
    # Backpropagate through time (≈1 line)
    gradients, a = rnn_backward(X, Y, parameters, cache)
    
    # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
    gradients = clip(gradients, 5.0)
    
    # Update parameters (≈1 line)
    parameters = update_parameters(parameters, gradients, learning_rate)
    
    ### END CODE HERE ###
    
    return loss, gradients, a[len(X)-1]

In [None]:
np.random.seed(1)
vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4,14,11,22,25, 26]

loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])

**Expected output:**

<table>


<tr>
    <td> 
    **Loss**
    </td>
    <td> 
    126.503975722
    </td>
</tr>
<tr>
    <td> 
    **gradients["dWaa"][1][2]**
    </td>
    <td> 
    0.194709315347
    </td>
</tr>
<tr>
    <td> 
    **np.argmax(gradients["dWax"])**
    </td>
    <td> 93
    </td>
</tr>
<tr>
    <td> 
    **gradients["dWya"][1][2]**
    </td>
    <td> -0.007773876032
    </td>
</tr>
<tr>
    <td> 
    **gradients["db"][4]**
    </td>
    <td> [-0.06809825]
    </td>
</tr>
<tr>
    <td> 
    **gradients["dby"][1]**
    </td>
    <td>[ 0.01538192]
    </td>
</tr>
<tr>
    <td> 
    **a_last[4]**
    </td>
    <td> [-1.]
    </td>
</tr>

</table>

### 3.2 - Training the model 

Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 2000 steps of stochastic gradient descent, you will sample `dino_names` names (7 by default) to see how the algorithm is doing. Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order. 

**Exercise**: Follow the instructions and implement `model()`. When `examples[index]` contains one dinosaur name (string), to create an example (X, Y), you can use this:
```python
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]] 
        Y = X[1:] + [char_to_ix["\n"]]
```
Note that we use `index = j % len(examples)`, where `j = 0, ..., num_iterations - 1`, to make sure that `examples[index]` is always a valid lookup (`index` is smaller than `len(examples)`).
The first entry of `X` being `None` will be interpreted by `rnn_forward()` as setting $x^{\langle 0 \rangle} = \vec{0}$. Further, this ensures that `Y` is equal to `X` but shifted one step to the left, and with an additional "\n" appended to signify the end of the dinosaur name. 
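
For a concrete picture of what `X` and `Y` look like, the sketch below applies the same two lines to the name "rex". The vocabulary `toy_char_to_ix` is a hypothetical stand-in that puts "\n" at index 0 and a-z at 1-26. In the notebook, `char_to_ix` is built from `dinos.txt`.

```python
# Toy vocabulary (hypothetical; the notebook builds char_to_ix from dinos.txt)
toy_char_to_ix = {ch: i for i, ch in enumerate("\nabcdefghijklmnopqrstuvwxyz")}

name = "rex"
X = [None] + [toy_char_to_ix[ch] for ch in name]
Y = X[1:] + [toy_char_to_ix["\n"]]

print(X)  # [None, 18, 5, 24]
print(Y)  # [18, 5, 24, 0]
```

`X` and `Y` have the same length: at every position, `Y` holds the character the model should predict after seeing the corresponding entry of `X`.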

In [None]:
# GRADED FUNCTION: model

def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
    """
    Trains the model and generates dinosaur names. 
    
    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- number of dinosaur names you want to sample each time samples are printed. 
    vocab_size -- number of unique characters found in the text, size of the vocabulary
    
    Returns:
    parameters -- learned parameters
    """
    
    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size
    
    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)
    
    # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
    loss = get_initial_loss(vocab_size, dino_names)
    
    # Build list of all dinosaur names (training examples).
    with open("dinos.txt") as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]
    
    # Shuffle list of all dinosaur names
    np.random.seed(0)
    np.random.shuffle(examples)
    
    # Initialize the hidden state of your RNN
    a_prev = np.zeros((n_a, 1))
    
    # Optimization loop
    for j in range(num_iterations):
        
        ### START CODE HERE ###
        
        # Use the hint above to define one training example (X,Y) (≈ 2 lines)
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]] 
        Y = X[1:] + [char_to_ix["\n"]]
        
        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
        
        ### END CODE HERE ###
        
        # Smooth the loss with the provided smooth() helper so that the reported value is less noisy.
        loss = smooth(loss, curr_loss)

        # Every 2000 iterations, generate names with sample() to check whether the model is learning properly
        if j % 2000 == 0:
            
            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')
            
            # The number of dinosaur names to print
            seed = 0
            for name in range(dino_names):
                
                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix, seed)
                print_sample(sampled_indices, ix_to_char)
                
                seed += 1  # To get the same result for grading purposes, increment the seed by one. 
      
            print('\n')
        
    return parameters

Run the following cell. You should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names. 

In [None]:
parameters = model(data, ix_to_char, char_to_ix)

## Conclusion

You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results. Our implementation generated some really cool names like `maconucon`, `marloralus` and `macingsersaurus`. Your model hopefully also learned that dinosaur names tend to end in `saurus`, `don`, `aura`, `tor`, etc.

If your model generates some non-cool names, don't blame the model entirely--not all actual dinosaur names sound cool. (For example, `dromaeosauroides` is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest! 

This assignment used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the English language requires a much bigger dataset, usually needs much more computation, and could run for many hours on GPUs. We ran our dinosaur name generator for quite some time, and so far our favorite name is the great, undefeatable, and fierce: Mangosaurus!

<img src="images/mangosaurus.jpeg" style="width:250;height:300px;">

## 4 - Writing like Shakespeare

The rest of this notebook is optional and is not graded, but we hope you'll do it anyway since it's quite fun and informative. 

A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespearian poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text, e.g., where a character appearing somewhere in a sequence can influence what a different character should be much later in the sequence. These long-term dependencies were less important with dinosaur names, since the names were quite short. 


<img src="images/shakespeare.jpg" style="width:500;height:400px;">
<caption><center> Let's become poets! </center></caption>

We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes. 

In [None]:
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from shakespeare_utils import *
import sys
import io

To save you some time, we have already trained a model for ~1000 epochs on a collection of Shakespearian poems called [*"The Sonnets"*](shakespeare.txt). 

Let's train the model for one more epoch. When it finishes training for an epoch (this will also take a few minutes), you can run `generate_output`, which will prompt you for an input (`<`40 characters). The poem will start with your sentence, and our RNN-Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense " (don't enter the quotation marks). Your results might differ depending on whether you include the space at the end, so try it both ways, and try other inputs as well. 


In [None]:
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])

In [None]:
x.shape, y.shape

In [None]:
# Run this cell to try with different inputs without having to re-train the model 
generate_output()

In [None]:
model.predict(x)

The RNN-Shakespeare model is very similar to the one you have built for dinosaur names. The only major differences are:
- LSTMs instead of the basic RNN to capture longer-range dependencies
- The model is a deeper, stacked LSTM model (2 layers)
- Using Keras instead of plain Python and NumPy to simplify the code 

If you want to learn more, you can also check out the Keras Team's text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py.

Congratulations on finishing this notebook! 

**References**:
- This exercise took inspiration from Andrej Karpathy's implementation: https://gist.github.com/karpathy/d4dee566867f8291f086. To learn more about text generation, also check out Karpathy's [blog post](http://karpathy.github.io/2015/05/21/rnn-effectiveness/).
- For the Shakespearian poem generator, our implementation was based on the implementation of an LSTM text generator by the Keras team: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py 

In [1]:
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from textgen import *
import sys
import io

Using TensorFlow backend.


Loading text data...
corpus length: 67869
total chars: 58
nb sequences: 22610
Vectorization...
Build model...




Epoch 1/60

----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: " on your lips, my bum is on your lips"
a"
 on your lips, my bum is on your lips"
and i' the bit the the the the that in the butt the the the but the thet in the the the the the the the the that me the the the thith thathin't in the the thathe shat in the the bith the the the the the the whot in the the the shot the whe the the the thath the the but the the the thith me the bith the that i the with the thith whe the the the the the the the the the butt i' the the the the the the
----- diversity: 0.5
----- Generating with seed: " on your lips, my bum is on your lips"
a"
 on your lips, my bum is on your lips"
an't or thit a the thet in a watt i want rot't the shat't an thathe mo it ine the wack butcaup the the yours i't mothe the but in thithin'th the that the buad you thith thet ao the bit't whe the shot and hith with your an't in the whot i mack watt and hith thatithe buthe the the mithe t

i hi'tl i'm rrad of a be thy and slenc at they carly if you wantis boblutnepe to bee the comcknd dight don't stark it they corlin', knew i's know act to hmale i prool as ear eving? my paphen the halls geve beally
(hit ille-stam shativ
mer i'm elit mi, stup in’t an't hat, you do everynoal of your
and to selfa me?
go bay, your lould in anco’d me, fuck wher put yea
----- diversity: 1.2
----- Generating with seed: "g
x to the z
nate dogg
come on, yeah

aw"
g
x to the z
nate dogg
come on, yeah

aw your to lilli'l stats tifahes is realis like frout yeald mo sibae i'm fun the, wwe? of the narear 
i'gle rendy, shott nive in ever your oom youll got thay slitt backesan your fick lot, you led,ase
some han, cered (i'm s'nach iom plep world the hatnot your
agjwand oltyy fucks yours, annavesy disti eversed?(backest-i'd the flucks am and ig of knew i'll fon a wasotted 'cause ore sictyloot han the in
Epoch 5/60

----- Generating text after Epoch: 4
----- diversity: 0.2
----- Generating with seed: " kn

that when it's ancert the this with a got a lot his will me slim shady slim shady slart all me stand up?
'cause i take standin' mather all the back these drugs really got a got a 
----- diversity: 0.5
----- Generating with seed: "rshall!
hey, remember the time we went t"
rshall!
hey, remember the time we went to little little to be be a re
oy the real shaty, way just let me latel what chat villin' mothere songed and it all
it the bit
dun the really wanna fand waited for i way a toot
wath on my all you!
for fay, i don't really got a lough me crachin
look, where you here the be't the way the fuck of your to that i bont you
good him to mule
'cause they bullers
all i wanna got you don't got the a way drince
----- diversity: 1.0
----- Generating with seed: "rshall!
hey, remember the time we went t"
rshall!
hey, remember the time we went to being woth it, sheeps, with a man get? whor's just this pikes let it? incerse. dodn't, cancerse
you can't all ot lookey ald so thet harked of world, men'

just somenobed wath that's just sebowest yeak!
guessidem?"sminille
no, fon every hisille get on you
fuckin' this is was di bit?
and if hey iim-hig -where recrew the pprobe slum 
hy minds down' snoppbroo!
me)
betim's keck, ever or whete itgay wanged, i jusk thit (th-skn’ce, i just asbwaing is hese abrot-as wingare you ase!
i'm jrome me to hills
and i'm tillin' it yor hanbl moore, hick m
Epoch 12/60

----- Generating text after Epoch: 11
----- diversity: 0.2
----- Generating with seed: "fuck, spillin' guts to you
we just met, "
fuck, spillin' guts to you
we just met, beck, then but no, i don't got a lough, wath a got to all
and it it the beck, what i want just did i been you can't been at all
i want hit rucking the corence
you don't give a fuck, with me gudn me it all
and i the never the fight stink
i'm some shim than they call me so wrenk die worldand and the corunkens and the corurders and strot me the real slim shady wand up, please stand up, please stand u
----- diversity: 0.5
----- 

i way that's keen me that's now what i'm dender toress and a night? that bord be the and want and just this wonth
the weal, i with a coudd down to say this with a couse and knew i'd atcord to t
----- diversity: 1.0
----- Generating with seed: "r me? slam! slam!
remember me? nigga, ba"
r me? slam! slam!
remember me? nigga, bag
i cas a lotie my saim i the callin' imm and get that thes abes
me
that you a bit me tras
i am
i'm a cripinn thious hist in the sedis out as liken low go, loted aid would, i got oxt sont?
'cause i way a got voood 'em ge-us outtake, think anstes is you
don't llave my sack
(i here so sic man wo'd i'm leady natin', chart geap ats are just his
what up pleasens, had and off (i didn't so it aggot your 
----- diversity: 1.2
----- Generating with seed: "r me? slam! slam!
remember me? nigga, ba"
r me? slam! slam!
remember me? nigga, bask my dick!
'cause these ghin, you're reatied hirl on you
think, re. lam abpbow no bes
he, i gasi't shit violyes of the rosticed this with a 

i know you got my last two letters
in your hall of the nees, and get was doube somess with the real slim shady, yes i'm the real shady
all you do slim she's culd up to stand
i can hall on they say wher i was out me
'cause i'm shadyn you can his his wink i can hese my gens ain't was 'cause and get a little somess so wrown i can is so ligga no gon't got the ralle this worsed gets age
i wrough like the mother and get of you
go i'm l
----- diversity: 0.5
----- Generating with seed: "ve it?
i know you got my last two letter"
ve it?
i know you got my last two letters
in as of off thin hy, cal, they say what i just said some some nomos if your ham, i don't make in shist about me the callim that i was in he like if i need to a had
from in the singor this is at me
and i never knew i'd get him to suck my dick!
i'm some to said i need to me
the poppin' the to standin' don't wanna go on
but whot, no go go bad
wan, you'll on you
helf your singod fendess of your hou
----- diversity: 1.0
----- Genera

night agaff, i deviing like her probs worked and just die sees, we dear hill her will here yeah?
----- diversity: 1.2
----- Generating with seed: "
with a fifth in me, when i guzzle remy "

with a fifth in me, when i guzzle remy mess?press win’t peect us, piteorc this kids xor wengea notteen
duzwan, you be’s a llfe, i have an the lost to of, here, ect lated
but i'm a crew ronit no, i cam! it so log! i'm fuck off got to lly nill on vobutively persbout it overe
the buggh , bont of the lixtingse vorungior and furs?
no, thero)
if it the back in the fuck amike whot, pifty so ben blay who wantad flim shisy throwbes me so'r hig 
Epoch 23/60

----- Generating text after Epoch: 22
----- diversity: 0.2
----- Generating with seed: "y i am
i'm so sick and tired of being ad"
y i am
i'm so sick and tired of being ad me
on the side to think i'm gotere of up and and stappen you wanna reat shit, i drapse a bund back would i can suck me to did
to been it this pit the back do befrome me
by up and shootin

bitch on, it hee still out the str wing dens
i'm not sompond of the name an and you
weved to be me
eck, i'm shady, yesueseille so son't with the real slim shady dese slim shady desle sees a ligt of me
and you can't really wanna fuck with the me in the sint, yeah, i just lly i just like me slim shady up and vare in litine
they bitch! (way the fuckin' her funstrs and sifever sheepse shouted night back
do 
----- diversity: 1.0
----- Generating with seed: " and your guns and get laughed at
bitch "
 and your guns and get laughed at
bitch in hist of litele framem, my neld of bed
and my mib
i'm thiske stroupin stoppees, didnetither motherfus
lioted who edin' no tok
to be me my dadn't wanna need attab
i just lot this (i'm? whot i be ton't all
and i minds a ducking
don't don't want fun cher it's give a songory kimen me!
what, whilist this remaw, here just tike up for from she forf
i'm  tamin' the strets his cickind when they seeps, si
----- diversity: 1.2
----- Generating with seed: " and your 

i'm shady, i don't suckin' him some just this chit, this finds
hawi
Epoch 30/60

----- Generating text after Epoch: 29
----- diversity: 0.2
----- Generating with seed: "no more
they ain't say i can't rap about"
no more
they ain't say i can't rap about me
of the shit
and it
and i amidan the sinted the sight with a sent a dad
i mes in the that from amikn!
remember me? (it the real stank a content this must like ed bitch is start slim shady, i'm just a resid or aroop ithorr and trook c!
i'm shady, with a grat a loot to sell
ant they cutcestcosty, and want and just die write me
and i was go nagga? who lated from amikn-yourselfife!)ess agool and th
----- diversity: 0.5
----- Generating with seed: "no more
they ain't say i can't rap about"
no more
they ain't say i can't rap about ma

all they sint the hos
i just die snigs roopint to soutt be out of mice
i'm night just die wile with the real slim shady, wrate my saidus
i’m shady, you can't sluepin' that where with the real shady, with a hrank

i ne, of your "i take faged whot you
don't way that i voop of , i'm dod to you
and tvrown stunef
gove somexsesityous
i'm not sigged i ten tees sound ppecause this gets lit, it
gen a cam, you can't realitalive motherfuckensed
fuck you want calf
site bleath bitch stand you to car soppin' to you sten you ten st, feen light and trost to borother
i wroth me give a fuckin' to sell
a mat, years
i'm g
----- diversity: 1.2
----- Generating with seed: "ned to interscope?
my little sister's bi"
ned to interscope?
my little sister's bitch
won't just to out off think
afty doy this to beet you hatt with some
in these shit three trape, soune.pend stater think thes act mer the babing’s act the singin' dress, so cars
as looum in ait doublle
they say this couse stit haroug
dag
no i go i'll a pedin' him to slut his
insmem, chach, wat here just to tike shit, i rest a trost this to?
i nevin the "is betce no say i am
i'm and baby, weeg, 
Epoch 34/60

----- Generating text after Epoch: 33
----- diversity: 0.

'em where i got stop come the bick it, "id it out me to show sta man to did me to slunt get stall
me mith a fight you don't like me slim shady sleess me like i goght to but 'elle man, i aron't back, some on the 
----- diversity: 0.5
----- Generating with seed: "rs hit the ceiling
(i never knew i)
(i n"
rs hit the ceiling
(i never knew i)
(i never knew i) me in the back or you wanna right wantad man, i that the back down you don't like me mite this is figiot some song
i see at all it hes
i just saim i shatthic me how you don't, bitch some hownes dousin' to you cal, me chapss come beffremed? bor me
oh shoo the firsblawe it
you don't like me shittly dall memake it out a bonged wrot you
don't like me shit's you say i the bitch pusty's got
----- diversity: 1.0
----- Generating with seed: "rs hit the ceiling
(i never knew i)
(i n"
rs hit the ceiling
(i never knew i)
(i never knew i) me somether aight, i'm lifed on she quitenis
if you don't like hiim some some i wont
you'll get at a dound to 

dn't, then phil saw it all, then at a a brothermole to tonge stop like ee mise all fife it avel oftensered will or hew
cut 'twing so bedy?
shey do th-t’s saymell, i he aid with here!
so a sad
and punker, ol with a hrang-whe hered who babce! wven you thing when you ain't paing, bitch, i don't don't do it! her have and from amityville (him?
and up
dong how keen callam!
so son' this say w-o at
a hogrungs smapien 'cause i'm snifixta havy ou
Epoch 41/60

----- Generating text after Epoch: 40
----- diversity: 0.2
----- Generating with seed: "ers
brigade barricade to bring the noise"
ers
brigade barricade to bring the noisever i was high when i wrote this, so say what i'm gead whe recold stresp my dick if you don't wanna dannibut in lyou, but i say is right now, but i say this is when about me
(i'm like a fucking looge
these shit of you
seen you a trown baby, wher i get no man the fuck dead! had i am
i'm traping as me
jush had dousl
and i can till oo mance
these didn't way don't was be go sma

and i a night on these loving me herse hope
they callath me slim shady is me the trunkin' her get stap so my terne, and and stirt
this some shotta get, me this shit od me (ass?
'cause i a night don't make beat me to senty's
----- diversity: 1.0
----- Generating with seed: "eard of a mind as perverted as mine
you "
eard of a mind as perverted as mine
you cann't the stuck with me
give a sound up me now is me comesin' the told slim shadys up?
i'm thating if i just, years
i'm a suck ag, kid?
and i don't wanna to dong
bitch who boy somerem and get to but i just dies, ised too but i that i bonit do is rapped hoor th)
beforeste
exphe, in 'emoush with this pritinin's) here hapets
play me knea romed? i'm not mr and get bitch is befitall from me!
'cause i 
----- diversity: 1.2
----- Generating with seed: "eard of a mind as perverted as mine
you "
eard of a mind as perverted as mine
you canff, shit will yemors)
me, shit's come the caim and pribla smapin' or these motherfucker give a fuck, but no,

but not for him to take me that y'all
and i can't ee to sell, i make this beish it
you can stankin'
remambsboxredrect offoret they suepin' the back down ther and back off
yeat i was a deat this hit
you're holl out a homan' a of the mind
they but you shouth ohe but noy suck we doesso?
so a to the reas slim shady
i just die hell up in the say i tas of off offp"
than they knew i just do hat will be macher get him to slees ki
----- diversity: 0.5
----- Generating with seed: " that's fine!"
but not for him to take m"
 that's fine!"
but not for him to take me that y'all
i son't your bitch in the man, the firss at you
still your manins
and if you don't like me gits outrick of you
so wrowns living me (fuckin' him to slands me to say what i was swa that your no, so your cring, i don't doend to buy me to sentin' up on your on
olf, ressels knew i, knew i'm a tousi and really got a hold of me
the furu! if i let me guss, bitch
watchis? gon't back, they slaw
----- diversity: 1.0
----- Generating wit

i hatn the say you a winnat fougs afen twookin'
you held you thit the rics reake and make me nimga resome on troing t
----- diversity: 1.2
----- Generating with seed: "r?) no, bitch, i'm retired
fuckin' your "
r?) no, bitch, i'm retired
fuckin' your some
the strugh, out a how drave ass
dink rlam 
frull of me
that's this
stink you min't ner at a coad an the prostin' up (you are hight, fin clim in as whole will out, bitch it ame
don't get bitch trask
they babs, like you feed stuch it! gun a fucker and everyore? didn't not wooled a feggetive
so winds pir you dodn't been criccin't a fun the trake exthers aid at a noom
i every reteng in every pime
Epoch 52/60

----- Generating text after Epoch: 51
----- diversity: 0.2
----- Generating with seed: "st want you all to notice me, and people"
st want you all to notice me, and people on a to to shot me
backer
rom whot, stirm to me befough these donce
i dustathers act off the fuckin' he won't? goddammit he will (he's)
mentally ill from amityville


i hope you've helffffft the mowhat i was herencall ther hug
'til where and keep assech, i want bedor wantses hiss
please said i don't got that be gots acte
and if you bette fuck aigh i was swa chit's shotta neroused
no sen this wo that beenible
you get so trugh of yours
one)
me to see me to tell leal
i'm a criminal!
'cause that's fuck with me godda gun shoulda cal, they saule come trod on, me
and and off
i'm so
----- diversity: 1.0
----- Generating with seed: "decided to show his face!
i hope you've "
decided to show his face!
i hope you've holloy i men a hom?
and i wanna go on
wook is and brigged arew who shit that be goter than befocostes
(t'el manin', fay i the to that i just shot, you lot, bitchese stanin' i'm a crimin'lousing to be me
and uf one to bef
if he's on somerick to sen my pippen it get you argody
and every do fol my shies
(i'm gon’t song and guy fuck we do's why?
and sleave run you don't really wanni’s and fell figh th
----- diversity: 1.2
----- Generating with seed: "de

i go tome shot shopin' downs down these didy sleepsback
(i'm gon, this wonts tent to mibe fun leaver, yeah, you 
Epoch 59/60

----- Generating text after Epoch: 58
----- diversity: 0.2
----- Generating with seed: "
let me outta this place, i'm outta plac"

let me outta this place, i'm outta plack in the nater shit? just like me was deen
and i wo pusbout with hear stapin' the ball that who beck)
you head i am whan i been climin' he won't just like me who mackin' the pait this wented a den't got finds
and i a swing with hear saim shady, ye, deg't matelin' the ball that's man i seen you do this bitch
i never knew i, knew i'd actumn
and they do who back doy
i don't wanna go on
a little fuck 
----- diversity: 0.5
----- Generating with seed: "
let me outta this place, i'm outta plac"

let me outta this place, i'm outta plack in the lape, that shere
and i as if i cacks, dinne the mot the back in yess, let it out!
and i was i say in amindem wanna gon a got mad, get on the mot yeare
i'm a seapi