# TV Script Generation
In this project, you'll generate your own [Simpsons](https://en.wikipedia.org/wiki/The_Simpsons) TV scripts using RNNs.  You'll be using part of the [Simpsons dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) of scripts from 27 seasons.  The Neural Network you'll build will generate a new TV script for a scene at [Moe's Tavern](https://simpsonswiki.com/wiki/Moe's_Tavern).
## Get the Data
The data is already provided for you.  You'll be using a subset of the original dataset.  It consists of only the scenes in Moe's Tavern.  This doesn't include other versions of the tavern, like "Moe's Cavern", "Flaming Moe's", "Uncle Moe's Family Feed-Bag", etc.

In [1]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper

data_dir = './data/simpsons/moes_tavern_lines.txt'
text = helper.load_data(data_dir)
# Ignore notice, since we don't use it for analysing the data
text = text[81:]

## Explore the Data
Play around with `view_sentence_range` to view different parts of the data.

In [2]:
view_sentence_range = (0, 10)

"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import numpy as np

print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))
scenes = text.split('\n\n')
print('Number of scenes: {}'.format(len(scenes)))
sentence_count_scene = [scene.count('\n') for scene in scenes]
print('Average number of sentences in each scene: {}'.format(np.average(sentence_count_scene)))

sentences = [sentence for scene in scenes for sentence in scene.split('\n')]
print('Number of lines: {}'.format(len(sentences)))
word_count_sentence = [len(sentence.split()) for sentence in sentences]
print('Average number of words in each line: {}'.format(np.average(word_count_sentence)))

print()
print('The sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

Dataset Stats
Roughly the number of unique words: 11492
Number of scenes: 262
Average number of sentences in each scene: 15.248091603053435
Number of lines: 4257
Average number of words in each line: 11.50434578341555

The sentences 0 to 10:
Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.




## Implement Preprocessing Functions
The first step with any dataset is preprocessing.  Implement the following preprocessing functions:
- Lookup Table
- Tokenize Punctuation

### Lookup Table
To create a word embedding, you first need to transform the words to ids.  In this function, create two dictionaries:
- Dictionary to go from the words to an id, we'll call `vocab_to_int`
- Dictionary to go from the id to word, we'll call `int_to_vocab`

Return these dictionaries in the following tuple `(vocab_to_int, int_to_vocab)`
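
As a quick illustration of the two dictionaries, here is a minimal sketch on a toy word list. The function name `toy_lookup_tables` and the sample words are made up for this example. Sorting the vocabulary is one possible choice that keeps the ids reproducible between runs; it is not a requirement of the project.

```python
def toy_lookup_tables(words):
    """Build word->id and id->word dictionaries from a list of words."""
    # Sorting makes the id assignment deterministic across runs
    vocab = sorted(set(words))
    vocab_to_int = {word: idx for idx, word in enumerate(vocab)}
    int_to_vocab = {idx: word for word, idx in vocab_to_int.items()}
    return vocab_to_int, int_to_vocab


words = ['moe', 'homer', 'moe', 'barney']
vocab_to_int, int_to_vocab = toy_lookup_tables(words)

# Encoding and then decoding recovers the original words
ids = [vocab_to_int[w] for w in words]
decoded = [int_to_vocab[i] for i in ids]
print(ids)
print(decoded)
```

The two dictionaries are inverses of each other, so a word list survives the round trip through ids unchanged.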

In [3]:
import numpy as np
import problem_unittests as tests

def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    # TODO: Implement Function
    vocab = set(text)
    vocab_to_int = {c: i for i, c in enumerate(vocab)}
    int_to_vocab = dict(enumerate(vocab))
    
    return vocab_to_int, int_to_vocab


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed


### Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters.  However, punctuation such as periods and exclamation marks makes it hard for the neural network to distinguish between the words "bye" and "bye!".

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( . )
- Comma ( , )
- Quotation Mark ( " )
- Semicolon ( ; )
- Exclamation mark ( ! )
- Question mark ( ? )
- Left Parenthesis ( ( )
- Right Parenthesis ( ) )
- Dash ( -- )
- Return ( \n )

This dictionary will be used to tokenize the symbols and add the delimiter (space) around them.  This separates each symbol into its own word, making it easier for the neural network to predict the next word. Make sure you don't use a token that could be confused with a word. Instead of using the token "dash", try using something like "||dash||".
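
To see how such a dictionary is used, the sketch below pads each symbol's token with spaces and then splits on whitespace. The helper `apply_tokens` and the two-entry dictionary are hypothetical stand-ins; the project's `helper` module performs its own version of this step, which may differ in detail.

```python
def apply_tokens(text, token_dict):
    """Replace each punctuation symbol with its space-padded token."""
    for symbol, token in token_dict.items():
        text = text.replace(symbol, ' {} '.format(token))
    # Lowercase and split on whitespace so every token becomes its own word
    return text.lower().split()


# Hypothetical subset of a token dictionary
toy_tokens = {'!': '||exclamation_mark||', ',': '||comma||'}

words = apply_tokens('Bye, Moe! Bye!', toy_tokens)
print(words)
```

After this step "bye" and "bye!" share the same word id, and the exclamation mark is predicted as a separate word.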

In [4]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    # TODO: Implement Function
    tokenize_dictionary = {
        '.': '|period|',
        ',': '|comma|',
        '"': '|quotation_mark|',
        ';': '|semicolon_mark|',
        '!': '|exclamation|',
        '?': '|question_mark|',
        '(': '|left_parenthesis|',
        ')': '|right_parenthesis|',
        '--': '|dash|',
        '\n': '|return|'
    }
    
    return tokenize_dictionary

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_tokenize(token_lookup)

Tests Passed


## Preprocess all the data and save it
Running the code cell below will preprocess all the data and save it to file.

In [5]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

# Checkpoint
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.

In [6]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import helper
import numpy as np
import problem_unittests as tests

int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## Build the Neural Network
You'll build the components of an RNN by implementing the following functions:
- get_inputs
- get_init_cell
- get_embed
- build_rnn
- build_nn
- get_batches

### Check the Version of TensorFlow and Access to GPU

In [7]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from distutils.version import LooseVersion
import warnings
import tensorflow as tf

# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.0'), 'Please use TensorFlow version 1.0 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))

# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

TensorFlow Version: 1.1.0




### Input
Implement the `get_inputs()` function to create TF Placeholders for the Neural Network.  It should create the following placeholders:
- Input text placeholder named "input" using the [TF Placeholder](https://www.tensorflow.org/api_docs/python/tf/placeholder) `name` parameter.
- Targets placeholder
- Learning Rate placeholder

Return the placeholders in the following tuple `(Input, Targets, LearningRate)`

In [8]:
def get_inputs():
    """
    Create TF Placeholders for input, targets, and learning rate.
    :return: Tuple (input, targets, learning rate)
    """
    # TODO: Implement Function
    input = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    learningRate = tf.placeholder(tf.float32, name='learningRate')
    
    return input, targets, learningRate


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_inputs(get_inputs)

Tests Passed


### Build RNN Cell and Initialize
Stack one or more [`BasicLSTMCells`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/BasicLSTMCell) in a [`MultiRNNCell`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell).
- The RNN size should be set using `rnn_size`
- Initialize the cell state using the MultiRNNCell's [`zero_state()`](https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/MultiRNNCell#zero_state) function
    - Apply the name "initial_state" to the initial state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the cell and initial state in the following tuple `(Cell, InitialState)`

In [9]:
def get_init_cell(batch_size, rnn_size):
    """
    Create an RNN Cell and initialize it.
    :param batch_size: Size of batches
    :param rnn_size: Size of RNNs
    :return: Tuple (cell, initialize state)
    """
    # TODO: Implement Function
    def build_cell(rnn_size):
        lstm = tf.contrib.rnn.BasicLSTMCell(rnn_size)
        return lstm
    
    num_layers = 2
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(rnn_size) for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size, tf.float32)
    initial_state = tf.identity(initial_state, name='initial_state')
    return (cell, initial_state)

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_init_cell(get_init_cell)

Tests Passed


### Word Embedding
Apply embedding to `input_data` using TensorFlow.  Return the embedded sequence.
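
Conceptually, an embedding lookup is row indexing into a matrix that holds one row per word. The NumPy sketch below illustrates the shapes involved. It is not the TensorFlow implementation, and the sizes are arbitrary values chosen for the example.

```python
import numpy as np

vocab_size, embed_dim = 10, 4  # arbitrary example sizes
rng = np.random.default_rng(0)

# One embed_dim-sized vector per word id, initialised uniformly in [-1, 1)
embedding = rng.uniform(-1, 1, size=(vocab_size, embed_dim))

# A batch of 2 sequences, each 3 word ids long
input_ids = np.array([[1, 5, 2],
                      [7, 7, 0]])

# Integer-array indexing picks the matching row for every id
embedded = embedding[input_ids]
print(embedded.shape)  # (2, 3, 4)
```

The output gains one trailing axis of size `embed_dim`, and repeated ids (the two 7s here) map to identical vectors.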

In [10]:
def get_embed(input_data, vocab_size, embed_dim):
    """
    Create embedding for <input_data>.
    :param input_data: TF placeholder for text input.
    :param vocab_size: Number of words in vocabulary.
    :param embed_dim: Number of embedding dimensions
    :return: Embedded input.
    """
    # TODO: Implement Function
    embedding = tf.Variable(tf.random_uniform((vocab_size, embed_dim), -1, 1))
    embed = tf.nn.embedding_lookup(embedding, input_data)
    
    return embed


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_embed(get_embed)

Tests Passed


### Build RNN
You created an RNN cell in the `get_init_cell()` function.  Time to use the cell to create an RNN.
- Build the RNN using [`tf.nn.dynamic_rnn()`](https://www.tensorflow.org/api_docs/python/tf/nn/dynamic_rnn)
 - Apply the name "final_state" to the final state using [`tf.identity()`](https://www.tensorflow.org/api_docs/python/tf/identity)

Return the outputs and final state in the following tuple `(Outputs, FinalState)`
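
To make the two return values concrete, here is a NumPy sketch that unrolls a plain tanh RNN over a batch of sequences. This is a simplification for illustration only: it swaps in a vanilla RNN for the LSTM cells used in this project, and the function `unroll_rnn` and its weight names are hypothetical.

```python
import numpy as np


def unroll_rnn(inputs, w_x, w_h, b, state):
    """Run a vanilla tanh RNN over inputs of shape (batch, time, features)."""
    outputs = []
    for t in range(inputs.shape[1]):
        # New state depends on the current input and the previous state
        state = np.tanh(inputs[:, t, :] @ w_x + state @ w_h + b)
        outputs.append(state)
    # Stack the per-step outputs along the time axis
    return np.stack(outputs, axis=1), state


rng = np.random.default_rng(0)
batch, time, features, hidden = 2, 5, 4, 3  # arbitrary example sizes
inputs = rng.normal(size=(batch, time, features))
w_x = rng.normal(size=(features, hidden))
w_h = rng.normal(size=(hidden, hidden))
b = np.zeros(hidden)

outputs, final_state = unroll_rnn(inputs, w_x, w_h, b, np.zeros((batch, hidden)))
print(outputs.shape)      # (2, 5, 3)
print(final_state.shape)  # (2, 3)
```

In this vanilla RNN the final state equals the output at the last time step. An LSTM's final state additionally carries a cell state for each layer.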

In [11]:
def build_rnn(cell, inputs):
    """
    Create a RNN using a RNN Cell
    :param cell: RNN Cell
    :param inputs: Input text data
    :return: Tuple (Outputs, Final State)
    """
    # TODO: Implement Function
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
    final_state = tf.identity(final_state, name='final_state')
    return outputs, final_state


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_rnn(build_rnn)

Tests Passed


### Build the Neural Network
Apply the functions you implemented above to:
- Apply embedding to `input_data` using your `get_embed(input_data, vocab_size, embed_dim)` function.
- Build RNN using `cell` and your `build_rnn(cell, inputs)` function.
- Apply a fully connected layer with a linear activation and `vocab_size` as the number of outputs.

Return the logits and final state in the following tuple `(Logits, FinalState)`
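
The last step, a fully connected layer with a linear activation, amounts to a matrix product plus a bias applied at every time step. The NumPy sketch below shows the resulting shapes and how a softmax turns logits into word probabilities. The sizes and variable names are arbitrary choices for the example, not values from this project.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time, rnn_size, vocab_size = 2, 3, 4, 6  # arbitrary example sizes

rnn_outputs = rng.normal(size=(batch, time, rnn_size))
weights = rng.normal(size=(rnn_size, vocab_size))
bias = np.zeros(vocab_size)

# Linear activation: no non-linearity is applied after the matrix product
logits = rnn_outputs @ weights + bias

# Softmax over the vocabulary axis turns logits into probabilities
shifted = logits - logits.max(axis=-1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
print(logits.shape)  # (2, 3, 6)
```

Each position in each sequence ends up with one score per vocabulary word, which is why the layer needs `vocab_size` outputs.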

In [12]:
def build_nn(cell, rnn_size, input_data, vocab_size, embed_dim):
    """
    Build part of the neural network
    :param cell: RNN cell
    :param rnn_size: Size of rnns
    :param input_data: Input data
    :param vocab_size: Vocabulary size
    :param embed_dim: Number of embedding dimensions
    :return: Tuple (Logits, FinalState)
    """
    # TODO: Implement Function
    embed = get_embed(input_data, vocab_size, embed_dim)
    outputs, final_state = build_rnn(cell, embed)
    # Linear output layer; weights use Xavier (Glorot) normal initialization
    logits = tf.contrib.layers.fully_connected(
        outputs,
        vocab_size,
        activation_fn=None,
        weights_initializer=tf.contrib.layers.xavier_initializer(uniform=False))
    return (logits, final_state)


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_build_nn(build_nn)

Tests Passed


### Batches
Implement `get_batches` to create batches of input and targets using `int_text`.  The batches should be a NumPy array with the shape `(number of batches, 2, batch size, sequence length)`. Each batch contains two elements:
- The first element is a single batch of **input** with the shape `[batch size, sequence length]`
- The second element is a single batch of **targets** with the shape `[batch size, sequence length]`

If you can't fill the last batch with enough data, drop the last batch.

For example, `get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 3, 2)` would return a NumPy array of the following:
```
[
  # First Batch
  [
    # Batch of Input
    [[ 1  2], [ 7  8], [13 14]]
    # Batch of targets
    [[ 2  3], [ 8  9], [14 15]]
  ]

  # Second Batch
  [
    # Batch of Input
    [[ 3  4], [ 9 10], [15 16]]
    # Batch of targets
    [[ 4  5], [10 11], [16 17]]
  ]

  # Third Batch
  [
    # Batch of Input
    [[ 5  6], [11 12], [17 18]]
    # Batch of targets
    [[ 6  7], [12 13], [18  1]]
  ]
]
```

Notice that the last target value in the last batch is the first input value of the first batch. In this case, `1`. This is a common technique used when creating sequence batches, although it is rather unintuitive.
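
The layout above can be reproduced with a reshape followed by an axis swap. The sketch below is one possible approach under that assumption, not the required solution, and `sketch_batches` is a hypothetical name.

```python
import numpy as np


def sketch_batches(int_text, batch_size, seq_length):
    """Illustrative batching that reproduces the layout described above."""
    words_per_batch = batch_size * seq_length
    num_batches = len(int_text) // words_per_batch
    # Drop the words that do not fill a complete batch
    inputs = np.array(int_text[:num_batches * words_per_batch])
    # Shift by one word; the last target wraps around to the first input
    targets = np.roll(inputs, -1)
    # Reshape to (batch_size, num_batches, seq_length), then put batches first
    x = inputs.reshape(batch_size, num_batches, seq_length).swapaxes(0, 1)
    y = targets.reshape(batch_size, num_batches, seq_length).swapaxes(0, 1)
    return np.stack([x, y], axis=1)


batches = sketch_batches(list(range(1, 21)), 3, 2)
print(batches.shape)  # (3, 2, 3, 2)
print(batches[0, 0])  # first batch of input
print(batches[2, 1])  # last batch of targets
```

Because each row of a batch continues where the same row of the previous batch stopped, the RNN state carried from one batch to the next lines up with the text.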

In [54]:

def get_batches(int_text, batch_size, seq_length):
    """
    Return batches of input and target
    :param int_text: Text with the words replaced by their ids
    :param batch_size: The size of batch
    :param seq_length: The length of sequence
    :return: Batches as a Numpy array
    """
    int_text = np.array(int_text)
    num_batches = len(int_text) // (batch_size * seq_length)
    num_words = num_batches * batch_size * seq_length

    # Keep only the words that fill complete batches
    inputs = int_text[:num_words]
    # Targets are the inputs shifted one word ahead; the last target
    # wraps around to the first input word
    targets = np.roll(inputs, -1)
    print(targets)  # debug print; produces the array shown in the cell output

    # One row per batch element, each row a contiguous stretch of text
    x = inputs.reshape(batch_size, num_batches * seq_length)
    y = targets.reshape(batch_size, num_batches * seq_length)

    batches = np.zeros([num_batches, 2, batch_size, seq_length], np.int32)
    for i in range(num_batches):
        batches[i][0] = x[:, i * seq_length:(i + 1) * seq_length]
        batches[i][1] = y[:, i * seq_length:(i + 1) * seq_length]

    return batches

#x = get_batches([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 3, 2)
#print(x)

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_batches(get_batches)

[   1    2    3 ..., 4478 4479    0]
Tests Passed


## Neural Network Training
### Hyperparameters
Tune the following parameters:

- Set `num_epochs` to the number of epochs.
- Set `batch_size` to the batch size.
- Set `rnn_size` to the size of the RNNs.
- Set `embed_dim` to the size of the embedding.
- Set `seq_length` to the length of sequence.
- Set `learning_rate` to the learning rate.
- Set `show_every_n_batches` to the number of batches between progress printouts.
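
`batch_size` and `seq_length` together determine how many batches make up one epoch, since only full batches are kept. The sketch below shows the arithmetic. The function `batches_per_epoch` is hypothetical, and `num_words` is a made-up corpus size used only for illustration.

```python
def batches_per_epoch(num_words, batch_size, seq_length):
    """Number of full batches available; leftover words are dropped."""
    return num_words // (batch_size * seq_length)


# num_words is a made-up corpus size, not the size of the project's dataset
num_words = 69000
print(batches_per_epoch(num_words, batch_size=256, seq_length=11))  # 24
print(batches_per_epoch(num_words, batch_size=128, seq_length=11))  # 49
```

Halving the batch size roughly doubles the number of weight updates per epoch, which is worth keeping in mind when choosing `num_epochs`.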

In [89]:
# Number of Epochs
num_epochs = 512
# Batch Size
batch_size = 256
# RNN Size
rnn_size = 256 #128
# Embedding Dimension Size
embed_dim = 100
# Sequence Length
seq_length = 11 #100
# Learning Rate
learning_rate = 0.01
# Show stats for every n number of batches
show_every_n_batches = 5

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
save_dir = './save'

### Build the Graph
Build the graph using the neural network you implemented.

In [90]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
from tensorflow.contrib import seq2seq

train_graph = tf.Graph()
with train_graph.as_default():
    vocab_size = len(int_to_vocab)
    input_text, targets, lr = get_inputs()
    input_data_shape = tf.shape(input_text)
    cell, initial_state = get_init_cell(input_data_shape[0], rnn_size)
    logits, final_state = build_nn(cell, rnn_size, input_text, vocab_size, embed_dim)

    # Probabilities for generating words
    probs = tf.nn.softmax(logits, name='probs')

    # Loss function
    cost = seq2seq.sequence_loss(
        logits,
        targets,
        tf.ones([input_data_shape[0], input_data_shape[1]]))

    # Optimizer
    optimizer = tf.train.AdamOptimizer(lr)

    # Gradient Clipping
    gradients = optimizer.compute_gradients(cost)
    capped_gradients = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gradients if grad is not None]
    train_op = optimizer.apply_gradients(capped_gradients)
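
The gradient clipping step in the cell above limits every gradient component to the range [-1, 1] before the optimizer applies it. The NumPy sketch below mirrors that element-wise behaviour using `np.clip`; the gradient values are made up for the example.

```python
import numpy as np

# Hypothetical gradient values for one variable
grad = np.array([-3.2, -0.4, 0.0, 0.7, 12.5])

# Element-wise clipping to [-1, 1], analogous to tf.clip_by_value(grad, -1., 1.)
capped = np.clip(grad, -1.0, 1.0)
print(capped)
```

Values already inside the range pass through unchanged, while large components are capped, which limits the size of any single update.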

## Train
Train the neural network on the preprocessed data.  If you have a hard time getting a good loss, check the [forums](https://discussions.udacity.com/) to see if anyone is having the same problem.

In [91]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
batches = get_batches(int_text, batch_size, seq_length)

with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())

    for epoch_i in range(num_epochs):
        state = sess.run(initial_state, {input_text: batches[0][0]})

        for batch_i, (x, y) in enumerate(batches):
            feed = {
                input_text: x,
                targets: y,
                initial_state: state,
                lr: learning_rate}
            train_loss, state, _ = sess.run([cost, final_state, train_op], feed)

            # Show every <show_every_n_batches> batches
            if (epoch_i * len(batches) + batch_i) % show_every_n_batches == 0:
                print('Epoch {:>3} Batch {:>4}/{}   train_loss = {:.3f}'.format(
                    epoch_i,
                    batch_i,
                    len(batches),
                    train_loss))

    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, save_dir)
    print('Model Trained and Saved')

[2076 4892  790 ..., 4194  677 5727]
Epoch   0 Batch    0/24   train_loss = 8.822
Epoch   0 Batch    5/24   train_loss = 6.782
Epoch   0 Batch   10/24   train_loss = 6.775
Epoch   0 Batch   15/24   train_loss = 6.876
Epoch   0 Batch   20/24   train_loss = 6.722
Epoch   1 Batch    1/24   train_loss = 6.571
Epoch   1 Batch    6/24   train_loss = 6.139
Epoch   1 Batch   11/24   train_loss = 6.035
Epoch   1 Batch   16/24   train_loss = 6.067
Epoch   1 Batch   21/24   train_loss = 6.112
Epoch   2 Batch    2/24   train_loss = 5.969
Epoch   2 Batch    7/24   train_loss = 5.999
Epoch   2 Batch   12/24   train_loss = 5.928
Epoch   2 Batch   17/24   train_loss = 5.721
Epoch   2 Batch   22/24   train_loss = 5.760
Epoch   3 Batch    3/24   train_loss = 5.578
Epoch   3 Batch    8/24   train_loss = 5.674
Epoch   3 Batch   13/24   train_loss = 5.447
Epoch   3 Batch   18/24   train_loss = 5.391
Epoch   3 Batch   23/24   train_loss = 5.252
Epoch   4 Batch    4/24   train_loss = 5.059
Epoch   4 Batch   

Epoch  37 Batch   22/24   train_loss = 1.006
Epoch  38 Batch    3/24   train_loss = 1.004
Epoch  38 Batch    8/24   train_loss = 0.952
Epoch  38 Batch   13/24   train_loss = 1.006
Epoch  38 Batch   18/24   train_loss = 0.948
Epoch  38 Batch   23/24   train_loss = 0.926
Epoch  39 Batch    4/24   train_loss = 0.977
Epoch  39 Batch    9/24   train_loss = 0.937
Epoch  39 Batch   14/24   train_loss = 0.905
Epoch  39 Batch   19/24   train_loss = 0.927
Epoch  40 Batch    0/24   train_loss = 0.992
Epoch  40 Batch    5/24   train_loss = 0.904
Epoch  40 Batch   10/24   train_loss = 0.887
Epoch  40 Batch   15/24   train_loss = 0.886
Epoch  40 Batch   20/24   train_loss = 0.918
Epoch  41 Batch    1/24   train_loss = 0.891
Epoch  41 Batch    6/24   train_loss = 0.826
Epoch  41 Batch   11/24   train_loss = 0.795
Epoch  41 Batch   16/24   train_loss = 0.856
Epoch  41 Batch   21/24   train_loss = 0.805
Epoch  42 Batch    2/24   train_loss = 0.891
Epoch  42 Batch    7/24   train_loss = 0.813
Epoch  42 

Epoch  76 Batch    1/24   train_loss = 0.319
Epoch  76 Batch    6/24   train_loss = 0.318
Epoch  76 Batch   11/24   train_loss = 0.287
Epoch  76 Batch   16/24   train_loss = 0.313
Epoch  76 Batch   21/24   train_loss = 0.313
Epoch  77 Batch    2/24   train_loss = 0.338
Epoch  77 Batch    7/24   train_loss = 0.316
Epoch  77 Batch   12/24   train_loss = 0.327
Epoch  77 Batch   17/24   train_loss = 0.326
Epoch  77 Batch   22/24   train_loss = 0.336
Epoch  78 Batch    3/24   train_loss = 0.332
Epoch  78 Batch    8/24   train_loss = 0.316
Epoch  78 Batch   13/24   train_loss = 0.331
Epoch  78 Batch   18/24   train_loss = 0.320
Epoch  78 Batch   23/24   train_loss = 0.298
Epoch  79 Batch    4/24   train_loss = 0.329
Epoch  79 Batch    9/24   train_loss = 0.316
Epoch  79 Batch   14/24   train_loss = 0.305
Epoch  79 Batch   19/24   train_loss = 0.334
Epoch  80 Batch    0/24   train_loss = 0.329
Epoch  80 Batch    5/24   train_loss = 0.330
Epoch  80 Batch   10/24   train_loss = 0.304
Epoch  80 

Epoch 114 Batch    4/24   train_loss = 0.321
Epoch 114 Batch    9/24   train_loss = 0.310
Epoch 114 Batch   14/24   train_loss = 0.297
Epoch 114 Batch   19/24   train_loss = 0.327
Epoch 115 Batch    0/24   train_loss = 0.322
Epoch 115 Batch    5/24   train_loss = 0.325
Epoch 115 Batch   10/24   train_loss = 0.297
Epoch 115 Batch   15/24   train_loss = 0.314
Epoch 115 Batch   20/24   train_loss = 0.314
Epoch 116 Batch    1/24   train_loss = 0.310
Epoch 116 Batch    6/24   train_loss = 0.310
Epoch 116 Batch   11/24   train_loss = 0.278
Epoch 116 Batch   16/24   train_loss = 0.303
Epoch 116 Batch   21/24   train_loss = 0.303
Epoch 117 Batch    2/24   train_loss = 0.327
Epoch 117 Batch    7/24   train_loss = 0.306
Epoch 117 Batch   12/24   train_loss = 0.318
Epoch 117 Batch   17/24   train_loss = 0.318
Epoch 117 Batch   22/24   train_loss = 0.327
Epoch 118 Batch    3/24   train_loss = 0.323
Epoch 118 Batch    8/24   train_loss = 0.308
Epoch 118 Batch   13/24   train_loss = 0.324
Epoch 118 

Epoch 152 Batch    7/24   train_loss = 0.305
Epoch 152 Batch   12/24   train_loss = 0.317
Epoch 152 Batch   17/24   train_loss = 0.319
Epoch 152 Batch   22/24   train_loss = 0.331
Epoch 153 Batch    3/24   train_loss = 0.328
Epoch 153 Batch    8/24   train_loss = 0.320
Epoch 153 Batch   13/24   train_loss = 0.357
Epoch 153 Batch   18/24   train_loss = 0.450
Epoch 153 Batch   23/24   train_loss = 0.735
Epoch 154 Batch    4/24   train_loss = 1.355
Epoch 154 Batch    9/24   train_loss = 2.074
Epoch 154 Batch   14/24   train_loss = 2.580
Epoch 154 Batch   19/24   train_loss = 3.051
Epoch 155 Batch    0/24   train_loss = 3.098
Epoch 155 Batch    5/24   train_loss = 3.127
Epoch 155 Batch   10/24   train_loss = 3.264
Epoch 155 Batch   15/24   train_loss = 3.096
Epoch 155 Batch   20/24   train_loss = 2.981
Epoch 156 Batch    1/24   train_loss = 2.731
Epoch 156 Batch    6/24   train_loss = 2.687
Epoch 156 Batch   11/24   train_loss = 2.536
Epoch 156 Batch   16/24   train_loss = 2.431
Epoch 156 

Epoch 190 Batch   10/24   train_loss = 0.305
Epoch 190 Batch   15/24   train_loss = 0.321
Epoch 190 Batch   20/24   train_loss = 0.323
Epoch 191 Batch    1/24   train_loss = 0.317
Epoch 191 Batch    6/24   train_loss = 0.316
Epoch 191 Batch   11/24   train_loss = 0.285
Epoch 191 Batch   16/24   train_loss = 0.310
Epoch 191 Batch   21/24   train_loss = 0.311
Epoch 192 Batch    2/24   train_loss = 0.334
Epoch 192 Batch    7/24   train_loss = 0.311
Epoch 192 Batch   12/24   train_loss = 0.325
Epoch 192 Batch   17/24   train_loss = 0.325
Epoch 192 Batch   22/24   train_loss = 0.336
Epoch 193 Batch    3/24   train_loss = 0.330
Epoch 193 Batch    8/24   train_loss = 0.314
Epoch 193 Batch   13/24   train_loss = 0.330
Epoch 193 Batch   18/24   train_loss = 0.318
Epoch 193 Batch   23/24   train_loss = 0.299
Epoch 194 Batch    4/24   train_loss = 0.327
Epoch 194 Batch    9/24   train_loss = 0.315
Epoch 194 Batch   14/24   train_loss = 0.303
Epoch 194 Batch   19/24   train_loss = 0.332
Epoch 195 

Epoch 228 Batch   13/24   train_loss = 0.323
Epoch 228 Batch   18/24   train_loss = 0.311
Epoch 228 Batch   23/24   train_loss = 0.291
Epoch 229 Batch    4/24   train_loss = 0.320
Epoch 229 Batch    9/24   train_loss = 0.308
Epoch 229 Batch   14/24   train_loss = 0.296
Epoch 229 Batch   19/24   train_loss = 0.327
Epoch 230 Batch    0/24   train_loss = 0.321
Epoch 230 Batch    5/24   train_loss = 0.323
Epoch 230 Batch   10/24   train_loss = 0.296
Epoch 230 Batch   15/24   train_loss = 0.313
Epoch 230 Batch   20/24   train_loss = 0.314
Epoch 231 Batch    1/24   train_loss = 0.309
Epoch 231 Batch    6/24   train_loss = 0.308
Epoch 231 Batch   11/24   train_loss = 0.277
Epoch 231 Batch   16/24   train_loss = 0.301
Epoch 231 Batch   21/24   train_loss = 0.303
Epoch 232 Batch    2/24   train_loss = 0.325
Epoch 232 Batch    7/24   train_loss = 0.304
Epoch 232 Batch   12/24   train_loss = 0.317
Epoch 232 Batch   17/24   train_loss = 0.317
Epoch 232 Batch   22/24   train_loss = 0.327
Epoch 233 

Epoch 266 Batch   16/24   train_loss = 0.299
Epoch 266 Batch   21/24   train_loss = 0.301
Epoch 267 Batch    2/24   train_loss = 0.323
Epoch 267 Batch    7/24   train_loss = 0.303
Epoch 267 Batch   12/24   train_loss = 0.315
Epoch 267 Batch   17/24   train_loss = 0.316
Epoch 267 Batch   22/24   train_loss = 0.325
Epoch 268 Batch    3/24   train_loss = 0.319
Epoch 268 Batch    8/24   train_loss = 0.305
Epoch 268 Batch   13/24   train_loss = 0.321
Epoch 268 Batch   18/24   train_loss = 0.309
Epoch 268 Batch   23/24   train_loss = 0.289
Epoch 269 Batch    4/24   train_loss = 0.318
Epoch 269 Batch    9/24   train_loss = 0.305
Epoch 269 Batch   14/24   train_loss = 0.294
Epoch 269 Batch   19/24   train_loss = 0.324
Epoch 270 Batch    0/24   train_loss = 0.319
Epoch 270 Batch    5/24   train_loss = 0.321
Epoch 270 Batch   10/24   train_loss = 0.294
Epoch 270 Batch   15/24   train_loss = 0.312
Epoch 270 Batch   20/24   train_loss = 0.312
Epoch 271 Batch    1/24   train_loss = 0.307
Epoch 271 

Epoch 304 Batch   19/24   train_loss = 0.336
Epoch 305 Batch    0/24   train_loss = 0.351
Epoch 305 Batch    5/24   train_loss = 0.403
Epoch 305 Batch   10/24   train_loss = 0.599
Epoch 305 Batch   15/24   train_loss = 1.123
Epoch 305 Batch   20/24   train_loss = 1.806
Epoch 306 Batch    1/24   train_loss = 2.346
Epoch 306 Batch    6/24   train_loss = 2.865
Epoch 306 Batch   11/24   train_loss = 3.158
Epoch 306 Batch   16/24   train_loss = 3.311
Epoch 306 Batch   21/24   train_loss = 3.209
Epoch 307 Batch    2/24   train_loss = 3.200
Epoch 307 Batch    7/24   train_loss = 3.139
Epoch 307 Batch   12/24   train_loss = 3.047
Epoch 307 Batch   17/24   train_loss = 2.851
Epoch 307 Batch   22/24   train_loss = 2.662
Epoch 308 Batch    3/24   train_loss = 2.724
Epoch 308 Batch    8/24   train_loss = 2.517
Epoch 308 Batch   13/24   train_loss = 2.387
Epoch 308 Batch   18/24   train_loss = 2.375
Epoch 308 Batch   23/24   train_loss = 2.259
Epoch 309 Batch    4/24   train_loss = 2.168
Epoch 309 

Epoch 342 Batch   22/24   train_loss = 0.336
Epoch 343 Batch    3/24   train_loss = 0.331
Epoch 343 Batch    8/24   train_loss = 0.316
Epoch 343 Batch   13/24   train_loss = 0.330
Epoch 343 Batch   18/24   train_loss = 0.320
Epoch 343 Batch   23/24   train_loss = 0.300
Epoch 344 Batch    4/24   train_loss = 0.327
Epoch 344 Batch    9/24   train_loss = 0.315
Epoch 344 Batch   14/24   train_loss = 0.306
Epoch 344 Batch   19/24   train_loss = 0.335
Epoch 345 Batch    0/24   train_loss = 0.329
Epoch 345 Batch    5/24   train_loss = 0.331
Epoch 345 Batch   10/24   train_loss = 0.305
Epoch 345 Batch   15/24   train_loss = 0.322
Epoch 345 Batch   20/24   train_loss = 0.321
Epoch 346 Batch    1/24   train_loss = 0.316
Epoch 346 Batch    6/24   train_loss = 0.317
Epoch 346 Batch   11/24   train_loss = 0.284
Epoch 346 Batch   16/24   train_loss = 0.310
Epoch 346 Batch   21/24   train_loss = 0.309
Epoch 347 Batch    2/24   train_loss = 0.331
Epoch 347 Batch    7/24   train_loss = 0.311
Epoch 347 

Epoch 381 Batch    1/24   train_loss = 0.309
Epoch 381 Batch    6/24   train_loss = 0.309
Epoch 381 Batch   11/24   train_loss = 0.276
Epoch 381 Batch   16/24   train_loss = 0.301
Epoch 381 Batch   21/24   train_loss = 0.302
Epoch 382 Batch    2/24   train_loss = 0.324
Epoch 382 Batch    7/24   train_loss = 0.304
Epoch 382 Batch   12/24   train_loss = 0.316
Epoch 382 Batch   17/24   train_loss = 0.318
Epoch 382 Batch   22/24   train_loss = 0.326
Epoch 383 Batch    3/24   train_loss = 0.320
Epoch 383 Batch    8/24   train_loss = 0.307
Epoch 383 Batch   13/24   train_loss = 0.321
Epoch 383 Batch   18/24   train_loss = 0.311
Epoch 383 Batch   23/24   train_loss = 0.291
Epoch 384 Batch    4/24   train_loss = 0.319
Epoch 384 Batch    9/24   train_loss = 0.306
Epoch 384 Batch   14/24   train_loss = 0.296
Epoch 384 Batch   19/24   train_loss = 0.326
Epoch 385 Batch    0/24   train_loss = 0.320
Epoch 385 Batch    5/24   train_loss = 0.323
Epoch 385 Batch   10/24   train_loss = 0.296
Epoch 385 

Epoch 419 Batch    4/24   train_loss = 2.047
Epoch 419 Batch    9/24   train_loss = 2.327
Epoch 419 Batch   14/24   train_loss = 2.530
Epoch 419 Batch   19/24   train_loss = 2.466
Epoch 420 Batch    0/24   train_loss = 2.555
Epoch 420 Batch    5/24   train_loss = 2.509
Epoch 420 Batch   10/24   train_loss = 2.562
Epoch 420 Batch   15/24   train_loss = 2.491
Epoch 420 Batch   20/24   train_loss = 2.521
Epoch 421 Batch    1/24   train_loss = 2.322
Epoch 421 Batch    6/24   train_loss = 2.295
Epoch 421 Batch   11/24   train_loss = 2.208
Epoch 421 Batch   16/24   train_loss = 2.182
Epoch 421 Batch   21/24   train_loss = 2.135
Epoch 422 Batch    2/24   train_loss = 2.047
Epoch 422 Batch    7/24   train_loss = 1.987
Epoch 422 Batch   12/24   train_loss = 1.932
Epoch 422 Batch   17/24   train_loss = 1.851
Epoch 422 Batch   22/24   train_loss = 1.679
Epoch 423 Batch    3/24   train_loss = 1.772
Epoch 423 Batch    8/24   train_loss = 1.582
Epoch 423 Batch   13/24   train_loss = 1.582
Epoch 423 

Epoch 457 Batch    7/24   train_loss = 0.311
Epoch 457 Batch   12/24   train_loss = 0.324
Epoch 457 Batch   17/24   train_loss = 0.324
Epoch 457 Batch   22/24   train_loss = 0.334
Epoch 458 Batch    3/24   train_loss = 0.328
Epoch 458 Batch    8/24   train_loss = 0.314
Epoch 458 Batch   13/24   train_loss = 0.329
Epoch 458 Batch   18/24   train_loss = 0.318
Epoch 458 Batch   23/24   train_loss = 0.298
Epoch 459 Batch    4/24   train_loss = 0.325
Epoch 459 Batch    9/24   train_loss = 0.313
Epoch 459 Batch   14/24   train_loss = 0.302
Epoch 459 Batch   19/24   train_loss = 0.332
Epoch 460 Batch    0/24   train_loss = 0.327
Epoch 460 Batch    5/24   train_loss = 0.329
Epoch 460 Batch   10/24   train_loss = 0.302
Epoch 460 Batch   15/24   train_loss = 0.319
Epoch 460 Batch   20/24   train_loss = 0.319
Epoch 461 Batch    1/24   train_loss = 0.315
Epoch 461 Batch    6/24   train_loss = 0.315
Epoch 461 Batch   11/24   train_loss = 0.282
Epoch 461 Batch   16/24   train_loss = 0.307
Epoch 461 

Epoch 495 Batch   10/24   train_loss = 0.296
Epoch 495 Batch   15/24   train_loss = 0.313
Epoch 495 Batch   20/24   train_loss = 0.312
Epoch 496 Batch    1/24   train_loss = 0.308
Epoch 496 Batch    6/24   train_loss = 0.307
Epoch 496 Batch   11/24   train_loss = 0.276
Epoch 496 Batch   16/24   train_loss = 0.300
Epoch 496 Batch   21/24   train_loss = 0.302
Epoch 497 Batch    2/24   train_loss = 0.323
Epoch 497 Batch    7/24   train_loss = 0.303
Epoch 497 Batch   12/24   train_loss = 0.316
Epoch 497 Batch   17/24   train_loss = 0.316
Epoch 497 Batch   22/24   train_loss = 0.326
Epoch 498 Batch    3/24   train_loss = 0.320
Epoch 498 Batch    8/24   train_loss = 0.306
Epoch 498 Batch   13/24   train_loss = 0.321
Epoch 498 Batch   18/24   train_loss = 0.310
Epoch 498 Batch   23/24   train_loss = 0.290
Epoch 499 Batch    4/24   train_loss = 0.318
Epoch 499 Batch    9/24   train_loss = 0.306
Epoch 499 Batch   14/24   train_loss = 0.294
Epoch 499 Batch   19/24   train_loss = 0.325
Epoch 500 

## Save Parameters
Save `seq_length` and `save_dir` for generating a new TV script.

In [92]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# Save parameters for checkpoint
helper.save_params((seq_length, save_dir))

# Checkpoint

In [93]:
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import tensorflow as tf
import numpy as np
import helper
import problem_unittests as tests

_, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()
seq_length, load_dir = helper.load_params()

## Implement Generate Functions
### Get Tensors
Get tensors from `loaded_graph` using the function [`get_tensor_by_name()`](https://www.tensorflow.org/api_docs/python/tf/Graph#get_tensor_by_name).  Get the tensors using the following names:
- "input:0"
- "initial_state:0"
- "final_state:0"
- "probs:0"

Return the tensors in the following tuple `(InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)` 

In [94]:
def get_tensors(loaded_graph):
    """
    Get input, initial state, final state, and probabilities tensor from <loaded_graph>
    :param loaded_graph: TensorFlow graph loaded from file
    :return: Tuple (InputTensor, InitialStateTensor, FinalStateTensor, ProbsTensor)
    """
    # Look up each tensor by the name it was given when the graph was built
    return (
        loaded_graph.get_tensor_by_name('input:0'),
        loaded_graph.get_tensor_by_name('initial_state:0'),
        loaded_graph.get_tensor_by_name('final_state:0'),
        loaded_graph.get_tensor_by_name('probs:0'),
    )


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_get_tensors(get_tensors)

Tests Passed


### Choose Word
Implement the `pick_word()` function to select the next word using `probabilities`.

In [95]:
def pick_word(probabilities, int_to_vocab):
    """
    Pick the next word in the generated text
    :param probabilities: Probabilities of the next word
    :param int_to_vocab: Dictionary of word ids as the keys and words as the values
    :return: String of the predicted word
    """
    # Sample a word id in proportion to its predicted probability
    return int_to_vocab[np.random.choice(len(int_to_vocab), p=probabilities)]


"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_pick_word(pick_word)

Tests Passed
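
The sketch below illustrates why `pick_word()` samples from `probabilities` instead of always returning the most likely word. It uses only the standard library so it can run on its own. The toy vocabulary, the probabilities, and the helper names `pick_greedy` and `pick_sampled` are all hypothetical and are not part of the project code.

```python
import random
from collections import Counter

# Hypothetical toy vocabulary and next-word distribution (illustration only).
toy_int_to_vocab = {0: 'moe', 1: 'homer', 2: 'beer'}
toy_probabilities = [0.5, 0.3, 0.2]


def pick_greedy(probabilities, int_to_vocab):
    """Always return the single most likely word."""
    best_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return int_to_vocab[best_id]


def pick_sampled(probabilities, int_to_vocab, rng):
    """Draw a word id at random, weighted by its probability."""
    word_id = rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]
    return int_to_vocab[word_id]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
greedy_counts = Counter(
    pick_greedy(toy_probabilities, toy_int_to_vocab) for _ in range(1000))
sampled_counts = Counter(
    pick_sampled(toy_probabilities, toy_int_to_vocab, rng) for _ in range(1000))

print(greedy_counts)   # only one distinct word is ever chosen
print(sampled_counts)  # every word appears, roughly in proportion to its probability
```

A greedy choice tends to make generated text repeat itself, while weighted sampling keeps some variety in the script.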


## Generate TV Script
This will generate the TV script for you.  Set `gen_length` to the length of TV script you want to generate.

In [96]:
gen_length = 200
# homer_simpson, moe_szyslak, or barney_gumble (use lowercase, as in the generated output)
prime_word = 'moe_szyslak'

"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(load_dir + '.meta')
    loader.restore(sess, load_dir)

    # Get Tensors from loaded model
    input_text, initial_state, final_state, probs = get_tensors(loaded_graph)

    # Sentences generation setup
    gen_sentences = [prime_word + ':']
    prev_state = sess.run(initial_state, {input_text: np.array([[1]])})

    # Generate sentences
    for n in range(gen_length):
        # Dynamic Input
        dyn_input = [[vocab_to_int[word] for word in gen_sentences[-seq_length:]]]
        dyn_seq_length = len(dyn_input[0])

        # Get Prediction
        probabilities, prev_state = sess.run(
            [probs, final_state],
            {input_text: dyn_input, initial_state: prev_state})
        
        pred_word = pick_word(probabilities[dyn_seq_length-1], int_to_vocab)

        gen_sentences.append(pred_word)
    
    # Remove tokens
    tv_script = ' '.join(gen_sentences)
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        tv_script = tv_script.replace(' ' + token.lower(), key)
    tv_script = tv_script.replace('\n ', '\n')
    tv_script = tv_script.replace('( ', '(')
        
    print(tv_script)

INFO:tensorflow:Restoring parameters from ./save
moe_szyslak: hey, homer... what if you got back to me.
moe_szyslak: i'm just the big thing. if i need to give you in? i didn't sell to" because--

" then" take that, you won't be livin' in our bar.
homer_simpson: moe, have to make him some day. seems like make me an big guy who was gonna have to come...
homer_simpson: so you can say?
homer_simpson: well, if i come... uh, boy, you and on my fault!
lenny_leonard:(laughs) i'd never do it?
moe_szyslak: he, guys! to an brave seat.
homer_simpson: oh man, i'm so proud! where you, help that, hideous time i tried a ticket.
moe_szyslak:(muttering) homer, well, you lost us to find the corpses?
carl_carlson:(shrugs) well, i, what's the money of moe's all all the best friend, homer.
homer_simpson: you came to the time, homer. on
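
The "Remove tokens" step in the cell above can be tried in isolation. This is a minimal sketch with a hypothetical token dictionary and a hand-written word list; the real `token_dict` comes from `helper.load_preprocess()` and its token names may differ.

```python
# Hypothetical token dictionary: punctuation symbol -> placeholder token.
# The real project builds its own; these names are for illustration only.
toy_token_dict = {
    '.': '||period||',
    ',': '||comma||',
    '\n': '||return||',
    '(': '||left_paren||',
    ')': '||right_paren||',
}

# Words as the generator might emit them, one token per list entry.
generated = ['moe_szyslak:', '||left_paren||', 'sighs', '||right_paren||', 'hey',
             '||comma||', 'homer', '||period||', '||return||',
             'homer_simpson:', 'beer', '||period||']

script = ' '.join(generated)
for symbol, token in toy_token_dict.items():
    # Drop the space in front of the token so punctuation hugs the previous word.
    script = script.replace(' ' + token.lower(), symbol)
# Tidy the space left after line breaks and opening parentheses.
script = script.replace('\n ', '\n')
script = script.replace('( ', '(')

print(script)
```

Because the space before each token is removed, an opening parenthesis attaches directly to the speaker name, which is why the generated script shows lines such as `lenny_leonard:(laughs)`.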


# The TV Script is Nonsensical
It's ok if the TV script doesn't make any sense.  We trained on less than a megabyte of text.  In order to get good results, you'll have to use a smaller vocabulary or get more data.  Luckily there's more data!  As we mentioned at the beginning of this project, this is a subset of [another dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data).  We didn't have you train on all the data, because that would take too long.  However, you are free to train your neural network on all the data.  After you complete the project, of course.
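
One possible way to shrink the vocabulary is to map rare words to a single placeholder before building the lookup tables. The sketch below assumes a tiny made-up corpus; the `min_count` threshold and the `<unk>` placeholder are hypothetical choices, not something the project prescribes.

```python
from collections import Counter

# Hypothetical corpus and threshold, for illustration only.
toy_text = 'moe pours beer . homer drinks beer . barney drinks beer too .'
min_count = 2            # keep words seen at least this many times
unknown_token = '<unk>'  # placeholder name chosen for this sketch

words = toy_text.split()
counts = Counter(words)

# Replace every rare word with the shared placeholder.
trimmed = [word if counts[word] >= min_count else unknown_token for word in words]

print(len(set(words)), '->', len(set(trimmed)))
print(' '.join(trimmed))
```

The trade-off is that the network can then never generate the words that were replaced, so the threshold would need tuning against the real data.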
# Submitting This Project
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_tv_script_generation.ipynb" and also save it as an HTML file under "File" -> "Download as". Include the "helper.py" and "problem_unittests.py" files in your submission.