# TV Script Generation
In this project, you'll generate your own [Simpsons](https://en.wikipedia.org/wiki/The_Simpsons) TV scripts using RNNs. 

You'll be using part of the [Simpsons dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data) of scripts from 27 seasons.

The Neural Network you'll build will generate a new TV script for a scene at [Moe's Tavern](https://simpsonswiki.com/wiki/Moe's_Tavern).

**Objective: Generate a new episode of the Simpsons.**
- 1 Get the Data
- 2 Explore the Data
- 3 Implement Preprocessing functions
    - 3-1 Lookup Table
    - 3-2 Tokenize Punctuation
- 4 Build the neural network
    - 4-1 Create Training examples and targets
    - 4-2 Create Training batches
- 5 Build the model
- 6 Try the model
- 7 Train the model
- 8 Generate TV Scripts
    

In [None]:
import numpy as np
import pandas as pd
import os
import sys
import warnings
import tensorflow as tf
from tensorflow import keras as k
from tensorflow.keras import layers

In [None]:
import problem_unittests as tests
import helper

## 1 - Get the Data
The data is already provided for you.  You'll be using a subset of the original dataset.  It consists of only the scenes in Moe's Tavern.  This doesn't include other versions of the tavern, like "Moe's Cavern", "Flaming Moe's", "Uncle Moe's Family Feed-Bag", etc.

In [None]:
data_dir = 'moes_tavern_lines.txt'
text = helper.load_data(data_dir)
    
# Ignore notice, since we don't use it for analysing the data
text = text[81:]

In [None]:
print(text[:1000])

Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.


Moe_Szyslak: Ah, isn't that nice. Now, there is a politician who cares.
Barney_Gumble: If I ever vote, it'll be for him. (BELCH)


Barney_Gumble: Hey Homer, how's your neighbor's store doing?
Homer_Simpson: Lousy. He just sits there all day. He'd have a great job if he didn't own the place. (CHUCKLES)


## 2 - Explore the Data
Play around with `view_sentence_range` to view different parts of the data.

In [None]:
print('Dataset Stats')
print('Roughly the number of unique words: {}'.format(len({word: None for word in text.split()})))

Dataset Stats
Roughly the number of unique words: 11492


In [None]:
# TODO Compute the number of scenes
scenes = text.split('\n\n')
print('Number of scenes: {}'.format(len(scenes)))

Number of scenes: 262


In [None]:
# TODO Compute the average number of sentences per scene
sentence_count_scene = [len(i) for i in [s.split('\n') for s in scenes]]
print('Average number of sentences in each scene: {}'.format(np.average(sentence_count_scene)))

Average number of sentences in each scene: 16.248091603053435


In [None]:
sentences = [sentence for scene in scenes for sentence in scene.split('\n')]
print('Number of lines: {}'.format(len(sentences)))

Number of lines: 4257


In [None]:
# TODO Compute the average number of words per line
word_count_sentence = [len(words.split()) for words in sentences]
print('Average number of words in each line: {}'.format(np.average(word_count_sentence)))

Average number of words in each line: 11.50434578341555


In [None]:
view_sentence_range = (0, 10)


print('The sentences {} to {}:'.format(*view_sentence_range))
print('\n'.join(text.split('\n')[view_sentence_range[0]:view_sentence_range[1]]))

The sentences 0 to 10:
Moe_Szyslak: (INTO PHONE) Moe's Tavern. Where the elite meet to drink.
Bart_Simpson: Eh, yeah, hello, is Mike there? Last name, Rotch.
Moe_Szyslak: (INTO PHONE) Hold on, I'll check. (TO BARFLIES) Mike Rotch. Mike Rotch. Hey, has anybody seen Mike Rotch, lately?
Moe_Szyslak: (INTO PHONE) Listen you little puke. One of these days I'm gonna catch you, and I'm gonna carve my name on your back with an ice pick.
Moe_Szyslak: What's the matter Homer? You're not your normal effervescent self.
Homer_Simpson: I got my problems, Moe. Give me another one.
Moe_Szyslak: Homer, hey, you should not drink to forget your problems.
Barney_Gumble: Yeah, you should only drink to enhance your social skills.




## 3 - Implement Preprocessing Functions
The first thing to do to any dataset is preprocessing.  Implement the following preprocessing functions:
- Lookup Table
- Tokenize Punctuation

### 3-1 Lookup Table
To create a word embedding, you first need to transform the words to ids.  In this function, create two dictionaries:
- Dictionary to go from the words to an id, we'll call `vocab_to_int`
- Dictionary to go from the id to word, we'll call `int_to_vocab`

Return these dictionaries in the following tuple `(vocab_to_int, int_to_vocab)`

In [None]:
def create_lookup_tables(text):
    """
    Create lookup tables for vocabulary
    :param text: The text of tv scripts split into words
    :return: A tuple of dicts (vocab_to_int, int_to_vocab)
    """
    
    # TODO: Implement Function
    
    text_set = set(text)
    vocab_to_int = { v:i for i,v in enumerate(text_set)}
    
    int_to_vocab = { i:v for i,v in enumerate(text_set)}
    
    return vocab_to_int, int_to_vocab


tests.test_create_lookup_tables(create_lookup_tables)

Tests Passed
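
As a quick illustration of the two dictionaries, here is a minimal sketch on a toy word list (the list and the use of `sorted` are assumptions for this example, not part of the project code). Sorting the vocabulary gives the same ids on every run, whereas the iteration order of a raw `set` of strings can change between Python sessions.

```python
# Toy illustration of a word <-> id lookup (not the notebook's dataset).
words = "moe homer moe barney homer moe".split()

# sorted() makes the id assignment reproducible across runs.
vocab = sorted(set(words))
vocab_to_int = {word: idx for idx, word in enumerate(vocab)}
int_to_vocab = {idx: word for word, idx in vocab_to_int.items()}

# Encode the words as ids, then decode them back.
encoded = [vocab_to_int[word] for word in words]
decoded = [int_to_vocab[idx] for idx in encoded]

print(vocab_to_int)  # {'barney': 0, 'homer': 1, 'moe': 2}
print(encoded)       # [2, 1, 2, 0, 1, 2]
print(decoded == words)  # True
```

The round trip works because the two dictionaries are exact inverses of each other.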


### 3-2 Tokenize Punctuation
We'll be splitting the script into a word array using spaces as delimiters.  However, punctuation marks like periods and exclamation marks make it hard for the neural network to distinguish between the word "bye" and "bye!".

Implement the function `token_lookup` to return a dict that will be used to tokenize symbols like "!" into "||Exclamation_Mark||".  Create a dictionary for the following symbols where the symbol is the key and value is the token:
- Period ( . )
- Comma ( , )
- Quotation Mark ( " )
- Semicolon ( ; )
- Exclamation mark ( ! )
- Question mark ( ? )
- Left Parentheses ( ( )
- Right Parentheses ( ) )
- Dash ( -- )
- Return ( \n )

This dictionary will be used to tokenize the symbols and add the delimiter (space) around them.  This separates each symbol as its own word, making it easier for the neural network to predict the next word. Make sure you don't use a token that could be confused with a word. Instead of using the token "dash", try using something like "||dash||".

In [None]:
def token_lookup():
    """
    Generate a dict to turn punctuation into a token.
    :return: Tokenize dictionary where the key is the punctuation and the value is the token
    """
    # TODO: Implement Function
    return {".":"||Period||",
            "," : "||Comma||",
            "?" : "||QuestionMark||",
            '!' : "||ExclamationMark||",
            ";" : "||Semicolon||",
            '"' : "||QuotationMark||",
            ')' : "||RightParentheses||",
            '(' : "||LeftParentheses||",
            '--' : "||Dash||",
            '\n' : "||Return||"
           }

tests.test_tokenize(token_lookup)

Tests Passed
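
To make the role of this dictionary concrete, the sketch below applies a reduced token table to a short string. The `tokenize` helper and the sample text are hypothetical. The assumption is that `helper.preprocess_and_save_data` does something along these lines: surround each symbol's token with spaces, lowercase, then split on whitespace.

```python
# Hypothetical, reduced token table for illustration only.
token_dict = {
    ".": "||Period||",
    ",": "||Comma||",
    "!": "||ExclamationMark||",
    "\n": "||Return||",
}


def tokenize(text, token_dict):
    """Replace each symbol by its space-padded token, then lowercase and split."""
    for symbol, token in token_dict.items():
        text = text.replace(symbol, " {} ".format(token))
    return text.lower().split()


words = tokenize("Bye!\nHomer, bye.", token_dict)
print(words)
# ['bye', '||exclamationmark||', '||return||', 'homer', '||comma||', 'bye', '||period||']
```

Both occurrences of "bye" now map to the same word, and each punctuation mark is a separate word the network can predict.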


### 3-3 Preprocess all the data and save it
Running the code cell below will preprocess all the data and save it to file.

In [None]:
# Preprocess Training, Validation, and Testing Data
helper.preprocess_and_save_data(data_dir, token_lookup, create_lookup_tables)

### 3-4 Check Point
This is your first checkpoint. If you ever decide to come back to this notebook or have to restart the notebook, you can start from here. The preprocessed data has been saved to disk.

In [None]:
int_text, vocab_to_int, int_to_vocab, token_dict = helper.load_preprocess()

## 4 - Build the Neural Network

In this section, you'll build the components necessary to build a Recurrent Neural Network.

### Check Access to GPU

In [None]:
# Check for a GPU
if not tf.test.gpu_device_name():
    warnings.warn('No GPU found. Please use a GPU to train your neural network.')
else:
    print('Default GPU Device: {}'.format(tf.test.gpu_device_name()))

Default GPU Device: /device:GPU:0


### 4-1 Create Training examples and targets
Next, divide the text into example sequences. Each input sequence will contain `seq_length` words from the text.

For each input sequence, the corresponding target contains the same number of words, shifted one word to the right.

So break the text into chunks of `seq_length+1`. For example, say `seq_length` is 9 and our text is:
- "Michael Jordan is the greatest basketball player of all time". 

The input sequence would be :
- "Michael Jordan is the greatest basketball player of all"

And the target sequence would be :
- "Jordan is the greatest basketball player of all time".
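
The shift described above can be sketched in plain Python on the same sentence (a toy example using word lists instead of the integer ids used in the notebook):

```python
text = "Michael Jordan is the greatest basketball player of all time"
words = text.split()

seq_length = 9
chunk = words[:seq_length + 1]  # one chunk holds seq_length + 1 words

input_words = chunk[:-1]   # everything except the last word
target_words = chunk[1:]   # everything except the first word

print(" ".join(input_words))   # Michael Jordan is the greatest basketball player of all
print(" ".join(target_words))  # Jordan is the greatest basketball player of all time

# At each position the target is the word that follows the input word.
pairs = list(zip(input_words, target_words))
print(pairs[0])  # ('Michael', 'Jordan')
```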

To do this, first use the `tf.data.Dataset.from_tensor_slices` function to convert the word int vector into a stream of word indices.

The [tf.data.Dataset API](https://www.tensorflow.org/api_docs/python/tf/data/Dataset) supports writing descriptive and efficient input pipelines. Dataset usage follows a common pattern:

- Create a source dataset from your input data.
- Apply dataset transformations to preprocess the data.
- Iterate over the dataset and process the elements.

Iteration happens in a streaming fashion, so the full dataset does not need to fit into memory.

In [None]:
# The maximum length sentence we want for a single input in words
seq_length = 16
examples_per_epoch = len(int_text)//(seq_length)
print(examples_per_epoch)

# Create training examples / targets
word_dataset = tf.data.Dataset.from_tensor_slices(int_text)

for i in word_dataset.take(5):
    print(int_to_vocab[i.numpy()])

4318
moe_szyslak:
||leftparentheses||
into
phone
||rightparentheses||


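The `batch` method lets us easily convert these individual words to sequences of the desired size. Note that this notebook batches into chunks of `seq_length` words (rather than `seq_length+1`), so after the input/target split each input and each target holds `seq_length - 1 = 15` words.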

In [None]:
sequences = word_dataset.batch(seq_length, drop_remainder=True)

for item in sequences.take(1):
    print(np.array(item))
    print(repr(''.join([int_to_vocab[elt] for elt in np.array(item)])))

[6205 6551 1331 1944  388 5785 2056 1458 5380 5078  817 3666    5  625
 1458 2157]
"moe_szyslak:||leftparentheses||intophone||rightparentheses||moe'stavern||period||wheretheelitemeettodrink||period||||return||"


For each sequence, duplicate and shift it to form the input and target text by using the `map` method to apply a simple function to each batch:

In [None]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

Print the first example's input and target values:

In [None]:
for input_example, target_example in  dataset.take(1):
    print ('Input data: ', repr(''.join([int_to_vocab[elt] for elt in np.array(input_example)])))
    #TODO TO IMPLEMENT
    print ('Target data:', repr(''.join([int_to_vocab[elt] for elt in np.array(target_example)])))

Input data:  "moe_szyslak:||leftparentheses||intophone||rightparentheses||moe'stavern||period||wheretheelitemeettodrink||period||"
Target data: "||leftparentheses||intophone||rightparentheses||moe'stavern||period||wheretheelitemeettodrink||period||||return||"


Each index of these vectors is processed as one time step. For the input at time step 0, the model receives the index for 'moe_szyslak:' and tries to predict the index for '||leftparentheses||' as the next word. At the next time step, it does the same thing, but the `RNN` considers the previous step's context in addition to the current input word.

In [None]:
for i, (input_idx, target_idx) in enumerate(zip(input_example[:5], target_example[:5])):
    print("Step {:4d}".format(i))
    print("  input: {} ({:s})".format(input_idx, repr(int_to_vocab[input_idx.numpy()])))
    print("  expected output: {} ({:s})".format(target_idx, repr(int_to_vocab[target_idx.numpy()])))

Step    0
  input: 6205 ('moe_szyslak:')
  expected output: 6551 ('||leftparentheses||')
Step    1
  input: 6551 ('||leftparentheses||')
  expected output: 1331 ('into')
Step    2
  input: 1331 ('into')
  expected output: 1944 ('phone')
Step    3
  input: 1944 ('phone')
  expected output: 388 ('||rightparentheses||')
Step    4
  input: 388 ('||rightparentheses||')
  expected output: 5785 ("moe's")


### 4-2 Create training batches

We used `tf.data` to split the text into manageable sequences. But before feeding this data into the model, we need to shuffle the data and pack it into batches.

In [None]:
# Batch size
BATCH_SIZE = 128

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

dataset

<BatchDataset element_spec=(TensorSpec(shape=(128, 15), dtype=tf.int32, name=None), TensorSpec(shape=(128, 15), dtype=tf.int32, name=None))>
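
The buffer-based shuffling mentioned in the comment above can be sketched in plain Python. This is a simplified model of the idea, not TensorFlow's implementation, and `buffered_shuffle` is a hypothetical helper: fill a fixed-size buffer, then repeatedly emit a random buffered element and replace it with the next incoming one.

```python
import random


def buffered_shuffle(items, buffer_size, seed=0):
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]                # emit a random buffered element
        buffer[idx] = item               # and replace it with the new one
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer


shuffled = list(buffered_shuffle(range(10), buffer_size=3))
print(shuffled)

# A buffer of size 1 cannot reorder anything.
print(list(buffered_shuffle(range(10), buffer_size=1)))
```

With a small buffer an element can only move a limited distance from its original position. In this notebook the dataset has 4318 sequences, fewer than `BUFFER_SIZE = 10000`, so the whole dataset fits in the buffer and the shuffle covers all of it.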

## 5 Build the model
Use `tf.keras.Sequential` to define the model. For this simple example three layers are used to define our model:

* `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that will map the numbers of each word to a vector with `embedding_dim` dimensions;
* `tf.keras.layers.GRU`: A type of RNN with `units=rnn_units`, `return_sequences=True`, `stateful=True` and `recurrent_initializer='glorot_uniform'` (you can also use an LSTM layer here);
* `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs.

In [None]:
# Length of the vocabulary in words
vocab_size = len(vocab_to_int)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 512

In [None]:
# TODO TO IMPLEMENT
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size,output_dim=embedding_dim,batch_size=batch_size),
    tf.keras.layers.GRU(units=rnn_units,return_sequences=True, stateful=True,recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dense(vocab_size)
    ])
    return model

In [None]:
# TODO TO IMPLEMENT
model = build_model(vocab_size, embedding_dim, rnn_units, BATCH_SIZE)


For each word the model looks up the embedding, runs the GRU one timestep with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next word:

![A drawing of the data passing through the model](images/text_generation_training.png)

Please note that we chose the Keras sequential model here since all the layers in the model have a single input and produce a single output. If you want to retrieve and reuse the states from a stateful RNN layer, you might want to build your model with the Keras functional API or model subclassing. Please check the [Keras RNN guide](https://www.tensorflow.org/guide/keras/rnn#rnn_state_reuse) for more details.

## 6 Try the model

Now run the model to see that it behaves as expected.

First check the shape of the output:

In [None]:
for input_example_batch, target_example_batch in dataset.take(1):
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape)
    print(( BATCH_SIZE, seq_length-1, vocab_size ))

(128, 15, 6779)
(128, 15, 6779)


In the above example the sequence length of the input is 15 (`seq_length - 1`), but the model can be run on inputs of any length, as the `None` time dimension in the summary shows:

In [None]:
model.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_6 (Embedding)     (128, None, 256)          1735424   
                                                                 
 gru_6 (GRU)                 (128, None, 512)          1182720   
                                                                 
 dense_6 (Dense)             (128, None, 6779)         3477627   
                                                                 
Total params: 6,395,771
Trainable params: 6,395,771
Non-trainable params: 0
_________________________________________________________________
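
The parameter counts in this summary can be checked by hand. The sketch below is plain arithmetic; the GRU formula assumes Keras' default `reset_after=True`, which gives each of the three weight blocks (update gate, reset gate, candidate state) two bias vectors.

```python
vocab_size, embedding_dim, rnn_units = 6779, 256, 512

# Embedding: one vector of size embedding_dim per vocabulary word.
embedding_params = vocab_size * embedding_dim

# GRU: three weight blocks, each with an input kernel, a recurrent kernel
# and two bias vectors (assumes reset_after=True).
gru_params = 3 * (rnn_units * (embedding_dim + rnn_units) + 2 * rnn_units)

# Dense: weight matrix plus one bias per output word.
dense_params = rnn_units * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
# 1735424 1182720 3477627 6395771
```

More than half of the parameters sit in the output layer, because its size grows with the vocabulary.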


To get actual predictions from the model we need to sample from the output distribution, to get actual word indices. This distribution is defined by the logits over the word vocabulary.

Note: It is important to sample from this distribution as taking the argmax of the distribution can easily get the model stuck in a loop.
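
To see the difference between the two strategies, here is a minimal pure-Python sketch on three made-up logits (no TensorFlow; `softmax` is a small helper written for this example). The argmax always returns the same index, while sampling returns every index with a frequency close to its probability.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # toy values, not model output
probs = softmax(logits)

# Greedy choice: always the index with the largest logit.
greedy = max(range(len(logits)), key=lambda i: logits[i])

# Sampling: draw indices in proportion to their probabilities.
rng = random.Random(0)
samples = rng.choices(range(len(logits)), weights=probs, k=1000)
counts = [samples.count(i) for i in range(len(logits))]

print("probabilities:", [round(p, 3) for p in probs])
print("greedy index: ", greedy)
print("sample counts:", counts)
```

Because the greedy choice is deterministic, a model that reaches the same state twice would repeat the same continuation forever. Sampling breaks such loops.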

Try it for the first example in the batch:

In [None]:
# Generate an array of shape (seq_length - 1, 1). Each value of the array is a word index.
# logits: 2-D Tensor with shape [batch_size, num_classes]. Each slice [i, :] represents the unnormalized log-probabilities for all classes.
#         Here the 15 time steps of the first example play the role of the batch dimension.
# num_samples: number of independent samples to draw for each row slice.
# Returns the drawn samples (random) of shape [batch_size, num_samples].
sampled_indices = tf.random.categorical(logits = example_batch_predictions[0], num_samples=1)

# Squeeze removes dimension of size 1 from the shape of a tensor
sampled_indices = tf.squeeze(sampled_indices,axis=-1).numpy()

This gives us, at each timestep, a prediction of the next word index:

In [None]:
sampled_indices

array([6563, 2074, 3524, 2148,  459,  904,   28,  909, 2206, 5761, 2950,
       4906, 3030, 5632, 3104])

Decode these to see the text predicted by this untrained model:

In [None]:
input_example_batch[0].numpy()

array([   5, 3639, 5040,  848, 6690, 3639, 6216,    5, 1923, 6756, 5078,
       2553, 1458, 2157, 6205], dtype=int32)

In [None]:
print("Input: \n", repr(" ".join([int_to_vocab[elt] for elt in input_example_batch[0].numpy()])))
print()
print("Next Word Predictions: \n", repr(" ".join([int_to_vocab[idx] for idx in sampled_indices])))

Input: 
 'to you ||comma|| moe ||questionmark|| you used to be about the booze ||period|| ||return|| moe_szyslak:'

Next Word Predictions: 
 "pian-ee taking temp value understand television painting chips blob incognito ya' up-bup-bup healthier savagely seven"


## 7 Train the model
At this point the problem can be treated as a standard classification problem. Given the previous RNN state and the input at this time step, predict the class of the next word.

### 7-1 Attach an optimizer, and a loss function

The standard `tf.keras.losses.sparse_categorical_crossentropy` loss function works in this case because it is applied across the last dimension of the predictions.

Because our model returns logits, we need to set the `from_logits` flag.

In [None]:
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

example_batch_loss  = loss(target_example_batch, example_batch_predictions)
print("Prediction shape: ", example_batch_predictions.shape, " # (batch_size, sequence_length, vocab_size)")
print("scalar_loss:      ", example_batch_loss.numpy().mean())

Prediction shape:  (128, 15, 6779)  # (batch_size, sequence_length, vocab_size)
scalar_loss:       8.821674
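
A useful sanity check on this number: a model that assigns the same probability to every word has a cross-entropy of `ln(vocab_size)`. The sketch below computes the loss from logits in plain Python for a single time step (`sparse_ce_from_logits` is a helper written for this example, not the Keras function).

```python
import math


def sparse_ce_from_logits(label, logits):
    """Cross-entropy of one integer label given unnormalized logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


vocab_size = 6779

# Identical logits for every word: a uniform prediction.
uniform_logits = [0.0] * vocab_size
baseline = sparse_ce_from_logits(0, uniform_logits)
print(round(baseline, 2))  # 8.82, i.e. ln(6779)

# A confident and correct prediction gives a much smaller loss.
confident = sparse_ce_from_logits(0, [2.0, 1.0, 0.1])
print(confident < baseline)  # True
```

The untrained model's loss of about 8.82 is therefore what you would expect from near-uniform predictions. Training should push the loss well below this value.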


Configure the training procedure using the `tf.keras.Model.compile` method. We'll use `tf.keras.optimizers.Adam` with default arguments and the loss function.

In [None]:
# TODO TO IMPLEMENT
model.compile(optimizer=tf.keras.optimizers.Adam(),loss=loss)

### 7-2 Configure Check Point

Use a `tf.keras.callbacks.ModelCheckpoint` to ensure that checkpoints are saved during training:

In [None]:
# Directory where the checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

### 7-3 Execute the training

The cell below trains the model for 100 epochs (`EPOCHS=100`); lower this value to keep training time short. In Colab, set the runtime to GPU for faster training.

In [None]:
EPOCHS=100

In [None]:
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])

Epoch 1/100
Epoch 2/100
[remaining per-epoch progress output omitted]

## 8 - Generate TV Script
This will generate the TV script for you.

### 8-1 Restore the latest checkpoint

To keep this prediction step simple, use a batch size of 1.

Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.

To run the model with a different `batch_size`, we need to rebuild the model and restore the weights from the checkpoint.

In [None]:
tf.train.latest_checkpoint(checkpoint_dir)

'./training_checkpoints/ckpt_100'

In [None]:
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(tf.train.latest_checkpoint(checkpoint_dir))

model.build(tf.TensorShape([1, None]))

In [None]:
model.summary()

Model: "sequential_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_7 (Embedding)     (1, None, 256)            1735424   
                                                                 
 gru_7 (GRU)                 (1, None, 512)            1182720   
                                                                 
 dense_7 (Dense)             (1, None, 6779)           3477627   
                                                                 
Total params: 6,395,771
Trainable params: 6,395,771
Non-trainable params: 0
_________________________________________________________________


### 8-2 The prediction loop

The following code block generates the text:

* It starts by choosing a start string, initializing the RNN state and setting the number of words to generate.

* Get the prediction distribution of the next word using the start string and the RNN state.

* Then, use a categorical distribution to calculate the index of the predicted word. Use this predicted word as our next input to the model.

* The RNN state is carried over from one step to the next, so that the model has more context than just one word. After predicting the next word, the updated RNN state is used again for the following step, so each prediction is conditioned on all the previously generated words.


![To generate text the model's output is fed back to the input](images/text_generation_sampling.png)


Looking at the generated text, you'll see the model has picked up the script format (a speaker name followed by a colon, stage directions in parentheses, blank lines between scenes) and imitates the vocabulary of the Simpsons scripts. It has not learned to form coherent sentences, which is expected with so little training data.

In [None]:
def generate_text(model, start_string, gen_length):
    # Evaluation step (generating text using the learned model)

    # Converting our start string to numbers (vectorizing)
    input_eval = vocab_to_int[start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty list to collect the generated words
    text_generated = []

    # Here batch size == 1
    model.reset_states()
    for i in range(gen_length):
        # TODO TO IMPLEMENT : call predict
        predictions = model.predict(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)

        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

        # We pass the predicted word as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)

        # TODO TO IMPLEMENT Add the generated word to the generated text
        text_generated.append(int_to_vocab[predicted_id])

    # Replace the punctuation tokens with their symbols
    tv_script = start_string + ' '.join(text_generated)
    for key, token in token_dict.items():
        tv_script = tv_script.replace(' ' + token.lower(), key)
    tv_script = tv_script.replace('\n ', '\n')
    tv_script = tv_script.replace('( ', '(')

    return (tv_script)

Now generate a script:
- Set `gen_length` to 1000 (the number of words to generate).
- We will use the following start string: "moe_szyslak:"

In [None]:
print(generate_text(model, start_string=u"moe_szyslak:", gen_length=1000))

moe_szyslak:faith from grace is complete.
lenny_leonard: nineteen ninety-seven!
carl_carlson: homer, that was amazing-- he actually felt better comin' through...
moe_szyslak: four.


barflies: we're proof that you... without the field goal? you gotta be sober to fly. boxing involving multiple senators.
kent_brockman: no, no. not a man who doesn't care to me.
moe_szyslak:(puzzled) you want to" buy a round on, no sharing!
moe_szyslak:(hopeful) you gotta do for yourself.
homer_simpson:(surprised sound) what the-- where's my money?


homer_simpson:(excited) ooo, no. oh, no. no pal is a real tenuous hold on my girlfriend here.
moe_szyslak: yeah, and that's not all. why don't you tell 'em to cat) i won't tell if you don't tell life.
homer_simpson:(puzzled) ooh, my liver hurts.


moe_szyslak: what's wrong, homer, what's the point? i've been in the bartender business for a long time alright, i've heard of it.
moe_szyslak:(shocked) are those ears?! all the" forget-me-shot?"(reading)" strokkur g

The easiest thing you can do to improve the results is to train the model for longer (increase `EPOCHS`).

You can also experiment with a different start string, try adding another RNN layer to improve the model's accuracy, or add a temperature parameter that scales the logits before sampling to generate more or less random predictions.
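
As a sketch of what a temperature parameter does, the plain-Python example below divides made-up logits by a temperature before turning them into probabilities. `probabilities` and `sample_with_temperature` are hypothetical helpers written for this illustration. In `generate_text`, the equivalent change would presumably be to divide `predictions` by the temperature before calling `tf.random.categorical`.

```python
import math
import random


def probabilities(logits, temperature=1.0):
    """Softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Draw one index from the temperature-scaled distribution."""
    weights = probabilities(logits, temperature)
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1]  # toy values, not model output

for temperature in (0.5, 1.0, 2.0):
    probs = probabilities(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])

rng = random.Random(0)
predicted_id = sample_with_temperature(logits, temperature=0.5, rng=rng)
print(predicted_id)
```

A temperature below 1 sharpens the distribution, giving more predictable text. A temperature above 1 flattens it, giving more surprising text.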

# The TV Script is Nonsensical
It's ok if the TV script doesn't make any sense.  We trained on less than a megabyte of text.  In order to get good results, you'll have to use a smaller vocabulary or get more data.  Luckily there's more data!  As we mentioned at the beginning of this project, this is a subset of [another dataset](https://www.kaggle.com/wcukierski/the-simpsons-by-the-data).  We didn't have you train on all the data, because that would take too long.  However, you are free to train your neural network on all the data.  After you complete the project, of course.
