# Text Generation With Alice in Wonderland

Recurrent neural networks can also be used as generative models. This means that in addition to making predictions, they can learn the sequences of a problem and then generate entirely new, plausible sequences for the problem domain. Generative models like this are helpful for studying how well a model has learned a problem and for learning more about the problem domain itself.

In this project, you will discover how to create a generative model for text, character by character, using LSTM recurrent neural networks in Python with Keras. After completing this project, you will know:

* Where to download a free corpus of text that you can use to train generative text models.
* How to frame the problem of text sequences to a recurrent neural network generative model.
* How to develop an LSTM to generate plausible text sequences for a given problem.

Let's get started.

**Note:** You may want to speed up the computation for this tutorial by using GPU rather than CPU hardware. This is a suggestion, not a requirement. The tutorial will work just fine on the CPU.

## Problem Description: Text Generation

Many of the classical texts are no longer protected under copyright. This means that you can download all of the text for these books for free and use them in experiments, like creating generative models. Perhaps the best place to access free books that are no longer protected by copyright is Project Gutenberg<sup>[1](https://www.gutenberg.org/)</sup>. This tutorial will use a favorite book from childhood as the dataset: Alice's Adventures in Wonderland by Lewis Carroll<sup>[2](https://www.gutenberg.org/ebooks/11)</sup>.

We will learn the dependencies between characters and the conditional probabilities of characters in sequences so that we can generate wholly new and original sequences of characters. This tutorial is a lot of fun, and I recommend repeating these experiments with other books from Project Gutenberg. These experiments are not limited to text; you can also experiment with other ASCII data, such as computer source code, marked-up documents in LaTeX, HTML or Markdown, and more.

You can download the complete text in ASCII format (Plain Text UTF-8)<sup>[3](http://www.gutenberg.org/cache/epub/11/pg11.txt)</sup> for this book for free and place it in the `datasets` directory with the filename `wonderland.txt` (***already done for you***). Now we need to prepare the dataset for modeling. Project Gutenberg adds a standard header and footer to each book, which are not part of the original text. Open the file in a text editor and delete the header and footer. The header is obvious and ends with the text:

`*** START OF THIS PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***`

The footer is all of the text after the line of text that says:

`THE END`

You should be left with a text file that has about 3,330 lines of text.
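If you would rather not edit the file by hand, the trimming can be scripted. The sketch below is one possible approach and not part of the original project: the helper name `strip_gutenberg` is hypothetical, and it assumes the start marker and the `THE END` line appear as quoted above (the header wording can differ between Project Gutenberg editions).

```python
def strip_gutenberg(text, start_marker, end_marker="THE END"):
    """Return the text between the header marker line and the last end marker line.

    Hypothetical helper: assumes the end marker sits on a line of its own.
    """
    lines = text.splitlines()
    # index of the first line containing the header marker
    start = next(i for i, line in enumerate(lines) if start_marker in line)
    # index of the last line that is exactly the end marker
    end = max(i for i, line in enumerate(lines) if line.strip() == end_marker)
    # keep everything after the header line, up to and including "THE END"
    return "\n".join(lines[start + 1:end + 1]).strip() + "\n"


# tiny stand-in for the real file, for illustration only
sample = "\n".join([
    "Project Gutenberg header text",
    "*** START OF THIS PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***",
    "",
    "CHAPTER I. Down the Rabbit-Hole",
    "Alice was beginning to get very tired...",
    "THE END",
    "End of Project Gutenberg footer text",
])
book = strip_gutenberg(sample, "*** START OF THIS PROJECT GUTENBERG EBOOK")
print(book)
```

On the real file you would read `wonderland.txt`, pass its contents through the helper, and write the result back out.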

## Develop a Small LSTM Recurrent Neural Network

In this section, we will develop a simple LSTM network to learn sequences of characters from Alice in Wonderland. In the next section, we will use this model to generate new sequences of characters. Let's start by importing the classes and functions we intend to use to train our model.

In [1]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras import utils

Next, we need to load the ASCII text for the book into memory and convert all of the characters to lowercase to reduce the network's vocabulary.

In [2]:
# load ascii text and convert to lowercase
filename = "../../datasets/wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()

Now that the book is loaded, we must prepare the data for modeling by the neural network. We cannot model the characters directly; instead, we must convert the characters to integers. We can do this easily by first creating a set of all of the distinct characters in the book, then creating a map of each character to a unique integer.

In [3]:
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))

For example, the list of unique sorted lowercase characters in the book is as follows:

```
['\n', '\r', ' ', '!', '"', "'", '(', ')', '*', ',', '-', '.', ':', ';', '?', '[', ']',
'_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '\xbb', '\xbf', '\xef']
```

You can see that there are some characters that we could remove to clean up the dataset further. Doing so would reduce the vocabulary and may improve the modeling process. Now that the book has been loaded and the mapping prepared, we can summarize the dataset.
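As a rough illustration of that clean-up, the sketch below keeps only an allowed set of characters. The set itself is an assumption chosen for this example, and `clean_text` is a hypothetical helper rather than part of the tutorial code. The trailing `'\xef'`, `'\xbb'`, `'\xbf'` entries in the list above look like a UTF-8 byte order mark that was read with a single-byte encoding, so the sample string imitates that.

```python
import string

# Characters we choose to keep: an assumption for illustration, adjust as needed.
ALLOWED = set(string.ascii_lowercase + " \n!\"'(),-.:;?")


def clean_text(text, allowed=ALLOWED):
    """Lowercase the text and drop every character outside the allowed set."""
    return "".join(ch for ch in text.lower() if ch in allowed)


# '\xef\xbb\xbf' mimics a UTF-8 byte order mark decoded with a single-byte encoding
sample = "\xef\xbb\xbfALICE was [not] a *bit* hurt_\r\n"
cleaned = clean_text(sample)
print(repr(cleaned))
print(sorted(set(cleaned)))
```

If the stray bytes really are a byte order mark, opening the file with `encoding="utf-8-sig"` is another way to drop them at load time.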

In [4]:
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

Total Characters:  144431
Total Vocab:  46


We can see that the book has just under 150,000 characters and that, when converted to lowercase, there are only 46 distinct characters in the vocabulary for the network to learn in the run shown here, far more than the 26 letters of the alphabet. We now need to define the training data for the network. There is a lot of flexibility in how you break up the text and expose it to the network during training. This tutorial splits the book text into subsequences with a fixed, arbitrarily chosen length of 100 characters. We could just as easily split the data by sentences, padding the shorter sequences and truncating the longer ones.

Each training pattern of the network comprises 100 time steps of one character (X) followed by one character output (y). When creating these sequences, we slide this window along the whole book one character at a time, allowing each character a chance to be learned from the 100 characters that preceded it (except the first 100 characters, of course). For example, if the sequence length is 5 (for simplicity), then the first two training patterns would be as follows:

```
CHAPT -> E
HAPTE -> R
```
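The same windowing can be tried on a short string before running it on the whole book. This is a minimal sketch; `make_patterns` is a hypothetical helper that works on raw characters and leaves out the integer encoding.

```python
def make_patterns(text, seq_length):
    """Slide a window over the text, pairing each input with the next character."""
    pairs = []
    for i in range(len(text) - seq_length):
        pairs.append((text[i:i + seq_length], text[i + seq_length]))
    return pairs


for seq_in, seq_out in make_patterns("CHAPTER", 5):
    print(seq_in, "->", seq_out)
```

A text of `n` characters yields `n - seq_length` patterns, which is why a 7-character string with a window of 5 gives exactly two.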

As we split up the book into these sequences, we convert the characters to integers using the lookup table we prepared earlier.

In [5]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  144331


Running the code to this point shows that when we split the dataset into training data for the network, we have just under 150,000 training patterns. This makes sense: excluding the first 100 characters, we have one training pattern to predict each of the remaining characters.

Now that we have prepared our training data, we need to transform it to be suitable for use with Keras. First, we must transform the list of input sequences into the form [samples, time steps, features] expected by an LSTM network. Next, we need to rescale the integers to the range 0-to-1 to make the patterns easier to learn by the LSTM network, whose gates use the sigmoid activation function by default.

Finally, we need to convert the output patterns (single characters converted to integers) into a one-hot encoding. This is to configure the network to predict the probability of each of the different characters in the vocabulary (an easier representation) rather than trying to force it to predict precisely the next character. Each y value is converted into a sparse vector whose length equals the vocabulary size, full of zeros except for a 1 in the column for the character (integer) that the pattern represents. For example, when n (integer value 31) is one-hot encoded, it looks as follows:

```
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0.]
```
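To make the encoding concrete, here is a NumPy-only sketch of a one-hot encoder. It illustrates the idea and is not the Keras implementation of `to_categorical`; the helper name `one_hot` and the vocabulary size of 47 (the length of the example character list shown earlier) are assumptions for this example.

```python
import numpy as np


def one_hot(indices, num_classes):
    """Return a (len(indices), num_classes) float array with a single 1 per row."""
    indices = np.asarray(indices)
    encoded = np.zeros((indices.size, num_classes), dtype="float32")
    # put a 1 in the column given by each index, one row per index
    encoded[np.arange(indices.size), indices] = 1.0
    return encoded


# 31 is the integer for 'n' in the example above; 47 is an assumed vocabulary size
vector = one_hot([31], 47)[0]
print(vector.shape, int(vector.argmax()), float(vector.sum()))
```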

We can implement these steps as below.

In [6]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))

# normalize
X = X / float(n_vocab)

# one hot encode the output variable
y = utils.to_categorical(dataY)
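A quick way to sanity-check these transformations is to run them on toy data and inspect the resulting shapes. The values below are made up for illustration, and `np.eye` stands in for `utils.to_categorical` so that the sketch runs without Keras.

```python
import numpy as np

# toy stand-ins for dataX / dataY: 3 patterns of 4 time steps, vocabulary of 5
toy_X = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
toy_y = [4, 0, 1]
toy_vocab = 5

# [samples, time steps, features], then scale the integers into the range 0-to-1
X_toy = np.reshape(toy_X, (len(toy_X), 4, 1)) / float(toy_vocab)
# NumPy stand-in for utils.to_categorical
y_toy = np.eye(toy_vocab)[toy_y]

print(X_toy.shape)
print(y_toy.shape)
print(X_toy.min(), X_toy.max())
```

Because the largest integer is `toy_vocab - 1`, dividing by the vocabulary size keeps every input strictly below 1.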

We can now define our LSTM model. Here we define a single hidden LSTM layer with 256 memory units. The network uses dropout with a probability of 20%. The output layer is a Dense layer using the softmax activation function to output a probability between 0 and 1 for each character in the vocabulary. The problem is a single-character classification problem with one class per character and, as such, is defined as optimizing the log loss (cross-entropy), here using the Adam optimization algorithm for speed.

In [7]:
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
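To get a feel for the size of this network, the trainable parameters can be counted by hand. The sketch below assumes the standard LSTM formulation (four gate blocks, each with input weights, recurrent weights, and one bias vector) and the vocabulary size of 46 printed earlier; the helper names are hypothetical. You can compare the numbers with what `model.summary()` reports in your environment.

```python
def lstm_params(units, input_dim):
    """Trainable parameters of a standard LSTM layer with biases (4 gate blocks)."""
    return 4 * (units * (units + input_dim) + units)


def dense_params(inputs, outputs):
    """Weights plus biases of a fully connected layer."""
    return inputs * outputs + outputs


n_vocab = 46  # assumption: the vocabulary size printed earlier in this run
total = lstm_params(256, 1) + dense_params(256, n_vocab)
print(lstm_params(256, 1), dense_params(256, n_vocab), total)
```

The Dropout layer has no trainable parameters, so it does not appear in the count.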

There is no test dataset. We are modeling the entire training dataset to learn the probability of each character in a sequence. We are not interested in the most accurate model (in terms of classification accuracy) of the training dataset, which would be a model that predicts each character in the training dataset perfectly. Instead, we are interested in a generalization of the dataset that minimizes the chosen loss function. We are seeking a balance between generalization and overfitting that stops short of memorization.
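It can help to have a reference point for the loss values reported during training. The sketch below computes the log loss of a single prediction and the loss of a model that guesses uniformly over the vocabulary. The vocabulary size of 46 and the two example losses are taken from the outputs in this notebook; `cross_entropy` is a hypothetical helper, not a Keras function.

```python
import math


def cross_entropy(probabilities, target_index):
    """Log loss for one prediction: minus the log of the probability of the true class."""
    return -math.log(probabilities[target_index])


n_vocab = 46  # assumption: the vocabulary size printed earlier in this run
uniform = [1.0 / n_vocab] * n_vocab
print("uniform guess loss:", round(cross_entropy(uniform, 0), 4))

# exp(loss) can be read as the effective number of equally likely choices
for loss in (2.9664, 1.9277):
    print(loss, "->", round(math.exp(loss), 2), "effective choices")
```

A training loss well below the uniform baseline indicates that the network has learned something about which characters tend to follow which.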

The network is slow to train (about 300 seconds per epoch on an Nvidia K520 GPU). Because of the slowness and our optimization requirements, we will use model checkpointing to record all of the network weights to file each time an improvement in loss is observed at the end of the epoch. We will use the best set of weights (lowest loss) to instantiate our generative model in the next section.

In [8]:
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, 
                             monitor='loss', 
                             verbose=1, 
                             save_best_only=True,
                             mode='min')
callbacks_list = [checkpoint]

We can now fit our model to the data. Here we use a modest number of 20 epochs and a large batch size of 128 patterns.

In [9]:
# fit the model
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20

Epoch 00001: loss improved from inf to 2.96642, saving model to weights-improvement-01-2.9664.hdf5
Epoch 2/20

Epoch 00002: loss improved from 2.96642 to 2.76319, saving model to weights-improvement-02-2.7632.hdf5
Epoch 3/20

Epoch 00003: loss improved from 2.76319 to 2.66228, saving model to weights-improvement-03-2.6623.hdf5
Epoch 4/20

Epoch 00004: loss improved from 2.66228 to 2.58135, saving model to weights-improvement-04-2.5814.hdf5
Epoch 5/20

Epoch 00005: loss improved from 2.58135 to 2.51255, saving model to weights-improvement-05-2.5126.hdf5
Epoch 6/20

Epoch 00006: loss improved from 2.51255 to 2.45334, saving model to weights-improvement-06-2.4533.hdf5
Epoch 7/20

Epoch 00007: loss improved from 2.45334 to 2.40188, saving model to weights-improvement-07-2.4019.hdf5
Epoch 8/20

Epoch 00008: loss improved from 2.40188 to 2.35042, saving model to weights-improvement-08-2.3504.hdf5
Epoch 9/20

Epoch 00009: loss improved from 2.35042 to 2.30523, saving model to weig


You will see different results because of the stochastic nature of the model and because it is hard to fix the random seed for LSTM models to get 100% reproducible results. This is not a concern for this generative model. After running the example, you should have a number of weight checkpoint files in the local directory. You can delete them all except the one with the smallest loss value. For example, below was the checkpoint with the smallest loss we achieved when we ran this example.

`weights-improvement-20-1.9277.hdf5`
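Picking the best checkpoint can also be done in code by parsing the loss out of each filename. This is a minimal sketch that assumes the naming pattern used by the checkpoint callback in this tutorial; `best_checkpoint` is a hypothetical helper.

```python
import re

# matches names like weights-improvement-20-1.9277.hdf5 (optionally ending in -bigger)
PATTERN = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)(?:-bigger)?\.hdf5$")


def best_checkpoint(filenames):
    """Return the filename with the lowest loss encoded in its name, or None."""
    scored = []
    for name in filenames:
        match = PATTERN.search(name)
        if match:
            scored.append((float(match.group(2)), name))
    return min(scored)[1] if scored else None


names = [
    "weights-improvement-01-2.9664.hdf5",
    "weights-improvement-02-2.7632.hdf5",
    "weights-improvement-20-1.9277.hdf5",
    "notes.txt",
]
print(best_checkpoint(names))
```

In practice the list of names could come from `glob.glob("weights-improvement-*.hdf5")` in the directory where training was run.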

The network loss decreased almost every epoch, and I expect the network could benefit from training for many more epochs. In the next section, we will look at using this model to generate new text sequences.

## Generating Text with an LSTM Network

Generating text using the trained LSTM network is relatively straightforward. Firstly, we load the data and define the network in the same way, except the network weights are loaded from a checkpoint file, and the network does not need to be trained.

**Note:** It seems that you might need to use the same machine/environment to generate text as was used to create the network weights (e.g., GPUs or CPUs); otherwise, the network might generate garbage.

In [10]:
# load the network weights
filename = "weights-improvement-20-1.9277.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

When preparing the mapping of unique characters to integers, we must also create a reverse mapping that we can use to convert the integers back to characters to understand the predictions.

In [11]:
int_to_char = dict((i, c) for i, c in enumerate(chars))
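The two mappings are inverses of each other, which is easy to confirm on a toy string. The `toy_` names below are used only for this illustration so that they do not overwrite the mappings built from the book.

```python
toy_text = "alice"
toy_chars = sorted(set(toy_text))
toy_char_to_int = {c: i for i, c in enumerate(toy_chars)}
toy_int_to_char = {i: c for i, c in enumerate(toy_chars)}

# encode to integers, then decode back to characters
encoded = [toy_char_to_int[c] for c in toy_text]
decoded = "".join(toy_int_to_char[i] for i in encoded)
print(toy_chars, encoded, decoded)
```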

Finally, we need to make predictions. The simplest way to use the Keras LSTM model to make predictions is to start with a seed sequence as input, generate the next character, and then update the seed sequence to add the generated character to the end and trim off the first character. This process is repeated for as long as we want to predict new characters (e.g., a sequence of 1,000 characters in length). We can pick a random input pattern as our seed sequence, then print generated characters as we generate them.

In [12]:
import sys

# pick a random seed
start = np.random.randint(0, len(dataX))  # the upper bound is exclusive
pattern = list(dataX[start])  # copy so the training data is not modified
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

print("\nGenerated:")
# generate characters
for i in range(1000):
    # shape and scale the current window exactly as the training inputs were
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # greedy decoding: take the most probable next character
    index = int(np.argmax(prediction))
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the new character and drop the oldest
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")

Seed:
"  baby violently up and down, and the poor little thing howled so,
that alice could hardly hear the w "

Generated:
hite tablid thet sae oo the thitg, and she west on wirh a sorp of thicg so the thate, and she west on wirh a sore of thicg so the thate, and she west on wirh a sore of thicg so the thate, and she west on whrh a lott oi the seatin of the sabli, and she sest hn whlh a little toieted  and saed to the gorphon. 
'ie  you soonts ' said the manch hare.

'ie ion't hate io wou taan ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said the manch hare.

'ie ion't sate thing ' said

Running this example first outputs the selected random seed, then each character as it is generated. For example, below are the results from one run of this text generator. The random seed was:

```
baby violently up and down, and the poor little thing howled so,
that alice could hardly hear the w
```

The generated text with the random seed was:

```
hite tablid thet sae oo the thitg, and she west on wirh a sorp of thicg so the thate, and she west on wirh a sore of thicg so the thate, and she west on wirh a sore of thicg so the thate, and she west on whrh a lott oi the seatin of the sabli, and she sest hn whlh a little toieted  and saed to the gorphon. 
'ie  you soonts ' said the manch hare.

'ie ion't hate io wou taan ' said the manch hare
```

We can note some observations about the generated text.

* It generally conforms to the line format observed in the original text of fewer than 80 characters before a new line.
* The characters are separated into word-like groups, and most groups are English words (e.g., the, little, and was), but many are not (e.g., lott, tiie, and taede).
* Some of the words in sequence make sense (e.g., and the white rabbit), but many do not (e.g., wese tilel).

The fact that this character-based model of the book produces output like this is very impressive. It gives you a sense of the learning capabilities of LSTM networks. The results are not perfect. In the next section, we look at improving the quality of results by developing a much larger LSTM network.

## Larger LSTM Recurrent Neural Network

The results in the previous section were reasonable but not excellent. We can try to improve the quality of the generated text by creating a much larger network. We will keep the number of memory units the same at 256 but add a second layer.

```
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

We will also change the filename of the checkpointed weights so that we can tell the weights for this network apart from those of the previous one (by appending the word bigger to the filename).

`filepath="weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"`

Finally, we will increase the training epochs from 20 to 50 and decrease the batch size from 128 to 64 to give the network more of an opportunity to be updated and learn. The complete code listing is presented below for completeness.

In [13]:
# Larger LSTM Network to Generate Text for Alice in Wonderland
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras import utils

# load ascii text and convert to lowercase
filename = "../../datasets/wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()

# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))

# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))

# normalize
X = X / float(n_vocab)

# one hot encode the output variable
y = utils.to_categorical(dataY)

# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = ModelCheckpoint(filepath,
                             monitor='loss',
                             verbose=1,
                             save_best_only=True,
                             mode='min')
callbacks_list = [checkpoint]

# fit the model
model.fit(X, y, epochs=50, batch_size=64, callbacks=callbacks_list)

Total Characters:  144431
Total Vocab:  46
Total Patterns:  144331
Epoch 1/50

Epoch 00001: loss improved from inf to 2.78978, saving model to weights-improvement-01-2.7898-bigger.hdf5
Epoch 2/50

Epoch 00002: loss improved from 2.78978 to 2.43368, saving model to weights-improvement-02-2.4337-bigger.hdf5
Epoch 3/50

Epoch 00003: loss improved from 2.43368 to 2.23046, saving model to weights-improvement-03-2.2305-bigger.hdf5
Epoch 4/50

Epoch 00004: loss improved from 2.23046 to 2.09418, saving model to weights-improvement-04-2.0942-bigger.hdf5
Epoch 5/50

Epoch 00005: loss improved from 2.09418 to 1.99630, saving model to weights-improvement-05-1.9963-bigger.hdf5
Epoch 6/50

Epoch 00006: loss improved from 1.99630 to 1.91818, saving model to weights-improvement-06-1.9182-bigger.hdf5
Epoch 7/50

Epoch 00007: loss improved from 1.91818 to 1.85730, saving model to weights-improvement-07-1.8573-bigger.hdf5
Epoch 8/50

Epoch 00008: loss improved from 1.85730 to 1.80349, saving model to wei


Epoch 00042: loss improved from 1.25390 to 1.25174, saving model to weights-improvement-42-1.2517-bigger.hdf5
Epoch 43/50

Epoch 00043: loss improved from 1.25174 to 1.24582, saving model to weights-improvement-43-1.2458-bigger.hdf5
Epoch 44/50

Epoch 00044: loss improved from 1.24582 to 1.24230, saving model to weights-improvement-44-1.2423-bigger.hdf5
Epoch 45/50

Epoch 00045: loss did not improve from 1.24230
Epoch 46/50

Epoch 00046: loss improved from 1.24230 to 1.23469, saving model to weights-improvement-46-1.2347-bigger.hdf5
Epoch 47/50

Epoch 00047: loss improved from 1.23469 to 1.22756, saving model to weights-improvement-47-1.2276-bigger.hdf5
Epoch 48/50

Epoch 00048: loss did not improve from 1.22756
Epoch 49/50

Epoch 00049: loss did not improve from 1.22756
Epoch 50/50

Epoch 00050: loss improved from 1.22756 to 1.22594, saving model to weights-improvement-50-1.2259-bigger.hdf5



Running this example takes some time, at least 700 seconds per epoch. After running this example, you may achieve a loss of about 1.2. For example, the best result I achieved from running this model was stored in a checkpoint file with the name:

`weights-improvement-50-1.2259-bigger.hdf5`

This checkpoint records a loss of 1.2259 at epoch 50. As in the previous section, we can use this best model from the run to generate text. The only changes we need to make to the text generation script from the previous section are the specification of the network topology and the file from which to load the network weights. The complete code listing is provided below for completeness.

In [20]:
# Load Larger LSTM network and generate text
import sys
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras import utils

# load ascii text and convert to lowercase
filename = "../../datasets/wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()

# create mapping of unique chars to integers, and a reverse mapping
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))

# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))

# normalize
X = X / float(n_vocab)

# one hot encode the output variable
y = utils.to_categorical(dataY)

# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))

# load the network weights
filename = "weights-improvement-50-1.2259-bigger.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')

# pick a random seed
start = np.random.randint(0, len(dataX))  # the upper bound is exclusive
pattern = list(dataX[start])  # copy so the training data is not modified
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

# generate characters
print("\nGenerated:")
for i in range(1000):
    # shape and scale the current window exactly as the training inputs were
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    # greedy decoding: take the most probable next character
    index = int(np.argmax(prediction))
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the new character and drop the oldest
    pattern.append(index)
    pattern = pattern[1:]
print("\nDone.")

Total Characters:  144431
Total Vocab:  46
Total Patterns:  144331
Seed:
" or?"'

'she boxed the queen's ears--' the rabbit began. alice gave a little
scream of laughter. 'oh, "

Generated:
 downd you tell me,' she said to herself, as she spoke as the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the p

One example of running this text generation script produces the output below. The randomly chosen seed text was:

```
" or?"'

'she boxed the queen's ears--' the rabbit began. alice gave a little
scream of laughter. 'oh, "
```

The generated text with the seed was:

```
 downd you tell me,' she said to herself, as she spoke as the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind and looked at the mock turtle, and the pueen say on tit off, and the pueen say on lind
```

We can see that generally, there are fewer spelling mistakes, and the text looks more realistic but is still quite nonsensical. For example, the same phrases get repeated again and again, like said to herself and little. Quotes are opened but not closed. These are better results but there is still much room for improvement.

## Extension Ideas to Improve the Model

Below is a sample of ideas that you may want to investigate to improve the model further:

* Predict fewer than 1,000 characters as output for a given seed.
* Remove all punctuation from the source text, and therefore from the model's vocabulary.
* Try a one-hot encoding for the input sequences.
* Train the model on padded sentences rather than random sequences of characters.
* Increase the number of training epochs to 100 or many hundreds.
* Add dropout to the visible input layer and consider tuning the dropout percentage.
* Tune the batch size, try a batch size of 1 as a (very slow) baseline, and larger sizes from there.
* Add more memory units to the layers and/or more layers.
* Experiment with scale factors (temperature) when interpreting the prediction probabilities.
* Change the LSTM layers to be stateful to maintain state across batches.
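
One of the ideas above, scaling the prediction probabilities with a temperature, can be sketched as follows. This is an illustration under stated assumptions rather than a tuned implementation: `sample_with_temperature` is a hypothetical helper, and the probability vector is made up. Temperatures below 1 sharpen the distribution toward the most likely character, and temperatures above 1 flatten it.

```python
import numpy as np


def sample_with_temperature(probabilities, temperature=1.0, rng=None):
    """Rescale a probability vector by a temperature and sample one index from it."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probabilities, dtype="float64")
    # work in log space; the small constant avoids log(0)
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled)), scaled


probs = [0.6, 0.3, 0.1]  # made-up prediction over three characters
rng = np.random.default_rng(0)
for temperature in (0.2, 1.0, 2.0):
    index, scaled = sample_with_temperature(probs, temperature, rng)
    print(temperature, np.round(scaled, 3), index)
```

In the generation loop, the `np.argmax(prediction)` call could be swapped for something like `sample_with_temperature(prediction[0], 0.5)[0]`, which may reduce the repeated phrases produced by always taking the most likely character.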

## Summary

In this project, you discovered how to develop an LSTM recurrent neural network for text generation in Python with the Keras deep learning library. After completing this project, you know:

* Where to download the ASCII text for classical books for free that you can use for training.
* How to train an LSTM network on text sequences and use the trained network to generate new sequences.
* How to develop stacked LSTM networks and lift the performance of the model.