# Text Generation with LSTM

Recurrent neural networks are also known for their ability to generate text.  As a result, the output of the neural network can be free-form text.  In this section, we will see how to train an LSTM on a textual document, such as classic literature, so that it learns to output new text that appears to be of the same form as the training material.  If you train your LSTM on [Shakespeare](https://en.wikipedia.org/wiki/William_Shakespeare), it will learn to produce new text similar to what Shakespeare wrote.

Don't get your hopes up.  You are not going to teach your deep neural network to write the next winner of the [Pulitzer Prize for Fiction](https://en.wikipedia.org/wiki/Pulitzer_Prize_for_Fiction).  The prose generated by your neural network will be nonsensical.  However, it will usually be nearly grammatical and similar in style to the source training documents.

A neural network generating nonsensical text based on literature may not seem useful at first glance.  However, this technology attracts so much interest because the ability to output free-form text forms the foundation for many more advanced technologies.  The fact that the LSTM will typically learn human grammar from the source document opens a wide range of possibilities.  For example, you can use similar technology to complete sentences as a user enters text.  In the next part, we will use this technique to create a neural network that can write captions for images to describe what is going on in the picture.

### Additional Information

The following are some of the articles that I found useful in putting this section together.

* [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
* [Keras LSTM Generation Example](https://keras.io/examples/lstm_text_generation/)

### Character-Level Text Generation

There are several different approaches to teaching a neural network to output free-form text.  The most basic question is whether you wish the neural network to learn at the word or character level.  In many ways, learning at the character level is the more interesting of the two, because the LSTM learns to construct its own words without ever being shown what a word is.  We will begin with character-level text generation.  In the next module, we will use nearly the same technique at the word level to implement automatic captioning.

We begin by importing the needed Python packages.  A later cell defines the sequence length, named **maxlen**.  Time-series neural networks typically accept their input as a fixed-length array.  Because you might not use all of the sequence elements, it is common to fill extra elements with zeros.  You will divide the text into sequences of this length, and the neural network will train to predict what comes after each sequence.

In [2]:
from tensorflow.keras.callbacks import LambdaCallback
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import get_file
import numpy as np
import random
import sys
import io
import requests
import re

For this simple example, we will train the neural network on the classic children's book [Treasure Island](https://en.wikipedia.org/wiki/Treasure_Island).  We begin by loading this text into a Python string and displaying the first 1,000 characters.

In [3]:
r = requests.get("https://data.heatonresearch.com/data/t81-558/text/"\
                 "treasure_island.txt")
raw_text = r.text
print(raw_text[0:1000])


The Project Gutenberg EBook of Treasure Island, by Robert Louis Stevenson

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.net


Title: Treasure Island

Author: Robert Louis Stevenson

Illustrator: Milo Winter

Release Date: January 12, 2009 [EBook #27780]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK TREASURE ISLAND ***




Produced by Juliet Sutherland, Stephen Blundell and the
Online Distributed Proofreading Team at http://www.pgdp.net









 THE ILLUSTRATED CHILDREN'S LIBRARY


         _Treasure Island_

       Robert Louis Stevenson

          _Illustrated by_
            Milo Winter


           [Illustration]


           GRAMERCY BOOKS
              NEW YORK




 Foreword copyright © 1986 by Random House V


We first convert the text to lowercase and remove any non-ASCII characters.  We then extract all unique characters from the text and sort them.  This technique allows us to assign a unique ID to each character.  Because we sort the characters, the IDs are the same every time we process the same text; adding new characters to the original text could change them.  We build two dictionaries.  The first, **char_indices**, converts a character into its ID.  The second, **indices_char**, converts an ID back into its character.

In [4]:
processed_text = raw_text.lower()
processed_text = re.sub(r'[^\x00-\x7f]',r'', processed_text) 

In [5]:
print('corpus length:', len(processed_text))

chars = sorted(list(set(processed_text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

corpus length: 397400
total chars: 60
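
The two dictionaries can be tried out on a much smaller string.  The following is a minimal sketch, assuming a toy text in place of the full corpus; the `toy_` names are hypothetical and are not part of the notebook.

```python
# Toy text standing in for the full corpus (an assumption for illustration).
toy_text = "treasure island"

# Sorting the unique characters makes the ID assignment reproducible.
toy_chars = sorted(set(toy_text))
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for i, c in enumerate(toy_chars)}

# Encode a word as a list of IDs, then decode the IDs back into text.
encoded = [toy_char_indices[c] for c in "island"]
decoded = "".join(toy_indices_char[i] for i in encoded)

print(toy_chars)
print(encoded)
print(decoded)
```

Decoding the encoded IDs should return the original word, which is a quick way to check that the two dictionaries are inverses of each other.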


We are now ready to build the actual sequences.  Just like previous neural networks, there will be an $x$ and $y$.  For this LSTM, each $x$ is a sequence of **maxlen** characters, and the corresponding $y$ is the single character that follows that sequence.  The following code slides a window of **maxlen** characters across the text, advancing **step** characters at a time, so that consecutive sequences overlap.

In [6]:
# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(processed_text) - maxlen, step):
    sentences.append(processed_text[i: i + maxlen])
    next_chars.append(processed_text[i + maxlen])
print('nb sequences:', len(sentences))

nb sequences: 132454
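
To see what the windowing loop produces, it may help to run the same logic on a short string.  This is a sketch under assumed toy values (`toy_text`, `toy_maxlen`, and `toy_step` are hypothetical names chosen for illustration); the loop itself mirrors the cell above.

```python
toy_text = "hello world"
toy_maxlen = 4   # window length (the notebook uses 40)
toy_step = 3     # how far the window advances each time

windows = []     # input sequences (x)
targets = []     # the character that follows each window (y)
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    windows.append(toy_text[i: i + toy_maxlen])
    targets.append(toy_text[i + toy_maxlen])

# Show each window next to the character the network should predict.
for window, target in zip(windows, targets):
    print(repr(window), '->', repr(target))
```

Because the step is smaller than the window length, neighboring windows share characters, which is why the original comment calls the sequences semi-redundant.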


In [7]:
sentences

['the project gutenberg ebook of treasure ',
 ' project gutenberg ebook of treasure isl',
 'oject gutenberg ebook of treasure island',
 'ct gutenberg ebook of treasure island, b',
 'gutenberg ebook of treasure island, by r',
 'enberg ebook of treasure island, by robe',
 'erg ebook of treasure island, by robert ',
 ' ebook of treasure island, by robert lou',
 'ook of treasure island, by robert louis ',
 ' of treasure island, by robert louis ste',
 ' treasure island, by robert louis steven',
 'easure island, by robert louis stevenson',
 'ure island, by robert louis stevenson\r\n\r',
 ' island, by robert louis stevenson\r\n\r\nth',
 'land, by robert louis stevenson\r\n\r\nthis ',
 'd, by robert louis stevenson\r\n\r\nthis ebo',
 'by robert louis stevenson\r\n\r\nthis ebook ',
 'robert louis stevenson\r\n\r\nthis ebook is ',
 'ert louis stevenson\r\n\r\nthis ebook is for',
 ' louis stevenson\r\n\r\nthis ebook is for th',
 'uis stevenson\r\n\r\nthis ebook is for the u',
 ' stevenson\r\n\r\n

In [8]:
print('Vectorization...')
# Use the built-in bool; the np.bool alias was removed in newer NumPy releases.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Vectorization...


In [9]:
x.shape

(132454, 40, 60)

In [10]:
y.shape

(132454, 60)
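
The shapes above follow from one-hot encoding every character.  The sketch below repeats the vectorization on a hypothetical three-character alphabet with two sequences of length 2, so the resulting arrays are small enough to read; all `toy_` names are assumptions made for this illustration.

```python
import numpy as np

toy_chars = ['a', 'b', 'c']
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_sentences = ['ab', 'bc']   # input sequences
toy_next_chars = ['c', 'a']    # character that follows each sequence

# x: (number of sequences, sequence length, alphabet size)
toy_x = np.zeros((len(toy_sentences), 2, len(toy_chars)), dtype=bool)
# y: (number of sequences, alphabet size)
toy_y = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)

for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        toy_x[i, t, toy_char_indices[char]] = True
    toy_y[i, toy_char_indices[toy_next_chars[i]]] = True

print(toy_x.astype(int))
print(toy_y.astype(int))
```

Each row of `toy_x` and `toy_y` contains exactly one `1`, in the column of the character it represents.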

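The dummy variables (one-hot encoding) for the first rows of $y$ are shown below.  Each row contains a single True value, in the position of the character that follows the sequence.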

In [11]:
y[0:10]

array([[False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False,  True, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False,  True, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False, False, False, False,
        False, False, False, False, False, False],
       [False, False, False, False, False, Fal

Next, we create the neural network.  This neural network's primary feature is the LSTM layer, which allows the sequences to be processed.  

In [12]:
# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

# learning_rate replaces the deprecated lr argument.
optimizer = RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

Build model...


In [13]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm (LSTM)                  (None, 128)               96768     
_________________________________________________________________
dense (Dense)                (None, 60)                7740      
Total params: 104,508
Trainable params: 104,508
Non-trainable params: 0
_________________________________________________________________
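
The parameter counts in the summary can be reproduced by hand.  The sketch below assumes the standard LSTM layout of four weight sets (input, forget, and output gates plus the candidate cell state), each with input weights, recurrent weights, and a bias; the helper function names are hypothetical.

```python
def lstm_param_count(units, input_dim):
    # Four weight sets, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(inputs, outputs):
    # One weight per input-output pair, plus one bias per output.
    return inputs * outputs + outputs

lstm_params = lstm_param_count(units=128, input_dim=60)
dense_params = dense_param_count(inputs=128, outputs=60)

print(lstm_params)                  # LSTM layer
print(dense_params)                 # Dense layer
print(lstm_params + dense_params)   # Total
```

With 128 LSTM units and 60 distinct characters, these formulas give 96,768 and 7,740 parameters, matching the summary's total of 104,508.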


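The LSTM will produce new text character by character.  Each time, we need to sample the next letter from the LSTM predictions.  The **sample** function accepts the following two parameters:

* **preds** - The values of the output neurons, one probability per character.
* **temperature** - Controls the randomness of the sampling.  Values near 0.0 are the most conservative and nearly always choose the most likely character.  Higher values, such as 1.0 or above, produce more varied text but are more willing to make spelling and other errors.  The value must be greater than zero because the function divides by it.

The sample function below essentially re-applies a softmax to the neural network predictions after dividing their logarithms by the temperature.  The result is again a probability for each letter, from which the function randomly draws the next character.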

In [14]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
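
To get a feel for the temperature parameter, the rescaling step can be isolated from the random draw.  The following is a minimal sketch: `rescale` is a hypothetical helper that repeats the arithmetic of `sample` without sampling, and `toy_preds` is an assumed three-character probability vector.

```python
import numpy as np

def rescale(preds, temperature):
    # Same arithmetic as sample(), without the random draw at the end.
    logits = np.log(np.asarray(preds, dtype='float64')) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)

toy_preds = [0.5, 0.3, 0.2]  # hypothetical network output for 3 characters

for temperature in [0.2, 1.0, 1.2]:
    print(temperature, np.round(rescale(toy_preds, temperature), 3))
```

A temperature of 1.0 leaves the probabilities unchanged.  Lower temperatures concentrate the probability on the most likely character, and higher temperatures spread it more evenly across the alternatives.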

Keras calls the following function at the end of each training epoch.  The code generates sample text at several temperatures, which shows visually how well the neural network is learning to generate text.  As the neural network trains, the generated text should look more realistic.

In [15]:
def on_epoch_end(epoch, _):
    # Function invoked at end of each epoch. Prints generated text.
    print("******************************************************")
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(processed_text) - maxlen - 1)
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('----- temperature:', temperature)

        generated = ''
        sentence = processed_text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
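
The core of the generation loop is the sliding window: each predicted character is appended to the input, and the oldest character is dropped so that the input always stays **maxlen** characters long.  The sketch below shows only that mechanism.  `fake_next_char` is a hypothetical stand-in for the calls to `model.predict` and `sample`, so no trained network is needed to run it.

```python
toy_maxlen = 5
seed = "hello"   # assumed seed text of exactly toy_maxlen characters

def fake_next_char(window):
    # Hypothetical stand-in for model.predict + sample:
    # it simply repeats the first character of the current window.
    return window[0]

sentence = seed      # the window fed to the "model"
generated = seed     # everything produced so far
for _ in range(4):
    next_char = fake_next_char(sentence)
    generated += next_char
    # Drop the oldest character and append the newest one.
    sentence = sentence[1:] + next_char

print(generated)
print(sentence)
```

The window length never changes, which is what allows every prediction to use the same fixed-size input array.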


We are now ready to train.  It can take up to an hour to train this network, depending on how fast your computer is.  If you have a GPU available, please make sure to use it.

In [16]:
# Ignore useless W0819 warnings generated by TensorFlow 2.0.  Hopefully can remove this ignore in the future.
# See https://github.com/tensorflow/tensorflow/issues/31308
import logging, os
logging.disable(logging.WARNING)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

# Fit the model
print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])

Train on 132454 samples
Epoch 1/60
   128/132454 [..............................] - ETA: 35:39******************************************************
----- Generating text after Epoch: 0
----- temperature: 0.2
----- Generating with seed: "im shouting.

but you may suppose i pa"
im shouting.

but you may suppose i pa

UnknownError:  [_Derived_]  Fail to find the dnn implementation.
	 [[{{node CudnnRNN}}]]
	 [[sequential/lstm/StatefulPartitionedCall]] [Op:__inference_distributed_function_3394]

Function call stack:
distributed_function -> distributed_function -> distributed_function
