# Text generation with LSTM

### Implementing character-level LSTM text generation

In [1]:
import keras

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt


In [4]:
with open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('Corpus length:', len(text))

Corpus length: 600893


Next, we'll extract partially overlapping sequences of length `maxlen`, one-hot encode them, and pack them into a 3D NumPy array `x` of shape `(sequences, maxlen, unique_characters)`. Simultaneously, we'll prepare an array `y` containing the corresponding targets: the one-hot-encoded characters that come after each extracted sequence.

In [9]:
import numpy as np

maxlen = 60 # Sequences of 60 characters
step = 3 # Sample a new sequence every 3 characters
sentences = [] # Holds the extracted sequences
next_chars = [] # Holds the targets

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
    
print('Number of sequences', len(sentences))

chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
char_indices = {c:i for i, c in enumerate(chars)}

print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)))
y = np.zeros((len(sentences), len(chars)))
for i, sentence in enumerate(sentences):
    for j, c in enumerate(sentence):
        x[i, j, char_indices[c]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences 200278
Unique characters: 57
Vectorization...
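
To make the windowing concrete, here is a minimal sketch of the same extraction loop on a toy string. The names `toy_text`, `toy_maxlen`, and `toy_step` are hypothetical and much smaller than the values used above; the point is only to show how consecutive windows overlap and which character becomes each target.

```python
# Toy illustration of the sliding-window extraction (hypothetical values,
# far smaller than the maxlen=60 / step=3 used on the real corpus).
toy_text = 'hello world'
toy_maxlen = 4   # Window length
toy_step = 3     # Start a new window every 3 characters

toy_sentences = []   # Extracted windows
toy_next_chars = []  # Character that follows each window

for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i: i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

for window, target in zip(toy_sentences, toy_next_chars):
    print(repr(window), '->', repr(target))
```

Because the step is smaller than the window length, each window shares its last character with the start of the next one, which is what "partially overlapping" means here.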


### Building the network

In [13]:
from keras import layers

model = keras.models.Sequential([
    layers.LSTM(128, input_shape=x.shape[1:]),
    layers.Dense(len(chars), activation='softmax')
])

optimizer = keras.optimizers.RMSprop(lr=.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
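
As a rough sanity check on the size of this network, the parameter counts can be worked out by hand. This sketch assumes the standard LSTM parameterization (four gates, each with an input kernel, a recurrent kernel, and a bias) and the 57 unique characters reported above; the variable names are illustrative only.

```python
# Hand-computed parameter counts for the model above (illustrative names).
units = 128      # LSTM units, as in layers.LSTM(128, ...)
input_dim = 57   # Unique characters reported by the vectorization cell

# Each of the 4 LSTM gates has an input kernel, a recurrent kernel, and a bias.
lstm_params = 4 * (input_dim * units + units * units + units)

# The Dense layer maps the 128-dimensional LSTM output to one score per character.
dense_params = units * input_dim + input_dim

print('LSTM parameters:', lstm_params)
print('Dense parameters:', dense_params)
print('Total parameters:', lstm_params + dense_params)
```

If the model is built as shown, `model.summary()` should report matching totals.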

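### Training the language model and sampling from it

Given a trained model and a seed text snippet, we generate new text by repeating the following steps: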

1. Draw from the model a probability distribution for the next character, given the generated text available so far.
2. Reweight the distribution to a certain temperature.
3. Sample the next character at random according to the reweighted distribution.
4. Add the new character at the end of the available text.

In [25]:
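# Reweight the predicted distribution by a softmax temperature,
# then sample one character index from it
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature # Lower temperature sharpens the distribution
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds) # Renormalize so the probabilities sum to 1
    probas = np.random.multinomial(1, preds, 1) # One draw, returned as a one-hot vector
    return np.argmax(probas)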
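
To see what the temperature does on its own, the sketch below applies the same reweighting to a small, made-up distribution without the sampling step. The helper `reweight` and the values in `toy_preds` are hypothetical and exist only for illustration.

```python
import numpy as np

def reweight(preds, temperature=1.0):
    """Rescale a probability distribution by a temperature and renormalize."""
    preds = np.asarray(preds).astype('float64')
    scaled = np.exp(np.log(preds) / temperature)
    return scaled / np.sum(scaled)

toy_preds = [0.6, 0.3, 0.1]  # Hypothetical next-character probabilities

for temp in [0.2, 0.5, 1.0, 1.2]:
    print(temp, np.round(reweight(toy_preds, temp), 3))
```

Temperatures below 1 concentrate probability on the most likely character, which makes the output more repetitive and predictable. A temperature of 1 leaves the distribution unchanged, and temperatures above 1 flatten it, which makes the output more surprising.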

Finally, the following loop repeatedly trains the model for one epoch and then generates text at several temperatures.

In [27]:
import random 
import sys

for epoch in range(1, 60):
    print('Epoch', epoch)
    model.fit(x, y, batch_size=128, epochs=1)
    start_index = random.randint(0, len(text) - maxlen - 1)
    seed_text = text[start_index: start_index + maxlen]
    
    print('--- Generating text with seed: "{}"'.format(seed_text))
    
    for temp in [.2, .5, 1.0, 1.2]:
        print('------ temperature:', temp)
        generated_text = seed_text
        sys.stdout.write(generated_text)
        for i in range(400): # Generate the next 400 characters
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1
            
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temp)
            next_char = chars[next_index]
            
            generated_text += next_char
            generated_text = generated_text[1:]
            sys.stdout.write(next_char)

Epoch 1
Epoch 1/1
--- Generating text with seed: " are
some among them who can let no day slip past them witho"
------ temperature: 0.2
 are
some among them who can let no day slip past them without the stronger, the strongess and the stronger, the stronger, the fact of the most comess and the stronger of the profoundly the stronger, the predionce of the destracted the greater the sense of the destracted the strongess and the strength and the stronger, and the stronger, and also the stronger, and in the greatest the strongess of the such a the most comessed the stronger, the sense of the s------ temperature: 0.5
 are
some among them who can let no day slip past them without the bartance in the most be at the patent, when in the summer, but the example of the contempt the sunition, which the stepted and life and things and the dimn all the most commanity and the most immediated and self-still of the soul and
stronger, the dignor, his own teaches of the delicate of the primitions, is met

KeyboardInterrupt: 