In [1]:
import keras
keras.__version__

Using TensorFlow backend.


'2.2.4'

# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas into practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files: Wikipedia, *The Lord of the Rings*, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th-century German philosopher (translated into English). The language model 
we will learn will thus specifically model Nietzsche's writing style and topics of choice, rather than the English language in general.
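A language model, in the sense used here, assigns a probability to the next character given the characters that precede it. As a point of comparison only, here is a minimal count-based sketch that conditions on just one previous character (a bigram model). This is a deliberately simpler technique than the LSTM built below, and the toy corpus string is made up for illustration:

```python
from collections import Counter, defaultdict


def bigram_model(text):
    """Estimate P(next_char | current_char) by counting adjacent character pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        counts[current][following] += 1
    model = {}
    for current, counter in counts.items():
        total = sum(counter.values())
        # Normalize the counts into a probability distribution over next characters
        model[current] = {ch: n / total for ch, n in counter.items()}
    return model


# Toy corpus (illustrative only, not the Nietzsche text)
corpus = "the will to power. the will to truth."
model = bigram_model(corpus)
print(model["t"])  # 'h' follows 't' in 3 of its 6 occurrences
```

The LSTM replaces these raw counts with a learned function of the previous `maxlen` characters, which lets it capture much longer-range structure.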

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [2]:
import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
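# Read the corpus as UTF-8 (it contains a few non-ASCII characters) and lowercase it
with open(path, encoding='utf-8') as f:
    text = f.read().lower()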
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893



Next, we extract partially overlapping sequences of length `maxlen`, one-hot encode them, and pack them into a 3D NumPy array `x` of 
shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare an array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.
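To make the windowing concrete, here is the same extraction applied to a tiny string. The values `text = "hello world"`, `maxlen = 4` and `step = 3` are assumptions chosen so the result fits on one line; the notebook itself uses `maxlen = 60` and `step = 3` on the full corpus:

```python
import numpy as np

# Toy settings (assumed for illustration only)
text = "hello world"
maxlen = 4
step = 3

sentences, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])  # input window of maxlen characters
    next_chars.append(text[i + maxlen])    # the character right after the window

chars = sorted(set(text))
char_indices = {char: i for i, char in enumerate(chars)}

# One-hot encode: x is (sequences, maxlen, unique_characters), y is (sequences, unique_characters)
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = True
    y[i, char_indices[next_chars[i]]] = True

print(sentences)         # ['hell', 'lo w', 'worl']
print(next_chars)        # ['o', 'o', 'd']
print(x.shape, y.shape)  # (3, 4, 8) (3, 8)
```

With `step` smaller than `maxlen`, consecutive windows share characters, which is what "partially overlapping" refers to.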

In [3]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = dict((char, i) for i, char in enumerate(chars))

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# Use the builtin `bool`: the `np.bool` alias is deprecated and was removed in NumPy 1.24
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier with a softmax over all possible characters. Note, however, that 
recurrent neural networks are not the only way to generate sequence data; 1D convnets have also proven very successful at it in 
recent years.

In [4]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
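As a quick sanity check on model size, the parameter counts can be worked out by hand. This sketch assumes the standard Keras LSTM parameterization (four gates, each with an input kernel, a recurrent kernel and a bias) and the 57 unique characters reported above; you can compare the result against `model.summary()`:

```python
# Parameter count for the model above (assumes 57 unique characters and 128 LSTM units)
num_chars = 57
units = 128

# Each of the 4 gates has an input kernel (num_chars x units),
# a recurrent kernel (units x units) and a bias (units)
lstm_params = 4 * (units * (num_chars + units) + units)

# Dense layer: kernel (units x num_chars) plus bias (num_chars)
dense_params = units * num_chars + num_chars

print(lstm_params)                 # 95232
print(dense_params)                # 7353
print(lstm_params + dense_params)  # 102585
```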

Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:
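With one-hot targets, categorical crossentropy reduces to the negative log of the probability the model assigns to the true next character, averaged over samples. A minimal NumPy sketch on a made-up 3-character vocabulary (the clipping constant `eps` is an assumption added to avoid `log(0)`):

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


# Two samples over a toy 3-character vocabulary (hypothetical values)
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.5, 0.25, 0.25]])

loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))  # 0.4581, i.e. (-log(0.8) - log(0.5)) / 2
```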

In [5]:
# `lr` is the argument name in Keras 2.2.x; newer Keras versions name it `learning_rate`
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

1. Drawing from the model a probability distribution over the next character, given the text available so far
2. Reweighting the distribution to a certain "temperature"
3. Sampling the next character at random according to the reweighted distribution
4. Adding the new character at the end of the available text
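The four steps above can be sketched independently of Keras. Here `toy_predict` is a hypothetical stand-in for `model.predict` (it simply favors repeating the last character), and the three-character vocabulary and `maxlen=4` are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["a", "b", " "]  # hypothetical toy vocabulary


def toy_predict(window):
    """Hypothetical stand-in for model.predict: favors repeating the last character."""
    probs = np.full(len(vocab), 0.1)
    probs[vocab.index(window[-1])] = 0.8
    return probs / probs.sum()


def reweight(probs, temperature):
    """Temperature reweighting: softmax(log(p) / T), shifted for numerical stability."""
    logits = np.log(probs) / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def generate(seed, length, temperature, maxlen=4):
    text = seed
    for _ in range(length):
        window = text[-maxlen:]                # the last maxlen characters generated so far
        probs = toy_predict(window)            # 1) distribution over the next character
        probs = reweight(probs, temperature)   # 2) reweight to the given temperature
        idx = rng.choice(len(vocab), p=probs)  # 3) sample the next character
        text += vocab[idx]                     # 4) append it to the text
    return text


out = generate("ab a", 20, temperature=0.5)
print(out)
```

Taking `text[-maxlen:]` plays the same role as `generated_text = generated_text[1:]` in the notebook's loop: the model always sees a fixed-length window ending at the newest character.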

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [6]:
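def sample(preds, temperature=1.0):
    # Work in float64 so the renormalized probabilities sum to 1 precisely enough for multinomial
    preds = np.asarray(preds).astype('float64')
    # Reweight: exp(log(p) / T) is p ** (1 / T)
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    # Renormalize into a probability distribution
    preds = exp_preds / np.sum(exp_preds)
    # Draw a single sample; the result is a one-hot vector of counts
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)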
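To see what the reweighting inside `sample` does, here it is separated from the random draw and applied to a made-up three-way distribution `p`. Since `exp(log(p) / T)` equals `p ** (1 / T)`, temperatures below 1 sharpen the distribution toward its most likely entry, a temperature of 1 leaves it unchanged, and temperatures above 1 flatten it:

```python
import numpy as np


def reweight_distribution(original_distribution, temperature=0.5):
    """Same reweighting as in `sample`, without the final draw."""
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)


p = np.array([0.6, 0.3, 0.1])  # hypothetical next-character distribution
for temperature in [0.2, 0.5, 1.0, 1.2]:
    q = reweight_distribution(p, temperature)
    # exp(log(p) / T) is just p ** (1 / T), renormalized
    assert np.allclose(q, p ** (1 / temperature) / np.sum(p ** (1 / temperature)))
    print(temperature, np.round(q, 3))
```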


Finally, here is the loop in which we repeatedly train the model and generate text. After every epoch, we generate text using a range of 
different temperatures. This lets us see how the generated text evolves as the model starts converging, as well as the impact of 
temperature on the sampling strategy.

In [7]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

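    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        # Note: `generated_text` is not reset here, so each temperature continues
        # from the last `maxlen` characters produced at the previous temperature
        sys.stdout.write(generated_text)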

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Epoch 1/1
--- Generating with seed: "t is the burning alive of one
individual compared with etern"
------ temperature: 0.2
t is the burning alive of one
individual compared with etern and strenction the conception the strengted the can of the self-conself that the the strenger and the self the contringless of the concepted the still to the farthere the conception of the conself that the stand of the the strengness of the strengen the problence of the fancer the self-the strenger and the strenger of the conception of the self-to the strenger the strenger the progess of the cons
------ temperature: 0.5
he self-to the strenger the strenger the progess of the consequently which a more the progess of the order be the
casted the self as self-to the conservent more the believes that the ragee of like the goder the age, the still supherence, that the enementence, as realsous propession to what deeter problence the demist have the true clain a slame proce, readonly, the strecter to its 

reitunate knowledge obeter any long be no gutter in moralisous depines that is irmortable make the greate of the "assume, "aday any sense be only part
emotive. we magnifeding importatosisherely b
------ temperature: 1.2
se be only part
emotive. we magnifeding importatosisherely but which were has natures only the jecking, only bad moning: reaser-appet trothersy on the damentned  ?s culture and ethice, their becaumed grems" has their on the rincibegy
gerator, away from no, injury another pirtul, inmuch--aper what is rage
clatogicalik briise," wife withabjechenlongy. as
over ha her, of the twee, it rend in fylod for held. ovin the coils, any domility has "we must
has doing;
epoch 9
Epoch 1/1
--- Generating with seed: "t struggle with its neighbours, or with rebellious or
rebell"
------ temperature: 0.2
t struggle with its neighbours, or with rebellious or
rebelled one who has not a soul in the same becomes the will the say, the spirit of the spirit and sense of the same in the same the m

 notifith and en(baws, even an acts of the faculties of the same the hostile and whole self-desele of explanation to a soulsfquently can be an away in the develop the most account and had related in the scientificant the profound, and the some the conduct of the distrustful in the intellect in the realth--he will
------ temperature: 1.0
t of the distrustful in the intellect in the realth--he will! dests formed: these mat learn found usserviles thought for the may and
sight of morality,
the
time, almost the criminal, thereli's, as truer to its own peeledet to the and sh, as a deally,ament which does
salfilyitiful and loable distingured are only toul
justiesen with the of the worlds of grachatferary," and different in order case hature some itself--the other victod in hitherto enbeed in whi
------ temperature: 1.2
ture some itself--the other victod in hitherto enbeed in which
very pertogs
of this say, would : "the oppos thether of. in flems dispospnessw?"ily, when a "willy.
then would ! 

remain mabully--in astroy, orrumder manifests mucderly intensible  i wort and hcurst seems, although sreating anything--this is intuiriment showeddens some thereem suy of daround its very furthers, are indiancatonoble ascedely, o
epoch 19
Epoch 1/1
--- Generating with seed: "of the town. that as men of the "historical sense" we have
o"
------ temperature: 0.2
of the town. that as men of the "historical sense" we have
one another and in the sense of the sense of all the most endured to love the present the soul of the philosophers and the most as an an and and the most more and will to the spirit of the belief in the sense of the strong conscience of the soul and the disciplay of the sense of the soul of the sense of the sense of the destred the present and and the most even of the most enough to the fact of the
------ temperature: 0.5
 and and the most even of the most enough to the fact of the most empless upon malmast as the religion and fly of the artistical and speak of the german 

ll the presentimentary of the time of a peasion in the subject and most
manish why says.=--a
cerely, bad, refuce of many slove
diffolders to look really of consist, the
real manifests to in stronge moracitive,
gord therein only convince of knowledge restss and preaser producted, a case is here, and how have into this protest sucrefic philosophy. nitulate experience, stabope downw"" and calles truth one must be
object what less
in the medistful complained h
------ temperature: 1.2
h one must be
object what less
in the medistful complained happ to be entent out to en an end, affloarly more to a confides alsto asbod idear
which is
natures of his sake, on the conscience.ricovolenmy shamec os mahe, iny in
demobally for the
protectness,s when wull "useful! objridely follow. accustoms before the world the "musice?--, the fronglary--estemit himself without the ablentant cuntual
"thing in  aim,
if, schreatilod sittre ignoble from their madi
epoch 27
Epoch 1/1
--- Generating with seed: "own good

the mark of the sense of the commands the fact is the fact of the sense of the fact of the sense of the sense of the sense of the concealed the point of the sense of the sentiments of the profound the present and self-compleration of the concerning the complexise and the present the self-expless, and the consequences, and the fact of the severy of the consequences of the moral person the sentiment of the s
------ temperature: 0.5
 the consequences of the moral person the sentiment of the sense of the superations of the senses the belief in the ever as every person the expressionally the has a delicate person the great the lawstands of the pride of the fact of the corracting the conditions of the strengal most souls" is a noth of the seriend, who make the fear of the perhaps, and an antistice of the servic historic of an ever commendation of a delicate
does not in the deny the other
------ temperature: 1.0
er commendation of a delicate
does not in the deny the other thin
anothefion him 

out. nowadays ouganic her hary him. fore; hence-e-shient," what oed a bains which laterable.---who, the treatolengufias,
timy or sove good or spirits.


33péqi
=too relse
domanimable
clach transfess ity being inraid intoaking, and
not topat
inman of revolerned
"will soil to men ategry-cause lightelpl conclisiess to
ducded, and halp cannotfualk for great riselover. with ego erie
higher christiality
epoch 42
Epoch 1/1
--- Generating with seed: "return its own desire. justice is therefore reprisal and
exc"
------ temperature: 0.2
return its own desire. justice is therefore reprisal and
except of the most sensus of the fundamental the words and all the most and all the struggle of the most and the same the nature of the most sensuality of the most sensus, and the most consequently and the fact the present and strict and the most one who surming the commandness, and the most self-contempt to the fact in the struggle of the same the men who has the most spirit and the men to the word 
------

he sense of the same thing that the strong and the same still for the existence which he past and similar them for religion of the belief in man will an intellectual man as they will and should be strong hard, and the conductation of the fact to a same degree and discovered or a souls of many the same man at the strange and humanity in their sentiment and soulsment and propertions, and the existence of its attempt that the time assomed and further and the 
------ temperature: 1.0
ce of its attempt that the time assomed and further and the forces about the
ethicip
existence.

1eräy of
tlegaiolictibivituies at as herewarps
of men."igistisy to gradatives "of
sawl an, who new style, power--they -for exaectiality himself as a savage"
profound crotion
dangerous liem, and that gail for indifurty maintenw
is evil" with it to
go some comparison religiouth awskaning be
independence. without compellly of olds heiined and the action.

1
1

=uf
------ temperature: 1.2
 without compellly of olds hei

if
batur but he
is con
emosize wordistes influend, the
unrisere
derive where she
out dempe
epoch 57
Epoch 1/1
--- Generating with seed: "ld be especially seldom
attained by a german, or almost alwa"
------ temperature: 0.2
ld be especially seldom
attained by a german, or almost always the subject and asks its own science of the sense of the same artists and present of the same the strength of the world of the same things the supersaition of the most surpretual and the exceptions of the same superstition of the same things and successed of the most envers of the same things and stronger and more for the strength of the same asks of the same the philosophers and sense of the s
------ temperature: 0.5
he same asks of the same the philosophers and sense of the stranges that is discordicians of such a hippohuinhiso9veslily the order of the statially one could be brined and belief in the other more with a thinken and nothing
and present and more opinion of the faculties, and the highest pres


As you can see, a low temperature results in extremely repetitive and predictable text, but the local structure is highly realistic: in 
particular, nearly all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it sometimes invents completely new words that sound somewhat plausible (such as 
"assomed" or "protectness" in the samples above). With a high temperature, the local structure starts breaking down and most words look like 
semi-random strings of characters. In this specific setup, 0.5 appears to be the most interesting temperature for text generation. Always 
experiment with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.
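One way to quantify the structure/randomness trade-off is the entropy of the reweighted distribution: the higher the temperature, the closer the distribution gets to uniform and the more bits of randomness each sampled character carries. A minimal sketch on a hypothetical five-way next-character distribution:

```python
import numpy as np


def entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def reweight(p, temperature):
    """Temperature reweighting in its power form: p ** (1 / T), renormalized."""
    q = p ** (1.0 / temperature)
    return q / q.sum()


# Hypothetical next-character distribution from a model
p = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
entropies = {t: entropy_bits(reweight(p, t)) for t in [0.2, 0.5, 1.0, 1.2]}
for t, h in entropies.items():
    print(f"temperature={t}: entropy={h:.3f} bits")
# The upper bound is log2(5), about 2.32 bits, reached by the uniform distribution
```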

Note that by training a bigger model, longer, on more data, you can obtain generated samples that look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about and the statistical structure of the messages in which communications are encoded. To 
illustrate this distinction, here is a thought experiment: what if human language did a better job of compressing communications, much like 
our computers do with most of our digital communications? Language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, making it impossible to learn a language model like the one we just trained.


## Takeaways

* We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires a balance between adhering to what the model judges likely and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.
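To illustrate the difference between character-level and word-level tokens, here is the same made-up sentence split both ways. The regular expression used for word tokenization is a simple assumption (words plus standalone punctuation), not the tokenizer of any particular library:

```python
import re

sentence = "the will to power. the will to truth."

# Character-level tokens: every character, including spaces and punctuation
char_tokens = list(sentence)

# Word-level tokens: runs of word characters, plus each punctuation mark on its own
word_tokens = re.findall(r"\w+|[^\w\s]", sentence)

print(len(char_tokens), len(set(char_tokens)))  # sequence length and vocabulary size
print(word_tokens)
```

Word-level tokens give much shorter sequences, but on a real corpus the word vocabulary is far larger than the few dozen characters a character-level model has to handle.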