In [1]:
import keras
keras.__version__

Using TensorFlow backend.


'2.0.8'

# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas into practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files -- Wikipedia, the Lord of the Rings, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model 
we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the 
English language.

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [2]:
import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt



Next, we will extract partially overlapping sequences of length `maxlen`, one-hot encode them, and pack them into a 3D NumPy array `x` of 
shape `(sequences, maxlen, unique_characters)`. At the same time, we prepare an array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.

In [3]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: index for index, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# Use the built-in `bool`: the `np.bool` alias is not available in every NumPy release
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...
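
To make the windowing concrete, here is a minimal sketch that applies the same extraction loop to a short toy string instead of the corpus. The string `toy_text` and the small `toy_maxlen` value are illustrative assumptions, not values used in this notebook, and plain Python lists stand in for the NumPy arrays so that the sketch runs without the dataset.

```python
# Toy illustration of the sliding-window extraction (all values are assumptions).
toy_text = 'hello world'
toy_maxlen = 4   # window length (the notebook uses 60)
toy_step = 3     # stride between windows (same as the notebook)

toy_sentences = []
toy_next_chars = []
for i in range(0, len(toy_text) - toy_maxlen, toy_step):
    toy_sentences.append(toy_text[i: i + toy_maxlen])
    toy_next_chars.append(toy_text[i + toy_maxlen])

toy_chars = sorted(set(toy_text))
toy_char_indices = {char: index for index, char in enumerate(toy_chars)}

# One-hot encode the first window: one row per timestep, one column per character.
first_window = [[1 if toy_char_indices[char] == j else 0
                 for j in range(len(toy_chars))]
                for char in toy_sentences[0]]

for sentence, next_char in zip(toy_sentences, toy_next_chars):
    print(repr(sentence), '->', repr(next_char))
```

Because the stride is smaller than the window length, consecutive windows share characters, which is what "partially overlapping" means here.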


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier with a softmax over all possible characters. Note, however, that 
recurrent neural networks are not the only way to generate sequence data; 1D convnets have also proven extremely successful at it in 
recent times.

In [4]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
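
As a sanity check on the model size, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with an input kernel, a recurrent kernel, and a bias vector) and the 57 unique characters reported above; the variable names are illustrative.

```python
# Hand-computed parameter counts (assumes the standard LSTM parameterization with biases).
units = 128        # size of the LSTM layer
num_chars = 57     # unique characters reported during vectorization

# Each of the 4 gates has: input kernel (num_chars x units),
# recurrent kernel (units x units), and a bias (units).
lstm_params = 4 * (units * (num_chars + units) + units)

# Dense layer: weight matrix (units x num_chars) plus one bias per output.
dense_params = units * num_chars + num_chars

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)
```

If these assumptions hold, the totals should agree with what `model.summary()` reports for this network.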

Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:

In [5]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
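
With a one-hot target, categorical crossentropy reduces to the negative log of the probability that the model assigns to the correct character. The following minimal sketch shows this with a made-up four-character distribution; the numbers and the helper name `one_hot_crossentropy` are illustrative assumptions, and the function is a plain-Python stand-in rather than the Keras implementation.

```python
import math

def one_hot_crossentropy(target, predicted):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

target = [0, 0, 1, 0]                      # the correct character is index 2
confident = [0.05, 0.05, 0.85, 0.05]       # most mass on the correct character
uncertain = [0.25, 0.25, 0.25, 0.25]       # uniform guess

print(one_hot_crossentropy(target, confident))
print(one_hot_crossentropy(target, uncertain))
```

The loss is lower for the confident, correct prediction, and for a uniform guess it equals the log of the number of classes.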

## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

* 1) Drawing from the model a probability distribution over the next character given the text available so far
* 2) Reweighting the distribution to a certain "temperature"
* 3) Sampling the next character at random according to the reweighted distribution
* 4) Adding the new character at the end of the available text

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [6]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
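
To see what the reweighting step does on its own, here is a small sketch that applies the same transformation (divide the log-probabilities by the temperature, exponentiate, renormalize) to a made-up three-way distribution. The helper `reweight` and the example probabilities are assumptions for illustration; the sketch uses only the standard library and leaves out the random draw.

```python
import math

def reweight(probabilities, temperature):
    """Rescale log-probabilities by 1/temperature and renormalize."""
    scaled = [math.exp(math.log(p) / temperature) for p in probabilities]
    total = sum(scaled)
    return [s / total for s in scaled]

original = [0.6, 0.3, 0.1]   # hypothetical model output
for temperature in [0.2, 0.5, 1.0, 1.2]:
    reweighted = reweight(original, temperature)
    print(temperature, [round(p, 3) for p in reweighted])
```

Temperatures below 1 concentrate probability mass on the most likely character, a temperature of 1 leaves the distribution unchanged, and temperatures above 1 flatten it.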


Finally, this is the loop where we repeatedly train the model and generate text. After every epoch, we generate text using a range of 
different temperatures. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of 
temperature on the sampling strategy.

In [7]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

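    # Note: `generated_text` is not reset between temperatures, so each
    # temperature continues from the last `maxlen` characters generated so far
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)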

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Epoch 1/1
--- Generating with seed: "eir
loathing, with a hand which only slowly grasps: who art "
------ temperature: 0.2
eir
loathing, with a hand which only slowly grasps: who art a may a may a may of the was as a man as a such a percain of the man and as the man as the contrary and as the master of the was the man as the were the man of the may of the man as the master the live and into the man as the man a profer the man as the man as the as the with the self an as the may it is a may of the man as he with the may a may which the man as the master and in the man as the ma
------ temperature: 0.5
y a may which the man as the master and in the man as the man, what some as the morality of the
was the
wasther the caments and its for the the
may, what a moral as the self as hast is can be actions as he would of the may proble and of the man af as still and generations and its antays it is the man the cause as mansir man the relights of the consuction of his selves of the man an

"time aboualing
imnirity-hithing, "slantr, and blasifly; as if it eatter instinctly aalking asying itnile: that
which too wide,
receasing rankes.


4ittedy, hever
would ulptile demasity--bother, aold he matter dancis in the dir amings of undivolly
was reverentied, who daidmentlak srake operi
epoch 5
Epoch 1/1
--- Generating with seed: "certain"--will encounter a smile and two notes of
interrogat"
------ temperature: 0.2
certain"--will encounter a smile and two notes of
interrogated to them them and super-all them in the sense of them and present and the many and super-all them and the servilation and super-all them and the strength and self-contraint of the sense of the sense of the same and the strength and self-contrady and the sense of comparding of the strength and self-contrant and self-interes of themselves to them and seek of them, and the sense of themselves and t
------ temperature: 0.5
 to them and seek of them, and the sense of themselves and the feeling and whole them and w

such et here a persondings.


4

=due churctians.=--again weak them wen distingutives, as 
------ temperature: 1.2
4

=due churctians.=--again weak them wen distingutives, as an own, in itle averwabla are
schoold altoge
wishouh temps is find
actn
ole; itsowest inthispether or "formere"--is ho are
wor volivest for justs sents from undell if-little beino--the
genuul it
is isstpret. certainty: we litle's froghing of swarks men-emot hom, there
hilw is, of has
your.
sohin perhap he
epressulus
rationevan of plathi
resplassion! evalo, too mores the conscious with yet.--he was
epoch 9
Epoch 1/1
--- Generating with seed: ", ancestors, chance, and society therefrom,
involves nothing"
------ temperature: 0.2
, ancestors, chance, and society therefrom,
involves nothing the spirit of the same the striugt the sense of the suppose of the condition. the striugts and the man proved the striugts of the world of the consequently in the striugts of the striugt the profound the conscience of the same the s

immensible, and cereming been
fincesd--mand also him, alroulluple?--to man in isoconow
and wound man
higher the last-mory
hundred appresse that i life at the
strength, as a schoness that the concelliss him for his "cultes."

131n that this contention of philosophers from pertain. and avor and "that this estimate among better in anl some all at his
recofple to spurecuate, and or and the germa
------ temperature: 1.2
some all at his
recofple to spurecuate, and or and the german-deside to
"the trothyin brying the
jesidowding were keeps secarr--the "more
higher. he are free. even os but
figution. a cittudise
arount to course
for exist
on cruel, one with aptresu?, one is call to be some upo their
amition--the worled, without great originally which he are hen doeake: it walc.

2ir


140. just



 in onee; it has an moderwase as lows, thor,
meangulous--when an urtail
the states
epoch 13
Epoch 1/1
--- Generating with seed: "ily attempt to form the sounds
into words with which we are "
------ temperature: 0.2
ily attempt to form the sounds
into words with which we are the same the moral philosopher and the present the false and the same many and the same conclusion of the same the same the same spirit of the same sense of the same self-contrary of the same spirit of the same served to any distinguished the same the same serves and the sense and the will the same conception of the same states of the sense of the same the same the same the same of the same the sa
------ temperature: 0.5
se of the same the same the same the same of the same the sacrifice. the morality to be the prisposition in the world are is the man; and there with it. he who escaticy and the morality of at which the same dull and many the community, in the same to the words the self-contrary in all the continuall

fronges framilatisren--in the hen season nature." as proy come frylosion, something of timey i knilled souteal
"may beyoustical different me: it,
generally begate from spirit
from the adeguar, that we respectined
its wherehlehth) is extentle con-sing--and
others. they not be rate, the thor of man," on the xeur anything;
into things. religional disaceed once difference so here in all orgair orgatic more, is causes for
epoch 17
Epoch 1/1
--- Generating with seed: "l feeling.=--moral feeling should never
become extinct in na"
------ temperature: 0.2
l feeling.=--moral feeling should never
become extinct in nature of the senses, and the fact that the precisely and the man and complication of the senses of the serve of a man who has a man always and the proper itself of the master of a man and seems to the delusive of the proves and complication of the senses, and the problem the strong, and the senses of the sense of the problem of the strength of the senses of the sense of the sense of th

and
instincts
and "the mather. the
erwsved in everywhere. there is certain new acts which with hard badders if it however it has not all any estimatesness is its own commandly get eternally? h's enturities,--themselves "sdee
------ temperature: 1.2
n commandly get eternally? h's enturities,--themselves "sdeeph pies dram means, systruch in liestafe approproser, wher from
threach
ons in
asssuqurirginigity con is dolts period-not,
or
like a--"bod
"precimpilitis prosed
suche beautoinings
to catusreutory, hehtse of plife. the sc? an interpretaries suftiery; alterial long disbrehope irrases afterm elses--our manchesher." a
stands for highergeness aceness--freeert of such delued, in one's
giver
dispantitude
epoch 21
Epoch 1/1
--- Generating with seed: "t overcame them. in this way the person exercising volition "
------ temperature: 0.2
t overcame them. in this way the person exercising volition of the world and whole man of the same interesting of the consciences to the same and the conscien

and very desires to such a hard to the death, and entast of the same the anti-such a person that he i
------ temperature: 1.0
ath, and entast of the same the anti-such a person that he indifference ration for convalition words of mankind which of the relation of mentary assumed negation mean--alow woman: there is it arewahr, and why in an) to to reglateprable
to social
knew with concerning heart from a nights, and concerning this foll-conscirencities our solent, with man sheentsd--the blurth of pasus
having them
things, and me" place in wertured (and in their german ideas of dimb
------ temperature: 1.2
and me" place in wertured (and in their german ideas of dimber mankind.

urevally tamidaged prews axtericr it, he and defate hisivment outiv-mint their "cluvouranced, philosophibess to entanz--stuption, beens it
is here, that on this interpretation at the whole dra bandquinity, for let what does turk.
 belight", haw the
now willing, with their pray with
this portach to or. we not--imabl

e of the most distinguien, the belief of the spirit of the maters, and the presence to attempt and becomes something what with the soul: he will be believed adopul powerful and soul: be can to its continus of the knowledge in no metaphysical contemplate philosophers, the decadable profounds and more obtain, and in the has spirits of the conscience of the far in the more modenness and being conception of brutation of the experience, and have been to a matte
------ temperature: 1.0
ion of brutation of the experience, and have been to a matters. bring" dectmemiciers and amofminady valiage, whole through; just the hands even of one been, nothing pilinance: leve,
his onne need contred. no decultive
will, dues oforis-rank," before continuality: sundension.ither), if the trains: the dearing by the sparence, butlosion which recommend, then twock-allow after tuniticity!--the croman there have strength of head, by
selrongicgous justing of hi
------ temperature: 1.2
 there have strength of head, 

to be born to a high states of the conception of the same that the sense of the spirit of the spirits of the sense is the instinct of the same to the same that we must be a transfority, and that the spirit, and the spirits, and the spirits of the spirit of the spirits of the spirits of the condition of the concerning the spirits of the spirit of the man who can be a sense of the spirits, and in the spirits, the spirits a
------ temperature: 0.5
be a sense of the spirits, and in the spirits, the spirits and man is not that so responsible, who called bounder instinction with the mistrad in a degree our function of the spirits, and with the spirit is in the continity with our a distrustful blinding of the feelings is its alled and necessaring and self-itself taste and the states of the our being for as the hammed and is in the subility, for every instinction which we may assin and the cause of the m
------ temperature: 1.0
 every instinction which we may assin and the cause of the man of 

weasohs old shing. attecledsfert feellever "nace asseeks, rome neath when nothing who
would general gentimung,--schowing. thu
epoch 36
Epoch 1/1
--- Generating with seed: "blished, it would nevertheless
remain incontrovertible that "
------ temperature: 0.2
blished, it would nevertheless
remain incontrovertible that the soul and the sense of the superiority of the stronger the man and discovered of the man the strength of the spirit of the superiority of the same more of the strength of the conscious the man and conscience of the superiority of the conscience of the sense of the superiority of the sense of the superior and one an and man in the strength of the spirits of the spirits of the soul, and his false
------ temperature: 0.5
gth of the spirits of the spirits of the soul, and his false and man could ready of every still indeed and in view and form of the man and misunderstand to be a higher of things the resistous those and all the super-independently and strength and
will now
t

ponsurtrg them: we must have perhinis bearis is always back absomeable and spoung complicate cellocing
shoulds and right. thot defeteningly
as it pride is animal
bee
valuasionglinese, and platoous
sthom
of  eak into all refinith srues,
make his omgxdranatly such velged,
booc" notion from the what may withoreonr
philosophers.
ngounally--i possible i depressing dutter in cuscer
my  a
too cause may the madialiness
axcholv
epoch 40
Epoch 1/1
--- Generating with seed: "ith an
innate heavenly mechanism which all the stars underst"
------ temperature: 0.2
ith an
innate heavenly mechanism which all the stars understand to the sense of the philosophers of the sense that the morality and in the contempt and in the moral and the constantly the profound which who has to perhaps the progress of the spirit of the greatest that the morality of the suffering of the sense of the same and really and soul of the morals, the moral and in the contempt and in the sense of the morality, and in the morality, 

but this gives ether toole, éus-fear and worl immand which thishourseever and abs the sway he homeds and truit
rani.
neteated in tempter enemy"", relical, such great or good is also all being proy man discouroumsp-iomon scorchowngen ssents, a manclity.

(f ohdain to be so much, awn scorn apssumpte, does a god" and claims over,
------ temperature: 1.2
e so much, awn scorn apssumpte, does a god" and claims over, pekens of nature thinkers of digni" of lifetent!
jimet: fundamentally
presedisiors to  woll
the imose involute mot ons whething
has sresk" te
love of putad to
wheg outwoold of tike-nee also at a philosob-quitious e. lluse upos rank'h ourself else enginess of earth gnere "my
depesice. like-ried, "as far
errors rearmen, ourals, pragueing can loves whithes of averge has singrivm typiciasm, thsy
coul
epoch 44
Epoch 1/1
--- Generating with seed: " must certainly be confessed that the worst, the most tireso"
------ temperature: 0.2
 must certainly be confessed that the worst, the most 

erate of the severt of the arained and in the severt is the same he to the more the aist and the mans and noting of the things the strand and in the slave, "man intan the last the hums of the world of the posesscolutons and child, the self-philosopher the hive must which the severt of the seest who call in the lat he incoursely, and sensement and it in the rank in the possible one and "the sounter man see the old his philosophers in the seevent the disaste
------ temperature: 1.0
 man see the old his philosophers in the seevent the disaste be augicanism can see do ristion to but regrence
but the how the the oldined philosopher
eur sowhaun is every in fried lathe eament of ner, refuls wenoud consciloctiors here
onlience the wayn ow. eur might would i onlons, one ir ra
christs in
ours ordendes weak it i hoodent megal relaction, and her sympanty, wond himself--orederationally, is on all bennow and predinate hige: posesion adience, so 
------ temperature: 1.2
, is on all bennow and predina

appetite with little select and the and the a the with the the because the the in the or to and the the for the and the as the in the the present before the the precise and the and the are and the the the all the and the the a the the the the the the the with the to withouge and the the with the to be the with the of the to the the in the and the in the and the and the and the the his of the the with the the in the the are
------ temperature: 0.5
d the and the the his of the the with the the in the the are withous of the with the into its contain to the phentianises wish the the one despinsing and regard in the higher and the precise of do sunder and promeessional for all a there has the in the away the because souls of and most for all weahthess of the futethe of the to is the with to deepseess every in the every the or a say, in the man exactath and the crealser
what the with the a have the for h
------ temperature: 1.0
exactath and the crealser
what the with the a have the for har y

mor is as aif to erer9alltherinh pari une[äunded how pyoo ef of of thehup diorou ino thditgn bbjllinircol-l . f ää ho0 o mas ahu ne0. thising iivn be guris  hishaltdef anole"liv vinat om he stoo an4 h inceæ
epoch 55
Epoch 1/1
--- Generating with seed: "in place of the "immediate certainty" in which the people ma"
------ temperature: 0.2
in place of the "immediate certainty" in which the people mat tä inde tx andxerevbeh f
in the  he thee p
thef and tts han ralan fer in inämeen con ms butheo ner ure tr thee æ theo ther caxi  aven hjud angi of an toänq af o heg theos s o) thehes fazpinn iszmlitur solishe thntpiarl a the nou tädszf con   haäs ksunvp atiler, r be r thef kä aäl itis] af thir bbwer  is hthnä ofäme ther tju-dtp rage vi t t  äunämämwols a w mäs  ani xp hen there andät hoter=y ttk
------ temperature: 0.5
t t  äunämämwols a w mäs  ani xp hen there andät hoter=y ttksjs asi n0vin to indxean t=e es t hndosury andjvbe evor n f7thhis"ac uctuouoio s hande( toon ouas ey,ilayjd biatil y

isanie isasrieis ote ia s  co ämas ta e te ohot ai brr  eeenoseani7cotanhnorealain fra t  eroreol eaon die scor tha teathe icialo .hri  bire[ndaed at
------ temperature: 1.2
 eroreol eaon die scor tha teathe icialo .hri  bire[ndaed atlee æo.
di0 amtheyldloidoreigae i to le se edone iosmmttlc. ti ine acfatie orr or ui thner in iw teone d c(whh 7 in o t htiri; hateretilits ioool cee sfiaiy, aud  tätp whewlou a terey marecaneae nopinairirlt eanata7th jlim e mllecrs eolpb boli 1at oiuialou ollxethinecra tf  tt  thewo tu atanasesa6s  mo f aæl thine  on  atehlian ane  ha e mel moterot ft tzre ni  'mosos n eæaotfrereto a ttao0ieen   
epoch 59
Epoch 1/1
--- Generating with seed: " hold of him shrinks back!--and for that very reason
many th"
------ temperature: 0.2
 hold of him shrinks back!--and for that very reason
many the n a  het   o  th  äe tia  t a5  and i s  nd t an titiotain ar aqu te  stä t ao  ahe  the a thc thd   at o  fa      a 5 he stno  at atie2to a t an a  h  i i ae t  a al oo  


As you can see, a low temperature results in extremely repetitive and predictable text, but one where local structure is quite realistic: in 
particular, most words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as 
"eterned" or "troveration"). At the highest temperatures, the local structure starts breaking down and most words look like semi-random strings 
of characters. In this specific setup, 0.5 appears to be the most interesting temperature for text generation. Always experiment 
with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.

Note that by training a bigger model, for longer, on more data, you can generate samples that look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about and the statistical structure of the messages in which communications are encoded. To 
illustrate this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like 
our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, thus making it impossible to learn a language model as we just did.
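
The thought experiment can be approximated in a few lines. The sketch below compares the empirical byte-level entropy of a piece of text before and after `zlib` compression. The stand-in corpus (random words from a small hypothetical vocabulary) and the helper `byte_entropy` are assumptions for illustration, and a general-purpose compressor only approximates the idealized compression that the paragraph above imagines.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data):
    """Empirical entropy of the byte distribution, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Hypothetical stand-in corpus: random words drawn from a small vocabulary.
random.seed(0)
vocabulary = ['the', 'spirit', 'of', 'man', 'and', 'morality', 'is', 'a',
              'sense', 'strength', 'will', 'to', 'world', 'conscience']
raw = ' '.join(random.choice(vocabulary) for _ in range(20000)).encode('ascii')
compressed = zlib.compress(raw, 9)

print('raw:       ', len(raw), 'bytes,', round(byte_entropy(raw), 2), 'bits per byte')
print('compressed:', len(compressed), 'bytes,', round(byte_entropy(compressed), 2), 'bits per byte')
```

The compressed stream carries exactly the same information (it decompresses to the original), yet its bytes are distributed far more evenly, leaving much less character-level regularity for a model to pick up.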


## Takeaways

* We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires a balance between adhering to what the model judges likely and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.