In [1]:
import keras
keras.__version__

'2.4.3'

# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas into practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files -- Wikipedia, the Lord of the Rings, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model 
we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the 
English language.

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [2]:
import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
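# Read the corpus with an explicit encoding and close the file afterwards
with open(path, encoding='utf-8') as f:
    text = f.read().lower()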
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600901



Next, we will extract partially-overlapping sequences of length `maxlen`, one-hot encode them and pack them in a 3D Numpy array `x` of 
shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare an array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.

In [3]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(set(text))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: index for index, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# The `np.bool` alias was deprecated and later removed from NumPy; the builtin `bool` is equivalent
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200281
Unique characters: 59
Vectorization...
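
To make the shapes concrete, here is a minimal sketch of the same windowing and one-hot encoding on a tiny string. The names `toy_text`, `toy_maxlen`, `toy_step` and the other `toy_*` variables are made up for this illustration and are not part of the original notebook; the logic mirrors the cell above.

```python
import numpy as np

# Hypothetical toy corpus and window settings, chosen only for illustration
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

# Same windowing logic as above: inputs are windows, targets are the next characters
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
toy_sentences = [toy_text[i: i + toy_maxlen] for i in starts]
toy_next_chars = [toy_text[i + toy_maxlen] for i in starts]

toy_chars = sorted(set(toy_text))
toy_char_indices = {char: index for index, char in enumerate(toy_chars)}

# One-hot encode inputs (3D) and targets (2D)
toy_x = np.zeros((len(toy_sentences), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_sentences), len(toy_chars)), dtype=bool)
for i, sentence in enumerate(toy_sentences):
    for t, char in enumerate(sentence):
        toy_x[i, t, toy_char_indices[char]] = True
    toy_y[i, toy_char_indices[toy_next_chars[i]]] = True

print(toy_sentences)             # ['hell', 'lo w', 'worl']
print(toy_next_chars)            # ['o', 'o', 'd']
print(toy_x.shape, toy_y.shape)  # (3, 4, 8) (3, 8)
```

Each window overlaps the previous one by `toy_maxlen - toy_step` characters, and every timestep of `toy_x` has exactly one active entry.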


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier with a softmax over all possible characters. Note, however, that 
recurrent neural networks are not the only way to generate sequence data; 1D convnets have also proven extremely successful at it in 
recent times.

In [4]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
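
As a rough size check, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout (an input kernel, a recurrent kernel and one bias vector for each of the four gate/candidate transformations) and a `Dense` layer with a bias; the helper names are hypothetical. Calling `model.summary()` should report comparable figures.

```python
def lstm_param_count(units, input_dim):
    # Four transformations, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * input_dim + units * units + units)


def dense_param_count(units, input_dim):
    # One weight per (input, output) pair plus one bias per output
    return input_dim * units + units


n_chars = 59  # "Unique characters" reported in the output above
lstm_params = lstm_param_count(128, n_chars)
dense_params = dense_param_count(n_chars, 128)
print(lstm_params, dense_params, lstm_params + dense_params)
# 96256 7611 103867
```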

Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:

In [5]:
# `lr` is a deprecated alias; `learning_rate` is the supported argument name
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
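
For intuition about what this loss measures, here is a small NumPy sketch of categorical crossentropy with one-hot targets. The function name `categorical_crossentropy_np` and the example probabilities are made up for illustration; this is not Keras's implementation, only the same formula under the assumption that predictions are already normalized probabilities.

```python
import numpy as np


def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average -sum(y_true * log(y_pred)) over samples
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


y_true_demo = np.array([[0.0, 1.0, 0.0]])
confident = np.array([[0.05, 0.90, 0.05]])
unsure = np.array([[0.40, 0.30, 0.30]])

print(categorical_crossentropy_np(y_true_demo, confident))  # equals -log(0.9)
print(categorical_crossentropy_np(y_true_demo, unsure))     # equals -log(0.3)
```

With a one-hot target, the loss reduces to the negative log-probability assigned to the correct character. A model guessing uniformly over 59 characters would score about ln(59) ≈ 4.08, which is a useful baseline when reading the training loss.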

## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

* 1) Drawing from the model a probability distribution over the next character given the text available so far
* 2) Reweighting the distribution to a certain "temperature"
* 3) Sampling the next character at random according to the reweighted distribution
* 4) Adding the new character at the end of the available text
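
Step 2 can be written out explicitly. Using notation introduced here for illustration ($p_i$ for the probability the model assigns to character $i$, and $T$ for the temperature), the usual softmax-temperature reweighting is:

$$
p_i^{(T)} = \frac{\exp\left(\log(p_i) / T\right)}{\sum_j \exp\left(\log(p_j) / T\right)} = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}
$$

With $T = 1$ the distribution is unchanged; $T < 1$ concentrates probability on the most likely characters, and $T > 1$ flattens the distribution toward uniform.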

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [6]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
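
To see what the reweighting does in isolation, the sketch below applies the same transformation as `sample` to a made-up distribution, without the random draw, and reports the entropy of the result. The names `reweight_distribution`, `entropy_bits` and `demo_probs` are hypothetical and used only for this illustration.

```python
import numpy as np


def reweight_distribution(probs, temperature):
    # Same transformation as in `sample`, without the random draw
    logits = np.log(np.asarray(probs, dtype='float64')) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)


def entropy_bits(probs):
    # Shannon entropy in bits; higher means a more uniform, less predictable draw
    return float(-np.sum(probs * np.log2(probs)))


demo_probs = [0.5, 0.3, 0.15, 0.05]  # made-up distribution
for temperature in [0.2, 0.5, 1.0, 1.2]:
    reweighted = reweight_distribution(demo_probs, temperature)
    print(temperature, np.round(reweighted, 3), round(entropy_bits(reweighted), 3))
```

The entropy grows with the temperature: low temperatures make the draw close to deterministic, while higher temperatures give unlikely characters more of a chance.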


Finally, this is the loop where we repeatedly train and generate text. We start generating text using a range of different temperatures 
after every epoch. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of 
temperature in the sampling strategy.

In [7]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
--- Generating with seed: "
i have viewed the saints of india who occupy an intermediat"
------ temperature: 0.2

i have viewed the saints of india who occupy an intermediate and the present and the present is a more the usent of the faint in the freeder, the present in the belief in the best and the religion the perhaps of the more contention and all the present and in the what the perhaps in the far the present and and in the interthing the more and the fears and desting the perhaps and more of the present the present and all the best and and and the findence and a
------ temperature: 0.5
 the present and all the best and and and the findence and a some
for in the reselt and there which the reasure the best in the great what the under and
sting and conducion elte of the what the fant what the religion of the perhaps be of the serents of the person in the feeling in enter and the will himself a mode and all the present deselt as the being and of anither best perhaps presiction

 the are toglion, in that hen any
man a
ofwishes and cave,"
weake whoinouth men. an exo slest for ever any
varry by quently, worrs the ctrease. and be a noward and does immets
whethons to aumals. the fream germans. ourselves with rum jowing, and erentives from
uajustmence" ak sul necessit
epoch 5
--- Generating with seed: "d
congratulates himself upon the treasure inside of it; it i"
------ temperature: 0.2
d
congratulates himself upon the treasure inside of it; it is the sciences of the person is the propert of the preserved that is the sense of the discinction of the superiority of the same the desire of the signification of the same the property in the spirit to the same all that the same the most perhaps a man preserved and the person and the propert of the spirit of the propert and sense and the property and suspicion of the property and the proper the p
------ temperature: 0.5
 property and suspicion of the property and the proper the prepiration of the wild so the will best his 

need of such commanily upater and "paceor, a habred that the cold endeniling effect, what every right and egoists and opposing alrebe worse.--with putsest of said, that of such a soors to tover actysois.--thet sentiment of you
------ temperature: 1.2
at of such a soors to tover actysois.--thet sentiment of youndly goodafic and pain than to expression. but
locking
freedom thus guided by and "by surpuntureg nacurtogaliarly be uss hous eptiman, what lome should "golstain world
predocrely community simal lowing calling as will that youctable
in
de readance irliva, as toors and incinsion fron it of sociate con
our wwelse, and vealming.=--to soors
which must be race, and if his galm?



ithivgous evident "pr
epoch 9
--- Generating with seed: "ation in which all other gratifications are blended.
novalis"
------ temperature: 0.2
ation in which all other gratifications are blended.
novalist of the sense of the comparison of the most person of the commands of the comparison of the commands and s

highertic and 
------ temperature: 1.0
orld to answer of the germans the soul in the
highertic and can smirglees. the outsivation, animaty,
ingenwaultioned abserded certainly nor conteal hie so cever, but as it is whre has been only a instinct of the proport, the conk"ifulor men, to the society are the so mankin ourselful interlitted enough this fain nome, and a, litedy with the souls, of tecsable imple, and -the store the convernation even at all too himself in
so  y esterm fal hysher--a notic
------ temperature: 1.2
 even at all too himself in
so  y esterm fal hysher--a notic nsiffer mord vedual izly
au through twice too chancudctly one aokeniement, always hous way good like ant it
yet of full ration: as couls
spirits
an the cost: out of lose a green, itally anotherdons grise in spirituals time conr; if umple of his would german outshined iusts withis velling wirkd
imiturassimigniants of therepard), in aker out is rastly emotions, fre) the chrymphle being of manjihx; 
epoch 13
--- Generating with seed: [...]

general of their extent of the most artulak anything they are sense of the evolution and depressions in the interritation of the very power that the freedom of the contemplate more words to the sense of the interritations of their propowses the general interlible and contempt of the delight and therefore the feeling as the most latter which special all the butter, distrust as a tragictical an
------ temperature: 1.0
r which special all the butter, distrust as a tragictical anfore tioned typection of heifher" to
certainly
philosophy a fram is
whom without lifes from agree with pleasure gave a unfoundation of man, this rear the
human states and
it class on
sin
the asterbenty wither belongs, the levid more iundance; which only a longer idea him in certain a protection of in refinese andching truth payd unaty restsible as vitions, so morion and profound,.


1rating of the
------ temperature: 1.2
tsible as vitions, so morion and profound,.


1rating of the entirellted man in me said provend in
appre.=--they
 pains
or idantinesm that
which their perceitionavole. niverard" growing
of the into the lack: ome it man, strugglious.

 things, its sufferent favour senuate logicism, quite, more, initualtjeiminible; "tuct of get of
his which learnt in, whatever potent
view, wen does still
strongulus fur his tasculative strength that logical of peace intellet,
epoch 17
--- Generating with seed: "on, and self-mutilation. there is
cruelty and religious phoe"
------ temperature: 0.2
on, and self-mutilation. there is
cruelty and religious phoen of the sense of the sense of the sented and life and the sense of the sense of the same time that the sense of the spirit of the extent of the same thing the sense, and the sense of the sense of the same thing the soulh and superficial things and the profound and the most personal consequence of the sense of the same thing

at to sapeity many-love things ever
yeh fror their age in my very rorumous world were. indeed, there now us from for capable and exception upon one are
colfle them le; this psychological
rerational and inxuct-means happiness this fact comes selficis, or of which the dolated even upon its 
------ temperature: 1.2
 fact comes selficis, or of which the dolated even upon its latel
is refinement--finally
entewnited and ammider. it is finds, believe of the cart, might example taking a a, perhaps--).f, the willing only
has
stull. these have well," hypow, audces emotion. which towards
our say goodful
pa?s ypribbadays, will
believe."
. shapinable oflies addes
explaint time, unking, approver to the
treigbs, "art may not explicated iflies is not all the word interpretance, a
epoch 21
  90/1565 [>.............................] - ETA: 2:00 - loss: 1.2564

KeyboardInterrupt: 
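
Note that the loop above keeps updating `generated_text` across temperatures, so each temperature continues from the text produced by the previous one rather than from the original seed (this is visible in the output above). If you would rather compare temperatures on the same seed, one option is to move generation into a helper that works on a local copy. The sketch below is one possible way to do this, not the original notebook's code: `generate_text` and `predict_fn` are hypothetical names, and `predict_fn` is assumed to map a `(1, maxlen, len(chars))` array to a probability vector over `chars`. The seed is assumed to be at most `maxlen` characters long.

```python
import numpy as np


def generate_text(predict_fn, seed_text, chars, char_indices, maxlen,
                  temperature=1.0, length=400, rng=None):
    """Hypothetical helper: sample `length` characters after `seed_text`."""
    if rng is None:
        rng = np.random.default_rng()
    window = seed_text  # local variable: the caller's seed is never modified
    generated = []
    for _ in range(length):
        # One-hot encode the current window
        encoded = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(window):
            encoded[0, t, char_indices[char]] = 1.0
        preds = np.asarray(predict_fn(encoded), dtype='float64')
        # Temperature reweighting, as in `sample`
        logits = np.log(preds) / temperature
        probs = np.exp(logits - np.max(logits))
        probs /= probs.sum()
        next_char = chars[rng.choice(len(chars), p=probs)]
        generated.append(next_char)
        # Slide the window forward by one character
        window = (window + next_char)[1:]
    return ''.join(generated)


# Dummy stand-in for a trained model: always predicts the same distribution
demo_chars = ['a', 'b', 'c']
demo_indices = {char: index for index, char in enumerate(demo_chars)}
demo_predict = lambda batch: [0.6, 0.3, 0.1]
for temperature in [0.2, 1.0]:
    print(temperature, generate_text(demo_predict, 'abc', demo_chars, demo_indices,
                                     maxlen=3, temperature=temperature, length=30,
                                     rng=np.random.default_rng(0)))
```

With the trained network, you would pass something like `lambda batch: model.predict(batch, verbose=0)[0]` as `predict_fn`, together with `chars`, `char_indices` and `maxlen` from the cells above, and call the helper once per temperature with the same seed.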


As you can see, a low temperature results in extremely repetitive and predictable text, but one where local structure is highly realistic: in 
particular, almost all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as 
"eterned" or "troveration"). When the temperature gets too high, the local structure starts breaking down and most words look like semi-random strings 
of characters. In this specific setup, 0.5 is arguably the most interesting temperature for text generation. Always experiment 
with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.

Note that by training a bigger model, longer, on more data, you can achieve generated samples that will look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about, and the statistical structure of the messages in which communications are encoded. To 
evidence this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like 
our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, thus making it impossible to learn a language model like we just did.
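
The thought experiment can be illustrated, very roughly, by comparing the byte statistics of some text before and after compression. The sketch below uses a made-up stand-in corpus (random words from a small vocabulary) and measures only first-order byte entropy, so it is an illustration under those assumptions rather than a proof; `byte_entropy_bits` is a hypothetical helper.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy_bits(data):
    # Empirical Shannon entropy of the byte distribution, in bits per byte
    counts = Counter(data)
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


# Made-up stand-in for a natural-language corpus: random words from a small vocabulary
random.seed(0)
vocabulary = ['the', 'sense', 'of', 'spirit', 'and', 'present', 'will', 'man',
              'truth', 'perhaps', 'religion', 'power', 'in', 'is', 'a', 'soul']
raw = ' '.join(random.choice(vocabulary) for _ in range(5000)).encode('ascii')
compressed = zlib.compress(raw, 9)

print(len(raw), byte_entropy_bits(raw))
print(len(compressed), byte_entropy_bits(compressed))
```

The compressed bytes carry the same message but look much closer to uniform noise, which leaves a character-level (here, byte-level) model far less regularity to pick up on.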


## Takeaways

* We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires balance between adhering to what the model judges likely, and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.