In [6]:
import tensorflow
tensorflow.__version__

'2.3.1'

# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas into practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files -- Wikipedia, the Lord of the Rings, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model 
we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the 
English language.

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [7]:
import numpy as np

path = tensorflow.keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893



Next, we will extract partially-overlapping sequences of length `maxlen`, one-hot encode them and pack them in a 3D Numpy array `x` of 
shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare an array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.

In [8]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = dict((char, chars.index(char)) for char in chars)

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# Use the built-in `bool`: the `np.bool` alias was removed in NumPy 1.24
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...
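
To make the windowing concrete, here is a minimal pure-Python sketch on a toy string. The helper name `make_windows` and the toy text are assumptions made for illustration; they are not part of the original notebook. The last lines reproduce the sequence count printed above from the corpus length, `maxlen` and `step` alone.

```python
def make_windows(text, maxlen, step):
    """Return (inputs, targets): each input is `maxlen` characters long and
    each target is the single character that immediately follows it."""
    inputs, targets = [], []
    for i in range(0, len(text) - maxlen, step):
        inputs.append(text[i: i + maxlen])
        targets.append(text[i + maxlen])
    return inputs, targets


# Toy example (hypothetical text, tiny window so the result is easy to read)
toy_inputs, toy_targets = make_windows("hello world", maxlen=4, step=3)
for window, target in zip(toy_inputs, toy_targets):
    print(repr(window), '->', repr(target))

# The number of windows depends only on the three sizes involved.
# With the corpus length, maxlen and step used in this notebook:
n_sequences = len(range(0, 600893 - 60, 3))
print('Number of sequences:', n_sequences)
```

Because `step` is smaller than `maxlen`, consecutive windows share most of their characters, which is what "partially-overlapping" refers to.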


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier and softmax over all possible characters. But let us note that 
recurrent neural networks are not the only way to do sequence data generation; 1D convnets have also proven extremely successful at it in 
recent times.

In [10]:
from tensorflow.keras import layers

model = tensorflow.keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
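
As a sanity check on the model size, the parameter counts can be worked out by hand. This sketch assumes the standard Keras `LSTM` layer with biases enabled (four gates, each with an input kernel, a recurrent kernel and a bias) and the 57 unique characters reported above. The variable names are illustrative.

```python
units = 128    # LSTM units, as in the model above
n_chars = 57   # unique characters reported for this corpus

# Each of the 4 LSTM gates has an input kernel (n_chars x units),
# a recurrent kernel (units x units) and a bias vector (units).
lstm_params = 4 * (units * (n_chars + units) + units)

# Dense softmax layer: one weight per (unit, character) pair
# plus one bias per character.
dense_params = units * n_chars + n_chars

total_params = lstm_params + dense_params
print('LSTM:', lstm_params, 'Dense:', dense_params, 'Total:', total_params)
```

If these assumptions hold, the numbers should agree with what `model.summary()` reports for the two layers.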

Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:

In [11]:
# `learning_rate` is the current argument name; `lr` is a deprecated alias
optimizer = tensorflow.keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
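
To see what this loss measures, here is a small worked example for a single sample. With a one-hot target, categorical cross-entropy reduces to the negative log of the probability the model assigned to the correct character. The 4-character vocabulary and the probabilities are made up, and the function is a simplified stand-in rather than the Keras implementation (which also averages over the batch).

```python
import math


def categorical_crossentropy(one_hot_target, predicted_probs):
    """Cross-entropy between a one-hot target and a predicted distribution
    (per-sample value, natural logarithm)."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, predicted_probs) if t)


target = [0, 0, 1, 0]                  # hypothetical 4-character vocabulary
confident = [0.05, 0.05, 0.85, 0.05]   # most mass on the correct character
unsure = [0.25, 0.25, 0.25, 0.25]      # uniform guess

print('confident:', categorical_crossentropy(target, confident))  # -log(0.85)
print('unsure:   ', categorical_crossentropy(target, unsure))     # -log(0.25)
```

The loss is lower when more probability mass lands on the true next character, which is exactly what training pushes the model toward.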

## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

* 1) Drawing from the model a probability distribution over the next character given the text available so far
* 2) Reweighting the distribution to a certain "temperature"
* 3) Sampling the next character at random according to the reweighted distribution
* 4) Adding the new character at the end of the available text

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [12]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
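
To get a feel for what the reweighting step does, the sketch below applies it to a small made-up distribution at the four temperatures used later in this notebook. It is a pure-Python restatement of the first half of `sample`, not a replacement for it. The name `reweight` and the example probabilities are assumptions, and it assumes all probabilities are strictly positive. Dividing the log-probabilities by the temperature is equivalent to raising each probability to the power `1 / temperature` and renormalizing.

```python
import math


def reweight(probs, temperature):
    """Apply softmax temperature to a probability distribution."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)  # subtract the max before exponentiating, for stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


original = [0.5, 0.3, 0.15, 0.05]   # hypothetical model output
for temperature in [0.2, 0.5, 1.0, 1.2]:
    reweighted = reweight(original, temperature)
    print(temperature, [round(p, 3) for p in reweighted])
```

Temperatures below 1 concentrate the mass on the most likely character, a temperature of 1 leaves the distribution unchanged, and temperatures above 1 flatten it.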


Finally, this is the loop where we repeatedly train and generate text. We start generating text using a range of different temperatures 
after every epoch. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of 
temperature in the sampling strategy.

In [None]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
--- Generating with seed: "ally, to
heal our ills, or strive and trouble ourselves to g"
------ temperature: 0.2
ally, to
heal our ills, or strive and trouble ourselves to genis and the presention of the consequent of the selved the more of the something of the servented the selved and an in the science of the same and more of the consequently of the something and the man the something of the something of the something and the something and an and moral and an and entires and of the something the consequent to the science of the something and the consequent to the st
------ temperature: 0.5
to the science of the something and the consequent to the strong but to the lived to the complects, and as such of the opposelty and the one the something of the
present of the something of the science, the precences and on the such and judgity of allost an antathy of the spirits of the sense in the present do well prose of willing of to a nother an in an inconselved of the conscience of way

spibilitied than bgous ay look saie thousanci that, of the freed can be let its moral as  wherethern horsess  or de. 
-hicher
own
aga now prescience which
as asmord bything-whoed heavepentarble. to emousient virtus, man, wh thingful? cond
that they sonliving y, of the
brecosppaned than "begin met, whatert themselves he
ca
epoch 5
--- Generating with seed: "ers, presumptuous
pitiful bunglers, what have you done! was "
------ temperature: 0.2
ers, presumptuous
pitiful bunglers, what have you done! was the contemplanic of the same the more the man of the sense of the man of the sense of the same and the more the subtle for and distingual with the delicate and the world to the same and the spirit the man of the same the spiritual the sound of the same and the sense of the world and the soul of the more the same and more who has a deal and the standard of the same and the present to the world to t
------ temperature: 0.5
d the standard of the same and the present to the world to the farance

orders on preferm with our osuncthentful
histocinical at man secaused genius of most usually a selfishe
------ temperature: 1.2
stocinical at man secaused genius of most usually a selfished with the forces for is  obseratows, which
hfething with it might
the rarain,--and lits dolitical stand
hitherto "grandics: every causily good quitiogading aralblent,
he show had nctupacy long(thed what the undenstandly samkentice,--he offantiment, exculstion of that he do feat psychologis"s, there-are)s--and the dangerouncy
clums of philosocheroned--and goest finy_ meaning 
unperperted a disapin
epoch 9
--- Generating with seed: "amid them the memory
itself seems to become clouded by the r"
------ temperature: 0.2
amid them the memory
itself seems to become clouded by the religious spirit of the same and precisely the same things of the strength and the predection of the sense of the subjusts of the religious precisely the strength of the fact of the same and the discovered the present the subjust t

we happy not: "that that
his tackence have so impilition of plemodure
and
the same widdit w's predicsed--and scandcatiblenefuish, etmiating stupidity,
spirins, and wisudionly ecorve defensess or which a bask, he of minory danger: its way of purithy, incirouse--time sanking egoistic extent regard as at thesician
been becomesness and these ideas" and
there are h
------ temperature: 1.2
 thesician
been becomesness and these ideas" and
there are himself.=--the christied--when an influent saints:--the more. too yet browring of charations,
into asiatitaciong us to be joying or explainedmente heam. yet type stupided guart: a hager lobition, "e srace
8-nature of babable pscruth wern repreciates, we
our joy whiched ; and fabreaked, the utilition has dispalsed edetially-ancraer
ofric pain the heronking amave? dor ask--if all collisw rediverintki
epoch 13
--- Generating with seed: "in virtue the rest
and peace of the soul. that is why it is "
------ temperature: 0.2
in virtue the rest
and peace o



ception
attained the read had kill as a consequently?--no ochip and of the
educated: towards another, and it'er'e. they emed at he say, and agreling that
passioned by thinking
in in hard to
naitg ond perhaps
to
german
conscible of no self-rationation--and it is at
------ temperature: 1.2
aps
to
german
conscible of no self-rationation--and it is attained relat that is grateful as laverloc, by general banes that the unc: and this isfor: it mutghl: however, what
dispaistogrent?, that
also a little come
conjusts respons! of his evoluteas, to out. finally provided wible)s:
mosealary
entas what rey,
in those, this of their tfatter--scho
followfiss. be pain of condition of
upon some farasion that it., wave as man, in veluled, and upon powerful, o
epoch 16
--- Generating with seed: "e streams
with a hundred sources and tributaries. here again"
------ temperature: 0.2
e streams
with a hundred sources and tributaries. here against the subjugition of the senses of the same the more consequence of

prighd confursion of man, in virtue of a hands the hand to the lacks bit of the most conduct of the "the fasting to an absulting the possibility and some an intellect in the command: 
------ temperature: 1.0
lting the possibility and some an intellect in the command: in
things
which does not
whyre
only reterpted to the considerationably darking was womanly hearts. whether a goddly the tending civilizationable rade, houdaride much reed in proper thing with for them educe and olone.


11



7 virs meand and his fine  is
hitherto say. themen uarive humann kan: when
"you, who will have purpovers chances: in
those does
says? and becausion of a thus, with time in th
------ temperature: 1.2
in
those does
says? and becausion of a thus, with time in the mew: grespest was hessing of hlact" upon the objems), as iw. contline an my seadiane-let suprempociated blosing perhaps become, for the act that to under, love
of practic blailed only been within heffert and magic and bus,y ecligame,

faculty th

f the parade and the sense of the sense of the soul of the sense for being to say the reality of the neared by the again; the so man is
the suffer and precisely the for self works the world with self actable in the sense of ordered and constitute sense of the impression that they would the demonsterst of confound to the manule of the comparation of the best of distrust is also the trangling with the possible and position of the suppose and moral french it 
------ temperature: 1.0
he possible and position of the suppose and moral french it faks!

1811

=nicpantine the sected actions--was do soridies, undistone to stands
it as from su"fach artier animal prefemants to allow to salmand to made many accture the emeritification thereait men understood! but the man wond
attained by with seaver--by which has it in threihs. but imperse, in
a firsh: a subxure toss, or the a pethrabed, yes theorwises, to haw, rank stateful and virtus unaction
------ temperature: 1.2
d, yes theorwises, to haw, ran

some such and the most and contradictest the conscience, and who are the soul is not been souls to the spirit of the same time the delicate the conception of the conception of the commander and the sense of the command--and the moral the spirit, and the most and wish the proper to the command, the spirit of the spirit of the command--and in the conception of the most soul the problem of the concepti
------ temperature: 0.5
 the conception of the most soul the problem of the conception of spirit of the most will to could be complated us. the subtle. they called but in expression of the german said the reality of the blasious respect to an artists and believe an aming so ever in the conscience, which man thought. it is not more moral, the could be a desire to similar like of such things leads all the contradiction of the comperative and believe on the german present by the sou
------ temperature: 1.0
the comperative and believe on the german present by the soul. there is appear to sam.



sccegated one perfe"ted to unfear, th
epoch 31
--- Generating with seed: "le demeanour.

132. one is punished best for one's virtues.
"
------ temperature: 0.2
le demeanour.

132. one is punished best for one's virtues.


12

=soul the conceal and the same a man is also and what is a desires of the same and the spirit of the sense of the same and what is the sense of the same the same all the same and not a man of the same and and the more and the conceals and the same and the sense of the strong the same a discovered to present and conceals and concealed and all the world and the strong the same and the sense of
------ temperature: 0.5
d and all the world and the strong the same and the sense of the concealed the desirable for the will and bling to the origin and discovered and beautiving obliged that the error and the last formated and dence be a moral and dangerous or friendshort, have always be possess of self-contrary a no distinction of my more here always accapuses in the old me

sympathy recousaus
exactic and growns, who make
be false. fror sufferes to meitides." here- nyec). as this overngerm"--petixbar), long
depriving crueltion, their ornings--as he weice
thar t make to
still
esprech, .
evely "feel, tonough looky, it is a highd the mingrusiness. but veneis, a cruelty to lith.



epoch 35
--- Generating with seed: "unprejudicedness of his schooling, owing to
the immense vari"
------ temperature: 0.2
unprejudicedness of his schooling, owing to
the immense variety of the sense of the same an entired in the consequently the words of the sense of the sense of the soul there is a host of the same than the world of the sense of the soul, the sense of the suffering the soul there is to say the consequently there are something of the sense of the same that the same the sense of the sense of the world and consequently and delight and the world of the soul of t
------ temperature: 0.5
 and consequently and delight and the world of the soul of the words of the corresis

which in the aspeciation though a good ascrofung, self-contrawes: is its "game with it genall not in discipline with his perceivingunes which has,
he should misundersolation of the end of suc
------ temperature: 1.2
unes which has,
he should misundersolation of the end of such
about them at la"s he tell-tonches man, our a maschatic confented admin benemes: i     litting degrees
does becaures.

1=nious embelt, or, open
anothorated ere fundamental affecred upurilyser that has its most bitulul better:
subtle
be believe age.


to antrrate-tice
for say, the relation
became eyes in men rationy, aristors is a knowledge,
will support, them forewhill ultilit aiving, for
in all
epoch 39
--- Generating with seed: "e passions are once more enfranchised, provided that...; or,"
------ temperature: 0.2
e passions are once more enfranchised, provided that...; or, and in the man and in the fundamental problem of the commanding of the same with the man and problem of the suffering and being and proper t

nd sense of the sense of the delight of the presented by these most meo who more. if unduce all then prefement of the fathing attains pleasants each hought--is the feeling of the heart of to sign as "the against ut. and in
newings with germandun a
pails more trypexhements
ahd by freedomsblasonated, obler predicay,
purpose to go mades mystiquied here, therefore upon the very one, the
higher but preferent beginssest without it make them
again of motivesshalo
------ temperature: 1.2
ferent beginssest without it make them
again of motivesshalous ratience all fatetom averally, and every stroof strained very refradedness is acts he is no puncustom, or on, in away of european isoling look of olden
indeed, to-cratesto's ideastful and pleasing when attained indiside madne unintur of the histvere, however, a mad opinion style about himself live upon into their toperity philosopher and false commands--does! looves it,
they good, but with thei
epoch 43
--- Generating with seed: "nsive in badness.=

he higher and all the subjects and the same and something of the feels be probably the perhaps its evil and the suremid in the repudiated in the present and precisely such and made the man, something still reconcince of the soul as the distinguished in the most errorad, in the subjects with the precisely the master is all
to the houseless a far the masters of the distinguished to give it is all the desire in the reality: and with this state of the present 
------ temperature: 1.0
e desire in the reality: and with this state of the present in the weaker man on shame
logic iviled. as they wishis of
husing orly forcefer
the educatic elwers and intellect, abone of the distrangity with the supimic all that the nuly to self-noble standard to sense to pempletize befor the will". in the do and literment of theor type, refused by victous concerts whorear means of
antiquity is into creaturally to one would more in the
pemirance of the presen
------ temperature: 1.2
creaturally to one would more 

wintere congeaf th4m thee thee ther wen txievended the of thee andal(avxxane and the  a
( an theen and ast on the on fals thpe the and to mame tvery and the a: thy be centh 3uncey the alk andins thecofer th to there a fau t bred the a fe wh ? 
of-the 
------ temperature: 0.5
ins thecofer th to there a fau t bred the a fe wh ? 
of-the arly allif the is ofrd denter the of the ore a
che of mhe
iss atciher me it an the thee and allez hone tijower be ofe iml it or ont reter andheh aodlers of they wereleer w!ar faun-tekest moal cced in so they andatfe fere "off--the nigecthon at) ou cisst theip of
meaye hin thedepem tasi(telre can os, thetinn in of nheured wever-fechaynis if ech as gat tore abd anduyed perelad if in u and
abte thate
------ temperature: 1.0
f ech as gat tore abd anduyed perelad if in u and
abte thater ist
thuo btuiluvleensor icosebityg andfominol tnot cotde veru= the  estelhf fisise, and in taur toters to homu bese, assemeovel whi d way
truot co s'r'-cst,g
nechs willh hagheis


As you can see, a low temperature results in extremely repetitive and predictable text, but one where local structure is highly realistic: in 
particular, most words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as 
"opposelty" or "judgity"). With the highest temperatures, the local structure starts breaking down and most words look like semi-random strings 
of characters. In this specific setup, 0.5 is arguably the most interesting temperature for text generation. Always experiment 
with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.
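
One detail of the loop above matters when comparing temperatures: `generated_text` is not reset between them. Only the first temperature starts from the printed seed, and each later one continues from the last 60 characters produced by the previous one, as the output shows. A minimal sketch of a helper that restarts every temperature from the same seed is given below. The names `generate`, `sample_index` and `toy_predict` are hypothetical, and `toy_predict` is a stand-in for the trained model so that the example runs on its own.

```python
import math
import random


def sample_index(probs, temperature, rng):
    """Reweight `probs` by `temperature` and draw one index at random."""
    weights = [math.exp(math.log(p) / temperature) for p in probs]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


def generate(predict_fn, seed_text, chars, length, temperature, rng):
    """Generate `length` characters, feeding back a sliding window of
    len(seed_text) characters. `predict_fn(window)` must return one
    strictly positive probability per entry of `chars`."""
    window = seed_text
    output = []
    for _ in range(length):
        probs = predict_fn(window)
        next_char = chars[sample_index(probs, temperature, rng)]
        output.append(next_char)
        window = window[1:] + next_char   # drop oldest char, append newest
    return ''.join(output)


# Stand-in for the trained model (an assumption, for illustration only)
chars = ['a', 'b', 'c']


def toy_predict(window):
    return [0.1, 0.2, 0.7] if window[-1] == 'a' else [0.8, 0.1, 0.1]


rng = random.Random(0)
for temperature in [0.2, 0.5, 1.0, 1.2]:
    # Every temperature starts from the same seed text
    print(temperature, generate(toy_predict, 'abc', chars, 40, temperature, rng))
```

With the real model, `predict_fn` would one-hot encode the window and return `model.predict(...)[0]`, as the loop above does inline.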

Note that by training a bigger model for longer on more data, you can achieve generated samples that will look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about, and the statistical structure of the messages in which communications are encoded. To 
illustrate this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like 
our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, thus making it impossible to learn a language model like we just did.
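
The thought experiment can be illustrated roughly with a general-purpose compressor. The sketch below builds a stand-in "text" from a small made-up vocabulary (an assumption, so that the example needs no download), compresses it with `zlib`, and compares the empirical entropy of the byte frequencies before and after. Single-byte frequencies are only a crude proxy for statistical structure, so treat this as an illustration rather than a measurement.

```python
import math
import random
import zlib
from collections import Counter


def entropy_bits(data):
    """Empirical entropy of the byte distribution of `data`, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Hypothetical stand-in for natural-language text: random words drawn
# from a small vocabulary, separated by spaces.
rng = random.Random(0)
vocabulary = ['the', 'sense', 'of', 'soul', 'and',
              'spirit', 'world', 'same', 'man', 'moral']
raw = ' '.join(rng.choice(vocabulary) for _ in range(20000)).encode('ascii')
compressed = zlib.compress(raw, 9)

print('raw:        %6d bytes, %.2f bits/byte' % (len(raw), entropy_bits(raw)))
print('compressed: %6d bytes, %.2f bits/byte'
      % (len(compressed), entropy_bits(compressed)))
```

The compressed stream carries the same message in fewer bytes, and its byte frequencies are much closer to uniform, leaving far less regularity for a next-symbol model to pick up.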


## Takeaways

* We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires balance between adhering to what the model judges likely, and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.
