In [1]:
import keras
keras.__version__

'2.6.0'

# Text generation with LSTM

This notebook contains the code samples found in Chapter 8, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.

----

[...]

## Implementing character-level LSTM text generation


Let's put these ideas in practice in a Keras implementation. The first thing we need is a lot of text data that we can use to learn a 
language model. You could use any sufficiently large text file or set of text files -- Wikipedia, the Lord of the Rings, etc. In this 
example we will use some of the writings of Nietzsche, the late-19th century German philosopher (translated to English). The language model 
we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the 
English language.

## Preparing the data

Let's start by downloading the corpus and converting it to lowercase:

In [29]:
# import keras
import tensorflow as tf
import numpy as np

path = tf.keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
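# Read with an explicit encoding and close the file handle when done
with open(path, encoding='utf-8') as f:
    text = f.read().lower()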
print('Corpus length:', len(text))

Corpus length: 600893


In [31]:
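import os

print('1:',os.getcwd()) # current working directory
print('2:',os.path.abspath('.')) # current working directory
print('3:',os.path.abspath('test.txt')) # absolute path of a file name resolved against the current working directory
print('4:',os.path.abspath('..')) # parent of the current working directory (note: the parent directory)
print('5:',os.path.abspath(os.curdir)) # current working directory
# import sys
# print(sys.argv[0])
# Output
# Local paths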

1: E:\JupyterNotebook\Python 深度学习\first_edition
2: E:\JupyterNotebook\Python 深度学习\first_edition
3: E:\JupyterNotebook\Python 深度学习\first_edition\test.txt
4: E:\JupyterNotebook\Python 深度学习
5: E:\JupyterNotebook\Python 深度学习\first_edition



Next, we will extract partially-overlapping sequences of length `maxlen`, one-hot encode them and pack them in a 3D Numpy array `x` of 
shape `(sequences, maxlen, unique_characters)`. Simultaneously, we prepare an array `y` containing the corresponding targets: the one-hot 
encoded characters that come right after each extracted sequence.

In [9]:
# Length of extracted character sequences
maxlen = 60

# We sample a new sequence every `step` characters
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (the follow-up characters)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
# Dictionary mapping unique characters to their index in `chars`
char_indices = dict((char, chars.index(char)) for char in chars)

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# `np.bool` was deprecated and later removed from NumPy; the builtin `bool` is equivalent
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 58
Vectorization...
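
To make the array layout concrete, here is a minimal sketch of the same windowing and one-hot encoding applied to a toy string. The helper `vectorize` and the names `toy_text`, `toy_x`, `toy_y` and `toy_chars` are assumptions introduced for illustration; they are not part of the original notebook.

```python
import numpy as np

def vectorize(text, maxlen, step):
    """Slice `text` into overlapping windows and one-hot encode them."""
    chars = sorted(set(text))
    char_indices = {c: i for i, c in enumerate(chars)}
    starts = range(0, len(text) - maxlen, step)
    x = np.zeros((len(starts), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(starts), len(chars)), dtype=bool)
    for i, start in enumerate(starts):
        for t, char in enumerate(text[start:start + maxlen]):
            x[i, t, char_indices[char]] = True
        # The target is the character right after the window
        y[i, char_indices[text[start + maxlen]]] = True
    return x, y, chars

toy_text = "hello world"  # 11 characters, 8 of them unique
toy_x, toy_y, toy_chars = vectorize(toy_text, maxlen=4, step=3)
print(toy_x.shape, toy_y.shape)  # (3, 4, 8) (3, 8)

# Decode the first window and its target back into characters
first_window = "".join(toy_chars[k] for k in toy_x[0].argmax(axis=1))
first_target = toy_chars[toy_y[0].argmax()]
print(repr(first_window), "->", repr(first_target))  # 'hell' -> 'o'
```

Each row of `toy_x[i]` contains exactly one `True`, which is what lets `argmax` recover the original character.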


## Building the network

Our network is a single `LSTM` layer followed by a `Dense` classifier and softmax over all possible characters. But let us note that 
recurrent neural networks are not the only way to do sequence data generation; 1D convnets also have proven extremely successful at it in 
recent times.

In [10]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
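
As a rough check on model size, the arithmetic below estimates the number of trainable parameters. It assumes the standard LSTM parameterization with biases and the sizes reported above (58 unique characters, 128 units); the variable names are illustrative. If those assumptions hold, `model.summary()` should report the same total.

```python
# Assumed sizes, taken from the cells above: 58 unique characters, 128 LSTM units
num_chars = 58
units = 128

# Each of the 4 LSTM gates has an input kernel, a recurrent kernel and a bias
lstm_params = 4 * ((num_chars + units) * units + units)
# Dense layer: one weight per (unit, character) pair plus one bias per character
dense_params = units * num_chars + num_chars

total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)  # 95744 7482 103226
```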

Since our targets are one-hot encoded, we will use `categorical_crossentropy` as the loss to train the model:

In [14]:
import tensorflow as tf
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
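
For intuition about what this loss measures, the sketch below computes categorical cross-entropy by hand in NumPy for a single one-hot target. The function name `categorical_crossentropy_np`, the `eps` clipping value and the toy probabilities are assumptions made for illustration, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and a predicted distribution."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)))

y_true = np.array([0.0, 1.0, 0.0])        # the true next character is index 1
confident = np.array([0.05, 0.90, 0.05])  # most probability mass on index 1
unsure = np.array([0.40, 0.20, 0.40])     # little probability mass on index 1

loss_confident = categorical_crossentropy_np(y_true, confident)
loss_unsure = categorical_crossentropy_np(y_true, unsure)
print(loss_confident, loss_unsure)
```

With a one-hot target the sum reduces to the negative log-probability assigned to the correct character, so the confident prediction yields the smaller loss.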

## Training the language model and sampling from it


Given a trained model and a seed text snippet, we generate new text by repeatedly:

* 1) Drawing from the model a probability distribution over the next character given the text available so far
* 2) Reweighting the distribution to a certain "temperature"
* 3) Sampling the next character at random according to the reweighted distribution
* 4) Adding the new character at the end of the available text

This is the code we use to reweight the original probability distribution coming out of the model, 
and draw a character index from it (the "sampling function"):

In [15]:
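def sample(preds, temperature=1.0):
    # Work in float64 to limit rounding error when re-normalizing
    preds = np.asarray(preds).astype('float64')
    # Rescale log-probabilities: temperature < 1 sharpens, > 1 flattens
    preds = np.log(preds) / temperature
    # Exponentiate and re-normalize so the values sum to 1
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample; the result is a one-hot vector over characters
    probas = np.random.multinomial(1, preds, 1)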
    return np.argmax(probas)
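
To see what the reweighting step does on its own, here is a small sketch that applies the same log, divide and re-normalize operations to a toy distribution without drawing a sample. The helper `reweight_distribution` and the three-element `toy_probs` are assumptions introduced for illustration.

```python
import numpy as np

def reweight_distribution(probs, temperature):
    """Same log / divide / re-normalize steps as `sample`, without the draw."""
    logits = np.log(np.asarray(probs, dtype="float64")) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)

toy_probs = [0.6, 0.3, 0.1]  # assumed toy distribution over three characters
for temperature in [0.2, 1.0, 2.0]:
    print(temperature, np.round(reweight_distribution(toy_probs, temperature), 3))
```

A temperature of 1.0 leaves the distribution unchanged, values below 1.0 concentrate probability on the most likely character, and values above 1.0 push the distribution toward uniform.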


Finally, this is the loop where we repeatedly train the model and generate text. We start generating text using a range of different temperatures 
after every epoch. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of 
temperature in the sampling strategy.

In [32]:
import random
import sys

for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)

    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        # We generate 400 characters
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
--- Generating with seed: "d therefore
has not taken and does not take root in german h"
------ temperature: 0.2
d therefore
has not taken and does not take root in german heart of the spirit and sacrificity of the state of the spiritual to be also interestive of the more account, and the artists and self-short of the spirit of the same and sense of the same and self-sentiment of the state of the spirit and self-exists the same the state of the spirit and self-sentiment of the developed and the contraling to the same intentional and self-evil the developed in the pre
------ temperature: 0.5
 the same intentional and self-evil the developed in the present in the german consequently for impulses under the man in last a some man, and when in the more spirits of the continually to be a same present the senses the faith, in the contempt to one of the herable and conduct is a yet been interarary in itself the pride, one is the fact that the person as the present man is to be seems to

gloomarf the kood as it uponsibbeleentness of meofs of herver euroophishzation of by a general excite precisely friend "slamence exsenedings, ochar, ratef--names in vely bust, that the trmood." are sectionusg, th" vulgable
wula clofy naman, which after they differencely opinion from one's sei. brear moral colllarg, i must would be unuel keew or
inventiymence
agomingly (wemoriy instinct, or, east for particicately. to acausing of which
is himself, rated
xyn
epoch 5
--- Generating with seed: " not to speak of the
wicked who are happy--a species about w"
------ temperature: 0.2
 not to speak of the
wicked who are happy--a species about which they have the stronger and self-religion and self-soul and in the sentiments and the present the strength of the subjection of the sense of the strength of the strength of the present the still and the present the fact themselves to the sense of the subjection of the reason and the man and senture of the spirit and interpretation of the state of the s

conclusion in regide to platest, from thoreing wholly, but instinct and hen

chesolitives, and could go with pually shar "give the take tarbibiess the inclules and unavotious
commandert is individual worlds he still be for
self derusical spite around further interpretation is
------ temperature: 1.2
be for
self derusical spite around further interpretation is lies even ty, which he is manifess thing, stoness
originated badly, utgo to
inexplind a daveine,
fostity,
at his
soul it
evitm, and
theihntked proper tenaciinely of the tackilars" mare, formerly that it who increate his valually first day
craaving strift among the kingublatily
shuplibility where; , were already been been the
falies is , without one
can unusly sority. when inverly friends! who so 
epoch 9
--- Generating with seed: "e who has (or discloses)
an idea of the fact that philosophi"
------ temperature: 0.2
e who has (or discloses)
an idea of the fact that philosophisher of the same superioric and self-contradists a more a

of the problem of consolity to him an experience and conscience, and also not a more men to wishes to wame who slaver of interpretations, with streng to a gurated in the compulsion of the most ourselves. a problem of pain of such and the complated taste with the sense of the inclination of the faculty soul, we 
------ temperature: 1.0
e with the sense of the inclination of the faculty soul, we hisaring and giks them panchear himself finally begin seexisive most
sufferings in ourverly know guisal domain of a doubt a popularly sunderingauritoring,--conclosry which were men a expedient act not heid they have something course of eposisions
and gaves untray frequently in whole counterings to the kins of
which they, he will taust partorion a spirit. thein which could ewricction
with
vulgarily
------ temperature: 1.2
torion a spirit. thein which could ewricction
with
vulgarily seeen buldly
delu christian trutiling friingly indscaution moantched among to
ourmbol. ione; theirmen,
would not
impa

serves as an amply sufficient and something in the greatest and also the same to the contrary the sense of the proposition of the superiority of the sense of the problem of the contrary the stranger and something in the senses of the sense of the sense of the propermition and conscience of the sense of the sense of the same superseation of the free spirits of the sense of the conduct and also the distinguish and and an
------ temperature: 0.5
the sense of the conduct and also the distinguish and and antificial, and the domain of its
demonstring as though to one will not the state the attached. the first and the
disinterection and for the convince of man of the fathers of the problem and grappines to the sense of the same sense of a distinguish in so he state untruthing of the man absolutely of the races and desires and supersection of the assumed to still be for the free is difficult that the w
------ temperature: 1.0
the assumed to still be for the free is difficult that the wilize re

and then, and never miss important
epoch 20
--- Generating with seed: "n on him. the latter
obeys a superior and hence feels no res"
------ temperature: 0.2
n on him. the latter
obeys a superior and hence feels no restraint to the property of the supersision of the man of the sense of the standard of the more readily a more of the spirit of the more of the profound to the same and desires of the same of the supersictiman the more of the consequence of the strength of the supersine of the sense of the property of the man has always as the deeper and despises a consequence of the sense of the person with the sup
------ temperature: 0.5
spises a consequence of the sense of the person with the supersictimer man of the "goodness of the mistakes of the exacted although as the formultatilable and consideration of the reality speaks of the problem of the conduct and the same and things the worth of an almost opinions, where all the respectance of the fact that the "mind and their reality of th


the examples of the armistakes and the gelood to all the interestice, the have the dangerous and personal sick in the same something of the taste of himself the respectatedly something same is a comparative and superstimes the value of the mass of the most men of contempt to the same to a fact of the of the hands and according to the nation is through and the st
------ temperature: 1.0
 the hands and according to the nation is through and the still so could have the dibernt a "dreame idea to conflatrous to wants moralingsks,  every higher met of minked conceals against the crast a thrie path on every monurnest times of our femoly alcontering to a nourisher just fast pitshing it knows no noory of the graditary for the charm of goethe about food and an to all of that acts to the sexually worked: a thinks securition of the sensition of that
------ temperature: 1.2
exually worked: a thinks securition of the sensition of that (or a conather of acqueitive little, and
nothing to gaiud gorman 

knows how different from the states of the spirit and some the stronger of the spiritual conscience of the sufferer of the man and interest of the spirit of the spiritual sense of the most profoundly and all the spiritual conscience of the spirit of the stronger of the contemption of the sense of the statcey of the stronger of the most souls and also the man who has a power and antide and some only and the state of the stronger of
------ temperature: 0.5
er and antide and some only and the state of the stronger of the soul without good and the sharp see to the greatest of ethical order of the concealed enthution of the spiritual and men and in the lack of allow themselves and in the profoundanss with the good and scould to which once stronger man for nature of the good and the exceried to man
all possession in the scientifications of which only a romanticism of the distrust of the proted entire of expliona
------ temperature: 1.0
romanticism of the distrust of the proted entire of expl

of ppoing, it which acfuenced hoched to-leation of feeling, than thinks with 
epoch 29
--- Generating with seed: "ix or seven great
men.--yes, and then to get round them.

12"
------ temperature: 0.2
ix or seven great
men.--yes, and then to get round them.

12ogue to make them to the spirit is a discourse the world of the contradict of the spirit in the world is also the man is always it have the free spirit is not to the sente because the struced and the spirit is also the spirit in the spirit in the struggle with the spirit in the spirit in the spirit is also the more than the spirit in the spirit in the sense of the read to the death of the world of
------ temperature: 0.5
spirit in the sense of the read to the death of the world of the first the estimate that it is not not all the good tirtue of the good and responsible in the contemplation of the good in the faculty strengended for example, the spirit and soul in the species to the thing in the arise who comparation of a person wh

nless questit ray, of wimman obuited--and this make too
weagned would not doubler
who also,tjoetunt-
discience, and thuster can cognistys and sense pruseng,
passy as one on day, on the
absung, in embralure, it ages, and now thing, sense we
love must be, whoeforth as the put asmisting upon the vanity as !   minllyness by accuritate, for and anhered" himself
aacingest, arice
paton oneme
epoch 33
--- Generating with seed: "necessary, even scarecrows--and it is
necessary nowadays, th"
------ temperature: 0.2
necessary, even scarecrows--and it is
necessary nowadays, the sense of the same distinguished to sends the sense of the same to himself, and the strong to the consequences of the sense of the subtleticism of the same time of the conscience of the states of the same time and such a subtlety of the strong to the interest and sensual power of the same to the entire the entire the father of the same time of the same time and strong of the present strong of the
------ temperature: 0.5
ime o

edears for the hold strangy are training fow, philosophy has to falsion of plet in farron, a known, but and say, who in all this of the isage
value of otiugst grow, bulleagings
could graug tented and
secony-lackard i
------ temperature: 1.2
st grow, bulleagings
could graug tented and
secony-lackard ide: iterense-c.
oh the way, made rights" had perpossisn,
love the law in givet
phrases away phesowed to
conceise, in skeen not debledf everings
for hardes. considatorian wich
unfaugh, of dromy--the german'--sece embrainly or wecon
is facts, only a god
standed the imposes that
by
this evil-endaiming, clinglic, to the begized itvellse; there erron. strong
of repossingly" uto.

12is rehard hopk (mepa
epoch 37
--- Generating with seed: "that the richness of inner, rational beauty always
spreads a"
------ temperature: 0.2
that the richness of inner, rational beauty always
spreads and personal desire the struce. the subttement of the struce and more the spirit of the fancy and the same and the st

spirit of life, is consider the world of the fact of his considered as the knowledge in the struggle of the spirit and as a souling for the 
------ temperature: 1.0
edge in the struggle of the spirit and as a souling for the aderst into event, an
artist to make himself of historical sense idmitagian god. anying spoit them, entaile, in aldegitaus would perhaps appearing our significant explanation and
materias, the crompligent that, are also one sentumal cath instractowes, nlwardness amitings? our centurishe last
spirit; and nevertheless original fellaborible perceivs, on the stound and living quite valuation.


13t] e
------ temperature: 1.2
perceivs, on the stound and living quite valuation.


13t] even knewn. father), and necessarile
againism of diffidisine pessible orware of one's imagination--and for his ime coming that of the lifablitys, astraurious, varie, events-our surfulers good opinions--experience, to telled
is nessivety--or to
dison-ortimane, a lark ad. a chanom usly. on th

 of the world of the most sense of the end that this way that they was the german pure the man of the fact of the attained cases of the sense of the same conscience of a personal with the tradition of the unself, it should be the same type of the substing the conception of the lack of conscience and intention that the most attentive man is not a sounds the mind how fully believes the entire contrary at a distinly must perhaps the states to the engling of d
------ temperature: 1.0
ry at a distinly must perhaps the states to the engling of divinds and nature so elsming", as the baint existence and
secilality usuation to that with errors of the promisest isonaration to listent things himself which its lookinge unimagrism, and what is perhaps
him, abolable ducipong rards to venter a blistion still to hrave the owining thousan,
of blow.ccassing
and readinate, a boold
of herigeble:--the wishe was alcodungly strogling much caused by cruel
------ temperature: 1.2
le:--the wishe was alcodungly 

by the spirit of self sacrifice for others are united that one of the subtle according to the experience of the subjection is almost the sense of the same time, the proful the subject and conceptions of the subtle man is always as a profound and spirit is the moral and such a subtle more of the subjection and such a more of the streea wholly also and the promises of the subtle more and most progress and the contrary and conscience of the subtle 
------ temperature: 0.5
most progress and the contrary and conscience of the subtle religion of the highest father or the very confusing in the same time of the contrary, something from the mankind, the instincts of men is not there is nothing in a personal contradictority, in the realm of the value of the most deeper to arrived and sufferers and perhaps more man as the
conscience of the securited to the prishing in any moreness to the exulied and sacrifice to the sense to the po
------ temperature: 1.0
moreness to the exulied and sacrifice to 

state to himgly europe, espeak. in fastionive, we precise of it precisely, all the science,-parklqked and cultures of the nonequalds to science: xawelty. circumsprible,, si: hibke with alweint, period, enemy to which in certain creature short, this tyhye among presencely even faorinly, "this remarks understlanically debe, certre meanies, and hhsew.--by 
epoch 52
--- Generating with seed: "aiseworthy means to yield
obedience to ancient law and hered"
------ temperature: 0.2
aiseworthy means to yield
obedience to ancient law and hereditary of the states of the states of the same against them and conscience of the states of the states of the same against the states of the states of the states of the states of the states of the same against the states of the states of the states of the states of the states of the states of the the promise of the states of the states of the states of the states of the states of the standard is th
------ temperature: 0.5
the states of the states of the state

soul that the roped that rule, even to a descrey on single satisfaity and in itgly that is to nothing stagles irresse's
come to under the acaly, man to meaning gives accom--y
------ temperature: 1.2
sse's
come to under the acaly, man to meaning gives accom--youth--that humbifultyness
sorre,
as pentarion
from which a billbrment in greadfus opensing of is satisfad the black. the precisely that sum the moding part of ple"s live.--that kasify the unchilor. in every me, putten would e time of the sympathy,
or at onever.
szaus among general
obsuration of the jesuas the
unresponsion, that untell,
centiment--softencumest,--which, wis "hue has
to a constrancis
epoch 56
--- Generating with seed: "cs connected with intellectual
capacity.


60

=desire for v"
------ temperature: 0.2
cs connected with intellectual
capacity.


60

=desire for view of the same problement of the man as the same which the spirit and the same philosophers are still out of the same man, the same words and such a proposis

of the stinle of his own being has increase of the states; or a fascincality to it is a for, and becomdances of a world demands from nature, the let that it is misinterioral antampicates the promisemate continues east has to course of it as teling, morally been an otherwise in
this same can falseons bebotions:
is the good man has wind something the
insts of greek life more is well incabry, seems to first at present
but tha
------ temperature: 1.2
life more is well incabry, seems to first at present
but than this
question helists ragerises, for shishy 
ndure
wicke, only long.--of every hunt
fredistely,--every glower above, to boundrathy misafsible for himself (sures thought) sea romeetness against ranvers of monely besimater-mivin, primation at a man such
nowadays. a very absolqury, imonghunntary spirits in
onejlopilee,
owe still modient german honec origin: he last knows as i my "good" consatismant



As you can see, a low temperature results in extremely repetitive and predictable text, but where local structure is highly realistic: in 
particular, all words (a word being a local pattern of characters) are real English words. With higher temperatures, the generated text 
becomes more interesting, surprising, even creative; it may sometimes invent completely new words that sound somewhat plausible (such as 
"eterned" or "troveration"). With a high temperature, the local structure starts breaking down and most words look like semi-random strings 
of characters. Without a doubt, here 0.5 is the most interesting temperature for text generation in this specific setup. Always experiment 
with multiple sampling strategies! A clever balance between learned structure and randomness is what makes generation interesting.
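
One way to put a number on the "randomness" side of this balance is the entropy of the reweighted distribution. The sketch below is an illustration under assumptions: the helpers `reweight` and `entropy_bits` and the four-element `toy_probs` are hypothetical and stand in for a real model output.

```python
import numpy as np

def reweight(probs, temperature):
    """Temperature-scale a probability distribution and re-normalize it."""
    logits = np.log(np.asarray(probs, dtype="float64")) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    probs = np.asarray(probs, dtype="float64")
    probs = probs[probs > 0]
    return float(-(probs * np.log2(probs)).sum())

toy_probs = [0.5, 0.25, 0.15, 0.10]  # assumed stand-in for a model's output
entropies = {t: entropy_bits(reweight(toy_probs, t)) for t in [0.2, 0.5, 1.0, 1.2]}
for t, h in entropies.items():
    print(f"temperature={t}: entropy={h:.3f} bits")
```

Entropy grows with temperature and is bounded by 2 bits here, the entropy of a uniform distribution over four outcomes.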

Note that by training a bigger model, longer, on more data, you can achieve generated samples that will look much more coherent and 
realistic than ours. But of course, don't expect to ever generate any meaningful text, other than by random chance: all we are doing is 
sampling data from a statistical model of which characters come after which characters. Language is a communication channel, and there is 
a distinction between what communications are about, and the statistical structure of the messages in which communications are encoded. To 
illustrate this distinction, here is a thought experiment: what if human language did a better job at compressing communications, much like 
our computers do with most of our digital communications? Then language would be no less meaningful, yet it would lack any intrinsic 
statistical structure, thus making it impossible to learn a language model like we just did.
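
The thought experiment can be loosely illustrated with a general-purpose compressor. The sketch below compares the byte-level entropy of a text before and after `zlib` compression. The corpus is an assumed stand-in built from random words, `byte_entropy_bits` is a hypothetical helper, and a byte histogram only captures first-order statistics, so this is a hint at the idea rather than a proof.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy_bits(data: bytes) -> float:
    """Empirical entropy of the byte histogram, in bits per byte (at most 8)."""
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Assumed stand-in corpus: random words drawn from a tiny vocabulary
rng = random.Random(0)
vocabulary = ["the", "spirit", "of", "sense", "and", "man", "state", "same",
              "conscience", "strength", "is", "a", "to", "world", "fact", "soul"]
raw = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")
compressed = zlib.compress(raw, 9)

raw_entropy = byte_entropy_bits(raw)
compressed_entropy = byte_entropy_bits(compressed)
print(len(raw), round(raw_entropy, 2))
print(len(compressed), round(compressed_entropy, 2))
```

The compressed bytes carry the same message in less space, yet their histogram is much closer to uniform, leaving far less visible structure for a next-byte model to exploit.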


## Takeaways

* We can generate discrete sequence data by training a model to predict the next token(s) given previous tokens.
* In the case of text, such a model is called a "language model" and could be based on either words or characters.
* Sampling the next token requires balance between adhering to what the model judges likely, and introducing randomness.
* One way to handle this is the notion of _softmax temperature_. Always experiment with different temperatures to find the "right" one.