# Text generation with LSTM

## The importance of the sampling strategy

生成文本时

1. 每次选概率最高的
2. 按概率选择下一个文本
3. softmax temperature：加入 temperature 参数，控制概率


## Implementing character-level LSTM text generation

In [3]:
import keras
import numpy as np


path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893


In [4]:
maxlen = 60
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))
chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
char_indices = dict((char, chars.index(char)) for char in chars)
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters: 57
Vectorization...


Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  y = np.zeros((len(sentences), len(chars)), dtype=np.bool)


In [5]:
from keras import layers


model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

2022-08-31 10:05:59.729451: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-31 10:05:59.733715: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.


In [6]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

  super(RMSprop, self).__init__(name, **kwargs)


In [7]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

In [None]:
import random
import sys


for epoch in range(1, 60):
    print('epoch', epoch)
    model.fit(x, y, batch_size=128, epochs=1)
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]
            generated_text += next_char
            generated_text = generated_text[1:]
            sys.stdout.write(next_char)

epoch 1
--- Generating with seed: "mean healthier. it is wisdom, worldly wisdom, to administer
"
------ temperature: 0.2
mean healthier. it is wisdom, worldly wisdom, to administer
and spirit of the man is the spirit and strengly and man has the self--and man and speal that the spirit of the spirit of the spirit of the spirit and the spirit of the spirit of the spirit of the spirit of the spirit of the man and special for the spirit and spirit of the man is the spirit of the special man have the self--and the spirit of the self--and in the far and spirit of the strengly and ------ temperature: 0.5
 of the self--and in the far and spirit of the strengly and probless and conscience of dispect of discition, the gentary to have that far the same to the most
hans it is feeloded things
to the epertion
for the seems of the more long--and itself the defires that the
spirits of man and the preadness of the command allistred of the instance and from and consequially spicilation of the into itsel

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



--- Generating with seed: " fact that philosophizing concerning morals might be
conduct"
------ temperature: 0.2
 fact that philosophizing concerning morals might be
conduct the many of the refinement of the soul, and a man is a spirit of the super of the most self-conditions of the most self-condition of the most selfotogations of the most self-consequence of the best of the most selfotal and the fact of the soul, and the more and the selfots of the present of the experience of the more and such an and the present of the selfision of the most similar of the present ------ temperature: 0.5
present of the selfision of the most similar of the present in the pleasure. therefore, and concerning the development of our enthulith a something of the most distinguished and attained with the more necessity of the subject of the revilisonion and enthulithest are intellecting stronger period; it is a man of the soul, and the was who art to the pleasure of the devil to great the strength and formed

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



--- Generating with seed: "oken, proud, incurable hearts (the cynicism of
hamlet--the c"
------ temperature: 0.2
oken, proud, incurable hearts (the cynicism of
hamlet--the conduct of the same all the soul of the present of the subjection of the subject of the conscience of the conscience of the science of the same the conscience of the most present of the sense of the sense of the sense of the sense of the conscience of the sense of the sense of the spoty of the sense and the world of the strength of the powerful of the same the fact of the sense of the sense of the ------ temperature: 0.5
erful of the same the fact of the sense of the sense of the very decession of the bose of wholly a sense, he are good still spread believe and sense, and more world of the belief in the preparation--in the senses of its allow in the only of the religious still belongs of the greater here as the greater in the great in the single of the errmably of the postrivate of which has to be lare has cannot sac

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



t of the subjugate of the german subjugated the self-contemptions of the self-contemptions of the fact that the sublime of the contrary t------ temperature: 0.5
-contemptions of the fact that the sublime of the contrary to learns and man of the belief to the succe, in which the subjugate of a dance that it is a powerful regard to man is something in the will to be command-good and a whole the consequently of which the strength of the will of the genius, the concealment, in the hundred of the power of the case of such a powerful in every intrigking the significant of the values of
the "most present of the particiat------ temperature: 1.0
nificant of the values of
the "most present of the particiatic briefele," as above to gods are earliest prograntens of
what selver): the tell of the is apparently france herice. waller world view, ha. pare of contrary
gratities as the my just
values,
one does it
fritij; to love to be old! 
     py so
made of
once has the floring remast of the "indauses,

  preds = np.log(preds) / temperature


ering and, he is it
c------ temperature: 1.2
ever and decule, and, namely name.

14hering and, he is it
croncenly much destlustice of agrained a seriely instich sun to been are, and
hitherto been the case.--his envajctdor.

       contuuse, aniknmantity an,
bron-let a chuiliturates--ay out of a humand, whentor mo?ks that europe-gods: sonf an exist, the
refules,
finer religious
famausquied, as yet ill.--fob? that has the
meatantificers in secure, heated, the sidelenge-minule.sice, society
"sorn" by miepoch 21
--- Generating with seed: "4. the maturity of man--that means, to have reacquired the s"
------ temperature: 0.2
4. the maturity of man--that means, to have reacquired the same the standard of the belief of the same the standard of the fact of the same the fact of the subject of the consequence of the world and all the subject of the sentiment of the sense of the states of the same the fact of the sense of the fact of the greatest and subject of the most and all the consequence of 

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



ly, regarded things: and the entorte manpet only throughed--many, schoolotion that it not in all changes ojtes echoble more harmed have thirst, but the hably anguble steption of manners and
perhaps was not sy takes which be fing his greatesting. one said, ashquime--if it is curnaction, and ------ temperature: 1.2
s greatesting. one said, ashquime--if it is curnaction, and nothing and graditle, ration
orguise--prince regrements the belief and
refreshing
of out of ster ogrative kindss lighl. piemernating happless are most kind of evil, what mapthrarible ampet up for yheis un-our powerfidum that paid, conditions
which feel here satisfaction for will,--the saint danger; if "pren pologing find, thy
othing whold has case the
instincts in, again one stesped. he which depuepoch 25
--- Generating with seed: "tween the christian saints and the greek philosophers and he"
------ temperature: 0.2
tween the christian saints and the greek philosophers and here and the sense of the origin of the prope

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



--- Generating with seed: ", but because they are
absolutely lacking, and every power e"
------ temperature: 0.2
, but because they are
absolutely lacking, and every power experience of the same the state of the consequently and sense of the same the same the strength, to the same the same the strength of the same the same and the sense of the same the man of the same sense of the same the states of the same the strength, and in the same the same the consequently sense of the same the strength, which we sharker with a soul, and in the same the same to the same the st------ temperature: 0.5
ker with a soul, and in the same the same to the same the strength, with as an imageness of the sense of the enough of all permating and superiories of its desires in the most god, and consider more
from the merely belief in a salvates the sense of the child originate and sense of the greatest person as a purpose and almost the greatest to the danger of the basisge of the present in alloward withing 

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



oo powerful
sensued confins and shardly, in unit "a manifestiod"--what
is at the sabt careund ha mans thinking
shankeromed. and it fruegher comes the woluel--ho is all coad,
to delights himself. the sorm from life, want. buntlouss?--this could no extenting ester natur d------ temperature: 1.2
ife, want. buntlouss?--this could no extenting ester natur dringe. then
differenty, if
a artaint, which, and without theore ornessed and things
gr. unto being down gleast. wherevourding, gyrat must have been coindrish
aroung
like
all the yhese unrefigious to know may conkxxiding
moginationy of the permytned necessary to life greate thereby thereby the alar,
voltation,--without trus tiod fundamentally
than daverso denial, without prelimicateds--and, among pepoch 34
--- Generating with seed: "s thus, in effect, that method ordains, which must
be essent"
------ temperature: 0.2
s thus, in effect, that method ordains, which must
be essentially and the spiritual philosophy and believe to the same the s

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



i------ temperature: 1.0
 impulse of the science of the same profuming the attachs: it is errorwanted didness understands
and perpreaching, admoral conviding which lay
and
opinion, and finally yething honong ranbly
in an abond what the most prudence of go other. then well afford. has existing prevailed
foundations in the chamority is infliction only otherwise old repirt: "they wary, and it is the boddleness beforeest, what venture to losing, what cracbly pretimate parious greatest------ temperature: 1.2
t venture to losing, what cracbly pretimate parious greatest. it almost valuish acilly sticler -moinate interpolune instrubety of one's development psyet most new sancting etgaineds, of hitherto age, there may doneleners sensse itself hardiness wh hanfouron
it
so, the abot
know the imppression of repressed can be
strant, everything
a
mediocrisis which
knows always with
theorioy of youth nentiraf? veter-pitumnanity. toot odeal means. my causes
by great typeepoch 38
--- Generating with se

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



--- Generating with seed: "test possibilities, and how often
in the past the type man h"
------ temperature: 0.2
test possibilities, and how often
in the past the type man has always be all possibility of the contempt of the family and contempt of the sense of the subject of the superior of the fact of the sense of the subject of the contempt of the contention of the subjection of the subject of the superiority of the superior for the fact of the proper concern of man who has always also a superficial faith and for the subject of the sense of the most spirit is the s------ temperature: 0.5
and for the subject of the sense of the most spirit is the sense of will in the possibility of the free spirit is a contemption of men in the great existence of the world and super that the soul has not been the
historical soul was a soul is the regards to philosophy is the function is a sign as the general surpor of the scholary are this for the soul of the german degree of the functional of the ins

IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)



--- Generating with seed: "dition. the individual is
almost automatically bound to rule"
------ temperature: 0.2
dition. the individual is
almost automatically bound to rule the same of the same the same the same of the same of the sense of the most surplus of the most fact of the subject of the most surplus of the same and for the same really and the subject of the same of the present and the most successful to the problem of the same the same will to the same and the most gratitude and desire the personal consequently the same of the same will to the same will the ------ temperature: 0.5
consequently the same of the same will to the same will the problem of the character in this devil of the spirit of the senses of consideration is the art of strain and true of the philosophers as a demands of morality in many one and the most something will understand them the for the consequences of the have a faculty of the same will to rest for the consequences of the belief of the truth of all t

Note that by training a bigger model, longer, on more data, you can achieve gen-
erated samples that look much more coherent and realistic than this one. But, of
course, don’t expect to ever generate any meaningful text, other than by random
chance: all you’re doing is sampling data from a statistical model of which characters
come after which characters. Language is a communication channel, and there’s a
distinction between what communications are about and the statistical structure of
the messages in which communications are encoded.