# Character-by-character text generation with a language model
![title](pics/text_generation.png)

```python
# Keras imports
import keras
from keras import layers

# General imports
from IPython.display import display, Markdown  # to render Markdown headings in cell output
import random
import sys

import numpy as np
```

## Downloading and parsing our initial text file

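```python
# Download the Nietzsche corpus (Keras caches the file after the first call)
path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
# Read the whole corpus and lowercase it to shrink the character vocabulary
with open(path, encoding='utf-8') as f:
    text = f.read().lower()

print('Corpus length:' + str(len(text)))

display(Markdown('### Initial text:'))
print(text[:500] + '\n')
```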

```
Corpus length:600893
```

### Initial text:

```
preface


supposing that truth is a woman--what then? is there not ground
for suspecting that all philosophers, in so far as they have been
dogmatists, have failed to understand women--that the terrible
seriousness and clumsy importunity with which they have usually paid
their addresses to truth, have been unskilled and unseemly methods for
winning a woman? certainly she has never allowed herself to be won; and
at present every kind of dogma stands with sad and discouraged mien--if,
indeed, it s
```
## Vectorizing partially overlapping sequences of characters
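As a worked check of the windowing done in the next cell (using the corpus length printed above, $L = 600893$): the loop starts a window at every index $0, \text{step}, 2\,\text{step}, \dots$ strictly below $L - \text{maxlen}$, so the number of sequences is

$$
N = \left\lceil \frac{L - \text{maxlen}}{\text{step}} \right\rceil = \left\lceil \frac{600893 - 60}{3} \right\rceil = \lceil 200277.67 \rceil = 200278
$$

The $k$-th pair uses the input window $\text{text}[k \cdot \text{step} : k \cdot \text{step} + \text{maxlen}]$ and the target character $\text{text}[k \cdot \text{step} + \text{maxlen}]$. With 57 distinct characters and one byte per boolean, the one-hot input tensor `x` holds $200278 \times 60 \times 57 = 684{,}950{,}760$ entries, roughly 685 MB.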

```python
# Length of extracted character sequences
maxlen = 60
# We sample a new sequence every `step` characters
step = 3
# This holds our extracted sequences
sentences = []
# This holds the targets (the follow-up characters)
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

# List of unique characters in the corpus
chars = sorted(set(text))
print('Unique characters:', len(chars))
# Dictionary mapping each unique character to its index in `chars`
char_indices = {char: i for i, char in enumerate(chars)}

# Next, one-hot encode the characters into boolean arrays.
# (The deprecated `np.bool` alias fails on NumPy 1.24-1.26; the builtin `bool` works everywhere.)
print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

print('\nafter data encoding:')
print('encoded text shape:', x.shape)
print('encoded target shape:', y.shape, '\n')

print('\nFirst 5 data samples & targets\n')

print('text sentence')
print(sentences[:5])

print('\ncorresponding characters to predict')
print(next_chars[:5])
```

```
Number of sequences: 200278
Unique characters: 57
Vectorization...

after data encoding:
encoded text shape: (200278, 60, 57)
encoded target shape: (200278, 57)


First 5 data samples & targets

text sentence
['preface\n\n\nsupposing that truth is a woman--what then? is the', 'face\n\n\nsupposing that truth is a woman--what then? is there ', 'e\n\n\nsupposing that truth is a woman--what then? is there not', '\nsupposing that truth is a woman--what then? is there not gr', 'pposing that truth is a woman--what then? is there not groun']

corresponding characters to predict
['r', 'n', ' ', 'o', 'd']
```
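To make the shapes concrete, the same windowing and one-hot encoding can be run on a toy string. `vectorize` below is a hypothetical helper that simply repackages the cell above (with `dtype=bool`); on `'abracadabra'` with `maxlen=4` and `step=2` it yields four windows (`'abra'`, `'raca'`, `'cada'`, `'dabr'`) whose targets are `'c'`, `'d'`, `'b'` and `'a'`, over a five-character vocabulary. The `toy_` prefixes keep it from overwriting the real `x`, `y` and `chars` if run in the same notebook.

```python
import numpy as np


def vectorize(text, maxlen, step):
    """Hypothetical helper: windowing plus one-hot encoding, as in the cell above."""
    starts = range(0, len(text) - maxlen, step)
    sentences = [text[i: i + maxlen] for i in starts]
    next_chars = [text[i + maxlen] for i in starts]
    chars = sorted(set(text))
    char_indices = {char: i for i, char in enumerate(chars)}
    x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
    y = np.zeros((len(sentences), len(chars)), dtype=bool)
    for i, sentence in enumerate(sentences):
        for t, char in enumerate(sentence):
            x[i, t, char_indices[char]] = True  # one active character per timestep
        y[i, char_indices[next_chars[i]]] = True  # one-hot target character
    return sentences, next_chars, chars, x, y


# Toy corpus: 11 characters, windows of 4, a new window every 2 characters
toy_sentences, toy_targets, toy_chars, toy_x, toy_y = vectorize('abracadabra', maxlen=4, step=2)
print(toy_sentences)  # input windows
print(toy_targets)    # character following each window
print(toy_chars)      # vocabulary, i.e. the order of the one-hot axis
print(toy_x.shape, toy_y.shape)
```

The last axis of `toy_x` and `toy_y` has one slot per vocabulary character, exactly as `len(chars)` (57) is the last axis of the full `x` and `y`.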


## A single-layer LSTM model for next-character prediction

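```python
print('Starting model architecture development')
model = keras.models.Sequential()
# A single LSTM layer reads the one-hot sequence: maxlen timesteps, len(chars) features each
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
# Softmax over the vocabulary gives a probability for every possible next character
model.add(layers.Dense(len(chars), activation='softmax'))
# `learning_rate` replaces the deprecated `lr` argument, which newer Keras releases no longer accept
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
# Targets are one-hot vectors, so categorical cross-entropy is the matching loss
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
model.summary()
```

```
Starting model architecture development
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 128)               95232     
_________________________________________________________________
dense_1 (Dense)              (None, 57)                7353      
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________
```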
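The parameter counts in the summary can be checked by hand. A Keras `LSTM` layer stores an input kernel of shape (input_dim, 4·units), a recurrent kernel of shape (units, 4·units) and a bias of length 4·units, one slice per gate; a `Dense` layer stores a kernel plus one bias per unit. The helper names below are illustrative, not part of Keras:

```python
def lstm_param_count(input_dim, units):
    # Four gates (input, forget, cell candidate, output), each with an input
    # kernel (input_dim x units), a recurrent kernel (units x units) and a bias (units)
    return 4 * (input_dim * units + units * units + units)


def dense_param_count(input_dim, units):
    # Kernel (input_dim x units) plus one bias per output unit
    return input_dim * units + units


n_vocab = 57  # len(chars) from the vectorization step
lstm_params = lstm_param_count(n_vocab, 128)
dense_params = dense_param_count(128, n_vocab)
print(lstm_params, dense_params, lstm_params + dense_params)
```

Most of the capacity sits in the recurrent kernel (128 × 128 per gate), so widening the LSTM grows the model roughly quadratically, while a larger vocabulary only grows it linearly.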


### Training the language model and sampling from it
Given a trained model and a seed text snippet, we generate new text by repeatedly:
1. Asking the model for a probability distribution over the next character, given the last `maxlen` characters of the text so far
2. Reweighting the distribution to a certain "temperature"
3. Sampling the next character at random according to the reweighted distribution
4. Appending the new character to the end of the available text
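Step 2 can be written out explicitly. For a predicted distribution $p$ and temperature $T$, the `sample` function divides the log-probabilities by $T$ and renormalizes, which is equivalent to raising each probability to the power $1/T$:

$$
p_i^{(T)} = \frac{\exp(\log p_i / T)}{\sum_j \exp(\log p_j / T)} = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}
$$

At $T = 1$ the distribution is unchanged; as $T \to 0$ almost all the mass moves to the single most likely character (approaching greedy decoding), while $T > 1$ flattens it toward uniform. This matches the samples below: repetitive phrases at 0.2 and more invented words at 1.0 and 1.2.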

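```python
def sample(preds, temperature=1.0):
    # Cast to float64 so the renormalized probabilities are precise enough for np.random.multinomial
    preds = np.asarray(preds).astype('float64')
    # Reweight: scale the log-probabilities by 1/temperature
    # (errstate silences the divide-by-zero warning when a probability is exactly 0)
    with np.errstate(divide='ignore'):
        preds = np.log(preds) / temperature
    # Renormalize into a probability distribution
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # Draw one one-hot sample from the reweighted distribution and return its index
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
```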
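To see what the temperature does without a trained model, the sketch below applies the same reweighting as `sample` (minus the random draw) to a made-up four-character distribution and prints its entropy. `reweight_distribution`, `entropy` and the probabilities in `toy_preds` are illustrative assumptions, not part of the original notebook:

```python
import numpy as np


def reweight_distribution(preds, temperature=1.0):
    """Deterministic part of `sample`: rescale log-probabilities and renormalize."""
    preds = np.asarray(preds).astype('float64')
    logits = np.log(preds) / temperature
    exp_preds = np.exp(logits)
    return exp_preds / np.sum(exp_preds)


def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability entries."""
    p = np.asarray(p, dtype='float64')
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


# Hypothetical next-character distribution over four characters
toy_preds = [0.5, 0.3, 0.15, 0.05]
for temperature in [0.2, 0.5, 1.0, 1.2]:
    reweighted = reweight_distribution(toy_preds, temperature)
    print(temperature, np.round(reweighted, 3), round(entropy(reweighted), 3))
```

Entropy rises with the temperature: at 0.2 the top character already receives more than 90% of the probability, which is why low-temperature samples keep repeating the most common phrases.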

### The text generation loop

```python
# Train for 59 one-epoch rounds (range(1, 60) stops at 59), sampling text after each
for epoch in range(1, 60):
    print('epoch', epoch)
    # Fit the model for 1 epoch on the available training data
    model.fit(x, y,
              batch_size=128,
              epochs=1)
    # Select a text seed at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')
    # Note: `generated_text` is not reset between temperatures, so each temperature
    # continues from the last `maxlen` characters produced at the previous one
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)
        # We generate 400 characters
        for i in range(400):
            # One-hot encode the current window of `maxlen` characters
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.
            # Predict the next-character distribution and sample from it
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]
            # Slide the window: append the new character and drop the oldest one
            generated_text += next_char
            generated_text = generated_text[1:]
            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
```

```
epoch 1
Epoch 1/1
--- Generating with seed: " all kinds of injury and loss the lower and coarser soul is
"
------ temperature: 0.2
 all kinds of injury and loss the lower and coarser soul is
the period of the sense of the sense of the period of the sense of the sense of the sense of the most present and pertested the sense of the power to the sense of the everything the sense of the fact of the sense of the belief the period of the sense of the sense of the sense of the sense of the master of the sense of the german of the destruction of the sense of the sense of the sense of the conc
------ temperature: 0.5
struction of the sense of the sense of the sense of the conceited, the our person be sint that should be regarded the fact of the advantage with the entired, and the future no belongs and everything to period of the short, and moder the may be srine that the sensition of belief as become the interpretationation of love of the absolutely different of the subliges, one of the master
```

```
in percoses allness,
adimes. belon posstaniati: life
in he bentsbleke. gensal. as good, ssspress
epoch 3
Epoch 1/1
--- Generating with seed: "nd whatever objects he may encounter must
suffer from the pe"
------ temperature: 0.2
nd whatever objects he may encounter must
suffer from the person with its act the men and the struct, the struct of the destructive and being and profound the structs of the strong propering the world of the spirit the words of the strength of the struggle of the struch the strength and the struct of the strength of the man is a still sublines and man is a thing in the intellectual strength of the man is a man is a man is a man and the profoundly and the s
------ temperature: 0.5
 man is a man is a man is a man and the profoundly and the sense of new and religion and with the "more spirit which an accounts, the precisely and human and a more experience, and the weld be must have the subject, the as the sense of the except and stronger and the man is
an acconscie

y that ancient word knows soous are things itself as the bignige ruspond out of the regarded, the acimis of reflected.
by the kind,
wi man
work, as but withinglic itself, and (we "maddoe recoufinit consequences of which one
case of as prutibly
rezing
of enception of good, to the philosophy wherchly
of the staf,   just opposed
blood soulse of slends
oneranty, et he sut upon has metpzed for example and
purely. in orden
gindl; always
leasting, cause of striv
epoch 7
Epoch 1/1
--- Generating with seed: "doubtful mediocrity;--supposing a statesman were to condemn "
------ temperature: 0.2
doubtful mediocrity;--supposing a statesman were to condemn and more believe of the states of the strengthers of the strengthers of the personal and particule of the more of the stronger of the strength of the stronger of the stronger and something and the world and more personal strong and something the same more believe of the strengthers of the states and his own something the strength of the world and

interpretation of the germans of the states of have the thist, attack its consideranced, lay vicgoroul, the understand that it is syventure-diverbofving, through even comprehined
classgly, on the intelligence is long religious than generality," how futce, is pinesvariated old sacrifice and not rangdism of all perhaps really that in
no at ideas tible of
sundurated with
a
still our contrum infinated when he seanigs and them
world that like anything is posses
------ temperature: 1.2
 when he seanigs and them
world that like anything is possessigable most lightence would be
tear
with error" egor unsuch as preate, hutosils" and trangrouse--"and good is, affective,"
bstering the world spirit awart no revererical onlriod centurie to laws that harders of spiritably to howen been things of
acty intemreged than its beenly dofs ruling shrwe-njou shi! is feels and lead such what is
more from it" principre, for cause for scand.

1mine, deep, be
epoch 11
Epoch 1/1
--- Generating with seed: " which
s

soul of the spirit and so that the soul, and the spirit is an appeal philosopher it and skeptic of the world and sacrifice and same men to self-constitutes the one of the spirit which even well of the world souls the most power--the religion of the individual means of the account, and sense of the spirit and compared not present, of the sensitied as a means of the philosophy of the sense and contrary endourable to a past neighbour the power in the least in
------ temperature: 1.0
ary endourable to a past neighbour the power in the least in the for upon it. for evilting may be otherwise, of precisely sything with which is soathen that when ethen oughts os a train into one
must not
have to the contrcriftices an intenspospen, a so't wite been until a alone or vident in a mife of life and feelshe to over made of the
pepity itself it is believe to thing other human in fact, which fearing from all to lade, au race, the belief under himse
------ temperature: 1.2
ch fearing from all to lade, a

above which he himself has developed--he will also and the superioring the still the condition of the art of the still and the and the moral and constraine and the stronger soul as the will to be a man and the condition of the noble in the sense of the interestion of the superioring and the fact that which is there are the problem of the superior of the spirit and the and sould and the superior the more all the still and sense of the s
------ temperature: 0.5
d and the superior the more all the still and sense of the state of the condition to metaphysical more under the principle of suttrack and first and and deepers and soul, in
the belief as explice, and with which one of superior, and it is not so dered to place of the there sun and common the former the its of the most problem has there is also suffering and its approved with its absolute as the deferted as a cause of the sensitide and history of the sin to
------ temperature: 1.0
ferted as a cause of the sensitide and history of t
```
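The inner generation loop can also be factored into a function that does not depend on Keras, which makes it testable with a stand-in model. The sketch below is one possible refactoring, not the notebook's own code: `generate_text` and `sample_index` are hypothetical names, and `predict_fn` is assumed to map a one-hot batch of shape `(1, maxlen, len(chars))` to next-character probabilities of shape `(1, len(chars))`.

```python
import numpy as np


def sample_index(preds, temperature=1.0):
    """Temperature sampling with the same logic as `sample` above."""
    preds = np.asarray(preds).astype('float64')
    with np.errstate(divide='ignore'):  # log(0) -> -inf, exp(-inf) -> 0
        preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    return int(np.argmax(np.random.multinomial(1, preds, 1)))


def generate_text(predict_fn, seed, chars, char_indices, maxlen,
                  n_chars=400, temperature=1.0):
    """Hypothetical helper: return `n_chars` characters sampled after `seed`.

    `seed` is expected to be `maxlen` characters long, like the random seeds above.
    """
    window = seed[-maxlen:]  # the context the model sees
    generated = []
    for _ in range(n_chars):
        encoded = np.zeros((1, maxlen, len(chars)))
        for t, char in enumerate(window):
            encoded[0, t, char_indices[char]] = 1.
        preds = predict_fn(encoded)[0]
        next_char = chars[sample_index(preds, temperature)]
        generated.append(next_char)
        window = window[1:] + next_char  # slide the window by one character
    return ''.join(generated)


# Usage with a dummy predictor that puts all probability on 'b'
demo_chars = ['a', 'b', 'c']
demo_indices = {c: i for i, c in enumerate(demo_chars)}


def always_b(batch):
    return np.array([[0.0, 1.0, 0.0]])


demo = generate_text(always_b, 'abc', demo_chars, demo_indices, maxlen=3,
                     n_chars=5, temperature=0.5)
print(demo)
```

With the trained model, one would pass `lambda batch: model.predict(batch, verbose=0)` as `predict_fn`, together with `chars`, `char_indices` and `maxlen` from the vectorization step. Passing the same seed for every temperature makes each temperature start from the original seed, unlike the training loop above, where each temperature continues from the previous one's output.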