This exercise is based on https://keras.io/examples/lstm_text_generation/

In [2]:
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM, GRU
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
import numpy as np
import random
import sys
import io

Using TensorFlow backend.


In [3]:
path = get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

corpus length: 600893


Create a training data set of sentences of length 40

In [4]:
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
# one-hot encode inputs and targets; the builtin bool replaces the deprecated np.bool alias
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

print(x.shape)
print(y.shape)


total chars: 57
nb sequences: 200285
Vectorization...
(200285, 40, 57)
(200285, 57)
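
The windowing and one-hot encoding above can be checked on a toy string. The following is a minimal sketch, not part of the original notebook: the helper names `make_windows` and `one_hot` and the text `"hello world"` are made up for illustration, while the slicing mirrors the cell above. The number of windows is `len(range(0, len(text) - maxlen, step))`, which for the corpus above (600893 characters, `maxlen = 40`, `step = 3`) gives the 200285 sequences reported.

```python
import numpy as np


def make_windows(text, maxlen, step):
    """Cut text into overlapping windows and collect the character after each one."""
    starts = range(0, len(text) - maxlen, step)
    windows = [text[i: i + maxlen] for i in starts]
    targets = [text[i + maxlen] for i in starts]
    return windows, targets


def one_hot(windows, targets, char_indices):
    """One-hot encode windows as (n, maxlen, n_chars) and targets as (n, n_chars)."""
    n_chars = len(char_indices)
    x = np.zeros((len(windows), len(windows[0]), n_chars), dtype=bool)
    y = np.zeros((len(windows), n_chars), dtype=bool)
    for i, window in enumerate(windows):
        for t, char in enumerate(window):
            x[i, t, char_indices[char]] = True
        y[i, char_indices[targets[i]]] = True
    return x, y


# Toy data (hypothetical, chosen only to keep the shapes small).
toy_text = "hello world"
toy_chars = sorted(set(toy_text))
toy_indices = {c: i for i, c in enumerate(toy_chars)}

windows, targets = make_windows(toy_text, maxlen=4, step=3)
x_toy, y_toy = one_hot(windows, targets, toy_indices)

print(windows, targets)
print(x_toy.shape, y_toy.shape)

# Window count for the real corpus, computed without loading it.
print(len(range(0, 600893 - 40, 3)))
```

Each time step of `x_toy` contains exactly one `True` entry, which is what the LSTM input layer with `input_shape=(maxlen, len(chars))` expects.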


Implement an LSTM-based RNN (hidden layer-size: 128)

In [11]:
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

model.summary()

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 128)               95232     
_________________________________________________________________
dense_5 (Dense)              (None, 57)                7353      
Total params: 102,585
Trainable params: 102,585
Non-trainable params: 0
_________________________________________________________________


In [8]:
print('Build model...')
model = Sequential()
model.add(GRU(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars), activation='softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

model.summary()

Build model...
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
gru_2 (GRU)                  (None, 128)               71424     
_________________________________________________________________
dense_4 (Dense)              (None, 57)                7353      
Total params: 78,777
Trainable params: 78,777
Non-trainable params: 0
_________________________________________________________________


Implement a method for temperature-based sampling

In [9]:
def sample(preds, temperature=1.0):
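    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    # scale the log-probabilities: temperature < 1 sharpens, > 1 flattens
    preds = np.log(preds) / temperature
    # renormalize so that the scaled values sum to 1 again
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    # draw one sample; the result is a one-hot vector
    probas = np.random.multinomial(1, preds, 1)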
    return np.argmax(probas)
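
To see what the temperature does before any sampling happens, the rescaling step can be looked at in isolation. This is a sketch under assumptions: `apply_temperature` is a hypothetical helper that repeats the log, divide, exponentiate, normalize steps of `sample`, and the three-class distribution is made up. Unlike `sample`, it subtracts the maximum before exponentiating, which leaves the result unchanged but is a common guard against overflow.

```python
import numpy as np


def apply_temperature(preds, temperature=1.0):
    """Return the temperature-scaled distribution that sample() would draw from."""
    preds = np.asarray(preds, dtype='float64')
    scaled = np.log(preds) / temperature
    # Subtracting the max does not change the normalized result (assumption:
    # added here for numerical stability, not present in the original cell).
    exp_scaled = np.exp(scaled - np.max(scaled))
    return exp_scaled / np.sum(exp_scaled)


# Hypothetical model output over three characters.
probs = np.array([0.6, 0.3, 0.1])

for temperature in [0.2, 0.5, 1.0, 1.2, 1.5]:
    print(temperature, np.round(apply_temperature(probs, temperature), 3))
```

At a temperature of 1.0 the distribution is returned unchanged. Lower temperatures concentrate the mass on the most likely character, which matches the repetitive text generated at diversity 0.2, and higher temperatures move the distribution towards uniform, which matches the misspelled output at diversity 1.2.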

Implement a callback function for text-generation (temperatures: 0.2, 0.5, 1.0, 1.2 and 1.5) and train your model for 5 epochs.

In [12]:
def on_epoch_end(epoch, _):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=20, verbose=2,
          callbacks=[print_callback])

Epoch 1/20
 - 80s - loss: 1.9727

----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: "g and becoming strong, equilibrium, ball"
g and becoming strong, equilibrium, bally the sensed there in the have and and and and and and the have is the more that it is the the been and and the feeling the more the seeling of the mears, and the more of the self-still that is the disited the higher the have and more the more the more and and the sore and the soul the say and the more and the sensery and the more the seeming the does that is the the more the well the messical the
----- diversity: 0.5
----- Generating with seed: "g and becoming strong, equilibrium, ball"
g and becoming strong, equilibrium, bally stater and the seeming mending the mader
to the mas considing to the amperion even the
seems as for the facted of the messioned that to the forms of the reriest have and of there is not becomes inded the belies there indivilual to be recress, as still that itsel

that is even the exall orture so the inner, of understanding of
goodness, it
wowance, even to shatter prieralth. less lives still with
----- diversity: 1.2
----- Generating with seed: "omething of the echo
of the wilderness, "
omething of the echo
of the wilderness, food,
ryolten "bottver, let shame attempt and other of vias:" od of should be thlrey
who tramitolon attrours: forly. as a
fereang?--therselve searmenc, the reyermint. "and bytinas "amongraticcs from you yehe,"..: eothering exeanly
we is singlous science with cnever
letip. on
itself. there"
this to realsing cyance of natmerery; those case aftined.."y sthong ahevyl let spied nowaw pay of males of th
Epoch 5/20
 - 79s - loss: 1.4673

----- Generating text after Epoch: 4
----- diversity: 0.2
----- Generating with seed: "ne points the young,
and in view of whom"
ne points the young,
and in view of whom he still the state of the sense of the senses the state of the world the senses and sensed the state and instinct of the senses 

 spectator, inspired, not by the scientific portaling to the speaks and into baver of its spredematic of the tasted as the prost and the most indeed, but he have that the other also of the enduristic of the properious the self-soul. the individuals that is a more
and present the conscious explained and who calls because of the
heart, original cruse of consequence of the consideration to life, the scientific "stronger in the experiences,
----- diversity: 1.0
----- Generating with seed: " spectator, inspired, not by the scienti"
 spectator, inspired, not by the scientifics is all mensure calls ratherous liberalizesre his case of my believe, whether hely and its dudgly inteal soul). if instant sade liminizon
in
instantly; dreed to others because to undering of sensation, frologable is taking that he are of especitants, act it wer: one certains,
if seeks to that the wholence ey is resultrances to culture's seperce itself the
spredemated, the willed to europe.

 n
----- diversity: 1.2
-----

enlightenment--a banner bearing the sense and the world of the superstitions of the superstition of the self-dount the state of the superstitions of the subject of the superstitions of the thing and sense of the supernation of the supernation of the conscience with the same tent in the self-decerve the supernation of any contemptions of the extrame and always and the superstition of the can be seems and superfaction of the supernati
----- diversity: 0.5
----- Generating with seed: " of
enlightenment--a banner bearing the "
 of
enlightenment--a banner bearing the fundamentation of the explanation of the best the would not error the result and and in the more still distrust, he has when the notion of the soul and deciving bad to all appear life. the
woman and deception of the doubt, more conscientific originates he not a different could be reading that all and sense and distress that the the compessions and soul of the worst not and manifests he fear of the
----- diversity: 1.0
----- Gen


eterioratic intention in a virtue who germany, i. no com
----- diversity: 1.2
----- Generating with seed: "ort, which one would like
nowadays to la"
ort, which one would like
nowadays to lapde,
catter,
matter, bhoughter. it is edi
sub"gvness, but
restlow procepted
ast or powerfours"?"--one esse, hadet and surious
instracts you
determinatier, an explos
pdudine, certain
live thus is most
away repeaded enough away begros, slaven
cruter." ever revelse of the case would "fasscent phise ,ol
the naturas, and stood--the nature,
prevalue moral.




mar fricounces; serious
taucl"--uncondin be
Epoch 16/20
 - 82s - loss: 1.3536

----- Generating text after Epoch: 15
----- diversity: 0.2
----- Generating with seed: " despised, is deemed bad. in the communi"
 despised, is deemed bad. in the community of the properation of the propery of the such a so the such a sure and in the propery of the conception of the substituted and individual man is a religion of the subtlered to the propery of the propery

in the most last the mind and music oldes--the philosophy of the stands wishes the sense and aspicity, the consequences to the feeling the whole wishes ourselves of the skeptical the fact that he have the world to ourselves of an an experient of the profound in the value of the very thing of the world the excha
----- diversity: 1.0
----- Generating with seed: "ws that in his century all the impulses "
ws that in his century all the impulses habitable occable, would preaching. in the goal and subtlesseful racence, and it is a moral
philosophy
self intorminations of mindled of which them even them, contemption. is unpportion of his twreat all chass good talkness of muil and
ske=c
to the human upon "furede-nuse if a notionalize what is the world-mind, the speiler even depored by every a live
nature and ethrorsy of the cdood of them woul
----- diversity: 1.2
----- Generating with seed: "ws that in his century all the impulses "
ws that in his century all the impulses of nobless conmotit co

<keras.callbacks.History at 0x7f9038795668>

How many trainable parameters does the neural network from exercise 2 have? Explain your solution!

Answer: The LSTM has three gates (input gate, output gate, forget gate) and a candidate update for the memory cell. Each of these four components has its own weight matrix for the input from the previous layer ($N_\mathrm{h}N_\mathrm{i}$ weights), its own recurrent weight matrix for the hidden state from the previous time step ($N_\mathrm{h}^2$ weights), and its own bias vector ($N_\mathrm{h}$ values). The number of trainable parameters for an LSTM with a hidden layer size $N_\mathrm{h}$ and an input size $N_\mathrm{i}$ can therefore be calculated as:
$$N_\mathrm{LSTM} = 4(N_\mathrm{h}^2 + N_\mathrm{h}N_\mathrm{i}+ N_\mathrm{h})$$
In our implementation the hidden layer of the LSTM at the last time step is fed into a fully connected layer with a size of $N_\mathrm{o}$, adding 
$$N_\mathrm{Dense} = N_\mathrm{o}N_\mathrm{h}+N_\mathrm{o}$$
trainable parameters.

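With $N_\mathrm{h} = 128$ and $N_\mathrm{i} = N_\mathrm{o} = 57$ (the number of distinct characters), the total number of trainable parameters of our network can therefore be calculated as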
$$N_\mathrm{Trainable} = N_\mathrm{LSTM} + N_\mathrm{Dense} = 95232 + 7353 = 102585$$
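
The arithmetic can be reproduced in a few lines. This is a small sketch with hypothetical helper names (`lstm_params`, `dense_params`) that evaluates the two formulas above for the sizes used in this notebook, so the result can be compared with `model.summary()`.

```python
def lstm_params(n_hidden, n_input):
    """Four weight sets, each with recurrent weights, input weights and a bias."""
    return 4 * (n_hidden ** 2 + n_hidden * n_input + n_hidden)


def dense_params(n_output, n_hidden):
    """Fully connected layer: one weight per (output, hidden) pair plus biases."""
    return n_output * n_hidden + n_output


n_h, n_i, n_o = 128, 57, 57  # hidden size, input size, output size

n_lstm = lstm_params(n_h, n_i)
n_dense = dense_params(n_o, n_h)

print(n_lstm, n_dense, n_lstm + n_dense)  # 95232 7353 102585
```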

Replace the LSTM unit from exercise 2 with a GRU that has the same number of hidden state neurons and run your experiment again. How many trainable parameters does the new network have? Explain your solution.

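Answer: The architecture of a GRU is quite similar to that of an LSTM, but the GRU has no output gate and no separate memory cell. It uses two gates (update and reset) plus a candidate hidden state, so it needs three weight sets instead of four, leading to only 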
$$N_\mathrm{GRU} = 3(N_\mathrm{h}^2 + N_\mathrm{h}N_\mathrm{i}+ N_\mathrm{h})$$
parameters.

The total number of trainable parameters of this architecture is therefore about 23% lower than for the LSTM one:
$$N_\mathrm{Trainable} = N_\mathrm{GRU} + N_\mathrm{Dense} = 71424 + 7353 = 78777$$
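
The GRU count can be checked the same way. The sketch below uses hypothetical helper names and adds one assumption that is not part of the answer above: a `bias_sets` argument. With `bias_sets=1` it evaluates the formula given here and matches the summary printed earlier. Keras versions whose GRU defaults to `reset_after=True` keep separate input and recurrent biases, which corresponds to `bias_sets=2`; `model.summary()` there may report a different number, so this should be verified against the installed version.

```python
def gru_params(n_hidden, n_input, bias_sets=1):
    """Three weight sets (update gate, reset gate, candidate state).

    bias_sets is an illustrative assumption: 1 reproduces the formula above,
    2 models implementations with separate input and recurrent biases.
    """
    return 3 * (n_hidden ** 2 + n_hidden * n_input + bias_sets * n_hidden)


def dense_params(n_output, n_hidden):
    """Fully connected layer: one weight per (output, hidden) pair plus biases."""
    return n_output * n_hidden + n_output


n_h, n_i, n_o = 128, 57, 57  # hidden size, input size, output size

n_gru = gru_params(n_h, n_i)
n_dense = dense_params(n_o, n_h)

print(n_gru, n_dense, n_gru + n_dense)  # 71424 7353 78777
print(gru_params(n_h, n_i, bias_sets=2))  # 71808
```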