
## Text generation with LSTM

### Reweighting a probability distribution to a different temperature

In [None]:
import numpy as np
def weight_redistribution(original_distribution, temperature=0.8):
    distribution=np.log(original_distribution)/temperature
    distribution=np.exp(distribution)
    distribution=distribution/np.sum(distribution)
    return distribution

a=np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])
weight_redistribution(a)


array([0.08543321, 0.20319555, 0.33730927, 0.20319555, 0.08543321,
       0.08543321])
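
To see what the temperature does, it can help to compare the entropy of the reweighted distribution at a few temperatures. The sketch below repeats the function above so that it runs on its own; the `entropy` helper and the chosen temperature values are illustrative additions, not part of the original notebook.

```python
import numpy as np

def weight_redistribution(original_distribution, temperature=0.8):
    # Same reweighting as above: scale log-probabilities, then renormalize.
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

def entropy(p):
    # Shannon entropy in nats (hypothetical helper, added for illustration).
    return float(-np.sum(p * np.log(p)))

a = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])

# Illustrative temperatures: one below 1, exactly 1, and one above 1.
entropies = {t: entropy(weight_redistribution(a, t)) for t in (0.2, 1.0, 2.0)}
for t, h in entropies.items():
    print(f"temperature={t}: entropy={h:.3f}")
```

A temperature of 1.0 leaves the distribution unchanged. Lower values concentrate probability on the most likely entry (lower entropy), and higher values push the distribution toward uniform (higher entropy).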

### Implementing character-level LSTM text generation

In this example, you’ll use some of the writings of Nietzsche, the late-nineteenth-century
German philosopher (translated into English). The language model you’ll learn will
thus be specifically a model of Nietzsche’s writing style and topics of choice, rather
than a more generic model of the English language.

#### Downloading and parsing the initial text file

In [None]:
import keras
import numpy as np
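path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
# Read the whole corpus and lowercase it to shrink the character vocabulary.
with open(path, encoding='utf-8') as f:
    text = f.read().lower()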
print('Corpus length:', len(text))

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600893


#### Vectorizing text

You’ll extract partially overlapping sequences of length maxlen, one-hot encode
them, and pack them in a 3D NumPy array x of shape (sequences, maxlen,
unique_characters). Simultaneously, you’ll prepare an array y containing the corresponding
targets: the one-hot-encoded characters that come after each extracted
sequence.
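
The same windowing and one-hot encoding can be tried on a tiny string first, where the shapes are easy to check by hand. This is only a sketch: `toy_text`, `toy_maxlen`, `toy_step`, and the other names are hypothetical stand-ins chosen for illustration, not values used by the notebook.

```python
import numpy as np

# Toy stand-ins for the corpus and window settings (illustrative values only).
toy_text = "hello world"
toy_maxlen = 4
toy_step = 3

# Overlapping windows and the character that follows each one.
starts = range(0, len(toy_text) - toy_maxlen, toy_step)
windows = [toy_text[i:i + toy_maxlen] for i in starts]
targets = [toy_text[i + toy_maxlen] for i in starts]

# Map each unique character to an integer index.
vocab = sorted(set(toy_text))
index_of = {ch: i for i, ch in enumerate(vocab)}

# One-hot encode windows into (sequences, maxlen, unique_characters)
# and targets into (sequences, unique_characters).
x_toy = np.zeros((len(windows), toy_maxlen, len(vocab)), dtype=bool)
y_toy = np.zeros((len(windows), len(vocab)), dtype=bool)
for i, window in enumerate(windows):
    for j, ch in enumerate(window):
        x_toy[i, j, index_of[ch]] = True
    y_toy[i, index_of[targets[i]]] = True

print(windows, targets)
print(x_toy.shape, y_toy.shape)
```

Each position in a window has exactly one `True` entry, so taking the argmax along the last axis of `x_toy` and looking the indices up in `vocab` recovers the original window.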

In [None]:
maxlen = 60 # sentences of length 60 chars each
step = 3 # sentences sampled at an interval of 3 characters
sentences = [] # list of sentences extracted.
next_chars = [] # list of next character for those sentences

for i in range(0, len(text)-maxlen, step):
    sentences.append(text[i: i+maxlen])
    next_chars.append(text[i+maxlen])
print(len(sentences), len(next_chars))
sentences[0], next_chars[0]

200278 200278


('preface\n\n\nsupposing that truth is a woman--what then? is the', 'r')
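
The count of 200278 sequences follows from the corpus length, maxlen, and step. A quick arithmetic check, assuming the corpus length printed by the download cell above:

```python
import math

corpus_length = 600893  # value printed by the download cell above
maxlen = 60
step = 3

# Window start positions are 0, step, 2*step, ... strictly below corpus_length - maxlen.
n_sequences = len(range(0, corpus_length - maxlen, step))

# Equivalent closed form: ceiling of (corpus_length - maxlen) / step.
closed_form = math.ceil((corpus_length - maxlen) / step)

print(n_sequences, closed_form)  # 200278 200278
```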

In [None]:
chars=sorted(list(set(text)))
len(chars)
chars[0:5]

['\n', ' ', '!', '"', "'"]

In [None]:
char_indices = {char: i for i, char in enumerate(chars)}
char_indices['a']

27

In [None]:
print('Vectorization...')
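# np.bool was removed from recent NumPy releases; the built-in bool is equivalent.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)

for i, sentence in enumerate(sentences):
    for j, char in enumerate(sentence):
        x[i, j, char_indices[char]] = 1
    # Set the target once per sentence rather than once per character.
    y[i, char_indices[next_chars[i]]] = 1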


Vectorization...


#### Single-layer LSTM model for next-character prediction

In [None]:
from keras import layers
model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

# learning_rate replaces the deprecated lr argument (Keras 2.3 and later).
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)


#### Training the language model and sampling from it

Given a trained model and a seed text snippet, you can generate new text by doing the
following repeatedly:

1. Draw from the model a probability distribution for the next character, given the
   generated text available so far.
2. Reweight the distribution to a certain temperature.
3. Sample the next character at random according to the reweighted distribution.
4. Add the new character at the end of the available text.

This is the code you use to reweight the original probability distribution coming out
of the model and draw a character index from it (the sampling function).

In [None]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
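
To get a feel for how the temperature changes what gets sampled, one option is to draw many indices from a fixed distribution and count them. The sketch below repeats the sampling function so that it runs on its own; `toy_preds`, the seed, the number of draws, and the temperature values are hypothetical choices for illustration and do not come from the trained model.

```python
import numpy as np

def sample(preds, temperature=1.0):
    # Same logic as the sampling function above, repeated so this snippet runs alone.
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

np.random.seed(0)  # fixed seed so repeated runs give the same counts
toy_preds = [0.5, 0.3, 0.15, 0.05]  # hypothetical model output over 4 characters

top_share = {}
for temperature in (0.2, 1.0, 2.0):
    draws = [sample(toy_preds, temperature) for _ in range(2000)]
    counts = np.bincount(draws, minlength=len(toy_preds))
    # Fraction of draws that picked the most likely index (index 0).
    top_share[temperature] = counts[0] / len(draws)
    print(temperature, counts)
```

At low temperature nearly all draws land on the most likely index, which is why low-temperature text tends to be repetitive. At higher temperatures the less likely indices are picked more often.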

##### Text generation loop

In [None]:
import random
import sys
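for epoch in range(1, 60):
    print('epoch', epoch)
    # Train for one more epoch on the full dataset.
    model.fit(x, y, batch_size=128, epochs=1)
    # Pick a random maxlen-character seed from the corpus.
    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')
    # Note: generated_text carries over between temperatures, so each run
    # continues from the last maxlen characters of the previous one.
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)
        for i in range(400):
            # One-hot encode the current maxlen-character window.
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1
            # Predict the next-character distribution and sample from it.
            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]
            # Slide the window: append the new character, drop the oldest.
            generated_text += next_char
            generated_text = generated_text[1:]
            sys.stdout.write(next_char)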

epoch 1
Epoch 1/1
--- Generating with seed: "rits--and some day perhaps such will
actually be our--posthu"
------ temperature: 0.2
rits--and some day perhaps such will
actually be our--posthulish of the most there all there is all sure there is and suppining there is in the most there is something there is is and and and in the courses and more there is and there and and and suppinity and farthely of the religious and and all there is all suppinity of the more there is is is all surpession of the more there is a more and and and there is something there is is is a more there is in the------ temperature: 0.5
and there is something there is is is a more there is in there is intentent of the man and simpless his and interpless, and nother the most propess of the most them a vasulishts and man in the instrunce this such every there is man be any residence of the manter, and iminst there and frights spirition of anchull the somations. there is and according--in there is its likerty and for 