<a id='Title-Language-Model'></a>
<h1 style="color:SlateGray;">Language Model</h1>

**Input dataset**

Beyond Good and Evil: the plain text form of Friedrich Nietzsche's book.

**Output classification**

A sequence of 400 generated characters, sampled one at a time, that continues a random seed taken from the text.

<h2 style="color:SlateGray;">Background</h2>

**Layers**

[*Dense*](01_Image_Classification.ipynb#Layers-Dense)

[*LSTM*](12_Basic_NLP_RNN.ipynb#Layers-LSTM)

**Activations**

[*softmax*](01_Image_Classification.ipynb#Activations-softmax)
		
**Optimizers**

[*rmsprop*](01_Image_Classification.ipynb#Optimizers-rmsprop)
		
**Loss functions**

[*categorical_crossentropy*](01_Image_Classification.ipynb#Loss-functions-categorical_crossentropy)

<h2 style="color:SlateGray;">Overview</h2>

*What is it for, conceptually?*

Demonstrates character-level text generation with an RNN-based (LSTM) model.

*How does it work, mechanically?*

A random snippet of text from the dataset serves as the seed, and the model produces a probability distribution over the possible next characters. This distribution is rescaled by a temperature, which controls how much randomness is allowed when sampling: low values favor the most likely characters, while high values flatten the distribution. A next character is then sampled and appended to the seed, the oldest character is dropped, and the result becomes the new seed for generating the following character.
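
The loop described above can be sketched without a trained network. In this minimal sketch, `toy_predict` is a hypothetical stand-in for the model (a smoothed bigram table built from a made-up corpus), and `apply_temperature` and `generate` are illustrative names that do not appear elsewhere in this notebook. Only the seed, rescale, sample, and slide steps mirror the description.

```python
import numpy as np

# Made-up corpus, used only to build a stand-in predictor.
corpus = "the cat sat on the mat. the cat ate the rat. "
chars = sorted(set(corpus))
char_indices = {c: i for i, c in enumerate(chars)}

# Hypothetical stand-in for the trained network: bigram counts with add-one smoothing.
counts = np.ones((len(chars), len(chars)))
for a, b in zip(corpus, corpus[1:]):
    counts[char_indices[a], char_indices[b]] += 1

def toy_predict(window):
    """Return a next-character distribution based on the last character only."""
    row = counts[char_indices[window[-1]]]
    return row / row.sum()

def apply_temperature(probs, temperature):
    """Rescale a distribution in log space and renormalize it."""
    logits = np.log(probs) / temperature
    weights = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return weights / weights.sum()

def generate(seed, n_chars, temperature, rng):
    window = seed
    out = []
    for _ in range(n_chars):
        probs = apply_temperature(toy_predict(window), temperature)
        next_char = chars[rng.choice(len(chars), p=probs)]
        out.append(next_char)
        # Slide the window: drop the oldest character, append the new one.
        window = window[1:] + next_char
    return "".join(out)

rng = np.random.default_rng(0)
generated = generate("the cat", n_chars=40, temperature=0.5, rng=rng)
print(generated)
```

The window keeps a fixed length throughout, which is what lets a model with a fixed input shape be called repeatedly.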


In [1]:
import keras
keras.__version__

Using TensorFlow backend.


'2.2.2'

In [2]:
import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
Corpus length: 600901


In [3]:
maxlen = 60
step = 3

sentences = []
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))

chars = sorted(list(set(text)))
print('Unique characters:', len(chars))
char_indices = dict((char, chars.index(char)) for char in chars)

print('Vectorization...')
# The builtin bool is used because the np.bool alias was removed in newer NumPy releases
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200281
Unique characters: 59
Vectorization...
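
To make the shapes concrete, here is the same windowing and one-hot encoding applied to a tiny made-up string. The `toy_` names and values are illustrative assumptions, chosen small enough to check by hand.

```python
import numpy as np

toy_text = "abcabcab"
toy_maxlen = 3  # window length
toy_step = 2    # stride between consecutive windows

starts = range(0, len(toy_text) - toy_maxlen, toy_step)
toy_windows = [toy_text[i:i + toy_maxlen] for i in starts]
toy_targets = [toy_text[i + toy_maxlen] for i in starts]

toy_chars = sorted(set(toy_text))
toy_indices = {c: i for i, c in enumerate(toy_chars)}

# One one-hot row per character position, and one per target character.
toy_x = np.zeros((len(toy_windows), toy_maxlen, len(toy_chars)), dtype=bool)
toy_y = np.zeros((len(toy_windows), len(toy_chars)), dtype=bool)
for i, window in enumerate(toy_windows):
    for t, char in enumerate(window):
        toy_x[i, t, toy_indices[char]] = True
    toy_y[i, toy_indices[toy_targets[i]]] = True

print(toy_windows, toy_targets)
print(toy_x.shape, toy_y.shape)

# Decoding with argmax recovers the original windows.
decoded = ["".join(toy_chars[j] for j in row.argmax(axis=-1)) for row in toy_x]
print(decoded)
```

Each window is paired with the single character that follows it, so the model is trained on a next-character classification task.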


In [4]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))
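
As a rough size check, the number of trainable parameters can be worked out by hand. This sketch assumes a standard LSTM layer with biases (four gates, each with an input kernel, a recurrent kernel, and a bias vector) and the 59 unique characters reported above. The helper names are illustrative and not part of Keras.

```python
def lstm_param_count(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel, and a bias vector.
    return 4 * (input_dim * units + units * units + units)

def dense_param_count(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units

n_chars = 59  # unique characters reported for this corpus
lstm_params = lstm_param_count(n_chars, 128)
dense_params = dense_param_count(128, n_chars)
total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)  # 96256 7611 103867
```

Calling `model.summary()` on the model above is the direct way to confirm these figures.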

In [5]:
optimizer = keras.optimizers.RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)

In [6]:
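def sample(preds, temperature=1.0):
    # float64 keeps the renormalized probabilities precise enough for multinomial
    preds = np.asarray(preds).astype('float64')
    # Rescale in log space: temperature < 1 sharpens, > 1 flattens the distribution
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    # Renormalize so the probabilities sum to 1 again
    preds = exp_preds / np.sum(exp_preds)
    # Draw one sample; the result is a one-hot vector over characters
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)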
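
The rescaling step is equivalent to raising each probability to the power $1/T$ and renormalizing: $p_i' = p_i^{1/T} / \sum_j p_j^{1/T}$. The sketch below applies it to a made-up four-way distribution (`toy_preds` and `rescale` are illustrative names) to show how the temperatures used in this notebook reshape it.

```python
import numpy as np

def rescale(preds, temperature):
    """Apply only the temperature rescaling step, without sampling."""
    preds = np.asarray(preds, dtype='float64')
    scaled = np.exp(np.log(preds) / temperature)
    return scaled / scaled.sum()

toy_preds = [0.5, 0.3, 0.15, 0.05]  # made-up distribution over four characters
for temperature in [0.2, 0.5, 1.0, 1.2]:
    print(temperature, np.round(rescale(toy_preds, temperature), 3))
```

At temperature 1.0 the distribution is unchanged. Lower temperatures concentrate the probability mass on the most likely character, and temperatures above 1.0 shift mass toward the unlikely ones, which is why high-temperature output looks more erratic.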

In [8]:
import random
import sys

for epoch in range(1, 5):
    print('epoch', epoch)

    model.fit(x, y,
              batch_size=128,
              epochs=1)

    start_index = random.randint(0, len(text) - maxlen - 1)
    generated_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: "' + generated_text + '"')

    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(generated_text)

        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

epoch 1
Epoch 1/1
--- Generating with seed: "ther they were
accustomed to commanding from morning till ni"
------ temperature: 0.2
ther they were
accustomed to commanding from morning till night to make the last of the sense of the sound of the demander the conscious and the conscious the self-consequence, and desire of the instance of the sense of the general and delight of the proper and the self-called the present the state of the conscious the spiritual the conscious and the desire of the fact the soul as the same the state of the present that it is the conscious the same the high
------ temperature: 0.5
te of the present that it is the conscious the same the higher, in the same to his bad into the prisench itself as if the individually called that the self-ciscial thought of all the trandary as his motioned a spiritual souls; and for the good" on a care or science. it is though and the principal suffering no fact a such and armine for the done that is the instinction which as a ma

hominss and viing, "swear develop attracity and perseebed only so--it is, permes. we diffast type perspent
who is mir: thused of homands of tend that all," with him not fastigable. 
peises and buinw bekenthwass, by obscurecy that beying the primons.--the delightful what this, an-achies soserveem
thereze the sour the di, bsy over origin,
artations and part to the balment refetine, oar grue
