In [1]:
import tensorflow.keras as keras
keras.__version__

"""
Allocate only as much GPU memory as needed for the runtime allocations.
"""
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:  # the list is empty on CPU-only machines, so guard the index
    tf.config.experimental.set_memory_growth(gpus[0], True)

# Text generation with LSTM

Recurrent neural networks can be used to generate sequence data, e.g., text, musical notes, paintings.

In this file, we will look at a `character-level neural language model`:
- take an LSTM layer
- feed it strings of *N* characters extracted from a text corpus
- train it to predict character *N+1*

The output of the model will be a softmax over all possible characters: a probability distribution for the next character.


## The importance of the sampling strategy

Sampling probabilistically from the softmax output of the model is neat: it allows even unlikely characters to be sampled some of the time, generating more interesting-looking sentences and sometimes showing creativity by coming up with new, realistic-sounding words that didn't occur in the training data.

In order to control the amount of stochasticity in the sampling process, we'll introduce a parameter called the softmax temperature that characterizes the entropy of the probability distribution used for sampling. Given a `temperature` value, a new probability distribution is computed from the softmax output of the model (the original distribution) by reweighting it in the following way.
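
Written as a formula, the reweighting looks as follows. This is only a restatement of the code cell that follows; the symbols $p_i$ (original probability of character $i$) and $T$ (the temperature) are notation introduced here for illustration.

```latex
p_i^{(T)} = \frac{\exp\left(\log(p_i) / T\right)}{\sum_j \exp\left(\log(p_j) / T\right)}
          = \frac{p_i^{1/T}}{\sum_j p_j^{1/T}}
```

With $T = 1$ the distribution is unchanged. As $T \to 0$ the probability mass concentrates on the most likely character, and as $T$ grows the distribution approaches uniform.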

In [2]:
"""
Reweighting a probability distribution to a different temperature, using logarithmic operators
"""

import numpy as np 

def reweight_distribution(orig_distribution, temperature=0.5):
    distribution = np.log(orig_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)   # re-normalize by the sum.
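
To see what the temperature does in practice, here is a small sketch that reweights a made-up distribution at several temperatures and prints its entropy. The 4-character distribution `orig` and the helper `entropy` are assumptions introduced for illustration; `reweight_distribution` is repeated so the snippet runs on its own.

```python
import numpy as np

def reweight_distribution(orig_distribution, temperature=0.5):
    # Same reweighting as in the cell above, repeated so this sketch is self-contained.
    distribution = np.log(orig_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)

def entropy(distribution):
    # Shannon entropy in nats; assumes all probabilities are strictly positive.
    return float(-np.sum(distribution * np.log(distribution)))

# Hypothetical 4-character distribution, chosen only for illustration.
orig = np.array([0.6, 0.25, 0.1, 0.05])

results = {}
for temperature in [0.2, 0.5, 1.0, 1.2]:
    reweighted = reweight_distribution(orig, temperature)
    results[temperature] = (reweighted, entropy(reweighted))
    print(temperature, np.round(reweighted, 3), round(results[temperature][1], 3))
```

A temperature of 1.0 returns the original distribution, lower temperatures sharpen it (lower entropy), and higher temperatures flatten it (higher entropy).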

## Implementing character-level LSTM text generation

The first thing you need is a lot of text data that you can use to learn a language model. Here we will use some of the writings of Nietzsche. The language model we will learn will thus be specifically a model of Nietzsche's writing style and topics of choice, rather than a more generic model of the English language.


### Preparing the data

In [3]:
import keras
import numpy as np

path = keras.utils.get_file(
    'nietzsche.txt',
    origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
text = open(path).read().lower()
print('Corpus length:', len(text))

Corpus length: 600893


Next, we will extract partially overlapping sequences of length `maxlen`, one-hot encode them, and pack them in a 3D NumPy array *x* of shape `(sequences, maxlen, unique_characters)`, where
- sequences == number of extracted sequences, each of length maxlen
- maxlen == length of each extracted character sequence
- unique_characters == number of unique characters in the corpus

Simultaneously, we prepare an array *y* containing the corresponding targets: the one-hot encoded characters that come right after each extracted sequence.
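
As a toy-sized sketch of this windowing and encoding, the snippet below applies the same procedure to a short string. The corpus `'hello world'` and the settings `maxlen = 4` and `step = 3` are assumptions chosen only to keep the arrays small enough to inspect by hand.

```python
import numpy as np

# Toy corpus and window settings, chosen only for illustration.
text = 'hello world'
maxlen = 4
step = 3

sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])   # input window
    next_chars.append(text[i + maxlen])     # character right after the window

chars = sorted(set(text))
char_indices = {char: i for i, char in enumerate(chars)}

# One-hot encode: one row per window, one column per unique character.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

print(sentences)          # ['hell', 'lo w', 'worl']
print(next_chars)         # ['o', 'o', 'd']
print(x.shape, y.shape)   # (3, 4, 8) (3, 8)
```

Consecutive windows start `step` characters apart, so with `step < maxlen` they overlap, and each time step in `x` has exactly one active entry.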

In [4]:
# Extract sequences of 60 characters.
maxlen = 60

# Sample a new sequence every 3 characters.
step = 3

# This holds our extracted sequences
sentences = []

# This holds the targets (i.e., the character right after each sequence)
next_chars = []

for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('Number of sequences:', len(sentences))


# List of unique characters in the corpus
chars = sorted(list(set(text)))
print('Unique characters in the entire dataset:', len(chars))

# Dictionary mapping unique characters to their index in `chars`
char_indices = {char: i for i, char in enumerate(chars)}

# Next, one-hot encode the characters into binary arrays.
print('Vectorization...')
# Use the built-in `bool`: the `np.bool` alias was removed in NumPy 1.24.
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

Number of sequences: 200278
Unique characters in the entire dataset: 57
Vectorization...


In [5]:
sentences[:3]

['preface\n\n\nsupposing that truth is a woman--what then? is the',
 'face\n\n\nsupposing that truth is a woman--what then? is there ',
 'e\n\n\nsupposing that truth is a woman--what then? is there not']

In [6]:
next_chars[:3]

['r', 'n', ' ']

## Building the network

This network is a single LSTM layer followed by a `Dense` classifier and `softmax` over all possible characters. But note that recurrent neural networks aren't the only way to generate sequence data; 1D convnets have also proven extremely successful at this task in recent times.

In [7]:
from keras import layers

model = keras.models.Sequential()
model.add(layers.LSTM(128, input_shape=(maxlen, len(chars))))
model.add(layers.Dense(len(chars), activation='softmax'))

"""
Since our targets are one-hot encoded, 
we will use categorical_crossentropy as the loss to train the model:
"""

# `learning_rate` replaces the deprecated `lr` argument in recent Keras versions.
optimizer = keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
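
As a rough sanity check on the model size, the parameter count can be worked out by hand. This sketch assumes the default LSTM and Dense layers with bias terms and the 57-character vocabulary reported earlier. The helper names `lstm_param_count` and `dense_param_count` are hypothetical, introduced here for illustration.

```python
def lstm_param_count(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    # One weight per input-output pair plus one bias per output unit.
    return input_dim * units + units

vocab_size = 57    # unique characters reported for this corpus
lstm_units = 128   # size of the LSTM layer above

lstm_params = lstm_param_count(vocab_size, lstm_units)
dense_params = dense_param_count(lstm_units, vocab_size)
print(lstm_params, dense_params, lstm_params + dense_params)  # 95232 7353 102585
```

These figures can be compared against the output of `model.summary()`.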

## Training the language model and sampling from it

Given a trained model and a seed text snippet, we generate new text by repeatedly:

1. Drawing from the model a probability distribution over the next character given the text available so far
2. Reweighting the distribution to a certain `temperature`
3. Sampling the next character at random according to the `reweighted distribution`
4. Adding the new character at the end of the available text
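
The four steps above can be sketched independently of Keras. In the snippet below, `toy_predict` is a hypothetical stand-in for a trained model (it returns the same distribution regardless of the input), and `generate` and `sample_index` are illustrative helper names, not part of this notebook's code.

```python
import numpy as np

def sample_index(probs, temperature, rng):
    # Steps 2 and 3: reweight to the given temperature, then sample an index.
    logits = np.log(np.asarray(probs, dtype='float64')) / temperature
    reweighted = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    reweighted /= np.sum(reweighted)
    return int(rng.choice(len(reweighted), p=reweighted))

def generate(predict_fn, seed_text, chars, length, temperature, rng):
    window = seed_text
    generated = ''
    for _ in range(length):
        probs = predict_fn(window)                                 # step 1
        next_char = chars[sample_index(probs, temperature, rng)]   # steps 2 and 3
        generated += next_char
        window = window[1:] + next_char                            # step 4, fixed-size window
    return generated

# Hypothetical stand-in for a trained model: always prefers 'a'.
chars = ['a', 'b', 'c']

def toy_predict(window):
    return np.array([0.5, 0.3, 0.2])

rng = np.random.default_rng(0)
print(generate(toy_predict, 'abc', chars, 20, temperature=0.5, rng=rng))
```

Sliding the window forward by one character keeps the model input at a fixed length, which is what a network trained on `maxlen`-character sequences expects.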

In [8]:
"""
Reweight the original probability distribution coming out of the model, 
and draw a character index from it (the sampling function):
"""

def sample(preds, temperature=1.0):
    preds       = np.asarray(preds).astype('float64')
    preds       = np.log(preds) / temperature
    exp_preds   = np.exp(preds)
    preds       = exp_preds / np.sum(exp_preds)
    probas      = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

Finally, this is the loop where we repeatedly train the model and generate text. Every fifth epoch, we generate text using a range of different temperatures. This allows us to see how the generated text evolves as the model starts converging, as well as the impact of temperature on the sampling strategy.

In [9]:
import random
import sys

for epoch in range(1, 60):
    print('+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=++=+=+=+=+=+=')
    print('epoch', epoch)


    ### Training Phase +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    # Fit the model for one pass over the data.
    # Each additional pass generally lowers the training loss.
    model.fit(x, y,
              batch_size=128,
              epochs=1)


    ### Generation Phase +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    # The larger the temperature, the more random (and gibberish-prone) the generated text.
    # The smaller the temperature, the more predictable and less creative it is.

    if (epoch + 1) % 5:  # Generate text only every 5th epoch; skip the others
        continue

    # Select a starting sentence at random
    start_index = random.randint(0, len(text) - maxlen - 1)
    root_text = text[start_index: start_index + maxlen]
    print('--- Generating with seed: \n"' + root_text + '"')

    # Experiment with different sampling strategies.
    for temperature in [0.2, 0.5, 1.0, 1.2]:
        print('------ temperature:', temperature)
        sys.stdout.write(root_text)
        generated_text = root_text

        # Generate each new character from the previous `maxlen` characters.
        for i in range(400):
            sampled = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(generated_text):
                sampled[0, t, char_indices[char]] = 1.

            preds = model.predict(sampled, verbose=0)[0]
            next_index = sample(preds, temperature)
            next_char = chars[next_index]

            generated_text += next_char
            generated_text = generated_text[1:]

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
        

 fact a world as a man as any historical transfortholy--it may something the course of men to our values. he sough to be reference a man is patience--of the promise of the translices of the each the such a long stimulentless, which there are something it withing--can b
------ temperature: 1.0
e operate most potently upon vanity, these same purposes of sad ome more sofontic to have a dut imiliced yes to him to liberty--and something senseably culture, upind even to spring imaged cas chiracule,
such above a sacrifice.
say nowadaya time a; even all strangesant of mankind and cheincers in the imital, sons
proterring and religion, our framment--this circamicy. the
philosopher." in all actually comnective id, world and begloomy? indignate covs lived 
------ temperature: 1.2
e operate most potently upon vanity, these same purposes of the shume will
a still
rehung,
who suspicion ana
now precisely that
his voco, distrry giving that lives high, we, ye evidacces
of by it not cause sympauntce; we,