Script to generate text from a network already trained with lstm_train.py
--By word--
At a minimum, the trained model and the vocabulary file (also generated by
lstm_train.py) must be provided.

In [1]:
import numpy as np
import re
from keras.models import load_model

In [2]:
def validate_seed(vocabulary, seed):
    """Validate that all the words in the seed are part of the vocabulary"""
    print("\nValidating that all the words in the seed are part of the vocabulary: ")
    seed_words = seed.split(" ")
    valid = True
    for w in seed_words:
        print(w, end="")
        if w in vocabulary:
            print(" ✓ in vocabulary")
        else:
            print(" ✗ NOT in vocabulary")
            valid = False
    return valid


# Functions from keras-team/keras/blob/master/examples/lstm_text_generation.py
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def generate_text(model, indices_word, word_indices, seed,
                  sequence_length, diversity, quantity):
    """
    Similar to lstm_train::on_epoch_end
    Generates text with a trained model
    :param model: the trained Keras model (loaded with load_model)
    :param indices_word: a dictionary mapping indices to words
    :param word_indices: a dictionary mapping words to indices
    :param seed: a string to be used as seed (already validated and padded)
    :param sequence_length: how many words are given to the model to generate
    :param diversity: the "temperature" of the sample function (usually between 0.1 and 2)
    :param quantity: number of words to generate
    :return: nothing; for now the text is only written to the console
    """
    sentence = seed.split(" ")
    print("----- Generating text")
    print('----- Diversity:' + str(diversity))
    print('----- Generating with seed:\n"' + seed)

    print(seed)
    for i in range(quantity):
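        # one-hot encode the current window; the vocabulary size is taken from
        # word_indices so the function does not depend on a global variable
        x_pred = np.zeros((1, sequence_length, len(word_indices)))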
        for t, word in enumerate(sentence):
            x_pred[0, t, word_indices[word]] = 1.

        preds = model.predict(x_pred, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_word = indices_word[next_index]

        sentence = sentence[1:]
        sentence.append(next_word)

        print(" "+next_word, end="")
    print("\n")
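
The diversity parameter is passed to sample() as the temperature. The following is a minimal sketch of what that rescaling does to a distribution before a word is drawn. The helper name apply_temperature and the three-word distribution are made up for illustration; only the arithmetic is taken from sample() above.

```python
import numpy as np


def apply_temperature(preds, temperature=1.0):
    """Rescale a probability vector the way sample() does before drawing."""
    preds = np.asarray(preds).astype("float64")
    logits = np.log(preds) / temperature
    exp_logits = np.exp(logits)
    return exp_logits / np.sum(exp_logits)


probs = [0.6, 0.3, 0.1]  # hypothetical distribution over three words
for t in (0.5, 1.0, 2.0):
    # t < 1 sharpens the distribution, t > 1 flattens it, t == 1 leaves it unchanged
    print(t, np.round(apply_temperature(probs, t), 3))
```

Low temperatures make the most likely word dominate, so the output tends to be more repetitive; high temperatures give rare words more weight, so the output tends to be more varied and less coherent.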

In [4]:
model = load_model('../save/LSTM_simple_model_v3.h5')
print("\nSummary of the Network: ")
model.summary()


Summary of the Network: 
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bidirectional (Bidirectiona  (None, 256)              29765632  
 l)                                                              
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense (Dense)               (None, 28939)             7437323   
                                                                 
 activation (Activation)     (None, 28939)             0         
                                                                 
Total params: 37,202,955
Trainable params: 37,202,955
Non-trainable params: 0
_________________________________________________________________
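
The parameter counts in the summary can be checked by hand. The sketch below assumes 128 LSTM units per direction (half of the 256-wide bidirectional output) and a one-hot input as wide as the vocabulary. Both are inferences from the summary, not values read from lstm_train.py.

```python
vocab_size = 28939  # width of the Dense output in the summary
units = 128         # assumed LSTM units per direction (2 * 128 = 256)

# An LSTM has 4 gates, each with an input kernel, a recurrent kernel and a bias.
lstm_one_direction = 4 * (units * (vocab_size + units) + units)
bidirectional = 2 * lstm_one_direction

# Dense layer: one weight per (input, output) pair plus one bias per output.
dense = (2 * units) * vocab_size + vocab_size

print(bidirectional, dense, bidirectional + dense)
# prints: 29765632 7437323 37202955
```

Under these assumptions the three numbers match the Param # column and the total reported by model.summary().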


In [6]:
with open('../save/vocab_v3.txt', "r", encoding="utf8") as vocab_file:
    vocabulary = vocab_file.readlines()
# remove the \n at the end of the word, except for the \n word itself
vocabulary = [re.sub(r'(\S+)\s+', r'\1', w) for w in vocabulary]
vocabulary = sorted(set(vocabulary))

word_indices = dict((c, i) for i, c in enumerate(vocabulary))
indices_word = dict((i, c) for i, c in enumerate(vocabulary))
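
To see what the clean-up and the two dictionaries do, here is a small sketch on a toy vocabulary. The lines in raw_lines are hypothetical and only stand in for what readlines() might return; the names word_to_index and index_to_word are illustrative equivalents of word_indices and indices_word.

```python
import re

# Hypothetical lines as readlines() would return them; "\n" alone is the newline token.
raw_lines = ["love\n", "you\n", "\n", "love\n"]

# Strip trailing whitespace from real words; a line with no word characters is left as is.
cleaned = [re.sub(r'(\S+)\s+', r'\1', w) for w in raw_lines]
vocab = sorted(set(cleaned))

word_to_index = {w: i for i, w in enumerate(vocab)}
index_to_word = {i: w for i, w in enumerate(vocab)}

print(vocab)  # ['\n', 'love', 'you']
print(index_to_word[word_to_index["you"]])  # you
```

Because the vocabulary is sorted before the indices are assigned, the mapping is only consistent with the trained model if the training script built its indices from the same file in the same way.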

In [None]:
sequence_length = 20
diversity = 1
quantity = 200
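
These settings drive the loop in generate_text: at each of the quantity steps, a window of sequence_length words is one-hot encoded, one word is sampled, and the window slides forward by one word. A rough sketch of a single step on a toy vocabulary follows. The vocabulary, window_length, and encode_window are illustrative names, and the sampled word is hard-coded instead of coming from a model.

```python
import numpy as np

vocab = ["\n", "be", "mine", "tonight"]  # toy vocabulary, not the real one
word_to_index = {w: i for i, w in enumerate(vocab)}
window_length = 3  # plays the role of sequence_length


def encode_window(words):
    """One-hot encode a window of words into shape (1, window_length, len(vocab))."""
    x = np.zeros((1, window_length, len(vocab)))
    for t, word in enumerate(words):
        x[0, t, word_to_index[word]] = 1.0
    return x


window = ["be", "mine", "tonight"]
x = encode_window(window)
print(x.shape)  # (1, 3, 4)

# After a word is sampled, the window drops its oldest word and appends the new one.
next_word = "be"  # stand-in for a word sampled from the model's prediction
window = window[1:] + [next_word]
print(window)  # ['mine', 'tonight', 'be']
```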

In [None]:
# seed = "my name is"
# seed = 'be mine tonight tomorrow will be too late'
# seed = 'have i ever know what you do not what i'
# seed = 'this is the beginning of my life of me and my face'
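
The commented seeds above are shorter than sequence_length (20 words), so they have to be padded before generation. The notebook does this by repeating the seed and keeping only the last sequence_length words. The sketch below wraps that same expression in a helper; the name pad_seed is hypothetical.

```python
def pad_seed(seed, sequence_length):
    """Repeat the seed and keep only its last sequence_length words."""
    repeated = ((seed + " ") * sequence_length) + seed
    return " ".join(repeated.split(" ")[-sequence_length:])


print(pad_seed("my name is", 5))  # name is my name is
print(pad_seed("a b c d", 2))     # c d
```

A seed longer than sequence_length is truncated to its last words by the same expression, so the model always receives a window of exactly sequence_length words.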

In [None]:
seed = 'you are the sunshine of my life that s why i ll always be around you are the apple of'


In [None]:
if validate_seed(vocabulary, seed):
    print("\nSeed is correct.\n")
    # repeat the seed in case it is not long enough, and keep only the last sequence_length words
    seed = " ".join((((seed+" ")*sequence_length)+seed).split(" ")[-sequence_length:])
    generate_text(
        model, indices_word, word_indices, seed, sequence_length, diversity, quantity
    )
else:
    print('\033[91mERROR: Please fix the seed string\033[0m')
    exit(0)


Validating that all the words in the seed are part of the vocabulary: 
you ✓ in vocabulary
are ✓ in vocabulary
the ✓ in vocabulary
sunshine ✓ in vocabulary
of ✓ in vocabulary
my ✓ in vocabulary
life ✓ in vocabulary
that ✓ in vocabulary
s ✓ in vocabulary
why ✓ in vocabulary
i ✓ in vocabulary
ll ✓ in vocabulary
always ✓ in vocabulary
be ✓ in vocabulary
around ✓ in vocabulary
you ✓ in vocabulary
are ✓ in vocabulary
the ✓ in vocabulary
apple ✓ in vocabulary
of ✓ in vocabulary

Seed is correct.

----- Generating text
----- Diversity:1
----- Generating with seed:
"you are the sunshine of my life that s why i ll always be around you are the apple of
you are the sunshine of my life that s why i ll always be around you are the apple of
 you i'm something 
 but you sitting in the sleep of the bridge 
 ring them stand and could send me down beyond your mind, 
 i get bell lazy this head is a chance 
 i ain't this for given to show 
 
 they just get to the promised 
 'cause i got to know, that i g