# Generating Texts with Recurrent Neural Networks in Python


Recurrent neural networks are well suited to processing sequential data such as text. Here I am going to use an LSTM (long short-term memory) network to teach a model to write text in the style of Shakespeare. We train the model to predict the next character based on the characters that came before it.

## Data Loading

In [9]:
import pandas as pd
import numpy as np
import tensorflow as tf

In [12]:
filepath= tf.keras.utils.get_file('shakespeare.txt',
        'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
text = open(filepath, 'rb').read().decode(encoding='utf-8')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
[1m1115394/1115394[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 0us/step


## Data Preparation

The problem that we have right now with our data is that we are dealing with text. We cannot just train a neural network on letters or sentences. We need to convert all of these values into numerical data. So we have to come up with a system that allows us to convert the text into numbers, then predict specific numbers based on that data and then again convert the resulting numbers back into text.

In [16]:
text = open(filepath, 'rb').read().decode(encoding='utf-8').lower()

In this case I immediately convert all of the text to lower case so that the model has fewer possible characters to choose from. The code below trains on the whole text file; to shorten training, you could slice the text down to a sub-range instead.

In [19]:
characters = sorted(set(text))

char_to_index = dict((c,i) for i, c in enumerate(characters))
index_to_char = dict((i,c) for i, c in enumerate(characters))

Now we create a sorted list of all the unique characters that occur in the text. We build it from a set, because in a set no value appears more than once, so this is a good way to filter out duplicates. After that we define two structures for converting the values. Both are dictionaries built by enumerating the characters. In the first one, the characters are the keys and the indices are the values. In the second one it is the other way around. Now we can easily convert a character into a unique numerical representation and vice versa.
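To make the round trip concrete, here is a minimal sketch on a toy string instead of the Shakespeare file. The helper names `encode` and `decode` are hypothetical and only serve this illustration; the notebook itself uses the two dictionaries directly.

```python
# Toy stand-in for the Shakespeare text (assumption: any string works the same way).
sample_text = "to be or not to be"

# Same construction as in the notebook, on the toy string.
characters = sorted(set(sample_text))
char_to_index = {c: i for i, c in enumerate(characters)}
index_to_char = {i: c for i, c in enumerate(characters)}


def encode(s):
    """Hypothetical helper: map each character to its integer index."""
    return [char_to_index[c] for c in s]


def decode(indices):
    """Hypothetical helper: map integer indices back to a string."""
    return "".join(index_to_char[i] for i in indices)


encoded = encode("to be")
decoded = decode(encoded)

print(characters)  # the vocabulary, in sorted order
print(encoded)     # one integer per character
print(decoded)     # the original string again
```

Note that encoding only works for characters that occurred in the text the vocabulary was built from; anything else raises a `KeyError`.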



In [23]:
SEQ_LENGTH = 40
STEP_SIZE = 3

sentences = []
next_char = []

In this next step, we define how long a sequence is and how many characters we step forward before starting the next one. The idea is to cut the text into overlapping sentences of SEQ_LENGTH characters and, for each sentence, store the character that follows it as the training target.

In [26]:
for i in range(0, len(text) - SEQ_LENGTH, STEP_SIZE):
    sentences.append(text[i: i + SEQ_LENGTH])
    next_char.append(text[i + SEQ_LENGTH])

We iterate through the whole text and gather all sentences and their next character. This is the training data for our neural network. Now we just need to convert it into a numerical format.
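The following sketch runs the same loop on a short toy string so the windows are easy to read. The toy text and the small SEQ_LENGTH are assumptions for illustration; the notebook uses 40.

```python
SEQ_LENGTH = 5  # toy value for readability; the notebook uses 40
STEP_SIZE = 3

toy_text = "hello world!"  # 12 characters

sentences = []
next_char = []

# Same loop as above: each window is paired with the character right after it.
for i in range(0, len(toy_text) - SEQ_LENGTH, STEP_SIZE):
    sentences.append(toy_text[i: i + SEQ_LENGTH])
    next_char.append(toy_text[i + SEQ_LENGTH])

for sentence, target in zip(sentences, next_char):
    print(repr(sentence), "->", repr(target))
```

Because STEP_SIZE is smaller than SEQ_LENGTH, neighbouring windows overlap, which is what gives us many training examples from one text.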



In [31]:
x = np.zeros((len(sentences), SEQ_LENGTH,
              len(characters)), dtype=np.bool_) 
y = np.zeros((len(sentences),
              len(characters)), dtype=np.bool_)  

for i, satz in enumerate(sentences):
    for t, char in enumerate(satz):
        x[i, t, char_to_index[char]] = 1
    y[i, char_to_index[next_char[i]]] = 1

We are creating two NumPy arrays full of zeros. Their data type is bool, which stands for boolean. Wherever a character appears in a certain sentence at a certain position we set the entry to one, or True. The input array x has one dimension for the sentences, one dimension for the positions of the characters within the sentences and one dimension to specify which character is at this position. The target array y has one row per sentence and marks the character that follows it.
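As a small illustration of the resulting shapes, this sketch applies the same encoding to a three-character vocabulary. The vocabulary and the two toy sentences are made up for this example.

```python
import numpy as np

# Toy vocabulary and data (assumptions for illustration only).
characters = ['a', 'b', 'c']
char_to_index = {c: i for i, c in enumerate(characters)}
SEQ_LENGTH = 2
sentences = ["ab", "bc"]
next_char = ["c", "a"]

x = np.zeros((len(sentences), SEQ_LENGTH, len(characters)), dtype=np.bool_)
y = np.zeros((len(sentences), len(characters)), dtype=np.bool_)

for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_to_index[char]] = 1   # mark which character sits at position t
    y[i, char_to_index[next_char[i]]] = 1  # mark the character that follows

print(x.shape, y.shape)   # (sentences, positions, vocabulary) and (sentences, vocabulary)
print(x[0].astype(int))   # first sentence "ab" as one-hot rows
print(y.astype(int))      # targets "c" and "a" as one-hot rows
```

Every position in x has exactly one True entry, which is why this representation is called one-hot encoding.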



## Building the RNN Model

Now that the training data is prepared, we are going to build the neural network. First we import the specific tools that we are going to use.

In [35]:
import random
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.layers import Activation, Dense, LSTM


We will use Sequential for our model, Activation, Dense and LSTM for our layers and RMSprop as the optimizer when we compile the model. LSTM stands for long short-term memory and is a type of recurrent neural network layer. It can be thought of as the memory of our model. This is crucial, since we are dealing with sequential data.

In [41]:
from tensorflow.keras.layers import Input

model = Sequential([
    Input(shape=(SEQ_LENGTH, len(characters))),
    LSTM(128),
    Dense(len(characters)),
    Activation('softmax')
])

The inputs flow directly into our LSTM layer with 128 units. Our input shape is (SEQ_LENGTH, len(characters)): one one-hot vector for each position in a sentence. In the target, the character that follows the sentence is set to True, or one. The LSTM layer is followed by a Dense layer with one unit per character, which serves as the output layer. In the end we use the softmax activation function to make the outputs add up to one. This gives us a probability for each character.
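To show what softmax does to the raw outputs of the Dense layer, here is a small NumPy sketch. The function `softmax` and the example scores are written for this illustration; Keras applies its own implementation inside the model.

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities that sum to one."""
    # Subtracting the maximum does not change the result but avoids overflow.
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=-1, keepdims=True)


# Made-up raw scores for a four-character vocabulary.
scores = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(scores)

print(probs)
print(probs.sum())
```

The largest score gets the largest probability, but every character keeps a non-zero chance, which matters later when we sample from the distribution.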

Now we compile the model and train it with our training data that we prepared above. We choose a batch size of 256 (which you can change if you want) and four epochs. This means that our model is going to see the same data four times.

In [45]:
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.01))

model.fit(x, y, batch_size=256, epochs=4)

Epoch 1/4
1453/1453 ━━━━━━━━━━━━━━━━━━━━ 55s 38ms/step - loss: 2.2943
Epoch 2/4
1453/1453 ━━━━━━━━━━━━━━━━━━━━ 64s 44ms/step - loss: 1.6273
Epoch 3/4
1453/1453 ━━━━━━━━━━━━━━━━━━━━ 65s 45ms/step - loss: 1.5155
Epoch 4/4
1453/1453 ━━━━━━━━━━━━━━━━━━━━ 67s 46ms/step - loss: 1.4643


<keras.src.callbacks.history.History at 0x308554bc0>
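The 1453 steps per epoch in the log can be checked with a little arithmetic. This sketch assumes the text has exactly 1115394 characters, that is, one character per downloaded byte, which holds only if the file is plain ASCII.

```python
import math

TEXT_LENGTH = 1115394  # assumption: one character per byte of the download
SEQ_LENGTH = 40
STEP_SIZE = 3
BATCH_SIZE = 256

# Number of start positions visited by range(0, len(text) - SEQ_LENGTH, STEP_SIZE).
num_sequences = len(range(0, TEXT_LENGTH - SEQ_LENGTH, STEP_SIZE))

# The last batch may be smaller than BATCH_SIZE, so we round up.
steps_per_epoch = math.ceil(num_sequences / BATCH_SIZE)

print(num_sequences, steps_per_epoch)
```

Under that assumption the result matches the 1453 steps shown in the training output.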

## Helper Function

The model is now trained, but it only outputs the probabilities for the next character. To make our script generate some reasonable text, I have added a helper function called "sample" that picks a character from these probabilities. I took the helper function from this reference: https://keras.io/examples/lstm_text_generation/

In [50]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

The function picks the index of one character from the output. As parameters it takes the result of the prediction and a temperature. The temperature controls how risky the pick is. A high temperature flattens the probabilities, so less likely characters are picked more often. A low temperature sharpens them, so the choice is conservative and usually falls on the most likely character.
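To see the effect of the temperature without any randomness, this sketch repeats only the rescaling part of sample and leaves out the random draw. The function name `apply_temperature` and the example probabilities are assumptions made for this illustration.

```python
import numpy as np


def apply_temperature(preds, temperature=1.0):
    """Rescale a probability distribution the same way sample does, without drawing."""
    preds = np.asarray(preds).astype('float64')
    scaled = np.exp(np.log(preds) / temperature)
    return scaled / np.sum(scaled)


# Made-up prediction for a three-character vocabulary.
probs = [0.6, 0.3, 0.1]

low = apply_temperature(probs, 0.2)   # sharper: the top character dominates
same = apply_temperature(probs, 1.0)  # unchanged distribution
high = apply_temperature(probs, 2.0)  # flatter: rare characters gain probability

for name, dist in (("low", low), ("same", same), ("high", high)):
    print(name, np.round(dist, 3))
```

With a temperature of 1.0 the distribution is left as it is, so the model's own probabilities are used directly.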

## Text Generation

In [52]:
def generate_text(length, temperature):
    start_index = random.randint(0, len(text) - SEQ_LENGTH - 1)
    generated = ''
    sentence = text[start_index: start_index + SEQ_LENGTH]
    generated += sentence
    for i in range(length):
        x_predictions = np.zeros((1, SEQ_LENGTH, len(characters)))
        for t, char in enumerate(sentence):
            x_predictions[0, t, char_to_index[char]] = 1

        predictions = model.predict(x_predictions, verbose=0)[0]
        next_index = sample(predictions,
                                 temperature)
        next_character = index_to_char[next_index]

        generated += next_character
        sentence = sentence[1:] + next_character
    return generated

We choose a random starting position within the text because we need some starting text in order to predict the “next” character. This means the first SEQ_LENGTH characters of the result are copied from the original text. We could cut them off afterwards and end up with text that is completely generated by our neural network.

After choosing the starting text, we run a for loop for as many characters as we want to generate. We can generate a text with 100 characters or one with 20,000. In each iteration we convert the current sentence into the one-hot input format described above.

The sentence is now an array with ones wherever a character occurs. We use the predict method of our model to get the probabilities of the next character and pass them, together with the temperature, to our sample helper function. The index we get back is converted from the numerical format into a readable character. We then add the character to our generated text, drop the first character of the sentence, append the new one and repeat the process until we reach the desired length.
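The sliding window and the idea of cutting off the copied seed can be sketched without a trained model. In the example below, `fake_next_character` is a hypothetical stand-in for the predict and sample steps, and `generate_without_seed` is a made-up name for this illustration.

```python
def fake_next_character(window):
    """Hypothetical stand-in for model.predict + sample: repeats the window's first character."""
    return window[0]


def generate_without_seed(seed, length):
    """Generate `length` characters and return them without the copied seed."""
    sentence = seed
    generated = seed
    for _ in range(length):
        next_character = fake_next_character(sentence)
        generated += next_character
        # Drop the oldest character and append the new one: the window keeps its length.
        sentence = sentence[1:] + next_character
    # Cut off the seed so only newly generated characters remain.
    return generated[len(seed):]


result = generate_without_seed("hello", 7)
print(result)
```

In the notebook the same slicing could be applied to the return value of generate_text, using SEQ_LENGTH as the seed length.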



## Results

In [54]:
print(generate_text(300, 0.2))
print(generate_text(300, 0.4))
print(generate_text(300, 0.5))
print(generate_text(300, 0.6))
print(generate_text(300, 0.7))
print(generate_text(300, 0.8))

cutio, peace!
thou talk'st of nothing.

brutus:
there is thee the man the wing to thee shall
be thee than they make thee than they shall
be the duke of thee than they are thee shall
be the prince to the head thee than they love
thee the son that thou not the brother with his cruell,
that we will not do so him the war the hands than they a
thou a kingdom; all of you allegiance:
there is the rest to this morn to thee camillo
england of the end there show i thank thee shall
be so hath i look and think thee show i do not thee:
what you do be the end men to thee to thee be
the traitor to thee down do thee there is streef.

menenius:
i will not deeds thy son to have so head
thee
s before.

first citizen:
come, come, were i do but thee between hear
thee seems as they she hath and show thee be
comes thee there is not nothing have so look to thee.

provost:
what in the lords so thunk the prince and think there
shall not duke of true to thee shall be a come
and there there is thee be come thee t