## Recurrent Neural Networks: Shakespeare task

Bronwyn Bowles-King

### Introduction

This project produces a text generator called fake_shake, which attempts to mimic the dramatic writing style of Shakespeare after training on a sample of his original plays. It relies on a Recurrent Neural Network (RNN) built in Python with TensorFlow (Keras).

### 0. Preparation steps

In [2]:
# Import required libraries
import numpy as np
import requests
import random
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.layers import Input

# Import Shakespeare dataset
url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"

response = requests.get(url)
# Stop early with a clear error if the download failed
response.raise_for_status()
text = response.text

print("Length of text: ", len(text))

Length of text:  1115394


### 1. Define a function to generate text from the trained RNN

The most important part of the function defined below is the for loop (for i in range(length)), in which the model predicts the next character and appends it to the generated text. The model needs a prompt to begin with (seed_text), taken from the Shakespeare text used to train it.

The function works in iterations on a current sequence of characters (the 'window'). The window is converted to integers (char_to_int), reshaped, and scaled into the format the model expects. The trained model is a Long Short-Term Memory (LSTM) network that outputs a probability for each possible next character.

The model does not generate whole words at a time, just one character. The line np.argmax(prediction) chooses the character with the highest predicted probability. This predicted character is translated back from its index to its actual character (int_to_char) and the window is updated with this new character. This process repeats until the required number of characters is generated.
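
Greedy selection with np.argmax is deterministic: the same window always produces the same next character. A common alternative, which this notebook does not use, is to sample from the predicted distribution after rescaling it with a 'temperature'. The sketch below is only an illustration under that assumption; sample_with_temperature is a hypothetical helper and the toy probabilities are made up.

```python
import math
import random


def sample_with_temperature(probs, temperature=1.0, rng=random):
    """Draw one index from `probs` after temperature rescaling.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    # Rescale log-probabilities; the small epsilon avoids log(0)
    logits = [math.log(p + 1e-12) / temperature for p in probs]
    # Softmax over the rescaled logits (subtract the max for stability)
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    # random.choices draws indices in proportion to the weights
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


# Made-up distribution over three characters
probs = [0.5, 0.3, 0.2]

# Greedy choice: the index np.argmax would return
greedy_index = max(range(len(probs)), key=probs.__getitem__)

# Sampled choices vary from draw to draw
rng = random.Random(0)
sampled = [sample_with_temperature(probs, 0.8, rng) for _ in range(1000)]
print("greedy:", greedy_index)
print("share of index 0 when sampling:", sampled.count(0) / len(sampled))
```

Temperatures below 1 sharpen the distribution towards the greedy choice, while temperatures above 1 flatten it. In fake_shake, the argmax line could in principle be replaced with a call such as sample_with_temperature(prediction[0], 0.8), although that change has not been tested here.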

What distinguishes RNNs such as this one from feedforward networks is that they carry a hidden state from one time step to the next, so each prediction depends on the characters that came before. This gives the network a kind of 'memory' or context-awareness that can make it seem convincingly intelligent. However, like other AI systems currently in use, it is built entirely from mathematical and computational functions.

In [3]:
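def fake_shake(model, seed_text, char_to_int, int_to_char, length=2000):
    """Generate `length` characters from `model`, starting from `seed_text`.

    seed_text must be as long as the sequences the model was trained on.
    """
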
    # Prepare initial pattern from seed_text
    seq_length = len(seed_text)
    pattern = [char_to_int[char] for char in seed_text]
    generated = []

    for i in range(length):
        # Prepare input of shape (1, seq_length, 1)
        x = np.reshape(pattern, (1, seq_length, 1))
        x = x / float(len(char_to_int))

        # Model to predict the next character
        prediction = model.predict(x, verbose=0)
        index = np.argmax(prediction)
        result = int_to_char[index]

        # Append result and update the pattern
        generated.append(result)
        pattern.append(index)
        # Move to the next window
        pattern = pattern[1:]

    return seed_text + ''.join(generated)

### 2. Pre-process text for encoding

The characters in the Shakespeare sample are mapped to integers with a dictionary, and a second dictionary maps the integers back to characters. The program then creates overlapping sequences of a set length (seq_length) by sliding a window across the encoded text one character at a time. Each input sequence (seq_in) is paired with the character that follows it (seq_out), which the model is trained to predict.
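
To make the sliding window concrete, here is a minimal sketch on a toy string. The text, the window length of 4, and the toy_ names are illustrative assumptions; the notebook itself uses the full Shakespeare sample with seq_length = 100.

```python
# Toy text standing in for the Shakespeare sample (illustrative only)
toy_text = "to be or not"
toy_seq_length = 4

# Same mapping idea as in the notebook: one integer per unique character
toy_chars = sorted(set(toy_text))
toy_char_to_int = {ch: i for i, ch in enumerate(toy_chars)}
toy_encoded = [toy_char_to_int[ch] for ch in toy_text]

# Slide a window one character at a time; the target is the next character
pairs = []
for i in range(len(toy_encoded) - toy_seq_length):
    seq_in = toy_text[i:i + toy_seq_length]
    seq_out = toy_text[i + toy_seq_length]
    pairs.append((seq_in, seq_out))

for seq_in, seq_out in pairs[:3]:
    print(repr(seq_in), "->", repr(seq_out))

# Number of windows = text length minus window length
print(len(pairs), "sequences from", len(toy_text), "characters")
```

The same arithmetic explains the count printed by the notebook: 1115394 characters minus a window of 100 gives 1115294 sequences.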

In [4]:
chars = sorted(list(set(text)))
print(f'Unique characters: {len(chars)}')

# Map characters to unique integers and back again
char_to_int = {ch: i for i, ch in enumerate(chars)}
int_to_char = {i: ch for i, ch in enumerate(chars)}

# Encode text as integers
encoded_text = [char_to_int[ch] for ch in text]

# Length of each input sequence (characters per training window)
seq_length = 100

X = []
y = []

# Iterate through encoded text to create sequences and target data
for i in range(0, len(encoded_text) - seq_length):
    seq_in = encoded_text[i:i + seq_length]
    seq_out = encoded_text[i + seq_length]
    X.append(seq_in)
    y.append(seq_out)

print(f'Number of sequences: {len(X)}')

Unique characters: 65
Number of sequences: 1115294


Next, the input data (X) is reshaped and normalised to fit the model's requirements. The data needs to be a 3D tensor with the dimensions [samples, time steps, features]. The samples dimension is the number of training sequences (1 115 294 here), each of which is one window of 100 characters. The time steps dimension is the length of each sequence, which is 100 because every sequence holds 100 characters. The features dimension is the number of values at each time step, which is 1 here because each character is represented by a single scaled integer. The integers are divided by the number of unique characters (65) so that every input lies between 0 and 1. One-hot encoding is applied to the target variable (y).
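
The shapes are easier to see on a tiny example. The sketch below uses made-up values (3 sequences of 4 encoded characters and a vocabulary of 5) and builds the one-hot targets with NumPy's np.eye rather than to_categorical. The layout should match, although the dtype may differ.

```python
import numpy as np

# Illustrative values only: 3 sequences, 4 time steps, vocabulary of 5
toy_X = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 0]]
toy_y = [4, 0, 1]
vocab_size = 5

# [samples, time steps, features] = (3, 4, 1)
toy_X_reshaped = np.reshape(toy_X, (len(toy_X), 4, 1))

# Scale the integer codes into the range [0, 1)
toy_X_normalized = toy_X_reshaped / float(vocab_size)

# One row per target, with a single 1 in the column of the target character
toy_y_onehot = np.eye(vocab_size)[toy_y]

print("X shape:", toy_X_normalized.shape)
print("y shape:", toy_y_onehot.shape)
print("first target:", toy_y_onehot[0])
```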

In [5]:
# Reshape X to 3D tensor [samples, time steps, features]
X_reshaped = np.reshape(X, (len(X), seq_length, 1))

# Normalise X data
X_normalized = X_reshaped / float(len(chars))

# One-hot encode y (output) variable
y_categorical = to_categorical(y, num_classes=len(chars))

print("X_reshaped shape:", X_reshaped.shape)
print("y_categorical shape:", y_categorical.shape)

X_reshaped shape: (1115294, 100, 1)
y_categorical shape: (1115294, 65)


### 3. Define and compile the model

The code below defines a simple RNN model for character-level text generation using an LSTM layer with 256 hidden units or 'neurons'. LSTMs are well suited to sequential data such as text because they process the input one time step at a time and can retain information across steps.

The Input layer tells the model the expected shape of each input sequence (100 time steps with 1 feature). A fully connected (Dense) layer is then added with one output unit for each of the 65 unique characters in the text dataset from Shakespeare's works, including upper and lowercase characters, numbers, punctuation, etc.

The softmax activation function turns the output of the Dense layer into a probability distribution over all 65 possible characters. The model thus assigns a probability to each character, and the one with the highest probability is treated as the most likely next character.
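
As a small illustration of what softmax does, the sketch below applies it to three made-up scores rather than the 65 outputs of the real model. The softmax function here is written out by hand for clarity; Keras applies its own implementation inside the Dense layer.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores for three candidate characters
toy_logits = [2.0, 1.0, 0.1]
toy_probs = softmax(toy_logits)

# The largest score receives the largest probability
predicted = toy_probs.index(max(toy_probs))
print([round(p, 3) for p in toy_probs], "-> index", predicted)
```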

The last line of code below prepares the model for training by specifying how it should learn (Adam optimiser), how mistakes are quantified (categorical cross‑entropy loss), and how to monitor progress (accuracy score).

The loss function is categorical cross-entropy, which suits multi-class classification problems such as predicting the next character out of 65 possibilities. The model uses this function to measure the difference between its predicted probabilities and the actual next character in each training example. Training aims to minimise this difference.
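
A short worked example may help with reading the loss values in the training log. The sketch below computes categorical cross-entropy by hand for a one-hot target over 65 classes. The target index and the two predicted distributions are made up for illustration.

```python
import math


def categorical_cross_entropy(one_hot_target, predicted_probs):
    """Negative log of the probability assigned to the true class."""
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_target, predicted_probs)
        if t > 0
    )


num_classes = 65
target = [0.0] * num_classes
target[10] = 1.0  # arbitrary 'true' character for the example

# A model that guesses uniformly scores ln(65), roughly 4.17
uniform = [1 / num_classes] * num_classes
baseline_loss = categorical_cross_entropy(target, uniform)

# A model that puts 80% on the true character scores much lower
confident = [0.2 / (num_classes - 1)] * num_classes
confident[10] = 0.8
confident_loss = categorical_cross_entropy(target, confident)

print(round(baseline_loss, 3), round(confident_loss, 3))
```

Seen this way, the first-epoch loss of about 2.95 in section 4 is already better than uniform guessing, and each later epoch moves a little further from that baseline.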

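Adaptive Moment Estimation (Adam) is the optimisation algorithm applied here. It updates the model's weights step by step in order to minimise the loss function. Rather than using one fixed step size, it keeps running averages of each weight's gradient and squared gradient and uses them to adapt the effective step size for that weight, which often helps the model improve more quickly.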
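
The update rule can be sketched for a single weight. The code below minimises the toy function f(x) = x^2 with a hand-written Adam step. The hyperparameter values are assumed to match commonly documented defaults, and adam_step is a hypothetical helper, not the Keras implementation.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter (illustrative)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimise f(x) = x^2, whose gradient is 2x, starting from x = 1
x, m, v = 1.0, 0.0, 0.0
history_x = [x]
for t in range(1, 201):
    grad = 2 * x
    x, m, v = adam_step(x, grad, m, v, t)
    history_x.append(x)

print("after 1 step:", history_x[1])
print("after 200 steps:", history_x[-1])
```

On the first step the bias-corrected ratio is close to 1, so the weight moves by almost exactly the learning rate regardless of how large the gradient is. This is one sense in which Adam adapts the step size.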

In [8]:
model = Sequential()
model.add(Input(shape=(X_normalized.shape[1], X_normalized.shape[2])))
model.add(LSTM(256))
model.add(Dense(len(chars), activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

### 4. Training routine

The model is now trained for 10 rounds (epochs), as per the instructions for this task, although this is too few to train the model properly. Switching to the GPU runtime type in Google Colab sped up the process considerably. On the CPU, training had not completed after more than 10 hours, whereas on the GPU each epoch took only a few minutes.
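
The step count reported in the training log for this cell (8714 per epoch) follows directly from the dataset size and the batch size. The short calculation below reproduces it, using the sequence count printed in section 2 and the batch_size passed to model.fit. The 14 ms per step figure is read off the log and is only approximate.

```python
import math

num_sequences = 1115294  # printed by the pre-processing cell
batch_size = 128         # value passed to model.fit

# Every batch holds 128 sequences except possibly the last one
steps_per_epoch = math.ceil(num_sequences / batch_size)
last_batch_size = num_sequences - (steps_per_epoch - 1) * batch_size

# Rough epoch duration at about 14 ms per step
approx_seconds = steps_per_epoch * 0.014

print(steps_per_epoch, "steps per epoch")
print(last_batch_size, "sequences in the final batch")
print(round(approx_seconds), "seconds per epoch, approximately")
```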

In [9]:
history = model.fit(X_normalized, y_categorical, epochs=10, batch_size=128)

Epoch 1/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 127s 14ms/step - accuracy: 0.1973 - loss: 2.9543
Epoch 2/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 144s 15ms/step - accuracy: 0.2756 - loss: 2.5696
Epoch 3/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 143s 15ms/step - accuracy: 0.3061 - loss: 2.4324
Epoch 4/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 140s 14ms/step - accuracy: 0.3232 - loss: 2.3504
Epoch 5/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 124s 14ms/step - accuracy: 0.3378 - loss: 2.2902
Epoch 6/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 123s 14ms/step - accuracy: 0.3518 - loss: 2.2391
Epoch 7/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 143s 14ms/step - accuracy: 0.3664 - loss: 2.1914
Epoch 8/10
8714/8714 ━━━━━━━━━━━━━━━━━━━━ 124s 14ms/step - accuracy: 0.3772 - loss: 2.1548


### 5. Generate synthetic Shakespearean dialogue

Using the fake_shake function defined in section 1, we can now ask the trained RNN model to generate text. The model needs seed text as a starting point or prompt, and it then generates one character at a time until the target character count is reached.

In this case, I requested 1 000 characters to ensure the output reached the minimum word count of 100. The model does not perform very well, and this was expected. We can see why in the output from the previous cell, as the model only achieved an accuracy of 39% over 10 epochs. The generated text also settles into a repeating loop, which is likely made worse by fake_shake always choosing the single most probable character.

In [10]:
# Select a random window of seq_length characters as the seed_text
start = random.randint(0, len(encoded_text) - seq_length - 1)
seed_text = ''.join([int_to_char[i] for i in encoded_text[start:start+seq_length]])

# Run the function
monologue = fake_shake(model, seed_text, char_to_int, int_to_char, length=1000)
print(monologue)

'aged custom,
But by your voices, will not so permit me;
Your voices therefore.' When we granted thae,
Io the pooe seat thet whth the paaee of thee,
And then the world the world the whrl the world,
And then the world the world shel then the world,
And then the world the world shell toeak the soateh
That thet wh lave to the mor the poieters shat
That the whsl sooe thet whth the paaee of thee,
And then the world the world shel then the world,
And then the world the world shel then the world,
And then the world the world shell toeak the soateh
That thet wh lave to the mor the poieters shat
That the whsl sooe thet whth the paaee of thee,
And then the world the world shel then the world,
And then the world the world shel then the world,
And then the world the world shell toeak the soateh
That thet wh lave to the mor the poieters shat
That the whsl sooe thet whth the paaee of thee,
And then the world the world shel then the world,
And then the world the world shel then the world,
And then th

### 6. Retrain the model for 30 epochs and generate text again

The model is now trained for a further 30 epochs to see whether it will perform better. Because model.fit is called again on the same model object, training continues from the weights learned in section 4 rather than starting afresh. The model has improved, as there is less repetition, but it can be improved further (Trekhleb, 2020). The model is not complex enough to create more convincing text in the style of Shakespeare. However, this exercise has demonstrated what is possible with an RNN.

In [11]:
history = model.fit(X_normalized, y_categorical, epochs=30, batch_size=128)

Epoch 1/30
[1m8714/8714[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m124s[0m 14ms/step - accuracy: 0.4006 - loss: 2.0668
Epoch 2/30
[1m8714/8714[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m123s[0m 14ms/step - accuracy: 0.4063 - loss: 2.0468
Epoch 3/30
[1m8714/8714[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m142s[0m 14ms/step - accuracy: 0.4133 - loss: 2.0232
Epoch 4/30
[1m8714/8714[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m123s[0m 14ms/step - accuracy: 0.3947 - loss: 2.1053
Epoch 5/30
[1m8714/8714[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m142s[0m 14ms/step - accuracy: 0.4195 - loss: 1.9962
Epoch 6/30
[1m8714/8714[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m123s[0m 14ms/step - accuracy: 0.4253 - loss: 1.9769
Epoch 7/30
[1m8714/8714[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m123s[0m 14ms/step - accuracy: 0.4313 - loss: 1.9586
Epoch 8/30
[1m8714/8714[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m123s[0m 14ms/step - accuracy: 0.4319 - loss: 1.9498


**References**

GeeksforGeeks. (2025). ML | ADAM (Adaptive Moment Estimation) Optimization. https://www.geeksforgeeks.org/machine-learning/adam-adaptive-moment-estimation-optimization-ml

HyperionDev. (2025). Build a Neural Network. Course materials. Private repository, GitHub.

HyperionDev. (2025). Neural Networks. Course materials. Private repository, GitHub.

HyperionDev. (2025). Recurrent Neural Networks. Course materials. Private repository, GitHub.

Karpathy, A. (2016). Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy. GitHub. https://gist.github.com/karpathy/d4dee566867f8291f086

Kithmanthie, R. (2025). Predicting the Next Character with RNN: A Simple Introduction Using Shakespeare's Text. Medium. https://medium.com/@ritharaedirisinghe/predicting-the-next-character-with-rnn-a-simple-introduction-using-shakespeares-text-88e62550ac17

PyTorch. (2024). Running Tutorials in Google Colab. https://docs.pytorch.org/tutorials/beginner/colab.html

TensorFlow. (n.d.). Text generation with an RNN. https://www.tensorflow.org/text/tutorials/text_generation

TensorFlow. (2023). Keras: The high-level API for TensorFlow. https://www.tensorflow.org/guide/keras

Trekhleb, O. (2020). Shakespeare Text Generation (using RNN LSTM). Google Colab. https://colab.research.google.com/github/trekhleb/machine-learning-experiments/blob/master/experiments/text_generation_shakespeare_rnn/text_generation_shakespeare_rnn.ipynb