# RNN Text Generator

> An experimental neural text generator that uses Recurrent Neural Networks (RNNs) to craft original poems and play-style scripts. Trained on existing literary works, the model learns rhythmic and structural patterns of language, enabling it to generate creative, coherent text one character at a time. It explores how simple sequence models can emulate the cadence of human writing — blending art and machine learning in a single experiment.

This is based on: [TensorFlow Text Generation](https://www.tensorflow.org/text/tutorials/text_generation)

## Import

In [1]:
from keras.preprocessing import sequence
import keras
import tensorflow as tf
import os
import numpy as np
import glob

## Dataset

> This project uses the **Shakespeare dataset**, a publicly available corpus of roughly 1.1 million characters drawn from William Shakespeare's plays. It is commonly used for character-level text generation because it mixes varied sentence structures, emotions, and dialogue formats, which makes it a good test of whether an RNN can learn long-range dependencies in language.

In [2]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt
1115394/1115394 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


### Loading Your Own Data

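> To train on your own text instead, run the cell below and upload a plain-text (UTF-8) file through the dialog. The cell points `path_to_file` at your file, so the remaining steps run unchanged; if nothing is uploaded, it falls back to the Shakespeare dataset. The upload dialog uses `google.colab.files`, so it only works in Google Colab.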

In [3]:
from google.colab import files

uploaded = files.upload()
if uploaded:
    path_to_file = list(uploaded.keys())[0]
    print(f"✅ Uploaded: {path_to_file}")
else:
    print("⚠️ No file uploaded — using default dataset instead.")
    path_to_file = tf.keras.utils.get_file(
        'shakespeare.txt',
        'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt'
    )

⚠️ No file uploaded — using default dataset instead.


### Read Content of Data

> Let's look at the contents of the file.

In [4]:
# Read, then decode for py2 compat.
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# length of text is the number of characters in it
print(f'Length of text: {len(text)} characters')

Length of text: 1115394 characters


> Take a look at the first 250 characters in the text.

In [5]:
print(text[:250])

First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you know Caius Marcius is chief enemy to the people.



> The unique characters in the file

In [6]:
vocab = sorted(set(text))
print(f'{len(vocab)} unique characters')

65 unique characters


### Processing The Data

> Before training the RNN model, we need to process the raw text data into a numerical form that the model can understand. This step involves **encoding** the text into numbers and later **decoding** the model’s predictions back into text.

#### Encoding

> Encoding converts text (characters or words) into numerical representations. Each unique token is mapped to an integer index, forming sequences that can be fed into the neural network.

In [7]:
vocab = sorted(set(text))
# Creating a mapping from unique characters to indices
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

def text_to_int(text):
  return np.array([char2idx[c] for c in text])

text_as_int = text_to_int(text)

> Let's look at how part of our text is encoded

In [8]:
print("Text:", text[:13])
print("Encoded:", text_to_int(text[:13]))

Text: First Citizen
Encoded: [18 47 56 57 58  1 15 47 58 47 64 43 52]


#### Decoding

> Decoding reverses the process — it converts numeric predictions from the model back into human-readable text.

In [9]:
def int_to_text(ints):
  # Accept either a NumPy array or an eager tf.Tensor (which has .numpy())
  if hasattr(ints, "numpy"):
    ints = ints.numpy()
  return ''.join(idx2char[ints])

print(int_to_text(text_as_int[:13]))

First Citizen
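
For reference, here is a dependency-free sketch of the same encode/decode round trip using plain Python lists instead of NumPy. The helper names `build_vocab`, `encode`, and `decode` are hypothetical and not part of the notebook; because the vocabulary here is built from the sample string alone, the IDs differ from the ones printed above.

```python
def build_vocab(text):
    """Return the sorted unique characters plus both lookup tables."""
    vocab = sorted(set(text))
    char2idx = {ch: i for i, ch in enumerate(vocab)}
    idx2char = list(vocab)  # position i holds the character with ID i
    return vocab, char2idx, idx2char


def encode(text, char2idx):
    # Map every character to its integer ID
    return [char2idx[ch] for ch in text]


def decode(ids, idx2char):
    # Look up each ID and join the characters back into a string
    return "".join(idx2char[i] for i in ids)


sample = "First Citizen"
vocab, char2idx, idx2char = build_vocab(sample)
ids = encode(sample, char2idx)
print(ids)
print(decode(ids, idx2char))  # First Citizen
```

Decoding is an exact inverse of encoding as long as every character was seen when the vocabulary was built; an unseen character raises a `KeyError` in `encode`, just as `char2idx` does in the notebook.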


### Creating Training Examples

> To train the model effectively, the text data must be divided into shorter sequences that serve as individual training samples.
> Each training example consists of an input sequence and a target sequence of the same length (*seq_length*). The target is the input shifted forward by one character, so at every position the target holds the character that comes next. For example:
> `input: Hell | target: ello`

The process begins by generating a continuous stream of characters from the text data, which can then be segmented into these input-target pairs.
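
A minimal pure-Python sketch of the same chunk-then-split idea, assuming a tiny `seq_length` of 4 so the result is easy to read. `make_examples` is a hypothetical helper that mimics `batch(seq_length + 1, drop_remainder=True)` followed by the input/target split; it is not the TensorFlow pipeline itself.

```python
def make_examples(text, seq_length):
    """Split text into (input, target) pairs.

    Chunks of seq_length + 1 characters are taken back to back (any leftover
    tail is dropped, like batch(..., drop_remainder=True)); each chunk's input
    is every character but the last, and its target is every character but the first.
    """
    chunk_size = seq_length + 1
    n_chunks = len(text) // chunk_size
    pairs = []
    for k in range(n_chunks):
        chunk = text[k * chunk_size:(k + 1) * chunk_size]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


print(make_examples("Hello world!", 4))  # [('Hell', 'ello'), (' wor', 'worl')]
```

Each chunk needs `seq_length + 1` characters because the input and target overlap in all but one position; here the trailing `"d!"` is too short to form a chunk and is discarded.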

In [10]:
seq_length = 100  # length of sequence for a training example
examples_per_epoch = len(text) // (seq_length + 1)  # each example consumes seq_length + 1 characters

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

> Use the `batch` method to group the stream of characters into chunks of `seq_length + 1` (101) characters; `drop_remainder=True` discards the incomplete final chunk.

In [11]:
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

> Split each 101-character chunk into an input (the first 100 characters) and a target (the last 100 characters).

In [12]:
def split_input_target(chunk): # for the example hello
  input_text = chunk[:-1] # hell
  target_text = chunk[1:] # ello
  return input_text, target_text # hell, ello

dataset = sequences.map(split_input_target) # we use map to apply the above function to every entry

> Example

In [13]:
for x, y in dataset.take(2):
  print("\n\nEXAMPLE\n")
  print("INPUT")
  print(int_to_text(x))
  print("\nOUTPUT")
  print(int_to_text(y))



EXAMPLE

INPUT
First Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You

OUTPUT
irst Citizen:
Before we proceed any further, hear me speak.

All:
Speak, speak.

First Citizen:
You 


EXAMPLE

INPUT
are all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you 

OUTPUT
re all resolved rather to die than to famish?

All:
Resolved. resolved.

First Citizen:
First, you k


> Shuffle the examples and group them into training batches.

In [14]:
BATCH_SIZE = 64
VOCAB_SIZE = len(vocab) # vocab is number of unique characters
EMBEDDING_DIM = 256
RNN_UNITS = 1024

# Buffer size to shuffle the dataset
#  (TF data is designed to work with possibly infinite sequences,
#   so it doesn't attempt to shuffle the entire sequence in memory.
#   Instead, it maintains a buffer in which it shuffles elements.)
BUFFER_SIZE = 10000

# data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True).prefetch(tf.data.experimental.AUTOTUNE)
data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
dataset

<_MapDataset element_spec=(TensorSpec(shape=(100,), dtype=tf.int64, name=None), TensorSpec(shape=(100,), dtype=tf.int64, name=None))>
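
As the comment in the cell notes, `tf.data` shuffles with a fixed-size buffer instead of loading the whole dataset into memory. The following is a simplified pure-Python sketch of that idea to show why `BUFFER_SIZE` matters; `buffered_shuffle` is a hypothetical helper and is not TensorFlow's actual implementation.

```python
import random


def buffered_shuffle(iterable, buffer_size, seed=None):
    """Yield items in a partially shuffled order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in iterable:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet
            buffer.append(item)
            continue
        # Buffer is full: emit a random buffered element, put the new item in its slot
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain whatever is left in random order
    rng.shuffle(buffer)
    yield from buffer


print(list(buffered_shuffle(range(20), buffer_size=5, seed=0)))
```

An element can only be emitted after it has entered the buffer, so a small buffer only mixes nearby examples, and a buffer of 1 does not shuffle at all. Here the dataset holds 1,115,394 // 101 = 11,043 sequences, so a `BUFFER_SIZE` of 10,000 covers most of it and the resulting order should be close to a full shuffle.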

## Building Model

> This section defines the model with the Keras functional API (two `tf.keras.Sequential` versions are kept below, commented out, for reference).
>
> The model has three layers:
>
> - `tf.keras.layers.Embedding`: The input layer. A trainable lookup table that maps each character ID to a vector with `embedding_dim` dimensions.
>
> - `tf.keras.layers.LSTM` (or `GRU`): Processes the sequence of embeddings; `rnn_units` sets the size of the hidden state. `return_sequences=True` makes the layer output a prediction at every timestep rather than only the last one, and `stateful=True` carries the hidden state from one batch to the next instead of resetting it. Because the layer is stateful, the batch size must be fixed when the model is built.
>
> - `tf.keras.layers.Dense`: The output layer, with `vocab_size` outputs: one logit per character in the vocabulary. Logits are unnormalized log-probabilities; a softmax turns them into a probability distribution over the next character.

In [15]:
# def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
#   model = tf.keras.Sequential([
#     tf.keras.layers.Embedding(vocab_size, embedding_dim,
#                               batch_input_shape=[batch_size, None]),
#     tf.keras.layers.LSTM(rnn_units,
#                          return_sequences=True,
#                          stateful=True,
#                          recurrent_initializer='glorot_uniform'),
#     tf.keras.layers.Dense(vocab_size)
#   ])
#   return model

# model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
# model.summary()

> Stateless Model

In [16]:
# def build_model_stateless(vocab_size, embedding_dim, rnn_units):
#     return tf.keras.Sequential([
#         # Let the model infer batch size; variable-length sequences allowed via (None,)
#         tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_shape=(None,)),
#         tf.keras.layers.LSTM(units=rnn_units, return_sequences=True, stateful=False,
#                              recurrent_initializer='glorot_uniform'),
#         tf.keras.layers.Dense(vocab_size)
#     ])

# model = build_model_stateless(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS)
# model.summary()

> Stateful model with a fixed batch size of 64 (`BATCH_SIZE`)

In [17]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    # Input layer with fixed batch size
    inputs = tf.keras.Input(batch_shape=(batch_size, None), dtype=tf.int32)

    # Embedding layer (does NOT fix batch size; it inherits from Input)
    x = tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim)(inputs)

    # Stateful LSTM layer
    x = tf.keras.layers.LSTM(
        units=rnn_units,
        return_sequences=True,
        stateful=True,
        recurrent_initializer='glorot_uniform'
    )(x)

    # Dense output layer
    outputs = tf.keras.layers.Dense(vocab_size)(x)

    return tf.keras.Model(inputs, outputs)

# Build the model
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, BATCH_SIZE)
model.summary()
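
The `model.summary()` table is not reproduced here, but its parameter counts can be worked out by hand. The sketch below assumes Keras' standard layer formulas (an LSTM with four gates and bias vectors, a Dense layer with biases) and the hyperparameters from the cell above; the helper names are hypothetical.

```python
def embedding_params(vocab_size, embedding_dim):
    # One trainable vector per character ID
    return vocab_size * embedding_dim


def lstm_params(input_dim, units):
    # Four gates (input, forget, cell candidate, output), each with an input
    # kernel, a recurrent kernel, and a bias vector
    return 4 * (input_dim * units + units * units + units)


def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output
    return input_dim * output_dim + output_dim


emb = embedding_params(65, 256)     # VOCAB_SIZE, EMBEDDING_DIM
lstm = lstm_params(256, 1024)       # EMBEDDING_DIM, RNN_UNITS
dense = dense_params(1024, 65)      # RNN_UNITS, VOCAB_SIZE
print(emb, lstm, dense, emb + lstm + dense)  # 16640 5246976 66625 5330241
```

Nearly all of the roughly 5.3 million parameters live in the LSTM, and its count grows with the square of `RNN_UNITS`, so that hyperparameter dominates both model size and training time.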

### Creating a Loss Function
> The loss function measures how far the model's predictions are from the targets. At each timestep the model predicts the ID of the next character, so `sparse_categorical_crossentropy` is the usual choice: it takes the integer target IDs directly and returns the negative log-probability the model assigned to the correct character. The lower the loss, the more probability the model puts on the true next character.

In [18]:
data = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

for input_example_batch, target_example_batch in data.take(1):
    # shape: (BATCH_SIZE, sequence_length)
    example_batch_predictions = model(input_example_batch)
    print(example_batch_predictions.shape, "# (batch_size, sequence_length, vocab_size)")

(64, 100, 65) # (batch_size, sequence_length, vocab_size)


In [19]:
# we can see that the prediction is an array of 64 arrays, one for each entry in the batch
print(len(example_batch_predictions))
print(example_batch_predictions)

64
tf.Tensor(
[[[-1.8967954e-04 -4.8379302e-03  4.4216500e-05 ... -2.6844705e-03
   -3.8732965e-03 -1.5947804e-03]
  [ 3.5951680e-03 -4.5101284e-03 -1.0560903e-03 ... -2.5638705e-03
   -6.3740410e-04 -2.1776652e-03]
  [ 6.3097887e-03 -3.9878706e-03 -6.6788640e-04 ... -8.4625802e-04
    1.4776374e-03  5.7361205e-03]
  ...
  [-1.0655216e-03 -1.1174589e-02  1.1930048e-02 ...  7.6719783e-03
   -1.3978775e-02  3.3099328e-03]
  [ 2.5860486e-03 -7.2288392e-03  8.2859602e-03 ...  8.6672632e-03
   -1.0147412e-02  7.6457328e-04]
  [ 2.0457371e-03 -8.6569868e-04  5.1542907e-03 ...  4.9958681e-03
   -1.7121035e-03  7.8855893e-03]]

 [[-5.0309522e-04  4.4534202e-03 -1.9659549e-03 ... -2.6231743e-03
    5.7358244e-03  7.7906107e-03]
  [ 7.3923417e-03  2.4798014e-03  8.5042202e-04 ... -1.1502475e-02
    1.3583360e-03 -1.6984475e-03]
  [ 1.0056873e-02  1.3161669e-03  9.1775222e-04 ... -9.2546679e-03
    2.0896192e-03 -2.1223186e-03]
  ...
  [ 2.2501808e-04 -4.5149308e-03  3.1642434e-03 ...  7.2238212e

In [20]:
# lets examine one prediction
pred = example_batch_predictions[0]
print(len(pred))
print(pred)
# this is a 2d array of length 100, where each interior array is the prediction for the next character at each time step

100
tf.Tensor(
[[-1.8967954e-04 -4.8379302e-03  4.4216500e-05 ... -2.6844705e-03
  -3.8732965e-03 -1.5947804e-03]
 [ 3.5951680e-03 -4.5101284e-03 -1.0560903e-03 ... -2.5638705e-03
  -6.3740410e-04 -2.1776652e-03]
 [ 6.3097887e-03 -3.9878706e-03 -6.6788640e-04 ... -8.4625802e-04
   1.4776374e-03  5.7361205e-03]
 ...
 [-1.0655216e-03 -1.1174589e-02  1.1930048e-02 ...  7.6719783e-03
  -1.3978775e-02  3.3099328e-03]
 [ 2.5860486e-03 -7.2288392e-03  8.2859602e-03 ...  8.6672632e-03
  -1.0147412e-02  7.6457328e-04]
 [ 2.0457371e-03 -8.6569868e-04  5.1542907e-03 ...  4.9958681e-03
  -1.7121035e-03  7.8855893e-03]], shape=(100, 65), dtype=float32)


In [21]:
# and finally we'll look at a prediction at the first timestep
time_pred = pred[0]
print(len(time_pred))
print(time_pred)
# 65 logits (unnormalized log-probabilities), one per character in the vocabulary; they only become probabilities after a softmax

65
tf.Tensor(
[-1.8967954e-04 -4.8379302e-03  4.4216500e-05 -1.7776679e-03
  4.8225066e-03 -2.3276508e-03 -5.0586776e-04  6.5414347e-03
 -6.1750491e-03  4.0324097e-03 -7.0799775e-03  8.9374674e-04
 -1.5559488e-03 -4.5747254e-03 -4.5672851e-03  4.5846548e-04
 -6.9041173e-03  9.2828792e-04  9.6362166e-04  4.4138907e-03
 -3.6721840e-03  2.6349446e-03 -3.5190273e-03 -5.1060515e-03
  6.6353788e-04  7.2255940e-04 -1.6945989e-03 -6.6888463e-03
 -8.0346977e-03  2.7390618e-03 -1.2528415e-04 -8.7365072e-04
  9.0455054e-04  2.0543049e-04  2.6783173e-03  6.9034990e-04
 -4.1095726e-03  1.8450888e-03  2.5799291e-03 -9.4007549e-04
  3.3457570e-03 -6.8186629e-03 -5.3894514e-04 -2.5526199e-03
 -2.9710587e-03 -6.2200014e-04  7.3780078e-03  2.7948746e-04
 -3.0212307e-03  3.7137431e-04  2.3847180e-03  8.4244367e-04
 -1.3813739e-04  6.7474874e-04  5.3491215e-03  2.5765444e-03
  3.0667270e-03 -4.2434805e-03 -3.2067641e-03 -3.2252558e-03
 -1.9770078e-03 -1.0076161e-03 -2.6844705e-03 -3.8732965e-03
 -1.594780

In [22]:
# To get a predicted character we sample from the output distribution (pick an index according to its probability)
sampled_indices = tf.random.categorical(pred, num_samples=1)

# flatten the (100, 1) result into a 1-D array and convert the integer IDs back to characters
sampled_indices = np.reshape(sampled_indices, (1, -1))[0]
predicted_chars = int_to_text(sampled_indices)

predicted_chars  # and this is what the model predicted for training sequence 1

'TsvBHf juRF!nERTGns VZCAwD-fYzZ-$fp-L-pbg$nCj&jnIewgJ!s?-Y sWFtwC&ICIwN,erMa&;mNHcgkeLkCwrVumjLNIb!a'
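
`tf.random.categorical` takes unnormalized logits and draws one index per row. A pure-Python sketch of the same idea for a single row, assuming inverse-CDF sampling (this illustrates the math, not TensorFlow's internal implementation); `softmax` and `sample_from_logits` are hypothetical helpers.

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_from_logits(logits, rng):
    """Draw one index with probability softmax(logits)[i] (inverse-CDF sampling)."""
    probs = softmax(logits)
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


rng = random.Random(0)
logits = [2.0, 1.0, 0.1]
draws = [sample_from_logits(logits, rng) for _ in range(10_000)]
print([draws.count(i) / len(draws) for i in range(len(logits))])
print([round(p, 3) for p in softmax(logits)])
```

Over many draws the observed frequencies approach the softmax probabilities. This is why the untrained model's sample above looks like random characters: its logits are all close to zero, so every character is nearly equally likely.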

> Next we need a loss function that compares this output with the expected targets and returns a single number measuring how far apart they are.

In [23]:
def loss(labels, logits):
  # from_logits=True because the Dense output layer applies no softmax
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
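
For a single timestep, sparse categorical cross-entropy with `from_logits=True` is the negative log of the softmax probability assigned to the true character. A pure-Python sketch of that computation (the helper names are hypothetical, and this is not the Keras source):

```python
import math


def sparse_ce_from_logits(logits, label):
    """Cross-entropy for one timestep: -log softmax(logits)[label]."""
    # log-sum-exp with the max subtracted, for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def mean_sequence_loss(logit_rows, labels):
    # Average the per-character loss over every timestep in the sequence
    losses = [sparse_ce_from_logits(row, y) for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)


# All-zero logits mean a uniform guess over 65 characters
print(sparse_ce_from_logits([0.0] * 65, label=18))  # equals ln(65)
print(math.log(65))
```

With all-zero logits every character gets probability 1/65, so the loss is ln(65) ≈ 4.17. The untrained model's logits shown above are all close to zero, so its starting loss should be near that value; Keras then averages these per-character losses over the batch.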

### Compiling the Model
> Before training, the model must be compiled with an optimizer, loss function, and optional metrics. This configures how the model updates weights and evaluates performance during training.

In [24]:
model.compile(optimizer='adam', loss=loss)

### Creating Checkpoints
> To avoid losing training progress and to allow training to resume later, we save checkpoints with `ModelCheckpoint`.
> Because `save_weights_only=True`, each checkpoint stores only the model's weights (not the optimizer state), and a new file is written at the end of every epoch.

In [25]:
# Directory where checkpoints will be saved
checkpoint_dir = './training_checkpoints'
# Name of the checkpoint files
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.weights.h5")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)

## Training
> With the model compiled and checkpoints set up, we can begin training.
> During each epoch, the model learns to predict the next character from the ones before it, updating its weights to minimize the loss.
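
Two quantities help read the training log that follows. In this sketch (helper names are hypothetical), the steps per epoch follow from the dataset arithmetic used earlier, and perplexity, the exponential of the mean cross-entropy in nats, restates the loss as an effective number of equally likely choices per character.

```python
import math


def steps_per_epoch(num_chars, seq_length, batch_size):
    # Each example consumes seq_length + 1 characters; leftovers are dropped
    num_sequences = num_chars // (seq_length + 1)
    # The partial final batch is dropped too (drop_remainder=True)
    return num_sequences // batch_size


def perplexity(mean_loss):
    # Perplexity = exp(mean cross-entropy in nats)
    return math.exp(mean_loss)


print(steps_per_epoch(1115394, 100, 64))        # 172, matching "172/172" in the log
print(perplexity(2.8950), perplexity(1.2564))   # epoch 1 and epoch 9 losses
```

For the logged losses this gives a perplexity of about 18 after epoch 1 and about 3.5 after epoch 9, compared with 65 for a uniform guess over the vocabulary.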

In [26]:
history = model.fit(data, epochs=50, callbacks=[checkpoint_callback])

Epoch 1/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15s[0m 66ms/step - loss: 2.8950
Epoch 2/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 67ms/step - loss: 1.8906
Epoch 3/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m13s[0m 68ms/step - loss: 1.6236
Epoch 4/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 69ms/step - loss: 1.4909
Epoch 5/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 70ms/step - loss: 1.4167
Epoch 6/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 71ms/step - loss: 1.3657
Epoch 7/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 72ms/step - loss: 1.3274
Epoch 8/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 73ms/step - loss: 1.2911
Epoch 9/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m14s[0m 72ms/step - loss: 1.2564
Epoch 10/50
[1m172/172[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m15

## Loading the Model

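> The stateful training model was built with a fixed batch size of 64, so it expects batches of exactly 64 sequences. For text generation we rebuild the same architecture with `batch_size=1` and then load the trained weights into it. The vocabulary size, embedding dimension, and number of RNN units must match the values used in training, or the saved weights will not load.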

In [27]:
model = build_model(VOCAB_SIZE, EMBEDDING_DIM, RNN_UNITS, batch_size=1)

> Load the latest trained weights from checkpoints and build the model with a flexible input shape for inference or further training.

In [28]:
# Find the latest .weights.h5 file in the checkpoint directory
weight_files = glob.glob(os.path.join(checkpoint_dir, "*.weights.h5"))
if weight_files:
    latest_weights = max(weight_files, key=os.path.getmtime)
    model.load_weights(latest_weights)
    print(f"Loaded weights from {latest_weights}")
else:
    print("No weight files found, starting from scratch.")

# Build the model with a flexible input shape
model.build(tf.TensorShape([1, None]))

Loaded weights from ./training_checkpoints/ckpt_50.weights.h5
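
The cell above picks the most recently modified file, which can go wrong if checkpoints are copied or touched later. A hypothetical alternative, `latest_checkpoint_by_epoch`, parses the epoch number from the `ckpt_{epoch}.weights.h5` pattern used by the checkpoint callback:

```python
import os
import re


def latest_checkpoint_by_epoch(paths):
    """Return the path with the highest epoch number in 'ckpt_{epoch}.weights.h5', or None."""
    pattern = re.compile(r"ckpt_(\d+)\.weights\.h5$")
    numbered = []
    for p in paths:
        match = pattern.search(os.path.basename(p))
        if match:
            # Compare epochs as integers, not strings
            numbered.append((int(match.group(1)), p))
    if not numbered:
        return None
    return max(numbered)[1]


files = ["./training_checkpoints/ckpt_9.weights.h5",
         "./training_checkpoints/ckpt_10.weights.h5",
         "./training_checkpoints/ckpt_2.weights.h5"]
print(latest_checkpoint_by_epoch(files))  # ./training_checkpoints/ckpt_10.weights.h5
print(max(files))                         # ./training_checkpoints/ckpt_9.weights.h5 (string order)
```

Parsing the number matters because plain string sorting puts `ckpt_9` after `ckpt_10`. In the notebook it could be used as `model.load_weights(latest_checkpoint_by_epoch(glob.glob(os.path.join(checkpoint_dir, "*.weights.h5"))))`.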


> We can load **any checkpoint** we want by specifying the exact file to load.

In [29]:
# checkpoint_num = 10
# checkpoint_path = f"./training_checkpoints/ckpt_{checkpoint_num}.weights.h5"

# # Load weights directly (no tf.train.load_checkpoint)
# model.load_weights(checkpoint_path)
# print(f"Loaded weights from {checkpoint_path}")

# # Build model with flexible input shape
# model.build(tf.TensorShape([1, None]))

In [30]:
# # Recreate the model architecture (same as training)
# model = build_model(
#     vocab_size=len(vocab),
#     embedding_dim=EMBEDDING_DIM,
#     rnn_units=RNN_UNITS,
#     batch_size=1
# )

# # Load the saved weights
# checkpoint_num = 10
# checkpoint_path = f"./training_checkpoints/ckpt_{checkpoint_num}.weights.h5"
# model.load_weights(checkpoint_path)
# print(f"Loaded weights from {checkpoint_path}")

# # Build with flexible input shape
# model.build(tf.TensorShape([1, None]))

## Generating Text

> This function uses the trained RNN model to generate text one character at a time.
>
> Starting from a seed string, it samples the next character from the model's predicted distribution and feeds that character back in as the next input, so the LSTM's hidden state carries the context forward.
>
> The **temperature** value controls how random the output is: the logits are divided by it before sampling, so lower values make predictions more deterministic and higher values make the output more diverse.
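
Because `generate_text` divides the logits by `temperature` before sampling, the effect can be seen directly on a toy set of logits. A pure-Python sketch (the logits and the helper name `softmax_with_temperature` are made up for illustration):

```python
import math


def softmax_with_temperature(logits, temperature):
    # Dividing logits by T before softmax mirrors `predictions / temperature`
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

The most likely character stays the same at every temperature, but its share of the probability shrinks as T grows. As T approaches 0 the distribution collapses onto the highest logit (greedy decoding); as T grows it flattens toward uniform.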

In [31]:
def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 800

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty list to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting
  temperature = 1.0

  # Here batch_size == 1. Clear the stateful LSTM's hidden state before generating;
  # resetting the LSTM layer directly works even where the Model object has no reset_states().
  for layer in model.layers:
    if isinstance(layer, tf.keras.layers.LSTM):
      layer.reset_states()
  for i in range(num_generate):
    predictions = model(input_eval)
    # remove the batch dimension
    predictions = tf.squeeze(predictions, 0)

    # using a categorical distribution to predict the character returned by the model
    predictions = predictions / temperature
    predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

    # We pass the predicted characters as the next input to the model
    # along with the previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)

    text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

In [32]:
def generate_text(model, start_string, num_generate=800, temperature=1.0):
    # Evaluation step (generating text using the learned model)
    # num_generate: number of characters to generate
    # temperature: low → more predictable text, high → more creative / random text

    # Converting the start string to numbers (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty list to store the results
    text_generated = []

    # Clear the stateful LSTM's hidden state so each call starts fresh
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.LSTM):
            layer.reset_states()

    for _ in range(num_generate):
        # Get predictions for the current input
        predictions = model(input_eval)

        # Remove batch dimension
        predictions = tf.squeeze(predictions, 0)

        # Adjust predictions by temperature
        predictions = predictions / temperature

        # Sample the next character from the logits of the last timestep
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

        # Use the predicted character as the next input
        input_eval = tf.expand_dims([predicted_id], 0)

        # Append the predicted character to the result
        text_generated.append(idx2char[predicted_id])

    return start_string + ''.join(text_generated)

In [33]:
inp = input("Type a starting string: ")
print(generate_text(model, inp))

Type a starting string: romeo
romeowing all my heir not give
multife:
'Tis virtuous slave,
A glorious angel on; fain be thou rages!

ALONS:
If my shephend hell unswept'd the coats will seem
Of my purpose.'

Nurse:
A man, a goodly lady, and general,
I have forgot your daughter, here 'tistance, which once untainted as I am, it
therein the head to them. Come all to pieces.
We villain, with old Ving Edward's guard!
And kneel not what; I can.

GREMIO:
What's that?

JULIET:
'Tis to bed, with one profound, than might happy in
him: his chair,
And and then, to break an oath with some hour;
Bid her daughter now to be avoided,
Or else a holy man, and not my fare
With brief wench'd and no sees grey so fast?

Second Murderer:
I pray thee, madam:
To be it from thy beauty when they
say, eyebrace of one direct I fear;
But now the Duke of M


## Sources

1. Chollet, François. *Deep Learning with Python*. Manning Publications, 2018.  
2. [**Text Classification with an RNN** - TensorFlow Core](https://www.tensorflow.org/tutorials/text/text_classification_rnn)  
3. [**Text Generation with an RNN** - TensorFlow Core](https://www.tensorflow.org/tutorials/text/text_generation)  
4. [**Understanding LSTM Networks** - Colah's Blog](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)  
5. [**Text Generation with RNNs (TensorFlow & Keras)** - freeCodeCamp.org](https://www.youtube.com/watch?v=tPYj3fFJGjk)