# Text Generation Model: Process We Can Follow
Text Generation Models have various applications, such as content creation, chatbots, automated story writing, and more. They often utilize advanced Machine Learning techniques, particularly Deep Learning models like Recurrent Neural Networks (RNNs), Long Short-Term Memory Networks (LSTMs), and Transformer models like GPT (Generative Pre-trained Transformer).

Below is the process we can follow for the task of building a Text Generation Model:

1. Understand what you want to achieve with the text generation model (e.g., chatbot responses, creative writing, code generation).
2. Consider the style, complexity, and length of the text to be generated.
3. Collect a large dataset of text that’s representative of the style and content you want to generate.
4. Clean the text data (remove unwanted characters, correct spellings) and preprocess it (tokenization, lowercasing, removing stop words if necessary).
5. Choose a deep neural network architecture that can handle sequences for text generation.
6. Frame the problem as a sequence modelling task where the model learns to predict the next word or character in a sequence.
7. Use your text data to train the model.

For this task, we can use the Tiny Shakespeare dataset for two reasons:


1. It’s available in the format of dialogues, so you will learn how to generate text in the form of dialogues.
2. Usually, we need huge textual datasets for building text generation models. The Tiny Shakespeare dataset is already available in TensorFlow Datasets, so `tfds.load` fetches it for us and we don’t need to source a dataset separately.

In [20]:
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import os

In [4]:
#Load the tiny shakespeare dataset
dataset, info = tfds.load('tiny_shakespeare', with_info = True, as_supervised = False)

Dataset tiny_shakespeare downloaded and prepared to /root/tensorflow_datasets/tiny_shakespeare/1.0.0. Subsequent calls will reuse this data.



Our dataset contains raw text. Language models need numerical input, so we’ll map each character to an integer and then split the resulting sequence of integers into fixed-length chunks for training:

In [5]:
#get the text from the dataset
text = next(iter(dataset['train']))['text'].numpy().decode('utf-8')

In [6]:
#create a mapping from unique characters to indices

vocab = sorted(set(text))
char2idx ={char : idx for idx, char in enumerate(vocab)}
idx2char = np.array(vocab)

In [7]:
#numerically represent the characters
text_as_int = np.array([char2idx[c] for c in text])
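
To see what this mapping does, here is a minimal sketch on a toy string. The sample text and the `sample_*` names are made up for illustration, and a plain list stands in for the NumPy array used above:

```python
# Toy illustration of the character <-> integer mapping (sample text is made up).
sample_text = "hello world"

# Sorted unique characters form the vocabulary, as in the tutorial.
sample_vocab = sorted(set(sample_text))
sample_char2idx = {ch: i for i, ch in enumerate(sample_vocab)}
sample_idx2char = list(sample_vocab)  # plain list instead of np.array

# Encode the text as integers, then decode it back.
encoded = [sample_char2idx[ch] for ch in sample_text]
decoded = "".join(sample_idx2char[i] for i in encoded)

print(sample_vocab)
print(encoded)
print(decoded)
```

Decoding the encoded list gives back the original string, which is the property `generate_text` relies on later when it turns predicted indices into characters.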

In [24]:
#create training examples and targets
seq_length = 100
#number of (seq_length + 1)-character chunks that fit in the text
examples_per_epoch = len(text) // (seq_length + 1)

In [25]:
#create training sequences
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length + 1,drop_remainder = True)
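
The chunking performed by `batch(seq_length + 1, drop_remainder=True)` can be pictured with a plain-Python sketch. The helper `chunk_sequence` and the toy values are hypothetical and only mimic the behaviour on a list. They are not part of the `tf.data` API:

```python
def chunk_sequence(ids, chunk_len):
    """Split ids into consecutive chunks of chunk_len, dropping any short tail."""
    n_chunks = len(ids) // chunk_len
    return [ids[i * chunk_len:(i + 1) * chunk_len] for i in range(n_chunks)]

toy_ids = list(range(11))   # stand-in for text_as_int
toy_seq_length = 4          # the tutorial uses 100
toy_chunks = chunk_sequence(toy_ids, toy_seq_length + 1)

# Two chunks of 5 ids each; the leftover id (10) is dropped.
print(toy_chunks)
```

Each chunk holds one more element than `seq_length` because the extra element is needed to build the shifted target in the next step.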

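Each sequence holds seq_length + 1 characters. We now split it into an input (every character except the last) and a target (every character except the first), so that the target at each position is the character that follows the input at that position. The map method applies this function to every sequence: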

In [10]:
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

In [11]:
dataset = sequences.map(split_input_target)
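
As a small worked example of the shift, the same slicing can be applied to a short Python string. The function name `split_input_target_list` and the sample word are assumptions for illustration:

```python
def split_input_target_list(chunk):
    """Same slicing as split_input_target, applied to a Python sequence."""
    return chunk[:-1], chunk[1:]

toy_input, toy_target = split_input_target_list("First")

# At every position the target is the character that follows the input.
for position, (inp, tgt) in enumerate(zip(toy_input, toy_target)):
    print(f"step {position}: input {inp!r} -> expected next char {tgt!r}")
```

The input and target have the same length, and each is one character shorter than the chunk.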

Now, we’ll shuffle the dataset and pack it into training batches:

In [12]:
#batch size and buffer size
BATCH_SIZE = 64
BUFFER_SIZE = 10000

In [13]:
dataset = (dataset
           .shuffle(BUFFER_SIZE)
           .batch(BATCH_SIZE, drop_remainder=True)
           .prefetch(tf.data.AUTOTUNE))
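
Because both batching steps drop their remainder, the number of training steps per epoch follows from simple integer arithmetic. The sketch below uses a hypothetical corpus length of 1,000,000 characters. The helper `steps_per_epoch` is not part of TensorFlow:

```python
def steps_per_epoch(n_chars, seq_length, batch_size):
    """Number of full batches when both batching steps drop their remainder."""
    n_sequences = n_chars // (seq_length + 1)   # chunks of seq_length + 1 characters
    return n_sequences // batch_size            # full batches of sequences

# Hypothetical corpus size; pass len(text) to check against your own run.
print(steps_per_epoch(n_chars=1_000_000, seq_length=100, batch_size=64))  # 154
```

Calling it with `len(text)`, `seq_length` and `BATCH_SIZE` should give the step count that Keras reports per epoch during training.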

Now, we’ll build a simple recurrent model with three layers: an embedding layer, a stateful LSTM layer, and a dense output layer that produces one logit per character in the vocabulary:

In [14]:
#length of the vocabulary
vocab_size = len(vocab)

In [15]:
#the embedding dimension
embedding_dim = 256

In [16]:
#number of RNN UNITS
rnn_units = 1024

In [17]:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None,), batch_size=batch_size),  
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.LSTM(rnn_units, return_sequences=True, stateful=True, recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model

model = build_model(vocab_size, embedding_dim,rnn_units,BATCH_SIZE)
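
To get a feel for the size of this model, the parameter count of each layer can be estimated by hand. This is a rough sketch that assumes the standard layer formulas with biases enabled and a vocabulary of 65 characters. That vocabulary size is an assumption for illustration, so substitute `len(vocab)` for your data:

```python
def embedding_params(vocab_size, embedding_dim):
    # one embedding vector per vocabulary entry
    return vocab_size * embedding_dim

def lstm_params(input_dim, units):
    # four gates, each with input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, output_dim):
    # weight matrix plus one bias per output
    return input_dim * output_dim + output_dim

assumed_vocab_size = 65   # assumption for illustration; use len(vocab) in practice
embedding_dim = 256
rnn_units = 1024

param_counts = {
    "embedding": embedding_params(assumed_vocab_size, embedding_dim),
    "lstm": lstm_params(embedding_dim, rnn_units),
    "dense": dense_params(rnn_units, assumed_vocab_size),
}
total_params = sum(param_counts.values())

print(param_counts)
print("total:", total_params)
```

Almost all of the parameters sit in the LSTM layer. If the assumptions hold, the totals should agree with what `model.summary()` reports.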

We’ll now choose an optimizer and a loss function to compile the model:

In [18]:
def loss(labels, logits):
    # the Dense layer outputs raw logits (no softmax), hence from_logits=True
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss)
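
For intuition about what this loss measures, here is a plain-Python sketch of sparse categorical cross-entropy computed from logits for a single prediction. It is a hand-written illustration rather than the Keras implementation, and the 65-class example is an assumed vocabulary size:

```python
import math

def sparse_crossentropy_from_logits(label, logits):
    """Negative log-probability of `label` under softmax(logits)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

# A model that knows nothing spreads probability evenly over all classes.
uniform_loss = sparse_crossentropy_from_logits(0, [0.0] * 65)

# A model that is confident and correct gets a loss close to zero.
confident_loss = sparse_crossentropy_from_logits(2, [0.0, 0.0, 5.0])

print(round(uniform_loss, 3), round(confident_loss, 3))
```

With uniform logits the loss equals the natural log of the number of classes, which gives a useful reference point when reading the loss values during training.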

We’ll now train the model:

In [21]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.weights.h5")

checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True
)
#train the model 
EPOCHS = 10
history = model.fit(dataset, epochs = EPOCHS, callbacks = [checkpoint_callback] )

Epoch 1/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m588s[0m 4s/step - loss: 3.1798
Epoch 2/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m623s[0m 4s/step - loss: 2.1044
Epoch 3/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m613s[0m 4s/step - loss: 1.8118
Epoch 4/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m570s[0m 4s/step - loss: 1.6523
Epoch 5/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m579s[0m 4s/step - loss: 1.5526
Epoch 6/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m584s[0m 4s/step - loss: 1.4849
Epoch 7/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m583s[0m 4s/step - loss: 1.4387
Epoch 8/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m588s[0m 4s/step - loss: 1.3990
Epoch 9/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m598s[0m 4s/step - loss: 1.3641
Epoch 10/10
[1m155/155[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m583s[0m 4s

In [28]:
import os
print("Files in checkpoint_dir:", os.listdir(checkpoint_dir))

Files in checkpoint_dir: ['ckpt_10.weights.h5', 'ckpt_1.weights.h5', 'ckpt_6.weights.h5', 'ckpt_3.weights.h5', 'ckpt_8.weights.h5', 'ckpt_7.weights.h5', 'ckpt_4.weights.h5', 'ckpt_5.weights.h5', 'ckpt_9.weights.h5', 'ckpt_2.weights.h5']


After training, we can use the model to generate text. A stateful LSTM is tied to a fixed batch size, so we first rebuild the model with a batch size of 1 and then load the weights from the latest checkpoint:

In [29]:
# Manually find the latest `.weights.h5` file
ckpt_files = [f for f in os.listdir(checkpoint_dir) if f.endswith('.weights.h5')]
ckpt_files.sort(key=lambda x: int(x.split('_')[1].split('.')[0]))  # Sort by checkpoint number
latest_ckpt = ckpt_files[-1]
latest_ckpt_path = os.path.join(checkpoint_dir, latest_ckpt)

# Load model and weights
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(latest_ckpt_path)
model.build(tf.TensorShape([1, None]))

print(f"✅ Loaded weights from: {latest_ckpt_path}")


✅ Loaded weights from: ./training_checkpoints/ckpt_10.weights.h5
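
The numeric sort key matters here because a plain alphabetical sort would not pick the last epoch. A small sketch with a made-up subset of file names shows the difference. The helper `epoch_of` is hypothetical and mirrors the lambda used above:

```python
# Made-up subset of checkpoint names following the ckpt_{epoch}.weights.h5 pattern.
names = ["ckpt_10.weights.h5", "ckpt_1.weights.h5", "ckpt_9.weights.h5", "ckpt_2.weights.h5"]

def epoch_of(filename):
    """Extract the epoch number from a name like 'ckpt_10.weights.h5'."""
    return int(filename.split("_")[1].split(".")[0])

lexicographic_latest = sorted(names)[-1]           # compares character by character
numeric_latest = sorted(names, key=epoch_of)[-1]   # compares epoch numbers

print(lexicographic_latest)
print(numeric_latest)
```

Alphabetical order places `ckpt_10` before `ckpt_2`, so it would select epoch 9 as the latest, while the numeric key selects epoch 10.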


In [36]:
for i, layer in enumerate(model.layers):
    print(i, layer.name, type(layer))

0 embedding_2 <class 'keras.src.layers.core.embedding.Embedding'>
1 lstm_2 <class 'keras.src.layers.rnn.lstm.LSTM'>
2 dense_2 <class 'keras.src.layers.core.dense.Dense'>


Now, to generate text, we’ll input a seed string, predict the next character, and then add it back to the input, continuing this process to generate longer text:

In [38]:
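def generate_text(model, start_string):
    # number of characters to generate
    num_generate = 1000
    # convert the seed string to a batch containing one sequence of indices
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    text_generated = []
    # clear the LSTM state left over from any previous call
    model.layers[1].reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # drop the batch dimension: (1, seq_len, vocab) -> (seq_len, vocab)
        predictions = tf.squeeze(predictions, 0)
        # sample the next character from the logits at the last time step
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # the LSTM is stateful, so only the new character is fed back in
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])
    return start_string + ''.join(text_generated)

print(generate_text(model, start_string="Queen: So, let's end things"))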

Queen: So, let's end things
are this well-wars Crie than his good words,
Haviness in so name eccuses,
Which on his father's a blood far ears,
Out a' the presure of your other light
This doth scipt me on, strong from this far
Triumphanent exercide beats and brother to our court!
And Norfortune! Romeo! he is a devil
To make seems be striptany:
I think she dy crabll himself broke
Hath cherking's prince; what, trushor a whil answer
Gold perfil in the time in a longer venta.

DUKE VINCENTIO:
Kind suffer me; honour none devounds,
Who field
About him, and all the scovern
thee winter:
Wherein time the bettee of the senators,
A gentleman since the mountendest, I
knock thy life thou thou would for her flims four man,
Who were thee on abure, my batcent heart me;
The searing Tybalt, Petrictio, being hole
That but shook their own oback for you anot?
Whief there is the pripuly of the cause.
I must be sign means to my desire swole,
Ay, that's nothing live, one, it would Hulced Angelo;
Wh cheak and lo

The generate_text function in the above code uses a trained Recurrent Neural Network model to generate a sequence of text, starting with a given seed phrase (start_string). It converts the seed phrase into a sequence of numeric indices, feeds these indices into the model, and then iteratively generates new characters, each time using the model’s most recent output as the input for the next step. This process continues for a specified number of iterations (num_generate), resulting in a stream of text that extends from the initial seed.

The function employs randomness in character selection to ensure variability in the generated text, and the final output is a concatenation of the seed phrase with the newly generated characters, typically reflecting the style and content of the training data used for the model.
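
One common way to control how much randomness goes into the selection is to divide the logits by a temperature before sampling. The tutorial's function does not do this: it samples directly from the logits, which corresponds to a temperature of 1.0. The sketch below is therefore an optional variation written in plain Python, with a hypothetical helper `sample_with_temperature` and made-up logits:

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Draw an index from softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(z - max_scaled) for z in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

rng = random.Random(0)          # fixed seed so repeated runs are comparable
toy_logits = [2.0, 1.0, 0.1]    # made-up scores for three characters

for temperature in (0.2, 1.0, 2.0):
    draws = [sample_with_temperature(toy_logits, temperature, rng) for _ in range(1000)]
    counts = [draws.count(i) for i in range(len(toy_logits))]
    print(temperature, counts)
```

Lower temperatures concentrate the draws on the highest-scoring character, which tends to give more predictable text. Higher temperatures spread them out, which tends to give more surprising text.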

## Summary


So, this is how you can build a Text Generation Model with Deep Learning using Python. We loaded the Tiny Shakespeare dataset, mapped its characters to integers, trained an LSTM model to predict the next character, and then sampled new text from a seed string.