<h1 style="text-align: center; font-size: 50px;"> 📜 Text Generation with Neural Networks and Torch</h1>

This notebook demonstrates how to generate text with a character-level recurrent neural network (an LSTM) built in PyTorch and trained on a dataset of Shakespeare's writing.

# Notebook Overview
- Start Execution
- Install and Import Libraries
- Configure Settings
- Verify Assets
- Get Text Data
- Preparing textual data
- One Hot Encoding
- Creating Training Batches
- Creating the LSTM Model
- Training the Network
- Generating Predictions

## Start Execution

In [None]:
import logging
import time

# Configure logger
logger: logging.Logger = logging.getLogger("run_workflow_logger")
logger.setLevel(logging.INFO)
logger.propagate = False  # Prevent duplicate logs from parent loggers

# Set formatter
formatter: logging.Formatter = logging.Formatter(
    fmt="%(asctime)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S"
)

# Configure and attach stream handler
stream_handler: logging.StreamHandler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)

In [None]:
start_time = time.time()  

logger.info("Notebook execution started.")

## Install and Import Libraries

In [None]:
# Standard Library Imports
import warnings
from pathlib import Path
from datetime import datetime

# Third-Party Libraries
import torch
from torch import nn
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import mlflow  # used later for experiment tracking

torch.manual_seed(0)

<torch._C.Generator at 0x7fc3f805bc70>

## Configure Settings

In [2]:
warnings.filterwarnings("ignore")

In [4]:
# ------------------------ Define global experiment and run names to be used throughout the notebook ------------------------
EXPERIMENT_SET = "RNN text generation"
RUN_NAME = "RNN Text Generation"
MODEL_NAME = "dict_torch_rnn_model"
TORCH_MODEL = "dict_torch_rnn_model.pt"
REGISTER_NAME = "Shakespeare_Model"
EXPERIMENT_NAME = "Shakespeare Text Generation"

# ------------------------ Paths ------------------------
DATA_PATH = "../data/shakespeare.txt"
MODEL_DECODER_PATH = "models/decoder.pt"
MODEL_ENCODER_PATH = "models/encoder.pt"
MODEL_PATH = 'models/dict_torch_rnn_model.pt'

## Verify Assets

In [7]:
def log_asset_status(asset_path: str, asset_name: str, success_message: str, failure_message: str) -> None:
    """
    Logs the status of a given asset based on its existence.

    Parameters:
        asset_path (str): File or directory path to check.
        asset_name (str): Name of the asset for logging context.
        success_message (str): Message to log if asset exists.
        failure_message (str): Message to log if asset does not exist.
    """
    if Path(asset_path).exists():
        logger.info(f"{asset_name} is properly configured. {success_message}")
    else:
        logger.warning(f"{asset_name} is not properly configured. {failure_message}")
        
log_asset_status(
    asset_path=DATA_PATH,
    asset_name="Shakespeare text",
    success_message="",
    failure_message="Please create and download the required assets in your project on AI Studio."
)

log_asset_status(
    asset_path=MODEL_DECODER_PATH ,
    asset_name="Decoder model",
    success_message="",
    failure_message="Please check if model folder was properly downloaded in your project on AI Studio."
)

log_asset_status(
    asset_path=MODEL_ENCODER_PATH,
    asset_name="Encoder model",
    success_message="",
    failure_message="Please check if model folder was properly downloaded in your project on AI Studio."
)

log_asset_status(
    asset_path=MODEL_PATH,
    asset_name="Rnn model",
    success_message="",
    failure_message="Please check if model folder was properly downloaded in your project on AI Studio."
)

2025-06-24 22:04:50 - INFO - Shakespeare text is properly configured. 
2025-06-24 22:04:50 - INFO - Decoder model is properly configured. 
2025-06-24 22:04:50 - INFO - Encoder model is properly configured. 
2025-06-24 22:04:50 - INFO - Rnn model is properly configured. 


In [8]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


## Get Text Data

This is the text we'll use as the basis for generation: the goal is to produce 'Shakespearean' text.

The excerpt printed below is the opening of Shakespeare's Sonnet 1, one of the 154 sonnets by William Shakespeare first published in 1609. Like many of the others, it discusses beauty, procreation, and the transient nature of life, urging the beautiful to reproduce so that their beauty can live on through their offspring.

In [9]:
with open(DATA_PATH,'r',encoding='utf8') as f:
    text = f.read()

In [10]:
logger.info('First 600 chars: \n')
print(text[:600])

2025-06-24 22:04:50 - INFO - First 600 chars: 




                     1
  From fairest creatures we desire increase,
  That thereby beauty's rose might never die,
  But as the riper should by time decease,
  His tender heir might bear his memory:
  But thou contracted to thine own bright eyes,
  Feed'st thy light's flame with self-substantial fuel,
  Making a famine where abundance lies,
  Thy self thy foe, to thy sweet self too cruel:
  Thou that art now the world's fresh ornament,
  And only herald to the gaudy spring,
  Within thine own bud buriest thy content,
  And tender churl mak'st waste in niggarding:
    Pity the world, or else th


## Preparing textual data

We need to encode our data to give the model a proper numerical representation of our text.

In [11]:
all_characters = set(text) # creates a set of unique characters found in the text

In [12]:
len(all_characters)

84

In [13]:
decoder = dict(enumerate(all_characters))
# assigns a unique integer to each character in a dictionary format, 
# creating a mapping that can later be used to transform encoded predictions back into characters

In [14]:
encoder = {char: ind for ind, char in decoder.items()} 
# reverses the decoder dictionary, providing a mapping from characters to their respective assigned integers, which is used to encode the text.

In [15]:
torch.save(decoder, MODEL_DECODER_PATH)
torch.save(encoder, MODEL_ENCODER_PATH)

In [16]:
encoded_text = np.array([encoder[char] for char in text])
# encodes the entire text as an array of integers, with each integer representing the character at that position
#in the text according to the encoder dictionary
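
Python does not guarantee the iteration order of a `set` of strings across sessions, so the integer assigned to each character can differ between runs. That is why the mappings are saved to disk alongside the model. The following is a minimal sketch of a deterministic alternative, assuming that sorting the characters is acceptable. The names `sample_text`, `vocab`, `char_to_idx`, and `idx_to_char` are illustrative and not part of the notebook.

```python
sample_text = "to be or not to be"

# Sorting the unique characters gives the same index assignment on every run
vocab = sorted(set(sample_text))
idx_to_char = dict(enumerate(vocab))
char_to_idx = {ch: i for i, ch in idx_to_char.items()}

# Encode to integers, then decode back to check the round trip
encoded = [char_to_idx[ch] for ch in sample_text]
decoded = "".join(idx_to_char[i] for i in encoded)

print(vocab)                   # [' ', 'b', 'e', 'n', 'o', 'r', 't']
print(encoded[:5])             # [6, 4, 0, 1, 2]
print(decoded == sample_text)  # True
```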

## One Hot Encoding

One-hot encoding is a way to convert categorical data into a fixed-size vector of numerical values.

This encoding allows the model to treat input data uniformly and is particularly important for models that need to determine the presence or absence of a feature, such as a particular character.

In [17]:
def one_hot_encoder(encoded_text, num_uni_chars):
    """
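    Convert integer-encoded characters into one-hot vectors.

    Args:
        encoded_text: NumPy array of integer character indices (any shape).
        num_uni_chars: Number of unique characters.

    Returns:
        float32 NumPy array of shape (*encoded_text.shape, num_uni_chars),
        or None if an error is logged.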
    """
    try:
        # Create a placeholder for zeros
        one_hot = np.zeros((encoded_text.size, num_uni_chars))
        
        # Convert data type for later use with pytorch
        one_hot = one_hot.astype(np.float32)

        # Using indexing fill in the 1s at the correct index locations
        one_hot[np.arange(one_hot.shape[0]), encoded_text.flatten()] = 1.0
        
        # Reshape it so it matches the batch shape
        one_hot = one_hot.reshape((*encoded_text.shape, num_uni_chars))
        
        return one_hot
    except Exception as e:
            logger.error(f"Error converting categorical data: {str(e)}")
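
To make the resulting shapes concrete, here is a small sketch that applies the same indexing trick to a toy batch. The helper `one_hot` is a compact stand-in written for this illustration (it omits the logging and error handling), and the batch values are made up.

```python
import numpy as np

def one_hot(indices: np.ndarray, num_classes: int) -> np.ndarray:
    """Compact stand-in for the notebook's one_hot_encoder (illustrative)."""
    out = np.zeros((indices.size, num_classes), dtype=np.float32)
    # Set a single 1.0 per row at the column given by the character index
    out[np.arange(indices.size), indices.flatten()] = 1.0
    # Restore the batch layout and append the one-hot axis
    return out.reshape((*indices.shape, num_classes))

batch = np.array([[0, 2, 1],
                  [3, 3, 0]])   # 2 sequences of 3 character indices
encoded = one_hot(batch, num_classes=4)

print(encoded.shape)   # (2, 3, 4)
print(encoded[0, 1])   # [0. 0. 1. 0.]
```

Each integer becomes a vector with exactly one non-zero entry, so a batch of shape `(samples, seq_len)` turns into `(samples, seq_len, num_classes)`.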

# Creating Training Batches

Training batches are a way of dividing the dataset into smaller, manageable groups of data points that are fed into a machine learning model during the training process.

In [18]:
def generate_batches(encoded_text, samp_per_batch=10, seq_len=50):
    
    '''
    Generate (using yield) batches for training.
    
    X: Encoded Text of length seq_len
    Y: Encoded Text shifted by one
    
    Example:
    
    X:
    
    [[1 2 3]]
    
    Y:
    
    [[ 2 3 4]]
    
    encoded_text : Complete encoded text to make batches from
    samp_per_batch : Number of sequences (samples) per batch
    seq_len : Length of each character sequence
       
    '''
    try:
        # Total number of characters per batch
        # Example: If samp_per_batch is 2 and seq_len is 50, then 100
        # characters come out per batch.
        char_per_batch = samp_per_batch * seq_len
        
        # Number of full batches available
        # Integer division discards the partial batch at the end
        num_batches_avail = len(encoded_text) // char_per_batch
        
        # Cut off end of encoded_text that
        # won't fit evenly into a batch
        encoded_text = encoded_text[:num_batches_avail * char_per_batch]
        
        # Reshape text into rows the size of a batch
        encoded_text = encoded_text.reshape((samp_per_batch, -1))

        # Step through the columns one seq_len-wide window at a time.
        for n in range(0, encoded_text.shape[1], seq_len):
            # Grab feature characters
            x = encoded_text[:, n:n+seq_len]
            # y is the target shifted over by 1
            y = np.zeros_like(x)
            try:
                y[:, :-1] = x[:, 1:]
                y[:, -1]  = encoded_text[:, n+seq_len]
            except IndexError:
                # Last window: no following column exists, so wrap around to the first one
                y[:, :-1] = x[:, 1:]
                y[:, -1] = encoded_text[:, 0]
                
            yield x, y
    except Exception as e:
            logger.error(f"Error Generating batches: {str(e)}")
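
The relationship between inputs and targets is easier to see on a tiny array. The sketch below is a simplified stand-in for the generator above, not a replacement for it. The name `shifted_batches` is hypothetical, and it uses `np.roll` and a modulo wrap-around in place of the `try`/`except` block.

```python
import numpy as np

def shifted_batches(encoded, samp_per_batch, seq_len):
    """Simplified stand-in for generate_batches (illustrative only)."""
    chars_per_batch = samp_per_batch * seq_len
    num_batches = len(encoded) // chars_per_batch
    # Drop the tail that does not fill a batch, then use one row per sample
    encoded = encoded[:num_batches * chars_per_batch].reshape((samp_per_batch, -1))
    for n in range(0, encoded.shape[1], seq_len):
        x = encoded[:, n:n + seq_len]
        # Targets are the inputs shifted left by one position
        y = np.roll(x, -1, axis=1)
        # The last target is the column after the window, wrapping to column 0
        y[:, -1] = encoded[:, (n + seq_len) % encoded.shape[1]]
        yield x, y

batches = list(shifted_batches(np.arange(12), samp_per_batch=2, seq_len=3))
x, y = batches[0]
print(x)   # [[0 1 2]
           #  [6 7 8]]
print(y)   # [[1 2 3]
           #  [7 8 9]]
```

Every target row is the matching input row moved forward by one character, which is what lets the network learn next-character prediction.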


# Creating the LSTM Model

In [19]:
class CharModel(nn.Module):
    def __init__(self, decoder, encoder, all_chars, num_hidden=256, num_layers=4,drop_prob=0.5, use_gpu=False):
        """Initializes CharModel

        Args:
            decoder: Path to the saved decoder dictionary (integer index -> character).
            encoder: Path to the saved encoder dictionary (character -> integer index).
            all_chars: Set of unique characters found in the text.
            num_hidden: Number of hidden units in each LSTM layer. Defaults to 256.
            num_layers: Number of stacked LSTM layers. Defaults to 4.
            drop_prob: Dropout probability used for regularization. Defaults to 0.5.
            use_gpu: Whether the model runs on the GPU. Defaults to False.
        """
        try:
            super().__init__()
            self.drop_prob = drop_prob
            self.num_layers = num_layers
            self.num_hidden = num_hidden
            self.use_gpu = use_gpu
            
            self.all_chars = all_chars
            self.decoder = torch.load(decoder)
            self.encoder = torch.load(encoder)
            
            self.lstm = nn.LSTM(len(self.all_chars), num_hidden, num_layers, dropout=drop_prob, batch_first=True)
            self.dropout = nn.Dropout(drop_prob)
            self.fc_linear = nn.Linear(num_hidden, len(self.all_chars))
            logger.info("CharModel initialized successfully")
    
        except Exception as e:
            logger.error(f"Error initializing CharModel: {str(e)}")
      
    
    def forward(self, x, hidden):
        """Forward pass: the input flows through the LSTM, dropout, and the final linear layer.

        Args:
            x: One-hot encoded input tensor with shape (batch size, sequence length, number of characters).
            hidden: Tuple with the initial hidden and cell states, each with shape (num_layers, batch size, num_hidden).

        Returns:
            final_out: Output tensor representing the predicted logits for each character in the sequence.
            hidden: Tuple containing the final hidden states of the CharModel.
        """
        try:
            lstm_output, hidden = self.lstm(x, hidden)       
            drop_output = self.dropout(lstm_output)
            drop_output = drop_output.contiguous().view(-1, self.num_hidden)
            final_out = self.fc_linear(drop_output)
            
            return final_out, hidden
        
        except Exception as e:
            logger.error(f"Error implementing CharModel logic: {str(e)}")
    
    
    def hidden_state(self, batch_size):
        """
        Initializes and returns the initial hidden state for a recurrent neural network (e.g., LSTM).

        This method creates zero-filled tensors for the hidden state (h_0) and cell state (c_0), 
        supporting GPU execution if `self.use_gpu` is set to True.

        Args:
            batch_size: The number of sequences in the input batch, used to determine the tensor dimensions.

        Returns:
            Tuple: A tuple containing the hidden state and cell state tensors 
            with shape (num_layers, batch_size, num_hidden). Returns None if an exception occurs, and logs the error.
        """
        try:
            if self.use_gpu:
                hidden = (torch.zeros(self.num_layers,batch_size,self.num_hidden).to(device),
                        torch.zeros(self.num_layers,batch_size,self.num_hidden).to(device))
            else:
                hidden = (torch.zeros(self.num_layers,batch_size,self.num_hidden),
                        torch.zeros(self.num_layers,batch_size,self.num_hidden))
            
            return hidden
        except Exception as e:
            logger.error(f"Error Initializing and returning the initial hidden state: {str(e)}")

## Instance of the Model

In [20]:
model = CharModel(
    all_chars=all_characters,
    num_hidden=512,
    num_layers=3,
    drop_prob=0.5,
    use_gpu=True,
    encoder= MODEL_ENCODER_PATH,
    decoder= MODEL_DECODER_PATH
)

2025-06-24 22:04:50 - INFO - CharModel initialized successfully


### Optimizer and Loss

In [21]:
optimizer = torch.optim.Adam(model.parameters(),lr=0.001)
criterion = nn.CrossEntropyLoss()
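
With its default settings, `nn.CrossEntropyLoss` reports the mean negative log-likelihood in nats. As a rough reference point for reading loss values, a model that guesses uniformly over the 84 characters would score ln(84). The sketch below computes that baseline and converts a loss into perplexity. The value of `example_loss` is illustrative only and is not a result from this notebook.

```python
import math

vocab_size = 84                       # unique characters reported by the notebook
uniform_loss = math.log(vocab_size)   # cross-entropy of a uniform guess, in nats

example_loss = 1.5                    # illustrative value only
perplexity = math.exp(example_loss)   # effective number of equally likely choices

print(f"uniform baseline: {uniform_loss:.3f}")   # uniform baseline: 4.431
print(f"perplexity: {perplexity:.2f}")           # perplexity: 4.48
```

A loss well below the uniform baseline indicates that the model has learned something about character statistics.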

## Training Data and Validation Data

In [22]:
# fraction of the data to be used for training
train_percent = 0.5

In [23]:
int(len(encoded_text) * (train_percent))

2722804

In [24]:
train_ind = int(len(encoded_text) * (train_percent))

In [25]:
train_data = encoded_text[:train_ind]
val_data = encoded_text[train_ind:]

# Training the Network

## Variables

In [26]:
# Epochs to train for
epochs = 30
# batch size 
batch_size = 128
# Length of sequence
seq_len = 100
# for printing report purposes
# always start at 0
tracker = 0
# number of unique characters (vocabulary size)
num_char = max(encoded_text)+1

In [27]:
mlflow.set_tracking_uri('/phoenix/mlflow')
mlflow.set_experiment(EXPERIMENT_NAME)

2025/06/24 22:04:52 INFO mlflow.tracking.fluent: Experiment with name 'Shakespeare Text Generation' does not exist. Creating a new experiment.


<Experiment: artifact_location='/phoenix/mlflow/201817814577060446', creation_time=1750802692086, experiment_id='201817814577060446', last_update_time=1750802692086, lifecycle_stage='active', name='Shakespeare Text Generation', tags={}>

In [28]:
mlflow.start_run(run_name = RUN_NAME)

mlflow.log_param("epochs", epochs)
mlflow.log_param("batch_size", batch_size)

# Set model to train
model.train()

# Check to see if using GPU
if model.use_gpu:
    torch.cuda.manual_seed_all(0)
    model.cuda()

for i in range(epochs):
    
    hidden = model.hidden_state(batch_size)
    
    
    for x,y in generate_batches(train_data, batch_size, seq_len):
        
        tracker += 1
        
        # One Hot Encode incoming data
        x = one_hot_encoder(x, num_char)
        
        # Convert Numpy Arrays to Tensor
        inputs = torch.from_numpy(x)
        targets = torch.from_numpy(y)
        
        # Adjust for GPU if necessary
        if model.use_gpu:
            inputs = inputs.to(device)
            targets = targets.to(device)
            
        # Detach the hidden state from the previous batch's graph so
        # backpropagation does not reach back through earlier batches
        hidden = tuple([state.data for state in hidden])
        
        model.zero_grad()
        
        lstm_output, hidden = model(inputs, hidden)
        loss = criterion(lstm_output, targets.view(batch_size*seq_len).long())
        
        loss.backward()
        
        # Clipping gradients to avoid explosion
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=5)
        
        optimizer.step()
        
        if tracker % 100 == 0:
            val_hidden = model.hidden_state(batch_size)
            val_losses = []
            model.eval()
            
            # Gradients are not needed during validation
            with torch.no_grad():
                for x,y in generate_batches(val_data, batch_size, seq_len):
                    x = one_hot_encoder(x, num_char)
                    inputs = torch.from_numpy(x)
                    targets = torch.from_numpy(y)
                    
                    if model.use_gpu:
                        inputs = inputs.to(device)
                        targets = targets.to(device)
                    
                    val_hidden = tuple([state.data for state in val_hidden])
                    
                    lstm_output, val_hidden = model(inputs, val_hidden)
                    val_loss = criterion(lstm_output, targets.view(batch_size*seq_len).long())
            
                    val_losses.append(val_loss.item())
            
            # Log the mean over all validation batches, not just the last one
            mlflow.log_metric("Val Loss", float(np.mean(val_losses)), step=tracker)
        
            model.train()
            
    print(f"Epoch: {i} Step: {tracker} Val Loss: {val_loss.item()}")


mlflow.end_run()

Epoch: 0 Step: 212 Val Loss: 2.6792349815368652
Epoch: 1 Step: 424 Val Loss: 2.0928144454956055
Epoch: 2 Step: 636 Val Loss: 1.863274335861206
Epoch: 3 Step: 848 Val Loss: 1.7218742370605469
Epoch: 4 Step: 1060 Val Loss: 1.6229908466339111
Epoch: 5 Step: 1272 Val Loss: 1.5663131475448608
Epoch: 6 Step: 1484 Val Loss: 1.5238914489746094
Epoch: 7 Step: 1696 Val Loss: 1.485133409500122
Epoch: 8 Step: 1908 Val Loss: 1.4531649351119995
Epoch: 9 Step: 2120 Val Loss: 1.4358128309249878
Epoch: 10 Step: 2332 Val Loss: 1.4237920045852661
Epoch: 11 Step: 2544 Val Loss: 1.4111253023147583
Epoch: 12 Step: 2756 Val Loss: 1.404954195022583
Epoch: 13 Step: 2968 Val Loss: 1.3983490467071533
Epoch: 14 Step: 3180 Val Loss: 1.3930407762527466
Epoch: 15 Step: 3392 Val Loss: 1.384071946144104
Epoch: 16 Step: 3604 Val Loss: 1.3750123977661133
Epoch: 17 Step: 3816 Val Loss: 1.3771308660507202
Epoch: 18 Step: 4028 Val Loss: 1.3724491596221924
Epoch: 19 Step: 4240 Val Loss: 1.370200514793396
Epoch: 20 Step: 445

## Saving the Model

In [29]:
model_name = TORCH_MODEL

In [30]:
torch.save(model.state_dict(), f'models/{TORCH_MODEL}')
logger.info("Model saved")

2025-06-24 23:20:53 - INFO - Model saved


## Load Model

In [31]:
model.load_state_dict(torch.load(f'models/{TORCH_MODEL}'))
model.eval()

CharModel(
  (lstm): LSTM(84, 512, num_layers=3, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.5, inplace=False)
  (fc_linear): Linear(in_features=512, out_features=84, bias=True)
)

# Generating Predictions

--------

In [32]:
def predict_next_char(model, char, hidden=None, k=1):
    """
    Predicts the next character given an input character and the current hidden state.

    This method encodes the input character, feeds it through the trained character-level 
    language model (e.g., LSTM), and samples from the top-k most probable characters 
    to determine the next one. It also returns the updated hidden state for sequential prediction.

    Args:
        model: Trained CharModel instance.
        char: The input character to start prediction from.
        hidden: Current hidden state of the model. Each tensor has shape (num_layers, batch_size, num_hidden).
            Must not be None; create one with model.hidden_state(1) before the first call.
        k: Number of top predictions to sample from.

    Returns:
        A tuple containing the predicted next character and the updated hidden state.
    """
    try:
        encoded_text = model.encoder[char]
        encoded_text = np.array([[encoded_text]])
        encoded_text = one_hot_encoder(encoded_text, len(model.all_chars))
        inputs = torch.from_numpy(encoded_text)
    
        if(model.use_gpu):
            inputs = inputs.to(device)  

        hidden = tuple([state.data for state in hidden])
        lstm_out, hidden = model(inputs, hidden)        
        probs = F.softmax(lstm_out, dim=1).data
    
        if(model.use_gpu):
            probs = probs.cpu()

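        # Keep the k most probable next characters
        probs, index_positions = probs.topk(k)
        # flatten() keeps a 1-D array even when k=1, which np.random.choice requires
        index_positions = index_positions.numpy().flatten()
        probs = probs.numpy().flatten()
        # Renormalize so the k probabilities sum to 1
        probs = probs/probs.sum()
        char = np.random.choice(index_positions, p=probs)    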
    
        return model.decoder[char], hidden
    except Exception as e:
            logger.error(f"Error predicting next char: {str(e)}")
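
The top-k sampling step can be tried in isolation. The sketch below uses NumPy's `argsort` as a stand-in for the tensor `topk` call, and the probability vector is invented for the demo. It keeps the `k` most likely indices, renormalizes their probabilities, and samples from that reduced set.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the demo repeatable

probs = np.array([0.05, 0.40, 0.10, 0.30, 0.15])   # made-up distribution over 5 characters
k = 3

top_idx = np.argsort(probs)[-k:][::-1]          # indices of the k largest probabilities
top_p = probs[top_idx] / probs[top_idx].sum()   # renormalize so they sum to 1

samples = rng.choice(top_idx, size=1000, p=top_p)

print(top_idx)                              # [1 3 4]
print(set(samples.tolist()) <= {1, 3, 4})   # True
```

A small `k` keeps the output conservative, since unlikely characters can never be drawn, while a larger `k` allows more varied text.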

In [33]:
def generate_text(model, size, seed='The', k=1):
    """
    Generates a sequence of text using the trained character-level language model.

    Starting from a seed string, this method uses the model to predict the next character
    one at a time, feeding each predicted character back into the model. It continues
    this process until the desired output length is reached.

    Args:
        model: Trained CharModel instance.
        size: Number of prediction steps run after the seed is consumed; the result
            holds the seed plus size + 1 generated characters.
        seed: The initial sequence of characters used to start the text generation.
        k: Number of top character predictions to consider for sampling at each step.

    Returns:
        The full generated text including the seed and the newly predicted characters.
    """
    try:
        if(model.use_gpu):
            model.to(device)
        else:
            model.cpu()
            
        model.eval()
        output_chars = [c for c in seed]
        hidden = model.hidden_state(1)
        
        for char in seed:
            char, hidden = predict_next_char(model, char, hidden, k=k)

        output_chars.append(char)
        for i in range(size):
            char, hidden = predict_next_char(model, output_chars[-1], hidden, k=k)
            output_chars.append(char)
            
        return ''.join(output_chars)
    except Exception as e:
            logger.error(f"Error making predictions: {str(e)}")


#### Generating a text with 1000 chars starting with word 'Confidence'

In [34]:
print(generate_text(model, 1000, seed='Confidence ', k=3))

Confidence of the chair
  CHARLES, the Council, and Cornelius and Castius

  PAROLLES, trumpets as I here, to be the worse to this thought
    on me.
  CLOWN. The soldiers of the command, I will stay to him, and we will be
    the cap of the child in house of the soul, and tell thee a chance and stood
    and send thee with a mother to my hand to th' cause. I have been
    a state, and tell this war and the son that we show them the soul
    of a soul, and white the chairs of man and to a power of the country that I have been
    my lord.
  COUNTESS. The merciless strong instantly should have thee that I heard
    thee. I am a strain's child in the counterfeit of a power to tell you.
                               Exeunt ARTHUR. A THALLOS and the KING, whose son
                       a thorns, and they say and
                                                          this tongues
                  the trumpets and the state

  CAESAR. That will I stay to have thee to her son,
    Whom

#### Generating a text with 1000 chars starting with word 'Love'

In [35]:
print(generate_text(model, 1000, seed='Love ', k=3))

Love and Caesar
    That to my father shall be true an hour
    And we hear him and the streets to the contrive.
  KING JOHN. The man is so, the charge of the controll'd
    With him that to their cheeks to stand on me.
    I'll see the soldiers and a partian hard.
    I will bear true that they are strong and thine,
    Will this answer to-day and the soldiers
    That, as you have that was a proud of my son
    That we will stay to thee a stars to thine
    Which I am sure that was never the worse
    We shall strate me, and we will be a sentence
    As we will see him.
  CAESAR. What is the court?  
  SURREY. What is the service of the sendent will
    And honour what the service of my friends,
    Wherefore their braces that will still be so?
  KING. Why sees to this? To th' storm of him, to strong,
    And, which I have been their and straight that shall
    Think we to th' way. What, within that I shall have mine,
    I shall, my Lord of Warwick! Where say you?
  CLEOPATRA. I wil

In [None]:
end_time: float = time.time()
elapsed_time: float = end_time - start_time
elapsed_minutes: int = int(elapsed_time // 60)
elapsed_seconds: float = elapsed_time % 60

logger.info(f"⏱️ Total execution time: {elapsed_minutes}m {elapsed_seconds:.2f}s")
logger.info("✅ Notebook execution completed successfully.")

Built with ❤️ using Z by HP AI Studio.