<h1>A2: Language Model<h1>

<h2>Importing necessary libraries<h2>

In [65]:
import torch
import torch.nn as nn
import torch.optim as optim

import torchtext, datasets, math
from tqdm import tqdm

In [66]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)

cpu


In [67]:
SEED = 1234
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

<h2>Task 1. Dataset Acquisition<h2>

The dataset used in this assignment is the Sherlock Holmes Collection obtained from Kaggle
(https://www.kaggle.com/datasets/bharatkumar0925/sherlock-holmes-collection).
It consists of the complete collection of public-domain novels and short stories written by Arthur Conan Doyle, including multiple Sherlock Holmes works. The dataset is provided in plain text format and represents a large continuous literary corpus suitable for language modeling. 

In [68]:
# Loading the raw text file containing the dataset
with open("Sherlock Homes.txt", "r", encoding="utf-8") as f:
    text = f.read()


# Splitting the raw text into individual non-empty lines
# Each line is stored as a dictionary with a 'text' key

data = [{"text": line} for line in text.split("\n") if line.strip()]

print(f"Total lines: {len(data)}")
print(data[0])

Total lines: 60365
{'text': '                          THE COMPLETE SHERLOCK HOLMES'}


<h1>Inspecting and cleaning the raw dataset<h1>

The raw text dataset was first loaded from a plain text file containing the complete Sherlock Holmes collection. To prepare the data for language modeling, several preprocessing steps needs to be applied. Non-narrative content such as front matter, chapter titles, section headers, and metadata needs to be removed using rule-based filtering. Empty lines are to be discarded, and the text is to be trimmed to retain only meaningful narrative content. After cleaning, the text will be tokenized using the basic_english tokenizer from TorchText, which lowercases text and separates punctuation. Special tokens for unknown words (<unk>) and end-of-sequence (<eos>) will be added. Finally, the tokenized text will be converted into numerical form using a vocabulary constructed from the corpus, enabling it to be used as input for training the language model.

In [69]:
#Inspecting few lines of dataset
with open("Sherlock Homes.txt", "r", encoding="utf-8") as f:
    for i in range(300):
        print(f"{i+1}: {f.readline().strip()}")

1: 
2: 
3: 
4: 
5: THE COMPLETE SHERLOCK HOLMES
6: 
7: Arthur Conan Doyle
8: 
9: 
10: 
11: Table of contents
12: 
13: A Study In Scarlet
14: 
15: The Sign of the Four
16: 
17: The Adventures of Sherlock Holmes
18: A Scandal in Bohemia
19: The Red-Headed League
20: A Case of Identity
21: The Boscombe Valley Mystery
22: The Five Orange Pips
23: The Man with the Twisted Lip
24: The Adventure of the Blue Carbuncle
25: The Adventure of the Speckled Band
26: The Adventure of the Engineer's Thumb
27: The Adventure of the Noble Bachelor
28: The Adventure of the Beryl Coronet
29: The Adventure of the Copper Beeches
30: 
31: The Memoirs of Sherlock Holmes
32: Silver Blaze
33: The Yellow Face
34: The Stock-Broker's Clerk
35: The "Gloria Scott"
36: The Musgrave Ritual
37: The Reigate Squires
38: The Crooked Man
39: The Resident Patient
40: The Greek Interpreter
41: The Naval Treaty
42: The Final Problem
43: 
44: The Return of Sherlock Holmes
45: The Adventure of the Empty House
46: The Adventure o

In [70]:
def clean_text_lines(file_path, start_phrase="In the year 1878"):
    with open(file_path, "r", encoding="utf-8") as f:
        lines = f.readlines()

    # Find the first line where the story begins
    start_idx = None
    for i, line in enumerate(lines):
        if start_phrase in line:
            start_idx = i
            break

    if start_idx is None:
        raise ValueError(f"Start phrase not found: {start_phrase}")

    cleaned_lines = []

    # Keep everything from start_idx to end, with minimal cleaning
    for line in lines[start_idx:]:
        line = line.strip()

        # Remove empty lines
        if not line:
            continue

        # Remove ALL-CAPS headers like "CHAPTER I", "PART II", etc.
        # (story content still remains)
        if line.isupper():
            continue

        # Remove simple parenthetical metadata lines
        if line.startswith("(") and line.endswith(")"):
            continue

        cleaned_lines.append(line)

    return cleaned_lines


cleaned_data = clean_text_lines("Sherlock Homes.txt", start_phrase="In the year 1878")

print("Total cleaned lines:", len(cleaned_data))
print("First kept line:", cleaned_data[0])
print("Last kept line:", cleaned_data[-1])

Total cleaned lines: 60117
First kept line: In the year 1878 I took my degree of Doctor of Medicine of the
Last kept line: This text comes from the collection's version 3.1.


<h1>Tokenization<h1>

In [71]:
#After cleaning, the narrative text is tokenized using TorchText’s `basic_english` tokenizer.
from torchtext.data.utils import get_tokenizer

tokenizer = get_tokenizer("basic_english")

# Join all cleaned lines into ONE long text
full_text = " ".join(cleaned_data)

# Tokenize once
tokens = tokenizer(full_text)

print("Total tokens:", len(tokens))
print(tokens[:20])

Total tokens: 762850
['in', 'the', 'year', '1878', 'i', 'took', 'my', 'degree', 'of', 'doctor', 'of', 'medicine', 'of', 'the', 'university', 'of', 'london', ',', 'and', 'proceeded']


In [72]:
print("Cleaned lines:", len(cleaned_data))
print("Total tokens:", sum(len(ex["tokens"]) for ex in tokenized_dataset))

Cleaned lines: 60117
Total tokens: 762850


<h1>Numericializing<h1>

In [73]:
from torchtext.vocab import build_vocab_from_iterator

vocab = build_vocab_from_iterator(
    [tokens],   
    min_freq=2
)

vocab.insert_token("<unk>", 0)
vocab.insert_token("<eos>", 1)
vocab.set_default_index(vocab["<unk>"])

print("Vocab size:", len(vocab))

Vocab size: 12860


In [74]:
import pickle
with open('model/vocab_lm.pkl', 'wb') as f:
    pickle.dump(vocab, f)

<h1>Prepare the batch loader<h1>

Preparing data

In [75]:
def get_data(dataset, vocab, batch_size):
    data = []

    # Iterating over each tokenized example (one line/sentence)
    for example in dataset:
        tokens = example["tokens"] + ["<eos>"]   # FIX: proper EOS
        tokens = [vocab[token] for token in tokens]
        data.extend(tokens)

    # Convert token list to a PyTorch tensor
    data = torch.LongTensor(data)

    num_batches = data.shape[0] // batch_size
    data = data[:num_batches * batch_size]

    # Reshape into [batch_size, sequence_length]
    # Each row represents a continuous stream of tokens
    data = data.view(batch_size, -1)

    return data

In [76]:
batch_size = 128
# Convert tokenized text into a batched tensor of token indices
# Output shape: [batch_size, sequence_length]
train_data = get_data(tokenized_dataset, vocab, batch_size)

print(train_data.shape)

torch.Size([128, 6429])


<h1>Task 2. Model Training<h1>

<h1>Modeling<h1>

In [77]:
class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim, num_layers, dropout_rate):
        super().__init__()
        self.num_layers = num_layers
        self.hid_dim    = hid_dim
        self.emb_dim    = emb_dim
        
        self.embedding  = nn.Embedding(vocab_size, emb_dim)
        self.lstm       = nn.LSTM(emb_dim, hid_dim, num_layers=num_layers, dropout=dropout_rate, batch_first=True)
        self.dropout    = nn.Dropout(dropout_rate)
        self.fc         = nn.Linear(hid_dim, vocab_size)
        
        self.init_weights()
    
    def init_weights(self):
        init_range_emb = 0.1
        init_range_other = 1/math.sqrt(self.hid_dim)
        self.embedding.weight.data.uniform_(-init_range_emb, init_range_other)
        self.fc.weight.data.uniform_(-init_range_other, init_range_other)
        self.fc.bias.data.zero_()
        for i in range(self.num_layers):
            self.lstm.all_weights[i][0] = torch.FloatTensor(self.emb_dim,
                self.hid_dim).uniform_(-init_range_other, init_range_other) #We
            self.lstm.all_weights[i][1] = torch.FloatTensor(self.hid_dim,   
                self.hid_dim).uniform_(-init_range_other, init_range_other) #Wh
    
    def init_hidden(self, batch_size, device):
        hidden = torch.zeros(self.num_layers, batch_size, self.hid_dim).to(device)
        cell   = torch.zeros(self.num_layers, batch_size, self.hid_dim).to(device)
        return hidden, cell
        
    def detach_hidden(self, hidden):
        hidden, cell = hidden
        hidden = hidden.detach() #not to be used for gradient computation
        cell   = cell.detach()
        return hidden, cell
        
    def forward(self, src, hidden):
        #src: [batch_size, seq len]
        embedding = self.dropout(self.embedding(src)) #harry potter is
        #embedding: [batch-size, seq len, emb dim]
        output, hidden = self.lstm(embedding, hidden)
        #ouput: [batch size, seq len, hid dim]
        #hidden: [num_layers * direction, seq len, hid_dim]
        output = self.dropout(output)
        prediction =self.fc(output)
        #prediction: [batch_size, seq_len, vocab_size]
        return prediction, hidden

The language model is implemented using a Long Short-Term Memory (LSTM) network, a type of recurrent neural network designed to capture long-range dependencies in sequential data. The LSTM contains memory cells and gating mechanisms that allow it to selectively remember or forget information from previous time steps. The input gate controls how much new information is written into the memory cell, the forget gate determines which information from the previous memory should be discarded or retained, and the output gate regulates how much information from the memory cell is exposed to the hidden state and output. This gating structure enables the model to effectively model long-term contextual relationships in text.

In the implemented architecture, input tokens are first converted into dense vector representations using an embedding layer. These embeddings are then passed through stacked LSTM layers to capture temporal dependencies between words in the sequence. Dropout is applied to the embeddings and LSTM outputs to reduce overfitting by randomly deactivating neurons during training. Finally, a fully connected linear layer maps the LSTM hidden states to vocabulary-sized logits, which are used to predict the next word in the sequence.

<h1>Training<h1>

In [78]:
# Vocabulary size
vocab_size = len(vocab)

# Model hyperparameters
emb_dim = 256
hid_dim = 256
num_layers = 2
dropout_rate = 0.3
lr = 1e-3

# Initialize LSTM language model
model = LSTMLanguageModel(
    vocab_size, emb_dim, hid_dim, num_layers, dropout_rate
).to(device)

# Optimizer and loss function
optimizer = optim.Adam(model.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()

# Number of trainable parameters
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"The model has {num_params:,} trainable parameters")

The model has 7,649,852 trainable parameters


In [79]:
def get_batch(data, seq_len, idx):
    # Extract input sequence
    src = data[:, idx:idx + seq_len]

    # Target is the input shifted by one position
    target = data[:, idx + 1:idx + seq_len + 1]

    return src, target

In [80]:
def train(model, data, optimizer, criterion, batch_size, seq_len, clip, device):
    epoch_loss = 0
    model.train()

    # Ensuring data length is divisible by seq_len
    num_batches = data.shape[-1]
    data = data[:, :num_batches - (num_batches - 1) % seq_len]
    num_batches = data.shape[-1]

    # Initializing hidden state at the start of each epoch
    hidden = model.init_hidden(batch_size, device)

    for idx in tqdm(range(0, num_batches - 1, seq_len), desc="Training", leave=False):
        optimizer.zero_grad()

        # Detaching hidden state to prevent backprop through entire history
        hidden = model.detach_hidden(hidden)

        # Getting input and target sequences
        src, target = get_batch(data, seq_len, idx)
        src, target = src.to(device), target.to(device)

        # Forward pass
        prediction, hidden = model(src, hidden)

        # Reshape for loss computation
        prediction = prediction.reshape(-1, prediction.size(-1))
        target = target.reshape(-1)

        # Compute loss and update parameters
        loss = criterion(prediction, target)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()

        epoch_loss += loss.item() * seq_len

    return epoch_loss / num_batches

In [81]:
def evaluate(model, data, criterion, batch_size, seq_len, device):
    epoch_loss = 0
    model.eval()

    # Ensuring data length is divisible by seq_len
    num_batches = data.shape[-1]
    data = data[:, :num_batches - (num_batches - 1) % seq_len]
    num_batches = data.shape[-1]

    # Initializing hidden state
    hidden = model.init_hidden(batch_size, device)

    with torch.no_grad():
        for idx in range(0, num_batches - 1, seq_len):
            # Detach hidden state between batches
            hidden = model.detach_hidden(hidden)

            # Get input and target sequences
            src, target = get_batch(data, seq_len, idx)
            src, target = src.to(device), target.to(device)

            # Forward pass
            prediction, hidden = model(src, hidden)

            # Reshape for loss computation
            prediction = prediction.reshape(-1, prediction.size(-1))
            target = target.reshape(-1)

            loss = criterion(prediction, target)
            epoch_loss += loss.item() * seq_len

    return epoch_loss / num_batches

In [82]:
# Split data into train and validation (90% / 10%)
split_idx = int(train_data.shape[1] * 0.9)

train_data, valid_data = (
    train_data[:, :split_idx],
    train_data[:, split_idx:]
)

In [140]:
import math
import torch.optim as optim

# Training configuration
n_epochs = 50
seq_len = 50
clip = 0.25

# Learning rate scheduler
lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=1
)

best_valid_loss = float("inf")

# Training loop
for epoch in range(n_epochs):
    train_loss = train(
        model, train_data, optimizer, criterion,
        batch_size, seq_len, clip, device
    )

    valid_loss = evaluate(
        model, valid_data, criterion, batch_size,
        seq_len, device
    )

    lr_scheduler.step(valid_loss)

    # Save best model based on validation loss
    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), "model/best-val-lstm_lm.pt")

    print(f"Epoch {epoch + 1}/{n_epochs}")
    print(f"\tTrain Perplexity: {math.exp(train_loss):.3f}")
    print(f"\tValid Perplexity: {math.exp(valid_loss):.3f}")

                                                           

Epoch 1/50
	Train Perplexity: 62.920
	Valid Perplexity: 84.452


                                                           

Epoch 2/50
	Train Perplexity: 61.962
	Valid Perplexity: 83.858


                                                           

Epoch 3/50
	Train Perplexity: 61.119
	Valid Perplexity: 83.387


                                                           

Epoch 4/50
	Train Perplexity: 60.233
	Valid Perplexity: 83.388


                                                           

Epoch 5/50
	Train Perplexity: 59.521
	Valid Perplexity: 83.682


                                                           

Epoch 6/50
	Train Perplexity: 58.012
	Valid Perplexity: 83.176


                                                           

Epoch 7/50
	Train Perplexity: 57.350
	Valid Perplexity: 83.540


                                                           

Epoch 8/50
	Train Perplexity: 56.802
	Valid Perplexity: 83.096


                                                           

Epoch 9/50
	Train Perplexity: 56.462
	Valid Perplexity: 82.850


                                                           

Epoch 10/50
	Train Perplexity: 55.964
	Valid Perplexity: 82.738


                                                           

Epoch 11/50
	Train Perplexity: 55.610
	Valid Perplexity: 82.795


                                                           

Epoch 12/50
	Train Perplexity: 55.466
	Valid Perplexity: 82.549


                                                           

Epoch 13/50
	Train Perplexity: 54.884
	Valid Perplexity: 82.766


                                                           

Epoch 14/50
	Train Perplexity: 54.385
	Valid Perplexity: 82.879


                                                           

Epoch 15/50
	Train Perplexity: 53.793
	Valid Perplexity: 82.623


                                                           

Epoch 16/50
	Train Perplexity: 53.480
	Valid Perplexity: 82.691


                                                           

Epoch 17/50
	Train Perplexity: 53.104
	Valid Perplexity: 82.497


                                                           

Epoch 18/50
	Train Perplexity: 52.988
	Valid Perplexity: 82.489


                                                           

Epoch 19/50
	Train Perplexity: 52.772
	Valid Perplexity: 82.628


                                                           

Epoch 20/50
	Train Perplexity: 52.616
	Valid Perplexity: 82.489


                                                           

Epoch 21/50
	Train Perplexity: 52.485
	Valid Perplexity: 82.486


                                                           

Epoch 22/50
	Train Perplexity: 52.330
	Valid Perplexity: 82.475


                                                           

Epoch 23/50
	Train Perplexity: 52.313
	Valid Perplexity: 82.471


                                                           

Epoch 24/50
	Train Perplexity: 52.287
	Valid Perplexity: 82.487


                                                           

Epoch 25/50
	Train Perplexity: 52.243
	Valid Perplexity: 82.500


                                                           

Epoch 26/50
	Train Perplexity: 52.163
	Valid Perplexity: 82.481


                                                           

Epoch 27/50
	Train Perplexity: 52.221
	Valid Perplexity: 82.496


                                                           

Epoch 28/50
	Train Perplexity: 52.181
	Valid Perplexity: 82.483


                                                           

Epoch 29/50
	Train Perplexity: 52.190
	Valid Perplexity: 82.488


                                                           

Epoch 30/50
	Train Perplexity: 52.133
	Valid Perplexity: 82.487


                                                           

Epoch 31/50
	Train Perplexity: 52.158
	Valid Perplexity: 82.489


                                                           

Epoch 32/50
	Train Perplexity: 52.095
	Valid Perplexity: 82.492


                                                           

Epoch 33/50
	Train Perplexity: 52.137
	Valid Perplexity: 82.491


                                                           

Epoch 34/50
	Train Perplexity: 52.094
	Valid Perplexity: 82.492


                                                           

Epoch 35/50
	Train Perplexity: 52.095
	Valid Perplexity: 82.492


                                                           

Epoch 36/50
	Train Perplexity: 52.129
	Valid Perplexity: 82.492


                                                           

Epoch 37/50
	Train Perplexity: 52.103
	Valid Perplexity: 82.493


                                                           

Epoch 38/50
	Train Perplexity: 52.148
	Valid Perplexity: 82.493


                                                           

Epoch 39/50
	Train Perplexity: 52.167
	Valid Perplexity: 82.493


                                                           

Epoch 40/50
	Train Perplexity: 52.129
	Valid Perplexity: 82.493


                                                           

Epoch 41/50
	Train Perplexity: 52.125
	Valid Perplexity: 82.493


                                                           

Epoch 42/50
	Train Perplexity: 52.120
	Valid Perplexity: 82.493


                                                           

Epoch 43/50
	Train Perplexity: 52.145
	Valid Perplexity: 82.493


                                                           

Epoch 44/50
	Train Perplexity: 52.090
	Valid Perplexity: 82.493


                                                           

Epoch 45/50
	Train Perplexity: 52.153
	Valid Perplexity: 82.493


                                                           

Epoch 46/50
	Train Perplexity: 52.107
	Valid Perplexity: 82.493


                                                           

Epoch 47/50
	Train Perplexity: 52.094
	Valid Perplexity: 82.493


                                                           

Epoch 48/50
	Train Perplexity: 52.117
	Valid Perplexity: 82.493


                                                           

Epoch 49/50
	Train Perplexity: 52.163
	Valid Perplexity: 82.493


                                                           

Epoch 50/50
	Train Perplexity: 52.139
	Valid Perplexity: 82.493


Firstly, the model hyperparameters such as vocabulary size, embedding dimension, hidden dimension, number of LSTM layers, dropout rate, and learning rate are initialized. The Adam optimizer is used to optimize the model parameters, and the CrossEntropyLoss criterion is employed to compute the training loss.

The model is trained for a fixed number of epochs. In each epoch, the training data is divided into fixed-length sequences using the get_batch function. At the beginning of each epoch, the hidden state of the LSTM is reset. For every batch, the optimizer gradients are cleared, a forward pass is performed, and the loss is calculated by comparing the predicted probability distribution of the next token with the actual next token. Gradients are computed using backpropagation, and the model parameters are updated using the optimizer. The training loss is accumulated across all batches within an epoch.

After completing an epoch, the model is switched to evaluation mode, and the validation data is processed using the same batching procedure without updating model parameters. The validation loss is computed to assess model performance on unseen data. A learning rate scheduler adjusts the learning rate based on validation loss, and the model parameters are saved whenever the validation loss improves over previous epochs.

<h1>Testing<h1>

In [141]:
# Loading the  model 
model.load_state_dict(
    torch.load("model/best-val-lstm_lm.pt", map_location=device)
)

# Evaluating the model on the validation  data
test_loss = evaluate(
    model, valid_data, criterion, batch_size, seq_len, device
)

print(f"Test Perplexity: {math.exp(test_loss):.3f}")

Test Perplexity: 82.471


<h1>Real-world inference<h1>

In [142]:
import torch

def generate(prompt, max_seq_len, temperature, model, tokenizer, vocab, device, seed=None):
    # Setting random seed for reproducibility 
    if seed is not None:
        torch.manual_seed(seed)

    model.eval()

    # Tokenizing and numericalizing the input prompt
    tokens = tokenizer(prompt)
    if len(tokens) == 0:
        tokens = ["<unk>"]
    indices = [vocab[t] for t in tokens]

    batch_size = 1
    hidden = model.init_hidden(batch_size, device)

    with torch.no_grad():
        for idx in indices[:-1]:
            src = torch.LongTensor([[idx]]).to(device)
            _, hidden = model(src, hidden)

        last_idx = indices[-1]

        for _ in range(max_seq_len):
            src = torch.LongTensor([[last_idx]]).to(device)
            prediction, hidden = model(src, hidden)

            logits = prediction[:, -1, :] / temperature
            probs = torch.softmax(logits, dim=-1)

            next_idx = torch.multinomial(probs, num_samples=1).item()

            while next_idx == vocab["<unk>"]:
                next_idx = torch.multinomial(probs, num_samples=1).item()

            if next_idx == vocab["<eos>"]:
                break

            indices.append(next_idx)
            last_idx = next_idx

    # Converting indices back to tokens
    itos = vocab.get_itos()
    return [itos[i] for i in indices]

In [155]:
# Example prompts
prompts = [
    "Mr. Sherlock Holmes"
]

max_seq_len = 100
seed = 91

# Smaller temperature -> less random (more confident); larger -> more diverse
temperatures = [0.1, 0.5, 0.7, 0.9, 1.0]

for prompt in prompts:
    print("PROMPT:", prompt, "\n")
    for temperature in temperatures:
        generation = generate(prompt, max_seq_len, temperature, model, tokenizer, vocab, device, seed)
        print(f"temp={temperature}:\n" + " ".join(generation) + "\n")


PROMPT: Mr. Sherlock Holmes 

temp=0.1:
mr . sherlock holmes ,

temp=0.5:
mr . sherlock holmes , he

temp=0.7:
mr . sherlock holmes , he ' s , and there is a small

temp=0.9:
mr . sherlock holmes , he ' s beard , came at the gate

temp=1.0:
mr . sherlock holmes , he ' s beard , came after the



<h1>Documentation of Web Interface<h1>

The web application for text generation is implemented using the **Dash** framework. The complete user interface and model integration logic are contained within the `app.py` file. The interface is intentionally kept simple and consists of a text input field for user prompts, a **Generate** button, and an output section where the generated text is displayed for multiple temperature values. A demonstration of the application and usage instructions are provided in the `README.md` file inside the `A2` folder.


### Model Integration with the Web Application

The trained LSTM language model is integrated into the web application through the following steps:

- The vocabulary generated during training is loaded using Python’s `pickle` module.  
- The LSTM language model architecture is recreated using the same hyperparameters as used during training.  
- Pre-trained model weights are loaded into the model using `torch.load`.  
- A text generation function (`generate_text`) is defined, which tokenizes user input, converts tokens into numerical indices using the vocabulary, and generates new text using the trained LSTM model.  
- Temperature values are applied during inference to control the randomness and diversity of generated text.

During inference, the model is run in evaluation mode. Predictions are sampled from the probability distribution produced by the model until either the maximum generation length is reached or an end-of-sequence token (`<eos>`) is encountered.



### User Interaction Flow

The interaction between the user, the web interface, and the language model follows these steps:

- The user enters a text prompt into the input field (e.g., *“Sherlock Holmes is”*).  
- The user clicks the **Generate** button.  
- The application passes the prompt to the language model.  
- The model generates continuations of the prompt for multiple temperature values (0.1, 0.5, 0.7, 0.9, and 1.0).  
- The generated text outputs are displayed on the web page, clearly labeled by temperature.

Since the language model is trained exclusively on the **Sherlock Holmes literary corpus**, prompts related to similar themes, characters, or narrative styles produce more coherent and meaningful outputs. Inputs unrelated to this domain may result in less sensible or generic text, reflecting the domain-specific nature of the trained model.