<a href="https://www.kaggle.com/code/arashnic/rag-with-sentence-and-hugging-face-transformers?scriptVersionId=179103399" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

### Implementation of a Retrieval-Augmented Generation (RAG) Framework with Sentence Transformers and Hugging Face Transformers

In [1]:
!pip install transformers datasets faiss-cpu sentence-transformers


Collecting faiss-cpu
  Downloading faiss_cpu-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-2.7.0-py3-none-any.whl.metadata (11 kB)
Downloading faiss_cpu-1.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (27.0 MB)
Downloading sentence_transformers-2.7.0-py3-none-any.whl (171 kB)
Installing collected packages: faiss-cpu, sentence-transformers
Successfully installed faiss-cpu-1.8.0 sentence-transformers-2.7.0


In [2]:
import wandb
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()

my_secret = user_secrets.get_secret("wandb") 

wandb.login(key=my_secret)

wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [3]:
from sentence_transformers import SentenceTransformer, InputExample, losses, datasets, LoggingHandler
from torch.utils.data import DataLoader
import logging
import faiss
import torch

In [4]:


# Setup logging
logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO, handlers=[LoggingHandler()])

# Define an expanded document corpus
document_corpus = [
    "RAG stands for Retrieval-Augmented Generation, a method in natural language processing.",
    "It combines the power of retrieval-based models with generative models to improve response quality.",
    "The retriever fetches relevant documents based on a query.",
    "The generator uses the retrieved documents to generate a coherent and informative response.",
    "This approach leverages large-scale pre-trained models for both retrieval and generation tasks.",
    "Fine-tuning RAG involves training both the retriever and generator components.",
    "RAG can be used in various applications, including chatbots, question answering systems, and more.",
    "The framework was introduced by Facebook AI Research (FAIR) in 2020.",
    "RAG aims to improve the informativeness and accuracy of generated responses.",
    
]

# Define an expanded set of query-document pairs for training
query_document_pairs = [
    ("What does RAG stand for?", document_corpus[0]),
    ("How does RAG improve response quality?", document_corpus[1]),
    ("What is the role of the retriever in RAG?", document_corpus[2]),
    ("How does the generator work in RAG?", document_corpus[3]),
    ("What is the advantage of using RAG?", document_corpus[4]),
    ("How do you fine-tune RAG?", document_corpus[5]),
    ("What are the applications of RAG?", document_corpus[6]),
    ("Who introduced RAG and when?", document_corpus[7]),
    ("What is the goal of RAG?", document_corpus[8]),
    
]


In [5]:
from sentence_transformers import SentenceTransformer, InputExample, losses, datasets, LoggingHandler

# Create training examples
train_examples = []
for i, (query, document) in enumerate(query_document_pairs):
    train_example = InputExample(guid=str(i), texts=[query, document])
    train_examples.append(train_example)

# Define a Sentence Transformer model for the retriever
retriever_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Prepare the data loader for training
train_dataset = datasets.SentencesDataset(train_examples, model=retriever_model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(retriever_model)

# Train the retriever model
retriever_model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)

# Create a Faiss index for the retriever (after training, so the indexed
# embeddings come from the same model that later encodes the queries)
corpus_embeddings = retriever_model.encode(document_corpus, convert_to_tensor=True)
index = faiss.IndexFlatL2(corpus_embeddings.shape[1])
index.add(corpus_embeddings.cpu().numpy())

# Define a simple tokenizer and generator model
from transformers import GPT2LMHeadModel, GPT2Tokenizer

generator_model = GPT2LMHeadModel.from_pretrained('gpt2')
generator_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
generator_tokenizer.pad_token = generator_tokenizer.eos_token  # Set the pad token to be the same as the eos token

def generate_response(query):
    query_embedding = retriever_model.encode(query, convert_to_tensor=True).unsqueeze(0)  # Ensure 2D array
    D, I = index.search(query_embedding.cpu().numpy(), k=3)  # Top 3 documents
    retrieved_docs = [document_corpus[int(idx)] for idx in I[0]]  # Fetch indices from the first element of I

    # Create input for the generator
    context = " ".join(dict.fromkeys(retrieved_docs))  # De-duplicate while keeping retrieval order (nearest first)
    context = " ".join(context.split()[:150])  # Limit context length
    input_text = f"query: {query} context: {context}"
    input_ids = generator_tokenizer.encode(input_text, return_tensors='pt')
    
    # Generate response with adjusted parameters
    output_ids = generator_model.generate(
        input_ids,
        max_length=150,
        num_return_sequences=1,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=2.0,  # Add repetition penalty to reduce repeated phrases
        pad_token_id=generator_tokenizer.eos_token_id
    )
    response = generator_tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return response

# Test the RAG system
query = "What is RAG?"
response = generate_response(query)
print("Generated Response:", response)


Generated Response: query: What is RAG? context: RAG aims to improve the informativeness and accuracy of generated responses. RAG can be used in various applications, including chatbots, question answering systems, and more. RAG stands for Retrieval-Augmented Generation, a method in natural language processing.
A common application that people use when asking questions about their personal lives or other social issues such as race relations are based on this approach which allows them to quickly find out what's going through our minds during some very specific situations (e.-g., being asked something like "Are you gay?"). However it may also help with understanding how information flows from one person directly into another via multiple sources at once – sometimes even by simply looking


The generated response still includes an irrelevant part at the end. To refine the output, let's make two adjustments:

- Limit the length of the generated text more strictly: max_length counts the prompt tokens as well as the generated ones, so switch to max_new_tokens to cap only what the generator adds.
- Increase the relevance of the retrieved context: trim the context (from 150 to 100 words) so it stays focused and doesn't carry extraneous information.
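
To make the two limits concrete, here is a minimal sketch of how the length budget works out. The helper names `new_token_budget` and `truncate_words` are hypothetical, token counts are passed in as plain integers so no model is needed, and the arithmetic is an idealized view of the generator's behavior rather than its actual implementation.

```python
def new_token_budget(prompt_tokens, max_length=None, max_new_tokens=None):
    """Number of tokens the generator may still produce for a prompt of the given size."""
    if max_new_tokens is not None:
        # max_new_tokens ignores the prompt length entirely.
        return max_new_tokens
    if max_length is None:
        raise ValueError("pass max_length or max_new_tokens")
    # max_length covers prompt + generated tokens, so a long prompt eats the budget.
    return max(0, max_length - prompt_tokens)


def truncate_words(text, limit):
    """Keep at most `limit` whitespace-separated words, as the notebook does for the context."""
    return " ".join(text.split()[:limit])


print(new_token_budget(60, max_length=150))      # 90
print(new_token_budget(60, max_new_tokens=100))  # 100
print(truncate_words("a b c d e", 3))            # a b c
```

With max_length, the number of generated tokens shrinks as the prompt grows, which is why a fixed max_new_tokens is easier to reason about when the context length varies.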

In [6]:

# Create training examples
train_examples = []
for i, (query, document) in enumerate(query_document_pairs):
    train_example = InputExample(guid=str(i), texts=[query, document])
    train_examples.append(train_example)

# Define a Sentence Transformer model for the retriever
retriever_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Prepare the data loader for training
train_dataset = datasets.SentencesDataset(train_examples, model=retriever_model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(retriever_model)

# Train the retriever model
retriever_model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)

# Create a Faiss index for the retriever (after training, so the indexed
# embeddings come from the same model that later encodes the queries)
corpus_embeddings = retriever_model.encode(document_corpus, convert_to_tensor=True)
index = faiss.IndexFlatL2(corpus_embeddings.shape[1])
index.add(corpus_embeddings.cpu().numpy())

# Define a simple tokenizer and generator model
from transformers import GPT2LMHeadModel, GPT2Tokenizer

generator_model = GPT2LMHeadModel.from_pretrained('gpt2')
generator_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
generator_tokenizer.pad_token = generator_tokenizer.eos_token  # Set the pad token to be the same as the eos token

def generate_response(query):
    query_embedding = retriever_model.encode(query, convert_to_tensor=True).unsqueeze(0)  # Ensure 2D array
    D, I = index.search(query_embedding.cpu().numpy(), k=3)  # Top 3 documents
    retrieved_docs = [document_corpus[int(idx)] for idx in I[0]]  # Fetch indices from the first element of I

    # Create input for the generator
    context = " ".join(dict.fromkeys(retrieved_docs))  # De-duplicate while keeping retrieval order (nearest first)
    context = " ".join(context.split()[:100])  # Limit context length
    input_text = f"query: {query} context: {context}"
    input_ids = generator_tokenizer.encode(input_text, return_tensors='pt')
    
    # Generate response with adjusted parameters
    output_ids = generator_model.generate(
        input_ids,
        max_new_tokens=100,  # Control the length of new tokens generated
        num_return_sequences=1,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=2.0,  # Add repetition penalty to reduce repeated phrases
        pad_token_id=generator_tokenizer.eos_token_id
    )
    response = generator_tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return response

# Test the RAG system
query = "What is RAG?"
response = generate_response(query)
print("Generated Response:", response)


Generated Response: query: What is RAG? context: RAG aims to improve the informativeness and accuracy of generated responses. RAG can be used in various applications, including chatbots, question answering systems, and more. RAG stands for Retrieval-Augmented Generation, a method in natural language processing. It provides an easy way around some common problems with human cognition that are not currently solved by AI or other artificial intelligence methods (e., example).
A simple test case shows how easily we could implement this approach into real life situations using code written from scratch as well!



The generated response is still not coherent or relevant to the query about RAG. It starts by echoing the query and context, then abruptly drifts to an unrelated topic.
Let's change our approach to the problem.

Refine the code to incorporate these steps:

- Fine-tuning the retrieval model: fine-tune the retriever on a custom dataset of relevant query-document pairs.

- Parameter tuning: experiment with different parameters for both the retrieval and generation steps.

- Data augmentation: expand the training dataset with additional query-document pairs.

- Evaluation and iteration: evaluate the generated responses and iteratively refine the model based on feedback.
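
For the evaluation step, one option is to score the retriever separately from the generator. The sketch below computes recall@k: the fraction of queries whose expected document appears among the k nearest documents by L2 distance. The function name `recall_at_k` and the toy 2-D vectors are assumptions made for illustration; in the notebook, the vectors would come from `retriever_model.encode(...)` and the gold indices from `query_document_pairs`.

```python
import numpy as np


def recall_at_k(query_vecs, doc_vecs, gold_ids, k=3):
    """Fraction of queries whose gold document is among the k nearest documents (L2)."""
    query_vecs = np.asarray(query_vecs, dtype=np.float32)
    doc_vecs = np.asarray(doc_vecs, dtype=np.float32)
    # Squared L2 distance between every query and every document: shape (n_queries, n_docs).
    diffs = query_vecs[:, None, :] - doc_vecs[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    # Indices of the k closest documents per query, nearest first.
    top_k = np.argsort(dists, axis=1)[:, :k]
    hits = [gold in row for gold, row in zip(gold_ids, top_k)]
    return float(np.mean(hits))


# Toy 2-D vectors standing in for real sentence embeddings.
docs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
queries = [[0.1, 0.0], [0.9, 0.1], [4.0, 4.0]]
gold = [0, 1, 2]  # the third query's gold document is deliberately not its nearest neighbor

print(recall_at_k(queries, docs, gold, k=1))
print(recall_at_k(queries, docs, gold, k=3))
```

Tracking this number before and after fine-tuning gives a quick check on whether the retriever is actually improving, independent of what the generator does with the context.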



Let's expand document_corpus and query_document_pairs with more data, as a test, to see whether this improves the training process and the quality of the generated responses.
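
Since the aim is a better training signal, it may help to see what the loss compares inside one batch. The following is a simplified NumPy sketch of the idea behind `MultipleNegativesRankingLoss` (cosine similarities scaled by 20, then cross-entropy with each query's own document as the correct class). It is an illustration under those assumptions, not the library's code, and the name `mnr_loss` and the random vectors are made up.

```python
import numpy as np


def mnr_loss(query_vecs, doc_vecs, scale=20.0):
    """In-batch softmax loss: each query should score its own document above the others."""
    q = np.asarray(query_vecs, dtype=np.float64)
    d = np.asarray(doc_vecs, dtype=np.float64)
    # Cosine similarity matrix, shape (batch, batch); entry [i, j] compares query i with doc j.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    scores = scale * (q @ d.T)
    # Cross-entropy with the diagonal as the correct class (log-sum-exp computed stably).
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 8))  # random stand-ins for query embeddings
docs = rng.normal(size=(4, 8))     # random stand-ins for document embeddings

print(mnr_loss(queries[:1], docs[:1]))  # batch of one: no negatives, so the loss is zero
print(mnr_loss(queries, docs))          # batch of four: the other documents act as negatives
```

Because the negatives come from the other pairs in the batch, the `batch_size` passed to the `DataLoader` directly controls how much the loss can teach the retriever: a batch of one contributes no gradient at all.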

In [7]:

# Define an expanded document corpus
document_corpus = [
    "RAG stands for Retrieval-Augmented Generation, a method in natural language processing.",
    "It combines the power of retrieval-based models with generative models to improve response quality.",
    "The retriever fetches relevant documents based on a query.",
    "The generator uses the retrieved documents to generate a coherent and informative response.",
    "This approach leverages large-scale pre-trained models for both retrieval and generation tasks.",
    "Fine-tuning RAG involves training both the retriever and generator components.",
    "RAG can be used in various applications, including chatbots, question answering systems, and more.",
    "The framework was introduced by Facebook AI Research (FAIR) in 2020.",
    "RAG aims to improve the informativeness and accuracy of generated responses.",
    "The retriever component of RAG can be based on various architectures like BM25 or dense retrieval models.",
    "Generative models in RAG are typically based on architectures like BART, GPT, or T5.",
    "RAG can handle large-scale knowledge bases and provide specific answers to queries.",
    "The retriever in RAG selects relevant passages, which are then used by the generator to produce an answer.",
    "One of the key benefits of RAG is its ability to provide contextually rich and accurate responses.",
    "Training RAG requires a large and diverse dataset to cover a wide range of possible queries.",
    "RAG has shown significant improvements over traditional retrieval-based or generative models alone.",
    "The architecture of RAG allows it to be fine-tuned for specific tasks or domains.",
    "RAG integrates retrieval and generation in a seamless manner, improving overall system performance.",
    "The use of retrieval-augmented generation helps in reducing hallucinations in generated text.",
    "RAG's design allows it to leverage external knowledge sources effectively.",
    "The application of RAG extends to areas like medical diagnosis, legal advice, and customer support.",
    "By using retrieval-augmented techniques, RAG ensures that the responses are grounded in real data.",
    "The flexibility of RAG makes it suitable for various languages and dialects.",
    "RAG's performance can be enhanced by continuously updating the knowledge base with new information.",
    "Researchers are exploring ways to make RAG more efficient and scalable for real-time applications.",
    "The integration of retrieval and generation in RAG provides a powerful tool for AI developers.",
    "RAG's ability to access and utilize large datasets makes it a valuable asset in data-intensive fields.",
    "Future developments in RAG could lead to more advanced and autonomous AI systems."
]

# Define an expanded set of query-document pairs for training
query_document_pairs = [
    ("What does RAG stand for?", document_corpus[0]),
    ("How does RAG improve response quality?", document_corpus[1]),
    ("What is the role of the retriever in RAG?", document_corpus[2]),
    ("How does the generator work in RAG?", document_corpus[3]),
    ("What is the advantage of using RAG?", document_corpus[4]),
    ("How do you fine-tune RAG?", document_corpus[5]),
    ("What are the applications of RAG?", document_corpus[6]),
    ("Who introduced RAG and when?", document_corpus[7]),
    ("What is the goal of RAG?", document_corpus[8]),
    ("What architectures can the retriever in RAG be based on?", document_corpus[9]),
    ("What architectures are used for the generative model in RAG?", document_corpus[10]),
    ("How does RAG handle large-scale knowledge bases?", document_corpus[11]),
    ("What is the process of retrieving and generating answers in RAG?", document_corpus[12]),
    ("What are the key benefits of using RAG?", document_corpus[13]),
    ("What is required to train RAG effectively?", document_corpus[14]),
    ("How does RAG compare to traditional models?", document_corpus[15]),
    ("Can RAG be fine-tuned for specific tasks?", document_corpus[16]),
    ("How does RAG integrate retrieval and generation?", document_corpus[17]),
    ("How does RAG reduce hallucinations in generated text?", document_corpus[18]),
    ("How does RAG leverage external knowledge sources?", document_corpus[19]),
    ("What are the applications of RAG in specialized fields?", document_corpus[20]),
    ("How does RAG ensure responses are grounded in real data?", document_corpus[21]),
    ("Can RAG be used for different languages?", document_corpus[22]),
    ("How can RAG's performance be enhanced over time?", document_corpus[23]),
    ("What are researchers focusing on to improve RAG?", document_corpus[24]),
    ("How does RAG benefit AI developers?", document_corpus[25]),
    ("Why is RAG valuable in data-intensive fields?", document_corpus[26]),
    ("What are future developments expected in RAG?", document_corpus[27])
]

# Create training examples
train_examples = [InputExample(guid=str(i), texts=[query, document]) for i, (query, document) in enumerate(query_document_pairs)]

# Define a Sentence Transformer model for the retriever
retriever_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Prepare the data loader for training
train_dataset = datasets.SentencesDataset(train_examples, model=retriever_model)
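# Note: MultipleNegativesRankingLoss draws its negatives from the other pairs in the batch,
# so batch_size=1 leaves it nothing to contrast against; a larger batch gives a training signal.
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=1)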
train_loss = losses.MultipleNegativesRankingLoss(retriever_model)

# Train the retriever model
retriever_model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=3, warmup_steps=100, optimizer_params={'lr': 2e-5})

# Create a Faiss index for the retriever
corpus_embeddings = retriever_model.encode(document_corpus, convert_to_tensor=True)
index = faiss.IndexFlatL2(corpus_embeddings.shape[1])
index.add(corpus_embeddings.cpu().numpy())

# Define a simple tokenizer and generator model
from transformers import GPT2LMHeadModel, GPT2Tokenizer

generator_model = GPT2LMHeadModel.from_pretrained('gpt2')
generator_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
generator_tokenizer.pad_token = generator_tokenizer.eos_token  # Set the pad token to be the same as the eos token

def generate_response(query):
    query_embedding = retriever_model.encode(query, convert_to_tensor=True).unsqueeze(0)  # Ensure 2D array
    D, I = index.search(query_embedding.cpu().numpy(), k=3)  # Top 3 documents
    retrieved_docs = [document_corpus[int(idx)] for idx in I[0]]  # Fetch indices from the first element of I

    # Create input for the generator
    context = " ".join(dict.fromkeys(retrieved_docs))  # De-duplicate while keeping retrieval order (nearest first)
    input_text = f"query: {query} context: {context}"
    input_ids = generator_tokenizer.encode(input_text, return_tensors='pt')
    
    # Generate response with adjusted parameters
    output_ids = generator_model.generate(
        input_ids,
        max_new_tokens=100,  # Control the length of new tokens generated
        num_return_sequences=1,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        repetition_penalty=2.0,  # Add repetition penalty to reduce repeated phrases
        pad_token_id=generator_tokenizer.eos_token_id
    )
    response = generator_tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return response

# Test the RAG system
query = "What is RAG?"
response = generate_response(query)
print("Generated Response:", response)


Generated Response: query: What is RAG? context: The flexibility of RAG makes it suitable for various languages and dialects. The application of RAG extends to areas like medical diagnosis, legal advice, and customer support. RAG stands for Retrieval-Augmented Generation, a method in natural language processing. It allows you create complex diagrams that display information about the situation (such as an individual's health history). In addition, this approach will allow users from different countries access certain features such both within or outside their own country.
The idea behind RxR AG comes back into play when we look at some very powerful applications where real world experience can be used with new technologies - see my previous post on how I use TDD using Python 3+5+. You might think there are many ways around these problems


To avoid generating irrelevant parts in the output, you can tune several parameters in the generator's configuration: max_new_tokens, temperature, top_p, and repetition_penalty. Here's a step-by-step approach to refining them:

- Reduce max_new_tokens: This limits the length of the generated response and keeps it concise.
- Decrease temperature: Lowering the temperature makes the model's output less random and more deterministic.
- Adjust top_p: top_p controls the diversity of the generated text by sampling only from the most probable tokens. Lower values make the output more focused; the code below keeps it at 0.9.
- Increase repetition_penalty: A higher penalty discourages the model from repeating tokens it has already produced, which helps reduce repetitive content.
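
The effect of temperature and top_p can be seen on a toy next-token distribution. This is a minimal sketch with made-up logits for five candidate tokens; `next_token_probs` is a hypothetical helper that mimics temperature scaling and nucleus (top-p) filtering in NumPy and is not the generator's actual sampling code.

```python
import numpy as np


def next_token_probs(logits, temperature=1.0, top_p=1.0):
    """Apply temperature scaling and nucleus (top-p) filtering to a vector of logits."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens from most to least likely and keep the smallest prefix
    # whose cumulative probability reaches top_p.
    order = np.argsort(-probs)
    cumulative = np.cumsum(probs[order])
    keep = cumulative - probs[order] < top_p
    filtered = np.zeros_like(probs)
    filtered[order[keep]] = probs[order[keep]]
    # Renormalize over the surviving tokens.
    return filtered / filtered.sum()


logits = [2.0, 1.0, 0.5, 0.0, -1.0]  # made-up scores for five candidate tokens
for t in (1.0, 0.7, 0.3):
    print(t, np.round(next_token_probs(logits, temperature=t, top_p=0.9), 3))
```

As the temperature drops, probability mass concentrates on the top token and fewer candidates survive the top_p cut, which is why low temperatures behave almost like greedy decoding.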

In [8]:


# Create training examples
train_examples = [InputExample(guid=str(i), texts=[query, document]) for i, (query, document) in enumerate(query_document_pairs)]

# Define a Sentence Transformer model for the retriever
retriever_model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Prepare the data loader for training
train_dataset = datasets.SentencesDataset(train_examples, model=retriever_model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(retriever_model)

# Train the retriever model
retriever_model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=4, warmup_steps=100, optimizer_params={'lr': 2e-5})

# Create a Faiss index for the retriever
corpus_embeddings = retriever_model.encode(document_corpus, convert_to_tensor=True)
index = faiss.IndexFlatL2(corpus_embeddings.shape[1])
index.add(corpus_embeddings.cpu().numpy())

# Define a simple tokenizer and generator model
from transformers import GPT2LMHeadModel, GPT2Tokenizer

generator_model = GPT2LMHeadModel.from_pretrained('gpt2')
generator_tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
generator_tokenizer.pad_token = generator_tokenizer.eos_token  # Set the pad token to be the same as the eos token

def generate_response(query):
    query_embedding = retriever_model.encode(query, convert_to_tensor=True).unsqueeze(0)  # Ensure 2D array
    D, I = index.search(query_embedding.cpu().numpy(), k=3)  # Top 3 documents
    retrieved_docs = [document_corpus[int(idx)] for idx in I[0]]  # Fetch indices from the first element of I

    # Create input for the generator
    context = " ".join(dict.fromkeys(retrieved_docs))  # De-duplicate while keeping retrieval order (nearest first)
    input_text = f"query: {query} context: {context}"
    input_ids = generator_tokenizer.encode(input_text, return_tensors='pt')
    
    # Generate response with adjusted parameters
    output_ids = generator_model.generate(
        input_ids,
        max_new_tokens=20,  # Reduce the length of new tokens generated
        num_return_sequences=1,
        temperature=0.3,  # Lower temperature for less random output
        top_p=0.90,  # Adjust top_p for focused output
        do_sample=True,
        repetition_penalty=2.5,  # Increase repetition penalty to reduce irrelevant parts
        pad_token_id=generator_tokenizer.eos_token_id
    )
    response = generator_tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return response

# Test the RAG system
query = "What is RAG?"
response = generate_response(query)
print("Generated Response:", response)


Generated Response: query: What is RAG? context: RAG can be used in various applications, including chatbots, question answering systems, and more. The application of RAG extends to areas like medical diagnosis, legal advice, and customer support. RAG stands for Retrieval-Augmented Generation, a method in natural language processing.
A new feature that allows users with disabilities or those who are not able to speak English may also



The generated response is more focused but still contains some extraneous information. To further refine the output and ensure relevance, we can:

- *Increase the specificity of context: Use fewer retrieved documents for context or prioritize the most relevant document.*
- *Further tune generation parameters: Adjust max_new_tokens, temperature, top_p, and repetition_penalty as needed.*
- *Consider using a more advanced or fine-tuned model: If using GPT-2, switching to GPT-3 or a fine-tuned version might yield better results.*
- *Post-Processing: Implement post-processing steps to clean up the generated text, removing any remaining irrelevant parts.*
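
As one possible form of the post-processing step, the sketch below removes the echoed prompt from the decoded text and drops a trailing fragment that does not end in sentence punctuation. The helper names `strip_prompt` and `trim_to_last_sentence` and the example strings are hypothetical, and the sentence detection is a simple regular expression that will misjudge abbreviations.

```python
import re


def strip_prompt(response, prompt):
    """Remove the echoed prompt from the start of the decoded text, if present."""
    if response.startswith(prompt):
        return response[len(prompt):].lstrip()
    return response


def trim_to_last_sentence(text):
    """Drop a trailing fragment after the last '.', '!' or '?'; leave the text as-is if none is found."""
    matches = list(re.finditer(r"[.!?](?=\s|$)", text))
    if not matches:
        return text.strip()
    return text[: matches[-1].end()].strip()


prompt = "query: What is RAG? context: RAG stands for Retrieval-Augmented Generation."
raw = prompt + " It pairs a retriever with a generator. A new feature that allows users may also"

answer = trim_to_last_sentence(strip_prompt(raw, prompt))
print(answer)  # It pairs a retriever with a generator.
```

An alternative to stripping the prompt as a string is to decode only the newly generated tokens, for example by slicing `output_ids[0][input_ids.shape[1]:]` before calling `decode`, since the generator returns the prompt tokens followed by the new ones.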
