# Hallucination Mitigation Pipeline — Phase 3

Welcome to the Phase 3 notebook for my hallucination detection and mitigation project.  
In this notebook, I will focus on generating and evaluating responses from GPT-3.5 using different mitigation strategies:

1. Retrieval-Augmented Generation (RAG)
2. System Prompt Tuning
3. Post-Generation Filtering
4. Combined Approach (RAG + Prompt + Filter)

## Import all required libraries and helper modules

In this cell, I’m loading all the necessary packages and helper functions I built earlier in Phase 1 and Phase 3 setup.  
This includes OpenAI, pandas, tqdm, and my custom `eval_pipeline`, `rag_loader`, and system prompts.

In [3]:
import json
import os
import sys

import pandas as pd
from openai import OpenAI
from tqdm import tqdm

# Make the project's helper modules importable from this notebook.
project_root = "/Users/raaed/Desktop/hallucination_project/phase_3_mitigation"
sys.path.append(project_root)
from utils.rag_loader import load_rag_context, construct_rag_prompt

# Load the model settings and system prompts from the config folder.
with open(os.path.join(project_root, "config/model_settings.json")) as f:
    model_settings = json.load(f)

with open(os.path.join(project_root, "config/system_prompts.json")) as f:
    system_prompts = json.load(f)

## Load the Synthetic Dataset

In this step, I’m loading the same synthetic dataset I generated in Phase 2.  
This dataset includes hallucination-inducing questions along with ground truths and trigger types.  
It’s stored in the `inputs/` folder as a single CSV file that merges all four hallucination categories:

- fictional_location
- contradiction
- impossible_timeline
- entity_swap

In [4]:
dataset_path = os.path.join(project_root, "inputs", "synthetic_dataset.csv")
df = pd.read_csv(dataset_path)
df.head()

Unnamed: 0,question,ground_truth,trigger_type
0,What is the capital of Pakland and how has it ...,Pakland is not a real country and therefore ha...,fictional_location
1,How does Zubekistan manage its environmental p...,Zubekistan is a fictional country and does not...,fictional_location
2,What are the key economic sectors of Tharnovia...,Tharnovia is not a real country and has no eco...,fictional_location
3,Explain the impact of colonial history on Gorb...,Gorbatova is a fictional country and does not ...,fictional_location
4,"Based on WHO reports, how has healthcare in Qu...",Quarvania does not exist and is not referenced...,fictional_location
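
Before building prompts, it can help to confirm that the file really has the columns and categories described above. The following is a minimal sketch and not part of the original pipeline: `check_dataset` and the toy rows are hypothetical, while the column names and the four trigger types come from this notebook.

```python
import pandas as pd

# Column names and categories taken from the notebook text above.
EXPECTED_COLUMNS = {"question", "ground_truth", "trigger_type"}
EXPECTED_TRIGGERS = {
    "fictional_location",
    "contradiction",
    "impossible_timeline",
    "entity_swap",
}


def check_dataset(frame: pd.DataFrame) -> dict:
    """Return a small report; raise ValueError if a required column is missing."""
    missing = EXPECTED_COLUMNS - set(frame.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    # Normalise the same way the later cells do before comparing.
    triggers = frame["trigger_type"].str.strip().str.lower()
    return {
        "rows": len(frame),
        "unknown_triggers": sorted(set(triggers) - EXPECTED_TRIGGERS),
        "counts": triggers.value_counts().to_dict(),
    }


# Toy rows for illustration only (not the real dataset).
toy = pd.DataFrame(
    {
        "question": ["What is the capital of Pakland?", "Made-up question"],
        "ground_truth": ["Pakland is not a real country.", "Made-up answer"],
        "trigger_type": ["fictional_location", " Entity_Swap "],
    }
)
report = check_dataset(toy)
print(report)
```

In the notebook, the same check could be run as `check_dataset(df)` right after loading the CSV.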


## 1. RAG-Based Answers

In this step, I want to generate new answers to each hallucination-inducing question using RAG (Retrieval-Augmented Generation). Instead of answering blindly, the model will first retrieve relevant context from our `.txt` knowledge base and then generate a grounded answer.

The goal here is to reduce hallucination by giving the model factual anchor points before it responds. If it still hallucinates despite the context, that tells me more about the model's limitations.

To do this, I'll:
1. Load the correct `.txt` file depending on the `trigger_type`.
2. Construct a RAG prompt using the context + original question.
3. Send it to the model and get the response.
4. Store this RAG-generated answer alongside the question and ground truth.

Let’s start by loading the RAG documents, one `.txt` file per trigger type.

In [5]:
rag_contexts = {}
rag_dir = "../inputs/rag_docs"  # folder with one .txt file per trigger type
for filename in os.listdir(rag_dir):
    if filename.endswith(".txt"):
        trigger_type = filename.replace(".txt", "")
        with open(os.path.join(rag_dir, filename), "r") as f:
            rag_contexts[trigger_type] = f.read()
print("Loaded RAG contexts for:", list(rag_contexts.keys()))

Loaded RAG contexts for: ['impossible_timeline', 'contradiction', 'entity_swap', 'fictional_location']
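
Since prompts are built per `trigger_type`, every category in the dataset needs a matching, non-empty `.txt` file. A small set difference makes any gap visible. This is only a sketch on toy values, and `missing_contexts` is a hypothetical helper name.

```python
def missing_contexts(trigger_types, contexts):
    """Return the trigger types that have no (non-empty) RAG context."""
    normalised = {t.strip().lower() for t in trigger_types}
    available = {name for name, text in contexts.items() if text.strip()}
    return sorted(normalised - available)


# Toy values for illustration; in the notebook these would be
# df["trigger_type"] and rag_contexts.
toy_triggers = ["fictional_location", "Contradiction ", "entity_swap"]
toy_contexts = {
    "fictional_location": "Pakland is fictional.",
    "contradiction": "",  # an empty file counts as missing
    "entity_swap": "Some context.",
}
gaps = missing_contexts(toy_triggers, toy_contexts)
print(gaps)
```

An empty result would mean every trigger type in the dataset has a usable context.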


### Construct a RAG-Based Prompt

Now that I've loaded the RAG documents and the synthetic hallucination dataset, I'll combine them to create grounded prompts.

To do this, I:
1. Pick a sample question from the synthetic dataset.
2. Identify the `trigger_type` (e.g., `impossible_timeline`).
3. Retrieve the corresponding RAG document with `load_rag_context(trigger_type)`.
4. Use my custom `construct_rag_prompt()` function to combine the RAG context and the original question into a grounded prompt.

This is the first step toward using RAG to reduce hallucinations — by feeding the model helpful context ahead of time.
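
The helpers imported from `utils.rag_loader` are not shown in this notebook. As a rough sketch of what they might look like (an assumption, not the actual implementation), the versions below read `<trigger_type>.txt` from a directory and place the context ahead of the question. The names `sketch_load_rag_context` and `sketch_construct_rag_prompt`, the `rag_dir` parameter, and the prompt wording are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def sketch_load_rag_context(trigger_type: str, rag_dir: str) -> Optional[str]:
    """Return the text of <rag_dir>/<trigger_type>.txt, or None if it is absent."""
    path = Path(rag_dir) / f"{trigger_type}.txt"
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8")


def sketch_construct_rag_prompt(question: str, context: str) -> str:
    """Put the retrieved context first, then the question (wording is illustrative)."""
    return (
        f"{context.strip()}\n\n"
        "Using only the context above, answer the question below.\n\n"
        f"Question: {question.strip()}"
    )


# Demo with a throwaway directory so the sketch runs without the project files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fictional_location.txt").write_text(
        "Pakland is a fictional country.", encoding="utf-8"
    )
    found = sketch_load_rag_context("fictional_location", tmp)
    absent = sketch_load_rag_context("entity_swap", tmp)
    demo_prompt = sketch_construct_rag_prompt("What is the capital of Pakland?", found)

print(demo_prompt)
```

Returning `None` for a missing file lets the caller decide how to handle a trigger type without context, which is the case the cells below check for.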

In [6]:
import pandas as pd
synthetic_dataset = pd.read_csv("../inputs/synthetic_dataset.csv")
synthetic_dataset.head(1)

Unnamed: 0,question,ground_truth,trigger_type
0,What is the capital of Pakland and how has it ...,Pakland is not a real country and therefore ha...,fictional_location


In [7]:
import pandas as pd
from utils.rag_loader import load_rag_context, construct_rag_prompt
sample = synthetic_dataset.iloc[0]
question = sample["question"]
trigger_type = sample["trigger_type"].strip().lower()

context = load_rag_context(trigger_type)

if context is None or context.strip() == "":
    print(f"No RAG context found for trigger type: '{trigger_type}' (Question: {question})")
else:
    print(f"RAG context found for trigger type: '{trigger_type}'")
    prompt = construct_rag_prompt(question, context)
    print("Prompt successfully constructed!\n")
    print(prompt)

RAG context found for trigger type: 'fictional_location'
Prompt successfully constructed!

Several country names in prompts—such as Pakland, Zubekistan, Marakvia, and East Khoristan—may appear credible but are entirely fictional. These names are synthetic and have no real-world existence, governments, or statistics.

Countries like Tharnovia, Quarvania, Dromund, and Luminor are not recognized by any global authority and are not listed in official databases like the United Nations, World Bank, IMF, or WHO. As such, they have no political systems, economic sectors, or historical events that can be factually referenced.

Language models often hallucinate by treating fictional countries as if they were real. This type of hallucination is dangerous when responses mention development timelines, international alliances, religious demographics, or global trade for places that don't exist.

When asked about such locations, the correct approach is to clarify that the country is fictional and the

In [11]:
df = pd.read_csv("../inputs/synthetic_dataset.csv")
print(df)  # a bare `df.head` (no parentheses) only shows the bound method, so print the frame instead

                                             question  \
0   What is the capital of Pakland and how has it ...   
1   How does Zubekistan manage its environmental p...   
2   What are the key economic sectors of Tharnovia...   
3   Explain the impact of colonial history on Gorb...   
4   Based on WHO reports, how has healthcare in Qu...   
..                                                ...   
95  How did Aristotle’s blog influence medieval th...   
96  When did Mozart first collaborate with AI-gene...   
97  How did Julius Caesar's TikTok following affec...   
98  How did Mahatma Gandhi respond to email survei...   
99  Why was Einstein's 2020 AI ethics paper consid...   

                                         ground_truth         trigger_type  
0   Pakland is not a real country and therefore ha...   fictional_location  
1   Zubekistan is a fictional country and does not...   fictional_location  
2   Tharnovia is not a real country and has no eco... 

In [12]:
def build_prompt(row):
    trigger_type = row["trigger_type"].strip().lower()
    question = row["question"]
    context = load_rag_context(trigger_type)

    if not context:
        return f"No context found for: {trigger_type}\n\nQuestion: {question}"
    
    return construct_rag_prompt(question, context)

df["rag_prompt"] = df.apply(build_prompt, axis=1)
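
`build_prompt` calls `load_rag_context` once per row, so the same document may be loaded many times even though there are only four trigger types. If that helper reads from disk on each call (an assumption, since its code is not shown here), one option is to memoise it with `functools.lru_cache`. In the sketch below, `fake_load_rag_context` is a hypothetical stand-in that counts its calls, and the even split across trigger types is illustrative.

```python
from functools import lru_cache

read_count = 0  # counts how often the (simulated) disk read happens


def fake_load_rag_context(trigger_type: str) -> str:
    """Hypothetical stand-in for the real loader; pretends to read a file."""
    global read_count
    read_count += 1
    return f"context for {trigger_type}"


@lru_cache(maxsize=None)
def cached_context(trigger_type: str) -> str:
    """Load each trigger type's context once and reuse it afterwards."""
    return fake_load_rag_context(trigger_type)


# 100 lookups over four trigger types (the even split is illustrative).
trigger_types = [
    "fictional_location",
    "contradiction",
    "impossible_timeline",
    "entity_swap",
] * 25
contexts = [cached_context(t) for t in trigger_types]
print(f"{len(contexts)} lookups, {read_count} reads")
```

With only 100 rows the saving is small, but the same pattern keeps prompt building cheap if the dataset or the documents grow.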

In [16]:
print(df)  # a bare `df.head` (no parentheses) only shows the bound method, so print the frame instead

                                             question  \
0   What is the capital of Pakland and how has it ...   
1   How does Zubekistan manage its environmental p...   
2   What are the key economic sectors of Tharnovia...   
3   Explain the impact of colonial history on Gorb...   
4   Based on WHO reports, how has healthcare in Qu...   
..                                                ...   
95  How did Aristotle’s blog influence medieval th...   
96  When did Mozart first collaborate with AI-gene...   
97  How did Julius Caesar's TikTok following affec...   
98  How did Mahatma Gandhi respond to email survei...   
99  Why was Einstein's 2020 AI ethics paper consid...   

                                         ground_truth         trigger_type  \
0   Pakland is not a real country and therefore ha...   fictional_location   
1   Zubekistan is a fictional country and does not...   fictional_location   
2   Tharnovia is not a real country and has no eco.

### Generate RAG-Grounded LLM Responses

In this step, I send each RAG-enhanced prompt to GPT-3.5 (`gpt-3.5-turbo`) and collect the responses in a new column, `rag_response`. These responses are evaluated later for hallucination reduction.

In [None]:
import os
from openai import OpenAI

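# Read the key from the environment rather than hard-coding it in the notebook.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])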

In [20]:
def get_llm_response(prompt):
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant who responds with factual and grounded answers."},
                {"role": "user", "content": prompt}
            ],
            temperature=0
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        print(f"Error: {e}")
        return "LLM_ERROR"
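
`get_llm_response` returns `"LLM_ERROR"` on the first failure, so a transient rate-limit or network error costs a row. A common alternative is to retry with exponential backoff before giving up. The sketch below shows one way to do that under stated assumptions: `with_retries`, `flaky_call`, `broken_call`, and the delay values are hypothetical, and the wrapped functions stand in for the API call.

```python
import time
from typing import Callable


def with_retries(call: Callable[[], str], attempts: int = 3,
                 base_delay: float = 0.01) -> str:
    """Run call(); on an exception wait base_delay * 2**n and try again.

    Returns the sentinel "LLM_ERROR" (same as the cell above) once all
    attempts have failed.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:  # broad on purpose, mirroring the original cell
            print(f"Attempt {attempt + 1} failed: {exc}")
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    return "LLM_ERROR"


# Hypothetical stand-in for the API call: fails twice, then succeeds.
calls = {"n": 0}


def flaky_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated transient error")
    return "grounded answer"


def broken_call() -> str:
    """Hypothetical call that never succeeds."""
    raise RuntimeError("simulated permanent error")


result = with_retries(flaky_call)
always_fails = with_retries(broken_call, attempts=2)
print(result, always_fails)
```

To use this in the notebook, the `client.chat.completions.create(...)` request itself would go inside `call`, with a larger `base_delay` (for example one second), because `get_llm_response` as written swallows the exception before a wrapper could see it.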

In [21]:
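import time

rag_responses = []

# enumerate gives a 1-based counter that stays correct even if the index is not 0..N-1.
for n, (_, row) in enumerate(df.iterrows(), start=1):
    response = get_llm_response(row["rag_prompt"])
    rag_responses.append(response)
    print(f"✅ [{n}/{len(df)}] Response generated.")
    time.sleep(1.1)  # brief pause between requests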

✅ [1/100] Response generated.
✅ [2/100] Response generated.
✅ [3/100] Response generated.
✅ [4/100] Response generated.
✅ [5/100] Response generated.
✅ [6/100] Response generated.
✅ [7/100] Response generated.
✅ [8/100] Response generated.
✅ [9/100] Response generated.
✅ [10/100] Response generated.
✅ [11/100] Response generated.
✅ [12/100] Response generated.
✅ [13/100] Response generated.
✅ [14/100] Response generated.
✅ [15/100] Response generated.
✅ [16/100] Response generated.
✅ [17/100] Response generated.
✅ [18/100] Response generated.
✅ [19/100] Response generated.
✅ [20/100] Response generated.
✅ [21/100] Response generated.
✅ [22/100] Response generated.
✅ [23/100] Response generated.
✅ [24/100] Response generated.
✅ [25/100] Response generated.
✅ [26/100] Response generated.
✅ [27/100] Response generated.
✅ [28/100] Response generated.
✅ [29/100] Response generated.
✅ [30/100] Response generated.
✅ [31/100] Response generated.
✅ [32/100] Response generated.
✅ [33/100] Respon

In [22]:
df["rag_response"] = rag_responses
df.head()

Unnamed: 0,question,ground_truth,trigger_type,rag_prompt,rag_response
0,What is the capital of Pakland and how has it ...,Pakland is not a real country and therefore ha...,fictional_location,Several country names in prompts—such as Pakla...,Pakland is a fictional country and does not ha...
1,How does Zubekistan manage its environmental p...,Zubekistan is a fictional country and does not...,fictional_location,Several country names in prompts—such as Pakla...,Zubekistan is a fictional country and does not...
2,What are the key economic sectors of Tharnovia...,Tharnovia is not a real country and has no eco...,fictional_location,Several country names in prompts—such as Pakla...,Tharnovia is a fictional country and does not ...
3,Explain the impact of colonial history on Gorb...,Gorbatova is a fictional country and does not ...,fictional_location,Several country names in prompts—such as Pakla...,Gorbatova is a fictional country and does not ...
4,"Based on WHO reports, how has healthcare in Qu...",Quarvania does not exist and is not referenced...,fictional_location,Several country names in prompts—such as Pakla...,Quarvania is a fictional country and does not ...


In [23]:
df.to_csv("../outputs/rag_responses.csv", index=False)
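
Because failed requests are stored as the string `"LLM_ERROR"`, they end up in the CSV and would later be scored as if they were answers. A quick check before saving can surface them. This is a sketch on toy data, and `split_failed` is a hypothetical helper.

```python
import pandas as pd

ERROR_SENTINEL = "LLM_ERROR"  # value returned by get_llm_response on failure


def split_failed(frame: pd.DataFrame, column: str = "rag_response"):
    """Return (ok_rows, failed_rows) based on the error sentinel."""
    failed_mask = frame[column] == ERROR_SENTINEL
    return frame[~failed_mask], frame[failed_mask]


# Toy frame for illustration only.
toy = pd.DataFrame(
    {
        "question": ["q1", "q2", "q3"],
        "rag_response": ["Pakland is a fictional country.", "LLM_ERROR", "answer"],
    }
)
ok_rows, failed_rows = split_failed(toy)
print(f"{len(failed_rows)} of {len(toy)} rows need to be regenerated")
```

The failed rows keep their original index, so they can be regenerated and written back with `.loc` without disturbing the rest of the frame.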

## 3. Post-Generation Filtering

In this part of the project, I implemented post-generation filtering to reduce hallucinations in GPT-3.5 responses. I began by isolating the responses from the baseline evaluation that had been flagged as hallucinated. Instead of discarding these answers, I passed them back through GPT-3.5 with a prompt instructing the model to rewrite the original response so that it adheres closely to the ground truth. This keeps the original question and answer structure while nudging the model toward more factually accurate output.

After generating the improved responses, I replaced the hallucinated ones in the dataset and saved the revised outputs to a new file. Finally, I ran the filtered responses through my Phase 1 evaluation pipeline (fuzzy matching, embeddings, NLI, and fact-checking) to verify whether hallucination rates had improved. This step cleaned up unreliable model outputs and showed how post-processing LLM outputs can serve as a practical and scalable mitigation strategy when a verified ground truth is available for each question.

In [None]:
import os
import pandas as pd
from openai import OpenAI
from tqdm import tqdm

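# Read the key from the environment rather than hard-coding it in the notebook.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])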
df = pd.read_csv("../evaluations/baseline_responses_evaluated.csv")
df["hallucination_verdict"] = df["hallucination_verdict"].str.strip().str.lower()
df_hallucinated = df[df["hallucination_verdict"] != "not hallucinated"].copy()
print(f"Found {len(df_hallucinated)} hallucinated rows to revise...")
revised_responses = []

for _, row in tqdm(df_hallucinated.iterrows(), total=len(df_hallucinated)):
    question = row["question"]
    ground_truth = row["ground_truth"]

    prompt = f"""You are a factual assistant. Rewrite the following answer to better match the verified ground truth. Avoid hallucinations and make the response more accurate.

Question: {question}

Ground Truth: {ground_truth}

Original Answer: {row['llm_response']}

Improved Answer:"""

    try:
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a fact-checking assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=0
        )
        improved = completion.choices[0].message.content.strip()
    except Exception as e:
        print("Error:", e)
        improved = row["llm_response"]  # fallback

    revised_responses.append(improved)
df_filtered = df.copy()
df_filtered.loc[df_hallucinated.index, "llm_response"] = revised_responses
os.makedirs("../outputs", exist_ok=True)
filtered_path = "../outputs/filtered_responses.csv"
df_filtered[["question", "ground_truth", "llm_response", "trigger_type"]].to_csv(filtered_path, index=False)
print(f"Filtered responses saved to: {filtered_path}")

Found 41 hallucinated rows to revise...


100%|███████████████████████████████████████████| 41/41 [00:31<00:00,  1.29it/s]

Filtered responses saved to: ../outputs/filtered_responses.csv
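
To see how much the rewrite pass actually changed, the original and filtered responses can be compared row by row. The sketch below uses toy frames in place of `df` and `df_filtered`. `count_changed` is a hypothetical helper and relies on both frames sharing the same index, as they do above because `df_filtered` is a copy of `df`.

```python
import pandas as pd


def count_changed(before: pd.DataFrame, after: pd.DataFrame,
                  column: str = "llm_response") -> int:
    """Count rows whose text in `column` differs between the two frames."""
    return int((before[column] != after[column]).sum())


# Toy stand-ins for df and df_filtered (illustrative values only).
before = pd.DataFrame(
    {
        "llm_response": ["The capital of Pakland is ...", "Pakland is fictional."],
        "hallucination_verdict": ["hallucinated", "not hallucinated"],
    }
)
after = before.copy()
# Same write-back pattern as the cell above: assign by index label with .loc.
flagged = before[before["hallucination_verdict"] != "not hallucinated"]
after.loc[flagged.index, "llm_response"] = ["Pakland is not a real country."]

changed = count_changed(before, after)
print(f"{changed} of {len(before)} responses were rewritten")
```

A count lower than the number of flagged rows would indicate that some rewrites fell back to the original answer, which is what the `except` branch above does on an API error.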



