# Task 1: Hallucination Detection

### Welcome to task 1!

Hallucination refers to the phenomenon where a language model generates information that is false, inaccurate, or fabricated, even though the output may sound plausible or confident. 

Hallucinations occur because the model predicts text from statistical patterns in its training data; it does not check factual correctness or ground its response in a verified source.

In a Question-Answer task, an LLM might hallucinate for any of the following reasons:

- Lack of Knowledge: the model may not have seen information related to the question in its training data, leading it to 'guess' the answer.
- Training on Unverified Data: inaccuracies in the training data may carry through into an incorrect answer.
- Ambiguous Prompts: vague or poorly worded questions can force the model to make assumptions about the user's intent.
- Lack of Grounding: LLMs have no built-in mechanism to verify their outputs against a trusted source, which can lead to creative but unfounded responses.

Even when provided with a reference source of information ('context'), LLMs may still hallucinate for the following reasons:

- Lack of clear instructions (the prompt matters!)
- Loss of focus on the context and over-reliance on training data
- Focus on probability over truth: LLMs generate the most likely next-word sequence, so if the model rates a plausible but incorrect continuation as more likely than the actual answer, it may produce the incorrect one
- Failure to align the context with the question, especially when the context is complex
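The 'probability over truth' point can be illustrated with a toy sketch of greedy decoding, where the model emits its single most probable next token. The probabilities below are invented for illustration and do not come from any real model; they simply assume a model that rates a frequent but wrong continuation above the correct one.

```python
# Toy illustration of greedy decoding. The probabilities are made up,
# not taken from a real model.
prompt = "The capital of Australia is"

# Hypothetical next-token distribution for the prompt above
next_token_probs = {
    "Sydney": 0.55,     # plausible and common in text, but wrong
    "Canberra": 0.40,   # the factually correct answer
    "Melbourne": 0.05,
}

def greedy_next_token(probs: dict[str, float]) -> str:
    """Return the single most probable token, as greedy decoding would."""
    return max(probs, key=probs.get)

choice = greedy_next_token(next_token_probs)
print(f"{prompt} {choice}")  # prints "The capital of Australia is Sydney"
```

Real models choose from a vocabulary of many thousands of tokens and often sample with a temperature rather than decoding greedily, but the objective is the same: likelihood, not factual accuracy.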

Hallucinations are a critical challenge in AI, especially in applications where accuracy and reliability are essential.

**In this task you will build an LLM Judge to analyse whether the provided answer is a hallucination based on a relevant source of information.**

### Environment Setup

Run the following cell. 
If there are no issues, you will get the message 'Root directory set up correctly!'

```python
# Install required packages
!pip install -qq -r ../requirements.txt

REL_PATH_TO_ROOT = "../"

import sys
import os
import json
from tqdm import tqdm
import pandas as pd

# Make the repository root importable
sys.path.insert(0, REL_PATH_TO_ROOT)

from src.utils import get_root_dir, test_root_dir
from local_variables import ROOT_DIR

# Check that the root directory is set up correctly
test_root_dir(REL_PATH_TO_ROOT)

from prompt_manager.manager import PromptManager
from prompt_manager.fetcher import fetch_prompt
from src.api import generate_outputs_openai
from src.image_display import display_image
```

### Task Background

Below is the initial ask from the AskAI team, as well as an explanation of the dataset they have provided.

#### The Ask

```python
display_image(f"{get_root_dir()}/task_images/task_1_desc.png")
```

#### The Data

```python
display_image(f"{get_root_dir()}/task_images/task_1_data.png", max_size=700)
```

### Load Dataset

The dataset contains 50 question-answer pairs.

For each question-answer pair, a ground-truth hallucination label is provided in the `is_hallucination_error_ground_truth` column:

- A label of 1 indicates the answer is incorrect, i.e. a hallucination.
- A label of 0 indicates the answer is correct.

The dataset is balanced: 25 question-answer pairs have label 1 and 25 have label 0.
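A quick sanity check that the labels are binary and balanced can be done with pandas. `check_label_balance` below is a hypothetical helper, not part of this repo's `src` package; it assumes the label column is named `is_hallucination_error_ground_truth` (the column used in the evaluation step) and runs on a small toy DataFrame so it works without the CSV.

```python
import pandas as pd

def check_label_balance(
    df: pd.DataFrame, label_col: str = "is_hallucination_error_ground_truth"
) -> pd.Series:
    """Raise if any label is not 0/1, otherwise return the count of each label."""
    labels = df[label_col]
    unexpected = set(labels.unique()) - {0, 1}
    if unexpected:
        raise ValueError(f"Unexpected label values in {label_col!r}: {unexpected}")
    # Count rows per label, ordered 0 then 1
    return labels.value_counts().sort_index()

# Toy data standing in for the real CSV (hypothetical values)
toy_df = pd.DataFrame({"is_hallucination_error_ground_truth": [1, 0, 1, 0]})
counts = check_label_balance(toy_df)
print(counts)
```

Once the CSV is loaded, `check_label_balance(hallucination_df)` should report 25 rows for each label.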

```python
input_path = os.path.join(REL_PATH_TO_ROOT, "data/hallucination.csv")
# Load the data and drop the leftover index column ("Unnamed: 0")
hallucination_df = pd.read_csv(input_path).drop("Unnamed: 0", axis=1)
```

```python
hallucination_df.shape
```

```python
hallucination_df.head()
```

### Task: Build LLM-as-a-judge

Craft a prompt that aims to correctly categorise whether the response is a correct answer or a hallucinated one.

The **inputs** to your LLM Judge are the context (i.e. `relevant_knowledge` and `user_question`) and the response (`chatbot_answer`).

The **output** from your LLM Judge should be a binary label: 1 if the response is a hallucination, 0 if it is correct.
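In practice a judge model does not always reply with a bare `1` or `0`; it may add whitespace, punctuation or a short explanation. The sketch below shows one way to normalise a raw reply into the 1/0 label before comparing it with the ground truth. `parse_judge_label` is a hypothetical helper, not part of this repo's `src` package, and it assumes the prompt asks the model for a single 0 or 1.

```python
import re
from typing import Optional

def parse_judge_label(raw_output: str) -> Optional[int]:
    """Map a raw LLM judge reply to 1 (hallucination) or 0 (correct).

    Returns None unless the reply contains exactly one distinct standalone
    0 or 1, so ambiguous replies can be inspected rather than silently guessed.
    """
    # \b...\b matches "1" in "Label: 1." but not inside "10"
    matches = re.findall(r"\b[01]\b", str(raw_output))
    if len(set(matches)) != 1:
        return None
    return int(matches[0])

print(parse_judge_label("1"))          # 1
print(parse_judge_label(" 0\n"))       # 0
print(parse_judge_label("Label: 1."))  # 1
print(parse_judge_label("maybe"))      # None
```

Once the judge has run, something like `processed_df["evaluator_response"].map(parse_judge_label)` would give an integer label per row, with `None` flagging replies that need a closer look or a stricter prompt.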

An initial prompt has already been created for you to start from. You can find it under `prompts/task_1`.

#### Load the prompt

```python
SEQUENCE = ["task_1", "hallucination_detector"]

# Fetch the latest version of the task_1 hallucination_detector prompt
prompt_template = fetch_prompt(SEQUENCE, use_latest_version=True)

print(f"Current LLM Judge Prompt:\n------------------------\n{prompt_template}\n------------------------")
```

```python
# Set the number of rows to process
num_rows = 50  # Set to the desired number of rows, or None to run all

# Build the context from the reference knowledge and the question,
# separated so the two texts don't run together in the prompt
hallucination_df["context"] = (
    hallucination_df["relevant_knowledge"] + "\n\n" + hallucination_df["user_question"]
)

# The response to judge is the chatbot's answer
hallucination_df["response"] = hallucination_df["chatbot_answer"]

# Process all rows if num_rows is None, otherwise at most num_rows
total_rows = len(hallucination_df)
rows_to_process = total_rows if num_rows is None else min(num_rows, total_rows)
```

```python
# Keep track of model responses
evaluator_responses = []

# Loop through the dataset, up to rows_to_process rows
rows = hallucination_df.head(rows_to_process)
for _, row in tqdm(rows.iterrows(), total=len(rows)):

    # Get inputs and place them into dictionary format
    row_inputs = {"CONTEXT": row["context"], "RESPONSE": row["response"]}

    # Initialise the prompt, then validate and format the inputs
    prompt = PromptManager(template=prompt_template, inputs=row_inputs)
    prompt.validate_inputs()
    prompt.format_inputs()

    # Send the prompt to the judge and collect its verdict
    judge_output = generate_outputs_openai(prompt.prompt)
    evaluator_responses.append(judge_output)

# Create a new DataFrame with only the processed rows and add the evaluator responses
processed_df = rows.copy()
processed_df["evaluator_response"] = evaluator_responses

# Display the first few processed rows
display(processed_df.head(5))
```

### Evaluation

Now calculate the accuracy of your LLM-as-a-judge, i.e. how often its verdict agrees with the ground-truth label. How does it look?

Can you refine the prompt to increase the agreement with the ground-truth labels?

Try modifying the prompt in `prompts/task_1` and re-running the cells above.

Make sure your prompt always includes the input placeholders `{CONTEXT}` and `{RESPONSE}`.
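If the prompt template uses Python `str.format`-style placeholders (an assumption based on the `{CONTEXT}` and `{RESPONSE}` names; the repo's `PromptManager` may work differently), the standard library's `string.Formatter` can list the placeholders a template contains, which makes a missing one easy to catch before running all 50 rows. `missing_placeholders` is a hypothetical helper and the template string is a made-up example, not the repo's prompt.

```python
from string import Formatter

REQUIRED_PLACEHOLDERS = frozenset({"CONTEXT", "RESPONSE"})

def missing_placeholders(
    template: str, required: frozenset[str] = REQUIRED_PLACEHOLDERS
) -> set[str]:
    """Return the required {placeholders} that do not appear in the template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text
    found = {field for _, field, _, _ in Formatter().parse(template) if field}
    return set(required) - found

# Hypothetical judge template, for illustration only
template = (
    "You are a judge. Using only the context, decide whether the response "
    "is a hallucination.\n\nContext: {CONTEXT}\n\nResponse: {RESPONSE}\n\n"
    "Answer with 1 for hallucination or 0 for correct."
)
print(missing_placeholders(template))                   # set()
print(missing_placeholders("Context: {CONTEXT} only"))  # {'RESPONSE'}
```

Literal braces in such a template (for example a JSON snippet) would need to be doubled as `{{` and `}}`, both for this check and for `str.format`, so they are treated as text.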

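```python
# Compare as strings; strip the model's reply so stray whitespace
# (e.g. "1\n") is not counted as a disagreement
processed_df["agreement"] = (
    processed_df["is_hallucination_error_ground_truth"].astype(str)
    == processed_df["evaluator_response"].astype(str).str.strip()
)
```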

```python
# Fraction of rows where the judge agrees with the ground truth
agreement_rate = processed_df["agreement"].mean()
agreement_percentage = round(100 * agreement_rate, 1)
print(f"\nYour LLM Judge achieved {agreement_percentage}% agreement!")
```
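Agreement alone does not show *how* the judge is wrong. Since label 1 marks a hallucination, it can help to treat it as the positive class and also look at precision (how many flagged answers really were hallucinations) and recall (how many hallucinations were caught). This is a plain-Python sketch on toy labels; `judge_metrics` is a hypothetical helper, and it expects the judge's predictions already converted to integers.

```python
from typing import Optional

def judge_metrics(ground_truth: list[int], predictions: list[Optional[int]]) -> dict[str, float]:
    """Accuracy plus precision/recall, treating label 1 (hallucination) as positive.

    Predictions other than 0/1 (e.g. None for an unparseable reply) count as errors:
    they lower accuracy and, on hallucination rows, count as missed hallucinations.
    """
    if len(ground_truth) != len(predictions):
        raise ValueError("ground_truth and predictions must have the same length")
    pairs = list(zip(ground_truth, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # hallucinations caught
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # correct answers accepted
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # correct answers wrongly flagged
    fn = sum(1 for y, p in pairs if y == 1 and p != 1)  # hallucinations missed
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positives": fp,
        "false_negatives": fn,
    }

# Toy labels for illustration, not results from this dataset
print(judge_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

Low recall suggests the prompt lets hallucinations through, while low precision suggests it is too quick to flag correct answers; each points to a different kind of prompt change.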

## End of Task 1