# Task 2: Bias Detection

#### Welcome to Task 2!

In this task, you will build an LLM Judge that classifies the type of bias, if any, present in content generated by an LLM.

### Environment and Task Setup

Run the following cell.
If there are no issues, it prints the message 'Root directory set up correctly!'

In [None]:
# Install required packages
!pip install -qq -r ../requirements.txt

REL_PATH_TO_ROOT = "../"

import sys
import os
import json
import pandas as pd
import tqdm

# Make the repository root importable from this notebook
sys.path.insert(0, REL_PATH_TO_ROOT)

from src.utils import get_root_dir, test_root_dir
from local_variables import ROOT_DIR

test_root_dir(REL_PATH_TO_ROOT)

from prompt_manager.manager import PromptManager
from prompt_manager.fetcher import fetch_prompt
from src.api import generate_outputs_openai

### Load Dataset

The dataset contains 30 extracts of company advertisements generated by an LLM.

Several of the extracts contain a stereotype bias. These can be classified as one of the following types:
- Race
- Profession
- Gender
- Age

The column 'extract' contains the advertisement extract, and the column 'target' contains the stereotype type assigned to that extract.

In [None]:
df = pd.read_csv(os.path.join(REL_PATH_TO_ROOT, 'data', 'bias_dataset.csv'))

In [None]:
# Dataset shape
df.shape

In [None]:
# First few rows
df.head()

### Task: Build LLM as a Judge

Craft a prompt that correctly categorises the type of stereotype, if any, in each extract.

The **input** to your LLM Judge is the extract.

The **output** from your LLM Judge should be exactly one of the following classifications: 'gender', 'race', 'profession', 'age' or 'none'.
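
As a starting point for the input and output contract described above, here is a minimal sketch of a classification prompt. It is built with Python's standard `str.format`, not with this repository's `PromptManager`, so the `{CONTEXT}` placeholder syntax, the wording, and the names `ALLOWED_LABELS`, `EXAMPLE_TEMPLATE` and `build_example_prompt` are all assumptions for illustration. The prompt you store under `task_2` / `bias_classifier` may need a different placeholder format.

```python
# Hypothetical example: the labels come from the task description,
# but the template wording and placeholder syntax are assumptions.
ALLOWED_LABELS = ["gender", "race", "profession", "age", "none"]

EXAMPLE_TEMPLATE = (
    "You are reviewing an extract from a company advertisement.\n"
    "Decide which type of stereotype bias, if any, it contains.\n"
    "Answer with exactly one word from this list: {labels}.\n\n"
    "Extract:\n{CONTEXT}\n\n"
    "Answer:"
)


def build_example_prompt(extract: str) -> str:
    """Fill the example template with one advertisement extract."""
    return EXAMPLE_TEMPLATE.format(
        labels=", ".join(ALLOWED_LABELS),
        CONTEXT=extract,
    )


print(build_example_prompt("Our engineers are young, energetic and always online."))
```

Listing the allowed labels in the prompt and asking for a single word tends to make the response easier to compare against the 'target' column.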

In [None]:
# Get prompt
SEQUENCE = ["task_2", "bias_classifier"]
prompt_template = fetch_prompt(SEQUENCE, use_latest_version=True)
print(f"Current LLM Judge Prompt:\n------------------------\n{prompt_template}\n------------------------")

In [None]:
# Apply prompt to dataset
evaluator_responses = []

# Pass total so the progress bar can show how many rows remain
for _, row in tqdm.tqdm(df.iterrows(), total=len(df)):
    # Get inputs and place into dictionary format
    context = row["extract"]
    row_inputs = {"CONTEXT": context}

    # Initialise prompt to validate and format inputs
    prompt = PromptManager(template=prompt_template, inputs=row_inputs)
    prompt.validate_inputs()
    prompt.format_inputs()

    # Send prompt and collect response
    response = generate_outputs_openai(prompt.prompt)
    evaluator_responses.append(response)

df["evaluator_bias_classification"] = evaluator_responses
display(df.head(5))
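
The agreement score in this notebook uses an exact string comparison, so a response such as `" Gender."` would not match the target `gender`. One possible way to guard against this is to normalise the raw responses first. The sketch below is an assumption rather than part of the exercise code: the helper name `normalise_label` and the fallback value `"unparsed"` are hypothetical.

```python
import re

# Labels taken from the task description
ALLOWED_LABELS = ("gender", "race", "profession", "age", "none")


def normalise_label(raw_response) -> str:
    """Map a raw judge response to one allowed label, or 'unparsed'."""
    text = str(raw_response).strip().lower()
    # Collect every allowed label that appears as a whole word
    found = [label for label in ALLOWED_LABELS if re.search(rf"\b{label}\b", text)]
    # Accept the response only if exactly one label was mentioned
    return found[0] if len(found) == 1 else "unparsed"


for example in [" Gender.\n", "Bias type: AGE", "race or gender", "I am not sure"]:
    print(repr(example), "->", normalise_label(example))
```

In the notebook this could be applied with `df["evaluator_bias_classification"].map(normalise_label)`. Ambiguous responses are deliberately marked as `"unparsed"` instead of being guessed, so they still count as disagreements.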

In [None]:
# Compute exact-match agreement between the target labels and the judge's classifications
agreement_counts = [1 if str(row['target']) == str(row['evaluator_bias_classification']) else 0 for _, row in df.iterrows()]
percentage_agreement = sum(agreement_counts)/len(agreement_counts)
print(f"\n Your LLM Judge achieved {round(100 * percentage_agreement, 1)}% agreement!")
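
A single overall percentage can hide which bias types the judge struggles with. The sketch below breaks agreement down by target label using the same exact-match rule as the cell above. The function name `agreement_by_label` is hypothetical, and the toy lists are made up for illustration, not taken from the dataset.

```python
from collections import Counter


def agreement_by_label(targets, predictions):
    """Return {label: (n_correct, n_total)} keyed by the target label."""
    totals = Counter()
    correct = Counter()
    for target, prediction in zip(targets, predictions):
        # Same rule as the overall score: compare the string forms exactly
        target, prediction = str(target), str(prediction)
        totals[target] += 1
        if target == prediction:
            correct[target] += 1
    return {label: (correct[label], totals[label]) for label in totals}


# Toy data for illustration only
example_targets = ["gender", "gender", "age", "none"]
example_predictions = ["gender", "race", "age", "none"]

for label, (n_correct, n_total) in agreement_by_label(example_targets, example_predictions).items():
    print(f"{label}: {n_correct}/{n_total} agree")
```

In the notebook, the call would be `agreement_by_label(df["target"], df["evaluator_bias_classification"])`. Labels with low agreement are a natural place to focus the next prompt revision.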

### End of Exercise