# Task 2: Bias Detection

### Welcome to task 2!

Whether generated by humans or AI, it is important that companies keep bias out of their marketing and promotional material. All communications to customers should remain neutral and avoid perpetuating social biases. In practice this is not always the case, and a powerful use case for an LLM Judge is to identify the type of bias in such material before it reaches customers.

In this task you will build an LLM Judge to correctly classify the type of bias (if any) present in company advertisements.


#### Definition of Bias

*Social bias broadly encompasses disparate treatment or outcomes between social groups. This could entail representational harms such as stereotyping, misrepresentation, toxic language and exclusionary norms.*

In this exercise we focus on **stereotyping**, in particular stereotypes about the following social groups:

##### **Gender Stereotypes**
This involves stereotypes or assumptions based on a person’s gender. For example, assuming that men are more suited for leadership roles or that women are more caring. Gender bias often reflects traditional societal roles that limit individuals based solely on their gender.

##### **Racial Stereotypes**
This type of bias involves prejudices or stereotypes related to a person’s race or ethnicity. Examples include assuming certain racial groups are more athletic, technologically adept, or prone to certain behaviors. Racial bias can contribute to harmful generalizations that limit opportunities and reinforce stereotypes.

##### **Age Stereotypes**
Age bias involves assumptions or stereotypes based on someone’s age. This can include ideas that older individuals are less tech-savvy or that younger people lack experience and maturity. Age bias often affects hiring, career advancement, and general perceptions of competence or suitability based solely on age.

##### **Profession Stereotypes**
Profession bias includes assumptions about people based on their job or career. For instance, assuming that all teachers are women or that mathematicians are geeks. This bias can lead to stereotypical views of what kinds of people belong in certain professions, often limiting career possibilities based on assumptions rather than skills.
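The four definitions above can be embedded directly in the judge prompt, so the model works from the same criteria it is scored against. A minimal sketch, assuming you paste the rendered text into your prompt template by hand; `BIAS_DEFINITIONS` and `render_definitions` are hypothetical names and not part of the course's `prompt_manager` package.

```python
# Hypothetical helper: condensed versions of the definitions above,
# keyed by the label the judge is expected to output
BIAS_DEFINITIONS = {
    "gender": "Stereotypes or assumptions based on a person's gender, e.g. men suit leadership, women are more caring.",
    "race": "Stereotypes related to race or ethnicity, e.g. assuming a group is more athletic or technologically adept.",
    "age": "Assumptions based on age, e.g. older people are less tech-savvy, younger people lack maturity.",
    "profession": "Assumptions based on job or career, e.g. all teachers are women, mathematicians are geeks.",
}


def render_definitions(definitions: dict[str, str]) -> str:
    """Render label/definition pairs as a bulleted block for a prompt."""
    lines = [f"- {label}: {text}" for label, text in definitions.items()]
    # 'none' is the fallback label for extracts with no stereotype
    lines.append("- none: The extract contains none of the stereotypes above.")
    return "\n".join(lines)


print(render_definitions(BIAS_DEFINITIONS))
```

Keeping the definitions in one dictionary means a wording change only has to be made once before it is reflected in the prompt.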

### Environment and Task Setup

Run the following cell. 
If there are no issues, you will get the message 'Root directory set up correctly!'

```python
# Install required packages
!pip install -qq -r ../requirements.txt

REL_PATH_TO_ROOT = "../"

import sys
import os
import json
import pandas as pd
import tqdm

# Make the repository root importable
sys.path.insert(0, REL_PATH_TO_ROOT)

from src.utils import get_root_dir, test_root_dir
from local_variables import ROOT_DIR

# Check that the root directory is configured correctly
test_root_dir(REL_PATH_TO_ROOT)

from prompt_manager.manager import PromptManager
from prompt_manager.fetcher import fetch_prompt
from src.api import generate_outputs_openai
```

### Load Dataset

The dataset contains 30 extracts of company advertisements (all fictional).

Several of the extracts contain one of the above stereotype biases, classified as one of the following types:
- Race
- Profession
- Gender
- Age

The column *'extract'* contains the advertisement extract, and *'target'* contains the bias classification.
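Before scoring, it helps to check how the labels are distributed: if one label dominates, a judge that always predicts it can still reach a high agreement score. A minimal sketch on a tiny hypothetical DataFrame with the same `extract` and `target` columns; the rows and label spellings are invented for illustration, so run the last lines against the loaded `df` instead.

```python
import pandas as pd

# Hypothetical rows, invented for illustration only; in the notebook use the loaded `df`
sample = pd.DataFrame(
    {
        "extract": [
            "Our new phone is so simple even grandparents can use it.",
            "Fresh coffee, delivered to your door every morning.",
            "Leave the cooking to Mum: our ovens make her job easy.",
        ],
        "target": ["age", "none", "gender"],
    }
)

# Confirm the expected columns are present
assert {"extract", "target"} <= set(sample.columns)

# Count how many extracts carry each label; dropna=False also counts missing targets
label_counts = sample["target"].value_counts(dropna=False)
print(label_counts)
```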

```python
df = pd.read_csv(os.path.join(REL_PATH_TO_ROOT, 'data', 'bias_dataset.csv'))
```

```python
# Dataset shape
df.shape
```

```python
# First few rows
df.head()
```

### Task: Build an LLM Judge

Craft a prompt that correctly categorises the type of stereotype in each extract.

The **input** to your LLM Judge is the extract.

The **output** from your LLM Judge should be a classification: 'gender', 'race', 'profession', 'age' or 'none' if the information is unbiased.
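One way to meet this input/output contract is a template that lists the allowed labels and asks for a single word. The sketch below is only a starting point: it assumes the template fills a `{CONTEXT}` placeholder the way Python's `str.format` does, which may not match the placeholder syntax `PromptManager` expects, so compare it with the prompt returned by `fetch_prompt` before adapting it. `JUDGE_TEMPLATE` is a hypothetical name.

```python
# Hypothetical judge prompt; assumes a str.format-style {CONTEXT} placeholder
JUDGE_TEMPLATE = """You are reviewing company advertisements for stereotyping.

Classify the extract into exactly one of these labels:
- gender: stereotypes based on a person's gender
- race: stereotypes based on race or ethnicity
- age: stereotypes based on a person's age
- profession: stereotypes based on a job or career
- none: no stereotype is present

Respond with the label only, in lowercase, with no punctuation or explanation.

Extract:
{CONTEXT}
"""

# Fill the placeholder with one (hypothetical) extract
filled = JUDGE_TEMPLATE.format(CONTEXT="Our laptops are so easy, even your grandad could manage.")
print(filled)
```

Asking for the bare label matters because agreement is measured by comparing strings, so any extra words in the response count as a disagreement.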

Use the code below to calculate your judge's accuracy against the ground-truth classifications. How high can you get it?

#### Load the Prompt

```python
# Fetch the latest version of the bias classifier prompt
SEQUENCE = ["task_2", "bias_classifier"]
prompt_template = fetch_prompt(SEQUENCE, use_latest_version=True)
print(f"Current LLM Judge Prompt:\n------------------------\n{prompt_template}\n------------------------")
```

#### Apply the prompt to the test dataset

```python
from IPython.display import display

evaluator_responses = []

# total= lets tqdm show progress against the number of rows
for _, row in tqdm.tqdm(df.iterrows(), total=len(df)):

    # Get inputs and place into dictionary format
    context = row["extract"]
    row_inputs = {"CONTEXT": context}

    # Initialise prompt to validate and format inputs
    prompt = PromptManager(template=prompt_template, inputs=row_inputs)
    prompt.validate_inputs()
    prompt.format_inputs()

    # Send prompt and collect response
    response = generate_outputs_openai(prompt.prompt)
    evaluator_responses.append(response)

df["evaluator_bias_classification"] = evaluator_responses
display(df.head(5))
```
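Even with a strict prompt, a model may reply "Gender." or "Bias type: age", which a plain string comparison counts as a miss. A minimal sketch of a post-processing step, assuming you want to map each raw response onto the five allowed labels before scoring; `normalise_label` is a hypothetical helper, and the `"unparsed"` fallback is an arbitrary choice that keeps unrecognised replies visible rather than silently treating them as `"none"`.

```python
import re

ALLOWED_LABELS = ("gender", "race", "profession", "age", "none")


def normalise_label(raw: str) -> str:
    """Map a free-text judge response onto one allowed label (hypothetical helper)."""
    words = re.findall(r"[a-z]+", str(raw).lower())
    # Keep only the allowed labels that appear as whole words
    found = [w for w in words if w in ALLOWED_LABELS]
    # Accept the response only if it names exactly one distinct label
    if len(set(found)) == 1:
        return found[0]
    return "unparsed"


for raw in ["Gender.", "  age\n", "Bias type: profession", "race or gender?", "I cannot tell"]:
    print(repr(raw), "->", normalise_label(raw))
```

In the notebook this could be applied with `df["evaluator_bias_classification"].map(normalise_label)` before computing agreement. Treat a large number of `"unparsed"` results as a sign that the prompt itself needs tightening.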

#### Get Agreement

```python
# Get agreement: normalise case and whitespace so that e.g. "Gender " matches "gender"
target = df["target"].astype(str).str.strip().str.lower()
predicted = df["evaluator_bias_classification"].astype(str).str.strip().str.lower()
percentage_agreement = (target == predicted).mean()
print(f"\n Your LLM Judge achieved {round(100 * percentage_agreement, 1)}% agreement!\n")
print(" Try tweaking your prompt and rescoring the test set to reach further alignment.")
```
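A single agreement percentage does not show *which* stereotypes the judge confuses. A minimal sketch using `pandas.crosstab` to tabulate ground truth against predictions, plus per-label recall; the short label lists are hypothetical stand-ins, and in the notebook you would pass `df["target"]` and `df["evaluator_bias_classification"]` instead.

```python
import pandas as pd

# Hypothetical labels for illustration; in the notebook use the two dataframe columns
y_true = pd.Series(["gender", "age", "none", "race", "age"], name="target")
y_pred = pd.Series(["gender", "none", "none", "race", "age"], name="predicted")

# Rows are ground-truth labels, columns are judge predictions; off-diagonal cells are errors
confusion = pd.crosstab(y_true, y_pred)
print(confusion)

# Per-label recall: share of each true label the judge classified correctly
correct = y_true == y_pred
recall = correct.groupby(y_true).mean()
print(recall)
```

Reading the off-diagonal cells tells you which definitions in the prompt to sharpen, for example whether age-related extracts are being waved through as `none`.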

### End of Exercise