# RAG System Bias Auditing with Aequitas

This notebook demonstrates how to use the Aequitas toolkit to audit a RAG system for potential bias across different demographic groups. We'll perform bias analysis on RAG responses and evaluate fairness metrics across protected attributes.

## Overview

1. Set up the environment and install required packages
2. Collect model outputs and demographic attributes
3. Calculate bias metrics
4. Visualize bias disparities
5. Generate fairness reports
6. Implement bias mitigation strategies

In [1]:
# Setup and Imports
import os
import sys
import json
import time
import pickle
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Any
import logging

# Make sure we can import from the project root
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../..')))

# Configure logging
logging.basicConfig(level=logging.INFO,
                   format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger("RAG-Bias-Audit")

# Display versions
import platform
print(f"Python version: {platform.python_version()}")
print(f"Operating system: {platform.system()} {platform.release()}")

# Install required packages if needed
!pip install aequitas

Python version: 3.12.3
Operating system: Linux 6.8.0-59-generic


## 1. Sample Test Data with Demographic Attributes

For bias auditing, we need both test questions and demographic attributes. We'll create a simulated dataset that includes protected attributes for each user.

In [2]:
# Define test questions with demographics
# This simulates data collecting from different user demographics

# Sample questions from A/B testing
try:
    # Try to load existing test data
    with open("../../rag_based_llm_auichat/ab_test_results.json", "r") as f:
        ab_test_results = json.load(f)
        
    print("Loaded existing test results")
    
    # Extract questions and reference answers
    base_questions = []
    if "detailed_results" in ab_test_results:
        for result in ab_test_results["detailed_results"]:
            base_questions.append({
                "question": result["question"],
                "reference_answer": result["reference_answer"]
            })
    else:
        # Extract from another format
        for result in ab_test_results:
            base_questions.append({
                "question": result["question"],
                "reference_answer": result["reference_answer"]
            })
            
except FileNotFoundError:
    # Define some sample questions if can't load existing data
    base_questions = [
        {
            "question": "What are the counseling services available at AUI?",
            "reference_answer": "AUI offers individual counseling, group counseling, and crisis intervention services to students."
        },
        {
            "question": "What is the process for undergraduate admission as a transfer student?",
            "reference_answer": "Transfer students need to submit official transcripts from all institutions attended."
        },
        {
            "question": "What are the program requirements for PiP 24-25?",
            "reference_answer": "The PiP program requires maintaining a minimum GPA of 3.0 and community service."
        },
        {
            "question": "What are the housing options for students at AUI?",
            "reference_answer": "AUI provides dormitories, apartments, and university-owned houses."
        }
    ]

# Create expanded test data with demographic information
# We'll generate 50 test cases based on the base questions
np.random.seed(42)  # For reproducibility
demographics = {
    "nationality": ["moroccan", "french", "american", "nigerian", "egyptian", "saudi", "chinese"],
    "gender": ["male", "female", "non-binary"],
    "age_group": ["18-24", "25-34", "35-44", "45+"]
}

test_data_with_demographics = []
user_id = 1000

for _ in range(50):
    # Select a random base question
    base_q = base_questions[np.random.randint(0, len(base_questions))]
    
    # Generate demographic data
    nationality = demographics["nationality"][np.random.randint(0, len(demographics["nationality"]))]
    gender = demographics["gender"][np.random.randint(0, len(demographics["gender"]))]
    age_group = demographics["age_group"][np.random.randint(0, len(demographics["age_group"]))]
    
    # Create the test case
    test_data_with_demographics.append({
        "user_id": f"user_{user_id}",
        "question": base_q["question"],
        "reference_answer": base_q["reference_answer"],
        "nationality": nationality,
        "gender": gender,
        "age_group": age_group
    })
    user_id += 1

# Create DataFrame
test_df = pd.DataFrame(test_data_with_demographics)
print(f"Generated {len(test_df)} test cases with demographic information")
test_df.head()

Loaded existing test results
Generated 50 test cases with demographic information


Unnamed: 0,user_id,question,reference_answer,nationality,gender,age_group
0,user_1000,What is the deadline for undergraduate admissi...,The deadline for Fall semester undergraduate a...,egyptian,non-binary,35-44
1,user_1001,What are the housing options for students at AUI?,AUI provides several housing options including...,egyptian,non-binary,25-34
2,user_1002,What are the program requirements for PiP 24-25?,The PiP (Partners in Progress) 24-25 program r...,chinese,non-binary,35-44
3,user_1003,What are the housing options for students at AUI?,AUI provides several housing options including...,nigerian,non-binary,25-34
4,user_1004,What are the housing options for students at AUI?,AUI provides several housing options including...,french,female,25-34


## 2. Simulate RAG System Responses

To audit for bias, we need RAG system responses. Here we'll simulate responses for our test data with varying scores for different demographic groups to demonstrate bias detection.

In [3]:
def simulate_rag_response(question, user_attributes):
    """
    Simulate a RAG system response with potential bias
    
    Args:
        question: The question asked
        user_attributes: Dict containing demographic information
    
    Returns:
        Dictionary with simulated response and metrics
    """
    # Base quality level (0.0 to 1.0)
    base_quality = 0.7
    
    # Simulate bias patterns (for demonstration)
    # These biases are artificially introduced to demonstrate Aequitas' capabilities
    adjustments = 0.0
    
    # Nationality bias (for demonstration only)
    if user_attributes["nationality"] == "moroccan":
        adjustments += 0.15  # Positive bias for Moroccan users (system was trained on Moroccan data)
    elif user_attributes["nationality"] in ["chinese", "nigerian"]:
        adjustments -= 0.15  # Negative bias for some international students
    
    # Gender bias (for demonstration only)
    if user_attributes["gender"] == "female":
        adjustments -= 0.10  # Simulated bias against female users
        
    # Age bias (for demonstration only)
    if user_attributes["age_group"] == "45+":
        adjustments -= 0.12  # Simulated bias against older users
    
    # Calculate final quality score with some randomness
    quality_score = min(1.0, max(0.1, base_quality + adjustments + np.random.normal(0, 0.1)))
    
    # Determine if the answer meets a quality threshold
    meets_threshold = quality_score >= 0.6
    
    # Simulate RAG response
    if meets_threshold:
        response = f"Here is a detailed answer to your question about {question.lower().split('?')[0]}."
    else:
        response = "I don't have enough information to answer this question accurately."
        
    # Simulate metrics
    metrics = {
        "relevance_score": quality_score,
        "faithfulness": quality_score * np.random.uniform(0.8, 1.0),
        "context_precision": quality_score * np.random.uniform(0.7, 1.0),
        "latency": np.random.uniform(0.5, 2.0)  # seconds
    }
    
    return {
        "response": response,
        "metrics": metrics,
        "binary_outcome": 1 if meets_threshold else 0  # For bias analysis
    }

# Generate responses for all test cases
results = []

for _, row in test_df.iterrows():
    # Extract user attributes
    user_attributes = {
        "nationality": row["nationality"],
        "gender": row["gender"],
        "age_group": row["age_group"]
    }
    
    # Get simulated response
    response_data = simulate_rag_response(row["question"], user_attributes)
    
    # Store full result
    result = {
        "user_id": row["user_id"],
        "question": row["question"],
        "reference_answer": row["reference_answer"],
        "nationality": row["nationality"],
        "gender": row["gender"],
        "age_group": row["age_group"],
        "response": response_data["response"],
        "relevance_score": response_data["metrics"]["relevance_score"],
        "faithfulness": response_data["metrics"]["faithfulness"],
        "context_precision": response_data["metrics"]["context_precision"],
        "latency": response_data["metrics"]["latency"],
        "binary_outcome": response_data["binary_outcome"]
    }
    results.append(result)

# Convert to DataFrame
results_df = pd.DataFrame(results)
print(f"Generated {len(results_df)} response results")
results_df.head()

Generated 50 response results


Unnamed: 0,user_id,question,reference_answer,nationality,gender,age_group,response,relevance_score,faithfulness,context_precision,latency,binary_outcome
0,user_1000,What is the deadline for undergraduate admissi...,The deadline for Fall semester undergraduate a...,egyptian,non-binary,35-44,I don't have enough information to answer this...,0.569005,0.534004,0.444293,0.866188,0
1,user_1001,What are the housing options for students at AUI?,AUI provides several housing options including...,egyptian,non-binary,25-34,Here is a detailed answer to your question abo...,0.994366,0.828962,0.761316,1.337153,1
2,user_1002,What are the program requirements for PiP 24-25?,The PiP (Partners in Progress) 24-25 program r...,chinese,non-binary,35-44,I don't have enough information to answer this...,0.483726,0.411546,0.374434,1.544456,0
3,user_1003,What are the housing options for students at AUI?,AUI provides several housing options including...,nigerian,non-binary,25-34,I don't have enough information to answer this...,0.535353,0.504545,0.398531,1.996611,0
4,user_1004,What are the housing options for students at AUI?,AUI provides several housing options including...,french,female,25-34,I don't have enough information to answer this...,0.555823,0.483018,0.494853,1.521058,0


## 3. Prepare Data for Aequitas Analysis

Aequitas requires specific data format for bias analysis. We need to prepare our results dataframe accordingly.

In [4]:
# Prepare data for Aequitas
# We need:
# - score: the raw model score (relevance_score in our case)
# - label_value: binary ground truth value (we'll simulate this)
# - score_binary: binary model prediction

# Create a bias audit dataframe
bias_audit_df = results_df.copy()

# Simulate ground truth labels
# For this example, we'll assume the system should work equally well for all demographics
np.random.seed(42)
bias_audit_df['label_value'] = np.random.binomial(1, 0.8, size=len(bias_audit_df))

# Convert relevance_score to binary prediction based on threshold
threshold = 0.6
bias_audit_df['score'] = bias_audit_df['relevance_score']
bias_audit_df['score_binary'] = (bias_audit_df['score'] >= threshold).astype(int)

# Check the distribution of scores by demographics
demographic_groups = ['nationality', 'gender', 'age_group']

# Show mean scores by demographic groups
print("Mean relevance scores by demographic groups:")
for group in demographic_groups:
    print(f"\n{group.capitalize()} groups:")
    print(bias_audit_df.groupby(group)['score'].mean().sort_values(ascending=False))

# Verify data format for Aequitas
print("\nPreview of data prepared for Aequitas:")
bias_audit_df[['user_id', 'score', 'score_binary', 'label_value', 'nationality', 'gender', 'age_group']].head()

Mean relevance scores by demographic groups:

Nationality groups:
nationality
moroccan    0.772147
egyptian    0.759616
saudi       0.656345
american    0.584555
french      0.583550
nigerian    0.518733
chinese     0.490272
Name: score, dtype: float64

Gender groups:
gender
non-binary    0.670726
male          0.597503
female        0.580593
Name: score, dtype: float64

Age_group groups:
age_group
25-34    0.711728
35-44    0.678863
18-24    0.595139
45+      0.481374
Name: score, dtype: float64

Preview of data prepared for Aequitas:


Unnamed: 0,user_id,score,score_binary,label_value,nationality,gender,age_group
0,user_1000,0.569005,0,1,egyptian,non-binary,35-44
1,user_1001,0.994366,1,0,egyptian,non-binary,25-34
2,user_1002,0.483726,0,1,chinese,non-binary,35-44
3,user_1003,0.535353,0,1,nigerian,non-binary,25-34
4,user_1004,0.555823,0,1,french,female,25-34


## 4. Conduct Bias Analysis with Aequitas

Now we'll use the Aequitas toolkit to analyze bias across the demographic groups.

In [None]:
# Import Aequitas
from aequitas.group import Group
from aequitas.bias import Bias
from aequitas.fairness import Fairness
from aequitas.plotting import Plot

# Define protected attributes we want to analyze
protected_attributes = ['nationality', 'gender', 'age_group']

# Make sure data is properly formatted for Aequitas
# Aequitas requires score and label columns to be numeric
bias_audit_df['score'] = bias_audit_df['score'].astype(float)
bias_audit_df['score_binary'] = bias_audit_df['score_binary'].astype(int)
bias_audit_df['label_value'] = bias_audit_df['label_value'].astype(int)

# Ensure user_id is a string
bias_audit_df['user_id'] = bias_audit_df['user_id'].astype(str)

# Print data types to verify
print("Data types for key columns:")
print(bias_audit_df[['user_id', 'score', 'score_binary', 'label_value']].dtypes)

# Check for any NaN values that could cause errors
print("\nChecking for NaN values:")
print(bias_audit_df[['user_id', 'score', 'score_binary', 'label_value', 
                   'nationality', 'gender', 'age_group']].isna().sum())

# Fix any potential issues with the dataframe
# Sometimes Aequitas requires the dataframe to be reset
bias_audit_df = bias_audit_df.reset_index(drop=True)

# Different versions of Aequitas may have different parameter names
# Let's try multiple approaches to handle different versions
try:
    # Run group metrics
    g = Group()
    
    # First attempt - using the most common parameter names
    try:
        print("\nAttempting with score_col and label_col parameters...")
        xtab, _ = g.get_crosstabs(bias_audit_df, 
                                attr_cols=protected_attributes,
                                score_col='score_binary',  # Binary model predictions
                                label_col='label_value')   # Ground truth labels
    except TypeError:
        # Second attempt - try with score_thresholds=None parameter
        try:
            print("\nAttempting with score_thresholds parameter...")
            xtab, _ = g.get_crosstabs(bias_audit_df, 
                                    attr_cols=protected_attributes,
                                    score_thresholds=None,
                                    score_col='score_binary',  
                                    label_col='label_value')
        except TypeError:
            # Third attempt - try with alternate parameter names
            print("\nAttempting with alternate parameter names...")
            xtab, _ = g.get_crosstabs(bias_audit_df, 
                                    attr_cols=protected_attributes,
                                    scored_col='score_binary',  
                                    label_col='label_value')

    print("\nGroup metrics completed. Sample of cross-tabulation results:")
    print(xtab[['attribute_name', 'attribute_value', 'count', 'tpr', 'fpr', 'precision', 'pp', 'pn']].head())
    
except Exception as e:
    print(f"\nError: {e}")
    print("\nTrying simplified approach...")
    
    try:
        # Create a minimal dataframe with only the essential columns
        # Convert all required columns to appropriate types
        minimal_df = pd.DataFrame({
            'score': bias_audit_df['score'].astype(float),
            'label_value': bias_audit_df['label_value'].astype(int),
            'score_binary': bias_audit_df['score_binary'].astype(int),
            'entity_id': bias_audit_df['user_id'].astype(str)  # Try entity_id instead of user_id
        })
        
        # Add protected attributes
        for col in protected_attributes:
            minimal_df[col] = bias_audit_df[col].fillna('unknown')
            
        # Inspect function signature to determine correct parameters
        import inspect
        param_names = inspect.signature(g.get_crosstabs).parameters.keys()
        print(f"\nAvailable get_crosstabs parameters: {param_names}")
        
        # Try with just the minimal required parameters
        if 'df' in param_names:
            print("\nTrying with minimal parameters...")
            xtab, _ = g.get_crosstabs(df=minimal_df, attr_cols=protected_attributes)
        else:
            print("\nTrying positional arguments only...")
            xtab, _ = g.get_crosstabs(minimal_df, protected_attributes)
        
        print("\nGroup metrics completed with minimal dataframe. Sample of cross-tabulation results:")
        print(xtab[['attribute_name', 'attribute_value', 'count', 'tpr', 'fpr', 'precision', 'pp', 'pn']].head())
        
    except Exception as e2:
        print(f"\nStill encountering error: {e2}")
        print("\nCreating a fallback manual crosstab...")
        
        # Create a fallback crosstab manually
        fallback_data = []
        
        for attr in protected_attributes:
            for val in bias_audit_df[attr].unique():
                if pd.isna(val):
                    continue
                    
                group_df = bias_audit_df[bias_audit_df[attr] == val]
                count = len(group_df)
                
                if count == 0:
                    continue
                
                # Calculate basic metrics
                tp = sum((group_df['score_binary'] == 1) & (group_df['label_value'] == 1))
                fp = sum((group_df['score_binary'] == 1) & (group_df['label_value'] == 0))
                tn = sum((group_df['score_binary'] == 0) & (group_df['label_value'] == 0))
                fn = sum((group_df['score_binary'] == 0) & (group_df['label_value'] == 1))
                
                # Calculate derived metrics
                pp = tp + fp  # predicted positive
                pn = tn + fn  # predicted negative
                p = tp + fn   # actual positive
                n = fp + tn   # actual negative
                
                # Avoid division by zero
                tpr = tp / p if p > 0 else 0
                fpr = fp / n if n > 0 else 0
                precision = tp / pp if pp > 0 else 0
                
                fallback_data.append({
                    'attribute_name': attr,
                    'attribute_value': val,
                    'count': count,
                    'tpr': tpr,
                    'fpr': fpr,
                    'precision': precision,
                    'pp': pp,
                    'pn': pn,
                    'p': p,
                    'n': n,
                    'tp': tp,
                    'fp': fp,
                    'tn': tn,
                    'fn': fn
                })
        
        # Create fallback crosstab dataframe
        xtab = pd.DataFrame(fallback_data)
        
        print("\nCreated fallback crosstab dataframe. Sample results:")
        print(xtab[['attribute_name', 'attribute_value', 'count', 'tpr', 'fpr', 'precision', 'pp', 'pn']].head())

Data types for key columns:
user_id          object
score           float64
score_binary      int64
label_value       int64
dtype: object

Checking for NaN values:
user_id         0
score           0
score_binary    0
label_value     0
nationality     0
gender          0
age_group       0
dtype: int64

Error: Group.get_crosstabs() got an unexpected keyword argument 'score_col'

Trying alternative format...


TypeError: Group.get_crosstabs() got an unexpected keyword argument 'score_col'

In [None]:
# Calculate bias metrics
try:
    bias = Bias()

    # Define reference groups for each attribute 
    # (the group against which others are compared)
    bias_df = bias.get_disparity_predefined_groups(
        xtab, 
        original_df=bias_audit_df,
        ref_groups_dict={'nationality': 'moroccan',  # Reference groups
                         'gender': 'male',
                         'age_group': '25-34'},
        alpha=0.05,  # Significance level
        mask_significance=True)

    print("Bias metrics calculated. Sample of disparity results:")
    print(bias_df[['attribute_name', 'attribute_value', 
                'ppr_disparity', 'pprev_disparity', 
                'precision_disparity', 'fdr_disparity']].head(10))

    # Explanation of key metrics:
    print("\nKey metrics explanation:")
    print("- ppr_disparity: Positive Prediction Rate disparity (how often model predicts positive)")
    print("- pprev_disparity: Prevalence disparity (how often positive labels appear in the group)")
    print("- precision_disparity: Precision disparity (how accurate positive predictions are)")
    print("- fdr_disparity: False Discovery Rate disparity (how often positive predictions are wrong)")
    print("\nValues close to 1.0 indicate parity with reference group")
    print("Values < 1.0 indicate disadvantage compared to reference group")
    print("Values > 1.0 indicate advantage compared to reference group")
except Exception as e:
    print(f"\nError calculating bias metrics: {e}")
    
    # Create a simple disparity calculation as fallback
    print("\nFalling back to simple disparity calculations...")
    
    # Function to calculate simple disparity metrics
    def calculate_simple_disparities(df, attribute, ref_value, metric='score'):
        """Calculate simple disparity metrics for an attribute"""
        # Get reference group average score
        ref_score = df[df[attribute] == ref_value][metric].mean()
        
        # Calculate disparities for each value in the attribute
        disparities = {}
        for value in df[attribute].unique():
            group_score = df[df[attribute] == value][metric].mean()
            disparity = group_score / ref_score if ref_score > 0 else 0
            disparities[value] = {
                'group_score': group_score,
                'disparity': disparity,
                'count': df[df[attribute] == value].shape[0]
            }
        
        return disparities, ref_score
    
    # Calculate simple disparities for each attribute
    simple_disparities = {}
    for attr in protected_attributes:
        if attr == 'nationality':
            ref_val = 'moroccan'
        elif attr == 'gender':
            ref_val = 'male'
        elif attr == 'age_group':
            ref_val = '25-34'
        else:
            continue
            
        simple_disparities[attr], ref_score = calculate_simple_disparities(
            bias_audit_df, attr, ref_val, 'score')
        
        print(f"\nSimple disparities for {attr} (reference: {ref_val}, score: {ref_score:.4f}):")
        for val, metrics in simple_disparities[attr].items():
            print(f"  {val}: score={metrics['group_score']:.4f}, " +
                 f"disparity={metrics['disparity']:.4f}, count={metrics['count']}")

In [None]:
# Determine fairness based on bias metrics
try:
    # Only proceed if we have valid bias metrics
    if 'bias_df' in locals() and isinstance(bias_df, pd.DataFrame) and not bias_df.empty:
        fairness = Fairness()
        fairness_df = fairness.get_group_fairness_measures(bias_df)

        print("Fairness evaluation completed. Sample of fairness results:")
        fairness_results = fairness_df[['attribute_name', 'attribute_value', 
                                    'Statistical Parity', 'Impact Parity', 
                                    'FDR Parity', 'FPR Parity']]
        print(fairness_results.head(10))

        # Count fairness failures by attribute
        fairness_failures = {}
        for attr in protected_attributes:
            attr_df = fairness_results[fairness_results['attribute_name'] == attr]
            failures = {}
            for col in ['Statistical Parity', 'Impact Parity', 'FDR Parity', 'FPR Parity']:
                failures[col] = sum(attr_df[col] == False)
            fairness_failures[attr] = failures

        print("\nFairness test failures by attribute:")
        for attr, failures in fairness_failures.items():
            print(f"\n{attr}:")
            for test, count in failures.items():
                total = sum(fairness_results['attribute_name'] == attr)
                print(f"  - {test}: {count}/{total} groups fail")
    else:
        print("\nSkipping fairness evaluation due to missing bias metrics")
        
        # Create custom fairness evaluation using simple thresholds
        print("\nPerforming simple fairness evaluation based on score disparities...")
        
        # Define fairness thresholds
        FAIRNESS_THRESHOLDS = {
            'high_disparity': 0.8,  # disparities below this are considered unfair
            'low_disparity': 1.2,   # disparities above this are considered unfair (reverse bias)
        }
        
        # If we have simple disparities from previous step
        if 'simple_disparities' in locals() and isinstance(simple_disparities, dict):
            for attr, disparities in simple_disparities.items():
                print(f"\nFairness evaluation for {attr}:")
                fail_count = 0
                total_count = len(disparities)
                
                for val, metrics in disparities.items():
                    disparity = metrics['disparity']
                    fair = FAIRNESS_THRESHOLDS['high_disparity'] <= disparity <= FAIRNESS_THRESHOLDS['low_disparity']
                    
                    status = "FAIR" if fair else "UNFAIR"
                    if not fair:
                        fail_count += 1
                        
                    print(f"  {val}: disparity={disparity:.4f} - {status}")
                
                print(f"  Summary: {fail_count}/{total_count} groups fail fairness criteria")
except Exception as e:
    print(f"\nError in fairness evaluation: {e}")
    print("Continuing with the rest of the analysis...")

## 5. Visualize Bias and Fairness Results

Visualizing the bias metrics helps us understand disparities across demographic groups.

In [None]:
# Custom plotting for bias analysis
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-v0_8-whitegrid')

# Function to create custom disparity plots if Aequitas plots fail
def create_custom_disparity_plot(disparities, attribute, reference_value, title=None):
    """Create a custom disparity plot for an attribute"""
    values = []
    disparity_values = []
    colors = []
    
    for val, metrics in disparities.items():
        values.append(val)
        disparity_values.append(metrics['disparity'])
        
        # Color based on disparity (red for low, green for high)
        if val == reference_value:
            colors.append('blue')  # Reference group in blue
        elif metrics['disparity'] < 0.8:
            colors.append('red')   # Disadvantaged groups in red
        elif metrics['disparity'] > 1.2:
            colors.append('orange')  # Advantaged groups in orange
        else:
            colors.append('green')  # Fair groups in green
    
    plt.figure(figsize=(12, 6))
    bars = plt.bar(values, disparity_values, color=colors)
    
    # Add horizontal lines for fairness thresholds
    plt.axhline(y=1.0, color='blue', linestyle='-', alpha=0.7, label='Parity')
    plt.axhline(y=0.8, color='red', linestyle='--', alpha=0.7, label='Lower Threshold')
    plt.axhline(y=1.2, color='orange', linestyle='--', alpha=0.7, label='Upper Threshold')
    
    # Add count labels
    for i, val in enumerate(values):
        count = disparities[val]['count']
        plt.text(i, disparity_values[i] + 0.05, f"n={count}", ha='center')
    
    plt.title(title if title else f'Disparity by {attribute.capitalize()}', fontsize=14)
    plt.xlabel(attribute.capitalize(), fontsize=12)
    plt.ylabel('Disparity (relative to reference group)', fontsize=12)
    plt.xticks(rotation=45, ha='right')
    plt.ylim(0, max(disparity_values) * 1.2)
    plt.legend()
    plt.tight_layout()
    
    return plt.gcf()

try:
    # Only try Aequitas plotting if we have valid bias metrics
    if 'bias_df' in locals() and isinstance(bias_df, pd.DataFrame) and not bias_df.empty:
        # Initialize the plotting utility
        plot = Plot()

        # Plot disparities by nationality
        print("Plotting bias disparities by nationality...")
        fig1 = plot.plot_disparity(bias_df, 
                            group_metric='ppr_disparity',  # Positive prediction rate disparity
                            attribute_name='nationality',
                            significance_alpha=0.05,
                            fig_size=(12, 8))

        # Plot disparities by gender
        print("\nPlotting bias disparities by gender...")
        fig2 = plot.plot_disparity(bias_df, 
                            group_metric='ppr_disparity',  # Positive prediction rate disparity
                            attribute_name='gender',
                            significance_alpha=0.05,
                            fig_size=(12, 8))

        # Plot disparities by age group
        print("\nPlotting bias disparities by age group...")
        fig3 = plot.plot_disparity(bias_df, 
                            group_metric='ppr_disparity',  # Positive prediction rate disparity
                            attribute_name='age_group',
                            significance_alpha=0.05,
                            fig_size=(12, 8))

        # Plot multiple metrics for nationality groups
        print("\nPlotting multiple metrics by nationality...")
        fig4 = plot.plot_group_metric_all(bias_df, 
                                metrics=['precision', 'recall', 'fpr', 'for'],
                                attribute_name='nationality',
                                fig_size=(14, 10))

        # Fairness evaluations
        if 'fairness_df' in locals() and isinstance(fairness_df, pd.DataFrame) and not fairness_df.empty:
            print("\nPlotting fairness results...")
            fig5 = plot.plot_fairness_group(fairness_df, 
                                    group_metric='precision',
                                    title='Precision Fairness by Demographic Group',
                                    attribute_name='nationality',
                                    fig_size=(12, 8))
        else:
            print("\nSkipping fairness plot due to missing fairness data")
    else:
        print("\nCannot use Aequitas plotting due to missing bias metrics")
        
        # Create custom disparity plots using the simple disparity calculations
        if 'simple_disparities' in locals() and isinstance(simple_disparities, dict):
            print("\nCreating custom disparity plots...")
            
            # Plot nationality disparities
            if 'nationality' in simple_disparities:
                fig1 = create_custom_disparity_plot(
                    simple_disparities['nationality'], 
                    'nationality', 
                    'moroccan',
                    'Score Disparity by Nationality (reference: moroccan)'
                )
                plt.show()
            
            # Plot gender disparities
            if 'gender' in simple_disparities:
                fig2 = create_custom_disparity_plot(
                    simple_disparities['gender'], 
                    'gender', 
                    'male',
                    'Score Disparity by Gender (reference: male)'
                )
                plt.show()
                
            # Plot age group disparities
            if 'age_group' in simple_disparities:
                fig3 = create_custom_disparity_plot(
                    simple_disparities['age_group'], 
                    'age_group', 
                    '25-34',
                    'Score Disparity by Age Group (reference: 25-34)'
                )
                plt.show()
                
            # Create a heatmap of average scores by demographic intersections
            print("\nCreating heatmap of scores by demographic intersections...")
            
            # Create pivot tables for intersectional analysis
            gender_nationality_pivot = pd.pivot_table(
                bias_audit_df, 
                values='score', 
                index=['gender'], 
                columns=['nationality'],
                aggfunc='mean'
            )
            
            age_nationality_pivot = pd.pivot_table(
                bias_audit_df, 
                values='score', 
                index=['age_group'], 
                columns=['nationality'],
                aggfunc='mean'
            )
            
            # Plot heatmaps
            plt.figure(figsize=(14, 6))
            sns.heatmap(gender_nationality_pivot, annot=True, cmap='viridis', fmt='.2f', cbar_kws={'label': 'Average Score'})
            plt.title('Average Scores by Gender and Nationality Intersection', fontsize=14)
            plt.tight_layout()
            plt.show()
            
            plt.figure(figsize=(14, 6))
            sns.heatmap(age_nationality_pivot, annot=True, cmap='viridis', fmt='.2f', cbar_kws={'label': 'Average Score'})
            plt.title('Average Scores by Age Group and Nationality Intersection', fontsize=14)
            plt.tight_layout()
            plt.show()
        else:
            print("No disparity data available for plotting")
            
except Exception as e:
    print(f"\nError in plotting: {e}")
    print("Continuing with the rest of the analysis...")
    
    # Create basic visualizations if Aequitas plotting fails
    print("\nCreating basic visualizations for demographic disparities...")
    
    # Plot average scores by demographic group
    fig, axes = plt.subplots(3, 1, figsize=(12, 18))
    
    # Nationality
    nationality_scores = bias_audit_df.groupby('nationality')['score'].mean().sort_values(ascending=False)
    nationality_counts = bias_audit_df.groupby('nationality').size()
    
    ax0 = sns.barplot(x=nationality_scores.index, y=nationality_scores.values, ax=axes[0])
    axes[0].set_title('Average Scores by Nationality', fontsize=14)
    axes[0].set_ylabel('Average Score', fontsize=12)
    axes[0].set_xlabel('Nationality', fontsize=12)
    axes[0].set_xticklabels(axes[0].get_xticklabels(), rotation=45, ha='right')
    
    # Add count labels
    for i, v in enumerate(nationality_scores.values):
        ax0.text(i, v + 0.02, f"n={nationality_counts[nationality_scores.index[i]]}", ha='center')
    
    # Gender
    gender_scores = bias_audit_df.groupby('gender')['score'].mean().sort_values(ascending=False)
    gender_counts = bias_audit_df.groupby('gender').size()
    
    ax1 = sns.barplot(x=gender_scores.index, y=gender_scores.values, ax=axes[1])
    axes[1].set_title('Average Scores by Gender', fontsize=14)
    axes[1].set_ylabel('Average Score', fontsize=12)
    axes[1].set_xlabel('Gender', fontsize=12)
    
    # Add count labels
    for i, v in enumerate(gender_scores.values):
        ax1.text(i, v + 0.02, f"n={gender_counts[gender_scores.index[i]]}", ha='center')
    
    # Age Group
    age_scores = bias_audit_df.groupby('age_group')['score'].mean().sort_values(ascending=False)
    age_counts = bias_audit_df.groupby('age_group').size()
    
    ax2 = sns.barplot(x=age_scores.index, y=age_scores.values, ax=axes[2])
    axes[2].set_title('Average Scores by Age Group', fontsize=14)
    axes[2].set_ylabel('Average Score', fontsize=12)
    axes[2].set_xlabel('Age Group', fontsize=12)
    
    # Add count labels
    for i, v in enumerate(age_scores.values):
        ax2.text(i, v + 0.02, f"n={age_counts[age_scores.index[i]]}", ha='center')
    
    plt.tight_layout()
    plt.show()

## 6. Detailed Analysis of Demographic Impact

Let's analyze how the RAG system performs across different demographic intersections.

In [None]:
# Analyze intersectional biases (combinations of attributes)
def analyze_intersectional_bias(results_df, attributes, score_col='relevance_score'):
    """Analyze bias across combinations of demographic attributes"""
    
    # Create a combined attribute column
    results_df['demographic_intersection'] = results_df[attributes].apply(
        lambda x: ' + '.join(x.values.astype(str)), axis=1)
    
    # Calculate mean scores by intersection
    intersection_scores = results_df.groupby('demographic_intersection')[score_col].agg(
        ['mean', 'count']).sort_values(by='mean', ascending=False)
    
    # Only keep intersections with enough data
    intersection_scores = intersection_scores[intersection_scores['count'] >= 2]
    
    return intersection_scores

# Analyze intersectional bias
print("Analyzing intersectional bias between gender and nationality...")
gender_nationality = analyze_intersectional_bias(results_df, ['gender', 'nationality'])
print(gender_nationality)

print("\nAnalyzing intersectional bias between age group and nationality...")
age_nationality = analyze_intersectional_bias(results_df, ['age_group', 'nationality'])
print(age_nationality)

# Visualize the top and bottom intersections
def plot_top_bottom_intersections(intersection_scores, title, n=5):
    """Plot the top and bottom performing demographic intersections"""
    
    # Get top and bottom groups
    top_n = intersection_scores.head(n)
    bottom_n = intersection_scores.tail(n)
    
    # Combine for plotting
    plot_data = pd.concat([top_n, bottom_n])
    
    # Plot
    plt.figure(figsize=(14, 8))
    
    # Create colormap
    colors = np.where(plot_data.index.isin(top_n.index), 'green', 'red')
    
    # Create bar plot
    bars = plt.bar(plot_data.index, plot_data['mean'], color=colors)
    
    # Add counts as text on bars
    for bar, count in zip(bars, plot_data['count']):
        plt.text(bar.get_x() + bar.get_width()/2, 
                bar.get_height() + 0.01, 
                f"n={count}", 
                ha='center',
                fontweight='bold')
    
    plt.title(title, fontsize=16)
    plt.xlabel('Demographic Intersection', fontsize=12)
    plt.ylabel('Mean Relevance Score', fontsize=12)
    plt.xticks(rotation=45, ha='right')
    plt.ylim(0, 1.0)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.tight_layout()
    
    # Add legend
    import matplotlib.patches as mpatches
    green_patch = mpatches.Patch(color='green', label='Top performing groups')
    red_patch = mpatches.Patch(color='red', label='Bottom performing groups')
    plt.legend(handles=[green_patch, red_patch], loc='upper right')
    
    plt.show()

# Plot top and bottom intersections
plot_top_bottom_intersections(
    gender_nationality, 
    'Top and Bottom Performing Gender + Nationality Intersections', 
    n=3)

plot_top_bottom_intersections(
    age_nationality, 
    'Top and Bottom Performing Age Group + Nationality Intersections',
    n=3)

## 7. Bias Mitigation Strategies

Based on our analysis, we can implement several strategies to mitigate bias in our RAG system.

In [None]:
# Define bias mitigation strategies for RAG systems
bias_mitigation_strategies = {
    "data": [
        "Diversify training corpus to ensure equal representation across demographics",
        "Augment data for underrepresented groups",
        "Balance the knowledge base across different cultural contexts",
        "Include content from diverse authors and sources"
    ],
    "retrieval": [
        "Implement fairness-aware retrieval algorithms",
        "Apply equal representation constraints during document retrieval",
        "Use bias-detecting metrics to filter or re-rank retrieved documents",
        "Implement demographic-aware re-ranking"
    ],
    "prompting": [
        "Design culturally-neutral prompts",
        "Include explicit fairness instructions in prompts",
        "Use prompt templates that encourage inclusive responses",
        "Add bias awareness to system prompts"
    ],
    "monitoring": [
        "Implement continuous bias monitoring across demographic groups",
        "Set up alerts for fairness metric degradation",
        "Periodically retrain using bias-aware techniques",
        "Create dashboards for demographic performance parity"
    ],
    "evaluation": [
        "Evaluate system across diverse user personas",
        "Include fairness metrics in evaluation frameworks",
        "Conduct regular bias audits using tools like Aequitas",
        "Incorporate user feedback from diverse demographic groups"
    ]
}

# Simple function to display strategies
def display_strategies(strategies):
    for category, items in strategies.items():
        print(f"\n## {category.capitalize()} Strategies:")
        for i, strategy in enumerate(items, 1):
            print(f"{i}. {strategy}")

# Display bias mitigation strategies
print("# Bias Mitigation Strategies for RAG Systems")
display_strategies(bias_mitigation_strategies)

# Create a simulated bias mitigation implementation plan
print("\n# Implementation Plan Example")
print("""
## Short-term Actions (1-2 weeks)
1. Implement fairness metrics monitoring in the evaluation pipeline
2. Add demographic tracking to user queries (with appropriate privacy controls)
3. Create bias awareness prompts for the RAG system

## Medium-term Actions (1-2 months)
1. Augment knowledge base with content from underrepresented groups
2. Implement demographic-aware re-ranking algorithm
3. Develop fairness-aware retrieval mechanisms

## Long-term Actions (3-6 months)
1. Build comprehensive bias monitoring dashboard
2. Develop automated bias detection and mitigation system
3. Conduct regular external audits of system performance
4. Create a diverse evaluation dataset covering all demographic groups
""")

## 8. Create a Simple Monitoring Dashboard

Let's create a simple monitoring dashboard for tracking bias metrics over time.

In [None]:
# Create a simulated time series of bias metrics
import datetime

# Generate dates for the past 30 days
dates = [(datetime.datetime.now() - datetime.timedelta(days=i)).strftime('%Y-%m-%d') 
         for i in range(30, 0, -1)]

# Helper function to generate bias metrics time series with gradual improvement
def generate_bias_metrics_series(start_value, end_value, dates, noise_level=0.05):
    values = np.linspace(start_value, end_value, len(dates))
    # Add some noise for more realistic data
    noise = np.random.normal(0, noise_level, len(dates))
    return [max(0, min(1, v + n)) for v, n in zip(values, noise)]

# Generate bias metrics time series for different demographics
bias_metrics_df = pd.DataFrame({
    'date': dates,
    'moroccan_fairness': generate_bias_metrics_series(0.95, 0.98, dates),
    'french_fairness': generate_bias_metrics_series(0.91, 0.95, dates),
    'american_fairness': generate_bias_metrics_series(0.89, 0.94, dates),
    'nigerian_fairness': generate_bias_metrics_series(0.75, 0.85, dates),
    'chinese_fairness': generate_bias_metrics_series(0.72, 0.83, dates),
    'male_fairness': generate_bias_metrics_series(0.94, 0.96, dates),
    'female_fairness': generate_bias_metrics_series(0.82, 0.91, dates),
    'non_binary_fairness': generate_bias_metrics_series(0.79, 0.89, dates),
    '18_24_fairness': generate_bias_metrics_series(0.93, 0.96, dates),
    '25_34_fairness': generate_bias_metrics_series(0.95, 0.97, dates),
    '35_44_fairness': generate_bias_metrics_series(0.89, 0.93, dates),
    '45plus_fairness': generate_bias_metrics_series(0.80, 0.89, dates)
})

# Convert date to datetime
bias_metrics_df['date'] = pd.to_datetime(bias_metrics_df['date'])

# Plot the time series
plt.figure(figsize=(14, 10))

# Plot nationality fairness trends
plt.subplot(3, 1, 1)
for col in ['moroccan_fairness', 'french_fairness', 'american_fairness', 'nigerian_fairness', 'chinese_fairness']:
    plt.plot(bias_metrics_df['date'], bias_metrics_df[col], marker='o', label=col.replace('_fairness', ''))
plt.title('Nationality Fairness Trends', fontsize=14)
plt.ylabel('Fairness Score', fontsize=12)
plt.grid(True, alpha=0.3)
plt.ylim(0.7, 1.0)
plt.legend()

# Plot gender fairness trends
plt.subplot(3, 1, 2)
for col in ['male_fairness', 'female_fairness', 'non_binary_fairness']:
    plt.plot(bias_metrics_df['date'], bias_metrics_df[col], marker='o', label=col.replace('_fairness', ''))
plt.title('Gender Fairness Trends', fontsize=14)
plt.ylabel('Fairness Score', fontsize=12) 
plt.grid(True, alpha=0.3)
plt.ylim(0.7, 1.0)
plt.legend()

# Plot age group fairness trends
plt.subplot(3, 1, 3)
for col in ['18_24_fairness', '25_34_fairness', '35_44_fairness', '45plus_fairness']:
    plt.plot(bias_metrics_df['date'], bias_metrics_df[col], marker='o', 
             label=col.replace('_fairness', '').replace('_', '-'))
plt.title('Age Group Fairness Trends', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Fairness Score', fontsize=12)
plt.grid(True, alpha=0.3)
plt.ylim(0.7, 1.0)
plt.legend()

plt.tight_layout()
plt.savefig("../bias_monitoring_trends.png", dpi=300, bbox_inches='tight')
plt.show()

print("Bias monitoring dashboard created and saved as 'bias_monitoring_trends.png'")

## 9. Implement a Simple Bias-Aware RAG System

Let's demonstrate how to modify a RAG system to be more aware of potential bias.

In [None]:
# Example implementation of a bias-aware RAG system

def bias_aware_rag_query(question, user_attributes=None):
    """
    A bias-aware RAG query function that attempts to mitigate potential bias
    
    Args:
        question: The user's question
        user_attributes: Optional demographic attributes for monitoring
        
    Returns:
        Response and metadata
    """
    # 1. First, create a bias-aware prompt enhancement
    bias_aware_prompt = """
    Please provide an informative, accurate answer based on the retrieved content.
    Ensure your response is fair, balanced, and avoids cultural, gender, or age biases.
    Consider diverse perspectives and be inclusive in your language.
    Base your answer solely on the factual content provided.
    """
    
    # 2. Make sure retrieval considers diversity
    def diversity_aware_retrieval(question):
        """
        Enhanced retrieval function that considers diversity of sources
        """
        # Simulated retrieval function
        # In a real system, this would:
        # - Ensure sources from diverse perspectives
        # - Balance cultural contexts in retrieved documents
        # - Apply fairness constraints to retrieval
        
        return [
            "First simulated retrieved content",
            "Second simulated retrieved content from diverse perspective",
            "Third simulated retrieved content with balanced viewpoint"
        ]
    
    # 3. Retrieve content
    retrieved_content = diversity_aware_retrieval(question)
    
    # 4. Generate response (simulated)
    response = f"This is a bias-aware response to: {question}"
    
    # 5. Log demographic information for monitoring (if provided)
    if user_attributes:
        # In a real system, this would anonymize and store metrics by demographic group
        print(f"Logging query metrics for demographic analysis: {user_attributes}")
    
    # 6. Return response with metadata
    return {
        "response": response,
        "bias_aware": True,
        "diversity_enhanced_retrieval": True,
        "retrieved_content_count": len(retrieved_content)
    }

# Demonstrate with a test question
test_question = "What are the housing options for students at AUI?"
test_attributes = {"nationality": "nigerian", "gender": "female", "age_group": "18-24"}

response = bias_aware_rag_query(test_question, test_attributes)
print("Bias-aware RAG response:")
print(json.dumps(response, indent=2))

# Example of demographic-aware prompt enhancement
def create_demographic_aware_prompt(question, demographics=None):
    """Create a prompt that's sensitive to demographic context"""
    
    base_prompt = """
    Answer the following question based on the retrieved context.
    Provide a factual, informative response.
    """
    
    # Add demographic awareness when appropriate
    if demographics:
        # For international students
        if demographics.get("nationality") != "moroccan":
            base_prompt += """
            Make sure to explain any Morocco-specific or AUI-specific terms that may not be familiar to international students.
            """
        
        # For older students
        if demographics.get("age_group") in ["35-44", "45+"]:
            base_prompt += """
            Consider the perspective of non-traditional students who may have different needs and experiences.
            """
    
    # Always add bias mitigation instructions
    base_prompt += """
    Ensure your response is culturally inclusive and avoids assumptions based on gender, nationality, or age.
    """
    
    return base_prompt + f"\nQuestion: {question}"

# Show examples of demographic-aware prompts
print("\nExample of demographic-aware prompts:")

print("\n1. For a Moroccan student:")
print(create_demographic_aware_prompt(
    "How do I apply for campus housing?",
    {"nationality": "moroccan", "gender": "male", "age_group": "18-24"}
))

print("\n2. For an international student:")
print(create_demographic_aware_prompt(
    "How do I apply for campus housing?",
    {"nationality": "chinese", "gender": "female", "age_group": "18-24"}
))

print("\n3. For an older student:")
print(create_demographic_aware_prompt(
    "How do I apply for campus housing?",
    {"nationality": "american", "gender": "male", "age_group": "45+"}
))

## 10. Conclusion and Recommendations

Based on our bias audit, we can make the following recommendations for implementing a fair and unbiased RAG system for AUI.

In [None]:
# Generate recommendations based on our findings

recommendations = {
    "data_enhancement": [
        "Expand the university corpus with materials representing diverse student demographics",
        "Include content specifically for international students (in English)",
        "Ensure housing, counseling, and admission information addresses needs of all age groups",
        "Add more content authored by diverse faculty and staff members"
    ],
    "system_modifications": [
        "Implement demographic-aware prompting to provide appropriate context",
        "Add fairness constraints to the retrieval system",
        "Create a bias monitoring pipeline with weekly reports",
        "Set up alerting when demographic disparity exceeds threshold of 15%"
    ],
    "evaluation_practices": [
        "Include user feedback from diverse demographic groups in ongoing evaluation",
        "Conduct quarterly bias audits using Aequitas",
        "Establish fairness benchmarks required for any system updates",
        "Create a diverse test set with questions from different demographic perspectives"
    ],
    "governance": [
        "Establish a fairness review board with representatives from diverse backgrounds",
        "Create clear guidelines for acceptable bias thresholds",
        "Implement transparent reporting of system performance across demographics",
        "Develop a bias response protocol for addressing discovered issues"
    ]
}

# Display recommendations
print("# Recommendations for Implementing a Fair RAG System at AUI\n")

for category, items in recommendations.items():
    print(f"## {category.replace('_', ' ').title()}\n")
    for item in items:
        print(f"- {item}")
    print()

# Save the bias audit summary
audit_summary = {
    "date": datetime.datetime.now().strftime("%Y-%m-%d"),
    "system_name": "AUIChat RAG System",
    "bias_findings": {
        "nationality": "Moderate bias detected against non-Moroccan nationalities, particularly Chinese and Nigerian users",
        "gender": "Slight bias detected against female users",
        "age_group": "Moderate bias detected against users 45+"
    },
    "recommendations": recommendations
}

# Save as JSON
with open("../bias_audit_summary.json", "w") as f:
    json.dump(audit_summary, f, indent=2)

print("Bias audit summary saved to '../bias_audit_summary.json'")