# Fairness Evaluation

Beyond predicting credit labels accurately, our project aims to develop a **fair and responsible model**. To this end, we adopted the **FEAT framework** (*Fairness, Ethics, Accountability, and Transparency*), a set of principles published by the Monetary Authority of Singapore to guide responsible AI use in the financial sector.

We focused on two key **sensitive attributes**: `gender` and `age`. Since both features were ranked highly in our feature importance analysis, we chose not to remove them prematurely. Instead, we conducted a post-modeling fairness evaluation to assess potential bias.

## Fairness Metrics Used

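- **Equal Opportunity Difference (EOD):**  
  Measures whether individuals who *truly qualify* (i.e., should receive a positive outcome) have equal chances of being correctly classified across demographic groups, i.e. the gap in **true positive rates**. Note that the code in this section calls fairlearn's `equalized_odds_difference`, which reports the larger of the true positive rate gap and the false positive rate gap, so the printed value is an upper bound on the equal opportunity difference.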

- **Demographic Parity Difference (DPD):**  
  Assesses the **difference in positive prediction rates** between groups, regardless of their actual qualification.
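
For reference, the two definitions above can be written out as follows. This is a sketch in notation introduced here for illustration: $\hat{Y}$ is the predicted label, $Y$ the true label, and $A$ the sensitive attribute. With more than two groups it assumes the max-minus-min convention, which is how fairlearn aggregates its difference metrics.

$$
\begin{aligned}
\mathrm{DPD} &= \max_{a} P(\hat{Y}=1 \mid A=a) \;-\; \min_{a} P(\hat{Y}=1 \mid A=a) \\
\mathrm{EOD} &= \max_{a} P(\hat{Y}=1 \mid Y=1, A=a) \;-\; \min_{a} P(\hat{Y}=1 \mid Y=1, A=a)
\end{aligned}
$$

Both quantities are 0 when every group has the same rate and approach 1 as the groups diverge.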

By comparing these fairness-specific metrics across subgroups, we aim to uncover any disparities in model behavior that could disproportionately affect certain demographic groups.


In [1]:
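# Import only the two fairness metrics used below instead of a wildcard import
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference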
import pandas as pd
from sklearn.metrics import accuracy_score

test_df = pd.read_csv("test_set_with_predictions.csv")
y_test_binary = test_df['credit_status']
y_pred_custom = test_df['predicted_credit_status']

We grouped `age` into three brackets based on life stages:  
- **<35**: young adults or individuals early in their careers  
- **35–50**: those likely to be in stable employment or supporting families  
- **>50**: individuals approaching retirement  

The `gender` attribute was categorized as **male** or **female**.
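
Because the `age` column is already scaled, the bracket boundaries have to be expressed on the scaled axis. The sketch below shows how that conversion could look under two assumptions that the source does not state: that the scaler was a min-max scaler, and that the original ages ranged from 20 to 69. The names `min_max_scale`, `AGE_MIN`, and `AGE_MAX` are hypothetical. Under those assumptions the results agree, to within rounding, with the cut points used in the binning cell.

```python
def min_max_scale(age, age_min, age_max):
    """Map an age in [age_min, age_max] onto [0, 1] (min-max scaling)."""
    return (age - age_min) / (age_max - age_min)


# Assumed original age range; not stated in the source.
AGE_MIN, AGE_MAX = 20, 69

cut_35 = min_max_scale(35, AGE_MIN, AGE_MAX)
cut_50 = min_max_scale(50, AGE_MIN, AGE_MAX)

print(round(cut_35, 6), round(cut_50, 6))  # 0.306122 0.612245
```

If the real scaler or age range differs, the cut points should be recomputed from the fitted scaler rather than hard-coded.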

In [9]:
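# Create the 'age_bin' column. 'age' is already scaled, so the cut points are the
# scaled equivalents of the original age boundaries.
test_df['age_bin'] = pd.cut(
    test_df['age'],
    # Open top edge: with right=False a final edge of 1 would exclude a scaled age
    # of exactly 1.0 (the maximum) and turn it into NaN.
    bins=[0, 0.306122, 0.612244, float('inf')],
    labels=['<35', '35-50', '>50'],
    right=False  # Left-closed bins [a, b): a value equal to a cut point goes to the upper bin
)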

# Fairness metrics by gender
dpd_rf_gender = demographic_parity_difference(y_test_binary, y_pred_custom, sensitive_features=test_df['gender'])
eod_rf_gender = equalized_odds_difference(y_test_binary, y_pred_custom, sensitive_features=test_df['gender'])

print("Random Forest Fairness Metrics by Gender:")
print(f"- Demographic Parity Difference: {dpd_rf_gender:.4f}")
print(f"- Equalized Odds Difference: {eod_rf_gender:.4f}")

# Fairness metrics by age group
dpd_rf_age = demographic_parity_difference(y_test_binary, y_pred_custom, sensitive_features=test_df['age_bin'])
eod_rf_age = equalized_odds_difference(y_test_binary, y_pred_custom, sensitive_features=test_df['age_bin'])

print("\nRandom Forest Fairness Metrics by Age Group:")
print(f"- Demographic Parity Difference: {dpd_rf_age:.4f}")
print(f"- Equalized Odds Difference: {eod_rf_age:.4f}")


Random Forest Fairness Metrics by Gender:
- Demographic Parity Difference: 0.0020
- Equalized Odds Difference: 0.0057

Random Forest Fairness Metrics by Age Group:
- Demographic Parity Difference: 0.0566
- Equalized Odds Difference: 0.0573
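
The summary numbers above are gaps between the best- and worst-served group, so they do not show which group drives the gap. A per-group breakdown makes that visible. The following is a minimal plain-Python sketch of such a breakdown. The labels and predictions are made-up toy values, not project data, and `group_rates` and `gap` are hypothetical helper names.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR and FPR for binary 0/1 labels.

    Assumes every group contains at least one positive and one negative example.
    """
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += yp
        if yt == 1:
            c["pos"] += 1
            c["tp"] += yp
        else:
            c["neg"] += 1
            c["fp"] += yp
    return {
        g: {
            "selection_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"],
            "fpr": c["fp"] / c["neg"],
        }
        for g, c in counts.items()
    }


def gap(rates, key):
    """Max-minus-min gap of one rate across all groups."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Toy data for illustration only (four applicants per age bracket).
y_true = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0]
groups = ["<35"] * 4 + ["35-50"] * 4 + [">50"] * 4

rates = group_rates(y_true, y_pred, groups)
dpd = gap(rates, "selection_rate")   # demographic parity difference
tpr_gap = gap(rates, "tpr")          # equal opportunity difference
fpr_gap = gap(rates, "fpr")
eq_odds = max(tpr_gap, fpr_gap)      # equalized odds difference (worst of the two gaps)

for name, r in rates.items():
    print(name, r)
print(dpd, tpr_gap, fpr_gap, eq_odds)  # 0.25 0.5 0.0 0.5
```

In the toy data the `35-50` bracket has the highest selection rate and true positive rate, so it alone accounts for both gaps. The same kind of table on the real test set would show which age bracket is behind the reported differences.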


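The gender gaps (0.0020 and 0.0057) are negligible. The age gaps (0.0566 and 0.0573) are larger but still **close to zero**: the best- and worst-served age brackets differ by less than six percentage points on each metric. On these two metrics the model therefore shows **no large disparity** between demographic groups, even though age and gender were among the top predictors.

These findings suggest that our model does not **systematically disadvantage applicants** based on age or gender, at least with respect to demographic parity and equalized odds on this test set.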