# 🧮 Fairness Analysis – Intermediate Level

This notebook compares fairness metrics across two datasets:
- `loan_data.csv`: Loan approval decisions
- `promotion_data.csv`: Account promotion exposure

We assess fairness using common group fairness metrics, focusing on caste-based disparities.


## 📊 Datasets
We load and inspect two datasets for comparison.

In [1]:
import pandas as pd

loan_df = pd.read_csv('../beginner/data/loan_data.csv')
promo_df = pd.read_csv('./data/promotion_data.csv')

loan_df.head()

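| Unnamed: 0 | user_id | gender | caste | religion | age | income | accent_score | loan_score | approved |
|---|---|---|---|---|---|---|---|---|---|
| 0 | U000 | Male | SC/ST | Other | 41 | 65247 | 54.84 | 72.8 | 1 |
| 1 | U001 | Female | OBC | Hindu | 52 | 92752 | 96.98 | 71.51 | 1 |
| 2 | U002 | Male | SC/ST | Hindu | 43 | 86573 | 77.37 | 52.68 | 0 |
| 3 | U003 | Male | SC/ST | Hindu | 53 | 89101 | 63.78 | 66.9 | 1 |
| 4 | U004 | Male | OBC | Other | 23 | 56646 | 64.51 | 84.0 | 1 |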


## 📐 Fairness Metrics
We define three metrics using caste as the protected attribute. Only the first is computed in this notebook; the other two are left as exercises.

### 1. **Disparate Impact Ratio (DIR)**
Let $P_+^{adv}$ be the favorable outcome rate for the advantaged group, and $P_+^{dis}$ for the disadvantaged group:

$$ DIR = \frac{P_+^{dis}}{P_+^{adv}} $$

### 2. **Equal Opportunity Difference (EOD)**
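EOD compares the true positive rate (TPR) of the two groups, where TPR is the share of truly positive cases that receive the favorable outcome. A negative value means the disadvantaged group has the lower TPR: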

$$ EOD = TPR_{dis} - TPR_{adv} $$
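
A minimal sketch of this formula follows. It assumes a DataFrame with a ground-truth label column and a prediction column; the names `y_true`, `y_pred`, and `group`, the helper `equal_opportunity_difference`, and the toy rows are all hypothetical and do not come from `loan_data.csv`.

```python
import pandas as pd

def equal_opportunity_difference(df, group_col, y_true_col, y_pred_col,
                                 adv_value, dis_value):
    """Return TPR_dis - TPR_adv for a binary label and binary prediction."""
    # TPR only considers rows whose true label is positive
    positives = df[df[y_true_col] == 1]
    # Mean of a 0/1 prediction among actual positives is the TPR
    tpr = positives.groupby(group_col)[y_pred_col].mean()
    return tpr[dis_value] - tpr[adv_value]

# Toy data (hypothetical): 3 true positives in "adv", 2 in "dis"
toy = pd.DataFrame({
    "group":  ["adv", "adv", "adv", "adv", "dis", "dis", "dis", "dis"],
    "y_true": [1, 1, 1, 0, 1, 1, 0, 0],
    "y_pred": [1, 1, 0, 0, 1, 0, 1, 0],
})

eod = equal_opportunity_difference(toy, "group", "y_true", "y_pred", "adv", "dis")
print(round(eod, 3))  # TPR_dis = 1/2, TPR_adv = 2/3
```

On the loan data, this would require deciding which column plays the role of the ground-truth label, which the dataset preview alone does not settle.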

### 3. **Group Accuracy**
Let $Acc_g$ be the accuracy for group $g$, with the confusion-matrix counts (TP, TN, FP, FN) taken over members of group $g$ only:

$$ Acc_g = \frac{TP + TN}{TP + TN + FP + FN} $$
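
As a rough illustration, per-group accuracy can be computed by marking each row as correct or incorrect and averaging within each group. The column names `group`, `y_true`, and `y_pred`, the helper `accuracy_by_group`, and the toy rows below are hypothetical and are not taken from the notebook's datasets.

```python
import pandas as pd

def accuracy_by_group(df, group_col, y_true_col, y_pred_col):
    """Return a Series mapping each group to (TP + TN) / (TP + TN + FP + FN)."""
    # A row is correct when prediction equals label (covers both TP and TN)
    correct = df[y_true_col] == df[y_pred_col]
    # Averaging the boolean flag within each group gives that group's accuracy
    return correct.groupby(df[group_col]).mean()

# Toy data (hypothetical)
toy = pd.DataFrame({
    "group":  ["adv", "adv", "adv", "adv", "dis", "dis", "dis", "dis"],
    "y_true": [1, 1, 1, 0, 1, 1, 0, 0],
    "y_pred": [1, 1, 0, 0, 1, 0, 1, 0],
})

acc = accuracy_by_group(toy, "group", "y_true", "y_pred")
print(acc)
```

In this toy example the "adv" group has 3 of 4 rows correct and the "dis" group has 2 of 4.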


In [4]:
print(loan_df['caste'].value_counts())


caste
General    52
OBC        25
SC/ST      23
Name: count, dtype: int64


In [5]:
def compute_dir_robust(df, protected_col, outcome_col, adv_value, dis_value):
    """Disparate Impact Ratio: favorable-outcome rate of the disadvantaged
    group divided by that of the advantaged group, rounded to 3 decimals.
    Returns None when the advantaged-group rate is missing or zero."""
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Normalize group labels: strip whitespace and lowercase
    df[protected_col] = df[protected_col].str.strip().str.lower()
    adv_value = adv_value.strip().lower()
    dis_value = dis_value.strip().lower()

    # Mean of a 0/1 outcome is the favorable-outcome rate for each group
    p_adv = df.loc[df[protected_col] == adv_value, outcome_col].mean()
    p_dis = df.loc[df[protected_col] == dis_value, outcome_col].mean()

    # The ratio is undefined if the advantaged group is empty or never approved
    if pd.isna(p_adv) or p_adv == 0:
        return None
    return round(p_dis / p_adv, 3)


In [6]:
compute_dir_robust(loan_df, 'caste', 'approved', 'General', 'SC/ST')


0.822

**Observations:** Applicants from the SC/ST group are approved for loans at 82.2% of the rate of applicants from the General group.

**Interpretation:** Under the "four-fifths rule", a DIR below 0.80 is commonly treated as evidence of potential disparate impact. At 0.822, this example sits just above that threshold, so it passes the rule only narrowly. With just 23 SC/ST and 52 General records in the sample, the gap still deserves further analysis.
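
The threshold comparison can be sketched as a small helper. The function name `passes_four_fifths` and its `threshold` parameter are hypothetical; the value 0.822 is the DIR reported above.

```python
def passes_four_fifths(dir_value, threshold=0.8):
    """Return True when the DIR meets or exceeds the threshold.

    Returns None when the DIR itself is undefined (None).
    """
    if dir_value is None:
        return None
    return dir_value >= threshold

dir_loans = 0.822                      # value reported by compute_dir_robust
result = passes_four_fifths(dir_loans)
margin = round(dir_loans - 0.8, 3)     # distance above the threshold

print(result, margin)
```

A small margin like this is a reason to look closer rather than a clean bill of health, since the rule is a screening heuristic and not a statistical test.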

📌 You are encouraged to:
- Try the same formula on `promo_df`
- Implement Equal Opportunity Difference (EOD) and per-group accuracy
- Compare and summarize the disparity across tasks