#Homework 1

**Łukasz Niedźwiedzki Student ID 419328**


##Task 1

We have two populations Blue (privileged) and Red (unprivileged), with the Blue population being 9 times larger than the Red population.

Individuals from both populations are requesting to attend XAI training to improve their competency in this important area. The number of places is limited. The administrators of the training have decided to give priority to enrolling individuals who may need this training in the future, although unfortunately it is difficult to predict who will benefit.

The decision rule adopted:
1. In the Red group, half of the people will find the skills useful in the future and half will not. The administrators randomly allocate 50% of the people to training.
2. In the Blue group, 80% of people will find the training useful in the future and 20% will not, although of course it is not known who will find it useful. The administrators have built a predictive model, based on user behaviour, that predicts for whom the training will be useful and for whom it will not. The model has the following performance:


| Blue                     | Will use XAI | Will not use XAI | Total |
|--------------------------|--------------|------------------|-------|
| Enrolled in training     | 60           | 5                | 65    |
| Not enrolled in training | 20           | 15               | 35    |
| Total                    | 80           | 20               | 100   |


Task: Calculate the demographic parity, equal opportunity, and predictive rate parity coefficients for this decision rule.

Starred task: How can this decision rule be changed to improve its fairness?

###Solution

In the first step, I model the Red population by simulating the random allocation of training places across its two subpopulations (will use XAI / will not use XAI). Because the allocation is random, the metrics can change slightly from one run to the next. I then combine the simulated Red counts with the provided table for the Blue population (which could be modeled in a similar manner).

In [94]:
#@title Create DataFrame
import pandas as pd
import numpy as np

blue_total = 100

red_total = int(blue_total / 9)

red_enrolled = int(np.ceil(red_total*0.5)) #ceil because red_total is odd (11) when blue_total=100
red_not_enrolled = int(red_total - red_enrolled)

rwux = int(red_total*0.5) #red will use xai
rwnux = int(red_total - rwux) #red will not use xai

arr = np.random.permutation([1] * rwux + [0] * rwnux) # 1 will use xai, 0 will not use xai
sampled_idx = np.random.choice(red_total, red_enrolled, replace=False)
sampled = arr[sampled_idx] #red enrolled
remaining = np.delete(arr, sampled_idx) #red not enrolled

# Count ones and zeros
counts_sampled = np.bincount(sampled, minlength=2)
counts_remaining = np.bincount(remaining, minlength=2)

rewux = counts_sampled[1] #red enrolled will use xai
rewnux = counts_sampled[0] #red enrolled will not use xai

rnewux = counts_remaining[1] #red not enrolled will use xai
rnewnux = counts_remaining[0]  #red not enrolled will not use xai

data = {
    "group": ["Blue", "Red"],
    "total": [blue_total, red_total],
    "enrolled": [65, red_enrolled],
    "enrolled_will_use_XAI": [60, rewux],
    "enrolled_will_not_use_XAI": [5, rewnux],
    "not_enrolled_will_use_XAI": [20, rnewux],
    "not_enrolled_will_not_use_XAI": [15, rnewnux],
    "will use XAI": [60+20, rwux],
    "will_not_use_XAI": [5+15, rwnux]
}

df = pd.DataFrame(data)
df

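|   | group | total | enrolled | enrolled_will_use_XAI | enrolled_will_not_use_XAI | not_enrolled_will_use_XAI | not_enrolled_will_not_use_XAI | will use XAI | will_not_use_XAI |
|---|-------|-------|----------|-----------------------|---------------------------|---------------------------|-------------------------------|--------------|------------------|
| 0 | Blue  | 100   | 65       | 60                    | 5                         | 20                        | 15                            | 80           | 20               |
| 1 | Red   | 11    | 6        | 2                     | 4                         | 3                         | 2                             | 5            | 6                |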
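The Red row above comes from a single random allocation. As a rough illustration of how much it can vary, the number of enrolled Red members who will use XAI follows a hypergeometric distribution (6 places drawn without replacement from 5 future users and 6 non-users). The sketch below is not part of the original solution; the seed and the number of repetitions are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (arbitrary) so the sketch is reproducible

red_total = 11                  # same group size as in the cell above
red_will_use = 5                # Red members who will use XAI
red_will_not_use = red_total - red_will_use
red_enrolled = 6                # places allocated at random

# Count of enrolled Red members who will use XAI, over many random allocations.
draws = rng.hypergeometric(ngood=red_will_use, nbad=red_will_not_use,
                           nsample=red_enrolled, size=10_000)

values, counts = np.unique(draws, return_counts=True)
for v, c in zip(values, counts):
    print(f"enrolled_will_use_XAI = {v}: share of runs {c / draws.size:.3f}")

theory_mean = red_enrolled * red_will_use / red_total
print(f"simulated mean = {draws.mean():.3f}, theoretical mean = {theory_mean:.3f}")
```

With so few Red members, a change of one person in this count shifts the equal opportunity and predictive rate parity ratios noticeably, which is why single-run values should be read with care.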


In [95]:
#@title Calculate the metrics
# Calculate Demographic Parity (DP)
df['dp'] = df['enrolled'] / df['total']

# Calculate Equal Opportunity (EO)
df['eo'] = df['enrolled_will_use_XAI'] / (df['enrolled_will_use_XAI'] + df['not_enrolled_will_use_XAI'])

# Calculate Predictive Rate Parity (PRP)
df['prp'] = df['enrolled_will_use_XAI'] / df['enrolled']

metrics = {
    "Demographic Parity": df['dp'].iloc[1] / df['dp'].iloc[0],
    "Equal Opportunity": df['eo'].iloc[1] / df['eo'].iloc[0],
    "Predictive Rate Parity": df['prp'].iloc[1] / df['prp'].iloc[0]
}

metrics_df = pd.DataFrame([metrics])
metrics_df

|   | Demographic Parity | Equal Opportunity | Predictive Rate Parity |
|---|--------------------|-------------------|------------------------|
| 0 | 0.839161           | 0.533333          | 0.361111               |
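The three ratios can be checked by hand from the counts of this particular run (Red: 6 of 11 enrolled, 2 of the 5 future users enrolled, 2 of the 6 enrolled are future users), each divided by the corresponding Blue rate:

```latex
\begin{align*}
\text{Demographic parity ratio} &= \frac{6/11}{65/100} \approx 0.839 \\
\text{Equal opportunity ratio} &= \frac{2/5}{60/80} \approx 0.533 \\
\text{Predictive rate parity ratio} &= \frac{2/6}{60/65} \approx 0.361
\end{align*}
```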


###Results
Each metric is the ratio of the Red rate to the Blue rate, so a value of 1 means parity. For this simulated allocation, the results suggest disparities between the Red and Blue groups across all three metrics:

Demographic Parity (0.839): The Red group has a lower enrollment rate than the Blue group (6/11 ≈ 0.55 versus 65/100 = 0.65).

Equal Opportunity (0.533): Among people who will use XAI, those in the Red group are less likely to be enrolled than those in the Blue group (2/5 = 0.40 versus 60/80 = 0.75), so Red members who would benefit have reduced access to the training.

Predictive Rate Parity (0.361): Among enrolled people, a smaller share of the Red group will actually use XAI (2/6 ≈ 0.33 versus 60/65 ≈ 0.92). This reflects the random allocation in the Red group, compared with the model-based selection in the Blue group.

####How can this decision rule be changed to improve its fairness?

1. Equal Quota Allocation: Allocate a fixed share of spots to each group, proportional to its size, so that both groups have the same enrollment rate. In that way we could improve Demographic Parity (equal representation).

2. Weighted Enrollment: Adjust the allocation based on the proportion of individuals likely to benefit from the training, giving a larger percentage of spots to the Red group. With random allocation, the Red true positive rate equals the Red enrollment rate, so raising that rate toward the Blue true positive rate (0.75) improves the Equal Opportunity metric (enhanced access for the underprivileged group).
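To make the two proposals concrete, here is a minimal sketch of the ratios of expected rates when the Red enrollment rate is changed but the allocation stays random. The function name `expected_ratios` is hypothetical, the Blue rates are taken from the task table, and the rounding to 11 Red members is ignored, so these are idealized values rather than simulation results.

```python
def expected_ratios(red_enroll_rate, red_base_rate=0.5):
    """Red/Blue ratios of expected rates when Red places are assigned at random."""
    blue_enroll_rate, blue_tpr, blue_ppv = 65 / 100, 60 / 80, 60 / 65
    return {
        "dp": red_enroll_rate / blue_enroll_rate,
        "eo": red_enroll_rate / blue_tpr,  # random allocation: expected TPR = enrollment rate
        "prp": red_base_rate / blue_ppv,   # random allocation: expected PPV = base rate
    }

for rate in (0.50, 0.65, 0.75):
    r = expected_ratios(rate)
    print(f"Red enrollment rate {rate:.2f}: "
          f"DP={r['dp']:.3f}, EO={r['eo']:.3f}, PRP={r['prp']:.3f}")
```

Under these assumptions a Red rate of 0.65 equalizes demographic parity and a rate of 0.75 equalizes equal opportunity, but not both at once. Predictive rate parity does not move at all, because random allocation cannot raise the share of enrolled people who will use XAI above the Red base rate.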

##Task 2

For this homework, train a few models on a selected dataset from https://github.com/ahxt/fair_fairness_benchmark/.

Prepare a knitr/Jupyter notebook with the following points.
Submit your results on GitHub to the directory `Homeworks/HW1`.

1. Train a model for the selected dataset.
2. For the selected protected attribute (age, gender, race) calculate the following fairness coefficients: Statistical parity, Equal opportunity, Predictive parity.
3. Train another model (different hyperparameters, feature transformations etc., different family of models) and see how the coefficients Statistical parity, Equal opportunity, Predictive parity behave for it. Are they different/similar?
4. Apply the selected bias mitigation technique (like data balancing) on the first model. Check how Statistical parity, Equal opportunity, Predictive parity coefficients behave after this mitigation.
5. Compare the quality (performance) of the three models with their fairness coefficients. Is there any correlation/trade off?
6. ! COMMENT on the results obtained in (2)-(5)


###Solution

For this task, I've selected the COMPAS dataset (compas-scores-two-years.csv).

In [103]:
#@title Imports, functions and data preparation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
from imblearn.over_sampling import SMOTE

def statistical_parity(y_true, y_pred, protected_attr):
    # Difference in positive prediction rates: group coded 1 minus group coded 0
    protected_group = (protected_attr == 1)
    privileged_group = (protected_attr == 0)
    return np.mean(y_pred[protected_group]) - np.mean(y_pred[privileged_group])

def equal_opportunity(y_true, y_pred, protected_attr):
    # Difference in true positive rates (TPR): group coded 1 minus group coded 0
    privileged_group = (protected_attr == 0)
    # labels=[0, 1] keeps the matrix 2x2 even if a group lacks one of the classes
    privileged_cm = confusion_matrix(y_true[privileged_group], y_pred[privileged_group], labels=[0, 1])
    unprivileged_group = (protected_attr == 1)
    unprivileged_cm = confusion_matrix(y_true[unprivileged_group], y_pred[unprivileged_group], labels=[0, 1])
    # Row 1 holds the actual positives: TPR = TP / (TP + FN)
    TPR_priv = privileged_cm[1, 1] / (privileged_cm[1, 1] + privileged_cm[1, 0])
    TPR_unpriv = unprivileged_cm[1, 1] / (unprivileged_cm[1, 1] + unprivileged_cm[1, 0])
    return TPR_unpriv - TPR_priv

def predictive_parity(y_true, y_pred, protected_attr):
    # Difference in positive predictive values (PPV): group coded 1 minus group coded 0
    privileged_group = (protected_attr == 0)
    unprivileged_group = (protected_attr == 1)
    privileged_cm = confusion_matrix(y_true[privileged_group], y_pred[privileged_group], labels=[0, 1])
    unprivileged_cm = confusion_matrix(y_true[unprivileged_group], y_pred[unprivileged_group], labels=[0, 1])
    # Column 1 holds the predicted positives: PPV = TP / (TP + FP)
    PPV_priv = privileged_cm[1, 1] / (privileged_cm[1, 1] + privileged_cm[0, 1])
    PPV_unpriv = unprivileged_cm[1, 1] / (unprivileged_cm[1, 1] + unprivileged_cm[0, 1])
    return PPV_unpriv - PPV_priv


df = pd.read_csv('/content/compas-scores-two-years.csv')

columns_to_drop = ['id', 'name', 'first', 'last', 'compas_screening_date', 'dob', 'v_type_of_assessment',
                   'v_decile_score', 'v_score_text', 'v_screening_date', 'r_case_number',
                   'r_offense_date', 'r_charge_desc', 'r_jail_in', 'r_jail_out', 'vr_case_number',
                   'vr_charge_desc', 'vr_offense_date', 'event', 'start', 'end']
df.drop(columns=columns_to_drop, inplace=True)

#df.dropna(inplace=True)

print(f"Number of rows after cleaning: {df.shape[0]}")

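# Label Encoding for categorical columns
# Note: LabelEncoder assigns integer codes in alphabetical order of the labels,
# so race codes 0 and 1 (the two groups compared by the metric functions above)
# are the first two race labels alphabetically, not a designed privileged/unprivileged flag.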
label_enc = LabelEncoder()
df['sex'] = label_enc.fit_transform(df['sex'])
df['race'] = label_enc.fit_transform(df['race'])
df['c_charge_degree'] = label_enc.fit_transform(df['c_charge_degree'])
df['is_recid'] = label_enc.fit_transform(df['is_recid'])

X = df[['sex', 'age', 'race', 'juv_fel_count', 'decile_score', 'c_charge_degree']]
y = df['is_recid']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Train set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")


Number of rows after cleaning: 7214
Train set size: 5771
Test set size: 1443
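The metric functions above compare rows whose protected attribute equals 1 with rows whose attribute equals 0, so the integer codes assigned to `race` matter. Below is a minimal sketch of how sorted label encoding assigns codes, together with one possible binary indicator. The toy array, the label spellings, and the choice of 'African-American' versus 'Caucasian' are assumptions made for illustration only; `np.unique` is used because it sorts labels the same way `LabelEncoder` does.

```python
import numpy as np

# Toy stand-in for the 'race' column (label spellings assumed to match the CSV).
race = np.array(["Caucasian", "African-American", "Asian", "Hispanic",
                 "African-American", "Other", "Caucasian"])

# Sorted unique labels: the position of each label is the code it would receive.
classes = np.unique(race)
code_of = {label: code for code, label in enumerate(classes.tolist())}
print(code_of)

# Hypothetical binary indicator: 1 = African-American, 0 = Caucasian,
# with rows from other groups left out of the comparison.
keep = np.isin(race, ["African-American", "Caucasian"])
race_binary = (race[keep] == "African-American").astype(int)
print(race_binary.tolist())
```

If an indicator like this were used, the same `keep` mask would have to be applied to the features, labels, and predictions before calling the metric functions, and the printed results in this notebook would change.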


In [104]:
#@title Training and model evaluation, bias mitigation; Protected attribute: Race
protected_attr_train = X_train['race']
protected_attr_test = X_test['race']

# First model - Logistic Regression
model1 = LogisticRegression(max_iter=1000)
model1.fit(X_train, y_train)
y_pred1 = model1.predict(X_test)

sp1 = statistical_parity(y_test, y_pred1, protected_attr_test)
eo1 = equal_opportunity(y_test, y_pred1, protected_attr_test)
pp1 = predictive_parity(y_test, y_pred1, protected_attr_test)

# Second model - Random Forest
model2 = RandomForestClassifier(n_estimators=100, max_depth=7, random_state=42)
model2.fit(X_train, y_train)
y_pred2 = model2.predict(X_test)

sp2 = statistical_parity(y_test, y_pred2, protected_attr_test)
eo2 = equal_opportunity(y_test, y_pred2, protected_attr_test)
pp2 = predictive_parity(y_test, y_pred2, protected_attr_test)

# Bias mitigation - SMOTE (balancing the dataset)
sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X_train, y_train)

# Retrain the first model (Logistic Regression) with the balanced data
model1_mitigated = LogisticRegression(max_iter=1000)
model1_mitigated.fit(X_res, y_res)
y_pred1_mitigated = model1_mitigated.predict(X_test)

# Fairness metrics for the mitigated model
sp1_mitigated = statistical_parity(y_test, y_pred1_mitigated, protected_attr_test)
eo1_mitigated = equal_opportunity(y_test, y_pred1_mitigated, protected_attr_test)
pp1_mitigated = predictive_parity(y_test, y_pred1_mitigated, protected_attr_test)

acc_model1 = accuracy_score(y_test, y_pred1)
acc_model2 = accuracy_score(y_test, y_pred2)
acc_model1_mitigated = accuracy_score(y_test, y_pred1_mitigated)

print(f"Model 1 - Logistic Regression: Accuracy={acc_model1:.4f}, SP={sp1:.4f}, EO={eo1:.4f}, PP={pp1:.4f}")
print(f"Model 2 - Random Forest: Accuracy={acc_model2:.4f}, SP={sp2:.4f}, EO={eo2:.4f}, PP={pp2:.4f}")
print(f"Model 1 with mitigation: Accuracy={acc_model1_mitigated:.4f}, SP={sp1_mitigated:.4f}, EO={eo1_mitigated:.4f}, PP={pp1_mitigated:.4f}")

Model 1 - Logistic Regression: Accuracy=0.6486, SP=-0.3910, EO=-0.2205, PP=0.3495
Model 2 - Random Forest: Accuracy=0.6521, SP=-0.4252, EO=-0.2513, PP=0.3589
Model 1 with mitigation: Accuracy=0.6486, SP=-0.4265, EO=-0.2513, PP=0.3603
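For reading the signs of the printed values, the three metric functions defined earlier compute the following differences. The notation is introduced only for this summary: $A$ is the encoded protected attribute, $Y$ the true label, and $\hat{Y}$ the prediction.

```latex
\begin{align*}
\mathrm{SP} &= P(\hat{Y}=1 \mid A=1) - P(\hat{Y}=1 \mid A=0) \\
\mathrm{EO} &= P(\hat{Y}=1 \mid Y=1, A=1) - P(\hat{Y}=1 \mid Y=1, A=0) \\
\mathrm{PP} &= P(Y=1 \mid \hat{Y}=1, A=1) - P(Y=1 \mid \hat{Y}=1, A=0)
\end{align*}
```

A value of 0 means parity on that metric. A negative value means the rate is lower for the group coded 1 than for the group coded 0.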


###Results
1. Fairness Metrics for Logistic Regression vs. Random Forest (Without Mitigation):
      
      Each metric is a difference between the group coded 1 and the group coded 0 in the encoded `race` column, so 0 means parity. A positive prediction here means predicted recidivism. Because LabelEncoder assigns codes alphabetically, the comparison covers only the first two race labels in alphabetical order.
      Logistic Regression and Random Forest show similar levels of accuracy (0.6486 vs. 0.6521).
      Both models have large disparities in Statistical Parity (SP), with Random Forest slightly larger (-0.4252 vs. -0.3910). The negative sign means the group coded 1 is predicted to reoffend less often than the group coded 0.
      Equal Opportunity (EO) also shows a sizable gap, again larger for Random Forest (-0.2513 vs. -0.2205). Actual recidivists in the group coded 1 are flagged less often, i.e. the true positive rate is lower for that group.
      The Predictive Parity (PP) gap is slightly larger for Random Forest (0.3589 vs. 0.3495). Positive predictions are correct more often for the group coded 1, so prediction quality differs between groups for both models.

2. Impact of Bias Mitigation (SMOTE) on Logistic Regression:

      After applying SMOTE to Logistic Regression, the accuracy remains unchanged (0.6486).
      The Statistical Parity (SP) gap widens (-0.4265 vs. -0.3910), meaning the mitigation technique didn’t improve fairness in terms of predicted positive rates.
      The Equal Opportunity (EO) gap also widens (-0.2513 vs. -0.2205), indicating that SMOTE did not address the disparity in true positive rates across groups.
      The Predictive Parity (PP) gap grows slightly as well (0.3603 vs. 0.3495).
      A likely reason is that SMOTE, as applied here, balances the classes of the target `is_recid` and not the representation of the race groups.

3. Comparison:

      Accuracy is nearly identical across the three models (0.6486 to 0.6521), so no accuracy/fairness trade-off is visible here. Random Forest, while marginally more accurate, has larger gaps on all three fairness metrics.
      Bias mitigation via SMOTE did not improve any of the three metrics.
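Since SMOTE balances the target classes and not the protected groups, one alternative that balances groups and labels jointly is reweighing in the sense of Kamiran and Calders. It is not used in this notebook; the sketch below only illustrates the idea on hypothetical toy data, and the helper name `reweighing_weights` is made up for this example.

```python
import numpy as np
import pandas as pd

def reweighing_weights(group, label):
    """Per-row weight P(group) * P(label) / P(group, label), from empirical frequencies."""
    frame = pd.DataFrame({"g": np.asarray(group), "y": np.asarray(label), "n": 1})
    total = len(frame)
    count_g = frame.groupby("g")["n"].transform("sum")          # rows in the same group
    count_y = frame.groupby("y")["n"].transform("sum")          # rows with the same label
    count_gy = frame.groupby(["g", "y"])["n"].transform("sum")  # rows in the same (group, label) cell
    return ((count_g * count_y) / (total * count_gy)).to_numpy()

# Toy data (hypothetical): group 0 has mostly positive labels, group 1 mostly negative.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
label = np.array([1, 1, 1, 0, 1, 0, 0, 0])
weights = reweighing_weights(group, label)

for g in (0, 1):
    mask = group == g
    weighted_rate = np.average(label[mask], weights=weights[mask])
    print(f"group {g}: raw positive rate {label[mask].mean():.2f}, "
          f"weighted positive rate {weighted_rate:.2f}")
```

After weighting, both toy groups have the same positive rate. Weights of this kind could be passed to `LogisticRegression.fit` through its `sample_weight` argument; whether that would reduce the gaps reported here is untested.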

Both models exhibit substantial gaps on all three metrics, and SMOTE did not reduce any of them: the Statistical Parity, Equal Opportunity, and Predictive Parity gaps of the mitigated model are all at least as large in magnitude as those of the original Logistic Regression.
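The comparison can also be tabulated directly from the printed output. The sketch below copies the reported values by hand and compares gap magnitudes, since a gap of 0 means parity regardless of sign. The column `mean_gap` is a hypothetical summary added only for this illustration, not a standard fairness metric.

```python
import pandas as pd

# Values copied from the printed output of the evaluation cell.
results = pd.DataFrame(
    {
        "accuracy": [0.6486, 0.6521, 0.6486],
        "SP": [-0.3910, -0.4252, -0.4265],
        "EO": [-0.2205, -0.2513, -0.2513],
        "PP": [0.3495, 0.3589, 0.3603],
    },
    index=["Logistic Regression", "Random Forest", "Logistic Regression + SMOTE"],
)

# Compare magnitudes rather than signed values.
gaps = results[["SP", "EO", "PP"]].abs()
summary = gaps.assign(accuracy=results["accuracy"], mean_gap=gaps.mean(axis=1))
print(summary.sort_values("mean_gap").round(4))
```

Sorted this way, the unmitigated Logistic Regression has the smallest average gap, followed by Random Forest and then the SMOTE variant, while accuracy differs by less than half a percentage point.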