# Tutorial: Fairness in Machine Learning

In this notebook, we'll explore fairness metrics, model explanation techniques, and bias mitigation approaches. 

We will:

- Generate synthetic data with a sensitive attribute.
- Train a logistic regression model.
- Evaluate fairness metrics using Fairlearn.
- Demonstrate counterfactual fairness.
- Explain model decisions with SHAP and LIME.
- Mitigate bias using reweighting and adversarial debiasing (via AIF360).


## 1. Setup and Import Libraries

Let's import the required libraries. Make sure you have installed these packages:

```bash
pip install numpy pandas scikit-learn fairlearn aif360 'aif360[inFairness]' shap lime tensorflow 
```

```python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Fairness metrics from Fairlearn
from fairlearn.metrics import MetricFrame, selection_rate, true_positive_rate, false_positive_rate

# Model explanation libraries
import shap
import lime
import lime.lime_tabular

# Bias mitigation using AIF360
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing
from aif360.algorithms.inprocessing import AdversarialDebiasing

# TensorFlow for adversarial debiasing. AIF360's AdversarialDebiasing uses the
# TensorFlow 1.x graph/session API, so eager execution must be disabled.
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

# Set a random seed for reproducibility
np.random.seed(42)
```

## 2. Generate Synthetic Data

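We create a binary classification dataset with 5 features. We then add a binary sensitive attribute to simulate a protected group. The attribute is deliberately correlated with the label: every sample with `target == 1` gets `sensitive == 1`, while a sample with `target == 0` gets `sensitive == 1` with probability 0.3. As a consequence, group 0 contains only negative examples. This built-in dependence is what lets us check later whether, and how, the model's predictions differ between groups.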

```python
# Create synthetic binary classification data
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)

# Create a sensitive attribute correlated with the target: every positive sample
# gets sensitive=1, and each negative sample gets sensitive=1 with probability 0.3
sensitive = (y + np.random.binomial(1, 0.3, size=y.shape)).clip(0, 1)

# Build a DataFrame with feature columns, sensitive attribute, and target
df = pd.DataFrame(X, columns=[f'feat{i}' for i in range(1, 6)])
df['sensitive'] = sensitive
df['target'] = y

# Split data into training and testing sets
train, test = train_test_split(df, test_size=0.3, random_state=42)

# Take a peek at the training data
train.head()
```

|     | feat1     | feat2     | feat3     | feat4     | feat5     | sensitive | target |
|----:|----------:|----------:|----------:|----------:|----------:|----------:|-------:|
| 541 | 0.558709  | -0.304063 | -0.675708 | -0.327199 | 0.144969  | 1 | 1 |
| 440 | -0.000163 | 1.021126  | -1.087246 | 0.484538  | -2.493911 | 1 | 1 |
| 482 | -0.853323 | 0.035071  | 1.532368  | 0.296036  | 0.827214  | 0 | 0 |
| 422 | -0.979093 | 0.785848  | -0.617652 | 0.693431  | -0.872002 | 0 | 0 |
| 778 | -1.325967 | 1.051464  | -0.317715 | 0.933029  | -1.149683 | 1 | 0 |
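The way `sensitive` is constructed ties it tightly to the label. The following NumPy sketch checks this on freshly simulated labels (a stand-in for `y`, assumed to be roughly balanced; this is not the tutorial's data). It confirms that the clipped sum equals a logical OR of the label and a Bernoulli(0.3) draw, and then measures the positive rate within each group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in labels (assumption: roughly balanced binary labels)
y = rng.integers(0, 2, size=10_000)
noise = rng.binomial(1, 0.3, size=y.shape)

# Same construction as the tutorial: clip(y + noise, 0, 1)
sensitive = (y + noise).clip(0, 1)

# The clipped sum is exactly a logical OR of the label and the noise
assert np.array_equal(sensitive, np.logical_or(y, noise).astype(int))

# Positive rate within each sensitive group
pos_rate_group0 = y[sensitive == 0].mean()
pos_rate_group1 = y[sensitive == 1].mean()
print(f"P(target=1 | sensitive=0) = {pos_rate_group0:.3f}")
print(f"P(target=1 | sensitive=1) = {pos_rate_group1:.3f}")
```

With balanced labels, group 0 has a positive rate of exactly 0, and group 1 has a positive rate of about 0.5 / (0.5 + 0.5 × 0.3) ≈ 0.77.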


## 3. Train a Logistic Regression Model

We now train a logistic regression model using only the non-sensitive features. This simulates a situation where the sensitive attribute is not part of the model's input.

```python
# Define the feature set (excluding the sensitive attribute)
features = [f'feat{i}' for i in range(1, 6)]

# Train the logistic regression model
model = LogisticRegression(solver='lbfgs')
model.fit(train[features], train['target'])

# Generate predictions on the test set. Work on an explicit copy so that adding
# columns does not risk pandas' SettingWithCopyWarning.
test = test.copy()
test['pred'] = model.predict(test[features])

# Check overall model accuracy
acc = accuracy_score(test['target'], test['pred'])
print('Overall Accuracy:', acc)
```

```
Overall Accuracy: 0.89
```


## 4. Evaluate Fairness Metrics with Fairlearn

We now use Fairlearn's `MetricFrame` to calculate fairness metrics across groups defined by the sensitive attribute. 

### Key Metrics:

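- **Selection Rate (Demographic Parity):** The fraction of each group that receives a positive prediction. Demographic parity asks for equal selection rates across groups.
- **True Positive Rate (TPR) & False Positive Rate (FPR) (Equalized Odds):** TPR is the fraction of actual positives that are predicted positive; FPR is the fraction of actual negatives that are predicted positive. Equalized odds asks for both rates to be equal across groups.
- **Disparate Impact:** The ratio of the selection rates (unprivileged/privileged). A value of 1 means equal selection rates; values well below 1 mean the unprivileged group receives positive predictions less often.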
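The definitions in the list above can be written out directly. The sketch below computes selection rate, TPR, and FPR per group with plain NumPy on a made-up toy example, then derives the disparate impact ratio and the equalized-odds differences in the same way the tutorial does. It returns NaN for a rate that is undefined, such as TPR in a group with no actual positives. That case occurs in this tutorial's data: group 0 has no positives, and the Fairlearn output reports its TPR as 0.

```python
import numpy as np

def group_metrics(y_true, y_pred, groups):
    """Selection rate, TPR and FPR for each value in `groups`.

    A group with no actual positives (or negatives) gets NaN for TPR (or FPR),
    because the rate is undefined there.
    """
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        yt, yp = y_true[mask], y_pred[mask]
        positives, negatives = yt == 1, yt == 0
        results[g] = {
            # Fraction of the group that receives a positive prediction
            "selection_rate": yp.mean(),
            # Fraction of actual positives predicted positive
            "tpr": yp[positives].mean() if positives.any() else np.nan,
            # Fraction of actual negatives predicted positive
            "fpr": yp[negatives].mean() if negatives.any() else np.nan,
        }
    return results

# Toy example (made-up labels, not the tutorial's data); group 1 is privileged
y_true = [1, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [1, 1, 1, 1, 0, 0, 0, 0]

m = group_metrics(y_true, y_pred, groups)
disparate_impact = m[0]["selection_rate"] / m[1]["selection_rate"]
tpr_diff = m[1]["tpr"] - m[0]["tpr"]
fpr_diff = m[1]["fpr"] - m[0]["fpr"]
print(m)
print("Disparate impact:", disparate_impact)
print("TPR diff, FPR diff:", tpr_diff, fpr_diff)
```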

```python
# Calculate fairness metrics using MetricFrame
mf = MetricFrame(metrics={
                    'accuracy': accuracy_score,
                    'selection_rate': selection_rate,
                    'tpr': true_positive_rate,
                    'fpr': false_positive_rate},
                 y_true=test['target'],
                 y_pred=test['pred'],
                 sensitive_features=test['sensitive'])

print('Fairness metrics by sensitive group:')
print(mf.by_group)

# Compute Disparate Impact: ratio of selection rates between groups
sr_priv = mf.by_group.loc[1, 'selection_rate']
sr_unpriv = mf.by_group.loc[0, 'selection_rate']
disparate_impact = sr_unpriv / sr_priv if sr_priv != 0 else np.nan
print('\nDisparate Impact Ratio (unprivileged/privileged):', disparate_impact)

# Calculate Equalized Odds differences (difference in TPR and FPR)
tpr_diff = mf.by_group.loc[1, 'tpr'] - mf.by_group.loc[0, 'tpr']
fpr_diff = mf.by_group.loc[1, 'fpr'] - mf.by_group.loc[0, 'fpr']
print('Equalized Odds differences (TPR diff, FPR diff):', tpr_diff, fpr_diff)
```

```
Fairness metrics by sensitive group:
           accuracy  selection_rate       tpr       fpr
sensitive                                              
0          0.944954        0.055046  0.000000  0.055046
1          0.858639        0.717277  0.857143  0.135135

Disparate Impact Ratio (unprivileged/privileged): 0.07674278443715261
Equalized Odds differences (TPR diff, FPR diff): 0.8571428571428571 0.08008926357550211
```


## 5. Counterfactual Fairness

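Counterfactual fairness asks, "Would the prediction change if the sensitive attribute had been different?" Here we use its simplest form, an attribute-flip test: change the sensitive value for one instance, keep every other feature fixed, and compare the predictions. A full counterfactual analysis would also account for how the sensitive attribute causally affects the other features; the flip test ignores that.

Since our initial model does not take the sensitive attribute as input, flipping it cannot change that model's prediction.

For demonstration, we therefore train a second model that includes the sensitive attribute and compare its predictions before and after the flip.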
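The flip test described above can be wrapped in a small reusable function that reports how often predictions change across a whole dataset rather than for a single instance. This is a sketch, not a Fairlearn or AIF360 API: `flip_test`, `unaware_model`, and `aware_model` are hypothetical names, and the two "models" are hand-written threshold rules over the columns `[feature, sensitive]`.

```python
import numpy as np

def flip_test(predict, X, sensitive_col):
    """Fraction of rows whose prediction changes when a binary column is flipped.

    Only the sensitive column is changed; all other features stay fixed, so any
    causal effect of the attribute on other features is ignored.
    """
    X = np.asarray(X, dtype=float)
    X_flipped = X.copy()
    X_flipped[:, sensitive_col] = 1 - X_flipped[:, sensitive_col]
    return np.mean(predict(X) != predict(X_flipped))

# Hypothetical threshold rules over columns [feature, sensitive] (illustration only)
def unaware_model(X):
    return (X[:, 0] > 0).astype(int)                  # ignores the sensitive column

def aware_model(X):
    return (X[:, 0] + 2.0 * X[:, 1] > 1).astype(int)  # leans on the sensitive column

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=1000), rng.integers(0, 2, size=1000)])

print("unaware:", flip_test(unaware_model, X, sensitive_col=1))
print("aware:  ", flip_test(aware_model, X, sensitive_col=1))
```

The unaware rule never changes its prediction. For the aware rule, a flip moves the score by 2, so predictions change whenever the feature lies between -1 and 1 (roughly two thirds of the rows for standard normal data).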

```python
# Select a test instance. Keeping it as a one-row DataFrame preserves the column
# names the models were fitted with (avoids scikit-learn's feature-name warning).
instance_df = test.iloc[[0]]
instance = instance_df.iloc[0]  # the same row as a Series, reused in Section 6
orig_pred = model.predict(instance_df[features])[0]
print('Original prediction (without sensitive attribute):', orig_pred)

# Train a model that includes the sensitive attribute
features_with_sensitive = features + ['sensitive']
model_with_sensitive = LogisticRegression(solver='lbfgs')
model_with_sensitive.fit(train[features_with_sensitive], train['target'])

# Get prediction with the sensitive attribute
orig_pred_sensitive = model_with_sensitive.predict(instance_df[features_with_sensitive])[0]

# Flip the sensitive attribute (0 <-> 1) to see if the prediction changes
counterfactual_df = instance_df.copy()
counterfactual_df['sensitive'] = 1 - counterfactual_df['sensitive']
cf_pred_sensitive = model_with_sensitive.predict(counterfactual_df[features_with_sensitive])[0]

print('\nModel including sensitive attribute:')
print('  Original prediction:', orig_pred_sensitive)
print('  Counterfactual prediction after flipping sensitive:', cf_pred_sensitive)
```

```
Original prediction (without sensitive attribute): 1

Model including sensitive attribute:
  Original prediction: 1
  Counterfactual prediction after flipping sensitive: 0
```




## 6. Model Explanations with SHAP and LIME

Understanding why a model makes a particular prediction is crucial. 

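**SHAP (SHapley Additive exPlanations):** Uses Shapley values from cooperative game theory to split the difference between the model's output for an instance and its average output over background data among the features.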
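For a linear model with independent ("interventional") features, Shapley values have a closed form: feature j receives w_j · (x_j − E[x_j]). The sketch below checks this by brute force. It enumerates every coalition of features, fills the missing features from background rows, and applies the Shapley weighting. The weights `w`, bias `b`, instance `x`, and background data are hypothetical. For logistic regression the linear output is the log-odds, so values computed this way are in log-odds units. SHAP's linear explainer targets this quantity, but the library may subsample or summarize the background data, so its numbers need not match a hand computation on the full training set exactly.

```python
import itertools
import math
import numpy as np

def f(X, w, b):
    """Linear model output on the margin (log-odds) scale."""
    return X @ w + b

def value(S, x, background, w, b):
    """Interventional value of coalition S: features in S are fixed to x,
    the rest come from background rows, and the outputs are averaged."""
    Z = background.copy()
    idx = list(S)
    Z[:, idx] = x[idx]
    return f(Z, w, b).mean()

def exact_shapley(x, background, w, b):
    d = len(x)
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            for S in itertools.combinations(others, r):
                # Shapley weight |S|! (d - |S| - 1)! / d!
                weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                gain = value(S + (j,), x, background, w, b) - value(S, x, background, w, b)
                phi[j] += weight * gain
    return phi

rng = np.random.default_rng(0)
background = rng.normal(size=(200, 4))      # hypothetical background data
w, b = np.array([1.5, -0.5, 0.0, 2.0]), 0.3
x = np.array([1.0, 2.0, -1.0, 0.5])

phi = exact_shapley(x, background, w, b)
closed_form = w * (x - background.mean(axis=0))
print(phi)
print(closed_form)
```

The values also sum to the model output for `x` minus the average output over the background data, which is the additivity property that gives SHAP its name.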

**LIME (Local Interpretable Model-agnostic Explanations):** Approximates the model locally with a simpler, interpretable model.
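The local-surrogate idea behind LIME can be sketched in a few lines: sample points around the instance, weight them by closeness, and fit a weighted linear model to the black-box outputs. This is a simplified version for continuous features, not the `lime` library's algorithm. `LimeTabularExplainer` discretizes continuous features into quartile bins by default, which is why the tutorial's output shows conditions such as `feat1 > 1.05`. It also uses its own sampling and kernel settings. The black-box function `black_box`, the helper `lime_like`, and the kernel width are hypothetical choices.

```python
import numpy as np

def black_box(X):
    """Hypothetical nonlinear classifier: returns P(class 1) for each row of X."""
    z = 2.0 * X[:, 0] - X[:, 1] ** 2
    return 1.0 / (1.0 + np.exp(-z))

def lime_like(x, predict_proba, scale, n_samples=5000, kernel_width=0.75, seed=0):
    """Simplified LIME for continuous features (no discretization)."""
    rng = np.random.default_rng(seed)
    # 1. Sample perturbations around x (Gaussian, per-feature std given by `scale`)
    Z = x + rng.normal(size=(n_samples, len(x))) * scale
    target = predict_proba(Z)
    # 2. Locality weights from an exponential kernel on standardized distance
    dist = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 3. Weighted least squares: intercept plus one slope per feature
    A = np.column_stack([np.ones(n_samples), Z - x])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], target * sw, rcond=None)
    return coef[1:]  # local slope for each feature

x = np.array([0.0, 1.0])
local_slopes = lime_like(x, black_box, scale=np.ones(2))
print(local_slopes)
```

At `x = [0, 1]` the black box increases with the first feature and decreases with the second, so the surrogate's slopes come out positive and negative, respectively. These slopes are the kind of per-feature contributions that LIME reports.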

Let's generate explanations for our chosen instance.

```python
# SHAP explanation for our logistic regression model.
# For a linear model, LinearExplainer explains the model's margin, so these
# values are in log-odds units rather than probabilities.
explainer = shap.LinearExplainer(model, train[features], feature_perturbation='interventional')
shap_values = explainer.shap_values(instance[features])
print('SHAP values for the chosen instance:')
print(shap_values)

# To visualise the explanation in a notebook, run these commands in a separate cell:
# shap.initjs()
# shap.force_plot(explainer.expected_value, shap_values, instance[features])

# LIME explanation
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=train[features].values,
    feature_names=features,
    class_names=['0', '1'],
    mode='classification'
)

def predict_proba_df(arr):
    """LIME passes NumPy arrays; restore the column names the model was fitted with."""
    return model.predict_proba(pd.DataFrame(arr, columns=features))

lime_exp = lime_explainer.explain_instance(instance[features].values, predict_proba_df)
print('\nLIME explanation (text output):')
print(lime_exp.as_list())
```

```
SHAP values for the chosen instance:
[ 1.87910887  0.46366104  0.06430308  0.64512978 -0.79041196]

LIME explanation (text output):
[('feat1 > 1.05', 0.5417484470738576), ('feat5 > 1.02', -0.18140358227278397), ('feat4 <= -0.53', 0.12640461313989831), ('feat2 <= -0.47', 0.08140704559661191), ('feat3 <= -0.65', 0.01342174639145135)]
```




## 7. Bias Mitigation with Reweighting (AIF360)

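Reweighing (AIF360's `Reweighing`) assigns each training sample a weight based on its (group, label) combination: P(group) × P(label) / P(group, label). In the weighted data, the label is then statistically independent of the sensitive attribute. Combinations that are under-represented relative to independence get weights above 1, and over-represented ones get weights below 1.

We'll convert our training data into AIF360's `BinaryLabelDataset`, apply reweighing, and then train a new model using the adjusted sample weights.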
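The weighting rule above is short enough to compute by hand. The sketch below is an illustration, not AIF360's code: `reweighing_weights` and `weighted_positive_rate` are hypothetical helpers, and the groups and labels are made up. It computes P(g) × P(y) / P(g, y) for each non-empty (group, label) cell and checks that, after weighting, both groups have the same positive rate. Empty cells are skipped rather than divided by zero. In this tutorial's data the (unprivileged, favorable) cell is empty, so that weight never applies to any sample.

```python
import numpy as np

def reweighing_weights(groups, labels):
    """One weight per sample: w(g, y) = P(g) * P(y) / P(g, y).

    Cells (g, y) with no samples are skipped, since no sample would use them.
    """
    groups, labels = np.asarray(groups), np.asarray(labels)
    weights = np.empty(len(labels))
    for g in np.unique(groups):
        for y in np.unique(labels):
            cell = (groups == g) & (labels == y)
            if cell.any():
                expected = (groups == g).mean() * (labels == y).mean()
                observed = cell.mean()
                weights[cell] = expected / observed
    return weights

def weighted_positive_rate(mask, labels, weights):
    """Weighted fraction of positive labels among the samples selected by mask."""
    return np.sum(weights[mask] * labels[mask]) / np.sum(weights[mask])

# Made-up data: group 0 has fewer positives than group 1
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
labels = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

w = reweighing_weights(groups, labels)
rate0 = weighted_positive_rate(groups == 0, labels, w)
rate1 = weighted_positive_rate(groups == 1, labels, w)
print("weights:", w)
print("weighted positive rate per group:", rate0, rate1)
```

Before weighting, the positive rates are 0.25 for group 0 and about 0.67 for group 1. After weighting, both are 0.5, the overall positive rate, and the weights still sum to the number of samples.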

```python
# Convert training data to an AIF360 BinaryLabelDataset
train_aif = BinaryLabelDataset(favorable_label=1,
                               unfavorable_label=0,
                               df=train.copy(),
                               label_names=['target'],
                               protected_attribute_names=['sensitive'])

# Apply reweighing to balance the sensitive groups
RW = Reweighing(unprivileged_groups=[{'sensitive': 0}],
                privileged_groups=[{'sensitive': 1}])
train_aif_transf = RW.fit_transform(train_aif)
sample_weights = train_aif_transf.instance_weights  # these are the adjusted sample weights

# Train a new logistic regression model using the reweighted samples
model_rw = LogisticRegression(solver='lbfgs')
model_rw.fit(train[features], train['target'], sample_weight=sample_weights)
test['pred_rw'] = model_rw.predict(test[features])

# Evaluate fairness metrics for the reweighted model
mf_rw = MetricFrame(metrics={
                        'accuracy': accuracy_score,
                        'selection_rate': selection_rate,
                        'tpr': true_positive_rate,
                        'fpr': false_positive_rate},
                    y_true=test['target'],
                    y_pred=test['pred_rw'],
                    sensitive_features=test['sensitive'])
print('Fairness metrics by group after reweighting:')
print(mf_rw.by_group)
```

```
Fairness metrics by group after reweighting:
           accuracy  selection_rate       tpr       fpr
sensitive                                              
0          0.972477        0.027523  0.000000  0.027523
1          0.821990        0.680628  0.811688  0.135135
```

Running this cell also prints a division-by-zero warning from inside AIF360's `Reweighing`, pointing at the line `self.w_up_fav = n_fav*n_up / (n*n_up_fav)`. Here `n_up_fav`, the number of unprivileged training samples with the favorable label, is zero, because group 0 contains only negatives by construction. No training sample falls into that cell, so the undefined weight is never applied. However, it also means reweighing has no unprivileged positives to up-weight, which is consistent with the selection rates above remaining far apart.


## 8. Bias Mitigation with Adversarial Debiasing (AIF360)

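Adversarial debiasing trains a classifier together with an adversary that tries to predict the sensitive attribute from the classifier's output. The classifier is rewarded when the adversary fails. This pushes its predictions to carry less information about the sensitive attribute, although it does not guarantee that the information is removed entirely.

AIF360's implementation is built on the TensorFlow 1.x graph API, so we need to set up a TensorFlow session to run it.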
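The training loop described above can be sketched as a toy in plain NumPy. This is not AIF360's implementation. AIF360's `AdversarialDebiasing` follows Zhang, Lemoine and Mitchell (2018), uses TensorFlow networks, and adds a projection term to the classifier's gradient; this sketch omits all of that. Here the classifier is logistic regression with logit z. The adversary is a one-feature logistic model that predicts the sensitive attribute from z. In each step, the adversary descends its own loss while the classifier descends (its loss − alpha × adversary loss), which is plain gradient reversal. The data, `train_adversarial`, and all hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(target, prob, eps=1e-12):
    """Mean binary cross-entropy."""
    prob = np.clip(prob, eps, 1 - eps)
    return -np.mean(target * np.log(prob) + (1 - target) * np.log(1 - prob))

def train_adversarial(X, y, s, alpha=1.0, lr=0.1, epochs=1000, seed=0):
    """Toy adversarial debiasing with full-batch gradient steps.

    Classifier: logit z = X @ w + b, trained to predict y.
    Adversary:  logit a = u * z + c, trained to predict s from z.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = rng.normal(scale=0.01, size=d), 0.0
    u, c = 0.0, 0.0
    for _ in range(epochs):
        z = X @ w + b
        p = sigmoid(z)              # classifier's P(y = 1)
        q = sigmoid(u * z + c)      # adversary's P(s = 1)

        # d(L_clf - alpha * L_adv) / dz for the classifier
        g_z = (p - y) / n - alpha * (q - s) * u / n
        # d(L_adv) / d(u, c) for the adversary
        g_u = np.sum((q - s) * z) / n
        g_c = np.sum(q - s) / n

        # Simultaneous updates of both players
        w = w - lr * (X.T @ g_z)
        b = b - lr * np.sum(g_z)
        u = u - lr * g_u
        c = c - lr * g_c
    return w, b, u, c

rng = np.random.default_rng(1)
n = 2000
s = rng.integers(0, 2, size=n).astype(float)    # hypothetical sensitive attribute
x_proxy = s + rng.normal(scale=0.5, size=n)      # feature correlated with s
x_other = rng.normal(size=n)                     # feature independent of s
X = np.column_stack([x_proxy, x_other])
y = (x_proxy + x_other + rng.normal(scale=0.5, size=n) > 0.5).astype(float)

for alpha in (0.0, 1.0):
    w, b, u, c = train_adversarial(X, y, s, alpha=alpha)
    z = X @ w + b
    print(f"alpha={alpha}: weights={w}, "
          f"classifier loss={bce(y, sigmoid(z)):.3f}, "
          f"adversary loss={bce(s, sigmoid(u * z + c)):.3f}")
```

Setting `alpha=0` recovers ordinary logistic regression with a passive observer. Increasing `alpha` trades some classification loss for a classifier logit that is harder to use for predicting `s`, which is the same trade-off that AIF360's `adversary_loss_weight` parameter controls.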

```python
# Reset the TensorFlow graph to clear any existing variables
tf.compat.v1.reset_default_graph()

# Set up a new TensorFlow session for adversarial debiasing
sess = tf.compat.v1.Session()

adv_debiasing = AdversarialDebiasing(
    privileged_groups=[{'sensitive': 1}],
    unprivileged_groups=[{'sensitive': 0}],
    scope_name='adv_debiasing',
    sess=sess,
    num_epochs=50,
    debias=True
)

# Train the adversarially debiased model using the AIF360 training dataset
adv_debiasing.fit(train_aif)

# Drop earlier prediction columns so the test set has the same columns as train_aif
columns_to_remove = ['pred', 'pred_rw', 'adv_pred']
test_clean = test.drop(columns=columns_to_remove, errors='ignore')

# Convert cleaned test data to an AIF360 BinaryLabelDataset
test_aif = BinaryLabelDataset(
    favorable_label=1,
    unfavorable_label=0,
    df=test_clean.copy(),
    label_names=['target'],
    protected_attribute_names=['sensitive']
)

# Make predictions with the debiased model
test_pred = adv_debiasing.predict(test_aif)
test['adv_pred'] = test_pred.labels.ravel()

# Evaluate fairness metrics for the adversarially debiased model
mf_adv = MetricFrame(
    metrics={
        'accuracy': accuracy_score,
        'selection_rate': selection_rate,
        'tpr': true_positive_rate,
        'fpr': false_positive_rate
    },
    y_true=test['target'],
    y_pred=test['adv_pred'],
    sensitive_features=test['sensitive']
)
print('Fairness metrics by group after adversarial debiasing:')
print(mf_adv.by_group)

sess.close()  # Free the session's resources. Well done on this work.
```

```
epoch 0; iter: 0; batch classifier loss: 0.672155; batch adversarial loss: 0.741866
epoch 1; iter: 0; batch classifier loss: 0.615038; batch adversarial loss: 0.741554
epoch 2; iter: 0; batch classifier loss: 0.555537; batch adversarial loss: 0.759447
epoch 3; iter: 0; batch classifier loss: 0.504566; batch adversarial loss: 0.758306
epoch 4; iter: 0; batch classifier loss: 0.508343; batch adversarial loss: 0.758476
epoch 5; iter: 0; batch classifier loss: 0.446736; batch adversarial loss: 0.749879
epoch 6; iter: 0; batch classifier loss: 0.455645; batch adversarial loss: 0.757581
epoch 7; iter: 0; batch classifier loss: 0.502484; batch adversarial loss: 0.745647
epoch 8; iter: 0; batch classifier loss: 0.482942; batch adversarial loss: 0.766185
epoch 9; iter: 0; batch classifier loss: 0.445025; batch adversarial loss: 0.740793
epoch 10; iter: 0; batch classifier loss: 0.456150; batch adversarial loss: 0.748575
epoch 11; iter: 0; batch classifier loss: 0.416245; batch adversarial loss:
```

## 9. Conclusion

In this tutorial we:

- Generated synthetic data with a sensitive attribute.
- Trained a logistic regression model and evaluated its fairness using Fairlearn.
- Explored counterfactual fairness by flipping the sensitive attribute.
- Explained predictions using SHAP and LIME.
- Applied bias mitigation using reweighing and adversarial debiasing (via AIF360).

This workflow can serve as a basis for building fairer machine learning models. Cheers, mate!