# Bias Mitigation and Fairness Testing Exercise
This exercise builds on the bias detection and mitigation concepts discussed earlier. You'll apply the same techniques to a different dataset, measure the model's accuracy and fairness before and after mitigation, and answer a few questions to reflect on the results. Follow the steps below to complete the exercise.

### Step 0: Setup
Ensure you have the required libraries installed.

In [None]:
# Install AI Fairness 360
!pip install aif360

# Download the German Credit Dataset
!mkdir -p /usr/local/lib/python3.10/dist-packages/aif360/data/raw/german
!wget -q -P /usr/local/lib/python3.10/dist-packages/aif360/data/raw/german https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data
!wget -q -P /usr/local/lib/python3.10/dist-packages/aif360/data/raw/german https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc

# Import required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.utils.class_weight import compute_sample_weight
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.datasets import BinaryLabelDataset
from aif360.datasets import StandardDataset
from aif360.datasets import GermanDataset
from aif360.algorithms.preprocessing import Reweighing
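
The `mkdir` and `wget` commands above hard-code a Python 3.10 `dist-packages` path, which may not match every environment. The following is a minimal sketch, using only the standard library, of how the target directory could be derived from wherever `aif360` is actually installed. `german_raw_dir` and its `package_dir` parameter are hypothetical names introduced for illustration; the `data/raw/german` layout is taken from the paths used above.

```python
import importlib.util
import os


def german_raw_dir(package_dir=None):
    """Return the data/raw/german directory inside an aif360 install.

    package_dir is a hypothetical override for the package location. When it
    is None, the location is looked up without importing aif360.
    """
    if package_dir is None:
        spec = importlib.util.find_spec("aif360")
        if spec is None or spec.origin is None:
            raise ModuleNotFoundError("aif360 is not installed in this environment")
        package_dir = os.path.dirname(spec.origin)
    return os.path.join(package_dir, "data", "raw", "german")


# Example with an explicit package location (the one assumed by the cell above)
print(german_raw_dir("/usr/local/lib/python3.10/dist-packages/aif360"))
```

Calling `german_raw_dir()` with no argument resolves the path for the active environment; the result can then be created with `os.makedirs(path, exist_ok=True)` and used as the download target.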



### Step 1: Load and Preprocess the Dataset
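Instead of the Adult Income dataset, you'll work with the German Credit dataset, which labels each individual as a good or bad credit risk (coded 1 and 2, respectively, in the `credit` column). The sensitive attribute for this exercise is "sex," with 1 treated as the privileged group and 0 as the unprivileged group in the steps that follow.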

In [None]:
# Load the German Credit Dataset
dataset = GermanDataset()
df = pd.DataFrame(dataset.features, columns=dataset.feature_names)
df['credit'] = dataset.labels

# Display the first few rows
df.head()

index,month,credit_amount,investment_as_income_percentage,residence_since,age,number_of_credits,people_liable_for,sex,status=A11,status=A12,...,housing=A153,skill_level=A171,skill_level=A172,skill_level=A173,skill_level=A174,telephone=A191,telephone=A192,foreign_worker=A201,foreign_worker=A202,credit
0,6.0,1169.0,4.0,4.0,1.0,2.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0,1.0
1,48.0,5951.0,2.0,2.0,0.0,1.0,1.0,0.0,0.0,1.0,...,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0
2,12.0,2096.0,2.0,3.0,1.0,1.0,2.0,1.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0
3,42.0,7882.0,2.0,4.0,1.0,1.0,2.0,1.0,1.0,0.0,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0
4,24.0,4870.0,3.0,4.0,1.0,2.0,2.0,1.0,1.0,0.0,...,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,0.0,2.0


### Step 2: Train a Baseline Model
Split the data and train a Logistic Regression model.

In [None]:
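# Split into train (70%) and test (30%) sets. The shuffle is unseeded, so
# results vary between runs; pass seed=<int> to split() to reproduce a run.
train, test = dataset.split([0.7], shuffle=True)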
X_train, y_train = train.features, train.labels.ravel()
X_test, y_test = test.features, test.labels.ravel()

# Train a Logistic Regression model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predictions and baseline accuracy
y_pred = model.predict(X_test)
print("Baseline Accuracy:", accuracy_score(y_test, y_pred))

Baseline Accuracy: 0.7433333333333333


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


### Step 3: Evaluate Fairness
Measure group fairness on the model's predictions using disparate impact (ideal value 1) and statistical parity difference (ideal value 0).

In [None]:
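# Copy the test set and replace its labels with the model's predictions
# (AIF360 stores labels as a column vector, hence the reshape)
test_pred_dataset = test.copy()
test_pred_dataset.labels = y_pred.reshape(-1, 1)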

# Fairness metric - Disparate Impact
metric = BinaryLabelDatasetMetric(test_pred_dataset,
                                   unprivileged_groups=[{'sex': 0}],
                                   privileged_groups=[{'sex': 1}])
print("Baseline Disparate Impact:", metric.disparate_impact())
print("Baseline Statistical Parity Difference:", metric.statistical_parity_difference())

Baseline Disparate Impact: 0.9703285058283292
Baseline Statistical Parity Difference: -0.021566110397946092
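
To make the two metrics concrete, here is a small sketch that computes them by hand in plain Python, so the arithmetic stays visible. Disparate impact is the ratio of favorable-outcome rates (unprivileged over privileged), and statistical parity difference is the difference of the same two rates. The helper names and the toy data are invented for illustration and are not part of AIF360.

```python
def favorable_rate(labels, groups, group_value, favorable_label=1):
    """Share of one group's members that received the favorable label."""
    selected = [y for y, g in zip(labels, groups) if g == group_value]
    if not selected:
        raise ValueError(f"no rows with group value {group_value!r}")
    return sum(1 for y in selected if y == favorable_label) / len(selected)


def group_fairness(labels, groups, unprivileged=0, privileged=1, favorable_label=1):
    """Return (disparate impact, statistical parity difference)."""
    rate_unpriv = favorable_rate(labels, groups, unprivileged, favorable_label)
    rate_priv = favorable_rate(labels, groups, privileged, favorable_label)
    return rate_unpriv / rate_priv, rate_unpriv - rate_priv


# Toy predictions (invented for illustration, not taken from the dataset)
toy_pred = [1, 1, 2, 2, 1, 1, 1, 2]
toy_sex = [0, 0, 0, 0, 1, 1, 1, 1]

# Favorable rates are 2/4 for group 0 and 3/4 for group 1
di, spd = group_fairness(toy_pred, toy_sex)
print("Disparate impact:", di)
print("Statistical parity difference:", spd)
```

A ratio below 1 and a negative difference both indicate that the unprivileged group receives the favorable outcome less often, which matches the direction of the baseline values printed above.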


### Step 4: Mitigate Bias
Apply the Reweighing preprocessing technique and retrain the model with sample weights.

In [None]:
# Apply reweighting
rw = Reweighing(unprivileged_groups=[{'sex': 0}], privileged_groups=[{'sex': 1}])
train_reweighted = rw.fit_transform(train)

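# Extract features and labels from the transformed training set
X_train_rw, y_train_rw = train_reweighted.features, train_reweighted.labels.ravel()
# Note: these are scikit-learn's class-balanced weights, computed from the labels
# alone; the weights produced by Reweighing are in train_reweighted.instance_weights
sample_weights = compute_sample_weight('balanced', y_train_rw)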

# Train the model with reweighted samples
model_rw = LogisticRegression(max_iter=1000)
model_rw.fit(X_train_rw, y_train_rw, sample_weight=sample_weights)

# Predictions and accuracy after reweighting
y_pred_rw = model_rw.predict(X_test)
print("Accuracy after reweighting:", accuracy_score(y_test, y_pred_rw))

Accuracy after reweighting: 0.6766666666666666


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
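
Note that the cell above passes weights from `compute_sample_weight('balanced', ...)`, which balance the two label classes, rather than the weights that `Reweighing` stores in `train_reweighted.instance_weights`. As a rough illustration of what reweighing-style weights look like, the sketch below assigns each (group, label) combination the weight P(group) · P(label) / P(group, label), which is the commonly described form of the technique. This is a simplified stand-in under that assumption, not AIF360's implementation; the function names and toy data are invented.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-row weights w(g, y) = P(g) * P(y) / P(g, y) from observed counts."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_favorable_rate(groups, labels, weights, group_value, favorable_label=1):
    """Weighted share of favorable labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group_value)
    favorable = sum(
        w
        for g, y, w in zip(groups, labels, weights)
        if g == group_value and y == favorable_label
    )
    return favorable / total


# Toy data (invented): group 0 receives the favorable label 1 less often
toy_sex = [0, 0, 0, 0, 1, 1, 1, 1]
toy_credit = [1, 2, 2, 2, 1, 1, 1, 2]

weights = reweighing_weights(toy_sex, toy_credit)
rate_0 = weighted_favorable_rate(toy_sex, toy_credit, weights, group_value=0)
rate_1 = weighted_favorable_rate(toy_sex, toy_credit, weights, group_value=1)
print(weights)
print(rate_0, rate_1)
```

Under these weights the under-represented combinations are weighted up, and the weighted favorable rates of the two groups coincide in the training data. To train on the weights AIF360 computed, one option is to pass `train_reweighted.instance_weights` as `sample_weight` when fitting; the results would then differ from the output shown above.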


### Step 5: Re-evaluate Fairness
Recompute the fairness metrics after bias mitigation.

In [None]:
# Evaluate fairness again, this time on the retrained model's predictions
test_pred_dataset_rw = test.copy()
test_pred_dataset_rw.labels = y_pred_rw.reshape(-1, 1)

metric_rw = BinaryLabelDatasetMetric(test_pred_dataset_rw,
                                      unprivileged_groups=[{'sex': 0}],
                                      privileged_groups=[{'sex': 1}])
print("Disparate Impact after reweighting:", metric_rw.disparate_impact())
print("Statistical Parity Difference after reweighting:", metric_rw.statistical_parity_difference())

Disparate Impact after reweighting: 1.0045372050816697
Statistical Parity Difference after reweighting: 0.0025673940949936247
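
One common way to read a disparate impact value is the "four-fifths" rule of thumb, under which a ratio between 0.8 and 1.25 is treated as acceptable. That threshold is an assumption brought in here for illustration; the exercise itself does not prescribe one. A minimal sketch with a hypothetical helper:

```python
def within_four_fifths(disparate_impact, lower=0.8):
    """True when the ratio and its reciprocal both stay at or above `lower`."""
    if disparate_impact <= 0:
        return False
    return lower <= disparate_impact <= 1 / lower


# Values printed by the cells above
baseline_di = 0.9703285058283292
reweighted_di = 1.0045372050816697

print("Baseline within range:", within_four_fifths(baseline_di))
print("Reweighted within range:", within_four_fifths(reweighted_di))
```

Both values fall inside the range, which is worth keeping in mind when judging how much bias there was to mitigate in this particular run.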


### Step 6: Compare Results
Compare baseline and post-mitigation results.

In [None]:
print("Baseline Accuracy:", accuracy_score(y_test, y_pred))
print("Reweighted Accuracy:", accuracy_score(y_test, y_pred_rw))
print("Baseline Disparate Impact:", metric.disparate_impact())
print("Reweighted Disparate Impact:", metric_rw.disparate_impact())

Baseline Accuracy: 0.7433333333333333
Reweighted Accuracy: 0.6766666666666666
Baseline Disparate Impact: 0.9703285058283292
Reweighted Disparate Impact: 1.0045372050816697
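
The comparison is easier to reason about as differences. The sketch below, which uses only the numbers printed above and a hypothetical `summarize` helper, reports the change in accuracy alongside each run's distance of disparate impact from parity (1.0).

```python
# Values copied from the printed output above
baseline = {"accuracy": 0.7433333333333333, "disparate_impact": 0.9703285058283292}
reweighted = {"accuracy": 0.6766666666666666, "disparate_impact": 1.0045372050816697}


def summarize(before, after):
    """Return the accuracy change and each run's distance of DI from parity."""
    return {
        "accuracy_change": after["accuracy"] - before["accuracy"],
        "di_gap_before": abs(1.0 - before["disparate_impact"]),
        "di_gap_after": abs(1.0 - after["disparate_impact"]),
    }


summary = summarize(baseline, reweighted)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because the train/test split is shuffled without a fixed seed, these figures describe a single run and may shift when the notebook is re-executed.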


### Questions
1. Compare the baseline and reweighted accuracy scores. What do you observe about the trade-off between accuracy and fairness?
2. Based on the results, would you consider the reweighting method effective in addressing bias for this dataset? Why or why not?