<a href="https://colab.research.google.com/github/LorenzoBelenguer/AIF360-Detect-Mitigate-bias-German-Credit-Data-/blob/main/AIF360_Detect_Mitigate_bias_German_Credit_Data_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This dataset, German Credit Data, classifies people described by a set of attributes as good or bad credit risks. The file used is german.data, which consists of 1000 instances and 20 features. A detailed description of the dataset is provided in german.doc. https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

The AI Fairness 360 toolkit is an extensible open-source library containing techniques developed by the research community to help detect and mitigate bias in machine learning models throughout the AI application lifecycle. The AI Fairness 360 package is available in both Python and R.

The AI Fairness 360 package, AIF360, includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models. It is designed to translate algorithmic research from the lab into the actual practice of domains as wide-ranging as finance, human capital management, healthcare, and education.

To detect and mitigate bias in a dataset with a package such as AIF360, we need at least one protected characteristic. In this example, I will be testing for bias based on age.

Patterns that are found may not be desirable or may even be illegal. For example, this credit score model may determine that age plays a significant role in the prediction of repayment because the training dataset happened to have better repayment for one age group than for another. This raises two problems: 1) the training dataset may not be representative of the true population of people of all age groups, and 2) even if it is representative, it is illegal to base any decision on an applicant's age, regardless of whether this is a good prediction based on historical data.

If that is the case, there is a need to correct the model to reduce bias. AIF360 is designed to help address this problem with fairness metrics and bias mitigators. **Fairness metrics** can be used to check for bias in machine learning workflows. **Bias mitigators** can be used to overcome bias in the workflow to produce a fairer outcome.

**1.** Install the latest **AIF360** package (currently 0.5.0). Although they are not needed for such a basic test, I have also added two optional extras (LawSchoolGPA and Reductions) in case they are needed in the future. **Pandas** and **NumPy**, two very popular data science libraries, are installed as well.

In [3]:
!pip install aif360 pandas numpy
!pip install 'aif360[LawSchoolGPA]'
!pip install 'aif360[Reductions]'

Collecting tempeh (from aif360[LawSchoolGPA])
  Downloading tempeh-0.1.12-py3-none-any.whl (39 kB)
Collecting memory-profiler (from tempeh->aif360[LawSchoolGPA])
  Downloading memory_profiler-0.61.0-py3-none-any.whl (31 kB)
Collecting shap (from tempeh->aif360[LawSchoolGPA])
  Downloading shap-0.42.1-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (547 kB)
Collecting slicer==0.0.7 (from shap->tempeh->aif360[LawSchoolGPA])
  Downloading slicer-0.0.7-py3-none-any.whl (14 kB)
Installing collected packages: slicer, memory-profiler, shap, tempeh
Successfully installed memory-profiler-0.61.0 shap-0.42.1 slicer-0.0.7 tempeh-0.1.12
Collecting fairlearn~=0.7 (from aif360[Reductions])
  Downloading fairlearn-0.9.0-py3-none-any.whl (231 kB)

**2.** Import Statements

As with any Python program, the first step is to import the necessary packages. Below, I import several components from the aif360 package: the GermanDataset, the metric class used to check for bias, and the class for the algorithm I will use to mitigate bias. pandas and numpy are abbreviated as pd and np for convenience.

In [4]:
# Load all necessary packages and the German credit dataset class
import pandas as pd

import numpy as np
np.random.seed(0)  # fix the seed so the shuffled train/test split is reproducible

from aif360.datasets import GermanDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

from IPython.display import Markdown, display

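**3.** As with any dataset, I need to check it for errors, duplicates and missing (null) values, and clean it if necessary. The checks below find no missing values and no duplicates, so I can carry on. Note that german.data is space-delimited and has no header row, so with the default read_csv settings the file is read as a single column and its first record is used as the header, which is why the shape below is (999, 1) rather than 1000 rows of separate attributes.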

In [5]:
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data")
# Check the number of rows and columns in the dataset
print(df.shape)

# Check the data types of the columns in the dataset
print(df.dtypes)

# Check for missing values in the dataset
print(df.isnull().sum())

# Remove any rows with missing values
df = df.dropna()

# Check for duplicate rows in the dataset
print(df.duplicated().sum())

# Remove any duplicate rows in the dataset
df = df.drop_duplicates()

(999, 1)
A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1    object
dtype: object
A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1    0
dtype: int64
0
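
The single-column result above comes from the default comma separator and header inference in read_csv. A minimal sketch of parsing the records into separate columns follows, assuming the file is delimited by single spaces and has no header row. To stay self-contained it reads an inline sample (the first record, as it appears in the output above) instead of downloading the file; the column names `attr_1` to `attr_20` and `label` are hypothetical placeholders, not names from german.doc.

```python
import io
import pandas as pd

# First record of german.data, copied from the single-column output above.
# It is repeated once only so that the duplicate check has something to find.
sample = (
    "A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1\n"
    "A11 6 A34 A43 1169 A65 A75 4 A93 A101 4 A121 67 A143 A152 2 A173 1 A192 A201 1\n"
)

# Hypothetical column names: 20 attributes followed by the class label.
columns = [f"attr_{i}" for i in range(1, 21)] + ["label"]

# sep=" " splits on single spaces; header=None keeps the first record as data.
df_sample = pd.read_csv(io.StringIO(sample), sep=" ", header=None, names=columns)

print(df_sample.shape)               # (2, 21)
print(df_sample.duplicated().sum())  # 1
```

With the real file, the same `sep` and `header` arguments can be passed to `pd.read_csv` together with the URL used above, so that the null and duplicate checks run on individual attributes instead of on one long string per row.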


**4.** Because I am using an online environment, Google Colab, I need to upload the two files, german.data and german.doc, to the server. If you are working locally (i.e., on your own computer), set the file path accordingly.

In [7]:
from google.colab import files

# Upload the downloaded files to Colab
uploaded = files.upload()

Saving german.data to german.data
Saving german.doc to german.doc


**5.**   Load the initial dataset, setting the protected attribute to be age (as previously discussed). Then split the original dataset into training and test datasets. Although I will use only the training dataset in this example, a normal workflow would also use a test dataset to assess efficacy (accuracy, fairness, etc.) during the development of a machine learning model. Finally, set two variables for the privileged (1) and unprivileged (0) values of the age attribute. These are key inputs for detecting and mitigating bias in Steps 6 and 7.


In [9]:
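import os
import shutil
import aif360

# Locate the directory where AIF360 expects the raw German credit files,
# relative to the installed package rather than a hard-coded Python version
aif360_data_dir = os.path.join(os.path.dirname(aif360.__file__), 'data', 'raw', 'german')

# Move the uploaded files to the AIF360 data directory
for filename in uploaded.keys():
    shutil.move(filename, os.path.join(aif360_data_dir, filename))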

In [10]:
dataset_orig = GermanDataset(
    protected_attribute_names=['age'],           # this dataset also contains protected
                                                 # attribute for "sex" which I do not
                                                 # consider in this evaluation
    privileged_classes=[lambda x: x >= 25],      # age >=25 is considered privileged
    features_to_drop=['personal_status', 'sex'] # ignore sex-related attributes
)

dataset_orig_train, dataset_orig_test = dataset_orig.split([0.7], shuffle=True)

privileged_groups = [{'age': 1}]
unprivileged_groups = [{'age': 0}]
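
The values 1 and 0 in these group definitions refer to the binarised age column, not to raw ages. A minimal sketch of that encoding follows, applying the same `x >= 25` rule to a hypothetical array of ages (illustrative values, not taken from german.data).

```python
import numpy as np

# Hypothetical applicant ages, for illustration only (not from german.data).
ages = np.array([19, 24, 25, 31, 47, 22])

# Same rule as the lambda passed to privileged_classes: age >= 25 is privileged.
is_privileged = lambda x: x >= 25

# Encode the protected attribute as 1.0 (privileged) or 0.0 (unprivileged).
age_binary = is_privileged(ages).astype(float)

print(age_binary)  # [0. 0. 1. 1. 1. 0.]
```

Note that the boundary case, age 25, falls in the privileged group under this rule.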


**6.** Compute fairness metric on original training dataset

Now that I have identified the protected attribute 'age' and defined privileged and unprivileged values, I can use AIF360 to detect bias in the dataset. One simple test is to compare the percentage of favorable results for the privileged and unprivileged groups, subtracting the former percentage from the latter. A negative value indicates less favorable outcomes for the unprivileged group. This is implemented in the mean_difference method of the BinaryLabelDatasetMetric class. The code below performs this check and displays the output, showing that the difference is -0.169905: the rate of favorable outcomes for the unprivileged group (age < 25) is about 17 percentage points lower than for the privileged group, which indicates that the training data is biased with respect to age.


In [11]:
metric_orig_train = BinaryLabelDatasetMetric(dataset_orig_train,
                                             unprivileged_groups=unprivileged_groups,
                                             privileged_groups=privileged_groups)
display(Markdown("#### Original training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_orig_train.mean_difference())

#### Original training dataset

Difference in mean outcomes between unprivileged and privileged groups = -0.169905
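
The check performed by mean_difference can be sketched in plain NumPy. This is an illustrative reimplementation under simplifying assumptions, not AIF360's code: the function name, its parameters and the toy data are hypothetical, and the favorable label is encoded as 1 and the unfavorable label as 0.

```python
import numpy as np

def mean_difference(labels, protected, weights=None, favorable=1):
    """Favorable rate of the unprivileged group (0) minus that of the privileged group (1)."""
    labels = np.asarray(labels)
    protected = np.asarray(protected)
    # Unweighted data is treated as every instance having weight 1.
    weights = np.ones(len(labels)) if weights is None else np.asarray(weights, dtype=float)

    def favorable_rate(group_value):
        in_group = protected == group_value
        return weights[in_group & (labels == favorable)].sum() / weights[in_group].sum()

    return favorable_rate(0) - favorable_rate(1)

# Hypothetical toy data: label 1 = good credit, protected 1 = age >= 25.
labels    = [1, 1, 1, 0, 1, 0, 0, 1]
protected = [1, 1, 1, 1, 0, 0, 0, 0]

print(mean_difference(labels, protected))  # -0.25
```

In the toy data the privileged group has a favorable rate of 3/4 and the unprivileged group 2/4, so the difference is negative, with the same interpretation as the value reported for the training dataset.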



**7.** Mitigate bias by transforming the original dataset

The previous step showed that the privileged group was receiving positive outcomes at a rate about 17 percentage points higher than the unprivileged group in the training dataset. Since this is not desirable, I am going to try to mitigate this bias in the training dataset. This is called pre-processing mitigation because it is done before the model is created. Bias detection and mitigation can be done before, during or after the model processes the data.

AI Fairness 360 implements several pre-processing mitigation algorithms. I will choose the Reweighing algorithm [1], which is implemented in the Reweighing class in the aif360.algorithms.preprocessing package. Rather than changing any feature or label values, this algorithm assigns a weight to each training instance so that, once the weights are taken into account, positive outcomes are distributed more equitably between the privileged and unprivileged groups of the protected attribute.

I then call fit_transform, which performs the fit and transform steps in a single call, producing a newly transformed training dataset (dataset_transf_train).

[1] F. Kamiran and T. Calders, "Data Preprocessing Techniques for Classification without Discrimination," Knowledge and Information Systems, 2012.


In [12]:
RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_train = RW.fit_transform(dataset_orig_train)
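
The weighting idea from Kamiran and Calders [1] can be sketched in a few lines of NumPy. Each instance with protected value s and label y receives the weight (N_s * N_y) / (N * N_sy), that is, the count expected if the label were independent of the protected attribute divided by the observed count. This is a simplified illustration, not AIF360's implementation; the function name and the toy data are hypothetical.

```python
import numpy as np

def reweighing_weights(labels, protected):
    """Per-instance weight (N_s * N_y) / (N * N_sy) for each (group, label) cell."""
    labels = np.asarray(labels)
    protected = np.asarray(protected)
    n = len(labels)
    weights = np.empty(n, dtype=float)
    for s in np.unique(protected):
        for y in np.unique(labels):
            cell = (protected == s) & (labels == y)
            if cell.any():
                # Count expected in this cell if label and group were independent.
                expected = (protected == s).sum() * (labels == y).sum() / n
                weights[cell] = expected / cell.sum()
    return weights

# Hypothetical toy data: label 1 = good credit, protected 1 = age >= 25.
labels    = np.array([1, 1, 1, 0, 1, 0, 0, 1])
protected = np.array([1, 1, 1, 1, 0, 0, 0, 0])

w = reweighing_weights(labels, protected)
print(np.round(w, 3))
```

In this toy example the favorable outcomes of the privileged group are over-represented, so those instances are down-weighted to 5/6, while the favorable outcomes of the unprivileged group are up-weighted to 1.25. The total weight stays equal to the number of instances.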


**8.** Compute fairness metric on transformed dataset

Now that the dataset is transformed, I can check how effective the transformation was in removing bias by using the same metric I used for the original training dataset in Step 6. Once again, I use the mean_difference method of the BinaryLabelDatasetMetric class to check for bias. The mitigation step was very effective: the difference in mean outcomes is now 0.0. So we went from an advantage of about 17 percentage points for the privileged group to equality in terms of mean outcome.


In [13]:
metric_transf_train = BinaryLabelDatasetMetric(dataset_transf_train,
                                               unprivileged_groups=unprivileged_groups,
                                               privileged_groups=privileged_groups)
display(Markdown("#### Transformed training dataset"))
print("Difference in mean outcomes between unprivileged and privileged groups = %f" % metric_transf_train.mean_difference())

#### Transformed training dataset

Difference in mean outcomes between unprivileged and privileged groups = 0.000000
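
The exact zero is expected rather than a coincidence. A short derivation follows, assuming each instance with protected value $s$ and label $y$ receives the weight proposed by Kamiran and Calders [1], and that every group and label combination occurs at least once. The notation is introduced here for illustration: $N$ is the number of instances, $N_s$ and $N_y$ are the group and label counts, $N_{s,y}$ is the count of their combination, and $y = 1$ denotes the favorable label.

$$
w_{s,y} = \frac{N_s \, N_y}{N \, N_{s,y}}
$$

The weighted rate of favorable outcomes in group $s$ is then

$$
\frac{w_{s,1} N_{s,1}}{w_{s,1} N_{s,1} + w_{s,0} N_{s,0}}
= \frac{N_s N_1 / N}{N_s N_1 / N + N_s N_0 / N}
= \frac{N_1}{N_1 + N_0}
= \frac{N_1}{N}
$$

This rate does not depend on $s$, so the weighted difference between the unprivileged and privileged groups is $N_1/N - N_1/N = 0$. The metric reaches zero because it takes the instance weights into account; the features and labels themselves are unchanged.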
