<a href="https://colab.research.google.com/github/chitalekunal/ai_governance/blob/main/bias_aware_credit_approval_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Bias-Aware Credit Approval System (EU AI Act Compliance)

In [1]:
# @markdown Fairlearn is not preinstalled in the current Colab runtime, so it has to be installed first.
!pip install fairlearn

Collecting fairlearn
  Downloading fairlearn-0.13.0-py3-none-any.whl.metadata (7.3 kB)
Collecting scipy<1.16.0,>=1.9.3 (from fairlearn)
  Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Downloading fairlearn-0.13.0-py3-none-any.whl (251 kB)
Downloading scipy-1.15.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.3 MB)

In [2]:
# @title Step 1: The "Lazy" Model
# @markdown #### Package import
# @markdown We will train a random forest on a set of banking data. The expectation is that the model will "cheat" by exploiting bias already present in the data.

import pandas as pd  # pandas for data analysis
from sklearn.ensemble import RandomForestClassifier  # the model used for this experiment
from sklearn.model_selection import train_test_split  # splits the data into training and test sets

# @markdown We will be using `Fairlearn`, an open-source, community-driven project that helps data scientists improve the fairness of AI systems.
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference

In [3]:
# @markdown Get the German credit (Statlog) data, openly available at https://archive.ics.uci.edu/ml/machine-learning-databases/statlog

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
columns = ['checkin_acc', 'duration', 'credit_history', 'purpose', 'amount', 'saving_acc', 'employment_status', 'inst_rate', 'personal_status', 'other_debtors', 'residing_since', 'property', 'age', 'inst_plans', 'housing', 'num_credits', 'job', 'dependents', 'telephone', 'foreign_worker', 'status']
data = pd.read_csv(url, sep=' ', header=None, names=columns)

data.head()

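| | checkin_acc | duration | credit_history | purpose | amount | saving_acc | employment_status | inst_rate | personal_status | other_debtors | ... | property | age | inst_plans | housing | num_credits | job | dependents | telephone | foreign_worker | status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A11 | 6 | A34 | A43 | 1169 | A65 | A75 | 4 | A93 | A101 | ... | A121 | 67 | A143 | A152 | 2 | A173 | 1 | A192 | A201 | 1 |
| 1 | A12 | 48 | A32 | A43 | 5951 | A61 | A73 | 2 | A92 | A101 | ... | A121 | 22 | A143 | A152 | 1 | A173 | 1 | A191 | A201 | 2 |
| 2 | A14 | 12 | A34 | A46 | 2096 | A61 | A74 | 2 | A93 | A101 | ... | A121 | 49 | A143 | A152 | 1 | A172 | 2 | A191 | A201 | 1 |
| 3 | A11 | 42 | A32 | A42 | 7882 | A61 | A74 | 2 | A93 | A103 | ... | A122 | 45 | A143 | A153 | 1 | A173 | 2 | A191 | A201 | 1 |
| 4 | A11 | 24 | A33 | A40 | 4870 | A61 | A73 | 3 | A93 | A101 | ... | A124 | 53 | A143 | A153 | 2 | A173 | 2 | A191 | A201 | 2 |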


In [4]:
# @markdown Derive "gender" from the 'personal_status' column (A91, A93, A94 are male; A92, A95 are female)
status_map = {'A91': 'male', 'A93': 'male', 'A94': 'male', 'A92': 'female', 'A95': 'female'}
data['gender'] = data['personal_status'].map(status_map)
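
A quick sanity check can follow the mapping step above. `Series.map` with a dict returns `NaN` for any code missing from the dict, so a typo in `status_map` would silently produce rows with no gender. The sketch below uses a small hypothetical series instead of the real column, and `'A99'` is a made-up code included only to show what an unmapped value looks like.

```python
import pandas as pd

# Toy stand-in for data['personal_status']; 'A99' is a made-up code (assumption)
personal_status = pd.Series(['A93', 'A92', 'A91', 'A94', 'A99'])

status_map = {'A91': 'male', 'A93': 'male', 'A94': 'male',
              'A92': 'female', 'A95': 'female'}

gender = personal_status.map(status_map)

# Codes that were not found in status_map end up as NaN in the result
unmapped = personal_status[gender.isna()].unique().tolist()

# Group sizes, including any unmapped (NaN) rows
counts = gender.value_counts(dropna=False)

print("Unmapped codes:", unmapped)
print(counts)
```

On the real data, the same check would be `data['gender'].isna().sum()`, which should be 0 if every code in `personal_status` is covered by the map.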


In [5]:
# @markdown Simple preprocessing: encode the target (1=Good, 2=Bad becomes 1=Good, 0=Bad) and one-hot encode the categorical features
data['status'] = data['status'].map({1: 1, 2: 0})
X = pd.get_dummies(data.drop(['status', 'gender'], axis=1))
y = data['status']
gender = data['gender']
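
To make the encoding step above concrete, here is a minimal sketch of what `pd.get_dummies` does to a frame that mixes numeric and categorical columns. The toy frame is hypothetical and only borrows two column names from the dataset; it is not the notebook's data.

```python
import pandas as pd

# Hypothetical three-row frame: one numeric column, one categorical column
toy = pd.DataFrame({
    'duration': [6, 48, 12],
    'checkin_acc': ['A11', 'A12', 'A14'],
})

# Numeric columns pass through unchanged; each category becomes its own indicator column
encoded = pd.get_dummies(toy)

print(encoded.columns.tolist())
print(encoded.shape)
```

Note that the cell above drops only `status` and `gender`, so the one-hot columns for `personal_status` (the column `gender` was derived from) remain among the features in `X`.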


In [6]:
# @markdown  Split the data
X_train, X_test, y_train, y_test, gender_train, gender_test = train_test_split(X, y, gender, test_size=0.3, random_state=42)


In [7]:
# @markdown Train the "Lazy" Random Forest
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)


In [8]:
# @markdown Predict outcomes for the test set with the random forest
y_pred = model.predict(X_test)

In [9]:
#@markdown  The demographic parity difference is a fairness metric in machine learning that measures the absolute difference in the "selection rates" (probability of a positive outcome) between different sensitive groups (e.g., gender, race, age).
#@markdown It calculates the difference between the highest and lowest group-level selection rates. A value of 0 indicates that all groups have the same probability of a positive outcome, meaning perfect demographic parity has been achieved.

# Measure the difference in approval rates between men and women
dpd = demographic_parity_difference(y_test, y_pred, sensitive_features=gender_test)

# Create a detailed breakdown
metrics = {
    'selection_rate': selection_rate, # The % of people getting "Approved"
}
mf = MetricFrame(metrics=metrics, y_true=y_test, y_pred=y_pred, sensitive_features=gender_test)

print(f"⚖️ Demographic Parity Difference: {dpd:.4f}")
print("\n📊 Selection Rates by Gender:")
print(mf.by_group)

⚖️ Demographic Parity Difference: 0.0334

📊 Selection Rates by Gender:
        selection_rate
gender                
female        0.863636
male          0.830189
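
The definition given in this cell (highest group selection rate minus lowest) can be reproduced by hand. The following is a minimal pure-Python sketch, not Fairlearn's implementation; the helper names `selection_rates` and `parity_difference` and the toy predictions are hypothetical and used for illustration only.

```python
from collections import defaultdict

def selection_rates(y_pred, groups):
    """Fraction of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

def parity_difference(y_pred, groups):
    """Highest group selection rate minus the lowest."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical toy predictions: 3 of 4 women approved, 2 of 4 men approved
toy_pred = [1, 1, 1, 0, 1, 0, 0, 1]
toy_gender = ['female'] * 4 + ['male'] * 4

rates = selection_rates(toy_pred, toy_gender)
gap = parity_difference(toy_pred, toy_gender)

print(rates)
print(gap)
```

Applied to the selection rates printed above, the same arithmetic gives 0.863636 - 0.830189 ≈ 0.0334, which matches the reported demographic parity difference.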
