
# Challenges of Fair ML - Dashboarding and Governance
Demonstration of how to use FairLearn for the Data Science in Finance conference, 2021.

This demonstration was created to be run in Google Colab.

## Loading Requirements

First, we install `raiwidgets`, which provides the FairLearn dashboard, along with `fairlearn` itself.

In [None]:
!pip install -q raiwidgets 
!pip install -q fairlearn

Next we load the packages we require.

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder, StandardScaler

%matplotlib inline

Finally, we download the stroke prediction dataset.

In [None]:
!wget -O healthcare-dataset-stroke-data.csv https://gist.githubusercontent.com/aishwarya8615/d2107f828d3f904839cbcb7eaa85bd04/raw/cec0340503d82d270821e03254993b6dede60afb/healthcare-dataset-stroke-data.csv 

## EDA

Next, we can take a look at our data.

In [None]:
# Load the data indexed by patient id and drop rows with any missing value
stroke_data_df = pd.read_csv('healthcare-dataset-stroke-data.csv', index_col='id').dropna()
stroke_data_df.head(10)

Let's look at the distribution of stroke cases in the dataset.

In [None]:
stroke_data_df['stroke'].value_counts()
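Raw counts make it hard to see how imbalanced the target is. Passing `normalize=True` to `value_counts` returns the share of each class instead. The sketch below uses a small hypothetical toy frame in place of `stroke_data_df`, so its numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for stroke_data_df (not the real dataset)
toy_df = pd.DataFrame({"stroke": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]})

# Absolute number of rows per class
counts = toy_df["stroke"].value_counts()

# Share of rows per class: normalize=True divides each count by the total
shares = toy_df["stroke"].value_counts(normalize=True)
print(shares)
```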

Gender is often a sensitive variable. Let's see how it is distributed in this dataset.

In [None]:
stroke_data_df['gender'].value_counts().plot.bar()

So we have more data about female patients than male patients. What is the relationship between getting a stroke and gender?

In [None]:
stroke_data_df.groupby('stroke').gender.value_counts().plot.bar()
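Because the dataset contains more female than male patients, raw counts per gender partly reflect group size. Comparing the stroke *rate* within each group is more informative. A minimal sketch, again on hypothetical toy data rather than the real dataset:

```python
import pandas as pd

# Hypothetical toy data standing in for stroke_data_df (not the real dataset)
toy_df = pd.DataFrame({
    "gender": ["Female"] * 6 + ["Male"] * 4,
    "stroke": [0, 0, 0, 0, 1, 1, 0, 0, 0, 1],
})

# stroke is coded 0/1, so the mean within each group is that group's stroke rate
stroke_rate = toy_df.groupby("gender")["stroke"].mean()
print(stroke_rate)

# Equivalent view: a contingency table normalized so each row sums to 1
table = pd.crosstab(toy_df["gender"], toy_df["stroke"], normalize="index")
print(table)
```

On the real data, replacing `toy_df` with `stroke_data_df` gives the per-gender stroke rate directly.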

In [None]:
# Patient counts per gender (x-axis), with one bar per smoking status
stroke_data_df.groupby('smoking_status').gender.value_counts().unstack(0).plot.bar()

In [None]:
# Patient counts per stroke label (x-axis), with one bar per smoking status
stroke_data_df.groupby('smoking_status').stroke.value_counts().unstack(0).plot.bar()

In [None]:

sns.displot(data=stroke_data_df, x='bmi', hue='gender')


## Modeling

First, we set aside the sensitive features in our dataset. They are placed in a separate data frame, which FairLearn uses to calculate the fairness metrics.

In [None]:
sensitive_features = ['gender', 'Residence_type']
sensitive_features_df = stroke_data_df[sensitive_features]
sensitive_features_df

After that, we drop the sensitive features from the model inputs and one-hot encode the remaining categorical variables with `pd.get_dummies`.

In [None]:
stroke_data_encoded_df = pd.get_dummies(stroke_data_df.drop(sensitive_features, axis=1))

stroke_data_encoded_df.head(5)
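To make the encoding step concrete, here is what `pd.get_dummies` does to a few hypothetical toy rows (not taken from the dataset): numeric columns pass through unchanged, and each text column is replaced by one indicator column per category.

```python
import pandas as pd

# Hypothetical toy rows with a numeric feature, a categorical feature and the label
toy_df = pd.DataFrame({
    "age": [67.0, 49.0, 80.0],
    "work_type": ["Private", "Self-employed", "Private"],
    "stroke": [1, 0, 1],
})

# Non-categorical columns are kept first; indicator columns named
# <column>_<category> are appended after them
encoded = pd.get_dummies(toy_df)
print(encoded.columns.tolist())
# ['age', 'stroke', 'work_type_Private', 'work_type_Self-employed']
```

`pd.get_dummies` also accepts `drop_first=True`, which drops one level per categorical feature to avoid perfectly collinear indicator columns; the notebook keeps all levels.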

With the data prepared, we can now split it into a train and a test set. Note that we also pass `sensitive_features_df` to `train_test_split`, so that the sensitive features are split along exactly the same rows as `X` and `y`.

In [None]:
X = stroke_data_encoded_df.loc[ : , stroke_data_encoded_df.columns != 'stroke']
y = stroke_data_encoded_df.loc[: ,'stroke']


X_train, X_test, y_train, y_test, sensitive_features_train, sensitive_features_test = train_test_split(X, y, sensitive_features_df, test_size=0.3, random_state=1, stratify=y)
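The alignment claim above can be checked directly: `train_test_split` applies the same shuffled row selection to every array it receives, so index labels line up across all outputs. A minimal sketch on hypothetical toy data; the `toy_` prefixes are only there so running it in the notebook does not overwrite `X`, `y` or the real splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data (20 rows): one feature, an imbalanced 0/1 label,
# and one sensitive column
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({"age": rng.integers(20, 80, size=20)})
toy_y = pd.Series([0] * 16 + [1] * 4, name="stroke")
toy_A = pd.DataFrame({"gender": rng.choice(["Female", "Male"], size=20)})

# Every array passed in is indexed with the same shuffled row positions,
# so row i of each output still describes the same patient
(toy_X_train, toy_X_test, toy_y_train, toy_y_test,
 toy_A_train, toy_A_test) = train_test_split(
    toy_X, toy_y, toy_A, test_size=0.3, random_state=1, stratify=toy_y
)

# Identical index labels in identical order confirm the rows stayed aligned
print(toy_X_test.index.equals(toy_y_test.index))  # True
print(toy_X_test.index.equals(toy_A_test.index))  # True
```

`stratify=toy_y` keeps the share of positive labels roughly equal in both splits, which matters when positives are rare.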

Finally, we can train a model on the dataset. Here we use logistic regression, for no particular reason.

In [None]:
model = LogisticRegression(class_weight='balanced', random_state=1, max_iter=1000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
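The `class_weight='balanced'` argument reweights each class inversely to its frequency, so the comparatively few stroke cases are not swamped by the majority class during fitting. The sketch below reproduces the `n_samples / (n_classes * np.bincount(y))` heuristic that scikit-learn documents for this option, on hypothetical labels (95 negatives, 5 positives), and checks it against scikit-learn's `compute_class_weight` helper.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels; toy_y avoids overwriting the notebook's y
toy_y = np.array([0] * 95 + [1] * 5)

# The "balanced" heuristic: weight_c = n_samples / (n_classes * count_c)
manual = len(toy_y) / (2 * np.bincount(toy_y))

# scikit-learn's helper for the same computation
weights = compute_class_weight(class_weight="balanced", classes=np.unique(toy_y), y=toy_y)

# About 0.526 for class 0 and 10.0 for class 1
print(manual)
print(weights)
```

With these weights each positive example counts roughly 19 times as much as a negative one in the loss.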

In [None]:
# Pair each feature name with its learned logistic regression coefficient
list(zip(X_train.columns, model.coef_[0]))

The classification report on the test set shows an accuracy of 0.71.

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

However, this does not tell us how the model performs for each of the sensitive groups. The overall accuracy is 0.71, but it may differ between groups. So let's look further into that with FairLearn.

In [None]:
from raiwidgets import FairnessDashboard

# sensitive_features_test contains the sensitive features (gender, Residence_type)
# y_test contains the ground truth labels
# y_pred contains the predicted labels

FairnessDashboard(sensitive_features=sensitive_features_test,
                  y_true=y_test,
                  y_pred=y_pred)
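The dashboard's core idea is to compute each metric separately per sensitive group and compare the results. A minimal pandas sketch of that idea, on hypothetical toy predictions (in the notebook these would come from `y_test`, `y_pred` and `sensitive_features_test['gender']`):

```python
import pandas as pd

# Hypothetical toy predictions, not outputs of the model above
toy_results = pd.DataFrame({
    "gender": ["Female"] * 4 + ["Male"] * 4,
    "y_true": [0, 0, 1, 1, 0, 0, 1, 1],
    "y_pred": [0, 1, 1, 1, 0, 0, 0, 1],
})

# 1 where the prediction is correct; its group mean is the per-group accuracy
toy_results["correct"] = (toy_results["y_true"] == toy_results["y_pred"]).astype(int)

by_group = toy_results.groupby("gender").agg(
    accuracy=("correct", "mean"),
    selection_rate=("y_pred", "mean"),  # share of rows predicted positive
)
print(by_group)

# Largest between-group gap per metric, one common way to summarise disparity
disparity = by_group.max() - by_group.min()
print(disparity)
```

In this toy example both groups have the same accuracy (0.75), yet 75% of female rows are predicted positive versus 25% of male rows: exactly the kind of difference a single overall accuracy hides. Fairlearn's `MetricFrame` (in `fairlearn.metrics`) produces this kind of per-group table through its `by_group` attribute and summarises the gaps with its `difference()` method.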