In [None]:
!pip install fairlearn numpy==1.24.4

In [None]:
import pandas as pd
from fairlearn.datasets import fetch_adult

In this short workbook, we'll look at the [Adult Data Set](https://archive.ics.uci.edu/ml/datasets/adult) from the UCI Machine Learning Repository. The data set contains information about adults, including their age, work, education, and whether they make more than 50,000 dollars a year. Our task will be to predict whether an adult makes more than $50,000 a year based on the other information in the data set.

In [None]:
data = fetch_adult(as_frame=True)

In [None]:
data.data.head()

The data set contains both categorical and numerical features. We'll need to convert the categorical features into numerical ones before we can use them in a machine learning model. We can do this using the `get_dummies` function from `pandas`, which replaces each categorical column with one indicator column per category. We'll also need to convert the target variable into a binary variable, where 1 indicates that the adult makes more than 50,000 dollars a year, and 0 indicates that they do not.
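To make the encoding step concrete, here is a minimal sketch on a made-up three-row table. The `toy_*` names and the rows are assumptions for illustration only; the column names merely mimic the Adult data set.

```python
import pandas as pd

# Hypothetical toy data: the rows are invented, not taken from the Adult data set.
toy = pd.DataFrame({
    "age": [39, 50, 28],
    "sex": ["Male", "Female", "Female"],
})
toy_target = pd.Series(["<=50K", ">50K", "<=50K"])

# Numerical columns pass through unchanged; each category of a
# categorical column becomes its own indicator column.
toy_X = pd.get_dummies(toy)

# Comparing against the positive label gives booleans,
# which astype(int) turns into 0/1.
toy_y = (toy_target == ">50K").astype(int)

print(list(toy_X.columns))  # ['age', 'sex_Female', 'sex_Male']
print(toy_y.tolist())       # [0, 1, 0]
```

Note that `sex` turns into two columns, `sex_Female` and `sex_Male`, which is why the later cells can refer to `sex_Female` directly.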

In [None]:
X = pd.get_dummies(data.data)
y_true = (data.target == '>50K') * 1

Below, we can see that there are about twice as many male participants in the dataset as female participants:

In [None]:
sex = data.data['sex']
sex.value_counts()

Now we will train a simple decision tree classifier, and get the accuracy of our model.

In [None]:
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X, y_true, test_size=0.2, random_state=42)

In [None]:
classifier = DecisionTreeClassifier()
classifier.fit(x_train, y_train)

print(f'Accuracy: {accuracy_score(y_test, classifier.predict(x_test))}')

81\% accuracy sounds pretty good! But we should also check the fairness of our model. There are a number of different statistics we can consider to measure fairness. One common statistic is the selection rate, which is the proportion of people from a given group who are classified as positive. We can use the `MetricFrame` class from the `fairlearn` package to calculate the selection rate for each group.
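As a rough illustration of the definition above, the selection rate can be computed by hand. This is a plain-Python sketch on made-up predictions, not Fairlearn's implementation; the `toy_*` names and the group labels are assumptions.

```python
def selection_rate(y_pred):
    """Fraction of predictions equal to the positive class (1)."""
    return sum(1 for p in y_pred if p == 1) / len(y_pred)

# Hypothetical predictions and group labels, invented for illustration.
toy_pred = [1, 0, 0, 1, 0, 0, 0, 1]
toy_group = ["F", "F", "F", "M", "M", "F", "M", "M"]

# Selection rate within each group: restrict to that group's predictions first.
toy_rates = {}
for g in sorted(set(toy_group)):
    preds_for_group = [p for p, gg in zip(toy_pred, toy_group) if gg == g]
    toy_rates[g] = selection_rate(preds_for_group)

print(selection_rate(toy_pred))  # 0.375
print(toy_rates)                 # {'F': 0.25, 'M': 0.5}
```

Here 3 of 8 people are selected overall, but only 1 of 4 in group `F` compared with 2 of 4 in group `M`.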

In [None]:
mf = MetricFrame(
    metrics={'selection_rate': lambda y_true, y_pred: y_pred.mean()},
    y_true=y_test,
    y_pred=classifier.predict(x_test),
    sensitive_features=x_test[['sex_Female']])

In [None]:
mf.overall

In [None]:
mf.by_group

So, before we even look at how well the model performs for each group, we can see that it classifies:

- About 25\% of all people in the test set as making more than 50,000 dollars a year
- About 12\% of those recorded as female as making more than 50,000 dollars a year
- About 30\% of those recorded as male as making more than 50,000 dollars a year.

Next, let's compare the difference in performance for our model when we look at the different groups. Fairlearn makes this easy, and we can use any standard metric this way:
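Conceptually, a per-group metric is just the ordinary metric applied to each group's rows separately. The sketch below shows that idea in plain Python on made-up data. `metric_by_group` is a hypothetical helper written for this illustration, not part of Fairlearn, and the hand-written `precision` and `recall` stand in for the scikit-learn functions.

```python
def precision(y_true, y_pred):
    """Of the predicted positives, the fraction that are truly positive."""
    # Assumes at least one positive prediction; otherwise this divides by zero.
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == 1]
    return sum(predicted_pos) / len(predicted_pos)

def recall(y_true, y_pred):
    """Of the true positives, the fraction that were predicted positive."""
    # Assumes at least one true positive; otherwise this divides by zero.
    actual_pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(actual_pos) / len(actual_pos)

def metric_by_group(metric, y_true, y_pred, groups):
    """Hypothetical helper: apply `metric` separately to each group's rows."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        result[g] = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result

# Hypothetical labels, predictions and group membership, invented for illustration.
toy_true = [1, 0, 1, 1, 1, 0, 0, 1]
toy_pred = [1, 0, 0, 1, 1, 1, 0, 1]
toy_groups = ["F", "F", "F", "F", "M", "M", "M", "M"]

toy_precision = metric_by_group(precision, toy_true, toy_pred, toy_groups)
toy_recall = metric_by_group(recall, toy_true, toy_pred, toy_groups)
print(toy_precision)
print(toy_recall)
```

In this toy data, precision is higher for group `F` while recall is higher for group `M`, so a single overall number would hide both gaps.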

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score
mf = MetricFrame(
    metrics={'precision': precision_score, 'recall': recall_score, 'f1': f1_score, 'accuracy': accuracy_score},
    y_true=y_test,
    y_pred=classifier.predict(x_test),
    sensitive_features=x_test[['sex_Female']])

mf.by_group.plot.bar(subplots=True, layout=(2, 2), legend=False, figsize=(10, 8));

We can see here that the model performs differently for different groups. For example, the precision is better for men than for women, while the accuracy is reversed. In many sensitive applications, being able to understand and control these differences is crucial. Let's say that we now want to train a model that selects both groups at a similar rate, a criterion known as demographic parity. We can combine the `ExponentiatedGradient` class from `fairlearn` with any scikit-learn estimator whose `fit` method accepts `sample_weight` to do this. Below, we'll use it with the `DemographicParity` constraint to train a new decision tree classifier whose selection rate is approximately the same for male and female participants:
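The `DemographicParity` constraint used in the next cell is concerned with the gap in selection rates between groups. A minimal plain-Python sketch of that gap, on made-up predictions, might look like this. `demographic_parity_gap` is a hypothetical helper for illustration and the data are invented.

```python
def demographic_parity_gap(y_pred, groups):
    """Largest minus smallest selection rate across the groups."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())

# Hypothetical group labels and two sets of predictions, invented for illustration.
toy_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
toy_unmitigated = [1, 1, 0, 0, 1, 0, 0, 0]  # selects 50% of A, 25% of B
toy_mitigated = [1, 0, 0, 0, 1, 0, 0, 0]    # selects 25% of both groups

print(demographic_parity_gap(toy_unmitigated, toy_groups))  # 0.25
print(demographic_parity_gap(toy_mitigated, toy_groups))    # 0.0
```

A gap of 0 means every group is selected at the same rate. Nothing in this quantity refers to the true labels, so driving it down can change accuracy in either direction.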

In [None]:
from fairlearn.reductions import DemographicParity, ExponentiatedGradient

constraint = DemographicParity()
classifier = DecisionTreeClassifier()
mitigator = ExponentiatedGradient(classifier, constraint)
mitigator.fit(x_train, y_train, sensitive_features=x_train[['sex_Female']])

Now let's look at the accuracy of our new model, and compare the selection rate for each group:

In [None]:
mf = MetricFrame(
    metrics={'accuracy': accuracy_score, 'selection_rate': lambda y_true, y_pred: y_pred.mean()},
    y_true=y_test,
    y_pred=mitigator.predict(x_test),
    sensitive_features=x_test['sex_Female']
)

In [None]:
mf.overall

In [None]:
mf.by_group

We can see that the selection rate (i.e. the proportion of people classified as making more than 50,000 dollars a year) is now much closer between the two groups, which is what the demographic parity constraint asks for, and the accuracy is much closer as well. This is a simple example, but it shows how we can use the `fairlearn` package to understand and control the fairness of our machine learning models.

Fairlearn provides many other tools for understanding and controlling fairness in machine learning models. For example, we can use it to explore the trade-offs between fairness and accuracy, to visualize how a model performs for different groups, and to compare several models against each other. For more information, see the [fairlearn documentation](https://fairlearn.github.io/).