# Fairness auditing for subgroups using Fairness Aware Counterfactuals for Subgroups (FACTS).

[FACTS](https://arxiv.org/abs/2306.14978) is an efficient, model-agnostic, highly parameterizable, and explainable framework for evaluating subgroup fairness through counterfactual explanations.

In this notebook, we use FACTS to discover subgroups in which a model (logistic regression, for simplicity) exhibits high bias between Males and Females.

We will use the Adult dataset from UCI ([reference](https://archive.ics.uci.edu/ml/datasets/adult)).

## Import dependencies

As usual in Python, the first step is to import all necessary packages.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

from aif360.sklearn.datasets.openml_datasets import fetch_adult
from aif360.sklearn.detectors.facts import FACTS
from aif360.sklearn.detectors.facts.clean import clean_dataset
from aif360.sklearn.detectors.facts.formatting import print_recourse_report
from aif360.sklearn.detectors.facts.utils import load_rules_by_if, save_rules_by_if

from IPython.display import Markdown, display

import warnings
warnings.filterwarnings("ignore")



Below, you can set the `random_seed` variable to `None` if you want the pseudo-random parts to change between runs. We have fixed it to a specific value for reproducibility.

In [2]:
random_seed = 131313 # for reproducibility
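The effect of fixing a seed can be illustrated with a minimal sketch. It uses Python's standard-library `random` module as a stand-in for the `random_state` argument that scikit-learn accepts; the helper `shuffled_ids` is hypothetical and not part of the notebook or of any library.

```python
import random

def shuffled_ids(seed, n=10):
    """Return the integers 0..n-1 in a pseudo-random order.

    Hypothetical helper: `seed=None` lets Python pick a fresh seed,
    while an integer seed makes the order repeatable.
    """
    rng = random.Random(seed)
    ids = list(range(n))
    rng.shuffle(ids)
    return ids

random_seed = 131313  # same value as in the notebook

# Two generators built from the same seed produce the same shuffle.
first = shuffled_ids(random_seed)
second = shuffled_ids(random_seed)
print(first == second)

# With seed=None the order will generally differ between runs,
# but the result is still a permutation of the same ids.
unseeded = shuffled_ids(None)
print(sorted(unseeded) == list(range(10)))
```

In the notebook itself, the seed is passed as `random_state` to `train_test_split`, so the train/test partition stays the same across runs as long as `random_seed` is left unchanged.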

## Load Dataset

In [3]:
X, y, sample_weight = fetch_adult()
data = clean_dataset(X.assign(income=y), "adult")
display(data.head())

y = data['income']
X = data.drop('income', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=random_seed, stratify=y)

| | age | workclass | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (16.999, 26.0] | Private | 7.0 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0.0 | 0.0 | FullTime | United-States | 0 |
| 1 | (34.0, 41.0] | Private | 9.0 | Married-civ-spouse | Farming-fishing | Married | White | Male | 0.0 | 0.0 | OverTime | United-States | 0 |
| 2 | (26.0, 34.0] | Local-gov | 12.0 | Married-civ-spouse | Protective-serv | Married | White | Male | 0.0 | 0.0 | FullTime | United-States | 1 |
| 3 | (41.0, 50.0] | Private | 10.0 | Married-civ-spouse | Machine-op-inspct | Married | Black | Male | 7688.0 | 0.0 | FullTime | United-States | 1 |
| 4 | (26.0, 34.0] | Private | 6.0 | Never-married | Other-service | Not-in-family | White | Male | 0.0 | 0.0 | MidTime | United-States | 0 |


## Model training and test

We use the training set to fit a simple logistic regression model. This serves as the demonstrative model, which we then treat as a black box and audit with our algorithm.

Of course, any model can be used in its place. Our purpose here is not to produce a very good model, but to audit the fairness of an arbitrarily chosen one.

In [4]:
# num_features = X._get_numeric_data().columns.to_list()
cate_features = X.select_dtypes(include=['object','category']).columns.to_list()

cat_transf = ColumnTransformer(transformers=[
    ("ohe", OneHotEncoder(), cate_features)
], remainder="passthrough")

model = Pipeline([
    ("ohe", cat_transf),
    ("clf", LogisticRegression(max_iter=1500))
])
model = model.fit(X_train, y_train)

In [39]:
# Predict on the held-out test set first; the cells below reuse `preds_Xtest`.
preds_Xtest = model.predict(X_test)
print(f"Accuracy = {(y_test.values == preds_Xtest).sum() / y_test.shape[0]:.2%}")

Accuracy = 85.16%

In [31]:
preds_Xtest.shape

(13567,)

In [37]:
(y_test.values == preds_Xtest).sum()

11553


# Main Contribution

Here begins the implementation of the actual contribution of our work. Specifically, we demonstrate the generation of candidate subgroup counterfactuals and, as the next phase, the choice of those subgroup counterfactuals that showcase the highest unfairness, according to several metrics.

<!-- ## Find all valid if-thens with all respective coverages and atomic correctness, for all subgroups.

The first step is to generate as many as possible (and tractable) candidate counterfactuals, in the form of if-then clauses (e.g. if your education is "High School", make it "College"). At the same time, we compute the *effectiveness* for each such if-then clause, which is defined as the percentage of the subgroup which the suggestion actually manages to flip to the positive class (in the previous example, from those people who have "High School" and receive 0 from the model, what percentage receives 1 if we were to change their education to "College").

For more details on these concepts and more rigorous definitions, see our paper.

*Note*: our framework provides somewhat extensive parameterization. Descriptions of all choices are provided in the documentation. In this demo, we have tried to keep only some basic defaults for easier understanding. For some cases, this was a little difficult, so we have provided appropriate comments in the following code in order to inform the reader of those parts of the code that should be ignored, at least at first glance.

**Caution!** This step takes time. Uncomment the following block if you wish to run. -->

In [6]:
detector = FACTS(
    estimator=model,
    prot_attr="sex",
    feature_weights={f: 1 for f in X.columns}
)

In [7]:
detector.fit(X_test)

Computing frequent itemsets for each subgroup of the affected instances.


100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.79it/s]

Computing the intersection between the frequent itemsets of each subgroup of the affected instances.



100%|██████████████████████████████████████████████████████████████████████████| 1046/1046 [00:00<00:00, 520679.09it/s]

Number of subgroups in the intersection: 563
Computing all valid if-then pairs between the common frequent itemsets of each subgroup of the affected instances and the frequent itemsets of the unaffacted instances.



100%|█████████████████████████████████████████████████████████████████████████████| 563/563 [00:00<00:00, 56100.76it/s]

Computing correctenesses for all valid if-thens.



100%|████████████████████████████████████████████████████████████████████████████████| 590/590 [00:12<00:00, 46.36it/s]
100%|████████████████████████████████████████████████████████████████████████████████| 416/416 [00:11<00:00, 35.34it/s]


We continue by ranking and selecting rules according to each of the metrics proposed in our paper.

### Equal Choice for Recourse

For each set of rules with the same "if", we compare the number of counterfactuals ("then" clauses) that achieve a specified effectiveness threshold for Males and Females.
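The comparison can be sketched on toy data. The snippet below is only an illustration under stated assumptions, not the library's implementation: `toy_model`, `make_rows`, the three candidate actions, and the 0.5 threshold are all hypothetical, and effectiveness is taken to mean the fraction of affected individuals in a subgroup whose prediction flips to the favourable class once the "then" clause is applied.

```python
from typing import Dict, List

Row = Dict[str, str]

def toy_model(row: Row) -> int:
    """Hypothetical black box: predicts 1 when at least two conditions hold."""
    score = (
        (row["marital-status"] == "Married")
        + (row["education"] == "Masters")
        + (row["hours"] == "FullTime")
    )
    return int(score >= 2)

def effectiveness(rows: List[Row], action: Row) -> float:
    """Fraction of rows predicted 1 after overwriting features with `action`."""
    flipped = sum(toy_model({**row, **action}) for row in rows)
    return flipped / len(rows)

def make_rows(hours: List[str]) -> List[Row]:
    """Build affected individuals that all satisfy the same "if" clause."""
    return [
        {"marital-status": "Never-married", "education": "HS", "hours": h}
        for h in hours
    ]

# Affected individuals (currently predicted 0), split by protected attribute.
affected = {
    "Female": make_rows(["PartTime", "PartTime", "FullTime", "PartTime"]),
    "Male": make_rows(["FullTime", "FullTime", "FullTime", "PartTime"]),
}

# Candidate "then" clauses for the shared "if" clause.
actions = [
    {"marital-status": "Married"},
    {"education": "Masters"},
    {"marital-status": "Married", "education": "Masters"},
]

threshold = 0.5  # hypothetical effectiveness threshold

# Count, per group, the actions whose effectiveness reaches the threshold.
choices = {
    group: sum(effectiveness(rows, action) >= threshold for action in actions)
    for group, rows in affected.items()
}
print(choices)  # {'Female': 1, 'Male': 3}
```

In this toy setting, Males have three sufficiently effective actions to choose from and Females only one, so the subgroup would be flagged as biased against Females under this criterion.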

In [19]:
top_subgroups, subgroup_costs = detector.bias_scan(
    metric="atomic-num-above-corr",
    cor_threshold=0.2
)

In [20]:
pop_sizes = {sg: ((X_test["sex"] == sg) & (preds_Xtest == 0)).sum() for sg in X_test["sex"].unique()}
print_recourse_report(
    top_subgroups,
    population_sizes=pop_sizes,
    subgroup_costs=subgroup_costs,
    show_subgroup_costs=True
)

If capital-gain = 0.0, marital-status = Never-married, relationship = Not-in-family, workclass = Private:
	Protected Subgroup 'Female', 15.70% covered out of 4033
		Make marital-status = Married-civ-spouse, relationship = Married with effectiveness 21.01%.
		Aggregate cost of the above recourses = -1.00
	Protected Subgroup 'Male', 12.90% covered out of 6830
		Make marital-status = Married-civ-spouse, relationship = Married with effectiveness 19.41%.
		Aggregate cost of the above recourses = 0.00
	Bias against Male due to Equal Effectiveness. Unfairness score = 1.
If capital-loss = 0.0, marital-status = Never-married, relationship = Not-in-family, workclass = Private:
	Protected Subgroup 'Female', 15.70% covered out of 4033
		Make marital-status = Married-civ-spouse, relationship