# Fairness auditing for subgroups using Fairness Aware Counterfactuals for Subgroups (FACTS)

[FACTS](https://arxiv.org/abs/2306.14978) is an efficient, model-agnostic, highly parameterizable, and explainable framework for evaluating subgroup fairness through counterfactual explanations.

In this notebook, we use FACTS to discover subgroups in which a model (logistic regression, for simplicity) is highly biased between males and females.

We will use the Adult dataset from UCI ([reference](https://archive.ics.uci.edu/ml/datasets/adult)).

## Import dependencies

As usual in Python, the first step is to import all necessary packages.

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

from aif360.sklearn.datasets.openml_datasets import fetch_adult
from aif360.sklearn.detectors.facts.clean import clean_dataset
from aif360.sklearn.detectors.facts import FACTS, print_recourse_report

from IPython.display import display

import warnings
warnings.filterwarnings("ignore")


Below, set the `random_seed` variable to `None` if you want the pseudo-random parts to vary between runs. We fix it to a specific value for reproducibility.

In [2]:
random_seed = 131313 # for reproducibility

## Load Dataset

In [3]:
X, y, sample_weight = fetch_adult()
data = clean_dataset(X.assign(income=y), "adult")
display(data.head())

y = data['income']
X = data.drop('income', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=random_seed, stratify=y)

| | age | workclass | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (16.999, 26.0] | Private | 7.0 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0.0 | 0.0 | FullTime | United-States | 0 |
| 1 | (34.0, 41.0] | Private | 9.0 | Married-civ-spouse | Farming-fishing | Married | White | Male | 0.0 | 0.0 | OverTime | United-States | 0 |
| 2 | (26.0, 34.0] | Local-gov | 12.0 | Married-civ-spouse | Protective-serv | Married | White | Male | 0.0 | 0.0 | FullTime | United-States | 1 |
| 3 | (41.0, 50.0] | Private | 10.0 | Married-civ-spouse | Machine-op-inspct | Married | Black | Male | 7688.0 | 0.0 | FullTime | United-States | 1 |
| 4 | (26.0, 34.0] | Private | 6.0 | Never-married | Other-service | Not-in-family | White | Male | 0.0 | 0.0 | MidTime | United-States | 0 |


## Model training and test

We use the training set to fit a simple logistic regression model. This serves as the demonstration model, which we then treat as a black box and audit with our algorithm.

Of course, any model can be used in its place. Our purpose here is not to produce a very good model, but to audit the fairness of an existing one.

In [4]:
# num_features = X._get_numeric_data().columns.to_list()
cate_features = X.select_dtypes(include=['object','category']).columns.to_list()

cat_transf = ColumnTransformer(transformers=[
    ("ohe", OneHotEncoder(), cate_features)
], remainder="passthrough")

model = Pipeline([
    ("ohe", cat_transf),
    ("clf", LogisticRegression(max_iter=1500))
])
model = model.fit(X_train, y_train)

In [5]:
preds_Xtest = model.predict(X_test)
print(f"Accuracy = {(y_test.values == preds_Xtest).sum() / y_test.shape[0]:.2%}")

Accuracy = 85.16%


# Contribution

Here begins the actual contribution of our work. Specifically, we demonstrate the generation of candidate subgroups and counterfactuals, and the detection of those subgroups that exhibit the highest unfairness with respect to one of several metrics.

## Candidate Subgroups Generation

In [6]:
detector = FACTS(
    estimator=model,
    prot_attr="sex",
    feature_weights={f: 1 for f in X.columns}
)

In [7]:
detector = detector.fit(X_test)

Computing candidate subgroups.

100% | 1046/1046 [00:00<00:00, 922270.76it/s]

Number of subgroups: 563
Computing candidate recourses for all subgroups.

100% | 563/563 [00:00<00:00, 101591.51it/s]

Computing percentages of individuals flipped.

100% | 590/590 [00:13<00:00, 42.56it/s]
100% | 416/416 [00:12<00:00, 33.39it/s]


## Unfair Groups Detection (using "Equal Choice for Recourse" metric)

Here we showcase the `bias_scan` method of our detector, which ranks subpopulation groups from most to least unfair, with respect to the chosen metric and, of course, the protected attribute.

For the purposes of this demo, we use the "Equal Choice for Recourse" metric. Under this metric, the classifier is considered to act fairly for the group in question if the protected subgroups can choose among the same number of sufficiently effective actions to achieve recourse. By "sufficiently effective" we mean those actions (out of all candidates) that work for at least $100\phi \%$ (for $\phi \in [0,1]$) of the subgroup.
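To make the definition concrete, here is a minimal sketch of the counting step in plain Python. It is not the library's implementation: the function name `count_effective_actions`, the action labels, and the effectiveness values are all hypothetical, and using the absolute difference of the counts as the unfairness score is an assumption.

```python
def count_effective_actions(effectiveness_by_action, phi):
    """Count the actions whose effectiveness is at least phi."""
    return sum(1 for eff in effectiveness_by_action.values() if eff >= phi)


# Hypothetical data: fraction of each protected subgroup whose prediction
# is flipped by each candidate action.
effectiveness = {
    "Female": {"action_1": 0.08, "action_2": 0.04, "action_3": 0.05},
    "Male": {"action_1": 0.20, "action_2": 0.11, "action_3": 0.09},
}

phi = 0.1
choices = {
    group: count_effective_actions(actions, phi)
    for group, actions in effectiveness.items()
}

# Assumed score: gap in the number of sufficiently effective choices.
choice_gap = abs(choices["Female"] - choices["Male"])
print(choices, choice_gap)
```

With these toy numbers, no action reaches the threshold for the first subgroup while two do for the second, so the group would be flagged as unfair under this metric.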

In [8]:
top_groups, subgroup_costs = detector.bias_scan(
    metric="equal-choice-for-recourse",
    phi=0.1
)

In [9]:
print_recourse_report(
    top_groups,
    subgroup_costs=subgroup_costs,
    show_then_costs=True,
    show_subgroup_costs=True
)

If age = (26.0, 34.0], hours-per-week = FullTime:
	Protected Subgroup 'Female', 10.59% covered
		Make age = (41.0, 50.0], hours-per-week = OverTime with effectiveness 7.73% and counterfactual cost = 2.0.
		Make age = (41.0, 50.0] with effectiveness 3.98% and counterfactual cost = 1.0.
		Make age = (34.0, 41.0], hours-per-week = OverTime with effectiveness 5.39% and counterfactual cost = 2.0.
		Aggregate cost of the above recourses = 0.00
	Protected Subgroup 'Male', 13.78% covered
		Make age = (41.0, 50.0], hours-per-week = OverTime with effectiveness 19.66% and counterfactual cost = 2.0.
		Make age = (41.0, 50.0] with effectiveness 10.63% and counterfactual cost = 1.0.
		Make age = (34.0, 41.0], hours-per-week = OverTime with effe

# Short Description of all Definitions / Metrics of Subgroup Recourse Fairness

Here we give a brief description of each of the metrics available in our framework apart from "Equal Choice for Recourse".

## Equal Effectiveness

The classifier is considered to act fairly for a population group if the same proportion of individuals in the protected subgroups can achieve recourse.

## Equal Effectiveness within Budget

The classifier is considered to act fairly for a population group if the same proportion of individuals in the protected subgroups can achieve recourse with a cost at most $c$, where $c$ is some user-provided cost budget.
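As an illustration, the proportion in question can be sketched as follows, assuming that for each individual we already know the cost of the cheapest action that flips their prediction (`None` when no available action works). The function name and the cost lists are hypothetical and not part of the library.

```python
import math


def effectiveness_within_budget(min_costs, budget=math.inf):
    """Fraction of individuals who achieve recourse at cost <= budget.

    min_costs holds, per individual, the cost of their cheapest successful
    action, or None if no available action flips the prediction.
    """
    achieved = sum(1 for c in min_costs if c is not None and c <= budget)
    return achieved / len(min_costs)


# Hypothetical cheapest recourse costs for two protected subgroups.
female_costs = [1.0, 2.0, None, None, 2.0]
male_costs = [1.0, 1.0, 2.0, None, 1.0]

for budget in (1.0, 2.0):
    f = effectiveness_within_budget(female_costs, budget)
    m = effectiveness_within_budget(male_costs, budget)
    print(f"budget={budget}: female={f:.0%}, male={m:.0%}")
```

With an unlimited budget (the default here), the same function gives the plain proportion of individuals who can achieve recourse at all.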

## Equal Cost of Effectiveness

The classifier is considered to act fairly for a population group if the minimum cost required to be sufficiently effective in the protected subgroups is equal. Again, as in "Equal Choice for Recourse", by "sufficiently effective" we refer to those actions that successfully flip the model's decision for at least $100\phi \%$ (for $\phi \in [0,1]$) of the subgroup.
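A minimal sketch of this quantity, assuming each candidate action is summarized by a hypothetical `(cost, effectiveness)` pair for a given subgroup. Returning infinity when no action is sufficiently effective is a convention chosen for this sketch, not something the source specifies.

```python
import math


def cost_of_effectiveness(actions, phi):
    """Minimum cost among actions with effectiveness >= phi.

    Returns math.inf when no action is sufficiently effective
    (a convention assumed for this sketch).
    """
    costs = [cost for cost, eff in actions if eff >= phi]
    return min(costs, default=math.inf)


# Hypothetical (cost, effectiveness) pairs for the same two actions,
# evaluated separately on each protected subgroup.
female_actions = [(1.0, 0.04), (2.0, 0.12)]
male_actions = [(1.0, 0.11), (2.0, 0.20)]

phi = 0.1
female_cost = cost_of_effectiveness(female_actions, phi)
male_cost = cost_of_effectiveness(male_actions, phi)
print(female_cost, male_cost)
```

In this toy case, one subgroup needs the more expensive action to reach the threshold while the other reaches it with the cheaper one, so the two minimum costs differ.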

## Equal (Conditional) Mean Recourse

This definition extends the notion of *burden* from the literature ([reference](https://dl.acm.org/doi/10.1145/3375627.3375812)) to the case where not all individuals may achieve recourse. Omitting some details, given any set of individuals, the **conditional mean recourse cost** is the mean recourse cost among the subset of individuals that can actually achieve recourse, i.e., via at least one of the available actions.

Given the above, this definition considers the classifier to act fairly for a population group if the (conditional) mean recourse cost for the protected subgroups is the same.
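The conditioning step can be sketched in a few lines, again assuming a hypothetical list with each individual's cheapest recourse cost and `None` for individuals who cannot achieve recourse. The details that the text omits are omitted here as well.

```python
import math


def conditional_mean_recourse(min_costs):
    """Mean recourse cost over individuals that can achieve recourse.

    Individuals marked None (no successful action) are excluded from both
    the sum and the count. Returns NaN if nobody achieves recourse.
    """
    achieved = [c for c in min_costs if c is not None]
    if not achieved:
        return math.nan
    return sum(achieved) / len(achieved)


# Hypothetical cheapest recourse costs per individual.
female_costs = [1.0, 2.0, None, 3.0]
male_costs = [1.0, 1.0, 2.0, None]

female_mean = conditional_mean_recourse(female_costs)
male_mean = conditional_mean_recourse(male_costs)
print(f"female={female_mean:.2f}, male={male_mean:.2f}")
```

Note that this quantity ignores how many individuals fail to achieve recourse, which is why it is best read alongside an effectiveness-based metric.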

## Fair Effectiveness-Cost Trade-Off

This is the strictest definition, which considers the classifier to act fairly for a population group only if the protected subgroups have the same effectiveness-cost distribution (checked in the implementation via a statistical test).

Equivalently, Equal Effectiveness within Budget must hold for *every* value of the cost budget $c$.
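The "for every budget" reading suggests comparing the two effectiveness-cost curves point by point. The sketch below computes the largest gap between the curves over all relevant budgets, which corresponds to a two-sample Kolmogorov-Smirnov-style statistic. Which statistical test the implementation actually uses is not stated here, so this choice is an assumption, and all names and data are hypothetical.

```python
def effectiveness_curve(min_costs, budgets):
    """Fraction of individuals achieving recourse within each budget."""
    n = len(min_costs)
    return [
        sum(1 for c in min_costs if c is not None and c <= b) / n
        for b in budgets
    ]


def max_tradeoff_gap(costs_a, costs_b):
    """Largest effectiveness gap between two subgroups over all budgets.

    Only budgets equal to an observed cost need checking, because the
    curves are step functions that change only at those values.
    """
    budgets = sorted({c for c in costs_a + costs_b if c is not None})
    curve_a = effectiveness_curve(costs_a, budgets)
    curve_b = effectiveness_curve(costs_b, budgets)
    return max((abs(a - b) for a, b in zip(curve_a, curve_b)), default=0.0)


# Hypothetical cheapest recourse costs (None = no recourse available).
female_costs = [1.0, 2.0, None, None, 2.0]
male_costs = [1.0, 1.0, 2.0, None, 1.0]

gap = max_tradeoff_gap(female_costs, male_costs)
print(f"max effectiveness gap over all budgets: {gap:.2f}")
```

A gap of zero means Equal Effectiveness within Budget holds for every budget on this data; a proper test would additionally account for sample size before declaring a difference significant.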