# Using Subgroup Discovery for Fairness

Dubowski's thesis [^thesis] uses fairsd to explore fairness throughout the ML pipeline. The author presents a conceptual framework, with functional requirements, that makes fairsd the core component of fairness experiments. In this document, we summarise the approach.

The framework emphasizes identifying subgroups whose predicted risk scores have lower **epistemic value**, that is, subgroups for which the model's ability to provide accurate and reliable predictions is compromised. The framework's key aspects are as follows:

- Predictive bias arises from biases in data collection and processing, and leads to quality-of-service harms. These harms manifest as discrepancies in the model's performance across different subgroups.
- To quantify the harms, a carefully selected quality metric is essential. A good choice is the average group log loss, as it captures both the model's discriminatory ability (how well it separates different risk levels) and its calibration (how accurately its predicted probabilities align with actual outcomes).
- Subgroup discovery techniques, particularly **fair subgroup discovery (FairSD)**, automatically identify subgroups that exhibit lower performance than the overall dataset. This approach helps uncover potential harms affecting groups defined not just by single attributes, but also by intersections of multiple attributes (addressing the issue of "fairness gerrymandering").
- Model loss explanations can help to investigate the sources of predictive bias. By analyzing discrepancies in how individual features contribute to the model's loss for different subgroups compared to the baseline (the entire dataset), data scientists can pinpoint features potentially introducing bias.
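
As a minimal sketch of the average group log loss mentioned above, the snippet below compares the log loss inside a subgroup with the log loss over the whole dataset. The toy labels, probabilities, and the `group_log_loss` helper are illustrative assumptions; they are not part of fairsd or the thesis.

```python
import numpy as np

def group_log_loss(y_true, y_prob, mask, eps=1e-15):
    """Average binary log loss over the instances selected by a boolean mask.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y_true, dtype=float)[mask]
    # Clip probabilities so that log(0) can never occur.
    p = np.clip(np.asarray(y_prob, dtype=float)[mask], eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Made-up data: the model is confident and right on the first four
# instances, and leans the wrong way on the last four.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_prob = np.array([0.9, 0.1, 0.8, 0.2, 0.4, 0.6, 0.3, 0.7])
in_subgroup = np.array([False, False, False, False, True, True, True, True])
everyone = np.ones_like(in_subgroup)  # boolean mask selecting all rows

overall_loss = group_log_loss(y_true, y_prob, everyone)
subgroup_loss = group_log_loss(y_true, y_prob, in_subgroup)
print(f"overall:  {overall_loss:.3f}")
print(f"subgroup: {subgroup_loss:.3f}")
```

A subgroup whose loss is clearly higher than the overall loss is a candidate for the "lower epistemic value" harm described above.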
  
The framework is an iterative cycle with these steps:

1. Identify potential harms.
2. Select a quality metric.
3. Identify relevant harmed subgroups using the chosen metric and SD techniques.
4. Identify the most informative features based on model loss explanations.
5. Compare loss contribution distributions between the subgroup and full dataset to pinpoint discrepancies indicative of predictive bias.
6. Take bias mitigation actions based on the insights gained, which might involve adjusting data collection, preprocessing, or model training. This feeds back into the framework, leading to a new iteration of assessment.

[^thesis]: Towards assessment of subgroup harms and predictive bias in risk-scoring models. <https://research.tue.nl/en/studentTheses/towards-assessment-of-subgroup-harms-and-predictive-bias-in-risk->

This example uses the [UCI adult dataset](https://archive.ics.uci.edu/ml/datasets/Adult), where the objective is to predict whether a person makes more (label 1) or less (label 0) than $50,000 a year.

In [2]:
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.tree import DecisionTreeClassifier

import warnings
warnings.filterwarnings('ignore')

In [3]:
import sys
sys.path.append('../../')

In [5]:
d = fetch_openml(data_id=1590, as_frame=True) # Adult dataset
X = d.data
y_true = (d.target == '>50K') * 1
sens_feats = "sex"

In [6]:
X.head()

| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States |
| 1 | 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States |
| 2 | 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States |
| 3 | 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States |
| 4 | 18 | | 103497 | Some-college | 10 | Never-married | | Own-child | White | Female | 0 | 0 | 30 | United-States |


In [6]:
from faid.metrics import benchmark_tabular_classification
benchmark_tabular_classification(X, y_true, sens_feats)

{'RandomForest': {'accuracy': sex
  0    0.934336
  1    0.822734
  Name: accuracy, dtype: float64,
  'precision': sex
  0    0.761566
  1    0.742487
  Name: precision, dtype: float64,
  'false positive rate': sex
  0    0.023119
  1    0.096404
  Name: false positive rate, dtype: float64,
  'false negative rate': sex
  0    0.407202
  1    0.362671
  Name: false negative rate, dtype: float64,
  'selection rate': sex
  0    0.086223
  1    0.260676
  Name: selection rate, dtype: float64},
 'LogisticRegression': {'accuracy': sex
  0    0.887082
  1    0.742550
  Name: accuracy, dtype: float64,
  'precision': sex
  0    0.484163
  1    0.691720
  Name: precision, dtype: float64,
  'false positive rate': sex
  0    0.039337
  1    0.053386
  Name: false positive rate, dtype: float64,
  'false negative rate': sex
  0    0.703601
  1    0.725341
  Name: false negative rate, dtype: float64,
  'selection rate': sex
  0    0.067812
  1    0.120584
  Name: selection rate, dtype: float64},
 'De
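
The per-group figures in the output above are standard confusion-matrix rates computed separately for each value of the sensitive feature. As a rough sketch of that kind of computation (not the actual implementation of `benchmark_tabular_classification`, which is not shown here), the hypothetical helper `per_group_rates` below derives them with pandas on made-up data:

```python
import pandas as pd

def per_group_rates(y_true, y_pred, group):
    """Confusion-matrix rates per group (hypothetical helper, illustration only)."""
    df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": group})
    rows = {}
    for name, g in df.groupby("group"):
        tp = int(((g["y_true"] == 1) & (g["y_pred"] == 1)).sum())
        tn = int(((g["y_true"] == 0) & (g["y_pred"] == 0)).sum())
        fp = int(((g["y_true"] == 0) & (g["y_pred"] == 1)).sum())
        fn = int(((g["y_true"] == 1) & (g["y_pred"] == 0)).sum())
        rows[name] = {
            "accuracy": (tp + tn) / len(g),
            # Guard against groups without negatives or positives.
            "false positive rate": fp / (fp + tn) if (fp + tn) else float("nan"),
            "false negative rate": fn / (fn + tp) if (fn + tp) else float("nan"),
            "selection rate": (tp + fp) / len(g),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# Made-up labels and predictions for two groups of four instances each.
toy_y_true = [1, 0, 0, 0, 1, 1, 0, 0]
toy_y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
toy_group = [0, 0, 0, 0, 1, 1, 1, 1]

rates = per_group_rates(toy_y_true, toy_y_pred, toy_group)
print(rates)
```

In this toy data both groups have the same accuracy, yet their false negative rates and selection rates differ, which is why the benchmark reports several metrics side by side.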

In [7]:
# Now let's focus on one classifier
d_train = pd.get_dummies(X)
classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
classifier.fit(d_train, y_true)

# Produce y_pred (predictions on the same data the tree was fitted on)
y_pred = classifier.predict(d_train)

# Bias regarding sub-groups

- Predictive bias: *"Predictive bias (i.e., differential prediction) means that regression equations predicting performance differ across groups based on protected status (e.g., ethnicity, sexual orientation, sexual identity, pregnancy, disability, and religion)."* [^predictive] 
- Selection bias: *"occurs when individuals or groups in a study differ systematically from the population of interest leading to a systematic error in an association or outcome."* [^selection]

[^predictive]: Aguinis, H., & Culpepper, S. A. (2024). Improving our understanding of predictive bias in testing. Journal of Applied Psychology, 109(3), 402–414. https://doi.org/10.1037/apl0001152
[^selection]: https://catalogofbias.org/biases/selection-bias/

## Use of the FairSD package
Here we use the DSSD (Diverse Subgroup Set Discovery) algorithm and the `demographic_parity_difference` metric (from Fairlearn) to find the top-k (k = 5 by default) subgroups that exhibit the greatest disparity.<br/>
The `execute` method returns a **ResultSet object**.

In [7]:
import faid.metrics.subgroupdiscovery as fsd
task=fsd.SubgroupDiscoveryTask(X=X, 
                               y_true=y_true, 
                               y_pred=y_pred, 
                               qf = "demographic_parity_difference")
result_set=fsd.DSSD().execute(task)

### ResultSet object

We can transform the result set into a dataframe as shown below. Each row of this dataframe represents a subgroup.

In [8]:
pd.set_option('display.max_colwidth', None)
df=result_set.to_dataframe()
display(df)
# Quality is the demographic parity difference in our case; it can be changed to other metrics.
# The subgroups listed here are the ones for which this metric is highest.

| | quality | description | size | proportion |
|---|---|---|---|---|
| 0 | 0.641066 | education-num = (10, 13] AND marital-status = "Married-civ-spouse" | 5846 | 0.119692 |
| 1 | 0.635219 | education-num = (10, 13] AND relationship = "Husband" | 5100 | 0.104418 |
| 2 | 0.588991 | education = "Bachelors" AND sex = "Male" AND race = "White" | 4983 | 0.102023 |
| 3 | 0.583581 | education = "Bachelors" AND sex = "Male" | 5548 | 0.113591 |
| 4 | 0.454152 | education = "Bachelors" AND race = "White" | 7034 | 0.144015 |


We can also print the result set or convert it into a string.

### Generate a feature from a subgroup
A ResultSet contains a list of subgroup descriptions ([Description](https://github.com/MaurizioPulizzi/fairsd/blob/main/fairsd/sgdescription.py#L80) objects).<br/>
Another interesting method of the ResultSet object allows us to
**select a subgroup X from the result set and automatically generate the feature "Belongs to subgroup X"**. This is very useful for deepening the analysis of the discovered subgroups; for example, we can use the Fairlearn library for this purpose.<br/>
An example is shown below:

In [9]:
from fairlearn.metrics import MetricFrame
from fairlearn.metrics import selection_rate

# Here we generate the feature "Belongs to subgroup n. 0".
# The result is a pandas Series named "sg0".
# This Series contains one element for each instance of the dataset. Each element is True
# iff the instance belongs to the subgroup sg0.
sg_feature = result_set.sg_feature(sg_index=0, X=X)

# Here we use the Fairlearn library to analyze the subgroup sg0 further.
# The result gets its own name so that it does not shadow the imported selection_rate function;
# recent Fairlearn versions also require keyword arguments for MetricFrame.
sr_frame = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_pred, sensitive_features=sg_feature)
print(sr_frame.by_group)

sg0
False    0.087124
True     0.728190
Name: selection_rate, dtype: float64
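
With a boolean subgroup feature as the sensitive feature, the demographic parity difference reduces to the absolute gap between the two selection rates. The numbers suggest this is what the quality score measures: plugging in the rates printed above reproduces the quality reported for subgroup 0 (0.641066). The helper `dp_difference` below is an illustrative assumption, not a fairsd or Fairlearn function, and the toy predictions are made up.

```python
import numpy as np

def dp_difference(y_pred, in_subgroup):
    """Absolute gap in selection rate between a subgroup and its complement.

    Hypothetical helper for illustration only.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.asarray(in_subgroup, dtype=bool)
    return float(abs(y_pred[mask].mean() - y_pred[~mask].mean()))

# Selection rates printed above for sg0 == False and sg0 == True.
rate_outside, rate_inside = 0.087124, 0.728190
gap = round(rate_inside - rate_outside, 6)
print(gap)  # 0.641066

# Made-up predictions: 3 of 4 selected inside the subgroup, 1 of 4 outside.
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_mask = [True, True, True, True, False, False, False, False]
toy_gap = dp_difference(toy_pred, toy_mask)
print(toy_gap)  # 0.5
```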


### Description object
We can also obtain the subgroup feature by first retrieving the corresponding Description object:

In [10]:
description0 = result_set.get_description(0)
sg_feature = description0.to_boolean_array(dataset = X)
print(sg_feature)

0        False
1        False
2         True
3        False
4        False
         ...  
48837     True
48838    False
48839    False
48840    False
48841    False
Length: 48842, dtype: bool
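
A description is a conjunction of conditions on attributes, so the boolean array above can be read as an ordinary pandas mask. The sketch below rebuilds the mask for subgroup 0 by hand on the five rows shown by `X.head()` (only the two relevant columns are copied). It illustrates what the description means, not how fairsd evaluates it internally.

```python
import pandas as pd

# The two relevant columns of the five rows displayed by X.head().
head = pd.DataFrame({
    "education-num": [7, 9, 12, 10, 10],
    "marital-status": [
        "Never-married",
        "Married-civ-spouse",
        "Married-civ-spouse",
        "Married-civ-spouse",
        "Never-married",
    ],
})

# education-num = (10, 13] is open on the left and closed on the right.
mask = (
    (head["education-num"] > 10)
    & (head["education-num"] <= 13)
    & (head["marital-status"] == "Married-civ-spouse")
)
print(mask.tolist())  # [False, False, True, False, False]
```

Row 3 is married but has `education-num = 10`, which falls outside the half-open interval, so only row 2 belongs to the subgroup. This agrees with the first five entries of the boolean array above.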


Once we have the Description object of a subgroup, we can also extract other information about the subgroup.<br/>
We can:
 * convert the Description object into a string
 * retrieve the size of the subgroup
 * retrieve the quality (fairness measure) of the subgroup
 * retrieve the names of the attributes that compose the subgroup description

In [11]:
# String conversion
str_descr = description0.to_string()
print( str_descr ) # also print(description0) works

# Size
print( description0.size() )

# Quality
print( description0.get_quality() )

# Attribute names
print( description0.get_attributes() )

education-num = (10, 13] AND marital-status = "Married-civ-spouse" 
5846
0.6410658318683911
['education-num', 'marital-status']


## Interpreting and logging these results

**Subgroup discovery (SD) techniques enable us to systematically explore bias in datasets. Here, we presented an example using the Adult dataset, identifying subgroups with distinct performance discrepancies.**

Subgroup discovery can yield these key results:

* **Identifying Harmed Subgroups:** Subgroup discovery automatically searches for subgroups within the dataset that exhibit the most significant discrepancies in model performance compared to the overall population. This helps pinpoint groups potentially disadvantaged by the model's predictions.
* **Addressing Fairness Gerrymandering:** Subgroup discovery can reveal "fairness gerrymandering," where subgroups experiencing harm may be hidden when only considering individual sensitive attributes like race or gender.
* **Choice of Quality Metric:** The choice of quality metric in the subgroup discovery process is crucial for defining and identifying the specific type of harm being assessed. For instance, using average group log loss as the quality metric focuses on identifying subgroups experiencing lower epistemic value in their predicted risk scores, meaning their predictions are less accurate and potentially more miscalibrated.
* **Uncovering Predictive Bias:** The discrepancies in model performance revealed by subgroup discovery can point to potential sources of predictive bias. If a subgroup consistently exhibits lower performance, it suggests certain features may be less informative for that group, leading to biased predictions. For example, if the subgroup "occupation = 'Exec-managerial' AND capital-gain = (-infinite,0]" shows significant underperformance, it suggests these combined features may be less informative or biased for this specific group.
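
To make the fairness gerrymandering point concrete, here is a small sketch on made-up data (the attributes `a` and `b` are hypothetical). Each attribute looks balanced on its own, while the intersections are treated very differently:

```python
import pandas as pd

# Two hypothetical binary attributes; every (a, b) combination has 2 instances.
toy = pd.DataFrame({
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
})
# Made-up predictions: an instance is selected only when a and b agree.
toy["y_pred"] = (toy["a"] == toy["b"]).astype(int)

by_a = toy.groupby("a")["y_pred"].mean()          # selection rate per value of a
by_b = toy.groupby("b")["y_pred"].mean()          # selection rate per value of b
by_ab = toy.groupby(["a", "b"])["y_pred"].mean()  # selection rate per intersection

print(by_a.tolist())   # [0.5, 0.5]
print(by_b.tolist())   # [0.5, 0.5]
print(by_ab.tolist())  # [1.0, 0.0, 0.0, 1.0]
```

An audit that checks `a` and `b` separately would report no disparity at all. The gap only appears at the intersections, which is the kind of subgroup that subgroup discovery searches for.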

Now, let's assume that, after our analysis, we would like to change the quality metric to average group log loss. In that case, we need to link our fairness experiment results to the model log. Further, we should record the important findings in the model log so that other developers can also benefit from them. For example, comparing the subgroup "native-country = 'Mexico'" to "native-country = 'Mexico' AND race = 'White'" revealed that the model performed better for white individuals within the Mexican subgroup, highlighting a concerning disparity.

In [4]:
from faid import logging as faidlog

experiment_name = "adult-subgroups"
faidlog.init_log()
ctx = faidlog.ExperimentContext(name=experiment_name)

Model log file already exists.
Data log file already exists.
Fairness experiment log created.
Risks log file already exists.
Transparency log file already exists.


In [5]:
faidlog.add_model_entry({
    "name": "baseline-random-forest",
    "description": "Baseline model using a random forest",
    "details": "Random forest with default scikit-learn parameters",
    })

Added model_info to model card


In [6]:
ctx.add_metric_entry("baseline",{
    "description": "",
    "label": "",
    "metrics": [
        {
            "name": "AUROC",
            "description": "",
            "value": 0.905,
            "threshold": 0.85,
            "bigger_is_better": True,
            "label": "",
            "notes": ""
        }
    ]
})
ctx.add_metric_entry("occupation=”Exec-managerial” AND capital-gain=(-infinite,0]", {
    "description": "",
    "label": "",
    "metrics": [
        {
            "name": "AUROC",
            "description": "",
            "value": 0.848,
            "threshold": 0.85,
            "bigger_is_better": True,
            "label": "",
            "notes": ""
        },
        {
            "name": "Brier",
            "description": "",
            "value": 0.115,
            "threshold": 0.1,
            "bigger_is_better": False,
            "label": "",
            "notes": ""
        }
    ]
})

Added baseline to project metadata under ['bias_metrics']['groups'] and log updated
Added occupation=”Exec-managerial” AND capital-gain=(-infinite,0] to project metadata under ['bias_metrics']['groups'] and log updated
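
Each logged metric carries a `threshold` and a `bigger_is_better` flag. How faid itself evaluates them is not shown here; the hypothetical helper `meets_threshold` below is only one plausible reading, applied to the values logged above:

```python
def meets_threshold(value, threshold, bigger_is_better):
    """Return True if the metric is on the acceptable side of its threshold.

    Hypothetical helper for illustration; not part of the faid API.
    """
    return value >= threshold if bigger_is_better else value <= threshold

# (group, metric name, value, threshold, bigger_is_better), copied from the log entries.
logged = [
    ("baseline", "AUROC", 0.905, 0.85, True),
    ("Exec-managerial subgroup", "AUROC", 0.848, 0.85, True),
    ("Exec-managerial subgroup", "Brier", 0.115, 0.1, False),
]

results = {}
for group, name, value, threshold, bigger in logged:
    ok = meets_threshold(value, threshold, bigger)
    results[(group, name)] = ok
    status = "meets threshold" if ok else "misses threshold"
    print(f"{group:26s} {name:6s} {value:.3f} -> {status}")
```

Under this reading, the baseline passes its AUROC threshold, while the subgroup misses both its AUROC and its Brier score thresholds, which is the kind of finding worth recording in the log.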
