This repository contains the implementation and experimental code for the paper "On the Correlation between Individual Fairness and Predictive Accuracy in Probabilistic Models" by A. Antonucci*, E. Rossetto*, and I. Duvniak**.
*IDSIA - USI-SUPSI, Lugano, Switzerland **SUPSI, Lugano, Switzerland
We investigate individual fairness in generative probabilistic classifiers by analysing the robustness of posterior inferences to perturbations in private features. Building on established results in robustness analysis, we hypothesise a correlation between robustness and predictive accuracy—specifically, instances exhibiting greater robustness are more likely to be classified accurately. We empirically assess this hypothesis using a benchmark of eleven datasets with fairness concerns, employing Bayesian networks as the underlying generative models. To address the computational complexity associated with robustness analysis over multiple private features with Bayesian networks, we reformulate the problem as a most probable explanation task in an auxiliary Markov random field. Our experiments confirm the hypothesis about the correlation, suggesting novel directions to mitigate the traditional trade-off between fairness and accuracy.
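The robustness notion at the core of the paper can be illustrated with a toy example. The sketch below is purely illustrative (all names and numbers are invented, and it is not the repository's implementation): an instance is deemed robust if its predicted class stays the same under every perturbation of its private features.

```python
from itertools import product

def is_robust(posterior, instance, private_domains):
    """Return True if the predicted class is unchanged under every
    perturbation of the private features (illustrative definition)."""
    def predict(x):
        probs = posterior(x)
        return max(probs, key=probs.get)
    baseline = predict(instance)
    names = list(private_domains)
    for values in product(*(private_domains[n] for n in names)):
        perturbed = {**instance, **dict(zip(names, values))}
        if predict(perturbed) != baseline:
            return False
    return True

# Toy posterior: the prediction depends only on a public feature,
# so every instance is robust to private-feature perturbations.
posterior = lambda x: ({"y=1": 0.8, "y=0": 0.2} if x["income"] == "high"
                       else {"y=1": 0.3, "y=0": 0.7})
instance = {"income": "high", "gender": "F", "age": "young"}
print(is_robust(posterior, instance,
                {"gender": ["F", "M"], "age": ["young", "old"]}))  # True
```

The paper's hypothesis is that instances passing this kind of check are more likely to be classified accurately.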
```
├── pipeline.py            # Main pipeline for fairness analysis
├── pyproject.toml         # Project configuration and dependencies
├── bayesian/              # Bayesian network implementation
│   ├── inference.py       # Inference engine and posterior computation
│   ├── learn.py           # Bayesian network structure learning
│   └── modifiers.py       # Network modification utilities
├── datasets/              # Dataset handling and preprocessing
│   ├── data.py            # Data loading and feature extraction
│   ├── processing.py      # Data preprocessing utilities
│   └── utils.py           # Dataset utility functions
├── metrics/               # Fairness and performance metrics
│   ├── evaluate.py        # Model performance evaluation
│   └── fairness.py        # Individual and group fairness metrics
├── mrf/                   # Markov Random Field implementation
│   ├── inference/         # MRF inference algorithms
│   └── network/           # MRF network structures
├── visualization/         # Plotting and visualization utilities
├── data/                  # Data directory
│   └── preprocessed_data/ # Preprocessed datasets
```
Some comments on the organization of the code:

- The main pipeline is in `pipeline.py`, which orchestrates the entire fairness analysis process.
- The `bayesian/` directory contains utilities specific to `pyAgrum` Bayesian networks, together with the implementations derived from the paper.
- The `datasets/` directory handles data loading, preprocessing, and feature extraction.
- The `metrics/` directory implements fairness metrics and model evaluation functions.
- The `mrf/` directory contains the implementation of a Variable Elimination algorithm for answering conditional, MAP, and MPE queries on Markov Random Fields, used to compute individual fairness metrics more efficiently. It also contains a utility function that, given a `pyAgrum` Bayesian network, builds the corresponding Markov Random Field using ratios, which can then be used to compute individual fairness metrics.
- The `visualization/` directory provides plotting functions for visualizing results and metrics.
- The `data/` directory contains the preprocessed datasets used in the experiments. By default, experiments run from the pipeline save their results in the `data/<dir_name>` directory; this is not mandatory, however, and a different output directory can be specified in the `pipeline.py` script.
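As a rough illustration of what an MPE (Most Probable Explanation) query computes, the brute-force sketch below finds the most probable joint assignment of a tiny factor graph. This is not the repository's Variable Elimination code, which avoids the exponential enumeration shown here; the toy factors are invented.

```python
from itertools import product

def mpe_brute_force(variables, factors):
    """MPE by exhaustive search: return the joint assignment maximising
    the product of all factor values. `variables` maps names to domains;
    each factor is (scope, table), with the table keyed by tuples of
    values in scope order."""
    names = list(variables)
    best, best_score = None, -1.0
    for values in product(*(variables[n] for n in names)):
        assignment = dict(zip(names, values))
        score = 1.0
        for scope, table in factors:
            score *= table[tuple(assignment[v] for v in scope)]
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Tiny pairwise MRF over two binary variables that favours agreement.
variables = {"A": [0, 1], "B": [0, 1]}
factors = [
    (("A",), {(0,): 0.4, (1,): 0.6}),
    (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}),
]
print(mpe_brute_force(variables, factors))  # best assignment: A=1, B=1
```

Variable Elimination answers the same query without enumerating all joint assignments, which is what makes the MRF reformulation of the robustness analysis tractable over multiple private features.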
Installation is straightforward with either uv (recommended) or pip. The project uses `pyproject.toml` for dependency management (a `requirements.txt` file is also provided for compatibility with older setups).
1. Install `uv` if you haven't already: https://docs.astral.sh/uv/getting-started/installation/
2. Clone the repository
3. Navigate to the project directory and run:

```bash
uv sync
```

This installs all dependencies specified in `pyproject.toml` and sets up the project environment in a local virtual environment placed in the `.venv` directory.
1. Clone the repository
2. Navigate to the project directory and run:

```bash
pip install -r ./requirements.txt
```
You can find some handy examples in the `notebooks/` directory, which demonstrate how to use the main functionalities of the codebase. The pipeline scripts (`pipeline.py` and `pipeline_cv.py`) automate essentially all the steps shown in the notebooks.
Run the main fairness analysis pipeline:

```bash
python pipeline.py
```

Run the main fairness analysis pipeline with cross-validation (10 folds by default):

```bash
python pipeline_cv.py
```

or, if uv is installed:

```bash
uv run pipeline.py
# or
uv run pipeline_cv.py
```

Note that by default the pipeline launches experiments in non-forced mode, meaning that the Bayesian network is learned "naturally", without forcing any private or public node to be a child of the target node. To run in forced mode, where all private and public nodes are forced to be children of the target node, launch the pipeline with the `--force` flag:

```bash
python pipeline_cv.py --force
```

The main script supports several parameters:
```bash
python pipeline.py \
    --learning_method tabu \
    --data_path ./data \
    --save_path ./data/<dir_name> \
    --drop_duplicates False
```

Available learning methods:

- `tabu`: Tabu search for structure learning
- `greedy`: Greedy search algorithm
- `miic`: MIIC algorithm
- `k2`: K2 algorithm
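The flags above suggest a command-line interface along the following lines. This is a hypothetical reconstruction for orientation only; the actual parser in `pipeline.py` may define the arguments differently.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the documented flags.
    parser = argparse.ArgumentParser(description="Fairness analysis pipeline")
    parser.add_argument("--learning_method", default="tabu",
                        choices=["tabu", "greedy", "miic", "k2"],
                        help="structure-learning algorithm")
    parser.add_argument("--data_path", default="./data",
                        help="directory containing the datasets")
    parser.add_argument("--save_path", default=None,
                        help="directory where results are saved")
    parser.add_argument("--drop_duplicates",
                        type=lambda s: s.lower() == "true", default=True,
                        help="drop duplicate rows (True/False)")
    parser.add_argument("--force", action="store_true",
                        help="force private/public nodes to be "
                             "children of the target node")
    return parser

args = build_parser().parse_args(["--learning_method", "miic",
                                  "--drop_duplicates", "False"])
print(args.learning_method, args.drop_duplicates)  # miic False
```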
The main pipeline (`pipeline.py`) performs the following steps:

1. Data Loading: loads preprocessed datasets from `data/preprocessed_data/`
2. Preprocessing: converts continuous variables to categorical ones using `datasets.processing.make_columns_categorical`
3. Feature Extraction: identifies target, sensitive, and public features using `datasets.data.extract_features`
4. Network Learning: learns the Bayesian network structure using `bayesian.learn.learn_bayesian_network`
5. Fairness Analysis:
   - Group fairness metrics via `metrics.fairness.compute_group_fairness_metrics`
   - Individual fairness metrics via `metrics.fairness.compute_individual_fairness`
   - MRF-based individual fairness via `metrics.fairness.compute_individual_fairness_MRF`
6. Visualization: generates plots using functions from `visualization.metrics`
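Step 2 above discretises continuous columns before structure learning. A minimal stand-in for what this kind of preprocessing does is equal-width binning; note this is an illustrative sketch, not the actual `make_columns_categorical` implementation, which may use a different binning strategy.

```python
def make_column_categorical(values, n_bins=3):
    """Map continuous values to equal-width bin labels.
    Illustrative stand-in, not the repository's implementation."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against constant columns
    labels = []
    for v in values:
        # clamp the top edge into the last bin
        idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin_{idx}")
    return labels

ages = [18, 25, 31, 44, 58, 63]
print(make_column_categorical(ages))
# → ['bin_0', 'bin_0', 'bin_0', 'bin_1', 'bin_2', 'bin_2']
```

Discretisation is required because `pyAgrum` structure learning and the MRF reformulation both operate on discrete variables.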
...
Main dependencies (see pyproject.toml for complete list):
- `loguru`: logging
- `pandas`: data manipulation
- `numpy`: numerical computing
- `matplotlib`: visualization
- `pyAgrum`: Bayesian networks
- `scikit-learn`: machine learning utilities
- `tqdm`: progress bars
- `seaborn`: statistical data visualization
This project is licensed under the MIT License.
For questions about the implementation or paper:
- Alessandro Antonucci - alessandro.antonucci@idsia.ch
- Eric Rossetto - eric.rossetto@idsia.ch
- Ivan Duvniak - ivan.duvnjak@supsi.ch