On the Correlation between Individual Fairness and Predictive Accuracy in Probabilistic Models

This repository contains the implementation and experimental code for the paper "On the Correlation between Individual Fairness and Predictive Accuracy in Probabilistic Models" by A. Antonucci*, E. Rossetto*, and I. Duvniak**.

*IDSIA - USI-SUPSI, Lugano, Switzerland **SUPSI, Lugano, Switzerland

Abstract

We investigate individual fairness in generative probabilistic classifiers by analysing the robustness of posterior inferences to perturbations in private features. Building on established results in robustness analysis, we hypothesise a correlation between robustness and predictive accuracy—specifically, instances exhibiting greater robustness are more likely to be classified accurately. We empirically assess this hypothesis using a benchmark of eleven datasets with fairness concerns, employing Bayesian networks as the underlying generative models. To address the computational complexity associated with robustness analysis over multiple private features with Bayesian networks, we reformulate the problem as a most probable explanation task in an auxiliary Markov random field. Our experiments confirm the hypothesis about the correlation, suggesting novel directions to mitigate the traditional trade-off between fairness and accuracy.

Repository Structure

├── pipeline.py            # Main pipeline for fairness analysis
├── pyproject.toml         # Project configuration and dependencies
├── bayesian/              # Bayesian network implementation
│   ├── inference.py       # Inference engine and posterior computation
│   ├── learn.py           # Bayesian network structure learning
│   └── modifiers.py       # Network modification utilities
├── datasets/              # Dataset handling and preprocessing
│   ├── data.py            # Data loading and feature extraction
│   ├── processing.py      # Data preprocessing utilities
│   └── utils.py           # Dataset utility functions
├── metrics/               # Fairness and performance metrics
│   ├── evaluate.py        # Model performance evaluation
│   └── fairness.py        # Individual and group fairness metrics
├── mrf/                   # Markov Random Field implementation
│   ├── inference/         # MRF inference algorithms
│   └── network/           # MRF network structures
├── visualization/         # Plotting and visualization utilities
└── data/                  # Data directory
    └── preprocessed_data/ # Preprocessed datasets

Some comments on the organization of the code:

  • The main pipeline is in pipeline.py, which orchestrates the entire fairness analysis process.

  • The bayesian/ directory contains utilities for working with pyAgrum Bayesian networks, together with the paper-specific implementations built on top of them.

  • The datasets/ directory handles data loading, preprocessing, and feature extraction.

  • The metrics/ directory implements fairness metrics and model evaluation functions.

  • The mrf/ directory implements a Variable Elimination algorithm for answering conditional, MAP, and MPE queries on Markov random fields, used to compute individual fairness metrics more efficiently. It also provides a utility that, given a pyAgrum Bayesian network, builds the auxiliary Markov random field (using ratios of the network's parameters) on which those metrics are computed.

  • The visualization/ directory provides plotting functions for visualizing results and metrics.

  • The data/ directory contains the preprocessed datasets used in the experiments. By default, experiments run from the pipeline save their results in the data/<dir_name> directory, but this is not mandatory: the user can specify a different output directory in the pipeline.py script.
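The MPE-by-variable-elimination idea behind the mrf/ directory can be illustrated with a minimal max-product elimination over toy binary factors. This is a self-contained sketch, not the repository's implementation: the factor representation and the example potentials are invented for illustration.

```python
from itertools import product

# A factor over binary variables: (scope, table), where `scope` is a tuple of
# variable names and `table` maps value tuples (aligned with scope) to floats.

def multiply(f1, f2):
    """Multiply two factors over the union of their scopes."""
    (s1, t1), (s2, t2) = f1, f2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    table = {}
    for assignment in product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, assignment))
        table[assignment] = (t1[tuple(env[v] for v in s1)]
                             * t2[tuple(env[v] for v in s2)])
    return scope, table

def max_out(factor, var):
    """Eliminate `var` by maximising over its values (the max-product step
    that distinguishes MPE queries from ordinary marginalisation)."""
    scope, table = factor
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for assignment, value in table.items():
        key = assignment[:i] + assignment[i + 1:]
        new_table[key] = max(new_table.get(key, float("-inf")), value)
    return new_scope, new_table

# Toy pairwise MRF over the chain A - B - C.
phi_ab = (("A", "B"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0})
phi_bc = (("B", "C"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0})

joint = multiply(phi_ab, phi_bc)
f = joint
for var in ("A", "B", "C"):
    f = max_out(f, var)
mpe_value = f[1][()]  # unnormalised probability of the MPE -> 6.0

# Recover the MPE assignment by brute force (fine for a toy example).
mpe_assignment = max(joint[1], key=joint[1].get)  # -> (1, 1, 0)
```

A real implementation would interleave multiplication and elimination along a good ordering instead of building the full joint; the sketch only shows the max-product semantics of an MPE query.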

Installation

Installation is straightforward with either uv (recommended) or pip. The project uses pyproject.toml for dependency management; a requirements.txt file is also provided for compatibility with older setups.

UV

  1. Install uv if you haven't already -> https://docs.astral.sh/uv/getting-started/installation/

  2. Clone the repository

  3. Navigate to the project directory and run:

    uv sync

    This will install all dependencies specified in pyproject.toml and set up the project environment in a local virtual environment placed in the .venv directory.

Pip

  1. Clone the repository

  2. Navigate to the project directory and run:

    pip install -r ./requirements.txt

Usage

Quick Start

You can find some handy examples in the notebooks/ directory, which demonstrate how to use the main functionalities of the codebase. The pipeline scripts (pipeline.py and pipeline_cv.py) automate essentially all the steps shown in the notebooks.

Run the main fairness analysis pipeline:

python pipeline.py

Run the main fairness analysis pipeline using cross-validation (10 folds by default):

python pipeline_cv.py

or if uv is installed:

uv run pipeline.py
# or
uv run pipeline_cv.py

Note that by default the pipeline launches experiments in the non-forced mode, meaning the Bayesian network is learned "naturally", without forcing the private and public nodes to be children of the target node. To run the forced mode, where all private and public nodes are forced to be children of the target node, launch the pipeline with the --force flag:

python pipeline_cv.py --force

Advanced Usage

The main script supports several parameters:

python pipeline.py \
    --learning_method tabu \
    --data_path ./data \
    --save_path ./data/<dir_name> \
    --drop_duplicates False

Available learning methods:

  • tabu: Tabu search for structure learning
  • greedy: Greedy search algorithm
  • miic: MIIC algorithm
  • k2: K2 algorithm

Pipeline Overview

The main pipeline (pipeline.py) performs the following steps:

  1. Data Loading: Loads preprocessed datasets from data/preprocessed_data/
  2. Preprocessing: Converts continuous variables to categorical using datasets.processing.make_columns_categorical
  3. Feature Extraction: Identifies the target, private (sensitive), and public features using datasets.data.extract_features
  4. Network Learning: Learns Bayesian network structure using bayesian.learn.learn_bayesian_network
  5. Fairness Analysis: Assesses the robustness of posterior inferences to perturbations of the private features, via MPE queries on the auxiliary Markov random field (see mrf/)
  6. Visualization: Generates plots using functions from visualization.metrics
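The notion of individual fairness used in step 5 (robustness of posterior inferences to perturbations of the private features, as stated in the abstract) can be sketched as follows. The posterior table, instance ids, and `is_robust` helper are hypothetical illustrations, not the repository's API.

```python
# Illustrative individual-fairness (robustness) check: an instance is robust
# if the predicted class is invariant to perturbations of the private feature.
# Posterior values and instance ids below are invented, not from the paper.

def is_robust(posteriors_by_private_value, threshold=0.5):
    """True if P(y=1 | x, a) yields the same predicted class for every
    value a of the private feature."""
    predicted = {p >= threshold for p in posteriors_by_private_value}
    return len(predicted) == 1

# P(y=1 | public features x, private feature a) for a binary a in {0, 1}.
posteriors = {
    "i1": (0.9, 0.8),  # class 1 either way -> robust
    "i2": (0.6, 0.4),  # prediction flips   -> not individually fair
}
robustness = {i: is_robust(ps) for i, ps in posteriors.items()}
```

The paper's hypothesis is that instances flagged robust by a check of this kind are also more likely to be classified accurately; computing the check efficiently over multiple private features is what motivates the MRF reformulation.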

Citation

...

Dependencies

Main dependencies (see pyproject.toml for complete list):

  • loguru: Logging
  • pandas: Data manipulation
  • numpy: Numerical computing
  • matplotlib: Visualization
  • pyAgrum: Bayesian networks
  • scikit-learn: Machine learning utilities
  • tqdm: Progress bars
  • seaborn: Statistical data visualization

License

This project is licensed under the MIT License.

Contact

For questions about the implementation or paper:
