# Example: Fair Partitioning of Synthetic Data

This notebook demonstrates two fair partitioning algorithms, FairGroups and FairKMeans, on synthetic data, following the experiments in our paper. We generate a dataset with a known ground-truth partition and compare how well each algorithm identifies fair groups.

We suppose that $L \sim \mathcal{U}(0,100)$ and, conditionally on $L$, $Y \sim \mathrm{Bernoulli}(p(L))$, where

$$p(L) = 0.1 \times\mathbf{1}_{\{L \leq 20\}} + 0.3 \times\mathbf{1}_{\{20 < L \leq 30\}} + 0.5\times\mathbf{1}_{\{30 < L \leq 55\}} + 0.7\times\mathbf{1}_{\{55 < L \leq 88\}} + 0.9\times\mathbf{1}_{\{88 < L \leq 100\}}.$$

We sample $N=50000$ pairs of observations $(L, Y)$ from this distribution $\mathcal{D}$.
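
To make the data-generating process concrete, here is a minimal NumPy sketch of sampling from $\mathcal{D}$. It is not the repository's `load_synthetic_data`, whose internals are not shown in this notebook; the names `p_of_l` and `sample_synthetic` are hypothetical. The lookup respects the left-open intervals in $p(L)$, so for example $L = 20$ falls in the first group.

```python
import numpy as np

# Inner breakpoints and per-interval success probabilities of p(L), as stated above
BREAKS = np.array([20.0, 30.0, 55.0, 88.0])
PROBS = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

def p_of_l(l):
    """Hypothetical helper: evaluate the step function p(L).

    searchsorted(side="left") returns i with BREAKS[i-1] < L <= BREAKS[i],
    which matches the intervals (a, b] in the definition of p(L).
    """
    return PROBS[np.searchsorted(BREAKS, l, side="left")]

def sample_synthetic(n_obs, seed=None):
    """Hypothetical sampler: L ~ U(0, 100), then Y | L ~ Bernoulli(p(L))."""
    rng = np.random.default_rng(seed)
    l = rng.uniform(0.0, 100.0, size=n_obs)
    y = rng.binomial(1, p_of_l(l))  # one Bernoulli draw per observation
    return l, y

l, y = sample_synthetic(50_000, seed=13)
print(l.shape, y.mean())
```

Since $\mathbb{P}(Y=1) = 0.2 \cdot 0.1 + 0.1 \cdot 0.3 + 0.25 \cdot 0.5 + 0.33 \cdot 0.7 + 0.12 \cdot 0.9 = 0.514$, the sample mean of `y` should land close to 0.514.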

We apply the FairKMeans and FairGroups methods to find a partition $\mathcal{P} = \{\mathcal{P}_k\}_{k=1}^K$ of $L$, using $\Phi(S^\mathcal{P}) = \mathbb{P}(Y = 1 \mid S^\mathcal{P}) - \mathbb{P}(Y = 1)$, where $S^\mathcal{P} = k \iff L \in \mathcal{P}_k$ for $k=1,\dots,K$. In words, $\Phi$ measures how far the positive rate within each group deviates from the overall positive rate.
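
Because $p(L)$ is piecewise constant and $L$ is uniform, $\Phi$ for the ground-truth partition can be computed exactly: $\mathbb{P}(Y=1)$ is the length-weighted average of the $p_k$, and $\Phi_k = p_k - \mathbb{P}(Y=1)$. The sketch below computes these population values and a plug-in estimate from a sample. It assumes a partition is given as increasing cut points $e_0 < \dots < e_K$; `assign_groups` and `empirical_phi` are hypothetical helpers, not the `fair_groups` API, and groups are indexed from 0 in code.

```python
import numpy as np

# Ground-truth partition of [0, 100] and its success probabilities (from the text)
TRUE_EDGES = np.array([0.0, 20.0, 30.0, 55.0, 88.0, 100.0])
PROBS = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

# Population values: P(Y=1) = sum_k (|P_k| / 100) * p_k and Phi_k = p_k - P(Y=1)
weights = np.diff(TRUE_EDGES) / 100.0
p_overall = float(weights @ PROBS)  # approximately 0.514
phi_true = PROBS - p_overall        # approximately [-0.414, -0.214, -0.014, 0.186, 0.386]

def assign_groups(l, edges):
    """S^P: group index k such that L lies in (e_k, e_{k+1}] (first group also holds e_0)."""
    return np.searchsorted(edges[1:-1], l, side="left")

def empirical_phi(l, y, edges):
    """Plug-in estimate of Phi_k = P(Y=1 | S=k) - P(Y=1); NaN for an empty group."""
    s = assign_groups(l, edges)
    n_groups = len(edges) - 1
    rates = np.array([y[s == k].mean() if np.any(s == k) else np.nan
                      for k in range(n_groups)])
    return rates - y.mean()

# Check the estimator on a fresh sample drawn from the same distribution
rng = np.random.default_rng(0)
l = rng.uniform(0.0, 100.0, size=50_000)
y = rng.binomial(1, PROBS[assign_groups(l, TRUE_EDGES)])
print(np.round(empirical_phi(l, y, TRUE_EDGES), 3))
```

The estimates should sit close to `phi_true`; a partition that merges intervals with different $p_k$ pulls the corresponding $\Phi$ values toward each other.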

In [None]:
import numpy as np
import sys
sys.path.append('..')

# Import fair grouping algorithms and utility functions
from fair_groups.partition_estimation import FairGroups, FairKMeans
from fair_groups.fairness_metrics import get_conditional_positive_y_proba
from fair_groups.visualization import plot_partition, plot_partition_with_ci, plot_conditional_proba
from data.synthetic_data import load_synthetic_data

# Set random seed for reproducibility
np.random.seed(13)

## 1. Load Synthetic Dataset

In [None]:
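# Generate synthetic data for testing fair partitioning algorithms
n_groups = 5     # number of groups K, used both for the data and for the algorithms
n_obs = 10000    # number of observations passed to the generator

# s: values of the feature L, y: binary outcomes Y, gt_partition: ground-truth partition
s, y, gt_partition, y_probs = load_synthetic_data(n_groups, n_obs)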

In [None]:
# Visualize the conditional probability of a positive outcome given the feature L (stored in `s`)
s_bins, y_s_proba = get_conditional_positive_y_proba(s, y)
plot_conditional_proba(s_bins, y_s_proba, 'L')
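
For intuition about what such a plot estimates, here is a minimal sketch of a binned estimate of $\mathbb{P}(Y = 1 \mid L)$. It assumes equal-width bins; `binned_positive_rate` is a hypothetical stand-in, and `get_conditional_positive_y_proba` may bin or smooth differently.

```python
import numpy as np

def binned_positive_rate(l, y, n_bins=50):
    """Hypothetical helper: estimate P(Y = 1 | L in bin) on equal-width bins over [min(L), max(L)].

    Returns bin centers and the empirical positive rate per bin (NaN for empty bins).
    """
    edges = np.linspace(l.min(), l.max(), n_bins + 1)
    # Digitizing against the inner edges yields indices 0..n_bins-1; L == max(L) lands in the last bin
    idx = np.digitize(l, edges[1:-1])
    counts = np.bincount(idx, minlength=n_bins)
    positives = np.bincount(idx, weights=y, minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        rate = positives / counts
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, rate

# Usage on a sample from the distribution described in the introduction
rng = np.random.default_rng(1)
l = rng.uniform(0.0, 100.0, size=20_000)
p = np.select([l <= 20, l <= 30, l <= 55, l <= 88], [0.1, 0.3, 0.5, 0.7], default=0.9)
y = rng.binomial(1, p)
centers, rate = binned_positive_rate(l, y, n_bins=20)
print(np.round(rate, 2))
```

With 20 bins the estimated rates should trace the five-level staircase of $p(L)$, up to sampling noise.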

## 2. FairGroups Partition of $L$

FairGroups is an algorithm that aims to create groups with similar positive outcome rates while maintaining reasonable group sizes. Let's see how it performs on our synthetic data.

In [None]:
# Initialize and fit the FairGroups algorithm
fair_groups = FairGroups(n_groups)
fair_groups.fit(s, y)

In [None]:
# Display the fairness metric (phi) for each group
fair_groups.phi_by_group

In [None]:
# Display confidence intervals for the fairness metric (phi)
fair_groups.phi_by_group_ci
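
One common way to obtain such intervals is the percentile bootstrap, sketched below for a fixed partition. This is an assumption for illustration: the notebook does not say how `phi_by_group_ci` is computed, and `bootstrap_phi_ci` is a hypothetical helper that takes 0-based integer group labels.

```python
import numpy as np

def bootstrap_phi_ci(s, y, n_groups, n_boot=500, alpha=0.05, seed=None):
    """Percentile bootstrap CI for Phi_k = P(Y=1 | S=k) - P(Y=1).

    s: integer group labels in {0, ..., n_groups - 1}; y: binary outcomes.
    Returns an array of shape (n_groups, 2) holding lower and upper bounds.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    stats = np.empty((n_boot, n_groups))
    for b in range(n_boot):
        i = rng.integers(0, n, size=n)  # resample (S, Y) pairs with replacement
        sb, yb = s[i], y[i]
        counts = np.bincount(sb, minlength=n_groups)
        positives = np.bincount(sb, weights=yb, minlength=n_groups)
        with np.errstate(invalid="ignore", divide="ignore"):
            stats[b] = positives / counts - yb.mean()  # NaN if a group is empty
    lo, hi = np.nanpercentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return np.column_stack([lo, hi])

# Usage with the ground-truth groups of the synthetic distribution
rng = np.random.default_rng(2)
l = rng.uniform(0.0, 100.0, size=10_000)
s = np.searchsorted([20.0, 30.0, 55.0, 88.0], l)  # group labels 0..4
y = rng.binomial(1, np.array([0.1, 0.3, 0.5, 0.7, 0.9])[s])
ci = bootstrap_phi_ci(s, y, n_groups=5, n_boot=300, seed=3)
print(np.round(ci, 3))
```

Resampling the pairs jointly propagates the uncertainty in both the group rate and the overall rate $\mathbb{P}(Y=1)$ into each interval.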

In [None]:
# Visualize the partition and the group-wise fairness metric (phi)
plot_partition(fair_groups.partition, fair_groups.phi_by_group, 'L')

In [None]:
# Visualize the partition with confidence intervals
plot_partition_with_ci(fair_groups.partition, fair_groups.phi_by_group_ci, 'L')

## 3. FairKMeans Partition of $L$

FairKMeans is an alternative algorithm that uses a k-means-like approach to create fair groups. It aims to minimize the within-group variance of positive outcome rates while maintaining reasonable group sizes.
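
To illustrate the k-means idea only, the sketch below runs a weighted one-dimensional Lloyd's algorithm on binned positive rates, weighting each bin by its number of observations. This is a generic technique, not the FairKMeans implementation; `weighted_kmeans_1d` is hypothetical, and the unit-width bins are an assumption. Clustering by rate does not by itself force groups to be contiguous in $L$; here they are, because $p(L)$ is increasing.

```python
import numpy as np

def weighted_kmeans_1d(values, weights, k, n_iter=100):
    """Lloyd's algorithm in one dimension.

    Minimizes sum_i weights[i] * (values[i] - centers[labels[i]])**2,
    i.e. the weighted within-group variance of `values`.
    """
    # Deterministic start: k centers evenly spaced over the observed range
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest center
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Update step: weighted mean per group (an empty group keeps its old center)
        new_centers = np.array([
            np.average(values[labels == j], weights=weights[labels == j])
            if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Usage: cluster the positive rates of 100 unit-width bins of L into 5 groups
rng = np.random.default_rng(4)
l = rng.uniform(0.0, 100.0, size=100_000)
y = rng.binomial(1, np.array([0.1, 0.3, 0.5, 0.7, 0.9])[np.searchsorted([20, 30, 55, 88], l)])
bin_idx = np.minimum(l.astype(int), 99)  # bin j covers [j, j + 1)
counts = np.bincount(bin_idx, minlength=100).astype(float)
rates = np.bincount(bin_idx, weights=y, minlength=100) / counts
labels, centers = weighted_kmeans_1d(rates, counts, k=5)
groups = np.argsort(np.argsort(centers))[labels]  # relabel clusters by increasing center
cuts = np.flatnonzero(np.diff(groups)) + 1        # bins where the group changes
print(cuts, np.round(np.sort(centers), 3))
```

The evenly spaced initialization is a design choice: Lloyd's algorithm only finds a local optimum, and starting centers spread over the range avoid placing two centers in the same plateau of $p(L)$.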

In [None]:
# Initialize and fit the FairKMeans algorithm
fair_kmeans = FairKMeans(n_groups)
fair_kmeans.fit(s, y)

In [None]:
# Display the fairness metric (phi) for each group
fair_kmeans.phi_by_group

In [None]:
# Visualize the partition and the group-wise fairness metric (phi)
plot_partition(fair_kmeans.partition, fair_kmeans.phi_by_group, 'L')

In [None]:
# Visualize the partition with confidence intervals
plot_partition_with_ci(fair_kmeans.partition, fair_kmeans.phi_by_group_ci, 'L')
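
To compare both estimated partitions with the ground truth numerically, one simple option is the distance between estimated and true inner cut points (20, 30, 55 and 88 in $p(L)$). The helper `boundary_errors` below is hypothetical, and it assumes the inner cut points can be extracted from `fair_groups.partition` or `fair_kmeans.partition`; the exact format of those attributes is not shown in this notebook.

```python
import numpy as np

TRUE_CUTS = np.array([20.0, 30.0, 55.0, 88.0])  # inner breakpoints of p(L)

def boundary_errors(estimated_cuts, true_cuts=TRUE_CUTS):
    """Absolute distance between each sorted estimated inner cut point and its true counterpart."""
    est = np.sort(np.asarray(estimated_cuts, dtype=float))
    if est.shape != true_cuts.shape:
        raise ValueError(f"expected {true_cuts.size} inner cut points, got {est.size}")
    return np.abs(est - true_cuts)

# Illustrative input only, not a result from this notebook
print(boundary_errors([19.6, 30.4, 54.0, 88.9]))
```

Matching cut points in sorted order assumes both partitions have the same number of groups $K$, which holds here because both algorithms are fit with `n_groups`.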