# Introduction - Adult Dataset

The Adult dataset is designed for predicting whether an individual's annual income exceeds $50K/yr based on census data. It is also known as the "Census Income" dataset.

It is commonly used as a benchmark for fairness tasks, with gender (the `sex` column) usually set as the sensitive attribute.

### Imports

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from fairlearn.metrics import demographic_parity_difference, equalized_odds_difference
import fairness_functions as fp




### Load dataset

In [2]:

from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
adult = fetch_ucirepo(id=2) 
  
# data (as pandas dataframes) 
X = adult.data.features 
y = adult.data.targets 

print(X.columns)

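sensitive_col = 'sex'

# Drop rows with a missing sensitive attribute and keep the targets aligned with X.
X = X.dropna(subset=[sensitive_col])
y = y.loc[X.index]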



Index(['age', 'workclass', 'fnlwgt', 'education', 'education-num',
       'marital-status', 'occupation', 'relationship', 'race', 'sex',
       'capital-gain', 'capital-loss', 'hours-per-week', 'native-country'],
      dtype='object')


### Data Preprocessing
In this section, we prepare the dataset for model training. This includes:
- Cleaning missing values.
- Identifying sensitive attributes for fairness evaluation.
- Encoding categorical variables.
- Splitting the dataset into training and testing sets.


#### Adjust target column to be binary

In [3]:
if isinstance(y, pd.DataFrame):
    y = y.squeeze()

# Clean the target column: strip whitespace and remove trailing periods.
y = y.astype(str).str.strip().str.replace(r'\.$', '', regex=True)

# Map any target starting with '<' to '<=50K' and those starting with '>' to '>50K'
y = y.apply(lambda s: '<=50K' if s.startswith('<') else ('>50K' if s.startswith('>') else s))

# Verify the unique values after cleaning
print("Unique target values after cleaning:", y.unique())

print(y.value_counts())

mapping = {'<=50K': 0, '>50K': 1}

y = y.map(mapping)



Unique target values after cleaning: ['<=50K' '>50K']
income
<=50K    37155
>50K     11687
Name: count, dtype: int64
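The cleaning step above is needed because the raw labels can appear with surrounding whitespace or a trailing period. A minimal sketch on a hand-made series (the sample values are illustrative, not read from the dataset) shows what the normalization does:

```python
import pandas as pd

# Hand-made sample imitating label variants that can occur in the raw target column.
raw = pd.Series([" <=50K", ">50K.", "<=50K.", ">50K "], name="income")

# Same normalization as in the cell above: trim whitespace, drop one trailing period.
clean = raw.astype(str).str.strip().str.replace(r"\.$", "", regex=True)

# Map the two remaining labels to a binary target.
binary = clean.map({"<=50K": 0, ">50K": 1})

print(clean.tolist())
print(binary.tolist())
```

Any label that does not match one of the two keys would map to NaN, so checking `binary.isna().sum()` after the mapping is a cheap sanity test.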


### Handling Missing Values & Encoding
We handle missing values by:
- Imputing numeric columns with their mean.
- Filling missing values in categorical columns with the mode.

Categorical variables are encoded using one-hot encoding to ensure compatibility with the model.


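### Impute NaN Values

Numeric NaN values are imputed with the column mean and categorical NaN values with the column mode. Categorical columns with more than 20 unique values (here `native-country`) are dropped before encoding.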

In [4]:
# Specify which columns are categorical based on domain knowledge.
categorical_cols = ['workclass', 'education', 'marital-status', 'occupation', 
                      'relationship', 'race', 'sex', 'native-country']

# All remaining columns will be considered numeric.
numeric_cols = [col for col in X.columns if col not in categorical_cols]

print("Numeric columns:", numeric_cols)
print("Categorical columns:", categorical_cols)

# Convert numeric columns to numeric dtype (forcing non-numeric values to NaN)
X_numeric = X[numeric_cols].apply(lambda col: pd.to_numeric(col, errors='coerce'))

# Fill missing values in numeric columns with the mean of each column.
X_numeric = X_numeric.fillna(X_numeric.mean())

# For categorical columns, filter out any with high cardinality.
max_unique_threshold = 20
filtered_categorical_cols = [col for col in categorical_cols if X[col].nunique() <= max_unique_threshold]
print("Filtered Categorical columns (<=20 unique values):", filtered_categorical_cols)

# Process the categorical columns: fill missing values with the mode.
X_categorical = X[filtered_categorical_cols].copy()
for col in filtered_categorical_cols:
    X_categorical[col] = X_categorical[col].fillna(X_categorical[col].mode()[0])


Numeric columns: ['age', 'fnlwgt', 'education-num', 'capital-gain', 'capital-loss', 'hours-per-week']
Categorical columns: ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
Filtered Categorical columns (<=20 unique values): ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex']


### One-hot encode categorical features

In [5]:

# One-hot encode the filtered categorical columns using pandas' get_dummies, dropping the first category.
X_categorical_encoded = pd.get_dummies(X_categorical, drop_first=True)

# Combine numeric and one-hot encoded categorical columns.
X_processed = pd.concat([X_numeric, X_categorical_encoded], axis=1)

# Fill any remaining NaN values with 0.
X_processed = X_processed.fillna(0)

# Preserve the sensitive attribute for fairness evaluation.
sens = X[sensitive_col]

print("Shape of processed features:", X_processed.shape)


Shape of processed features: (48842, 59)
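The manual steps above compute the means and modes on the full dataset before it is split. As an alternative, the same idea can be expressed with the `ColumnTransformer`, `Pipeline`, `SimpleImputer` and `OneHotEncoder` classes imported at the top, so that the imputation values are learned from the training rows only. This is a minimal sketch on a made-up toy frame, not the notebook's actual preprocessing; the column values and the 4/2 split are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy frame standing in for the Adult features (values are made up).
toy = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0, np.nan, 52.0],
    "hours-per-week": [40.0, 50.0, np.nan, 20.0, 38.0, np.nan],
    "workclass": pd.Series(
        ["Private", np.nan, "Private", "State-gov", np.nan, "Private"], dtype=object
    ),
})
numeric = ["age", "hours-per-week"]
categorical = ["workclass"]

preprocess = ColumnTransformer(
    transformers=[
        # Numeric columns: fill NaN with the training mean.
        ("num", SimpleImputer(strategy="mean"), numeric),
        # Categorical columns: fill NaN with the training mode, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical),
    ],
    sparse_threshold=0,  # always return a dense array
)

train, test = toy.iloc[:4], toy.iloc[4:]
preprocess.fit(train)                 # statistics come from the training rows only
X_te = preprocess.transform(test)     # test rows reuse those statistics
print(X_te)
```

Unlike `pd.get_dummies(..., drop_first=True)`, this sketch keeps every category column; whether to drop one is a modelling choice.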


### Split data to train & test sets

In [6]:
# Split data and also split the sensitive attribute for evaluation
X_train, X_test, y_train, y_test, sens_train, sens_test = train_test_split(
    X_processed, y, sens, test_size=0.3, random_state=42
)


print("X train shape: ",X_train.shape)
print("X test shape: ",X_test.shape)


X train shape:  (34189, 59)
X test shape:  (14653, 59)


### Train and evaluate baseline model

### Baseline Model - Logistic Regression
We begin by training a baseline logistic regression model **without** any fairness interventions.
This model will serve as a benchmark for evaluating the impact of fairness adjustments.
The following metrics are used for evaluation:
- **Accuracy**: Measures overall prediction correctness.
- **F1 Score**: Balances precision and recall.
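- **Demographic Parity Difference**: The gap between the highest and lowest positive-prediction (selection) rates across groups. A value of 0 means all groups receive positive predictions at the same rate.
- **Equalized Odds Difference**: The larger of the between-group gaps in true positive rate and false positive rate. A value of 0 means both rates match across groups.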
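To make the two fairness metrics concrete, the following sketch computes them by hand with NumPy on a tiny hand-made example. The labels, predictions and group assignments are invented for illustration; in the notebook the values come from `fairlearn.metrics`.

```python
import numpy as np

# Hand-made labels, predictions and group membership (illustrative values only).
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 0])
group = np.array(["Male"] * 4 + ["Female"] * 4)

def positive_rate(mask):
    """Share of positive predictions among the rows selected by `mask`."""
    return y_pred[mask].mean()

groups = np.unique(group)
selection = [positive_rate(group == g) for g in groups]               # selection rate
tpr = [positive_rate((group == g) & (y_true == 1)) for g in groups]   # true positive rate
fpr = [positive_rate((group == g) & (y_true == 0)) for g in groups]   # false positive rate

# Demographic parity difference: spread of selection rates.
dp_diff = max(selection) - min(selection)
# Equalized odds difference: the worse of the TPR spread and the FPR spread.
eo_diff = max(max(tpr) - min(tpr), max(fpr) - min(fpr))

print("Demographic parity difference:", dp_diff)  # 0.25
print("Equalized odds difference:", eo_diff)      # 0.5
```

Here the male group is selected at a rate of 0.5 and the female group at 0.25, and the true positive rates are 1.0 and 0.5, which gives the two differences printed above.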


In [7]:
# Train the logistic regression model
lr = LogisticRegression(random_state=42, max_iter=10000)
lr.fit(X_train, y_train)

# Predict on the test set with the baseline model
y_pred_baseline = lr.predict(X_test)

# Evaluate baseline performance metrics
baseline_accuracy = accuracy_score(y_test, y_pred_baseline)
f1_score_baseline = f1_score(y_test, y_pred_baseline)

# Evaluate fairness metrics for the baseline model
baseline_dp_diff = demographic_parity_difference(y_test, y_pred_baseline, sensitive_features=sens_test)
baseline_eo_diff = equalized_odds_difference(y_test, y_pred_baseline, sensitive_features=sens_test)

print("=== Baseline Model Metrics ===")
print("Accuracy:", baseline_accuracy)
print("F1 score:",f1_score_baseline) 
print("Demographic Parity Difference:", baseline_dp_diff)
print("Equalized Odds Difference:", baseline_eo_diff)


=== Baseline Model Metrics ===
Accuracy: 0.8441274824268068
F1 score: 0.6361898693851545
Demographic Parity Difference: 0.17919736394385982
Equalized Odds Difference: 0.11907143007845583


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
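The warning above suggests scaling the data. A minimal sketch of that suggestion, on synthetic data rather than the Adult features, puts a `StandardScaler` in front of the logistic regression. The two synthetic columns only imitate the very different magnitudes of columns such as `age` and `fnlwgt`; the sample size and label rule are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 500

# Synthetic features on very different scales.
age = rng.uniform(18, 80, size=n)
weight = rng.uniform(1e4, 1e6, size=n)
X_demo = np.column_stack([age, weight])

# Synthetic label that depends on age only, with some noise.
y_demo = (age + rng.normal(0, 5, size=n) > 45).astype(int)

# Standardize each column before fitting, so the solver sees comparable scales.
scaled_lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=42),
)
scaled_lr.fit(X_demo, y_demo)

n_iter = scaled_lr.named_steps["logisticregression"].n_iter_[0]
print("Solver iterations used:", n_iter)
print("Training accuracy:", scaled_lr.score(X_demo, y_demo))
```

Fitting the scaler inside the pipeline also means its statistics are learned from the training data only when the pipeline is fitted on `X_train`.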


### Baseline Model Evaluation
The baseline model reaches an accuracy of about 0.84 (F1 about 0.64) but shows clear fairness disparities between the `sex` groups.
- The **Demographic Parity Difference** of about 0.18 means that one group receives positive predictions noticeably more often than the other.
- The **Equalized Odds Difference** of about 0.12 means that the true positive rate or the false positive rate differs between the groups.

We will now explore fairness mitigation strategies to address these disparities.


### Naive solution - drop sensitive column

### Naive Fairness Approach - Removing Sensitive Attributes
One simple way to mitigate bias is to remove the sensitive attribute (`sex`).
However, this method is often insufficient because bias can still be encoded in correlated features.
We compare this approach to more sophisticated fairness-aware techniques.
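The point about correlated features can be illustrated with a small synthetic check: if another column tracks the sensitive attribute closely, a model can recover the attribute even after it is dropped. The sketch below uses made-up data, where `proxy` and `noise` are hypothetical columns; in Adult, a column such as `relationship` (with values like Husband and Wife) is a plausible candidate for this kind of proxy.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Synthetic binary sensitive attribute.
sex = rng.integers(0, 2, size=n)

toy = pd.DataFrame({
    # Proxy feature: agrees with `sex` about 90% of the time.
    "proxy": np.where(rng.random(n) < 0.9, sex, 1 - sex),
    # Unrelated feature drawn independently of `sex`.
    "noise": rng.normal(size=n),
})

# Absolute correlation of each remaining feature with the dropped attribute.
corr = toy.corrwith(pd.Series(sex, name="sex")).abs().sort_values(ascending=False)
print(corr)
```

Running the same `corrwith` check against the encoded `sex` column of `X_processed` would be one way to spot proxy candidates before training.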


In [8]:
# Drop the one-hot encoded sensitive columns from the processed dataset
sensitive_encoded_cols = [col for col in X_processed.columns if col.startswith(sensitive_col + '_')]
X_processed_no_sensitive = X_processed.drop(columns=sensitive_encoded_cols)

# Split the data
X_train, X_test, y_train, y_test, sens_train, sens_test = train_test_split(
    X_processed_no_sensitive, y, sens, test_size=0.3, random_state=42
)

# Train the logistic regression model
lr = LogisticRegression(random_state=42,max_iter=10000)
lr.fit(X_train, y_train)

# Predict on the test set
y_pred_naive = lr.predict(X_test)

# Evaluate naive model performance metrics
naive_accuracy = accuracy_score(y_test, y_pred_naive)
f1_score_naive = f1_score(y_test, y_pred_naive)

# Evaluate fairness metrics for the naive model
naive_dp_diff = demographic_parity_difference(y_test, y_pred_naive, sensitive_features=sens_test)
naive_eo_diff = equalized_odds_difference(y_test, y_pred_naive, sensitive_features=sens_test)

print("=== Naive Model Metrics ===")
print("Accuracy:", naive_accuracy)
print("F1 score:",f1_score_naive) 
print("Demographic Parity Difference:", naive_dp_diff)
print("Equalized Odds Difference:", naive_eo_diff)


=== Naive Model Metrics ===
Accuracy: 0.8464478263836757
F1 score: 0.638030888030888
Demographic Parity Difference: 0.16917128243556473
Equalized Odds Difference: 0.10156941819705284




### Naive Model Evaluation
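The results show that removing the sensitive attribute **does not eliminate bias**.
- The fairness metrics improve only slightly (Demographic Parity Difference from about 0.18 to 0.17, Equalized Odds Difference from about 0.12 to 0.10), so most of the disparity remains.
- Accuracy and F1 are essentially unchanged (about 0.85 and 0.64).
- More advanced techniques are required to achieve better fairness while maintaining accuracy.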


# Optimum fairness search

### Fairness-Aware Learning
To improve fairness while maintaining accuracy, we experiment with multiple approaches:
1. **Pre-processing**: Modifying the dataset before training.
2. **In-processing**: Training with fairness constraints.
3. **Post-processing**: Adjusting predictions after training.

Each method will be evaluated on accuracy and fairness trade-offs.
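The `fairness_functions` module is not shown here, so what `fp.in_reweighting` actually does is not known from this notebook. As one illustration of the reweighting idea, a common scheme weights each row by P(group) * P(label) / P(group, label), which makes group and label statistically independent in the weighted data. The function name `reweighting_weights` and the toy data below are hypothetical.

```python
import pandas as pd

def reweighting_weights(y, sens):
    """Return one weight per row: P(group) * P(label) / P(group, label)."""
    df = pd.DataFrame({"y": list(y), "s": list(sens)})
    p_y = df["y"].map(df["y"].value_counts(normalize=True))   # P(label) per row
    p_s = df["s"].map(df["s"].value_counts(normalize=True))   # P(group) per row
    # P(group, label) per row: size of the row's (group, label) cell over n.
    p_joint = df.groupby(["s", "y"])["y"].transform("size") / len(df)
    return p_s * p_y / p_joint

# Toy data: positives are over-represented in group "M".
y_toy = [1, 1, 1, 0, 1, 0, 0, 0]
s_toy = ["M", "M", "M", "M", "F", "F", "F", "F"]
w = reweighting_weights(y_toy, s_toy)

# Weighted positive rate per group after reweighting.
frame = pd.DataFrame({"y": y_toy, "s": s_toy, "w": w})
weighted_rate = (frame["y"] * frame["w"]).groupby(frame["s"]).sum() / frame["w"].groupby(frame["s"]).sum()
print(w.round(3).tolist())
print(weighted_rate)
```

Such weights could then be passed to a model through `sample_weight`, for example `LogisticRegression().fit(X_train, y_train, sample_weight=w)`.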


In [9]:
# Define candidate methods for each stage.
pre_methods = {
    "None": fp.pre_none,
    "Correlation_Remover": fp.pre_correlation_remover,
    "Sensitive_Resampling": fp.pre_sensitive_resampling  # new candidate
}

in_methods = {
    "Baseline": fp.in_baseline,
    "Reweighting": fp.in_reweighting,
    "Exponential_Gradient_Demogrphic_Parity": fp.in_expgrad_dp,
    "Exponential_Gradient_Equalized_Odds": fp.in_expgrad_eo
}

post_methods = {
    "None": fp.post_none,
    "Threshold_Demogrphic_Parity": fp.post_threshold_dp,
    "Threshold_Equalized_Odds": fp.post_threshold_eo
}

# Run experiments:
results = fp.run_experiments(pre_methods, in_methods, post_methods,
                             X_train, y_train, sens_train,
                             X_test, y_test, sens_test)



### Select only Pareto-optimal methods

In [10]:

objectives = {"f1_score": True, "accuracy": True, "Demographic_parity": False, "Equalized_odds": False}

frontier = fp.pareto_frontier(results, objectives)

print("Pareto Frontier configurations:")
for config, metrics in frontier.items():
    print(f"{config}: {metrics}")

Pareto Frontier configurations:
Pre-processing: None. In-training: Baseline. Post-processing:None: {'accuracy': 0.8440592370163107, 'f1_score': 0.6346922462030375, 'Demographic_parity': 0.1682295517239974, 'Equalized_odds': 0.09582945273811783}
Pre-processing: None. In-training: Baseline. Post-processing:Threshold_Equalized_Odds: {'accuracy': 0.8303419095065857, 'f1_score': 0.5737311385459534, 'Demographic_parity': 0.08008616195306252, 'Equalized_odds': 0.006401249024199862}
Pre-processing: None. In-training: Reweighting. Post-processing:Threshold_Demogrphic_Parity: {'accuracy': 0.8269978843922746, 'f1_score': 0.5816141277438521, 'Demographic_parity': 0.0001539371549044155, 'Equalized_odds': 0.28427718451137657}
Pre-processing: None. In-training: Reweighting. Post-processing:Threshold_Equalized_Odds: {'accuracy': 0.830478400327578, 'f1_score': 0.5652782639131957, 'Demographic_parity': 0.07511393723220912, 'Equalized_odds': 0.004457786659191831}
Pre-processing: None. In-training: Expone

### Results & Discussion
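After testing various configurations, we keep only the **Pareto-optimal** solutions, that is, configurations that no other configuration beats on every metric at once.
Key observations:
- Some methods reduce bias at the cost of lower accuracy. For example, adding `Threshold_Equalized_Odds` post-processing to the baseline lowers the Equalized Odds Difference from about 0.096 to 0.006, while accuracy drops from about 0.844 to 0.830 and F1 from about 0.635 to 0.574.
- The fairness metrics can conflict with each other. `Reweighting` with `Threshold_Demogrphic_Parity` brings the Demographic Parity Difference close to 0 but raises the Equalized Odds Difference to about 0.28.
- The best solution depends on the acceptable trade-off between fairness and predictive power.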
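The implementation of `fp.pareto_frontier` is not shown in this notebook. The following is a minimal sketch of how a Pareto filter over such a results dictionary could work, assuming that `True` in `objectives` marks a metric to maximize and `False` a metric to minimize. The names `dominates` and `pareto_frontier_sketch` and the toy metrics are hypothetical.

```python
def dominates(a, b, objectives):
    """True if metrics `a` are at least as good as `b` everywhere and strictly better somewhere."""
    at_least_as_good = True
    strictly_better = False
    for name, maximize in objectives.items():
        # Flip the sign of minimized metrics so that "larger is better" holds for all.
        x, y = (a[name], b[name]) if maximize else (-a[name], -b[name])
        if x < y:
            at_least_as_good = False
        if x > y:
            strictly_better = True
    return at_least_as_good and strictly_better

def pareto_frontier_sketch(results, objectives):
    """Keep every configuration that no other configuration dominates."""
    return {
        cfg: metrics
        for cfg, metrics in results.items()
        if not any(
            dominates(other, metrics, objectives)
            for other_cfg, other in results.items()
            if other_cfg != cfg
        )
    }

# Toy results with two objectives (illustrative values).
toy_objectives = {"accuracy": True, "Demographic_parity": False}
toy_results = {
    "A": {"accuracy": 0.84, "Demographic_parity": 0.17},
    "B": {"accuracy": 0.83, "Demographic_parity": 0.08},
    "C": {"accuracy": 0.82, "Demographic_parity": 0.12},  # worse than B on both metrics
}
toy_frontier = pareto_frontier_sketch(toy_results, toy_objectives)
print(sorted(toy_frontier))
```

In the toy data, configuration C is dropped because B has both higher accuracy and lower disparity, while A and B stay because each is better than the other on one metric.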


### Apply thresholds on bias and retained accuracy

### Set thresholds on F1, accuracy, demographic parity and equalized odds

In [11]:
f1_threshold = 0.58
accuracy_threshold = 0.80
demographic_parity_threshold = 0.1
equalized_odds_threshold = 0.1

In [12]:
# Filter results based on thresholds.
filtered = fp.filter_results(
    frontier,
    f1_threshold=f1_threshold,
    accuracy_threshold=accuracy_threshold,
    dp_threshold=demographic_parity_threshold,
    eo_threshold=equalized_odds_threshold,
)

print("\nFiltered Results (satisfying thresholds):")
for config, metrics in filtered.items():
    print(config, metrics)


Filtered Results (satisfying thresholds):
Pre-processing: None. In-training: Exponential_Gradient_Equalized_Odds. Post-processing:None {'accuracy': 0.8278168293182283, 'f1_score': 0.5918136223911988, 'Demographic_parity': 0.09418435175883494, 'Equalized_odds': 0.024725494983106167}
Pre-processing: None. In-training: Exponential_Gradient_Equalized_Odds. Post-processing:Threshold_Equalized_Odds {'accuracy': 0.823380877635979, 'f1_score': 0.5851234369990381, 'Demographic_parity': 0.08738576184670138, 'Equalized_odds': 0.005981692632746505}
Pre-processing: Correlation_Remover. In-training: Baseline. Post-processing:Threshold_Equalized_Odds {'accuracy': 0.8296594554016242, 'f1_score': 0.5902823374917925, 'Demographic_parity': 0.08878810664063627, 'Equalized_odds': 0.009766742319435529}
Pre-processing: Correlation_Remover. In-training: Reweighting. Post-processing:Threshold_Equalized_Odds {'accuracy': 0.8300689278646011, 'f1_score': 0.5867905741785596, 'Demographic_parity': 0.08450971247236
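The filtering step can be pictured as a plain dictionary filter. `fp.filter_results` itself is not shown in this notebook, so the sketch below is an assumption about its behaviour: performance metrics must reach their thresholds and fairness differences must stay at or below theirs. The function name `filter_results_sketch` and the short configuration labels are hypothetical; the metric values are rounded from the outputs above.

```python
def filter_results_sketch(results, f1_threshold, accuracy_threshold, dp_threshold, eo_threshold):
    """Keep configurations that meet the performance floors and the fairness ceilings."""
    return {
        cfg: m
        for cfg, m in results.items()
        if m["f1_score"] >= f1_threshold
        and m["accuracy"] >= accuracy_threshold
        and m["Demographic_parity"] <= dp_threshold
        and m["Equalized_odds"] <= eo_threshold
    }

# Three configurations with metrics rounded from the outputs above.
sample = {
    "None / Baseline / None": {
        "accuracy": 0.844, "f1_score": 0.635, "Demographic_parity": 0.168, "Equalized_odds": 0.096},
    "None / Baseline / Threshold_Equalized_Odds": {
        "accuracy": 0.830, "f1_score": 0.574, "Demographic_parity": 0.080, "Equalized_odds": 0.006},
    "None / Exponential_Gradient_Equalized_Odds / None": {
        "accuracy": 0.828, "f1_score": 0.592, "Demographic_parity": 0.094, "Equalized_odds": 0.025},
}

kept = filter_results_sketch(
    sample,
    f1_threshold=0.58,
    accuracy_threshold=0.80,
    dp_threshold=0.1,
    eo_threshold=0.1,
)
print(list(kept))
```

With the thresholds from the cell above, the unmitigated baseline fails the demographic parity ceiling and the thresholded baseline fails the F1 floor, so only the exponentiated-gradient configuration remains in this sample.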

### Conclusion
- The **baseline model** reached an accuracy of about 0.84 but exhibited clear fairness disparities (Demographic Parity Difference about 0.18, Equalized Odds Difference about 0.12).
- **Removing the sensitive attribute** alone was not sufficient to mitigate bias, since both fairness metrics improved only marginally.
- **Fairness-aware techniques** brought both fairness metrics below 0.1 in the selected configurations, while accuracy dropped to about 0.82-0.83 and F1 to about 0.59.
- The best strategy depends on the application and the acceptable balance between fairness and performance.

This analysis highlights the importance of evaluating fairness in machine learning models and demonstrates various strategies to mitigate bias.
