# Optimization of Fairness–Accuracy Trade-offs in Gradient-Based Classification Models

**Course:** COSC 3P99 – Independent Research Project  
**Student:** David Shodipo  
**Supervisor:** Dr. Blessing Ogbuokiri  
**Term:** Winter 2026  


## Project Overview

Machine learning classification models are commonly trained with gradient-based optimization, with the primary goal of maximizing predictive accuracy. However, optimizing for accuracy alone can produce a model that performs better for some demographic groups than for others, which is unfair to the disadvantaged groups and raises ethical concerns.

This project investigates the **trade-off between predictive accuracy and fairness** in gradient-based classification models by introducing fairness-aware regularization during training. The study focuses on how optimization parameters such as learning rate, training duration, and fairness regularization strength influence both accuracy and group fairness metrics.
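To make the idea of fairness-aware regularization concrete, the following is a minimal sketch, not the project's final implementation. It assumes a logistic regression model trained by full-batch gradient descent, a binary sensitive attribute, and a squared demographic-parity penalty on the gap in mean predicted probability between the two groups. The names `train_fair_logreg`, `soft_gap`, and `lam`, as well as the synthetic data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_gap(X, w, b, group):
    """Mean predicted probability in group 1 minus that in group 0."""
    p = sigmoid(X @ w + b)
    return p[group == 1].mean() - p[group == 0].mean()

def train_fair_logreg(X, y, group, lam=0.0, lr=0.1, epochs=500):
    """Gradient descent on: mean cross-entropy + lam * (soft gap)^2.

    Assumption: `group` is a 0/1 array and both groups are non-empty.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    g1 = group == 1
    # Signed weights so that s @ p equals mean(p | g=1) - mean(p | g=0)
    s = np.where(g1, 1.0 / g1.sum(), -1.0 / (~g1).sum())
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        gap = s @ p
        # Gradient w.r.t. the logits: cross-entropy term + penalty term
        dz = (p - y) / n + 2.0 * lam * gap * s * p * (1.0 - p)
        w -= lr * (X.T @ dz)
        b -= lr * dz.sum()
    return w, b

# Synthetic data (illustrative only): feature x1 is correlated with the group.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)
x1 = rng.normal(group * 1.0, 1.0)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2])
y = (x1 + 0.5 * x2 + rng.normal(scale=0.5, size=n) > 0.5).astype(float)

w_base, b_base = train_fair_logreg(X, y, group, lam=0.0)
w_fair, b_fair = train_fair_logreg(X, y, group, lam=5.0)

gap_base = soft_gap(X, w_base, b_base, group)
gap_fair = soft_gap(X, w_fair, b_fair, group)
acc_base = ((sigmoid(X @ w_base + b_base) > 0.5) == y).mean()
acc_fair = ((sigmoid(X @ w_fair + b_fair) > 0.5) == y).mean()
print(f"lam=0: gap={gap_base:.3f}, accuracy={acc_base:.3f}")
print(f"lam=5: gap={gap_fair:.3f}, accuracy={acc_fair:.3f}")
```

In this sketch, `lam`, `lr`, and `epochs` play the roles of the fairness regularization strength, learning rate, and training duration named above, so sweeping them is one way to trace out a trade-off curve.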


## Research Questions

This project aims to answer the following questions:

1. How do fairness constraints affect predictive accuracy in gradient-based classifiers?
2. How do optimization parameters influence the fairness–accuracy trade-off?
3. Are fairness effects consistent across different application domains and demographic groups?



## Objectives

- Train baseline accuracy-only classifiers
- Introduce fairness-aware regularization into the loss function
- Measure changes in:
  - Accuracy / AUC
  - Demographic Parity Difference
  - Equal Opportunity Difference
- Visualize and interpret fairness–accuracy trade-offs
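
The two group fairness metrics listed above can be computed directly from hard predictions. The sketch below assumes the common definitions: Demographic Parity Difference as the largest gap in positive-prediction rate between groups, and Equal Opportunity Difference as the largest gap in true positive rate between groups. The function names and the toy arrays are hypothetical, and the code assumes every group contains at least one positive example.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Largest gap in P(y_pred = 1) across groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))

def equal_opportunity_difference(y_true, y_pred, group):
    """Largest gap in true positive rate P(y_pred = 1 | y_true = 1) across groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tprs = []
    for g in np.unique(group):
        positives = (group == g) & (y_true == 1)  # assumed non-empty
        tprs.append(y_pred[positives].mean())
    return float(max(tprs) - min(tprs))

# Toy example: four individuals in each of two groups.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]

dpd = demographic_parity_difference(y_pred, group)         # 0.75 - 0.25 = 0.5
eod = equal_opportunity_difference(y_true, y_pred, group)  # 1.0 - 2/3 = 1/3
print(dpd, eod)
```

A value of 0 on either metric means the groups are treated identically under that criterion; larger values indicate a larger disparity.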


## Datasets

Two publicly available datasets are used in this study:

### 1. Healthcare Dataset
- **Dataset:** UCI Heart Disease Dataset
- **Task:** Predict presence of heart disease
- **Sensitive Attribute:** Sex (optional extension: age group)

### 2. Non-Healthcare Dataset
- **Dataset:** UCI Adult Income Dataset
- **Task:** Predict whether income exceeds $50K (as defined in the dataset's documentation)
- **Sensitive Attribute:** Sex or Race


In [5]:
# Libraries
%pip install pandas numpy matplotlib seaborn scikit-learn
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning Libraries
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.preprocessing import StandardScaler

