From Data Skew to Fair Outcomes: A Practical Guide to Mitigating Bias in Machine Learning

Introduction

This project notebook documents a hands-on approach to identifying, analyzing, and mitigating unfair bias in a machine learning model. Using the classic Adult Income dataset, this analysis demonstrates a full responsible AI workflow, from establishing a biased baseline to exploring and implementing various mitigation strategies. The ultimate goal is to build a model that performs equitably across different demographic groups, even when the underlying data is flawed.


Core Concepts Explored

  • Identifying Bias: We use subgroup analysis to expose performance disparities in a model, showing that overall accuracy alone can be misleading (a minimal sketch follows this list).
  • Data-Centric Mitigation: This project explores the challenges and trade-offs of a data-centric approach by attempting to mitigate bias through oversampling.
  • Model-Centric Mitigation: We implement a more advanced, model-centric approach, using the Fairlearn library to enforce fairness constraints at prediction time.
  • The Iterative Nature of Responsible AI: The notebook's narrative highlights that building fair technology is not a one-time fix but a continuous process of testing, learning, and making deliberate trade-offs.
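
The sketch below ties the first two concepts together: train a plain baseline and slice its metrics by a sensitive feature. It is a minimal illustration under stated assumptions, not the notebook's exact code: it loads the data via fairlearn.datasets.fetch_adult, treats sex as the sensitive feature, and all variable names and split parameters are illustrative.

```python
import pandas as pd
from fairlearn.datasets import fetch_adult
from fairlearn.metrics import MetricFrame
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Load the Adult Income dataset (the notebook may load a CSV instead).
data = fetch_adult(as_frame=True)

# Quick EDA: the group and label imbalances that motivate the whole exercise.
print(data.data["sex"].value_counts(normalize=True))
print(data.target.value_counts(normalize=True))

X = pd.get_dummies(data.data, drop_first=True)   # one-hot encode categoricals
y = (data.target == ">50K").astype(int)
sex = data.data["sex"]                           # sensitive feature

X_train, X_test, y_train, y_test, sex_train, sex_test = train_test_split(
    X, y, sex, test_size=0.3, random_state=42, stratify=y
)

# Baseline: plain logistic regression on the original, imbalanced data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Subgroup analysis: identical metrics, sliced by the sensitive feature.
mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "recall": recall_score},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sex_test,
)
print("overall:\n", mf.overall)
print("by group:\n", mf.by_group)
```

A model whose overall accuracy looks healthy can still show a sizeable gap in the by-group recall; that gap is the disparity the rest of the workflow tries to close.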

Methodology

The project follows a step-by-step methodology to illustrate the complete process:

  1. Exploratory Data Analysis (EDA): Initial analysis is performed to identify data imbalances and potential sources of selection bias.
  2. Baseline Model Training: A Logistic Regression model is trained on the original, imbalanced data to establish a performance benchmark.
  3. Subgroup Analysis: The baseline model's performance is evaluated separately on sensitive subgroups (sex and race) to quantify the extent of the bias, as in the sketch above.
  4. Data-Centric Mitigation Attempt: Oversampling is applied to the training data, and a new model is trained to demonstrate the unintended consequences of this approach (first sketch after this list).
  5. Model-Centric Mitigation with Fairlearn: The ThresholdOptimizer from the Fairlearn library is used to post-process the model's predictions with group-specific thresholds for a more equitable outcome (second sketch after this list).
  6. Final Analysis and Conclusion: The results of all models are compared side-by-side to draw conclusions on the most effective mitigation strategy for this project.
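
A hedged sketch of step 4, reusing the names from the sketch above. It balances the class labels with imbalanced-learn's RandomOverSampler; the notebook may instead oversample by demographic group, which is a different design choice with its own trade-offs.

```python
from imblearn.over_sampling import RandomOverSampler
from sklearn.linear_model import LogisticRegression

# Oversample the minority class in the training split only -- never the test set.
ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(X_train, y_train)

model_resampled = LogisticRegression(max_iter=1000).fit(X_res, y_res)
# Re-running the MetricFrame evaluation on model_resampled shows whether
# balancing the labels narrows the subgroup gaps or merely shifts them.
```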
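And a sketch of step 5. ThresholdOptimizer is Fairlearn's post-processor for learning group-specific decision thresholds; the equalized_odds constraint shown here is an assumption, and the notebook may optimize a different fairness criterion.

```python
from fairlearn.postprocessing import ThresholdOptimizer

# Wrap the already-fitted baseline model; prefit=True avoids refitting it.
postprocessor = ThresholdOptimizer(
    estimator=model,
    constraints="equalized_odds",   # assumed; "demographic_parity" also works
    predict_method="predict_proba",
    prefit=True,
)

# The sensitive feature is needed at both fit and predict time, because the
# optimizer picks a separate decision threshold for each group.
postprocessor.fit(X_train, y_train, sensitive_features=sex_train)
y_pred_fair = postprocessor.predict(X_test, sensitive_features=sex_test)
```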

Getting Started

To run this notebook, you'll need a Python environment with the following dependencies:

  • pandas
  • scikit-learn
  • matplotlib
  • fairlearn
  • imblearn (installed from PyPI as imbalanced-learn)

It is highly recommended to use an isolated environment (for example, via Conda or venv) to manage these dependencies and avoid conflicts.
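
One possible setup, assuming Conda (the environment name and Python version are arbitrary choices):

```bash
conda create -n responsible-ai python=3.10
conda activate responsible-ai
pip install pandas scikit-learn matplotlib fairlearn imbalanced-learn
```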

About

Explored fairness and bias mitigation strategies in ML models using the Adult Income dataset. Implemented group-specific thresholds and reweighting methods to reduce disparate impact, highlighting the trade-offs between accuracy and fairness in responsible AI systems.
