GitHub - AgnesElza/responsible-ai: Explored fairness and bias mitigation strategies in ML models using the Adult Income dataset. Implemented group-specific thresholds and reweighting methods to reduce disparate impact, highlighting the trade-offs between accuracy and fairness in responsible AI systems.

From Data Skew to Fair Outcomes: A Practical Guide to Mitigating Bias in Machine Learning

Introduction

This project notebook documents a hands-on approach to identifying, analyzing, and mitigating unfair bias in a machine learning model. Using the classic Adult Income dataset, this analysis demonstrates a full responsible AI workflow, from establishing a biased baseline to exploring and implementing various mitigation strategies. The ultimate goal is to build a model that performs equitably across different demographic groups, even when the underlying data is flawed.

Core Concepts Explored

Identifying Bias: We use subgroup analysis to expose performance disparities in a model, proving that relying solely on overall accuracy can be misleading.
Data-Centric Mitigation: This project explores the challenges and trade-offs of a data-centric approach by attempting to mitigate bias through oversampling.
Model-Centric Mitigation: We implement a more advanced, model-centric approach using the Fairlearn library to directly enforce fairness during the prediction phase.
The Iterative Nature of Responsible AI: The notebook's narrative highlights that building fair technology is not a one-time fix but a continuous process of testing, learning, and making deliberate trade-offs.

Methodology

The project follows a step-by-step methodology to illustrate the complete process:

Exploratory Data Analysis (EDA): Initial analysis is performed to identify data imbalances and potential sources of selection bias.
Baseline Model Training: A Logistic Regression model is trained on the original, unbalanced data to establish a performance benchmark.
Subgroup Analysis: The baseline model's performance is meticulously evaluated on sensitive subgroups (sex and race) to quantify the extent of the bias.
Data-Centric Mitigation Attempt: Oversampling is applied to the training data, and a new model is trained to demonstrate the unintended consequences of this approach.
Model-Centric Mitigation with Fairlearn: The ThresholdOptimizer from the Fairlearn library is implemented to fine-tune the model's predictions for a more equitable outcome.
Final Analysis and Conclusion: The results of all models are compared side-by-side to draw conclusions on the most effective mitigation strategy for this project.

Getting Started

To run this notebook, you'll need a Python environment with the following dependencies:

pandas
scikit-learn
matplotlib
fairlearn
imblearn

It is highly recommended to use a virtual environment like Conda to manage these dependencies and avoid conflicts.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bias_mitigation_upsampling_fairlearn.ipynb		bias_mitigation_upsampling_fairlearn.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

From Data Skew to Fair Outcomes: A Practical Guide to Mitigating Bias in Machine Learning

Introduction

Core Concepts Explored

Methodology

Getting Started

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

From Data Skew to Fair Outcomes: A Practical Guide to Mitigating Bias in Machine Learning

Introduction

Core Concepts Explored

Methodology

Getting Started

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages