NullitOut

Abstract:

Machine learning models can learn patterns from biased data and amplify existing social inequality when they are deployed in real-world settings. In this project, we replicate Iterative Null-space Projection (INLP), a novel method for removing information from neural representations (Ravfogel et al., 2020), for the purpose of bias mitigation in text classification. The method repeatedly trains linear classifiers to predict a target property, then projects the representations onto the classifiers' null space so that the classifiers become oblivious to that property. We also extend the paper's results by applying the method to toxicity classification of text comments. The goal is to remove information about the protected attribute from the representations used by the classifier without sacrificing toxicity classification accuracy.

We focused on three bias classes: gender bias, racial bias, and religious bias. The results show that the method can be successfully applied to gender and racial bias mitigation in toxicity classification.
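
The train-then-project loop described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the code in `debias.py`: a least-squares linear probe stands in for the linear classifiers used by Ravfogel et al., the data is synthetic (a binary protected attribute `z` encoded along one feature), and the names `fit_linear_probe`, `nullspace_projection`, `probe_accuracy`, and `inlp` are hypothetical.

```python
import numpy as np


def fit_linear_probe(X, z):
    """Fit a least-squares linear probe for binary labels z in {0, 1}.

    Assumption: this stands in for whatever linear classifier is used in
    practice (e.g. logistic regression or a linear SVM).
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    target = 2.0 * z - 1.0                         # map {0, 1} to {-1, +1}
    coef, *_ = np.linalg.lstsq(Xb, target, rcond=1e-10)
    return coef[:-1], coef[-1]                     # weights, bias


def probe_accuracy(X, z, w, b):
    """Fraction of samples whose label is recovered by the probe."""
    return float(np.mean(((X @ w + b) > 0) == (z == 1)))


def nullspace_projection(W):
    """Orthogonal projection onto the null space of the rows of W."""
    W = np.atleast_2d(W)
    _, s, vt = np.linalg.svd(W, full_matrices=False)
    basis = vt[s > 1e-10 * s.max()]                # orthonormal row-space basis
    return np.eye(W.shape[1]) - basis.T @ basis


def inlp(X, z, n_iters):
    """Repeatedly fit a probe, then project its direction out of X."""
    directions = []
    P = np.eye(X.shape[1])
    for _ in range(n_iters):
        w, _ = fit_linear_probe(X @ P, z)
        if np.linalg.norm(w) < 1e-8:               # nothing linear left to remove
            break
        directions.append(w)
        # Project onto the intersection of the null spaces found so far.
        P = nullspace_projection(np.vstack(directions))
    return P


# Synthetic demo: the protected attribute z is encoded along feature 0.
rng = np.random.default_rng(0)
n, d = 400, 5
z = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * (2 * z - 1)

w0, b0 = fit_linear_probe(X, z)
acc_before = probe_accuracy(X, z, w0, b0)

P = inlp(X, z, n_iters=3)
X_clean = X @ P                                    # P is symmetric
w1, b1 = fit_linear_probe(X_clean, z)
acc_after = probe_accuracy(X_clean, z, w1, b1)
```

In this sketch, `acc_before` measures how well a linear probe recovers `z` from the raw features and `acc_after` measures the same on the projected features. In the toxicity setting, the downstream classifier would then be trained on the projected representations; the number of iterations trades off how much protected information is removed against how much of the representation is preserved.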

Religion WordCloud:

Gender WordCloud:

Race WordCloud:

Results:
