assemzh/NullitOut

NullitOut

Abstract:

Machine learning models can learn patterns from biased data and amplify existing social inequalities when they are deployed in real-world settings. In this project, we replicate Iterative Null-space Projection (INLP), a novel method for removing information from neural representations (Ravfogel et al., 2020), for the purpose of bias mitigation in text classification. The method repeatedly trains linear classifiers to predict a target property and then projects the representations onto the null-space of those classifiers, so that the classifiers can no longer recover the property. We also extend the paper's results by applying the method to toxicity classification of text comments. The goal is to remove information about the protected attribute from the representations the classifier uses, without sacrificing the accuracy of toxicity classification.

We focused on three bias classes: gender bias, racial bias, and religious bias. The results show that the method can be successfully applied to gender and racial bias mitigation in toxicity classification.
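The train-then-project loop described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code from this repository or from Ravfogel et al.: the linear classifier is a plain gradient-descent logistic regression without a bias term, the protected attribute is binary, the data is synthetic, and all function names (`make_toy_data`, `train_linear_classifier`, `nullspace_projection`, `inlp`, `linear_accuracy`) are hypothetical.

```python
import numpy as np


def make_toy_data(n=400, d=5, seed=0):
    """Synthetic stand-in for text representations (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)       # binary protected attribute
    X = rng.normal(size=(n, d))
    X[:, 0] += 2.0 * (2 * z - 1)         # dimension 0 leaks the attribute
    return X, z


def train_linear_classifier(X, z, steps=500, lr=0.5):
    """Logistic regression by gradient descent; returns the weight vector w."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w -= lr * (X.T @ (p - z)) / len(z)
    return w


def nullspace_projection(w):
    """P = I - w w^T for unit-norm w: maps every vector into the null-space of w."""
    return np.eye(len(w)) - np.outer(w, w)


def inlp(X, z, n_iters=3):
    """Repeatedly fit a classifier for z and project its direction out of X."""
    P = np.eye(X.shape[1])
    X_proj = X.copy()
    directions = []
    for _ in range(n_iters):
        w = train_linear_classifier(X_proj, z)
        norm = np.linalg.norm(w)
        if norm < 1e-8:                  # nothing left for a linear model to use
            break
        w = w / norm
        directions.append(w)
        P = nullspace_projection(w) @ P  # accumulate the projections
        X_proj = X @ P.T                 # re-project the original data
    return P, directions


def linear_accuracy(X, z):
    """Training accuracy of a freshly fitted linear classifier for z."""
    w = train_linear_classifier(X, z)
    return float(np.mean((X @ w > 0).astype(int) == z))
```

On the toy data, `P, directions = inlp(X, z)` followed by `X_clean = X @ P.T` gives representations on which every learned direction scores zero, so `linear_accuracy(X_clean, z)` should fall toward chance while `linear_accuracy(X, z)` stays high. In the project setting, the same check would be paired with the toxicity classifier's accuracy on the projected representations.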

Religion Word Cloud:

[Image: religion word cloud]

Gender Word Cloud:

[Image: gender word cloud]

Race Word Cloud:

[Image: race word cloud]

Results:

[Images: three result figures]
