Generalized Fairness Metrics

This repository contains the source code for the paper:

Quantifying Social Biases in NLP: A Generalization and Empirical Comparison of Extrinsic Fairness Metrics
Paula Czarnowska, Yogarshi Vyas, Kashif Shah
Transactions of the Association for Computational Linguistics (TACL), 2021

Reproducing classification experiments:

  1. Set the MODELSDIR and OUTDIR variables to the directory where your models are, or will be, saved.

  2. Set the CUDA variable to the appropriate version of CUDA.

  3. Run to:

    • fetch the required submodules
    • create and activate a new environment named btools based on the requirements.yml
    • download the SemEval valence classification data

  4. Train the models from the config files in the experiments directory:

    ./ train=1 DATASET=semeval-2 exp=experiments/roberta.jsonnet
    ./ train=1 DATASET=semeval-3 exp=experiments/roberta.jsonnet

  5. Create the test suites and test the models. The plots for the results are saved in the plots directory:

    conda activate btools
    python3 --classification --create-tests

Reproducing NER experiments:

  1. Run the setup steps (1 and 2 above).

  2. Get the CoNLL2003 data. Place the eng.train, eng.testa and eng.testb files in the datasets/conll2003/ner directory.

  3. Train the model:

    ./ train=1 DATASET=conll2003 exp=experiments/ner-roberta.jsonnet

  4. Test the trained model:

    python3 --ner

    or, if you haven't created the test suites yet:

    python3 --ner --create-tests
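
Before training the NER model, it can help to confirm that the CoNLL2003 files from step 2 are where the instructions expect them. The following is a minimal sketch, not part of the repository: the function name `missing_conll_files` is hypothetical, and it assumes the script is run from the repository root so that the relative path `datasets/conll2003/ner` resolves correctly.

```python
from pathlib import Path

# Directory and file names as listed in step 2 of the NER instructions.
# The relative path assumes the script is run from the repository root.
CONLL_DIR = Path("datasets/conll2003/ner")
CONLL_FILES = ("eng.train", "eng.testa", "eng.testb")


def missing_conll_files(data_dir=CONLL_DIR, names=CONLL_FILES):
    """Return the expected CoNLL2003 files that are not present in data_dir.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    data_dir = Path(data_dir)
    return [name for name in names if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_conll_files()
    if missing:
        print(f"Missing from {CONLL_DIR}: {', '.join(missing)}")
    else:
        print("All CoNLL2003 files are in place.")
```

An empty list means all three files were found. Any names returned still need to be copied into the directory before training.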

Metric implementations:

Implementations of all metrics can be found in expanded_checklist/checklist/tests.
The code for generalized metrics is located in expanded_checklist/checklist/tests/abstract_tests/


The code in the expanded_checklist directory is a restructured and expanded version of the repository containing the code for testing NLP models, as described in the following paper:

Beyond Accuracy: Behavioral Testing of NLP models with CheckList
Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh
Association for Computational Linguistics (ACL), 2020


See CONTRIBUTING for more information.


This project is licensed under the Apache-2.0 License.