# WEFE Rankings and Correlations.

The following code will show how to perform a massive amount of bias testing on different bias criteria (gender, ethnicity and religion) and using different embedding models and metrics. Then, from the above, we will create rankings according to the evaluated bias criteria and plot their correlations.

The final idea of this is to see if, through the different bias rankings we calculate, we can safely say that there are embeddings that are less biased than others or not.

For this, 

1. We create the queries. These are separated by type of bias explored: gender, ethnicity and religion.

2. We indicate the list of embeddings to be loaded later. Only public model support is shown in the gensim API (it will be updated later).

3. We create some runners, which are just wrappers to make the queries compatible and the results more robust.


4. We will execute the queries on all the embeddings and using all the metrics.

5. We will create rankings of the results by evaluated bias criteria and also an overall, which contains the sum of all previous rankings.

6. We graph the rankings.

7. We calculate and graph the correlations of the rankings.



In general, this code takes about an hour to run.


In [5]:
%load_ext autoreload
%autoreload 2

In [6]:
import pandas as pd
import numpy as np
from functools import reduce
import gensim.downloader as api
import os

from wefe.datasets import (
    load_weat,
    fetch_eds,
    fetch_debias_multiclass,
    fetch_debiaswe,
    load_bingliu,
)
from wefe.query import Query
from wefe.word_embedding_model import WordEmbeddingModel
from wefe.metrics import WEAT, RNSB, RND


from wefe.utils import (
    run_queries,
    plot_queries_results,
    create_ranking,
    plot_ranking,
    calculate_ranking_correlations,
    plot_ranking_correlations,
)
from plotly.subplots import make_subplots




## Queries

### Load the word sets

In [7]:
WEAT_wordsets = load_weat()

RND_wordsets = fetch_eds()
sentiments_wordsets = load_bingliu()
debias_multiclass_wordsets = fetch_debias_multiclass()


### Ethnicity Queries


In [8]:
eth_1 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_black"]],
    [WEAT_wordsets["pleasant_5"], WEAT_wordsets["unpleasant_5"]],
    ["White last names", "Black last names"],
    ["Pleasant", "Unpleasant"],
)

eth_2 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_asian"]],
    [WEAT_wordsets["pleasant_5"], WEAT_wordsets["unpleasant_5"]],
    ["White last names", "Asian last names"],
    ["Pleasant", "Unpleasant"],
)

eth_3 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_hispanic"]],
    [WEAT_wordsets["pleasant_5"], WEAT_wordsets["unpleasant_5"]],
    ["White last names", "Hispanic last names"],
    ["Pleasant", "Unpleasant"],
)

eth_4 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_black"]],
    [RND_wordsets["occupations_white"], RND_wordsets["occupations_black"]],
    ["White last names", "Black last names"],
    ["Occupations white", "Occupations black"],
)

eth_5 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_asian"]],
    [RND_wordsets["occupations_white"], RND_wordsets["occupations_asian"]],
    ["White last names", "Asian last names"],
    ["Occupations white", "Occupations asian"],
)

eth_6 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_hispanic"]],
    [RND_wordsets["occupations_white"], RND_wordsets["occupations_hispanic"]],
    ["White last names", "Hispanic last names"],
    ["Occupations white", "Occupations hispanic"],
)

eth_sent_1 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_black"]],
    [sentiments_wordsets["positive_words"], sentiments_wordsets["negative_words"]],
    ["White last names", "Black last names"],
    ["Positive words", "Negative words"],
)

eth_sent_2 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_asian"]],
    [sentiments_wordsets["positive_words"], sentiments_wordsets["negative_words"]],
    ["White last names", "Asian last names"],
    ["Positive words", "Negative words"],
)

eth_sent_3 = Query(
    [RND_wordsets["names_white"], RND_wordsets["names_hispanic"]],
    [sentiments_wordsets["positive_words"], sentiments_wordsets["negative_words"]],
    ["White last names", "Hispanic last names"],
    ["Positive words", "Negative words"],
)

ethnicity_queries = [
    eth_1,
    eth_2,
    eth_3,
    eth_4,
    eth_5,
    eth_6,
    eth_sent_1,
    eth_sent_2,
    eth_sent_3,
]


### Gender queries

In [9]:
gender_1 = Query(


    [RND_wordsets["male_terms"], RND_wordsets["female_terms"]],
    [WEAT_wordsets["career"], WEAT_wordsets["family"]],
    ["Male terms", "Female terms"],
    ["Career", "Family"],
)

gender_2 = Query(
    [RND_wordsets["male_terms"], RND_wordsets["female_terms"]],
    [WEAT_wordsets["math"], WEAT_wordsets["arts"]],
    ["Male terms", "Female terms"],
    ["Math", "Arts"],
)

gender_3 = Query(
    [RND_wordsets["male_terms"], RND_wordsets["female_terms"]],
    [WEAT_wordsets["science"], WEAT_wordsets["arts_2"]],
    ["Male terms", "Female terms"],
    ["Science", "Arts"],
)

gender_4 = Query(
    [RND_wordsets["male_terms"], RND_wordsets["female_terms"]],
    [RND_wordsets["adjectives_intelligence"], RND_wordsets["adjectives_appearance"]],
    ["Male terms", "Female terms"],
    ["Intelligence", "Appearence"],
)

gender_5 = Query(
    [RND_wordsets["male_terms"], RND_wordsets["female_terms"]],
    [RND_wordsets["adjectives_intelligence"], RND_wordsets["adjectives_sensitive"]],
    ["Male terms", "Female terms"],
    ["Intelligence", "Sensitive"],
)

gender_6 = Query(
    [RND_wordsets["male_terms"], RND_wordsets["female_terms"]],
    [WEAT_wordsets["pleasant_5"], WEAT_wordsets["unpleasant_5"]],
    ["Male terms", "Female terms"],
    ["Pleasant", "Unpleasant"],
)

gender_sent_1 = Query(
    [RND_wordsets["male_terms"], RND_wordsets["female_terms"]],
    [sentiments_wordsets["positive_words"], sentiments_wordsets["negative_words"]],
    ["Male terms", "Female terms"],
    ["Positive words", "Negative words"],
)

gender_role_1 = Query(
    [RND_wordsets["male_terms"], RND_wordsets["female_terms"]],
    [
        debias_multiclass_wordsets["male_roles"],
        debias_multiclass_wordsets["female_roles"],
    ],
    ["Male terms", "Female terms"],
    ["Man Roles", "Woman Roles"],
)

gender_queries = [
    gender_1,
    gender_2,
    gender_3,
    gender_4,
    gender_5,
    gender_sent_1,
    gender_role_1,
]


### Religion queries

In [10]:
rel_1 = Query(

    [
        debias_multiclass_wordsets["christianity_terms"],
        debias_multiclass_wordsets["islam_terms"],
    ],
    [WEAT_wordsets["pleasant_5"], WEAT_wordsets["unpleasant_5"]],
    ["Christianity terms", "Islam terms"],
    ["Pleasant", "Unpleasant"],
)

rel_2 = Query(
    [
        debias_multiclass_wordsets["christianity_terms"],
        debias_multiclass_wordsets["judaism_terms"],
    ],
    [WEAT_wordsets["pleasant_5"], WEAT_wordsets["unpleasant_5"]],
    ["Christianity terms", "Judaism terms"],
    ["Pleasant", "Unpleasant"],
)

rel_3 = Query(
    [
        debias_multiclass_wordsets["islam_terms"],
        debias_multiclass_wordsets["judaism_terms"],
    ],
    [WEAT_wordsets["pleasant_5"], WEAT_wordsets["unpleasant_5"]],
    ["Islam terms", "Judaism terms"],
    ["Pleasant", "Unpleasant"],
)

rel_4 = Query(
    [
        debias_multiclass_wordsets["christianity_terms"],
        debias_multiclass_wordsets["islam_terms"],
    ],
    [
        debias_multiclass_wordsets["conservative"],
        debias_multiclass_wordsets["terrorism"],
    ],
    ["Christianity terms", "Islam terms"],
    ["Conservative", "Terrorism"],
)

rel_5 = Query(
    [
        debias_multiclass_wordsets["christianity_terms"],
        debias_multiclass_wordsets["judaism_terms"],
    ],
    [debias_multiclass_wordsets["conservative"], debias_multiclass_wordsets["greed"]],
    ["Christianity terms", "Jew terms"],
    ["Conservative", "Greed"],
)

rel_6 = Query(
    [
        debias_multiclass_wordsets["islam_terms"],
        debias_multiclass_wordsets["judaism_terms"],
    ],
    [debias_multiclass_wordsets["terrorism"], debias_multiclass_wordsets["greed"]],
    ["Islam terms", "Jew terms"],
    ["Terrorism", "Greed"],
)

rel_sent_1 = Query(
    [
        debias_multiclass_wordsets["christianity_terms"],
        debias_multiclass_wordsets["islam_terms"],
    ],
    [sentiments_wordsets["positive_words"], sentiments_wordsets["negative_words"]],
    ["Christianity terms", "Islam terms"],
    ["Positive words", "Negative words"],
)

rel_sent_2 = Query(
    [
        debias_multiclass_wordsets["christianity_terms"],
        debias_multiclass_wordsets["judaism_terms"],
    ],
    [sentiments_wordsets["positive_words"], sentiments_wordsets["negative_words"]],
    ["Christianity terms", "Jew terms"],
    ["Positive words", "Negative words"],
)

rel_sent_3 = Query(
    [
        debias_multiclass_wordsets["islam_terms"],
        debias_multiclass_wordsets["judaism_terms"],
    ],
    [sentiments_wordsets["positive_words"], sentiments_wordsets["negative_words"]],
    ["Islam terms", "Jew terms"],
    ["Positive words", "Negative words"],
)

religion_queries = [
    rel_1,
    rel_2,
    rel_3,
    rel_4,
    rel_5,
    rel_6,
    rel_sent_1,
    rel_sent_2,
    rel_sent_3,
]


In [11]:
queries_sets = {

    'Gender' : gender_queries,
    'Ethnicity': ethnicity_queries,
    'Religion': religion_queries,
}

## Models

### Set the models list

In [12]:
models = [
    {
        "name": "lexvec-commoncrawl W+C dim=300",
        "source": "file",
        "path": "./lexvec.commoncrawl.300d.W+C.pos.neg3.vectors",  # path to the local model
    },
    {"name": "glove-twitter-200", "source": "gensim"},
    {"name": "glove-wiki-gigaword-300", "source": "gensim"},
    {
        "name": "word2vec-gender-hard-debiased dim=300",
        "source": "file",
        "path": "./GoogleNews-vectors-negative300-hard-debiased.bin",  # path to the local model
        "keep_loaded": False,
    },
    {"name": "word2vec-google-news-300", "source": "gensim", "keep_loaded": True,},
    {
        "name": "fasttext-wiki-news-subwords-300",
        "source": "gensim",
        "keep_loaded": False,
    },
    {
        "name": "conceptnet-numberbatch 19.08-en dim=300",
        "source": "file",
        "path": "./numberbatch-en.txt",  # path to the local model,
        "keep_loaded": False,
        # "prefix": "/c/en/",
    },
]


## Run The Experiments


The following code will run the experiments varying these three variables: 

- metrics = WEAT, WEAT effect size, RND, RNSB

- queries = Gender, Ethnicity and Religion.

- embeddings = all specified before.

In [56]:
from gensim.models import KeyedVectors

RNSB_NUM_ITERATIONS = 30


def run_all(queries_sets, models):

    if not os.path.exists("./results"):
        os.mkdir("./results")

    # load the models
    # the models are loaded in a deferred way so as not to saturate the RAM
    for model in models:
        model_name = model["name"]
        model_source = model["source"]

        model_prefix = model["prefix"] if "prefix" in model else None

        if "loaded" not in model:
            if model_source == "gensim":
                print(f"Loading {model_name} from gensim downloader")
                gensim_model = api.load(model_name)

                print("Model loaded successfully.")

            else:
                try:
                    print(f"Loading {model_name} from a file")

                    model_path = model["path"]
                    gensim_model = KeyedVectors.load_word2vec_format(model_path)

                except:
                    gensim_model = KeyedVectors.load_word2vec_format(
                        model_path, binary=True
                    )

            if "keep_loaded" in model and model["keep_loaded"]:
                model["loaded"] = gensim_model

        else:
            gensim_model = model["loaded"]

        loaded_model = WordEmbeddingModel(gensim_model, model_name, model_prefix)
        model = [loaded_model]

        # for each query set, run all queries:
        for queries_set_name, queries_set in queries_sets.items():

            # ------------------------------------------------------
            # WEAT
            print(f"Running {queries_set_name} queries using WEAT")
            weat_scores = run_queries(
                WEAT,
                queries_set,
                model,
                queries_set_name=queries_set_name,
                aggregate_results=True,
                metric_params={"secondary_preprocessor_args": {"lowercase": True,}},
                aggregation_function="abs_avg",
                warn_filtered_words=False,
            )

            # ------------------------------------------------------
            # WEAT Effect Size
            print(f"Running {queries_set_name} queries using WEAT Effect Size")
            weat_es_scores = run_queries(
                WEAT,
                queries_set,
                model,
                queries_set_name=queries_set_name,
                metric_params={
                    "return_effect_size": True,
                    "secondary_preprocessor_args": {"lowercase": True,},
                },
                aggregate_results=True,
                aggregation_function="abs_avg",
                warn_filtered_words=False,
            )

            last_col = weat_es_scores.columns[-1]
            weat_es_scores = weat_es_scores.rename(
                columns={last_col: last_col.replace("WEAT", "WEAT ES")}
            )

            # ------------------------------------------------------
            # RND
            print(f"Running {queries_set_name} queries using RND")
            rnd_scores = run_queries(
                RND,
                queries_set,
                model,
                metric_params={"secondary_preprocessor_args": {"lowercase": True,}},
                queries_set_name=queries_set_name,
                aggregate_results=True,
                aggregation_function="abs_avg",
                generate_subqueries=True,
                warn_filtered_words=False,
            )

            # ------------------------------------------------------
            # RNSB
            print(f"Running {queries_set_name} queries using RNSB")
            rnsb_scores = run_queries(
                RNSB,
                queries_set,
                model,
                queries_set_name=queries_set_name,
                metric_params={
                    "num_iterations": RNSB_NUM_ITERATIONS,
                    "secondary_preprocessor_args": {"lowercase": True,},
                },
                aggregate_results=True,
                aggregation_function="abs_avg",
                warn_filtered_words=False,
            )

            # ------------------------------------------------------
            # Save results

            for metric_name, metric_result in [
                ("WEAT", weat_scores),
                ("WEAT_ES", weat_es_scores),
                ("RND", rnd_scores),
                ("RNSB", rnsb_scores),
            ]:

                if os.path.isfile(
                    "./results/{}_{}.csv".format(queries_set_name, metric_name)
                ):
                    saved_results = pd.read_csv(
                        "./results/{}_{}.csv".format(queries_set_name, metric_name),
                        index_col=0,
                    )
                    metric_result = pd.concat([metric_result, saved_results], axis=0)
                metric_result.to_csv(
                    "./results/{}_{}.csv".format(queries_set_name, metric_name)
                )
        print(
            f"Queries executed and saved correctly for {model_name}."
            "\n----------------------------------\n"
        )



### Run!

In [20]:
run_all(queries_sets, models)

INFO:gensim.models.keyedvectors:loading projection weights from ./lexvec.commoncrawl.300d.W+C.pos.neg3.vectors
DEBUG:smart_open.smart_open_lib:{'uri': './lexvec.commoncrawl.300d.W+C.pos.neg3.vectors', 'mode': 'rb', 'buffering': -1, 'encoding': None, 'errors': None, 'newline': None, 'closefd': True, 'opener': None, 'ignore_ext': False, 'compression': None, 'transport_params': None}


Loading lexvec-commoncrawl W+C dim=300 from a file


DEBUG:gensim.utils:starting a new internal lifecycle event log for KeyedVectors
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (2000000, 300) matrix of type float32 from ./lexvec.commoncrawl.300d.W+C.pos.neg3.vectors', 'binary': False, 'encoding': 'utf8', 'datetime': '2021-09-27T12:25:49.968425', 'gensim': '4.0.1', 'python': '3.7.11 (default, Jul 27 2021, 14:32:16) \n[GCC 7.5.0]', 'platform': 'Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid', 'event': 'load_word2vec_format'}


Running Gender queries using WEAT
Running Gender queries using WEAT Effect Size
Running Gender queries using RND
Running Gender queries using RNSB
Running Ethnicity queries using WEAT
Running Ethnicity queries using WEAT Effect Size
Running Ethnicity queries using RND
Running Ethnicity queries using RNSB
Running Religion queries using WEAT
Running Religion queries using WEAT Effect Size
Running Religion queries using RND
Running Religion queries using RNSB
Queries executed and saved correctly for lexvec-commoncrawl W+C dim=300.
----------------------------------

Loading glove-twitter-200 from gensim downloader


INFO:gensim.models.keyedvectors:loading projection weights from /home/pablo/gensim-data/glove-twitter-200/glove-twitter-200.gz
DEBUG:smart_open.smart_open_lib:{'uri': '/home/pablo/gensim-data/glove-twitter-200/glove-twitter-200.gz', 'mode': 'rb', 'buffering': -1, 'encoding': None, 'errors': None, 'newline': None, 'closefd': True, 'opener': None, 'ignore_ext': False, 'compression': None, 'transport_params': None}
DEBUG:gensim.utils:starting a new internal lifecycle event log for KeyedVectors
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (1193514, 200) matrix of type float32 from /home/pablo/gensim-data/glove-twitter-200/glove-twitter-200.gz', 'binary': False, 'encoding': 'utf8', 'datetime': '2021-09-27T12:29:17.912042', 'gensim': '4.0.1', 'python': '3.7.11 (default, Jul 27 2021, 14:32:16) \n[GCC 7.5.0]', 'platform': 'Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid', 'event': 'load_word2vec_format'}


Model loaded successfully.
Running Gender queries using WEAT
Running Gender queries using WEAT Effect Size
Running Gender queries using RND
Running Gender queries using RNSB


ERROR:root:At least one set of 'White last names and Asian last names wrt Occupations white and Occupations asian' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Ethnicity queries using WEAT


ERROR:root:At least one set of 'White last names and Asian last names wrt Occupations white and Occupations asian' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Ethnicity queries using WEAT Effect Size


ERROR:root:At least one set of 'White last names and Asian last names wrt Occupations asian' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Ethnicity queries using RND
Running Ethnicity queries using RNSB


ERROR:root:At least one set of 'White last names and Asian last names wrt Occupations white and Occupations asian' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Religion queries using WEAT
Running Religion queries using WEAT Effect Size
Running Religion queries using RND
Running Religion queries using RNSB


INFO:gensim.models.keyedvectors:loading projection weights from /home/pablo/gensim-data/glove-wiki-gigaword-300/glove-wiki-gigaword-300.gz
DEBUG:smart_open.smart_open_lib:{'uri': '/home/pablo/gensim-data/glove-wiki-gigaword-300/glove-wiki-gigaword-300.gz', 'mode': 'rb', 'buffering': -1, 'encoding': None, 'errors': None, 'newline': None, 'closefd': True, 'opener': None, 'ignore_ext': False, 'compression': None, 'transport_params': None}


Queries executed and saved correctly for glove-twitter-200.
----------------------------------

Loading glove-wiki-gigaword-300 from gensim downloader


DEBUG:gensim.utils:starting a new internal lifecycle event log for KeyedVectors
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (400000, 300) matrix of type float32 from /home/pablo/gensim-data/glove-wiki-gigaword-300/glove-wiki-gigaword-300.gz', 'binary': False, 'encoding': 'utf8', 'datetime': '2021-09-27T12:31:04.762978', 'gensim': '4.0.1', 'python': '3.7.11 (default, Jul 27 2021, 14:32:16) \n[GCC 7.5.0]', 'platform': 'Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid', 'event': 'load_word2vec_format'}


Model loaded successfully.
Running Gender queries using WEAT
Running Gender queries using WEAT Effect Size
Running Gender queries using RND
Running Gender queries using RNSB
Running Ethnicity queries using WEAT
Running Ethnicity queries using WEAT Effect Size
Running Ethnicity queries using RND
Running Ethnicity queries using RNSB
Running Religion queries using WEAT
Running Religion queries using WEAT Effect Size
Running Religion queries using RND
Running Religion queries using RNSB


INFO:gensim.models.keyedvectors:loading projection weights from ./GoogleNews-vectors-negative300-hard-debiased.bin
DEBUG:smart_open.smart_open_lib:{'uri': './GoogleNews-vectors-negative300-hard-debiased.bin', 'mode': 'rb', 'buffering': -1, 'encoding': None, 'errors': None, 'newline': None, 'closefd': True, 'opener': None, 'ignore_ext': False, 'compression': None, 'transport_params': None}
INFO:gensim.models.keyedvectors:loading projection weights from ./GoogleNews-vectors-negative300-hard-debiased.bin
DEBUG:smart_open.smart_open_lib:{'uri': './GoogleNews-vectors-negative300-hard-debiased.bin', 'mode': 'rb', 'buffering': -1, 'encoding': None, 'errors': None, 'newline': None, 'closefd': True, 'opener': None, 'ignore_ext': False, 'compression': None, 'transport_params': None}


Queries executed and saved correctly for glove-wiki-gigaword-300.
----------------------------------

Loading word2vec-gender-hard-debiased dim=300 from a file


DEBUG:gensim.utils:starting a new internal lifecycle event log for KeyedVectors
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (3000000, 300) matrix of type float32 from ./GoogleNews-vectors-negative300-hard-debiased.bin', 'binary': True, 'encoding': 'utf8', 'datetime': '2021-09-27T12:33:41.352495', 'gensim': '4.0.1', 'python': '3.7.11 (default, Jul 27 2021, 14:32:16) \n[GCC 7.5.0]', 'platform': 'Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid', 'event': 'load_word2vec_format'}


Running Gender queries using WEAT
Running Gender queries using WEAT Effect Size
Running Gender queries using RND
Running Gender queries using RNSB
Running Ethnicity queries using WEAT
Running Ethnicity queries using WEAT Effect Size
Running Ethnicity queries using RND
Running Ethnicity queries using RNSB


  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_s

Running Religion queries using WEAT


ERROR:root:At least one set of 'Christianity terms and Islam terms wrt Conservative and Terrorism' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.
ERROR:root:At least one set of 'Christianity terms and Jew terms wrt Conservative and Greed' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Religion queries using WEAT Effect Size


ERROR:root:At least one set of 'Christianity terms and Islam terms wrt Conservative' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.
ERROR:root:At least one set of 'Christianity terms and Jew terms wrt Conservative' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Religion queries using RND


ERROR:root:At least one set of 'Christianity terms and Islam terms wrt Conservative and Terrorism' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.
ERROR:root:At least one set of 'Christianity terms and Jew terms wrt Conservative and Greed' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Religion queries using RNSB
Queries executed and saved correctly for word2vec-gender-hard-debiased dim=300.
----------------------------------

Loading word2vec-google-news-300 from gensim downloader


INFO:gensim.models.keyedvectors:loading projection weights from /home/pablo/gensim-data/word2vec-google-news-300/word2vec-google-news-300.gz
DEBUG:smart_open.smart_open_lib:{'uri': '/home/pablo/gensim-data/word2vec-google-news-300/word2vec-google-news-300.gz', 'mode': 'rb', 'buffering': -1, 'encoding': None, 'errors': None, 'newline': None, 'closefd': True, 'opener': None, 'ignore_ext': False, 'compression': None, 'transport_params': None}
DEBUG:gensim.utils:starting a new internal lifecycle event log for KeyedVectors
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (3000000, 300) matrix of type float32 from /home/pablo/gensim-data/word2vec-google-news-300/word2vec-google-news-300.gz', 'binary': True, 'encoding': 'utf8', 'datetime': '2021-09-27T12:37:56.509864', 'gensim': '4.0.1', 'python': '3.7.11 (default, Jul 27 2021, 14:32:16) \n[GCC 7.5.0]', 'platform': 'Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid', 'event': 'load_word2vec_format'}


Model loaded successfully.
Running Gender queries using WEAT
Running Gender queries using WEAT Effect Size
Running Gender queries using RND
Running Gender queries using RNSB
Running Ethnicity queries using WEAT
Running Ethnicity queries using WEAT Effect Size
Running Ethnicity queries using RND
Running Ethnicity queries using RNSB


  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_sets
  [probability[:, 1] for probability in probabilities])
  for target_embeddings in target_embeddings_s

Running Religion queries using WEAT


ERROR:root:At least one set of 'Christianity terms and Islam terms wrt Conservative and Terrorism' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.
ERROR:root:At least one set of 'Christianity terms and Jew terms wrt Conservative and Greed' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Religion queries using WEAT Effect Size


ERROR:root:At least one set of 'Christianity terms and Islam terms wrt Conservative' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.
ERROR:root:At least one set of 'Christianity terms and Jew terms wrt Conservative' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Religion queries using RND


ERROR:root:At least one set of 'Christianity terms and Islam terms wrt Conservative and Terrorism' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.
ERROR:root:At least one set of 'Christianity terms and Jew terms wrt Conservative and Greed' query has proportionally fewer embeddings than allowed by the lost_vocabulary_threshold parameter (0.2). This query will return np.nan.


Running Religion queries using RNSB


INFO:gensim.models.keyedvectors:loading projection weights from /home/pablo/gensim-data/fasttext-wiki-news-subwords-300/fasttext-wiki-news-subwords-300.gz
DEBUG:smart_open.smart_open_lib:{'uri': '/home/pablo/gensim-data/fasttext-wiki-news-subwords-300/fasttext-wiki-news-subwords-300.gz', 'mode': 'rb', 'buffering': -1, 'encoding': None, 'errors': None, 'newline': None, 'closefd': True, 'opener': None, 'ignore_ext': False, 'compression': None, 'transport_params': None}


Queries executed and saved correctly for word2vec-google-news-300.
----------------------------------

Loading fasttext-wiki-news-subwords-300 from gensim downloader


DEBUG:gensim.utils:starting a new internal lifecycle event log for KeyedVectors
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (999999, 300) matrix of type float32 from /home/pablo/gensim-data/fasttext-wiki-news-subwords-300/fasttext-wiki-news-subwords-300.gz', 'binary': False, 'encoding': 'utf8', 'datetime': '2021-09-27T12:41:15.112569', 'gensim': '4.0.1', 'python': '3.7.11 (default, Jul 27 2021, 14:32:16) \n[GCC 7.5.0]', 'platform': 'Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid', 'event': 'load_word2vec_format'}


Model loaded successfully.
Running Gender queries using WEAT
Running Gender queries using WEAT Effect Size
Running Gender queries using RND
Running Gender queries using RNSB
Running Ethnicity queries using WEAT
Running Ethnicity queries using WEAT Effect Size
Running Ethnicity queries using RND
Running Ethnicity queries using RNSB
Running Religion queries using WEAT
Running Religion queries using WEAT Effect Size
Running Religion queries using RND
Running Religion queries using RNSB


INFO:gensim.models.keyedvectors:loading projection weights from ./numberbatch-en.txt
DEBUG:smart_open.smart_open_lib:{'uri': './numberbatch-en.txt', 'mode': 'rb', 'buffering': -1, 'encoding': None, 'errors': None, 'newline': None, 'closefd': True, 'opener': None, 'ignore_ext': False, 'compression': None, 'transport_params': None}


Queries executed and saved correctly for fasttext-wiki-news-subwords-300.
----------------------------------

Loading conceptnet-numberbatch 19.08-en dim=300 from a file


DEBUG:gensim.utils:starting a new internal lifecycle event log for KeyedVectors
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (516782, 300) matrix of type float32 from ./numberbatch-en.txt', 'binary': False, 'encoding': 'utf8', 'datetime': '2021-09-27T12:44:29.931211', 'gensim': '4.0.1', 'python': '3.7.11 (default, Jul 27 2021, 14:32:16) \n[GCC 7.5.0]', 'platform': 'Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid', 'event': 'load_word2vec_format'}


Running Gender queries using WEAT
Running Gender queries using WEAT Effect Size
Running Gender queries using RND
Running Gender queries using RNSB
Running Ethnicity queries using WEAT
Running Ethnicity queries using WEAT Effect Size
Running Ethnicity queries using RND
Running Ethnicity queries using RNSB
Running Religion queries using WEAT
Running Religion queries using WEAT Effect Size
Running Religion queries using RND
Running Religion queries using RNSB
Queries executed and saved correctly for conceptnet-numberbatch 19.08-en dim=300.
----------------------------------



In [13]:
import pandas as pd

def read_results(queries_sets):
    aggregated_results_by_set = []

    queries_results = {"WEAT": [], "WEAT_ES": [], "RND": [], "RNSB": []}

    for queries_set_name, queries_set in queries_sets.items():
        loaded_results = []
        for metric in ["WEAT", "WEAT_ES", "RND", "RNSB"]:
            all_results = pd.read_csv(
                "./results/{}_{}.csv".format(queries_set_name, metric), index_col=0
            )

            # add only the aggregated results by metric
            aggregated_results = all_results.iloc[:, -1:]
            aggregated_results.columns = map(
                lambda x: x.split(":")[0], aggregated_results.columns
            )

            # add all the queries and discard the aggregated results
            queries_results[metric].append(all_results.iloc[:, :-1])

            loaded_results.append(aggregated_results)
        aggregated_results_by_set.append(pd.concat(loaded_results, axis=1))

    overall = pd.DataFrame([])
    for metric in queries_results:
        queries_results[metric] = (
            pd.concat(queries_results[metric], axis=1).abs().mean(axis=1).to_frame(metric)
        )
        overall = pd.concat([overall, queries_results[metric]], axis=1)

    # overall = pd.concat(all_queries, axis=1).mean(axis=1).to_frame("Overall")
    aggregated_results_by_set.append(overall)
    return aggregated_results_by_set



In [15]:
read_results(queries_sets)[3].round(2).loc[
            [
                "conceptnet-numberbatch 19.08-en dim=300",
                "fasttext-wiki-news-subwords-300",
                "glove-twitter-200",
                "glove-wiki-gigaword-300",
                "lexvec-commoncrawl W+C dim=300",
                "word2vec-gender-hard-debiased dim=300",
                "word2vec-google-news-300",
            ]]

Unnamed: 0_level_0,WEAT,WEAT_ES,RND,RNSB
model_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
conceptnet-numberbatch 19.08-en dim=300,0.14,0.61,0.03,0.04
fasttext-wiki-news-subwords-300,0.24,0.66,0.08,0.04
glove-twitter-200,0.26,0.67,0.27,0.11
glove-wiki-gigaword-300,0.48,0.95,0.26,0.14
lexvec-commoncrawl W+C dim=300,0.31,0.7,0.69,0.12
word2vec-gender-hard-debiased dim=300,0.15,0.57,0.02,0.03
word2vec-google-news-300,0.38,0.81,0.15,0.09


In [16]:
def make_rankings(queries_sets):

    rankings = []
    ranking_plots = []
    for queries_set_name, queries_set in queries_sets.items():
        results_by_queries_set = []
        for metric in ['WEAT', 'WEAT_ES', 'RND', 'RNSB']:

            current_result = pd.read_csv('./results/{}_{}.csv'.format(
                queries_set_name, metric),
                                         index_col=0)
            results_by_queries_set.append(current_result)
        current_ranking = create_ranking(results_by_queries_set).astype(int)
        current_ranking.columns = map(lambda x: x.split(':')[0],
                                      current_ranking.columns)

        ranking_plot = plot_ranking(current_ranking, use_metric_as_facet=False)
        ranking_plot.update_layout(width=1200)
        rankings.append(current_ranking)
        ranking_plots.append(ranking_plot)

    general_ranking = reduce(lambda x, y: x.add(y, fill_value=0), rankings)
    general_ranking_plot = plot_ranking(general_ranking,
                                        use_metric_as_facet=False)
    general_ranking_plot.update_layout(width=1200)

    rankings.append(general_ranking)
    ranking_plots.append(general_ranking_plot)

    return rankings, ranking_plots

In [17]:
def get_table_1(rankings, results):
    def get_table(rankings, results, idx):

        table = rankings[idx].copy().astype(str)

        for row, res in results[idx].iterrows():
            for col, col_value in zip(res.index, res.values):
                table.at[row, col] = (
                    str(round(col_value, 2))
                    + " ("
                    + str(rankings[0].loc[row, col])
                    + ")"
                )

        table = table.loc[
            [
                "conceptnet-numberbatch 19.08-en dim=300",
                "fasttext-wiki-news-subwords-300",
                "glove-twitter-200",
                "glove-wiki-gigaword-300",
                "lexvec-commoncrawl W+C dim=300",
                "word2vec-gender-hard-debiased dim=300",
                "word2vec-google-news-300",
            ]
        ]
        return table

    return (
        get_table(rankings, results, 0),
        get_table(rankings, results, 1),
        get_table(rankings, results, 2),
        get_table(rankings, results, 3),
    )

In [18]:
results = read_results(queries_sets)
rankings, ranking_plots = make_rankings(queries_sets)
gender_table, ethnicity_table, religion_table, overall_table = get_table_1(rankings, results)

KeyError: 'WEAT_ES'

### Gender Results

In [122]:
from IPython.display import display

display(gender_table)
fig = ranking_plots[0]
fig.update_layout({"title": "Gender Ranking"})
fig.show()

Unnamed: 0_level_0,WEAT,WEAT EZ,RND,RNSB
model_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
conceptnet-numberbatch 19.08-en dim=300,0.2 (2),0.36 (2),0.01 (2),0.02 (3)
fasttext-wiki-news-subwords-300,0.45 (4),0.69 (5),0.02 (3),0.02 (2)
glove-twitter-200,0.39 (3),0.45 (3),0.13 (5),0.05 (5)
glove-wiki-gigaword-300,0.83 (6),0.64 (4),0.18 (6),0.07 (7)
lexvec-commoncrawl W+C dim=300,0.7 (5),0.78 (6),0.33 (7),0.07 (6)
word2vec-gender-hard-debiased dim=300,0.09 (1),0.18 (1),0.0 (1),0.01 (1)
word2vec-google-news-300,0.83 (7),0.94 (7),0.08 (4),0.03 (4)


### Ethnicity Results

In [124]:
from IPython.display import display

display(ethnicity_table)
fig = ranking_plots[1]
fig.update_layout({"title": "Ethnicity Ranking"})
fig.show()


Unnamed: 0_level_0,WEAT,WEAT EZ,RND,RNSB
model_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
conceptnet-numberbatch 19.08-en dim=300,0.12 (2),0.44 (2),0.03 (2),0.05 (3)
fasttext-wiki-news-subwords-300,0.16 (4),0.45 (5),0.06 (3),0.04 (2)
glove-twitter-200,0.28 (3),0.68 (3),0.18 (5),0.12 (5)
glove-wiki-gigaword-300,0.42 (6),0.94 (4),0.27 (6),0.12 (7)
lexvec-commoncrawl W+C dim=300,0.12 (5),0.4 (6),0.75 (7),0.17 (6)
word2vec-gender-hard-debiased dim=300,0.17 (1),0.51 (1),0.03 (1),0.04 (1)
word2vec-google-news-300,0.18 (7),0.53 (7),0.15 (4),0.1 (4)


### Religion Rankings

In [125]:
from IPython.display import display

display(religion_table)
fig = ranking_plots[2]
fig.update_layout({"title": "Religion Ranking"})
fig.show()


Unnamed: 0_level_0,WEAT,WEAT EZ,RND,RNSB
model_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
conceptnet-numberbatch 19.08-en dim=300,0.1 (2),0.98 (2),0.05 (2),0.04 (3)
fasttext-wiki-news-subwords-300,0.15 (4),0.84 (5),0.13 (3),0.05 (2)
glove-twitter-200,0.15 (3),0.82 (3),0.44 (5),0.16 (5)
glove-wiki-gigaword-300,0.26 (6),1.19 (4),0.31 (6),0.21 (7)
lexvec-commoncrawl W+C dim=300,0.2 (5),0.93 (6),0.87 (7),0.11 (6)
word2vec-gender-hard-debiased dim=300,0.18 (1),1.05 (1),0.03 (1),0.04 (1)
word2vec-google-news-300,0.18 (7),1.05 (7),0.19 (4),0.12 (4)


### Overall Rankings 

In [126]:
from IPython.display import display

display(overall_table)
fig = ranking_plots[3]
fig.update_layout({"title": "Overall Ranking"})
fig.show()

Unnamed: 0_level_0,WEAT,WEAT EZ,RND,RNSB
model_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
conceptnet-numberbatch 19.08-en dim=300,0.14 (2),0.59 (2),0.03 (2),0.03 (3)
fasttext-wiki-news-subwords-300,0.26 (4),0.66 (5),0.07 (3),0.03 (2)
glove-twitter-200,0.27 (3),0.65 (3),0.25 (5),0.11 (5)
glove-wiki-gigaword-300,0.5 (6),0.93 (4),0.25 (6),0.13 (7)
lexvec-commoncrawl W+C dim=300,0.34 (5),0.71 (6),0.65 (7),0.12 (6)
word2vec-gender-hard-debiased dim=300,0.14 (1),0.58 (1),0.02 (1),0.03 (1)
word2vec-google-news-300,0.4 (7),0.84 (7),0.14 (4),0.08 (4)


## Correlations between rankings

The last step. Here, we will calculate the correlation between the rankings by metric.
These results will show how the rankings by metric match.

The bluer the correlation matrix, the more confident it is that the rankings are indicating the same thing: that there are some embeddings that are less biased than others.

In [127]:
gender_correlations = calculate_ranking_correlations(rankings[0])
ethnicity_correlations = calculate_ranking_correlations(rankings[1])
religion_correlations = calculate_ranking_correlations(rankings[2])
overall_correlations = calculate_ranking_correlations(rankings[3])

In [128]:
correlations_plot = make_subplots(2,2)

fig = make_subplots(rows=2,
                    cols=2,
                    subplot_titles=("Gender Ranking Correlation",
                                    "Ehtnicity Ranking Correlation",
                                    "Religion Ranking Correlation",
                                    "Overall Ranking Correlation"))
fig.add_trace(plot_ranking_correlations(gender_correlations).data[0], row=1, col=1)
fig.add_trace(plot_ranking_correlations(ethnicity_correlations).data[0], row=1, col=2)
fig.add_trace(plot_ranking_correlations(religion_correlations).data[0], row=2, col=1)
fig.add_trace(plot_ranking_correlations(overall_correlations).data[0], row=2, col=2)

fig.update_layout(width=1200, height = 800)
fig.show()