# How bias changes with training data

Notebook contents:
1. Bias metrics in the existing literature
2. Training of word2vec on different training datasets (of varying sizes and sources)
3. Visualizations and computations of bias metrics

## 1. Bias metrics in the existing literature

Three metrics are prominent in the literature (among others): the projection of word embeddings onto a gender direction (Bolukbasi et al., modified by Nissim et al.), WEAT, and WEFAT (both from Caliskan, Bryson, and Narayanan). The papers are summarized below.

#### 1. Bolukbasi et al. (NIPS 2016): "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" https://arxiv.org/abs/1607.06520
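The projection metric associated with this paper can be illustrated with a minimal sketch. It assumes a simplification: the gender direction is taken as the normalized difference of a single pair (`she` minus `he`), whereas the paper derives it from several definitional pairs. The function names and the toy 3-d vectors are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gender_direction(emb, pair=("she", "he")):
    """Assumed simplification: unit vector along emb[she] - emb[he]."""
    a, b = emb[pair[0]], emb[pair[1]]
    return normalize([x - y for x, y in zip(a, b)])


def projection(emb, word, direction):
    """Signed scalar projection of the unit-normalized word vector onto the direction."""
    w = normalize(emb[word])
    return sum(x * y for x, y in zip(w, direction))


# Toy 3-d vectors invented for illustration (not from any trained model).
toy_emb = {
    "he": [1.0, 0.0, 0.2],
    "she": [-1.0, 0.0, 0.2],
    "nurse": [-0.6, 0.5, 0.1],
    "engineer": [0.7, 0.4, 0.1],
}

g = gender_direction(toy_emb)
scores = {w: projection(toy_emb, w, g) for w in ("nurse", "engineer")}
```

With this sign convention, a positive score means the word leans toward the `she` end of the direction and a negative score toward the `he` end.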

#### 2. Nissim, van Noord, and van der Goot (2019): "Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor" https://arxiv.org/abs/1905.09866

#### 3. Caliskan, Bryson, and Narayanan (Science 2017): "Semantics derived automatically from language corpora contain human-like biases" https://purehost.bath.ac.uk/ws/portalfiles/portal/168480066/CaliskanEtAl_authors_full.pdf

* The IAT is a technique used (outside of word embeddings) to measure implicit human biases:
    * "Implicit Association Test": an assessment (on humans) introduced by Greenwald et al., in which participants sort each word into one of two categories; a faster reaction time is interpreted as a more deeply rooted association
* Introduces a new statistical test, **WEAT** (Word Embedding Association Test), to measure biases in word embeddings
    * analogous to the IAT: the interpretation is "how separated the two distributions (of associations between target and attribute) are"
    * inputs:
        * 2 sets of target words:
            * X. programmer/engineer/scientist/...
            * Y. nurse/teacher/librarian/...
        * 2 sets of attribute words:
            * A. man/male/...
            * B. woman/female/...
    * test statistic:
        * intuitively, the difference between the two sets of target words in their association with the attributes
        * $s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)$
        * $s(w, A, B) = \operatorname{mean}_{a \in A} \cos(w, a) - \operatorname{mean}_{b \in B} \cos(w, b)$
    * p-value of a permutation test, where $(X_i, Y_i)$ ranges over the partitions of $X \cup Y$ into two sets of equal size:
        * $\Pr_i[s(X_i, Y_i, A, B) > s(X, Y, A, B)]$
    * effect size: $\frac{\operatorname{mean}_{x \in X} s(x, A, B) - \operatorname{mean}_{y \in Y} s(y, A, B)}{\operatorname{sd}_{w \in X \cup Y} s(w, A, B)}$
    * Obtains results similar to the original IAT findings of Greenwald et al.
* Introduces **WEFAT** (Word Embedding Factual Association Test):
    * instead of comparing two sets of target words, relates each target word to a real-valued factual property (e.g. % female workers in an occupation)
    * statistic per target word $w$: difference in avg. cos similarity (between $w$ and attribute set A vs. between $w$ and attribute set B), divided by the standard deviation of cos similarity between $w$ and all attribute words in $A \cup B$
    * finds a high correlation between the % of women in different occupations and the strength of association of the occupation's word vector with female gender
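The WEAT and WEFAT quantities summarized above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the paper's reference implementation: the function names are hypothetical, the p-value is approximated by random re-partitions rather than by enumerating all of them, the sample standard deviation (`statistics.stdev`) is an assumed choice, and the 2-d vectors are toy values.

```python
import math
import random
import statistics


def cos(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assoc(w, A, B):
    """s(w, A, B): mean cosine to A minus mean cosine to B."""
    return (statistics.mean([cos(w, a) for a in A])
            - statistics.mean([cos(w, b) for b in B]))


def weat_statistic(X, Y, A, B):
    """s(X, Y, A, B): summed association of X minus that of Y."""
    return sum(assoc(x, A, B) for x in X) - sum(assoc(y, A, B) for y in Y)


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations, scaled by the pooled std. deviation."""
    sx = [assoc(x, A, B) for x in X]
    sy = [assoc(y, A, B) for y in Y]
    return (statistics.mean(sx) - statistics.mean(sy)) / statistics.stdev(sx + sy)


def weat_p_value(X, Y, A, B, n_samples=1000, seed=0):
    """Approximate permutation p-value via random re-partitions of X and Y."""
    rng = random.Random(seed)
    observed = weat_statistic(X, Y, A, B)
    pooled = list(X) + list(Y)
    hits = 0
    for _ in range(n_samples):
        rng.shuffle(pooled)
        Xi, Yi = pooled[:len(X)], pooled[len(X):]
        if weat_statistic(Xi, Yi, A, B) > observed:
            hits += 1
    return hits / n_samples


def wefat_statistic(w, A, B):
    """Per-word WEFAT score: s(w, A, B) scaled by the std. deviation of cosines."""
    sims = [cos(w, x) for x in list(A) + list(B)]
    return assoc(w, A, B) / statistics.stdev(sims)


# Toy 2-d vectors invented for illustration (not from any trained model).
A = [[1.0, 0.1], [0.9, 0.2]]    # attribute set A
B = [[-1.0, 0.1], [-0.9, 0.2]]  # attribute set B
X = [[0.8, 0.3], [0.7, 0.5]]    # target set X
Y = [[-0.8, 0.3], [-0.7, 0.5]]  # target set Y

stat = weat_statistic(X, Y, A, B)
effect = weat_effect_size(X, Y, A, B)
p_value = weat_p_value(X, Y, A, B)
wefat_x = wefat_statistic(X[0], A, B)
```

For WEFAT, the per-word scores would then be compared against the factual property (e.g. by correlating them with the % of women in each occupation); that step is omitted here.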

## 2. word2vec model training

Dataset grid: 20 datasets (4 training sizes × 5 sources)

Training size:
* 25%
* 50%
* 75%
* 100%

Source:
* TweetEval: labeled tweet dataset (e.g. sentiment, hate, emotion)
* Reddit: unlabeled Reddit post dataset
* CNN/DailyMail: news article text with highlights
* Pretrained historical word vectors: word vectors pre-trained on historical books from various decades, from the 1880s to the 1990s
* The New York Times Annotated Corpus: articles with metadata from 1987 to 2007
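The size-by-source grid can be sketched as follows. This is a minimal sketch under assumptions: the source keys, function names, and hyperparameters are hypothetical; the subsampling is done at the sentence level with a fixed seed; and all five sources are treated uniformly, although the notes do not say how the pre-trained historical vectors fit the training-size axis. The training helper assumes gensim 4 or later is installed and imports it lazily, so the grid and subsampling code runs without it.

```python
import itertools
import random

FRACTIONS = [0.25, 0.50, 0.75, 1.00]
# Hypothetical keys standing in for the five sources listed above.
SOURCES = ["tweeteval", "reddit", "cnn_dailymail", "historical", "nyt_annotated"]


def build_grid(fractions=FRACTIONS, sources=SOURCES):
    """Every (source, fraction) combination in the dataset grid."""
    return list(itertools.product(sources, fractions))


def subsample(sentences, fraction, seed=0):
    """Random subset of tokenized sentences, without replacement."""
    rng = random.Random(seed)
    k = round(len(sentences) * fraction)
    return rng.sample(sentences, k)


def train_word2vec(sentences, seed=0):
    """Train word2vec on tokenized sentences (assumes gensim >= 4)."""
    from gensim.models import Word2Vec  # lazy import: only needed for training

    # Hyperparameters are illustrative, not taken from the notebook.
    return Word2Vec(
        sentences=sentences,
        vector_size=100,
        window=5,
        min_count=1,
        workers=1,
        seed=seed,
    )


grid = build_grid()
toy_corpus = [["sentence", str(i)] for i in range(8)]
half = subsample(toy_corpus, 0.5)
```

A full run would loop over `grid`, load each source's tokenized sentences, and call `train_word2vec(subsample(sentences, fraction))` for each cell.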

## 3. Visualization of bias metrics

Useful package for automatic computation of the above bias metrics: https://docs.responsibly.ai/notebooks/demo-word-embedding-bias.html