# Replication of Caliskan et al. "Semantics derived automatically from language corpora contain human-like biases"

| Author | Last update |
|:------ |:----------- |
| Hauke Licht (https://github.com/haukelicht) | 2023-09-26 |

In their often-cited *Science* [publication](https://www.science.org/doi/10.1126/science.aal4230) "Semantics derived automatically from language corpora contain human-like biases," Caliskan, Bryson, and Narayanan propose a method for quantifying the biases captured in word embedding models.

In their abstract, they write:

> ... we show that applying machine learning to ordinary human language results in human-like semantic biases. 
> We replicate a spectrum of known biases, as measured by the *Implicit Association Test*, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the Web.
> Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names.


## Replication: goals and approach

In this notebook, we replicate their analyses.
Our goal is to see whether we find the same patterns they describe in their publication.
We do this using their original word lists but a different word embedding model.

Our replication will focus on their **WEAT** metric – the *Word Embedding Association Test*.

## Setup

Before we can get going, let's set up our notebook:

In [1]:
import os
import json

import gensim
import gensim.downloader as api

data_path = os.path.join('..', 'data', 'replications', 'caliskan_semantics_2017')

**_Note:_**
We'll work with data collected by GitHub user `chadaeun` for the 'weat_replication' project.
The relevant data is available at https://github.com/chadaeun/weat_replication/blob/master/weat/weat.json and has already been downloaded.

## The WEAT

Let

- $X$ and $Y$ be two sets of **_target words_** of equal size (to be tested for association), and
- $A$ and $B$ the two sets of **_attribute words_** (indicating conceptual opposites).

#### Example

- The *target words* could be occupations ('programmer', 'engineer', 'scientist'; and 'nurse', 'teacher', 'librarian').
- The two sets of *attribute words* could be ('man', 'male') and ('woman', 'female').

#### Formula

The test statistic $s$ of the word-embedding association test (WEAT) is defined as 

$$
s(X, Y, A, B)=\sum_{x \in X} s(x, A, B)-\sum_{y \in Y} s(y, A, B)
$$

where 

$$
s(w, A, B) =\operatorname{mean}_{a \in A} \cos (\vec{w}, \vec{a})-\operatorname{mean}_{b \in B} \cos (\vec{w}, \vec{b})
$$

measures the association of the word $w$ with the attributes, and
$s(X,Y,A,B)$ measures the differential association of the two sets of target words $X$ and $Y$ with the attributes.

The **null hypothesis** is that there is no difference between the two sets of target words in terms of their relative similarity to the two sets of attribute words, i.e., $\text{H}_0: s = 0$.
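To make the two formulas concrete, here is a minimal sketch on made-up 2-dimensional vectors. The vectors and the helper names `cosine`, `s_word`, and `s_sets` are illustrative assumptions and not part of the original analysis:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def s_word(w, A, B):
    """s(w, A, B): mean similarity of w to A minus mean similarity of w to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def s_sets(X, Y, A, B):
    """s(X, Y, A, B): summed associations of X minus summed associations of Y."""
    return sum(s_word(x, A, B) for x in X) - sum(s_word(y, A, B) for y in Y)

# Made-up toy "embeddings": A points along the first axis, B along the second.
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]
X = [np.array([1.0, 0.0]), np.array([2.0, 0.0])]  # aligned with A
Y = [np.array([0.0, 1.0]), np.array([0.0, 3.0])]  # aligned with B

toy_statistic = s_sets(X, Y, A, B)
print(toy_statistic)
```

In this toy setup every word in X is maximally associated with A and every word in Y with B, so the statistic takes its largest possible value for two words per target set; swapping X and Y flips the sign.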

## Example with code

### Get the lists of target and attribute words

Let's first load the dictionary of wordlists Caliskan et al. (2017) used:

In [2]:
fp = os.path.join(data_path, 'wordlists.json')
# use a context manager so the file handle is closed after reading
with open(fp) as f:
    wordlists = json.load(f)

In [3]:
list(wordlists.keys())

['Careers_Female_Male',
 'EuropeanAmerican_AfricanAmerican_Pleasant_Unpleasant',
 'EuropeanAmerican_AfricanAmerican_Pleasant_Unpleasant_2',
 'Flowers_Insects_Pleasant_Unpleasant',
 'Male_Female_Career_Family',
 'Math_Arts_Male_Female',
 'MusicalInstruments_Weapons_Pleasant_Unpleasant',
 'Names_Female_Male',
 'Science_Arts_Male_Female']

Let's start with the example of the WEAT with science and arts target words and attribute words representing the concepts "male" and "female".

In [4]:
data_dict = wordlists['Science_Arts_Male_Female']
data_dict.keys()

dict_keys(['A_key', 'Arts words', 'B_key', 'Female attributes', 'Male attributes', 'Science words', 'X_key', 'Y_key', 'attributes', 'method', 'targets'])

**_Note:_** Each top-level dictionary element contains a dictionary with the same set of keys.

We want to use the following attribute word lists for A and B:

In [5]:
A_key = data_dict['A_key']
print('A:', data_dict[A_key])
B_key = data_dict['B_key']
print('B:', data_dict[B_key])

A: ['brother', 'father', 'uncle', 'grandfather', 'son', 'he', 'his', 'him']
B: ['sister', 'mother', 'aunt', 'grandmother', 'daughter', 'she', 'hers', 'her']


And for the target words X and Y, we use the following word lists:

In [6]:
X_key = data_dict['X_key']
print('X:', data_dict[X_key])
Y_key = data_dict['Y_key']
print('Y:', data_dict[Y_key])

X: ['science', 'technology', 'physics', 'chemistry', 'einstein', 'nasa', 'experiment', 'astronomy']
Y: ['poetry', 'art', 'shakespeare', 'dance', 'literature', 'novel', 'symphony', 'drama']
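The key indirection used above (`'X_key'` holds the *name* of the key under which the X word list is stored, and so on) can be wrapped in a small helper. This is only a sketch: `unpack_wordlists` is a hypothetical name, and the toy entry below merely mimics the structure with shortened, made-up word lists.

```python
def unpack_wordlists(entry):
    """Return the (X, Y, A, B) word lists of one test entry.

    Assumes the layout shown above: 'X_key', 'Y_key', 'A_key' and 'B_key'
    hold the names of the keys under which the actual word lists are stored.
    """
    return tuple(entry[entry[k]] for k in ('X_key', 'Y_key', 'A_key', 'B_key'))

# Toy entry that mimics the structure (shortened, made-up word lists).
toy_entry = {
    'X_key': 'Science words', 'Y_key': 'Arts words',
    'A_key': 'Male attributes', 'B_key': 'Female attributes',
    'Science words': ['science', 'physics'],
    'Arts words': ['poetry', 'art'],
    'Male attributes': ['he', 'his'],
    'Female attributes': ['she', 'her'],
}

X_words, Y_words, A_words, B_words = unpack_wordlists(toy_entry)
print(X_words, Y_words, A_words, B_words)
```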


### Load a pre-trained embedding model and get target and attribute words' embeddings

Next, we need to get these words' embeddings.
We'll use a word2vec model available with `gensim` and use a helper function to extract word vectors from it:

In [7]:
from typing import List
import numpy as np

model = api.load('word2vec-google-news-300')

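def get_word_vectors(words: List[str]) -> np.ndarray:
    """
    Returns the word vectors that represent the given words.
    Words missing from the model's vocabulary are dropped silently.
    :param words: iterable of words
    :return: (number of words found, dim) shaped numpy ndarray of word vectors
    """
    # `key_to_index` is a dict, so membership tests are fast
    words = [w for w in words if w in model.key_to_index]
    return model[words]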

In [8]:
# test
tmp = get_word_vectors(['hello', 'world'])
tmp.shape

(2, 300)
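Because `get_word_vectors()` filters out words that are not in the model's vocabulary, a target or attribute set can shrink without any warning. A minimal sketch of a coverage check follows; `split_by_vocab` is a hypothetical helper, and the plain dict below is a made-up stand-in for the model's vocabulary (with gensim, one could pass `model.key_to_index` instead).

```python
def split_by_vocab(words, vocab):
    """Split `words` into those found in `vocab` and those missing from it."""
    found = [w for w in words if w in vocab]
    missing = [w for w in words if w not in vocab]
    return found, missing

# Stand-in vocabulary (made up); any container supporting `in` works here.
toy_vocab = {'hello': 0, 'world': 1}

found, missing = split_by_vocab(['hello', 'world', 'qwertyuiop'], toy_vocab)
print(found)    # words that would be kept
print(missing)  # words that would be dropped
```

Checking `missing` for each of the four word lists before computing scores makes it easy to see whether X and Y still have equal size.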

Now we can get the word vectors for words in A, B, X, and Y.

In [9]:
A = get_word_vectors(data_dict[A_key])
B = get_word_vectors(data_dict[B_key])
X = get_word_vectors(data_dict[X_key])
Y = get_word_vectors(data_dict[Y_key])

### Computing association scores

Now we can compute 

$$
s(w, A, B) = \operatorname{mean}_{a \in A} \cos (\vec{w}, \vec{a})-\operatorname{mean}_{b \in B} \cos (\vec{w}, \vec{b})
$$

for all $\vec{w} \in \mathbf{X}$ and all $\vec{w} \in \mathbf{Y}$, respectively.

For this, we'll need **two helper functions**: 

1. one that normalizes the word vectors to unit vectors, and 
2. another that can compute the cosine similarity between two matrices. 

I have already implemented them.

In [14]:
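def norm(vec):
    # NOTE: without an `axis` argument, np.linalg.norm returns a single number;
    # for a 2-D input this is the Frobenius norm of the whole matrix, so a matrix
    # is rescaled by one constant rather than row by row
    return vec / np.linalg.norm(vec)

def cos_sim(v1, v2):
    # dot products between all (rescaled) row vectors of v1 and v2, clipped to [-1, 1]
    return np.clip(np.tensordot(norm(v1), norm(v2), axes=(-1, -1)), -1.0, 1.0)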
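One caveat about `norm()`: `np.linalg.norm(vec)` called without an `axis` argument returns one number, which for a matrix is the Frobenius norm of the whole matrix. The rows of a matrix input are therefore all divided by the same constant instead of being scaled to unit length individually. The sketch below shows a row-wise variant. The names `norm_rows` and `cos_sim_rows` are hypothetical, and because the outputs printed in this notebook were produced with the definitions above, this variant would generally give different numbers.

```python
import numpy as np

def norm_rows(mat):
    """Scale each row vector (last axis) to unit length."""
    mat = np.asarray(mat, dtype=float)
    return mat / np.linalg.norm(mat, axis=-1, keepdims=True)

def cos_sim_rows(m1, m2):
    """Pairwise cosine similarities between the rows of m1 and the rows of m2."""
    sims = np.tensordot(norm_rows(m1), norm_rows(m2), axes=(-1, -1))
    return np.clip(sims, -1.0, 1.0)

# Made-up vectors with different lengths but simple angles.
M1 = np.array([[3.0, 0.0], [0.0, 5.0]])
M2 = np.array([[1.0, 0.0], [1.0, 1.0]])

toy_sims = cos_sim_rows(M1, M2)
print(toy_sims)
```

With row-wise scaling, the similarity of a given pair of vectors no longer depends on which other rows happen to be in the same matrix.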

In [17]:
A.shape

(8, 300)

Let's illustrate how `cos_sim()` works for $\mathbf{A}$, our (8, 300) matrix of embeddings for the words in the list of 'Male attributes'.

We'll start with only the first row vector in $\mathbf{X}$:

In [15]:
# get first vector in X
w = X[0,:]
# compute w's similarity with each row-vector in A
cos_sim(w, A)

array([0.01323731, 0.02442578, 0.01441231, 0.02500675, 0.01759214,
       0.01765213, 0.01422933, 0.011844  ], dtype=float32)

Since A has 8 rows, we get 8 similarity scores.

But to compute $\operatorname{mean}_{a \in A} \cos (\vec{w}, \vec{a})$ for $w$, we want to *average* these similarities:

In [19]:
# compute the mean of w's similarities with the row-vectors in A (i.e., of what you've computed in the previous cell)
cos_sim(w, A).mean()

0.017299969

To get the first element in the WEAT formula, we compute 

$$
s(w, A, B) =\operatorname{mean}_{a \in A} \cos (\vec{w}, \vec{a})-\operatorname{mean}_{b \in B} \cos (\vec{w}, \vec{b})
$$

In code, this is just:

In [20]:
# compute w's average similarity with row-vectors in A and subtract w's average similarity with row-vectors in B from it
cos_sim(w, A).mean() - cos_sim(w, B).mean()

-0.00994616

The difference is negative, because $w$ is on average slightly more similar to terms in B than to terms in A.

But, of course, we want to compute these quantities for each vector in X and Y, respectively.

Our `cos_sim()` function is able to handle this case:

In [24]:
cos_sim(X, A).shape

(8, 8)

Here we have 

- eight rows, one for each term in **X**, and 
- eight columns, one for each term in **A**.

So to get one average similarity score per term in **X** we need to compute **_row averages_** (like `rowMeans` in R):


In [29]:
# note: this code might be new to you, so I've added it
cos_sim(X[:4,:], A).mean(axis=1) # <== summarize over columns (i.e. at row level)

array([0.00826428, 0.0025126 , 0.02507891, 0.01546477], dtype=float32)
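If the `axis` argument is unfamiliar, the following toy example (made-up numbers, not taken from the embeddings) sketches the difference between row and column averages in NumPy:

```python
import numpy as np

toy = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])  # 2 rows, 3 columns

row_means = toy.mean(axis=1)  # one value per row    -> shape (2,)
col_means = toy.mean(axis=0)  # one value per column -> shape (3,)
print(row_means, col_means)
```

`axis=1` collapses the columns and leaves one average per row, which is what we need to get one score per target word.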

So to get to the final _list of association scores_, we compute:

In [32]:
# compute X's row-vectors' average similarities with row-vectors in A and
# subtract X's row-vectors' average similarities with row-vectors in B from them
cos_sim(X, A).mean(axis=1) - cos_sim(X, B).mean(axis=1)

array([-0.0033414 , -0.00068688,  0.00613773, -0.00046975, -0.00211891,
       -0.00664737,  0.00016545,  0.00185493], dtype=float32)

Let's define a custom function that does just that for input matrices W, A, and B:

In [33]:
# from https://github.com/chadaeun/weat_replication/blob/0753713a47333827ef9f653d85e08740834ef698/lib/weat.py#L21C3-L21C3
def weat_association(W, A, B):
    """
    Returns association of the word w in W with the attribute for WEAT score.
    s(w, A, B)
    :param W: target words' vector representations
    :param A: attribute words' vector representations
    :param B: attribute words' vector representations
    :return: (len(W), ) shaped numpy ndarray; each element is the association of one word w in W
    """
    # average similarity with A minus average similarity with B, per row of W
    return np.mean(cos_sim(W, A), axis=-1) - np.mean(cos_sim(W, B), axis=-1)

### Computing the differential association score

Finally, we want to get from

$$
s(w, A, B) =\operatorname{mean}_{a \in A} \cos (\vec{w}, \vec{a})-\operatorname{mean}_{b \in B} \cos (\vec{w}, \vec{b})
$$

to the **differential association** score:

$$
s(X, Y, A, B)=\sum_{x \in X} s(x, A, B)-\sum_{y \in Y} s(y, A, B)
$$

To this end, we need to sum the outputs of `weat_association` for X and for Y and subtract the second sum from the first:

In [34]:
# this call should return the value printed below the cell
sum(weat_association(X, A, B))-sum(weat_association(Y, A, B))

0.04221046296879649

Let's wrap this last line of code in a function:

In [35]:
# from https://github.com/chadaeun/weat_replication/blob/0753713a47333827ef9f653d85e08740834ef698/lib/weat.py#L33C1-L43C81
def weat_differential_association(X, Y, A, B):
    """
    Returns differential association of two sets of target words with the attribute for WEAT score.
    s(X, Y, A, B)
    :param X: target words' vector representations
    :param Y: target words' vector representations
    :param A: attribute words' vector representations
    :param B: attribute words' vector representations
    :return: differential association (float value)
    """
    return np.sum(weat_association(X, A, B)) - np.sum(weat_association(Y, A, B))

Recall what our A, B, X, and Y terms are:

In [36]:
print('A:', data_dict[A_key])
print('B:', data_dict[B_key])
print('X:', data_dict[X_key])
print('Y:', data_dict[Y_key])

A: ['brother', 'father', 'uncle', 'grandfather', 'son', 'he', 'his', 'him']
B: ['sister', 'mother', 'aunt', 'grandmother', 'daughter', 'she', 'hers', 'her']
X: ['science', 'technology', 'physics', 'chemistry', 'einstein', 'nasa', 'experiment', 'astronomy']
Y: ['poetry', 'art', 'shakespeare', 'dance', 'literature', 'novel', 'symphony', 'drama']


In [37]:
weat_differential_association(X, Y, A, B)

0.042210463

**Interpretation:** 
The fact that the differential association score is positive indicates that, taken together, the science target words are more strongly associated with male than with female terms than the arts target words are.

### Computing the effect size

But to get at the **effect size** for the WEAT ("a normalized measure of how separated the two distributions (of associations between the target and attribute) are"), we need to compute

$$
\frac{\operatorname{mean}_{x \in X} s(x, A, B)-\operatorname{mean}_{y \in Y} s(y, A, B)}{\operatorname{std\_dev}_{w \in X \cup Y} s(w, A, B)}
$$

In [39]:
x_association = weat_association(X, A, B)  # association scores for the words in X
y_association = weat_association(Y, A, B)  # association scores for the words in Y
tmp1 = x_association.mean() - y_association.mean()
tmp2 = np.std(np.concatenate((x_association, y_association), axis=0)) # <== the "union" of X and Y is just the concatenation of the two

effect_size = tmp1/tmp2
effect_size


1.1951625
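The effect-size formula does not say which variant of the standard deviation is meant. `np.std` defaults to the population version (`ddof=0`), which is what the cell above uses. The sketch below uses made-up association scores, not values from the embeddings, to show how much the choice can matter for small word lists. Which variant the original paper used is not established here.

```python
import numpy as np

# Made-up association scores s(w, A, B) for two small target sets.
x_scores = np.array([0.2, 0.4])
y_scores = np.array([-0.2, 0.0])

pooled = np.concatenate((x_scores, y_scores))
mean_diff = x_scores.mean() - y_scores.mean()

toy_effect_pop = mean_diff / np.std(pooled)             # ddof=0 (NumPy default)
toy_effect_sample = mean_diff / np.std(pooled, ddof=1)  # ddof=1 (sample std. dev.)
print(toy_effect_pop, toy_effect_sample)
```

The sample standard deviation is always at least as large as the population version, so the `ddof=1` effect size is the smaller of the two; the gap narrows as the word lists get longer.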

Let's wrap this in a function as well:

In [40]:
def weat_score(X, Y, A, B):
    """
    Returns WEAT score
    X, Y, A, B must be (len(words), dim) shaped numpy ndarray
    CAUTION: this function assumes that there's no intersection word between X and Y
    :param X: target words' vector representations
    :param Y: target words' vector representations
    :param A: attribute words' vector representations
    :param B: attribute words' vector representations
    :return: WEAT score
    """

    x_association = weat_association(X, A, B)
    y_association = weat_association(Y, A, B)

    tmp1 = np.mean(x_association, axis=-1) - np.mean(y_association, axis=-1)
    tmp2 = np.std(np.concatenate((x_association, y_association), axis=0))

    return tmp1 / tmp2