In [1]:
import os
import numpy as np
import pickle
from weat import WEAT, WEATvec, weat_analysis
from sets import sets

## Data

Load a single embedding and convert it to a dictionary which maps words to their respective vectors.

In [2]:
with open('embeddings/conservapedia/0.pkl', 'rb') as f:
    embedding = pickle.load(f)
word2vec = {word: np.array(vec) for word, vec in embedding}

Each vector has a length of 168.

In [3]:
word2vec['apple']

array([ -1.22601859e-01,  -3.01443249e-01,   6.05218530e-01,
        -9.78568941e-02,   6.26489744e-02,  -3.68599296e-01,
        -2.60574579e-01,  -4.04397190e-01,  -2.21888646e-01,
        -1.02070875e-01,   1.19224519e-01,   9.26395238e-04,
        -3.63263525e-02,  -3.57526153e-01,   3.62686425e-01,
         5.24161197e-02,   1.76762283e-01,   5.05260587e-01,
         6.82055950e-04,   1.54380165e-02,   4.16435689e-01,
        -3.76806319e-01,  -2.22768441e-01,   5.43065250e-01,
        -7.97214955e-02,   1.20553926e-01,  -1.60782903e-01,
         1.15703523e+00,  -1.70429498e-01,   3.64926495e-02,
         1.88335180e-02,   2.98126542e-04,   1.13126278e-01,
        -3.11881006e-01,  -2.63291806e-01,  -3.21932822e-01,
        -3.76520127e-01,  -5.22700846e-01,   6.57714456e-02,
         1.65825546e-01,  -2.95625627e-01,   2.75215626e-01,
        -3.32018316e-01,  -3.83091182e-01,  -3.40342999e-01,
        -2.75427043e-01,  -2.65808016e-01,   3.74817774e-02,
         5.34076989e-03,

## Sets

The file `sets.py` contains some predefined sets of words. As a first example, we use the attribute sets `male` and `female` together with the target sets `career` and `family`.

In [4]:
print(sets['male'][:5])
print(sets['female'][:5])
print(sets['career'][:5])
print(sets['family'][:5])

['male', 'man', 'boy', 'brother', 'he']
['female', 'woman', 'girl', 'sister', 'she']
['executive', 'management', 'professional', 'corporation', 'salary']
['home', 'parents', 'children', 'family', 'cousins']


## WEAT

The `WEAT` object implements the Word Embedding Association Test [1]. It tests the following null hypothesis:

>The relative association of the target word sets $X$ (e.g. career) and $Y$ (e.g. family)  
>to the attribute word sets $A$ (e.g. male) and $B$ (e.g. female) is the same.

A test statistic is computed by comparing the angles between the vectors of different words in those sets. For details see the referenced paper. To obtain a $p$-value, a permutation test is applied: the words of $X \cup Y$ are randomly split into new sets $X'$ and $Y'$, and the test statistic is computed for these new sets as well. Comparing the result for the original sets to the results for many shuffled ones shows how “extreme” the real result is. The parameter `steps` controls the number of shuffles.
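The procedure described above can be sketched in a few lines of plain Python. This is only an illustration based on the definitions in [1], not the code of the `weat` module, which may differ in details such as how the $p$-value is rounded or which standard deviation is used. The names `cosine`, `association`, and `weat_sketch`, as well as the two-dimensional toy vectors, are hypothetical and chosen for this example.

```python
import math
import random
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B, vec):
    """s(w, A, B): mean similarity of w to A minus mean similarity of w to B."""
    return (mean(cosine(vec[w], vec[a]) for a in A)
            - mean(cosine(vec[w], vec[b]) for b in B))


def weat_sketch(vec, X, Y, A, B, steps=1000, seed=0):
    """Return (p_value, effect_size) for target sets X, Y and attribute sets A, B."""
    pool = list(X) + list(Y)
    s = {w: association(w, A, B, vec) for w in pool}
    observed = sum(s[x] for x in X) - sum(s[y] for y in Y)
    # Effect size: difference of means over the standard deviation of all
    # association scores (sample standard deviation is an assumption here).
    effect = (mean(s[x] for x in X) - mean(s[y] for y in Y)) / stdev(s.values())

    rng = random.Random(seed)
    hits = 0
    for _ in range(steps):
        rng.shuffle(pool)  # random split of X ∪ Y into X' and Y'
        X_new, Y_new = pool[:len(X)], pool[len(X):]
        stat = sum(s[w] for w in X_new) - sum(s[w] for w in Y_new)
        hits += stat > observed  # count shuffles more extreme than the original
    return hits / steps, effect


# Toy vectors, invented for illustration only.
toy = {
    'career': [1.0, 0.1], 'salary': [0.9, 0.2],
    'home': [0.1, 1.0], 'family': [0.2, 0.9],
    'he': [1.0, 0.0], 'she': [0.0, 1.0],
}
p, d = weat_sketch(toy, ['career', 'salary'], ['home', 'family'], ['he'], ['she'])
```

The association scores are computed once per word and reused across shuffles, since only the assignment of words to $X'$ and $Y'$ changes between permutations.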

In [5]:
weat = WEAT(word2vec, sets['career'], sets['family'], sets['male'], sets['female'], steps=1000)

The resulting $p$-value and an effect size based on Cohen’s $d$ can be retrieved with `get_stats`.

In [6]:
weat.get_stats()

(0.001, 2.4064634)

## WEATvec

Instead of comparing the angles between different words, we propose WEATvec, which compares the projections of the word vectors onto each other. See our paper for details.

In [7]:
weatvec = WEATvec(word2vec, sets['career'], sets['family'], sets['male'], sets['female'], steps=1000)

In [8]:
weatvec.get_stats()

(0.001, 2.2007706)

## Full analysis
To run multiple tests on different wikis with multiple embeddings in one go, use the function `weat_analysis`. Before calling it, we first need to load the other embeddings.

In [9]:
combined_word2vecs = {}
for wiki in os.listdir('embeddings'):
    combined_word2vecs[wiki] = []
    for pkl in os.listdir(f'embeddings/{wiki}'):
        with open(f'embeddings/{wiki}/{pkl}', 'rb') as f:
            embedding = pickle.load(f)        
        word2vec = {word: np.array(vec) for word, vec in embedding}
        combined_word2vecs[wiki].append(word2vec)

Make sure that `sets` only contains words which exist in all embeddings.

In [10]:
vocabs = [set(combined_word2vecs[wiki][0].keys()) for wiki in combined_word2vecs]
vocab = set(vocabs[0])
for v in vocabs[1:]:
    vocab &= set(v)
for s in sets:
    in_vocab = []
    for word in sets[s]:
        if word in vocab:
            in_vocab.append(word)
    sets[s] = in_vocab

Now define a list of quadruples `[(X, Y, A, B)]` naming the target sets $X$, $Y$ and the attribute sets $A$, $B$ that should be analyzed.

In [11]:
quadruples = [
    ('career', 'family', 'male', 'female'),
    ('science', 'art', 'male', 'female'),
    ('pleasant', 'unpleasant', 'white_names', 'black_names'),
]

The function `weat_analysis` can now be used to run WEAT and WEATvec. It returns a pandas dataframe. In our paper we set `steps` to _1000000_.
Note: a Bonferroni correction is not applied automatically.
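Since the correction is left to the user, here is a minimal sketch of how a Bonferroni correction could be applied to a list of $p$-values afterwards. The helper `bonferroni`, the significance level `alpha=0.05`, and the example values are assumptions made for illustration. They are not part of `weat_analysis`, and which tests count as one family is a decision for the analyst.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted_p, reject) pairs for a family of m tests."""
    m = len(p_values)
    # Multiply each p-value by the number of tests and cap the result at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical p-values, not taken from any real run.
corrected = bonferroni([0.01, 0.04, 0.2])
rejected = [reject for _, reject in corrected]
```

Comparing the adjusted values to `alpha` is equivalent to comparing the raw $p$-values to `alpha / m`.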

In [12]:
res = weat_analysis(combined_word2vecs, quadruples, sets, steps=1000)

This might take some time
Finished 25 of 72
Finished 50 of 72
Done!


The $p$-values of the individual embeddings are combined using Fisher’s method; for the effect size $d$, the mean is returned. In the result, the columns `fishers_p` and `mean_d` correspond to WEAT, and `fishers_p2` and `mean_d2` to WEATvec.

In [13]:
res[['names', 'wiki', 'fishers_p', 'mean_d', 'fishers_p2', 'mean_d2']]

| | names | wiki | fishers_p | mean_d | fishers_p2 | mean_d2 |
|---|---|---|---|---|---|---|
| 0 | (career, family, male, female) | conservapedia | 6.539677e-16 | 2.419994 | 3.565607e-16 | 2.283538 |
| 1 | (career, family, male, female) | rationalwiki | 1.040249e-14 | 1.734437 | 1.525946e-13 | 1.709633 |
| 2 | (career, family, male, female) | wikipedia | 1.198144e-15 | 2.43225 | 3.565607e-16 | 2.320894 |
| 3 | (science, art, male, female) | conservapedia | 1.477972e-14 | 1.374478 | 6.919962e-15 | 1.405803 |
| 4 | (science, art, male, female) | rationalwiki | 5.651508e-11 | 0.823628 | 1.532005e-11 | 0.888437 |
| 5 | (science, art, male, female) | wikipedia | 2.192705e-15 | 1.634619 | 1.198144e-15 | 1.545134 |
| 6 | (pleasant, unpleasant, white_names, black_names) | conservapedia | 2.971393e-10 | 0.666569 | 1.498999e-09 | 0.587838 |
| 7 | (pleasant, unpleasant, white_names, black_names) | rationalwiki | 0.1382411 | 0.149103 | 0.1365359 | 0.137169 |
| 8 | (pleasant, unpleasant, white_names, black_names) | wikipedia | 1.090513e-05 | 0.357216 | 6.633838e-05 | 0.268057 |
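For readers unfamiliar with Fisher’s method, the following sketch shows how several independent $p$-values can be combined into one. The statistic $-2 \sum_i \ln p_i$ follows a chi-squared distribution with $2k$ degrees of freedom under the null hypothesis, and for an even number of degrees of freedom its tail probability has a closed form, so the standard library suffices. The function name `fisher_combine` is hypothetical, and this is not necessarily how `weat_analysis` computes `fishers_p`.

```python
import math


def fisher_combine(p_values):
    """Combine k independent p-values (each > 0) with Fisher's method."""
    k = len(p_values)
    # half = x / 2, where x = -2 * sum(ln p_i) is Fisher's test statistic.
    half = -sum(math.log(p) for p in p_values)
    # Chi-squared survival function with 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total


# Illustrative input values, not results from this notebook.
combined = fisher_combine([0.5, 0.5])
```

With a single $p$-value the function returns that value unchanged, which is a quick sanity check of the formula.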



---

[1] Caliskan, Aylin, Joanna J. Bryson, and Arvind Narayanan. "Semantics derived automatically from language corpora contain human-like biases." Science 356.6334 (2017): 183-186.