# Fairness in AI: Removing bias from word embeddings

#### Kylian van Geijtenbeek, Thom Visser, Martine Toering, Iulia Ionescu

$\textbf{Abstract:}$ In this paper we reproduce the word embedding debiasing algorithm from Bolukbasi et al. [2]. We adapt the available implementation and extend it with their soft debiasing method. We integrate several popular benchmarks and investigate the effectiveness of the algorithm on GloVe and fastText embeddings, in addition to the Word2vec embeddings used by Bolukbasi et al. [2]. A comparison of benchmark scores shows that removing direct bias from each of these embeddings barely affects their effectiveness. However, we fail to reproduce the large-scale soft debiasing results due to a lack of detail on the original implementation.

In [None]:
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np
import copy

import embetter
import embetter.we as we
from embetter.we import WordEmbedding
from embetter.data import *

from embetter.debias import *
from embetter.benchmarks import Benchmark

from compare_bias import *

# 1 - Gender Bias in word2vec, Glove and FastText

### Load data

In this notebook, we mainly use $\textbf{word2vec}$ (Mikolov et al. 2013), one of three available word embeddings. $\textbf{GloVe}$ (Pennington et al. 2014) and $\textbf{fastText}$ (Bojanowski et al. 2016) are also available.

The word2vec embedding we use is learned from a corpus of Google News articles (https://code.google.com/archive/p/word2vec/). It contains 300-dimensional vectors for 3 million words. For GloVe we use the 300-dimensional vectors trained on Common Crawl (https://nlp.stanford.edu/projects/glove/). Lastly, fastText is a word embedding from the Facebook AI Research lab, trained on Wikipedia and Common Crawl, that also consists of 300-dimensional vectors (https://fasttext.cc/docs/en/english-vectors.html).

We start by loading in the data.

In [3]:
# Load google news word2vec
E = WordEmbedding("word2vec_small")
# Load soft debiased word2vec (for later)
E_soft = WordEmbedding("word2vec_small_soft_debiased")

Downloading word2vec_small to /Users/iulia/Documents/M_AI/FACT/FactAI/embetter/embeddings/word2vec_small.txt


100%|██████████| 2540/2540 [00:01<00:00, 1773.18it/s]


Embedding shape: (26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
Downloading word2vec_small_hard_debiased to /Users/iulia/Documents/M_AI/FACT/FactAI/embetter/embeddings/word2vec_small_hard_debiased.txt


100%|██████████| 2953/2953 [00:04<00:00, 716.43it/s]


Embedding shape: (26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
Downloading word2vec_small_soft_debiased to /Users/iulia/Documents/M_AI/FACT/FactAI/embetter/embeddings/word2vec_small_soft_debiased.txt


100%|██████████| 2952/2952 [00:00<00:00, 6723.12it/s]


Embedding shape: (26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine


All the other embeddings can be loaded in a similar fashion, using the embedding names in the table on the GitHub Repository.

## Word2vec

In [None]:
# Load professions and gender related lists from Bolukbasi et al. for word2vec

gender_seed, defs, equalize_pairs, profession_words = load_data(E.words)

### Define gender direction

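We define the gender direction for word2vec either as the difference between the embeddings of "she" and "he", or as the first principal component obtained by PCA over the definitional word pairs. \
The PCA method is generally more robust because it incorporates all definitional word pairs.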
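
As an illustration of the PCA variant, here is a minimal NumPy sketch on toy vectors. It assumes that each definitional pair is centred on its own midpoint before the principal direction is extracted. The function name `pair_pca_direction` and the toy dictionary are hypothetical, and this is not necessarily how `we.doPCA` is implemented.

```python
import numpy as np

def pair_pca_direction(pairs, vectors):
    """Return the first principal direction of pair-centred word vectors.

    pairs:   list of (word_a, word_b) tuples, e.g. ("she", "he")
    vectors: dict mapping words to 1-D numpy arrays (toy stand-in for an embedding)
    """
    rows = []
    for a, b in pairs:
        center = (vectors[a] + vectors[b]) / 2.0
        rows.append(vectors[a] - center)
        rows.append(vectors[b] - center)
    matrix = np.array(rows)
    # The rows come in +/- pairs, so the matrix already has zero mean
    # and the first right singular vector is the first principal component.
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[0]

# Toy 3-dimensional "embedding" (illustrative values only)
toy = {
    "she": np.array([1.0, 0.2, 0.0]),
    "he": np.array([-1.0, 0.2, 0.0]),
    "woman": np.array([0.9, -0.1, 0.3]),
    "man": np.array([-1.1, -0.1, 0.3]),
}
direction = pair_pca_direction([("she", "he"), ("woman", "man")], toy)
```

The sign of the returned direction is arbitrary, so which end corresponds to "she" has to be checked separately.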

In [None]:
# Define gender direction with the words "she" and "he" 
# v_gender = E.diff('she', 'he')

# Define gender direction with PCA
v_gender = we.doPCA(defs, E).components_[0]

### Generating analogies


Below, we show some of the gender analogies that we can create from the embeddings. \
This method is based on "she is to X as he is to Y" analogies, with X looping through all the words and then finding the appropriate Y. \
"she" and "he" are either the embeddings of these words, or the extremes of the first principal component, depending on the method used above.
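
The scoring idea behind such analogies can be sketched as follows: a word pair (X, Y) is kept only if the two words are close to each other (distance at most `thresh`), and it is scored by how well X - Y aligns with the gender direction. This is a simplified toy version under those assumptions; the function name and vectors are hypothetical, and `best_analogies_dist_thresh` may apply additional filtering.

```python
import itertools
import numpy as np

def analogy_scores(vectors, direction, thresh=1.0):
    """Score ordered word pairs by alignment of their difference with `direction`."""
    direction = direction / np.linalg.norm(direction)
    scored = []
    for x, y in itertools.permutations(vectors, 2):
        diff = vectors[x] - vectors[y]
        dist = np.linalg.norm(diff)
        # Keep only pairs of nearby words
        if 0 < dist <= thresh:
            # Cosine similarity between (x - y) and the bias direction
            scored.append((float(diff @ direction / dist), x, y))
    return sorted(scored, reverse=True)

# Toy 2-dimensional vectors (illustrative values only)
toy = {
    "nurse": np.array([0.4, 0.9]),
    "doctor": np.array([-0.4, 0.9]),
    "apple": np.array([0.0, -1.0]),
}
pairs = analogy_scores(toy, np.array([1.0, 0.0]), thresh=1.0)
top_score, top_x, top_y = pairs[0]
```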

In [None]:
# Analogies gender
a_gender = E.best_analogies_dist_thresh(v_gender, thresh=1)
we.viz(a_gender)

These analogies offer insight into potential biases along the specified bias axis (in this case gender), which is useful for qualitative analysis.

### Analyzing occupational gender bias 


The projection of occupations on the bias axis serves as another useful source for qualitative analysis.
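
Conceptually, this projection is a dot product between each profession vector and the unit-length bias direction, followed by sorting. The sketch below shows this on toy vectors; `rank_by_projection` is a hypothetical helper, not part of the package.

```python
import numpy as np

def rank_by_projection(words, vectors, direction):
    """Sort words by their scalar projection onto the (normalised) bias direction."""
    direction = direction / np.linalg.norm(direction)
    scored = [(float(vectors[w] @ direction), w) for w in words]
    return sorted(scored)

# Toy 2-dimensional vectors (illustrative values only)
toy = {
    "nurse": np.array([0.7, 0.1]),
    "engineer": np.array([-0.6, 0.2]),
    "teacher": np.array([0.1, 0.9]),
}
ranked = rank_by_projection(list(toy), toy, np.array([2.0, 0.0]))
ordered_words = [w for _, w in ranked]
```

The two ends of the sorted list are the candidates for the most stereotyped professions; which end corresponds to which gender depends on the sign of the direction.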

In [None]:
# Analysis of extreme male and extreme female professions
sp = E.profession_stereotypes(profession_words, v_gender)

# 2 - Comparing Bias of word2vec and FastText

We compare the gender bias of two word embeddings, following the approach of Figure 4 in Bolukbasi et al. [2]. The profession words are projected onto the gender axis of each embedding, and each data point represents a profession word.

Below we compare the bias of Word2vec and fastText.

In [None]:
E_f = WordEmbedding("fasttext_small")
compare_occupational_bias(E, E_f, ["Word2vec", "FastText"])

# 3 - Debiasing algorithms on word2vec

## Hard debiasing

In hard debiasing, the gender neutral words are shifted to zero in the gender subspace (i.e. neutralized) by subtracting the projection of the neutral word embedding vector onto the gender subspace and renormalizing the resulting embedding to unit length. 
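
For a one-dimensional gender subspace, the neutralize step described above reduces to a few lines of NumPy. The following is a minimal sketch under that assumption, with a hypothetical function name and toy vectors; it is not the `hard_debias` function from the package.

```python
import numpy as np

def neutralize(vector, direction):
    """Remove the component of `vector` along `direction` and renormalise to unit length."""
    direction = direction / np.linalg.norm(direction)
    residual = vector - (vector @ direction) * direction
    return residual / np.linalg.norm(residual)

g = np.array([1.0, 0.0, 0.0])   # toy gender direction
w = np.array([0.6, 0.8, 0.0])   # toy gender-neutral word vector
w_neutral = neutralize(w, g)
```

After this step, `w_neutral` has no component along `g` and unit norm.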

## Soft debiasing

We adapted specifics from Manzini et al. Soft debiasing is done by solving the following optimization problem, as described in their paper:

\begin{equation}
    \underset{T}{\min} || (TW)^T(TW) - W^TW||^2_F + \lambda ||(TN)^T (TB)||^2_F
\end{equation}

where W is the matrix of all embedding vectors, N is the matrix of the embedding vectors of the gender neutral words, B is the gender subspace, and T is the debiasing transformation that minimizes the projection of the neutral words onto the gender subspace but tries to maintain the pairwise inner products between the words.

Our soft debiasing code is largely based on code from https://github.com/TManzini/DebiasMulticlassWordEmbedding.
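
To make the two terms of the objective concrete, the following sketch evaluates it for a given transformation T. It only computes the loss and does not perform the optimization. It assumes that the columns of W, N and B are vectors; the function name, the toy matrices and the value of `lam` are illustrative choices.

```python
import numpy as np

def soft_debias_objective(T, W, N, B, lam=0.2):
    """Evaluate ||(TW)^T(TW) - W^T W||_F^2 + lam * ||(TN)^T(TB)||_F^2."""
    TW = T @ W
    # Term 1: change in pairwise inner products between all words
    preserve = np.linalg.norm(TW.T @ TW - W.T @ W, ord="fro") ** 2
    # Term 2: remaining projection of neutral words onto the bias subspace
    remove = np.linalg.norm((T @ N).T @ (T @ B), ord="fro") ** 2
    return preserve + lam * remove

# Toy setup: 2 dimensions, 2 words (columns of W), 1 neutral word, 1 bias direction
W = np.array([[1.0, 0.6],
              [0.0, 0.8]])
N = W[:, [1]]                     # second word is treated as gender neutral
B = np.array([[1.0], [0.0]])      # bias direction along the first axis

loss_identity = soft_debias_objective(np.eye(2), W, N, B)          # no debiasing
loss_projection = soft_debias_objective(np.diag([0.0, 1.0]), W, N, B)  # full projection
```

The identity transformation leaves inner products intact but keeps the bias term, whereas the full projection removes the bias term at the cost of distorting inner products. The parameter $\lambda$ controls the trade-off between these two extremes.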

### Hard debiasing Word2vec
Firstly, we show how to manually debias embeddings of choice. \
This overwrites the embeddings in the WordEmbedding object, so load the biased embeddings again in another object for comparison.

In [None]:
# Pass the WordEmbedding object which contains the embeddings that should be debiased.
hard_debias(E, gender_seed, defs, equalize_pairs)

Secondly, we show the effect of hard debiasing on Word2vec.

In [None]:
# Hard debiased Word2vec
# Analysis of extreme male and extreme female professions
sp_hard_debiased = E.profession_stereotypes(profession_words, v_gender)

In [None]:
# Analogies gender
a_gender_hard_debiased = E.best_analogies_dist_thresh(v_gender)
we.viz(a_gender_hard_debiased)

# 4 - Benchmarks

This package includes some basic benchmarks, which allow for easy verification of the embedding's quality before and after debiasing (RG-65, WS-353 and MSR), as well as a statistical measure of bias to quantitatively inspect the effect of debiasing (WEAT).
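
For reference, the WEAT effect size in its commonly cited form (Caliskan et al. 2017) can be sketched as below, for target sets X, Y and attribute sets A, B. This is an assumption-laden illustration: the helper names and toy vectors are hypothetical, the use of the sample standard deviation is an assumption, and the `Benchmark` class may compute its bias measure differently.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """Mean cosine similarity of w to attribute set A minus that to attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    # Assumption: sample standard deviation (ddof=1) over all target words
    pooled_std = np.std(s_x + s_y, ddof=1)
    return float((np.mean(s_x) - np.mean(s_y)) / pooled_std)

# Toy 2-dimensional vectors (illustrative values only)
A = [np.array([1.0, 0.0])]
B = [np.array([-1.0, 0.0])]
X = [np.array([0.8, 0.6]), np.array([0.6, 0.8])]
Y = [np.array([-0.8, 0.6]), np.array([-0.6, 0.8])]

effect = weat_effect_size(X, Y, A, B)
effect_swapped = weat_effect_size(Y, X, A, B)
```

A positive effect size indicates that X is more strongly associated with A (and Y with B); swapping the target sets flips the sign.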

In [None]:
benchmark = Benchmark()
E_before = WordEmbedding("word2vec_small")

### Word2vec

Below, we show the benchmarks for Word2vec. \
When comparing WEAT effect sizes, use a single Benchmark object per embedding and benchmark the biased version first. (The bias axis of the first embedding evaluated is saved internally, and the bias of subsequent embeddings is measured along that axis.)

In [None]:
# Evaluate for word2vec
before_results = benchmark.evaluate(E_before, "Before")
hard_results = benchmark.evaluate(E, "Hard")
soft_results = benchmark.evaluate(E_soft, "Soft")


The results of individual benchmarks can be combined into a single list and passed to the `pprint_compare` method for easy comparison.

In [None]:
w2v_results = [before_results, hard_results, soft_results]
benchmark.pprint_compare(w2v_results, ["Before", "Hard-debiased", "Soft-debiased"], "word2vec")

# 5 - Full Experiments

The full range of experiments can be executed using the `experiments.py` file from the repository. \
This is best done through a terminal, where the behaviour can be modified using command line arguments, but it is also available here.

In [None]:
!python experiments.py