# Fairness in AI: Removing bias from word embeddings

#### Kylian van Geijtenbeek, Thom Visser, Martine Toering, Iulia Ionescu

$\textbf{Abstract:}$ In this paper we reproduce the word embedding debiasing algorithm from Bolukbasi et al. [2]. We adapt the available implementation and extend it with their soft debiasing method. We integrate several popular benchmarks and investigate the effectiveness of the algorithm on GloVe and fastText embeddings, in addition to the Word2vec embeddings used by Bolukbasi et al. [2]. Through a comparison of benchmark scores, we show that removing direct bias barely affects the effectiveness of any of these embeddings. However, we fail to reproduce the large-scale soft debiasing results due to a lack of detail on the original implementation.

In [2]:
from __future__ import print_function, division
%matplotlib inline
from matplotlib import pyplot as plt
import json
import random
import numpy as np
import copy

import embetter as dwe
import embetter.we as we
from embetter.we import WordEmbedding
from embetter.data import *

from embetter.debias import *
from embetter.benchmarks import Benchmark

from compare_bias import *

## Notebook preferences

# 1 - Gender Bias in word2vec, Glove and FastText

### Load data

In this notebook, we will use three different word embeddings: $\textbf{word2vec}$ (Mikolov et al. 2013), $\textbf{GloVe}$ (Pennington et al. 2014) and $\textbf{fastText}$ (Bojanowski et al. 2016).

The word2vec embedding we use is learned from a corpus of Google News articles (https://code.google.com/archive/p/word2vec/). It contains 300-dimensional vectors for 3 million words. For GloVe we make use of the 300-dimensional vectors trained on Common Crawl (https://nlp.stanford.edu/projects/glove/). Lastly, fastText is a word embedding from the Facebook AI Research lab, trained on a Wikipedia corpus and Common Crawl, which also consists of 300-dimensional vectors (https://fasttext.cc/docs/en/english-vectors.html).

We start by loading in the data.

In [3]:
# Load google news word2vec
E = WordEmbedding("word2vec_small")
# Load debiased word2vec
E_hard = WordEmbedding("word2vec_small_hard_debiased")
E_soft = WordEmbedding("word2vec_small_soft_debiased")

Downloading word2vec_small to /Users/iulia/Documents/M_AI/FACT/FactAI/embetter/embeddings/word2vec_small.txt


100%|██████████| 2540/2540 [00:01<00:00, 1773.18it/s]


Embedding shape: (26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
Downloading word2vec_small_hard_debiased to /Users/iulia/Documents/M_AI/FACT/FactAI/embetter/embeddings/word2vec_small_hard_debiased.txt


100%|██████████| 2953/2953 [00:04<00:00, 716.43it/s]


Embedding shape: (26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine
Downloading word2vec_small_soft_debiased to /Users/iulia/Documents/M_AI/FACT/FactAI/embetter/embeddings/word2vec_small_soft_debiased.txt


100%|██████████| 2952/2952 [00:00<00:00, 6723.12it/s]


Embedding shape: (26423, 300)
26423 words of dimension 300 : in, for, that, is, ..., Jay, Leroy, Brad, Jermaine


In [None]:
# Load Glove
E_g = WordEmbedding("glove_small")
E_g_hard = WordEmbedding("glove_small_hard_debiased")
E_g_soft = WordEmbedding("glove_small_soft_debiased")


# Load FastText 
E_f = WordEmbedding("fasttext_small")
E_f_hard = WordEmbedding("fasttext_small_hard_debiased")
E_f_soft = WordEmbedding("fasttext_small_soft_debiased")

## Word2vec

In [None]:
# Load professions and gender related lists from Bolukbasi et al. for word2vec

gender_seed, defs, equalize_pairs, profession_words = load_data(E.words)

### Define gender direction

We define the gender direction by either PCA or by the words "she" and "he" for word2vec.

In [None]:
# Define gender direction with the words "she" and "he" 
# v_gender = E.diff('she', 'he')

# Define gender direction with PCA
v_gender = we.doPCA(defs, E).components_[0]
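
To make the PCA option concrete, the following is a minimal NumPy sketch of how such a direction could be computed. It is not the code of `we.doPCA`, whose internals are not shown in this notebook. It assumes, following the description in Bolukbasi et al. [2], that each definitional pair is centered at its own mean before the principal axes are extracted. The function name `gender_direction_pca` and the toy 3-dimensional vectors are hypothetical.

```python
import numpy as np

def gender_direction_pca(vectors, pairs):
    """Return the first principal axis of the pair-centered word vectors.

    vectors: dict mapping a word to a 1-D NumPy array (hypothetical toy format).
    pairs:   list of (female_word, male_word) definitional pairs.
    """
    centered = []
    for a, b in pairs:
        center = (vectors[a] + vectors[b]) / 2.0
        centered.append(vectors[a] - center)
        centered.append(vectors[b] - center)
    X = np.array(centered)
    # The rows of X sum to zero by construction, so the right singular
    # vectors of X coincide with its principal axes.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

# Toy vectors: the pairs differ only along the first axis.
toy_vectors = {
    "she": np.array([1.0, 0.2, 0.0]),
    "he": np.array([-1.0, 0.2, 0.0]),
    "woman": np.array([0.9, 0.0, 0.3]),
    "man": np.array([-1.1, 0.0, 0.3]),
}
toy_direction = gender_direction_pca(toy_vectors, [("she", "he"), ("woman", "man")])

# The simpler alternative: the normalized difference between "she" and "he".
toy_diff = toy_vectors["she"] - toy_vectors["he"]
toy_diff = toy_diff / np.linalg.norm(toy_diff)
```

Note that the sign of a principal axis is arbitrary, so the PCA direction may point from "he" to "she" or the other way around, whereas the difference vector has a fixed orientation.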

### Generating analogies


Below, we show some of the gender analogies that we can create from the embeddings.

In [None]:
# Analogies gender
a_gender = E.best_analogies_dist_thresh(v_gender, thresh=1)
we.viz(a_gender)

### Analyzing occupational gender bias 


In [None]:
# Analysis of extreme male and extreme female professions
sp = E.profession_stereotypes(profession_words, v_gender)

## fastText

In [None]:
# Load professions and gender related lists from Bolukbasi et al. for fastText
gender_seed, defs, equalize_pairs, profession_words = load_data(E_f.words)

### Define gender direction

We define the gender direction by either PCA or by the words "she" and "he" for fastText.

In [None]:
# Define gender direction with the words "she" and "he" 
# v_gender = E_f.diff('she', 'he')

# Define gender direction with PCA
v_gender = we.doPCA(defs, E_f).components_[0]

### Generating analogies


In [None]:
# Analogies gender
a_gender = E_f.best_analogies_dist_thresh(v_gender, thresh=1)
we.viz(a_gender)

### Analyzing occupational gender bias 


In [None]:
# Analysis of extreme male and extreme female professions
sp = E_f.profession_stereotypes(profession_words, v_gender)

# 2 - Comparing Bias of word2vec, Glove and FastText

We compare the gender bias of the word embeddings, following the approach of Figure 4 in Bolukbasi et al. [2]. The profession words are projected onto the gender axis of each of the two embeddings being compared. Each data point represents a profession word.

Below we compare the bias of Word2vec and fastText, followed by that of GloVe and fastText.

In [None]:
compare_occupational_bias(E, E_f, ["Word2vec", "FastText"])

In [None]:
compare_occupational_bias(E_g, E_f, ["Glove", "FastText"])
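
The projection behind this comparison can be sketched as follows. This is an illustration under stated assumptions, not the implementation of `compare_occupational_bias`: we assume the projection is the cosine between a profession vector and the gender direction, and that the direction is taken as the difference between "she" and "he". The helper `gender_projection` and the 2-dimensional toy embeddings are hypothetical.

```python
import numpy as np

def gender_projection(vector, direction):
    """Cosine between a word vector and a gender direction."""
    return float(
        np.dot(vector, direction)
        / (np.linalg.norm(vector) * np.linalg.norm(direction))
    )

# Two hypothetical toy embeddings over the same vocabulary.
emb_a = {
    "she": np.array([1.0, 0.0]), "he": np.array([-1.0, 0.0]),
    "nurse": np.array([0.6, 0.8]), "engineer": np.array([-0.8, 0.6]),
}
emb_b = {
    "she": np.array([0.0, 1.0]), "he": np.array([0.0, -1.0]),
    "nurse": np.array([0.8, 0.6]), "engineer": np.array([0.6, -0.8]),
}

dir_a = emb_a["she"] - emb_a["he"]
dir_b = emb_b["she"] - emb_b["he"]

# One (x, y) point per profession: its projection in each embedding.
points = {
    word: (gender_projection(emb_a[word], dir_a), gender_projection(emb_b[word], dir_b))
    for word in ["nurse", "engineer"]
}
```

Plotting these points against each other gives one marker per profession. Points close to the diagonal indicate that both embeddings assign a similar gender association to that profession.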

# 3 - Debiasing algorithms on word2vec, Glove and FastText

## Hard debiasing

In hard debiasing, the gender neutral words are shifted to zero in the gender subspace (i.e. neutralized) by subtracting the projection of the neutral word embedding vector onto the gender subspace and renormalizing the resulting embedding to unit length. 
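
As a minimal sketch of this neutralization step, assuming a one-dimensional gender subspace spanned by a single direction `g` (the general method allows a subspace of higher dimension), the operation could look like the code below. The function name `neutralize` and the toy vectors are hypothetical and are not taken from the implementation used in this notebook.

```python
import numpy as np

def neutralize(v, g):
    """Remove the component of v along g and rescale the result to unit length."""
    g = g / np.linalg.norm(g)
    v_orth = v - np.dot(v, g) * g
    # A vector that lies entirely along g would have zero length here and
    # cannot be renormalized; this sketch does not handle that case.
    return v_orth / np.linalg.norm(v_orth)

toy_gender = np.array([2.0, 0.0, 0.0])   # direction, not necessarily unit length
toy_word = np.array([0.6, 0.8, 0.0])     # a gender-neutral word with a gender component
toy_neutral = neutralize(toy_word, toy_gender)
```

After this step the word has no component left along the gender direction, so its inner product with `toy_gender` is zero while its length is again one.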

## Soft debiasing

We adapted specifics from Manzini et al. Soft debiasing is done by solving the following optimization problem, as mentioned in their paper:

\begin{equation}
    \underset{T}{\min} || (TW)^T(TW) - W^TW||^2_F + \lambda ||(TN)^T (TB)||^2_F
\end{equation}

where $W$ is the matrix of all embedding vectors, $N$ is the matrix of the embedding vectors of the gender neutral words, $B$ is the gender subspace, and $T$ is the debiasing transformation that minimizes the projection of the neutral words onto the gender subspace but tries to maintain the pairwise inner products between the words.

Our soft debiasing implementation is largely based on code from https://github.com/TManzini/DebiasMulticlassWordEmbedding.
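
To illustrate the objective only (not the optimization procedure, and not the code from the repository above), the sketch below evaluates the two terms of the soft debiasing loss for a given transformation. It assumes that word vectors are stored as the columns of `W` and `N`, and that the columns of `B` span the gender subspace. The function name `soft_debias_objective`, the value of `lam`, and the toy matrices are hypothetical.

```python
import numpy as np

def soft_debias_objective(T, W, N, B, lam):
    """Evaluate ||(TW)^T (TW) - W^T W||_F^2 + lam * ||(TN)^T (TB)||_F^2."""
    TW = T @ W
    # First term: change in the pairwise inner products between all words.
    preserve = np.linalg.norm(TW.T @ TW - W.T @ W, ord="fro") ** 2
    # Second term: remaining projection of the neutral words onto the gender subspace.
    bias = np.linalg.norm((T @ N).T @ (T @ B), ord="fro") ** 2
    return preserve + lam * bias

# Toy 2-D example with three words (columns); the first two are gender neutral.
W_toy = np.array([[0.6, 0.0, 1.0],
                  [0.8, 1.0, 0.0]])
N_toy = W_toy[:, :2]
B_toy = np.array([[1.0],
                  [0.0]])
lam = 0.2

identity = np.eye(2)
projection = np.diag([0.0, 1.0])   # removes the gender axis entirely

loss_identity = soft_debias_objective(identity, W_toy, N_toy, B_toy, lam)
loss_projection = soft_debias_objective(projection, W_toy, N_toy, B_toy, lam)
```

The identity transformation leaves all inner products intact but keeps the bias term, while the projection removes the bias term at the cost of distorting inner products. The parameter $\lambda$ controls the trade-off between these two terms.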

### Hard debiasing Word2vec
First, we show the effect of hard debiasing on Word2vec.

In [None]:
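# Hard debiased Word2vec
# Recompute the word2vec word lists and gender direction, since the
# fastText section above overwrote profession_words and v_gender
gender_seed, defs, equalize_pairs, profession_words = load_data(E.words)
v_gender = we.doPCA(defs, E).components_[0]
# Analysis of extreme male and extreme female professions
sp_hard_debiased = E_hard.profession_stereotypes(profession_words, v_gender)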

In [None]:
# Analogies gender
a_gender_hard_debiased = E_hard.best_analogies_dist_thresh(v_gender)
we.viz(a_gender_hard_debiased)

# 4 - Benchmarks

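We evaluate each embedding on the integrated benchmarks before debiasing, after hard debiasing and after soft debiasing.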

In [None]:
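def run_benchmark(benchmark, E, E_hard, E_soft, embedding_name):
    """Evaluate the original, hard debiased and soft debiased embeddings and return the results in that order."""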
    result_original = benchmark.evaluate(E, "'Before', {}".format(embedding_name))
    result_hard_debiased = benchmark.evaluate(E_hard, "'Hard debiased', {}".format(embedding_name))
    result_soft_debiased = benchmark.evaluate(E_soft, "'Soft debiased', {}".format(embedding_name))
    results = [result_original, result_hard_debiased, result_soft_debiased]
    return results

### Word2vec

Below, we show the benchmarks for Word2vec.

In [None]:
# Evaluate for word2vec
w2v_benchmark = Benchmark()
w2v_results = run_benchmark(w2v_benchmark, E, E_hard, E_soft, "word2vec")
w2v_benchmark.pprint_compare(w2v_results, ["Before", "Hard-debiased", "Soft-debiased"], "word2vec")

### Glove and FastText

In [None]:
# Glove and FastText
g_benchmark = Benchmark()
g_results = run_benchmark(g_benchmark, E_g, E_g_hard, E_g_soft, "Glove")
g_benchmark.pprint_compare(g_results, ["Before", "Hard-debiased", "Soft-debiased"], "Glove")

f_benchmark = Benchmark()
f_results = run_benchmark(f_benchmark, E_f, E_f_hard, E_f_soft, "fastText")
f_benchmark.pprint_compare(f_results, ["Before", "Hard-debiased", "Soft-debiased"], "fastText")