# Neutralising and Equalising Word Embeddings

This notebook shows how the gender component of non-gender-specific words can be neutralised to reduce bias in word embeddings. It also covers equalisation, where gender-specific words are equalised with respect to non-gender-specific words.

#### Import dependencies

In [1]:
import numpy as np

#### Load the GloVe dataset

The GloVe dataset is not included in the repository because of its file size. It can be downloaded from https://nlp.stanford.edu/projects/glove/

In [2]:
def read_glove_vecs(glove_file):
    with open(glove_file, 'r', encoding='utf-8') as f:
        words = set()
        word_to_vec_map = {}
        
        for line in f:
            line = line.strip().split()
            curr_word = line[0]
            words.add(curr_word)
            word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
            
    return words, word_to_vec_map

In [3]:
words, word_to_vec_map = read_glove_vecs('glove/glove.6B.50d.txt')

* words: set of words in the vocabulary.
* word_to_vec_map: dictionary mapping words to their GloVe vector representation.
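
To make the two return values concrete without needing the full GloVe download, here is a minimal sketch that applies the same parsing logic to two made-up lines held in memory. The tokens, the 3-d vectors and the names `toy_words` and `toy_map` are illustrative assumptions, not part of the dataset.

```python
import io
import numpy as np

# Two made-up lines in the GloVe text layout: a token followed by its vector components.
sample = io.StringIO("cat 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")

toy_words = set()
toy_map = {}
for line in sample:
    parts = line.strip().split()
    toy_words.add(parts[0])                                    # first field is the token
    toy_map[parts[0]] = np.array(parts[1:], dtype=np.float64)  # the rest is the vector

print(sorted(toy_words))
print(toy_map["dog"].shape)
```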

In [4]:
def cosine_similarity(u, v):
    """
    Cosine similarity reflects the degree of similarity between u and v.

    Arguments:
        u -- a word vector of shape (n,)
        v -- a word vector of shape (n,)

    Returns:
        cosine_similarity -- the cosine similarity between u and v, i.e. (u . v) / (||u|| * ||v||)
    """
    
    dot = np.dot(u, v)
    norm_u = np.sqrt(np.sum(u**2))    
    norm_v = np.sqrt(np.sum(v**2))
    cosine_similarity = dot / (norm_u * norm_v)
    
    return cosine_similarity
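
As a quick sanity check of the range of values, the sketch below evaluates the same quantity on hand-picked 2-d vectors. The helper name `cosine_sim` and the vectors are assumptions for illustration; it uses `np.linalg.norm` instead of the explicit square root, which computes the same norm.

```python
import numpy as np

def cosine_sim(u, v):
    # Same quantity as cosine_similarity above, written with np.linalg.norm.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 0.0])
same_direction = cosine_sim(a, np.array([3.0, 0.0]))   # parallel vectors
orthogonal = cosine_sim(a, np.array([0.0, 2.0]))       # perpendicular vectors
opposite = cosine_sim(a, np.array([-5.0, 0.0]))        # anti-parallel vectors
print(same_direction, orthogonal, opposite)
```

Vector length does not matter here: only the angle between the two vectors affects the result.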

In [5]:
father = word_to_vec_map["father"]
mother = word_to_vec_map["mother"]
ball = word_to_vec_map["ball"]
crocodile = word_to_vec_map["crocodile"]

france = word_to_vec_map["france"]
italy = word_to_vec_map["italy"]
paris = word_to_vec_map["paris"]
rome = word_to_vec_map["rome"]

print("cosine_similarity(father, mother) = ", cosine_similarity(father, mother))
print("cosine_similarity(ball, crocodile) = ", cosine_similarity(ball, crocodile))

# Expected to be negative: the first difference points from capital to country,
# while the second points from country to capital.
print("cosine_similarity(france - paris, rome - italy) = ", cosine_similarity(france - paris, rome - italy))

# Expected to be positive: both differences are capital minus country, so they point in a similar direction.
print("cosine_similarity(paris - france, rome - italy) = ", cosine_similarity(paris - france, rome - italy))

cosine_similarity(father, mother) =  0.8909038442893615
cosine_similarity(ball, crocodile) =  0.2743924626137942
cosine_similarity(france - paris, rome - italy) =  -0.6751479308174201
cosine_similarity(paris - france, rome - italy) =  0.6751479308174201
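
The last two values are exact negatives of each other because `france - paris` is simply `-(paris - france)`, and negating one argument flips the sign of the cosine similarity. A small sketch with random stand-in vectors (not GloVe vectors; the names are assumptions) illustrates this:

```python
import numpy as np

def cosine_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
a, b, c, d = rng.normal(size=(4, 50))  # stand-ins for four 50-d word vectors

forward = cosine_sim(a - b, c - d)
flipped = cosine_sim(b - a, c - d)     # first argument negated
print(np.isclose(forward, -flipped))
```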


## Debiasing word vectors

We start by estimating a gender direction: we subtract the `man` vector representation from the `woman` vector representation.

In [6]:
gender = word_to_vec_map['woman'] - word_to_vec_map['man']
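
A single word pair gives a fairly noisy estimate of the direction. One possible variation, which this notebook does not use, is to average the differences of several pairs. The sketch below assumes made-up 3-d vectors and a hypothetical helper `average_direction`; with the real embeddings you would pass `word_to_vec_map` instead of `toy_map`.

```python
import numpy as np

# Made-up 3-d vectors purely for illustration; the GloVe vectors used here have 50 dimensions.
toy_map = {
    "woman": np.array([1.0, 0.2, 0.0]), "man": np.array([-1.0, 0.1, 0.0]),
    "mother": np.array([0.8, 0.5, 0.3]), "father": np.array([-0.8, 0.5, 0.2]),
    "girl": np.array([0.9, -0.3, 0.1]), "boy": np.array([-0.7, -0.3, 0.1]),
}

def average_direction(pairs, vec_map):
    # Hypothetical helper: mean of (first - second) over the given word pairs.
    diffs = [vec_map[a] - vec_map[b] for a, b in pairs]
    return np.mean(diffs, axis=0)

pairs = [("woman", "man"), ("mother", "father"), ("girl", "boy")]
gender_avg = average_direction(pairs, toy_map)
print(gender_avg)
```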

#### Similarity between gender and names

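Because `gender` is `woman - man`, positive similarities mean that the name is more related to the `female` gender, while negative similarities mean that it is more related to the `male` gender.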

In [7]:
name_list = ['john', 'marie', 'sophie',
             'ronaldo', 'priya', 'rahul',
             'danielle', 'reza', 'katy',
             'yasmin', 'sam', 'carolina',
             'logan']

for w in name_list:
    print(w, cosine_similarity(word_to_vec_map[w], gender))

john -0.23163356145973724
marie 0.315597935396073
sophie 0.31868789859418784
ronaldo -0.31244796850329437
priya 0.17632041839009402
rahul -0.16915471039231716
danielle 0.24393299216283895
reza -0.07930429672199553
katy 0.2831068659572615
yasmin 0.23313857767928758
sam -0.33642281213435427
carolina 0.0938795106708001
logan -0.16937077820548485


Unfortunately, non-gender-specific words also carry bias and hence need some extra treatment. Below is a list of common words, several of which look noticeably biased.

In [8]:
word_list = ['lipstick', 'driver', 'science', 'arts',
             'literature', 'warrior','doctor', 'librarian',
             'receptionist', 'technology',  'fashion', 'teacher',
             'engineer', 'pilot', 'computer', 'singer',
             'model', 'mechanic', 'babysitter']
for w in word_list:
    print(w, cosine_similarity(word_to_vec_map[w], gender))

lipstick 0.2769191625638267
driver -0.010681433817247916
science -0.06082906540929701
arts 0.008189312385880337
literature 0.06472504433459932
warrior -0.20920164641125288
doctor 0.11895289410935041
librarian 0.23302221769690296
receptionist 0.33077941750593737
technology -0.13193732447554302
fashion 0.03563894625772699
teacher 0.17920923431825664
engineer -0.0803928049452407
pilot 0.0010764498991916937
computer -0.10330358873850498
singer 0.1850051813649629
model 0.0343357596036095
mechanic -0.0035264430229621927
babysitter 0.2797785047879521


Again, positive similarities relate to women whilst negative similarities relate to men. It's striking that `computer`, `technology` and `engineer` all have negative similarities, i.e. they lean away from the female direction.

### Neutralise bias for non-gender specific words

The formula below is used to compute the debiased version of a given vector representation.

\begin{align}
v^{bias\_component} &= \frac{v \cdot gender}{\|gender\|_2^2} \, gender \\
v^{debiased} &= v - v^{bias\_component}
\end{align}
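
To see the two steps on numbers small enough to check by hand, here is a minimal sketch with 2-d vectors. The values of `v` and `g` are made up, with `g` standing in for the gender axis.

```python
import numpy as np

v = np.array([3.0, 4.0])
g = np.array([2.0, 0.0])   # stand-in for the gender axis

# Projection of v onto g: (v . g / ||g||^2) * g
v_bias_component = (np.dot(v, g) / np.dot(g, g)) * g
# What remains after removing that projection
v_debiased = v - v_bias_component

print(v_bias_component, v_debiased, np.dot(v_debiased, g))
```

Here the bias component is the part of `v` lying along the first axis, and the debiased vector keeps only the second coordinate, so its dot product with `g` is zero.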

The `neutralize` function below implements this formula:

In [9]:
def neutralize(word, gender, word_to_vec_map):
    """
    Removes the bias of "word" by projecting it on the space orthogonal to the bias axis. 
    This function ensures that gender neutral words are zero in the gender subspace.
    
    Arguments:
        word -- string indicating the word to debias
        gender -- numpy-array of shape (50,), corresponding to the bias axis (such as gender)
        word_to_vec_map -- dictionary mapping words to their corresponding vectors.
    
    Returns:
        v_debiased -- neutralised word vector representation of the input "word"
    """
    
    v = word_to_vec_map[word]
    
    # Project v onto the gender axis; np.dot(gender, gender) is the squared L2 norm of gender.
    v_biascomponent = (np.dot(v, gender) / np.dot(gender, gender)) * gender
    v_debiased = v - v_biascomponent
    
    return v_debiased

In [10]:
w = "receptionist"
print("cosine similarity between " + w + " and gender, before neutralizing: ", cosine_similarity(word_to_vec_map[w], gender))

v_debiased = neutralize(w, gender, word_to_vec_map)
print("cosine similarity between " + w + " and gender, after neutralizing: ", cosine_similarity(v_debiased, gender))

cosine similarity between receptionist and gender, before neutralizing:  0.33077941750593737
cosine similarity between receptionist and gender, after neutralizing:  -2.099120994400013e-17


The neutralised result is essentially 0, up to numerical round-off (on the order of $10^{-17}$).
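
When many words need the same treatment, the projection can be applied to all of them at once. The sketch below is one possible vectorised form, not the notebook's own code: `neutralize_many` is a hypothetical helper, and the random matrix stands in for a stack of 50-d word vectors.

```python
import numpy as np

def neutralize_many(vectors, axis_vec):
    # vectors: (k, n) matrix with one word vector per row; axis_vec: (n,) bias axis.
    # Hypothetical helper: applies the projection formula to every row at once.
    coeffs = vectors @ axis_vec / np.dot(axis_vec, axis_vec)   # shape (k,)
    return vectors - np.outer(coeffs, axis_vec)

rng = np.random.default_rng(1)
toy_vectors = rng.normal(size=(5, 50))   # stand-ins for five word vectors
toy_axis = rng.normal(size=50)           # stand-in for the gender axis

debiased = neutralize_many(toy_vectors, toy_axis)
# Every row should now be orthogonal to the axis, up to round-off.
print(np.allclose(debiased @ toy_axis, 0.0))
```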