### Advanced Topic: Debiasing Word Vectors
###### Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Link: https://arxiv.org/pdf/1607.06520.pdf

Download the GloVe embeddings: https://www.kaggle.com/watts2/glove6b50dtxt  
Download our trained Word2Vec skip-gram model (from Google Drive): https://drive.google.com/file/d/1VFW_F8YbwI0EsXfLkaTX2yuqfXVIxOIE/view?usp=sharing


In [11]:
import numpy as np
import pickle

In [13]:
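def read_word_embedding(file, embedding_type):
    """Load word embeddings.

    Returns (words, word_to_vec_map): the vocabulary as a set and a dict
    mapping each word to its numpy vector.
    """
    if embedding_type == 'pretrained_glove':
        words = set()
        word_to_vec_map = {}
        with open(file, 'r', encoding='utf8') as f:
            for line in f:
                # Each line is: word v1 v2 ... vn
                line = line.strip().split()
                curr_word = line[0]
                words.add(curr_word)
                word_to_vec_map[curr_word] = np.array(line[1:], dtype=np.float64)
        return words, word_to_vec_map

    elif embedding_type == 'our_trained_model':
        # The pickled model is used as a dict mapping words to vectors
        with open(file, 'rb') as w2v_dict:
            dictio = pickle.load(w2v_dict)
        # Return the vocabulary explicitly instead of the undefined name `_`
        return set(dictio.keys()), dictio

    else:
        raise ValueError("Unknown embedding_type: " + str(embedding_type))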

### Load the Pretrained GloVe Model

In [144]:
pretrained_model_path = 'data/glove.6B.50d.txt'
pretrained_model_type = 'pretrained_glove'

words, word_to_vec_dict = read_word_embedding(pretrained_model_path, pretrained_model_type)
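The paper assumes every embedding is normalized to unit length, and its equalization formula relies on this; the GloVe vectors loaded above are not normalized. A minimal sketch of normalizing a word-to-vector dictionary follows; `normalize_embeddings` is a hypothetical helper, not part of the original notebook, and the toy vectors are made up. Cosine similarities are unaffected by this rescaling, but the equalization step would behave differently on normalized vectors.

```python
import numpy as np


def normalize_embeddings(word_to_vec):
    """Return a new dict whose vectors are scaled to unit L2 norm.

    Words whose vector has zero norm are skipped, since they cannot be normalized.
    """
    normalized = {}
    for word, vec in word_to_vec.items():
        norm = np.linalg.norm(vec)
        if norm > 0:
            normalized[word] = vec / norm
    return normalized


# Usage on made-up 2-d vectors (not real GloVe values)
toy = {"a": np.array([3.0, 4.0]), "b": np.array([0.0, 2.0])}
unit = normalize_embeddings(toy)
print(unit["a"])  # [0.6 0.8]
```

On the real data this would be called as `word_to_vec_unit = normalize_embeddings(word_to_vec_dict)`.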

### Cosine Similarity

In [24]:
# Function to Compute Cosine_Similarity

def cosine_similarity(u, v):
    """
    Cosine similarity reflects the degree of similarity between u and v
        
    Input:
        u -- word vector      
        v -- word vector

    Output:
        cosine_similarity -- the cosine similarity between u and v
    """
    
    # Compute the dot product
    dot_product = np.dot(u, v)
    
    # Compute the L2 norm of u
    norm_u = np.sqrt(np.sum(u * u))
    
    # Compute the L2 norm of v
    norm_v = np.sqrt(np.sum(v * v))
    
    # Compute the cosine similarity
    cosine_similarity = dot_product / (norm_u * norm_v)
    
    return cosine_similarity

#### Analysis of Word Analogies

In [25]:
woman = word_to_vec_dict["woman"]
lady = word_to_vec_dict["lady"]
grandfather = word_to_vec_dict["grandfather"]
grandmother = word_to_vec_dict["grandmother"]
boy = word_to_vec_dict['boy']
girl = word_to_vec_dict['girl']
france = word_to_vec_dict["france"]
india = word_to_vec_dict["india"]
paris = word_to_vec_dict["paris"]
delhi = word_to_vec_dict["delhi"]
father = word_to_vec_dict["father"]
mother = word_to_vec_dict["mother"]
man = word_to_vec_dict['man']

print("cosine_similarity(woman, lady) = ", cosine_similarity(woman, lady))
print("cosine_similarity(grandfather - grandmother, boy - girl) = ", cosine_similarity(grandfather - grandmother, boy - girl))
print("cosine_similarity(france - paris, india - delhi) = ",cosine_similarity(france - paris, india - delhi))

cosine_similarity(woman, lady) =  0.6054720738838377
cosine_similarity(grandfather - grandmother, boy - girl) =  0.5053502916089266
cosine_similarity(france - paris, india - delhi) =  0.6958505344885514


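These values serve as a sanity check of `cosine_similarity`: related words (woman, lady) and analogous pair differences (grandfather - grandmother vs. boy - girl, france - paris vs. india - delhi) give clearly positive similarities. Now we move on to the main task.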
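The paper's title refers to analogies of the form a : b :: c : d. To illustrate how such an analogy can be completed with cosine similarity, here is a sketch that scores every candidate d by $\cos(e_b - e_a, e_d - e_c)$. This scoring rule and the `complete_analogy` helper are our assumptions (a common formulation, not necessarily the paper's exact procedure), and the toy vectors are made up.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


def complete_analogy(a, b, c, word_to_vec):
    """Return the word d that best completes the analogy a : b :: c : d.

    Candidates are scored by cos(e_b - e_a, e_d - e_c); the query words are excluded.
    """
    e_a, e_b, e_c = word_to_vec[a], word_to_vec[b], word_to_vec[c]
    best_word, best_sim = None, -np.inf
    for w, e_w in word_to_vec.items():
        if w in (a, b, c):
            continue
        diff = e_w - e_c
        if np.linalg.norm(diff) == 0:
            continue  # skip candidates identical to e_c (cosine undefined)
        sim = cosine(e_b - e_a, diff)
        if sim > best_sim:
            best_word, best_sim = w, sim
    return best_word


# Toy 2-d vectors: first axis ~ "gender", second ~ "royalty" (made up for illustration)
toy = {
    "man": np.array([-1.0, 0.0]), "woman": np.array([1.0, 0.0]),
    "king": np.array([-1.0, 1.0]), "queen": np.array([1.0, 1.0]),
    "apple": np.array([0.0, -1.0]),
}
print(complete_analogy("man", "woman", "king", toy))  # queen
```

On the GloVe dictionary one could call `complete_analogy('man', 'woman', 'king', word_to_vec_dict)`; since the loop visits every word, each query is slow on the full 400,000-word vocabulary.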

### Word Analogy And Understanding Gender Bias

#### Finding the Gender Bias Direction

To compute the gender bias direction $g$, we first compute $g_1 = e_{woman} - e_{man}$, $g_2 = e_{mother} - e_{father}$ and $g_3 = e_{girl} - e_{boy}$, and then average them to obtain $g$. The paper uses a more elaborate method, principal component analysis (computed via singular value decomposition) over several gender word pairs; our simple average is adequate for a naive implementation.
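For comparison, here is a sketch of the PCA-based alternative as we understand it from the paper: center each definitional pair on its own mean, stack the centered vectors, and take the top principal component as the gender direction. The helper name `pca_gender_direction`, the (female, male) pair ordering and the sign convention are our choices; the sign is fixed so that the result points the same way as $g$ (from male towards female). The toy vectors are made up.

```python
import numpy as np


def pca_gender_direction(pairs, word_to_vec):
    """Estimate a single gender direction from (female, male) word pairs via PCA.

    Each pair is centered on its own mean, and the top principal component of
    the stacked centered vectors is returned as a unit vector. Because the sign
    of a principal component is arbitrary, it is flipped if needed so that it
    points from the male word towards the female word of the first pair.
    """
    centered = []
    for female, male in pairs:
        e_f, e_m = word_to_vec[female], word_to_vec[male]
        mu = (e_f + e_m) / 2.0
        centered.extend([e_f - mu, e_m - mu])
    X = np.vstack(centered)  # already zero-mean, so the SVD of X gives the PCA directions
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    direction = vt[0]  # right singular vector with the largest singular value
    f0, m0 = pairs[0]
    if np.dot(direction, word_to_vec[f0] - word_to_vec[m0]) < 0:
        direction = -direction
    return direction


# Toy 3-d vectors in which both pairs differ only along the first axis (made up)
toy = {
    "woman": np.array([1.0, 2.0, 0.0]), "man": np.array([-1.0, 2.0, 0.0]),
    "girl": np.array([1.0, 0.0, 1.0]), "boy": np.array([-1.0, 0.0, 1.0]),
}
d = pca_gender_direction([("woman", "man"), ("girl", "boy")], toy)
print(d)  # close to [1, 0, 0]
```

With the GloVe vectors, `pca_gender_direction([('woman', 'man'), ('mother', 'father'), ('girl', 'boy')], word_to_vec_dict)` would give a unit-length alternative to `g`.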

In [26]:
g1 = woman - man
g2 = mother - father
g3 = girl - boy
g = np.mean([g1, g2, g3], axis=0)

#### Gendered Words Analysis

Here we compute the cosine similarity between the gender bias direction $g$ and several gendered words. Words referring to women (she, grandmother, julia, paula, madame) have positive cosine similarity with $g$, while words referring to men (he, grandfather, jules, paul, sir) have negative cosine similarity. This is expected, since $g$ is built from differences such as $e_{woman} - e_{man}$ and therefore points from the male end towards the female end. "male" is an exception: its similarity (0.32) is as high as that of "female".

In [27]:
print ('Similarity between gender bias vector and Gendered Words\n')

# List of Gendered Words
name_list = ['she','he','grandmother','grandfather','jules','julia','paul', 'paula', 'female','male','sir','madame']

for w in name_list:
    print (w, cosine_similarity(word_to_vec_dict[w], g))

Similarity between gender bias vector and Gendered Words

she 0.26306111282837885
he -0.14241941924774518
grandmother 0.381844691038982
grandfather -0.07945446239749386
jules -0.22475197624425813
julia 0.2812390673350289
paul -0.26727266762126545
paula 0.24115895118999886
female 0.3176843745802223
male 0.31987771049104236
sir -0.3350582031171057
madame 0.1884288629212595


#### Gender-Neutral Word Analysis

Here we compute the cosine similarity between $g$ and some gender-neutral words. Words such as nurse, receptionist and babysitter have clearly positive similarity with $g$ (about 0.27 to 0.36), whereas words such as computer, engineer, politician and army have negative similarity. Since $g$ points towards the female end, the embeddings associate the former words with women and the latter with men, even though none of these words is gendered by definition.

In [28]:
print('\n Similarity between Gender Bias Direction and Gender Neutral words:\n')
word_list = ['president','scientist','babysitter','director', 'nurse', 'science', 'arts', 'literature', 'warrior','doctor', 'pilot', 'receptionist', 
             'technology',  'fashion', 'teacher', 'engineer', 'pilot', 'computer', 'singer','army','politician','professor']
for w in word_list:
    print (w, cosine_similarity(word_to_vec_dict[w], g))


 Similarity between Gender Bias Direction and Gender Neutral words:

president -0.21287766990605667
scientist -0.14006801112016562
babysitter 0.2719660457769091
director -0.17198274531634614
nurse 0.361028047426011
science -0.058374259643848236
arts 0.011760468125783751
literature 0.02316946789375171
warrior -0.16564638100307946
doctor 0.0772141272665668
pilot -0.03699357317847417
receptionist 0.30167259871100655
technology -0.16192108462558177
fashion 0.1416547219136271
teacher 0.10545901736578715
engineer -0.22639944157426758
pilot -0.03699357317847417
computer -0.1682103192173514
singer 0.20093000793226243
army -0.24829029802501548
politician -0.12186722698006967
professor -0.08539369640933361


#### Comments

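These results illustrate that the word embeddings have absorbed gender biases, most likely from the corpus they were trained on. The paper (referenced at the beginning) distinguishes two forms of gender bias:
* Direct bias: a gender-neutral word is itself closer to one gender, e.g. nurse lying on the female side of $g$ (refer to the paper for details)
* Indirect bias: associations between gender-neutral words that arise from gender, e.g. two occupations ending up close together mainly because both lean towards the same gender (refer to the paper for details)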
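The paper summarizes direct bias in a single number, $\text{DirectBias}_c = \frac{1}{|N|}\sum_{w \in N} |\cos(\vec{w}, g)|^c$, where $N$ is a set of gender-neutral words and $c$ is a strictness parameter. A minimal sketch of this measure follows; the `direct_bias` helper is hypothetical and the toy vectors are made up.

```python
import numpy as np


def direct_bias(neutral_words, g, word_to_vec, c=1.0):
    """DirectBias_c: average of |cos(e_w, g)| ** c over gender-neutral words w.

    c > 0 controls strictness: larger c discounts small similarities more.
    """
    scores = []
    for w in neutral_words:
        e = word_to_vec[w]
        cos = np.dot(e, g) / (np.linalg.norm(e) * np.linalg.norm(g))
        scores.append(abs(cos) ** c)
    return float(np.mean(scores))


# Made-up 2-d example: g_toy is the first axis
g_toy = np.array([1.0, 0.0])
toy = {"nurse": np.array([1.0, 1.0]), "computer": np.array([0.0, 2.0])}
print(direct_bias(["nurse", "computer"], g_toy, toy))  # about 0.354
```

For example, `direct_bias(word_list, g, word_to_vec_dict)` would condense the gender-neutral list analysed above into one number; on neutralized vectors it is essentially zero, since each cosine is.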

### Debiasing Word Vectors

The paper shows that gender bias in word embeddings can be mitigated in two steps:
* Neutralization of Gender Neutral Words
* Equalization of Gender Word Pairs

### 3.1 Neutralization of Gender Neutral Words

Given an embedding $e$ of a gender-neutral word such as receptionist, the neutralization step removes its gender bias by projecting $e$ onto the subspace orthogonal to the gender bias axis $g$:

$$e^{bias\_component} = \frac{e \cdot g}{||g||_2^2} * g$$
$$e^{debiased} = e - e^{bias\_component}$$

Here $e^{bias\_component}$ is the projection of $e$ onto the direction $g$; subtracting it from $e$ gives $e^{debiased}$, the orthogonal projection of $e$ onto the subspace perpendicular to $g$, so that $e^{debiased} \cdot g = 0$.
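The formula is applied one word at a time in `neutralize_embedding` below. As a sketch, the same projection can be applied to many words at once by stacking their vectors as rows of a matrix. `neutralize_matrix` is a hypothetical helper, and the numbers are made up so the result can be checked by hand: with $g = (1, 0)$, the row $(3, 4)$ has bias component $(3, 0)$ and becomes $(0, 4)$.

```python
import numpy as np


def neutralize_matrix(E, g):
    """Remove the component along g from every row of E.

    E: (n_words, dim) array of embeddings; g: (dim,) bias direction.
    Returns an array of the same shape whose rows are orthogonal to g.
    """
    coeffs = E @ g / np.dot(g, g)   # (e . g) / ||g||^2 for each row e
    return E - np.outer(coeffs, g)  # subtract each row's bias component


# Worked example with made-up numbers: g_toy is the first axis
E_toy = np.array([[3.0, 4.0],
                  [1.0, -2.0]])
g_toy = np.array([1.0, 0.0])
print(neutralize_matrix(E_toy, g_toy))  # rows become [0, 4] and [0, -2]
```

For the notebook's words, the rows could come from `np.vstack([word_to_vec_dict[w] for w in gender_neutral_words])`.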

In [29]:
def neutralize_embedding(word, g, word_to_vec_embedding):
    """
    This function reduces the gender bias of gender-neutral word by projecting it
    on the space orthogonal to the bias axis.
    
    Input:
        word - input word we want to debias
        g - gender bias direction
        word_to_vec_embedding - dictionary mapping words to their corresponding vectors
    
    Output:
        e_debiased - neutralized word vector representation of the input word embedding
    """
    
    # Get word embedding
    e = word_to_vec_embedding[word]
    
    # Compute e_biased component
    e_biascomponent = np.divide(np.dot(e,g),np.linalg.norm(g)**2) * g
 
    # Neutralize e by subtracting e_biascomponent from it; e_debiased is its projection onto the space orthogonal to g
    e_debiased = e - e_biascomponent
    
    return e_debiased

In [31]:
gender_neutral_words = ["softball","receptionist", 'babysitter', 'homemaker','nurse','professor','scientist','football']

for w in gender_neutral_words:
    print("Cosine similarity between " + w + " and gender_bias_direction, before neutralizing: ", cosine_similarity(word_to_vec_dict[w], g))
    e_debiased = neutralize_embedding(w, g, word_to_vec_dict)
    print("Cosine similarity between " + w + " and gender_bias_direction, after neutralizing: ", cosine_similarity(e_debiased, g))
    print("\n")

Cosine similarity between softball and gender_bias_direction, before neutralizing:  0.022254413015605105
Cosine similarity between softball and gender_bias_direction, after neutralizing:  -5.19267285069332e-18


Cosine similarity between receptionist and gender_bias_direction, before neutralizing:  0.30167259871100655
Cosine similarity between receptionist and gender_bias_direction, after neutralizing:  1.4261260487443872e-17


Cosine similarity between babysitter and gender_bias_direction, before neutralizing:  0.2719660457769091
Cosine similarity between babysitter and gender_bias_direction, after neutralizing:  0.0


Cosine similarity between homemaker and gender_bias_direction, before neutralizing:  0.36747726202795
Cosine similarity between homemaker and gender_bias_direction, after neutralizing:  1.750335531198033e-17


Cosine similarity between nurse and gender_bias_direction, before neutralizing:  0.361028047426011
Cosine similarity between nurse and gender_bias_direction, afte

##### Comments

* After the neutralization step, the cosine similarity between each gender-neutral word and the gender bias direction drops to essentially zero (values around $10^{-17}$ are floating-point round-off).

### 3.2 Equalize Gender Word Pairs


Neutralizing "computer" reduces the gender stereotype associated with it, but it does not guarantee that the words of a pair such as ("he", "she") are equidistant from "computer": the two words can still differ in their components orthogonal to $g$. Therefore, in addition to neutralizing gender-neutral words, we also apply equalization to gendered word pairs such as (grandfather, grandmother) and (actor, actress), so that the words of each pair differ only in the gender direction.


The idea behind equalization is to make both words of a pair equidistant from $g_\perp$, the subspace orthogonal to the gender bias direction $g$: the two words receive the same component orthogonal to $g$ and equal-magnitude, opposite components along $g$. As a result, an equalized pair such as ("male", "female") is the same distance from every neutralized gender-neutral word, e.g. $e_{computer}^{debiased}$, $e_{professor}^{debiased}$ or $e_{scientist}^{debiased}$.

For details, refer to the paper.
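The `equalize` function in the next cell rescales the bias components with an element-wise division and takes `np.abs` inside the square root, because the GloVe vectors are not unit length. As we read the paper, its Equalize step assumes unit-length embeddings and sets, for each word $w$ of the pair, $w := \nu + \sqrt{1 - \|\nu\|^2}\,\frac{w_B - \mu_B}{\|w_B - \mu_B\|}$ with $\nu = \mu - \mu_B$ (the notebook's `mu_orth`). Below is a sketch of that formula; the name `equalize_paper` is hypothetical and the toy vectors are made up. It also checks the property described above: both equalized words end up the same distance from a word orthogonal to $g$.

```python
import numpy as np


def equalize_paper(e_w1, e_w2, g):
    """Equalize one word pair, following our reading of the paper's formula.

    Assumes e_w1 and e_w2 have unit length. Both outputs share the component
    nu orthogonal to g, have equal and opposite components along g, and have
    unit length.
    """
    g_unit = g / np.linalg.norm(g)
    mu = (e_w1 + e_w2) / 2.0
    mu_B = np.dot(mu, g_unit) * g_unit       # component of the mean along g
    nu = mu - mu_B                           # shared component orthogonal to g
    scale = np.sqrt(max(1.0 - np.dot(nu, nu), 0.0))
    equalized = []
    for e in (e_w1, e_w2):
        diff = np.dot(e, g_unit) * g_unit - mu_B   # e_B - mu_B
        norm = np.linalg.norm(diff)
        if norm == 0:
            raise ValueError("the two words have the same projection onto g")
        equalized.append(nu + scale * diff / norm)
    return equalized[0], equalized[1]


# Made-up unit vectors; g_toy is the first axis, n_toy a neutral word orthogonal to it
g_toy = np.array([1.0, 0.0, 0.0])
a, b = np.array([0.6, 0.8, 0.0]), np.array([-0.8, 0.0, 0.6])
a_eq, b_eq = equalize_paper(a, b, g_toy)
n_toy = np.array([0.0, 1.0, 0.0])
print(np.linalg.norm(a_eq - n_toy), np.linalg.norm(b_eq - n_toy))  # equal distances
```

To use this on the GloVe vectors, they would first need to be scaled to unit length.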

In [36]:
def equalize(pair,g, word_to_vec_dict):
    """
    Debias gender specific words using Equalization Method
    
    Input:
    pair - Gender specific words to debias
    g - gender bias direction/axis
    word_to_vec_dict - dictionary mapping words to their corresponding vectors
    
    Output
    e1_equalized -- word vector corresponding to the first word
    e2_equalized -- word vector corresponding to the second word
    """
    
    # Get the word vectors of the pair to be equalized
    w1, w2 = pair
    e_w1, e_w2 = word_to_vec_dict[w1], word_to_vec_dict[w2]
    
    # Compute the mean of the two word vectors
    mu = (e_w1 + e_w2)/2.0

    # Split the mean into its component along the gender bias axis (mu_B) and its orthogonal component (mu_orth)
    mu_B = np.divide(np.dot(mu, g),np.linalg.norm(g)**2)*g
    mu_orth = mu - mu_B

    # Project each word vector onto the gender bias axis
    e_w1B = np.divide(np.dot(e_w1, g),np.linalg.norm(g)**2)*g
    e_w2B = np.divide(np.dot(e_w2, g),np.linalg.norm(g)**2)*g

    # Rescale the bias components of the two words.
    # Note: the paper divides by the norm ||e_wB - mu_B|| and assumes unit-length vectors;
    # this version divides element-wise, and np.abs keeps the square root defined
    # for the non-normalized GloVe vectors.
    corrected_e_w1B = np.sqrt(np.abs(1-np.sum(mu_orth**2)))*np.divide(e_w1B-mu_B, np.abs(e_w1-mu_orth-mu_B))
    corrected_e_w2B = np.sqrt(np.abs(1-np.sum(mu_orth**2)))*np.divide(e_w2B-mu_B, np.abs(e_w2-mu_orth-mu_B))

    # Both words share mu_orth and differ only in their corrected bias components
    e1_equalized = corrected_e_w1B + mu_orth
    e2_equalized = corrected_e_w2B + mu_orth
                                                                
    return e1_equalized, e2_equalized

In [37]:
gendered_words = [('masculine','feminine'),('actor','actress'),('man','woman'),('grandfather','grandmother')]

for pair in gendered_words:
    
    w1,w2 = pair
        
    print("Cosine similarities before equalizing:")
    print( w1 ,",gender_bias_direction = ", cosine_similarity(word_to_vec_dict[w1], g))
    print( w2 ,",gender_bias_direction = ", cosine_similarity(word_to_vec_dict[w2], g))
    print('\n')
    
    w1_equalized, w2_equalized = equalize((w1, w2), g, word_to_vec_dict)
    
    print("Cosine similarities after equalizing:")
    print(w1,"_equalized, gender_bias_direction = ",cosine_similarity(w1_equalized, g))
    print(w2,"_equalized, gender_bias_direction = ",cosine_similarity(w2_equalized, g))
    print('---------------------------\n')

Cosine similarities before equalizing:
masculine ,gender_bias_direction =  0.20366255668907882
feminine ,gender_bias_direction =  0.3068077556241729


Cosine similarities after equalizing:
masculine _equalized, gender_bias_direction =  -0.5534152510908446
feminine _equalized, gender_bias_direction =  0.5505405761605079
---------------------------

Cosine similarities before equalizing:
actor ,gender_bias_direction =  -0.048377516575346544
actress ,gender_bias_direction =  0.4053361624508814


Cosine similarities after equalizing:
actor _equalized, gender_bias_direction =  -0.3644064148498128
actress _equalized, gender_bias_direction =  0.3688567757977562
---------------------------

Cosine similarities before equalizing:
man ,gender_bias_direction =  -0.02435875412347576
woman ,gender_bias_direction =  0.3979047171251496


Cosine similarities after equalizing:
man _equalized, gender_bias_direction =  -0.40760482687217325
woman _equalized, gender_bias_direction =  0.40550412629391397
--

##### Comments

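* After the equalization step, the two words of each pair have cosine similarities with $g$ of nearly equal magnitude and opposite sign (for example -0.553 and 0.551 for masculine and feminine): the male word now lies on the negative side of $g$ and the female word on the positive side, symmetrically.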

### Analysis of Gender Bias in Our Trained Model

In [145]:
our_trained_model_path = 'data/model.pkl'
our_model_type = 'our_trained_model'

_, word_to_vec_dict_custom = read_word_embedding(our_trained_model_path, our_model_type)

#### Word Analogy of Our Trained Model

In [143]:
woman = word_to_vec_dict_custom["woman"]
lady = word_to_vec_dict_custom["lady"]

he = word_to_vec_dict_custom['he']
she = word_to_vec_dict_custom['she']

father = word_to_vec_dict_custom["father"]
mother = word_to_vec_dict_custom["mother"]
man = word_to_vec_dict_custom['man']

king = word_to_vec_dict_custom['king']
queen = word_to_vec_dict_custom['queen']

france = word_to_vec_dict_custom["france"]
india = word_to_vec_dict_custom["india"]
paris = word_to_vec_dict_custom["paris"]
delhi = word_to_vec_dict_custom["delhi"]

print("cosine_similarity(king-man,queen-woman) = ", cosine_similarity(king-man, queen-woman))
print("cosine_similarity(he - she, father - mother) = ",cosine_similarity(he - she, father - mother))
print("cosine_similarity(france - paris, india - delhi) = ",cosine_similarity(france - paris, india - delhi))

cosine_similarity(king-man,queen-woman) =  0.9894141595481635
cosine_similarity(he - she, father - mother) =  0.7840329622083684
cosine_similarity(france - paris, india - delhi) =  -0.6901133171836422


##### Comments

We observe that $king - man$ has very high cosine similarity (0.99) with $queen - woman$, which suggests that our model has captured this gender analogy. However, $france - paris$ has a cosine similarity of -0.69 with $india - delhi$, i.e. the two differences point in roughly opposite directions, which indicates that the current checkpoint of our model has not learned the country-capital relationship. We believe training on a larger dataset would help mitigate this problem.

#### Finding Gender Bias Direction

Here we only use words that are present in our model's vocabulary, to avoid a `KeyError`. Otherwise we apply the same debiasing steps that we used for the pretrained GloVe model.

In [135]:
g1 = woman - man
g2 = mother - father

# Gender Bias Direction
g_custom = np.mean([g1, g2], axis=0)

##### Comment

To calculate the gender bias direction, we compute $g_1 = e_{woman} - e_{man}$ and $g_2 = e_{mother} - e_{father}$, and then average $g_1$ and $g_2$ to obtain $g_{custom}$.

##### Gendered Word Analysis

In [139]:
print ('Similarity between gender bias vector and Gendered Words\n')

# List of Gendered Words
name_list = ['queen','he', 'female','son']

for w in name_list:
    print (w, cosine_similarity(word_to_vec_dict_custom[w], g_custom))

Similarity between gender bias vector and Gendered Words

queen 0.9539632301982861
he -0.9347621670824781
female 0.5161396730720444
son -0.9629745472263926


##### Comment  
Words referring to women (queen, female) have positive cosine similarity with our gender bias direction $g_{custom}$, while words referring to men (he, son) have strongly negative cosine similarity. This matches the GloVe results and is expected, since $g_{custom}$ points from the male end towards the female end.

#### Gender Neutral Word Analysis

In [140]:
print('\n Similarity between Gender Bias Direction and Gender Neutral words:\n')
word_list = ['president','director', 'science', 'arts', 'pilot', 'computer', 'singer','army','professor']
for w in word_list:
    print (w, cosine_similarity(word_to_vec_dict_custom[w], g_custom))


 Similarity between Gender Bias Direction and Gender Neutral words:

president -0.9469821780988354
director 0.9652766500364007
science 0.4161949662159972
arts -0.9302221216207309
pilot 0.9697341256697147
computer -0.9647791208778314
singer -0.9308793410597123
army 0.9697341256697147
professor -0.22536222173972303


##### Comment

Here we compute the cosine similarity between $g_{custom}$ and some gender-neutral words. The magnitudes are much larger than with GloVe (mostly above 0.9), and several signs disagree with the GloVe results: director, pilot and army are strongly aligned with $g_{custom}$ (about 0.97), while arts and singer are strongly anti-aligned (about -0.93), the opposite of what we saw with GloVe. Words such as president and computer have strongly negative similarity, in line with GloVe. This suggests that our model's embeddings are less reliable than GloVe's.
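In the output above, pilot and army get exactly the same value (0.9697341256697147), and so does captain in the neutralization cell further below. We have not checked why; one possibility is that these words share a single vector in our model (for example, if words missing from the training data were mapped to a common vector). The following is a sketch of a check for that; `find_shared_vectors` is a hypothetical helper and the toy vectors are made up.

```python
import numpy as np
from collections import defaultdict


def find_shared_vectors(words, word_to_vec):
    """Group words whose embedding vectors are exactly identical.

    Returns a list of groups (lists of 2+ words) that share the same vector.
    Only exact equality is detected; merely parallel vectors are not grouped.
    """
    groups = defaultdict(list)
    for w in words:
        key = np.asarray(word_to_vec[w], dtype=np.float64).tobytes()
        groups[key].append(w)
    return [group for group in groups.values() if len(group) > 1]


# Made-up example in which "pilot" and "army" share a vector
toy = {"pilot": np.array([1.0, 2.0]), "army": np.array([1.0, 2.0]),
       "arts": np.array([0.0, 1.0])}
print(find_shared_vectors(["pilot", "army", "arts"], toy))  # [['pilot', 'army']]
```

On our model this would be called as `find_shared_vectors(word_list + ['captain'], word_to_vec_dict_custom)`.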

#### Neutralization of Gender-Neutral Words

In [141]:
gender_neutral_words = ['director','singer','captain','computer','president']

for w in gender_neutral_words:
    print("Cosine similarity between " + w + " and gender_bias_direction, before neutralizing: ", cosine_similarity(word_to_vec_dict_custom[w], g_custom))
    e_debiased = neutralize_embedding(w, g_custom, word_to_vec_dict_custom)
    print("Cosine similarity between " + w + " and gender_bias_direction, after neutralizing: ", cosine_similarity(e_debiased, g_custom))
    print("\n")

Cosine similarity between director and gender_bias_direction, before neutralizing:  0.9652766500364007
Cosine similarity between director and gender_bias_direction, after neutralizing:  -4.3002141702867013e-16


Cosine similarity between singer and gender_bias_direction, before neutralizing:  -0.9308793410597123
Cosine similarity between singer and gender_bias_direction, after neutralizing:  4.970600259101881e-16


Cosine similarity between captain and gender_bias_direction, before neutralizing:  0.9697341256697147
Cosine similarity between captain and gender_bias_direction, after neutralizing:  -1.2584502521366135e-15


Cosine similarity between computer and gender_bias_direction, before neutralizing:  -0.9647791208778314
Cosine similarity between computer and gender_bias_direction, after neutralizing:  6.5526270582148e-16


Cosine similarity between president and gender_bias_direction, before neutralizing:  -0.9469821780988354
Cosine similarity between president and gender_bias_direc

##### Comment
* After the neutralization step, the cosine similarity between each gender-neutral word and $g_{custom}$ drops to essentially zero (values around $10^{-16}$ to $10^{-15}$ are floating-point round-off).

#### Equalize Gender Word Pairs

In [142]:
gendered_words = [('male','female'),('he','she')]

for pair in gendered_words:
    
    w1,w2 = pair
        
    print("Cosine similarities before equalizing:")
    print( w1 ,",gender_bias_direction = ", cosine_similarity(word_to_vec_dict_custom[w1], g_custom))
    print( w2 ,",gender_bias_direction = ", cosine_similarity(word_to_vec_dict_custom[w2], g_custom))
    print('\n')
    
    w1_equalized, w2_equalized = equalize((w1, w2), g_custom, word_to_vec_dict_custom)
    
    print("Cosine similarities after equalizing:")
    print(w1,"_equalized, gender_bias_direction = ",cosine_similarity(w1_equalized, g_custom))
    print(w2,"_equalized, gender_bias_direction = ",cosine_similarity(w2_equalized, g_custom))
    print('---------------------------\n')

Cosine similarities before equalizing:
male ,gender_bias_direction =  0.8011600149806818
female ,gender_bias_direction =  0.5161396730720444


Cosine similarities after equalizing:
male _equalized, gender_bias_direction =  0.42334598229937503
female _equalized, gender_bias_direction =  -0.42357656917384223
---------------------------

Cosine similarities before equalizing:
he ,gender_bias_direction =  -0.9347621670824781
she ,gender_bias_direction =  -0.9744627890989416


Cosine similarities after equalizing:
he _equalized, gender_bias_direction =  0.5245384902363578
she _equalized, gender_bias_direction =  -0.5244428241151656
---------------------------



##### Comment
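* After the equalization step, the two words of each pair have cosine similarities with $g_{custom}$ of nearly equal magnitude and opposite sign. Unlike with GloVe, however, the male words (male, he) end up on the positive side of $g_{custom}$ and the female words (female, she) on the negative side. Equalization keeps whichever word of the pair had the larger projection onto the bias axis on the positive side, so this reflects the original vectors of our model rather than the equalization step itself.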