## Word analogy task

In this notebook, I implement the word analogy task using a set of pretrained word embeddings (GloVe).

In the word analogy task, our goal is to complete the sentence *a* is to *b* as *c* is to **____**. For example, '*boy* is to *girl* as *king* is to *queen*'.

Technically, given words *a*, *b* and *c*, we look for a word *d* such that the difference between the embeddings of *a* and *b* is as close as possible to the difference between the embeddings of *c* and *d*.

Let the embeddings be $e_a, e_b, e_c, e_d$. We measure the similarity between $e_b - e_a$ and $e_d - e_c$ using cosine similarity.


Cosine similarity between two vectors is defined as follows: 

$$\text{CosineSimilarity}(x, y) = \frac {x \cdot y} {||x||_2 \, ||y||_2} = \cos(\theta) $$

Here $\theta$ is the angle between $x$ and $y$, so the value always lies between -1 and 1. If $x$ and $y$ are very similar, their cosine similarity will be close to 1; if they are dissimilar, the cosine similarity will take a smaller value.
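
Putting the two definitions together, the task can be written as a search over the vocabulary that skips the three input words. The symbol $V$ for the vocabulary is introduced here only for illustration:

$$ d = \arg\max_{w \in V \setminus \{a, b, c\}} \text{CosineSimilarity}(e_b - e_a, \; e_w - e_c) $$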

Let's load the packages we need.

In [2]:
import numpy as np
from w2v_utils import *


Let's load the pre-trained GloVe word embeddings by Jeffrey Pennington, Richard Socher, and Christopher D. Manning (https://nlp.stanford.edu/projects/glove/).

In [3]:
words, word_to_vec_map = read_glove_vecs('data/glove.6B.50d.txt')

Create a function to compute cosine similarity as explained above.

In [4]:
def cosine_similarity(x, y):
    """
    Cosine similarity reflects the degree of similarity between x and y
        
    Arguments:
        x -- a word vector of shape (n,)          
        y -- a word vector of shape (n,)

    Returns:
        cosine_similarity -- the cosine similarity between x and y defined by the formula above.
    """
    
    # Dot product between x and y
    dot = np.dot(x, y)

    # L2 norms of x and y
    norm_x = np.linalg.norm(x)
    norm_y = np.linalg.norm(y)

    # Cosine similarity as defined by the formula above
    return dot / (norm_x * norm_y)
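
As a quick sanity check of the formula, the sketch below evaluates cosine similarity on a few toy vectors. The helper name `cos_sim` and the vectors are made up for illustration; the results should come out close to 1 for parallel vectors, 0 for orthogonal ones and -1 for opposite ones.

```python
import numpy as np

def cos_sim(x, y):
    """Cosine similarity of two 1-D vectors (hypothetical helper for this sketch)."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0])

# Same direction, different length: the scale cancels out
same_direction = cos_sim(x, 2 * x)

# Perpendicular vectors: the dot product is zero
orthogonal = cos_sim(np.array([1.0, 0.0]), np.array([0.0, 1.0]))

# Opposite direction
opposite = cos_sim(x, -x)

print(same_direction, orthogonal, opposite)
```

Because the norms in the denominator cancel any rescaling, cosine similarity compares directions only, which is why it suits embeddings whose lengths vary.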

Let's now define the function to compute word analogies.

In [5]:
def complete_analogy(word_a, word_b, word_c, word_to_vec_map):
    """
    Performs the word analogy task as explained above: a is to b as c is to ____. 
    
    Arguments:
    word_a -- a word, string
    word_b -- a word, string
    word_c -- a word, string
    word_to_vec_map -- dictionary that maps words to their corresponding vectors. 
    
    Returns:
    best_word -- the word such that e_b - e_a is close to e_best_word - e_c, as measured by cosine similarity
    """
    
    # convert words to lower case
    word_a, word_b, word_c = word_a.lower(), word_b.lower(), word_c.lower()
    
    # Get the word embeddings e_a, e_b and e_c
    e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]
    
    words = word_to_vec_map.keys()
    max_cosine_sim = -100              # Initialize max_cosine_sim to a large negative number
    best_word = None                   # Initialize best_word with None, it will help keep track of the word to output

    # loop over the whole word vector set
    for w in words:        
        # Skip the input words so that best_word is never one of them
        if w in [word_a, word_b, word_c]:
            continue
        
        # Compute cosine similarity between the vector (e_b - e_a) and the vector ((w's vector representation) - e_c)  
        cosine_sim = cosine_similarity(e_b - e_a, word_to_vec_map[w] - e_c)
        
        # If cosine_sim exceeds the best value seen so far,
        # record it and remember the current word
        if cosine_sim > max_cosine_sim:
            max_cosine_sim = cosine_sim
            best_word = w
        
    return best_word

Run the cell below to test the code; this may take 1-2 minutes.

In [6]:
triads_to_try = [('italy', 'italian', 'spain'), ('india', 'delhi', 'japan'),
                 ('man', 'woman', 'boy'), ('small', 'smaller', 'large')]
for triad in triads_to_try:
    print('{} -> {} :: {} -> {}'.format(*triad, complete_analogy(*triad, word_to_vec_map)))

italy -> italian :: spain -> spanish
india -> delhi :: japan -> tokyo
man -> woman :: boy -> girl
small -> smaller :: large -> larger
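
The word-by-word loop is what makes this cell slow. One possible speed-up, sketched here under assumptions, is to stack the candidate embeddings into a matrix and compute all cosine similarities at once. The function name `complete_analogy_vectorized` and the tiny `toy_map` are hypothetical and only illustrate the idea; this is not the notebook's function.

```python
import numpy as np

def complete_analogy_vectorized(word_a, word_b, word_c, word_to_vec_map):
    """Vectorized variant of the analogy search (illustrative sketch)."""
    word_a, word_b, word_c = (w.lower() for w in (word_a, word_b, word_c))

    # Candidate words: everything except the three inputs
    vocab = [w for w in word_to_vec_map if w not in (word_a, word_b, word_c)]
    E = np.stack([word_to_vec_map[w] for w in vocab])        # shape (V, n)

    target = word_to_vec_map[word_b] - word_to_vec_map[word_a]  # e_b - e_a
    diffs = E - word_to_vec_map[word_c]                         # e_w - e_c for every w

    # Cosine similarity of every row of diffs with target
    norms = np.linalg.norm(diffs, axis=1) * np.linalg.norm(target)
    norms = np.maximum(norms, 1e-12)   # guard against division by zero
    sims = (diffs @ target) / norms

    return vocab[int(np.argmax(sims))]

# Made-up 2-D embeddings, chosen so that queen - king equals woman - man
toy_map = {
    'man': np.array([1.0, 0.0]),
    'woman': np.array([1.0, 1.0]),
    'king': np.array([3.0, 0.0]),
    'queen': np.array([3.0, 1.0]),
    'apple': np.array([0.0, -2.0]),
}
print(complete_analogy_vectorized('man', 'woman', 'king', toy_map))
```

On the real GloVe dictionary the matrix `E` could be built once and reused across queries, at the cost of holding all embeddings in one array.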


Let's try more triads.

In [13]:
triads = [('husband', 'wife','prince'),('America','Chicago','Canada'),('small','large','tiny')]
for triad1 in triads:
    print('{} is to {} as {} is to {}'.format(*triad1, complete_analogy(*triad1, word_to_vec_map)))

husband is to wife as prince is to duchess
America is to Chicago as Canada is to toronto
small is to large as tiny is to surpluses


## Gender Debiasing word vectors

Let's first see how the GloVe word embeddings relate to gender. A difference vector such as $e_{woman}-e_{man}$, where $e_{woman}$ is the word vector of *woman* and $e_{man}$ is the word vector of *man*, roughly encodes the concept of "gender".

To rely less on a single word pair, we compute $g_1 = e_{woman}-e_{man}$, $g_2 = e_{mother}-e_{father}$ and $g_3 = e_{girl}-e_{boy}$, and average them to obtain $g$. Thus $g$ roughly represents the concept of "female gender".


In [16]:
g1 = word_to_vec_map['woman'] - word_to_vec_map['man']
g2 = word_to_vec_map['mother'] - word_to_vec_map['father']
g3 = word_to_vec_map['girl'] - word_to_vec_map['boy']
g = (g1 + g2 + g3) / 3
print(g.shape)

(50,)


Now let's see how $g$ compares with some first names and with other words, including professions.

In [18]:
name_list = ['john', 'marie', 'sophie', 'ronaldo', 'priya', 'rahul', 'danielle', 'reza', 'katy', 'yasmin','lipstick', 'guns', 'science', 'arts', 'literature', 'warrior','doctor', 'tree', 'receptionist', 
             'technology',  'fashion', 'teacher', 'engineer', 'pilot', 'computer', 'singer']

for w in name_list:
    print (w, cosine_similarity(word_to_vec_map[w], g))

john -0.30873091089769905
marie 0.34257515107827113
sophie 0.4116200252265308
ronaldo -0.29083978511732383
priya 0.1964679344860046
rahul -0.1949214763863341
danielle 0.2923957653171285
reza -0.1679382162425299
katy 0.31132430605664346
yasmin 0.19658379893678699
lipstick 0.4136681512625245
guns -0.08755154639507809
science -0.058374259643848236
arts 0.011760468125783748
literature 0.023169467893751714
warrior -0.1656463810030795
doctor 0.0772141272665668
tree 0.03538042107098229
receptionist 0.30167259871100655
technology -0.1619210846255818
fashion 0.1416547219136271
teacher 0.10545901736578715
engineer -0.22639944157426764
pilot -0.03699357317847416
computer -0.16821031921735138
singer 0.2009300079322624


We see that female names have positive cosine similarity with $g$ while male names have negative cosine similarity. But words like "technology", "engineer" and "computer" also have negative cosine similarity with $g$, while a word like "receptionist" has a positive one. Hence these vectors carry a gender bias, and we want to reduce it. We will use the algorithm by [Bolukbasi et al., 2016](https://arxiv.org/abs/1607.06520).




To neutralise an embedding vector $e$, we first find the component of $e$ in the direction of $g$ and then subtract it from $e$:

$$e^{bias\_component} = \frac{e \cdot g}{||g||_2^2} * g$$
$$e^{debiased} = e - e^{bias\_component}$$
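
To see why this removes the gender component, take the dot product of $e^{debiased}$ with $g$. It vanishes, so the cosine similarity between the neutralized vector and $g$ is zero (up to floating-point error in practice):

$$e^{debiased} \cdot g = e \cdot g - \frac{e \cdot g}{||g||_2^2} \, (g \cdot g) = e \cdot g - e \cdot g = 0$$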



In [19]:
def neutralize(word, g, word_to_vec_map):
    """
    Removes the bias of "word" by projecting it on the space orthogonal to the bias axis. 
    This function ensures that gender neutral words are zero in the gender subspace.
    
    Arguments:
        word -- string indicating the word to debias
        g -- numpy-array of shape (50,), corresponding to the bias axis (such as gender)
        word_to_vec_map -- dictionary mapping words to their corresponding vectors.
    
    Returns:
        e_debiased -- neutralized word vector representation of the input "word"
    """
    
    # Select the vector representation of "word"
    e = word_to_vec_map[word]

    # Compute the component of e along g, using the formula given above
    e_biascomponent = (np.dot(e, g) / np.linalg.norm(g) ** 2) * g

    # Neutralize e by subtracting its bias component;
    # e_debiased is the projection of e onto the space orthogonal to g
    e_debiased = e - e_biascomponent
    
    return e_debiased

In [26]:
e = "science"
print("cosine similarity between " + e + " and g, before neutralizing: ", cosine_similarity(word_to_vec_map["science"], g))

e_debiased = neutralize("science", g, word_to_vec_map)
print("cosine similarity between " + e + " and g, after neutralizing: ", cosine_similarity(e_debiased, g))

print('\n')


e = "receptionist"
print("cosine similarity between " + e + " and g, before neutralizing: ", cosine_similarity(word_to_vec_map["receptionist"], g))

e_debiased = neutralize("receptionist", g, word_to_vec_map)
print("cosine similarity between " + e + " and g, after neutralizing: ", cosine_similarity(e_debiased, g))


cosine similarity between science and g, before neutralizing:  -0.058374259643848236
cosine similarity between science and g, after neutralizing:  0.0


cosine similarity between receptionist and g, before neutralizing:  0.30167259871100655
cosine similarity between receptionist and g, after neutralizing:  0.0
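
The same projection can be applied to many words at once. The sketch below is an illustration on random toy data, not part of the original notebook; the name `neutralize_all` and the arrays `E_toy` and `g_toy` are hypothetical. It removes the component along the bias axis from every row of a matrix and checks that the remaining dot products with the axis vanish up to floating-point error.

```python
import numpy as np

def neutralize_all(E, g):
    """Remove the component along g from every row of E (illustrative sketch).

    E -- array of shape (V, n), one embedding per row
    g -- array of shape (n,), the bias axis
    """
    coeffs = (E @ g) / np.dot(g, g)     # projection coefficient per row, shape (V,)
    return E - np.outer(coeffs, g)      # subtract each row's bias component

rng = np.random.default_rng(0)
E_toy = rng.normal(size=(4, 5))         # four made-up 5-dimensional embeddings
g_toy = rng.normal(size=5)              # made-up bias axis

E_debiased = neutralize_all(E_toy, g_toy)
residual = E_debiased @ g_toy           # should be numerically zero for every row
print(np.allclose(residual, 0.0))
```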


### Equalization algorithm for gender-specific words

Equalization is applied to pairs of gender-specific words such as "husband" and "wife", so that a neutralized word like "babysit" ends up equidistant from both words of the pair. Technically, we make the two embeddings of the pair equidistant from the 49-dimensional subspace $g_\perp$ ($g_\perp$ is orthogonal to $g$).

We achieve this with the formulae below (see Bolukbasi et al., 2016 for details):

$$ \mu = \frac{e_{w1} + e_{w2}}{2}$$ 

$$ \mu_{B} = \frac {\mu \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis}$$ 

$$\mu_{\perp} = \mu - \mu_{B} $$

$$ e_{w1B} = \frac {e_{w1} \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis}$$ 
$$ e_{w2B} = \frac {e_{w2} \cdot \text{bias_axis}}{||\text{bias_axis}||_2^2} *\text{bias_axis}$$


$$e_{w1B}^{corrected} = \sqrt{ |{1 - ||\mu_{\perp} ||^2_2} |} * \frac{e_{w1B} - \mu_B} {|(e_{w1} - \mu_{\perp}) - \mu_B|} $$


$$e_{w2B}^{corrected} = \sqrt{ |{1 - ||\mu_{\perp} ||^2_2} |} * \frac{e_{w2B} - \mu_B} {|(e_{w2} - \mu_{\perp}) - \mu_B|} $$

$$e_1 = e_{w1B}^{corrected} + \mu_{\perp} $$
$$e_2 = e_{w2B}^{corrected} + \mu_{\perp} $$




In [27]:
def equalize(pair, bias_axis, word_to_vec_map):
    """
    Debias gender-specific words by following the equalize method given by the formulas above.
    
    Arguments:
    pair -- pair of strings of gender specific words to debias, e.g. ("actress", "actor") 
    bias_axis -- numpy-array of shape (50,), vector corresponding to the bias axis, e.g. gender
    word_to_vec_map -- dictionary mapping words to their corresponding vectors
    
    Returns
    e_1 -- word vector corresponding to the first word
    e_2 -- word vector corresponding to the second word
    """
    
    # Step 1: Select the vector representations of the two words
    w1, w2 = pair
    e_w1, e_w2 = word_to_vec_map[w1], word_to_vec_map[w2]

    # Step 2: Compute the mean of e_w1 and e_w2
    mu = (e_w1 + e_w2) / 2

    # Step 3: Project mu onto the bias axis and onto its orthogonal complement
    bias_norm_sq = np.linalg.norm(bias_axis) ** 2
    mu_B = (np.dot(mu, bias_axis) / bias_norm_sq) * bias_axis
    mu_orth = mu - mu_B

    # Step 4: Project e_w1 and e_w2 onto the bias axis
    e_w1B = (np.dot(e_w1, bias_axis) / bias_norm_sq) * bias_axis
    e_w2B = (np.dot(e_w2, bias_axis) / bias_norm_sq) * bias_axis

    # Step 5: Rescale the bias components with the "corrected" formulas above.
    # Note: np.abs is element-wise, so the division below is element-wise too.
    scale = np.sqrt(np.abs(1 - np.linalg.norm(mu_orth) ** 2))
    corrected_e_w1B = scale * (e_w1B - mu_B) / np.abs((e_w1 - mu_orth) - mu_B)
    corrected_e_w2B = scale * (e_w2B - mu_B) / np.abs((e_w2 - mu_orth) - mu_B)

    # Step 6: Add the shared orthogonal part back to each corrected bias component
    e1 = corrected_e_w1B + mu_orth
    e2 = corrected_e_w2B + mu_orth

    return e1, e2

In [30]:
print("cosine similarities before equalizing:")
print("cosine_similarity(word_to_vec_map[\"husband\"], gender) = ", cosine_similarity(word_to_vec_map["husband"], g))
print("cosine_similarity(word_to_vec_map[\"wife\"], gender) = ", cosine_similarity(word_to_vec_map["wife"], g))
print('\n')
e1, e2 = equalize(("husband", "wife"), g, word_to_vec_map)
print("cosine similarities after equalizing:")
print("cosine_similarity(e1, gender) = ", cosine_similarity(e1, g))
print("cosine_similarity(e2, gender) = ", cosine_similarity(e2, g))

cosine similarities before equalizing:
cosine_similarity(word_to_vec_map["husband"], gender) =  0.19353194031263787
cosine_similarity(word_to_vec_map["wife"], gender) =  0.31026048013658003


cosine similarities after equalizing:
cosine_similarity(e1, gender) =  -0.6637685201826582
cosine_similarity(e2, gender) =  0.6626334732252718


This completes gender debiasing: **neutralize** handles gender-neutral words and **equalize** handles gender-specific pairs. We don't need to call **neutralize** separately on the words of an equalized pair, since **equalize** already removes the bias component from their shared part $\mu_{\perp}$.

**References**:
- The debiasing algorithm is from Bolukbasi et al., 2016, [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf)
- The GloVe word embeddings are by Jeffrey Pennington, Richard Socher, and Christopher D. Manning (https://nlp.stanford.edu/projects/glove/).
