### Welcome to your first hands-on on word embeddings.
#### In this hands-on you will use pretrained GloVe word vectors from Stanford NLP, which you can find [here](https://nlp.stanford.edu/projects/glove/)
#### Each word vector has dimension 50
#### You will perform the following operations:
    - Load the pretrained vectors from the text file
    - Write a function to find the cosine similarity between two word vectors
    - Write a function to solve analogy problems such as King : Queen :: Men : __?__

### Task1
- A text file containing the trained word vectors is provided as word2vec.txt in the working directory.
- Each line in the file contains space-separated values: the first value is the word and the remaining values are its vector representation.
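
To make the file format concrete, here is a minimal sketch that parses a single line by hand. The sample line and the helper name `parse_line` are made up for illustration, and the vector is shortened to 3 values instead of 50.

```python
import numpy as np

def parse_line(line):
    """Split one space-separated line into (word, vector)."""
    # Hypothetical helper: the first token is the word, the rest is the vector.
    parts = line.rstrip().split(" ")
    word = parts[0]
    vector = np.array([float(x) for x in parts[1:]], dtype=np.float32)
    return word, vector

# Made-up sample line (the real file has 50 values per word).
word, vector = parse_line("father 0.095496 0.70418 -0.40777\n")
print(word, vector.shape, vector.dtype)  # father (3,) float32
```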

### Define a function get_word_vectors()
    parameters: file_name  
    returns: word_to_vec: dictionary whose keys are the words and whose values are the corresponding word vectors, each a 1-d array with elements of type float32.  

In [1]:
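import numpy as np
import pandas as pd
def get_word_vectors(file_name):
    # quoting=3 (csv.QUOTE_NONE) treats quote characters in words as plain text;
    # index_col=0 uses the word itself as the row label.
    df = pd.read_csv(file_name, sep=" ", quoting=3, header=None, index_col=0)
    # Map each word to its vector as a 1-d float32 array, as required above.
    word_to_vec = {word: row.values.astype(np.float32) for word, row in df.iterrows()}

    return word_to_vec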

### Using the function you defined above, read the word vectors from the file word2vec.txt and assign them to the variable word_to_vec

### Expected output  (showing only first few values of vectors)
   Father:  [ 0.095496   0.70418   -0.40777   -0.80844    1.256      0.77071 ...]  
   mother:  [ 0.4336     1.0727    -0.6196    -0.80679    1.2519     1.3767 ....]  

   

In [2]:
word_to_vec = get_word_vectors('word2vec.txt')
father = word_to_vec["father"]
mother = word_to_vec["mother"]
print("Father: ", father)
print("mother: ", mother)

Father:  [ 0.095496   0.70418   -0.40777   -0.80844    1.256      0.77071
 -1.0695     0.76847   -0.87813   -0.0080954  0.43884    1.0476
 -0.45071   -0.58931    0.83246   -0.038442  -0.73533    0.26389
  0.12617    0.57623   -0.23866    1.0922    -0.3367     0.081537
  0.84798   -2.4795    -0.40351   -0.84087    0.12034    0.29074
  1.9711    -0.50886   -0.45977   -0.13617    0.55613    0.22924
 -0.18947    0.43544    0.65151    0.043537  -0.1162     0.72196
 -0.66163   -0.17272    0.27367   -0.28169   -0.82025   -1.5089
  0.052787  -0.035579 ]
mother:  [ 0.4336     1.0727    -0.6196    -0.80679    1.2519     1.3767
 -0.93533    0.76088   -0.0056654 -0.063649   0.30297    0.52401
  0.2843    -0.38162    0.98797    0.093184  -1.1464     0.070523
  0.58012    0.50644   -0.24026    1.7344     0.020735   0.43704
  1.2148    -2.2483    -0.41168   -0.24922    0.31225   -0.49464
  2.0441    -0.012111  -0.19556    0.085665   0.27682    0.015702
  0.0067683  0.12759    0.87008   -0.40641   -0.


### Task 2 Determine the cosine similarity between two word vectors
- The formula for cosine similarity is given by
  score = $\large \frac{U \cdot V}{||U|| \, ||V||}$ where $U \cdot V$ is the dot product and $||U|| = \sqrt{\sum_i U_i^2}$ and $||V|| = \sqrt{\sum_i V_i^2}$ are the Euclidean norms of the individual vectors
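
As a quick sanity check, cosine similarity is the dot product divided by the product of the vector lengths, so it depends only on the angle between the vectors. The sketch below tries this on made-up 2-d vectors; the name `cosine_sketch` is hypothetical and uses `np.linalg.norm` for the lengths.

```python
import numpy as np

def cosine_sketch(u, v):
    # Dot product divided by the product of the Euclidean norms.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Made-up vectors chosen so the angles are easy to see.
u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
w = np.array([0.0, 2.0])

print(cosine_sketch(u, v))      # about 0.7071 (1/sqrt(2)): 45 degrees apart
print(cosine_sketch(u, w))      # 0.0: orthogonal vectors
print(cosine_sketch(v, 3 * v))  # about 1.0: same direction, different length
```

The last case shows that rescaling a vector does not change the score.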
  

### Define a function named cosine_similarity()
    - parameters u, v are the word vectors whose similarity has to be determined
    - returns - score: cosine similarity of u and v

In [4]:
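def cosine_similarity(u, v):
    # Dot product of the two vectors
    dot = np.dot(u, v)

    # Euclidean (L2) norm of each vector
    norm_u = np.sqrt(np.sum(u * u))
    norm_v = np.sqrt(np.sum(v * v))

    # Cosine of the angle between u and v
    score = dot / (norm_u * norm_v)

    return score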

#### Run the cell below to find the similarity between the word vectors paris and rome
### Expected output
   similarity score : 0.7099411

In [5]:
paris = word_to_vec["paris"]
rome = word_to_vec["rome"]
print("similarity score :", cosine_similarity(paris, rome))

similarity score : 0.7099411341712598


### Task 3
In the word analogy task, we complete an analogy of the form a : b :: c : __?__. In detail, we are trying to find a word d such that the word vectors $u_1, v_1, u_2, v_2$ of a, b, c and d are related in the following manner: $u_1 - v_1 \approx u_2 - v_2$. We measure the similarity between $u_1 - v_1$ and $u_2 - v_2$ using cosine similarity.
#### As an example, to find the best possible word for the analogy King : Queen :: Men : __?__ you will perform the following steps:
- Extract the word vectors of the three words king, queen and men
- Compute the element-wise difference between the word vectors king and queen as V1
- Compute the element-wise difference between the word vector men and each word vector in the word_to_vec dictionary as V2 (while doing so, exclude the words of interest, i.e. king, queen and men)
- Compute the cosine similarity between V1 and V2, and choose the word from the word_to_vec dictionary that gives the maximum similarity.
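
The steps above can be sketched on a toy vocabulary. The 2-d vectors below are made up for illustration (they are not GloVe values), and the helper `cosine` is a stand-in defined only for this example.

```python
import numpy as np

# Made-up 2-d vectors: axis 0 loosely means "royalty", axis 1 "gender".
toy_vectors = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "men":   np.array([0.0, 1.0]),
    "women": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# V1: difference between king and queen
v1 = toy_vectors["king"] - toy_vectors["queen"]

scores = {}
for word, vec in toy_vectors.items():
    if word in ("king", "queen", "men"):  # exclude the words of interest
        continue
    v2 = toy_vectors["men"] - vec         # V2 for this candidate word
    scores[word] = cosine(v1, v2)

# Choose the candidate with the maximum similarity between V1 and V2
best_word = max(scores, key=scores.get)
print(best_word)  # women
```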
### Define the function named find_analogy()
    - parameters: word_1 - string corresponding to word vector $u_1$
                  word_2 - string corresponding to word vector $v_1$
                  word_3 - string corresponding to word vector $u_2$
                  word_to_vec_map - dictionary of words and their corresponding vectors
    - returns: best_word - the word whose vector $v_2$ makes $u_1 - v_1$ closest to $u_2 - v_2$, as measured by cosine similarity


In [9]:
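def find_analogy(word_1, word_2, word_3, word_to_vec_map):
    # Lower-case the inputs so they match the dictionary keys
    word_a, word_b, word_c = word_1.lower(), word_2.lower(), word_3.lower()

    # Look up the vectors of the three given words
    e_a, e_b, e_c = word_to_vec_map[word_a], word_to_vec_map[word_b], word_to_vec_map[word_c]

    words = word_to_vec_map.keys()
    max_cosine_sim = -100  # any value below -1, the smallest possible cosine similarity
    best_word = None
    for w in words:
        # Skip the input words themselves
        if w in [word_a, word_b, word_c]:
            continue

        # Compare (e_b - e_a) with (e_w - e_c); flipping both signs leaves the cosine unchanged
        cosine_sim = cosine_similarity(e_b - e_a, word_to_vec_map[w] - e_c)

        # Keep the candidate with the highest similarity so far
        if cosine_sim > max_cosine_sim:
            max_cosine_sim = cosine_sim
            best_word = w
    return best_word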
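
Looping over every word in Python can be slow for a large vocabulary. One possible alternative, shown here only as a sketch, stacks the candidate vectors into a matrix and computes all similarities at once. The name `find_analogy_vectorized` and the toy vectors are hypothetical, and the sketch assumes no candidate has exactly the same vector as `word_3` (that would produce a zero-length difference).

```python
import numpy as np

def find_analogy_vectorized(word_1, word_2, word_3, word_to_vec_map):
    # Hypothetical variant of find_analogy(); assumes no candidate vector
    # equals the vector of word_3 (avoids division by zero).
    a, b, c = word_1.lower(), word_2.lower(), word_3.lower()
    candidates = [w for w in word_to_vec_map if w not in (a, b, c)]
    matrix = np.stack([word_to_vec_map[w] for w in candidates])

    target = word_to_vec_map[b] - word_to_vec_map[a]
    diffs = matrix - word_to_vec_map[c]  # broadcasting subtracts e_c from every row
    sims = (diffs @ target) / (np.linalg.norm(diffs, axis=1) * np.linalg.norm(target))
    return candidates[int(np.argmax(sims))]

# Made-up 2-d vectors for illustration only.
toy = {
    "king":  np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "men":   np.array([0.0, 1.0]),
    "women": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.2]),
}
result = find_analogy_vectorized("queen", "king", "women", toy)
print(result)  # men
```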

### Run the code below to evaluate the function you defined above

#### Expected output:
    father -> son :: mother -> daughter
    india -> delhi :: japan -> tokyo

In [13]:
print('{} -> {} :: {} -> {}'.format('father', 'son', 'mother', find_analogy('father', 'son', 'mother', word_to_vec)))
print('{} -> {} :: {} -> {}'.format('india', 'delhi', 'japan', find_analogy('india', 'delhi', 'japan', word_to_vec)))

word1 = find_analogy("spain", 'india', 'tokyo', word_to_vec)
word2 = find_analogy("small", 'smaller', 'large', word_to_vec)

# Save the two answers, one per line
with open("output.txt", 'w') as file:
    file.write(word1 + '\n')
    file.write(word2)

father -> son :: mother -> daughter
india -> delhi :: japan -> tokyo
