# **Exploring Analogies and Other Word Pair Relationships**

This assignment uses Magnitude to load a vector model trained with word2vec, then queries, manipulates, and analyzes the resulting word vectors.

# **Google Colab Installation**

In [45]:
# Install Magnitude on Google Colab
! echo "Installing Magnitude.... (please wait, can take a while)"
! (curl https://raw.githubusercontent.com/plasticityai/magnitude/master/install-colab.sh | /bin/bash 1>/dev/null 2>/dev/null)
! echo "Done installing Magnitude."
! wget "http://magnitude.plasticity.ai/word2vec/GoogleNews-vectors-negative300.magnitude"

Installing Magnitude.... (please wait, can take a while)
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   137  100   137    0     0    311      0 --:--:-- --:--:-- --:--:--   311
Done installing Magnitude.
--2020-11-18 17:00:51--  http://magnitude.plasticity.ai/word2vec/GoogleNews-vectors-negative300.magnitude
Resolving magnitude.plasticity.ai (magnitude.plasticity.ai)... 52.216.128.210
Connecting to magnitude.plasticity.ai (magnitude.plasticity.ai)|52.216.128.210|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4211335168 (3.9G) [application/x-www-form-urlencoded]
Saving to: ‘GoogleNews-vectors-negative300.magnitude.1’


2020-11-18 17:05:17 (15.2 MB/s) - ‘GoogleNews-vectors-negative300.magnitude.1’ saved [4211335168/4211335168]



In [46]:
from pymagnitude import Magnitude  # import only the class used below
file_path = "GoogleNews-vectors-negative300.magnitude"
vectors = Magnitude(file_path)

# **Querying**

***Query for the vector of a key***

In [47]:
vectors.query("king")

array([ 4.340640e-02,  1.026280e-02,  2.965300e-03,  4.811720e-02,
       -8.832700e-03, -1.244990e-02,  3.852740e-02, -6.830620e-02,
        1.766540e-02,  1.251719e-01, -8.344790e-02, -1.043099e-01,
       -6.124000e-02, -8.580300e-03, -5.787520e-02, -5.854810e-02,
        1.194520e-02,  1.798100e-03,  1.598300e-02,  4.441580e-02,
        4.710770e-02,  3.886390e-02,  2.052550e-02,  4.710770e-02,
        3.482610e-02, -6.090350e-02, -8.681280e-02,  2.060960e-02,
        1.177693e-01, -1.072540e-02,  3.600370e-02,  2.128260e-02,
        4.290170e-02,  1.379583e-01, -1.110396e-01,  2.893760e-02,
        1.345930e-02,  2.018900e-03,  2.422680e-02,  5.955760e-02,
        4.778070e-02, -7.974660e-02,  9.758020e-02,  4.912660e-02,
        1.177693e-01, -8.243800e-03, -3.785440e-02,  1.144040e-02,
       -1.884310e-02,  5.278600e-03, -5.585630e-02,  5.451030e-02,
       -8.950460e-02,  6.940000e-03, -5.619280e-02,  4.679000e-04,
       -4.979960e-02, -1.960020e-02,  1.480530e-02, -8.496200e

***Query for the vector of multiple keys***

In [48]:
vectors.query(["Man", "is", "to", "king"])

array([[ 0.0326088 ,  0.028302  , -0.0150739 , ..., -0.0177656 ,
        -0.0433759 , -0.0032493 ],
       [ 0.003746  , -0.0389198 ,  0.0913317 , ...,  0.0059677 ,
         0.0871803 ,  0.0568229 ],
       [ 0.01091041, -0.02409765,  0.05447518, ...,  0.03390222,
        -0.03805145, -0.08748387],
       [ 0.0434064 ,  0.0102628 ,  0.0029653 , ..., -0.0296106 ,
         0.0314612 ,  0.0868128 ]], dtype=float32)

***Query the distance of two or multiple keys***

In [49]:
vectors.distance("banana", ["apple", "tiger"])

[0.96763563, 1.265893]

***Query the similarity of two or multiple keys***

In [50]:
vectors.similarity("banana", ["apple", "tiger"])

[0.5318407, 0.19875745]

The result is plausible: banana and apple are both fruits, so they have a higher similarity (and a smaller distance) than banana and tiger, which is an animal.
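The distance and similarity outputs above appear to be two views of the same quantity. Assuming the vectors are unit-normalized, that `distance` is Euclidean, and that `similarity` is cosine similarity (assumptions about Magnitude's behaviour, not something this notebook verifies), the two should be linked by d = sqrt(2 - 2s). The sketch below checks that identity using only the numbers printed above; `distance_from_similarity` is a helper name introduced here for illustration.

```python
import math

# Values copied from the two cell outputs above (banana vs. apple / tiger).
similarities = {"apple": 0.5318407, "tiger": 0.19875745}
distances = {"apple": 0.96763563, "tiger": 1.265893}


def distance_from_similarity(cos_sim):
    """Euclidean distance between two unit vectors with the given cosine similarity.

    Assumes both vectors have length 1, so |u - v|^2 = 2 - 2 * cos(u, v).
    """
    return math.sqrt(2.0 - 2.0 * cos_sim)


for word, sim in similarities.items():
    predicted = distance_from_similarity(sim)
    print(f"banana vs {word}: predicted {predicted:.4f}, reported {distances[word]:.4f}")
```

If the assumptions hold, the predicted and reported distances agree to within rounding, which means the two queries rank neighbours in the same order.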

***Query for the most similar (nearest neighbors) keys***

In [51]:
vectors.most_similar("chocolate", topn = 10) # Top 10

[('dark_chocolate', 0.76897097),
 ('chocolates', 0.7569216),
 ('Chocolate', 0.70808077),
 ('caramel', 0.67320126),
 ('ice_cream', 0.66110814),
 ('caramels', 0.66045356),
 ('chip_cookies', 0.65967846),
 ('chocolate_truffle', 0.64146787),
 ('choccie', 0.6404877),
 ('Callebaut_chocolate', 0.6401943)]

***Query for the most similar keys given positive and negative examples (which, incidentally, solves analogies)***

For example, man is to brother as woman is to [blank]

In [52]:
vectors.most_similar(positive = ["woman", "brother"], negative = ["man"])

[('sister', 0.81032145),
 ('daughter', 0.7646755),
 ('mother', 0.7524209),
 ('son', 0.7238258),
 ('niece', 0.72159415),
 ('husband', 0.71414834),
 ('father', 0.70660734),
 ('aunt', 0.6844728),
 ('cousin', 0.6844366),
 ('eldest_daughter', 0.6790663)]

Given the results, the "blank" in this analogy is most likely to be "sister".
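One way to read the query above is as vector arithmetic: add the `positive` vectors, subtract the `negative` ones, and rank the remaining words by cosine similarity to the result. The sketch below illustrates that idea on hypothetical 2-D toy vectors (the real model is 300-dimensional, and Magnitude's internal implementation may differ in its details); `toy_vectors`, `cosine`, and `solve_analogy` are names made up for this example.

```python
import math

# Hypothetical toy embeddings: axis 0 loosely encodes gender,
# axis 1 loosely encodes sibling (+) versus parent (-).
toy_vectors = {
    "man":     [1.0, 0.0],
    "woman":   [-1.0, 0.0],
    "brother": [1.0, 1.0],
    "sister":  [-1.0, 1.0],
    "father":  [1.0, -1.0],
    "mother":  [-1.0, -1.0],
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def solve_analogy(vectors, positive, negative):
    """Rank candidate words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    # The query words themselves are excluded from the candidates.
    excluded = set(positive) | set(negative)
    candidates = [w for w in vectors if w not in excluded]
    return sorted(candidates, key=lambda w: cosine(vectors[w], target), reverse=True)


ranking = solve_analogy(toy_vectors, positive=["woman", "brother"], negative=["man"])
print(ranking[0])
```

With these toy vectors, woman + brother - man lands exactly on the vector for "sister", so it ranks first.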

# **Questions**

***What is the dimensionality of these word embeddings? Provide an integer answer.***

In [53]:
len(vectors)

3000000

In [54]:
vectors.dim

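300

The embeddings are 300-dimensional (`vectors.dim`). `len(vectors)` returns the vocabulary size, 3,000,000 keys, not the dimensionality.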

***What are the top-5 most similar words to picnic (not including picnic itself)?***

In [55]:
vectors.most_similar("picnic", topn = 10) # Top 10; the first five answer the question

[('picnics', 0.7400875),
 ('picnic_lunch', 0.721374),
 ('Picnic', 0.700534),
 ('potluck_picnic', 0.6683274),
 ('picnic_supper', 0.65189123),
 ('picnicking', 0.63550216),
 ('cookout', 0.63243484),
 ('Hiking_biking_camping', 0.6256069),
 ('barbeque', 0.62256277),
 ('barbecue', 0.6195759)]

The top-5 most similar words to picnic are 'picnics', 'picnic_lunch', 'Picnic', 'potluck_picnic', and 'picnic_supper'.

***According to the word embeddings, which of these words is not like the others? ['tissue', 'papyrus', 'manila', 'newsprint', 'parchment', 'gazette']***

In [56]:
vectors.doesnt_match(['tissue', 'papyrus','manila', 'newsprint', 'parchment', 'gazette'])

'tissue'

According to the word embeddings, 'tissue' is the word that is not like the others.
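A common way to implement an odd-one-out query is to unit-normalize each word vector, average them, and return the word least similar to that mean. The sketch below assumes Magnitude's `doesnt_match` works along these lines, which is not confirmed here. The 2-D vectors are hypothetical and deliberately constructed so that "tissue" is the outlier, so the example shows the mechanism only and does not explain why the real model picks "tissue".

```python
import math

# Hypothetical toy embeddings, constructed so that "tissue" points elsewhere.
toy_vectors = {
    "papyrus":   [0.9, 0.1],
    "parchment": [0.8, 0.2],
    "newsprint": [0.7, 0.3],
    "tissue":    [0.1, 0.9],
}


def unit(v):
    """Scale a non-zero vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def odd_one_out(vectors, words):
    """Return the word whose unit vector is least similar to the group mean."""
    units = {w: unit(vectors[w]) for w in words}
    dim = len(next(iter(units.values())))
    # Mean of the unit vectors, re-normalized so dot products are cosines.
    mean = unit([sum(u[i] for u in units.values()) / len(units) for i in range(dim)])
    scores = {w: sum(a * b for a, b in zip(u, mean)) for w, u in units.items()}
    return min(scores, key=scores.get)


print(odd_one_out(toy_vectors, ["papyrus", "parchment", "newsprint", "tissue"]))
```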

***Solve the following analogy: leg is to jump as X is to throw.***

In [57]:
vectors.most_similar(positive = ["leg", "throw"], negative = ["jump"])

[('forearm', 0.48294652),
 ('shin', 0.47376165),
 ('elbow', 0.4679689),
 ('metacarpal_bone', 0.46781474),
 ('metacarpal_bones', 0.46605822),
 ('ankle', 0.46434426),
 ('shoulder', 0.46183354),
 ('thigh', 0.45393682),
 ('knee', 0.4455707),
 ('ulna_bone', 0.4423491)]

Given the results, X is most likely to be 'forearm'.