# Word2vec with gensim

In this Jupyter notebook you will use the [Gensim](https://radimrehurek.com/gensim/index.html) library to experiment with Word2Vec. The notebook focuses on building intuition for the concepts rather than on implementation details. It is inspired by this [guide](https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html).

## 1. Installing gensim and loading the model

```python
# Uncomment the next line to install or upgrade gensim
#!pip install --upgrade gensim
```

```python
import gensim.downloader as api
```

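```python
# Downloads the pretrained Google News vectors on first use (about 1.6 GB)
# and loads them as a KeyedVectors object
model = api.load('word2vec-google-news-300')
```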

## 2. Similarity of words

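In this section we compute the similarity between two words using pretrained word embeddings. `model.similarity` returns the cosine similarity of the two word vectors, so a value of 1.0 means the vectors point in exactly the same direction.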

```python
model.similarity("king", "queen")
```

```
0.6510957
```

```python
model.similarity("king", "man")
```

```
0.22942671
```

```python
model.similarity("king", "potato")
```

```
0.09978464
```

```python
model.similarity("king", "king")
```

```
1.0
```
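Because `model.similarity` is the cosine similarity of two word vectors, the score can be reproduced by hand. The sketch below is a minimal pure-Python version run on made-up 3-dimensional toy vectors (illustrative assumptions, not values from the Google News model); with the loaded model you could pass `model["king"]` and `model["queen"]` instead, which are 300-dimensional NumPy arrays.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, purely illustrative (not taken from the Google News model)
king = [0.8, 0.6, 0.1]
queen = [0.7, 0.7, 0.2]
potato = [0.0, 0.1, 0.9]

print(cosine_similarity(king, queen))   # high: the vectors point in similar directions
print(cosine_similarity(king, potato))  # low: the vectors are close to orthogonal
print(cosine_similarity(king, king))    # approximately 1.0 for identical vectors
```

The identical-vector case is why `model.similarity("king", "king")` returns 1.0.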

Next, we find the words most similar to a given set of words.

```python
model.most_similar(["king", "queen"], topn=5)
```

```
[('monarch', 0.7042065858840942),
 ('kings', 0.6780862808227539),
 ('princess', 0.6731551885604858),
 ('queens', 0.6679496765136719),
 ('prince', 0.6435247659683228)]
```

```python
model.most_similar(["tomato", "carrot"], topn=5)
```

```
[('carrots', 0.7536594867706299),
 ('tomatoes', 0.712963879108429),
 ('celery', 0.7025030851364136),
 ('broccoli', 0.6796349883079529),
 ('cherry_tomatoes', 0.662927508354187)]
```
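When given several positive words, gensim's `most_similar` averages their unit-normalized vectors and ranks the rest of the vocabulary by cosine similarity to that mean, leaving out the input words themselves. The sketch below mimics the idea on a tiny hypothetical vocabulary; `toy_vectors` and `most_similar_to_all` are illustrative names, not part of gensim, and this is not gensim's actual implementation.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def most_similar_to_all(vectors, positive, topn=5):
    """Rank vocabulary words by cosine similarity to the mean of the
    unit-normalized vectors of the `positive` words (inputs excluded)."""
    unit = [normalize(vectors[w]) for w in positive]
    mean = normalize([sum(col) / len(unit) for col in zip(*unit)])
    scores = [
        (word, dot(normalize(vec), mean))
        for word, vec in vectors.items()
        if word not in positive
    ]
    # Highest cosine similarity first
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical 3-d vocabulary, for illustration only
toy_vectors = {
    "king": [0.9, 0.5, 0.1],
    "queen": [0.8, 0.7, 0.1],
    "monarch": [0.85, 0.6, 0.15],
    "carrot": [0.1, 0.0, 0.9],
    "tomato": [0.05, 0.1, 0.95],
}

print(most_similar_to_all(toy_vectors, ["king", "queen"], topn=2))
```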

You can also do more interesting things, such as finding the word that does not belong in a list.

```python
model.doesnt_match(["summer", "fall", "spring", "air"])
```

```
'air'
```
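`doesnt_match` works along similar lines: gensim averages the unit-normalized vectors of the listed words and returns the word whose vector has the lowest cosine similarity to that mean. Here is a minimal pure-Python sketch on hypothetical toy vectors; `odd_one_out` is an illustrative name, not a gensim function.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def odd_one_out(vectors, words):
    """Return the word whose unit vector has the lowest cosine similarity
    to the (normalized) mean of all the words' unit vectors."""
    unit = {w: normalize(vectors[w]) for w in words}
    mean = normalize([sum(col) / len(words) for col in zip(*unit.values())])
    return min(words, key=lambda w: sum(a * b for a, b in zip(unit[w], mean)))

# Toy vectors: the three seasons cluster together, "air" points elsewhere
toy_vectors = {
    "summer": [0.9, 0.2, 0.1],
    "fall": [0.8, 0.3, 0.1],
    "spring": [0.85, 0.25, 0.2],
    "air": [0.1, 0.2, 0.95],
}

print(odd_one_out(toy_vectors, ["summer", "fall", "spring", "air"]))  # air
```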

## Exercises

**1. Use the Word2Vec model to rank the following 15 words by their similarity to "man" and to "woman". For each pair, print its similarity.**

```python
words = [
    "wife",
    "husband",
    "child",
    "queen",
    "king",
    "man",
    "woman",
    "birth",
    "doctor",
    "nurse",
    "teacher",
    "professor",
    "engineer",
    "scientist",
    "president",
]
```

#### Auxiliary functions (no longer used, kept for reference)

```python
# # Build every unordered pair of words from the list
# def make_pairs(words):
#     word_pairs = []
#     for i in range(len(words)):
#         for j in range(i + 1, len(words)):
#             word_pairs.append((words[i], words[j]))
#     return word_pairs
#
# print(make_pairs(words))
```

```python
# # Now I can calculate the similarity of each pair
#
# word_pairs = make_pairs(words)
#
# def calculate_similarity(word_pairs):
#     similarities = []
#     for word_1, word_2 in word_pairs:
#         similarity = model.similarity(word_1, word_2)
#         print(f"Similarity between {word_1} and {word_2}: {similarity}")
#         # Collect every pair's score so all of them are returned
#         similarities.append((word_1, word_2, similarity))
#     return similarities
#
# print(calculate_similarity(word_pairs))
```
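The commented-out `make_pairs` helper builds every unordered pair of words. If it is ever needed again, the standard library's `itertools.combinations` produces the same pairs in the same order in one call; here is a small sketch on a shortened word list.

```python
from itertools import combinations

words = ["wife", "husband", "child", "queen"]

# All unordered pairs (i < j), in the same order as the nested loops in make_pairs
word_pairs = list(combinations(words, 2))
print(word_pairs)

# With the model loaded, each pair could then be scored with model.similarity(w1, w2)
```

For the full 15-word list this yields 15 * 14 / 2 = 105 pairs.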


```python
similarities_man = []
similarities_woman = []

for word in words:
    sim_man = model.similarity("man", word)
    sim_woman = model.similarity("woman", word)

    similarities_man.append((word, sim_man))
    similarities_woman.append((word, sim_woman))

# Sort similarities in descending order.
# The lambda tells sorted() to order the tuples by their second element, i.e. the similarity value.
similarities_man = sorted(similarities_man, key=lambda x: x[1], reverse=True)
similarities_woman = sorted(similarities_woman, key=lambda x: x[1], reverse=True)

# Print the rankings
print("'Man' similarity ranking:")
for word, similarity in similarities_man:
    print(f"Word: {word}, Similarity with 'man': {similarity}")

print("\n'Woman' similarity ranking:")
for word, similarity in similarities_woman:
    print(f"Word: {word}, Similarity with 'woman': {similarity}")
```


```
'Man' similarity ranking:
Word: man, Similarity with 'man': 1.0
Word: woman, Similarity with 'man': 0.7664012312889099
Word: husband, Similarity with 'man': 0.34499746561050415
Word: wife, Similarity with 'man': 0.3292091488838196
Word: child, Similarity with 'man': 0.3163333833217621
Word: doctor, Similarity with 'man': 0.31448960304260254
Word: nurse, Similarity with 'man': 0.25472286343574524
Word: teacher, Similarity with 'man': 0.25000131130218506
Word: king, Similarity with 'man': 0.22942671179771423
Word: queen, Similarity with 'man': 0.16658204793930054
Word: scientist, Similarity with 'man': 0.1582496464252472
Word: engineer, Similarity with 'man': 0.15128928422927856
Word: birth, Similarity with 'man': 0.11078789830207825
Word: professor, Similarity with 'man': 0.09415861964225769
Word: president, Similarity with 'man': 0.028424618765711784

'Woman' similarity ranking:
Word: woman, Similarity with 'woman': 1.0
Word: man, Similarity with 'woman': 0.7664012312889099
Word: husba
```
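The two ranking loops above differ only in the target word, so they could be folded into one helper. In this sketch `rank_by_similarity` is a hypothetical name and `similarity` is any function that takes two words; with the loaded model you would pass `model.similarity`, while the toy lookup table below (made-up scores, not model output) stands in for it so the example runs without downloading the model.

```python
def rank_by_similarity(target, words, similarity):
    """Return (word, score) pairs sorted from most to least similar to target."""
    scored = [(word, similarity(target, word)) for word in words]
    # Sort by the score (second tuple element), highest first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-in for model.similarity (made-up scores)
toy_scores = {("man", "man"): 1.0, ("man", "king"): 0.3, ("man", "potato"): 0.1}

def toy_similarity(a, b):
    return toy_scores[(a, b)]

for word, score in rank_by_similarity("man", ["potato", "king", "man"], toy_similarity):
    print(f"Word: {word}, Similarity with 'man': {score}")
```

With the real model, `rank_by_similarity("woman", words, model.similarity)` should produce the same ordering as the second ranking above.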

**2. Complete the following analogies on your own (without using the model)**

a. king is to throne as judge is to `courts`

b. giant is to dwarf as genius is to `silly`

c. French is to France as Spaniard is to `Spain`

d. bad is to good as sad is to `happy`

e. nurse is to hospital as teacher is to `school`

f. universe is to planet as house is to `neighborhood`

```python
# a. king is to throne as judge is to `courts`
model.most_similar(positive=["throne", "judge"], negative=["king"], topn=1)
```

```
[('appellate_court', 0.584525465965271)]
```

```python
# b. giant is to dwarf as genius is to `silly`
model.most_similar(positive=["dwarf", "genius"], negative=["giant"], topn=1)
```

```
[('savant', 0.44152510166168213)]
```

```python
# c. French is to France as Spaniard is to `Spain`
model.most_similar(positive=["France", "Spaniard"], negative=["French"], topn=1)
```

```
[('rider_Dani_Pedrosa', 0.5646752715110779)]
```

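```python
# d. bad is to good as sad is to `happy`
# "A is to B as C is to ?" is computed as B + C - A, i.e. good + sad - bad.
# The Google News vocabulary is case-sensitive, so the words are lowercase.
model.most_similar(positive=["good", "sad"], negative=["bad"], topn=1)
```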

```python
# e. nurse is to hospital as teacher is to `school`
model.most_similar(positive=["hospital", "teacher"], negative=["nurse"], topn=1)
```

```
[('school', 0.60170978307724)]
```

```python
# f. universe is to planet as house is to `neighborhood`
model.most_similar(positive=["planet", "house"], negative=["universe"], topn=1)
```

```
[('bungalow', 0.5428239703178406)]
```

**3. Now complete the analogies using a word2vec model**

Here is an example of how to do it. You can solve analogies of the form "A is to B as C is to _" by computing B + C - A, which in gensim is `most_similar(positive=[B, C], negative=[A])`.
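As a rough picture of what that call does: gensim combines the unit-normalized vectors with weights +1 for the positive words and -1 for the negative word, then returns the vocabulary word closest to the result by cosine similarity, excluding the input words. The sketch below reproduces that idea on a hypothetical 3-dimensional toy vocabulary where axis 0 loosely encodes "royalty", axis 1 "female" and axis 2 a shared "person" component; the vectors and the `solve_analogy` name are illustrative assumptions, not gensim's code.

```python
import math

def normalize(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def solve_analogy(vectors, a, b, c):
    """'a is to b as c is to ?': find the word closest (cosine) to b + c - a,
    using unit-normalized vectors and excluding the three input words."""
    va, vb, vc = (normalize(vectors[w]) for w in (a, b, c))
    target = normalize([y + z - x for x, y, z in zip(va, vb, vc)])
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(
        candidates,
        key=lambda w: sum(p * q for p, q in zip(normalize(vectors[w]), target)),
    )

# Hypothetical toy vectors: [royalty, female, person]
toy_vectors = {
    "man": [0.0, 0.0, 1.0],
    "woman": [0.0, 1.0, 1.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "prince": [1.0, 0.0, 0.8],
    "apple": [0.0, 0.2, -1.0],
}

# man is to woman as king is to ___?
print(solve_analogy(toy_vectors, "man", "woman", "king"))  # queen
```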

```python
# man is to woman as king is to ___?
model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
```

```
[('queen', 0.7118193507194519)]
```

```python
# USA is to burger as Mexico is to ___?
model.most_similar(positive=["Mexico", "burger"], negative=["USA"], topn=1)
```

```
[('taco', 0.6266060471534729)]
```