# Word2vec with gensim

In this Jupyter notebook you will use the [Gensim](https://radimrehurek.com/gensim/index.html) library to experiment with word2vec. The notebook focuses on the intuition behind the concepts rather than on implementation details. It is inspired by this [guide](https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html).

## 1. Installation and loading the model

In [6]:
#!pip install --upgrade gensim

In [7]:
import gensim.downloader as api

In [8]:
model = api.load('word2vec-google-news-300')

## 2. Similarity of words

In this section we will see how to compute the similarity between two words using a pretrained word embedding.

In [9]:
model.similarity("king", "queen")

0.6510957

In [10]:
model.similarity("king", "man")

0.22942671

In [11]:
model.similarity("king", "potato")

0.09978464

In [12]:
model.similarity("king", "king")

1.0
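To build intuition for these numbers: for word vectors, `similarity` reports the cosine of the angle between the two vectors. The sketch below computes that quantity by hand in plain Python. The three-dimensional vectors are invented for illustration and are not taken from the model, whose vectors have 300 dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration (NOT real word2vec values)
toy_king = [0.9, 0.8, 0.1]
toy_queen = [0.8, 0.9, 0.2]
toy_potato = [0.1, 0.0, 0.9]

sim_king_queen = cosine_similarity(toy_king, toy_queen)
sim_king_potato = cosine_similarity(toy_king, toy_potato)
sim_king_king = cosine_similarity(toy_king, toy_king)

print(sim_king_queen, sim_king_potato, sim_king_king)
```

Vectors pointing in nearly the same direction score close to 1, unrelated directions score close to 0, and a vector compared with itself scores 1 (up to floating-point rounding), which matches the pattern in the cells above.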

Now we will see how to find the words most similar to a given set of words.

In [13]:
model.most_similar(["king", "queen"], topn=5)

[('monarch', 0.7042065858840942),
 ('kings', 0.6780862808227539),
 ('princess', 0.6731551885604858),
 ('queens', 0.6679496765136719),
 ('prince', 0.6435247659683228)]

In [14]:
model.most_similar(["tomato", "carrot"], topn=5)

[('carrots', 0.7536594867706299),
 ('tomatoes', 0.712963879108429),
 ('celery', 0.7025030851364136),
 ('broccoli', 0.6796349883079529),
 ('cherry_tomatoes', 0.662927508354187)]
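One way to picture a query with several words is as follows: average the unit-length vectors of the query words, then rank every other word by its cosine similarity to that average. The sketch below follows that idea on a tiny invented vocabulary. The function `toy_most_similar` and the two-dimensional vectors are hypothetical, and this is a simplified illustration rather than Gensim's actual implementation.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def toy_most_similar(vectors, positive, topn=2):
    """Rank words by cosine similarity to the mean of the query vectors."""
    query_vecs = [unit(vectors[w]) for w in positive]
    dim = len(query_vecs[0])
    mean = unit([sum(v[i] for v in query_vecs) / len(query_vecs) for i in range(dim)])
    scores = []
    for word, vec in vectors.items():
        if word in positive:
            continue  # the query words themselves are excluded from the ranking
        scores.append((word, sum(a * b for a, b in zip(mean, unit(vec)))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Toy 2-D vectors, invented for illustration (NOT real word2vec values)
toy_vectors = {
    "tomato": [0.9, 0.1],
    "carrot": [0.8, 0.3],
    "celery": [0.85, 0.25],
    "king": [0.1, 0.9],
    "queen": [0.2, 0.95],
}

top = toy_most_similar(toy_vectors, ["tomato", "carrot"], topn=2)
print(top)
```

In this toy vocabulary the vegetable-like vector ranks first because it points in almost the same direction as the averaged query.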

You can also do other interesting things, such as finding the word that does not belong in a list.

In [15]:
model.doesnt_match(["summer", "fall", "spring", "air"])

'air'
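A rough way to understand this result: compute the mean of the (unit-length) vectors in the list and return the word whose vector is least similar to that mean. The sketch below applies this idea to invented two-dimensional vectors. `toy_doesnt_match` is a hypothetical helper written for illustration, not Gensim's code.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def toy_doesnt_match(vectors, words):
    """Return the word whose vector is least similar to the group mean."""
    unit_vecs = {w: unit(vectors[w]) for w in words}
    dim = len(next(iter(unit_vecs.values())))
    mean = unit([sum(v[i] for v in unit_vecs.values()) / len(words) for i in range(dim)])
    # Lowest cosine similarity to the mean = the odd one out
    return min(words, key=lambda w: sum(a * b for a, b in zip(unit_vecs[w], mean)))

# Toy 2-D vectors, invented for illustration (NOT real word2vec values)
toy_vectors = {
    "summer": [0.9, 0.2],
    "fall": [0.8, 0.3],
    "spring": [0.85, 0.15],
    "air": [0.1, 0.9],
}

odd_one = toy_doesnt_match(toy_vectors, ["summer", "fall", "spring", "air"])
print(odd_one)
```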

## Exercises

1. Use the word2vec model to rank the following 15 words by their similarity to the words "man" and "woman". For each pair, print its similarity.

In [None]:
words = [
"wife",
"husband",
"child",
"queen",
"king",
"man",
"woman",
"birth",
"doctor",
"nurse",
"teacher",
"professor",
"engineer",
"scientist",
"president"]


#### Auxiliary functions that I no longer use but want to keep

In [None]:
# # Making pairs from the words list
# def make_pairs(words):
#     word_pairs = []
#     for first_word in range(len(words)):
#         for second_word in range(first_word + 1, len(words)):
#             word_pairs.append((words[first_word], words[second_word]))
#     return word_pairs

# print(make_pairs(words))

[('wife', 'husband'), ('wife', 'child'), ('wife', 'queen'), ('wife', 'king'), ('wife', 'man'), ('wife', 'woman'), ('wife', 'birth'), ('wife', 'doctor'), ('wife', 'nurse'), ('wife', 'teacher'), ('wife', 'professor'), ('wife', 'engineer'), ('wife', 'scientist'), ('wife', 'president'), ('husband', 'child'), ('husband', 'queen'), ('husband', 'king'), ('husband', 'man'), ('husband', 'woman'), ('husband', 'birth'), ('husband', 'doctor'), ('husband', 'nurse'), ('husband', 'teacher'), ('husband', 'professor'), ('husband', 'engineer'), ('husband', 'scientist'), ('husband', 'president'), ('child', 'queen'), ('child', 'king'), ('child', 'man'), ('child', 'woman'), ('child', 'birth'), ('child', 'doctor'), ('child', 'nurse'), ('child', 'teacher'), ('child', 'professor'), ('child', 'engineer'), ('child', 'scientist'), ('child', 'president'), ('queen', 'king'), ('queen', 'man'), ('queen', 'woman'), ('queen', 'birth'), ('queen', 'doctor'), ('queen', 'nurse'), ('queen', 'teacher'), ('queen', 'professor
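As an alternative to the nested loops, the same unordered pairs can be produced with `itertools.combinations` from the standard library. The sketch below uses only the first four words of the list to keep the output short.

```python
from itertools import combinations

# First four words of the exercise list, to keep the example small
words_subset = ["wife", "husband", "child", "queen"]

# Every unordered pair, in the same order the nested loops would produce
word_pairs = list(combinations(words_subset, 2))
print(word_pairs)
```

With `n` words this yields `n * (n - 1) / 2` pairs, so the full 15-word list gives 105 pairs.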

In [None]:
# # Now I can calculate word similarity
#
# word_pairs = make_pairs(words)
# def calculate_similarity(word_pairs):
#     # Print the similarity of every pair and return all scores, not just the last one
#     similarities = {}
#     for word_1, word_2 in word_pairs:
#         similarities[(word_1, word_2)] = model.similarity(word_1, word_2)
#         print(f"Similarity between {word_1} and {word_2}: {similarities[(word_1, word_2)]}")
#     return similarities
#
# similarities = calculate_similarity(word_pairs)


Similarity between wife and husband: 0.8294166922569275
Similarity between wife and child: 0.3550868034362793
Similarity between wife and queen: 0.20636820793151855
Similarity between wife and king: 0.1500406712293625
Similarity between wife and man: 0.3292091488838196
Similarity between wife and woman: 0.4448239803314209
Similarity between wife and birth: 0.2527046501636505
Similarity between wife and doctor: 0.3103739619255066
Similarity between wife and nurse: 0.3347511887550354
Similarity between wife and teacher: 0.30123811960220337
Similarity between wife and professor: 0.17416977882385254
Similarity between wife and engineer: 0.15991957485675812
Similarity between wife and scientist: 0.15480250120162964
Similarity between wife and president: 0.16623905301094055
Similarity between husband and child: 0.3832300305366516
Similarity between husband and queen: 0.2445879429578781
Similarity between husband and king: 0.12284289300441742
Similarity between husband and man: 0.344997465610
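Printed scores like these become a ranking once they are sorted. The sketch below does this for a handful of the "wife" scores shown above, copied by hand and rounded to four decimals. The same sorting step would apply to scores computed against "man" or "woman".

```python
# Scores for "wife" copied from the output above, rounded to four decimals
wife_scores = {
    "husband": 0.8294,
    "child": 0.3551,
    "queen": 0.2064,
    "king": 0.1500,
    "man": 0.3292,
    "woman": 0.4448,
}

# Sort from most to least similar
ranking = sorted(wife_scores.items(), key=lambda item: item[1], reverse=True)

for rank, (word, score) in enumerate(ranking, start=1):
    print(f"{rank}. {word}: {score:.4f}")
```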

**2. Complete the following analogies on your own (without using the model)**

a. king is to throne as judge is to _

b. giant is to dwarf as genius is to _

c. French is to France as Spaniard is to _

d. bad is to good as sad is to _

e. nurse is to hospital as teacher is to _

f. universe is to planet as house is to _

**3. Now complete the analogies using a word2vec model**

Here is an example of how to do it. You can solve analogies such as "A is to B as C is to _" by computing C + B - A.

In [17]:
# man is to woman as king is to ___?
model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)

[('queen', 0.7118193507194519)]

In [18]:
# USA is to burger as Mexico is to ___?
model.most_similar(positive=["Mexico", "burger"], negative=["USA"], topn=1)

[('taco', 0.6266060471534729)]
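The query `positive=["king", "woman"], negative=["man"]` can be read as vector arithmetic: take `king`, add `woman`, subtract `man`, and look for the nearest remaining word. The sketch below reproduces that reasoning on invented two-dimensional vectors, where one axis loosely stands for "royalty" and the other for "gender". `toy_analogy` is a hypothetical helper, and it skips the normalization and averaging that Gensim applies internally.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy 2-D vectors, invented for illustration (NOT real word2vec values)
# axis 0 ~ "royalty", axis 1 ~ "gender"
toy_vectors = {
    "man": [0.1, 0.9],
    "woman": [0.1, -0.9],
    "king": [0.9, 0.9],
    "queen": [0.9, -0.9],
    "potato": [-0.8, 0.1],
}

def toy_analogy(vectors, a, b, c):
    """Solve 'a is to b as c is to ?' as the word nearest to c + b - a."""
    target = [vc + vb - va for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded from the candidates
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

answer = toy_analogy(toy_vectors, "man", "woman", "king")
print(answer)
```

Here the subtraction removes the "male" component from `king` and the addition puts the "female" component in its place, which lands on the toy `queen` vector.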