# Word2vec with gensim

In this Jupyter notebook you will use the [Gensim](https://radimrehurek.com/gensim/index.html) library to experiment with word2vec. The notebook focuses on the intuition behind the concepts rather than on implementation details. It is inspired by this [guide](https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html).

## 1. Installation and loading the model

In [None]:
#!pip install --upgrade gensim

Collecting gensim
  Downloading gensim-4.3.3-cp311-cp311-macosx_11_0_arm64.whl.metadata (8.1 kB)
Collecting numpy<2.0,>=1.18.5 (from gensim)
  Downloading numpy-1.26.4-cp311-cp311-macosx_11_0_arm64.whl.metadata (114 kB)
Collecting scipy<1.14.0,>=1.7.0 (from gensim)
  Downloading scipy-1.13.1-cp311-cp311-macosx_12_0_arm64.whl.metadata (60 kB)
Collecting smart-open>=1.8.1 (from gensim)
  Using cached smart_open-7.1.0-py3-none-any.whl.metadata (24 kB)
Collecting wrapt (from smart-open>=1.8.1->gensim)
  Using cached wrapt-1.17.2-cp311-cp311-macosx_11_0_arm64.whl.metadata (6.4 kB)
Downloading gensim-4.3.3-cp311-cp311-macosx_11_0_arm64.whl (24.0 MB)
Downloading numpy-1.26.4-cp311-cp311-macosx_11_0_arm64.whl (14.0 MB)

In [2]:
import gensim.downloader as api

In [3]:
model = api.load('word2vec-google-news-300')



## 2. Similarity of words

In this section we will compute the similarity between two words using a pretrained word embedding.

In [4]:
model.similarity("king", "queen")

0.6510956

In [5]:
model.similarity("king", "man")

0.2294267

In [6]:
model.similarity("king", "potato")

0.09978465

In [7]:
model.similarity("king", "king")

0.99999994
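The scores above are cosine similarities between the two word vectors. As a minimal sketch of what that means, the snippet below computes cosine similarity by hand on toy 3-dimensional vectors. The vectors and the helper name `cosine_similarity` are invented for illustration; the real model uses 300-dimensional vectors learned from text.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration (not taken from the real model).
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.2],
    "potato": [0.1, 0.0, 0.9],
}

sim_king_queen = cosine_similarity(toy_vectors["king"], toy_vectors["queen"])
sim_king_potato = cosine_similarity(toy_vectors["king"], toy_vectors["potato"])
sim_king_king = cosine_similarity(toy_vectors["king"], toy_vectors["king"])

print(sim_king_queen, sim_king_potato, sim_king_king)
```

Vectors that point in similar directions score close to 1, and unrelated ones score close to 0. A word compared with itself should score exactly 1; the value 0.99999994 in the cell above is most likely a rounding effect of the 32-bit floats the model stores.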

Next, we will find the words most similar to a given set of words.

In [8]:
model.most_similar(["king", "queen"], topn=5)

[('monarch', 0.7042065858840942),
 ('kings', 0.6780861020088196),
 ('princess', 0.6731551885604858),
 ('queens', 0.6679496765136719),
 ('prince', 0.6435247659683228)]

In [9]:
model.most_similar(["tomato", "carrot"], topn=5)

[('carrots', 0.7536594271659851),
 ('tomatoes', 0.7129638195037842),
 ('celery', 0.7025030851364136),
 ('broccoli', 0.6796351075172424),
 ('cherry_tomatoes', 0.6629275679588318)]
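To build intuition for what happens when several words are passed at once: to our understanding, the query vectors are normalized and averaged, and every other word is ranked by cosine similarity to that average. The sketch below reproduces this idea on toy vectors. The data and the helper names `unit` and `most_similar_to_all` are hypothetical and are not part of Gensim.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def most_similar_to_all(vectors, query_words, topn=2):
    """Rank words by cosine similarity to the mean of the normalized query vectors."""
    dims = len(next(iter(vectors.values())))
    normalized = [unit(vectors[w]) for w in query_words]
    mean = unit([sum(v[i] for v in normalized) / len(normalized) for i in range(dims)])
    scores = []
    for word, vec in vectors.items():
        if word in query_words:
            continue  # the query words themselves are excluded from the ranking
        scores.append((word, sum(a * b for a, b in zip(unit(vec), mean))))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Toy vectors, invented for illustration (not taken from the real model).
toy_vectors = {
    "tomato": [0.9, 0.2, 0.1],
    "carrot": [0.8, 0.3, 0.0],
    "celery": [0.85, 0.25, 0.1],
    "king": [0.1, 0.9, 0.3],
    "car": [0.0, 0.2, 0.9],
}

ranking = most_similar_to_all(toy_vectors, ["tomato", "carrot"], topn=2)
print(ranking)
```

In this toy setup "celery" ranks first because its vector points almost in the same direction as the average of "tomato" and "carrot".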

You can also do other interesting things, such as finding the word that does not belong in a list.

In [10]:
model.doesnt_match(["summer", "fall", "spring", "air"])

'air'
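One way to picture how the odd word can be found: average the normalized vectors of all the words in the list, then pick the word whose vector is least similar to that average. The following is a minimal sketch of that idea on toy 2-dimensional vectors. The data and the function name `odd_one_out` are invented for illustration, and the exact Gensim implementation may differ in its details.

```python
import math

def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def odd_one_out(vectors, words):
    """Return the word whose vector is least similar to the mean of the group."""
    normalized = {w: unit(vectors[w]) for w in words}
    dims = len(vectors[words[0]])
    mean = unit([sum(normalized[w][i] for w in words) / len(words) for i in range(dims)])
    # For unit vectors the dot product equals the cosine similarity.
    return min(words, key=lambda w: sum(a * b for a, b in zip(normalized[w], mean)))

# Toy vectors, invented for illustration (not taken from the real model).
toy_vectors = {
    "summer": [0.9, 0.1],
    "fall": [0.8, 0.2],
    "spring": [0.85, 0.15],
    "air": [0.1, 0.9],
}

outlier = odd_one_out(toy_vectors, ["summer", "fall", "spring", "air"])
print(outlier)
```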

## Exercises

1. Use the word2vec model to rank the following 15 words by their similarity to the words "man" and "woman". For each pair, print its similarity.

In [11]:
words = [
"wife",
"husband",
"child",
"queen",
"king",
"man",
"woman",
"birth",
"doctor",
"nurse",
"teacher",
"professor",
"engineer",
"scientist",
"president"]
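As one possible starting point for this exercise, the sketch below ranks candidate words against a target word using any similarity function passed in. The helper name `rank_by_similarity` is hypothetical, and the stand-in function `toy_similarity` uses made-up scores so the snippet runs without downloading the model.

```python
def rank_by_similarity(similarity, target, candidates):
    """Return (word, score) pairs sorted from most to least similar to target."""
    scored = [(word, similarity(target, word)) for word in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Invented scores, used only so that this sketch runs without the real model.
toy_scores = {("man", "husband"): 0.6, ("man", "king"): 0.3, ("man", "nurse"): 0.2}

def toy_similarity(a, b):
    """Stand-in for model.similarity that looks up a made-up score."""
    return toy_scores[(a, b)]

ranking = rank_by_similarity(toy_similarity, "man", ["nurse", "husband", "king"])
for word, score in ranking:
    print(f"man / {word}: {score:.2f}")
```

In the notebook you would pass the real function and the real list instead, for example `rank_by_similarity(model.similarity, "man", words)`, and repeat the call with "woman" as the target.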

**2. Complete the following analogies on your own (without using the model)**

a. king is to throne as judge is to _

b. giant is to dwarf as genius is to _

c. French is to France as Spaniard is to _

d. bad is to good as sad is to _

e. nurse is to hospital as teacher is to _

f. universe is to planet as house is to _

**3. Now complete the analogies using a word2vec model**

Here is an example of how to do it. You can solve analogies such as "A is to B as C is to _" by computing B + C - A.

In [12]:
# man is to woman as king is to ___?
model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)

[('queen', 0.7118192911148071)]

In [13]:
# USA is to burger as Mexico is to ___?
model.most_similar(positive=["Mexico", "burger"], negative=["USA"], topn=1)

[('taco', 0.6266060471534729)]
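To see why adding and subtracting vectors can answer an analogy, here is a minimal sketch on toy 2-dimensional vectors in which the first coordinate loosely stands for "royalty" and the second for "gender". The vectors and the function name `solve_analogy` are invented for illustration. Unlike Gensim, this simplified version does not normalize the vectors before combining them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by ranking words against b + c - a."""
    target = [vb + vc - va for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded, otherwise one of them often wins.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Toy vectors, invented for illustration: [royalty, gender].
toy_vectors = {
    "man": [0.1, 1.0],
    "woman": [0.1, -1.0],
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "throne": [1.0, 0.0],
}

answer = solve_analogy(toy_vectors, "man", "woman", "king")
print(answer)
```

Subtracting "man" from "king" removes the gender component while keeping royalty, and adding "woman" puts the opposite gender component back, so the result lands closest to "queen".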