<a href="https://colab.research.google.com/github/ShivinM-17/nlp-practices/blob/main/word_embeddings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import gdown

## Implementing Word2Vec analogies (from Google)

### Loading the dataset

In [None]:
!gdown https://drive.google.com/uc?id=0B7XkCwpI5KDYNlNUTTlSS21pQmM

Downloading...
From: https://drive.google.com/uc?id=0B7XkCwpI5KDYNlNUTTlSS21pQmM
To: /content/GoogleNews-vectors-negative300.bin.gz
100% 1.65G/1.65G [00:23<00:00, 71.6MB/s]


In [None]:
!gunzip GoogleNews-vectors-negative300.bin.gz

### Importing and using the necessary modules for word embeddings

In [None]:
# API for interacting with the word embeddings we have downloaded
from gensim.models import KeyedVectors

In [None]:
word_vectors = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin',
    binary=True
)

In [None]:
word_vectors

<gensim.models.keyedvectors.KeyedVectors at 0x78c2aa8bf820>

In [None]:
def find_analogies(w1, w2, w3):
  # Solve the analogy: w1 - w2 = ? - w3
  # e.g. king - man = ? - woman
  # (or) ? = king - man + woman
  # most_similar returns (word, score) pairs sorted by similarity; r[0][0] is the top word
  r = word_vectors.most_similar(positive=[w1, w3], negative=[w2])
  print(f"{w1} - {w2} = {r[0][0]} - {w3}")
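To make the arithmetic behind `find_analogies` concrete, here is a minimal sketch on a handful of invented 2-D vectors. The names `toy_vectors`, `cosine`, and `toy_analogy` are hypothetical and not part of gensim. The sketch computes `w1 - w2 + w3` directly and picks the remaining word with the highest cosine similarity. gensim's `most_similar` also leaves the query words out of its results, but as far as I know it normalizes the vectors before combining them, so treat this as a simplification rather than the library's exact procedure.

```python
import math

# Toy 2-D vectors invented for illustration; the real model uses 300 dimensions.
toy_vectors = {
    "king":  [0.9, 0.8],
    "queen": [0.9, 0.2],
    "man":   [0.1, 0.8],
    "woman": [0.1, 0.2],
    "apple": [-0.7, 0.1],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def toy_analogy(w1, w2, w3, vectors=toy_vectors):
    # Build the target vector: ? = w1 - w2 + w3
    target = [a - b + c for a, b, c in zip(vectors[w1], vectors[w2], vectors[w3])]
    # Skip the three query words, then return the closest remaining word.
    candidates = [w for w in vectors if w not in (w1, w2, w3)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(toy_analogy("king", "man", "woman"))
```

With these hand-picked vectors, `king - man + woman` lands exactly on the `queen` vector, which is why the toy analogy resolves cleanly. Real embeddings are noisier, as the outputs in this notebook show.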


In [None]:
find_analogies('king', 'man', 'woman')

king - man = queen - woman


In [None]:
find_analogies('france', 'paris', 'rome')

france - paris = italy - rome


In [None]:
find_analogies('paris', 'france', 'english')

paris - france = grammer - english


In [None]:
find_analogies('france', 'french', 'english')

france - french = england - english


In [None]:
find_analogies('japan', 'japanese', 'chinese')

japan - japanese = tibet - chinese


In [None]:
find_analogies('japan', 'japanese', 'italian')

japan - japanese = italy - italian


In [None]:
find_analogies('december', 'november', 'june')

december - november = september - june


In [None]:
find_analogies('man', 'woman', 'aunt')

man - woman = uncle - aunt


In [None]:
find_analogies('man', 'woman', 'sister')

man - woman = brother - sister


In [None]:
find_analogies('nephew', 'niece', 'girlfriend')

nephew - niece = boyfriend - girlfriend


### Finding the most similar words for a given word

In [None]:
def nearest_neighbors(word):
  r = word_vectors.most_similar(positive=[word])
  print(f"Neighbors of {word} are:")
  # Use a separate loop variable so the `word` argument is not shadowed
  for neighbor, _score in r:
    print(neighbor)
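`most_similar` returns `(word, score)` pairs, and `nearest_neighbors` prints only the words. The sketch below shows the kind of ranking that sits behind those scores, using cosine similarity on a few invented 2-D vectors. The names `toy_vectors`, `cosine`, and `toy_neighbors` are hypothetical, and the numbers are made up for illustration, not taken from the GoogleNews model.

```python
import math

# Toy 2-D vectors invented for illustration only.
toy_vectors = {
    "king":   [0.9, 0.8],
    "queen":  [0.8, 0.9],
    "prince": [0.9, 0.5],
    "apple":  [-0.7, 0.1],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def toy_neighbors(word, vectors=toy_vectors, topn=2):
    # Score every other word against the query, then keep the topn highest scores.
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

for neighbor, score in toy_neighbors("king"):
    print(f"{neighbor}: {score:.3f}")
```

Printing the score next to each neighbor makes it easier to judge how close a match really is. The same applies to the real model, where `most_similar` also accepts a `topn` argument to control how many neighbors are returned.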

In [None]:
nearest_neighbors('king')

Neighbors of king are:
kings
queen
monarch
crown_prince
prince
sultan
ruler
princes
Prince_Paras
throne


In [None]:
nearest_neighbors('queen')

Neighbors of queen are:
queens
princess
king
monarch
very_pampered_McElhatton
Queen
NYC_anglophiles_aflutter
Queen_Consort
princesses
royal


In [None]:
nearest_neighbors('japan')

Neighbors of japan are:
japanese
tokyo
america
europe
germany
chinese
india
hawaii
usa
korea


In [None]:
nearest_neighbors('france')

Neighbors of france are:
spain
french
germany
europe
italy
england
european
belgium
usa
serbia


In [None]:
nearest_neighbors('newton')

Neighbors of newton are:
jerome
thompson
thomas
walsh
richards
carl
alexander
phillips
brandon
anderson


In [None]:
nearest_neighbors('boy')

Neighbors of boy are:
girl
teenager
toddler
teenage_girl
man
teen_ager
son
kid
youngster
stepfather


In [None]:
nearest_neighbors('robot')

Neighbors of robot are:
robots
robotic
Robot
humanoid
robotics
humanoid_robots
Honda_Asimo
autonomous_robots
GeckoSystems_suite
i_SOBOT


## Implementing GloVe (from Stanford)