# Neural Word Embeddings

Sequences of vectors
- Once each word is a vector, a text becomes a sequence of vectors
- Purpose-built models for sequences: CNNs, RNNs, Transformers (and HMMs)

Two methods for learning word embeddings:
- Word2Vec
- GloVe

Word2Vec
- Trained as a binary prediction task: given an input word and an output word, predict whether they can be found near each other in a corpus
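
The binary prediction task above needs labeled pairs. The following is a minimal sketch of how such pairs could be generated, not the actual Word2Vec training code. The helper names `positive_pairs` and `negative_pairs` are hypothetical, and sampling negatives uniformly from the vocabulary is a simplifying assumption.

```python
import random

def positive_pairs(tokens, window=2):
    """Return (input_word, output_word) pairs that occur within `window` tokens (label 1)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def negative_pairs(tokens, positives, n, seed=0, max_tries=10_000):
    """Sample up to n word pairs that were never observed together (label 0).

    Assumption: negatives are drawn uniformly from the vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted(set(tokens))
    seen = set(positives)
    negatives = []
    for _ in range(max_tries):
        if len(negatives) == n:
            break
        pair = (rng.choice(vocab), rng.choice(vocab))
        if pair[0] != pair[1] and pair not in seen:
            negatives.append(pair)
    return negatives

tokens = "the cat sat on the mat".split()
pos = positive_pairs(tokens, window=1)
neg = negative_pairs(tokens, pos, n=4)
```

A classifier would then be trained to output 1 for the pairs in `pos` and 0 for the pairs in `neg`; the word vectors are the parameters it learns along the way.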

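GloVe
- Doesn't use a neural network to learn the vectors
- The resulting vectors are still used in NNs and later stages
- Works like a recommender system: it factorizes a word co-occurrence matrix, much as a recommender factorizes a user-item matrix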
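
To make the recommender-system comparison concrete, here is a rough sketch that counts word co-occurrences and factorizes the resulting matrix. It is an illustration only: a truncated SVD of the log counts stands in for GloVe's own objective, and `cooccurrence_matrix` and `factorize` are hypothetical helper names.

```python
import numpy as np

def cooccurrence_matrix(tokens, window=2):
    """Count how often each pair of words appears within `window` tokens."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                X[index[w], index[tokens[j]]] += 1
    return vocab, X

def factorize(X, dim=2):
    """Stand-in factorization (NOT GloVe's objective): truncated SVD of log(1 + X)."""
    U, S, _ = np.linalg.svd(np.log1p(X))
    # Keep the top `dim` components as the word vectors
    return U[:, :dim] * np.sqrt(S[:dim])

tokens = "the cat sat on the mat".split()
vocab, X = cooccurrence_matrix(tokens, window=1)
vectors = factorize(X, dim=2)  # one row per word in `vocab`
```

As in a recommender, each row of the matrix ends up with a short dense vector, and words with similar co-occurrence patterns get similar vectors.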

## Word Embeddings

What can you do with word embeddings?
- Convert a document into a single vector (dense, not sparse like TF-IDF)
- Embeddings are dense and low-dimensional (e.g. 20, 50, 100, or 300 dimensions, far smaller than the vocabulary size V)
- Pipeline: document -> tokenize -> vectorize each token -> average (1 vector for the whole document)
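
The tokenize, vectorize, average pipeline can be sketched as follows. This is a minimal example with assumptions: whitespace tokenization, a made-up 3-dimensional embedding table called `toy`, and a hypothetical helper `document_vector`. Words missing from the table are simply skipped.

```python
import numpy as np

def document_vector(text, embeddings, dim):
    """Average the vectors of all known tokens into one document vector."""
    tokens = text.lower().split()  # assumption: naive whitespace tokenizer
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # No known words: fall back to the zero vector
        return np.zeros(dim)
    return np.mean(vectors, axis=0)

# Made-up 3-dimensional embeddings, for illustration only
toy = {
    "good": np.array([1.0, 0.0, 1.0]),
    "movie": np.array([0.0, 2.0, 1.0]),
}
doc = document_vector("Good movie indeed", toy, dim=3)
```

With pretrained vectors, `embeddings` would be the loaded model and `dim` its vector size.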

## Word Analogies

We can do arithmetic on these vectors:
- King:Man :: ?:Woman (answer: Queen)
- King - Man + Woman ==> Queen (all of these are vectors)
- France:Paris :: Italy:Rome
- Japan:Japanese :: China:Chinese
- Miami:Florida :: Dallas:Texas
- December:November :: July:June
- Man:Woman :: He:She
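
The arithmetic behind these analogies can be sketched with toy vectors. The 2-dimensional values below are invented for illustration (roughly "royalty" and "gender" axes), and `cosine` and `analogy` are hypothetical helpers. The sketch skips the three input words when searching, since they would otherwise tend to be their own best match.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, embeddings):
    """Return the word whose vector is closest to a - b + c."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    best_word, best_score = None, -np.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):
            continue  # skip the input words themselves
        score = cosine(vec, target)
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Invented 2-d vectors, for illustration only
toy = {
    "king": np.array([0.9, 0.8]),
    "queen": np.array([0.9, -0.8]),
    "man": np.array([0.1, 0.9]),
    "woman": np.array([0.1, -0.9]),
    "apple": np.array([-0.8, 0.1]),
}
result = analogy("king", "man", "woman", toy)  # king - man + woman
```

With these toy vectors, `result` is `"queen"`.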

## DEMO

In [1]:
from gensim.models import KeyedVectors

In [2]:
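# Load the pretrained Google News word2vec vectors (300 dimensions, binary format); the file must be downloaded separately
word_vectors = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)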

In [7]:
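def find_analogies(w1, w2, w3):
    # Solve w1:w2 :: ?:w3 by finding the word closest to w1 - w2 + w3
    r = word_vectors.most_similar(positive=[w1, w3], negative=[w2])
    # r is a list of (word, similarity) pairs, best match first
    print(f"{w1}:{w2} :: {r[0][0]}:{w3}")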

In [8]:
find_analogies("king", "man", "woman")

king:man :: queen:woman


In [9]:
find_analogies("france", "paris", "london")

france:paris :: england:london


In [10]:
find_analogies("paris", "france", "italy")

paris:france :: lohan:italy


In [11]:
find_analogies("france", "paris", "rome")

france:paris :: italy:rome


In [13]:
find_analogies("japan", "japanese", "italian")

japan:japanese :: italy:italian


In [14]:
find_analogies("france", "french", "english")

france:french :: england:english


In [15]:
find_analogies("japan", "japanese", "chinese")

japan:japanese :: tibet:chinese


In [16]:
def nearest_neighbors(w):
    # most_similar returns the 10 closest words by cosine similarity by default
    r = word_vectors.most_similar(positive=[w])
    print(f"neighbors of: {w}")
    for word, _ in r:
        print(f"\t{word}")

In [17]:
nearest_neighbors("king")

neighbors of: king
	kings
	queen
	monarch
	crown_prince
	prince
	sultan
	ruler
	princes
	Prince_Paras
	throne


In [18]:
nearest_neighbors("france")

neighbors of: france
	spain
	french
	germany
	europe
	italy
	england
	european
	belgium
	usa
	serbia


In [19]:
nearest_neighbors("japan")

neighbors of: japan
	japanese
	tokyo
	america
	europe
	germany
	chinese
	india
	hawaii
	usa
	korea


In [20]:
nearest_neighbors("einstein")

neighbors of: einstein
	nikki
	lmfao
	albert
	armstrong
	joan
	becky
	mcmahon
	conrad
	lori
	haley


In [21]:
nearest_neighbors("woman")

neighbors of: woman
	man
	girl
	teenage_girl
	teenager
	lady
	teenaged_girl
	mother
	policewoman
	boy
	Woman


In [22]:
nearest_neighbors("february")

neighbors of: february
	january
	april
	september
	december
	july
	october
	november
	june
	feb
	norway


In [23]:
find_analogies("money", "evil", "woman")

money:evil :: funds:woman


In [24]:
find_analogies("evil", "man", "woman")

evil:man :: evil_stepmother:woman
