# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
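The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The documentation for the Gensim implementation used in this task is [here](https://radimrehurek.com/gensim/models/word2vec.html).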

![Word2Vec architecture](Data/word2vec.jpeg)

In [1]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [2]:
with open("./Data/train_data.pkl", "rb") as f:
    train_data = pickle.load(f)

In [3]:
# Create an untrained model with Gensim's default hyperparameters; no corpus is passed yet.
model = Word2Vec()

In [4]:
model.build_vocab(train_data)
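
`build_vocab` makes one pass over the corpus to count tokens and decide which ones the model will learn vectors for. With the defaults used here, Gensim discards tokens that occur fewer than `min_count` times (5 by default). The sketch below illustrates only that counting-and-filtering idea in plain Python. It is not Gensim's implementation, `build_vocab_sketch` and `toy_corpus` are hypothetical names, and it assumes `train_data` is a list of token lists.

```python
from collections import Counter

# Hypothetical toy corpus: one list of tokens per recipe
# (an assumed shape for train_data, not taken from the real file).
toy_corpus = [
    ["salad", "dressing", "vinaigrette"],
    ["salad", "dressing", "lettuce"],
    ["salad", "caesar", "dressing"],
]


def build_vocab_sketch(sentences, min_count=5):
    """Count tokens and keep those that occur at least `min_count` times."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token: n for token, n in counts.items() if n >= min_count}


# A threshold of 3 is used only because the toy corpus is tiny.
vocab = build_vocab_sketch(toy_corpus, min_count=3)
print(vocab)
```

Tokens that fall below the threshold never receive a vector, so a later `most_similar` lookup for one of them would raise a `KeyError`.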

In [5]:
%%time
model.train(
    train_data,
    total_examples=model.corpus_count,
    epochs=model.epochs,
)

CPU times: user 2min 7s, sys: 345 ms, total: 2min 7s
Wall time: 43.2 s


(67633851, 81118765)
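
The tuple returned by `model.train` is `(effective word count, raw word count)`, summed over all epochs. The effective count is smaller because out-of-vocabulary tokens are skipped and very frequent tokens are randomly downsampled. The arithmetic below is a rough reading of the numbers above, assuming the model ran Gensim's default of 5 epochs (the notebook does not print `model.epochs`).

```python
# Tuple printed by model.train in the cell above.
effective_words, raw_words = 67633851, 81118765

# Assumption: Gensim's default epoch count, since Word2Vec() was created without arguments.
epochs = 5

raw_words_per_epoch = raw_words // epochs
kept_fraction = effective_words / raw_words

print(f"raw words per epoch: {raw_words_per_epoch:,}")
print(f"fraction of words actually trained on: {kept_fraction:.1%}")
```

Under that assumption the corpus holds about 16 million tokens, and roughly 83% of them contribute to training in each pass.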

In [6]:
model.wv.most_similar(["salad"], topn=20)

[('mesclun', 0.7851408123970032),
 ('dressing', 0.7791105508804321),
 ('vinaigrette', 0.7282365560531616),
 ('mizuna', 0.7002274990081787),
 ('dressed', 0.6980729699134827),
 ('caesar', 0.6848417520523071),
 ('lettuces', 0.6838010549545288),
 ('slaw', 0.6818397045135498),
 ('salads', 0.6788983345031738),
 ('frisée', 0.6782753467559814),
 ('thousand', 0.6738550066947937),
 ('tabbouleh', 0.644822359085083),
 ('frisee', 0.6398942470550537),
 ('panzanella', 0.6342629194259644),
 ('mâche', 0.6336913704872131),
 ('mache', 0.6291115880012512),
 ('zesty', 0.6139935255050659),
 ('cress', 0.6128290295600891),
 ('tossed', 0.6127486228942871),
 ('watercress', 0.6112533211708069)]
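
The scores above are cosine similarities between the vector for `salad` and every other word vector, sorted from highest to lowest. The following is a minimal plain-Python sketch of that ranking. The two-dimensional `toy_vectors` are invented for illustration (the real vectors have many more dimensions), and `most_similar_sketch` is a hypothetical helper, not a Gensim function.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_sketch(vectors, word, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


# Invented 2-D vectors, for illustration only.
toy_vectors = {
    "salad": [1.0, 0.0],
    "dressing": [0.9, 0.1],
    "brownie": [0.0, 1.0],
    "soup": [0.5, 0.5],
}

ranked = most_similar_sketch(toy_vectors, "salad")
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

Because cosine similarity ignores vector length, a word ranks high when its vector points in nearly the same direction as the query, regardless of how frequent the word is.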

In [7]:
model.save("./Data/w2v.model")