# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim

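The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The documentation for the Gensim `Word2Vec` model is available [here](https://radimrehurek.com/gensim/models/word2vec.html).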

![Word2Vec architecture](Data/word2vec.jpeg)

In [1]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle



In [2]:
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)

In [6]:
train_data[0]

['place',
 'chicken',
 'butter',
 'soup',
 'onion',
 'slow',
 'cooker',
 'water',
 'cover',
 'cover',
 'cook',
 'hours',
 'high',
 'minutes',
 'serving',
 'place',
 'torn',
 'biscuit',
 'dough',
 'slow',
 'cooker',
 'cook',
 'dough',
 'longer',
 'raw',
 'center']

In [7]:
model = Word2Vec()

In [8]:
model?

Type:           Word2Vec
String form:    Word2Vec(vocab=0, size=100, alpha=0.025)
File:           ~/.local/lib/python3.6/site-packages/gensim/models/word2vec.py
Docstring:
Train, use and evaluate neural networks described in https://code.google.com/p/word2vec/.

Once you're finished training a model (=no more updates, only querying)
store and use only the :class:`~gensim.models.keyedvectors.KeyedVectors` instance in `self.wv` to reduce memory.

The model can be stored/loaded via its :meth:`~gensim.models.word2vec.Word2Vec.save` and
:meth:`~gensim.models.word2vec.Word2Vec.load` methods.

The trained word vectors can also be stored/loaded from a format compatible with the
original word2vec implementation via `self.wv.save_word2vec_format`
and :meth:`gensim.models.keyedvectors.KeyedVectors.load_word2vec_format`.

Some important attributes are the following:

Attributes
----------
wv : :class:`~gensim.models.keyedvectors.Word2VecKeyedVectors

In [9]:
model.build_vocab(train_data)
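
`build_vocab` scans the tokenized corpus once, counts how often each token occurs, and keeps only tokens that appear at least `min_count` times (5 by default in Gensim). The pure-Python sketch below mimics only that count-and-filter step on a toy corpus. `build_toy_vocab` and `toy_corpus` are hypothetical names used for illustration, and the real method does more work (for example, preparing frequent-word downsampling and initializing weights).

```python
from collections import Counter

def build_toy_vocab(sentences, min_count=5):
    """Count tokens across tokenized sentences and keep the frequent ones."""
    counts = Counter(token for sentence in sentences for token in sentence)
    # Drop rare tokens, as Word2Vec does with its min_count threshold.
    return {token: n for token, n in counts.items() if n >= min_count}

# Hypothetical mini-corpus in the same shape as train_data: a list of token lists.
toy_corpus = [
    ["place", "chicken", "butter", "soup", "onion", "slow", "cooker"],
    ["cook", "chicken", "slow", "cooker", "hours"],
    ["place", "dough", "slow", "cooker", "cook", "dough"],
]

# A low threshold is used here only because the toy corpus is tiny.
toy_vocab = build_toy_vocab(toy_corpus, min_count=2)
print(sorted(toy_vocab))
```

Tokens such as `butter` or `hours` occur once in the toy corpus and are dropped, which is why rare ingredients may be missing from a trained model's vocabulary.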

In [11]:
%%time
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

CPU times: user 2min 54s, sys: 1.09 s, total: 2min 55s
Wall time: 1min 34s


(68099682, 81403200)
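
The tuple returned by `train` can be read, as far as the Gensim documentation describes it, as (effective word count, raw word count): the raw count covers every token seen across all epochs, while the effective count excludes tokens that were out of vocabulary or dropped by frequent-word downsampling. The short calculation below interprets the numbers above under the assumption that the model used Gensim's default of 5 epochs; `assumed_epochs` is that assumption, not a value read from the model.

```python
# Values copied from the output of model.train above.
effective_words, raw_words = 68099682, 81403200

# Assumption: the model kept Gensim's default of 5 epochs.
assumed_epochs = 5

# Raw tokens seen in a single pass over the corpus.
words_per_epoch = raw_words // assumed_epochs

# Share of raw tokens that actually contributed to training.
kept_fraction = effective_words / raw_words

print(f"raw words per epoch: {words_per_epoch:,}")
print(f"fraction of words trained on: {kept_fraction:.3f}")
```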

In [12]:
model.wv.most_similar(['salad'], topn=20)

[('dressing', 0.7680736184120178),
 ('mesclun', 0.7516231536865234),
 ('vinaigrette', 0.7410101890563965),
 ('slaw', 0.7020204067230225),
 ('dressed', 0.7004445791244507),
 ('salads', 0.682034969329834),
 ('caesar', 0.6692193746566772),
 ('mizuna', 0.662878155708313),
 ('mache', 0.6412196159362793),
 ('thousand', 0.6408696174621582),
 ('lettuces', 0.6389744281768799),
 ('frisée', 0.6313387155532837),
 ('mâche', 0.6264267563819885),
 ('panzanella', 0.6223949790000916),
 ('dress', 0.6181958913803101),
 ('frisee', 0.6158856153488159),
 ('coleslaw', 0.6031270623207092),
 ('tabbouleh', 0.6006590127944946),
 ('arugula', 0.5951802730560303),
 ('cress', 0.5874608159065247)]

In [14]:
model.wv.most_similar(['salad', 'chicken'], topn=10)

[('dressing', 0.6165879368782043),
 ('caesar', 0.5640236139297485),
 ('pheasant', 0.5541513562202454),
 ('vinaigrette', 0.5517940521240234),
 ('mesclun', 0.549064040184021),
 ('romaine', 0.5292191505432129),
 ('turkey', 0.5203468799591064),
 ('slaw', 0.5160238742828369),
 ('croutons', 0.49328598380088806),
 ('thousand', 0.4924042820930481)]
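
When several words are passed to `most_similar`, Gensim combines them into a single query: it averages the unit-normalized vectors of the input words, then ranks the remaining vocabulary by cosine similarity to that mean, leaving the input words out of the results. The sketch below reproduces that idea in plain Python. The 2-dimensional `toy_vectors` are made up for illustration and are not taken from the trained model, and `most_similar_sketch` is a hypothetical helper, not a Gensim function.

```python
import math

def unit(vec):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def most_similar_sketch(vectors, positive, topn=3):
    """Rank words by cosine similarity to the mean of the query word vectors."""
    units = {word: unit(vec) for word, vec in vectors.items()}
    dim = len(next(iter(units.values())))
    # Average the unit vectors of the query words, then re-normalize.
    mean = [sum(units[w][i] for w in positive) / len(positive) for i in range(dim)]
    query = unit(mean)
    # Cosine similarity is a dot product because both sides are unit vectors.
    scores = [
        (word, sum(a * b for a, b in zip(query, vec)))
        for word, vec in units.items()
        if word not in positive  # the query words themselves are excluded
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Made-up vectors: axis 0 is "salad-like", axis 1 is "chicken-like".
toy_vectors = {
    "salad": [1.0, 0.0],
    "chicken": [0.0, 1.0],
    "caesar": [0.7, 0.7],
    "vinaigrette": [0.9, 0.1],
    "pheasant": [0.1, 0.9],
    "dough": [-1.0, 0.2],
}

toy_results = most_similar_sketch(toy_vectors, ["salad", "chicken"], topn=3)
print(toy_results)
```

In the toy data, the word that sits between the two query words ranks first. This is consistent with the real output above, where `caesar` and `croutons` rank alongside salad-only and poultry-only neighbors such as `vinaigrette` and `pheasant`.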

In [15]:
model.save('Data/w2v.model')