# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
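The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The documentation for the Gensim `Word2Vec` model can be found [here](https://radimrehurek.com/gensim/models/word2vec.html).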

![Word2Vec architecture](Data/word2vec.jpeg)

In [1]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [2]:
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)
    
train_data[0]

['place',
 'the',
 'chicken',
 'butter',
 'soup',
 'and',
 'onion',
 'in',
 'a',
 'slow',
 'cooker',
 'and',
 'fill',
 'with',
 'enough',
 'water',
 'to',
 'cover',
 'cover',
 'and',
 'cook',
 'for',
 'to',
 'hours',
 'on',
 'high',
 'about',
 'minutes',
 'before',
 'serving',
 'place',
 'the',
 'torn',
 'biscuit',
 'dough',
 'in',
 'the',
 'slow',
 'cooker',
 'cook',
 'until',
 'the',
 'dough',
 'is',
 'no',
 'longer',
 'raw',
 'in',
 'the',
 'center']

In [3]:
# Create an untrained model with Gensim's default hyperparameters
model = Word2Vec()

In [4]:
model?

[0;31mType:[0m            Word2Vec
[0;31mString form:[0m     Word2Vec<vocab=0, vector_size=100, alpha=0.025>
[0;31mFile:[0m            ~/dev/ai_projects/NLP/.venv/lib/python3.10/site-packages/gensim/models/word2vec.py
[0;31mDocstring:[0m       <no docstring>
[0;31mClass docstring:[0m
Serialize/deserialize objects from disk, by equipping them with the `save()` / `load()` methods.

--------
This uses pickle internally (among other techniques), so objects must not contain unpicklable attributes
such as lambda functions etc.
[0;31mInit docstring:[0m 
Train, use and evaluate neural networks described in https://code.google.com/p/word2vec/.

Once you're finished training a model (=no more updates, only querying)
store and use only the :class:`~gensim.models.keyedvectors.KeyedVectors` instance in ``self.wv``
to reduce memory.

The full model can be stored/loaded via its :meth:`~gensim.models.word2vec.Word2Vec.save` and
:meth:`~gensim.models.word2vec.Word2Vec.load` methods.

The tr

In [5]:
# Scan the tokenized recipes once to build the vocabulary before training
model.build_vocab(train_data)
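
To make the vocabulary-building step more concrete, here is a minimal pure-Python sketch of the core idea: count every token and keep only those that appear at least `min_count` times (Gensim's default is 5). The helper `build_toy_vocab` and the toy corpus are hypothetical and only illustrate the filtering. Gensim's `build_vocab` does more than this, such as preparing frequent-word downsampling and initializing the model weights.

```python
from collections import Counter


def build_toy_vocab(sentences, min_count=5):
    """Return {token: count} for tokens seen at least min_count times."""
    # Flatten the list of tokenized sentences and count each token
    counts = Counter(token for sentence in sentences for token in sentence)
    # Drop rare tokens, mirroring the role of Word2Vec's min_count parameter
    return {token: n for token, n in counts.items() if n >= min_count}


# Hypothetical two-sentence corpus in the same format as train_data
toy_corpus = [
    ["place", "the", "chicken", "in", "the", "slow", "cooker"],
    ["cook", "the", "chicken", "until", "the", "dough", "is", "done"],
]

# A low threshold is used here only because the toy corpus is tiny
toy_vocab = build_toy_vocab(toy_corpus, min_count=2)
print(toy_vocab)
```

With this threshold only `the` and `chicken` survive, which shows why rare ingredients can be missing from the trained model's vocabulary.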

In [6]:
%%time
# Train on the full corpus; returns (effective word count, raw word count)
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

CPU times: user 4min 15s, sys: 3.12 s, total: 4min 18s
Wall time: 1min 27s


(92907521, 125366090)

In [7]:
model.wv.most_similar('salad', topn=20)

[('dressing', 0.7689827680587769),
 ('mesclun', 0.7343204021453857),
 ('vinaigrette', 0.7328872680664062),
 ('slaw', 0.7293946146965027),
 ('mache', 0.683726966381073),
 ('lettuces', 0.6635078191757202),
 ('mizuna', 0.6586458683013916),
 ('frisée', 0.6550888419151306),
 ('frisee', 0.6550101637840271),
 ('salads', 0.6481441259384155),
 ('thousand', 0.645386278629303),
 ('tabbouleh', 0.6379859447479248),
 ('dressed', 0.6374696493148804),
 ('caesar', 0.6362305283546448),
 ('micro', 0.6318367719650269),
 ('tartare', 0.6228469610214233),
 ('spago', 0.6170496344566345),
 ('mâche', 0.6120429039001465),
 ('zesty', 0.609458327293396),
 ('island', 0.6069187521934509)]
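
The scores returned by `most_similar` are cosine similarities between word vectors. As a rough illustration, the following sketch ranks neighbours by cosine similarity in pure Python. The three-dimensional vectors and the helper names `cosine` and `toy_most_similar` are made up for the example; they are not taken from the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical embeddings; real Word2Vec vectors here have 100 dimensions
toy_vectors = {
    "salad": [1.0, 0.0, 0.0],
    "dressing": [0.8, 0.6, 0.0],
    "slaw": [0.6, 0.0, 0.8],
    "dough": [0.0, 1.0, 0.0],
}


def toy_most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`."""
    query = vectors[word]
    scores = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first, truncated to the top n entries
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]


neighbours = toy_most_similar("salad", toy_vectors)
print(neighbours)
```

In this toy setup `dressing` ranks first (cosine of about 0.8), followed by `slaw` (about 0.6), while `dough` is orthogonal to `salad` and is cut off by `topn=2`.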

In [8]:
# Persist the full model (weights and vocabulary) for the next tasks
model.save('Data/w2v.model')