# Interactive Recommendation System with Word Embeddings using Word2Vec, Plotly, and NetworkX

## Project Breakdown
- Task 1: Introduction
- Task 2: Exploratory Data Analysis and Preprocessing
- Task 3: Word2Vec with Gensim (you are here)
- Task 4: Exploring Results
- Task 5: Building and Visualizing Interactive Network Graph

## Task 3: Word2Vec with Gensim
The original Word2Vec papers are [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf) and [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546.pdf). The Gensim implementation used in this task is documented [here](https://radimrehurek.com/gensim/models/word2vec.html).

![Word2Vec architecture](Data/word2vec.jpeg)

In [1]:
from gensim.models.word2vec import Word2Vec
from tqdm import tqdm
import pandas as pd
import pickle

In [2]:
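# Load the preprocessed training data: a list of token lists, one per document.
with open('Data/train_data.pkl', 'rb') as f:
    train_data = pickle.load(f)

# Inspect the first tokenized document.
train_data[0]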

['place',
 'the',
 'chicken',
 'butter',
 'soup',
 'and',
 'onion',
 'in',
 'a',
 'slow',
 'cooker',
 'and',
 'fill',
 'with',
 'enough',
 'water',
 'to',
 'cover',
 'cover',
 'and',
 'cook',
 'for',
 'to',
 'hours',
 'on',
 'high',
 'about',
 'minutes',
 'before',
 'serving',
 'place',
 'the',
 'torn',
 'biscuit',
 'dough',
 'in',
 'the',
 'slow',
 'cooker',
 'cook',
 'until',
 'the',
 'dough',
 'is',
 'no',
 'longer',
 'raw',
 'in',
 'the',
 'center']

In [9]:
# Instantiate an untrained model with Gensim's default hyperparameters.
model = Word2Vec()

In [4]:
model?

Type:            Word2Vec
String form:     Word2Vec<vocab=0, vector_size=100, alpha=0.025>
File:            ~/.pyenv/versions/3.10.5/envs/trax/lib/python3.10/site-packages/gensim/models/word2vec.py
Docstring:       <no docstring>
Class docstring:
Serialize/deserialize objects from disk, by equipping them with the `save()` / `load()` methods.

Warnings
--------
This uses pickle internally (among other techniques), so objects must not contain unpicklable attributes
such as lambda functions etc.
Init docstring:
Train, use and evaluate neural networks described in https://code.google.com/p/word2vec/.

Once you're finished training a model (=no more updates, only querying)
store and use only the :class:`~gensim.models.keyedvectors.KeyedVectors` instance in ``self.wv``
to reduce memory.

The full model can be stored/loaded via its :meth:`~gensim.models.word2vec.Word2Vec.save` and
:meth:`~gensim.models.word2vec.Word2Vec.load` methods.


In [10]:
# Scan the corpus once to collect token counts and build the vocabulary.
model.build_vocab(train_data)
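
To make the vocabulary step more concrete, here is a simplified sketch of the idea behind it: count every token and keep only those that meet a frequency threshold. This is not Gensim's actual implementation (which also handles downsampling, index assignment, and weight initialization). `toy_corpus` and the `min_count` value are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus in the same shape as train_data: a list of token lists.
toy_corpus = [
    ["place", "the", "chicken", "in", "the", "slow", "cooker"],
    ["cook", "the", "chicken", "until", "tender"],
    ["place", "the", "dough", "in", "the", "slow", "cooker"],
]
min_count = 2  # illustrative threshold; Gensim's default is assumed to be 5

# Count every token across all documents.
counts = Counter(token for doc in toy_corpus for token in doc)

# Keep only tokens that meet the frequency threshold, most frequent first.
vocab = [tok for tok, n in counts.most_common() if n >= min_count]
print(vocab)
```

Tokens such as `tender` and `dough` appear only once here, so they fall below the threshold and would receive no vector.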

In [11]:
%%time
# Train for the default number of epochs; corpus_count was set by build_vocab.
model.train(train_data, total_examples=model.corpus_count, epochs=model.epochs)

CPU times: user 3min 9s, sys: 2.49 s, total: 3min 11s
Wall time: 1min 4s


(92904398, 125366090)
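
According to the Gensim documentation, `train` returns a tuple of the effective word count (after out-of-vocabulary tokens are ignored, and reduced further by frequent-word downsampling) and the total raw word count summed over all epochs. The short calculation below reads the two numbers above under the assumption that the default of 5 epochs was used.

```python
# Values copied from the output of model.train(...) above.
effective_words, raw_words = 92904398, 125366090
epochs = 5  # assumption: Gensim's default, since Word2Vec() was called without arguments

# Raw tokens seen in a single pass over the corpus.
words_per_epoch = raw_words // epochs

# Share of raw tokens that actually contributed to training.
kept_fraction = effective_words / raw_words

print(f"raw words per epoch: {words_per_epoch:,}")
print(f"fraction of words trained on: {kept_fraction:.3f}")
```

Under that assumption the corpus holds roughly 25 million tokens, and about three quarters of them were used for weight updates.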

In [12]:
# The 20 tokens whose vectors are closest to 'salad' by cosine similarity.
model.wv.most_similar('salad', topn=20)

[('dressing', 0.7688537240028381),
 ('slaw', 0.7555199861526489),
 ('mesclun', 0.7452468276023865),
 ('vinaigrette', 0.7269927859306335),
 ('mache', 0.6787164807319641),
 ('frisée', 0.6744154691696167),
 ('tartare', 0.6649956703186035),
 ('salads', 0.6594023704528809),
 ('mizuna', 0.6564218997955322),
 ('caesar', 0.6513283848762512),
 ('dressed', 0.6512698531150818),
 ('zesty', 0.6456854939460754),
 ('watercress', 0.6448778510093689),
 ('mâche', 0.6444368958473206),
 ('frisee', 0.6416191458702087),
 ('lettuces', 0.6408902406692505),
 ('micro', 0.638536810874939),
 ('thousand', 0.6112208366394043),
 ('cress', 0.598767876625061),
 ('arugula', 0.595379114151001)]
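
The scores returned by `most_similar` are cosine similarities between word vectors. A minimal, dependency-free sketch of that measure is shown here. The three-dimensional vectors are hypothetical and are not taken from the trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vectors, chosen only to illustrate the two extremes.
salad = [1.0, 2.0, 0.0]
dressing = [2.0, 4.0, 0.0]  # points in the same direction as salad
brownie = [0.0, 0.0, 3.0]   # orthogonal to salad

print(cosine_similarity(salad, dressing))
print(cosine_similarity(salad, brownie))
```

Because the measure depends only on direction, vector length does not affect the score: values near 1 indicate words used in similar contexts, and values near 0 indicate unrelated ones.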

In [None]:
# Persist the full model so it can be reloaded in later tasks.
model.save('Data/w2v.model')