## Node2vec

In this notebook, I’m going to talk about node2vec, a technique for creating embeddings for the nodes of a graph (in the G(V, E, W) sense of the word). These embeddings can then be used to build recommender systems.
Let's start by installing node2vec (make sure Internet is toggled on in the Kaggle notebook settings). I have used this [implementation](https://github.com/eliorc/node2vec).

In [None]:
!pip install node2vec

## Loading Libraries

In [None]:

import numpy as np
import pandas as pd
import networkx as nx  # create and store the graph
from node2vec import Node2Vec  # run the node2vec algorithm

import os
# List the files available under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Look into the dataset
I have used the Netflix dataset, which stores information about the movies and TV shows on Netflix, such as title, director and genre. Let's look into it.

In [None]:
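df = pd.read_csv('../input/netflix-shows/netflix_titles.csv')
df = df.dropna()  # drop every row that has at least one missing value
df = df.drop(['description'], axis=1)  # the description text is not used here
df.head()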

In [None]:
df['title'].is_unique

Titles are not unique. Let's check which titles are repeated and why.

In [None]:
df2 = df.groupby(['title']).count()
# Print the first two titles that appear more than once
print(df2[df2['show_id'] > 1][0:2])

In [None]:
df[df['title']=='Benji']
# The same title appears in different years, so we will combine each title with the date it was added

In [None]:
df['title']=df['title']+', '+df['date_added']
df['title'].is_unique

In [None]:
df[df['title']=='The Silence, March 1, 2018']
# The remaining duplicates are exact repeats of the same entry, so we will drop them

In [None]:
# Keep one copy of each exactly repeated title (keep=False would remove every copy)
df.drop_duplicates(subset=['title'], keep='first', inplace=True)
df['title'].is_unique
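
The duplicate handling above can be sketched in plain Python on a few hypothetical (title, date_added) records, made up for illustration rather than taken from the dataset. Counting titles finds the repeats, appending the date separates different entries that share a name, and a final pass drops exact repeats while keeping the first copy of each (the same behaviour as `keep='first'` in pandas).

```python
from collections import Counter

# Hypothetical toy records (title, date_added), made up for illustration.
records = [
    ("Movie A", "March 1, 2018"),
    ("Movie A", "June 5, 2019"),   # same title, different entry
    ("Movie B", "May 2, 2017"),
    ("Movie B", "May 2, 2017"),    # exact repeat of the previous record
    ("Movie C", "July 7, 2020"),
]

# Step 1: find the titles that appear more than once.
counts = Counter(title for title, _ in records)
repeated = sorted(title for title, n in counts.items() if n > 1)

# Step 2: append date_added so different entries get different keys.
keys = [f"{title}, {date}" for title, date in records]

# Step 3: drop exact repeats, keeping the first copy (dicts preserve insertion order).
deduped = list(dict.fromkeys(keys))

print(repeated)  # ['Movie A', 'Movie B']
print(deduped)
```

Appending the date only separates entries that were added on different dates; exact repeats still collide, which is why the second deduplication step is needed.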

In [None]:
df.head()

Now, we are ready to work with our dataset.

## Creating and Analysing the Graph
Now, we will use networkx to create a graph in which both movie titles and genres are nodes. I have used two functions:
* addToGraph(movie_name, graph): adds an edge between the movie title and each of its genres
* createGraph(): calls addToGraph for every movie title to build the full graph

In [None]:
# Add an edge between the given movie title and each of its genres
def addToGraph(movie_name, graph):
    # 'listed_in' holds all genres of a title as one comma-separated string
    genres = df[df['title'] == movie_name]['listed_in'].values[0].rstrip().lower().split(', ')
    for genre in genres:
        graph.add_edge(movie_name.strip(), genre)
    return graph

# Build the graph for every movie title in the dataset
def createGraph():
    graph = nx.Graph()
    for movie_name in df['title']:
        graph = addToGraph(movie_name, graph)
    return graph

In [None]:
graph=createGraph()

In [None]:
print(graph.degree()['Norm of the North: King Sized Adventure, September 9, 2019'])  # should be 2, since two genres are associated with it
print(graph.degree()['#realityhigh, September 8, 2017'])  # should be 1, since one genre is associated with it
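
Because every edge joins a title to a genre, this graph is bipartite, and a title's degree equals its number of genres, which is what the degree checks rely on. A minimal sketch of the same structure without networkx, using a plain adjacency map; `build_bipartite` and the toy titles are hypothetical stand-ins for the real `listed_in` column.

```python
from collections import defaultdict

# Hypothetical toy data: title -> comma-separated genre string,
# mimicking the format of the 'listed_in' column.
listed_in = {
    "Movie A, March 1, 2018": "Dramas, International Movies",
    "Movie B, May 2, 2017": "Comedies",
    "Movie C, July 7, 2020": "Comedies, Dramas",
}

def build_bipartite(listed_in):
    """Return an undirected adjacency map linking each title to its genres."""
    adj = defaultdict(set)
    for title, genre_str in listed_in.items():
        # Same normalisation as addToGraph: rstrip, lowercase, split on ', '.
        for genre in genre_str.rstrip().lower().split(", "):
            adj[title.strip()].add(genre)
            adj[genre].add(title.strip())
    return adj

adj = build_bipartite(listed_in)
degree = {node: len(neighbours) for node, neighbours in adj.items()}
print(degree["Movie A, March 1, 2018"])  # 2 (two genres)
print(degree["comedies"])                # 2 (two titles)
```

Using sets mirrors `nx.Graph`, where adding the same edge twice does not create a second edge, so degrees count distinct neighbours.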

## Running Node2Vec

In [None]:
node2vec = Node2Vec(graph, dimensions=20, walk_length=16, num_walks=10)
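
As far as I understand the library, constructing `Node2Vec` precomputes transition probabilities and generates the random walks: `num_walks` walks of `walk_length` nodes starting from every node. The walks are biased by a return parameter `p` and an in-out parameter `q`, which this notebook leaves at the library defaults (1 for both, as far as I know, which makes the walk uniform). The sketch below illustrates the biased step rule from the node2vec paper for an unweighted graph; `node2vec_walk` and the toy graph are hypothetical and not part of the node2vec package.

```python
import random

def node2vec_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """Generate one node2vec-style biased random walk on an unweighted graph.

    adj maps each node to the set of its neighbours. Given the previous node
    `prev` and the current node `cur`, each neighbour x of `cur` gets the
    unnormalised weight
        1/p  if x == prev                (step back)
        1    if x is a neighbour of prev (stay at distance 1 from prev)
        1/q  otherwise                   (move away from prev)
    """
    if rng is None:
        rng = random.Random()
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbours = sorted(adj[cur])  # sorted so a seeded rng is reproducible
        if not neighbours:
            break  # isolated node: the walk stops early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbours:
            if x == prev:
                weights.append(1.0 / p)
            elif x in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk

# Hypothetical toy graph (undirected, so each edge appears in both sets).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
rng = random.Random(42)
num_walks, walk_length = 2, 5
walks = [
    node2vec_walk(adj, node, walk_length, p=1.0, q=0.5, rng=rng)
    for _ in range(num_walks)
    for node in sorted(adj)
]
for w in walks:
    print(w)
```

On this notebook's bipartite graph, the previous node and every candidate next node are neighbours of the current node, so they sit on the same side and are never adjacent. The weight-1 case therefore never applies, and `p` and `q` only trade off stepping back against moving on to a new node.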

In [None]:
model = node2vec.fit(window=5, min_count=1)
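
`fit` trains a gensim `Word2Vec` model on the walks, treating each walk as a sentence and each node as a word; keyword arguments such as `window` and `min_count` are passed through to `Word2Vec`. The sketch below uses a hypothetical helper, `skipgram_pairs`, to show which (center, context) pairs a skip-gram window produces. It deliberately ignores details of the real training, such as gensim's random window shrinking, negative sampling and frequent-word downsampling.

```python
def skipgram_pairs(walks, window):
    """Return (center, context) pairs within `window` positions on either side."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

# Hypothetical walk alternating between titles and genres,
# as walks on a title-genre bipartite graph do.
walk = ["Movie A", "dramas", "Movie C", "comedies", "Movie B"]
pairs = skipgram_pairs([walk], window=2)
print([ctx for center, ctx in pairs if center == "Movie C"])
# ['Movie A', 'dramas', 'comedies', 'Movie B']
```

Because walks on this graph alternate between titles and genres, a title's context includes its own genres and also the titles two steps away that share one of those genres. Titles with overlapping genres therefore see similar contexts and end up with similar vectors.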

## See Embeddings
Let's look at the values stored in the embeddings. Each vector has 20 entries, one per dimension (dimensions=20).

In [None]:
model.wv.get_vector('Ralph Breaks the Internet: Wreck-It Ralph 2, June 11, 2019')

In [None]:
model.wv.get_vector('Transformer, February 20, 2019')

## Using Node2Vec Embeddings

We will use the generated embeddings to recommend similar genres and movies.

In [None]:
# Print the nodes (movies or genres) most similar to the given genre or title
def print_similiar(name):
    for node, _ in model.wv.most_similar(name):
        print(node)
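
`most_similar` ranks the other nodes in the vocabulary by cosine similarity to the query vector and returns the top 10 by default. Because titles and genres share one vocabulary, the results can contain both. Below is a pure-Python sketch of that ranking over hypothetical 2-dimensional vectors (the real vectors here have 20 dimensions); `most_similar_sketch` only illustrates the idea and is not gensim's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar_sketch(vectors, query, topn=10):
    """Rank every other key by cosine similarity to `query`, highest first."""
    q = vectors[query]
    scores = [(key, cosine(q, vec)) for key, vec in vectors.items() if key != query]
    scores.sort(key=lambda kv: kv[1], reverse=True)
    return scores[:topn]

# Hypothetical 2-d vectors, made up for illustration.
vectors = {
    "children & family movies": [1.0, 0.1],
    "Movie A": [0.9, 0.2],
    "Movie B": [0.1, 1.0],
    "Movie C": [0.7, 0.7],
}
for key, score in most_similar_sketch(vectors, "children & family movies", topn=3):
    print(key, round(score, 3))
```

Cosine similarity ignores vector length and compares only direction, so two nodes that appear in similar walk contexts rank as similar even if their vectors differ in magnitude.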

In [None]:
print_similiar('children & family movies')
# As we can see, most results, such as "Barbie" and "Krish Trish and Baltiboy", are indeed children's movies and shows

In [None]:
print_similiar('Naruto Shippuden : Blood Prison, September 1, 2017')
# We get results such as another Naruto series and Seven Deadly Sins, which are good recommendations.