# Recommendation System using Node2vec


In [1]:
import pandas as pd

In [2]:
import networkx as nx  # create and store graph
from node2vec import Node2Vec  # To run node2vec algorithm


In [3]:
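df_node2vec = pd.read_csv("../netflix_titles.csv")
# dropna() runs before the column drop, so any title missing a director,
# cast, country, etc. is discarded too, even though only "title" and
# "listed_in" are used below
df_node2vec = df_node2vec.dropna()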
df_node2vec.drop(
    columns=[
        "director",
        "cast",
        "country",
        "date_added",
        "release_year",
        "rating",
        "duration",
        "type",
    ],
    inplace=True,
)


## Creating and analyzing Graph

Now, we'll use networkx to create a graph with movie titles and genres as nodes, linking each title to each of its genres. I used two functions:

- addToGraph(movie_name, graph): Adds one edge between the title and each of its genres.
- createGraph(): Calls addToGraph for every movie title to build the full title-genre graph. Titles are only ever linked to genres, never to other titles.


In [4]:
# function that will create edges for given movie title and its genres
def addToGraph(movie_name, graph):
    genres = (
        df_node2vec[df_node2vec["title"] == movie_name]["listed_in"]
        .values[0]
        .rstrip()
        .lower()
        .split(", ")
    )
    for genre in genres:
        graph.add_edge(movie_name.strip(), genre)
    return graph


# function that builds the graph from every movie title
def createGraph():
    graph = nx.Graph()
    for movie_name in df_node2vec["title"]:
        graph = addToGraph(movie_name, graph)
    return graph
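`addToGraph` filters the whole DataFrame once per title, so `createGraph` does work proportional to the number of rows squared. A single pass over the (title, listed_in) pairs produces the same edges. The sketch below uses a plain dict of sets so it runs without networkx; the function name `build_title_genre_graph` and the toy rows are hypothetical, not part of the original notebook.

```python
from collections import defaultdict


def build_title_genre_graph(rows):
    """Build an undirected title-genre graph as a dict of neighbor sets.

    `rows` yields (title, listed_in) pairs in the same format as the
    cleaned DataFrame's "title" and "listed_in" columns.
    """
    adjacency = defaultdict(set)
    for title, listed_in in rows:
        title = title.strip()
        # same normalization as addToGraph: trim, lowercase, split on ", "
        for genre in listed_in.rstrip().lower().split(", "):
            adjacency[title].add(genre)
            adjacency[genre].add(title)
    return dict(adjacency)


# hypothetical rows, not taken from the dataset
toy_rows = [
    ("Movie A", "Horror Movies, Thrillers"),
    ("Movie B", "Horror Movies"),
    ("Show C", "Reality TV"),
]
toy_graph = build_title_genre_graph(toy_rows)
# a node's degree is the size of its neighbor set
toy_degree = {node: len(neighbors) for node, neighbors in toy_graph.items()}
print(toy_degree["Movie A"], toy_degree["horror movies"])  # 2 2
```

In the notebook, the same loop works with networkx by iterating over `zip(df_node2vec["title"], df_node2vec["listed_in"])` and calling `graph.add_edge(title, genre)` instead of the two `add` calls. One behavioral difference: if a title appears more than once, `addToGraph` always reads the first matching row, while the one-pass version uses every row.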


In [5]:
graph = createGraph()


In [6]:
# should be 2 since two genres are associated with it
print(graph.degree()["Norm of the North: King Sized Adventure"])
# should be 1 since one genre is associated with it
print(graph.degree()["#realityhigh"])


2
1


## Running Node2Vec

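The `Node2Vec` constructor takes the following main arguments, which control the random-walk sampling and the embedding size:

- num_walks: Number of random walks generated from each node in the graph
- dimensions: Embedding dimensions
- walk_length: Number of nodes in each random walk
- p: Return hyperparameter; higher values make a walk less likely to step straight back to the node it just left
- q: In-out hyperparameter; q > 1 keeps walks close to where they started (BFS-like), q < 1 pushes them outward (DFS-like)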

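The p and q parameters bias a second-order random walk: the next step depends on both the current node and the node the walk came from. The sketch below follows the transition rule from the node2vec paper for an unweighted graph (weight 1/p for stepping back, 1 for a neighbor of the previous node, 1/q otherwise). It is an illustration, not the `node2vec` package's implementation, which precomputes these probabilities up front (the "Computing transition probabilities" progress bar). The function names and the toy graph are hypothetical.

```python
import random


def step_weights(adjacency, previous, current, p=1.0, q=1.0):
    """Unnormalized node2vec weights for moving from `current` to each neighbor.

    `previous` is the node the walk arrived from (unweighted graph assumed).
    """
    weights = {}
    for nxt in adjacency[current]:
        if nxt == previous:
            weights[nxt] = 1.0 / p  # step straight back
        elif nxt in adjacency[previous]:
            weights[nxt] = 1.0  # stays one hop from `previous`
        else:
            weights[nxt] = 1.0 / q  # moves two hops away from `previous`
    return weights


def biased_walk(adjacency, start, walk_length, p=1.0, q=1.0, rng=random):
    """Generate one second-order random walk of at most `walk_length` nodes."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = sorted(adjacency[walk[-1]])
        if not neighbors:
            break  # dead end
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step has no previous node
            continue
        weights = step_weights(adjacency, walk[-2], walk[-1], p, q)
        walk.append(rng.choices(neighbors, weights=[weights[n] for n in neighbors])[0])
    return walk


def generate_walks(adjacency, num_walks, walk_length, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every node, shuffling the start order each pass."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        walks.extend(biased_walk(adjacency, n, walk_length, p, q, rng) for n in nodes)
    return walks


# hypothetical toy title-genre graph
toy_adjacency = {
    "Movie A": {"horror"},
    "Movie B": {"horror", "thriller"},
    "horror": {"Movie A", "Movie B"},
    "thriller": {"Movie B"},
}
print(step_weights(toy_adjacency, "Movie A", "horror", p=2.0, q=0.5))
# {"Movie A": 0.5, "Movie B": 2.0} (key order may vary)
toy_walks = generate_walks(toy_adjacency, num_walks=3, walk_length=5)
```

Because titles link only to genres here, the middle case never fires: a walk only chooses between stepping back (1/p) and moving on (1/q). With p = q = 1, every weight is 1 and the walk is uniform. The notebook does not pass p or q, so it relies on the package defaults (1 in the `node2vec` package, as far as I know).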

In [7]:
node2vec = Node2Vec(graph, dimensions=20, walk_length=16, num_walks=10)


Computing transition probabilities: 100%|██████████| 5373/5373 [02:00<00:00, 44.47it/s] 
Generating walks (CPU: 1): 100%|██████████| 10/10 [01:26<00:00,  8.68s/it]


In [8]:
model = node2vec.fit(window=5, min_count=1)
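`node2vec.fit` passes its keyword arguments on to gensim's `Word2Vec`, which treats each walk as a sentence of node names. `window=5` means each node learns to predict the nodes up to 5 positions away in its walks, and `min_count=1` keeps every node in the vocabulary instead of dropping rarely visited ones (gensim's default `min_count` is 5). The sketch below lists the (center, context) pairs a skip-gram model would train on for one walk, assuming a fixed window; gensim may additionally sample a smaller effective window per position. The function name and walk are hypothetical.

```python
def skipgram_pairs(walk, window):
    """Return the (center, context) pairs a fixed-window skip-gram model trains on."""
    pairs = []
    for i, center in enumerate(walk):
        start = max(0, i - window)
        stop = min(len(walk), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a node is never its own context
                pairs.append((center, walk[j]))
    return pairs


# a hypothetical walk over the title-genre graph
toy_walk = ["Movie A", "horror movies", "Movie B", "thrillers", "Movie C"]
for center, context in skipgram_pairs(toy_walk, window=2):
    if center == "Movie A":
        print(center, "->", context)
```

"Movie A" and "Movie B" share no edge, yet they land in each other's context because they are two steps apart through a common genre. That co-occurrence is what pulls titles with overlapping genres close together in embedding space.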


## See Embeddings

Let's look at the learned 20-dimensional embeddings for two horror titles.


In [9]:
model.wv.get_vector("The Conjuring")


array([-0.07528649,  0.32395372, -0.04905971,  0.3000146 ,  0.01165257,
        0.6508559 ,  0.552389  ,  1.2716715 , -0.09399039,  0.47394162,
       -0.03846619, -0.7954039 ,  0.7876637 , -0.8032726 ,  0.6878014 ,
        0.02354643,  0.79975134,  0.6447317 , -1.0822569 , -0.20448099],
      dtype=float32)

In [10]:
model.wv.get_vector("Insidious")


array([-0.267572  ,  0.41998634, -0.10039267,  0.40691748,  0.24374427,
        0.76195514,  0.710778  ,  1.2504964 , -0.20586337,  0.47253788,
       -0.06752691, -0.962921  ,  0.80334276, -0.6394063 ,  0.64307296,
        0.20696399,  0.75908184,  0.4785674 , -1.029885  , -0.21576826],
      dtype=float32)
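The two vectors look alike, and cosine similarity puts a number on that; it is also the score `most_similar` ranks by. The sketch below computes it with the standard library on the values copied from the two outputs above. These values will differ on another run because the walks are random. In the notebook, `model.wv.similarity("The Conjuring", "Insidious")` computes the same quantity directly from the stored float32 vectors.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# values copied from the printed embeddings above
conjuring = [-0.07528649, 0.32395372, -0.04905971, 0.3000146, 0.01165257,
             0.6508559, 0.552389, 1.2716715, -0.09399039, 0.47394162,
             -0.03846619, -0.7954039, 0.7876637, -0.8032726, 0.6878014,
             0.02354643, 0.79975134, 0.6447317, -1.0822569, -0.20448099]
insidious = [-0.267572, 0.41998634, -0.10039267, 0.40691748, 0.24374427,
             0.76195514, 0.710778, 1.2504964, -0.20586337, 0.47253788,
             -0.06752691, -0.962921, 0.80334276, -0.6394063, 0.64307296,
             0.20696399, 0.75908184, 0.4785674, -1.029885, -0.21576826]

print(round(cosine_similarity(conjuring, insidious), 3))
```

A value close to 1 means the two titles point in nearly the same direction in embedding space, which is consistent with each appearing in the other's recommendations.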

## Using Node2Vec Embeddings

We will use the generated embeddings to recommend similar genres and movies.


In [11]:
# print the 10 nodes (titles or genres) most similar to a given title or genre
def node2vec_recommender(name):
    for node, _ in model.wv.most_similar(name):
        print(node)
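`model.wv.most_similar` ranks every other node by cosine similarity to the query and returns the top 10 by default. Since genres are nodes too, its results can include genre names as well as titles. The stdlib sketch below reproduces that ranking on a dict of vectors and adds an optional `exclude` set for filtering out genre nodes; the function name, parameters, and toy vectors are hypothetical.

```python
import math


def top_similar(vectors, query, topn=10, exclude=frozenset()):
    """Rank every other node by cosine similarity to `query`, highest first."""

    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    target = vectors[query]
    scored = [
        (node, cos(target, vec))
        for node, vec in vectors.items()
        if node != query and node not in exclude  # skip the query and excluded nodes
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]


# hypothetical 2-D embeddings
toy_vectors = {
    "Movie A": [1.0, 0.0],
    "Movie B": [0.9, 0.1],
    "Show C": [0.0, 1.0],
    "horror movies": [1.0, 0.05],
}
print([node for node, _ in top_similar(toy_vectors, "Movie A", topn=2)])
# ['horror movies', 'Movie B']
print([node for node, _ in top_similar(toy_vectors, "Movie A", topn=2, exclude={"horror movies"})])
# ['Movie B', 'Show C']
```

In the notebook, the genre nodes could be collected as `set(graph) - set(df_node2vec["title"].str.strip())` and used to filter the output of `model.wv.most_similar(name, topn=...)` so that only titles are recommended.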


## Movie Recommendations with node2vec


In [12]:
node2vec_recommender("Insidious")


The Haunting of Molly Hartley
Paranormal Activity
The Open House
Malevolent
All Light Will End
The Charnel House
The Bye Bye Man
Cabin Fever
The Ring
Before I Wake


In [13]:
node2vec_recommender("The Conjuring")


Knock Knock
Insidious
The Ring
The Witch Files
The Bye Bye Man
Case 39
The Open House
Our House
The Charnel House
The Haunting of Molly Hartley
