In this notebook I apply some basic content-based and collaborative filtering methods. The goal is to mimic some of the Netflix recommender algorithms described in [1]. The Neo4j sandbox project "Recommendation" shows handy examples of how content-based as well as collaborative filtering can be applied to the MovieLens dataset [2].

The Netflix recommendation system consists of multiple algorithms, each designed to serve a dedicated use case. According to [1], all algorithms together create the Netflix experience. The collection of Netflix recommender algorithms consists mainly of:
- Personalized Video Ranker (PVR)
- Top-N Video Ranker
- Trending Now
- Continue Watching
- Video-Video Similarity
- Page Generation: Row Selection and Ranking
- Search algorithms

In this notebook we focus on the PVR, the Top-N Video Ranker, and Trending Now.

- [1] *Carlos A. Gomez-Uribe and Neil Hunt. 2015. The Netflix recommender system: Algorithms, business value, and innovation. ACM Trans. Manage. Inf. Syst. 6, 4, Article 13 (December 2015), 19 pages. DOI: http://dx.doi.org/10.1145/2843948*
- [2] *https://sandbox.neo4j.com/*

In [1]:
from neo4j import GraphDatabase
import pandas as pd
from matplotlib import pyplot as plt
# from skimage import io

from neo4_connection import USER, PWD, URL

import warnings
warnings.filterwarnings('ignore')

In [2]:
driver = GraphDatabase.driver(uri=URL, auth=(USER, PWD))

def fetch_data(query):
    """Run a Cypher query and return the result as a DataFrame."""
    with driver.session() as session:
        result = session.run(query)
        return pd.DataFrame([r.values() for r in result], columns=result.keys())

## Personalized Video Ranker (PVR)

PVR recommendation categories could be for example:
- Action Movies
- Because you watched XYZ
- Top Picks for User ABC
- Popular on Netflix
- etc.

In the following section we try to mimic some of these categories by simply using Cypher queries. The entry point for the queries is always a user and optionally a date.

In [3]:
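def plot_netflix_style(df: pd.DataFrame, n_cols: int = 8) -> None:
    """Plot the posters of the first n_cols rows of df in a single row."""
    # Imported here because the top-level skimage import is commented out.
    from skimage import io

    # squeeze=False keeps a 2D array of axes even when n_cols == 1.
    _, axes = plt.subplots(nrows=1, ncols=n_cols, figsize=(18, 3), squeeze=False)
    axes = axes[0]

    # zip stops after n_cols rows, so no explicit break is needed.
    for ax, (_, row) in zip(axes, df.iterrows()):
        # ax.set_title(row['title'])
        if row['imgUri'] is not None:
            ax.imshow(io.imread(row['imgUri']))

    # Hide the axes of every subplot, not only the last one.
    for ax in axes:
        ax.axis('off')
    plt.show()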

## Content-based Approaches

### Example: Action Movies

Here we start from a given genre, e.g. Action. We look for movies in that genre and keep only ratings of at least 4.0, given since 2015, on movies with an imdbRating of at least 8.0. The query returns a list of n movies ordered by the sum of these "good" ratings and then by imdbRating. This is a simple way to apply content-based filtering by genre: we recommend other Action movies to a user who likes the genre Action, sorted by popularity. These recommendations are not very personalized, since the list looks the same for every user who likes the genre Action.
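
To make the ranking logic easier to follow outside of Cypher, here is a minimal plain-Python sketch of the same idea. The toy `ratings` and `movies` data, the function name `rank_genre`, and its parameters are hypothetical and not part of the MovieLens graph. The date filter of the query is left out for brevity.

```python
from collections import defaultdict

# Hypothetical toy data: (userId, movieId, rating); not taken from MovieLens.
ratings = [
    (1, "A", 5.0), (2, "A", 4.0), (3, "A", 2.0),
    (1, "B", 4.5), (2, "B", 4.5), (3, "B", 5.0),
    (1, "C", 5.0),
]
# Hypothetical movie metadata: movieId -> (genres, imdbRating).
movies = {
    "A": ({"Action", "Sci-Fi"}, 8.7),
    "B": ({"Action"}, 8.1),
    "C": ({"Drama"}, 9.0),
}


def rank_genre(genre, min_rating=4.0, min_imdb=8.0, limit=10):
    """Rank the movies of one genre by the sum of their good ratings."""
    num_ratings = defaultdict(int)
    score = defaultdict(float)
    for _, movie_id, rating in ratings:
        genres, imdb = movies[movie_id]
        # Keep only good ratings on well-rated movies of the requested genre.
        if genre in genres and rating >= min_rating and imdb >= min_imdb:
            num_ratings[movie_id] += 1
            score[movie_id] += rating
    # Sort by rating score first, then by imdbRating, both descending.
    ranked = sorted(score, key=lambda m: (score[m], movies[m][1]), reverse=True)
    return [(m, num_ratings[m], score[m], movies[m][1]) for m in ranked[:limit]]


top_action = rank_genre("Action")
print(top_action)
```

Because the function takes no user as input, every user gets the same list, which is exactly the limitation described above.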

In [4]:
genre = 'Action'
limit = 10
query = f"""
MATCH (g:Genre) <-[:IN_GENRE]- (m:Movie)
    <-[r:RATED]- (u:User)
WHERE g.name = '{genre}'
AND r.rating >= 4.0
AND m.imdbRating >= 8.0
AND r.ratingTs.year >= 2015
WITH g, m, count(r.rating) as numOfRatings, sum(r.rating) as ratingScore

RETURN m.movieId AS movieId, 
    m.title AS title, 
    numOfRatings, 
    toInteger(ratingScore) AS ratingScore, 
    m.imdbRating AS imdbRating
    //m.poster as imgUri
ORDER BY ratingScore DESC, imdbRating DESC
LIMIT {limit}
"""

dfs = []
df = fetch_data(query)
df.dropna(axis=0, inplace=True)
dfs.append(df)

print(f"{genre} movies")
df.head(10)

Action movies


Unnamed: 0,movieId,title,numOfRatings,ratingScore,imdbRating
0,260,Star Wars: Episode IV - A New Hope,43,194,8.7
1,79132,Inception,39,180,8.8
2,2571,"Matrix, The",39,180,8.7
3,2959,Fight Club,36,163,8.9
4,1196,Star Wars: Episode V - The Empire Strikes Back,34,160,8.8
5,58559,"Dark Knight, The",33,153,9.0
6,7153,"Lord of the Rings: The Return of the King, The",33,149,8.9
7,1198,Raiders of the Lost Ark (Indiana Jones and the...,31,140,8.5
8,1210,Star Wars: Episode VI - Return of the Jedi,28,127,8.4
9,112852,Guardians of the Galaxy,23,103,8.1


### Example: "Because you watched ..."

Here we start from a given movie, e.g. Inception. We look for other movies that share at least one genre with it. The query returns a list of n movies ordered by the number of common genres, with ties broken by imdbRating times imdbVotes. This is a simple way to apply content-based filtering by recommending movies based on a movie the user has watched. These recommendations are not personalized: the list looks identical for every user.
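
The genre-overlap idea can be sketched with Python sets. This is only an illustration under assumptions: the `genres_by_movie` dictionary and the function `similar_by_genre` are hypothetical, and the IMDb tie-breaker of the query is omitted.

```python
# Hypothetical genre sets; the movie ids and genres are illustrative only.
genres_by_movie = {
    "watched": {"Action", "Sci-Fi", "Thriller"},
    "rec_1": {"Action", "Sci-Fi", "Thriller", "Drama"},
    "rec_2": {"Action", "Comedy"},
    "rec_3": {"Romance"},
}


def similar_by_genre(movie_id, limit=10):
    """Return other movies ordered by the number of genres shared with movie_id."""
    watched = genres_by_movie[movie_id]
    candidates = []
    for other_id, genres in genres_by_movie.items():
        common = watched & genres  # set intersection = common genres
        if other_id != movie_id and common:
            candidates.append((other_id, sorted(common), len(common)))
    # Most shared genres first.
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:limit]


recs = similar_by_genre("watched")
print(recs)
```

Counting shared genres is a very coarse similarity measure. It treats all genres as equally informative, which is one reason why broad genres such as Drama or Thriller tend to dominate the list.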

In [5]:
movie_id = 79132
limit = 10
query = f"""
// Find similar movies by common genres
MATCH (m:Movie)-[:IN_GENRE]->(g:Genre)
              <-[:IN_GENRE]-(rec:Movie)
WHERE m.movieId = {movie_id}
WITH m, rec, collect(g.name) AS commonGenres, count(*) AS num_of_commonGenres
RETURN m.movieId AS watchedId,
    m.title AS watchedTitle,
    rec.movieId AS recId,
    rec.title AS recTitle, 
    commonGenres, 
    num_of_commonGenres,
    toInteger(rec.imdbRating * rec.imdbVotes) AS imdbScore 
    //,rec.poster as imgUri
ORDER BY num_of_commonGenres DESC, imdbScore DESC 
LIMIT {limit}
"""

df = fetch_data(query)
df.dropna(axis=0, inplace=True)
dfs.append(df)

# Use positional access: dropna() may have removed the row labeled 0.
print(f"Because you have watched {df['watchedTitle'].iloc[0]}")
df.head(10)

Because you have watched Inception


Unnamed: 0,watchedId,watchedTitle,recId,recTitle,commonGenres,num_of_commonGenres,imdbScore
0,79132,Inception,60684,Watchmen,"[Drama, Mystery, Sci-Fi, Thriller, IMAX, Action]",6,2812919.0
1,79132,Inception,198,Strange Days,"[Crime, Drama, Mystery, Sci-Fi, Thriller, Action]",6,380001.0
2,79132,Inception,26701,Patlabor: The Movie (Kidô keisatsu patorebâ: T...,"[Crime, Drama, Mystery, Sci-Fi, Thriller, Action]",6,23126.0
4,79132,Inception,5445,Minority Report,"[Crime, Mystery, Sci-Fi, Thriller, Action]",5,2985382.0
5,79132,Inception,85414,Source Code,"[Drama, Mystery, Sci-Fi, Thriller, Action]",5,2730802.0
6,79132,Inception,86644,"Fast Five (Fast and the Furious 5, The)","[Crime, Drama, Thriller, IMAX, Action]",5,2012967.0
7,79132,Inception,7445,Man on Fire,"[Crime, Drama, Mystery, Thriller, Action]",5,1993576.0
8,79132,Inception,5388,Insomnia,"[Crime, Drama, Mystery, Thriller, Action]",5,1531620.0
9,79132,Inception,2985,RoboCop,"[Crime, Drama, Sci-Fi, Thriller, Action]",5,1309012.0


### Example: This week popular on MovieLens

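Here we look at the most popular movies within a time interval, e.g. the current week. The query returns a list of n movies ordered by popularity within that interval, measured as the sum of the ratings given in that week. Note that the query compares only the week number, so ratings from the same week of any year are counted. These recommendations are not personalized: the list looks identical for every user.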
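
The weekly trend score can be sketched in plain Python as follows. The `events` list and the function `trending` are hypothetical. Unlike the Cypher query in this section, which compares `.week` only, this sketch matches the ISO year and the ISO week together, which is an assumption about the intended behavior.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rating events: (movieId, rating, date of the rating).
events = [
    ("A", 4.0, date(2016, 10, 10)),
    ("A", 5.0, date(2016, 10, 14)),
    ("B", 3.5, date(2016, 10, 16)),
    ("B", 5.0, date(2016, 10, 17)),  # falls into the following ISO week
    ("A", 5.0, date(2015, 10, 7)),   # different year, so it is ignored here
]


def trending(query_date, limit=10):
    """Sum the ratings per movie within the ISO week of query_date."""
    target = tuple(query_date.isocalendar())[:2]  # (ISO year, ISO week)
    trend_score = defaultdict(float)
    for movie_id, rating, rated_on in events:
        if tuple(rated_on.isocalendar())[:2] == target:
            trend_score[movie_id] += rating
    # Highest trend score first.
    ranked = sorted(trend_score.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


this_week = trending(date(2016, 10, 16))
print(this_week)
```

Summing ratings rewards movies that are rated often as well as movies that are rated highly, so the score behaves more like a popularity measure than an average rating.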

In [6]:
query_date = "2016-10-16"
limit = 10
query = f"""
MATCH (u:User) -[r:RATED]-> (m:Movie)
WHERE date(r.ratingTs).week = date("{query_date}").week
WITH m, sum(r.rating) AS trendScore
RETURN m.movieId AS movieId, 
    m.title AS movie, 
    trendScore,
    m.imdbVotes AS imdbVotes
    //,m.poster as imgUri
ORDER BY trendScore DESC, m.imdbVotes DESC
LIMIT {limit}
"""

df = fetch_data(query)
df.dropna(axis=0, inplace=True)
dfs.append(df)

print(f"This week most popular movies")
df.head(10)

This week most popular movies


Unnamed: 0,movieId,movie,trendScore,imdbVotes
0,608,Fargo,38.0,435149
1,527,Schindler's List,35.5,828323
2,356,Forrest Gump,34.0,1184397
3,318,"Shawshank Redemption, The",33.5,1626900
4,1198,Raiders of the Lost Ark (Indiana Jones and the...,28.5,632840
5,1265,Groundhog Day,28.5,418699
6,296,Pulp Fiction,28.0,1268850
7,25,Leaving Las Vegas,26.0,91469
8,260,Star Wars: Episode IV - A New Hope,25.5,875151
9,593,"Silence of the Lambs, The",25.5,847200


## Collaborative Approaches

In [7]:
user_id = 600 #389
limit = 20
query = f"""
// Find the most similar users according to the Pearson correlation
MATCH (p1:User) -[r:RATED]-> (m:Movie)
    <-[r2:RATED]-(p2:User)
WHERE p1.userId = {user_id} 
AND p2 <> p1

WITH p1, p2, 
    collect(r.rating) AS p1Ratings, 
    collect(r2.rating) AS p2Ratings
WHERE size(p1Ratings) > 10

WITH p1, p2,
    gds.similarity.pearson(p1Ratings, p2Ratings) AS pearsonScore,
    gds.similarity.cosine(p1Ratings, p2Ratings) AS cosineScore
ORDER BY pearsonScore DESC
LIMIT 7

// Find movies that the peers have rated but the user has not
MATCH (p2) -[r3:RATED]-> (rec:Movie)
WHERE NOT EXISTS {{ (p1)-[:RATED]->(rec) }}
WITH p1, p2, r3, rec

// Collect the genres in which the recommended movies are
MATCH (rec) -[:IN_GENRE]-> (g:Genre)
WITH p1, p2, r3, rec,
    collect(g.name) AS genres

RETURN rec.movieId AS movieId, 
    rec.title AS title, 
    sum(r3.rating) AS ratingScore, 
    collect(p2.name) as recommendingPeers, 
    genres
    //,rec.poster as imgUri
ORDER BY ratingScore DESC
LIMIT {limit}

"""

df = fetch_data(query)
df.dropna(axis=0, inplace=True)
dfs.append(df)

print(f"Top picks for Cynthia Freeman")
df.head(10)


Top picks for Cynthia Freeman


Unnamed: 0,movieId,title,ratingScore,recommendingPeers,genres
0,92259,Intouchables,27.5,"[Jonathan Cooper, Kenneth Snyder MD, Manuel Mo...","[Drama, Comedy]"
1,4973,"Amelie (Fabuleux destin d'Amélie Poulain, Le)",26.5,"[Jonathan Cooper, Kenneth Snyder MD, Manuel Mo...","[Comedy, Romance]"
2,296,Pulp Fiction,26.0,"[Jonathan Cooper, Kenneth Snyder MD, Andrew Fr...","[Drama, Crime, Comedy, Thriller]"
3,4306,Shrek,25.5,"[Jonathan Cooper, Kenneth Snyder MD, Manuel Mo...","[Romance, Fantasy, Comedy, Children, Animation..."
4,2571,"Matrix, The",25.5,"[Jonathan Cooper, Kenneth Snyder MD, Manuel Mo...","[Thriller, Sci-Fi, Action]"
5,260,Star Wars: Episode IV - A New Hope,24.0,"[Kenneth Snyder MD, Manuel Morris, Andrew Free...","[Sci-Fi, Action, Adventure]"
6,1196,Star Wars: Episode V - The Empire Strikes Back,24.0,"[Jonathan Cooper, Kenneth Snyder MD, Andrew Fr...","[Action, Adventure, Sci-Fi]"
7,5952,"Lord of the Rings: The Two Towers, The",23.5,"[Jonathan Cooper, Kenneth Snyder MD, Manuel Mo...","[Adventure, Fantasy]"
8,7147,Big Fish,23.0,"[Kenneth Snyder MD, Manuel Morris, Andrew Free...","[Romance, Fantasy, Drama]"
9,7361,Eternal Sunshine of the Spotless Mind,22.0,"[Kenneth Snyder MD, Manuel Morris, Andrew Free...","[Drama, Sci-Fi, Romance]"
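
The peer selection in the query above relies on `gds.similarity.pearson`, applied to the ratings that two users gave to the same movies. For readers who want to see what such a score measures, here is a minimal plain-Python sketch of the Pearson correlation coefficient. It is not the GDS implementation; the function name `pearson` and the choice to return 0.0 when one rating list has no variance are assumptions made for this illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long rating lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a user who rates everything the same carries no signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings of two users on the same three movies.
same_taste = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
opposite_taste = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(same_taste, opposite_taste)
```

Because the coefficient subtracts each user's mean rating, it compares how two users deviate from their own averages rather than how high they rate in absolute terms.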
