# Movie Recommendation - Collaborative Filtering

We have all seen product recommendations like "People who looked at item x also bought item y."
In this notebook we develop a similar system for an even better cause: Figuring out which movie to watch next.
For this we consider a simple dataset with user ratings for movies and then use a technique called [Collaborative Filtering](https://en.wikipedia.org/wiki/Collaborative_filtering) to identify which new movies might be worth watching based on other movies we liked.

![ratings](https://github.com/joerg84/Graph_Powered_ML_Workshop/blob/master/img/user_movie_rating.png?raw=1)

First, we set up our environment.

In [36]:
!pip3 install pyarango

Defaulting to user installation because normal site-packages is not writeable


In [37]:
import csv
import json
import requests
import sys
import oasis
import time
from pyArango.connection import *
from pyArango.collection import Collection, Edges, Field
from pyArango.graph import Graph, EdgeDefinition
from pyArango.collection import BulkOperation

Next, create a temporary database instance backed by ArangoDB's Managed Cloud Service Oasis:

In [38]:
# Retrieve tmp credentials from ArangoDB Tutorial Service
login = oasis.getTempCredentials()

# Connect to the temp database
conn = oasis.connect(login)
db = conn[login["dbName"]] 

Reusing cached credentials.


In [39]:
print("https://"+login["hostname"]+":"+str(login["port"]))
print("Username: " + login["username"])
print("Password: " + login["password"])
print("Database: " + login["dbName"])

https://tutorials.arangodb.cloud:8529
Username: TUTfwbsls06i88g7bukjyqoxn
Password: TUTzyrb6ma6n530f7ejnbjqh
Database: TUTwkxc0uoa02ri4wq5oig9k


Let us take a short look at our dataset, which, as is often the case in real-world scenarios, comes in CSV format.

In [40]:
print("User Data")
!head -n 3 Data/users.csv 
print()
print("Movies Data")
!head -n 3 Data/movies.csv 
print()
print("Rating Data")
!head -n 3 Data/ratings.csv 


User Data
user_id,Age,Gender,occupation,zip_code
1,35,M,engineer,94117
2,53,F,other,94043

Movies Data
movie_id, movie title , release date , video release date , IMDb URL , unknown , Action , Adventure , Animation , Children's , Comedy , Crime , Documentary , Drama , Fantasy , Film-Noir , Horror , Musical , Mystery , Romance , Sci-Fi , Thriller , War , Western
1,Toy Story (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?Toy%20Story%20(1995),0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,GoldenEye (1995),01-Jan-1995,,http://us.imdb.com/M/title-exact?GoldenEye%20(1995),0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0

Rating Data
user_id,item_id,Rating,Timestamp
186,302,3,891717742
22,377,1,878887116


Next, we create a graph with Users and Movies as vertices and Ratings as the edges between them.
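
To make the mapping concrete, here is a minimal sketch (plain Python, no database required) of how a single row of the ratings file could be turned into an edge document. The helper name `rating_row_to_edge` is hypothetical and exists only for illustration; the attribute names (`_from`, `_to`, `ratings`, `timestamp`) mirror the ones used in this notebook's import cell.

```python
def rating_row_to_edge(row):
    """Map one ratings.csv row (user_id, item_id, rating, timestamp) to an edge document.

    Hypothetical helper for illustration; the notebook builds the same
    attributes inline while importing.
    """
    user_id, movie_id, rating, timestamp = row
    return {
        "_from": "Users/" + user_id,    # the edge starts at the rating user
        "_to": "Movies/" + movie_id,    # and points to the rated movie
        "ratings": rating,
        "timestamp": timestamp,
    }


# First data row shown in the ratings file preview
edge = rating_row_to_edge(["186", "302", "3", "891717742"])
print(edge["_from"], "->", edge["_to"])
```

Because the user and the movie are already identified by `_from` and `_to`, the edge itself only needs to carry the rating and its timestamp.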

In [43]:
# Define the vertex collections (Users, Movies), the edge collection (Ratings), and the graph

class Users(Collection):
    _fields = {
        "user_id": Field(),
        "age": Field(),
        "gender": Field()
    }
    
class Movies(Collection):
    _fields = {
        "movie_id": Field(),
        "movie_title": Field(),
        "release_date": Field()
    }

class Ratings(Edges): 
    _fields = {
        # user_id and item_id are encoded by _from and _to
        # Named "ratings" to match the attribute written during import and read by the queries
        "ratings": Field(),
        "timestamp": Field()
    }

class IMDBGraph(Graph) :
    _edgeDefinitions = [EdgeDefinition("Ratings", fromCollections=["Users"], toCollections=["Movies"])]
    _orphanedCollections = []

db.createCollection("Users")
db.createCollection("Movies")
db.createCollection("Ratings")
iMDBGraph = db.createGraph("IMDBGraph", replicationFactor=3)

print("Collection/Graph Setup done.")

Collection/Graph Setup done.


In [44]:
collection = db["Users"]
with BulkOperation(collection, batchSize=100) as col:
    with open('Data/users.csv', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='|')
        #Skip header
        next(reader)
        for row in reader:
            # zip_code (not zip) avoids shadowing the Python builtin
            user_id, age, gender, occupation, zip_code = row
            doc = col.createDocument()
            doc["_key"] = user_id
            doc["age"] = age
            doc["gender"] = gender
            doc.save()

collection = db["Movies"]
with BulkOperation(collection, batchSize=100) as col:
    with open('Data/movies.csv', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='|')
        #Skip header
        next(reader)
        for row in reader:
            movie_id, movie_title , release_date , video_release_date , url , unknown , action , adventure , animation , childrens , comedy , crime , documentary , drama , fantasy , noir , horror , musical , mystery , romance , scifi , thriller , war , western = tuple(row)
            doc = col.createDocument()
            doc["_key"] = movie_id
            doc["movie_title"] = movie_title
            doc["release_date"] = release_date
            doc.save()

collection = db["Ratings"]
with BulkOperation(collection, batchSize=1000) as col:
    with open('Data/ratings.csv', newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='|')
        #Skip header
        next(reader)
        for row in reader:
            user_id,movie_id,rating,timestamp = tuple(row)
            doc = col.createDocument()
            doc["_from"] = "Users/"+user_id
            doc["_to"] = "Movies/"+movie_id
            doc["ratings"] = rating
            doc["timestamp"] = timestamp
            doc.save()
        
print("Import Done")

Import Done
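
One detail of the import is worth spelling out: `csv.reader` yields every field as a string, so the `ratings` attribute is stored as a string rather than a number. This is presumably why the queries in this notebook wrap it in `TO_NUMBER(...)`. The following sketch, which uses an in-memory copy of the first two rating rows instead of the real file, illustrates the point; converting with `int()` at import time is shown only as a possible alternative, not as what the notebook does.

```python
import csv
import io

# In-memory stand-in for the first lines of Data/ratings.csv
sample = "user_id,item_id,Rating,Timestamp\n186,302,3,891717742\n22,377,1,878887116\n"

reader = csv.reader(io.StringIO(sample), delimiter=',', quotechar='|')
next(reader)  # skip the header row
rows = list(reader)

user_id, movie_id, rating, timestamp = rows[0]
print(type(rating).__name__)  # csv fields always arrive as strings

# What the notebook stores (string) versus a possible alternative (number)
as_string = {"ratings": rating}
as_number = {"ratings": int(rating)}
```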


Let us build the collaborative filtering query step by step:

1. Find movies I rated with 5 stars
2. Find users who also rated these movies with 5 stars
3. Find additional movies also rated 5 stars by those users
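
Before running these steps as graph traversals, it can help to see them on a tiny in-memory example. The sketch below uses invented toy data (the user names and the movies `A` to `G` are not part of the dataset) and plain Python sets; it only mirrors the logic of the three steps and is not how the notebook queries ArangoDB.

```python
# Toy ratings: user -> {movie: stars}. Invented data for illustration only.
ratings = {
    "me":    {"A": 5, "B": 5, "C": 2},
    "alice": {"A": 5, "D": 5, "E": 3},
    "bob":   {"B": 5, "A": 5, "F": 5},
    "carol": {"C": 5, "G": 5},
}


def five_star_movies(user):
    """Return the set of movies the given user rated with 5 stars."""
    return {movie for movie, stars in ratings[user].items() if stars == 5}


# Step 1: movies I rated with 5 stars
my_favourites = five_star_movies("me")

# Step 2: users who gave 5 stars to at least one of those movies
# (like the graph traversal, this also finds "me")
alike_users = {user for user in ratings if my_favourites & five_star_movies(user)}

# Step 3: all movies those users rated with 5 stars
candidates = set()
for user in alike_users:
    candidates |= five_star_movies(user)

print(sorted(my_favourites))  # ['A', 'B']
print(sorted(alike_users))    # ['alice', 'bob', 'me']
print(sorted(candidates))     # ['A', 'B', 'D', 'F']
```

Each step corresponds to one hop in the graph: from a user to movies, from those movies back to users, and from those users out to movies again.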


In [45]:
my_ratings = """
WITH Movies, Users, Ratings
FOR movie, edge IN 1..1 
  OUTBOUND 'Users/1'
  GRAPH 'IMDBGraph'
  FILTER TO_NUMBER(edge.ratings) == 5
  LIMIT 10
  RETURN {
        "movie" : movie.movie_title,
        "rating" : edge.ratings
    }
"""

queryResult = db.AQLQuery(my_ratings, rawResults=True)
for result in queryResult:
    print("Movie: " + result["movie"])
    print("Rating: " + result["rating"])
    print()

Movie: Groundhog Day (1993)
Rating: 5

Movie: Delicatessen (1991)
Rating: 5

Movie: Pillow Book- The (1995)
Rating: 5

Movie: Horseman on the Roof- The (Hussard sur le toit- Le) (1995)
Rating: 5

Movie: Shawshank Redemption- The (1994)
Rating: 5

Movie: Star Trek: The Wrath of Khan (1982)
Rating: 5

Movie: Wallace & Gromit: The Best of Aardman Animation (1996)
Rating: 5

Movie: Breaking the Waves (1996)
Rating: 5

Movie: Three Colors: Blue (1993)
Rating: 5

Movie: Good- The Bad and The Ugly- The (1966)
Rating: 5



In [46]:
alike_users = """
WITH Movies, Users, Ratings
FOR movie, edge IN 1..1 
  OUTBOUND 'Users/1'
  GRAPH 'IMDBGraph'
  FILTER TO_NUMBER(edge.ratings) == 5
      FOR user, edge2 IN ANY movie Ratings
            FILTER TO_NUMBER(edge2.ratings) == 5
            LIMIT 10
            RETURN DISTINCT {
                "user" : user._key,
                "age" : user.age
            }
"""

queryResult = db.AQLQuery(alike_users, rawResults=True)
for result in queryResult:
    print("User: " + result["user"])
    print("Age: " + result["age"])
    print()

User: 161
Age: 50

User: 233
Age: 38

User: 1
Age: 35

User: 301
Age: 24

User: 303
Age: 19

User: 288
Age: 34

User: 210
Age: 39

User: 379
Age: 44

User: 130
Age: 20

User: 97
Age: 43



In [47]:
new_movies = """
WITH Movies, Users, Ratings
FOR movie, edge IN 1..1 
  OUTBOUND 'Users/1'
  GRAPH 'IMDBGraph'
  FILTER TO_NUMBER(edge.ratings) == 5
  
      FOR user, edge2 IN ANY movie Ratings
            FILTER TO_NUMBER(edge2.ratings) == 5
           // All users who have also rated that movie with 5 stars
          FOR movie2, edge3 IN ANY user Ratings
              FILTER TO_NUMBER(edge3.ratings) == 5
              LIMIT 10
              RETURN DISTINCT {
                 "title" : movie2.movie_title
              }
"""

queryResult = db.AQLQuery(new_movies, rawResults=True)
print("Recommended Movies:\n")
for result in queryResult:
    print("\t" + result["title"])
    print()

Recommended Movies:

	Groundhog Day (1993)

	Leaving Las Vegas (1995)

	Good Will Hunting (1997)

	As Good As It Gets (1997)

	Apt Pupil (1998)

	One Flew Over the Cuckoo's Nest (1975)

	To Live (Huozhe) (1994)

	Hoop Dreams (1994)

	Raiders of the Lost Ark (1981)
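
Note that the list above contains Groundhog Day, which also appears among the movies user 1 already rated with 5 stars, so the plain three-hop traversal can return movies we have already seen. One possible refinement, which is not part of the original notebook, is to exclude movies the user has already rated and to rank the remaining ones by how many like-minded users gave them 5 stars. The sketch below shows that idea in plain Python on invented toy data; the function name `recommend` and its `top_n` parameter are hypothetical.

```python
from collections import Counter

# Toy ratings: user -> {movie: stars}. Invented data for illustration only.
ratings = {
    "me":    {"A": 5, "B": 5, "C": 2},
    "alice": {"A": 5, "D": 5, "E": 3},
    "bob":   {"B": 5, "A": 5, "D": 5, "F": 5},
    "carol": {"C": 5, "G": 5},
}


def recommend(ratings, user, top_n=3):
    """Rank unseen movies by the number of like-minded users who gave them 5 stars."""
    liked = {m for m, stars in ratings[user].items() if stars == 5}
    seen = set(ratings[user])  # everything the user has rated, at any score
    votes = Counter()
    for other, their_ratings in ratings.items():
        if other == user:
            continue  # do not count the user's own ratings
        their_liked = {m for m, stars in their_ratings.items() if stars == 5}
        if liked & their_liked:              # shares at least one 5-star movie
            votes.update(their_liked - seen)  # one vote per unseen favourite
    return votes.most_common(top_n)


print(recommend(ratings, "me"))  # [('D', 2), ('F', 1)]
```

In AQL the same idea would roughly correspond to filtering out the start user and the already-rated movies and then aggregating with `COLLECT ... WITH COUNT`, but that variant is not shown or tested here.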



In [42]:
# Delete collections
db.dropAllCollections() 
db.reload()