The task is to recommend movies to a user based on the movies they give as input.

Movies will be given by title.

In [1]:
import numpy as np
import pandas as pd

Start by inspecting the dataset.

In [2]:
links_df = pd.read_csv('data/links.csv')
links_df.head()

Unnamed: 0,movieId,imdbId,tmdbId
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0
3,4,114885,31357.0
4,5,113041,11862.0


In [3]:
movies_df = pd.read_csv('data/movies.csv')
movies_df.head()

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [4]:
ratings_df = pd.read_csv('data/ratings.csv')
ratings_df.head()

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931


In [5]:
tags_df = pd.read_csv('data/tags.csv')
tags_df.head()

Unnamed: 0,userId,movieId,tag,timestamp
0,2,60756,funny,1445714994
1,2,60756,Highly quotable,1445714996
2,2,60756,will ferrell,1445714992
3,2,89774,Boxing story,1445715207
4,2,89774,MMA,1445715200


Let's assume that if person $A$ likes movies $M_0, M_1, ..., M_i$,
then there is a person $B$ who likes one or more of those movies; let's call them $M_j$.

This means that $A$ and $B$ have at least one movie that they both like, so the other movies liked by either of them are likely to be liked by both $A$ and $B$.

------------------

In [6]:
df = movies_df.merge(ratings_df, on='movieId')

In [7]:
M_j = 'John Wick (2014)' # Input title; for now just a single movie
recommended_movies = []

# Select all ratings of the input movie, highest rating first
movie_db = df[df['title'] == M_j]\
            .sort_values(by='rating', ascending=False)

# Take the 5 users who rated this movie highest
for user in movie_db.iloc[:5]['userId'].values:
    
    # Get the rated movies for this user
    rated_movies = df[df['userId'] == user]
    
    # Get this user's five highest-rated movies, excluding the input movie
    rated_movies = rated_movies[rated_movies['title'] != M_j]\
                    .sort_values(by='rating', ascending=False)\
                    .iloc[:5]
    
    # Add these to the recommendations
    recommended_movies.extend(list(rated_movies['title'].values))
    
recommended_movies = np.unique(recommended_movies)
    
for movie in recommended_movies:
    print(movie)

21 Jump Street (2012)
Addams Family, The (1991)
Aladdin (1992)
Batman Begins (2005)
Boondock Saints II: All Saints Day, The (2009)
Captain America: Civil War (2016)
Deadpool (2016)
Fight Club (1999)
Green Mile, The (1999)
Indiana Jones and the Temple of Doom (1984)
Jackass 2.5 (2007)
Jungle Book, The (1967)
King's Speech, The (2010)
Kingsman: The Secret Service (2015)
Opera (1987)
Pan's Labyrinth (Laberinto del fauno, El) (2006)
Predestination (2014)
Suspiria (1977)
The Godfather Trilogy: 1972-1990 (1992)
Toy Story (1995)
Visitor Q (Bizita Q) (2001)
Willow (1988)


Now weight each movie by its genre similarity to the input movie.

In [8]:
gmovie_genres = df[df['title'] == M_j].iloc[0]['genres'].split('|')
scores = {}  # {title: score ...}

for movie in recommended_movies:
    movied = df[df['title'] == movie].iloc[0]
    movie_genres = movied['genres'].split('|')
    score = 0
    
    # Count how many genres of the input movie also appear in this movie's genres
    for gmovie_genre in gmovie_genres:
        if gmovie_genre in movie_genres:
            score += 1
    
    scores[movie] = score
    
# Sort by score, then reverse so that the highest scores come first
recommended_movies = sorted(scores, key=lambda x: scores[x])[::-1]  

The recommendations are now ordered by genre overlap with the input movie, highest score first.

In [9]:
for movie in recommended_movies:
    print(movie)

Predestination (2014)
Fight Club (1999)
Captain America: Civil War (2016)
Boondock Saints II: All Saints Day, The (2009)
Willow (1988)
Pan's Labyrinth (Laberinto del fauno, El) (2006)
Kingsman: The Secret Service (2015)
Indiana Jones and the Temple of Doom (1984)
Deadpool (2016)
Batman Begins (2005)
21 Jump Street (2012)
Visitor Q (Bizita Q) (2001)
Toy Story (1995)
The Godfather Trilogy: 1972-1990 (1992)
Suspiria (1977)
Opera (1987)
King's Speech, The (2010)
Jungle Book, The (1967)
Jackass 2.5 (2007)
Green Mile, The (1999)
Aladdin (1992)
Addams Family, The (1991)
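The genre score used above is the size of the intersection of two genre lists, so it can also be written with Python sets. The sketch below shows that, plus a normalized variant (Jaccard similarity) that is an alternative and not what the notebook computes. The helper names, the `query` genre string, and the "Example" titles are hypothetical; only the Toy Story genres are copied from `movies.csv`.

```python
def genre_overlap(genres_a, genres_b):
    """Number of genres shared by two pipe-separated genre strings."""
    return len(set(genres_a.split("|")) & set(genres_b.split("|")))


def genre_jaccard(genres_a, genres_b):
    """Shared genres divided by all distinct genres (alternative score)."""
    a, b = set(genres_a.split("|")), set(genres_b.split("|"))
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical genre string for the input movie
query = "Action|Thriller"

candidates = {
    "Toy Story (1995)": "Adventure|Animation|Children|Comedy|Fantasy",
    "Example A": "Action|Crime|Drama|Thriller",  # hypothetical title
    "Example B": "Action|Comedy",                # hypothetical title
}

# Highest overlap first
ranked = sorted(
    candidates,
    key=lambda title: genre_overlap(query, candidates[title]),
    reverse=True,
)
print(ranked)  # ['Example A', 'Example B', 'Toy Story (1995)']
```

Note that `reverse=True` keeps tied titles in their original order, whereas reversing an ascending sort with `[::-1]` also flips the order of ties, so the two approaches can list equally scored movies differently.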


For the implementation, see the `rmovie.py` file.