# Hybrid Recommenders

We have discussed a few problems that different recommender approaches have, such as the cold-start problem in collaborative filtering. Some of these problems can be resolved by using a different recommender approach in the start-up phase (e.g., a content-based approach). In this Python notebook, I will present a simple hybrid recommender that combines the content-based and collaborative filters that we've built so far.

In [2]:
#Import the relevant packages
import numpy as np
import pandas as pd

#### Introduction
Netflix is a very good example of a hybrid recommender. It employs content-based techniques when it shows you movies similar to the one you're watching (the "more like this" section). Most of the time, however, it relies on a collaborative filter ("Top picks for you").

#### Case Study
Imagine that you've built a Netflix-like website. Each time a user watches a movie, you want to display a list of recommendations in the side pane (a bit like YouTube). A content-based recommender would seem appropriate. However, if a user is watching The Dark Knight, it would mostly return more Batman movies (not necessarily other superhero movies), and these might be of poor quality. To address this, we add a collaborative filter, which predicts the user's ratings for the movies recommended by our content-based model and returns the few movies with the highest predictions.

#### Workflow
1. Take in a movie title and user as input.
2. Use a content-based model to compute the 25 most similar movies.
3. Compute the predicted ratings that the user might give these 25 movies using a collaborative filter.
4. Return the top 10 movies with the highest predicted rating.
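
To make the four steps concrete before working with the real data, here is a minimal sketch on toy data. Everything in it is hypothetical: the five movie titles, the similarity values, and `predict_rating`, which is only a dictionary lookup standing in for a trained collaborative filter.

```python
import pandas as pd

# Hypothetical toy data: pairwise similarity scores for five movies.
titles = ['A', 'B', 'C', 'D', 'E']
sim = pd.DataFrame(
    [[1.0, 0.9, 0.1, 0.8, 0.3],
     [0.9, 1.0, 0.2, 0.7, 0.4],
     [0.1, 0.2, 1.0, 0.3, 0.6],
     [0.8, 0.7, 0.3, 1.0, 0.5],
     [0.3, 0.4, 0.6, 0.5, 1.0]],
    index=titles, columns=titles)

# Hypothetical stand-in for a collaborative filter: (user, movie) -> rating.
toy_ratings = {
    (1, 'B'): 2.5, (1, 'C'): 4.5, (1, 'D'): 4.0, (1, 'E'): 3.0,
    (2, 'B'): 4.5, (2, 'C'): 1.0, (2, 'D'): 2.0, (2, 'E'): 3.5,
}

def predict_rating(user_id, title):
    # Fall back to a neutral rating for unseen (user, movie) pairs.
    return toy_ratings.get((user_id, title), 3.0)

def toy_hybrid(user_id, title, n_candidates=3, n_results=2):
    # Step 2: the most similar movies, excluding the query movie itself.
    candidates = sim[title].drop(title).nlargest(n_candidates)
    # Step 3: predicted rating of this user for each candidate.
    est = pd.Series({t: predict_rating(user_id, t) for t in candidates.index})
    # Step 4: the candidates with the highest predicted ratings.
    return est.nlargest(n_results)

print(toy_hybrid(1, 'A'))
print(toy_hybrid(2, 'A'))
```

Both users get candidates drawn from the same content-based shortlist (B, D and E), but the final ranking differs per user. Movie C never appears for user 1 despite its high predicted rating, because it is not similar enough to A to make the shortlist.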

In [4]:
#Import or compute the cosine_sim matrix
cosine_sim = pd.read_csv('../data/cosine_sim.csv')

Normally I would ask you to compute the cosine similarity matrix, but the file above already has the scores. You can try to do it yourself in your own time!
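
If you want to try it, the following is a rough sketch of how such a matrix and its title-to-index mapping could be built. It uses a hypothetical three-movie corpus and plain word counts in NumPy. This is an assumption for illustration only; it is not necessarily how `cosine_sim.csv` was produced.

```python
import numpy as np
import pandas as pd

# Hypothetical toy corpus: one short description per movie.
docs = pd.Series({
    'Movie A': 'space alien war',
    'Movie B': 'alien space marine',
    'Movie C': 'romantic comedy paris',
})

# Bag-of-words count matrix: one row per movie, one column per word.
vocab = sorted({word for text in docs for word in text.split()})
counts = np.array(
    [[text.split().count(word) for word in vocab] for text in docs],
    dtype=float)

# Cosine similarity is the dot product of the L2-normalised rows.
unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
toy_cosine_sim = unit @ unit.T

# Title -> row index mapping, playing the role of cosine_sim_map.
toy_sim_map = pd.Series(range(len(docs)), index=docs.index)

print(pd.DataFrame(toy_cosine_sim, index=docs.index, columns=docs.index))
```

Movies A and B share two of their three words, so their similarity is 2/3, while Movie C shares no words with either and scores 0.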

In [23]:
#Import or compute the cosine sim mapping matrix
cosine_sim_map = pd.read_csv('../data/cosine_sim_map.csv', header=None)

#Convert cosine_sim_map into a Pandas Series
cosine_sim_map = cosine_sim_map.set_index(0)
cosine_sim_map = cosine_sim_map[1]
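
The conversion above can be checked on a small in-memory example. The sketch below assumes, as the later lookup `cosine_sim_map[title]` suggests, that the first column holds the title and the second the row index. The three rows are made up.

```python
import io
import pandas as pd

# Hypothetical two-column CSV without a header row: title, matrix index.
raw = io.StringIO("Avatar,0\nAliens,1\nPredator,2\n")
toy_map = pd.read_csv(raw, header=None)

# With header=None, pandas labels the columns 0 and 1.
# Indexing by column 0 and selecting column 1 yields a Series: title -> index.
toy_map = toy_map.set_index(0)[1]

idx = toy_map['Aliens']
print(idx)
```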

Now we import another CSV file to build a CF model. We will use the SVD model from the last chapter for this purpose, albeit with slightly different syntax.

In [29]:
#Build the SVD based Collaborative filter
from surprise import SVD, Reader, Dataset

reader = Reader()
ratings = pd.read_csv('../data/ratings_small.csv')
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
svd = SVD()
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x21f4f279308>

We import yet another file to map the metadata to the CF data.

In [36]:
#Build title to ID and ID to title mappings
id_map = pd.read_csv('../data/movie_ids.csv')
id_to_title = id_map.set_index('id')
title_to_id = id_map.set_index('title')

Import the metadata so that you can inspect the year of release and the IMDB rating.

In [32]:
#Import or compute relevant metadata of the movies
smd = pd.read_csv('../data/metadata_small.csv')

Below is the hybrid recommender, which follows the workflow described earlier.

In [33]:
def hybrid(userId, title):
    #Extract the cosine_sim index of the movie
    idx = cosine_sim_map[title]
    
    #Extract the TMDB ID of the movie
    tmdbId = title_to_id.loc[title]['id']
    
    #Extract the movie ID internally assigned by the dataset
    movie_id = title_to_id.loc[title]['movieId']
    
    #Extract the similarity scores and their corresponding index for every movie from the cosine_sim matrix
    sim_scores = list(enumerate(cosine_sim[str(int(idx))]))
    
    #Sort the (index, score) tuples in decreasing order of similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    
    #Select the top 25 tuples, excluding the first 
    #(as it is the similarity score of the movie with itself)
    sim_scores = sim_scores[1:26]
    
    #Store the cosine_sim indices of the top 25 movies in a list
    movie_indices = [i[0] for i in sim_scores]

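    #Extract the metadata of the aforementioned movies
    #(copy the slice so that adding the 'est' column below does not trigger a SettingWithCopyWarning)
    movies = smd.iloc[movie_indices][['title', 'vote_count', 'vote_average', 'year', 'id']].copy()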
    
    #Compute the predicted ratings using the SVD filter
    movies['est'] = movies['id'].apply(lambda x: svd.predict(userId, id_to_title.loc[x]['movieId']).est)
    
    #Sort the movies in decreasing order of predicted rating
    movies = movies.sort_values('est', ascending=False)
    
    #Return the top 10 movies as recommendations
    return movies.head(10)
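
One detail worth noting: the `sim_scores[1:26]` slice assumes that the query movie ranks first against itself. If another movie ties with a score of 1.0 (a duplicate entry, for instance), the query movie could slip through into the candidates. The sketch below shows a possible alternative that excludes the query by index instead. The score vector and `query_idx` are hypothetical.

```python
import numpy as np

# Hypothetical similarity column for the query movie at index 2.
# Index 1 is a duplicate entry that also scores 1.0.
scores = np.array([0.4, 1.0, 1.0, 0.9, 0.1])
query_idx = 2
k = 2

# Sort indices by decreasing score; a stable sort keeps ties in index order.
order = np.argsort(-scores, kind='stable')

# Drop the query movie explicitly, then keep the k best remaining indices.
top_k = [int(i) for i in order if i != query_idx][:k]

# For comparison: skipping only the first element keeps the query movie.
naive = [int(i) for i in order[1:k + 1]]

print(top_k, naive)
```

Here the naive slice returns the query movie itself as a candidate, while the explicit filter does not.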

Below, you can test the hybrid recommender. Imagine that the users with IDs 1 and 2 are both watching the movie Avatar. You can see that both the movies recommended to them and their order differ. This is due to the collaborative filter. However, all the movies are similar to Avatar, due to the content-based approach.

In [37]:
hybrid(1, 'Avatar')

| | title | vote_count | vote_average | year | id | est |
|---|---|---|---|---|---|---|
| 522 | Terminator 2: Judgment Day | 4274.0 | 7.7 | 1991 | 280 | 3.209727 |
| 974 | Aliens | 3282.0 | 7.7 | 1986 | 679 | 3.193526 |
| 1011 | The Terminator | 4208.0 | 7.4 | 1984 | 218 | 3.06789 |
| 922 | The Abyss | 822.0 | 7.1 | 1989 | 2756 | 3.025943 |
| 2834 | Predator | 2129.0 | 7.3 | 1987 | 106 | 2.976406 |
| 2014 | Fantastic Planet | 140.0 | 7.6 | 1973 | 16306 | 2.848557 |
| 7705 | Alice in Wonderland | 8.0 | 5.4 | 1933 | 25694 | 2.848225 |
| 8658 | X-Men: Days of Future Past | 6155.0 | 7.5 | 2014 | 127585 | 2.787715 |
| 8401 | Star Trek Into Darkness | 4479.0 | 7.4 | 2013 | 54138 | 2.750236 |
| 1621 | Darby O'Gill and the Little People | 35.0 | 6.7 | 1959 | 18887 | 2.675153 |


In [38]:
hybrid(2, 'Avatar')

| | title | vote_count | vote_average | year | id | est |
|---|---|---|---|---|---|---|
| 522 | Terminator 2: Judgment Day | 4274.0 | 7.7 | 1991 | 280 | 4.160931 |
| 1011 | The Terminator | 4208.0 | 7.4 | 1984 | 218 | 3.942202 |
| 8401 | Star Trek Into Darkness | 4479.0 | 7.4 | 2013 | 54138 | 3.907942 |
| 7705 | Alice in Wonderland | 8.0 | 5.4 | 1933 | 25694 | 3.782791 |
| 974 | Aliens | 3282.0 | 7.7 | 1986 | 679 | 3.772583 |
| 2834 | Predator | 2129.0 | 7.3 | 1987 | 106 | 3.694343 |
| 1668 | Return from Witch Mountain | 38.0 | 5.6 | 1978 | 14822 | 3.686929 |
| 7088 | Star Wars: The Clone Wars | 434.0 | 5.8 | 2008 | 12180 | 3.653427 |
| 2014 | Fantastic Planet | 140.0 | 7.6 | 1973 | 16306 | 3.635466 |
| 922 | The Abyss | 822.0 | 7.1 | 1989 | 2756 | 3.616338 |
