# Recommender System

In this exercise, we provide a Jupyter notebook with an implementation of a Recommender System that suggests movies to users. The data files are downloaded from the MovieLens 20M dataset (GroupLens) and used to build the recommender system.

The Recommender System produces movie suggestions for a given user by computing similarities between the user's latest rated movie(s) and other movies. The similarity between two movies is the Pearson correlation coefficient between the user ratings of each movie. Similar movies are then ranked in descending order of Pearson coefficient, and the first ___x___ movies are returned as recommendations for the user.
This is a basic implementation of a Collaborative Filtering system. For example, there is no post-processing when several movies have the same Pearson coefficient (no tie-breaker). It also uses no other information about the movies or the users (e.g. genres or release year).
External libraries may be used to answer the questions. The notebook must remain executable.


1. Split data into train/test and implement the following evaluation metrics for the Recommender System:
    * __MAP @ k__: Mean Average Precision, where precision @ ___k___ is the percentage of correct items among the first ___k___ recommendations.
    * __Coverage__: Percentage of movies in the test data that the system is able to recommend.
    * __Personalization__: Score indicating how dissimilar users' recommendations are from each other. It is computed as __1 - Cosine Similarity of user recommendations__.
2. Improve the Recommender System on at least one of the metrics in __(1)__. There is no constraint on how to improve the model. It could be through an implementation of another method or a pre/post processing of data. 
3. Document all the modifications to the model and present a comparative report showing the improvement in evaluation metrics.
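
As a starting point for task __(1)__, the three metrics can be sketched in plain Python. This is only one possible reading of the definitions above, not a reference solution: the function names, the dictionary inputs (`recs_by_user`, `relevant_by_user`) and the AP normalisation by `min(len(relevant), k)` are assumptions made for illustration, and the scores are returned as fractions rather than percentages.

```python
from itertools import combinations
import math

def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user: average of precision@i over the ranks i <= k that hit."""
    relevant = set(relevant)
    if not relevant or k <= 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision@rank, counted at each hit
    # Assumed normalisation: the best achievable number of hits within k
    return precision_sum / min(len(relevant), k)

def map_at_k(recs_by_user, relevant_by_user, k):
    """Mean of AP@k over all users that received recommendations."""
    if not recs_by_user:
        return 0.0
    scores = [average_precision_at_k(recs, relevant_by_user.get(user, ()), k)
              for user, recs in recs_by_user.items()]
    return sum(scores) / len(scores)

def coverage(recs_by_user, catalog):
    """Fraction of catalog items recommended to at least one user."""
    catalog = set(catalog)
    if not catalog:
        return 0.0
    recommended = set().union(*recs_by_user.values())
    return len(recommended & catalog) / len(catalog)

def personalization(recs_by_user):
    """1 - mean pairwise cosine similarity of binary recommendation vectors."""
    rec_sets = [set(recs) for recs in recs_by_user.values()]
    sims = []
    for a, b in combinations(rec_sets, 2):
        # Cosine of two 0/1 vectors: |a & b| / sqrt(|a| * |b|)
        sims.append(len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0)
    if not sims:
        return 0.0
    return 1.0 - sum(sims) / len(sims)

# Hypothetical toy data: recommended and held-out movie IDs per user
toy_recs = {1: [10, 20, 30], 2: [10, 40, 50]}
toy_relevant = {1: [10, 30], 2: [60]}
toy_catalog = [10, 20, 30, 40, 50, 60, 70, 80]

print(map_at_k(toy_recs, toy_relevant, k=3))
print(coverage(toy_recs, toy_catalog))
print(personalization(toy_recs))
```

With the notebook's own functions, `recs_by_user` could be filled from the `items` column returned by `get_recommendations`, and `relevant_by_user` from the ratings held out in the test split.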

In [35]:
!pip install wget



In [None]:
import pandas as pd
import numpy as np
import zipfile
import wget
from IPython.display import display, HTML
# Show full column contents; None replaces the deprecated value -1 (pandas >= 1.0)
pd.set_option('display.max_colwidth', None)

## Method to read the input data
* Reads the CSV files containing the data
* "ratings.csv" contains all user movie ratings with a timestamp
* "movies.csv" contains information about the movies, such as the genres

In [None]:
url = 'http://files.grouplens.org/datasets/movielens/ml-20m.zip'
filename = wget.download(url)
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall('.')

In [41]:
all_reviews = pd.read_csv('./ml-20m/ratings.csv')
all_items = pd.read_csv('./ml-20m/movies.csv')

## Method to prepare the data
* This function creates a user/item matrix with the ratings as values; NaN means the user in that row did not rate the movie in that column

In [42]:
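def get_user_item_matrix(sample_size):
    # Keep only the first `sample_size` fraction of the ratings file
    reviews = all_reviews.head(int(all_reviews.shape[0]*sample_size))
    # One rating per (user, movie) pair: keep the highest if there are duplicates
    r = reviews.groupby(['userId','movieId'], as_index=False)['rating'].max()
    data = r[['movieId','userId','rating']]
    # Rows are users, columns are movies, values are ratings (NaN = not rated)
    d_ = data.pivot(index='userId', columns='movieId', values='rating')
    # Drop users with fewer than 2 ratings, then movies with fewer than 10 ratings
    matrix = d_.dropna(thresh=2)
    matrix = matrix.dropna(thresh=10,axis=1)
    return matrix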
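
To make the pivot and the `dropna(thresh=...)` filtering concrete, here is a minimal sketch on a hypothetical five-row ratings table. The `toy_*` names and values are invented for illustration, and the column threshold is lowered to 2 so that the toy data survives the filter.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: user 3 rated only one movie
toy_reviews = pd.DataFrame({
    'userId':  [1, 1, 2, 2, 3],
    'movieId': [10, 20, 10, 20, 10],
    'rating':  [4.0, 3.0, 5.0, 2.0, 1.0],
})

# Rows are users, columns are movies; missing (user, movie) pairs become NaN
toy_matrix = toy_reviews.pivot(index='userId', columns='movieId', values='rating')

# thresh=2 keeps rows (users) with at least 2 non-NaN ratings
kept_users = toy_matrix.dropna(thresh=2)
# axis=1 applies the same rule to columns (movies)
kept = kept_users.dropna(thresh=2, axis=1)

print(toy_matrix)
print(kept)
```

User 3 is removed by the row filter because only one of their two cells is filled.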

## Method to calculate the recommendations
* This function takes as arguments the user/item matrix, a list of movie IDs rated by the user (possibly only one), and the number of recommendations to return.
* For each of these movies, it finds the other movies whose ratings are positively correlated with it.
* Returns the K most correlated movies (K given as argument).

In [43]:
def recommend_items(matrix,user_items,k):
    recommended_items = pd.DataFrame(columns=['items','score'])
    for item in user_items:
        item_vector = matrix[item]
        corr_results = matrix.corrwith(item_vector)
        corr_results = corr_results.where(corr_results > 0).dropna().sort_values(ascending=False)    
        similar_items = pd.DataFrame(corr_results.head(k))
        similar_items = similar_items.reset_index()
        similar_items.columns =  ['items','score']
        recommended_items = pd.concat([recommended_items,similar_items])
        # Stop early once k perfectly correlated items have been collected
        if recommended_items[recommended_items.score == 1].shape[0] >= k:
            break

    # `~` on a Python bool is a bitwise NOT and is always truthy, so use `not`
    if not recommended_items.empty:
        recommended_items = recommended_items[~recommended_items['items'].isin(user_items)]
        recommended_items = recommended_items.sort_values(by=['score'], ascending=False)
        recommended_items.drop_duplicates(inplace=True)

    return recommended_items.head(k)
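
The core step of `recommend_items`, correlating one movie's rating column against all others and keeping the positive scores, can be illustrated on a small hypothetical matrix. The movie labels `A`, `B`, `C` and the ratings below are invented for this sketch.

```python
import pandas as pd

# Hypothetical user/item matrix: 4 users, 3 movies, no missing ratings
toy = pd.DataFrame(
    {'A': [5.0, 4.0, 1.0, 2.0],
     'B': [4.0, 5.0, 2.0, 1.0],
     'C': [1.0, 2.0, 5.0, 4.0]},
    index=pd.Index([1, 2, 3, 4], name='userId'),
)

# Pearson correlation of every movie column with movie A
scores = toy.corrwith(toy['A'])

# Keep positive correlations, drop the query movie itself, rank descending
scores = scores[scores > 0].drop('A').sort_values(ascending=False)
print(scores)
```

Movie `B` is rated similarly to `A` by the same users and is kept, while `C` is rated in the opposite way and is filtered out by the positive-correlation condition.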


## Method to display the results
* Prints a DataFrame of movies rated by the User
* Prints a DataFrame of movies recommended to the User


In [44]:
def display_items(item_list):   
    display(all_items[all_items['movieId'].isin(item_list)])

def show_recommendations(user_id,recommendations,nb_rated_movies):
    items_names = [int(all_reviews.loc[all_reviews['userId']==user_id].sort_values(by='timestamp', ascending=False).iloc[i]['movieId']) for i in np.arange(nb_rated_movies)]
    print('\n\033[1mUser \"{}\" latest rated movie:\033[0m\n'.format(user_id))
    display_items(items_names)
    print('\n\n\033[1mRecommendations for user \"{}\":\033[0m\n'.format(user_id))
    display_items(recommendations['items'])

## Method to get the recommendation 
* Takes a user ID and a user/item matrix, and gets the latest movie(s) rated by that user
* Calls 'recommend_items' with the latest movie(s) rated by the user
* Returns the recommendations

In [45]:
def get_recommendations(user_id,matrix,nb_rated_movies):
    items_names = [int(all_reviews.loc[all_reviews['userId']==user_id].sort_values(by='timestamp', ascending=False).iloc[i]['movieId']) for i in np.arange(nb_rated_movies)]
    number_of_recommendations = 5
    recommendations = recommend_items(matrix,items_names,number_of_recommendations)
    return recommendations
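
The same "latest ratings by timestamp" idea could also serve as the train/test split asked for in the exercise: hold out each user's most recent ratings and build the matrix from the rest. The helper below is a hypothetical sketch (the name `split_latest_per_user` and the `n_test` parameter are not part of the notebook); it assumes a ratings frame with `userId`, `movieId` and `timestamp` columns, as in "ratings.csv".

```python
import pandas as pd

def split_latest_per_user(reviews, n_test=1):
    """Hold out each user's n_test most recent ratings as test; the rest is train."""
    ordered = reviews.sort_values(['userId', 'timestamp'])
    # Rank of each rating counted from the user's most recent one (0 = latest)
    recency = ordered.groupby('userId').cumcount(ascending=False)
    test = ordered[recency < n_test]
    train = ordered[recency >= n_test]
    return train, test

# Hypothetical toy ratings
toy_reviews = pd.DataFrame({
    'userId':    [1, 1, 1, 2, 2],
    'movieId':   [10, 20, 30, 10, 40],
    'timestamp': [100, 300, 200, 50, 60],
})
train, test = split_latest_per_user(toy_reviews, n_test=1)
print(test[['userId', 'movieId']])
```

If this split is used, the movies fed to `recommend_items` would need to come from the training part only, so that the held-out ratings are not used as inputs.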

In [46]:
sample_size=0.1
matrix = get_user_item_matrix(sample_size)
nonan_matrix = matrix.replace(np.nan,-1)
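
Replacing NaN with -1 means "not rated" is treated like a rating below the scale when the Pearson coefficient is computed. The hypothetical two-movie example below (values invented for illustration) shows how this can change the correlation compared with using only the users who rated both movies; it may be a useful place to look when working on the improvement task.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: users 3 and 4 rated neither movie
toy = pd.DataFrame({
    'A': [5.0, 4.0, np.nan, np.nan],
    'B': [4.0, 5.0, np.nan, np.nan],
})

# pandas ignores NaN pairs: only the two users who rated both movies count
pairwise = toy['A'].corr(toy['B'])

# After filling with -1, the shared "not rated" rows dominate the correlation
filled = toy.fillna(-1)
filled_corr = filled['A'].corr(filled['B'])

print(pairwise, filled_corr)
```

Here the two raters disagree, so the NaN-aware correlation is negative, while the filled version is strongly positive because both movies share the same unrated users.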

In [47]:
user_id = 15
nb_rated_movies = 4
try:
    recommendations = get_recommendations(user_id,nonan_matrix,nb_rated_movies)
    show_recommendations(user_id,recommendations,nb_rated_movies)
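except (IndexError, KeyError):
    # IndexError: the user has too few ratings; KeyError: a movie is missing from the matrix
    print('{} is an invalid User ID. Please enter a user ID between {} and {}.'.format(user_id,1,len(all_reviews.userId.unique())))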


User "15" latest rated movie:



| | movieId | title | genres |
| --- | --- | --- | --- |
| 493 | 497 | Much Ado About Nothing (1993) | Comedy\|Romance |
| 511 | 515 | Remains of the Day, The (1993) | Drama\|Romance |
| 530 | 534 | Shadowlands (1993) | Drama\|Romance |
| 590 | 596 | Pinocchio (1940) | Animation\|Children\|Fantasy\|Musical |




Recommendations for user "15":



| | movieId | title | genres |
| --- | --- | --- | --- |
| 505 | 509 | Piano, The (1993) | Drama\|Romance |
| 588 | 594 | Snow White and the Seven Dwarfs (1937) | Animation\|Children\|Drama\|Fantasy\|Musical |
| 1003 | 1022 | Cinderella (1950) | Animation\|Children\|Fantasy\|Musical\|Romance |
| 1010 | 1029 | Dumbo (1941) | Animation\|Children\|Drama\|Musical |
| 1254 | 1282 | Fantasia (1940) | Animation\|Children\|Fantasy\|Musical |
