### Example: how to use the recommendation library
* The first part downloads the dataset, adds my ratings, and finds movies similar to the ones we want to explore.
* The second part generates recommendations for me (using the ratings I provided).
* The last part asks for your ratings and generates recommendations for you.

In [1]:
import os
os.chdir('..')

In [2]:
# Import all the packages we need to generate recommendations
import pandas as pd
import wwc_recsys.utils as utils
import wwc_recsys.recommenders as recommenders
import wwc_recsys.similarity as similarity

In [3]:
# Set the dataset folder, the download URL, and the file with my ratings.
# The next cell downloads and unzips the dataset if it is not already in the folder,
# and merges my_ratings_file with it.
datasets_folder = os.getcwd()+'/data/'
dataset_url = 'http://files.grouplens.org/datasets/movielens/ml-latest-small.zip'
my_ratings_file = 'ratings_humberto.csv'

In [4]:
# Merge my ratings into the dataset, then build a movie x customer ratings matrix (0 = not rated)
[ratings, my_customer_number] = utils.merge_datasets(datasets_folder, dataset_url, my_ratings_file)
ratings_matrix = ratings.pivot_table(index='customer', columns='movie', values='rating', fill_value=0)
ratings_matrix = ratings_matrix.transpose()
movie_list = pd.DataFrame(ratings_matrix.index)
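
To see what the pivot and transpose in the cell above produce, here is a minimal sketch on a made-up dataset. It assumes the merged ratings frame has the columns customer, movie and rating, as the pivot_table call suggests; the toy values are invented for illustration and are not taken from MovieLens.

```python
import pandas as pd

# Toy ratings in long format: one row per (customer, movie) pair.
# Column names mirror the pivot_table call above; the values are made up.
toy_ratings = pd.DataFrame({
    'customer': [1, 1, 2, 2, 3],
    'movie': ['A', 'B', 'A', 'C', 'B'],
    'rating': [5.0, 3.0, 4.0, 2.0, 1.0],
})

# Wide format: one row per customer, one column per movie, 0 where no rating exists.
toy_matrix = toy_ratings.pivot_table(index='customer', columns='movie',
                                     values='rating', fill_value=0)

# Transpose so that each row is a movie and each column is a customer.
toy_matrix = toy_matrix.transpose()
print(toy_matrix)
```

After the transpose, each row is the rating vector of one movie, which is the shape a movie-to-movie similarity search needs.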

## Find similar movies

In [5]:
# To look up an exact title: movie_list[movie_list['movie'].str.contains("Mystic River")]
# Use a separate name so the movie_list DataFrame built above is not overwritten
movies_to_explore = ['Juno (2007)', 'Harry Potter and the Chamber of Secrets (2002)', 'Django Unchained (2012)']
for movie_test in movies_to_explore:
    print('\n MOVIES SIMILAR TO:  ', movie_test)
    # Top 10 neighbours by 'intersection' similarity, then by Pearson correlation
    print(similarity.compute_nearest_neighbours(movie_test, ratings_matrix, 'intersection')[0:10])
    print(similarity.compute_nearest_neighbours(movie_test, ratings_matrix, 'pearson')[0:10])
    print('\n')


 MOVIES SIMILAR TO:   Juno (2007)
                                                   item  similarity
4932                                        Juno (2007)          63
5877                                 Matrix, The (1999)          48
8136                   Shawshank Redemption, The (1994)          46
3349                                Forrest Gump (1994)          44
7323                                Pulp Fiction (1994)          44
5516  Lord of the Rings: The Fellowship of the Ring,...          44
3185                                  Fight Club (1999)          43
2289                            Dark Knight, The (2008)          43
7087  Pirates of the Caribbean: The Curse of the Bla...          42
5517  Lord of the Rings: The Return of the King, The...          42
                                    item  similarity
4932                         Juno (2007)    1.000000
3344    Forgetting Sarah Marshall (2008)    0.532086
5426         Little Miss Sunshine (2006)    0.516048
8777 
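
The two metrics used above can be sketched as follows. This is a guess at what they measure, not the wwc_recsys implementation: it assumes 'intersection' counts the customers who rated both movies (consistent with Juno scoring 63 against itself) and that 'pearson' is the Pearson correlation of the two rating vectors, here restricted to co-rated entries. The function names and the toy vectors are hypothetical.

```python
import numpy as np

def intersection_similarity(a, b):
    """Count the customers who rated both items (0 means 'not rated')."""
    a, b = np.asarray(a), np.asarray(b)
    return int(np.sum((a > 0) & (b > 0)))

def pearson_similarity(a, b):
    """Pearson correlation over the customers who rated both items."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    both = (a > 0) & (b > 0)
    if both.sum() < 2:
        return 0.0  # not enough overlap to correlate
    a, b = a[both], b[both]
    if a.std() == 0 or b.std() == 0:
        return 0.0  # correlation is undefined for constant vectors
    return float(np.corrcoef(a, b)[0, 1])

# Made-up rating vectors for two movies across five customers.
movie_x = [5, 4, 0, 3, 0]
movie_y = [4, 5, 2, 2, 0]

print(intersection_similarity(movie_x, movie_y))  # 3 customers rated both
print(pearson_similarity(movie_x, movie_y))
```

Under these assumptions the intersection count favours popular movies, since they share raters with almost everything, while the correlation favours movies that the same people rate in the same way. That matches the pattern in the output above.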

## Generate recommendations 

In [6]:
# get recommendations for a single user with recommend_iknn (K=50, Pearson similarity)
recommendations = recommenders.recommend_iknn(ratings, my_customer_number, K=50, similarity_metric='pearson')
recommendations

   rating  movie
0  5       Wreck-It Ralph (2012)
1  5       White Noise (2005)
2  5       White Material (2009)
3  5       Whatever Happened to Aunt Alice? (1969)
4  5       Watch, The (2012)
5  5       Waltz with Bashir (Vals im Bashir) (2008)
6  5       Vanishing on 7th Street (2010)
7  5       V/H/S/2 (2013)
8  5       V/H/S (2012)
9  5       Undertow (2004)
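
Assuming recommend_iknn denotes item-based k-nearest-neighbours, the idea behind it can be sketched as below: to predict a customer's rating for a movie, average that customer's ratings of the K most similar movies, weighted by similarity. This is an illustration, not the library's code. The function predict_item_knn is hypothetical, and it uses cosine similarity instead of Pearson to keep the example short.

```python
import numpy as np

def predict_item_knn(matrix, user, item, k=2):
    """Predict the rating of `user` for `item` from the k most similar rated items.

    matrix: 2-D array, rows = items, columns = users, 0 = not rated.
    """
    matrix = np.asarray(matrix, dtype=float)
    target = matrix[item]
    # Cosine similarity between the target item and every item.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(target)
    sims = np.divide(matrix @ target, norms,
                     out=np.zeros(len(matrix)), where=norms > 0)
    # Candidate neighbours: other items that this user has rated.
    candidates = [i for i in range(len(matrix))
                  if i != item and matrix[i, user] > 0]
    neighbours = sorted(candidates, key=lambda i: sims[i], reverse=True)[:k]
    weights = np.array([sims[i] for i in neighbours])
    if len(neighbours) == 0 or weights.sum() <= 0:
        return 0.0  # no usable neighbours
    ratings = np.array([matrix[i, user] for i in neighbours])
    # Similarity-weighted average of the user's ratings of the neighbours.
    return float(weights @ ratings / weights.sum())

# Made-up matrix: 3 items x 3 users; user 2 has not rated item 0.
toy = [[5, 4, 0],
       [5, 4, 5],
       [1, 5, 1]]

print(predict_item_knn(toy, user=2, item=0, k=1))
print(predict_item_knn(toy, user=2, item=0, k=2))
```

With a small K the prediction follows the single closest movie; a larger K pulls in less similar movies and moves the estimate toward the customer's average rating.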


In [7]:
# get recommendations for a single user with recommend_uknn (K=4, Pearson similarity)
recommendations = recommenders.recommend_uknn(ratings, my_customer_number, K=4, similarity_metric='pearson')
recommendations

   rating  movie
0  5       Submarine (2010)
1  5       Prisoners (2013)
2  5       Palo Alto (2013)
3  5       Moonrise Kingdom (2012)
4  5       Love and Death (1975)
5  5       Lolita (1997)
6  5       Lolita (1962)
7  5       Juno (2007)
8  5       He Loves Me... He Loves Me Not (À la folie... ...
9  5       Good, the Bad and the Ugly, The (Buono, il bru...
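
Assuming recommend_uknn denotes user-based k-nearest-neighbours, a minimal sketch of the idea looks like this: find the K customers whose ratings correlate best with mine, then score each movie I have not rated by the similarity-weighted average of their ratings. The function recommend_user_knn and the toy data are hypothetical, and the correlation here treats 0 (not rated) as a rating, which the library may handle differently.

```python
import pandas as pd

def recommend_user_knn(matrix, user, k=2, n=3):
    """Return up to n unseen movies for `user`, scored by similar customers.

    matrix: DataFrame, rows = movies, columns = customers, 0 = not rated.
    """
    # Pearson correlation between the target customer and every other customer.
    sims = matrix.corrwith(matrix[user]).drop(user).fillna(0)
    # Keep the k most similar customers, ignoring non-positive correlations.
    neighbours = sims.nlargest(k)
    neighbours = neighbours[neighbours > 0]
    unseen = matrix.index[matrix[user] == 0]
    scores = {}
    for movie in unseen:
        # Only neighbours who actually rated this movie contribute.
        rated = [(c, w) for c, w in neighbours.items() if matrix.at[movie, c] > 0]
        total = sum(w for _, w in rated)
        if total > 0:
            scores[movie] = sum(w * matrix.at[movie, c] for c, w in rated) / total
    return pd.Series(scores, dtype=float).sort_values(ascending=False).head(n)

# Made-up ratings: customer 'a' has tastes close to 'me', customer 'b' the opposite.
toy = pd.DataFrame(
    {'me': [5, 4, 1, 0, 0],
     'a':  [5, 5, 1, 4, 0],
     'b':  [1, 1, 5, 0, 5]},
    index=['M1', 'M2', 'M3', 'M4', 'M5'],
)

print(recommend_user_knn(toy, 'me', k=2))
```

A very small K, such as the K=4 used above, makes the recommendations depend on a handful of customers, which may explain why every movie in the list gets the maximum rating of 5.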


---
## Recommendations for a new ratings file (your ratings)
* To do this, create a ratings file with your own ratings and point my_ratings_file to it.

In [None]:
# Replace this with the name of your own ratings file
my_ratings_file = 'ratings_humberto.csv'
[ratings, my_customer_number] = utils.merge_datasets(datasets_folder, dataset_url, my_ratings_file)
recommenders.recommend_iknn(ratings, my_customer_number, K=50, similarity_metric='pearson')