### Rank-Based and Knowledge-Based Recommendations


In [76]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t

%matplotlib inline

# Read in the datasets
movies = pd.read_csv('movies_clean.csv')
reviews = pd.read_csv('reviews_clean.csv')
del movies['Unnamed: 0']
del reviews['Unnamed: 0']

In [18]:
movies.head()

Unnamed: 0,movie_id,movie,genre,date,1800's,1900's,2000's,History,News,Horror,...,Fantasy,Romance,Game-Show,Action,Documentary,Animation,Comedy,Short,Western,Thriller
0,8,Edison Kinetoscopic Record of a Sneeze (1894),Documentary|Short,1894,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
1,10,La sortie des usines Lumière (1895),Documentary|Short,1895,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
2,12,The Arrival of a Train (1896),Documentary|Short,1896,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
3,25,The Oxford and Cambridge University Boat Race ...,,1895,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,91,Le manoir du diable (1896),Short|Horror,1896,1,0,0,0,0,1,...,0,0,0,0,0,0,0,1,0,0


In [19]:
movies.movie_id.nunique()

31245

In [20]:
movies.shape

(31245, 35)

In [77]:
reviews.tail()

Unnamed: 0,user_id,movie_id,rating,timestamp,date,month_1,month_2,month_3,month_4,month_5,...,month_9,month_10,month_11,month_12,year_2013,year_2014,year_2015,year_2016,year_2017,year_2018
712332,53966,6644200,7,1530049728,2018-06-26 21:48:48,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
712333,53966,6684714,5,1516287632,2018-01-18 15:00:32,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
712334,53967,816711,8,1371972851,2013-06-23 07:34:11,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
712335,53968,1559547,2,1373287369,2013-07-08 12:42:49,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
712336,53968,2415464,2,1373772560,2013-07-14 03:29:20,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0


In [22]:
reviews.shape

(712337, 23)

In [23]:
reviews.movie_id.nunique()

31245

### 1. How To Find The Most Popular Movies (Rank Based)

The task: regardless of who the user is, return a list of recommendations based solely on the most popular items.

For this task, "most popular" is defined by the following criteria:

* A movie with a higher average rating is considered better
* When average ratings are tied, the movie with more ratings ranks higher
* A movie must have at least 5 ratings to be considered among the best movies
* If movies are tied on both average rating and number of ratings, the movie with the most recent rating ranks higher

With these criteria, the goal for this notebook is to write a function that takes a **user_id** and returns the **n_top** recommendations. Because the ranking is the same for every user, **user_id** does not affect the result.
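To see how the criteria interact, here is a minimal sketch on a small made-up table. The `toy_reviews` frame, its values, and the name `toy_ranking` are hypothetical and only illustrate the ordering rules; they are not taken from `reviews_clean.csv`.

```python
import pandas as pd

# Hypothetical toy data (not from the real dataset):
# movie 1: avg 9, 5 ratings, last rated 2018-01-01
# movie 2: avg 9, 6 ratings, last rated 2017-01-01
# movie 3: avg 8, 5 ratings
# movie 4: avg 10, but only 2 ratings (below the minimum)
# movie 5: avg 9, 5 ratings, last rated 2018-03-01
toy_reviews = pd.DataFrame({
    'movie_id': [1] * 5 + [2] * 6 + [3] * 5 + [4] * 2 + [5] * 5,
    'rating':   [9] * 5 + [9] * 6 + [8] * 5 + [10] * 2 + [9] * 5,
    'date': pd.to_datetime(
        ['2018-01-01'] * 5 + ['2017-01-01'] * 6 + ['2018-06-01'] * 5
        + ['2018-07-01'] * 2 + ['2018-03-01'] * 5
    ),
})

# One row per movie with the three ranking statistics
stats = toy_reviews.groupby('movie_id').agg(
    avg_rating=('rating', 'mean'),
    num_ratings=('rating', 'count'),
    last_rating=('date', 'max'),
)

# Enforce the minimum of 5 ratings, then sort by the criteria in priority order
eligible = stats[stats['num_ratings'] >= 5]
toy_ranking = eligible.sort_values(
    ['avg_rating', 'num_ratings', 'last_rating'], ascending=False
).index.tolist()

print(toy_ranking)  # [2, 5, 1, 3]
```

Movie 4 has the highest average but is dropped for having too few ratings. Movie 2 leads the three movies averaging 9 because it has the most ratings, and movie 5 edges out movie 1 because its latest rating is more recent.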

In [117]:
def popular_recommendations(user_id, n_top):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
              (not used by the ranking itself, which is the same for every user)
    n_top - an integer of the number of recommendations you want back
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    # Per-movie statistics: mean rating, number of ratings, most recent rating date
    movie_ratings = reviews.groupby('movie_id')['rating']
    avg_ratings = movie_ratings.mean()
    num_ratings = movie_ratings.count()
    last_rating = pd.DataFrame(reviews.groupby('movie_id')['date'].max())
    last_rating.columns = ['last_rating']

    # Combine the statistics into one DataFrame indexed by movie_id
    rating_count_df = pd.DataFrame({'avg_rating': avg_ratings, 'num_ratings': num_ratings})
    rating_count_df = rating_count_df.join(last_rating)

    # Attach the statistics to the movie metadata and sort by the three criteria
    movie_recs = movies.set_index('movie_id').join(rating_count_df)
    ranked_movies = movie_recs.sort_values(['avg_rating', 'num_ratings', 'last_rating'], ascending=False)

    # Keep only movies with at least 5 ratings
    ranked_movies = ranked_movies[ranked_movies['num_ratings'] > 4]
    top_movies = list(ranked_movies['movie'][:n_top])

    return top_movies # a list of the n_top movies as recommended

In [118]:
popular_recommendations(1, 20)

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Selam (2013)',
 "Quiet Riot: Well Now You're Here, There's No Way Back (2014)",
 'Crawl Bitch Crawl (2012)',
 'Make Like a Dog (2015)',
 'Pandorica (2016)',
 'Third Contact (2011)',
 'Romeo Juliet (2009)',
 'Be Somebody (2016)',
 'Birlesen Gonuller (2014)',
 'Agnelli (2017)',
 'Sátántangó (1994)',
 'Shijie (2004)',
 'Foster (2011)',
 'CM101MMXI Fundamentals (2013)',
 'Akahige (1965)',
 'Crystal Lake Memories: The Complete History of Friday the 13th (2013)',
 'Körkarlen (1921)']

### 2. Adding Filters (Knowledge Based)

Next, let the user narrow the recommendations by supplying some information about their preferences.

The cell below adjusts the existing function to accept **years** and **genres** arguments as **lists** of **strings**. The final results are then restricted to movies that match any of the provided years and any of the provided genres (`or` conditions within each list). If no list is provided, no filter is applied.
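The filtering logic can be sketched in isolation on a toy catalogue. The `toy_movies` frame and the `filter_movies` helper are hypothetical names used only for illustration, and the sketch assumes the `date` column holds years as strings and that each genre is a 0/1 indicator column.

```python
import pandas as pd

# Hypothetical toy catalogue with the same kind of columns as `movies`
toy_movies = pd.DataFrame({
    'movie': ['A (2015)', 'B (2016)', 'C (2016)', 'D (1999)'],
    'date': ['2015', '2016', '2016', '1999'],  # assumption: years stored as strings
    'History': [1, 0, 0, 1],
    'Comedy': [0, 1, 0, 0],
    'Horror': [0, 0, 1, 0],
})

def filter_movies(df, years=None, genres=None):
    """Keep rows matching ANY listed year and ANY listed genre (illustrative helper)."""
    if years is not None:
        # Keep rows whose year is one of the requested years
        df = df[df['date'].isin(years)]
    if genres is not None:
        # A row matches if at least one requested genre indicator is 1
        df = df[df[genres].sum(axis=1) > 0]
    return df

filtered = filter_movies(toy_movies, years=['2015', '2016'], genres=['History', 'Comedy'])
print(filtered['movie'].tolist())  # ['A (2015)', 'B (2016)']

# With no lists provided, nothing is filtered out
unfiltered = filter_movies(toy_movies)
print(len(unfiltered))  # 4
```

Within each list the match is an `or`, while the year filter and the genre filter are applied one after the other, so a movie must pass both. Movie C is dropped because it is neither History nor Comedy, and movie D because 1999 is not a requested year.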




In [120]:
def popular_recommendations(user_id, n_top, years=None, genres=None):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
              (not used by the ranking itself, which is the same for every user)
    n_top - an integer of the number of recommendations you want back
    years - (optional) a list of strings with the years to keep
    genres - (optional) a list of strings with the genres to keep
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    # Per-movie statistics: mean rating, number of ratings, most recent rating date
    movie_ratings = reviews.groupby('movie_id')['rating']
    avg_ratings = movie_ratings.mean()
    num_ratings = movie_ratings.count()
    last_rating = pd.DataFrame(reviews.groupby('movie_id')['date'].max())
    last_rating.columns = ['last_rating']

    # Combine the statistics into one DataFrame indexed by movie_id
    rating_count_df = pd.DataFrame({'avg_rating': avg_ratings, 'num_ratings': num_ratings})
    rating_count_df = rating_count_df.join(last_rating)

    # Attach the statistics to the movie metadata and sort by the three criteria
    movie_recs = movies.set_index('movie_id').join(rating_count_df)
    ranked_movies = movie_recs.sort_values(['avg_rating', 'num_ratings', 'last_rating'], ascending=False)

    # Keep only movies with at least 5 ratings
    ranked_movies = ranked_movies[ranked_movies['num_ratings'] > 4]

    # Keep movies whose year is in the provided list
    if years is not None:
        ranked_movies = ranked_movies[ranked_movies['date'].isin(years)]

    # Keep movies flagged with at least one of the provided genres
    if genres is not None:
        num_genre_match = ranked_movies[genres].sum(axis=1)
        ranked_movies = ranked_movies.loc[num_genre_match > 0, :]

    top_movies = list(ranked_movies['movie'][:n_top])

    return top_movies # a list of the n_top movies as recommended

Let's test the function.

```
t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History', 'Comedy'])
```

In [121]:
popular_recommendations('1', 20, years=['2015', '2016', '2017', '2018'], genres=['History', 'Comedy'])

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Make Like a Dog (2015)',
 'Be Somebody (2016)',
 'Poshter Girl (2016)',
 'Ayla: The Daughter of War (2017)',
 'I Believe in Miracles (2015)',
 'Bajrangi Bhaijaan (2015)',
 'The Farthest (2017)',
 'Ricky Gervais: Humanity (2018)',
 'Coco (2017)',
 'The Age of Spin: Dave Chappelle Live at the Hollywood Palladium (2017)',
 'Sado (2015)',
 'Adult Life Skills (2016)',
 'Hatred (2016)',
 'Inside Out (2015)',
 'La vida inmoral de la pareja ideal (2016)',
 'World of Tomorrow (2015)',
 'Toc Toc (2017)']