### Rank-Based and Knowledge-Based Recommendations


#### Part I: How To Find The Most Popular Movies

This notebook has a single task: no matter who the user is, we need to provide a list of recommendations based simply on the most popular items.

For this task, we define "most popular" by the following criteria:

* Movies with a higher average rating rank higher
* When average ratings tie, the movie with more ratings ranks higher
* A movie must have at least 5 ratings to be considered among the best movies
* When movies tie on both average rating and number of ratings, the movie with the most recent rating ranks higher
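To make the criteria concrete, here is a minimal sketch on a toy table. The data values and the names `toy_reviews`, `stats`, and `ranked_ids` are invented for illustration; only the column names `movie_id`, `rating`, and `date` mirror the notebook's data.

```python
import pandas as pd

# Hypothetical toy ratings, invented purely for illustration.
toy_reviews = pd.DataFrame({
    'movie_id': [1] * 5 + [2] * 5 + [3] * 6 + [4] * 2 + [5] * 5,
    'rating':   [9] * 5 + [9] * 5 + [9] * 6 + [10] * 2 + [8] * 5,
    'date': pd.to_datetime(
        ['2018-01-05'] * 5 + ['2018-03-01'] * 5 + ['2017-06-01'] * 6
        + ['2018-04-01'] * 2 + ['2018-05-01'] * 5
    ),
})

# Per movie: average rating, number of ratings, and date of the latest rating.
stats = toy_reviews.groupby('movie_id').agg(
    avg_rating=('rating', 'mean'),
    num_ratings=('rating', 'count'),
    last_rating=('date', 'max'),
)

# Movies with fewer than 5 ratings are not eligible (movie 4 drops out here).
eligible = stats[stats['num_ratings'] >= 5]

# Sort by average rating, then number of ratings, then recency, all descending.
ranked = eligible.sort_values(
    ['avg_rating', 'num_ratings', 'last_rating'], ascending=False
)
ranked_ids = ranked.index.tolist()
print(ranked_ids)
```

Movie 4 has the highest average but only two ratings, so it is excluded. Movies 1, 2, and 3 tie on average rating; movie 3 wins on number of ratings, and movie 2 beats movie 1 because its latest rating is more recent.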

With these criteria, the goal for this notebook is to take a **user_id** and return the **n_top** recommendations.  The function below also serves as the scaffolding for all future recommendation functions.

In [42]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t


%matplotlib inline

# Read in the datasets
movies = pd.read_csv('movies_clean.csv')
reviews = pd.read_csv('reviews_clean.csv')
del movies['Unnamed: 0']
del reviews['Unnamed: 0']

In [43]:
# convert the review dates to datetime before merging, so that the merged
# 'date_y' column can be compared chronologically
reviews['date'] = pd.to_datetime(reviews['date'])

# merge the movies and reviews tables
movie_review = pd.merge(movies, reviews, on='movie_id', how='inner')
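
The `date_y` column used in the ranking code comes from how pandas resolves name clashes in a merge. Presumably both tables carry a `date` column (a release year in `movies`, a review timestamp in `reviews`), so the merged frame gets the default `_x` and `_y` suffixes. The sketch below illustrates this on hypothetical toy frames; the values are invented.

```python
import pandas as pd

# Hypothetical toy frames that both contain a 'date' column.
toy_movies = pd.DataFrame({'movie_id': [1, 2], 'date': ['2015', '2016']})
toy_reviews = pd.DataFrame({
    'movie_id': [1, 1, 2],
    'rating': [8, 9, 7],
    'date': pd.to_datetime(['2018-01-01', '2018-02-01', '2018-03-01']),
})

merged = pd.merge(toy_movies, toy_reviews, on='movie_id', how='inner')

# The left table's 'date' becomes 'date_x', the right table's becomes 'date_y'.
print(list(merged.columns))
```

Because the review dates are converted before the merge, `date_y` keeps its datetime type in the merged frame.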


In [50]:
def popular_recommendations_clean(user_id, n_top):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
    n_top - an integer of the number recommendations you want back
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
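    # rank every eligible movie according to the popularity criteria
    top_movies_id = ranked_movie_ids()

    # drop the movies this user has already rated
    top_movies_id_nonwatched = remove_watched_movies(user_id, top_movies_id)

    # look up the titles of the remaining ids and keep the first n_top
    top_movies = list(pd.merge(top_movies_id_nonwatched, movies, on='movie_id')['movie'])[:n_top]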

    return top_movies
    


In [None]:
def ranked_movie_ids():
    '''
    INPUT:
    None - uses the global movie_review dataframe (the merged movies and reviews tables)
    OUTPUT:
    top_movies_id - a dataframe of all movie ids sorted according to the criteria mentioned above
    '''
    # get the mean rating, the number of ratings and the date of the last review
    # for each movie (these are the three sort keys in the criteria)
    movie_review_aggr = movie_review.groupby('movie_id').agg({'rating': ['mean', 'count'], 'date_y': 'max'})

    # drop movies with fewer than 5 ratings
    movie_review_aggr = movie_review_aggr[movie_review_aggr[('rating', 'count')] >= 5]

    # change movie_id from index to column
    movie_review_aggr.reset_index(level=0, inplace=True)

    # sort rows according to the criteria and keep their sorted ids
    top_movies_id = movie_review_aggr.sort_values(by=[('rating', 'mean'), ('rating', 'count'), ('date_y', 'max')], ascending=False)['movie_id']

    # wrap the ids in a dataframe so they can be merged with the movies table later
    top_movies_id = pd.DataFrame(top_movies_id)

    return top_movies_id


In [None]:
def remove_watched_movies(user_id, top_movies_id):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
    top_movies_id - a dataframe of ranked movie ids
    OUTPUT:
    top_movies_id_nonwatched - the same ranked ids without the movies this user has already rated
    '''
    movies_user_watched = movie_review[movie_review['user_id'] == user_id]['movie_id']
    top_movies_id_nonwatched = top_movies_id[~(top_movies_id['movie_id'].isin(movies_user_watched))]
    return top_movies_id_nonwatched

## Testing Part I

In [46]:
# Top 10 movies recommended for user id 1

recs_10_for_1 = popular_recommendations_clean(1, 10)
recs_10_for_1

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Selam (2013)',
 "Quiet Riot: Well Now You're Here, There's No Way Back (2014)",
 'Crawl Bitch Crawl (2012)',
 'Make Like a Dog (2015)',
 'Pandorica (2016)',
 'Third Contact (2011)',
 'Romeo Juliet (2009)']

**Notice:** This wasn't the only way we could have determined the "top rated" movies.  You can imagine that in keeping track of trending news or trending social events, you would likely want to create a time window from the current time, and then pull the articles in the most recent time frame.  There are always going to be some subjective decisions to be made.  
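
As a rough sketch of the time-window idea, the hypothetical helper below counts only the ratings that fall inside a recent window and ranks movies by that count. The function name `trending_ids`, its parameters, and the toy data are assumptions for illustration. To keep the example reproducible, the window is anchored at the latest timestamp in the data rather than at the current time.

```python
import pandas as pd

def trending_ids(ratings, window_days=30, n_top=3):
    """Hypothetical helper: rank movies by rating count inside a recent window."""
    # Anchor the window at the latest rating in the data (an assumption made
    # here for reproducibility; a live system would likely use the current time).
    cutoff = ratings['date'].max() - pd.Timedelta(days=window_days)
    recent = ratings[ratings['date'] > cutoff]
    counts = recent.groupby('movie_id').size().sort_values(ascending=False)
    return counts.index[:n_top].tolist()

# Toy data, invented for illustration.
toy = pd.DataFrame({
    'movie_id': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'date': pd.to_datetime([
        '2018-01-01', '2018-01-02', '2018-01-03',                # movie 1: old only
        '2018-06-10', '2018-06-20',                              # movie 2: two recent
        '2018-01-15', '2018-06-05', '2018-06-15', '2018-06-30',  # movie 3: three recent
    ]),
})

print(trending_ids(toy))
```

Movie 1 has three ratings overall but none inside the window, so it does not appear in the trending list at all.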

If you find that no one is paying any attention to your most popular recommendations, then it might be time to find a new way to recommend.


#### Part II: Adding Filters

Now that you have created a function to give back the **n_top** movies, let's make it a bit more robust.  Add arguments that will act as filters for the movie **year** and **genre**.  

Use the cells below to adjust your existing function to accept **year** and **genre** arguments as **lists** of **strings**.  The final results are then restricted to movies that match any of the provided years and any of the provided genres (the values within each list act as `or` conditions).  If no list is provided, no filter should be applied.

You can adjust other inputs as necessary to retrieve the final results you are looking for!
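
The filtering logic can be sketched on a toy table before wiring it into the recommender. This sketch assumes that the release year is stored as a string in a `date` column and that each genre is a 0/1 indicator column; the helper `filter_movies` and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical toy movies table with 0/1 genre indicator columns.
toy_movies = pd.DataFrame({
    'movie': ['A (2015)', 'B (2016)', 'C (2016)', 'D (2010)'],
    'date': ['2015', '2016', '2016', '2010'],
    'History': [1, 0, 0, 1],
    'Comedy':  [0, 1, 0, 0],
    'Drama':   [0, 0, 1, 0],
})

def filter_movies(df, years=None, genres=None):
    """Hypothetical helper: keep rows matching any given year and any given genre."""
    keep = pd.Series(True, index=df.index)
    if years is not None:
        # 'or' across years: the movie's year must be one of the listed years
        keep &= df['date'].isin(years)
    if genres is not None:
        # 'or' across genres: at least one of the listed indicator columns is 1
        keep &= df[genres].sum(axis=1) > 0
    return df[keep]

filtered = filter_movies(toy_movies, years=['2015', '2016'], genres=['History', 'Comedy'])
print(filtered['movie'].tolist())
```

Passing `None` for either argument skips that filter, so calling `filter_movies(toy_movies)` returns every row unchanged.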

```
t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History', 'Comedy'])
```

In [47]:
genres = ['History', 'News', 'Horror', 'Musical', 'Film-Noir', 'Mystery',
       'Adventure', 'Sport', 'War', 'Music', 'Reality-TV', 'Adult', 'Crime',
       'Family', 'Drama', 'Talk-Show', 'Biography', 'Sci-Fi', 'Fantasy',
       'Romance', 'Game-Show', 'Action', 'Documentary', 'Animation', 'Comedy',
       'Short', 'Western', 'Thriller']

In [48]:
def popular_recommendations_filtered_clean(user_id, n_top, years=None, genres=None):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
    n_top - an integer of the number recommendations you want back
    years - a list of year strings used to filter the top movies (None applies no filter)
    genres - a list of genre strings used to filter the top movies (None applies no filter)
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    top_movies_id = ranked_movie_ids()
    
    top_movies_id_nonwatched = remove_watched_movies(user_id, top_movies_id)
    
    
    # attach the movie details (title, year, genre columns) to the ranked ids
    merged = pd.merge(top_movies_id_nonwatched, movies, on='movie_id')

    # keep only movies released in one of the requested years
    if years is not None:
        merged = merged[merged['date'].isin(years)]

    # keep only movies that belong to at least one of the requested genres
    if genres is not None:
        # assuming the genre columns are 0/1 indicators, a positive row sum means a match
        no_of_genre_matched = merged[genres].sum(axis=1)
        merged = merged[no_of_genre_matched > 0]

    top_movies = list(merged['movie'])[:n_top]

    return top_movies
    


## Testing Part II

In [49]:
# find the top 10 movies from the years 2015 - 2018 in the genres History or Comedy

popular_recommendations_filtered_clean(1, 10, years=['2015', '2016', '2017', '2018'], genres=['History', 'Comedy'])

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Make Like a Dog (2015)',
 'Be Somebody (2016)',
 'Poshter Girl (2016)',
 'Ayla: The Daughter of War (2017)',
 'I Believe in Miracles (2015)',
 'Bajrangi Bhaijaan (2015)',
 'The Farthest (2017)']