### Rank-Based and Knowledge-Based Recommendations


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# Read in the datasets
movies = pd.read_csv('movies_clean.csv')
reviews = pd.read_csv('reviews_clean.csv')
del movies['Unnamed: 0']
del reviews['Unnamed: 0']

In [2]:
movies.head()

Unnamed: 0,movie_id,movie,genre,date,1800's,1900's,2000's,History,News,Horror,...,Fantasy,Romance,Game-Show,Action,Documentary,Animation,Comedy,Short,Western,Thriller
0,8,Edison Kinetoscopic Record of a Sneeze (1894),Documentary|Short,1894,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
1,10,La sortie des usines Lumière (1895),Documentary|Short,1895,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
2,12,The Arrival of a Train (1896),Documentary|Short,1896,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
3,25,The Oxford and Cambridge University Boat Race ...,,1895,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,91,Le manoir du diable (1896),Short|Horror,1896,1,0,0,0,0,1,...,0,0,0,0,0,0,0,1,0,0


In [3]:
reviews.head()

Unnamed: 0,user_id,movie_id,rating,timestamp,date,month_1,month_2,month_3,month_4,month_5,...,month_9,month_10,month_11,month_12,year_2013,year_2014,year_2015,year_2016,year_2017,year_2018
0,1,68646,10,1381620027,2013-10-12 23:20:27,0,0,0,0,0,...,0,1,0,0,1,0,0,0,0,0
1,1,113277,10,1379466669,2013-09-18 01:11:09,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,2,422720,8,1412178746,2014-10-01 15:52:26,0,0,0,0,0,...,0,1,0,0,0,1,0,0,0,0
3,2,454876,8,1394818630,2014-03-14 17:37:10,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,2,790636,7,1389963947,2014-01-17 13:05:47,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0


In [4]:
reviews.shape

(712337, 23)

#### 1. How To Find The Most Popular Movies

For this notebook, we have a single task: no matter who the user is, we need to provide a list of recommendations based simply on the most popular items.

For this task, we will define "most popular" using the following criteria:

* A movie with the highest average rating is considered best
* With ties, movies that have more ratings are better
* A movie must have a minimum of 5 ratings to be considered among the best movies
* If movies are tied in both their average rating and number of ratings, the movie with the most recent rating ranks higher

With these criteria, the goal for this notebook is to take a **user_id** and return the **n_top** recommendations.  Use the function below as the scaffolding for all future recommendations as well.
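As a minimal sketch of how the four criteria can combine, the example below ranks a small made-up dataset. The names `toy_reviews`, `rank_movies`, and `min_ratings` are hypothetical and only for illustration; the column names mirror those in `reviews`.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset): five movies with known statistics
toy_reviews = pd.DataFrame({
    'movie_id': [1] * 5 + [2] * 6 + [3] * 5 + [4] * 2 + [5] * 5,
    'rating': [9] * 5 + [9] * 6 + [8] * 5 + [10] * 2 + [9] * 5,
    'date': pd.to_datetime(
        ['2015-01-01'] * 5 + ['2016-01-01'] * 6 + ['2017-01-01'] * 5
        + ['2018-01-01'] * 2 + ['2019-01-01'] * 5
    ),
})


def rank_movies(reviews_df, min_ratings=5):
    """Rank movies by mean rating, then rating count, then most recent rating."""
    stats = reviews_df.groupby('movie_id').agg(
        rating=('rating', 'mean'),
        rating_count=('rating', 'count'),
        recent_rating=('date', 'max'),
    )
    # Movies with too few ratings are not eligible
    stats = stats[stats['rating_count'] >= min_ratings]
    # Each later column only breaks ties left by the earlier ones
    return stats.sort_values(
        by=['rating', 'rating_count', 'recent_rating'],
        ascending=[False, False, False],
    )


toy_ranked = rank_movies(toy_reviews)
print(toy_ranked)
```

Here movie 4 is dropped despite its perfect average because it has only 2 ratings. Movies 1, 2, and 5 tie on average rating; movie 2 wins on rating count, and movie 5 beats movie 1 on the more recent rating.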

In [5]:
def popular_recommendations(user_id, n_top):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
              (unused here: popularity-based recommendations are the same for every user)
    n_top - an integer of the number recommendations you want back
    OUTPUT:
    top_movies - a dataframe of the n_top recommended movies, in order best to worst
    '''
    # average rating per movie
    avg_rating = reviews.groupby('movie_id')['rating'].mean()
    movie_rev_df = pd.DataFrame(avg_rating)

    # number of ratings per movie
    movie_rev_df['rating_count'] = reviews.groupby('movie_id')['rating'].count()
    # date of the most recent rating per movie
    movie_rev_df['recent_rating'] = reviews.groupby('movie_id')['date'].max()

    # keep only movies with at least 5 ratings
    movie_rev_df = movie_rev_df[movie_rev_df.rating_count >= 5]

    # join with the movies dataframe to get the movie title
    movie_rev_df = movie_rev_df.merge(movies[['movie_id', 'movie']], on='movie_id')

    # sort by average rating, then number of ratings, then most recent rating
    movie_rev_df = movie_rev_df.sort_values(by=['rating', 'rating_count', 'recent_rating'], ascending=[False, False, False])

    return movie_rev_df.head(n_top)

In [6]:
popular_recommendations(1,5)

Unnamed: 0,movie_id,rating,rating_count,recent_rating,movie
9432,4921860,10.0,48,2016-08-14 17:16:50,MSG 2 the Messenger (2015)
9584,5262972,10.0,28,2016-01-08 00:44:43,Avengers: Age of Ultron Parody (2015)
9718,5688932,10.0,14,2018-06-17 01:44:48,Sorry to Bother You (2018)
8035,2737018,10.0,10,2015-05-10 22:56:01,Selam (2013)
7883,2560840,10.0,6,2016-01-23 00:30:44,"Quiet Riot: Well Now You're Here, There's No W..."


In [7]:
popular_recommendations(6364, 100)

Unnamed: 0,movie_id,rating,rating_count,recent_rating,movie
9432,4921860,10.000000,48,2016-08-14 17:16:50,MSG 2 the Messenger (2015)
9584,5262972,10.000000,28,2016-01-08 00:44:43,Avengers: Age of Ultron Parody (2015)
9718,5688932,10.000000,14,2018-06-17 01:44:48,Sorry to Bother You (2018)
8035,2737018,10.000000,10,2015-05-10 22:56:01,Selam (2013)
7883,2560840,10.000000,6,2016-01-23 00:30:44,"Quiet Riot: Well Now You're Here, There's No W..."
...,...,...,...,...,...
1983,108052,9.030556,360,2018-06-24 02:45:45,Schindler's List (1993)
711,68646,9.029520,542,2018-06-23 13:10:50,The Godfather (1972)
8953,3863552,9.026042,192,2018-04-22 13:43:41,Bajrangi Bhaijaan (2015)
228,44741,9.024390,41,2018-06-10 07:38:41,Ikiru (1952)


In [8]:
popular_recommendations(1, 3)

Unnamed: 0,movie_id,rating,rating_count,recent_rating,movie
9432,4921860,10.0,48,2016-08-14 17:16:50,MSG 2 the Messenger (2015)
9584,5262972,10.0,28,2016-01-08 00:44:43,Avengers: Age of Ultron Parody (2015)
9718,5688932,10.0,14,2018-06-17 01:44:48,Sorry to Bother You (2018)


After implementing the function above, go to the practical quiz at the maharatech link and answer the multiple-choice questions there.

In [9]:
# Put your solutions for MCQ here

**Notice:** This wasn't the only way we could have determined the "top rated" movies.  You can imagine that in keeping track of trending news or trending social events, you would likely want to create a time window from the current time, and then pull the articles in the most recent time frame.  There are always going to be some subjective decisions to be made.  

If you find that no one is paying any attention to your most popular recommendations, then it might be time to find a new way to recommend, which is what the next parts of the lesson should prepare us to do!


### Part II: Adding Filters

Now that you have created a function to give back the **n_top** movies, let's make it a bit more robust.  Add arguments that will act as filters for the movie **year** and **genre**.  

Use the cells below to adjust your existing function to allow for **year** and **genre** arguments as **lists** of **strings**.  The final results should then be filtered to only movies within the lists of provided years and genres (as `or` conditions).  If no list is provided, no filter should be applied.

You can adjust other inputs as necessary to retrieve the final results you are looking for!

The shape of the function should be as below:
```
popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History', 'Comedy'])
```
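The filtering step on its own can be sketched as follows, using a small made-up `toy_movies` frame that mimics the layout of `movies`: an integer `date` column for the year and one 0/1 column per genre. The helper name `filter_movies` is hypothetical. Years are converted to integers because `movies.info()` reports `date` as `int64`, so string years would otherwise match nothing.

```python
import pandas as pd

# Made-up movies for illustration only
toy_movies = pd.DataFrame({
    'movie': ['A (2015)', 'B (2016)', 'C (1995)', 'D (2017)'],
    'date': [2015, 2016, 1995, 2017],
    'History': [1, 0, 1, 0],
    'Comedy': [0, 1, 0, 0],
    'Drama': [0, 0, 1, 1],
})


def filter_movies(df, years=None, genres=None):
    """Filter by year and genre; each list is an `or` condition, None means no filter."""
    mask = pd.Series(True, index=df.index)
    if years is not None:
        # Accept years given as strings or ints
        mask &= df['date'].isin([int(year) for year in years])
    if genres is not None:
        # A movie passes if at least one of the requested genre flags is 1
        mask &= df[genres].eq(1).any(axis=1)
    return df[mask]


filtered = filter_movies(
    toy_movies,
    years=['2015', '2016', '2017'],
    genres=['History', 'Comedy'],
)
print(filtered['movie'].tolist())
```

Within each list the values act as `or` conditions, while the year filter and the genre filter are combined with `and`: movie C is a History title but falls outside the requested years, and movie D is in range but matches neither genre.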

In [10]:
def popular_recommendations(user_id, n_top, years=None, genres=None):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
              (unused here: popularity-based recommendations are the same for every user)
    n_top - an integer of the number recommendations you want back
    years - a list of movie years (ints or numeric strings); None applies no year filter
    genres - a list of strings with genres of movies; None applies no genre filter
    OUTPUT:
    top_movies - a dataframe with the title, year and genre of the n_top recommended
                 movies, in order best to worst
    '''
    # average rating per movie
    avg_rating = reviews.groupby('movie_id')['rating'].mean()
    movie_rev_df = pd.DataFrame(avg_rating)

    # number of ratings per movie
    movie_rev_df['rating_count'] = reviews.groupby('movie_id')['rating'].count()
    # date of the most recent rating per movie
    movie_rev_df['recent_rating'] = reviews.groupby('movie_id')['date'].max()

    # keep only movies with at least 5 ratings
    movie_rev_df = movie_rev_df[movie_rev_df.rating_count >= 5]

    # join with the movies dataframe to get the title, year and genre columns
    movie_rev_df = movie_rev_df.merge(movies, on='movie_id')

    # keep only movies from the given years (the date column holds integer years)
    if years is not None:
        movie_rev_df = movie_rev_df[movie_rev_df.date.isin([int(year) for year in years])]

    # keep only movies flagged with at least one of the given genres (an `or` condition)
    if genres is not None:
        movie_rev_df = movie_rev_df[movie_rev_df[genres].eq(1).any(axis=1)]

    # sort by average rating, then number of ratings, then most recent rating
    movie_rev_df = movie_rev_df.sort_values(by=['rating', 'rating_count', 'recent_rating'], ascending=[False, False, False])

    return movie_rev_df[['movie', 'date', 'genre']].head(n_top)

In [11]:
# Put your solutions for MCQ here

In [12]:
popular_recommendations(1, 20, [1995],None)

Unnamed: 0,movie,date,genre
2139,Dilwale Dulhania Le Jayenge (1995),1995,Comedy|Drama|Musical
2119,Braveheart (1995),1995,Biography|Drama|History
2200,Se7en (1995),1995,Crime|Drama|Mystery
2219,The Usual Suspects (1995),1995,Crime|Drama|Mystery
2159,Heat (1995),1995,Action|Crime|Drama
2176,Maboroshi no hikari (1995),1995,Drama
2122,Casino (1995),1995,Biography|Crime|Drama
2198,Salaam Cinema (1995),1995,Documentary|Comedy|Drama
2215,Toy Story (1995),1995,Animation|Adventure|Comedy
2218,Underground (1995),1995,Comedy|Drama|War


After implementing the function above, go to the practical quiz at the maharatech link and answer the multiple-choice questions there.

In [13]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31245 entries, 0 to 31244
Data columns (total 35 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   movie_id     31245 non-null  int64 
 1   movie        31245 non-null  object
 2   genre        31022 non-null  object
 3   date         31245 non-null  int64 
 4   1800's       31245 non-null  int64 
 5   1900's       31245 non-null  int64 
 6   2000's       31245 non-null  int64 
 7   History      31245 non-null  int64 
 8   News         31245 non-null  int64 
 9   Horror       31245 non-null  int64 
 10  Musical      31245 non-null  int64 
 11  Film-Noir    31245 non-null  int64 
 12  Mystery      31245 non-null  int64 
 13  Adventure    31245 non-null  int64 
 14  Sport        31245 non-null  int64 
 15  War          31245 non-null  int64 
 16  Music        31245 non-null  int64 
 17  Reality-TV   31245 non-null  int64 
 18  Adult        31245 non-null  int64 
 19  Crime        31245 non-nu

In [14]:
popular_recommendations(1, 10, years=[2015, 2016, 2017, 2018], genres=['History', 'Comedy'])

Unnamed: 0,movie,date,genre
9432,MSG 2 the Messenger (2015),2015,Comedy|Drama|Fantasy
9584,Avengers: Age of Ultron Parody (2015),2015,Short|Comedy
9718,Sorry to Bother You (2018),2018,Comedy|Fantasy|Sci-Fi
9238,Make Like a Dog (2015),2015,Short|Comedy|Drama
9658,Be Somebody (2016),2016,Comedy|Drama
9613,Poshter Girl (2016),2016,Comedy
9867,Ayla: The Daughter of War (2017),2017,Drama|History|War
9515,I Believe in Miracles (2015),2015,Documentary|History|Sport
8953,Bajrangi Bhaijaan (2015),2015,Comedy|Drama
9851,The Farthest (2017),2017,Documentary|History


In [15]:
popular_recommendations(1,3,[2015, 2016, 2017, 2018],genres=['Romance'])

Unnamed: 0,movie,date,genre
9614,Koe no katachi (2016),2016,Animation|Drama|Romance
7336,Departure (2015),2015,Drama|Family|Romance
9576,Hepta: The Last Lecture (2016),2016,Drama|Romance


In [16]:
popular_recommendations(user_id='1', n_top=3, years=[2014], genres=['History'])

Unnamed: 0,movie,date,genre
9087,Birlesen Gonuller (2014),2014,Drama|History
8621,Night Will Fall (2014),2014,Documentary|History|War
8479,Red Army (2014),2014,Documentary|Biography|History


In [17]:
popular_recommendations(user_id='7000', n_top=100,genres=['History', 'News'])

Unnamed: 0,movie,date,genre
9087,Birlesen Gonuller (2014),2014,Drama|History
438,Seppuku (1962),1962,Action|Drama|History
31,La passion de Jeanne d'Arc (1928),1928,Biography|Drama|History
9867,Ayla: The Daughter of War (2017),2017,Drama|History|War
1046,The Decline of Western Civilization (1981),1981,Documentary|History|Music
...,...,...,...
2446,Les Misérables (1998),1998,Crime|Drama|History
8725,The Search for General Tso (2014),2014,Documentary|Comedy|History
1113,Missing (1982),1982,Drama|History|Mystery
8491,Last Days in Vietnam (2014),2014,Documentary|History|War
