### Rank-Based and Knowledge-Based Recommendations


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t

%matplotlib inline
# Read in the datasets
movies = pd.read_csv('movies_clean.csv')
reviews = pd.read_csv('reviews_clean.csv')
del movies['Unnamed: 0']
del reviews['Unnamed: 0']

In [2]:
print(reviews.info())
print(reviews.shape)
reviews.head(2)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 712337 entries, 0 to 712336
Data columns (total 23 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   user_id    712337 non-null  int64 
 1   movie_id   712337 non-null  int64 
 2   rating     712337 non-null  int64 
 3   timestamp  712337 non-null  int64 
 4   date       712337 non-null  object
 5   month_1    712337 non-null  int64 
 6   month_2    712337 non-null  int64 
 7   month_3    712337 non-null  int64 
 8   month_4    712337 non-null  int64 
 9   month_5    712337 non-null  int64 
 10  month_6    712337 non-null  int64 
 11  month_7    712337 non-null  int64 
 12  month_8    712337 non-null  int64 
 13  month_9    712337 non-null  int64 
 14  month_10   712337 non-null  int64 
 15  month_11   712337 non-null  int64 
 16  month_12   712337 non-null  int64 
 17  year_2013  712337 non-null  int64 
 18  year_2014  712337 non-null  int64 
 19  year_2015  712337 non-null  int64 
 20  year

Unnamed: 0,user_id,movie_id,rating,timestamp,date,month_1,month_2,month_3,month_4,month_5,...,month_9,month_10,month_11,month_12,year_2013,year_2014,year_2015,year_2016,year_2017,year_2018
0,1,68646,10,1381620027,2013-10-12 23:20:27,0,0,0,0,0,...,0,1,0,0,1,0,0,0,0,0
1,1,113277,10,1379466669,2013-09-18 01:11:09,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0


In [3]:
print(movies.info())
print(movies.shape)
movies.head(2)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31245 entries, 0 to 31244
Data columns (total 35 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   movie_id     31245 non-null  int64 
 1   movie        31245 non-null  object
 2   genre        31022 non-null  object
 3   date         31245 non-null  int64 
 4   1800's       31245 non-null  int64 
 5   1900's       31245 non-null  int64 
 6   2000's       31245 non-null  int64 
 7   History      31245 non-null  int64 
 8   News         31245 non-null  int64 
 9   Horror       31245 non-null  int64 
 10  Musical      31245 non-null  int64 
 11  Film-Noir    31245 non-null  int64 
 12  Mystery      31245 non-null  int64 
 13  Adventure    31245 non-null  int64 
 14  Sport        31245 non-null  int64 
 15  War          31245 non-null  int64 
 16  Music        31245 non-null  int64 
 17  Reality-TV   31245 non-null  int64 
 18  Adult        31245 non-null  int64 
 19  Crime        31245 non-nu

Unnamed: 0,movie_id,movie,genre,date,1800's,1900's,2000's,History,News,Horror,...,Fantasy,Romance,Game-Show,Action,Documentary,Animation,Comedy,Short,Western,Thriller
0,8,Edison Kinetoscopic Record of a Sneeze (1894),Documentary|Short,1894,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
1,10,La sortie des usines Lumière (1895),Documentary|Short,1895,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0


#### 1. How To Find The Most Popular Movies

This notebook has a single task: regardless of the user, return a list of recommendations based simply on the most popular items.

For this task, "most popular" is defined by the following criteria:

* A movie with the highest average rating is considered best
* When average ratings are tied, the movie with more ratings ranks higher
* A movie must have a minimum of 5 ratings to be considered among the best movies
* If movies are tied in both average rating and number of ratings, the movie with the most recent rating ranks higher

With these criteria, the goal for this notebook is to take a **user_id** and return the **n_top** recommendations.  The function below also serves as the scaffolding for all future recommendation functions.
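
As a minimal sketch of how the four criteria combine, the example below ranks a small made-up set of reviews. The toy data, the `rank_movies` helper, and its `min_ratings` parameter are illustrative assumptions, not part of the lesson's code.

```python
import pandas as pd

# Hypothetical toy reviews (the real notebook reads reviews_clean.csv).
toy_reviews = pd.DataFrame({
    "movie_id": [1] * 5 + [2] * 6 + [3] * 5 + [4] * 2 + [5] * 5,
    "rating": [9] * 5 + [9] * 6 + [8] * 5 + [10] * 2 + [9] * 5,
    "date": pd.to_datetime(
        ["2015-01-01"] * 5 + ["2016-01-01"] * 6 + ["2017-01-01"] * 5
        + ["2018-01-01"] * 2 + ["2018-06-01"] * 5
    ),
})

def rank_movies(reviews, min_ratings=5):
    """Rank movies by average rating, then rating count, then latest rating."""
    stats = reviews.groupby("movie_id").agg(
        avg_rating=("rating", "mean"),
        num_ratings=("rating", "count"),
        last_rating=("date", "max"),
    )
    # Drop movies that do not meet the minimum number of ratings.
    eligible = stats[stats["num_ratings"] >= min_ratings]
    # Sort on all three keys in descending order, in priority order.
    return eligible.sort_values(
        by=["avg_rating", "num_ratings", "last_rating"], ascending=False
    )

toy_ranked = rank_movies(toy_reviews)
print(toy_ranked.index.tolist())  # [2, 5, 1, 3]
```

Movie 4 has the highest average but only two ratings, so it is excluded. Movies 1, 2, and 5 tie on average rating: movie 2 wins on rating count, and movie 5 beats movie 1 because its latest rating is more recent.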

In [4]:
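# Per-movie summary: mean rating, number of ratings, and most recent rating date.
# 'date' is a 'YYYY-MM-DD HH:MM:SS' string, so 'max' picks the latest timestamp.
avg_Count_reviews = reviews.groupby('movie_id').agg(
    avg_rating=('rating', 'mean'),
    No_reviews=('rating', 'count'),
    last_review=('date', 'max'),
)
avg_Count_reviews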

movie_id,avg_rating,No_reviews,last_review
8,5.0,1,2014-04-08 18:20:11
10,10.0,1,2014-10-09 18:15:53
12,10.0,1,2015-08-10 23:16:19
25,8.0,1,2017-02-27 10:04:59
91,6.0,1,2013-11-23 18:59:55
...,...,...,...
8335880,8.0,1,2018-05-11 18:39:25
8342748,9.0,2,2018-05-26 18:50:01
8342946,4.5,2,2018-06-18 03:16:26
8402090,10.0,1,2018-06-12 14:21:41


In [16]:
joind_df = movies.copy()
# Criterion 1: a movie with the highest average rating is considered best
joind_df = joind_df.join(avg_Count_reviews["avg_rating"], on="movie_id")
# Criterion 2: with ties, movies that have more ratings are better
joind_df = joind_df.join(avg_Count_reviews["No_reviews"], on="movie_id")
# Criterion 3: a movie must have a minimum of 5 ratings to be considered
joind_df = joind_df[joind_df["No_reviews"] >= 5]
# Criterion 4: if movies are tied in average rating and number of ratings,
# the movie with the most recent rating ranks higher
joind_df = joind_df.join(avg_Count_reviews['last_review'], on="movie_id")
# Sort priority: average rating, then number of reviews, then most recent rating
joind_df = joind_df.sort_values(by=["avg_rating", "No_reviews", "last_review"], ascending=False)
# The joins are kept separate to mirror the criteria one by one.
# A single join gives the same result with less, clearer code:
# joind_df = joind_df.join(avg_Count_reviews, on="movie_id")
# joind_df = joind_df[joind_df["No_reviews"] >= 5].sort_values(by=["avg_rating", "No_reviews", "last_review"], ascending=False)

In [17]:
def popular_recommendations(user_id, n_top):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
              (unused here: rank-based recommendations are the same for every user)
    n_top - an integer number of recommendations to return
    OUTPUT:
    top_movies - a list of the n_top recommended movie titles, ordered best to worst
    '''
    # joind_df is already sorted best to worst, so the first n_top rows are the answer
    return joind_df.head(n_top)['movie'].values.tolist()

Using the criteria above, you should be able to put together the function above.  If you feel confident in your solution, check the results of your function against our solution. The next page has a walkthrough, and the full solution is in the solution notebook available in this workspace.

In [18]:
# Put your solutions for each of the cases here

# Top 20 movies recommended for id 1

recs_20_for_1 = popular_recommendations('1', 20)
# Top 5 movies recommended for id 53968
recs_5_for_53968 = popular_recommendations('53968', 5)

# Top 100 movies recommended for id 70000
recs_100_for_70000 = popular_recommendations('70000', 100)

# Top 35 movies recommended for id 43
recs_35_for_43 = popular_recommendations('43', 35)

In [19]:
### You Should Not Need To Modify Anything In This Cell
ranked_movies = t.create_ranked_df(movies, reviews) # only run this once - it is not fast

# check 1 
assert t.popular_recommendations('1', 20, ranked_movies) == recs_20_for_1,  "The first check failed..."
# check 2
assert t.popular_recommendations('53968', 5, ranked_movies) == recs_5_for_53968,  "The second check failed..."
# check 3
assert t.popular_recommendations('70000', 100, ranked_movies) == recs_100_for_70000,  "The third check failed..."
# check 4
assert t.popular_recommendations('43', 35, ranked_movies) == recs_35_for_43,  "The fourth check failed..."

print("If you got here, looks like you are good to go!  Nice job!")

If you got here, looks like you are good to go!  Nice job!


**Notice:** This wasn't the only way we could have determined the "top rated" movies.  You can imagine that in keeping track of trending news or trending social events, you would likely want to create a time window from the current time, and then pull the articles in the most recent time frame.  There are always going to be some subjective decisions to be made.  
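
A time-window variant could be sketched as follows. This is only an illustration on made-up data: the `recent_reviews` helper and its `window_days` parameter are hypothetical, and "now" is approximated by the latest review in the data. The `date` column in this notebook is stored as strings, so it would need `pd.to_datetime` first.

```python
import pandas as pd

# Hypothetical toy reviews with datetime dates.
toy_reviews = pd.DataFrame({
    "movie_id": [1, 1, 1, 2, 2, 3],
    "rating": [10, 10, 10, 7, 8, 9],
    "date": pd.to_datetime([
        "2013-01-05", "2013-02-01", "2013-03-01",
        "2018-05-20", "2018-06-01", "2018-06-10",
    ]),
})

def recent_reviews(reviews, window_days=90):
    """Keep only reviews within window_days of the latest review."""
    # Assumption: the newest review stands in for the current time.
    cutoff = reviews["date"].max() - pd.Timedelta(days=window_days)
    return reviews[reviews["date"] >= cutoff]

recent = recent_reviews(toy_reviews)
# Rank only on the recent window: average rating first, then rating count.
trending = (
    recent.groupby("movie_id")["rating"]
    .agg(["mean", "count"])
    .sort_values(by=["mean", "count"], ascending=False)
)
print(trending.index.tolist())  # [3, 2]
```

Movie 1 has the best all-time ratings but none inside the window, so it drops out of the trending list entirely.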

If you find that no one is paying any attention to your most popular recommendations, then it might be time to find a new way to recommend, which is what the next parts of the lesson should prepare us to do!


### Part II: Adding Filters

Now that you have created a function to give back the **n_top** movies, let's make it a bit more robust.  Add arguments that will act as filters for the movie **year** and **genre**.  

Use the cells below to adjust your existing function to accept **year** and **genre** arguments as **lists** of **strings**.  The final results should then be filtered to only movies within the provided lists of years and genres (as `or` conditions).  If no list is provided, no filter should be applied.

You can adjust any other inputs as needed to retrieve the final results you are looking for!
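
To make the `or` semantics concrete, here is a small sketch on made-up data. The `filter_ranked` helper and the toy frame are assumptions for illustration only. It assumes one indicator column per genre and an integer `date` column holding the release year, as in the `movies` dataframe above.

```python
import pandas as pd

# Hypothetical, already-ranked toy movies with one-hot genre columns.
toy_ranked = pd.DataFrame({
    "movie": ["A (2016)", "B (2010)", "C (2017)", "D (2018)"],
    "date": [2016, 2010, 2017, 2018],
    "History": [1, 0, 0, 1],
    "Comedy": [1, 1, 0, 0],
})

def filter_ranked(ranked, years=None, genres=None):
    """Filter a ranked frame by year and genre; None means no filter."""
    result = ranked
    if years is not None:
        # Years may arrive as strings, while 'date' holds integers.
        result = result[result["date"].isin([int(y) for y in years])]
    if genres is not None:
        # Keep a movie if it belongs to at least one of the listed genres.
        result = result[result[genres].sum(axis=1) > 0]
    return result

filtered = filter_ranked(
    toy_ranked, years=["2016", "2017", "2018"], genres=["History", "Comedy"]
)
print(filtered["movie"].tolist())  # ['A (2016)', 'D (2018)']
```

Movie A belongs to both genres and is still kept, which is why the genre test checks for "at least one match" rather than "exactly one".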

Try writing a few tests against the reference function in our test module.  The call below returns the top 20 movies for user 1 based on the specified year and genre filters.  Does yours return the same?

```
t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History', 'Comedy'])
```

In [9]:
def popular_recommendations_by_genre_year(user_id, n_top, years=None, genres=None):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
    n_top - an integer number of recommendations to return
    years - list of years (ints or strings) to filter by; None applies no year filter
    genres - list of genre names (strings) to filter by; None applies no genre filter
    OUTPUT:
    top_movies - a list of the n_top recommended movie titles, ordered best to worst,
                 filtered by year and genre
    '''
    top_movies = joind_df
    if years is not None:
        # movies['date'] is int64, so cast any string years before matching
        top_movies = top_movies[top_movies['date'].isin([int(y) for y in years])]
    if genres is not None:
        # Genres are `or` conditions: keep movies flagged in at least one listed genre
        # (a test of == 1 would wrongly drop movies that match several genres)
        top_movies = top_movies[top_movies[genres].sum(axis=1) >= 1]
    return top_movies['movie'].head(n_top).values.tolist()

In [10]:
ranked_movies_year_genre = popular_recommendations_by_genre_year(
    '1', 20, years=[2015, 2016, 2017, 2018], genres=['History', 'Comedy'])

In [11]:
assert t.popular_recs_filtered(
    '1', 20, ranked_movies, years=[2015, 2016, 2017, 2018], genres=['History', 'Comedy']
) == ranked_movies_year_genre
print("If you got here, looks like you are good to go!  Nice job!")

If you got here, looks like you are good to go!  Nice job!


In [26]:
#with no filter 
assert t.popular_recommendations('1', 20, ranked_movies) ==\
        popular_recommendations_by_genre_year('1', 20),  "The first check failed..."
print("If you got here, looks like you are good to go!  Nice job!")

If you got here, looks like you are good to go!  Nice job!
