### Rank-Based and Knowledge-Based Recommendations


In [191]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# Read in the datasets
movies = pd.read_csv('movies_clean.csv')
reviews = pd.read_csv('reviews_clean.csv')
del movies['Unnamed: 0']
del reviews['Unnamed: 0']

In [192]:
reviews.head(2)

Unnamed: 0,user_id,movie_id,rating,timestamp,date,month_1,month_2,month_3,month_4,month_5,...,month_9,month_10,month_11,month_12,year_2013,year_2014,year_2015,year_2016,year_2017,year_2018
0,1,68646,10,1381620027,2013-10-12 23:20:27,0,0,0,0,0,...,0,1,0,0,1,0,0,0,0,0
1,1,113277,10,1379466669,2013-09-18 01:11:09,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0


In [193]:
movies.head(2)

Unnamed: 0,movie_id,movie,genre,date,1800's,1900's,2000's,History,News,Horror,...,Fantasy,Romance,Game-Show,Action,Documentary,Animation,Comedy,Short,Western,Thriller
0,8,Edison Kinetoscopic Record of a Sneeze (1894),Documentary|Short,1894,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
1,10,La sortie des usines Lumière (1895),Documentary|Short,1895,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0


In [194]:
reviewsModified = reviews[['user_id', 'movie_id', 'rating', 'timestamp', 'date']]
moviesModified = movies[['movie_id', 'movie', 'genre', 'date']]

reviewsModified.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 712337 entries, 0 to 712336
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   user_id    712337 non-null  int64 
 1   movie_id   712337 non-null  int64 
 2   rating     712337 non-null  int64 
 3   timestamp  712337 non-null  int64 
 4   date       712337 non-null  object
dtypes: int64(4), object(1)
memory usage: 27.2+ MB


In [195]:
values = (reviewsModified.movie_id.value_counts() >= 5)
values = values[values].index.values
values = values.tolist()

In [196]:
reviewsModified = reviewsModified[reviewsModified['movie_id'].isin(values)]
reviewsModified.shape

(677742, 5)

In [197]:
moviesModified = moviesModified[moviesModified['movie_id'].isin(values)].set_index('movie_id')
moviesModified.shape

(9974, 3)

In [198]:
dateDf = reviewsModified.drop_duplicates(subset='movie_id').set_index('movie_id').drop('rating', axis=1)

dateDf["date"] = pd.to_datetime(dateDf["date"])
dateDf.head()

Unnamed: 0_level_0,user_id,timestamp,date
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
68646,1,1381620027,2013-10-12 23:20:27
113277,1,1379466669,2013-09-18 01:11:09
422720,2,1412178746,2014-10-01 15:52:26
454876,2,1394818630,2014-03-14 17:37:10
790636,2,1389963947,2014-01-17 13:05:47


In [199]:
reviewsModified.head()

Unnamed: 0,user_id,movie_id,rating,timestamp,date
0,1,68646,10,1381620027,2013-10-12 23:20:27
1,1,113277,10,1379466669,2013-09-18 01:11:09
2,2,422720,8,1412178746,2014-10-01 15:52:26
3,2,454876,8,1394818630,2014-03-14 17:37:10
4,2,790636,7,1389963947,2014-01-17 13:05:47


In [200]:
# Select the rating column explicitly; mean() takes no column-name argument
highestRatingsDf = reviewsModified.groupby('movie_id')[['rating']].mean()
highestRatingsDf.head()

Unnamed: 0_level_0,rating
movie_id,Unnamed: 1_level_1
417,8.368421
6864,8.8
10323,8.210526
12349,8.5
12364,9.625


In [201]:
tempDf = reviewsModified.groupby('movie_id').count()[['rating']]
tempDf.rename(columns = {'rating':'numberOfViews'}, inplace = True)
tempDf.head()

Unnamed: 0_level_0,numberOfViews
movie_id,Unnamed: 1_level_1
417,19
6864,5
10323,19
12349,60
12364,8


In [202]:
FinalDf = pd.merge(highestRatingsDf, tempDf, left_index=True, right_index=True)
FinalDf = pd.merge(dateDf, FinalDf, left_index=True, right_index=True)
FinalDf.sort_values(by = ['rating', 'numberOfViews', 'date'], ascending=(False, False, False), inplace=True)

FinalDf.head()

Unnamed: 0_level_0,user_id,timestamp,date,rating,numberOfViews
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
4921860,1824,1470036059,2016-08-01 07:20:59,10.0,48
5262972,4177,1452210578,2016-01-07 23:49:38,10.0,28
5688932,7585,1529045193,2018-06-15 06:46:33,10.0,14
2737018,7397,1365265613,2013-04-06 16:26:53,10.0,10
2560840,30109,1453509044,2016-01-23 00:30:44,10.0,6


#### 1. How To Find The Most Popular Movies

This notebook has a single task: regardless of the user, provide a list of recommendations based simply on the most popular items.

For this task, "most popular" is defined by the following criteria:

* A movie with the highest average rating is considered best
* With ties, movies that have more ratings are better
* A movie must have a minimum of 5 ratings to be considered among the best movies
* If movies are tied in both average rating and number of ratings, the movie with the most recent rating ranks higher

With these criteria, the goal for this notebook is to take a **user_id** and return the **n_top** recommendations.  The function below also serves as the scaffolding for all future recommendation functions.
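As a rough illustration of how the four criteria combine, the sketch below ranks a tiny made-up ratings table. The `toy` frame, its values, the column names `avg_rating`, `num_ratings`, and `last_rating`, and the helper `rank_movies` are all assumptions for illustration; they are not part of the notebook's data or code.

```python
import pandas as pd

# Hypothetical toy ratings; the real notebook reads reviews_clean.csv
toy = pd.DataFrame({
    'movie_id': [1] * 5 + [2] * 5 + [3] * 6 + [4] * 2,
    'rating':   [9] * 5 + [9] * 5 + [9] * 6 + [10] * 2,
    'date': pd.to_datetime(
        ['2014-01-01'] * 5
        + ['2014-01-01'] * 4 + ['2016-06-01']
        + ['2013-01-01'] * 6
        + ['2018-01-01'] * 2
    ),
})

def rank_movies(ratings, min_ratings=5):
    """Rank movies by mean rating, then rating count, then most recent rating."""
    stats = ratings.groupby('movie_id').agg(
        avg_rating=('rating', 'mean'),
        num_ratings=('rating', 'count'),
        last_rating=('date', 'max'),
    )
    # Criterion 3: drop movies with too few ratings
    stats = stats[stats['num_ratings'] >= min_ratings]
    # Criteria 1, 2 and 4: sort on all three keys, best first
    return stats.sort_values(
        by=['avg_rating', 'num_ratings', 'last_rating'],
        ascending=False,
    )

ranked_toy = rank_movies(toy)
print(ranked_toy.index.tolist())  # [3, 2, 1]
```

Movie 4 has the highest average but only two ratings, so it is excluded. Movie 3 wins the tie on average rating by having more ratings, and movie 2 beats movie 1 on recency. Note that this sketch takes the latest rating date per movie with `max`, whereas the `dateDf` cell earlier keeps the date from the first rating row encountered for each movie; the two can differ.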

In [203]:
def popular_recommendations(user_id, n_top):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
    n_top - an integer of the number recommendations you want back
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    # user_id is unused here: rank-based recommendations are the same for every user.
    # FinalDf is already sorted best to worst, so the first n_top rows are the answer.
    top_ids = FinalDf.head(n_top).index

    # Look up the titles for those movie ids, preserving the ranked order
    top_movies = list(moviesModified.loc[top_ids].movie.values)

    return top_movies # a list of the n_top movies as recommended

In [204]:
popular_recommendations(4, 10)

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Selam (2013)',
 "Quiet Riot: Well Now You're Here, There's No Way Back (2014)",
 'Crawl Bitch Crawl (2012)',
 'Make Like a Dog (2015)',
 'Pandorica (2016)',
 'Third Contact (2011)',
 'Romeo Juliet (2009)']

After implementing the above function, go to the practical quiz at the maharatech link and solve the MCQ there.

In [205]:
# Put your solutions for MCQ here

**Notice:** This wasn't the only way we could have determined the "top rated" movies.  You can imagine that in keeping track of trending news or trending social events, you would likely want to create a time window from the current time, and then pull the articles in the most recent time frame.  There are always going to be some subjective decisions to be made.  
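To make the time-window idea concrete, here is a minimal sketch that keeps only ratings from the last 30 days before ranking. The toy data, the 30-day default, and the helper name `recent_ratings` are assumptions chosen for illustration; the notebook itself does not apply any such window.

```python
import pandas as pd

# Hypothetical ratings with Unix timestamps, mirroring the `timestamp` column
toy = pd.DataFrame({
    'movie_id':  [1, 1, 2, 2, 3],
    'rating':    [10, 9, 7, 8, 6],
    'timestamp': [1381620027, 1379466669, 1529045193, 1528000000, 1527000000],
})

def recent_ratings(ratings, now, window_days=30):
    """Keep only ratings made within `window_days` before `now` (Unix seconds)."""
    cutoff = now - window_days * 24 * 60 * 60
    in_window = (ratings['timestamp'] > cutoff) & (ratings['timestamp'] <= now)
    return ratings[in_window]

# Treat the newest rating in the toy data as the current time
now = toy['timestamp'].max()

# Average rating per movie, computed only over the recent window
trending = recent_ratings(toy, now).groupby('movie_id')['rating'].mean()
print(trending.sort_values(ascending=False))
```

Movie 1 has the highest ratings overall, but all of them are years old, so it drops out of the window entirely and only movies 2 and 3 remain.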

If you find that no one is paying any attention to your most popular recommendations, then it might be time to find a new way to recommend, which is what the next parts of the lesson should prepare us to do!


### Part II: Adding Filters

Now that you have created a function to give back the **n_top** movies, let's make it a bit more robust.  Add arguments that will act as filters for the movie **year** and **genre**.  

Use the cells below to adjust your existing function to accept **year** and **genre** arguments as **lists** of **strings**.  The final results should then be filtered to only movies within the provided lists of years and genres (as `or` conditions within each list).  If no list is provided, no filter should be applied.

You can adjust any other inputs as needed to retrieve the final results you are looking for!

The shape of the function should be as below:
```
popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History', 'Comedy'])
```
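One possible way to express these filters, sketched on a made-up table, is to use one-hot genre columns like the ones in `movies_clean.csv`. The toy frame and the helper name `filter_movies` are hypothetical, and the sketch assumes that the year filter and the genre filter are combined with `and`, while the values inside each list act as `or` conditions.

```python
import pandas as pd

# Hypothetical slice of a movies table with one-hot genre columns
toy_movies = pd.DataFrame({
    'movie':   ['A (2015)', 'B (2016)', 'C (2016)', 'D (1999)'],
    'date':    [2015, 2016, 2016, 1999],
    'History': [1, 0, 0, 1],
    'Comedy':  [0, 1, 0, 0],
    'Drama':   [0, 0, 1, 0],
}, index=pd.Index([11, 12, 13, 14], name='movie_id'))

def filter_movies(movies_df, years=None, genres=None):
    """Keep rows matching any listed year and any listed genre; None skips a filter."""
    mask = pd.Series(True, index=movies_df.index)
    if years is not None:
        # Accept years as strings or ints; the `date` column holds ints
        mask &= movies_df['date'].isin([int(y) for y in years])
    if genres is not None:
        # A movie passes if at least one requested genre column is set to 1
        mask &= movies_df[genres].eq(1).any(axis=1)
    return movies_df[mask]

kept = filter_movies(toy_movies, years=['2015', '2016'], genres=['History', 'Comedy'])
print(kept['movie'].tolist())  # ['A (2015)', 'B (2016)']
```

Calling `filter_movies(toy_movies)` with no lists returns every row, which matches the requirement that a missing list applies no filter.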

In [206]:
from sklearn.impute import SimpleImputer
imp = SimpleImputer(strategy='constant', fill_value = 'noGenre')
moviesModified[['genre']] = imp.fit_transform(moviesModified[['genre']])

In [207]:
moviesModified.head()

Unnamed: 0_level_0,movie,genre,date
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
417,Le voyage dans la lune (1902),Short|Adventure|Fantasy,1902
6864,Intolerance: Love's Struggle Throughout the Ag...,Drama,1916
10323,Das Cabinet des Dr. Caligari (1920),Horror|Mystery,1920
12349,The Kid (1921),Comedy|Drama|Family,1921
12364,Körkarlen (1921),Drama|Fantasy|Horror,1921


In [208]:
moviesModified['genre'] = moviesModified['genre'].str.split('|')

moviesModified.head()

Unnamed: 0_level_0,movie,genre,date
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
417,Le voyage dans la lune (1902),"[Short, Adventure, Fantasy]",1902
6864,Intolerance: Love's Struggle Throughout the Ag...,[Drama],1916
10323,Das Cabinet des Dr. Caligari (1920),"[Horror, Mystery]",1920
12349,The Kid (1921),"[Comedy, Drama, Family]",1921
12364,Körkarlen (1921),"[Drama, Fantasy, Horror]",1921


In [209]:
moviesModified[moviesModified['genre'] == 'new']

Unnamed: 0_level_0,movie,genre,date
movie_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1


In [210]:
moviesModified.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9974 entries, 417 to 8075496
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   movie   9974 non-null   object
 1   genre   9974 non-null   object
 2   date    9974 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 569.7+ KB


In [211]:
def popular_recommendationsKnow(user_id, n_top, years=None, genres=None):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
    n_top - an integer of the number recommendations you want back
    years - a list of years (int or numeric strings); None applies no year filter
    genres - a list of strings with genres of movies; None applies no genre filter
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    candidates = moviesModified

    # Keep movies released in any of the requested years
    if years is not None:
        candidates = candidates[candidates['date'].isin([int(y) for y in years])]

    # Keep movies that share at least one genre with the requested list
    if genres is not None:
        wanted = set(genres)
        mask = candidates['genre'].apply(lambda g: bool(wanted & set(g))).astype(bool)
        candidates = candidates[mask]

    # FinalDf is already ranked, so filtering it keeps the best-to-worst order
    ranked = FinalDf[FinalDf.index.isin(candidates.index)]

    top_movies = list(moviesModified.loc[ranked.head(n_top).index].movie.values)

    return top_movies # a list of the n_top movies as recommended

In [218]:
popular_recommendationsKnow(user_id='1', n_top=3, years=[2014], genres=['History'])

['Birlesen Gonuller (2014)', 'Night Will Fall (2014)', 'Red Army (2014)']

In [213]:
# Put your solutions for MCQ here

After implementing the above function, go to the practical quiz at the maharatech link and solve the MCQ there.

In [221]:
popular_recommendations(4, 101)

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Selam (2013)',
 "Quiet Riot: Well Now You're Here, There's No Way Back (2014)",
 'Crawl Bitch Crawl (2012)',
 'Make Like a Dog (2015)',
 'Pandorica (2016)',
 'Third Contact (2011)',
 'Romeo Juliet (2009)',
 'Be Somebody (2016)',
 'Birlesen Gonuller (2014)',
 'Agnelli (2017)',
 'Sátántangó (1994)',
 'Shijie (2004)',
 'Foster (2011)',
 'CM101MMXI Fundamentals (2013)',
 'Crystal Lake Memories: The Complete History of Friday the 13th (2013)',
 'Akahige (1965)',
 'Körkarlen (1921)',
 'Tengoku to jigoku (1963)',
 'Kirik Party (2016)',
 'Chasing Asylum (2016)',
 'Beyond the Sea (2004)',
 'Blood Brother (2013)',
 'Poshter Girl (2016)',
 'The Adventures of Robin Hood (1938)',
 'Umberto D. (1952)',
 'Bridegroom (2013)',
 'No Stone Unturned (2017)',
 'Mad As Hell (2014)',
 'Nashville (1975)',
 'The Shawshank Redemption (1994)',
 'A Matter of Life and Death (1946)',
 'Cyborgs: Heroes Never Die