# Recommendations with MovieTweetings: Knowledge-Based 

**Recommending the most popular movies for all users.**


In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t

%matplotlib inline

# Read in the datasets
movies = pd.read_csv('movies_clean.csv')
reviews = pd.read_csv('reviews_clean.csv')
del movies['Unnamed: 0']
del reviews['Unnamed: 0']

## Part I: How To Find The Most Popular Movies

For this notebook, we have a single task: regardless of the user, return a list of recommendations based solely on the most popular items.

For this task, we will consider what is "most popular" based on the following criteria:

* A movie with the highest average rating is considered best
* With ties, movies that have more ratings are better
* A movie must have a minimum of 5 ratings to be considered among the best movies
* If movies are tied in both average rating and number of ratings, the movie with the most recent rating ranks higher
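
The criteria above can be checked on a small example before touching the real data. The block below is a minimal sketch on made-up reviews: the `toy_block` helper and all values are hypothetical, and only the column names (`movie_id`, `rating`, `date`) mirror those used in this notebook. It uses a single `groupby(...).agg(...)` call with named aggregation, which is one possible way to compute the three statistics.

```python
import pandas as pd

def toy_block(movie_id, ratings, last_date):
    """Hypothetical helper: one movie's reviews, the last one dated `last_date`."""
    dates = pd.date_range(end=last_date, periods=len(ratings), freq='D')
    return pd.DataFrame({'movie_id': movie_id, 'rating': ratings, 'date': dates})

# Made-up data: movies 1 and 2 tie on average and count, movie 4 has too few ratings
toy_reviews = pd.concat([
    toy_block(1, [9, 9, 9, 9, 9], '2015-01-05'),
    toy_block(2, [9, 9, 9, 9, 9], '2016-03-01'),
    toy_block(3, [8, 8, 8, 8, 8, 8], '2016-06-01'),
    toy_block(4, [10, 10], '2016-07-01'),
], ignore_index=True)

# Average rating, number of ratings and latest rating date per movie
stats = toy_reviews.groupby('movie_id').agg(
    movie_rating=('rating', 'mean'),
    rating_count=('rating', 'count'),
    last_rated=('date', 'max'),
)

# Minimum of 5 ratings, then sort by the three criteria in priority order
eligible = stats[stats['rating_count'] >= 5]
ranked = eligible.sort_values(
    ['movie_rating', 'rating_count', 'last_rated'], ascending=False)

ranked_ids = ranked.index.tolist()
print(ranked_ids)  # [2, 1, 3]
```

Movie 4 has the highest average but is dropped for having only two ratings, and movie 2 outranks movie 1 only because its latest rating is more recent.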

With these criteria, the goal for this notebook is to take a **user_id** and return the top **n_top** recommendations.  The function below serves as the scaffolding for all future recommendations as well.

In [3]:
def popular_recommendations(movies, reviews):
    '''
    INPUT:
    movies - the movies dataframe (one row per movie, with a movie_id column)
    reviews - the reviews dataframe (with movie_id, rating and date columns)
    OUTPUT:
    top_movies - the movies dataframe restricted to movies with at least 5 ratings, sorted best to worst
    '''
    # Average rating per movie, indexed by movie_id
    rats = pd.DataFrame(reviews.groupby('movie_id')['rating'].mean())
    rats.columns = ['movie_rating']

    # Getting the number of ratings for each movie
    count = pd.DataFrame(reviews.groupby('movie_id')['rating'].count())
    count.columns = ['rating_count']

    # Getting the date of the most recent rating for each movie
    last_rated = pd.DataFrame(reviews.groupby('movie_id')['date'].max())
    last_rated.columns = ['last_rated']

    # Merging the three dataframes
    rats_count = rats.merge(count, right_index=True, left_index=True)
    rats_count_last = rats_count.merge(last_rated,
                                       right_index=True,
                                       left_index=True)

    # Merge with the movies dataframe
    top_movies = movies.merge(rats_count_last,
                              left_on='movie_id',
                              right_index=True)
    # Keep only movies with at least 5 ratings
    top_movies = top_movies[top_movies['rating_count'] > 4]

    # Rank the movies by
    # 1. Average rating
    # 2. Number of ratings
    # 3. Most recent rating date
    top_movies = top_movies.sort_values(['movie_rating', 'rating_count', 'last_rated'],
                                        ascending=False)

    return top_movies

def make_recs(user_id, n_top):
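    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
              (unused here: popularity-based recommendations are the same for every user)
    n_top - an integer of the number of recommendations you want back
    OUTPUT:
    top_movies - a list of the n_top recommended movie titles, ordered best to worst
    '''
    # Relies on the movies and reviews dataframes loaded at the top of the notebook
    top_movies = popular_recommendations(movies, reviews)
    # Keep the titles of the first n_top movies
    top_movies = list(top_movies['movie'][:n_top])

    return top_movies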

## Part II: Adding Filters

Users often want the most popular movies within certain categories, such as genre or year of release. In this part, I add functionality that lets the user filter the most popular movies by genre and year.

In [4]:
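def filtered_popular(movies, reviews, user_id, n_top, years=None, genres=None):
    '''
    Returns a list of the n_top most popular movie titles, optionally filtered by genre and year.

    years - a list of release years to keep (None means no year filter)
    genres - a list of genre column names; a movie is kept if it belongs to any of them (None means no genre filter)
    '''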
    top_movies = popular_recommendations(movies, reviews)
    
    if genres is not None:
        select = top_movies[genres].sum(axis=1)
        top_movies = top_movies[select > 0]
    
    if years is not None:
        top_movies = top_movies[top_movies['year'].isin(years)]

    # Keep the titles of the first n_top movies that survive the filters
    top_movies = list(top_movies['movie'][:n_top])
    
    return top_movies
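
To see how the two filters behave, here is a minimal sketch on a made-up, already-ranked table. The `apply_filters` helper and the toy rows are hypothetical. It also assumes that genre columns are 0/1 indicators and that `year` holds integers, which may not match the real `movies_clean.csv` exactly.

```python
import pandas as pd

# Hypothetical ranked table; genre columns are assumed to be 0/1 indicators
toy_ranked = pd.DataFrame({
    'movie': ['A (2014)', 'B (2015)', 'C (2015)', 'D (2016)'],
    'year': [2014, 2015, 2015, 2016],
    'Drama': [1, 0, 1, 0],
    'Comedy': [0, 1, 0, 0],
    'Horror': [0, 0, 0, 1],
})

def apply_filters(ranked, years=None, genres=None):
    """Keep rows matching any listed genre and any listed year, preserving order."""
    out = ranked
    if genres is not None:
        # Row sum over the chosen genre columns is > 0 when at least one matches
        out = out[out[genres].sum(axis=1) > 0]
    if years is not None:
        out = out[out['year'].isin(years)]
    return out

filtered = apply_filters(toy_ranked, years=[2015], genres=['Drama', 'Comedy'])
titles = filtered['movie'].tolist()
print(titles)  # ['B (2015)', 'C (2015)']
```

Genres combine with "or" (a movie in any listed genre passes), while the genre and year filters combine with "and". Because filtering happens after ranking, the surviving movies keep their popularity order.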