**Denise Dodd:
Movie Recommender (Main Function, Correlations, Joins, Python Functions)**

# **Table of Contents**

[Write Up](#Write_Up)

[End User Version](#End_User)

[References](#References)

-----
# **Write Up** <a id="Write_Up"></a>

I chose to run my recommender from a main function. I initially created several multi-step boxes of code, using https://analyticsindiamag.com/how-to-build-your-first-recommender-system-using-python-movielens-dataset/ as a guide. However, I chose to transform those steps into a main function:

a) to test my abilities with main functions, which I haven’t used in several semesters

b) to differentiate my code from the steps in my reference website

c) to make it more user friendly (the user only has to run one box to get their recommendations instead of several)

In the top box of code (the "End User Version"), I’ve commented out the lines that print each data frame along the way to check for accuracy. This is the final version I would present to an end user. I displayed the results of my recommender with the user requesting a valid movie, an invalid movie, and exiting the loop.

The program starts with the same intro steps as all my code: I load my modules, set my directory, and load the necessary files as pandas data frames. These introductory steps are followed by the functions below:

**Merge** – This function performs a left merge of the ratings_df and movies_df data frames on the ‘movieId’ column and returns the resulting merged_df data frame.
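As a minimal sketch of what the left merge does, the example below uses two tiny made-up data frames in place of ratings.csv and movies.csv. The titles and ratings are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-ins for ratings.csv and movies.csv (all values are made up).
ratings_df = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [10, 20, 10],
    "rating": [4.0, 3.5, 5.0],
})
movies_df = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title": ["Movie A (2000)", "Movie B (2001)", "Movie C (2002)"],
})

# A left merge keeps every row of ratings_df and attaches the matching title.
# Movie C has no ratings, so it does not appear in the result.
merged_df = ratings_df.merge(movies_df, on="movieId", how="left")
print(merged_df)
```

Because the ratings are on the left side of the merge, the result has one row per rating rather than one row per movie.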

**Features** – This function groups the merged_df data frame by ‘title’ and finds the mean ‘rating’ per ‘title.’ It then adds a new column called ‘Total Ratings,’ which counts the number of ratings per title. The function returns the resulting average_ratings_df, which is indexed by ‘title’ and has two columns: ‘rating’ and ‘Total Ratings.’ This information is useful because the recommender only returns movies that have more than 100 ratings, so movies rated by only a handful of users don’t skew the results.
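The grouping step can be sketched on toy data as follows. The data frame is invented, and the cutoff of 1 is only a stand-in for the 100-rating threshold used by the recommender.

```python
import pandas as pd

# Toy merged data frame (values are made up).
merged_df = pd.DataFrame({
    "title": ["Movie A (2000)", "Movie A (2000)", "Movie B (2001)"],
    "rating": [4.0, 5.0, 3.0],
})

# Mean rating per title; 'title' becomes the index of the result.
average_ratings_df = merged_df.groupby("title")["rating"].mean().to_frame()
# Number of ratings per title, aligned on the same 'title' index.
average_ratings_df["Total Ratings"] = merged_df.groupby("title")["rating"].count()

# Keep only titles with more than 1 rating (toy stand-in for "more than 100").
popular = average_ratings_df[average_ratings_df["Total Ratings"] > 1]
print(popular)
```

Only Movie A survives the filter here, since Movie B has a single rating.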

**Pivot** – This function pivots the merged_df data frame using the ‘userId’ column as the new index, the movie titles as the columns, and each user’s rating of each movie as the values. The function returns pivoted_df, a very wide data frame with many NaN values: every movie is its own column, and a cell is populated only when that user has rated that movie. The recommender uses this data frame to find correlations among the movies.
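To illustrate where the NaN values come from, here is a small sketch of the pivot on made-up ratings. Each user rates only some of the movies, so the remaining cells are left empty.

```python
import pandas as pd

# Toy merged data frame (values are made up).
merged_df = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "title": ["Movie A (2000)", "Movie B (2001)",
              "Movie A (2000)", "Movie B (2001)"],
    "rating": [4.0, 3.0, 5.0, 2.0],
})

# One row per user, one column per title, ratings as the cell values.
pivoted_df = merged_df.pivot_table(index="userId", columns="title", values="rating")
print(pivoted_df)
```

User 2 never rated Movie B and user 3 never rated Movie A, so those two cells are NaN.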

**Recommender** – Now that all the data frames are formatted, this function does what the assignment asks, which is to obtain a movie from the user and recommend 10 similar movies. The function is a loop that asks the user to input the name and release year of a movie they would like recommendations for, or to type ‘quit’ in any upper/lowercase combination to leave the loop. If the user requests a movie that is not in the database, the loop alerts the user to the error and starts over by requesting a movie again. If the user inputs a movie that is in the database, the loop continues to the recommendation programming.
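The three branches of the loop can be sketched without interactive input. The helper name `classify_request` and the small set of titles are hypothetical and exist only for this illustration; the notebook itself reads from `input()` inside a `while True` loop.

```python
def classify_request(user_input, known_titles):
    """Return 'quit', 'valid', or 'invalid' for one line of user input.

    Hypothetical helper for illustration only.
    """
    # 'quit' is matched case-insensitively; titles must match exactly.
    if user_input.lower() == "quit":
        return "quit"
    if user_input in known_titles:
        return "valid"
    return "invalid"

# Made-up set of titles standing in for the dataset.
known_titles = {"Finding Nemo (2003)", "Shrek (2001)"}
responses = [classify_request(text, known_titles)
             for text in ["Finding Nemo (2003)", "Nemo", "QuIt"]]
print(responses)  # ['valid', 'invalid', 'quit']
```

Note that the title match is exact, so the release year and capitalization must be typed as they appear in the dataset.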

**Recommender cont** – If given a valid movie, the recommender first calculates the correlation between the requested movie’s ratings and the ratings of every other movie in the pivoted data frame. It then converts the correlation series to a data frame and drops the rows with NaN in the correlation column. The ‘Total Ratings’ column from average_ratings_df is joined to the correlations data frame, and movies with 100 or fewer ratings are removed to limit the skew of outliers. The data frame is then sorted so the movies with the highest correlation to the requested movie are on top and the lowest are on the bottom. Following the guide at the above website, I then left joined movies_df on the ‘title’ column so the ‘genre’ and ‘movieId’ columns could be included in the final output. I debated the necessity of this step, but it could be useful if the user is in the mood for a particular genre. I thought of dropping the ‘movieId’ column, but left it in in case the user is familiar with the dataset and refers to movies by their movieId, or in case this recommender is expanded in the future. Finally, the recommender prints the first ten rows of the sorted data frame. Because every movie correlates perfectly with itself, the requested movie appears as the first of those rows. Then the loop starts over, and the user can request recommendations for a new movie or opt to exit the loop.
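The correlation, filter, and sort steps can be sketched on a small made-up ratings matrix. The movie names, the ratings, and the cutoff of 2 (a stand-in for 100) are all assumptions chosen so the result is easy to check by hand.

```python
import warnings
import pandas as pd

# As in the notebook, silence the RuntimeWarning raised when a correlation
# is computed from too few overlapping ratings.
warnings.filterwarnings("ignore", category=RuntimeWarning)

# Toy pivoted data frame: rows are users, columns are titles (made-up values).
pivoted_df = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 2.0, 1.0],
        "Movie B": [4.0, 5.0, 1.0, 2.0],
        "Movie C": [1.0, 2.0, 4.0, 5.0],
        "Movie D": [3.0, None, None, None],  # rated by one user only
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)
total_ratings = pivoted_df.count().rename("Total Ratings")

# Pearson correlation of every column with the requested movie's column.
correlations = pivoted_df.corrwith(pivoted_df["Movie A"])
correlations = correlations.to_frame(name="Correlation").dropna()
correlations = correlations.join(total_ratings)

# Keep well-rated titles (toy cutoff of 2) and sort from most to least similar.
recommendation = (
    correlations[correlations["Total Ratings"] > 2]
    .sort_values("Correlation", ascending=False)
    .rename_axis("title")
    .reset_index()
)
print(recommendation)
```

In this toy data, Movie B tracks Movie A closely (correlation 0.8), Movie C is rated in the opposite pattern (correlation -1.0), and Movie D is excluded because it has too few ratings. Movie A itself lands on the first row with a correlation of 1.0.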

----
# **End User Version** <a id="End_User"></a>

# Import needed modules.
import numpy as np
import pandas as pd
import os
import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning)

# Left merge dfs on the movieID col.
# Check for accuracy(commented out on final run).
def merge(ratings_df, movies_df):
    merged_df = ratings_df.merge(movies_df,on='movieId', how='left')
    #print(f"\n\033[1mThe current shape of the merged dataframe is {merged_df.shape}.\033[0m")
    #print(f"\033[1mPreview the merged dataframe:\033[0m\n{merged_df.head(5)}")
    return merged_df

# Find the average ratings by grouping the merged dataframe by title and finding the average rating per title.
# Add a new col to the above df which groups the merged dataframe by title and counts the number of ratings per title.
# Check for accuracy(commented out on final run).
def features(merged_df):
    average_ratings_df = merged_df.groupby('title')['rating'].mean().to_frame()
    average_ratings_df['Total Ratings'] = merged_df.groupby('title')['rating'].count()
    #print(f"\n\033[1mThe current shape of the average ratings dataframe is {average_ratings_df.shape}.\033[0m")
    #print(f"\033[1mPreview the average ratings dataframe:\033[0m\n{average_ratings_df.head(5)}")
    return average_ratings_df

# Pivot the merged dataframe using the userID col as the new index, the titles of each movie as the columns, 
# and the ratings of each userID/movie as the data in the body of the pivoted dataframe.
# Check for accuracy(commented out on final run).
def pivot(merged_df):
    pivoted_df = merged_df.pivot_table(index='userId',columns='title',values='rating')
    #print(f"\n\033[1mThe current shape of the pivoted dataframe is {pivoted_df.shape}.\033[0m")
    #print(f"\033[1mPreview the pivoted dataframe:\033[0m\n{pivoted_df.head(5)}")
    return pivoted_df

# Create loop to obtain user input of a movie selection and produce recommendations based on the input.
def recommender(pivoted_df, average_ratings_df, movies_df):
    while True:
        # Ask user to enter a movie for which they would like recommendations.
        user_input = input("\nPlease enter a movie and corresponding year from the dataset (or 'quit' to exit): ")

        # If user opts to leave the recommender exit the loop.
        if user_input.lower() == 'quit':
            print("\nThanks for using the recommender. I hope you enjoy your selected movie.")
            break 
        
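        # If the user input is a rated movie (a column in the pivoted dataframe) proceed with providing recommendation.
        # A title listed in movies_df but never rated has no column in pivoted_df and would raise a KeyError below.
        elif user_input in pivoted_df.columns: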
            print(f"\nGreat! I'll recommend ten movies for you similar to {user_input}.")
        
            # Calculate the correlation of the user input movie with all of the columns in the pivoted_df.
            correlations = pivoted_df.corrwith(pivoted_df[user_input])
            # Convert the above correlations series to a dataframe. Drop any rows with an N/A (likely in the correlations col).
            correlations = pd.DataFrame(correlations, columns=['Correlation']).dropna()
            # Join the correlations dataframe with the Total Ratings column of the average_ratings_df.
            correlations = correlations.join(average_ratings_df['Total Ratings'])
            # Create dataframe of recommendations.  Only include listings with over 100 Total Ratings.
            # Sort the dataframe from highest to lowest values in correlation column.
            recommendation = correlations[correlations['Total Ratings']>100].sort_values('Correlation',ascending=False).reset_index()
            # Left merge the recommendation dataframe with the movies dataframe on the title column to include genre and movieID.
            recommendation = recommendation.merge(movies_df,on='title', how='left')
            # Print the first ten listings of the recommendation dataframe.
            # Because the dataframe is sorted descending based on Correlation col, 
            # these will be the movies with the 10 highest correlations to the user input movie.
            print(recommendation.head(10))
    
        # If user enters invalid entry allow them to try again.
        else:
            print("\nError: Selected movie not found in the dataset. Please try again.")   
    
def main():
    # Set directory.
    os.chdir('C:/Users/hadle/Downloads')
    # Load ratings file.
    ratings_df = pd.read_csv('ratings.csv')
    # Check for accuracy(commented out on final run).
    #print(f"\033[1mThe current shape of the ratings dataframe is {ratings_df.shape}.\033[0m")
    #print(f"\033[1mPreview the ratings dataframe:\033[0m\n{ratings_df.head(5)}")
    # Load movies file.
    movies_df = pd.read_csv("movies.csv")
    # Check for accuracy(commented out on final run).
    #print(f"\n\033[1mThe current shape of the movies dataframe is {movies_df.shape}.\033[0m")
    #print(f"\033[1mPreview the movies dataframe:\033[0m\n{movies_df.head(5)}")
 
    # Call above functions.
    merged_df = merge(ratings_df, movies_df)
    average_ratings_df = features(merged_df)
    pivoted_df = pivot(merged_df)
    recommender(pivoted_df, average_ratings_df, movies_df)
    
# Call to main.
if __name__ == '__main__':
    main()


Please enter a movie and corresponding year from the dataset (or 'quit' to exit): Finding Nemo (2003)

Great! I'll recommend ten movies for you similar to Finding Nemo (2003).
                               title  Correlation  Total Ratings  movieId  \
0                Finding Nemo (2003)     1.000000            141     6377   
1                       Shrek (2001)     0.644980            170     4306   
2                   Toy Story (1995)     0.618701            215        1   
3         Princess Bride, The (1987)     0.609508            142     1197   
4              Monsters, Inc. (2001)     0.583469            132     4886   
5            Incredibles, The (2004)     0.561018            125     8961   
6                Pretty Woman (1990)     0.545784            135      597   
7                     Aladdin (1992)     0.543285            183      588   
8  Die Hard: With a Vengeance (1995)     0.539991            144      165   
9                     Twister (1996)     0.528666    

--------
# **References** <a id="References"></a>

Nair, A. (2020, October 10). How to build your first recommender system using Python & Movielens Dataset. Analytics India Magazine. https://analyticsindiamag.com/how-to-build-your-first-recommender-system-using-python-movielens-dataset/ 