# Movie Recommendation System
## Author: Ahria Dominguez
### Last Updated: 6/30/2024

In this project, we use MovieLens data from the University of Minnesota (Harper & Konstan, 2015; https://doi.org/10.1145/2827872; see the database_authors_README.txt file for more information from the authors themselves) to find movie recommendations based on the user's input of a movie they enjoy. We use the movies.csv and ratings.csv files to make these recommendations.

The data consist of the following columns:

_movies.csv_
- movieId: numbered ID for each movie
- title: movie title
- genres: movie genre

_ratings.csv_
- userId: numbered ID for each user
- movieId: numbered ID for each movie
- rating: user's movie rating
- timestamp: time of user rating

#### Import Libraries

In [1]:
# Imports the pandas and numpy libraries to work with the data and perform calculations.
import pandas as pd
import numpy as np
# Imports the warnings library to silence RuntimeWarnings later.
import warnings
# Imports the time library to make sure the welcome message is printed before the user 
# input field.
import time

#### Initial User Input Function

In [2]:
# Defines the first, main function to load in the data if people wish to use the movie
# recommendation system.
def main():
    # Prints off an initial welcome message.
    print("Welcome to the movie recommendation system!\n")
    time.sleep(0.20) # This was the only way I could get the welcome message to print
    # before the user input field.
    # Loops through the following until the user 'breaks' the cycle.
    while True:
        # Asks the user if they would like to see movie recommendations.
        retrieve_movie = input("\nWould you like to see some movie recommendations? "
                              "Y - Yes; N - No: ")
        # If they do, try the following.
        if retrieve_movie.lower() == "y":
            try:
                # Imports the ratings and movie data for the recommender system.
                rating_data = pd.read_csv("ratings.csv")
                movie_data = pd.read_csv("movies.csv")
                # Merges the two data frames together based on 'movieId'.
                movies = rating_data.merge(movie_data, on='movieId', how='left')
                # Calculates the mean rating for each movie and places it in a new 
                # data frame.
                ratings = pd.DataFrame(movies.groupby('title')['rating'].mean())
                # Finds the total number of ratings the movie received and places it 
                # into a new column.
                ratings['totalRatings'] = pd.DataFrame(movies.groupby('title')
                                                       ['rating'].count())
                # Pivots the 'movies' table to have the user IDs as each row to use for 
                # correlations.
                users = movies.pivot_table(index='userId', columns='title', 
                                           values='rating')
                # Returns some of the variables to the new function 'movie_input'.
                movie_input(movie_data, ratings, users)
            # If the data cannot be loaded or reshaped, print what went wrong.
            except (FileNotFoundError, ValueError) as error:
                print(f"Could not prepare the movie data: {error}\n")
        # If they say no, then print a goodbye message and break.
        elif retrieve_movie.lower() == "n":
            print("\nThank you for using the system! :)")
            break
        # Any other input prints a reminder and asks the question again.
        else:
            print("Please try again. Enter a 'Y' or an 'N'.\n")

#### Movie Input Function

In [3]:
# Defines the second function using some variables set in the previous one to
# ask the user for a movie and then provide recommendations.
def movie_input(movie_data, ratings, users):
    # Asks the user what movie they want a recommendation for. It also specifies the
    # format in which they should type the movie.
    movie = input("\nWhat movie do you like? Please be sure to use proper punctuation "
                 "and the year the movie came out like this: \n'Movie Title (2000)' \n")
    # Try the following code after they provide a movie title and year.
    try:
        # If the movie is not in the database, it will print out an error and 
        # remind the user to use the proper punctuation and capitalization.
        if movie not in users.columns:
            raise ValueError("That movie is not in the database. Did you use proper "
                            "punctuation and capitalization? Please try again!\n")
        # Filters through the movies to only select ones with more than 100 ratings.
        # (I also do this later, but this is to further help any errors I was 
        # receiving in the output.)
        movie_ratings_100plus = ratings[ratings['totalRatings'] > 100].index
        new_users = users[movie_ratings_100plus]
        # Suppresses the division errors and RuntimeWarnings I was getting
        # when calculating correlations. The calculation has to run inside
        # both context managers for the suppression to apply. It also uses
        # the new, filtered users' data for the correlations.
        with np.errstate(divide='ignore', invalid='ignore'), \
                warnings.catch_warnings():
            warnings.simplefilter("ignore", category=RuntimeWarning)
            correlations = new_users.corrwith(users[movie])
        # Creates a new data frame called 'recommendation' with the correlations
        # in a new column.
        recommendation = pd.DataFrame(correlations, columns=['correlation'])
        # Drops any NA values.
        recommendation.dropna(inplace=True)
        # Joins the recommendation correlations with the 'totalRatings' column
        # in 'ratings'.
        recommendation = recommendation.join(ratings['totalRatings'])
        # Sorts through the recommendation ratings to show only ones that
        # have more than 100 ratings again.
        recc = recommendation[recommendation['totalRatings'] > 
                              100].sort_values('correlation', 
                                               ascending=False).reset_index()
        # Merges the movie recommendations with the 'movie_data' data frame based
        # on the title.
        recc = recc.merge(movie_data, on='title', how='left')
        # Prints out a separation line, the prompt to show the recommendations,
        # and the 10 recommendations.
        print("-"*50)
        print("Here are your 10 movie recommendations:\n")
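        # Prints the top 10 movies (besides the movie that was input) 
        # title by title to avoid it printing the Series information
        # at the bottom of the list. The input movie is filtered out by
        # title because it only ranks first when it has more than 100
        # ratings itself.
        top_titles = recc.loc[recc['title'] != movie, 'title'].head(10)
        for title in top_titles:
            print(title)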
    # Handles any unexpected errors by printing what error occurred.
    except ValueError as error:
        print(error)

#### Running the Script

In [4]:
# Runs the script as a script and not a module.
if __name__ == "__main__":
    main()

Welcome to the movie recommendation system!


Would you like to see some movie recommendations? Y - Yes; N - No: Y

What movie do you like? Please be sure to use proper punctuation and the year the movie came out like this: 
'Movie Title (2000)' 
Titanic (1997)
--------------------------------------------------
Here are your 10 movie recommendations:

Dances with Wolves (1990)
Star Trek: Generations (1994)
Ghost (1990)
Four Weddings and a Funeral (1994)
E.T. the Extra-Terrestrial (1982)
Minority Report (2002)
Batman Forever (1995)
Monsters, Inc. (2001)
Outbreak (1995)
Clear and Present Danger (1994)

Would you like to see some movie recommendations? Y - Yes; N - No: y

What movie do you like? Please be sure to use proper punctuation and the year the movie came out like this: 
'Movie Title (2000)' 
This is not a movie (2001)
That movie is not in the database. Did you use proper punctuation and capitalization? Please try again!


Would you like to see some movie recommendations? Y - Yes;

##### References
- https://analyticsindiamag.com/how-to-build-your-first-recommender-system-using-python-movielens-dataset/

- https://stackoverflow.com/questions/44933518/how-to-remove-runtimewarning-errors-from-code

- https://numpy.org/doc/stable/reference/generated/numpy.errstate.html

- https://stackoverflow.com/questions/65796066/python-numpy-runtime-warning-using-np-where-np-errstate-and-warnings-error

- https://stackoverflow.com/questions/50439035/jupyter-notebook-input-line-executed-before-print-statement

##### Write-Up
This recommendation system was largely based on the article provided in Blackboard (Nair, 2019) (first link under 'References'). The recommender first asks the user if they are interested in receiving movie recommendations. The user can type 'y' or 'n' for yes or no in either capital or lowercase letters (either works, as the script converts the input to lowercase before checking it). If the user says they would like to receive recommendations ('y'), the script first loads the ratings and movie files from the small MovieLens dataset into pandas data frames. Then, it merges those two data frames on the 'movieId' column to create one data frame. The average rating for each movie is calculated by grouping the data frame by movie title and taking the mean rating for each. These averages are placed in a new data frame, 'ratings', and the number of ratings per movie is counted by grouping the data again and placing the counts in a new 'totalRatings' column of that data frame. The 'movies' data frame is pivoted so that each user ID is a row and each movie title is a column, which is the shape needed for the correlation calculations. The 'movie_data,' 'ratings,' and 'users' variables are then passed on to the second function for later use. The first function also handles inputs other than the expected y/Y or n/N by printing a "try again" message to let the user know that their response was not accepted. If the user does not want any movie recommendations, they can simply enter "n" or "N" to break the loop at any time (even at the very beginning).
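
The data preparation steps described above can be illustrated on a tiny in-memory example. This is only a sketch: the two small data frames below are made-up stand-ins for ratings.csv and movies.csv, and the mean and count are computed in a single `agg` call rather than in two separate `groupby` passes as in the notebook.

```python
import pandas as pd

# Made-up stand-ins for ratings.csv and movies.csv (illustrative values only).
rating_data = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 20, 10],
    "rating": [4.0, 3.0, 5.0, 1.0, 3.0],
})
movie_data = pd.DataFrame({
    "movieId": [10, 20],
    "title": ["Movie A (2000)", "Movie B (2001)"],
    "genres": ["Drama", "Comedy"],
})

# Attach the title to every rating.
movies = rating_data.merge(movie_data, on="movieId", how="left")

# Mean rating and number of ratings per title.
ratings = movies.groupby("title")["rating"].agg(["mean", "count"])
ratings = ratings.rename(columns={"mean": "rating", "count": "totalRatings"})

# One row per user, one column per title; unrated movies become NaN.
users = movies.pivot_table(index="userId", columns="title", values="rating")

print(ratings)
print(users)
```

User 3 never rated "Movie B (2001)", so that cell of `users` is NaN. Pearson correlation in pandas skips such missing pairs, which is one reason movies with very few ratings produce unreliable correlations.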

The second function asks the user for a movie in order to find the top 10 recommendations. It specifies that they need to use proper capitalization for the movie titles, and they need to provide the year the movie came out. I did this because I was not sure how to make the movie titles all lowercase, have the script lowercase any user input, and then map the match back to the original titles that include uppercase letters. I know that some movies have strange capitalizations or numbers/symbols in some spaces, so I wanted to make it as easy on the program as I could, even if that means it's a little bit harder for users to use. The script then checks whether the user's input is among the movie title columns and, if it is not, prints a message saying so (as well as a reminder to use proper capitalization/punctuation in their input). I had several issues with warnings about division errors when calculating correlations, so I filtered the movies down to the ones with more than 100 ratings. Because that filter alone did not always stop the warnings, I also told the script to ignore them. A new data frame is created with the correlation data, and any NA values are dropped. The new data frame is then joined with the 'totalRatings' column of the 'ratings' data frame, and, again, any movies with 100 or fewer ratings are filtered out. Then, the correlations are merged with the 'movie_data' data frame based on the movie's title.
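
To make the correlation step concrete, here is a minimal sketch on a made-up user-by-title table. The ratings are invented for illustration, and the 100-rating filter is left out because the toy table is so small. It ranks the other movies by their Pearson correlation with the chosen one.

```python
import pandas as pd

# Made-up pivoted ratings: one row per user, one column per title.
users = pd.DataFrame(
    {
        "Movie A (2000)": [5.0, 4.0, 1.0, 2.0],
        "Movie B (2001)": [4.0, 5.0, 2.0, 1.0],
        "Movie C (2002)": [1.0, 2.0, 5.0, 4.0],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)

liked = "Movie A (2000)"

# Correlates every column with the liked movie's column.
correlations = users.corrwith(users[liked])

# Drops missing values and the liked movie itself, then sorts best-first.
recommendation = (
    correlations.rename("correlation")
    .to_frame()
    .dropna()
    .drop(index=liked)
    .sort_values("correlation", ascending=False)
)

print(recommendation)
```

Here "Movie B (2001)" comes out on top with a correlation of 0.8, because users who liked Movie A also tended to like Movie B, while "Movie C (2002)" gets -1.0 because its ratings run exactly opposite.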

The script prints the top 10 movie recommendations, ranked by the highest correlations with the movie the user entered and excluding that movie itself (which would otherwise rank first, since it correlates perfectly with itself). Any ValueError raised along the way is caught, and its message is printed. The script continues to loop until the user indicates that they no longer wish to obtain movie recommendations. Then, it breaks out of the loop with a goodbye message.

I used the "if name equals main" syntax to specify to only run the script if it is not imported as a module. I ran the script a few times above to show that it handles errors well and can also accurately create recommendations based on valid input.

Overall, I think this is a very successful movie recommendation system. The only thing I would hope to improve would be to make it easier for users by not having them input the year of the movie or worry about the capitalization. 
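
One possible way to approach that improvement is to compare titles in a normalized form and then return the original spelling. The sketch below is not part of the notebook: `split_title` and `find_titles` are hypothetical helper names, and it assumes every stored title ends with a four-digit year in parentheses, as in the examples above.

```python
import re

# Matches a trailing year such as " (1997)" at the end of a title.
YEAR_SUFFIX = re.compile(r"\s*\((\d{4})\)\s*$")


def split_title(text):
    """Split a title into a case-folded name and an optional year string."""
    match = YEAR_SUFFIX.search(text)
    year = match.group(1) if match else None
    name = YEAR_SUFFIX.sub("", text).strip().casefold()
    return name, year


def find_titles(query, titles):
    """Return stored titles whose name matches the query, ignoring case.

    If the query includes a year, only titles from that year are returned.
    """
    wanted_name, wanted_year = split_title(query)
    results = []
    for title in titles:
        name, year = split_title(title)
        if name == wanted_name and wanted_year in (None, year):
            results.append(title)
    return results


# Made-up title list for illustration.
titles = ["Titanic (1997)", "Titanic (1953)", "Ghost (1990)"]
print(find_titles("titanic", titles))
print(find_titles("TITANIC (1997)", titles))
```

In the notebook, the list of titles could come from `users.columns`. When more than one title matches (as with the two Titanic entries here), the script would still need to ask the user which one they meant.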