### Recommender System

Careful feature selection identifies the most informative features and eliminates less relevant ones, which improves model performance. This assessment, however, is based on the particular models and data sets used in this testing cycle.

Additional experimentation may yield better outcomes. Different models or feature engineering techniques could produce more accurate predictions, and different data preprocessing methods might yield further insights.

I am going to use the sklearn and scipy libraries for this project. I referenced a course text from a previous class, 'Recommender Systems: An Introduction' by Dietmar Jannach et al.

In [2]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from scipy import sparse

# Load movies data
movies = pd.read_csv('movies.csv')
# Load ratings data
ratings = pd.read_csv('ratings.csv')


I think we need to pivot the ratings into a matrix with one row per movie and one column per user, so the data is better formatted for the function I will write below. I could be wrong about this; if so, I will load the data again and use new variables. Any missing values will be filled with 0.

In [3]:
ratings_matrix = ratings.pivot_table(index='movieId', columns='userId', values='rating').fillna(0)
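
To make the shape of the pivoted matrix concrete, here is a minimal sketch on a made-up ratings table. The `toy_ratings` values are invented for illustration and only the column names match `ratings.csv`. It also shows how many cells hold a real rating, since every other cell is filled with 0.

```python
import pandas as pd

# Toy ratings in the same long format as ratings.csv (values are made up)
toy_ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 5.0, 3.0, 2.0, 1.0],
})

# One row per movie, one column per user; unrated cells become 0
toy_matrix = toy_ratings.pivot_table(
    index="movieId", columns="userId", values="rating"
).fillna(0)
print(toy_matrix)

# Fraction of cells that hold a real (nonzero) rating
density = (toy_matrix != 0).to_numpy().mean()
print(f"density: {density:.2f}")
```

Filling with 0 means an unrated movie is treated the same as a rating of 0, which is a simplification worth keeping in mind when reading the similarity scores.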

Now I need the cosine similarity matrix, which measures how similar each pair of movies is based on the ratings they received.

In [4]:
# Computing the cosine similarity matrix
cosine_sim = cosine_similarity(ratings_matrix)
cosine_sim_df = pd.DataFrame(cosine_sim, index=ratings_matrix.index, columns=ratings_matrix.index)
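
As a sanity check on what `cosine_similarity` computes, the sketch below works out the score for two made-up movie rows by hand and compares it with sklearn. The vectors `a` and `b` are invented for illustration. It also passes the rows as a SciPy sparse matrix, which is one possible use for the `sparse` import above, since most cells of a ratings matrix are 0.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics.pairwise import cosine_similarity

# Two made-up movie rows: ratings from four users, 0 = not rated
a = np.array([4.0, 0.0, 5.0, 1.0])
b = np.array([5.0, 0.0, 4.0, 0.0])

# cos(a, b) = (a . b) / (||a|| * ||b||)
manual = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# cosine_similarity also accepts a SciPy sparse matrix as input
rows = sparse.csr_matrix(np.vstack([a, b]))
sim = cosine_similarity(rows)

print(manual, sim[0, 1])  # the two values should agree
```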

Using one of the movies from the dataset, the function will reference the similarity matrix to make a top 10 recommendation. Toy Story is my favorite!

In [10]:
def recommend_movies(movieId, cosine_sim_df=cosine_sim_df):
    # Get the pairwise similarity scores of all movies with the given movie
    sim_scores = cosine_sim_df[movieId]

    # Drop the movie itself by label, so it is excluded even if another
    # movie ties with it at the top of the ranking
    sim_scores = sim_scores.drop(movieId)

    # Keep the 10 highest similarity scores
    sim_scores = sim_scores.sort_values(ascending=False).head(10)

    # Get the movieIds of those movies
    movie_indices = sim_scores.index

    # Return the matching rows of `movies`; isin() keeps the row order of
    # `movies`, so the result is not sorted by similarity
    return movies[movies['movieId'].isin(movie_indices)]

# This function requires a movieId to work, e.g. 1 for 'Toy Story (1995)'
recommend_movies(1)


index,movieId,title,genres
224,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Sci-Fi
314,356,Forrest Gump (1994),Comedy|Drama|Romance|War
322,364,"Lion King, The (1994)",Adventure|Animation|Children|Drama|Musical|IMAX
418,480,Jurassic Park (1993),Action|Adventure|Sci-Fi|Thriller
546,648,Mission: Impossible (1996),Action|Adventure|Mystery|Thriller
615,780,Independence Day (a.k.a. ID4) (1996),Action|Adventure|Sci-Fi|Thriller
911,1210,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Sci-Fi
964,1265,Groundhog Day (1993),Comedy|Fantasy|Romance
969,1270,Back to the Future (1985),Adventure|Comedy|Sci-Fi
2355,3114,Toy Story 2 (1999),Adventure|Animation|Children|Comedy|Fantasy


Just checking that it's working: running again using movieId 1270 for 'Back to the Future'.

In [11]:
recommend_movies(1270)

index,movieId,title,genres
224,260,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Sci-Fi
898,1196,Star Wars: Episode V - The Empire Strikes Back...,Action|Adventure|Sci-Fi
900,1198,Raiders of the Lost Ark (Indiana Jones and the...,Action|Adventure
911,1210,Star Wars: Episode VI - Return of the Jedi (1983),Action|Adventure|Sci-Fi
964,1265,Groundhog Day (1993),Comedy|Fantasy|Romance
990,1291,Indiana Jones and the Last Crusade (1989),Action|Adventure
1486,2011,Back to the Future Part II (1989),Adventure|Comedy|Sci-Fi
1487,2012,Back to the Future Part III (1990),Adventure|Comedy|Sci-Fi|Western
1576,2115,Indiana Jones and the Temple of Doom (1984),Action|Adventure|Fantasy
2038,2716,Ghostbusters (a.k.a. Ghost Busters) (1984),Action|Comedy|Sci-Fi
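
Both result tables above come back in `movieId` order, because `recommend_movies` filters `movies` with `isin()` and that keeps the row order of `movies`. If a ranked list is wanted, one option is sketched below. `recommend_movies_ranked`, `toy_sim` and `toy_movies` are hypothetical names with made-up data, not part of the notebook.

```python
import pandas as pd

def recommend_movies_ranked(movie_id, sim_df, movies_df, n=10):
    """Hypothetical variant: top-n similar movies, most similar first."""
    # Drop the query movie by label, then take the n largest scores
    scores = sim_df[movie_id].drop(movie_id).nlargest(n)
    ranked = scores.rename("similarity").rename_axis("movieId").reset_index()
    # A left merge keeps the similarity order of `ranked`
    return ranked.merge(movies_df, on="movieId", how="left")

# Tiny made-up example: movie 3 is most similar to movie 1
toy_movies = pd.DataFrame({"movieId": [1, 2, 3], "title": ["A", "B", "C"]})
toy_sim = pd.DataFrame(
    [[1.0, 0.2, 0.9], [0.2, 1.0, 0.5], [0.9, 0.5, 1.0]],
    index=[1, 2, 3], columns=[1, 2, 3],
)
result = recommend_movies_ranked(1, toy_sim, toy_movies, n=2)
print(result)
```

With the real data this would be called as `recommend_movies_ranked(1, cosine_sim_df, movies)`, and the extra `similarity` column shows how close each recommendation is.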


I wasn't sure if the assignment called for actual user input code, so I decided to add functions that accept user input.

In [12]:
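def search_movie_by_title(search_term, movies_df=movies):
    # Filtering movies where the title contains the search term.
    # regex=False treats the term as plain text, so input such as "(" does
    # not raise a regex error; na=False skips rows with a missing title.
    matches = movies_df[movies_df['title'].str.contains(search_term, case=False, regex=False, na=False)]

    if matches.empty:
        return "No matches found."
    else:
        return matches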

def get_movie_recommendations():
    search_term = input("Enter a movie title or part of a title to search: ")
    print("Searching for movies that match your query...\n")
    matches = search_movie_by_title(search_term)

    if isinstance(matches, str):
        print(matches)
    else:
        print("Here are the matches in the movie library:\n")
        print(matches, "\n")

        # Enter the value from the movieId column, not the row label printed on the left
        movie_id = int(input("Enter the movieId of the movie you are interested in: "))
        print("Getting recommendations...\n")
        recommendations = recommend_movies(movie_id)

        print("Here are some movies you might like:\n")
        print(recommendations)

get_movie_recommendations()


Enter a movie title or part of a title to search: toy story
Searching for movies that match your query...

Here are the matches in the movie library:

      movieId               title  \
0           1    Toy Story (1995)   
2355     3114  Toy Story 2 (1999)   
7355    78499  Toy Story 3 (2010)   

                                                genres  
0          Adventure|Animation|Children|Comedy|Fantasy  
2355       Adventure|Animation|Children|Comedy|Fantasy  
7355  Adventure|Animation|Children|Comedy|Fantasy|IMAX   

Enter the movieId of the movie you are interested in: 2355
Getting recommendations...

Here are some movies you might like:

      movieId                                         title  \
1183     1580              Men in Black (a.k.a. MIB) (1997)   
1545     2081                    Little Mermaid, The (1989)   
1576     2115   Indiana Jones and the Temple of Doom (1984)   
2014     2683  Austin Powers: The Spy Who Shagged Me (1999)   
2038     2716    Ghostbusters 
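
In the run above, the value entered (2355) is the row label printed to the left of Toy Story 2, whose movieId is 3114, so the recommendations shown are for movieId 2355. A small check on the input could catch this. The sketch below is one way to do it; `resolve_movie_id` is a hypothetical helper, not part of the notebook, and the `matches` frame is copied from the search output above.

```python
import pandas as pd

def resolve_movie_id(raw_value, matches):
    """Hypothetical helper: return a valid movieId from user input, or None."""
    try:
        movie_id = int(raw_value)
    except ValueError:
        return None  # not a whole number
    # Accept only ids listed in the movieId column of the search results
    if movie_id in matches["movieId"].tolist():
        return movie_id
    return None

# Search results copied from the output above
matches = pd.DataFrame(
    {"movieId": [1, 3114, 78499],
     "title": ["Toy Story (1995)", "Toy Story 2 (1999)", "Toy Story 3 (2010)"]},
    index=[0, 2355, 7355],
)
print(resolve_movie_id("3114", matches))  # a movieId from the results
print(resolve_movie_id("2355", matches))  # a row label, not a movieId
```

`get_movie_recommendations` could call a helper like this and ask again when it returns `None`, instead of passing the raw number straight to `recommend_movies`.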

### Reference

The MovieLens dataset: F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI=http://dx.doi.org/10.1145/2827872