Question Statement:

Create a recommender system based on the ReelGood data using only Jupyter notebooks on Colab (no need to use external data or tools). As input, ask the user to provide a title, a genre, a topic, or examples of movies they like, e.g. "Recommend a movie like …"; "List three movies you like: …"; "Describe a genre of movies or shows you like: …". Then, using the ReelGood data, seek out relevant titles and provide some recommendations (e.g. 3 movies or shows) based on the data. You can use product-level data, IMDB ratings, or whatever features you want, but the idea is to build a tool that provides recommendations. This can all be done with Python in Colab; what I am looking for is a block of code that takes an input as described above and returns a recommendation as text output (even better if you come up with an alternative, but you do not need to).
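The question statement lists a few free-text prompt styles, while the notebook's input cells expect a plain comma-separated list. A minimal sketch of bridging the two, assuming these exact phrasings (the regex patterns and the `parse_user_input` name are hypothetical, not part of the original notebook):

```python
import re

# Hypothetical prompt prefixes, modeled on the examples in the question statement
PROMPT_PATTERNS = [
    re.compile(r"recommend (?:a|an) (?:movie|show)s? like\s+(?P<items>.+)", re.IGNORECASE),
    re.compile(r"list (?:three|3) (?:movies|shows) you like\W*(?P<items>.+)", re.IGNORECASE),
    re.compile(r"describe a genre of movies or shows? you like\W*(?P<items>.+)", re.IGNORECASE),
]


def parse_user_input(text):
    """Strip a known prompt prefix, then split the remainder on commas."""
    text = text.strip()
    for pattern in PROMPT_PATTERNS:
        match = pattern.match(text)
        if match:
            text = match.group("items")
            break
    # Keep non-empty, whitespace-trimmed items in their original order
    return [item.strip() for item in text.split(",") if item.strip()]


print(parse_user_input("List three movies you like: Alien, Heat, Up"))
```

Input that matches none of the prefixes is simply split on commas, which is how the later input cells already read the user's answer.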

Sample Input: **Movies that the user likes**

Target Output: **3 movies or shows recommendations**

## Data Cleaning / Preprocessing

In [None]:
from google.colab import drive
import pandas as pd
import numpy as np
import seaborn as sns
import re

In [None]:
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
df_whole = pd.read_csv('/content/drive/MyDrive/DSO 574/Assignment 3/Database/ReelGood Data/Reel Good Data (Title+Service+Genre+Tag List).csv', index_col=0)



In [None]:
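# Drop rows that duplicate another row on every column NOT listed below
# (i.e. the remaining identifier columns, such as Title, Type, Service and Genre)
df = df_whole.drop_duplicates(subset=df_whole.columns.difference(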
                        ['URL', 'Tag', 'IMDB', 'ReelGood' , 'AgeRating', 'Rated',
                         'Released Year', 'Duration Year', 'Seasons', 'What it\'s about',
                         'Where to Watch', 'Rent or Buy Available', 'Exclusive Service',
                         'Has Tag']))

In [None]:
# Data cleaning: group the data by Title so that each movie/TV show appears only once
cleaned_df = df.groupby('Title').agg({
    'URL': 'first',
    'Type': 'first',
    'Service': lambda x: list(set(x)),  # all platforms the title is available on
    'Genre': lambda x: list(set(x)),  # all genres the title is listed under
    'Tag': 'first',
    'IMDB': 'mean',  # mean IMDB rating
    'ReelGood': 'mean',  # mean ReelGood rating
    'AgeRating': 'first',
    'Rated': 'first',  # first content rating (e.g. R, TV-14)
    'Released Year': 'first',  # first release year
    'Duration Year': 'first',  # first duration (year range)
    'Seasons': 'max',  # maximum number of seasons
    'What it\'s about': 'first',
    'Where to Watch': 'first',  # first 'Where to Watch' value
    'Rent or Buy Available': 'first',  # first 'Rent or Buy Available' value
    'Exclusive Service': 'first',  # first non-null 'Exclusive Service' value
    'Has Tag': 'first'  # first 'Has Tag' value
}).reset_index()
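To see what this aggregation does to one title's rows, here is a small sketch on a hypothetical toy frame (made-up titles, not ReelGood data). The list-valued columns collect every distinct service and genre (`sorted` is used here only to make the order stable), while `'first'` returns the first non-null value in each group:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows in the same long format: one row per title/service/genre
rows = pd.DataFrame({
    "Title":   ["Title A", "Title A", "Title A", "Title B"],
    "Service": ["netflix", "hulu", "netflix", "netflix"],
    "Genre":   ["Horror", "Horror", "Sci-Fi", "Crime"],
    "Rated":   [np.nan, "R", "R", "R"],
    "IMDB":    [8.5, 8.5, 8.5, 8.3],
})

collapsed = rows.groupby("Title").agg({
    "Service": lambda x: sorted(set(x)),  # every distinct platform, in a stable order
    "Genre": lambda x: sorted(set(x)),    # every distinct genre
    "Rated": "first",                     # first non-null value in the group (NaN is skipped)
    "IMDB": "mean",
}).reset_index()

print(collapsed)
```

Because `'first'` skips missing values, a title whose first row lacks a rating still picks up the rating from a later row.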

In [None]:
import ast

# Function to safely convert string representation of lists to actual lists
def safe_literal_eval(x):
    try:
        return ast.literal_eval(x)
    except (SyntaxError, ValueError):
        # Strings such as "[nan, nan]" cannot be parsed and are returned unchanged
        return x

# Convert string representation of lists to actual lists
cleaned_df['Tag'] = cleaned_df['Tag'].apply(safe_literal_eval)

# Define a function to remove duplicates from a list
def remove_duplicates(lst):
    if isinstance(lst, list):
        return list(set(lst))
    else:
        return lst

# Apply the function to remove duplicates from each list in the 'Tag' column
cleaned_df['Tag'] = cleaned_df['Tag'].apply(remove_duplicates)
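`safe_literal_eval` leaves any value it cannot parse untouched: `ast.literal_eval` rejects bare `nan` tokens, so cells such as `[nan, nan]` stay strings, and missing cells stay float NaN. A hedged sketch of a stricter parser (the name `parse_tag_list` is hypothetical) that always returns a list of real tags:

```python
import ast
import math


def parse_tag_list(value):
    """Hypothetical helper: return a Tag cell as a clean, de-duplicated list of strings."""
    # Missing cells (None / float NaN) become an empty list
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return []
    if isinstance(value, str):
        try:
            value = ast.literal_eval(value)
        except (SyntaxError, ValueError):
            # e.g. "[nan, nan]" or "[Political, Female Director]": split the bracketed text
            value = [part.strip().strip("'\"") for part in value.strip("[]").split(",")]
    if not isinstance(value, (list, tuple, set)):
        value = [value]
    cleaned = []
    for tag in value:
        text = str(tag).strip()
        # Skip None entries, empty strings and 'nan' placeholders
        if tag is None or not text or text.lower() == "nan":
            continue
        cleaned.append(text)
    # dict.fromkeys removes duplicates while keeping the first-seen order
    return list(dict.fromkeys(cleaned))


print(parse_tag_list("['Zombie', 'Survival', 'Zombie']"))
print(parse_tag_list("[nan, nan, nan, nan]"))
```

If adopted, it could replace both helpers in one step, e.g. `cleaned_df['Tag'] = cleaned_df['Tag'].apply(parse_tag_list)`, after which every Tag value is a list and the tag lookups no longer need to guard against NaN or string cells.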

In [None]:
cleaned_df

| | Title | URL | Type | Service | Genre | Tag | IMDB | ReelGood | AgeRating | Rated | Released Year | Duration Year | Seasons | What it's about | Where to Watch | Rent or Buy Available | Exclusive Service | Has Tag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | !Women Art Revolution | /movie/women-art-revolution-2010 | movies | [tubi_tv, vudu_free, free, fandor, plutotv, ka... | [Documentary] | [Political, Female Director] | 6.8 | 48.0 | | | 2010 | | | Through intimate interviews, provocative art, ... | Available to watch free online (Tubi, PlutoTV ... | 1 | 0 | 1 |
| 1 | #1 Cheerleader Camp | /movie/cheerleader-camp-2010 | movies | [plutotv, free, tubi_tv] | [Comedy] | [Sports] | 3.7 | 45.0 | 18+ | R | 2010 | | | A pair of horny college guys get summer jobs a... | Available to watch free online (Tubi & PlutoTV... | 1 | 0 | 1 |
| 2 | #Alive | /movie/alive-2020 | movies | [netflix] | [Drama, Horror, Action & Adventure] | [Zombie, Suspense, Survival, Technology, Escap... | 6.2 | 62.0 | | | 2020 | | | As a grisly virus rampages a city, a lone man ... | Available to stream on a popular subscription ... | 0 | 1 | 1 |
| 3 | #BlackLove | /show/blacklove-2015 | tv | [plutotv, fyi, fyi_tveverywhere, free] | [Romance, Reality, Drama] | [nan, nan, nan, nan] | 6.1 | 37.0 | 16+ | | 2015 | | | | | 1 | 0 | 0 |
| 4 | #FollowFriday | /movie/followfriday-2016 | movies | [Rent or Buy] | [Mystery, Thriller] | [nan] | 2.8 | 21.0 | | | 2016 | | | | | 1 | 1 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 61954 | 那年花開月正圓 | /show/na-nian-hua-kai-yue-zheng-yuan-2017 | tv | [amazon_prime] | [Romance, Drama] | [nan] | 7.7 | 41.0 | 7+ | | 2017 | | | | | 0 | 1 | 0 |
| 61955 | 阳关道 | /show/demons-path-2018 | tv | [netflix] | [Crime, Comedy, Horror] | [nan] | 3.2 | 32.0 | | | 2018 | | | | | 0 | 1 | 0 |
| 61956 | 頭文字D First Stage | /show/d-first-stage-1998 | tv | [funimation, hulu_plus] | [Action & Adventure, Sport, Animation] | [Sports] | 8.4 | 54.0 | 7+ | TV-PG | 1998 | 1998-2020 | 6.0 | Takumi Fujiwara is the son of the owner of a l... | 63 episodes (77%) are available to stream on a... | 1 | 0 | 1 |
| 61957 | 부릉! 부릉! 브루미즈 | /show/vroomiz | tv | [amazon_prime, netflix, free, tubi_tv] | [Action & Adventure, Children, Animation, Family] | [nan, nan, nan, nan] | 4.9 | 44.0 | | | 2012 | | | | | 1 | 0 | 0 |


In [None]:
df['Title'].nunique()

61959

## Recommender System

In [None]:
cleaned_df['Title']

0        !Women Art Revolution
1          #1 Cheerleader Camp
2                       #Alive
3                   #BlackLove
4                #FollowFriday
                 ...          
61954                  那年花開月正圓
61955                      阳关道
61956         頭文字D First Stage
61957             부릉! 부릉! 브루미즈
61958                   부자의 탄생
Name: Title, Length: 61959, dtype: object

In [None]:
cleaned_df.columns

Index(['Title', 'URL', 'Type', 'Service', 'Genre', 'Tag', 'IMDB', 'ReelGood',
       'AgeRating', 'Rated', 'Released Year', 'Duration Year', 'Seasons',
       'What it's about', 'Where to Watch', 'Rent or Buy Available',
       'Exclusive Service', 'Has Tag'],
      dtype='object')

In [None]:
cleaned_df[cleaned_df['Title'] == 'Stranger Things']

| | Title | URL | Type | Service | Genre | Tag | IMDB | ReelGood | AgeRating | Rated | Released Year | Duration Year | Seasons | What it's about | Where to Watch | Rent or Buy Available | Exclusive Service | Has Tag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 44574 | Stranger Things | /show/stranger-things-2016 | tv | [netflix] | [Drama, Horror, Fantasy] | [Monster, Space, Suspense, Friendship, Science... | 8.8 | 96.0 | 16+ | TV-14 | 2016 | 2016-2020 | 4.0 | When a young boy vanishes, a small town uncove... | 25 episodes (100%) are available to stream on ... | 0 | 1 | 1 |


In [None]:
cleaned_df[cleaned_df['Title'] == 'Stranger Things']['Genre'].iloc[0]

['Drama', 'Horror', 'Fantasy']

**Version 1** (the system checks whether the movie/TV show exists in the database and returns up to 3 matching titles, ranked by IMDB rating)

In [None]:
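def recommend_movie(input_query):
    # Genre and Tag hold lists (not strings), so cast every column to str before matching;
    # regex=False treats the query as plain text, e.g. "Action & Adventure" or "(500)"
    filtered_data = cleaned_df[
        cleaned_df['Title'].astype(str).str.contains(input_query, case=False, regex=False) |
        cleaned_df['Genre'].astype(str).str.contains(input_query, case=False, regex=False) |
        cleaned_df['Tag'].astype(str).str.contains(input_query, case=False, regex=False)
    ]

    if len(filtered_data) == 0:
        return "Sorry, we couldn't find any movies/TV shows matching your query."

    # Rank the matches by IMDB rating and keep the top 3
    sorted_data = filtered_data.sort_values(by='IMDB', ascending=False)
    recommendations = sorted_data.head(3)

    return recommendations[['Title', 'IMDB']]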

user_input = input("Please provide a title, genre, or example of movies you like: ")

recommendations = recommend_movie(user_input)

print("Recommendations based on your input:")
print(recommendations)

Please provide a title, genre, or example of movies you like: stranger things
Recommendations based on your input:
                 Title  IMDB
44574  Stranger Things   8.8


**Version 2** (the system returns the 3 highest-rated movies/TV shows by IMDB rating that share a genre with the input title)

In [None]:
cleaned_df.sort_values(by = 'IMDB', ascending = False).iloc[0:3][['Title', 'IMDB']]

| | Title | IMDB |
|---|---|---|
| 15747 | Eco-Terrorist: Battle for Our Planet | 10.0 |
| 61459 | You May Be Pretty, But I Am Beautiful: The Adr... | 9.7 |
| 8009 | Bluey | 9.7 |


In [None]:
def recommend_movies_tvshows(input_data):
    recommended_movies_tvshows = []

    for movie in input_data:
        # Check if the input movie exists in the dataset
        movie_row = cleaned_df[cleaned_df['Title'].str.lower() == movie.lower()]

        if not movie_row.empty:
            # If the title exists, get its genres
            genres = movie_row['Genre'].iloc[0]
            # Keep titles that share at least one genre, excluding the input title itself
            filtered_data = cleaned_df[cleaned_df['Genre'].apply(lambda x: any(genre in x for genre in genres)) &
                                       (cleaned_df['Title'] != movie_row['Title'].iloc[0])]

            if len(filtered_data) > 0:
                # Sort filtered data by IMDB rating
                sorted_data = filtered_data.sort_values(by='IMDB', ascending=False)
                # Get the top 3 recommendations
                recommendation = sorted_data.head(3)[['Title', 'Genre', 'IMDB']]
                recommended_movies_tvshows.append(recommendation)
            else:
                recommended_movies_tvshows.append(f"No recommendations found for {movie}.")
        else:
            recommended_movies_tvshows.append(f"Sorry, we couldn't find any movies matching '{movie}'.")

    return recommended_movies_tvshows

# Ask user for input
user_input = input("Please provide a list of titles, genres, or examples of movies you like (separated by comma): ")
input_data = [movie.strip() for movie in user_input.split(',')]

# Get recommendations
recommendations = recommend_movies_tvshows(input_data)

print("Recommendations based on your input:")
for recommendation in recommendations:
    print(recommendation)

Please provide a list of titles, genres, or examples of movies you like (separated by comma): stranger things
Recommendations based on your input:
                                       Title                    Genre  IMDB
48264           The Curators of Dixon School                  [Drama]   9.6
8616                            Breaking Bad           [Crime, Drama]   9.5
51134  The Last Drive-in With Joe Bob Briggs  [Fantasy, Comedy, Cult]   9.5
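Version 2 keeps any title that shares at least one genre and then sorts purely by IMDB, so a title that shares only one broad genre (for example Drama) can outrank much closer matches. One possible alternative, sketched here on a hypothetical toy catalog rather than the notebook's `cleaned_df`, ranks candidates by the Jaccard similarity of their genre lists first and uses IMDB only to break ties:

```python
def jaccard(a, b):
    """Overlap of two genre lists: |A ∩ B| / |A ∪ B| (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_genre_overlap(title, catalog, k=3):
    """Rank other titles by genre similarity to `title`, breaking ties by IMDB rating.

    `catalog` is assumed to be a list of dicts with 'Title', 'Genre' (a list) and 'IMDB' keys.
    """
    target = next((row for row in catalog if row["Title"].lower() == title.lower()), None)
    if target is None:
        return []
    scored = [
        (jaccard(target["Genre"], row["Genre"]), row["IMDB"], row["Title"])
        for row in catalog
        if row is not target
    ]
    # Drop titles with no genre in common, then sort by similarity, then rating
    scored = [entry for entry in scored if entry[0] > 0]
    scored.sort(key=lambda entry: (entry[0], entry[1]), reverse=True)
    return [entry[2] for entry in scored[:k]]


# Hypothetical toy catalog (made-up titles and ratings, not ReelGood data)
catalog = [
    {"Title": "Seed Show", "Genre": ["Drama", "Horror", "Fantasy"], "IMDB": 8.8},
    {"Title": "Pure Drama", "Genre": ["Drama"], "IMDB": 9.6},
    {"Title": "Spooky Fantasy", "Genre": ["Horror", "Fantasy"], "IMDB": 7.0},
    {"Title": "Dark Tale", "Genre": ["Drama", "Horror", "Fantasy"], "IMDB": 6.5},
    {"Title": "Comedy Hour", "Genre": ["Comedy"], "IMDB": 9.9},
]
print(recommend_by_genre_overlap("seed show", catalog))
```

Under the Version 2 rule, "Pure Drama" would come first because of its rating; here the exact genre match "Dark Tale" ranks first. On the real data, a list of such dicts could be built with `cleaned_df[['Title', 'Genre', 'IMDB']].dropna(subset=['IMDB']).to_dict('records')`.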


**Version 3** (adds genre input: if the user enters a genre, the system returns the top 3 movies/TV shows in that genre, falling back to tags if the input is not a genre)

In [None]:
genres = ['Drama', 'Horror', 'Fantasy']  # avoid naming a variable `list`, which shadows the built-in
[i.lower() for i in genres]

['drama', 'horror', 'fantasy']

In [None]:
def recommend_movies_tvshows(input_data):
    recommended_movies = []

    for item in input_data:
        # Check if the input is a movie or TV show
        movie_row = cleaned_df[cleaned_df['Title'].str.lower() == item.lower()]

        if not movie_row.empty:
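            # If the title exists, get its genres
            genres = movie_row['Genre'].iloc[0]
            # Keep titles that share at least one genre, excluding the input title itself
            filtered_data = cleaned_df[cleaned_df['Genre'].apply(lambda x: any(genre in x for genre in genres)) &
                                       (cleaned_df['Title'] != movie_row['Title'].iloc[0])]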

            if len(filtered_data) > 0:
                # Sort filtered data based on IMDB rating
                sorted_data = filtered_data.sort_values(by='IMDB', ascending=False)
                # Get top 3 recommendations
                recommendations = sorted_data.head(3)[['Title', 'Type', 'Genre', 'IMDB']]
                recommended_movies.append(recommendations)
            else:
                recommended_movies.append(f"No recommendations found for {item}.")
        else:
            # Check if the input is a genre
            is_genre = False
            for genre in cleaned_df['Genre']:
                if item.lower() in [g.lower() for g in genre]:
                    is_genre = True
                    # Filter movies based on the genre
                    filtered_data = cleaned_df[cleaned_df['Genre'].apply(lambda x: item.lower() in [g.lower() for g in x])]

                    if len(filtered_data) > 0:
                        # Sort filtered data based on IMDB rating
                        sorted_data = filtered_data.sort_values(by='IMDB', ascending=False)
                        # Get top 3 recommendations
                        recommendations = sorted_data.head(3)[['Title', 'Type', 'Genre', 'IMDB']]
                        recommended_movies.append(recommendations)
                    else:
                        recommended_movies.append(f"No recommendations found for the genre '{item}'.")
                    break

            if not is_genre:
                # Check if the input is a tag
                is_tag = False
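                for tag in cleaned_df['Tag']:
                    # Tag values can be NaN or unparsed strings, so only inspect real lists
                    if isinstance(tag, list) and item.lower() in [str(t).lower() for t in tag]:
                        is_tag = True
                        # Filter movies based on the tag
                        filtered_data = cleaned_df[cleaned_df['Tag'].apply(
                            lambda x: isinstance(x, list) and item.lower() in [str(t).lower() for t in x])]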

                        if len(filtered_data) > 0:
                            # Sort filtered data based on IMDB rating
                            sorted_data = filtered_data.sort_values(by='IMDB', ascending=False)
                            # Get top 3 recommendations
                            recommendations = sorted_data.head(3)[['Title', 'Type', 'Genre', 'IMDB']]
                            recommended_movies.append(recommendations)
                        else:
                            recommended_movies.append(f"No recommendations found for the tag '{item}'.")
                        break

                if not is_tag:
                    recommended_movies.append(f"Sorry, we couldn't find any movies or TV shows matching '{item}'.")

    return recommended_movies

# Ask user for input
user_input = input("Please provide a list of titles, genres, or examples of movies/TV shows you like (separated by comma): ")
input_data = [item.strip() for item in user_input.split(',')]

# Get recommendations
recommendations = recommend_movies_tvshows(input_data)

print("Recommendations based on your input:")
for recommendation in recommendations:
    print(recommendation)


Please provide a list of titles, genres, or examples of movies/TV shows you like (separated by comma): crime
Recommendations based on your input:
                          Title    Type                     Genre  IMDB
8616               Breaking Bad      tv            [Crime, Drama]   9.5
55760                  The Wire      tv  [Crime, Thriller, Drama]   9.3
54169  The Shawshank Redemption  movies            [Crime, Drama]   9.3
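Version 3 decides whether an input is a title, a genre or a tag by scanning rows until it finds a match. A sketch of the same title, then genre, then tag precedence using lookup sets built once (the function names are hypothetical; the toy catalog copies genres and some tags from the Stranger Things and Breaking Bad rows shown in this notebook, with Breaking Bad's tag list left empty for the example):

```python
def build_vocabulary(catalog):
    """Collect lower-cased titles, genres and tags once, so each lookup is a set membership test."""
    titles = {row["Title"].lower() for row in catalog}
    genres = {genre.lower() for row in catalog for genre in row["Genre"]}
    tags = {tag.lower() for row in catalog for tag in row["Tag"]}
    return titles, genres, tags


def classify_input(item, vocabulary):
    """Return 'title', 'genre', 'tag' or None, checking in the same order as Version 3."""
    titles, genres, tags = vocabulary
    key = item.strip().lower()
    if key in titles:
        return "title"
    if key in genres:
        return "genre"
    if key in tags:
        return "tag"
    return None


# Toy catalog built from rows shown in this notebook
catalog = [
    {"Title": "Stranger Things", "Genre": ["Drama", "Horror", "Fantasy"],
     "Tag": ["Monster", "Space", "Suspense"]},
    {"Title": "Breaking Bad", "Genre": ["Crime", "Drama"], "Tag": []},
]
vocabulary = build_vocabulary(catalog)
for query in ["stranger things", "Crime", "monster", "western"]:
    print(query, "->", classify_input(query, vocabulary))
```

Once the category is known, the matching rows can be filtered in a single pass, instead of first looping over the column to detect the category and then filtering it again.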


In [None]:
def recommend_movies_tvshows(input_data):
    recommended_movies = []

    for item in input_data:
        # Check if the input is a movie or TV show
        movie_row = cleaned_df[cleaned_df['Title'].str.lower() == item.lower()]

        if not movie_row.empty:
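            # If the title exists, get its genres
            genres = movie_row['Genre'].iloc[0]
            # Keep titles that share at least one genre, excluding the input title itself
            filtered_data = cleaned_df[cleaned_df['Genre'].apply(lambda x: any(genre in x for genre in genres)) &
                                       (cleaned_df['Title'] != movie_row['Title'].iloc[0])]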

            if len(filtered_data) > 0:
                # Sort filtered data based on IMDB rating
                sorted_data = filtered_data.sort_values(by='IMDB', ascending=False)
                # Get top 3 recommendations
                recommendations = sorted_data.head(3)[['Title', 'Genre', 'IMDB']]
                recommended_movies.append((item, genres, recommendations))
            else:
                recommended_movies.append((item, genres, f"No recommendations found for {item}."))

        else:
            # Check if the input is a genre
            is_genre = False
            for genre in cleaned_df['Genre']:
                if item.lower() in [g.lower() for g in genre]:
                    is_genre = True
                    # Filter movies based on the genre
                    filtered_data = cleaned_df[cleaned_df['Genre'].apply(lambda x: item.lower() in [g.lower() for g in x])]

                    if len(filtered_data) > 0:
                        # Sort filtered data based on IMDB rating
                        sorted_data = filtered_data.sort_values(by='IMDB', ascending=False)
                        # Get top 3 recommendations
                        recommendations = sorted_data.head(3)[['Title', 'Genre', 'IMDB']]
                        recommended_movies.append((item, item, recommendations))
                    else:
                        recommended_movies.append((item, item, f"No recommendations found for the genre '{item}'."))

                    break

            if not is_genre:
                # Check if the input is a tag
                is_tag = False
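                for tag in cleaned_df['Tag']:
                    # Tag values can be NaN or unparsed strings, so only inspect real lists
                    if isinstance(tag, list) and item.lower() in [str(t).lower() for t in tag]:
                        is_tag = True
                        # Filter movies based on the tag
                        filtered_data = cleaned_df[cleaned_df['Tag'].apply(
                            lambda x: isinstance(x, list) and item.lower() in [str(t).lower() for t in x])]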

                        if len(filtered_data) > 0:
                            # Sort filtered data based on IMDB rating
                            sorted_data = filtered_data.sort_values(by='IMDB', ascending=False)
                            # Get top 3 recommendations
                            recommendations = sorted_data.head(3)[['Title', 'Genre', 'IMDB', 'Tag']]
                            recommended_movies.append((item, item, recommendations))
                        else:
                            recommended_movies.append((item, item, f"No recommendations found for the tag '{item}'."))

                        break

                if not is_tag:
                    recommended_movies.append((item, item, f"Sorry, we couldn't find any movies or TV shows matching '{item}'."))

    return recommended_movies

# Ask user for input
user_input = input("Please provide a list of titles, genres, or examples of movies/TV shows you like (separated by comma): ")
input_data = [item.strip() for item in user_input.split(',')]

# Get recommendations
recommendations = recommend_movies_tvshows(input_data)

print("Recommendations based on your input:")
for item, category, recommendation in recommendations:
    print(f"Input: {item}, Category: {category}")
    # recommendation is either a message string or a DataFrame of the top 3 titles
    print(recommendation)


Please provide a list of titles, genres, or examples of movies/TV shows you like (separated by comma): Action & Adventure
Recommendations based on your input:
Input: Action & Adventure, Category: Action & Adventure
                  Title                                 Genre  IMDB
5773   Band of Brothers           [Drama, Action & Adventure]   9.4
19222   Game of Thrones  [Drama, Action & Adventure, Fantasy]   9.3
38475           Ramayan  [Drama, Action & Adventure, Fantasy]   9.2
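The final version prints a separate top-3 list for every input, while the target output is three recommendations overall. One way to merge them, assuming each input's candidates have already been reduced to `(title, IMDB)` pairs (the helper `merge_recommendations` is hypothetical), is to rank titles by how many inputs they matched and then by rating. The example reuses the "crime" and "stranger things" results shown earlier in this notebook:

```python
from collections import defaultdict


def merge_recommendations(per_input, exclude=(), k=3):
    """Merge per-input candidate lists into one top-k list (hypothetical helper).

    `per_input` maps each user input to a list of (title, imdb) pairs. Titles matched by
    more inputs rank higher; ties fall back to the IMDB rating. Titles in `exclude`
    (e.g. the ones the user typed in) are skipped.
    """
    excluded = {title.lower() for title in exclude}
    hits = defaultdict(int)
    rating = {}
    for candidates in per_input.values():
        # dict() collapses repeated titles within one input's list
        for title, imdb in dict(candidates).items():
            if title.lower() in excluded:
                continue
            hits[title] += 1
            rating[title] = imdb
    ranked = sorted(hits, key=lambda title: (hits[title], rating[title]), reverse=True)
    return ranked[:k]


# The top-3 results shown earlier for the inputs "crime" and "stranger things"
per_input = {
    "crime": [("Breaking Bad", 9.5), ("The Wire", 9.3), ("The Shawshank Redemption", 9.3)],
    "stranger things": [("The Curators of Dixon School", 9.6), ("Breaking Bad", 9.5),
                        ("The Last Drive-in With Joe Bob Briggs", 9.5)],
}
print(merge_recommendations(per_input, exclude=["stranger things"]))
```

"Breaking Bad" appears in both lists, so it ranks first even though "The Curators of Dixon School" has a higher IMDB rating; taking more than 3 candidates per input before merging would give this vote more room to matter.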
