## Movie Recommendation System - Introduction

This project is a basic content-based movie recommendation system. Instead of learning from users' ratings or viewing histories, it compares the movies themselves: each movie's overview, genres, keywords, cast, and crew are combined into one text of tags, turned into word-count vectors with scikit-learn's `CountVectorizer`, and compared using cosine similarity. Given a movie the user has watched, the system suggests the movies whose tags are most similar.

In [90]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
import ast
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from typing import List
import os

* I am using the [TMDB 5000 Movie Dataset](https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata) from Kaggle. The two CSV files are expected in a `datasets` folder under the current working directory.

In [91]:
dataset_path = os.path.join(os.getcwd(), "datasets")
credits = os.path.join(dataset_path, "tmdb_5000_credits.csv")
movies = os.path.join(dataset_path, "tmdb_5000_movies.csv")

* Let's explore these datasets: Credits and Movies

In [92]:
df_credits = pd.read_csv(credits)
df_movies = pd.read_csv(movies)

print(f"df_credits shape: {df_credits.shape}")
df_credits.head()

df_credits shape: (4803, 4)


Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [93]:
print(f"df_movies shape: {df_movies.shape}")
df_movies.head()

df_movies shape: (4803, 20)


Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


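* We can merge these two datasets on the "title" column. Because a few titles occur more than once, the merged frame has 4809 rows instead of the 4803 rows in each input.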
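
Merging on a title assumes every title is unique. The sketch below uses made-up toy frames (not the TMDB files; `credits_toy`, `movies_toy`, and their values are hypothetical) to show how a repeated title multiplies rows, and how joining on the ID columns would avoid that. It illustrates the pandas behaviour only and is not a change to the notebook's pipeline.

```python
import pandas as pd

# Toy stand-ins for the two CSV files; two different movies share one title.
credits_toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Movie A", "Movie A", "Movie B"],
    "cast": ["cast 1", "cast 2", "cast 3"],
})
movies_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Movie A", "Movie A", "Movie B"],
    "overview": ["text 1", "text 2", "text 3"],
})

# Joining on title pairs every "Movie A" row with every other "Movie A" row.
by_title = credits_toy.merge(movies_toy, on="title")

# Joining on the ID keeps exactly one row per movie.
by_id = credits_toy.merge(
    movies_toy.drop(columns="title"), left_on="movie_id", right_on="id"
)

print(len(by_title), len(by_id))
```

In this toy case the title join yields 5 rows for 3 movies (2 x 2 for the shared title, plus 1), while the ID join yields 3.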

In [94]:
data = df_credits.merge(df_movies, on="title")
data.shape
data.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
movie_id,4809.0,57120.57,88653.37,5.0,9012.0,14624.0,58595.0,459488.0
budget,4809.0,29027800.0,40704730.0,0.0,780000.0,15000000.0,40000000.0,380000000.0
id,4809.0,57120.57,88653.37,5.0,9012.0,14624.0,58595.0,459488.0
popularity,4809.0,21.49166,31.80337,0.0,4.66723,12.92159,28.35053,875.5813
revenue,4809.0,82275110.0,162837900.0,0.0,0.0,19170000.0,92913170.0,2787965000.0
runtime,4807.0,106.8823,22.60254,0.0,94.0,103.0,118.0,338.0
vote_average,4809.0,6.092514,1.193989,0.0,5.6,6.2,6.8,10.0
vote_count,4809.0,690.3317,1234.187,0.0,54.0,235.0,737.0,13752.0


In [95]:
class DataPreProcessConfig:
    """Select the metadata columns and parse their stringified lists into lists of names."""

    def __init__(self, data_df):
        # Copy the selection so later column assignments do not write into a view of data_df.
        self.data_df = data_df[
            [
                "movie_id",
                "title",
                "overview",
                "genres",
                "keywords",
                "cast",
                "crew"
            ]
        ].copy()

    def data_col_func(self, data_col):
        # Return every 'name' value from a stringified list of dicts.
        return [val['name'] for val in ast.literal_eval(data_col)]

    def data_col_func2(self, data_col):
        # Return only the first three 'name' values (not called by data_process_func).
        try:
            return [val['name'] for val in ast.literal_eval(data_col)[:3]]
        except Exception as e:
            print(f"Error in data_col_func2: {e}")
            return []

    def director_func(self, data_col):
        # Return the names of crew members whose job is 'Director' (not called by data_process_func).
        return [val['name'] for val in ast.literal_eval(data_col) if val['job'] == 'Director']

    def data_process_func(self):
        # Parse all four list-like columns with data_col_func, keeping every name.
        try:
            add_columns = ['genres', 'keywords', 'cast', 'crew']
            for col in add_columns:
                self.data_df[col] = self.data_df[col].apply(self.data_col_func)
        except Exception as e:
            print(f"Error in data_process_func: {e}")
            raise

        return self.data_df

In [96]:
data_preprocessing_obj = DataPreProcessConfig(data_df=data)
data2 = data_preprocessing_obj.data_process_func()
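
The `genres`, `keywords`, `cast`, and `crew` columns hold lists of dictionaries stored as strings, which is why the class parses them with `ast.literal_eval`. As a minimal standalone sketch, the functions below mirror the three helper styles in the class (all names, the first three names, directors only) on made-up sample strings. The function names `all_names`, `first_names`, and `directors`, and the sample values, are hypothetical and not part of the notebook.

```python
import ast

# Made-up strings in the same shape as the dataset's 'cast' and 'crew' cells.
sample_cast = (
    '[{"cast_id": 1, "name": "Actor A"}, {"cast_id": 2, "name": "Actor B"}, '
    '{"cast_id": 3, "name": "Actor C"}, {"cast_id": 4, "name": "Actor D"}]'
)
sample_crew = (
    '[{"job": "Editor", "name": "Person X"}, {"job": "Director", "name": "Person Y"}]'
)


def all_names(raw):
    """Every 'name' value in the stringified list."""
    return [item["name"] for item in ast.literal_eval(raw)]


def first_names(raw, limit=3):
    """Only the first `limit` names, e.g. the top-billed cast."""
    return [item["name"] for item in ast.literal_eval(raw)[:limit]]


def directors(raw):
    """Names of crew entries whose job is 'Director'."""
    return [item["name"] for item in ast.literal_eval(raw) if item.get("job") == "Director"]


print(all_names(sample_cast))
print(first_names(sample_cast))
print(directors(sample_crew))
```

If a cell contained JSON literals such as `null` or `true`, `ast.literal_eval` would raise an error; `json.loads` accepts those.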

In [97]:
data2.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weave...","[Stephen E. Rivkin, Rick Carter, Christopher B..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley, ...","[Dariusz Wolski, Gore Verbinski, Jerry Bruckhe..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux, R...","[Thomas Newman, Sam Mendes, Anna Pinnock, John..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman, A...","[Hans Zimmer, Charles Roven, Christopher Nolan..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton,...","[Andrew Stanton, Andrew Stanton, John Lasseter..."


In [98]:
data2['overview'] = data2['overview'].apply(lambda x: x.split() if isinstance(x,str) else [])

In [99]:
data2.head()

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weave...","[Stephen E. Rivkin, Rick Carter, Christopher B..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[ocean, drug abuse, exotic island, east india ...","[Johnny Depp, Orlando Bloom, Keira Knightley, ...","[Dariusz Wolski, Gore Verbinski, Jerry Bruckhe..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[spy, based on novel, secret agent, sequel, mi...","[Daniel Craig, Christoph Waltz, Léa Seydoux, R...","[Thomas Newman, Sam Mendes, Anna Pinnock, John..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[dc comics, crime fighter, terrorist, secret i...","[Christian Bale, Michael Caine, Gary Oldman, A...","[Hans Zimmer, Charles Roven, Christopher Nolan..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, Science Fiction]","[based on novel, mars, medallion, space travel...","[Taylor Kitsch, Lynn Collins, Samantha Morton,...","[Andrew Stanton, Andrew Stanton, John Lasseter..."


In [100]:
data2['tags_data'] = data2.apply(lambda r: r['overview']+r['genres']+r['keywords']+r['cast']+r['crew'], axis=1)

In [101]:
data2.drop(['overview','genres','keywords', 'cast','crew'], axis=1, inplace=True)


In [102]:
data2['tags_data'] = data2['tags_data'].apply(lambda x: ' '.join(x))
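
Joining the lists with spaces means a multi-word entry such as "Sam Worthington" or "Science Fiction" is later split into separate word tokens. One optional variation, which the notebook does not use, is to remove the spaces inside each entry before joining so that every name stays a single token. The helper name `collapse_spaces` and the shortened overview text below are assumptions for illustration.

```python
def collapse_spaces(names):
    """Turn each multi-word entry into one token, e.g. 'Sam Worthington' -> 'SamWorthington'."""
    return [name.replace(" ", "") for name in names]


# A shortened, made-up overview plus a few values in the style of the parsed columns.
overview = "A marine is sent to a distant moon".split()
genres = ["Science Fiction"]
cast = ["Sam Worthington", "Zoe Saldana"]

# The notebook's approach: plain concatenation, so names split on their spaces.
tags_plain = " ".join(overview + genres + cast)

# The variation: each genre and cast entry survives as a single token.
tags_collapsed = " ".join(overview + collapse_spaces(genres) + collapse_spaces(cast))

print(tags_plain)
print(tags_collapsed)
```

With the plain version, two actors who share only a first name contribute a common token; the collapsed version avoids that at the cost of less readable tags.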

In [103]:
cv = CountVectorizer(max_features=5000, stop_words='english')

In [104]:
vector_data = cv.fit_transform(data2['tags_data']).toarray()

In [105]:
vector_data.shape

(4809, 5000)

In [106]:
similarity = cosine_similarity(vector_data)

In [107]:
similarity

array([[1.        , 0.26846111, 0.33307784, ..., 0.08147231, 0.02723512,
        0.049093  ],
       [0.26846111, 1.        , 0.18911737, ..., 0.04845016, 0.01230915,
        0.01849001],
       [0.33307784, 0.18911737, 1.        , ..., 0.07696724, 0.11173795,
        0.02098069],
       ...,
       [0.08147231, 0.04845016, 0.07696724, ..., 1.        , 0.05367422,
        0.04031297],
       [0.02723512, 0.01230915, 0.11173795, ..., 0.05367422, 1.        ,
        0.03413944],
       [0.049093  , 0.01849001, 0.02098069, ..., 0.04031297, 0.03413944,
        1.        ]])
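
Each entry of the matrix above is the cosine of the angle between two movies' word-count vectors: the dot product of the vectors divided by the product of their lengths. The hand-rolled sketch below reproduces that calculation on three made-up tag strings using only the standard library. It is a simplification, not scikit-learn's implementation: `CountVectorizer` also removes stop words, applies its own tokenization, and caps the vocabulary at `max_features`. The names `bag_of_words` and `cosine` and the toy tags are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each lower-cased, whitespace-separated word occurs."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


toy_tags = {
    "movie a": "space war alien future",
    "movie b": "space alien rescue",
    "movie c": "romance paris wedding",
}
vectors = {title: bag_of_words(tags) for title, tags in toy_tags.items()}

sim_ab = cosine(vectors["movie a"], vectors["movie b"])  # two shared words
sim_ac = cosine(vectors["movie a"], vectors["movie c"])  # no shared words
print(round(sim_ab, 4), sim_ac)
```

Movies a and b share two of their words, giving 2 / (2 * sqrt(3)), roughly 0.577, while a and c share none and score 0. A movie compared with itself scores 1, which is why the diagonal of the matrix is all ones.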

In [123]:
data2[data2['title'] == 'Avatar']
data2['title'] = data2['title'].apply(lambda x: x.lower())

In [165]:
sample_movie = "2012"
try:
    # Index of the first movie whose lower-cased title matches.
    dd = data2[data2['title'] == sample_movie.lower()].index[0]
    # Rank every movie by its similarity to the selected one, highest first.
    distances = sorted(enumerate(similarity[dd]),
                       reverse=True,
                       key=lambda x: x[1])
    # The top entry is normally the movie itself, so take six and skip the first below.
    top_data = distances[0:6]
    suggest_movie_list = [data2.iloc[i[0]]['title'] for i in top_data]
    print(f"Because you watched: {sample_movie}. Here are movie suggestions\n")
    for i in suggest_movie_list[1:]:
        print(i)
except IndexError:
    print(f"Unable to locate movie: \"{sample_movie.title()}\". Please try again.")

Because you watched: 2012. Here are movie suggestions

jurassic world
independence day: resurgence
the day after tomorrow
15 minutes
batman begins
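
The lookup in the last cell can be wrapped in a reusable function. The sketch below is one possible shape, not the notebook's code: the function name `recommend`, its parameters, and the toy data are assumptions. It excludes the queried movie by index rather than assuming it sorts first, and it returns an empty list when the title is not found.

```python
from typing import List, Sequence


def recommend(
    title: str,
    titles: Sequence[str],
    similarity: Sequence[Sequence[float]],
    top_n: int = 5,
) -> List[str]:
    """Return the `top_n` titles most similar to `title`, or [] if it is unknown."""
    lowered = [t.lower() for t in titles]
    try:
        idx = lowered.index(title.lower())
    except ValueError:
        return []
    # Rank every other movie by its similarity to the queried one, highest first.
    ranked = sorted(
        (i for i in range(len(titles)) if i != idx),
        key=lambda i: similarity[idx][i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:top_n]]


# Toy data standing in for data2['title'] and the similarity matrix.
toy_titles = ["movie a", "movie b", "movie c"]
toy_similarity = [
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]

print(recommend("Movie A", toy_titles, toy_similarity, top_n=2))
print(recommend("unknown film", toy_titles, toy_similarity))
```

Because the function only indexes `similarity[idx][i]`, it should work with a NumPy array as well as nested lists; with the notebook's objects the call would look like `recommend("2012", list(data2['title']), similarity)`.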
