<a id="1"></a>
# <div style="padding:20px;color:white;margin:0;font-size:30px;font-family:Georgia;text-align:center;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>🎬 Movie Recommendation System 🎬</b></div>

![image.png](attachment:7fd3808a-766f-4187-a075-a3e6c30ec82f.png)

<a id="1"></a>
# <div style="padding:20px;color:white;margin:0;font-size:35px;font-family:Verdana;text-align:center;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b> 🎥 Introduction 🎥</b></div>

<div style="border-radius:10px;border:black solid;padding:15px;background-color:white;font-size:110%;text-align:left">
<div style="font-family:Georgia;background-color:'#DEB887';padding:30px;font-size:17px">

<h3 align="left"><font color=purple>🎬 Introduction to TMDB (The Movie Database):</font></h3><br>

The Movie Database (TMDB) is an online database that provides information about movies, TV shows, and other entertainment content. It offers a vast collection of metadata, including movie titles, overviews, genres, keywords, cast details, and crew information. This rich dataset allows developers and data scientists to build various applications, including movie recommendation systems.

<h3 align="left"><font color=purple>🎥 Movie Recommendation System:</font></h3><br>

A movie recommendation system is a software application designed to suggest movies to users based on their preferences, viewing history, and other related data. It helps users discover new movies that align with their interests, leading to a more personalized and engaging entertainment experience.

<h3 align="left"><font color=purple>🔍 Content-Based Filtering:</font></h3><br>

Content-based filtering is a recommendation technique that suggests items (movies, in this case) based on the features and attributes of items a user has already shown interest in. For movies, it uses features such as genres, keywords, cast, and crew details to recommend similar titles.

<h3 align="left"><font color=purple>🤝 Collaborative Filtering:</font></h3><br>

Collaborative filtering is another popular recommendation technique that relies on user-item interaction data. Unlike content-based filtering, it doesn't require explicit feature extraction or item descriptions. Instead, it analyzes the behavior and preferences of many users to make recommendations.

<h3 align="left"><font color=purple>📊 Vectorization:</font></h3><br>

To build a content-based movie recommendation system, the first step involves representing movies and their features in vectorized form. Vectorization is the process of converting textual or categorical data into numerical vectors, making it easier to perform mathematical operations and computations.

<h3 align="left"><font color=purple>📝 Introduction to the Notebook (Movie Recommendation System):</font></h3><br>

This notebook implements a content-based filtering approach using the TMDB dataset. The dataset contains information about each movie, such as its title, overview, genres, keywords, cast, and crew. Using this data, we build a recommendation system that, given a movie title, suggests the movies most similar to it.

Throughout the notebook, we vectorize the movie features, measure how similar two movies are with cosine similarity, and build a recommendation function on top of those scores. By the end, we will have a working recommender that returns the five most similar movies for a given title. Diagrams illustrate the system's architecture and key components.
</div>
</div>
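
As a rough illustration of the vectorization step described above, the sketch below builds bag-of-words count vectors in plain Python. The tag strings and the helper names `build_vocabulary` and `vectorize` are hypothetical and only stand in for what a library vectorizer does internally.

```python
from collections import Counter

def build_vocabulary(documents):
    """Return a sorted list of the distinct tokens across all documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})

def vectorize(document, vocabulary):
    """Count how often each vocabulary token occurs in one document."""
    counts = Counter(document.lower().split())
    return [counts[token] for token in vocabulary]

# Hypothetical tag strings, not taken from the TMDB data
tags = [
    "action space future action",
    "space colony drama",
]
vocabulary = build_vocabulary(tags)
vectors = [vectorize(doc, vocabulary) for doc in tags]

print(vocabulary)  # ['action', 'colony', 'drama', 'future', 'space']
print(vectors)     # [[2, 0, 0, 1, 1], [0, 1, 1, 0, 1]]
```

Each movie becomes one row of counts over a shared vocabulary, which is what makes numerical comparison between movies possible.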


![image.png](attachment:dd218fe9-08ae-4599-b9ec-10e511f7fbf4.png)

![image.png](attachment:a77ea394-e98e-4419-beaa-67b8ffe884e0.png)

<div style="border-radius:10px;border:black solid;padding:15px;background-color:white;font-size:110%;text-align:left;">
<div style="font-family:Georgia;background-color:#254E58;padding:30px;font-size:17px;color:white;">

<h3 align="left"><font color="white">🤔 Question: Why compare movie vectors with cosine similarity instead of Euclidean distance in a content-based movie recommendation system?</font></h3><br>
<h4 align="left"><font color="white">🎯 Answer Points:</font></h4>

<ul>
<li><font color="white">📐 Euclidean distance becomes less informative in high-dimensional spaces (the "curse of dimensionality"): distances between points tend to look alike.</font></li>
<li><font color="white">🌌 Vectorization comes first either way: it turns each movie's tags into a numerical vector so that a similarity measure can be computed at all.</font></li>
<li><font color="white">💡 Content-based filtering relies on movie attributes like genres, keywords, cast, and crew, which word-count vectors represent compactly.</font></li>
<li><font color="white">🔍 Euclidean distance is sensitive to vector length, so a movie with a long overview can look far from a similar movie with a short one.</font></li>
<li><font color="white">🎥 Cosine similarity compares only the direction of two vectors, i.e. the relative mix of words, which makes it a better fit for ranking movies by shared content.</font></li>
</ul>
</div>
</div>
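
To make the contrast between the two measures concrete, here is a minimal sketch in plain Python. The three count vectors and the helper names `euclidean_distance` and `cosine_sim` are made up for illustration.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_sim(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word-count vectors over the vocabulary [space, war, love]
short_scifi = [1, 1, 0]
long_scifi = [10, 10, 0]   # same word proportions, ten times as many words
short_romance = [0, 1, 1]

print(euclidean_distance(short_scifi, long_scifi))     # about 12.73
print(euclidean_distance(short_scifi, short_romance))  # about 1.41
print(cosine_sim(short_scifi, long_scifi))             # 1.0 up to rounding
print(cosine_sim(short_scifi, short_romance))          # 0.5 up to rounding
```

By Euclidean distance the short romance looks closer to the short sci-fi film than the long sci-fi film does, purely because of text length. Cosine similarity ranks the two sci-fi films as most alike.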


<a id="1"></a>
# <div style="padding:20px;color:white;margin:0;font-size:30px;font-family:Georgia;text-align:center;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>🐍 Import Dependencies 🐍</b></div>

In [107]:
import pandas as pd
import numpy as np

In [108]:
movies = pd.read_csv("/kaggle/input/tmdb-movie-metadata/tmdb_5000_movies.csv")
credits = pd.read_csv("/kaggle/input/tmdb-movie-metadata/tmdb_5000_credits.csv")

In [109]:
# Preview the raw movies table (df is not defined until the merge in the next cell)
movies.head(1)


In [110]:
df = movies.merge(credits, on="title")
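
One thing to be aware of when merging on `title`: if two different films share a title, every matching row on the left is paired with every matching row on the right. The sketch below shows the effect on made-up miniature tables (`movies_toy`, `credits_toy`, and their ids and crew values are hypothetical) and compares it with a merge on the id columns.

```python
import pandas as pd

# Hypothetical miniature tables; two different films share the title "Batman"
movies_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Batman", "Batman", "Batman Begins"],
})
credits_toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Batman Begins"],
    "crew": ["director_a", "director_b", "director_c"],
})

# Merging on title pairs each "Batman" row with both "Batman" credits rows
by_title = movies_toy.merge(credits_toy, on="title")

# Merging on the id columns keeps one row per film
by_id = movies_toy.merge(
    credits_toy.drop(columns="title"), left_on="id", right_on="movie_id"
)

print(len(by_title))  # 5
print(len(by_id))     # 3
```

Whether this matters for the real dataset depends on how many titles repeat. Comparing `len(df)` with `len(movies)` after the merge is a quick way to check.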

In [111]:
df.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [112]:
df=df[["movie_id","title","overview","genres","keywords","cast","crew"]]
df.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [113]:
df.isnull().sum()

movie_id    0
title       0
overview    3
genres      0
keywords    0
cast        0
crew        0
dtype: int64

In [114]:
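# Drop the rows with a missing overview and renumber the index so that
# row labels keep matching row positions in later steps
df = df.dropna().reset_index(drop=True)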

In [115]:
df.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [116]:
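# Extract the "name" field from each entry of a stringified list of dicts
# (used for the genres and keywords columns)
import ast

def convert(obj):
    # ast.literal_eval safely parses the string into a Python list of dicts
    return [entry["name"] for entry in ast.literal_eval(obj)]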
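
The raw columns store lists of dictionaries as plain strings, which is why they need parsing first. A small sketch of what `ast.literal_eval` does with such a string, using a shortened sample in the same shape as the `genres` column:

```python
import ast

# Shortened sample in the same shape as the raw genres column
raw_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = ast.literal_eval(raw_genres)        # now a real list of dicts
names = [entry["name"] for entry in parsed]  # keep only the names

print(type(raw_genres).__name__)  # str
print(type(parsed).__name__)      # list
print(names)                      # ['Action', 'Adventure']
```

Unlike `eval`, `ast.literal_eval` only accepts Python literals, so it will not execute arbitrary code found in the data.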

In [117]:
df["genres"] = df["genres"].apply(convert)

In [118]:
df.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [119]:
df["keywords"] = df["keywords"].apply(convert)

In [120]:
# Keep only the names of the first three cast members listed
def convert2(obj):
    return [entry["name"] for entry in ast.literal_eval(obj)[:3]]


In [121]:
df["cast"] = df["cast"].apply(convert2)

In [122]:
df.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."


In [123]:
# Extract the director's name from the crew list (first match only)
def fetch(obj):
    for entry in ast.literal_eval(obj):
        if entry["job"] == "Director":
            return [entry["name"]]
    return []  # no director listed

In [124]:
df["crew"] = df["crew"].apply(fetch)

In [125]:
df.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[culture clash, future, space war, space colon...","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron]


In [126]:
# Split each overview into a list of words so it can be concatenated with the other list columns
df["overview"] = df["overview"].apply(lambda x: x.split())

In [127]:
# Remove spaces inside multi-word values so each one becomes a single token
# (e.g. "Sam Worthington" -> "SamWorthington")
df["cast"] = df["cast"].apply(lambda x:[i.replace(" ","") for i in x])
df["crew"] = df["crew"].apply(lambda x:[i.replace(" ","") for i in x])
df["keywords"] = df["keywords"].apply(lambda x:[i.replace(" ","") for i in x])
df["genres"] = df["genres"].apply(lambda x:[i.replace(" ","") for i in x])
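
The reason for gluing names together is that the tags are later split on whitespace. Without this step, two unrelated people who share a first name would contribute a common token. A minimal sketch with hypothetical name lists and a hypothetical helper `tokens`:

```python
# Hypothetical name lists for two unrelated films
names_a = ["Sam Worthington", "Zoe Saldana"]
names_b = ["Sam Mendes"]

def tokens(names, join_words):
    """Split names into tokens, optionally gluing each name into one token first."""
    if join_words:
        names = [name.replace(" ", "") for name in names]
    return set(" ".join(names).lower().split())

print(tokens(names_a, False) & tokens(names_b, False))  # {'sam'}
print(tokens(names_a, True) & tokens(names_b, True))    # set()
```

With spaces removed, the two films no longer look similar just because of a shared first name.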

In [128]:
df.head(1)

Unnamed: 0,movie_id,title,overview,genres,keywords,cast,crew
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[cultureclash, future, spacewar, spacecolony, ...","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron]


In [129]:
df["tags"] = df["overview"] + df["genres"] + df["keywords"] + df["cast"] + df["crew"]

In [130]:
# Keep only the columns needed for recommendations; .copy() makes new_df
# independent of df, so the assignments below do not trigger SettingWithCopyWarning
new_df = df[["movie_id", "title", "tags"]].copy()

In [131]:
# Join the token lists into one lowercase string per movie
new_df["tags"] = new_df["tags"].apply(lambda x: " ".join(x).lower())

In [132]:
new_df.head(1)

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a paraplegic marine is di..."


In [133]:
# Porter stemmer: reduces inflected forms of a word to a common stem,
# so that variants of the same word count as one token in the tags feature
import nltk
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

In [134]:
def stem(text):
    # Stem each whitespace-separated token and rejoin into a single string
    return " ".join(ps.stem(word) for word in text.split())

In [135]:
new_df["tags"] = new_df["tags"].apply(stem)


In [136]:
# Vectorization: Creating each movie as a Vector
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000, stop_words="english")

In [137]:
vector = cv.fit_transform(new_df["tags"]).toarray()

In [138]:
# Compute the pairwise cosine similarity between all movie vectors
from sklearn.metrics.pairwise import cosine_similarity
similar = cosine_similarity(vector)

In [139]:
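# Print the five movies most similar to the given title
def recommend(movie):
    # Positional row index of the movie; `similar` is indexed by position, not by DataFrame label
    movie_index = np.flatnonzero((new_df["title"] == movie).to_numpy())[0]
    distances = similar[movie_index]
    # Sort by similarity, highest first; skip the first entry, which is normally the movie itself
    movie_list = sorted(enumerate(distances), reverse=True, key=lambda x: x[1])[1:6]

    for i in movie_list:
        print(new_df.iloc[i[0]].title)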
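
The ranking step inside `recommend` can be looked at in isolation. The sketch below is a hypothetical variant, `top_k_similar`, that excludes the query movie by its index rather than by dropping the first sorted entry. This may be slightly more robust when another row has identical tags and ties with the query at a similarity of 1.0. The scores are made up.

```python
def top_k_similar(similarity_row, query_index, k=5):
    """Return the positions of the k highest-scoring items, excluding the query itself."""
    ranked = sorted(enumerate(similarity_row), key=lambda pair: pair[1], reverse=True)
    return [index for index, _ in ranked if index != query_index][:k]

# Hypothetical similarity scores of movie 0 against four movies (itself included)
row = [1.0, 0.2, 0.9, 0.5]
print(top_k_similar(row, query_index=0, k=2))  # [2, 3]
```

Returning positions instead of printing titles also makes the result easier to reuse, for example to look up `movie_id` values.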

<a id="1"></a>
# <div style="padding:20px;color:white;margin:0;font-size:30px;font-family:Georgia;text-align:center;display:fill;border-radius:5px;background-color:#254E58;overflow:hidden"><b>✨ Results ✨</b></div>

In [140]:
recommend("Batman Begins")

The Dark Knight
Batman
Batman
The Dark Knight Rises
10th & Wolf


In [141]:
recommend("The Avengers")

Iron Man 3
Avengers: Age of Ultron
Captain America: Civil War
Captain America: The First Avenger
Iron Man


In [142]:
recommend("Pirates of the Caribbean: At World's End")

Pirates of the Caribbean: Dead Man's Chest
Pirates of the Caribbean: The Curse of the Black Pearl
Pirates of the Caribbean: On Stranger Tides
Life of Pi
20,000 Leagues Under the Sea


In [143]:
import pickle

In [144]:
# Save the movie table as a plain dict; the with-block closes the file afterwards
with open("movies.pkl", "wb") as f:
    pickle.dump(new_df.to_dict(), f)

In [145]:
# Save the similarity matrix; the with-block closes the file afterwards
with open("similar.pkl", "wb") as f:
    pickle.dump(similar, f)