# **Movie Recommendation System** 

## **Project Overview**
- This project builds a content-based movie recommendation system using NLP techniques and cosine similarity.  
- It processes movie metadata, extracts key features, and generates recommendations based on textual similarity.

### 1. Import Required Libraries

In [1]:
import numpy as np
import pandas as pd
import ast  # To parse stringified lists
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import joblib  # For saving the model

In [2]:
# Display all columns for better visibility
pd.set_option("display.max_columns", None)

### 2. Load Datasets
- 'tmdb_5000_credits.csv' contains the movie ID, title, cast, and crew details.
- 'tmdb_5000_movies.csv' contains metadata such as genres, keywords, and overviews.

In [3]:
credits=pd.read_csv("C:/Users/User/OneDrive/Desktop/movie without similarity on github/tmdb_5000_credits.csv")

In [4]:
movies = pd.read_csv("C:/Users/User/OneDrive/Desktop/movie without similarity on github/tmdb_5000_movies.csv")
movies.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


In [5]:
# Display basic information

print(f"Movies Dataset Shape: {movies.shape}")
print(f"Credits Dataset Shape: {credits.shape}")

Movies Dataset Shape: (4803, 20)
Credits Dataset Shape: (4803, 4)



### 3. Merge Datasets on Movie ID

In [6]:
df_movie=movies.merge(credits,left_on="id",right_on="movie_id")
df_movie.head(1)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title_x,vote_average,vote_count,movie_id,title_y,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
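
The `title_x` and `title_y` columns in the merged output come from pandas' default suffixes: both source files carry a `title` column, so `merge` renames them to avoid a clash. A minimal sketch on toy frames (the names `movies_toy` and `credits_toy` and their values are illustrative assumptions, not the real data):

```python
import pandas as pd

# Toy stand-ins for the two CSV files (values are illustrative only)
movies_toy = pd.DataFrame({"id": [1, 2], "title": ["A", "B"], "genres": ["[]", "[]"]})
credits_toy = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"], "cast": ["[]", "[]"]})

# Both frames have a 'title' column, so pandas disambiguates with _x / _y suffixes
merged = movies_toy.merge(credits_toy, left_on="id", right_on="movie_id")
print(list(merged.columns))
# ['id', 'title_x', 'genres', 'movie_id', 'title_y', 'cast']
```

Because the join keys have different names (`id` and `movie_id`), both key columns survive the merge, which is why `movie_id` and `title_y` are dropped in the next step.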


### 4. Selecting Relevant Features

In [7]:
# List of columns to drop
columns_to_drop = [
    'budget', 'homepage', 'original_language', 'original_title', 'popularity', 
    'production_companies', 'production_countries', 'release_date', 'revenue', 
    'runtime', 'spoken_languages', 'status', 'tagline', 'vote_average', 'vote_count', 
    'title_y', 'movie_id' 
]

# Dropping columns to save memory
selected_movie=df_movie.drop(columns=columns_to_drop)

# Displaying the modified DataFrame
selected_movie.head()


Unnamed: 0,genres,id,keywords,overview,title_x,cast,crew
0,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...",Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca...",John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


### 5. Checking Duplicated Values

In [8]:
print("Duplicated Values Before Cleanup:", selected_movie.duplicated().sum())

Duplicated Values Before Cleanup: 0


### 5. Checking Missing Values

In [9]:
print("Missing Values Before Cleanup:\n", selected_movie.isnull().sum())

Missing Values Before Cleanup:
 genres      0
id          0
keywords    0
overview    3
title_x     0
cast        0
crew        0
dtype: int64


In [10]:
selected_movie

Unnamed: 0,genres,id,keywords,overview,title_x,cast,crew
0,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...",Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca...",John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."
...,...,...,...,...,...,...,...
4798,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",9367,"[{""id"": 5616, ""name"": ""united states\u2013mexi...",El Mariachi just wants to play his guitar and ...,El Mariachi,"[{""cast_id"": 1, ""character"": ""El Mariachi"", ""c...","[{""credit_id"": ""52fe44eec3a36847f80b280b"", ""de..."
4799,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",72766,[],A newlywed couple's honeymoon is upended by th...,Newlyweds,"[{""cast_id"": 1, ""character"": ""Buzzy"", ""credit_...","[{""credit_id"": ""52fe487dc3a368484e0fb013"", ""de..."
4800,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",231617,"[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam...","""Signed, Sealed, Delivered"" introduces a dedic...","Signed, Sealed, Delivered","[{""cast_id"": 8, ""character"": ""Oliver O\u2019To...","[{""credit_id"": ""52fe4df3c3a36847f8275ecf"", ""de..."
4801,[],126186,[],When ambitious New York attorney Sam is sent t...,Shanghai Calling,"[{""cast_id"": 3, ""character"": ""Sam"", ""credit_id...","[{""credit_id"": ""52fe4ad9c3a368484e16a36b"", ""de..."


### 6. Preprocessing Text Data
- Convert JSON-like string columns into lists of meaningful words.


In [11]:
def string_from_dict(text):
    return [i['name'].lower().replace(" ", "_") for i in ast.literal_eval(text)]


In [12]:
# Apply function to genres and keywords
selected_movie['genres']=selected_movie['genres'].apply(string_from_dict)
selected_movie['keywords']=selected_movie['keywords'].apply(string_from_dict)
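
To make the parsing step concrete, here is a small standalone sketch of the same idea applied to one hand-written string. The helper name `names_from_json_like` and the sample string are assumptions for illustration; the real columns hold longer lists in the same shape.

```python
import ast

def names_from_json_like(text):
    """Parse a stringified list of dicts and return normalized 'name' values."""
    return [item["name"].lower().replace(" ", "_") for item in ast.literal_eval(text)]

sample = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'
print(names_from_json_like(sample))  # ['action', 'science_fiction']
print(names_from_json_like("[]"))    # []
```

Replacing spaces with underscores keeps multi-word names as a single token later on. Note that `ast.literal_eval` only understands Python literals, so a field containing JSON-only values such as `null` would raise an error; `json.loads` is the usual alternative in that case.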

In [13]:
# Extract top 3 cast members
def top_cast(text):
    """Extracts the names of the first three cast members from the cast list."""
    return [i['name'].lower().replace(" ", "_") for i in ast.literal_eval(text)[:3]]

selected_movie['cast'] = selected_movie['cast'].apply(top_cast)

In [14]:
# Extract director's name
def extract_director(text):
    """Extracts the director's name from the crew list, or "unknown" if none is listed."""
    return next((i['name'].lower().replace(" ", "_") for i in ast.literal_eval(text) if i['job'] == "Director"), "unknown")

selected_movie['crew'] = selected_movie['crew'].fillna("[]")  # Replace NaN with an empty list in string form to avoid errors
selected_movie['crew'] = selected_movie['crew'].apply(extract_director)  # Apply the function safely
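
The director lookup relies on `next(generator, default)`, which returns the first match or the fallback value when nothing matches. A minimal sketch on hand-written input (the function name `first_director` and the sample crew entries are made up for illustration):

```python
import ast

def first_director(crew_text):
    """Return the first crew member whose job is 'Director', or 'unknown'."""
    crew = ast.literal_eval(crew_text)
    return next(
        (m["name"].lower().replace(" ", "_") for m in crew if m.get("job") == "Director"),
        "unknown",  # fallback when the generator yields nothing
    )

crew_sample = '[{"job": "Editor", "name": "Jane Doe"}, {"job": "Director", "name": "John Smith"}]'
print(first_director(crew_sample))  # john_smith
print(first_director("[]"))         # unknown
```

Using `m.get("job")` instead of `m["job"]` is a defensive choice in this sketch so that an entry without a `job` key is skipped rather than raising `KeyError`.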


In [15]:
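# Drop rows without an overview; reset the index so row labels match row positions in the similarity matrix
selected_movie = selected_movie.dropna(subset=['overview']).reset_index(drop=True)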

In [16]:
print(selected_movie.isnull().sum())

genres      0
id          0
keywords    0
overview    0
title_x     0
cast        0
crew        0
dtype: int64


### 7. Feature Engineering
- Create a new 'tags' column by combining all relevant textual features.

In [17]:
selected_movie['tags'] = selected_movie.apply(lambda x: ' '.join(
    [word.replace(" ", "_") for word in (x['genres'] + x['keywords'] + x['cast'] + [x['crew']])] + 
    x['overview'].lower().split()  # Keep words separate
), axis=1)



In [18]:
selected_movie.head(5)

Unnamed: 0,genres,id,keywords,overview,title_x,cast,crew,tags
0,"[action, adventure, fantasy, science_fiction]",19995,"[culture_clash, future, space_war, space_colon...","In the 22nd century, a paraplegic Marine is di...",Avatar,"[sam_worthington, zoe_saldana, sigourney_weaver]",james_cameron,action adventure fantasy science_fiction cultu...
1,"[adventure, fantasy, action]",285,"[ocean, drug_abuse, exotic_island, east_india_...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,"[johnny_depp, orlando_bloom, keira_knightley]",gore_verbinski,adventure fantasy action ocean drug_abuse exot...
2,"[action, adventure, crime]",206647,"[spy, based_on_novel, secret_agent, sequel, mi...",A cryptic message from Bond’s past sends him o...,Spectre,"[daniel_craig, christoph_waltz, léa_seydoux]",sam_mendes,action adventure crime spy based_on_novel secr...
3,"[action, crime, drama, thriller]",49026,"[dc_comics, crime_fighter, terrorist, secret_i...",Following the death of District Attorney Harve...,The Dark Knight Rises,"[christian_bale, michael_caine, gary_oldman]",christopher_nolan,action crime drama thriller dc_comics crime_fi...
4,"[action, adventure, science_fiction]",49529,"[based_on_novel, mars, medallion, space_travel...","John Carter is a war-weary, former military ca...",John Carter,"[taylor_kitsch, lynn_collins, samantha_morton]",andrew_stanton,action adventure science_fiction based_on_nove...


In [19]:
selected_movie['tags'][0]

'action adventure fantasy science_fiction culture_clash future space_war space_colony society space_travel futuristic romance space alien tribe alien_planet cgi marine soldier battle love_affair anti_war power_relations mind_and_soul 3d sam_worthington zoe_saldana sigourney_weaver james_cameron in the 22nd century, a paraplegic marine is dispatched to the moon pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.'

In [20]:
# Keep only necessary columns for the model
filtered_movie=selected_movie[['id','title_x','tags']].copy()

In [21]:
filtered_movie.head(5)

Unnamed: 0,id,title_x,tags
0,19995,Avatar,action adventure fantasy science_fiction cultu...
1,285,Pirates of the Caribbean: At World's End,adventure fantasy action ocean drug_abuse exot...
2,206647,Spectre,action adventure crime spy based_on_novel secr...
3,49026,The Dark Knight Rises,action crime drama thriller dc_comics crime_fi...
4,49529,John Carter,action adventure science_fiction based_on_nove...


### 8. Text Vectorization Using CountVectorizer
- Convert the 'tags' column into a numerical feature vector using Bag-of-Words.

In [22]:
# Initialize CountVectorizer
cv = CountVectorizer(max_features=5000, stop_words='english')

# Convert tags into feature vectors
vectors = cv.fit_transform(filtered_movie['tags']).toarray()

In [23]:
# Display vocabulary size
print(f"Vectorized Features Shape: {vectors.shape}")

Vectorized Features Shape: (4800, 5000)
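
Conceptually, Bag-of-Words builds a vocabulary from all documents and then counts how often each vocabulary term occurs in each document. The sketch below hand-rolls that idea with `collections.Counter` on two made-up tag strings; it is an illustration of the concept, not of `CountVectorizer` internals, which additionally handle tokenization rules, stop words, and the `max_features` cap.

```python
from collections import Counter

docs = [
    "action science_fiction space alien",
    "action crime thriller",
]

# Vocabulary: every distinct token, sorted so the column order is deterministic
vocab = sorted({token for doc in docs for token in doc.split()})

# One count vector per document, aligned to the vocabulary
vectors_toy = [[Counter(doc.split())[term] for term in vocab] for doc in docs]

print(vocab)        # ['action', 'alien', 'crime', 'science_fiction', 'space', 'thriller']
print(vectors_toy)  # [[1, 1, 0, 1, 1, 0], [1, 0, 1, 0, 0, 1]]
```

Each row plays the role of one row of `vectors`; in the notebook there are 4800 such rows with 5000 columns each.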


### 9. Compute Cosine Similarity
- Compute similarity scores between movies based on their text features.

In [24]:
similarity = cosine_similarity(vectors)

In [25]:
print(f"Similarity Matrix Shape: {similarity.shape}")

Similarity Matrix Shape: (4800, 4800)
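
Cosine similarity between two count vectors is their dot product divided by the product of their lengths, so it depends on the direction of the vectors rather than their magnitude. A plain-Python sketch for a single pair (the helper `cosine` and the toy vectors are assumptions for illustration; returning 0.0 for an all-zero vector is a convention chosen here):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)

a = [1, 1, 0, 1]
b = [1, 0, 1, 1]
print(round(cosine(a, b), 4))  # 0.6667
```

Here the vectors share two of their three non-zero terms, giving 2 / (√3 · √3) = 2/3. `cosine_similarity(vectors)` computes this value for every pair of rows at once, which is why the result is a square matrix.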


### 10. Build Recommendation Function
- Function to recommend the top 5 most similar movies based on user input.

In [26]:
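
def recommend(movie, top_n=5):
    """Returns the top_n most similar movie titles based on similarity scores."""
    # Look up the row position (not the index label) so it lines up with the similarity matrix
    matches = np.flatnonzero(filtered_movie['title_x'].to_numpy() == movie)
    if len(matches) == 0:
        raise ValueError(f"Movie not found: {movie!r}")
    index = matches[0]
    # Sort by similarity, highest first, and skip the top entry (the movie itself)
    distances = sorted(enumerate(similarity[index]), reverse=True, key=lambda x: x[1])[1:top_n + 1]
    print(distances)
    # Collect every recommended title instead of returning inside the loop
    return [filtered_movie.iloc[i].title_x for i, _ in distances]

In [27]:
recommend("Avatar")[0]  # Most similar title; the returned list holds all five

[(539, 0.26089696604360174), (1191, 0.2581988897471611), (507, 0.25302403842552984), (260, 0.25110592822973776), (1213, 0.2480694691784169)]


'Titan A.E.'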
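
The ranking step can also be written with `numpy.argsort`, which avoids building a Python list of tuples and lets the query movie be excluded explicitly instead of assuming it always ranks first. A minimal sketch, where `top_k_similar` and the toy scores are hypothetical names and values:

```python
import numpy as np

def top_k_similar(sim_row, query_idx, k=5):
    """Return positions of the k highest scores in sim_row, excluding query_idx."""
    order = np.argsort(sim_row)[::-1]   # positions sorted by descending score
    order = order[order != query_idx]   # drop the query movie explicitly
    return order[:k].tolist()

toy_row = np.array([1.0, 0.2, 0.9, 0.4])
print(top_k_similar(toy_row, query_idx=0, k=2))  # [2, 3]
```

The returned positions can be passed to `.iloc` on the title table to look up names. Excluding by position matters when two rows have identical tags, since the query is then not guaranteed to be the single top hit.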

### 12. Save Model for Deployment

In [28]:
joblib.dump(filtered_movie, "movie.joblib")

['movie.joblib']

In [29]:
!pip install gdown




In [30]:
import gdown

file_url = "https://drive.google.com/file/d/1nijyuSOhetFdEGqXU0nMSs1ZMOBnkYKd/view?usp=sharing"
gdown.download(file_url, "similarity.joblib", quiet=False, fuzzy=True)

Downloading...
From (original): https://drive.google.com/uc?id=1nijyuSOhetFdEGqXU0nMSs1ZMOBnkYKd
From (redirected): https://drive.google.com/uc?id=1nijyuSOhetFdEGqXU0nMSs1ZMOBnkYKd&confirm=t&uuid=90c0ceab-46e0-456e-8f55-09af93038769
To: c:\Users\User\OneDrive\Desktop\movie without similarity on github\similarity.joblib
100%|██████████| 184M/184M [00:54<00:00, 3.37MB/s] 


'similarity.joblib'
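
The 184M download is consistent with a dense 4800 × 4800 matrix stored as 64-bit floats. This is an inference from the shapes printed earlier, not something the notebook states, and it assumes the file holds the uncompressed `similarity` array:

```python
n_movies = 4800        # rows and columns of the similarity matrix reported earlier
bytes_per_float64 = 8  # assumption: the matrix is stored as float64

size_bytes = n_movies * n_movies * bytes_per_float64
print(size_bytes)                  # 184320000
print(round(size_bytes / 1e6, 1))  # 184.3 (megabytes)
```

Under the same assumption, casting the matrix to 32-bit floats before saving would roughly halve the file size, at the cost of some precision in the scores.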