## 1. Problem Definition & Objective

With the rapid growth of online movie platforms, users are often overwhelmed by the large number of movie choices available. Finding movies that match a user's taste and preferences becomes difficult without intelligent assistance.

The objective of this project is to build a Movie Recommendation System that suggests relevant movies based on content similarity. The system analyzes movie metadata such as genres and descriptions to recommend movies similar to a selected title.

This project falls under the Recommendation Systems track and demonstrates the application of machine learning techniques to solve a real-world problem.

## 2. Data Understanding & Preparation

The dataset used in this project is loaded from `tmdb_5000_movies.csv` and contains metadata about movies, including attributes such as title, genres, and overview. This data is used to understand the content of each movie and compute similarity between them.

Before building the recommendation model, the dataset is reduced to the title, overview, and genres columns, rows with missing values are dropped, and the overview and genres are combined into a single text feature.



In [41]:

import pandas as pd

# Load the dataset
movies = pd.read_csv("tmdb_5000_movies.csv")

# Preview the first five rows
movies.head()

| | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | [{"iso_3166_1": "US", "name": "United States o... | 2009-12-10 | 2787965087 | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 |
| 1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2007-05-19 | 961000000 | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 |
| 2 | 245000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.sonypictures.com/movies/spectre/ | 206647 | [{"id": 470, "name": "spy"}, {"id": 818, "name... | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | [{"name": "Columbia Pictures", "id": 5}, {"nam... | [{"iso_3166_1": "GB", "name": "United Kingdom"... | 2015-10-26 | 880674609 | 148.0 | [{"iso_639_1": "fr", "name": "Fran\u00e7ais"},... | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 |
| 3 | 250000000 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | http://www.thedarkknightrises.com/ | 49026 | [{"id": 849, "name": "dc comics"}, {"id": 853,... | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | [{"name": "Legendary Pictures", "id": 923}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2012-07-16 | 1084939099 | 165.0 | [{"iso_639_1": "en", "name": "English"}] | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 |
| 4 | 260000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://movies.disney.com/john-carter | 49529 | [{"id": 818, "name": "based on novel"}, {"id":... | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | [{"name": "Walt Disney Pictures", "id": 2}] | [{"iso_3166_1": "US", "name": "United States o... | 2012-03-07 | 284139100 | 132.0 | [{"iso_639_1": "en", "name": "English"}] | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 |


In [42]:
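# Keep only the required columns; .copy() gives an independent DataFrame
# so later in-place changes do not trigger SettingWithCopyWarning
movies = movies[['title', 'overview', 'genres']].copy()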

movies.head()


| | title | overview | genres |
|---|---|---|---|
| 0 | Avatar | In the 22nd century, a paraplegic Marine is di... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... |
| 1 | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | [{"id": 12, "name": "Adventure"}, {"id": 14, "... |
| 2 | Spectre | A cryptic message from Bond’s past sends him o... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... |
| 3 | The Dark Knight Rises | Following the death of District Attorney Harve... | [{"id": 28, "name": "Action"}, {"id": 80, "nam... |
| 4 | John Carter | John Carter is a war-weary, former military ca... | [{"id": 28, "name": "Action"}, {"id": 12, "nam... |


In [43]:
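# Remove rows with missing values, then reset the index so that index labels
# match row positions (the similarity matrix is addressed by position)
movies = movies.dropna().reset_index(drop=True)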

movies.isnull().sum()


title       0
overview    0
genres      0
dtype: int64

In [44]:
# Combine overview and genres into a single feature
# (genres is still the raw JSON string at this point, not a list of names)
movies['tags'] = movies['overview'] + " " + movies['genres']

movies[['title', 'tags']].head()


| | title | tags |
|---|---|---|
| 0 | Avatar | In the 22nd century, a paraplegic Marine is di... |
| 1 | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... |
| 2 | Spectre | A cryptic message from Bond’s past sends him o... |
| 3 | The Dark Knight Rises | Following the death of District Attorney Harve... |
| 4 | John Carter | John Carter is a war-weary, former military ca... |


In [45]:
movies.shape


(4800, 4)
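
The `genres` column holds JSON-encoded lists of `{"id", "name"}` objects, so concatenating it directly puts tokens such as `id` and `name` into every movie's `tags`. The following is a minimal sketch, not part of the original notebook, of how the genre names could be extracted before combining. The helper names `genre_names` and `build_tags` and the sample overview text are hypothetical.

```python
import json

def genre_names(raw):
    """Return the genre names from a JSON-encoded list of {"id", "name"} objects."""
    try:
        entries = json.loads(raw)
    except (TypeError, ValueError):
        # Missing or malformed values yield no genres rather than an error
        return []
    if not isinstance(entries, list):
        return []
    return [e["name"] for e in entries if isinstance(e, dict) and "name" in e]

def build_tags(overview, raw_genres):
    """Join the overview with plain genre names instead of the raw JSON string."""
    return " ".join([overview] + genre_names(raw_genres))

# Genre string in the same shape as the dataset; the overview is made up
sample_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
print(genre_names(sample_genres))
print(build_tags("A marine explores a distant moon.", sample_genres))
```

Under these assumptions, the same helper could be applied to the DataFrame with `movies['genres'].apply(genre_names)`. Doing so would change the vectors and therefore the recommendations shown later in this report.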

## 3. Model / System Design

This project uses a Content-Based Recommendation System to suggest movies to users. Instead of relying on user ratings or historical behavior, the system recommends movies based on the similarity of their content.

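Textual features such as movie overview and genres are combined to represent each movie. These features are converted into numerical vectors using a bag-of-words representation (scikit-learn's `CountVectorizer`, limited to the 5,000 most frequent terms with English stop words removed). The similarity between movies is then computed using cosine similarity, which compares the direction of two vectors rather than their length.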
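
To make the idea concrete, here is a small standard-library sketch of word-count vectors and cosine similarity on three made-up descriptions. It is an illustration only: the tokenizer is a plain whitespace split, there is no stop-word removal or vocabulary limit, and the function names `bag_of_words` and `cosine` are hypothetical rather than part of the project code.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Count lowercase whitespace-separated tokens (a simplified tokenizer)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up descriptions for illustration; not taken from the dataset
docs = {
    "A": "space marine alien planet action",
    "B": "alien planet invasion action",
    "C": "romantic comedy paris wedding",
}
toy_vectors = {name: bag_of_words(text) for name, text in docs.items()}

print(round(cosine(toy_vectors["A"], toy_vectors["B"]), 3))  # shares three words
print(cosine(toy_vectors["A"], toy_vectors["C"]))            # shares no words
```

Descriptions A and B share three of their words and score about 0.671, while A and C share none and score 0.0.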

This design is effective for scenarios where user interaction data is limited and provides relevant recommendations based on movie descriptions.


## 4. Core Implementation

This section implements the core logic of the movie recommendation system. It includes text vectorization, similarity computation, and the recommendation function used to suggest similar movies.


In [48]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


In [49]:
# Convert text data into numerical vectors
cv = CountVectorizer(max_features=5000, stop_words='english')
vectors = cv.fit_transform(movies['tags']).toarray()

vectors.shape


(4800, 5000)

In [50]:
# Compute cosine similarity between movies
similarity = cosine_similarity(vectors)

similarity.shape


(4800, 4800)

In [51]:
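def recommend(movie):
    # Find the row position of the movie (not its index label), because the
    # similarity matrix is addressed by position
    positions = (movies['title'] == movie).to_numpy().nonzero()[0]
    if len(positions) == 0:
        raise ValueError(f"Movie not found in dataset: {movie!r}")
    index = positions[0]

    # Similarity scores between the selected movie and every other movie
    distances = similarity[index]

    # Sort by score, highest first; skip the first entry (the movie itself)
    # and keep the next five
    movie_list = sorted(
        enumerate(distances),
        reverse=True,
        key=lambda x: x[1]
    )[1:6]

    # Map row positions back to titles
    recommendations = []
    for i in movie_list:
        recommendations.append(movies.iloc[i[0]]['title'])

    return recommendations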
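
The ranking step can also be written without sorting the full row of scores. The sketch below is an alternative illustration using only the standard library; the function name `top_similar` and the toy scores are hypothetical. It excludes the query by position instead of assuming it sorts first.

```python
import heapq

def top_similar(similarity_row, query_pos, k=5):
    """Return positions of the k highest-scoring items, skipping the query itself."""
    candidates = (
        (pos, score) for pos, score in enumerate(similarity_row) if pos != query_pos
    )
    # nlargest keeps only the k best candidates instead of sorting every score
    best = heapq.nlargest(k, candidates, key=lambda item: item[1])
    return [pos for pos, _ in best]

# Toy similarity scores for the item at position 0
row = [1.0, 0.2, 0.9, 0.0, 0.5]
print(top_similar(row, query_pos=0, k=3))
```

For the toy row this prints `[2, 4, 1]`, the positions of the three highest scores other than the query.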


## 5. Evaluation & Analysis

The performance of the recommendation system is evaluated qualitatively by observing the relevance of the recommended movies. Since this is a content-based system, recommendations are analyzed based on similarity in genres and movie descriptions.


In [53]:
recommend("Avatar")


['The Helix... Loaded',
 'Small Soldiers',
 'The Fifth Element',
 'Godzilla 2000',
 'Beowulf']

For this query, the system returns titles that appear to share themes and genres with Avatar, which suggests that content similarity can surface related movies. Because the check is qualitative and based on a single title, it illustrates the approach rather than measuring its accuracy.
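
One way the qualitative check could be complemented is a simple genre-overlap score between the selected movie and each recommendation. The sketch below is an assumption about how such a measure might look, not something computed in this project. The function names `jaccard` and `mean_genre_overlap` are hypothetical, and the genre lists are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of genre names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def mean_genre_overlap(query_genres, recommended_genres):
    """Average Jaccard overlap between the query and each recommendation."""
    if not recommended_genres:
        return 0.0
    scores = [jaccard(query_genres, genres) for genres in recommended_genres]
    return sum(scores) / len(scores)

# Made-up genre sets for illustration only; not taken from the dataset
query = ["Action", "Adventure", "Science Fiction"]
recommended = [
    ["Action", "Science Fiction"],
    ["Comedy"],
    ["Action", "Adventure", "Science Fiction"],
]
print(round(mean_genre_overlap(query, recommended), 3))
```

A score near 1 would indicate that the recommendations mostly share the query's genres, and a score near 0 that they mostly do not. Note that genres are also part of the input features here, so this measure checks consistency rather than user satisfaction.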


## 6. Ethical Considerations & Responsible AI

The recommendation system may exhibit bias toward popular or frequently occurring genres present in the dataset. Since the system relies solely on content similarity, it does not capture individual user preferences or viewing history.

Additionally, the quality of recommendations depends heavily on the accuracy and completeness of the dataset. The data used in this project is publicly available and has been used strictly for academic purposes, ensuring responsible and ethical use of information.


## 7. Conclusion & Future Scope

This project successfully demonstrates the implementation of a content-based movie recommendation system using machine learning techniques. By analyzing movie descriptions and genres, the system is able to recommend relevant movies effectively.

In the future, this system can be enhanced by incorporating collaborative filtering techniques, user ratings, and personalized user profiles. Deploying the model with real-time user feedback and advanced recommendation algorithms can further improve the accuracy and usefulness of the system.
