<h1 style="text-align: center;"> Movie Recommendation System </h1>


## Objective:
This project demonstrates a movie recommendation system using machine learning techniques. 
The system recommends movies based on their content similarity using textual features like genres, keywords, tagline, cast, and director.

### Key Steps:
1. Data Collection and Pre-Processing
2. Feature Engineering and Vectorization
3. Similarity Calculation Using Cosine Similarity
4. Generating Recommendations Based on User Input


In [58]:
# Importing Required Libraries
import numpy as np
import pandas as pd
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

##  1. Data Collection & Pre-Processing


In [59]:
# Loading the dataset into a pandas DataFrame
movies_data = pd.read_csv('movies.csv')

In [60]:
# Display the first 5 rows of the dataset
print("First 5 rows of the dataset:")
movies_data.head()

First 5 rows of the dataset:


Unnamed: 0,index,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew,director
0,0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Sam Worthington Zoe Saldana Sigourney Weaver S...,"[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...",James Cameron
1,1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Johnny Depp Orlando Bloom Keira Knightley Stel...,"[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...",Gore Verbinski
2,2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Daniel Craig Christoph Waltz L\u00e9a Seydoux ...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes
3,3,250000000,Action Crime Drama Thriller,http://www.thedarkknightrises.com/,49026,dc comics crime fighter terrorist secret ident...,en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,Christian Bale Michael Caine Gary Oldman Anne ...,"[{'name': 'Hans Zimmer', 'gender': 2, 'departm...",Christopher Nolan
4,4,260000000,Action Adventure Science Fiction,http://movies.disney.com/john-carter,49529,based on novel mars medallion space travel pri...,en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,Taylor Kitsch Lynn Collins Samantha Morton Wil...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton


In [61]:
# Checking the size of the dataset
print("\nNumber of rows and columns in the dataset:")
print(movies_data.shape)


Number of rows and columns in the dataset:
(4803, 24)


In [62]:
# Selecting the relevant features for building the recommendation system
selected_features = ['genres', 'keywords', 'tagline', 'cast', 'director']
print("\nSelected Features:")
print(selected_features)


Selected Features:
['genres', 'keywords', 'tagline', 'cast', 'director']


In [63]:
# Counting null values in each selected column
null_count = movies_data[selected_features].isnull().sum()
print("\nNull values in selected features:")
print(null_count)


Null values in selected features:
genres       28
keywords    412
tagline     844
cast         43
director     30
dtype: int64


In [64]:
# Replacing null values with empty strings for seamless processing
for feature in selected_features:
    movies_data[feature] = movies_data[feature].fillna('')


In [65]:
# Combining selected features into a single string for each movie
combine_features = (
    movies_data['genres'] + ' ' +
    movies_data['keywords'] + ' ' +
    movies_data['tagline'] + ' ' +
    movies_data['cast'] + ' ' +
    movies_data['director']
)


In [66]:
# Displaying a sample of combined features
print("\nSample combined features for movies:")
print(combine_features.head())


Sample combined features for movies:
0    Action Adventure Fantasy Science Fiction cultu...
1    Adventure Fantasy Action ocean drug abuse exot...
2    Action Adventure Crime spy based on novel secr...
3    Action Crime Drama Thriller dc comics crime fi...
4    Action Adventure Science Fiction based on nove...
dtype: object


## 2. Feature Engineering

In [67]:
# Converting text data into numerical feature vectors using TF-IDF Vectorization
vectorizer = TfidfVectorizer()
feature_vectors = vectorizer.fit_transform(combine_features)

In [68]:
# Displaying the shape of the resulting feature matrix
print("\nShape of the TF-IDF feature matrix:")
print(feature_vectors.shape)



Shape of the TF-IDF feature matrix:
(4803, 17318)


## 3. Cosine Similarity Calculation

In [69]:
# Calculating cosine similarity between all movie feature vectors
similarity = cosine_similarity(feature_vectors)

In [70]:
# Displaying the shape of the similarity matrix
print("\nShape of the similarity matrix:")
print(similarity.shape)


Shape of the similarity matrix:
(4803, 4803)


## 4. Generating Recommendations

In [71]:
# Prompting the user to enter their favorite movie
movie_name = input('\nEnter your favorite movie name: ')


Enter your favorite movie name: 


In [72]:
# Creating a list of all movie titles in the dataset
list_of_all_titles = movies_data['title'].tolist()

In [73]:
# Finding the closest match for the input movie name
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
if find_close_match:
    close_match = find_close_match[0]
    print(f"\nClosest match found: {close_match}")
else:
    print("\nNo matching movie found. Please try again with a different title.")
    exit()


No matching movie found. Please try again with a different title.


In [1]:
# Finding the index of the movie with the closest match
index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]
print(index_of_the_movie)

NameError: name 'movies_data' is not defined

In [None]:
## Getting a list of similarity scores for the movie
similarity_score = list(enumerate(similarity[index_of_the_movie]))
##print(similarity_score)

In [None]:
len(similarity_score)

In [None]:
# Sorting movies based on similarity scores in descending order
sorted_similar_movies = sorted(similarity_score, key=lambda x: x[1], reverse=True)
##print(sorted_similar_movies)


In [None]:
# Printing the top 30 recommended movies
print('\nMovies recommended for you:\n')
i = 1
for movie in sorted_similar_movies:
    index = movie[0]
    title_from_index = movies_data[movies_data.index == index]['title'].values[0]
    if i <= 30:
        print(f"{i}. {title_from_index}")
        i += 1
        
    
    
      
      
        

## Movie Recommendation Sytem

In [None]:
# Prompt the user to input their favorite movie
movie_name = input('\nEnter your favorite movie name: ')

# Creating a list of all movie titles from the dataset
list_of_all_titles = movies_data['title'].tolist()

# Using difflib to find the closest match for the user's input
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)

# Check if any match is found
if find_close_match:
    # Select the closest matching movie title
    close_match = find_close_match[0]
    print(f"\nClosest match found: {close_match}")
else:
    # Handle cases where no match is found
    print("\nNo close match found for the movie name provided. Please try again with a different title.")
    exit()

# Finding the index of the closest matching movie
index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]

# Retrieve the similarity scores for the selected movie
similarity_score = list(enumerate(similarity[index_of_the_movie]))

# Sort the movies based on similarity scores in descending order
sorted_similar_movies = sorted(similarity_score, key=lambda x: x[1], reverse=True)

# Displaying the top 30 recommended movies
print('\nMovies recommended for you:\n')

# Initialize a counter for ranking the recommendations
i = 1

# Loop through the sorted list of similar movies
for movie in sorted_similar_movies:
    index = movie[0]
    
    # Get the title of the movie from its index
    title_from_index = movies_data[movies_data.index == index]['title'].values[0]
    
    # Display up to 30 movie recommendations
    if i <= 30:
        print(f"{i}. {title_from_index}")
        i += 1