# Movie_Recommendation_System

In [104]:
'''This project aims to develop a movie recommendation system that suggests similar movies based on a user's input.By leveraging a dataset of Netflix 
titles, the system utilizes natural language processing techniques to analyze various features of the movies, such as their titles, cast, directors,
genres, and descriptions. The core of the recommendation process involves converting text data into numerical vectors and calculating the cosine
similarity between these vectors to identify movies that are closely related in terms of content'''

"This project aims to develop a movie recommendation system that suggests similar movies based on a user's input.By leveraging a dataset of Netflix \ntitles, the system utilizes natural language processing techniques to analyze various features of the movies, such as their titles, cast, directors,\ngenres, and descriptions. The core of the recommendation process involves converting text data into numerical vectors and calculating the cosine\nsimilarity between these vectors to identify movies that are closely related in terms of content"

In [106]:
'''we are importing the pandas library, which is used for data manipulation. We are also importing CountVectorizer and cosine_similarity 
which are used to convert text data into vectors and 
to calculate similarity, respectively'''
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [108]:
#load the data
'''Load Dataset,we are loading our CSV file netflix_titles.csv and storing it in a DataFrame named movies.'''
movies = pd.read_csv("netflix_titles.csv")

In [110]:
#select relevant cols
'''herer  we are selecting only the columns that we need: title, cast, director, listed_in, and description.'''
movies = movies[['title', 'cast', 'director', 'listed_in', 'description']]

In [112]:
#drop the missing data
'''Here, we are dropping the rows that contain any missing values. The inplace=True parameter means that the changes
will be applied to the original DataFrame.'''
movies.dropna(inplace=True)

In [114]:
# Combine Features: listed_in (categories/genres), description, cast, director
'''In this function, we are combining different features. For each row, we are concatenating listed_in, description, cast,
and director into a single string.'''
def combine_features(row):
    return row['listed_in'] + " " + row['description'] + " " + row['cast'] + " " + row['director']

In [116]:

'''we are applying the combine_features function to each row and creating a new column called combined_features.'''
movies['combined_features'] = movies.apply(combine_features, axis=1)

In [118]:
# Initialize CountVectorizer and create count matrix
''' we are initializing the CountVectorizer and converting combined_features into a vector. 
The stop_words='english' parameter means that common English words will be ignored.'''
cv = CountVectorizer(stop_words='english')
count_matrix = cv.fit_transform(movies['combined_features'])

In [120]:

# Compute Cosine Similarity matrix from count matrix
'''Here, we are calculating the cosine similarity, which will give us a similarity score between each movie.'''
cosine_sim = cosine_similarity(count_matrix)

In [121]:
# Movie Recommendation Function
def recommend(movie_name):
    '''In this function, we are converting the movie name to lowercase for easier comparison. We also convert the titles to lowercase.'''
    movie_name = movie_name.lower()
    titles_lower = movies['title'].str.lower()
    
    '''Here, we are checking if the provided movie name exists in the DataFrame. If it does not, we return "Movie not found."'''
    if movie_name not in titles_lower.values:
        return "Movie not found."

    '''In this part, we find the index of the movie and create a list of similar movies based on that index.
    We then sort them according to their similarity scores and take the top 5 similar movies.'''
    movie_index = titles_lower[titles_lower == movie_name].index[0]
    similar_movies = list(enumerate(cosine_sim[movie_index]))

    # Sort movies based on similarity score; skip the first (same movie)
    sorted_similar_movies = sorted(similar_movies, key=lambda x: x[1], reverse=True)[1:6]

    '''Here, we extract the titles from the sorted similar movies and return them.'''
    recommendations = [movies.iloc[i[0]].title for i in sorted_similar_movies]
    return recommendations

# Test the recommendation function
if __name__ == "__main__":
    '''In this block, we check if the script is being run directly. If so, we print recommendations for the movie "Jeans".'''
    test_movie = "Jeans"
    print(f"Recommended movies for '{test_movie}':")
    print(recommend(test_movie))

Recommended movies for 'Jeans':
['Jaws 2', 'Jaws', 'Jaws 3', 'Never Back Down 2: The Beatdown', 'In The Deep']


In [124]:
'''In conclusion, the movie recommendation system successfully demonstrates how to utilize text data and similarity measures to provide personalized movie 
suggestions. By combining features from multiple columns in the dataset and applying cosine similarity, the system can effectively recommend movies that 
align with the user's interests. This project not only showcases the practical application of data manipulation and machine learning techniques but 
also highlights the potential for enhancing user experience in content discovery platforms like Netflix.'''

"In conclusion, the movie recommendation system successfully demonstrates how to utilize text data and similarity measures to provide personalized movie \nsuggestions. By combining features from multiple columns in the dataset and applying cosine similarity, the system can effectively recommend movies that \nalign with the user's interests. This project not only showcases the practical application of data manipulation and machine learning techniques but \nalso highlights the potential for enhancing user experience in content discovery platforms like Netflix."