# **Title of Project**
MOVIE RECOMMENDATION SYSTEM

-------------

## **Objective**

To build a movie recommendation system that suggests movies based on the user's favorite movie using a combination of genres, keywords, taglines, cast, and director.

## **Data Source**

The dataset is the `Movies Recommendation.csv` file from the public YBI-Foundation/Dataset repository on GitHub.

## **Import Library**

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import difflib
import matplotlib.pyplot as plt
import seaborn as sns


## **Import Data**

In [None]:
# Reading the dataset from the URL
df = pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/Movies%20Recommendation.csv')


## **Describe Data**

In [None]:
# Display the first few rows of the dataframe
print(df.head())

# Get information about the dataset (info() prints its own report and returns None)
df.info()

# Display the shape of the dataset
print(df.shape)

# Display the column names
print(df.columns)


## **Data Visualization**

In [None]:
# Plotting the distribution of movie genres
plt.figure(figsize=(10, 6))
sns.countplot(y='Movie_Genre', data=df, order=df['Movie_Genre'].value_counts().index)
plt.title('Distribution of Movie Genres')
plt.xlabel('Count')
plt.ylabel('Genre')
plt.show()


## **Data Preprocessing**

In [None]:
# Select relevant features and fill missing values with empty strings
df_features = df[['Movie_Genre', 'Movie_Keywords', 'Movie_Tagline', 'Movie_Cast', 'Movie_Director']].fillna('')

# Combine features into a single string for each movie
X = df_features['Movie_Genre'] + ' ' + df_features['Movie_Keywords'] + ' ' + df_features['Movie_Tagline'] + ' ' + df_features['Movie_Cast'] + ' ' + df_features['Movie_Director']
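
To illustrate why the missing values are filled before concatenation, here is a minimal sketch on a toy frame. The two rows and their values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset; the values are invented.
toy = pd.DataFrame({
    'Movie_Genre': ['Action', 'Drama'],
    'Movie_Tagline': ['Enter the world.', np.nan],  # second tagline is missing
})

# Without fillna, a single missing field makes the whole combined string missing.
combined_raw = toy['Movie_Genre'] + ' ' + toy['Movie_Tagline']

# With fillna(''), the row keeps whatever text it does have.
toy_filled = toy.fillna('')
combined = toy_filled['Movie_Genre'] + ' ' + toy_filled['Movie_Tagline']

print(combined_raw.isna().tolist())  # [False, True]
print(combined.tolist())             # ['Action Enter the world.', 'Drama ']
```

The trailing space left in the second string is harmless here, because the vectorizer used later splits text into word tokens.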


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
# There is no traditional target variable y here: the recommender works from the combined text features X alone.

## **Train Test Split**

In [None]:
# Not applicable: this is not a supervised learning task, so no train/test split is needed.

## **Modeling**

In [None]:
# Convert the combined features into TF-IDF vectors
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(X)

# Calculate the cosine similarity between the TF-IDF vectors
similarity_score = cosine_similarity(X_tfidf)
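
To see what these two steps produce, here is a minimal sketch on a toy corpus. The three strings are made up for illustration and stand in for the combined feature strings in `X`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three invented "combined feature" strings.
docs = [
    'action space alien pilot',
    'action space robot pilot',
    'romance paris wedding',
]

vectors = TfidfVectorizer().fit_transform(docs)  # sparse matrix, one row per document
sim = cosine_similarity(vectors)                 # dense matrix of shape (3, 3)

print(sim.shape)
print(sim.round(2))
```

Each diagonal entry is 1 (a document compared with itself), the first two documents score well above zero because they share three words, and the third scores 0 against both because it shares none. Note that the full matrix has one row and one column per movie, so its memory use grows quadratically with the number of movies.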


## **Model Evaluation**

In [None]:
# No quantitative evaluation is performed here; this notebook focuses on generating recommendations.
# Validating them would require user feedback or further statistical methods.
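
As one possible offline sanity check (an assumption, not something this notebook performs), the sketch below measures how much the genre labels of each movie overlap with those of its most similar movies. The helper names `genre_jaccard` and `mean_genre_overlap` are hypothetical, the toy data is invented, and genres are assumed to be whitespace-separated tokens, so a multi-word genre would be split into several tokens.

```python
import numpy as np

def genre_jaccard(a, b):
    """Jaccard overlap between two whitespace-separated genre strings."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)

def mean_genre_overlap(genres, similarity, k=10):
    """Average genre overlap between each movie and its k most similar movies."""
    scores = []
    for i in range(len(genres)):
        ranked = np.argsort(-similarity[i])        # most similar first
        top_k = [j for j in ranked if j != i][:k]  # exclude the movie itself
        scores.extend(genre_jaccard(genres[i], genres[j]) for j in top_k)
    return float(np.mean(scores))

# Toy data, invented for illustration.
toy_genres = ['Action Adventure', 'Action Thriller', 'Romance Drama', 'Romance Comedy']
toy_similarity = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.1, 0.7, 1.0],
])

overlap = mean_genre_overlap(toy_genres, toy_similarity, k=1)
print(round(overlap, 3))  # 0.333
```

On the real data this could be called as `mean_genre_overlap(df['Movie_Genre'].fillna('').tolist(), similarity_score)`. Because genre is itself one of the input features, a high score only confirms that the pipeline behaves as intended; it does not show that users would like the recommendations.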

## **Prediction**

In [None]:
# Function to print movie recommendations for a (possibly misspelled) movie title
def get_recommendations(favorite_movie_name, df, similarity_score):
    all_movies_title_list = df['Movie_Title'].tolist()
    movie_recommendation = difflib.get_close_matches(favorite_movie_name, all_movies_title_list)

    if not movie_recommendation:
        print("No close match found for the provided movie name.")
        return

    close_match = movie_recommendation[0]
    # Use the row position, not Movie_ID: rows of similarity_score follow the row order of df
    index_of_close_match_movie = np.flatnonzero((df['Movie_Title'] == close_match).to_numpy())[0]
    recommendation_score = list(enumerate(similarity_score[index_of_close_match_movie]))
    sorted_similar_movies = sorted(recommendation_score, key=lambda x: x[1], reverse=True)
    # Drop the query movie itself rather than assuming it always sorts first
    sorted_similar_movies = [m for m in sorted_similar_movies if m[0] != index_of_close_match_movie]

    print(f'Top 10 movies suggested for you based on {close_match}:\n')
    for i, (index, _score) in enumerate(sorted_similar_movies[:10], start=1):
        print(f"{i}. {df['Movie_Title'].iloc[index]}")


## **Explanation**

This code builds a movie recommendation system using TF-IDF vectorization and cosine similarity. Given a favorite movie name, it finds the closest matching title in the dataset and recommends the movies whose combined features (genre, keywords, tagline, cast, and director) are most similar to it.
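
The closest-match step can be tried in isolation. The sketch below uses a made-up title list, and `closest_title` is a hypothetical helper rather than part of this notebook. It also shows that `difflib.get_close_matches` is case-sensitive, which lowercasing both sides works around.

```python
import difflib

titles = ['Avatar', 'Titanic', 'Alien']  # invented title list for illustration

# A misspelled query still resolves to the nearest title.
typo_match = difflib.get_close_matches('Avatr', titles)
print(typo_match)  # ['Avatar']

# Matching is case-sensitive, so an all-caps query finds nothing at the default cutoff.
caps_match = difflib.get_close_matches('AVATAR', titles)
print(caps_match)  # []

def closest_title(query, titles, cutoff=0.6):
    """Case-insensitive closest match; returns the original title or None."""
    lowered = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

print(closest_title('AVATAR', titles))  # Avatar
print(closest_title('zzzz', titles))    # None
```

By default `get_close_matches` returns at most three candidates (`n=3`) with a similarity ratio of at least `cutoff=0.6`. Raising the cutoff makes matching stricter, at the cost of rejecting more misspelled queries.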


In [None]:
favorite_movie_name = input('Enter your favorite movie name: ')
get_recommendations(favorite_movie_name, df, similarity_score)
