<a href="https://colab.research.google.com/github/amitesh1906/Movierecommendations./blob/main/movierecommendation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Title of Project:
Movie Recommendation System Using Content-Based Filtering

Objective:
To build a movie recommendation system that suggests similar movies based on the user’s favorite movie using content-based filtering. The system uses movie metadata such as genre, language, keywords, vote count, and popularity to calculate similarity scores between movies.

Data Source:
The dataset is sourced from the GitHub repository:
https://github.com/YBIFoundation/Dataset/raw/main/Movies%20Recommendation.csv

Import Library
This section imports the necessary libraries required for data manipulation, machine learning, and similarity computation.

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import difflib


Import Data
Read the movie dataset from the CSV file using Pandas.



In [2]:
df = pd.read_csv(r'https://github.com/YBIFoundation/Dataset/raw/main/Movies%20Recommendation.csv')


Describe Data
This section provides an overview of the dataset, such as column names, data types, and the shape of the data.

In [3]:
print(df.head())  # Display the first few rows of the dataset
print(df.info())  # Show data types and missing values
print(df.shape)   # Show the dimensions of the dataset
print(df.columns) # Display column names


   Movie_ID      Movie_Title                       Movie_Genre Movie_Language  \
0         1       Four Rooms                      Crime Comedy             en   
1         2        Star Wars  Adventure Action Science Fiction             en   
2         3     Finding Nemo                  Animation Family             en   
3         4     Forrest Gump              Comedy Drama Romance             en   
4         5  American Beauty                             Drama             en   

   Movie_Budget  Movie_Popularity Movie_Release_Date  Movie_Revenue  \
0       4000000         22.876230         09-12-1995        4300000   
1      11000000        126.393695         25-05-1977      775398007   
2      94000000         85.688789         30-05-2003      940335536   
3      55000000        138.133331         06-07-1994      677945399   
4      15000000         80.878605         15-09-1999      356296601   

   Movie_Runtime  Movie_Vote  ...  \
0           98.0         6.5  ...   
1          1

Data Preprocessing
Select relevant features for building the recommendation system and handle missing values.

In [4]:
df_features = df[['Movie_Genre', 'Movie_Language', 'Movie_Keywords', 'Movie_Vote_Count', 'Movie_Popularity']].fillna('')


In [5]:
X = df_features['Movie_Genre'] + ' ' + df_features['Movie_Language'] + ' ' + df_features['Movie_Keywords'] + ' ' + df_features['Movie_Vote_Count'].astype(str) + ' ' + df_features['Movie_Popularity'].astype(str)


Define Target Variable (y) and Feature Variables (X)
Convert the combined features into a numerical matrix using TF-IDF Vectorizer, which helps in weighting the importance of words in the context of the dataset.

In [6]:
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(X)


Modeling
Use cosine similarity to measure the similarity between movies based on their features.

In [7]:
similarity_Score = cosine_similarity(X)


Prediction
Get recommendations for a user-input movie name. Find the closest match and suggest similar movies based on similarity scores

In [10]:
# Input from the user for their favorite movie
Favourite_Movie_Name = input('Enter your favourite movie name: ')

# Convert the 'Movie_Title' column into a list and find the closest match
All_Movies_Title_List = df['Movie_Title'].tolist()
Movie_Recommendation = difflib.get_close_matches(Favourite_Movie_Name, All_Movies_Title_List)

# Get the closest matching movie and its index
Close_Match = Movie_Recommendation[0]
Index_of_Close_Match_Movie = df[df['Movie_Title'] == Close_Match]["Movie_ID"].values[0]

# Compute similarity scores for the closest match
Recommendation_Score = list(enumerate(similarity_Score[Index_of_Close_Match_Movie]))

# Sort movies by similarity score
Sorted_Similar_Movies = sorted(Recommendation_Score, key=lambda x: x[1], reverse=True)

# Display top 30 recommended movies
print("Top 30 Movies Recommended for You:\n")
i = 1
for movie in Sorted_Similar_Movies:
    index = movie[0]
    title_from_index = df.loc[index, "Movie_Title"]
    if i <= 30:
        print(f"{i}. {title_from_index}")
        i += 1


Enter your favourite movie name: punj
Top 30 Movies Recommended for You:

1. Mad Money
2. Kiss of Death
3. Civil Brand
4. The Three Stooges
5. RockNRolla
6. Sugar Town
7. Baby's Day Out
8. Without Men
9. It's a Mad, Mad, Mad, Mad World
10. Proud
11. The Young Unknowns
12. Archaeology of a Woman
13. Phat Girlz
14. Next Friday
15. Chasing Papi
16. Yes
17. Lovely & Amazing
18. Kangaroo Jack
19. Walking and Talking
20. I Love You, Don't Touch Me!
21. Woman on Top
22. There's Always Woodstock
23. Windsor Drive
24. How to Deal
25. Peeples
26. The Trials Of Darryl Hunt
27. Sisters in Law
28. Born to Fly: Elizabeth Streb vs. Gravity
29. Butterfly Girl
30. Antarctic Edge: 70° South


Model Evaluation
Provide a subset of top 10 recommendations based on the similarity score to evaluate the system's output quality.

In [9]:
# Ask the user again for another favorite movie
Movie_Name = input('Enter your favourite movie name: ')
list_of_all_titles = df['Movie_Title'].tolist()

# Find the closest match for the input movie name
Close_Match = difflib.get_close_matches(Movie_Name, list_of_all_titles, n=1, cutoff=0.6)

# If a match is found, display the top 10 recommended movies
if Close_Match:
    Close_Match = Close_Match[0]
    Index_of_Movie = df[df['Movie_Title'] == Close_Match]['Movie_ID'].values[0]
    Recommendation_Score = list(enumerate(similarity_Score[Index_of_Movie]))
    Sorted_Recommendation_Score = sorted(Recommendation_Score, key=lambda x: x[1], reverse=True)

    print("Top 10 Movies Suggested for You:\n")
    i = 1
    for movie in Sorted_Recommendation_Score:
        index = movie[0]
        movie_title = df.loc[index, 'Movie_Title']
        if i <= 10:
            print(f"{i}. {movie_title}")
            i += 1
else:
    print("No close match found for the entered movie name.")


Enter your favourite movie name: avatarr
Top 10 Movies Suggested for You:

1. Niagara
2. Harry Brown
3. Eye for an Eye
4. Welcome to the Sticks
5. Back to the Future
6. The Curse of Downers Grove
7. Se7en
8. Backmask
9. Enough
10. Yeh Jawaani Hai Deewani


***Explanation***:
The system uses TF-IDF Vectorization to convert text-based movie features into numerical form, which allows the calculation of cosine similarity between movies. This similarity score is used to find and rank movies that are closest in characteristics to the user's input, providing personalized recommendations.

