<a href="https://colab.research.google.com/github/Ameesha2214/Movie-Recommendation/blob/main/Movie_Recommendation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Movie Recommendation System**

**Movie Match: Revolutionizing Movie Recommendations with the Close Match Algorithm**

Movie Match is a content-based movie recommendation system. It uses Python's `difflib.get_close_matches` (referred to here as the Close Match algorithm) to resolve typos and small variations in a movie title, then suggests films whose genre, keywords, tagline, cast, and director are similar.

**Key Features**
1. **Close Match Precision:**
The Close Match algorithm tolerates typos, misspellings, and minor deviations in movie titles, so a slightly wrong title still resolves to the intended film.

2. **Personalized Recommendations:**
Whether you prefer timeless classics, thrillers, or rom-coms, Movie Match bases its suggestions on the movie you enter rather than on a fixed list.

3. **Intuitive Movie Discovery:**
Enter a title and get a ranked list of similar films, with no manual searching or filtering required.
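
The typo tolerance described in the first feature comes from `difflib.get_close_matches` in Python's standard library, which this notebook calls in its final cell. A minimal sketch, assuming a hypothetical five-title list taken from the sample rows shown in this notebook; the helper name `closest_title` is illustrative, while `n` and `cutoff` are real parameters of the function (0.6 is its default cutoff):

```python
import difflib

# Hypothetical title list; the notebook builds the real one from the Movie_Title column.
titles = ["Four Rooms", "Star Wars", "Finding Nemo", "Forrest Gump", "American Beauty"]

def closest_title(query, candidates, cutoff=0.6):
    """Return the best fuzzy match for `query`, or None if nothing clears the cutoff."""
    matches = difflib.get_close_matches(query, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(closest_title("Forest Gump", titles))   # misspelled title
print(closest_title("Finding Nimo", titles))  # one wrong letter
print(closest_title("zzzz", titles))          # nothing similar enough
```

Returning `None` instead of raising keeps the "no close match" case explicit for the caller.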

**OBJECTIVE**

The goal of the Movie Recommendation System is to give users relevant movie suggestions based on the title they enter. The Close Match algorithm lets the system tolerate minor deviations in movie titles, and the recommendations themselves come from comparing the descriptive text of each movie. The aim is to make movie discovery quick and to require little effort from the user.

**DATA SOURCE**

The dataset for this project was sourced from the YBI Foundation Kaggle repository. It contains comprehensive information about movies, user ratings, and other relevant attributes essential for building the recommendation system.

**IMPORT LIBRARIES**

In [1]:
import pandas as pd
import numpy as np

**IMPORT DATASET**

In [3]:
df = pd.read_csv('/content/Movies Recommendation (1).csv')
print(df.head())
print(df.info())

   Movie_ID      Movie_Title                       Movie_Genre Movie_Language  \
0         1       Four Rooms                      Crime Comedy             en   
1         2        Star Wars  Adventure Action Science Fiction             en   
2         3     Finding Nemo                  Animation Family             en   
3         4     Forrest Gump              Comedy Drama Romance             en   
4         5  American Beauty                             Drama             en   

   Movie_Budget  Movie_Popularity Movie_Release_Date  Movie_Revenue  \
0       4000000         22.876230         09-12-1995        4300000   
1      11000000        126.393695         25-05-1977      775398007   
2      94000000         85.688789         30-05-2003      940335536   
3      55000000        138.133331         06-07-1994      677945399   
4      15000000         80.878605         15-09-1999      356296601   

   Movie_Runtime  Movie_Vote  ...  \
0           98.0         6.5  ...   
1          1

**CLEAN THE DATA**

In [4]:
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)
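
One side effect worth knowing: `dropna` and `drop_duplicates` keep the original row labels, so the index can have gaps afterwards. A small sketch with a made-up three-row frame (the data is hypothetical) showing the gap and how `reset_index(drop=True)` restores a contiguous index, which matters whenever rows are later looked up by position in a similarity matrix:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the project dataset.
toy = pd.DataFrame({
    "Movie_Title": ["Four Rooms", "Star Wars", "Finding Nemo"],
    "Movie_Tagline": ["Twelve outrageous guests.", np.nan, "There are 3.7 trillion fish..."],
})

cleaned = toy.dropna().drop_duplicates()
print(cleaned.index.tolist())      # labels of the surviving rows

renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())   # contiguous positions again
```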


**PREPARE DATA**

In [5]:
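from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Five-row sample used for the walkthrough in the following cells.
# Note: assigning it to `df` below replaces the dataset loaded from the CSV.
data = {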
    "Movie_Genre": ["Crime Comedy", "Adventure Action Science Fiction", "Animation Family", "Comedy Drama Romance", "Drama"],
    "Movie_Keywords": ["hotel new year's eve", "android galaxy hermit", "father son relationship", "vietnam veteran hippie", "male nudity female nudity"],
    "Movie_Tagline": ["Twelve outrageous guests.", "A long time ago...", "There are 3.7 trillion fish...", "The world will never be the same.", "Look closer."],
    "Movie_Cast": ["Tim Roth Antonio Banderas", "Mark Hamill Harrison Ford", "Albert Brooks Ellen DeGeneres", "Tom Hanks Robin Wright", "Kevin Spacey Annette Bening"],
    "Movie_Director": ["Allison Anders", "George Lucas", "Andrew Stanton", "Robert Zemeckis", "Sam Mendes"]
}
df = pd.DataFrame(data)

df_features = df[["Movie_Genre", "Movie_Keywords", "Movie_Tagline", "Movie_Cast", "Movie_Director"]].fillna('')
df_features["Combined"] = (
    df_features["Movie_Genre"] + " " +
    df_features["Movie_Keywords"] + " " +
    df_features["Movie_Tagline"] + " " +
    df_features["Movie_Cast"] + " " +
    df_features["Movie_Director"]
)


**CONVERT TEXT TO TF-IDF MATRIX**

In [6]:
tfidf = TfidfVectorizer()

tfidf_matrix = tfidf.fit_transform(df_features["Combined"])

print(f"TF-IDF Matrix Shape: {tfidf_matrix.shape}")

TF-IDF Matrix Shape: (5, 74)


**COMPUTE COSINE SIMILARITY**

In [7]:
cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)

print(f"Cosine Similarity Matrix:\n{cosine_sim}")


Cosine Similarity Matrix:
[[1.         0.         0.         0.03774201 0.        ]
 [0.         1.         0.         0.         0.        ]
 [0.         0.         1.         0.         0.        ]
 [0.03774201 0.         0.         1.         0.03774201]
 [0.         0.         0.         0.03774201 1.        ]]
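
Each entry of this matrix is the dot product of two rows divided by the product of their lengths. A short sketch, using two hypothetical vectors rather than the TF-IDF rows, that checks a hand computation against `cosine_similarity`:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical vectors chosen so the arithmetic is easy to follow.
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 0.0, 1.0])

# cos(a, b) = (a . b) / (|a| * |b|)
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# cosine_similarity expects 2-D inputs, one row per sample.
library = cosine_similarity(a.reshape(1, -1), b.reshape(1, -1))[0, 0]
print(manual, library)
```

Because `TfidfVectorizer` L2-normalizes each row by default, the diagonal of the matrix is 1 and two rows score above 0 only when they share at least one term.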


**RECOMMEND MOVIES**

In [9]:
def get_recommendations(title_index, similarity_matrix, df, top_n=5):
    """Return genre and cast of the movies most similar to the one at title_index."""
    similarity_scores = list(enumerate(similarity_matrix[title_index]))

    # Sort by similarity score, highest first.
    sorted_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    # Skip the query movie itself, then keep the top_n best matches.
    recommended_indices = [idx for idx, _ in sorted_scores if idx != title_index][:top_n]
    return df.iloc[recommended_indices][["Movie_Genre", "Movie_Cast"]]


recommendations = get_recommendations(0, cosine_sim, df)
print(f"Recommended Movies:\n{recommendations}")

Recommended Movies:
                        Movie_Genre                     Movie_Cast
3              Comedy Drama Romance         Tom Hanks Robin Wright
1  Adventure Action Science Fiction      Mark Hamill Harrison Ford
2                  Animation Family  Albert Brooks Ellen DeGeneres
4                             Drama    Kevin Spacey Annette Bening


**EXPAND FOR USER INPUT**

In [10]:
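def find_similar_movies(user_input, tfidf_matrix, tfidf, df):
    """Return the five rows of df whose combined text is most similar to user_input."""
    # Project the query into the vocabulary learned by the fitted vectorizer.
    user_input_tfidf = tfidf.transform([user_input])

    # One row of scores: the query against every movie.
    similarity_scores = cosine_similarity(user_input_tfidf, tfidf_matrix)

    # Indices of the five highest scores, best first (rows with a score of 0 can still appear).
    top_indices = similarity_scores[0].argsort()[-5:][::-1]

    return df.iloc[top_indices]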

user_input = "galaxy android adventure"
similar_movies = find_similar_movies(user_input, tfidf_matrix, tfidf, df)
print(f"Movies similar to '{user_input}':\n{similar_movies}")


Movies similar to 'galaxy android adventure':
                        Movie_Genre             Movie_Keywords  \
1  Adventure Action Science Fiction      android galaxy hermit   
4                             Drama  male nudity female nudity   
3              Comedy Drama Romance     vietnam veteran hippie   
2                  Animation Family    father son relationship   
0                      Crime Comedy       hotel new year's eve   

                       Movie_Tagline                     Movie_Cast  \
1                 A long time ago...      Mark Hamill Harrison Ford   
4                       Look closer.    Kevin Spacey Annette Bening   
3  The world will never be the same.         Tom Hanks Robin Wright   
2     There are 3.7 trillion fish...  Albert Brooks Ellen DeGeneres   
0          Twelve outrageous guests.      Tim Roth Antonio Banderas   

    Movie_Director  
1     George Lucas  
4       Sam Mendes  
3  Robert Zemeckis  
2   Andrew Stanton  
0   Allison Anders  
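
In the output above, only the first row shares any term with the query; the other four have a similarity of zero and appear only because the function always returns five rows. A possible variant, sketched under the assumption that zero-score rows should be hidden. The function name `find_similar_movies_nonzero`, the `top_n` parameter, and the `Score` column are illustrative additions, and the three-row frame is a hypothetical stand-in for `df`:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def find_similar_movies_nonzero(user_input, tfidf_matrix, tfidf, df, top_n=5):
    """Like find_similar_movies, but drop rows that share no term with the query."""
    scores = cosine_similarity(tfidf.transform([user_input]), tfidf_matrix)[0]
    order = scores.argsort()[::-1][:top_n]   # best scores first
    keep = order[scores[order] > 0]          # hide unrelated rows
    return df.iloc[keep].assign(Score=scores[keep])

# Hypothetical sample for illustration.
sample = pd.DataFrame({
    "Movie_Genre": ["Crime Comedy", "Adventure Action Science Fiction", "Animation Family"],
    "Movie_Keywords": ["hotel new year's eve", "android galaxy hermit", "father son relationship"],
})
vec = TfidfVectorizer()
mat = vec.fit_transform(sample["Movie_Genre"] + " " + sample["Movie_Keywords"])

result = find_similar_movies_nonzero("galaxy android adventure", mat, vec, sample)
print(result)
```

A query with no known terms then yields an empty frame, which the caller can report as "no similar movies found".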


In [13]:
import difflib

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Favourite_Movie_Name = input('Enter your favourite movie name: ')

# Reload the full dataset, because `df` now holds only the five-row sample.
original_df = pd.read_csv('/content/Movies Recommendation (1).csv')  # Replace with the actual path
# Reset the index after cleaning so row labels match positions in the similarity matrix.
original_df = original_df.dropna().drop_duplicates().reset_index(drop=True)

# Get the closest match for the movie title from the original dataset
All_Movies_Title_List = original_df['Movie_Title'].tolist()
Movie_Recommendation = difflib.get_close_matches(Favourite_Movie_Name, All_Movies_Title_List)

if Movie_Recommendation:
    Close_Match = Movie_Recommendation[0]
    print(f"Did you mean: {Close_Match}?")

    # Recompute similarity on the full dataset; the earlier `cosine_sim` covers only the sample.
    # Assumption: the CSV contains the same five text columns as the sample.
    feature_columns = ["Movie_Genre", "Movie_Keywords", "Movie_Tagline", "Movie_Cast", "Movie_Director"]
    combined_text = original_df[feature_columns].astype(str).agg(" ".join, axis=1)
    full_cosine_sim = cosine_similarity(TfidfVectorizer().fit_transform(combined_text))

    # Position of the closest match (equal to its label after reset_index)
    Index_of_Close_Match_Movie = original_df[original_df.Movie_Title == Close_Match].index[0]

    Recommendation_Score = list(enumerate(full_cosine_sim[Index_of_Close_Match_Movie]))
    Sorted_Similar_Movies = sorted(Recommendation_Score, key=lambda x: x[1], reverse=True)

    print("\nTop 10 Movies Recommended for You:\n")
    for i, (index, score) in enumerate(Sorted_Similar_Movies[:10], start=1):
        print(f"{i}. {original_df.loc[index, 'Movie_Title']}")
else:
    print("Sorry, no close matches found for the entered movie title.")

Enter your favourite movie name: Look Closer
Sorry, no close matches found for the entered movie title.
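
One plausible reason for this result: "Look Closer" resembles the tagline "Look closer." in the sample data rather than any title, and `get_close_matches` only returns candidates whose `SequenceMatcher` ratio reaches the cutoff (0.6 by default). A small sketch, assuming the five titles from the `df.head()` output as a stand-in for the full title list, that prints the ratio for each title:

```python
import difflib

# Stand-in list; the notebook uses every value in the Movie_Title column.
titles = ["Four Rooms", "Star Wars", "Finding Nemo", "Forrest Gump", "American Beauty"]
query = "Look Closer"

# Similarity ratio between the query and each title (1.0 means identical).
ratios = {t: difflib.SequenceMatcher(None, query, t).ratio() for t in titles}
for title, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{title:<16} {ratio:.2f}")

# With the default cutoff, none of these titles qualifies.
print(difflib.get_close_matches(query, titles))
```

On these five titles every ratio stays below 0.6, so the list of matches is empty. Lowering `cutoff` would return looser matches, at the cost of more false positives.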
