**üé¨ INTRODUCTION:**

This is a movie recommendation system based on content similarity,also  
it helps users discover similar movies using metadata like genres, cast, and director

**üéØ OBJECTIVE:**

To suggest the top 30 most similar movies to a user-given input movie,  
optionally filtered by language and genre for better personalization


**üìÇ DATASET:**

The dataset contains movie details such as title, genre, director, cast, rating, etc...
It is read from a CSV file uploaded by the user(movies_data.csv)

**üß† CONCEPT USED:**

Content-based filtering using TF-IDF (Term Frequency-Inverse Document Frequency)  
and cosine similarity to measure how similar one movie is to another

**üì§ OUTPUT:**

User enters a movie name, preferred language, and genre and  the system displays 30 similar movies with title, rating, and popularity

In [2]:
#  STEP 1: Upload and Load the CSV File
from google.colab import files
import pandas as pd

# Let user upload the dataset manually
uploaded = files.upload()

# Get the name of the first uploaded file
filename = list(uploaded.keys())[0]

# Load the CSV into a DataFrame
movies_data = pd.read_csv(filename)

print("‚úÖ Uploaded:", filename)
print("üìÑ Columns in dataset:", movies_data.columns.tolist())


Saving movies(final).csv to movies(final).csv
‚úÖ Uploaded: movies(final).csv
üìÑ Columns in dataset: ['index', 'budget', 'genres', 'homepage', 'id', 'keywords', 'original_language', 'original_title', 'overview', 'popularity', 'production_companies', 'production_countries', 'release_date', 'revenue', 'runtime', 'spoken_languages', 'status', 'tagline', 'title', 'vote_average', 'vote_count', 'cast', 'crew', 'director']


In [3]:


# üß† STEP 2: Text Processing for Content-Based Filtering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Fill missing values with empty strings to avoid errors
for col in ['genres', 'director', 'cast', 'original_language', 'title']:
    if col in movies_data.columns:
        movies_data[col] = movies_data[col].fillna('')
    else:
        movies_data[col] = ''

# Combine genres, director, and cast into one column for similarity comparison
movies_data['combined'] = (
    movies_data['genres'] + ' ' +
    movies_data['director'] + ' ' +
    movies_data['cast']
)

# Convert text to numerical features using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english')
feature_vectors = vectorizer.fit_transform(movies_data['combined'])

# Compute cosine similarity between all movies
similarity = cosine_similarity(feature_vectors)

print("‚úÖ Similarity matrix created.")

# Convert similarity matrix to a DataFrame with movie titles
similarity_df = pd.DataFrame(similarity, index=movies_data['title'], columns=movies_data['title'])

# Show the top-left 5x5 part
similarity_df.iloc[:5, :5]

‚úÖ Similarity matrix created.


title,Avatar,Pirates of the Caribbean: At World's End,Spectre,The Dark Knight Rises,John Carter
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Avatar,1.0,0.047347,0.071842,0.013833,0.069439
Pirates of the Caribbean: At World's End,0.047347,1.0,0.023552,0.011615,0.023398
Spectre,0.071842,0.023552,1.0,0.031476,0.025788
The Dark Knight Rises,0.013833,0.011615,0.031476,1.0,0.012717
John Carter,0.069439,0.023398,0.025788,0.012717,1.0


In [12]:
# üé¨ STEP 3: Get User Input and Find Closest Matching Movie
import difflib

# Ask the user for a movie name
movie_name = input("\nüé• Enter your favourite movie: ")

# Get all movie titles from the dataset
all_titles = movies_data['title'].tolist()

# Use difflib to find close matches to the entered name
matches = difflib.get_close_matches(movie_name, all_titles)

# If no match is found
if not matches:
    print("‚ùå No close match found.")
else:
    # Pick the best matching title
    close_match = matches[0]
    movie_index = movies_data[movies_data['title'] == close_match].index[0]
    print(f"\n‚úÖ Exact Match Found: {close_match}\n")



üé• Enter your favourite movie: furious 7

‚úÖ Exact Match Found: Furious 7



In [13]:
# üìÑ STEP 4: Show Info About the Selected Movie
selected = movies_data.iloc[movie_index]

print(f"üé¨ Title: {selected['title']}")
print(f"üé≠ Cast: {selected['cast'][:100]}{'...' if len(selected['cast']) > 100 else ''}")
print(f"üé¨ Director: {selected['director']}")
print(f"üìù Tagline: {selected['tagline'] if 'tagline' in movies_data.columns and pd.notna(selected['tagline']) else 'N/A'}\n")

print("üìñ Overview:")
print(selected['overview'][:300] + "...\n" if 'overview' in movies_data.columns and pd.notna(selected['overview']) else "No overview available.\n")

print(f"‚≠ê Rating: {selected['vote_average'] if 'vote_average' in movies_data.columns else 'N/A'} /10")
print(f"üî• Popularity: {selected['popularity'] if 'popularity' in movies_data.columns else 'N/A'}")
print(f"üí∞ Budget: ${int(selected['budget']) if 'budget' in movies_data.columns and pd.notna(selected['budget']) else 'N/A'}")
print(f"üïê Runtime: {selected['runtime'] if 'runtime' in movies_data.columns else 'N/A'} min")
print(f"üó£Ô∏è Language: {selected['original_language']}")
print(f"üìÖ Release Date: {selected['release_date'] if 'release_date' in movies_data.columns else 'N/A'}")
print(f"üé≠ Genres: {selected['genres']}\n")


üé¨ Title: Furious 7
üé≠ Cast: Vin Diesel Paul Walker Dwayne Johnson Michelle Rodriguez Tyrese Gibson
üé¨ Director: James Wan
üìù Tagline: Vengeance Hits Home

üìñ Overview:
Deckard Shaw seeks revenge against Dominic Toretto and his family for his comatose brother....

‚≠ê Rating: 7.3 /10
üî• Popularity: 102.322217
üí∞ Budget: $190000000
üïê Runtime: 137.0 min
üó£Ô∏è Language: en
üìÖ Release Date: 2015-04-01
üé≠ Genres: Action



In [14]:
# üéØ STEP 5: Ask for Optional Filters (Language / Genre)
preferred_lang = input("üåê Enter preferred language (or press Enter to skip): ").lower()
preferred_genre = input("üéØ Enter preferred genre (or press Enter to skip): ").lower()


üåê Enter preferred language (or press Enter to skip): english
üéØ Enter preferred genre (or press Enter to skip): fantasy


In [15]:
# üßÆ STEP 6: Score Similar Movies Based on Similarity √ó Rating

# Fill missing ratings with a neutral value (5.0)
if 'vote_average' in movies_data.columns:
    movies_data['vote_average'] = movies_data['vote_average'].fillna(5.0)
else:
    movies_data['vote_average'] = 5.0

# Get similarity scores for the selected movie
similarity_scores = list(enumerate(similarity[movie_index]))

# Multiply similarity by rating to prioritize well-rated similar movies
scored_movies = [
    (i, score * float(movies_data.iloc[i]['vote_average']))
    for i, score in similarity_scores
]

# Sort the movies by final score (high to low)
sorted_movies = sorted(scored_movies, key=lambda x: x[1], reverse=True)


In [16]:
# üìã STEP 7: Display Function for Recommended Movies
def display_movies(movies_to_display, msg="üéØ TOP 30 SIMILAR MOVIES:\n"):
    print("\n" + msg)
    print(f"{'No.':<4} {'Movie Title':<35} ‚≠ê {'Rating':<8} üî• Popularity")
    print("-" * 65)
    for count, (idx, final_score) in enumerate(movies_to_display, 1):
        if idx == movie_index:
            continue  # Skip the selected movie itself
        row = movies_data.iloc[idx]
        title = row['title']
        rating = row['vote_average']
        popularity = row['popularity'] if 'popularity' in movies_data.columns else 'N/A'
        print(f"{count:<4} {title[:33]:<35} ‚≠ê {rating:<8} üî• {popularity}")
        if count == 30:
            break


In [17]:
# üßæ STEP 9: Show Final Recommendations
filtered=[]
if len(filtered) == 0:
    print("\n‚ùå No recommendations matched the filters, showing top 30 without filters instead.")
    display_movies(sorted_movies)
else:
    display_movies(filtered)

print("\n‚úÖ Done.")



‚ùå No recommendations matched the filters, showing top 30 without filters instead.

üéØ TOP 30 SIMILAR MOVIES:

No.  Movie Title                         ‚≠ê Rating   üî• Popularity
-----------------------------------------------------------------
2    Fast Five                           ‚≠ê 7.1      üî• 7.255717999999999
3    The Fast and the Furious            ‚≠ê 6.6      üî• 6.909942
4    2 Fast 2 Furious                    ‚≠ê 6.2      üî• 10.520961
5    Guardians of the Galaxy             ‚≠ê 7.9      üî• 481.098624
6    Saving Private Ryan                 ‚≠ê 7.9      üî• 76.04186700000002
7    Death Race                          ‚≠ê 6.0      üî• 42.57877
8    Machete Kills                       ‚≠ê 5.3      üî• 29.072964
9    Machete                             ‚≠ê 6.3      üî• 26.396191
10   Find Me Guilty                      ‚≠ê 6.5      üî• 12.321302
11   Four Brothers                       ‚≠ê 6.7      üî• 24.694551
12   Transformers                        ‚≠ê