# Content-Based Movie Recommendation System

Welcome to this Movie Recommendation System built using **Content-Based Filtering**!

### Overview
This project recommends movies similar to a favorite movie that the user enters. It combines each movie's metadata (genres, keywords, tagline, cast, and director) into a single text field, converts that text to TF-IDF vectors, and ranks the other titles by **cosine similarity**.

### Key Features
- Accepts user input with **typos**, **partial names**, and inconsistent spacing or capitalization.
- Uses **fuzzy string matching** (`difflib.get_close_matches`) to find the intended movie title.
- Falls back to a **substring search** when fuzzy matching returns no relevant title.
- Prints the **top 30 recommendations**, ranked by similarity score (the matched movie itself is listed first).
- Computes the similarity matrix once, so each lookup only needs to sort a single row.

### Tech Stack
- `pandas` for data handling
- `difflib` for fuzzy matching
- `scikit-learn` (`TfidfVectorizer`, `cosine_similarity`) for building the precomputed cosine similarity matrix used in content-based filtering

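### How it Works
1. The user enters a movie name (e.g., "Harry Potter" or "The Polar Express").
2. The input and all movie titles are normalized (lowercased, spaces removed).
3. A fuzzy match is attempted against all normalized titles.
4. If no fuzzy match contains the input, a substring search over all titles is used instead.
5. If a title is matched, the most similar movies are shown, ranked by cosine similarity; otherwise "No matching movie found." is printed.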
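
The matching steps above can be sketched as one small function. This is a minimal illustration, not the notebook's exact code: the helper names `normalize` and `find_title` and the three-title list are hypothetical, and the `cutoff=0.5` default simply mirrors the value used in the final cell.

```python
import difflib


def normalize(text):
    """Lowercase the text and drop spaces, as in the normalization step."""
    return text.strip().lower().replace(" ", "")


def find_title(query, titles, cutoff=0.5):
    """Return the first matching original title, or None if nothing matches."""
    key = normalize(query)
    normalized = [normalize(t) for t in titles]

    # Fuzzy match, keeping only candidates that contain the query.
    for candidate in difflib.get_close_matches(key, normalized, n=5, cutoff=cutoff):
        if key in candidate:
            return titles[normalized.index(candidate)]

    # Fallback: plain substring search over every title.
    for title, norm in zip(titles, normalized):
        if key in norm:
            return title
    return None


# Hypothetical toy list; the notebook uses the full `title` column.
titles = ["The Polar Express", "The Express", "Harry Potter and the Goblet of Fire"]
print(find_title("polar express", titles))
print(find_title("Harry Potter", titles))
print(find_title("zzzz", titles))
```

Returning `None` instead of printing keeps the lookup separate from the user-facing message, so the caller decides what to show when nothing matches.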



In [1]:
import pandas as pd
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# **Data Collection and Pre-Processing**

In [4]:
#loading the data from csv file to pandas dataframe
movies_data=pd.read_csv('/content/sample_data/movies.csv')

#printing the first 5 rows of the dataframe
movies_data.head()


Unnamed: 0,index,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,cast,crew,director
0,0,237000000,Action Adventure Fantasy Science Fiction,http://www.avatarmovie.com/,19995,culture clash future space war space colony so...,en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,Sam Worthington Zoe Saldana Sigourney Weaver S...,"[{'name': 'Stephen E. Rivkin', 'gender': 0, 'd...",James Cameron
1,1,300000000,Adventure Fantasy Action,http://disney.go.com/disneypictures/pirates/,285,ocean drug abuse exotic island east india trad...,en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,Johnny Depp Orlando Bloom Keira Knightley Stel...,"[{'name': 'Dariusz Wolski', 'gender': 2, 'depa...",Gore Verbinski
2,2,245000000,Action Adventure Crime,http://www.sonypictures.com/movies/spectre/,206647,spy based on novel secret agent sequel mi6,en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,Daniel Craig Christoph Waltz L\u00e9a Seydoux ...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes
3,3,250000000,Action Crime Drama Thriller,http://www.thedarkknightrises.com/,49026,dc comics crime fighter terrorist secret ident...,en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,Christian Bale Michael Caine Gary Oldman Anne ...,"[{'name': 'Hans Zimmer', 'gender': 2, 'departm...",Christopher Nolan
4,4,260000000,Action Adventure Science Fiction,http://movies.disney.com/john-carter,49529,based on novel mars medallion space travel pri...,en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,Taylor Kitsch Lynn Collins Samantha Morton Wil...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton


In [5]:
# finding the number of rows and columns in the dataframe
movies_data.shape

(4803, 24)

In [6]:
# Feature Selection for recommendation
selected_features=['genres','keywords','tagline','cast','director']
print(selected_features)

['genres', 'keywords', 'tagline', 'cast', 'director']


In [7]:
# replacing the null values with an empty string

for feature in selected_features:
  movies_data[feature]=movies_data[feature].fillna('')


In [8]:
# Feature extraction

combined_features=movies_data['genres']+' '+movies_data['keywords']+' '+movies_data['tagline']+' '+movies_data['cast']+' '+movies_data['director']

In [9]:
print(combined_features)

0       Action Adventure Fantasy Science Fiction cultu...
1       Adventure Fantasy Action ocean drug abuse exot...
2       Action Adventure Crime spy based on novel secr...
3       Action Crime Drama Thriller dc comics crime fi...
4       Action Adventure Science Fiction based on nove...
                              ...                        
4798    Action Crime Thriller united states\u2013mexic...
4799    Comedy Romance  A newlywed couple's honeymoon ...
4800    Comedy Drama Romance TV Movie date love at fir...
4801      A New Yorker in Shanghai Daniel Henney Eliza...
4802    Documentary obsession camcorder crush dream gi...
Length: 4803, dtype: object


In [10]:
# converting the text data to feature vectors
vectorizer=TfidfVectorizer()
feature_vectors=vectorizer.fit_transform(combined_features)
print(feature_vectors)

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 124266 stored elements and shape (4803, 17318)>
  Coords	Values
  (0, 201)	0.07860022416510505
  (0, 274)	0.09021200873707368
  (0, 5274)	0.11108562744414445
  (0, 13599)	0.1036413987316636
  (0, 5437)	0.1036413987316636
  (0, 3678)	0.21392179219912877
  (0, 3065)	0.22208377802661425
  (0, 5836)	0.1646750903586285
  (0, 14378)	0.33962752210959823
  (0, 16587)	0.12549432354918996
  (0, 3225)	0.24960162956997736
  (0, 14271)	0.21392179219912877
  (0, 4945)	0.24025852494110758
  (0, 15261)	0.07095833561276566
  (0, 16998)	0.1282126322850579
  (0, 11192)	0.09049319826481456
  (0, 11503)	0.27211310056983656
  (0, 13349)	0.15021264094167086
  (0, 17007)	0.23643326319898797
  (0, 17290)	0.20197912553916567
  (0, 13319)	0.2177470539412484
  (0, 14064)	0.20596090415084142
  (0, 16668)	0.19843263965100372
  (0, 14608)	0.15150672398763912
  (0, 8756)	0.22709015857011816
  :	:
  (4801, 403)	0.17727585190343229
  (4801, 4835)	0.247137650

# **Cosine Similarity**

In [11]:
# getting similarity scores using cosine similarity
similarity = cosine_similarity(feature_vectors)

print(similarity)

[[1.         0.07219487 0.037733   ... 0.         0.         0.        ]
 [0.07219487 1.         0.03281499 ... 0.03575545 0.         0.        ]
 [0.037733   0.03281499 1.         ... 0.         0.05389661 0.        ]
 ...
 [0.         0.03575545 0.         ... 1.         0.         0.02651502]
 [0.         0.         0.05389661 ... 0.         1.         0.        ]
 [0.         0.         0.         ... 0.02651502 0.         1.        ]]


In [12]:
print(similarity.shape)

(4803, 4803)
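
To make the entries of this matrix less opaque, here is a minimal, dependency-free sketch of cosine similarity. It is not what `cosine_similarity` does internally, and it uses raw word counts rather than the TF-IDF weights computed above; the helper name `cosine` and the three short feature strings are invented for illustration.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between two texts using raw word counts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)  # Counter gives 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)


# Hypothetical combined-feature strings, not rows from movies.csv.
fantasy_1 = "adventure fantasy magic wizard school"
fantasy_2 = "adventure fantasy magic dragon"
western = "western drama sheriff"

print(cosine(fantasy_1, fantasy_1))
print(cosine(fantasy_1, fantasy_2))
print(cosine(fantasy_1, western))
```

A text compared with itself scores 1 up to floating-point rounding, and texts with no shared words score 0, which matches the diagonal of ones and the many zeros in the matrix printed above.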


In [18]:
# retrieving the movie name from user

movie_name = input(' Enter your favourite movie name: ')

 Enter your favourite movie name: Harry Potter


In [65]:
# creating a list with all the movies given in the dataset

list_of_all_titles = movies_data['title'].tolist()
print(list_of_all_titles)

['Avatar', "Pirates of the Caribbean: At World's End", 'Spectre', 'The Dark Knight Rises', 'John Carter', 'Spider-Man 3', 'Tangled', 'Avengers: Age of Ultron', 'Harry Potter and the Half-Blood Prince', 'Batman v Superman: Dawn of Justice', 'Superman Returns', 'Quantum of Solace', "Pirates of the Caribbean: Dead Man's Chest", 'The Lone Ranger', 'Man of Steel', 'The Chronicles of Narnia: Prince Caspian', 'The Avengers', 'Pirates of the Caribbean: On Stranger Tides', 'Men in Black 3', 'The Hobbit: The Battle of the Five Armies', 'The Amazing Spider-Man', 'Robin Hood', 'The Hobbit: The Desolation of Smaug', 'The Golden Compass', 'King Kong', 'Titanic', 'Captain America: Civil War', 'Battleship', 'Jurassic World', 'Skyfall', 'Spider-Man 2', 'Iron Man 3', 'Alice in Wonderland', 'X-Men: The Last Stand', 'Monsters University', 'Transformers: Revenge of the Fallen', 'Transformers: Age of Extinction', 'Oz: The Great and Powerful', 'The Amazing Spider-Man 2', 'TRON: Legacy', 'Cars 2', 'Green Lant

First method for finding a close match for the movie name

In [66]:
# finding the close match for the movie name given by the user

find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
print(find_close_match)

['The Polar Express', 'The Express', 'Fortress']


In [67]:
close_match = find_close_match[0]
print(close_match)

The Polar Express


In [68]:
# Finding the index of the movie with title

index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]
print(index_of_the_movie)

90


In [69]:
similarity_score = list(enumerate(similarity[index_of_the_movie]))
print(similarity_score)

[(0, np.float64(0.019902191660509116)), (1, np.float64(0.024273916011565772)), (2, np.float64(0.008092212818035082)), (3, np.float64(0.030210226070149077)), (4, np.float64(0.007701648625204321)), (5, np.float64(0.020617112390809506)), (6, np.float64(0.03399323232470618)), (7, np.float64(0.021867165310835907)), (8, np.float64(0.058002896784536975)), (9, np.float64(0.019626649101844994)), (10, np.float64(0.02238538375854609)), (11, np.float64(0.00702619386980172)), (12, np.float64(0.020321153926143573)), (13, np.float64(0.0072913128564804355)), (14, np.float64(0.06397518008250024)), (15, np.float64(0.027784384969715335)), (16, np.float64(0.020734721016824262)), (17, np.float64(0.019439542836616995)), (18, np.float64(0.009974013026479519)), (19, np.float64(0.032284605946735725)), (20, np.float64(0.019819588973721984)), (21, np.float64(0.006637831334319789)), (22, np.float64(0.033623495764040986)), (23, np.float64(0.017209570921363077)), (24, np.float64(0.05739956154335481)), (25, np.float

Second (preferred) method for finding a good match for the movie name

In [39]:

# Step 1: Get user input
movie_name = input("Enter your favourite movie name: ")

# Step 2: Create list of movie titles
list_of_all_titles = movies_data['title'].tolist()

# Step 3: Try finding close matches using difflib
close_matches = difflib.get_close_matches(movie_name, list_of_all_titles, n=5, cutoff=0.6)

# Step 4: Check if any close matches contain the user's input as a substring
relevant_matches = [match for match in close_matches if movie_name.lower() in match.lower()]

# Step 5: If no relevant match is found, do a full substring search
if not relevant_matches:
    relevant_matches = [title for title in list_of_all_titles if movie_name.lower() in title.lower()]

# Step 6: Print results
if relevant_matches:
    print("Did you mean:")
    for title in relevant_matches:
        print("-", title)
else:
    print("No matching movie found.")


Enter your favourite movie name: Harry Potter
Did you mean:
- Harry Potter and the Half-Blood Prince
- Harry Potter and the Order of the Phoenix
- Harry Potter and the Goblet of Fire
- Harry Potter and the Prisoner of Azkaban
- Harry Potter and the Philosopher's Stone
- Harry Potter and the Chamber of Secrets
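
`get_close_matches` was called with `n=5`, yet six titles are printed, so this list came from the substring search in Step 5. A likely reason the fuzzy step found nothing relevant is sketched below: `difflib` scores candidates with `SequenceMatcher.ratio()`, defined as 2*M/T (M matched characters, T combined length of both strings), so a short query scores low against a long title even when it is an exact prefix. The single-title list is only for illustration.

```python
import difflib

query = "Harry Potter"
title = "Harry Potter and the Half-Blood Prince"

# ratio() = 2 * M / T: M matched characters, T combined length of both strings.
ratio = difflib.SequenceMatcher(None, query, title).ratio()
print(round(ratio, 2))

# The score is below the 0.6 cutoff, so difflib alone does not return this title.
print(difflib.get_close_matches(query, [title], n=5, cutoff=0.6))

# The substring test used by the fallback still succeeds.
print(query.lower() in title.lower())
```

This is why the substring fallback matters for partial names: lowering the cutoff would also admit unrelated titles, whereas the substring check only accepts titles that actually contain the input.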


In [40]:
close_match = relevant_matches[0]
print(close_match)

Harry Potter and the Half-Blood Prince


In [41]:
# Finding the index of the movie with title

index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]
print(index_of_the_movie)

8


In [42]:
similarity_score = list(enumerate(similarity[index_of_the_movie]))
print(similarity_score)

[(0, np.float64(0.02960930964063025)), (1, np.float64(0.04056180271185781)), (2, np.float64(0.034842970901094186)), (3, np.float64(0.03210611152276799)), (4, np.float64(0.008184976474383678)), (5, np.float64(0.0306729266348316)), (6, np.float64(0.050353160101363495)), (7, np.float64(0.008484408956417163)), (8, np.float64(1.0000000000000002)), (9, np.float64(0.020858347217226243)), (10, np.float64(0.02379021016800665)), (11, np.float64(0.03025296962103282)), (12, np.float64(0.05764859285467415)), (13, np.float64(0.007748889504301472)), (14, np.float64(0.03203559400312493)), (15, np.float64(0.029528033334071554)), (16, np.float64(0.00804502322103816)), (17, np.float64(0.02065949883376596)), (18, np.float64(0.040759163868134746)), (19, np.float64(0.025677162630943932)), (20, np.float64(0.060921821968521576)), (21, np.float64(0.00705439782523095)), (22, np.float64(0.026742032112101036)), (23, np.float64(0.040469060646362895)), (24, np.float64(0.015909354241301514)), (25, np.float64(0.0)), 

In [43]:
len(similarity_score)

4803

In [44]:
# sorting the movies based on the similarity score

sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True)
print(sorted_similar_movies)

[(8, np.float64(1.0000000000000002)), (114, np.float64(0.5456239713589985)), (113, np.float64(0.5083315474915406)), (197, np.float64(0.4689770509379537)), (276, np.float64(0.35381853516916395)), (191, np.float64(0.31722581320954696)), (331, np.float64(0.15334943778902627)), (995, np.float64(0.14546475582122728)), (3290, np.float64(0.13273955684096359)), (1849, np.float64(0.1320426272424173)), (3599, np.float64(0.12030840676604637)), (335, np.float64(0.11655563877706486)), (728, np.float64(0.109839223282379)), (3681, np.float64(0.10934565798794096)), (1686, np.float64(0.1085620479429687)), (2568, np.float64(0.10737480711241784)), (37, np.float64(0.1028358014453989)), (390, np.float64(0.1022901535043444)), (80, np.float64(0.10061863744012267)), (3670, np.float64(0.10048668644970972)), (189, np.float64(0.09925364377614182)), (2366, np.float64(0.09600495834154432)), (38, np.float64(0.09578653678763513)), (3125, np.float64(0.09535031034376198)), (305, np.float64(0.09469916520828657)), (743,

In [47]:
# print the titles of the 20 most similar movies, looked up by row position

print('Recommended movies for you!!!: \n')
for i, (index, _) in enumerate(sorted_similar_movies[:20], start=1):
  title_from_index = movies_data['title'].iloc[index]
  print(i, title_from_index)

Recommended movies for you!!!: 

1 Harry Potter and the Half-Blood Prince
2 Harry Potter and the Goblet of Fire
3 Harry Potter and the Order of the Phoenix
4 Harry Potter and the Philosopher's Stone
5 Harry Potter and the Chamber of Secrets
6 Harry Potter and the Prisoner of Azkaban
7 Seventh Son
8 Beautiful Creatures
9 Wild Target
10 Nanny McPhee
11 Lone Star
12 Rise of the Planet of the Apes
13 Hellboy
14 Driving Lessons
15 The Borrowers
16 The Craft
17 Oz: The Great and Powerful
18 Hotel Transylvania
19 Snow White and the Huntsman
20 Running Forever
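
Sorting all 4803 scores works, but only the first few are ever printed, and the matched movie always ranks first because its similarity with itself is 1. As an optional alternative (not what the notebook does), `heapq.nlargest` from the standard library can pick the top N directly while skipping the matched movie. The helper name `top_similar` and the scores below are made up for illustration.

```python
import heapq


def top_similar(scores, movie_index, n=3):
    """Return the n highest (index, score) pairs, excluding movie_index."""
    candidates = (
        (index, score) for index, score in enumerate(scores) if index != movie_index
    )
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])


# Hypothetical similarity row for the movie at position 2.
row = [0.10, 0.55, 1.00, 0.05, 0.47, 0.31]
for rank, (index, score) in enumerate(top_similar(row, movie_index=2), start=1):
    print(rank, index, score)
```

In the notebook, `scores` would be `similarity[index_of_the_movie]` and each returned index would be mapped back to a title.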


## **Movie Recommendation System**

In [63]:
import difflib


movie_name = input('Enter your favourite movie name: ').strip().lower().replace(" ", "")  # removing the spaces between words

# Normalize all movie titles
movies_data['normalized_title'] = movies_data['title'].apply(lambda x: x.lower().replace(" ", ""))
list_of_all_titles = movies_data['normalized_title'].tolist()

close_matches = difflib.get_close_matches(movie_name, list_of_all_titles, n=5, cutoff=0.5)

relevant_matches = []
for match in close_matches:
    original_title = movies_data[movies_data['normalized_title'] == match]['title'].values[0]
    if movie_name in match:
        relevant_matches.append(original_title)

if not relevant_matches:
    for index, row in movies_data.iterrows():
        if movie_name in row['normalized_title']:
            relevant_matches.append(row['title'])

if not relevant_matches:
    print("No matching movie found.")
else:
    close_match = relevant_matches[0]
    index_of_the_movie = movies_data[movies_data.title == close_match]['index'].values[0]

    similarity_score = list(enumerate(similarity[index_of_the_movie]))
    sorted_similar_movies = sorted(similarity_score, key=lambda x: x[1], reverse=True)

    print('Recommended movies for you!!!: \n')
    # show the 30 most similar movies (the matched movie itself comes first)
    for i, (index, _) in enumerate(sorted_similar_movies[:30], start=1):
        title_from_index = movies_data['title'].iloc[index]
        print(i, title_from_index)


Enter your favourite movie name: Polar Express
Recommended movies for you!!!: 

1 The Polar Express
2 The Santa Clause
3 Elf
4 Rise of the Guardians
5 A Christmas Carol
6 The Santa Clause 2
7 How the Grinch Stole Christmas
8 Arthur Christmas
9 Jingle All the Way
10 Krampus
11 Cast Away
12 Black Christmas
13 The Gift
14 Patch Adams
15 Larry Crowne
16 Toy Story 3
17 That Thing You Do!
18 Forrest Gump
19 Toy Story
20 Running Forever
21 Nine
22 Back to the Future Part III
23 Alvin and the Chipmunks: Chipwrecked
24 Pixels
25 Open Range
26 The Golden Child
27 The Book of Mormon Movie, Volume 1: The Journey
28 Butterfly Girl
29 Beowulf
30 An American Haunting
