# Movie Recommendation System using Python

### Introduction

Almost everyone enjoys movies, whether fantasy, horror, crime, romance, thriller, comedy, drama, action, or another genre, and most people favor one or more of these genres. Disney+ offers titles across many genres, so I built a movie recommendation system that suggests titles based on the title a user searches for, using data from Disney+.

### What Is a Recommendation System?

In simple terms, a recommendation system is a filtering program whose primary purpose is to predict a user's "rating" of, or "preference" for, items in a specific domain. In this project the items are movies, so the goal of the recommendation system is to filter the catalog and predict which movies the user is likely to enjoy, based on information the user provides.

## Importing Relevant Python Libraries 

In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

## Data Collection

In [23]:
dataset = pd.read_csv('disney_plus_projects.csv')

## Data Pre-Processing 

In [24]:
#add index column
dataset['index'] = dataset.index

In [4]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7850 entries, 0 to 7849
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   title          7850 non-null   object 
 1   year           7703 non-null   object 
 2   certificate    7850 non-null   object 
 3   runtime_min    7850 non-null   int64  
 4   genre          7850 non-null   object 
 5   rating         7850 non-null   float64
 6   votes          7850 non-null   int64  
 7   director_star  7850 non-null   object 
 8   index          7850 non-null   int64  
dtypes: float64(1), int64(3), object(5)
memory usage: 552.1+ KB


In [5]:
dataset.head(5)

|   | title | year | certificate | runtime_min | genre | rating | votes | director_star | index |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The King's Man | 2021 | R | 131 | Action, Adventure, Thriller | 6.4 | 90344 | Matthew Vaughn | 0 |
| 1 | West Side Story | 2021 | PG-13 | 156 | Crime, Drama, Musical | 7.5 | 46778 | Steven Spielberg | 1 |
| 2 | The Walking Dead | 2010–2022 | TV-14 | 44 | Drama, Horror, Thriller | 8.3 | 934972 | Andrew Lincoln | 2 |
| 3 | Free Guy | 2021 | PG-13 | 115 | Action, Adventure, Comedy | 7.2 | 303127 | Shawn Levy | 3 |
| 4 | Pam & Tommy | 2022 | TV-MA | 340 | Biography, Drama, Romance | 7.4 | 16576 | Lily James | 4 |


In [6]:
dataset.duplicated()

0       False
1       False
2       False
3       False
4       False
        ...  
7845    False
7846    False
7847    False
7848    False
7849    False
Length: 7850, dtype: bool
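
The boolean Series above is hard to read at a glance. As a minimal sketch on made-up toy data (the `toy` frame and its values are illustrative, not taken from the Disney+ file), summing the result gives a single count, and the `subset` argument checks for repeated titles only:

```python
import pandas as pd

# Toy data standing in for the Disney+ table (values are illustrative only).
toy = pd.DataFrame({
    "title": ["DuckTales", "DuckTales", "Andor", "DuckTales"],
    "year": ["1987", "2017", "2022", "1987"],
})

# Number of rows that exactly repeat an earlier row.
n_exact_duplicates = int(toy.duplicated().sum())

# Number of rows whose title repeats an earlier title, whatever the year.
n_repeated_titles = int(toy.duplicated(subset="title").sum())

print(n_exact_duplicates, n_repeated_titles)
```

A count of repeated titles is worth knowing here, because a title that appears in several rows can also appear several times in the recommendations.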

In [7]:
dataset.isnull().sum()

title              0
year             147
certificate        0
runtime_min        0
genre              0
rating             0
votes              0
director_star      0
index              0
dtype: int64

In [8]:
# fill missing values (only `year` has any) with the placeholder string 'NULL'
dataset.fillna('NULL', inplace=True)

In [9]:
# confirm that no missing values remain
dataset.isnull().sum()

title            0
year             0
certificate      0
runtime_min      0
genre            0
rating           0
votes            0
director_star    0
index            0
dtype: int64
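
Filling the gaps matters because the features are later joined as strings, and a missing value would make the whole joined string missing. The sketch below shows this on a hypothetical two-row frame. It fills with an empty string rather than `'NULL'`, which is an alternative assumed here for illustration, not what the notebook does. The `'NULL'` placeholder becomes a token shared by every row that lacks a year, which may slightly raise the similarity between those rows.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; the second year is missing.
toy = pd.DataFrame({
    "title": ["Free Guy", "Encanto"],
    "year": ["2021", np.nan],
})

# Without filling, a missing year makes the whole combined string missing.
combined_raw = toy["year"] + " " + toy["title"]

# Filling with an empty string keeps the title and adds no extra token.
combined_filled = toy["year"].fillna("") + " " + toy["title"]

print(combined_raw.isnull().sum())
print(combined_filled.tolist())
```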

In [10]:
dataset.shape

(7850, 9)

In [11]:
# select the features the recommender will use (a list keeps them in a fixed order)
selected_features = ["title", "year", "genre", "director_star"]

In [12]:
#combine all selected features
combined_features = dataset["year"]+ " "+dataset["title"]+ " "+dataset["genre"]+ " "+dataset["director_star"]

In [13]:
print(combined_features)

0       2021 The King's Man Action, Adventure, Thrille...
1       2021 West Side Story Crime, Drama, Musical    ...
2       2010–2022 The Walking Dead Drama, Horror, Thri...
3       2021 Free Guy Action, Adventure, Comedy       ...
4       2022 Pam & Tommy Biography, Drama, Romance    ...
                              ...                        
7845    2020 Rapunzel's Tangled Adventure Animation, A...
7846    2020–2021 Rapunzel's Tangled Adventure Animati...
7847    2020 Miraculous: Tales of Ladybug & Cat Noir A...
7848    2020–2021 Miraculous: Tales of Ladybug & Cat N...
7849    2020 Star Wars Resistance Animation, Action, A...
Length: 7850, dtype: object


In [14]:
# convert the combined text to TF-IDF feature vectors
vectorizer = TfidfVectorizer()
feature_vektor = vectorizer.fit_transform(combined_features)
print(feature_vektor)

  (0, 5059)	0.5768749217258283
  (0, 3154)	0.37294797150216497
  (0, 4837)	0.32420182095969047
  (0, 164)	0.1431577112058066
  (0, 148)	0.15214990192207364
  (0, 3084)	0.3501285996686016
  (0, 2731)	0.43642591857046065
  (0, 4813)	0.1344833941749473
  (0, 98)	0.2196160048250088
  (1, 4559)	0.4506512794939174
  (1, 4625)	0.34093334828925115
  (1, 3419)	0.2789868345259331
  (1, 1487)	0.11809862301675374
  (1, 1193)	0.24014908619567663
  (1, 4646)	0.32730353383948246
  (1, 4442)	0.47125193580832453
  (1, 5183)	0.41543422491518794
  (1, 98)	0.17156272502823444
  (2, 2946)	0.4715880870394788
  (2, 289)	0.3411765412738537
  (2, 2353)	0.30692165339005245
  (2, 1308)	0.4078929677872602
  (2, 5115)	0.4509727359610686
  (2, 99)	0.23168209056867792
  (2, 87)	0.23306254763027875
  :	:
  (7847, 311)	0.12413436623698348
  (7847, 164)	0.13841040636560617
  (7847, 148)	0.14710440378057094
  (7848, 5044)	0.3967004355071983
  (7848, 1196)	0.3920242387403992
  (7848, 3520)	0.32421209415001584
  (7848, 93

## Cosine Similarity

In [15]:
similarity = cosine_similarity(feature_vektor)
print(similarity)

[[1.         0.03767792 0.09630598 ... 0.04219644 0.08252762 0.05204583]
 [0.03767792 1.         0.01395723 ... 0.         0.03384459 0.        ]
 [0.09630598 0.01395723 1.         ... 0.         0.         0.        ]
 ...
 [0.04219644 0.         0.         ... 1.         0.69965603 0.12448738]
 [0.08252762 0.03384459 0.         ... 0.69965603 1.         0.11565748]
 [0.05204583 0.         0.         ... 0.12448738 0.11565748 1.        ]]


In [16]:
print(similarity.shape)

(7850, 7850)
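
To make the numbers in this matrix concrete, here is a minimal worked example of cosine similarity, the dot product of two vectors divided by the product of their lengths, checked against `cosine_similarity`. The vectors `a` and `b` are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two arbitrary example vectors.
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 1.0, 1.0])

# Cosine similarity by hand: dot product over the product of the norms.
manual = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The library version expects 2-D inputs, one row per vector.
library = float(cosine_similarity([a], [b])[0, 0])

print(manual, library)
```

A score of 1 means the two vectors point in the same direction, and 0 means they share no terms. The full matrix holds 7850 × 7850 float64 values, roughly 490 MB, so this approach is best suited to catalogs of about this size.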


In [17]:
# get the movie name from user input
movie_name = input("Which Disney movie do you want to watch: ")

Which Disney movie do you want to watch: Avengers: Endgame


In [21]:
#make a list of movies from the dataset
list_of_all_titles = dataset["title"].tolist()
print(list_of_all_titles)

["The King's Man", 'West Side Story', 'The Walking Dead', 'Free Guy', 'Pam & Tommy', 'No Exit', 'Fresh', 'Encanto', 'Death on the Nile', 'The Dropout', 'The French Dispatch', 'Morbius', 'The Book of Boba Fett', "Grey's Anatomy", 'Eternals', 'Moon Knight', 'This Is Us', 'Criminal Minds', "It's Always Sunny in Philadelphia", 'The Mandalorian', 'Dopesick', 'Castle', 'The Last Duel', 'How I Met Your Father', 'Turning Red', 'Modern Family', 'Big Sky', 'Daredevil', 'Avengers: Endgame', 'How I Met Your Mother', 'American Horror Story', 'Lost', 'The Simpsons', 'Bones', 'Family Guy', 'Snowfall', 'The Proud Family: Louder and Prouder', 'New Girl', 'Shang-Chi and the Legend of the Ten Rings', '9-1-1: Lone Star', 'Sons of Anarchy', 'Better Things', 'M*A*S*H', 'Abbott Elementary', 'Venom: Let There Be Carnage', 'Hawkeye', 'Antlers', 'Dollface', 'Star Wars: The Clone Wars', "Bob's Burgers", '9-1-1', 'Buffy the Vampire Slayer', 'The Resident', 'Homeland', 'Avengers: Infinity War', 'Only Murders in th

In [19]:
# find the titles closest to the user's input
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
print(find_close_match)

['Avengers: Endgame', 'Avengers: Infinity War', 'Avengers: Age of Ultron']


In [20]:
close_match = find_close_match[0]
print(close_match)

Avengers: Endgame
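
Taking element `[0]` raises an `IndexError` when `get_close_matches` finds nothing similar enough. The following sketch wraps the lookup in a hypothetical helper, `best_title_match` (not part of the notebook), that returns `None` in that case. The `n` and `cutoff` arguments are real parameters of `difflib.get_close_matches`, and the short `titles` list is illustrative.

```python
import difflib

# A short illustrative title list.
titles = ["Avengers: Endgame", "Avengers: Infinity War", "Encanto"]

def best_title_match(query, candidates, cutoff=0.6):
    """Return the closest title, or None when nothing clears the cutoff."""
    matches = difflib.get_close_matches(query, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

found = best_title_match("Avengers Endgame", titles)
missing = best_title_match("zzzz", titles)
print(found, missing)
```

The comparison is case-sensitive, so lowercasing both the query and the titles before matching may help with input such as `avengers endgame`.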


In [21]:
index_of_the_movie = dataset[dataset.title == close_match]["index"].values[0]
print(index_of_the_movie)

11


In [22]:
#calculating similarity score
similarity_score = list(enumerate(similarity[index_of_the_movie]))
print(similarity_score)

[(0, 0.04178720797399355), (1, 0.017094337864587264), (2, 0.08284776733518712), (3, 0.039058954607403426), (4, 0.08849017684802071), (5, 0.09354590545945128), (6, 0.07061147359997305), (7, 0.0), (8, 0.08316462948863007), (9, 0.09333018626523121), (10, 0.01946445143673536), (11, 1.0000000000000002), (12, 0.031100116284093496), (13, 0.018004313682589663), (14, 0.038394044952034655), (15, 0.12550533103688488), (16, 0.07985360038318014), (17, 0.01663246328586777), (18, 0.0), (19, 0.045140735793100184), (20, 0.023137127843915372), (21, 0.021082452756155216), (22, 0.038810209783104954), (23, 0.09068483611601061), (24, 0.08672321679273601), (25, 0.019382989133279942), (26, 0.0185283996867751), (27, 0.04634574627633152), (28, 0.056228465790569476), (29, 0.0), (30, 0.0162195492253728), (31, 0.037919914716258274), (32, 0.0), (33, 0.019889182582747477), (34, 0.0), (35, 0.019356001239184392), (36, 0.09496629973973714), (37, 0.0), (38, 0.1415866978501139), (39, 0.04731014054330055), (40, 0.01671350

In [23]:
len(similarity_score)

7850

In [24]:
# sort the movies by similarity score, highest first
sorted_similarity_movie = sorted(similarity_score, key = lambda x:x[1], reverse = True)
print(sorted_similarity_movie)

[(11, 1.0000000000000002), (2349, 0.23590679285613028), (2276, 0.22971558321688934), (2128, 0.2168148851425789), (6162, 0.2068184306817024), (817, 0.20368301031653466), (1282, 0.19312670252408967), (789, 0.19130891320235577), (931, 0.1836445222191349), (423, 0.17889170688341005), (5212, 0.16969150598439547), (5100, 0.16889975103861724), (1794, 0.16772928533579384), (540, 0.16666359991353932), (1753, 0.1631359490882057), (1151, 0.1577077800220728), (4578, 0.15280338732802845), (4580, 0.15146163663456721), (6618, 0.15008928932083399), (462, 0.14860900677821254), (474, 0.14753623585130293), (4574, 0.1472840421576083), (4576, 0.1472840421576083), (4590, 0.1472840421576083), (5539, 0.1472840421576083), (5699, 0.1472840421576083), (5944, 0.1472840421576083), (4314, 0.14685190066245202), (5128, 0.14616411424037137), (5129, 0.14616411424037137), (5130, 0.14616411424037137), (4586, 0.14239499246300977), (38, 0.1415866978501139), (366, 0.13683984053148499), (5049, 0.13412315351686308), (392, 0.1
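
The first entry of the sorted list is always the searched movie itself, with a score of about 1. As an alternative sketch (not the notebook's method), `np.argsort` can rank the scores and skip the query row. The `scores` array below is a made-up stand-in for one row of `similarity`.

```python
import numpy as np

# Made-up similarity scores for five movies; movie 1 is the query itself.
scores = np.array([0.04, 1.0, 0.08, 0.23, 0.0])
query_index = 1
k = 2

# Indices ordered from highest to lowest score.
order = np.argsort(scores)[::-1]

# Drop the query movie, then keep the k best remaining indices.
top_k = [int(i) for i in order if i != query_index][:k]
print(top_k)
```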

In [25]:
print("Movie suggested for you: \n")
# show the 29 most similar titles; the first is the searched movie itself
for i, (index, score) in enumerate(sorted_similarity_movie[:29], start=1):
    title_index = dataset.loc[index, 'title']
    print(i, "", title_index)

Movie suggested for you: 

1  Morbius
2  Buffy the Vampire Slayer
3  Buffy the Vampire Slayer
4  The Wonderful World of Disney
5  New Girl
6  I, Daniel Blake
7  Genius
8  Brickleberry
9  Cocoon: The Return
10  Spider-Man: The Animated Series
11  The Hot Zone: Anthrax
12  The Hot Zone: Anthrax
13  Mark Twain and Me
14  Rookie of the Year
15  Mission to the Sun
16  Alley Cats Strike
17  DuckTales
18  DuckTales
19  30 for 30
20  Andor
21  Willow
22  The Simpsons
23  The Simpsons
24  The Simpsons
25  The Simpsons
26  The Simpsons
27  The Simpsons
28  9-1-1: Lone Star
29  Secret Invasion


## Combining the Code into One Cell

In [37]:
# Full code for the recommendation system
movie_name = input("Which Disney movie do you want to watch: ")

# find the dataset title closest to the user's input
list_of_all_titles = dataset["title"].tolist()
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
close_match = find_close_match[0]

# look up that movie's row and rank every movie by similarity to it
index_of_the_movie = dataset[dataset.title == close_match]["index"].values[0]
similarity_score = list(enumerate(similarity[index_of_the_movie]))
sorted_similarity_movie = sorted(similarity_score, key = lambda x:x[1], reverse = True)

print("Movies suggested for you : \n ")

# show the 29 most similar titles; the first is the searched movie itself
for i, (index, score) in enumerate(sorted_similarity_movie[:29], start=1):
    title_index = dataset.loc[index, 'title']
    print(i, "", title_index)

Which Disney movie do you want to watch: Avengers: Endgame
Movies suggested for you : 
 
1  Avengers: Endgame
2  Avengers: Infinity War
3  Captain America: The Winter Soldier
4  Captain America: Civil War
5  Avengers Assemble
6  The Avengers
7  The Falcon and the Winter Soldier
8  Red Tails
9  Becoming
10  Avengers: Age of Ultron
11  Black-ish
12  Star Wars: Droids
13  NFL Monday Night Football
14  Lego Star Wars: Droid Tales
15  The Avengers: Earth's Mightiest Heroes
16  Next Avengers: Heroes of Tomorrow
17  Avengers: United They Stand
18  Rapunzel's Tangled Adventure
19  DuckTales
20  DuckTales
21  DuckTales
22  DuckTales
23  Rapunzel's Tangled Adventure
24  Rapunzel's Tangled Adventure
25  Rapunzel's Tangled Adventure
26  Rapunzel's Tangled Adventure
27  The Mandalorian
28  Kim Possible
29  Kim Possible
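
The list above repeats several titles (DuckTales, Rapunzel's Tangled Adventure) because the dataset holds more than one row per title. One possible refinement, sketched here under assumptions, is a reusable function that skips repeated titles and the searched movie itself. The function name `recommend`, its parameters, and the toy data are all hypothetical and only illustrate the idea.

```python
import difflib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend(movie_name, dataset, similarity, top_n=5):
    """Return up to top_n distinct titles most similar to movie_name."""
    titles = dataset["title"].tolist()
    matches = difflib.get_close_matches(movie_name, titles, n=1)
    if not matches:
        return []  # nothing close enough to the input
    position = titles.index(matches[0])
    ranked = similarity[position].argsort()[::-1]  # best score first
    seen = {matches[0]}  # exclude the searched movie itself
    results = []
    for pos in ranked:
        title = titles[pos]
        if title in seen:
            continue  # skip titles already listed
        seen.add(title)
        results.append(title)
        if len(results) == top_n:
            break
    return results

# Toy data for illustration only; it has one repeated title.
toy = pd.DataFrame({
    "title": ["Avengers: Endgame", "Avengers: Infinity War",
              "DuckTales", "DuckTales", "Encanto"],
    "genre": ["Action, Adventure, Drama", "Action, Adventure, Sci-Fi",
              "Animation, Adventure, Comedy", "Animation, Adventure, Comedy",
              "Animation, Comedy, Family"],
})
toy_similarity = cosine_similarity(
    TfidfVectorizer().fit_transform(toy["title"] + " " + toy["genre"])
)

suggestions = recommend("Avengers Endgame", toy, toy_similarity, top_n=3)
no_match = recommend("zzzz", toy, toy_similarity)
print(suggestions, no_match)
```

Whether to hide repeated titles is a design choice: separate rows may be different seasons or remakes, so deduplicating by title trades some detail for a cleaner list.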


## Conclusion

This recommendation system finds films related to the one the user searches for, based on how similar their titles, genres, release years, and directors or stars are. For example, Avengers: Endgame is closely related to Avengers: Infinity War and to other similar films, so after choosing a movie, users can discover other recommended titles to watch next.