# Movie Recommendation System using Python

### Introduction

Almost everyone enjoys movies, whether fantasy, horror, crime, romance, thriller, comedy, drama, action, or another genre, and most people have one or more favorite genres. Disney+ offers titles across many of these genres, so I built a movie recommendation system that takes a title the user searches for and suggests similar titles, using data from Disney+.

### What Is a Recommendation System?

In simple terms, a recommendation system is a filtering program whose primary purpose is to predict a user's "rating" of, or "preference" for, items in a specific domain. In this project the items are movies, so the goal is to filter the catalog down to the movies the user is likely to enjoy, based on some information from the user.

## Importing Relevant Python Libraries 

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

## Data Collection

Read the dataset.

In [2]:
dataset = pd.read_csv('disney_plus_projects.csv')

## Data Pre-Processing 

In [3]:
# add an index column
dataset['index'] = dataset.index

In [4]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7850 entries, 0 to 7849
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   title          7850 non-null   object 
 1   year           7703 non-null   object 
 2   certificate    7850 non-null   object 
 3   runtime_min    7850 non-null   int64  
 4   genre          7850 non-null   object 
 5   rating         7850 non-null   float64
 6   votes          7850 non-null   int64  
 7   director_star  7850 non-null   object 
 8   index          7850 non-null   int64  
dtypes: float64(1), int64(3), object(5)
memory usage: 552.1+ KB


In [5]:
dataset.head(5)

| | title | year | certificate | runtime_min | genre | rating | votes | director_star | index |
|---|---|---|---|---|---|---|---|---|---|
| 0 | The King's Man | 2021 | R | 131 | Action, Adventure, Thriller | 6.4 | 90344 | Matthew Vaughn | 0 |
| 1 | West Side Story | 2021 | PG-13 | 156 | Crime, Drama, Musical | 7.5 | 46778 | Steven Spielberg | 1 |
| 2 | The Walking Dead | 2010–2022 | TV-14 | 44 | Drama, Horror, Thriller | 8.3 | 934972 | Andrew Lincoln | 2 |
| 3 | Free Guy | 2021 | PG-13 | 115 | Action, Adventure, Comedy | 7.2 | 303127 | Shawn Levy | 3 |
| 4 | Pam & Tommy | 2022 | TV-MA | 340 | Biography, Drama, Romance | 7.4 | 16576 | Lily James | 4 |


In [6]:
dataset.duplicated().sum()

0

In [7]:
dataset.isnull().sum()

title              0
year             147
certificate        0
runtime_min        0
genre              0
rating             0
votes              0
director_star      0
index              0
dtype: int64

In [8]:
# fill missing values (only `year` has any) with a placeholder string
dataset.fillna('NULL', inplace=True)

In [9]:
# make sure no missing values remain
dataset.isnull().sum()

title            0
year             0
certificate      0
runtime_min      0
genre            0
rating           0
votes            0
director_star    0
index            0
dtype: int64

In [10]:
dataset.shape

(7850, 9)

In [11]:
# select the features relevant to the recommendation model
selected_features = ["title", "year", "genre", "director_star", "certificate"]

In [12]:
# combine all selected features into one text string per row
combined_features = dataset["year"] + " " + dataset["title"] + " " + dataset["genre"] + " " + dataset["certificate"] + " " + dataset["director_star"]

In [13]:
print(combined_features)

0       2021 The King's Man Action, Adventure, Thrille...
1       2021 West Side Story Crime, Drama, Musical    ...
2       2010–2022 The Walking Dead Drama, Horror, Thri...
3       2021 Free Guy Action, Adventure, Comedy       ...
4       2022 Pam & Tommy Biography, Drama, Romance    ...
                              ...                        
7845    2020 Rapunzel's Tangled Adventure Animation, A...
7846    2020–2021 Rapunzel's Tangled Adventure Animati...
7847    2020 Miraculous: Tales of Ladybug & Cat Noir A...
7848    2020–2021 Miraculous: Tales of Ladybug & Cat N...
7849    2020 Star Wars Resistance Animation, Action, A...
Length: 7850, dtype: object


In [14]:
# convert the text data into TF-IDF feature vectors
vectorizer = TfidfVectorizer()
feature_vektor = vectorizer.fit_transform(combined_features)
print(feature_vektor)

  (0, 5068)	0.5768749217258283
  (0, 3159)	0.37294797150216497
  (0, 4845)	0.32420182095969047
  (0, 166)	0.1431577112058066
  (0, 150)	0.15214990192207364
  (0, 3089)	0.3501285996686016
  (0, 2735)	0.43642591857046065
  (0, 4821)	0.1344833941749473
  (0, 100)	0.2196160048250088
  (1, 4567)	0.43546556857210605
  (1, 4633)	0.3294448304345145
  (1, 6)	0.23668408713835454
  (1, 3776)	0.10119114199122675
  (1, 3424)	0.2695857441199328
  (1, 1490)	0.11411902364357454
  (1, 1196)	0.23205672128505192
  (1, 4654)	0.3162743033130422
  (1, 4450)	0.45537203932480463
  (1, 5192)	0.40143523204941056
  (1, 100)	0.16578153219513578
  (2, 2950)	0.46507649275929935
  (2, 291)	0.3364656436160334
  (2, 7)	0.15422347551790247
  (2, 4992)	0.060332955168988234
  (2, 2357)	0.302683740394361
  :	:
  (7848, 1199)	0.3783468261198294
  (7848, 3525)	0.3129005930996809
  (7848, 938)	0.3102467856295407
  (7848, 2819)	0.3129005930996809
  (7848, 4760)	0.30833531443749657
  (7848, 3318)	0.3129005930996809
  (7848, 12
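
To make the weights printed above less opaque, here is a minimal pure-Python sketch of the weighting that `TfidfVectorizer` applies with its default settings: lowercased tokens of two or more word characters, a smoothed inverse document frequency of `ln((1 + n) / (1 + df)) + 1`, and L2 normalisation of each row. The helper names `tokenize` and `tfidf` and the three toy strings are assumptions made for illustration; the notebook itself relies on scikit-learn.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase, then keep tokens of two or more word characters,
    # mirroring TfidfVectorizer's default token pattern.
    return re.findall(r"(?u)\b\w\w+\b", text.lower())

def tfidf(docs):
    """Return (vocabulary, rows); each row maps term -> L2-normalised weight."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf (the scikit-learn default): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return sorted(df), rows

# Toy strings shaped like the notebook's combined_features (illustrative only).
toy_docs = [
    "2021 Free Guy Action, Adventure, Comedy PG-13 Shawn Levy",
    "2021 West Side Story Crime, Drama, Musical PG-13 Steven Spielberg",
    "2010–2022 The Walking Dead Drama, Horror, Thriller TV-14 Andrew Lincoln",
]
vocabulary, rows = tfidf(toy_docs)
```

In this sketch, a term that occurs in fewer documents (such as `free`) receives a larger weight than a term shared by several documents (such as `2021`), and a year range like `2010–2022` is split into the two tokens `2010` and `2022`.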

## Cosine Similarity

In [15]:
similarity = cosine_similarity(feature_vektor)
print(similarity)

[[1.         0.03640828 0.09497621 ... 0.04207617 0.0796483  0.04896641]
 [0.03640828 1.         0.01330069 ... 0.         0.0315631  0.        ]
 [0.09497621 0.01330069 1.         ... 0.00455191 0.         0.        ]
 ...
 [0.04207617 0.         0.00455191 ... 1.         0.67332104 0.11678795]
 [0.0796483  0.0315631  0.         ... 0.67332104 1.         0.19374877]
 [0.04896641 0.         0.         ... 0.11678795 0.19374877 1.        ]]


In [16]:
similarity.shape

(7850, 7850)
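
Each entry of this matrix is the cosine of the angle between two TF-IDF vectors: the dot product divided by the product of the vector lengths. The following is a minimal pure-Python sketch of that formula, assuming sparse vectors stored as `{term: weight}` dictionaries; the function name `cosine` and the toy vectors are hypothetical and only meant to show why the diagonal is 1 and why titles with no shared terms score 0.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors, not taken from the dataset.
a = {"avengers": 0.8, "action": 0.6}
b = {"avengers": 0.6, "action": 0.8}
c = {"musical": 1.0}

sim_ab = cosine(a, b)  # shares both terms, so the score is high
sim_ac = cosine(a, c)  # shares no terms, so the score is zero
sim_aa = cosine(a, a)  # a vector compared with itself
```

Because the full matrix holds 7850 × 7850 values, it takes roughly 490 MB if each value is an 8-byte float, which is worth keeping in mind before applying the same approach to a much larger catalog.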

In [17]:
# get the movie name from user input
movie_name = input("Which Disney movie do you want to watch: ")

Which Disney movie do you want to watch: Avengers: Endgame


In [18]:
# make a list of all titles in the dataset
list_of_all_titles = dataset["title"].tolist()
print(list_of_all_titles)

["The King's Man", 'West Side Story', 'The Walking Dead', 'Free Guy', 'Pam & Tommy', 'No Exit', 'Fresh', 'Encanto', 'Death on the Nile', 'The Dropout', 'The French Dispatch', 'Morbius', 'The Book of Boba Fett', "Grey's Anatomy", 'Eternals', 'Moon Knight', 'This Is Us', 'Criminal Minds', "It's Always Sunny in Philadelphia", 'The Mandalorian', 'Dopesick', 'Castle', 'The Last Duel', 'How I Met Your Father', 'Turning Red', 'Modern Family', 'Big Sky', 'Daredevil', 'Avengers: Endgame', 'How I Met Your Mother', 'American Horror Story', 'Lost', 'The Simpsons', 'Bones', 'Family Guy', 'Snowfall', 'The Proud Family: Louder and Prouder', 'New Girl', 'Shang-Chi and the Legend of the Ten Rings', '9-1-1: Lone Star', 'Sons of Anarchy', 'Better Things', 'M*A*S*H', 'Abbott Elementary', 'Venom: Let There Be Carnage', 'Hawkeye', 'Antlers', 'Dollface', 'Star Wars: The Clone Wars', "Bob's Burgers", '9-1-1', 'Buffy the Vampire Slayer', 'The Resident', 'Homeland', 'Avengers: Infinity War', 'Only Murders in th

In [19]:
# find the titles that most closely match the user's input
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
print(find_close_match)

['Avengers: Endgame', 'Avengers: Infinity War', 'Avengers: Age of Ultron']
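
`difflib.get_close_matches` returns an empty list when no candidate reaches its similarity cutoff (0.6 by default), so indexing the result with `[0]` would fail for an unrecognisable query. A small sketch of a guarded lookup follows; the helper name `closest_title` and the short sample list are assumptions for illustration, not part of the notebook.

```python
import difflib

def closest_title(query, titles, cutoff=0.6):
    """Return the best fuzzy match for `query`, or None when nothing is close enough."""
    matches = difflib.get_close_matches(query, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Small illustrative list; the notebook uses every title in the dataset.
sample_titles = ["Avengers: Endgame", "Avengers: Infinity War", "Encanto", "Free Guy"]

good = closest_title("Avengers Endgame", sample_titles)  # missing colon is tolerated
missing = closest_title("zzzz", sample_titles)           # nothing resembles this query
print(good, missing)
```

Returning `None` lets the calling code show a friendly message instead of raising an `IndexError`. The matching is case-sensitive, so lowercasing both the query and the titles first may help with free-form input.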


In [20]:
close_match = find_close_match[0]
print(close_match)

Avengers: Endgame


In [21]:
index_of_the_movie = dataset[dataset.title == close_match]["index"].values[0]
print(index_of_the_movie)

28


In [22]:
# pair every row index with its similarity to the chosen movie
similarity_score = list(enumerate(similarity[index_of_the_movie]))
print(similarity_score)

[(0, 0.03786774488593453), (1, 0.01496895851768545), (2, 0.046152746597013676), (3, 0.03372380116741168), (4, 0.02014782105718865), (5, 0.017835841530001263), (6, 0.0), (7, 0.0), (8, 0.015032395448514216), (9, 0.021156052949071275), (10, 0.017638768347712486), (11, 0.05251514746710337), (12, 0.05725163205650377), (13, 0.04850333693726995), (14, 0.03320143319931056), (15, 0.08166176397689288), (16, 0.04452811703498612), (17, 0.018465883199775036), (18, 0.029977625513235778), (19, 0.09960840817802642), (20, 0.06172504625888927), (21, 0.019105007997584002), (22, 0.069215695612119), (23, 0.016857895282646643), (24, 0.019150743229827353), (25, 0.052092167907377016), (26, 0.02040818644489511), (27, 0.039531262044218696), (28, 1.0000000000000002), (29, 0.004749911757702377), (30, 0.04382044257045009), (31, 0.038829419090676665), (32, 0.04632599448528855), (33, 0.021772986700383915), (34, 0.005140464373239651), (35, 0.02218762153578466), (36, 0.0519430065709065), (37, 0.0), (38, 0.045296156782

In [23]:
len(similarity_score)

7850

In [24]:
# sort the movies by their similarity score, highest first
sorted_similarity_movie = sorted(similarity_score, key = lambda x:x[1], reverse = True)
print(sorted_similarity_movie)

[(28, 1.0000000000000002), (54, 0.5429009674900843), (200, 0.357506069544592), (154, 0.35270529868727846), (718, 0.2642177562705225), (118, 0.2487800561594826), (4841, 0.23893586600727199), (676, 0.23442725990771113), (148, 0.20889609622478392), (181, 0.20675867024033887), (1170, 0.20107639419554238), (4419, 0.19939010881705108), (853, 0.19763123046960485), (164, 0.19748277200335448), (1539, 0.19004501740019064), (1279, 0.18431753808870063), (1412, 0.1826992099529982), (7725, 0.1631721883009266), (7655, 0.16177769679936235), (5672, 0.15350678407079174), (4074, 0.15124486764850129), (4085, 0.15061318550010422), (993, 0.14265596959909685), (7723, 0.13924583312878425), (7727, 0.13924583312878425), (4135, 0.13805231904067256), (4083, 0.13743362119279495), (4440, 0.1364429514777492), (4081, 0.13493012020613185), (7728, 0.1324637099469116), (4121, 0.12723721724372866), (4203, 0.12723721724372866), (7731, 0.12723721724372866), (616, 0.1271506633177365), (4331, 0.12619519225495193), (7717, 0.1

In [25]:
print("Movie suggested for you: \n")
# the top match is the searched title itself, so it is listed as number 1
for i, (index, score) in enumerate(sorted_similarity_movie[:29], start=1):
    title_index = dataset.loc[index, 'title']
    print(i, "", title_index)

Movie suggested for you: 

1  Avengers: Endgame
2  Avengers: Infinity War
3  Captain America: The Winter Soldier
4  Captain America: Civil War
5  Avengers Assemble
6  The Avengers
7  Becoming
8  Red Tails
9  The Falcon and the Winter Soldier
10  Black-ish
11  Star Wars: Droids
12  NFL Monday Night Football
13  The Avengers: Earth's Mightiest Heroes
14  Avengers: Age of Ultron
15  Lego Star Wars: Droid Tales
16  Next Avengers: Heroes of Tomorrow
17  Avengers: United They Stand
18  The Mandalorian
19  The Simpsons
20  Limitless
21  WandaVision
22  Hawkeye
23  T.O.T.S.
24  The Mandalorian
25  The Mandalorian
26  9-1-1: Lone Star
27  Hawkeye
28  The Simpsons
29  Loki
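
The list above starts with the searched title itself and repeats some titles (for example "The Mandalorian"), because the dataset contains several rows with the same title. One possible refinement, sketched below in plain Python, skips the query and any title already suggested. The function name `top_recommendations`, its parameters, and the toy scores are hypothetical; in the notebook, `scores` would correspond to one row of `similarity` and `titles` to `list_of_all_titles`.

```python
def top_recommendations(scores, titles, query_index, n=10):
    """Return up to n (title, score) pairs, best first, without repeated titles."""
    # Row indices ordered from most to least similar.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    seen = {titles[query_index]}  # never recommend the searched title itself
    picks = []
    for i in ranked:
        if titles[i] in seen:
            continue  # skip the query and titles that were already suggested
        seen.add(titles[i])
        picks.append((titles[i], scores[i]))
        if len(picks) == n:
            break
    return picks

# Toy data for illustration only (not real similarity scores).
toy_titles = ["Avengers: Endgame", "Avengers: Infinity War",
              "The Mandalorian", "The Mandalorian", "Encanto"]
toy_scores = [1.0, 0.54, 0.10, 0.09, 0.02]

result = top_recommendations(toy_scores, toy_titles, query_index=0, n=3)
```

Whether duplicates should be dropped depends on the data: if rows with the same title are genuinely different entries (such as separate seasons), keeping them and showing the year next to the title may be the better choice.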


## Combining the Code into a Single Cell

In [26]:
# Full code for the recommendation system
movie_name = input("Which Disney movie do you want to watch: ")
list_of_all_titles = dataset["title"].tolist()
find_close_match = difflib.get_close_matches(movie_name, list_of_all_titles)
# stop early with a clear message if no title resembles the input
if not find_close_match:
    raise ValueError(f"No title similar to {movie_name!r} was found")
close_match = find_close_match[0]
index_of_the_movie = dataset[dataset.title == close_match]["index"].values[0]
similarity_score = list(enumerate(similarity[index_of_the_movie]))
sorted_similarity_movie = sorted(similarity_score, key=lambda x: x[1], reverse=True)

print("Movies suggested for you : \n ")

# the top match is the searched title itself, so it is listed as number 1
for i, (index, score) in enumerate(sorted_similarity_movie[:29], start=1):
    title_index = dataset.loc[index, 'title']
    print(i, "", title_index)

Which Disney movie do you want to watch: Avengers: Endgame
Movies suggested for you : 
 
1  Avengers: Endgame
2  Avengers: Infinity War
3  Captain America: The Winter Soldier
4  Captain America: Civil War
5  Avengers Assemble
6  The Avengers
7  Becoming
8  Red Tails
9  The Falcon and the Winter Soldier
10  Black-ish
11  Star Wars: Droids
12  NFL Monday Night Football
13  The Avengers: Earth's Mightiest Heroes
14  Avengers: Age of Ultron
15  Lego Star Wars: Droid Tales
16  Next Avengers: Heroes of Tomorrow
17  Avengers: United They Stand
18  The Mandalorian
19  The Simpsons
20  Limitless
21  WandaVision
22  Hawkeye
23  T.O.T.S.
24  The Mandalorian
25  The Mandalorian
26  9-1-1: Lone Star
27  Hawkeye
28  The Simpsons
29  Loki


## Conclusion

This movie recommendation system finds titles related to the one the user searches for, based on shared details such as title, genre, release year, certificate, and director or star. For example, Avengers: Endgame is matched most closely with Avengers: Infinity War, followed by other similar titles. As a result, users can find other recommended titles to watch after the movie they were looking for.