# **Movie Recommendation System**

A **recommender system** is a system that seeks to predict or filter items according to a user's preferences. Recommender systems are used in a variety of areas, including movies, music, news, books, research articles, search queries, social tags, and products in general. They produce a list of recommendations in one of two ways:

**Collaborative filtering:** Collaborative filtering approaches build a model from the user's past behavior (i.e., items purchased or searched for by the user) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may be interested in.

**Content-based filtering:** Content-based filtering approaches use a set of discrete characteristics of an item to recommend additional items with similar properties. These methods rely entirely on a description of the item and a profile of the user's preferences, and they recommend items based on the user's past preferences.

Let's develop a basic recommendation system using Python and Pandas that suggests the items most similar to a particular item, in this case a movie. It simply reports which movies are most similar to the user's chosen movie.

# **Import Library**

In [212]:
import pandas as pd

In [213]:
import numpy as np

# **Import Dataset**

In [214]:
ms=pd.read_csv('/content/IMDB-Movie-Dataset(2024-1951).csv')

In [215]:
ms.head()

|   | id | movie_id | movie_name | year | genre | overview | director | cast |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 15354916 | Jawan | 2023 | Action, Thriller | A high-octane action thriller which outlines t... | Atlee | Shah Rukh Khan, Nayanthara, Vijay Sethupathi, ... |
| 1 | 1 | 15748830 | Jaane Jaan | 2023 | Crime, Drama, Mystery | A single mother and her daughter who commit a ... | Sujoy Ghosh | Kareena Kapoor, Jaideep Ahlawat, Vijay Varma, ... |
| 2 | 2 | 11663228 | Jailer | 2023 | Action, Comedy, Crime | A retired jailer goes on a manhunt to find his... | Nelson Dilipkumar | Rajinikanth, Mohanlal, Shivarajkumar, Jackie S... |
| 3 | 3 | 14993250 | Rocky Aur Rani Kii Prem Kahaani | 2023 | Comedy, Drama, Family | Flamboyant Punjabi Rocky and intellectual Beng... | Karan Johar | Ranveer Singh, Alia Bhatt, Dharmendra, Shabana ... |
| 4 | 4 | 15732324 | OMG 2 | 2023 | Comedy, Drama | An unhappy civilian asks the court to mandate ... | Amit Rai | Pankaj Tripathi, Akshay Kumar, Yami Gautam, Pa... |


In [216]:
ms.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2199 entries, 0 to 2198
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          2199 non-null   int64 
 1   movie_id    2199 non-null   int64 
 2   movie_name  2199 non-null   object
 3   year        2134 non-null   object
 4   genre       2199 non-null   object
 5   overview    2199 non-null   object
 6   director    2199 non-null   object
 7   cast        2199 non-null   object
dtypes: int64(2), object(6)
memory usage: 137.6+ KB


In [217]:
ms.shape

(2199, 8)

In [218]:
ms.columns

Index(['id', 'movie_id', 'movie_name', 'year', 'genre', 'overview', 'director',
       'cast'],
      dtype='object')

# **Get Feature Selection**

In [None]:
movie_features = ms[[ 'genre', 'overview','director', 'cast']].fillna('')

Four existing features are selected to recommend movies: genre, overview, director, and cast. The choice may vary from one project to another; for example, one could add vote counts, budget, or language.
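
The `.fillna('')` call matters because the selected columns are later joined with `+`, and in pandas a missing value propagates through string concatenation. The following is a minimal sketch on a hypothetical two-row frame (the `toy` data is invented for illustration and does not come from the dataset):

```python
import pandas as pd

# Hypothetical toy frame; the second row has a missing director.
toy = pd.DataFrame({
    'genre': ['Action', 'Drama'],
    'director': ['Atlee', None],
})

# Without fillna, the missing value propagates and the combined text is lost.
combined_raw = toy['genre'] + ' ' + toy['director']

# With fillna(''), the missing field simply contributes an empty string.
filled = toy.fillna('')
combined_filled = filled['genre'] + ' ' + filled['director']

print(combined_raw.isna().tolist())
print(combined_filled.tolist())
```

In this sketch the second row of `combined_raw` is missing, while `combined_filled` still keeps the genre text for that row.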

In [219]:
movie_features.shape

(2199, 4)

In [220]:
movie_features

|   | genre | overview | director | cast |
|---|---|---|---|---|
| 0 | Action, Thriller | A high-octane action thriller which outlines t... | Atlee | Shah Rukh Khan, Nayanthara, Vijay Sethupathi, ... |
| 1 | Crime, Drama, Mystery | A single mother and her daughter who commit a ... | Sujoy Ghosh | Kareena Kapoor, Jaideep Ahlawat, Vijay Varma, ... |
| 2 | Action, Comedy, Crime | A retired jailer goes on a manhunt to find his... | Nelson Dilipkumar | Rajinikanth, Mohanlal, Shivarajkumar, Jackie S... |
| 3 | Comedy, Drama, Family | Flamboyant Punjabi Rocky and intellectual Beng... | Karan Johar | Ranveer Singh, Alia Bhatt, Dharmendra, Shabana... |
| 4 | Comedy, Drama | An unhappy civilian asks the court to mandate ... | Amit Rai | Pankaj Tripathi, Akshay Kumar, Yami Gautam, Pa... |
| ... | ... | ... | ... | ... |
| 2194 | Thriller | Add a Plot | Subhash Ghai | Shatrughan Sinha, Reena Roy, Ajit Khan, Premna... |
| 2195 | Drama, Musical, Romance | A renowned music teacher mentors a promising y... | Tanuja Chandra | Lucky Ali, Simone Singh, Achint Kaur, Ehsan Khan |
| 2196 | Musical, Romance | When a ballroom dancer's shot at a crucial tou... | Stanley D'Costa | Sooraj Pancholi, Isabelle Kaif, Waluscha D'Sou... |
| 2197 | Drama, Family, Fantasy | After the tragic deaths of his son Ajit and da... | Harmesh Malhotra | Sunny Deol, Sridevi, Anupam Kher, Gulshan Grover |


In [221]:
x=movie_features['genre'] + ' ' + movie_features['overview'] + ' ' + movie_features['director'] + ' ' + movie_features['cast']

In [None]:
x

0       Action, Thriller A high-octane action thriller...
1       Crime, Drama, Mystery A single mother and her ...
2       Action, Comedy, Crime A retired jailer goes on...
3       Comedy, Drama, Family Flamboyant Punjabi Rocky...
4       Comedy, Drama An unhappy civilian asks the cou...
                              ...                        
2194    Thriller Add a Plot Subhash Ghai Shatrughan Si...
2195    Drama, Musical, Romance A renowned music teach...
2196    Musical, Romance When a ballroom dancer's shot...
2197    Drama, Family, Fantasy After the tragic deaths...
2198    Action, Comedy, Drama Raj is a successful lawy...
Length: 2199, dtype: object

In [222]:
x.shape

(2199,)

# **Get Feature Text Conversions to Tokens**

In [223]:
!pip install numpy==1.21.2



In [224]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [225]:
tfidf = TfidfVectorizer()

In [226]:
X = tfidf.fit_transform(x)

In [227]:
X.shape

(2199, 11441)

In [228]:
print(X)

  (0, 7220)	0.18732057124632678
  (0, 2670)	0.18524953584832649
  (0, 9103)	0.244585433104368
  (0, 11002)	0.14681069131570232
  (0, 6842)	0.23547949327798678
  (0, 5451)	0.09405767855896913
  (0, 8640)	0.16167663050581524
  (0, 9140)	0.12673362239821723
  (0, 926)	0.2899771135019883
  (0, 9564)	0.19994085660291214
  (0, 4719)	0.06557063346769715
  (0, 11340)	0.2752857927474026
  (0, 8247)	0.2752857927474026
  (0, 10449)	0.05703667866328602
  (0, 9098)	0.16274239131621018
  (0, 4933)	0.07772512631378324
  (0, 11232)	0.09357348531476826
  (0, 6118)	0.1153226211878715
  (0, 7057)	0.06059844123803038
  (0, 5191)	0.15817552570488105
  (0, 3278)	0.22820854074933225
  (0, 10329)	0.16251068864342313
  (0, 7169)	0.2899771135019883
  (0, 11224)	0.1520493398826944
  (0, 7051)	0.2899771135019883
  :	:
  (2198, 9054)	0.13024414506375684
  (2198, 1227)	0.14243780146433394
  (2198, 5816)	0.20550079070454394
  (2198, 5543)	0.16124853363237318
  (2198, 5741)	0.16253753641693003
  (2198, 5467)	0.127362

# **Get Similarity Score using Cosine Similarity**

`cosine_similarity` computes the L2-normalized dot product of vectors. Euclidean (L2) normalization projects the vectors onto the unit sphere, and their dot product is then the cosine of the angle between the points denoted by the vectors.
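
To make the definition concrete, here is a small sketch that computes the same quantity by hand for short vectors, using only the standard library. The helper name `cosine` and the example vectors are illustrative assumptions, not part of the notebook.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Dividing by both norms is the same as normalizing each vector first.
    return dot / (norm_u * norm_v)

print(cosine([1.0, 2.0], [2.0, 4.0]))  # same direction, so close to 1
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal, so 0
```

Because TF-IDF weights are non-negative, the scores in this notebook fall between 0 (no shared terms) and 1 (identical direction).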

In [229]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
Similarity_Score = cosine_similarity(X)

In [230]:
Similarity_Score

array([[1.        , 0.03873457, 0.02088166, ..., 0.00456613, 0.02713392,
        0.02675493],
       [0.03873457, 1.        , 0.01286975, ..., 0.02609631, 0.04579643,
        0.00698856],
       [0.02088166, 0.01286975, 1.        , ..., 0.00344653, 0.02984156,
        0.072506  ],
       ...,
       [0.00456613, 0.02609631, 0.00344653, ..., 1.        , 0.02268653,
        0.02220094],
       [0.02713392, 0.04579643, 0.02984156, ..., 0.02268653, 1.        ,
        0.08833593],
       [0.02675493, 0.00698856, 0.072506  , ..., 0.02220094, 0.08833593,
        1.        ]])

In [231]:
Similarity_Score.shape

(2199, 2199)

# **Get Movie Name as Input from User and Validate for Closest Spelling**

In [232]:
Favorite_Movie_Name = input(' Enter your favorite movie name : ')

 Enter your favorite movie name : jailer


In [233]:
All_Movies_Title_List = ms['movie_name'].tolist()

In [234]:
import difflib

In [235]:
Movie_Recommendation = difflib.get_close_matches(Favorite_Movie_Name, All_Movies_Title_List)
print(Movie_Recommendation)

['Jailer', 'Haider', 'Gambler']


In [236]:
Close_Match = Movie_Recommendation[0]
print(Close_Match)

Jailer
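
By default, `difflib.get_close_matches` returns at most three candidates with a similarity ratio of at least 0.6, and it returns an empty list when nothing is close enough, in which case `Movie_Recommendation[0]` would raise an `IndexError`. The comparison is also case-sensitive. The following is a minimal sketch of a more defensive lookup; the helper `find_closest_title` and the short `titles` list are hypothetical and only stand in for `All_Movies_Title_List`.

```python
import difflib

def find_closest_title(query, titles, cutoff=0.6):
    """Return the title closest to `query`, or None if nothing clears the cutoff."""
    # Map lowercase titles back to their original spelling so that the
    # comparison ignores case; the first occurrence of a title wins.
    lowered = {}
    for title in titles:
        lowered.setdefault(title.lower(), title)
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

titles = ['Jailer', 'Haider', 'Gambler', 'Dhoom']
print(find_closest_title('JAILER', titles))
print(find_closest_title('zzzzzz', titles))
```

Returning `None` lets the caller ask the user for another title instead of failing on an empty list.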


In [239]:
Index_of_Close_Match_Movie = ms[ms.movie_name == Close_Match]['id'].values[0]
print(Index_of_Close_Match_Movie)

2
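
The lookup above uses the `id` column as a row number for `Similarity_Score`, which works here because `id` appears to start at 0 and follow the row order. As a hedged alternative, the sketch below finds the row position directly, so it would keep working if the frame were filtered or reordered. The function `row_position` and the toy frame are hypothetical.

```python
import numpy as np
import pandas as pd

def row_position(frame, title):
    """Return the 0-based position of the first row whose movie_name equals title."""
    mask = (frame['movie_name'] == title).to_numpy()
    positions = np.flatnonzero(mask)
    if positions.size == 0:
        raise KeyError(title)
    return int(positions[0])

# Toy frame whose id values deliberately do not match the row positions.
toy = pd.DataFrame({'id': [10, 11, 12],
                    'movie_name': ['Jawan', 'Jaane Jaan', 'Jailer']})

print(row_position(toy, 'Jailer'))
```

Here the position is 2 even though the `id` of that row is 12, which is the value the similarity matrix actually needs.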


In [240]:
# getting a list of similar movies
Recommendation_Score = list(enumerate(Similarity_Score[Index_of_Close_Match_Movie]))
print(Recommendation_Score)

[(0, 0.020881663857501676), (1, 0.012869750887731604), (2, 1.0), (3, 0.008354737764541983), (4, 0.011319168822695623), (5, 0.010204021159835298), (6, 0.012032427774956412), (7, 0.010628068629240121), (8, 0.005877113046254301), (9, 0.02130545713053794), (10, 0.013342214639618051), (11, 0.005370479734104473), (12, 0.035363759650300836), (13, 0.0144193461966596), (14, 0.04043772170935936), (15, 0.019331164889961394), (16, 0.03473823776046585), (17, 0.06690590353806466), (18, 0.04966095726507856), (19, 0.0059638130804059945), (20, 0.007524150570227021), (21, 0.00900960255171368), (22, 0.010549839968745476), (23, 0.005245821980606272), (24, 0.015329324762657606), (25, 0.05543010192008672), (26, 0.05338013211798028), (27, 0.01279436272104166), (28, 0.0686995958186293), (29, 0.030192586738812408), (30, 0.04149256617194023), (31, 0.036523797662597834), (32, 0.0036383125463817713), (33, 0.0075353096191112705), (34, 0.006593229596655084), (35, 0.03246858235999097), (36, 0.013138225995719248), (3

In [241]:
len(Recommendation_Score)

2199

# **Sort All Movies by Recommendation Score with Respect to the Favorite Movie**

In [242]:
# sorting the movies based on their similarity score

Sorted_Similar_Movies = sorted(Recommendation_Score, key = lambda x:x[1], reverse = True)
print(Sorted_Similar_Movies)

[(2, 1.0), (332, 0.13996979030245893), (2136, 0.11021447132566749), (587, 0.10252159563122049), (1593, 0.10229675737353869), (1112, 0.10200980280162138), (1434, 0.10117958941673899), (1711, 0.09551057394028403), (2129, 0.09359053956643167), (1370, 0.08894597872211749), (1620, 0.08664008132443551), (1821, 0.08393593452603627), (770, 0.08338069811617405), (1957, 0.08299066976461764), (1617, 0.08109328145152014), (756, 0.0809961460183998), (833, 0.08038240497601153), (372, 0.07964432455940845), (653, 0.07950565516083882), (1267, 0.0793804626643408), (1221, 0.07797547190385141), (757, 0.07637468580475859), (352, 0.07620643077749138), (1944, 0.07605502986776923), (1346, 0.0760463607585411), (868, 0.07560824247160591), (700, 0.0752817792537951), (1820, 0.0751330379549397), (484, 0.07507822902355737), (1635, 0.07504802286362283), (2155, 0.07495038935070247), (77, 0.07484058820721802), (948, 0.07455972946752715), (1512, 0.07439158432403543), (890, 0.074378336028208), (886, 0.0739269546444996),

In [243]:
# print the names of the most similar movies, looked up by row index

print('Top 30 Movies Suggested for you: \n')

# the list is already sorted, so its first 30 entries are the top 30
for i, movie in enumerate(Sorted_Similar_Movies[:30], start=1):
    index = movie[0]
    title_from_index = ms[ms.index==index]['movie_name'].values[0]
    print(i, '.',title_from_index)

Top 30 Movies Suggested for you: 

1 . Jailer
2 . Radhe
3 . Parvarish
4 . Agneepath
5 . Veer
6 . Kochadaiiyaan
7 . Ram Gopal Varma Ki Aag
8 . Finding Fanny
9 . Sabse Bada Khiladi
10 . Jung
11 . Tridev
12 . Pyasa Darinda
13 . Raju Ban Gaya Gentleman
14 . Mithya
15 . Gardish
16 . Commando
17 . Shakti
18 . Hum
19 . Kaamyaab
20 . Raja Babu
21 . Baap
22 . Khal Nayak
23 . Darbar
24 . Kya Yehi Pyaar Hai
25 . Bhagwaan Dada
26 . Cantt Road: The Beginning
27 . Dishoom
28 . Tezz
29 . Duplicate
30 . Teri Meherbaniyan
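
The first entry in the list is the query movie itself, since every movie has a similarity of 1.0 with itself. A possible variation, sketched here on made-up scores, skips the query index and keeps only the top k entries with `heapq.nlargest` instead of sorting all the scores. The function name `top_k_similar` and the sample scores are assumptions for illustration.

```python
import heapq

def top_k_similar(scores, query_index, k=10):
    """Return the k highest (index, score) pairs, excluding the query item."""
    candidates = ((i, s) for i, s in enumerate(scores) if i != query_index)
    # nlargest returns the pairs ordered from highest to lowest score.
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

# Made-up similarity row for a query movie at position 2.
scores = [0.02, 0.01, 1.0, 0.14, 0.11]
print(top_k_similar(scores, query_index=2, k=2))
```

For a catalogue of a few thousand movies the full sort is perfectly adequate; the selection approach mainly helps when the list is much longer than k.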


# **Top 10 Movie Recommendation System**

In [253]:
Movie_Name = input('Enter your favorite movie name: ')

list_of_all_titles = ms['movie_name'].tolist()

Find_Close_Match = difflib.get_close_matches(Movie_Name, list_of_all_titles)

Close_Match = Find_Close_Match[0]

Index_of_Movie = ms[ms.movie_name == Close_Match]['id'].values[0]

Recommendation_Score = list(enumerate(Similarity_Score[Index_of_Movie]))

sorted_similar_movies = sorted(Recommendation_Score, key = lambda x:x[1], reverse = True)

print('Top 10 Movies suggested for you : \n')

i = 0

for movie in sorted_similar_movies:
    index = movie[0]
    title_from_index = ms[ms.id==index]['movie_name'].values
    if (i<10):
        print(i,'.',title_from_index)
        i+=1

Enter your favorite movie name: dhoom
Top 10 Movies suggested for you : 

0 . ['Dhoom']
1 . ['Dhoom:2']
2 . ['Sarkar 3']
3 . ['Kaante']
4 . ['Sultan']
5 . ['Mohabbatein']
6 . ['Dishoom']
7 . ['Kabul Express']
8 . ['Housefull 5']
9 . ['Dhoom:3']
