# Movie Recommender System

----

## Objective
This project builds a content-based movie recommendation system: given a favourite movie entered by the user, it suggests the 10 and the 30 most similar movies.

## Data Source

This project follows a [YouTube tutorial](https://www.youtube.com/watch?v=nj5SDiaIJno&list=PLl3P-U08Zvwll_bzhyp-QPFO7CewOIGi2&index=7).

The data comes from [YBI Foundation - Movie Recommendations](https://github.com/YBIFoundation/Dataset/raw/main/Movies%20Recommendation.csv).



## Import Library

In [None]:
import pandas as pd
import numpy as np

## Import Data

In [None]:
df = pd.read_csv("https://github.com/YBIFoundation/Dataset/raw/main/Movies%20Recommendation.csv")

## Describe Data

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4760 entries, 0 to 4759
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Movie_ID                  4760 non-null   int64  
 1   Movie_Title               4760 non-null   object 
 2   Movie_Genre               4760 non-null   object 
 3   Movie_Language            4760 non-null   object 
 4   Movie_Budget              4760 non-null   int64  
 5   Movie_Popularity          4760 non-null   float64
 6   Movie_Release_Date        4760 non-null   object 
 7   Movie_Revenue             4760 non-null   int64  
 8   Movie_Runtime             4758 non-null   float64
 9   Movie_Vote                4760 non-null   float64
 10  Movie_Vote_Count          4760 non-null   int64  
 11  Movie_Homepage            1699 non-null   object 
 12  Movie_Keywords            4373 non-null   object 
 13  Movie_Overview            4757 non-null   object 
 14  Movie_Pr
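
The non-null counts above show that some columns (for example `Movie_Homepage` and `Movie_Runtime`) have missing values. A minimal sketch of how those gaps could be counted directly is shown below; the `toy` frame and its values are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Hypothetical miniature frame standing in for df (values are invented).
toy = pd.DataFrame({
    "Movie_Title": ["A", "B", "C"],
    "Movie_Homepage": ["http://a.example", None, None],
    "Movie_Runtime": [98.0, None, 121.0],
})

# Count missing entries per column and keep only columns that have any.
missing = toy.isna().sum()
print(missing[missing > 0])
```

Running the same two lines on `df` would list every column that needs `fillna` or similar handling before text features are built.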

## Data Visualisation

In [None]:
df.head()

Unnamed: 0,Movie_ID,Movie_Title,Movie_Genre,Movie_Language,Movie_Budget,Movie_Popularity,Movie_Release_Date,Movie_Revenue,Movie_Runtime,Movie_Vote,...,Movie_Homepage,Movie_Keywords,Movie_Overview,Movie_Production_House,Movie_Production_Country,Movie_Spoken_Language,Movie_Tagline,Movie_Cast,Movie_Crew,Movie_Director
0,1,Four Rooms,Crime Comedy,en,4000000,22.87623,09-12-1995,4300000,98.0,6.5,...,,hotel new year's eve witch bet hotel room,It's Ted the Bellhop's first night on the job....,"[{""name"": ""Miramax Films"", ""id"": 14}, {""name"":...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]",Twelve outrageous guests. Four scandalous requ...,Tim Roth Antonio Banderas Jennifer Beals Madon...,"[{'name': 'Allison Anders', 'gender': 1, 'depa...",Allison Anders
1,2,Star Wars,Adventure Action Science Fiction,en,11000000,126.393695,25-05-1977,775398007,121.0,8.1,...,http://www.starwars.com/films/star-wars-episod...,android galaxy hermit death star lightsaber,Princess Leia is captured and held hostage by ...,"[{""name"": ""Lucasfilm"", ""id"": 1}, {""name"": ""Twe...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","A long time ago in a galaxy far, far away...",Mark Hamill Harrison Ford Carrie Fisher Peter ...,"[{'name': 'George Lucas', 'gender': 2, 'depart...",George Lucas
2,3,Finding Nemo,Animation Family,en,94000000,85.688789,30-05-2003,940335536,100.0,7.6,...,http://movies.disney.com/finding-nemo,father son relationship harbor underwater fish...,"Nemo, an adventurous young clownfish, is unexp...","[{""name"": ""Pixar Animation Studios"", ""id"": 3}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","There are 3.7 trillion fish in the ocean, they...",Albert Brooks Ellen DeGeneres Alexander Gould ...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton
3,4,Forrest Gump,Comedy Drama Romance,en,55000000,138.133331,06-07-1994,677945399,142.0,8.2,...,,vietnam veteran hippie mentally disabled runni...,A man with a low IQ has accomplished great thi...,"[{""name"": ""Paramount Pictures"", ""id"": 4}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","The world will never be the same, once you've ...",Tom Hanks Robin Wright Gary Sinise Mykelti Wil...,"[{'name': 'Alan Silvestri', 'gender': 2, 'depa...",Robert Zemeckis
4,5,American Beauty,Drama,en,15000000,80.878605,15-09-1999,356296601,122.0,7.9,...,http://www.dreamworks.com/ab/,male nudity female nudity adultery midlife cri...,"Lester Burnham, a depressed suburban father in...","[{""name"": ""DreamWorks SKG"", ""id"": 27}, {""name""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]",Look closer.,Kevin Spacey Annette Bening Thora Birch Wes Be...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes


In [None]:
df.shape

(4760, 21)

In [None]:
df.columns

Index(['Movie_ID', 'Movie_Title', 'Movie_Genre', 'Movie_Language',
       'Movie_Budget', 'Movie_Popularity', 'Movie_Release_Date',
       'Movie_Revenue', 'Movie_Runtime', 'Movie_Vote', 'Movie_Vote_Count',
       'Movie_Homepage', 'Movie_Keywords', 'Movie_Overview',
       'Movie_Production_House', 'Movie_Production_Country',
       'Movie_Spoken_Language', 'Movie_Tagline', 'Movie_Cast', 'Movie_Crew',
       'Movie_Director'],
      dtype='object')

## Define Feature Variables (x)

In [None]:
df_features = df[["Movie_Homepage","Movie_Spoken_Language","Movie_Tagline","Movie_Cast","Movie_Director"]].fillna("")

In [None]:
df_features.shape

(4760, 5)

In [None]:
df_features

Unnamed: 0,Movie_Homepage,Movie_Spoken_Language,Movie_Tagline,Movie_Cast,Movie_Director
0,,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Twelve outrageous guests. Four scandalous requ...,Tim Roth Antonio Banderas Jennifer Beals Madon...,Allison Anders
1,http://www.starwars.com/films/star-wars-episod...,"[{""iso_639_1"": ""en"", ""name"": ""English""}]","A long time ago in a galaxy far, far away...",Mark Hamill Harrison Ford Carrie Fisher Peter ...,George Lucas
2,http://movies.disney.com/finding-nemo,"[{""iso_639_1"": ""en"", ""name"": ""English""}]","There are 3.7 trillion fish in the ocean, they...",Albert Brooks Ellen DeGeneres Alexander Gould ...,Andrew Stanton
3,,"[{""iso_639_1"": ""en"", ""name"": ""English""}]","The world will never be the same, once you've ...",Tom Hanks Robin Wright Gary Sinise Mykelti Wil...,Robert Zemeckis
4,http://www.dreamworks.com/ab/,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Look closer.,Kevin Spacey Annette Bening Thora Birch Wes Be...,Sam Mendes
...,...,...,...,...,...
4755,,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",The hot spot where Satan's waitin'.,Lisa Hart Carroll Michael Des Barres Paul Drak...,Pece Dingo
4756,http://www.growingupsmithmovie.com,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",It’s better to stand out than to fit in.,Roni Akurati Brighton Sharbino Jason Lee Anjul...,Frank Lotito
4757,,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",She never knew it could happen to her...,Nicole Smolen Kim Baldwin Ariana Stephens Brys...,Jaco Booyens
4758,,[],,,


In [None]:
x = df_features["Movie_Homepage"]+' '+df_features["Movie_Spoken_Language"]+' '+df_features["Movie_Tagline"]+' '+df_features["Movie_Cast"]+' '+df_features["Movie_Director"]
x

0        [{"iso_639_1": "en", "name": "English"}] Twel...
1       http://www.starwars.com/films/star-wars-episod...
2       http://movies.disney.com/finding-nemo [{"iso_6...
3        [{"iso_639_1": "en", "name": "English"}] The ...
4       http://www.dreamworks.com/ab/ [{"iso_639_1": "...
                              ...                        
4755     [{"iso_639_1": "en", "name": "English"}] The ...
4756    http://www.growingupsmithmovie.com [{"iso_639_...
4757     [{"iso_639_1": "en", "name": "English"}] She ...
4758                                                []   
4759                []  Tony Oppedisano Simon Napier-Bell
Length: 4760, dtype: object
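
The cell above joins the selected text columns into one string per movie after replacing missing values with empty strings. Without that replacement, a missing value in any column would typically make the whole concatenated row missing. An equivalent way to write the join is sketched below on an invented two-row frame; the `toy` data and the `text_columns` name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical two-row frame; the second movie has no tagline.
toy = pd.DataFrame({
    "Movie_Tagline": ["Look closer.", None],
    "Movie_Cast": ["Kevin Spacey Annette Bening", "Tom Hanks"],
    "Movie_Director": ["Sam Mendes", "Robert Zemeckis"],
})

text_columns = ["Movie_Tagline", "Movie_Cast", "Movie_Director"]

# Replace missing values, then join each row's columns with a space.
combined = toy[text_columns].fillna("").agg(" ".join, axis=1)
print(combined.tolist())
```

Listing the columns once in `text_columns` keeps the selection and the join in sync when a feature is added or removed.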

## Train Test Split

In [None]:
from sklearn.model_selection import train_test_split
x_train,x_test = train_test_split(x,test_size=0.33,random_state=2569)
x_train

1964     [{"iso_639_1": "en", "name": "English"}] If y...
1997     [{"iso_639_1": "en", "name": "English"}, {"is...
1830    http://tv.disney.go.com/disneychannel/original...
4368     [{"iso_639_1": "en", "name": "English"}] The ...
3621     [{"iso_639_1": "en", "name": "English"}]  Jam...
                              ...                        
4568    http://mazerunnermovies.com [{"iso_639_1": "en...
3738     [{"iso_639_1": "en", "name": "English"}] Are ...
4644     [{"iso_639_1": "en", "name": "English"}]   Je...
3864     [{"iso_639_1": "en", "name": "English"}] Get ...
2137     [{"iso_639_1": "ar", "name": "\u0627\u0644\u0...
Length: 3189, dtype: object

In [None]:
x_train.shape

(3189,)
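
Note that `train_test_split` shuffles the rows but keeps their original index labels, which is why the output above starts at label 1964. The sketch below uses a made-up Series of the same length to illustrate the difference between a row's label and its position in the training split; the `docs` data is an assumption.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for x: 4760 short strings with a default RangeIndex.
docs = pd.Series([f"doc {i}" for i in range(4760)])

train, test = train_test_split(docs, test_size=0.33, random_state=2569)
print(len(train), len(test))

# Position 0 of the training split carries whatever label the shuffle put there.
first_label = train.index[0]
print("label at position 0:", first_label)
print(train.iloc[0] == train.loc[first_label])
```

Any matrix built from the training split is ordered by position, so a lookup by label or by `Movie_ID` would need to be translated to a position first.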

## Data Preprocessing

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [None]:
tfidf = TfidfVectorizer()

In [None]:
X = tfidf.fit_transform(x_train)
X.shape

(3189, 13197)
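
The shape above means one row per training document and one column per distinct token. As a rough illustration of what `TfidfVectorizer` produces, the sketch below fits it on a tiny invented corpus; the three strings are assumptions and not taken from the dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical three-document corpus.
corpus = ["tom hanks drama", "tom cruise action", "pixar animation family"]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus)

vocab = vectorizer.vocabulary_          # maps each token to its column index
print(matrix.shape)                     # (documents, distinct tokens)
print(sorted(vocab))

# With default settings each row is scaled to unit length.
row_norms = np.linalg.norm(matrix.toarray(), axis=1)
print(row_norms)

# "tom" occurs in two documents, so it is weighted lower than "hanks".
tom_weight = matrix[0, vocab["tom"]]
hanks_weight = matrix[0, vocab["hanks"]]
print(tom_weight < hanks_weight)
```

Tokens shared by many documents get smaller weights, so rare cast or director names tend to dominate the similarity between two movies.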

In [None]:
print(X)

  (0, 5770)	0.03509928056650798
  (0, 3639)	0.03680197613782599
  (0, 8182)	0.03509928056650798
  (0, 3660)	0.036813535654173245
  (0, 5595)	0.16913797813659448
  (0, 13050)	0.2268790755080981
  (0, 9446)	0.1702049167481373
  (0, 7026)	0.21827158084515602
  (0, 4188)	0.1334007176146176
  (0, 11365)	0.15182856911202933
  (0, 11842)	0.18564508274505015
  (0, 12384)	0.202797571244812
  (0, 2394)	0.19317131018510383
  (0, 11661)	0.10522110434039075
  (0, 12948)	0.22050931007773544
  (0, 9027)	0.2014376805191069
  (0, 12339)	0.22050931007773544
  (0, 6373)	0.22050931007773544
  (0, 12462)	0.18650124399647466
  (0, 8563)	0.22824631487790745
  (0, 205)	0.1633457504548396
  (0, 4678)	0.21827158084515602
  (0, 7125)	0.2229014839708566
  (0, 4905)	0.23822104891065887
  (0, 11943)	0.2014376805191069
  :	:
  (3188, 8182)	0.10997521268267137
  (3188, 3660)	0.03844880719133592
  (3188, 5841)	0.12418945933491803
  (3188, 4238)	0.12329841041705866
  (3188, 4244)	0.12279891357631224
  (3188, 11962)	0.1

## Modelling and Model Evaluation

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
Similarity_Score = cosine_similarity(X)

In [None]:
Similarity_Score

array([[1.        , 0.07105401, 0.00439514, ..., 0.01397073, 0.05357769,
        0.01055009],
       [0.07105401, 1.        , 0.00707595, ..., 0.02249215, 0.00898783,
        0.0199248 ],
       [0.00439514, 0.00707595, 1.        , ..., 0.01186872, 0.00474272,
        0.00896274],
       ...,
       [0.01397073, 0.02249215, 0.01186872, ..., 1.        , 0.01507558,
        0.02848965],
       [0.05357769, 0.00898783, 0.00474272, ..., 0.01507558, 1.        ,
        0.01138442],
       [0.01055009, 0.0199248 , 0.00896274, ..., 0.02848965, 0.01138442,
        1.        ]])

In [None]:
Similarity_Score.shape

(3189, 3189)
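
Each entry of the matrix above is the cosine of the angle between two TF-IDF vectors: the dot product divided by the product of the vector lengths. The sketch below works this out by hand with NumPy on three invented vectors; the `cosine` helper is a hypothetical name, not part of scikit-learn.

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical feature vectors.
a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 0.0])
c = np.array([0.0, 0.0, 2.0])

print(cosine(a, a))  # a vector compared with itself: 1, up to rounding
print(cosine(a, b))  # partial overlap
print(cosine(a, c))  # no shared features: 0
```

The matrix is symmetric with ones on the diagonal, and floating-point rounding is the likely reason a self-similarity can print as a value just below 1.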

## Prediction

In [None]:
Fav_Movie = input("Enter your favourite movie name:")

Enter your favourite movie name: avtar


In [None]:
Movie_title_List = df["Movie_Title"].tolist()

In [None]:
import difflib

In [None]:
Movie_Recommendation = difflib.get_close_matches(Fav_Movie,Movie_title_List)
print(Movie_Recommendation)

['Avatar', 'Salvador', 'Water']
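
`difflib.get_close_matches` ranks candidates by a string similarity ratio and keeps those at or above `cutoff` (0.6 by default), returning at most `n` of them (3 by default). The sketch below shows the effect of the cutoff and the empty result for an unmatched query; the short `titles` list is an assumption used only for illustration.

```python
import difflib

# Hypothetical title list.
titles = ["Avatar", "Salvador", "Water", "Titanic"]

# Default-like call: up to 3 matches with a ratio of at least 0.6.
matches = difflib.get_close_matches("avtar", titles, n=3, cutoff=0.6)
print(matches)

# A stricter cutoff keeps only the closest title.
strict = difflib.get_close_matches("avtar", titles, n=3, cutoff=0.7)
print(strict)

# A query with no similar title yields an empty list, so guard before indexing.
no_match = difflib.get_close_matches("zzzz", titles)
best = no_match[0] if no_match else None
print(best)
```

Checking for an empty list before taking element 0 avoids an `IndexError` when the user types something that resembles no title.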


In [None]:
Close_Match = Movie_Recommendation[0]
print(Close_Match)

Avatar


In [None]:
Close_Match_Index = df[df.Movie_Title == Close_Match]["Movie_ID"].values[0]
print(Close_Match_Index)

2692


In [None]:
# Getting a list of similar movies
Recommend_Score = list(enumerate(Similarity_Score[Close_Match_Index]))
print(Recommend_Score)

[(0, 0.024334591467174586), (1, 0.028855070952625433), (2, 0.011428741501304837), (3, 0.016085317181881525), (4, 0.005668162674924043), (5, 0.011586844416690946), (6, 0.03336782829438101), (7, 0.006763800013036206), (8, 0.004959891968246814), (9, 0.04726693196508997), (10, 0.005380446537627554), (11, 0.014829889945996565), (12, 0.07924492858839828), (13, 0.031126867613616153), (14, 0.006637195215540293), (15, 0.01638215413084274), (16, 0.016672926089305252), (17, 0.014167581912908725), (18, 0.02748253505414346), (19, 0.0065341505985396535), (20, 0.006721022550434679), (21, 0.014847745904581768), (22, 0.019303346018193262), (23, 0.06004992570991528), (24, 0.008556326933632402), (25, 0.05122112530487128), (26, 0.0), (27, 0.01871111334742471), (28, 0.019509404672932166), (29, 0.09721570449638803), (30, 0.049157464840104095), (31, 0.007294698458394407), (32, 0.006213313596356981), (33, 0.038185232973226284), (34, 0.012457016601257245), (35, 0.01770063347686232), (36, 0.031465192514269005),

In [None]:
len(Recommend_Score)

3189

In [None]:
# Sort all movies by similarity score to the favourite movie, highest first
Sorted_Similar_Movies = sorted(Recommend_Score,key = lambda x:x[1], reverse=True)
print(Sorted_Similar_Movies)

[(2692, 0.9999999999999998), (562, 0.13782990767200964), (2206, 0.13338283994050207), (2921, 0.13195355074431603), (295, 0.11867453576891537), (928, 0.11604577353078498), (1838, 0.11558765833899606), (2700, 0.11259882528784979), (958, 0.11005709462308426), (2005, 0.10997392072884728), (2905, 0.10777998730852947), (2124, 0.10774026108847207), (768, 0.10185969117874488), (3020, 0.10100616126803866), (849, 0.09806816598040083), (2469, 0.09749378665124583), (29, 0.09721570449638803), (1736, 0.09399575194382778), (62, 0.09306070326450598), (103, 0.09263224910063414), (3126, 0.0924921954927604), (2502, 0.09233715427652364), (498, 0.09012132089573316), (2142, 0.08988406124561141), (891, 0.08964365525755236), (726, 0.08945786919789984), (2538, 0.08903216495542973), (1530, 0.08873900041140778), (1859, 0.08797711146208681), (1377, 0.08750376574927976), (1132, 0.08646476738364564), (1314, 0.08630304111081563), (1161, 0.08629756395560517), (2624, 0.08435803041641403), (437, 0.0827051946233314), (2
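
The first entry in the sorted list is the query row itself, with a score of about 1. A possible alternative to sorting `(index, score)` pairs is `np.argsort`, skipping the query position explicitly. The sketch below uses invented scores; `scores` and `query_position` are hypothetical names.

```python
import numpy as np

# Hypothetical similarity scores of one movie against five movies.
scores = np.array([0.10, 1.00, 0.35, 0.05, 0.80])
query_position = 1  # the movie compared with itself

# argsort is ascending, so reverse it to get the highest scores first.
order = np.argsort(scores)[::-1]

# Drop the query itself and keep the two best remaining positions.
top = [(int(i), float(scores[i])) for i in order if i != query_position][:2]
print(top)
```

The positions returned this way refer to rows of the score array, so they map to titles only if the titles are kept in the same order.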

In [None]:
# Print the titles of the top 30 similar movies based on index

print("Top 30 movies suggested for you: \n")
for i, (index, score) in enumerate(Sorted_Similar_Movies[:30], start=1):
    title = df.loc[index, "Movie_Title"]
    print(i, '----', title)

Top 30 movies suggested for you: 

1 ---- Niagara
2 ---- Copycat
3 ---- Bangkok Dangerous
4 ---- Triangle
5 ---- On Her Majesty's Secret Service
6 ---- The Sentinel
7 ---- Valentine
8 ---- Chocolate: Deep Dark Secrets
9 ---- Racing Stripes
10 ---- Old School
11 ---- My Date with Drew
12 ---- Undercover Brother
13 ---- As Good as It Gets
14 ---- Real Women Have Curves
15 ---- The Brothers Grimm
16 ---- What's Love Got to Do with It
17 ---- Before Sunrise
18 ---- Cradle 2 the Grave
19 ---- Brokeback Mountain
20 ---- Volver
21 ---- Blade II
22 ---- The Rose
23 ---- Teenage Mutant Ninja Turtles III
24 ---- Cry Freedom
25 ---- Impostor
26 ---- The Princess Bride
27 ---- All That Jazz
28 ---- The Saint
29 ---- Dressed to Kill
30 ---- The Fast and the Furious: Tokyo Drift


In [None]:
# Print the titles of the top 10 similar movies based on index

print("Top 10 movies suggested for you: \n")
for i, (index, score) in enumerate(Sorted_Similar_Movies[:10], start=1):
    title = df.loc[index, "Movie_Title"]
    print(i, '----', title)

Top 10 movies suggested for you: 

1 ---- Niagara
2 ---- Caravans
3 ---- Brokeback Mountain
4 ---- Night of the Living Dead
5 ---- Mad Hot Ballroom
6 ---- Some Like It Hot
7 ---- The Kentucky Fried Movie
8 ---- The Misfits
9 ---- Tora! Tora! Tora!
10 ---- Superman III


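## Explanation

This model implements the movie recommendation system using TF-IDF text features and cosine similarity, not linear regression. The homepage, spoken language, tagline, cast and director of each movie are joined into one string, converted to TF-IDF vectors with `TfidfVectorizer`, and compared pairwise with `cosine_similarity`. The title typed by the user is matched to the closest known title with `difflib.get_close_matches`, and the movies with the highest similarity scores for that title are listed.

`Similarity_Score` is computed from `x_train` only, so its rows follow the shuffled order of the 3189 training rows. `Close_Match_Index` is a `Movie_ID`, and titles are looked up by `df` index, so the printed titles do not necessarily correspond to the movies that were scored. Fitting the vectorizer on all of `x` and indexing by row position would keep scores and titles aligned.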