# Movies Recommender System Project

***

## Introduction

This is a data science portfolio project: a movie recommender system. The data used for the model was downloaded from Kaggle (kaggle.com). In this project, I clean the data so that each movie is described by separate, identifiable words, and then use cosine similarity to find the best recommendations for a given movie. The complete model is provided at the end of the notebook as a single code block to try out.

The data consists of two tables, both .csv files, that share an `id` column used to join them.

The first table, `titles`, has 5850 rows and 15 columns.
Its columns are:
- id 	
- title 	
- type 	
- description 	
- release_year 	
- age_certification 	
- runtime 	
- genres 	
- production_countries 	
- seasons 	
- imdb_id 	
- imdb_score 	
- imdb_votes 	
- tmdb_popularity 	
- tmdb_score

The second table, `credits`, has 77801 rows and 5 columns.
Its columns are:
- person_id 	
- id 	
- name 	
- character 	
- role
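
Because both tables carry the same `id` values, the people credited on a title can be looked up by joining on that column. The following is only a minimal sketch on two hand-made frames; the names `toy_titles` and `toy_credits` are illustrative and are not part of the notebook, and the toy show is deliberately given no credit rows.

```python
import pandas as pd

# Two tiny stand-ins for the real tables (illustrative data only).
toy_titles = pd.DataFrame({
    "id": ["tm84618", "ts300399"],
    "title": ["Taxi Driver", "Five Came Back: The Reference Films"],
})
toy_credits = pd.DataFrame({
    "id": ["tm84618", "tm84618"],
    "name": ["Robert De Niro", "Jodie Foster"],
})

# A left join keeps every title, even one that has no credit rows in the toy data.
joined = toy_titles.merge(toy_credits, on="id", how="left")
print(joined)
```

A title without any matching credit rows ends up with a missing `name`, which is the same kind of gap that has to be handled after the two tables are combined.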

In [635]:
# importing libraries
import pandas as pd
import numpy as np

import re
import difflib

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import spacy
import nltk
from nltk.corpus import stopwords


In [636]:
titles  = pd.read_csv('Data/titles.csv')
credits = pd.read_csv('Data/credits.csv')


In [637]:
titles.head(1)


Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score
0,ts300399,Five Came Back: The Reference Films,SHOW,This collection includes 12 World War II-era p...,1945,TV-MA,51,['documentation'],['US'],1.0,,,,0.6,


In [638]:
credits.head(1)


Unnamed: 0,person_id,id,name,character,role
0,3748,tm84618,Robert De Niro,Travis Bickle,ACTOR


***

## EDA

### Titles

In [639]:
titles


Unnamed: 0,id,title,type,description,release_year,age_certification,runtime,genres,production_countries,seasons,imdb_id,imdb_score,imdb_votes,tmdb_popularity,tmdb_score
0,ts300399,Five Came Back: The Reference Films,SHOW,This collection includes 12 World War II-era p...,1945,TV-MA,51,['documentation'],['US'],1.0,,,,0.600,
1,tm84618,Taxi Driver,MOVIE,A mentally unstable Vietnam War veteran works ...,1976,R,114,"['drama', 'crime']",['US'],,tt0075314,8.2,808582.0,40.965,8.179
2,tm154986,Deliverance,MOVIE,Intent on seeing the Cahulawassee River before...,1972,R,109,"['drama', 'action', 'thriller', 'european']",['US'],,tt0068473,7.7,107673.0,10.010,7.300
3,tm127384,Monty Python and the Holy Grail,MOVIE,"King Arthur, accompanied by his squire, recrui...",1975,PG,91,"['fantasy', 'action', 'comedy']",['GB'],,tt0071853,8.2,534486.0,15.461,7.811
4,tm120801,The Dirty Dozen,MOVIE,12 American military prisoners in World War II...,1967,,150,"['war', 'action']","['GB', 'US']",,tt0061578,7.7,72662.0,20.398,7.600
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5845,tm1014599,Fine Wine,MOVIE,A beautiful love story that can happen between...,2021,,100,"['romance', 'drama']",['NG'],,tt13857480,6.8,45.0,1.466,
5846,tm898842,C/O Kaadhal,MOVIE,A heart warming film that explores the concept...,2021,,134,['drama'],[],,tt11803618,7.7,348.0,,
5847,tm1059008,Lokillo,MOVIE,A controversial TV host and comedian who has b...,2021,,90,['comedy'],['CO'],,tt14585902,3.8,68.0,26.005,6.300
5848,tm1035612,Dad Stop Embarrassing Me - The Afterparty,MOVIE,"Jamie Foxx, David Alan Grier and more from the...",2021,PG-13,37,[],['US'],,,,,1.296,10.000


In [640]:
titles = titles.drop_duplicates().reset_index(drop = True)
titles.shape


(5850, 15)

In [641]:
titles = titles.drop(
    columns = ['release_year', 'age_certification', 'runtime', 'production_countries', 'seasons', 
               'imdb_score', 'imdb_votes', 'tmdb_popularity', 'imdb_id', 'tmdb_score']
)


In [642]:
titles['type'].unique()


array(['SHOW', 'MOVIE'], dtype=object)

In [643]:
titles = titles[titles['type'] == 'MOVIE'].drop(columns = ['type']).reset_index(drop = True)
titles.shape


(3744, 4)

In [644]:
titles.isnull().sum()


id             0
title          1
description    9
genres         0
dtype: int64

In [645]:
titles = titles.dropna().reset_index(drop = True)
titles


Unnamed: 0,id,title,description,genres
0,tm84618,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']"
1,tm154986,Deliverance,Intent on seeing the Cahulawassee River before...,"['drama', 'action', 'thriller', 'european']"
2,tm127384,Monty Python and the Holy Grail,"King Arthur, accompanied by his squire, recrui...","['fantasy', 'action', 'comedy']"
3,tm120801,The Dirty Dozen,12 American military prisoners in World War II...,"['war', 'action']"
4,tm70993,Life of Brian,"Brian Cohen is an average young Jewish man, bu...",['comedy']
...,...,...,...,...
3730,tm1074617,Bling Empire - The Afterparty,"The stars of ""Bling Empire"" discuss the show's...",[]
3731,tm1014599,Fine Wine,A beautiful love story that can happen between...,"['romance', 'drama']"
3732,tm898842,C/O Kaadhal,A heart warming film that explores the concept...,['drama']
3733,tm1059008,Lokillo,A controversial TV host and comedian who has b...,['comedy']


### Credits

In [646]:
credits


Unnamed: 0,person_id,id,name,character,role
0,3748,tm84618,Robert De Niro,Travis Bickle,ACTOR
1,14658,tm84618,Jodie Foster,Iris Steensma,ACTOR
2,7064,tm84618,Albert Brooks,Tom,ACTOR
3,3739,tm84618,Harvey Keitel,Matthew 'Sport' Higgins,ACTOR
4,48933,tm84618,Cybill Shepherd,Betsy,ACTOR
...,...,...,...,...,...
77796,736339,tm1059008,Adelaida Buscato,María Paz,ACTOR
77797,399499,tm1059008,Luz Stella Luengas,Karen Bayona,ACTOR
77798,373198,tm1059008,Inés Prieto,Fanny,ACTOR
77799,378132,tm1059008,Isabel Gaona,Cacica,ACTOR


In [647]:
credits = credits.drop_duplicates().reset_index(drop = True)
credits.shape


(77801, 5)

In [648]:
credits = credits.drop(columns = ['person_id', 'character', 'role']).reset_index(drop = True)
credits.head()


Unnamed: 0,id,name
0,tm84618,Robert De Niro
1,tm84618,Jodie Foster
2,tm84618,Albert Brooks
3,tm84618,Harvey Keitel
4,tm84618,Cybill Shepherd


In [649]:
credits.isnull().sum()


id      0
name    0
dtype: int64

In [650]:
credits = credits.transpose()
credits


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,77791,77792,77793,77794,77795,77796,77797,77798,77799,77800
id,tm84618,tm84618,tm84618,tm84618,tm84618,tm84618,tm84618,tm84618,tm84618,tm84618,...,tm1059008,tm1059008,tm1059008,tm1059008,tm1059008,tm1059008,tm1059008,tm1059008,tm1059008,tm1059008
name,Robert De Niro,Jodie Foster,Albert Brooks,Harvey Keitel,Cybill Shepherd,Peter Boyle,Leonard Harris,Diahnne Abbott,Gino Ardito,Martin Scorsese,...,Jessica Cediel,Javier Gardeazábal,Carla Giraldo,Ana María Sánchez,Aída Morales,Adelaida Buscato,Luz Stella Luengas,Inés Prieto,Isabel Gaona,Julian Gaviria


In [651]:
len(credits.columns)


77801

In [652]:
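t = []   # names collected for the id currently being scanned
d = {}   # maps a title id to the list of names credited on it

# Walk the transposed frame column by column: row 0 holds the id, row 1 the name.
# Note: the loop stops at the last column before storing its group, so the
# final id in the file never gets an entry in d.
for i in credits.columns:
    if i == len(credits.columns)-1:
        break
    elif credits[i][0] == credits[i+1][0]:
        t.append(credits[i][1])
    else:
        t.append(credits[i][1])
        d[credits[i][0]] = t
        t = []
        continue
d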


{'tm84618': ['Robert De Niro',
  'Jodie Foster',
  'Albert Brooks',
  'Harvey Keitel',
  'Cybill Shepherd',
  'Peter Boyle',
  'Leonard Harris',
  'Diahnne Abbott',
  'Gino Ardito',
  'Martin Scorsese',
  'Murray Moston',
  'Richard Higgs',
  'Bill Minkin',
  'Bob Maroff',
  'Victor Argo',
  'Joe Spinell',
  'Robinson Frank Adu',
  'Brenda Dickson',
  'Norman Matlock',
  'Harry Northup',
  'Harlan Cary Poe',
  'Steven Prince',
  'Peter Savage',
  'Nicholas Shields',
  'Ralph S. Singleton',
  'Annie Gagen',
  'Carson Grant',
  'Mary-Pat Green',
  'Debbi Morgan',
  'Don Stroud',
  'Copper Cunningham',
  'Garth Avery',
  'Nat Grant',
  'Billie Perkins',
  'Catherine Scorsese',
  'Charles Scorsese',
  'Martin Scorsese'],
 'tm154986': ['Jon Voight',
  'Burt Reynolds',
  'Ned Beatty',
  'Ronny Cox',
  'Ed Ramey',
  'Billy Redden',
  'Seamon Glass',
  'Randall Deal',
  'Bill McKinney',
  "Herbert 'Cowboy' Coward",
  'Lewis Crone',
  'Ken Keener',
  'Johnny Popwell',
  'John Fowler',
  'Kathy 
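
The same id-to-names mapping can also be built with a pandas `groupby`, which does not rely on rows of one title being adjacent and keeps every id, including the last one in the file. This is a sketch of an alternative, not the method used in this notebook. It assumes the long-format `credits` frame with `id` and `name` columns as it looked before the transpose, and `toy_credits` and `names_by_id` are illustrative names.

```python
import pandas as pd

# Long-format stand-in for credits (before the transpose); illustrative rows only.
toy_credits = pd.DataFrame({
    "id": ["tm84618", "tm84618", "tm154986", "tm1059008"],
    "name": ["Robert De Niro", "Jodie Foster", "Jon Voight", "Julian Gaviria"],
})

# Collect all names per id; sort=False keeps ids in order of first appearance.
names_by_id = toy_credits.groupby("id", sort=False)["name"].agg(list).to_dict()
print(names_by_id)
```

A dictionary built this way could be passed to `map` in the same manner as `d`.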

### Titles & Credits combined into df

In [653]:
titles.head(1)


Unnamed: 0,id,title,description,genres
0,tm84618,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']"


In [655]:
df = titles.copy()
df['actors_directors'] = df['id'].map(d)
df = df.drop(columns = ['id'])
df


Unnamed: 0,title,description,genres,actors_directors
0,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']","[Robert De Niro, Jodie Foster, Albert Brooks, ..."
1,Deliverance,Intent on seeing the Cahulawassee River before...,"['drama', 'action', 'thriller', 'european']","[Jon Voight, Burt Reynolds, Ned Beatty, Ronny ..."
2,Monty Python and the Holy Grail,"King Arthur, accompanied by his squire, recrui...","['fantasy', 'action', 'comedy']","[Graham Chapman, John Cleese, Eric Idle, Terry..."
3,The Dirty Dozen,12 American military prisoners in World War II...,"['war', 'action']","[Lee Marvin, Ernest Borgnine, Charles Bronson,..."
4,Life of Brian,"Brian Cohen is an average young Jewish man, bu...",['comedy'],"[Graham Chapman, John Cleese, Terry Gilliam, E..."
...,...,...,...,...
3730,Bling Empire - The Afterparty,"The stars of ""Bling Empire"" discuss the show's...",[],
3731,Fine Wine,A beautiful love story that can happen between...,"['romance', 'drama']","[Richard Mofe-Damijo, Ego Nwosu, Keppy Ekpenyo..."
3732,C/O Kaadhal,A heart warming film that explores the concept...,['drama'],
3733,Lokillo,A controversial TV host and comedian who has b...,['comedy'],


In [656]:
df.isnull().sum()


title                0
description          0
genres               0
actors_directors    92
dtype: int64

In [657]:
df = df.dropna().reset_index(drop = True)
df


Unnamed: 0,title,description,genres,actors_directors
0,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']","[Robert De Niro, Jodie Foster, Albert Brooks, ..."
1,Deliverance,Intent on seeing the Cahulawassee River before...,"['drama', 'action', 'thriller', 'european']","[Jon Voight, Burt Reynolds, Ned Beatty, Ronny ..."
2,Monty Python and the Holy Grail,"King Arthur, accompanied by his squire, recrui...","['fantasy', 'action', 'comedy']","[Graham Chapman, John Cleese, Eric Idle, Terry..."
3,The Dirty Dozen,12 American military prisoners in World War II...,"['war', 'action']","[Lee Marvin, Ernest Borgnine, Charles Bronson,..."
4,Life of Brian,"Brian Cohen is an average young Jewish man, bu...",['comedy'],"[Graham Chapman, John Cleese, Terry Gilliam, E..."
...,...,...,...,...
3638,Kongsi Raya,Jack - a Chinese chef-manager who is in-line t...,['comedy'],"[Ai Leng Ong, Chew Kin-Wah, Harith Iskander, E..."
3639,Sun of the Soil,"In 14th-century Mali, an ambitious young royal...",[],[Joe Penney]
3640,Princess 'Daya'Reese,Reese is a con artist from Manila who dreams o...,"['comedy', 'romance']","[Maymay Entrata, Edward Barber, Snooky Serna, ..."
3641,My Bride,The story follows a young man and woman who go...,"['romance', 'comedy', 'drama']","[Ahmed Hatem, Jamila Awad, Mahmoud Al-Bezzawy,..."


### Columns Data Types and String Formatting

In [658]:
df.head(1)


Unnamed: 0,title,description,genres,actors_directors
0,Taxi Driver,A mentally unstable Vietnam War veteran works ...,"['drama', 'crime']","[Robert De Niro, Jodie Foster, Albert Brooks, ..."


In [659]:
df.dtypes


title               object
description         object
genres              object
actors_directors    object
dtype: object

### NLP

In [660]:
nlp = spacy.load('en_core_web_sm')
nlp


<spacy.lang.en.English at 0x17d43b47350>

In [661]:
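def nlp_stop_words(text):
    # Tokenize with spaCy and keep every token that is not a stop word
    # (punctuation tokens are kept as well).
    doc = nlp(text)
    filtered_words = [token.text for token in doc if not token.is_stop]
    return filtered_words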


In [662]:
df['description'] = df['description'].apply(nlp_stop_words)
df.head()


Unnamed: 0,title,description,genres,actors_directors
0,Taxi Driver,"[mentally, unstable, Vietnam, War, veteran, wo...","['drama', 'crime']","[Robert De Niro, Jodie Foster, Albert Brooks, ..."
1,Deliverance,"[Intent, seeing, Cahulawassee, River, turned, ...","['drama', 'action', 'thriller', 'european']","[Jon Voight, Burt Reynolds, Ned Beatty, Ronny ..."
2,Monty Python and the Holy Grail,"[King, Arthur, ,, accompanied, squire, ,, recr...","['fantasy', 'action', 'comedy']","[Graham Chapman, John Cleese, Eric Idle, Terry..."
3,The Dirty Dozen,"[12, American, military, prisoners, World, War...","['war', 'action']","[Lee Marvin, Ernest Borgnine, Charles Bronson,..."
4,Life of Brian,"[Brian, Cohen, average, young, Jewish, man, ,,...",['comedy'],"[Graham Chapman, John Cleese, Terry Gilliam, E..."
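
The token lists above still contain punctuation such as `,`. With default settings this should not matter for a TF-IDF step, because scikit-learn's `TfidfVectorizer` lowercases the text and keeps only tokens of two or more word characters. A quick check on a short string, offered as a sketch (the variable names are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# build_analyzer() returns the function the vectorizer uses to turn text into tokens.
analyze = TfidfVectorizer().build_analyzer()

tokens = analyze("King Arthur , accompanied squire , recruits")
print(tokens)
```

The stray commas disappear and the words are lowercased, so the leftover punctuation does not become a feature of its own.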


In [663]:
type(df['description'].iloc[0])


list

In [664]:
type(df['genres'].iloc[0])


str

In [665]:
type(df['actors_directors'].iloc[0])


list

In [666]:
import ast  # literal_eval parses the list literal without executing arbitrary code
df['genres'] = df['genres'].apply(ast.literal_eval)
df.head()


Unnamed: 0,title,description,genres,actors_directors
0,Taxi Driver,"[mentally, unstable, Vietnam, War, veteran, wo...","[drama, crime]","[Robert De Niro, Jodie Foster, Albert Brooks, ..."
1,Deliverance,"[Intent, seeing, Cahulawassee, River, turned, ...","[drama, action, thriller, european]","[Jon Voight, Burt Reynolds, Ned Beatty, Ronny ..."
2,Monty Python and the Holy Grail,"[King, Arthur, ,, accompanied, squire, ,, recr...","[fantasy, action, comedy]","[Graham Chapman, John Cleese, Eric Idle, Terry..."
3,The Dirty Dozen,"[12, American, military, prisoners, World, War...","[war, action]","[Lee Marvin, Ernest Borgnine, Charles Bronson,..."
4,Life of Brian,"[Brian, Cohen, average, young, Jewish, man, ,,...",[comedy],"[Graham Chapman, John Cleese, Terry Gilliam, E..."


In [667]:
def list_str(x):
    return ' '.join(x)


In [668]:
df['description'] = df['description'].apply(list_str)
df['genres'] = df['genres'].apply(list_str)
df['actors_directors'] = df['actors_directors'].apply(list_str)
df


Unnamed: 0,title,description,genres,actors_directors
0,Taxi Driver,mentally unstable Vietnam War veteran works ni...,drama crime,Robert De Niro Jodie Foster Albert Brooks Harv...
1,Deliverance,Intent seeing Cahulawassee River turned huge l...,drama action thriller european,Jon Voight Burt Reynolds Ned Beatty Ronny Cox ...
2,Monty Python and the Holy Grail,"King Arthur , accompanied squire , recruits Kn...",fantasy action comedy,Graham Chapman John Cleese Eric Idle Terry Gil...
3,The Dirty Dozen,12 American military prisoners World War II or...,war action,Lee Marvin Ernest Borgnine Charles Bronson Jim...
4,Life of Brian,"Brian Cohen average young Jewish man , series ...",comedy,Graham Chapman John Cleese Terry Gilliam Eric ...
...,...,...,...,...
3638,Kongsi Raya,Jack - Chinese chef - manager - line family`s ...,comedy,Ai Leng Ong Chew Kin-Wah Harith Iskander Erra ...
3639,Sun of the Soil,"14th - century Mali , ambitious young royal na...",,Joe Penney
3640,Princess 'Daya'Reese,Reese con artist Manila dreams living like roy...,comedy romance,Maymay Entrata Edward Barber Snooky Serna Jeff...
3641,My Bride,story follows young man woman situations journ...,romance comedy drama,Ahmed Hatem Jamila Awad Mahmoud Al-Bezzawy Sab...


## Build the Recommender System

In [669]:
combined_features = df['title'] + ' ' + df['genres'] + ' ' + df['actors_directors'] + ' ' + df['description']
combined_features


0       Taxi Driver drama crime Robert De Niro Jodie F...
1       Deliverance drama action thriller european Jon...
2       Monty Python and the Holy Grail fantasy action...
3       The Dirty Dozen war action Lee Marvin Ernest B...
4       Life of Brian comedy Graham Chapman John Clees...
                              ...                        
3638    Kongsi Raya comedy Ai Leng Ong Chew Kin-Wah Ha...
3639    Sun of the Soil  Joe Penney 14th - century Mal...
3640    Princess 'Daya'Reese comedy romance Maymay Ent...
3641    My Bride romance comedy drama Ahmed Hatem Jami...
3642    Fine Wine romance drama Richard Mofe-Damijo Eg...
Length: 3643, dtype: object

In [670]:
all_movie_titles = df['title'].tolist()
len(all_movie_titles)


3643

In [671]:
feature_vectors = TfidfVectorizer().fit_transform(combined_features)
print(feature_vectors)


  (0, 649)	0.03713699347864403
  (0, 50155)	0.08031165960113815
  (0, 49304)	0.1054001367478412
  (0, 15902)	0.10799380866829246
  (0, 44037)	0.12102878700010611
  (0, 11720)	0.12102878700010611
  (0, 36217)	0.12102878700010611
  (0, 9483)	0.059420595093556995
  (0, 52112)	0.06873043023126105
  (0, 33471)	0.047115305448587864
  (0, 47635)	0.05377786364201811
  (0, 33644)	0.06389248317578458
  (0, 51540)	0.07806079488341865
  (0, 49905)	0.08757856813300288
  (0, 50652)	0.057828472023975355
  (0, 49995)	0.09963206694789228
  (0, 49195)	0.10130762609547411
  (0, 30840)	0.10799380866829246
  (0, 8787)	0.06495659105967864
  (0, 8330)	0.07194991788073797
  (0, 36272)	0.08977148649557629
  (0, 5692)	0.09963206694789228
  (0, 33054)	0.10130762609547411
  (0, 3700)	0.09553955629552521
  (0, 17519)	0.10130762609547411
  :	:
  (3642, 4684)	0.18352670555788708
  (3642, 51288)	0.1643342262667767
  (3642, 52379)	0.1643342262667767
  (3642, 11305)	0.16782595434566414
  (3642, 31684)	0.167825954345664

In [672]:
similarity = cosine_similarity(feature_vectors)
print(similarity)


[[1.00000000e+00 6.72491995e-03 4.46575566e-03 ... 0.00000000e+00
  7.73141535e-04 5.58795418e-03]
 [6.72491995e-03 1.00000000e+00 1.13549571e-02 ... 0.00000000e+00
  9.23214903e-04 1.08189853e-03]
 [4.46575566e-03 1.13549571e-02 1.00000000e+00 ... 6.00050375e-03
  2.72548939e-03 0.00000000e+00]
 ...
 [0.00000000e+00 0.00000000e+00 6.00050375e-03 ... 1.00000000e+00
  1.15296326e-02 5.53111581e-03]
 [7.73141535e-04 9.23214903e-04 2.72548939e-03 ... 1.15296326e-02
  1.00000000e+00 1.40593664e-02]
 [5.58795418e-03 1.08189853e-03 0.00000000e+00 ... 5.53111581e-03
  1.40593664e-02 1.00000000e+00]]
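
To make the matrix above easier to read, here is a small sketch of the same two steps on three made-up feature strings. Cosine similarity is the dot product of two vectors divided by the product of their lengths, so texts that share weighted words score closer to 1 and texts with no words in common score 0. The strings and the `toy_` names are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

toy_features = [
    "horror ghost haunted house family",
    "horror ghost haunted family demon",
    "comedy wedding road trip friends",
]

# Each row of toy_vectors is the TF-IDF vector of one string.
toy_vectors = TfidfVectorizer().fit_transform(toy_features)

# Entry [i, j] is the cosine similarity between string i and string j.
toy_similarity = cosine_similarity(toy_vectors)
print(np.round(toy_similarity, 2))

# TfidfVectorizer L2-normalizes rows by default, so a plain dot product
# (linear_kernel) gives the same matrix.
same_as_dot = np.allclose(toy_similarity, linear_kernel(toy_vectors))
print(same_as_dot)
```

The two horror strings score high against each other, while the comedy string shares no words with the first one and scores 0 against it.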


In [673]:
movie_name = input('Enter a movie name to show some movie recommendations : ')


Enter a movie name to show some movie recommendations : Insidious


In [674]:
find_close_match = difflib.get_close_matches(movie_name, all_movie_titles)
find_close_match


['Insidious', 'Insidious: Chapter 2']
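
`difflib.get_close_matches` returns an empty list when no title is similar enough (the default `cutoff` is 0.6), in which case taking the first element would raise an `IndexError`. A small guard could look like the sketch below; `best_title_match` is a hypothetical helper and not part of the notebook.

```python
import difflib

def best_title_match(query, titles, cutoff=0.6):
    """Return the closest title, or None when nothing clears the cutoff."""
    matches = difflib.get_close_matches(query, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None

toy_titles = ["Insidious", "Insidious: Chapter 2", "Taxi Driver", "Deliverance"]

misspelled = best_title_match("Insidous", toy_titles)  # typo in the query
no_match = best_title_match("zzzzqqq", toy_titles)     # nothing plausible
print(misspelled, no_match)
```

The comparison is case-sensitive, so lowercasing both the query and the titles before matching may help with user input.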

In [675]:
close_match = find_close_match[0]
index_of_the_movie = df[df['title'] == close_match].index.values[0]
index_of_the_movie


183

In [676]:
similarity_score = list(enumerate(similarity[index_of_the_movie]))
similarity_score


[(0, 0.0),
 (1, 0.013439714005959457),
 (2, 0.021885816939886962),
 (3, 0.01317229204307082),
 (4, 0.015599166034852842),
 (5, 0.02580341327251951),
 (6, 0.016031508116952253),
 (7, 0.006690010824673448),
 (8, 0.008463087962196091),
 (9, 0.02044259549258468),
 (10, 0.0038951294830229534),
 (11, 0.0036734429439335653),
 (12, 0.0),
 (13, 0.0),
 (14, 0.0),
 (15, 0.006426056900323684),
 (16, 0.0),
 (17, 0.0),
 (18, 0.0),
 (19, 0.0030328711015718906),
 (20, 0.0),
 (21, 0.027555108926114263),
 (22, 0.0),
 (23, 0.004171944990898004),
 (24, 0.010746354934637502),
 (25, 0.024838621410026768),
 (26, 0.003281967580862739),
 (27, 0.0),
 (28, 0.029548348662601337),
 (29, 0.0),
 (30, 0.0),
 (31, 0.021774598844841654),
 (32, 0.01868893560706975),
 (33, 0.017643611006827646),
 (34, 0.0077352734139485325),
 (35, 0.023041248067966993),
 (36, 0.06496309153869771),
 (37, 0.006874915958458721),
 (38, 0.0413816792635687),
 (39, 0.05311804510368163),
 (40, 0.0045543109883229285),
 (41, 0.016222480724814255),

In [677]:
sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True) 
sorted_similar_movies


[(183, 1.0),
 (624, 0.3140117686976799),
 (832, 0.12367018353742726),
 (615, 0.0995498383321309),
 (822, 0.06951613556636253),
 (2103, 0.0684604792494215),
 (36, 0.06496309153869771),
 (622, 0.06207843653441278),
 (3526, 0.057018920895729165),
 (3204, 0.05565630246672401),
 (1666, 0.054299191226915566),
 (1248, 0.05363077351738162),
 (39, 0.05311804510368163),
 (2024, 0.05245222931826412),
 (3001, 0.05044046296124119),
 (3141, 0.050404144104889255),
 (1203, 0.050319903892151865),
 (2601, 0.04964044038004761),
 (3013, 0.048965390440529784),
 (481, 0.048826485685915584),
 (1465, 0.04820284880395956),
 (1849, 0.04736154954769012),
 (2497, 0.04730140594980232),
 (2444, 0.04650654120279199),
 (3357, 0.0461716719425179),
 (1435, 0.04488889004115184),
 (618, 0.0447759023974847),
 (1083, 0.044536405705787596),
 (665, 0.04448609249128216),
 (412, 0.044370610688552886),
 (746, 0.04434585540153069),
 (3056, 0.04405258033610717),
 (85, 0.043989205801554104),
 (306, 0.04379038430495981),
 (2004, 0.

In [678]:
print(f'Top 30 Movies Suggestions for you based on your choice "{movie_name}" : \n')

# Print the 30 highest-scoring titles; the first entry is the query movie itself.
for i, (index, score) in enumerate(sorted_similar_movies[:30], start = 1):
    title_from_index = df['title'].iloc[index]
    print(i, '.', title_from_index)
    

Top 30 Movies Suggestions for you based on your choice "Insidious" : 

1 . Insidious
2 . Insidious: Chapter 2
3 . The Conjuring 2
4 . The Conjuring
5 . Ouija: Origin of Evil
6 . The Silence
7 . A Nightmare on Elm Street
8 . Dark Skies
9 . Puff: Wonders of the Reef
10 . Dirty Daddy: The Bob Saget Tribute
11 . F.R.E.D.I.
12 . Buster's Mal Heart
13 . Christine
14 . Official Secrets
15 . Incantation
16 . Love Hard
17 . Cam
18 . The Trap
19 . tick, tick... BOOM!
20 . The Dark Knight Rises
21 . Benji
22 . Vir Das: Losing It
23 . The Scary House
24 . Hospital
25 . The Soul
26 . Dear Ex
27 . Big Eyes
28 . Trailer Park Boys: Drunk, High and Unemployed: Live In Austin
29 . The Haunting in Connecticut 2: Ghosts of Georgia
30 . We Are Family
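
In the list above, position 1 is the query movie itself, since every movie has similarity 1.0 with itself. The sketch below shows one way to rank with NumPy while skipping the query row. The helper `top_n_similar` and the toy matrix are hypothetical and only illustrate the idea.

```python
import numpy as np

def top_n_similar(similarity, titles, query_index, n=5):
    """Return up to n (title, score) pairs most similar to the query, excluding the query."""
    scores = similarity[query_index]
    # Sort indices by descending score; a stable sort keeps ties in original order.
    order = np.argsort(-scores, kind="stable")
    order = [i for i in order if i != query_index]
    return [(titles[i], float(scores[i])) for i in order[:n]]

toy_titles = ["A", "B", "C", "D"]
toy_similarity = np.array([
    [1.0, 0.3, 0.1, 0.0],
    [0.3, 1.0, 0.2, 0.05],
    [0.1, 0.2, 1.0, 0.4],
    [0.0, 0.05, 0.4, 1.0],
])

recs = top_n_similar(toy_similarity, toy_titles, query_index=0, n=2)
print(recs)
```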


***

## Test the Recommender System as a User

In [634]:
import pandas as pd
import numpy as np
import re
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import spacy
import nltk
from nltk.corpus import stopwords

titles  = pd.read_csv('Data/titles.csv')
credits = pd.read_csv('Data/credits.csv')

titles = titles.drop_duplicates().reset_index(drop = True)
titles = titles.drop(
    columns = ['release_year', 'age_certification', 'runtime', 'production_countries', 'seasons', 
               'imdb_score', 'imdb_votes', 'tmdb_popularity', 'imdb_id', 'tmdb_score']
)
titles = titles[titles['type'] == 'MOVIE'].drop(columns = ['type']).reset_index(drop = True)
titles = titles.dropna().reset_index(drop = True)

credits = credits.drop_duplicates().reset_index(drop = True)
credits = credits.drop(columns = ['person_id', 'character', 'role']).reset_index(drop = True)
credits = credits.transpose()

t = []
d = {}

for i in credits.columns:
    if i == len(credits.columns)-1:
        break
    elif credits[i][0] == credits[i+1][0]:
        t.append(credits[i][1])
    else:
        t.append(credits[i][1])
        d[credits[i][0]] = t
        t = []
        continue
        
df = titles.copy()
df['actors_directors'] = df['id'].map(d)
df = df.drop(columns = ['id'])
df = df.dropna().reset_index(drop = True)
nlp = spacy.load('en_core_web_sm')

def nlp_stop_words(text):
    doc = nlp(text)
    filtered_words = [token.text for token in doc if not token.is_stop]
    return filtered_words

df['description'] = df['description'].apply(nlp_stop_words)
import ast  # literal_eval parses the list literal without executing arbitrary code
df['genres'] = df['genres'].apply(ast.literal_eval)

def list_str(x):
    return ' '.join(x)

df['description'] = df['description'].apply(list_str)
df['genres'] = df['genres'].apply(list_str)
df['actors_directors'] = df['actors_directors'].apply(list_str)

combined_features = df['title'] + ' ' + df['genres'] + ' ' + df['actors_directors'] + ' ' + df['description']
all_movie_titles = df['title'].tolist()
feature_vectors = TfidfVectorizer().fit_transform(combined_features)
similarity = cosine_similarity(feature_vectors)

movie_name = input('Enter a movie name to show some movie recommendations : ')
find_close_match = difflib.get_close_matches(movie_name, all_movie_titles)
close_match = find_close_match[0]
index_of_the_movie = df[df['title'] == close_match].index.values[0]
similarity_score = list(enumerate(similarity[index_of_the_movie]))
sorted_similar_movies = sorted(similarity_score, key = lambda x:x[1], reverse = True) 

print(f'Top 30 Movies Suggestions for you based on your choice "{movie_name}" : \n')

# Print the 30 highest-scoring titles; the first entry is the query movie itself.
for i, (index, score) in enumerate(sorted_similar_movies[:30], start = 1):
    title_from_index = df['title'].iloc[index]
    print(i, '.', title_from_index)
        

Enter a movie name to show some movie recommendations : Insidious
Top 30 Movies Suggestions for you based on your choice "Insidious" : 

1 . Insidious
2 . Insidious: Chapter 2
3 . The Conjuring 2
4 . The Conjuring
5 . Ouija: Origin of Evil
6 . The Silence
7 . A Nightmare on Elm Street
8 . Dark Skies
9 . Puff: Wonders of the Reef
10 . Dirty Daddy: The Bob Saget Tribute
11 . F.R.E.D.I.
12 . Buster's Mal Heart
13 . Christine
14 . Official Secrets
15 . Incantation
16 . Love Hard
17 . Cam
18 . The Trap
19 . tick, tick... BOOM!
20 . The Dark Knight Rises
21 . Benji
22 . Vir Das: Losing It
23 . The Scary House
24 . Hospital
25 . The Soul
26 . Dear Ex
27 . Big Eyes
28 . Trailer Park Boys: Drunk, High and Unemployed: Live In Austin
29 . The Haunting in Connecticut 2: Ghosts of Georgia
30 . We Are Family


**End of Project**