# ML: Supervised algorithm - NLP - Film Recommender

# Introduction

We are going to build a film recommendation system. The system is content-based: it compares the content of the films (here, their plot summaries) and recommends films similar to those a user has already seen or liked.

## 1. Import libraries

In [1]:
import pandas as pd
import numpy as np
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk import word_tokenize
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', None)

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\joaqu\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\joaqu\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\joaqu\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\joaqu\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


## 2. Load datasets

In [2]:
movies = pd.read_csv('tmdb_5000_movies.csv')

In [3]:
movies.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""name"": ""Fantasy""}, {""id"": 878, ""name"": ""Science Fiction""}]",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"": 2964, ""name"": ""future""}, {""id"": 3386, ""name"": ""space war""}, {""id"": 3388, ""name"": ""space colony""}, {""id"": 3679, ""name"": ""society""}, {""id"": 3801, ""name"": ""space travel""}, {""id"": 9685, ""name"": ""futuristic""}, {""id"": 9840, ""name"": ""romance""}, {""id"": 9882, ""name"": ""space""}, {""id"": 9951, ""name"": ""alien""}, {""id"": 10148, ""name"": ""tribe""}, {""id"": 10158, ""name"": ""alien planet""}, {""id"": 10987, ""name"": ""cgi""}, {""id"": 11399, ""name"": ""marine""}, {""id"": 13065, ""name"": ""soldier""}, {""id"": 14643, ""name"": ""battle""}, {""id"": 14720, ""name"": ""love affair""}, {""id"": 165431, ""name"": ""anti war""}, {""id"": 193554, ""name"": ""power relations""}, {""id"": 206690, ""name"": ""mind and soul""}, {""id"": 209714, ""name"": ""3d""}]",en,Avatar,"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289}, {""name"": ""Twentieth Century Fox Film Corporation"", ""id"": 306}, {""name"": ""Dune Entertainment"", ""id"": 444}, {""name"": ""Lightstorm Entertainment"", ""id"": 574}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}, {""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""}]",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso_639_1"": ""es"", ""name"": ""Espa\u00f1ol""}]",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""name"": ""Fantasy""}, {""id"": 28, ""name"": ""Action""}]",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""name"": ""drug abuse""}, {""id"": 911, ""name"": ""exotic island""}, {""id"": 1319, ""name"": ""east india trading company""}, {""id"": 2038, ""name"": ""love of one's life""}, {""id"": 2052, ""name"": ""traitor""}, {""id"": 2580, ""name"": ""shipwreck""}, {""id"": 2660, ""name"": ""strong woman""}, {""id"": 3799, ""name"": ""ship""}, {""id"": 5740, ""name"": ""alliance""}, {""id"": 5941, ""name"": ""calypso""}, {""id"": 6155, ""name"": ""afterlife""}, {""id"": 6211, ""name"": ""fighter""}, {""id"": 12988, ""name"": ""pirate""}, {""id"": 157186, ""name"": ""swashbuckler""}, {""id"": 179430, ""name"": ""aftercreditsstinger""}]",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""name"": ""Jerry Bruckheimer Films"", ""id"": 130}, {""name"": ""Second Mate Productions"", ""id"": 19936}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}, {""id"": 80, ""name"": ""Crime""}]",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name"": ""based on novel""}, {""id"": 4289, ""name"": ""secret agent""}, {""id"": 9663, ""name"": ""sequel""}, {""id"": 14555, ""name"": ""mi6""}, {""id"": 156095, ""name"": ""british secret service""}, {""id"": 158431, ""name"": ""united kingdom""}]",en,Spectre,"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.",107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""name"": ""Danjaq"", ""id"": 10761}, {""name"": ""B24"", ""id"": 69434}]","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""}, {""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""}, {""iso_639_1"": ""en"", ""name"": ""English""}, {""iso_639_1"": ""es"", ""name"": ""Espa\u00f1ol""}, {""iso_639_1"": ""it"", ""name"": ""Italiano""}, {""iso_639_1"": ""de"", ""name"": ""Deutsch""}]",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""name"": ""Crime""}, {""id"": 18, ""name"": ""Drama""}, {""id"": 53, ""name"": ""Thriller""}]",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853, ""name"": ""crime fighter""}, {""id"": 949, ""name"": ""terrorist""}, {""id"": 1308, ""name"": ""secret identity""}, {""id"": 1437, ""name"": ""burglar""}, {""id"": 3051, ""name"": ""hostage drama""}, {""id"": 3562, ""name"": ""time bomb""}, {""id"": 6969, ""name"": ""gotham city""}, {""id"": 7002, ""name"": ""vigilante""}, {""id"": 9665, ""name"": ""cover-up""}, {""id"": 9715, ""name"": ""superhero""}, {""id"": 9990, ""name"": ""villainess""}, {""id"": 10044, ""name"": ""tragic hero""}, {""id"": 13015, ""name"": ""terrorism""}, {""id"": 14796, ""name"": ""destruction""}, {""id"": 18933, ""name"": ""catwoman""}, {""id"": 156082, ""name"": ""cat burglar""}, {""id"": 156395, ""name"": ""imax""}, {""id"": 173272, ""name"": ""flood""}, {""id"": 179093, ""name"": ""criminal underworld""}, {""id"": 230775, ""name"": ""batman""}]",en,The Dark Knight Rises,"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.",112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""name"": ""Warner Bros."", ""id"": 6194}, {""name"": ""DC Entertainment"", ""id"": 9993}, {""name"": ""Syncopy"", ""id"": 9996}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}, {""id"": 878, ""name"": ""Science Fiction""}]",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"": 839, ""name"": ""mars""}, {""id"": 1456, ""name"": ""medallion""}, {""id"": 3801, ""name"": ""space travel""}, {""id"": 7376, ""name"": ""princess""}, {""id"": 9951, ""name"": ""alien""}, {""id"": 10028, ""name"": ""steampunk""}, {""id"": 10539, ""name"": ""martian""}, {""id"": 10685, ""name"": ""escape""}, {""id"": 161511, ""name"": ""edgar rice burroughs""}, {""id"": 163252, ""name"": ""alien race""}, {""id"": 179102, ""name"": ""superhuman strength""}, {""id"": 190320, ""name"": ""mars civilization""}, {""id"": 195446, ""name"": ""sword and planet""}, {""id"": 207928, ""name"": ""19th century""}, {""id"": 209714, ""name"": ""3d""}]",en,John Carter,"John Carter is a war-weary, former military captain who's inexplicably transported to the mysterious and exotic planet of Barsoom (Mars) and reluctantly becomes embroiled in an epic conflict. It's a world on the brink of collapse, and Carter rediscovers his humanity when he realizes the survival of Barsoom and its people rests in his hands.",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States of America""}]",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


## 3. Dataset analysis

In [4]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

We are only interested in the title and the summary of the film.

In [5]:
plot = movies.loc[:,['title','overview']]
plot.head()

Unnamed: 0,title,overview
0,Avatar,"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization."
1,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems."
2,Spectre,"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE."
3,The Dark Knight Rises,"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy."
4,John Carter,"John Carter is a war-weary, former military captain who's inexplicably transported to the mysterious and exotic planet of Barsoom (Mars) and reluctantly becomes embroiled in an epic conflict. It's a world on the brink of collapse, and Carter rediscovers his humanity when he realizes the survival of Barsoom and its people rests in his hands."


## 4. Pre-processing and cleaning of texts

In [6]:
print(plot.shape)
plot.isna().sum()

(4803, 2)


title       0
overview    3
dtype: int64

In [7]:
plot.dropna(inplace=True)
plot.reset_index(inplace=True)
plot.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4800 entries, 0 to 4799
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   index     4800 non-null   int64 
 1   title     4800 non-null   object
 2   overview  4800 non-null   object
dtypes: int64(1), object(2)
memory usage: 112.6+ KB


In [8]:
dark_knight = "Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy."
dark_knight

"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy."

### 4.1. Tokenization

In [9]:
tokens = word_tokenize(dark_knight)
tokens

['Following',
 'the',
 'death',
 'of',
 'District',
 'Attorney',
 'Harvey',
 'Dent',
 ',',
 'Batman',
 'assumes',
 'responsibility',
 'for',
 'Dent',
 "'s",
 'crimes',
 'to',
 'protect',
 'the',
 'late',
 'attorney',
 "'s",
 'reputation',
 'and',
 'is',
 'subsequently',
 'hunted',
 'by',
 'the',
 'Gotham',
 'City',
 'Police',
 'Department',
 '.',
 'Eight',
 'years',
 'later',
 ',',
 'Batman',
 'encounters',
 'the',
 'mysterious',
 'Selina',
 'Kyle',
 'and',
 'the',
 'villainous',
 'Bane',
 ',',
 'a',
 'new',
 'terrorist',
 'leader',
 'who',
 'overwhelms',
 'Gotham',
 "'s",
 'finest',
 '.',
 'The',
 'Dark',
 'Knight',
 'resurfaces',
 'to',
 'protect',
 'a',
 'city',
 'that',
 'has',
 'branded',
 'him',
 'an',
 'enemy',
 '.']

### 4.2. Cleaning and wrangling

In [10]:
nopunct = [word.lower() for word in tokens if word.isalpha()]
nopunct

['following',
 'the',
 'death',
 'of',
 'district',
 'attorney',
 'harvey',
 'dent',
 'batman',
 'assumes',
 'responsibility',
 'for',
 'dent',
 'crimes',
 'to',
 'protect',
 'the',
 'late',
 'attorney',
 'reputation',
 'and',
 'is',
 'subsequently',
 'hunted',
 'by',
 'the',
 'gotham',
 'city',
 'police',
 'department',
 'eight',
 'years',
 'later',
 'batman',
 'encounters',
 'the',
 'mysterious',
 'selina',
 'kyle',
 'and',
 'the',
 'villainous',
 'bane',
 'a',
 'new',
 'terrorist',
 'leader',
 'who',
 'overwhelms',
 'gotham',
 'finest',
 'the',
 'dark',
 'knight',
 'resurfaces',
 'to',
 'protect',
 'a',
 'city',
 'that',
 'has',
 'branded',
 'him',
 'an',
 'enemy']

In [11]:
"'s" in nopunct

False
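
The `isalpha()` filter removes more than punctuation: any token that contains a non-letter character is dropped whole. That is why `'s` disappears, and it is also why hyphenated words and tokens with digits do not survive the cleaning. A minimal sketch, using a hand-written token list (an assumption standing in for `word_tokenize` output) to show the effect:

```python
# Hand-written tokens standing in for word_tokenize output (illustrative only).
sample_tokens = ["Dent", "'s", "war-weary", "22nd", "Spider-Man", "Gotham", "."]

# Same filter as above: keep purely alphabetic tokens, lowercased.
kept = [tok.lower() for tok in sample_tokens if tok.isalpha()]
dropped = [tok for tok in sample_tokens if not tok.isalpha()]

print(kept)     # ['dent', 'gotham']
print(dropped)  # ["'s", 'war-weary', '22nd', 'Spider-Man', '.']
```

If hyphenated words matter for the recommendations, one option would be to split them on the hyphen before filtering; that is a design choice and is not done here.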

In [12]:
stopwords = nltk.corpus.stopwords.words('english')
stopwords

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [13]:
"could" in stopwords

False

In [14]:
"might" in stopwords

False

In [15]:
stopwords.append("could")
stopwords.append("might")

In [16]:
"could" in stopwords

True

In [17]:
"might" in stopwords

True
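
Membership tests on a list scan it element by element, so when the filter runs over thousands of overviews it may be worth converting the stop-word list to a `set` first. A small sketch with a hypothetical short list (`stop_list`, `stop_set` and the sample words are illustrative, not the NLTK data):

```python
# Hypothetical stop-word list, standing in for the NLTK list plus the two additions.
stop_list = ["the", "of", "and", "to", "a"]
stop_list.extend(["could", "might"])

# A set gives constant-time membership checks on average.
stop_set = set(stop_list)

words = ["the", "death", "of", "batman", "could", "protect", "gotham"]
filtered = [w for w in words if w not in stop_set]
print(filtered)  # ['death', 'batman', 'protect', 'gotham']
```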

In [18]:
nosw = [word for word in nopunct if word not in stopwords]
nosw

['following',
 'death',
 'district',
 'attorney',
 'harvey',
 'dent',
 'batman',
 'assumes',
 'responsibility',
 'dent',
 'crimes',
 'protect',
 'late',
 'attorney',
 'reputation',
 'subsequently',
 'hunted',
 'gotham',
 'city',
 'police',
 'department',
 'eight',
 'years',
 'later',
 'batman',
 'encounters',
 'mysterious',
 'selina',
 'kyle',
 'villainous',
 'bane',
 'new',
 'terrorist',
 'leader',
 'overwhelms',
 'gotham',
 'finest',
 'dark',
 'knight',
 'resurfaces',
 'protect',
 'city',
 'branded',
 'enemy']

### 4.3. Lemmatization

In [19]:
wn = nltk.WordNetLemmatizer()

lemmas = [wn.lemmatize(word) for word in nosw]
lemmas

['following',
 'death',
 'district',
 'attorney',
 'harvey',
 'dent',
 'batman',
 'assumes',
 'responsibility',
 'dent',
 'crime',
 'protect',
 'late',
 'attorney',
 'reputation',
 'subsequently',
 'hunted',
 'gotham',
 'city',
 'police',
 'department',
 'eight',
 'year',
 'later',
 'batman',
 'encounter',
 'mysterious',
 'selina',
 'kyle',
 'villainous',
 'bane',
 'new',
 'terrorist',
 'leader',
 'overwhelms',
 'gotham',
 'finest',
 'dark',
 'knight',
 'resurfaces',
 'protect',
 'city',
 'branded',
 'enemy']

### 4.4. Function

In [20]:
def clean_text(text):
    tokens = word_tokenize(text)
    nopunct = [word.lower() for word in tokens if word.isalpha()]
    nosw = [word for word in nopunct if word not in stopwords]
    lemmas = [wn.lemmatize(word) for word in nosw]
    return " ".join(lemmas)

Apply it to the whole dataset, creating a new column for clean text.

In [21]:
plot['clean_overview']=plot['overview'].apply(clean_text)

In [22]:
plot.head(11)

Unnamed: 0,index,title,overview,clean_overview
0,0,Avatar,"In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization.",century paraplegic marine dispatched moon pandora unique mission becomes torn following order protecting alien civilization
1,1,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, has come back to life and is headed to the edge of the Earth with Will Turner and Elizabeth Swann. But nothing is quite as it seems.",captain barbossa long believed dead come back life headed edge earth turner elizabeth swann nothing quite seems
2,2,Spectre,"A cryptic message from Bond’s past sends him on a trail to uncover a sinister organization. While M battles political forces to keep the secret service alive, Bond peels back the layers of deceit to reveal the terrible truth behind SPECTRE.",cryptic message bond past sends trail uncover sinister organization battle political force keep secret service alive bond peel back layer deceit reveal terrible truth behind spectre
3,3,The Dark Knight Rises,"Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes to protect the late attorney's reputation and is subsequently hunted by the Gotham City Police Department. Eight years later, Batman encounters the mysterious Selina Kyle and the villainous Bane, a new terrorist leader who overwhelms Gotham's finest. The Dark Knight resurfaces to protect a city that has branded him an enemy.",following death district attorney harvey dent batman assumes responsibility dent crime protect late attorney reputation subsequently hunted gotham city police department eight year later batman encounter mysterious selina kyle villainous bane new terrorist leader overwhelms gotham finest dark knight resurfaces protect city branded enemy
4,4,John Carter,"John Carter is a war-weary, former military captain who's inexplicably transported to the mysterious and exotic planet of Barsoom (Mars) and reluctantly becomes embroiled in an epic conflict. It's a world on the brink of collapse, and Carter rediscovers his humanity when he realizes the survival of Barsoom and its people rests in his hands.",john carter former military captain inexplicably transported mysterious exotic planet barsoom mar reluctantly becomes embroiled epic conflict world brink collapse carter rediscovers humanity realizes survival barsoom people rest hand
5,5,Spider-Man 3,"The seemingly invincible Spider-Man goes up against an all-new crop of villain – including the shape-shifting Sandman. While Spider-Man’s superpowers are altered by an alien organism, his alter ego, Peter Parker, deals with nemesis Eddie Brock and also gets caught up in a love triangle.",seemingly invincible go crop villain including sandman superpower altered alien organism alter ego peter parker deal nemesis eddie brock also get caught love triangle
6,6,Tangled,"When the kingdom's most wanted-and most charming-bandit Flynn Rider hides out in a mysterious tower, he's taken hostage by Rapunzel, a beautiful and feisty tower-bound teen with 70 feet of magical, golden hair. Flynn's curious captor, who's looking for her ticket out of the tower where she's been locked away for years, strikes a deal with the handsome thief and the unlikely duo sets off on an action-packed escapade, complete with a super-cop horse, an over-protective chameleon and a gruff gang of pub thugs.",kingdom flynn rider hide mysterious tower taken hostage rapunzel beautiful feisty teen foot magical golden hair flynn curious captor looking ticket tower locked away year strike deal handsome thief unlikely duo set escapade complete horse chameleon gruff gang pub thug
7,7,Avengers: Age of Ultron,"When Tony Stark tries to jumpstart a dormant peacekeeping program, things go awry and Earth’s Mightiest Heroes are put to the ultimate test as the fate of the planet hangs in the balance. As the villainous Ultron emerges, it is up to The Avengers to stop him from enacting his terrible plans, and soon uneasy alliances and unexpected action pave the way for an epic and unique global adventure.",tony stark try jumpstart dormant peacekeeping program thing go awry earth mightiest hero put ultimate test fate planet hang balance villainous ultron emerges avenger stop enacting terrible plan soon uneasy alliance unexpected action pave way epic unique global adventure
8,8,Harry Potter and the Half-Blood Prince,"As Harry begins his sixth year at Hogwarts, he discovers an old book marked as 'Property of the Half-Blood Prince', and begins to learn more about Lord Voldemort's dark past.",harry begin sixth year hogwarts discovers old book marked prince begin learn lord voldemort dark past
9,9,Batman v Superman: Dawn of Justice,"Fearing the actions of a god-like Super Hero left unchecked, Gotham City’s own formidable, forceful vigilante takes on Metropolis’s most revered, modern-day savior, while the world wrestles with what sort of hero it really needs. And with Batman and Superman at war with one another, a new threat quickly arises, putting mankind in greater danger than it’s ever known before.",fearing action super hero left unchecked gotham city formidable forceful vigilante take metropolis revered savior world wrestle sort hero really need batman superman war one another new threat quickly arises putting mankind greater danger ever known


## 5. Vectorization (Bag of Words: simple word count)

In [23]:
# Define a count vectorizer
bow = CountVectorizer()

# Build a BoW matrix
bow_matrix = bow.fit_transform(plot['clean_overview'])

# Print the dimensions of the vectors
print("Each vector has: "+str(bow_matrix.shape[1])+ " elements.")

# Convert it into a dataframe
bow_data = pd.DataFrame(bow_matrix.toarray())
bow_data.columns = bow.get_feature_names_out()
bow_data.head()

Each vector has: 18212 elements.


Unnamed: 0,aa,aaa,aames,aang,aaron,aba,abaddon,abagnale,abandon,abandoned,...,zorin,zorro,zuckerberg,zula,zuzu,zyklon,æon,éloigne,émigré,única
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
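
To make this matrix concrete: each column is one vocabulary word and each row counts how often that word occurs in one overview. A minimal pure-Python sketch of the same idea on two made-up cleaned texts (`docs`, `vocab` and `vectors` are illustrative names; `CountVectorizer` applies its own tokenization rules, so this is not its actual implementation):

```python
from collections import Counter

# Two made-up cleaned overviews (illustrative only).
docs = ["batman protect gotham city", "alien protect alien planet"]

# Vocabulary: every distinct word, sorted alphabetically.
vocab = sorted({word for doc in docs for word in doc.split()})

# One count vector per document, with one element per vocabulary word.
vectors = []
for doc in docs:
    counts = Counter(doc.split())
    vectors.append([counts[word] for word in vocab])

print(vocab)    # ['alien', 'batman', 'city', 'gotham', 'planet', 'protect']
print(vectors)  # [[0, 1, 1, 1, 0, 1], [2, 0, 0, 0, 1, 1]]
```

With 4800 overviews and more than 18,000 distinct words, almost every element is zero, which is why the tables in this section show only zeros.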


In [24]:
bow_data.head(100)

Unnamed: 0,aa,aaa,aames,aang,aaron,aba,abaddon,abagnale,abandon,abandoned,...,zorin,zorro,zuckerberg,zula,zuzu,zyklon,æon,éloigne,émigré,única
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
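
Raw counts treat every word as equally informative. `TfidfVectorizer`, which is imported at the top but not used in this section, gives lower weights to words that appear in many texts. A hedged sketch of how it behaves, shown on three made-up texts rather than the real dataset (`toy_overviews`, `tfidf` and `tfidf_matrix` are illustrative names):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up cleaned overviews (illustrative only).
toy_overviews = [
    "batman protect gotham city",
    "batman fight crime gotham",
    "alien planet space mission",
]

tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(toy_overviews)  # sparse matrix, one row per text

# Look up the weights of two words in the first text.
vocab_index = tfidf.vocabulary_
row0 = tfidf_matrix[0].toarray().ravel()
w_batman = row0[vocab_index["batman"]]
w_protect = row0[vocab_index["protect"]]

# "batman" occurs in two texts and "protect" in one, so "batman" gets the lower weight.
print(w_batman < w_protect)  # True
```

Whether TF-IDF weights would give better recommendations than plain counts on this dataset is not tested here.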


## 6. Cosine similarity (vector distances)

In [25]:
similarity_scores_bow = cosine_similarity(bow_data,bow_data)

In [26]:
similarity_scores_bow

array([[1.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 1.        , 0.04583492, ..., 0.07075491, 0.        ,
        0.        ],
       [0.        , 0.04583492, 1.        , ..., 0.02756589, 0.        ,
        0.        ],
       ...,
       [0.        , 0.07075491, 0.02756589, ..., 1.        , 0.02306328,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.02306328, 1.        ,
        0.0243975 ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.0243975 ,
        1.        ]])
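
Each entry of this matrix is the cosine of the angle between two count vectors: their dot product divided by the product of their lengths. It is 1 on the diagonal (a film compared with itself) and 0 when two overviews share no words. A minimal pure-Python sketch of the calculation on made-up vectors (`cosine` is an illustrative helper, not the scikit-learn function):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

a = [1, 1, 0, 1]   # made-up count vectors
b = [1, 0, 0, 1]
c = [0, 0, 2, 0]

print(round(cosine(a, a), 4))  # 1.0
print(round(cosine(a, b), 4))  # 0.8165
print(cosine(a, c))            # 0.0
```

Because the score depends on the angle and not on the vector length, a long overview and a short one can still be judged similar if they use the same words in similar proportions.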

## 7. Build recommender (model)

The steps we are going to follow are:
* Get the index of a film from its title.
* Get the similarity scores between that film and every film in the dataset.
* Convert the scores into a list of tuples whose first element is the film's index and whose second element is its similarity score.
* Sort this list by similarity score (the second element of each tuple), highest first.
* Take the top 10 elements of the list, skipping the first one, because it will be the film itself (the overview most similar to a film's overview is its own).
* Return the titles that correspond to the indexes of the top 10.
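
The steps above can be traced on a toy example before applying them to the real matrix. This sketch assumes a made-up 4x4 similarity matrix and title list (`toy_titles`, `toy_scores` and `top_n` are illustrative, and the top 2 are taken instead of the top 10):

```python
toy_titles = ["Film A", "Film B", "Film C", "Film D"]
toy_scores = [
    [1.0, 0.2, 0.7, 0.0],
    [0.2, 1.0, 0.1, 0.5],
    [0.7, 0.1, 1.0, 0.3],
    [0.0, 0.5, 0.3, 1.0],
]
top_n = 2

# 1. Index of the film, given its title
idx = toy_titles.index("Film A")
# 2-3. (index, score) tuples for that film against every film
pairs = list(enumerate(toy_scores[idx]))
# 4. Sort by score, highest first
pairs = sorted(pairs, key=lambda p: p[1], reverse=True)
# 5. Skip the first entry (the film itself) and keep the next top_n
pairs = pairs[1:top_n + 1]
# 6. Map the indexes back to titles
recommended = [toy_titles[i] for i, _ in pairs]
print(recommended)  # ['Film C', 'Film B']
```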

In [27]:
# Save the titles and their indexes
indices = pd.Series(plot.index, index=plot['title'])
indices.shape
indices.head()

title
Avatar                                      0
Pirates of the Caribbean: At World's End    1
Spectre                                     2
The Dark Knight Rises                       3
John Carter                                 4
dtype: int64
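
One caveat with this lookup: if two films share a title, `indices[title]` returns a Series of positions rather than a single integer, which would break the row lookup in the recommender. Whether the dataset contains such duplicates is not checked here, so the following is only a sketch, on made-up titles, of one possible guard that keeps the first occurrence (`toy_plot`, `toy_indices` and `unique_indices` are illustrative names):

```python
import pandas as pd

# Made-up frame with a repeated title (illustrative only).
toy_plot = pd.DataFrame({"title": ["Batman", "Avatar", "Batman"]})
toy_indices = pd.Series(toy_plot.index, index=toy_plot["title"])

# A repeated label returns several positions instead of one integer.
print(len(toy_indices["Batman"]))  # 2

# One possible guard: keep only the first position for each title.
unique_indices = toy_indices[~toy_indices.index.duplicated(keep="first")]
print(unique_indices["Batman"])  # 0
```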

In [33]:
def get_recommendations(title, scores):
    """Take a film title and a similarity-score matrix, and return
    the 10 films most similar to that film."""
    # Check that the film is in our dataset
    if title not in indices.index:
        print("This movie is not included in our dataset, sorry. Try again.")
        return None
    # Get the row index of the film we are interested in
    idx = indices[title]
    # Pair every film's index with its cosine similarity to the chosen film
    sim_scores = list(enumerate(scores[idx]))
    # Sort the pairs by similarity score, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Keep the 10 most similar films, skipping the first entry (the film itself)
    sim_scores = sim_scores[1:11]
    # Split the pairs into film indexes and scores (new names, so the
    # `scores` argument is not overwritten)
    movie_indices, top_scores = zip(*sim_scores)
    movie_indices = list(movie_indices)
    sim_series = pd.Series(top_scores, index=movie_indices, name="similarity score")
    # Return a DataFrame with the title, overview and similarity score of the 10 films
    top_10_titles = plot['title'].iloc[movie_indices]
    top_10_overviews = plot['overview'].iloc[movie_indices]
    return pd.DataFrame([top_10_titles, top_10_overviews, sim_series]).transpose()

In [34]:
get_recommendations('The Dark Knight Rises', similarity_scores_bow)

Unnamed: 0,title,overview,similarity score
299,Batman Forever,"The Dark Knight of Gotham City confronts a dastardly duo: Two-Face and the Riddler. Formerly District Attorney Harvey Dent, Two-Face believes Batman caused the courtroom accident which left him disfigured on one side. And Edward Nygma, computer-genius and former employee of millionaire Bruce Wayne, is out to get the philanthropist; as The Riddler. Former circus acrobat Dick Grayson, his family killed by Two-Face, becomes Wayne's ward and Batman's new partner Robin.",0.309142
1359,Batman,"The Dark Knight of Gotham City begins his war on crime with his first major enemy being the clownishly homicidal Joker, who has seized control of Gotham's underworld.",0.30657
65,The Dark Knight,"Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent, Batman sets out to dismantle the remaining criminal organizations that plague the streets. The partnership proves to be effective, but they soon find themselves prey to a reign of chaos unleashed by a rising criminal mastermind known to the terrified citizens of Gotham as the Joker.",0.271305
428,Batman Returns,"Having defeated the Joker, Batman now faces the Penguin - a warped and deformed individual who is intent on being accepted into Gotham society. Crooked businessman Max Schreck is coerced into helping him become Mayor of Gotham and they both attempt to expose Batman in a different light. Selina Kyle, Max's secretary, is thrown from the top of a building and is transformed into Catwoman - a mysterious figure who has the same personality disorder as Batman. Batman must attempt to clear his name, all the time deciding just what must be done with the Catwoman.",0.236228
2507,Slow Burn,"A district attorney (Ray Liotta) is involved in a 24-hour showdown with a gang leader (LL Cool J) and is, at the same time, being manipulated by an attractive assistant district attorney (Jolene Blalock) and a cryptic stranger.",0.195047
119,Batman Begins,"Driven by tragedy, billionaire Bruce Wayne dedicates his life to uncovering and defeating the corruption that plagues his home, Gotham City. Unable to work within the system, he instead creates a new identity, a symbol of fear for the criminal underworld - The Batman.",0.180021
3941,Hobo with a Shotgun,"A vigilante homeless man pulls into a new city and finds himself trapped in urban chaos, a city where crime rules and where the city's crime boss reigns. Seeing an urban landscape filled with armed robbers, corrupt cops, abused prostitutes and even a pedophile Santa, the Hobo goes about bringing justice to the city the best way he knows how - with a 20-gauge shotgun. Mayhem ensues when he tries to make things better for the future generation. Street justice will indeed prevail.",0.17696
1664,Dead Man Down,"In New York City, a crime lord's right-hand man is seduced by a woman seeking retribution.",0.169031
3853,"Batman: The Dark Knight Returns, Part 2",Batman has stopped the reign of terror that The Mutants had cast upon his city. Now an old foe wants a reunion and the government wants The Man of Steel to put a stop to Batman.,0.167183
1181,JFK,New Orleans District Attorney Jim Garrison discovers there's more to the Kennedy assassination than the official story.,0.161165


In [35]:
get_recommendations('Avatar', similarity_scores_bow)

Unnamed: 0,title,overview,similarity score
3603,Apollo 18,"Officially, Apollo 17 was the last manned mission to the moon. But a year later in 1973, three American astronauts were sent on a secret mission to the moon funded by the US Department of Defense. What you are about to see is the actual footage which the astronauts captured on that mission. While NASA denies it's authenticity, others say it's the real reason we've never gone back to the moon.",0.221313
778,Meet Dave,"A crew of miniature aliens operate a spaceship that has a human form. While trying to save their planet, the aliens encounter a new problem, as their ship becomes smitten with an Earth woman.",0.169031
1449,The Order,"For centuries, a secret Order of priests has existed within the Church. A renegade priest, Father Alex Bernier, is sent to Rome to investigate the mysterious death of one of the Order's most revered members. Following a series of strangely similar killings, Bernier launches an investigation that forces him to confront unimaginable evil.",0.16538
529,Tears of the Sun,"Navy SEAL Lieutenant A.K. Waters and his elite squadron of tactical specialists are forced to choose between their duty and their humanity, between following orders by ignoring the conflict that surrounds them, or finding the courage to follow their conscience and protect a group of innocent refugees. When the democratic government of Nigeria collapses and the country is taken over by a ruthless military dictator, Waters, a fiercely loyal and hardened veteran is dispatched on a routine mission to retrieve a Doctors Without Borders physician.",0.147542
2966,E.T. the Extra-Terrestrial,"After a gentle alien becomes stranded on Earth, the being is discovered and befriended by a young boy named Elliott. Bringing the extraterrestrial into his suburban California house, Elliott introduces E.T., as the alien is dubbed, to his brother and his little sister, Gertie, and the children decide to keep its existence a secret. Soon, however, E.T. falls ill, resulting in government intervention and a dire situation for both Elliott and the alien.",0.143223
2130,The American,"Dispatched to a small Italian town to await further orders, assassin Jack embarks on a double life that may be more relaxing than is good for him.",0.138013
2766,Birthday Girl,"A shy bank clerk orders a Russian mail order bride, and finds his life turned upside down.",0.138013
2994,Mad Max Beyond Thunderdome,"Mad Max becomes a pawn in a decadent oasis of a technological society, and when exiled, becomes the deliverer of a colony of children.",0.133333
3457,Duel in the Sun,"Beautiful half-breed Pearl Chavez becomes the ward of her dead father's first love and finds herself torn between her sons, one good and the other bad.",0.133333
1213,Aliens vs Predator: Requiem,"A sequel to 2004's Alien vs. Predator, the iconic creatures from two of the scariest film franchises in movie history wage their most brutal battle ever - in our own backyard. The small town of Gunnison, Colorado becomes a war zone between two of the deadliest extra-terrestrial life forms - the Alien and the Predator. When a Predator scout ship crash-lands in the hills outside the town, Alien Facehuggers and a hybrid Alien/Predator are released and begin to terrorize the town.",0.131165


In [36]:
get_recommendations('Jurassic Park', similarity_scores_bow)

Unnamed: 0,title,overview,similarity score
28,Jurassic World,"Twenty-two years after the events of Jurassic Park, Isla Nublar now features a fully functioning dinosaur theme park, Jurassic World, as originally envisioned by John Hammond.",0.292615
2527,National Lampoon's Vacation,"Clark Griswold is on a quest to take his family on a quest to Walley World theme park for a vacation, but things don't go exactly as planned.",0.184932
1983,Meet the Deedles,Two surfers end up as Yellowstone park rangers and have to stop a former ranger who is out for revenge.,0.17609
1580,The Nut Job,"Surly, a curmudgeon, independent squirrel is banished from his park and forced to survive in the city. Lucky for him, he stumbles on the one thing that may be able to save his life, and the rest of park community, as they gear up for winter - Maury's Nut Store.",0.17609
479,Walking With Dinosaurs,"Walking with Dinosaurs 3D is a film depicting life-like 3D dinosaur characters set in photo-real landscapes that transports audiences to the prehistoric world as it existed 70 million years ago. The film is based on the 1999 documentary television miniseries Walking with Dinosaurs, produced by the BBC. Walking with Dinosaurs 3D is being produced by Evergreen Studios, the company that produced Happy Feet, and it is was released on October 11, 2013.",0.174306
4740,The FP,"Two rival gangs fight for control of Frazier Park -- a deadly arena in competitive dance-fight video game ""Beat-Beat Revolution.""",0.169182
1536,Vacation,"Hoping to bring his family closer together and to recreate his childhood vacation for his own kids, a grown up Rusty Griswold takes his wife and their two sons on a cross-country road trip to the coolest theme park in America, Walley World. Needless to say, things don't go quite as planned.",0.167054
2522,The Imitation Game,"Based on the real life story of legendary cryptanalyst Alan Turing, the film portrays the nail-biting race against time by Turing and his brilliant team of code-breakers at Britain's top-secret Government Code and Cypher School at Bletchley Park, during the darkest days of World War II.",0.141591
952,Beverly Hills Cop III,"Back in sunny southern California and on the trail of two murderers, Axel Foley again teams up with LA cop Billy Rosewood. Soon, they discover that an amusement park is being used as a front for a massive counterfeiting ring – and it's run by the same gang that shot Billy's boss.",0.139212
3033,Mud,Two teenage boys encounter a fugitive and make a pact to help him escape from an island in the Mississippi.,0.13794


In [37]:
get_recommendations('The Muppets', similarity_scores_bow)

Unnamed: 0,title,overview,similarity score
3232,The Muppet Movie,"Kermit the Frog is persuaded by agent Dom DeLuise to pursue a career in Hollywood. Along the way, Kermit picks up Fozzie Bear, Miss Piggy, Gonzo, and a motley crew of other Muppets with similar aspirations. Meanwhile, Kermit must elude the grasp of a frog-leg restaurant magnate.",0.170103
858,Muppets Most Wanted,"While on a grand world tour, The Muppets find themselves wrapped into an European jewel-heist caper headed by a Kermit the Frog look-alike and his dastardly sidekick.",0.169842
510,Children of Men,"In 2027, in a chaotic world in which humans can no longer procreate, a former activist agrees to help transport a miraculously pregnant woman to a sanctuary at sea, where her child's birth may help scientists save the future of humankind.",0.163299
4277,The 41–Year–Old Virgin Who Knocked Up Sarah Marshall and Felt Superbad About It,"Follows Andy, who needs to hook up with a hottie, pronto, because he hasn't had sex in... well, forever - and his luck isn't the only thing that's hard. His equally horny teenage roommates also need it superbad, and with the help of their nerdy pal, McAnalovin' and his fake I.D., they may tap more than just a keg.",0.15162
760,Analyze That,"The mafia's Paul Vitti is back in prison and will need some serious counseling when he gets out. Naturally, he returns to his analyst Dr. Ben Sobel for help and finds that Sobel needs some serious help himself as he has inherited the family practice, as well as an excess stock of stress.",0.138013
3037,Hey Arnold! The Movie,"When a powerful developer named Mr. Scheck wants to knock down all the stores and houses in Arnold's neighborhood to build a huge ""mall-plex"", it looks likes the neighborhood is doomed to disappear. But with the help of a superhero and a mysterious deep-voiced stranger, Arnold and Gerald will need to recover a crucial document in order to save their beloved neighborhood.",0.130744
3161,Thunderball,A criminal organization has obtained two nuclear bombs and are asking for a 100 million pound ransom in the form of diamonds in seven days or they will use the weapons. The secret service sends James Bond to the Bahamas to once again save the world.,0.125
3110,Police Academy: Mission to Moscow,"The Russians need help in dealing with the Mafia and so they seek help with the veterans of the Police Academy. They head off to Moscow, in order to find evidence against Konstantin Konali, who marketed a computer game that everyone in the world is playing.",0.122474
661,Zathura: A Space Adventure,"After their father is called into work, two young boys, Walter and Danny, are left in the care of their teenage sister, Lisa, and told they must stay inside. Walter and Danny, who anticipate a boring day, are shocked when they begin playing Zathura, a space-themed board game, which they realize has mystical powers when their house is shot into space. With the help of an astronaut, the boys attempt to return home.",0.120386
3449,West Side Story,"In the slums of the upper West Side of Manhattan, New York, a gang of Polish-American teenagers called the Jets compete with a rival gang of recently immigrated Puerto Ricans, the Sharks, to ""own"" the neighborhood streets. Tensions are high between the gangs but two kids, one from each rival gang, fall in love leading to tragedy.",0.119098


In [38]:
get_recommendations('Matrix', similarity_scores_bow)

This movie is not included in our dataset, sorry. Try again.


In [39]:
get_recommendations('The Matrix', similarity_scores_bow)

Unnamed: 0,title,overview,similarity score
1281,Hackers,"Along with his new friends, a teenager who was arrested by the US Secret Service and banned from using a computer for writing a computer virus discovers a plot by a nefarious hacker, but they must use their computer skills to find the evidence while being pursued by the Secret Service and the evil computer genius behind the virus.",0.294963
2088,Pulse,"When their computer hacker friend accidentally channels a mysterious wireless signal, a group of co-eds rally to stop a terrifying evil from taking over the world.",0.23694
3503,11:14,Tells the seemingly random yet vitally connected story of a set of incidents that all converge one evening at 11:14pm. The story follows the chain of events of five different characters and five different storylines that all converge to tell the story of murder and deceit.,0.214972
2484,The Thirteenth Floor,"Computer scientist Hannon Fuller has discovered something extremely important. He's about to tell the discovery to his colleague, Douglas Hall, but knowing someone is after him, the old man leaves a letter in his computer generated parallel world that's just like the 30's with seemingly real people with real emotions.",0.199681
333,Transcendence,"Two leading computer scientists work toward their goal of Technological Singularity, as a radical anti-technology organization fights to prevent them from creating a world where computers can transcend the abilities of the human brain.",0.195646
2614,The Love Letter,20th century computer games designer Scott exchanges love letters with 19th century poet Elizabeth Whitcomb through an antique desk that can make letters travel through time.,0.195646
2816,WarGames,"High School student David Lightman (Matthew Broderick) has a talent for hacking. But while trying to hack into a computer system to play unreleased video games, he unwittingly taps into the Defense Department's war computer and initiates a confrontation of global proportions! Together with his girlfriend (Ally Sheedy) and a wizardly computer genius (John Wood), David must race against time to outwit his opponent...and prevent a nuclear Armageddon.",0.189076
36,Transformers: Age of Extinction,"As humanity picks up the pieces, following the conclusion of ""Transformers: Dark of the Moon,"" Autobots and Decepticons have all but vanished from the face of the planet. However, a group of powerful, ingenious businessman and scientists attempt to learn from past Transformer incursions and push the boundaries of technology beyond what they can control - all while an ancient, powerful Transformer menace sets Earth in his cross-hairs.",0.174928
3001,The Lawnmower Man,A simple man is turned into a genius through the application of computer science.,0.173422
4229,The Believer,The movie tells the story of a young Jewish man who becomes fiercely anti-Semitic.,0.162221


In [40]:
get_recommendations('Pulp Fiction', similarity_scores_bow)

Unnamed: 0,title,overview,similarity score
3648,"Lovely, Still",A holiday fable that tells the story of an elderly man discovering love for the first time.,0.20702
4308,Grand Theft Parsons,"There are times when it's right and proper to simply bury the dead. This is not one of those times... Gram Parsons was one of the most influential musicians of his time; a bitter, brilliant, genius who knew Elvis, tripped with the Stones and fatally overdosed on morphine and tequila in 1973. And from his dying came a story. A story from deep within folklore; a story of friendship, honour and adventure; a story so extraordinary that if it didn't really happen, no one would believe it. Two men, a hearse, a dead rock star, five gallons of petrol, and a promise. And the most extraordinary chase of modern times.",0.205879
4002,Timecrimes,A man accidentally gets into a time machine and travels back in time nearly an hour. Finding himself will be the first of a series of disasters of unforeseeable consequences.,0.205738
4124,The Lost Medallion: The Adventures of Billy Stone,A man who stops into a foster home to drop off some donations soon tells the kids a story about two teenage friends who uncover a long-lost medallion that transports them back in time.,0.205738
3360,Alien Zone,"A man who is having an affair with a married woman is dropped off on the wrong street when going back to his hotel. He takes refuge out of the rain when an old man invites him in. He turns out to be a mortician, who tells him the stories of the people who have wound up in his establishment over the course of four stories.",0.199205
4694,She's Gotta Have It,"The story of Nola Darling's simultaneous sexual relationships with three different men is told by her and by her partners and other friends. All three men wanted her to commit solely to them; Nola resists being ""owned"" by a single partner.",0.199205
4621,Locker 13,"The story of Skip, a young ex-convict who takes a position as a night janitor at an old-west theme park. His supervisor Archie, teaches him the ropes, but more importantly attempts to convey critical philosophical messages through a series of four stories: a down and out boxer is given the opportunity to become a real golden gloves killer; an assassin kidnaps three people in order to find out who hired him for his latest hit; a new recruit is initiated into a lodge of fez-wearing businessmen where hazing can take a malevolent turn; and a member of a suicide club introduces real fear into a man about to jump to his death.",0.19245
3465,Sliding Doors,"Gwyneth Paltrow plays London publicist Helen, effortlessly sliding between parallel storylines that show what happens if she does or does not catch a train back to her apartment. Love. Romantic entanglements. Deception. Trust. Friendship. Comedy. All come into focus as the two stories shift back and forth, overlap and surprisingly converge.",0.184428
3726,Easy Money,"A three-tiered story centered around drugs and organized crime, and focused on a young man who becomes a runner for a coke dealer.",0.181568
4128,London,London is a drug laden adventure that centers on a party in a New York loft where a young man is trying to win back his ex-girlfriend.,0.174964


Overall, the results are reasonable, but there is still room for improvement. For example, the recommendations for "Pulp Fiction" are not very intuitive.

We will try to improve on this with another type of vectorisation: TF-IDF. Recall that TF-IDF gives less weight to words that occur in many texts and more weight to rarer words, which tend to be more informative about the specific meaning of a given text.
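To make the weighting concrete, here is a minimal sketch of the inverse document frequency term, assuming the smoothed formula that scikit-learn's `TfidfVectorizer` applies by default, idf(t) = ln((1 + n) / (1 + df(t))) + 1. The four toy documents and the helper `smoothed_idf` are made up for illustration; they are not part of the film dataset.

```python
import math

# Toy corpus (illustrative only, not taken from the dataset)
docs = [
    "man story time love",
    "man crime caper revenge",
    "man story boxer crime",
    "man time travel machine",
]

def smoothed_idf(term, documents):
    """Smoothed inverse document frequency of `term` over `documents`."""
    n = len(documents)
    # Number of documents that contain the term at least once
    df = sum(term in doc.split() for doc in documents)
    return math.log((1 + n) / (1 + df)) + 1

for term in ["man", "story", "caper"]:
    print(term, round(smoothed_idf(term, docs), 3))
# man 1.0
# story 1.511
# caper 1.916
```

A word that appears in every document (here "man") gets the minimum idf of 1, while a word that appears in a single document (here "caper") gets the largest weight.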

In [41]:
# Create the vectorizer object
tfidf = TfidfVectorizer()

# Build the TF-IDF matrix
tfidf_matrix = tfidf.fit_transform(plot['clean_overview'])

# Print the dimensionality of the vectors
print(f"Each vector has: {tfidf_matrix.shape[1]} elements.")

# Convert it into a dataframe, one column per vocabulary term
tfidf_data = pd.DataFrame(tfidf_matrix.toarray(), columns=tfidf.get_feature_names_out())
tfidf_data.head()

Each vector has: 18212 elements.


Unnamed: 0,aa,aaa,aames,aang,aaron,aba,abaddon,abagnale,abandon,abandoned,...,zorin,zorro,zuckerberg,zula,zuzu,zyklon,æon,éloigne,émigré,única
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Create the new similarity scores:

In [43]:
similarity_scores_tfidf = cosine_similarity(tfidf_data, tfidf_data)

Get recommendations for Pulp Fiction with the new scores:

In [44]:
get_recommendations("Pulp Fiction", similarity_scores_tfidf)

Unnamed: 0,title,overview,similarity score
3525,The Sting,Set in the 1930's this intricate caper deals with an ambitious small-time crook and a veteran con man who seek revenge on a vicious crime lord who murdered one of their gang.,0.140591
3193,All or Nothing,"Penny's love for her partner, taxi-driver Phil, has run dry. He is a gentle, philosophical guy, and she works on the checkout at a supermarket...",0.138597
3465,Sliding Doors,"Gwyneth Paltrow plays London publicist Helen, effortlessly sliding between parallel storylines that show what happens if she does or does not catch a train back to her apartment. Love. Romantic entanglements. Deception. Trust. Friendship. Comedy. All come into focus as the two stories shift back and forth, overlap and surprisingly converge.",0.138153
3503,11:14,Tells the seemingly random yet vitally connected story of a set of incidents that all converge one evening at 11:14pm. The story follows the chain of events of five different characters and five different storylines that all converge to tell the story of murder and deceit.,0.136672
3622,Made,Two aspiring boxers lifelong friends get involved in a money-laundering scheme through a low-level organized crime group.,0.13384
4621,Locker 13,"The story of Skip, a young ex-convict who takes a position as a night janitor at an old-west theme park. His supervisor Archie, teaches him the ropes, but more importantly attempts to convey critical philosophical messages through a series of four stories: a down and out boxer is given the opportunity to become a real golden gloves killer; an assassin kidnaps three people in order to find out who hired him for his latest hit; a new recruit is initiated into a lodge of fez-wearing businessmen where hazing can take a malevolent turn; and a member of a suicide club introduces real fear into a man about to jump to his death.",0.133813
1949,Love Ranch,Story of a couple that starts the first legal brothel in Nevada and a boxer they own a piece of.,0.108179
3345,Jumping the Broom,Two very different families converge on Martha's Vineyard one weekend for a wedding.,0.105198
4694,She's Gotta Have It,"The story of Nola Darling's simultaneous sexual relationships with three different men is told by her and by her partners and other friends. All three men wanted her to commit solely to them; Nola resists being ""owned"" by a single partner.",0.10017
4102,Shortbus,"A group of New Yorkers caught up in their milieu converge at an underground salon infamous for its blend of art, music, politics, and carnality.. The characters converge in a weekly Brooklyn salon loosely inspired by various underground NYC gatherings that took place in the early 2000's.",0.092061


Let's try to understand why we get different results with each type of vectorisation:

In [45]:
def get_clean_overview(name):
    # Return the cleaned overview of the first film whose title matches `name`
    return plot.loc[plot.title == name, 'clean_overview'].iloc[0]

In [46]:
get_clean_overview("Pulp Fiction")

'hit man philosophical partner gangster moll boxer converge sprawling comedic crime caper adventure unfurl three story ingeniously trip back forth time'

In [47]:
get_clean_overview("The Sting")

'set intricate caper deal ambitious crook veteran con man seek revenge vicious crime lord murdered one gang'

Common words: "caper", "crime", "man".

In [48]:
get_clean_overview("Lovely, Still")

'holiday fable tell story elderly man discovering love first time'

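Common words: "story", "man", "time". These are generic words that probably occur in many overviews, so TF-IDF down-weights them, whereas "caper" and "crime" are more distinctive. This would explain why "The Sting" ranks first with TF-IDF while "Lovely, Still" ranks first with bag of words.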
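The comparison can be reproduced by hand. The sketch below takes the three cleaned overviews printed above, lists the words each film shares with "Pulp Fiction", and computes a plain bag-of-words cosine similarity from raw word counts. It assumes whitespace tokenisation, which may differ slightly from what `CountVectorizer` does; the helper names `shared_words` and `bow_cosine` are illustrative.

```python
import math
from collections import Counter

# Cleaned overviews copied from the notebook outputs above
pulp_fiction = ("hit man philosophical partner gangster moll boxer converge "
                "sprawling comedic crime caper adventure unfurl three story "
                "ingeniously trip back forth time")
the_sting = ("set intricate caper deal ambitious crook veteran con man seek "
             "revenge vicious crime lord murdered one gang")
lovely_still = "holiday fable tell story elderly man discovering love first time"

def shared_words(a, b):
    """Alphabetically sorted words that occur in both texts."""
    return sorted(set(a.split()) & set(b.split()))

def bow_cosine(a, b):
    """Cosine similarity between the raw word-count vectors of two texts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)  # missing words count as 0
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)

print(shared_words(pulp_fiction, the_sting))             # ['caper', 'crime', 'man']
print(shared_words(pulp_fiction, lovely_still))          # ['man', 'story', 'time']
print(round(bow_cosine(pulp_fiction, the_sting), 4))     # 0.1588
print(round(bow_cosine(pulp_fiction, lovely_still), 4))  # 0.207
```

Both films share three words with "Pulp Fiction", but "Lovely, Still" has the shorter overview (10 words against 17), so its count vector has a smaller norm and its cosine score is higher. The value of roughly 0.2070 agrees with the bag-of-words score reported for "Lovely, Still" above.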

## 8. Final recommender (model)

We define a recommendation function using our best recommender (the TF-IDF one).

In [110]:
def get_recs(movie):
    # Look up the recommendations with the TF-IDF similarity scores
    recs = get_recommendations(movie, similarity_scores_tfidf)
    if recs is not None:  # the film is in the dataset
        titles = list(recs["title"].values)
        msg = "If you enjoyed {}, we recommend... {}".format(movie, ", ".join(titles))
        return msg

In [111]:
get_recs("Jurassic Park")

"If you enjoyed Jurassic Park, we recommend... Jurassic World, Walking With Dinosaurs, National Lampoon's Vacation, The Nut Job, Meet the Deedles, The Way Way Back, Vacation, The FP, Adventureland, Sea Rex 3D: Journey to a Prehistoric World"

In [112]:
get_recs("Matrix")

This movie is not included in our dataset, sorry. Try again.
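
The lookup fails here because it needs the exact title ("The Matrix"). One possible refinement, sketched below, is to suggest close titles with Python's standard `difflib.get_close_matches`. The helper `suggest_titles` and the short title list are hypothetical; in the notebook the candidates would presumably come from `plot.title`.

```python
import difflib

# Hypothetical sample of titles; in the notebook this could be list(plot.title)
titles = ["The Matrix", "The Matrix Reloaded", "Jurassic Park", "Avatar"]

def suggest_titles(query, candidates, n=3, cutoff=0.6):
    """Return up to n candidate titles that closely resemble the query."""
    return difflib.get_close_matches(query, candidates, n=n, cutoff=cutoff)

print(suggest_titles("Matrix", titles))  # ['The Matrix']
```

The comparison is case-sensitive, so lower-casing both the query and the candidates may be worthwhile before matching.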
