## Content Based Movie Recommendation System

This notebook includes a Movie Recommendation system based on the Movie Summary Corpus Dataset.

#### Imports 

In [1]:
import os

#### Load the dataset

In [2]:
movie_summary_path = './../../Datasets/Movie_summary'

In [3]:
!ls ./../../Datasets/Movie_summary

README.MD              movie.metadata.tsv     tvtropes.clusters.txt
README.txt             name.clusters.txt
character.metadata.tsv plot_summaries.txt


In [4]:
movies = sc.textFile(os.path.join(movie_summary_path, 'movie.metadata.tsv'))

In [5]:
print(f"There are {movies.count()} in the Movie Summary dataset")

There are 81741 in the Movie Summary dataset


In [6]:
movies.take(5)

['975900\t/m/03vyhn\tGhosts of Mars\t2001-08-24\t14010832\t98.0\t{"/m/02h40lc": "English Language"}\t{"/m/09c7w0": "United States of America"}\t{"/m/01jfsb": "Thriller", "/m/06n90": "Science Fiction", "/m/03npn": "Horror", "/m/03k9fj": "Adventure", "/m/0fdjb": "Supernatural", "/m/02kdv5l": "Action", "/m/09zvmj": "Space western"}',
 '3196793\t/m/08yl5d\tGetting Away with Murder: The JonBenét Ramsey Mystery\t2000-02-16\t\t95.0\t{"/m/02h40lc": "English Language"}\t{"/m/09c7w0": "United States of America"}\t{"/m/02n4kr": "Mystery", "/m/03bxz7": "Biographical film", "/m/07s9rl0": "Drama", "/m/0hj3n01": "Crime Drama"}',
 '28463795\t/m/0crgdbh\tBrun bitter\t1988\t\t83.0\t{"/m/05f_3": "Norwegian Language"}\t{"/m/05b4w": "Norway"}\t{"/m/0lsxr": "Crime Fiction", "/m/07s9rl0": "Drama"}',
 '9363483\t/m/0285_cd\tWhite Of The Eye\t1987\t\t110.0\t{"/m/02h40lc": "English Language"}\t{"/m/07ssc": "United Kingdom"}\t{"/m/01jfsb": "Thriller", "/m/0glj9q": "Erotic thriller", "/m/09blyk": "Psychological th

In [7]:
summaries = sc.textFile(os.path.join(movie_summary_path, 'plot_summaries.txt'))

In [8]:
print(f"There are {summaries.count()} summaries avaiable in the Movie Summary dataset")

There are 42306 summaries avaiable in the Movie Summary dataset


In [9]:
summaries.take(5)

["23890098\tShlykov, a hard-working taxi driver and Lyosha, a saxophonist, develop a bizarre love-hate relationship, and despite their prejudices, realize they aren't so different after all.",
 '31186339\tThe nation of Panem consists of a wealthy Capitol and twelve poorer districts. As punishment for a past rebellion, each district must provide a boy and girl  between the ages of 12 and 18 selected by lottery  for the annual Hunger Games. The tributes must fight to the death in an arena; the sole survivor is rewarded with fame and wealth. In her first Reaping, 12-year-old Primrose Everdeen is chosen from District 12. Her older sister Katniss volunteers to take her place. Peeta Mellark, a baker\'s son who once gave Katniss bread when she was starving, is the other District 12 tribute. Katniss and Peeta are taken to the Capitol, accompanied by their frequently drunk mentor, past victor Haymitch Abernathy. He warns them about the "Career" tributes who train intensively at special academie

#### Spark transformations

Extract movie ids and titles

In [10]:
movie_ids_and_titles = movies.map(lambda elem: elem.split("\t")).map(lambda movie: (movie[0], movie[2]))

In [11]:
movie_ids_and_titles.take(5)

[('975900', 'Ghosts of Mars'),
 ('3196793', 'Getting Away with Murder: The JonBenét Ramsey Mystery'),
 ('28463795', 'Brun bitter'),
 ('9363483', 'White Of The Eye'),
 ('261236', 'A Woman in Flames')]

Extract the movie ids and summaries

In [12]:
ids_and_summaries = summaries.map(lambda elem: elem.split('\t')).map(lambda summary: (summary[0], summary[1]))

In [13]:
ids_and_summaries.take(5)

[('23890098',
  "Shlykov, a hard-working taxi driver and Lyosha, a saxophonist, develop a bizarre love-hate relationship, and despite their prejudices, realize they aren't so different after all."),
 ('31186339',
  'The nation of Panem consists of a wealthy Capitol and twelve poorer districts. As punishment for a past rebellion, each district must provide a boy and girl  between the ages of 12 and 18 selected by lottery  for the annual Hunger Games. The tributes must fight to the death in an arena; the sole survivor is rewarded with fame and wealth. In her first Reaping, 12-year-old Primrose Everdeen is chosen from District 12. Her older sister Katniss volunteers to take her place. Peeta Mellark, a baker\'s son who once gave Katniss bread when she was starving, is the other District 12 tribute. Katniss and Peeta are taken to the Capitol, accompanied by their frequently drunk mentor, past victor Haymitch Abernathy. He warns them about the "Career" tributes who train intensively at speci

Get only the ids of summaries

In [14]:
ids_of_summaries = ids_and_summaries.map(lambda elem: elem[0]).collect()

In [15]:
ids_of_summaries[:5]

['23890098', '31186339', '20663735', '2231378', '595909']

Keep only the movies for which you have summaries

In [16]:
kept_movies = movie_ids_and_titles.filter(lambda elem: elem[0] in ids_of_summaries)

In [17]:
kept_movies.count()

42207

We found that some summaries where not available, but we can proceed with the 42207 movies.

In [18]:
kept_movies.take(5)

[('975900', 'Ghosts of Mars'),
 ('9363483', 'White Of The Eye'),
 ('261236', 'A Woman in Flames'),
 ('18998739', "The Sorcerer's Apprentice"),
 ('6631279', 'Little city')]

Now we need to join the movie titles with their summaries

In [19]:
joint_rdd = kept_movies.join(ids_and_summaries).cache()

In [20]:
joint_rdd.count()

42207

In [21]:
joint_rdd.take(5)

[('156558',
  ('Baby Boy',
   'A young 20-year-old named Jody  lives with his mother Juanita ,{{amg movie}} in South Central Los Angeles. He spends most of his time with his unemployed best friend P , and does not seem interested in becoming a responsible adult. However, he is forced to mature as a result of an ex-con named Melvin , who moves into their home. Another factor is his children - a son with his girlfriend Yvette  and a daughter with a girl named Peanut, who also lives with her mother. At the beginning of the movie Yvette has an abortion that Jody forced her to have. Yvette constantly asks Jody if he will ever come live with her and their son, but Jody avoids the subject and comes and goes as he pleases. Jody also continues seeing and having sex with other women, including Peanut. This becomes an issue between him and Yvette as well, especially since Yvette and Peanut do not get along. When she discovers his cheating they get in a heated argument which results to Jody slappi

## Using a doc2vec model on the available data

Based on the idea found on this [this link](http://sujitpal.blogspot.com/2016/04/predicting-movie-tags-from-plots-using.html) we wish to use a document_2_vector model so that we can vectorize the content of the summaries to vectors, and that we can use them in order to find movies similar to the interest of a user based on the cosine similarity of the extracted documents.

In [53]:
from gensim.models.doc2vec import TaggedDocument
from gensim.models import Doc2Vec
from random import shuffle
from sklearn.model_selection import train_test_split
import nltk
import numpy as np

In [48]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/alexandros.ferles/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [23]:
model = Doc2Vec(dm=0, vector_size=100, negative=5, hs=0, min_count=2)

In [45]:
only_summaries = joint_rdd.map(lambda elem: elem[1][1]).collect()

In [61]:
sentences = [TaggedDocument(doc, [i]) for i, doc in enumerate(only_summaries)]

In [74]:
model = Doc2Vec(documents, vector_size=100, negative=5, hs=0, min_count=2)

Train the doc2Vec model

In [82]:
from tqdm import tqdm

In [83]:
alpha = 0.025
min_alpha = 0.001
num_epochs = 20
alpha_delta = (alpha - min_alpha) / num_epochs

for epoch in tqdm(range(num_epochs)):
    shuffle(sentences)
    model.alpha = alpha
    model.min_alpha = alpha
    model.train(sentences, total_examples=model.corpus_count, epochs=1)
    alpha -= alpha_delta

100%|██████████| 20/20 [04:23<00:00, 12.98s/it]


We can see that the trained model infers vectors form text, as expected

In [86]:
model.infer_vector(only_summaries[0])

array([ 3.35745700e-02,  6.84581045e-03, -8.33332837e-02,  5.95750324e-02,
        2.94893458e-02, -4.44047116e-02,  9.32905525e-02, -6.17083684e-02,
        9.08117965e-02, -2.30221674e-02, -4.95272763e-02,  2.69503519e-03,
        2.32944768e-02, -4.41004559e-02, -1.17280625e-01, -3.69116594e-03,
        6.42675674e-03, -2.95317061e-02,  6.27793670e-02, -9.50667355e-03,
        1.71459317e-01, -9.37651172e-02,  4.02137451e-02, -2.54107174e-03,
        1.86351165e-01, -8.39546844e-02,  1.42284231e-02, -2.95529608e-02,
       -4.00071740e-02,  5.61882891e-02, -5.88813685e-02,  3.18345875e-02,
       -4.92487708e-03,  9.68242437e-02, -6.64590523e-02, -1.77101403e-01,
       -3.58027667e-02,  6.84229238e-03, -3.82050276e-02, -1.06377356e-01,
        5.75787723e-02, -4.21793684e-02,  1.21596217e-01,  4.07193415e-02,
       -5.78942103e-03, -2.61177234e-02, -1.15547791e-01,  9.75271910e-02,
        2.47938503e-02, -4.07442264e-02, -3.79833542e-02, -2.93961391e-02,
        2.84741689e-02, -

Now we create a new rdd that contains the vector representations of each summary

In [94]:
joint_rdd.take(5)

[('156558',
  ('Baby Boy',
   'A young 20-year-old named Jody  lives with his mother Juanita ,{{amg movie}} in South Central Los Angeles. He spends most of his time with his unemployed best friend P , and does not seem interested in becoming a responsible adult. However, he is forced to mature as a result of an ex-con named Melvin , who moves into their home. Another factor is his children - a son with his girlfriend Yvette  and a daughter with a girl named Peanut, who also lives with her mother. At the beginning of the movie Yvette has an abortion that Jody forced her to have. Yvette constantly asks Jody if he will ever come live with her and their son, but Jody avoids the subject and comes and goes as he pleases. Jody also continues seeing and having sex with other women, including Peanut. This becomes an issue between him and Yvette as well, especially since Yvette and Peanut do not get along. When she discovers his cheating they get in a heated argument which results to Jody slappi

In [107]:
vectorized_summaries = [model.infer_vector(summary) for summary in only_summaries]

In [117]:
indexed_movies = joint_rdd.zipWithIndex()

In [149]:
vectorized_rdd = \
indexed_movies.map(lambda movie: (movie[0][0], movie[0][1][0], movie[0][1][1], vectorized_summaries[movie[1]] ) ).cache()

In [152]:
vectorized_rdd.count()

42207

In [155]:
vectorized_rdd.take(5)

[('156558',
  'Baby Boy',
  'A young 20-year-old named Jody  lives with his mother Juanita ,{{amg movie}} in South Central Los Angeles. He spends most of his time with his unemployed best friend P , and does not seem interested in becoming a responsible adult. However, he is forced to mature as a result of an ex-con named Melvin , who moves into their home. Another factor is his children - a son with his girlfriend Yvette  and a daughter with a girl named Peanut, who also lives with her mother. At the beginning of the movie Yvette has an abortion that Jody forced her to have. Yvette constantly asks Jody if he will ever come live with her and their son, but Jody avoids the subject and comes and goes as he pleases. Jody also continues seeing and having sex with other women, including Peanut. This becomes an issue between him and Yvette as well, especially since Yvette and Peanut do not get along. When she discovers his cheating they get in a heated argument which results to Jody slapping

So we have created an RDD that contains the representations of each movie in a document vector. 

In order to complete the recommender, we need to get the reviews of each user and and form them in a document vector that represents them appropriately. 

The advantage of these methods is that we do not need to take reviews for the given database! We just need to form a document vector that represents a movie and find which of the available(vectorized) movies are mmore similar to this movie. 

For a single user, we need to create a weighted average of her ratings (based on her reviews, movie rated with more stars are expected to have greater importance in her future selections) and use this weighted average to suggest new movies to watch.

Our metric of similarity is the cosine similarity.

#### Implementing a content-based recommender

Firs of all, we no longer need the movie ids

In [161]:
titles_summaries_and_vectors = vectorized_rdd.map(lambda movie: (movie[1], movie[2], movie[3]))

In [162]:
titles_summaries_and_vectors.count()

42207

In [164]:
titles_summaries_and_vectors.take(1)

[('Baby Boy',
  'A young 20-year-old named Jody  lives with his mother Juanita ,{{amg movie}} in South Central Los Angeles. He spends most of his time with his unemployed best friend P , and does not seem interested in becoming a responsible adult. However, he is forced to mature as a result of an ex-con named Melvin , who moves into their home. Another factor is his children - a son with his girlfriend Yvette  and a daughter with a girl named Peanut, who also lives with her mother. At the beginning of the movie Yvette has an abortion that Jody forced her to have. Yvette constantly asks Jody if he will ever come live with her and their son, but Jody avoids the subject and comes and goes as he pleases. Jody also continues seeing and having sex with other women, including Peanut. This becomes an issue between him and Yvette as well, especially since Yvette and Peanut do not get along. When she discovers his cheating they get in a heated argument which results to Jody slapping Yvette in t

Now let's create a new user based on some rates we have received:

#### User 1:

In [237]:
import pandas as pd

In [325]:
d = {
    'Movie title' :  [
        'Deadpool 2', 
        'Fantastic beasts and where to find them', 
        'Harry Potter and the goblet of fire',
        'The perks of being a wallflower',
        'American Animals',
        'Imitation game',
        'Perfume the story of a murderer',
        'The greatest showman',
        'Pirates of the Caribbean:Dead men Tell no Tales',
        'Logan',
        'Hacksaw Ridge',
        'Doctor Strange',
        'Deepwater Horizon',
        'Frozen',
        'Me before you',
        'Now you see me',
        'The hunger games:Mockingjay – Part 2',
        'The pianist',
        'The Hateful 8',
        'The intern'],

    'Rating' : [3, 4, 3, 4, 2.5, 5, 1, 4, 1.5, 2.5, 4.5, 4, 3, 4.5, 3.5, 3.5, 4.5, 2, 4, 4],

    'Summary' : ["After successfully working as the mercenary Deadpool for two years, Wade Wilson fails to kill one of his targets on his anniversary with his girlfriend Vanessa. That night, after the pair decides to start a family together, the target tracks them down and kills Vanessa. Wilson kills the man in revenge. He blames himself for her death and attempts to commit suicide six weeks later by blowing himself up. Wilson has a vision of Vanessa in the afterlife, but the pieces of his body remain alive and are put back together by Colossus. Wilson is left with only a Skee-Ball token, an anniversary gift, as a final memento of Vanessa.Recovering at the X-Mansion, Wilson agrees to join the X-Men as a form of healing. He, Colossus, and Negasonic Teenage Warhead respond to a standoff between authorities and the unstable young mutant Russell Collins Firefist at an orphanage, labeled a 'Mutant Reeducation Center'. Wilson realizes that Collins has been abused by the orphanage staff, and kills one of the staff members. Colossus stops him from killing anyone else, and both Wilson and Collins are arrested. Restrained with collars that suppress their powers, they are taken to the 'Ice Box', an isolated prison for mutant criminals. Meanwhile, a cybernetic soldier from the future, Cable, whose family is murdered by an older Collins, travels back in time to kill the boy before Collins ever becomes a killer.Cable breaks into the Ice Box and attacks Collins. Wilson, whose collar has broken in the melee, attempts to defend Collins. After Cable takes Vanessa's token, Wilson forces himself and Cable out of the prison, but not before Collins overhears Wilson deny that he cares for the young mutant. Near death again, Wilson has another vision of Vanessa in which she convinces him to help Collins. Wilson organizes a team called X-Force to break Collins out of a prison-transfer convoy and defend him from Cable. The team launches its assault on the convoy by parachuting from a plane, but all of the members die during the landing except for Wilson and the lucky Domino. While they fight Cable, Collins frees fellow inmate Juggernaut, who agrees to help Collins kill the abusive orphanage headmaster. Juggernaut destroys the convoy, allowing himself and Collins to escape.Cable offers to work with Wilson and Domino to stop Collins' first murder, which will lead to more, and agrees to give Wilson a chance to talk Collins down. At the orphanage they are overpowered by Juggernaut while Collins attacks the headmaster, until Colossus-who had at first refused to help Wilson due to Wilson's murderous ways-arrives to distract Juggernaut. When Wilson fails to talk down Collins, Cable shoots at the young mutant. Wilson leaps in front of the bullet while wearing the Ice Box collar and dies, reuniting with Vanessa in the afterlife. Seeing this sacrifice, Collins does not kill the headmaster; this changes the future so that Cable's family now survives. Cable uses the last charge on his time-traveling device, which he needed for returning to his family, to go back several minutes and strap Vanessa's token in front of Wilson's heart. Now when Wilson takes the bullet for Collins, it is stopped by the token and he survives. Collins still has his change of heart, and afterwards the headmaster is run over by Wilson's taxi-driver friend Dopinder.In a mid-credits sequence, Negasonic Teenage Warhead and her girlfriend Yukio repair Cable's time-traveling device for Wilson. He uses it to save the lives of Vanessa and X-Force member Peter; kill X-Men Origins: Wolverine's version of Deadpool; and kill actor Ryan Reynolds while he is considering starring in the film Green Lantern.",
                "In 1926, British wizard and 'magizoologist' Newt Scamander arrives by ship to New York en route to Arizona. He encounters Mary Lou Barebone, a non-magical woman ('No-Maj' or 'Muggle') who heads the New Salem Philanthropic Society. As Newt listens to her speech about how witches and wizards are real and dangerous, a Niffler escapes from Newt's magically expanded suitcase, which houses various magical creatures. As Newt attempts to capture the Niffler, he meets No-Maj cannery worker and aspiring baker Jacob Kowalski, and they unwittingly swap suitcases. Demoted Auror (hunter of dark wizards) Tina Goldstein arrests Newt for the chaos caused by the Niffler and takes him to the Magical Congress of the United States of America (MACUSA) headquarters, hoping to regain her former position. However, as Jacob's suitcase contains only baked goods, Newt is released. At Jacob's tenement apartment, several creatures escape from Newt's suitcase.After Tina and Newt find Jacob and the suitcase, Tina takes them to her apartment and introduces them to Queenie, her Legilimens sister. Jacob and Queenie are mutually attracted, though American wizards are forbidden to marry or even meet No-Majs. Newt takes Jacob inside his magically expanded suitcase, where Jacob encounters a contained Obscurus, a parasite that develops inside magically gifted children if they suppress their magical abilities. Newt extracted it from a young girl who died, those afflicted rarely living past the age of ten. Newt persuades Jacob to help search for the missing creatures. After re-capturing two of the three escaped beasts, Tina returns the suitcase to MACUSA. Officials arrest them, believing one of Newt's beasts to be responsible for killing Senator Henry Shaw, Jr. Director of Magical Security Percival Graves accuses Newt of conspiring with the infamous dark wizard Gellert Grindelwald, and decides to destroy Newt's suitcase and erase Jacob's recent memories of magic. Newt and Tina are sentenced to immediate death in secret, but Queenie and Jacob rescue them, and they escape after retrieving Newt's suitcase. Following a tip from Tina's old goblin informant Gnarlack, the foursome find and re-capture the last of the escaped creatures.Meanwhile, Graves approaches Mary Lou's adopted son Credence and offers to free him from his abusive mother. In exchange, Graves wants Credence to find an Obscurus, which he believes has caused the mysterious destructive incidents around the city. Credence finds a wand under his adopted sister Modesty's bed. Mary Lou assumes it is Credence's wand, but Modesty says it is hers. When Modesty is about to be punished, the Obscurus kills Mary Lou and her eldest daughter Chastity. Graves arrives, and after Credence leads him to Modesty, whom he assumes is the Obscurus's host, he dismisses Credence as being a Squib and refuses to teach him magic. Credence reveals he is the real host, having lived longer than any other host due to the intensity of his magic. In a fit of rage, Credence transforms and attacks the city.Newt finds Credence hiding in a subway tunnel, but he is attacked by Graves. Tina, who knows Credence, arrives and attempts to calm him, while Graves tries to convince Credence to listen to him. As Credence begins to settle back into human form, Aurors arrive and apparently disintegrate him to protect the magical society; however, a tiny Obscurus fragment escapes. Graves admits to unleashing the Obscurus to expose the magical community to the No-Majs and framing Newt for it, and angrily claims that MACUSA protects the No-Majs more than themselves. As the president orders the aurors to apprehend Graves, he attacks and begins to defeat all of them. After being subdued by one of Newt's beasts, he is revealed as Grindelwald in disguise and is taken into custody.MACUSA fears their secret world has been exposed, but Newt releases his Thunderbird, Frank, to disperse a potion as rainfall over the city that erases all New Yorkers' recent memories as MACUSA wizards repair the destruction. Queenie kisses Jacob goodbye as the rain erases his memories. Newt departs for Europe, but promises to return and visit Tina when his book is finished; he also anonymously leaves Jacob a case of silver Occamy eggshells to fund his bakery. His breads and pastries are subconsciously inspired by Newt's creatures, and Queenie visits him in his shop.",
                "Harry Potter awakens from a nightmare wherein a man named Frank Bryce is killed after overhearing Lord Voldemort conspiring with Peter Pettigrew and another man. While Harry attends the Quidditch World Cup between Ireland and Bulgaria with the Weasleys and Hermione Granger. Death Eaters terrorise the camp, and the man who appeared in Harry's dream summons the Dark Mark.At Hogwarts, Albus Dumbledore introduces ex-Auror Alastor 'Mad-Eye' Moody as the new Defense Against the Dark Arts teacher. He also announces that the school will host the legendary event known as the Triwizard Tournament, in which three magical schools compete across three dangerous challenges. The Goblet of Fire selects 'champions' to take part in the competition: Cedric Diggory of Hufflepuff representing Hogwarts, Viktor Krum representing the Durmstrang Institute from Central Europe, and Fleur Delacour representing Beauxbatons Academy of Magic from France. The Goblet then unexpectedly selects Harry as a fourth champion. Dumbledore is unable to pull the underage Harry out of the tournament, as Ministry official Barty Crouch Sr. insists that the champions are bound by a contract after being selected.For the first task, each champion must retrieve a golden egg guarded by the dragon they pick. Harry succeeds in retrieving the egg, which contains information about the second challenge. Shortly after, a formal dance event known as the Yule Ball takes place; Harry's crush Cho Chang attends with Cedric, and Hermione attends with Viktor, making Ron jealous. During the second task, the champions must dive underwater to rescue their mates. Harry finishes third, but is promoted to second behind Cedric due to his moral fibre, after saving Fleur's sister Gabrielle as well as Ron. Afterwards, Harry discovers the corpse of Crouch Sr. in the forest. Later, while waiting for Dumbledore in his office, Harry discovers a Pensieve, which holds Dumbledore's memories. Harry witnesses a trial in which Igor confesses to the Ministry of Magic names of other Death Eaters, after Voldemort's defeat. When he names Severus Snape as one, Dumbledore vouches for Snape's innocence; Snape turned spy against Voldemort before the latter's downfall. After Karkaroff names Barty Crouch Jr., a devastated Crouch Sr. imprisons his son in Azkaban. Exiting the Pensieve, Harry realizes that Crouch Jr. is the man he saw in his dream.For the final task, the champions must reach the Triwizard Cup, located in a hedge maze. Viktor, under the influence of the Imperius Curse, accidentally incapacitates Fleur. After Harry saves Cedric when the maze attacks him, the two claim a draw and together grab the cup, which turns out to be a Portkey and transports them to a graveyard where Pettigrew and Voldemort are waiting. Pettigrew kills Cedric with the Killing Curse, and performs a ritual that rejuvenates Voldemort, who then summons the Death Eaters. Voldemort releases Harry in order to beat him in a duel and prove he is the better wizard. Unable to defend himself, Harry tries the Expelliarmus charm at the same moment that Voldemort attempts the Killing Curse. The beams from their wands entwine, and Voldemort's wand disgorges the last spells it performed. The spirits of the people he murdered materialize in the graveyard, including Harry's parents and Cedric. This distracts Voldemort and his Death Eaters, allowing Harry to escape with Cedric's body by grabbing the Portkey.Harry tells Dumbledore that Voldemort returned and killed Cedric. Moody takes Harry back to his office to interrogate him about Voldemort, but inadvertently blows his cover by asking Harry whether there were 'others in the graveyard', despite Harry not mentioning a graveyard. Moody reveals that he submitted Harry's name to the Goblet of Fire and manipulated Harry throughout the tournament to ensure he would win. Moody attempts to attack Harry, but Dumbledore, Snape, and Minerva McGonagall intervene and subdue him. The teachers force Moody to drink Veritaserum, a truth-telling potion, and he reveals that the real Moody is imprisoned in a magical trunk. The impostor's Polyjuice Potion wears off, revealing him as Crouch Jr., who is then returned to Azkaban.Dumbledore reveals to the students that Voldemort killed Cedric, although the Ministry of Magic opposes the revelation. Later, Dumbledore visits Harry in his dormitory, apologizing to him for the dangers he endured. Harry reveals that he saw his parents in the graveyard; Dumbledore names this effect as 'Priori Incantatem'. Soon after Hogwarts, Durmstrang, and Beauxbatons bid farewell to each other.",
                "Set in 1992, the film is set against the background of a young student, Charlie (Logan Lerman), who has been suffering from clinical depression from childhood setbacks and has recently been discharged from a mental health care institution to begin his adaptation to a normal lifestyle as a young high school student. Charlie is uneasy about beginning his freshman year of high school; he is shy and finds difficulty in making friends, but he connects with his English teacher, Mr. Anderson (Paul Rudd).When he sits with two seniors, Sam (Emma Watson) and her stepbrother Patrick (Ezra Miller), at a football game, they invite him to join them to several social activities. At a party, Charlie unwittingly eats a cannabis brownie, gets high and discloses to Sam that the year before, his best friend committed suicide. He also walks in on Patrick and Brad (Johnny Simmons), a popular athlete, kissing. Sam realizes that Charlie has no other friends so she and Patrick make a special effort to bring Charlie into their group. Sam needs to improve her SAT scores to be accepted to Pennsylvania State University, so Charlie offers to tutor her. On the way home from the party, when the three hear a song with which they are unfamiliar, Sam instructs Patrick to drive through a tunnel so she can stand up in the back of the pickup while the music blasts.At Christmas, Sam gives Charlie a vintage typewriter to help his aspirations of being a writer. The two discuss relationships, and Charlie reveals he has never been kissed. Sam, though already involved with someone else, tells Charlie she wants his first kiss to be from someone who loves him, and kisses him. Charlie, in love with Sam, begins to try to find ways to show her how he feels.At a regular Rocky Horror Picture Show performance, Charlie is asked to fill in for Sam's boyfriend Craig, who is unavailable. Their friend Mary Elizabeth (Mae Whitman) is impressed and asks Charlie to the Sadie Hawkins dance. The two enter into a desultory relationship. Finally, at a party, when Charlie is dared to kiss the most beautiful girl in the room, he chooses Sam, upsetting both her and Mary Elizabeth. Patrick recommends Charlie stay away from the group for a while, and the isolation causes him to sink back into depression. He experiences flashbacks of his Aunt Helen (Melanie Lynskey), who died in a car accident when he was seven years old.When Brad shows up at school with a black eye having been caught by his father having sex with Patrick, he lies, saying that he was jumped and beaten up. Brad distances himself from Patrick, calling him a faggot. Brad's friends begin beating Patrick, but Charlie forcefully intervenes, then blacks out. He recovers to find he has bruised knuckles and Brad's friends are on the floor, incapacitated. Charlie threatens, Touch my friends again, and I'll blind you, then leaves. Sam and Patrick express their gratitude to Charlie, and the three become friends again.Sam is accepted into Penn State, and breaks up with Craig on prom night after learning he has been cheating on her. The night before she departs, she brings Charlie to her room and asks him 'Why do I and everyone I love pick people who treat us like we are nothing to which he repeats advice he received from Mr. Anderson",
                "The storytelling frequently jumps between interviews with the real people portrayed in the movie and the events themselves performed by actors.In 2003 in Lexington, Kentucky, Spencer Reinhard (Barry Keoghan) is an art student who feels his life has no meaning, that he needs something exciting, even if tragic, to happen in his life to inspire greater artistry. Warren Lipka (Evan Peters) is a rebellious student on an athletic scholarship, though he does not care much for sports and is only pursuing the education to please his family.After Spencer is given a tour of Transylvania University's library's rare book collection, the two friends begin to plan to steal an extremely valuable edition of John James Audubon's The Birds of America and other rare books. Warren travels to Amsterdam to meet some black market buyers who express interest in buying the books. Upon returning to the US, he informs Spencer that they could make millions of dollars, much to their excitement.Realizing that pulling off the heist will require more people, they enlist the help of childhood friends Erik Borsuk (Jared Abrahamson), who helps provide the logistics of the operation, and Chas Allen (Blake Jenner), who will be the getaway driver. They all take time to prepare, learning that the only person guarding the books is the special collections librarian, Betty Jean Gooch (Ann Dowd).On the day of the robbery, they disguise themselves as elderly businessmen, and enter the library. After noticing that there are too many people in the special collections library, they quickly abort the heist and retreat. Three of the conspirators want to end the attempt altogether, but Warren calls the library asking for a private appointment the next day.They decide to drop the elaborate old-age disguises. While Spencer acts as a lookout outside the building, Warren and Eric enter the library dressed as young businessmen. Warren clumsily tases the special collections librarian and makes Eric help tie her up and gag her. They take the rare books and blunder to an exit. In a panic, they drop and have to leave behind the biggest prizes, two enormous Audubon books comprising 'The Birds of America.' All four manage to escape with two of the rarer books.They take the books to Christie's auction house in New York to get the authentication of value that Warren had said the Dutch buyers required. Spencer is told he has to come back sometime the next day and leaves his cell phone number with an auction assistant. In the van outside, Chas berates everyone for their stupidity, and they return to Lexington with the books. Shortly after, Spencer realizes that the police will be able to trace them from emails they used in setting up the heist as well as his cell phone number.The thieves show signs of great stress as they try to lie low: Warren attempts to shoplift from a convenience store; Spencer gets into a car accident; and Eric starts a bar fight. Inevitably, the FBI raid all four of their homes and arrest them. Movie titles show they each serve over 7 years in federal prison.After prison, the real-life robbers express their regret for attempting the heist, noting how much pain they have put their families through. It is also revealed that Warren may have lied about going to Amsterdam, fabricating the story to get the others to agree to the heist. However, this is not confirmed.An epilogue describes their lives after prison. Eric lives in Los Angeles as a writer, and Chas has become a fitness coach in Southern California. Warren has re-enrolled in college and studies filmmaking in Philadelphia. Spencer still lives in Lexington making a living as an artist, specializing in birds. Betty Jean Gooch, the librarian, still works at Transylvania University as a special collections librarian.",
                "In 1951, two policemen, Nock and Staehl, investigate the mathematician Alan Turing after an apparent break-in at his home. During his interrogation by Nock, Turing tells of his time working at Bletchley Park during the Second World War. In 1927, the young Turing is unhappy and bullied at boarding school. He develops a friendship with Christopher Morcom, who sparks his interest in cryptography. Turing develops romantic feelings for him, but Christopher soon dies from tuberculosis. When Britain declares war on Germany in 1939, Turing travels to Bletchley Park. Under the direction of Commander Alastair Denniston, he joins the cryptography team of Hugh Alexander, John Cairncross, Peter Hilton, Keith Furman and Charles Richards. The team are trying to decrypt the Enigma machine, which the Nazis use to send coded messages. Turing is difficult to work with, and considers his colleagues inferior; he works alone to design a machine to decipher Enigma. After Denniston refuses to fund construction of the machine, Turing writes to Prime Minister Winston Churchill, who puts Turing in charge of the team and funds the machine. Turing fires Furman and Richards and places a difficult crossword in newspapers to find replacements. Joan Clarke, a Cambridge graduate, passes Turing’s test but her parents will not allow her to work with the male cryptographers. Turing arranges for her to live and work with the female clerks who intercept the messages, and shares his plans with her. With Clarke's help, Turing warms to the other colleagues, who begin to respect him. Turing’s machine, which he names Christopher, is constructed, but cannot determine the Enigma settings before the Germans reset the Enigma encryption each day. Denniston orders it destroyed and Turing fired, but the other cryptographers threaten to leave if Turing goes. After Clarke plans to leave on the wishes of her parents, Turing proposes marriage, which she accepts. During their reception, Turing confirms his homosexuality to Cairncross, who warns him to keep it secret. After overhearing a conversation with a female clerk about messages she receives, Turing has an epiphany, realising he can program the machine to decode words he already knows exist in certain messages. After he recalibrates the machine, it quickly decodes a message and the cryptographers celebrate. Turing realises they cannot act on every decoded message or the Germans will realise Enigma has been broken. Turing discovers that Cairncross is a Soviet spy. When Turing confronts him, Cairncross argues that the Soviets are allies working for the same goals, and threatens to retaliate by disclosing Turing’s sexuality. When the MI6 agent Stewart Menzies appears to threaten Clarke, Turing reveals that Cairncross is a spy. Menzies reveals he knew this already and planted Cairncross to leak messages to the Soviets for British benefit. Fearing for her safety, Turing tells Clarke to leave Bletchley Park, revealing that he is gay. Heartbroken, Clarke states she always suspected but insists they would have been happy together anyway. After the war, Menzies tells the cryptographers to destroy their work and that they can never see one another again or share what they have done. In the 1950s, Turing is convicted of gross indecency and, in lieu of a jail sentence, undergoes chemical castration so he can continue his work. Clarke visits him in his home and witnesses his physical and mental deterioration. She comforts him by saying that his work saved millions of lives.",
                "The film begins with the sentencing of Jean-Baptiste Grenouille (Ben Whishaw), a notorious murderer. Between the reading of the sentence and the execution, the story of his life is told in flashback, beginning with his abandonment at birth in a French fish market. Raised in an orphanage, Grenouille grows into a strangely detached boy with a superhuman sense of smell. After growing to maturity as a tanner's apprentice, he makes his first delivery to Paris, where he revels in all the new scents. He focuses on a redheaded girl (Karoline Herfurth) selling yellow plums, following her and repeatedly attempting to sniff her, but startles her with his behavior. To prevent her from crying out, he covers the girl's mouth and unintentionally suffocates her. After realizing that she is dead, he strips her body naked and smells her all over, becoming distraught when her scent fades. Afterwards, Grenouille is haunted by the desire to recreate the girl's aroma.After making a delivery to a perfume shop, Grenouille amazes the Italian owner, Giuseppe Baldini (Dustin Hoffman), with his ability to identify and create fragrances. He revitalizes the perfumer's career with new formulas, demanding only that Baldini teach him how to preserve scents. Baldini explains that all perfumes are harmonies of twelve individual scents, and may contain a theoretical thirteenth scent. Grenouille continues working for Baldini but is saddened when he learns that Baldini's method of distillation will not capture the scents of all objects. Baldini informs Grenouille of another method that can be learned in Grasse and agrees to help him by providing the journeyman papers he requires in exchange for 100 new perfume formulas. Right after Grenouille departs, Baldini dies when the shaky building, along with his studio, collapses. En route to Grasse, Grenouille decides to exile himself from society, taking refuge in a cave. During this time, he discovers that he lacks any personal scent himself, and believes this is why he is perceived as strange or disturbing by others. Deciding to continue his quest, he leaves his cave and continues to Grasse.Upon arrival in Grasse, Grenouille catches the scent of Laura Richis (Rachel Hurd-Wood), the beautiful, redheaded daughter of the wealthy Antoine Richis (Alan Rickman) and decides that she will be his 'thirteenth scent', the linchpin of his perfume. Grenouille finds a job in Grasse under Madame Arnulfi (Corinna Harfouch) and learns the method of enfleurage. He kills a young lavender picker and attempts to extract her scent using the method of hot enfleurage, which fails. After this, he attempts the method of cold enfleurage on a prostitute he hired, but she becomes alarmed and tries to throw him out. He murders her and successfully preserves the scent of the woman. Grenouille embarks on a killing spree, targeting beautiful young women and capturing their scents using his perfected method. He dumps the women's naked corpses around the city, creating panic. After preserving the first twelve scents, Grenouille plans his attack on Laura. During a church sermon denouncing and excommunicating the murderer, it is announced that a man has confessed to the murders. Richis remains unconvinced and secretly flees the city with his daughter, telling no one their destination. Grenouille tracks her scent to a roadside inn and sneaks into her room that night, murdering her.Soldiers capture Grenouille moments after he finishes preparing his perfume. On the day of his execution, he applies the perfume on himself, forcing the jailers to release him. The executioner and the crowd in attendance are speechless at the beauty of the perfume; they declare Grenouille innocent before falling into a massive orgy. Richis, still convinced of Grenouille's guilt, threatens him with his sword, but he is then overwhelmed by the scent and embraces Grenouille as his 'son'. Walking out of Grasse unscathed, Grenouille has enough perfume to rule the world, but has discovered that it will not allow him to love or be loved like a normal person. Disenchanted by his aimless quest, he returns to the Parisian fish market where he was born and pours the remaining perfume over his head. Overcome by the scent and in the belief that Grenouille is an angel, the nearby crowd devours him. The next morning, all that is left are his clothes and the empty bottle, from which one final drop of perfume falls.", 
                 "Orphaned, penniless but ambitious and with a mind crammed with imagination and fresh ideas, the American Phineas Taylor Barnum will always be remembered as the man with the gift to effortlessly blur the line between reality and fiction. Thirsty for innovation and hungry for success, the son of a tailor will manage to open a wax museum but will soon shift focus to the unique and peculiar, introducing extraordinary, never-seen-before live acts on the circus stage. Some will call Barnum's wide collection of oddities, a freak show; however, when the obsessed showman gambles everything on the opera singer Jenny Lind to appeal to a high-brow audience, he will somehow lose sight of the most important aspect of his life: his family. Will Barnum risk it all to be accepted?", 
                 "Jack's luck has run out. Captain Salazar has released the most deadly ghost pirates of the sea from the devil's Triangle. Captain Salazar is the oldest villain of Jack Sparrow. The ghost pirates hunt on every single pirate on sea, including Jack Sparrow. The only hope to survive this adventure is to collect the legendary Trident of Poseidon. This weapon is the most powerful weapon and the owner gets control of all seas. Is Jack going to collect this powerful weapon and can he ensure he is not going to get killed by Captain Salazar and his pirate ghosts?",
                 "In 2029 the mutant population has shrunken significantly due to genetically modified plants designed to reduce mutant powers and the X-Men have disbanded. Logan, whose power to self-heal is dwindling, has surrendered himself to alcohol and now earns a living as a chauffeur. He takes care of the ailing old Professor X whom he keeps hidden away. One day, a female stranger asks Logan to drive a girl named Laura to the Canadian border. At first he refuses, but the Professor has been waiting for a long time for her to appear. Laura possesses an extraordinary fighting prowess and is in many ways like Wolverine. She is pursued by sinister figures working for a powerful corporation; this is because they made her, with Logan's DNA. A decrepit Logan is forced to ask himself if he can or even wants to put his remaining powers to good use. It would appear that in the near-future, the times in which they were able put the world to rights with razor sharp claws and telepathic powers are now over.",
                 "An Americam army veteran grieves by the tombstones of his army company that died during World War I. Back home, he raises his sons in a pious setting and asks them to shun weapons. After a naughty fight turns awry, Desmond reads the Bible and vows not to harm another human in his life thereafter. Desmond then saves the life of a worker, experiencing a wholesome satisfaction in the process. In the hospital, he is smitten by a nurse, who he then dates. After the United States enters the Second World War, both sons enlist, adding to the ire of the father who despises his sons joining the Army. The rigorous regimen of training in the Army requires Desmond to clear his firearms training, but after a huge tiff with his seniors, his father, an old corporal, intervenes to save Desmond from being court-martialed and serve with the Army as a medic. They get posted to Hacksaw Ridge, Okinawa. A win there would ensure that the Empire of Japan surrenders to the Allied Forces. What happens thereon?",
                 "In New York, the arrogant Dr. Stephen Strange is a talented neurosurgeon with a huge ego. After a car accident, Dr. Strange damages his fingers and loses control of his hands. The surgeon, Christine Palmer, who was his lover, tries to help him. But Dr. Strange unsuccessfully spends his savings searching for an experimental treatment for his fingers. When Dr. Strange learns that the paraplegic Jonathan Pangborn walked again, he seeks him out and is told he was healed in Kamar-Taj. Dr. Strange travels to Katmandu where he meets the sorcerer Mordo and is introduced to The Ancient One. She discloses the astral plan and other dimensions to him and explains that Earth is protected in the mystical plan by three Sanctums (in New York, London, and Hong Kong). However, her former protégé, Kaecilius, has contacted the powerful demon Dormammu in the Dark Dimension and wants to destroy the three Sanctums with his minions to let the Dark Dimension, where time does not exist and anyone can live forever, to rule the world. Will The Ancient One, Dr. Strange and Mordo save the world?",
                 "In the Gulf of Mexico, 41 miles south-east of the Louisiana Coast, lies the Deepwater Horizon, a semi-submersible offshore oil drilling rig, which is free-floating over the Gulf floor and manned by 126 crew members on board. Among the personnel is Chief Electronics Technician Mike Williams and the seasoned rig supervisor Jimmy Harrell who are surprised to discover that the standard procedure regarding the cement foundation, the only thing between the rig and a blowout, has been bypassed by orders of BP's executives Donald Vidrine and Robert Kaluza. Without a clue about the stability of the well and whether the concrete's integrity has been compromised or not, but above all, with the intention to cut expenses, the greedy managers push to start pumping and before long, disaster strikes. Eventually, as the foundation fails utterly, an endless chain of malfunctions transforms the Deepwater Horizon into a blazing inferno leaving the men defenceless, while Williams and Harrell heroically struggle to rescue their shipmates in the worst oil disaster in the U.S. history that lasted 87 days.",
                 "In the Kingdom of Arendelle, Princess Elsa has the power to create and freeze ice and snow, and her younger sister Anna loves to play with her. When Elsa accidentally hits Anna on the head with her powers and almost kills her, their parents take them to trolls that save Anna's life and make her forget her sister's ability. Elsa returns to the castle and stays reclusively in her room with fear of hurting Anna with her increasing power. Their parents die when their ship sinks into the ocean, and three years later Elsa's coronation forces her to open her castle gates to celebrate with the people. Anna meets Prince Hans at the party and immediately falls in love and decides to marry him. But Elsa doesn't approve, loses control of her powers, and freezes Arendelle. Elsa flees to the mountain and Anna teams up with the peasant Kristoff, his reindeer Sven, and the snowman Olaf to seek out Elsa. They find her in her icy castle and she accidentally hits Anna in the heart; now only true love can save her sister from death.",
                 "Since a motorbike crushed into him, silver spoon William 'Wild Will' Traynor's glamour life as corporate raider turned into a nightmare of near-paralyzed wheelchair agony and bitter solitude. His rich but desperate parents, owners of British town's grand manor and medieval castle, hired perfect male nurse Nathan for all able medical assistance but are up to their fifth audition for a well-paid job as permanent companion. The only eligible candidate this time is airhead local Lou Clark, daughter of an unemployed laborer, who never holds a job more then a few days. This time however she neglects her jealous-made fiance Patrick, the jock local award gym entrepreneur, trying anything to bond with Will, who slowly starts caring for her company enough to live again, even publicly and abroad, and introduce her to society. However his incurable condition, extremely vulnerable to any infection, still dictates his determination to get legal euthanasia in Switzerland after the six months ample consideration he promised his parents.",
                 "Four down-on-their-luck magicians are brought together by an anonymous person who gives them the blueprints to a great illusion. A year later they call themselves the Four Horsemen and the finale of their show is that they will rob a bank. The bank they choose to rob is in France, and they do it. The incident is brought to the attention of the FBI so they assign agent Dylan Rhodes to investigate. Alma Vargas, an Interpol agent comes to help Rhodes which he doesn't like. They turn to Thaddeus Bradley, a former magician, who exposes magic as simple trickery. Bradley shows them how they did it but still not enough to give Rhodes grounds to arrest them. So they follow them and they pull another stunt which makes Rhodes think that they're part of a plot to get back at certain people.",
                 "For all past deaths made as entertainment, and for the insidious modification of Peeta, Katriss deems to strike out on her own to take down President Snow once and for all, but being the Mockingjay, the living symbol of the rebellion now headed by Alma Coin, has its drawbacks. Recognition, for one, and she finds herself saddled with a team of expert warriors (which surprisingly includes the ailing Peeta) aimed to penetrate the Capitol that has barricaded itself behind Hunger-Game-style death traps. As she closes in on carrying out her private agenda through more deaths and mayhem, President Snow himself makes her aware of another threat to peace for Panem equal to himself, leaving her to consider how to truly end the bloodshed.",
                 "Filmmaker Roman Polanski, who as a boy growing up in Poland watched while the Nazis devastated his country during World War II, directed this downbeat drama based on the story of a privileged musician who spent five years struggling against the Nazi occupation of Warsaw. Wladyslaw Szpilman (Adrien Brody) is a gifted classical pianist born to a wealthy Jewish family in Poland. The Szpilmans have a large and comfortable flat in Warsaw which Wladyslaw shares with his mother and father (Maureen Lipman and Frank Finlay), his sisters Halina and Regina (Jessica Kate Meyer and Julia Rayner), and his brother, Henryk (Ed Stoppard). While Wladyslaw and his family are aware of the looming presence of German forces and Hitler's designs on Poland, they're convinced that the Nazis are a menace which will pass, and that England and France will step forward to aid Poland in the event of a real crisis. Wladyslaw's naivete is shattered when a German bomb rips through a radio studio while he performs a recital for broadcast. During the early stages of the Nazi occupation, as a respected artist, he still imagines himself above the danger, using his pull to obtain employment papers for his father and landing a supposedly safe job playing piano in a restaurant. But as the German grip tightens upon Poland, Wladyslaw and his family are selected for deportation to a Nazi concentration camp. Refusing to face a certain death, Wladyslaw goes into hiding in a comfortable apartment provided by a friend. However, when his benefactor goes missing, Wladyslaw is left to fend for himself and he spends the next several years dashing from one abandoned home to another, desperate to avoid capture by German occupation troops",
                 "Some time after the Civil War, a stagecoach hurtles through the wintry Wyoming landscape. Bounty hunter John Ruth and his fugitive captive Daisy Domergue race towards the town of Red Rock, where Ruth will bring Daisy to justice. Along the road, they encounter Major Marquis Warren (an infamous bounty hunter) and Chris Mannix (a man who claims to be Red Rock's new sheriff). Lost in a blizzard, the bunch seeks refuge at Minnie's Haberdashery. When they arrive they are greeted by unfamiliar faces: Bob, who claims to be taking care of the place while Minnie is gone; Oswaldo Mobray, the hangman of Red Rock; Joe Gage, a cow puncher; and confederate general Sanford Smithers. As the storm overtakes the mountainside, the eight travelers come to learn that they might not make it to Red Rock after all...",
                 "A retired 70-year-old widower, Ben (played by Robert De Niro), is bored with retired life. He applies to a be a senior intern at an online fashion retailer and gets the position. The founder of the company is Jules Ostin (Anne Hathaway), a tireless, driven, demanding, dynamic workaholic. Ben is made her intern, but this is a nominal role - she doesn't intend to give him work and it is just window dressing. However, Ben proves to be quite useful and, more than that, a source of support and wisdom."
                ]
    }

In [326]:
df = pd.DataFrame(data=d)

In [327]:
df

Unnamed: 0,Movie title,Rating,Summary
0,Deadpool 2,3.0,After successfully working as the mercenary De...
1,Fantastic beasts and where to find them,4.0,"In 1926, British wizard and 'magizoologist' Ne..."
2,Harry Potter and the goblet of fire,3.0,Harry Potter awakens from a nightmare wherein ...
3,The perks of being a wallflower,4.0,"Set in 1992, the film is set against the backg..."
4,American Animals,2.5,The storytelling frequently jumps between inte...
5,Imitation game,5.0,"In 1951, two policemen, Nock and Staehl, inves..."
6,Perfume the story of a murderer,1.0,The film begins with the sentencing of Jean-Ba...
7,The greatest showman,4.0,"Orphaned, penniless but ambitious and with a m..."
8,Pirates of the Caribbean:Dead men Tell no Tales,1.5,Jack's luck has run out. Captain Salazar has r...
9,Logan,2.5,In 2029 the mutant population has shrunken sig...


Let's create an rdd for this user (Note: Dataframe.rdd does not work for some reason)

In [244]:
movie_titles_list = sc.parallelize([
        'Deadpool 2', 
        'Fantastic beasts and where to find them', 
        'Harry Potter and the goblet of fire',
        'The perks of being a wallflower',
        'American Animals',
        'Imitation game',
        'Perfume the story of a murderer',
        'The greatest showman',
        'Pirates of the Caribbean:Dead men Tell no Tales',
        'Logan',
        'Hacksaw Ridge',
        'Doctor Strange',
        'Deepwater Horizon',
        'Frozen',
        'Me before you',
        'Now you see me',
        'The hunger games:Mockingjay – Part 2',
        'The pianist',
        'The Hateful 8',
        'The intern'], 4)

In [328]:
ratings_list = sc.parallelize([3, 4, 3, 4, 2.5, 5, 1, 4, 1.5, 2.5, 4.5, 4, 3, 4.5, 3.5, 3.5, 4.5, 2, 4, 4], 4)

In [329]:
summaries_list = sc.parallelize(["After successfully working as the mercenary Deadpool for two years, Wade Wilson fails to kill one of his targets on his anniversary with his girlfriend Vanessa. That night, after the pair decides to start a family together, the target tracks them down and kills Vanessa. Wilson kills the man in revenge. He blames himself for her death and attempts to commit suicide six weeks later by blowing himself up. Wilson has a vision of Vanessa in the afterlife, but the pieces of his body remain alive and are put back together by Colossus. Wilson is left with only a Skee-Ball token, an anniversary gift, as a final memento of Vanessa.Recovering at the X-Mansion, Wilson agrees to join the X-Men as a form of healing. He, Colossus, and Negasonic Teenage Warhead respond to a standoff between authorities and the unstable young mutant Russell Collins Firefist at an orphanage, labeled a 'Mutant Reeducation Center'. Wilson realizes that Collins has been abused by the orphanage staff, and kills one of the staff members. Colossus stops him from killing anyone else, and both Wilson and Collins are arrested. Restrained with collars that suppress their powers, they are taken to the 'Ice Box', an isolated prison for mutant criminals. Meanwhile, a cybernetic soldier from the future, Cable, whose family is murdered by an older Collins, travels back in time to kill the boy before Collins ever becomes a killer.Cable breaks into the Ice Box and attacks Collins. Wilson, whose collar has broken in the melee, attempts to defend Collins. After Cable takes Vanessa's token, Wilson forces himself and Cable out of the prison, but not before Collins overhears Wilson deny that he cares for the young mutant. Near death again, Wilson has another vision of Vanessa in which she convinces him to help Collins. Wilson organizes a team called X-Force to break Collins out of a prison-transfer convoy and defend him from Cable. The team launches its assault on the convoy by parachuting from a plane, but all of the members die during the landing except for Wilson and the lucky Domino. While they fight Cable, Collins frees fellow inmate Juggernaut, who agrees to help Collins kill the abusive orphanage headmaster. Juggernaut destroys the convoy, allowing himself and Collins to escape.Cable offers to work with Wilson and Domino to stop Collins' first murder, which will lead to more, and agrees to give Wilson a chance to talk Collins down. At the orphanage they are overpowered by Juggernaut while Collins attacks the headmaster, until Colossus-who had at first refused to help Wilson due to Wilson's murderous ways-arrives to distract Juggernaut. When Wilson fails to talk down Collins, Cable shoots at the young mutant. Wilson leaps in front of the bullet while wearing the Ice Box collar and dies, reuniting with Vanessa in the afterlife. Seeing this sacrifice, Collins does not kill the headmaster; this changes the future so that Cable's family now survives. Cable uses the last charge on his time-traveling device, which he needed for returning to his family, to go back several minutes and strap Vanessa's token in front of Wilson's heart. Now when Wilson takes the bullet for Collins, it is stopped by the token and he survives. Collins still has his change of heart, and afterwards the headmaster is run over by Wilson's taxi-driver friend Dopinder.In a mid-credits sequence, Negasonic Teenage Warhead and her girlfriend Yukio repair Cable's time-traveling device for Wilson. He uses it to save the lives of Vanessa and X-Force member Peter; kill X-Men Origins: Wolverine's version of Deadpool; and kill actor Ryan Reynolds while he is considering starring in the film Green Lantern.",
                "In 1926, British wizard and 'magizoologist' Newt Scamander arrives by ship to New York en route to Arizona. He encounters Mary Lou Barebone, a non-magical woman ('No-Maj' or 'Muggle') who heads the New Salem Philanthropic Society. As Newt listens to her speech about how witches and wizards are real and dangerous, a Niffler escapes from Newt's magically expanded suitcase, which houses various magical creatures. As Newt attempts to capture the Niffler, he meets No-Maj cannery worker and aspiring baker Jacob Kowalski, and they unwittingly swap suitcases. Demoted Auror (hunter of dark wizards) Tina Goldstein arrests Newt for the chaos caused by the Niffler and takes him to the Magical Congress of the United States of America (MACUSA) headquarters, hoping to regain her former position. However, as Jacob's suitcase contains only baked goods, Newt is released. At Jacob's tenement apartment, several creatures escape from Newt's suitcase.After Tina and Newt find Jacob and the suitcase, Tina takes them to her apartment and introduces them to Queenie, her Legilimens sister. Jacob and Queenie are mutually attracted, though American wizards are forbidden to marry or even meet No-Majs. Newt takes Jacob inside his magically expanded suitcase, where Jacob encounters a contained Obscurus, a parasite that develops inside magically gifted children if they suppress their magical abilities. Newt extracted it from a young girl who died, those afflicted rarely living past the age of ten. Newt persuades Jacob to help search for the missing creatures. After re-capturing two of the three escaped beasts, Tina returns the suitcase to MACUSA. Officials arrest them, believing one of Newt's beasts to be responsible for killing Senator Henry Shaw, Jr. Director of Magical Security Percival Graves accuses Newt of conspiring with the infamous dark wizard Gellert Grindelwald, and decides to destroy Newt's suitcase and erase Jacob's recent memories of magic. Newt and Tina are sentenced to immediate death in secret, but Queenie and Jacob rescue them, and they escape after retrieving Newt's suitcase. Following a tip from Tina's old goblin informant Gnarlack, the foursome find and re-capture the last of the escaped creatures.Meanwhile, Graves approaches Mary Lou's adopted son Credence and offers to free him from his abusive mother. In exchange, Graves wants Credence to find an Obscurus, which he believes has caused the mysterious destructive incidents around the city. Credence finds a wand under his adopted sister Modesty's bed. Mary Lou assumes it is Credence's wand, but Modesty says it is hers. When Modesty is about to be punished, the Obscurus kills Mary Lou and her eldest daughter Chastity. Graves arrives, and after Credence leads him to Modesty, whom he assumes is the Obscurus's host, he dismisses Credence as being a Squib and refuses to teach him magic. Credence reveals he is the real host, having lived longer than any other host due to the intensity of his magic. In a fit of rage, Credence transforms and attacks the city.Newt finds Credence hiding in a subway tunnel, but he is attacked by Graves. Tina, who knows Credence, arrives and attempts to calm him, while Graves tries to convince Credence to listen to him. As Credence begins to settle back into human form, Aurors arrive and apparently disintegrate him to protect the magical society; however, a tiny Obscurus fragment escapes. Graves admits to unleashing the Obscurus to expose the magical community to the No-Majs and framing Newt for it, and angrily claims that MACUSA protects the No-Majs more than themselves. As the president orders the aurors to apprehend Graves, he attacks and begins to defeat all of them. After being subdued by one of Newt's beasts, he is revealed as Grindelwald in disguise and is taken into custody.MACUSA fears their secret world has been exposed, but Newt releases his Thunderbird, Frank, to disperse a potion as rainfall over the city that erases all New Yorkers' recent memories as MACUSA wizards repair the destruction. Queenie kisses Jacob goodbye as the rain erases his memories. Newt departs for Europe, but promises to return and visit Tina when his book is finished; he also anonymously leaves Jacob a case of silver Occamy eggshells to fund his bakery. His breads and pastries are subconsciously inspired by Newt's creatures, and Queenie visits him in his shop.",
                "Harry Potter awakens from a nightmare wherein a man named Frank Bryce is killed after overhearing Lord Voldemort conspiring with Peter Pettigrew and another man. While Harry attends the Quidditch World Cup between Ireland and Bulgaria with the Weasleys and Hermione Granger. Death Eaters terrorise the camp, and the man who appeared in Harry's dream summons the Dark Mark.At Hogwarts, Albus Dumbledore introduces ex-Auror Alastor 'Mad-Eye' Moody as the new Defense Against the Dark Arts teacher. He also announces that the school will host the legendary event known as the Triwizard Tournament, in which three magical schools compete across three dangerous challenges. The Goblet of Fire selects 'champions' to take part in the competition: Cedric Diggory of Hufflepuff representing Hogwarts, Viktor Krum representing the Durmstrang Institute from Central Europe, and Fleur Delacour representing Beauxbatons Academy of Magic from France. The Goblet then unexpectedly selects Harry as a fourth champion. Dumbledore is unable to pull the underage Harry out of the tournament, as Ministry official Barty Crouch Sr. insists that the champions are bound by a contract after being selected.For the first task, each champion must retrieve a golden egg guarded by the dragon they pick. Harry succeeds in retrieving the egg, which contains information about the second challenge. Shortly after, a formal dance event known as the Yule Ball takes place; Harry's crush Cho Chang attends with Cedric, and Hermione attends with Viktor, making Ron jealous. During the second task, the champions must dive underwater to rescue their mates. Harry finishes third, but is promoted to second behind Cedric due to his moral fibre, after saving Fleur's sister Gabrielle as well as Ron. Afterwards, Harry discovers the corpse of Crouch Sr. in the forest. Later, while waiting for Dumbledore in his office, Harry discovers a Pensieve, which holds Dumbledore's memories. Harry witnesses a trial in which Igor confesses to the Ministry of Magic names of other Death Eaters, after Voldemort's defeat. When he names Severus Snape as one, Dumbledore vouches for Snape's innocence; Snape turned spy against Voldemort before the latter's downfall. After Karkaroff names Barty Crouch Jr., a devastated Crouch Sr. imprisons his son in Azkaban. Exiting the Pensieve, Harry realizes that Crouch Jr. is the man he saw in his dream.For the final task, the champions must reach the Triwizard Cup, located in a hedge maze. Viktor, under the influence of the Imperius Curse, accidentally incapacitates Fleur. After Harry saves Cedric when the maze attacks him, the two claim a draw and together grab the cup, which turns out to be a Portkey and transports them to a graveyard where Pettigrew and Voldemort are waiting. Pettigrew kills Cedric with the Killing Curse, and performs a ritual that rejuvenates Voldemort, who then summons the Death Eaters. Voldemort releases Harry in order to beat him in a duel and prove he is the better wizard. Unable to defend himself, Harry tries the Expelliarmus charm at the same moment that Voldemort attempts the Killing Curse. The beams from their wands entwine, and Voldemort's wand disgorges the last spells it performed. The spirits of the people he murdered materialize in the graveyard, including Harry's parents and Cedric. This distracts Voldemort and his Death Eaters, allowing Harry to escape with Cedric's body by grabbing the Portkey.Harry tells Dumbledore that Voldemort returned and killed Cedric. Moody takes Harry back to his office to interrogate him about Voldemort, but inadvertently blows his cover by asking Harry whether there were 'others in the graveyard', despite Harry not mentioning a graveyard. Moody reveals that he submitted Harry's name to the Goblet of Fire and manipulated Harry throughout the tournament to ensure he would win. Moody attempts to attack Harry, but Dumbledore, Snape, and Minerva McGonagall intervene and subdue him. The teachers force Moody to drink Veritaserum, a truth-telling potion, and he reveals that the real Moody is imprisoned in a magical trunk. The impostor's Polyjuice Potion wears off, revealing him as Crouch Jr., who is then returned to Azkaban.Dumbledore reveals to the students that Voldemort killed Cedric, although the Ministry of Magic opposes the revelation. Later, Dumbledore visits Harry in his dormitory, apologizing to him for the dangers he endured. Harry reveals that he saw his parents in the graveyard; Dumbledore names this effect as 'Priori Incantatem'. Soon after Hogwarts, Durmstrang, and Beauxbatons bid farewell to each other.",
                "Set in 1992, the film is set against the background of a young student, Charlie (Logan Lerman), who has been suffering from clinical depression from childhood setbacks and has recently been discharged from a mental health care institution to begin his adaptation to a normal lifestyle as a young high school student. Charlie is uneasy about beginning his freshman year of high school; he is shy and finds difficulty in making friends, but he connects with his English teacher, Mr. Anderson (Paul Rudd).When he sits with two seniors, Sam (Emma Watson) and her stepbrother Patrick (Ezra Miller), at a football game, they invite him to join them to several social activities. At a party, Charlie unwittingly eats a cannabis brownie, gets high and discloses to Sam that the year before, his best friend committed suicide. He also walks in on Patrick and Brad (Johnny Simmons), a popular athlete, kissing. Sam realizes that Charlie has no other friends so she and Patrick make a special effort to bring Charlie into their group. Sam needs to improve her SAT scores to be accepted to Pennsylvania State University, so Charlie offers to tutor her. On the way home from the party, when the three hear a song with which they are unfamiliar, Sam instructs Patrick to drive through a tunnel so she can stand up in the back of the pickup while the music blasts.At Christmas, Sam gives Charlie a vintage typewriter to help his aspirations of being a writer. The two discuss relationships, and Charlie reveals he has never been kissed. Sam, though already involved with someone else, tells Charlie she wants his first kiss to be from someone who loves him, and kisses him. Charlie, in love with Sam, begins to try to find ways to show her how he feels.At a regular Rocky Horror Picture Show performance, Charlie is asked to fill in for Sam's boyfriend Craig, who is unavailable. Their friend Mary Elizabeth (Mae Whitman) is impressed and asks Charlie to the Sadie Hawkins dance. The two enter into a desultory relationship. Finally, at a party, when Charlie is dared to kiss the most beautiful girl in the room, he chooses Sam, upsetting both her and Mary Elizabeth. Patrick recommends Charlie stay away from the group for a while, and the isolation causes him to sink back into depression. He experiences flashbacks of his Aunt Helen (Melanie Lynskey), who died in a car accident when he was seven years old.When Brad shows up at school with a black eye having been caught by his father having sex with Patrick, he lies, saying that he was jumped and beaten up. Brad distances himself from Patrick, calling him a faggot. Brad's friends begin beating Patrick, but Charlie forcefully intervenes, then blacks out. He recovers to find he has bruised knuckles and Brad's friends are on the floor, incapacitated. Charlie threatens, Touch my friends again, and I'll blind you, then leaves. Sam and Patrick express their gratitude to Charlie, and the three become friends again.Sam is accepted into Penn State, and breaks up with Craig on prom night after learning he has been cheating on her. The night before she departs, she brings Charlie to her room and asks him 'Why do I and everyone I love pick people who treat us like we are nothing to which he repeats advice he received from Mr. Anderson",
                "The storytelling frequently jumps between interviews with the real people portrayed in the movie and the events themselves performed by actors.In 2003 in Lexington, Kentucky, Spencer Reinhard (Barry Keoghan) is an art student who feels his life has no meaning, that he needs something exciting, even if tragic, to happen in his life to inspire greater artistry. Warren Lipka (Evan Peters) is a rebellious student on an athletic scholarship, though he does not care much for sports and is only pursuing the education to please his family.After Spencer is given a tour of Transylvania University's library's rare book collection, the two friends begin to plan to steal an extremely valuable edition of John James Audubon's The Birds of America and other rare books. Warren travels to Amsterdam to meet some black market buyers who express interest in buying the books. Upon returning to the US, he informs Spencer that they could make millions of dollars, much to their excitement.Realizing that pulling off the heist will require more people, they enlist the help of childhood friends Erik Borsuk (Jared Abrahamson), who helps provide the logistics of the operation, and Chas Allen (Blake Jenner), who will be the getaway driver. They all take time to prepare, learning that the only person guarding the books is the special collections librarian, Betty Jean Gooch (Ann Dowd).On the day of the robbery, they disguise themselves as elderly businessmen, and enter the library. After noticing that there are too many people in the special collections library, they quickly abort the heist and retreat. Three of the conspirators want to end the attempt altogether, but Warren calls the library asking for a private appointment the next day.They decide to drop the elaborate old-age disguises. While Spencer acts as a lookout outside the building, Warren and Eric enter the library dressed as young businessmen. Warren clumsily tases the special collections librarian and makes Eric help tie her up and gag her. They take the rare books and blunder to an exit. In a panic, they drop and have to leave behind the biggest prizes, two enormous Audubon books comprising 'The Birds of America.' All four manage to escape with two of the rarer books.They take the books to Christie's auction house in New York to get the authentication of value that Warren had said the Dutch buyers required. Spencer is told he has to come back sometime the next day and leaves his cell phone number with an auction assistant. In the van outside, Chas berates everyone for their stupidity, and they return to Lexington with the books. Shortly after, Spencer realizes that the police will be able to trace them from emails they used in setting up the heist as well as his cell phone number.The thieves show signs of great stress as they try to lie low: Warren attempts to shoplift from a convenience store; Spencer gets into a car accident; and Eric starts a bar fight. Inevitably, the FBI raid all four of their homes and arrest them. Movie titles show they each serve over 7 years in federal prison.After prison, the real-life robbers express their regret for attempting the heist, noting how much pain they have put their families through. It is also revealed that Warren may have lied about going to Amsterdam, fabricating the story to get the others to agree to the heist. However, this is not confirmed.An epilogue describes their lives after prison. Eric lives in Los Angeles as a writer, and Chas has become a fitness coach in Southern California. Warren has re-enrolled in college and studies filmmaking in Philadelphia. Spencer still lives in Lexington making a living as an artist, specializing in birds. Betty Jean Gooch, the librarian, still works at Transylvania University as a special collections librarian.",
                "In 1951, two policemen, Nock and Staehl, investigate the mathematician Alan Turing after an apparent break-in at his home. During his interrogation by Nock, Turing tells of his time working at Bletchley Park during the Second World War. In 1927, the young Turing is unhappy and bullied at boarding school. He develops a friendship with Christopher Morcom, who sparks his interest in cryptography. Turing develops romantic feelings for him, but Christopher soon dies from tuberculosis. When Britain declares war on Germany in 1939, Turing travels to Bletchley Park. Under the direction of Commander Alastair Denniston, he joins the cryptography team of Hugh Alexander, John Cairncross, Peter Hilton, Keith Furman and Charles Richards. The team are trying to decrypt the Enigma machine, which the Nazis use to send coded messages. Turing is difficult to work with, and considers his colleagues inferior; he works alone to design a machine to decipher Enigma. After Denniston refuses to fund construction of the machine, Turing writes to Prime Minister Winston Churchill, who puts Turing in charge of the team and funds the machine. Turing fires Furman and Richards and places a difficult crossword in newspapers to find replacements. Joan Clarke, a Cambridge graduate, passes Turing’s test but her parents will not allow her to work with the male cryptographers. Turing arranges for her to live and work with the female clerks who intercept the messages, and shares his plans with her. With Clarke's help, Turing warms to the other colleagues, who begin to respect him. Turing’s machine, which he names Christopher, is constructed, but cannot determine the Enigma settings before the Germans reset the Enigma encryption each day. Denniston orders it destroyed and Turing fired, but the other cryptographers threaten to leave if Turing goes. After Clarke plans to leave on the wishes of her parents, Turing proposes marriage, which she accepts. During their reception, Turing confirms his homosexuality to Cairncross, who warns him to keep it secret. After overhearing a conversation with a female clerk about messages she receives, Turing has an epiphany, realising he can program the machine to decode words he already knows exist in certain messages. After he recalibrates the machine, it quickly decodes a message and the cryptographers celebrate. Turing realises they cannot act on every decoded message or the Germans will realise Enigma has been broken. Turing discovers that Cairncross is a Soviet spy. When Turing confronts him, Cairncross argues that the Soviets are allies working for the same goals, and threatens to retaliate by disclosing Turing’s sexuality. When the MI6 agent Stewart Menzies appears to threaten Clarke, Turing reveals that Cairncross is a spy. Menzies reveals he knew this already and planted Cairncross to leak messages to the Soviets for British benefit. Fearing for her safety, Turing tells Clarke to leave Bletchley Park, revealing that he is gay. Heartbroken, Clarke states she always suspected but insists they would have been happy together anyway. After the war, Menzies tells the cryptographers to destroy their work and that they can never see one another again or share what they have done. In the 1950s, Turing is convicted of gross indecency and, in lieu of a jail sentence, undergoes chemical castration so he can continue his work. Clarke visits him in his home and witnesses his physical and mental deterioration. She comforts him by saying that his work saved millions of lives.",
                "The film begins with the sentencing of Jean-Baptiste Grenouille (Ben Whishaw), a notorious murderer. Between the reading of the sentence and the execution, the story of his life is told in flashback, beginning with his abandonment at birth in a French fish market. Raised in an orphanage, Grenouille grows into a strangely detached boy with a superhuman sense of smell. After growing to maturity as a tanner's apprentice, he makes his first delivery to Paris, where he revels in all the new scents. He focuses on a redheaded girl (Karoline Herfurth) selling yellow plums, following her and repeatedly attempting to sniff her, but startles her with his behavior. To prevent her from crying out, he covers the girl's mouth and unintentionally suffocates her. After realizing that she is dead, he strips her body naked and smells her all over, becoming distraught when her scent fades. Afterwards, Grenouille is haunted by the desire to recreate the girl's aroma.After making a delivery to a perfume shop, Grenouille amazes the Italian owner, Giuseppe Baldini (Dustin Hoffman), with his ability to identify and create fragrances. He revitalizes the perfumer's career with new formulas, demanding only that Baldini teach him how to preserve scents. Baldini explains that all perfumes are harmonies of twelve individual scents, and may contain a theoretical thirteenth scent. Grenouille continues working for Baldini but is saddened when he learns that Baldini's method of distillation will not capture the scents of all objects. Baldini informs Grenouille of another method that can be learned in Grasse and agrees to help him by providing the journeyman papers he requires in exchange for 100 new perfume formulas. Right after Grenouille departs, Baldini dies when the shaky building, along with his studio, collapses. En route to Grasse, Grenouille decides to exile himself from society, taking refuge in a cave. During this time, he discovers that he lacks any personal scent himself, and believes this is why he is perceived as strange or disturbing by others. Deciding to continue his quest, he leaves his cave and continues to Grasse.Upon arrival in Grasse, Grenouille catches the scent of Laura Richis (Rachel Hurd-Wood), the beautiful, redheaded daughter of the wealthy Antoine Richis (Alan Rickman) and decides that she will be his 'thirteenth scent', the linchpin of his perfume. Grenouille finds a job in Grasse under Madame Arnulfi (Corinna Harfouch) and learns the method of enfleurage. He kills a young lavender picker and attempts to extract her scent using the method of hot enfleurage, which fails. After this, he attempts the method of cold enfleurage on a prostitute he hired, but she becomes alarmed and tries to throw him out. He murders her and successfully preserves the scent of the woman. Grenouille embarks on a killing spree, targeting beautiful young women and capturing their scents using his perfected method. He dumps the women's naked corpses around the city, creating panic. After preserving the first twelve scents, Grenouille plans his attack on Laura. During a church sermon denouncing and excommunicating the murderer, it is announced that a man has confessed to the murders. Richis remains unconvinced and secretly flees the city with his daughter, telling no one their destination. Grenouille tracks her scent to a roadside inn and sneaks into her room that night, murdering her.Soldiers capture Grenouille moments after he finishes preparing his perfume. On the day of his execution, he applies the perfume on himself, forcing the jailers to release him. The executioner and the crowd in attendance are speechless at the beauty of the perfume; they declare Grenouille innocent before falling into a massive orgy. Richis, still convinced of Grenouille's guilt, threatens him with his sword, but he is then overwhelmed by the scent and embraces Grenouille as his 'son'. Walking out of Grasse unscathed, Grenouille has enough perfume to rule the world, but has discovered that it will not allow him to love or be loved like a normal person. Disenchanted by his aimless quest, he returns to the Parisian fish market where he was born and pours the remaining perfume over his head. Overcome by the scent and in the belief that Grenouille is an angel, the nearby crowd devours him. The next morning, all that is left are his clothes and the empty bottle, from which one final drop of perfume falls.", 
                 "Orphaned, penniless but ambitious and with a mind crammed with imagination and fresh ideas, the American Phineas Taylor Barnum will always be remembered as the man with the gift to effortlessly blur the line between reality and fiction. Thirsty for innovation and hungry for success, the son of a tailor will manage to open a wax museum but will soon shift focus to the unique and peculiar, introducing extraordinary, never-seen-before live acts on the circus stage. Some will call Barnum's wide collection of oddities, a freak show; however, when the obsessed showman gambles everything on the opera singer Jenny Lind to appeal to a high-brow audience, he will somehow lose sight of the most important aspect of his life: his family. Will Barnum risk it all to be accepted?", 
                 "Jack's luck has run out. Captain Salazar has released the most deadly ghost pirates of the sea from the devil's Triangle. Captain Salazar is the oldest villain of Jack Sparrow. The ghost pirates hunt on every single pirate on sea, including Jack Sparrow. The only hope to survive this adventure is to collect the legendary Trident of Poseidon. This weapon is the most powerful weapon and the owner gets control of all seas. Is Jack going to collect this powerful weapon and can he ensure he is not going to get killed by Captain Salazar and his pirate ghosts?",
                 "In 2029 the mutant population has shrunken significantly due to genetically modified plants designed to reduce mutant powers and the X-Men have disbanded. Logan, whose power to self-heal is dwindling, has surrendered himself to alcohol and now earns a living as a chauffeur. He takes care of the ailing old Professor X whom he keeps hidden away. One day, a female stranger asks Logan to drive a girl named Laura to the Canadian border. At first he refuses, but the Professor has been waiting for a long time for her to appear. Laura possesses an extraordinary fighting prowess and is in many ways like Wolverine. She is pursued by sinister figures working for a powerful corporation; this is because they made her, with Logan's DNA. A decrepit Logan is forced to ask himself if he can or even wants to put his remaining powers to good use. It would appear that in the near-future, the times in which they were able put the world to rights with razor sharp claws and telepathic powers are now over.",
                 "An Americam army veteran grieves by the tombstones of his army company that died during World War I. Back home, he raises his sons in a pious setting and asks them to shun weapons. After a naughty fight turns awry, Desmond reads the Bible and vows not to harm another human in his life thereafter. Desmond then saves the life of a worker, experiencing a wholesome satisfaction in the process. In the hospital, he is smitten by a nurse, who he then dates. After the United States enters the Second World War, both sons enlist, adding to the ire of the father who despises his sons joining the Army. The rigorous regimen of training in the Army requires Desmond to clear his firearms training, but after a huge tiff with his seniors, his father, an old corporal, intervenes to save Desmond from being court-martialed and serve with the Army as a medic. They get posted to Hacksaw Ridge, Okinawa. A win there would ensure that the Empire of Japan surrenders to the Allied Forces. What happens thereon?",
                 "In New York, the arrogant Dr. Stephen Strange is a talented neurosurgeon with a huge ego. After a car accident, Dr. Strange damages his fingers and loses control of his hands. The surgeon, Christine Palmer, who was his lover, tries to help him. But Dr. Strange unsuccessfully spends his savings searching for an experimental treatment for his fingers. When Dr. Strange learns that the paraplegic Jonathan Pangborn walked again, he seeks him out and is told he was healed in Kamar-Taj. Dr. Strange travels to Katmandu where he meets the sorcerer Mordo and is introduced to The Ancient One. She discloses the astral plan and other dimensions to him and explains that Earth is protected in the mystical plan by three Sanctums (in New York, London, and Hong Kong). However, her former protégé, Kaecilius, has contacted the powerful demon Dormammu in the Dark Dimension and wants to destroy the three Sanctums with his minions to let the Dark Dimension, where time does not exist and anyone can live forever, to rule the world. Will The Ancient One, Dr. Strange and Mordo save the world?",
                 "In the Gulf of Mexico, 41 miles south-east of the Louisiana Coast, lies the Deepwater Horizon, a semi-submersible offshore oil drilling rig, which is free-floating over the Gulf floor and manned by 126 crew members on board. Among the personnel is Chief Electronics Technician Mike Williams and the seasoned rig supervisor Jimmy Harrell who are surprised to discover that the standard procedure regarding the cement foundation, the only thing between the rig and a blowout, has been bypassed by orders of BP's executives Donald Vidrine and Robert Kaluza. Without a clue about the stability of the well and whether the concrete's integrity has been compromised or not, but above all, with the intention to cut expenses, the greedy managers push to start pumping and before long, disaster strikes. Eventually, as the foundation fails utterly, an endless chain of malfunctions transforms the Deepwater Horizon into a blazing inferno leaving the men defenceless, while Williams and Harrell heroically struggle to rescue their shipmates in the worst oil disaster in the U.S. history that lasted 87 days.",
                 "In the Kingdom of Arendelle, Princess Elsa has the power to create and freeze ice and snow, and her younger sister Anna loves to play with her. When Elsa accidentally hits Anna on the head with her powers and almost kills her, their parents take them to trolls that save Anna's life and make her forget her sister's ability. Elsa returns to the castle and stays reclusively in her room with fear of hurting Anna with her increasing power. Their parents die when their ship sinks into the ocean, and three years later Elsa's coronation forces her to open her castle gates to celebrate with the people. Anna meets Prince Hans at the party and immediately falls in love and decides to marry him. But Elsa doesn't approve, loses control of her powers, and freezes Arendelle. Elsa flees to the mountain and Anna teams up with the peasant Kristoff, his reindeer Sven, and the snowman Olaf to seek out Elsa. They find her in her icy castle and she accidentally hits Anna in the heart; now only true love can save her sister from death.",
                 "Since a motorbike crushed into him, silver spoon William 'Wild Will' Traynor's glamour life as corporate raider turned into a nightmare of near-paralyzed wheelchair agony and bitter solitude. His rich but desperate parents, owners of British town's grand manor and medieval castle, hired perfect male nurse Nathan for all able medical assistance but are up to their fifth audition for a well-paid job as permanent companion. The only eligible candidate this time is airhead local Lou Clark, daughter of an unemployed laborer, who never holds a job more then a few days. This time however she neglects her jealous-made fiance Patrick, the jock local award gym entrepreneur, trying anything to bond with Will, who slowly starts caring for her company enough to live again, even publicly and abroad, and introduce her to society. However his incurable condition, extremely vulnerable to any infection, still dictates his determination to get legal euthanasia in Switzerland after the six months ample consideration he promised his parents.",
                 "Four down-on-their-luck magicians are brought together by an anonymous person who gives them the blueprints to a great illusion. A year later they call themselves the Four Horsemen and the finale of their show is that they will rob a bank. The bank they choose to rob is in France, and they do it. The incident is brought to the attention of the FBI so they assign agent Dylan Rhodes to investigate. Alma Vargas, an Interpol agent comes to help Rhodes which he doesn't like. They turn to Thaddeus Bradley, a former magician, who exposes magic as simple trickery. Bradley shows them how they did it but still not enough to give Rhodes grounds to arrest them. So they follow them and they pull another stunt which makes Rhodes think that they're part of a plot to get back at certain people.",
                 "For all past deaths made as entertainment, and for the insidious modification of Peeta, Katriss deems to strike out on her own to take down President Snow once and for all, but being the Mockingjay, the living symbol of the rebellion now headed by Alma Coin, has its drawbacks. Recognition, for one, and she finds herself saddled with a team of expert warriors (which surprisingly includes the ailing Peeta) aimed to penetrate the Capitol that has barricaded itself behind Hunger-Game-style death traps. As she closes in on carrying out her private agenda through more deaths and mayhem, President Snow himself makes her aware of another threat to peace for Panem equal to himself, leaving her to consider how to truly end the bloodshed.",
                 "Filmmaker Roman Polanski, who as a boy growing up in Poland watched while the Nazis devastated his country during World War II, directed this downbeat drama based on the story of a privileged musician who spent five years struggling against the Nazi occupation of Warsaw. Wladyslaw Szpilman (Adrien Brody) is a gifted classical pianist born to a wealthy Jewish family in Poland. The Szpilmans have a large and comfortable flat in Warsaw which Wladyslaw shares with his mother and father (Maureen Lipman and Frank Finlay), his sisters Halina and Regina (Jessica Kate Meyer and Julia Rayner), and his brother, Henryk (Ed Stoppard). While Wladyslaw and his family are aware of the looming presence of German forces and Hitler's designs on Poland, they're convinced that the Nazis are a menace which will pass, and that England and France will step forward to aid Poland in the event of a real crisis. Wladyslaw's naivete is shattered when a German bomb rips through a radio studio while he performs a recital for broadcast. During the early stages of the Nazi occupation, as a respected artist, he still imagines himself above the danger, using his pull to obtain employment papers for his father and landing a supposedly safe job playing piano in a restaurant. But as the German grip tightens upon Poland, Wladyslaw and his family are selected for deportation to a Nazi concentration camp. Refusing to face a certain death, Wladyslaw goes into hiding in a comfortable apartment provided by a friend. However, when his benefactor goes missing, Wladyslaw is left to fend for himself and he spends the next several years dashing from one abandoned home to another, desperate to avoid capture by German occupation troops",
                 "Some time after the Civil War, a stagecoach hurtles through the wintry Wyoming landscape. Bounty hunter John Ruth and his fugitive captive Daisy Domergue race towards the town of Red Rock, where Ruth will bring Daisy to justice. Along the road, they encounter Major Marquis Warren (an infamous bounty hunter) and Chris Mannix (a man who claims to be Red Rock's new sheriff). Lost in a blizzard, the bunch seeks refuge at Minnie's Haberdashery. When they arrive they are greeted by unfamiliar faces: Bob, who claims to be taking care of the place while Minnie is gone; Oswaldo Mobray, the hangman of Red Rock; Joe Gage, a cow puncher; and confederate general Sanford Smithers. As the storm overtakes the mountainside, the eight travelers come to learn that they might not make it to Red Rock after all...",
                 "A retired 70-year-old widower, Ben (played by Robert De Niro), is bored with retired life. He applies to a be a senior intern at an online fashion retailer and gets the position. The founder of the company is Jules Ostin (Anne Hathaway), a tireless, driven, demanding, dynamic workaholic. Ben is made her intern, but this is a nominal role - she doesn't intend to give him work and it is just window dressing. However, Ben proves to be quite useful and, more than that, a source of support and wisdom."
                ], 4)

In [330]:
user_rdd = movie_titles_list.zip(ratings_list).zip(summaries_list)\
    .map(lambda elem: (elem[0][0], elem[0][1], elem[1]))

In [331]:
user_rdd.take(1)

[('Deadpool 2',
  3,
  "After successfully working as the mercenary Deadpool for two years, Wade Wilson fails to kill one of his targets on his anniversary with his girlfriend Vanessa. That night, after the pair decides to start a family together, the target tracks them down and kills Vanessa. Wilson kills the man in revenge. He blames himself for her death and attempts to commit suicide six weeks later by blowing himself up. Wilson has a vision of Vanessa in the afterlife, but the pieces of his body remain alive and are put back together by Colossus. Wilson is left with only a Skee-Ball token, an anniversary gift, as a final memento of Vanessa.Recovering at the X-Mansion, Wilson agrees to join the X-Men as a form of healing. He, Colossus, and Negasonic Teenage Warhead respond to a standoff between authorities and the unstable young mutant Russell Collins Firefist at an orphanage, labeled a 'Mutant Reeducation Center'. Wilson realizes that Collins has been abused by the orphanage staff

And let's generate the vectorised plot summaries of the new user: (Note: We initially tried to map from the user rdd to a new rdd that call the model.infer_vector function, but pyspark exits with unexpected error)

In [332]:
temp = ["After successfully working as the mercenary Deadpool for two years, Wade Wilson fails to kill one of his targets on his anniversary with his girlfriend Vanessa. That night, after the pair decides to start a family together, the target tracks them down and kills Vanessa. Wilson kills the man in revenge. He blames himself for her death and attempts to commit suicide six weeks later by blowing himself up. Wilson has a vision of Vanessa in the afterlife, but the pieces of his body remain alive and are put back together by Colossus. Wilson is left with only a Skee-Ball token, an anniversary gift, as a final memento of Vanessa.Recovering at the X-Mansion, Wilson agrees to join the X-Men as a form of healing. He, Colossus, and Negasonic Teenage Warhead respond to a standoff between authorities and the unstable young mutant Russell Collins Firefist at an orphanage, labeled a 'Mutant Reeducation Center'. Wilson realizes that Collins has been abused by the orphanage staff, and kills one of the staff members. Colossus stops him from killing anyone else, and both Wilson and Collins are arrested. Restrained with collars that suppress their powers, they are taken to the 'Ice Box', an isolated prison for mutant criminals. Meanwhile, a cybernetic soldier from the future, Cable, whose family is murdered by an older Collins, travels back in time to kill the boy before Collins ever becomes a killer.Cable breaks into the Ice Box and attacks Collins. Wilson, whose collar has broken in the melee, attempts to defend Collins. After Cable takes Vanessa's token, Wilson forces himself and Cable out of the prison, but not before Collins overhears Wilson deny that he cares for the young mutant. Near death again, Wilson has another vision of Vanessa in which she convinces him to help Collins. Wilson organizes a team called X-Force to break Collins out of a prison-transfer convoy and defend him from Cable. The team launches its assault on the convoy by parachuting from a plane, but all of the members die during the landing except for Wilson and the lucky Domino. While they fight Cable, Collins frees fellow inmate Juggernaut, who agrees to help Collins kill the abusive orphanage headmaster. Juggernaut destroys the convoy, allowing himself and Collins to escape.Cable offers to work with Wilson and Domino to stop Collins' first murder, which will lead to more, and agrees to give Wilson a chance to talk Collins down. At the orphanage they are overpowered by Juggernaut while Collins attacks the headmaster, until Colossus-who had at first refused to help Wilson due to Wilson's murderous ways-arrives to distract Juggernaut. When Wilson fails to talk down Collins, Cable shoots at the young mutant. Wilson leaps in front of the bullet while wearing the Ice Box collar and dies, reuniting with Vanessa in the afterlife. Seeing this sacrifice, Collins does not kill the headmaster; this changes the future so that Cable's family now survives. Cable uses the last charge on his time-traveling device, which he needed for returning to his family, to go back several minutes and strap Vanessa's token in front of Wilson's heart. Now when Wilson takes the bullet for Collins, it is stopped by the token and he survives. Collins still has his change of heart, and afterwards the headmaster is run over by Wilson's taxi-driver friend Dopinder.In a mid-credits sequence, Negasonic Teenage Warhead and her girlfriend Yukio repair Cable's time-traveling device for Wilson. He uses it to save the lives of Vanessa and X-Force member Peter; kill X-Men Origins: Wolverine's version of Deadpool; and kill actor Ryan Reynolds while he is considering starring in the film Green Lantern.",
                "In 1926, British wizard and 'magizoologist' Newt Scamander arrives by ship to New York en route to Arizona. He encounters Mary Lou Barebone, a non-magical woman ('No-Maj' or 'Muggle') who heads the New Salem Philanthropic Society. As Newt listens to her speech about how witches and wizards are real and dangerous, a Niffler escapes from Newt's magically expanded suitcase, which houses various magical creatures. As Newt attempts to capture the Niffler, he meets No-Maj cannery worker and aspiring baker Jacob Kowalski, and they unwittingly swap suitcases. Demoted Auror (hunter of dark wizards) Tina Goldstein arrests Newt for the chaos caused by the Niffler and takes him to the Magical Congress of the United States of America (MACUSA) headquarters, hoping to regain her former position. However, as Jacob's suitcase contains only baked goods, Newt is released. At Jacob's tenement apartment, several creatures escape from Newt's suitcase.After Tina and Newt find Jacob and the suitcase, Tina takes them to her apartment and introduces them to Queenie, her Legilimens sister. Jacob and Queenie are mutually attracted, though American wizards are forbidden to marry or even meet No-Majs. Newt takes Jacob inside his magically expanded suitcase, where Jacob encounters a contained Obscurus, a parasite that develops inside magically gifted children if they suppress their magical abilities. Newt extracted it from a young girl who died, those afflicted rarely living past the age of ten. Newt persuades Jacob to help search for the missing creatures. After re-capturing two of the three escaped beasts, Tina returns the suitcase to MACUSA. Officials arrest them, believing one of Newt's beasts to be responsible for killing Senator Henry Shaw, Jr. Director of Magical Security Percival Graves accuses Newt of conspiring with the infamous dark wizard Gellert Grindelwald, and decides to destroy Newt's suitcase and erase Jacob's recent memories of magic. Newt and Tina are sentenced to immediate death in secret, but Queenie and Jacob rescue them, and they escape after retrieving Newt's suitcase. Following a tip from Tina's old goblin informant Gnarlack, the foursome find and re-capture the last of the escaped creatures.Meanwhile, Graves approaches Mary Lou's adopted son Credence and offers to free him from his abusive mother. In exchange, Graves wants Credence to find an Obscurus, which he believes has caused the mysterious destructive incidents around the city. Credence finds a wand under his adopted sister Modesty's bed. Mary Lou assumes it is Credence's wand, but Modesty says it is hers. When Modesty is about to be punished, the Obscurus kills Mary Lou and her eldest daughter Chastity. Graves arrives, and after Credence leads him to Modesty, whom he assumes is the Obscurus's host, he dismisses Credence as being a Squib and refuses to teach him magic. Credence reveals he is the real host, having lived longer than any other host due to the intensity of his magic. In a fit of rage, Credence transforms and attacks the city.Newt finds Credence hiding in a subway tunnel, but he is attacked by Graves. Tina, who knows Credence, arrives and attempts to calm him, while Graves tries to convince Credence to listen to him. As Credence begins to settle back into human form, Aurors arrive and apparently disintegrate him to protect the magical society; however, a tiny Obscurus fragment escapes. Graves admits to unleashing the Obscurus to expose the magical community to the No-Majs and framing Newt for it, and angrily claims that MACUSA protects the No-Majs more than themselves. As the president orders the aurors to apprehend Graves, he attacks and begins to defeat all of them. After being subdued by one of Newt's beasts, he is revealed as Grindelwald in disguise and is taken into custody.MACUSA fears their secret world has been exposed, but Newt releases his Thunderbird, Frank, to disperse a potion as rainfall over the city that erases all New Yorkers' recent memories as MACUSA wizards repair the destruction. Queenie kisses Jacob goodbye as the rain erases his memories. Newt departs for Europe, but promises to return and visit Tina when his book is finished; he also anonymously leaves Jacob a case of silver Occamy eggshells to fund his bakery. His breads and pastries are subconsciously inspired by Newt's creatures, and Queenie visits him in his shop.",
                "Harry Potter awakens from a nightmare wherein a man named Frank Bryce is killed after overhearing Lord Voldemort conspiring with Peter Pettigrew and another man. While Harry attends the Quidditch World Cup between Ireland and Bulgaria with the Weasleys and Hermione Granger. Death Eaters terrorise the camp, and the man who appeared in Harry's dream summons the Dark Mark.At Hogwarts, Albus Dumbledore introduces ex-Auror Alastor 'Mad-Eye' Moody as the new Defense Against the Dark Arts teacher. He also announces that the school will host the legendary event known as the Triwizard Tournament, in which three magical schools compete across three dangerous challenges. The Goblet of Fire selects 'champions' to take part in the competition: Cedric Diggory of Hufflepuff representing Hogwarts, Viktor Krum representing the Durmstrang Institute from Central Europe, and Fleur Delacour representing Beauxbatons Academy of Magic from France. The Goblet then unexpectedly selects Harry as a fourth champion. Dumbledore is unable to pull the underage Harry out of the tournament, as Ministry official Barty Crouch Sr. insists that the champions are bound by a contract after being selected.For the first task, each champion must retrieve a golden egg guarded by the dragon they pick. Harry succeeds in retrieving the egg, which contains information about the second challenge. Shortly after, a formal dance event known as the Yule Ball takes place; Harry's crush Cho Chang attends with Cedric, and Hermione attends with Viktor, making Ron jealous. During the second task, the champions must dive underwater to rescue their mates. Harry finishes third, but is promoted to second behind Cedric due to his moral fibre, after saving Fleur's sister Gabrielle as well as Ron. Afterwards, Harry discovers the corpse of Crouch Sr. in the forest. Later, while waiting for Dumbledore in his office, Harry discovers a Pensieve, which holds Dumbledore's memories. Harry witnesses a trial in which Igor confesses to the Ministry of Magic names of other Death Eaters, after Voldemort's defeat. When he names Severus Snape as one, Dumbledore vouches for Snape's innocence; Snape turned spy against Voldemort before the latter's downfall. After Karkaroff names Barty Crouch Jr., a devastated Crouch Sr. imprisons his son in Azkaban. Exiting the Pensieve, Harry realizes that Crouch Jr. is the man he saw in his dream.For the final task, the champions must reach the Triwizard Cup, located in a hedge maze. Viktor, under the influence of the Imperius Curse, accidentally incapacitates Fleur. After Harry saves Cedric when the maze attacks him, the two claim a draw and together grab the cup, which turns out to be a Portkey and transports them to a graveyard where Pettigrew and Voldemort are waiting. Pettigrew kills Cedric with the Killing Curse, and performs a ritual that rejuvenates Voldemort, who then summons the Death Eaters. Voldemort releases Harry in order to beat him in a duel and prove he is the better wizard. Unable to defend himself, Harry tries the Expelliarmus charm at the same moment that Voldemort attempts the Killing Curse. The beams from their wands entwine, and Voldemort's wand disgorges the last spells it performed. The spirits of the people he murdered materialize in the graveyard, including Harry's parents and Cedric. This distracts Voldemort and his Death Eaters, allowing Harry to escape with Cedric's body by grabbing the Portkey.Harry tells Dumbledore that Voldemort returned and killed Cedric. Moody takes Harry back to his office to interrogate him about Voldemort, but inadvertently blows his cover by asking Harry whether there were 'others in the graveyard', despite Harry not mentioning a graveyard. Moody reveals that he submitted Harry's name to the Goblet of Fire and manipulated Harry throughout the tournament to ensure he would win. Moody attempts to attack Harry, but Dumbledore, Snape, and Minerva McGonagall intervene and subdue him. The teachers force Moody to drink Veritaserum, a truth-telling potion, and he reveals that the real Moody is imprisoned in a magical trunk. The impostor's Polyjuice Potion wears off, revealing him as Crouch Jr., who is then returned to Azkaban.Dumbledore reveals to the students that Voldemort killed Cedric, although the Ministry of Magic opposes the revelation. Later, Dumbledore visits Harry in his dormitory, apologizing to him for the dangers he endured. Harry reveals that he saw his parents in the graveyard; Dumbledore names this effect as 'Priori Incantatem'. Soon after Hogwarts, Durmstrang, and Beauxbatons bid farewell to each other.",
                "Set in 1992, the film is set against the background of a young student, Charlie (Logan Lerman), who has been suffering from clinical depression from childhood setbacks and has recently been discharged from a mental health care institution to begin his adaptation to a normal lifestyle as a young high school student. Charlie is uneasy about beginning his freshman year of high school; he is shy and finds difficulty in making friends, but he connects with his English teacher, Mr. Anderson (Paul Rudd).When he sits with two seniors, Sam (Emma Watson) and her stepbrother Patrick (Ezra Miller), at a football game, they invite him to join them to several social activities. At a party, Charlie unwittingly eats a cannabis brownie, gets high and discloses to Sam that the year before, his best friend committed suicide. He also walks in on Patrick and Brad (Johnny Simmons), a popular athlete, kissing. Sam realizes that Charlie has no other friends so she and Patrick make a special effort to bring Charlie into their group. Sam needs to improve her SAT scores to be accepted to Pennsylvania State University, so Charlie offers to tutor her. On the way home from the party, when the three hear a song with which they are unfamiliar, Sam instructs Patrick to drive through a tunnel so she can stand up in the back of the pickup while the music blasts.At Christmas, Sam gives Charlie a vintage typewriter to help his aspirations of being a writer. The two discuss relationships, and Charlie reveals he has never been kissed. Sam, though already involved with someone else, tells Charlie she wants his first kiss to be from someone who loves him, and kisses him. Charlie, in love with Sam, begins to try to find ways to show her how he feels.At a regular Rocky Horror Picture Show performance, Charlie is asked to fill in for Sam's boyfriend Craig, who is unavailable. Their friend Mary Elizabeth (Mae Whitman) is impressed and asks Charlie to the Sadie Hawkins dance. The two enter into a desultory relationship. Finally, at a party, when Charlie is dared to kiss the most beautiful girl in the room, he chooses Sam, upsetting both her and Mary Elizabeth. Patrick recommends Charlie stay away from the group for a while, and the isolation causes him to sink back into depression. He experiences flashbacks of his Aunt Helen (Melanie Lynskey), who died in a car accident when he was seven years old.When Brad shows up at school with a black eye having been caught by his father having sex with Patrick, he lies, saying that he was jumped and beaten up. Brad distances himself from Patrick, calling him a faggot. Brad's friends begin beating Patrick, but Charlie forcefully intervenes, then blacks out. He recovers to find he has bruised knuckles and Brad's friends are on the floor, incapacitated. Charlie threatens, Touch my friends again, and I'll blind you, then leaves. Sam and Patrick express their gratitude to Charlie, and the three become friends again.Sam is accepted into Penn State, and breaks up with Craig on prom night after learning he has been cheating on her. The night before she departs, she brings Charlie to her room and asks him 'Why do I and everyone I love pick people who treat us like we are nothing to which he repeats advice he received from Mr. Anderson",
                "The storytelling frequently jumps between interviews with the real people portrayed in the movie and the events themselves performed by actors.In 2003 in Lexington, Kentucky, Spencer Reinhard (Barry Keoghan) is an art student who feels his life has no meaning, that he needs something exciting, even if tragic, to happen in his life to inspire greater artistry. Warren Lipka (Evan Peters) is a rebellious student on an athletic scholarship, though he does not care much for sports and is only pursuing the education to please his family.After Spencer is given a tour of Transylvania University's library's rare book collection, the two friends begin to plan to steal an extremely valuable edition of John James Audubon's The Birds of America and other rare books. Warren travels to Amsterdam to meet some black market buyers who express interest in buying the books. Upon returning to the US, he informs Spencer that they could make millions of dollars, much to their excitement.Realizing that pulling off the heist will require more people, they enlist the help of childhood friends Erik Borsuk (Jared Abrahamson), who helps provide the logistics of the operation, and Chas Allen (Blake Jenner), who will be the getaway driver. They all take time to prepare, learning that the only person guarding the books is the special collections librarian, Betty Jean Gooch (Ann Dowd).On the day of the robbery, they disguise themselves as elderly businessmen, and enter the library. After noticing that there are too many people in the special collections library, they quickly abort the heist and retreat. Three of the conspirators want to end the attempt altogether, but Warren calls the library asking for a private appointment the next day.They decide to drop the elaborate old-age disguises. While Spencer acts as a lookout outside the building, Warren and Eric enter the library dressed as young businessmen. Warren clumsily tases the special collections librarian and makes Eric help tie her up and gag her. They take the rare books and blunder to an exit. In a panic, they drop and have to leave behind the biggest prizes, two enormous Audubon books comprising 'The Birds of America.' All four manage to escape with two of the rarer books.They take the books to Christie's auction house in New York to get the authentication of value that Warren had said the Dutch buyers required. Spencer is told he has to come back sometime the next day and leaves his cell phone number with an auction assistant. In the van outside, Chas berates everyone for their stupidity, and they return to Lexington with the books. Shortly after, Spencer realizes that the police will be able to trace them from emails they used in setting up the heist as well as his cell phone number.The thieves show signs of great stress as they try to lie low: Warren attempts to shoplift from a convenience store; Spencer gets into a car accident; and Eric starts a bar fight. Inevitably, the FBI raid all four of their homes and arrest them. Movie titles show they each serve over 7 years in federal prison.After prison, the real-life robbers express their regret for attempting the heist, noting how much pain they have put their families through. It is also revealed that Warren may have lied about going to Amsterdam, fabricating the story to get the others to agree to the heist. However, this is not confirmed.An epilogue describes their lives after prison. Eric lives in Los Angeles as a writer, and Chas has become a fitness coach in Southern California. Warren has re-enrolled in college and studies filmmaking in Philadelphia. Spencer still lives in Lexington making a living as an artist, specializing in birds. Betty Jean Gooch, the librarian, still works at Transylvania University as a special collections librarian.",
                "In 1951, two policemen, Nock and Staehl, investigate the mathematician Alan Turing after an apparent break-in at his home. During his interrogation by Nock, Turing tells of his time working at Bletchley Park during the Second World War. In 1927, the young Turing is unhappy and bullied at boarding school. He develops a friendship with Christopher Morcom, who sparks his interest in cryptography. Turing develops romantic feelings for him, but Christopher soon dies from tuberculosis. When Britain declares war on Germany in 1939, Turing travels to Bletchley Park. Under the direction of Commander Alastair Denniston, he joins the cryptography team of Hugh Alexander, John Cairncross, Peter Hilton, Keith Furman and Charles Richards. The team are trying to decrypt the Enigma machine, which the Nazis use to send coded messages. Turing is difficult to work with, and considers his colleagues inferior; he works alone to design a machine to decipher Enigma. After Denniston refuses to fund construction of the machine, Turing writes to Prime Minister Winston Churchill, who puts Turing in charge of the team and funds the machine. Turing fires Furman and Richards and places a difficult crossword in newspapers to find replacements. Joan Clarke, a Cambridge graduate, passes Turing’s test but her parents will not allow her to work with the male cryptographers. Turing arranges for her to live and work with the female clerks who intercept the messages, and shares his plans with her. With Clarke's help, Turing warms to the other colleagues, who begin to respect him. Turing’s machine, which he names Christopher, is constructed, but cannot determine the Enigma settings before the Germans reset the Enigma encryption each day. Denniston orders it destroyed and Turing fired, but the other cryptographers threaten to leave if Turing goes. After Clarke plans to leave on the wishes of her parents, Turing proposes marriage, which she accepts. During their reception, Turing confirms his homosexuality to Cairncross, who warns him to keep it secret. After overhearing a conversation with a female clerk about messages she receives, Turing has an epiphany, realising he can program the machine to decode words he already knows exist in certain messages. After he recalibrates the machine, it quickly decodes a message and the cryptographers celebrate. Turing realises they cannot act on every decoded message or the Germans will realise Enigma has been broken. Turing discovers that Cairncross is a Soviet spy. When Turing confronts him, Cairncross argues that the Soviets are allies working for the same goals, and threatens to retaliate by disclosing Turing’s sexuality. When the MI6 agent Stewart Menzies appears to threaten Clarke, Turing reveals that Cairncross is a spy. Menzies reveals he knew this already and planted Cairncross to leak messages to the Soviets for British benefit. Fearing for her safety, Turing tells Clarke to leave Bletchley Park, revealing that he is gay. Heartbroken, Clarke states she always suspected but insists they would have been happy together anyway. After the war, Menzies tells the cryptographers to destroy their work and that they can never see one another again or share what they have done. In the 1950s, Turing is convicted of gross indecency and, in lieu of a jail sentence, undergoes chemical castration so he can continue his work. Clarke visits him in his home and witnesses his physical and mental deterioration. She comforts him by saying that his work saved millions of lives.",
                "The film begins with the sentencing of Jean-Baptiste Grenouille (Ben Whishaw), a notorious murderer. Between the reading of the sentence and the execution, the story of his life is told in flashback, beginning with his abandonment at birth in a French fish market. Raised in an orphanage, Grenouille grows into a strangely detached boy with a superhuman sense of smell. After growing to maturity as a tanner's apprentice, he makes his first delivery to Paris, where he revels in all the new scents. He focuses on a redheaded girl (Karoline Herfurth) selling yellow plums, following her and repeatedly attempting to sniff her, but startles her with his behavior. To prevent her from crying out, he covers the girl's mouth and unintentionally suffocates her. After realizing that she is dead, he strips her body naked and smells her all over, becoming distraught when her scent fades. Afterwards, Grenouille is haunted by the desire to recreate the girl's aroma.After making a delivery to a perfume shop, Grenouille amazes the Italian owner, Giuseppe Baldini (Dustin Hoffman), with his ability to identify and create fragrances. He revitalizes the perfumer's career with new formulas, demanding only that Baldini teach him how to preserve scents. Baldini explains that all perfumes are harmonies of twelve individual scents, and may contain a theoretical thirteenth scent. Grenouille continues working for Baldini but is saddened when he learns that Baldini's method of distillation will not capture the scents of all objects. Baldini informs Grenouille of another method that can be learned in Grasse and agrees to help him by providing the journeyman papers he requires in exchange for 100 new perfume formulas. Right after Grenouille departs, Baldini dies when the shaky building, along with his studio, collapses. En route to Grasse, Grenouille decides to exile himself from society, taking refuge in a cave. During this time, he discovers that he lacks any personal scent himself, and believes this is why he is perceived as strange or disturbing by others. Deciding to continue his quest, he leaves his cave and continues to Grasse.Upon arrival in Grasse, Grenouille catches the scent of Laura Richis (Rachel Hurd-Wood), the beautiful, redheaded daughter of the wealthy Antoine Richis (Alan Rickman) and decides that she will be his 'thirteenth scent', the linchpin of his perfume. Grenouille finds a job in Grasse under Madame Arnulfi (Corinna Harfouch) and learns the method of enfleurage. He kills a young lavender picker and attempts to extract her scent using the method of hot enfleurage, which fails. After this, he attempts the method of cold enfleurage on a prostitute he hired, but she becomes alarmed and tries to throw him out. He murders her and successfully preserves the scent of the woman. Grenouille embarks on a killing spree, targeting beautiful young women and capturing their scents using his perfected method. He dumps the women's naked corpses around the city, creating panic. After preserving the first twelve scents, Grenouille plans his attack on Laura. During a church sermon denouncing and excommunicating the murderer, it is announced that a man has confessed to the murders. Richis remains unconvinced and secretly flees the city with his daughter, telling no one their destination. Grenouille tracks her scent to a roadside inn and sneaks into her room that night, murdering her.Soldiers capture Grenouille moments after he finishes preparing his perfume. On the day of his execution, he applies the perfume on himself, forcing the jailers to release him. The executioner and the crowd in attendance are speechless at the beauty of the perfume; they declare Grenouille innocent before falling into a massive orgy. Richis, still convinced of Grenouille's guilt, threatens him with his sword, but he is then overwhelmed by the scent and embraces Grenouille as his 'son'. Walking out of Grasse unscathed, Grenouille has enough perfume to rule the world, but has discovered that it will not allow him to love or be loved like a normal person. Disenchanted by his aimless quest, he returns to the Parisian fish market where he was born and pours the remaining perfume over his head. Overcome by the scent and in the belief that Grenouille is an angel, the nearby crowd devours him. The next morning, all that is left are his clothes and the empty bottle, from which one final drop of perfume falls.", 
                 "Orphaned, penniless but ambitious and with a mind crammed with imagination and fresh ideas, the American Phineas Taylor Barnum will always be remembered as the man with the gift to effortlessly blur the line between reality and fiction. Thirsty for innovation and hungry for success, the son of a tailor will manage to open a wax museum but will soon shift focus to the unique and peculiar, introducing extraordinary, never-seen-before live acts on the circus stage. Some will call Barnum's wide collection of oddities, a freak show; however, when the obsessed showman gambles everything on the opera singer Jenny Lind to appeal to a high-brow audience, he will somehow lose sight of the most important aspect of his life: his family. Will Barnum risk it all to be accepted?", 
                 "Jack's luck has run out. Captain Salazar has released the most deadly ghost pirates of the sea from the devil's Triangle. Captain Salazar is the oldest villain of Jack Sparrow. The ghost pirates hunt on every single pirate on sea, including Jack Sparrow. The only hope to survive this adventure is to collect the legendary Trident of Poseidon. This weapon is the most powerful weapon and the owner gets control of all seas. Is Jack going to collect this powerful weapon and can he ensure he is not going to get killed by Captain Salazar and his pirate ghosts?",
                 "In 2029 the mutant population has shrunken significantly due to genetically modified plants designed to reduce mutant powers and the X-Men have disbanded. Logan, whose power to self-heal is dwindling, has surrendered himself to alcohol and now earns a living as a chauffeur. He takes care of the ailing old Professor X whom he keeps hidden away. One day, a female stranger asks Logan to drive a girl named Laura to the Canadian border. At first he refuses, but the Professor has been waiting for a long time for her to appear. Laura possesses an extraordinary fighting prowess and is in many ways like Wolverine. She is pursued by sinister figures working for a powerful corporation; this is because they made her, with Logan's DNA. A decrepit Logan is forced to ask himself if he can or even wants to put his remaining powers to good use. It would appear that in the near-future, the times in which they were able put the world to rights with razor sharp claws and telepathic powers are now over.",
                 "An Americam army veteran grieves by the tombstones of his army company that died during World War I. Back home, he raises his sons in a pious setting and asks them to shun weapons. After a naughty fight turns awry, Desmond reads the Bible and vows not to harm another human in his life thereafter. Desmond then saves the life of a worker, experiencing a wholesome satisfaction in the process. In the hospital, he is smitten by a nurse, who he then dates. After the United States enters the Second World War, both sons enlist, adding to the ire of the father who despises his sons joining the Army. The rigorous regimen of training in the Army requires Desmond to clear his firearms training, but after a huge tiff with his seniors, his father, an old corporal, intervenes to save Desmond from being court-martialed and serve with the Army as a medic. They get posted to Hacksaw Ridge, Okinawa. A win there would ensure that the Empire of Japan surrenders to the Allied Forces. What happens thereon?",
                 "In New York, the arrogant Dr. Stephen Strange is a talented neurosurgeon with a huge ego. After a car accident, Dr. Strange damages his fingers and loses control of his hands. The surgeon, Christine Palmer, who was his lover, tries to help him. But Dr. Strange unsuccessfully spends his savings searching for an experimental treatment for his fingers. When Dr. Strange learns that the paraplegic Jonathan Pangborn walked again, he seeks him out and is told he was healed in Kamar-Taj. Dr. Strange travels to Katmandu where he meets the sorcerer Mordo and is introduced to The Ancient One. She discloses the astral plan and other dimensions to him and explains that Earth is protected in the mystical plan by three Sanctums (in New York, London, and Hong Kong). However, her former protégé, Kaecilius, has contacted the powerful demon Dormammu in the Dark Dimension and wants to destroy the three Sanctums with his minions to let the Dark Dimension, where time does not exist and anyone can live forever, to rule the world. Will The Ancient One, Dr. Strange and Mordo save the world?",
                 "In the Gulf of Mexico, 41 miles south-east of the Louisiana Coast, lies the Deepwater Horizon, a semi-submersible offshore oil drilling rig, which is free-floating over the Gulf floor and manned by 126 crew members on board. Among the personnel is Chief Electronics Technician Mike Williams and the seasoned rig supervisor Jimmy Harrell who are surprised to discover that the standard procedure regarding the cement foundation, the only thing between the rig and a blowout, has been bypassed by orders of BP's executives Donald Vidrine and Robert Kaluza. Without a clue about the stability of the well and whether the concrete's integrity has been compromised or not, but above all, with the intention to cut expenses, the greedy managers push to start pumping and before long, disaster strikes. Eventually, as the foundation fails utterly, an endless chain of malfunctions transforms the Deepwater Horizon into a blazing inferno leaving the men defenceless, while Williams and Harrell heroically struggle to rescue their shipmates in the worst oil disaster in the U.S. history that lasted 87 days.",
                 "In the Kingdom of Arendelle, Princess Elsa has the power to create and freeze ice and snow, and her younger sister Anna loves to play with her. When Elsa accidentally hits Anna on the head with her powers and almost kills her, their parents take them to trolls that save Anna's life and make her forget her sister's ability. Elsa returns to the castle and stays reclusively in her room with fear of hurting Anna with her increasing power. Their parents die when their ship sinks into the ocean, and three years later Elsa's coronation forces her to open her castle gates to celebrate with the people. Anna meets Prince Hans at the party and immediately falls in love and decides to marry him. But Elsa doesn't approve, loses control of her powers, and freezes Arendelle. Elsa flees to the mountain and Anna teams up with the peasant Kristoff, his reindeer Sven, and the snowman Olaf to seek out Elsa. They find her in her icy castle and she accidentally hits Anna in the heart; now only true love can save her sister from death.",
                 "Since a motorbike crushed into him, silver spoon William 'Wild Will' Traynor's glamour life as corporate raider turned into a nightmare of near-paralyzed wheelchair agony and bitter solitude. His rich but desperate parents, owners of British town's grand manor and medieval castle, hired perfect male nurse Nathan for all able medical assistance but are up to their fifth audition for a well-paid job as permanent companion. The only eligible candidate this time is airhead local Lou Clark, daughter of an unemployed laborer, who never holds a job more then a few days. This time however she neglects her jealous-made fiance Patrick, the jock local award gym entrepreneur, trying anything to bond with Will, who slowly starts caring for her company enough to live again, even publicly and abroad, and introduce her to society. However his incurable condition, extremely vulnerable to any infection, still dictates his determination to get legal euthanasia in Switzerland after the six months ample consideration he promised his parents.",
                 "Four down-on-their-luck magicians are brought together by an anonymous person who gives them the blueprints to a great illusion. A year later they call themselves the Four Horsemen and the finale of their show is that they will rob a bank. The bank they choose to rob is in France, and they do it. The incident is brought to the attention of the FBI so they assign agent Dylan Rhodes to investigate. Alma Vargas, an Interpol agent comes to help Rhodes which he doesn't like. They turn to Thaddeus Bradley, a former magician, who exposes magic as simple trickery. Bradley shows them how they did it but still not enough to give Rhodes grounds to arrest them. So they follow them and they pull another stunt which makes Rhodes think that they're part of a plot to get back at certain people.",
                 "For all past deaths made as entertainment, and for the insidious modification of Peeta, Katriss deems to strike out on her own to take down President Snow once and for all, but being the Mockingjay, the living symbol of the rebellion now headed by Alma Coin, has its drawbacks. Recognition, for one, and she finds herself saddled with a team of expert warriors (which surprisingly includes the ailing Peeta) aimed to penetrate the Capitol that has barricaded itself behind Hunger-Game-style death traps. As she closes in on carrying out her private agenda through more deaths and mayhem, President Snow himself makes her aware of another threat to peace for Panem equal to himself, leaving her to consider how to truly end the bloodshed.",
                 "Filmmaker Roman Polanski, who as a boy growing up in Poland watched while the Nazis devastated his country during World War II, directed this downbeat drama based on the story of a privileged musician who spent five years struggling against the Nazi occupation of Warsaw. Wladyslaw Szpilman (Adrien Brody) is a gifted classical pianist born to a wealthy Jewish family in Poland. The Szpilmans have a large and comfortable flat in Warsaw which Wladyslaw shares with his mother and father (Maureen Lipman and Frank Finlay), his sisters Halina and Regina (Jessica Kate Meyer and Julia Rayner), and his brother, Henryk (Ed Stoppard). While Wladyslaw and his family are aware of the looming presence of German forces and Hitler's designs on Poland, they're convinced that the Nazis are a menace which will pass, and that England and France will step forward to aid Poland in the event of a real crisis. Wladyslaw's naivete is shattered when a German bomb rips through a radio studio while he performs a recital for broadcast. During the early stages of the Nazi occupation, as a respected artist, he still imagines himself above the danger, using his pull to obtain employment papers for his father and landing a supposedly safe job playing piano in a restaurant. But as the German grip tightens upon Poland, Wladyslaw and his family are selected for deportation to a Nazi concentration camp. Refusing to face a certain death, Wladyslaw goes into hiding in a comfortable apartment provided by a friend. However, when his benefactor goes missing, Wladyslaw is left to fend for himself and he spends the next several years dashing from one abandoned home to another, desperate to avoid capture by German occupation troops",
                 "Some time after the Civil War, a stagecoach hurtles through the wintry Wyoming landscape. Bounty hunter John Ruth and his fugitive captive Daisy Domergue race towards the town of Red Rock, where Ruth will bring Daisy to justice. Along the road, they encounter Major Marquis Warren (an infamous bounty hunter) and Chris Mannix (a man who claims to be Red Rock's new sheriff). Lost in a blizzard, the bunch seeks refuge at Minnie's Haberdashery. When they arrive they are greeted by unfamiliar faces: Bob, who claims to be taking care of the place while Minnie is gone; Oswaldo Mobray, the hangman of Red Rock; Joe Gage, a cow puncher; and confederate general Sanford Smithers. As the storm overtakes the mountainside, the eight travelers come to learn that they might not make it to Red Rock after all...",
                 "A retired 70-year-old widower, Ben (played by Robert De Niro), is bored with retired life. He applies to a be a senior intern at an online fashion retailer and gets the position. The founder of the company is Jules Ostin (Anne Hathaway), a tireless, driven, demanding, dynamic workaholic. Ben is made her intern, but this is a nominal role - she doesn't intend to give him work and it is just window dressing. However, Ben proves to be quite useful and, more than that, a source of support and wisdom."
                ]

In [333]:
vectorised_user_summaries = [model.infer_vector(summary) for summary in temp]

In [334]:
vectorised_user_rdd = \
    user_rdd.zipWithIndex().map(lambda elem: (elem[0][0], elem[0][1], elem[0][2], vectorized_summaries[elem[1]])).cache()

In [335]:
vectorised_user_rdd.count()

20

In [336]:
vectorised_user_rdd.take(1)

[('Deadpool 2',
  3,
  "After successfully working as the mercenary Deadpool for two years, Wade Wilson fails to kill one of his targets on his anniversary with his girlfriend Vanessa. That night, after the pair decides to start a family together, the target tracks them down and kills Vanessa. Wilson kills the man in revenge. He blames himself for her death and attempts to commit suicide six weeks later by blowing himself up. Wilson has a vision of Vanessa in the afterlife, but the pieces of his body remain alive and are put back together by Colossus. Wilson is left with only a Skee-Ball token, an anniversary gift, as a final memento of Vanessa.Recovering at the X-Mansion, Wilson agrees to join the X-Men as a form of healing. He, Colossus, and Negasonic Teenage Warhead respond to a standoff between authorities and the unstable young mutant Russell Collins Firefist at an orphanage, labeled a 'Mutant Reeducation Center'. Wilson realizes that Collins has been abused by the orphanage staff

Now we need to group with the ratings assigned as keys and the vectors and counts assigned as values. We no longer need the plot summaries and the movie titles now that we have implemented the vector representation.

In [337]:
key_value_schema = vectorised_user_rdd.map(lambda elem: (elem[1], (elem[3], 1))).sortByKey()

In [338]:
key_value_schema.count()

20

In [339]:
key_value_schema.take(1)

[(1, (array([-0.00147226,  0.0044263 ,  0.02852533, -0.02649999, -0.00198322,
           0.00405797,  0.02093723,  0.00856092,  0.06417085, -0.01114989,
          -0.03373287,  0.01543447,  0.01961495, -0.03229797, -0.02667066,
           0.01115356, -0.02959675,  0.03130432,  0.01620949, -0.00659307,
           0.03190266, -0.04039969,  0.0154823 ,  0.00028935,  0.06419194,
          -0.04052171,  0.02219417, -0.01093993,  0.02026901,  0.02843093,
          -0.00774683,  0.02248913,  0.02035104,  0.00327248,  0.00630219,
          -0.06199979, -0.01279147, -0.00671188,  0.00661001, -0.05016908,
           0.01183395, -0.00273108,  0.01882027, -0.00063093,  0.00774934,
          -0.01318485, -0.04498927,  0.00935748,  0.02960959,  0.02792278,
          -0.03569188,  0.03322608,  0.01784803,  0.00651194,  0.01032926,
          -0.03848039, -0.01366616, -0.064901  , -0.02363634,  0.01932326,
           0.00458974,  0.01756637, -0.0229273 , -0.0441742 , -0.0498121 ,
           0.01990092,

We aggregate based on the value of the rating

In [340]:
summed_rdd = key_value_schema.reduceByKey(lambda a, b: (a[0]+b[0], a[1] + b[1])).sortByKey()

In [341]:
summed_rdd.count()

9

In [342]:
summed_rdd.take(2)

[(1, (array([-0.00147226,  0.0044263 ,  0.02852533, -0.02649999, -0.00198322,
           0.00405797,  0.02093723,  0.00856092,  0.06417085, -0.01114989,
          -0.03373287,  0.01543447,  0.01961495, -0.03229797, -0.02667066,
           0.01115356, -0.02959675,  0.03130432,  0.01620949, -0.00659307,
           0.03190266, -0.04039969,  0.0154823 ,  0.00028935,  0.06419194,
          -0.04052171,  0.02219417, -0.01093993,  0.02026901,  0.02843093,
          -0.00774683,  0.02248913,  0.02035104,  0.00327248,  0.00630219,
          -0.06199979, -0.01279147, -0.00671188,  0.00661001, -0.05016908,
           0.01183395, -0.00273108,  0.01882027, -0.00063093,  0.00774934,
          -0.01318485, -0.04498927,  0.00935748,  0.02960959,  0.02792278,
          -0.03569188,  0.03322608,  0.01784803,  0.00651194,  0.01032926,
          -0.03848039, -0.01366616, -0.064901  , -0.02363634,  0.01932326,
           0.00458974,  0.01756637, -0.0229273 , -0.0441742 , -0.0498121 ,
           0.01990092,

Different ratings should carry a different weight, since for ratings closer to 5 we should take into high account the contnt of the movie and for lower ratings we should almost not take the movie under consideration. We consider a rating value of 1 as an outlier, and we construct the following weight matrix.

In [343]:
summed_rdd = summed_rdd.filter(lambda elem: elem[0] != 1)

In [348]:
weight_of_ratings = {
    '1.5': 0.01, 
    '2': 0.04, 
    '2.5': 0.05, 
    '3': 0.8,
    '3.5': 0.12, 
    '4': 0.18, 
    '4.5': 0.22,
    '5': 0.3}

By using these weights, we wish to average the user's preferences to a mean vector

In [349]:
averaged_rdd = summed_rdd.\
    map(lambda elem: weight_of_ratings[str(elem[0])] * elem[1][0] / float(elem[1][1]))

In [350]:
averaged_rdd.count()

8

It is easy to produce the mean vector from the averaged rdd:

In [351]:
temp = averaged_rdd.collect()

In [352]:
mean_user_vector = sum(temp) / len(temp)

In [353]:
mean_user_vector

array([ 1.87694596e-03, -3.59628175e-04,  2.74065160e-03, -7.84267951e-03,
       -1.12424884e-03, -1.86142127e-03,  6.82537910e-03, -1.26392453e-03,
        1.52220037e-02, -2.59213452e-03, -5.97395888e-03,  2.25893734e-03,
        6.75213570e-03, -6.21612370e-03, -7.85925332e-03,  3.20965727e-03,
       -3.40601453e-03,  2.82227225e-03,  5.57918521e-03, -4.17340314e-03,
        1.28712729e-02, -1.49417389e-02,  6.64670020e-03, -3.35290330e-03,
        2.13420279e-02, -1.53362080e-02,  3.11278901e-03, -1.38028467e-03,
        2.99587171e-03,  2.77310028e-03, -3.36549478e-03,  3.73566081e-03,
       -1.40112708e-03,  7.17792148e-03, -5.43137896e-04, -1.67864710e-02,
       -6.42805686e-03, -4.87069832e-03,  8.06682685e-04, -1.46794282e-02,
        3.70010408e-03, -4.22765687e-03,  1.19579341e-02,  2.10785796e-03,
        8.32073041e-04, -5.68656111e-03, -1.95702370e-02,  8.54341034e-03,
        8.51506647e-03, -2.21567228e-03, -4.07427782e-03,  6.04892662e-03,
        3.12786014e-03,  

### Important note

This weighted schema can only work because the weights assigned to the ratings sum up to 1.

For this reason, we explicitly asked our reviewers to provide us with at least one rating varying in [1.5, 5] with a step of 0.5. As a future notice, and to adapt our model in a more realistic setting, we need to design an algorithm that is able to deal with spparsity in the reviews given by our users. 

As we stated earlier, our metric of similarity is the cosine similarity between two given vectors

In [358]:
from gensim import matutils

def cosine_similarity(doc_vec1, doc_vec2):
    return np.dot(matutils.unitvec(doc_vec1), matutils.unitvec(doc_vec2))

In [373]:
temp1 = np.array([1,2,3])
temp2 = np.array([1,2,3])
temp3 = np.array([0,0,0])

In [374]:
cosine_similarity(temp1, temp2)

1.0

In [375]:
cosine_similarity(temp1, temp3)

0.0

#### Producing a new RDD with the cosine similarities of the user movies attached

We face the same pyspark error when trying to map directly the original rdd to the mapped rdd. 

Instead, we use the following method:

In [404]:
cosine_similarities = [cosine_similarity(doc_vec, mean_user_vector) for doc_vec in vectorized_summaries]

In [399]:
cosine_similarities[20]

0.705326

In [381]:
len(cosine_similarities)

42207

In [400]:
cosine_similarities_rdd = titles_summaries_and_vectors.zipWithIndex().\
    map(lambda elem: (elem[0][0], elem[0][1], cosine_similarities[elem[1]])).cache()

In [401]:
cosine_similarities_rdd.count()

42207

In [402]:
cosine_similarities_rdd.take(5)

[('Baby Boy',
  'A young 20-year-old named Jody  lives with his mother Juanita ,{{amg movie}} in South Central Los Angeles. He spends most of his time with his unemployed best friend P , and does not seem interested in becoming a responsible adult. However, he is forced to mature as a result of an ex-con named Melvin , who moves into their home. Another factor is his children - a son with his girlfriend Yvette  and a daughter with a girl named Peanut, who also lives with her mother. At the beginning of the movie Yvette has an abortion that Jody forced her to have. Yvette constantly asks Jody if he will ever come live with her and their son, but Jody avoids the subject and comes and goes as he pleases. Jody also continues seeing and having sex with other women, including Peanut. This becomes an issue between him and Yvette as well, especially since Yvette and Peanut do not get along. When she discovers his cheating they get in a heated argument which results to Jody slapping Yvette in t

Now we lastly need to re-arrange this RDD by using the cosine similarity as a key, amd sort in a descending order

In [406]:
key_value_cosine_similarity = cosine_similarities_rdd.\
    map(lambda elem: (elem[2], (elem[0], elem[1]))).cache()

In [407]:
key_value_cosine_similarity.count()

42207

In [408]:
key_value_cosine_similarity.take(5)

[(0.82091373,
  ('Baby Boy',
   'A young 20-year-old named Jody  lives with his mother Juanita ,{{amg movie}} in South Central Los Angeles. He spends most of his time with his unemployed best friend P , and does not seem interested in becoming a responsible adult. However, he is forced to mature as a result of an ex-con named Melvin , who moves into their home. Another factor is his children - a son with his girlfriend Yvette  and a daughter with a girl named Peanut, who also lives with her mother. At the beginning of the movie Yvette has an abortion that Jody forced her to have. Yvette constantly asks Jody if he will ever come live with her and their son, but Jody avoids the subject and comes and goes as he pleases. Jody also continues seeing and having sex with other women, including Peanut. This becomes an issue between him and Yvette as well, especially since Yvette and Peanut do not get along. When she discovers his cheating they get in a heated argument which results to Jody slap

Let's sort in a descending order

In [409]:
key_value_cosine_similarity_sorted= key_value_cosine_similarity.sortByKey(ascending=False).cache()

In [411]:
key_value_cosine_similarity_sorted.count()

42207

In [413]:
key_value_cosine_similarity_sorted.take(5)

[(0.96951896,
  ('The Big Clock',
   'The story is told in flashback. When it begins, George Stroud , editor-in-chief of Crimeways magazine, is shown hiding from building security behind the "big clock" ― the largest and most sophisticated one ever built, which dominates the lobby of the giant publishing company where he works, Janoth Publications in New York City. Stroud is eager to spend more time with his wife  and plans a long-postponed vacation from his job. He sticks to those plans despite being fired for it by his tyrannical publishing boss, Earl Janoth . Instead of meeting his wife at the train station as planned, however, Stroud finds himself preoccupied with the attention being shown him by Janoth\'s glamorous mistress, Pauline York , who proposes a blackmail plan against Janoth. When Stroud misses their scheduled train, his wife angrily leaves without him, so he begins drinking and spends the evening out on the town with York. Later that night, Janoth spots a man leaving Yor

Now let's generate the top 5 recommendations for our user

In [419]:
def generate_top_n_recommendations(cos_sim_rdd, topn = 5):
    """
    Produces the top N (default 5) suggestions for a user based on her interests.
    """
    
    top_n_recommendations = cos_sim_rdd.take(topn)
    
    for index, elem in enumerate(top_n_recommendations):
    
        print('---------------------------')
        print(f'Recommendation no.{index+1}:')
        print()
        print(f'Title: {elem[1][0]}')
        print()
        print(f'Summary:\n{elem[1][1]}')

In [420]:
generate_top_n_recommendations(key_value_cosine_similarity_sorted)

---------------------------
Recommendation no.1:

Title: The Big Clock

Summary:
The story is told in flashback. When it begins, George Stroud , editor-in-chief of Crimeways magazine, is shown hiding from building security behind the "big clock" ― the largest and most sophisticated one ever built, which dominates the lobby of the giant publishing company where he works, Janoth Publications in New York City. Stroud is eager to spend more time with his wife  and plans a long-postponed vacation from his job. He sticks to those plans despite being fired for it by his tyrannical publishing boss, Earl Janoth . Instead of meeting his wife at the train station as planned, however, Stroud finds himself preoccupied with the attention being shown him by Janoth's glamorous mistress, Pauline York , who proposes a blackmail plan against Janoth. When Stroud misses their scheduled train, his wife angrily leaves without him, so he begins drinking and spends the evening out on the town with York. Later 

In [421]:
generate_top_n_recommendations(key_value_cosine_similarity_sorted, topn=10)

---------------------------
Recommendation no.1:

Title: The Big Clock

Summary:
The story is told in flashback. When it begins, George Stroud , editor-in-chief of Crimeways magazine, is shown hiding from building security behind the "big clock" ― the largest and most sophisticated one ever built, which dominates the lobby of the giant publishing company where he works, Janoth Publications in New York City. Stroud is eager to spend more time with his wife  and plans a long-postponed vacation from his job. He sticks to those plans despite being fired for it by his tyrannical publishing boss, Earl Janoth . Instead of meeting his wife at the train station as planned, however, Stroud finds himself preoccupied with the attention being shown him by Janoth's glamorous mistress, Pauline York , who proposes a blackmail plan against Janoth. When Stroud misses their scheduled train, his wife angrily leaves without him, so he begins drinking and spends the evening out on the town with York. Later 