### Purpose:

The purpose of this notebook is to integrate the Netflix and IMDB datasets. To do this, we will use title names as the join key between the two. We will look at three different methods for combining the datasets, implement each, and report a final score at the end.

### Getting our data:

In [1]:
# filenames
netflix_movie_titles = "../data/netflix/movie_titles.csv"
imdb_movie_titles = "../data/imdb/title.basics.tsv"
imdb_movie_names = "../data/imdb/name.basics.tsv"

In [2]:
import re
from io import StringIO 
import pandas as pd
import numpy as np

In [3]:
# Get netflix movie csv
def manual_sep(old_split):
    new_split = old_split[0:2] + [",".join(old_split[2:])]
    return new_split
    
ntfx = pd.read_csv(netflix_movie_titles,
                   encoding = "ISO-8859-1",
                   header = None,
                   names = ['Movie_Id', 'Year', 'Name'],
                   on_bad_lines=manual_sep,
                   engine='python')
ntfx.dropna(subset='Year', inplace=True)
ntfx['Year'] = ntfx['Year'].astype("Int64")
print(f'{ntfx.shape = }')
ntfx.head()

ntfx.shape = (17763, 3)


Unnamed: 0,Movie_Id,Year,Name
0,1,2003,Dinosaur Planet
1,2,2004,Isle of Man TT 2004 Review
2,3,1997,Character
3,4,1994,Paula Abdul's Get Up & Dance
4,5,2004,The Rise and Fall of ECW
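
The `manual_sep` callback above is needed because some titles contain commas, so a plain comma split produces more than three fields for those rows. A minimal sketch of what the callback does to such a row (the sample row is made up for illustration and is not taken from the dataset):

```python
def manual_sep(old_split):
    # Keep the first two fields (ID, year) and re-join the rest into one title.
    return old_split[0:2] + [",".join(old_split[2:])]

# Hypothetical over-split row: the title itself contained a comma.
bad_row = ["9999", "2001", "Some Title", " With a Comma"]
fixed_row = manual_sep(bad_row)
print(fixed_row)  # ['9999', '2001', 'Some Title, With a Comma']
```

Rows that already have exactly three fields never reach the callback, since pandas only invokes `on_bad_lines` for rows with too many fields.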


In [4]:
# Get IMDB movie csv
# pattern matching will use regex: "^tt[0-9]*\t(movie|short|tvSeries|tvShort|tvMovie|tvSpecial|tvMiniSeries)\t"
# *do not change this pattern unless you plan on redoing the NLP vector encodings!
stream = StringIO()
header = True # We want the first line always
with open(imdb_movie_titles, "r") as file:
    patrn = "^tt[0-9]*\t(movie|short|tvSeries|tvShort|tvMovie|tvSpecial|tvMiniSeries)\t"
    for line in file:
        if re.search(patrn, line) or header:
            stream.write(line)
            header = False
stream.seek(0)
imdb = pd.read_csv(stream,
                   sep='\t',
                   header=0)
imdb = imdb[imdb.startYear.apply(lambda x: x.isnumeric())].dropna(subset='startYear', inplace=False)
imdb["endYear"] = imdb.endYear.apply(lambda x: x if x.isnumeric() else np.nan)
imdb['endYear'] = imdb['endYear'].astype("Int64")
imdb['startYear'] = imdb['startYear'].astype("Int64")
stream.close()
# `file` is already closed by the `with` block above, so no explicit file.close() is needed

In [5]:
print(f'{imdb.shape = }')
imdb.head()

imdb.shape = (1836898, 9)


Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,tt0000001,short,Carmencita,Carmencita,0,1894,,1,"Documentary,Short"
1,tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892,,5,"Animation,Short"
2,tt0000003,short,Pauvre Pierrot,Pauvre Pierrot,0,1892,,4,"Animation,Comedy,Romance"
3,tt0000004,short,Un bon bock,Un bon bock,0,1892,,12,"Animation,Short"
4,tt0000005,short,Blacksmith Scene,Blacksmith Scene,0,1893,,1,"Comedy,Short"
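
The line-by-line prefilter in cell 4 can be tried on a tiny in-memory sample to see which rows survive. This is only a sketch: the three-column sample below stands in for `title.basics.tsv`, and the `tvEpisode` row is a hypothetical example of a title type the pattern excludes.

```python
import re
from io import StringIO

# Same pattern as in cell 4; "\t" in a normal string is a literal tab character.
patrn = "^tt[0-9]*\t(movie|short|tvSeries|tvShort|tvMovie|tvSpecial|tvMiniSeries)\t"

# Tiny stand-in for title.basics.tsv. The tvEpisode row is hypothetical.
sample = (
    "tconst\ttitleType\tprimaryTitle\n"
    "tt0000001\tshort\tCarmencita\n"
    "tt9999999\ttvEpisode\tSome Episode\n"
)

kept = []
header = True  # always keep the first (header) line
for line in StringIO(sample):
    if header or re.search(patrn, line):
        kept.append(line)
        header = False

print(len(kept))  # 2: the header and the 'short' row
```

Filtering before `pd.read_csv` keeps the excluded rows from ever being parsed into the DataFrame.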


In [6]:
imdb['titleType'].unique()

array(['short', 'movie', 'tvSeries', 'tvShort', 'tvMovie', 'tvMiniSeries',
       'tvSpecial'], dtype=object)

In [7]:
len(imdb)

1836898

### Title comparison using NLP

There are multiple ways to perform NLP on the dataset. The main goal is to vectorize each title and then compute a similarity score to find the closest title. Due to the scope of this project, we will use a pretrained model to encode the titles and compare the encodings with cosine similarity. Bert-as-service is a sentence encoding service that maps a variable-length "sentence" (here, a movie title) to a fixed-length vector. Doing so allows us to compare the titles of one dataset to the titles of the other based on a normalized score.

To run this, Python 3.6 and TensorFlow 1.10 are needed.

Follow this guide to set up the service (https://bert-as-service.readthedocs.io/en/latest/section/get-start.html#installation) and this guide for a worked example (https://bert-as-service.readthedocs.io/en/latest/tutorial/simple-search.html).

We are using Google's multilingual cased BERT model to vectorize titles (https://github.com/google-research/bert#sentence-and-sentence-pair-classification-tasks), which is a 12-layer network with a hidden size of 768, 12 attention heads, and 110M parameters. Scoring is based on cosine similarity and is weighted by the difference between the years of each candidate pair, with absolute differences greater than 4 years being heavily penalized.
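
To make the year weighting concrete, here is a small pure-Python sketch of the score. It mirrors the weight formula used in the scoring cell further down (`(1/256)*year_diff**4 + 1`); the helper names and the toy 3-dimensional vectors are illustrative assumptions, standing in for the real 768-dimensional encodings.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def year_weight(year_a, year_b):
    """Penalty divisor: 1 for equal years, 2 at a 4-year gap, growing quickly after."""
    diff = abs(year_a - year_b)
    return diff ** 4 / 256 + 1

def weighted_score(vec_a, year_a, vec_b, year_b):
    """Cosine similarity divided by the year penalty."""
    return cosine(vec_a, vec_b) / year_weight(year_a, year_b)

# Toy vector standing in for a title encoding.
v = [1.0, 2.0, 3.0]
print(year_weight(2003, 2003))  # 1.0
print(year_weight(2003, 1999))  # 2.0
print(year_weight(2003, 1995))  # 17.0
```

With this weight, even a perfect title match is halved at a 4-year gap, which puts it well below the 0.968 threshold used later.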

The model server is started with the following options:

`bert-serving-start -num_worker=1 -max_seq_len=NONE -gpu_memory_fraction=0.75 -model_dir=notebooks/model/multi_cased_L-12_H-768_A-12`

and the following model parameters:
<code>&nbsp;
                             ARG   VALUE
            -------------------------------------------------
                       ckpt_name = bert_model.ckpt
                     config_name = bert_config.json
                            cors = *
                             cpu = False
                      device_map = []
                   do_lower_case = True
              fixed_embed_length = False
                            fp16 = False
             gpu_memory_fraction = 0.75
                   graph_tmp_dir = None
                http_max_connect = 10
                       http_port = None
                    mask_cls_sep = False
                  max_batch_size = 256
                     max_seq_len = None
                       model_dir = notebooks/model/multi_cased_L-12_H-768_A-12
            no_position_embeddings = False
                no_special_token = False
                      num_worker = 1
                   pooling_layer = [-2]
                pooling_strategy = REDUCE_MEAN
                            port = 5555
                        port_out = 5556
                   prefetch_size = 10
             priority_batch_size = 16
            show_tokens_to_client = False
                 tuned_model_dir = None
                         verbose = False
                             xla = False
                 
</code>                 

In [8]:
from bert_serving.client import BertClient
from sklearn.metrics.pairwise import cosine_similarity
import warnings

#### Preprocessing

In [32]:
# Create tuples, lower case everything
imdb_titles = list(zip(imdb['tconst'].values, # unique IMDB id
                       imdb['primaryTitle'].str.lower().values, # lower case primary title
                       imdb['originalTitle'].str.lower().values, # lower case original title
                       imdb['startYear'].values, # year created/started
                       imdb['endYear'].values)) # year ended
ntfx_titles = list(zip(ntfx['Movie_Id'], # unique netflix id
                       [x.replace(':', '') for x in ntfx['Name'].str.lower().values], # lower case name of movie
                       ntfx['Year'].values)) # year created

#### Model encoding

In [47]:
# NETFLIX TITLE ENCODINGS
# Run the encoding below only if you don't have doc_vecs saved.
# The service tokenizes the title names and embeds them with the model. This runs much faster than
# the ptitle and otitle encodings - about 10 mins
warnings.filterwarnings('ignore')
bc = BertClient(port=5555, port_out=5556)
doc_vecs = bc.encode([x.replace(':', '') for x in ntfx['Name'].str.lower().values])
warnings.filterwarnings('default')
# --------------------------------------------------------------------------------------------
# Alternatively, load the Netflix title encodings from disk if available.
# This file is 54 MB
# doc_vecs = np.load('./doc_vecs.npy')

In [12]:
# IMDB TITLE ENCODINGS
# Uncomment and run only if you don't have the title encodings. Warning: this takes a LONG time - about 24 hours each
# ptitle_vecs = bc.encode(list(imdb['primaryTitle'].str.lower().values))
# otitle_vecs = bc.encode(list(imdb['originalTitle'].str.lower().values))
# warnings.filterwarnings('default')
# --------------------------------------------------------------------------------------------
# Load title encodings if you have the file. These are .npy files
# Current vectors are created using the imdb regex search:
# "^tt[0-9]*\t(movie|short|tvSeries|tvShort|tvMovie|tvSpecial|tvMiniSeries)\t"
# Each of these files is 5.6 GB:
ptitle_vecs = np.load('./ptitle_vecs.npy')
otitle_vecs = np.load('./otitle_vecs.npy')

In [36]:
print(len(doc_vecs))
print(len(otitle_vecs))
print(len(ptitle_vecs))

17763
1836898
1836898


In [53]:
# Find cosine similarity with movie titles. Takes ~72 hours to compute.
n = 0
missed = 0
min_ntfx = min(ntfx['Year'].values)
ntfx_dict = {}
imdb_dict = {}
# Set up netflix title dictionary
for i in ntfx['Movie_Id']:
    if i not in ntfx_dict:
        ntfx_dict[i] = None
for i, (ntfx_id, title, year) in enumerate(ntfx_titles):
    if i % 1000 == 0:
        print(".", end='')
    ntfx_vec = doc_vecs[i]
    # Year diff
    year_diff = abs(year-imdb['startYear'].values)
    weight = (1/256)*year_diff**4 + 1 # Highly punishes year difference > 4
    # Compute cosine similarity as score
    pscore = cosine_similarity(
        [ntfx_vec],
        ptitle_vecs
    )[0]
    oscore = cosine_similarity(
        [ntfx_vec],
        otitle_vecs
    )[0]
    # Get the maximum of two titles and weight them by year difference
    scores = np.maximum(pscore, oscore)/weight
    topk_idx = np.argsort(scores)[::-1][:5]
    # Only report scores greater than a threshold
    thresh = 0.968
    if scores[topk_idx[0]] > thresh:
        idx = topk_idx[0]
        score = scores[idx]
        if imdb_titles[idx][0] not in imdb_dict:
            imdb_dict[imdb_titles[idx][0]] = True
        elif scores[topk_idx[1]] > thresh and imdb_titles[topk_idx[1]][0] not in imdb_dict:
            idx = topk_idx[1]
            score = scores[idx]
            imdb_dict[imdb_titles[idx][0]] = True
        else:
            continue
        if ntfx_dict[ntfx_id] is None:
            ntfx_dict[ntfx_id] = [score, title, year, imdb_titles[idx]]
        elif ntfx_dict[ntfx_id][0] < score:
            ntfx_dict[ntfx_id] = [score, title, year, imdb_titles[idx]]
        else:
            imdb_dict.pop(imdb_titles[idx][0])
            missed += 1
    else:
        missed += 1
    n+=1
print(f'Total missed matches: {missed}')

..................Total missed matches: 6516
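
The matching rule inside the loop above can be easier to follow in isolation: take the best-scoring IMDB candidate if it clears the threshold and has not been claimed by an earlier Netflix title, otherwise fall back to the runner-up, otherwise give up. A simplified sketch of just that rule (the function name, IDs, and scores are hypothetical; as in the loop, only the top two candidates are considered):

```python
def pick_match(ranked, claimed, thresh=0.968):
    """Choose an IMDB candidate for one Netflix title.

    ranked:  list of (imdb_id, score) pairs sorted by score, best first.
    claimed: set of IMDB IDs already matched to an earlier Netflix title.
    Returns the chosen (imdb_id, score), or None if no candidate qualifies.
    """
    if not ranked or ranked[0][1] <= thresh:
        return None  # the best candidate does not clear the threshold
    for imdb_id, score in ranked[:2]:  # best candidate, then the runner-up
        if score > thresh and imdb_id not in claimed:
            claimed.add(imdb_id)
            return imdb_id, score
    return None

claimed = set()
first = pick_match([("tt_a", 0.99), ("tt_b", 0.97)], claimed)
second = pick_match([("tt_a", 0.99), ("tt_b", 0.97)], claimed)
third = pick_match([("tt_a", 0.99), ("tt_b", 0.95)], claimed)
print(first, second, third)  # ('tt_a', 0.99) ('tt_b', 0.97) None
```

Because claims are made in iteration order, an earlier Netflix title can take an IMDB entry that a later title would have matched with a higher score.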


In [54]:
ntfx_dict

{1: [1.000000238418579,
  'dinosaur planet',
  2003,
  ('tt0389605', 'dinosaur planet', 'dinosaur planet', 2003, <NA>)],
 2: None,
 3: [1.0000001192092896,
  'character',
  1997,
  ('tt0119448', 'character', 'karakter', 1997, <NA>)],
 4: None,
 5: None,
 6: [0.999999463558197,
  'sick',
  1997,
  ('tt0120126',
   'sick',
   'sick: the life & death of bob flanagan, supermasochist',
   1997,
   <NA>)],
 7: [1.0,
  '8 man',
  1992,
  ('tt0182668',
   '8 man',
   'eitoman - subete no sabishii yoru no tame ni',
   1992,
   <NA>)],
 8: [0.96934574842453,
  'what the #$*! do we know!?',
  2004,
  ('tt0399877',
   'what the #$*! do we (k)now!?',
   'what the #$*! do we (k)now!?',
   2004,
   <NA>)],
 9: [0.9691489934921265,
  "class of nuke 'em high 2",
  1991,
  ('tt0101591',
   "class of nuke 'em high part ii: subhumanoid meltdown",
   "class of nuke 'em high part ii: subhumanoid meltdown",
   1991,
   <NA>)],
 10: [0.9961088306709023,
  'fighter',
  2001,
  ('tt0263338', 'fighter', 'fighter

In [80]:
# JOIN IMDB DATA TO NETFLIX DATA
# Create a column linking data pairs
ntfx_idx = []
for movie_id in ntfx["Movie_Id"]:
    imdb_info = ntfx_dict.get(movie_id)
    if imdb_info is None:
        ntfx_idx += [imdb_info]
    else:
        ntfx_idx += [imdb_info[3][0]]
# Add column to dataframe
ntfx["idx"] = ntfx_idx
# Keep only the rows that have a match (idx is not None)
ntfx_cln = ntfx[ntfx["idx"].notnull()]
# Left Join based on idx|tconst
ntfx_cln.reset_index(drop=True, inplace=True)
ntfx_cln = ntfx_cln.merge(right=imdb, left_on='idx', right_on='tconst', how='left')
ntfx_cln

Unnamed: 0,Movie_Id,Year,Name,idx,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres
0,1,2003,Dinosaur Planet,tt0389605,tt0389605,tvMiniSeries,Dinosaur Planet,Dinosaur Planet,0,2003,,48,"Animation,Documentary,Family"
1,3,1997,Character,tt0119448,tt0119448,movie,Character,Karakter,0,1997,,122,"Crime,Drama,Mystery"
2,6,1997,Sick,tt0120126,tt0120126,movie,Sick,"Sick: The Life & Death of Bob Flanagan, Superm...",0,1997,,90,Documentary
3,7,1992,8 Man,tt0182668,tt0182668,movie,8 Man,Eitoman - Subete no sabishii yoru no tame ni,0,1992,,83,"Action,Sci-Fi"
4,8,2004,What the #$*! Do We Know!?,tt0399877,tt0399877,movie,What the #$*! Do We (K)now!?,What the #$*! Do We (K)now!?,0,2004,,109,"Comedy,Documentary,Drama"
...,...,...,...,...,...,...,...,...,...,...,...,...,...
11177,17763,1978,Interiors,tt0077742,tt0077742,movie,Interiors,Interiors,0,1978,,92,Drama
11178,17764,1998,Shakespeare in Love,tt0138097,tt0138097,movie,Shakespeare in Love,Shakespeare in Love,0,1998,,123,"Comedy,Drama,History"
11179,17768,2000,Epoch,tt0233657,tt0233657,tvMovie,Epoch,Epoch,0,2001,,96,"Sci-Fi,Thriller"
11180,17769,2003,The Company,tt0335013,tt0335013,movie,The Company,The Company,0,2003,,112,"Drama,Music,Romance"


In [66]:
warnings.filterwarnings('default')

#### Resources
https://bert-as-service.readthedocs.io/en/latest/index.html



### Comparing based on difflib ratios

In [163]:
import difflib
import numpy as np

In [212]:
sm1 = difflib.SequenceMatcher()
sm2 = difflib.SequenceMatcher()
rat = 0.0
chosen_titles = {}
used = {}
n = 0 # For testing
print('(ntitle|ititle|nyear|iyear|ratio)') # For testing
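# NOTE: this loop unpacks (title, year) and (primaryTitle, originalTitle, startYear, endYear)
# tuples, i.e. without the leading ID stored in the tuples built in the Preprocessing cell.
for ntitle, nyear in ntfx_titles: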
    best_rat = 0.0
    best_title = None
    best_year = None
    for ititle1, ititle2, iyear, iyear_end in imdb_titles:
        if (pd.isna(iyear_end)):
            test = abs(iyear - nyear) > 1
        else:
            test = not (iyear-1 <= nyear <= iyear_end+1)
        if used.get(ititle1, False) or test:
            continue
        sm1.set_seqs(ntitle, ititle1)
        sm2.set_seqs(ntitle, ititle2)
        rat = max(sm1.ratio(), sm2.ratio())
        if rat > 0.5 and rat > best_rat:
            best_rat = rat
            best_title = ititle1
            best_year = iyear
    print(f'({ntitle}|{best_title}|{nyear}|{best_year}|{best_rat})') # For testing
    chosen_titles[ntitle] = (best_title, best_year, nyear) # store the matched year, not the last year iterated
    used[best_title] = True
    n+=1 # For testing
    if (n == 1000): # For testing
        break # For testing
    

(ntitle|ititle|nyear|iyear|ratio)
(Dinosaur Planet|Dinosaur Planet|2003|2003|1.0)
(Isle of Man TT 2004 Review|None|2004|None|0.0)
(Character|Character|1997|1997|1.0)
(Paula Abdul's Get Up & Dance|Get Up and Dance!|1994|1994|0.5777777777777777)
(The Rise and Fall of ECW|The Rise & Fall of ECW|2004|2004|0.9130434782608695)
(Sick|Sick|1997|1997|1.0)
(8 Man|8 Man|1992|1992|1.0)
(What the #$*! Do We Know!?|What the #$*! Do We (K)now!?|2004|2004|0.9629629629629629)
(Class of Nuke 'Em High 2|Class of Nuke 'Em High Part II: Subhumanoid Meltdown|1991|1991|0.6052631578947368)
(Fighter|Fighter|2001|2000|1.0)
(Full Frame: Documentary Shorts|Untitled Al Gore Documentary|1999|2000|0.5862068965517241)
(My Favorite Brunette|My Favorite Brunette|1947|1947|1.0)
(Lord of the Rings: The Return of the King: Extended Edition: Bonus Material|The Lord of the Rings: The Return of the King - Special Extended Edition Scenes|2003|2004|0.7922077922077922)
(Nature: Antarctica|Antarctica|1982|1983|0.7142857142857143

(Cat and the Canary|Fat and the Canary|1927|1927|0.9444444444444444)
(Naked Lies|Naked Lies|1998|1998|1.0)
(Star Trek: Voyager: Season 1|Star Trek: Voyager|1995|1995|0.782608695652174)
(Allergies: A Natural Approach|All Natural 4|2001|2000|0.5714285714285714)
(Lost in the Wild|Christy in the Wild|1993|1993|0.8)
(Goddess of Mercy|Goddess of Mercy|2004|2003|1.0)
(The Tricky Master|The Tricky Master|2000|1999|1.0)
(The Game|The Game|1997|1997|1.0)
(Deepak Chopra: The Way of the Wizard & Alchemy|The Way of the Birds|2000|2000|0.5454545454545454)
(Get Out Your Handkerchiefs|Get Out Your Handkerchiefs|1978|1978|1.0)
(Cannibal Women in the Avocado Jungle of Death|Cannibal Women in the Avocado Jungle of Death|1988|1989|1.0)
(Where Sleeping Dogs Lie|Where Sleeping Dogs Lie|1992|1991|1.0)
(Sweet November|Sweet November|2001|2001|1.0)
(The Edward R. Murrow Collection|The Homegrown Collection|2005|2006|0.6545454545454545)
(Firetrap|Firetrap|2001|2001|1.0)
(Sleepover Nightmare|Sleepover Nightmare|2

(Saudade Do Futuro|Saudade Do Futuro|2000|2000|1.0)
(Touched by an Angel: Season 1|Touched by an Angel|1994|1994|0.7916666666666666)
(The Final Countdown|The Final Countdown|1980|1980|1.0)
(Parenthood|Parenthood|1989|1989|1.0)
(Sex and the City: Season 4|Sex and the City|2001|1998|0.7619047619047619)
(Saludos Amigos|Saludos Amigos|1943|1942|1.0)
(Female Yakuza Tale|Female Yakuza Tale|1973|1973|1.0)
(Taxi|Taxi|2004|2004|1.0)
(Crazy as Hell|Crazy as Hell|2002|2002|1.0)
(Evelyn|Evelyn|2002|2002|1.0)
(Cold Harvest|Cold Harvest|1998|1999|1.0)
(Enigma: MCMXC A.D|Excitant Eye Shot|2003|2002|0.5217391304347826)
(100 Days Before the Command|100 Days Before the Command|1990|1991|1.0)
(Micki and Maude|Micki + Maude|1984|1984|0.8571428571428571)
(Sarah Brightman: In Concert|Sarah Brightman in Concert|1998|1998|0.9433962264150944)
(The Legend|The Legend|1993|1993|1.0)
(Blue Seed: Beyond|Quick Step Beyond|2003|2002|0.6470588235294118)
(If These Walls Could Talk|If These Walls Could Talk|1996|1996|1.

(Rio Lobo|Rio Lobo|1970|1970|1.0)
(Halloween 5: The Revenge of Michael Myers|Halloween 5: The Revenge of Michael Myers|1989|1989|1.0)
(Pan Tadeusz|Pan Tadeusz: The Last Foray in Lithuania|1999|1999|1.0)
(Substitute 4: Failure is Not an Option|The Substitute: Failure Is Not an Option|2000|2001|0.8974358974358975)
(The Shaft|The Shaft|2001|2001|1.0)
(Wings of Desire|Wings of Desire|1987|1987|1.0)
(Hostage|Hostage|2005|2005|1.0)
(The Abductors|The Abductors|1972|1972|1.0)
(Nightbreed|Nightbreed|1990|1990|1.0)
(Godzilla vs. The Sea Monster|Gamera: The Giant Monster|1966|1965|0.6037735849056604)
(Frank Lloyd Wright|Frank Lloyd Wright|1998|1998|1.0)
(Vendetta|Vendetta|1999|1999|1.0)
(Jay Jay the Jet Plane: Adventures in Learning|Jay Jay the Jet Plane|1999|1998|0.6363636363636364)
(Igby Goes Down|Igby Goes Down|2002|2002|1.0)
(Girl|Girl|1999|1998|1.0)
(Reign in Darkness|Reign in Darkness|2002|2002|1.0)
(Elephant|Elephant|2003|2003|1.0)
(Transformers: Season 3: Part 1|Transformers: Five Faces 

(Summer of the Monkeys|Summer of the Monkeys|1998|1998|1.0)
(Return to Horror High|Return to Horror High|1987|1987|1.0)
(You're Invited to Mary-Kate and Ashley's Vacation Parties|You're Invited to Mary-Kate and Ashley's Mall Party|1996|1997|0.8703703703703703)
(Young Einstein|Young Einstein|1988|1988|1.0)
(Drop Dead Fred|Drop Dead Fred|1991|1991|1.0)
(With a Friend Like Harry|With a Friend Like Harry...|2000|2000|0.9411764705882353)
(The Alamo|The Alamo|1960|1960|1.0)
(Sol Goode|Stolen Good|2001|2002|0.8)
(Here is Greenwood|Here Is Greenwood|1991|1991|0.9411764705882353)
(A Crime of Passion|A Crime of Passion|2003|2003|1.0)
(Rumpole of the Bailey: Series 4|Rumpole of the Bailey|1987|1978|0.8076923076923077)
(Ghosts of the Abyss: Bonus Material|Ghosts of the Abyss|2003|2003|0.7037037037037037)
(King Cobra|King Cobra|1998|1999|1.0)
(The Blackout|The Blackout|1997|1997|1.0)
(Roughnecks: The Starship Troopers Chronicles: The Pluto Campaign|Roughnecks: The Starship Troopers Chronicles|1999|

(Veronica 2030|Veronica|2004|2004|0.7619047619047619)
(Highlander: Season 5|Highlander|1996|1992|0.6666666666666666)
(Robin Hood: Prince of Thieves|Robin Hood: Prince of Thieves|1991|1991|1.0)
(The Last House on the Left|The Last House on the Left|1972|1972|1.0)
(Saving Grace|Saving Grace|2000|2000|1.0)
(Who is Cletis Tout?|Who Is Cletis Tout?|2002|2001|0.9473684210526315)
(Rob Roy|Rob Roy|1995|1995|1.0)
(La Femme Nikita: Season 3|La Femme Nikita|1999|1997|0.75)
(Office Killer|Office Killer|1997|1997|1.0)
(Lipstick|Lipstick|1976|1976|1.0)
(The Man Who Came to Dinner|The Man Who Came to Dinner|2000|2000|1.0)
(Gilligan's Island: Season 2|Gilligan's Island|1965|1964|0.7727272727272727)
(Saturday Night Live: The Best of Will Ferrell 2|Saturday Night Live: The Best of Will Ferrell - Volume 2|2004|2004|0.912621359223301)
(Real Kung Fu of Shaolin|Young Hero of Shaolin II|1986|1986|0.6382978723404256)
(He Loves Me, He Loves Me Not|He Loves Me, He Loves Me Not|2002|2002|1.0)
(The Winter People|

(La Vallee|La vallée|1972|1972|0.7777777777777778)
(Clerks|Clerks|1994|1994|1.0)
(Boyz N the Hood|Boyz n the Hood|1991|1991|0.9333333333333333)
(A Single Girl|Single Girls|2000|2001|0.88)
(Don Henley: Live Inside Job|Don Henley: Live Inside Job|2000|2000|1.0)
(Unsolved History: Salem Witch Trials|Unsolved History|2004|2002|0.6153846153846154)
(By Brakhage: An Anthology|By Brakhage: An Anthology, Volume One|2004|2003|0.8064516129032258)
(A Stranger Among Us|A Stranger Among Us|1992|1992|1.0)
(Landmarks of Early Film|Landmarks of Early Film|1997|1997|1.0)
(Rabid|Rabid|1976|1977|1.0)
(Look Back in Anger|Look Back in Anger|1959|1959|1.0)
(Jaws|Jaws|1975|1975|1.0)
(Teen Titans: Season 1|Teen Titans|2003|2003|0.6875)
(Dracula Vs. Frankenstein|Dracula vs. Frankenstein|1971|1971|0.9583333333333334)
(We Know Where You Live. Live!|We Know Where You Live. Live!|2001|2001|1.0)
(Spies|Spies|1928|1928|1.0)
(Soul Assassin|Soul Assassin|2001|2001|1.0)
(Sherlock Holmes: The Scarlet Claw|The Scarlet Cla

(A Hard Day's Night: Collector's Series|A Hard Day's Night|1964|1964|0.6428571428571429)
(Comedian|Comedian|2002|2002|1.0)
(Satanis: The Devil's Mass / Sinthia: The Devil's Doll: Double Feature|None|1968|None|0.0)
(Don't Bother to Knock|Don't Bother to Knock|1952|1952|1.0)
(Gojoe: Spirit War Chronicle|Security Cam Chronicles 1|2004|2004|0.6153846153846154)
(Giant Robo|Giant|2004|2003|0.6666666666666666)
(Taxi Zum Klo|Taxi zum Klo|1981|1980|0.9166666666666666)
(Dodsworth|Dodsworth|1936|1936|1.0)
(Fear Of A Punk Planet|US Off the Planet|2002|2001|0.5789473684210527)
(Danielle Steel's Changes|Valentine's Challenge|1991|1992|0.6222222222222222)
(The Night That Never Happened|The Night That Never Happened|1997|1997|1.0)
(Journeys with George|Journeys with George|2002|2002|1.0)
(Back to the Beach|Back to the Beach|1987|1987|1.0)
(Another Day in Paradise|Another Day in Paradise|1998|1998|1.0)
(Where Are We?|Where Were You?|1992|1993|0.7142857142857143)
(Backyardigans: It's Great to Be a Ghost
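
The two checks applied to each candidate pair in the loop above, the year window and the `SequenceMatcher` ratio, can be tried on a single pair. This is a minimal sketch: the helper names are hypothetical, `None` stands in for a missing end year, and the original title of the example is assumed to equal its primary title.

```python
import difflib

def year_ok(nyear, iyear, iyear_end=None):
    """True if the Netflix year is within one year of the IMDB run
    (or of the start year when there is no end year)."""
    if iyear_end is None:
        return abs(iyear - nyear) <= 1
    return iyear - 1 <= nyear <= iyear_end + 1

def title_ratio(ntitle, ititle1, ititle2):
    """Best SequenceMatcher ratio against the primary and original titles."""
    r1 = difflib.SequenceMatcher(None, ntitle, ititle1).ratio()
    r2 = difflib.SequenceMatcher(None, ntitle, ititle2).ratio()
    return max(r1, r2)

ratio = title_ratio("Micki and Maude", "Micki + Maude", "Micki + Maude")
print(round(ratio, 4))  # 0.8571, the same ratio as in the output above

print(year_ok(2001, 1998))        # False: three years from the start year
print(year_ok(2001, 1998, 2004))  # True: inside the run (end year is illustrative)
```

The ratio is `2*M/T`, where `M` is the number of matched characters and `T` the combined length of both strings, so short titles are sensitive to even a one-character difference.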

### Calling an API to find the title for us

The final method is to call a search API that looks up each movie title for us. For this, we will use RapidAPI's IMDB search API and create one request per movie (https://rapidapi.com/unogs/api/uNoGS).

In [None]:
import unirest
# API_URL and API_KEY must be defined beforehand (the endpoint URL and your RapidAPI key)
response = unirest.post(API_URL,
  headers={
    "X-RapidAPI-Key": API_KEY,
    "Content-Type": "application/x-www-form-urlencoded"
  },
  params={
    "parameter": "value"
  }
)
response.code # The HTTP status code
response.headers # The HTTP headers
response.body # The parsed response
response.raw_body # The unparsed response
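
In case `unirest` is not available in the current environment, the same form-encoded POST can be sketched with the standard library. This swaps in `urllib` in place of `unirest`; the endpoint URL, key, and parameter names below are placeholders, not the real API's values, and the request is only built, not sent.

```python
import urllib.parse
import urllib.request

def build_search_request(api_url, api_key, params):
    """Build (but do not send) a form-encoded POST request."""
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={
            "X-RapidAPI-Key": api_key,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )

# Placeholders only: substitute the real endpoint, key, and parameters.
req = build_search_request(
    "https://example.invalid/search",
    "YOUR_API_KEY",
    {"parameter": "value"},
)
print(req.get_method(), req.data)  # POST b'parameter=value'

# Sending would look like this (commented out so the sketch runs offline):
# with urllib.request.urlopen(req) as resp:
#     status, raw_body = resp.status, resp.read()
```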