# Abstract 

Digitization, as one of the key outcomes of technological growth, has led to profound changes in entertainment, and therefore in the world of cinema, as well as in many other areas. As a result, the distribution and broadcasting strategy that Netflix brought to the market turned into an amazing success story in a very short period of time.

Netflix's strategy is based on the idea that consumers can access the platform's entire content catalog for a monthly price. In addition, Netflix only broadcasts its films on the web, with no theatrical or limited distribution. This approach, which is vastly different from the classic idea of the Hollywood studio system, has brought significant changes for audiences, directors and studios in various ways. We can therefore confidently say that streaming services such as Netflix are influencing the film industry in terms of how we access movies, what material we consume and how movies are made.

Every day, platforms such as Netflix and Amazon Prime gain more users thanks to prices that are competitive with movie theaters and to recommendation algorithms. These algorithms, which draw on the data of millions of users, play an important role in the spread of romantic comedies and thrillers. This gives Internet platforms a strong position in terms of film content. In the future, that authority could be key in determining what constitutes a "well-made film".

The impact of Internet streaming services on filmmakers has been one of the most important transformations in the world of cinema in recent years. The promise of a more open environment than the one offered by large studios has attracted numerous directors to the platforms, with huge ramifications in the world of cinema. Furthermore, the fact that these services have less stringent standards than cinemas makes them attractive to producers. Another important aspect concerns independent directors. Since the 1980s, when Hollywood became the hub of cinema and blockbuster films began to dominate theaters, it has been difficult for independent directors to reach large audiences. Cinemas often prefer high-budget movies because they can make a much larger profit from them. As a result, independent films have so far had few opportunities outside of film festivals. However, now that internet streaming services play a major role in the world of cinema, independent filmmakers have the opportunity to reach a wider audience.

The purpose of this notebook is to investigate, through data, how streaming platforms have changed film production. Is the world of production really fairer? How much power does the user of these platforms have?

# Data gathering
We start from two existing datasets:
* [Netflix](https://www.kaggle.com/datasets/shivamb/netflix-shows): One of the most popular media and video streaming platforms. It has over 8000 movies and TV shows available and, as of mid-2021, over 200M subscribers globally. This tabular dataset lists all the movies and TV shows available on Netflix, along with details such as cast, directors, ratings, release year and duration.

* [Amazon Prime](https://www.kaggle.com/datasets/shivamb/amazon-prime-movies-and-tv-shows): Another of the most popular media and video streaming platforms. It has close to 10000 movies and TV shows available and, as of mid-2021, over 200M subscribers globally. This tabular dataset lists all the movies and TV shows available on Amazon Prime, along with details such as cast, directors, ratings, release year and duration.



In [1]:
import pandas as pd # data processing
import pandas_profiling as pp
import numpy as np # linear algebra

In [92]:
df_netflix = pd.read_csv('originalDataset/netflix_titles.csv')
df_amazon = pd.read_csv('originalDataset/amazon_prime_titles.csv')

In [93]:
print(len(df_netflix))
print(len(df_amazon))

8807
9668


# Preparing data

The objective is to concatenate the Amazon and Netflix datasets while keeping the information about which platform hosts each title.
We add two columns, netflix and amazon, each holding 1 or 0 to represent the presence or absence of the title on that platform. To keep the date-added information, we rename the date_added column in each dataset so that it names the streaming service it refers to.

In [94]:
df_netflix.drop(columns = df_netflix.columns[0], axis = 1, inplace= True)
df_netflix['netflix'] = 1
df_netflix['amazon'] = 0
df_netflix.rename(columns = {'date_added':'date_added_netflix'}, inplace = True)

df_netflix.head(2)

Unnamed: 0,type,title,director,cast,country,date_added_netflix,release_year,rating,duration,listed_in,description,netflix,amazon
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",1,0
1,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",1,0


In [95]:
df_amazon.drop(columns = df_amazon.columns[0], axis = 1, inplace= True)
df_amazon['amazon'] = 1
df_amazon['netflix'] = 0
df_amazon.rename(columns = {'date_added':'date_added_amazon'}, inplace = True)

df_amazon.head(2)

Unnamed: 0,type,title,director,cast,country,date_added_amazon,release_year,rating,duration,listed_in,description,amazon,netflix
0,Movie,The Grand Seduction,Don McKellar,"Brendan Gleeson, Taylor Kitsch, Gordon Pinsent",Canada,"March 30, 2021",2014,,113 min,"Comedy, Drama",A small fishing village must procure a local d...,1,0
1,Movie,Take Care Good Night,Girish Joshi,"Mahesh Manjrekar, Abhay Mahajan, Sachin Khedekar",India,"March 30, 2021",2018,13+,110 min,"Drama, International",A Metro Family decides to fight a Cyber Crimin...,1,0


In [96]:

dataset = pd.concat([df_netflix, df_amazon],axis=0, join="outer", sort=False)

dataset.head(3)


Unnamed: 0,type,title,director,cast,country,date_added_netflix,release_year,rating,duration,listed_in,description,netflix,amazon,date_added_amazon
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",1,0,
1,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t...",1,0,
2,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...,1,0,


The concatenated dataset contains some errors. Shows that are present on both Netflix and Amazon Prime are recorded twice in our dataset: the first time with only the Netflix information, the second time with only the Amazon information. We therefore extract from each original dataset the triples that identify an entry: release year, type (Movie or TV Show) and title. Then we check which triples the two datasets have in common and store them in a list named 'title'.

In [97]:
netflix = []
amazon = []

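def union(df, new):
    # Append one (release_year, type, title) triple per row of df to `new`.
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement.
    for i, x in df['title'].items():
        year = df['release_year'][i]
        kind = df['type'][i]  # named `kind` to avoid shadowing the built-in type()
        movie = x
        new.append((year, kind, movie))
    return new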

union(df_netflix, netflix)
union(df_amazon, amazon)

print(len(netflix), len(amazon))

8807 9668


In [98]:
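# A set makes each membership test constant-time instead of scanning the Amazon list
amazon_set = set(amazon)
title = []
for (y,t,m) in netflix:
    if (y,t,m) in amazon_set:
        title.append((y,t,m))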
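
The comparison of triples described above can also be sketched with a pandas inner merge. This is only an illustration on toy data: the frames `toy_netflix` and `toy_amazon` and their rows are hypothetical stand-ins, not taken from the real catalogs.

```python
import pandas as pd

# Toy stand-ins for df_netflix and df_amazon (hypothetical rows, not real catalog data)
toy_netflix = pd.DataFrame({
    "title": ["Film A", "Film B", "Show C"],
    "type": ["Movie", "Movie", "TV Show"],
    "release_year": [2010, 2015, 2020],
})
toy_amazon = pd.DataFrame({
    "title": ["Film B", "Show C", "Film D"],
    "type": ["Movie", "Movie", "Movie"],
    "release_year": [2015, 2020, 2001],
})

keys = ["release_year", "type", "title"]

# An inner merge on the three key columns keeps only triples present in both frames
shared = (
    toy_netflix[keys].drop_duplicates()
    .merge(toy_amazon[keys].drop_duplicates(), on=keys, how="inner")
)

# Convert the result to plain tuples, the same shape as the 'title' list
shared_triples = list(shared.itertuples(index=False, name=None))
print(shared_triples)
```

Only "Film B" survives in this toy case, because "Show C" has a different type on the two platforms and so does not count as the same entry.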

Finally, we iterate over the concatenated dataset and, for the shared titles only, merge the Amazon and Netflix information about availability and date added.
We take the description, country and cast information from the Netflix dataset because it is the more complete of the two. To do this, we fill null values with 'No Data' and then drop duplicates on title, director, release year and type, keeping the first entry, which is the Netflix one because the Netflix rows come first.

In [99]:
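df = dataset.copy()
df.replace(np.nan, 'null', inplace=True)
# NOTE: `dataset` keeps the row labels of both source frames, so labels repeat;
# `.loc[label, ...]` therefore addresses every row that carries that label.
for i, r in df.iterrows():
    if (r['release_year'],r['type'],r['title']) in title:
        # The title is on both platforms: flag it on both
        df.loc[i,'netflix'] = 1
        df.loc[i, 'amazon'] = 1
        # All rows that describe this title (one per platform)
        q = df.query('title=="'+r['title']+'" & type=="'+r['type']+'" & release_year== '+ str(r['release_year']) +'')

        # Series.iteritems() was removed in pandas 2.0; items() is the replacement.
        # `j` is used so the inner loops do not reuse the outer loop variable.
        for j, x in q['date_added_netflix'].items():
            if x != 'null':
                df.loc[j, 'date_added_netflix'] = x

        for j, x in q['date_added_amazon'].items():
            if x != 'null':
                df.loc[j, 'date_added_amazon'] = x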
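
As a point of comparison, the merge of platform flags and dates for shared titles can be sketched without an explicit loop, using `groupby` on the identifying columns. This is a minimal sketch on hypothetical toy rows, not the procedure used in the notebook: it assumes that taking the maximum of each flag and the first non-null date per group is what is wanted.

```python
import pandas as pd

# Toy concatenated frame: "Film B" appears once per platform (hypothetical rows)
toy = pd.DataFrame({
    "title": ["Film A", "Film B", "Film B"],
    "type": ["Movie", "Movie", "Movie"],
    "release_year": [2010, 2015, 2015],
    "netflix": [1, 1, 0],
    "amazon": [0, 0, 1],
    "date_added_netflix": ["May 1, 2020", "June 2, 2021", None],
    "date_added_amazon": [None, None, "March 30, 2021"],
})

keys = ["title", "type", "release_year"]

merged = (
    toy.groupby(keys, as_index=False, sort=False)
    .agg(
        netflix=("netflix", "max"),                          # 1 if any row has the flag
        amazon=("amazon", "max"),
        date_added_netflix=("date_added_netflix", "first"),  # first non-null value
        date_added_amazon=("date_added_amazon", "first"),
    )
)
print(merged)
```

Each title ends up on a single row carrying both flags and both dates, so no separate duplicate-removal step is needed for the key columns.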

In [100]:
df.replace( 'null', np.nan, inplace=True)
for i in df.columns:
    null_rate = df[i].isna().sum() / len(df) * 100 
    if null_rate > 0 :
        print("{} null rate: {}%".format(i,round(null_rate,2)))

director null rate: 25.53%
cast null rate: 11.14%
country null rate: 53.19%
date_added_netflix null rate: 51.4%
rating null rate: 1.85%
duration null rate: 0.02%
date_added_amazon null rate: 99.15%
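
The per-column null rate can also be computed in one expression, since the mean of a boolean mask is the fraction of `True` values. A minimal sketch on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with some missing values
toy = pd.DataFrame({
    "director": ["A", np.nan, "C", np.nan],
    "country": ["US", "IT", np.nan, "FR"],
    "title": ["w", "x", "y", "z"],
})

# isna() gives a boolean mask; its column-wise mean is the share of nulls
null_rates = (toy.isna().mean() * 100).round(2)

# Keep only the columns that actually contain nulls
null_rates = null_rates[null_rates > 0]
print(null_rates)
```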


In [101]:

# Assign the result back instead of calling replace(..., inplace=True) on a column
# selection, which is not guaranteed to modify df in recent pandas versions.
df['date_added_netflix'] = df['date_added_netflix'].fillna(1000)  # 1000 marks "not on this platform"
df['date_added_amazon'] = df['date_added_amazon'].fillna(1000)
df['country'] = df['country'].fillna('No Data')
df['director'] = df['director'].fillna('No Data')
df['cast'] = df['cast'].fillna('No Data')
df['rating'] = df['rating'].fillna('No Data')
df = df.drop_duplicates(subset=['title','director', 'release_year', 'type'], keep='first')
df = df.dropna()
df = df.reset_index(drop=True)

In [102]:
df['title'] = df['title'].replace({'"':''}, regex=True)
df['title'] = df['title'].replace({'\n':' '}, regex=True)

In [103]:
df.to_csv('data.csv')
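
A side effect of `to_csv` worth keeping in mind: by default it also writes the index, which comes back as an `Unnamed: 0` column when the file is read again. The sketch below illustrates this with an in-memory buffer and a hypothetical toy frame; passing `index=False` is one way to avoid the extra column.

```python
import io
import pandas as pd

toy = pd.DataFrame({"title": ["Film A", "Film B"], "release_year": [2010, 2015]})

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out, so the columns survive the round trip unchanged
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```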

# Data enrichment 

Our data will be enriched using two sources:

* [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page): Wikidata is a free and open knowledge graph containing the linked data used by the well-known online encyclopedia Wikipedia.

* [IMDb](https://www.imdb.com/): the Internet Movie Database is the world's most popular and authoritative source for movie, TV show and celebrity content, where you can find ratings and reviews by critics and the public.

In [104]:
import pprint  # pretty-print JSON
import requests  # make HTTP requests
from qwikidata.sparql import return_sparql_query_results  # return SPARQL results
from SPARQLWrapper import SPARQLWrapper, JSON  # used to inspect the structure of the responses
import ssl
from http.client import IncompleteRead
import time
import urllib.error

In order to query information, we split our dataset into Movies and TV Shows. Through Wikidata we retrieve information that is missing from our starting dataset, such as countries and directors. In addition, we add information of interest to us, such as the gender of the director and the distributor. Finally, we also retrieve the IMDb ID of each movie for future queries.

In [105]:
df = pd.read_csv('data.csv')

Unnamed: 0.1,Unnamed: 0,type,title,director,cast,country,date_added_netflix,release_year,rating,duration,listed_in,description,netflix,amazon,date_added_amazon
8416,8416,Movie,The Memphis Belle: A Story of a Flying Fortress,William Wyler,No Data,United States,"March 31, 2017",1944,TV-PG,40 min,"Classic Movies, Documentaries",This documentary centers on the crew of the B-...,1,0,1000


In [75]:
for i, x in df['title'].items():  # Series.iteritems() was removed in pandas 2.0
    if x == 'The Memphis Belle: A Story of a Flying Fortress':
        print(x)

In [41]:
# Keep only the movies and renumber the rows from 0
movie_title = df.query("type == 'Movie'").reset_index(drop=True)
movie_title.head(2)

Unnamed: 0,type,title,director,cast,country,date_added_netflix,release_year,rating,duration,listed_in,description,netflix,amazon,date_added_amazon
0,Movie,Dick Johnson Is Dead,Kirsten Johnson,No Data,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm...",1,0,1000
1,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",No Data,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,1,0,1000


In [60]:
movie_title = movie_title.replace({'"':''}, regex=True)
movie_title = movie_title.replace({'\r\n':' '}, regex=True)

In [None]:
imdbID = []
not_found = []

In [61]:

index = []

def wikidata_reconciliation(query):
    wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

    for i, x in query.iterrows():
        title = x['title']
        year = x['release_year']

        try:
            # IMDb ID (P345) of a film (Q11424) with this English label and publication year (P577)
            my_SPARQL_query = """SELECT ?imdbID
            WHERE
            {
            ?film wdt:P31 wd:Q11424 .
            ?film rdfs:label """+'"'+ title +'"' +"""@en .
            ?film wdt:P577 ?time .
            FILTER ( YEAR(?time) =  """+ str(year) +""" ).
            ?film wdt:P345 ?imdbID.
            }"""

            print(title)

            sparql_wd = SPARQLWrapper(wikidata_endpoint)
            # set the query
            sparql_wd.setQuery(my_SPARQL_query)
            # set the returned format
            sparql_wd.setReturnFormat(JSON)

            results = sparql_wd.query().convert()

            if results['results']['bindings'] == []:
                not_found.append(title)
            else:
                imdbID.append((results['results']['bindings'][0]['imdbID']['value'], title, year))
                index.append(i)

        except urllib.error.HTTPError as e:
            # Rate limited: wait for the time the server asks for
            time.sleep((int(e.headers["retry-after"])) + 1)
            # Resume from the row that failed (label-based slice), then stop this
            # loop so the remaining rows are not processed twice
            wikidata_reconciliation(query.loc[i:])
            return

wikidata_reconciliation(movie_title[5810:7000])

The Mayor
The Memphis Belle: A Story of a Flying Fortress
The Men Who Stare at Goats
The Model
The Monster
The Monster of Mangatiti
The Muppets
The Music of Silence
The Naked Gun 2 1/2: The Smell of Fear
The Naked Gun: From the Files of Police Squad!
The Natural
The Negro Soldier
The New Romantic
The NSU-Complex
The Nutcracker and the Four Realms
The One I Love
The Only Mother To You All
The Other Guys
The Parole Officer
The Pass
The Peacemaker
The Pelican Brief
The Perfect Day
The Perks of Being a Wallflower
The Phantom of the Opera
The Physician
The Pink Panther
The Pirate Fairy
The Pirates of Somalia
The Pirates! Band of Misfits
The Place Beyond the Pines
The Plan
The Player
The Polar Express
The Power of Grayskull: The Definitive History of He-Man and the Masters of the Universe
The President's Barber
The Prince
The Prince & Me
The Princess and the Frog
The Prison
The Pursuit
The Pursuit of Happyness
The Push
The Rainbow Troops
The Rainmaker
The Rat Race
The Real Miyagi
The Rebound
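
The query text sent for each title can be isolated in a small helper, which makes it easier to inspect and to guard against characters that would break the string literal. This is a sketch only: `build_imdb_query` is a hypothetical name, and escaping quotes (rather than stripping them from the titles, as the notebook does) is an assumption about what is preferable.

```python
def build_imdb_query(title: str, year: int) -> str:
    """Return a SPARQL query for the IMDb ID of a film with this English label and year."""
    # Escape backslashes first, then double quotes, so the title stays a valid string literal
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return f"""SELECT ?imdbID
WHERE
{{
  ?film wdt:P31 wd:Q11424 .
  ?film rdfs:label "{safe_title}"@en .
  ?film wdt:P577 ?time .
  FILTER ( YEAR(?time) = {int(year)} ) .
  ?film wdt:P345 ?imdbID .
}}"""


# Hypothetical title containing double quotes
example_query = build_imdb_query('The "Best" Film', 2018)
print(example_query)
```

The properties are the same ones used in the query above: P31 (instance of), P577 (publication date) and P345 (IMDb ID).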

In [59]:
print(index[-1])

5810


In [63]:
print(imdbID[0:5])

[('tt7133686', 'Next Gen', 2018), ('tt6186696', 'The Most Assassinated Woman in the World', 2018), ('tt2338151', 'PK', 2014), ('tt7448180', 'The Debt Collector', 2018), ('tt6183834', 'Carbon', 2017)]


In [None]:
mid_df = pd.DataFrame(imdbID, columns =['id', 'title', 'year'])

In [66]:
mid_df.to_csv('mid_half.csv')

In [2]:
mid_df = pd.read_csv('mid_half.csv')
first_half = pd.read_csv('first_half.csv')
last_half = pd.read_csv('last_half.csv')

In [3]:
frames = [first_half, mid_df, last_half]

new_id_dataset = pd.concat(frames)

In [4]:
print(len(first_half))
print(len(last_half))
print(len(mid_df))
print(len(new_id_dataset))

1588
1729
2161
5478


In [7]:

new_id_dataset = new_id_dataset.loc[:, ~new_id_dataset.columns.str.contains('^Unnamed')]
new_id_dataset.reset_index(level=None, drop=True, inplace=True, col_level=0, col_fill='')
new_id_dataset.head(5)

Unnamed: 0,id,title,year
0,tt11394180,Dick Johnson Is Dead,2020
1,tt0108041,Sankofa,1993
2,tt5164438,The Starling,2021
3,tt15204288,Confessions of an Invisible Girl,2021
4,tt0139872,Avvai Shanmughi,1996


In [8]:
len(new_id_dataset)

5478

In [9]:
new_id_dataset1 = new_id_dataset.copy()

In [98]:
df_id =  new_id_dataset1.replace('no_data', np.nan)

In [99]:
df_id.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5619 entries, 0 to 5618
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      5619 non-null   object
 1   title   5619 non-null   object
 2   year    5619 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 131.8+ KB


In [10]:
pd.concat(g for _, g in new_id_dataset.groupby("id") if len(g) > 1)

Unnamed: 0,id,title,year
2815,tt0091607,Martial Arts of Shaolin,1986
4127,tt0091607,Martial Arts of Shaolin,1986
372,tt0103671,American Me,1992
5195,tt0103671,American Me,1992
3935,tt0118980,Digging to China,1998
3936,tt0118980,Digging to China,1998
2350,tt0806203,Carriers,2009
3981,tt0806203,Carriers,2009
2793,tt0951216,Mad Money,2008
4131,tt0951216,Mad Money,2008
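
Repeated IDs can also be listed with `DataFrame.duplicated`, which avoids building one group per ID. The sketch below uses hypothetical toy rows; removing duplicates on the `id` column alone, rather than on whole rows, is an assumption about the intent.

```python
import pandas as pd

# Hypothetical toy rows: "tt001" occurs twice
toy_ids = pd.DataFrame({
    "id": ["tt001", "tt002", "tt001", "tt003"],
    "title": ["Film A", "Film B", "Film A", "Film C"],
    "year": [2010, 2015, 2010, 2020],
})

# keep=False marks every occurrence of a repeated id, not just the later ones
repeated = toy_ids[toy_ids.duplicated(subset="id", keep=False)].sort_values("id", kind="stable")

# Keep the first row for each id
unique_by_id = toy_ids.drop_duplicates(subset="id", keep="first")

print(repeated)
print(len(unique_by_id))
```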


In [11]:
new_id_dataset_copy = new_id_dataset.drop_duplicates(keep='first')
len(new_id_dataset_copy)

5457

In [12]:
new_id_dataset_copy["id"]

0       tt11394180
1        tt0108041
2        tt5164438
3       tt15204288
4        tt0139872
           ...    
5473     tt0113492
5474     tt0116669
5475     tt0147800
5476    tt14222534
5477     tt0892899
Name: id, Length: 5457, dtype: object

# new  id list = id_list

In [13]:
id_list = new_id_dataset_copy['id'].tolist()
print(len(id_list))



5457


In [14]:
other_ids = pd.read_csv('clean-movie.csv')
len(other_ids)

6666

In [15]:
other_id_list = other_ids['id'].tolist()
print(len(other_id_list))

6666


In [16]:
other_id_list_clean = []
for i in other_id_list:
    if i != "no_data":
        other_id_list_clean.append(i)


In [17]:
len(other_id_list_clean)

6405

In [18]:
only_new = []
for i in id_list:
    if i not in other_id_list_clean:
        only_new.append(i)

In [19]:
print(len(only_new ))

906
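
The same "only in the new list" selection can be sketched with a set, which keeps the original order while making each membership test constant-time. The lists below are hypothetical toy values, named with a `_demo` suffix so they do not clash with the notebook's variables.

```python
# Hypothetical toy ID lists
id_list_demo = ["tt001", "tt002", "tt003", "tt004"]
other_id_list_demo = ["tt002", "no_data", "tt004", "tt005"]

# Drop the "no_data" placeholders and build a set for fast lookups
known = {i for i in other_id_list_demo if i != "no_data"}

# IDs from the new list that the other list does not contain, in original order
only_new_demo = [i for i in id_list_demo if i not in known]
print(only_new_demo)
```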


In [24]:
full_list = id_list.copy()
for i in other_id_list:
    if i not in id_list:
        full_list.append(i)
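# NOTE: `both` is not defined in any cell of this notebook as shown; it has to
# come from a cell that was run earlier in the session.
print(len(both))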

4549


In [118]:
import requests
json_list =[]
for imdb_id in full_list[10:]:  # `imdb_id` avoids shadowing the built-in id()
    # Build the request URL for this IMDb ID
    url = "https://imdb-api.com/en/API/Title/k_2xtxoo0v/"+imdb_id+"/Ratings,Awards"

    resp = requests.get(url)
    data = resp.json()
    json_list.append(data)

    print(data['title'])


Night of the Running Man
Night Falls on Manhattan
New Faces
Nawabzaade
Navy Secrets
Narc
Nakhuda
Mystery Road
My Pal Trigger
My Man Godfrey
My Kid Could Paint That
My Foolish Heart
My Father's Guests
My Dog Stupid
My Best Friend
Murder with Music
Muklawa
Mr. Reckless
Mr. Imperium
Moonfire
Moondance Alexander
Monte Carlo Nights
Vig
Mitchell
Missing
Minesweeper
Millie
Merry-Go-Round
Meet Dr. Christian
McLintock!
Maximum Ride
Mario
Maane Number 13
Mandingo
Mambo Italiano
Making a Killing
Majili
Mafia: Chapter 1
Madame Behave
Luv Ka the End
Love Liza
Louisiana Story
Little Big Man
Linsanity
Line of Duty
Light of My Life
Let's Sing Again
Let's Go Collegiate
Late Bloomers
The Last Rampage
Ladybugs
Lady in the Death House
Lady Behave!
Ladies in Lavender
Laal Kaptaan
Corrode
Knockout
Knock Off
Knights of Badassdom
Kesari
Katha Sangama
Kalank
Kadaikutty Singam
Kaappaan
Judge Priest
Joyride
Joshy
Joni
Johnny Guitar
Jessica
Jeffrey
Jaws of Justice
Jawaan
Jack Goes Home
Izzie's Way Home
Itzhak
Ish

KeyboardInterrupt: 

In [115]:
print(len(full_list))

7301


In [127]:
print(len(json_list))
print(type(json_list[1]))

3080
<class 'dict'>


In [131]:
import json
list_josn_string = []
for i in json_list:
    json_string = json.dumps(i)
    list_josn_string.append(json_string)
    
print(len(list_josn_string))
print(type(list_josn_string[1]))

3080
<class 'str'>


In [132]:
# open file in write mode
with open('save_file.txt', 'w') as fp:
    for item in list_josn_string:
        # write each item on a new line
        fp.write("%s\n" % item)
    print('Done')

Done
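
Writing one JSON document per line, as above, is commonly called the JSON Lines layout. The sketch below shows the round trip on hypothetical records, with the file encoding set explicitly to UTF-8 so that non-ASCII characters such as the rupee sign are stored as-is; the file name and the records are assumptions made for the example.

```python
import json
import os
import tempfile

# Hypothetical records, one of them containing a non-ASCII character
records = [
    {"title": "Film A", "budget": "₹10 crore"},
    {"title": "Film B", "budget": None},
]

path = os.path.join(tempfile.mkdtemp(), "save_file.jsonl")

# Write one JSON document per line, in UTF-8
with open(path, "w", encoding="utf-8") as fp:
    for record in records:
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")

# Read the file back: each non-empty line parses to one dict
with open(path, encoding="utf-8") as fp:
    restored = [json.loads(line) for line in fp if line.strip()]

print(restored == records)
```

Reading the file back line by line returns the original dictionaries, so an interrupted download can be resumed from the saved file.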


In [121]:
import numpy as np
# Write with an explicit UTF-8 encoding so that non-ASCII characters (e.g. '\u20b9') can be saved
np.savetxt("json_list.csv", json_list, delimiter=", ", fmt="% s", encoding="utf-8")

In [23]:
film = []
director = []
gender = []
distributor = []
imdbID = []
rottenscore = []
not_found = []

In [None]:

def wikidata_reconciliation(query):
    # get the endpoint API
    wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

    for i, r in query.iterrows():
        x = r['title']
        y = r['release_year']

        try:
            print(x)
            # Film (Q11424) by English label, with its director (P57) and the
            # director's sex or gender (P21) when available
            my_SPARQL_query = """
            SELECT ?film_label ?director_label ?dir_gen_label 
            WHERE
            {
            ?film wdt:P31 wd:Q11424 .
            ?film rdfs:label """+'"'+ x +'"' +"""@en .
            ?film rdfs:label ?film_label .
            FILTER(lang(?film_label) = 'en')
            OPTIONAL {?film wdt:P57 ?director . 
            ?director rdfs:label ?director_label .    
            FILTER(lang(?director_label) = 'en')
            OPTIONAL {?director wdt:P21 ?dir_gen . 
            ?dir_gen rdfs:label ?dir_gen_label .
            FILTER(lang(?dir_gen_label) = 'en')}}
            
            }
            """
            # set the endpoint 
            sparql_wd = SPARQLWrapper(wikidata_endpoint)
            # set the query
            sparql_wd.setQuery(my_SPARQL_query)
            # set the returned format
            sparql_wd.setReturnFormat(JSON)
            # get the results
            results = sparql_wd.query().convert()

            if results['results']['bindings'] == []:
                not_found.append(x)
            else:
                # Only the first binding is used; missing fields become "no_data"
                film.append(results['results']['bindings'][0]['film_label']['value'])
                if "director_label" in results['results']['bindings'][0]:
                    director.append(results['results']['bindings'][0]['director_label']['value'])
                else:
                    director.append("no_data")
                if "dir_gen_label" in results['results']['bindings'][0]:
                    gender.append(results['results']['bindings'][0]['dir_gen_label']['value'])
                else:
                    gender.append("no_data")

        except urllib.error.HTTPError as e:
            # Rate limited: wait for the time the server asks for
            time.sleep((int(e.headers["retry-after"])) + 1)
            # Resume from the row that failed (label-based slice), then stop this
            # loop so the remaining rows are not processed twice
            wikidata_reconciliation(query.loc[i:])
            return


wikidata_reconciliation(movie_title[2637:])
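
The wait-and-resume behaviour on HTTP errors can also be written as a small loop around a single request instead of a recursive call over the rest of the dataset. The sketch below is an assumption-laden illustration: `call_with_retry`, its parameters and the fake `flaky_fetch` function are all hypothetical, and no request is actually sent.

```python
import time
import urllib.error
from email.message import Message


def call_with_retry(fetch, max_attempts=3, default_wait=5, sleep=time.sleep):
    """Call fetch(); on HTTPError wait for the Retry-After delay and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except urllib.error.HTTPError as e:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Use the Retry-After header when it is a whole number of seconds
            retry_after = e.headers.get("retry-after") if e.headers is not None else None
            wait = int(retry_after) if retry_after and retry_after.isdigit() else default_wait
            sleep(wait + 1)


# Fake request that fails twice with HTTP 429 and then succeeds
calls = {"n": 0}
waits = []


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        headers = Message()
        headers["Retry-After"] = "2"
        raise urllib.error.HTTPError("https://example.org", 429, "Too Many Requests", headers, None)
    return "ok"


# waits.append stands in for time.sleep so the example runs instantly
result = call_with_retry(flaky_fetch, sleep=waits.append)
print(result, waits)
```

Retrying only the failed request keeps the position in the dataset unchanged, so nothing has to be re-sliced after a pause.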

In [54]:


import urllib.error

def wikidata_reconciliation1(query):
    # get the endpoint API
    wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

    # Iterate over the titles; looping over the DataFrame itself would yield column names
    for i, x in query['title'].items():

        try:
            # Film (Q11424) by English label, with distributor (P750), IMDb ID (P345)
            # and review score (P444) when available
            my_SPARQL_query = """
            SELECT ?film_label ?distributor_label ?imdbID ?rottenscore
            WHERE
            {
            ?film wdt:P31 wd:Q11424 .
            ?film rdfs:label """+'"'+ x +'"' +"""@en .
            ?film rdfs:label ?film_label .
            FILTER(lang(?film_label) = 'en')
            OPTIONAL {?film wdt:P750 ?distributor . 
            ?distributor rdfs:label ?distributor_label .
            FILTER(lang(?distributor_label) = 'en')}
            OPTIONAL {?film wdt:P345 ?imdbID.}
            OPTIONAL {?film wdt:P444 ?rottenscore.}
            }
            """
            # set the endpoint 
            sparql_wd = SPARQLWrapper(wikidata_endpoint)
            # set the query
            sparql_wd.setQuery(my_SPARQL_query)
            # set the returned format
            sparql_wd.setReturnFormat(JSON)
            # get the results
            results = sparql_wd.query().convert()

            if results['results']['bindings'] == []:
                print(""+x+" not found")
            else:
                # Only the first binding is used; missing fields become "no_data"
                film.append(results['results']['bindings'][0]['film_label']['value'])
                if "distributor_label" in results['results']['bindings'][0]:
                    distributor.append(results['results']['bindings'][0]['distributor_label']['value'])
                else:
                    distributor.append("no_data")
                if "imdbID" in results['results']['bindings'][0]:
                    imdbID.append(results['results']['bindings'][0]['imdbID']['value'])
                else:
                    imdbID.append("no_data")
                if "rottenscore" in results['results']['bindings'][0]:
                    rottenscore.append(results['results']['bindings'][0]['rottenscore']['value'])
                else:
                    rottenscore.append("no_data")

        except urllib.error.HTTPError as e:
            print(e.headers["retry-after"])
            time.sleep((int(e.headers["retry-after"])) + 1)
            # Resume from the row that failed (label-based slice), then stop this
            # loop so the remaining rows are not processed twice
            wikidata_reconciliation1(query.loc[i:])
            return
        except IncompleteRead:
            # The response was cut short: skip this title and keep going
            continue


wikidata_reconciliation1(movie_title[2975:7000])

In [None]:
# Named `lists_data` to avoid shadowing the built-in dict
lists_data = {"title": film, "director": director,
        "director gender": gender}

lists_df = pd.DataFrame(lists_data)
lists_df.to_csv('savelist.csv')
lists_df.head(5)

In [None]:
dict1 = {"title": film, "distributor": distributor, "id": imdbID, "rating score": rottenscore}

lists1_df = pd.DataFrame(dict1)
lists1_df.to_csv('6000-9000.csv')
lists1_df.head(5)
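
Building a DataFrame from several separately filled lists only works when all the lists have the same length. One way to keep the fields aligned, sketched below, is to collect one dictionary per film and build the frame from the list of dictionaries. The helper `binding_to_row` and the sample bindings are hypothetical; they only mimic the shape of the SPARQL JSON results used above.

```python
import pandas as pd


def binding_to_row(binding):
    """Turn one SPARQL result binding into a flat row, using "no_data" for missing fields."""
    def value(key):
        return binding.get(key, {}).get("value", "no_data")

    return {
        "title": value("film_label"),
        "distributor": value("distributor_label"),
        "id": value("imdbID"),
        "rating score": value("rottenscore"),
    }


# Hypothetical bindings with different optional fields missing
sample_bindings = [
    {"film_label": {"value": "Film A"}, "imdbID": {"value": "tt0000001"}},
    {"film_label": {"value": "Film B"}, "distributor_label": {"value": "Studio X"}},
]

rows = [binding_to_row(b) for b in sample_bindings]
results_df = pd.DataFrame(rows)
print(results_df)
```

Because every row carries all four keys, the columns cannot drift out of step even if a query is interrupted part-way.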

# Data analysis

# Data visualization

In [None]:
#