In [1]:
# Import modules
from bs4 import BeautifulSoup
import os
import pandas as pd
from movie_webscraping_utils import *

### 1. Scraping movie info from IMDb and Wikipedia pages

In this notebook, we scrape data about the top-rated movies from IMDb and Wikipedia. Later, in a separate Jupyter notebook, we will use this data to build several topic models that retrieve the latent topics present in movie plots.

First, we access the IMDb page listing the 250 top-rated movies and scrape the URL of each individual movie. Then we scrape the basic information from each movie page, such as the title, genre and release date, together with the URL of its synopsis page. Next, we visit the synopsis page and scrape the detailed storyline of each movie, which we will later use to build our topic models. We also try to scrape each movie's plot from its dedicated Wikipedia page. For most movies, this page can be found by simply appending the movie title to the Wikipedia base URL (`http://en.wikipedia.org/wiki/`); for the rest, we have to tweak the title slightly to reach the correct page.
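The movie links on the chart page are read from `href` attributes, which may be relative, so they have to be resolved against the chart URL before they can be requested. The helpers `convert_if_relative_url` and `is_absolute_url` used below come from `movie_webscraping_utils`, whose source is not shown here. As a rough sketch of what such helpers might do, assuming only the standard `urllib.parse` module, the hypothetical functions `to_absolute_url` and `looks_absolute` resolve and check a link:

```python
from urllib.parse import urljoin, urlparse


def looks_absolute(url):
    """Hypothetical helper: True if the URL has both a scheme and a host."""
    parts = urlparse(url)
    return bool(parts.scheme) and bool(parts.netloc)


def to_absolute_url(base_url, href):
    """Hypothetical helper: resolve a possibly relative href against base_url."""
    if looks_absolute(href):
        return href
    # An href starting with '/' replaces the whole path (and query) of base_url
    return urljoin(base_url, href)


chart_url = 'https://www.imdb.com/chart/top/?ref_=nv_mv_250%20lang_eng'
print(to_absolute_url(chart_url, '/title/tt0027977/'))
# https://www.imdb.com/title/tt0027977/
print(looks_absolute('/title/tt0027977/'))
# False
```

Keeping the check separate from the resolution makes it easy to flag links that still are not absolute after joining.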

In [2]:
url = 'https://www.imdb.com/chart/top/?ref_=nv_mv_250%20lang_eng'
# Make an HTTP request and get the HTML content of the chart page
html = read_request(url)
soup = BeautifulSoup(html, 'html.parser')

# Collecting the page URL of each movie on the IMDb Top 250 chart
movie_webpage_list = []
for a in soup.select('td.titleColumn a'):
    url_new = convert_if_relative_url(url, a.get('href'))
    if not is_absolute_url(url_new):
        print('URL not valid:', url_new)
        continue  # skip it; the assert below then reports the missing movie
    movie_webpage_list.append(url_new)

# Check that we collected all 250 movie URLs
assert len(movie_webpage_list) == 250

In [3]:
# Request English-language pages so that IMDb returns the English (translated) titles
headers = {'Accept-Language': 'en-US, en;q=0.5'}

wiki_url = 'http://en.wikipedia.org/wiki/'
error_log_list = []

for rank, movie_url in enumerate(movie_webpage_list, start=1):
    # Getting movie information from the IMDb page of each movie
    movie_html = read_request(movie_url, headers=headers)
    movie_soup = BeautifulSoup(movie_html, 'html.parser')

    # Scraping movie title, genre and release date
    try:
        title = movie_soup.find(
            'h1', {"data-testid": "hero-title-block__title"}).get_text(strip=True)
    except Exception:
        error_log_list.append('Obtaining the movie title failed ' + movie_url)
        title = None
    try:
        genre = movie_soup.find('div', {"data-testid": "genres"}).get_text(" ")
        genre_list = genre.split()
    except Exception:
        error_log_list.append('Obtaining the movie genre failed ' + movie_url)
        genre_list = None
    try:
        release_date = movie_soup.find(
            'li', {"role": "presentation"}).find('span').get_text(" ")
    except Exception:
        error_log_list.append('Obtaining the movie release date failed ' + movie_url)
        release_date = None

    # Accessing the synopsis page of each movie and scraping the storyline from it
    try:
        synopsis_url = movie_soup.find(
            'ul', {"data-testid": "storyline-plot-links"}).find_all('a')[1].get('href')
        synopsis_url = convert_if_relative_url(movie_url, synopsis_url)
        synopsis_html = read_request(synopsis_url)
        synopsis_soup = BeautifulSoup(synopsis_html, 'html.parser')
        synopsis = synopsis_soup.find(
            'ul', {"id": "plot-synopsis-content"}).get_text(" ", strip=True)
        # `not synopsis` catches both None and ''; `is not ""` compares identity, not value
        if not synopsis:
            raise ValueError('Empty synopsis')
    except Exception:
        error_log_list.append('Obtaining the movie synopsis failed ' + movie_url)
        synopsis = None

    # Getting the movie plot from the Wikipedia page; plain concatenation is used
    # because os.path.join would insert '\' instead of '/' on Windows
    movie_url_wiki = wiki_url + (title or '').replace(' ', '_')
    try:
        # Extracting the plot summary
        wiki_plot = extract_wiki_plot(movie_url_wiki)
        if not wiki_plot:
            raise ValueError('Empty Wikipedia plot')
    except Exception:
        wiki_plot = None
        # Entry format '<title> <release_date> <wiki_url>' is parsed again in section 4
        error_log_list.append(f'{title} {release_date} {movie_url_wiki}')

    # Appending the scraped movie data to './data/top_250_movies.csv'
    update_csv_file('top_250_movies.csv',
                    [[rank, title, genre_list, release_date, synopsis, wiki_plot]],
                    cols=['rank', 'title', 'genre', 'release_date',
                          'imdb_synopsis', 'wiki_plot'],
                    folder_path='./data/')

top_250_movies.csv file does not exist. Creating new file!
Read failed: http://en.wikipedia.org/wiki/Léon:_The_Professional
Read failed: http://en.wikipedia.org/wiki/WALL·E
Read failed: http://en.wikipedia.org/wiki/Capharnaüm
Read failed: http://en.wikipedia.org/wiki/Amélie
Read failed: http://en.wikipedia.org/wiki/Nausicaä_of_the_Valley_of_the_Wind


### 2. Inspecting the error logs

In the cell below, we look at the cases in which scraping the movie data failed. This happened quite often: 114 failures in total. On the IMDb website, we only encountered problems with the synopsis part of the page (16 movies). Scraping the Wikipedia pages caused many more problems (98 movies). This large number is due to our often incorrect initial guess for the page URL, since we simply append the movie title to the Wikipedia base URL. Scraping the remaining Wikipedia plots will therefore take a little more effort.
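Because the error log is a flat list of strings, grouping the failures by type means parsing those strings again. A minimal sketch, assuming the two entry formats produced by the scraping loop (`Obtaining the movie <field> failed <url>` for IMDb and `<title> <release_date> <wiki_url>` for Wikipedia), that tallies failures per category with `collections.Counter`. The helper `classify_error` is hypothetical, and the sample entries are built from the logs and URLs shown in this notebook:

```python
from collections import Counter


def classify_error(entry):
    """Map one error-log string to a short category label (hypothetical helper)."""
    words = entry.split()
    if entry.startswith('Obtaining the movie ') and 'failed' in words:
        # The words between 'movie' and 'failed' name the field, e.g. 'release date'
        field = ' '.join(words[3:words.index('failed')])
        return 'imdb: ' + field
    # Any other entry was logged by the Wikipedia step
    return 'wiki: plot'


error_log = [
    'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0027977/',
    'Obtaining the movie synopsis failed https://www.imdb.com/title/tt8267604/',
    'Léon: The Professional 1994 http://en.wikipedia.org/wiki/Léon:_The_Professional',
]
print(Counter(classify_error(e) for e in error_log))
```

A structured log (for example, tuples of source, field and URL) would avoid this re-parsing altogether; the sketch keeps the notebook's string format.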

In [4]:
# Inspecting the error logs: IMDb entries start with 'Obtaining ', Wikipedia entries do not
imdb_errors = [i for i in error_log_list if i.startswith('Obtaining ')]
wiki_errors = [i for i in error_log_list if not i.startswith('Obtaining ')]

print('Total number of fails: ', len(error_log_list))
print()
print('Number of imdb failed synopsis scraping: ', len(imdb_errors))
print(imdb_errors)
print()
print('Number of wiki failed plot scraping: ', len(wiki_errors))
print(wiki_errors)

Total number of fails:  114

Number of imdb failed synopsis scraping:  16
['Obtaining the movie synopsis failed https://www.imdb.com/title/tt0027977/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt8267604/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0057565/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0091251/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0012349/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0050976/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0476735/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt3011894/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0077711/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0053198/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0116231/', 'Obtaining the movie synopsis failed https://www.imdb.com/title/tt0113247/', '

### 3. Scraping the movie plots from IMDb pages for movies that are missing a synopsis

On further inspection of `error_log_list`, we can see that all of the errors encountered on the IMDb website come from movies whose pages do not yet have a synopsis. For these movies, we scrape the storyline plot instead. The storyline plots are considerably shorter than the synopses, but still contain relevant information about the movie's story.
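Deciding whether a scraped field is usable needs a check that treats `None`, the empty string and whitespace-only text the same way. An expression such as `plot is None or ""` does not do this: when `plot` is not `None`, it evaluates to `""`, which is always falsy, so an empty plot slips through. A minimal sketch, using the hypothetical helpers `is_missing` and `pick_plot`, of checking for missing text and falling back from the synopsis to the storyline summary:

```python
def is_missing(text):
    """Return True for None, '' and whitespace-only strings."""
    return text is None or not text.strip()


def pick_plot(synopsis, storyline):
    """Prefer the full synopsis; fall back to the shorter storyline summary.

    Returns a (source, text) pair, or (None, None) if both are missing.
    """
    if not is_missing(synopsis):
        return 'synopsis', synopsis.strip()
    if not is_missing(storyline):
        return 'storyline', storyline.strip()
    return None, None


plot = ''
# The condition evaluates to '' (falsy) here, so the empty plot is not caught
print(bool(plot is None or ""))   # False
print(is_missing(plot))           # True
print(pick_plot(None, ' Storyline summary text. '))
```

Recording which source each plot came from would also make it easy to separate the shorter storyline texts later, when building the topic models.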

In [5]:
# Loading the partially completed dataset
movies_df = pd.read_csv('./data/top_250_movies.csv')

# Scraping the shorter storyline plot from IMDb for movies without a synopsis
for k in [i for i in error_log_list if i.startswith('Obtaining ')]:
    movie_url = k.split()[-1]
    movie_html = read_request(movie_url, headers=headers)
    movie_soup = BeautifulSoup(movie_html, 'html.parser')
    title = movie_soup.find(
        'h1', {"data-testid": "hero-title-block__title"}).get_text(strip=True)
    plot = movie_soup.find(
        'div', {"data-testid": "storyline-plot-summary"}
        ).find('div').find('div').find(string=True, recursive=False)
    # `plot is None or ""` never catches empty strings; check None and blank text explicitly
    if plot is None or not plot.strip():
        print('Plot scraping failed', movie_url)
        continue
    # Storing the IMDb storyline plot in the imdb_synopsis column of movies_df
    add_data_to_df_field(movies_df, movies_df['title'] == title,
                         'imdb_synopsis', plot)

Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
36,37,Modern Times,"['Comedy', 'Drama', 'Family']",1936,"Chaplin's last 'silent' film, filled with soun...",


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
76,77,Capharnaüm,['Drama'],2018,"Capernaüm (""Chaos"") tells the story of Zain (Z...",


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
78,79,High and Low,"['Crime', 'Drama', 'Mystery']",1963,A wealthy businessman is told his son has been...,


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
95,96,Come and See,"['Drama', 'Thriller', 'War']",1985,"The feature film directed by Elem Klimov, shot...","In 1943, two Belarusian boys dig in a sand-fi..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
103,104,The Kid,"['Comedy', 'Drama', 'Family']",1921,"The opening title reads: ""A comedy with a smil...",


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
157,158,The Seventh Seal,"['Drama', 'Fantasy', 'History']",1957,A Knight and his squire are home from the crus...,Disillusioned knight Antonius Block and his cy...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
174,175,My Father and My Son,"['Drama', 'Family']",2005,Sadik is one of the rebellious youth who has b...,In order to study journalism at Istanbul Unive...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
182,183,Wild Tales,"['Comedy', 'Drama', 'Thriller']",2014,"The film is divided into six segments. (1) ""Pa...",


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
210,211,Autumn Sonata,"['Drama', 'Music']",1978,After having neglected her children for many y...,"Eva (Liv Ullmann), wife of the village pastor,..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
216,217,The 400 Blows,"['Crime', 'Drama']",1959,"Seemingly in constant trouble at school, 14-ye...",Antoine Doinel is a young boy growing up in Pa...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
218,219,The Bandit,"['Crime', 'Drama', 'Thriller']",1996,The epic adventures of the legendary Baran the...,


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
221,222,La Haine,"['Crime', 'Drama']",1995,The film follows three young men and their tim...,La Haine opens with a montage of news footage ...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
231,232,A Silent Voice: The Movie,"['Animation', 'Drama']",2016,"The story revolves around Shôko Nishimiya, a g...",Japanese high school student Shoya Ishida inte...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
236,237,Nausicaä of the Valley of the Wind,"['Animation', 'Adventure', 'Fantasy']",1984,An animated fantasy-adventure. Set one thousan...,


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
239,240,Raatchasan,"['Action', 'Crime', 'Mystery']",2018,"Circumstances force Arun, an aspiring film dir...",The film opens with two old men discovering a ...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
247,248,Drishyam,"['Crime', 'Drama', 'Thriller']",2013,Georgekutty (Mohanlal) is a cable TV network o...,The film begins with Georgekutty - an man accu...


In [6]:
# Checking that imdb_synopsis column doesn't contain NaN values
print(movies_df['imdb_synopsis'].isnull().any())

False


### 4. Scraping the rest of Wikipedia movie plots

As we saw when inspecting the error log, we were not able to scrape the Wikipedia plots of 98 movies. We had formed each movie's Wikipedia URL in a simple way, by concatenating the Wikipedia base URL with the name of the film (for example, `http://en.wikipedia.org/wiki/The_Wolf_of_Wall_Street`).
On further inspection of the movies that were scraped unsuccessfully, we can observe two common patterns in their Wikipedia URLs: the first appends `_(film)` to the movie title, while the second appends `_(XXXX_film)`, where `XXXX` is the release year of the film. An example of the first pattern is `http://en.wikipedia.org/wiki/The_Dark_Knight_(film)`, and an example of the second is `http://en.wikipedia.org/wiki/Gladiator_(2000_film)`. We therefore retry the failed movies first with the `_(film)` suffix and then, for those still missing, with the `_(XXXX_film)` suffix.
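The two patterns suggest trying a fixed list of candidate URLs per movie. Note also that the non-ASCII titles (`Léon: The Professional`, `WALL·E`, `Capharnaüm`, `Amélie`, `Nausicaä of the Valley of the Wind`) report `Read failed` at every attempt shown in the outputs, which is consistent with, though not confirmed by, `read_request` sending the URL without percent-encoding it (its source is not shown). A minimal sketch of a hypothetical helper, `wiki_candidate_urls`, that builds all three candidates and percent-encodes them with `urllib.parse.quote`; the base URL is the one used in this notebook:

```python
from urllib.parse import quote

WIKI_BASE = 'http://en.wikipedia.org/wiki/'


def wiki_candidate_urls(title, year):
    """Return candidate Wikipedia URLs for a film, in the order they would be tried.

    Spaces become underscores, as in Wikipedia page names, and the page name is
    percent-encoded (UTF-8) so that non-ASCII titles give plain ASCII URLs.
    ':' and parentheses are left unencoded.
    """
    page = title.replace(' ', '_')
    suffixes = ['', '_(film)', '_({}_film)'.format(year)]
    return [WIKI_BASE + quote(page + suffix, safe=':()') for suffix in suffixes]


for candidate in wiki_candidate_urls('Amélie', 2001):
    print(candidate)
```

Each candidate could be passed to `extract_wiki_plot` in turn until one returns a non-empty plot, which would fold the two retry passes of this section into a single loop.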

In [7]:
wiki_error_log_list = []
for error_log in [i for i in error_log_list if not i.startswith('Obtaining ')]:
    # Each entry has the form '<title> <release_date> <wiki_url>'; a separate name
    # keeps the wiki_url base URL from being overwritten
    *title_words, _, failed_wiki_url = error_log.split()
    title = ' '.join(title_words)
    # First URL pattern: '<title>_(film)'
    wiki_url_modified = failed_wiki_url + '_(film)'
    try:
        # Extracting the plot summary
        wiki_plot = extract_wiki_plot(wiki_url_modified)
        if not wiki_plot:
            raise ValueError('Empty Wikipedia plot')
        # Adding the Wikipedia plot for this movie to movies_df
        add_data_to_df_field(movies_df, movies_df['title'] == title,
                             'wiki_plot', wiki_plot)
    except Exception:
        wiki_error_log_list.append(error_log)

Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
3,4,The Dark Knight,"['Action', 'Crime', 'Drama']",2008,The movie begins with a gang of men with clown...,A gang of criminals rob a Gotham City mob bank...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
17,18,One Flew Over the Cuckoo's Nest,['Drama'],1975,"In 1963 Oregon, Randle Patrick McMurphy (Nicho...","In Oregon in 1963, Randle Patrick McMurphy is ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
19,20,Se7en,"['Crime', 'Drama', 'Mystery']",1995,In an unidentified city of constant rain and u...,Soon-to-retire Detective Lieutenant William So...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
20,21,The Silence of the Lambs,"['Crime', 'Drama', 'Thriller']",1991,Promising FBI Academy student Clarice Starling...,"In 1990, Clarice Starling is pulled from her F..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
27,28,Interstellar,"['Adventure', 'Drama', 'Sci-Fi']",2014,A group of elderly people are giving interview...,"In 2067, crop blights and dust storms threaten..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
28,29,The Green Mile,"['Crime', 'Drama', 'Fantasy']",1999,The movie opens with a group of people running...,"At a Louisiana assisted-living home in 1999, e..."


Read failed: http://en.wikipedia.org/wiki/Léon:_The_Professional_(film)
Read failed: http://en.wikipedia.org/wiki/Hara-Kiri_(film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
36,37,Modern Times,"['Comedy', 'Drama', 'Family']",1936,"Chaplin's last 'silent' film, filled with soun...","The Tramp works on an assembly line, where he ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
47,48,Casablanca,"['Drama', 'Romance', 'War']",1942,"In the early years of World War II, December 1...","In December 1941, American expatriate Rick Bla..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
51,52,Alien,"['Horror', 'Sci-Fi']",1979,The opening credits appear in front of a large...,The commercial space tug Nostromo is returning...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
53,54,Memento,"['Mystery', 'Thriller']",2000,This is a complex story about Leonard Shelby (...,The film starts with a Polaroid photograph of ...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
59,60,Sunset Blvd.,"['Drama', 'Film-Noir']",1950,The film opens with the camera tracking down S...,"At a mansion on Sunset Boulevard, a group of p..."


Read failed: http://en.wikipedia.org/wiki/WALL·E_(film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
61,62,The Shining,"['Drama', 'Horror']",1980,Former teacher and recovering alcoholic Jack T...,Jack Torrance takes a winter caretaker positio...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
73,74,Aliens,"['Action', 'Adventure', 'Sci-Fi']",1986,"After the opening credits, we see a spacecraft...",Ellen Ripley has been in stasis for 57 years i...


Read failed: http://en.wikipedia.org/wiki/Capharnaüm_(film)
Read failed: http://en.wikipedia.org/wiki/High_and_Low_(film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
79,80,American Beauty,['Drama'],1999,Lester Burnham (Kevin Spacey) is a 42-year-old...,Lester Burnham is a middle-aged magazine execu...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
82,83,Amadeus,"['Biography', 'Drama', 'History']",1984,The story begins in 1823 as the elderly Antoni...,"In the winter of 1823, Antonio Salieri is comm..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
88,89,2001: A Space Odyssey,"['Adventure', 'Sci-Fi']",1968,"To Richard Strauss ' tone poem ""Thus Spake Zar...","In the prehistoric African veldt, a tribe of h..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
92,93,Vertigo,"['Mystery', 'Romance', 'Thriller']",1958,A woman's face gives way to a kaleidoscope of ...,"After a rooftop chase, where a fellow policema..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
102,103,Lawrence of Arabia,"['Adventure', 'Biography', 'Drama']",1962,"In 1935, T. E. Lawrence (Peter O'Toole) is kil...","The film is presented in two parts, divided by..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
104,105,Dangal,"['Action', 'Biography', 'Drama']",2016,Spoiler text The story is said by the voice of...,"Mahavir Singh Phogat, a former amateur wrestle..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
107,108,A Clockwork Orange,"['Crime', 'Drama', 'Sci-Fi']",1971,"""A bit of the old ultra-violence."" The story t...","In a futuristic Britain, Alex DeLarge is the l..."


Read failed: http://en.wikipedia.org/wiki/Amélie_(film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
117,118,Snatch,"['Comedy', 'Crime']",2000,The film opens as we see boxing promoter Turki...,After stealing an 86-carat (17.2 g) diamond in...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
120,121,To Kill a Mockingbird,"['Crime', 'Drama']",1962,The titles appear as a young child babbles whi...,The film is narrated by the adult Jean Louise ...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
128,129,Ran,"['Action', 'Drama', 'War']",1985,&#12302;&#20081;&#12303; Akira Kurosawa's trea...,"Hidetora Ichimonji, a powerful though now elde..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
130,131,Green Book,"['Biography', 'Comedy', 'Drama']",2018,"New York City, 1962 Tony ""Tony Lip"" Vallelonga...","In 1962 New York City, bouncer Tony Lip search..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
138,139,Howl's Moving Castle,"['Animation', 'Adventure', 'Family']",2004,"Over a quaint area of land, shrouded by fog, a...","Sophie, a young milliner and eldest of three s..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
140,141,The Great Escape,"['Adventure', 'Drama', 'History']",1963,"The year is 1943. During World War II, the Ger...","In late 1942, having expended enormous resourc..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
142,143,Casino,"['Crime', 'Drama']",1995,Martin Scorsese's 1995 film Casino follows the...,"In 1973, sports handicapper and Mafia associat..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
146,147,A Beautiful Mind,"['Biography', 'Drama']",2001,John Nash ( Russell Crowe ) arrives at Princet...,"In 1947, John Nash arrives at Princeton Univer..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
158,159,The Elephant Man,"['Biography', 'Drama']",1980,"In 19th Century Victorian England, Dr. Frederi...","Frederick Treves, a surgeon at the London Hosp..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
161,162,Klaus,"['Animation', 'Adventure', 'Comedy']",2019,The film begins with a letter being delivered....,"Jesper Johansson is the lazy, spoiled son of t..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
165,166,Wild Strawberries,"['Drama', 'Romance']",1957,"The movie opens when 78-year-old Isak Borg, pl...","Grouchy, stubborn, and egotistical Professor I..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
169,170,Trainspotting,['Drama'],1996,"Set in Edinburgh, the film begins with Mark Re...","Mark Renton, a 26-year-old unemployed heroin a..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
170,171,Jurassic Park,"['Action', 'Adventure', 'Sci-Fi']",1993,"The story begins on Isla Nublar, a small islan...",Industrialist John Hammond has created a theme...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
176,177,Gone with the Wind,"['Drama', 'History', 'Romance']",1939,"The film opens in Tara, a cotton plantation ow...","In 1861, on the eve of the American Civil War,..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
182,183,Wild Tales,"['Comedy', 'Drama', 'Thriller']",2014,"The film is divided into six segments. (1) ""Pa...","The film is composed of six short segments: ""P..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
193,194,In the Name of the Father,"['Biography', 'Crime', 'Drama']",1993,"Story of Gerry Conlon, purported ringleader of...",Gerry Conlon is shown in Belfast stripping lea...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
195,196,Gone Girl,"['Drama', 'Mystery', 'Thriller']",2014,Nick Dunne (Ben Affleck) is stroking the hair ...,"On their fifth wedding anniversary, writing te..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
203,204,12 Years a Slave,"['Biography', 'Drama', 'History']",2013,The movie opens with a group of slaves receivi...,Solomon Northup is a free African-American man...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
204,205,How to Train Your Dragon,"['Animation', 'Action', 'Adventure']",2010,A Viking boy called Hiccup (voice: Jay Baruche...,"The viking village of Clan Berk, located on a ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
213,214,Stand by Me,"['Adventure', 'Drama']",1986,A man ( Richard Dreyfuss ) sits in his car rea...,Writer Gordie Lachance reads in the newspaper ...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
217,218,Logan,"['Action', 'Drama', 'Sci-Fi']",2017,The theatrical release of Logan was preceded b...,"In 2029, no mutants have been born in 25 years..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
220,221,Platoon,"['Drama', 'War']",1986,Chris Taylor (Charlie Sheen) is a young Americ...,"In 1967, U.S. Army volunteer Chris Taylor arri..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
222,223,Spotlight,"['Biography', 'Crime', 'Drama']",2015,The opening shot shows the text: BASED ON ACTU...,"In 1976, at a Boston Police station, two polic..."


Read failed: http://en.wikipedia.org/wiki/Gangs_of_Wasseypur_(film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
229,230,Andrei Rublev,"['Biography', 'Drama', 'History']",1966,Director Andrei Tarkovsky shows the beautiful ...,"Andrei Rublev is divided into eight episodes, ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
230,231,Into the Wild,"['Adventure', 'Biography', 'Drama']",2007,A young man leaves his middle class existence ...,"In April 1992, Christopher McCandless arrives ..."


Read failed: http://en.wikipedia.org/wiki/Nausicaä_of_the_Valley_of_the_Wind_(film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
244,245,The Princess Bride,"['Adventure', 'Family', 'Fantasy']",1987,Fairy tale story-within-a-story with an all-st...,The film is an enactment of a book that a gran...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
245,246,Sunrise,"['Drama', 'Romance']",1927,"In the summertime, described as vacation time,...",A vacationing Woman from the City (Margaret Li...


Read failed: http://en.wikipedia.org/wiki/Hera_Pheri_(film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
249,250,"Paris, Texas",['Drama'],1984,The film opens with sweeping shots of the vast...,Travis Henderson walks alone through the West ...


In [8]:
print('Number of remaining movies without wiki_plot:', len(wiki_error_log_list))

Number of remaining movies without wiki_plot: 50


In [9]:
wiki_error_log_list1 = []
for error_log in wiki_error_log_list:
    # Each entry has the form '<title> <release_date> <wiki_url>'
    *title_words, release_date, failed_wiki_url = error_log.split()
    title = ' '.join(title_words)
    # Second URL pattern: '<title>_(<release_year>_film)'
    wiki_url_modified = failed_wiki_url + '_(' + release_date + '_film)'
    try:
        # Extracting the plot summary
        wiki_plot = extract_wiki_plot(wiki_url_modified)
        if not wiki_plot:
            raise ValueError('Empty Wikipedia plot')
        # Adding the Wikipedia plot for this movie to movies_df
        add_data_to_df_field(movies_df, movies_df['title'] == title,
                             'wiki_plot', wiki_plot)
    except Exception:
        wiki_error_log_list1.append(error_log)

Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
4,5,12 Angry Men,"['Crime', 'Drama']",1957,"In a New York City courthouse, an eighteen-yea...",In the overheated jury room of the New York Co...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
21,22,City of God,"['Crime', 'Drama']",2002,Taking place over the course of over two decad...,The film begins in medias res with an armed ga...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
29,30,Parasite,"['Comedy', 'Drama', 'Thriller']",2019,Ki-woo Kim (Choi Woo-Shik) is a young man livi...,"The Kim family—father Ki-taek, mother Chung-so..."


Read failed: http://en.wikipedia.org/wiki/Léon:_The_Professional_(1994_film)
Read failed: http://en.wikipedia.org/wiki/Hara-Kiri_(1962_film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
32,33,The Pianist,"['Biography', 'Drama', 'Music']",2002,"""The Pianist"" begins in Warsaw, Poland in Sept...","In September 1939, Władysław Szpilman, a Polis..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
37,38,Psycho,"['Horror', 'Mystery', 'Thriller']",1960,"In a Phoenix hotel room on a Friday afternoon,...","During a Friday afternoon tryst in a Phoenix, ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
42,43,Gladiator,"['Action', 'Adventure', 'Drama']",2000,"Shouting ""Roma victor!"" as his forces attack, ...","In AD 180, Hispano-Roman General Maximus Decim..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
43,44,Whiplash,"['Drama', 'Music']",2014,The films opens with Andrew Neimann ( Miles Te...,Andrew Neiman is a first-year student at the p...


Read failed: http://en.wikipedia.org/wiki/WALL·E_(2008_film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
63,64,Witness for the Prosecution,"['Crime', 'Drama', 'Mystery']",1957,"A few years after War War II, in London, Leona...","Sir Wilfrid Robarts, a senior barrister, just ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
65,66,Joker,"['Crime', 'Drama', 'Thriller']",2019,"The story takes place in Gotham City, 1981. Ar...",Party clown and aspiring stand-up comedian Art...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
67,68,Oldboy,"['Action', 'Drama', 'Mystery']",2003,"The film begins in medias res, with the silhou...","In 1988, a businessman named Oh Dae-su is arre..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
74,75,Coco,"['Animation', 'Adventure', 'Drama']",2017,"In Santa Cecilia, Mexico, Imelda Rivera was th...","In Santa Cecilia, Mexico, Miguel dreams of bec..."


Read failed: http://en.wikipedia.org/wiki/Capharnaüm_(2018_film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
78,79,High and Low,"['Crime', 'Drama', 'Mystery']",1963,A wealthy businessman is told his son has been...,A wealthy executive named Kingo Gondo (Toshiro...


Read failed: http://en.wikipedia.org/wiki/Pather_Panchali_(1955_film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
91,92,M,"['Crime', 'Mystery', 'Thriller']",1931,It's noon. Concerned parents are lined up outs...,"In Berlin,[9] a group of children are playing ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
94,95,The Hunt,['Drama'],2012,Lucas (Mads Mikkelsen) is a member of a close-...,Lucas is a member of a close-knit Danish Hunti...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
103,104,The Kid,"['Comedy', 'Drama', 'Family']",1921,"The opening title reads: ""A comedy with a smil...",An unmarried mother leaves a charity hospital ...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
106,107,The Father,['Drama'],2020,Anne (Olivia Colman) visits her father Anthony...,Anne visits her father Anthony in his flat aft...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
108,109,Metropolis,"['Drama', 'Sci-Fi']",1927,(This is the synopsis of the full 150 minute v...,"In the future, in the Million-acre city of Met..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
114,115,1917,"['Drama', 'Thriller', 'War']",2019,"On 6 April 1917, aerial reconnaissance has obs...","On 6 April 1917, aerial reconnaissance has obs..."


Read failed: http://en.wikipedia.org/wiki/Amélie_(2001_film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
118,119,Scarface,"['Crime', 'Drama']",1983,"In May 1980, a Cuban man named Tony Montana ( ...","In 1980, Cuban refugee and ex-convict Tony Mon..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
122,123,Up,"['Animation', 'Adventure', 'Comedy']",2009,"Young Carl Fredricksen ( Jeremy Leary ), a qui...",Young Carl Fredricksen idolizes famous explore...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
125,126,Heat,"['Crime', 'Drama', 'Thriller']",1995,An inbound Los Angeles Blue Line train pulls i...,Neil McCauley is a professional thief based in...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
132,133,Downfall,"['Biography', 'Drama', 'History']",2004,The film starts out with a short clip from a d...,"In November 1942, at the Wolf's Lair in East P..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
139,140,The Wolf of Wall Street,"['Biography', 'Crime', 'Drama']",2013,The movie opens with a TV advertisement for St...,"In 1987, Jordan Belfort lands a job as a Wall ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
150,151,Chinatown,"['Drama', 'Mystery', 'Thriller']",1974,"Set in 1937 Los Angeles, a private investigato...","In 1937, a woman identifying herself as Evelyn..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
159,160,The Thing,"['Horror', 'Mystery', 'Sci-Fi']",1982,"In the opening shot, an alien spaceship flies ...","In Antarctica, a Norwegian helicopter pursues ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
162,163,Inside Out,"['Animation', 'Adventure', 'Comedy']",2015,Riley is a girl born in Minnesota. Five emotio...,Within the mind of a girl named Riley are the ...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
172,173,Warrior,"['Action', 'Drama', 'Sport']",2011,"Paddy Conlon (Nick Nolte), exits a Pittsburgh ...","U.S. Marine Tommy Riordan visits his father, P..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
173,174,Fargo,"['Crime', 'Drama', 'Thriller']",1996,The movie opens with a car towing a new tan Ol...,"In 1987, Jerry Lundegaard, the sales manager o..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
180,181,Stalker,"['Drama', 'Sci-Fi']",1979,"A ""stalker"" is a guide who takes people throug...","In the distant future, the protagonist (Alexan..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
181,182,The General,"['Action', 'Adventure', 'Comedy']",1926,"The Western & Atlantic Flyer ""speeds into Mari...",Western & Atlantic Railroad train engineer Joh...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
187,188,Persona,"['Drama', 'Thriller']",1966,Persona begins with images of camera equipment...,A projector begins screening a series of image...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
189,190,Room,"['Drama', 'Thriller']",2015,The film begins with a young boy with really l...,"In Akron, Ohio, 24-year-old Joy Newsome and he..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
191,192,Prisoners,"['Crime', 'Drama', 'Mystery']",2013,Keller Dover (Hugh Jackman) and his son Ralph ...,"In Pennsylvania, Keller Dover, his wife Grace,..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
196,197,To Be or Not to Be,"['Comedy', 'Romance', 'War']",1942,Before the 1939 invasion of Poland by Nazi Ger...,The well-known stars of a Warsaw theater compa...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
206,207,Ben-Hur,"['Adventure', 'Drama', 'History']",1959,Judah Ben-Hur (Charlton Heston) is a wealthy m...,"In the Prologue, a baby is born in Bethlehem a..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
211,212,Network,['Drama'],1976,The following synopsis has mostly been taken f...,"Howard Beale, longtime evening newscaster for ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
218,219,The Bandit,"['Crime', 'Drama', 'Thriller']",1996,The epic adventures of the legendary Baran the...,"After serving a 35-year jail sentence, Baran (..."


Read failed: http://en.wikipedia.org/wiki/Gangs_of_Wasseypur_(2012_film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
227,228,Rebecca,"['Drama', 'Mystery', 'Romance']",1940,"The film begins with a female voiceover: ""Last...",An inexperienced young woman meets aristocrati...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
228,229,Rush,"['Action', 'Biography', 'Drama']",2013,The following is based on a true story. The fi...,James Hunt and Niki Lauda are exceptional raci...


Read failed: http://en.wikipedia.org/wiki/Nausicaä_of_the_Valley_of_the_Wind_(1984_film)
Read failed: http://en.wikipedia.org/wiki/The_Battle_of_Algiers_(1966_film)


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
246,247,Hera Pheri,"['Action', 'Comedy', 'Crime']",2000,"The plot revolves around an eccentric trio, co...",The film begins with Ghanshyam Tripathi (Sunie...


In [10]:
print('Number of remaining movies without wiki_plot:', len(wiki_error_log_list1))
print(wiki_error_log_list1)

Number of remaining movies without wiki_plot: 10
['Léon: The Professional 1994 http://en.wikipedia.org/wiki/Léon:_The_Professional', 'Hara-Kiri 1962 http://en.wikipedia.org/wiki/Hara-Kiri', 'WALL·E 2008 http://en.wikipedia.org/wiki/WALL·E', 'Hamilton 2020 http://en.wikipedia.org/wiki/Hamilton', 'Capharnaüm 2018 http://en.wikipedia.org/wiki/Capharnaüm', 'Pather Panchali 1955 http://en.wikipedia.org/wiki/Pather_Panchali', 'Amélie 2001 http://en.wikipedia.org/wiki/Amélie', 'Gangs of Wasseypur 2012 http://en.wikipedia.org/wiki/Gangs_of_Wasseypur', 'Nausicaä of the Valley of the Wind 1984 http://en.wikipedia.org/wiki/Nausicaä_of_the_Valley_of_the_Wind', 'The Battle of Algiers 1966 http://en.wikipedia.org/wiki/The_Battle_of_Algiers']


We are left with 10 movies whose plots we still need to scrape. For 6 of them, we provide the complete URLs as a list (`movie_url_wiki_list`) together with their titles (`movie_title_wiki_list`). These titles either contain special characters or differ from the names of the corresponding Wikipedia pages. We can still apply the same function (`extract_wiki_plot`) to scrape their storylines. We scrape the last 4 (`Hamilton`, `Pather Panchali`, `Gangs of Wasseypur` and `The Battle of Algiers`) individually, because their Wikipedia pages are structured slightly differently from those of the other movies.
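Percent-encoding turns a title such as `Léon: The Professional` into the ASCII URL form used in the next cell (`L%C3%A9on:_The_Professional`). Each non-ASCII character is written as the `%XX` escapes of its UTF-8 bytes. Below is a minimal sketch using the standard library's `urllib.parse.quote`. The helper `wiki_url_from_title` is hypothetical. Keeping `:`, `(` and `)` unescaped is an assumption chosen to reproduce the hand-written URLs. Whether `read_request` fails on raw non-ASCII URLs depends on its implementation in `movie_webscraping_utils`, which is not shown here. Encoding also does not help when the Wikipedia page name differs from the IMDb title (`Harakiri_(1962_film)`, `WALL-E`, `Capernaum_(film)`). Those pages still need an explicit URL.

```python
from urllib.parse import quote

# Hypothetical constant; the notebook itself uses 'http://en.wikipedia.org/wiki/'
WIKI_BASE = 'https://en.wikipedia.org/wiki/'


def wiki_url_from_title(title: str) -> str:
    """Build an ASCII-only Wikipedia URL from a page title (sketch).

    Spaces become underscores (Wikipedia's URL convention), and every
    non-ASCII character is percent-encoded as UTF-8 bytes. ':', '(' and ')'
    are kept literally so the result matches the hand-written URLs.
    """
    page_name = title.replace(' ', '_')
    return WIKI_BASE + quote(page_name, safe=':()')


print(wiki_url_from_title('Léon: The Professional'))
# https://en.wikipedia.org/wiki/L%C3%A9on:_The_Professional
print(wiki_url_from_title('Nausicaä of the Valley of the Wind (film)'))
# https://en.wikipedia.org/wiki/Nausica%C3%A4_of_the_Valley_of_the_Wind_(film)
```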

In [11]:
movie_url_wiki_list = ['https://en.wikipedia.org/wiki/L%C3%A9on:_The_Professional',
                       'https://en.wikipedia.org/wiki/Harakiri_(1962_film)',
                       'https://en.wikipedia.org/wiki/WALL-E',
                       'https://en.wikipedia.org/wiki/Capernaum_(film)',
                       'https://en.wikipedia.org/wiki/Am%C3%A9lie',
                       'https://en.wikipedia.org/wiki/Nausica%C3%A4_of_the_Valley_of_the_Wind_(film)']
movie_title_wiki_list = ['Léon: The Professional', 'Hara-Kiri', 'WALL·E',
                         'Capharnaüm', 'Amélie', 'Nausicaä of the Valley of the Wind']

for movie_url, title in zip(movie_url_wiki_list, movie_title_wiki_list):
    try:
        # Extracting the plot summary
        wiki_plot = extract_wiki_plot(movie_url)
        # Make sure a non-empty plot was returned (fails for None and '')
        assert wiki_plot, 'empty plot'
        # Adding wiki plot for the particular movie to movies_df
        add_data_to_df_field(movies_df, movies_df['title'] == title, 'wiki_plot', wiki_plot)
    except Exception:
        print('Plot scraping failed: ', movie_url)

Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
30,31,Léon: The Professional,"['Action', 'Crime', 'Drama']",1994,"Léon (Jean Reno) is a hitman (or ""cleaner"" as ...","Léon is an Italian-American hitman (or ""cleane..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
31,32,Hara-Kiri,"['Action', 'Drama', 'Mystery']",1962,"""Seppuku,"" or ""Harakiri"" has it is known in th...","Edo, 1630. Tsugumo Hanshirō arrives at the est..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
60,61,WALL·E,"['Animation', 'Adventure', 'Family']",2008,A Dystopia in the Future Approximately seven h...,"In the 29th century, rampant consumerism, corp..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
76,77,Capharnaüm,['Drama'],2018,"Capernaüm (""Chaos"") tells the story of Zain (Z...","Zain El Hajj, a 12-year-old from the slums of ..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
116,117,Amélie,"['Comedy', 'Romance']",2001,Amelie Poulain (Audrey Tautou) is the only chi...,Amélie Poulain is born in June 1974 and brough...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
236,237,Nausicaä of the Valley of the Wind,"['Animation', 'Adventure', 'Fantasy']",1984,An animated fantasy-adventure. Set one thousan...,One thousand years have passed since the Seven...
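
The hand-written extractions in the next cell mostly follow one pattern: locate the plot heading, then collect the `<p>` elements that follow it. For *Gangs of Wasseypur*, collection continues up to the next `<h2>`. The sketch below implements that "paragraphs until the next `<h2>`" rule with the standard library's `html.parser` instead of BeautifulSoup. It is not the notebook's `extract_wiki_plot`, whose code lives in `movie_webscraping_utils` and is not shown here. The names `SectionParagraphs` and `section_text` are hypothetical. It assumes the heading id sits on the `<h2>` itself or on an element inside it. Unlike `find_next_sibling()`, this parser also sees `<p>` elements nested inside other elements of the section, so it may collect slightly more text.

```python
from html.parser import HTMLParser


class SectionParagraphs(HTMLParser):
    """Collect the text of <p> elements in the section that starts at the
    <h2> carrying id `anchor_id` (on the <h2> or on an element inside it)
    and ends at the next <h2>."""

    def __init__(self, anchor_id):
        super().__init__()
        self.anchor_id = anchor_id
        self.in_h2 = False     # currently inside an <h2> element
        self.found = False     # the target heading has been seen
        self.done = False      # the next <h2> after the target was reached
        self.in_p = False      # currently inside a <p> of the target section
        self.paragraphs = []   # collected, whitespace-normalized paragraphs
        self._buf = []         # text chunks of the current <p>

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == 'h2':
            if self.found:
                self.done = True   # a new section begins: stop collecting
                return
            self.in_h2 = True
        if self.in_h2 and dict(attrs).get('id') == self.anchor_id:
            self.found = True
        elif self.found and tag == 'p':
            self.in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == 'h2':
            self.in_h2 = False
        elif tag == 'p' and self.in_p:
            self.in_p = False
            text = ' '.join(''.join(self._buf).split())
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self.in_p and not self.done:
            self._buf.append(data)


def section_text(html, anchor_id):
    """Return the section's paragraphs joined by spaces ('' if not found)."""
    parser = SectionParagraphs(anchor_id)
    parser.feed(html)
    parser.close()
    return ' '.join(parser.paragraphs)


# Toy page in the older Wikipedia layout (id on a <span> inside the <h2>)
sample = """
<h2><span class="mw-headline" id="Plot_summary">Plot summary</span></h2>
<p>First paragraph.</p>
<div class="thumb">image caption</div>
<p>Second <a href="#">paragraph</a>.</p>
<h2><span id="Cast">Cast</span></h2>
<p>Not part of the plot.</p>
"""
print(section_text(sample, 'Plot_summary'))  # First paragraph. Second paragraph.
```

For a case like `Hamilton`, where only the first paragraph after the heading is used, `parser.paragraphs[0]` would be the analogue.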


In [12]:
# Scraping the plot for movie `Hamilton`
title = 'Hamilton'
wiki_url = 'https://en.wikipedia.org/wiki/Hamilton_(2020_film)'
# Downloading and parsing the Wikipedia page
movie_html_wiki = read_request(wiki_url)
movie_soup_wiki = BeautifulSoup(movie_html_wiki, 'html.parser')
# Extracting the plot summary: the first paragraph after the "Synopsis" heading
tag = movie_soup_wiki.select_one('#Synopsis').find_parent('h2')
while tag.name != 'p':
    tag = tag.find_next_sibling()
wiki_plot = tag.get_text(" ", strip=True)
# Make sure a non-empty plot was extracted
assert wiki_plot, 'empty plot'
# Adding wiki plot for the particular movie to movies_df
add_data_to_df_field(movies_df, movies_df['title'] == title, 'wiki_plot', wiki_plot)

# Scraping the plot for movie `Pather Panchali`
title = 'Pather Panchali'
wiki_url = 'https://en.wikipedia.org/wiki/Pather_Panchali'
# Downloading and parsing the Wikipedia page
movie_html_wiki = read_request(wiki_url)
movie_soup_wiki = BeautifulSoup(movie_html_wiki, 'html.parser')
# Extracting the plot summary: consecutive paragraphs after the "Plot summary" heading
tag = movie_soup_wiki.select_one('#Plot_summary').find_parent('h2').find_next_sibling()
wiki_plot = ''
while tag is not None and tag.name == 'p':
    wiki_plot += tag.text.replace('\n', ' ')
    tag = tag.find_next_sibling()
# Make sure a non-empty plot was extracted
assert wiki_plot, 'empty plot'
# Adding wiki plot for the particular movie to movies_df
add_data_to_df_field(movies_df, movies_df['title'] == title, 'wiki_plot', wiki_plot)

# Scraping the plot for movie `The Battle of Algiers`
title = 'The Battle of Algiers'
wiki_url = 'https://en.wikipedia.org/wiki/The_Battle_of_Algiers'
# Downloading and parsing the Wikipedia page
movie_html_wiki = read_request(wiki_url)
movie_soup_wiki = BeautifulSoup(movie_html_wiki, 'html.parser')
# Extracting the plot summary: consecutive paragraphs after the "Subject" heading
tag = movie_soup_wiki.select_one('#Subject').find_parent('h2').find_next_sibling()
wiki_plot = ''
while tag is not None and tag.name == 'p':
    wiki_plot += tag.text.replace('\n', ' ')
    tag = tag.find_next_sibling()
# Make sure a non-empty plot was extracted
assert wiki_plot, 'empty plot'
# Adding wiki plot for the particular movie to movies_df
add_data_to_df_field(movies_df, movies_df['title'] == title, 'wiki_plot', wiki_plot)

# Scraping the plot for `Gangs of Wasseypur`. This movie has been
# divided into two parts because of its original length of over 5 hours.
# We scrape the plots of both parts and combine them.
wiki_plot = ''
title = 'Gangs of Wasseypur'
urls_list = ['https://en.wikipedia.org/wiki/Gangs_of_Wasseypur_%E2%80%93_Part_1',
             'https://en.wikipedia.org/wiki/Gangs_of_Wasseypur_%E2%80%93_Part_2']
# The plot heading carries a different id on each page
select_par_list = [{'id': 'Plot'}, {'id': 'Plot.5B13.5D'}]
for wiki_url, select_par in zip(urls_list, select_par_list):
    # Downloading and parsing the Wikipedia page
    movie_html_wiki = read_request(wiki_url)
    movie_soup_wiki = BeautifulSoup(movie_html_wiki, 'html.parser')
    # Collecting every paragraph between the plot heading and the next <h2>
    tag = movie_soup_wiki.find('span', select_par).find_parent('h2').find_next_sibling()
    while tag is not None and tag.name != 'h2':
        if tag.name == 'p':
            wiki_plot += tag.text.replace('\n', ' ')
        tag = tag.find_next_sibling()

# Make sure a non-empty plot was extracted
assert wiki_plot, 'empty plot'
# Adding wiki plot for the particular movie to movies_df
add_data_to_df_field(movies_df, movies_df['title'] == title, 'wiki_plot', wiki_plot)

Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
70,71,Hamilton,"['Biography', 'Drama', 'History']",2020,Act I\nThe orphan Alexander Hamilton leaves hi...,"Divided in two acts, the musical depicts a dra..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
85,86,Pather Panchali,['Drama'],1955,(Previous content of this page was removed due...,"In Nischindipur, rural Bengal, in the 1910s, H..."


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
237,238,The Battle of Algiers,"['Drama', 'War']",1966,The opening scene is of a man who has presumab...,The Battle of Algiers reconstructs the events ...


Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
223,224,Gangs of Wasseypur,"['Action', 'Crime', 'Drama']",2012,"Prologue\nIn January 2004 , a gang of heavily ...","In January 2004, a gang of heavily armed men s..."
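
The id `Plot.5B13.5D`, used for Part 2 of *Gangs of Wasseypur* in the cell above, appears to follow MediaWiki's legacy heading-anchor encoding. Under that scheme, spaces become underscores, the text is percent-encoded, `%` is replaced by `.`, and `:` is kept literally. Read that way, the id decodes to a heading `Plot[13]`. Below is a minimal sketch of the scheme, approximated with `urllib.parse.quote`. MediaWiki's own implementation may treat some rarer characters differently. The helper `legacy_wiki_anchor` is hypothetical.

```python
from urllib.parse import quote


def legacy_wiki_anchor(heading: str) -> str:
    """Approximate MediaWiki's legacy heading-id encoding (sketch).

    Spaces -> underscores, then percent-encoding (':' kept literally),
    then every '%' replaced by '.'.
    """
    encoded = quote(heading.replace(' ', '_'), safe=':')
    return encoded.replace('%', '.')


print(legacy_wiki_anchor('Plot[13]'))      # Plot.5B13.5D
print(legacy_wiki_anchor('Plot summary'))  # Plot_summary
```

Generating the id from the visible heading text, rather than hard-coding it, makes the intent of selectors like `{'id': 'Plot.5B13.5D'}` easier to read. Such ids may still need updating if the page markup changes.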


### 5. Inspect the final movies dataframe and save it as a .csv file

We inspect the `movies_df` dataframe for missing values. Since no column contains missing values any more, we go ahead and save the dataframe.

In [13]:
movies_df.head(3)

Unnamed: 0,rank,title,genre,release_date,imdb_synopsis,wiki_plot
0,1,The Shawshank Redemption,['Drama'],1994,"In 1947, Andy Dufresne ( Tim Robbins ), a bank...","In 1947 Portland, Maine, banker Andy Dufresne ..."
1,2,The Godfather,"['Crime', 'Drama']",1972,"In late summer 1945, guests are gathered for t...","In 1945 New York City, at his daughter Connie'..."
2,3,The Godfather: Part II,"['Crime', 'Drama']",1974,The Godfather Part II presents two parallel st...,The film intercuts between events some time af...


In [14]:
movies_df.isnull().any()

rank             False
title            False
genre            False
release_date     False
imdb_synopsis    False
wiki_plot        False
dtype: bool
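
`isnull()` flags only `None` and `NaN`. A plot scraped as an empty or whitespace-only string would therefore pass the check above unnoticed. Below is a minimal sketch of a stricter per-column count, assuming the text columns hold plain Python strings. The helper `count_missing_or_blank` is hypothetical, and the toy frame is for illustration only, not the scraped data.

```python
import pandas as pd


def count_missing_or_blank(df: pd.DataFrame) -> pd.Series:
    """Count, per column, cells that are None/NaN or strings that are empty
    after stripping whitespace."""
    is_null = df.isnull()
    # Element-wise test for blank strings, applied column by column
    is_blank = df.apply(lambda col: col.map(lambda v: isinstance(v, str) and not v.strip()))
    return (is_null | is_blank).sum()


# Toy frame for illustration only (not the scraped data)
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'wiki_plot': ['Some plot.', '', None],
})
print(count_missing_or_blank(toy))  # title: 0, wiki_plot: 2
```

On the real data, `count_missing_or_blank(movies_df)` returning all zeros would confirm that every movie has non-empty text in every column.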

In [17]:
# Saving the dataframe to the folder ./data/ (created first if it does not exist)
os.makedirs('./data', exist_ok=True)
movies_df.to_csv('./data/top_250_movies.csv', index=False)
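
When this CSV is loaded again in the topic-modelling notebook, list-valued cells will not come back as lists. `to_csv` writes each list as its string representation, e.g. `"['Crime', 'Drama']"`. Below is a minimal sketch that turns those strings back into lists with `ast.literal_eval`, assuming `genre` holds Python lists of strings (as the printed rows suggest). The helper `parse_genre` is hypothetical. The two sample rows are copied from the `movies_df.head(3)` output above, with columns trimmed.

```python
import ast
import csv
import io


def parse_genre(cell: str) -> list:
    """Turn a stringified list such as "['Crime', 'Drama']" back into a list."""
    value = ast.literal_eval(cell)  # accepts only Python literals, unlike eval
    if not isinstance(value, list):
        raise ValueError(f'expected a list, got {type(value).__name__}')
    return value


# Two rows in the layout written by to_csv above (columns trimmed)
sample_csv = '''rank,title,genre,release_date
1,The Shawshank Redemption,['Drama'],1994
2,The Godfather,"['Crime', 'Drama']",1972
'''
for row in csv.DictReader(io.StringIO(sample_csv)):
    print(row['title'], parse_genre(row['genre']))
```

With pandas, the same conversion can be applied while loading, for example `pd.read_csv('./data/top_250_movies.csv', converters={'genre': ast.literal_eval})`.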