# Scraping IMDb for Top 250 TV Shows using Python 

**Web Scraping** is the process of collecting information from a website in an automated manner using code. The collected information is then exported into a format that is more useful for the user. I will be using the Python libraries [Requests](https://docs.python-requests.org/en/master/index.html) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) to scrape data from the website.

[IMDb](https://www.imdb.com/) is the world's most popular website that contains information about TV Shows, movies, video games, and streaming content online – including cast, production crew, plot summaries, trivia, ratings, fan and critical reviews and a lot more. It is designed to help fans explore the world of movies and shows and decide what to watch.

The page https://www.imdb.com/chart/toptv/ provides a list of the top 250 TV Shows as rated by IMDb users. In this project I will retrieve information from this page using _web scraping_. This is how the IMDb site looks: ![](https://i.imgur.com/JYEtFsw.png) 

Here is the outline of the steps that I am planning to follow:
1. Import the required libraries and download the website using `requests`
2. Parse the HTML source code using `BeautifulSoup`
3. Extract the rank, title, release year, IMDb rating, genre, runtime, and plot summary of the top 250 TV Shows from the website
4. Compile the extracted information into Python dictionaries and a `Pandas` DataFrame
5. Save the extracted information to a CSV file. 

By the end of the project I will create a CSV file in the following format:

```
Rank,Title,Release Year,IMDb Rating,Genre,Runtime,Summary
1,Planet Earth II,2016,9.5,Documentary,4h 58min,"Wildlife documentary series with David Attenborough, beginning with a look at the remote islands which offer sanctuary to some of the planet's rarest creatures, to the beauty of cities, which are home to humans, and animals.."
2,Planet Earth,2006,9.4,Documentary,8h 58min,"Emmy Award-winning, 11 episodes, five years in the making, the most expensive nature documentary series ever commissioned by the BBC, and the first to be filmed in high definition."
....
```


## Download the webpage using `requests`


These are the libraries that I will be using:
- **[Requests Library](https://docs.python-requests.org/en/master/index.html)** for making HTTP requests
- **[BeautifulSoup Library](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)** to pull data out of HTML files
- **[Pandas Library](https://pandas.pydata.org/docs/)** for data manipulation

The libraries can be installed using `pip`.

In [3]:
# Installing the libraries
!pip install jovian requests beautifulsoup4 pandas --upgrade --quiet

In [2]:
# Importing the libraries
import jovian 
import requests 
from bs4 import BeautifulSoup
import pandas as pd

Assigning the URL of the web page to the variable *imdb_url* and downloading its contents using the `requests.get` function:

In [16]:
imdb_url = 'https://www.imdb.com/chart/toptv/'

In [17]:
response = requests.get(imdb_url)

`requests.get` returns a response object which has the contents of the web page and some other information.

The `status_code` property can be used to check if the request was successful or not. A successful request returns an [HTTP status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status) between 200 and 299.

In [18]:
response.status_code

200

The status code is 200, so the request was successful.
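
The 2xx range check described above can be written out explicitly. This is only a small sketch: `is_successful` is a hypothetical helper introduced here for illustration, not part of `requests`. It is stricter than the `response.ok` property, which only fails on 4xx and 5xx error codes and therefore also accepts 3xx codes.

```python
def is_successful(status_code):
    """Return True when an HTTP status code is in the 2xx (success) range."""
    return 200 <= status_code <= 299


# A few representative status codes
print(is_successful(200))  # success
print(is_successful(301))  # redirect, outside the 2xx range
print(is_successful(404))  # client error
```

In the notebook, the helper could be called as `is_successful(response.status_code)`.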

I am using the `.text` property of the `response` to access the contents of the web page. 

In [19]:
page_contents = response.text

*page_contents* holds the HTML source code of the web page. This is what the first 200 characters look like:

In [20]:
page_contents[:200]

'\n\n\n<!DOCTYPE html>\n<html\n    xmlns:og="http://ogp.me/ns#"\n    xmlns:fb="http://www.facebook.com/2008/fbml">\n    <head>\n         \n\n        <meta charset="utf-8">\n        <meta http-equiv="X-UA-Compatib'

## Parse the HTML source code using `BeautifulSoup`

Creating a `BeautifulSoup` object named *doc* to parse the contents in *page_contents*

In [21]:
doc = BeautifulSoup(page_contents, 'html.parser')

We can check the title of the webpage using `.title`

In [22]:
doc.title

<title>IMDb Top 250 TV - IMDb</title>

I am defining a `get_page` function that takes the URL of a web page, downloads it with `requests`, checks that the request succeeded, and returns the parsed `BeautifulSoup` document. 

In [23]:
def get_page(url):
    """Download a web page and return a beautiful soup doc"""
    # Get the HTML page content using requests
    response = requests.get(url)

    # Check successful response
    if not response.ok:
        print('Status code:', response.status_code)
        raise Exception('Failed to fetch web page ' + url)

    page_contents = response.text

    # Construct a beautiful soup document
    doc = BeautifulSoup(page_contents, 'html.parser')

    return doc

In [24]:
doc2 = get_page('https://www.imdb.com/chart/toptv/')

In [25]:
doc2.title

<title>IMDb Top 250 TV - IMDb</title>

The function `get_page` returns a parsed document, and its title matches the one obtained earlier, so it works as expected.

We can now use the function `get_page` to download any web page and parse it using Beautiful Soup. 
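
A couple of optional hardening steps for a page-download helper can be sketched as follows. The function name `get_page_with_headers`, the `User-Agent` string, the 10-second timeout, and the optional `session` parameter are all assumptions chosen for illustration, not part of the original notebook. Some sites may reject requests that carry the default `requests` user agent, and `requests` does not time out on its own unless a timeout is passed.

```python
import requests
from bs4 import BeautifulSoup

# Assumed, illustrative header value; any browser-like string could be used
DEFAULT_HEADERS = {'User-Agent': 'Mozilla/5.0 (compatible; tutorial-scraper)'}


def get_page_with_headers(url, session=None, timeout=10):
    """Download a page with explicit headers and a timeout, then parse it."""
    # Use the given session if there is one, otherwise the requests module itself
    http = session or requests
    response = http.get(url, headers=DEFAULT_HEADERS, timeout=timeout)

    # Raise requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()

    return BeautifulSoup(response.text, 'html.parser')
```

Accepting an optional `session` lets a `requests.Session()` be reused across the 250 detail pages, and also makes the function easy to try out with a stand-in object instead of a live request.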

## Extract the top 250 TV Shows 

### Finding Ranks of the TV Shows

In web scraping we try to get information out of the HTML code by extracting the required [HTML tags](https://www.w3schools.com/TAGS/default.ASP). We use the inspect option in the web page to get to know the information contained in the HTML tags. 

For example, in the highlighted part of the image below, the `td` tag of class `titleColumn` contains the rank ("1.") and a few other tags. Any web page can be inspected by right-clicking on it and choosing Inspect. 
![](https://i.imgur.com/M8ImBLx.png)

I will first try to just extract the rank of the show. Below is the function to do so:

In [26]:
def get_ranks(doc):
    '''Function to get the rank of all the TV Shows'''
    # Get the list of all the td tags of class 'titleColumn'
    tv_show_rank_tags = doc.select('td.titleColumn')
    
    # Iterate over individual tags to extract rank and append all the results to tv_show_rank 
    tv_show_rank = []
    
    for tv_show_rank_tag in tv_show_rank_tags:
        # The cell text starts with the rank followed by '.', so the position of the dot tells us whether the rank has 1, 2, or 3 digits
        if tv_show_rank_tag.text.strip()[1] == '.':
            tv_show_rank.append(tv_show_rank_tag.text.strip()[0:1])
        elif tv_show_rank_tag.text.strip()[2] == '.':
            tv_show_rank.append(tv_show_rank_tag.text.strip()[0:2])
        else:
            tv_show_rank.append(tv_show_rank_tag.text.strip()[0:3])
    
    return tv_show_rank

The function `get_ranks` takes a BeautifulSoup object as an input and returns the rank of the TV Shows listed in the website. Below is a sample output:

In [27]:
ranks = get_ranks(doc)

In [28]:
len(ranks)

250

The length of *ranks* is 250, which matches the number of shows on the chart. Here are the first ten ranks of the TV Shows:

In [29]:
ranks[:10]

['1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
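
The position checks in `get_ranks` can also be expressed by splitting the cell text at the first dot. This is a minimal sketch of that alternative, not the notebook's own code: `parse_rank` is a hypothetical helper, and the whitespace layout of `sample_cell_text` is an assumption about how the text of a `td.titleColumn` cell looks.

```python
def parse_rank(cell_text):
    """Return the rank that precedes the first '.' in a titleColumn cell's text."""
    # Split only once so that dots inside the title are left alone
    return cell_text.strip().split('.', 1)[0].strip()


# Assumed shape of the text inside one td.titleColumn cell
sample_cell_text = '\n      1.\n      Planet Earth II\n      (2016)\n'
print(parse_rank(sample_cell_text))
```

Because the split does not depend on the number of digits, the same line handles ranks of any length.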

### Finding Titles of the TV Shows 
The titles of the TV Shows are contained in the `a` tag (with an `href` attribute) whose parent is the `td` tag of class `titleColumn`, as shown in the image below: ![](https://i.imgur.com/57gOFqK.png)

Below is the function to extract the titles:

In [31]:
def get_title(doc):
    '''Function to get the title of all the TV Shows'''
    # Get a[href] attribute from td tags of class 'titleColumn' for Title
    tv_show_title_tags = doc.select('td.titleColumn a[href]')
    
    # Iterate over individual tags to extract title and append all the results to tv_show_titles
    tv_show_titles = []
    for tv_show_title_tag in tv_show_title_tags:
        tv_show_titles.append(tv_show_title_tag.text)

    return tv_show_titles

The function `get_title` takes a Beautiful Soup object as an input and returns all the titles of the TV Shows mentioned in the website. Here is a sample output:

In [32]:
titles = get_title(doc)

In [33]:
len(titles)

250

The length of *titles* is 250, as expected. Below are the first ten titles of the TV Shows:

In [34]:
titles[:10]

['Planet Earth II',
 'Planet Earth',
 'Breaking Bad',
 'Band of Brothers',
 'Chernobyl',
 'The Wire',
 'Blue Planet II',
 'Our Planet',
 'Cosmos: A Spacetime Odyssey',
 'Avatar: The Last Airbender']

### Finding the Release Year of the TV Shows 
The release year of each TV Show is contained in a `span` tag whose parent is the `td` tag of class `titleColumn`, as shown in the image below: 
![](https://i.imgur.com/yCJuoRd.png)

Below is the function to get the release years:

In [35]:
def get_release_years(doc):
    '''Function to get the release year of all the TV Shows'''
    # Get the span tags inside td tags of class 'titleColumn' for Release Years
    tv_show_release_year_tags = doc.select('td.titleColumn span')
    
    # Iterate over individual tags to extract release year and append all the results to tv_show_release_year
    tv_show_release_years = []
    for tv_show_release_year_tag in tv_show_release_year_tags:
        tv_show_release_years.append(tv_show_release_year_tag.text[1:-1])

    return tv_show_release_years

The function `get_release_years` takes a Beautiful Soup object as an input and returns all the release years of the TV Shows mentioned in the website. 

In [36]:
release_years = get_release_years(doc)

In [37]:
len(release_years)

250

The length of *release_years* is 250, as expected. Below are the first ten release years of the TV Shows:

In [38]:
release_years[:10]

['2016',
 '2006',
 '2008',
 '2001',
 '2019',
 '2002',
 '2017',
 '2019',
 '2014',
 '2005']
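
The slice `[1:-1]` in `get_release_years` assumes that the `span` text is exactly a year wrapped in parentheses, such as `(2016)`. A regular expression is one way to make this a little more tolerant of extra whitespace or other characters. The helper name `parse_year` below is hypothetical, and the sketch is an illustration rather than the notebook's method.

```python
import re


def parse_year(span_text):
    """Return the first four-digit number in the text, or None when there is none."""
    match = re.search(r'\d{4}', span_text)
    return match.group(0) if match else None


print(parse_year('(2016)'))
print(parse_year('  (2008)  '))
```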

### Finding IMDb Ratings of the TV Shows 
The IMDb rating is contained in the `td` tag of class `ratingColumn imdbRating`. We can see it in the image below:
![](https://i.imgur.com/mOdsF44.png)

Here is a function to get all the IMDb ratings of the TV Shows:

In [40]:
def get_ratings(doc):
    '''Function to get the IMDB ratings of all the TV Shows'''
    # Get td tags of class 'ratingColumn imdbRating' for Ratings     
    tv_show_rating_tags = doc.find_all('td', class_ = 'ratingColumn imdbRating')
    
    # Iterate over individual tags to extract ratings and append all the results to tv_show_rating    
    tv_show_rating = []
    for tv_show_rating_tag in tv_show_rating_tags:
        tv_show_rating.append(tv_show_rating_tag.text.strip())
        
    return tv_show_rating

The function `get_ratings` takes a Beautiful Soup object as an input and returns all the IMDb ratings of the TV Shows mentioned in the website. Here is a sample output:

In [41]:
ratings = get_ratings(doc)

In [42]:
len(ratings)

250

The length of *ratings* is 250, as expected. Below are the first ten IMDb ratings of the TV Shows:

In [43]:
ratings[:10]

['9.5', '9.4', '9.4', '9.4', '9.3', '9.3', '9.3', '9.2', '9.2', '9.2']

I have extracted all the required information from the chart page. To get further information on genres, runtimes, and plot summaries, we will have to go to the individual page of each TV Show and extract the information from there. This is how one of the individual TV Show pages looks:

![](https://i.imgur.com/WrTkjd8.png)

First, let's get the URLs of the individual TV Shows. We can see that the link to each TV Show is contained in the `href` attribute of the `a` tag inside the `td` tag of class `titleColumn`: 
![](https://i.imgur.com/57gOFqK.png)

Here is a function to get the URLs of all the individual pages:

In [44]:
def get_tv_show_url(doc):
    """Function to get the URLs of the individual pages of all the TV Shows"""

    # Get the a tags (with an href attribute) inside td tags of class 'titleColumn'
    tv_show_href_tags = doc.select('td.titleColumn a[href]')

    # Prepend the base url (https://www.imdb.com) to each relative href to get the individual page URLs
    base_url = 'https://www.imdb.com'
    tv_show_urls = []
    for link in tv_show_href_tags:
        tv_show_urls.append(base_url + link['href'])

    return tv_show_urls

The function `get_tv_show_url` takes a Beautiful Soup object as an input and returns the URLs of the individual pages of all the TV Shows listed. 

In [45]:
tv_show_urls = get_tv_show_url(doc)

In [46]:
len(tv_show_urls)

250

The variable *tv_show_urls* holds the URLs of all the individual TV Show pages. Its length is 250, which is the expected number. Here are the links of the first 5 TV Shows:

In [47]:
tv_show_urls[:5]

['https://www.imdb.com/title/tt5491994/',
 'https://www.imdb.com/title/tt0795176/',
 'https://www.imdb.com/title/tt0903747/',
 'https://www.imdb.com/title/tt0185906/',
 'https://www.imdb.com/title/tt7366338/']
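
Relative links can also be resolved with `urljoin` from the standard library instead of string concatenation. The sketch below is an optional alternative; `build_show_url` and `BASE_URL` are hypothetical names introduced for illustration.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.imdb.com'


def build_show_url(href):
    """Resolve an href from the chart page against the IMDb base URL."""
    return urljoin(BASE_URL, href)


print(build_show_url('/title/tt5491994/'))
```

One practical difference is that `urljoin` leaves an `href` that is already absolute unchanged, whereas concatenation would prepend the base URL a second time.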

I can use the `get_page` function defined earlier to get the Beautiful Soup object by passing the url of the TV Shows.

I will be using the object to get further information in the individual site.

### Finding Genres of the TV Shows 
The genres are contained in `span` tags of class `ipc-chip__text`, whose parent is a `div` tag of class `GenresAndPlot__ContentParent-cum89p-8 bFvaWW Hero__GenresAndPlotContainer-kvkd64-11 twqaW` or, in some cases, of class `GenresAndPlot__OffsetContentParent-cum89p-9 dUAPpa Hero__GenresAndPlotContainer-kvkd64-11 twqaW`, as shown in the pictures below:
![](https://i.imgur.com/HLyMVg7.png)

![](https://i.imgur.com/mPBmTb5.png)

Here is a function to do so:

In [49]:
def get_genres(doc):
    # Get div tags of class 'GenresAndPlot__ContentParent-cum89p-8 bFvaWW Hero__GenresAndPlotContainer-kvkd64-11 twqaW' and 'GenresAndPlot__OffsetContentParent-cum89p-9 dUAPpa Hero__GenresAndPlotContainer-kvkd64-11 twqaW'
    div_tags_genres = doc.find_all("div", {"class":["GenresAndPlot__ContentParent-cum89p-8 bFvaWW Hero__GenresAndPlotContainer-kvkd64-11 twqaW",
    "GenresAndPlot__OffsetContentParent-cum89p-9 dUAPpa Hero__GenresAndPlotContainer-kvkd64-11 twqaW"]})
        
    # Get span tags of class 'ipc-chip__text' from the div tags 
    span_tags_genres = []
    for div_tags_genre in div_tags_genres:
        span_tags_genres.append(div_tags_genre.find_all('span', class_ = 'ipc-chip__text'))
    
    # Iterate over individual tags to extract .text of span tag and append all the results to show_genre    
    show_genre = []
    for span_tags_genre in span_tags_genres[0]:
        show_genre.append(span_tags_genre.text)
    tv_show_genres = '|'.join(show_genre)
    
    return tv_show_genres

In [50]:
ex_web_page_url = tv_show_urls[2]

In [51]:
ex_web_page_url

'https://www.imdb.com/title/tt0903747/'

Storing a sample TV Show website in *ex_web_page_url* to test out my function

In [52]:
ex_web_page_doc = get_page(ex_web_page_url)

Calling `get_page` function with *ex_web_page_url* as input url to get a Beautiful Soup object of the website and storing it in *ex_web_page_doc*

In [54]:
get_genres(ex_web_page_doc)

'Crime|Drama|Thriller'

The function `get_genres` works as expected and returns the genres of a single TV Show as one string separated by `|`. 
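
Long class names with suffixes such as `cum89p-8` look machine-generated, so they may change when the site is updated. One way to depend on them less is a CSS attribute selector that matches only a stable-looking part of the class. The sketch below assumes that the `GenresAndPlot` fragment is such a part; `get_genres_by_fragment` and `sample_html` are hypothetical and only imitate the structure described above.

```python
from bs4 import BeautifulSoup


def get_genres_by_fragment(doc):
    """Join the text of genre chips found under any div whose class contains 'GenresAndPlot'."""
    chips = doc.select('div[class*="GenresAndPlot"] span.ipc-chip__text')
    return '|'.join(chip.get_text(strip=True) for chip in chips)


# Made-up HTML that mimics the structure of a title page
sample_html = '''
<div class="GenresAndPlot__ContentParent-cum89p-8 bFvaWW">
  <a><span class="ipc-chip__text">Crime</span></a>
  <a><span class="ipc-chip__text">Drama</span></a>
</div>
'''
sample_doc = BeautifulSoup(sample_html, 'html.parser')
print(get_genres_by_fragment(sample_doc))
```

Unlike `get_genres`, this version collects chips from every matching `div` and returns an empty string instead of raising an `IndexError` when nothing matches.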

### Finding Runtime of the TV Shows 
The runtime of the TV Show is contained in the last `li` tag inside the `ul` tag of class `ipc-inline-list ipc-inline-list--show-dividers TitleBlockMetaData__MetaDataList-sc-12ein40-0 dxizHm baseAlt`, as shown in the picture below:
![](https://i.imgur.com/pv39LIm.png)

Here is a function to get the Runtime of the TV Show:

In [55]:
def get_runtime(doc):
    # Get the ul tags whose li children include the Runtime
    ul_tags = doc.find_all('ul', class_ = 'ipc-inline-list ipc-inline-list--show-dividers TitleBlockMetaData__MetaDataList-sc-12ein40-0 dxizHm baseAlt')
    
    # Iterate over the ul tags and collect all of their child li tags in li_tags_run_time
    li_tags_run_time = []
    for ul_tag in ul_tags:
        for li in ul_tag.find_all('li'):
            li_tags_run_time.append(li)

    # The Runtime is the text of the last li tag
    tv_show_run_time = li_tags_run_time[-1].text
    return tv_show_run_time

Let's test out the function `get_runtime` by passing our `ex_web_page_doc` from before:

In [56]:
get_runtime(ex_web_page_doc)

'49min'

The function `get_runtime` works accurately and returns the Runtime of a single TV Show. 
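
Runtime strings such as `'49min'` or `'4h 58min'` are hard to sort or average. For later analysis they could be converted to minutes. The helper `runtime_to_minutes` below is a hypothetical sketch; it assumes that runtimes always follow the `Xh Ymin` pattern seen in this notebook and returns `None` for anything else.

```python
import re


def runtime_to_minutes(runtime_text):
    """Convert strings like '4h 58min' or '49min' to a number of minutes."""
    match = re.fullmatch(r'\s*(?:(\d+)h)?\s*(?:(\d+)min)?\s*', runtime_text)
    # Reject text that does not fit the pattern or contains neither part
    if match is None or not any(match.groups()):
        return None
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2) or 0)
    return hours * 60 + minutes


print(runtime_to_minutes('4h 58min'))
print(runtime_to_minutes('49min'))
```

Returning `None` for unexpected text also helps to spot pages where the last `li` tag holds something other than the runtime.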

### Finding Plot Summary of the TV Shows 
The Plot Summary is contained in the `div` tag of class `ipc-html-content ipc-html-content--base` as shown below:
![](https://i.imgur.com/r9Wz1x3.png)

The function to extract the Plot Summary of TV Shows is given below:

In [63]:
def get_plot_summary(doc):
    # Get div tags for Plot Summary 
    div_tags_summary = doc.find_all('div', class_ = 'ipc-html-content ipc-html-content--base')
    
    tv_show_summary = div_tags_summary[0].text
    return tv_show_summary

In [64]:
get_plot_summary(ex_web_page_doc)

'When chemistry teacher Walter White is diagnosed with Stage III cancer and given only two years to live, he decides he has nothing to lose. He lives with his teenage son, who has cerebral palsy, and his wife, in New Mexico. Determined to ensure that his family will have a secure future, Walt embarks on a career of drugs and crime. He proves to be remarkably proficient in this new world as he begins manufacturing and selling methamphetamine with one of his former students. The series tracks the impacts of a fatal diagnosis on a regular, hard working man, and explores how a fatal diagnosis affects his morality and transforms him into a major player of the drug trade. —WellardRockard, jackenyon'

The function `get_plot_summary` works accurately when `ex_web_page_doc` is passed. It successfully returns the Plot Summary of a single TV Show.

I have combined the above functions for Genre, Runtime, and Plot Summary into a single function that returns all the extra information about a TV Show from its individual page:

In [65]:
def get_tv_show_info(doc):
    return get_genres(doc), get_runtime(doc), get_plot_summary(doc)

In [66]:
get_tv_show_info(ex_web_page_doc)

('Crime|Drama|Thriller',
 '49min',
 'When chemistry teacher Walter White is diagnosed with Stage III cancer and given only two years to live, he decides he has nothing to lose. He lives with his teenage son, who has cerebral palsy, and his wife, in New Mexico. Determined to ensure that his family will have a secure future, Walt embarks on a career of drugs and crime. He proves to be remarkably proficient in this new world as he begins manufacturing and selling methamphetamine with one of his former students. The series tracks the impacts of a fatal diagnosis on a regular, hard working man, and explores how a fatal diagnosis affects his morality and transforms him into a major player of the drug trade. —WellardRockard, jackenyon')

The function `get_tv_show_info` works correctly and returns the Genre, Runtime, and Plot Summary of a single TV Show after taking a Beautiful Soup object. 

## Compile extracted information into Python dictionaries and `Pandas` DataFrame

Now I will be storing all the extracted information in one single `Pandas` DataFrame for easier data manipulation. First I will store the lists of Ranks, Titles, Release Years, and Ratings collected from the chart page. 

Here is the first function to return TV Show Rank, Title, Release Year, Ratings in a single `Pandas` DataFrame:

In [69]:
def get_tv_show_df(doc):
    # Calling the required functions to get rank, title, release year, and IMDb ratings
    rank = get_ranks(doc)
    title = get_title(doc)
    year = get_release_years(doc)
    rating = get_ratings(doc) 
    
    # Defining a dictionary to store TV Show Information
    tv_show_dict = {
        'Rank' : rank,
        'Title' : title,
        'Release Year' : year,
        'IMDb Rating' : rating
    }
    # Converting TV Show dictionary to a Pandas Dataframe
    tv_show_df = pd.DataFrame(tv_show_dict)
    return tv_show_df

In [70]:
get_tv_show_df(doc)

| | Rank | Title | Release Year | IMDb Rating |
|---|---|---|---|---|
| 0 | 1 | Planet Earth II | 2016 | 9.5 |
| 1 | 2 | Planet Earth | 2006 | 9.4 |
| 2 | 3 | Breaking Bad | 2008 | 9.4 |
| 3 | 4 | Band of Brothers | 2001 | 9.4 |
| 4 | 5 | Chernobyl | 2019 | 9.3 |
| ... | ... | ... | ... | ... |
| 245 | 246 | Rurouni Kenshin | 1996 | 8.4 |
| 246 | 247 | Normal People | 2020 | 8.4 |
| 247 | 248 | House of Cards | 1990 | 8.4 |
| 248 | 249 | Anne of Green Gables | 1985 | 8.4 |


The function `get_tv_show_df` takes a Beautiful Soup object and returns a `pandas` DataFrame containing the Rank, Title, Release Year, and IMDb Rating of each TV Show. The DataFrame has 250 rows and 4 columns of data, which confirms that the function is working as intended. 

Now, I will write a function to store the Genre, Runtime, and Summary of TV Shows in a single `Pandas` DataFrame. 

Since my function `get_tv_show_info` returns the required information about a single TV Show only, I will pass each URL in *tv_show_urls* to the `get_page` function to get the Beautiful Soup object for that TV Show page. Each object will then be passed to the `get_tv_show_info` function, and the extracted information will be appended to the corresponding lists. 

Below is the function to return TV Show Genre, Runtime, and Summary in a single `Pandas` DataFrame 

In [72]:
def get_extra_tv_show_info_df(tv_show_urls):
    # Defining an empty dictionary to store extra TV Show Information
    extra_info = { 
        'Genre' : [],
        'Runtime' : [], 
        'Summary' : []
    }
    
    # Iterate over all the tv show urls 
    for tv_show_url in tv_show_urls:
        # Call get_page and get_tv_show_info function iteratively
        show_info = get_tv_show_info(get_page(tv_show_url))
        # Append Genre, Runtime, and Summary from show_info to the dictionary
        extra_info['Genre'].append(show_info[0])
        extra_info['Runtime'].append(show_info[1])
        extra_info['Summary'].append(show_info[2])
    extra_info_df = pd.DataFrame(extra_info)
    return extra_info_df
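
A loop over 250 pages can stop halfway if a single page fails to download or has a different layout. The sketch below shows one possible way to keep going: it records `None` for a failed page so that the rows stay aligned with the chart page data, and it pauses between requests. The names `collect_show_info`, `fetch_info`, and `delay_seconds`, as well as the one-second default, are assumptions made for illustration; `fake_fetch` is a stand-in so that the example runs without network access.

```python
import time


def collect_show_info(urls, fetch_info, delay_seconds=1.0):
    """Call fetch_info(url) for each URL, pausing between calls and recording failures as None."""
    rows = {'Genre': [], 'Runtime': [], 'Summary': []}
    failed_urls = []
    for url in urls:
        try:
            genre, runtime, summary = fetch_info(url)
        except Exception as error:  # broad catch keeps the sketch short
            print('Skipping', url, '->', error)
            genre, runtime, summary = None, None, None
            failed_urls.append(url)
        rows['Genre'].append(genre)
        rows['Runtime'].append(runtime)
        rows['Summary'].append(summary)
        # Wait a little before the next request
        time.sleep(delay_seconds)
    return rows, failed_urls


def fake_fetch(url):
    """Stand-in for a real page fetch; fails for URLs containing 'bad'."""
    if 'bad' in url:
        raise IndexError('list index out of range')
    return 'Drama', '49min', 'A summary'


rows, failed = collect_show_info(
    ['https://example.invalid/ok', 'https://example.invalid/bad'],
    fake_fetch,
    delay_seconds=0,
)
print(rows['Genre'], failed)
```

In the notebook, `fetch_info` could be `lambda url: get_tv_show_info(get_page(url))`, and `rows` could be passed to `pd.DataFrame`.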

### A Single Function to Scrape IMDb for TV Shows 

In [73]:
def scrape_imdb_tv_shows():
    '''Get the top 250 TV Shows from IMDb'''
    
    # Download the chart page and collect the URLs of the individual TV Show pages
    doc = get_page('https://www.imdb.com/chart/toptv/')
    tv_show_urls = get_tv_show_url(doc)
    # Calling get_tv_show_df and get_extra_tv_show_info_df and storing the results in variables
    tv_shows_df = get_tv_show_df(doc)
    extra_info_df = get_extra_tv_show_info_df(tv_show_urls)
    # Joining the two DataFrames column-wise into one single DataFrame
    result = pd.concat([tv_shows_df, extra_info_df], axis = 1, join = 'inner')
    return result

In [74]:
result = scrape_imdb_tv_shows()

In [75]:
result 

| | Rank | Title | Release Year | IMDb Rating | Genre | Runtime | Summary |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Planet Earth II | 2016 | 9.5 | Documentary | 4h 58min | Wildlife documentary series with David Attenbo... |
| 1 | 2 | Planet Earth | 2006 | 9.4 | Documentary | 8h 58min | Each 50 minute episode features a global overv... |
| 2 | 3 | Breaking Bad | 2008 | 9.4 | Crime\|Drama\|Thriller | 49min | When chemistry teacher Walter White is diagnos... |
| 3 | 4 | Band of Brothers | 2001 | 9.4 | Action\|Drama\|History | 9h 54min | This is the story of "E" Easy Company, 506th R... |
| 4 | 5 | Chernobyl | 2019 | 9.3 | Drama\|History\|Thriller | 5h 30min | In April 1986, a huge explosion erupted at the... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 246 | Rurouni Kenshin | 1996 | 8.4 | Animation\|Action\|Adventure | 24min | A man slayer, Kenshin Himura, who played a maj... |
| 246 | 247 | Normal People | 2020 | 8.4 | Drama\|Romance | 5h 39min | Marianne and Connell's time at secondary schoo... |
| 247 | 248 | House of Cards | 1990 | 8.4 | Drama | 3h 46min | Francis Urquhart is the Chief Whip of the Cons... |
| 248 | 249 | Anne of Green Gables | 1985 | 8.4 | Drama\|Family | 3h 19min | At the turn of the century on Prince Edward Is... |


Calling `scrape_imdb_tv_shows` function returns the top 250 TV Shows from IMDb in a single `Pandas` DataFrame. The variable *result* has the output. 

## Save the extracted information to a CSV file

In [76]:
result.to_csv('TOP_250_TV_SHOW_INFO.csv', index = False)

The file *TOP_250_TV_SHOW_INFO.csv* contains all the extracted information in CSV format.
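
Plot summaries contain commas, so they have to be quoted in a CSV file to stay in one column. The sketch below uses the standard library `csv` module and an in-memory buffer to show what that quoting looks like and that the text survives a round trip. The shortened rows are sample data for illustration; by default `DataFrame.to_csv` applies the same minimal quoting.

```python
import csv
import io

# Shortened sample rows; the summary contains a comma
rows = [
    ['Rank', 'Title', 'Summary'],
    ['1', 'Planet Earth II', 'Wildlife documentary series with David Attenborough, beginning with a look at the remote islands'],
]

# Write the rows to an in-memory CSV "file"
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back and check that the summary is still a single field
parsed_rows = list(csv.reader(io.StringIO(csv_text)))
print(parsed_rows[1][2])
```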

## Summary
Here is a quick outline of what I did in this project:
1. Import the required libraries and download the website using `requests`
2. Parse the HTML source code using `BeautifulSoup`
3. Extract the rank, title, release year, IMDb rating, genre, runtime, and plot summary of the top 250 TV Shows from the website
4. Compile the extracted information into Python dictionaries and a `Pandas` DataFrame
5. Save the extracted information to a CSV file. 

The CSV file *TOP_250_TV_SHOW_INFO.csv* created in the project is in the following format:

```
Rank,Title,Release Year,IMDb Rating,Genre,Runtime,Summary
1,Planet Earth II,2016,9.5,Documentary,4h 58min,"Wildlife documentary series with David Attenborough, beginning with a look at the remote islands which offer sanctuary to some of the planet's rarest creatures, to the beauty of cities, which are home to humans, and animals.."
2,Planet Earth,2006,9.4,Documentary,8h 58min,"Emmy Award-winning, 11 episodes, five years in the making, the most expensive nature documentary series ever commissioned by the BBC, and the first to be filmed in high definition."
....
```

## Future Work 
The dataset from this project can be used for further data analysis and visualization. 

## References
1. Requests Documentation: https://docs.python-requests.org/en/master/index.html
2. Beautiful Soup Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
3. Pandas Documentation: https://pandas.pydata.org/docs/
4. Web Scraping tutorial: https://www.tutorialspoint.com/python_web_scraping/index.htm
 

In [None]:
jovian.commit(output=['TOP_250_TV_SHOW_INFO.csv'])
