# TV SHOWS WEB SCRAPER

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. It is about downloading structured data from the web, selecting some of that data, and passing along what you selected to another process.

The idea is to go through a website of interest and, using special Python programming libraries, extract relevant data/information that can be presented as a DataFrame.


## Project Objective


This web scraping project explores the 200 most popular TV shows on themoviedb.org, in descending order of popularity. One challenge is that the 200 TV shows are not all on the same web page, so we will have to parse several pages to extract this information.

Below are the steps I will be taking:

1. Download the web page using the requests library.
2. Parse the HTML source code using BeautifulSoup.
3. Extract each show's name, release date and web link.
4. Get the links and information of 9 other pages to complete 200 TV shows.
5. Create a DataFrame and save the information as a CSV file.
6. Get info about a TV series using its web link.
7. Create a DataFrame containing some of a TV show's details.
8. Scrape all shows and create CSV files containing some of their info.
9. Create a folder to store all the created CSV files.


I will be extracting the information below for each TV show:
- creator
- description
- genre
- viewer age suitability
- Top casts




![themoviedb.png](https://i.imgur.com/YmsoDPT.png)

I will be creating functions that will help me get the above stated information easily and then write and save the data obtained as a csv file.


First, let's prepare the environment by installing jovian, naming the project and saving the notebook. You can run this notebook by clicking the "Run" button above the notebook and choosing "Run on Binder" option.


In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

In [None]:
# Execute this to save new versions of the notebook
jovian.commit(project="web-scraper")




## Download Web Page Using Requests Library

In [None]:
### Install the requests library that will help me download the website I want to scrape.

!pip install requests --upgrade --quiet

In [None]:
import requests

In [None]:
# Define the website to scrape and store the link.
shows_url = 'https://www.themoviedb.org/tv'

In [None]:
response = requests.get(shows_url)

A successful download should return a status code between 200 and 299. We can confirm this using the .status_code attribute of the response object, as done below. Check out this link to know more about status codes: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status

In [None]:
# check if the web page was successfully downloaded
response.status_code
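
The 2xx check described above can be sketched with the standard library alone. The helper name `is_success` is hypothetical (it is not part of requests), and the status codes in the loop are only illustrative.

```python
from http import HTTPStatus


def is_success(status_code):
    """Return True when the status code is in the 2xx (successful) range."""
    return 200 <= status_code <= 299


# Illustrative status codes; with a real response, pass response.status_code.
for code in (HTTPStatus.OK, HTTPStatus.NOT_FOUND):
    print(int(code), is_success(code))
```

The requests library also offers `response.raise_for_status()`, which raises an exception for 4xx and 5xx responses instead of returning a value.
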

Let's have a look at a sample of what our downloaded page looks like as well as the length.

In [None]:
page_contents = response.text

In [None]:
page_contents[:1000]

In [None]:
len(page_contents)

Let's write the content of our downloaded page into a file 'shows-data.html'. To view this file, either as a web page or HTML file, we can go to file at the top-left of this notebook and click open. 

In [None]:
# write the page contents to the file 'shows-data.html'
with open('shows-data.html', 'w', encoding='utf-8') as shows:
    shows.write(page_contents)

Now, using BeautifulSoup, we can explore the page contents and find our data in whichever HTML tags enclose it. Click this link to learn more about HTML tags: https://www.javatpoint.com/html-tags#:~:text=HTML%20tags%20are%20like%20keywords,tag%2C%20content%20and%20closing%20tag.&text=Every%20tag%20in%20HTML%20perform%20different%20tasks.

## Parse HTML Code Using BeautifulSoup

BeautifulSoup is a python library used for pulling data out of HTML and XML files.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-using-tag-names

In [None]:
### Install BeautifulSoup Python library  for pulling data out of HTML files

!pip install beautifulsoup4 --upgrade --quiet


In [None]:
from bs4 import BeautifulSoup

In [None]:
# let's create a BeautifulSoup object called 'docu' containing all the html
docu = BeautifulSoup(page_contents, 'html.parser')

Let's create a helper function 'create_doc' that will take a url as an argument and carry out all the process we followed above to return a BeautifulSoup object. 

Functions make our work easier by reducing multiple lines of code needed to get an information to just one or two lines. These functions come in handy especially when it involves repeated operations.


In [None]:
def create_doc(url):
    # download page using requests lib.
    response1 = requests.get(url)
    # check that the web page was successfully downloaded (2xx status), else raise an exception.
    if not 200 <= response1.status_code <= 299:
        raise Exception('Failed to load page {}'.format(url))
    contents = response1.text
    
    # convert contents into BeautifulSoup object
    docs = BeautifulSoup(contents, 'html.parser')
    return docs
    

Some popular tags that will really come in handy for this project are the div, h2, p, and a tags, along with attributes such as 'href' and 'class'. Most of the information we seek to extract will be enclosed in them.

So far, we have been able to download our web page of interest using the requests library, save its content as an HTML file and then parse it using the BeautifulSoup library so that we can extract the required information.

## Extract Show Name, Release Date And Web Link

By right-clicking and choosing the 'inspect' option on this page, we can see that a panel pops up at the bottom showing the HTML code that makes up this page.

![img.png](https://i.imgur.com/7V51ubA.png)

Looking at the highlighted sections, we can see that a show's title is inside an a-tag, and the a-tag is a child of an h2-tag.


Note that there are more than 200 TV shows on this website, with 20 shows on each web page. Notice that the div-tag holding the first 20 shows has an 'id' attribute of "page_1".

As stated earlier, we will not be able to get all 200 TV shows from our downloaded page. To get more, we would need to click the 'Load More' button at the bottom of the page. That means we need to get 9 more links that we can download to obtain their information. We will call them page_2, page_3, and so on up to page_10.

Before we get these other links, let's use our present 'docu' object to extract required information and then create a function that will make it easier for us to get the others.

Let's find the 'h2' and 'a' tags using the find_all method of BeautifulSoup.

In [None]:
# h2 tags
h2_tags = docu.find_all('h2')
len(h2_tags)

In [None]:
# a_tags
a_tags = docu.find_all('a')
len(a_tags)

In [None]:
# a sample of the a-tags
a_tags[2]

We can use the ".text" property to get only the important text from the a_tag above.

In [None]:
a_tags[2].text

Using a python 'for' loop, we want to extract all a-tags that are inside a h2-tag to get the Titles, just as seen in the image above.

In [None]:
# An empty list that we will append all the Titles into.
Titles = []
# iterating through the h2 tags
for h2 in h2_tags:
    #iterating through the a tags
    for title in a_tags:
        
        if title in h2:
            Titles.append(title.text)

print(Titles)


Let's confirm that there are 20 Titles on the first page.

In [None]:
len(Titles)
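
As an alternative to the nested loop, a CSS selector can pick out the a-tags that sit directly inside an h2-tag in one step. This is only a sketch: the HTML below is made up to mimic the structure described above, and the names `sample_html`, `sample_doc` and `sample_titles` are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the structure described above:
# each title is an a-tag that is a direct child of an h2-tag.
sample_html = """
<div class="content">
  <h2><a href="/tv/1" title="Show One">Show One</a></h2>
  <p>Mar 19, 2021</p>
</div>
<div class="content">
  <h2><a href="/tv/2" title="Show Two">Show Two</a></h2>
  <p>Jan 15, 2021</p>
</div>
<a href="/login">Login</a>
"""

sample_doc = BeautifulSoup(sample_html, 'html.parser')

# 'h2 > a' selects only the a-tags that are direct children of an h2-tag,
# so unrelated links such as the login link are skipped.
sample_titles = [a.text for a in sample_doc.select('h2 > a')]
print(sample_titles)
```

This avoids comparing every h2-tag with every a-tag on the page, which matters more as pages grow.
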

Before we continue, let's create and test a helper function that takes the h2 and a-tags as arguments and returns the titles with just one line of code.

In [None]:
def get_titles(h2_tag,a_tag):
    # An empty list that we will append all the Titles into.
    titles = []
    # iterating through the h2 tags
    for h2 in h2_tag:
        #iterating through the a tags
        for title in a_tag:
        
            if title in h2:
                titles.append(title.text)

    return titles


In [None]:
Titles = get_titles(h2_tags,a_tags)
Titles[:5]

The link to each TV series on the page is represented by unique numbers inside the 'href' attribute of an a-tag which is in a h2-tag as seen in the image above.


See the example in the cell below. It shows the 'href' attribute of a particular TV series. This link is relative (incomplete), so we will fix that in a 'for' loop by adding a base URL to it.

In [None]:
h2_tags[5].a['href']

In [None]:
# An empty list that we will append all the links into.
Link = []
# Define a base url
Base_url = 'https://www.themoviedb.org'

# iterating through the h2 tags
for h2 in h2_tags:
    #iterating through the a tags
    for links in a_tags:
        if links in h2:
            Link.append(Base_url+links['href'])
            
print(Link)

Let's create a helper function that takes the h2 and a-tags as arguments and returns the shows' links.

In [None]:
def get_links(h2_tag,a_tag):
    # An empty list that we will append all the links into.
    links = []
    
    # Define a base url
    base_url = 'https://www.themoviedb.org'
    
    # iterating through the h2 tags
    for h2 in h2_tag:
        #iterating through the a tags
        for lnk in a_tag:
            if lnk in h2:
                links.append(base_url+lnk['href'])
                
    return links
    

In [None]:
Links = get_links(h2_tags,a_tags)
Links[:5]
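
Joining the base URL and a relative link can also be done with `urljoin` from the standard library, which handles the slash between the two parts. A minimal sketch, assuming relative links of the form shown; the paths in `relative_links` are placeholders, not real show pages.

```python
from urllib.parse import urljoin

base = 'https://www.themoviedb.org'

# Placeholder relative links of the kind found in an 'href' attribute.
relative_links = ['/tv/101', '/tv/202']

# urljoin resolves each relative link against the base URL.
full_links = [urljoin(base, href) for href in relative_links]
print(full_links)
```
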

The release date is in a p-tag. The p-tag is in a div-tag of class 'content'. So let's find the 'p' and 'div' tags as well.


![img2.png](https://i.imgur.com/keLEkjm.png)

In [None]:
p_tags = docu.find_all('p')
len(p_tags)

In [None]:
div_tags = docu.find_all('div', class_ = 'content')
len(div_tags)

In [None]:
release_dates = []
# iterating through the div_tags
for div in div_tags:
    #iterating through the p tags
    for p in p_tags:
        
        if p in div:
            release_dates.append(p.text)
print(release_dates)

We can equally create a helper function for the dates with the p and div-tags as argument.

In [None]:
def dates(div_tag,p_tag):
    rls_dates = []
    # iterating through the div_tags
    for div in div_tag:
        
        #iterating through the p tags
        for p in p_tag:
            if p in div:
                
                date = p.text
                # check if date is empty
                if not date:
                    rls_dates.append('NO DATE GIVEN')
                else:
                    rls_dates.append(date)
                    
    
    return rls_dates
    

In [None]:
release_date = dates(div_tags,p_tags)
release_date[:5]

So far, we have been able to extract the shows' titles, release dates and individual links. Let's create a function that takes a URL and returns all three pieces of information using the functions we have already defined.

In [None]:
def TV_Shows(url):
    
    # download page using requests lib and create a BeautifulSoup object.
    
    Docs = create_doc(url)
    
    # Define necessary tags
    
    h2_tages = Docs.find_all('h2')
    a_tages = Docs.find_all('a')
    p_tages = Docs.find_all('p')
    div_tages = Docs.find_all('div', class_ = 'content')
    
    # Get titles
    
    Title = get_titles(h2_tages,a_tages)
    
    # Get links
    Links = get_links(h2_tages,a_tages)
    
    # get release dates
    
    Released_on = dates(div_tages, p_tages)
    
    return Title, Links, Released_on

Let's test this function using our initial shows URL.

In [None]:
page_1 = TV_Shows(shows_url)
page_1

Before we proceed, let's save a copy of our work to avoid losing it due to timeout.

In [None]:
jovian.commit()

## Get Links and Extract Information of 9 Other Pages

Now let's find the links that will take us to the other 9 pages, each containing 20 TV shows, where we will be able to get the same information.

![img3.png](https://i.imgur.com/2AHwAME.png)

As we can see from the image above, the "Load More" link is in an a-tag of class attribute "no_click load_more".
Let's find it using .find_all.

In [None]:
load = docu.find_all('a', class_ = 'no_click load_more')
load

There are 3 items in the list, but we are only interested in the second one, i.e. index 1.

In [None]:
next_page = load[1]['href']
next_page

Notice the last character in the output for the 'next_page' variable above. It is indicating the next page number. We can obtain any page number by changing that last character to whichever page number we desire and then adding a base url to it.

In [None]:
base_url = 'https://www.themoviedb.org'
base_url

Let's get the links for page 2 - 10 using a 'for loop'.

In [None]:
page_links = []
for i in range(2,11):
    base_url = 'https://www.themoviedb.org'
    # All characters of 'next_page', except the last one, is added to i which is converted to a string
    x = next_page[:-1] + str(i)
    page_links.append(base_url + x)
print(page_links)
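
Replacing only the last character works while the page number in 'next_page' has a single digit. A slightly more general sketch replaces the whole trailing number with a regular expression. The value of `sample_next_page` is an assumed shape for the 'Load More' link, and `build_page_links` is a hypothetical helper.

```python
import re

site_url = 'https://www.themoviedb.org'

# Assumed shape of the 'Load More' href; the real value comes from the page.
sample_next_page = '/tv?page=2'


def build_page_links(next_href, first_page, last_page):
    """Replace the trailing page number in next_href with each number in the range."""
    return [site_url + re.sub(r'\d+$', str(n), next_href)
            for n in range(first_page, last_page + 1)]


sample_page_links = build_page_links(sample_next_page, 2, 10)
print(sample_page_links[0], sample_page_links[-1])
```
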

Let's create a new list containing all the links to the 10 pages we are interested in. 

First, we will put our initial shows_url into a list and then add it to the 9 other page links.

In [None]:
page1_url = [shows_url]
page1_url

In [None]:
# create a new list containing links to all 10 pages
pages_urls = page1_url + page_links
pages_urls

Let's now extract 200 show Titles with their release dates and links using a for loop.

In [None]:
Titles = []
Release_Dates = []
URLs = []

for url in pages_urls:
    series = TV_Shows(url)
    Titles.append(series[0])
    Release_Dates.append(series[2])
    URLs.append(series[1])
    
# print the Titles to see sample of output
print(Titles)

Observing the output above, we can see that it is actually a list containing 10 lists. But we want each of our outputs to be in a single flat list.

We can achieve this using itertools, a Python module with a collection of functions for handling iterators. They make iterating through iterables like lists and strings very easy. One such function is chain().

https://www.geeksforgeeks.org/python-itertools-chain/#:~:text=chain()%20function,thus%20explicitly%20converted%20into%20iterables.

In [None]:
from itertools import chain

In [None]:
Titles = list(chain(*Titles))
len(Titles), Titles[:5]
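
For reference, here is a small self-contained sketch of the flattening step on placeholder data. It uses `chain.from_iterable`, which takes the outer list directly and is equivalent to unpacking it with `*`.

```python
from itertools import chain

# Placeholder nested list standing in for the per-page results.
nested = [['A', 'B'], ['C'], ['D', 'E']]

# chain.from_iterable(nested) yields the same items as chain(*nested).
flat = list(chain.from_iterable(nested))
print(flat)
```
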

In [None]:
Release_Date = list(chain(*Release_Dates))
len(Release_Date), Release_Date[:5]

Something seems to be wrong with the release dates: the list is supposed to contain 200 items, yet we are getting 202.
Let's view the first 100 to see what the problem might be.

In [None]:
Release_Date[:100]


After observing, we can see that some entries are duplicated at indexes 66-67 and 69-70.
I am not sure what the reason could be, but we can remove the duplicates manually so as to continue with our work.

In [None]:
# create a copy so that the original list is left unchanged
Released_on = list(Release_Date)
len(Released_on)

Let's list the indexes to remove.

In [None]:
indexes = [67,70]
for index in sorted(indexes, reverse=True):
    del Released_on[index]

In [None]:
len(Released_on)
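
The cause of the extra dates is not established in this notebook. One possibility, stated here only as an assumption, is that some show cards contain more than one p-tag. Under that assumption, a programmatic alternative is to read the title, link and date from each card together, taking only the first p-tag, so the three values always stay aligned. The HTML and the helper `parse_cards` below are hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up HTML: the second card has an extra p-tag, the third has no date.
cards_html = """
<div class="content"><h2><a href="/tv/1">Show One</a></h2><p>Mar 19, 2021</p></div>
<div class="content"><h2><a href="/tv/2">Show Two</a></h2><p>Jan 15, 2021</p><p>Jan 15, 2021</p></div>
<div class="content"><h2><a href="/tv/3">Show Three</a></h2></div>
"""


def parse_cards(doc):
    """Return one (title, link, date) tuple per card that has a title link."""
    rows = []
    for card in doc.find_all('div', class_='content'):
        link_tag = card.select_one('h2 > a')
        if link_tag is None:
            continue  # not a show card
        date_tag = card.find('p')  # first p-tag only, so extras are ignored
        date = date_tag.text.strip() if date_tag is not None else 'NO DATE GIVEN'
        rows.append((link_tag.text.strip(), link_tag['href'], date))
    return rows


card_rows = parse_cards(BeautifulSoup(cards_html, 'html.parser'))
print(card_rows)
```

Because each tuple comes from a single card, the number of dates can never exceed the number of titles.
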

Next are the show URLs.

In [None]:
Show_URL = list(chain(*URLs))
len(Show_URL), Show_URL[:5]

Now we have all 200 TV Shows with their Titles, Release dates and links.

## To create a Pandas Dataframe and a CSV file for The TV Shows

Now that I have the title, release date and link for each of the 200 most popular shows on this website, I would like to create a dataframe using pandas.

Pandas is an open-source Python library used for data analysis and manipulation. In particular, it offers data structures and operations for manipulating numerical tables and time series.

First off, I will have to import the pandas library.


In [None]:
import pandas as pd

Let's create a dictionary with the keys 'Show Title', 'Date Released' and 'Show Link', and attach our scraped information as the value of each key respectively.

In [None]:
data = {'Show Title':Titles,
       'Date Released': Released_on,
       'Show Link': Show_URL}

In [None]:
data

In [None]:
# create a pandas dataframe
TV_shows_df = pd.DataFrame(data)

In [None]:
TV_shows_df

Pandas truncates the display of a dataframe with many rows or columns, hence we can only view the first and last five rows. Of course there are other ways to view the whole content.
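
One such way, sketched here on placeholder data, is `pd.option_context`, which lifts the row limit only inside a `with` block and restores the previous setting afterwards. The dataframe `demo_df` is hypothetical and stands in for the 200 scraped rows.

```python
import pandas as pd

# Placeholder data standing in for the 200 scraped rows.
demo_df = pd.DataFrame({'Show Title': ['Show {}'.format(n) for n in range(200)]})

default_rows = pd.get_option('display.max_rows')

# Inside the block no rows are hidden; the setting is restored on exit.
with pd.option_context('display.max_rows', None):
    full_view = repr(demo_df)

print(len(full_view.splitlines()))
```

In a notebook cell, the dataframe itself can be displayed inside the `with` block instead of calling `repr`.
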

We can now write and save this dataframe as a csv file. To access this file, we can click the file button on the top-left of this page and then click open.

In [None]:
# save data as a csv file without the index column (row numbers)
TV_shows_df.to_csv('TVShows.csv', index=False)

Let's have a sample view of the csv file.

In [None]:
!head TVShows.csv

In [None]:
jovian.commit(files= ['TVShows.csv'])

## Get Information About A TV Series Using Its Link

As stated in the project objectives, I will be trying to obtain the following information for each TV show:

- creator
- description
- genre
- parental guidelines
- Top casts


I will be using the link to the second TV series, 'The Falcon and the Winter Soldier', to obtain information about that particular series, and then use the same method to get similar info about the others.

![image.png](https://i.imgur.com/PqKRAX4.png)

In [None]:
Series_url = Show_URL[1]
Series_url

Let's obtain a BeautifulSoup object using our previously defined function 'create_doc(url)'.

In [None]:
docu2 = create_doc(Series_url)

Let me get some important tags

In [None]:
a_tags2 = docu2.find_all('a')
len(a_tags2)

In [None]:
p_tags2 =docu2.find_all('p')
len(p_tags2)

### CREATOR...

![image.png](https://i.imgur.com/rH5MzsF.png)

As can be seen in the image above, the creator name is in an a-tag inside an li-tag with class name 'profile'.

But the a-tag is also inside a p-tag, so I will access the p-tag as well.

In [None]:
li_tag = docu2.find_all('li',class_ = 'profile')
li_tag

In [None]:
creator = li_tag[0].p.text
creator

But it is also possible that a show has more than one creator, or that the creator's name is not mentioned on the web page. Hence, we will take both cases into consideration while writing a function for finding the show's creator(s).

We will also need to get the ol-tags with class attribute 'people no_image'.

In [None]:
def showmaker(showurl):
    # create a Beautifulsoup object
    doc = create_doc(showurl)
    
    # Get li-tags and ol-tags
    ol_tags = doc.find_all('ol', class_ = 'people no_image')
    li_tags = doc.find_all('li',class_ = 'profile')
    
    # create a list for adding all creators 
    creators = []
    
    for ol in ol_tags:
        for li in li_tags:
            # Check if li tags are in ol tags
            if li in ol:
                # Add all li creator texts to the creator list
                creators.append(li.p.text)
                
    # If creator name is not given:     
    if len(creators) == 0:
        return "Creator name not Given"
    # Return list of creators
    return creators
    
    

In [None]:
showmaker(Show_URL[1])

In [None]:
showmaker(Show_URL[15])

### DESCRIPTION...

The TV series description is in the div-tag of class 'overview'.

In [None]:
div_tag = docu2.find('div', class_ = 'overview')
overview = div_tag.text.strip()
overview

Let's write a function for the show's description

In [None]:
def Description(showurl):
    # download Show page and create a Beautifulsoup object
    doc1 = create_doc(showurl)
   
    # Find the div-tag that holds the overview (None if it is missing)
    div_tags = doc1.find('div', class_ = 'overview')
    
    # If no overview given...
    if div_tags is None or len(div_tags.text.strip()) == 0:
        return "No Overview Given"
    return div_tags.text.strip()

In [None]:
Description(Show_URL[12])

### GENRE...

The show's genre is in the span-tag with class name 'genres'

![image.png](https://i.imgur.com/Ov5lGLM.png)

In [None]:
genre = docu2.find_all('span', class_ = 'genres')

In [None]:
Genres = []
for g in genre:
    for a in a_tags2:
        if a in g:
            Genres.append(a.text)
print(Genres)

Let's write a function for the show's genre

In [None]:
def Genre(showurl):
    # download Show page and create a Beautifulsoup object
    doc1 = create_doc(showurl)
   
    genre = doc1.find_all('span', class_ = 'genres')
    a_tag = doc1.find_all('a')
    
    Genres = []
    for g in genre:
        for a in a_tag:
            if a in g:
                Genres.append(a.text)
    if len(Genres) == 0:
        return 'Genre not given'
    return Genres

In [None]:
Genre(Show_URL[0])

In [None]:
Genre(Show_URL[76])

### PARENTAL GUIDELINES...

The Parental Guidelines rating shows how suitable a TV series is for viewing by different age groups. Some examples of age restriction tags are TV-MA (mature, adult audiences), TV-14 (unsuitable for children under 14 years of age), TV-G (generally suited for all audiences), etc.

This information is stored in the span-tag with class name 'certification' in our web page's html.

In [None]:
age = docu2.find('span', class_ = 'certification').text.strip()
age

I noticed that some of the shows don't have this information available, hence I had to modify the function to deal with such cases.

In [None]:
def viewer_suitability(showurl):
    # download Show page and create a Beautifulsoup object
    doc1 = create_doc(showurl)
    
    certification = doc1.find('span', class_ = 'certification')
    if certification is None:
        return "Not Available"
    ages = certification.text.strip()
    
    return ages

In [None]:
viewer_suitability(Show_URL[14])

In [None]:
viewer_suitability(Show_URL[45])

### TOP CASTS ...

![image-2.png](https://i.imgur.com/4unFjl7.png)

The casts tag is in an li-tag of class 'card' which is also in an ol-tag as seen in the image above.

In [None]:
li_tagss = docu2.find_all('li',class_ = 'card')

In [None]:
ol_tagss = docu2.find_all('ol', class_ = 'people scroller')

In [None]:
cast = []
    
for ol in ol_tagss:
    for li in li_tagss:
        # Check if li tags are in ol tags
        if li in ol:
            # Add all li cast texts to the cast list
            cast.append(li.p.text)
print(cast)           

In [None]:
def Series_cast(showurl):
    # download Show page and create a Beautifulsoup object
    doc1 = create_doc(showurl)
   
    # Find all ol and li tags
    ol_tags = doc1.find_all('ol', class_ = 'people scroller')
    li_tags = doc1.find_all('li',class_ = 'card')
    
    # create a list to add all casts
    casts = []
    
    for ol in ol_tags:
        for li in li_tags:
            # Check if li tags are in ol tags
            if li in ol:
                # Add all li cast texts to the casts list
                casts.append(li.p.text)
                
    # If casts names are not given:     
    if len(casts) == 0:
        return "Casts name not Given"
    # Return list of casts
    return casts

In [None]:
Series_cast(Show_URL[4])

So far we have been able to extract details such as a show's creator(s), an overview of the show, the genre, the age restriction guidelines, as well as the top cast. We have also been able to create functions that make it easy to get this info without writing long lines of code.

## A Show's Information DataFrame

Let's create a pandas dataframe for the second series with information such as:
- Series Creator
- Genre
- Description
- Parental Guidelines
- Top Casts

In [None]:
Show_Creators = showmaker(Show_URL[1])
Show_Creators

In [None]:
Overview = Description(Show_URL[1])
Overview

In [None]:
Genres = Genre(Show_URL[1]) 
Genres

In [None]:
Age_restriction = viewer_suitability(Show_URL[1])
Age_restriction

In [None]:
Casts = Series_cast(Show_URL[1])
print(Casts)

Let's use a dictionary of lists to create a dataframe for the second show.


First, we will create a dictionary that consists of the column names as keys and the information we just generated as values.

In [None]:
column ={'Series Creator':Show_Creators, 'Genre':Genres, 'Overview':Overview, 'Age Suitability':Age_restriction, ' Top Casts':Casts}

In [None]:
# To display the full contents of all columns
pd.set_option('max_colwidth', None)

In [None]:
falcon_wintersoldier_df = pd.DataFrame(list(column.items()),columns = ['Item','Details'])
falcon_wintersoldier_df

Let's create a function that can take any show's URL and return a dataframe containing the above information.

In [None]:
def shows_df(url):
    
    # Extract all required info using the created functions
    Creator = showmaker(url)
    Genres = Genre(url)
    Overview = Description(url)
    Age_restr = viewer_suitability(url)
    Casts = Series_cast(url)
    
    # To display the full contents of all columns without truncating.
    pd.set_option('max_colwidth', None)
    
    # Create a dataframe
    column ={'Series Creator':Creator,
             'Genre':Genres, 'Overview':Overview, 'Age Suitability':Age_restr, ' Top Casts':Casts}
    return pd.DataFrame(list(column.items()),columns = ['Item','Details'])
    


In [None]:
shows_df(Show_URL[12])

In [None]:
Titles[12]

## Scrape Shows And Create Their CSV File

To do this, we will first create a function 'create_show_csv' that takes a show's url and title as arguments and writes its csv file.

At this point, it will be wise to import the os module to help us navigate around files in the system. This module provides a portable way of using operating system dependent functionality. Click the link to learn more.
https://docs.python.org/3/library/os.html

In [None]:
import os

In [None]:
def create_show_csv(url,title):
    # name of the csv file
    filename = title + '.csv'
    
    # check if file already exists
    if os.path.exists(filename):
        print('The File {} already exists, skipping...'.format(filename))
        return
    # create a csv file
    dataframe = shows_df(url)
    dataframe.to_csv(filename, index = None)
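
Some show titles may contain characters such as '/' or ':' that are not allowed in file names on common systems, which would make the file creation fail. A minimal sketch of a clean-up step follows. The helper `safe_filename` is hypothetical and the title in the example is made up.

```python
import re


def safe_filename(title, extension='.csv'):
    """Replace characters that are not allowed in file names on common systems."""
    cleaned = re.sub(r'[<>:"/\\|?*]', '_', title).strip()
    return cleaned + extension


# Made-up title containing a slash and a colon.
print(safe_filename('Example/Show: Part 1'))
```

Inside 'create_show_csv', the line that builds the file name could call such a helper instead of concatenating the raw title.
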
    

Now we will create a function 'tvshows_csv' that will be able to iterate through all 200 shows and create a csv file for each one.

In [None]:
def tvshows_csv():
    print("Scraping top 200 TV shows")
    
    df = TV_shows_df
    # iterate through our TV_Shows dataframe along rows with column name show link and show title
    for index, rows in df.iterrows():
        print('Scraping {}...'.format(rows['Show Title']))
        create_show_csv(rows['Show Link'], rows['Show Title'])

Let's call our function and see how it goes..

In [None]:
tvshows_csv()

Let's see an example of one of the csv files.

![Am.dad](https://i.imgur.com/zeVc8J0.png)

We have successfully scraped all 200 TV Shows, obtained some information about each of them and stored them as a csv file using their names.

We can view and even download these files by clicking file-open.

## Create Directory To Store CSV Files

Let us create a directory/folder where we can store all of these csv files we just created. We will use the os.makedirs() method to do this.

In [None]:
# folder/directory name
directory = "TVShows"

In [None]:
# parent directory path
par_dir = "D:/Web scrapping project"

Joining both parent and 'child directories'...

In [None]:
path = os.path.join(par_dir, directory)

In [None]:
os.makedirs(path, exist_ok=True)
print("Directory '%s' created" % directory)

Now that our folder's been created, we can mark all our csv files and move them into it. Just like in the image below.

![csvs](https://i.imgur.com/ljzcslF.png)
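
Moving the files can also be scripted rather than done by hand. The sketch below uses only the standard library and demonstrates the idea in a temporary folder with placeholder files. The helper `move_csv_files` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def move_csv_files(source_dir, target_dir):
    """Move every .csv file in source_dir into target_dir and return the moved names."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() collects the matches first, so moving files does not disturb the listing.
    for csv_file in sorted(source.glob('*.csv')):
        shutil.move(str(csv_file), str(target / csv_file.name))
        moved.append(csv_file.name)
    return moved


# Demonstration in a temporary folder with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('Show One.csv', 'Show Two.csv', 'notes.txt'):
        (Path(tmp) / name).write_text('placeholder')
    moved_files = move_csv_files(tmp, Path(tmp) / 'TVShows')
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(moved_files, remaining)
```

Note that this moves every csv file in the source folder, so a summary file such as 'TVShows.csv' would be moved as well unless it is excluded.
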

## Summary

We have been able to successfully extract some important information from the website https://www.themoviedb.org/tv using web scraping techniques.

- We were able to download the web page using requests library and then parse the HTML codes in the downloaded page using the Python BeautifulSoup library.

- We used our knowledge of HTML tags to extract some information such as the Show's name, the date it was first released as well as the show's web link.

- Because we were interested in getting the top 200 popular shows and realized that not all 200 were on our downloaded page, we had to find the links to 9 other pages of the same website (each page contained a maximum of 20 shows) and download the same information as for the first page.

- We then used the information gotten to create a Pandas dataframe and also saved it as a csv file.

- We went further to get information about all the shows. The information includes the show's creator, a brief description, the genre, the parental guidelines and the top cast. We created a dataframe as well as csv files for these.
- Finally, we saved all the generated csv files into a created directory and called it 'TVShows'.

Note that in all of this, we were able to write functions that helped us extract this information automatically without having to write long code for each TV show.

One notable challenge I faced was the duplication of some outputs in the release date information. Luckily, we didn't need the release date to get any other data. Although we solved the problem manually, a programmatic solution would be needed for larger datasets.

For future work, perhaps we could have a function that will take a show's title as argument and then return all interesting information about that show.
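
As a starting point for that idea, a title could first be mapped to its link. This is a rough sketch on placeholder data: `find_show_link`, `demo_titles` and `demo_links` are hypothetical, and in the notebook the lists would be Titles and Show_URL.

```python
def find_show_link(title, titles, links):
    """Return the link whose title matches, ignoring case and surrounding spaces."""
    lookup = {t.strip().lower(): link for t, link in zip(titles, links)}
    return lookup.get(title.strip().lower())


# Placeholder data; the links are not real show pages.
demo_titles = ['Show One', 'Show Two']
demo_links = ['https://www.themoviedb.org/tv/101', 'https://www.themoviedb.org/tv/202']

print(find_show_link('show two', demo_titles, demo_links))
```

The returned link could then be passed to 'shows_df' to build the details dataframe. The function returns None when no title matches.
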

### References
- https://stackoverflow.com/questions/44958587/python-beautifulsoup-get-tag-within-a-tag
- https://www.geeksforgeeks.org/create-a-pandas-dataframe-from-lists/
- https://towardsdatascience.com/how-to-show-all-columns-rows-of-a-pandas-dataframe-c49d4507fcf
- https://jovian.ai/learn/zero-to-data-analyst-bootcamp/assignment/project-1-web-scraping-with-python
- https://www.geeksforgeeks.org/python-itertools-chain/#:~:text=chain()%20function,thus%20explicitly%20converted%20into%20iterables.
- https://stackoverflow.com/questions/716477/join-list-of-lists-in-python
- https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigating-using-tag-names
- https://www.geeksforgeeks.org/create-a-directory-in-python/



In [None]:
jovian.commit()