Question 5 -
Write a program to download the data from the given API link and then extract the following data with
proper formatting

Link - http://api.tvmaze.com/singlesearch/shows?q=westworld&embed=episodes

Note - Add code comments wherever needed to make the code easy to understand

Expected Output Data Attributes -

● id - int
● url - string
● name - string
● season - int
● number - int
● type - string
● airdate - date format
● airtime - 12-hour time format
● runtime - float
● average rating - float
● summary - string without html tags
● medium image link - string
● original image link - string

In [1]:
import requests
from bs4 import BeautifulSoup

def download_data(url):
    """Download the JSON payload from the given API URL."""
    response = requests.get(url, timeout=10)  # Send a GET request to the URL
    response.raise_for_status()  # Fail early on a 4xx/5xx response
    data = response.json()  # Parse the JSON body of the response
    return data

def extract_data(data):
    """Return a list with one dictionary of attributes per episode."""
    # Show-level attributes
    show_id = data["id"]  # Show-level ID; not included in the per-episode records
    show_url = data["url"]
    show_name = data["name"]
    episodes = data["_embedded"]["episodes"]

    extracted_data = []
    for episode in episodes:
        # Guard against missing (null) summary, image, or rating values
        summary_html = episode.get("summary") or ""
        image = episode.get("image") or {}
        rating = episode.get("rating") or {}

        # Strip the HTML tags from the summary
        episode_summary = BeautifulSoup(summary_html, "html.parser").get_text()

        # Append the extracted data to the list
        extracted_data.append({
            "id": episode["id"],
            "url": show_url,
            "name": show_name,
            "season": episode["season"],
            "number": episode["number"],
            "type": episode["type"],
            "airdate": episode["airdate"],
            "airtime": episode["airtime"],
            "runtime": episode["runtime"],
            "average rating": rating.get("average"),
            "summary": episode_summary,
            "medium image link": image.get("medium"),
            "original image link": image.get("original")
        })

    return extracted_data

# Provide the URL of the API
url = "http://api.tvmaze.com/singlesearch/shows?q=westworld&embed=episodes"

# Download the data
api_data = download_data(url)

# Extract the desired data attributes
extracted_data = extract_data(api_data)

# Print the extracted data
for episode_data in extracted_data:
    print(episode_data)


{'id': 869671, 'url': 'https://www.tvmaze.com/shows/1371/westworld', 'name': 'Westworld', 'season': 1, 'number': 1, 'type': 'regular', 'airdate': '2016-10-02', 'airtime': '21:00', 'runtime': 68, 'average rating': 8, 'summary': "A woman named Dolores is a free spirit in the Old West... and unaware that she's actually an android, programmed to entertain rich guests seeking to act out their fantasies in an idealized vision of the 1880s. However, the people in charge soon realize that their androids are acting in ways that they didn't anticipate.", 'medium image link': 'https://static.tvmaze.com/uploads/images/medium_landscape/78/195475.jpg', 'original image link': 'https://static.tvmaze.com/uploads/images/original_untouched/78/195475.jpg'}
{'id': 911201, 'url': 'https://www.tvmaze.com/shows/1371/westworld', 'name': 'Westworld', 'season': 1, 'number': 2, 'type': 'regular', 'airdate': '2016-10-09', 'airtime': '21:00', 'runtime': 60, 'average rating': 7.7, 'summary': 'Bernard suspects that s

Explanation:

The download_data function takes a URL as input, sends a GET request to that URL using the requests library, and retrieves the JSON data from the response.
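The URL used here carries two query parameters, q and embed. As a minimal sketch, it could be assembled from its parts with the standard library instead of being hard-coded. The helper name build_search_url and the constant BASE_URL are hypothetical and not part of the original program.

```python
from urllib.parse import urlencode

# Endpoint taken from the question; the helper below is illustrative only.
BASE_URL = "http://api.tvmaze.com/singlesearch/shows"

def build_search_url(show_name, embed="episodes"):
    """Build the single-search URL for a show, embedding its episodes."""
    # urlencode escapes spaces and special characters in the values
    query = urlencode({"q": show_name, "embed": embed})
    return f"{BASE_URL}?{query}"

print(build_search_url("westworld"))
# http://api.tvmaze.com/singlesearch/shows?q=westworld&embed=episodes
```

The same effect can be had by passing a params dictionary to requests.get, which encodes the query string itself.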

The extract_data function takes the downloaded data as input. It first reads the show-level fields (show ID, show URL, and show name) and the list of episodes stored under data["_embedded"]["episodes"].

The function loops over each episode and extracts the necessary attributes such as episode ID, season, number, type, airdate, airtime, runtime, average rating, summary, medium image link, and original image link.
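The question asks for airtime in 12-hour format, airdate as a date, and runtime and average rating as floats, whereas the printed output shows airtime as '21:00' and runtime and rating as plain numbers (68 and 8). The following is a minimal sketch of those conversions, assuming airtime arrives as a 24-hour "HH:MM" string, airdate as "YYYY-MM-DD", and that any of these values may be missing. The helper names are hypothetical.

```python
from datetime import datetime

def to_12_hour(airtime):
    """Convert a 24-hour 'HH:MM' string to 12-hour format, e.g. '09:00 PM'."""
    if not airtime:  # Empty string or None: nothing to convert
        return None
    return datetime.strptime(airtime, "%H:%M").strftime("%I:%M %p")

def to_date(airdate):
    """Convert a 'YYYY-MM-DD' string to a datetime.date object."""
    if not airdate:
        return None
    return datetime.strptime(airdate, "%Y-%m-%d").date()

def to_float(value):
    """Convert a number to float, passing None through unchanged."""
    return None if value is None else float(value)

print(to_12_hour("21:00"))    # 09:00 PM
print(to_date("2016-10-02"))  # 2016-10-02
print(to_float(68))           # 68.0
```

Inside the loop, these helpers could wrap the raw values, for example to_12_hour(episode["airtime"]) and to_float(episode["runtime"]).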

The HTML tags are removed from the episode summary using the BeautifulSoup library, which is used to parse and process HTML content.
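For comparison, the same tag stripping can be sketched with only the standard library. This is an alternative to BeautifulSoup, not what the program above uses, and the names _TextCollector and strip_tags are hypothetical.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collect only the text nodes of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # Decode entities such as &amp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    """Return the text content of an HTML fragment; empty string if missing."""
    if not html:
        return ""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # Flush any text still buffered by the parser
    return "".join(collector.parts).strip()

print(strip_tags("<p>Hello <b>world</b></p>"))  # Hello world
```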

The extracted data is stored in a list of dictionaries, where each dictionary represents the attributes of an episode.
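Because each episode is a plain dictionary, it can be printed in a more readable layout than the raw dictionary shown in the output. The sketch below aligns the keys in a column. The function name format_record is hypothetical, and the sample record uses a few values from the first line of the output.

```python
def format_record(record):
    """Render a dictionary as aligned 'key : value' lines."""
    if not record:
        return ""
    width = max(len(key) for key in record)  # Pad every key to the longest one
    return "\n".join(f"{key:<{width}} : {value}" for key, value in record.items())

# Sample record with a subset of the attributes
sample = {"id": 869671, "season": 1, "number": 1, "airdate": "2016-10-02"}
print(format_record(sample))
```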

Finally, the program calls the download_data function with the provided URL, passes the downloaded data to the extract_data function, and prints the resulting dictionary for each episode.