<div class="alert alert-block alert-info">
Author:<br>Felix Gonzalez, P.E. <br> Adjunct Instructor, <br> Division of Professional Studies <br> Computer Science and Electrical Engineering <br> University of Maryland Baltimore County <br> fgonzale@umbc.edu
</div>

This notebook provides an overview of basic concepts in working with various types of online data sources, such as online files (e.g., CSV), website data (e.g., web crawling), and application programming interfaces (APIs), using the Python programming language and Jupyter Notebooks.

# Table of Contents
[Python Libraries in this Notebook](#Python-Libraries-in-this-Notebook)

[Accessing Website Data](#Accessing-Website-Data)

[Website Files](#Website-Files)

[Web-Crawling](#Web-Crawling)

[Parsing Website HTML Data](#Parsing-Website-HTML-Data)

[Websites and PDF Files](#Websites-and-PDF-Files)

[Application Programming Interfaces (API)](#Application-Programming-Interfaces-(API))

[API Requests](#API-Requests)

- [API Request Example: Open Notify](#API-Request-Example:-Open-Notify)

- [API Requests Example: U.S. Federal Government Websites](#API-Requests-Example:-U.S.-Federal-Government-Websites)

- [API Requests Example: U.S. Department of Energy](#API-Requests-Example:-U.S.-Department-of-Energy)

- [API Requests Example: U.S. Federal Goverment Federal Registry](#API-Requests-Example:-U.S.-Federal-Goverment-Federal-Registry)

- [Automating API Requests Example: U.S. Federal Goverment Federal Registry](#Automating-API-Requests-Example:-U.S.-Federal-Goverment-Federal-Registry)

# Python Libraries in this Notebook
[Return to Table of Contents](#Table-of-Contents)

In [1]:
# Install PyPDF2 if needed. Run one time only. Restart notebook after installing.
#!pip install PyPDF2

In [59]:
# Library Loading
import requests
import json
import pandas as pd

import urllib.request
from urllib.request import urlopen
import io

import csv
import re
from PyPDF2 import PdfReader 

from datetime import datetime, date, timedelta
import time

from bs4 import BeautifulSoup

# Accessing Website Data
[Return to Table of Contents](#Table-of-Contents)

Data scientists often need to access data from websites. This can include files hosted on a website (e.g., GitHub), data from the webpage itself (e.g., HTML), or data obtained through an application programming interface (API). The Requests library is an elegant and simple HTTP library for Python that will be used in most cases when making requests to a website.

Documentation References:
- https://docs.python-requests.org/en/v2.0.0/

# Website Files
[Return to Table of Contents](#Table-of-Contents)

In [3]:
# Web link: https://github.com/JuliaData/CSV.jl/blob/main/test/testfiles/Sacramentorealestatetransactions.csv
CSV_URL = 'https://raw.githubusercontent.com/JuliaData/CSV.jl/main/test/testfiles/Sacramentorealestatetransactions.csv'

In [4]:
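with requests.Session() as s:
    download = s.get(CSV_URL) # Download the file with a GET request.
    
    decoded_content = download.content.decode('utf-8') # Decode the response bytes into text.

    cr = csv.reader(decoded_content.splitlines(), delimiter=',') # Parse the text one line at a time.
    my_list = list(cr) # Each row becomes a list of strings.
    
    for row in my_list:
        print(row)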

['street', 'city', 'zip', 'state', 'beds', 'baths', 'sq__ft', 'type', 'sale_date', 'price', 'latitude', 'longitude']
['3526 HIGH ST', 'SACRAMENTO', '95838', 'CA', '2', '1', '836', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '59222', '38.631913', '-121.434879']
['51 OMAHA CT', 'SACRAMENTO', '95823', 'CA', '3', '1', '1167', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '68212', '38.478902', '-121.431028']
['2796 BRANCH ST', 'SACRAMENTO', '95815', 'CA', '2', '1', '796', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '68880', '38.618305', '-121.443839']
['2805 JANETTE WAY', 'SACRAMENTO', '95815', 'CA', '2', '1', '852', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '69307', '38.616835', '-121.439146']
['6001 MCMAHON DR', 'SACRAMENTO', '95824', 'CA', '2', '1', '797', 'Residential', 'Wed May 21 00:00:00 EDT 2008', '81900', '38.51947', '-121.435768']
['5828 PEPPERMILL CT', 'SACRAMENTO', '95841', 'CA', '3', '1', '1122', 'Condo', 'Wed May 21 00:00:00 EDT 2008', '89921', '38.662595', '-121.3

In [5]:
# ANOTHER EXAMPLE
# https://data.worldbank.org/topic/climate-change
# The API appears to have been updated and the CSV file is no longer directly accessible,
# so the request below returns a 404 status code.

url = 'http://data.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv'
response = requests.get(url)
if response.status_code != 200:
    print('Failed to get data:', response.status_code)
else:
    wrapper = csv.reader(response.text.strip().split('\n'))
    for record in wrapper:
        if record[0] != 'year': # Skip the header row.
            year = int(record[0])
            value = float(record[1])
            print(year, value)

Failed to get data: 404


In [6]:
pd.read_csv(CSV_URL)

Unnamed: 0,street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude
0,3526 HIGH ST,SACRAMENTO,95838,CA,2,1,836,Residential,Wed May 21 00:00:00 EDT 2008,59222,38.631913,-121.434879
1,51 OMAHA CT,SACRAMENTO,95823,CA,3,1,1167,Residential,Wed May 21 00:00:00 EDT 2008,68212,38.478902,-121.431028
2,2796 BRANCH ST,SACRAMENTO,95815,CA,2,1,796,Residential,Wed May 21 00:00:00 EDT 2008,68880,38.618305,-121.443839
3,2805 JANETTE WAY,SACRAMENTO,95815,CA,2,1,852,Residential,Wed May 21 00:00:00 EDT 2008,69307,38.616835,-121.439146
4,6001 MCMAHON DR,SACRAMENTO,95824,CA,2,1,797,Residential,Wed May 21 00:00:00 EDT 2008,81900,38.519470,-121.435768
...,...,...,...,...,...,...,...,...,...,...,...,...
980,9169 GARLINGTON CT,SACRAMENTO,95829,CA,4,3,2280,Residential,Thu May 15 00:00:00 EDT 2008,232425,38.457679,-121.359620
981,6932 RUSKUT WAY,SACRAMENTO,95823,CA,3,2,1477,Residential,Thu May 15 00:00:00 EDT 2008,234000,38.499893,-121.458890
982,7933 DAFFODIL WAY,CITRUS HEIGHTS,95610,CA,3,2,1216,Residential,Thu May 15 00:00:00 EDT 2008,235000,38.708824,-121.256803
983,8304 RED FOX WAY,ELK GROVE,95758,CA,4,2,1685,Residential,Thu May 15 00:00:00 EDT 2008,235301,38.417000,-121.397424
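The rows returned by `csv.reader` above are lists of strings, so numeric columns such as `price` still need converting before any arithmetic. The following is a minimal sketch of one way to do that with `csv.DictReader`. The two sample rows are copied from the output above, and the `parse_sales` helper is a hypothetical name introduced only for this illustration.

```python
import csv
import io

# Two sample rows copied from the output above. In the notebook this text
# would come from download.content.decode('utf-8').
sample_csv = (
    "street,city,zip,state,beds,baths,sq__ft,type,sale_date,price,latitude,longitude\n"
    "3526 HIGH ST,SACRAMENTO,95838,CA,2,1,836,Residential,Wed May 21 00:00:00 EDT 2008,59222,38.631913,-121.434879\n"
    "51 OMAHA CT,SACRAMENTO,95823,CA,3,1,1167,Residential,Wed May 21 00:00:00 EDT 2008,68212,38.478902,-121.431028\n"
)

INT_COLUMNS = ('beds', 'baths', 'sq__ft', 'price')
FLOAT_COLUMNS = ('latitude', 'longitude')

def parse_sales(csv_text):
    """Return a list of dicts with the numeric columns converted from strings."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for column in INT_COLUMNS:
            record[column] = int(record[column])
        for column in FLOAT_COLUMNS:
            record[column] = float(record[column])
        rows.append(record)
    return rows

sales = parse_sales(sample_csv)
print(sales[0]['street'], sales[0]['price'])
```

`pd.read_csv` performs this type inference automatically, which is one reason it is usually the more convenient choice when the data is headed for a dataframe anyway.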


# Web-Crawling 
[Return to Table of Contents](#Table-of-Contents)

Note that web crawling may be prohibited on some websites. Always check the website's "robots.txt" file, which can be found under the main site URL. For example, for UMBC it can be found at www.umbc.edu/robots.txt. Read the robots.txt file of each website to have a better understanding of what can and can't be extracted.

When extracting data from websites it is recommended to use the API developed by the website owners if one exists. APIs are discussed later in this notebook.
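The robots.txt check can also be done programmatically. The sketch below uses the standard library's `urllib.robotparser` on a made-up robots.txt file. The rules, the `my-course-crawler` user agent, and the `example.edu` URLs are all assumptions for illustration and do not reflect any real site's policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
sample_robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(sample_robots_txt.splitlines())

# can_fetch(useragent, url) applies the parsed rules to a URL.
allowed_public = parser.can_fetch('my-course-crawler', 'https://www.example.edu/programs/')
allowed_private = parser.can_fetch('my-course-crawler', 'https://www.example.edu/private/report.html')
print(allowed_public, allowed_private)
```

For a live site, `parser.set_url(...)` followed by `parser.read()` downloads and parses the real file instead of a sample string.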

In [7]:
response = requests.get('https://professionalprograms.umbc.edu/data-science/')
print(response.text)

<!doctype html>
<html lang="en-US"> 
<head>
	<meta charset="UTF-8" /><script type="text/javascript">(window.NREUM||(NREUM={})).init={ajax:{deny_list:["bam.nr-data.net"]}};(window.NREUM||(NREUM={})).loader_config={licenseKey:"NRJS-9b76224478fb57ed3c0",applicationID:"488453678"};;/*! For license information please see nr-loader-rum-1.246.1.min.js.LICENSE.txt */
(()=>{"use strict";var e,t,n={234:(e,t,n)=>{n.d(t,{P_:()=>h,Mt:()=>m,C5:()=>s,DL:()=>w,OP:()=>j,lF:()=>S,Yu:()=>_,Dg:()=>v,CX:()=>c,GE:()=>A,sU:()=>T});var r=n(8632),i=n(9567);const a={beacon:r.ce.beacon,errorBeacon:r.ce.errorBeacon,licenseKey:void 0,applicationID:void 0,sa:void 0,queueTime:void 0,applicationTime:void 0,ttGuid:void 0,user:void 0,account:void 0,product:void 0,extra:void 0,jsAttributes:{},userAttributes:void 0,atts:void 0,transactionName:void 0,tNamePlain:void 0},o={};function s(e){if(!e)throw new Error("All info objects require an agent identifier!");if(!o[e])throw new Error("Info for ".concat(e," was never set"));

In [8]:
# LET'S DO SOME WEBCRAWLING

# connect to a url
website = urlopen('https://professionalprograms.umbc.edu/data-science/')
# read html code
html = website.read().decode('utf-8')

# use re.findall to get all the links
# (the pattern has two capturing groups, so each match is returned as a tuple)
links = re.findall('"((http|ftp)s?://.*?)"', html)

# print links
for onelink in links:
    print(onelink)

('https://', 'http')
('http://custom.transaction', 'http')
('https://js-agent.newrelic.com/', 'http')
('https://gmpg.org/xfn/11', 'http')
('https://fonts.googleapis.com/css?family=Roboto:400,500,900', 'http')
('https://professionalprograms.umbc.edu/data-science/', 'http')
('https://professionalprograms.umbc.edu/data-science/', 'http')
('https://www.facebook.com/UMBCProfessionalGradPrograms/', 'http')
('https://professionalprograms.umbc.edu/wp-content/uploads/2019/07/Data-Science-Certificate.jpg', 'http')
('https://schema.org', 'http')
('https://professionalprograms.umbc.edu/data-science/', 'http')
('https://professionalprograms.umbc.edu/data-science/', 'http')
('https://professionalprograms.umbc.edu/#website', 'http')
('https://professionalprograms.umbc.edu/data-science/#primaryimage', 'http')
('https://professionalprograms.umbc.edu/data-science/#primaryimage', 'http')
('https://professionalprograms.umbc.edu/wp-content/uploads/2020/10/best-colleges-national-universities-300x300.png', '
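Each result above is a tuple because the pattern contains two capturing groups, and many links are repeated. The following is a small sketch of one possible variation: a non-capturing group `(?:...)` so that `re.findall` returns plain strings, plus order-preserving de-duplication. The HTML fragment is a hypothetical sample. Note also that `[^"]+` requires at least one character after `://`, which differs slightly from the original `.*?`.

```python
import re

# Hypothetical HTML fragment used only to demonstrate the pattern.
sample_html = (
    '<a href="https://professionalprograms.umbc.edu/data-science/">Data Science</a>'
    '<link rel="profile" href="https://gmpg.org/xfn/11">'
    '<a href="https://professionalprograms.umbc.edu/data-science/">Again</a>'
)

# (?:...) groups without capturing. With a single capturing group left,
# findall returns a list of strings instead of a list of tuples.
link_pattern = re.compile(r'"((?:http|ftp)s?://[^"]+)"')
links = link_pattern.findall(sample_html)

# dict.fromkeys keeps the first occurrence of each link and preserves order.
unique_links = list(dict.fromkeys(links))
for link in unique_links:
    print(link)
```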

In [9]:
# EMAIL CRAWLING!!!
# Note that web crawling may be prohibited on some websites. Always check the website's "robots.txt" file.
website = urlopen('https://dps.umbc.edu/staff-directory/')
# read html code
html = website.read().decode('utf-8')

# use re.findall to get all the email addresses
email_addresses = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", html)

# print email_addresses
for an_email_address in email_addresses:
    print(an_email_address)

abdullah@umbc.edu
sahuja1@umbc.edu
llen@umbc.edu
abdullah@umbc.edu
sahuja1@umbc.edu
llen@umbc.edu
abdullah@umbc.edu
abdullah@umbc.edu
sahuja1@umbc.edu
sahuja1@umbc.edu
llen@umbc.edu
llen@umbc.edu
bae2@umbc.edu
bae2@umbc.edu
sbansal1@umbc.edu
sbansal1@umbc.edu
joshuaba@umbc.edu
joshuaba@umbc.edu
cbateman1@umbc.edu
cbateman1@umbc.edu
cbehm@umbc.edu
cbehm@umbc.edu
benson@umbc.edu
benson@umbc.edu
bermud@umbc.edu
bermud@umbc.edu
dcardona@umbc.edu
dcardona@umbc.edu
mceasar1@umbc.edu
mceasar1@umbc.edu
rashad.cheeks@umbc.edu
rashad.cheeks@umbc.edu
nancyc@umbc.edu
nancyc@umbc.edu
nandita@umbc.edu
nandita@umbc.edu
vu58619@umbc.edu
vu58619@umbc.edu
ladavis@umbc.edu
ladavis@umbc.edu
faithdinh@umbc.edu
faithdinh@umbc.edu
kedmonds@umbc.edu
kedmonds@umbc.edu
eedwards@umbc.edu
eedwards@umbc.edu
reisen@umbc.edu
reisen@umbc.edu
makebaellis@umbc.edu
makebaellis@umbc.edu
jfitzpatrick@esri.com
jfitzpatrick@esri.com
tfoster@umbc.edu
tfoster@umbc.edu
gardenghi@umbc.edu
gardenghi@umbc.edu
jgilless@umbc.edu
jg
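The output above lists most addresses more than once, because an address typically appears both in a link and in the visible text of the page. A minimal sketch of removing the duplicates follows. The sample text and the `example.edu` addresses are placeholders, and the `re.IGNORECASE` flag is an addition to the original pattern so that upper-case letters are also matched.

```python
import re

# Hypothetical page text; the addresses are placeholders, not real contacts.
sample_text = 'Contact Jane.Doe@example.edu or jdoe2@example.edu. Again: jane.doe@example.edu'

email_pattern = re.compile(r'[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+', re.IGNORECASE)

# Lower-case each match so the same address in different casing counts once,
# then use a set to drop duplicates and sort for a stable order.
unique_emails = sorted({match.lower() for match in email_pattern.findall(sample_text)})
print(unique_emails)
```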

# Parsing Website HTML Data
[Return to Table of Contents](#Table-of-Contents)

When parsing website or HTML data you can use the Beautiful Soup library. This library is used for pulling data out of HTML and XML files, including the pages of static websites, which are typically delivered as HTML. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

For non-static websites the Selenium library can be used. The selenium package automates web browser interaction from Python, so it can be used for dynamic, JavaScript-based websites.

Documentation References:
- https://beautiful-soup-4.readthedocs.io/en/latest/
- https://selenium-python.readthedocs.io/

Other References:
- http://www.compjour.org/warmups/govt-text-releases/intro-to-bs4-lxml-parsing-wh-press-briefings/
- https://stackabuse.com/guide-to-parsing-html-with-beautifulsoup-in-python/
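As a brief illustration of navigating and searching the parse tree, the sketch below parses a small hypothetical HTML fragment and pulls out a heading and the links. The fragment is made up for this example. `find`, `find_all`, and `get_text` are standard Beautiful Soup calls.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment used only for illustration.
sample_html = """
<html><body>
  <h1 class="entry-title">Data Science</h1>
  <p>See the <a href="https://professionalprograms.umbc.edu/data-science/">program page</a>.</p>
  <p>An anchor without a link: <a name="top">top</a></p>
</body></html>
"""

# Naming the parser explicitly avoids depending on whichever parser is installed.
soup = BeautifulSoup(sample_html, 'html.parser')

# href=True keeps only the <a> tags that actually carry an href attribute.
links = [(a.get_text(strip=True), a['href']) for a in soup.find_all('a', href=True)]

# class_ (with a trailing underscore) filters on the HTML class attribute.
title = soup.find('h1', class_='entry-title').get_text()
print(title, links)
```

Compared with a regular expression over the raw HTML, this approach reads links only from `<a>` tags and keeps each link's visible text alongside its URL.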

In [10]:
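website = urlopen('http://datascience.umbc.edu/')
soup = BeautifulSoup(website, 'html.parser') # Name the parser explicitly to avoid the "no parser was explicitly specified" warning.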

In [11]:
print(type(soup))

<class 'bs4.BeautifulSoup'>


In [12]:
# Uncomment the items below to test the outputs and see how they vary.
#soup
#print(soup)
#print(soup.text)
#soup.get_text()

In [13]:
list_tags_in_HTML = [tag.name for tag in soup.find_all()] # List of all the tags being used.
list_tags_in_HTML_unique = (list(set(list_tags_in_HTML)))
print(list_tags_in_HTML_unique)

['form', 'center', 'iframe', 'fieldset', 'defs', 'link', 'option', 'ul', 'body', 'textarea', 'span', 'h2', 'legend', 'h1', 'h3', 'strong', 'div', 'header', 'br', 'figcaption', 'main', 'h4', 'hr', 'article', 'blockquote', 'select', 'img', 'title', 'path', 'section', 'meta', 'a', 'input', 'pattern', 'li', 'html', 'noscript', 'script', 'svg', 'head', 'p', 'nav', 'label', 'figure', 'em', 'address', 'aside', 'footer', 'style', 'i', 'button', 'rect']


In [14]:
soup.find_all('h4') # Can change to the unique tags above to find all instances of the HTML tag as a list.

[<h4>Average Completion Time</h4>,
 <h4>Credit Hours</h4>,
 <h4>Tuition &amp; Fees</h4>,
 <h4>Start Date</h4>]

In [15]:
soup.find_all('h4')[0].text # Text of the element.

'Average Completion Time'

In [16]:
heading = soup.find_all('h4')
for t in heading:
    print(t.text)

Average Completion Time
Credit Hours
Tuition & Fees
Start Date


In [17]:
soup.find_all(['title', 'h1', 'h4'])

[<title>Data Science Graduate Programs – UMBC Professional Programs</title>,
 <h1 class="entry-title">Data Science</h1>,
 <h4>Average Completion Time</h4>,
 <h4>Credit Hours</h4>,
 <h4>Tuition &amp; Fees</h4>,
 <h4>Start Date</h4>]

In [18]:
# Can Extract the data by HTML Tag.
tags_to_find = soup.find_all(['h1', 'p'])

for element, tag in enumerate(tags_to_find):
    print(f'# ELEMENT: {element}')
    print(f'TAG: {tag.name}') # Prints the tag name of the element
    print(f'CONTENT: {tag.text}') # Prints the text of the element

# ELEMENT: 0
TAG: h1
CONTENT: Data Science
# ELEMENT: 1
TAG: p
CONTENT: The Data Science graduate program at UMBC prepares students to respond to the growing demand for professionals with data science knowledge, skills, and abilities. Our program brings together faculty from a wide range of fields who have a deep understanding of the real-world applications of data analytics. UMBC’s Data Science programs prepare students to excel in data science roles through hands-on experience, rigorous academics, and access to a robust network of knowledgeable industry professionals.
# ELEMENT: 2
TAG: p
CONTENT: These programs were designed with working professionals in mind and offer courses in the evening and online to accommodate students with full-time jobs. With two campuses in Baltimore and Rockville, students can choose the location that best suits their needs. UMBC offers a 10-course Data Science Master’s program (M.P.S. in Data Science) as well as a 4-course post-baccalaureate certificate i

# Websites and PDF Files
[Return to Table of Contents](#Table-of-Contents)

In some cases we may need to extract data from a PDF file. In this example we have various PDF report links that we want to access in order to extract the text data. We want to iterate through all the report links, extract the text of each report, and add it to our dataframe.

In this example we will use PDF files from the U.S. Nuclear Regulatory Commission (US NRC). The PDFs in this example are Licensee Event Report (LER) PDFs and Accident Sequence Precursor Analysis (ASP) Report PDFs. The LERs are reports of events that happen at commercial nuclear power plants. The ASP reports are analysis reports for those LERs that meet specific thresholds.

Documentation References:
- https://pypdf2.readthedocs.io/en/stable/
- US NRC Licensee Event Reports: https://www.nrc.gov/reading-rm/doc-collections/cfr/part050/part050-0073.html
- US NRC ASP Program: https://www.nrc.gov/about-nrc/regulatory/research/asp.html
- US NRC ASP Program MS Power BI Dashboard allows you to export the table of reports and links: https://app.powerbigov.us/view?r=eyJrIjoiNmU2NjJiYjktOTQyYS00OGRhLTk0MGItMmUxNDdlOGI5NTgzIiwidCI6ImU4ZDAxNDc1LWMzYjUtNDM2YS1hMDY1LTVkZWY0YzY0ZjUyZSJ9

In [19]:
NRC_ASP_df = pd.read_csv('./input_data/NRC_ASP_DATA_from_Public_ASP_Dashboard.csv')
print(NRC_ASP_df.shape)
NRC_ASP_df.head(5)

(10, 6)


Unnamed: 0,Plant,Event Date,LER / IR,Description,Result,ASP Analysis
0,D.C. Cook 2,9/4/2020,https://www.nrc.gov/docs/ML2031/ML20311A129.pdf,Manual reactor trip and automatic SI due to fa...,1e-05,https://www.nrc.gov/docs/ML2103/ML21035A236.pdf
1,Duane Arnold,8/10/2020,https://www.nrc.gov/docs/ML2028/ML20283A373.pdf,LOOP caused by high winds during derecho,0.0008,https://www.nrc.gov/docs/ML2105/ML21056A382.pdf
2,Brunswick 1,8/3/2020,https://www.nrc.gov/docs/ML2026/ML20265A162.pdf,LOOP during Hurricane Isaias,2e-05,https://www.nrc.gov/docs/ML2029/ML20294A552.pdf
3,Fitzpatrick,4/10/2020,https://www.nrc.gov/docs/ML2016/ML20161A405.pdf,High pressure coolant injection inoperable due...,3e-06,https://www.nrc.gov/docs/ML2110/ML21105A543.pdf
4,Quad Cities 2,3/30/2020,https://www.nrc.gov/docs/ML2014/ML20149K600.pdf,Electromatic relief valve 3D failed to actuate...,3e-05,https://www.nrc.gov/docs/ML2102/ML21029A319.pdf


In [20]:
# Function checks if the url is alive.
def url_is_alive(url): # Reference: https://gist.github.com/dehowell/884204
    """
    Checks that a given URL is reachable.
    :param url: A URL
    :rtype: bool
    """
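    # A HEAD request asks for the headers only, so the file itself is not downloaded.
    request = urllib.request.Request(url, method='HEAD')

    try:
        urllib.request.urlopen(request)
        return True
    except urllib.error.URLError: # Also covers HTTPError, which is a subclass of URLError.
        return False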

In [21]:
# Function iterates through the links and extracts the text of each linked PDF.
def extract_link_text(target_link_column, column_title):
    unavailable = 'REPORT UNAVAILABLE OR URL DOES NOT EXIST'
    text_column = column_title + '_RPT_Text'
    for row in range(NRC_ASP_df.shape[0]):
        URL = NRC_ASP_df.at[row, target_link_column] # Column with url
        # Only NRC links that respond are downloaded; url_is_alive is called once per row.
        if 'nrc.gov' in str(URL) and url_is_alive(URL):
            remote_file = urllib.request.urlopen(URL).read()
            pdfdoc_remote = PdfReader(io.BytesIO(remote_file))
            # Concatenates the text of all the pages.
            all_text = ''.join(page.extract_text() for page in pdfdoc_remote.pages)
            NRC_ASP_df.at[row, text_column] = all_text
        else:
            NRC_ASP_df.at[row, text_column] = unavailable

In [22]:
# Calls function to extract the Licensee Event Report Text and the Accident Sequence Precursor Program Report Text.
extract_link_text('ASP Analysis', 'ASP')
extract_link_text('LER / IR', 'LER')

In [23]:
NRC_ASP_df.head(4)

Unnamed: 0,Plant,Event Date,LER / IR,Description,Result,ASP Analysis,ASP_RPT_Text,LER_RPT_Text
0,D.C. Cook 2,9/4/2020,https://www.nrc.gov/docs/ML2031/ML20311A129.pdf,Manual reactor trip and automatic SI due to fa...,1e-05,https://www.nrc.gov/docs/ML2103/ML21035A236.pdf,\n \n1 Final ASP Analysis – Precursor \nAcci...,a: \nINDIANA \nMICHIGAN flOfllfEll• \nA umt of...
1,Duane Arnold,8/10/2020,https://www.nrc.gov/docs/ML2028/ML20283A373.pdf,LOOP caused by high winds during derecho,0.0008,https://www.nrc.gov/docs/ML2105/ML21056A382.pdf,\n \n1 Final ASP Analysis – Precursor \nAcci...,"September 30, 2020 \nU.S. Nuclear Regulatory C..."
2,Brunswick 1,8/3/2020,https://www.nrc.gov/docs/ML2026/ML20265A162.pdf,LOOP during Hurricane Isaias,2e-05,https://www.nrc.gov/docs/ML2029/ML20294A552.pdf,\n \n1 Final ASP Analysis – Precursor \nAcci...,"September 21, 2020e..l_~ DUKE \n~ ENERGY® \nSe..."
3,Fitzpatrick,4/10/2020,https://www.nrc.gov/docs/ML2016/ML20161A405.pdf,High pressure coolant injection inoperable due...,3e-06,https://www.nrc.gov/docs/ML2110/ML21105A543.pdf,UNITED STATES \nNUCLEAR REGULATORY COMMISSION ...,"Exelon Generation \nJAFP-20-0042 \nJune 9, 202..."


# Application Programming Interfaces (API)
[Return to Table of Contents](#Table-of-Contents)

An Application Programming Interface (API), or API endpoint, is a way for computer programs and tools to communicate with each other. Typical interactions include web API communication from website to website or from a website to a computer. The API resides in the source website, and the developer and owner of the API will usually provide instructions and documentation on how to submit requests, download data, and use the available query parameters, among other features. Most APIs require registration with the owner's website/API as well as accepting the user terms of service. An API may expose some, but not all, of the data available on a website. An API endpoint allows you or your software to access the data that the site owners and developers are making available.

One example of an open API is [Open Notify](http://api.open-notify.org), an open source project that provides a simple API for some National Aeronautics and Space Administration (NASA) data. This Jupyter Notebook uses this API to run some examples of how data downloaded from an API can be converted into data usable in Pandas.

Most websites will typically have a section on APIs or a section for developers that provides specific instructions and documentation on how to use their API. Note that some APIs may be very simple while others may include many different parameters that can be modified to obtain different datasets from the API. Note also that open APIs may limit the number of requests (e.g., X daily requests) in order to manage the resources of the website.

References:
- [Python API Tutorial: Getting Started with APIs – Dataquest](https://www.dataquest.io/blog/python-api-tutorial/)

Public open APIs (most do not require registration or an API key; check each site's documentation):
- [Open Notify API](http://open-notify.org/)
- [U.S. Federal Government Federal Register](https://www.federalregister.gov/reader-aids/developer-resources/rest-api)
- [USAJobs: U.S. Federal Government Job Postings Website API](https://developer.usajobs.gov/API-Reference)
- [U.S. Federal Government Data.gov Website API](https://data.gov/developers/apis/)
- [U.S. Federal Government Open Source Software By Agency](https://code.gov/agencies)

# API Requests
[Return to Table of Contents](#Table-of-Contents)

The Requests library provides various functions (e.g., .get(), .post()) that allow us to request information from a website address. When making requests, the .get() function does not alter the state of the server and only "gets" information. With the .post() function, a request is sent to the server, which processes it and may or may not alter the state of the server. An API will have an address to which requests are made, and the API will return information if the connection was successful.

#### API Status Codes
Status codes are returned with every request that is made to a web server. Status codes indicate information about what happened with a request. Here are some codes that are relevant to GET requests:

- 200: Everything went okay, and the result has been returned (if any).
- 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
- 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
- 401: The server thinks you’re not authenticated. Many APIs require login credentials, so this happens when you don’t send the right credentials to access an API.
- 403: The resource you’re trying to access is forbidden: you don’t have the right permissions to see it.
- 404: The resource you tried to access wasn’t found on the server.
- 503: The server is not ready to handle the request.
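The codes above fall into broad ranges: 2xx for success, 3xx for redirects, 4xx for client errors, and 5xx for server errors. The sketch below uses the standard library's `http.HTTPStatus` to look up the official phrase for a code and report its range. The `describe_status` helper is a hypothetical name introduced for this illustration.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description of an HTTP status code."""
    status = HTTPStatus(code)  # Raises ValueError for codes it does not know.
    if code < 200:
        outcome = 'informational'
    elif code < 300:
        outcome = 'success'
    elif code < 400:
        outcome = 'redirect'
    elif code < 500:
        outcome = 'client error'
    else:
        outcome = 'server error'
    return f'{code} {status.phrase} ({outcome})'

for code in (200, 301, 400, 401, 403, 404, 503):
    print(describe_status(code))
```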

In [24]:
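# In this case the API address does not exist, so the request is expected to fail.
# In the run below it fails with a ConnectionError rather than a 404 status code, because the host refused the HTTPS connection.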
response = requests.get("https://api.open-notify.org/this-api-doesnt-exist")

ConnectionError: HTTPSConnectionPool(host='api.open-notify.org', port=443): Max retries exceeded with url: /this-api-doesnt-exist (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000181C503C7D0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
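A failed request can therefore surface in two ways: as an exception (such as the `ConnectionError` above) or as a response with an error status code. The following is a minimal sketch of handling both in one place. The `get_json_or_none` helper and the 10-second timeout are assumptions made for this example, not part of the original notebook.

```python
import requests

def get_json_or_none(url, timeout=10):
    """Return the decoded JSON body, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # Turns 4xx/5xx status codes into an HTTPError.
        return response.json()
    except (requests.exceptions.RequestException, ValueError) as err:
        # RequestException covers connection errors, timeouts, invalid URLs,
        # and HTTPError; ValueError covers a body that is not valid JSON.
        print(f'Request failed: {err}')
        return None

# A URL without a scheme is rejected before any network traffic is sent.
result = get_json_or_none('missing-scheme.example/astros.json')
print(result)
```

Returning `None` keeps the notebook running when an endpoint is down. Whether that is preferable to letting the exception propagate depends on how the data is used downstream.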

### API Request Example: [Open Notify](http://open-notify.org/)
[Return to Table of Contents](#Table-of-Contents)

As of 12/4/2022, Open Notify has two APIs: one with the location of the International Space Station (ISS) (http://open-notify.org/Open-Notify-API/ISS-Location-Now/) and another with the total number of people in space right now (http://open-notify.org/Open-Notify-API/People-In-Space/). The API on ISS pass times has been removed (http://open-notify.org/Open-Notify-API/ISS-Pass-Times/).

In [25]:
# The Open-Notify API address is http://api.open-notify.org and points to a JSON data file.
# Get Documentation: https://requests.readthedocs.io/en/latest/user/quickstart/
response = requests.get("http://api.open-notify.org/astros.json")

In [26]:
print(response.status_code)
# The API status codes are described above.
# Status code 200 means the request was successful.

200


In [27]:
# Let's see the data. It returns a list of people currently in space.
response.json()

{'message': 'success',
 'people': [{'name': 'Jasmin Moghbeli', 'craft': 'ISS'},
  {'name': 'Andreas Mogensen', 'craft': 'ISS'},
  {'name': 'Satoshi Furukawa', 'craft': 'ISS'},
  {'name': 'Konstantin Borisov', 'craft': 'ISS'},
  {'name': 'Oleg Kononenko', 'craft': 'ISS'},
  {'name': 'Nikolai Chub', 'craft': 'ISS'},
  {'name': "Loral O'Hara", 'craft': 'ISS'}],
 'number': 7}

In [28]:
# We can assign a variable to the data.
json_data = response.json()
# We can see that the data is in dictionary form.
type(json_data)

dict

In [29]:
# We can convert a dictionary to a dataframe.
pd.DataFrame(json_data)
# We can see that we have one column called "people" which has the craft and name of each astronaut.
# We have a column called "number" which has the total number of people.
# And a column called "message".

Unnamed: 0,message,people,number
0,success,"{'name': 'Jasmin Moghbeli', 'craft': 'ISS'}",7
1,success,"{'name': 'Andreas Mogensen', 'craft': 'ISS'}",7
2,success,"{'name': 'Satoshi Furukawa', 'craft': 'ISS'}",7
3,success,"{'name': 'Konstantin Borisov', 'craft': 'ISS'}",7
4,success,"{'name': 'Oleg Kononenko', 'craft': 'ISS'}",7
5,success,"{'name': 'Nikolai Chub', 'craft': 'ISS'}",7
6,success,"{'name': 'Loral O'Hara', 'craft': 'ISS'}",7


In [30]:
# We can get rid of the "number" and "message" columns by only selecting the column "people"
# This data now looks similar to our typical dataframe and easier to interact with.
pd.DataFrame(json_data['people'])

Unnamed: 0,name,craft
0,Jasmin Moghbeli,ISS
1,Andreas Mogensen,ISS
2,Satoshi Furukawa,ISS
3,Konstantin Borisov,ISS
4,Oleg Kononenko,ISS
5,Nikolai Chub,ISS
6,Loral O'Hara,ISS


In [31]:
# We can assign the dataframe to a variable.
df_open_notify_astros = pd.DataFrame(json_data['people'])

In [32]:
# Each row is an astronaut name, so the number of rows should match the "number" field returned by the API (7 in the output above).
print(f'Total number of astronauts in space is {len(df_open_notify_astros)}.')

Total number of astronauts in space is 7.


In [33]:
# We can filter for astronauts at the ISS.
df_open_notify_astros[df_open_notify_astros['craft'] == 'ISS']

Unnamed: 0,name,craft
0,Jasmin Moghbeli,ISS
1,Andreas Mogensen,ISS
2,Satoshi Furukawa,ISS
3,Konstantin Borisov,ISS
4,Oleg Kononenko,ISS
5,Nikolai Chub,ISS
6,Loral O'Hara,ISS


In [34]:
# Once the API data has been converted to a dataframe, Pandas can be used to filter, plot, and transform the data.
# Some APIs also allow input of parameters.
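Before converting a response to a dataframe it can be worth checking that the payload is internally consistent. The sketch below parses the response body shown in the Open Notify output above with the standard library, counts people per craft, and compares the `number` field with the length of the `people` list. The variable names are illustrative only.

```python
import json
from collections import Counter

# Response body copied from the Open Notify output above.
response_text = """
{"message": "success",
 "people": [{"name": "Jasmin Moghbeli", "craft": "ISS"},
            {"name": "Andreas Mogensen", "craft": "ISS"},
            {"name": "Satoshi Furukawa", "craft": "ISS"},
            {"name": "Konstantin Borisov", "craft": "ISS"},
            {"name": "Oleg Kononenko", "craft": "ISS"},
            {"name": "Nikolai Chub", "craft": "ISS"},
            {"name": "Loral O'Hara", "craft": "ISS"}],
 "number": 7}
"""

payload = json.loads(response_text)

# Count how many people are aboard each craft.
people_per_craft = Counter(person['craft'] for person in payload['people'])

# The reported total should equal the number of listed people.
counts_match = payload['number'] == len(payload['people'])
print(people_per_craft, counts_match)
```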

### API Requests Example: U.S. Federal Government Websites
[Return to Table of Contents](#Table-of-Contents)

The U.S. Federal Government website Code.gov (https://code.gov/agencies) includes a database of website resources, including code, data, and API links. This section discusses API requests to U.S. Federal Government websites.

### API Requests Example: U.S. Department of Energy
[Return to Table of Contents](#Table-of-Contents)

This section discusses the website API for the U.S. Department of Energy. This is a very simple API: a single GET request to a JSON file URL returns the full dataset.

In [35]:
response = requests.get("https://www.energy.gov/sites/default/files/2022-10/code-10-03-2022.json")

In [36]:
print(response.status_code)

200


In [37]:
response.json().keys() # Provides the top keys of the json/dictionary

dict_keys(['agency', 'measurementType', 'releases', 'version'])

In [38]:
response.json()['agency']

'DOE'

In [39]:
response.json()['measurementType']

{'ifOther': '', 'method': 'other'}

In [40]:
response.json()['version']

'2.0.0'

In [41]:
# The releases appear to be a list of open source software published by the U.S. Department of Energy.
print(pd.DataFrame(response.json()['releases']).shape)
pd.DataFrame(response.json()['releases']).head(5)

(4569, 14)


Unnamed: 0,contact,date,description,laborHours,name,organization,permissions,repositoryURL,status,tags,vcs,languages,homepageURL,version
0,{'email': 'jcrouch@sandia.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...","Teuchos is designed to provide portable, objec...",8344830.4,Teuchos Utility Package,Sandia National Laboratories (SNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/trilinos/Trilinos,Production,"[DOE CODE, Sandia National Laboratories (SNL)]",git,,,
1,{'email': 'jcrouch@sandia.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...",Amesos is the Direct Sparse Solver Package in ...,8344830.4,Amesos Solver Package,Sandia National Laboratories (SNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/trilinos/Trilinos,Production,"[DOE CODE, Sandia National Laboratories (SNL)]",git,[],,
2,{'email': 'holdensanchez2@llnl.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...",The MRSH project is a collection of the follow...,24213.6,MRSH Version V2.0,Lawrence Livermore National Laboratory (LLNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/chaos/mrsh,Production,"[DOE CODE, Lawrence Livermore National Laborat...",git,,,
3,{'email': 'holdensanchez2@llnl.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...",The Lustre Administrative Tools (LAT) is a set...,5639.2,Lustre Administrative Tool,Lawrence Livermore National Laboratory (LLNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/cea-hpc/shine,Production,"[DOE CODE, Lawrence Livermore National Laborat...",git,,,
4,{'email': 'holdensanchez2@llnl.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...",WHATSUP determines which nodes in a cluster ar...,58793.6,WHATSUP Version1.3,Lawrence Livermore National Laboratory (LLNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/chaos/whatsup,Production,"[DOE CODE, Lawrence Livermore National Laborat...",git,,,
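Several columns above, such as `contact` and `date`, still hold nested dictionaries. One way to flatten them into their own columns is `pd.json_normalize`. The sketch below uses two abbreviated records shaped like the `releases` entries above. Only fields visible in the output are included, and fields that are truncated there are left out rather than guessed.

```python
import pandas as pd

# Abbreviated sample shaped like the 'releases' records above.
releases = [
    {
        'name': 'Teuchos Utility Package',
        'organization': 'Sandia National Laboratories (SNL)',
        'contact': {'email': 'jcrouch@sandia.gov'},
        'date': {'created': '2017-10-25'},
        'status': 'Production',
    },
    {
        'name': 'MRSH Version V2.0',
        'organization': 'Lawrence Livermore National Laboratory (LLNL)',
        'contact': {'email': 'holdensanchez2@llnl.gov'},
        'date': {'created': '2017-10-25'},
        'status': 'Production',
    },
]

# Nested dictionaries become columns named parent.child, e.g. 'contact.email'.
flat = pd.json_normalize(releases)
print(flat.columns.tolist())
```

In the notebook the same call could be applied to `response.json()['releases']`. Columns that hold lists (such as `tags`) are left as lists.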


### API Requests Example: [U.S. Federal Goverment Federal Registry](https://www.federalregister.gov/reader-aids/developer-resources/rest-api)
[Return to Table of Contents](#Table-of-Contents)

The Federal Register is a website where all U.S. Federal Government agencies publish rules, proposed rules, notices, and other documents for public awareness. The address in this example is generated from the RESTful API interactive documentation.
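A request address of this kind can also be assembled in code, which makes it easy to change the publication date. The following is a minimal sketch using the standard library's `urlencode`. The `build_documents_url` helper and its defaults are hypothetical names chosen for this example, and the parameter names are taken from the request URL used in this section.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.federalregister.gov/api/v1/documents.json'
FIELDS = ['abstract', 'agencies', 'dates', 'document_number',
          'page_length', 'pdf_url', 'publication_date', 'title']

def build_documents_url(publication_date, per_page=1000):
    """Build the search URL for documents published on one date (YYYY-MM-DD)."""
    # A list of pairs allows the 'fields[]' key to be repeated.
    params = [('fields[]', field) for field in FIELDS]
    params += [
        ('per_page', per_page),
        ('order', 'newest'),
        ('conditions[publication_date][is]', publication_date),
    ]
    # safe='[]' leaves the square brackets unescaped so the result reads
    # like the address produced by the interactive documentation.
    query = urlencode(params, safe='[]')
    return f'{BASE_URL}?{query}'

url = build_documents_url('2022-12-01')
print(url)
```

The resulting string can be passed to `requests.get(url)`. Passing a different date string produces the address for that day's documents.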

In [42]:
# The URL below was generated with the RESTful API interactive documentation using the following parameters:
# "/documents.{format} Search all Federal Register documents published since 1994."
# fields[] = abstract, agencies, dates, document_number, page_length, pdf_url, publication_date, title
# per_page (how many documents to return at once; 1000 maximum) = 1000
# order = newest
# conditions[publication_date][is] = '2022-12-01'
response = requests.get("https://www.federalregister.gov/api/v1/documents.json?fields[]=abstract&fields[]=agencies&fields[]=dates&fields[]=document_number&fields[]=page_length&fields[]=pdf_url&fields[]=publication_date&fields[]=title&per_page=1000&order=newest&conditions[publication_date][is]=2022-12-01")

In [43]:
print(response.status_code)

200


In [44]:
response.json().keys() # Provides the top keys of the json/dictionary

dict_keys(['count', 'description', 'total_pages', 'results'])

In [45]:
response.json()

{'count': 98,
 'description': 'Documents published on 12/01/2022',
 'total_pages': 1,
 'results': [{'abstract': 'This document summarizes the Federal Acquisition Regulation (FAR) rules agreed to by the Civilian Agency Acquisition Council and the Defense Acquisition Regulations Council (Councils) in this Federal Acquisition Circular (FAC) 2023-01. A companion document, the Small Entity Compliance Guide (SECG), follows this FAC.',
   'agencies': [{'raw_name': 'DEPARTMENT OF DEFENSE',
     'name': 'Defense Department',
     'id': 103,
     'url': 'https://www.federalregister.gov/agencies/defense-department',
     'json_url': 'https://www.federalregister.gov/api/v1/agencies/103',
     'parent_id': None,
     'slug': 'defense-department'},
    {'raw_name': 'GENERAL SERVICES ADMINISTRATION',
     'name': 'General Services Administration',
     'id': 210,
     'url': 'https://www.federalregister.gov/agencies/general-services-administration',
     'json_url': 'https://www.federalregister.gov/a

In [46]:
# Passing the whole response to a DataFrame repeats the top-level keys on every row; each document stays nested in 'results'.
print(pd.DataFrame(response.json()).shape)
pd.DataFrame(response.json()).head(5)

(98, 4)


Unnamed: 0,count,description,total_pages,results
0,98,Documents published on 12/01/2022,1,{'abstract': 'This document summarizes the Fed...
1,98,Documents published on 12/01/2022,1,{'abstract': 'This document makes amendments t...
2,98,Documents published on 12/01/2022,1,{'abstract': 'This document is issued under th...
3,98,Documents published on 12/01/2022,1,"{'abstract': 'DoD, GSA, and NASA are issuing a..."
4,98,Documents published on 12/01/2022,1,"{'abstract': 'DoD, GSA, and NASA are issuing a..."


In [47]:
# Building the DataFrame from the 'results' key gives one row per document and one column per requested field.
print(pd.DataFrame(response.json()['results']).shape)
pd.DataFrame(response.json()['results']).head(5)
# This includes 98 documents published on December 1, 2022.
# Now that we know where the date goes in the URL, we can define the date as a variable and insert it into the URL.

(98, 8)


Unnamed: 0,abstract,agencies,dates,document_number,page_length,pdf_url,publication_date,title
0,This document summarizes the Federal Acquisiti...,"[{'raw_name': 'DEPARTMENT OF DEFENSE', 'name':...",For effective dates see the separate documents...,2022-25957,2,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,Federal Acquisition Regulation; Federal Acquis...
1,This document makes amendments to the Federal ...,"[{'raw_name': 'DEPARTMENT OF DEFENSE', 'name':...","Effective: December 30, 2022.",2022-25961,1,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,Federal Acquisition Regulation; Technical Amen...
2,This document is issued under the joint author...,"[{'raw_name': 'DEPARTMENT OF DEFENSE', 'name':...","December 1, 2022.",2022-25962,2,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,Federal Acquisition Regulation; Federal Acquis...
3,"DoD, GSA, and NASA are issuing a final rule am...","[{'raw_name': 'DEPARTMENT OF DEFENSE', 'name':...","Effective December 30, 2022.",2022-25960,5,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,Federal Acquisition Regulation: United States-...
4,"DoD, GSA, and NASA are issuing a final rule am...","[{'raw_name': 'DEPARTMENT OF DEFENSE', 'name':...","Effective: December 30, 2022.",2022-25958,9,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,Federal Acquisition Regulation: Update to Titl...
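
The `agencies` column holds a list of dictionaries for each document, which is awkward to filter or group on. A minimal sketch of pulling out just the readable names follows. The helper `agency_names` is illustrative, and `sample_agencies` is a trimmed copy of the first record shown in the response above (only two keys kept per agency).

```python
# Trimmed copy of the 'agencies' value from the first record in the response above.
sample_agencies = [
    {"raw_name": "DEPARTMENT OF DEFENSE", "name": "Defense Department"},
    {"raw_name": "GENERAL SERVICES ADMINISTRATION", "name": "General Services Administration"},
]


def agency_names(agencies):
    """Hypothetical helper: return the readable agency names of one document."""
    # Fall back to raw_name in case an entry has no "name" key.
    return [agency.get("name", agency.get("raw_name")) for agency in agencies]


print(agency_names(sample_agencies))
```

On the dataframe itself, the same helper could be applied with `df['agencies'].apply(agency_names)`, where `df` stands for the dataframe built from `results`.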


### Automating API Requests Example 1: [U.S. Federal Government Federal Register](https://www.federalregister.gov/reader-aids/developer-resources/rest-api)
[Return to Table of Contents](#Table-of-Contents)

In this example, we want to download all the documents that were published in the Federal Register on the previous weekday.

In [48]:
# Using today's date, let's see how many documents were published on the previous weekday.
# The API expects the date in YYYY-MM-DD format in the address.

def previous_weekday(a_date): # Function to calculate the date of the previous weekday.
    a_date = a_date - timedelta(days=1)
    while a_date.weekday() > 4: # Checks if the date falls on a weekend. Monday to Friday are index 0 to 4.
        a_date = a_date - timedelta(days=1) # Continues to subtract a day until it is a weekday.
    return a_date.strftime('%Y-%m-%d') # Date format expected by the API.

previous_weekday_date = previous_weekday(date.today()) # Previous weekday as a YYYY-MM-DD string.

# Input date at the end of the API URL.
response = requests.get(f"https://www.federalregister.gov/api/v1/documents.json?fields[]=abstract&fields[]=agencies&fields[]=dates&fields[]=document_number&fields[]=page_length&fields[]=pdf_url&fields[]=publication_date&fields[]=title&per_page=1000&order=newest&conditions[publication_date][is]={previous_weekday_date}")

print(f'Status code: {response.status_code}')

# If any documents were published on that date, this branch runs.
if response.json()['count'] != 0:
    published_documents = pd.DataFrame(response.json()['results']).shape[0]
    print(f'On {previous_weekday_date}, {published_documents} documents were published in the Federal Register.')
    
    # List of columns in the data.
    list_of_columns = pd.DataFrame(response.json()['results']).columns
    print(f'Columns in the results data: {list(list_of_columns)}.')
    
    # Converts the data to a dataframe.
    df_published_documents = pd.DataFrame(response.json()['results'])

# If no documents were published (e.g., on a federal holiday), this branch runs instead.
else:
    print(f'No documents were published on {previous_weekday_date}.')
    
# If we are interested in documents that have the word "health" in the title, we can filter as follows. 
df_documents_stocks = df_published_documents[df_published_documents['title'].str.contains('health', case = False)]

Status code: 200
On 2023-11-03, 135 documents were published in the Federal Register.
Columns in the results data: ['abstract', 'agencies', 'dates', 'document_number', 'page_length', 'pdf_url', 'publication_date', 'title'].
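
To see how the weekday logic behaves around a weekend, it helps to call the function with fixed dates instead of `date.today()`. The sketch below repeats the same logic as `previous_weekday` so that it runs on its own. The dates are chosen around the run shown above.

```python
from datetime import date, timedelta


def previous_weekday(a_date):
    """Same logic as the function above: step back to the most recent Monday-Friday."""
    a_date = a_date - timedelta(days=1)
    while a_date.weekday() > 4:  # 5 = Saturday, 6 = Sunday
        a_date = a_date - timedelta(days=1)
    return a_date.strftime('%Y-%m-%d')


print(previous_weekday(date(2023, 11, 6)))  # Monday    -> 2023-11-03 (Friday)
print(previous_weekday(date(2023, 11, 5)))  # Sunday    -> 2023-11-03 (Friday)
print(previous_weekday(date(2023, 11, 8)))  # Wednesday -> 2023-11-07 (Tuesday)
```

The function only looks at the day of the week, so it does not know about federal holidays. On such dates the request can still return a count of 0, which is why the check on `count` is useful.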


In [49]:
df_published_documents

Unnamed: 0,abstract,agencies,dates,document_number,page_length,pdf_url,publication_date,title
0,,"[{'raw_name': 'DEPARTMENT OF COMMERCE', 'name'...",,2023-24367,2,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Agency Information Collection Activities; Subm...
1,,[{'raw_name': 'DEPARTMENT OF HEALTH AND HUMAN ...,"November 14, 2023.",2023-24277,1,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Center for Scientific Review; Notice of Closed...
2,,[{'raw_name': 'DEPARTMENT OF HEALTH AND HUMAN ...,"February 8, 2024.",2023-24278,1,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,National Center for Advancing Translational Sc...
3,,"[{'raw_name': 'DEPARTMENT OF COMMERCE', 'name'...",,2023-24358,2,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Reorganization of Foreign-Trade Zone 128 Under...
4,,"[{'raw_name': 'DEPARTMENT OF COMMERCE', 'name'...",,2023-24357,1,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Reorganization of Foreign-Trade Zone 281 (Expa...
...,...,...,...,...,...,...,...,...
130,"The Internal Revenue Service, as part of its c...","[{'raw_name': 'DEPARTMENT OF THE TREASURY', 'n...",Written comments should be received on or befo...,2023-24288,2,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Proposed Extension of Information Collection R...
131,The Department of Veterans Affairs (VA) gives ...,[{'raw_name': 'DEPARTMENT OF VETERANS AFFAIRS'...,,2023-24342,2,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Findings of Research Misconduct
132,"In this document, the Federal Communications C...",[{'raw_name': 'FEDERAL COMMUNICATIONS COMMISSI...,"Comments are due on or before December 14, 202...",2023-23630,49,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Safeguarding and Securing the Open Internet
133,The Food and Drug Administration (FDA or we) i...,[{'raw_name': 'DEPARTMENT OF HEALTH AND HUMAN ...,Either electronic or written comments on the p...,2023-24084,6,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Revocation of Authorization for Use of Bromina...


In [50]:
# This is the filtered dataframe from the code block above: documents with "health" in the title.
print(df_documents_stocks.shape)
df_documents_stocks

(1, 8)


Unnamed: 0,abstract,agencies,dates,document_number,page_length,pdf_url,publication_date,title
13,In compliance with the Paperwork Reduction Act...,[{'raw_name': 'DEPARTMENT OF HEALTH AND HUMAN ...,Comments on this ICR should be received no lat...,2023-24273,2,https://www.govinfo.gov/content/pkg/FR-2023-11...,2023-11-03,Agency Information Collection Activities: Subm...


### Automating API Requests Example 2: [U.S. Federal Government Federal Register](https://www.federalregister.gov/reader-aids/developer-resources/rest-api)
[Return to Table of Contents](#Table-of-Contents)

The code below uses the Federal Register API endpoint to search for each term in `terms_list`. It iterates through the terms and downloads 20 records for each one.

In [51]:
# This is the list of terms that we will search using the API Endpoint.
terms_list = ['accident', 'weather']
print(f'Total number of terms in the list: {len(terms_list)}.')

Total number of terms in the list: 2.


In [61]:
# This code block iterates through the terms list and extracts the json output from the API endpoint request

print(f'API Request start time:  {datetime.now()}.')

for i in terms_list:
    # Uses the API endpoint "/documents.{format}" (search all Federal Register documents published since 1994) with a keyword condition.
    # The API endpoint allows a maximum of 1000 documents per request.
    # Here per_page=20, so for each term we get up to 20 records or rows of data.
    response = requests.get(f"https://www.federalregister.gov/api/v1/documents.json?fields[]=abstract&fields[]=action&fields[]=agencies&fields[]=agency_names&fields[]=citation&fields[]=dates&fields[]=disposition_notes&fields[]=excerpts&fields[]=html_url&fields[]=page_length&fields[]=page_views&fields[]=pdf_url&fields[]=publication_date&fields[]=significant&fields[]=subtype&fields[]=title&fields[]=topics&fields[]=type&per_page=20&conditions[term]={i}")
    print(f'Processing term "{i}". Status code for request: {response.status_code}')
    
    data_df = response.json()
    
    # Writes the response from the Federal Register API as a JSON file. The output folder must already exist.
    with open(f"./output_data/API_initial_download/fed_register_data_{i}.json", "w") as file:
        json.dump(data_df, file)
        
    time.sleep(15) # Waits 15 seconds to perform the next request and be considerate of the host resources.

print("Data download is complete.")
print(f'API Request end time:  {datetime.now()}.')
print(f"Date last run: {date.today()}.")

API Request start time:  2023-11-06 08:54:58.874302.
Processing term "accident". Status code for request: 200
Processing term "weather". Status code for request: 200
Data download is complete.
API Request end time:  2023-11-06 08:55:29.311237.
Date last run: 2023-11-06.


In [62]:
# This code block shows the results of the last term that was searched ("weather").
print(pd.DataFrame(response.json()['results']).shape)
pd.DataFrame(response.json()['results']).head(5)

(20, 18)


Unnamed: 0,abstract,action,agencies,agency_names,citation,dates,disposition_notes,excerpts,html_url,page_length,page_views,pdf_url,publication_date,significant,subtype,title,topics,type
0,The Federal Energy Regulatory Commission (Comm...,Final rule.,"[{'raw_name': 'DEPARTMENT OF ENERGY', 'name': ...","[Energy Department, Federal Energy Regulatory ...",88 FR 41477,"This rule is effective September 25, 2023. Eac...",,"southeast are “extreme <span class=""match"">wea...",https://www.federalregister.gov/documents/2023...,23,"{'count': 871, 'last_updated': '2023-11-06 08:...",https://www.govinfo.gov/content/pkg/FR-2023-06...,2023-06-27,,,One-Time Informational Reports on Extreme Weat...,[],Rule
1,The Federal Energy Regulatory Commission direc...,Final rule.,"[{'raw_name': 'DEPARTMENT OF ENERGY', 'name': ...","[Energy Department, Federal Energy Regulatory ...",88 FR 41262,"This rule is effective September 21, 2023.",,"Southwest Cold <span class=""match"">Weather</sp...",https://www.federalregister.gov/documents/2023...,26,"{'count': 1144, 'last_updated': '2023-11-06 08...",https://www.govinfo.gov/content/pkg/FR-2023-06...,2023-06-23,,,Transmission System Planning Performance Requi...,[],Rule
2,The Space Weather Advisory Group (SWAG) will m...,Notice of public meeting.,"[{'raw_name': 'DEPARTMENT OF COMMERCE', 'name'...","[Commerce Department, National Oceanic and Atm...",88 FR 22411,"The meeting is scheduled as follows: April 17,...",,"Technology Council's Space <span class=""match""...",https://www.federalregister.gov/documents/2023...,1,"{'count': 101, 'last_updated': '2023-11-06 08:...",https://www.govinfo.gov/content/pkg/FR-2023-04...,2023-04-13,,,Space Weather Advisory Group Meeting,[],Notice
3,The Space Weather Advisory Group (SWAG) will m...,Notice of public meeting.,"[{'raw_name': 'DEPARTMENT OF COMMERCE', 'name'...","[Commerce Department, National Oceanic and Atm...",88 FR 15681,"The meeting is scheduled as follows: March 20,...",,"Technology Council's Space <span class=""match""...",https://www.federalregister.gov/documents/2023...,1,"{'count': 207, 'last_updated': '2023-11-06 08:...",https://www.govinfo.gov/content/pkg/FR-2023-03...,2023-03-14,,,Space Weather Advisory Group Meeting,[],Notice
4,The U.S. Nuclear Regulatory Commission (NRC) i...,Final guide; issuance.,"[{'raw_name': 'NUCLEAR REGULATORY COMMISSION',...",[Nuclear Regulatory Commission],88 FR 67065,Revision 0 to RG 3.77 is available on Septembe...,,Commission (NRC) is issuing a new Regulatory G...,https://www.federalregister.gov/documents/2023...,2,"{'count': 57, 'last_updated': '2023-11-06 08:1...",https://www.govinfo.gov/content/pkg/FR-2023-09...,2023-09-29,,,Regulatory Guide: Weather-Related Administrati...,[],Rule


In the case above, one JSON file is created for each term. After downloading the data from the API endpoint, you as a data scientist will need to decide how to process the information. In some cases you may want to iterate over the files, combine the data into a single Pandas dataframe, and continue from there depending on your use case.
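
One possible way to combine the files is sketched below. It assumes the file naming used in the loop above (`fed_register_data_{term}.json`). The helper `load_term_records` and the `search_term` key are illustrative names, and the demo writes two tiny stand-in files to a temporary folder so that it runs without the real downloads.

```python
import json
import tempfile
from pathlib import Path


def load_term_records(folder, prefix="fed_register_data_"):
    """Hypothetical helper: merge the per-term JSON files into one list of records."""
    records = []
    for path in sorted(Path(folder).glob(f"{prefix}*.json")):
        term = path.stem[len(prefix):]  # File name without the prefix is the search term.
        with open(path) as file:
            payload = json.load(file)
        for document in payload.get("results", []):
            # Tag each document with the term that returned it.
            records.append({"search_term": term, **document})
    return records


# Offline demo with two tiny stand-in files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for term, titles in {"accident": ["Doc A"], "weather": ["Doc B", "Doc C"]}.items():
        payload = {"count": len(titles), "results": [{"title": title} for title in titles]}
        Path(folder, f"fed_register_data_{term}.json").write_text(json.dumps(payload))
    combined = load_term_records(folder)

print(len(combined))
```

With the real downloads, `pd.DataFrame(load_term_records('./output_data/API_initial_download'))` would give a single dataframe with a `search_term` column. The same document can be returned for more than one term, so dropping duplicates may be needed depending on the use case.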

# NOTEBOOK END