<h1 align='center'>From REST to data frame - interacting with REST APIs using Python </h1>

<center><h4 align='center'>Laura Gutierrez Funderburk (@LGFunderburk) </h4></center>


<center><img src="api.png" alt="Drawing" style="width: 400px;"/></center>


In this talk we will explore the Python libraries `requests`, `json`, `pandas` and `plotly` to interact with data from a REST API, format the data into a data frame and create interactive plots.


<h2 align='center'> Overview </h2>

- Introductions

- What is an API?

- What forms part of an API?

- Why (and when) use an API?

- What is a REST API?

- Where to find public APIs?

- Time to code: the Internet Archive & Age of Empires II

- Q&A

<h2 align='center'> Introductions </h2>


<center><img src="https://media1.giphy.com/media/j1soPQE95y0eXhMwKT/source.gif" alt="Drawing" style="width: 500px;"/></center>



<h2 align='center'> What is an API? </h2>

An API (application programming interface) is a set of code that lets one software product exchange data with another. The API defines the terms of that exchange: which requests are allowed and what is sent back.

<center><img src="https://lh6.googleusercontent.com/_nYCLkTG8PO_Wx5-jkdIq3Wyo1PBGrh4wiincNkb5cnZijpxfzCclyxMJbNwYItS8T3Iao-Cqm5WphPy4d4Mlihszki6MBiqZA7NBwTnuFxU49ppZEZyv8v0QEEVAA9Ai3buwhG7" alt="Drawing" style="width: 1000px;"/></center>

<em>How an API works. Source: Medium</em>


<h2 align='center'>What forms part of an API?</h2>

APIs consist of two main components:

- Technical specification: describes the data exchange options between the two sides, in the form of a request for processing and data delivery protocols.

- Software interface: written to the specification that it represents (a web browser plus interface, for example).

<h2 align='center'> Why (and when) use APIs? </h2>

- When you are looking for data: there are millions of APIs online that provide access to data.
- When data changes regularly (stock prices, for example): you need a mechanism that updates the data as needed. Repeatedly downloading CSV files can be cumbersome, bandwidth-heavy and slow.
- When you want a small subset of the data.
- When there is repeated computation involved.



<h2 align='center'> What is a REST API?</h2>

* A REST API (also referred to as a RESTful web service or RESTful API) is an API that follows an architectural style in which HTTP requests are used to access and use data (think URL).

* It is based on representational state transfer (REST), a style and approach to communications that is often used in web services development.

* REST typically uses less bandwidth than heavier protocols such as SOAP $\Rightarrow$ more suitable for efficient internet usage.

* REST APIs can also be built with programming languages such as JavaScript or Python (though we won't explore how to do this in this meetup).

<h2 align='center'>Commands a REST API uses</h2>

A REST API uses commands to obtain resources. The state of a resource at any given timestamp is called a resource representation. A REST API uses the existing HTTP methods defined in the HTTP specification (originally RFC 2616, since superseded by RFC 9110), such as:

* GET: retrieve a resource.
* PUT: change the state of or update a resource, which can be an object, file or block.
* POST: create a resource.
* DELETE: remove a resource.

<b>In this meetup, we will focus on using GET.</b>
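
To make the four methods concrete, here is a minimal sketch that prepares one request per method with `requests` without sending anything over the network. The endpoint `https://example.com/api/v1/items` and the helper `build_request` are hypothetical and exist only for illustration.

```python
import requests

# Hypothetical endpoint, used only to illustrate the four methods.
BASE_URL = "https://example.com/api/v1/items"


def build_request(method, url, **kwargs):
    """Prepare (but do not send) an HTTP request so it can be inspected."""
    return requests.Request(method, url, **kwargs).prepare()


# GET: query parameters end up in the URL
get_req = build_request("GET", BASE_URL, params={"age": "Dark"})
# POST: the new resource travels in the request body
post_req = build_request("POST", BASE_URL, json={"name": "Barracks"})
# PUT: the updated state of an existing resource travels in the body
put_req = build_request("PUT", f"{BASE_URL}/1", json={"name": "Barracks", "age": "Dark"})
# DELETE: the URL alone identifies the resource to remove
delete_req = build_request("DELETE", f"{BASE_URL}/1")

for req in (get_req, post_req, put_req, delete_req):
    print(req.method, req.url)
```

Only the GET request carries its parameters in the URL, which is why a GET query can be pasted straight into a browser.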

<h2 align='center'>API Status Codes</h2>

Whenever we make a request to a web server, we get status codes. These codes encode information on the success of a request. Here are some codes that are relevant to GET requests:

* 200: the GET request was successful.
* 301: the resource has moved permanently; the server redirects to a different endpoint (new domain name, new endpoint name).
* 400: bad request (wrong data sent, errors in the request).
* 401: unauthorized. Many APIs require login credentials; this code indicates that valid credentials have not been sent.
* 403: forbidden. You don’t have the right permissions to see the resource.
* 404: the resource you tried to access wasn’t found on the server.
* 503: service unavailable. The server is not ready to handle the request.
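
The codes listed above can be looked up with Python's standard library. The sketch below uses `http.HTTPStatus` to print the standard phrase for each code and the class it belongs to; the helper name `describe_status` is an assumption made for this example.

```python
from http import HTTPStatus

# The first digit of a status code identifies its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return the numeric code, its standard phrase, and its class."""
    status = HTTPStatus(code)
    status_class = STATUS_CLASSES.get(code // 100, "other")
    return f"{status.value} {status.phrase} ({status_class})"


for code in (200, 301, 400, 401, 403, 404, 503):
    print(describe_status(code))
```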

<h2 align='center'> Where to find public (REST) APIs? </h2>

Many cities worldwide have a portal with open data - i.e. look up Ireland open data.

GitHub has compiled a list of public APIs https://github.com/public-apis/public-apis

<h2 align='center'>Code time</h2>

We will explore two APIs from that list: the Internet Archive (PHP based) and an API hosted on Heroku that contains data on Age of Empires II. 

Both can be accessed via a URL; however, "queries" have a different structure on each. Expect to spend time getting familiar with a database's API documentation before you can start running queries. This holds true for REST APIs as well. 

<h2 align='center'>Python libraries we will use</h2>

In [None]:
import requests
from requests.exceptions import HTTPError
import json
import pandas as pd
import plotly.express as px

<h2 align='center'>Our query-performing function</h2>

In [None]:
def query_entry_pt(url):
    """Query a REST API entry point and return the complete JSON response.
    
    Input:
        - url (string): complete URL (or entry point) pointing at the server 
        
    Output:
        - jsonResponse (dict or list): parsed JSON response associated with the query,
          or None if the request failed
    
    """
    try:
        # Send a GET request 
        response = requests.get(url)
        # Raise an HTTPError if the status code signals a client (4xx) or server (5xx) error
        response.raise_for_status()
        # Parse the JSON content of the response
        jsonResponse = response.json()
        print("Success!", response)
        return jsonResponse

    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
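
It can help to see which status codes actually make `raise_for_status()` raise. The sketch below builds `requests.Response` objects by hand and sets their status code, so no network access is needed. Constructing a response this way is purely for illustration (in real use, `requests.get` creates it), and the helper `raises_http_error` is a hypothetical name.

```python
import requests
from requests.exceptions import HTTPError


def raises_http_error(status_code):
    """Return True if raise_for_status() raises for the given status code."""
    response = requests.Response()       # hand-built response, for illustration only
    response.status_code = status_code
    try:
        response.raise_for_status()
    except HTTPError:
        return True
    return False


for code in (200, 301, 404, 503):
    print(code, raises_http_error(code))
```

Only the 4xx and 5xx codes raise; a 301 on its own does not. In practice `requests.get` follows redirects by default, so a redirect is rarely what the caller ends up seeing.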

<h2 align='center'>Let's try getting information from a non-existent API</h2>

In [None]:
# API that does not exist
url_no_api = "http://api.open-notify.org/this-api-doesnt-exist"
query_entry_pt(url_no_api)

In [None]:
# Web page that is NOT an API
url_no_api = "https://en.wikipedia.org/wiki/Main_Page"
query_entry_pt(url_no_api)
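
A regular web page can fail for a different reason than a missing endpoint: assuming the server returns the page successfully, the body is HTML rather than JSON, so the parsing step is what breaks. The standard-library sketch below shows the same kind of failure on two short strings; `try_parse_json` is a hypothetical helper, and the strings stand in for real response bodies.

```python
import json


def try_parse_json(text):
    """Parse text as JSON, or report why it could not be parsed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        return f"Not JSON: {err.msg}"


# A body that looks like an API answer
print(try_parse_json('{"message": "success"}'))
# A body that looks like a web page
print(try_parse_json("<!DOCTYPE html><html><body>Main Page</body></html>"))
```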

## The Internet Archive

About: Internet Archive is a non-profit library of millions of free books, movies, software, music, websites, and more.

Source https://archive.org/advancedsearch.php 

In [None]:
# The API exists, but the query is malformed: the colon in `collection:nasa` must be encoded as %3A, not %
url_archive='https://archive.org/services/search/v1/scrape?fields=title,month,year,downloads&q=collection%nasa'
query_entry_pt(url_archive)
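
The query `collection:nasa` contains a colon, which has to be percent-encoded as `%3A` inside a URL; `%nasa` is not a valid escape sequence. One way to avoid this kind of slip is to let the standard library do the encoding. This is a sketch only: the helper `build_scrape_url` is a hypothetical name, while the base URL and field names are the ones used in this notebook.

```python
from urllib.parse import unquote, urlencode

BASE = "https://archive.org/services/search/v1/scrape"


def build_scrape_url(query, fields):
    """Build a scrape URL, percent-encoding the query string."""
    params = {"fields": ",".join(fields), "q": query}
    # safe="," keeps the commas between field names readable
    return f"{BASE}?{urlencode(params, safe=',')}"


url = build_scrape_url("collection:nasa", ["title", "month", "year", "downloads"])
print(url)

# Decoding shows the difference between the valid and the malformed query
print(unquote("collection%3Anasa"))  # the colon is restored
print(unquote("collection%nasa"))    # invalid escape, left unchanged
```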

In [None]:
# API returns information
url_nasa='https://archive.org/services/search/v1/scrape?fields=title,month,year,downloads,collection&q=collection%3Anasa'
jsonResponse_nasa=query_entry_pt(url_nasa)

In [None]:
# Keys 
jsonResponse_nasa.keys()

<h2 align='center'>Formatting the data using pandas</h2>

In [None]:
jsonResponse_nasa


In [None]:
# Flatten the top level only: the records stay nested under 'items'
pd.json_normalize(jsonResponse_nasa)

In [None]:
# Flatten the records stored under the 'items' key: one row per item
flattened_nasa = pd.json_normalize(jsonResponse_nasa,record_path='items')

flattened_nasa.head()
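
The effect of `record_path` is easier to see on a small, made-up payload. In the sketch below, the titles, numbers, and the `count` and `total` keys are assumptions chosen for illustration and are not taken from the real response; only the `items` key and the field names mirror the query above.

```python
import pandas as pd

# Hypothetical payload shaped like a response that keeps its records under "items"
payload = {
    "items": [
        {"title": "Item A", "year": 1981, "downloads": 120, "collection": ["nasa", "other"]},
        {"title": "Item B", "year": 1995, "downloads": 45, "collection": ["nasa"]},
    ],
    "count": 2,
    "total": 2,
}

# Without record_path: a single row, with the whole list stored in one cell
top_level = pd.json_normalize(payload)
print(top_level.shape)

# With record_path: one row per record found under "items"
records = pd.json_normalize(payload, record_path="items")
print(records)
```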

<h2 align='center'>Cleaning the data using pandas</h2>


In [None]:
# Each item can belong to several collections (stored as a list); keep only the first one
flattened_nasa['collection'] = flattened_nasa["collection"].str[0]
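
The `.str[0]` accessor also works element-wise on a column of lists, which is what the cell above relies on. A minimal sketch on made-up values (the collection names here are placeholders):

```python
import pandas as pd

# Hypothetical column where each cell holds a list of collection names
collections = pd.Series([["nasa", "other"], ["audio"], []])

# .str[0] picks the first element of each list; an empty list yields a missing value
first = collections.str[0]
print(first)
```

Note that this keeps only one collection per item, so any analysis by collection ignores the remaining memberships.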

In [None]:
flattened_nasa.info()

<h2 align='center'>Visualizing some data using Plotly</h2>


In [None]:
# Other collections to try: audiocollection, nasa, jsc-pao-video-collection
subset = flattened_nasa[flattened_nasa['collection']=='langleyresearchcentermediaarchive']

px.scatter(data_frame=subset,x='year',y='downloads',hover_name='title',title='Number of times an item was visited')

### Learning about McDonnell Douglas Experimental Winglets on DC 10 


See https://archive.org/details/1981-L-12090

* May 25, 1979: the McDonnell Douglas DC-10-10 operating American Airlines Flight 191 took off from runway 32R, only to crash after its left engine detached. 

* The crash killed the 258 passengers and 13 crew members on board, as well as two people on the ground. 

* It remains the deadliest aviation accident to have occurred in the United States.


Note: `downloads` is the cumulative number of times an item has been viewed to date. It does not show how those views were distributed over time. 

### Learning about the most visited video produced by NASA

In [None]:
subset = flattened_nasa[flattened_nasa['collection']=='jsc-pao-video-collection']

px.scatter(data_frame=subset,x='year',y='downloads',hover_name='title',title='Number of times an item was visited')

In [None]:
# Video posted Feb 27, 2015
from IPython.display import YouTubeVideo

YouTubeVideo('2AwUnkrqNyw', width=800, height=300)

## Different APIs are structured differently

#### Refer to the documentation!



## Age Of Empires II API

Simple API to retrieve resources related to Age of Empires II. The base URL for retrieving the resources is /api/v1

Source https://age-of-empires-2-api.herokuapp.com/docs/#/ 

In [None]:
# Age of Empires entry points
entry_point = 'https://age-of-empires-2-api.herokuapp.com/api/v1/'
aoe_civ=entry_point + 'civilizations'

# Perform queries
jsonResponse_civ=query_entry_pt(aoe_civ)

# Flatten the JSON response
flattened_civ = pd.json_normalize(jsonResponse_civ,record_path='civilizations')



In [None]:
print("Civilizations")
display(flattened_civ.head())

## Let's take a look at technologies


In [None]:
print("Technologies")
# Perform queries
aoe_tech = entry_point +  'technologies'
jsonResponse_tech=query_entry_pt(aoe_tech)
# Normalize 
flattened_tech = pd.json_normalize(jsonResponse_tech,record_path='technologies')

display(flattened_tech.head())
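
Column names such as `cost.Food` come from `json_normalize` flattening nested dictionaries: each nested key becomes a column named `parent.child`. The sketch below shows this on a made-up payload; the technology names and numbers are placeholders and not real game data.

```python
import pandas as pd

# Hypothetical payload: "cost" is a nested dictionary whose keys vary per record
payload = {
    "technologies": [
        {"name": "Tech A", "age": "Feudal", "build_time": 30, "cost": {"Food": 100, "Gold": 50}},
        {"name": "Tech B", "age": "Castle", "build_time": 45, "cost": {"Wood": 75}},
    ]
}

flat = pd.json_normalize(payload, record_path="technologies")

# Nested keys become dotted column names; missing keys become NaN
print(sorted(flat.columns))
print(flat[["name", "cost.Food", "cost.Wood"]])
```

Records that lack a given resource get a missing value in that column, so points without, say, a food cost may not appear in a plot that uses `cost.Food`.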

In [None]:
px.scatter(data_frame=flattened_tech,x='build_time',y='cost.Food',color='age',hover_name='name',
          labels={"build_time":"Build time","cost.Food":"Food cost"},
          title='Food cost vs build time of technologies (colored by age)')

In [None]:
px.scatter_3d(data_frame=flattened_tech,x='build_time',y='cost.Wood',z='cost.Food',color='age',hover_name='name',
          labels={"build_time":"Build time","cost.Wood":"Wood cost","cost.Food":"Food cost"},
          title='Cost vs build time of technologies (colored by age)')

In [None]:
px.violin(data_frame=flattened_tech,x='age',y='build_time',points='all',hover_name='name',
         labels={"age":"Age","build_time":"Build time"}, title="Violin plot: Build time per age")

## Let's take a look at structures

In [None]:
print("Structures")
# Perform query
aoe_stru = entry_point + 'structures'
jsonResponse_stru=query_entry_pt(aoe_stru)
# Normalize 
flattened_stru = pd.json_normalize(jsonResponse_stru,record_path='structures')
display(flattened_stru.head())

In [None]:
# Available ages: 'Dark', 'Feudal', 'Castle', 'Imperial'
subset_dat = flattened_stru[flattened_stru['age']=='Dark']
px.bar(data_frame=subset_dat,y='build_time',x='name',
            title='Build time of Dark Age structures',
       labels={"name":"Structure","build_time":"Build time"}).update_xaxes(categoryorder='total descending')

In [None]:
px.box(data_frame=flattened_stru,y='hit_points',x='age',points='all',hover_name='name',
            title='Hit points per age',labels={"age":"Age","hit_points":"Hit points"})

<h2 align='center'>What did we explore?</h2>

* We learned about APIs, and REST APIs in particular: this interface allows us to interact with the data on a server via a URL

* We learned about the different status codes a server can return

* Although REST APIs base their architecture on HTTP, the structure of queries differs across entry points and servers

* We learned how to use the requests library to GET data

* We parsed the JSON response using pandas

* We visualized data using Plotly



<h2 align='center'>Q & A</h2>


<center><img src="https://lh6.googleusercontent.com/_nYCLkTG8PO_Wx5-jkdIq3Wyo1PBGrh4wiincNkb5cnZijpxfzCclyxMJbNwYItS8T3Iao-Cqm5WphPy4d4Mlihszki6MBiqZA7NBwTnuFxU49ppZEZyv8v0QEEVAA9Ai3buwhG7" alt="Drawing" style="width: 1000px;"/></center>

<h1 align='center'>Thank you! </h1>


<center><img src="api.png" alt="Drawing" style="width: 400px;"/> </center>

<h4 align='center'>If you'd like to chat after the meetup, you can send me a message on Twitter @LGFunderburk </h4>