# Using APIs and JSON Data

![more_data](https://media.giphy.com/media/3o7TKSx0g7RqRniGFG/giphy.gif)

## Agenda

* Introduce JSON Schemas and learn how to interact with them
* Introduce APIs.
* Walk through how to make an API request. 
* Practice making API requests and parsing the data.


# What is a JSON file?

JSON stands for JavaScript Object Notation. 

JSON is one way that data is transmitted over the web.  It is notable for being lightweight, which generally makes it [preferred](https://stackoverflow.com/questions/383692/what-is-json-and-why-would-i-use-it) over XML.

When you have time, check out this [link](https://www.json.org/json-en.html) to the JSON website.



## Format

JSON objects look similar to Python dictionaries, and you will see that they can be interacted with in a similar way.  Both hold information between opening and closing curly braces.  A JSON document often looks much like a set of nested dictionaries.
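To make the comparison concrete, here is a minimal sketch that parses a small, made-up JSON string (it is not taken from the Spotify file used later). It uses `json.loads` to turn JSON text into Python objects and `json.dumps` to go back the other way.

```python
import json

# A small, made-up JSON document (hypothetical data, for illustration only)
raw = '{"album": {"name": "Example Album", "tracks": ["Intro", "Outro"], "explicit": false, "label": null}}'

parsed = json.loads(raw)                 # JSON text -> Python objects

print(type(parsed))                      # a JSON object becomes a dict
print(type(parsed["album"]["tracks"]))   # a JSON array becomes a list
print(parsed["album"]["explicit"])       # JSON false becomes Python False
print(parsed["album"]["label"])          # JSON null becomes Python None

# Going the other way: Python objects -> JSON text
round_trip = json.dumps(parsed)
```

Once parsed, the data is navigated with ordinary dictionary keys and list indexes.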

## Take a second...
and open up the JSON file in the data folder.  If you are in Jupyter Lab, control-click and open with the editor.

Get familiar with the structure, and locate where the bulk of the data is held.

## Loading the JSON file

The new_releases.json file resulted from a query to the Spotify API for new releases. 

Let's begin by importing the json package, opening the file with Python's built-in `open` function, and then loading the data in.

In [None]:
import json
import pandas as pd

with open('data/new_releases.json', 'rb') as read_file:
    data = json.load(read_file)



The Spotify API returned the data in the form of a JSON object, which the json module transforms into a Python object.
![drake_dancing_weird](https://media.giphy.com/media/3o85xJohCZUc524lSU/giphy.gif)

## Exploring JSON Schemas  

Recall that JSON files have a **nested** structure. The most granular level of raw data will be individual numbers (float/int) and strings. These in turn will be stored in the equivalent of python lists and dictionaries. Because these can be combined, we'll start exploring by checking the type of our root object, and start mapping out the hierarchy of the json file.

## Quick Q:
if I run <br>
`type(data)`<br>
what will the output be?

In [None]:
# your answer here

As you can see, in this case, the first level of the hierarchy is a <>. Let's explore what keys are within this:

In [None]:
data.keys()

In this case, there is only a single key, 'albums', so we'll continue on down the pathway exploring and mapping out the hierarchy. Once again, let's start by checking the type of this nested data structure.

In [None]:
type(data['albums'])

In [None]:
data['albums'].keys()

At this point, things are starting to look something like this: 

![](images/json_diagram1.JPG)

At this point, if we were to continue checking individual data types, we have a lot to go through. To simplify this, let's use a for loop:

In [None]:
for key,value in data['albums'].items():
    print(key, type(value))

Adding this to our diagram we now have something like this:
![](images/json_diagram2.JPG)
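The same key-and-type loop can be extended to the whole hierarchy. The sketch below is one possible helper, not part of the lesson's code: `describe` is a hypothetical function name, and the `toy` dictionary is made-up data shaped loosely like an API response. It assumes that the elements of a list share a structure, so it only inspects the first one.

```python
def describe(obj, depth=0, max_depth=4):
    """Return indented 'key: type' lines for a nested dict/list structure."""
    lines = []
    if depth >= max_depth:
        return lines
    indent = "  " * depth
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Assumption: list elements look alike, so the first one is representative
        lines.append(f"{indent}[0]: {type(obj[0]).__name__}")
        lines.extend(describe(obj[0], depth + 1, max_depth))
    return lines


# Hypothetical toy data, for illustration only
toy = {"albums": {"items": [{"name": "Example Album"}], "limit": 20}}

for line in describe(toy):
    print(line)
```

Calling `describe(data)` on the loaded file should give a text version of the diagram, with `max_depth` controlling how far down it goes.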


Let's look at the type of items:

In [None]:
type(data['albums']['items'])

That is a JSON array, whose Python equivalent is the list.

In [None]:
data['albums']['items']

We access the values as we would a list, as it has now been converted into one:

In [None]:
data['albums']['items'][0]['artists'][0]['name']

In [None]:
data['albums']['items'][1]['artists'][0]['name']

# Pair Programming: 7 minutes

Now that we have some familiarity with how json objects function (and we already had some built-in familiarity with them given their similarity to dictionaries), let's practice grabbing some information.

The cells below won't run on your computer, but they are here for your future reference.  They show you how to navigate the Spotify API.  

Take a look at the 3rd cell down.  The offset parameter allows us to retrieve the next page of 20 records from the API. 
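As a rough sketch of how offset-based paging works, the helper below builds one parameter dictionary per page. `page_params` is a hypothetical name, and the totals used are made-up numbers; in a real request each dictionary would be passed as the `params` argument.

```python
def page_params(total, limit=20):
    """Build one params dict per page for an offset-paginated API."""
    return [{"offset": offset, "limit": limit} for offset in range(0, total, limit)]


# Five pages of 20 records cover 100 records
pages = page_params(100, limit=20)
print(pages[0])    # the first page starts at offset 0
print(pages[-1])   # the last page starts at offset 80
```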

Run the cell with `import pickle`.

In [None]:

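# import os
# import requests
#
# # open() does not expand '~', so expand it to the home directory explicitly
# with open(os.path.expanduser('~/.ssh/spotify.json'), 'rb') as read_file:
#     spot_keys = json.load(read_file)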
    
# AUTH_URL = 'https://accounts.spotify.com/api/token'

# # POST
# auth_response = requests.post(AUTH_URL, {
#     'grant_type': 'client_credentials',
#     'client_id': spot_keys['Client ID'],
#     'client_secret': spot_keys['Client Secret'],
# })

# # convert the response to JSON
# auth_response_data = auth_response.json()

# # save the access token
# access_token = auth_response_data['access_token']


In [None]:
# import requests

# headers = {
#     'Authorization': 'Bearer {}'.format(access_token),
# }

In [None]:
# url = "https://api.spotify.com/v1/browse/new-releases"

# responses = []
# for i in range(0,101,20):
    
#     params = {'offset': i, 'limit': 20}  # 20 records per page
#     r = requests.get(url, headers=headers, params=params)
#     responses.append(r.content)
    

In [None]:
import pickle 

# with open('data/offset_newreleases.p','wb') as write_file:
#     pickle.dump(responses, write_file)

with open('data/offset_newreleases.p','rb') as read_file:
    responses = pickle.load(read_file)

The challenge of this exercise is to generate a list of the unique song names from new releases held in the `responses` variable loaded above. 

There are several skills you will have to employ, and here are some hints about how to employ them:
  - Check the type of an element of the responses list.  Check out this [stackoverflow post](https://stackoverflow.com/questions/39719689/what-is-the-difference-between-json-load-and-json-loads-functions) to find how to convert the elements to dictionaries.
  - Figure out where the album name data is located and what key is associated with it.  Use the key at the correct level to access the data.
  - Use a for loop or list comprehension to create a list of songs
  - Use the built-in `set()` function to filter out duplicates
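The hints above can be practiced on toy data first. This sketch uses two made-up byte strings in place of real API responses; the keys `results` and `title` are hypothetical and will not match the Spotify data, so finding the right keys is still up to you.

```python
import json

# Two made-up byte payloads standing in for raw API responses (hypothetical data)
toy_responses = [
    b'{"results": [{"title": "A"}, {"title": "B"}]}',
    b'{"results": [{"title": "B"}, {"title": "C"}]}',
]

# json.loads accepts bytes as well as str and returns Python objects
pages = [json.loads(raw) for raw in toy_responses]

# Collect one field from every record on every page
titles = [record["title"] for page in pages for record in page["results"]]

# set() drops duplicates; sorted() turns the set back into an ordered list
unique_titles = sorted(set(titles))
print(unique_titles)
```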

In [None]:
# Your code here

![Arby_pharrell_bighat](https://media.giphy.com/media/MDwql6xEUzyjS/giphy.gif)

## What is an API?

In [None]:
# Unmute yourselves and discuss

**Application Programming Interfaces**, or APIs, are commonly used to retrieve data from remote websites. Sites like Reddit, Twitter, and Facebook all offer certain data through their APIs. 

To use an API, you make a request to a remote web server, and retrieve the data you need.

Python's standard library includes `urllib` for handling these requests (Python 2 also shipped a separate `urllib2`), but it can be confusing to use and its documentation is not always clear.

To make things simpler, most developers prefer an easy-to-use third-party library known as `Requests` instead of `urllib`. With this library, you can access content like web page headers, form data, files, and parameters via simple Python commands. It also allows you to access the response data in a simple way.

![](images/logo.png)

Below is how you would install and import the requests library before making any requests. 
```python
# Uncomment and install requests if you don't have it already
# conda install -c anaconda requests

# Import requests to working environment
import requests
```

In [None]:
# Import the requests library
import requests


## The `.get()` Method

Now that we have the requests library ready in our working environment, we can start making some requests using the `.get()` method as shown below:


We can use a simple GET request to retrieve information from the [OpenNotify API](http://open-notify.org/).

![astronaut_flossing](https://media.giphy.com/media/S99cgkURVO62qemEKM/giphy.gif)

OpenNotify has several API **endpoints**. An endpoint is a server route that is used to retrieve different data from the API. For example, the /comments endpoint on the Reddit API might retrieve information about comments, whereas the /users endpoint might retrieve data about users. To access them, you would add the endpoint to the base url of the API.
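As a small sketch of how an endpoint is combined with a base url, the snippet below uses the standard library's `urljoin`. `BASE_URL` and `endpoint_url` are names chosen for this illustration only; the two endpoints are the ones used in this notebook.

```python
from urllib.parse import urljoin

BASE_URL = "http://api.open-notify.org/"


def endpoint_url(endpoint):
    """Join an endpoint path onto the API's base url."""
    return urljoin(BASE_URL, endpoint)


print(endpoint_url("iss-now.json"))
print(endpoint_url("astros.json"))
```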



In [None]:
# Make a get request to get the latest position of the international space station from the opennotify api.
url = 'http://api.open-notify.org/iss-now.json'
response = requests.get(url)


GET is by far the most used HTTP method. We can use a GET request to retrieve data from a server. 


## Status Codes
The requests we make will not always succeed. The most reliable way to find out is to check the status code returned with the response. Here is how you would do this. 


In [None]:
# Print the status code of the response.
response.status_code

In [None]:
# https://httpstatusdogs.com/

This is a good check to see whether our request was successful. Depending on the status of the web server, the access rights of the client, and the availability of the requested information, a web server may return any of a number of status codes with the response. Wikipedia has an exhaustive list of these codes. [Check them out here](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

### Common status codes

* 200 — everything went okay, and the result has been returned (if any)
* 301 — the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
* 401 — the server thinks you’re not authenticated. This happens when you don’t send the right credentials to access an API (we’ll talk about authentication in a later post).
* 400 — the server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
* 403 — the resource you’re trying to access is forbidden — you don’t have the right permissions to see it.
* 404 — the resource you tried to access wasn’t found on the server.
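If you want a quick reminder of what a code means without leaving Python, the standard library's `http.HTTPStatus` enum knows the official phrase for each code. The helpers `explain` and `is_success` below are hypothetical names for this sketch, and treating the whole 2xx range as success is a common convention rather than something specific to any one API.

```python
from http import HTTPStatus


def explain(status_code):
    """Return a short description such as '404 Not Found'."""
    status = HTTPStatus(status_code)
    return f"{status.value} {status.phrase}"


def is_success(status_code):
    """Treat any 2xx code as a successful response."""
    return 200 <= status_code < 300


for code in (200, 301, 400, 401, 403, 404):
    print(explain(code), "| success:", is_success(code))
```

In a notebook you could call `explain(response.status_code)` right after making a request.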

In [None]:
# Let's check out who is in space right now!
url = 'http://api.open-notify.org/astros.json'
response = None  # replace None with a GET request to the url above

data = response.json()
data.keys()

In [None]:
# Quick Exercise: use a list comprehension to get the names of the people in space 
people = None

In [None]:
print(f"There are {data['number']} people in the space station right now!")
print("Their names are {}".format(', '.join(people)))

### Hitting the right endpoint

We’ll now make a GET request to http://api.open-notify.org/iss-pass.json.

In [None]:
url =  'http://api.open-notify.org/iss-pass.json'
response = requests.get(url)
response.status_code


## Response Contents
We can check the returned information using the `.text` property of the response object. 
```python
print(response.text)
```

In [None]:
response = requests.get("http://api.open-notify.org/iss-pass.json")

In [None]:
# In this case, the text gives us a failure message and a reason
response.text


### Query parameters

If you look at the documentation for the OpenNotify API, you will see that the ISS Pass endpoint requires two parameters.

We can supply them by adding an optional keyword argument, **params**, to our request. In this case, there are two parameters we need to pass:

* lat — The latitude of the location we want.
* lon — The longitude of the location we want.

We can make a dictionary with these parameters, and then pass them into the requests.get function.

We can also do the same thing directly by adding the query parameters to the url, like this: http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74.

It’s almost always preferable to setup the parameters as a dictionary, because requests takes care of some things that come up, like properly formatting the query parameters.
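To get a feel for the formatting that is being handled for you, the sketch below builds a query string by hand with the standard library's `urlencode`. This only illustrates the idea and is not a description of how requests works internally; the `q` parameter in the second call is a made-up example and not part of the OpenNotify API.

```python
from urllib.parse import urlencode

base = "http://api.open-notify.org/iss-pass.json"
params = {"lat": 40.71, "lon": -74}

# Turn the dictionary into 'key=value' pairs joined by '&'
query = urlencode(params)
full_url = f"{base}?{query}"
print(full_url)

# Values containing spaces or punctuation are escaped so the URL stays valid
escaped = urlencode({"q": "San Francisco, CA"})
print(escaped)
```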

We’ll make a request using the coordinates of San Francisco, and see what response we get.

We can add parameters to the get method in the form of a dictionary.  In this instance, the dictionary has two keys, lat and lon.
Help me code out the correct request.

# Student screen share

In [None]:
%load_ext autoreload
%autoreload 2

from src.student_list import student_list
from src.student_caller import three_random_students

three_random_students(student_list)

One student from the above list will please share their screen.  Then, the rest of the class can help out via chat.

    1. Find the latitude and longitude of some place in SF.
    2. Fill in the correct parameters in the dictionary below.
    3. Fill response with the get method call.
    4. Pass the correct url into the get method.
    5. Correctly pass the parameters dictionary to the get method.
    6. Run the datetime cell below to convert the response to something legible.

In [None]:
# Our code here
parameters = {}

response = None

# Print the content of the response (the data the server returned)
print(response.content)
# Query parameters can also be written directly into the url, for example:
# response = requests.get("http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74")


In [None]:
# let's change that timestamp to something readable
from datetime import datetime

In [None]:
datetime.fromtimestamp(response.json()['response'][3]['risetime'])
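By default `datetime.fromtimestamp` converts to the local time zone of the machine running the code. The sketch below shows a timezone-aware alternative; the `payload` dictionary is made up to resemble the shape indexed above (a `response` list whose items have a `risetime`), and its values are not real ISS data.

```python
from datetime import datetime, timezone

# Hypothetical payload shaped like the parsed response; the values are made up
payload = {"response": [{"duration": 600, "risetime": 1600000000}]}

first_pass = payload["response"][0]

# Passing tz gives a timezone-aware result instead of relying on local time
rise = datetime.fromtimestamp(first_pass["risetime"], tz=timezone.utc)
print(rise.isoformat())
```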

The raw content of a response is not always easy for a human to read: it can include encoded data and, for web pages, HTML tags and other styling information that only a web browser can truly translate. In later lessons we shall look at how we can use **Regular Expressions** to clean this information and extract the required bits and pieces for analysis. 

## Response Headers
The response of an HTTP request can contain many headers that hold different bits of information. We can use the `.headers` property of the response object to access the header information as shown below:


In [None]:
# Code here 
response.headers

You can see the key-value pairs holding various pieces of  information about the resource and request. Let's try to parse some of these values using the requests library:

```python
print(response.headers['Content-Length'])  # length of the response
print(response.headers['Date'])  # Date the response was sent
print(response.headers['server'])   # Software used by the server that handled the request
```

# APIs with Access Tokens


## Generating Access Tokens

Many APIs require authentication, often through OAuth or an API key, which means each request must carry an access token. As such, our first step will be to generate these credentials so that we can start making some requests.  

With that, let's go grab an access token from an API site and make some API calls!
Point your browser to this [Yelp page](https://www.yelp.com/developers/v3/manage_app) and start creating an app in order to obtain an API access token:


![](./images/yelp_app.png)

You can either sign in to an existing Yelp account, or create a new one, if needed.

On the page you see above, simply fill out some sample information such as "Flatiron Edu API Example" for the app name, or whatever floats your boat. Afterwards, you should be presented with an API key that you can use to make requests!

With that, it's time to start making some api calls!

In [None]:
# As a general rule of thumb, don't store passwords in a main file.
# Instead, you would normally store those passwords in a separate file like passwords.py which you would then import.
# Or even better, as an environment variable that could then be read in!
# Never upload your keys to GitHub, and don't print out the keys you imported and then upload the output to GitHub.
import os

# open() does not expand '~', so expand it to the home directory explicitly
with open(os.path.expanduser('~/.ssh/yelp_api.json'), 'rb') as read_file:
    keys = json.load(read_file)
    
client_id = keys['Client ID'] #Your client ID goes here (as a string) 
api_key = keys['API Key'] #Your api key goes here (as a string)
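The comments above mention environment variables as a safer place for keys. A minimal sketch of that approach follows; the variable name `YELP_API_KEY` and the helpers `load_api_key` and `auth_header` are assumptions made for this example, not names defined by Yelp or by this lesson.

```python
import os


def load_api_key(var_name="YELP_API_KEY"):
    """Read an API key from an environment variable, failing loudly if it is missing."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running this code.")
    return key


def auth_header(key):
    """Build the 'Authorization: Bearer <key>' header used for token-based requests."""
    return {"Authorization": f"Bearer {key}"}
```

With the variable set in your shell, `headers = auth_header(load_api_key())` would produce a header dictionary without the key ever appearing in the notebook.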



## An Example Request with OAuth <a id="oauth_request"></a>
https://www.yelp.com/developers/documentation/v3/get_started

Let's look at an example request and dissect it into its constituent parts:

In [None]:
term = 'Hamburgers'
location = 'Chicago IL'
SEARCH_LIMIT = 50

url = 'https://api.yelp.com/v3/businesses/search'

headers = {
    'Authorization': 'Bearer {}'.format(api_key),
}

url_params = {
    'term': term.replace(' ', '+'),
    'location': location.replace(' ', '+'),
    'limit': SEARCH_LIMIT,
    'offset': 0
}
response = requests.get(url, headers=headers, params=url_params)
print(response)
data = response.json()
data.keys()


In [None]:
data['businesses']

## Breaking Down the Request

As you can see, there are three main parts to our request.  
  
They are:
* The url
* The header
* The parameters
  
The url is fairly straightforward and is simply the base url as described in the documentation.

The header is a dictionary of key-value pairs. In this case, we are using a fairly standard header used by many APIs. It has a strict form where 'Authorization' is the key and 'Bearer YourApiKey' is the value.

The parameters are the filters which we wish to pass into the query. These will be embedded into the url when the request is made to the api. Similar to the header, they form key-value pairs. Valid parameter keys by which to structure your queries are described in the API documentation, which we'll look at further shortly. A final note: the example replaces spaces with "+" because a raw URL cannot contain spaces. When parameters are passed through `params`, requests URL-encodes the values itself, so the plain strings could also be passed unchanged. (Note that the header isn't embedded into the url, and as such the space between 'Bearer' and YourApiKey is valid.)


## The Response

As before, our response object has both a status code, as well as the data itself. With that, let's start with a little data exploration!

In [None]:
response.json().keys()

Now let's go a bit further and start to preview what's stored in each of the values for these keys.


In [None]:
# Parse the response once, then print each top-level key with the type of its value
for key, value in response.json().items():
    print(key)
    print(type(value), end="\n\n")

Let's continue to preview these further to get a little better acquainted.


In [None]:
yelp_data = response.json()

yelp_data['businesses'][0]

In [None]:
df = pd.DataFrame(yelp_data['businesses'])

df.sort_values('review_count', ascending=False)

In [None]:
df[(df.review_count > 20) & (df.rating > 4)].sort_values('rating', ascending=False)

# Group Challenge: OpenAq

A big part of working with APIs is parsing the documentation.  Here are the docs for OpenAq, which gives us access to air quality data from across the globe. 

https://docs.openaq.org/

OpenAq does not require an authentication key.  

Goal: Plot the 100 most recent (100 is the default number returned) o3 readings at a location in one of our 3 cities: Seattle, Chicago, or San Francisco.

In order to do so, you will have to become familiar with how OpenAQ expects its parameter values.

Your final query should hit the measurements endpoint, which will require you to pass in a params dictionary with 'country', 'city', and 'location' keys.  

However, the correct values for each of those parameters have a specific form.  You can get a sense of appropriate parameter values by hitting the measurements endpoint with no parameters.


In [None]:
r = requests.get('https://api.openaq.org/v1/measurements/')
json.loads(r.content)['results'][:2]

But in order to get a location in San Francisco, Chicago, or Seattle, we will need to inspect what the appropriate input parameters are.  We can explore potential inputs with some different endpoints:
  - [Countries](https://docs.openaq.org/#api-Countries)
  - [Cities](https://docs.openaq.org/#api-Cities)
  - [Locations](https://api.openaq.org/v1/locations)


Once you have sorted through how to pass your location to the measurements endpoint, fill in the cell below.

In [None]:
params = {"country": None, "city": None, "location": None}  # replace each None with the correct value

r = requests.get('https://api.openaq.org/v1/measurements/', params=params)


Your plot may look something like this at the end.  Don't worry if you can't get all of the details of the plot working (i.e. tick labels). Focus first on producing some sort of visualization, then move on to cleaning it up.

![evanston](images/evanston_airq.svg)