# APIs and HTTP requests

APIs (Application Programming Interfaces) are systems that allow computer programs to interact with one another. Many organizations that maintain large databases available on the internet will also provide an API that is designed to allow users to retrieve, post, or modify data by sending queries programmatically. For instance: you can get a list of current members of Congress by directing your web browser to <a href="https://www.congress.gov/search?q=%7B%22congress%22%3A%5B%22118%22%5D%2C%22source%22%3A%22members%22%7D">Congress.gov</a> and using the search bar, but you could also write a Python program that queries <a href="https://github.com/LibraryOfCongress/api.congress.gov/blob/main/Documentation/MemberEndpoint.md">the Congress.gov API</a> and retrieves the same information. Learning how to navigate these systems makes it much easier to collect and analyze data from the web.

We'll use the <a href="https://pypi.org/project/requests/">Requests</a> library to interact with Web-based APIs using HTTP methods.

In [None]:
from requests import get # "get" requests can retrieve data from a server
import time # for pausing between iterations of a loop
# data manipulation
import pandas as pd 
import numpy as np


# How does the requests package work?
We'll start with a simple example of using an API to get information about the International Space Station (ISS), such as its current location and the people currently aboard. Information about this API can be found here: http://open-notify.org

Note: The documentation also provides Python code examples. We will be using slightly different code, but their code should work too! There are multiple modules you can use to access APIs, and we're using just one of them. Feel free to look at the code they provide and see if you can figure out what is going on.


# Using the Open Notify ISS API
To access the API, we use the `get` function from the `requests` library. To tell Python what to access, we need to specify the URL of the API endpoint.

# Making a Request
When you ask a website or server for information, this is called making a request. That is exactly what the `requests` library is designed to do.

### Step 1. Specify the URL

In [None]:
url = "http://api.open-notify.org/iss-now.json"

(You could also plug this URL directly into your browser's address bar and get a response. Your browser sends a simple GET request every time you click a link.)

### Step 2. Get the response

Now let's get the response from the URL defined above using the `requests` library. We'll use the HTTP `get` method to retrieve data. Note that as soon as we call `get()`, we're sending a request to the server and (hopefully) retrieving data. This will be important to keep in mind, since many sources limit the number of requests we can send in a given time period.

In [None]:
# Response from the URL
# get is a function from requests

r = get(url) 

r.url

### Step 3. Check the Response Code
Before you do anything with the response in Python, it's a good idea to check its status code.

The following are some useful response codes to keep in mind:

`200` - OK: the request succeeded, and the results will be in the body of the response

`400` - Bad Request: the server could not process the request, typically because a parameter is missing or invalid; the response often includes an error message explaining what went wrong

`500` - Internal Server Error: something went wrong on the server while processing the request

If we get a `200`, we're ready to start checking out the results of our query.


In [None]:
r.status_code  # Check the status code
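If you want to look up what a status code means without leaving Python, the standard library's `http.HTTPStatus` enum maps each code to its standard reason phrase. The helper below is a small sketch (the name `describe_status` is our own invention, not part of `requests`); with a real response you could pass it `r.status_code`, or call `r.raise_for_status()` to raise an error for any 4xx or 5xx code.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # not a code that the standard library recognizes
        return f"{code}: unknown status code"
    return f"{code}: {status.phrase}"

for code in (200, 400, 404, 500):
    print(describe_status(code))
```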


### Step 4. Get the Content (and maybe parse it)

Web browsers use HTML to display information in an attractive format that's easy for humans to read, but when we're working with an API, all this extra formatting is just wasted space, so we'll usually get data in a more computer-friendly format like <a href="https://www.json.org/json-en.html">JSON (JavaScript Object Notation)</a>.

JSON data consists of keys paired with values, and a value can itself be a list or another nested set of key-value pairs:

In [None]:
print(r.content)

Despite the name, JSON is language-independent and can easily be converted into Python data using the `.json()` method:

In [None]:
json_result = r.json()
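The `.json()` method is essentially a shortcut for decoding the response body with a JSON parser. As a rough illustration, here's the same conversion done by hand with the standard library's `json.loads` on a made-up string shaped like the ISS response (the values are invented for the example; in the notebook you could try `json.loads(r.text)` on the real response).

```python
import json

# a made-up example shaped like the Open Notify ISS response (values are invented)
raw = '{"message": "success", "timestamp": 1700000000, "iss_position": {"latitude": "12.3456", "longitude": "-45.6789"}}'

parsed = json.loads(raw)       # JSON text -> nested Python dicts/lists
print(type(parsed))            # <class 'dict'>
print(parsed["iss_position"])  # the nested object is itself a dict
```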

In [None]:
type(json_result) # what kind of data is this? 

In [None]:
json_result # view the data

Here, this API gives us the timestamp, a message indicating whether the request succeeded, and the ISS position. This isn't a very sophisticated API, since it only gives the position of the ISS at the moment you send the request, but it's a good starting point.

Sometimes it can be hard to see exactly what is in the response. Since this object is a Python dictionary, we can use the `.keys()` method to list the fields available in this response:

In [None]:
json_result.keys()  # View JSON keys

Alternatively, you can plug the request URL into your browser's address bar and look at the result:

In [None]:
r.url

Note that we have three keys: `message`, `iss_position`, and `timestamp`. The information we really want is in the `iss_position` key, so let's take a look at it.

In [None]:
json_result['iss_position']
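Look closely at the output above: the latitude and longitude may come back as strings (note the quotation marks), and the timestamp is a Unix timestamp (seconds since January 1, 1970, UTC). Before doing any math or plotting, you'd probably want to convert them. Here's a minimal sketch, using a made-up dictionary (`example`, with invented values) in place of `json_result`:

```python
from datetime import datetime, timezone

# made-up stand-in for json_result (values are invented)
example = {
    "message": "success",
    "timestamp": 1700000000,
    "iss_position": {"latitude": "12.3456", "longitude": "-45.6789"},
}

# convert the position strings to floats so we can do math with them
lat = float(example["iss_position"]["latitude"])
lon = float(example["iss_position"]["longitude"])

# convert the Unix timestamp into a timezone-aware datetime in UTC
when = datetime.fromtimestamp(example["timestamp"], tz=timezone.utc)

print(lat, lon, when.isoformat())  # 12.3456 -45.6789 2023-11-14T22:13:20+00:00
```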


<b style="color:red;">Question 1: use the open-notify API to find the number and names of all of the people currently in space.</b>

You'll want to send a `get` request to this URL: http://api.open-notify.org/astros.json

# Processing complex API data
The ISS API is a very simple example of an API. There is only one thing we can get from it: the position of the ISS at the moment we send the request. Usually, APIs also accept query parameters that let us specify exactly what data we want, and we'll often need to use pagination to navigate through large amounts of data in manageable chunks.

We'll use the PokeAPI, a surprisingly well-documented API with information about various pokemon, as an example. Like most larger data sources, the PokeAPI organizes its data into different sets of resources that users can query. For instance:

 - [https://pokeapi.co/api/v2/pokemon](https://pokeapi.co/api/v2/pokemon) for pokemon

 - [https://pokeapi.co/api/v2/move](https://pokeapi.co/api/v2/move) for moves

 - [https://pokeapi.co/api/v2/item](https://pokeapi.co/api/v2/item) for items


You can find a list of those resources [here](https://pokeapi.co/docs/v2#info).



We'll start by accessing the "pokemon" resource. For now, you don't need to do anything in python, just enter this url into your browser and take a look at the result:

[https://pokeapi.co/api/v2/pokemon](https://pokeapi.co/api/v2/pokemon)



You might notice that the data here are a little more complex than what we got from the ISS API: the `count`, `next`, and `previous` keys each contain just a single value, but the `results` key contains a list with multiple dictionaries nested inside it. This kind of nesting is very common when working with data from APIs because it's an efficient way to transfer and store data, but it's generally not a good format for analysis.

Reshaping this kind of data into something that we can put in a table will be one of the key challenges of working with most APIs.

Let's pull in a single response and look at it in Python:

In [None]:
base_url  = 'https://pokeapi.co/api/v2/pokemon'
r = get(base_url)


In [None]:

r.json()

For now, ignore the `count`, `next`, and `previous` parts and just look at the `results` object. How do we get this into a more useable format?

If I want to put all of the `url` values into a single list, I can use a list comprehension to iterate through each result:

In [None]:
one_result = r.json()['results']

[i['url'] for i in one_result]

However, in most cases, our end-goal is going to be to put this kind of API response into a pandas `DataFrame` object so that we can easily do things like plot results or calculate summary statistics. 


If our data is already stored in a format where we just have a list of dictionary objects, the `pd.DataFrame` function will automatically restructure the response and make it into a data frame:

In [None]:
result_df = pd.DataFrame(one_result)

# .head() returns the first few rows (5 by default)
result_df.head()

This won't always work! Sometimes the data will have multiple layers of nesting, and we'll need to do some additional restructuring before we can put anything into a data frame. Usually, though, we can use list comprehensions or loops to process the data into something that can easily be turned into a pandas `DataFrame`.
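As a small illustration of that kind of restructuring, here's a sketch that flattens a simplified, made-up record loosely modeled on a PokeAPI pokemon entry (where each pokemon has a list of `types`, and each type is itself a nested dictionary). The names and URLs below are invented. The list comprehension produces one flat dictionary per type, which is exactly the list-of-dictionaries shape that `pd.DataFrame` handles well.

```python
# simplified, made-up record loosely modeled on a PokeAPI pokemon entry
record = {
    "name": "examplemon",
    "types": [
        {"slot": 1, "type": {"name": "grass", "url": "https://example.com/type/a/"}},
        {"slot": 2, "type": {"name": "poison", "url": "https://example.com/type/b/"}},
    ],
}

# one flat row per type: pull values up out of the nested dictionaries
rows = [
    {"name": record["name"], "slot": t["slot"], "type": t["type"]["name"]}
    for t in record["types"]
]

for row in rows:
    print(row)
# rows could now be passed straight to pd.DataFrame(rows)
```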

## Pagination and Request Parameters

You'll notice that we've only got 20 pokemon listed here. There are a lot more than that, but we can't access them all at once because the PokeAPI uses pagination to limit the number of results per query. To collect a complete list, we'll need to send multiple queries to retrieve each "page" of data.

We'll handle pagination by incrementing an **offset parameter** that tells the API which part of the full data set to send us. If we were doing this manually, we would write out URLs like this:

- results 1-20: https://pokeapi.co/api/v2/pokemon/?offset=0 
- results 21-40: https://pokeapi.co/api/v2/pokemon/?offset=20
- results 41-60: https://pokeapi.co/api/v2/pokemon/?offset=40

... and so on.

However, since life is short, we can do this programmatically by calling `get` with the `params` argument in a loop.



To send a single parameterized query, we can pass a dictionary where the keys represent the name of each parameter, and the values indicate the value of each parameter. So here's how we would send a single request with the `offset` at 20:

In [None]:
base_url  = 'https://pokeapi.co/api/v2/pokemon'
params = {'offset' : 20}

r = get(base_url, params = params)

In [None]:
# note the url:
r.url
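`requests` builds the query string for you from the `params` dictionary. If you want to see roughly what it produces without sending a request, the standard library's `urllib.parse.urlencode` does the same kind of encoding (the variable names here are just for illustration):

```python
from urllib.parse import urlencode

base_url = 'https://pokeapi.co/api/v2/pokemon'
example_params = {'offset': 20, 'limit': 20}

# urlencode turns the dictionary into "key=value" pairs joined with "&"
example_url = base_url + '?' + urlencode(example_params)
print(example_url)  # https://pokeapi.co/api/v2/pokemon?offset=20&limit=20
```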

So doing this in a loop should be reasonably straightforward: we just need a loop that increments the offset parameter by 20 after each iteration. Here's a loop that just illustrates that idea:

In [None]:
# just showing the general idea here
for i in range(5):
    print("offset :", 20 * i)

Since we don't want to send too many requests at once, we'll put a small pause between each iteration of the loop with `time.sleep()`. I'll also add a `print(i, end =" ")` line that just prints the current value of `i` after each iteration. This will let us keep track of how the loop is progressing.

In [None]:
base_url  = 'https://pokeapi.co/api/v2/pokemon'
result_list = []
for i in range(5):
    params = {'offset' : 20 * i}
    r = get(base_url, params = params)
    result_list.append(r.json())
    time.sleep(.3)
    print(i, end =" ")
    

That's it! Now we've got the first five pages:

In [None]:
len(result_list)

## While Loops

We've got a method that gives us a certain number of pages, but what if we just want to collect all of the data from this resource? The simplest and most generalizable way to do this is to use a `while` loop, which repeats as long as a logical expression evaluates to `True` and stops once it evaluates to `False`. For example, here's a loop that prints and increments `i` while it is less than 5 and then stops:


In [None]:
i = 0
while i < 5:
    print(i)
    i += 1

<div class="alert alert-block alert-info"> 
<b>NOTE</b> unlike a <code>for</code> loop, I actually need to "manually" increase the value of <code>i</code> in each iteration. If I don't do this, the loop just runs forever. (If you find things are running for a very long time, you can press the little square "stop" button at the top of the page to interrupt the kernel.)
</div>


So <b>I just need a while loop that runs until there are no more results to retrieve.</b>

To make it a little easier, here's the URL for the last page of data:
https://pokeapi.co/api/v2/pokemon?offset=1304

<b style="color:red;">Question 2: see if you can write a logical expression that evaluates to `True` for the `firstpage` response, and `False` for the `lastpage` response</b>


In [None]:
firstpage = get('https://pokeapi.co/api/v2/pokemon?offset=0').json()
lastpage = get('https://pokeapi.co/api/v2/pokemon?offset=1304').json()

In [None]:
#

<b style="color:red;"> Question 3: Now try to use the expression you set up above to create a while loop that collects all of the data</b>

Here's a rough outline of what you should be doing:

```
all_results = []
i = 0
morepages = True
while morepages == True: 
    1. send a request
    2. append the response to all_results
    3. set morepages = [some expression that is False when you have reached the last page]
    4. increment i 
```



In [None]:
#


### Possible Solutions
Click the cells below to see some possible solutions, but only do this if you're stuck.

<details> 
<summary>Click to see possible answer 1</summary>    

## Solution 1


Here's one possible version that keeps running as long as `morepages = len(response['results']) > 0` is `True`. Once we request a page past the end of the data, we get an empty `results` list, so this expression evaluates to `False`, which ends the loop.
``` python
i = 0 
morepages = True
pokemon_results = []

while morepages == True:
    params = {'offset' : 20 * i}
    r = get(base_url, params = params)
    response = r.json()
    pokemon_results.append(response)
    time.sleep(.3)
    morepages = len(response['results'])> 0 
    print(i, end =" ")
    i+=1
```
</details>   

<details> 
<summary>Click to see possible answer 2</summary>    

## Solution 2
Here's another option that uses some metadata provided by the API. Since each response includes a `next` URL, we can use it to paginate through our results and end the loop when there are no more `next` URLs to retrieve (on the last page, `next` is `None`).

``` python
pokemon_results = []
request_url = 'https://pokeapi.co/api/v2/pokemon'

while request_url is not None:
    response = get(request_url).json()
    pokemon_results.append(response)
    request_url = response['next']
    time.sleep(.3)
    print(request_url, end ="\r")
```
</details>
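The `next`-URL approach from Solution 2 can also be wrapped in a reusable generator. The sketch below is one way to do it, not the only one: `fetch` is a stand-in for any function that takes a URL and returns the parsed JSON (for example `lambda u: get(u).json()`), which also lets us check the logic offline with fake pages. The names `iterate_pages` and `fake_pages` are hypothetical.

```python
import time

def iterate_pages(start_url, fetch, delay=0.3):
    """Yield each page of a paginated API by following its 'next' links.

    fetch: a function that takes a URL and returns the decoded JSON page.
    """
    url = start_url
    while url is not None:
        page = fetch(url)
        yield page
        url = page.get('next')   # None on the last page ends the loop
        if url is not None:
            time.sleep(delay)    # pause politely between requests

# fake, in-memory pages standing in for real API responses
fake_pages = {
    'page1': {'next': 'page2', 'results': [{'name': 'a'}, {'name': 'b'}]},
    'page2': {'next': None, 'results': [{'name': 'c'}]},
}

pages = list(iterate_pages('page1', fake_pages.__getitem__, delay=0))
names = [item['name'] for page in pages for item in page['results']]
print(names)  # ['a', 'b', 'c']
```

With the real API, something like `pokemon_results = list(iterate_pages(base_url, lambda u: get(u).json()))` should collect every page, with the pause handled inside the generator.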

If we want to be a little more efficient, we can use the `limit` parameter to adjust how much data is returned by each request. The default is 20, but we can increase it so that we retrieve the data in fewer requests. The only catch is that we also need to adjust how we increment the offset parameter (it should increase by the value of `limit` after every iteration).

In [None]:
pagelength = 250
i = 0 
morepages = True
pokemon_results = []

while morepages == True:
    params = {'offset' : pagelength * i, 'limit': pagelength} # adjusting the offset AND limit parameters
    r = get(base_url, params = params)
    response = r.json()
    pokemon_results.append(response)
    time.sleep(.3)
    morepages = len(response['results'])> 0 
    print(i, end =" ")
    i+=1
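Because each response also reports the total number of records in its `count` field, another option is to work out every offset up front instead of checking for an empty page. The sketch below uses a hypothetical helper, `page_offsets`, and a made-up `count` value; in practice you'd read the count from `response['count']`. One caveat: if the API adds records between requests, a precomputed list of offsets could miss them, which is one reason the empty-page and `next` checks above are more robust.

```python
import math

def page_offsets(count, limit):
    """Return the offset for each page needed to cover `count` records."""
    n_pages = math.ceil(count / limit)   # round up so a partial last page is included
    return [limit * i for i in range(n_pages)]

# made-up example: 1,010 records fetched 250 at a time
print(page_offsets(1010, 250))  # [0, 250, 500, 750, 1000]
```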

## Processing the results

Now that we've retrieved our results, we want to do something with the data we've assembled. In many cases we'll want to do some additional restructuring, but in this instance we probably just want to retrieve the `results` part of our `pokemon_results` list and store it as one big data set.

In [None]:
len(pokemon_results) # the number of pages we collected

In [None]:
# accessing the first five values of the "results" part of the 1st page of data
pokemon_results[0]['results'][:5]

We already know that we can turn the `results` list into a DataFrame with `pd.DataFrame`, but now we need to do that same process to the multiple pages of data we just collected.

We'll use a list comprehension to turn each page of data into a separate data frame, and then use `pd.concat` to concatenate all of them into one large data frame object.

In [None]:
datalist = [pd.DataFrame(i['results']) for i in pokemon_results]

df = pd.concat(datalist, ignore_index=True) # ignore_index gives the combined rows a fresh, unique index (0, 1, 2, ...)
df.shape # get the dimensions of the data frame:
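An alternative to building one DataFrame per page is to flatten all of the `results` lists into a single list of dictionaries first and then call `pd.DataFrame` once. Here's a sketch of the flattening step with a nested list comprehension, using made-up pages (`example_pages`, with invented names and URLs) in place of `pokemon_results`:

```python
# made-up pages standing in for pokemon_results
example_pages = [
    {'results': [{'name': 'a', 'url': 'https://example.com/1/'},
                 {'name': 'b', 'url': 'https://example.com/2/'}]},
    {'results': [{'name': 'c', 'url': 'https://example.com/3/'}]},
    {'results': []},  # an empty final page, like the one the while loop stops on
]

# read as: for each page, for each record on that page, keep the record
all_rows = [record for page in example_pages for record in page['results']]
print(len(all_rows))  # 3
# pd.DataFrame(all_rows) would then give one row per pokemon
```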

In [None]:
# view the last few rows:
df.tail()

# Querying other resources

Okay, so we've assembled a list of all pokemon by name, along with a set of URLs. Following those URLs will take us to *another* resource with data about a specific pokemon, like this one:

https://pokeapi.co/api/v2/pokemon/1/


If we send a get request to that URL, we can pull in even more data about the selected pokemon. This response object is a little more complex:

In [None]:
url = df['url'][0]
pokemon = get(url)
data = pokemon.json()

data.keys()
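Responses like this one are deeply nested, and some values may be missing or `null` (many of the image fields inside `sprites`, for example, can be empty). One defensive pattern is a small helper that walks a path of keys and returns a default instead of raising a `KeyError`. `get_nested` is a hypothetical helper, not part of `requests`, and the dictionary below is a made-up stand-in for `data`:

```python
def get_nested(obj, *keys, default=None):
    """Follow a sequence of keys into nested dicts; return default if any key is missing.

    Note: a key that exists but holds None still returns None, not the default.
    """
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj

# made-up stand-in for the pokemon `data` dictionary
example = {"name": "examplemon",
           "sprites": {"front_default": None, "back_default": "https://example.com/back.png"}}

print(get_nested(example, "name"))                                       # examplemon
print(get_nested(example, "sprites", "back_default"))                    # https://example.com/back.png
print(get_nested(example, "sprites", "front_shiny", default="missing"))  # missing
```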


The response object contains a lot of data that we don't necessarily want, but we can create a dictionary that contains only the elements we care about. Let's extract just the name, weight, height, and a link to an image file of each pokemon:

<b style="color:red;"> Question 4: write code to create a dictionary with the following: the `name`, `weight`, `height` and the `front_default` element from the `sprites` key.</b>

In [None]:
#




What if I want to do this for the first 20 pokemon? Here again, I can use a loop. In this case, we're going to use our loop to do the following:
1. Navigate to urls 1 through 20
2. With each response, extract the `name`, `weight`, `height`, and the `front_default` image link from `sprites`
3. Store these 4 values in a dictionary and append it to a list


As in the previous case, I want to use `time.sleep` to put a small pause between queries. Some APIs set a specific limit on how many requests you can send in a given time frame; in this case there's no set limit, so we're just going to use a delay of 0.3 seconds as a courtesy.
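When an API does enforce a limit, it will often answer with status code `429` (Too Many Requests) once you exceed it. A common courtesy pattern is to wait and retry with an increasing delay (exponential backoff). The sketch below is a generic illustration, not something the PokeAPI documentation requires. `send` is a stand-in for any function that performs a request and returns an object with a `status_code` attribute (such as `lambda: get(url)`), and `get_with_retries` and `FakeResponse` are hypothetical names used so the logic can be checked offline.

```python
import time

def get_with_retries(send, max_tries=4, base_delay=1.0):
    """Call send() until it returns a non-429 response or max_tries (at least 1) is reached.

    The wait doubles after each 429: base_delay, 2*base_delay, 4*base_delay, ...
    """
    for attempt in range(max_tries):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_tries - 1:
            time.sleep(base_delay * 2 ** attempt)
    return response  # still rate limited after the last try

# offline test double: the first two calls are "rate limited", the third succeeds
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code

codes = iter([429, 429, 200])
result = get_with_retries(lambda: FakeResponse(next(codes)), base_delay=0)
print(result.status_code)  # 200
```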

<b style="color:red;"> Question 5: Use a loop and the code you wrote in question four to create a list called `pokeinfo` that contains a dictionary with the `name`, `weight`, `height` and `front_default` values for each of the first 20 pokemon</b>

Hint: you can access the first twenty urls like this: `df['url'][:20]`

Example:


In [None]:
for i in df['url'][:20]:
    print(i)

In [None]:
# Your code    

Once I've collected my data, I can use the `pd.DataFrame` function again to combine my list of dictionaries into a single data frame:

In [None]:
pokedata = pd.DataFrame(pokeinfo)

In [None]:
pokedata.head()

# Making something

This skips ahead quite a bit, but I wanted to illustrate some cool things you can do with API data. The code in this section creates an interactive scatter plot of the heights and weights of pokemon, using images of the pokemon themselves in place of the points.

In [None]:
pokeinfo = []

for i in df['url'][:20]:
    pokemon = get(i)
    data = pokemon.json()
    res = {
        'url' : i,
        'name' : data['name'],
        'weight': data['weight'], 
        'height': data['height'],
        'sprite' : data['sprites']['front_default']
        
    }
    pokeinfo.append(res)
    time.sleep(.3)
    print(i, end='\r')


pokedata = pd.DataFrame(pokeinfo)
pokedata.head()

We can use plotly to make an interactive scatter plot from our data. If you hover over a point on this plot, you should be able to see a popover that shows you the name of each Pokemon:

In [None]:
from PIL import Image
import plotly.express as px

fig = px.scatter(
    pokedata,
    x="height",
    y="weight",
    hover_name="name",
    template="simple_white",
    width=800, height=600
)
fig.show()

We can do something a bit more interesting, though: the `sprite` column has a link to an image file of each pokemon. We're going to use another loop to download these images into a folder so that we can use them in our plot.

In [None]:
import urllib.request
import os
os.makedirs('pokemon', exist_ok=True) # creates a directory 
for i, row in pokedata.iterrows():   # iterates through each row of data
    img = 'pokemon/' + row["name"] + '.png'   # the new file name we'll use for each image
    if os.path.exists(img):                   # checking to make sure we haven't already downloaded this
        next
    else:
        urllib.request.urlretrieve(row["sprite"], 'pokemon/' + row["name"] + '.png') # downloading the image and storing in a local folder

Now we're going to replace the points with images of each Pokemon. We're also going to scale the images so that they roughly correspond to the relative size of each one. 

In [None]:
fig = px.scatter(
    pokedata,
    x="height",
    y="weight",
    hover_name="name",
    template="simple_white",
    width=800, height=600
)

fig.update_traces(marker_color="rgba(0,0,0,0)") # make the original points fully transparent so only the images show up
for i, row in pokedata.iterrows():
    img = 'pokemon/' + row["name"] + '.png'
    fig.add_layout_image(
        dict(
            source=Image.open(img),
            xref="x",
            yref="y",
            xanchor="center",
            yanchor="middle",
            x=row["height"],
            y=row["weight"],
            sizex=row['height'],   # size each image by the pokemon's own height...
            sizey=row['weight'],   # ...and weight, in axis (data) units
            sizing="contain",
            opacity=0.8,
            layer="above"
        )
    )

# changing the axis ranges a bit
fig.update_layout(yaxis_range=[0, 1400],
                  xaxis_range=[0, 30]
                 )
# fig.write_html("pokemon_sizes.html")  # uncomment to save an interactive HTML version

fig.show()