# API and HTTP requests

APIs (Application Programming Interfaces) are systems that allow computer programs to interact with one another. Many organizations that maintain large databases on the internet also provide an API that lets users retrieve, post, or modify data by sending queries programmatically. For instance, you can get a list of current members of Congress by directing your web browser to <a href="https://www.congress.gov/search?q=%7B%22congress%22%3A%5B%22118%22%5D%2C%22source%22%3A%22members%22%7D">Congress.gov</a> and using the search bar, but you could also write a Python program that queries <a href="https://github.com/LibraryOfCongress/api.congress.gov/blob/main/Documentation/MemberEndpoint.md">the Congress.gov API</a> and retrieves the same information. Learning how to navigate these systems makes it much easier to collect and analyze data from the web.

We'll use the <a href="https://pypi.org/project/requests/">Requests</a> library to interact with Web-based APIs using HTTP methods.

```python
from requests import get  # "get" requests can retrieve data from a server

# data manipulation
import pandas as pd
import numpy as np
```

## How does the requests package work?
We'll start with a simple example: using an API to get information about the International Space Station (ISS), such as its current location and the people currently aboard. Information about this API can be found here: http://open-notify.org

Note: The documentation also provides Python code examples. We will be using slightly different code, but their code should work too! There are multiple modules you can use to access APIs, and we use just one of them here. Feel free to look at the code they provide and see if you can figure out what is going on.


## Using the Open Notify ISS API
To access the API, we use the `get` function from the requests library. To tell Python what to access, we need to specify the URL of the API endpoint.

## Making a Request
Asking a website or web service for information is called making a request, and that is exactly what the requests library is designed to do.

### Step 1. Specify the URL

```python
url = "http://api.open-notify.org/iss-now.json"
```

(You could also paste this URL directly into your browser's address bar and get a response. Your browser sends a simple GET request every time you click a link.)

### Step 2. Get the Response

Now let's get the response from the URL defined above using the requests library. We'll use the HTTP `get` method to retrieve data. Note that as soon as we call `get()`, we're sending a request to the server and (hopefully) retrieving data. This will be important to keep in mind when using APIs that limit how many requests we can make.

```python
# Send a GET request to the URL and store the response
# (get is a function from requests)
r = get(url)

r.url
```

```
'http://api.open-notify.org/iss-now.json'
```
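Because every call to `get()` sends a new request, code that loops over many URLs can quickly run into an API's request limits. One common courtesy is to pause between requests. The sketch below is a hypothetical helper, `fetch_politely` (not part of requests); the one-second default delay is an arbitrary assumption, so check each API's documentation for its actual limits.

```python
import time
from typing import Callable, Iterable, List


def fetch_politely(urls: Iterable[str], fetch: Callable[[str], object], delay: float = 1.0) -> List[object]:
    """Call fetch(url) for each URL in order, sleeping `delay` seconds between calls.

    Hypothetical helper: pass requests' `get` (or any function that takes a URL) as `fetch`.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # pause before every request except the first
        results.append(fetch(url))
    return results
```

In this notebook you could call it as `fetch_politely([url, url], get)` to request the ISS position twice, one second apart.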

### Step 3. Check the Response Code
Before you do anything with the response in Python, it's a good idea to check its status code.

The following are some common response codes to keep in mind:

`200` (OK) - the request succeeded; the results will be in the body of the response

`400` (Bad Request) - the server could not process the request, typically because the query parameters are malformed or a specified field or value is not valid; many APIs include an error message in the response explaining what went wrong

`404` (Not Found) - the server could not find the requested resource, often because of a typo in the URL

`500` (Internal Server Error) - something went wrong on the server while processing the query; the problem is not necessarily with your request

If we get a `200`, we're ready to start checking out the results of our query.
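These codes follow a general pattern: 2xx codes mean success, 4xx codes usually point to a problem with the request, and 5xx codes to a problem on the server. Python's standard library already knows the standard reason phrases through `http.HTTPStatus`; the sketch below uses it in a hypothetical helper, `describe_status` (not part of requests), to label a code.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a label such as '200 OK (success)' for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. 'Not Found'
    except ValueError:
        phrase = "Unknown status"  # code is not a registered HTTP status
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirect"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase} ({category})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
```

With requests itself, `r.ok` is `True` for any status code below 400, and `r.raise_for_status()` raises a `requests.HTTPError` for 4xx and 5xx responses.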


```python
r.status_code  # Check the status code
```

```
200
```

### Step 4. Get the Content (and Maybe Parse It)

Web browsers use HTML to display information in an attractive format that's easy for humans to read, but when we're working with an API, all that extra formatting is wasted space, so we'll usually get data in a more computer-friendly format like <a href="https://www.json.org/json-en.html">JSON (JavaScript Object Notation)</a>.

JSON data consists of key-value pairs: each key is a string, and each value can be a string, a number, a list, or another nested set of key-value pairs. Here is the raw content of our response:

```python
print(r.content)
```

```
b'{"message": "success", "timestamp": 1725906108, "iss_position": {"latitude": "12.0068", "longitude": "161.9452"}}'
```
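Turning these bytes into Python objects is the job of a JSON parser. As a rough sketch of what the response's `json()` method does for us, the standard library's `json.loads` can parse the same bytes directly (copied here from the output above, so no request is needed):

```python
import json

# Raw bytes copied from the r.content output above
raw = b'{"message": "success", "timestamp": 1725906108, "iss_position": {"latitude": "12.0068", "longitude": "161.9452"}}'

data = json.loads(raw)  # json.loads accepts bytes as well as str (Python 3.6+)

print(data["message"])    # success
print(data["timestamp"])  # 1725906108
# Note that the coordinates come back as strings, not numbers
print(repr(data["iss_position"]["latitude"]))  # '12.0068'
```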


Despite the name, JSON is language independent, and the response's `json()` method converts it into Python objects:

```python
json_result = r.json()
```

```python
type(json_result)  # what kind of data is this?
```

```
dict
```

```python
json_result  # view the data
```

```
{'message': 'success',
 'timestamp': 1725906108,
 'iss_position': {'latitude': '12.0068', 'longitude': '161.9452'}}
```

Here, this API gives us a timestamp, a message indicating whether the request succeeded, and the position of the ISS. This isn't a very sophisticated API: it only reports where the ISS is at the moment you send the request.

<b style="color:red;">Question 1: What is the length of json_result? What type of object is the value associated with the key iss_position?</b>

Sometimes, it can be hard to see exactly what is in the response. It might be useful to look at the keys to see what data we actually want.

```python
json_result.keys()  # View JSON keys
```

```
dict_keys(['message', 'timestamp', 'iss_position'])
```

Note that we have three keys: `message`, `timestamp`, and `iss_position`. The information that we really want is under the `iss_position` key, so let's take a look at it.

```python
json_result['iss_position']
```

```
{'latitude': '12.0068', 'longitude': '161.9452'}
```
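The values we just pulled out are not quite ready for arithmetic: the coordinates are strings and the timestamp is a raw number. The sketch below is a hypothetical helper, `parse_iss_now`, that converts both. It assumes that `timestamp` is a Unix timestamp (seconds since 1970-01-01 UTC), which is consistent with the value shown, and it uses the sample response above so it runs offline.

```python
from datetime import datetime, timezone


def parse_iss_now(payload: dict) -> tuple:
    """Return (UTC datetime, latitude, longitude) from an iss-now.json payload (hypothetical helper)."""
    # Assumption: timestamp is Unix epoch seconds; convert it to a timezone-aware UTC datetime
    when = datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc)
    position = payload["iss_position"]
    # Coordinates arrive as strings, so convert them to floats
    return when, float(position["latitude"]), float(position["longitude"])


sample = {'message': 'success',
          'timestamp': 1725906108,
          'iss_position': {'latitude': '12.0068', 'longitude': '161.9452'}}

when, lat, lon = parse_iss_now(sample)
print(when.isoformat(), lat, lon)  # 2024-09-09T18:21:48+00:00 12.0068 161.9452
```

With a live response you would call `parse_iss_now(r.json())` instead of using `sample`.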

## Adding Queries to the API Request
The ISS API is a very simple example of an API. There is only one thing we can get from it: the position of the ISS at the moment we send the request. Usually, we also add query parameters that specify exactly what data we want. For example, if you wanted data about the US, there are many variables you might be interested in, over different time frames. These are things you might need to specify to get the data you need.

Consider the Data USA API, which is described here: https://datausa.io/about/api/. You can use this API to get various statistics about the US, broken down by categories like State or Year. Let's look at an example of constructing an API query.

```python
datausa_base_url = 'https://datausa.io/api/data'
parameters = {'drilldowns': 'State', 'measures': 'Population', 'year': 2020}
datausa_response = get(datausa_base_url, params=parameters)
datausa_response.status_code
```

```
200
```

```python
datausa_response.url
```

```
'https://datausa.io/api/data?drilldowns=State&measures=Population&year=2020'
```

(Here again, you might try pasting this URL into your browser and looking at the result. The `get` function takes our base URL and query parameters and builds the full URL for us.)

Here, we start with the base URL and add the queries that we want to include. The way to define the parameters to get the data you want should generally be described within the API documentation (the Data USA website isn't the best about this, but they do include some examples to help you see how this might be constructed). In our example above, we want the `Population` of each state in the year 2020. Looking at the documentation from the Data USA site, we can see that we should specify a drilldown of `State`, a measure of `Population`, and a `year` of 2020. This helps us to construct the final URL which retrieves the data we want.
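The URL-building step that `get` performs with `params` can be reproduced with the standard library's `urllib.parse.urlencode`, which is handy for checking a URL before sending any request. A minimal sketch using the same parameters as above:

```python
from urllib.parse import urlencode

datausa_base_url = 'https://datausa.io/api/data'
parameters = {'drilldowns': 'State', 'measures': 'Population', 'year': 2020}

# urlencode turns the dict into "key=value" pairs joined by "&" (numbers become strings)
query_string = urlencode(parameters)
full_url = datausa_base_url + '?' + query_string
print(full_url)  # https://datausa.io/api/data?drilldowns=State&measures=Population&year=2020
```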

If you navigate to that URL, you should see the JSON of the response we get from it.

```python
pop_by_state_2020 = datausa_response.json()
```

<b style="color:red;"> Question 2: What are the keys in pop_by_state_2020? What are the types of objects for the values for those keys? What is the source of the data that we pulled?</b>

<b style="color:red;">Question 3: Assign the population of Alabama to alabama_pop. Do not hard-code anything (that is, retrieve the information from pop_by_state_2020 instead of typing out the number after reading it).
</b>


## Exploring the Data
Looking through the various tools within the Data USA website, you should be able to find other drilldowns, measures, and characteristics you can request data about. For example, to get the total population in 2020 broken down by citizenship status, we can use the drilldown of Citizenship Status with a measure of Total Population and a year of 2020.


```python
parameters = {'drilldowns': 'Citizenship Status', 'measure': 'Total Population', "Year": 2020}

response = get(datausa_base_url, params=parameters)
print(response.url)
response.json()
```

```
https://datausa.io/api/data?drilldowns=Citizenship+Status&measure=Total+Population&Year=2020

{'data': [{'ID Citizenship Status': 5,
   'Citizenship Status': 'Not a citizen of the U.S.',
   'ID Year': 2020,
   'Year': '2020',
   'Total Population': 21981211},
  {'ID Citizenship Status': 1,
   'Citizenship Status': 'Born in the U.S.',
   'ID Year': 2020,
   'Year': '2020',
   'Total Population': 277039429},
  {'ID Citizenship Status': 2,
   'Citizenship Status': 'Born in Puerto Rico, Guam, the U.S. Virgin Islands, or the Northern Marianas',
   'ID Year': 2020,
   'Year': '2020',
   'Total Population': 5036174},
  {'ID Citizenship Status': 3,
   'Citizenship Status': 'Born abroad of American parent(s)',
   'ID Year': 2020,
   'Year': '2020',
   'Total Population': 3250155},
  {'ID Citizenship Status': 4,
   'Citizenship Status': 'U.S. citizen by naturalization',
   'ID Year': 2020,
   'Year': '2020',
   'Total Population': 22517983}],
 'source': [{'measures': ['Total Population'],
   'annotations': {'source_name': 'Census Bureau',
    'source_description': 'The American Community
```

Passing the full URL directly to `get`, without a separate `params` dictionary, returns the same data as the previous cell:

```python
response_with_full_url = get('https://datausa.io/api/data?drilldowns=Citizenship+Status&measure=Total+Population&Year=2020')
response_with_full_url.json()  # same JSON output as the previous cell
```
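The `data` key holds a list of dictionaries, one per citizenship category, so ordinary Python tools work on it directly. As a sketch, the block below totals the populations and prints each category's share. It uses a trimmed copy of the records shown above (only the two fields we need) so it runs without contacting the API; with a live response you would start from `response.json()['data']` instead.

```python
# Trimmed copy of the 'data' records returned above (ID and Year fields omitted)
records = [
    {'Citizenship Status': 'Not a citizen of the U.S.', 'Total Population': 21981211},
    {'Citizenship Status': 'Born in the U.S.', 'Total Population': 277039429},
    {'Citizenship Status': 'Born in Puerto Rico, Guam, the U.S. Virgin Islands, or the Northern Marianas',
     'Total Population': 5036174},
    {'Citizenship Status': 'Born abroad of American parent(s)', 'Total Population': 3250155},
    {'Citizenship Status': 'U.S. citizen by naturalization', 'Total Population': 22517983},
]

# Add up the population across all categories
total = sum(rec['Total Population'] for rec in records)
print(total)  # 329824952

# Print each category's share of the total as a percentage
for rec in records:
    share = rec['Total Population'] / total
    print(f"{rec['Citizenship Status']}: {share:.1%}")
```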

<b style="color:red;">Question 4: Pull from the Data USA API to get the breakdown of the number of people by Gender in the US in the year 2020.</b>



```python
# Hint: You can use Gender for this.
```


You can also pass multiple values for a single parameter by putting them in a list. Take a look at the URL to see what happens when you do this: requests repeats the parameter name once for each value. You should be able to see how the URL is constructed, as well as the data that comes back from this request.

```python
parameters = {'drilldowns': ['State', 'Citizenship Status'], 'measures': 'Total Population', "Year": 2020}

citizenship_by_state_response = get(datausa_base_url, params=parameters)
print(citizenship_by_state_response.url)
citizenship_by_state = citizenship_by_state_response.json()['data']
```

```
https://datausa.io/api/data?drilldowns=State&drilldowns=Citizenship+Status&measures=Total+Population&Year=2020
```
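To see where the repeated `drilldowns` keys come from, the standard library's `urllib.parse.urlencode` can reproduce the same query string offline. With `doseq=True`, each element of a list becomes its own `key=value` pair, matching the URL printed above. This is a sketch of the encoding behavior, not of requests' internal code.

```python
from urllib.parse import urlencode

parameters = {'drilldowns': ['State', 'Citizenship Status'],
              'measures': 'Total Population',
              'Year': 2020}

# doseq=True expands each list element into its own key=value pair;
# spaces are encoded as "+"
query_string = urlencode(parameters, doseq=True)
print(query_string)
# drilldowns=State&drilldowns=Citizenship+Status&measures=Total+Population&Year=2020
```

Without `doseq=True`, `urlencode` would encode the list's string representation (brackets and quotes included), which is rarely what an API expects.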


<b style="color:red;">Question 5: What type of object is citizenship_by_state? What is the length of citizenship_by_state? What are the types of objects that are inside citizenship_by_state?</b>


## Dictionary Comprehension
Dictionary comprehension is very similar to list comprehension, except that it creates a dictionary instead of a list. The format is the same, except that it uses curly braces (`{}`) and includes one expression for the keys and one for the values, separated by a colon.

Recall that a `for` loop looks like this:

```
for i in <iterable>:
    <some expression>
```

A dictionary comprehension looks something like this:

```
{<key expression>: <value expression> for i in <iterable>}
```


For example:

```python
{x: x * 2 for x in range(10)}
```

```
{0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14, 8: 16, 9: 18}
```
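To see how the comprehension maps onto a loop, the sketch below builds the same dictionary both ways, then shows two common variations: an `if` clause that filters items, and `zip` to pair up keys and values. The lists `names` and `values` are illustrative placeholders.

```python
# The comprehension from the cell above
doubled = {x: x * 2 for x in range(10)}

# The same dictionary built with an ordinary for loop
doubled_loop = {}
for x in range(10):
    doubled_loop[x] = x * 2  # key expression: x, value expression: x * 2

print(doubled == doubled_loop)  # True

# An optional `if` clause keeps only some items (here, even keys)
evens_doubled = {x: x * 2 for x in range(10) if x % 2 == 0}
print(evens_doubled)  # {0: 0, 2: 4, 4: 8, 6: 12, 8: 16}

# zip pairs up two lists so one supplies the keys and the other the values
names = ['a', 'b', 'c']
values = [1, 2, 3]
paired = {name: value for name, value in zip(names, values)}
print(paired)  # {'a': 1, 'b': 2, 'c': 3}
```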

<b style="color:red;">Question 6: Create a dictionary called noncitizens that contains the number of non-citizens in each state. The key should contain the state name and the value should be the number of non-citizens.</b>

<b style="color:red;">Question 7: In 2015, what was the average wage by race for male and female workers? Create two dictionaries, one called `male_wages` and one called `female_wages`, with keys representing race category and values representing the average wage for people in that group.</b>




APIs are useful in part because they are usually documented, often with example code, since the data provider wants others to use the data. However, the documentation can sometimes be confusing or misleading, and building the URL can be tricky or may not follow the conventions you are used to. Feel free to build the URL manually and navigate to it so that you can see the JSON response before using it in Python. Sometimes, the best way to check something is by trying it out in the browser!