In [1]:
import requests
import pandas as pd
import io

# APIs

An Application Programming Interface, or API, is a structured way to retrieve data from a website. Using an API is usually safer and easier than web scraping, since the data comes back in a documented, machine-readable format. Many organizations offer APIs, including:
- Government organizations ([US Government](https://www.data.gov/developers/apis))
- United States Census ([Census API](https://www.census.gov/data/developers/data-sets.html))
- Large companies ([Twitter API](https://developer.twitter.com/en/docs))
- News organizations ([NYT API](https://developer.nytimes.com/))
- And [many more](https://github.com/public-apis/public-apis)

If you search Google for `how to use an api in python`, you'll find many articles that walk through the process. Working with APIs is a well-documented skill and a useful one to have.

The CDC has a large number of datasets available at https://data.cdc.gov/, which can be accessed using their API.

Let's look at the COVID-19 Case Surveillance Public Use Data with Geography, located at https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data-with-Ge/n8mc-b4w4.

The COVID-19 case surveillance database includes patient-level data reported by U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates.

If you browse to this page, you should see a button in the upper right labeled "API". Click this button and then change the value in the bottom right from JSON to CSV. Copy the API Endpoint.

In [7]:
URL = 'https://data.cdc.gov/resource/n8mc-b4w4.csv'

To request data from this endpoint, we'll use the `requests` library.

In [8]:
response = requests.get(URL)

Once we have this response, we can check its status code. A status code of 200 indicates success.

In [9]:
response.status_code

200
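
Checking the status code by eye works in a notebook, but in a script it can be handy to fail loudly instead. The following is a minimal sketch, not part of the original walkthrough: `fetch_text` is a hypothetical helper name, and the 30-second timeout is an arbitrary assumption.

```python
import requests

URL = 'https://data.cdc.gov/resource/n8mc-b4w4.csv'


def fetch_text(url, params=None, timeout=30):
    """Return the response body as text, raising if the request failed.

    fetch_text is a hypothetical helper; the timeout value is an assumption.
    """
    response = requests.get(url, params=params, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx codes
    # and does nothing for successful responses such as 200.
    response.raise_for_status()
    return response.text
```

Calling `fetch_text(URL)` would then either return the CSV text or raise an exception that names the failing status code.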

To load the result into a _pandas_ DataFrame, we first wrap the response text with `StringIO` from the _io_ library, which gives `read_csv` a file-like object to read from.

In [10]:
covid = pd.read_csv(io.StringIO(response.text))
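
To see what `io.StringIO` is doing without calling the API, here is a small offline sketch. The `sample_text` string is made up for illustration and stands in for `response.text`; the zero-padded FIPS codes in it are an assumption about how the raw CSV looks. It also shows one optional choice, reading the FIPS columns as strings so that leading zeros survive.

```python
import io

import pandas as pd

# A made-up stand-in for response.text (assumption: codes are zero-padded).
sample_text = (
    "case_month,res_state,state_fips_code,res_county,county_fips_code\n"
    "2020-05,AR,05,GREENE,05055\n"
    "2020-11,KY,21,MUHLENBERG,21177\n"
)

# StringIO wraps the string in a file-like object that read_csv can read from.
default = pd.read_csv(io.StringIO(sample_text))

# Reading the FIPS columns as strings keeps the leading zero in '05055'.
as_text = pd.read_csv(
    io.StringIO(sample_text),
    dtype={'state_fips_code': str, 'county_fips_code': str},
)

print(default['county_fips_code'].tolist())
print(as_text['county_fips_code'].tolist())
```

With the default settings the codes are parsed as numbers, so `05055` becomes `5055`; with `dtype` set to `str` it stays `'05055'`.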

In [11]:
covid

Unnamed: 0,case_month,res_state,state_fips_code,res_county,county_fips_code,age_group,sex,race,ethnicity,case_positive_specimen,case_onset_interval,process,exposure_yn,current_status,symptom_status,hosp_yn,icu_yn,death_yn,underlying_conditions_yn
0,2020-12,AL,1,ST. CLAIR,26147.0,,,,,,,Missing,Missing,Probable Case,Missing,Missing,Missing,Missing,
1,2020-05,AR,5,GREENE,5055.0,0 - 17 years,,,,0.0,0.0,Missing,Missing,Laboratory-confirmed case,Symptomatic,No,Missing,,Yes
2,2020-11,OH,39,HENRY,39069.0,0 - 17 years,,,,,0.0,Clinical evaluation,Unknown,Laboratory-confirmed case,Symptomatic,No,Missing,,Yes
3,2020-11,KY,21,MUHLENBERG,21177.0,18 to 49 years,,,,,,Missing,Missing,Laboratory-confirmed case,Missing,No,Missing,No,
4,2020-05,NY,36,YATES,36123.0,50 to 64 years,,,,0.0,,Missing,Missing,Probable Case,Missing,Missing,Missing,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,2021-01,IL,17,SALINE,17165.0,0 - 17 years,Male,,,,0.0,Missing,Missing,Laboratory-confirmed case,Missing,Missing,Missing,Missing,
996,2021-01,IA,19,MARION,19125.0,18 to 49 years,Male,Missing,,,,Missing,Missing,Probable Case,Missing,Missing,Missing,Missing,
997,2021-04,WI,55,SAUK,55111.0,18 to 49 years,Male,,,,0.0,Missing,Missing,Laboratory-confirmed case,Symptomatic,Unknown,Missing,,
998,2021-06,VA,51,WYTHE,51197.0,18 to 49 years,Male,,,,0.0,Missing,Missing,Laboratory-confirmed case,Symptomatic,No,Missing,,


**Question:** What do you notice about the results?

This returned only 1000 results. It is common for APIs to limit the number of results returned.

Click on the API Docs button and let's see if we can figure out how to get more results.

If you click on the "Paging through Data" link, you'll find how to control this. It seems that by default, this API will only return 1000 results, but we can change this by using the `$limit` query parameter.

One way to include a query parameter is to pass it as a dictionary to the `params` argument of `.get()`.

In [12]:
params = {
    '$limit': 5000
}
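
If you are curious what `requests` does with the `params` dictionary, one way to look is to prepare the request without sending it. This is only an illustrative sketch: it uses `requests.Request(...).prepare()` to build the final URL offline, and the variable name `prepared` is arbitrary.

```python
from urllib.parse import parse_qs, urlsplit

import requests

URL = 'https://data.cdc.gov/resource/n8mc-b4w4.csv'
params = {'$limit': 5000}

# Build the request without sending it, so no network access is needed.
prepared = requests.Request('GET', URL, params=params).prepare()

# The dictionary has been encoded into the query string of the URL.
print(prepared.url)

# Decoding the query string recovers the parameter that was passed in.
print(parse_qs(urlsplit(prepared.url).query))
```

The printed URL is the endpoint followed by `?` and the encoded parameters, which is what you would otherwise have to type by hand.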

In [14]:
response = requests.get(URL, params = params)

covid = pd.read_csv(io.StringIO(response.text))
covid

Unnamed: 0,case_month,res_state,state_fips_code,res_county,county_fips_code,age_group,sex,race,ethnicity,case_positive_specimen,case_onset_interval,process,exposure_yn,current_status,symptom_status,hosp_yn,icu_yn,death_yn,underlying_conditions_yn
0,2020-12,AL,1,ST. CLAIR,26147.0,,,,,,,Missing,Missing,Probable Case,Missing,Missing,Missing,Missing,
1,2020-05,AR,5,GREENE,5055.0,0 - 17 years,,,,0.0,0.0,Missing,Missing,Laboratory-confirmed case,Symptomatic,No,Missing,,Yes
2,2020-11,OH,39,HENRY,39069.0,0 - 17 years,,,,,0.0,Clinical evaluation,Unknown,Laboratory-confirmed case,Symptomatic,No,Missing,,Yes
3,2020-11,KY,21,MUHLENBERG,21177.0,18 to 49 years,,,,,,Missing,Missing,Laboratory-confirmed case,Missing,No,Missing,No,
4,2020-05,NY,36,YATES,36123.0,50 to 64 years,,,,0.0,,Missing,Missing,Probable Case,Missing,Missing,Missing,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,2020-11,KS,20,SUMNER,20191.0,50 to 64 years,Female,,,,,Missing,Missing,Laboratory-confirmed case,Unknown,Unknown,Unknown,No,
4996,2020-10,WI,55,WAUPACA,55135.0,50 to 64 years,Female,,,,0.0,Missing,Missing,Laboratory-confirmed case,Symptomatic,Unknown,Missing,Unknown,
4997,2021-04,TN,47,MCMINN,47107.0,65+ years,Female,,,,,Missing,Missing,Laboratory-confirmed case,Missing,Missing,Missing,,
4998,2021-01,NE,31,DODGE,31053.0,0 - 17 years,Male,,,,0.0,Missing,Missing,Laboratory-confirmed case,Symptomatic,Missing,Missing,Missing,
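
Raising `$limit` only goes so far for a very large dataset. The "Paging through Data" page also describes an `$offset` parameter, which can be combined with `$limit` to request the data one page at a time. The sketch below only builds the parameter dictionaries and makes no requests; `page_params` is a hypothetical helper, and ordering by `:id` is an assumption based on the general advice to keep a stable sort order while paging.

```python
def page_params(total_rows, page_size=1000):
    """Yield one params dictionary per page needed to cover total_rows.

    page_params is a hypothetical helper written for illustration.
    """
    for offset in range(0, total_rows, page_size):
        yield {
            # Never ask for more rows than remain.
            '$limit': min(page_size, total_rows - offset),
            # $offset skips the rows already covered by earlier pages.
            '$offset': offset,
            # Assumption: a stable sort order keeps pages from overlapping.
            '$order': ':id',
        }


pages = list(page_params(2500, page_size=1000))
for page in pages:
    print(page)
```

Each dictionary could then be passed as `params` to `requests.get`, with the resulting DataFrames combined using `pd.concat`.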


What if we don't care about the whole country and only want to get the results for Tennessee?

To accomplish this, we can filter on the `res_state` column by adding it as a query parameter alongside `$limit`.

In [17]:
params = {
    '$limit': 5000,
    'res_state': 'TN'
}
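
More than one column can be filtered at once by adding more entries to the dictionary; with this style of filter, a row has to match all of them. The sketch below is illustrative only: `make_params` is a hypothetical helper, and the `case_month` value is just an example taken from the column shown in the output above. It uses `urlencode` from the standard library to display the query string without making a request.

```python
from urllib.parse import urlencode

URL = 'https://data.cdc.gov/resource/n8mc-b4w4.csv'


def make_params(limit=1000, **filters):
    """Combine a row limit with column=value equality filters.

    make_params is a hypothetical helper written for illustration.
    """
    params = {'$limit': limit}
    params.update(filters)
    return params


# Illustrative example: Tennessee cases from a single month.
params = make_params(limit=5000, res_state='TN', case_month='2020-12')

# urlencode shows the query string these parameters correspond to.
query_string = urlencode(params)
print(URL + '?' + query_string)
```

The resulting `params` dictionary can be passed to `requests.get` exactly like the one in the cell above.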

In [18]:
response = requests.get(URL, params = params)

covid = pd.read_csv(io.StringIO(response.text))
covid

Unnamed: 0,case_month,res_state,state_fips_code,res_county,county_fips_code,age_group,sex,race,ethnicity,case_positive_specimen,case_onset_interval,process,exposure_yn,current_status,symptom_status,hosp_yn,icu_yn,death_yn,underlying_conditions_yn
0,2020-07,TN,47,LAUDERDALE,47097,18 to 49 years,Female,,,,0.0,Missing,Missing,Laboratory-confirmed case,Symptomatic,Missing,Missing,Missing,
1,2020-12,TN,47,MCNAIRY,47109,18 to 49 years,Female,Unknown,Missing,,,Missing,Missing,Laboratory-confirmed case,Missing,Missing,Missing,Missing,
2,2020-12,TN,47,FAYETTE,47047,50 to 64 years,Male,Missing,Missing,,,Missing,Missing,Laboratory-confirmed case,Missing,Missing,Missing,Missing,
3,2020-06,TN,47,SEVIER,47155,0 - 17 years,Female,White,Hispanic/Latino,,0.0,Missing,Missing,Laboratory-confirmed case,Symptomatic,No,No,No,
4,2020-12,TN,47,ROBERTSON,47147,18 to 49 years,Female,White,Hispanic/Latino,,,Missing,Missing,Laboratory-confirmed case,Symptomatic,No,No,,Yes
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,2020-12,TN,47,RHEA,47143,50 to 64 years,Male,,,,,Missing,Missing,Laboratory-confirmed case,Missing,Missing,Missing,Missing,
4996,2020-12,TN,47,MAURY,47119,18 to 49 years,Male,Multiple/Other,Missing,,,Missing,Missing,Laboratory-confirmed case,Missing,Missing,Missing,Missing,
4997,2020-12,TN,47,WARREN,47177,65+ years,Male,,,,,Missing,Missing,Laboratory-confirmed case,Symptomatic,No,Missing,Missing,Yes
4998,2020-12,TN,47,COFFEE,47031,18 to 49 years,Female,White,Missing,,,Missing,Missing,Laboratory-confirmed case,Missing,Missing,Missing,Missing,


**Your Turn:** Use the API to retrieve all cases that occurred in Davidson County, TN in which the patient was hospitalized. In what percentage of those cases did the patient die?