# FBI Application Programming Interface (API) Module
In this tutorial, we will use Python to access the FBI's Application Programming Interface (API) and retrieve data from its servers for use in our research. APIs are powerful tools that many websites use to store data and provide access to it. A growing share of online data is distributed through APIs, so it is important to understand their basic structure and the skills needed to use them.


## Module Outline
Step One: Get an API Key<br>
Step Two: Documentation<br>
Step Three: Initialize API Variables and Import Useful Python Libraries.<br>
Step Four: Find the Appropriate Syntax for our Agency of Interest<br>
Step Five: Convert Data<br>
Step Six: Use our new Knowledge of the FBI Data Structure to Retrieve Crime Data.<br>
Step Seven: Narrowing Down Our Search<br>
Step Eight: Visualization<br>
Step Nine: Saving our Data<br>

## Important Terminology
<b> Application Programming Interface (API):</b> Functions that allow for access to features or data of an operating system, applications, or online services. <br>
<b> Data Scraping:</b> As used in this module, the process of using an API to import or download data from a website onto a local machine. <br>
<b> API Key:</b> APIs require an access key that behaves like a password to identify users who are accessing the data. <b> Do not</b> share your API key with anyone, nor make it accessible or visible in a public presentation or Jupyter Notebook. <br>
<b> Uniform Crime Reporting (UCR):</b> The FBI's nationwide data collection and information system that reports crime statistics.<br>
<b> Crime Data Explorer (CDE):</b> The public facing database for the UCR.<br>
<b> National Incident-Based Reporting System (NIBRS):</b> A newer, incident-level data collection system that the FBI is implementing nationwide. So far only some municipalities have adopted it, but the goal is to have all agencies reporting NIBRS statistics by 2021.<br>
<b> Summary Reporting System (SRS):</b> The current FBI standard for agencies reporting crime statistics. SRS is being replaced by the more detailed NIBRS.<br>


***

## Contact Information
Garrett Morrow <br>
Digital Teaching Integration Research Fellow <br>
PhD Student, Political Science <br>
morrow.g [at] husky [dot] neu [dot] edu
***

## Step One: Get an API Key
First, go to https://crime-data-explorer.fr.cloud.gov/api and click on where it says 'Get an API key here.' You will be redirected to the signup to get a key, and it will then be emailed to you.

## Step Two: Documentation
APIs share common ideas, but each one is organized differently and requires its own syntax. You will need to find and consult the documentation for whichever API you want to use. For the FBI API, scroll down the page from Step One to see the API's endpoint controllers.

The source code for the FBI's Crime Data Explorer frontend is available on GitHub: https://github.com/fbi-cde/crime-data-frontend.

For more information on the FBI's Uniform Crime Reporting (UCR) Program, see here: https://www.fbi.gov/services/cjis/ucr.

Note the request format that the FBI API documentation gives you. Our code and variables will need to follow this format.

Request format: https://api.usa.gov/crime/fbi/sapi/{desired_endpoint}?api_key=<API_KEY>
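The request format can be assembled with Python's standard library. The sketch below is a minimal illustration, assuming the base URL used in this module and a placeholder key; `build_url` is a hypothetical helper, not part of the FBI API. `urllib.parse.urlencode` adds the `api_key=` parameter name and escapes any special characters in the key.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.usa.gov/crime/fbi/sapi"


def build_url(endpoint, api_key, base_url=BASE_URL):
    """Hypothetical helper: join base URL and endpoint, then append ?api_key=<KEY>."""
    # Drop any trailing '?' and stray slashes so the pieces join cleanly.
    endpoint = "/" + endpoint.rstrip("?").strip("/")
    query = urlencode({"api_key": api_key})
    return f"{base_url.rstrip('/')}{endpoint}?{query}"


print(build_url("/api/agencies", "YOUR_KEY"))
# https://api.usa.gov/crime/fbi/sapi/api/agencies?api_key=YOUR_KEY
```

The result could be passed straight to `requests.get`, for example `requests.get(build_url('/api/agencies', key))`. Alternatively, `requests` can build the query string itself through its `params` argument: `requests.get(base_url + '/api/agencies', params={'api_key': key})`.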

## Step Three: Initialize API Variables and Import Useful Python Libraries.
In Step Three, we initialize the API variables that later code will reuse, which makes our searches easier, and import the Python libraries we will need.

In [None]:
# The first variable is the API key we obtained in Step One. It should be a string.
# I have left it empty so you can paste your key between the quotes.
# Do not share a notebook that still contains your key.

key = ""

# The second variable is the base URL for requests. It comes from the documentation and is also a string.
base_url = 'https://api.usa.gov/crime/fbi/sapi'

# Next we import the libraries we need for API requests and data analysis.
import requests  # Sends the HTTP GET requests we use to request data from the API server.
import json  # Parses JSON, the text format APIs usually return, into Python dictionaries and lists.
import pandas  # Provides DataFrames, spreadsheet-like tables for analysis.
import matplotlib.pyplot as plt  # Matplotlib's pyplot module is our visualization library.
import seaborn as sns  # Seaborn is an alternative plotting library built on matplotlib.
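Because the glossary warns against exposing your API key in a shared notebook, one option is to keep the key out of the notebook entirely and read it from an environment variable. This is a sketch under assumptions: the variable name `FBI_API_KEY` and the helper `load_api_key` are hypothetical choices, and you would set the variable in your shell before starting Jupyter.

```python
import os


def load_api_key(var_name="FBI_API_KEY"):
    """Hypothetical helper: return the API key stored in an environment variable."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key to the server.
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key


# Usage in the notebook (uncomment once the variable is set):
# key = load_api_key()
```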

## Step Four: Find the Appropriate Syntax for our Agency of Interest
First we use the lookups controller to find the law enforcement agency that interests us, so that we can reuse its identifier in later requests. We already have the base URL, so we can now create URL extensions to add to it.

This is the most difficult step of using an API, because each API uses a different syntax and organizes its data differently. You may have to start with an endpoint that supports broad searches or lookups in order to find identifying data (for the FBI, the Originating Agency Identifier, or ORI). Once you have that identifier, you can use the other endpoints to run more detailed searches for specific data.

In [None]:
# This is the endpoint for looking up agencies; from the documentation:
# /api/agencies  Returns List of Agencies utilized by CDE Endpoint
# The '?' ends the endpoint path and starts the query string, where the key is passed as api_key=<KEY>.

search_url = "/api/agencies?"

In [None]:
# Next, we perform the actual search. The key must be sent as the api_key parameter.

agency_search = requests.get(base_url + search_url + "api_key=" + key)

In [None]:
# If everything went ok, after our search, we should receive: <Response [200]>
# We will need to convert the response into something usable.

agency_search

## Step Five: Convert Data
The server returns its data as JSON text. Next we need to parse that text into Python objects we can work with.

In [None]:
# First we can view the response as raw text, but in this form it is hard to work with.
agencyinfo = agency_search.text
agencyinfo[:1000]

In [None]:
# Instead, we parse the text with json.loads, which turns JSON objects into Python dictionaries.
agencydata = json.loads(agencyinfo)
agencydata
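To see what `json.loads` does without calling the server, the sketch below parses a small JSON string shaped like the agency lookup (one state, one agency, with fields trimmed from the Boston record shown later in this module). The string is illustrative, not a real API response. With `requests`, calling `agency_search.json()` performs the same parsing in one step.

```python
import json

# Illustrative JSON text shaped like the agency lookup response (trimmed).
sample_text = """
{"MA": {"MA0130100": {"ori": "MA0130100",
                      "agency_name": "Boston Police Department",
                      "nibrs": false,
                      "nibrs_start_date": null}}}
"""

data = json.loads(sample_text)  # JSON objects become Python dicts
print(type(data))               # <class 'dict'>
print(list(data.keys()))        # ['MA']
print(data["MA"]["MA0130100"]["nibrs"])             # False (JSON false becomes Python False)
print(data["MA"]["MA0130100"]["nibrs_start_date"])  # None (JSON null becomes Python None)
```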

In [None]:
# We need to narrow it down to find information on our specific area of interest.
# In this case, Massachusetts.
# We can use .keys() to explore the structure of our lookup data.

agencydata.keys()

In [None]:
# We can narrow it down further so we can see the structure of the MA data.

agencydata['MA']

We can now clearly see the structure of our data, and we can use it with the API's other endpoints to retrieve crime data about our area of interest.

The data is structured like so:

'MA0130100':<br> {'ori': 'MA0130100',<br>
   'agency_name': 'Boston Police Department',<br>
   'agency_type_name': 'City',<br>
   'state_name': 'Massachusetts',<br>
   'state_abbr': 'MA',<br>
   'division_name': 'New England',<br>
   'region_name': 'Northeast',<br>
   'region_desc': 'Region I',<br>
   'county_name': 'SUFFOLK',<br>
   'nibrs': False,<br>
   'latitude': 42.33196,<br>
   'longitude': -71.020173,<br>
   'nibrs_start_date': None}<br>
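Once you know this structure, you do not have to scan the output by eye to find an ORI. The helper below is a hypothetical sketch that searches one state's dictionary (for example `agencydata['MA']`) for agency names containing a search term; it assumes each record has the `ori` and `agency_name` fields shown above, and it tolerates records whose name is missing.

```python
def find_ori(state_agencies, name_fragment):
    """Hypothetical helper: return (ori, agency_name) pairs whose name contains name_fragment."""
    fragment = name_fragment.lower()
    matches = []
    for record in state_agencies.values():
        name = record.get("agency_name") or ""  # guard against missing or null names
        if fragment in name.lower():
            matches.append((record["ori"], name))
    return matches


# A one-agency sample built from the Boston record shown above.
sample_ma = {
    "MA0130100": {"ori": "MA0130100", "agency_name": "Boston Police Department", "nibrs": False},
}
print(find_ori(sample_ma, "boston"))  # [('MA0130100', 'Boston Police Department')]
```

With the real lookup data, the call would be `find_ori(agencydata['MA'], 'boston')`.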
   
As we can see, the Boston Police Department does not use the National Incident-Based Reporting System (NIBRS), so we will have to use data from the legacy Summary Reporting System (SRS), which reports aggregate counts. While the FBI plans to have all crime data in NIBRS format by 2021, some locales, particularly big cities, are still training and adapting.
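The `nibrs` field tells you which family of endpoints an agency's data will come from. As a sketch, assuming every record carries the boolean `nibrs` flag shown above, the hypothetical helper below lists which agencies in a state report through NIBRS and which still report only through SRS.

```python
def split_by_reporting(state_agencies):
    """Hypothetical helper: separate ORIs into NIBRS reporters and SRS-only reporters."""
    nibrs, srs_only = [], []
    for ori, record in state_agencies.items():
        # Treat a missing or null flag as SRS-only, the conservative choice.
        if record.get("nibrs"):
            nibrs.append(ori)
        else:
            srs_only.append(ori)
    return sorted(nibrs), sorted(srs_only)


# Usage with the real lookup data:
# nibrs_oris, srs_oris = split_by_reporting(agencydata['MA'])
```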

## Step Six: Use our new Knowledge of the FBI Data Structure to Retrieve Crime Data. 
The ORI number for the Boston Police Department is MA0130100, and the department reports through SRS rather than NIBRS, so we will combine this number with the SRS endpoints listed in the FBI's API documentation to retrieve crime statistics. For SRS data, we can use the following endpoints:<br>

`/api/summarized/agencies/{ori}/offenses/{since}/{until}`: Agency level SRS Crime Data Endpoint<br>
`/api/summarized/agencies/{ori}/{offense}/{since}/{until}`: Agency level SRS Crime Data Endpoint by Offense<br>
`/api/summarized/state/{stateAbbr}/{offense}/{since}/{until}`: State level SRS Crime Data Endpoint by Offense<br>

We will use the first endpoint to retrieve Boston's SRS data. 2018 data is not yet available, so we will use the date range 2000 to 2017; if we use 2018 as the end year, we will receive an error.
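Filling in the `{ori}`, `{offense}`, `{since}`, and `{until}` placeholders by hand is error-prone. The sketch below builds the two agency-level paths from variables. `srs_agency_path` is a hypothetical helper, and the `latest_year=2017` default only encodes this module's observation that 2018 data was not yet available, so adjust it as newer years are published. The returned path has no trailing `?`, so you would append `"?api_key=" + key` when making the request.

```python
def srs_agency_path(ori, since, until, offense=None, latest_year=2017):
    """Hypothetical helper: build an agency-level SRS endpoint path."""
    if since > until:
        raise ValueError("since must not be later than until")
    if until > latest_year:
        raise ValueError(f"data is only available through {latest_year}")
    # No offense: the all-offenses endpoint. With an offense: the by-offense endpoint.
    segment = "offenses" if offense is None else offense
    return f"/api/summarized/agencies/{ori}/{segment}/{since}/{until}"


print(srs_agency_path("MA0130100", 2000, 2017))
# /api/summarized/agencies/MA0130100/offenses/2000/2017
print(srs_agency_path("MA0130100", 2000, 2017, offense="burglary"))
# /api/summarized/agencies/MA0130100/burglary/2000/2017
```

A request would then look like `requests.get(base_url + srs_agency_path("MA0130100", 2000, 2017) + "?api_key=" + key)`.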

In [None]:
# Similar to our search URL above, we will create a new URL for our SRS data.
# We store the ORI and the start and end years in variables so they are easy to change.

ori = "MA0130100"  # Boston Police Department
since = 2000
until = 2017
boston_url = f"/api/summarized/agencies/{ori}/offenses/{since}/{until}?"


In [None]:
# Again, we combine the base URL, the endpoint, and the key (as the api_key parameter) to get our data.

boston_search = requests.get(base_url + boston_url + "api_key=" + key)

In [None]:
# If everything went ok, after our search, we should receive: <Response [200]>

boston_search

In [None]:
# Next we need to convert our data.
# First we convert the data into text.

bostoninfo = boston_search.text

# Next we convert the text into JSON.
bostondata = json.loads(bostoninfo)
bostondata

In [None]:
# Again, we explore the structure of our data with the .keys() method.

bostondata.keys()

In [None]:
# The 'results' key holds the list of records, so we can turn it into a pandas DataFrame.

bostondf = pandas.DataFrame(bostondata['results'])

In [None]:
bostondf

As we can see, the FBI divides crime statistics into different categories of offenses. Let's narrow our search to burglaries only.

## Step Seven: Narrowing Down Our Search


In [None]:
# First, we create our new, more-specific url - see the endpoint urls above.

bostonburgs_url = "/api/summarized/agencies/MA0130100/burglary/2000/2017?"

In [None]:
# Next we perform our search with our new url.

bostonburgs_search = requests.get(base_url + bostonburgs_url + "api_key=" + key)

In [None]:
# If everything went ok, after our search, we should receive: <Response [200]>

bostonburgs_search

In [None]:
# Next we need to convert our data.
# First we convert the data into text.

bostonburgsinfo = bostonburgs_search.text

# Next we convert the text into JSON.
bostonburgsdata = json.loads(bostonburgsinfo)
bostonburgsdata

In [None]:
# Again, we explore the structure of our data with the .keys() method.

bostonburgsdata.keys()

In [None]:
bostonburgsdf = pandas.DataFrame(bostonburgsdata['results'])

In [None]:
bostonburgsdf

## Step Eight: Visualization
This section gives a brief example of a simple visualization, first with pandas' built-in plotting (which draws with matplotlib's pyplot) and then with the seaborn alternative.

In [None]:
# We will visualize a bar graph to show change over time.
# figsize is passed to the plot call itself; setting rcParams after plotting does not resize an existing figure.

bostonburgsdf.plot(kind='bar', x='data_year', y='actual', figsize=(10, 5))  # Our plot function; size in inches
plt.show()

In [None]:
# Seaborn is a similar, but alternative, plotting library.

plt.figure(figsize=(10, 5))  # Create the figure first so the size takes effect
sns.barplot(x='data_year', y='actual', data=bostonburgsdf)
plt.title('Boston Burglaries Over Time')
plt.ylabel('Burglaries')
plt.xlabel('Year')
plt.show()
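A bar chart shows the trend; if you also want the numbers behind it, a short calculation on the `results` list gives the year-over-year percentage change. This is a sketch assuming one record per year, each with the `data_year` and `actual` fields used in the plots above; `year_over_year` is a hypothetical helper and the sample values are made up for illustration.

```python
def year_over_year(records):
    """Return (year, percent change from the previous year) pairs using 'data_year' and 'actual'."""
    rows = sorted(records, key=lambda r: r["data_year"])
    changes = []
    for prev, curr in zip(rows, rows[1:]):
        if prev["actual"]:  # skip years whose previous count is zero or missing
            pct = 100 * (curr["actual"] - prev["actual"]) / prev["actual"]
            changes.append((curr["data_year"], round(pct, 1)))
    return changes


# Made-up counts, not real FBI figures.
sample = [{"data_year": 2001, "actual": 90}, {"data_year": 2000, "actual": 100}]
print(year_over_year(sample))  # [(2001, -10.0)]
```

With the real data this would be `year_over_year(bostonburgsdata['results'])`. In pandas, `bostonburgsdf.sort_values('data_year')['actual'].pct_change()` gives the same changes as fractions rather than percentages.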

## Step Nine: Saving our Data
The final step will be to save our data. Since we have our data in a pandas dataframe, we can export this as a .csv file.

In [None]:
# The DataFrame.to_csv method saves our data as a .csv file; index=False leaves out the row numbers. Remove the # to run.

#bostonburgsdf.to_csv('Boston_burglaries.csv', index=False)
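A CSV holds the table, but it can also be worth keeping the parsed API response so you can rebuild the DataFrame later without making another request. This is a minimal sketch using the standard `json` module; `save_json`, `load_json`, and the filename are arbitrary illustrative choices.

```python
import json


def save_json(data, path):
    """Write parsed API data to disk as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


def load_json(path):
    """Read previously saved API data back into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Usage (remove the # to run):
# save_json(bostonburgsdata, 'Boston_burglaries.json')
# bostonburgsdf = pandas.DataFrame(load_json('Boston_burglaries.json')['results'])
```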