# Federal Bureau of Investigation - Crime Data Explorer API

The FBI Crime Data API is a read-only web service that returns JSON or CSV data. It is broadly organized around the data reporting systems the FBI UCR program uses and their related entities. Agencies submit data using one of two reporting formats -- the Summary Reporting System (SRS), or the National Incident Based Reporting System (NIBRS). SRS data is the legacy format that provides aggregated counts of the reported crime offenses known to law enforcement by location.

NIBRS is a newer format that provides an incident-based view of crime. It includes information about each offense, such as the time of day an incident occurred, the demographics of the offenders/victims, the known relationships between the offenders and victims, and many other details around how and where crime occurs. Neither format includes personally identifiable information (PII) about the offenders or victims. While many agencies submit SRS data, the FBI plans to transition all crime reporting to the NIBRS format by 2021.

Other UCR data collection systems made available by this API include:

- Summarized Agency Data
- NIBRS Counts
- Law Enforcement Employees Data
- State and Agency Participation Data

To obtain an API key: https://api.data.gov/signup/

The API was designed to provide as much information as possible in a usable format. However, the FBI still has some recommendations about how to interpret and display the data it provides. The FBI strongly advises against using this data to rank or compare states or other entities. The one exception is that it is appropriate to compare a city to its respective state, and that state to the national picture.

*We are only able to view national-level data, as we do not have security clearance and do not belong to an agency.*

The Crime Data Explorer Base API URL is https://api.usa.gov/crime/fbi/sapi/

## Bibliography

#### Useful links to assist in retrieving data:

State Abbreviations: 
https://www.ssa.gov/international/coc-docs/states.html



### CODE FOR DATA RETRIEVAL

#### Necessary libraries

In [9]:
import requests
import pandas as pd
import pprint as pp

#### Set URL for request & API key input

You will need to sign up for your own API key.  
The website exposes several APIs (endpoints), each with its own URL.  
Remember to run the cell for the API you want, so that the correct URL is used to retrieve the data.
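One way to avoid depending on which cell was run last is to keep every endpoint path in one place and fill it in on demand. The sketch below is a hypothetical helper: the names ENDPOINTS and build_url, and reading the key from an environment variable called FBI_API_KEY, are my own assumptions. The base URL, the endpoint paths and the API_KEY query parameter are the ones used in the request cells of this notebook. Because "from" is a Python keyword, the year placeholders are renamed from_year and to_year.

```python
import os
from urllib.parse import urlencode

# Base URL of the Crime Data Explorer API, as used throughout this notebook.
BASE_URL = "https://api.usa.gov/crime/fbi/sapi"

# Endpoint templates copied from the request cells (hypothetical dict name).
ENDPOINTS = {
    "api1": "/api/data/nibrs/{offense}/offense/national/{variable}",
    "api2": "/api/data/supplemental/{offense}/states/{stateAbbr}/{variable}/{from_year}/{to_year}",
    "api3": "/api/data/arrest/national/{offense}/{variable}/{from_year}/{to_year}",
    "api4": "/api/data/preliminary/national/{variable}",
}


def build_url(name, api_key=None, **params):
    """Fill in an endpoint template and append the API key as a query string."""
    if api_key is None:
        # Assumption: the key is stored in an environment variable named FBI_API_KEY.
        api_key = os.environ.get("FBI_API_KEY", "")
    path = ENDPOINTS[name].format(**params)
    return f"{BASE_URL}{path}?{urlencode({'API_KEY': api_key})}"


# "DEMO_KEY" is only a placeholder; substitute your own key.
example_url = build_url("api3", api_key="DEMO_KEY", offense="fraud",
                        variable="offense", from_year=2019, to_year=2020)
print(example_url)
```

Keeping the templates in a dictionary means each request cell only has to name the API and its parameters, so re-running cells out of order cannot silently reuse the wrong URL.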

### API_1: Provides victim demographic information for offenses that were reported to the UCR Program.

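/api/data/nibrs/{offense}/victim/national/{variable}

Note: the request cell below calls the "offense" variant of this endpoint, /api/data/nibrs/{offense}/offense/national/{variable}, which returns incident and offense counts rather than victim demographics.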

The offenses you can input are:

- aggravated-assault
- all-other-larceny
- all-other-offenses
- animal-cruelty
- arson
- assisting-or-promoting-prostitution
- bad-checks
- betting
- bribery
- burglary-breaking-and-entering
- counterfeiting-forgery
- **credit-card-automated-teller-machine-fraud**
- destruction-damage-vandalism-of-property
- driving-under-the-influence
- drug-equipment-violations
- drug-violations
- drunkenness
- embezzlement
- extortion-blackmail
- false-pretenses-swindle-confidence-game
- fondling
- gambling-equipment-violation
- hacking-computer-invasion
- human-trafficking-commerical-sex-acts
- human-trafficking-commerical-involuntary-servitude
- identity-theft
- impersonation
- incest
- intimidation
- justifiable-homicide
- kidnapping-abduction
- motor-vehicle-theft
- murder-and-nonnegligent-manslaughter
- negligent-manslaughter
- operating-promoting-assiting-gambling
- curfew-loitering-vagrancy-violations
- peeping-tom
- pocket-picking
- pornography-obscence-material
- prostitution
- purchasing-prostitution
- purse-snatching
- rape
- robbery
- sexual-assult-with-an-object
- sex-offenses-non-forcible
- shoplifting
- simple-assault
- sodomy
- sports-tampering
- statutory-rape
- stolen-property-offenses
- theft-from-building
- theft-from-coin-operated-machine-or-device
- theft-from-motor-vehicle
- theft-of-motor-vehicle-parts-or-accessories
- weapon-law-violation
- welfare-fraud
- wire-fraud
- not-specified
- liquor-law-violations

Grouped offense categories:

- crime-against-person
- crime-against-property
- crime-against-society
- assault-offenses
- homicide-offenses
- human-trafficking-offenses
- sex-offenses
- fraud-offenses
- larceny-theft-offenses
- drugs-narcotic-offenses
- gambling-offenses
- prostitution-offenses
- all-offenses

variable options:  
age, **count,** ethnicity, race, sex, relationship

In [10]:
API_KEY = "" #enter your API key
offense1 = "credit-card-automated-teller-machine-fraud"
variable1 = "count"
url1 = f"https://api.usa.gov/crime/fbi/sapi/api/data/nibrs/{offense1}/offense/national/{variable1}?API_KEY={API_KEY}"

#### Data Request

Retrieve the data from the API and store the parsed JSON in a variable.  
The response status code is printed to confirm that the request succeeded (200 means OK).

In [11]:
response = requests.get(url1)
data_API1 = response.json()
print(response.status_code)
pp.pprint(data_API1)

200
{'pagination': {'count': 30, 'page': 0, 'pages': 1, 'per_page': 0},
 'results': [{'data_year': 2011,
              'incident_count': 86915,
              'offense_count': 86915},
             {'data_year': 2008,
              'incident_count': 74127,
              'offense_count': 74127},
             {'data_year': 2009,
              'incident_count': 77774,
              'offense_count': 77774},
             {'data_year': 2006,
              'incident_count': 57346,
              'offense_count': 57346},
             {'data_year': 2003,
              'incident_count': 29352,
              'offense_count': 29352},
             {'data_year': 1994, 'incident_count': 1737, 'offense_count': 1737},
             {'data_year': 2016,
              'incident_count': 109646,
              'offense_count': 109646},
             {'data_year': 2002,
              'incident_count': 26785,
              'offense_count': 26785},
             {'data_year': 1998, 'incident_count': 7610, 'offense_co
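The request cells call response.json() even when the request fails, and they never check whether all records arrived. A more defensive sketch is below. fetch_json and check_complete are hypothetical helper names; raise_for_status() is the standard requests method that raises on 4xx/5xx responses. check_complete assumes that pagination "count" is the total number of records, which matches the outputs here (count 30 for API_1, and describe() later reports 30 rows).

```python
def fetch_json(url, get=None, timeout=30):
    """Request a URL, fail loudly on HTTP errors, and return the parsed JSON."""
    if get is None:
        import requests  # imported lazily so a stub can be passed in for testing
        get = requests.get
    response = get(url, timeout=timeout)
    print(response.status_code)
    response.raise_for_status()  # raise on 4xx/5xx instead of failing later in .json()
    return response.json()


def check_complete(payload):
    """Return True if the response fits on one page and every record was received.

    Assumption: pagination["count"] is the total number of records available.
    """
    pagination = payload["pagination"]
    return pagination["pages"] <= 1 and len(payload["results"]) == pagination["count"]


# Usage in this notebook would look like:
#     data_API1 = fetch_json(url1)
#     check_complete(data_API1)
```

Passing the HTTP function in as a parameter is only there so the helper can be exercised without a network connection; in normal use you call fetch_json(url) and it uses requests.get.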

#### Normalise the Data

This requires 'flattening' the nested "results" list in the JSON response into a table.

In [12]:
df_results_API1 = pd.json_normalize(
    data_API1, 
    record_path = ["results"]
)
df_results_API1

|   | incident_count | offense_count | data_year |
|---|---|---|---|
| 0 | 86915 | 86915 | 2011 |
| 1 | 74127 | 74127 | 2008 |
| 2 | 77774 | 77774 | 2009 |
| 3 | 57346 | 57346 | 2006 |
| 4 | 29352 | 29352 | 2003 |
| 5 | 1737 | 1737 | 1994 |
| 6 | 109646 | 109646 | 2016 |
| 7 | 26785 | 26785 | 2002 |
| 8 | 7610 | 7610 | 1998 |
| 9 | 46096 | 46096 | 2005 |
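For flat records like these, pd.json_normalize with record_path=["results"] produces the same table as passing the "results" list straight to the DataFrame constructor. The sketch below checks that on a small payload shaped like the API_1 response (the two records are copied from the output above; the rest of the payload is trimmed).

```python
import pandas as pd

# A payload shaped like the API_1 response, trimmed to its first two records.
sample = {
    "pagination": {"count": 2, "page": 0, "pages": 1, "per_page": 0},
    "results": [
        {"incident_count": 86915, "offense_count": 86915, "data_year": 2011},
        {"incident_count": 74127, "offense_count": 74127, "data_year": 2008},
    ],
}

# record_path=["results"] turns each dict in the "results" list into one row.
df_normalized = pd.json_normalize(sample, record_path=["results"])

# For flat records, the plain DataFrame constructor gives the same table.
df_plain = pd.DataFrame(sample["results"])

print(df_normalized.equals(df_plain))
```

json_normalize earns its keep when records contain nested dictionaries, which it flattens into dotted column names; for the flat records returned here either approach works.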


#### Checking the data types and that they're all integers

In [13]:
df_results_API1.dtypes

incident_count    int64
offense_count     int64
data_year         int64
dtype: object
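Rather than reading the dtypes by eye, a small helper can list any columns that are not integers. non_integer_columns is a hypothetical name; pandas.api.types.is_integer_dtype is the standard pandas check. The demo frame is made up for illustration, with an extra float column so the check has something to find.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def non_integer_columns(df):
    """Return the names of columns whose dtype is not an integer type."""
    return [col for col in df.columns if not is_integer_dtype(df[col])]


# Illustrative frame: two integer columns and one float column.
demo = pd.DataFrame({
    "incident_count": [86915, 74127],
    "data_year": [2011, 2008],
    "rate": [1.5, 2.0],
})
print(non_integer_columns(demo))  # prints ['rate']
```

Called on df_results_API1, an empty list would confirm that every column is an integer.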

#### Getting summary statistics for the data

In [14]:
df_results_API1.describe()

|   | incident_count | offense_count | data_year |
|---|---|---|---|
| count | 30.0 | 30.0 | 30.0 |
| mean | 54815.366667 | 54815.366667 | 2005.5 |
| std | 45730.603149 | 45730.603149 | 8.803408 |
| min | 1248.0 | 1248.0 | 1991.0 |
| 25% | 8185.5 | 8185.5 | 1998.25 |
| 50% | 51721.0 | 51721.0 | 2005.5 |
| 75% | 88255.25 | 88255.25 | 2012.75 |
| max | 141078.0 | 141078.0 | 2020.0 |
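In the summary above, incident_count and offense_count have identical statistics. That suggests, but does not prove, that the two columns are equal row by row, i.e. each incident in this series was counted as a single offense. A quick check is sketched below; counts_match is a hypothetical helper, and the stand-in rows are copied from the API_1 output.

```python
import pandas as pd


def counts_match(df):
    """True if incident_count equals offense_count in every row."""
    return bool((df["incident_count"] == df["offense_count"]).all())


# Stand-in rows copied from the API_1 output.
sample = pd.DataFrame({
    "incident_count": [86915, 74127, 77774],
    "offense_count": [86915, 74127, 77774],
    "data_year": [2011, 2008, 2009],
})
print(counts_match(sample))  # prints True
```

Running counts_match(df_results_API1) on the full frame would settle the question for all 30 years.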


#### Extracting the rows I need (2019 & 2020)

In [15]:
API1results2019 = df_results_API1[df_results_API1['data_year'] == 2019]
API1results2020 = df_results_API1[df_results_API1['data_year'] == 2020]

print(API1results2019)
print(API1results2020)

    incident_count  offense_count  data_year
26          139855         139855       2019
    incident_count  offense_count  data_year
25          141078         141078       2020
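The years come back from the API in no particular order (2011, 2008, 2009, ...), so it helps to sort by data_year before comparing years. The sketch below selects 2019 and 2020 in a single isin() filter and computes the year-over-year percent change in offense_count; the stand-in frame copies three rows from the outputs above rather than calling the API.

```python
import pandas as pd

# Stand-in for df_results_API1: three rows copied from the outputs above.
df = pd.DataFrame({
    "incident_count": [141078, 139855, 86915],
    "offense_count": [141078, 139855, 86915],
    "data_year": [2020, 2019, 2011],
})

# The API returns years in no particular order, so sort before comparing them.
df = df.sort_values("data_year").reset_index(drop=True)

# Select both years with one isin() filter and index the result by year.
recent = df[df["data_year"].isin([2019, 2020])].set_index("data_year")

# Percent change in offense_count from 2019 to 2020.
before = recent.loc[2019, "offense_count"]
after = recent.loc[2020, "offense_count"]
pct_change = (after - before) / before * 100
print(round(pct_change, 2))  # prints 0.87
```

Applied to df_results_API1, the same filter replaces the two separate boolean selections above.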


### API_2: Provides additional details for summarized UCR data beyond the count of reported crimes. Details include type of weapon used, value of items, etc.

/api/data/supplemental/{offense}/states/{stateAbbr}/{variable}/{from}/{to}

Parameters:  
offense: larceny, burglary, **robbery,** not-specified, motor-vehicle-theft  
stateAbbr: AL for Alabama (any other state abbreviation listed at https://www.ssa.gov/international/coc-docs/states.html can be used instead)  
variable: LARCENY_TYPE, MVT_RECOVERED, OFFENSE, OFFENSE_SUB_CATEGORY  
from: 2019 (start year)  
to: 2020 (end year)

In [16]:
offense2 = "robbery"
stateAbbr = "AL"
variable2 = "OFFENSE"
from2 = "2019"
to2 = "2020"
url2 = f"https://api.usa.gov/crime/fbi/sapi/api/data/supplemental/{offense2}/states/{stateAbbr}/{variable2}/{from2}/{to2}?API_KEY={API_KEY}"

In [17]:
response = requests.get(url2)
data_API2 = response.json()
print(response.status_code)
pp.pprint(data_API2)

200
{'pagination': {'count': 2, 'page': 0, 'pages': 1, 'per_page': 0},
 'results': [{'actual_count': 992,
              'data_year': 2019,
              'stolen_value_total': 1913496},
             {'actual_count': 802,
              'data_year': 2020,
              'stolen_value_total': 1041116}]}


In [18]:
df_results_API2 = pd.json_normalize(
    data_API2, 
    record_path = ["results"]
)
df_results_API2

|   | stolen_value_total | actual_count | data_year |
|---|---|---|---|
| 0 | 1913496 | 992 | 2019 |
| 1 | 1041116 | 802 | 2020 |
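The two columns can be combined into a mean stolen value per reported robbery. This rests on an assumption: that stolen_value_total is the summed value of property stolen across the actual_count robberies. The column name mean_value_per_offense is hypothetical, and the values are copied from the table above.

```python
import pandas as pd

# Values copied from the API_2 output (robbery, Alabama).
df = pd.DataFrame({
    "stolen_value_total": [1913496, 1041116],
    "actual_count": [992, 802],
    "data_year": [2019, 2020],
})

# Assumption: stolen_value_total sums the value stolen in the actual_count
# reported robberies, so the ratio is a mean value per robbery.
df["mean_value_per_offense"] = (df["stolen_value_total"] / df["actual_count"]).round(2)
print(df[["data_year", "mean_value_per_offense"]])
```

On these figures the mean falls from about 1928.93 in 2019 to about 1298.15 in 2020. This is a single state compared with itself over time, so it stays clear of the cross-state comparisons the FBI advises against.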


In [19]:
df_results_API2.dtypes

stolen_value_total    int64
actual_count          int64
data_year             int64
dtype: object

In [20]:
df_results_API2.describe()

|   | stolen_value_total | actual_count | data_year |
|---|---|---|---|
| count | 2.0 | 2.0 | 2.0 |
| mean | 1477306.0 | 897.0 | 2019.5 |
| std | 616865.8 | 134.350288 | 0.707107 |
| min | 1041116.0 | 802.0 | 2019.0 |
| 25% | 1259211.0 | 849.5 | 2019.25 |
| 50% | 1477306.0 | 897.0 | 2019.5 |
| 75% | 1695401.0 | 944.5 | 2019.75 |
| max | 1913496.0 | 992.0 | 2020.0 |


### API_3: Provides details of the number of arrests, citations, or summons for an offense. View arrest information on the national and regional level along with federal, state, and local agencies.

/api/data/arrest/national/{offense}/{variable}/{from}/{to}

The request is built from an offense and one other variable; the example below uses:  
offense = **fraud** (there are several other offenses, but this was the most appropriate here)  
variable = offense (alternatives: male, female, race, monthly)  
from = 2019 (start year)  
to = 2020 (end year)

In [21]:
offense3 = "fraud"
variable3 = "offense"  
from3 = "2019" 
to3 = "2020"
url3 = f"https://api.usa.gov/crime/fbi/sapi/api/data/arrest/national/{offense3}/{variable3}/{from3}/{to3}?API_KEY={API_KEY}"

In [22]:
response = requests.get(url3)
data_API3 = response.json()
print(response.status_code)
pp.pprint(data_API3)

200
{'pagination': {'count': 2, 'page': 0, 'pages': 1, 'per_page': 0},
 'results': [{'csv_header': None,
              'data_year': 2019,
              'female_count': 36173,
              'male_count': 64370,
              'unknown_count': 0},
             {'csv_header': None,
              'data_year': 2020,
              'female_count': 22708,
              'male_count': 42174,
              'unknown_count': 0}]}


In [23]:
df_results_API3 = pd.json_normalize(
    data_API3, 
    record_path = ["results"]
)
print("Before deleting csv_header column")
print(df_results_API3)
print("After deleting csv_header column")
del df_results_API3["csv_header"]
df_results_API3

Before deleting csv_header column
   male_count  female_count  unknown_count csv_header  data_year
0       64370         36173              0       None       2019
1       42174         22708              0       None       2020
After deleting csv_header column


|   | male_count | female_count | unknown_count | data_year |
|---|---|---|---|---|
| 0 | 64370 | 36173 | 0 | 2019 |
| 1 | 42174 | 22708 | 0 | 2020 |
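`del df["csv_header"]` raises a KeyError if the cell is run a second time, because the column is already gone. DataFrame.drop with errors="ignore" avoids that and returns a new frame. The sketch also computes the share of fraud arrests recorded as male each year. The stand-in frame copies the values printed above; male_share is a hypothetical column name.

```python
import pandas as pd

# Stand-in for the raw API_3 frame, values copied from the output above.
raw = pd.DataFrame({
    "male_count": [64370, 42174],
    "female_count": [36173, 22708],
    "unknown_count": [0, 0],
    "csv_header": [None, None],
    "data_year": [2019, 2020],
})

# drop() ignores a missing column when errors="ignore", so re-running is safe.
arrests = raw.drop(columns=["csv_header"], errors="ignore")

# Share of fraud arrests recorded as male, per year.
total = arrests[["male_count", "female_count", "unknown_count"]].sum(axis=1)
arrests["male_share"] = (arrests["male_count"] / total).round(3)
print(arrests[["data_year", "male_share"]])
```

On these figures the male share is about 0.64 in 2019 and 0.65 in 2020.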


In [24]:
df_results_API3.dtypes

male_count       int64
female_count     int64
unknown_count    int64
data_year        int64
dtype: object

In [25]:
df_results_API3.describe()

|   | male_count | female_count | unknown_count | data_year |
|---|---|---|---|---|
| count | 2.0 | 2.0 | 2.0 | 2.0 |
| mean | 53272.0 | 29440.5 | 0.0 | 2019.5 |
| std | 15694.942115 | 9521.192809 | 0.0 | 0.707107 |
| min | 42174.0 | 22708.0 | 0.0 | 2019.0 |
| 25% | 47723.0 | 26074.25 | 0.0 | 2019.25 |
| 50% | 53272.0 | 29440.5 | 0.0 | 2019.5 |
| 75% | 58821.0 | 32806.75 | 0.0 | 2019.75 |
| max | 64370.0 | 36173.0 | 0.0 | 2020.0 |


### API_4: Provides data showing the percent change in offenses known to law enforcement when compared to the same time frame from the previous year. Each quarterly report (released in March, June, September, and December) is cumulative and includes data from prior quarterly reports presented for the same year.

/api/data/preliminary/national/{variable}

Parameters:  
variable = **PERCENT_CHANGE,** DETAILS, POPULATION_GROUP, POPULATION_GROUP_DETAILS, REGION

Abbreviations used in the results:  
vc = violent crime (murder, rape, robbery, aggravated assault)  
pc = property crime (burglary, larceny-theft, motor vehicle theft, arson)  
mvt = motor vehicle theft  
aggravated = aggravated assault

In [26]:
variable4 = "PERCENT_CHANGE"
url4 = f"https://api.usa.gov/crime/fbi/sapi/api/data/preliminary/national/{variable4}?API_KEY={API_KEY}"

In [27]:
response = requests.get(url4)
data_API4 = response.json()
print(response.status_code)
pp.pprint(data_API4)

200
{'pagination': {'count': 4, 'page': 0, 'pages': 1, 'per_page': 0},
 'results': [{'agencies': 12409,
              'aggravated': 3.7,
              'arson': 12.7,
              'burglary': -4.3,
              'data_year': 2020,
              'larceny_theft': 0.6,
              'murder': 6.6,
              'mvt': 2.9,
              'pc': 0.1,
              'population': 261569188,
              'quarter': 1,
              'rape': -6.5,
              'robbery': 3.8,
              'vc': 2.5},
             {'agencies': 12206,
              'aggravated': 4.6,
              'arson': 19.2,
              'burglary': -7.8,
              'data_year': 2020,
              'larceny_theft': -9.9,
              'murder': 14.8,
              'mvt': 6.2,
              'pc': -7.8,
              'population': 258037619,
              'quarter': 2,
              'rape': -17.8,
              'robbery': -7.1,
              'vc': -0.4},
             {'agencies': 11980,
              'aggravated': 8.3,
   

In [28]:
df_results_API4 = pd.json_normalize(
    data_API4,
    record_path = ['results']
)

df_results_API4

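|   | data_year | quarter | agencies | population | vc | murder | rape | robbery | aggravated | pc | burglary | larceny_theft | mvt | arson |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 1 | 12409 | 261569188 | 2.5 | 6.6 | -6.5 | 3.8 | 3.7 | 0.1 | -4.3 | 0.6 | 2.9 | 12.7 |
| 1 | 2020 | 2 | 12206 | 258037619 | -0.4 | 14.8 | -17.8 | -7.1 | 4.6 | -7.8 | -7.8 | -9.9 | 6.2 | 19.2 |
| 2 | 2020 | 3 | 11980 | 259148089 | 1.9 | 20.9 | -15.3 | -9.7 | 8.3 | -8.3 | -8.7 | -10.8 | 9.0 | 24.0 |
| 3 | 2020 | 4 | 12974 | 287795641 | 3.3 | 24.7 | -14.2 | -10.4 | 10.5 | -7.9 | -8.4 | -10.5 | 10.5 | 23.5 |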
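The abbreviated column names can be made self-describing with DataFrame.rename before plotting. The mapping below is based on the column order in the table: vc is followed by the four UCR violent offenses (murder, rape, robbery, aggravated) and pc by the four property offenses (burglary, larceny_theft, mvt, arson). READABLE_NAMES and the new column names are hypothetical. The stand-in frame copies two quarters and a subset of columns from the table.

```python
import pandas as pd

# Mapping inferred from the column order: vc = violent crime, pc = property
# crime, mvt = motor vehicle theft, aggravated = aggravated assault.
READABLE_NAMES = {
    "vc": "violent_crime",
    "pc": "property_crime",
    "mvt": "motor_vehicle_theft",
    "aggravated": "aggravated_assault",
}

# First two quarters copied from the API_4 table (subset of columns).
df = pd.DataFrame({
    "data_year": [2020, 2020],
    "quarter": [1, 2],
    "vc": [2.5, -0.4],
    "pc": [0.1, -7.8],
    "mvt": [2.9, 6.2],
    "aggravated": [3.7, 4.6],
})

# Columns not in the mapping (data_year, quarter) keep their names.
df_readable = df.rename(columns=READABLE_NAMES)
print(list(df_readable.columns))
```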


In [29]:
df_results_API4.dtypes  # check whether the mix of int64 and float64 columns causes problems when plotting

data_year          int64
quarter            int64
agencies           int64
population         int64
vc               float64
murder           float64
rape             float64
robbery          float64
aggravated       float64
pc               float64
burglary         float64
larceny_theft    float64
mvt              float64
arson            float64
dtype: object
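Mixing int64 and float64 columns is not a problem for plotting in itself. What matters is keeping the identifier and count columns (data_year, quarter, agencies, population, all integers) separate from the percent-change columns (all float64). The sketch below reshapes a few percent-change columns into long format with DataFrame.melt: one row per (quarter, offense) pair, a layout most plotting libraries accept directly. The stand-in values are copied from the API_4 table, and the names "offense" and "percent_change" are my own choices.

```python
import pandas as pd

# Subset of the API_4 table: two quarters, three offense columns.
df = pd.DataFrame({
    "data_year": [2020, 2020],
    "quarter": [1, 2],
    "population": [261569188, 258037619],
    "murder": [6.6, 14.8],
    "rape": [-6.5, -17.8],
    "robbery": [3.8, -7.1],
})

# Keep the identifiers, stack the chosen percent-change columns into one column.
# Columns not listed (population here) are dropped from the result.
long_df = df.melt(
    id_vars=["data_year", "quarter"],
    value_vars=["murder", "rape", "robbery"],
    var_name="offense",
    value_name="percent_change",
)
print(long_df)
```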

In [30]:
df_results_API4.describe()

|   | data_year | quarter | agencies | population | vc | murder | rape | robbery | aggravated | pc | burglary | larceny_theft | mvt | arson |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 |
| mean | 2020.0 | 2.5 | 12392.25 | 266637600.0 | 1.825 | 16.75 | -13.45 | -5.85 | 6.775 | -5.975 | -7.3 | -7.65 | 7.15 | 19.85 |
| std | 0.0 | 1.290994 | 425.579115 | 14182190.0 | 1.590335 | 7.900422 | 4.872029 | 6.588121 | 3.182635 | 4.055757 | 2.034699 | 5.512713 | 3.347138 | 5.230997 |
| min | 2020.0 | 1.0 | 11980.0 | 258037600.0 | -0.4 | 6.6 | -17.8 | -10.4 | 3.7 | -8.3 | -8.7 | -10.8 | 2.9 | 12.7 |
| 25% | 2020.0 | 1.75 | 12149.5 | 258870500.0 | 1.325 | 12.75 | -15.925 | -9.875 | 4.375 | -8.0 | -8.475 | -10.575 | 5.375 | 17.575 |
| 50% | 2020.0 | 2.5 | 12307.5 | 260358600.0 | 2.2 | 17.85 | -14.75 | -8.4 | 6.45 | -7.85 | -8.1 | -10.2 | 7.6 | 21.35 |
| 75% | 2020.0 | 3.25 | 12550.25 | 268125800.0 | 2.7 | 21.85 | -12.275 | -4.375 | 8.85 | -5.825 | -6.925 | -7.275 | 9.375 | 23.625 |
| max | 2020.0 | 4.0 | 12974.0 | 287795600.0 | 3.3 | 24.7 | -6.5 | 3.8 | 10.5 | 0.1 | -4.3 | 0.6 | 10.5 | 24.0 |
