In [1]:
import requests
import pandas as pd
import json
import matplotlib.pyplot as plt
from datetime import datetime

# Analyzing Aggravated Burglaries in Davidson County

### Part 1 - Data Gathering using APIs

1. Find all aggravated burglary incidents that were reported during the nine-month period from January 1, 2022 through September 30, 2022. (**Hint:** Check out the [API Docs](https://dev.socrata.com/foundry/data.nashville.gov/2u6v-ujjs) to see how to narrow down the response to just the desired results.)

**Aggravated Burglary** - The TIBRS Data Collection Manual states that Tennessee uses NIBRS to code offenses under Tennessee Code Annotated 39-14-403. The following NIBRS codes are used for aggravated burglary in Tennessee:
- Burglary - 220
- Assault - 13A
- Robbery - 120
- Weapon Law Violation - 520

In [2]:
#MPD dataset API
# Load API Key into dictionary: credentials
with open('app_token.json') as fi:
    credentials = json.load(fi)
# Returns value associated with dictionary key
api_key = credentials['token']
# Start and end of the date range, pre-quoted for use in a SoQL $where clause
dates = ("'2022-01-01T00:00:00'", "'2022-09-30T23:59:59'")
# NIBRS codes for aggravated burglary
nibrs = ['220', '13A', '120', '520']

In [3]:
# API Calls for Nashville's MPD dataset
endpoint = 'https://data.nashville.gov/resource/2u6v-ujjs.json'
# Parameter dictionary for API call
params = {
    # Find information between Dates
    '$where': f'incident_occurred between {dates[0]} and {dates[1]}',
    # Find NIBRS code 220: Burglary 
    'offense_nibrs': nibrs[0],
    # Limit changed to allow more results than default
    '$limit': '10000',
    # Provides api key for query
    '$$app_token': api_key
}
# Send the request with the parameters dictionary and store the result: response
response = requests.get(endpoint, params=params)
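The request above filters on NIBRS code 220 only. If all four codes listed earlier were wanted in a single call, one possible approach is the SoQL `in` operator inside `$where`. The sketch below only builds the parameter dictionary; `build_soql_params` is a hypothetical helper, and it assumes the `incident_occurred` and `offense_nibrs` field names used in this notebook.

```python
def build_soql_params(start, end, codes, limit=10000, app_token=None):
    """Build a Socrata query-parameter dict (hypothetical helper)."""
    # Quote each code as a SoQL string literal, e.g. '220', '13A'
    code_list = ", ".join(f"'{code}'" for code in codes)
    where = (
        f"incident_occurred between '{start}' and '{end}' "
        f"and offense_nibrs in ({code_list})"
    )
    params = {"$where": where, "$limit": str(limit)}
    # Only include the token when one is supplied
    if app_token is not None:
        params["$$app_token"] = app_token
    return params


params_all_codes = build_soql_params(
    "2022-01-01T00:00:00",
    "2022-09-30T23:59:59",
    ["220", "13A", "120", "520"],
)
print(params_all_codes["$where"])
```

The resulting dictionary can be passed to `requests.get(endpoint, params=...)` in the same way as `params`.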

In [4]:
# Checks response code.  We want to see "200"
response

<Response [200]>

In [5]:
# Parse the JSON body of the response into a list of dictionaries
res = response.json()
# Create a pandas DataFrame from the parsed records
mpd_df = pd.DataFrame(res)
mpd_df

Unnamed: 0,primary_key,incident_number,report_type,report_type_description,incident_status_code,incident_status_description,investigation_status,incident_occurred,incident_reported,incident_location,...,victim_type,victim_description,victim_gender,victim_race,victim_ethnicity,victim_county_resident,mapped_location,rpa,zone,zip_code
0,20220167824_11,20220167824,D,DISPATCHED,O,OPEN,Open,2022-04-07T15:00:00.000,2022-04-08T11:52:00.000,JACKSON ST,...,I,INDIVIDUAL (18 AND OVER),U,W,Non-Hispanic,NON RESIDENT,"{'type': 'Point', 'coordinates': [-86.8, 36.17]}",,,
1,20220126184_31,20220126184,D,DISPATCHED,O,OPEN,Open,2022-03-18T02:30:00.000,2022-03-18T06:51:00.000,BENTON AVE,...,I,INDIVIDUAL (18 AND OVER),M,W,Non-Hispanic,RESIDENT,"{'type': 'Point', 'coordinates': [-86.77, 36.13]}",8029,817,
2,20220027854_12,20220027854,D,DISPATCHED,O,OPEN,Open,2022-01-18T07:45:00.000,2022-01-19T23:48:00.000,CANE RIDGE RD,...,I,INDIVIDUAL (18 AND OVER),M,B,Non-Hispanic,RESIDENT,"{'type': 'Point', 'coordinates': [-86.66, 36.04]}",,,
3,20220032825_11,20220032825,D,DISPATCHED,O,OPEN,Open,2022-01-23T00:40:00.000,2022-01-24T06:57:00.000,BROOKWOOD TER,...,I,INDIVIDUAL (18 AND OVER),F,W,Non-Hispanic,RESIDENT,"{'type': 'Point', 'coordinates': [-86.86, 36.13]}",5019,121,
4,20220034986_11,20220034986,D,DISPATCHED,O,OPEN,Open,2022-01-23T21:00:00.000,2022-01-24T18:02:00.000,HAMILTON CHURCH RD,...,I,INDIVIDUAL (18 AND OVER),F,W,Non-Hispanic,RESIDENT,"{'type': 'Point', 'coordinates': [-86.59, 36.06]}",,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3455,20220487005_11,20220487005,D,DISPATCHED,R,REFUSED TO COOPERATE,Closed,2022-09-27T19:16:00.000,2022-09-27T22:46:00.000,400 400,...,I,INDIVIDUAL (18 AND OVER),F,B,Non-Hispanic,RESIDENT,"{'type': 'Point', 'coordinates': [-86.794, 36....",,,37207
3456,20220436026_11,20220436026,D,DISPATCHED,A,CLEARED BY ARREST,Closed,2022-08-31T18:30:00.000,2022-08-31T20:46:00.000,306 306,...,I,INDIVIDUAL (18 AND OVER),M,W,Non-Hispanic,NON RESIDENT,"{'type': 'Point', 'coordinates': [-86.624, 36....",,,37214
3457,20220513387_11,20220513387,D,DISPATCHED,O,OPEN,Open,2022-09-28T07:00:00.000,2022-10-11T22:42:00.000,GALLATIN PKE,...,I,INDIVIDUAL (18 AND OVER),M,W,Hispanic,RESIDENT,,,,
3458,20220116140_31,20220116140,D,DISPATCHED,A,CLEARED BY ARREST,Closed,2022-03-13T01:49:00.000,2022-03-13T01:49:00.000,306 306,...,I,INDIVIDUAL (18 AND OVER),F,B,Non-Hispanic,RESIDENT,"{'type': 'Point', 'coordinates': [-86.757, 36....",,,37206


2. Using the [2020 American Community Survey API](https://www.census.gov/data/developers/data-sets/acs-5year.html), obtain, for each census tract, the population (B01001_001E in the detailed tables) and the median income (S1901_C01_012E in the subject tables). Hint: Tennessee's FIPS code is 47 and Davidson County's FIPS code is 037.
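A minimal sketch of how the two ACS requests might be set up. It assumes the 2020 5-year endpoints `acs/acs5` (detailed tables) and `acs/acs5/subject` (subject tables), and that the API returns a list of rows whose first row is the header. The helper names `acs_tract_params` and `rows_to_records` are hypothetical.

```python
ACS_DETAIL_URL = "https://api.census.gov/data/2020/acs/acs5"
ACS_SUBJECT_URL = "https://api.census.gov/data/2020/acs/acs5/subject"


def acs_tract_params(variable, state="47", county="037", api_key=None):
    """Query parameters for one variable across all tracts in a county."""
    params = {
        "get": f"NAME,{variable}",
        "for": "tract:*",
        "in": f"state:{state} county:{county}",
    }
    # An API key is optional for light use; include it only if provided
    if api_key is not None:
        params["key"] = api_key
    return params


def rows_to_records(rows):
    """Convert [header, row, row, ...] into a list of dictionaries."""
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Usage (requires network access):
# rows = requests.get(ACS_DETAIL_URL, params=acs_tract_params("B01001_001E")).json()
# population_df = pd.DataFrame(rows_to_records(rows))
# rows = requests.get(ACS_SUBJECT_URL, params=acs_tract_params("S1901_C01_012E")).json()
# income_df = pd.DataFrame(rows_to_records(rows))
```

Values come back as strings, so numeric columns usually need converting before any arithmetic.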

### Part 2 - Spatial Joining and Data Merging

3. Download the 2020 census tract shapefiles for Tennessee from https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2020.html. (The FIPS code for Tennessee is 47). Perform a spatial join to determine the census tract in which each burglary incident occurred. 
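One way the spatial join could be approached with geopandas is sketched below. It assumes the `mapped_location` column holds GeoJSON-style points in longitude/latitude order (as in the output above), that those coordinates are WGS84 (EPSG:4326), and that the tract shapefile carries `COUNTYFP` and `GEOID` columns. Both function names are hypothetical, and `predicate=` requires a reasonably recent geopandas.

```python
def extract_lon_lat(mapped_location):
    """Return (lon, lat) from a GeoJSON-style point dict, or None if missing."""
    if not isinstance(mapped_location, dict):
        return None
    coords = mapped_location.get("coordinates")
    if not coords or len(coords) != 2:
        return None
    return float(coords[0]), float(coords[1])


def join_incidents_to_tracts(incidents, tract_shapefile):
    """Attach tract attributes to each incident row that has coordinates."""
    # Imported here so extract_lon_lat can be used without geopandas installed
    import geopandas as gpd

    lon_lat = incidents["mapped_location"].apply(extract_lon_lat)
    has_point = lon_lat.notna()
    located = incidents[has_point].copy()
    points = gpd.GeoDataFrame(
        located,
        geometry=gpd.points_from_xy(
            [p[0] for p in lon_lat[has_point]],
            [p[1] for p in lon_lat[has_point]],
        ),
        crs="EPSG:4326",  # assumption: coordinates are WGS84 lon/lat
    )
    tracts = gpd.read_file(tract_shapefile)
    # Keep Davidson County tracts and match the points' coordinate system
    tracts = tracts[tracts["COUNTYFP"] == "037"].to_crs(points.crs)
    return gpd.sjoin(points, tracts, how="left", predicate="within")
```

Rows without a usable `mapped_location` are dropped before the join, so they would need separate handling if they matter for the counts.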

4. Aggregate the data by census tract. **Warning:** each incident can appear multiple times if there are multiple victims, so be sure that you aren't double-counting any incidents. Which census tract had the highest number of burglaries? Which census tract had the highest number of burglaries per 1000 residents? **Note:** Make sure that you keep all census tracts, not just those that have had a burglary.
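The aggregation in this step can be sketched as follows, assuming a joined table with `incident_number` and `GEOID` columns and a tract table with `GEOID` and a numeric `population` column. These column names and the function `burglaries_per_tract` are illustrative assumptions, not fixed by the assignment.

```python
import pandas as pd


def burglaries_per_tract(joined, all_tracts):
    """Count unique incidents per tract, keeping tracts with no incidents."""
    # One row per incident, so multiple victims are not double-counted
    unique_incidents = joined.drop_duplicates(subset="incident_number")
    counts = (
        unique_incidents.groupby("GEOID")
        .size()
        .rename("burglaries")
        .reset_index()
    )
    # Left merge from the full tract list keeps tracts without burglaries
    result = all_tracts.merge(counts, on="GEOID", how="left")
    result["burglaries"] = result["burglaries"].fillna(0).astype(int)
    result["per_1000"] = 1000 * result["burglaries"] / result["population"]
    return result
```

Sorting the result by `burglaries` or by `per_1000` then answers the two "which tract" questions. Tracts with zero population produce infinite or undefined rates and are best handled before ranking.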

5. Merge in the census data that you gathered in question 2. Remove any rows that have zero population or negative median income values.
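A short sketch of the filtering step, assuming the merged table has columns named `population` and `median_income` (hypothetical names). The ACS API reports values as strings and uses large negative sentinels such as -666666666 when an estimate is unavailable, which is why negative incomes are removed.

```python
import pandas as pd


def drop_unusable_tracts(df):
    """Remove tracts with zero population or negative median income."""
    cleaned = df.copy()
    # API values arrive as strings; unparseable entries become NaN
    for col in ("population", "median_income"):
        cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")
    # Comparisons with NaN are False, so those rows are dropped as well
    usable = (cleaned["population"] > 0) & (cleaned["median_income"] >= 0)
    return cleaned.loc[usable].reset_index(drop=True)
```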

### Part 3 - Statistical Modeling

6. Finally, we'll build some statistical models to see how well we can explain the number of aggravated burglaries using the median income of each census tract. Start with some EDA to look at the relationship between median income and number of aggravated burglaries.

7. Fit a Poisson regression model with the number of burglaries per census tract as the target variable and the median income as the predictor. Offset using the log of the population so that the model describes the rate of burglaries per resident rather than the raw count. How can you interpret the meaning of the output?
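As a hedged aid to interpretation, the model with an offset can be written out as below. The notation is introduced here for illustration: $Y_i$ is the burglary count in tract $i$, $N_i$ its population, and $x_i$ its median income.

$$
\log \mathbb{E}[Y_i] = \beta_0 + \beta_1 x_i + \log N_i
\quad\Longleftrightarrow\quad
\log \frac{\mathbb{E}[Y_i]}{N_i} = \beta_0 + \beta_1 x_i
$$

Under this form, a one-unit increase in $x_i$ multiplies the expected burglary rate by $e^{\beta_1}$, so a negative $\beta_1$ corresponds to lower rates in higher-income tracts. Because median income is measured in dollars, rescaling it (for example to thousands of dollars) tends to make the coefficient easier to read.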

8. **Bonus:** Try out a negative binomial model. To get started with a negative binomial model, you can check out [this tutorial](https://timeseriesreasoning.com/contents/negative-binomial-regression-model/). How does this model compare to the Poisson model?
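For comparison, and assuming the common NB2 parameterization, the two models share the same mean $\mu_i$ but differ in the variance they allow (here $\alpha \ge 0$ is the dispersion parameter):

$$
\text{Poisson: } \operatorname{Var}(Y_i) = \mu_i
\qquad
\text{Negative binomial (NB2): } \operatorname{Var}(Y_i) = \mu_i + \alpha \mu_i^2
$$

As $\alpha \to 0$ the negative binomial reduces to the Poisson model, so an estimated $\alpha$ clearly above zero suggests overdispersion that the Poisson model does not capture.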

Additional Resources for Generalized Linear Models:
* DataCamp - [Generalized Linear Models in Python](https://learn.datacamp.com/courses/generalized-linear-models-in-python)
* [Beyond Multiple Linear Regression, Chapter 4](https://bookdown.org/roback/bookdown-BeyondMLR/ch-poissonreg.html) Warning - the code in this book is all R, but the conceptual explanations are very clear.
* [This set of notes](https://apwheele.github.io/MathPosts/PoissonReg.html#negative-binomial-when-the-poisson-does-not-fit), which talks about the problem of overdispersion.