# Crime and Policing Expenditures Exploratory Questions


*Jiechen Li*


In this exercise we'll be examining the relationship between crime and policing expenditures using county-level data from Massachusetts. In particular, we're hoping to answer the question "Is there a substantial relationship between crime and policing expenditures?"

## Exercises

### Exercise 1

Begin by downloading the data for this exercise from https://github.com/nickeubank/MIDS_Data/blob/master/descriptive_exercise/crime_expend_MA.csv (just go to `github.com/nickeubank/MIDS_Data`, then go to `descriptive_exercise` and get `crime_expend_MA.csv` if you don't want to type all that).


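```python
import warnings

import pandas as pd

# Silence library warnings so they don't clutter the notebook output
warnings.filterwarnings("ignore")
# Opt in to pandas copy-on-write behavior
pd.set_option("mode.copy_on_write", True)

# Load the data directly from GitHub (the /raw/ URL serves the CSV itself)
crime_data = pd.read_csv(
    "https://github.com/nickeubank/MIDS_Data/raw/master/descriptive_exercise/crime_expend_MA.csv"
)
# Show 5 randomly chosen rows
crime_data.sample(5)
```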

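| Unnamed: 0 | months | county_code | crimeindex | policeexpenditures | month | year |
|---|---|---|---|---|---|---|
| 865 | 66 | 4 | 85.2564 | 50.0 | 7 | 1995 |
| 183 | 14 | 10 | 64.6104 | 28.324572 | 3 | 1991 |
| 819 | 63 | 1 | 40.682503 | 31.005258 | 4 | 1995 |
| 255 | 19 | 5 | 52.123011 | 52.025214 | 8 | 1991 |
| 846 | 65 | 10 | 86.576863 | 57.837478 | 6 | 1995 |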


### Exercise 2

This dataset contains monthly observations of each county's policing expenditures (`policeexpenditures`, as a share of the county budget) and an index of crime (`crimeindex`, scaled 0-100) from 1990 to late 2001.

In these exercises, we'll be focusing on just two counties -- `county_code` 4 and 10. 

First, for each of these two counties, calculate the mean expenditure level and the mean `crimeindex` score (i.e., calculate both means separately for each county).

Just to make sure we're practicing applied skills, use a loop to calculate your means and print your results nicely! You should get output like this (though obviously with different numbers; I'm not gonna give you the answer!):

```
for county 4, average policing expenditure is 23.7 and average crime index is 75.83
for county 10, average policing expenditure is 62.15 and average crime index is 55.88
```
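If the loop-and-print pattern is unfamiliar, here is a minimal sketch of one way to structure it. It runs on a small made-up DataFrame (`toy`, with invented numbers) rather than the real data, so it shows the technique without giving away the answer. Only the column names come from the dataset; everything else is an assumption for illustration.

```python
import pandas as pd

# Made-up toy values for illustration only; NOT the Massachusetts data.
toy = pd.DataFrame(
    {
        "county_code": [4, 4, 4, 10, 10, 10],
        "policeexpenditures": [20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
        "crimeindex": [80.0, 70.0, 60.0, 30.0, 50.0, 40.0],
    }
)

summary_lines = []
for county in [4, 10]:
    # Keep only the rows for the current county
    subset = toy[toy["county_code"] == county]
    mean_expend = subset["policeexpenditures"].mean()
    mean_crime = subset["crimeindex"].mean()
    # Format both means to two decimal places in one labelled sentence
    line = (
        f"for county {county}, average policing expenditure is {mean_expend:.2f} "
        f"and average crime index is {mean_crime:.2f}"
    )
    summary_lines.append(line)
    print(line)
```

To use this on the real data, you would swap `toy` for `crime_data`; the loop body should not need to change.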

```python
# calculate two means
```

### Exercise 3

Now calculate the standard deviation of both expenditures and crime for these two counties.
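As a hedged sketch of one possible approach, the snippet below uses `groupby` on a made-up DataFrame (`toy`, invented numbers, not the real data). Note that pandas' `.std()` computes the sample standard deviation (`ddof=1`) by default, which differs slightly from NumPy's default of `ddof=0`.

```python
import pandas as pd

# Made-up toy values for illustration only; NOT the Massachusetts data.
toy = pd.DataFrame(
    {
        "county_code": [4, 4, 4, 10, 10, 10],
        "policeexpenditures": [20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
        "crimeindex": [80.0, 70.0, 60.0, 30.0, 50.0, 40.0],
    }
)

# Restrict to the two counties of interest, then take the sample
# standard deviation (ddof=1, the pandas default) of each column per county
std_by_county = (
    toy[toy["county_code"].isin([4, 10])]
    .groupby("county_code")[["policeexpenditures", "crimeindex"]]
    .std()
)

for county, row in std_by_county.iterrows():
    print(
        f"for county {county}, std. dev. of policing expenditure is "
        f"{row['policeexpenditures']:.2f} and std. dev. of crime index is "
        f"{row['crimeindex']:.2f}"
    )
```

A plain loop like the one in Exercise 2 works just as well; `groupby` is simply a more compact way to get the same per-county numbers.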

```python
# calculate two standard deviations
```


### Exercise 4

Now calculate the correlation between `policeexpenditures` and `crimeindex` for both of these counties (again, output the correlations with nicely formatted and labelled statements!)
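One way this could look, sketched on a made-up DataFrame (`toy`, invented numbers, not the real data): `Series.corr` returns the Pearson correlation by default, so calling it on one column with the other as the argument gives the coefficient for that county. The dictionary name `correlations` is just an illustrative choice.

```python
import pandas as pd

# Made-up toy values for illustration only; NOT the Massachusetts data.
toy = pd.DataFrame(
    {
        "county_code": [4, 4, 4, 10, 10, 10],
        "policeexpenditures": [20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
        "crimeindex": [80.0, 70.0, 60.0, 30.0, 50.0, 40.0],
    }
)

correlations = {}
for county in [4, 10]:
    subset = toy[toy["county_code"] == county]
    # Pearson correlation (the default method) between the two columns
    correlations[county] = subset["policeexpenditures"].corr(subset["crimeindex"])
    print(
        f"for county {county}, the correlation between policing expenditure "
        f"and crime index is {correlations[county]:.2f}"
    )
```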

```python
# calculate correlation
```


### Exercise 5

Based on your results up to this point, what would you guess about whether policing reduces crime? (I know -- these are just descriptive statistics, and correlation does not imply causality. But what would you infer if this were all you knew?)

### Exercise 6

Given what you've seen up till now, would you infer that county 4 and county 10 have a similar relationship between crime and police expenditures?

### Exercise 7

Now plot histograms of `policeexpenditures` for both county 4 and county 10. Do the results change your impression of the similarity of county 4 and county 10?
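If you have not drawn side-by-side histograms before, the following is a minimal matplotlib sketch, again on made-up numbers (`toy` is invented, not the real data). The bin count, figure size, and titles are arbitrary choices you should feel free to change.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy values for illustration only; NOT the Massachusetts data.
toy = pd.DataFrame(
    {
        "county_code": [4] * 6 + [10] * 6,
        "policeexpenditures": [20, 22, 25, 30, 31, 35, 40, 45, 45, 50, 58, 60],
    }
)

# One panel per county, placed side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, county in zip(axes, [4, 10]):
    subset = toy[toy["county_code"] == county]
    ax.hist(subset["policeexpenditures"], bins=5)
    ax.set_title(f"County {county}")
    ax.set_xlabel("policeexpenditures")
    ax.set_ylabel("Number of months")
fig.tight_layout()
```

In a notebook the figure renders when the cell finishes; in a standalone script you would add `plt.show()` at the end.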

### Exercise 8

Finally, create a scatter plot of the relationship between crime and police expenditures for each county (e.g., crime on one axis, police expenditures on the other). Does this change your sense of how similar these two counties are?
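A hedged sketch of one possible layout, using made-up numbers (`toy` is invented, not the real data): one scatter panel per county, with police expenditures on the x-axis and the crime index on the y-axis. Which variable goes on which axis is an arbitrary choice here.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy values for illustration only; NOT the Massachusetts data.
toy = pd.DataFrame(
    {
        "county_code": [4, 4, 4, 10, 10, 10],
        "policeexpenditures": [20.0, 30.0, 40.0, 50.0, 60.0, 70.0],
        "crimeindex": [80.0, 70.0, 60.0, 30.0, 50.0, 40.0],
    }
)

# One panel per county, placed side by side
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, county in zip(axes, [4, 10]):
    subset = toy[toy["county_code"] == county]
    # Each point is one month of data for this county
    ax.scatter(subset["policeexpenditures"], subset["crimeindex"])
    ax.set_title(f"County {county}")
    ax.set_xlabel("policeexpenditures")
    ax.set_ylabel("crimeindex")
fig.tight_layout()
```

Plotting the two counties in separate panels keeps their axis ranges independent; if you would rather compare them on a common scale, passing `sharex=True, sharey=True` to `plt.subplots` is one option.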

## After you have answered...

Read this [discussion page](discussion_exploratory.ipynb).
