# Analyzing Milwaukee Police Call Data and Weather Data
### Grant Fass and Chris Hubbell

## Introduction
Across the world, many crimes are committed every hour. One of the greatest challenges is reducing crime and maintaining safety for citizens. Preventing crime relies in part on citizens reporting it: if nobody informs the police, the police are unable to act. This is why reporting crimes and incidents is so important, especially when people's lives are in danger. In Wisconsin, the Milwaukee Police Department (MPD) releases data on all of its dispatch calls, which we have been able to collect since 2016. This makes it possible to analyze trends in crime reporting over time and in relation to other factors. In 2010, Milwaukee installed a gunshot detection system called ShotSpotter, which was expanded into more neighborhoods in 2014. This system detects when and where a shot is fired with a high degree of accuracy. The data contains both ShotSpotter calls and Shots Fired calls. The key difference is that Shots Fired calls come from people, while ShotSpotter calls are generated automatically.

## Research Questions:
- Is there a significant difference between the distribution of shots spotted over time and calls for shots fired?
- Is there a significant difference in the number of calls that were unable to be located for shots fired calls compared to shots spotted?
- Does the number of shots spotted and fired correlate with certain dates including holidays and events?
- Does the number of calls correlate with certain weather conditions?
- Is it possible to predict the number of calls based on location and district?
- Is it possible to predict the nature of a call based on its location and district?

## Hypotheses:
- There are significantly more shots spotted than calls about shots fired.
- Significantly more shots fired calls are unable to be located than shots spotted.
- There will be significantly more shots spotted calls on July 4th, Dec. 31st, and Jan 1st than normal days.
- There will be significantly fewer shots fired calls on holidays than normal days.
- There are significantly more calls on days with clear weather than inclement weather.
- There are significantly more calls on days around 75 degrees than there are on days around 95 or 55 degrees.
- The number of calls will be able to be predicted based on location and district.
- The type of call will be unable to be predicted based on location and district.

# Imports
These are the libraries that will be relevant for working with the dataset.

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from IPython.display import Image
sns.set()

# Loading the Data
This section is used to load the data and make sure that all of the features have been formatted using the correct types. This data is ready for use since it has already been cleaned in another notebook. The MPDDataCleaning notebook was used to clean the MPD (Milwaukee Police Department) dataset. The WeatherDataCleaning notebook was used to clean the weather dataset. These two datasets were then combined in the DatasetCombining notebook.

In [None]:
df = pd.read_csv('merged_data.csv')

In [None]:
df.info()

## Revising Feature Types
Calling the [`.info()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html) method shows that a number of features have the wrong type. The district, nature, status, primaryStreetName, primaryStreetSuffix, secondaryStreetName, secondaryStreetSuffix, and weatherDesc features all need to become categorical. The datetime feature needs to be converted from a string to a datetime type.

In [None]:
df['district'] = df['district'].astype('category')
df['nature'] = df['nature'].astype('category')
df['status'] = df['status'].astype('category')
df['primaryStreetName'] = df['primaryStreetName'].astype('category')
df['primaryStreetSuffix'] = df['primaryStreetSuffix'].astype('category')
df['secondaryStreetName'] = df['secondaryStreetName'].astype('category')
df['secondaryStreetSuffix'] = df['secondaryStreetSuffix'].astype('category')
df['weatherDesc'] = df['weatherDesc'].astype('category')
# infer_datetime_format is deprecated in pandas 2.x, where format inference is the default
df['datetime'] = pd.to_datetime(df['datetime'])
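The same conversions can be written more compactly as a loop over the column names. The following is a minimal sketch on a small made-up frame; the `sample` DataFrame and its values are hypothetical stand-ins for the merged dataset, and only two of the categorical columns are shown.

```python
import pandas as pd

# Hypothetical miniature of the merged dataset; the values are made up.
sample = pd.DataFrame({
    'district': [1, 2, 2],
    'nature': ['SHOTS FIRED', 'SHOTSPOTTER', 'SHOTSPOTTER'],
    'datetime': ['2017-02-01 13:05:00', '2017-02-01 14:10:00', '2017-02-02 01:30:00'],
})

# Convert every listed column to a categorical feature.
categorical_columns = ['district', 'nature']
for column in categorical_columns:
    sample[column] = sample[column].astype('category')

# Parse the timestamp strings into a datetime column.
sample['datetime'] = pd.to_datetime(sample['datetime'])

print(sample.dtypes)
```

Keeping the column names in one list makes it easier to see at a glance which features are treated as categorical.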

## Examining The Loaded Data
The features should now have the proper types. This will be checked using the [`.head()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.head.html), [`.tail()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.tail.html), [`.info()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html), and [`.describe()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html) methods.

In [None]:
df.head(5).T

In [None]:
df.tail(5).T

In [None]:
df.info(verbose=True, show_counts=True)

In [None]:
df.describe()

# TODO: Explain



# TODO GRAPHS
- Date vs number/calls line graph
- Box plot of district/num calls
- heatmap call type/status? Is this too big?
- district/nature
- time/nature
- time/status
- calls that are 'in service' and 'resolved' , are they related? compare by time? would ID's match?

# Graphs
This section explores the data by generating graphs. Extra features for year, month of the year, week of the year, day of the month, and hour of the day are generated first so that the data can be examined at different levels of granularity. Some of the graphs rely on [this Statology guide](https://www.statology.org/seaborn-legend-outside/) for moving the legend outside of the graph, and on [this Stack Overflow answer](https://stackoverflow.com/a/60679315) for plotting multiple categories (which fixes the legend not showing).

In [None]:
df['year'] = df['datetime'].map(lambda t: t.year)
df['month'] = df['datetime'].map(lambda t: t.month)
df['week'] = df['datetime'].map(lambda t: t.week)
df['day'] = df['datetime'].map(lambda t: t.day)
df['hour'] = df['datetime'].map(lambda t: t.hour)
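As an alternative to mapping a lambda over every row, pandas exposes the same components through the vectorized `.dt` accessor. This is a sketch on two made-up timestamps (the `stamps` series is hypothetical), not a claim about how the notebook must be written; note that `.dt.isocalendar().week` returns an unsigned integer dtype rather than a plain `int64`.

```python
import pandas as pd

# Two hypothetical call timestamps.
stamps = pd.Series(pd.to_datetime(['2017-02-14 08:30:00', '2021-10-03 23:15:00']))

# Derive the same five features with the vectorized .dt accessor.
features = pd.DataFrame({
    'year': stamps.dt.year,
    'month': stamps.dt.month,
    'week': stamps.dt.isocalendar().week,  # ISO 8601 week number
    'day': stamps.dt.day,
    'hour': stamps.dt.hour,
})

print(features)
```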

In [None]:
ax = sns.histplot(data=df, x='datetime')
ax.set_title('All Calls Over Time')

This graph shows all of the call types over time. Interestingly, there appears to be a large gap around 2018 and smaller gaps in 2019 and 2021. There is a second large gap between 2021 and 2022, although it is not as severe as the 2018 gap. Another observation is that overall call numbers appear to be trending down. The following graphs look at individual months in more detail:

In [None]:
sns.set(rc={"figure.figsize":(14, 6)})
plot_data = df[(df['year'] == 2017) & (df['month']==2)]
ax = sns.histplot(data=plot_data, x='datetime', bins=28)  # February 2017 has 28 days
ax.set_title('All Calls Feb 2017')
sns.set(rc={"figure.figsize":(7, 7)})

In [None]:
sns.set(rc={"figure.figsize":(14, 6)})
plot_data = df[(df['year'] == 2017) & (df['month']==12)]
ax = sns.histplot(data=plot_data, x='datetime', bins=31)
ax.set_title('All Calls Dec 2017')
sns.set(rc={"figure.figsize":(7, 7)})

In [None]:
sns.set(rc={"figure.figsize":(14, 6)})
plot_data = df[(df['year'] == 2018) & (df['month']==1)]
ax = sns.histplot(data=plot_data, x='datetime', bins=31)
ax.set_title('All Calls Jan 2018')
sns.set(rc={"figure.figsize":(7, 7)})

In [None]:
sns.set(rc={"figure.figsize":(14, 6)})
plot_data = df[(df['year'] == 2021) & (df['month']==2)]
ax = sns.histplot(data=plot_data, x='datetime', bins=28)  # February 2021 has 28 days
ax.set_title('All Calls Feb 2021')
sns.set(rc={"figure.figsize":(7, 7)})

In [None]:
sns.set(rc={"figure.figsize":(14, 6)})
plot_data = df[(df['year'] == 2021) & (df['month'] == 8)]
ax = sns.histplot(data=plot_data, x='datetime', bins=31)
ax.set_title('All Calls Aug 2021')
sns.set(rc={"figure.figsize":(7, 7)})

In [None]:
sns.set(rc={"figure.figsize":(14, 6)})
plot_data = df[(df['year'] == 2021) & (df['month'] == 9)]
ax = sns.histplot(data=plot_data, x='datetime', bins=30)
ax.set_title('All Calls Sept 2021')
sns.set(rc={"figure.figsize":(7, 7)})

In [None]:
sns.set(rc={"figure.figsize":(14, 6)})
plot_data = df[(df['year'] == 2021) & (df['month'] == 10)]
ax = sns.histplot(data=plot_data, x='datetime', bins=31)
ax.set_title('All Calls Oct 2021')
sns.set(rc={"figure.figsize":(7, 7)})

# TODO
I emailed Nick to see if he can offer insight here. At the moment, I'm guessing the gaps are outage related.
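One way to pin down the gaps more precisely than by reading histograms is to count calls per calendar day and list the days with no calls at all. The sketch below uses a handful of made-up timestamps with a deliberate hole (the `calls` frame is hypothetical); the same two lines could be applied to `df`, assuming its `datetime` column has already been parsed.

```python
import pandas as pd

# Hypothetical call timestamps with a deliberate hole on 3 and 4 January.
calls = pd.DataFrame({'datetime': pd.to_datetime([
    '2018-01-01 02:00', '2018-01-01 17:30', '2018-01-02 09:45',
    '2018-01-05 11:20', '2018-01-06 22:10',
])})

# Count calls per calendar day; days without any rows get a count of 0.
daily_counts = calls.set_index('datetime').resample('D').size()

# Keep only the days on which no calls were recorded.
missing_days = daily_counts[daily_counts == 0].index
missing = missing_days.strftime('%Y-%m-%d').tolist()
print(missing)
```

A list of exact missing dates would also be easier to share when asking whether an outage occurred on those days.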

In [None]:
print(type(df['datetime'][0]))
ax = sns.histplot(data=df, x='week')
ax.set_title('All Calls Over Week of the Year')

This graph compares the number of calls received per week of the year. The number of calls looks consistent except for week 52. This is likely because week 52 is the last week of the year: it usually contains or falls between two holidays, which may account for the lower number of calls.
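Another factor that may affect the counts near the year boundary is that pandas week numbers follow ISO 8601, so they do not line up exactly with calendar years. The first days of January can belong to week 52 or 53 of the previous ISO year, the last days of December can belong to week 1 of the next one, and some years have a week 53. The sketch below illustrates this on three hand-picked dates; the `comparison` frame is purely illustrative.

```python
import pandas as pd

# Three dates that sit on a year boundary.
dates = pd.Series(pd.to_datetime(['2017-01-01', '2018-12-31', '2020-12-31']))

# isocalendar() returns the ISO year, ISO week, and ISO weekday for each date.
iso = dates.dt.isocalendar()

comparison = pd.DataFrame({
    'date': dates,
    'calendar_year': dates.dt.year,
    'iso_year': iso['year'],
    'iso_week': iso['week'],
})

print(comparison)
```

Here 1 January 2017 falls in week 52 of ISO year 2016, 31 December 2018 falls in week 1 of ISO year 2019, and 31 December 2020 falls in week 53.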

In [None]:
plt.figure(figsize=(7,7))
ax = plt.axes()
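# Filter once so that x and hue come from the same rows, then plot by column name
plot_data = df[df['year'] != 2022]
sns.kdeplot(data=plot_data, x='week', hue='year', common_norm=False, multiple="fill", alpha=1, ax=ax)
plt.title("Occurrences of Calls by Year Per Week of Year")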
plt.show()

## TODO: Explain this graph

## Exploring Graphs of Weapon Crime
To do this, a filtered dataframe containing only the weapon crime entries must be created. Filtering a categorical feature with many values (such as nature) does not remove the categories that no longer appear in the data, so these unused categories must be dropped explicitly. This can be done by converting the feature to an object type and then back to a category.

In [None]:
weapon_crime_df = df[df['weapon_crime']].copy(deep=True)
weapon_crime_df['nature'] = weapon_crime_df['nature'].astype('object').astype('category')
weapon_crime_df['nature'].dtype
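pandas also provides `cat.remove_unused_categories()` for this purpose, which avoids the round trip through the object type. The following sketch shows the effect on a small made-up series; the `nature` values here are hypothetical.

```python
import pandas as pd

# Hypothetical categorical feature with three categories.
nature = pd.Series(['SHOTS FIRED', 'SHOTSPOTTER', 'THEFT', 'SHOTSPOTTER'], dtype='category')

# Filtering out the rows does not remove 'THEFT' from the category list.
filtered = nature[nature != 'THEFT']
print(list(filtered.cat.categories))

# Dropping unused categories leaves only the values that still occur.
cleaned = filtered.cat.remove_unused_categories()
print(list(cleaned.cat.categories))
```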

In [None]:
# sns.histplot(data=weapon_crime_df, x="year", hue="nature", legend=True)
# plt.title("Occurrences of Weapon Crimes Per Year")
# plt.show()

In [None]:
sns.kdeplot(x=weapon_crime_df["week"], hue=weapon_crime_df["nature"], common_norm=False, multiple="fill", alpha=1)
plt.title("Occurrences of Weapon Crimes Per Week of Year")
plt.show()

In [None]:
sns.kdeplot(x=weapon_crime_df["day"], hue=weapon_crime_df["nature"], common_norm=False, multiple="fill", alpha=1)
plt.title("Occurrences of Weapon Crimes Per Day of Month")
plt.show()

In [None]:
plt.figure(figsize=(7,7))
ax = plt.axes()
for use in weapon_crime_df['nature'].unique():
    # Select the rows for this nature by comparing the 'nature' column, not the whole dataframe
    sns.kdeplot(x=weapon_crime_df["week"], hue=weapon_crime_df[weapon_crime_df['nature'] == use]["nature"], 
    ax=ax, common_norm=False, multiple="layer", alpha=1, label=use)
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0)
plt.title("Occurrences of Nature Per Week of Year")
plt.show()

In [None]:
# https://www.statology.org/seaborn-legend-outside/ for moving the legend outside
# https://stackoverflow.com/a/60679315 for plotting multiple categories (fix legend not showing)
uses = weapon_crime_df['nature'].unique()
plt.figure(figsize=(7,7))
ax = plt.axes()
for use in uses:
    sns.kdeplot(x=weapon_crime_df["day"], hue=weapon_crime_df[weapon_crime_df['nature']==use]["nature"], 
    ax=ax, common_norm=False, multiple="layer", alpha=1, label=use)
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left', borderaxespad=0)
plt.title("Occurrences of Nature Per Day of Month")
plt.show()

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
# Count calls per nature; value_counts() sorts from most to fewest calls by default
nature_counts = weapon_crime_df['nature'].value_counts()
natures = nature_counts.index.astype(str)
nature_totals = nature_counts.values

sns.barplot(x=natures, y=nature_totals, ax=ax).set(title='Counts of Weapon Crimes')
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right')

## Comparing Shots Fired vs Shotspotter
This section addresses the following hypotheses:
- There are significantly more shots spotted than calls about shots fired.
- Significantly more shots fired calls are unable to be located than shots spotted.
- There will be significantly more shots spotted calls on July 4th, Dec. 31st, and Jan 1st than normal days.
- There will be significantly fewer shots fired calls on holidays than normal days.
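As a rough illustration of how the holiday hypotheses could be tested, the sketch below flags the three named dates and compares daily call counts on those dates against all other days with a one-sided Mann-Whitney U test. Everything here is an assumption made for illustration: the daily counts are synthetic Poisson draws rather than MPD data, and the Mann-Whitney U test is only one possible choice of test.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic daily call counts for one year (NOT real MPD data).
rng = np.random.default_rng(0)
days = pd.date_range('2019-01-01', '2019-12-31', freq='D')
daily_counts = pd.Series(rng.poisson(20, size=len(days)), index=days)

# Flag the three dates named in the hypothesis: July 4th, Dec. 31st, and Jan. 1st.
holiday_pairs = {(7, 4), (12, 31), (1, 1)}
is_holiday = np.array([(d.month, d.day) in holiday_pairs for d in days])

holiday_counts = daily_counts[is_holiday]
normal_counts = daily_counts[~is_holiday]

# One-sided test: are counts on the flagged dates greater than on normal days?
result = stats.mannwhitneyu(holiday_counts, normal_counts, alternative='greater')
print(result.pvalue)
```

With only three flagged days per year, a test like this has little power on a single year, so pooling all available years would likely be needed.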

In [None]:
shots = df[df['nature'] == 'SHOTS FIRED']
shotspotter_df = df[df['nature'] == 'SHOTSPOTTER']

In [None]:
weapon_crime_df.head()

In [None]:
fig, ax = plt.subplots(figsize=(7, 7))
natures = ['SHOTSPOTTER', 'SHOTS FIRED']
# Count calls for each nature, then sort from most to fewest calls
plot_data = [(nature, len(weapon_crime_df[weapon_crime_df['nature'] == nature])) for nature in natures]
plot_data.sort(reverse=True, key=lambda pair: pair[1])

natures = [nature for nature, _ in plot_data]
nature_totals = [total for _, total in plot_data]

sns.barplot(x=natures, y=nature_totals, ax=ax).set(title='Total ShotSpotter vs Shots Fired Calls')
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment='right')

In [None]:
sns.set(rc={"figure.figsize":(12, 6)})
years = [2016, 2017, 2018, 2019, 2020, 2021, 2022]
natures = ['SHOTSPOTTER', 'SHOTS FIRED']

def count_calls(nature, year):
    """Return the number of weapon crime calls of the given nature in the given year."""
    return len(weapon_crime_df[(weapon_crime_df['nature'] == nature) & (weapon_crime_df['year'] == year)])

# Build one total per year for each nature, in the same order as `years`
shotspotter_totals = [count_calls('SHOTSPOTTER', year) for year in years]
shots_fired_totals = [count_calls('SHOTS FIRED', year) for year in years]

X_axis = np.arange(len(years))

plt.bar(X_axis - 0.2, shotspotter_totals, 0.4, label = 'SHOTSPOTTER')
plt.bar(X_axis + 0.2, shots_fired_totals, 0.4, label = 'SHOTS FIRED')
  
plt.xticks(X_axis, years)
plt.xlabel("Year")
plt.ylabel("Number of Calls")
plt.title("Number of Shotspotter vs Shots Fired calls by Year")
plt.legend()
plt.show()
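The per-year totals used for this chart can also be produced in a single step with `pd.crosstab`, which counts every combination of two features. This is a sketch on a few made-up rows (the `calls` frame is hypothetical), shown as an alternative to counting each nature and year by hand.

```python
import pandas as pd

# Hypothetical calls with only the two features needed for the counts.
calls = pd.DataFrame({
    'nature': ['SHOTSPOTTER', 'SHOTS FIRED', 'SHOTSPOTTER', 'SHOTSPOTTER', 'SHOTS FIRED'],
    'year': [2016, 2016, 2016, 2017, 2017],
})

# Rows are years, columns are natures, and each cell is the number of calls.
counts = pd.crosstab(calls['year'], calls['nature'])
print(counts)
```

The resulting table has one row per year and one column per nature, so `counts.plot(kind='bar')` would draw a grouped bar chart of the same shape as the one above.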