# **Crime Analytics - Visualising Incident Reports**

In this notebook, we analyse San Francisco criminal incident data from the police department incident reports published on San Francisco's open data portal: [https://data.sfgov.org](https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-2018-to-Present/wg3w-h783). Note that whilst the original instructions were to use incident data from the summer of 2014, that data is now roughly a decade old, so the dataset used here instead covers a full year of incidents, from 1 June 2022 to 31 May 2023.

The raw data contains 136,677 rows across 35 columns:

| Field Name | Data Type |
| ----------- | --------- |
| Incident Datetime | Object |
| Incident Date | Object |
| Incident Time | Object |
| Incident Year | Integer |
| Incident Day of Week | Object |
| Report Datetime | Object |
| Row ID | Float |
| Incident ID | Integer |
| Incident Number | Integer |
| CAD Number | Object |
| Report Type Code | Object |
| Report Type Description | Object |
| Filed Online | Object |
| Incident Code | Integer |
| Incident Category | Object |
| Incident Subcategory | Object |
| Incident Description | Object |
| Resolution | Object |
| Intersection | Object |
| CNN | Float |
| Police District | Object |
| Analysis Neighborhood | Object |
| Supervisor District | Float |
| Supervisor District 2012 | Float |
| Latitude | Float |
| Longitude | Float |
| Point | Object |
| Neighborhoods | Float |
| ESNCAG - Boundary File | Float |
| Central Market/Tenderloin Boundary Polygon - Updated | Float |
| Civic Center Harm Reduction Project Boundary | Float |
| HSOC Zones as of 2018-06-05 | Float |
| Invest In Neighborhoods (IIN) Areas | Float |
| Current Supervisor Districts | Float |
| Current Police Districts | Float |

For field descriptions see the [Field Definitions](https://datasf.gitbook.io/datasf-dataset-explainers/sfpd-incident-report-2018-to-present#field-definitions).

In [61]:
# import all necessary libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly
import plotly.express as px
import seaborn as sns

In [20]:
incidents = pd.read_csv('sf_incidents_2223.csv')
# clean up the column names
incidents.columns = incidents.columns.str.lower().str.replace(' ', '_')
# select the 'useful' columns for the analysis
extra_cols = ['resolution', 'police_district', 'analysis_neighborhood',
              'supervisor_district', 'latitude', 'longitude', 'point']
useful_cols = [col for col in incidents.columns
               if col.startswith('incident') or col in extra_cols]
incidents = incidents.loc[:, useful_cols]
# convert incident_datetime and incident_date to datetime columns
incidents['incident_datetime'] = pd.to_datetime(incidents['incident_datetime'])
incidents['incident_date'] = pd.to_datetime(incidents['incident_date'])
incidents.info()

# add hour and month information
incidents['incident_hour'] = incidents.incident_datetime.dt.hour
incidents['incident_month_num'] = incidents.incident_datetime.dt.month
incidents['incident_month'] = incidents.incident_datetime.dt.month_name()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 136677 entries, 0 to 136676
Data columns (total 18 columns):
 #   Column                 Non-Null Count   Dtype         
---  ------                 --------------   -----         
 0   incident_datetime      136677 non-null  datetime64[ns]
 1   incident_date          136677 non-null  datetime64[ns]
 2   incident_time          136677 non-null  object        
 3   incident_year          136677 non-null  int64         
 4   incident_day_of_week   136677 non-null  object        
 5   incident_id            136677 non-null  int64         
 6   incident_number        136677 non-null  int64         
 7   incident_code          136677 non-null  int64         
 8   incident_category      136545 non-null  object        
 9   incident_subcategory   136545 non-null  object        
 10  incident_description   136677 non-null  object        
 11  resolution             136677 non-null  object        
 12  police_district        136677 non-null  obje

In [21]:
incidents.iloc[:5, :12]

| | incident_datetime | incident_date | incident_time | incident_year | incident_day_of_week | incident_id | incident_number | incident_code | incident_category | incident_subcategory | incident_description | resolution |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2023-05-31 23:49:00 | 2023-05-31 | 23:49 | 2023 | Wednesday | 1281250 | 230377142 | 19057 | Disorderly Conduct | Intimidation | Terrorist Threats | Open or Active |
| 1 | 2023-05-31 23:45:00 | 2023-05-31 | 23:45 | 2023 | Wednesday | 1281464 | 230377948 | 75000 | Missing Person | Missing Person | Found Person | Open or Active |
| 2 | 2023-05-31 23:45:00 | 2023-05-31 | 23:45 | 2023 | Wednesday | 1281464 | 230377948 | 74000 | Missing Person | Missing Adult | Missing Adult | Open or Active |
| 3 | 2023-05-31 23:44:00 | 2023-05-31 | 23:44 | 2023 | Wednesday | 1281275 | 230377136 | 16210 | Drug Offense | Drug Violation | Opiates Offense | Cite or Arrest Adult |
| 4 | 2023-05-31 23:44:00 | 2023-05-31 | 23:44 | 2023 | Wednesday | 1281275 | 230377136 | 62050 | Warrant | Warrant | Warrant Arrest, Enroute To Outside Jurisdiction | Cite or Arrest Adult |


In [22]:
incidents.iloc[:5, 12:]

| | police_district | analysis_neighborhood | supervisor_district | latitude | longitude | point | incident_hour | incident_month_num | incident_month |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Tenderloin | Tenderloin | 5.0 | 37.78579 | -122.41297 | POINT (-122.41296966814406 37.78578958358186) | 23 | 5 | May |
| 1 | Taraval | Lakeshore | 7.0 | 37.719056 | -122.481424 | POINT (-122.48142396348878 37.71905643663192) | 23 | 5 | May |
| 2 | Taraval | Lakeshore | 7.0 | 37.719056 | -122.481424 | POINT (-122.48142396348878 37.71905643663192) | 23 | 5 | May |
| 3 | Northern | Tenderloin | 5.0 | 37.783101 | -122.419182 | POINT (-122.41918170505187 37.78310139923345) | 23 | 5 | May |
| 4 | Northern | Tenderloin | 5.0 | 37.783101 | -122.419182 | POINT (-122.41918170505187 37.78310139923345) | 23 | 5 | May |


Some incident ids/numbers have more than one record attached. The table and plot below show that most incidents have a single record, but about 14.5% (16,617 of 114,797 incident ids) have between two and four records. The documentation indicates that a single incident report can have one or more incident codes attached; for example, if a police officer discovers narcotics whilst making an arrest for a non-narcotics related issue, then both an arrest and a narcotics incident code would be reported against the same report id. Unless otherwise specified, the analysis will be at the incident code level rather than the report level.
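As a minimal sketch of the difference between the two levels, the five preview rows shown earlier can be counted both ways. The names `preview`, `code_level`, `report_level` and `first_per_report` are illustrative only and are not used elsewhere in the notebook; keeping the first code per report is an arbitrary assumption.

```python
import pandas as pd

# The five preview rows shown earlier, reduced to three columns.
preview = pd.DataFrame({
    'incident_id': [1281250, 1281464, 1281464, 1281275, 1281275],
    'incident_code': [19057, 75000, 74000, 16210, 62050],
    'incident_category': ['Disorderly Conduct', 'Missing Person', 'Missing Person',
                          'Drug Offense', 'Warrant'],
})

# Incident-code level: every row counts once.
code_level = len(preview)

# Report level: each incident_id counts once, however many codes it carries.
report_level = preview['incident_id'].nunique()

# One row per report, keeping the first incident code listed (an arbitrary choice).
first_per_report = preview.drop_duplicates(subset='incident_id', keep='first')

print(code_level, report_level)  # prints "5 3"
```

Counting at the incident code level therefore gives more weight to reports that carry several codes, which is worth keeping in mind when reading the category proportions.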

In [23]:
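# number of records attached to each incident id
records_per_id = incidents.groupby('incident_id').size()
# number of incident ids that have 1, 2, 3, ... records
counts = (records_per_id.value_counts().sort_index()
          .rename_axis('id_count').reset_index(name='count'))
# incident ids in each group as a percentage of all records
counts['prop'] = counts['count'] / incidents.shape[0] * 100
counts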

| | id_count | count | prop |
| --- | --- | --- | --- |
| 0 | 1 | 98180 | 71.833593 |
| 1 | 2 | 11373 | 8.321078 |
| 2 | 3 | 5225 | 3.822882 |
| 3 | 4 | 19 | 0.013901 |
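Note that `prop` in the table above divides the number of incident ids in each group by the total number of records, so the column does not sum to 100. As a rough sketch, the same counts can be turned into two shares that do sum to 100: the share of incident ids and the share of records. The counts are copied from the table; the column names `records`, `id_share` and `record_share` are illustrative assumptions.

```python
import pandas as pd

# Counts copied from the table above: number of incident ids with 1-4 records.
counts = pd.DataFrame({'id_count': [1, 2, 3, 4],
                       'count': [98180, 11373, 5225, 19]})

# Each id with k records contributes k rows to the dataset.
counts['records'] = counts['id_count'] * counts['count']

total_ids = counts['count'].sum()
total_records = counts['records'].sum()

# Share of incident ids, and share of records, in each group (both sum to 100).
counts['id_share'] = counts['count'] / total_ids * 100
counts['record_share'] = counts['records'] / total_records * 100

# Share of incident ids that have more than one record.
multi_share = counts.loc[counts['id_count'] > 1, 'id_share'].sum()
print(total_ids, total_records, round(multi_share, 1))  # prints "114797 136677 14.5"
```

The total number of records recovered this way matches the 136,677 rows in the dataset.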


In [119]:
import plotly.io as pio
pio.renderers
# pio.renderers.default = "iframe"

Renderers configuration
-----------------------
    Default renderer: 'plotly_mimetype+notebook'
    Available renderers:
        ['plotly_mimetype', 'jupyterlab', 'nteract', 'vscode',
         'notebook', 'notebook_connected', 'kaggle', 'azure', 'colab',
         'cocalc', 'databricks', 'json', 'png', 'jpeg', 'jpg', 'svg',
         'pdf', 'browser', 'firefox', 'chrome', 'chromium', 'iframe',
         'iframe_connected', 'sphinx_gallery', 'sphinx_gallery_png']

In [133]:
fig = px.bar(data_frame=counts, x="id_count", y="prop",
             hover_data={'prop':':.2f',
                         'count':':,'},
             labels={'prop':'Percentage (%)', 'id_count':'Occurrence'},
             title='Proportion of records by incident id occurrence',
             template='simple_white')
fig.show(renderer='iframe')

In [87]:
# clean up the incident categories
category_fixes = {
    'Other Miscellaneous': 'Other',
    'Motor Vehicle Theft?': 'Motor Vehicle Theft',
    'Human Trafficking (A), Commercial Sex Acts': 'Human Trafficking, Commercial Sex Acts',
    'Weapons Offence': 'Weapons Offense',
    'Suspicious': 'Suspicious Occ',
}
incidents['incident_category_new'] = incidents['incident_category'].replace(category_fixes)

In [86]:
sub = incidents.loc[incidents.incident_category == 'Burglary',
                    ['incident_category', 'incident_subcategory']]
sub.incident_subcategory.value_counts()

Burglary - Other          2816
Burglary - Residential    2547
Burglary - Commercial      880
Burglary - Hot Prowl       814
Name: incident_subcategory, dtype: int64

The plot below shows that the most common incident category reported in the twelve months between 1 June 2022 and 31 May 2023 was larceny/theft (30.9%). The second most common category was 'Other' (6.9%), with malicious mischief (6.7%), assault (6.4%) and motor vehicle theft (6.3%) rounding out the top five.

At the opposite end of the spectrum, the least common incidents included civil sidewalks, gambling, human trafficking/commercial sex acts, rape and homicide.
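One detail worth noting when computing these percentages: `incident_category` is missing for 132 of the 136,677 rows, so the denominator matters. The sketch below uses made-up toy data (not the real dataset) to contrast dividing by all rows with dividing by labelled rows only, which is what `value_counts(normalize=True)` does by default.

```python
import pandas as pd

# Toy data (hypothetical): ten rows, one of which has no category.
toy = pd.Series(['Larceny Theft'] * 5 + ['Assault'] * 3 + ['Arson'] + [None])

# Share of all rows, including rows with a missing category.
share_of_rows = toy.value_counts() / len(toy) * 100

# Share of labelled rows only; missing values are dropped by default.
share_of_labelled = toy.value_counts(normalize=True) * 100

print(share_of_rows['Larceny Theft'])                 # 50.0
print(round(share_of_labelled['Larceny Theft'], 2))   # 55.56
```

With only 132 missing values the two denominators differ by about 0.1% in this dataset, so the choice has little effect on the figures quoted here.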

In [88]:
by_cat = pd.DataFrame(incidents.incident_category_new.value_counts())
by_cat.reset_index(inplace=True)
by_cat.columns = ['category', 'count']
by_cat['prop'] = by_cat['count']/incidents.shape[0]*100
by_cat.sort_values(by='prop', inplace=True)
# by_cat

| | category | count | prop |
| --- | --- | --- | --- |
| 42 | Civil Sidewalks | 2 | 0.001463 |
| 41 | Gambling | 7 | 0.005122 |
| 40 | Human Trafficking, Commercial Sex Acts | 10 | 0.007317 |
| 39 | Rape | 30 | 0.02195 |
| 38 | Homicide | 37 | 0.027071 |
| 37 | Liquor Laws | 41 | 0.029998 |
| 36 | Prostitution | 47 | 0.034388 |
| 35 | Drug Violation | 58 | 0.042436 |
| 34 | Vehicle Misplaced | 59 | 0.043167 |
| 33 | Suicide | 79 | 0.057801 |


In [112]:
# let's view the distribution by category
fig = px.bar(data_frame=by_cat[-20:], x='prop', y='category',
             hover_data={'prop':':.2f',
                         'count':':,'},
             labels={'prop':'Percentage (%)', 'category':'Category'},
             title='Proportion of top 20 incidents by category',
             template='simple_white')
fig

In [96]:
# let's see if there is any pattern in the number of incidents by month of the year
by_month_cat = pd.DataFrame(incidents.groupby(['incident_month', 'incident_month_num', 'incident_category_new'])['incident_id'].count())
by_month_cat.reset_index(inplace=True)
by_month_cat.columns = ['month', 'month_num', 'category', 'count']
by_month_cat['month_sum'] = by_month_cat.groupby('month')['count'].transform('sum')
by_month_cat['total_prop'] = by_month_cat['count']/incidents.shape[0]*100
by_month_cat['month_prop'] = by_month_cat['count']/by_month_cat.month_sum*100
# by_month_cat

| | month | month_num | category | count | month_sum | total_prop | month_prop |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | April | 4 | Arson | 25 | 10983 | 0.018291 | 0.227625 |
| 1 | April | 4 | Assault | 683 | 10983 | 0.499718 | 6.218702 |
| 2 | April | 4 | Burglary | 550 | 10983 | 0.402409 | 5.007739 |
| 3 | April | 4 | Case Closure | 45 | 10983 | 0.032924 | 0.409724 |
| 4 | April | 4 | Courtesy Report | 47 | 10983 | 0.034388 | 0.427934 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 478 | September | 9 | Vehicle Impounded | 6 | 12308 | 0.004390 | 0.048749 |
| 479 | September | 9 | Vehicle Misplaced | 5 | 12308 | 0.003658 | 0.040624 |
| 480 | September | 9 | Warrant | 273 | 12308 | 0.199741 | 2.218070 |
| 481 | September | 9 | Weapons Carrying Etc | 69 | 12308 | 0.050484 | 0.560611 |
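Grouping by month name sorts the table alphabetically (April first), and sorting by `month_num` gives January to December, even though the data runs from June 2022 to May 2023. A possible way to recover the chronological order, sketched here on toy dates, is to sort on a year-month period; `toy`, `year_month` and `month_order` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Toy dates (hypothetical) spanning the June 2022 to May 2023 window.
toy = pd.DataFrame({'incident_datetime': pd.to_datetime(
    ['2023-01-15', '2022-06-01', '2022-12-31', '2023-05-31'])})

toy['incident_month'] = toy['incident_datetime'].dt.month_name()
# A monthly period keeps the year, so June 2022 sorts before January 2023.
toy['year_month'] = toy['incident_datetime'].dt.to_period('M')

# Month names in chronological order, without duplicates.
month_order = (toy.sort_values('year_month')['incident_month']
               .drop_duplicates().tolist())
print(month_order)  # ['June', 'December', 'January', 'May']
```

A list like `month_order` could then be passed to Plotly Express through its `category_orders` argument, for example `category_orders={'month': month_order}`, so that the bars run from June to May.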


In [113]:
plot_data = by_month_cat.sort_values(by=['month_num', 'month_prop'], ascending=[True, False])
fig = px.bar(plot_data, x='month', y='month_prop', color='category',
             hover_data={'month_prop':':.2f',
                         'count':':,'},
             labels={'month':'Month', 'month_prop':'Percentage (%)', 'category':'Category'},
             title='Proportion of incidents by category and calendar month',
             template='simple_white')
# fig.update_layout(width=700, height=700)
fig

In [114]:
fig = px.bar(plot_data, x='month', y='month_prop', color='category',
             hover_data={'month_prop':':.2f',
                         'count':':,'},
             labels={'month':'Month', 'month_prop':'Percentage (%)', 'category':'Category'},
             title='Proportion of incidents by category and calendar month',
             template='simple_white')
fig.update_layout(showlegend=False)
fig