In [1]:
%load_ext pretty_jupyter

In [2]:
%%html

<style>
    #Styling {
        font-weight: bold;
        font-family: Helvetica;
    }
</style>

# Goal

- What we have in dev-pre-restructure is working fine, but we're on a short deadline and need to streamline the data. Let's make a barebones table of the core fields we need for analysis:
    - `event_no`
    - `date_occurred` (w/ `year_occurred`)
    - `source_type` (a field derived from the initial event type in the data: either a 911 call reporting gunfire or a ShotSpotter GDT alert)
    - `event_location` (service address if specified)
    - `date_dispatched` (is this the same as date of arrival?)
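A minimal sketch of the barebones table, using hypothetical toy rows rather than the real merged subset. The column names follow the list above and are assumptions about the final schema; `year_occurred` is simply derived from `date_occurred`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the merged data (not real events).
raw = pd.DataFrame({
    'event_no': [1, 2],
    'date_occurred': pd.to_datetime(['2022-06-04 02:49:00', '2023-01-15 21:10:00']),
    'date_dispatched': pd.to_datetime(['2022-06-04 02:49:00', None]),  # None -> NaT
    'source_type': ['ShotSpotter alert', 'Human reporting gunfire'],
    'event_location': ['92XX  S STONY ISLAND AV', None],
    'unused_field': ['a', 'b'],  # anything outside the core list gets dropped
})

# Core fields from the goal list (assumed names).
core_cols = ['event_no', 'date_occurred', 'year_occurred', 'source_type',
             'event_location', 'date_dispatched']

# Derive year_occurred from date_occurred, then keep only the core columns.
barebones = raw.assign(year_occurred=raw['date_occurred'].dt.year)[core_cols]
print(barebones)
```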

# Research questions we're working towards

1. Is dispatch reported at the same rate for all districts?
2. RE: the SoundThinking / Brookings Institution claim that some 80% of gunfire events are not reported by citizens: is that true in Chicago?
    * When SST alerts aren't matched to 911 calls, what is the typical disposition of such an alert?
    * When 911 calls aren't matched to an SST alert, what is the typical disposition?
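Question 2 depends on pairing SST alerts with 911 calls. The dependencies cell imports `timedelta` and geopy's `geodesic`, which hints at a time-plus-distance match, but the rule below is only a sketch under assumptions: the 10-minute and 500 m thresholds, the greedy "first call that fits" choice, and the field names `lat`/`lon` are all hypothetical. A plain haversine distance is swapped in for `geodesic` so the example has no extra dependencies. Note that in the sample record further down, `location_x` holds the latitude (41.726) and `location_y` the longitude (-87.584).

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6_371_000  # mean Earth radius in meters
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def match_events(alerts, calls, max_gap=timedelta(minutes=10), max_dist_m=500):
    """Map each alert's event_no to the first call within both (hypothetical)
    thresholds, or to None when the alert stays unmatched."""
    matches = {}
    for alert in alerts:
        matches[alert['event_no']] = None
        for call in calls:
            close_in_time = abs(call['date_occurred'] - alert['date_occurred']) <= max_gap
            close_in_space = haversine_m(alert['lat'], alert['lon'],
                                         call['lat'], call['lon']) <= max_dist_m
            if close_in_time and close_in_space:
                matches[alert['event_no']] = call['event_no']
                break
    return matches


# Toy records (hypothetical IDs and times).
alerts = [
    {'event_no': 'A1', 'date_occurred': datetime(2022, 6, 4, 2, 49), 'lat': 41.726, 'lon': -87.584},
    {'event_no': 'A2', 'date_occurred': datetime(2022, 6, 4, 5, 0), 'lat': 41.800, 'lon': -87.600},
]
calls = [
    {'event_no': 'C1', 'date_occurred': datetime(2022, 6, 4, 2, 52), 'lat': 41.727, 'lon': -87.585},
]
result = match_events(alerts, calls)
print(result)
```

Unmatched alerts (value `None`) would then be the group whose dispositions the sub-question asks about. The nested loop is quadratic, so on the full data a time-sorted or windowed join would be needed.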

# Time period covered

- Earliest date occurred included: '2021-01-01'
- Last date occurred included: '2024-11-05' (when the SST technology was reportedly deactivated; the CPD and OEMC_sst data only extend to this date)
    - Does it make sense to extend to the end of 2024 anyway? The Dec 2024 911 calls won't have an SST match and could be ignored or handled differently depending on how we approach the analysis.
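A small sketch of applying this window to `date_occurred`, comparing calendar dates so that events late on the last day are kept. Extending to the end of 2024 would only mean changing `END`; the constant names and the helper are hypothetical.

```python
import pandas as pd

# Study window from the notes above; both ends are inclusive calendar dates.
START = pd.Timestamp('2021-01-01')
END = pd.Timestamp('2024-11-05')  # change to '2024-12-31' to extend through 2024


def in_study_window(dates: pd.Series) -> pd.Series:
    """True where the occurrence date (time of day ignored) is in [START, END]."""
    # normalize() truncates to midnight; between() is inclusive on both ends by default.
    return dates.dt.normalize().between(START, END)


dates = pd.Series(pd.to_datetime([
    '2020-12-31 23:59', '2021-01-01 00:00', '2024-11-05 23:30', '2024-11-06 00:01',
]))
print(in_study_window(dates).tolist())
```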

# Identifying event type (based on the `init_type` field)

```
shotspotter = ['SST', 'PSST', 'MSST'] # keywords provided by CPD in Info sheet
citizencalls = ['SHOTS', 'SHOTSF', 'PERSHO',] # Note: 'PERGUN', 'PERDOW','PERHLP', 'DOMBAT', etc. excluded
```
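One way to turn the two keyword lists into the `source_type` labels that appear in the tables below ('ShotSpotter alert' and 'Human reporting gunfire'). Whether the merge step derives `source_type` exactly like this is an assumption; this is only a sketch with toy codes.

```python
import pandas as pd

shotspotter = ['SST', 'PSST', 'MSST']
citizencalls = ['SHOTS', 'SHOTSF', 'PERSHO']

# Lookup from init_type code to source label.
code_to_source = {code: 'ShotSpotter alert' for code in shotspotter}
code_to_source.update({code: 'Human reporting gunfire' for code in citizencalls})

init_type = pd.Series(['SST', 'SHOTSF', 'PERGUN', 'MSST'])  # toy codes
source_type = init_type.map(code_to_source)  # codes in neither list -> NaN
keep = init_type.isin(shotspotter + citizencalls)  # e.g. PERGUN is excluded
print(pd.DataFrame({'init_type': init_type, 'source_type': source_type, 'keep': keep}))
```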

# Note about what humans can do that SST can't

In the OEMC data, in addition to reports originating as calls about shots fired ('SHOTSF') or persons being shot ('PERSHO'), we also see events with an initial label of 'PERGUN' and a final label referring to shots fired or someone shot. This suggests that Chicagoans not only report gunfire in general but may also report early warning signs of conflict involving firearms before any shots occur, giving first responders a head start to arrive on scene and provide potentially life-saving care.
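The data carry an `early_warning` column, but this notebook does not show how it is built. A minimal sketch of the pattern described above, assuming the final-type codes for "shots fired or someone shot" are the same codes as in `citizencalls`:

```python
import pandas as pd

# Hypothetical rule: opened as 'PERGUN', closed out as shots fired / person shot.
EARLY_INIT = {'PERGUN'}
SHOTS_FINAL = {'SHOTS', 'SHOTSF', 'PERSHO'}  # assumed final-type codes

events = pd.DataFrame({  # toy events
    'event_no': [1, 2, 3],
    'init_type': ['PERGUN', 'PERGUN', 'SHOTSF'],
    'fin_type': ['SHOTSF', 'PERGUN', 'SHOTSF'],
})
events['early_warning'] = (events['init_type'].isin(EARLY_INIT)
                           & events['fin_type'].isin(SHOTS_FINAL))
print(events)
```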

In [3]:
# dependencies
import re
import numpy as np
from datetime import timedelta
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.distance import geodesic

In [4]:
# support methods
def format_count(v):
    """Format an integer with thousands separators, e.g. 394199 -> '394,199'."""
    return "{:,}".format(v)


def format_prop(prop, decn=1, asperc=True):
    """Format a proportion as a rounded percent string, e.g. 0.602 -> '60.2%'."""
    if asperc:
        prop = prop * 100
    return "{}%".format(round(prop, decn))


def report_fields(df, idcol, cols, fillna=False, headn=10):
    """Frequency table (count and percent) of the value combinations in `cols`,
    counted once per unique `idcol` record. Missing values are dropped unless
    `fillna=True`, in which case they are counted as 'None reported'."""
    data = df[[idcol] + cols].drop_duplicates()
    values = data[cols].fillna('None reported') if fillna else data[cols]
    count = values.value_counts().to_frame().reset_index()
    perc = values.value_counts(normalize=True).to_frame().reset_index(
        ).rename(columns={'proportion': 'percent'})
    count['count'] = count['count'].apply(format_count)
    perc.percent = perc.percent.apply(format_prop)
    out = pd.merge(count, perc, on=cols)
    return out.head(headn)
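The `fillna` switch in `report_fields` changes the denominator of the `percent` column: with the default `fillna=False`, rows with a missing value are dropped before `value_counts(normalize=True)`, so percents are shares of records that have a value. That is why the disposition table further down reads 70.9% for 87,415 events (out of the 123,216 with a disposition, not all 394,199). A toy illustration with plain pandas:

```python
import pandas as pd

# Toy disposition column: two of the five events have no disposition.
df = pd.DataFrame({'disposition': ['MISC.INC./OTH POLICE SER', 'MISC.INC./OTH POLICE SER',
                                   'WEAP VIO/DISC OF FIREA', None, None]})

# fillna=False path: missing values are dropped, so the denominator is 3.
with_dropped = df.value_counts(normalize=True)

# fillna=True path: missing values become 'None reported', so the denominator is 5.
with_filled = df.fillna('None reported').value_counts(normalize=True)

print(with_dropped)
print(with_filled)
```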

In [5]:
# main
colorder = [
    'event_no',
    'date_occurred',
    'date_dispatched',
    'location',
    'location_x',
    'location_y',
    'init_type',
    'init_type_desc',
    'fin_type',
    'fin_type_desc',
    'disposition',
    'source_type',
    'early_warning',
]
# load the merged subset, align column names with this notebook, keep only colorder
data = pd.read_parquet("../../merge/output/subset.parquet").rename(columns={
    'event_type': 'source_type',
    'event_type_init': 'init_type_desc',
    'event_type_fin': 'fin_type_desc',})[colorder]
# one row per emergency event
assert data.event_no.nunique() == data.shape[0]
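For research question 1 and the open question about `date_dispatched`, a sketch of per-group dispatch reporting rates and occurrence-to-dispatch lag. `data` has no district column in `colorder`, so `district` here is a hypothetical field on toy rows. In the sample record below, `date_dispatched` equals `date_occurred` to the minute, so a zero lag may reflect how the times are recorded rather than actual response time.

```python
import pandas as pd

# Toy events; 'district' is a hypothetical column, not part of colorder.
events = pd.DataFrame({
    'district': ['A', 'A', 'B', 'B'],
    'date_occurred': pd.to_datetime(['2022-06-04 02:49', '2022-06-04 03:00',
                                     '2022-06-05 10:00', '2022-06-05 11:00']),
    'date_dispatched': pd.to_datetime(['2022-06-04 02:49', None,
                                       '2022-06-05 10:05', '2022-06-05 11:02']),
})

# Minutes from occurrence to dispatch (NaN where no dispatch time was recorded).
events['dispatch_lag_min'] = (
    events['date_dispatched'] - events['date_occurred']).dt.total_seconds() / 60

summary = events.groupby('district').agg(
    n_events=('date_occurred', 'size'),
    dispatch_reported=('date_dispatched', lambda s: s.notna().mean()),  # share with a dispatch time
    median_lag_min=('dispatch_lag_min', 'median'),  # NaN lags are skipped
)
print(summary)
```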

# Review data

In [6]:
%%jmd

<details open>
<summary>Sample emergency event record.</summary>

{{ data.sample().T.to_html() }}

</details>


<details open>
<summary>Sample emergency event record.</summary>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>230684</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>event_no</th>
      <td>2215502048</td>
    </tr>
    <tr>
      <th>date_occurred</th>
      <td>2022-06-04 02:49:00</td>
    </tr>
    <tr>
      <th>date_dispatched</th>
      <td>2022-06-04 02:49:00</td>
    </tr>
    <tr>
      <th>location</th>
      <td>92XX  S STONY ISLAND AV</td>
    </tr>
    <tr>
      <th>location_x</th>
      <td>41.726</td>
    </tr>
    <tr>
      <th>location_y</th>
      <td>-87.584</td>
    </tr>
    <tr>
      <th>init_type</th>
      <td>SST</td>
    </tr>
    <tr>
      <th>init_type_desc</th>
      <td>SHOT SPOTTER</td>
    </tr>
    <tr>
      <th>fin_type</th>
      <td>SST</td>
    </tr>
    <tr>
      <th>fin_type_desc</th>
      <td>SHOT SPOTTER</td>
    </tr>
    <tr>
      <th>disposition</th>
      <td>MISC.INC./NO PERSON FND.</td>
    </tr>
    <tr>
      <th>source_type</th>
      <td>ShotSpotter alert</td>
    </tr>
    <tr>
      <th>early_warning</th>
      <td>False</td>
    </tr>
  </tbody>
</table>

</details>

# Counts

In [7]:
%%jmd

### Overall

- There are {{format_count(data.shape[0])}} gunfire-related 911 calls and ShotSpotter alerts prepared for this analysis.
- The data cover a time period between {{ data.date_occurred.dt.date.min() }} and {{ data.date_occurred.dt.date.max() }}.

---


### Overall

- There are 394,199 gunfire-related 911 calls and ShotSpotter alerts prepared for this analysis.
- The data cover a time period between 2021-01-01 and 2024-11-04.

---

In [8]:
%%jmd

### Source type

- Frequency table:

{{ report_fields(df=data, idcol='event_no', cols=['source_type',]).to_html() }}


- Summary: Of the {{format_count(data.shape[0])}} emergency events included in the analysis,
    - {{format_count((data.source_type == 'Human reporting gunfire').sum()) }} or {{
        format_prop((data.source_type == 'Human reporting gunfire').sum()/data.shape[0]) }}
        were generated by a 911 call, and
    - {{format_count((data.source_type == 'ShotSpotter alert').sum())}} or {{
        format_prop((data.source_type == 'ShotSpotter alert').sum()/data.shape[0])}}
        were generated by a ShotSpotter alert.

---

#### Initial event types (`init_type`) by source
Presented are:
- the initial event type as reported by OEMC and CPD (`init_type`),
- the description of the initial type as found in the data (`init_type_desc`), and
- the type of source which reported the event (`source_type`).

{{ report_fields(df=data, idcol='event_no', cols=['init_type', 'init_type_desc', 'source_type'], headn=10).to_html() }}

---


### Source type

- Frequency table:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>source_type</th>
      <th>count</th>
      <th>percent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>Human reporting gunfire</td>
      <td>237,302</td>
      <td>60.2%</td>
    </tr>
    <tr>
      <th>1</th>
      <td>ShotSpotter alert</td>
      <td>156,897</td>
      <td>39.8%</td>
    </tr>
  </tbody>
</table>


- Summary: Of the 394,199 emergency events included in the analysis,
    - 237,302 or 60.2%
        were generated by a 911 call, and
    - 156,897 or 39.8%
        were generated by a ShotSpotter alert.

---

#### Initial event types (`init_type`) by source
Presented are:
- the initial event type as reported by OEMC and CPD (`init_type`),
- the description of the initial type as found in the data (`init_type_desc`), and
- the type of source which reported the event (`source_type`).

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>init_type</th>
      <th>init_type_desc</th>
      <th>source_type</th>
      <th>count</th>
      <th>percent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>SHOTSF</td>
      <td>SHOTS FIRED</td>
      <td>Human reporting gunfire</td>
      <td>191,216</td>
      <td>48.5%</td>
    </tr>
    <tr>
      <th>1</th>
      <td>SST</td>
      <td>SHOT SPOTTER</td>
      <td>ShotSpotter alert</td>
      <td>113,791</td>
      <td>28.9%</td>
    </tr>
    <tr>
      <th>2</th>
      <td>MSST</td>
      <td>Multiple Shot - ShotSpotter</td>
      <td>ShotSpotter alert</td>
      <td>38,855</td>
      <td>9.9%</td>
    </tr>
    <tr>
      <th>3</th>
      <td>PERSHO</td>
      <td>PERSON SHOT</td>
      <td>Human reporting gunfire</td>
      <td>32,468</td>
      <td>8.2%</td>
    </tr>
    <tr>
      <th>4</th>
      <td>SHOTS</td>
      <td>SHOTS FIRED (OV)</td>
      <td>Human reporting gunfire</td>
      <td>13,618</td>
      <td>3.5%</td>
    </tr>
    <tr>
      <th>5</th>
      <td>PSST</td>
      <td>Probable Shot - ShotSpotter</td>
      <td>ShotSpotter alert</td>
      <td>4,251</td>
      <td>1.1%</td>
    </tr>
  </tbody>
</table>

---

In [9]:
%%jmd

### Disposition

On the info page included with the data, the CPD FOIA officer informed us that CPD had internally
identified emergency events from both sources (911 callers and ShotSpotter) that referred to the same underlying gunfire event,
and that the `disposition` field was included in the responsive records only when ShotSpotter was the first to report.
[Source](https://github.com/HRDAG/US-IL-ShotSpotter/blob/main/data/CPD_sst/import/docs/22809-P022910-CFS-SHOTSPOTTER.xlsx%20-%20Header%20sheet.pdf)

- Of the {{format_count(data.shape[0])}} emergency events included in the analysis, {{
    format_count(data.disposition.notna().sum())}} or {{
    format_prop(data.disposition.notna().sum()/data.shape[0])}}
    have a reported disposition.

---

#### 5 most frequently reported `disposition` values
<details open>
<summary>Presented are the 5 most frequently reported `disposition` values
for emergency events in which ShotSpotter was the first alert.</summary>

{{ report_fields(df=data, idcol='event_no', cols=['disposition',], headn=5).to_html() }}

</details>

- Of the {{format_count(data.disposition.notna().sum())}} emergency events about potential gunfire
    identified by CPD as first reported by ShotSpotter, {{
    format_count(data.disposition.str.contains("MISC.INC.", na=False, regex=False).sum())}} or {{
    format_prop(data.disposition.str.contains("MISC.INC.", na=False, regex=False).sum()/data.disposition.notna().sum())}}
    are labeled as a "Miscellaneous Incident."

---


### Disposition

On the info page included with the data, the CPD FOIA officer informed us that CPD had internally
identified emergency events from both sources (911 callers and ShotSpotter) that referred to the same underlying gunfire event,
and that the `disposition` field was included in the responsive records only when ShotSpotter was the first to report.
[Source](https://github.com/HRDAG/US-IL-ShotSpotter/blob/main/data/CPD_sst/import/docs/22809-P022910-CFS-SHOTSPOTTER.xlsx%20-%20Header%20sheet.pdf)

- Of the 394,199 emergency events included in the analysis, 123,216 or 31.3%
    have a reported disposition.

---

#### 5 most frequently reported `disposition` values
<details open>
<summary>Presented are the 5 most frequently reported `disposition` values
for emergency events in which ShotSpotter was the first alert.</summary>

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>disposition</th>
      <th>count</th>
      <th>percent</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>MISC.INC./OTH POLICE SER</td>
      <td>87,415</td>
      <td>70.9%</td>
    </tr>
    <tr>
      <th>1</th>
      <td>MISC.INC./NO PERSON FND.</td>
      <td>15,671</td>
      <td>12.7%</td>
    </tr>
    <tr>
      <th>2</th>
      <td>WEAP VIO/DISC OF FIREA</td>
      <td>5,435</td>
      <td>4.4%</td>
    </tr>
    <tr>
      <th>3</th>
      <td>BATTERY:AGGR:HANDGUN</td>
      <td>2,746</td>
      <td>2.2%</td>
    </tr>
    <tr>
      <th>4</th>
      <td>ASSAULT;AGG HAND</td>
      <td>1,314</td>
      <td>1.1%</td>
    </tr>
  </tbody>
</table>

</details>

- Of the 123,216 emergency events about potential gunfire
    identified by CPD as first reported by ShotSpotter, 104,586 or 84.9%
    are labeled as a "Miscellaneous Incident."

---
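The "Miscellaneous Incident" share is computed with `str.contains("MISC.INC.", na=False, regex=False)` divided by the number of events with any disposition (104,586 / 123,216 ≈ 84.9%). It counts every disposition containing that substring, not just the top two rows of the table (87,415 + 15,671 = 103,086), so the remaining 1,500 come from lower-ranked `MISC.INC.` values. `regex=False` matters because in a regex each `.` matches any character. A toy illustration (the `'MISCXINCX'` value is hypothetical):

```python
import pandas as pd

disposition = pd.Series([
    'MISC.INC./OTH POLICE SER',
    'MISC.INC./NO PERSON FND.',
    'WEAP VIO/DISC OF FIREA',
    'MISCXINCX',  # hypothetical value, only to show the regex pitfall
    None,         # no disposition reported
])

literal = disposition.str.contains('MISC.INC.', na=False, regex=False)  # plain substring
as_regex = disposition.str.contains('MISC.INC.', na=False)  # '.' matches any character

# Share among events that have a disposition, mirroring the template above.
share = literal.sum() / disposition.notna().sum()
print(literal.tolist(), as_regex.tolist(), share)
```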