# Creating a SQL Database
Building on our initial exploration, we can ingest and clean the data using the techniques from `exploration.ipynb`.
Rather than reading a static CSV file, let's download the most recent version of the data directly from the Seattle open data portal.


In [1]:
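import pandas as pd
pd.set_option('display.max_columns', None)  # show every column when printing
# Fetch up to 100,000 rows; $limit overrides the API's smaller default page size
df = pd.read_csv('https://data.seattle.gov/resource/i2q9-thny.csv?$limit=100000')
# Replace '-' placeholders with None so they are stored as NULL in SQLite
df.replace({'-': None}, inplace=True)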
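To see what the cleaning step does without downloading the full dataset, here is a minimal sketch on a small hypothetical frame that mimics the raw CSV, where `-` marks a missing value. The column values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample mimicking the raw CSV, where '-' marks a missing value
raw = pd.DataFrame({
    "template_id": [1, 2, 3],
    "beat": ["K3", "-", "E1"],
    "cit_officer_requested": ["Y", "N", "-"],
})

# Same cleaning step as above: map the '-' placeholder to None
clean = raw.replace({"-": None})

# Count missing values per column after cleaning
print(clean.isna().sum().to_dict())
```

Each `-` is now treated as missing by pandas, which is what lets `to_sql` write it as a SQL `NULL` later on.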

In [2]:
import sqlite3
# Write the cleaned DataFrame to a local SQLite file as the table 'Crisis'
conn = sqlite3.connect('police-records.db')
df.to_sql('Crisis', conn, if_exists='replace', index=False)

75987
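The `75987` printed above is the value returned by `to_sql`, which recent pandas versions report as the number of rows written. The table is created as `Crisis` but queried below as `crisis`; this works because SQLite treats table names case-insensitively. A minimal sketch, assuming a hypothetical two-row frame and an in-memory database so that `police-records.db` is not touched:

```python
import sqlite3
import pandas as pd

# Hypothetical two-row frame; an in-memory database avoids touching police-records.db
df = pd.DataFrame({"template_id": [10, 11], "beat": ["K3", None]})

conn = sqlite3.connect(":memory:")
df.to_sql("Crisis", conn, if_exists="replace", index=False)

# SQLite table names are case-insensitive, so 'crisis' finds the 'Crisis' table
n_rows = conn.execute("SELECT COUNT(*) FROM crisis").fetchone()[0]
# None in the DataFrame is stored as SQL NULL
n_null = conn.execute("SELECT COUNT(*) FROM crisis WHERE beat IS NULL").fetchone()[0]
print(n_rows, n_null)  # 2 1
conn.close()
```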

In [3]:
q = '''SELECT COUNT(*)
FROM crisis'''
print(pd.read_sql(q, conn))

   COUNT(*)
0     75987


The dataset website warns in many places that there is a one-to-many relationship between dispositions and events, so a single crisis event can appear in several rows. To get accurate counts of crisis events, we therefore need to count distinct `template_id` values rather than rows.
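A tiny illustration of why the two counts differ, using a hypothetical table where one event has two dispositions. The `disposition` column and its values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crisis (template_id INTEGER, disposition TEXT)")
# Hypothetical data: event 1 has two dispositions, so it occupies two rows
conn.executemany(
    "INSERT INTO crisis VALUES (?, ?)",
    [(1, "Arrest"), (1, "Referral"), (2, "No action")],
)

# COUNT(*) counts rows; COUNT(DISTINCT ...) counts unique events
rows, events = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT template_id) FROM crisis"
).fetchone()
print(rows, events)  # 3 2
conn.close()
```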

In [4]:
q = '''SELECT COUNT(DISTINCT template_id) Counts
FROM crisis'''
print(pd.read_sql(q, conn))

   Counts
0   74900


We can check that our database looks correct by inspecting a few rows:

In [9]:
q = '''SELECT *
FROM crisis
LIMIT 5
OFFSET 10'''
print(pd.read_sql(q, conn))

   template_id        reported_date reported_time        occured_date_time  \
0        57315  2015-05-15T00:00:00      08:30:00  2015-05-15T18:16:56.000   
1        43662  2015-05-15T00:00:00      10:22:00  2015-05-15T21:33:28.000   
2        43653  2015-05-15T00:00:00      08:33:00  2015-05-15T19:48:18.000   
3        43982  2015-05-16T00:00:00      11:07:00  2015-05-16T22:50:33.000   
4        43845  2015-05-16T00:00:00      12:13:00  2015-05-16T05:15:03.000   

                  call_type                                initial_call_type  \
0                       911  HAZ - POTENTIAL THRT TO PHYS SAFETY (NO HAZMAT)   
1                    ONVIEW  HAZ - POTENTIAL THRT TO PHYS SAFETY (NO HAZMAT)   
2                       911            PERSON IN BEHAVIORAL/EMOTIONAL CRISIS   
3  TELEPHONE OTHER, NOT 911                          SERVICE - WELFARE CHECK   
4                       911            PERSON IN BEHAVIORAL/EMOTIONAL CRISIS   

                final_call_type                   

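Let's see which initial call types are most frequent. Note that this query counts rows rather than distinct `template_id` values, so an event with several dispositions is counted once per disposition.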

In [11]:
q = '''SELECT initial_call_type, count(*) Counts 
FROM crisis
GROUP BY initial_call_type
ORDER BY Counts DESC
LIMIT 25
'''
print(pd.read_sql(q, conn))

                                  initial_call_type  Counts
0             PERSON IN BEHAVIORAL/EMOTIONAL CRISIS   13700
1      SUICIDE - IP/JO SUICIDAL PERSON AND ATTEMPTS   13314
2                  DISTURBANCE, MISCELLANEOUS/OTHER    6543
3                                              None    5611
4            SUSPICIOUS PERSON, VEHICLE OR INCIDENT    2563
5                           SERVICE - WELFARE CHECK    2244
6             SUICIDE, SUICIDAL PERSON AND ATTEMPTS    2184
7                                          TRESPASS    2080
8                  DIST - IP/JO - DV DIST - NO ASLT    1993
9     THREATS (INCLS IN-PERSON/BY PHONE/IN WRITING)    1802
10   ASLT - IP/JO - WITH OR W/O WPNS (NO SHOOTINGS)    1628
11           SFD - ASSIST ON FIRE OR MEDIC RESPONSE    1624
12                              DIST - DV - NO ASLT    1617
13            ASSIST OTHER AGENCY - ROUTINE SERVICE    1473
14  HAZ - POTENTIAL THRT TO PHYS SAFETY (NO HAZMAT)    1250
15        ASLT - WITH OR W/O WEAPONS (NO
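Given the one-to-many caveat, the same ranking can also be computed per event with `COUNT(DISTINCT template_id)`. A sketch on hypothetical rows that puts both counts side by side; the column aliases `row_count` and `event_count` are our own choice.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crisis (template_id INTEGER, initial_call_type TEXT)")
# Hypothetical rows: event 1 appears twice because it has two dispositions
conn.executemany(
    "INSERT INTO crisis VALUES (?, ?)",
    [(1, "TRESPASS"), (1, "TRESPASS"), (2, "TRESPASS"), (3, "SERVICE - WELFARE CHECK")],
)

q = """SELECT initial_call_type,
       COUNT(*) AS row_count,
       COUNT(DISTINCT template_id) AS event_count
FROM crisis
GROUP BY initial_call_type
ORDER BY event_count DESC"""
result = pd.read_sql(q, conn)
print(result)
conn.close()
```

For `TRESPASS`, `row_count` is 3 while `event_count` is 2, which is the gap the row-based ranking above does not account for.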

In [17]:
q = '''SELECT strftime('%Y', reported_date) Year, COUNT(DISTINCT template_id) "Requested CIT" 
FROM crisis
WHERE cit_officer_requested = 'Y'
GROUP BY Year
ORDER BY Year
'''
print(pd.read_sql(q, conn))

   Year  Requested CIT
0  1900              2
1  2015           2605
2  2016           4520
3  2017           4774
4  2018           5580
5  2019           7454
6  2020           7780
7  2021           8234
8  2022           5523


In [16]:
q = '''SELECT strftime('%Y', reported_date) Year, COUNT(DISTINCT template_id) "No CIT" 
FROM crisis
WHERE cit_officer_requested = 'Y' AND
cit_officer_arrived = 'N'
GROUP BY Year
ORDER BY Year
'''
print(pd.read_sql(q, conn))

   Year  No CIT
0  2015     661
1  2016     796
2  2017     716
3  2018     774
4  2019     198


In [8]:
q = '''SELECT beat, COUNT(DISTINCT template_id) Count 
FROM crisis
WHERE beat IS NOT NULL
GROUP BY beat
ORDER BY Count DESC
'''
print(pd.read_sql(q, conn))

   beat  Count
0    K3   2614
1    E3   2515
2    E1   2433
3    K2   2386
4    K1   2028
5    D2   1938
6    G1   1872
7    D3   1832
8    N3   1830
9    D1   1822
10   E2   1686
11   Q3   1633
12   R2   1515
13   B1   1514
14   U2   1469
15   M1   1429
16   N2   1428
17   L3   1400
18   G2   1375
19   B3   1359
20   L1   1358
21   F1   1358
22   M2   1342
23   U1   1339
24   W2   1313
25   U3   1313
26   N1   1263
27   Q2   1262
28   C1   1248
29   G3   1236
30   W1   1225
31   M3   1186
32   C2   1169
33   L2   1145
34   B2   1091
35   W3   1085
36   R1   1041
37   R3   1022
38   J1   1018
39   F2   1006
40   S1   1001
41   J3    983
42   C3    920
43   S2    850
44   J2    848
45   F3    825
46   Q1    818
47   S3    810
48   O1    697
49   O3    566
50   O2    465
51   99     21
52  OOJ     16
