# Creating a SQL Database
Building on our initial exploration, we can ingest and clean the data using the techniques from `exploration.ipynb`.
Rather than using a static CSV file, let's download the most recent CSV files for the `Crisis`, `Use of Force`, and `Crime` datasets. 


In [25]:
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 500)


In [None]:
crisis_df = pd.read_csv('https://data.seattle.gov/resource/i2q9-thny.csv?$limit=100000')
crisis_df.replace({'-': None}, inplace=True)

In [2]:
uof_df = pd.read_csv('https://data.seattle.gov/resource/ppi5-g2bj.csv?$limit=100000')

In [3]:
crime_df = pd.read_csv('https://data.seattle.gov/resource/tazs-3rd5.csv?$limit=1500000', parse_dates=['offense_start_datetime', 'report_datetime'])

In [6]:
import sqlite3
conn = sqlite3.connect('police-records.db')
# Write each DataFrame to its own table, replacing any table left from an earlier run.
crisis_df.to_sql('crisis', conn, if_exists='replace', index=False)
uof_df.to_sql('uof', conn, if_exists='replace', index=False)
crime_df.to_sql('crime', conn, if_exists='replace', index=False)


1007651

In [7]:
q = '''SELECT COUNT(*)
FROM crisis'''
print(pd.read_sql(q, conn))

   COUNT(*)
0     76214


The crisis dataset website warns us in many places that there is a one-to-many relationship between an event and its dispositions, so a single event can appear on several rows. To get accurate counts of crisis events, we need to count the distinct `template_id` values, not rows.

In [9]:
q = '''SELECT COUNT(DISTINCT template_id) Counts
FROM crisis'''
print(pd.read_sql(q, conn))

   Counts
0   75125
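
To make the difference between the two counts concrete, here is a minimal sketch on an in-memory database. The rows and disposition labels are invented for illustration, and it uses the standard-library `sqlite3` cursor directly rather than `pd.read_sql` so that it runs without downloading anything.

```python
import sqlite3

# In-memory toy table; the rows below are made up, not taken from the dataset.
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE crisis (template_id INTEGER, disposition TEXT)')
demo.executemany(
    'INSERT INTO crisis VALUES (?, ?)',
    [
        (1, 'Disposition A'),
        (1, 'Disposition B'),  # same event, second disposition
        (2, 'Disposition A'),
    ],
)

# COUNT(*) counts rows; COUNT(DISTINCT template_id) counts events.
row_count, event_count = demo.execute(
    'SELECT COUNT(*), COUNT(DISTINCT template_id) FROM crisis'
).fetchone()
print(row_count, event_count)  # 3 2
demo.close()
```

Event 1 contributes two rows but only one distinct `template_id`, which is the same effect that separates 76214 rows from 75125 events above.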


We can preview a few rows from each table to check that the data loaded correctly:

In [10]:
q = '''SELECT *
FROM crisis
LIMIT 5
OFFSET 10'''
print(pd.read_sql(q, conn))

   template_id        reported_date reported_time        occured_date_time  \
0        43469  2015-05-15T00:00:00      03:57:00  2015-05-15T11:16:25.000   
1        43992  2015-05-15T00:00:00      10:56:00  2015-05-15T15:52:13.000   
2        57315  2015-05-15T00:00:00      08:30:00  2015-05-15T18:16:56.000   
3        43929  2015-05-16T00:00:00      06:20:00  2015-05-16T17:29:10.000   
4        43897  2015-05-16T00:00:00      03:52:00  1900-01-01T00:00:00.000   

  call_type                                initial_call_type  \
0       911            PERSON IN BEHAVIORAL/EMOTIONAL CRISIS   
1       911        THEFT (DOES NOT INCLUDE SHOPLIFT OR SVCS)   
2       911  HAZ - POTENTIAL THRT TO PHYS SAFETY (NO HAZMAT)   
3       911                          SERVICE - WELFARE CHECK   
4      None                                             None   

                final_call_type                         disposition  \
0  --CRISIS COMPLAINT - GENERAL                  Mobile Crisis Team   
1   

Next, the same check for the `uof` table:

In [11]:
q = '''SELECT *
FROM uof
LIMIT 5
OFFSET 10'''
print(pd.read_sql(q, conn))

                uniqueid  incident_num           incident_type  \
0  2014UOF-0011-1233-223           284  Level 2 - Use of Force   
1  2014UOF-0012-1464-223           271  Level 2 - Use of Force   
2   2014UOF-0013-147-245           292  Level 1 - Use of Force   
3  2014UOF-0014-1202-245           293  Level 1 - Use of Force   
4  2014UOF-0015-1031-169           219  Level 1 - Use of Force   

         occured_date_time precinct   sector beat  officer_id  subject_id  \
0  2014-06-11T02:15:00.000     West     KING   K3        1640         223   
1  2014-06-11T02:15:00.000     West     KING   K3        1141         223   
2  2014-06-17T18:21:00.000     East   GEORGE   G3        1542         245   
3  2014-06-17T18:30:00.000     East   GEORGE   G3        1635         245   
4  2014-05-30T23:23:00.000     East  CHARLIE   C1        1690         169   

                subject_race subject_gender  
0              Not Specified           Male  
1              Not Specified           Male  
2 

In [12]:
q = '''SELECT *
FROM crime
LIMIT 5
OFFSET 10'''
print(pd.read_sql(q, conn))

  report_number   offense_id offense_start_datetime     offense_end_datetime  \
0   2020-044038  12604928711    2020-02-04 20:57:00                     None   
1   2020-043971  12604927228    2019-02-04 00:00:00  2020-02-04T08:00:00.000   
2   2020-043805  12604929082    2020-01-30 19:30:00                     None   
3   2020-043805  12605193820    2020-01-30 19:30:00                     None   
4   2020-043518  12604909238    2020-02-04 07:30:00  2020-02-04T10:00:00.000   

       report_datetime group_a_b crime_against_category  offense_parent_group  \
0  2020-02-04 21:20:35         A               PROPERTY         LARCENY-THEFT   
1  2020-02-04 21:18:52         A               PROPERTY        FRAUD OFFENSES   
2  2020-02-04 21:14:00         A               PROPERTY         LARCENY-THEFT   
3  2020-02-04 21:14:00         A                 PERSON  KIDNAPPING/ABDUCTION   
4  2020-02-04 20:59:01         A               PROPERTY         LARCENY-THEFT   

                    offense offe
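
Besides previewing rows, another quick sanity check is to ask SQLite which tables it holds and how many rows each contains. The sketch below does this against an in-memory stand-in for `police-records.db`; the single-column tables and the one inserted row are placeholders, not the real schema.

```python
import sqlite3

# Stand-in database with the same three table names (placeholder schema).
demo = sqlite3.connect(':memory:')
for name in ('crisis', 'uof', 'crime'):
    demo.execute(f'CREATE TABLE {name} (id INTEGER)')
demo.execute('INSERT INTO crisis VALUES (1)')

# sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
tables = [
    row[0]
    for row in demo.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Count the rows in each table that was found.
counts = {
    table: demo.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    for table in tables
}
print(counts)  # {'crime': 0, 'crisis': 1, 'uof': 0}
demo.close()
```

Run against the real connection, the counts could be compared with `len(crisis_df)`, `len(uof_df)`, and `len(crime_df)`.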

# Querying the database
Now that the database is created, we can query it to look for patterns in the data.

What are the most common crisis call types?

In [11]:
q = '''SELECT initial_call_type, count(*) Counts 
FROM crisis
GROUP BY initial_call_type
ORDER BY Counts DESC
LIMIT 25
'''
print(pd.read_sql(q, conn))

                                  initial_call_type  Counts
0             PERSON IN BEHAVIORAL/EMOTIONAL CRISIS   13700
1      SUICIDE - IP/JO SUICIDAL PERSON AND ATTEMPTS   13314
2                  DISTURBANCE, MISCELLANEOUS/OTHER    6543
3                                              None    5611
4            SUSPICIOUS PERSON, VEHICLE OR INCIDENT    2563
5                           SERVICE - WELFARE CHECK    2244
6             SUICIDE, SUICIDAL PERSON AND ATTEMPTS    2184
7                                          TRESPASS    2080
8                  DIST - IP/JO - DV DIST - NO ASLT    1993
9     THREATS (INCLS IN-PERSON/BY PHONE/IN WRITING)    1802
10   ASLT - IP/JO - WITH OR W/O WPNS (NO SHOOTINGS)    1628
11           SFD - ASSIST ON FIRE OR MEDIC RESPONSE    1624
12                              DIST - DV - NO ASLT    1617
13            ASSIST OTHER AGENCY - ROUTINE SERVICE    1473
14  HAZ - POTENTIAL THRT TO PHYS SAFETY (NO HAZMAT)    1250
15        ASLT - WITH OR W/O WEAPONS (NO

In how many crisis events each year was a CIT officer requested but never arrived?

In [16]:
q = '''SELECT strftime('%Y', reported_date) Year, COUNT(DISTINCT template_id) "No CIT"
FROM crisis
WHERE cit_officer_requested = 'Y' AND
cit_officer_arrived = 'N'
GROUP BY Year
ORDER BY Year
'''
print(pd.read_sql(q, conn))

   Year  No CIT
0  2015     661
1  2016     796
2  2017     716
3  2018     774
4  2019     198
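
The yearly grouping works because `to_sql` stored `reported_date` as ISO-8601 text (for example `2015-05-15T00:00:00`), which SQLite's `strftime` can parse. The following sketch reproduces the query on a few invented rows; only the column names come from the real table.

```python
import sqlite3

demo = sqlite3.connect(':memory:')
demo.execute(
    '''CREATE TABLE crisis (
           template_id INTEGER,
           reported_date TEXT,
           cit_officer_requested TEXT,
           cit_officer_arrived TEXT)'''
)
# Invented rows: event 1 appears twice to mimic the one-to-many dispositions.
demo.executemany(
    'INSERT INTO crisis VALUES (?, ?, ?, ?)',
    [
        (1, '2015-05-15T00:00:00', 'Y', 'N'),
        (1, '2015-05-15T00:00:00', 'Y', 'N'),
        (2, '2015-06-01T00:00:00', 'Y', 'Y'),
        (3, '2016-01-02T00:00:00', 'Y', 'N'),
        (4, '2016-03-04T00:00:00', 'N', 'N'),
    ],
)

query = '''SELECT strftime('%Y', reported_date) AS Year,
       COUNT(DISTINCT template_id) AS "No CIT"
FROM crisis
WHERE cit_officer_requested = 'Y' AND cit_officer_arrived = 'N'
GROUP BY Year
ORDER BY Year'''
result = demo.execute(query).fetchall()
print(result)  # [('2015', 1), ('2016', 1)]
demo.close()
```

Note that `strftime` returns the year as text, so `Year` sorts as a string; for four-digit years that ordering matches the numeric one.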


Which beats have the most use of force records?

In [22]:
q = '''SELECT beat, COUNT(*) "Use of Force Count"
FROM uof
WHERE beat != '-'
GROUP BY beat
ORDER BY "Use of Force Count" DESC
'''
print(pd.read_sql(q, conn))

   beat  Use of Force Count
0    E2                1117
1    K2                 521
2    K3                 514
3    K1                 492
4    D2                 472
5    N3                 469
6    R2                 420
7    M3                 392
8    S1                 367
9    E1                 365
10   D1                 357
11   M2                 353
12   XX                 345
13   S2                 332
14   N2                 321
15   U2                 297
16   S3                 274
17   M1                 274
18   R1                 270
19   E3                 265
20   D3                 244
21   O1                 243
22   Q3                 239
23   G2                 235
24   G3                 233
25   G1                 229
26   L1                 224
27   L2                 221
28   R3                 214
29   B1                 210
30   U1                 209
31   L3                 205
32   U3                 201
33   F2                 182
34   B2             

Which beats have the most crisis events?

In [8]:
q = '''SELECT beat, COUNT(DISTINCT template_id) Count 
FROM crisis
WHERE beat != '-'
GROUP BY beat
ORDER BY Count DESC
'''
print(pd.read_sql(q, conn))

   beat  Count
0    K3   2614
1    E3   2515
2    E1   2433
3    K2   2386
4    K1   2028
5    D2   1938
6    G1   1872
7    D3   1832
8    N3   1830
9    D1   1822
10   E2   1686
11   Q3   1633
12   R2   1515
13   B1   1514
14   U2   1469
15   M1   1429
16   N2   1428
17   L3   1400
18   G2   1375
19   B3   1359
20   L1   1358
21   F1   1358
22   M2   1342
23   U1   1339
24   W2   1313
25   U3   1313
26   N1   1263
27   Q2   1262
28   C1   1248
29   G3   1236
30   W1   1225
31   M3   1186
32   C2   1169
33   L2   1145
34   B2   1091
35   W3   1085
36   R1   1041
37   R3   1022
38   J1   1018
39   F2   1006
40   S1   1001
41   J3    983
42   C3    920
43   S2    850
44   J2    848
45   F3    825
46   Q1    818
47   S3    810
48   O1    697
49   O3    566
50   O2    465
51   99     21
52  OOJ     16
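
One detail worth spelling out: `'-'` values in `crisis_df` were replaced with `None` before loading, so the crisis table holds `NULL` rather than `'-'` for missing beats. The filter still drops those rows, because any comparison with `NULL` evaluates to `NULL`, which `WHERE` treats as not true. A small sketch with invented rows:

```python
import sqlite3

demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE crisis (template_id INTEGER, beat TEXT)')
# Invented rows; None is stored as NULL, as it is after the pandas replace.
demo.executemany(
    'INSERT INTO crisis VALUES (?, ?)',
    [(1, 'K3'), (2, None), (3, 'E3')],
)

# NULL != '-' evaluates to NULL, so the row with the missing beat is excluded.
kept = [
    row[0]
    for row in demo.execute(
        "SELECT template_id FROM crisis WHERE beat != '-' ORDER BY template_id"
    )
]
print(kept)  # [1, 3]
demo.close()
```

Writing the condition as `beat IS NOT NULL` would state the intent more directly for this table.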


Compare incident counts per beat with use of force per beat
Most common crime type in each neighborhood
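
As a starting point for the first of these ideas, the per-beat counts from the two tables could be combined with a join of two aggregated subqueries. This is only a sketch under assumptions: the rows are invented, the aliases `crisis_events` and `uof_count` are hypothetical names, and it keeps the row-based count for `uof` used earlier.

```python
import sqlite3

demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE crisis (template_id INTEGER, beat TEXT)')
demo.execute('CREATE TABLE uof (uniqueid TEXT, beat TEXT)')
# Invented rows for illustration only.
demo.executemany(
    'INSERT INTO crisis VALUES (?, ?)',
    [(1, 'K3'), (1, 'K3'), (2, 'K3'), (3, 'E2'), (4, None)],
)
demo.executemany(
    'INSERT INTO uof VALUES (?, ?)',
    [('a', 'E2'), ('b', 'E2'), ('c', 'K3'), ('d', '-')],
)

# Aggregate each table per beat first, then join the two summaries.
query = '''
SELECT c.beat, c.crisis_events, COALESCE(u.uof_count, 0) AS uof_count
FROM (SELECT beat, COUNT(DISTINCT template_id) AS crisis_events
      FROM crisis
      WHERE beat IS NOT NULL
      GROUP BY beat) AS c
LEFT JOIN (SELECT beat, COUNT(*) AS uof_count
           FROM uof
           WHERE beat != '-'
           GROUP BY beat) AS u
  ON u.beat = c.beat
ORDER BY c.crisis_events DESC
'''
comparison = demo.execute(query).fetchall()
print(comparison)  # [('K3', 2, 1), ('E2', 1, 2)]
demo.close()
```

Aggregating before joining avoids multiplying rows, which would happen if the raw tables were joined on `beat` directly. The `LEFT JOIN` keeps beats that have crisis events but no use of force records.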