# San Luis Obispo Police Data

## Introduction

In this notebook, you will explore data from the daily logs of the San Luis Obispo Police Department. The SLOPD posts daily logs of their data [here](http://pdreport.slocity.org/policelog/rpcdsum.txt), but unfortunately, they remove the data daily when they post updates. This dataset comes from Professor [Thomas D. Gutierrez](http://www.physics.calpoly.edu/faculty/tgutierrez), who has collected the daily logs for a number of weeks and shared them.

## The dataset

The dataset is located in the `/data` directory and contains daily logs. The date in the filename is the date the log was downloaded, which does not necessarily correspond with the incident dates contained therein.

In [1]:
!ls /data/slo_police_logs_2017-02/ | head

SLOPolice_TDGAcq2016-07-29-2347.txt
SLOPolice_TDGAcq2016-08-01-1200.txt
SLOPolice_TDGAcq2016-08-02-1200.txt
SLOPolice_TDGAcq2016-08-03-1200.txt
SLOPolice_TDGAcq2016-08-03-1700.txt
SLOPolice_TDGAcq2016-08-04-1700.txt
SLOPolice_TDGAcq2016-08-05-1901.txt
SLOPolice_TDGAcq2016-08-10-2050.txt
SLOPolice_TDGAcq2016-08-10-2052.txt
SLOPolice_TDGAcq2016-08-11-1920.txt


Individual files are text file containing a file header and then a set of incident reports. Here is a the start of the single file that shows the basic format:

In [2]:
!tail -n 40 /data/slo_police_logs_2017-02/SLOPolice_TDGAcq2016-07-29-2347.txt

Responsible Officer: Middleton, J
Units: 4244  ,4265  ,4264  ,4269
 Des: incid#=160729015 AP/ULIBARRI, CLARK 100382 WARRANTS clr:RTF oc:WARR
      call=21l
CALL COMMENTS: PLOT BEHIND BUILDING, IN PATIO
160729016 07/29/16 Received:06:46 Dispatched:06:48 Arrived:      Cleared:06:49


Type: Public Works                                            Location:PN6
As Observed:


Addr: 2125 STORY; HAWTHORNE SCHOOL; GRID K-11,   Clearance Code:No Report

Responsible Officer: McCornack, CM
Units: COM5
 Des: incid#=160729016 Completed call disp:NR clr:NR call=22l
CALL COMMENTS: WATER LEAK NEXT TO PLAYGROUND
--------------------------------------------------------------------------------
    Total Incidents for This Report:
--------------------------------------------------------------------------------
Report Includes:
All dates between `07:00:00 07/28/16` and `07:00:00 07/29/16`
All agencies matching `SLP*`
All officers
All dispositions
All natur

## The assignment

Your assignment is to perform the following steps:

* Go through the following steps of the data science process:
  - Import: parse the original data files into Pandas DataFrames.
  - Tidy: one table per entity, variables in columns, samples in rows. You will need
    multiple tables to represent the many-to-many relationship between units and incidents. It may help
    to save the tidy data to a SQL database.
  - Transform: perform transformations of the data to make it more useful. Examples would include putting addresses
    into a standardized format using a web-service, or putting date/times into a standard format such as
    [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601).
  - Visualize: create visualizations using Altair or Matplotlib to generate and answer questions.
  - Explore: explore different questions in the dataset.
* Create a computational narrative of your work. This narrative should intermix code cells with markdown cells that
  describe to the reader what you are doing.

## Your work

### Data Characterization
##### General
* 10 reports in /data/slo_police_logs_2017-02/
* 5 lines at head of report
* 13 lines per incident report
* 4 total empty lines per incident: 2 bar lines, 2 empty lines
* 19 lines at tail of report

##### Attributes
    - Incident# - (int)
    - Date - (dateTime)
    - TimeReceived - (dateTime)
    - TimeDispatched - (dateTime)
    - TimeArrived - (dateTime)
    - TimeCleared - (dateTime)
    - Type - (string):O?
    - As Observed (AsObs) - (string):O?
    - Location (string):O?
        - PN1-13
    - Addr - (string):N?
    - clearance code (CC) - (string):O? 
        NR=No Report; CC=Call Cancelled; FI=Field Interview; RTF=Report to Follow; Unfounded; Unable to Locate; Report to Watch; 72HR Tag for 112; Alarm Malfunctio; alarm-human erro; Gone on Arrival; negative violati
    - Officer - (string)
        - LastName - (string)
        - FirstInitial - (string)
    - Units - (string)O?
        - ex: COM2, (4202, 4217), 42K4, (1+ items)
    - Descript - (string)N    
        - incid# - (int) corresponds to report#
        - call - (string?)
            - ex: 24l, 25l, (1 item)
    - Comments - (string) (1 line?)

##### Tables
    - Incident#
    - Unit
    - Type
    - Location
    - ClearanceCode

Create new code and markdown cells below this point for all of your work.

In [3]:
from datetime import datetime as dt
import numpy as np
import pandas as pd
import altair

In [4]:
def pullReports():
    files = !ls /data/slo_police_logs_2017-02/
    reportList = []
    for i in range(len(files)):
        with open("/data/slo_police_logs_2017-02/" + files[i]) as f:
            reportList.append(f.readlines())
    return reportList

In [5]:
def formatLines(report):
    for line in report:
        line = line.lower()
        line = line.rstrip('\n')
        yield line

In [80]:
def removeReportHeadAndTail(report):
    count = 0
    divbar = '==============================================================================='
    tailbar = '--------------------------------------------------------------------------------'
    while report[count] != divbar:
        count += 1
    headEnd = count
    if count > 5:
        report.insert(5, divbar)
        headEnd = 5
    while report[count] != tailbar:
        count += 1
    return report[headEnd:count]

In [29]:
def divIntoIncidents(report):
    divbar = '==============================================================================='
    iCount = 0; divCount = 0
    incidentList = []; iList = []
    for line in report:
        if line != divbar:
            iList.append(line)
        else:
            divCount += 1
        if divCount == 3:
            incidentList.append(iList)
            iCount += 1
            divCount = 1
            iList = []
    return incidentList

In [37]:
def removeNullStr(report):
    iList0 = []; iList = []
    for incident in report:
        for line in incident:
            if line != '':
                iList.append(line)
        iList0.append(iList)
        iList = []
    return iList0

In [43]:
def divApartIncidents(report):
    iDict = {0:{}, 1:{}, 2:{}, 3:{}, 
             4:{}, 5:{}, 6:{}, 7:{}, 
             8:{}, 9:{}}
    for inci in range(len(report)):
        extra = 0
        inciLen = len(report[inci])
        for j in range(inciLen):
            first3 = report[j][:3]
            if first3 == '   ':
                if extra > 0:
                    iDict[9][inci] = report[j]
                else:
                    iDict[8][inci] = report[j]
                extra += 1
            else:
                iDict[j - extra][inci] = report[j]
    return iDict

In [10]:
def tokenizeInciHeader(header):
    inciHead = {'inciNum':{}, 'date':{}, 'timeRec':{}, 
                'timeDisp':{}, 'timeArr':{}, 'timeClr':{}}
    for line in header:
        split = header[line].split()
        inciHead['inciNum'][line] = split[0]
        date = split[1]
        inciHead['date'][line] = dt.strptime(date, "%m/%d/%y").date()
        time = split[2].strip('received:')
        dateTimeStr = date + time
        if time != '':
            inciHead['timeRec'][line] = dt.strptime(time, "%H:%M").time()
        time = split[3].strip('dispatched:')
        dateTimeStr = date + time
        if time != '':
            inciHead['timeDisp'][line] = dt.strptime(dateTimeStr, "%m/%d/%y%H:%M").time()
        time = split[4].strip('arrived:')
        dateTimeStr = date + time
        if time != '':
            inciHead['timeArr'][line] = dt.strptime(dateTimeStr, "%m/%d/%y%H:%M").time()
        time = split[5].strip('cleared:')
        dateTimeStr = date + time
        if time != '':
            inciHead['timeClr'][line] = dt.strptime(dateTimeStr, "%m/%d/%y%H:%M").time()
    return inciHead

In [11]:
def tokenizeTypeLoc(typeLoc):
    inciTypeLoc = {'type':{}, 'loc':{}}
    for line in typeLoc:
        split = typeLoc[line].split('location:')
        inciTypeLoc['type'][line] = split[0]
        inciTypeLoc['loc'][line] = split[1]
    for i in inciTypeLoc['type']:
        typ = inciTypeLoc['type'][i].split('type:')
        inciTypeLoc['type'][i] = typ[1].strip()
    return inciTypeLoc

In [12]:
def tokenizeAddrCC(addrCC):
    inciAddrCC = {'addr':{}, 'cc':{}}
    for line in addrCC:
        split = addrCC[line].split('clearance code:')
        if len(split) > 1:
            inciAddrCC['addr'][line] = split[0].strip()
            inciAddrCC['cc'][line] = split[1].strip()
        else:
            inciAddrCC['addr'][line] = split[0].strip()
    return inciAddrCC

In [13]:
def tokenizeOfficers(offic):
    ioffic = {'officer':{}}
    for line in offic:
        split = offic[line].split('responsible officer:')
        for i in range(len(split)):
            if split[i] != '':
                ioffic['officer'][line] = split[i].strip()
    return ioffic

In [14]:
def tokenizeUnits(units):
    iunits = {'units':{}}
    for line in units:
        split = units[line].split('units:')
        if len(split) > 1:
            split = split[1].split(' ,')
        iunits['units'][line] = split
    for i in iunits['units']:
        u = []
        for j in iunits['units'][i]:
            cleanUnit = j.strip()
            if cleanUnit != '':
                u.append(cleanUnit)
        iunits['units'][i] = u
    return iunits

In [15]:
def tokenizeDes(des):
    ides = {'descript':{}}
    for line in des:
        split = des[line].split('des:')
        ides['descript'][line] = split[1].strip()
    return ides

In [16]:
def tokenizeCom(comments):
    icomments = {'call comments':{}}
    for line in comments:
        comms = comments[line].split('call comments:')#.strip()
        icomments['call comments'][line] = comms[1].strip()
    return icomments

In [17]:
def tokenizeAO(AO):
    iAO = {'observed':{}}
    for line in AO:
        iAO['observed'][line] = AO[line].strip()
    return iAO

In [25]:
def cleanReport(report):
    formatted = list(formatLines(report)) #strip newlines and make all words lowercase
    formatted = removeReportHeadAndTail(formatted) #remove the head off the report
    incidentDict = divIntoIncidents(formatted) #split report into incidents
    print(incidentDict)
    iDict = removeNullStr(incidentDict) #remove null lines from each incident
    iDict2 = divApartIncidents(iDict) #group lines of incidents by line type

    header = iDict2[0]
    yield tokenizeInciHeader(header) #spec fcn for dealing with header lines

    typeLoc = iDict2[1]
    yield tokenizeTypeLoc(typeLoc) #spec fcn for parsing type/location line

    addrCC = iDict2[3]
    yield tokenizeAddrCC(addrCC)

    offic = iDict2[4]
    yield tokenizeOfficers(offic)

    units = iDict2[5]
    yield tokenizeUnits(units)

    des = iDict2[6]
    yield tokenizeDes(des)

    comments = iDict2[7]
    yield tokenizeCom(comments)

    asobs = iDict2[8]
    yield tokenizeAO(asobs)

In [19]:
reportList = pullReports()

In [20]:
count = 0

In [85]:
for report in reportList:
    count += 1
    print(count)
    genReport = list(cleanReport(report))

39
[['160728012 07/28/16 received:07:11 dispatched:07:13 arrived:      cleared:07:15', 'type: alarm audible                                           location:pn13', 'as observed:', '', '', 'addr: 12318 los osos valley; enterprise rent a   clearance code:call cancelled', '', 'responsible officer: benson, g', 'units: 4245  ,4231', ' des: incid#=160728012 completed call disp:can clr:can call=18l', 'call comments: front door, rear motion'], ['160728013 07/28/16 received:07:35 dispatched:07:45 arrived:07:49 cleared:07:58', 'type: loitering                                               location:pn5', 'as observed:', '', '', 'addr: 890 marsh; jamba juice; grid k-09, san lu  clearance code:no report', '', 'responsible officer: inglehart, b', 'units: 4226  ,4245', ' des: incid#=160728013 completed call disp:nr clr:nr call=19l', 'call comments: check on 3 transients causing a "disturbance" ifo jamba'], ['160728014 07/28/16 received:07:44 dispatched:07:46 arrived:      cleared:07:50', 'type: pub

AttributeError: 'list' object has no attribute 'split'

In [81]:

formatted = list(formatLines(reportList[7])) #strip newlines and make all words lowercase
formatted = removeReportHeadAndTail(formatted) #remove the head off the report
incidentDict = divIntoIncidents(formatted) #split report into incidents
iDict = removeNullStr(incidentDict) #remove null lines from each incident
iDict2 = divApartIncidents(iDict) #group lines of incidents by line type

In [82]:
reportList[7]

['TDG START-------------------------------------------------\n',
 '\n',
 'RCPK\n',
 '08/10/16               San Luis Obispo Police Department                     521\n',
 '160809016 08/09/16 Received:08:16 Dispatched:08:17 Arrived:08:23 Cleared:08:23\n',
 'Type: Assist Req                                              Location:PN5\n',
 'As Observed:\n',
 '      Citizen Assist\n',
 '\n',
 'Addr: 1042 WALNUT; SLOPD; GRID K-08, San Luis O  Clearance Code:No Report\n',
 '\n',
 'Responsible Officer: Peterson, T\n',
 'Units: 4227\n',
 ' Des: (MDC) Completed call incid#=160809016 call=22l\n',
 'CALL COMMENTS: PD LOBBY\n',
 '160809017 08/09/16 Received:08:22 Dispatched:08:24 Arrived:08:30 Cleared:08:36\n',
 'Type: Alarm Audible                                           Location:PN10\n',
 'As Observed:\n',
 '\n',
 '\n',
 'Addr: 2995 MCMILLAN #196; CRESCENT HEALTHCARE;   Clearance Code:ALARM-HUMAN ERRO\n',
 '\n',
 'Responsible Officer: Koznek, J\n',
 'Units: 4233  ,4202\n',
 ' Des: (MDC) Complete

In [83]:
formatted

 '160809016 08/09/16 received:08:16 dispatched:08:17 arrived:08:23 cleared:08:23',
 'type: assist req                                              location:pn5',
 'as observed:',
 '      citizen assist',
 '',
 'addr: 1042 walnut; slopd; grid k-08, san luis o  clearance code:no report',
 '',
 'responsible officer: peterson, t',
 'units: 4227',
 ' des: (mdc) completed call incid#=160809016 call=22l',
 'call comments: pd lobby',
 '160809017 08/09/16 received:08:22 dispatched:08:24 arrived:08:30 cleared:08:36',
 'type: alarm audible                                           location:pn10',
 'as observed:',
 '',
 '',
 'addr: 2995 mcmillan #196; crescent healthcare;   clearance code:alarm-human erro',
 '',
 'responsible officer: koznek, j',
 'units: 4233  ,4202',
 ' des: (mdc) completed call incid#=160809017 call=23l',
 'call comments: s/e corner delay door',
 '160809018 08/09/16 received:08:56 dispatched:09:01 arrived:09:09 cleared:09:53',
 'type: assist req                                 

In [84]:
iDict2

{0: {0: ['160809016 08/09/16 received:08:16 dispatched:08:17 arrived:08:23 cleared:08:23',
   'type: assist req                                              location:pn5',
   'as observed:',
   '      citizen assist',
   'addr: 1042 walnut; slopd; grid k-08, san luis o  clearance code:no report',
   'responsible officer: peterson, t',
   'units: 4227',
   ' des: (mdc) completed call incid#=160809016 call=22l',
   'call comments: pd lobby'],
  1: ['160809016 08/09/16 received:08:16 dispatched:08:17 arrived:08:23 cleared:08:23',
   'type: assist req                                              location:pn5',
   'as observed:',
   '      citizen assist',
   'addr: 1042 walnut; slopd; grid k-08, san luis o  clearance code:no report',
   'responsible officer: peterson, t',
   'units: 4227',
   ' des: (mdc) completed call incid#=160809016 call=22l',
   'call comments: pd lobby'],
  2: ['160809016 08/09/16 received:08:16 dispatched:08:17 arrived:08:23 cleared:08:23',
   'type: assist req     

In [None]:
cleanReportList = list(genReport)

In [None]:
iNum = cleanReportList[0]

In [None]:
iNum

In [None]:
iRec = {'time received':iHead['timeRec']}

In [None]:
table2 = {**icomments, **iRec}

In [None]:
commentTime = pd.DataFrame(table2)

In [None]:
commentTime

In [None]:
iNumTable = pd.DataFrame(table1)

In [None]:
iNumTable