### Building a Data Model
We want to build a model that can help improve a specific metric: the clearance rate of crimes. The clearance rate is the percentage of crimes that are "solved" or "cleared" by the police. A crime is considered cleared when an arrest is made, when a suspect is charged, or when the case is closed in some other way. It is calculated by dividing the number of cleared crimes by the total number of crimes recorded, expressed as a percentage.
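
As a minimal sketch of that calculation, the snippet below computes the clearance rate on a small made-up frame. The `toy` DataFrame and its values are illustrative only; the one assumption carried over from the data is that `cleared` is a 0/1 flag.

```python
import pandas as pd

# Made-up rows for illustration; `cleared` is assumed to be 1 when a case was cleared, else 0.
toy = pd.DataFrame({
    "crime_type": ["Property", "Property", "Violent", "Violent", "Other"],
    "cleared": [0, 1, 1, 0, 0],
})

# Clearance rate = cleared crimes / total recorded crimes, as a percentage.
clearance_rate = toy["cleared"].sum() / len(toy) * 100

# The mean of a 0/1 column is the cleared share, so the same rate per group is:
rate_by_type = toy.groupby("crime_type")["cleared"].mean() * 100

print(round(clearance_rate, 1))
print(rate_by_type.round(1))
```

The same two lines should work on the full dataset once `cleared` is confirmed to hold only 0 and 1.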


In [1]:
# import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# load the data
data = pd.read_csv('../data/crime_data_cleaned.csv')

data.shape

(549861, 37)

### Fix TIMEOCC column
There is a small formatting issue. The times use a 24-hour clock, but values before 10:00 are stored without leading zeros (for example, 2:20 AM appears as 220 rather than 0220).
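
One possible way to restore the leading zeros is to zero-pad the values to four characters. This is a sketch on a standalone Series, not necessarily the fix applied to the dataset; it assumes `TIMEOCC` is stored as an integer in HHMM form, and the variable names `times`, `padded`, `hour`, and `minute` are illustrative.

```python
import pandas as pd

# A few TIMEOCC-style values (HHMM stored as integers, so leading zeros are lost).
times = pd.Series([220, 1200, 429, 1, 1540], name="TIMEOCC")

# Convert to string and left-pad with zeros to a fixed width of 4.
padded = times.astype(str).str.zfill(4)

# With a fixed width, hour and minute can be sliced out by position.
hour = padded.str[:2].astype(int)
minute = padded.str[2:].astype(int)

print(padded.tolist())
```

Keeping the padded column as a string preserves the zeros; converting it back to an integer would drop them again.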

In [5]:
# select the columns for the data model
# .copy() gives an independent DataFrame, so later column assignments
# do not trigger pandas' SettingWithCopyWarning
model = data.loc[:, ['AREANAME','CrmCdDesc','VictAge','VictSex','VictDescent',
                     'PremisDesc','WeaponDesc','crime_category','YEAR','MONTH',
                     'DAY','HOUR','TIMEOCC','TIME_OF_DAY','REPORTING_DELAY','crime_type',
                     'cleared']].copy()

model.sample(5)

Unnamed: 0,AREANAME,CrmCdDesc,VictAge,VictSex,VictDescent,PremisDesc,WeaponDesc,crime_category,YEAR,MONTH,DAY,HOUR,TIMEOCC,TIME_OF_DAY,REPORTING_DELAY,crime_type,cleared
10209,Newton,BURGLARY,0,X,X,OTHER BUSINESS,,BURGLARY,2020,4,24,2,220,Early Morning,0,Property,0
309423,Wilshire,THEFT PLAIN - PETTY ($950 & UNDER),66,M,W,"MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC)",,THEFT,2022,1,24,12,1200,Morning,5,Property,1
45260,Foothill,THEFT PLAIN - PETTY ($950 & UNDER),52,F,W,SINGLE FAMILY DWELLING,,THEFT,2020,6,25,16,1600,Afternoon,63,Property,1
329462,Devonshire,THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND ...,56,M,H,STREET,,MOTOR VEHICLE THEFT,2022,4,21,15,1540,Afternoon,2,Property,0
231817,Devonshire,THEFT PLAIN - PETTY ($950 & UNDER),35,M,X,MAIL BOX,,THEFT,2021,7,30,4,429,Early Morning,3,Property,0


In [3]:
# resolve nulls
for col in ['CrmCdDesc','WeaponDesc','PremisDesc','VictSex','VictDescent']:
    model[col] = model[col].fillna('missing')

In [6]:
model.loc[model.TIME_OF_DAY.isnull()]

Unnamed: 0,AREANAME,CrmCdDesc,VictAge,VictSex,VictDescent,PremisDesc,WeaponDesc,crime_category,YEAR,MONTH,DAY,HOUR,TIMEOCC,TIME_OF_DAY,REPORTING_DELAY,crime_type,cleared
4,Mission,SODOMY/SEXUAL CONTACT B/W PENIS OF ONE PERS TO...,8,F,H,SINGLE FAMILY DWELLING,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",RAPE,2020,10,1,0,1,,558,Violent,0
44,77th Street,BURGLARY,72,M,B,SINGLE FAMILY DWELLING,,BURGLARY,2020,8,15,0,1,,926,Property,0
51,Hollenbeck,"RAPE, FORCIBLE",10,F,H,SINGLE FAMILY DWELLING,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",RAPE,2020,4,1,0,5,,751,Violent,0
53,Southeast,ORAL COPULATION,6,F,B,SINGLE FAMILY DWELLING,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",RAPE,2020,1,1,0,1,,1001,Violent,1
56,Harbor,"ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT",26,F,H,ALLEY,UNKNOWN FIREARM,AGGRAVATED ASSAULT,2020,12,1,0,1,,478,Violent,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
549820,Hollywood,HUMAN TRAFFICKING - INVOLUNTARY SERVITUDE,33,F,H,OTHER BUSINESS,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",HUMAN TRAFFICKING,2022,4,1,0,1,,691,Other,0
549822,Southeast,HUMAN TRAFFICKING - INVOLUNTARY SERVITUDE,18,F,B,STREET,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",HUMAN TRAFFICKING,2022,8,14,0,15,,0,Other,0
549832,West LA,HUMAN TRAFFICKING - INVOLUNTARY SERVITUDE,25,M,A,PARKING LOT,VERBAL THREAT,HUMAN TRAFFICKING,2023,12,29,0,25,,0,Other,0
549841,Rampart,HUMAN TRAFFICKING - INVOLUNTARY SERVITUDE,16,M,H,MOTEL,,HUMAN TRAFFICKING,2023,3,11,0,1,,158,Other,0
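
Every row displayed above has `HOUR` equal to 0. The notebook does not show how `TIME_OF_DAY` was derived, but one plausible cause, consistent with the sampled rows (hour 12 labelled Morning, hour 0 left null), is binning with `pd.cut` using right-closed intervals that exclude 0. The sketch below illustrates that behaviour and a possible fix; the bin edges are a guess, and the `"Night"` label is an assumption that does not appear in the data shown.

```python
import pandas as pd

hours = pd.Series([0, 2, 4, 12, 15, 16, 23], name="HOUR")

# Assumed bin edges and labels; "Night" is a placeholder label, not taken from the data.
bins = [0, 6, 12, 18, 24]
labels = ["Early Morning", "Morning", "Afternoon", "Night"]

# Default intervals are right-closed: (0, 6], (6, 12], ... so hour 0 falls outside and becomes NaN.
without_lowest = pd.cut(hours, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left, so hour 0 lands in the first bin.
with_lowest = pd.cut(hours, bins=bins, labels=labels, include_lowest=True)

print(without_lowest.isna().sum(), with_lowest.isna().sum())
```

If the nulls come from a different derivation, filling them directly from `HOUR` would be an alternative.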


In [None]:
# export to csv
model.to_csv('../data/crime_data_model.csv', index=False)