## Objective

I am mainly interested in which factors are most strongly associated with an issue being unresolved. A random forest seems like the most appropriate model choice, because I can look at its variable importance plot.

Logistic regression is a close second, but random forests usually have better predictive performance.
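As a rough illustration of the variable-importance idea, here is a minimal sketch on synthetic data. It assumes scikit-learn's `RandomForestClassifier`; the feature names `informative` and `noise` and the toy target are made up for the example and are not columns of the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000

# Two hypothetical features: one drives the target, one is pure noise.
informative = rng.uniform(0, 30, n)
noise = rng.normal(size=n)

# Synthetic stand-in for is_issue_unresolved, driven only by `informative`.
is_unresolved = (informative + rng.normal(scale=3, size=n)) > 20

X = np.column_stack([informative, noise])
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, is_unresolved)

# Impurity-based importances, one per feature, summing to 1.
importances = dict(zip(["informative", "noise"], forest.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Impurity-based importances tend to favor features with many distinct values, so the ranking is best read as a starting point rather than a final answer.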

Random forests can also do a better job of extracting signal from raw latitude and longitude. Logistic regression would need a linear relationship between latitude, longitude, and the log-odds of the issue being unresolved, which is a taller order.
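To make the linearity constraint concrete, the sketch below uses a logistic model with made-up coefficients and coordinates expressed as offsets from an arbitrary reference point (both are assumptions for illustration). It shows that the same step north shifts the log-odds by the same amount wherever it is taken.

```python
import math

# Made-up coefficients, chosen only for illustration.
B0, B_LAT, B_LONG = -2.0, 3.0, -1.5

def p_unresolved(lat, long):
    """Probability under a logistic model that is linear in lat and long."""
    z = B0 + B_LAT * lat + B_LONG * long
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    return math.log(p / (1.0 - p))

def lat_step_effect(lat, long, step=0.1):
    """Change in log-odds when moving `step` degrees north from (lat, long)."""
    return log_odds(p_unresolved(lat + step, long)) - log_odds(p_unresolved(lat, long))

# Two different starting points, same step north.
effect_south = lat_step_effect(-0.3, 0.2)
effect_north = lat_step_effect(0.4, -0.1)
print(effect_south, effect_north)  # both equal B_LAT * 0.1, up to rounding
```

A tree-based model has no such constraint: it can split on latitude and longitude repeatedly and isolate a small rectangular pocket of the map.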

In [1]:
import pandas as pd

In [5]:
df = pd.read_pickle('../data/data_w_transformed_census_and_removed_invalid_rows.pkl')
df.shape

(843352, 48)

In [3]:
df.head(1).T

| | 0 |
|---|---|
| CASE_ENQUIRY_ID | 101000958209 |
| OPEN_DT | 2013-11-01 09:27:19 |
| TARGET_DT | 2013-11-15 09:27:19 |
| CLOSED_DT | 2013-11-27 10:15:45 |
| OnTime_Status | OVERDUE |
| CASE_STATUS | Closed |
| CLOSURE_REASON | Case Closed Case Resolved |
| CASE_TITLE | Sign Repair |
| SUBJECT | Transportation - Traffic Division |
| REASON | Signs & Signals |


## Preprocessing

In [6]:
df['is_issue_unresolved'] = df.CLOSED_DT.isnull()
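
The target definition above treats a missing `CLOSED_DT` as "unresolved". A small sketch on a toy frame (the three rows are invented for illustration) shows how `isnull()` turns a missing close date into `True`, and how the mean of the flag gives the unresolved share:

```python
import pandas as pd

# Toy stand-in for the real data: the second issue has no close date.
toy = pd.DataFrame({
    "CASE_ENQUIRY_ID": [1, 2, 3],
    "CLOSED_DT": pd.to_datetime(
        ["2013-11-27 10:15:45", None, "2016-02-15 13:28:16"]
    ),
})

# Missing close dates become NaT, which isnull() flags as True.
toy["is_issue_unresolved"] = toy.CLOSED_DT.isnull()

# The mean of a boolean column is the fraction of True values.
unresolved_share = toy.is_issue_unresolved.mean()
print(toy)
print(unresolved_share)
```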

In [11]:
aa = df.head().copy()

In [20]:
# Sanity check: rows flagged OVERDUE even though CLOSED_DT is earlier than TARGET_DT
df[(df.OnTime_Status == 'OVERDUE') & (df.CLOSED_DT < df.TARGET_DT)].reset_index(drop=True)

| | CASE_ENQUIRY_ID | OPEN_DT | TARGET_DT | CLOSED_DT | OnTime_Status | CASE_STATUS | CLOSURE_REASON | CASE_TITLE | SUBJECT | REASON | ... | poverty_pop_w_food_stamps | poverty_pop_w_ssi | COMPLETION_TIME | school | housing | bedroom | value | rent | income | is_issue_unresolved |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101001716664 | 2016-02-14 10:43:00 | 2016-02-15 10:43:55 | 2016-02-15 01:28:16 | OVERDUE | Closed | Case Closed. Closed date : 2016-02-15 13:28:16... | Traffic Signal Inspection | Transportation - Traffic Division | Signs & Signals | ... | 0.076772 | 0.047244 | 14.754444 | 20_bachelors | rent | 1 | 1250000.0 | 1750 | 112500 | False |
| 1 | 101000733890 | 2013-01-09 10:50:30 | 2013-01-11 10:50:30 | 2013-01-11 02:21:18 | OVERDUE | Closed | Case Closed Case Resolved Reset 4 Bricks | Sidewalk Repair/Red Brick/Dist 7 | Public Works Department | Highway Maintenance | ... | 0.413793 | 0.066502 | 39.513333 | 18_some_college_no_degree | rent | 3 | 187500.0 | 1750 | 5001 | False |
| 2 | 101001712006 | 2016-02-08 09:09:00 | 2016-02-09 09:09:46 | 2016-02-09 01:14:26 | OVERDUE | Closed | Case Closed. Closed date : 2016-02-09 13:14:26... | Space Savers | Public Works Department | Sanitation | ... | 0.475962 | 0.143269 | 16.090556 | 15_hs_diploma | rent | 1 | 187500.0 | 325 | 12500 | False |
| 3 | 101000482095 | 2012-09-20 11:40:42 | 2012-09-24 11:40:41 | 2012-09-24 01:09:44 | OVERDUE | Closed | Case Closed Case Resolved Location has been ma... | Sidewalk Repair (Make Safe) | Public Works Department | Highway Maintenance | ... | 0.023055 | 0.000000 | 85.483889 | 15_hs_diploma | rent | 2 | 350000.0 | 1125 | 67500 | False |
| 4 | 101001751656 | 2016-03-22 06:10:10 | 2016-03-24 08:30:00 | 2016-03-24 02:38:23 | OVERDUE | Closed | Case Closed. Closed date : 2016-03-24 14:38:23... | Sidewalk Repair (Make Safe) | Public Works Department | Highway Maintenance | ... | 0.269341 | 0.071633 | 44.470278 | 15_hs_diploma | own | 2 | 350000.0 | 1375 | 87500 | False |
| 5 | 101001730826 | 2016-03-06 04:13:00 | 2016-03-14 08:30:00 | 2016-03-14 04:05:55 | OVERDUE | Closed | Case Closed. Closed date : 2016-03-14 16:05:55... | Ground Maintenance: Union Park | Parks & Recreation Department | Park Maintenance & Safety | ... | 0.148201 | 0.056115 | 191.881944 | 20_bachelors | own | 2 | 625000.0 | 1750 | 250000 | False |
| 6 | 101001730441 | 2016-03-05 11:44:07 | 2016-03-06 11:44:06 | 2016-03-06 03:04:28 | OVERDUE | Closed | Case Closed. Closed date : 2016-03-06 15:04:28... | Traffic Signal Inspection | Transportation - Traffic Division | Signs & Signals | ... | 0.076772 | 0.047244 | 15.339167 | 20_bachelors | rent | 1 | 1250000.0 | 1750 | 112500 | False |
| 7 | 101000499124 | 2012-10-29 05:17:37 | 2012-10-31 08:30:00 | 2012-10-31 04:54:14 | OVERDUE | Closed | Case Closed Case Resolved Tree debris removed... | Tree Emergencies | Parks & Recreation Department | Trees | ... | 0.057002 | 0.016186 | 47.610278 | 20_bachelors | rent | 1 | 625000.0 | 2750 | 250000 | False |
| 8 | 101000754019 | 2013-02-09 03:28:23 | 2013-02-13 08:30:00 | 2013-02-13 01:41:30 | OVERDUE | Closed | Case Closed Case Resolved | Request for Snow Plowing | Public Works Department | Street Cleaning | ... | 0.044619 | 0.026247 | 94.218611 | 15_hs_diploma | own | 4 | 350000.0 | 575 | 87500 | False |
| 9 | 101001415363 | 2015-06-23 08:03:40 | 2015-07-14 08:30:00 | 2015-07-14 01:12:21 | OVERDUE | Closed | Case Closed. Closed date : 2015-07-14 13:12:21... | Request for Recycling Cart | Public Works Department | Recycling | ... | 0.056977 | 0.046512 | 497.144722 | 20_bachelors | own | 2 | 625000.0 | 1750 | 250000 | False |
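
In several rows of the output above, the timestamp embedded in `CLOSURE_REASON` does not match `CLOSED_DT`. The sketch below compares the two for rows 0 and 2, using values copied from that output (the `CLOSURE_REASON` strings are truncated there, and the helper name `closed_date_gap` is made up for this example). One possible reading is that `CLOSED_DT` lost its AM/PM marker somewhere upstream; that is an assumption, not something the notebook confirms.

```python
import re
from datetime import datetime, timedelta

# Values copied from rows 0 and 2 of the output above.
rows = [
    {"TARGET_DT": "2016-02-15 10:43:55",
     "CLOSED_DT": "2016-02-15 01:28:16",
     "CLOSURE_REASON": "Case Closed. Closed date : 2016-02-15 13:28:16..."},
    {"TARGET_DT": "2016-02-09 09:09:46",
     "CLOSED_DT": "2016-02-09 01:14:26",
     "CLOSURE_REASON": "Case Closed. Closed date : 2016-02-09 13:14:26..."},
]

FMT = "%Y-%m-%d %H:%M:%S"
STAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def closed_date_gap(row):
    """Return (text timestamp minus CLOSED_DT, whether the text timestamp is past target)."""
    closed = datetime.strptime(row["CLOSED_DT"], FMT)
    target = datetime.strptime(row["TARGET_DT"], FMT)
    in_text = datetime.strptime(STAMP.search(row["CLOSURE_REASON"]).group(), FMT)
    return in_text - closed, in_text > target

results = [closed_date_gap(r) for r in rows]
for gap, past_target in results:
    print(gap, past_target)
```

For these two rows the gap is exactly 12 hours, and the timestamp from the text falls after `TARGET_DT`, which would be consistent with the `OVERDUE` status.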


In [None]:
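columns_to_drop = []  # placeholder: the original cell left the list of columns empty
df.drop(columns_to_drop, axis=1)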