## Exploratory Data Analysis
### Business/Domain Understanding
Emergency responder calls from the City of Cincinnati open data portal, https://data.cincinnati-oh.gov/

#### Start with why
We're looking at response times in different areas to determine which factors might affect response time. We don't have a hypothesis yet; developing one is part of this EDA process.

#### Then what
Patterns may indicate time periods or areas where additional EMTs are needed.

In [72]:
import pandas as pd
import numpy as np

In [73]:
fire_df = pd.read_csv("data/fire.csv")

### Data Understanding
#### What data do we have? What type? How complete/clean is it?

In [74]:
fire_df

,ADDRESS_X,LATITUDE_X,LONGITUDE_X,AGENCY,CREATE_TIME_INCIDENT,DISPOSITION_TEXT,EVENT_NUMBER,INCIDENT_TYPE_ID,INCIDENT_TYPE_DESC,NEIGHBORHOOD,ARRIVAL_TIME_PRIMARY_UNIT,BEAT,CLOSED_TIME_INCIDENT,DISPATCH_TIME_PRIMARY_UNIT,CFD_INCIDENT_TYPE,CFD_INCIDENT_TYPE_GROUP
0,XX WEST ST,39.147267,-84.511815,CF,06/01/2015 08:06:41 PM,MEDIC DISREGARD,FCF150601000178,6D2,BREATHING PROB-DIFF,AVONDALE,06/01/2015 08:09:43 PM,2331,06/01/2015 08:20:56 PM,06/01/2015 08:06:48 PM,ALS,BREATHING PROBLEMS
1,10XX SYCAMORE ST,39.106060,-84.509984,CF,07/30/2016 01:01:07 AM,MEDIC DISREGARD,FCF160730000013,12D2,SEIZURES-CONTINUOUS,DOWNTOWN,,1247,07/30/2016 01:05:16 AM,07/30/2016 01:01:22 AM,ALS,CONVULSIONS / SEIZURES
2,8XX POPLAR ST,39.114731,-84.528050,CF,10/21/2016 04:33:20 AM,MEDIC TR RESP & TRANSPORTED,FCF161021000029,6D1,BREATHING PROB-NOT A,WEST END,10/21/2016 04:36:25 AM,3267,10/21/2016 05:25:17 AM,10/21/2016 04:33:46 AM,ALS,BREATHING PROBLEMS
3,34XX TRIMBLE AV,39.140692,-84.467465,CF,09/25/2016 09:47:50 PM,MEDIC TR RESP & TRANSPORTED,FCF160925000183,HERONF,HEROIN OD,EVANSTON,09/25/2016 09:52:08 PM,4238,09/25/2016 10:36:30 PM,09/25/2016 09:48:09 PM,ALS,HERION OVERDOSE
4,9XX WILLIAM H TAFT RD,39.127066,-84.489397,CFD,12/25/2017 09:15:20 PM,MEDF: MT RESPONSE - FALSE,CFD171225000153,SICKF-COMBINED,,WALNUT HILLS,12/25/2017 09:20:01 PM,ST23,12/25/2017 09:22:14 PM,12/25/2017 09:17:07 PM,BLS,SICK PERSON
5,12XX CHAPEL ST,39.130968,-84.483490,CFD,11/16/2017 09:48:46 PM,MED: MT RESPONSE NO TRANSPORT,CFD171116000229,EMS,,WALNUT HILLS,11/16/2017 09:51:36 PM,ST23,11/16/2017 10:01:23 PM,11/16/2017 09:49:11 PM,BLS,MEDICAL EMERGENCY
6,18XX BEACON ST,39.086897,-84.380732,CFD,08/06/2017 12:40:35 AM,MED: MT RESPONSE NO TRANSPORT,CFD170806000004,17A4,PUBLIC ASSIST (NO INJURIES AND NO PRIORITY SYM...,MT. WASHINGTON,08/06/2017 12:46:21 AM,ST07,08/06/2017 12:50:52 AM,08/06/2017 12:41:03 AM,BLS,FALLS
7,26XX BUSHNELL ST,39.103393,-84.556751,CF,11/02/2015 09:05:34 PM,MEDIC TR RESP & TRANSPORTED,FCF151102000190,1A1,ABDOMINAL PAIN,EAST PRICE HILL,11/02/2015 09:10:21 PM,7134,11/02/2015 09:45:58 PM,11/02/2015 09:05:45 PM,BLS,ABDOMINAL PAIN / PROBLEMS
8,53XX HAMILTON AV,39.186674,-84.545864,CF,05/03/2015 04:26:28 PM,EXTINGUISHMENT,FCF150503000137,STRUCT,STRUCTURE FIRE,COLLEGE HILL,05/03/2015 04:29:43 PM,3951,05/03/2015 05:02:18 PM,05/03/2015 04:27:13 PM,FIRE,STRUCTURE FIRE
9,7XX DUTCH COLONY DR,39.191573,-84.517068,CF,05/19/2016 12:53:47 AM,EXTINGUISHMENT,FCF160519000008,STRUCT,STRUCTURE FIRE,WINTON HILLS,05/19/2016 01:01:25 AM,1663,05/19/2016 01:30:36 AM,05/19/2016 12:54:24 AM,FIRE,STRUCTURE FIRE


In [75]:
#What are the datatypes?
fire_df.dtypes

ADDRESS_X                      object
LATITUDE_X                    float64
LONGITUDE_X                   float64
AGENCY                         object
CREATE_TIME_INCIDENT           object
DISPOSITION_TEXT               object
EVENT_NUMBER                   object
INCIDENT_TYPE_ID               object
INCIDENT_TYPE_DESC             object
NEIGHBORHOOD                   object
ARRIVAL_TIME_PRIMARY_UNIT      object
BEAT                           object
CLOSED_TIME_INCIDENT           object
DISPATCH_TIME_PRIMARY_UNIT     object
CFD_INCIDENT_TYPE              object
CFD_INCIDENT_TYPE_GROUP        object
dtype: object

## pandas Data Structures

### Series
* A Series is a one-dimensional labeled array, similar to an associative array: it maps index labels to values. For example:

| index | value |
|-------|-------|
| 0 | -84.5099 |
| 1 | -84.48937 |
| 2 | -84.4432 |

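* You can assign your own index labels to a Series; by default, pandas uses a sequential integer index (0, 1, 2, ...)
* Most NumPy operations, such as vectorized arithmetic and ufuncs like `np.abs`, also work on pandas Series and preserve the index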
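A minimal sketch of the points above, using the three longitude values from the table. The labels `'a'`, `'b'`, `'c'` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Default index: pandas assigns sequential integers 0, 1, 2
lon = pd.Series([-84.5099, -84.48937, -84.4432])
print(lon.index.tolist())  # [0, 1, 2]

# Custom index: the labels here are hypothetical
lon_labeled = pd.Series([-84.5099, -84.48937, -84.4432], index=['a', 'b', 'c'])
print(lon_labeled['b'])  # -84.48937

# NumPy ufuncs and vectorized arithmetic apply element-wise and keep the index
print(np.abs(lon_labeled).tolist())
print((lon_labeled * 2).index.tolist())  # ['a', 'b', 'c']
```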

### DataFrame
* A DataFrame extends the Series to two dimensions: a table whose columns are Series that share the same row index. For example:

| index | longitude | latitude | agency |
|-------|-----------|----------|--------|
| 0 | -84.5099 | 39.1060 | CF |
| 1 | -84.48937 | 39.32011 | CFD |
| 2 | -84.4432 | 39.2993 | CF |
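The example table can be built as a DataFrame from a dict of columns; each column pulled back out is itself a Series that shares the DataFrame's row index. A small sketch using the values shown above:

```python
import pandas as pd

# Build the example table column by column; all columns share one row index
df = pd.DataFrame({
    'longitude': [-84.5099, -84.48937, -84.4432],
    'latitude': [39.1060, 39.32011, 39.2993],
    'agency': ['CF', 'CFD', 'CF'],
})

print(df.shape)                     # (3, 3): rows x columns
print(type(df['agency']).__name__)  # Series
print(df.dtypes)                    # float64, float64, object
```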

#### DataFrame indexes
* A DataFrame has two indexes: the row index (sequential integers by default) and the column index (the column names).
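Both indexes are attributes of the DataFrame. A sketch on the example table; the event IDs `E1` to `E3` are hypothetical, used only to show that the default row index can be replaced with meaningful labels:

```python
import pandas as pd

df = pd.DataFrame({
    'longitude': [-84.5099, -84.48937, -84.4432],
    'latitude': [39.1060, 39.32011, 39.2993],
    'agency': ['CF', 'CFD', 'CF'],
})

print(df.index)             # the default row index: RangeIndex(start=0, stop=3, step=1)
print(df.columns.tolist())  # the column index: ['longitude', 'latitude', 'agency']

# Replace the row index with hypothetical incident IDs
df_ids = df.set_index(pd.Index(['E1', 'E2', 'E3'], name='event'))
print(df_ids.loc['E2', 'agency'])  # CFD
```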

In [76]:
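#Selecting elements
# df.loc[row_label, 'column_name'] selects a single value
# (same result as the chained form fire_df['AGENCY'][3], without chained indexing)
print(fire_df.loc[3, 'AGENCY'])
# df.loc[row_label] selects all values of one observation (row)
fire_df.loc[35]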

CF


ADDRESS_X                             37XX EASTERN HILLS LN
LATITUDE_X                                          39.1467
LONGITUDE_X                                         -84.425
AGENCY                                                   CF
CREATE_TIME_INCIDENT                 08/19/2016 05:13:32 PM
DISPOSITION_TEXT                                  EMS FALSE
EVENT_NUMBER                                FCF160819000191
INCIDENT_TYPE_ID                                       32B2
INCIDENT_TYPE_DESC                      PERSON DOWN-MEDICAL
NEIGHBORHOOD                                         OAKLEY
ARRIVAL_TIME_PRIMARY_UNIT                               NaN
BEAT                                                   8423
CLOSED_TIME_INCIDENT                 08/19/2016 05:18:16 PM
DISPATCH_TIME_PRIMARY_UNIT           08/19/2016 05:13:43 PM
CFD_INCIDENT_TYPE                                       ALS
CFD_INCIDENT_TYPE_GROUP       UNKNOWN PROBLEM (PERSON DOWN)
Name: 35, dtype: object
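`loc` selects by index *label*, while `iloc` selects by integer *position*. On a freshly loaded DataFrame the two coincide, but after filtering rows (as this notebook does later) the surviving rows keep their original labels, which no longer match their positions. A toy sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({'EVENT': ['e0', 'e1', 'e2', 'e3'],
                   'AGENCY': ['CF', 'CFD', 'CF', 'CFD']})
subset = df[df['AGENCY'] == 'CFD']  # keeps the original labels 1 and 3

print(subset.index.tolist())     # [1, 3]
print(subset.loc[3, 'EVENT'])    # by label 3 -> e3
print(subset.iloc[0]['EVENT'])   # by position 0 -> e1
# subset.loc[0] would raise KeyError, because label 0 was filtered out
```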

In [77]:
#Check for completeness
incomp = fire_df[pd.isnull(fire_df).any(axis=1)]
comp = fire_df.dropna()
print(len(comp))
print(len(incomp))
print(len(fire_df))

#Consider which values are missing
incomp

164254
92054
256308


,ADDRESS_X,LATITUDE_X,LONGITUDE_X,AGENCY,CREATE_TIME_INCIDENT,DISPOSITION_TEXT,EVENT_NUMBER,INCIDENT_TYPE_ID,INCIDENT_TYPE_DESC,NEIGHBORHOOD,ARRIVAL_TIME_PRIMARY_UNIT,BEAT,CLOSED_TIME_INCIDENT,DISPATCH_TIME_PRIMARY_UNIT,CFD_INCIDENT_TYPE,CFD_INCIDENT_TYPE_GROUP
1,10XX SYCAMORE ST,39.106060,-84.509984,CF,07/30/2016 01:01:07 AM,MEDIC DISREGARD,FCF160730000013,12D2,SEIZURES-CONTINUOUS,DOWNTOWN,,1247,07/30/2016 01:05:16 AM,07/30/2016 01:01:22 AM,ALS,CONVULSIONS / SEIZURES
4,9XX WILLIAM H TAFT RD,39.127066,-84.489397,CFD,12/25/2017 09:15:20 PM,MEDF: MT RESPONSE - FALSE,CFD171225000153,SICKF-COMBINED,,WALNUT HILLS,12/25/2017 09:20:01 PM,ST23,12/25/2017 09:22:14 PM,12/25/2017 09:17:07 PM,BLS,SICK PERSON
5,12XX CHAPEL ST,39.130968,-84.483490,CFD,11/16/2017 09:48:46 PM,MED: MT RESPONSE NO TRANSPORT,CFD171116000229,EMS,,WALNUT HILLS,11/16/2017 09:51:36 PM,ST23,11/16/2017 10:01:23 PM,11/16/2017 09:49:11 PM,BLS,MEDICAL EMERGENCY
10,4XX HICKORY ST,39.143810,-84.498308,CF,03/30/2015 09:41:37 AM,CANCEL INCIDENT,FCF150330000045,INVEST,INVESTIGATION,AVONDALE,,2334,03/30/2015 09:45:11 AM,,FIRE,INVESTIGATION
14,3XX DIXMYTH AV,39.140237,-84.521451,CF,11/01/2016 11:28:19 AM,REMOVE HAZARD,FCF161101000074,LOCK,LOCK IN/LOCK OUT,CUF,,1966,11/01/2016 11:44:11 AM,11/01/2016 11:30:49 AM,FIRE,LOCK IN/LOCK OUT
17,17XX LLANFAIR AV,39.195653,-84.548339,CFD,04/26/2017 07:46:18 AM,IN: INVESTIGATION,CFD170426000042,FALARM,,COLLEGE HILL,04/26/2017 07:50:59 AM,ST51,04/26/2017 07:52:26 AM,04/26/2017 07:47:32 AM,FIRE,AUTOMATIC FIRE ALARM
24,9XX ENRIGHT AV,39.108593,-84.572825,CF,05/22/2015 09:22:10 PM,EMS DISREGARD,FCF150522000187,ACCIF,AUTO ACCIDENT INJURI,EAST PRICE HILL,,7215,05/22/2015 09:32:58 PM,05/22/2015 09:22:36 PM,BLS,ACCIDENT WITH INJURY - FIRE ONLY
30,S I71 AT WH TAFT,,,CF,07/01/2015 03:53:38 PM,MEDIC DISREGARD,FCF150701000131,ACCIF,AUTO ACCIDENT INJURI,,07/01/2015 04:05:17 PM,2625,07/01/2015 04:13:40 PM,07/01/2015 03:55:05 PM,BLS,ACCIDENT WITH INJURY - FIRE ONLY
31,28XX CENTRAL PY,,,CF,11/13/2015 05:24:01 PM,MEDIC TR RESP & TRANSPORTED,FCF151113000182,HERONF,HEROIN OD,,11/13/2015 05:24:29 PM,1461,11/13/2015 06:01:17 PM,11/13/2015 05:24:18 PM,ALS,HERION OVERDOSE
32,7XX N FRED SHUTTLESWORTH CIR,39.154002,-84.487091,CFD,09/20/2017 09:10:02 PM,MED: MT RESPONSE NO TRANSPORT,CFD170920000235,EMS,,AVONDALE,09/20/2017 09:14:14 PM,ST32,09/20/2017 09:19:50 PM,09/20/2017 09:12:15 PM,BLS,MEDICAL EMERGENCY
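Counting incomplete rows says how many observations have gaps, but not which fields are missing. A per-column count is a natural next step; on the real data it would be `fire_df.isnull().sum()`. A sketch on a toy frame with made-up values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NEIGHBORHOOD': ['AVONDALE', None, 'CUF'],
    'ARRIVAL_TIME_PRIMARY_UNIT': [np.nan, np.nan, '06/01/2015 08:09:43 PM'],
})

# isnull() marks each missing cell; sum() counts them column by column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# mean() of the same mask gives the share of rows missing each field
print(df.isnull().mean())
```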


In [78]:
#check for duplicate entries
fire_df[fire_df.duplicated()]

,ADDRESS_X,LATITUDE_X,LONGITUDE_X,AGENCY,CREATE_TIME_INCIDENT,DISPOSITION_TEXT,EVENT_NUMBER,INCIDENT_TYPE_ID,INCIDENT_TYPE_DESC,NEIGHBORHOOD,ARRIVAL_TIME_PRIMARY_UNIT,BEAT,CLOSED_TIME_INCIDENT,DISPATCH_TIME_PRIMARY_UNIT,CFD_INCIDENT_TYPE,CFD_INCIDENT_TYPE_GROUP
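`duplicated()` with no arguments only flags rows that are identical in every column. If `EVENT_NUMBER` is meant to identify an incident uniquely (an assumption worth checking against the portal's documentation), duplicates can also be checked on that column alone. A sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({
    'EVENT_NUMBER': ['FCF150601000178', 'FCF150601000178', 'CFD171225000153'],
    'DISPOSITION_TEXT': ['MEDIC DISREGARD', 'EMS FALSE', 'MED: MT RESPONSE NO TRANSPORT'],
})

print(df.duplicated().sum())                         # 0: no row is a full copy
print(df.duplicated(subset=['EVENT_NUMBER']).sum())  # 1: one repeated event ID

# keep=False marks every member of a duplicate group, for inspection
print(df[df.duplicated(subset=['EVENT_NUMBER'], keep=False)])
```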


In [79]:
# Select only the observations that are missing a crucial value (here, NEIGHBORHOOD)
no_nabe = fire_df[pd.isnull(fire_df['NEIGHBORHOOD'])]
print(len(no_nabe))

21658
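`dropna()` with no arguments (used above to build `comp`) discards a row if *any* column is missing, which costs 92,054 rows here. To drop only rows that lack a crucial field, pass `subset`. A sketch on toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'NEIGHBORHOOD': ['AVONDALE', None, 'CUF'],
    'INCIDENT_TYPE_DESC': [np.nan, 'HEROIN OD', 'ABDOMINAL PAIN'],
})

print(len(df.dropna()))                         # 1: any missing value drops the row
print(len(df.dropna(subset=['NEIGHBORHOOD'])))  # 2: only a missing NEIGHBORHOOD drops it
```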


In [80]:
# What if we want to convert the data type?
# The timestamps are strings in the format %m/%d/%Y %I:%M:%S %p (12-hour clock with AM/PM).
# Parsing them into datetime64 enables date arithmetic; pandas displays the result
# in 24-hour ISO format (%Y-%m-%d %H:%M:%S). Missing values become NaT.
time_cols = ['CREATE_TIME_INCIDENT', 'ARRIVAL_TIME_PRIMARY_UNIT',
             'CLOSED_TIME_INCIDENT', 'DISPATCH_TIME_PRIMARY_UNIT']
for col in time_cols:
    fire_df[col] = pd.to_datetime(fire_df[col], format='%m/%d/%Y %I:%M:%S %p')
fire_df.dtypes

ADDRESS_X                             object
LATITUDE_X                           float64
LONGITUDE_X                          float64
AGENCY                                object
CREATE_TIME_INCIDENT          datetime64[ns]
DISPOSITION_TEXT                      object
EVENT_NUMBER                          object
INCIDENT_TYPE_ID                      object
INCIDENT_TYPE_DESC                    object
NEIGHBORHOOD                          object
ARRIVAL_TIME_PRIMARY_UNIT     datetime64[ns]
BEAT                                  object
CLOSED_TIME_INCIDENT          datetime64[ns]
DISPATCH_TIME_PRIMARY_UNIT    datetime64[ns]
CFD_INCIDENT_TYPE                     object
CFD_INCIDENT_TYPE_GROUP               object
dtype: object
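In Python's `strptime` directives, `%p` only adjusts hours parsed with the 12-hour `%I`; the 24-hour `%H` ignores it, which is why the format string needs `%I` for these AM/PM timestamps. A quick check: the first string is row 0's `CREATE_TIME_INCIDENT`, the second is made up to exercise the 12 AM case:

```python
import pandas as pd

fmt = '%m/%d/%Y %I:%M:%S %p'  # 12-hour clock with an AM/PM marker
raw = pd.Series(['06/01/2015 08:06:41 PM', '07/30/2016 12:05:00 AM', None])
parsed = pd.to_datetime(raw, format=fmt)

# 08:06:41 PM -> 20:06:41, 12:05:00 AM -> 00:05:00, missing -> NaT
print(parsed)
```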

In [81]:
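# Cool. How long did each response take?
# TTA (time to arrive): primary unit dispatched -> primary unit arrived
# TTC (time to close): incident created -> incident closed
# Subtracting two datetime64 columns already yields timedelta64 values.
TTA = fire_df['ARRIVAL_TIME_PRIMARY_UNIT'] - fire_df['DISPATCH_TIME_PRIMARY_UNIT']
TTC = fire_df['CLOSED_TIME_INCIDENT'] - fire_df['CREATE_TIME_INCIDENT']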


In [82]:
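## What if we want the average response time, or the distribution of response times?
## We'll want a consistent unit of measurement, so convert to minutes.
## Use total_seconds(): .dt.seconds returns only the seconds component (0-86399),
## which drops whole days and turns negative durations into large positive ones.
print(TTA.dt.total_seconds() / 60)
#print(TTC)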

0          2.916667
1               NaN
2          2.650000
3          3.983333
4          2.900000
5          2.416667
6          5.300000
7          4.600000
8          2.500000
9          7.016667
10              NaN
11         5.133333
12         5.166667
13         7.250000
14              NaN
15         4.450000
16         4.883333
17         3.450000
18         3.333333
19         7.800000
20         7.183333
21         4.916667
22         9.400000
23        26.383333
24              NaN
25         9.433333
26         9.216667
27         5.400000
28         2.916667
29         4.300000
            ...    
256278     4.133333
256279    12.150000
256280     7.133333
256281     4.716667
256282          NaN
256283     5.300000
256284     0.050000
256285    18.350000
256286          NaN
256287     5.316667
256288     6.333333
256289     4.133333
256290     4.466667
256291     7.366667
256292     3.800000
256293     5.266667
256294     3.100000
256295    18.500000
256296          NaN
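Why `total_seconds()` rather than `.dt.seconds`: the latter returns only the seconds *component* of each timedelta, so a duration of a day or more loses its days, and a negative duration (stored as `-1 days +23:55:00`) comes back as a large positive number. Dividing by a one-minute `np.timedelta64`, as the Data Preparation cells do, gives the same result as `total_seconds() / 60`. A toy comparison:

```python
import numpy as np
import pandas as pd

durations = pd.Series([pd.Timedelta(minutes=2),
                       pd.Timedelta(minutes=-5),
                       pd.Timedelta(days=1, minutes=2)])

print((durations.dt.seconds / 60).tolist())          # [2.0, 1435.0, 2.0]
print((durations.dt.total_seconds() / 60).tolist())  # [2.0, -5.0, 1442.0]
print((durations / np.timedelta64(1, 'm')).tolist()) # [2.0, -5.0, 1442.0]
```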


In [83]:
pd.unique(fire_df['INCIDENT_TYPE_DESC'])

array(['BREATHING PROB-DIFF', 'SEIZURES-CONTINUOUS',
       'BREATHING PROB-NOT A', 'HEROIN OD', nan,
       'PUBLIC ASSIST (NO INJURIES AND NO PRIORITY SYMPTOMS)',
       'ABDOMINAL PAIN', 'STRUCTURE FIRE', 'INVESTIGATION',
       'SICK PERSON-ABNORMAL', 'LOCK IN/LOCK OUT', 'SICK PERSON-VOMITING',
       'SICK PERSON/NO PRIOR', 'HEADACHE-NUMBNESS',
       'ALTERED LEVEL OF CONSCIOUSNESS', 'MEDICAL EMERGENCY **',
       'UNCONSCIOUS', 'ABDOM PAIN-FM W/PAIN', 'AUTO ACCIDENT INJURI',
       'CARDIAC/RESPIRATORY/', 'DIFFICULTY SPEAKING BETWEEN BREATHS',
       'UNKNOWN STATUS/OTHER CODES NOT APPLICABLE', 'FALARM',
       'SICK PERSON-SICKLE C', 'PERSON DOWN-MEDICAL', 'WIRES DOWN',
       'OTHER PAIN (NON-OMEGA-LEVEL)', 'PERSON DOWN AND OUT',
       'HEART PROBLEM-DIFFIC', 'FALLS - PUBLIC ASSIS',
       'ABDOMINAL PAIN/CRAMPING (< 6 MONTHS/24 WEEKS AND NO FETUS OR TISSUE)',
       'SICK PERSON-DIZZINES', 'SICK PERSON-NOT ALER', 'NOT ALERT',
       'SEVERE EYE INJURIES',
       'NO PRIORITY
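
`pd.unique` lists the distinct values but not how often each occurs. `value_counts()` answers both, and `dropna=False` also counts missing descriptions. A sketch on made-up values:

```python
import numpy as np
import pandas as pd

desc = pd.Series(['HEROIN OD', 'ABDOMINAL PAIN', 'HEROIN OD', np.nan])

counts = desc.value_counts(dropna=False)  # most frequent first, NaN included
print(counts)
print(desc.nunique())  # 2 distinct non-missing values
```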

### Selecting rows with booleans
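What if we only want a subset of the data, say the area around UC (the University of Cincinnati) at 39.1329219, -84.5149504? As a first pass, the filter below keeps rows whose latitude lies between UC's latitude and 39.19. Note that this is a latitude band north of UC rather than a true radius, since longitude is not constrained.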


In [84]:
near_uc = fire_df.loc[(fire_df['LATITUDE_X'] > 39.1329) & (fire_df['LATITUDE_X'] < 39.19)] 

In [85]:
near_uc

,ADDRESS_X,LATITUDE_X,LONGITUDE_X,AGENCY,CREATE_TIME_INCIDENT,DISPOSITION_TEXT,EVENT_NUMBER,INCIDENT_TYPE_ID,INCIDENT_TYPE_DESC,NEIGHBORHOOD,ARRIVAL_TIME_PRIMARY_UNIT,BEAT,CLOSED_TIME_INCIDENT,DISPATCH_TIME_PRIMARY_UNIT,CFD_INCIDENT_TYPE,CFD_INCIDENT_TYPE_GROUP
0,XX WEST ST,39.147267,-84.511815,CF,2015-06-01 20:06:41,MEDIC DISREGARD,FCF150601000178,6D2,BREATHING PROB-DIFF,AVONDALE,2015-06-01 20:09:43,2331,2015-06-01 20:20:56,2015-06-01 20:06:48,ALS,BREATHING PROBLEMS
3,34XX TRIMBLE AV,39.140692,-84.467465,CF,2016-09-25 21:47:50,MEDIC TR RESP & TRANSPORTED,FCF160925000183,HERONF,HEROIN OD,EVANSTON,2016-09-25 21:52:08,4238,2016-09-25 22:36:30,2016-09-25 21:48:09,ALS,HERION OVERDOSE
8,53XX HAMILTON AV,39.186674,-84.545864,CF,2015-05-03 16:26:28,EXTINGUISHMENT,FCF150503000137,STRUCT,STRUCTURE FIRE,COLLEGE HILL,2015-05-03 16:29:43,3951,2015-05-03 17:02:18,2015-05-03 16:27:13,FIRE,STRUCTURE FIRE
10,4XX HICKORY ST,39.143810,-84.498308,CF,2015-03-30 09:41:37,CANCEL INCIDENT,FCF150330000045,INVEST,INVESTIGATION,AVONDALE,NaT,2334,2015-03-30 09:45:11,NaT,FIRE,INVESTIGATION
11,55XX CHANDLER ST,39.164254,-84.396943,CF,2015-11-20 20:36:04,MEDIC TR RESP & TRANSPORTED,FCF151120000188,6D2,BREATHING PROB-DIFF,MADISONVILLE,2015-11-20 20:42:04,8371,2015-11-20 21:35:57,2015-11-20 20:36:56,ALS,BREATHING PROBLEMS
12,30XX COSTELLO AV,39.141190,-84.570943,CFD,2017-10-08 08:53:13,MEDT: MEDIC TRANSPORT,CFD171008000063,17A4,PUBLIC ASSIST (NO INJURIES AND NO PRIORITY SYM...,EAST WESTWOOD,2017-10-08 09:00:06,ST35,2017-10-08 09:49:53,2017-10-08 08:54:56,BLS,FALLS
14,3XX DIXMYTH AV,39.140237,-84.521451,CF,2016-11-01 11:28:19,REMOVE HAZARD,FCF161101000074,LOCK,LOCK IN/LOCK OUT,CUF,NaT,1966,2016-11-01 11:44:11,2016-11-01 11:30:49,FIRE,LOCK IN/LOCK OUT
20,27XX QUEENSWOOD DR,39.133209,-84.583048,CFD,2017-10-31 11:36:06,MEDT: MEDIC TRANSPORT,CFD171031000111,26C1,ALTERED LEVEL OF CONSCIOUSNESS,WESTWOOD,2017-10-31 11:43:34,ST35,2017-10-31 12:28:37,2017-10-31 11:36:23,ALS,SICK PERSON
26,48XX ESTE AV,39.180517,-84.509655,CF,2016-10-04 00:58:16,MEDIC TR RESP & TRANSPORTED,FCF161004000013,1A1,ABDOMINAL PAIN,WINTON HILLS,2016-10-04 01:07:43,1657,2016-10-04 01:47:15,2016-10-04 00:58:30,BLS,ABDOMINAL PAIN / PROBLEMS
32,7XX N FRED SHUTTLESWORTH CIR,39.154002,-84.487091,CFD,2017-09-20 21:10:02,MED: MT RESPONSE NO TRANSPORT,CFD170920000235,EMS,,AVONDALE,2017-09-20 21:14:14,ST32,2017-09-20 21:19:50,2017-09-20 21:12:15,BLS,MEDICAL EMERGENCY
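A true radius needs a distance per row. One common choice is the haversine (great-circle) formula. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km and a hypothetical 2 km cutoff; the two rows are copied from the output above:

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

UC_LAT, UC_LON = 39.1329219, -84.5149504
RADIUS_KM = 2.0  # hypothetical cutoff

df = pd.DataFrame({
    'ADDRESS_X': ['XX WEST ST', '34XX TRIMBLE AV'],
    'LATITUDE_X': [39.147267, 39.140692],
    'LONGITUDE_X': [-84.511815, -84.467465],
})

# Vectorized over the whole column; rows with missing coordinates get NaN,
# and NaN <= RADIUS_KM is False, so those rows are excluded
df['DIST_UC_KM'] = haversine_km(df['LATITUDE_X'], df['LONGITUDE_X'], UC_LAT, UC_LON)
near_uc_radius = df[df['DIST_UC_KM'] <= RADIUS_KM]
print(near_uc_radius)
```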


In [86]:
cuf = fire_df.loc[fire_df['NEIGHBORHOOD'] == 'CUF']

In [87]:
cuf

,ADDRESS_X,LATITUDE_X,LONGITUDE_X,AGENCY,CREATE_TIME_INCIDENT,DISPOSITION_TEXT,EVENT_NUMBER,INCIDENT_TYPE_ID,INCIDENT_TYPE_DESC,NEIGHBORHOOD,ARRIVAL_TIME_PRIMARY_UNIT,BEAT,CLOSED_TIME_INCIDENT,DISPATCH_TIME_PRIMARY_UNIT,CFD_INCIDENT_TYPE,CFD_INCIDENT_TYPE_GROUP
14,3XX DIXMYTH AV,39.140237,-84.521451,CF,2016-11-01 11:28:19,REMOVE HAZARD,FCF161101000074,LOCK,LOCK IN/LOCK OUT,CUF,NaT,1966,2016-11-01 11:44:11,2016-11-01 11:30:49,FIRE,LOCK IN/LOCK OUT
51,2XX CALHOUN ST,39.127472,-84.515784,CFD,2017-05-13 06:09:36,MEDD: MT DISREGARDED,CFD170513000047,16B1,SEVERE EYE INJURIES,CUF,NaT,ST19,2017-05-13 06:39:18,2017-05-13 06:10:58,ALS,EYE PROBLEMS / INJURIES
59,7XX STRAIGHT ST,39.131341,-84.529589,CF,2017-01-18 22:03:41,TREATED BY COMPANY/NO TRNSPORT,FCF170118000211,30A1,TRAUMATIC INJURY-ANK,CUF,NaT,1441,2017-01-18 22:21:51,2017-01-18 22:04:58,BLS,TRAUMATIC INJURIES (SPECIFIC)
82,5XX W MCMILLAN ST,39.128316,-84.527813,CFD,2017-11-12 05:35:18,IN: INVESTIGATION,CFD171112000028,WIRES,,CUF,2017-11-12 05:42:12,ST12,2017-11-12 06:50:48,2017-11-12 05:35:48,FIRE,WIRES DOWN/ARCING
93,XX W MCMILLAN ST,39.127809,-84.512208,CF,2017-01-23 13:04:34,MEDIC TR RESP- NO TRANSPORT,FCF170123000120,32B2,PERSON DOWN-MEDICAL,CUF,2017-01-23 13:08:09,1949,2017-01-23 13:12:57,2017-01-23 13:04:52,ALS,UNKNOWN PROBLEM (PERSON DOWN)
118,25XX CLIFTON AV,39.129378,-84.518062,CF,2015-10-12 01:01:34,MEDIC TR RESP & TRANSPORTED,FCF151012000009,10D4,CHEST PAIN-CLAMMY,CUF,2015-10-12 01:06:01,1956,2015-10-12 01:38:12,2015-10-12 01:01:40,ALS,CHEST PAIN / CHEST DISCOMFORT (NON-TRAUMATIC)
129,STRATFORD AV/WARNER ST,39.125772,-84.522842,CF,2015-07-20 13:45:44,CANCEL INCIDENT,FCF150720000106,INFOF,TELETYPE MESSAGE,CUF,NaT,1423,2015-07-20 14:05:59,NaT,OTHE,INFORMATION TELETYPE-NO DISPATCH
174,23XX FLORA ST,39.127369,-84.523481,CF,2017-01-13 17:49:14,MEDIC TR RESP & TRANSPORTED,FCF170113000174,6D2,BREATHING PROB-DIFF,CUF,2017-01-13 17:54:14,1423,2017-01-13 18:53:42,2017-01-13 17:49:47,ALS,BREATHING PROBLEMS
187,CLIFTON AV/STRAIGHT ST,39.130599,-84.521371,CF,2015-05-14 22:25:46,CANCEL INCIDENT,FCF150514000191,INFOF,TELETYPE MESSAGE,CUF,NaT,1444,2015-05-14 22:29:42,NaT,OTHE,INFORMATION TELETYPE-NO DISPATCH
199,6XX PROBASCO ST,39.134660,-84.528164,CFD,2017-09-07 18:33:53,MEDT: MEDIC TRANSPORT,CFD170907000168,17B1,POSSIBLY DANGEROUS BODY AREA,CUF,2017-09-07 18:38:12,ST12,2017-09-07 19:28:54,2017-09-07 18:34:18,ALS,FALLS


In [88]:
cuf_grouped = cuf.groupby('INCIDENT_TYPE_DESC')

In [89]:
cuf_grouped.describe()

INCIDENT_TYPE_DESC,LATITUDE_X count,LATITUDE_X mean,LATITUDE_X std,LATITUDE_X min,LATITUDE_X 25%,LATITUDE_X 50%,LATITUDE_X 75%,LATITUDE_X max,LONGITUDE_X count,LONGITUDE_X mean,LONGITUDE_X std,LONGITUDE_X min,LONGITUDE_X 25%,LONGITUDE_X 50%,LONGITUDE_X 75%,LONGITUDE_X max
2ND TRIMESTER HEMORRHAGE OR MISCARRIAGE,3.0,39.123477,0.004271,39.118735,39.121705,39.124675,39.125848,39.127021,3.0,-84.522244,0.007684,-84.531049,-84.524918,-84.518788,-84.517842,-84.516895
ABDOM PAIN FM-W/FAIN,7.0,39.130976,0.004480,39.126324,39.127926,39.129930,39.132652,39.139425,7.0,-84.526379,0.004788,-84.532225,-84.529424,-84.527214,-84.523437,-84.519492
ABDOM PAIN-FAINTING>,2.0,39.126569,0.011061,39.118748,39.122658,39.126569,39.130479,39.134390,2.0,-84.522070,0.007403,-84.527304,-84.524687,-84.522070,-84.519452,-84.516835
ABDOM PAIN-MALE W/PA,1.0,39.129261,,39.129261,39.129261,39.129261,39.129261,39.129261,1.0,-84.525559,,-84.525559,-84.525559,-84.525559,-84.525559,-84.525559
ABDOMINAL PAIN,61.0,39.130644,0.005354,39.117815,39.127739,39.130226,39.133787,39.143195,61.0,-84.520512,0.005677,-84.531442,-84.523033,-84.520887,-84.516990,-84.509939
ABDOMINAL PAIN-NOT A,4.0,39.130229,0.004626,39.124919,39.128353,39.129905,39.131781,39.136185,4.0,-84.522079,0.005387,-84.526302,-84.525795,-84.523728,-84.520011,-84.514557
ABDOMINAL PAIN/CRAMPING (< 6 MONTHS/24 WEEKS AND NO FETUS OR TISSUE),1.0,39.123366,,39.123366,39.123366,39.123366,39.123366,39.123366,1.0,-84.525077,,-84.525077,-84.525077,-84.525077,-84.525077,-84.525077
ABNORMAL BEHAVIOR,4.0,39.131652,0.001142,39.130004,39.131384,39.131999,39.132267,39.132604,4.0,-84.526309,0.002501,-84.528238,-84.527788,-84.527158,-84.525679,-84.522684
ABNORMAL BREATHING,99.0,39.130782,0.005285,39.119429,39.127144,39.130611,39.134185,39.144761,99.0,-84.522977,0.005878,-84.533491,-84.528061,-84.522690,-84.518769,-84.511776
ABNORMAL BREATHING (PARTIAL OBSTRUCTION),3.0,39.131029,0.003565,39.128609,39.128983,39.129356,39.132240,39.135123,3.0,-84.524351,0.006933,-84.532194,-84.527006,-84.521817,-84.520429,-84.519041
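`describe()` on a grouped frame reports eight statistics for every numeric column, so the table gets wide quickly. When only a few statistics matter, selecting one column and calling `agg` keeps it narrow. A sketch on made-up values, assuming a `TTA` column in minutes like the one derived under Data Preparation:

```python
import pandas as pd

df = pd.DataFrame({
    'NEIGHBORHOOD': ['CUF', 'CUF', 'AVONDALE', 'AVONDALE', 'AVONDALE'],
    'TTA': [3.0, 5.0, 4.0, 6.0, 11.0],  # minutes, made-up values
})

# One row per neighborhood, one column per requested statistic
summary = df.groupby('NEIGHBORHOOD')['TTA'].agg(['count', 'median', 'mean'])
print(summary.sort_values('median', ascending=False))
```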


## Data Preparation
* Build the dataset(s) that will be used in our analysis

In [98]:
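## Add the derived variables to the main dataframe, in minutes.
## Dividing a timedelta by a one-minute timedelta gives fractional minutes (float64);
## missing timestamps (NaT) become NaN.
fire_df['TTA'] = TTA / np.timedelta64(1, 'm')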

In [99]:
fire_df['TTC'] = TTC / np.timedelta64(1, 'm')

In [100]:
fire_df.dtypes

ADDRESS_X                             object
LATITUDE_X                           float64
LONGITUDE_X                          float64
AGENCY                                object
CREATE_TIME_INCIDENT          datetime64[ns]
DISPOSITION_TEXT                      object
EVENT_NUMBER                          object
INCIDENT_TYPE_ID                      object
INCIDENT_TYPE_DESC                    object
NEIGHBORHOOD                          object
ARRIVAL_TIME_PRIMARY_UNIT     datetime64[ns]
BEAT                                  object
CLOSED_TIME_INCIDENT          datetime64[ns]
DISPATCH_TIME_PRIMARY_UNIT    datetime64[ns]
CFD_INCIDENT_TYPE                     object
CFD_INCIDENT_TYPE_GROUP               object
TTA                                  float64
TTC                                  float64
dtype: object

In [101]:
# use the describe() method to get a quick overview of the data
fire_df.describe()

,LATITUDE_X,LONGITUDE_X,TTA,TTC
count,232824.0,232824.0,214186.0,255972.0
mean,39.138725,-84.514307,-1139.377,-473.4132
std,0.371057,0.377967,40413.9,27060.02
min,-86.998941,-84.821111,-1448638.0,-1447193.0
25%,39.113487,-84.557758,3.333333,13.88333
50%,39.135,-84.515427,4.5,28.05
75%,39.158384,-84.484793,5.966667,50.73333
max,39.435651,42.998769,736.5333,132280.7


In [102]:
fire_df['TTA'].mean()

-1139.3768217032614

In [122]:
# Huh, what? A negative mean response time? Let's see where the non-positive durations are coming from
outliers = (fire_df[fire_df['TTA'] <= 0])
outliers2 = (fire_df[fire_df['TTC'] <= 0])
outliers2

,ADDRESS_X,LATITUDE_X,LONGITUDE_X,AGENCY,CREATE_TIME_INCIDENT,DISPOSITION_TEXT,EVENT_NUMBER,INCIDENT_TYPE_ID,INCIDENT_TYPE_DESC,NEIGHBORHOOD,ARRIVAL_TIME_PRIMARY_UNIT,BEAT,CLOSED_TIME_INCIDENT,DISPATCH_TIME_PRIMARY_UNIT,CFD_INCIDENT_TYPE,CFD_INCIDENT_TYPE_GROUP,TTA,TTC
71,13XX CHAPEL ST,39.131566,-84.478935,CFD,2017-05-10 06:33:54,CN: CANCEL,CFD170510000025,,,WALNUT HILLS,NaT,ST23,2017-05-10 06:33:54,NaT,,,,0.000000e+00
90,ARLINGTON ST / SPRING GROVE AV,39.144929,-84.541571,CFD,2017-07-02 02:38:30,CN: CANCEL,CFD170702000034,,,CAMP WASHINGTON,NaT,ST12,2017-07-02 02:38:30,NaT,,,,0.000000e+00
188,9XX MCPHERSON AV,39.110022,-84.570996,CFD,2017-07-01 00:52:51,CN: CANCEL,CFD170701000007,,,EAST PRICE HILL,NaT,ST17,2017-07-01 00:52:51,NaT,,,,0.000000e+00
311,6XX DUTCH COLONY DR,39.190582,-84.512866,CFD,2017-07-01 15:10:54,FADV: FIRE ADVISED,CFD170701000148,EMS,,WINTON HILLS,NaT,ST38,2017-07-01 15:10:54,NaT,BLS,MEDICAL EMERGENCY,,0.000000e+00
353,68XX WHITEHALL AV,39.074477,-84.371299,CFD,2017-06-03 12:00:33,CN: CANCEL,CFD170603000078,EMS,,MT. WASHINGTON,NaT,ST07,2017-06-03 12:00:33,NaT,BLS,MEDICAL EMERGENCY,,0.000000e+00
546,S I75 AT 2.8,39.128717,-84.534288,CFD,2018-01-18 07:20:34,CN: CANCEL,CFD180118000045,ACCI-COMBINED,,CAMP WASHINGTON,NaT,ST12,2018-01-18 07:20:34,NaT,BLS,ACCIDENT WITH INJURY,,0.000000e+00
626,20XX RADCLIFF DR,39.119547,-84.547696,CF,2016-07-17 08:32:00,FIRE ADVISED,FCF160717000070,FADV,FIRE ADVISED RUN,EAST PRICE HILL,NaT,5142,2016-07-17 08:32:00,NaT,OTHE,FIRE ADVISED - NO DISPATCH,,0.000000e+00
721,MOHAWK ST / STONEWALL ST,39.120039,-84.523892,CFD,2017-05-26 22:10:25,CN: CANCEL,CFD170526000229,,,OVER-THE-RHINE,NaT,ST05,2017-05-26 22:10:25,NaT,,,,0.000000e+00
861,1XX E 12TH ST,39.108604,-84.511834,CF,2015-02-25 12:33:27,FIRE ADVISED,FCF150225000074,FADV,FIRE ADVISED RUN,OVER-THE-RHINE,NaT,1267,2015-02-25 12:33:27,NaT,OTHE,FIRE ADVISED - NO DISPATCH,,0.000000e+00
915,15XX HOPPLE ST,,,CF,2016-03-10 17:56:24,FIRE ADVISED,FCF160310000162,FADV,FIRE ADVISED RUN,,NaT,3332,2016-03-10 17:56:24,NaT,OTHE,FIRE ADVISED - NO DISPATCH,,0.000000e+00


In [126]:
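# These rows have no dispatch or arrival time recorded (NaT), so TTA is NaN, and TTC is zero
# because the incident was closed the instant it was created.
# mean() skips NaN, so the huge negative values come from records where the later timestamp
# precedes the earlier one (e.g. arrival logged before dispatch).
# A non-positive duration isn't a usable response time, so drop those rows. Comparisons with NaN
# evaluate to False, so the same filter also removes the many NaN rows.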
print(len(fire_df))

clean_df = fire_df[fire_df['TTA'] > 0]
clean_df = clean_df[clean_df['TTC'] > 0]

print(len(clean_df))

256308
211773
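Why the `> 0` filter also removes missing values: any comparison with NaN evaluates to `False`, so rows whose `TTA` or `TTC` is NaN fail the mask. Both conditions can also be combined with `&` in a single step. A toy check with made-up values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'TTA': [3.5, np.nan, -1139.0, 4.5],
    'TTC': [20.0, 15.0, 30.0, 0.0],
})

print((df['TTA'] > 0).tolist())  # [True, False, False, True]: NaN > 0 is False

# Parentheses are required because & binds more tightly than >
clean = df[(df['TTA'] > 0) & (df['TTC'] > 0)]
print(clean.index.tolist())  # [0]
```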


In [127]:
clean_df.describe()

,LATITUDE_X,LONGITUDE_X,TTA,TTC
count,193033.0,193033.0,211773.0,211773.0
mean,39.140408,-84.51531,5.690749,41.015419
std,0.031858,0.057082,19.481538,311.709463
min,39.037702,-84.793474,0.016667,0.033333
25%,39.113778,-84.559152,3.366667,16.866667
50%,39.135413,-84.51551,4.516667,33.783333
75%,39.159464,-84.483728,5.983333,52.55
max,39.28921,-84.218196,736.533333,132280.733333


In [129]:
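# Sanity check: dispatch normally follows creation, so TTA > TTC means the unit
# "arrived" after the incident was closed. Flag those records.
clean_df[clean_df['TTA'] > clean_df['TTC']]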

,ADDRESS_X,LATITUDE_X,LONGITUDE_X,AGENCY,CREATE_TIME_INCIDENT,DISPOSITION_TEXT,EVENT_NUMBER,INCIDENT_TYPE_ID,INCIDENT_TYPE_DESC,NEIGHBORHOOD,ARRIVAL_TIME_PRIMARY_UNIT,BEAT,CLOSED_TIME_INCIDENT,DISPATCH_TIME_PRIMARY_UNIT,CFD_INCIDENT_TYPE,CFD_INCIDENT_TYPE_GROUP,TTA,TTC
5891,4XX RIDDLE RD,39.135592,-84.523027,CFD,2018-01-20 01:19:14,MEDT: MEDIC TRANSPORT,CFD180120000007,6D1,NOT ALERT,CUF,2018-01-20 00:52:01,ST34,2018-01-20 01:20:48,2018-01-20 00:48:11,ALS,BREATHING PROBLEMS,3.833333,1.566667
10530,3XX TERRACE AV,39.142508,-84.521764,CFD,2017-02-10 00:58:51,MEDT: MEDIC TRANSPORT,CFD170210000011,5A1,NON-TRAUMATIC BACK PAIN,CLIFTON,2017-02-10 13:02:24,ST34,2017-02-10 01:31:19,2017-02-10 00:59:58,BLS,BACK PAIN (NON-TRAUMATIC OR NON-RECENT TRAUMA),722.433333,32.466667
11809,2XX FOSDICK ST,39.131256,-84.506018,CFD,2018-01-11 14:02:26,FALA: FIRE FALSE ACCIDENTAL,CFD180111000148,FALARM,,CORRYVILLE,2018-01-11 14:08:12,ST19,2018-01-11 14:07:12,2018-01-11 14:02:43,FIRE,AUTOMATIC FIRE ALARM,5.483333,4.766667
13396,S I75 AT 3.4,39.138216,-84.534655,CFD,2017-09-12 07:16:19,GI: GOOD INTENT,CFD170912000053,VEH,,CAMP WASHINGTON,2017-09-12 07:24:47,ST12,2017-09-12 07:24:02,2017-09-12 07:16:56,FIRE,VEHICAL FIRE,7.850000,7.716667
14597,12XX ROSSMORE AV,39.178567,-84.476464,CFD,2017-01-26 20:27:16,MEDT: MEDIC TRANSPORT,CFD170126000210,6D2E,DIFFICULTY SPEAKING BETWEEN BREATHS,BOND HILL,2017-01-27 08:34:55,ST09,2017-01-26 21:39:07,2017-01-26 20:28:15,ALS,BREATHING PROBLEMS,726.666667,71.850000
15089,13XX TEAKWOOD AV,39.207020,-84.541768,CFD,2017-11-06 08:14:46,IN: INVESTIGATION,CFD171106000009,COLAPS-COMBINED,,COLLEGE HILL,2017-11-06 00:33:45,ST51,2017-11-06 08:19:04,2017-11-06 00:26:48,FIRE,STRUCTURAL OR TRENCH COLLAPSE,6.950000,4.300000
15596,7XX GRAND AV,39.104599,-84.559817,CFD,2017-01-24 13:07:33,CN: CANCEL,CFD170124000126,26A8,OTHER PAIN (NON-OMEGA-LEVEL),EAST PRICE HILL,2017-01-24 14:22:16,ST17,2017-01-24 14:20:06,2017-01-24 13:08:53,BLS,SICK PERSON,73.383333,72.550000
15660,MOHAWK PL / CENTRAL PKWY,39.120430,-84.524989,CFD,2017-12-02 16:44:33,EMSF: FALSE,CFD171202000150,PERDWN-COMBINED,,OVER-THE-RHINE,2017-12-02 16:47:17,ST29,2017-12-02 16:45:43,2017-12-02 16:44:53,BLS,PERSON DOWN,2.400000,1.166667
18764,ASHTREE DR / HAMILTON AV,39.178331,-84.544728,CFD,2017-10-15 15:28:35,MED: MT RESPONSE NO TRANSPORT,CFD171015000147,29B1,INJURIES,COLLEGE HILL,2017-10-15 15:30:19,ST20,2017-10-15 15:29:20,2017-10-15 15:28:42,ALS,TRAFFIC / TRANSPORTATION INCIDENTS,1.616667,0.750000
20207,84XX VINE ST,39.217031,-84.473460,CFD,2017-05-12 16:47:04,EMSD: DISREGARD,CFD170512000141,PERDWN-COMBINED,,HARTWELL,2017-05-12 16:55:20,ST02,2017-05-12 16:56:37,2017-05-12 16:44:10,BLS,PERSON DOWN,11.166667,9.550000


In [130]:
clean_df = clean_df[clean_df['TTA'] < clean_df['TTC']]

In [131]:
len(clean_df)

211542

In [133]:
clean_df.describe()

,LATITUDE_X,LONGITUDE_X,TTA,TTC
count,192802.0,192802.0,211542.0,211542.0
mean,39.1404,-84.515316,5.339079,41.03029
std,0.031862,0.057096,11.495314,311.877592
min,39.037702,-84.793474,0.016667,0.233333
25%,39.113772,-84.55917,3.366667,16.883333
50%,39.135394,-84.515514,4.516667,33.783333
75%,39.159463,-84.483758,5.983333,52.55
max,39.28921,-84.218196,731.916667,132280.733333
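Even after cleaning, `TTA` tops out near 732 minutes while its 75th percentile is about 6, so a long right tail remains. One common rule of thumb, Tukey's fences, flags values more than 1.5 × IQR beyond the quartiles; the 1.5 multiplier is conventional, not derived from this data. Applied to the quartiles above (3.37 and 5.98 minutes), the upper fence would be roughly 5.98 + 1.5 × 2.62 ≈ 9.9 minutes. A sketch on made-up values:

```python
import pandas as pd

def iqr_fences(s: pd.Series, k: float = 1.5):
    """Return (lower, upper) Tukey fences for a numeric Series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

tta = pd.Series([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 60.0])  # minutes, made up
low, high = iqr_fences(tta)
flagged = tta[(tta < low) | (tta > high)]
print(low, high, flagged.tolist())  # -1.5 12.5 [60.0]
```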


## Exploratory Visualization

### Looking for patterns and outliers
* Matplotlib is simple and easy to get started with
* It is not as adept at presentation and storytelling as Tableau
* Its `pyplot` interface was designed to feel similar to MATLAB's plotting commands
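A minimal matplotlib sketch of a typical first plot here: a histogram of response times. It assumes a `TTA` column in minutes like `clean_df['TTA']`; the values below are made up, and the non-interactive `Agg` backend plus an in-memory buffer let it run without a display:

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

tta = pd.Series([2.5, 3.4, 4.5, 4.6, 5.1, 6.0, 7.3, 12.0])  # minutes, made up

fig, ax = plt.subplots(figsize=(6, 4))
# 2-minute bins from 0 to 14; hist returns the counts, bin edges, and bar patches
counts, bin_edges, _ = ax.hist(tta, bins=range(0, 16, 2), edgecolor="black")
ax.set_xlabel("Time to arrive (minutes)")
ax.set_ylabel("Number of incidents")
ax.set_title("Distribution of response times")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # in a notebook, plt.show() would display it instead
plt.close(fig)
```

On the full dataset, a log-scaled y-axis (`ax.set_yscale("log")`) may help keep the long right tail visible alongside the bulk of responses.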