# Map and Categorize Variables for Matched DMV Crash and Hospital Data (MV104/SPARCS)

Data provided by DOHMH.

This notebook checks that the mapping script (databuild.py) is accurate for the matched data and shows counts for every category of each variable.
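The check used throughout is a two-column groupby: each derived `f_` variable is grouped together with the raw code it was built from, so every raw code appears next to the category it was assigned. A minimal sketch on a made-up frame (the rows are invented for illustration and are not taken from the matched data):

```python
import pandas as pd

# Toy stand-in for the ped/twoVeh tables; all values are invented.
toy = pd.DataFrame({
    'CI_ID': [1, 2, 3, 4, 5],
    'CI_SEX_CDE': ['F', 'M', 'm', 'F', None],
    'f_per_sex': ['female', 'male', 'male', 'female', None],
})

# Fill missing values first so they form their own group
# instead of being dropped by groupby.
check = toy.fillna('-').groupby(['f_per_sex', 'CI_SEX_CDE']).count()[['CI_ID']]
print(check)
```

Each row of the result is one (category, raw code) pair with its count, which is the layout of every table in this notebook.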

### Variables given - from crash data
date/time of crash
* date - date of crash

person injured
* f_per_age
* f_per_sex - male, female
* f_per_role (f_per_role_doh) - driver, passenger, pedestrian, bicyclist, motorcyclist (DOHMH was able to extract motorcyclists), unknown
* f_per_eject - ejected, not ejected, unknown
* f_per_loc - at intersection, not at intersection, unknown (majority are unknown)

injury information
* f_inj_status_num - 1-6, unknown
* f_inj_status - conscious states, not conscious states, death
* f_inj_type - 14 types, unknown
* f_inj_loc - 12 locations, unknown

road information
* f_road_light - 5 types, unknown
* f_road_surf_bi - dry, not dry, unknown
* f_road_surf - dry, flooded, muddy, slush, snow/ice, wet, unknown (almost all dry)
* f_road_weather - clear, cloudy, rain, snow, sleet, fog, other, unknown
* f_road_control - none, traffic signal, stop sign, other, unknown

vehicle information
* f_act_veh - action of the vehicle the person was in (categories still need to be narrowed down)
* f_veh - car, suburban, pickup, van, truck, pedestrian, bicyclist, unknown
* f_veh_doh - car/van/pickup, truck, taxi, bus, other(bike, motorcycle, ped)
* f_oveh_doh - same as above, but with two additional unknown codes (6 and 7)


In [1]:
import pandas as pd
pd.options.display.max_rows = 130
pd.options.display.max_columns = 130

import numpy as np

import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-poster')
plt.style.use('ggplot')

import databuild as db
import sys
sys.path.insert(0,'/home/deena/Documents/data_munge/ModaCode/')
import moda



In [2]:
# read in DMV data into 3 tables
crash,ind,veh = db.readDMV()
# pedestrians, bicyclists, single vehicle
ped, twoVeh = db.buildTablesDMV(crash,ind,veh)

#read in DMV-SPARCS linked data
linked = db.readLinked()

# include BISS data from linked (dropping anything not in linked)
ped = db.mergeBiss(ped,linked)
twoVeh = db.mergeBiss(twoVeh,linked)

#format and categorize variables
ped = db.formatVars(ped)
twoVeh = db.formatVars(twoVeh)

print 'pedestrians in linked crashes',ped.shape
print 'people in linked two-veh crashes',twoVeh.shape





full crash table (522108, 26)
full person table (1502797, 22)
full vehicle table (1092922, 20)
two veh (651501, 75)




pedestrians/bicyclists (police reported) (single vehicle) (95292, 80)
linked (76763, 131)
linked after dropping no police reports (69657, 131)
pedestrians in linked crashes (17624, 105)
people in linked two-veh crashes (36907, 97)
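The drop in row counts comes from keeping only people who also appear in the linked table. The internals of `db.mergeBiss` are not shown here, so the following is only a sketch of that kind of step using an inner merge; the toy tables, the `biss` column, and the use of `CI_ID` as the join key are assumptions for illustration:

```python
import pandas as pd

# Hypothetical miniature tables; 'CI_ID' as the join key is an assumption.
people = pd.DataFrame({'CI_ID': [1, 2, 3, 4],
                       'role': ['ped', 'ped', 'bike', 'ped']})
linked_toy = pd.DataFrame({'CI_ID': [2, 4, 5], 'biss': [9, 1, 4]})

# how='inner' keeps only rows whose key appears in both tables,
# so people without a linked record are dropped.
merged = people.merge(linked_toy, on='CI_ID', how='inner')
print(merged.shape)  # (2, 3)
```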


## variables
All transformed variables are prefixed with `f_`.
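The tables in this section are read by eye, but the same property can be tested automatically: a raw code should never land in more than one derived category. A small sketch of such a test; `inconsistent_codes` is a hypothetical helper (not part of databuild.py) and the toy rows are invented:

```python
import pandas as pd

def inconsistent_codes(df, raw_col, derived_col):
    """Return the raw codes that map to more than one derived category."""
    n_categories = df.groupby(raw_col)[derived_col].nunique()
    return n_categories[n_categories > 1].index.tolist()

# Invented rows; code 3 is deliberately mapped to two different categories.
toy = pd.DataFrame({
    'RDSRFT_ID': [1, 1, 2, 3, 3],
    'f_road_surf': ['Dry', 'Dry', 'Not Dry', 'Not Dry', 'Dry'],
})
print(inconsistent_codes(toy, 'RDSRFT_ID', 'f_road_surf'))  # [3]
```

An empty list means every raw code maps to a single category.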

### injury level: f_inj
### injury type

In [3]:
ped.fillna('-').groupby(['f_inj_type','INJT_ID']).count()[['CI_ID']]

f_inj_type,INJT_ID,CI_ID
Abrasion,11,1171
Amputation,1,27
Complaint of Pain,12,8983
Concusion,2,263
Contusion-Bruise,10,1421
Fracture-Dislocation,9,846
Internal,3,269
Minor Bleeding,4,1559
Minor Burn,6,57
Moderate/Severe Burn,7,21


In [4]:
twoVeh.fillna('-').groupby(['f_inj_type','INJT_ID']).count()[['CI_ID']]

f_inj_type,INJT_ID,CI_ID
Abrasion,11,558
Amputation,1,30
Complaint of Pain,12,20125
Concusion,2,219
Contusion-Bruise,10,735
Fracture-Dislocation,9,344
Internal,3,325
Minor Bleeding,4,1169
Minor Burn,6,128
Moderate/Severe Burn,7,26


### injury emotional state

In [5]:
ped.fillna('-').groupby(['f_inj_status','EMTNSTATT_CDE']).count()[['CI_ID']]

f_inj_status,EMTNSTATT_CDE,CI_ID
conscious states,5,480
conscious states,6,15587
death,1,68
not conscious states,2,285
not conscious states,3,410
not conscious states,4,172
unknown,-1,612
unknown,-2,1
unknown,-3,9


In [6]:
twoVeh.fillna('-').groupby(['f_inj_status','EMTNSTATT_CDE']).count()[['CI_ID']]

f_inj_status,EMTNSTATT_CDE,CI_ID
conscious states,5,634
conscious states,6,30826
death,1,16
not conscious states,2,143
not conscious states,3,211
not conscious states,4,110
unknown,-1,1058
unknown,-2,6
unknown,-3,3903


### injury location

In [7]:
ped.fillna('-').groupby(['f_inj_loc','INJLOCT_CDE']).count()[['CI_ID']]

f_inj_loc,INJLOCT_CDE,CI_ID
Abdomen-Pelvis,9,309
Back,6,1138
Chest,5,176
Elbow-Lower Arm-Hand,8,1464
Entire Body,12,2166
Eye,3,23
Face,2,504
Head,1,2662
Hip-Upper Leg,10,1663
Knee-Lower Leg-Foot,11,5159


In [8]:
twoVeh.fillna('-').groupby(['f_inj_loc','INJLOCT_CDE']).count()[['CI_ID']]

f_inj_loc,INJLOCT_CDE,CI_ID
Abdomen-Pelvis,9,524
Back,6,5701
Chest,5,1807
Elbow-Lower Arm-Hand,8,1468
Entire Body,12,4032
Eye,3,92
Face,2,912
Head,1,3950
Hip-Upper Leg,10,556
Knee-Lower Leg-Foot,11,2258


### person level: f_per
### person and driver sex

In [9]:
ped.fillna('_').groupby(['f_per_sex','CI_SEX_CDE']).count()[['CI_ID']]

f_per_sex,CI_SEX_CDE,CI_ID
female,F,7736
male,M,9888


In [10]:
ped.fillna('_').groupby(['f_driver_sex','CI_SEX_CDE_driver']).count()[['CI_ID']]

f_driver_sex,CI_SEX_CDE_driver,CI_ID
female,F,3423
female,f,1
male,M,10656
male,m,4
unknown,U,19
unknown,unknown,3521


In [11]:
twoVeh.fillna('_').groupby(['f_per_sex','CI_SEX_CDE']).count()[['CI_ID']]

f_per_sex,CI_SEX_CDE,CI_ID
female,F,17376
female,f,49
male,M,19450
male,m,32
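The sex tables show lower-case codes (`f`, `m`) folded into the same categories as `F` and `M`, and other values such as `U` reported as unknown. A minimal sketch of a mapping with that behaviour; `map_sex` is a hypothetical helper and may differ from what databuild.py actually does:

```python
def map_sex(code):
    """Map a raw sex code to 'female', 'male', or 'unknown'."""
    if not isinstance(code, str):
        return 'unknown'  # missing values (None/NaN) are not strings
    # Normalise case and whitespace before looking the code up.
    return {'F': 'female', 'M': 'male'}.get(code.strip().upper(), 'unknown')

for raw in ['F', 'f', 'M', 'm', 'U', None]:
    print(raw, '->', map_sex(raw))
```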


### person and driver age

In [12]:
ped.groupby(['f_per_age','INDIV_AGE']).count()[['CI_ID']]

f_per_age,INDIV_AGE,CI_ID
1.0,1.0,21
2.0,2.0,89
3.0,3.0,102
4.0,4.0,91
5.0,5.0,112
6.0,6.0,125
7.0,7.0,150
8.0,8.0,194
9.0,9.0,189
10.0,10.0,249


In [13]:
ped.groupby(['f_driver_age','INDIV_AGE_driver']).count()[['CI_ID']]

f_driver_age,INDIV_AGE_driver,CI_ID
14.0,14.0,1
15.0,15.0,2
16.0,16.0,7
17.0,17.0,36
18.0,18.0,55
19.0,19.0,126
20.0,20.0,144
21.0,21.0,165
22.0,22.0,221
23.0,23.0,250


In [14]:
twoVeh.groupby(['f_per_age','INDIV_AGE']).count()[['CI_ID']]

f_per_age,INDIV_AGE,CI_ID
0.0,0.0,2
1.0,1.0,85
2.0,2.0,97
3.0,3.0,107
4.0,4.0,148
5.0,5.0,155
6.0,6.0,152
7.0,7.0,164
8.0,8.0,179
9.0,9.0,197


In [15]:
ped.groupby(['f_per_age_dec','INDIV_AGE']).count()[['CI_ID']]

f_per_age_dec,INDIV_AGE,CI_ID
0.0,1.0,21
0.0,2.0,89
0.0,3.0,102
0.0,4.0,91
0.0,5.0,112
0.0,6.0,125
0.0,7.0,150
0.0,8.0,194
0.0,9.0,189
10.0,10.0,249


In [16]:
ped.groupby(['f_driver_age_dec','INDIV_AGE_driver']).count()[['CI_ID']]

f_driver_age_dec,INDIV_AGE_driver,CI_ID
10.0,14.0,1
10.0,15.0,2
10.0,16.0,7
10.0,17.0,36
10.0,18.0,55
10.0,19.0,126
20.0,20.0,144
20.0,21.0,165
20.0,22.0,221
20.0,23.0,250


In [17]:
twoVeh.groupby(['f_per_age_dec','INDIV_AGE']).count()[['CI_ID']]

f_per_age_dec,INDIV_AGE,CI_ID
0.0,0.0,2
0.0,1.0,85
0.0,2.0,97
0.0,3.0,107
0.0,4.0,148
0.0,5.0,155
0.0,6.0,152
0.0,7.0,164
0.0,8.0,179
0.0,9.0,197


In [18]:
ped.groupby(['f_per_age_bi','INDIV_AGE']).count()[['CI_ID']]

f_per_age_bi,INDIV_AGE,CI_ID
age < 70,1.0,21
age < 70,2.0,89
age < 70,3.0,102
age < 70,4.0,91
age < 70,5.0,112
age < 70,6.0,125
age < 70,7.0,150
age < 70,8.0,194
age < 70,9.0,189
age < 70,10.0,249


In [19]:
ped.groupby(['f_driver_age_bi','INDIV_AGE_driver']).count()[['CI_ID']]

f_driver_age_bi,INDIV_AGE_driver,CI_ID
age < 70,14.0,1
age < 70,15.0,2
age < 70,16.0,7
age < 70,17.0,36
age < 70,18.0,55
age < 70,19.0,126
age < 70,20.0,144
age < 70,21.0,165
age < 70,22.0,221
age < 70,23.0,250


In [20]:
twoVeh.groupby(['f_per_age_bi','INDIV_AGE']).count()[['CI_ID']]

f_per_age_bi,INDIV_AGE,CI_ID
age < 70,0.0,2
age < 70,1.0,85
age < 70,2.0,97
age < 70,3.0,107
age < 70,4.0,148
age < 70,5.0,155
age < 70,6.0,152
age < 70,7.0,164
age < 70,8.0,179
age < 70,9.0,197
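The age tables are consistent with two simple binnings: `*_age_dec` floors the age to the start of its decade (14 becomes 10, 23 becomes 20) and `*_age_bi` splits at 70. A sketch under those assumptions; both helpers are hypothetical, and the `'age >= 70'` and `'unknown'` labels are guesses because only `'age < 70'` appears in the rows shown:

```python
import math

def age_decade(age):
    """Floor an age to the start of its decade, e.g. 23 -> 20.0."""
    if age is None or (isinstance(age, float) and math.isnan(age)):
        return float('nan')  # keep missing ages missing
    return float(int(age) // 10 * 10)

def age_binary(age, cutoff=70):
    """Label an age as below or at/above the cutoff."""
    if age is None or (isinstance(age, float) and math.isnan(age)):
        return 'unknown'  # assumed label for missing ages
    # The '>=' label is an assumption; only the '<' label is visible above.
    return 'age < %d' % cutoff if age < cutoff else 'age >= %d' % cutoff

for age in [0, 9, 10, 19, 23, 69, 70]:
    print(age, age_decade(age), age_binary(age))
```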


### Crash role

In [21]:
ped.groupby(['f_per_role','CIROLET_ID']).count()[['CI_ID']]

f_per_role,CIROLET_ID,CI_ID
bicyclist,14,13
bicyclist,7,3478
pedestrian,6,14133


In [22]:
twoVeh.groupby(['f_per_role','CIROLET_ID']).count()[['CI_ID']]

f_per_role,CIROLET_ID,CI_ID
driver,1,23645
passenger,2,13000
unknown,11,262


### occupant ejected

In [23]:
twoVeh.fillna('-').groupby(['EJCTT_ID','f_per_eject']).count()[['CI_ID']]

EJCTT_ID,f_per_eject,CI_ID
-1,unknown,645
-2,unknown,3
-3,unknown,131
1,not ejected,35049
2,ejected,197
3,ejected,882
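Most of the categorical variables in this notebook follow the pattern seen above: a few raw codes are named explicitly and everything else, including the negative codes, becomes unknown. A minimal sketch of that pattern for the ejection codes; `EJECT_MAP` and `map_eject` are hypothetical names, and integer codes are assumed:

```python
# Codes taken from the table above; anything not listed becomes 'unknown'.
EJECT_MAP = {1: 'not ejected', 2: 'ejected', 3: 'ejected'}

def map_eject(code):
    """Map a raw EJCTT_ID code to its category, defaulting to 'unknown'."""
    return EJECT_MAP.get(code, 'unknown')

for raw in [-3, -2, -1, 1, 2, 3]:
    print(raw, '->', map_eject(raw))
```

Using a dictionary with a default keeps unexpected codes visible as unknown instead of raising an error or producing missing values.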


### pedestrian at intersection or not

In [24]:
ped.fillna('-').groupby(['f_per_loc','PBLOCT_ID']).count()[['CI_ID']]

f_per_loc,PBLOCT_ID,CI_ID
at intersection,1,3767
not at intersection,2,1321
unknown,-1,416
unknown,unknown,12120


### road conditions: f_road
### light conditions

In [25]:
ped.fillna('-').groupby(['f_road_light','LGHTCNDT_ID']).count()[['CI_ID']]

f_road_light,LGHTCNDT_ID,CI_ID
Dark-Road,4,5696
Dark-Road,5,119
Dawn/Dusk,2,372
Dawn/Dusk,3,779
Daylight,1,10481
unknown,-1,177


In [26]:
twoVeh.fillna('-').groupby(['f_road_light','LGHTCNDT_ID']).count()[['CI_ID']]

f_road_light,LGHTCNDT_ID,CI_ID
Dark-Road,4,10693
Dark-Road,5,114
Dawn/Dusk,2,784
Dawn/Dusk,3,1317
Daylight,1,23676
unknown,-1,323


### road surface

In [27]:
ped.fillna('-').groupby(['f_road_surf','RDSRFT_ID']).count()[['CI_ID']]

f_road_surf,RDSRFT_ID,CI_ID
Dry,1,13963
Not Dry,2,3218
Not Dry,3,6
Not Dry,4,181
Not Dry,5,69
Not Dry,6,9
unknown,-1,171
unknown,7,7


In [28]:
twoVeh.fillna('-').groupby(['f_road_surf','RDSRFT_ID']).count()[['CI_ID']]

f_road_surf,RDSRFT_ID,CI_ID
Dry,1,29878
Not Dry,2,6130
Not Dry,3,12
Not Dry,4,459
Not Dry,5,75
Not Dry,6,29
unknown,-1,312
unknown,7,12


### weather

In [29]:
ped.fillna('-').groupby(['f_road_weather','WTHRT_ID']).count()[['CI_ID']]

f_road_weather,WTHRT_ID,CI_ID
Clear,1,12716
Cloudy,2,2026
Cloudy,6,33
Percipitation,3,2373
Percipitation,4,203
Percipitation,5,69
unknown,-1,198
unknown,9,6


In [30]:
twoVeh.fillna('-').groupby(['f_road_weather','WTHRT_ID']).count()[['CI_ID']]

f_road_weather,WTHRT_ID,CI_ID
Clear,1,26999
Cloudy,2,4778
Cloudy,6,86
Percipitation,3,4152
Percipitation,4,403
Percipitation,5,120
unknown,-1,359
unknown,9,10


### traffic control

In [31]:
ped.fillna('-').groupby(['f_road_control','TFCCTRLT_ID']).count()[['CI_ID']]

f_road_control,TFCCTRLT_ID,CI_ID
,1,5939
Other,11,10
Other,12,23
Other,13,2
Other,14,3
Other,15,53
Other,4,31
Other,5,11
Other,6,33
Other,7,6


In [32]:
twoVeh.fillna('-').groupby(['f_road_control','TFCCTRLT_ID']).count()[['CI_ID']]

f_road_control,TFCCTRLT_ID,CI_ID
,1,12051
Other,10,1
Other,11,17
Other,12,129
Other,13,10
Other,14,3
Other,15,371
Other,4,71
Other,5,94
Other,6,28


### time of day

In [33]:
ped.groupby(['f_period','HR1']).count()[['CI_ID']]

f_period,HR1,CI_ID
0.0,0,275
0.0,1,207
0.0,2,179
1.0,3,139
1.0,4,161
1.0,5,199
2.0,6,386
2.0,7,563
2.0,8,925
3.0,10,685


In [34]:
twoVeh.groupby(['f_period','HR1']).count()[['CI_ID']]

f_period,HR1,CI_ID
0.0,0,878
0.0,1,707
0.0,2,585
1.0,3,505
1.0,4,663
1.0,5,686
2.0,6,852
2.0,7,1145
2.0,8,2031
3.0,10,1692
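The counts above are consistent with `f_period` being the hour of day split into three-hour bins (hours 0 to 2 give period 0, hours 3 to 5 give period 1, and so on). A minimal sketch under that assumption; `hour_to_period` is a hypothetical helper, not a function from databuild.py, and it returns an integer where the tables show floats:

```python
def hour_to_period(hour):
    """Bin an hour of day (0-23) into eight three-hour periods numbered 0-7."""
    hour = int(hour)  # also accepts strings such as '10'
    if not 0 <= hour <= 23:
        raise ValueError('hour must be between 0 and 23, got %r' % hour)
    return hour // 3

for h in [0, 2, 3, 5, 6, 8, '10', 23]:
    print(h, '->', hour_to_period(h))
```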


### vehicle/pedestrian pre-accident action: f_act

In [35]:
ped.fillna('-').groupby(['f_act_ped','PBACTT_DMV_CDE']).count()[['CI_ID']]

f_act_ped,PBACTT_DMV_CDE,CI_ID
Along Highway,5,1198
Along Highway,6,364
"Crossing, Against Signal",2,2247
"Crossing, No Signal or Crosswalk",4,2743
"Crossing, No Signal, Marked Crosswalk",3,1286
"Crossing, With Signal",1,5698
Other,10,9
Other,11,140
Other,12,171
Other,13,797


In [36]:
ped.fillna('-').groupby(['f_act_veh_other','PACCACTT_ID_other']).count()[['CI_ID']]

f_act_veh_other,PACCACTT_ID_other,CI_ID
Backing,15,1078
Going Straight Ahead,1,8191
Making Left Turn,17,35
Making Left Turn,3,4760
Making Left Turn,4,144
Making Right Turn,16,11
Making Right Turn,2,1615
Other,11,9
Other,12,51
Other,13,51


In [37]:
twoVeh.fillna('-').groupby(['f_act_veh','PACCACTT_ID']).count()[['CI_ID']]

f_act_veh,PACCACTT_ID,CI_ID
Backing,15,181
Going Straight Ahead,1,24179
Making Left Turn,17,42
Making Left Turn,3,3311
Making Left Turn,4,479
Making Right Turn,16,15
Making Right Turn,2,887
Other,11,50
Other,12,610
Other,13,179


In [38]:
twoVeh.fillna('-').groupby(['f_act_veh_other','PACCACTT_ID_other']).count()[['CI_ID']]

f_act_veh_other,PACCACTT_ID_other,CI_ID
Backing,15,319
Going Straight Ahead,1,23278
Making Left Turn,17,70
Making Left Turn,3,4266
Making Left Turn,4,721
Making Right Turn,16,18
Making Right Turn,2,1112
Other,11,39
Other,12,915
Other,13,224


### vehicle type: f_veh


This grouping still needs work and should be matched to DOHMH's derived groups.

In [39]:
ped.fillna('-').groupby(['f_veh_other','VEHBDYT_ID_other']).count()[['CI_ID']]

f_veh_other,VEHBDYT_ID_other,CI_ID
Bus,60.0,332
Car,1.0,77
Car,3.0,60
Car,4.0,233
Car,6.0,6026
Car,7.0,396
Car,16.0,111
Car,63.0,850
Motorcycle,10.0,96
Pickup,44.0,243


In [40]:
twoVeh.fillna('-').groupby(['f_veh','VEHBDYT_ID']).count()[['CI_ID']]

f_veh,VEHBDYT_ID,CI_ID
Bus,60.0,561
Car,1.0,530
Car,3.0,175
Car,4.0,445
Car,6.0,17642
Car,7.0,1446
Car,16.0,298
Car,63.0,1037
Motorcycle,10.0,1537
Pickup,44.0,295


In [41]:
twoVeh.fillna('-').groupby(['f_veh_other','VEHBDYT_ID_other']).count()[['CI_ID']]

f_veh_other,VEHBDYT_ID_other,CI_ID
Bus,60.0,630
Car,1.0,275
Car,3.0,185
Car,4.0,460
Car,6.0,14570
Car,7.0,1307
Car,12.0,1
Car,16.0,295
Car,63.0,917
Motorcycle,10.0,164
