# Grand Rapids Traffic Accident Project
## Part 1: Data Cleaning and Exploration

Created by: Kate Meredith
Date: 11.28.22

## Table of Contents

* [1. Background](#header1)
    * [1.1 Data Dictionary](#subheader11)
* [2. Importing Libraries](#header2)
* [3. Importing the Data](#header3)
* [4. Exploring the Data Composition](#header4)
    * [4.1 Previewing Values](#subheader41)
    * [4.2 Value Options per Column](#subheader42)
    * [4.3 Checking for Unusual Values](#subheader43)
    * [4.4 Checking for Null Values](#subheader44)
* [5. Cleaning the Data](#header5)
    * [5.1 Dropping Unneeded Columns](#subheader51)
    * [5.2 Converting Datatypes](#subheader52)

## 1. Background <a class="anchor" id="header1"></a>

This data represents traffic accidents occurring in Grand Rapids, Michigan, USA from 2007 to 2017. The
data comes from [Grand Rapids Open Data](https://grdata-grandrapids.opendata.arcgis.com/datasets/grandrapids::cgr-crash-data/explore?location=0.000000%2C0.000000%2C2.62).

This [web page](https://services2.arcgis.com/L81TiOwAPO1ZvU9b/arcgis/rest/services/CGRCrashData/FeatureServer/0) provides an overview of the variables and their datatypes.

### 1.1 Data Dictionary <a class="anchor" id="subheader11"></a>

The dataset contains a tremendous amount of detail around the circumstances of the crash (142 variables). The data did not include a data dictionary, so the following provides a best guess at the various fields based on the names and data.

* X: longitude
* Y: latitude
* OBJECTID: accident ID; integer, 1 through 74309
* ROADSOFTID: 7-digit number associated with roadsoft, possibly [this](http://roadsoft.org/)
* BIKE: was a bike involved? yes or no
* CITY: all are Grand Rapids
* COUNTY: all are Kent County
* CRASHDATE: date of accident, formatted as year/month/day (e.g. 2008/06/16)
* CRASHSEVER: crash severity, categorical options shown below
* CRASHTYPE: crash type, categorical options shown below
* WORKZNEACT: was crash in an active workzone? categorical options shown below
* WORKZNECLO: nearness to workzone, categorical options shown below
* WORKZNETYP: type of workzone, categorical options shown below  
* CTRLMILEPT: unclear, float between 0 and ~16.69
* CTRLSECT: unclear, float between 0 and 41843 
* DAYOFMONTH: day of month, integer 1-31
* DAYOFWEEK: day of week, categorical  
* ANIMAL: were animals involved, categorical (e.g. wildlife not pets)
* D1COND: appears to be driver 1 condition, categorical shown below; drinking separate category 
* D1DRINKIN: alcohol use for driver 1, yes or no
* D1HAZACT: driver 1 hazardous action, categorical list shown below
* D1INJURY: driver 1 injury, letter corresponding with defined category shown below  
* D1INTENT: driver 1 intended action, categorical list below 
* D2COND: driver 2 condition, categorical shown below; drinking separate category  
* D2DRINKIN: alcohol use for driver 2, yes or no
* D2HAZACT: driver 2 hazardous action, categorical list shown below 
* D2INJURY: driver 2 injury, letter corresponding with defined category shown below   
* D2INTENT: driver 2 intended action, categorical list below 
* D3COND: driver 3 condition, categorical shown below; drinking separate category  
* D3DRINKIN: alcohol use for driver 3, yes or no 
* D3HAZACT: driver 3 hazardous action, categorical list shown below  
* D3INJURY: driver 3 injury, letter corresponding with defined category shown below 
* D3INTENT: driver 3 intended action, categorical list below  
* DRINKING: was alcohol a factor, yes or no
* DRIVER1AGE: age of driver 1, numeric
* DRIVER1SEX: sex of driver 1; M, F or U 
* DRIVER2AGE: age of driver 2, numeric
* DRIVER2SEX: sex of driver 2; M, F or U 
* DRIVER3AGE: age of driver 3; numeric 
* DRIVER3SEX: sex of driver 3; M, F or U 
* EMRGVEH: unclear, emergency vehicle involved or called? yes or no
* FARMEQUIP: farm equipment involved, yes or no 
* FLEEINGSIT: fleeing situation, yes or no
* FWSEGID: unclear, some kind of ID? values appear to be reused, int64
* GRTINJSEVE: injury assessment; categorical options 
* HITANDRUN: was it a hit and run, yes or no 
* HOUR: hour of accident, 0 to 23; 99 listed (for unknown?) 
* INTERNAME: streetname where accident occurred 
* LIGHTING: lighting at time of accident, categorical options 
* MDOTREG: MDOT registration? All appear to be "Grand" for Grand Rapids 
* MILEPOINT: maybe location related? string that needs to be converted to float 
* MONTH: month the accident occurred in, string that needs to be converted to numeric
* MOTORCYCLE: motorcycle involvement, yes or no
* NOATYPEINJ: number(?) of A type injury (incapacitating), numeric 
* NOBTYPEINJ: number(?) of B type injury (non-incapacitating), numeric
* NOCTYPEINJ: number(?) of C type injury (possible injury), numeric
* NONTRAFFIC: unclear, all are no
* NUMOFINJ: number of people injured, numeric 
* NUMOFKILL: number of people killed
* NUMOFOCCUP: number of occupants? Some seem pretty high, maybe multicar, buses? numeric  
* NUMOFUNINJ: number of people uninjured, numeric
* NUMOFVEHIC: number of vehicles involved, numeric 
* ORV: off road vehicle involvement, yes or no
* PEDESTRIAN: pedestrian involved, yes or no 
* PRNAME: street name, unclear how it differs from INTERNAME above
* PRNO: unclear, maybe street name that corresponds with PRNAME? numeric 
* REFDIR: related to cardinal directions of some kind? Unclear what some options would be 
* REFDIST: unclear, numeric 
* ROUTECLASS: road type, categorical options  
* ROUTENUM: route number if applicable, looks like 0 if doesn't apply and maybe 999 if unknown; numeric 
* SCHOOLBUS: school bus involvement, yes or no 
* SNOWMOBILE: snowmobile involvement, yes or no 
* SPDLMTPOST: is speed limit posted, yes or no
* SPEEDLIMIT: speed limit at crash location
* SURFCOND: surface conditions at time of accident, categorical options 
* TRAFCTLDEV: traffic signal at location, categorical options 
* TRAIN: train involvement, yes or no 
* TRUCKBUS: truck or bus involvement, yes or no
* TRUNKLINE: road supporting long-distance through traffic, yes or no
* UD10NUM: ID for UD-10 traffic reporting, all unique numbers?
* V1DEFECT: any defects for vehicle 1, categorical options
* V1DAMAGE: damage to vehicle 1, categorical options 
* V1HARMEVT1: first harm caused by vehicle 1? categorical options
* V1HARMEVT2: second harm caused by vehicle 1? categorical options 
* V1HARMEVT3: third harm caused by vehicle 1? categorical options
* V1HARMEVT4: fourth harm caused by vehicle 1? categorical options
* V1MSTHARME: harm from vehicle 1? unclear how different from above; categorical options 
* V1SPECCAT: vehicle 1 special category? unclear whether the categories describe a response to vehicle 1 or vehicle 1 itself (like police car, farm, fire truck, wrecker)
* V1TRAILER: type of trailer, if any, pulled by vehicle 1; categorical options 
* V1VIOLATOR: maybe if vehicle 1 is at fault, yes or no 
* V1WIMPCTPT: vehicle 1 impact location, categorical options 
* V2DEFECT: any defects for vehicle 2, categorical options
* V2DAMAGE: damage to vehicle 2, categorical options 
* V2HARMEVT1: first harm caused by vehicle 2? categorical options
* V2HARMEVT2: second harm caused by vehicle 2? categorical options 
* V2HARMEVT3: third harm caused by vehicle 2? categorical options
* V2HARMEVT4: fourth harm caused by vehicle 2? categorical options
* V2MSTHARME: harm from vehicle 2? unclear how different from above; categorical options 
* V2SPECCAT: vehicle 2 special category? unclear whether the categories describe a response to vehicle 2 or vehicle 2 itself (like police car, farm, fire truck, wrecker)
* V2TRAILER: type of trailer, if any, pulled by vehicle 2; categorical options 
* V2VIOLATOR: maybe if vehicle 2 is at fault, yes or no 
* V2WIMPCTPT: vehicle 2 impact location, categorical options 
* V3DEFECT: any defects for vehicle 3, categorical options
* V3DAMAGE: damage to vehicle 3, categorical options 
* V3HARMEVT1: first harm caused by vehicle 3? categorical options
* V3HARMEVT2: second harm caused by vehicle 3? categorical options 
* V3HARMEVT3: third harm caused by vehicle 3? categorical options
* V3HARMEVT4: fourth harm caused by vehicle 3? categorical options
* V3MSTHARME: harm from vehicle 3? unclear how different from above; categorical options 
* V3SPECCAT: vehicle 3 special category? unclear whether the categories describe a response to vehicle 3 or vehicle 3 itself (like police car, farm, fire truck, wrecker)
* V3TRAILER: type of trailer, if any, pulled by vehicle 3; categorical options 
* V3VIOLATOR: maybe if vehicle 3 is at fault, yes or no 
* V3WIMPCTPT: vehicle 3 impact location, categorical options 
* VEH1DIR: direction of vehicle 1, categorical 
* VEH1TYPE: vehicle 1 type, categorical 
* VEH1USE: vehicle 1 use, categorical
* VEH2DIR: direction of vehicle 2, categorical 
* VEH2TYPE: vehicle 2 type, categorical 
* VEH2USE: vehicle 2 use, categorical
* VEH3DIR: direction of vehicle 3, categorical 
* VEH3TYPE: vehicle 3 type, categorical 
* VEH3USE: vehicle 3 use, categorical
* WEATHER: weather at time of accident, categorical 
* WHEREONRD: where on the road the accident occurred, categorical 
* YEAR: year of accident, numeric
* RDCITYTWP: road city township, mostly Grand Rapids (maybe who is responsible for the road? all accidents are in GR)
* ROAD_USER1: which ward does the road fall in, categorical 
* ROAD_USER2: some kind of number assigned? 
* ROAD_USER3: two options - 'undefined' or 'from information,not concrete'
* ROAD_USER4: two options - 'undefined' or 'from information,not concrete'
* RDLEGALSYS: legal jurisdiction or main purpose of road? categorical options like trunkline and city major  
* RDLGLCODE: legal code assigned to road? numerical 0 to 5 
* RDNFC: road type like freeway or other principal arterial (similar to RDLEGALSYS above but using different terms, unclear why); categorical
* RDNFCCODE: code representing road type, corresponding to above 'RDNFC'? 
* RDNUMLANES: number of lanes on the road, numeric 0 to 6
* RDSUBTYPDS: materials road made from, categorical
* RDSUBTYPE: number, maybe representing road materials and corresponding to above? unclear; 0 or in 30s
* RDSURFTYPE: road surface type, captures the same info as RDSUBTYPDS; categorical
* RDUSRINVID: unique number (maybe count or ID from spreadsheet, 0 to 74308)
* RDWIDTH: roadwidth, convert to float 
* FRAMEWORK: unclear, all 17
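
Because the descriptions above are best guesses, it can help to check each guess against the column itself. The following is a minimal sketch of one way to do that, not part of the original notebook: `summarize_columns` is a hypothetical helper, and `toy_df` is a small stand-in built from a few values visible in the notebook's preview output rather than the full dataset.

```python
import pandas as pd

def summarize_columns(df, max_examples=3):
    """Return one row per column: dtype, number of distinct non-null values, and a few examples."""
    rows = []
    for col in df.columns:
        values = df[col].dropna().unique()
        rows.append({
            "column": col,
            "dtype": str(df[col].dtype),
            "n_unique": len(values),
            "examples": list(values[:max_examples]),
        })
    return pd.DataFrame(rows).set_index("column")

# Toy stand-in (assumption): three rows of values taken from the preview output.
toy_df = pd.DataFrame({
    "CITY": ["Grand Rapids", "Grand Rapids", "Grand Rapids"],
    "HOUR": [8, 15, 8],
    "DRIVER1SEX": ["M", "M", "U"],
})

summary = summarize_columns(toy_df)
print(summary)
```

On the real data, calling `summarize_columns(crash_df)` would show at a glance which columns hold a single value (as guessed for CITY and COUNTY) and which have unexpected options.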

## 2. Importing Libraries <a class="anchor" id="header2"></a>

Importing libraries to support data cleaning and exploration.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

## 3. Importing the Data <a class="anchor" id="header3"></a>

In [2]:
#importing the data
crash_df = pd.read_csv('CGR_Crash_Data.csv')
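
Some columns in this file mix number-like and text values. ROAD_USER2, for example, shows `3`, `02`, and `Undefined` in the previews below. With large files, pandas may infer types chunk by chunk, which can leave mixed types in a single column. The sketch below shows one possible way to handle this, assuming it is acceptable to read such a column as text; it uses a tiny in-memory CSV as a stand-in, and the choice of `dtype` is an assumption rather than the notebook's method.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the CSV, using ROAD_USER2 values seen in the previews.
csv_text = "OBJECTID,ROAD_USER2\n1,3\n28144,02\n74309,Undefined\n"

# Assumed choice: read the mixed column as text so "02" keeps its leading zero
# and every value in the column has the same type.
mixed_df = pd.read_csv(io.StringIO(csv_text), dtype={"ROAD_USER2": str})
print(mixed_df["ROAD_USER2"].tolist())
```

The same `dtype` argument could be passed when reading `CGR_Crash_Data.csv` if mixed-type columns turn out to be a problem.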

In [3]:
#checking data shape
crash_df.shape

(74309, 142)

The dataset has 74,309 rows and 142 variables.

## 4. Exploring the Data Composition <a class="anchor" id="header4"></a>

### 4.1 Previewing Values <a class="anchor" id="subheader41"></a>

In [123]:
#updating setting to show all columns by default
pd.set_option('display.max_columns', None)

In [5]:
#previewing first 5 rows
crash_df.head()

Unnamed: 0,X,Y,OBJECTID,ROADSOFTID,BIKE,CITY,COUNTY,CRASHDATE,CRASHSEVER,CRASHTYPE,WORKZNEACT,WORKZNECLO,WORKZNETYP,CTRLMILEPT,CTRLSECT,DAYOFMONTH,DAYOFWEEK,ANIMAL,D1COND,D1DRINKIN,D1HAZACT,D1INJURY,D1INTENT,D2COND,D2DRINKIN,D2HAZACT,D2INJURY,D2INTENT,D3COND,D3DRINKIN,D3HAZACT,D3INJURY,D3INTENT,DRINKING,DRIVER1AGE,DRIVER1SEX,DRIVER2AGE,DRIVER2SEX,DRIVER3AGE,DRIVER3SEX,EMRGVEH,FARMEQUIP,FLEEINGSIT,FWSEGID,GRTINJSEVE,HITANDRUN,HOUR,INTERNAME,LIGHTING,MDOTREG,MILEPOINT,MONTH,MOTORCYCLE,NOATYPEINJ,NOBTYPEINJ,NOCTYPEINJ,NONTRAFFIC,NUMOFINJ,NUMOFKILL,NUMOFOCCUP,NUMOFUNINJ,NUMOFVEHIC,ORV,PEDESTRIAN,PRNAME,PRNO,PUBLICPROP,REFDIR,REFDIST,ROUTECLASS,ROUTENUM,SCHOOLBUS,SNOWMOBILE,SPDLMTPOST,SPEEDLIMIT,SURFCOND,TRAFCTLDEV,TRAIN,TRUCKBUS,TRUNKLINE,UD10NUM,V1DEFECT,V1DAMAGE,V1HARMEVT1,V1HARMEVT2,V1HARMEVT3,V1HARMEVT4,V1MSTHARME,V1SPECCAT,V1TRAILER,V1VIOLATOR,V1WIMPCTPT,V2DEFECT,V2DAMAGE,V2HARMEVT1,V2HARMEVT2,V2HARMEVT3,V2HARMEVT4,V2MSTHARME,V2SPECCAT,V2TRAILER,V2VIOLATOR,V2WIMPCTPT,V3DEFECT,V3DAMAGE,V3HARMEVT1,V3HARMEVT2,V3HARMEVT3,V3HARMEVT4,V3MSTHARME,V3SPECCAT,V3TRAILER,V3VIOLATOR,V3WIMPCTPT,VEH1DIR,VEH1TYPE,VEH1USE,VEH2DIR,VEH2TYPE,VEH2USE,VEH3DIR,VEH3TYPE,VEH3USE,WEATHER,WHEREONRD,YEAR,RDCITYTWP,ROAD_USER1,ROAD_USER2,ROAD_USER3,ROAD_USER4,RDLEGALSYS,RDLGLCODE,RDNFC,RDNFCCODE,RDNUMLANES,RDSUBTYPDS,RDSUBTYPE,RDSURFTYPE,RDUSRINVID,RDWIDTH,FRAMEWORK
0,-85.650003,42.919854,1,2589528,No,Grand Rapids,Kent,2008/06/16,Property Damage Only,Backing,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,16,Monday,"No, Uncoded & Errors",Appeared Normal,No,Improper Backing,O-No Injury,Backing,Appeared Normal,No,,O-No Injury,Going Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,31,M,43,F,999,U,No,No,No,922088,No Injury,No,8,ALGER,Daylight,Grand,0.101,June,No,0,0,0,No,0,0,2,2,2,No,No,LINDEN,3410438,Uncoded & Errors,S,90,County Road or City Street or Not Known,0,No,No,No,25,Dry,Stop Sign,No,No,Non-Trunkline,7059760,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Right Rear Corner,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Left Corner,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,"Passenger Car, SUV, Van",Private,N,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2008,Grand Rapids,Ward 1,3,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,26.0,17
1,-85.625665,42.92471,2,2593183,No,Grand Rapids,Kent,2008/08/30,Property Damage Only,Fixed Object,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,30,Saturday,"No, Uncoded & Errors",Unknown,No,Reckless Driving,Uncoded & Errors,Going Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,999,M,999,U,999,U,No,No,No,921834,No Injury,Yes,15,ROSEWOOD,Daylight,Grand,0.028,August,No,0,0,0,No,0,0,4,4,1,No,No,LOUISE,413903,Yes,E,150,County Road or City Street or Not Known,0,No,No,No,25,Dry,,No,No,Non-Trunkline,7110374,Uncoded & Errors,Disabling Damage,Hit Utility Pole / Light Support,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Utility Pole / Light Support,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,"Passenger Car, SUV, Van",Other,U,Uncoded & Errors,Uncoded & Errors,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Shoulder,2008,Grand Rapids,Ward 3,55,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,26.0,17
2,-85.655282,43.000972,3,2582102,No,Grand Rapids,Kent,2008/02/13,Property Damage Only,Other Driveway,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,13,Wednesday,"No, Uncoded & Errors",Uncoded & Errors,No,Unknown,Uncoded & Errors,Going Straight,Uncoded & Errors,No,,Uncoded & Errors,Parked,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,999,U,999,U,999,U,No,No,No,899196,No Injury,Yes,8,PLAINFIELD,Daylight,Grand,0.225,February,No,0,0,0,No,0,0,0,0,2,No,No,JULIA,423704,Uncoded & Errors,W,50,County Road or City Street or Not Known,0,No,No,Yes,25,Snowy,,No,No,Non-Trunkline,6948189,Uncoded & Errors,Uncoded & Errors,Hit Traffic Sign Post,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Traffic Sign Post,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,W,Uncoded & Errors,Uncoded & Errors,E,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2008,Grand Rapids,Ward 2,42,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,29.0,17
3,-85.643314,42.928172,4,2579820,No,Grand Rapids,Kent,2008/01/25,Property Damage Only,Angle Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,25,Friday,"No, Uncoded & Errors",Appeared Normal,No,Failed to Yield,O-No Injury,Going Straight,Appeared Normal,No,,O-No Injury,Going Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,23,M,18,F,999,U,No,No,No,926958,No Injury,No,17,ARDMORE,Dusk,Grand,0.062,January,No,0,0,0,No,0,0,2,2,2,No,No,BLAIM,3410312,Uncoded & Errors,E,10,County Road or City Street or Not Known,0,No,No,No,25,Slush,Yield Sign,No,No,Non-Trunkline,6918547,Uncoded & Errors,Functional Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Driver Side,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,S,"Passenger Car, SUV, Van",Private,E,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Cloudy,On the Road,2008,Grand Rapids,Ward 3,62,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,27.0,17
4,-85.665571,42.968854,5,2594624,No,Grand Rapids,Kent,2008/09/26,Property Damage Only,Backing,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,26,Friday,"No, Uncoded & Errors",Appeared Normal,No,Unknown,O-No Injury,Backing,Appeared Normal,No,Unknown,O-No Injury,Going Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,37,M,18,F,999,U,No,No,No,903417,No Injury,No,17,RANSOM,Daylight,Grand,0.046,September,No,0,0,0,No,0,0,2,2,2,No,No,CRESENT,3416260,Uncoded & Errors,NW,150,County Road or City Street or Not Known,0,No,No,Yes,25,Dry,,No,Yes,Non-Trunkline,7132758,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,Truck / Bus (Commercial),Commercial,W,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2008,Grand Rapids,Ward 2,38,Undefined,Undefined,City Major,4,Local,7,2,Asphalt-Standard,35,Asphalt,,30.0,17
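
The first rows show the string `Uncoded & Errors` in many columns. A rough sketch of how one might measure how much of each column it fills is shown here; `placeholder_share` is a hypothetical helper, and `toy_df` reuses values from the first four preview rows rather than the full dataset.

```python
import pandas as pd

def placeholder_share(df, placeholder="Uncoded & Errors"):
    """Fraction of rows in each column equal to the placeholder string, largest first."""
    return df.eq(placeholder).mean().sort_values(ascending=False)

# Toy stand-in (assumption): values copied from the first four preview rows.
toy_df = pd.DataFrame({
    "WORKZNEACT": ["Uncoded & Errors"] * 4,
    "D1INJURY": ["O-No Injury", "Uncoded & Errors", "Uncoded & Errors", "O-No Injury"],
    "LIGHTING": ["Daylight", "Daylight", "Daylight", "Dusk"],
})

shares = placeholder_share(toy_df)
print(shares)
```

Columns with a share close to 1 on the real data carry little information and may be candidates for dropping during cleaning.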


In [6]:
#previewing last 5 rows
crash_df.tail()

Unnamed: 0,X,Y,OBJECTID,ROADSOFTID,BIKE,CITY,COUNTY,CRASHDATE,CRASHSEVER,CRASHTYPE,WORKZNEACT,WORKZNECLO,WORKZNETYP,CTRLMILEPT,CTRLSECT,DAYOFMONTH,DAYOFWEEK,ANIMAL,D1COND,D1DRINKIN,D1HAZACT,D1INJURY,D1INTENT,D2COND,D2DRINKIN,D2HAZACT,D2INJURY,D2INTENT,D3COND,D3DRINKIN,D3HAZACT,D3INJURY,D3INTENT,DRINKING,DRIVER1AGE,DRIVER1SEX,DRIVER2AGE,DRIVER2SEX,DRIVER3AGE,DRIVER3SEX,EMRGVEH,FARMEQUIP,FLEEINGSIT,FWSEGID,GRTINJSEVE,HITANDRUN,HOUR,INTERNAME,LIGHTING,MDOTREG,MILEPOINT,MONTH,MOTORCYCLE,NOATYPEINJ,NOBTYPEINJ,NOCTYPEINJ,NONTRAFFIC,NUMOFINJ,NUMOFKILL,NUMOFOCCUP,NUMOFUNINJ,NUMOFVEHIC,ORV,PEDESTRIAN,PRNAME,PRNO,PUBLICPROP,REFDIR,REFDIST,ROUTECLASS,ROUTENUM,SCHOOLBUS,SNOWMOBILE,SPDLMTPOST,SPEEDLIMIT,SURFCOND,TRAFCTLDEV,TRAIN,TRUCKBUS,TRUNKLINE,UD10NUM,V1DEFECT,V1DAMAGE,V1HARMEVT1,V1HARMEVT2,V1HARMEVT3,V1HARMEVT4,V1MSTHARME,V1SPECCAT,V1TRAILER,V1VIOLATOR,V1WIMPCTPT,V2DEFECT,V2DAMAGE,V2HARMEVT1,V2HARMEVT2,V2HARMEVT3,V2HARMEVT4,V2MSTHARME,V2SPECCAT,V2TRAILER,V2VIOLATOR,V2WIMPCTPT,V3DEFECT,V3DAMAGE,V3HARMEVT1,V3HARMEVT2,V3HARMEVT3,V3HARMEVT4,V3MSTHARME,V3SPECCAT,V3TRAILER,V3VIOLATOR,V3WIMPCTPT,VEH1DIR,VEH1TYPE,VEH1USE,VEH2DIR,VEH2TYPE,VEH2USE,VEH3DIR,VEH3TYPE,VEH3USE,WEATHER,WHEREONRD,YEAR,RDCITYTWP,ROAD_USER1,ROAD_USER2,ROAD_USER3,ROAD_USER4,RDLEGALSYS,RDLGLCODE,RDNFC,RDNFCCODE,RDNUMLANES,RDSUBTYPDS,RDSUBTYPE,RDSURFTYPE,RDUSRINVID,RDWIDTH,FRAMEWORK
74304,-85.68841,42.997895,74305,2558829,No,Grand Rapids,Kent,2017/03/20,Fatal,Other Driveway,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,20,Monday,"No, Uncoded & Errors",Uncoded & Errors,No,Drove Left of Center,K-Fatal injury,Turning Left,Appeared Normal,No,,O-No Injury,Going Straight,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Yes,62,M,51,F,999,U,No,No,No,898439,Killed,No,16,PANNELL,Daylight,Grand,1.536,March,No,0,0,0,No,0,1,2,1,2,No,No,ALPINE,423610,Uncoded & Errors,S,200,County Road or City Street or Not Known,999,No,No,Yes,30,Dry,,No,Yes,Non-Trunkline,1008871,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Right Corner,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Hit Utility Pole / Light Support,Hit Concrete Barrier,Uncoded & Errors,Hit Motor Vehicle in Transport,Bus,Uncoded & Errors,No,Front Left Corner,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,S,"Passenger Car, SUV, Van",Private,N,Truck / Bus (Commercial),Commercial,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2017,Grand Rapids,Ward 1,19,Undefined,Undefined,City Major,4,Other Principal Arterial,3,2,Asphalt-Standard,35,Asphalt,,42.0,17
74305,-85.66167,42.94136,74306,2574652,No,Grand Rapids,Kent,2017/11/18,Fatal,Pedestrian,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,18,Saturday,"No, Uncoded & Errors",Uncoded & Errors,No,,O-No Injury,Going Straight,Unknown,No,Other,K-Fatal injury,Unknown,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,59,M,77,M,999,U,No,No,No,915807,Killed,No,18,LAFAYETTE,"Dark, Lighted",Grand,1.869,November,No,0,0,0,No,0,1,1,0,1,No,Yes,HALL,408705,Uncoded & Errors,W,70,County Road or City Street or Not Known,999,No,No,No,30,Wet,,No,No,Non-Trunkline,1213984,Uncoded & Errors,Functional Damage,Hit Pedestrian,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Pedestrian,Uncoded & Errors,Uncoded & Errors,No,Front Right Corner,Uncoded & Errors,No Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,W,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,U,Uncoded & Errors,Uncoded & Errors,Rain,On the Road,2017,Grand Rapids,Ward 3,66,Undefined,Undefined,City Major,4,Minor Arterial,4,2,Asphalt-Standard,35,Asphalt,,42.0,17
74306,-85.661243,42.939724,74307,2563322,No,Grand Rapids,Kent,2017/06/02,Fatal,Side-Swipe Same,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,2,Friday,"No, Uncoded & Errors",Appeared Normal,No,,O-No Injury,Turning Left,Uncoded & Errors,No,Speed too Fast,K-Fatal injury,Going Straight,Unknown,No,,Uncoded & Errors,Parked,No,19,M,34,M,999,M,No,No,No,929128,Killed,No,19,GARDEN,Daylight,Grand,0.189,June,Yes,0,0,0,No,0,1,3,2,3,No,No,LAFAYETTE,406706,Uncoded & Errors,N,1000,County Road or City Street or Not Known,999,No,No,No,25,Dry,,No,No,Non-Trunkline,1062116,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Right Rear Corner,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Loss of Control,Hit Parked Vehicle,Uncoded & Errors,Hit Parked Vehicle,Uncoded & Errors,Uncoded & Errors,Yes,Right Rear Corner,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Rear Left Corner,N,"Passenger Car, SUV, Van",Private,N,Motorcycle,Private,N,"Passenger Car, SUV, Van",Private,Clear,On the Road,2017,Grand Rapids,Ward 3,70,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,30.0,17
74307,-85.584837,42.971042,74308,2568387,No,Grand Rapids,Kent,2017/08/23,Fatal,Pedestrian,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,23,Wednesday,"No, Uncoded & Errors",Appeared Normal,Yes,Reckless Driving,C-Possible injury,Negotiating Curve,Unknown,No,,K-Fatal injury,In Road with Traffic,Unknown,No,,A-Incapacitating Injury,In Road with Traffic,Yes,34,F,75,F,77,M,No,No,No,902303,Killed,No,19,TWIN LAKES,Daylight,Grand,0.004,August,No,1,0,2,No,3,1,2,0,1,No,Yes,MICHIGAN,422105,Uncoded & Errors,E,20,County Road or City Street or Not Known,999,No,No,Yes,25,Dry,,No,No,Non-Trunkline,1126006,Uncoded & Errors,Disabling Damage,Hit Pedestrian,Hit Pedestrian,Hit Animal,Uncoded & Errors,Hit Pedestrian,Uncoded & Errors,Uncoded & Errors,Yes,Multiple Areas,Uncoded & Errors,No Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,,Uncoded & Errors,No Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,,E,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Shoulder,2017,Other,Ward 2,46,Undefined,Undefined,County Local,3,Local,7,2,Undefined,30,Undefined,,26.1,17
74308,-85.676419,42.984836,74309,2558314,No,Grand Rapids,Kent,2017/03/18,Fatal,Angle Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,15.223,41131,18,Saturday,"No, Uncoded & Errors",Uncoded & Errors,No,Disobeyed TCD,K-Fatal injury,Going Straight,Appeared Normal,No,,C-Possible injury,Going Straight,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,54,M,32,M,999,U,No,No,No,895439,Killed,No,18,LEONARD,Daylight,Grand,1.308,March,No,0,0,1,No,1,1,2,0,2,No,No,SCRIBNER,3410357,Uncoded & Errors,X,0,Interstate Route,296,No,No,Yes,30,Dry,Signal,No,No,Trunkline,1001927,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Left Corner,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,Pickup Truck,Private,N,Pickup Truck,Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2017,Grand Rapids,Ward 1,Undefined,Undefined,Undefined,State Trunkline,1,Major Collector,5,2,Composite,36,Composite,,46.0,17
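
The previews suggest that CRASHDATE repeats the information held in YEAR, MONTH, DAYOFMONTH, and DAYOFWEEK. The sketch below shows one way to check that, assuming CRASHDATE always follows the `YYYY/MM/DD` pattern seen above; `toy_df` copies three rows from the tail preview and is not the full dataset.

```python
import pandas as pd

# Toy stand-in (assumption): three rows copied from the tail preview.
toy_df = pd.DataFrame({
    "CRASHDATE": ["2017/03/20", "2017/11/18", "2017/06/02"],
    "YEAR": [2017, 2017, 2017],
    "MONTH": ["March", "November", "June"],
    "DAYOFMONTH": [20, 18, 2],
    "DAYOFWEEK": ["Monday", "Saturday", "Friday"],
})

# Parse the date string using the format seen in the preview.
parsed = pd.to_datetime(toy_df["CRASHDATE"], format="%Y/%m/%d")

# A row is consistent when every derived part matches its stored column.
consistent = (
    (parsed.dt.year == toy_df["YEAR"])
    & (parsed.dt.month_name() == toy_df["MONTH"])
    & (parsed.dt.day == toy_df["DAYOFMONTH"])
    & (parsed.dt.day_name() == toy_df["DAYOFWEEK"])
)
print(consistent.all())
```

If the same check holds on the full data, the separate date-part columns could be derived from a single parsed date column instead of being converted one by one.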


In [7]:
#previewing sample of data
crash_df.sample(10)

Unnamed: 0,X,Y,OBJECTID,ROADSOFTID,BIKE,CITY,COUNTY,CRASHDATE,CRASHSEVER,CRASHTYPE,WORKZNEACT,WORKZNECLO,WORKZNETYP,CTRLMILEPT,CTRLSECT,DAYOFMONTH,DAYOFWEEK,ANIMAL,D1COND,D1DRINKIN,D1HAZACT,D1INJURY,D1INTENT,D2COND,D2DRINKIN,D2HAZACT,D2INJURY,D2INTENT,D3COND,D3DRINKIN,D3HAZACT,D3INJURY,D3INTENT,DRINKING,DRIVER1AGE,DRIVER1SEX,DRIVER2AGE,DRIVER2SEX,DRIVER3AGE,DRIVER3SEX,EMRGVEH,FARMEQUIP,FLEEINGSIT,FWSEGID,GRTINJSEVE,HITANDRUN,HOUR,INTERNAME,LIGHTING,MDOTREG,MILEPOINT,MONTH,MOTORCYCLE,NOATYPEINJ,NOBTYPEINJ,NOCTYPEINJ,NONTRAFFIC,NUMOFINJ,NUMOFKILL,NUMOFOCCUP,NUMOFUNINJ,NUMOFVEHIC,ORV,PEDESTRIAN,PRNAME,PRNO,PUBLICPROP,REFDIR,REFDIST,ROUTECLASS,ROUTENUM,SCHOOLBUS,SNOWMOBILE,SPDLMTPOST,SPEEDLIMIT,SURFCOND,TRAFCTLDEV,TRAIN,TRUCKBUS,TRUNKLINE,UD10NUM,V1DEFECT,V1DAMAGE,V1HARMEVT1,V1HARMEVT2,V1HARMEVT3,V1HARMEVT4,V1MSTHARME,V1SPECCAT,V1TRAILER,V1VIOLATOR,V1WIMPCTPT,V2DEFECT,V2DAMAGE,V2HARMEVT1,V2HARMEVT2,V2HARMEVT3,V2HARMEVT4,V2MSTHARME,V2SPECCAT,V2TRAILER,V2VIOLATOR,V2WIMPCTPT,V3DEFECT,V3DAMAGE,V3HARMEVT1,V3HARMEVT2,V3HARMEVT3,V3HARMEVT4,V3MSTHARME,V3SPECCAT,V3TRAILER,V3VIOLATOR,V3WIMPCTPT,VEH1DIR,VEH1TYPE,VEH1USE,VEH2DIR,VEH2TYPE,VEH2USE,VEH3DIR,VEH3TYPE,VEH3USE,WEATHER,WHEREONRD,YEAR,RDCITYTWP,ROAD_USER1,ROAD_USER2,ROAD_USER3,ROAD_USER4,RDLEGALSYS,RDLGLCODE,RDNFC,RDNFCCODE,RDNUMLANES,RDSUBTYPDS,RDSUBTYPE,RDSURFTYPE,RDUSRINVID,RDWIDTH,FRAMEWORK
73554,-85.628502,42.912647,73555,2569219,No,Grand Rapids,Kent,2017/09/19,Injury,Rear End Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,1.901,41063,19,Tuesday,"No, Uncoded & Errors",Appeared Normal,No,Fail to Stop ACD,C-Possible injury,Going Straight,Appeared Normal,No,,O-No Injury,Stopped on Road,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,30,F,33,M,999,U,No,No,No,925394,Possible Injury,No,22,KALAMAZOO,"Dark, Lighted",Grand,13.672,September,No,0,0,1,No,1,0,2,1,2,No,No,28TH,409008,Uncoded & Errors,W,120,M Route,11,No,No,Yes,45,Dry,Signal,No,No,Trunkline,1136807,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Functional Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Rear Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,"Passenger Car, SUV, Van",Private,E,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Cloudy,On the Road,2017,Grand Rapids,Undefined,73,Undefined,Undefined,State Trunkline,1,Other Principal Arterial,3,5,Composite,36,Composite,,0.0,17
28143,-85.681188,42.968376,28144,2661597,No,Grand Rapids,Kent,2012/01/01,Property Damage Only,Head-on,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,1,Sunday,"No, Uncoded & Errors",Appeared Normal,No,Drove Left of Center,O-No Injury,Going Straight,Uncoded & Errors,No,,Uncoded & Errors,Parked,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,21,M,999,U,999,U,No,No,No,904018,No Injury,No,1,BLUMRICH,"Dark, Lighted",Grand,0.448,January,No,0,0,0,No,0,0,1,1,2,No,No,WINTER,429009,Uncoded & Errors,S,100,County Road or City Street or Not Known,0,No,No,No,25,Dry,,No,No,Non-Trunkline,8239090,Uncoded & Errors,Disabling Damage,Hit Parked Vehicle,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Parked Vehicle,Uncoded & Errors,Uncoded & Errors,Yes,Front Left Corner,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Left Corner,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,S,"Passenger Car, SUV, Van",Private,N,Pickup Truck,Uncoded & Errors,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2012,Grand Rapids,Ward 1,02,Undefined,Undefined,City Major,4,Local,7,2,Asphalt-Standard,35,Asphalt,,27.0,17
67837,-85.609629,42.912642,67838,2566845,No,Grand Rapids,Kent,2017/08/14,Property Damage Only,Rear End Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,2.859,41063,14,Monday,"No, Uncoded & Errors",Appeared Normal,No,Fail to Stop ACD,O-No Injury,Going Straight,Appeared Normal,No,,O-No Injury,Slowing or Stopped on Road,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,24,M,35,F,999,U,No,No,No,924782,No Injury,No,10,BRETON,Daylight,Grand,14.63,August,No,0,0,0,No,0,0,5,5,2,No,No,28TH,409008,Uncoded & Errors,W,390,M Route,11,No,No,Yes,45,Dry,Signal,No,No,Trunkline,1108314,Uncoded & Errors,Functional Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Right Corner,Uncoded & Errors,Functional Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Rear Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,"Passenger Car, SUV, Van",Private,E,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2017,Grand Rapids,Undefined,Undefined,Undefined,Undefined,State Trunkline,1,Other Principal Arterial,3,5,Composite,36,Composite,,0.0,17
50925,-85.665613,42.913029,50926,2738750,No,Grand Rapids,Kent,2015/07/02,Property Damage Only,Rear End Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.019,41063,2,Thursday,"No, Uncoded & Errors",Appeared Normal,No,,O-No Injury,Stopped on Road,Appeared Normal,No,Fail to Stop ACD,O-No Injury,Going Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,49,M,20,F,999,U,No,No,No,923854,No Injury,No,12,DIVISION,Daylight,Grand,11.79,July,No,0,0,0,No,0,0,2,2,2,No,No,28TH,409008,Uncoded & Errors,E,100,M Route,11,No,No,Yes,45,Dry,,No,No,Trunkline,9318849,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Rear Center,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,W,"Passenger Car, SUV, Van",Private,W,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2015,Grand Rapids,Undefined,Undefined,Undefined,Undefined,State Trunkline,1,Other Principal Arterial,3,5,Composite,36,Composite,,0.0,17
57556,-85.600607,42.986825,57557,2769970,No,Grand Rapids,Kent,2016/10/31,Property Damage Only,Angle Turn,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,31,Monday,"No, Uncoded & Errors",Appeared Normal,No,Failed to Yield,O-No Injury,Turning Left,Appeared Normal,No,,O-No Injury,Going Straight,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,16,M,29,F,999,U,No,No,No,894471,No Injury,No,14,LEONARD,Daylight,Grand,1.022,October,No,0,0,0,No,0,0,2,2,2,No,No,LEFFINGWELL,428802,Uncoded & Errors,N,900,County Road or City Street or Not Known,999,No,No,Yes,45,Dry,,No,No,Non-Trunkline,9842977,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Passenger Side,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Driver Side,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,W,"Passenger Car, SUV, Van",Private,S,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Cloudy,On the Road,2016,Grand Rapids,Ward 2,46,"From Information, Not Concrete",Undefined,City Major,4,Major Collector,5,2,Concrete-Standard,37,Concrete,,40.0,17
61624,-85.624654,42.941013,61625,2762705,No,Grand Rapids,Kent,2016/07/16,Property Damage Only,Rear End Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,16,Saturday,"No, Uncoded & Errors",Appeared Normal,No,Fail to Stop ACD,O-No Injury,Going Straight,Appeared Normal,No,,O-No Injury,Stopped on Road,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,18,F,25,F,999,U,No,No,No,914907,No Injury,No,23,PLYMOUTH,"Dark, Unlighted",Grand,3.746,July,No,0,0,0,No,0,0,2,2,2,No,No,HALL,408705,Uncoded & Errors,NE,25,County Road or City Street or Not Known,999,No,No,Yes,25,Dry,Signal,No,No,Non-Trunkline,9749301,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Rear Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,W,"Passenger Car, SUV, Van",Private,W,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2016,Grand Rapids,Ward 3,59,Undefined,Undefined,City Major,4,Major Collector,5,2,Asphalt-Standard,35,Asphalt,,42.1,17
8879,-85.598566,42.912601,8880,2604605,No,Grand Rapids,Kent,2009/03/10,Property Damage Only,Other Driveway,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,3.42,41063,10,Tuesday,"No, Uncoded & Errors",Appeared Normal,No,,O-No Injury,Turning Left,Appeared Normal,No,,O-No Injury,Turning Left,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,53,M,26,M,999,U,No,No,No,924928,No Injury,No,20,WOODLAWN,"Dark, Lighted",Grand,15.191,March,No,0,0,0,No,0,0,3,3,2,No,No,28TH,409008,Uncoded & Errors,E,10,M Route,11,No,No,Yes,45,Wet,Stop Sign,No,No,Trunkline,7291528,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Driver Side,Uncoded & Errors,Functional Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Left Corner,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,SE,Motorhome,Private,NW,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Rain,On the Road,2009,Grand Rapids,Undefined,Undefined,Undefined,Undefined,State Trunkline,1,Other Principal Arterial,3,5,Composite,36,Composite,,0.0,17
71355,-85.69378,42.973162,71356,2576036,No,Grand Rapids,Kent,2017/12/10,Property Damage Only,Rear End Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,8.92,41029,10,Sunday,"No, Uncoded & Errors",Sick,No,,O-No Injury,Slowing or Stopped on Road,Appeared Normal,No,Fail to Stop ACD,O-No Injury,Slowing or Stopped on Road,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,60,M,22,M,999,U,No,No,No,901596,No Injury,No,18,PINE,"Dark, Lighted",Grand,0.302,December,No,0,0,0,No,0,0,2,2,2,No,No,2ND,407104,Uncoded & Errors,E,10,Interstate Route,196,No,No,Yes,25,Wet,,No,No,Trunkline,1231628,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Right Rear Corner,Uncoded & Errors,Functional Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Right Corner,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,W,"Passenger Car, SUV, Van",Private,W,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Snow,On the Road,2017,Grand Rapids,Ward 1,12,Undefined,Undefined,City Major,4,Major Collector,5,2,Asphalt-Standard,35,Asphalt,,27.0,17
68500,-85.678018,42.976686,68501,2573568,No,Grand Rapids,Kent,2017/11/17,Property Damage Only,Fixed Object,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,14.841,41131,17,Friday,"No, Uncoded & Errors",Appeared Normal,No,Improper Lane Use,O-No Injury,Going Straight,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Unknown,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,18,F,999,U,999,U,No,No,No,907848,No Injury,No,22,RAMP 077C,"Dark, Lighted",Grand,14.841,November,No,0,0,0,No,0,0,1,1,1,No,No,S US 131,410907,Uncoded & Errors,N,24,Interstate Route,296,No,No,Yes,70,Wet,,No,No,Trunkline,1197780,Uncoded & Errors,Disabling Damage,Loss of Control,"Ran off road, Left",Hit Concrete Barrier,Uncoded & Errors,Hit Concrete Barrier,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,S,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,U,Uncoded & Errors,Uncoded & Errors,Rain,On the Shoulder,2017,Grand Rapids,Undefined,Undefined,Undefined,Undefined,State Trunkline,1,Interstate,1,3,Concrete-Standard,37,Concrete,,0.0,17
27252,-85.590754,42.984753,27253,2680235,No,Grand Rapids,Kent,2012/10/26,Property Damage Only,Rear End Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,5.048,41051,26,Friday,"No, Uncoded & Errors",Appeared Normal,No,Fail to Stop ACD,O-No Injury,Slowing or Stopped on Road,Appeared Normal,No,,O-No Injury,Slowing or Stopped on Road,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,18,F,20,M,999,U,No,No,No,894163,No Injury,No,12,LEONARD,Daylight,Grand,0.674,October,No,0,0,0,No,0,0,2,2,2,No,No,BELTLIN,3412182,Uncoded & Errors,N,150,M Route,44,No,No,Yes,55,Dry,Signal,No,No,Trunkline,8500004,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Rear Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,S,"Passenger Car, SUV, Van",Private,S,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2012,Grand Rapids,Undefined,Undefined,Undefined,Undefined,State Trunkline,1,Other Principal Arterial,3,2,Asphalt-Standard,35,Asphalt,,0.0,17


In [8]:
#getting a list of all column names
print(crash_df.columns.tolist())

['X', 'Y', 'OBJECTID', 'ROADSOFTID', 'BIKE', 'CITY', 'COUNTY', 'CRASHDATE', 'CRASHSEVER', 'CRASHTYPE', 'WORKZNEACT', 'WORKZNECLO', 'WORKZNETYP', 'CTRLMILEPT', 'CTRLSECT', 'DAYOFMONTH', 'DAYOFWEEK', 'ANIMAL', 'D1COND', 'D1DRINKIN', 'D1HAZACT', 'D1INJURY', 'D1INTENT', 'D2COND', 'D2DRINKIN', 'D2HAZACT', 'D2INJURY', 'D2INTENT', 'D3COND', 'D3DRINKIN', 'D3HAZACT', 'D3INJURY', 'D3INTENT', 'DRINKING', 'DRIVER1AGE', 'DRIVER1SEX', 'DRIVER2AGE', 'DRIVER2SEX', 'DRIVER3AGE', 'DRIVER3SEX', 'EMRGVEH', 'FARMEQUIP', 'FLEEINGSIT', 'FWSEGID', 'GRTINJSEVE', 'HITANDRUN', 'HOUR', 'INTERNAME', 'LIGHTING', 'MDOTREG', 'MILEPOINT', 'MONTH', 'MOTORCYCLE', 'NOATYPEINJ', 'NOBTYPEINJ', 'NOCTYPEINJ', 'NONTRAFFIC', 'NUMOFINJ', 'NUMOFKILL', 'NUMOFOCCUP', 'NUMOFUNINJ', 'NUMOFVEHIC', 'ORV', 'PEDESTRIAN', 'PRNAME', 'PRNO', 'PUBLICPROP', 'REFDIR', 'REFDIST', 'ROUTECLASS', 'ROUTENUM', 'SCHOOLBUS', 'SNOWMOBILE', 'SPDLMTPOST', 'SPEEDLIMIT', 'SURFCOND', 'TRAFCTLDEV', 'TRAIN', 'TRUCKBUS', 'TRUNKLINE', 'UD10NUM', 'V1DEFECT', 'V

### 4.2 Value Options per Column <a class="anchor" id="subheader42"></a>

Exploring the possible values and their counts for each categorical column:

In [10]:
#bike involvement
crash_df['BIKE'].value_counts()

No     73383
Yes      926
Name: BIKE, dtype: int64

In [11]:
#city crash occurred in, all Grand Rapids
crash_df['CITY'].value_counts()

Grand Rapids    74309
Name: CITY, dtype: int64

In [12]:
#county of crash, all Kent
crash_df['COUNTY'].value_counts()

Kent    74309
Name: COUNTY, dtype: int64
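
Since CITY and COUNTY each hold a single value, it may be quicker to find such columns in one pass rather than one `value_counts()` call at a time. The following is a minimal sketch on a toy stand-in for `crash_df` (the rows are invented for illustration); `constant_cols` is a hypothetical name, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for crash_df; values are invented for illustration only
toy_df = pd.DataFrame({
    "CITY": ["Grand Rapids"] * 4,
    "COUNTY": ["Kent"] * 4,
    "BIKE": ["No", "No", "Yes", "No"],
})

# A column with exactly one distinct value (counting NaN as a value)
# carries no information for later analysis
constant_cols = [
    col for col in toy_df.columns
    if toy_df[col].nunique(dropna=False) == 1
]
print(constant_cols)
```

Running the same comprehension against `crash_df.columns` should surface any other single-valued columns as candidates to drop.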

In [13]:
#crash severity
crash_df['CRASHSEVER'].value_counts()

Property Damage Only    60761
Injury                  13443
Fatal                     105
Name: CRASHSEVER, dtype: int64

In [14]:
#type of crash
crash_df['CRASHTYPE'].value_counts()

Rear End Straight                                 21746
Side-Swipe Same                                   11214
Angle Straight                                     8703
Fixed Object                                       7276
Misc. Multiple Vehicle                             4141
Angle Turn                                         2907
Backing                                            2179
Parking                                            2166
Angle Driveway                                     2135
Side-Swipe Opposite                                1583
Head-On Left-Turn Not Associated with Driveway     1576
Rear End Driveway                                  1211
Pedestrian                                         1044
Head-on                                            1011
Animal                                              910
Bicycle                                             908
Other Driveway                                      776
Rear End Left Turn                              

In [15]:
#active workzone and type of work occurring
crash_df['WORKZNEACT'].value_counts()

Uncoded & Errors              72707
Other                           750
Work on Shoulder / Median       637
Lane Closure                    177
Lane Shift / Crossover           32
Intermittent / Moving Work        6
Name: WORKZNEACT, dtype: int64

In [16]:
#closeness to the workzone
crash_df['WORKZNECLO'].value_counts()

Uncoded & Errors                                      74048
Name: WORKZNECLO, dtype: int64

In [17]:
#workzone type (construction/maintenance or utility)
crash_df['WORKZNETYP'].value_counts()

Uncoded & Errors               73197
Construction or Maintenance     1046
Utility                           66
Name: WORKZNETYP, dtype: int64

In [18]:
#day of month 
crash_df['DAYOFMONTH'].value_counts()

14    2666
1     2619
10    2619
22    2595
21    2558
19    2516
23    2494
11    2490
13    2487
15    2482
18    2474
3     2460
9     2441
4     2431
12    2425
8     2415
28    2412
2     2406
7     2405
17    2401
20    2398
6     2381
24    2370
26    2368
16    2367
27    2354
5     2250
29    2202
30    2195
25    2166
31    1462
Name: DAYOFMONTH, dtype: int64

In [19]:
#day of week
crash_df['DAYOFWEEK'].value_counts()

Friday       12546
Wednesday    11929
Thursday     11439
Tuesday      10974
Monday       10442
Saturday      9739
Sunday        7240
Name: DAYOFWEEK, dtype: int64

In [20]:
#animal involvement
crash_df['ANIMAL'].value_counts()

No, Uncoded & Errors    73446
Deer                      854
Other                       7
Turkey                      2
Name: ANIMAL, dtype: int64

In [21]:
#driver 1 condition
crash_df['D1COND'].value_counts()

Appeared Normal        57431
Uncoded & Errors       10106
Unknown                 5552
Other                    429
Fatigue / Asleep         323
Sick                     230
Medication               110
Emotional                104
Physically Disabled       24
Name: D1COND, dtype: int64

In [22]:
#driver 1 alcohol use
crash_df['D1DRINKIN'].value_counts()

No     71627
Yes     2682
Name: D1DRINKIN, dtype: int64

In [23]:
#driver 1 hazardous actions
crash_df['D1HAZACT'].value_counts()

None                    22900
Fail to Stop ACD        14251
Failed to Yield          8597
Speed too Fast           6283
Other                    5286
Unknown                  4147
Disobeyed TCD            2572
Improper Lane Use        2420
Improper Backing         2134
Careless Driving         2095
Improper Turn            1378
Reckless Driving          681
Improper Pass             569
Drove Left of Center      399
Uncoded & Errors          210
Improper Signal           166
Drove Wrong Way           131
Speed too Slow             90
Name: D1HAZACT, dtype: int64

In [24]:
#driver 2 injury
crash_df['D2INJURY'].value_counts()

O-No Injury                    45388
Uncoded & Errors               22866
C-Possible injury               4811
B-Non-incapacitating Injury      932
A-Incapacitating Injury          283
K-Fatal injury                    29
Name: D2INJURY, dtype: int64

In [25]:
#driver 2 intended action
crash_df['D2INTENT'].value_counts()

Going Straight                             26805
Uncoded & Errors                            9992
Stopped on Road                             8949
Slowing or Stopped on Road                  8112
Parked                                      6142
Turning Left                                3618
Unknown                                     1776
Turning Right                               1703
Changing Lanes                              1649
Backing                                     1321
Starting up on Road                         1042
Entering Road                                527
Crossing at Intersection                     511
Overtaking or Passing                        449
Avoiding Vehicle from the front or back      361
Crossing Midblock                            216
Leaving Parking                              181
Other                                        123
Avoiding Vehicle at an angle                 122
Slowing or Stopped, Other                    115
In Prior Crash      

In [26]:
#driver 2 condition
crash_df['D2COND'].value_counts()

Appeared Normal        47110
Uncoded & Errors       19549
Unknown                 7361
Other                     96
Sick                      72
Fatigue / Asleep          47
Emotional                 46
Medication                23
Physically Disabled        5
Name: D2COND, dtype: int64

In [27]:
#driver 2 alcohol involvement
crash_df['D2DRINKIN'].value_counts()

No                  63979
Uncoded & Errors     9808
Yes                   522
Name: D2DRINKIN, dtype: int64

In [28]:
#driver 2 hazardous actions
crash_df['D2HAZACT'].value_counts()

None                    42033
Uncoded & Errors        10177
Fail to Stop ACD         5786
Unknown                  4377
Failed to Yield          3437
Other                    2033
Improper Lane Use        1347
Disobeyed TCD            1095
Improper Backing         1066
Speed too Fast            821
Careless Driving          593
Improper Turn             561
Improper Pass             373
Drove Left of Center      223
Reckless Driving          172
Improper Signal            98
Drove Wrong Way            59
Speed too Slow             58
Name: D2HAZACT, dtype: int64

In [29]:
#driver 2 injury
crash_df['D2INJURY'].value_counts()

O-No Injury                    45388
Uncoded & Errors               22866
C-Possible injury               4811
B-Non-incapacitating Injury      932
A-Incapacitating Injury          283
K-Fatal injury                    29
Name: D2INJURY, dtype: int64

In [30]:
#driver 2 intended action
crash_df['D2INTENT'].value_counts()

Going Straight                             26805
Uncoded & Errors                            9992
Stopped on Road                             8949
Slowing or Stopped on Road                  8112
Parked                                      6142
Turning Left                                3618
Unknown                                     1776
Turning Right                               1703
Changing Lanes                              1649
Backing                                     1321
Starting up on Road                         1042
Entering Road                                527
Crossing at Intersection                     511
Overtaking or Passing                        449
Avoiding Vehicle from the front or back      361
Crossing Midblock                            216
Leaving Parking                              181
Other                                        123
Avoiding Vehicle at an angle                 122
Slowing or Stopped, Other                    115
In Prior Crash      

In [31]:
#driver 3 condition
crash_df['D3COND'].value_counts()

Uncoded & Errors       55377
Unknown                14259
Appeared Normal         4651
Other                      7
Sick                       5
Fatigue / Asleep           5
Emotional                  3
Medication                 1
Physically Disabled        1
Name: D3COND, dtype: int64

In [32]:
#driver 3 alcohol use
crash_df['D3DRINKIN'].value_counts()

Uncoded & Errors    68128
No                   6152
Yes                    29
Name: D3DRINKIN, dtype: int64

In [33]:
#driver 3 hazardous actions
crash_df['D3HAZACT'].value_counts()

Uncoded & Errors        68159
None                     4736
Fail to Stop ACD          918
Unknown                   152
Speed too Fast             89
Other                      66
Failed to Yield            49
Disobeyed TCD              39
Improper Lane Use          27
Careless Driving           25
Reckless Driving           13
Improper Backing           12
Improper Pass               6
Improper Turn               6
Improper Signal             5
Drove Left of Center        3
Speed too Slow              2
Drove Wrong Way             2
Name: D3HAZACT, dtype: int64

In [34]:
#driver 3 injury
crash_df['D3INJURY'].value_counts()

Uncoded & Errors               69391
O-No Injury                     4392
C-Possible injury                452
B-Non-incapacitating Injury       54
A-Incapacitating Injury           20
Name: D3INJURY, dtype: int64

In [35]:
#driver 3 intended action
crash_df['D3INTENT'].value_counts()

Uncoded & Errors                           68173
Stopped on Road                             2038
Going Straight                              1533
Slowing or Stopped on Road                  1268
Parked                                       815
Starting up on Road                           59
Changing Lanes                                58
Turning Left                                  58
Avoiding Vehicle from the front or back       57
Unknown                                       55
Turning Right                                 29
In Prior Crash                                20
Slowing or Stopped, Other                     19
Crossing at Intersection                      18
Avoiding Vehicle at an angle                  16
Entering Road                                 16
Driverless, Moving                            14
Leaving Parking                                9
Not in Road                                    8
Backing                                        7
Other               

In [36]:
#drinking
crash_df['DRINKING'].value_counts()

No     71048
Yes     3261
Name: DRINKING, dtype: int64

In [37]:
#age of driver 1
#note: 999 listed sometimes, maybe a placeholder if unknown or error
crash_df['DRIVER1AGE'].value_counts()

999    9092
21     2854
22     2700
23     2650
20     2616
       ... 
0         2
1         1
2         1
3         1
97        1
Name: DRIVER1AGE, Length: 100, dtype: int64
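
If 999 really is a placeholder for an unknown age (an assumption, since the dataset has no data dictionary), leaving it in place would distort any summary statistic. The sketch below shows one way to mask it, using a small invented Series rather than the real column.

```python
import numpy as np
import pandas as pd

# Toy ages, invented for illustration; 999 is ASSUMED to mean "unknown"
toy_ages = pd.Series([21, 999, 35, 999, 0, 64], name="DRIVER1AGE")

# Replace the placeholder with NaN so pandas skips it in statistics
clean_ages = toy_ages.replace(999, np.nan)

raw_median = toy_ages.median()      # pulled upward by the 999 entries
known_median = clean_ages.median()  # computed on known ages only
print(raw_median, known_median)
```

The very low ages (0 through 3) in the real output may need similar treatment, but that is a separate judgment call.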

In [38]:
#sex of driver 1
crash_df['DRIVER1SEX'].value_counts()

M    36025
F    30769
U     7515
Name: DRIVER1SEX, dtype: int64

In [39]:
#age of driver 2
#note: 999 listed sometimes, maybe a placeholder if unknown or error
crash_df['DRIVER2AGE'].value_counts()

999    23393
22      1742
23      1706
21      1666
24      1644
       ...  
95         2
97         2
96         2
106        1
116        1
Name: DRIVER2AGE, Length: 102, dtype: int64

In [40]:
#sex of driver 2
crash_df['DRIVER2SEX'].value_counts()

M    27740
F    24831
U    21738
Name: DRIVER2SEX, dtype: int64

In [41]:
#age of driver 3
crash_df['DRIVER3AGE'].value_counts()

999    69433
22       166
21       154
19       145
23       142
       ...  
1          1
9          1
10         1
2          1
11         1
Name: DRIVER3AGE, Length: 86, dtype: int64

In [42]:
#sex of driver 3
crash_df['DRIVER3SEX'].value_counts()

U    69322
M     2630
F     2357
Name: DRIVER3SEX, dtype: int64

It looks like the most complete data was collected for driver 1; for drivers 2 and 3, either no additional driver was involved or the details were not recorded as carefully.
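
One way to put a number on this is to compare the share of "Uncoded & Errors" entries across the driver 1, 2 and 3 versions of a column. This is a minimal sketch on invented rows; `uncoded_share` is a hypothetical name.

```python
import pandas as pd

# Toy stand-in for the three driver-condition columns (invented values)
toy_df = pd.DataFrame({
    "D1COND": ["Appeared Normal", "Appeared Normal", "Uncoded & Errors", "Sick"],
    "D2COND": ["Appeared Normal", "Uncoded & Errors", "Uncoded & Errors", "Unknown"],
    "D3COND": ["Uncoded & Errors", "Uncoded & Errors", "Uncoded & Errors", "Unknown"],
})

# Boolean frame of matches; the column-wise mean is the fraction uncoded
uncoded_share = toy_df.eq("Uncoded & Errors").mean()
print(uncoded_share)
```

On the real data the same expression could be applied to `crash_df[['D1COND', 'D2COND', 'D3COND']]`.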

In [43]:
#Unclear - either emergency vehicles called or involved?
crash_df['EMRGVEH'].value_counts()

No     73786
Yes      523
Name: EMRGVEH, dtype: int64

In [44]:
#farm equipment involvement
crash_df['FARMEQUIP'].value_counts()

No     74300
Yes        9
Name: FARMEQUIP, dtype: int64

In [45]:
#did driver flee situation?
crash_df['FLEEINGSIT'].value_counts()

No     74128
Yes      181
Name: FLEEINGSIT, dtype: int64

In [46]:
#unclear, some kind of ID number? values appear to be reused
crash_df['FWSEGID'].value_counts()

925087    429
924623    403
924778    386
906449    353
924475    332
         ... 
920187      1
937501      1
875293      1
914255      1
909814      1
Name: FWSEGID, Length: 6181, dtype: int64

In [47]:
#injury severity assessment
crash_df['GRTINJSEVE'].value_counts()

No Injury                    60761
Possible Injury              10004
Non-Incapacitating Injury     2618
Incapacitating Injury          821
Killed                         105
Name: GRTINJSEVE, dtype: int64

In [48]:
#was it a hit and run?
crash_df['HITANDRUN'].value_counts()

No     58411
Yes    15898
Name: HITANDRUN, dtype: int64

In [49]:
#time of accident
crash_df['HOUR'].value_counts()

17    6607
15    6046
16    6041
14    4871
12    4418
13    4289
18    4221
8     4006
7     3623
11    3597
10    3119
9     2987
19    2926
20    2423
21    2206
22    2077
0     1801
23    1799
2     1691
6     1507
1     1460
3      870
5      845
4      604
99     275
Name: HOUR, dtype: int64
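
Hours are easier to read in clock order than in frequency order, and the value 99 falls outside the 0 to 23 range, so it is presumably a placeholder for an unknown time (an assumption). A small sketch with invented values:

```python
import pandas as pd

# Toy hours, invented for illustration; 99 is ASSUMED to mean "unknown"
toy_hours = pd.Series([17, 8, 17, 99, 8, 17, 0], name="HOUR")

# Keep only valid clock hours, then order the counts chronologically
valid_hours = toy_hours[toy_hours.between(0, 23)]
by_hour = valid_hours.value_counts().sort_index()
print(by_hour)
```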

In [50]:
#street name where accident occurred
crash_df['INTERNAME'].value_counts()

LEONARD           2483
BURTON            2239
EASTERN           1750
WEALTHY           1659
FULLER            1569
                  ... 
INVERNNESS           1
CAMELET              1
FOUTAIN              1
OTTAWA ON RAMP       1
SHANGRAI-LA          1
Name: INTERNAME, Length: 1980, dtype: int64

In [51]:
#lighting at time of accident
crash_df['LIGHTING'].value_counts()

Daylight            49684
Dark, Lighted       17371
Dark, Unlighted      2620
Dusk                 1853
Dawn                 1773
Unknown               929
Uncoded & Errors       64
Other                  15
Name: LIGHTING, dtype: int64

In [52]:
#likely the MDOT region rather than a registration; all are Grand, the region that includes Grand Rapids
crash_df['MDOTREG'].value_counts()

Grand    74309
Name: MDOTREG, dtype: int64

In [53]:
#milepoint, unclear what this is referencing, related to location?
crash_df['MILEPOINT'].value_counts()

0.019     556
0.038     386
0.009     307
0.000     281
0.002     250
         ... 
8.951       1
9.309       1
12.370      1
12.640      1
9.558       1
Name: MILEPOINT, Length: 11046, dtype: int64

In [54]:
#month accident occurred in 
crash_df['MONTH'].value_counts()

December     7502
January      7415
October      7086
February     7012
September    6252
November     6090
May          5710
August       5673
March        5651
June         5483
July         5334
April        5101
Name: MONTH, dtype: int64

In [55]:
#was a motorcycle involved?
crash_df['MOTORCYCLE'].value_counts()

No     73596
Yes      713
Name: MOTORCYCLE, dtype: int64

In [56]:
#number(?) A type injury (incapacitating), see injury categories above 
crash_df['NOATYPEINJ'].value_counts()

0    73474
1      759
2       63
3        8
4        4
6        1
Name: NOATYPEINJ, dtype: int64

In [57]:
#number(?) B type injury (non-incapacitating injury), see injury categories above 
crash_df['NOBTYPEINJ'].value_counts()

0    71562
1     2467
2      226
3       40
4       10
5        3
7        1
Name: NOBTYPEINJ, dtype: int64

In [58]:
#number(?) C type injury (possible injury), see injury categories above 
crash_df['NOCTYPEINJ'].value_counts()

0    63607
1     8791
2     1504
3      328
4       60
5       15
6        2
8        1
7        1
Name: NOCTYPEINJ, dtype: int64

In [59]:
#unclear, all are no
crash_df['NONTRAFFIC'].value_counts()

No    74309
Name: NONTRAFFIC, dtype: int64

In [60]:
#number of people injured
crash_df['NUMOFINJ'].value_counts()

0     60842
1     10712
2      2084
3       491
4       128
5        35
6         9
7         5
10        2
8         1
Name: NUMOFINJ, dtype: int64

In [61]:
#number of people killed
crash_df['NUMOFKILL'].value_counts()

0    74204
1      101
2        4
Name: NUMOFKILL, dtype: int64

In [62]:
#number of occupants?
crash_df['NUMOFOCCUP'].value_counts()

2     29634
1     15661
3     13294
4      5786
0      4776
5      2718
6      1187
7       512
8       257
9       129
10       70
11       46
12       38
13       17
16       17
17       16
15       15
14       13
18       11
21       10
26        9
19        8
23        7
41        6
27        6
22        6
20        6
42        5
24        4
36        4
32        3
31        3
98        3
28        2
40        2
49        2
56        2
38        2
25        2
99        1
33        1
54        1
94        1
37        1
29        1
39        1
34        1
53        1
30        1
71        1
66        1
47        1
43        1
90        1
51        1
45        1
82        1
60        1
91        1
Name: NUMOFOCCUP, dtype: int64
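
With such a long tail of occupant counts, grouping the values into ranges may make the distribution easier to read. The bin edges and labels below are arbitrary choices for illustration, not something defined by the dataset, and the values are invented.

```python
import pandas as pd

# Toy occupant counts, invented for illustration
toy_occup = pd.Series([2, 1, 3, 0, 41, 98, 5, 2], name="NUMOFOCCUP")

# Hypothetical bins: (-1, 0], (0, 2], (2, 5], (5, 10], (10, inf)
bins = [-1, 0, 2, 5, 10, float("inf")]
labels = ["0", "1-2", "3-5", "6-10", "11+"]
binned = pd.cut(toy_occup, bins=bins, labels=labels)

print(binned.value_counts(sort=False))
```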

In [63]:
#number of people injured
crash_df['NUMOFINJ'].value_counts()

0     60842
1     10712
2      2084
3       491
4       128
5        35
6         9
7         5
10        2
8         1
Name: NUMOFINJ, dtype: int64

In [64]:
#number of vehicles involved
crash_df['NUMOFVEHIC'].value_counts()

2     56457
1     11750
3      5162
4       737
5       153
6        33
7        12
8         4
12        1
Name: NUMOFVEHIC, dtype: int64

In [65]:
#off road vehicle involved?
crash_df['ORV'].value_counts()

No     74306
Yes        3
Name: ORV, dtype: int64

In [66]:
#pedestrian involvement
crash_df['PEDESTRIAN'].value_counts()

No     73239
Yes     1070
Name: PEDESTRIAN, dtype: int64

In [67]:
#street name, unclear how this is different from street above
crash_df['PRNAME'].value_counts()

BELTLINE        3905
28TH            3655
LEONARD         2862
BURTON          2406
FULLER          2308
                ... 
S/B U S 131        1
S BUS US 131       1
LOEONARD           1
RAMP # 075B        1
DICKINSON ST       1
Name: PRNAME, Length: 1708, dtype: int64
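
Entries such as LOEONARD look like misspellings of frequent names such as LEONARD. As a rough sketch, rare names can be compared against common ones with the standard library's `difflib`. The counts, the frequency threshold of 3 and the similarity cutoff of 0.85 are all invented for illustration.

```python
import difflib
import pandas as pd

# Toy street names; counts are invented for illustration
toy_streets = pd.Series(
    ["LEONARD"] * 5 + ["BURTON"] * 4 + ["LOEONARD", "DICKINSON ST"]
)

counts = toy_streets.value_counts()
common = counts[counts >= 3].index.tolist()  # hypothetical frequency threshold
rare = counts[counts < 3].index.tolist()

# For each rare name, look for a very similar common name
suggestions = {}
for name in rare:
    match = difflib.get_close_matches(name, common, n=1, cutoff=0.85)
    if match:
        suggestions[name] = match[0]

print(suggestions)
```

Any suggested pairs would still need a manual check before being merged, since two genuinely different streets can have similar names.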

In [68]:
#unclear, maybe identifying number that goes with street name?
crash_df['PRNO'].value_counts()

409008     3690
410203     3484
407204     2899
410907     2614
3030181    2516
           ... 
3410210       1
3412430       1
428002        1
424415        1
429402        1
Name: PRNO, Length: 1580, dtype: int64

In [69]:
#related to cardinal directions? unclear what some combos would be if so
crash_df['REFDIR'].value_counts()

S     15335
N     15331
E     14734
W     14467
NE     2748
NW     2718
SW     2532
SE     2429
ER     1690
X      1380
BR      926
WE       10
NS        6
U         3
Name: REFDIR, dtype: int64

In [70]:
#unclear
crash_df['REFDIST'].value_counts()

100     6102
10      5148
50      4558
5       4152
20      3455
        ... 
571        1
876        1
722        1
614        1
1356       1
Name: REFDIST, Length: 890, dtype: int64

In [71]:
#type of road
crash_df['ROUTECLASS'].value_counts()

County Road or City Street or Not Known    43948
M Route                                    11055
Interstate Route                            9870
U.S. Route                                  6442
U.S. Business Route                         1860
Interstate Business Loop or Spur             635
Connector                                    498
Not Located                                    1
Name: ROUTECLASS, dtype: int64

In [72]:
#route number (if applicable)
crash_df['ROUTENUM'].value_counts()

0      34468
999     9481
131     8364
196     6612
11      4563
37      2621
44      2299
96      2261
45      1912
296     1632
21        96
Name: ROUTENUM, dtype: int64

In [73]:
#was a school bus involved?
crash_df['SCHOOLBUS'].value_counts()

No     74043
Yes      266
Name: SCHOOLBUS, dtype: int64

In [74]:
#was a snowmobile involved?
crash_df['SNOWMOBILE'].value_counts()

No     74308
Yes        1
Name: SNOWMOBILE, dtype: int64

In [75]:
#was speed limit posted?
crash_df['SPDLMTPOST'].value_counts()

Yes                 55719
No                  18236
Uncoded & Errors      354
Name: SPDLMTPOST, dtype: int64

In [76]:
#speed limit at crash site
crash_df['SPEEDLIMIT'].value_counts()

25    29900
70     9627
35     9101
30     8749
45     6913
55     4556
40     2006
65     1634
99      786
50      402
60      284
15      130
10      117
5        34
20       29
0        11
53        5
75        5
23        5
76        3
4         2
43        1
46        1
26        1
57        1
56        1
38        1
95        1
61        1
72        1
77        1
Name: SPEEDLIMIT, dtype: int64

In [77]:
#surface conditions at time of accident
crash_df['SURFCOND'].value_counts()

Dry                  46538
Wet                  14511
Snowy                 6162
Icy                   5187
Slush                  981
Other                  645
Unknown                131
Uncoded & Errors        87
Debris                  30
Mud, Dirt, Gravel       22
Water                   12
Oily                     3
Name: SURFCOND, dtype: int64

In [78]:
#traffic signal at accident location
crash_df['TRAFCTLDEV'].value_counts()

None                              42368
Signal                            21348
Stop Sign                          8624
Yield Sign                         1015
Uncoded & Errors                    923
Stop Sign with Flashing Beacon       31
Name: TRAFCTLDEV, dtype: int64

In [79]:
#train involvement
crash_df['TRAIN'].value_counts()

No     74306
Yes        3
Name: TRAIN, dtype: int64

In [80]:
#truck or bus involvement
crash_df['TRUCKBUS'].value_counts()

No     71059
Yes     3250
Name: TRUCKBUS, dtype: int64

In [81]:
#trunkline refers to whether transportation supports long distance through traffic
crash_df['TRUNKLINE'].value_counts()

Non-Trunkline    43948
Trunkline        30361
Name: TRUNKLINE, dtype: int64

In [82]:
#id for UD10 traffic crash reporting
crash_df['UD10NUM'].value_counts()

7059760    1
9340130    1
9358313    1
9391335    1
9383936    1
          ..
8152572    1
8240432    1
8242689    1
8037431    1
1001927    1
Name: UD10NUM, Length: 74309, dtype: int64

In [83]:
#defect of any kind for vehicle 1
crash_df['V1DEFECT'].value_counts()

Uncoded & Errors                                  73785
Brakes                                              194
Tires / Rims                                        149
Other                                               112
Steering                                             38
Lights                                               24
Truck Coupling / Trailer Hitch / Safety Chains        4
Windows / Windshield                                  3
Name: V1DEFECT, dtype: int64

In [84]:
#damage to vehicle 1
crash_df['V1DAMAGE'].value_counts()

Minor Damage         37159
Disabling Damage     20437
Functional Damage    11169
Uncoded & Errors      3739
No Damage             1020
Unknown                785
Name: V1DAMAGE, dtype: int64

In [85]:
#harm caused by vehicle 1, first count?
crash_df['V1HARMEVT1'].value_counts()

Hit Motor Vehicle in Transport                             54075
Loss of Control                                             7760
Hit Parked Vehicle                                          4963
Ran off road, Right                                         1208
Hit Animal                                                   933
Hit Pedestrian                                               708
Ran off road, Left                                           678
Hit Bicycle                                                  534
Crossed Center Line                                          532
Hit Other Non-Fixed Object                                   467
Other Non-Collision                                          330
Hit Utility Pole / Light Support                             294
Hit Curb                                                     252
Hit Other Fixed Object                                       193
Hit Concrete Barrier                                         152
Hit Tree                 

In [86]:
#second type of harm caused by vehicle 1?
crash_df['V1HARMEVT2'].value_counts()

Uncoded & Errors                                           61698
Hit Motor Vehicle in Transport                              3027
Ran off road, Right                                         1550
Ran off road, Left                                          1438
Hit Concrete Barrier                                        1006
Hit Parked Vehicle                                           855
Hit Guardrail Face                                           741
Hit Utility Pole / Light Support                             510
Hit Curb                                                     510
Loss of Control                                              355
Hit Tree                                                     351
Crossed Center Line                                          254
Hit Traffic Sign Post                                        239
Hit Other Fixed Object                                       233
Overturned                                                   206
Separation of Units      

In [87]:
#3rd type of harm caused by vehicle 1?
crash_df['V1HARMEVT3'].value_counts()

Uncoded & Errors                    72235
Hit Concrete Barrier                  331
Hit Guardrail Face                    205
Hit Motor Vehicle in Transport        187
Ran off road, Right                   147
Ran off road, Left                    142
Hit Traffic Sign Post                 101
Hit Curb                              101
Hit Tree                               98
Overturned                             87
Hit Parked Vehicle                     80
Hit Utility Pole / Light Support       68
Re-entered Road                        58
Hit Fence                              55
Loss of Control                        41
Hit Cable Barrier                      39
Hit Ditch                              37
Hit Other Fixed Object                 34
Separation of Units                    32
Hit Other Post, Pole, Support          29
Hit Embankment                         26
Hit Bridge Rail                        23
Hit Fire Hydrant                       21
Hit Building                      

In [88]:
#4th harm type caused by vehicle 1?
crash_df['V1HARMEVT4'].value_counts()

Uncoded & Errors                    73283
Hit Tree                              117
Overturned                            116
Hit Guardrail Face                    111
Hit Motor Vehicle in Transport        110
Hit Concrete Barrier                  103
Hit Utility Pole / Light Support       52
Hit Parked Vehicle                     46
Re-entered Road                        41
Hit Ditch                              37
Hit Other Fixed Object                 37
Hit Fence                              32
Hit Traffic Sign Post                  27
Hit Building                           23
Ran off road, Right                    21
Hit Curb                               21
Hit Other Post, Pole, Support          17
Hit Bridge Rail                        16
Hit Bridge, Pier, or Abutment          16
Hit Embankment                         15
Ran off road, Left                     15
Hit Fire Hydrant                       14
Individual Fell from vehicle            9
Hit Guardrail End                 

In [89]:
#harm by vehicle 1? unclear how different from above
crash_df['V1MSTHARME'].value_counts()

Hit Motor Vehicle in Transport                             56001
Hit Parked Vehicle                                          5581
Uncoded & Errors                                            2739
Hit Concrete Barrier                                        1279
Hit Guardrail Face                                           920
Hit Animal                                                   903
Hit Utility Pole / Light Support                             789
Hit Pedestrian                                               720
Hit Tree                                                     554
Hit Bicycle                                                  551
Hit Other Non-Fixed Object                                   482
Hit Curb                                                     412
Hit Other Fixed Object                                       391
Loss of Control                                              359
Overturned                                                   311
Hit Traffic Sign Post    

In [90]:
#vehicle 1 special category? unclear
crash_df['V1SPECCAT'].value_counts()

Uncoded & Errors       73524
Bus                      404
Police                   171
Construction             119
Ambulance                 43
Fire                      31
Tow Truck / Wrecker       12
Farm                       5
Name: V1SPECCAT, dtype: int64

In [91]:
#did vehicle one have a trailer and what type
crash_df['V1TRAILER'].value_counts()

Uncoded & Errors       73970
Utility                  199
Other                     58
Travel Trailer            38
Boat Trailer              21
Towed Auto                11
Farm Equipment             8
Recreational Double        4
Name: V1TRAILER, dtype: int64

In [92]:
#was vehicle 1 at fault? unclear
crash_df['V1VIOLATOR'].value_counts()

Yes    47365
No     26944
Name: V1VIOLATOR, dtype: int64

In [93]:
#impact location on vehicle 1
crash_df['V1WIMPCTPT'].value_counts()

Front Center          21550
Front Right Corner    10338
Front Left Corner      9944
Rear Center            7003
Driver Side            5579
Passenger Side         4689
Rear Left Corner       3683
Right Rear Corner      3342
None                   2321
Multiple Areas         2048
Unknown                1703
Roof                   1306
Uncoded & Errors        449
Undercarriage           354
Name: V1WIMPCTPT, dtype: int64

Skipping V2 and V3 counts for now, may come back to it later.

In [94]:
#direction vehicle 1 is facing
crash_df['VEH1DIR'].value_counts()

N     18232
W     17993
E     17486
S     17310
U      2189
NW      296
SE      276
SW      273
NE      254
Name: VEH1DIR, dtype: int64

In [95]:
#vehicle 1 type
crash_df['VEH1TYPE'].value_counts()

Passenger Car, SUV, Van                  59474
Pickup Truck                              5905
Motorhome                                 3546
Uncoded & Errors                          2358
Truck / Bus (Commercial)                  1578
Truck Under 10,000 lbs                     717
Motorcycle                                 476
Other Non-Commercial                       138
Moped                                      114
Go-cart / Golf Cart                          2
Off-Road Vehicle, All-Terrain Vehicle        1
Name: VEH1TYPE, dtype: int64

In [96]:
#vehicle 1 use
crash_df['VEH1USE'].value_counts()

Private                             66126
Uncoded & Errors                     3688
Commercial                           3264
Other Government, Non-Emergency       481
School or Education                   217
Other                                 178
In Pursuit or Emergency (in use)      105
Club or Church                         96
Road Construction or Maintenance       62
Utility                                52
Farm                                   25
Military Vehicle                       15
Name: VEH1USE, dtype: int64

Skipping V2 and V3 use, type, direction for now, may come back later.

In [97]:
#weather at time of accident
crash_df['WEATHER'].value_counts()

Clear               36888
Cloudy              19914
Snow                 8050
Rain                 8048
Unknown               894
Sleet or Hail         185
Fog                   162
Severe Crosswind       81
Blowing Snow           54
Uncoded & Errors       31
Smoke                   2
Name: WEATHER, dtype: int64

In [98]:
#crash location on road?
crash_df['WHEREONRD'].value_counts()

On the Road                    65904
On the Shoulder                 2951
Outside of Shoulder or Curb     2932
Unknown                          918
In the Median                    735
On-Street Parking                457
In the Gore                      229
Uncoded & Errors                 133
Sidewalk                          41
Bicycle Lane                       9
Name: WHEREONRD, dtype: int64

In [99]:
#year of accident
crash_df['YEAR'].value_counts()

2017    8898
2016    8476
2015    7952
2014    7760
2013    7423
2008    7113
2012    6944
2011    6928
2009    6437
2010    6378
Name: YEAR, dtype: int64

In [100]:
#road city township, most Grand Rapids
crash_df['RDCITYTWP'].value_counts()

Grand Rapids    72658
Other            1505
                  146
Name: RDCITYTWP, dtype: int64

In [101]:
#which ward road falls in
crash_df['ROAD_USER1'].value_counts()

Undefined               21814
Ward 1                  18307
Ward 2                  16858
Ward 3                  15674
Shared Wards 1 and 3      657
Shared Wards 1 and 2      506
Shared Wards 2 and 3      347
                          146
Name: ROAD_USER1, dtype: int64

In [102]:
#some kind of number assigned to road?
crash_df['ROAD_USER2'].value_counts()

Undefined    22438
38            3331
64            2831
49            1670
09            1555
             ...  
61             155
               146
08              84
07              77
11              35
Name: ROAD_USER2, Length: 79, dtype: int64

In [103]:
#unclear, doesn't seem very meaningful
crash_df['ROAD_USER3'].value_counts()

Undefined                         70452
From Information, Not Concrete     3711
                                    146
Name: ROAD_USER3, dtype: int64

In [105]:
#road legal system? or main purpose?
crash_df['RDLEGALSYS'].value_counts()

City Major                36911
State Trunkline           26727
City Minor                 9793
County Primary              314
Undefined                   301
                            146
Uncertified w/FUNCLASS       76
County Local                 41
Name: RDLEGALSYS, dtype: int64

In [106]:
#legal code assigned to road?
crash_df['RDLGLCODE'].value_counts()

4    36911
1    26727
5     9793
0      447
2      314
7       76
3       41
Name: RDLGLCODE, dtype: int64

In [107]:
#way of classifying road type
crash_df['RDNFC'].value_counts()

Other Principal Arterial       27363
Minor Arterial                 14476
Local                          12623
Interstate                      9498
Other Freeway                   5228
Major Collector                 4674
Not a certified public road      301
                                 146
Name: RDNFC, dtype: int64

In [108]:
#code representing road type?
crash_df['RDNFCCODE'].value_counts()

3    27363
4    14476
7    12623
1     9498
2     5228
5     4674
0      447
Name: RDNFCCODE, dtype: int64

In [109]:
#number of lanes on the road
crash_df['RDNUMLANES'].value_counts()

2    57889
5     6294
3     5785
1     2340
4     1685
6      170
0      146
Name: RDNUMLANES, dtype: int64

In [110]:
#materials road made from
crash_df['RDSUBTYPDS'].value_counts()

Asphalt-Standard     42998
Concrete-Standard    18582
Composite            11246
Brick                 1010
Undefined              308
                       146
Unimproved Earth        14
Gravel-Standard          4
Sealcoat-Standard        1
Name: RDSUBTYPDS, dtype: int64

In [111]:
#number representing road subtype?
crash_df['RDSUBTYPE'].value_counts()

35    42998
37    18582
36    11246
38     1010
30      308
0       146
31       14
33        4
34        1
Name: RDSUBTYPE, dtype: int64

In [112]:
#materials road made from, another way to define above
crash_df['RDSURFTYPE'].value_counts()

Asphalt      42998
Concrete     18582
Composite    11246
Brick         1010
Undefined      308
               146
Earth           14
Gravel           4
Seal Coat        1
Name: RDSURFTYPE, dtype: int64

In [113]:
#expected an accident ID from 0 to 74308, but every value appears to be blank (a single blank value with count 74309)
crash_df['RDUSRINVID'].value_counts()

     74309
Name: RDUSRINVID, dtype: int64

In [114]:
#road width
crash_df['RDWIDTH'].value_counts()

0.0     25318
42.0     5205
40.0     3861
44.0     3427
26.0     2798
30.0     2603
27.0     2246
36.0     2047
39.0     1805
50.0     1567
53.0     1552
38.0     1398
34.0     1379
41.0     1246
52.0     1187
29.0     1175
25.0     1162
22.0     1099
35.0     1019
32.0      984
47.0      972
37.0      954
28.0      800
46.0      769
48.0      655
55.0      650
65.0      623
33.0      519
24.0      504
60.0      463
54.0      450
44.1      448
43.0      419
51.0      347
56.0      342
23.0      336
31.0      255
45.0      252
58.0      243
42.1      223
21.0      206
57.0      160
20.0      139
61.0      130
80.0       82
26.1       62
40.1       56
19.0       54
16.0       26
18.0       26
64.0       19
15.0       14
17.0       12
12.0        9
14.0        5
32.1        3
9.0         2
13.0        1
10.0        1
Name: RDWIDTH, dtype: int64

In [115]:
#unclear, all 17
crash_df['FRAMEWORK'].value_counts()

17    74309
Name: FRAMEWORK, dtype: int64

### 4.3 Checking for Unusual Values <a class="anchor" id="subheader43"></a>

In [117]:
crash_df.describe()

Unnamed: 0,X,Y,OBJECTID,ROADSOFTID,CTRLMILEPT,CTRLSECT,DAYOFMONTH,DRIVER1AGE,DRIVER2AGE,DRIVER3AGE,FWSEGID,HOUR,MILEPOINT,NOATYPEINJ,NOBTYPEINJ,NOCTYPEINJ,NUMOFINJ,NUMOFKILL,NUMOFOCCUP,NUMOFUNINJ,NUMOFVEHIC,PRNO,REFDIST,ROUTENUM,SPEEDLIMIT,UD10NUM,YEAR,RDLGLCODE,RDNFCCODE,RDNUMLANES,RDSUBTYPE,RDWIDTH,FRAMEWORK
count,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0,74309.0
mean,-85.653561,42.956494,37155.0,2670271.0,2.404829,16780.303288,15.659436,154.512253,340.889152,936.005141,918227.2,13.396358,4.917666,0.012529,0.041731,0.176627,0.230887,0.001467,2.320984,2.091348,1.940398,1038803.0,128.13246,173.594921,38.634566,7785196.0,2012.777106,3.02277,3.656058,2.351034,35.601879,25.549317,17.0
std,0.034017,0.029451,21451.304914,64716.96,4.337907,20189.023052,8.764828,315.686635,446.285033,237.750798,45898.92,7.641107,5.284108,0.125552,0.228671,0.477755,0.559503,0.039653,2.114018,2.103085,0.544675,1194263.0,222.491412,323.917909,17.106006,2307781.0,2.905732,1.592737,1.835688,0.939998,1.850789,20.026456,0.0
min,-85.751682,42.883679,1.0,2558094.0,0.0,0.0,1.0,0.0,0.0,0.0,872828.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,405205.0,0.0,0.0,0.0,1000000.0,2008.0,0.0,0.0,0.0,0.0,0.0,17.0
25%,-85.677046,42.93232,18578.0,2613603.0,0.0,0.0,8.0,24.0,29.0,999.0,902812.0,9.0,0.48,0.0,0.0,0.0,0.0,0.0,1.0,1.0,2.0,407204.0,20.0,0.0,25.0,7413624.0,2010.0,1.0,3.0,2.0,35.0,0.0,17.0
50%,-85.660144,42.962892,37155.0,2671346.0,0.0,0.0,16.0,36.0,49.0,999.0,912308.0,14.0,2.169,0.0,0.0,0.0,0.0,0.0,2.0,2.0,2.0,410501.0,55.0,11.0,30.0,8372076.0,2013.0,4.0,3.0,2.0,35.0,30.0,17.0
75%,-85.635768,42.974368,55732.0,2726270.0,2.895,41051.0,23.0,56.0,999.0,999.0,922604.0,17.0,10.155,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,434905.0,150.0,131.0,45.0,9139631.0,2015.0,4.0,4.0,2.0,37.0,42.0,17.0
max,-85.568651,43.028975,74309.0,2781429.0,16.691,41843.0,31.0,999.0,999.0,999.0,1247932.0,99.0,16.708,6.0,7.0,8.0,10.0,2.0,99.0,99.0,12.0,5503513.0,10560.0,999.0,99.0,9999999.0,2017.0,7.0,7.0,6.0,38.0,80.0,17.0


**Commentary on Unusual Values**

Several columns include 99 or 999, which appears to be a placeholder for unknown or not-applicable values rather than a valid measurement.

- DRIVER1AGE, DRIVER2AGE, DRIVER3AGE: max is 999, likely a placeholder code if unknown or doesn't apply (more frequent for DRIVER3AGE, where even the 25th percentile is 999)
- HOUR: max is 99, likely a placeholder if not known
- NUMOFOCCUP: max is 99, possibly a large event (buses) but more likely a placeholder for unknown
- NUMOFUNINJ: max is 99, possibly a large event (buses) but more likely a placeholder for unknown (NUMOFINJ itself tops out at 10)
- ROUTENUM: 999, much higher than the rest of the values, perhaps a placeholder for unknown
- SPEEDLIMIT: 99, placeholder for unknown (there is nowhere in GR with a speed limit of 99)

Many columns need to be converted to integers to be more meaningful. Unusual values will need to be explored as part of that process too.
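One way to act on the commentary above is to count how often each suspected placeholder occurs and then mask it out before computing statistics. The following is a minimal sketch, not the notebook's own method: `sample_df` is a made-up stand-in for `crash_df`, and the `placeholder_codes` mapping is an assumption based on the max values in the `describe()` output.

```python
import pandas as pd

# Toy stand-in for crash_df; values are invented for illustration only.
sample_df = pd.DataFrame({
    "DRIVER1AGE": [31, 999, 23, 37],
    "HOUR": [8, 15, 99, 17],
    "SPEEDLIMIT": [25, 99, 25, 70],
})

# Assumed placeholder code per column (a guess from the describe() output).
placeholder_codes = {"DRIVER1AGE": 999, "HOUR": 99, "SPEEDLIMIT": 99}

# Count how often each suspected placeholder appears before changing anything.
placeholder_counts = {
    col: int((sample_df[col] == code).sum())
    for col, code in placeholder_codes.items()
}
print(placeholder_counts)

# Mask the placeholders to NaN so they stop distorting mean, std, and max.
cleaned_df = sample_df.copy()
for col, code in placeholder_codes.items():
    cleaned_df[col] = cleaned_df[col].mask(cleaned_df[col] == code)

print(cleaned_df.describe())
```

Masking turns the affected columns into floats, since NaN has no integer representation in a plain `int64` column. Whether 99 is always a placeholder (for example in NUMOFOCCUP) would still need to be checked against the real data.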

### 4.4 Checking for Null Values <a class="anchor" id="subheader44"></a>

In [134]:
#checking if the dataset contains any null values
crash_df.isnull().sum()

X             0
Y             0
OBJECTID      0
ROADSOFTID    0
BIKE          0
             ..
RDSUBTYPE     0
RDSURFTYPE    0
RDUSRINVID    0
RDWIDTH       0
FRAMEWORK     0
Length: 142, dtype: int64

In [135]:
#can't see all summed values above, so checking sum of sum to see if there are any
#there are no null values
crash_df.isnull().sum().sum()

0
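A null count of zero does not mean the data is complete. Several value counts above show a blank category (146 rows in columns such as RDCITYTWP) and many "Uncoded & Errors" entries, neither of which `isnull()` picks up. The sketch below shows one way to count them; `sample_df` is an invented stand-in for `crash_df`, and treating whitespace-only strings as blank is an assumption about how the blanks are stored.

```python
import pandas as pd

# Toy stand-in for crash_df; values are invented for illustration only.
sample_df = pd.DataFrame({
    "RDCITYTWP": ["Grand Rapids", "Other", " ", "Grand Rapids"],
    "V1DEFECT": ["Uncoded & Errors", "Brakes", "Uncoded & Errors", "Uncoded & Errors"],
    "RDNUMLANES": [2, 5, 0, 2],
})

# Text columns to inspect (listed by hand here to keep the sketch explicit).
text_cols = ["RDCITYTWP", "V1DEFECT"]

# isnull() sees nothing, because blanks and "Uncoded & Errors" are real strings.
total_nulls = int(sample_df.isnull().sum().sum())

# Count empty or whitespace-only strings per text column.
blank_counts = sample_df[text_cols].apply(lambda s: s.str.strip().eq("").sum())

# Count the explicit "Uncoded & Errors" category per text column.
uncoded_counts = sample_df[text_cols].apply(lambda s: s.eq("Uncoded & Errors").sum())

print(total_nulls)
print(blank_counts)
print(uncoded_counts)
```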

## 5. Cleaning the Data <a class="anchor" id="header5"></a>

### 5.1 Dropping Unneeded Columns <a class="anchor" id="subheader51"></a>

A few columns hold the same value in every row and so add no information; they are dropped here.

In [136]:
#dropping city, county and framework because the values are the same for all rows
crash_df.drop(['CITY', 'COUNTY', 'FRAMEWORK'], axis=1, inplace = True)

In [175]:
#after first drop, realized MDOTREG also all have same value so dropping it
crash_df.drop(['MDOTREG'], axis = 1, inplace = True)

In [183]:
#dropping NONTRAFFIC, all are 'no'
crash_df.drop(['NONTRAFFIC'], axis = 1, inplace = True)

In [184]:
#checking shape, expect to see 137 columns, down from 142, after dropping the 5 columns
crash_df.shape

(74309, 137)
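The constant columns above were found by reading the value counts one at a time, which is how MDOTREG and NONTRAFFIC were initially missed. A programmatic check could catch them in one pass. This is a sketch under assumptions: `sample_df` is an invented stand-in for `crash_df`, and "constant" is taken to mean a column with at most one distinct value.

```python
import pandas as pd

# Toy stand-in for crash_df; values are invented for illustration only.
sample_df = pd.DataFrame({
    "CITY": ["Grand Rapids"] * 4,
    "COUNTY": ["Kent"] * 4,
    "FRAMEWORK": [17] * 4,
    "BIKE": ["No", "No", "Yes", "No"],
})

# Number of distinct values per column (NaN counted as its own value).
n_unique = sample_df.nunique(dropna=False)

# Columns with a single distinct value carry no information.
constant_cols = n_unique[n_unique <= 1].index.tolist()
print(constant_cols)

# Drop them by name and confirm the new shape.
reduced_df = sample_df.drop(columns=constant_cols)
print(reduced_df.shape)
```

Assigning the result of `drop` to a new name, rather than using `inplace=True`, keeps the original frame available for comparison.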

In [185]:
#can also see the columns are dropped when previewing first 5 rows
crash_df.head()

Unnamed: 0,X,Y,OBJECTID,ROADSOFTID,BIKE,CRASHDATE,CRASHSEVER,CRASHTYPE,WORKZNEACT,WORKZNECLO,WORKZNETYP,CTRLMILEPT,CTRLSECT,DAYOFMONTH,DAYOFWEEK,ANIMAL,D1COND,D1DRINKIN,D1HAZACT,D1INJURY,D1INTENT,D2COND,D2DRINKIN,D2HAZACT,D2INJURY,D2INTENT,D3COND,D3DRINKIN,D3HAZACT,D3INJURY,D3INTENT,DRINKING,DRIVER1AGE,DRIVER1SEX,DRIVER2AGE,DRIVER2SEX,DRIVER3AGE,DRIVER3SEX,EMRGVEH,FARMEQUIP,FLEEINGSIT,FWSEGID,GRTINJSEVE,HITANDRUN,HOUR,INTERNAME,LIGHTING,MILEPOINT,MONTH,MOTORCYCLE,NOATYPEINJ,NOBTYPEINJ,NOCTYPEINJ,NUMOFINJ,NUMOFKILL,NUMOFOCCUP,NUMOFUNINJ,NUMOFVEHIC,ORV,PEDESTRIAN,PRNAME,PRNO,PUBLICPROP,REFDIR,REFDIST,ROUTECLASS,ROUTENUM,SCHOOLBUS,SNOWMOBILE,SPDLMTPOST,SPEEDLIMIT,SURFCOND,TRAFCTLDEV,TRAIN,TRUCKBUS,TRUNKLINE,UD10NUM,V1DEFECT,V1DAMAGE,V1HARMEVT1,V1HARMEVT2,V1HARMEVT3,V1HARMEVT4,V1MSTHARME,V1SPECCAT,V1TRAILER,V1VIOLATOR,V1WIMPCTPT,V2DEFECT,V2DAMAGE,V2HARMEVT1,V2HARMEVT2,V2HARMEVT3,V2HARMEVT4,V2MSTHARME,V2SPECCAT,V2TRAILER,V2VIOLATOR,V2WIMPCTPT,V3DEFECT,V3DAMAGE,V3HARMEVT1,V3HARMEVT2,V3HARMEVT3,V3HARMEVT4,V3MSTHARME,V3SPECCAT,V3TRAILER,V3VIOLATOR,V3WIMPCTPT,VEH1DIR,VEH1TYPE,VEH1USE,VEH2DIR,VEH2TYPE,VEH2USE,VEH3DIR,VEH3TYPE,VEH3USE,WEATHER,WHEREONRD,YEAR,RDCITYTWP,ROAD_USER1,ROAD_USER2,ROAD_USER3,ROAD_USER4,RDLEGALSYS,RDLGLCODE,RDNFC,RDNFCCODE,RDNUMLANES,RDSUBTYPDS,RDSUBTYPE,RDSURFTYPE,RDUSRINVID,RDWIDTH
0,-85.650003,42.919854,1,2589528,0,2008/06/16,Property Damage Only,Backing,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,16,Monday,"No, Uncoded & Errors",Appeared Normal,0,Improper Backing,O-No Injury,Backing,Appeared Normal,0,,O-No Injury,Going Straight,Uncoded & Errors,0,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0,31,M,43,F,999,U,0,0,0,922088,No Injury,0,8,ALGER,Daylight,0.101,June,0,0,0,0,0,0,2,2,2,No,No,LINDEN,3410438,Uncoded & Errors,S,90,County Road or City Street or Not Known,0,No,No,No,25,Dry,Stop Sign,No,No,Non-Trunkline,7059760,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Right Rear Corner,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Left Corner,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,"Passenger Car, SUV, Van",Private,N,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2008,Grand Rapids,Ward 1,3,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,26.0
1,-85.625665,42.92471,2,2593183,0,2008/08/30,Property Damage Only,Fixed Object,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,30,Saturday,"No, Uncoded & Errors",Unknown,0,Reckless Driving,Uncoded & Errors,Going Straight,Uncoded & Errors,0,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0,999,M,999,U,999,U,0,0,0,921834,No Injury,1,15,ROSEWOOD,Daylight,0.028,August,0,0,0,0,0,0,4,4,1,No,No,LOUISE,413903,Yes,E,150,County Road or City Street or Not Known,0,No,No,No,25,Dry,,No,No,Non-Trunkline,7110374,Uncoded & Errors,Disabling Damage,Hit Utility Pole / Light Support,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Utility Pole / Light Support,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,"Passenger Car, SUV, Van",Other,U,Uncoded & Errors,Uncoded & Errors,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Shoulder,2008,Grand Rapids,Ward 3,55,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,26.0
2,-85.655282,43.000972,3,2582102,0,2008/02/13,Property Damage Only,Other Driveway,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,13,Wednesday,"No, Uncoded & Errors",Uncoded & Errors,0,Unknown,Uncoded & Errors,Going Straight,Uncoded & Errors,0,,Uncoded & Errors,Parked,Uncoded & Errors,0,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0,999,U,999,U,999,U,0,0,0,899196,No Injury,1,8,PLAINFIELD,Daylight,0.225,February,0,0,0,0,0,0,0,0,2,No,No,JULIA,423704,Uncoded & Errors,W,50,County Road or City Street or Not Known,0,No,No,Yes,25,Snowy,,No,No,Non-Trunkline,6948189,Uncoded & Errors,Uncoded & Errors,Hit Traffic Sign Post,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Traffic Sign Post,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,Uncoded & Errors,Minor Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,W,Uncoded & Errors,Uncoded & Errors,E,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2008,Grand Rapids,Ward 2,42,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,29.0
3,-85.643314,42.928172,4,2579820,0,2008/01/25,Property Damage Only,Angle Straight,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,25,Friday,"No, Uncoded & Errors",Appeared Normal,0,Failed to Yield,O-No Injury,Going Straight,Appeared Normal,0,,O-No Injury,Going Straight,Uncoded & Errors,0,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0,23,M,18,F,999,U,0,0,0,926958,No Injury,0,17,ARDMORE,Dusk,0.062,January,0,0,0,0,0,0,2,2,2,No,No,BLAIM,3410312,Uncoded & Errors,E,10,County Road or City Street or Not Known,0,No,No,No,25,Slush,Yield Sign,No,No,Non-Trunkline,6918547,Uncoded & Errors,Functional Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Yes,Front Center,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Driver Side,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,S,"Passenger Car, SUV, Van",Private,E,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Cloudy,On the Road,2008,Grand Rapids,Ward 3,62,Undefined,Undefined,City Minor,5,Local,7,2,Asphalt-Standard,35,Asphalt,,27.0
4,-85.665571,42.968854,5,2594624,0,2008/09/26,Property Damage Only,Backing,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0.0,0,26,Friday,"No, Uncoded & Errors",Appeared Normal,0,Unknown,O-No Injury,Backing,Appeared Normal,0,Unknown,O-No Injury,Going Straight,Uncoded & Errors,0,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,0,37,M,18,F,999,U,0,0,0,903417,No Injury,0,17,RANSOM,Daylight,0.046,September,0,0,0,0,0,0,2,2,2,No,No,CRESENT,3416260,Uncoded & Errors,NW,150,County Road or City Street or Not Known,0,No,No,Yes,25,Dry,,No,Yes,Non-Trunkline,7132758,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,,Uncoded & Errors,Disabling Damage,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Hit Motor Vehicle in Transport,Uncoded & Errors,Uncoded & Errors,No,Front Center,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,Uncoded & Errors,No,Uncoded & Errors,E,Truck / Bus (Commercial),Commercial,W,"Passenger Car, SUV, Van",Private,U,Uncoded & Errors,Uncoded & Errors,Clear,On the Road,2008,Grand Rapids,Ward 2,38,Undefined,Undefined,City Major,4,Local,7,2,Asphalt-Standard,35,Asphalt,,30.0


### 5.2 Converting Datatypes <a class="anchor" id="subheader52"></a>

Starting datatype updates with binary column options, where no becomes 0 and yes becomes 1.

In [141]:
#replace bike values of no to 0 and yes to 1
crash_df['BIKE'] = crash_df['BIKE'].replace({'No' : 0, 'Yes' : 1})

In [143]:
#checking updated numbers against values above, expect 73383 zeros and 926 ones, which it gives me
crash_df['BIKE'].value_counts()

0    73383
1      926
Name: BIKE, dtype: int64

In [148]:
#replace driver 1 drinking values of no to 0 and yes to 1
crash_df['D1DRINKIN'] = crash_df['D1DRINKIN'].replace({'No' : 0, 'Yes' : 1})

In [153]:
crash_df['D1DRINKIN'].value_counts()

0    71627
1     2682
Name: D1DRINKIN, dtype: int64

In [154]:
#replace driver 2 drinking values of no to 0, uncoded and errors to 0, and yes to 1
#there were a number of 'uncoded and errors', choosing to code these as 0 and only mark alcohol as 1 where known
#uncoded and errors is likely because driver 2 wasn't documented (this value doesn't occur for driver 1)

crash_df['D2DRINKIN'] = crash_df['D2DRINKIN'].replace({'No' : 0, 'Uncoded & Errors': 0, 'Yes' : 1})

In [155]:
#checking drinking 2 values
crash_df['D2DRINKIN'].value_counts()

0    73787
1      522
Name: D2DRINKIN, dtype: int64
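
Because coding `Uncoded & Errors` as 0 mixes "no alcohol" with "unknown", it may be worth recording how many rows were unknown before collapsing them. This is only a sketch on made-up toy values; `d2_unknown` is a hypothetical helper name, not something in the dataset.

```python
import pandas as pd

# Toy values standing in for crash_df['D2DRINKIN'].
d2 = pd.Series(["No", "Uncoded & Errors", "Yes", "Uncoded & Errors"],
               name="D2DRINKIN")

# Hypothetical helper flag: True where the original value was unknown.
d2_unknown = d2.eq("Uncoded & Errors")

# Same mapping as the cell above: unknown collapses to 0.
d2_binary = d2.map({"No": 0, "Uncoded & Errors": 0, "Yes": 1})

# Number of zeros that really mean "unknown" rather than "no".
print(d2_unknown.sum())
print(d2_binary.tolist())
```

Keeping the flag (or at least the count) makes it possible to revisit this choice later without reloading the raw data.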

In [157]:
#replace driver 3 values to 0 and 1
#similar to above, if unknown replacing with 0

crash_df['D3DRINKIN'].replace({'No' : 0, 'Uncoded & Errors': 0, 'Yes' : 1}, inplace = True)

In [158]:
#checking drinking 3 update
crash_df['D3DRINKIN'].value_counts()

0    74280
1       29
Name: D3DRINKIN, dtype: int64

In [160]:
#replace drinking values of no to 0 and yes to 1
crash_df['DRINKING'].replace({'No' : 0, 'Yes' : 1}, inplace = True)

In [161]:
#checking drinking update
crash_df['DRINKING'].value_counts()

0    71048
1     3261
Name: DRINKING, dtype: int64

In [163]:
#replace EMRGVEH values of no to 0 and yes to 1
crash_df['EMRGVEH'].replace({'No' : 0, 'Yes' : 1}, inplace = True)

In [164]:
#checking update
crash_df['EMRGVEH'].value_counts()

0    73786
1      523
Name: EMRGVEH, dtype: int64

In [166]:
#replace FARMEQUIP values of no to 0 and yes to 1
crash_df['FARMEQUIP'].replace({'No' : 0, 'Yes' : 1}, inplace = True)

In [167]:
#checking update
crash_df['FARMEQUIP'].value_counts()

0    74300
1        9
Name: FARMEQUIP, dtype: int64

In [169]:
#replace FLEEINGSIT values of no to 0 and yes to 1
crash_df['FLEEINGSIT'].replace({'No' : 0, 'Yes' : 1}, inplace = True)

In [170]:
#checking update
crash_df['FLEEINGSIT'].value_counts()

0    74128
1      181
Name: FLEEINGSIT, dtype: int64

In [172]:
#replace HITANDRUN values of no to 0 and yes to 1
crash_df['HITANDRUN'].replace({'No' : 0, 'Yes' : 1}, inplace = True)

In [173]:
#checking update
crash_df['HITANDRUN'].value_counts()

0    58411
1    15898
Name: HITANDRUN, dtype: int64

In [179]:
#replace MOTORCYCLE values of no to 0 and yes to 1
crash_df['MOTORCYCLE'].replace({'No' : 0, 'Yes' : 1}, inplace = True)

In [180]:
#checking update
crash_df['MOTORCYCLE'].value_counts()

0    73596
1      713
Name: MOTORCYCLE, dtype: int64

In [187]:
#replace ORV values of no to 0 and yes to 1
crash_df['ORV'].replace({'No' : 0, 'Yes' : 1}, inplace = True)

In [188]:
#checking update
crash_df['ORV'].value_counts()

0    74306
1        3
Name: ORV, dtype: int64

In [190]:
#replace PEDESTRIAN values of no to 0 and yes to 1
crash_df['PEDESTRIAN'].replace({'No' : 0, 'Yes' : 1}, inplace = True)

In [191]:
#checking update
crash_df['PEDESTRIAN'].value_counts()

0    73239
1     1070
Name: PEDESTRIAN, dtype: int64
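
The cells above repeat the same replace-then-check pattern for each column, so the remaining yes/no columns could be handled in a loop. The sketch below uses a made-up toy frame and assumes every listed column only contains `No`, `Yes`, or `Uncoded & Errors`. The names `yes_no_map` and `binary_cols` are hypothetical.

```python
import pandas as pd

# Toy stand-in for crash_df; the real frame has about 74k rows.
crash_df = pd.DataFrame({
    "BIKE": ["No", "Yes", "No"],
    "HITANDRUN": ["Yes", "No", "No"],
    "D2DRINKIN": ["No", "Uncoded & Errors", "Yes"],
})

# Mapping used throughout this section; unknowns are treated as 0.
yes_no_map = {"No": 0, "Yes": 1, "Uncoded & Errors": 0}

# Hypothetical list of columns to convert.
binary_cols = ["BIKE", "HITANDRUN", "D2DRINKIN"]

for col in binary_cols:
    before = crash_df[col].value_counts()
    # map() yields NaN for any value missing from the dict, so an
    # unexpected category is caught instead of passing through silently.
    converted = crash_df[col].map(yes_no_map)
    if converted.isna().any():
        raise ValueError(f"{col} has values outside the mapping")
    crash_df[col] = converted.astype("int64")
    # The number of ones should equal the original number of 'Yes' rows.
    assert crash_df[col].sum() == before.get("Yes", 0)
```

Using `map` rather than `replace` is a deliberate choice here: a value that is missing from the dictionary becomes `NaN` and raises an error, rather than staying in the column as a string.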

## Next Steps

* Remove columns where all values are the same (like city) X (unless I find more)
* Address dummy placeholder values
* Convert yes or no to binary - in progress, continue with the columns to the right of PEDESTRIAN
* Convert datatypes (float, integer, etc.)
* Visualize data - Tableau?

* Convert categorical to numbers? Or use one-hot encoding
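
Two of the steps above, dropping constant columns and one-hot encoding, could look roughly like the following. This is a sketch on a made-up toy frame, not a decision about the real data. The `Other` category and the name `toy` are placeholders.

```python
import pandas as pd

# Toy frame; the values are made up and 'Other' is a placeholder category.
toy = pd.DataFrame({
    "CITY": ["Grand Rapids"] * 3,
    "CRASHTYPE": ["Backing", "Other", "Backing"],
    "BIKE": [0, 1, 0],
})

# Columns with a single distinct value carry no information.
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=False) == 1]
toy = toy.drop(columns=constant_cols)

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(toy, columns=["CRASHTYPE"], dtype=int)
print(encoded.columns.tolist())
```

One-hot encoding adds a column per category, so for fields with many options it may be worth grouping rare categories first.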


## References

* [Showing all columns as a list](https://www.statology.org/pandas-show-all-columns/)
* [Showing all columns by default](https://stackoverflow.com/questions/49188960/how-to-show-all-columns-names-on-a-large-pandas-dataframe)
* [Supplemental code help from ChatGPT](https://chat.openai.com/chat) 