<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-Keys" data-toc-modified-id="Load-Keys-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Load Keys</a></span></li><li><span><a href="#Get-Data" data-toc-modified-id="Get-Data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Get Data</a></span></li><li><span><a href="#Borough-Assignment" data-toc-modified-id="Borough-Assignment-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Borough Assignment</a></span><ul class="toc-item"><li><span><a href="#Borough-Assignment-by-Road-Type" data-toc-modified-id="Borough-Assignment-by-Road-Type-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Borough Assignment by Road Type</a></span></li></ul></li><li><span><a href="#Cleaned-Final-Data" data-toc-modified-id="Cleaned-Final-Data-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Cleaned Final Data</a></span><ul class="toc-item"><li><span><a href="#Write-to-CSV" data-toc-modified-id="Write-to-CSV-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Write to CSV</a></span></li></ul></li></ul></div>

# Cyclist and Pedestrain Deaths NYC

Collecting pedestrain and cyclist death by motor vehicle collision data via the NYC Open Data API. Not all boroughs have been properly assigned so we will attempt to assign as many as we can. Once the data is in a good place, we can build out an interactive dashboard in Bokeh to visualize.

## Load Keys

In [48]:
import json
with open('keys.json') as f:
    keys = json.load(f)
    nyc_od_token = keys['nycOD']

## Get Data

First we need to access the NYC Open Data API via an App Token. We can then easily pull in the data on motor vehicle incidents with pedestrains and cyclists. 

In [1]:
# make sure to install these packages before running:
# pip install sodapy
# pip install reverse_geocoder
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import pandas as pd
import os
import re
from sodapy import Socrata

In [49]:
# Go to NYC Open Data website and create an App token to access the aPI
client = Socrata("data.cityofnewyork.us",
                app_token=nyc_od_token)

Socrata has a nice functionality where users can simply query, SQL style, to extract information. It seems like it only accepts basic SQL commands so we will pull down a few columns and use Pandas for the analysis. 

In [50]:
peds_query = """
SELECT  
    borough, crash_date, location, 
    longitude, latitude, on_street_name, 
    off_street_name,cross_street_name,
    number_of_pedestrians_killed, number_of_cyclist_killed,
    contributing_factor_vehicle_1, vehicle_type_code1
WHERE number_of_pedestrians_killed > 0
   OR number_of_cyclist_killed > 0
LIMIT 2000
"""


In [51]:
# Returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("h9gi-nx95",
                    query=peds_query)

In [52]:
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)

In [53]:
results_df.shape

(1198, 12)

In [7]:
results_df['year'] = pd.DatetimeIndex(results_df['crash_date']).year
results_df['month'] = pd.DatetimeIndex(results_df['crash_date']).month
results_df['day'] = pd.DatetimeIndex(results_df['crash_date']).dayofweek
results_df['date'] = pd.DatetimeIndex(results_df['crash_date']).date


In [8]:
results_df.year.value_counts(sort=False)

2012     78
2013    184
2014    153
2015    146
2016    166
2017    146
2018    130
2019    159
2020     36
Name: year, dtype: int64

In [9]:
results_df.borough.value_counts(dropna=False)

NaN              346
BROOKLYN         281
QUEENS           241
MANHATTAN        192
BRONX            107
STATEN ISLAND     31
Name: borough, dtype: int64

In [10]:
results_df = results_df[['borough','year','month','day', 'date',
                         'location', 'latitude', 'longitude',
                         'cross_street_name','off_street_name', 
                         'on_street_name', 'vehicle_type_code1', 
                         'contributing_factor_vehicle_1', 
                         'number_of_cyclist_killed', 
                         'number_of_pedestrians_killed']]

## Borough Assignment

Many incidents do not have borough assignments, but we want to the know borough assignment for downstream analysis. The data have longitude, latitude values that we can geo-locate, but not all cases have location data. 

In [11]:
import reverse_geocoder as rg

In [12]:
def make_borough_gps(row):
    dict_idx = 0 # one OrderedDict in a list
    borough = 4 # Fourth key is borough
    name = 1 # value index for key/value pair
    cord = (row['coordinates'][1],row['coordinates'][0])
    location = rg.search(cord)
    label = list(location[dict_idx].items())[borough][name]
    if label == 'Kings County': 
        return 'BROOKLYN'
    if label == 'Queens County':
        return 'QUEENS'
    if label == 'New York County':
        return 'MANHATTAN'
    if label == 'Bronx':
        return 'BRONX'
    if label == 'Richmond County':
        return 'STATEN ISLAND'
    else: label = 'NOT NYC'
    return label

In [13]:
results_df['borough_gps'] = results_df['location'].dropna().apply(lambda x: make_borough_gps(x))

Loading formatted geocoded file...


In [14]:
results_df.borough_gps.value_counts(dropna=False)

BROOKLYN         295
QUEENS           286
NOT NYC          188
MANHATTAN        163
NaN              145
BRONX             96
STATEN ISLAND     25
Name: borough_gps, dtype: int64

In [15]:
results_df[(results_df['borough'] != results_df['borough_gps']) & 
           (results_df['borough_gps'].isna() != True) & 
           (results_df['borough'].isna() == True)][['borough', 'borough_gps']].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 236 entries, 0 to 1195
Data columns (total 2 columns):
borough        0 non-null object
borough_gps    236 non-null object
dtypes: object(2)
memory usage: 5.5+ KB


There are 236 places where we can fill by the GPS location uncontested. There are more unknowns to fill though since there are many incidents where the GPS location differs from the reported borough. There also places where no location data is present. 

In [16]:
results_df[(results_df['borough'] != results_df['borough_gps']) & 
           (results_df['borough_gps'].isna() != True) &
           (results_df['borough'].isna() != True)][['borough', 'borough_gps']].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 249 entries, 5 to 1194
Data columns (total 2 columns):
borough        249 non-null object
borough_gps    249 non-null object
dtypes: object(2)
memory usage: 5.8+ KB


249 incidents where the GPS location differs from the reported borough. We will keep the reported borough as the correct assignment. It should be noted that the police do not always report incidents and locations correctly, but for this analysis we will keep those assignments. 

In [17]:
results_df.loc[results_df.borough.isna() == True, 'borough'] = \
    results_df.loc[results_df.borough.isna() == True, 'borough_gps']

In [18]:
results_df.borough.value_counts(dropna=False)

BROOKLYN         333
QUEENS           308
MANHATTAN        236
BRONX            138
NaN              110
STATEN ISLAND     38
NOT NYC           35
Name: borough, dtype: int64

145 cases where there is no GPS or borough information. These cases could be decided by street names if those are present. We are down to the by-hand work. The geo-locator seems to have only medium reliable so we will keep it in here and potentially reassign borough based on street. 

### Borough Assignment by Road Type

We will now filter down the dataframe by street columns to determine borough assignment. This is mainly to try and break this down into small pieces for assignment. 

In [19]:
missing_boro_df = results_df.loc[(results_df.borough.isna() == True) |
                                 (results_df.borough == 'NOT NYC'), :] \
                                [['location', 'on_street_name', 'off_street_name', 
                                  'cross_street_name', 'borough', 'borough_gps', 
                                  'latitude', 'longitude']]

In [20]:
missing_boro_df[missing_boro_df.on_street_name.str.contains('STREET', regex=True, na=False)].head()

Unnamed: 0,location,on_street_name,off_street_name,cross_street_name,borough,borough_gps,latitude,longitude
6,"{'type': 'Point', 'coordinates': [-73.98257, 4...",EAST 14 STREET,,,NOT NYC,NOT NYC,40.73136,-73.98257
54,"{'type': 'Point', 'coordinates': [-73.988205, ...",PEARL STREET,,,NOT NYC,NOT NYC,40.692265,-73.988205
134,"{'type': 'Point', 'coordinates': [-73.99266, 4...",EAST 16 STREET,,,NOT NYC,NOT NYC,40.737328,-73.99266
183,"{'type': 'Point', 'coordinates': [-73.964966, ...",WEST 116 STREET,,,NOT NYC,NOT NYC,40.808483,-73.964966
207,"{'type': 'Point', 'coordinates': [-73.99491, 4...",DOUGLASS STREET,,,NOT NYC,NOT NYC,40.68443,-73.99491


In [21]:
def make_boro_assignment(df, index, borough):
    df.loc[index:index+1, 'borough'] = borough
    return df

In [22]:
boros = ['MANHATTAN', "BROOKLYN", "QUEENS", "STATEN ISLAND", "BRONX"]
results_df = make_boro_assignment(results_df, 176, boros[0])
results_df = make_boro_assignment(results_df, 207, boros[1])
results_df = make_boro_assignment(results_df, 716, boros[0])
results_df = make_boro_assignment(results_df, 493, boros[0])
results_df = make_boro_assignment(results_df, 723, boros[3])
results_df = make_boro_assignment(results_df, 776, boros[2])
results_df = make_boro_assignment(results_df, 816, boros[2])
results_df = make_boro_assignment(results_df, 873, boros[1])
results_df = make_boro_assignment(results_df, 969, boros[2])
results_df = make_boro_assignment(results_df, 1036, boros[1])
results_df = make_boro_assignment(results_df, 1065, boros[1])
results_df = make_boro_assignment(results_df, 1177, boros[3])

In [23]:
avenue_df = results_df.loc[(results_df.borough.isna() == True) |
                            (results_df.borough == 'NOT NYC'), :] \
                            [['on_street_name', 'off_street_name', 
                              'cross_street_name', 'borough', 'borough_gps']]
avenue_df[(avenue_df.on_street_name.str.contains('AVENUE',regex=True, na=False))].head()

Unnamed: 0,on_street_name,off_street_name,cross_street_name,borough,borough_gps
23,BORDEN AVENUE,58 ROAD,,,
145,11 AVENUE,,,NOT NYC,NOT NYC
215,10 AVENUE,,,NOT NYC,NOT NYC
278,TOMPKINS AVENUE,,,NOT NYC,NOT NYC
306,AVENUE OF THE AMERICAS,,,NOT NYC,NOT NYC


In [24]:
results_df = make_boro_assignment(results_df, 27, boros[2])
results_df = make_boro_assignment(results_df, 304, boros[0])
results_df = make_boro_assignment(results_df, 399, boros[0])
results_df = make_boro_assignment(results_df, 619, boros[4])
results_df = make_boro_assignment(results_df, 637, boros[4])
results_df = make_boro_assignment(results_df, 663, boros[1])
results_df = make_boro_assignment(results_df, 673, boros[1])
results_df = make_boro_assignment(results_df, 678, boros[3])
results_df = make_boro_assignment(results_df, 739, boros[1])
results_df = make_boro_assignment(results_df, 751, boros[2])
results_df = make_boro_assignment(results_df, 755, boros[2])
results_df = make_boro_assignment(results_df, 757, boros[2])
results_df = make_boro_assignment(results_df, 764, boros[4])
results_df = make_boro_assignment(results_df, 785, boros[1])
results_df = make_boro_assignment(results_df, 790, boros[1])
results_df = make_boro_assignment(results_df, 835, boros[2])
results_df = make_boro_assignment(results_df, 891, boros[2])
results_df = make_boro_assignment(results_df, 895, boros[1])
results_df = make_boro_assignment(results_df, 914, boros[1])
results_df = make_boro_assignment(results_df, 915, boros[2])
results_df = make_boro_assignment(results_df, 953, boros[2])
results_df = make_boro_assignment(results_df, 961, boros[4])

In [25]:
avenue_df = results_df.loc[(results_df.borough.isna() == True) |
                            (results_df.borough == 'NOT NYC'), :] \
                            [['on_street_name', 'off_street_name', 
                              'cross_street_name', 'borough', 'borough_gps']]
avenue_df[avenue_df.on_street_name.str.contains('AVENUE',regex=True, na=False)].head()

Unnamed: 0,on_street_name,off_street_name,cross_street_name,borough,borough_gps
23,BORDEN AVENUE,58 ROAD,,,
145,11 AVENUE,,,NOT NYC,NOT NYC
215,10 AVENUE,,,NOT NYC,NOT NYC
278,TOMPKINS AVENUE,,,NOT NYC,NOT NYC
306,AVENUE OF THE AMERICAS,,,NOT NYC,NOT NYC


In [26]:
results_df = make_boro_assignment(results_df, 959, boros[2])
results_df = make_boro_assignment(results_df, 975, boros[1])
results_df = make_boro_assignment(results_df, 1004, boros[2])
results_df = make_boro_assignment(results_df, 1005, boros[4])
results_df = make_boro_assignment(results_df, 1028, boros[0])
results_df = make_boro_assignment(results_df, 1031, boros[1])
results_df = make_boro_assignment(results_df, 1039, boros[1])
results_df = make_boro_assignment(results_df, 1070, boros[4])
results_df = make_boro_assignment(results_df, 1083, boros[1])
results_df = make_boro_assignment(results_df, 1086, boros[2])
results_df = make_boro_assignment(results_df, 1087, boros[2])
results_df = make_boro_assignment(results_df, 1174, boros[0])

In [27]:
blvd_df = results_df.loc[(results_df.borough.isna() == True) |
              (results_df.borough == 'NOT NYC'), :] \
              [['location', 'on_street_name', 'off_street_name', 
              'cross_street_name', 'borough', 'borough_gps', 
              'latitude', 'longitude']]
blvd_df[blvd_df.on_street_name.str.contains('BOULEVARD|SQUARE', regex=True, na=False)].head()

Unnamed: 0,location,on_street_name,off_street_name,cross_street_name,borough,borough_gps,latitude,longitude
83,,CONDUIT BOULEVARD,CRESCENT STREET,,,,,
88,,ROCKAWAY BOULEVARD,,,,,,
221,"{'type': 'Point', 'coordinates': [-73.824684, ...",PARSONS BOULEVARD,,,NOT NYC,NOT NYC,40.77306,-73.824684
342,,NORTHERN BOULEVARD,,,,,,
461,,BRUCKNER BOULEVARD,HUTCHINSON RIVER PARKWAY,,,,,


In [28]:
# boros = ['MANHATTAN', "BROOKLYN", "QUEENS", "STATEN ISLAND", "BRONX"]
results_df = make_boro_assignment(results_df, 82, boros[2])
results_df = make_boro_assignment(results_df, 85, boros[1])
results_df = make_boro_assignment(results_df, 223, boros[2])
results_df = make_boro_assignment(results_df, 347, boros[2])
results_df = make_boro_assignment(results_df, 453, boros[4])
results_df = make_boro_assignment(results_df, 540, boros[2])
results_df = make_boro_assignment(results_df, 553, boros[2])
results_df = make_boro_assignment(results_df, 567, boros[2])
results_df = make_boro_assignment(results_df, 644, boros[2])
results_df = make_boro_assignment(results_df, 658, boros[2])
results_df = make_boro_assignment(results_df, 680, boros[2])
results_df = make_boro_assignment(results_df, 801, boros[1])
results_df = make_boro_assignment(results_df, 833, boros[2])
results_df = make_boro_assignment(results_df, 906, boros[0])
results_df = make_boro_assignment(results_df, 917, boros[4])
results_df = make_boro_assignment(results_df, 932, boros[2])
results_df = make_boro_assignment(results_df, 1077, boros[2])
results_df = make_boro_assignment(results_df, 1082, boros[3])
results_df = make_boro_assignment(results_df, 1156, boros[1])
results_df = make_boro_assignment(results_df, 1163, boros[2])

In [29]:
off_avenue = results_df.loc[(results_df.borough.isna() == True) |
              (results_df.borough == 'NOT NYC'), :] \
              [['location', 'on_street_name', 'off_street_name', 
              'cross_street_name', 'borough', 'borough_gps']]
off_avenue[off_avenue.off_street_name.str.contains('AVENUE', regex=True, na=False)]

Unnamed: 0,location,on_street_name,off_street_name,cross_street_name,borough,borough_gps
650,,BRUCKNER BOULEVARD,WILLIS AVENUE,,,
676,,WESTCHESTER AVENUE,EAST TREMONT AVENUE,,,
774,,165 STREET,HILLSIDE AVENUE,,,
789,,FLATBUSH AVENUE,DEKALB AVENUE,,,
927,,ASTORIA BOULEVARD,31 AVENUE,,,
931,,69 PLACE,GRAND AVENUE,,,
947,,WEBSTER AVENUE,CLAY AVENUE,,,
949,,MAURICE AVENUE,BORDEN AVENUE,,,
950,,EASTERN PARKWAY,SAINT MARKS AVENUE,,,
955,,DRUMGOOLE ROAD EAST,WOLCOTT AVENUE,,,


In [30]:
results_df = make_boro_assignment(results_df, 937, boros[1])
results_df = make_boro_assignment(results_df, 939, boros[2])
results_df = make_boro_assignment(results_df, 949, boros[3])
results_df = make_boro_assignment(results_df, 968, boros[1])
results_df = make_boro_assignment(results_df, 990, boros[3])
results_df = make_boro_assignment(results_df, 1050, boros[4])

In [31]:
results_df = make_boro_assignment(results_df, 73, boros[2])
results_df = make_boro_assignment(results_df, 148, boros[2])
results_df = make_boro_assignment(results_df, 234, boros[4])
results_df = make_boro_assignment(results_df, 300, boros[2])
results_df = make_boro_assignment(results_df, 308, boros[2])
results_df = make_boro_assignment(results_df, 397, boros[2])
results_df = make_boro_assignment(results_df, 401, boros[1])
results_df = make_boro_assignment(results_df, 405, boros[0])
results_df = make_boro_assignment(results_df, 468, boros[2])
results_df = make_boro_assignment(results_df, 589, boros[0])

In [32]:
results_df = make_boro_assignment(results_df, 546, boros[3])
results_df = make_boro_assignment(results_df, 558, boros[2])
results_df = make_boro_assignment(results_df, 591, boros[4])
results_df = make_boro_assignment(results_df, 574, boros[2])
results_df = make_boro_assignment(results_df, 594, boros[0])
results_df = make_boro_assignment(results_df, 604, boros[4])
results_df = make_boro_assignment(results_df, 628, boros[4])
results_df = make_boro_assignment(results_df, 697, boros[3])
results_df = make_boro_assignment(results_df, 762, boros[4])
results_df = make_boro_assignment(results_df, 822, boros[4])
results_df = make_boro_assignment(results_df, 850, boros[1])
results_df = make_boro_assignment(results_df, 863, boros[2])
results_df = make_boro_assignment(results_df, 969, boros[3])
results_df = make_boro_assignment(results_df, 1009, boros[2])
results_df = make_boro_assignment(results_df, 1038, boros[1])
results_df = make_boro_assignment(results_df, 1041, boros[2])
results_df = make_boro_assignment(results_df, 1138, boros[1])

## Cleaned Final Data

The remaining data that do not have borough assignments are locations that could be in one of several boroughs or is a collision that occured on an interstate highway such as I-87, I-495, I-278, etc. 

In [33]:
results_df.head()

Unnamed: 0,borough,year,month,day,date,location,latitude,longitude,cross_street_name,off_street_name,on_street_name,vehicle_type_code1,contributing_factor_vehicle_1,number_of_cyclist_killed,number_of_pedestrians_killed,borough_gps
0,STATEN ISLAND,2020,5,4,2020-05-22,"{'type': 'Point', 'coordinates': [-74.1672, 40...",40.602074,-74.1672,,SIGNS ROAD,ARLENE STREET,Pick-up Truck,Failure to Yield Right-of-Way,1,0,STATEN ISLAND
1,QUEENS,2020,5,0,2020-05-18,"{'type': 'Point', 'coordinates': [-73.827286, ...",40.704857,-73.827286,124-50 METROPOLITAN AVENUE,,,Box Truck,View Obstructed/Limited,0,1,QUEENS
2,QUEENS,2020,3,5,2020-03-14,"{'type': 'Point', 'coordinates': [-73.89384, 4...",40.760437,-73.89384,,30 AVENUE,74 STREET,Station Wagon/Sport Utility Vehicle,Driver Inattention/Distraction,0,1,QUEENS
3,,2020,3,1,2020-03-17,,,,,,NEW ENGLAND THRUWAY,Tractor Truck Diesel,Unspecified,0,1,
4,BROOKLYN,2020,4,1,2020-04-28,"{'type': 'Point', 'coordinates': [-73.95166, 4...",40.643063,-73.95166,,CLARENDON ROAD,ROGERS AVENUE,Bus,Pedestrian/Bicyclist/Other Pedestrian Error/Co...,1,0,BROOKLYN


In [34]:
results_df.borough.value_counts(dropna=False)

QUEENS           338
BROOKLYN         334
MANHATTAN        229
BRONX            146
NaN               65
STATEN ISLAND     55
NOT NYC           31
Name: borough, dtype: int64

Once most of the unassigned deaths were manually reassigned or reassigned by geo-location, Queens has moved into unfortunate position of most deaths for pedestrains and cyclists. 

In [35]:
results_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1198 entries, 0 to 1197
Data columns (total 16 columns):
borough                          1133 non-null object
year                             1198 non-null int64
month                            1198 non-null int64
day                              1198 non-null int64
date                             1198 non-null object
location                         1053 non-null object
latitude                         1053 non-null object
longitude                        1053 non-null object
cross_street_name                77 non-null object
off_street_name                  910 non-null object
on_street_name                   1082 non-null object
vehicle_type_code1               1182 non-null object
contributing_factor_vehicle_1    1187 non-null object
number_of_cyclist_killed         1198 non-null object
number_of_pedestrians_killed     1198 non-null object
borough_gps                      1053 non-null object
dtypes: int64(3), object(13)
memory

In [36]:
results_df.contributing_factor_vehicle_1.value_counts()

Unspecified                                              545
Driver Inattention/Distraction                           169
Failure to Yield Right-of-Way                            156
Pedestrian/Bicyclist/Other Pedestrian Error/Confusion     48
Passenger Distraction                                     48
Traffic Control Disregarded                               38
Alcohol Involvement                                       37
Backing Unsafely                                          33
Unsafe Speed                                              21
View Obstructed/Limited                                   11
Other Vehicular                                            9
Physical Disability                                        9
Driver Inexperience                                        8
Pavement Slippery                                          5
Aggressive Driving/Road Rage                               5
Prescription Medication                                    4
Following Too Closely   

In [37]:
def apply_row_regex(df_col, old_str, replacement_str):
    df_col.replace(to_replace=old_str,
                    value= replacement_str,
                    regex=True, 
                    inplace=True)
    return df_col

In [38]:
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Driver Inattention/Distraction",
                replacement_str="Driver Distraction")
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Failure to Yield Right-of-Way",
                replacement_str="Failure to Yield")
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Traffic Control Disregarded",
                replacement_str="Traffic Control Ignored")
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Aggressive Driving/Road Rage",
                replacement_str="Aggressive Driving")
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Passing or Lane Usage Improper",
                replacement_str="Lane Usage Improper")
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Driverless/Runaway Vehicle",
                replacement_str="Driverless Vehicle")
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Reaction to Uninvolved Vehicle",
                replacement_str="Reacted to Other Vehicle")
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Pedestrian/Bicyclist/Other Pedestrian Error/Confusion",
                replacement_str="Pedestrian/Cyclist Error")
apply_row_regex(results_df.contributing_factor_vehicle_1,
                old_str="Drugs \(illegal\)",
                replacement_str="Drugs (Illegal)")

0               Failure to Yield
1        View Obstructed/Limited
2             Driver Distraction
3                    Unspecified
4       Pedestrian/Cyclist Error
5            Alcohol Involvement
6               Backing Unsafely
7        Traffic Control Ignored
8                    Unspecified
9        Traffic Control Ignored
10              Failure to Yield
11            Driver Distraction
12              Failure to Yield
13                   Unspecified
14                   Unspecified
15                   Unspecified
16                  Unsafe Speed
17                   Unspecified
18            Driver Distraction
19       View Obstructed/Limited
20            Driver Distraction
21                  Unsafe Speed
22            Driver Distraction
23              Failure to Yield
24            Driver Distraction
25                   Unspecified
26              Failure to Yield
27              Failure to Yield
28                   Unspecified
29                   Unspecified
          

In [39]:
results_df.vehicle_type_code1 = results_df.vehicle_type_code1 \
                 .replace(np.nan, 'NaN', regex=True)

In [40]:
results_df.vehicle_type_code1 = results_df.vehicle_type_code1 \
                 .apply(lambda x: x.title())

In [41]:
apply_row_regex(results_df.vehicle_type_code1,
                old_str="Sport Utility / Station Wagon",
                replacement_str="SUV / Station Wagon")
apply_row_regex(results_df.vehicle_type_code1,
                old_str="Station Wagon/Sport Utility Vehicle",
                replacement_str="SUV / Station Wagon")
apply_row_regex(results_df.vehicle_type_code1,
                old_str='Large Com Veh\(6 Or More Tires\)',
                replacement_str="Com Veh, 6+ Tires")
apply_row_regex(results_df.vehicle_type_code1,
                old_str='Small Com Veh\(4 Tires\)',
                replacement_str="Com Veh, 4 Tires")
apply_row_regex(results_df.vehicle_type_code1,
                old_str="Tractor Truck Gasoline",
                replacement_str="Tractor Truck")
apply_row_regex(results_df.vehicle_type_code1,
                old_str="Tractor Truck Diesel",
                replacement_str="Tractor Truck")

0             Pick-Up Truck
1                 Box Truck
2       SUV / Station Wagon
3             Tractor Truck
4                       Bus
5       SUV / Station Wagon
6                      Taxi
7       SUV / Station Wagon
8                     Sedan
9                Motorcycle
10      SUV / Station Wagon
11                    Sedan
12      SUV / Station Wagon
13                    Sedan
14                      Bus
15                    E-Sco
16                    Sedan
17                    Sedan
18                      Bus
19                     Dump
20                      Van
21                      Nan
22           Concrete Mixer
23      SUV / Station Wagon
24                Flat Rack
25      SUV / Station Wagon
26                    Sedan
27      SUV / Station Wagon
28                Flat Rack
29                     Dump
               ...         
1168    SUV / Station Wagon
1169      Passenger Vehicle
1170             Motorcycle
1171      Passenger Vehicle
1172      Passenger 

In [42]:
results_df.vehicle_type_code1.value_counts()

SUV / Station Wagon    345
Passenger Vehicle      327
Sedan                   69
Bus                     68
Unknown                 53
Com Veh, 6+ Tires       42
Taxi                    36
Pick-Up Truck           35
Van                     34
Bicycle                 26
Other                   23
Box Truck               19
Dump                    18
Nan                     16
Com Veh, 4 Tires        11
Motorcycle              10
Tractor Truck           10
Livery Vehicle           7
Bike                     6
Tanker                   4
Tk                       3
Concrete Mixer           3
Garbage Or Refuse        3
Ds                       3
Fb                       3
Tow Truck / Wrecker      3
Flat Rack                2
Bu                       2
Scooter                  1
Stake Or Rack            1
Van (                    1
E-Sco                    1
Concr                    1
Cb                       1
Flat Bed                 1
Tt                       1
Ambul                    1
L

In [43]:
results_df.contributing_factor_vehicle_1.value_counts()

Unspecified                 545
Driver Distraction          169
Failure to Yield            156
Pedestrian/Cyclist Error     48
Passenger Distraction        48
Traffic Control Ignored      38
Alcohol Involvement          37
Backing Unsafely             33
Unsafe Speed                 21
View Obstructed/Limited      11
Other Vehicular               9
Physical Disability           9
Driver Inexperience           8
Drugs (Illegal)               6
Pavement Slippery             5
Aggressive Driving            5
Following Too Closely         4
Prescription Medication       4
Turning Improperly            4
Lane Usage Improper           3
Other Electronic Device       3
Driverless Vehicle            3
Outside Car Distraction       3
Illnes                        3
Passing Too Closely           2
Lost Consciousness            2
Glare                         2
Oversized Vehicle             1
Reacted to Other Vehicle      1
Fatigued/Drowsy               1
Fell Asleep                   1
Unsafe L

### Write to CSV

In [44]:
results_df.to_csv('peds_death_data')

Now that we have the data in a good place we can build out an interactive dashboard. 