# Estimating the Microcides committed by different types of road user in the UK using STATS19 data 2005-2014

This notebook takes traffic volume and accident data from the DfT and estimates the number of microcides that are attributable to a variety of vehicle types. Please see my blog article (link) regarding microcides.

These are essentially estimates of the marginal increase in deaths. We want to be able to say, for example, how much the risk to others changes if you cycle instead of drive a given distance. On this basis I tried to attribute ‘blame’ for each death in each accident in the data. 

A vehicle should be ‘blamed’ to the degree to which its absence from the road would have prevented the deaths. Given the complex and diverse ways in which road accidents are caused, I relied on a heuristic to come up with the estimates in Table 1:

- For single vehicle accidents no one is blamed (risk to oneself is ignored as these are not microcides).
- In two vehicle crashes each death is blamed on the other vehicle.
- For each death in a larger accident, the largest of the other vehicles involved is blamed. If several vehicles tie for largest, all of them are blamed.

The 'Vehicle_Encoding' dictionary variable in this notebook (code cell 6) defines the size ordering.

Examples of multi-vehicle crashes:
1. If a crash involving two bicycles and a car kills one cyclist, the driver is blamed but the other cyclist is not.
2. If an accident involving a car, a pedestrian and an HGV kills the HGV driver and the pedestrian, the HGV is blamed for the pedestrian’s death and the car is blamed for the HGV driver’s death.
3. If an accident involving two cars and a pedestrian kills the pedestrian, both cars are blamed.
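
The three examples can be checked with a minimal, self-contained sketch of the heuristic. The function name `blame` and the tuple layout are assumptions made for illustration; the size codes (0 for a pedestrian, 2 for a pedal cycle, 4 for a car, 9 for an HGV) follow the ordering used later in this notebook.

```python
from collections import Counter

def blame(vehicles, deaths):
    """Attribute deaths to the largest other party in one accident.

    vehicles: tuple of size codes, one per party (hypothetical layout).
    deaths:   tuple of death counts, in the same order as `vehicles`.
    Returns a Counter mapping size code -> deaths blamed on that code.
    """
    blamed = Counter()
    if len(vehicles) < 2:
        return blamed  # single-party accident: no one is blamed
    for i, n_dead in enumerate(deaths):
        if n_dead == 0:
            continue
        others = vehicles[:i] + vehicles[i + 1:]
        largest = max(others)
        # Every vehicle tied for largest is blamed for each death.
        blamed[largest] += others.count(largest) * n_dead
    return blamed

example_1 = blame((2, 2, 4), (1, 0, 0))  # two bicycles and a car, one cyclist dies
example_2 = blame((0, 4, 9), (1, 0, 1))  # pedestrian, car and HGV; pedestrian and HGV driver die
example_3 = blame((0, 4, 4), (1, 0, 0))  # pedestrian and two cars; pedestrian dies
```

Note that in the third example the single death is counted once against each car, so the car total rises by two.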

Because the causes of an accident become harder to identify as the accident gets larger, I needed to ignore accidents above a given size for the heuristic to hold. See the Appendix for the justification for using 4 as the maximum accident size. This cut-off could produce an incorrectly low microcide value for vehicle types that are typically involved in large accidents. Given the rarity of accidents involving large numbers of vehicles, I doubt this is a significant problem.

In [1]:
import pandas as pd
import requests, zipfile
import io
from numpy import where
import datetime

In [2]:
"""The links to the STATS19 data and 2 Traffic Volume Files"""

Offline = False

if Offline:
    STATS19 = "http://localhost:8888/files/Stats19_Data_2005-2014.zip"
    Traffic_Volume = "http://localhost:8888/files/tra0201.xlsx"
    Cycle_Volume = "http://localhost:8888/files/tra0401.xlsx"
else:
    STATS19 = "http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/Stats19_Data_2005-2014.zip"
    Traffic_Volume = "https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/498684/tra0201.xls"
    Cycle_Volume = "https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/428727/tra0401.xls"

In [3]:
"""Extracts the 3 STATS19 tables and reads the relevant columns"""

stats19_file = zipfile.ZipFile(io.BytesIO(requests.get(STATS19).content))
Accidents = pd.read_csv(stats19_file.open('Accidents0514.csv'), 
                       index_col = 0, 
                       usecols = [0, 6, 7])
Vehicles = pd.read_csv(stats19_file.open('Vehicles0514.csv'),
                        usecols = [0, 1, 2])
Casualties = pd.read_csv(stats19_file.open('Casualties0514.csv'),
                          usecols = [0, 1, 3, 7])

We only care about fatalities, so the Casualties and Accidents tables can both be filtered.

In [4]:
Maximum_Vehicles = 4 # Maximum accident size considered
Minimum_Vehicles = 0 # Minimum accident size considered

Include_Pedestrians = True # If False, the microcides from pedestrians are ignored

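Deaths = Casualties[Casualties['Casualty_Severity'] == 1].copy() # .copy() so later edits do not act on a slice of Casualties
if not Include_Pedestrians:
    Deaths = Deaths[Deaths['Casualty_Class'] != 3] # Filter Deaths (not Casualties) so the severity filter is kept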
    
Fatal_Accidents = Accidents[Accidents['Accident_Severity'] == 1]
Fatal_Small_Accidents = Fatal_Accidents[(Fatal_Accidents['Number_of_Vehicles'] <= Maximum_Vehicles) & 
                                        (Fatal_Accidents['Number_of_Vehicles'] >= Minimum_Vehicles)]

Next we produce a dataframe that summarises the outcome of each accident in terms of the vehicles involved and the deaths in those vehicles.

In [5]:
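Deaths.loc[Deaths['Casualty_Class'] == 3, 'Vehicle_Reference'] = 0 # Pedestrians are given vehicle reference 0
# Every remaining row has Casualty_Severity == 1, so summing that column counts the deaths
Grouped_Deaths = Deaths.groupby(by = [Deaths.columns[0], 'Vehicle_Reference']).sum()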

"""'Deaths.columns[0]' is used throughout this notebook to refer to 'Accident_Index' which pandas had an issue with."""

Grouped_Deaths.drop('Casualty_Class', axis=1, inplace=True)
Grouped_Deaths.columns = ['Fatalities']

Fatality_Vehicles = Fatal_Small_Accidents.merge(Vehicles, how = 'left', left_index = True,
                                                right_on = Deaths.columns[0])

"""Left merging fatal accidents with vehicles creates a data frame with one row for each vehicle in a fatal accident"""

Fatality_Vehicles = Fatality_Vehicles[[Deaths.columns[0], 'Vehicle_Reference', 'Vehicle_Type']] # Irrelevant columns removed

"""Outer merging the vehicles involved in fatal accidents with the deaths grouped by vehicle creates a data frame with one row 
for each vehicle (or group of pedestrians) in a fatal accident, with a column for the number of fatalities in each"""

All_Microcide_Vehicles = Fatality_Vehicles.merge(Grouped_Deaths, how = 'outer', 
                                                 left_on = [Deaths.columns[0],'Vehicle_Reference'],
                                                 right_index = True)

"""The line below gives pedestrians a vehicle type of 0 and gives vehicles in which no one died a fatality total of 0"""

All_Microcide_Vehicles.fillna(0, inplace = True) 

The vehicle types are reassigned based on the vehicle types used by the DfT in their traffic volume and road death tables.
The number codes used also double as a sorting mechanism whereby 'smaller' vehicles are assigned lower numbers.

In [6]:
Vehicle_Encoding = {
1: (16, 22),             # Horses and Mobility Scooters
2: (1,),                 # Pedal Cycle
3: (2, 3, 4, 5, 23, 97), # Mopeds and Motorcycles
4: (8, 9),               # Cars and taxis (19 and 20 are goods vehicle codes, grouped below)
5: (19,),                # Vans 
6: (17,),                # Agricultural Vehicles
7: (18,),                # Trams
8: (10, 11),             # Buses and Coaches
9: (20, 21, 98)          # HGVs
}

In [7]:
All_Microcide_Vehicles['Vehicle_Group'] = 0
for k, v in Vehicle_Encoding.items():
    All_Microcide_Vehicles.loc[All_Microcide_Vehicles['Vehicle_Type'].isin(v), 'Vehicle_Group'] = k
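
Because the loop above assigns groups one at a time, a DfT code listed under two groups silently ends up in whichever group is processed last. A small sketch of one way to guard against that is to invert the encoding into a code-to-group lookup and fail on duplicates. The names `EXAMPLE_ENCODING`, `invert_encoding` and `group_of` are hypothetical, and the dictionary is only a subset of the groups, chosen for illustration.

```python
# Illustrative subset of the group encoding; not the full dictionary.
EXAMPLE_ENCODING = {
    2: (1,),          # Pedal cycle
    4: (8, 9),        # Cars and taxis
    5: (19,),         # Vans
    9: (20, 21, 98),  # HGVs
}

def invert_encoding(encoding):
    """Return {dft_code: group}, raising if a code appears in two groups."""
    code_to_group = {}
    for group, codes in encoding.items():
        for code in codes:
            if code in code_to_group:
                raise ValueError(
                    f"code {code} is in groups {code_to_group[code]} and {group}"
                )
            code_to_group[code] = group
    return code_to_group

CODE_TO_GROUP = invert_encoding(EXAMPLE_ENCODING)

def group_of(vehicle_type):
    """Codes that are not listed fall back to group 0, as in the loop above."""
    return CODE_TO_GROUP.get(vehicle_type, 0)
```

With pandas, a lookup like this could be applied to the `Vehicle_Type` column in a single pass rather than once per group.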

The cell below creates a series of every 'type' of accident. An accident type is identified by the vehicles involved and the number of deaths in each. So a two-vehicle car and motorcycle accident in which the motorcyclist dies is a different type from one in which one car occupant dies, and both are different from one in which two car occupants die, and so on.
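
As a rough illustration of what an accident 'type' looks like, the sketch below builds the same kind of signature in plain Python: the parties are sorted by vehicle group and then by deaths, and split into a tuple of groups and a tuple of death counts. The function name `accident_type` and the toy accidents are made up for this example; the notebook itself does this with pandas.

```python
from collections import Counter

def accident_type(parties):
    """Build an order-independent signature for one accident.

    parties: iterable of (vehicle_group, fatalities) pairs.
    Returns ((groups...), (fatalities...)) sorted by group, then fatalities.
    """
    ordered = sorted(parties)
    groups = tuple(group for group, _ in ordered)
    deaths = tuple(dead for _, dead in ordered)
    return (groups, deaths)

# Toy data: group 3 is a motorcycle, group 4 is a car.
accidents = [
    [(4, 0), (3, 1)],  # motorcyclist dies
    [(3, 1), (4, 0)],  # same type, parties listed in the other order
    [(4, 1), (3, 0)],  # one car occupant dies
    [(3, 0), (4, 2)],  # two car occupants die
]
type_counts = Counter(accident_type(a) for a in accidents)
```

The first two accidents collapse into one type, while the other two each form their own.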

In [8]:
All_Microcide_Vehicles.sort_values(by = [Deaths.columns[0], 'Vehicle_Group', 'Fatalities'], inplace = True)
Accident_Group = All_Microcide_Vehicles.groupby(Deaths.columns[0]).agg(lambda x: tuple(x))
Accident_Group['Accident_Type'] = Accident_Group[['Vehicle_Group', 'Fatalities']].apply(tuple, axis = 1)
All_Accident_Types = Accident_Group[['Accident_Type']]
Accident_Type_Counts = All_Accident_Types['Accident_Type'].value_counts()

The function below does the 'blaming'. It iterates over the types of accident and decides how many of the deaths involved can be attributed to each vehicle. For each death in an accident the 'largest' of the other vehicles is given the blame; if several vehicles tie for largest, each of them is blamed. (For two vehicle accidents this is irrelevant as there is only ever one 'other vehicle'.) Size ordering is determined by the vehicle encoding number.

The validity of this heuristic is unclear but it makes intuitive sense, and is somewhat corroborated by the frequency of accident types. Looking at 2 vehicle accidents in Accident_Type_Counts, smaller vehicles appear to always come off worse in terms of the number of deaths.

In [9]:
Caused_Fatalities = {} # A dictionary of deaths by vehicle. The keys are the vehicle types and values are running totals.
for n in range(0,10):
    Caused_Fatalities[n] = 0

def Calculate_Microcides(Accident_Type, Number):
    vehicles = Accident_Type[0]
    if len(vehicles) < 2:
        return None
    deaths = Accident_Type[1]
    for i, death in enumerate(deaths):
        if death > 0:
            other_vehicles = vehicles[:i]+vehicles[i+1:]
            Caused_Fatalities[max(other_vehicles)] += other_vehicles.count(max(other_vehicles))*death*Number

for acc_type, count in Accident_Type_Counts.items(): # .items() replaces iteritems(), which was removed in pandas 2.0
    Calculate_Microcides(acc_type, count)

The cell below reads in the traffic volume Excel files and sums the relevant rows (2005 - 2014). The units are billions of kilometers.

In [10]:
Traffic_Volume = pd.read_excel(Traffic_Volume, sheet_name = "TRA0201", skiprows = list(range(0,6)), index_col = 0)
Cycle_Volume = pd.read_excel(Cycle_Volume, sheet_name = "TRA0401", skiprows = list(range(0,5)), index_col = 0)
Ten_Year_Volume = {}
for col in list(Traffic_Volume.columns.values)[1:6]:
    Ten_Year_Volume[col] = sum(Traffic_Volume[col][56:66]) #Selected rows refer to 2005 to 2014
Ten_Year_Volume['Pedal Cycles'] = sum(Cycle_Volume['Kilometres'][56:66])

The column names in the traffic files and vehicle codes described by the Vehicle_Encoding dictionary are mapped using the dictionary below.

In [11]:
Volume_Accident_Coding = {
'Buses & Coaches':  8,
'Cars and taxis':   4,
'Goods vehicles 2': 9,
'Light\nvans 1':    5,
'Motorcycles':      3,
'Pedal Cycles':     2
}

This cell puts everything together. The traffic volumes and deaths per vehicle type are assembled into a dataframe and the microcides are calculated. Deaths per billion kilometers is the same as microcides per 1,000 kilometers so a simple division is used to calculate the microcides. (Yay metric system).
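
The unit claim can be checked by doing the conversion the long way. The sketch below uses the pedal cycle figures reported in this notebook's results table (50 deaths over 47.2 billion kilometers); the function name `microcides_per_1000_km` is an assumption for illustration.

```python
def microcides_per_1000_km(deaths, volume_billion_km):
    """Convert a death count and a traffic volume into microcides per 1,000 km."""
    deaths_per_km = deaths / (volume_billion_km * 1e9)  # billion km -> km
    microcides_per_km = deaths_per_km * 1e6             # 1 death = 1,000,000 microcides
    return microcides_per_km * 1000                     # per km -> per 1,000 km

simple = 50 / 47.2                             # deaths per billion km
long_form = microcides_per_1000_km(50, 47.2)   # the same quantity, step by step
```

The factors of 10^9, 10^6 and 10^3 cancel, which is why the simple division gives the same number.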

In [12]:
Microcides = pd.DataFrame.from_dict(Ten_Year_Volume, orient = 'index')
Microcides.columns = ['Traffic Volume']
Microcides['Deaths'] = 0

for k, v in Volume_Accident_Coding.items():
    Microcides.loc[k, 'Deaths'] = Caused_Fatalities[v] # .loc replaces set_value, which was removed in pandas 1.0

Microcides['Microcides'] = Microcides['Deaths']/Microcides['Traffic Volume']

In [13]:
Microcides

| | Traffic Volume | Deaths | Microcides |
|---|---|---|---|
| Motorcycles | 48.6 | 363 | 7.469136 |
| Goods vehicles 2 | 269.8 | 3163 | 11.723499 |
| Light\nvans 1 | 665.8 | 1526 | 2.29198 |
| Pedal Cycles | 47.2 | 50 | 1.059322 |
| Cars and taxis | 3916.6 | 13267 | 3.387377 |
| Buses & Coaches | 48.9 | 894 | 18.282209 |


# Appendix

The cells below were used as part of the exploration of the data. Set the maximum vehicles to 34 (code cell 4) and rerun to see the full fatality dataset. 

In [14]:
Accident_Type_Exploration = pd.DataFrame(Accident_Type_Counts)
Accident_Type_Exploration.columns = ['Accident_Count']
Accident_Type_Exploration['Type'] = Accident_Type_Counts.index
Accident_Type_Exploration['Vehicles'] = Accident_Type_Exploration['Type'].apply(lambda x: x[0]) 
Accident_Type_Exploration['Deaths'] = Accident_Type_Exploration['Type'].apply(lambda x: x[1])
Accident_Type_Exploration['Vehicle_Count'] = Accident_Type_Exploration['Vehicles'].apply(len)
Accident_Type_Exploration['Deaths_Per_Accident'] = Accident_Type_Exploration['Deaths'].apply(sum)
Accident_Type_Exploration['Total_Deaths'] = Accident_Type_Exploration['Accident_Count'] * \
                                            Accident_Type_Exploration['Deaths_Per_Accident']
Accident_Type_Exploration[['Deaths_Per_Accident','Total_Deaths','Accident_Count']].groupby('Deaths_Per_Accident').sum()

| Deaths_Per_Accident | Total_Deaths | Accident_Count |
|---|---|---|
| 1 | 20007 | 20007 |
| 2 | 2296 | 1148 |
| 3 | 516 | 172 |
| 4 | 136 | 34 |
| 5 | 55 | 11 |
| 6 | 48 | 8 |
| 7 | 14 | 2 |


If the maximum vehicles is set to 34 or greater we can see from the table produced below that accidents involving 4 or fewer vehicles account for >98% of all fatal accidents. This subset also covers >94% of vehicles involved in fatal accidents.

In [15]:
Accident_Sizes = Accident_Type_Exploration[['Vehicle_Count', 'Accident_Count', 'Total_Deaths']].groupby('Vehicle_Count').sum()
Accident_Sizes['Total_Vehicles_Involved'] = Accident_Sizes['Accident_Count']*Accident_Sizes.index
Accident_Sizes['Percentage_Deaths'] = Accident_Sizes['Total_Deaths']/sum(Accident_Sizes['Total_Deaths'])*100
Accident_Sizes['Percentage_Accidents'] = Accident_Sizes['Accident_Count']/sum(Accident_Sizes['Accident_Count'])*100
Accident_Sizes['Cumulative_%_Vehicles'] = 100*Accident_Sizes['Total_Vehicles_Involved'].cumsum()/ \
                                              Accident_Sizes['Total_Vehicles_Involved'].sum()
Accident_Sizes['Cumulative_%_Accidents'] = 100*Accident_Sizes['Accident_Count'].cumsum()/Accident_Sizes['Accident_Count'].sum()
Accident_Sizes

| Vehicle_Count | Accident_Count | Total_Deaths | Total_Vehicles_Involved | Percentage_Deaths | Percentage_Accidents | Cumulative_%_Vehicles | Cumulative_%_Accidents |
|---|---|---|---|---|---|---|---|
| 1 | 5605 | 6071 | 5605 | 26.31328 | 26.213638 | 13.548138 | 26.213638 |
| 2 | 12403 | 13237 | 24806 | 57.372573 | 58.006735 | 73.508013 | 84.220372 |
| 3 | 2584 | 2880 | 7752 | 12.482663 | 12.084931 | 92.245776 | 96.305304 |
| 4 | 742 | 828 | 2968 | 3.588766 | 3.470209 | 99.419883 | 99.775512 |
| 5 | 48 | 56 | 240 | 0.242718 | 0.224488 | 100.0 | 100.0 |
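
The cumulative columns can be reproduced by hand from the Accident_Count column alone. The sketch below is a plain-Python check using the counts displayed in the table above; the helper name `cumulative_percentages` is hypothetical.

```python
from itertools import accumulate

# (Vehicle_Count, Accident_Count) pairs copied from the table above.
accident_counts = [(1, 5605), (2, 12403), (3, 2584), (4, 742), (5, 48)]

def cumulative_percentages(values):
    """Running total of `values`, expressed as a percentage of their sum."""
    total = sum(values)
    return [100 * running / total for running in accumulate(values)]

accidents = [count for _, count in accident_counts]
vehicles = [size * count for size, count in accident_counts]

cumulative_accidents = cumulative_percentages(accidents)
cumulative_vehicles = cumulative_percentages(vehicles)
```

The fourth entry of each list corresponds to the row for accidents with a Vehicle_Count of 4.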


See the microcide exploration file for further exploration of this dataset.