# Prediction of Road Traffic Collision Severity

## Introduction

Road traffic collisions pose a significant safety risk to both drivers and pedestrians. In 2019, the UK saw 1,748 deaths and 25,975 serious injuries from road traffic accidents [(source)](https://www.gov.uk/government/statistics/reported-road-casualties-great-britain-provisional-results-2019). Whilst a downward trend has been identified, it is nonetheless important to be able to identify predictors of road traffic accidents, for both the number of accidents and their severity. This is useful for allocating resources such as emergency services: if certain conditions imply a greater incidence of severe accidents, it may be prudent to increase the number of paramedics on call, for example.

It is also useful to identify these predictors in order to further reduce the number of severe accidents. Identifying that a particular type of signage or junction reduces risk allows local governments to make progressive changes to the road network in order to improve overall safety in the future.

## Data

The data have been sourced from https://www.kaggle.com/akshay4/road-accidents-incidence/. The dataset is an aggregation of year-on-year data published by the United Kingdom Department for Transport.

The CSV file contains the following fields:

|Field Name|Description|
|----------|-----------|
|Accident_Index|Government ID for incident|
|1st_Road_Class|Classification of road based on [UK numbering scheme](https://en.wikipedia.org/wiki/Great_Britain_road_numbering_scheme)|
|1st_Road_Number|Unique road identifier|
|2nd_Road_Class|Classification of road based on [UK numbering scheme](https://en.wikipedia.org/wiki/Great_Britain_road_numbering_scheme)|
|2nd_Road_Number|Unique road identifier|
|Accident_Severity|Slight, Serious, Fatal|
|Carriageway_Hazards|Potential hazards such as objects in road, animals in road etc.|
|Date|Date of incident|
|Day_of_Week|Day of incident (Mon, Tues, Weds...)|
|Did_Police_Officer_Attend_Scene_of_Accident|Attendance of police|
|Junction_Control|If at junction, what control in place? e.g. traffic light, stop sign etc.|
|Junction_Detail|Additional information about the junction e.g. private drive, staggered etc.|
|Latitude| Geographic coordinate that specifies the north–south position|
|Light_Conditions|Light condition at incident e.g. Light, Dark with no lighting, Dark with street lighting etc.|
|Local_Authority_(District)| Geographic district in which incident occurred|
|Local_Authority_(Highway)| Authority responsible for highway|
|Location_Easting_OSGR| Ordnance Survey Grid Reference (easting)|
|Location_Northing_OSGR| Ordnance Survey Grid Reference (northing)|
|Longitude| Geographic coordinate that specifies the east-west position|
|LSOA_of_Accident_Location| Lower Super Output Area (ONS statistic reporting area)|
|Number_of_Casualties| How many casualties|
|Number_of_Vehicles| How many vehicles|
|Pedestrian_Crossing-Human_Control| Human controlled crossing ('lollipop' person or traffic officer)|
|Pedestrian_Crossing-Physical_Facilities| Pedestrian crossing e.g. zebra crossing, pelican crossing etc.|
|Police_Force|Police force responsible for area of incident|
|Road_Surface_Conditions|Dry, wet, snow etc.|
|Road_Type|Single, dual, roundabout etc.|
|Special_Conditions_at_Site|Roadworks, oil, mud etc.|
|Speed_limit|Speed limit in force at incident location|
|Time|Time of incident|
|Urban_or_Rural_Area|Incident location type (urban or rural)|
|Weather_Conditions|Weather at incident (rain, fog etc.)|
|Year| Year of incident|
|InScotland|Was the incident in Scotland?|

## Proposed Methodology

As there are many predictors, the first step will be cleaning the data and selecting the predictors. It will also be important to distinguish numeric data from categorical data, since the categorical fields will need one-hot encoding.
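
As an illustration of the encoding step, the following is a minimal, dependency-free sketch rather than the final pipeline. It uses a handful of field names from the Data table; the sample values, the choice of which fields count as numeric or categorical, and the `one_hot_encode` helper are all assumptions made for the example. In practice a library routine such as `pandas.get_dummies` would likely be used instead.

```python
from typing import Dict, List

# Made-up sample rows; only the field names come from the dataset description.
rows = [
    {"Speed_limit": 30, "Number_of_Vehicles": 2,
     "Road_Type": "Single carriageway", "Urban_or_Rural_Area": "Urban"},
    {"Speed_limit": 70, "Number_of_Vehicles": 3,
     "Road_Type": "Dual carriageway", "Urban_or_Rural_Area": "Rural"},
    {"Speed_limit": 30, "Number_of_Vehicles": 1,
     "Road_Type": "Roundabout", "Urban_or_Rural_Area": "Urban"},
]

# Assumed split of fields for this example.
NUMERIC_FIELDS = ["Speed_limit", "Number_of_Vehicles"]
CATEGORICAL_FIELDS = ["Road_Type", "Urban_or_Rural_Area"]


def one_hot_encode(
    rows: List[dict],
    numeric_fields: List[str],
    categorical_fields: List[str],
) -> List[Dict[str, float]]:
    """Hypothetical helper: keep numeric fields, expand categorical ones."""
    # Collect the sorted set of categories observed for each categorical field.
    categories = {f: sorted({r[f] for r in rows}) for f in categorical_fields}
    encoded = []
    for r in rows:
        # Numeric fields are passed through as floats.
        out = {f: float(r[f]) for f in numeric_fields}
        # Each category becomes its own 0/1 indicator column.
        for f, values in categories.items():
            for v in values:
                out[f"{f}={v}"] = 1.0 if r[f] == v else 0.0
        encoded.append(out)
    return encoded


encoded_rows = one_hot_encode(rows, NUMERIC_FIELDS, CATEGORICAL_FIELDS)
```

Each categorical field is replaced by one indicator column per observed category, so every row has exactly one `1.0` among the columns derived from a given field.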

One issue that is likely to occur is the under-representation of the target categories of interest. We are trying to predict 'Serious' or 'Fatal' accidents, but there are far fewer of these in the dataset, which means any models built will be biased towards predicting a severity of 'Slight'.

This could be addressed by randomly downsampling the 'Slight' accidents so that their number is similar to that of the 'Serious' and 'Fatal' accidents. Alternatively, there are methods such as the Synthetic Minority Over-sampling Technique (SMOTE), which generates synthetic minority-class samples based on the real data in order to redress the class imbalance.
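
Random downsampling can be sketched with the standard library alone. The example below is illustrative only: the toy data are made up, the `downsample_majority` helper is hypothetical, and it assumes that "a similar number" means matching the 'Slight' count to the combined count of 'Serious' and 'Fatal' rows. Other targets, such as matching the largest minority class, would be equally defensible.

```python
import random
from collections import Counter
from typing import List


def downsample_majority(
    rows: List[dict],
    label_key: str = "Accident_Severity",
    majority: str = "Slight",
    seed: int = 0,
) -> List[dict]:
    """Hypothetical helper: randomly drop majority-class rows."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    majority_rows = [r for r in rows if r[label_key] == majority]
    minority_rows = [r for r in rows if r[label_key] != majority]
    # Assumption: keep as many majority rows as there are minority rows in total.
    target = min(len(majority_rows), len(minority_rows))
    kept = rng.sample(majority_rows, target)
    balanced = kept + minority_rows
    rng.shuffle(balanced)
    return balanced


# Made-up, heavily imbalanced toy data.
toy = (
    [{"id": i, "Accident_Severity": "Slight"} for i in range(90)]
    + [{"id": 90 + i, "Accident_Severity": "Serious"} for i in range(8)]
    + [{"id": 98 + i, "Accident_Severity": "Fatal"} for i in range(2)]
)

balanced = downsample_majority(toy)
counts = Counter(r["Accident_Severity"] for r in balanced)
```

All minority rows are retained and only 'Slight' rows are discarded, so no 'Serious' or 'Fatal' information is lost. For the SMOTE alternative, the `imbalanced-learn` package provides an implementation (`imblearn.over_sampling.SMOTE`), which would normally be applied to the training split only.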