# Reducing the number of high fatality accidents

## 📖 Background
You work for the road safety team within the department of transport and are looking into how they can reduce the number of serious accidents. The safety team classes serious accidents as fatal accidents involving 3+ casualties. They are trying to learn more about the characteristics of these serious accidents so they can brainstorm interventions that could lower the number of deaths. They have asked for your assistance with answering a number of questions.

## 💾 The data
The reporting department have been collecting data on every accident that is reported. They've included this along with a lookup file for 2020's accidents.

*Published by the department for transport. https://data.gov.uk/dataset/road-accidents-safety-data* 
*Contains public sector information licensed under the Open Government Licence v3.0.*

In [2]:
import pandas as pd
accidents = pd.read_csv(r'./data/accident-data.csv')
accidents.head()

Unnamed: 0,accident_index,accident_year,accident_reference,longitude,latitude,accident_severity,number_of_vehicles,number_of_casualties,date,day_of_week,...,second_road_class,second_road_number,pedestrian_crossing_human_control,pedestrian_crossing_physical_facilities,light_conditions,weather_conditions,road_surface_conditions,special_conditions_at_site,carriageway_hazards,urban_or_rural_area
0,2020010219808,2020,10219808,-0.254001,51.462262,3,1,1,04/02/2020,3,...,6,0,9,9,1,9,9,0,0,1
1,2020010220496,2020,10220496,-0.139253,51.470327,3,1,2,27/04/2020,2,...,6,0,0,4,1,1,1,0,0,1
2,2020010228005,2020,10228005,-0.178719,51.529614,3,1,1,01/01/2020,4,...,6,0,0,0,4,1,2,0,0,1
3,2020010228006,2020,10228006,-0.001683,51.54121,2,1,1,01/01/2020,4,...,6,0,0,4,4,1,1,0,0,1
4,2020010228011,2020,10228011,-0.137592,51.515704,3,1,2,01/01/2020,4,...,5,0,0,0,4,1,1,0,0,1


In [3]:
lookup = pd.read_csv(r'./data/road-safety-lookups.csv')
lookup.head()

Unnamed: 0,table,field name,code/format,label,note
0,Accident,accident_index,,,unique value for each accident. The accident_i...
1,Accident,accident_year,,,
2,Accident,accident_reference,,,In year id used by the police to reference a c...
3,Accident,longitude,,,Null if not known
4,Accident,Latitude,,,Null if not known
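The lookup file can be used to turn numeric codes into readable labels. The sketch below shows one possible way to do that on a small stand-in table; `lookup_demo`, `accidents_demo` and the helper `labels_for` are hypothetical names, and the three severity rows are illustrative and should be checked against the real lookup file, whose `code/format` column may also hold non-numeric entries.

```python
import pandas as pd

# Toy stand-in for road-safety-lookups.csv (illustrative rows, not read from the file).
lookup_demo = pd.DataFrame({
    "table": ["Accident"] * 3,
    "field name": ["accident_severity"] * 3,
    "code/format": ["1", "2", "3"],
    "label": ["Fatal", "Serious", "Slight"],
})


def labels_for(lookup_df, field):
    """Return a {code: label} dict for one coded field of the lookup table."""
    rows = lookup_df[lookup_df["field name"] == field].dropna(subset=["label"])
    # Assumes every code for this field is an integer stored as text.
    return {int(code): label for code, label in zip(rows["code/format"], rows["label"])}


severity_labels = labels_for(lookup_demo, "accident_severity")

# Toy accident rows to show the mapping in use.
accidents_demo = pd.DataFrame({"accident_severity": [3, 1, 2, 3]})
accidents_demo["severity_label"] = accidents_demo["accident_severity"].map(severity_labels)
print(accidents_demo)
```

The same helper could be pointed at `lookup` and `accidents` once the field names and code formats have been confirmed.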


## 💪 Competition challenge

Create a report that covers the following:

1. What time of day and day of the week do most serious accidents happen?
2. Are there any patterns in the time of day/ day of the week when serious accidents occur?
3. What characteristics stand out in serious accidents compared with other accidents?
4. On what areas would you recommend the planning team focus their brainstorming efforts to reduce serious accidents?

## 🧑‍⚖️ Judging criteria

| CATEGORY | WEIGHTING | DETAILS                                                              |
|:---------|:----------|:---------------------------------------------------------------------|
| **Recommendations** | 35%       | <ul><li>Clarity of recommendations - how clear and well presented the recommendation is.</li><li>Quality of recommendations - are appropriate analytical techniques used & are the conclusions valid?</li><li>Number of relevant insights found for the target audience.</li></ul>       |
| **Storytelling**  | 30%       | <ul><li>How well the data and insights are connected to the recommendation.</li><li>How the narrative and whole report connects together.</li><li>Balancing making the report in depth enough but also concise.</li></ul> |
| **Visualizations** | 25% | <ul><li>Appropriateness of visualization used.</li><li>Clarity of insight from visualization.</li></ul> |
| **Votes** | 10% | <ul><li>Up voting - most upvoted entries get the most points.</li></ul> |

## ✅ Checklist before publishing into the competition
- Rename your workspace to make it descriptive of your work. N.B. you should leave the notebook name as notebook.ipynb.
- Remove redundant cells like the judging criteria so the workbook is focused on your story.
- Make sure the workbook reads well and explains how you found your insights.
- Check that all the cells run without error.

## ⌛️ Time is ticking. Good luck!

# 1. What time of day and day of the week do most serious accidents happen?

In [5]:
accidents.columns

Index(['accident_index', 'accident_year', 'accident_reference', 'longitude',
       'latitude', 'accident_severity', 'number_of_vehicles',
       'number_of_casualties', 'date', 'day_of_week', 'time',
       'first_road_class', 'first_road_number', 'road_type', 'speed_limit',
       'junction_detail', 'junction_control', 'second_road_class',
       'second_road_number', 'pedestrian_crossing_human_control',
       'pedestrian_crossing_physical_facilities', 'light_conditions',
       'weather_conditions', 'road_surface_conditions',
       'special_conditions_at_site', 'carriageway_hazards',
       'urban_or_rural_area'],
      dtype='object')

In [17]:
# Parse the time strings as datetimes so the hour can be read with .dt.hour
accidents['time'] = pd.to_datetime(accidents['time'])

In [41]:
acc_hour = accidents.time.dt.hour.value_counts()
acc_hour.head()

17    7813
16    7381
15    7361
18    6618
14    6245
Name: time, dtype: int64

In [65]:
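# Group the hours into broader periods of the day.
# Note: the default intervals are right-closed, so (0, 5] excludes hour 0 and
# accidents between 00:00 and 00:59 do not fall into any bin.
pd.cut(accidents.time.dt.hour, bins=[0,5,10,13,16,19,22,25]).value_counts()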

(13, 16]    20987
(16, 19]    19479
(5, 10]     18923
(10, 13]    15948
(19, 22]     9376
(0, 5]       3502
(22, 25]     1796
Name: time, dtype: int64
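Because the intervals above are right-closed, hour 0 is left out of every bin. A minimal sketch of one alternative, using left-closed bins on made-up hours so that midnight is counted, is shown here; `hours_demo` and `edges` are hypothetical names, and the bin boundaries are one possible choice rather than the ones used in the report.

```python
import pandas as pd

# Made-up accident hours, including two at midnight (hour 0).
hours_demo = pd.Series([0, 0, 5, 6, 13, 14, 16, 17, 19, 23])

# Left-closed bins: [0, 6) covers 00:00-05:59, ..., [23, 24) covers 23:00-23:59.
edges = [0, 6, 11, 14, 17, 20, 23, 24]
binned = pd.cut(hours_demo, bins=edges, right=False)

# sort=False keeps the bins in chronological order instead of sorting by count.
counts = binned.value_counts(sort=False)
print(counts)
```

With `right=False` every hour from 0 to 23 lands in exactly one bin, so the bin counts add up to the number of rows.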

## Accidents peak in the late afternoon: the four busiest hours run from 3pm to 7pm (15:00 to 18:59), with 5pm the busiest.

In [85]:
# Map the numeric day codes (1 = Sunday ... 7 = Saturday) to day names
accidents['day_week_name'] = accidents.day_of_week.replace({1: 'sunday',
                                                            2: 'monday',
                                                            3: 'tuesday',
                                                            4: 'wednesday',
                                                            5: 'thursday',
                                                            6: 'friday',
                                                            7: 'saturday'})
# Share of all accidents per day, in percent
accidents.day_week_name.value_counts(normalize=True)*100

friday       16.325837
thursday     15.412450
wednesday    14.872970
tuesday      14.547309
monday       14.004540
saturday     13.526464
sunday       11.310431
Name: day_week_name, dtype: float64

## Accidents are spread fairly evenly across the week, but Friday has the largest share (16.3%) and Sunday the smallest (11.3%).
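To read the weekday shares in calendar order rather than sorted by size, the day names can be stored as an ordered categorical. The sketch below uses made-up day codes; `demo`, `day_names` and `week_order` are hypothetical names, and the Monday-first ordering is an assumption about how the report should be presented.

```python
import pandas as pd

# Same code-to-name mapping as the dataset's day_of_week column (1 = Sunday).
day_names = {1: "sunday", 2: "monday", 3: "tuesday", 4: "wednesday",
             5: "thursday", 6: "friday", 7: "saturday"}
week_order = ["monday", "tuesday", "wednesday", "thursday",
              "friday", "saturday", "sunday"]

demo = pd.DataFrame({"day_of_week": [6, 6, 1, 2, 5, 6, 7, 3]})  # made-up codes

# An ordered categorical lets sort_index() follow the calendar, not the alphabet.
demo["day_week_name"] = pd.Categorical(
    demo["day_of_week"].map(day_names), categories=week_order, ordered=True
)

share = demo["day_week_name"].value_counts(normalize=True).sort_index() * 100
print(share.round(1))
```

Days with no rows still appear with a share of 0, which keeps the table the same shape for any subset of the data.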

# 2. Are there any patterns in the time of day/ day of the week when serious accidents occur?

In [95]:
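# Note: accident_severity == 3 is the "Slight" category in this dataset's coding
# (1 = Fatal, 2 = Serious, 3 = Slight), not the fatal accidents with 3+ casualties
# that the brief defines as serious.
serious_acc = accidents[accidents['accident_severity'] == 3]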
serious_acc.head()

Unnamed: 0,accident_index,accident_year,accident_reference,longitude,latitude,accident_severity,number_of_vehicles,number_of_casualties,date,day_of_week,...,second_road_number,pedestrian_crossing_human_control,pedestrian_crossing_physical_facilities,light_conditions,weather_conditions,road_surface_conditions,special_conditions_at_site,carriageway_hazards,urban_or_rural_area,day_week_name
0,2020010219808,2020,10219808,-0.254001,51.462262,3,1,1,04/02/2020,3,...,0,9,9,1,9,9,0,0,1,tuesday
1,2020010220496,2020,10220496,-0.139253,51.470327,3,1,2,27/04/2020,2,...,0,0,4,1,1,1,0,0,1,monday
2,2020010228005,2020,10228005,-0.178719,51.529614,3,1,1,01/01/2020,4,...,0,0,0,4,1,2,0,0,1,wednesday
4,2020010228011,2020,10228011,-0.137592,51.515704,3,1,2,01/01/2020,4,...,0,0,0,4,1,1,0,0,1,wednesday
5,2020010228012,2020,10228012,-0.02588,51.476278,3,1,1,01/01/2020,4,...,0,0,0,4,1,1,0,0,1,wednesday
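The brief defines serious accidents as fatal accidents involving 3+ casualties. A minimal sketch of a filter that follows that definition is given below on made-up rows; it assumes that severity code 1 means "Fatal" (worth confirming in the lookup file), and `demo`, `is_fatal`, `has_3_plus` and `high_fatality` are hypothetical names.

```python
import pandas as pd

# Made-up rows: only the two columns the definition needs.
demo = pd.DataFrame({
    "accident_severity":    [1, 1, 2, 3, 3, 1],
    "number_of_casualties": [3, 1, 4, 5, 1, 6],
})

is_fatal = demo["accident_severity"] == 1        # assumption: code 1 = Fatal
has_3_plus = demo["number_of_casualties"] >= 3   # "3+ casualties" from the brief

# Both conditions must hold; each comparison is built separately for readability.
high_fatality = demo[is_fatal & has_3_plus]
print(high_fatality)
```

Applied to `accidents`, the same two conditions would select a much smaller group than a filter on severity code 3.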


In [98]:
serious_acc.groupby('day_of_week').size()

day_of_week
1     7810
2    10121
3    10531
4    10758
5    11026
6    11724
7     9483
dtype: int64
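To look for joint patterns in time of day and day of week, the two can be cross-tabulated. The following is a sketch on made-up rows, assuming the `time` column holds `HH:MM` strings; `demo`, `table`, `busiest_hour` and `busiest_day` are hypothetical names.

```python
import pandas as pd

# Made-up rows; day_of_week uses the dataset's coding (1 = Sunday ... 7 = Saturday).
demo = pd.DataFrame({
    "day_of_week": [6, 6, 6, 1, 2, 2],
    "time": ["17:05", "17:40", "08:15", "02:30", "17:10", "08:45"],
})

# Assumption: times are stored as HH:MM text.
demo["hour"] = pd.to_datetime(demo["time"], format="%H:%M").dt.hour

# Rows are hours, columns are day codes, cells are accident counts.
table = pd.crosstab(demo["hour"], demo["day_of_week"])
print(table)

# The (hour, day) pair with the most accidents.
busiest_hour, busiest_day = demo.groupby(["hour", "day_of_week"]).size().idxmax()
print(busiest_hour, busiest_day)
```

A table like this can be passed to a heatmap to make any clustering by hour and weekday easier to see.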