# 400_mapping_values

## Purpose
In this notebook we will get started on the cleaning and verification of our raw data, focusing mainly on replacing our numeric values with their respective string values from our dataset guide. This notebook focuses on the prep on parts c and d of the 1979-2004 data and 2015-2016 data.


## Notebook Contents:
* __1:__ Loading the Datasets
   
* __2:__ Mapping Values
    * __2.1:__ Dataset 1979-2004 (c)
    * __2.2:__ Dataset 1979-2004 (d)
    * __2.3:__ Dataset 2015-2016
    
    
* __3:__ Saving Data to Pickle Files

* __4:__ Creating Data Dictionaries


## Datasets
__Input:__ 
* 100_7904c_accidents.pkl  (accident data for years 1979-2004 part c)
* 100_7904d_accidents.pkl  (accident data for years 1979-2004 part d)
* 200_all_2015-2016.pkl    (accident data for years 2015-2016)


__Output:__ 
* 400_mapping_values_7904c.pkl   (data for years 1979-2004 (c) with mapped strings)
* 400_mapping_values_7904d.pkl (data for years 1979-2004 (d) with mapped strings)
* 400_mapping_values_1516.pkl   (data for years 2015-2016 with mapped strings)

In [1]:
import os
import sys

import pandas as pd

module_path = os.path.abspath(os.path.join('../../data/..'))
if module_path not in sys.path:
    sys.path.append(module_path)
    
from src.helpers import data_dictionary

%matplotlib inline

# 1. Loading the Datasets

In [2]:
df7904c = pd.read_pickle('../../data/processed/100_7904c_accidents.pkl')
df7904c.shape

(2999999, 27)

In [3]:
df7904d = pd.read_pickle('../../data/processed/100_7904d_accidents.pkl')
df7904d.shape

(1981967, 27)

In [4]:
df1516 = pd.read_pickle('../../data/processed/200_all_2015-2016.pkl')
df1516.shape

(479548, 27)

# 2. Mapping Values
We will using mapping in order to sub in the string values associated with each number in  each dataset using the dataset guide provided with the dataset.

_Note:_ In the dataset each string for a given column is represented by a number(e.g): In the 'Day_of_Week' column 'fatal' is represented by a 1.

In [5]:
Accident_Severity_map = {1:'fatal', 2:'serious', 3:'slight'}
Day_of_Week_map = {1:'monday', 2:'tuesday', 3:'wednesday', 4:'thursday', 5:'friday', 6:'saturday', 7:'sunday'}
Road_Type_map = {1:'roundabout', 2:'one way street', 3:'dual carriageway', 6: 'single carriageway', 7:'slip road', 9:'unknown', 12:'one way street/slip road'}
Junction_Detail_map = {0:'not at junction or within 20 metres', 1:'roundabout', 2:'mini-roundabout', 3:'T or staggered junction', 5:'slip road', 6:'crossroads', 7:'more than 4 arms (not roundabout)', 8:'private drive or entrance', 9:'other'}
Light_Conditions_map = {1:'daylight', 4:'darkness - lights lit', 5:'darkness - lights unlit', 6:'darkness - no lighting', 7:'darkness - lighting unknown'}
Weather_Conditions_map = {1:'fine no high winds', 2:'raining no high winds', 3:'snowing no high winds', 4:'fine + high winds', 5:'raining + high winds', 6:'snowing + high winds', 7:'fog or mist', 8:'other', 9:'unknown'}
Road_Surface_Conditions_map = {1:'dry', 2:'wet or damp', 3:'snow', 4:'frost or ice', 5:'flood over 3cm. deep', 6:'oil or diesel', 7:'mud'}
Special_Conditions_at_Site_map = {0:'none', 1:'auto traffic signal - out', 2:'auto signal part defective', 3:'road sign or marking defective or obscured', 4:'roadworks', 5:'road surface defective', 6:'oil or diesel', 7:'mud'}
Urban_or_Rural_Area_map = {1:'urban', 2:'rural', 3:'unknown'}
Vehicle_Type_map = {1:'pedal cycle', 2:'motorcycle', 3:'motorcycle', 4:'motorcycle', 5:'motorcycle', 8:'taxi/private hire car', 9:'car', 10:'minibus', 11:'bus or coach', 16:'ridden horse', 17:'agricultural vehicle', 18:'tram', 19:'van / goods 3.5 tonnes mgw or under', 20:'goods over 3.5t. and under 7.5t', 21:'goods 7.5 tonnes mgw and over', 22:'mobility scooter', 23:'electric motocycle', 90:'unknown', 97:'motorcycle', 98:'goods vehicle', 103:'motorcycle - scooter', 104:'motorcycle', 105:'motorcycle', 106:'motorcyle', 108:'taxi', 109:'car', 110:'minibus', 113:'goods vehicle over 3.5 tonnes'}
Vehicle_Manoeuvre_map = {1:'reversing', 2:'parked', 3:'waiting to go - held up', 4:'slowing or stopping', 5:'moving off', 6:'u-turn', 7:'turning left', 8:'waiting to turn left', 9:'turning right', 10:'waiting to turn right', 11:'changing lane to left', 12:'changing lane to right', 13:'overtaking moving vehicle - offside', 14:'overtaking static vehicle - offside', 15:'overtaking - nearside', 16:'going ahead left-hand bend', 17:'goinf ahead right-hand bend', 18:'going ahead other'}
Location_Restricted_map = {0:'On main carriageway - not in restricted lane', 1:'tram/light rail track', 2:'bus lane', 3:'bus lane', 4:'cycle lane (on main carriageway)', 5:'cycleway or shared use footway (not part of  main carriageway)', 6:'on lay-by or hard shoulder', 7:'entering lay-by or hard shoulder', 8:'leaving lay-by or hard shoulder', 9:'footpath', 10:'not on carriageway'}
Journey_Purpose_of_Driver_map = {1:'occupational', 2:'commuting to/from work', 3:'taking pupil to/from school', 4:'pupil riding to/from school', 5:'other', 6:'unknown', 15:'other/unknown (2005-10)'}
Sex_of_Driver_map = {1:'male', 2:'female', 3:'unknown'}
Age_Band_of_Driver_map = {1:'0 - 5', 2:'6 - 10', 3:'11 - 15', 4:'16 - 20', 5:'21 - 25', 6:'26 - 35', 7:'36 - 45', 8:'46 - 55', 9:'56 - 65', 10:'66 - 75', 11:'75+'}
Driver_IMD_Decile_map = {1:'most deprived 10%', 2:'most deprived 10-20%', 3:'most deprived 20-30%', 4:'most deprived 30-40%', 5:'most deprived 40-50%', 6:'less deprived 40-50%', 7:'less deprived 30-40%', 8:'less deprived 20-30%', 9:'less deprived 10-20%', 10:'least deprived 10%'}
Driver_Home_Area_Type_map = {1:'urban', 2:'small town', 3:'rural'}

Next, we will apply these to the columns in each dataset (1979-2004 (c), 1979-2004 (d), 2015-2016) and check that they have been applied successfully by looking at the first 5 rows of each dataset.

## 2.1 
## Dataset 1979 - 2004 (c)
For each column in the dataset we will map the string values specified above.
Here is the
<a href="https://stackoverflow.com/questions/17114904/python-pandas-replacing-strings-in-dataframe-with-numbers">link</a> to where we found how to replace ints with strings in pandas.

For each column in the dataset we apply the corresponding map specified above.

In [6]:
df7904c['Accident_Severity'] = df7904c['Accident_Severity'].apply(lambda i: Accident_Severity_map.get(i) if i in Accident_Severity_map else i)
df7904c['Day_of_Week'] = df7904c['Day_of_Week'].apply(lambda i: Day_of_Week_map.get(i) if i in Day_of_Week_map else i)
df7904c['Road_Type'] = df7904c['Road_Type'].apply(lambda i: Road_Type_map.get(i) if i in Road_Type_map else i)
df7904c['Junction_Detail'] = df7904c['Junction_Detail'].apply(lambda i: Junction_Detail_map.get(i) if i in Junction_Detail_map else i)
df7904c['Light_Conditions'] = df7904c['Light_Conditions'].apply(lambda i: Light_Conditions_map.get(i) if i in Light_Conditions_map else i)
df7904c['Weather_Conditions'] = df7904c['Weather_Conditions'].apply(lambda i: Weather_Conditions_map.get(i) if i in Weather_Conditions_map else i)
df7904c['Road_Surface_Conditions'] = df7904c['Road_Surface_Conditions'].apply(lambda i: Road_Surface_Conditions_map.get(i) if i in Road_Surface_Conditions_map else i)
df7904c['Special_Conditions_at_Site'] = df7904c['Special_Conditions_at_Site'].apply(lambda i: Special_Conditions_at_Site_map.get(i) if i in Special_Conditions_at_Site_map else i)
df7904c['Urban_or_Rural_Area'] = df7904c['Urban_or_Rural_Area'].apply(lambda i: Urban_or_Rural_Area_map.get(i) if i in Urban_or_Rural_Area_map else i)
df7904c['Vehicle_Type'] = df7904c['Vehicle_Type'].apply(lambda i: Vehicle_Type_map.get(i) if i in Vehicle_Type_map else i)
df7904c['Vehicle_Manoeuvre'] = df7904c['Vehicle_Manoeuvre'].apply(lambda i: Vehicle_Manoeuvre_map.get(i) if i in Vehicle_Manoeuvre_map else i)
df7904c['Vehicle_Location-Restricted_Lane'] = df7904c['Vehicle_Location-Restricted_Lane'].apply(lambda i: Location_Restricted_map.get(i) if i in Location_Restricted_map else i)
df7904c['Journey_Purpose_of_Driver'] = df7904c['Journey_Purpose_of_Driver'].apply(lambda i: Journey_Purpose_of_Driver_map.get(i) if i in Journey_Purpose_of_Driver_map else i)
df7904c['Sex_of_Driver'] = df7904c['Sex_of_Driver'].apply(lambda i: Sex_of_Driver_map.get(i) if i in Sex_of_Driver_map else i)
df7904c['Age_Band_of_Driver'] = df7904c['Age_Band_of_Driver'].apply(lambda i: Age_Band_of_Driver_map.get(i) if i in Age_Band_of_Driver_map else i)
df7904c['Driver_IMD_Decile'] = df7904c['Driver_IMD_Decile'].apply(lambda i: Driver_IMD_Decile_map.get(i) if i in Driver_IMD_Decile_map else i)
df7904c['Driver_Home_Area_Type'] = df7904c['Driver_Home_Area_Type'].apply(lambda i: Driver_Home_Area_Type_map.get(i) if i in Driver_Home_Area_Type_map else i)

Below you can see that the mapping has been applied to the dataset successfully.

In [7]:
df7904c.head()

Unnamed: 0,Accident_Index,Longitude,Latitude,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),...,Urban_or_Rural_Area,Vehicle_Type,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Journey_Purpose_of_Driver,Sex_of_Driver,Age_Band_of_Driver,Age_of_Vehicle,Driver_IMD_Decile,Driver_Home_Area_Type
6000001,199301NI00616,,,slight,2,1,20/10/1993,thursday,11:25,3,...,-1,car,turning left,On main carriageway - not in restricted lane,-1,male,26 - 35,6,-1,-1
6000002,199301NI00617,,,slight,2,1,15/11/1993,tuesday,15:43,3,...,-1,van / goods 3.5 tonnes mgw or under,turning right,On main carriageway - not in restricted lane,-1,male,36 - 45,4,-1,-1
6000003,199301NI00617,,,slight,2,1,15/11/1993,tuesday,15:43,3,...,-1,pedal cycle,going ahead other,On main carriageway - not in restricted lane,-1,male,36 - 45,-1,-1,-1
6000004,199301NI00618,,,slight,2,3,15/11/1993,tuesday,20:40,3,...,-1,motorcycle,going ahead other,On main carriageway - not in restricted lane,-1,male,21 - 25,-1,-1,-1
6000005,199301NI00618,,,slight,2,3,15/11/1993,tuesday,20:40,3,...,-1,car,turning right,On main carriageway - not in restricted lane,-1,female,-1,-1,-1,-1


## 2.2
## Dataset 1979 - 2004 (d)
For each column in the dataset we will map the string values specified above.
Here is the
<a href="https://stackoverflow.com/questions/17114904/python-pandas-replacing-strings-in-dataframe-with-numbers">link</a> to where we found how to replace ints with strings in pandas.

For each column in the dataset we apply the corresponding map specified at the beginning of this section of the notebook.

In [8]:
df7904d['Accident_Severity'] = df7904d['Accident_Severity'].apply(lambda i: Accident_Severity_map.get(i) if i in Accident_Severity_map else i)
df7904d['Day_of_Week'] = df7904d['Day_of_Week'].apply(lambda i: Day_of_Week_map.get(i) if i in Day_of_Week_map else i)
df7904d['Road_Type'] = df7904d['Road_Type'].apply(lambda i: Road_Type_map.get(i) if i in Road_Type_map else i)
df7904d['Junction_Detail'] = df7904d['Junction_Detail'].apply(lambda i: Junction_Detail_map.get(i) if i in Junction_Detail_map else i)
df7904d['Light_Conditions'] = df7904d['Light_Conditions'].apply(lambda i: Light_Conditions_map.get(i) if i in Light_Conditions_map else i)
df7904d['Weather_Conditions'] = df7904d['Weather_Conditions'].apply(lambda i: Weather_Conditions_map.get(i) if i in Weather_Conditions_map else i)
df7904d['Road_Surface_Conditions'] = df7904d['Road_Surface_Conditions'].apply(lambda i: Road_Surface_Conditions_map.get(i) if i in Road_Surface_Conditions_map else i)
df7904d['Special_Conditions_at_Site'] = df7904d['Special_Conditions_at_Site'].apply(lambda i: Special_Conditions_at_Site_map.get(i) if i in Special_Conditions_at_Site_map else i)
df7904d['Urban_or_Rural_Area'] = df7904d['Urban_or_Rural_Area'].apply(lambda i: Urban_or_Rural_Area_map.get(i) if i in Urban_or_Rural_Area_map else i)
df7904d['Vehicle_Type'] = df7904d['Vehicle_Type'].apply(lambda i: Vehicle_Type_map.get(i) if i in Vehicle_Type_map else i)
df7904d['Vehicle_Manoeuvre'] = df7904d['Vehicle_Manoeuvre'].apply(lambda i: Vehicle_Manoeuvre_map.get(i) if i in Vehicle_Manoeuvre_map else i)
df7904d['Vehicle_Location-Restricted_Lane'] = df7904d['Vehicle_Location-Restricted_Lane'].apply(lambda i: Location_Restricted_map.get(i) if i in Location_Restricted_map else i)
df7904d['Journey_Purpose_of_Driver'] = df7904d['Journey_Purpose_of_Driver'].apply(lambda i: Journey_Purpose_of_Driver_map.get(i) if i in Journey_Purpose_of_Driver_map else i)
df7904d['Sex_of_Driver'] = df7904d['Sex_of_Driver'].apply(lambda i: Sex_of_Driver_map.get(i) if i in Sex_of_Driver_map else i)
df7904d['Age_Band_of_Driver'] = df7904d['Age_Band_of_Driver'].apply(lambda i: Age_Band_of_Driver_map.get(i) if i in Age_Band_of_Driver_map else i)
df7904d['Driver_IMD_Decile'] = df7904d['Driver_IMD_Decile'].apply(lambda i: Driver_IMD_Decile_map.get(i) if i in Driver_IMD_Decile_map else i)
Below you can see that the mapping has been applied to the dataset successfully.df7904d['Driver_Home_Area_Type'] = df7904d['Driver_Home_Area_Type'].apply(lambda i: Driver_Home_Area_Type_map.get(i) if i in Driver_Home_Area_Type_map else i)

Below you can see that the mapping has been applied to the dataset successfully.

In [9]:
df7904d.head()

Unnamed: 0,Accident_Index,Longitude,Latitude,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),...,Urban_or_Rural_Area,Vehicle_Type,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Journey_Purpose_of_Driver,Sex_of_Driver,Age_Band_of_Driver,Age_of_Vehicle,Driver_IMD_Decile,Driver_Home_Area_Type
9000001,200001TX00967,-0.314746,51.489498,slight,3,2,11/09/2000,tuesday,06:47,25,...,urban,van / goods 3.5 tonnes mgw or under,going ahead other,On main carriageway - not in restricted lane,-1,male,26 - 35,5,-1,-1
9000002,200001TX00967,-0.314746,51.489498,slight,3,2,11/09/2000,tuesday,06:47,25,...,urban,car,slowing or stopping,On main carriageway - not in restricted lane,-1,male,26 - 35,-1,-1,-1
9000003,200001TX00967,-0.314746,51.489498,slight,3,2,11/09/2000,tuesday,06:47,25,...,urban,car,slowing or stopping,On main carriageway - not in restricted lane,-1,female,26 - 35,9,-1,-1
9000004,200001TX00969,-0.398485,51.451105,slight,2,1,10/09/2000,monday,14:30,25,...,urban,bus or coach,going ahead other,On main carriageway - not in restricted lane,-1,male,26 - 35,-1,-1,-1
9000005,200001TX00969,-0.398485,51.451105,slight,2,1,10/09/2000,monday,14:30,25,...,urban,pedal cycle,going ahead other,On main carriageway - not in restricted lane,-1,male,6 - 10,-1,-1,-1


## 2.3
## Dataset 2015 - 2016
For each column in the dataset we will map the string values specified above.
Here is the
<a href="https://stackoverflow.com/questions/17114904/python-pandas-replacing-strings-in-dataframe-with-numbers">link</a> to where we found how to replace ints with strings in pandas.

For each column in the dataset we apply the corresponding map specified at the beginning of this section of the notebook.

In [10]:
df1516['Accident_Severity'] = df1516['Accident_Severity'].apply(lambda i: Accident_Severity_map.get(i) if i in Accident_Severity_map else i)
df1516['Day_of_Week'] = df1516['Day_of_Week'].apply(lambda i: Day_of_Week_map.get(i) if i in Day_of_Week_map else i)
df1516['Road_Type'] = df1516['Road_Type'].apply(lambda i: Road_Type_map.get(i) if i in Road_Type_map else i)
df1516['Junction_Detail'] = df1516['Junction_Detail'].apply(lambda i: Junction_Detail_map.get(i) if i in Junction_Detail_map else i)
df1516['Light_Conditions'] = df1516['Light_Conditions'].apply(lambda i: Light_Conditions_map.get(i) if i in Light_Conditions_map else i)
df1516['Weather_Conditions'] = df1516['Weather_Conditions'].apply(lambda i: Weather_Conditions_map.get(i) if i in Weather_Conditions_map else i)
df1516['Road_Surface_Conditions'] = df1516['Road_Surface_Conditions'].apply(lambda i: Road_Surface_Conditions_map.get(i) if i in Road_Surface_Conditions_map else i)
df1516['Special_Conditions_at_Site'] = df1516['Special_Conditions_at_Site'].apply(lambda i: Special_Conditions_at_Site_map.get(i) if i in Special_Conditions_at_Site_map else i)
df1516['Urban_or_Rural_Area'] = df1516['Urban_or_Rural_Area'].apply(lambda i: Urban_or_Rural_Area_map.get(i) if i in Urban_or_Rural_Area_map else i)
df1516['Vehicle_Type'] = df1516['Vehicle_Type'].apply(lambda i: Vehicle_Type_map.get(i) if i in Vehicle_Type_map else i)
df1516['Vehicle_Manoeuvre'] = df1516['Vehicle_Manoeuvre'].apply(lambda i: Vehicle_Manoeuvre_map.get(i) if i in Vehicle_Manoeuvre_map else i)
df1516['Vehicle_Location-Restricted_Lane'] = df1516['Vehicle_Location-Restricted_Lane'].apply(lambda i: Location_Restricted_map.get(i) if i in Location_Restricted_map else i)
df1516['Journey_Purpose_of_Driver'] = df1516['Journey_Purpose_of_Driver'].apply(lambda i: Journey_Purpose_of_Driver_map.get(i) if i in Journey_Purpose_of_Driver_map else i)
df1516['Sex_of_Driver'] = df1516['Sex_of_Driver'].apply(lambda i: Sex_of_Driver_map.get(i) if i in Sex_of_Driver_map else i)
df1516['Age_Band_of_Driver'] = df1516['Age_Band_of_Driver'].apply(lambda i: Age_Band_of_Driver_map.get(i) if i in Age_Band_of_Driver_map else i)
df1516['Driver_IMD_Decile'] = df1516['Driver_IMD_Decile'].apply(lambda i: Driver_IMD_Decile_map.get(i) if i in Driver_IMD_Decile_map else i)
df1516['Driver_Home_Area_Type'] = df1516['Driver_Home_Area_Type'].apply(lambda i: Driver_Home_Area_Type_map.get(i) if i in Driver_Home_Area_Type_map else i)

Below you can see that the mapping has been applied to the dataset successfully.

In [11]:
df1516.head()

Unnamed: 0,Accident_Index,Longitude,Latitude,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),...,Urban_or_Rural_Area,Vehicle_Type,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Journey_Purpose_of_Driver,Sex_of_Driver,Age_Band_of_Driver,Age_of_Vehicle,Driver_IMD_Decile,Driver_Home_Area_Type
0,201501BS70001,-0.198465,51.505538,slight,1,1,12/01/2015,tuesday,18:45,12,...,urban,van / goods 3.5 tonnes mgw or under,turning right,On main carriageway - not in restricted lane,occupational,male,-1,4,-1,-1
1,201501BS70002,-0.178838,51.491836,slight,1,1,12/01/2015,tuesday,07:50,12,...,urban,car,turning right,On main carriageway - not in restricted lane,unknown,male,-1,3,-1,-1
2,201501BS70004,-0.20559,51.51491,slight,1,1,12/01/2015,tuesday,18:08,12,...,urban,car,turning right,On main carriageway - not in restricted lane,unknown,male,26 - 35,10,-1,urban
3,201501BS70005,-0.208327,51.514952,slight,1,1,13/01/2015,wednesday,07:40,12,...,urban,car,turning right,On main carriageway - not in restricted lane,unknown,male,-1,-1,-1,-1
4,201501BS70008,-0.206022,51.496572,serious,2,1,09/01/2015,saturday,07:30,12,...,urban,pedal cycle,going ahead other,On main carriageway - not in restricted lane,commuting to/from work,male,46 - 55,-1,-1,urban


# 3. Saving Data to Pickle Files
Here we will save each of the three datasets into a pickle file.

In [12]:
pickle_save_time = %timeit -o df7904c.to_pickle("../../data/processed/400_mapping_values_7904c.pkl")

pickle_save_time

41.5 s ± 15 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 41.5 s ± 15 s per loop (mean ± std. dev. of 7 runs, 1 loop each)>

In [13]:
pickle_save_time = %timeit -o df7904d.to_pickle("../../data/processed/400_mapping_values_7904d.pkl")

pickle_save_time

22.8 s ± 8.24 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 22.8 s ± 8.24 s per loop (mean ± std. dev. of 7 runs, 1 loop each)>

In [14]:
pickle_save_time = %timeit -o df1516.to_pickle("../../data/processed/400_mapping_values_1516.pkl")

pickle_save_time

3.76 s ± 146 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 3.76 s ± 146 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>

# 4. Creating Data Dictionaries
Below we will create a data dictionary for each of the 3 pickle files created above.

In [15]:
data_dictionary.save(
    '../../data/processed/400_mapping_values_7904c.pkl', 

"""\
Aggregate raw data for UK Road Safety data for years 1979 - 2004 (c).
""").head()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max,Missing,%Missing
Accident_Index,2999999.0,1653282.0,199722AX01148,88.0,,,,,,,,0,0.0
Longitude,476279.0,,,,-1.29859,1.35915,-7.51329,-2.22435,-1.19521,-0.143788,1.75861,2523720,84.124028
Latitude,476279.0,,,,52.4471,1.35793,49.9143,51.4898,51.8747,53.3991,60.6934,2523720,84.124028
Accident_Severity,2999999.0,3.0,slight,2526120.0,,,,,,,,0,0.0
Number_of_Vehicles,3000000.0,,,,2.11218,1.2306,1.0,2.0,2.0,2.0,88.0,0,0.0


In [16]:
data_dictionary.save(
    '../../data/processed/400_mapping_values_7904d.pkl', 

"""\
Aggregate raw data for UK Road Safety data for years 1979 - 2004 (d).
""").head()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max,Missing,%Missing
Accident_Index,1981967.0,1078292.0,200346SW73961,66.0,,,,,,,,0,0.0
Longitude,1977320.0,,,,-1.47628,1.37502,-7.53617,-2.37656,-1.46803,-0.288233,1.76059,4648,0.234514
Latitude,1977320.0,,,,52.5957,1.41162,49.9128,51.5014,52.4046,53.4749,60.8017,4648,0.234514
Accident_Severity,1981967.0,3.0,slight,1706942.0,,,,,,,,0,0.0
Number_of_Vehicles,1981970.0,,,,2.12637,1.00562,1.0,2.0,2.0,2.0,66.0,0,0.0


In [17]:
data_dictionary.save(
    '../../data/processed/400_mapping_values_1516.pkl', 

"""\
Aggregate raw data for UK Road Safety data for years 2015-2016.
""").head()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max,Missing,%Missing
Accident_Index,479548,260293.0,201543P296025,37.0,,,,,,,,0,0.0
Longitude,479483,,,,-1.38182,1.38744,-7.42291,-2.25925,-1.31897,-0.19258,1.75844,65,0.013554
Latitude,479483,,,,52.5624,1.39747,49.9156,51.5108,52.2473,53.4407,60.6611,65,0.013554
Accident_Severity,479548,3.0,slight,407099.0,,,,,,,,0,0.0
Number_of_Vehicles,479548,,,,2.11341,0.894927,1.0,2.0,2.0,2.0,37.0,0,0.0
