# 300_mapping_values

## Purpose

In this notebook we start cleaning and verifying our raw data, focusing mainly on replacing the numeric codes with their corresponding string values from our dataset guide ('Road-Accident-Safety-Data-Guide.xls' in our Data > Raw folder).

Due to the size of our datasets (~3 million rows each), we work with only three datasets in this notebook and process each one separately, one after the other.


### Notebook Contents:

* __1:__ Loading our Datasets

* __2:__ Mapping Values

     * __2.1:__ Dataset 1979 - 2004 (a)
     * __2.2:__ Dataset 1979 - 2004 (b)
     * __2.3:__ Dataset 2005 - 2014


* __3:__ Saving Datasets to Pickle Files

* __4:__ Creating Data Dictionaries


## Datasets
* __Input__:

     * 100_0514_accidents.pkl (Vehicle and Accident Data for all Recorded UK Road Accidents from 2005 - 2014)
     * 100_7904a_accidents.pkl (First 3 million lines containing Vehicle and Accident Data from combined input files: Vehicles7904.csv & Accidents7904.csv)
     * 100_7904b_accidents.pkl (Second 3 million lines containing Vehicle and Accident Data from combined input files: Vehicles7904.csv & Accidents7904.csv)


* __Output__:

     * 300_mapping_values_0514.pkl (Vehicle and Accident Data for all Recorded UK Road Accidents from 2005 - 2014, with values mapped)
     * 300_mapping_values_7904a.pkl (First 3 million rows of Vehicle and Accident Data for all Recorded UK Road Accidents from 1979 - 2004 dataset, with values mapped)
     * 300_mapping_values_7904b.pkl (Second 3 million rows of Vehicle and Accident Data for all Recorded UK Road Accidents from 1979 - 2004 dataset, with values mapped)

In [1]:
import os
import sys

import pandas as pd

# Add the project root (two levels up) to sys.path so that the src package can be imported.
module_path = os.path.abspath(os.path.join('..', '..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from src.helpers import data_dictionary

%matplotlib inline

## 1.  Loading the Datasets

Below we load our input files using the pd.read_pickle function.

In [2]:
df7904a = pd.read_pickle('../../data/processed/100_7904a_accidents.pkl')
df7904a.shape

(3000000, 27)

In [3]:
df7904b = pd.read_pickle('../../data/processed/100_7904b_accidents.pkl')
df7904b.shape

(2999999, 27)

In [4]:
df0514 = pd.read_pickle('../../data/processed/100_0514_accidents.pkl')
df0514.shape

(3004425, 27)

## 2. Mapping Values

Currently, our 3 datasets consist of columns filled with integer values which correspond to string values within our 'Road-Accident-Safety-Data-Guide.xls'.

For example, the 'Sex_of_Driver' column does not list 'Male', 'Female' or 'Unknown'; it is filled with 1, 2 or 3 respectively.

Using mapping, we will specify the string value for each associated integer value within each coded column of our 3 datasets, so that we can easily analyse our data in future notebooks.

After mapping, the 'Sex_of_Driver' column will hold the string values 'male', 'female' or 'unknown' instead of the integers 1, 2 and 3.
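As a minimal sketch of the idea, the toy Series below stands in for a coded column. The names `codes` and `sex_map` are illustrative and not part of this notebook. Codes without an entry in the dictionary (here -1) are left as they are.

```python
import pandas as pd

# Toy stand-in for a coded column; -1 has no entry in the map.
codes = pd.Series([1, 2, 3, -1])
sex_map = {1: 'male', 2: 'female', 3: 'unknown'}

# dict.get(code, code) returns the label when the code is a key
# and falls back to the original value otherwise.
labels = codes.apply(lambda code: sex_map.get(code, code))
print(labels.tolist())  # ['male', 'female', 'unknown', -1]
```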

In order to do this, we have defined the string values which need to be substituted for each specified integer value within the columns of all 3 datasets:

In [17]:
Accident_Severity_map = {1:'fatal', 2:'serious', 3:'slight'}
# Day codes start the week on Sunday: in the data, 01/01/1979 (a Monday) carries code 2.
Day_of_Week_map = {1:'sunday', 2:'monday', 3:'tuesday', 4:'wednesday', 5:'thursday', 6:'friday', 7:'saturday'}
Road_Type_map = {1:'roundabout', 2:'one way street', 3:'dual carriageway', 6: 'single carriageway', 7:'slip road', 9:'unknown', 12:'one way street/slip road'}
Junction_Detail_map = {0:'not at junction or within 20 metres', 1:'roundabout', 2:'mini-roundabout', 3:'T or staggered junction', 5:'slip road', 6:'crossroads', 7:'more than 4 arms (not roundabout)', 8:'private drive or entrance', 9:'other'}
Light_Conditions_map = {1:'daylight', 4:'darkness - lights lit', 5:'darkness - lights unlit', 6:'darkness - no lighting', 7:'darkness - lighting unknown'}
Weather_Conditions_map = {1:'fine no high winds', 2:'raining no high winds', 3:'snowing no high winds', 4:'fine + high winds', 5:'raining + high winds', 6:'snowing + high winds', 7:'fog or mist', 8:'other', 9:'unknown'}
Road_Surface_Conditions_map = {1:'dry', 2:'wet or damp', 3:'snow', 4:'frost or ice', 5:'flood over 3cm. deep', 6:'oil or diesel', 7:'mud'}
Special_Conditions_at_Site_map = {0:'none', 1:'auto traffic signal - out', 2:'auto signal part defective', 3:'road sign or marking defective or obscured', 4:'roadworks', 5:'road surface defective', 6:'oil or diesel', 7:'mud'}
Urban_or_Rural_Area_map = {1:'urban', 2:'rural', 3:'unknown'}
Vehicle_Type_map = {1:'pedal cycle', 2:'motorcycle', 3:'motorcycle', 4:'motorcycle', 5:'motorcycle', 8:'taxi/private hire car', 9:'car', 10:'minibus', 11:'bus or coach', 16:'ridden horse', 17:'agricultural vehicle', 18:'tram', 19:'van / goods 3.5 tonnes mgw or under', 20:'goods over 3.5t. and under 7.5t', 21:'goods 7.5 tonnes mgw and over', 22:'mobility scooter', 23:'electric motorcycle', 90:'unknown', 97:'motorcycle', 98:'goods vehicle', 103:'motorcycle - scooter', 104:'motorcycle', 105:'motorcycle', 106:'motorcycle', 108:'taxi', 109:'car', 110:'minibus', 113:'goods vehicle over 3.5 tonnes'}
Vehicle_Manoeuvre_map = {1:'reversing', 2:'parked', 3:'waiting to go - held up', 4:'slowing or stopping', 5:'moving off', 6:'u-turn', 7:'turning left', 8:'waiting to turn left', 9:'turning right', 10:'waiting to turn right', 11:'changing lane to left', 12:'changing lane to right', 13:'overtaking moving vehicle - offside', 14:'overtaking static vehicle - offside', 15:'overtaking - nearside', 16:'going ahead left-hand bend', 17:'going ahead right-hand bend', 18:'going ahead other'}
Location_Restricted_map = {0:'On main carriageway - not in restricted lane', 1:'tram/light rail track', 2:'bus lane', 3:'bus lane', 4:'cycle lane (on main carriageway)', 5:'cycleway or shared use footway (not part of main carriageway)', 6:'on lay-by or hard shoulder', 7:'entering lay-by or hard shoulder', 8:'leaving lay-by or hard shoulder', 9:'footpath', 10:'not on carriageway'}
Journey_Purpose_of_Driver_map = {1:'occupational', 2:'commuting to/from work', 3:'taking pupil to/from school', 4:'pupil riding to/from school', 5:'other', 6:'unknown', 15:'other/unknown (2005-10)'}
Sex_of_Driver_map = {1:'male', 2:'female', 3:'unknown'}
Age_Band_of_Driver_map = {1:'0 - 5', 2:'6 - 10', 3:'11 - 15', 4:'16 - 20', 5:'21 - 25', 6:'26 - 35', 7:'36 - 45', 8:'46 - 55', 9:'56 - 65', 10:'66 - 75', 11:'75+'}
Driver_IMD_Decile_map = {1:'most deprived 10%', 2:'most deprived 10-20%', 3:'most deprived 20-30%', 4:'most deprived 30-40%', 5:'most deprived 40-50%', 6:'less deprived 40-50%', 7:'less deprived 30-40%', 8:'less deprived 20-30%', 9:'less deprived 10-20%', 10:'least deprived 10%'}
Driver_Home_Area_Type_map = {1:'urban', 2:'small town', 3:'rural'}
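
Values that have no entry in a dictionary pass through the mapping unchanged, so it can help to list them before mapping. The following is a rough sketch only: `unmapped_codes` is a hypothetical helper, and the small frame and shortened dictionary merely imitate the real data.

```python
import pandas as pd

def unmapped_codes(series, mapping):
    """Return the sorted distinct values of `series` that are not keys of `mapping`."""
    present = set(series.dropna().unique().tolist())
    return sorted(present - set(mapping))

# Toy data imitating one coded column; the real frames are loaded from the pickles.
toy = pd.DataFrame({'Urban_or_Rural_Area': [1, 2, -1, 3, -1]})
toy_map = {1: 'urban', 2: 'rural', 3: 'unknown'}

print(unmapped_codes(toy['Urban_or_Rural_Area'], toy_map))  # [-1]
```

Any code reported here would remain an integer in the mapped column.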

## 2.1
### Dataset 1979 - 2004 (a)

Now that we have defined our mapping values, we will apply them to each coded column of our df7904a dataset.

In [5]:
df7904a.head() #print the first lines of the original dataset, containing integer values before mapping.

Unnamed: 0,Accident_Index,Longitude,Latitude,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),...,Urban_or_Rural_Area,Vehicle_Type,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Journey_Purpose_of_Driver,Sex_of_Driver,Age_Band_of_Driver,Age_of_Vehicle,Driver_IMD_Decile,Driver_Home_Area_Type
0,197901A11AD14,,,3,2,1,18/01/1979,5,08:00,11,...,-1,109,18,-1,-1,1,7,-1,-1,-1
1,197901A11AD14,,,3,2,1,18/01/1979,5,08:00,11,...,-1,104,13,-1,-1,1,-1,-1,-1,-1
2,197901A1BAW34,,,3,1,1,01/01/1979,2,01:00,23,...,-1,109,18,-1,-1,1,-1,-1,-1,-1
3,197901A1BFD77,,,3,2,3,01/01/1979,2,01:25,17,...,-1,109,18,-1,-1,1,5,-1,-1,-1
4,197901A1BFD77,,,3,2,3,01/01/1979,2,01:25,17,...,-1,109,18,-1,-1,1,7,-1,-1,-1
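
Since each row carries both a Date and an integer Day_of_Week, the meaning of the day codes can be cross-checked against the calendar. The sketch below uses two rows copied from the output above. The column name `calendar_day` is introduced for illustration, and the date format is assumed to be dd/mm/yyyy as displayed.

```python
import pandas as pd

# Two rows copied from the head() output above (Date is dd/mm/yyyy).
sample = pd.DataFrame({
    'Date': ['18/01/1979', '01/01/1979'],
    'Day_of_Week': [5, 2],
})

# Parse the dates and derive the weekday name from the calendar.
parsed = pd.to_datetime(sample['Date'], format='%d/%m/%Y')
sample['calendar_day'] = parsed.dt.day_name()

# Each code should pair with exactly one weekday name.
print(sample.groupby('Day_of_Week')['calendar_day'].unique())
```

Running the same comparison on a full frame would show whether any code pairs with more than one weekday.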


To apply the mappings defined above to the df7904a dataset:

* We specify the column that we want to map.
* We then apply a lambda function to that column, which replaces any value present in the map with its string label and leaves any other value (such as -1) unchanged.

We have used code from [this](https://stackoverflow.com/questions/17114904/python-pandas-replacing-strings-in-dataframe-with-numbers) Stack Overflow question to carry out the mapping.

In [13]:
df7904a['Accident_Severity'] = df7904a['Accident_Severity'].apply(lambda s: Accident_Severity_map.get(s) if s in Accident_Severity_map else s)
df7904a['Day_of_Week'] = df7904a['Day_of_Week'].apply(lambda s: Day_of_Week_map.get(s) if s in Day_of_Week_map else s)
df7904a['Road_Type'] = df7904a['Road_Type'].apply(lambda s: Road_Type_map.get(s) if s in Road_Type_map else s)
df7904a['Junction_Detail'] = df7904a['Junction_Detail'].apply(lambda s: Junction_Detail_map.get(s) if s in Junction_Detail_map else s)
df7904a['Light_Conditions'] = df7904a['Light_Conditions'].apply(lambda s: Light_Conditions_map.get(s) if s in Light_Conditions_map else s)
df7904a['Weather_Conditions'] = df7904a['Weather_Conditions'].apply(lambda s: Weather_Conditions_map.get(s) if s in Weather_Conditions_map else s)
df7904a['Road_Surface_Conditions'] = df7904a['Road_Surface_Conditions'].apply(lambda s: Road_Surface_Conditions_map.get(s) if s in Road_Surface_Conditions_map else s)
df7904a['Special_Conditions_at_Site'] = df7904a['Special_Conditions_at_Site'].apply(lambda s: Special_Conditions_at_Site_map.get(s) if s in Special_Conditions_at_Site_map else s)
df7904a['Urban_or_Rural_Area'] = df7904a['Urban_or_Rural_Area'].apply(lambda s: Urban_or_Rural_Area_map.get(s) if s in Urban_or_Rural_Area_map else s)
df7904a['Vehicle_Type'] = df7904a['Vehicle_Type'].apply(lambda s: Vehicle_Type_map.get(s) if s in Vehicle_Type_map else s)
df7904a['Vehicle_Manoeuvre'] = df7904a['Vehicle_Manoeuvre'].apply(lambda s: Vehicle_Manoeuvre_map.get(s) if s in Vehicle_Manoeuvre_map else s)
df7904a['Vehicle_Location-Restricted_Lane'] = df7904a['Vehicle_Location-Restricted_Lane'].apply(lambda s: Location_Restricted_map.get(s) if s in Location_Restricted_map else s)
df7904a['Journey_Purpose_of_Driver'] = df7904a['Journey_Purpose_of_Driver'].apply(lambda s: Journey_Purpose_of_Driver_map.get(s) if s in Journey_Purpose_of_Driver_map else s)
df7904a['Sex_of_Driver'] = df7904a['Sex_of_Driver'].apply(lambda s: Sex_of_Driver_map.get(s) if s in Sex_of_Driver_map else s)
df7904a['Age_Band_of_Driver'] = df7904a['Age_Band_of_Driver'].apply(lambda s: Age_Band_of_Driver_map.get(s) if s in Age_Band_of_Driver_map else s)
df7904a['Driver_IMD_Decile'] = df7904a['Driver_IMD_Decile'].apply(lambda s: Driver_IMD_Decile_map.get(s) if s in Driver_IMD_Decile_map else s)
df7904a['Driver_Home_Area_Type'] = df7904a['Driver_Home_Area_Type'].apply(lambda s: Driver_Home_Area_Type_map.get(s) if s in Driver_Home_Area_Type_map else s)
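
The seventeen assignments above follow one pattern, so they could also be written as a loop over a column-to-dictionary lookup. The sketch below is one possible arrangement, not the code used in this notebook: `apply_maps` and `column_maps` are hypothetical names, and the toy frame and shortened dictionaries only imitate the real ones.

```python
import pandas as pd

def apply_maps(df, column_maps):
    """Replace mapped codes in each listed column; leave other values unchanged."""
    for column, mapping in column_maps.items():
        df[column] = df[column].apply(lambda code: mapping.get(code, code))
    return df

# Shortened dictionaries and toy rows for illustration only.
column_maps = {
    'Accident_Severity': {1: 'fatal', 2: 'serious', 3: 'slight'},
    'Urban_or_Rural_Area': {1: 'urban', 2: 'rural', 3: 'unknown'},
}
toy = pd.DataFrame({
    'Accident_Severity': [3, 2],
    'Urban_or_Rural_Area': [-1, 1],
})

print(apply_maps(toy, column_maps))
```

With a lookup like this, the same call could be reused for each of the three datasets.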

Below you can see the updated dataset after applying the mappings to each of the coded columns of the df7904a dataset.

In [14]:
df7904a.head()

Unnamed: 0,Accident_Index,Longitude,Latitude,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),...,Urban_or_Rural_Area,Vehicle_Type,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Journey_Purpose_of_Driver,Sex_of_Driver,Age_Band_of_Driver,Age_of_Vehicle,Driver_IMD_Decile,Driver_Home_Area_Type
0,197901A11AD14,,,slight,2,1,18/01/1979,thursday,08:00,11,...,-1,car,going ahead other,-1,-1,male,36 - 45,-1,-1,-1
1,197901A11AD14,,,slight,2,1,18/01/1979,thursday,08:00,11,...,-1,motorcycle,overtaking moving vehicle - offside,-1,-1,male,-1,-1,-1,-1
2,197901A1BAW34,,,slight,1,1,01/01/1979,monday,01:00,23,...,-1,car,going ahead other,-1,-1,male,-1,-1,-1,-1
3,197901A1BFD77,,,slight,2,3,01/01/1979,monday,01:25,17,...,-1,car,going ahead other,-1,-1,male,21 - 25,-1,-1,-1
4,197901A1BFD77,,,slight,2,3,01/01/1979,monday,01:25,17,...,-1,car,going ahead other,-1,-1,male,36 - 45,-1,-1,-1


## 2.2
### Dataset 1979 - 2004 (b)

We will now apply the same mappings to each coded column of our df7904b dataset, as we did for our df7904a dataset.

In [15]:
df7904b['Accident_Severity'] = df7904b['Accident_Severity'].apply(lambda s: Accident_Severity_map.get(s) if s in Accident_Severity_map else s)
df7904b['Day_of_Week'] = df7904b['Day_of_Week'].apply(lambda s: Day_of_Week_map.get(s) if s in Day_of_Week_map else s)
df7904b['Road_Type'] = df7904b['Road_Type'].apply(lambda s: Road_Type_map.get(s) if s in Road_Type_map else s)
df7904b['Junction_Detail'] = df7904b['Junction_Detail'].apply(lambda s: Junction_Detail_map.get(s) if s in Junction_Detail_map else s)
df7904b['Light_Conditions'] = df7904b['Light_Conditions'].apply(lambda s: Light_Conditions_map.get(s) if s in Light_Conditions_map else s)
df7904b['Weather_Conditions'] = df7904b['Weather_Conditions'].apply(lambda s: Weather_Conditions_map.get(s) if s in Weather_Conditions_map else s)
df7904b['Road_Surface_Conditions'] = df7904b['Road_Surface_Conditions'].apply(lambda s: Road_Surface_Conditions_map.get(s) if s in Road_Surface_Conditions_map else s)
df7904b['Special_Conditions_at_Site'] = df7904b['Special_Conditions_at_Site'].apply(lambda s: Special_Conditions_at_Site_map.get(s) if s in Special_Conditions_at_Site_map else s)
df7904b['Urban_or_Rural_Area'] = df7904b['Urban_or_Rural_Area'].apply(lambda s: Urban_or_Rural_Area_map.get(s) if s in Urban_or_Rural_Area_map else s)
df7904b['Vehicle_Type'] = df7904b['Vehicle_Type'].apply(lambda s: Vehicle_Type_map.get(s) if s in Vehicle_Type_map else s)
df7904b['Vehicle_Manoeuvre'] = df7904b['Vehicle_Manoeuvre'].apply(lambda s: Vehicle_Manoeuvre_map.get(s) if s in Vehicle_Manoeuvre_map else s)
df7904b['Vehicle_Location-Restricted_Lane'] = df7904b['Vehicle_Location-Restricted_Lane'].apply(lambda s: Location_Restricted_map.get(s) if s in Location_Restricted_map else s)
df7904b['Journey_Purpose_of_Driver'] = df7904b['Journey_Purpose_of_Driver'].apply(lambda s: Journey_Purpose_of_Driver_map.get(s) if s in Journey_Purpose_of_Driver_map else s)
df7904b['Sex_of_Driver'] = df7904b['Sex_of_Driver'].apply(lambda s: Sex_of_Driver_map.get(s) if s in Sex_of_Driver_map else s)
df7904b['Age_Band_of_Driver'] = df7904b['Age_Band_of_Driver'].apply(lambda s: Age_Band_of_Driver_map.get(s) if s in Age_Band_of_Driver_map else s)
df7904b['Driver_IMD_Decile'] = df7904b['Driver_IMD_Decile'].apply(lambda s: Driver_IMD_Decile_map.get(s) if s in Driver_IMD_Decile_map else s)
df7904b['Driver_Home_Area_Type'] = df7904b['Driver_Home_Area_Type'].apply(lambda s: Driver_Home_Area_Type_map.get(s) if s in Driver_Home_Area_Type_map else s)

Below you can see the updated dataset after applying the mappings to each of the coded columns of the df7904b dataset.

In [16]:
df7904b.head()

Unnamed: 0,Accident_Index,Longitude,Latitude,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),...,Urban_or_Rural_Area,Vehicle_Type,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Journey_Purpose_of_Driver,Sex_of_Driver,Age_Band_of_Driver,Age_of_Vehicle,Driver_IMD_Decile,Driver_Home_Area_Type
3000001,198601TD00755,,,serious,1,1,08/08/1986,friday,00:13,25,...,-1,bus or coach,going ahead other,On main carriageway - not in restricted lane,-1,unknown,-1,-1,-1,-1
3000002,198601TD00756,,,slight,1,1,05/08/1986,tuesday,11:10,25,...,-1,bus or coach,slowing or stopping,On main carriageway - not in restricted lane,-1,male,26 - 35,-1,-1,-1
3000003,198601TD00757,,,slight,1,1,05/08/1986,tuesday,10:45,25,...,-1,unknown,parked,On main carriageway - not in restricted lane,-1,female,46 - 55,-1,-1,-1
3000004,198601TD00758,,,slight,2,1,07/08/1986,thursday,13:30,25,...,-1,car,overtaking moving vehicle - offside,On main carriageway - not in restricted lane,-1,female,16 - 20,-1,-1,-1
3000005,198601TD00758,,,slight,2,1,07/08/1986,thursday,13:30,25,...,-1,car,turning left,On main carriageway - not in restricted lane,-1,female,26 - 35,-1,-1,-1


## 2.3
### Dataset 2005 - 2014

We will now apply the same mappings to each coded column of our df0514 dataset, as we did for our df7904a and df7904b datasets.

In [18]:
df0514['Accident_Severity'] = df0514['Accident_Severity'].apply(lambda s: Accident_Severity_map.get(s) if s in Accident_Severity_map else s)
df0514['Day_of_Week'] = df0514['Day_of_Week'].apply(lambda s: Day_of_Week_map.get(s) if s in Day_of_Week_map else s)
df0514['Road_Type'] = df0514['Road_Type'].apply(lambda s: Road_Type_map.get(s) if s in Road_Type_map else s)
df0514['Junction_Detail'] = df0514['Junction_Detail'].apply(lambda s: Junction_Detail_map.get(s) if s in Junction_Detail_map else s)
df0514['Light_Conditions'] = df0514['Light_Conditions'].apply(lambda s: Light_Conditions_map.get(s) if s in Light_Conditions_map else s)
df0514['Weather_Conditions'] = df0514['Weather_Conditions'].apply(lambda s: Weather_Conditions_map.get(s) if s in Weather_Conditions_map else s)
df0514['Road_Surface_Conditions'] = df0514['Road_Surface_Conditions'].apply(lambda s: Road_Surface_Conditions_map.get(s) if s in Road_Surface_Conditions_map else s)
df0514['Special_Conditions_at_Site'] = df0514['Special_Conditions_at_Site'].apply(lambda s: Special_Conditions_at_Site_map.get(s) if s in Special_Conditions_at_Site_map else s)
df0514['Urban_or_Rural_Area'] = df0514['Urban_or_Rural_Area'].apply(lambda s: Urban_or_Rural_Area_map.get(s) if s in Urban_or_Rural_Area_map else s)
df0514['Vehicle_Type'] = df0514['Vehicle_Type'].apply(lambda s: Vehicle_Type_map.get(s) if s in Vehicle_Type_map else s)
df0514['Vehicle_Manoeuvre'] = df0514['Vehicle_Manoeuvre'].apply(lambda s: Vehicle_Manoeuvre_map.get(s) if s in Vehicle_Manoeuvre_map else s)
df0514['Vehicle_Location-Restricted_Lane'] = df0514['Vehicle_Location-Restricted_Lane'].apply(lambda s: Location_Restricted_map.get(s) if s in Location_Restricted_map else s)
df0514['Journey_Purpose_of_Driver'] = df0514['Journey_Purpose_of_Driver'].apply(lambda s: Journey_Purpose_of_Driver_map.get(s) if s in Journey_Purpose_of_Driver_map else s)
df0514['Sex_of_Driver'] = df0514['Sex_of_Driver'].apply(lambda s: Sex_of_Driver_map.get(s) if s in Sex_of_Driver_map else s)
df0514['Age_Band_of_Driver'] = df0514['Age_Band_of_Driver'].apply(lambda s: Age_Band_of_Driver_map.get(s) if s in Age_Band_of_Driver_map else s)
df0514['Driver_IMD_Decile'] = df0514['Driver_IMD_Decile'].apply(lambda s: Driver_IMD_Decile_map.get(s) if s in Driver_IMD_Decile_map else s)
df0514['Driver_Home_Area_Type'] = df0514['Driver_Home_Area_Type'].apply(lambda s: Driver_Home_Area_Type_map.get(s) if s in Driver_Home_Area_Type_map else s)

Below you can see the updated dataset after applying the mappings to each of the coded columns of the df0514 dataset.

In [19]:
df0514.head()

Unnamed: 0,Accident_Index,Longitude,Latitude,Accident_Severity,Number_of_Vehicles,Number_of_Casualties,Date,Day_of_Week,Time,Local_Authority_(District),...,Urban_or_Rural_Area,Vehicle_Type,Vehicle_Manoeuvre,Vehicle_Location-Restricted_Lane,Journey_Purpose_of_Driver,Sex_of_Driver,Age_Band_of_Driver,Age_of_Vehicle,Driver_IMD_Decile,Driver_Home_Area_Type
0,200501BS00001,-0.19117,51.489096,serious,1,1,04/01/2005,tuesday,17:42,12,...,urban,car,going ahead other,On main carriageway - not in restricted lane,other/unknown (2005-10),female,66 - 75,-1,less deprived 30-40%,urban
1,200501BS00002,-0.211708,51.520075,slight,1,1,05/01/2005,wednesday,17:36,12,...,urban,bus or coach,slowing or stopping,On main carriageway - not in restricted lane,occupational,male,36 - 45,3,-1,-1
2,200501BS00003,-0.206458,51.525301,slight,2,1,06/01/2005,thursday,00:15,12,...,urban,bus or coach,going ahead right-hand bend,On main carriageway - not in restricted lane,occupational,male,26 - 35,5,most deprived 10-20%,urban
3,200501BS00003,-0.206458,51.525301,slight,2,1,06/01/2005,thursday,00:15,12,...,urban,car,parked,On main carriageway - not in restricted lane,other/unknown (2005-10),male,56 - 65,6,most deprived 10%,urban
4,200501BS00004,-0.173862,51.482442,slight,1,1,07/01/2005,friday,10:35,12,...,urban,car,going ahead other,On main carriageway - not in restricted lane,other/unknown (2005-10),female,46 - 55,4,most deprived 10-20%,urban


## 3. Saving to Pickle Files

Once we have mapped all of the values in each of our 3 datasets, we will save each updated dataset to a pickle file, printing the save times for each instance.

__1979 - 2004 (A)__

In [20]:
pickle_save_time = %timeit -o df7904a.to_pickle("../../data/processed/300_mapping_values_7904a.pkl") #save 1979-2004 (a) dataset into a pickle file and print the save time

pickle_save_time

15.7 s ± 602 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 15.7 s ± 602 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>

__1979 - 2004 (B)__

In [21]:
pickle_save_time = %timeit -o df7904b.to_pickle("../../data/processed/300_mapping_values_7904b.pkl") #save 1979-2004 (b) dataset into a pickle file and print the save time

pickle_save_time

15.3 s ± 200 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 15.3 s ± 200 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>

__2005 - 2014__

In [22]:
pickle_save_time = %timeit -o df0514.to_pickle("../../data/processed/300_mapping_values_0514.pkl") #save 2005 - 2014 dataset into a pickle file and print the save time

pickle_save_time

15.8 s ± 87.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 15.8 s ± 87.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>
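
Note that %timeit repeats the statement (7 runs in the outputs above), so each file is written several times. Where a single timed write is enough, the standard library timer can be used instead. The following is a minimal sketch, assuming a toy frame and a temporary output path rather than the project's data/processed folder:

```python
import os
import tempfile
import time

import pandas as pd

toy = pd.DataFrame({'Accident_Severity': ['slight', 'serious']})

# Write to a temporary folder so the sketch does not touch real outputs.
out_path = os.path.join(tempfile.mkdtemp(), 'toy_mapping_values.pkl')

start = time.perf_counter()
toy.to_pickle(out_path)  # the file is written exactly once
elapsed = time.perf_counter() - start

print(f'saved in {elapsed:.3f} s')
```

A single measurement is noisier than the %timeit average, but it avoids rewriting a large file repeatedly.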

## 4. Create Data Dictionaries

Create data dictionaries for each of our three pickle files, summarising their contents.

__1979 - 2004 (A)__

In [23]:
data_dictionary.save(
    '../../data/processed/300_mapping_values_7904a.pkl', 

"""\
Aggregate raw data for road accidents.
""").head()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max,Missing,%Missing
Accident_Index,3000000.0,1783184.0,198213Q011682,61.0,,,,,,,,0,0.0
Longitude,0.0,,,,,,,,,,,3000000,100.0
Latitude,0.0,,,,,,,,,,,3000000,100.0
Accident_Severity,3000000.0,3.0,slight,2204789.0,,,,,,,,0,0.0
Number_of_Vehicles,3000000.0,,,,1.97101,0.949776,1.0,2.0,2.0,2.0,61.0,0,0.0


__1979 - 2004 (B)__

In [24]:
data_dictionary.save(
    '../../data/processed/300_mapping_values_7904b.pkl', 

"""\
Aggregate raw data for road accidents.
""").head()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max,Missing,%Missing
Accident_Index,2999999.0,1709440.0,199213MU34592,192.0,,,,,,,,0,0.0
Longitude,0.0,,,,,,,,,,,2999999,100.0
Latitude,0.0,,,,,,,,,,,2999999,100.0
Accident_Severity,2999999.0,3.0,slight,2360217.0,,,,,,,,0,0.0
Number_of_Vehicles,3000000.0,,,,2.08314,1.98272,1.0,2.0,2.0,2.0,192.0,0,0.0


__2005 - 2014__

In [25]:
data_dictionary.save(
    '../../data/processed/300_mapping_values_0514.pkl', 

"""\
Aggregate raw data for road accidents.
""").head()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max,Missing,%Missing
Accident_Index,3004425.0,1640597.0,2013460234852,67.0,,,,,,,,0,0.0
Longitude,3004230.0,,,,-1.42592,1.3926,-7.51623,-2.34207,-1.38969,-0.226342,1.76201,198,0.00659
Latitude,3004230.0,,,,52.557,1.42629,49.9129,51.4873,52.2679,53.4524,60.7575,198,0.00659
Accident_Severity,3004425.0,3.0,slight,2593061.0,,,,,,,,0,0.0
Number_of_Vehicles,3004420.0,,,,2.11068,0.937462,1.0,2.0,2.0,2.0,67.0,0,0.0
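
The source of the data_dictionary helper is not shown in this notebook. Purely as an illustration, a summary with the same columns as the tables above could be assembled from DataFrame.describe plus missing-value counts. Here `summarise` is a hypothetical function and may differ from what the helper actually does.

```python
import pandas as pd

def summarise(df):
    """One row per column: describe() statistics plus missing-value counts."""
    summary = df.describe(include='all').T
    summary['Missing'] = df.isna().sum()
    summary['%Missing'] = df.isna().mean() * 100
    return summary

# Toy frame with one text column and one numeric column containing a gap.
toy = pd.DataFrame({
    'Accident_Severity': ['slight', 'slight', 'serious'],
    'Longitude': [-0.19, None, -0.21],
})

print(summarise(toy)[['count', 'unique', 'top', 'mean', 'Missing', '%Missing']])
```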
