# 700_Q1_final_prep

## Purpose
The purpose of this notebook is to finalise the preparation of our datasets for our first research question. We focus on the 'Date' column, sorting our data by year and concatenating our datasets to form the final prep pickle files that will enable us to answer research question one: _"Is There a Safest Time for Travel?"_


### Notebook Contents:

* __1:__ Loading our Datasets

* __2:__ Question One: (A) and (B)
     * __2.1:__ Sort by Year
     * __2.2:__ Creating new Datasets
     * __2.3:__ Check for Null Values

* __3:__ Question One: (C)
     * __3.1:__ Sort by Year
     * __3.2:__ Check for Null Values

* __4:__ Saving Datasets to pickle files


* __5:__ Creating Data Dictionaries


## Datasets

* __Input__: 

* 500_prep_missing_values_7904a_Q1AB.pkl (Recorded UK Road Accident Data 1979 - 2004 (a) for RQ1 (A) and (B), with missing values removed)


* 500_prep_missing_values_7904b_Q1AB.pkl (Recorded UK Road Accident Data 1979 - 2004 (b) for RQ1 (A) and (B), with missing values removed)


* 600_prep_missing_values_7904c_Q1AB.pkl (Recorded UK Road Accident Data 1979 - 2004 (c) for RQ1 (A) and (B), with missing values removed)


* 600_prep_missing_values_7904d_Q1AB.pkl (Recorded UK Road Accident Data 1979 - 2004 (d) for RQ1 (A) and (B), with missing values removed)


* 500_prep_missing_values_0514_Q1AB.pkl (Recorded UK Road Accident Data 2005 - 2014 for RQ1 (A) and (B), with missing values removed)


* 500_prep_missing_values_0514_Q1C.pkl (Recorded UK Road Accident Data 2005 - 2014 for RQ1 (C), with missing values removed)


* 600_prep_missing_values_1516_Q1AB.pkl (Recorded UK Road Accident Data 2015 - 2016 for RQ1 (A) and (B), with missing values removed)


* __Output__: 

* 700_Q1AB_final_prep_1.pkl (Fully prepared dataset 1 of 4 of UK Road Safety Data for RQ1 (A) and (B); together, datasets 1 - 4 cover 1979 - 2016)


* 700_Q1AB_final_prep_2.pkl (Fully prepared dataset 2 of 4 of UK Road Safety Data for RQ1 (A) and (B); together, datasets 1 - 4 cover 1979 - 2016)


* 700_Q1AB_final_prep_3.pkl (Fully prepared dataset 3 of 4 of UK Road Safety Data for RQ1 (A) and (B); together, datasets 1 - 4 cover 1979 - 2016)


* 700_Q1AB_final_prep_4.pkl (Fully prepared dataset 4 of 4 of UK Road Safety Data for RQ1 (A) and (B); together, datasets 1 - 4 cover 1979 - 2016)


* 700_Q1C_final_prep.pkl (Fully prepared dataset of UK Road Safety Data from 2005 - 2014, for RQ1(C))

In [1]:
import os
import sys

import pandas as pd
import numpy as np

# Add the project root (two directories up) to sys.path so that src.helpers can be imported
module_path = os.path.abspath(os.path.join('..', '..'))
if module_path not in sys.path:
    sys.path.append(module_path)

from src.helpers import data_dictionary

%matplotlib inline

## 1.
## Loading the Datasets

We load our input files using the `pd.read_pickle` method.

In [2]:
df_7904a_Q1AB= pd.read_pickle('../../data/processed/500_prep_missing_values_7904a_Q1AB.pkl')
df_7904a_Q1AB.shape

(2411743, 14)

In [3]:
df_7904b_Q1AB= pd.read_pickle('../../data/processed/500_prep_missing_values_7904b_Q1AB.pkl')
df_7904b_Q1AB.shape

(2621931, 14)

In [4]:
df_7904c_Q1AB= pd.read_pickle('../../data/processed/600_prep_missing_values_7904c_Q1AB.pkl')
df_7904c_Q1AB.shape

(2695080, 14)

In [5]:
df_7904d_Q1AB= pd.read_pickle('../../data/processed/600_prep_missing_values_7904d_Q1AB.pkl')
df_7904d_Q1AB.shape

(1725560, 14)

In [6]:
df_0514_Q1AB= pd.read_pickle('../../data/processed/500_prep_missing_values_0514_Q1AB.pkl')
df_0514_Q1AB.shape

(2600036, 14)

In [7]:
df_0514_Q1C= pd.read_pickle('../../data/processed/500_prep_missing_values_0514_Q1C.pkl')
df_0514_Q1C.shape

(792713, 15)

In [8]:
df_1516_Q1AB= pd.read_pickle('../../data/processed/600_prep_missing_values_1516_Q1AB.pkl')
df_1516_Q1AB.shape

(410860, 14)
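The seven loading cells above repeat one pattern. As a hedged sketch (the helper name `load_pickles` is hypothetical and not part of this project), the same loads could be done in a loop that keeps every DataFrame in a dictionary keyed by file name, which makes it easy to inspect all the shapes at once. The example writes two small temporary pickles so that it runs on its own.

```python
import os
import tempfile

import pandas as pd


def load_pickles(directory, filenames):
    """Read each pickle in `filenames` from `directory` into a dict keyed by file name."""
    return {name: pd.read_pickle(os.path.join(directory, name)) for name in filenames}


with tempfile.TemporaryDirectory() as tmp:
    names = ["a.pkl", "b.pkl"]
    # Write two tiny stand-in DataFrames so the example is self-contained.
    for i, name in enumerate(names):
        pd.DataFrame({"x": range(i + 1)}).to_pickle(os.path.join(tmp, name))
    frames = load_pickles(tmp, names)

# Shape of every loaded DataFrame, keyed by file name.
shapes = {name: df.shape for name, df in frames.items()}
print(shapes)
```

In this notebook, `directory` would be `'../../data/processed'` and `filenames` the seven files read in the cells above.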

## 2.
# Question One: 

## (A) and (B)

First, we add up the row counts of our six datasets (as seen in section __1.__ above) to see how much data we will be working with for (A) and (B).

In [9]:
2411743 + 2621931 + 2695080 + 1725560 + 2600036 + 410860

12465210

From this summation, we can see that our six datasets for Q1 (A) and (B) together contain 12,465,210 rows, just under 12.5 million.

As this is such a large number of rows, we will continue to work with multiple datasets rather than concatenating them all into one for the time being.
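Rather than typing the row counts in by hand, the total can be computed from the DataFrames themselves, which avoids copy errors if an input file changes. A minimal sketch, using small stand-in frames and a hypothetical helper name `total_rows`:

```python
import pandas as pd


def total_rows(frames):
    """Sum the number of rows across a collection of DataFrames."""
    return sum(len(df) for df in frames)


# Stand-ins for the six Q1 (A)/(B) DataFrames; in the notebook you would pass
# [df_7904a_Q1AB, df_7904b_Q1AB, df_7904c_Q1AB, df_7904d_Q1AB, df_0514_Q1AB, df_1516_Q1AB].
frames = [pd.DataFrame({"Date": ["07/04/1979"] * n}) for n in (3, 5, 2)]
print(total_rows(frames))  # 10
```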

## 2.1
## _Sort by Year_

To ensure that each of our questions is answered using consistent data from the same years, we will sort our 6 datasets by year and check whether we are missing data for certain years.

To begin, we must first split our 'Date' column into three separate columns, 'Date_Day', 'Month' and 'Year'. This will be done using the `str.split` method.

In [10]:
df_7904a_Q1AB[['Date_Day', 'Month', 'Year']] = df_7904a_Q1AB['Date'].str.split(pat = '/', n=-1, expand=True)  

In [11]:
df_7904b_Q1AB[['Date_Day', 'Month', 'Year']] = df_7904b_Q1AB['Date'].str.split(pat = '/', n=-1, expand=True)   

In [12]:
df_7904c_Q1AB[['Date_Day', 'Month', 'Year']] = df_7904c_Q1AB['Date'].str.split(pat = '/', n=-1, expand=True) 

In [13]:
df_7904d_Q1AB[['Date_Day', 'Month', 'Year']] = df_7904d_Q1AB['Date'].str.split(pat = '/', n=-1, expand=True)  

In [14]:
df_0514_Q1AB[['Date_Day', 'Month', 'Year']] = df_0514_Q1AB['Date'].str.split(pat = '/', n=-1, expand=True)

In [15]:
df_1516_Q1AB[['Date_Day', 'Month', 'Year']] = df_1516_Q1AB['Date'].str.split(pat = '/', n=-1, expand=True)
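The six cells above all apply the same split. On a two-row example, the sketch below shows what `str.split(pat='/', expand=True)` produces: three new columns holding *strings*. Sorting those strings gives the right order here only because every year has four digits; `astype(int)` makes the values numeric. `pd.to_datetime` with an explicit `format` is shown as an alternative way to obtain the year; it is not the approach this notebook uses.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["07/04/1979", "25/12/2016"]})

# Split 'DD/MM/YYYY' on '/' into three new string columns.
df[["Date_Day", "Month", "Year"]] = df["Date"].str.split(pat="/", expand=True)

# The split columns hold strings such as '1979'; convert them if numeric values are needed.
df[["Date_Day", "Month", "Year"]] = df[["Date_Day", "Month", "Year"]].astype(int)

# Alternative: parse the full date once with an explicit day-first format.
df["Year_parsed"] = pd.to_datetime(df["Date"], format="%d/%m/%Y").dt.year
print(df)
```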

We will now sort our 6 datasets in ascending order according to the 'Year' column.
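`DataFrame.sort_values` returns a new, sorted DataFrame and leaves the original untouched unless the result is assigned back (or `inplace=True` is passed). A minimal sketch on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({"Year": ["2016", "1979", "2005"],
                   "Accident_Severity": ["slight", "serious", "slight"]})

result = df.sort_values("Year")
print(df["Year"].tolist())      # ['2016', '1979', '2005']  (original unchanged)
print(result["Year"].tolist())  # ['1979', '2005', '2016']

# To keep the sorted order, assign the result back. kind="mergesort" is a stable
# sort, so rows that share a year keep their existing relative order.
df = df.sort_values("Year", kind="mergesort")
print(df["Year"].tolist())      # ['1979', '2005', '2016']
```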

In [16]:
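df_7904a_Q1AB = df_7904a_Q1AB.sort_values('Year', kind='mergesort') # sort_values returns a sorted copy, so assign it back; mergesort is stable
df_7904a_Q1AB.head(2) # display the first 2 rows of our sorted dataset, to see the earliest date recorded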

| | Accident_Index | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Date | Day_of_Week | Time | Local_Authority_(District) | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Vehicle_Type | Sex_of_Driver | Age_Band_of_Driver | Date_Day | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 85587 | 197903A102220 | serious | 2 | 2 | 07/04/1979 | sunday | 12:30 | 63 | daylight | fine no high winds | wet or damp | bus or coach | male | 26 - 35 | 7 | 4 | 1979 |
| 85588 | 197903A102220 | serious | 2 | 2 | 07/04/1979 | sunday | 12:30 | 63 | daylight | fine no high winds | wet or damp | car | male | 46 - 55 | 7 | 4 | 1979 |


In [17]:
df_7904b_Q1AB = df_7904b_Q1AB.sort_values('Year', kind='mergesort') # assign the sorted copy back
df_7904b_Q1AB.head(2)

| | Accident_Index | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Date | Day_of_Week | Time | Local_Authority_(District) | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Vehicle_Type | Sex_of_Driver | Age_Band_of_Driver | Date_Day | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3000004 | 198601TD00758 | slight | 2 | 1 | 07/08/1986 | friday | 13:30 | 25 | daylight | fine no high winds | dry | car | female | 16 - 20 | 7 | 8 | 1986 |
| 3000005 | 198601TD00758 | slight | 2 | 1 | 07/08/1986 | friday | 13:30 | 25 | daylight | fine no high winds | dry | car | female | 26 - 35 | 7 | 8 | 1986 |


In [18]:
df_7904c_Q1AB = df_7904c_Q1AB.sort_values('Year', kind='mergesort') # assign the sorted copy back
df_7904c_Q1AB.head(2)

| | Accident_Index | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Date | Day_of_Week | Time | Local_Authority_(District) | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Vehicle_Type | Sex_of_Driver | Age_Band_of_Driver | Date_Day | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6000001 | 199301NI00616 | slight | 2 | 1 | 20/10/1993 | thursday | 11:25 | 3 | daylight | fine no high winds | dry | car | male | 26 - 35 | 20 | 10 | 1993 |
| 6000002 | 199301NI00617 | slight | 2 | 1 | 15/11/1993 | tuesday | 15:43 | 3 | daylight | fine no high winds | dry | van / goods 3.5 tonnes mgw or under | male | 36 - 45 | 15 | 11 | 1993 |


In [19]:
df_7904d_Q1AB = df_7904d_Q1AB.sort_values('Year', kind='mergesort') # assign the sorted copy back
df_7904d_Q1AB.head(2)

| | Accident_Index | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Date | Day_of_Week | Time | Local_Authority_(District) | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Vehicle_Type | Sex_of_Driver | Age_Band_of_Driver | Date_Day | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9000001 | 200001TX00967 | slight | 3 | 2 | 11/09/2000 | tuesday | 06:47 | 25 | daylight | fine no high winds | dry | van / goods 3.5 tonnes mgw or under | male | 26 - 35 | 11 | 9 | 2000 |
| 9000002 | 200001TX00967 | slight | 3 | 2 | 11/09/2000 | tuesday | 06:47 | 25 | daylight | fine no high winds | dry | car | male | 26 - 35 | 11 | 9 | 2000 |


In [20]:
df_0514_Q1AB = df_0514_Q1AB.sort_values('Year', kind='mergesort') # assign the sorted copy back
df_0514_Q1AB.head(2)

| | Accident_Index | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Date | Day_of_Week | Time | Local_Authority_(District) | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Vehicle_Type | Sex_of_Driver | Age_Band_of_Driver | Date_Day | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200501BS00001 | serious | 1 | 1 | 04/01/2005 | wednesday | 17:42 | 12 | daylight | raining no high winds | wet or damp | car | female | 66 - 75 | 4 | 1 | 2005 |
| 1 | 200501BS00002 | slight | 1 | 1 | 05/01/2005 | thursday | 17:36 | 12 | darkness - lights lit | fine no high winds | dry | bus or coach | male | 36 - 45 | 5 | 1 | 2005 |


In [21]:
df_1516_Q1AB = df_1516_Q1AB.sort_values('Year', kind='mergesort') # assign the sorted copy back
df_1516_Q1AB.tail(2) # display the last 2 rows of our sorted dataset, to see the most recent date recorded

| | Accident_Index | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Date | Day_of_Week | Time | Local_Authority_(District) | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Vehicle_Type | Sex_of_Driver | Age_Band_of_Driver | Date_Day | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 479546 | 2016984131316 | slight | 1 | 3 | 29/10/2016 | sunday | 20:00 | 917 | darkness - lights lit | fine no high winds | dry | car | male | 16 - 20 | 29 | 10 | 2016 |
| 479547 | 2016984133416 | slight | 1 | 2 | 25/12/2016 | monday | 12:30 | 917 | daylight | raining + high winds | wet or damp | car | male | 46 - 55 | 25 | 12 | 2016 |


### _Analysis_

From above, we can see that our final datasets for Question One parts (A) and (B) contain data from 1979 to 2016.
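Looking at the first and last rows confirms the overall range, but not that every year in between is present, which section 2.1 set out to check. A minimal sketch of that check, with a hypothetical helper `missing_years`; in the notebook it could be applied to `pd.concat([df['Year'] for df in frames])` over the six Q1 (A)/(B) DataFrames with `start=1979, end=2016`:

```python
import pandas as pd


def missing_years(year_series, start, end):
    """Return the years in [start, end] that never appear in `year_series`."""
    # Year values are strings after str.split, so convert them before comparing.
    present = set(pd.to_numeric(year_series).unique().tolist())
    return sorted(set(range(start, end + 1)) - present)


years = pd.Series(["1979", "1980", "1982", "1982"])
print(missing_years(years, 1979, 1982))  # [1981]
```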

## 2.2
## Creating new Datasets

Now that our 6 datasets for question 1 (A) and (B) are fully prepared and sorted, we will concatenate them to reduce the number of datasets we need to analyse to answer research question one (A) and (B).

We will do this by creating four datasets (reduced from six), each with around 3 million rows, as datasets of this size can be easily managed by our processors.

To create these datasets, we first split some of the original datasets into smaller subsets, taking care that each final dataset contains accidents in ascending order of year.

In [22]:
a_7904b_Q1AB = df_7904b_Q1AB.iloc[:600000] #create a new dataset containing only the first 600000 rows of the original
df_7904a_Q1AB = pd.concat([df_7904a_Q1AB,a_7904b_Q1AB]) #concatenate datasets 
df_7904a_Q1AB.shape

(3011743, 17)

In [23]:
b_7904b_Q1AB = df_7904b_Q1AB.iloc[600000:] #create a new dataset consisting of rows from 600000 to the end of the original dataframe
a_7904c_Q1AB = df_7904c_Q1AB.iloc[:1000000] #create a new dataset consisting of the first 1000000 rows from the original
df_7904b_Q1AB = pd.concat([b_7904b_Q1AB,a_7904c_Q1AB]) #concatenate both dataframes
df_7904b_Q1AB.shape

(3021931, 17)

In [24]:
b_7904c_Q1AB = df_7904c_Q1AB.iloc[1000000:] #create a new dataset containing rows from 1,000,000 upwards from original dataset
df_7904c_Q1AB = pd.concat([b_7904c_Q1AB,df_7904d_Q1AB]) #concatenate datasets
df_7904c_Q1AB.shape

(3420640, 17)

In [25]:
df_0516_Q1AB = pd.concat([df_0514_Q1AB,df_1516_Q1AB]) #concatenating datasets
df_0516_Q1AB.shape

(3010896, 17)
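Because rows were moved between datasets by position, it is worth confirming that no rows were lost or duplicated and that years still run in ascending order across the four new datasets. A hedged sketch, with a hypothetical helper `check_chunks` that assumes each source DataFrame was already sorted by 'Year':

```python
import pandas as pd


def check_chunks(chunks, expected_rows):
    """Assert that the chunks hold `expected_rows` rows with years in ascending order."""
    total = sum(len(chunk) for chunk in chunks)
    assert total == expected_rows, f"row count changed: {total} != {expected_rows}"
    # Join the Year columns in chunk order and check that they never decrease.
    years = pd.concat([chunk["Year"].astype(int) for chunk in chunks], ignore_index=True)
    assert years.is_monotonic_increasing, "years are not in ascending order across chunks"
    return total


# Toy example mirroring the notebook: move the first row of `b` onto the end of `a`.
a = pd.DataFrame({"Year": ["1979", "1979", "1980"]})
b = pd.DataFrame({"Year": ["1980", "1981"]})
chunk_1 = pd.concat([a, b.iloc[:1]])
chunk_2 = b.iloc[1:]
print(check_chunks([chunk_1, chunk_2], expected_rows=5))  # 5
```

For the four Q1 (A)/(B) datasets, the expected total is the 12,465,210 rows computed in section 2, i.e. `check_chunks([df_7904a_Q1AB, df_7904b_Q1AB, df_7904c_Q1AB, df_0516_Q1AB], 12465210)`.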

## 2.3
## Check for Null Values

We check that our final datasets do not contain any null values.

In [26]:
df_7904a_Q1AB.isnull().sum()

Accident_Index                0
Accident_Severity             0
Number_of_Vehicles            0
Number_of_Casualties          0
Date                          0
Day_of_Week                   0
Time                          0
Local_Authority_(District)    0
Light_Conditions              0
Weather_Conditions            0
Road_Surface_Conditions       0
Vehicle_Type                  0
Sex_of_Driver                 0
Age_Band_of_Driver            0
Date_Day                      0
Month                         0
Year                          0
dtype: int64

In [27]:
df_7904b_Q1AB.isnull().sum()

Accident_Index                0
Accident_Severity             0
Number_of_Vehicles            0
Number_of_Casualties          0
Date                          0
Day_of_Week                   0
Time                          0
Local_Authority_(District)    0
Light_Conditions              0
Weather_Conditions            0
Road_Surface_Conditions       0
Vehicle_Type                  0
Sex_of_Driver                 0
Age_Band_of_Driver            0
Date_Day                      0
Month                         0
Year                          0
dtype: int64

In [28]:
df_7904c_Q1AB.isnull().sum()

Accident_Index                0
Accident_Severity             0
Number_of_Vehicles            0
Number_of_Casualties          0
Date                          0
Day_of_Week                   0
Time                          0
Local_Authority_(District)    0
Light_Conditions              0
Weather_Conditions            0
Road_Surface_Conditions       0
Vehicle_Type                  0
Sex_of_Driver                 0
Age_Band_of_Driver            0
Date_Day                      0
Month                         0
Year                          0
dtype: int64

In [29]:
df_0516_Q1AB.isnull().sum()

Accident_Index                0
Accident_Severity             0
Number_of_Vehicles            0
Number_of_Casualties          0
Date                          0
Day_of_Week                   0
Time                          0
Local_Authority_(District)    0
Light_Conditions              0
Weather_Conditions            0
Road_Surface_Conditions       0
Vehicle_Type                  0
Sex_of_Driver                 0
Age_Band_of_Driver            0
Date_Day                      0
Month                         0
Year                          0
dtype: int64

From above, we can see that our final 4 datasets for Question 1 parts (A) and (B) contain no null values.
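The per-column outputs above can be condensed into a single number per dataset. A minimal sketch, using a hypothetical helper `null_counts` and toy frames:

```python
import numpy as np
import pandas as pd


def null_counts(frames):
    """Total number of null cells in each named DataFrame."""
    return {name: int(df.isnull().sum().sum()) for name, df in frames.items()}


frames = {
    "clean": pd.DataFrame({"Year": ["1979", "1980"]}),
    "dirty": pd.DataFrame({"Year": ["1979", np.nan]}),
}
print(null_counts(frames))  # {'clean': 0, 'dirty': 1}
```

Passing the four Q1 (A)/(B) DataFrames by name would give one total per dataset, all of which should be 0.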

## 3.

# Question One: (C)


## 3.1

## _Sort by Year_

As we did above, we will split our 'Date' column into three separate columns using the `str.split` method, so that we can sort our dataset by year.

In [30]:
df_0514_Q1C[['Date_Day', 'Month', 'Year']] = df_0514_Q1C['Date'].str.split(pat = '/', n=-1, expand=True)

We will again sort this dataset by the 'Year' column and look at the years present in the dataset.

In [31]:
df_0514_Q1C = df_0514_Q1C.sort_values('Year', kind='mergesort') # assign the sorted copy back
df_0514_Q1C.head(2) # display the earliest records

| | Accident_Index | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Date | Day_of_Week | Time | Local_Authority_(District) | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Vehicle_Type | Sex_of_Driver | Age_Band_of_Driver | Journey_Purpose_of_Driver | Date_Day | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 200501BS00002 | slight | 1 | 1 | 05/01/2005 | thursday | 17:36 | 12 | darkness - lights lit | fine no high winds | dry | bus or coach | male | 36 - 45 | occupational | 5 | 1 | 2005 |
| 2 | 200501BS00003 | slight | 2 | 1 | 06/01/2005 | friday | 00:15 | 12 | darkness - lights lit | fine no high winds | dry | bus or coach | male | 26 - 35 | occupational | 6 | 1 | 2005 |


In [32]:
df_0514_Q1C.tail(2) #display most recent years recorded.

| | Accident_Index | Accident_Severity | Number_of_Vehicles | Number_of_Casualties | Date | Day_of_Week | Time | Local_Authority_(District) | Light_Conditions | Weather_Conditions | Road_Surface_Conditions | Vehicle_Type | Sex_of_Driver | Age_Band_of_Driver | Journey_Purpose_of_Driver | Date_Day | Month | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3004421 | 2014984138414 | serious | 3 | 2 | 17/12/2014 | thursday | 06:55 | 917 | darkness - no lighting | raining no high winds | wet or damp | van / goods 3.5 tonnes mgw or under | male | 36 - 45 | commuting to/from work | 17 | 12 | 2014 |
| 3004423 | 2014984139614 | slight | 2 | 2 | 24/12/2014 | thursday | 15:00 | 917 | daylight | fine no high winds | wet or damp | bus or coach | male | 36 - 45 | occupational | 24 | 12 | 2014 |


### _Analysis_

From above, we can see that our dataset for Question One, part (C) contains data for the years 2005 - 2014.

As part (C) *does not* rely on parts (A) or (B) of this question, we have decided to answer part (C) using this smaller set of years.

## 3.2
## Check for Null Values

We check that our final dataset does not contain any null values.

In [33]:
df_0514_Q1C.isnull().sum()

Accident_Index                0
Accident_Severity             0
Number_of_Vehicles            0
Number_of_Casualties          0
Date                          0
Day_of_Week                   0
Time                          0
Local_Authority_(District)    0
Light_Conditions              0
Weather_Conditions            0
Road_Surface_Conditions       0
Vehicle_Type                  0
Sex_of_Driver                 0
Age_Band_of_Driver            0
Journey_Purpose_of_Driver     0
Date_Day                      0
Month                         0
Year                          0
dtype: int64

Now that we have confirmed that our final sorted dataset for Q1 part (C) contains no null values, we will save all of our prepared datasets to pickle files below.

## 4.
## Save to Pickle Files

### (A) and (B)

In [45]:
pickle_save_time = %timeit -o df_7904a_Q1AB.to_pickle("../../data/processed/700_Q1AB_final_prep_1.pkl") #save dataset into a pickle file and print the save time

pickle_save_time

21.7 s ± 1.57 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 21.7 s ± 1.57 s per loop (mean ± std. dev. of 7 runs, 1 loop each)>

In [46]:
pickle_save_time = %timeit -o df_7904b_Q1AB.to_pickle("../../data/processed/700_Q1AB_final_prep_2.pkl") #save dataset into a pickle file and print the save time

pickle_save_time

24 s ± 2.25 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 24 s ± 2.25 s per loop (mean ± std. dev. of 7 runs, 1 loop each)>

In [34]:
pickle_save_time = %timeit -o df_7904c_Q1AB.to_pickle("../../data/processed/700_Q1AB_final_prep_3.pkl") #save dataset into a pickle file and print the save time

pickle_save_time

29.5 s ± 1.07 s per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 29.5 s ± 1.07 s per loop (mean ± std. dev. of 7 runs, 1 loop each)>

In [35]:
pickle_save_time = %timeit -o df_0516_Q1AB.to_pickle("../../data/processed/700_Q1AB_final_prep_4.pkl") #save dataset into a pickle file and print the save time

pickle_save_time

21 s ± 172 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 21 s ± 172 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>

### (C)

In [36]:
pickle_save_time = %timeit -o df_0514_Q1C.to_pickle("../../data/processed/700_Q1C_final_prep.pkl") #save dataset into a pickle file and print the save time

pickle_save_time

6.46 s ± 351 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


<TimeitResult : 6.46 s ± 351 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)>
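`%timeit -o` runs the save repeatedly (7 runs of 1 loop each, as the outputs above show), so each cell writes the same file several times and takes several times the reported mean. IPython's `%time` magic runs a statement once; outside IPython, a single timed save can be sketched with `time.perf_counter` (the helper `timed_to_pickle` is hypothetical):

```python
import os
import tempfile
import time

import pandas as pd


def timed_to_pickle(df, path):
    """Write `df` to `path` once and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    df.to_pickle(path)
    return time.perf_counter() - start


df = pd.DataFrame({"Year": ["1979", "2016"]})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.pkl")
    elapsed = timed_to_pickle(df, path)
    restored = pd.read_pickle(path)  # read back before the temporary directory is removed

print(f"saved in {elapsed:.3f} s")
```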

## 5.
## Create Data Dictionaries

Create Data Dictionaries for each of our pickle files, summarising their contents.
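`data_dictionary.save` is a project helper from `src.helpers`, and its implementation is not shown here. Judging only by the columns in its output, a similar summary could be built with plain pandas. The sketch below is an assumption about what such a table contains, not the helper's actual code (`summarise` is a hypothetical name, and `%Missing` is assumed to be a percentage):

```python
import numpy as np
import pandas as pd


def summarise(df):
    """Per-column summary: describe() statistics plus missing-value counts."""
    # include="all" gives count/unique/top/freq for text columns and
    # mean/std/min/quartiles/max for numeric ones; .T puts one column per row.
    summary = df.describe(include="all").T
    summary["Missing"] = df.isnull().sum()
    summary["%Missing"] = 100 * summary["Missing"] / len(df)
    return summary


df = pd.DataFrame({
    "Accident_Severity": ["slight", "serious", "slight"],
    "Number_of_Vehicles": [1, 2, np.nan],
})
print(summarise(df))
```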

### (A) and (B)

In [50]:
data_dictionary.save(
    '../../data/processed/700_Q1AB_final_prep_1.pkl', 

"""\
Aggregate raw data for road accidents.
""").head()

| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max | Missing | %Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accident_Index | 3011743.0 | 1844292.0 | 198213Q011682 | 61.0 | | | | | | | | 0 | 0.0 |
| Accident_Severity | 3011743.0 | 3.0 | slight | 2182857.0 | | | | | | | | 0 | 0.0 |
| Number_of_Vehicles | 3011740.0 | | | | 1.9929 | 1.03643 | 1.0 | 2.0 | 2.0 | 2.0 | 61.0 | 0 | 0.0 |
| Number_of_Casualties | 3011740.0 | | | | 1.36819 | 0.927364 | 1.0 | 1.0 | 1.0 | 1.0 | 70.0 | 0 | 0.0 |
| Date | 3011743.0 | 3287.0 | 25/11/1983 | 2019.0 | | | | | | | | 0 | 0.0 |


In [51]:
data_dictionary.save(
    '../../data/processed/700_Q1AB_final_prep_2.pkl', 

"""\
Aggregate raw data for road accidents.
""").head()

| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max | Missing | %Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accident_Index | 3021931.0 | 1790862.0 | 199213MU34592 | 192.0 | | | | | | | | 0 | 0.0 |
| Accident_Severity | 3021931.0 | 3.0 | slight | 2423116.0 | | | | | | | | 0 | 0.0 |
| Number_of_Vehicles | 3021930.0 | | | | 2.10448 | 2.00019 | 1.0 | 2.0 | 2.0 | 2.0 | 192.0 | 0 | 0.0 |
| Number_of_Casualties | 3021930.0 | | | | 1.41774 | 1.08154 | 1.0 | 1.0 | 1.0 | 2.0 | 80.0 | 0 | 0.0 |
| Date | 3021931.0 | 3287.0 | 03/07/1992 | 2211.0 | | | | | | | | 0 | 0.0 |


In [37]:
data_dictionary.save(
    '../../data/processed/700_Q1AB_final_prep_3.pkl', 

"""\
Aggregate raw data for road accidents.
""").head()

| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max | Missing | %Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accident_Index | 3420640.0 | 2009451.0 | 199722AX01148 | 88.0 | | | | | | | | 0 | 0.0 |
| Accident_Severity | 3420640.0 | 3.0 | slight | 2908269.0 | | | | | | | | 0 | 0.0 |
| Number_of_Vehicles | 3420640.0 | | | | 2.11503 | 1.0643 | 1.0 | 2.0 | 2.0 | 2.0 | 88.0 | 0 | 0.0 |
| Number_of_Casualties | 3420640.0 | | | | 1.46296 | 1.02219 | 1.0 | 1.0 | 1.0 | 2.0 | 90.0 | 0 | 0.0 |
| Date | 3420640.0 | 3653.0 | 25/04/1997 | 2249.0 | | | | | | | | 0 | 0.0 |


In [38]:
data_dictionary.save(
    '../../data/processed/700_Q1AB_final_prep_4.pkl', 

"""\
Aggregate raw data for road accidents.
""").head()

| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max | Missing | %Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accident_Index | 3010896.0 | 1784301.0 | 2013460234852 | 66.0 | | | | | | | | 0 | 0.0 |
| Accident_Severity | 3010896.0 | 3.0 | slight | 2576810.0 | | | | | | | | 0 | 0.0 |
| Number_of_Vehicles | 3010900.0 | | | | 2.10581 | 0.924935 | 1.0 | 2.0 | 2.0 | 2.0 | 67.0 | 0 | 0.0 |
| Number_of_Casualties | 3010900.0 | | | | 1.45239 | 1.014 | 1.0 | 1.0 | 1.0 | 2.0 | 93.0 | 0 | 0.0 |
| Date | 3010896.0 | 4383.0 | 21/10/2005 | 1400.0 | | | | | | | | 0 | 0.0 |


### (C)

In [39]:
data_dictionary.save(
    '../../data/processed/700_Q1C_final_prep.pkl', 

"""\
Aggregate raw data for road accidents.
""").head()

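| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max | Missing | %Missing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accident_Index | 792713 | 591495.0 | 200506X039355 | 18.0 | | | | | | | | 0 | 0.0 |
| Accident_Severity | 792713 | 3.0 | slight | 684785.0 | | | | | | | | 0 | 0.0 |
| Number_of_Vehicles | 792713 | | | | 2.09972 | 0.885037 | 1.0 | 2.0 | 2.0 | 2.0 | 67.0 | 0 | 0.0 |
| Number_of_Casualties | 792713 | | | | 1.39917 | 0.993247 | 1.0 | 1.0 | 1.0 | 2.0 | 93.0 | 0 | 0.0 |
| Date | 792713 | 3652.0 | 07/12/2005 | 522.0 | | | | | | | | 0 | 0.0 |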
