<h2>Project: United Kingdom Road Accident Data Analysis</h2>
<h3>Inclusive Years: 2019 - 2022</h3>
<h4>Analyst: John Paul Cortes</h4>
<hr>
<p>Data Preparation</p>

In [1]:
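import warnings

import numpy as np
import pandas as pd
from scipy.stats import f_oneway  # one-way ANOVA test

# Suppress all warnings to keep the output readable.
# Note that this also hides pandas parsing warnings.
warnings.filterwarnings('ignore')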

<h2>Importing the Data Set with Pandas into a DataFrame</h2>

In [18]:
aksi = pd.read_csv('uk_acc/uk_road_accident.csv')

In [3]:
aksi

,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type
0,200701BS64157,Serious,5/6/2019,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car
1,200701BS65737,Serious,2/7/2019,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car
2,200701BS66127,Serious,26-08-2019,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,,Urban,,Taxi/Private hire car
3,200701BS66128,Serious,16-08-2019,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats)
4,200701BS66837,Slight,3/9/2019,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,,Urban,,Other vehicle
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,18-02-2022,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car
660675,201091NM01881,Slight,21-02-2022,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660676,201091NM01935,Slight,23-02-2022,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car
660677,201091NM01964,Serious,23-02-2022,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc


<h2>Describing Data</h2>

In [4]:
aksi.describe()

,Latitude,Longitude,Number_of_Casualties,Number_of_Vehicles
count,660654.0,660653.0,660679.0,660679.0
mean,52.553866,-1.43121,1.35704,1.831255
std,1.406922,1.38333,0.824847,0.715269
min,49.91443,-7.516225,1.0,1.0
25%,51.49069,-2.332291,1.0,1.0
50%,52.315641,-1.411667,1.0,2.0
75%,53.453452,-0.232869,1.0,2.0
max,60.757544,1.76201,68.0,32.0
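
In the summary above, the `count` row is 660654 for `Latitude` and 660653 for `Longitude`, against 660679 for the other columns, so 25 and 26 rows respectively lack a coordinate. A minimal sketch of how that gap could be computed directly follows; the frame `sample` and its values are hypothetical stand-ins for `aksi`.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample standing in for `aksi`; values are illustrative only.
sample = pd.DataFrame({
    "Latitude": [51.506187, np.nan, 57.374005],
    "Longitude": [-0.209082, -0.173647, np.nan],
    "Number_of_Casualties": [1, 1, 2],
})

# count() ignores NaN, so total rows minus count gives the missing values per column.
missing = len(sample) - sample.count()
print(missing)
```

The same result is available through `sample.isnull().sum()`, which the notebook uses after filling.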


<h2>Checking and Filling Null Values</h2>

In [5]:
# Coordinates and the urban/rural flag: fill gaps with the most frequent value (mode).
aksi['Latitude'] = aksi['Latitude'].fillna(aksi['Latitude'].mode()[0])
aksi['Longitude'] = aksi['Longitude'].fillna(aksi['Longitude'].mode()[0])
aksi['Urban_or_Rural_Area'] = aksi['Urban_or_Rural_Area'].fillna(aksi['Urban_or_Rural_Area'].mode()[0])
# Descriptive text columns: mark gaps with an explicit 'unaccounted' label.
aksi['Road_Surface_Conditions'] = aksi['Road_Surface_Conditions'].fillna('unaccounted')
aksi['Road_Type'] = aksi['Road_Type'].fillna('unaccounted')
aksi['Weather_Conditions'] = aksi['Weather_Conditions'].fillna('unaccounted')
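
Filling missing coordinates with the mode assigns every affected row the single most common latitude and longitude in the data set, which may lie far from the district where the accident was recorded. As a hedged alternative, not the approach used in this notebook, the gaps could be filled with the median coordinate of the same `District Area`. The frame `sample` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for `aksi`; coordinates are illustrative only.
sample = pd.DataFrame({
    "District Area": ["Kensington and Chelsea"] * 3 + ["Highland"] * 2,
    "Latitude": [51.50, np.nan, 51.52, 57.20, 57.40],
})

# Median latitude of each district, aligned row by row with the original frame.
district_median = sample.groupby("District Area")["Latitude"].transform("median")

# Only the missing entries are replaced; existing values are left untouched.
sample["Latitude"] = sample["Latitude"].fillna(district_median)
print(sample)
```

A district with no recorded coordinates at all would still yield `NaN` here, so a global fallback such as the mode fill may still be needed afterwards.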

In [6]:
aksi.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64

<h2>Adjusting Data Types</h2>

In [10]:
# Parse the dates as day-first; entries that cannot be parsed become NaT.
aksi['Accident Date'] = pd.to_datetime(aksi['Accident Date'], dayfirst=True, errors='coerce')
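
The raw `Accident Date` column mixes two spellings, for example `5/6/2019` and `26-08-2019`. Depending on the pandas version, `pd.to_datetime` may infer a single format from the first value and, with `errors='coerce'`, silently turn the other spelling into `NaT`. A minimal sketch of one way to guard against that is to unify the separator, parse with an explicit format, and count what failed. It assumes the slash-separated dates are day-first as well, in line with `dayfirst=True`; the series `raw` is a hypothetical sample built from dates shown in the table.

```python
import pandas as pd

# Hypothetical sample of the two date spellings seen in the data.
raw = pd.Series(["5/6/2019", "2/7/2019", "26-08-2019", "18-02-2022"])

# Unify the separator so one explicit format covers both spellings.
normalized = raw.str.replace("-", "/", regex=False)

# Assumption: every date is day/month/year. Unparseable entries become NaT.
parsed = pd.to_datetime(normalized, format="%d/%m/%Y", errors="coerce")

# Count the entries that could not be parsed instead of losing them silently.
unparsed = parsed.isna().sum()
print(parsed)
print("unparsed dates:", unparsed)
```

Running the same `isna().sum()` check on `aksi['Accident Date']` after conversion would show whether any rows were coerced to `NaT`.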

In [20]:
# Convert the repeated-label text columns to the 'category' dtype.
categorical_cols = [
    'Accident_Severity', 'Light_Conditions', 'Road_Surface_Conditions',
    'Road_Type', 'Urban_or_Rural_Area', 'Weather_Conditions', 'Vehicle_Type',
]
for col in categorical_cols:
    aksi[col] = aksi[col].astype('category')

In [16]:
aksi.dtypes

Index                              object
Accident_Severity                category
Accident Date              datetime64[ns]
Latitude                          float64
Light_Conditions                 category
District Area                      object
Longitude                         float64
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions          category
Road_Type                          object
Urban_or_Rural_Area                object
Weather_Conditions                 object
Vehicle_Type                       object
dtype: object
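
The listing above still reports `object` for `Road_Type`, `Urban_or_Rural_Area`, `Weather_Conditions` and `Vehicle_Type`, although the conversion cell casts them to `category`. The execution counts (`In [16]` for the listing, `In [20]` for the conversion) suggest the listing was produced before the final conversion ran. A minimal sketch of a check that would catch this is shown below, using a hypothetical frame `sample` in place of `aksi`.

```python
import pandas as pd

# Hypothetical sample standing in for `aksi`; values are illustrative only.
sample = pd.DataFrame({
    "Accident_Severity": ["Serious", "Slight", "Slight"],
    "Road_Type": ["Single carriageway", "unaccounted", "Single carriageway"],
    "Number_of_Casualties": [1, 1, 2],
})

# Cast all categorical columns in one step.
categorical_cols = ["Accident_Severity", "Road_Type"]
sample = sample.astype({col: "category" for col in categorical_cols})

# List any column that was meant to be categorical but is not.
still_object = [
    col for col in categorical_cols
    if not isinstance(sample[col].dtype, pd.CategoricalDtype)
]
print(sample.dtypes)
print("not converted:", still_object)
```

Re-running `aksi.dtypes` after the conversion cell, with the cells executed top to bottom, should then show `category` for all seven columns.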