<H1>UK ROAD ACCIDENT DATA ANALYSIS</H1>
<h2>INCLUSIVE YEARS 2019-2022</h2>
<h3>Analyst: Richie M. Alcantara</h3>

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import f_oneway
import warnings
warnings.filterwarnings('ignore')

In [2]:
accident = pd.read_csv('datasets\\accident_data.csv')

In [3]:
accident.isnull().sum()

Index                          0
Accident_Severity              0
Accident Date                  0
Latitude                      25
Light_Conditions               0
District Area                  0
Longitude                     26
Number_of_Casualties           0
Number_of_Vehicles             0
Road_Surface_Conditions      726
Road_Type                   4520
Urban_or_Rural_Area           15
Weather_Conditions         14128
Vehicle_Type                   0
dtype: int64

In [4]:
accident['Latitude'] = accident['Latitude'].fillna(accident['Latitude'].mode()[0])
accident['Longitude'] = accident['Longitude'].fillna(accident['Longitude'].mode()[0])
accident['Road_Surface_Conditions'] = accident['Road_Surface_Conditions'].fillna('unknown condition')
accident['Road_Type'] = accident['Road_Type'].fillna(accident['Road_Type'].mode()[0])
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].fillna(accident['Urban_or_Rural_Area'].mode()[0])
accident['Weather_Conditions'] = accident['Weather_Conditions'].fillna('unknown weather')
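The fill rules above mix two strategies: the most frequent value for some columns and an explicit placeholder for others. A minimal sketch of the same idea with a single `fillna` call follows; the toy rows are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Road_Type": ["Single carriageway", np.nan, "Single carriageway", "Roundabout"],
    "Weather_Conditions": ["Other", np.nan, "Fog or mist", np.nan],
})

# One mapping of column -> fill value: the mode for Road_Type,
# an explicit placeholder for Weather_Conditions.
fill_values = {
    "Road_Type": toy["Road_Type"].mode()[0],
    "Weather_Conditions": "unknown weather",
}
toy = toy.fillna(value=fill_values)

# Total number of missing cells left after filling.
remaining_nulls = int(toy.isnull().sum().sum())
```

Keeping the rules in one dictionary makes it easier to see which columns are imputed with the mode and which are flagged as unknown.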

In [5]:
accident.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64

In [6]:
accident.dtypes

Index                       object
Accident_Severity           object
Accident Date               object
Latitude                   float64
Light_Conditions            object
District Area               object
Longitude                  float64
Number_of_Casualties         int64
Number_of_Vehicles           int64
Road_Surface_Conditions     object
Road_Type                   object
Urban_or_Rural_Area         object
Weather_Conditions          object
Vehicle_Type                object
dtype: object

In [7]:
accident['Accident_Severity'] = accident['Accident_Severity'].astype('category')
accident['Light_Conditions'] = accident['Light_Conditions'].astype('category')
accident['Latitude'] = accident['Latitude'].astype('category')
accident['District Area'] = accident['District Area'].astype('category')
accident['Longitude'] = accident['Longitude'].astype('category')
accident['Road_Surface_Conditions'] = accident['Road_Surface_Conditions'].astype('category')
accident['Road_Type'] = accident['Road_Type'].astype('category')
accident['Urban_or_Rural_Area'] = accident['Urban_or_Rural_Area'].astype('category')
accident['Weather_Conditions'] = accident['Weather_Conditions'].astype('category')
accident['Vehicle_Type'] = accident['Vehicle_Type'].astype('category')
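The repeated `astype('category')` calls above could be condensed by passing a column-to-dtype mapping to `DataFrame.astype`. This is only a sketch on a toy frame with made-up rows; `cat_cols` is a hypothetical name.

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Accident_Severity": ["Slight", "Serious", "Slight"],
    "Road_Type": ["Roundabout", "Slip road", "Roundabout"],
    "Number_of_Casualties": [1, 2, 1],
})

# Convert several columns at once; columns not listed keep their dtype.
cat_cols = ["Accident_Severity", "Road_Type"]
toy = toy.astype({col: "category" for col in cat_cols})
```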

In [8]:
accident.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64

In [9]:
accident['Accident Date'] = accident['Accident Date'].astype('str')
accident['Accident Date'] = accident['Accident Date'].str.strip()
accident['Accident Date'] = accident['Accident Date'].str.replace('/','-')

In [10]:
accident['Accident Date'] = pd.to_datetime(accident['Accident Date'],dayfirst=True,errors = 'coerce')
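Because `errors='coerce'` silently turns unparseable strings into `NaT`, it may be worth counting the failures right after parsing. The sketch below uses made-up date strings and, as an alternative to `dayfirst=True`, an explicit `format`, which is an assumption about how the dates are written.

```python
import pandas as pd

# Made-up strings, including one that cannot be parsed.
raw = pd.Series(["05/06/2019", "02-07-2019", "not a date"])

# Same cleaning steps as above: strip whitespace, unify the separator.
cleaned = raw.str.strip().str.replace("/", "-", regex=False)

# An explicit day-month-year format avoids ambiguous day/month guesses.
parsed = pd.to_datetime(cleaned, format="%d-%m-%Y", errors="coerce")

# Number of values that became NaT.
n_failed = int(parsed.isna().sum())
```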

In [11]:
accident.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
dtype: int64

In [12]:
accident['Year'] = accident['Accident Date'].dt.year
accident['Month'] = accident['Accident Date'].dt.month
accident['DayofWeek'] = accident['Accident Date'].dt.dayofweek
accident['Day'] = accident['Accident Date'].dt.day
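`dt.dayofweek` encodes Monday as 0 and Sunday as 6, which is how the day numbers in the insights below should be read. A small sketch with a few sample dates shows the mapping to day names via `dt.day_name()`:

```python
import pandas as pd

# A few sample dates for illustration.
dates = pd.Series(pd.to_datetime(["2019-06-05", "2019-07-13", "2019-01-07"]))

day_numbers = dates.dt.dayofweek  # Monday=0 ... Sunday=6
day_names = dates.dt.day_name()   # English day names by default
```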

In [13]:
accident.isnull().sum()

Index                      0
Accident_Severity          0
Accident Date              0
Latitude                   0
Light_Conditions           0
District Area              0
Longitude                  0
Number_of_Casualties       0
Number_of_Vehicles         0
Road_Surface_Conditions    0
Road_Type                  0
Urban_or_Rural_Area        0
Weather_Conditions         0
Vehicle_Type               0
Year                       0
Month                      0
DayofWeek                  0
Day                        0
dtype: int64

In [14]:
accident['Year'] = accident['Year'].astype('category')
accident['Month'] = accident['Month'].astype('category')
accident['DayofWeek'] = accident['DayofWeek'].astype('category')
accident['Day'] = accident['Day'].astype('category')

In [15]:
accident.dtypes

Index                              object
Accident_Severity                category
Accident Date              datetime64[ns]
Latitude                         category
Light_Conditions                 category
District Area                    category
Longitude                        category
Number_of_Casualties                int64
Number_of_Vehicles                  int64
Road_Surface_Conditions          category
Road_Type                        category
Urban_or_Rural_Area              category
Weather_Conditions               category
Vehicle_Type                     category
Year                             category
Month                            category
DayofWeek                        category
Day                              category
dtype: object

<H1>Insights</H1>

<H1>In which district did the most accidents happen, and in which the fewest?</H1>
<h2>1. Birmingham has the most accidents (13491), while Clackmannanshire has the fewest (91)</h2>

In [16]:
accident['District Area'].value_counts()

District Area
Birmingham            13491
Leeds                  8898
Manchester             6720
Bradford               6212
Sheffield              5710
                      ...  
Berwick-upon-Tweed      153
Teesdale                142
Shetland Islands        133
Orkney Islands          117
Clackmannanshire         91
Name: count, Length: 422, dtype: int64

<h1>Which district recorded the highest number of casualties in a single accident?</h1>
<h2>2. South Bucks recorded the highest number of casualties in a single accident (68)</h2>

In [17]:
distmax = accident.groupby(['District Area'])['Number_of_Casualties'].max()

In [18]:
distmax.idxmax()

'South Bucks'

In [19]:
distmax.max()

np.int64(68)
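Each row of the dataset is one accident, so `groupby(...).max()` returns the largest single-accident value per district, not a daily total. The sketch below, on a toy frame with made-up rows, contrasts the two readings.

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "District Area": ["Leeds", "Leeds", "Bradford", "Bradford"],
    "Accident Date": pd.to_datetime(
        ["2019-01-03", "2019-01-03", "2019-01-04", "2019-01-04"]
    ),
    "Number_of_Casualties": [3, 4, 5, 1],
})

# Reading 1: the single accident with the most casualties.
worst_row = toy.loc[toy["Number_of_Casualties"].idxmax()]

# Reading 2: the district and day with the highest summed casualties.
daily = toy.groupby(["District Area", "Accident Date"])["Number_of_Casualties"].sum()
worst_district_day = daily.idxmax()
```

In this toy data the two readings disagree: the largest single accident is in Bradford, while the largest daily total is in Leeds.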

In [25]:
accident['Year'].value_counts()

Year
2019    182115
2020    170591
2021    163554
2022    144419
Name: count, dtype: int64

In [27]:
accident[accident['Year'] == 2019]

Unnamed: 0,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type,Year,Month,DayofWeek,Day
0,200701BS64157,Serious,2019-06-05,51.506187,Darkness - lights lit,Kensington and Chelsea,-0.209082,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car,2019,6,2,5
1,200701BS65737,Serious,2019-07-02,51.495029,Daylight,Kensington and Chelsea,-0.173647,1,2,Wet or damp,Single carriageway,Urban,Raining no high winds,Car,2019,7,1,2
2,200701BS66127,Serious,2019-08-26,51.517715,Darkness - lighting unknown,Kensington and Chelsea,-0.210215,1,3,Dry,Single carriageway,Urban,unknown weather,Taxi/Private hire car,2019,8,0,26
3,200701BS66128,Serious,2019-08-16,51.495478,Daylight,Kensington and Chelsea,-0.202731,1,4,Dry,Single carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats),2019,8,4,16
4,200701BS66837,Slight,2019-09-03,51.488576,Darkness - lights lit,Kensington and Chelsea,-0.192487,1,2,Dry,Single carriageway,Urban,unknown weather,Other vehicle,2019,9,1,3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
182110,2.01E+12,Slight,2019-12-20,54.985289,Darkness - no lighting,Dumfries and Galloway,-3.210294,1,1,Frost or ice,Single carriageway,Rural,Other,Car,2019,12,4,20
182111,2.01E+12,Serious,2019-12-21,54.984105,Daylight,Dumfries and Galloway,-3.193693,2,1,Frost or ice,Single carriageway,Rural,Other,Car,2019,12,5,21
182112,2.01E+12,Slight,2019-12-23,55.166369,Darkness - no lighting,Dumfries and Galloway,-2.992068,1,1,Frost or ice,Single carriageway,Rural,Fog or mist,Van / Goods 3.5 tonnes mgw or under,2019,12,0,23
182113,2.01E+12,Slight,2019-12-23,54.995154,Darkness - lights lit,Dumfries and Galloway,-3.058338,1,1,Wet or damp,Single carriageway,Rural,Fine no high winds,Car,2019,12,0,23


In [36]:
accident[accident['Year'] == 2020]

Unnamed: 0,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type,Year,Month,DayofWeek,Day
182115,200801BS69439,Serious,2020-01-23,51.506812,Darkness - lights lit,Kensington and Chelsea,-0.214677,1,1,Dry,Single carriageway,Urban,Fine no high winds,Car,2020,1,3,23
182116,200801BS69594,Serious,2020-02-15,51.496323,Daylight,Kensington and Chelsea,-0.170138,1,1,Dry,Dual carriageway,Urban,Fine no high winds,Bus or coach (17 or more pass seats),2020,2,5,15
182117,200801BS69698,Serious,2020-02-27,51.502042,Darkness - lights lit,Kensington and Chelsea,-0.190946,1,1,Dry,Single carriageway,Urban,Fine no high winds,Van / Goods 3.5 tonnes mgw or under,2020,2,3,27
182118,200801BS69935,Serious,2020-02-25,51.492733,Daylight,Kensington and Chelsea,-0.193763,1,1,Dry,Roundabout,Urban,Fine no high winds,Car,2020,2,1,25
182119,200801BS69938,Serious,2020-02-27,51.493271,Darkness - lights lit,Kensington and Chelsea,-0.199504,1,2,Dry,Dual carriageway,Urban,Fine no high winds,Motorcycle over 500cc,2020,2,3,27
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
352701,2.01E+12,Slight,2020-11-24,55.233969,Darkness - no lighting,Dumfries and Galloway,-3.394276,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Minibus (8 - 16 passenger seats),2020,11,1,24
352702,2.01E+12,Slight,2020-12-06,54.983911,Daylight,Dumfries and Galloway,-3.195094,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Motorcycle 125cc and under,2020,12,6,6
352703,2.01E+12,Slight,2020-12-17,55.008072,Darkness - no lighting,Dumfries and Galloway,-3.334825,1,1,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc,2020,12,3,17
352704,2.01E+12,Slight,2020-12-16,54.986388,Daylight,Dumfries and Galloway,-3.180789,1,1,Dry,Single carriageway,Rural,Fine no high winds,Bus or coach (17 or more pass seats),2020,12,2,16


In [38]:
accident[accident['Year'] == 2021]

Unnamed: 0,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type,Year,Month,DayofWeek,Day
352706,200901BS70001,Serious,2021-01-01,51.512273,Daylight,Kensington and Chelsea,-0.201349,1,2,Dry,One way street,Urban,Fine no high winds,Car,2021,1,4,1
352707,200901BS70002,Serious,2021-01-05,51.514399,Daylight,Kensington and Chelsea,-0.199248,11,2,Wet or damp,Single carriageway,Urban,Fine no high winds,Taxi/Private hire car,2021,1,1,5
352708,200901BS70003,Slight,2021-01-04,51.486668,Daylight,Kensington and Chelsea,-0.179599,1,2,Dry,Single carriageway,Urban,Fine no high winds,Taxi/Private hire car,2021,1,0,4
352709,200901BS70004,Serious,2021-01-05,51.507804,Daylight,Kensington and Chelsea,-0.203110,1,2,Frost or ice,Single carriageway,Urban,Other,Motorcycle over 500cc,2021,1,1,5
352710,200901BS70005,Serious,2021-01-06,51.482076,Darkness - lights lit,Kensington and Chelsea,-0.173445,1,2,Dry,Single carriageway,Urban,Fine no high winds,Car,2021,1,2,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
516255,2.01E+12,Serious,2021-12-15,55.072192,Darkness - no lighting,Dumfries and Galloway,-3.297647,1,5,Dry,Dual carriageway,Rural,Fine no high winds,Motorcycle 125cc and under,2021,12,2,15
516256,2.01E+12,Serious,2021-12-20,55.008460,Darkness - lights lit,Dumfries and Galloway,-3.088242,1,1,Frost or ice,Slip road,Rural,Fine no high winds,Car,2021,12,0,20
516257,2.01E+12,Slight,2021-12-23,54.995302,Daylight,Dumfries and Galloway,-3.259680,1,1,Frost or ice,Single carriageway,Rural,Snowing + high winds,Motorcycle over 125cc and up to 500cc,2021,12,3,23
516258,2.01E+12,Slight,2021-12-21,55.120172,Darkness - lights lit,Dumfries and Galloway,-3.356438,1,2,Frost or ice,Single carriageway,Rural,Other,Car,2021,12,1,21


In [41]:
accident[accident['Year'] == 2022]

Unnamed: 0,Index,Accident_Severity,Accident Date,Latitude,Light_Conditions,District Area,Longitude,Number_of_Casualties,Number_of_Vehicles,Road_Surface_Conditions,Road_Type,Urban_or_Rural_Area,Weather_Conditions,Vehicle_Type,Year,Month,DayofWeek,Day
516260,201001BS70003,Slight,2022-01-11,51.484087,Daylight,Kensington and Chelsea,-0.164002,1,2,Wet or damp,Single carriageway,Urban,Other,Car,2022,1,1,11
516261,201001BS70004,Slight,2022-01-11,51.509212,Darkness - lights lit,Kensington and Chelsea,-0.195273,1,1,Wet or damp,Single carriageway,Urban,Raining no high winds,Car,2022,1,1,11
516262,201001BS70006,Slight,2022-01-12,51.507804,Daylight,Kensington and Chelsea,-0.203110,1,2,Dry,Single carriageway,Urban,Fine no high winds,Motorcycle over 500cc,2022,1,2,12
516263,201001BS70007,Slight,2022-01-02,51.513314,Darkness - lights lit,Kensington and Chelsea,-0.198858,1,2,Dry,Roundabout,Urban,Fine no high winds,Van / Goods 3.5 tonnes mgw or under,2022,1,6,2
516264,201001BS70008,Slight,2022-01-04,51.484361,Darkness - lights lit,Kensington and Chelsea,-0.175802,1,2,Wet or damp,Single carriageway,Urban,Fine no high winds,Motorcycle 125cc and under,2022,1,1,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660674,201091NM01760,Slight,2022-02-18,57.374005,Daylight,Highland,-3.467828,2,1,Dry,Single carriageway,Rural,Fine no high winds,Car,2022,2,4,18
660675,201091NM01881,Slight,2022-02-21,57.232273,Darkness - no lighting,Highland,-3.809281,1,1,Frost or ice,Single carriageway,Rural,Fine no high winds,Car,2022,2,0,21
660676,201091NM01935,Slight,2022-02-23,57.585044,Daylight,Highland,-3.862727,1,3,Frost or ice,Single carriageway,Rural,Fine no high winds,Car,2022,2,2,23
660677,201091NM01964,Serious,2022-02-23,57.214898,Darkness - no lighting,Highland,-3.823997,1,2,Wet or damp,Single carriageway,Rural,Fine no high winds,Motorcycle over 500cc,2022,2,2,23


In [51]:
year19 = accident[accident['Year'] == 2019]

In [62]:
year19['Accident_Severity'].value_counts()

Accident_Severity
Slight     155079
Serious     24322
Fatal        2714
Name: count, dtype: int64

In [67]:
fatalkind = year19[year19['Accident_Severity'] =='Fatal']

In [71]:
fatalkind['Urban_or_Rural_Area'].value_counts()

Urban_or_Rural_Area
Rural          1773
Urban           941
Unallocated       0
Name: count, dtype: int64

<H1>3. In 2019, 65.33% of fatal accidents (1773 of 2714) occurred in rural areas</H1>
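The rural share can be reproduced from the counts printed above. This sketch hard-codes those three counts rather than reading the dataset:

```python
import pandas as pd

# 2019 fatal accident counts copied from the value_counts() output above.
fatal_counts = pd.Series({"Rural": 1773, "Urban": 941, "Unallocated": 0})

# Share of fatal accidents that occurred in rural areas, in percent.
rural_share = fatal_counts["Rural"] / fatal_counts.sum() * 100
```

On the full data, `value_counts(normalize=True)` would give the same proportions directly.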

In [53]:
year20 = accident[accident['Year'] == 2020]

In [63]:
year20['Accident_Severity'].value_counts()

Accident_Severity
Slight     145129
Serious     23121
Fatal        2341
Name: count, dtype: int64

In [58]:
year21 = accident[accident['Year'] == 2021]

In [64]:
year21['Accident_Severity'].value_counts()

Accident_Severity
Slight     139500
Serious     21997
Fatal        2057
Name: count, dtype: int64

In [60]:
year22 = accident[accident['Year'] == 2022]

In [65]:
year22['Accident_Severity'].value_counts()

Accident_Severity
Slight     124093
Serious     18777
Fatal        1549
Name: count, dtype: int64

<H1>On which day of the week did the most and the fewest accidents happen?</H1>
<h2>2. Saturday (5) has the most accidents, while Monday (0) has the fewest</h2>

In [29]:
accident['DayofWeek'].value_counts()

DayofWeek
5    107178
2     99558
3     99511
4     97900
1     94550
6     89302
0     72680
Name: count, dtype: int64

<H1>In which month did the most and the fewest accidents happen?</H1>
<h2>3. November has the highest recorded number of accidents, while February has the lowest</h2>

In [30]:
accident['Month'].value_counts()

Month
11    60424
10    59580
7     57445
6     56481
9     56455
5     56352
3     54086
8     53913
1     52872
12    51836
4     51744
2     49491
Name: count, dtype: int64

<H1>Which year has the highest and the lowest number of recorded accidents?</H1>
<h2>4. 2019 has the highest recorded number of accidents, while 2022 has the lowest</h2>

In [31]:
accident['Year'].value_counts()

Year
2019    182115
2020    170591
2021    163554
2022    144419
Name: count, dtype: int64

<H1>On which Road Type did the most accidents happen, and on which the fewest?</H1>
<h2>5. Single carriageway is the Road Type with the most accidents, while Slip road has the fewest</h2>

In [32]:
accident['Road_Type'].value_counts()

Road_Type
Single carriageway    496663
Dual carriageway       99424
Roundabout             43992
One way street         13559
Slip road               7041
Name: count, dtype: int64

<H1>Under which Weather Condition did the most accidents happen, and under which the fewest?</H1>
<h2>6. Fine no high winds is the Weather Condition with the most accidents, while Snowing + high winds has the fewest</h2>

In [33]:
accident['Weather_Conditions'].value_counts()

Weather_Conditions
Fine no high winds       520885
Raining no high winds     79696
Other                     17150
unknown weather           14128
Raining + high winds       9615
Fine + high winds          8554
Snowing no high winds      6238
Fog or mist                3528
Snowing + high winds        885
Name: count, dtype: int64

<H1>Which combination of Road Type and Weather Condition has the largest total number of casualties?</H1>
<h2>7. The largest total, 524703 casualties, was recorded on Single carriageway roads in Fine no high winds weather</h2>

In [34]:
accident.groupby(['Road_Type','Weather_Conditions'])['Number_of_Casualties'].sum()

Road_Type           Weather_Conditions   
Dual carriageway    Fine + high winds          2191
                    Fine no high winds       113679
                    Fog or mist                1136
                    Other                      3246
                    Raining + high winds       3000
                    Raining no high winds     19598
                    Snowing + high winds        257
                    Snowing no high winds      1427
                    unknown weather            2343
One way street      Fine + high winds           187
                    Fine no high winds        13104
                    Fog or mist                  36
                    Other                       381
                    Raining + high winds        200
                    Raining no high winds      1775
                    Snowing + high winds         15
                    Snowing no high winds        96
                    unknown weather             378
Roundabout          Fi

<h1>Which vehicle type has the most recorded accidents in both urban and rural areas, and how many accidents does it have in each?</h1>
<H2>8. Car is the vehicle type with the most recorded accidents: 181922 in rural areas and 316062 in urban areas, so car accidents happened more often in urban areas. Because the code uses size(), these figures count accidents, not casualties</H2>

In [35]:
casual = accident.groupby(['Urban_or_Rural_Area','Vehicle_Type'])['Number_of_Casualties'].size()

In [36]:
casual

Urban_or_Rural_Area  Vehicle_Type                         
Rural                Agricultural vehicle                        675
                     Bus or coach (17 or more pass seats)       9025
                     Car                                      181922
                     Data missing or out of range                  0
                     Goods 7.5 tonnes mgw and over              6156
                     Goods over 3.5t. and under 7.5t            2232
                     Minibus (8 - 16 passenger seats)            718
                     Motorcycle 125cc and under                 5023
                     Motorcycle 50cc and under                  2710
                     Motorcycle over 125cc and up to 500cc      2674
                     Motorcycle over 500cc                      8957
                     Other vehicle                              1994
                     Pedal cycle                                  70
                     Ridden horse           

<H1>Is there a correlation between the number of vehicles involved in an accident and the number of casualties?</H1>
<h2>9. There is a weak positive correlation (Pearson r of about 0.23) between the number of vehicles involved and the number of casualties</h2>

In [37]:
cas_veh = accident['Number_of_Vehicles'].corr(accident['Number_of_Casualties'])

In [38]:
cas_veh

np.float64(0.22888886126927557)
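One way to read a Pearson coefficient is through its square, which gives the share of variance in one variable that a linear relationship with the other would account for. A short sketch using the value printed above:

```python
# Pearson r copied from the output above.
r = 0.22888886126927557

# Coefficient of determination: share of variance explained by a linear fit.
r_squared = r ** 2
```

An `r_squared` of roughly 0.05 suggests the relationship is positive but weak.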

<H1>How many casualties occurred in urban and rural areas on each day of the week?</H1>
<h2>10. The table shows the total casualties for each day of the week in rural and urban areas. Saturday (5) has the highest total in both areas. Wednesday (2) and Thursday (3) have nearly the same totals. Monday (0) has the lowest total of all the days</h2>

In [39]:
dow = accident.groupby(['DayofWeek','Urban_or_Rural_Area'])['Number_of_Casualties'].sum()

In [40]:
dow.unstack()

DayofWeek,Rural,Unallocated,Urban
0,47870,1,58655
1,48668,0,77253
2,49877,3,81330
3,49815,3,81652
4,49190,2,80353
5,56301,3,88491
6,51794,1,75306


<H1>How did the number of casualties change from month to month in urban and rural areas?</H1>
<h2>11. The table shows the total casualties for each month in rural and urban areas. Urban areas have more casualties than rural areas in every month shown. The totals rise and fall from month to month: for example, they drop from January to February and rise again in March. Overall, November has the highest number of casualties in both areas, while February has the lowest</h2>

In [41]:
mot = accident.groupby(['Month','Urban_or_Rural_Area'])['Number_of_Casualties'].sum()

In [42]:
mot.unstack()

Month,Rural,Unallocated,Urban
1,28298,1,42739
2,26352,0,40161
3,27615,1,45460
4,27645,1,42974
5,29995,4,46797
6,29762,2,46532
7,31603,1,46886
8,32216,0,43064
9,29361,0,46692
10,30791,0,49511


<h1>On what date did the single accident with the most casualties occur?</h1>
<H2>12. As shown in the table, the accident with the most casualties (68) occurred on January 3, 2019</H2>

In [43]:
maxi = accident.groupby(['Month','Day','Year'])['Number_of_Casualties'].max()

In [44]:
maxi.unstack()

Month,Day,2019,2020,2021,2022
1,1,8.0,9.0,7.0,5.0
1,2,7.0,5.0,6.0,8.0
1,3,68.0,7.0,5.0,6.0
1,4,9.0,7.0,5.0,4.0
1,5,8.0,7.0,11.0,5.0
...,...,...,...,...,...
12,27,5.0,5.0,10.0,7.0
12,28,9.0,10.0,10.0,7.0
12,29,7.0,6.0,6.0,7.0
12,30,5.0,4.0,5.0,5.0


In [45]:
maxi.max()

np.float64(68.0)

In [46]:
maxi.idxmax()

(np.int32(1), np.int32(3), np.int32(2019))

<h1>Which date has the highest total number of casualties?</h1>
<H2>13. July 13, 2019 (07/13/2019) has the highest total number of casualties, with 963</H2>

In [47]:
mixi = accident.groupby(['Month','Day','Year'])['Number_of_Casualties'].sum()

In [48]:
mixi.unstack()

Month,Day,2019,2020,2021,2022
1,1,494,360,360,373
1,2,505,349,427,400
1,3,605,426,506,371
1,4,553,539,413,484
1,5,587,489,620,450
...,...,...,...,...,...
12,27,373,402,342,266
12,28,486,383,421,274
12,29,445,377,391,303
12,30,251,435,333,242


In [49]:
mixi.max()

np.int64(963)

In [50]:
mixi.idxmax()

(np.int32(7), np.int32(13), np.int32(2019))
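The `(Month, Day, Year)` tuple above is easy to misread. Grouping on the datetime column itself returns a `Timestamp` that can be formatted unambiguously. A sketch on made-up rows:

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Accident Date": pd.to_datetime(["2019-07-13", "2019-07-13", "2019-08-13"]),
    "Number_of_Casualties": [2, 3, 4],
})

# Sum casualties per calendar date and find the date with the largest total.
daily_total = toy.groupby("Accident Date")["Number_of_Casualties"].sum()
peak_date = daily_total.idxmax()

# Spell out the month name so month 7 reads as July.
label = peak_date.strftime("%B %d, %Y")
```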

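<H1>Is Accident Severity related to the Number of Casualties?</H1>
<H2>14. Yes. The one-way ANOVA p-value is effectively 0, so the mean number of casualties differs significantly across severity levels. With a sample this large, significance alone does not show that the difference is large</H2>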

In [51]:
f_stats,p_value = f_oneway(accident[accident['Accident_Severity'] == 'Slight']['Number_of_Casualties'],
                           accident[accident['Accident_Severity'] == 'Serious']['Number_of_Casualties'],
                           accident[accident['Accident_Severity'] == 'Fatal']['Number_of_Casualties'])
print(p_value)

0.0
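A p-value of 0.0 rejects the hypothesis that all group means are equal, but it says nothing about how large the differences are. One common companion measure is eta squared, the between-group sum of squares divided by the total sum of squares. The sketch below computes it on a toy frame with made-up rows; `eta_squared` is a hypothetical helper, not part of scipy.

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Accident_Severity": ["Slight"] * 4 + ["Serious"] * 4 + ["Fatal"] * 4,
    "Number_of_Casualties": [1, 1, 2, 1, 1, 2, 2, 3, 2, 3, 1, 4],
})


def eta_squared(df, group_col, value_col):
    """Share of total variance in value_col explained by group_col."""
    grand_mean = df[value_col].mean()
    groups = df.groupby(group_col)[value_col]
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = (groups.count() * (groups.mean() - grand_mean) ** 2).sum()
    # Total sum of squares around the grand mean.
    ss_total = ((df[value_col] - grand_mean) ** 2).sum()
    return ss_between / ss_total


effect = eta_squared(toy, "Accident_Severity", "Number_of_Casualties")
```

Values near 0 indicate that group membership explains little of the variation, even when the p-value is tiny.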


In [52]:
accident['Light_Conditions'].value_counts()

Light_Conditions
Daylight                       484880
Darkness - lights lit          129335
Darkness - no lighting          37437
Darkness - lighting unknown      6484
Darkness - lights unlit          2543
Name: count, dtype: int64

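<H1>Are Light Conditions related to the Number of Casualties?</H1>
<H2>15. Yes. The one-way ANOVA p-value is effectively 0, so the mean number of casualties differs significantly across light conditions. With a sample this large, significance alone does not show that the difference is large</H2>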

In [53]:
f_stats,p_value = f_oneway(accident[accident['Light_Conditions'] == 'Daylight']['Number_of_Casualties'],
                           accident[accident['Light_Conditions'] == 'Darkness - lights lit']['Number_of_Casualties'],
                           accident[accident['Light_Conditions'] == 'Darkness - no lighting']['Number_of_Casualties'],
                           accident[accident['Light_Conditions'] == 'Darkness - lighting unknown']['Number_of_Casualties'],
                           accident[accident['Light_Conditions'] == 'Darkness - lights unlit']['Number_of_Casualties'])
                           
print(p_value)

0.0


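<h1>Under which weather condition do most FATAL accidents occur?</h1>
<h2>16. As the table shows, most fatal accidents (7100) occurred in Fine no high winds weather, which is also the most common condition overall. As a share of the accidents within each condition, Fog or mist has the highest fatal rate (2.32%, versus 1.36% for Fine no high winds)</h2>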

In [54]:
was = accident.groupby(['Weather_Conditions','Accident_Severity']).size()

In [55]:
was.unstack()

Weather_Conditions,Fatal,Serious,Slight
Fine + high winds,175,1245,7134
Fine no high winds,7100,72046,441739
Fog or mist,82,483,2963
Other,165,1801,15184
Raining + high winds,145,1261,8209
Raining no high winds,848,9468,69380
Snowing + high winds,3,109,773
Snowing no high winds,36,565,5637
unknown weather,107,1239,12782
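Raw counts favour whichever condition is most common, so it can help to look at the fatal share within each weather condition. The sketch below re-enters the counts from the table above by hand rather than reading the dataset:

```python
import pandas as pd

# Counts copied from the table above (Fatal, Serious, Slight per condition).
counts = pd.DataFrame(
    {
        "Fatal": [175, 7100, 82, 165, 145, 848, 3, 36, 107],
        "Serious": [1245, 72046, 483, 1801, 1261, 9468, 109, 565, 1239],
        "Slight": [7134, 441739, 2963, 15184, 8209, 69380, 773, 5637, 12782],
    },
    index=[
        "Fine + high winds", "Fine no high winds", "Fog or mist", "Other",
        "Raining + high winds", "Raining no high winds",
        "Snowing + high winds", "Snowing no high winds", "unknown weather",
    ],
)

# Percentage of accidents in each condition that were fatal.
fatal_share = counts["Fatal"].div(counts.sum(axis=1)).mul(100).round(2)
highest_rate = fatal_share.idxmax()
```

On the full data, `pd.crosstab(..., normalize='index')` would produce the same row-wise shares.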


<h1>Under which Light Condition do most SERIOUS accidents occur?</h1>
<h2>17. As the table shows, most serious accidents (60759) occurred in Daylight, which is also the most common light condition overall</h2>

In [56]:
acs = accident.groupby(['Light_Conditions','Accident_Severity']).size()

In [57]:
acs.unstack()

Light_Conditions,Fatal,Serious,Slight
Darkness - lighting unknown,68,794,5622
Darkness - lights lit,1860,19130,108345
Darkness - lights unlit,45,360,2138
Darkness - no lighting,1612,7174,28651
Daylight,5076,60759,419045


<h1>On which Road Surface condition did the most accidents happen?</h1>
<h2>19. Dry is the road surface condition with the most recorded accidents. The code uses size(), so it counts accidents rather than summing casualties</h2>

In [58]:
monthmax = accident.groupby(['Road_Surface_Conditions'])['Number_of_Casualties'].size()

In [59]:
monthmax.idxmax()

'Dry'
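On a grouped column, `size()` counts rows (accidents) while `sum()` adds up the values (casualties), and the two can point to different groups. A sketch on a toy frame with made-up rows:

```python
import pandas as pd

# Toy frame with made-up rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Road_Surface_Conditions": ["Dry", "Dry", "Dry", "Wet or damp", "Wet or damp"],
    "Number_of_Casualties": [1, 1, 1, 3, 2],
})

grouped = toy.groupby("Road_Surface_Conditions")["Number_of_Casualties"]

accident_counts = grouped.size()  # number of accident records per condition
casualty_totals = grouped.sum()   # total casualties per condition
```

In this toy data Dry has the most accidents while Wet or damp has the most casualties, so the choice of aggregation should match the question being asked.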

<h1>How are motorcycle accidents mostly categorized by Accident Severity?</h1>
<h2>20. As shown, most motorcycle accidents are categorized as Slight</h2>

In [60]:
wey = accident.groupby(['Vehicle_Type','Accident_Severity']).size()

In [61]:
wey.unstack()

Vehicle_Type,Fatal,Serious,Slight
Agricultural vehicle,21,282,1644
Bus or coach (17 or more pass seats),325,3373,22180
Car,6577,66461,424954
Data missing or out of range,0,0,6
Goods 7.5 tonnes mgw and over,216,2321,14770
Goods over 3.5t. and under 7.5t,67,857,5172
Minibus (8 - 16 passenger seats),29,276,1671
Motorcycle 125cc and under,189,2031,13049
Motorcycle 50cc and under,95,1014,6494
Motorcycle over 125cc and up to 500cc,105,1014,6537
