# Findings:
In this section we examined the accidents dataset for any correlation between accident occurrence and weather conditions. The findings show that 69% of accidents occurred on dry roads with no rain or snow. Surprisingly, 80% of accidents happened in fine weather with no high winds, rain or snow, whilst only around 12% happened in rainy weather. Road conditions were reported as damp or wet for 27% of accidents, which suggests that these accidents occurred not long after rain or snowfall.

# Main conclusions:
Accidents mainly occurred in good weather conditions. One possible explanation is that drivers are more attentive in bad weather and tend to be more relaxed in fine weather.
Around 27% of accidents were reported on wet or damp roads when it was not raining or snowing. The roads may still have been slippery, which could have contributed to accidents while drivers were making various maneuvers.
No particular month has a noticeably higher number of accidents. Accidents are evenly distributed throughout the year.
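
The month-by-month statement can be checked with a few lines of pandas. The sketch below is only an illustration on made-up rows: it assumes the `Accident_Index` and `Date` columns of the cleaned file, and the toy values and the names `toy`, `monthly_counts` and `monthly_share` are not taken from the analysis.

```python
import pandas as pd

# Toy rows standing in for the cleaned file; the values are invented for illustration.
toy = pd.DataFrame({
    "Accident_Index": ["A1", "A1", "A2", "A3", "A4"],
    "Date": ["2010-01-11", "2010-01-11", "2010-01-02", "2010-02-04", "2010-03-09"],
})

# Keep one row per accident so that multi-vehicle accidents are not counted twice.
accidents = toy.drop_duplicates("Accident_Index")

# Count accidents per calendar month and express each month as a share of the total.
month = pd.to_datetime(accidents["Date"], format="%Y-%m-%d").dt.month
monthly_counts = month.value_counts().sort_index()
monthly_share = (monthly_counts / monthly_counts.sum() * 100).round(1)
print(monthly_share)
```

If the shares are close to one twelfth each (about 8.3%), the "evenly distributed" reading is reasonable.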

In [1]:
# Dependencies and Setup
%matplotlib notebook
import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd
import scipy.stats as sts
from scipy.stats import linregress
import numpy as np
import seaborn as sns

In [2]:
# Read in the cleaned data file
file_5 = "Resources/all.csv"
when_df = pd.read_csv(file_5)

when_df.head()



Unnamed: 0.1,Unnamed: 0,Accident_Index,1st_Road_Class,Accident_Severity,Year,Date,Day_of_Week,Latitude,Light_Conditions,Local_Authority_(District),...,Journey_Purpose_of_Driver,Junction_Location,make,model,Propulsion_Code,Sex_of_Driver,Towing_and_Articulation,Vehicle_Manoeuvre,Vehicle_Type,Was_Vehicle_Left_Hand_Drive
0,478587,201001BS70003,B,Slight,2010,2010-01-11,Monday,51.484087,Daylight,Kensington and Chelsea,...,Commuting to/from work,Mid Junction - on roundabout or on main road,CITROEN,BERLINGO FIRST 600,Petrol,Female,No tow/articulation,Turning right,Van / Goods 3.5 tonnes mgw or under,No
1,478588,201001BS70004,A,Slight,2010,2010-01-11,Monday,51.509212,Darkness - lights lit,Kensington and Chelsea,...,Journey as part of work,Mid Junction - on roundabout or on main road,RENAULT,SCENIC DYN DCI 130,Heavy oil,Male,No tow/articulation,Going ahead other,Car,No
2,478589,201001BS70007,Unclassified,Slight,2010,2010-01-02,Saturday,51.513314,Darkness - lights lit,Kensington and Chelsea,...,Other/Not known (2005-10),Mid Junction - on roundabout or on main road,NISSAN,PRIMERA SVE CVT,Petrol,Female,No tow/articulation,Going ahead right-hand bend,Car,No
3,478590,201001BS70007,Unclassified,Slight,2010,2010-01-02,Saturday,51.513314,Darkness - lights lit,Kensington and Chelsea,...,Other/Not known (2005-10),Mid Junction - on roundabout or on main road,MERCEDES,A140 ELEGANCE,Petrol,Female,No tow/articulation,Going ahead other,Car,No
4,478591,201001BS70008,A,Slight,2010,2010-01-04,Monday,51.484361,Darkness - lights lit,Kensington and Chelsea,...,Journey as part of work,Mid Junction - on roundabout or on main road,VAUXHALL,ZAFIRA ELEGANCE DTI,Heavy oil,Male,No tow/articulation,Turning right,Taxi/Private hire car,No


In [3]:
# create a stacked plot and save to Images folder
accident_sev_dow = when_df.groupby(['Day_of_Week','Accident_Severity'])['Accident_Index'].nunique().unstack().reset_index()
sorted_df=accident_sev_dow.sort_values(by=['Fatal','Serious','Slight'], ascending=False)

bar1 = sorted_df['Fatal']
bar2 = sorted_df['Serious']
bar3 = sorted_df['Slight']

r = sorted_df['Day_of_Week']

sum12 = bar1+bar2

plt.figure(figsize=(10,6))
plt.title('Days of the week vs Accidents')
plt.xticks(rotation = 50, horizontalalignment="right")

plt.xlabel("Days of the week")
plt.ylabel("Accident counts")
plt.bar(r,bar1,color='red', label='Fatal')
plt.bar(r,bar2, bottom=bar1,color='orange', label='Serious')
plt.bar(r,bar3,bottom=sum12,color='green',label='Slight')
plt.legend(loc="upper left")
plt.savefig("Images/Accident_distribution_Day_of_week.png", bbox_inches = "tight")
plt.show()
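
The stacking used above relies on each layer starting where the layers beneath it end. The following sketch shows that bookkeeping on toy data, assuming the same column names as the cell above; the toy rows, `day_order` and `bottoms` are illustrative names, and the days are put in calendar order here rather than sorted by fatal count.

```python
import pandas as pd

# Toy rows: one row per vehicle, so an accident can appear more than once.
toy = pd.DataFrame({
    "Accident_Index": ["A1", "A1", "A2", "A3", "A4", "A5"],
    "Day_of_Week": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday", "Sunday"],
    "Accident_Severity": ["Slight", "Slight", "Fatal", "Serious", "Slight", "Slight"],
})

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
severity_order = ["Fatal", "Serious", "Slight"]

# Unique accidents per day and severity; missing combinations become 0.
counts = (
    toy.groupby(["Day_of_Week", "Accident_Severity"])["Accident_Index"]
    .nunique()
    .unstack(fill_value=0)
    .reindex(index=day_order, columns=severity_order, fill_value=0)
)

# The bottom of each layer is the running total of the layers drawn before it.
bottoms = counts.cumsum(axis=1) - counts
print(counts)
print(bottoms)
```

Each column of `counts` can then be passed to `plt.bar` with the matching column of `bottoms` as the `bottom` argument.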


In [4]:
#  Group all accidents by gender to create a dataframe to use to plot the distribution

group_by_gender = when_df.groupby(["Accident_Index","Sex_of_Driver"])
gender_df = pd.DataFrame(group_by_gender.size())

# Create the dataframe with total count of accidents by gender
gender = pd.DataFrame(gender_df.groupby(["Sex_of_Driver"]).count())
gender.columns = ["QTY"]

# calculate the percentage of accidents by gender
gender["%"] = (100*(gender["QTY"]/gender["QTY"].sum()))

# display the gender dataframe
gender

Sex_of_Driver,QTY,%
Data missing or out of range,24,0.002692
Female,321393,36.043738
Male,543098,60.907618
Not known,27160,3.045953
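
The two-step grouping above counts each (accident, gender) pair once, so an accident involving both a male and a female driver contributes to both rows. A more compact way to get the same kind of table is sketched below on toy data; it assumes the same column names, and the toy rows are invented.

```python
import pandas as pd

# Toy rows: accident A1 involves one female and one male driver, A2 two female drivers.
toy = pd.DataFrame({
    "Accident_Index": ["A1", "A1", "A2", "A2", "A3"],
    "Sex_of_Driver": ["Female", "Male", "Female", "Female", "Male"],
})

# Keep each (accident, gender) pair once, then count and normalise.
pairs = toy.drop_duplicates(["Accident_Index", "Sex_of_Driver"])
qty = pairs["Sex_of_Driver"].value_counts()
pct = pairs["Sex_of_Driver"].value_counts(normalize=True) * 100

gender = pd.DataFrame({"QTY": qty, "%": pct})
print(gender)
```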


In [5]:
# Generate a pie plot showing the distribution of accidents across genders

colors = ['red', 'yellow','blue','green']

plot = gender.plot.pie(y='QTY',figsize=(10,10), colors = colors, startangle=270, shadow = False, autopct="%1.1f%%")

plt.title('Distribution of accidents by gender',fontsize = 20)
plt.ylabel('Gender',fontsize = 20)

plt.savefig("Images/Accident_distribution_gender.png", bbox_inches = "tight")
plt.show()


In [6]:
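# reduce to just the data that I want to report on and rename columns
# (.copy() makes df_1 an independent frame, so adding columns later does not raise SettingWithCopyWarning)
df_1 = when_df[['Accident_Index', 'Year', 'Date','Day_of_Week' ,'Time','Accident_Severity','Number_of_Casualties','Junction_Location','Sex_of_Driver']].copy()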
df_1.columns = ['Index', 'Year', 'Date', 'Day','Time','Severity','Casualties','Location','Gender']
df_1

Unnamed: 0,Index,Year,Date,Day,Time,Severity,Casualties,Location,Gender
0,201001BS70003,2010,2010-01-11,Monday,07:30,Slight,1,Mid Junction - on roundabout or on main road,Female
1,201001BS70004,2010,2010-01-11,Monday,18:35,Slight,1,Mid Junction - on roundabout or on main road,Male
2,201001BS70007,2010,2010-01-02,Saturday,21:21,Slight,1,Mid Junction - on roundabout or on main road,Female
3,201001BS70007,2010,2010-01-02,Saturday,21:21,Slight,1,Mid Junction - on roundabout or on main road,Female
4,201001BS70008,2010,2010-01-04,Monday,20:35,Slight,1,Mid Junction - on roundabout or on main road,Male
...,...,...,...,...,...,...,...,...,...
1077648,2016984130916,2016,2016-10-28,Friday,06:45,Slight,1,Cleared junction or waiting/parked at junction...,Female
1077649,2016984130916,2016,2016-10-28,Friday,06:45,Slight,1,Cleared junction or waiting/parked at junction...,Not known
1077650,2016984131116,2016,2016-11-01,Tuesday,16:45,Slight,2,Mid Junction - on roundabout or on main road,Female
1077651,2016984131316,2016,2016-10-29,Saturday,20:00,Slight,3,Not at or within 20 metres of junction,Male


In [9]:
#Add a column to place the hour in for binning
df_1['Time_f'] = ''



In [10]:
# use to_datetime to convert the data in the Time column into data that can be binned using the cut function
df_1['Time_f']=pd.to_datetime(df_1['Time']).dt.hour

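
A minimal sketch of the hour extraction, assuming times are stored as `HH:MM` strings as in the table above. Taking an explicit `.copy()` of the column subset is one common way to avoid pandas' `SettingWithCopyWarning` when a new column is added; the toy frame and the names `source` and `subset` are illustrative.

```python
import pandas as pd

# Toy frame; the last row has a missing time to show how it is handled.
source = pd.DataFrame({
    "Accident_Index": ["A1", "A2", "A3"],
    "Time": ["07:30", "18:35", None],
})

# An explicit copy makes the subset independent of the original frame.
subset = source[["Accident_Index", "Time"]].copy()

# Parse with an explicit format and keep only the hour; missing times become NaN.
subset["Time_f"] = pd.to_datetime(subset["Time"], format="%H:%M").dt.hour
print(subset)
```

Because of the missing value, `Time_f` is a float column here; rows without a time would need to be dropped or filled before binning.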


In [12]:
# Set new index to 'Index'
df_2 = df_1.set_index("Index")
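# Create the bin edges (in hours) in which the data will be held; pd.cut intervals are
# right-closed by default, so hours 0-6, 7-10, 11-16, 17-19 and 20-23 fall into successive bins
bins = [0, 6, 10, 16, 19, 24]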

# Create the names for the five bins
group_names = ["Early AM", "Morning Commute", "Day Time", "Evening Commute", "Late PM"]
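
To see exactly which hours land in which bin, the edges can be applied to every hour of the day. This is a small check rather than part of the analysis; it uses the same `bins` and `group_names` with `include_lowest=True`, as in the `pd.cut` call that follows in the notebook.

```python
import pandas as pd

bins = [0, 6, 10, 16, 19, 24]
group_names = ["Early AM", "Morning Commute", "Day Time", "Evening Commute", "Late PM"]

# Every whole hour of the day, 0 to 23.
hours = pd.Series(range(24))

# Intervals are right-closed: (0, 6], (6, 10], ...; include_lowest adds hour 0 to the first bin.
binned = pd.cut(hours, bins, labels=group_names, include_lowest=True)

# Number of whole hours covered by each bin.
hours_per_bin = binned.value_counts().to_dict()
print(hours_per_bin)
```

The bins cover 7, 4, 6, 3 and 4 hours respectively, so raw counts per bin are not directly comparable between bins.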

In [13]:
# use pd.cut to add the relevant names into the Time Bin column and save the file as a csv into the resources folder 
df_2["Time_Bin"] = pd.cut(df_2["Time_f"], bins, labels=group_names, include_lowest=True)
df_2.to_csv("Resources/time_bin.csv")
df_2

Index,Year,Date,Day,Time,Severity,Casualties,Location,Gender,Time_f,Time_Bin
201001BS70003,2010,2010-01-11,Monday,07:30,Slight,1,Mid Junction - on roundabout or on main road,Female,7,Morning Commute
201001BS70004,2010,2010-01-11,Monday,18:35,Slight,1,Mid Junction - on roundabout or on main road,Male,18,Evening Commute
201001BS70007,2010,2010-01-02,Saturday,21:21,Slight,1,Mid Junction - on roundabout or on main road,Female,21,Late PM
201001BS70007,2010,2010-01-02,Saturday,21:21,Slight,1,Mid Junction - on roundabout or on main road,Female,21,Late PM
201001BS70008,2010,2010-01-04,Monday,20:35,Slight,1,Mid Junction - on roundabout or on main road,Male,20,Late PM
...,...,...,...,...,...,...,...,...,...,...
2016984130916,2016,2016-10-28,Friday,06:45,Slight,1,Cleared junction or waiting/parked at junction...,Female,6,Early AM
2016984130916,2016,2016-10-28,Friday,06:45,Slight,1,Cleared junction or waiting/parked at junction...,Not known,6,Early AM
2016984131116,2016,2016-11-01,Tuesday,16:45,Slight,2,Mid Junction - on roundabout or on main road,Female,16,Day Time
2016984131316,2016,2016-10-29,Saturday,20:00,Slight,3,Not at or within 20 metres of junction,Male,20,Late PM


In [15]:
# create a series counting the number of records in each time bin
acc_count_2 = df_2['Time_Bin'].value_counts()
acc_count_2

Day Time           429112
Evening Commute    234266
Morning Commute    230698
Late PM            114544
Early AM            69033
Name: Time_Bin, dtype: int64
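
Because the time bins span different numbers of hours, it can help to divide each count by the width of its bin. The sketch below does this with the counts printed above. The bin widths are an assumption derived from the bin edges `[0, 6, 10, 16, 19, 24]` applied to whole hours with right-closed intervals, and the counts are rows of the dataframe rather than unique accidents.

```python
import pandas as pd

# Record counts per time bin, copied from the output above.
counts = pd.Series({
    "Day Time": 429112,
    "Evening Commute": 234266,
    "Morning Commute": 230698,
    "Late PM": 114544,
    "Early AM": 69033,
})

# Assumed number of whole hours in each bin (0-6, 7-10, 11-16, 17-19, 20-23).
hours_in_bin = pd.Series({
    "Early AM": 7,
    "Morning Commute": 4,
    "Day Time": 6,
    "Evening Commute": 3,
    "Late PM": 4,
})

# Average records per hour within each bin; the division aligns on the labels.
per_hour = (counts / hours_in_bin).sort_values(ascending=False)
print(per_hour.round(0))
```

On a per-hour basis the evening commute ranks above the day-time bin, even though the day-time bin has the largest raw count.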

In [16]:
# Generate a bar plot showing how all accidents from 2010 to 2016 are distributed across the time of day
Time_of_Day= acc_count_2.index.values
No_of_accidents = acc_count_2.values
x_axis = np.arange(len(No_of_accidents))
# plot the bar chart 
plt.figure(figsize=(10,8))
plt.bar(x_axis, No_of_accidents, color="b", align="center")
# place a tick under each bar, labelled with its time-of-day bin, and set the rotation
tick_locations = [value for value in x_axis]
plt.xticks(tick_locations, Time_of_Day, rotation="vertical")
# set the limit of the x axis to position the bars within the chart
plt.xlim(-0.75, len(x_axis))
# set the limit of the y axis to position the top of the bars within the chart
plt.ylim(0, max(acc_count_2)+10000)
# add the titles
plt.title("All accidents from 2010 to 2016 distributed across Time of Day")
plt.xlabel("Time of Day")
plt.ylabel("Number of accidents")
# tidy the layout and save the figure
plt.tight_layout()
plt.savefig("Images/Accident_distribution_time_of_day")


In [17]:
# Using GroupBy in order to separate the data into fields according to the Year and Time Bin

when_count = df_2.groupby('Year')["Time_Bin"].value_counts()

when_count.head(20)

Year  Time_Bin       
2010  Day Time           58786
      Morning Commute    31436
      Evening Commute    30791
      Late PM            15616
      Early AM            9205
2011  Day Time           58991
      Evening Commute    32074
      Morning Commute    31516
      Late PM            15677
      Early AM            9134
2012  Day Time           56937
      Evening Commute    31022
      Morning Commute    30640
      Late PM            14965
      Early AM            8978
2013  Day Time           56231
      Evening Commute    31212
      Morning Commute    30742
      Late PM            14804
      Early AM            8798
Name: Time_Bin, dtype: int64

In [18]:
# visualise the data on a simple bar chart
when_count.plot(kind="bar")
plt.show()
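
Plotting the grouped series directly gives one bar per (year, time bin) pair. Reshaping it into a year-by-bin table first makes the years easier to compare. The sketch below uses the 2010 and 2011 counts printed above; `bin_order`, `by_year` and `share` are illustrative names.

```python
import pandas as pd

# Counts for 2010 and 2011, copied from the grouped output above.
when_count = pd.Series({
    (2010, "Day Time"): 58786,
    (2010, "Morning Commute"): 31436,
    (2010, "Evening Commute"): 30791,
    (2010, "Late PM"): 15616,
    (2010, "Early AM"): 9205,
    (2011, "Day Time"): 58991,
    (2011, "Evening Commute"): 32074,
    (2011, "Morning Commute"): 31516,
    (2011, "Late PM"): 15677,
    (2011, "Early AM"): 9134,
})
when_count.index.names = ["Year", "Time_Bin"]

# One row per year, one column per time bin, in chronological bin order.
bin_order = ["Early AM", "Morning Commute", "Day Time", "Evening Commute", "Late PM"]
by_year = when_count.unstack("Time_Bin")[bin_order]

# Share of each year's records that falls into each bin, in percent.
share = by_year.div(by_year.sum(axis=1), axis=0).mul(100).round(1)
print(share)
```

Calling `by_year.plot(kind="bar")` on this table would draw one group of bars per year.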


In [19]:
# Read in the time-binned data file saved earlier
file_6 = "Resources/time_bin.csv"
time_df = pd.read_csv(file_6)

time_df.head()

Unnamed: 0,Index,Year,Date,Day,Time,Severity,Casualties,Location,Gender,Time_f,Time_Bin
0,201001BS70003,2010,2010-01-11,Monday,07:30,Slight,1,Mid Junction - on roundabout or on main road,Female,7,Morning Commute
1,201001BS70004,2010,2010-01-11,Monday,18:35,Slight,1,Mid Junction - on roundabout or on main road,Male,18,Evening Commute
2,201001BS70007,2010,2010-01-02,Saturday,21:21,Slight,1,Mid Junction - on roundabout or on main road,Female,21,Late PM
3,201001BS70007,2010,2010-01-02,Saturday,21:21,Slight,1,Mid Junction - on roundabout or on main road,Female,21,Late PM
4,201001BS70008,2010,2010-01-04,Monday,20:35,Slight,1,Mid Junction - on roundabout or on main road,Male,20,Late PM


In [20]:
tod_group = time_df.groupby(["Time_Bin","Severity"])
tod_group_count = tod_group["Severity"].count().unstack()
tod_group_count = tod_group_count.sort_values(by=["Fatal","Serious","Slight"], ascending=False)
tod_group_count = tod_group_count[["Slight","Serious","Fatal"]]
tod_group_count.plot.bar(stacked=True, color={"Fatal": "red", "Serious": "orange", "Slight":"green"})
# plt.figure(figsize=(10,6))
plt.title('Time of the day vs Accidents')
plt.xticks(rotation = 50, horizontalalignment="right")

plt.xlabel("Time of the day")
plt.ylabel("Accident counts")
#print(tod_group_count)
plt.savefig("Images/Accident_distribution_time_of_day.png", bbox_inches = "tight")


In [22]:
loc_group = time_df.groupby(["Location","Severity"])
loc_group_count = loc_group["Severity"].count().unstack()
loc_group_count = loc_group_count.sort_values(by=["Fatal","Serious","Slight"], ascending=False)
loc_group_count = loc_group_count[["Slight","Serious","Fatal"]]
loc_group_count.plot.bar(stacked=True, color={"Fatal": "red", "Serious": "orange", "Slight":"green"})
# plt.figure(figsize=(10,6))
plt.title('Location on the Road vs Accidents')
plt.xticks(rotation = 50, horizontalalignment="right")

plt.xlabel("Location on the road")
plt.ylabel("Accident counts")
#print(tod_group_count)
plt.savefig("Images/Accident_distribution_location_on_the_road.png", bbox_inches = "tight")


In [24]:
gen_group = time_df.groupby(["Gender","Severity"])
gen_group_count = gen_group["Severity"].count().unstack()
gen_group_count = gen_group_count.sort_values(by=["Fatal","Serious","Slight"], ascending=False)
gen_group_count = gen_group_count[["Slight","Serious","Fatal"]]
gen_group_count.plot.bar(stacked=True, color={"Fatal": "red", "Serious": "orange", "Slight":"green"})
# plt.figure(figsize=(10,6))
plt.title('Gender of driver vs Accidents')
plt.xticks(rotation = 50, horizontalalignment="right")

plt.xlabel("Gender of driver")
plt.ylabel("Accident counts")
#print(tod_group_count)
plt.savefig("Images/Accident_distribution_gender.png", bbox_inches = "tight")


In [26]:
tod1_group = time_df.groupby(["Time_f","Severity"])
tod1_group_count = tod1_group["Severity"].count().unstack()
tod1_group_count = tod1_group_count.sort_values(by=["Fatal","Serious","Slight"], ascending=False)
tod1_group_count = tod1_group_count[["Slight","Serious","Fatal"]]
tod1_group_count.plot.bar(stacked=True, color={"Fatal": "red", "Serious": "orange", "Slight":"green"})
# plt.figure(figsize=(10,6))
plt.title('Hours of the day vs Accidents')
plt.xticks(rotation = 0, horizontalalignment="right")

plt.xlabel("Hours of the day")
plt.ylabel("Accident counts")
#print(tod_group_count)
plt.savefig("Images/Accident_distribution_hours_of_the_day.png", bbox_inches = "tight")


In [27]:
accident_sev_time = time_df.groupby(['Time_f','Severity'])['Index'].nunique().unstack().reset_index()
sorted_df=accident_sev_time.sort_values(by=['Fatal','Serious','Slight'], ascending=False)

bar1 = sorted_df['Fatal']
bar2 = sorted_df['Serious']
bar3 = sorted_df['Slight']

r = sorted_df['Time_f']

sum12 = bar1+bar2

plt.figure(figsize=(10,6))
plt.title('Time of the day vs Accidents')
plt.xticks(rotation = 50, horizontalalignment="right")

plt.xlabel("Time of the day")
plt.ylabel("Accident counts")
plt.bar(r,bar1,color='red', label='Fatal')
plt.bar(r,bar2, bottom=bar1,color='orange', label='Serious')
plt.bar(r,bar3,bottom=sum12,color='green',label='Slight')
plt.legend(loc="upper right")
plt.savefig("Images/Accident_distribution_time_of_day_hrs.png", bbox_inches = "tight")
plt.show()


In [28]:
year_group = time_df.groupby(["Year","Severity"])
year_group_count = year_group["Severity"].count().unstack()
year_group_count = year_group_count.sort_values(by=["Fatal","Serious","Slight"], ascending=False)
year_group_count = year_group_count[["Slight","Serious","Fatal"]]
year_group_count.plot.bar(stacked=True, color={"Fatal": "red", "Serious": "orange", "Slight":"green"})
# plt.figure(figsize=(10,6))
plt.title('Years vs Accidents')
plt.xticks(rotation = 0, horizontalalignment="right")

plt.xlabel("Years")
plt.ylabel("Accident counts")
#print(tod_group_count)
plt.savefig("Images/Accident_distribution_Years.png", bbox_inches = "tight")


In [29]:
accident_sev_yr = time_df.groupby(['Year','Severity'])['Index'].nunique().unstack().reset_index()
sorted_df=accident_sev_yr.sort_values(by=['Fatal','Serious','Slight'], ascending=False)

bar1 = sorted_df['Fatal']
bar2 = sorted_df['Serious']
bar3 = sorted_df['Slight']

r = sorted_df['Year']

sum12 = bar1+bar2

plt.figure(figsize=(10,6))
plt.title('Year vs Accidents')
plt.xticks(rotation = 50, horizontalalignment="right")

plt.xlabel("Year")
plt.ylabel("Accident counts")
plt.bar(r,bar1,color='red', label='Fatal')
plt.bar(r,bar2, bottom=bar1,color='orange', label='Serious')
plt.bar(r,bar3,bottom=sum12,color='green',label='Slight')
plt.legend(loc="upper left")
plt.savefig("Images/Accident_distribution_year.png", bbox_inches = "tight")
plt.show()


In [33]:
all_df = when_df

all_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1077653 entries, 0 to 1077652
Data columns (total 33 columns):
 #   Column                       Non-Null Count    Dtype  
---  ------                       --------------    -----  
 0   Unnamed: 0                   1077653 non-null  int64  
 1   Accident_Index               1077653 non-null  object 
 2   1st_Road_Class               1077653 non-null  object 
 3   Accident_Severity            1077653 non-null  object 
 4   Year                         1077653 non-null  int64  
 5   Date                         1077653 non-null  object 
 6   Day_of_Week                  1077653 non-null  object 
 7   Latitude                     1077653 non-null  float64
 8   Light_Conditions             1077653 non-null  object 
 9   Local_Authority_(District)   1077653 non-null  object 
 10  Longitude                    1077653 non-null  float64
 11  Number_of_Casualties         1077653 non-null  int64  
 12  Number_of_Vehicles           1077653 non-n

In [34]:
accident_sev_dist = all_df.groupby(['Local_Authority_(District)','Accident_Severity'])['Accident_Index'].nunique().unstack().reset_index()
sorted_df=accident_sev_dist.sort_values(by=['Fatal','Serious','Slight'], ascending=True)

bar1 = sorted_df['Fatal']
#bar2 = sorted_df['Serious']
#bar3 = sorted_df['Slight']

r = sorted_df['Local_Authority_(District)']

#sum12 = bar1+bar2

plt.figure(figsize=(30,80))
plt.title('Local Authority (District) vs Fatal Accidents')
plt.xticks(rotation = 0, horizontalalignment="right")

plt.xlabel("Number of Fatal accidents")
plt.ylabel("Local_Authority_(District)")
plt.barh(r,bar1,color='red', label='Fatal')
#plt.barh(r,bar2, bottom=bar1,color='green', label='Serious')
#plt.barh(r,bar3,bottom=sum12,color='orange',label='Slight')
plt.legend(loc="upper right")
plt.savefig("Images/Fatal_Accident_distribution_Local_Authority_(District).png", bbox_inches = "tight")
plt.show()



In [35]:
accident_sev_dist = all_df.groupby(['Local_Authority_(District)','Accident_Severity'])['Accident_Index'].nunique().unstack().reset_index()
sorted_df=accident_sev_dist.sort_values(by=['Serious','Slight'], ascending=True)

bar1 = sorted_df['Serious']
#bar2 = sorted_df['Serious']
#bar3 = sorted_df['Slight']

r = sorted_df['Local_Authority_(District)']

#sum12 = bar1+bar2

plt.figure(figsize=(30,80))
plt.title('Local Authority (District) vs Serious Accidents')
plt.xticks(rotation = 0, horizontalalignment="right")

plt.xlabel("Number of Serious accidents")
plt.ylabel("Local_Authority_(District)")
plt.barh(r,bar1,color='orange', label='Serious')
#plt.barh(r,bar2, bottom=bar1,color='green', label='Serious')
#plt.barh(r,bar3,bottom=sum12,color='orange',label='Slight')
plt.legend(loc="upper right")
plt.savefig("Images/Serious_Accident_distribution_Local_Authority_(District).png", bbox_inches = "tight")
plt.show()


In [36]:
accident_sev_dist = all_df.groupby(['Local_Authority_(District)','Accident_Severity'])['Accident_Index'].nunique().unstack().reset_index()
sorted_df=accident_sev_dist.sort_values(by=['Slight'], ascending=True)

bar1 = sorted_df['Slight']
#bar2 = sorted_df['Serious']
#bar3 = sorted_df['Slight']

r = sorted_df['Local_Authority_(District)']

#sum12 = bar1+bar2

plt.figure(figsize=(30,80))
plt.title('Local Authority (District) vs Slight Accidents')
plt.xticks(rotation = 0, horizontalalignment="right")

plt.xlabel("Number of Slight accidents")
plt.ylabel("Local_Authority_(District)")
plt.barh(r,bar1,color='green', label='Slight')
#plt.barh(r,bar2, bottom=bar1,color='green', label='Serious')
#plt.barh(r,bar3,bottom=sum12,color='orange',label='Slight')
plt.legend(loc="upper right")
plt.savefig("Images/Slight_Accident_distribution_Local_Authority_(District).png", bbox_inches = "tight")
plt.show()
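
The three district charts above repeat the same steps for each severity and draw every district. A small helper can express the shared logic and limit the chart to the busiest districts. This is only a sketch on toy data: the function name `top_districts`, its parameters, and the "District B" and "District C" labels are hypothetical, while the column names are the ones used above.

```python
import pandas as pd

def top_districts(df, severity, n=10):
    """Unique accident counts of one severity for the n busiest districts."""
    counts = (
        df[df["Accident_Severity"] == severity]
        .groupby("Local_Authority_(District)")["Accident_Index"]
        .nunique()
    )
    # Ascending order puts the largest bar at the top of a horizontal bar chart.
    return counts.nlargest(n).sort_values()

# Toy rows; accident A1 appears twice because two vehicles were involved.
toy = pd.DataFrame({
    "Accident_Index": ["A1", "A1", "A2", "A3", "A4", "A5"],
    "Local_Authority_(District)": [
        "Kensington and Chelsea", "Kensington and Chelsea", "Kensington and Chelsea",
        "District B", "District C", "District B",
    ],
    "Accident_Severity": ["Slight", "Slight", "Slight", "Slight", "Fatal", "Serious"],
})

slight_top = top_districts(toy, "Slight", n=2)
print(slight_top)
```

The returned series can be passed to `plt.barh(slight_top.index, slight_top.values)` in place of the full district list.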


In [37]:
accident_sev_age = all_df.groupby(['Age_Band_of_Driver','Accident_Severity'])['Accident_Index'].nunique().unstack().reset_index()
sorted_df=accident_sev_age.sort_values(by=['Fatal','Serious','Slight'], ascending=False)

bar1 = sorted_df['Fatal']
bar2 = sorted_df['Serious']
bar3 = sorted_df['Slight']

r = sorted_df['Age_Band_of_Driver']

sum12 = bar1+bar2

plt.figure(figsize=(10,6))
plt.title('Age Band vs Accidents')
plt.xticks(rotation = 50, horizontalalignment="right")

plt.xlabel("Age Band")
plt.ylabel("Accident counts")
plt.bar(r,bar1,color='red', label='Fatal')
plt.bar(r,bar2, bottom=bar1,color='orange', label='Serious')
plt.bar(r,bar3,bottom=sum12,color='green',label='Slight')
plt.legend(loc="upper right")
plt.savefig("Images/Accident_distribution_Age_Band.png", bbox_inches = "tight")
plt.show()


In [38]:
accident_sev_type = all_df.groupby(['Vehicle_Type','Accident_Severity'])['Accident_Index'].nunique().unstack().reset_index()
sorted_df=accident_sev_type.sort_values(by=['Fatal','Serious','Slight'], ascending=False)

bar1 = sorted_df['Fatal']
bar2 = sorted_df['Serious']
bar3 = sorted_df['Slight']

r = sorted_df['Vehicle_Type']

sum12 = bar1+bar2

plt.figure(figsize=(10,6))
plt.title('Vehicle Type vs Accidents')
plt.xticks(rotation = 50, horizontalalignment="right")

plt.xlabel("Vehicle_Type")
plt.ylabel("Accident counts")
plt.bar(r,bar1,color='red', label='Fatal')
plt.bar(r,bar2, bottom=bar1,color='orange', label='Serious')
plt.bar(r,bar3,bottom=sum12,color='green',label='Slight')
plt.legend(loc="upper right")
plt.savefig("Images/Accident_distribution_Vehicle_Type.png", bbox_inches = "tight")
plt.show()


In [21]:
# Count records per time bin and severity, then keep only the fatal column
accident_counts = time_df.groupby(['Time_Bin', 'Severity']).size()
accident_counts = accident_counts.unstack('Severity')
acc_counts_dropped = accident_counts[['Fatal']].copy()
# Share of all fatal records that falls into each time bin
total = acc_counts_dropped['Fatal'].sum()
acc_counts_dropped['Percent Fatality'] = round(acc_counts_dropped['Fatal']/total*100, 2)
final_acc_df = acc_counts_dropped.reset_index()
# Set the font size before drawing so that it applies to this figure
plt.rcParams['font.size'] = 13
plt.figure(figsize=(8,6))
# One explode offset per slice
explode = [0.05] * len(final_acc_df)
my_data = acc_counts_dropped['Percent Fatality'].tolist()
my_labels = final_acc_df['Time_Bin'].tolist()
plt.pie(my_data, labels=my_labels, autopct='%1.1f%%', explode=explode)
plt.title('Accident Fatality Percentage by Time of Day')
plt.savefig("Images/Accident_Fatality_%_by_Time_of_Day.png", bbox_inches = "tight")
plt.show()
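
The percentage plotted above is each time bin's share of all fatal records, which is different from the proportion of records within a bin that are fatal. The sketch below computes both on invented toy rows so the distinction is visible; `share_of_fatal` and `fatal_rate` are illustrative names.

```python
import pandas as pd

# Toy rows: six day-time records (one fatal) and four late-evening records (two fatal).
toy = pd.DataFrame({
    "Time_Bin": ["Day Time"] * 6 + ["Late PM"] * 4,
    "Severity": ["Fatal", "Slight", "Slight", "Slight", "Slight", "Slight",
                 "Fatal", "Fatal", "Slight", "Slight"],
})

# Rows are time bins, columns are severities.
counts = pd.crosstab(toy["Time_Bin"], toy["Severity"])

# Share of all fatal records that falls into each bin (what the pie chart shows).
share_of_fatal = counts["Fatal"] / counts["Fatal"].sum() * 100

# Proportion of each bin's records that are fatal.
fatal_rate = counts["Fatal"] / counts.sum(axis=1) * 100

print(share_of_fatal.round(1))
print(fatal_rate.round(1))
```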
