# Road Death Analysis
## A road death or fatality is a person who dies within 30 days of a crash as a result of injuries received in that crash.

Australian Road Deaths Database https://www.bitre.gov.au/statistics/safety/fatal_road_crash_database

Australia Population Data https://www.abs.gov.au/ausstats/abs@.nsf/0/D56C4A3E41586764CA2581A70015893E?Opendocument https://www.abs.gov.au/ausstats/abs@.nsf/0/632CDC28637CF57ECA256F1F0080EBCC?Opendocument https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3101.0Dec%202017?OpenDocument https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3101.0Dec%202018?OpenDocument https://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/3101.0Dec%202019?OpenDocument

Australia Breathe Testing Data https://data.gov.au/data/dataset/australian-random-breath-testing/resource/6c5cbea3-79dc-40b9-9775-49521a57eacb

Australian Crash Data https://data.gov.au/dataset/ds-sa-21386a53-56a1-4edf-bd0b-61ed15f10acf/details?q=

In [1]:
import folium
import pandas as pd
import json
import numpy as np
import glob
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.express as px
import datetime
import calendar

In [2]:
# given the cvs filename read the csv data
def read_file(filename):
    return pd.read_csv(filename, sep=',', error_bad_lines=False, index_col=False, dtype='unicode')

data_roaddeaths = read_file('ardd_fatalities.csv')
data_roaddeaths.head()

Unnamed: 0,Crash ID,State,Month,Year,Dayweek,Time,Crash Type,Bus Involvement,Heavy Rigid Truck Involvement,Articulated Truck Involvement,...,Age,National Remoteness Areas,SA4 Name 2016,National LGA Name 2017,National Road Type,Christmas Period,Easter Period,Age Group,Day of week,Time of day
0,20205023,WA,5,2020,Wednesday,16:10,Multiple,No,No,No,...,88,,,,,No,No,75_or_older,Weekday,Day
1,20201099,NSW,5,2020,Saturday,23:25,Multiple,No,No,No,...,25,Major Cities of Australia,Sydney - Outer South West,Campbelltown,National or State Highway,No,No,17_to_25,Weekend,Night
2,20207006,NT,5,2020,Sunday,12:00,Single,No,No,No,...,47,,,,,No,No,40_to_64,Weekend,Day
3,20205049,WA,5,2020,Friday,19:45,Multiple,No,No,No,...,0,,,,,No,No,0_to_16,Weekend,Night
4,20205049,WA,5,2020,Friday,19:45,Multiple,No,No,No,...,21,,,,,No,No,17_to_25,Weekend,Night


In [3]:
data_pop = read_file('bitre_enforcement_data-rbt.csv')
data_pop.head()

Unnamed: 0,Year,State,RBT conducted,Positive RBT,Licences,Number of drivers and motorcycle riders killed with a blood alcohol concentration (BAC) above the legal limit,Number of deaths from crashes involving a driver or motorcycle rider with a blood alcohol concentration (BAC) above the legal limit
0,2008,NSW,4204525,27368,,58,78
1,2009,NSW,4440862,26595,,68,94
2,2010,NSW,4637033,24411,4791490.0,48,74
3,2011,NSW,4520010,22117,4893688.0,52,70
4,2012,NSW,4735462,19982,4984973.0,44,56


In [169]:
data_pop = read_file('apbs_population.csv')
data_pop.head()

Unnamed: 0,State,1989,1990,1991,1992,1993,1994,1995,1996,1997,...,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,NSW,5776283,5834021,5898731,5957822,5995055,6044819,6105560,6176461,6246267,...,7144292,7218529,7304244,7404032,7508353,7616168,7732858,7915069,8046070,8128984
1,Vic,4320164,4378592,4420373,4450217,4462766,4472989,4497660,4534984,4569297,...,5461101,5537817,5651091,5772669,5894917,6022322,6173172,6385849,6526413,6651074
2,Qld,2827637,2899283,2960951,3023198,3096185,3166566,3237380,3303192,3355417,...,4404744,4476778,4568687,4652824,4719653,4777692,4845152,4965033,5052827,5129996
3,SA,1419029,1432056,1446299,1455442,1458632,1463089,1465340,1469079,1475658,...,1627322,1639614,1656725,1671488,1686945,1700668,1712843,1728053,1742744,1759184
4,WA,1578434,1613049,1636067,1658544,1678722,1704649,1736066,1768206,1798341,...,2290845,2353409,2425507,2486944,2517608,2540672,2555978,2584768,2606338,2639080


In [5]:
all_filenames = [i for i in glob.glob("*_DATA_SA_Crash.csv")]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
combined_csv.to_csv( "combined_crash.csv", index=False, encoding='utf-8-sig')

data_crash = read_file('combined_crash.csv')
data_crash.head()

Unnamed: 0,REPORT_ID,Stats Area,Suburb,Postcode,LGA Name,Total Units,Total Cas,Total Fats,Total SI,Total MI,...,Crash Type,Unit Resp,Entity Code,CSEF Severity,Traffic Ctrls,DUI Involved,Drugs Involved,ACCLOC_X,ACCLOC_Y,UNIQUE_LOC
0,2018-1-17/01/2020,2 Metropolitan,MITCHELL PARK,5043,CC MARION.,4,0,0,0,0,...,Right Angle,2,Driver Rider,1: PDO,No Control,,,1324362.05,1662130.48,13243621662130.0
1,2018-2-17/01/2020,2 Metropolitan,GLANVILLE,5015,CITY OF PORT ADELAIDE ENFIELD,2,0,0,0,0,...,Rear End,2,Driver Rider,1: PDO,No Control,,,1319117.45,1679740.62,13191171679741.0
2,2018-3-17/01/2020,2 Metropolitan,GOLDEN GROVE,5125,CITY OF TEA TREE GULLY,2,0,0,0,0,...,Right Angle,1,Driver Rider,1: PDO,Traffic Signals,,,1337889.71,1685361.47,13378901685361.0
3,2018-4-17/01/2020,2 Metropolitan,ELIZABETH SOUTH,5112,CITY OF PLAYFORD.,2,4,0,1,3,...,Right Turn,1,Driver Rider,3: SI,Traffic Signals,,,1334568.88,1691271.22,13345691691271.0
4,2018-5-17/01/2020,2 Metropolitan,CROYDON,5008,CITY OF CHARLES STURT,2,3,0,0,3,...,Right Turn,2,Driver Rider,2: MI,Traffic Signals,,,1325517.03,1673428.59,13255171673429.0


In [6]:
all_filenames = [i for i in glob.glob("*_DATA_SA_Casualty.csv")]
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames ])
combined_csv.to_csv( "combined_casualty.csv", index=False, encoding='utf-8-sig')

data_casualty = read_file('combined_casualty.csv')
data_casualty.head()

Unnamed: 0,REPORT_ID,UND_UNIT_NUMBER,CASUALTY_NUMBER,Casualty Type,Sex,AGE,Position In Veh,Thrown Out,Injury Extent,Seat Belt,Helmet,Hospital
0,2012-17-21/08/2019,2,1,Driver,Male,36,Driver,Not Thrown Out,By Private,Fitted - Worn,,
1,2012-21-21/08/2019,2,1,Rider,Male,31,,Thrown Out,By Private,,Worn,
2,2012-24-21/08/2019,2,1,Passenger,Female,33,Front Seat Left Passenger,Not Thrown Out,By Private,Fitted - Worn,,
3,2012-26-21/08/2019,2,1,Driver,Female,51,Driver,Not Thrown Out,By Private,Fitted - Worn,,
4,2012-33-21/08/2019,2,1,Driver,Female,56,Driver,Not Thrown Out,By Private,Fitted - Worn,,


# Road Deaths and fatality rate per 100,000 population from 1989 to 2019 in Australia

In [7]:
data_rate = read_file('ardd_fatalities.csv')

# Count the total number for each year except 2020
years_total = {}
for year in data_rate["Year"]:
    if str(year) != "2020":
        if year not in years_total:
            years_total[year] = 1
        else:
            years_total[year] += 1
            
#Calculate the fatality rate for each year
data_pop = read_file('apbs_population.csv')

years_rate = {}
index_for_total = 8
TEN_THS = 100000

for key,value in years_total.items():
    years_rate[key] = round(int(value) / int(data_pop[key][index_for_total]) * TEN_THS,3)

# convert to dictionary
re_dic = {"year":[],"number":[], "rate":[]}
for key,value in years_total.items():
    re_dic['year'].append(key)
    re_dic["number"].append(value)
    re_dic["rate"].append(years_rate[str(key)])

# convert to csv
yearly_states_total_data = pd.DataFrame(data=re_dic)
yearly_states_total_data.head()

Unnamed: 0,year,number,rate
0,2019,1194,4.678
1,2018,1134,4.504
2,2017,1222,4.933
3,2016,1292,5.341
4,2015,1204,5.055


In [34]:
# Draw the graph
fig_rate = make_subplots(specs=[[{"secondary_y": True}]])

fig_rate.add_trace(
    go.Bar(
        x=yearly_states_total_data.year,
        y=yearly_states_total_data.number, 
        name="Road Deaths",
        opacity=1,
        marker=dict(color=yearly_states_total_data.number, coloraxis="coloraxis", cmax=2800, cmin=0,),
        ),
    secondary_y=False
)

fig_rate.update_layout(coloraxis=dict(colorscale=['#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494','#081d58']), showlegend=False)

fig_rate.add_trace(
    go.Scatter(
        x=yearly_states_total_data.year,
        y=yearly_states_total_data.rate, 
        name="Rate/pop", #change
        line=dict(width=1.5*2, color='darkred',dash='dash'), #change
        showlegend=True
        ),
    secondary_y=True,
)

fig_rate.update_layout(
    title_text="<b>Road deaths and fatality rate per 100,000 population from 1989 to 2019 in Australia<b>",
    title_x=0.5,
    plot_bgcolor='white',
    xaxis_title_text="<b>Year<b>",
)

fig_rate.update_yaxes(title_text="<b>Total number of road deaths</b>", secondary_y=False)
fig_rate.update_yaxes(title_text="<b>Fatality rate/100,000 population", secondary_y=True)

fig_rate.show()

Comment:
Total number of road deaths clearly shows a decrasing trend in Australia from 1989 to 2019. The decrease in 1990-1995 can be attributed to the Road Safety 2000 Strategy, and some decreases also correspond to breath testing. Then the sudden decrease in 2008 is related to the economic crisis. Meanwhile, the fatality rate per 100,000 population further demonstrates the effort that givernment has made in regulation and road safety designs to reduce the death number.

# Road Deaths by State from 1989 to 2019 in Australia

In [9]:
data_states = read_file('ardd_fatalities.csv')

# Count the total number for each year and each state except 2020
states_total = {}
for year,states in zip(data_states["Year"],data_states["State"]):
    if str(year) != "2020":
        if year not in states_total:
            states_total[year] = {}
        if states not in states_total[year]:
            states_total[year][states] = 1
        else:
            states_total[year][states] += 1

# convert to dictionary
states_total_data = {}
for year in states_total: 
    if not bool(states_total_data):
        states_total_data["year"] = []
        states_total_data["state"] = []
        states_total_data["number"] = []
    for states,values in states_total[year].items():
        states_total_data["year"].append(year)
        states_total_data["state"].append(states)
        states_total_data["number"].append(values)

# convert to csv
states_total_data = pd.DataFrame(data=states_total_data)
states_total_data.head()

Unnamed: 0,year,state,number
0,2019,Vic,270
1,2019,SA,114
2,2019,WA,163
3,2019,NSW,357
4,2019,Qld,219


In [35]:
# create a list with all states name
data = read_file('ardd_fatalities.csv')
states_total={}
for states in data["State"]:
    if states not in states_total:
        states_total[states] = 1
    else:
        states_total[states] += 1
states_list = sorted(states_total.keys(), key=lambda k: states_total[k])


# from the cvsdata reanage
def get_cvsdata(name_list, cvs_data):
    traceCVS_list = []
    for states in name_list:
        traceCVS_list.append(cvs_data[cvs_data['state'].isin([states])])
    return  traceCVS_list

traceCVS_list = get_cvsdata(states_list, states_total_data)

def get_traceScatter(traceCVS_list, y_name, x_name, name_list, color_list):
    traceScatter_list=[]
    index_color = 0
    for trace,state in zip(traceCVS_list, name_list):
        traceScatter_list.append(go.Scatter(
            x=trace[x_name][0:],
            y=trace[y_name][0:],
            mode='lines',
            name=state,  
            line=dict(width=1.5, color=color_list[index_color]), #change the color 'color = color in the color list'
            stackgroup='one'
        
        ))
        index_color += 1
    return  traceScatter_list


# from the cvsdata -> frames data
year_range = 30
frames = [dict(data = [
    dict(type='scatter', x=trace['year'][k:],y=trace['number'][k:])
    for trace in traceCVS_list],
    traces=[i for i in range(len(states_list))],
)for k in range(year_range, -1, -1)]


# Draw the graph
layout = go.Layout(
                title="<b>Road deaths by state from 1989 to 2019 in Australia<b>",
                xaxis_title="<b>Year</b>", title_x=0.5,
                yaxis_title="<b>Total number of road deaths<b>",
                plot_bgcolor= 'white',
                showlegend=True,
                hovermode='x unified',
                updatemenus=[
                    dict(
                        type='buttons', 
                        showactive=True,
                        y=1.2,
                        x=1.2,
                        xanchor='right',
                        yanchor='top',
                        pad=dict(t=0, r=10),
                        buttons=[dict(label='Play',
                        method='animate',
                        args=[
                            None, 
                            dict(frame=dict(duration=200, 
                                redraw=False),
                                transition=dict(duration=0),
                                fromcurrent=True,
                                mode='immediate')]
                        )]
                    )
                ]
            )


color_list = ['#ffffd9','#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494']
traceScatter_list = get_traceScatter(traceCVS_list, 'number', 'year', states_list, color_list)

fig = go.Figure(data=traceScatter_list, frames=frames, layout=layout)
fig.show()

Comment: Almost all of the states follows the decreasing trend of road deaths numbers. It can also be seen that NSW and Vic account for half of Australian total deaths numbers.

# Average Road Death per year by State

In [11]:
states_name_list = ['NSW', 'Vic', 'Qld','SA','WA','Tas','NT','ACT']
id_list = [i for i in range(8)]

# Calculate the fatality rate per 100,000 population
def c_fatality_rate(road_death, pop):
    return int(road_death)/int(pop.replace(',','')) * 100000

# Calculate the total road death for each state
def statestotal_eachyear(year):
    data_death=read_file('ardd_fatalities.csv')
    states_total={}
    for states, years in zip(data_death["State"],data_death["Year"]):
        if str(year) != "2020":
            if year==None:
                if states not in states_total:
                    states_total[states] = 1
                else:
                    states_total[states] += 1
            elif years==year:
                if states not in states_total:
                    states_total[states] = 1
                else:
                    states_total[states] += 1 
    return states_total

# Match the state id
def get_state_id(states_name_list, state_name):
    ids = 0
    for i in states_name_list: 
        if i == state_name:
            return ids
        ids+=1
        
# Match year and state
def get_year_state_pop(year, state_id, data_pop):
    return data_pop[str(year)][state_id]

# Get the state fatality rate for each year
def calculate_stack(roaddeath_filename, pop_filename, states_name_list, id_list):
    data_pop = read_file(pop_filename)
    year_start=1989
    year_end=2020
    states_fatality_rate_each_year=statestotal_eachyear(str(year_start))
    while year_start != year_end: 
        temp=statestotal_eachyear(str(year_start))
        ids=0
        for key,value in temp.items():  
            pop = get_year_state_pop(year_start,get_state_id(states_name_list, key), data_pop)
            if str(year_start)=='1989':
                states_fatality_rate_each_year[key] = c_fatality_rate(value,pop)
            else:
                states_fatality_rate_each_year[key] +=  c_fatality_rate(value,pop)
            ids += 1
        year_start += 1
    return states_fatality_rate_each_year

# Average the value by year
def calculate_ave(stack_dic, year_total):
    for key,value in stack_dic.items(): 
        stack_dic[key] = round(value/year_total,3)
    return stack_dic

# Convert to dictionary and csv
def covert_dic_csv(states_name_list, dic_data, string_Col):
    re_dic = {"state_name":[],"state_id":[], string_Col:[]}
    for key,value in dic_data.items():
        re_dic['state_name'].append(key)
        re_dic["state_id"].append(get_state_id(states_name_list, key))
        re_dic[string_Col].append(value)
    yearly_states_average_data = pd.DataFrame(data=re_dic)
    return yearly_states_average_data

# Add jason file
def add_newinfinjson(csv_data, column_name, newkey_name):
    with open('australian-states.json') as f:
        data = json.load(f)
    for id_s in data['features']:
        check_id = str(id_s['id']) 
        for check,Average in zip(csv_data['state_id'], csv_data[column_name]):
            if check_id == str(check):
                id_s['properties'][newkey_name] = id_s['properties']['STATE_NAME'] + ' : ' + str(Average)  
    return data

In [12]:
states_total = statestotal_eachyear(None)
states_total = calculate_ave(states_total, 31)
yearly_states_average_data = covert_dic_csv(states_name_list, states_total, "Average")
yearly_states_average_data.head()

Unnamed: 0,state_name,state_id,Average
0,WA,4,195.387
1,NSW,0,514.161
2,NT,6,51.387
3,Vic,1,363.935
4,Qld,2,325.871


In [36]:
# Draw the graph
fig_state = px.bar(yearly_states_average_data, x='state_name', y='Average',
             color='Average',
             color_continuous_scale=px.colors.sequential.YlGnBu,
             labels={'state_name':'State','Average':''}
                  )

fig_state.update_layout(
    title_text="<b>Average road deaths per year by state in Australia<b>",
    plot_bgcolor='white',
    xaxis_title_text="<b>State<b>",
    yaxis_title_text="<b>Total number of road deaths<b>",title_x=0.5
)

fig_state.show()


# Draw the map
data = add_newinfinjson(yearly_states_average_data, 'Average', 'Average death per year')

Australia_Location = [-25.274398, 133.775136]
m = folium.Map(Australia_Location, zoom_start=4)
folium.TileLayer('cartodbpositron').add_to(m)
choropleth = folium.Choropleth(
    geo_data=data,    # this data is for shpae of states 
    name='choropleth',
    data=yearly_states_average_data, # this data is for the toatal numbers for each states
    columns=['state_id', 'Average'], 
    key_on='feature.id',  # this is the key that indefine states
    fill_color='YlGnBu', # set the color of the map YlGnBu
    fill_opacity=1,
    line_opacity=1,
    legend_name='yearly road death number', 
    bins=np.arange(0,600,60),
).add_to(m)

choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['Average death per year'])
)
m

Comment: On average, more than 500 people die from car accidents per year in NSW, which is the highest figure among all the states. By contrast, ACT and NT has the lowest average road death number of around 15 and 50 respectively per year. The road deaths numbers in Victoria and Queensland are also considerably large, with around 350 per year. When it comes to SA and WA, both of them has an average of approximate 200 per year.

# Average fatality rate per 100,000 population per year by state

In [18]:
stacked_dict = calculate_stack('ardd_fatalities.csv', 'apbs_population.csv', states_name_list, id_list)
states_average = calculate_ave(stacked_dict,31)
yearly_states_rate_data = covert_dic_csv(states_name_list, states_average, 'rate')
yearly_states_rate_data.head()

Unnamed: 0,state_name,state_id,rate
0,SA,3,9.237
1,Vic,1,7.343
2,ACT,7,4.597
3,Qld,2,8.797
4,WA,4,9.739


In [19]:
# Draw the graph
fig_state_rate = px.bar(yearly_states_rate_data, x='state_name', y='rate',
             color='rate',
             color_continuous_scale=px.colors.sequential.YlGnBu,
             labels={'state_name':'State','rate':''}
                  )

fig_state_rate.update_layout(
    title_text="<b>Average fatality rate per 100,000 population per year by state in Australia<b>",
    plot_bgcolor='white',
    xaxis_title_text="<b>State<b>",
    yaxis_title_text="<b>Fatality rate per 100,000 population<b>",title_x=0.5
)

fig_state_rate.show()

# Draw the map
data = add_newinfinjson(yearly_states_rate_data, 'rate', 'Average fatality rate per 100,000 population per year')  

m = folium.Map(Australia_Location, zoom_start=4)
folium.TileLayer('cartodbpositron').add_to(m)
choropleth = folium.Choropleth(
    geo_data=data,    # this data is for shpae of states 
    name='choropleth',
    data=yearly_states_rate_data, # this data is for the toatal numbers for each states
    columns=['state_id', 'rate'], 
    key_on='feature.id',  # this is the key that indefine states
    fill_color='YlGnBu', # set the color of the map YlGnBu
    fill_opacity=1,
    line_opacity=1,
    legend_name='yearly fatality rate per 100,000 population', 
    bins=np.arange(3,33,3),
).add_to(m)


choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['Average fatality rate per 100,000 population per year'])
)
m

Comment: However, by normalizing the road deaths by the population for each state and each year, it can be seen that NT ranked first with 25 in 100,000 people die from road accidents per year, while the rest of states has an value below 10.

# Why NT has the highest fatality rate?

In [20]:
data_test = read_file('bitre_enforcement_data-rbt.csv')
data_test.head()

# Set the correct format
def divided_by(string1, string2):
    return int(str(string1).replace(',', '')) / int(str(string2).replace(',', ''))

# Set the divide function
def divided_dict(dict1, dict2):
    for key,value in dict1.items():
        dict1[key] = round(value/dict2[key],5)
        
# Calculate the average rate for each state
def get_total_untis(data):
    states_untis={}
    states_occurs={}
    for state,untis,RBT in zip(data['State'], data[' RBT conducted '],data[' Positive RBT ']):
        if str(untis) != 'nan' and str(RBT) != 'nan':
            if state not in states_untis:
                states_untis[state] = divided_by(RBT, untis)
                states_occurs[state] = 1
            else:
                states_untis[state] += divided_by(RBT, untis)
                states_occurs[state] += 1
    divided_dict(states_untis,states_occurs)
    return states_untis

# convert to dictionary
re_dic = {"state":[], "rate":[]}
for key,value in get_total_untis(data_test).items():
    re_dic['state'].append(key)
    re_dic['rate'].append(value)

# convert to csv
test_state_data = pd.DataFrame(data=re_dic)
test_state_data.head()

Unnamed: 0,state,rate
0,NSW,0.00433
1,Vic,0.00198
2,Qld,0.00745
3,SA,0.01404
4,WA,0.01281


In [37]:
# draw the graph
fig_state_breath = px.bar(test_state_data, x='state', y='rate',
             color='rate',
             color_continuous_scale=['#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494','#081d58'],
             labels={'state_name':'State','rate':''}
                  )
    
fig_state_breath.update_layout(
    title_text="<b>Positive breath testing rate by state in Australia<b>",
    plot_bgcolor='white',
    xaxis_title_text="<b>State<b>",
    yaxis_title_text="<b>Positive breath testing rate<b>",title_x=0.5
)

fig_state_breath.show()


Comment: The most common reason for road accidents is drunk driving, so another dataset about postive breath testing is explored. The result matches our initial guess that the main reason for NT has an extremely high road fatality rate is because of alcohol. Thus, the advice for givernment would be putting more effort on alcohol testing on road in order to reduce the death rate. 

# Road deaths by remoteness area and crash type in Australia

In [221]:
data_remote = read_file('ardd_fatalities.csv')

# Count the total number for each year and each state except 2020
remote_total = {}
for remote,crash in zip(data_remote["National Remoteness Areas"],data_remote["Crash Type"]):
    if str(crash) != "" and str(crash) != "nan" and str(remote) != "nan":
        if remote not in remote_total:
            remote_total[remote] = {}
        if crash not in remote_total[remote]:
            remote_total[remote][crash] = 1
        else:
            remote_total[remote][crash] += 1

# convert to dictionary
remote_crash = {}
for remote in remote_total: 
    if not bool(remote_crash):
        remote_crash["remote"] = []
        remote_crash["crash"] = []
        remote_crash["number"] = []
    for crash,values in remote_total[remote].items():
        remote_crash["remote"].append(remote)
        remote_crash["crash"].append(crash)
        remote_crash["number"].append(values)

# convert to csv
remote_crash_data = pd.DataFrame(data=remote_crash)
remote_crash_data.head()

Unnamed: 0,remote,crash,number
0,Major Cities of Australia,Multiple,890
1,Major Cities of Australia,Single,604
2,Major Cities of Australia,Pedestrian,407
3,Inner Regional Australia,Single,879
4,Inner Regional Australia,Multiple,800


In [222]:
fig_remote = px.histogram(remote_crash_data, 
                   x="remote", 
                   y="number", 
                   color='crash',  
                   barmode="group",
                   color_discrete_sequence=['#225ea8','#7fcdbb','#c7e9b4'],
                   opacity=0.9
                  )

fig_remote.update_layout(
    title_text="<b>Road deaths by remoteness area and crash type in Australia<b>", 
    plot_bgcolor='white', 
    xaxis_title_text="<b>Remoteness area<b>",
    yaxis_title_text="<b>Total number of road deaths<b>",
    legend_title_text="",
    title_x=0.5,
    bargap=0.1,
    bargroupgap=0.1 
)

fig_remote.show()

Comment: It is clear that major cities of Australia has more road deaths than the other three remoteness areas, with the highest multiple (890) and pedestrian (407) crash types in this region. Meanwhile, the highest road deaths number for single crashes lies in inner regional Australia, of 807 in total. By contrast, remote and very remote Australia have the lowest road deaths numbers of below 200 for all the crash types.

# Road Deaths Distribution by month and time in Australia

In [76]:
data_month = read_file('ardd_fatalities.csv')

def c_stirng_time(string):
    time_obj = datetime.datetime.strptime(string, '%H:%M')
    return time_obj

def create_timezone():
    begin_t = '00:00'
    return [c_stirng_time(begin_t) + datetime.timedelta(hours=i) for i in range(0,27,3)]

def set_time():
    return {'00:00-02:59':0, '03:00-05:59':0, '06:00-08:59':0, '09:00-11:59':0, '12:00-14:59':0,'15:00-17:59':0, '18:00-20:59':0, '21:00-23:59':0}

def find_zone(time_string):
    timezone_string = set_time()
    timezone = create_timezone()
    time = c_stirng_time(time_string)
    if timezone[0] < time < timezone[1]:
        return '00:00-02:59'
    elif timezone[1] < time < timezone[2]:
        return '03:00-05:59'
    elif timezone[2] < time < timezone[3]:
        return '06:00-08:59'
    elif timezone[3] < time < timezone[4]:
        return '09:00-11:59'
    elif timezone[4] < time < timezone[5]:
        return '12:00-14:59'
    elif timezone[5] < time < timezone[6]:
        return '15:00-17:59'
    elif timezone[6] < time < timezone[7]:
        return '18:00-20:59'
    else:
        return '21:00-23:59'

def get_month(intger):
    return calendar.month_name[intger]

def set_month(): 
    temp = {}
    i = 1
    while i <= 12: 
        temp[get_month(i)] = set_time()
        i += 1
    return temp

month_total = set_month()
for month, time, year in zip(data_month['Month'], data_month['Time'],data_month['Year']):
    if str(month) != "-9" and str(time)!= "-9" and str(year) != "2020":
        index = find_zone(time)
        month_total[get_month(int(month))][index] += 1

re_dic = {'month':[], 'time':[], 'number':[]}
for key,value in month_total.items():
    for key2,value2 in value.items(): 
        re_dic['month'].append(key)
        re_dic['time'].append(key2)
        re_dic['number'].append(value2)

month_time_total_data = pd.DataFrame(data=re_dic)
month_time_total_data.head()

Unnamed: 0,month,time,number
0,January,00:00-02:59,375
1,January,03:00-05:59,238
2,January,06:00-08:59,343
3,January,09:00-11:59,479
4,January,12:00-14:59,644


In [77]:
fig_month_time = px.density_heatmap(month_time_total_data, 
                         x="time", 
                         y="month", 
                         z='number',
                         color_continuous_scale=['#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494','#081d58'])

fig_month_time.update_layout(
    title_text="<b>Road Deaths Distribution by month and time in Australia<b>", 
    plot_bgcolor='white', 
    xaxis_title_text="<b>Time<b>",
    yaxis_title_text="<b>Month<b>",
    legend_title_text="",
    title_x=0.5
)

fig_month_time.show()

Comment: When comparing the road deaths number by month and time, it is clear that the trend for each month is almost the same: the most safe time period is 3am to 6am, and then the deaths number increases gradually throughout the day and reaches the highest until 9pm to 0pm.

# Road Deaths Distribution by weekday and time in Australia

In [78]:
data_weekdays = read_file('ardd_fatalities.csv')

def set_weekday(): 
    return {'Monday' : set_time(),'Tuesday' : set_time(),'Wednesday' : set_time(),'Thursday' : set_time(),'Friday' : set_time(),'Saturday' : set_time(),'Sunday' : set_time(), }
    
weekdays_total = set_weekday()
for dayweek, time, year in zip(data_weekdays['Dayweek'], data_weekdays['Time'],data_weekdays['Year']):
    if str(dayweek) != "-9" and str(time)!= "-9" and str(year) != "2020":
        index = find_zone(time)
        weekdays_total[dayweek][index] += 1

re_dic = {'weekday':[], 'time':[], 'number':[]}
for key,value in weekdays_total.items():
    for key2,value2 in value.items(): 
        re_dic['weekday'].append(key)
        re_dic['time'].append(key2)
        re_dic['number'].append(value2)

weekday_time_total_data = pd.DataFrame(data=re_dic)
weekday_time_total_data.head()

Unnamed: 0,weekday,time,number
0,Monday,00:00-02:59,349
1,Monday,03:00-05:59,286
2,Monday,06:00-08:59,638
3,Monday,09:00-11:59,829
4,Monday,12:00-14:59,881


In [79]:
fig_weekday_time = px.density_heatmap(weekday_time_total_data, 
                         x="time", 
                         y="weekday", 
                         z='number',
                         color_continuous_scale=['#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494','#081d58'])

fig_weekday_time.update_layout(
    title_text="<b>Road Deaths Distribution by weekday and time in Australia<b>", 
    plot_bgcolor='white', 
    xaxis_title_text="<b>Time<b>",
    yaxis_title_text="<b>Weekday<b>",
    legend_title_text="",
    title_x=0.5
)

fig_weekday_time.show()

Comment: The highest frequency of road deaths is after 9pm on Friday and Saturday. This could be related to the previous statement of NT has highest fatality rate because of the driving with achohol. Friday and Saturday night is the peak time for people to go to hubs so it is reasonable why these time periods has the highest road death number.
In addition, weekends has a higher number of road deaths for weekdays in general because people drive out for shopping and travelling more frequently.

# Road Deaths Rate if Crashes Happen by Speed Limit in Australia

In [80]:
def set_speeds():
    return {'<40':0, '40-60':0, '60-80':0, '80-100':0, '>=100':0}

def add_speed(speed, classes):
    if str(speed) == '<40' or int(speed) < 40: 
        classes['<40'] += 1
    elif 40 <= int(speed) < 60:
        classes['40-60'] += 1 
    elif 60 <= int(speed) < 80:
        classes['60-80'] += 1 
    elif 80 <= int(speed) < 100:
        classes['80-100'] += 1 
    elif int(speed) >= 100:
        classes['>=100'] += 1 
        
data_speed = read_file('ardd_fatalities.csv')
speed_total = set_speeds()
for speed, year in zip(data_speed["Speed Limit"],data_speed["Year"]):
    if str(speed) != "Unspecified" and str(speed)!= "-9" and 2019>=int(year)>=2012:    
        add_speed(speed, speed_total)      
        
        
data_crash = read_file('combined_crash.csv')
speed_crash_total = set_speeds()
for speed in data_crash["Area Speed"]:
     add_speed(speed, speed_crash_total)

        
ret = dict()
for key, dividend in speed_total.items():
    ret[key] = round(dividend/speed_crash_total.get(key, 1),3)

    
re_dic = {"age":[], "crashes":[], "deaths":[], "rate":[]}
for key,value in speed_total.items():
    re_dic['age'].append(key)
    re_dic['crashes'].append(speed_crash_total[key])
    re_dic['deaths'].append(value)
    re_dic["rate"].append(ret[key])
    
speed_crash_total_data = pd.DataFrame(data=re_dic)
speed_crash_total_data.head()

Unnamed: 0,age,crashes,deaths,rate
0,<40,624,36,0.058
1,40-60,36096,1200,0.033
2,60-80,65061,2225,0.034
3,80-100,13119,1659,0.126
4,>=100,12772,4482,0.351


In [96]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go

#'ffffd9','edf8b1','c7e9b4','7fcdbb','41b6c4','1d91c0','225ea8','#253494'
fig_age = make_subplots(
    rows=1, cols=2,    
    specs=[[{"type": "domain"},{"type": "xy"}]],
    subplot_titles=("Total number of road deaths", "Deaths rate if crashes happen")
)

fig_age.add_trace(go.Pie(
        values=speed_crash_total_data.deaths,
        labels=speed_crash_total_data.age,
        marker_colors=['#edf8b1','#c7e9b4','#7fcdbb','#1d91c0','#225ea8','#253494'],
        textinfo='label+percent',
        ),
        row=1, col=1)

fig_age.add_trace(go.Bar(
        x=speed_crash_total_data.age,
        y=speed_crash_total_data.rate,
        opacity=1,
        marker_color=['#edf8b1','#c7e9b4','#7fcdbb','#1d91c0','#225ea8','#253494'],
        showlegend=False
        ),
        row=1, col=2)

fig_age.update_layout(title_text="<b>Road deaths number and rate if crashes happen by speed limit from 2012 to 2019 in Australia<b>",
                  plot_bgcolor='white',title_x=0.5)

fig_age.show()


Comment: A considerable number of road deaths (47%) can be attributed to driving at more than 100 km/hr. This group also has the highest possibility to die if crashes happen. The group of speed limit in 60-80 and 80-100 both account for around one fifths of total number of road deaths, while the death rate if crashes happen perform differently (3 % and 13% respectively.)

# Road Deaths Rate if Crashes Happen by Road User in Australia

In [97]:
data_user = read_file('ardd_fatalities.csv')

# set the user type
def set_users():
    return {'Driver':0, 'Rider':0, 'Passenger':0, 'Pedestrian':0}

# Calculate the road death for each user type
def add_user(user, classes):
    if str(user) == 'Driver': 
        classes['Driver'] += 1
    elif str(user) == 'Rider' or str(user) =='Pedal cyclist' or str(user) =='Motorcycle pillion passenger' or str(user) =='Motorcycle rider': 
        classes['Rider'] += 1 
    elif str(user) == 'Passenger': 
        classes['Passenger'] += 1 
    elif str(user) == 'Pedestrian':
        classes['Pedestrian'] += 1 
        
# Calculate for road death
users_total = set_users()
for user,year in zip(data_user["Road User"],data_user["Year"]):
    if 2019>=int(year)>=2012 and str(user) != "Other/-9":
        add_user(user, users_total)

# Calculate for road crash
data_crash_user = read_file('combined_casualty.csv')
users_crash_total = set_users()
for user in data_crash_user["Casualty Type"]:
        add_user(user, users_crash_total)

# Calculate for death rate if crash happen
ret = dict()
for key, dividend in users_total.items():
    ret[key] = round(dividend/users_crash_total.get(key, 1),3)

# Convert to dictionary
re_dic = {"user":[], "crashes":[], "deaths":[], "rate":[]}
for key,value in users_total.items():
    re_dic['user'].append(key)
    re_dic['crashes'].append(users_crash_total[key])
    re_dic['deaths'].append(value)
    re_dic["rate"].append(ret[key])

# Convert to csv
users_crash_total_data = pd.DataFrame(data=re_dic)
users_crash_total_data.head()

Unnamed: 0,user,crashes,deaths,rate
0,Driver,31462,4540,0.144
1,Rider,8387,1993,0.238
2,Passenger,10901,1795,0.165
3,Pedestrian,2505,1320,0.527


In [98]:
# Draw the graph
fig_user = make_subplots(
    rows=1, cols=2,    
    specs=[[{"type": "domain"},{"type": "xy"}]],
    subplot_titles=("Total number of road deaths", "Deaths rate if crashes happen")
)

fig_user.add_trace(go.Pie(
        values=users_crash_total_data.deaths,
        labels=users_crash_total_data.user,
        marker_colors=['#c7e9b4','#7fcdbb','#1d91c0','#225ea8'],
        textinfo='label+percent',
        ),
        row=1, col=1)

fig_user.add_trace(go.Bar(
        x=users_crash_total_data.user,
        y=users_crash_total_data.rate,
        opacity=1,
        marker_color=['#c7e9b4','#7fcdbb','#1d91c0','#225ea8'],
        showlegend=False
        ),
        row=1, col=2)

fig_user.update_layout(title_text="<b>Road deaths number and rate if crashes happen by road user from 2012 to 2019 in Australia<b>",
                  plot_bgcolor='white',title_x=0.5)

fig_user.show()

Comment:Although drivers account for the highest proportion of total road deaths numbers in Australia(around 50%), they have the least possibility to die if they are involved in a crash. It can be attributed to the safety designs in vehicles such as airbag and seat belt, compared with pedestrians and riders with nothing to protect themselves. By contrast, pedestrains are the most vulnerable in road accidents, with about 50% to die if they are involved in the crash.  

In [159]:
weekdays_driver_total = set_weekday()
for dayweek, time, year, user in zip(data_weekdays['Dayweek'], data_weekdays['Time'],data_weekdays['Year'],data_weekdays['Road User']):
    if str(dayweek) != "-9" and str(time)!= "-9" and str(year) != "2020" and str(user) == "Driver":
        index = find_zone(time)
        weekdays_driver_total[dayweek][index] += 1

weekdays_passenger_total = set_weekday()
for dayweek, time, year, user in zip(data_weekdays['Dayweek'], data_weekdays['Time'],data_weekdays['Year'],data_weekdays['Road User']):
    if str(dayweek) != "-9" and str(time)!= "-9" and str(year) != "2020" and str(user) == "Passenger":
        index = find_zone(time)
        weekdays_passenger_total[dayweek][index] += 1
        
weekdays_rider_total = set_weekday()
for dayweek, time, year, user in zip(data_weekdays['Dayweek'], data_weekdays['Time'],data_weekdays['Year'],data_weekdays['Road User']):
    if str(dayweek) != "-9" and str(time)!= "-9" and str(year) != "2020" :
        if str(user) == "Rider" or str(user) == "Pedal cyclist" or str(user) == "Motorcycle pillion passenger" or str(user) == "Motorcycle rider" :
            index = find_zone(time)
            weekdays_rider_total[dayweek][index] += 1

weekdays_pedestrian_total = set_weekday()
for dayweek, time, year, user in zip(data_weekdays['Dayweek'], data_weekdays['Time'],data_weekdays['Year'],data_weekdays['Road User']):
    if str(dayweek) != "-9" and str(time)!= "-9" and str(year) != "2020" and str(user) == "Pedestrian":
        index = find_zone(time)
        weekdays_pedestrian_total[dayweek][index] += 1
        
def user_dic(weekdays_user_total):
    re_dic = {'weekday':[], 'time':[], 'number':[]}
    for key,value in weekdays_user_total.items():
        for key2,value2 in value.items(): 
            re_dic['weekday'].append(key)
            re_dic['time'].append(key2)
            re_dic['number'].append(value2)
    return (re_dic)

weekday_driver_data = pd.DataFrame(data=user_dic(weekdays_driver_total))
weekday_driver_data.head()

weekday_passenger_data = pd.DataFrame(data=user_dic(weekdays_passenger_total))
weekday_passenger_data.head()

weekday_rider_data = pd.DataFrame(data=user_dic(weekdays_rider_total))
weekday_rider_data.head()

weekday_pedestrian_data = pd.DataFrame(data=user_dic(weekdays_pedestrian_total))
weekday_pedestrian_data.head()


Unnamed: 0,weekday,time,number
0,Monday,00:00-02:59,44
1,Monday,03:00-05:59,27
2,Monday,06:00-08:59,96
3,Monday,09:00-11:59,140
4,Monday,12:00-14:59,91


In [168]:
fig_driver_time = px.density_heatmap(weekday_driver_data, 
                         x="time", y="weekday", z='number',
                         color_continuous_scale=['#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494','#081d58'])

fig_driver_time.update_layout(
    title_text="<b>Road deaths distribution for driver by weekday and time in Australia<b>", 
    plot_bgcolor='white', xaxis_title_text="<b>Time<b>",yaxis_title_text="<b>Weekday<b>",title_x=0.5
)


fig_passenger_time = px.density_heatmap(weekday_passenger_data, 
                         x="time", y="weekday", z='number',
                         color_continuous_scale=['#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494','#081d58'])

fig_passenger_time.update_layout(
    title_text="<b>Road deaths distribution for passenger by weekday and time in Australia<b>", 
    plot_bgcolor='white', xaxis_title_text="<b>Time<b>",yaxis_title_text="<b>Weekday<b>",title_x=0.5
)


fig_rider_time = px.density_heatmap(weekday_rider_data, 
                         x="time", y="weekday", z='number',
                         color_continuous_scale=['#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494','#081d58'])

fig_rider_time.update_layout(
    title_text="<b>Road deaths distribution for rider by weekday and time in Australia<b>", 
    plot_bgcolor='white', xaxis_title_text="<b>Time<b>", yaxis_title_text="<b>Weekday<b>",title_x=0.5
)


fig_pedestrian_time = px.density_heatmap(weekday_pedestrian_data, 
                         x="time", y="weekday", z='number',
                         color_continuous_scale=['#edf8b1','#c7e9b4','#7fcdbb','#41b6c4','#1d91c0','#225ea8','#253494','#081d58'])

fig_pedestrian_time.update_layout(
    title_text="<b>Road deaths distribution for pedestrian by weekday and time in Australia<b>", 
    plot_bgcolor='white', xaxis_title_text="<b>Time<b>", yaxis_title_text="<b>Weekday<b>",title_x=0.5
)

fig_driver_time.show()
fig_pedestrian_time.show()
fig_passenger_time.show()
fig_rider_time.show()


Comment: The road deaths distributions for driver, pedestrian and passenger are similar, with the peak time of 9pm to 12pm on Friday and Saturaday, and low peak of 12pm to 6am from Monday to Thursday. However, rider's distribution is different to the above, the region with high road deaths numbers is from 9am to 6pm at weekends. 

# Road Deaths by age group and gender in Australia

In [296]:
# Count the total_death for each states
data_age = read_file('ardd_fatalities.csv')
# def set_ages():
#     return {'<20':0, '20-40':0, '40-60':0, '60-80':0, '>=80':0}

# def add_age(age, classes):
#     if int(age) < 20: 
#         classes['<20'] += 1
#     elif 20 <= int(age) < 40:
#         classes['20-40'] += 1 
#     elif 40 <= int(age) < 60:
#         classes['40-60'] += 1 
#     elif 60 <= int(age) < 80:
# #         classes['60-80'] += 1 
# #     elif int(age) >= 80:
# #         classes['>=80'] += 1 


# age_grp = set_ages()

ages_total = {}
for age,gender,year in zip(data_age["Age"],data_age['Gender'],data_age['Year']):
    if str(year) != '2020' and str(gender) != '-9'and str(gender) != 'Unspecified' and str(age) != '-9':
        add_age(age, age_grp)
        if age not in ages_total:
            ages_total[age] = {}             
        if gender not in ages_total[age]: 
            ages_total[age][gender] = 1
        else: 
             ages_total[age][gender] += 1 
                
# ages_total_data = {}
# for age in ages_total: 
#     if not bool(ages_total_data):
#         ages_total_data["age"] = []
#         ages_total_data["gender"] = []
#         ages_total_data["number"] = []
        
#     for gender,values in ages_total[age].items():
#         ages_total_data["age"].append(age)
#         ages_total_data["gender"].append(gender)
#         ages_total_data["number"].append(values)

ages_distribution_data = pd.DataFrame(data=ages_total_data)
ages_distribution_data.head()

Unnamed: 0,age,gender,number
0,<20,Male,112
1,<20,Female,75
2,<20,Male,1436
3,<20,Female,469
4,20-40,Male,711


In [299]:
# ages_gender = ages_total_data.copy()

# index = 0
# for ages in ages_gender['age']:
#     if 0 <= int(ages) < 20: 
#         ages_gender['age'][index] = '<20'
#     elif 20 <= int(ages) < 40:
#         ages_gender['age'][index] = '20-40'
#     elif 40 <= int(ages) < 60:
#         ages_gender['age'][index] = '40-60'
#     elif 60 <= int(ages) < 80:
#         ages_gender['age'][index] = '60-80'
#     elif int(ages) >= 80:
#         ages_gender['age'][index] = '>80'
#     index += 1


# ages_gender_data = pd.DataFrame(data=ages_gender)
# ages_gender_data.head()

In [300]:
# fig_age_gender = px.sunburst(ages_gender_data, path=['gender','age'], values='number',
#                   color=ages_gender_data['gender'],
#                   color_discrete_map={'Male':'#4682B4', 'Female':'#CD5C5C'}) 
# fig_age_gender.update_layout(title="<b>Road Deaths by gender and age group in Australia<b>",title_x=0.5)
# fig_age_gender.show()

fig_age = px.histogram(ages_distribution_data, 
                   x="age", 
                   y="number", 
                   color='gender',  
                   marginal="box", 
                   barmode="group",
                   hover_data=ages_distribution_data.columns,
                   color_discrete_sequence=['#4682B4','#CD5C5C'],
                   opacity=0.9
                  )

fig_age.update_layout(
    title_text="<b>Road deaths by age group in Australia<b>", 
    plot_bgcolor='white', 
    xaxis_title_text="<b>Age<b>",
    yaxis_title_text="<b>Total number of road deaths<b>",
    legend_title_text="",
    legend_orientation="h",
    title_x=0.5,
    bargap=0.1, # gap between bars of adjacent location coordinates
    bargroupgap=0.1 # gap between bars of the same location coordinates
)

fig_age.show()

Comment: It can be seen that the number of male road deaths is much higher than the female one. However, bith age group has the similar statistical distribution, which means within each gender group, the possibility of dying from road accidents for each age group is almost the same. 

# Bus and truck involvement

In [285]:
data_death = read_file('ardd_fatalities.csv')

truck_total = {}
for casualty, year in zip(data_death["Heavy Rigid Truck Involvement"],data_death['Year']):
    if str(casualty) != "-9" and str(year) != "2020":
        if casualty not in truck_total:
            truck_total[casualty] = 1
        else:
            truck_total[casualty] += 1

re_dic = {"yes":[],"number":[]}
for key,value in truck_total.items():
    re_dic['yes'].append(key)
    re_dic["number"].append(value)
    
Heavy_truck_data = pd.DataFrame(data=re_dic)
Heavy_truck_data.head()


artruck_total = {}
for casualty, year in zip(data_death["Articulated Truck Involvement"],data_death['Year']):
    if str(casualty) != "-9" and str(year) != "2020":
        if casualty not in artruck_total:
            artruck_total[casualty] = 1
        else:
            artruck_total[casualty] += 1
            
re_dic = {"yes":[],"number":[]}
for key,value in artruck_total.items():
    re_dic['yes'].append(key)
    re_dic["number"].append(value)
    
Articulated_truck_data = pd.DataFrame(data=re_dic)
Articulated_truck_data.head()


bus_total = {}
for casualty, year in zip(data_death["Bus Involvement"],data_death['Year']):
    if str(casualty) != "-9" and str(year) != "2020":
        if casualty not in bus_total:
            bus_total[casualty] = 1
        else:
            bus_total[casualty] += 1

re_dic = {"yes":[],"number":[]}
for key,value in bus_total.items():
    re_dic['yes'].append(key)
    re_dic["number"].append(value)
    
Bus_data = pd.DataFrame(data=re_dic)
Bus_data.head()

Unnamed: 0,yes,number
0,No,49954
1,Yes,961


In [288]:
fig = make_subplots(rows=1, cols=3, specs=[[{'type':'domain'}, {'type':'domain'},{'type':'domain'}]],
                    subplot_titles=("Bus Involvement", "Heavy Rigid Truck Involvement","Articulated Truck Involvement"))
# '#edf8b1','#c7e9b4','#7fcdbb','#1d91c0','#225ea8','#253494'
fig.add_trace(go.Pie(
        values=Bus_data.number,
        labels=Bus_data.yes,
        marker_colors=['#c7e9b4','#253494'],),
              1, 1)
fig.add_trace(go.Pie(
        values=Heavy_truck_data.number,
        labels=Heavy_truck_data.yes,),
              1, 2)
fig.add_trace(go.Pie(
        values=Articulated_truck_data.number,
        labels=Articulated_truck_data.yes,),
              1, 3)

fig.update_layout(title_text="<b>Road deaths number if bus or truck is involved in Australia<b>",
                  legend_orientation="h",
                  plot_bgcolor='white',title_x=0.5)


fig.show()

Comment: The pie charts shows that articulated truck involvement account for 10% of total road deaths number. The effect of heavy truck and bus on road fatality is relatively small, with 5% and 2% respectively.

# Experiment

In [301]:
data_casualty = read_file('combined_crash.csv')

casualtys_total = {}

for casualty in data_casualty["Crash Type"]:
        if casualty not in casualtys_total:
            casualtys_total[casualty] = 1
        else:
            casualtys_total[casualty] += 1
            
print(casualtys_total)

re_dic = {"user":[],"number":[]}
for key,value in casualtys_total.items():
    re_dic['user'].append(key)
    re_dic["number"].append(value)
    
casualtys_data = pd.DataFrame(data=re_dic)
casualtys_data.head()

{'Right Angle': 24236, 'Rear End': 37218, 'Right Turn': 10083, 'Hit Fixed Object': 19365, 'Other': 739, 'Hit Animal': 2042, 'Hit Parked Vehicle': 9148, 'Roll Over': 5304, 'Hit Pedestrian': 2848, 'Side Swipe': 13111, 'Left Road - Out of Control': 999, 'Head On': 2086, 'Hit Object on Road': 493}


Unnamed: 0,user,number
0,Right Angle,24236
1,Rear End,37218
2,Right Turn,10083
3,Hit Fixed Object,19365
4,Other,739
