# World Mortality Analysis: Building a Dashboard Using Plotly

Qintong Li

24/07/2023

---

This project uses datasets from the WHO website: https://www.who.int/data/gho/data/themes/mortality-and-global-health-estimates. The analysis focuses on mortality worldwide, and the datasets were selected for that purpose.

- First section: all data cleaning steps are wrapped in a pipeline. 

- Second section: `plotly` is used to plot the relevant charts and to analyse the datasets further.

- Last section: an example of putting all useful plots on a dashboard using plotly objects.

### ALL PLOTS ARE SAVED IN `Plots` FILE

Import necessary libraries and data.

In [1]:
import numpy as np
import pandas as pd

import sklearn as sk
import scipy

import seaborn as sns
import matplotlib.pyplot as plt
import plotly
import plotly.express as px

In [2]:
from pathlib import Path

# Directory that holds the CSV files downloaded from the WHO website.
DATA_DIR = Path("/Users/qintongli/new_py/Final_Project/data")

df_adult_raw = pd.read_csv(DATA_DIR / "Adult_Morality.csv")
df_child_raw = pd.read_csv(DATA_DIR / "Child_Morality.csv")
df_maternal_raw = pd.read_csv(DATA_DIR / "Maternal_Morality.csv")
df_road_raw = pd.read_csv(DATA_DIR / "Road_Traffic.csv")
df_road_male_raw = pd.read_csv(DATA_DIR / "road_male.csv")
df_road_female_raw = pd.read_csv(DATA_DIR / "road_female.csv")

Create a class to clean the data set.
- The adult mortality rate data only covers 2000 to 2016. Because analysing this dataset is the key aim of the project, all other datasets are restricted to the same range.

In [3]:
class DataCleaningPipeline:
    """Drop incomplete columns and rows, then keep the years 2000 to 2016."""

    BASE_COLUMNS = ['Indicator', 'ParentLocationCode', 'ParentLocation',
                    'SpatialDimValueCode', 'Location', 'Period']

    def __init__(self, df):
        self.df_raw = df

    def drop_na_columns(self):
        # Drop every column that contains a missing value, then any incomplete rows.
        df_no_na_columns = self.df_raw.dropna(axis=1)
        return df_no_na_columns.dropna(axis=0)

    def select_range_by_years(self):
        df_new = self.drop_na_columns().rename(columns={'FactValueNumeric': 'NumericValue'})
        columns = list(self.BASE_COLUMNS)
        if 'Dim1' in df_new.columns:
            # Datasets broken down by sex store the category in Dim1 / Dim1ValueCode.
            df_new = df_new.rename(columns={'Dim1': 'Gender', 'Dim1ValueCode': 'GenderCode'})
            columns += ['GenderCode', 'Gender']
        columns.append('NumericValue')
        # Keep 2000 to 2016 inclusive, the range covered by the adult dataset.
        in_range = (df_new['Period'] > 1999) & (df_new['Period'] < 2017)
        return df_new[in_range][columns]
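
As a minimal sketch of what the pipeline does, the toy frame below is passed through the same steps: drop columns with missing values, rename the value column, and keep the years 2000 to 2016. The values, the `Comments` column, and the helper name `clean_toy_frame` are invented for illustration and are not taken from the WHO files.

```python
import pandas as pd

# Toy data invented for illustration only; not taken from the WHO files.
toy = pd.DataFrame({
    "Location": ["A", "A", "B", "B"],
    "Period": [1999, 2000, 2016, 2017],
    "FactValueNumeric": [10.0, 11.0, 12.0, 13.0],
    "Comments": [None, None, "note", None],  # has gaps, so the whole column is dropped
})


def clean_toy_frame(df):
    """Mirror the pipeline steps on a small frame (hypothetical helper)."""
    df = df.dropna(axis=1).dropna(axis=0)
    df = df.rename(columns={"FactValueNumeric": "NumericValue"})
    return df[df["Period"].between(2000, 2016)]


cleaned = clean_toy_frame(toy)
print(cleaned)
```

Note that `dropna(axis=1)` removes a column as soon as it has a single missing value, so a mostly complete column can disappear; whether that is acceptable depends on the dataset.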

Present all the cleaned datasets.

In [4]:
adult = DataCleaningPipeline(df_adult_raw)
df_adult = adult.select_range_by_years()
df_adult

Unnamed: 0,Indicator,ParentLocationCode,ParentLocation,SpatialDimValueCode,Location,Period,GenderCode,Gender,NumericValue
0,Adult mortality rate (probability of dying bet...,WPR,Western Pacific,TON,Tonga,2016,FMLE,Female,100.20
1,Adult mortality rate (probability of dying bet...,AMR,Americas,BRB,Barbados,2016,BTSX,Both sexes,100.40
2,Adult mortality rate (probability of dying bet...,EUR,Europe,KGZ,Kyrgyzstan,2016,FMLE,Female,100.50
3,Adult mortality rate (probability of dying bet...,AMR,Americas,SLV,El Salvador,2016,FMLE,Female,102.50
4,Adult mortality rate (probability of dying bet...,EUR,Europe,SRB,Serbia,2016,BTSX,Both sexes,102.50
...,...,...,...,...,...,...,...,...,...
9328,Adult mortality rate (probability of dying bet...,EUR,Europe,BEL,Belgium,2000,BTSX,Both sexes,99.35
9329,Adult mortality rate (probability of dying bet...,SEAR,South-East Asia,LKA,Sri Lanka,2000,FMLE,Female,99.39
9330,Adult mortality rate (probability of dying bet...,WPR,Western Pacific,MYS,Malaysia,2000,FMLE,Female,99.79
9331,Adult mortality rate (probability of dying bet...,EUR,Europe,DNK,Denmark,2000,BTSX,Both sexes,99.82


In [5]:
child = DataCleaningPipeline(df_child_raw)
df_child = child.select_range_by_years()
df_child

Unnamed: 0,Indicator,ParentLocationCode,ParentLocation,SpatialDimValueCode,Location,Period,GenderCode,Gender,NumericValue
2985,Under-five mortality rate (probability of dyin...,EUR,Europe,SMR,San Marino,2016,FMLE,Female,1.91
2986,Under-five mortality rate (probability of dyin...,EMR,Eastern Mediterranean,OMN,Oman,2016,FMLE,Female,10.01
2987,Under-five mortality rate (probability of dyin...,EUR,Europe,UKR,Ukraine,2016,MLE,Male,10.03
2988,Under-five mortality rate (probability of dyin...,EUR,Europe,MKD,The former Yugoslav Republic of Macedonia,2016,FMLE,Female,10.11
2989,Under-five mortality rate (probability of dyin...,EUR,Europe,ALB,Albania,2016,MLE,Male,10.11
...,...,...,...,...,...,...,...,...,...
13129,Under-five mortality rate (probability of dyin...,AFR,Africa,MRT,Mauritania,2000,BTSX,Both sexes,98.77
13130,Under-five mortality rate (probability of dyin...,AFR,Africa,KEN,Kenya,2000,BTSX,Both sexes,98.82
13131,Under-five mortality rate (probability of dyin...,WPR,Western Pacific,LAO,Lao People's Democratic Republic,2000,FMLE,Female,98.94
13132,Under-five mortality rate (probability of dyin...,AFR,Africa,LSO,Lesotho,2000,FMLE,Female,99.42


In [6]:
maternal = DataCleaningPipeline(df_maternal_raw)
df_maternal = maternal.select_range_by_years()
df_maternal[['Gender','GenderCode']] = ['Female','FMLE']
df_maternal

Unnamed: 0,Indicator,ParentLocationCode,ParentLocation,SpatialDimValueCode,Location,Period,NumericValue,Gender,GenderCode
1480,Maternal mortality ratio (per 100 000 live bir...,EUR,Europe,BLR,Belarus,2016,1.24,Female,FMLE
1481,Maternal mortality ratio (per 100 000 live bir...,EUR,Europe,UKR,Ukraine,2016,10.31,Female,FMLE
1482,Maternal mortality ratio (per 100 000 live bir...,AMR,Americas,DOM,Dominican Republic,2016,101.90,Female,FMLE
1483,Maternal mortality ratio (per 100 000 live bir...,AMR,Americas,GTM,Guatemala,2016,103.10,Female,FMLE
1484,Maternal mortality ratio (per 100 000 live bir...,AMR,Americas,SUR,Suriname,2016,105.20,Female,FMLE
...,...,...,...,...,...,...,...,...,...
7765,Number of maternal deaths,EUR,Europe,KGZ,Kyrgyzstan,2000,92.94,Female,FMLE
7766,Number of maternal deaths,AFR,Africa,TGO,Togo,2000,929.90,Female,FMLE
7767,Number of maternal deaths,AFR,Africa,DZA,Algeria,2000,958.20,Female,FMLE
7768,Number of maternal deaths,EMR,Eastern Mediterranean,JOR,Jordan,2000,96.51,Female,FMLE


In [7]:
road_bs = DataCleaningPipeline(df_road_raw)
df_road_bs = road_bs.select_range_by_years()

road_male = DataCleaningPipeline(df_road_male_raw)
df_road_male = road_male.select_range_by_years()

road_female = DataCleaningPipeline(df_road_female_raw)
df_road_female = road_female.select_range_by_years()

df_road = pd.concat([df_road_bs,df_road_male,df_road_female])
df_road

Unnamed: 0,Indicator,ParentLocationCode,ParentLocation,SpatialDimValueCode,Location,Period,GenderCode,Gender,NumericValue
549,Estimated number of road traffic deaths,AMR,Americas,ATG,Antigua and Barbuda,2016,BTSX,Both sexes,0.00
550,Estimated number of road traffic deaths,AMR,Americas,GRD,Grenada,2016,BTSX,Both sexes,10.00
551,Estimated number of road traffic deaths,EMR,Eastern Mediterranean,SDN,Sudan,2016,BTSX,Both sexes,10178.00
552,Estimated number of road traffic deaths,SEAR,South-East Asia,MMR,Myanmar,2016,BTSX,Both sexes,10540.00
553,Estimated number of road traffic deaths,AMR,Americas,VEN,Venezuela (Bolivarian Republic of),2016,BTSX,Both sexes,10640.00
...,...,...,...,...,...,...,...,...,...
3655,Estimated road traffic death rate (per 100 000...,SEAR,South-East Asia,NPL,Nepal,2000,FMLE,Female,9.43
3656,Estimated road traffic death rate (per 100 000...,WPR,Western Pacific,MYS,Malaysia,2000,FMLE,Female,9.57
3657,Estimated road traffic death rate (per 100 000...,WPR,Western Pacific,FSM,Micronesia (Federated States of),2000,FMLE,Female,9.88
3658,Estimated road traffic death rate (per 100 000...,AMR,Americas,GUY,Guyana,2000,FMLE,Female,9.89


Bar charts comparing adult mortality with the other mortality datasets (child and road traffic) by gender over the years, followed by a line chart of the adult mortality trend.

In [15]:
import plotly.graph_objs as go
from plotly.subplots import make_subplots


df_adult_male = df_adult[df_adult['Gender'] == 'Male'].groupby('Period')['NumericValue'].sum()
df_adult_female = df_adult[df_adult['Gender'] == 'Female'].groupby('Period')['NumericValue'].sum()
df_child_male = df_child[df_child['Gender'] == 'Male'].groupby('Period')['NumericValue'].sum()
df_child_female = df_child[df_child['Gender'] == 'Female'].groupby('Period')['NumericValue'].sum()
df_road_male = df_road[df_road['Gender'] == 'Male'].groupby('Period')['NumericValue'].sum()
df_road_female = df_road[df_road['Gender'] == 'Female'].groupby('Period')['NumericValue'].sum()
df_line = df_adult.groupby('Period')['NumericValue'].sum()

colors = ['steelblue','firebrick']


trace1 = go.Bar(
    x=df_adult_female.index,
    y=df_adult_female.values,
    name='Female',
    marker_color=colors[0]
)

trace2 = go.Bar(
    x=df_adult_male.index,
    y=df_adult_male.values,
    name='Male',
    marker_color=colors[1]
)

trace3 = go.Bar(
    x=df_child_female.index,
    y=df_child_female.values,
    name='Female',
    marker_color=colors[0]
)

trace4 = go.Bar(
    x=df_child_male.index,
    y=df_child_male.values,
    name='Male',
    marker_color=colors[1]
)

trace5 = go.Bar(
    x=df_road_female.index,
    y=df_road_female.values,
    name='Female',
    marker_color=colors[0]
)

trace6 = go.Bar(
    x=df_road_male.index,
    y=df_road_male.values,
    name='Male',
    marker_color=colors[1]
)

trace7 = go.Scatter(
    x=df_line.index,
    y=df_line.values,
    name='Both Gender')


fig_1 = make_subplots(rows=4, cols=1,
                      subplot_titles=('Comparison for Adult Mortality Between Gender',
                                      'Comparison for Child Mortality Between Gender',
                                      'Comparison for Road Traffic Mortality Between Gender',
                                      'Trend of Adult Mortality over Years'))


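# add_trace is the current API; append_trace is deprecated in recent plotly versions.
fig_1.add_trace(trace1, row=1, col=1)
fig_1.add_trace(trace2, row=1, col=1)
fig_1.add_trace(trace3, row=2, col=1)
fig_1.add_trace(trace4, row=2, col=1)
fig_1.add_trace(trace5, row=3, col=1)
fig_1.add_trace(trace6, row=3, col=1)
fig_1.add_trace(trace7, row=4, col=1)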


# fig.update_xaxes(title_text="Year", row=1, col=1)
# fig.update_xaxes(title_text="Year", row=2, col=1)
# fig.update_xaxes(title_text="Year", row=3, col=1)
fig_1.update_xaxes(title_text="Year", row=4, col=1)

fig_1.update_yaxes(title_text="Adult Mortality", row=1, col=1)
fig_1.update_yaxes(title_text="Child Mortality", row=2, col=1)
fig_1.update_yaxes(title_text="Road Traffic Mortality", row=3, col=1)
fig_1.update_yaxes(title_text="Adult Mortality", row=4, col=1)

fig_1.update_layout(title='Mortality Comparison between Genders by Bar Charts and Adult Mortality Rate using Line Chart',
                    height=1000, width=1000)


fig_1.show()
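
The per-gender series plotted above come from filtering on `Gender` and summing `NumericValue` per `Period`. As a sketch on invented numbers (not WHO data), the same series can also be produced in one step with `pivot_table`, which returns one column per gender; the frame `toy` is hypothetical.

```python
import pandas as pd

# Invented numbers for illustration; the real values come from the WHO data.
toy = pd.DataFrame({
    "Period": [2000, 2000, 2000, 2001, 2001, 2001],
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Female"],
    "NumericValue": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Filter first, then group: one series per gender, as in the cell above.
male_by_year = toy[toy["Gender"] == "Male"].groupby("Period")["NumericValue"].sum()

# A single pivot gives one column per gender, indexed by year.
by_year = toy.pivot_table(index="Period", columns="Gender",
                          values="NumericValue", aggfunc="sum")
print(by_year)
```

Each column of `by_year` could then be passed to a `go.Bar` trace in a loop instead of defining every trace by hand.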

World maps for geographic analysis of adult mortality rates over the years. The first figure compares 2000 with 2016; the second shows one map per year from 2000 to 2016.

In [9]:
df_geo_2000 = df_adult[(df_adult['Period']==2000)&(df_adult['Gender']=='Both sexes')]
df_geo_2016 = df_adult[(df_adult['Period']==2016)&(df_adult['Gender']=='Both sexes')]

rows = 1
cols = 2
fig_geo = make_subplots(
    rows=rows, cols=cols,
    specs = [[{'type': 'choropleth'} for c in np.arange(cols)] for r in np.arange(rows)],
                        )


# Use one colour range for both maps so that the colours are directly comparable.
z_both = pd.concat([df_geo_2000['NumericValue'], df_geo_2016['NumericValue']])
z_min, z_max = z_both.min(), z_both.max()

fig_geo.add_trace(go.Choropleth(
    locations=df_geo_2000['Location'],
    locationmode='country names',
    z=df_geo_2000['NumericValue'],
    zmin=z_min,
    zmax=z_max,
    colorscale='bluered',
    text=df_geo_2000['Location'],
    colorbar_title='Adult Mortality',
    hoverinfo='location+z',
), row=1, col=1)

fig_geo.add_trace(go.Choropleth(
    locations=df_geo_2016['Location'],
    locationmode='country names',
    z=df_geo_2016['NumericValue'],
    zmin=z_min,
    zmax=z_max,
    colorscale='bluered',
    text=df_geo_2016['Location'],
    colorbar_title='Adult Mortality',
    hoverinfo='location+z',
), row=1, col=2)



fig_geo.update_geos(fitbounds="locations",
                visible=False,
                )

fig_geo.update_layout(
    title='Fig.2. Geographic Analysis on World Adult Mortality 2000 vs 2016',
    title_x=0.5
#     geo=dict(
#         projection_type='natural earth',   # Choose the map projection
#     )
)

fig_geo.show()
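
When two maps are compared side by side, the colours are only comparable if both traces use the same `zmin` and `zmax`, and the range should cover the values of both years. A minimal sketch on invented values (not WHO data) shows one way to derive such a range, and how taking the upper bound from a single year can leave values from the other year outside it; the series names are hypothetical.

```python
import pandas as pd

# Invented values for two years; not WHO data.
values_2000 = pd.Series([300.0, 120.0, 90.0])
values_2016 = pd.Series([250.0, 100.0, 60.0])

# A range taken over both years together covers every value on either map.
combined = pd.concat([values_2000, values_2016])
z_min, z_max = combined.min(), combined.max()

# If the upper bound came from 2016 alone, some 2000 values would exceed it
# and be drawn with the end colour of the scale.
above_2016_max = values_2000[values_2000 > values_2016.max()]
print(z_min, z_max, len(above_2016_max))
```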

In [10]:
years = sorted(df_adult['Period'].unique().tolist())

rows = 6
cols = 3
fig_geo_2 = make_subplots(
    rows=rows, cols=cols,
    specs=[[{'type': 'choropleth'} for c in range(cols)] for r in range(rows)],
    horizontal_spacing=0.05,
    vertical_spacing=0.01,
    subplot_titles=[str(year) for year in years])
    


for i, year in enumerate(years):
    result = df_adult[(df_adult['Gender'] == 'Both sexes') & (df_adult['Period'] == year)]
    fig_geo_2.add_trace(go.Choropleth(
        locations=result['Location'],
        locationmode='country names',
        z=result['NumericValue'],
        zmin=df_adult['NumericValue'].min(),
        zmax=df_adult['NumericValue'].max(),
        colorbar_title="Adult Mortality",
        colorscale='bluered'
    ), row=i // cols + 1, col=i % cols + 1)


fig_geo_2.update_geos(fitbounds="locations",
                visible=False,
                )

fig_geo_2.update_layout(
    title_text='Fig.3. Geographic Analysis on World Adult Mortality from 2000 to 2016',
    title_x=0.5,
    autosize = False,
    width=1000,
    height=1000,
    margin=dict(
        l=50,
        r=50,
        b=100,
        t=100,
        pad=4
    ),
    )

# for index, trace in enumerate(fig.data):
#     fig.data[index].hovertemplate = 'State: %{location}<br>Shooting deaths: %{z:.2f}<extra></extra>'
    
fig_geo_2.show()
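
The loop above places each year in the grid by turning a running index into a 1-based row and column. The sketch below reproduces only that index arithmetic for a grid with 3 columns, assuming the 17 years from 2000 to 2016 are present; the dictionary `positions` is introduced here for illustration.

```python
# Map a running index onto 1-based (row, col) positions in a grid with 3 columns.
cols = 3
years = list(range(2000, 2017))  # assumed: the 17 years kept by the cleaning step

positions = {year: (i // cols + 1, i % cols + 1) for i, year in enumerate(years)}

print(positions[2000])  # first cell of the grid
print(positions[2016])  # last year; the 18th cell of the 6x3 grid stays empty
```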

Start building the dashboard:

In [11]:
from dash import Dash, html, dash_table, dcc

# Initialize the app
app = Dash(__name__)

In [12]:
# App layout

app.layout = html.Div([
    html.Div([
        html.H2("Figure 1"),
        dcc.Graph(figure=fig_1)
    ]),
    html.Div([
        html.H2("Figure 2"),
        dcc.Graph(figure=fig_geo)
    ]),
    html.Div([
        html.H2("Figure 3"),
        dcc.Graph(figure=fig_geo_2)
    ])
])


In [13]:
# Run the app
if __name__ == '__main__':
    app.run(port = 9001, debug=True)

In [14]:
# df_bub = df_adult.groupby('ParentLocation')['NumericValue'].sum()

# fig_4 = px.scatter(x=df_bub.index, y=df_bub.values,
#        size=df_bub.values, color=df_bub.index,
#                   size_max=130,text = df_bub.values)

# fig_4.update_layout(title = 'Fig.2 Adult Mortality Analysis by Continents(2000-2016)',
#                   yaxis_title = 'Adult Mortality',
#                  xaxis_title = 'Continents')

# fig_4.show()