In [1]:
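# Required installations for Google Colab or local environments
# (openpyxl lets pandas read .xlsx files; statsmodels backs the OLS trendline in plot 6)
!pip install plotly dash dash-bootstrap-components openpyxl statsmodels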

Collecting dash
  Downloading dash-2.18.1-py3-none-any.whl.metadata (10 kB)
Collecting dash-bootstrap-components
  Downloading dash_bootstrap_components-1.6.0-py3-none-any.whl.metadata (5.2 kB)
Collecting dash-html-components==2.0.0 (from dash)
  Downloading dash_html_components-2.0.0-py3-none-any.whl.metadata (3.8 kB)
Collecting dash-core-components==2.0.0 (from dash)
  Downloading dash_core_components-2.0.0-py3-none-any.whl.metadata (2.9 kB)
Collecting dash-table==5.0.0 (from dash)
  Downloading dash_table-5.0.0-py3-none-any.whl.metadata (2.4 kB)
Collecting retrying (from dash)
  Downloading retrying-1.3.4-py3-none-any.whl.metadata (6.9 kB)
Downloading dash-2.18.1-py3-none-any.whl (7.5 MB)
Downloading dash_core_components-2.0.0-py3-none-any.whl (3.8 kB)
Downloading dash_html_components-2.0.0-py3-none-any.whl (4.1 kB)
Downloading dash_table-5.0.0-py3-none-any.whl 

In [2]:
# Import necessary libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from dash import Dash, dcc, html
import dash_bootstrap_components as dbc

Running the cell below enables a 'Choose Files' button. Upload `cleaned_life_expectancy_data.xlsx` (available in the GitHub repository), then move on to the next cell once the upload has finished.

In [3]:
from google.colab import files
uploaded = files.upload()

Saving cleaned_life_expectancy_data.xlsx to cleaned_life_expectancy_data.xlsx
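
The upload cell above only works inside Google Colab. For a local run, a small helper along the following lines could pick the file location instead. This is a minimal sketch: `running_in_colab` and `resolve_data_path` are hypothetical names, and it assumes the workbook sits in the current working directory when running locally.

```python
from pathlib import Path


def running_in_colab() -> bool:
    """Return True when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401  (only present on Colab)
    except ImportError:
        return False
    return True


def resolve_data_path(filename: str = "cleaned_life_expectancy_data.xlsx") -> Path:
    """Hypothetical helper: /content on Colab, current directory elsewhere."""
    base = Path("/content") if running_in_colab() else Path.cwd()
    return base / filename


data_path = resolve_data_path()
print(data_path)
```

The resulting path can then be passed to `pd.read_excel` in place of the hard-coded `/content/...` string.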


In [4]:
data = pd.read_excel('/content/cleaned_life_expectancy_data.xlsx')

  warn(msg)


In [5]:
# Data preprocessing for the plots
grouped_data = data.groupby('Region').agg({
    'Life Expectancy World Bank': 'sum',
    'Unemployment': 'sum'
}).reset_index()
ordered_columns = ['Sanitation', 'Prevelance of Undernourishment', 'Health Expenditure %',
                   'Life Expectancy World Bank', 'CO2', 'Education Expenditure %',
                   'Unemployment', 'Injuries', 'Communicable', 'NonCommunicable']
data_selected = data[ordered_columns]

In [15]:
# Functions to create the responsive Plotly graphs
# 1. Function to create a world map for Life Expectancy by Country
def world_map_life_expectancy(data):
    fig = px.choropleth(
        data_frame=data,
        locations='Country Code',
        color='Life Expectancy World Bank',
        hover_name='Country Name',
        color_continuous_scale=px.colors.sequential.Viridis,
        title='Life Expectancy Across the World',
        labels={'Life Expectancy World Bank': 'Life Expectancy (Years)'},
        projection='natural earth'
    )

    fig.update_layout(
        template='plotly_white',
        height=600,
        width=1000,
        title_font_size=22,
        title_x=0.5,
        font_family="Arial, sans-serif",
        font_color="blue",
        margin={"r":0, "t":50, "l":0, "b":0},
        coloraxis_colorbar=dict(
            title="Life Expectancy (Years)",
            tickvals=[50, 60, 70, 80],
            len=0.7,
            thickness=10,
            tickfont=dict(size=12),
            title_font=dict(size=14)
        )
    )

    fig.update_geos(
        showcountries=True,
        countrycolor="grey",
        projection_scale=1
    )

    return fig

# 2. Life Expectancy Histogram (converted to Plotly)
def life_expectancy_histogram(data):
    fig = px.histogram(
        data,
        x='Life Expectancy World Bank',
        nbins=30,
        title='Distribution of Life Expectancy',
        labels={'Life Expectancy World Bank': 'Life Expectancy (Years)'},
        color_discrete_sequence=[px.colors.sequential.Viridis[4]]
    )

    fig.update_layout(
        template='plotly_white',
        height=600,
        title_font_size=22,
        xaxis_title='Life Expectancy (Years)',
        yaxis_title='Frequency',
        xaxis_title_font=dict(size=16, color='blue'),
        yaxis_title_font=dict(size=16, color='blue'),
        font_family="Arial, sans-serif",
        font_color="blue",
        bargap=0.1,
        yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='LightGrey'),
        title_x=0.5
    )

    return fig

# 3. Function for Correlation Heatmap
def correlation_heatmap_plotly(data):
    corr_matrix = data.corr()
    life_expectancy_corr = corr_matrix['Life Expectancy World Bank'].drop('Life Expectancy World Bank')
    strongest_positive = life_expectancy_corr.idxmax()
    strongest_negative = life_expectancy_corr.idxmin()

    fig = go.Figure(data=go.Heatmap(
        z=corr_matrix.values,
        x=corr_matrix.columns,
        y=corr_matrix.index,
        colorscale=px.colors.sequential.RdBu,
        zmin=-1, zmax=1,
        colorbar=dict(title='Correlation', tickvals=[-1, -0.5, 0, 0.5, 1]),
        text=corr_matrix.values,
        texttemplate='%{text:.2f}',
        textfont={"size": 10}
    ))
    fig.add_annotation(
        x=strongest_positive,
        y='Life Expectancy World Bank',
        text='Strongest Positive',
        showarrow=True,
        arrowhead=2,
        ax=70, ay=-40,
        bordercolor='green',
        borderwidth=2,
        bgcolor="lightgreen"
    )

    fig.add_annotation(
        x=strongest_negative,
        y='Life Expectancy World Bank',
        text='Strongest Negative',
        showarrow=True,
        arrowhead=2,
        ax=70, ay=40,
        bordercolor='red',
        borderwidth=2,
        bgcolor="lightpink"
    )

    fig.update_layout(
        title='Correlation Heatmap of Socio-Economic Factors and Life Expectancy',
        xaxis_title='Factors',
        yaxis_title='Factors',
        height=600,
        title_x=0.5,
        title_font_size=22,
        font_family="Arial, sans-serif",
        font_color="blue",
        xaxis_title_font=dict(size=16, color='blue'),
        yaxis_title_font=dict(size=16, color='blue'),
        template='plotly_white',
        margin={"r": 20, "t": 80, "l": 20, "b": 60}
    )

    return fig

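# 4. Function for Hexbin-Style Density Plot: Life Expectancy vs. Sanitation
# (drawn with Histogram2dContour, i.e. filled density contours rather than hexagonal bins)
def hexbin_plot_life_expectancy_sanitation(data):
    fig = go.Figure(go.Histogram2dContour(
        x=data['Sanitation'],
        y=data['Life Expectancy World Bank'],
        colorscale='Blues',
        contours_coloring='fill',
        showscale=True,
        opacity=0.8,
        # The trace owns its colorbar; layout.coloraxis settings do not apply to it
        colorbar=dict(title='Density', thickness=10)
    ))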

    trendline = np.polyfit(data['Sanitation'], data['Life Expectancy World Bank'], 1)
    trendline_fn = np.poly1d(trendline)
    fig.add_trace(go.Scatter(
        x=data['Sanitation'],
        y=trendline_fn(data['Sanitation']),
        mode='lines',
        line=dict(color='darkblue', width=2),
        name='Trend Line'
    ))

    fig.update_layout(
        title='Life Expectancy vs. Sanitation',
        xaxis_title='Sanitation',
        yaxis_title='Life Expectancy (Years)',
        template='plotly_white',
        height=600,
        font_family="Arial, sans-serif",
        font_color="blue",
        xaxis_title_font=dict(size=16, color='blue'),
        yaxis_title_font=dict(size=16, color='blue'),
        title_font_size=22,
        title_x=0.5,
        yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='LightGrey'),
        coloraxis_colorbar=dict(
            thickness=10,
            title='Density',
            tickvals=[0, 50, 100, 150, 200, 250],
            title_font=dict(size=12),
            tickfont=dict(size=10)
        )
    )

    return fig

# 5. Function for Bar Chart: Average Sanitation and Life Expectancy by Region
def bar_chart_sanitation_life_expectancy_region(data):
    agg_data = data.groupby('Region')[['Sanitation', 'Life Expectancy World Bank']].mean().reset_index()
    agg_data = agg_data.sort_values(by='Life Expectancy World Bank', ascending=False)

    fig = go.Figure()
    fig.add_trace(go.Bar(
        x=agg_data['Region'],
        y=agg_data['Sanitation'],
        name='Sanitation (%)',
        marker_color=px.colors.sequential.Viridis[2]
    ))
    fig.add_trace(go.Bar(
        x=agg_data['Region'],
        y=agg_data['Life Expectancy World Bank'],
        name='Life Expectancy (Years)',
        marker_color=px.colors.sequential.Viridis[4]
    ))

    fig.update_layout(
        title='Average Sanitation and Life Expectancy by Region (Sorted by Life Expectancy)',
        xaxis_title='Region',
        yaxis_title='Average Values',
        barmode='group',
        template='plotly_white',
        height=600,
        font=dict(family="Arial, sans-serif", color="blue"),
        xaxis_title_font=dict(size=16, color='blue'),
        yaxis_title_font=dict(size=16, color='blue'),
        title_font_size=22,
        title_x=0.5
    )

    return fig

# 6. Function for Scatter Plot: Life Expectancy vs. Prevalence of Undernourishment
def scatter_plot_undernourishment_life_expectancy(data):
    fig = px.scatter(
        data,
        x='Prevelance of Undernourishment',
        y='Life Expectancy World Bank',
        trendline='ols',
        title='Life Expectancy vs. Prevalence of Undernourishment',
        labels={
            'Prevelance of Undernourishment': 'Prevalence of Undernourishment (%)',
            'Life Expectancy World Bank': 'Life Expectancy (Years)'
        },
        color='Region',
        hover_name='Country Name',
        color_discrete_sequence=px.colors.qualitative.Set2
    )

    fig.update_layout(
        template='plotly_white',
        height=600,
        title_font_size=22,
        xaxis_title_font=dict(size=16, color='blue'),
        yaxis_title_font=dict(size=16, color='blue'),
        font_family="Arial, sans-serif",
        font_color="blue",
        title_x=0.5,
        xaxis=dict(tickangle=0, showgrid=True, gridwidth=0.5, gridcolor='LightGrey'),
        yaxis=dict(showgrid=True, gridwidth=0.5, gridcolor='LightGrey')
    )

    return fig

# 7. Box plot for Life Expectancy by Region
def box_plot_life_expectancy(data):
    fig = px.box(
        data,
        x='Region',
        y='Life Expectancy World Bank',
        title='Life Expectancy by Region',
        color='Region',
        color_discrete_sequence=px.colors.qualitative.Set2
        # color_discrete_sequence=px.colors.sequential.Viridis
    )

    fig.update_traces(marker=dict(size=6),
                      boxmean=True,
                      width=0.5)

    fig.update_layout(
        template='plotly_white',
        height=600,
        title_font_size=22,
        font_family="Arial, sans-serif",
        font_color="blue",
        xaxis_title_font=dict(size=16, color='blue'),
        yaxis_title_font=dict(size=16, color='blue'),
        yaxis_title='Life Expectancy (Years)',
        xaxis_title='Region',
        showlegend=True,
        boxmode='group'
    )

    return fig

# 8. Time Series Plot for Life Expectancy Over Time by Region
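def time_series_life_expectancy(data):
    # Average across countries so each region contributes one point per year;
    # plotting country-level rows directly would zigzag between countries.
    yearly = (
        data.groupby(['Region', 'Year'], as_index=False)['Life Expectancy World Bank']
        .mean()
        .sort_values(['Region', 'Year'])
    )
    fig = px.line(
        yearly,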
        x='Year',
        y='Life Expectancy World Bank',
        color='Region',
        title='Life Expectancy Over Time by Region',
        labels={'Life Expectancy World Bank': 'Life Expectancy (Years)', 'Year': 'Year'},
        color_discrete_sequence=px.colors.qualitative.Set2,
        line_shape='linear'
    )
    fig.update_traces(line=dict(width=3), marker=dict(size=8))

    fig.update_layout(
        template='plotly_white',
        height=600,
        title_font_size=22,
        font_family="Arial, sans-serif",
        font_color="blue",
        xaxis_title_font=dict(size=16, color='blue'),
        yaxis_title_font=dict(size=16, color='blue'),
        xaxis=dict(tickangle=-45),
        yaxis=dict(showgrid=True, gridcolor='LightGrey'),
        showlegend=True,
        legend=dict(x=1.05, y=1, title_font_color="blue"),
        margin=dict(l=40, r=40, t=60, b=40)
    )

    return fig

In [16]:
# Creating the Dash App
app = Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])

# Define the layout of the app with the graphs embedded
app.layout = dbc.Container([
    html.H1("Uncovering the Impact of Socio-Economic Factors on Life Expectancy", style={'textAlign': 'center', 'padding': '20px'}),

    # 1. Life Expectancy World Map
    dbc.Row([
        dbc.Col(dcc.Graph(id='world-map-lifexp', figure=world_map_life_expectancy(data)), width="auto")
    ], justify="center", style={'padding': '20px'}),

    # 2. Histogram for Life Expectancy
    dbc.Row([
        dbc.Col(dcc.Graph(id='life-expectancy-histogram', figure=life_expectancy_histogram(data)), width=12)
    ], style={'padding': '20px'}),

    # 3. Correlation Heatmap
    dbc.Row([
        dbc.Col(dcc.Graph(id='correlation-heatmap', figure=correlation_heatmap_plotly(data_selected)), width=12)
    ], style={'padding': '20px'}),

    # 4. Hexbin Plot: Life Expectancy vs. Sanitation
    dbc.Row([
        dbc.Col(dcc.Graph(id='hexbin-plot-lifexp-sanit', figure=hexbin_plot_life_expectancy_sanitation(data_selected)), width=12)
    ], style={'padding': '20px'}),

    # 5. Bar Chart for Sanitation and Life Expectancy by Region
    dbc.Row([
        dbc.Col(dcc.Graph(id='bar-chart-sanit-lifexp', figure=bar_chart_sanitation_life_expectancy_region(data)), width=12)
    ], style={'padding': '20px'}),

    # 6. Scatter Plot for Undernourishment vs Life Expectancy
    dbc.Row([
        dbc.Col(dcc.Graph(id='scatter-plot-undernourishment', figure=scatter_plot_undernourishment_life_expectancy(data)), width=12)
    ], style={'padding': '20px'}),

    # 7. Box Plot for Life Expectancy by Region
    dbc.Row([
        dbc.Col(dcc.Graph(id='box-plot-lifexp-region', figure=box_plot_life_expectancy(data)), width=12)
    ], style={'padding': '20px'}),

    # 8. Time Series Plot for Life Expectancy Over Time by Region
    dbc.Row([
        dbc.Col(dcc.Graph(id='time-series-lifexp-region', figure=time_series_life_expectancy(data)), width=12)
    ], style={'padding': '20px'})
    ], fluid=True)

In [17]:
# Running the Dash App
# (app.run is the current entry point; app.run_server is the older alias)
if __name__ == '__main__':
    app.run(debug=True)


### 1. **Life Expectancy Across the World**
The choropleth map is an effective tool for visualizing the geographic distribution of life expectancy across the world. By shading countries according to their average life expectancy, the map allows for clear and direct comparison across regions. The natural earth projection was selected to provide a familiar and easy-to-interpret view of the world while minimizing distortion at the poles.

The **Viridis** color scale was chosen for its balance between aesthetics and functionality. Its gradient from dark purple (representing lower life expectancy) to bright yellow (representing higher life expectancy) provides a clear visual distinction between different life expectancy ranges. This color scheme is also colorblind-friendly, ensuring accessibility for a wider audience.

Several adjustments balance the layout. The projection scale is kept at 1 so that the whole globe fits inside the 1000 × 600 figure, the left, right, and bottom margins are set to zero, and a 50-pixel top margin leaves room for the title. The color bar is shortened (`len=0.7`) and thinned (`thickness=10`) so that it takes up less space, keeping the focus on the map while still providing a clear reference for the life expectancy values.
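
`px.choropleth` matches the `locations` column against ISO-3 country codes by default, so rows with other codes are silently left off the map. A quick format check like the one below can flag obvious mismatches before plotting. It is only a sketch: `non_iso3_codes` is a hypothetical helper, the sample codes are made up, and a three-letter pattern does not prove that a code is a real country (aggregate codes would pass it too).

```python
import re

# Three uppercase letters: the shape of an ISO 3166-1 alpha-3 code
ISO3_PATTERN = re.compile(r"[A-Z]{3}")


def non_iso3_codes(codes):
    """Return the distinct values that do not look like ISO-3 codes, in first-seen order."""
    bad = []
    for code in codes:
        looks_valid = isinstance(code, str) and ISO3_PATTERN.fullmatch(code)
        if not looks_valid and code not in bad:
            bad.append(code)
    return bad


# Made-up sample values for illustration
sample_codes = ["USA", "IND", "us", "WLD1", "us"]
print(non_iso3_codes(sample_codes))
```

On the real dataset the same function could be called as `non_iso3_codes(data['Country Code'])`.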



### 2. **Histogram: Distribution of Life Expectancy**

This histogram illustrates the distribution of life expectancy values across different countries, offering a clear visual representation of the frequency of each life expectancy range. The majority of countries fall within the 60-80 year range, as indicated by the concentration of higher bars in this area.

To maintain consistency with the dashboard's visual theme, the **Viridis** color palette was applied, specifically using a shade that complements the rest of the visualizations. This ensures that the histogram aligns with the overall design while remaining easy to interpret.

The **Arial** font and **blue** axis labels keep the chart readable and consistent with the rest of the dashboard. A small gap between bars (`bargap=0.1`) provides visual separation. The overall layout emphasizes simplicity, making the chart easy to read while adhering to the dashboard's aesthetic.
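
The claim that most values fall in the 60-80 year range can be checked numerically rather than by eye. The sketch below uses a handful of made-up values, not the real dataset, and bins them with NumPy. Plotly picks its own bin edges (as far as I know it treats `nbins` as an upper bound), so the counts here may not match the rendered bars exactly.

```python
import numpy as np

# Hypothetical life expectancy values, for illustration only
values = np.array([52.1, 58.4, 61.0, 64.3, 67.8, 70.2,
                   72.5, 74.9, 77.3, 79.6, 81.2, 83.0])

# 30 equal-width bins between the minimum and maximum value
counts, edges = np.histogram(values, bins=30)

# Share of observations between 60 and 80 years (inclusive)
share_60_80 = np.mean((values >= 60) & (values <= 80))
print(f"{share_60_80:.1%} of values lie between 60 and 80 years")
```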


### 3. **Correlation Heatmap of Socio-Economic Factors and Life Expectancy**

This correlation heatmap visualizes the relationships between various socio-economic factors and life expectancy, providing a clear overview of how these factors interact. Each cell represents the correlation coefficient between two factors, with stronger positive correlations shown in darker blue and stronger negative correlations in darker red. This allows for easy identification of the most influential factors affecting life expectancy.

The strongest positive correlation is observed between **Sanitation** and **Life Expectancy**, marked for emphasis in green. In contrast, the strongest negative correlation is with **Prevalence of Undernourishment**, highlighted in red. These annotations serve to draw attention to the key relationships identified in the dataset.

The annotations are positioned to minimize visual clutter while still emphasizing these insights, and the fonts and colors match the rest of the dashboard so that the heatmap remains easy to interpret.
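
The two annotated factors are not hard-coded. They are looked up from the correlation matrix with `idxmax` and `idxmin`. A minimal sketch of that lookup on a made-up four-column table (the numbers are invented for illustration):

```python
import pandas as pd

# Toy data; the column names follow the notebook, the values are invented
toy = pd.DataFrame({
    "Life Expectancy World Bank": [55.0, 62.0, 68.0, 74.0, 80.0],
    "Sanitation": [20.0, 40.0, 55.0, 80.0, 95.0],
    "Prevelance of Undernourishment": [30.0, 22.0, 15.0, 8.0, 3.0],
    "Unemployment": [6.0, 9.0, 5.0, 8.0, 7.0],
})

target = "Life Expectancy World Bank"
corr = toy.corr()                      # pairwise Pearson correlations
le_corr = corr[target].drop(target)    # drop the trivial self-correlation of 1.0

strongest_positive = le_corr.idxmax()  # label with the largest coefficient
strongest_negative = le_corr.idxmin()  # label with the smallest coefficient
print(strongest_positive, strongest_negative)
```

Because the lookup is data-driven, the annotations move automatically if the dataset changes.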


### 4. **Hexbin Plot: Life Expectancy vs. Sanitation**

Although it is labelled a hexbin plot, this figure is drawn with Plotly's `Histogram2dContour` trace, which shades filled density contours rather than hexagonal bins. The effect is similar: overlapping data points are summarized as regions of high and low density instead of being drawn individually. The plot reveals a positive relationship between **Sanitation** and **Life Expectancy**, represented by the upward trend line, which aligns with the findings from the earlier correlation heatmap.

This plot is a logical continuation of the data story after the heatmap, which identified sanitation as having the strongest positive correlation with life expectancy. Here, we explore that relationship in more detail, confirming the significance of sanitation as a key factor in increasing life expectancy. The plot provides a detailed visual confirmation of this correlation, reinforcing the insights presented earlier.

This plot is a crucial part of the data story, as it zooms in on a key relationship highlighted in the broader analysis, providing an in-depth exploration of how sanitation impacts life expectancy.
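
The trend line is a degree-1 least-squares fit from `np.polyfit`, and the density shading comes from counting points in a two-dimensional grid. A small sketch of both steps on made-up, perfectly linear data (so the fitted slope and intercept are known in advance):

```python
import numpy as np

# Invented data lying exactly on life_expectancy = 50 + 0.3 * sanitation
sanitation = np.array([10.0, 30.0, 50.0, 70.0, 90.0])
life_expectancy = 50.0 + 0.3 * sanitation

# Degree-1 least-squares fit: returns [slope, intercept]
slope, intercept = np.polyfit(sanitation, life_expectancy, 1)
trend = np.poly1d([slope, intercept])

# Two endpoints are enough to draw a straight trend line
x_line = np.array([sanitation.min(), sanitation.max()])
y_line = trend(x_line)

# 2D binning: the kind of counts a density plot is built from
counts, x_edges, y_edges = np.histogram2d(sanitation, life_expectancy, bins=5)
print(slope, intercept, counts.sum())
```

Note that `np.polyfit` does not skip missing values, so on real data any rows with `NaN` in either column would need to be dropped first.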


### 5. **Bar Chart: Average Sanitation and Life Expectancy by Region**

This bar chart shows the average **Sanitation (%)** and **Life Expectancy (Years)** by region, offering a direct comparison of the two factors across geographical areas. The regions are sorted by life expectancy in descending order, making it simple to see which areas have higher life expectancies and how their sanitation levels compare.

The two series use different shades from the **Viridis** palette, which aligns with the rest of the dashboard and makes it easy to distinguish between the two variables.

This chart logically follows the hexbin plot and correlation heatmap as it further explores the relationship between sanitation and life expectancy, but with a regional focus. By showing the average values, it highlights how regional differences play a role in these key indicators, reinforcing the importance of sanitation as a significant factor in determining life expectancy. This regional breakdown provides a broader view, supporting the earlier findings and offering an additional layer of insight into global disparities in life expectancy.
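
The regional averages behind the bars come from a `groupby` followed by a descending sort. A minimal sketch with two invented regions:

```python
import pandas as pd

# Invented rows; the column names follow the notebook
toy = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Sanitation": [90.0, 80.0, 40.0, 60.0],
    "Life Expectancy World Bank": [78.0, 80.0, 62.0, 66.0],
})

agg = (
    toy.groupby("Region")[["Sanitation", "Life Expectancy World Bank"]]
    .mean()                      # one row of averages per region
    .reset_index()
    .sort_values(by="Life Expectancy World Bank", ascending=False)
)
print(agg)
```

Sorting before plotting is what fixes the left-to-right order of the bar groups.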

### 6. **Scatter Plot: Life Expectancy vs. Prevalence of Undernourishment**

This scatter plot shows the negative correlation between **Life Expectancy** and **Prevalence of Undernourishment** across regions. Each dot represents a country, colored by region, with trend lines for each region indicating the overall relationship.

A key feature of this plot is its **interactivity**. You can hover over data points for detailed country information, and the legend allows for filtering by region or combinations of regions. This responsiveness makes it easy to explore regional patterns and dive into specific areas of interest.

The distinct regional colors ensure clarity while maintaining coherence with the dashboard's overall design. This chart builds on previous visualizations by offering a deeper look at how undernourishment impacts life expectancy, especially in regions like **Sub-Saharan Africa** and **South Asia**, where the effects are more pronounced.
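
With `trendline='ols'` and `color='Region'`, Plotly Express fits one regression line per region. The sketch below approximates that with a per-group `np.polyfit` on invented data. It is a stand-in for illustration, not the statsmodels routine that Plotly calls internally.

```python
import numpy as np
import pandas as pd

# Invented rows: each region lies exactly on its own downward-sloping line
toy = pd.DataFrame({
    "Region": ["A", "A", "A", "B", "B", "B"],
    "Prevelance of Undernourishment": [5.0, 10.0, 15.0, 20.0, 30.0, 40.0],
    "Life Expectancy World Bank": [75.0, 72.0, 69.0, 65.0, 60.0, 55.0],
})

slopes = {}
for region, group in toy.groupby("Region"):
    # Degree-1 least-squares fit within this region only
    slope, intercept = np.polyfit(
        group["Prevelance of Undernourishment"],
        group["Life Expectancy World Bank"],
        1,
    )
    slopes[region] = slope

print(slopes)
```

A negative slope in every region is what the downward trend lines in the chart express.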

### 7. **Box Plot: Life Expectancy by Region**

This box plot provides a clear visualization of the distribution of **Life Expectancy** within each region, highlighting the variability and central tendencies across different geographical areas. By displaying the median, quartiles, and potential outliers for each region, the plot allows for a comprehensive comparison of life expectancy distributions.

The choice of a box plot is intentional to showcase not just the average life expectancy but also the spread and skewness within each region. This is crucial for identifying regions with high variability, which might indicate disparities in health outcomes within the same geographical area.

The regions are color-coded using a consistent **Set2** palette from the **Plotly Express** library, ensuring visual harmony with the rest of the dashboard. This color scheme aids in distinguishing between regions while maintaining an overall aesthetic coherence. The use of interactive features allows users to hover over each box to see detailed statistics, such as the quartiles and outlier values, providing an in-depth understanding of the data.
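
The statistics shown on hover can be reproduced approximately with pandas. The sketch below computes quartiles and the conventional 1.5 × IQR outlier fences for one invented group of values. `box_summary` is a hypothetical helper, and Plotly's own quartile method may give slightly different numbers on small samples.

```python
import pandas as pd


def box_summary(values):
    """Hypothetical helper: quartiles, IQR fences, and points outside the fences."""
    s = pd.Series(values, dtype=float)
    q1, median, q3 = s.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = s[(s < lower) | (s > upper)].tolist()
    return {"q1": q1, "median": median, "q3": q3,
            "lower_fence": lower, "upper_fence": upper, "outliers": outliers}


# Invented life expectancy values for a single region
summary = box_summary([50, 70, 71, 72, 73, 74, 75])
print(summary)
```

Applied per region (for example through `groupby('Region')`), this yields the numbers each box encodes.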

### 8. **Time Series Plot: Life Expectancy Over Time by Region**

The time series plot illustrates the progression of **Life Expectancy** over the years for each region, providing a dynamic view of how health outcomes have evolved globally. By plotting life expectancy against time, we can observe trends, improvements, or declines within regions, highlighting the impact of historical events, policies, and developments on public health.

This line chart employs distinct colors for each region, using the **Set2** palette to ensure clear differentiation while maintaining visual consistency with the dashboard's theme. Interactive legends allow users to focus on specific regions by toggling their visibility, enhancing the exploratory experience.

In the context of the dashboard's data story, this time series plot extends the analysis from a static snapshot to a dynamic narrative. It complements previous visualizations by providing a temporal dimension, showing not just where regions stand in terms of life expectancy, but also how they have arrived there over time. This historical perspective is essential for understanding long-term trends and for forecasting future developments in global health.
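
Improvements and declines can also be summarized as a single number per region: the change in average life expectancy between the first and last year. A minimal sketch on invented data, assuming one row per country and year as in the notebook's dataset:

```python
import pandas as pd

# Invented country-level rows for two regions and two years
toy = pd.DataFrame({
    "Region": ["A", "A", "A", "A", "B", "B"],
    "Year": [2000, 2000, 2001, 2001, 2000, 2001],
    "Life Expectancy World Bank": [60.0, 62.0, 62.0, 64.0, 70.0, 69.0],
})

# Regions as rows, years as columns, regional mean in each cell
yearly = (
    toy.groupby(["Region", "Year"])["Life Expectancy World Bank"]
    .mean()
    .unstack("Year")
)

# Last year minus first year: positive means improvement, negative means decline
change = yearly[yearly.columns.max()] - yearly[yearly.columns.min()]
print(change)
```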