# Method

### Dataset descriptions

For this datastory project we found the following two relevant datasets:

- **World Bank World Development Indicators (source: kaggle.com)**: The dataset comprises World Development Indicators from 1960 to 2022, sourced from the World Bank database. It encompasses macroeconomic, social, political, and environmental data for all countries and regions covered by the World Bank. The dataset includes information on 268 countries and regions, with 48 numerical features. Note that several entries are missing for various reasons. The dataset contains 17272 records and 50 variables.

- **World Bank Life expectancy & Socio-Economics (source: kaggle.com)**: This dataset includes life expectancy at birth data along with socio-economic indicators such as GDP per capita, education levels, healthcare access, and other relevant factors. It provides insights into how these variables influence life expectancy globally. The dataset contains 3306 records and 16 variables.

### Data cleaning and aggregating

To process the data, we first loaded the two dataframes. We ensured they had overlapping years by filtering the data from 2001 to 2019. To merge the datasets, we added a `Year` column to one dataset, matching the format of the `Year` column in the other. We also renamed the `Country Name` column in the Life Expectancy dataset to `country` to align with the World Bank Development Indicators dataset. We then merged the datasets based on the `country` and `Year` columns. After merging, we selected the columns relevant to our project. The final dataset was saved as a parquet file which contains 3,306 rows and 23 columns.
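The cleaning and merging steps above can be sketched in pandas as follows. This is a minimal illustration on toy data, not the exact notebook code: deriving `Year` from the `date` column, the left join, the helper name `clean_and_merge`, and all values in the toy frames are assumptions.

```python
import pandas as pd

# Toy stand-ins for the two Kaggle datasets (values are made up for illustration).
wdi = pd.DataFrame({
    "country": ["Netherlands", "Netherlands", "Kenya"],
    "date": pd.to_datetime(["2000-01-01", "2010-01-01", "2010-01-01"]),
    "control_of_corruption_estimate": [2.1, 2.0, -0.9],
})
life = pd.DataFrame({
    "Country Name": ["Netherlands", "Kenya", "Kenya"],
    "Year": [2010, 2010, 2020],
    "Life Expectancy World Bank": [80.7, 60.9, 62.7],
})

def clean_and_merge(wdi, life, first_year=2001, last_year=2019):
    """Hypothetical helper: align both frames on country and Year, then merge."""
    wdi = wdi.copy()
    # Assumption: the Year column is derived from the indicator dataset's date column.
    wdi["Year"] = pd.to_datetime(wdi["date"]).dt.year.astype("int64")
    # Rename so both frames share the same key name.
    life = life.rename(columns={"Country Name": "country"})
    # Keep only the overlapping years (both bounds inclusive).
    wdi = wdi[wdi["Year"].between(first_year, last_year)]
    life = life[life["Year"].between(first_year, last_year)]
    # Assumption: a left join that keeps every life expectancy record.
    return life.merge(wdi, on=["country", "Year"], how="left")

merged = clean_and_merge(wdi, life)
print(merged[["country", "Year", "control_of_corruption_estimate"]])
```

A left join keeps one row per life expectancy record even when an indicator is missing, which is one way to end up with the same row count as the life expectancy dataset. Whether the original notebook used a left or an inner join is not stated.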

Aggregation steps that are specific to a graph will be described in the following sections. 

### Variables

The variables in the final dataset can be classified in the following categories:

**Nominal / Discrete variables**: `country`, `IncomeGroup`

**Ordinal / Discrete variables**: `Corruption`

**Interval / Discrete variables**: `date`, `Year`

**Ratio / Continuous variables**: `land_area`, `control_of_corruption_estimate`, `control_of_corruption_std`, `goverment_effectiveness_estimate`, `human_capital_index`, `political_stability_std`, `intentional_homicides`, `Life Expectancy World Bank`, `Prevelance of Undernourishment`, `Health Expenditure`, `Education Expenditure`, `Unemployment`, `Life_expectancy_percent`, `Sanitation`, `life_expectancy_at_birth`

Variables that were used for this datastory:
- `country`: Represents the name of the country. 
- `control_of_corruption_estimate`: Represents the estimate of the control of corruption in each country. Higher values indicate better control of corruption. The values typically range from approximately -2.5 (weak control of corruption) to 2.5 (strong control of corruption).
- `goverment_effectiveness_estimate`: Represents the estimate of government effectiveness in each country. Higher values indicate more effective governance. The values typically range from approximately -2.5 (weak governance) to 2.5 (strong governance).
- `IncomeGroup`: This variable categorizes countries into different income groups (e.g., Low income, Lower middle income, Upper middle income, High income). 
- `life_expectancy_at_birth`: Represents the average life expectancy at birth in years for each country. 
- `Life_expectancy_percent`: Represents the life expectancy of each country as a percentage of the maximum life expectancy. It is calculated using the formula: (life_expectancy_at_birth / max_life_expectancy) × 100.
- `Sanitation`: Represents the percentage of the population with access to sanitation facilities. 



### Life expectancy in the world over the years

To understand the factors influencing life expectancy in different countries, we first need to examine global patterns and trends. Let's start by exploring life expectancy across countries over recent years with this choropleth map.

In [19]:
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
dataset = pd.read_parquet('life_expectancy_world_data.pq')

In [20]:
df = dataset.copy()
mask = df['Year'] > 2004
df = df[mask]

# Use the data's min and max so that color differences are easier to see
min_life_exp = df['Life Expectancy World Bank'].min()
max_life_exp = df['Life Expectancy World Bank'].max()

# Interactive plot
fig = px.choropleth(
    df,
    locations="country",
    locationmode="country names",
    color="Life Expectancy World Bank",
    hover_name="country",
    animation_frame="Year",
    color_continuous_scale=px.colors.sequential.Inferno,  # Warm color scale with good contrast
    title="Figure 1: Life Expectancy by Country Over Years",
    range_color=(min_life_exp, max_life_exp)  # Adjust the range to fit data
)

# Update the layout to draw the data as a world map
fig.update_layout(
    geo=dict(showframe=False, showcoastlines=True, projection_type='equirectangular'),
    coloraxis_colorbar=dict(
        title="Life Expectancy",
    ),
    width=900,  # Set the width of the figure
    height=500   # Set the height of the figure
)

fig.show()

The choropleth map in figure 1 shows life expectancy per country over the years. The interactive world map uses a color gradient: yellow and orange indicate higher life expectancy, while purple and black indicate lower life expectancy. The color range spans the minimum and maximum life expectancy values in the dataset, which gives clear contrast between countries. The map lets us quickly identify countries with notably high or low life expectancy, helping us focus our investigation on the factors behind these differences. As you move the slider, the colors become lighter over time, indicating that life expectancy is generally increasing.

## Political stability factors

When analyzing the factors that influence life expectancy in a country, political stability plays a crucial role. Key factors of political stability include corruption and government effectiveness. Corruption can undermine public trust and lead to inefficient use of resources, while government effectiveness reflects the ability of a government to provide public services and implement policies effectively (Lin et al., 2012).

### Life expectancy in relation to corruption

To see if corruption has a relationship with the life expectancy in a country, we have to visualize how these two variables correlate with each other. Let's start by plotting the relationship between life expectancy and corruption for each country in our dataset. 

In [13]:
def toon_scatterplot(jaar):
    # Filter the data for the given year
    data_jaar = df[df['Year'] == jaar]

    # Build a scatter trace for the given year
    scatter_jaar = go.Scatter(
        x=data_jaar['control_of_corruption_estimate'],
        y=data_jaar['life_expectancy_at_birth'],
        mode='markers',
        marker=dict(
            size=10,
            opacity=0.8,
        ),
        text=data_jaar['country'],  # Add country names for hover information
        hoverinfo='text+x+y',
        name=str(jaar),
        visible=False  # Initial visibility set to False
    )

    return scatter_jaar

# Build one scatter trace per year, in chronological order
years = sorted(df['Year'].unique())
scatterplots = [toon_scatterplot(jaar) for jaar in years]

# Make only the first trace visible initially
scatterplots[0]['visible'] = True

# Build the slider: each step shows exactly one year's trace
steps = []
for i, jaar in enumerate(years):
    step = dict(
        method='update',
        args=[{'visible': [i == j for j in range(len(years))]}],
        label=str(jaar)
    )
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={'prefix': 'Year: '},
    pad={'t': 50},
    steps=steps
)]

# Create the layout
layout = go.Layout(
    title='Figure 2: Corruption control vs life expectancy per year',
    xaxis=dict(title='Corruption control'),
    yaxis=dict(title='Life expectancy'),
    showlegend=False,  # Remove the legend
    sliders=sliders
)

# Create the figure
fig = go.Figure(data=scatterplots, layout=layout)

# Show the plot
fig.show()


The graph shown in figure 2 depicts the correlation between life expectancy in a country and the control of corruption in that country. Life expectancy is the estimated age a newborn baby will reach. The corruption control variable is an index value that indicates the extent to which a country has control over corruption. Higher values indicate more corruption control, while lower values indicate less corruption control. By using the slider, you can observe how the data points change over the years. Overall, this scatterplot appears to show a positive relationship, suggesting that as corruption control improves in a country, life expectancy increases and vice versa.
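To back up the visual impression from figure 2 with a number, the strength of the linear relationship could be computed per year. The sketch below assumes a frame with the same column names as the notebook's `df`; the helper name `yearly_correlation` and the toy values are hypothetical.

```python
import pandas as pd

def yearly_correlation(data,
                       x="control_of_corruption_estimate",
                       y="life_expectancy_at_birth"):
    """Pearson correlation between x and y, computed separately per year."""
    return pd.Series(
        {year: group[x].corr(group[y]) for year, group in data.groupby("Year")},
        name="pearson_r",
    )

# Toy data (made-up values) using the same column names as the notebook's df.
toy = pd.DataFrame({
    "Year": [2010, 2010, 2010, 2011, 2011, 2011],
    "control_of_corruption_estimate": [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0],
    "life_expectancy_at_birth": [60.0, 70.0, 80.0, 62.0, 70.0, 81.0],
})

r_per_year = yearly_correlation(toy)
print(r_per_year.round(3))
```

Values close to 1 would support the positive relationship described above, while values near 0 would indicate that the pattern in the scatterplot is weaker than it looks. A correlation coefficient says nothing about causation.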

However, it's important to consider other factors that may provide a different perspective on the data. One such factor is the income class of a country, as this variable also significantly influences life expectancy (Chen et al., 2021; Freeman et al., 2020). Let's see what happens when we redraw the scatterplot, this time distinguishing between income classes.

In [14]:
country_stats = df.groupby('country').agg({
    'control_of_corruption_estimate': 'mean',
    'life_expectancy_at_birth': 'mean',
    'IncomeGroup': lambda x: x.iloc[0]
}).reset_index()

# Define colors for each income group
color_mapping = {
    'Low income': 'red',
    'Lower middle income': 'orange',
    'Upper middle income': 'green',
    'High income': 'blue'
}

# Create one scatter trace per income group
scatter_plots = []
for income_group in country_stats['IncomeGroup'].unique():
    group_data = country_stats[country_stats['IncomeGroup'] == income_group]
    scatter_plot = go.Scatter(
        x=group_data['control_of_corruption_estimate'],
        y=group_data['life_expectancy_at_birth'],
        mode='markers',
        name=income_group,
        marker=dict(color=color_mapping[income_group])
    )
    scatter_plots.append(scatter_plot)

# Create layout with a dropdown menu: one button for all groups, plus one per income group
layout = go.Layout(
    title='Figure 3: Mean Life Expectancy vs Mean Corruption Control by Income Group',
    xaxis=dict(title='Mean Corruption Control'),
    yaxis=dict(title='Mean Life Expectancy'),
    showlegend=True,
    updatemenus=[
        dict(
            buttons=[
                dict(
                    label='All Income Classes',
                    method='update',
                    # Show every trace
                    args=[{'visible': [True] * len(scatter_plots)},
                          {'title': 'Figure 3: Mean Life Expectancy vs Mean Corruption Control for All Income Classes'}]
                ),
            ] + [
                dict(
                    label=income_group,
                    method='update',
                    # Show only the trace that belongs to the selected income group
                    args=[{'visible': [income_group == trace.name for trace in scatter_plots]},
                          {'title': f'Figure 3: Mean Life Expectancy vs Mean Corruption Control for {income_group} countries'}]
                ) for income_group in country_stats['IncomeGroup'].unique()
            ],
            direction='down',
            pad={'r': 10, 't': 10},
            showactive=True,
            x=1.05,
            xanchor='left',
            y=1.05,
            yanchor='bottom'
        )
    ]
)

# Create figure
fig = go.Figure(data=scatter_plots, layout=layout)

# Show figure
fig.show()

The scatterplot in figure 3 is somewhat different from the previous one. Instead of focusing on each specific year, we now aggregate data across all years, taking the average for each country over a 15-year period (2005 to 2019). Each data point now represents the average life expectancy and corruption control of a country during this timeframe. The colors of the data points indicate the income class to which the country belongs.

Using the drop-down menu, we can individually view different income classes. What immediately stands out is that the relationship between life expectancy and corruption control is much harder to discern. Especially for the two middle-income classes, there does not appear to be a strong relationship. In the low-income class, there might be a weak positive relationship, but this could potentially be influenced by outliers. Only the high-income class still shows a clear positive relationship between the two variables.
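Because the apparent relationship in some income classes may be driven by outliers, a rank-based measure is one way to check it. The sketch below computes Spearman's rho per income group as the Pearson correlation of the ranks. It assumes a frame shaped like `country_stats`; the helper name `spearman_by_group` and the toy values are hypothetical.

```python
import pandas as pd

def spearman_by_group(stats,
                      group_col="IncomeGroup",
                      x="control_of_corruption_estimate",
                      y="life_expectancy_at_birth"):
    """Spearman's rho per group, computed as the Pearson correlation of ranks."""
    rho = {}
    for group, rows in stats.groupby(group_col):
        pair = rows[[x, y]].dropna()
        rho[group] = pair[x].rank().corr(pair[y].rank())
    return pd.Series(rho, name="spearman_rho")

# Toy per-country means (made-up values); the last low-income row acts as an outlier.
toy_stats = pd.DataFrame({
    "IncomeGroup": ["High income"] * 4 + ["Low income"] * 4,
    "control_of_corruption_estimate": [0.5, 1.0, 1.5, 2.0, -1.5, -1.0, -0.5, 0.0],
    "life_expectancy_at_birth": [76.0, 78.0, 80.0, 82.0, 55.0, 60.0, 58.0, 75.0],
})

rho_per_group = spearman_by_group(toy_stats)
print(rho_per_group)
```

Ranks limit the pull of a single extreme country, so a group whose Pearson correlation is high only because of one outlier would typically show a lower rho here.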

### Life expectancy in relation to government effectiveness

We observed the relationship between corruption and life expectancy, which shows variations across different income groups. Another variable closely related to corruption is government effectiveness, which indicates how well a country's government functions. A well-functioning government can enhance safety through efficient authorities, improving corruption control (Lin et al., 2012; Freeman et al., 2020). To demonstrate the relationship between government effectiveness and corruption, we plot the two variables on a dual y-axis line chart.

In [15]:
df = df[(df['Year'] >= 2010) & (df['Year'] <= 2019)]

# Aggregate the data
df_aggregated = df.groupby(['Year', 'IncomeGroup']).agg({
    'goverment_effectiveness_estimate': 'mean',
    'control_of_corruption_estimate': 'mean'
}).reset_index()

# Get unique income groups
income_groups = df_aggregated['IncomeGroup'].unique()

# Create subplots
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=income_groups,
    shared_xaxes=True,
    vertical_spacing=0.1,
    specs=[[{"secondary_y": True}, {"secondary_y": True}],
           [{"secondary_y": True}, {"secondary_y": True}]]
)


# Plot data for each income group
row_col_mapping = [(1, 1), (1, 2), (2, 1), (2, 2)]
for idx, income_group in enumerate(income_groups):
    data = df_aggregated[df_aggregated['IncomeGroup'] == income_group]
    row, col = row_col_mapping[idx]
    
    fig.add_trace(
        go.Scatter(x=data['Year'], y=data['goverment_effectiveness_estimate'], name='Government Effectiveness',
                   mode='lines', line=dict(color='green'), showlegend = False),
        row=row, col=col, secondary_y=False
    )
    
    fig.add_trace(
        go.Scatter(x=data['Year'], y=data['control_of_corruption_estimate'], name='Corruption Control',
                   mode='lines', line=dict(color='orange'), showlegend = False),
        row=row, col=col, secondary_y=True
    )

fig.add_trace(
    go.Scatter(x=[None], y=[None], mode='lines', line=dict(color='green'), name='Government Effectiveness'),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=[None], y=[None], mode='lines', line=dict(color='orange'), name='Corruption Control'),
    row=1, col=1
)

# Update axis labels
for idx, (row, col) in enumerate(row_col_mapping):
    if row == 2:  # Only update x-axis title for the lower two plots
        fig.update_xaxes(title_text="Year", row=row, col=col)
    fig.update_yaxes(title_text="Gov. Effectiveness", row=row, col=col, secondary_y=False)
    fig.update_yaxes(title_text="Corruption Control", row=row, col=col, secondary_y=True)

# Update layout
fig.update_layout(
    title_text="Figure 4: Corruption Control and Government Effectiveness by Income Group (2010-2019)",
    height=500,
    margin=dict(t=150),  # Adjust margin to fit the annotation
    showlegend=True,
    legend_title_text='Variables'
)

fig.show()


Both y-axes of figure 4 show index estimates, government effectiveness on the left and corruption control on the right, each ranging from approximately -2.5 to 2.5. Lower values indicate reduced government effectiveness and weaker control of corruption. The chart suggests that there is indeed a positive relationship between government effectiveness and corruption control in all four income groups.

Lower-income countries often face greater challenges in maintaining effective government operations and controlling corruption (Lin et al., 2012; Freeman et al., 2020). Enhancing governance and reducing corruption in these countries could be a logical step towards increasing life expectancy. Let's take a look at the relationship between government effectiveness and life expectancy in figure 5.

In [16]:
df = df[(df['Year'] >= 2010) & (df['Year'] <= 2019)]

# Aggregate the data
df_aggregated = df.groupby(['Year', 'IncomeGroup']).agg({
    'goverment_effectiveness_estimate': 'mean',
    'life_expectancy_at_birth': 'mean'
}).reset_index()

# Get unique income groups
income_groups = df_aggregated['IncomeGroup'].unique()

# Create subplots
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=income_groups,
    shared_xaxes=True,
    vertical_spacing=0.1,
    specs=[[{"secondary_y": True}, {"secondary_y": True}],
           [{"secondary_y": True}, {"secondary_y": True}]]
)

# Plot data for each income group
row_col_mapping = [(1, 1), (1, 2), (2, 1), (2, 2)]
for idx, income_group in enumerate(income_groups):
    data_plot = df_aggregated[df_aggregated['IncomeGroup'] == income_group]
    row, col = row_col_mapping[idx]
    
    fig.add_trace(
        go.Scatter(x=data_plot['Year'], y=data_plot['goverment_effectiveness_estimate'], name=f'Government Effectiveness ({income_group})',
                   mode='lines', line=dict(color='green'), legendgroup='Government Effectiveness', showlegend=False),
        row=row, col=col, secondary_y=False
    )
    
    fig.add_trace(
        go.Scatter(x=data_plot['Year'], y=data_plot['life_expectancy_at_birth'], name=f'Life Expectancy ({income_group})',
                   mode='lines', line=dict(color='magenta'), legendgroup='Life Expectancy', showlegend=False),
        row=row, col=col, secondary_y=True
    )

# Add dummy traces for the legend
fig.add_trace(
    go.Scatter(x=[None], y=[None], mode='lines', line=dict(color='green'), name='Government Effectiveness'),
    row=1, col=1
)
fig.add_trace(
    go.Scatter(x=[None], y=[None], mode='lines', line=dict(color='magenta'), name='Life Expectancy'),
    row=1, col=1
)

# Update axis labels
for idx, (row, col) in enumerate(row_col_mapping):
    if row == 2:  # Only update x-axis title for the lower two plots
        fig.update_xaxes(title_text="Year", row=row, col=col)
    fig.update_yaxes(title_text="Gov. Effectiveness", row=row, col=col, secondary_y=False)
    fig.update_yaxes(title_text="Life Expectancy", row=row, col=col, secondary_y=True)

# Update layout
fig.update_layout(
    title_text="Figure 5: Life Expectancy and Government Effectiveness by Income Group (2010-2019)",
    height=500,
    showlegend=True,
    legend_title_text='Variables'
)

# Show the plot
fig.show()

Figure 5 compares the government effectiveness estimate (left y-axis) with life expectancy at birth (right y-axis) across different income classes. The trend shows that for the middle-income classes, both government effectiveness and life expectancy increase over the years. Interestingly, in both the high-income and low-income classes, life expectancy continues to rise even when government effectiveness decreases. Additionally, we observe a significant peak and subsequent drop in the government effectiveness estimate for the lower-middle-income class, which does not appear to correlate with a corresponding change in life expectancy.

Based on visual observations, the following can be said about the relationship between life expectancy and government effectiveness:
- The positive relationship between government effectiveness and life expectancy suggests that improvements in government effectiveness likely contribute to better health outcomes in certain income groups.
- In high and low-income groups, life expectancy continues to rise even when government effectiveness decreases. This could indicate that factors other than government effectiveness, such as economic wealth in high-income groups and international aid in low-income groups, may play a more significant role in improving life expectancy.
- The significant peak and drop in government effectiveness do not correlate with changes in life expectancy. This might imply that short-term fluctuations in government effectiveness do not immediately impact life expectancy, or other stabilizing factors could lessen the effect of such fluctuations.

However, due to the lack of statistical testing and contradictory patterns in the line chart, these statements are not statistically valid without further analysis.
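One possible first step towards such an analysis is to correlate year-over-year changes instead of levels, since two series that both trend upwards can look related even when their short-term movements are not. The sketch below assumes a frame shaped like `df_aggregated`; the helper name `change_correlation` and the toy values are hypothetical, and with only about ten yearly points per group any result would remain indicative at best.

```python
import pandas as pd

def change_correlation(agg,
                       group_col="IncomeGroup",
                       x="goverment_effectiveness_estimate",
                       y="life_expectancy_at_birth"):
    """Pearson correlation of year-over-year changes in x and y, per group."""
    result = {}
    for group, rows in agg.sort_values("Year").groupby(group_col):
        # First differences remove the shared long-term trend.
        changes = rows[[x, y]].diff().dropna()
        result[group] = changes[x].corr(changes[y])
    return pd.Series(result, name="r_of_yearly_changes")

# Toy yearly means for one income group (made-up values).
toy_agg = pd.DataFrame({
    "Year": [2010, 2011, 2012, 2013, 2014],
    "IncomeGroup": ["Lower middle income"] * 5,
    "goverment_effectiveness_estimate": [-0.5, -0.4, -0.2, -0.4, -0.5],
    "life_expectancy_at_birth": [66.0, 66.4, 67.0, 67.3, 67.9],
})

r_changes = change_correlation(toy_agg)
print(r_changes.round(3))
```

A coefficient near zero would be in line with the observation that short-term fluctuations in government effectiveness are not mirrored by life expectancy. A formal test would still be needed before drawing conclusions.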

## Life necessities factors

When analyzing the factors that influence life expectancy in a country, basic life necessities are fundamental. Key life necessities factors include sanitation and prevalence of undernourishment. Proper sanitation is crucial for preventing diseases and maintaining public health, while the prevalence of undernourishment indicates the nutritional status of a population, which directly affects health and longevity (Ranabhat et al., 2018).

### Life expectancy in relation to sanitation

As mentioned before, a potential factor that affects life expectancy could be access to sanitation facilities. Countries with limited access to such facilities are at higher risk of diseases and the spread of viruses that could potentially be life-threatening.

In [17]:
max_life_expectancy = df['life_expectancy_at_birth'].max()
df['Life_expectancy_percent'] = (df['life_expectancy_at_birth'] / max_life_expectancy) * 100

avg_data = df.groupby('IncomeGroup').agg({
    'Sanitation': 'mean',
    'Life_expectancy_percent': 'mean'
}).reset_index()

import plotly.graph_objs as go

# Create traces
trace1 = go.Bar(
    x=avg_data['IncomeGroup'],
    y=avg_data['Sanitation'],
    name='Sanitation (%)',
    marker=dict(color='darkblue')
)

trace2 = go.Bar(
    x=avg_data['IncomeGroup'],
    y=avg_data['Life_expectancy_percent'],
    name='Life Expectancy (%)',
    marker=dict(color='magenta')
)

data_bars = [trace1, trace2]

# Create layout
layout = go.Layout(
    title='Figure 6: Access to Sanitation and Life Expectancy by Income Group',
    barmode='group',
    xaxis=dict(title='Income Group', categoryorder='array', categoryarray=['High income', 'Upper middle income', 'Lower middle income', 'Low income']),
    yaxis=dict(title='Percentage'),
    legend=dict(title='Metrics')
)

# Create figure
fig = go.Figure(data=data_bars, layout=layout)

# Show figure
fig.show()

The bar chart in figure 6 illustrates the mean percentage of individuals with access to sanitation facilities across different income groups, alongside their respective life expectancy represented as a percentage. To derive the life expectancy percentage, we calculated each country's life expectancy as a fraction of the maximum life expectancy in the dataset (approximately 84 years), and then computed the average of these percentages within each income group.

When looking at this chart, it becomes clear that the high-income group has more access to sanitation facilities while also having a high life expectancy. As the income level decreases, both sanitation access and life expectancy decline, suggesting a positive correlation between the two variables.



### Life expectancy in relation to prevalence of undernourishment

Finally, we'll examine the relationship between life expectancy and the prevalence of undernourishment. The prevalence of undernourishment represents the percentage of the population suffering from undernourishment. To effectively visualize this relationship, we'll use scatterplots with trendlines.

In [18]:
from scipy.stats import linregress
df = dataset.copy()

df_aggregated = df.groupby(['Year', 'IncomeGroup']).agg({
    'Prevelance of Undernourishment': 'mean',
    'life_expectancy_at_birth': 'mean'
}).reset_index()

income_colors = {
    'Low income': 'red',
    'High income': 'blue',
    'Upper middle income': 'green',
    'Lower middle income': 'orange'
}

income_groups = ['Low income', 'High income', 'Upper middle income', 'Lower middle income']

# Create a figure with subplots
fig = make_subplots(rows=2, cols=2, 
                    subplot_titles=['Low Income Group', 'High Income Group', 
                                    'Upper Middle Income Group', 'Lower Middle Income Group'])

# Iterate over each income group
for i, income_group in enumerate(income_groups):
    # Filter data by income group
    df_income_group = df_aggregated[df_aggregated['IncomeGroup'] == income_group]
    
    # Create scatter plot with hover information
    fig_scatter = px.scatter(df_income_group, x='Prevelance of Undernourishment', y='life_expectancy_at_birth', 
                             color_discrete_sequence=[income_colors[income_group]], 
                             labels={'Prevelance of Undernourishment': 'Prevalence of Undernourishment (%)', 
                                     'life_expectancy_at_birth': 'Life Expectancy at Birth'},
                             hover_name='Year',  # Specify the column for hover information
                             hover_data={'Year': True,  # Include Year in hover data
                                         'Prevelance of Undernourishment': True,
                                         'life_expectancy_at_birth': True})
    
    # Add scatter plot to the corresponding subplot
    fig.add_trace(fig_scatter.data[0], row=(i // 2) + 1, col=(i % 2) + 1)
    
    # Calculate trendline using linear regression
    slope, intercept, _, _, _ = linregress(df_income_group['Prevelance of Undernourishment'], df_income_group['life_expectancy_at_birth'])
    trendline_y = slope * df_income_group['Prevelance of Undernourishment'] + intercept
    
    # Add trendline to the corresponding subplot
    fig.add_trace(go.Scatter(x=df_income_group['Prevelance of Undernourishment'], y=trendline_y,
                             mode='lines',
                             name='Trendline',
                             line=dict(color=income_colors[income_group], width=2)),
                  row=(i // 2) + 1, col=(i % 2) + 1)
    
    # Update x-axis and y-axis labels
    fig.update_xaxes(title_text='Prevalence of Undernourishment (%)', row=(i // 2) + 1, col=(i % 2) + 1)
    fig.update_yaxes(title_text='Life Expectancy', row=(i // 2) + 1, col=(i % 2) + 1)

# Update layout with title and display
fig.update_layout(height=500, width=1000, showlegend=False, title='Figure 7: Life Expectancy and Prevalence of Undernourishment')

# Show the combined plot
fig.show()

Figure 7 illustrates a negative relationship between life expectancy and prevalence of undernourishment across different income groups. Each point on the plots represents the average life expectancy and prevalence of undernourishment for a specific year between 2001 and 2019 within a particular income group. Hovering over each point reveals the corresponding year.

The plots reveal that countries with lower life expectancy tend to have a higher prevalence of undernourishment. Points on the left side of the plots represent more recent years, showing a less pronounced trend compared to earlier years. This observation is particularly noticeable for the Low and Upper Middle income groups.
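The trendlines in figure 7 are drawn without reporting how steep they are or how well they fit. A small sketch of how slope and R² could be obtained for one income group is shown below, using `numpy.polyfit` in place of the notebook's `linregress` call. The helper name `fit_line` and the toy values are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line through (x, y); returns slope, intercept and R squared."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    # R squared: share of the variance in y that the line accounts for.
    r_squared = 1.0 - residuals.var() / y.var()
    return slope, intercept, r_squared

# Toy yearly means for one income group (made-up values).
undernourishment = [30.0, 28.0, 25.0, 22.0, 20.0]   # prevalence in percent
life_expectancy = [58.0, 59.0, 61.0, 62.0, 64.0]    # years

slope, intercept, r_squared = fit_line(undernourishment, life_expectancy)
print(f"slope={slope:.3f} years per percentage point, R^2={r_squared:.3f}")
```

A negative slope corresponds to the negative relationship described above. Comparing the slope and R² across the four income groups would make the visual differences between the panels more concrete.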

## Summary


Life expectancy is a crucial measure of a country's health and development, influenced by life necessities and political stability. This project explored these relationships across different income groups using World Bank data.

**Political Stability Factors:** The impact of political stability on life expectancy varied across income groups. Better corruption control generally correlated with higher life expectancy, but this relationship was more pronounced in high-income countries. In low-income countries like Ethiopia, political instability worsened health inequities and reduced life expectancy. Conversely, in high-income countries like the United States, life expectancy remained stable despite political instability, due to robust healthcare systems. Additionally, while government effectiveness positively influenced health outcomes, life expectancy often rose even when government effectiveness declined, suggesting that other factors such as economic wealth or international aid might play significant roles. These observations align with the perspective that political stability factors do not show a uniform correlation with life expectancy across different income groups.

**Life Necessities Factors:** The analysis showed a consistent positive correlation between life necessities and life expectancy across all income groups. Higher healthcare expenditure significantly increased life expectancy, especially in high-income countries. Education and income levels were also strongly linked to longer life expectancy. High-income groups had better access to sanitation facilities, leading to higher life expectancy, while lower-income groups had less access and consequently shorter life expectancy. These findings support the perspective that life necessities factors have a similar impact on life expectancy across all income groups.

Overall, the findings highlight the importance of both life necessities and political stability in determining life expectancy, with varying effects based on a country's income level. These insights can guide policymakers in developing targeted strategies to improve life expectancy. Further research is needed to conclusively determine the impacts of these factors.

## References

1. Freeman, T., Gesesew, H. A., Bambra, C., Giugliani, E. R. J., Popay, J., Sanders, D., Macinko, J., Musolino, C., & Baum, F. (2020). Why do some countries do better or worse in life expectancy relative to income? An analysis of Brazil, Ethiopia, and the United States of America. International Journal for Equity in Health, 19, 202. https://doi.org/10.1186/s12939-020-01315-z

2. Chen, Z., Ma, Y., Hua, J., Wang, Y., & Guo, H. (2021). Impacts from economic development and environmental factors on life expectancy: A comparative study based on data from both developed and developing countries from 2004 to 2016. International Journal of Environmental Research and Public Health, 18(16), 8559. https://doi.org/10.3390/ijerph18168559

3. Lin, R.-T., Chen, Y.-M., Chien, L.-C., & Chan, C.-C. (2012). Political and social determinants of life expectancy in less developed countries: A longitudinal study. BMC Public Health, 12, 85. https://doi.org/10.1186/1471-2458-12-85

4. Ranabhat, C. L., Jakovljevic, M., Dhimal, M., & Kim, C. B. (2018). Structural factors responsible for universal health coverage in low- and middle-income countries: Results from 118 countries. Frontiers in Pharmacology, 9, 960. https://doi.org/10.3389/fphar.2018.00960


## TA and peer feedback

Our team, D2, collaborated with D3 and our TA to receive peer feedback. During these sessions, each group pitched their work from the past three weeks sequentially. Following each pitch, the other group and our TA provided feedback, highlighting both positive aspects and areas for improvement.

At that juncture, our work had already established a strong foundation; however, we received feedback on several fronts, notably the need to incorporate more scientific literature. Below is a summary of the feedback provided by both the other group and our TA:

- Ensure clarity in the figure "Corruption control vs life expectancy per year" by clearly indicating which country each point represents when hovered over, considering that the year is already indicated by the slider. Remove the legend from the right corner.
- Clarify the representation of colored lines in the figure "Corruption Control and Government Effectiveness by Income Group (2004-2019)."
- Maintain consistency in color coding across figures; for instance, use distinct colors to differentiate variables consistently in visualizations.

Following this constructive feedback session, our team promptly convened to integrate these suggestions into our project. We immediately devised a new plan to apply these improvements to our final project.

## Reflection

Throughout this project, we encountered both successes and challenges. Initially, we faced difficulties in selecting a suitable topic and identifying appropriate databases. However, once we secured access to relevant data sources, our progress improved significantly as we began visualizing relationships between various variables using graphs.

As we explored the data, we realized our initial project idea from the proposal wasn't quite fitting with our available data. Thankfully, our TA helped us pivot and find new perspectives that became the basis for our data story.

If we had more time for this project, it would have been interesting to delve deeper into exploring correlations between additional variables within this context. For instance, investigating how wealth inequality might influence life expectancy in different countries could have provided further insights.

## Work distribution

We communicated through lectures, meetings, and WhatsApp during the project. In our first lecture, we installed JupyterBook to convert our Jupyter notebook into a website via GitHub. After that, we had several meetings to meet deadlines, receiving feedback on June 20, 2024. In the final weeks, we processed this feedback and integrated our notebook into the GitHub site. WhatsApp helped us collaborate effectively. Here’s how we divided the tasks:

*Elaine Jans:* Focused on cleaning datasets and creating graphs to represent relationships. Also presented our project during the peer review.

*Joy Filtenborg:* Adjusted/created graphs and added descriptions for both the graphs and dataset.

*Khang Nguyen:* Assisted in finding datasets and covered the literature research for this project.

*Eric Molenaar:* Described the reflection, peer review, and task distribution, and integrated the Jupyter notebook into the GitHub website.