# Project 2

In [1]:
import pandas as pd
import plotly.express as px

In [2]:
import plotly.io as pio

pio.renderers.default = "vscode+jupyterlab+notebook_connected"

## Datasets used:

1. World Happiness Index (2018) - https://www.kaggle.com/datasets/sougatapramanick/happiness-index-2018-2019

2. World Bank Indicators Collection - https://www.kaggle.com/datasets/ploverbrown/world-bank-indicators-collection
(This dataset consists of multiple sheets of indicators across thematic areas. Only the sheet titled 'Health1' has been retained for this project.)

## Dataset 1 - World Happiness Index (2018)

The World Happiness Report reflects a worldwide demand for more attention to happiness and well-being as criteria for government policy. It reviews the state of happiness in the world today and shows how the science of happiness explains personal and national variations in happiness.

The dataset also includes observed data on six variables and estimates of their associations with life evaluations to explain the variation across countries. They include GDP per capita, social support, healthy life expectancy, freedom, generosity, and corruption. Happiness rankings are not based on any index of these six factors – the scores are instead based on individuals’ own assessments of their lives, in particular, their answers to the single-item Cantril ladder life-evaluation question. 

In [3]:
world_happiness_index = pd.read_csv("World_Happiness_Index_2018.csv")
world_happiness_index

Unnamed: 0,Overall rank,Country or region,Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
0,1,Finland,7.632,1.305,1.592,0.874,0.681,0.202,0.393
1,2,Norway,7.594,1.456,1.582,0.861,0.686,0.286,0.340
2,3,Denmark,7.555,1.351,1.590,0.868,0.683,0.284,0.408
3,4,Iceland,7.495,1.343,1.644,0.914,0.677,0.353,0.138
4,5,Switzerland,7.487,1.420,1.549,0.927,0.660,0.256,0.357
...,...,...,...,...,...,...,...,...,...
151,152,Yemen,3.355,0.442,1.073,0.343,0.244,0.083,0.064
152,153,Tanzania,3.303,0.455,0.991,0.381,0.481,0.270,0.097
153,154,South Sudan,3.254,0.337,0.608,0.177,0.112,0.224,0.106
154,155,Central African Republic,3.083,0.024,0.000,0.010,0.305,0.218,0.038


In [4]:
# check for NaNs - entire dataset
columns_whi = world_happiness_index.columns
cols_null = []
for c in columns_whi:
    null_check = world_happiness_index[c].isnull().values.any()
    print(f"Are there nulls for {c}? - {null_check}")
    if null_check:
        cols_null.append(c)
        # count missing values per country, then report the share of countries with exactly one missing value
        count_na_whi = world_happiness_index.groupby(['Country or region']).agg({c:lambda x: x.isna().sum()}).reset_index()
        print(f"        % nulls for {c} - {count_na_whi.loc[count_na_whi[c] ==1].shape[0]*100/count_na_whi.shape[0]}")

Are there nulls for Overall rank? - False
Are there nulls for Country or region? - False
Are there nulls for Score? - False
Are there nulls for GDP per capita? - False
Are there nulls for Social support? - False
Are there nulls for Healthy life expectancy? - False
Are there nulls for Freedom to make life choices? - False
Are there nulls for Generosity? - False
Are there nulls for Perceptions of corruption? - True
        % nulls for Perceptions of corruption - 0.6410256410256411
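
The per-column loop above can also be sketched as a single vectorized summary. This is a minimal sketch on a toy frame (the `toy` DataFrame and its values are illustrative assumptions, not the real data); it relies on `isna().mean()` giving the share of missing rows per column.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for world_happiness_index (values are illustrative only)
toy = pd.DataFrame({
    "Country or region": ["A", "B", "C", "D"],
    "Score": [7.1, 6.2, 5.3, 4.4],
    "Perceptions of corruption": [0.39, np.nan, 0.10, 0.06],
})

# isna() gives a boolean frame; the column-wise mean is the share of missing values
null_pct = toy.isna().mean().mul(100)

# keep only the columns that actually contain missing values
cols_with_nulls = null_pct[null_pct > 0]
print(cols_with_nulls)
```

When each country appears in exactly one row, as in the happiness dataset, this share of missing rows is the same quantity as the share of countries with a missing value.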


In [5]:
## Standardize the column name (so that we can merge the two datasets later)
world_happiness_index = world_happiness_index.rename(columns={"Country or region":"Country Name","Score":"Happiness Score"})
world_happiness_index

Unnamed: 0,Overall rank,Country Name,Happiness Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
0,1,Finland,7.632,1.305,1.592,0.874,0.681,0.202,0.393
1,2,Norway,7.594,1.456,1.582,0.861,0.686,0.286,0.340
2,3,Denmark,7.555,1.351,1.590,0.868,0.683,0.284,0.408
3,4,Iceland,7.495,1.343,1.644,0.914,0.677,0.353,0.138
4,5,Switzerland,7.487,1.420,1.549,0.927,0.660,0.256,0.357
...,...,...,...,...,...,...,...,...,...
151,152,Yemen,3.355,0.442,1.073,0.343,0.244,0.083,0.064
152,153,Tanzania,3.303,0.455,0.991,0.381,0.481,0.270,0.097
153,154,South Sudan,3.254,0.337,0.608,0.177,0.112,0.224,0.106
154,155,Central African Republic,3.083,0.024,0.000,0.010,0.305,0.218,0.038


Let's explore the countries that score highest and lowest in the happiness rankings.

In [6]:
hap_5 = world_happiness_index.sort_values(['Happiness Score'],ascending=False).reset_index(drop=True)
top_hap_5 = hap_5.iloc[:5]
fig = px.bar(y=top_hap_5['Country Name'],x=top_hap_5['Happiness Score'], title='Top 5 countries with highest \
happiness scores in 2018',orientation='h')
# Customize chart
fig.update_traces(marker_color='green')
fig.update_layout(
    yaxis=dict(autorange="reversed"),
    xaxis_title="Happiness Score",
    yaxis_title="Country Name",
    title_x=0.5 # center title
)
fig.show()
bottom_hap_5 = hap_5.iloc[-5:]
fig = px.bar(y=bottom_hap_5['Country Name'],x=bottom_hap_5['Happiness Score'], title='Bottom 5 countries with lowest \
happiness scores in 2018',orientation='h')
# Customize chart
fig.update_traces(marker_color='red')
fig.update_layout(
    yaxis=dict(autorange="reversed"),
    xaxis_title="Happiness Score",
    yaxis_title="Country Name",
    title_x=0.5 # center title
)
fig.show()


Unsurprisingly, Northern and Western European countries rank highest on happiness. The lowest-ranked countries are mostly in Africa, and several of them are affected by armed conflict (Yemen, also in the bottom group, is in the Middle East).

## Common drivers of Happiness

### 1. Income and Happiness

It is reasonable to suspect that income and happiness scores are positively correlated. Countries with higher GDP per capita should be able to build better infrastructure, institutions and governance to cater to their citizens' well-being needs (financial, social, environmental, health etc.). So let's explore this hypothesis.

In [7]:
world_happiness_index.columns

Index(['Overall rank', 'Country Name', 'Happiness Score', 'GDP per capita',
       'Social support', 'Healthy life expectancy',
       'Freedom to make life choices', 'Generosity',
       'Perceptions of corruption'],
      dtype='object')


In [8]:
country_col = 'Country Name'
world_happiness_index.loc[world_happiness_index[country_col].isin(top_hap_5[country_col].to_list()),'color'] = 'Top 5 Countries ranked by happiness'
world_happiness_index.loc[world_happiness_index[country_col].isin(bottom_hap_5[country_col].to_list()),'color'] = 'Bottom 5 ranked by happiness'
world_happiness_index.loc[~world_happiness_index[country_col].isin(bottom_hap_5[country_col].to_list()+top_hap_5[country_col].to_list()),'color'] = 'Other Countries'
fig = px.scatter(data_frame=world_happiness_index,x='GDP per capita',y='Happiness Score',color='color',
                 color_discrete_map={'Top 5 Countries ranked by happiness': 'green', 'Bottom 5 ranked by happiness': 'red','Other Countries':'blue'},
                 title="ScatterPlot: GDP per capita and Happiness Score",custom_data=[country_col], trendline = "ols")
# Note - install statsmodels for trendlines
# Hover text: x is GDP per capita, y is the happiness score
fig.update_traces(
    hovertemplate="<br>".join([
        "GDP per capita: %{x}",
        "Happiness Score: %{y}",
        "Country Name: %{customdata[0]}",
    ])
)
# Customize axis labels
fig.update_layout(
    xaxis_title="GDP per capita",
    yaxis_title="Happiness Score",
    title_x=0.5 # center title
)
fig.show()
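
The three `.loc` assignments that build the `color` column could also be written as one `numpy.select` call. The sketch below is only an illustration: the four-row frame reuses scores shown earlier, while the group size `n` and the short labels are assumptions chosen to keep the example small (the notebook uses groups of 5 and longer labels).

```python
import numpy as np
import pandas as pd

# Four rows taken from the happiness table shown earlier
scores = pd.DataFrame({
    "Country Name": ["Finland", "Norway", "Yemen", "Tanzania"],
    "Happiness Score": [7.632, 7.594, 3.355, 3.303],
})

n = 1  # group size for this toy example; the notebook uses 5

# rank 1 = happiest country; method="first" breaks ties by row order
ranked = scores["Happiness Score"].rank(ascending=False, method="first")

conditions = [ranked <= n, ranked > len(scores) - n]
labels = ["Top group", "Bottom group"]

# rows matching neither condition fall back to the default label
scores["color"] = np.select(conditions, labels, default="Other Countries")
print(scores)
```

Building the column once and reusing it avoids repeating the same assignments in every plotting cell.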

In [9]:
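# One OLS fit is stored per color group; row 0 is the first group plotted (here the top 5 countries)
results = px.get_trendline_results(fig)
results.iloc[0,1].summary()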


omni_normtest is not valid with less than 8 observations; 5 samples were given.



0,1,2,3
Dep. Variable:,y,R-squared:,0.048
Model:,OLS,Adj. R-squared:,-0.269
Method:,Least Squares,F-statistic:,0.1517
Date:,"Tue, 10 Dec 2024",Prob (F-statistic):,0.723
Time:,22:20:54,Log-Likelihood:,7.4462
No. Observations:,5,AIC:,-10.89
Df Residuals:,3,BIC:,-11.67
Df Model:,1,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,7.8598,0.789,9.957,0.002,5.348,10.372
x1,-0.2235,0.574,-0.390,0.723,-2.049,1.602

0,1,2,3
Omnibus:,,Durbin-Watson:,2.44
Prob(Omnibus):,,Jarque-Bera (JB):,0.627
Skew:,0.033,Prob(JB):,0.731
Kurtosis:,1.266,Cond. No.,52.7


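The scatterplot above supports the hypothesis that GDP per capita and happiness scores are positively correlated. The lowest-ranked countries have some of the lowest levels of GDP per capita, while the European countries that rank highest on happiness have some of the highest. Note that because the points are colored by group, Plotly fits a separate OLS trendline for each group, and the summary printed above covers only the first group (the top 5 countries, 5 observations). Its R-squared of 0.048 therefore describes the spread within that group, not the relationship across all countries. GDP per capita is a reasonable proxy for the resources at a government's disposal to cater to an average citizen.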
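
To gauge the relationship across groups rather than within one color group, a single fit over all rows can be computed directly. The sketch below is a rough illustration that uses only the nine rows visible in the truncated table above, so it covers just the extremes of the ranking and will overstate the strength of the relationship compared with the full dataset.

```python
import numpy as np
import pandas as pd

# Only the nine rows visible in the truncated output above, not the full dataset
sample = pd.DataFrame({
    "Country Name": ["Finland", "Norway", "Denmark", "Iceland", "Switzerland",
                     "Yemen", "Tanzania", "South Sudan", "Central African Republic"],
    "GDP per capita": [1.305, 1.456, 1.351, 1.343, 1.420, 0.442, 0.455, 0.337, 0.024],
    "Happiness Score": [7.632, 7.594, 7.555, 7.495, 7.487, 3.355, 3.303, 3.254, 3.083],
})

# Pearson correlation across all rows, ignoring the color groups
r = sample["GDP per capita"].corr(sample["Happiness Score"])

# Degree-1 least-squares fit; np.polyfit returns [slope, intercept]
slope, intercept = np.polyfit(sample["GDP per capita"], sample["Happiness Score"], deg=1)

print(f"Pearson r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Running the same two calls on the full `world_happiness_index` frame would give the overall estimate that the per-group summary does not provide.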

### 2. Healthy life expectancy and Happiness Score

Another important driver of a subjective well-being measure such as 'Happiness' is the quality of life that an average citizen of the country is likely to experience. One way of proxying for quality of life is the 'Healthy life expectancy' column, where higher values indicate better healthy life expectancy. The values are scaled rather than expressed in years and should be read as a relative index, not a literal probability: they are not strictly bounded by 1 (Hong Kong, for example, has a value of 1.03 in this data).

So let's look at the correlation between Healthy life expectancy and Happiness Score, with the hypothesis that the higher the Healthy life expectancy, the higher the Happiness Score.

In [10]:
country_col = 'Country Name'
world_happiness_index.loc[world_happiness_index[country_col].isin(top_hap_5[country_col].to_list()),'color'] = 'Top 5 Countries ranked by happiness'
world_happiness_index.loc[world_happiness_index[country_col].isin(bottom_hap_5[country_col].to_list()),'color'] = 'Bottom 5 ranked by happiness'
world_happiness_index.loc[~world_happiness_index[country_col].isin(bottom_hap_5[country_col].to_list()+top_hap_5[country_col].to_list()),'color'] = 'Other Countries'
fig = px.scatter(data_frame=world_happiness_index,x='Healthy life expectancy',y='Happiness Score',color='color',
                 color_discrete_map={'Top 5 Countries ranked by happiness': 'green', 'Bottom 5 ranked by happiness': 'red','Other Countries':'blue'},
                 title="ScatterPlot: Healthy life expectancy and Happiness Score",custom_data=[country_col])
# Hover text: x is healthy life expectancy, y is the happiness score
fig.update_traces(
    hovertemplate="<br>".join([
        "Healthy life expectancy: %{x}",
        "Happiness Score: %{y}",
        "Country Name: %{customdata[0]}",
    ])
)
# Customize axis labels
fig.update_layout(
    xaxis_title="Healthy life expectancy",
    yaxis_title="Happiness Score",
    title_x=0.5 # center title
)
fig.show()

The scatterplot above suggests a convex relationship between Healthy life expectancy and Happiness scores. Countries with a Healthy life expectancy of up to about 0.3 show a marginally decreasing happiness score (a sign of hopelessness, perhaps), but beyond that point happiness scores rise strongly with healthy life expectancy.
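
One way to check a visual impression of convexity is to fit a quadratic and inspect the sign of the squared term. The sketch below uses synthetic data (an assumption made purely for illustration, with a dip placed at 0.3) to show the mechanics. It is not a result from the happiness dataset.

```python
import numpy as np

# Synthetic data (illustrative assumption): the score dips slightly, then rises
x = np.linspace(0.0, 1.0, 11)
y = 3.0 + 4.0 * (x - 0.3) ** 2

# np.polyfit returns coefficients from highest degree to lowest: a*x**2 + b*x + c
a, b, c = np.polyfit(x, y, deg=2)

# a > 0 indicates a convex (U-shaped) fit; the minimum sits at -b / (2a)
turning_point = -b / (2 * a)
print(f"quadratic coefficient: {a:.3f}, turning point at x = {turning_point:.2f}")
```

Applied to the real columns, a positive squared-term coefficient with a turning point near the low end of the range would be consistent with the pattern described above.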

## Dataset 2 - World Bank Health Data

This dataset includes the following World Bank data from 1960 to 2019: Population Totals; Population Density; Population Growth; Female Population; Male Population; Education; Climate Change; Trade; Infrastructure; Poverty; Social Development; Environment; Agriculture & Rural; Economic Growth; Health; Private Sector; Public Sector; Financial Sector; Science & Technology; External Debt; Country Metadata; Metadata Indicators.

For the purpose of this project, only the sheet titled 'Health 1' has been retained using a spreadsheet program.


In [11]:
world_bank_health_df = pd.read_excel("WorldBank_Healthdata.xlsx")
world_bank_health_df

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,Aruba,ABW,Unmet need for contraception (% of married wom...,SP.UWT.TFRT,,,,,,,...,,,,,,,,,,
1,Aruba,ABW,Completeness of death registration with cause-...,SP.REG.DTHS.ZS,,,,,,,...,,,,,,,,,,
2,Aruba,ABW,Completeness of birth registration (%),SP.REG.BRTH.ZS,,,,,,,...,,,,,,,,,,
3,Aruba,ABW,"Completeness of birth registration, urban (%)",SP.REG.BRTH.UR.ZS,,,,,,,...,,,,,,,,,,
4,Aruba,ABW,"Completeness of birth registration, rural (%)",SP.REG.BRTH.RU.ZS,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65527,Kosovo,XKX,Unmet need for contraception (% of married wom...,SP.UWT.TFRT,,,,,,,...,,,,,,,,,,
65528,Kosovo,XKX,Completeness of death registration with cause-...,SP.REG.DTHS.ZS,,,,,,,...,,,,,,,,,,
65529,Kosovo,XKX,Completeness of birth registration (%),SP.REG.BRTH.ZS,,,,,,,...,,,,,,,,,,
65530,Kosovo,XKX,"Completeness of birth registration, urban (%)",SP.REG.BRTH.UR.ZS,,,,,,,...,,,,,,,,,,


### We only want to look at the data for year 2018

In [12]:
# filter to columns of interest
columns = ['Country Name','Indicator Name','2018']
world_bank_health_df = world_bank_health_df[columns]
world_bank_health_df

Unnamed: 0,Country Name,Indicator Name,2018
0,Aruba,Unmet need for contraception (% of married wom...,
1,Aruba,Completeness of death registration with cause-...,
2,Aruba,Completeness of birth registration (%),
3,Aruba,"Completeness of birth registration, urban (%)",
4,Aruba,"Completeness of birth registration, rural (%)",
...,...,...,...
65527,Kosovo,Unmet need for contraception (% of married wom...,
65528,Kosovo,Completeness of death registration with cause-...,
65529,Kosovo,Completeness of birth registration (%),
65530,Kosovo,"Completeness of birth registration, urban (%)",


Let's look at the list of health indicators available in the dataset

In [13]:
world_bank_health_df['Indicator Name'].unique()

array(['Unmet need for contraception (% of married women ages 15-49)',
       'Completeness of death registration with cause-of-death information (%)',
       'Completeness of birth registration (%)',
       'Completeness of birth registration, urban (%)',
       'Completeness of birth registration, rural (%)',
       'Completeness of birth registration, male (%)',
       'Completeness of birth registration, female (%)',
       'Population, male (% of total population)', 'Population, male',
       'Population, female (% of total population)', 'Population, female',
       'Population, total', 'Population growth (annual %)',
       'Age dependency ratio, young (% of working-age population)',
       'Age dependency ratio, old (% of working-age population)',
       'Age dependency ratio (% of working-age population)',
       'Sex ratio at birth (male births per female births)',
       'Population ages 80 and above, male (% of male population)',
       'Population ages 80 and above, female 

## What drives Healthy Life Expectancy ?

Given that healthy life expectancy is strongly positively correlated with the happiness score, we might want to explore a few indicators that could influence it, such as the proportion of GDP spent on current health expenditure in these countries and what this translates to in per capita terms at purchasing power parity. Another interesting explanatory variable is the prevalence of moderate to severe food insecurity, since food insecurity is a root cause of stunting, malnourishment, disease and other factors that drive not only healthy life expectancy but also the subjective well-being captured by the Happiness scores.

In [14]:
## Filter to keep indicators of interest
indicators = ['Population, total','Prevalence of moderate or severe food insecurity in the population (%)','Current health expenditure per capita, PPP (current international $)','Current health expenditure (% of GDP)']
world_bank_health_df = world_bank_health_df.loc[world_bank_health_df['Indicator Name'].isin(indicators)].sort_values(['Country Name','Indicator Name'])
world_bank_health_df

Unnamed: 0,Country Name,Indicator Name,2018
369,Afghanistan,Current health expenditure (% of GDP),9.395727e+00
367,Afghanistan,"Current health expenditure per capita, PPP (cu...",1.864073e+02
264,Afghanistan,"Population, total",3.717239e+07
346,Afghanistan,Prevalence of moderate or severe food insecuri...,6.080000e+01
875,Albania,Current health expenditure (% of GDP),5.262714e+00
...,...,...,...
49175,West Bank and Gaza,Prevalence of moderate or severe food insecuri...,
65137,World,Current health expenditure (% of GDP),9.848779e+00
65135,World,"Current health expenditure per capita, PPP (cu...",1.467186e+03
65032,World,"Population, total",7.591945e+09


In [15]:
# check for % of rows that have NaNs in the filtered dataset
world_bank_health_df.loc[world_bank_health_df['2018'].isna()].shape[0]*100/world_bank_health_df.shape[0]

20.55984555984556

We now want to reshape the dataframe so that we have 1 row per country and columns representing the indicators filtered.

In [16]:
# reshape the dataset to have the indicators as columns
world_bank_health_df = world_bank_health_df.set_index(["Country Name","Indicator Name"])["2018"].unstack("Indicator Name").reset_index().rename_axis(None, axis="columns")
world_bank_health_df

Unnamed: 0,Country Name,Current health expenditure (% of GDP),"Current health expenditure per capita, PPP (current international $)","Population, total",Prevalence of moderate or severe food insecurity in the population (%)
0,Afghanistan,9.395727,186.407288,3.717239e+07,60.800000
1,Albania,5.262714,697.304871,2.866376e+06,37.100000
2,Algeria,6.218427,962.719360,4.222843e+07,17.600000
3,American Samoa,,,5.546500e+04,
4,Andorra,6.710331,3607.000977,7.700600e+04,
...,...,...,...,...,...
254,"Venezuela, RB",3.562690,383.508545,2.887020e+07,
255,Vietnam,5.917897,440.166504,9.554040e+07,
256,Virgin Islands (U.S.),,,1.069770e+05,
257,West Bank and Gaza,,,4.569087e+06,


## Combine datasets

In [17]:
combined_indicator_df = pd.merge(world_bank_health_df,world_happiness_index,on=['Country Name'],how='right')
combined_indicator_df

Unnamed: 0,Country Name,Current health expenditure (% of GDP),"Current health expenditure per capita, PPP (current international $)","Population, total",Prevalence of moderate or severe food insecurity in the population (%),Overall rank,Happiness Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,color
0,Finland,9.037323,4457.170898,5515525.0,7.7,1,7.632,1.305,1.592,0.874,0.681,0.202,0.393,Top 5 Countries ranked by happiness
1,Norway,10.049393,6818.346191,5311916.0,4.9,2,7.594,1.456,1.582,0.861,0.686,0.286,0.340,Top 5 Countries ranked by happiness
2,Denmark,10.070716,5794.259277,5793636.0,5.2,3,7.555,1.351,1.590,0.868,0.683,0.284,0.408,Top 5 Countries ranked by happiness
3,Iceland,8.469309,5113.221680,352721.0,7.3,4,7.495,1.343,1.644,0.914,0.677,0.353,0.138,Top 5 Countries ranked by happiness
4,Switzerland,11.876209,8113.943359,8514329.0,2.7,5,7.487,1.420,1.549,0.927,0.660,0.256,0.357,Top 5 Countries ranked by happiness
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
151,Yemen,,,,,152,3.355,0.442,1.073,0.343,0.244,0.083,0.064,Bottom 5 ranked by happiness
152,Tanzania,3.628680,112.456367,56318348.0,,153,3.303,0.455,0.991,0.381,0.481,0.270,0.097,Bottom 5 ranked by happiness
153,South Sudan,6.400380,113.779228,10975920.0,84.9,154,3.254,0.337,0.608,0.177,0.112,0.224,0.106,Bottom 5 ranked by happiness
154,Central African Republic,10.992531,97.005814,4666377.0,,155,3.083,0.024,0.000,0.010,0.305,0.218,0.038,Bottom 5 ranked by happiness


In [18]:
combined_indicator_df.loc[combined_indicator_df['Current health expenditure per capita, PPP (current international $)'].isna()]

Unnamed: 0,Country Name,Current health expenditure (% of GDP),"Current health expenditure per capita, PPP (current international $)","Population, total",Prevalence of moderate or severe food insecurity in the population (%),Overall rank,Happiness Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,color
25,Taiwan,,,,,26,6.441,1.365,1.436,0.857,0.418,0.151,0.078,Other Countries
37,Trinidad & Tobago,,,,,38,6.192,1.223,1.492,0.564,0.575,0.171,0.019,Other Countries
38,Slovakia,,,,,39,6.173,1.21,1.537,0.776,0.354,0.118,0.014,Other Countries
56,South Korea,,,,,57,5.875,1.266,1.204,0.955,0.244,0.175,0.051,Other Countries
57,Northern Cyprus,,,,,58,5.835,1.229,1.211,0.909,0.495,0.179,0.154,Other Countries
58,Russia,,,,,59,5.81,1.151,1.479,0.599,0.399,0.065,0.025,Other Countries
65,Kosovo,,,,,66,5.662,0.855,1.23,0.578,0.448,0.274,0.023,Other Countries
69,Libya,,,6678567.0,35.9,70,5.566,0.985,1.35,0.553,0.496,0.116,0.148,Other Countries
75,Hong Kong,,,,,76,5.43,1.405,1.29,1.03,0.524,0.246,0.291,Other Countries
88,Macedonia,,,,,89,5.185,0.959,1.239,0.691,0.394,0.173,0.052,Other Countries


In [19]:
# check for NaNs - combined dataset
columns_ci = combined_indicator_df.columns
cols_null = []
for c in columns_ci:
    null_check = combined_indicator_df[c].isnull().values.any()
    print(f"Are there nulls for {c}? - {null_check}")
    if null_check:
        cols_null.append(c)
        # count missing values per country, then report the share of countries with exactly one missing value
        count_na_ci = combined_indicator_df.groupby(['Country Name']).agg({c:lambda x: x.isna().sum()}).reset_index()
        print(f"        % nulls for {c} - {count_na_ci.loc[count_na_ci[c] ==1].shape[0]*100/count_na_ci.shape[0]}")

Are there nulls for Country Name? - False
Are there nulls for Current health expenditure (% of GDP)? - True
        % nulls for Current health expenditure (% of GDP) - 16.025641025641026
Are there nulls for Current health expenditure per capita, PPP (current international $)? - True
        % nulls for Current health expenditure per capita, PPP (current international $) - 16.025641025641026
Are there nulls for Population, total? - True
        % nulls for Population, total - 14.743589743589743
Are there nulls for Prevalence of moderate or severe food insecurity in the population (%)? - True
        % nulls for Prevalence of moderate or severe food insecurity in the population (%) - 46.794871794871796
Are there nulls for Overall rank? - False
Are there nulls for Happiness Score? - False
Are there nulls for GDP per capita? - False
Are there nulls for Social support? - False
Are there nulls for Healthy life expectancy? - False
Are there nulls for Freedom to make life choices? - False
Are 

## Healthy Life Expectancy and Current Health Expenditure (% of GDP)

In the earlier sections, we saw that GDP per capita had a positive correlation with Happiness Score. One pathway of influence from income to happiness could be the country's expenditure on health as a % of its GDP, which indicates how high or low health ranks among its priorities. Conventionally, we would expect a strong positive correlation between a country's health expenditure and the healthy life expectancy of its citizens, as improvements in healthcare infrastructure, access and service delivery can be achieved through higher allocations to health.

In [20]:
country_col = 'Country Name'
fig = px.scatter(data_frame=combined_indicator_df,y='Healthy life expectancy',x='Current health expenditure (% of GDP)',color='color',
                 color_discrete_map={'Top 5 Countries ranked by happiness': 'green', 'Bottom 5 ranked by happiness': 'red','Other Countries':'blue'},
                 title="ScatterPlot: Healthy life expectancy and Current health expenditure (% of GDP)",custom_data=[country_col])
# Customize hover text
fig.update_traces(
    hovertemplate="<br>".join([
        "Healthy life expectancy: %{y}",
        "Current health expenditure (% of GDP): %{x}",
        "Country Name: %{customdata[0]}",
    ])
)
# Customize axis labels
fig.update_layout(
    xaxis_title="Current health expenditure (% of GDP)",
    yaxis_title="Healthy life expectancy",
    title_x=0.5 # center title
)
fig.show()

As expected, we do see a strong positive correlation between % of GDP spent on Health and Healthy life expectancy levels of countries.

## Healthy Life Expectancy and Current Health Expenditure per capita (PPP) 

This section asks whether the proportion of GDP spent on health actually translates into current health expenditure per capita, and whether that in turn influences healthy life expectancy in a similar way.

In [21]:
country_col = 'Country Name'
fig = px.scatter(data_frame=combined_indicator_df,y='Healthy life expectancy',x='Current health expenditure per capita, PPP (current international $)',color='color',
                 color_discrete_map={'Top 5 Countries ranked by happiness': 'green', 'Bottom 5 ranked by happiness': 'red','Other Countries':'blue'},
                 title="ScatterPlot: Healthy life expectancy and Current health expenditure per capita, PPP (current international $)",custom_data=[country_col])
# Customize hover text
fig.update_traces(
    hovertemplate="<br>".join([
        "Healthy life expectancy: %{y}",
        "Current health expenditure per capita, PPP (current international $): %{x}",
        "Country Name: %{customdata[0]}",
    ])
)
# Customize axis labels
fig.update_layout(
    xaxis_title="Current health expenditure per capita, PPP (current international $)",
    yaxis_title="Healthy life expectancy",
    title_x=0.5 # center title
)
fig.show()

We see that current health expenditure per capita has a concave relationship with healthy life expectancy. This suggests that factors beyond infrastructure, access and delivery of healthcare services may be influencing healthy life expectancy. These could include habits such as diet and exercise.

## Healthy Life Expectancy and Food insecurity

One way of testing the above suggestion is to observe the relationship between the prevalence of moderate or severe food insecurity and healthy life expectancy. While this may not proxy habits, it does proxy access to and affordability of a balanced diet.


In [22]:
country_col = 'Country Name'
fig = px.scatter(data_frame=combined_indicator_df,y='Healthy life expectancy',x='Prevalence of moderate or severe food insecurity in the population (%)',color='color',
                 color_discrete_map={'Top 5 Countries ranked by happiness': 'green', 'Bottom 5 ranked by happiness': 'red','Other Countries':'blue'},
                 title="ScatterPlot: Healthy life expectancy and Prevalence of moderate or severe food insecurity in the population (%)",custom_data=[country_col])
# Customize hover text
fig.update_traces(
    hovertemplate="<br>".join([
        "Healthy life expectancy: %{y}",
        "Prevalence of moderate or severe food insecurity in the population (%): %{x}",
        "Country Name: %{customdata[0]}",
    ])
)
# Customize axis labels
fig.update_layout(
    xaxis_title="Prevalence of moderate or severe food insecurity in the population (%)",
    yaxis_title="Healthy life expectancy",
    title_x=0.5 # center title
)
fig.show()

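As anticipated, we see a strong negative correlation between the prevalence of food insecurity and healthy life expectancy. This is consistent with our hypothesis that healthy life expectancy is driven by factors beyond healthcare expenditure. Note that this indicator is missing for about 47% of the countries in the combined dataset, so the pattern rests on roughly half of the sample.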
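
The visual readings in the last three sections can be cross-checked numerically. The sketch below computes a Pearson correlation between each indicator and healthy life expectancy, together with the number of complete pairs behind it. It uses only the nine rows visible in the combined table above, so the numbers it prints are illustrative and should not be read as results for the full dataset.

```python
import numpy as np
import pandas as pd

pct_col = "Current health expenditure (% of GDP)"
ppp_col = "Current health expenditure per capita, PPP (current international $)"
food_col = "Prevalence of moderate or severe food insecurity in the population (%)"
target = "Healthy life expectancy"

# Only the nine rows visible in the truncated combined table above
sample = pd.DataFrame({
    "Country Name": ["Finland", "Norway", "Denmark", "Iceland", "Switzerland",
                     "Yemen", "Tanzania", "South Sudan", "Central African Republic"],
    pct_col: [9.037323, 10.049393, 10.070716, 8.469309, 11.876209,
              np.nan, 3.628680, 6.400380, 10.992531],
    ppp_col: [4457.170898, 6818.346191, 5794.259277, 5113.221680, 8113.943359,
              np.nan, 112.456367, 113.779228, 97.005814],
    food_col: [7.7, 4.9, 5.2, 7.3, 2.7, np.nan, np.nan, 84.9, np.nan],
    target: [0.874, 0.861, 0.868, 0.914, 0.927, 0.343, 0.381, 0.177, 0.010],
})

summary = {}
for col in [pct_col, ppp_col, food_col]:
    # keep only the rows where both the indicator and the target are present
    pair = sample[[col, target]].dropna()
    summary[col] = {"n": len(pair), "pearson_r": pair[col].corr(pair[target])}

print(pd.DataFrame(summary).T)
```

Reporting `n` next to each coefficient makes it visible when a correlation rests on few observations, as is the case for the food insecurity indicator.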