<H1>Influence of Income and Education on Happiness: A Two-Way ANOVA Approach</H1>

The Country-Income-and-Education-Level dataset classifies countries by income level and education level.

This project is a small study of how these two variables influence happiness, measured by the Happiness score from the 2022 World Happiness Report.


<H2>We will perform:<p><p>

1-Data import, cleaning, and merging

2-A two-way ANOVA test of the influence of income and education, with a pivot table

3-An interactive map showing happiness across countries</H2>



## 1-Library and data import, cleaning, and merging

In [31]:

import numpy as np
import pandas as pd
import os

# List every input file available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))


/kaggle/input/countries-education-level-and-income/Country-Income-and-Education-Level.csv
/kaggle/input/world-happiness-report/2015.csv
/kaggle/input/world-happiness-report/2021.csv
/kaggle/input/world-happiness-report/2017.csv
/kaggle/input/world-happiness-report/2019.csv
/kaggle/input/world-happiness-report/2020.csv
/kaggle/input/world-happiness-report/2018.csv
/kaggle/input/world-happiness-report/2022.csv
/kaggle/input/world-happiness-report/2016.csv


In [32]:
X = pd.read_csv('/kaggle/input/countries-education-level-and-income/Country-Income-and-Education-Level.csv')
X

| | Country | Education Index | Education Level | Income |
|---|---|---|---|---|
| 0 | Argentina | 0.816 | Very High Education Level | High income |
| 1 | Australia | 0.929 | Very High Education Level | High income |
| 2 | Austria | 0.852 | Very High Education Level | High income |
| 3 | Bahamas | 0.726 | High to Moderate Education Level | High income |
| 4 | Bahrain | 0.758 | High to Moderate Education Level | High income |
| ... | ... | ... | ... | ... |
| 175 | Syrian Arab Republic | 0.416 | Low to Moderate Education Level | Low income |
| 176 | Tajikistan | 0.682 | Low to Moderate Education Level | Lower middle income |
| 177 | Togo | 0.517 | Low to Moderate Education Level | Low income |
| 178 | Uganda | 0.523 | Low to Moderate Education Level | Low income |


In [33]:
Y = pd.read_csv('/kaggle/input/world-happiness-report/2022.csv')
Y.head()

| | RANK | Country | Happiness score | Whisker-high | Whisker-low | Dystopia (1.83) + residual | Explained by: GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Finland | 7821 | 7886 | 7756 | 2518 | 1892 | 1258 | 775 | 736 | 109 | 534 |
| 1 | 2 | Denmark | 7636 | 7710 | 7563 | 2226 | 1953 | 1243 | 777 | 719 | 188 | 532 |
| 2 | 3 | Iceland | 7557 | 7651 | 7464 | 2320 | 1936 | 1320 | 803 | 718 | 270 | 191 |
| 3 | 4 | Switzerland | 7512 | 7586 | 7437 | 2153 | 2026 | 1226 | 822 | 677 | 147 | 461 |
| 4 | 5 | Netherlands | 7415 | 7471 | 7359 | 2137 | 1945 | 1206 | 787 | 651 | 271 | 419 |


In [47]:
# Left merge: keep every country in X; countries absent from the 2022 report get NaN
merged_df = X.merge(Y, on='Country', how='left')
merged_df.head()

| | Country | Education Index | Education Level | Income | RANK | Happiness score | Whisker-high | Whisker-low | Dystopia (1.83) + residual | Explained by: GDP per capita | Explained by: Social support | Explained by: Healthy life expectancy | Explained by: Freedom to make life choices | Explained by: Generosity | Explained by: Perceptions of corruption |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Argentina | 0.816 | Very High Education Level | High income | 57.0 | 5967.0 | 6090.0 | 5844.0 | 1891.0 | 1592.0 | 1102.0 | 662.0 | 555.0 | 81.0 | 85.0 |
| 1 | Australia | 0.929 | Very High Education Level | High income | 12.0 | 7162.0 | 7244.0 | 7081.0 | 2011.0 | 1900.0 | 1203.0 | 772.0 | 676.0 | 258.0 | 341.0 |
| 2 | Austria | 0.852 | Very High Education Level | High income | 11.0 | 7163.0 | 7237.0 | 7089.0 | 2148.0 | 1931.0 | 1165.0 | 774.0 | 623.0 | 193.0 | 329.0 |
| 3 | Bahamas | 0.726 | High to Moderate Education Level | High income | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | Bahrain | 0.758 | High to Moderate Education Level | High income | 21.0 | 6647.0 | 6779.0 | 6514.0 | 2092.0 | 1854.0 | 1029.0 | 625.0 | 693.0 | 199.0 | 155.0 |
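The left merge keeps every country from the education/income table and fills the happiness columns with NaN when a country has no match in the 2022 report, as for Bahamas above. A minimal sketch, using small hypothetical frames rather than the real files, of how the `indicator=True` option of `DataFrame.merge` can list those unmatched countries before they are dropped:

```python
import pandas as pd

# Hypothetical stand-ins for the education/income table (X) and the 2022 report (Y)
X = pd.DataFrame({
    "Country": ["Argentina", "Bahamas", "Bahrain"],
    "Income": ["High income", "High income", "High income"],
})
Y = pd.DataFrame({
    "Country": ["Argentina", "Bahrain"],
    "Happiness score": ["5,967", "6,647"],
})

# indicator=True adds a '_merge' column with 'both', 'left_only' or 'right_only'
merged = X.merge(Y, on="Country", how="left", indicator=True)

# Countries in X with no happiness score; these are the rows dropna() removes
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()
print(unmatched)  # ['Bahamas']
```

Applied to the real `X` and `Y`, the same filter would show which countries are lost in the merge, for instance because the two files possibly spell a country name differently.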


In [81]:
Z = merged_df.copy()


In [82]:
# Drop rows with any missing value, such as countries with no 2022 happiness score (e.g. Bahamas)
Z.dropna(inplace=True)


In [83]:
# Underscored names can be used directly in the statsmodels formula later on
new_column_names = {'Happiness score': 'Happiness_score', 'Education Level': 'Education_Level'}
Z.rename(columns=new_column_names, inplace=True)
columns_to_keep = ['Happiness_score', 'Income', 'Education_Level', 'Country']

# Drop all columns except the ones to keep
columns_to_drop = [col for col in Z.columns if col not in columns_to_keep]
Z.drop(columns=columns_to_drop, inplace=True)
# The 2022 file stores scores with a decimal comma (e.g. '7,821');
# stripping the comma gives 7821, which is rescaled to 7.821 later
Z['Happiness_score'] = Z['Happiness_score'].str.replace(',', '').astype(float)




In [89]:
# Restore the decimal point removed by stripping the comma: 7821 -> 7.821
Z['Happiness_score'] = Z['Happiness_score'] / 1000
Z.head()

| | Country | Education_Level | Income | Happiness_score |
|---|---|---|---|---|
| 0 | Argentina | Very High Education Level | High income | 5.967 |
| 1 | Australia | Very High Education Level | High income | 7.162 |
| 2 | Austria | Very High Education Level | High income | 7.163 |
| 4 | Bahrain | High to Moderate Education Level | High income | 6.647 |
| 6 | Belgium | Very High Education Level | High income | 6.805 |
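Stripping the comma and dividing by 1000 works as long as every score has exactly three decimals, as the scores shown here do. A sketch of an alternative that treats the comma as the decimal separator directly, so the result does not depend on the number of decimals (the sample strings are hypothetical):

```python
import pandas as pd

# Scores stored as text with a comma as the decimal separator
raw = pd.Series(["7,821", "5,967", "6,647"])

# Replace the decimal comma with a point, then convert to float
scores = raw.str.replace(",", ".", regex=False).astype(float)
print(scores.tolist())  # [7.821, 5.967, 6.647]
```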


## 2-Two-way ANOVA test for influence, with a pivot table

In [90]:
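# Mean happiness score for each Income x Education_Level combination
pivot_table = Z.pivot_table(index='Income', columns='Education_Level', values='Happiness_score', aggfunc='mean')

# 0.0 marks combinations with no countries, not a happiness score of zero
pivot_table.fillna(0)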

| Income | High to Moderate Education Level | Low to Moderate Education Level | Very High Education Level | Very Low Education Level |
|---|---|---|---|---|
| High income | 6.392125 | 0.0 | 6.787303 | 0.0 |
| Low income | 0.0 | 4.063286 | 0.0 | 4.729167 |
| Lower middle income | 5.227333 | 4.827706 | 0.0 | 0.0 |
| Upper middle income | 5.518542 | 4.7 | 5.53125 | 0.0 |
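The pivot table reports cell means but not how many countries each mean is based on, and several income/education combinations are empty. A sketch, on a hypothetical handful of rows shaped like `Z`, of counting countries per cell with `pd.crosstab`, which makes the empty and sparse cells explicit:

```python
import pandas as pd

# Hypothetical rows shaped like Z after cleaning
Z = pd.DataFrame({
    "Country": ["Argentina", "Australia", "Bahrain", "Togo", "Uganda"],
    "Income": ["High income", "High income", "High income", "Low income", "Low income"],
    "Education_Level": [
        "Very High Education Level",
        "Very High Education Level",
        "High to Moderate Education Level",
        "Low to Moderate Education Level",
        "Low to Moderate Education Level",
    ],
})

# Number of countries in each Income x Education_Level cell (0 = empty cell)
counts = pd.crosstab(Z["Income"], Z["Education_Level"])
print(counts)
```

On the real `Z`, the same call would show how much data supports each mean in the pivot table.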


In [86]:
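from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Main effects of Income and Education_Level plus their interaction
formula = 'Happiness_score ~ Income + Education_Level + Income:Education_Level'
model = ols(formula, Z).fit()
# The keyword is `typ`, not `type` (an unknown keyword is ignored), so the table
# below uses Type I (sequential) sums of squares; pass typ=2 for Type II
anova_table = anova_lm(model, typ=1)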
anova_table


| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| Income | 3.0 | 80432710.0 | 26810900.0 | 58.271881 | 5.993446e-22 |
| Education_Level | 3.0 | 4099819.0 | 1366606.0 | 2.970237 | 0.03541093 |
| Income:Education_Level | 9.0 | 2741037.0 | 304559.6 | 0.661942 | 0.7412922 |
| Residual | 101.0 | 46470120.0 | 460100.2 | NaN | NaN |
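Each row of this table follows the standard ANOVA arithmetic: the mean square is the sum of squares divided by its degrees of freedom, and F is that mean square divided by the residual mean square. Checking the Income row against the values above:

```latex
\begin{aligned}
MS_{\text{Income}} &= \frac{SS_{\text{Income}}}{df_{\text{Income}}} = \frac{80432710}{3} \approx 26810900 \\
MS_{\text{Residual}} &= \frac{SS_{\text{Residual}}}{df_{\text{Residual}}} = \frac{46470120}{101} \approx 460100.2 \\
F_{\text{Income}} &= \frac{MS_{\text{Income}}}{MS_{\text{Residual}}} = \frac{26810900}{460100.2} \approx 58.27
\end{aligned}
```

The value in `PR(>F)` is the upper-tail probability of an F distribution with (3, 101) degrees of freedom beyond this F statistic; the other rows follow the same pattern with their own degrees of freedom.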


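#### At the 5% level of significance (alpha = 0.05), Income (p ≈ 6.0e-22) and Education_Level (p ≈ 0.035) both have a statistically significant effect on happiness (p-value < alpha), while the Income:Education_Level interaction is not significant (p ≈ 0.74), so the data show no evidence of an interaction effect.

The sums of squares and mean squares are on the unscaled score (e.g. 7821 rather than 7.821) because this cell ran before the score was divided by 1000; rescaling the response does not change the F statistics or the p-values.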
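Because `anova_lm` defaults to Type I when the `typ` keyword is not given, the table above uses sequential sums of squares, so Education_Level is tested after Income has been fitted. A sketch of requesting Type II sums of squares with `anova_lm(model, typ=2)`; the data are synthetic (random scores for hypothetical group labels with made-up effect sizes), not the happiness data, so only the shape of the output is meaningful:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Synthetic, balanced data: 2 income groups x 2 education groups x 10 rows each
rng = np.random.default_rng(0)
income = np.repeat(["High", "Low"], 20)
education = np.tile(np.repeat(["Very High", "Low"], 10), 2)
score = (
    5.0
    + np.where(income == "High", 1.0, 0.0)           # hypothetical income effect
    + np.where(education == "Very High", 0.5, 0.0)   # hypothetical education effect
    + rng.normal(0.0, 0.5, size=40)                  # random noise
)
df = pd.DataFrame({"Income": income, "Education_Level": education, "Happiness_score": score})

# `A * B` expands to A + B + A:B, the same terms as the notebook's formula
model = ols("Happiness_score ~ Income * Education_Level", data=df).fit()

# typ=2: each main effect is tested adjusted for the other main effect
anova_type2 = anova_lm(model, typ=2)
print(anova_type2)
```

With a balanced design like this one, Type I and Type II give the same sums of squares; they can differ for data such as the country table, where some Income/Education_Level cells are empty and the design is therefore not balanced.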



## 3-Interactive map showing happiness across countries

In [88]:
import plotly.express as px



df = Z.copy()

# Create a choropleth map
fig = px.choropleth(
    df,
    locations='Country',
    locationmode='country names',
    color='Happiness_score',
    hover_name='Country',
    color_continuous_scale='Viridis',  # Choose a color scale
    projection='natural earth'
)

# Show the plot
fig.show()


#### This concludes the project.
