# The correlation between IQ score and happiness
Name: Demi Kampherbeek
Subject: Programming 1 (final assignment)

This assignment investigates the correlation between the average IQ score and the happiness score per country. Both datasets come from Kaggle.com: one lists the average IQ score per country, the other the happiness score per country.

Happiness score dataframe link: https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021 

IQ score dataframe link: https://www.kaggle.com/code/idakarimi/average-iq-by-country/data?select=IQ.csv



## Import libraries that are being used

In [1]:
import pandas as pd
import numpy as np
import yaml
from bokeh.io import push_notebook, show, output_notebook
from bokeh.layouts import row 
from bokeh.plotting import figure
from bokeh.models import HoverTool

## Load both dataframes


In [2]:
with open('D:/Downloads/Programming1_final_assignment/Config_file_final_assignment.yml') as config:
    data = yaml.safe_load(config)
    
IQ_score = pd.read_csv(data['IQ_score']) 
Happiness_score = pd.read_csv(data['Happiness_score'])

## Get matching countries from both dataframes

The code below lists the country names that appear in the happiness score dataframe but not in the IQ score dataframe, and vice versa. After inspecting these lists, countries that are spelled differently in the two sources were renamed in the happiness score dataframe to match the IQ score dataframe. Finally, countries that appear in only one of the two dataframes were removed from both.

In [3]:
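# Country names that are present in one dataframe but missing from the other
IQ_countries = set(IQ_score['country'].unique())
Happiness_countries = set(Happiness_score['Country name'].unique())
Difference_happiness = [x for x in Happiness_score['Country name'].unique() if x not in IQ_countries]
Difference_IQ = [x for x in IQ_score['country'].unique() if x not in Happiness_countries]
print(Difference_happiness)
print(Difference_IQ)

# Rename differently written countries to match the IQ score dataframe.
# Note: 'Congo (Brazzaville)' is the Republic of the Congo, which is a different country
# than the DR Congo (Kinshasa); this mapping should be verified against the IQ data.
Happiness_score['Country name'] = Happiness_score['Country name'].replace({'Taiwan Province of China':'Taiwan','Hong Kong S.A.R. of China' :'Hong Kong', 'Congo (Brazzaville)' : 'DR Congo', 'Palestinian Territories' : 'Palestine'})

# Keep only the countries that occur in both dataframes
Happiness_score = Happiness_score[Happiness_score['Country name'].isin(IQ_score['country'])]
IQ_score = IQ_score[IQ_score['country'].isin(Happiness_score['Country name'])]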
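
The name comparison described in this section can be tried out on a small scale first. The following is a minimal sketch, assuming two made-up stand-in frames: `iq_demo`, `happy_demo` and the helper `name_mismatches` are hypothetical names and not part of the assignment data. It only illustrates how set differences expose names that are spelled differently in the two sources.

```python
import pandas as pd

# Stand-in frames for illustration only; 'Finland' and 'Nauru' are arbitrary filler rows.
iq_demo = pd.DataFrame({'country': ['Japan', 'Taiwan', 'Palestine', 'Nauru']})
happy_demo = pd.DataFrame({'Country name': ['Japan', 'Taiwan Province of China',
                                            'Palestinian Territories', 'Finland']})


def name_mismatches(left, right):
    """Return the names found only in `left` and the names found only in `right`, sorted."""
    left_names, right_names = set(left), set(right)
    return sorted(left_names - right_names), sorted(right_names - left_names)


only_happy, only_iq = name_mismatches(happy_demo['Country name'], iq_demo['country'])
print(only_happy)  # names to inspect in the happiness frame
print(only_iq)     # names to inspect in the IQ frame
```

Names that show up in both lists under different spellings (here Taiwan and Palestine) are candidates for renaming; the remaining names simply have no counterpart in the other source.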


## Merge both dataframes 
First, the 'country' column in the IQ score dataframe is renamed to 'Country name' so that both dataframes share the same key column. The dataframes are then merged, and only the country name, IQ score, and happiness score columns are kept. Finally, the columns are renamed without whitespace so that they are easy to reference in the interactive bokeh plot.

In [4]:
# Give both dataframes the same key column
IQ_score = IQ_score.rename(columns={'country':'Country name'})
# Inner merge on the shared key, then keep only the columns of interest
Happiness_and_IQ_score = pd.merge(IQ_score, Happiness_score, on='Country name')
Happiness_and_IQ_score = Happiness_and_IQ_score[['Country name', 'iq', 'Ladder score']]
# Whitespace-free column names are easier to reference in the bokeh hover tooltips
Happiness_and_IQ_score = Happiness_and_IQ_score.rename(columns={'Country name':'Country_name','iq':'IQ_score','Ladder score':'Happiness_score'})
Happiness_and_IQ_score.head()

|  | Country_name | IQ_score | Happiness_score |
| --- | --- | --- | --- |
| 0 | Japan | 106.48 | 5.94 |
| 1 | Taiwan | 106.47 | 6.584 |
| 2 | Singapore | 105.89 | 6.377 |
| 3 | Hong Kong | 105.37 | 5.477 |
| 4 | China | 104.1 | 5.339 |
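
Before relying on a merged table like the one shown here, it can help to check which rows fail to match. The sketch below is an illustration under stated assumptions, not the method used in this assignment: `iq_demo` and `happy_demo` are hypothetical frames, the Japan and Taiwan values are copied from the output in this section, and 'Examplestan' and 'Samplevia' are invented rows that exist only to create mismatches.

```python
import pandas as pd

# Japan and Taiwan use the values from the merged output; the other two rows are invented.
iq_demo = pd.DataFrame({'Country name': ['Japan', 'Taiwan', 'Examplestan'],
                        'iq': [106.48, 106.47, 90.0]})
happy_demo = pd.DataFrame({'Country name': ['Japan', 'Taiwan', 'Samplevia'],
                           'Ladder score': [5.94, 6.584, 5.0]})

# An outer merge keeps every row; indicator=True adds a '_merge' column that
# says whether a row came from 'left_only', 'right_only' or 'both'.
audit = pd.merge(iq_demo, happy_demo, on='Country name', how='outer', indicator=True)
unmatched = audit.loc[audit['_merge'] != 'both', ['Country name', '_merge']]
print(unmatched)

# The inner merge keeps only the countries found in both frames.
# validate='one_to_one' raises an error if a country name is duplicated on either side.
merged = pd.merge(iq_demo, happy_demo, on='Country name', how='inner', validate='one_to_one')
print(merged)
```

The inner merge silently drops unmatched countries, so the audit step makes explicit which rows would be lost.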


## Plotting graph
The data were plotted with bokeh as an interactive scatter plot: hovering over a point shows the corresponding country and its exact IQ and happiness scores. A least-squares regression line, fitted with np.polyfit, is drawn through the points.

In [5]:
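# Fit a straight line (degree 1); the coefficients are returned from highest to lowest degree
par = np.polyfit(Happiness_and_IQ_score['IQ_score'], Happiness_and_IQ_score['Happiness_score'], 1, full=True)
slope = par[0][0]
intercept = par[0][1]
# Predicted happiness score for every observed IQ score
y_predicted = [slope*i + intercept for i in Happiness_and_IQ_score['IQ_score']]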

tools = 'crosshair,box_zoom,undo,reset,wheel_zoom,tap'
# width/height replace plot_width/plot_height, which were removed in bokeh 3.0
p = figure(width=900, height=550, title='IQ score versus happiness score per country', tools=tools)
p.scatter(x='IQ_score',
          y='Happiness_score',
          source=Happiness_and_IQ_score,
          fill_alpha=0.5, 
          color='deeppink',
          size=8)

# legend_label replaces the legend keyword, which was deprecated in bokeh 1.4
p.line(Happiness_and_IQ_score['IQ_score'], y_predicted, color='black', legend_label=f'y={slope:.2f}x{intercept:+.2f}')
p.legend.location = 'top_left'

p.xaxis.axis_label = 'IQ score'
p.yaxis.axis_label = 'Happiness score'
hover = HoverTool(tooltips=[('Country','@Country_name'),
                           ('IQ_score','@IQ_score'),
                           ('Happiness_score','@Happiness_score')])
p.add_tools(hover)
show(p)
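
The regression step can be checked independently of the plot. The following is a minimal sketch, assuming made-up points that lie exactly on y = 2x + 1 (not the assignment data). It shows the order in which np.polyfit returns the coefficients and compares them with the closed-form least-squares formulas; `slope_manual` and `intercept_manual` are names introduced only for this illustration.

```python
import numpy as np

# Made-up points lying exactly on y = 2x + 1, so the expected fit is known in advance.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# np.polyfit returns coefficients from highest to lowest degree: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# The same line from the closed-form least-squares formulas:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
# intercept = mean_y - slope * mean_x
slope_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_manual = y.mean() - slope_manual * x.mean()

print(slope, intercept)
print(slope_manual, intercept_manual)
```

Both routes should agree up to floating-point rounding, which is a quick way to confirm that the first coefficient is the slope and the second the intercept.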



## Pearson’s correlation coefficient
A Pearson correlation coefficient was calculated to measure the strength of the linear relationship between IQ score and happiness score.


In [6]:
corr = Happiness_and_IQ_score['IQ_score'].corr(Happiness_and_IQ_score['Happiness_score'], method='pearson')
R_value = str(round(corr,3))
print('R:',R_value)

R: 0.582
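
To make clear what the coefficient measures, the sketch below computes Pearson's r by hand in plain Python. It is an illustration under stated assumptions: `pearson_r` is a hypothetical helper, and the input values are made up so that the result is easy to verify. Only the last step uses the value 0.582 reported in this section.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: co-variation of xs and ys divided by the product of their spreads.

    The 1/n factors of covariance and variance cancel, so plain sums are used.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values, chosen only so the result is easy to check by hand.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly linear, increasing
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])   # perfectly linear, decreasing
print(rising, falling)

# Squaring r gives the share of variance explained by a straight-line fit.
r_squared = round(0.582 ** 2, 3)
print(r_squared)
```

For the reported value, r = 0.582 gives r² ≈ 0.339, so a straight line in IQ score accounts for roughly a third of the variance in happiness score across these countries.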


## Conclusion
The interactive bokeh plot shows an ascending regression line, and the Pearson correlation coefficient is 0.582. Together these indicate a moderate positive correlation between IQ score and happiness score per country: the two scores are associated with each other to some degree. Because many factors contribute to both the happiness score and the IQ score, this correlation does not show that a higher IQ score by itself causes more happiness. A follow-up study could examine other factors that may contribute to the happiness score, such as local weather, work quality, and GDP.