In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
import gmaps
import requests
import json
from config import gkey

In [None]:
happiness_2021 = pd.read_csv('cleaned_happiness_2021.csv')

# Data Overview

## World Happiness Report

The World Happiness Report is put together annually by the Sustainable Development Solutions Network, a non-profit created by the United Nations. It consists of a survey in which participants are asked to rank themselves on the Cantril Ladder Scale as well as on other variables about their lives. This gives a glimpse into how satisfied different nations are with their well-being and quality of life, as well as the factors that likely affect that satisfaction.

### The Cantril Ladder Scale

The scale, developed by pioneering social researcher Dr. Hadley Cantril, consists of the following question:

>Please imagine a ladder with steps numbered from zero at the bottom to 10 at the top.  
The top of the ladder represents the best possible life for you and the bottom of the ladder represents the worst possible life for you.  
On which step of the ladder would you say you personally feel you stand at this time?

from: [gallup.com](https://news.gallup.com/poll/122453/understanding-gallup-uses-cantril-scale.aspx)

## Hypothesis

We set out to test whether Gross Domestic Product (GDP) per capita, social support, healthy life expectancy, or freedom to make life choices affects a country's happiness score.

## Summary of Analysis

We found that all four variables inspected have statistically significant effects on happiness score, all in a positive direction.
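
As a rough illustration of how such a check could be run for each variable, the sketch below computes Pearson's r and its p-value for every predictor against the happiness score. The helper name `correlation_table`, the 0.05 threshold, and the synthetic data are assumptions made for illustration; they are not the survey data or necessarily the exact method used in this notebook.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

def correlation_table(df, target, predictors, alpha=0.05):
    """Hypothetical helper: Pearson's r and p-value for each predictor vs. target."""
    rows = []
    for col in predictors:
        r, p = st.pearsonr(df[col], df[target])
        rows.append({"variable": col, "r": r, "p_value": p, "significant": p < alpha})
    return pd.DataFrame(rows).set_index("variable")

# Synthetic stand-in data (NOT the real survey results)
rng = np.random.default_rng(0)
n = 100
support = rng.uniform(0.4, 1.0, n)
demo = pd.DataFrame({
    "Social support": support,
    "Happiness score": 2.0 + 4.0 * support + rng.normal(0, 0.3, n),
})

result = correlation_table(demo, "Happiness score", ["Social support"])
print(result)
```

With the real dataset, the same helper could be called with `happiness_2021` and the four column names used in the correlation matrix.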

In [None]:
corr = happiness_2021[["Happiness score","GDP per capita","Social support",'healthy life expectancy',"Freedom to make life choices"]].corr()
corr = corr.style.background_gradient(cmap='Purples')
corr

## Freedom

## GDP

## Life Expectancy

## Social Support

Per the World Happiness Report: 
>Social support (or having someone to count on in times of trouble) is the national average of the binary responses (either 0 or 1) to the GWP question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”

### Data Range
The lowest social support value is 0.44 (Benin), meaning 44% of respondents answered yes to the above question. The next lowest value is 0.50 (Morocco), meaning that in all but one of the study countries, at least 50% of respondents felt they had someone to count on in times of trouble. The mean is 0.81 and the highest value is 0.98 (Iceland).
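
The extremes quoted above can be looked up by country rather than read off a summary table. The sketch below shows one way to do that, using only the three countries and values mentioned in the text as an illustrative subset; the `sample` frame is a stand-in, not the full dataset.

```python
import pandas as pd

# Illustrative subset only: the three countries and values quoted in the text
sample = pd.DataFrame({
    "Country": ["Benin", "Morocco", "Iceland"],
    "Social support": [0.44, 0.50, 0.98],
})

# Two lowest values, the single highest row, and a count of countries below 0.50
lowest_two = sample.nsmallest(2, "Social support")
highest = sample.loc[sample["Social support"].idxmax()]
below_half = int((sample["Social support"] < 0.5).sum())

print(lowest_two)
print(highest["Country"], highest["Social support"])
print(f"Countries below 0.50: {below_half}")
```

Swapping `sample` for `happiness_2021` should give the same lookups on the full data, assuming it has the `Country` and `Social support` columns used elsewhere in this notebook.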

In [None]:
happiness_2021['Social support'].describe()

In [None]:
plt.hist(happiness_2021['Social support'])
plt.xlabel("Social Support")
plt.title("Distribution of social support scores")
plt.show()

### Hypothesis Testing and Regression
Linear regression shows a strong positive correlation between social support and happiness (Pearson's r of 0.81). The relationship is statistically significant: the p-value from the null hypothesis test is well below 0.05.
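
One way to set up a null hypothesis test like this is to split the countries into thirds by social support and compare their happiness scores with a Kruskal-Wallis H-test. The sketch below does the split with `pd.qcut` so that no row positions are hard-coded. The synthetic data, the tier labels, and the 0.05 threshold are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import scipy.stats as st

# Synthetic stand-in data (NOT the real survey results)
rng = np.random.default_rng(1)
n = 110
support = rng.uniform(0.4, 1.0, n)
demo = pd.DataFrame({
    "Social support": support,
    "Happiness score": 2.0 + 4.0 * support + rng.normal(0, 0.3, n),
})

# Rank first so tied values cannot produce duplicate bin edges in qcut
ranks = demo["Social support"].rank(method="first")
demo["tier"] = pd.qcut(ranks, 3, labels=["bottom", "middle", "top"])

# One array of happiness scores per tier
groups = [g["Happiness score"].to_numpy() for _, g in demo.groupby("tier", observed=True)]

# Null hypothesis: all three tiers share the same median happiness score
stat, p = st.kruskal(*groups)
verdict = "reject" if p < 0.05 else "fail to reject"
print(f"H = {stat:.2f}, p = {p:.3g}: {verdict} the null hypothesis")
```

Because the tiers come from quantiles of the ranks, the group sizes adjust automatically if rows are added to or removed from the dataset.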

In [None]:
x_values = happiness_2021['Social support']
y_values = happiness_2021['Happiness score']
(slope, intercept, rvalue, pvalue, stderr) = st.linregress(x_values, y_values)
regress_values = x_values * slope + intercept
line_eq = "y = " + str(round(slope,2)) + "x + " + str(round(intercept,2))
plt.scatter(x_values,y_values)
plt.plot(x_values,regress_values,"r-")
plt.annotate(line_eq,(0.7,2.4),fontsize=15,color="red")
plt.xlabel("Social Support")
plt.ylabel("Happiness Score")
plt.title("Social Support\nvs\nTotal Happiness Score")
plt.show()
pr = round(st.pearsonr(x_values,y_values)[0],2)
print(f'The correlation between social support and happiness is {pr}, suggesting a strong link between the two factors.')

In [None]:
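# Sort by social support, then split the countries into three groups
# (the slice positions below assume 110 rows: 37, 37, and 36 countries)
h0_test = happiness_2021[["Country","Happiness score","Social support"]]
h0_test = h0_test.sort_values("Social support")

# Column 1 of h0_test is "Happiness score"
bottom_third = h0_test.iloc[0:37,1]
middle_third = h0_test.iloc[37:74,1]
top_third = h0_test.iloc[74:110,1]

# Kruskal-Wallis H-test; null hypothesis: all three groups share the same median happiness score
stat,p = st.kruskal(top_third,middle_third,bottom_third)
verdict = "rejects" if p < 0.05 else "fails to reject"
print(f'The p-value is {p}, which {verdict} the null hypothesis.')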

### Geographic Distribution of Social Support
Countries with the highest social support values are concentrated in Europe (particularly Scandinavia and Eastern Europe) as well as Central Asia. The countries with the lowest social support values are concentrated in Northern and Sub-Saharan Africa, the Middle East, and South and Southeast Asia. In the Americas, most countries fall into the middle tier of social support, with 6 landing in the top third. Mexico is the notable exception as the one country in the Americas that falls in the bottom third of nations.

In [None]:
gmaps.configure(api_key = gkey)
# Split the countries into thirds by social support (the figure itself is created further down)

social_sorted = happiness_2021.sort_values('Social support')
top_locations = social_sorted.iloc[74:110,[6,7]]
top_social = social_sorted.iloc[74:110,3]
middle_locations = social_sorted.iloc[37:74,[6,7]]
middle_social = social_sorted.iloc[37:74,3]
bottom_locations = social_sorted.iloc[0:37,[6,7]]
bottom_social = social_sorted.iloc[0:37,3]
                             
fig = gmaps.figure()

symbols_top = gmaps.symbol_layer(top_locations, fill_color='#028833', stroke_color='#028833')
fig.add_layer(symbols_top)

symbols_middle = gmaps.symbol_layer(middle_locations, fill_color='blue', stroke_color='blue')
fig.add_layer(symbols_middle)

symbols_bottom = gmaps.symbol_layer(bottom_locations, fill_color='#E65300', stroke_color='#E65300')
fig.add_layer(symbols_bottom)

fig

In the above map, green dots represent the countries in the top third of social support values. Blue represents the middle third and orange represents the bottom third.

### We all need somebody to lean on
The results of comparing levels of social support to happiness in the study countries strongly suggest that social support positively influences happiness. Humans are social creatures, so it makes sense that having connections to others would increase happiness.

# Conclusion
include what we might do differently next time