# Happiness, wellbeing and GDP

––––––

This is for a Python workshop with economics students to explore the relationship between happiness and income, for both countries and individuals. The project can look at whether the current debate about adopting a broader measure of economic wellbeing is useful for policy making.


__Topics covered__ 
*	Income and life satisfaction
*	Coronavirus and economic wellbeing

# Motivation 

In 2019 the New Zealand government published its new 'Wellbeing Budget'. This was an attempt to include a broad set of indicators, such as mental health, child poverty and domestic violence, and to take a forward-looking approach to the environment, the strength of communities and the performance of the economy. Read the full article [here](https://theconversation.com/new-zealands-well-being-budget-how-it-hopes-to-improve-peoples-lives-118052).

<img src="NZ budget.png" alt="NZ budget.png" width="400"/>


#### The Stiglitz Commission 
The 2009 Stiglitz Commission stated that “Research has shown that it is possible to collect meaningful and reliable data on subjective as well as objective well-being. Subjective well-being encompasses different aspects (cognitive evaluations of one’s life, happiness, satisfaction, positive emotions such as joy and pride, and negative emotions such as pain and worry): each of them should be measured separately to derive a more comprehensive appreciation of people’s lives... [SWB] should be included in larger-scale surveys undertaken by official statistical offices”.

#### Subjective Wellbeing 
Subjective Wellbeing (SWB) is measured by simply asking people about their happiness. SWB is beginning to be used to monitor progress and to inform policy, for example on depression rates and the provision of cognitive behavioural therapy. Policy appraisal using SWB has interested academics for decades and is increasingly of interest to policymakers.



# Data sources 
* Happiness report data for 2008 - 2019, available [here](https://www.dropbox.com/s/cwe1g35er387955/happiness_with_continent%282019%29.csv?dl=0)
* Happiness report data for 2020, available [here](https://www.dropbox.com/s/4s5kg4qse26y8s8/2020.csv?dl=0)
* ONS, Total population estimates on personal and economic well-being across time [Link](https://www.ons.gov.uk/peoplepopulationandcommunity/wellbeing/datasets/totalpopulationestimatesonpersonalandeconomicwellbeingacrosstime)


To understand the data, you need to read the World Happiness Report section on data construction and statistics. 
This will help you to understand how the variables were measured and constructed. 

## Related Literature 

Bellet, C. et al. (2019) Does Employee Happiness Have an Impact on Productivity? Available [here](http://cep.lse.ac.uk/pubs/download/dp1655.pdf).

Frijters, P. and Layard, R. (2018) Direct wellbeing measurement and policy appraisal: a discussion paper. Available [here](http://cep.lse.ac.uk/textonly/_new/staff/layard/pdf/0461_DirectWellbeingMeasurement.pdf).

Gallup (2019) World Happiness Report 2019. Available [here](https://worldhappiness.report/).

Layard, R. (2016) Wellbeing measurement and cost-effectiveness analysis. Available [here](http://cep.lse.ac.uk/textonly/_new/staff/layard/pdf/0381-06-07-16.pdf).

__Covid-19 impact__

Brodeur, A. et al. (2020) Covid-19, lockdowns and well-being: evidence from Google trends. 

Clark et al. (2020) When to release the lockdown: A wellbeing framework for analysing costs and benefits. 

# 1. Import packages 
You need to import the following packages: 
* pandas
* numpy
* matplotlib 
* seaborn (for nicer charts) 
* statsmodels (for regression analysis) 

In [6]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [7]:
df1 = pd.read_csv("happiness_with_continent(2019).csv")

df1.head(n=2) #check what the data looks like

Unnamed: 0,Country name,Year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,...,"GINI index (World Bank estimate), average 2000-16","gini of household income reported in Gallup, by wp5-year","Most people can be trusted, Gallup","Most people can be trusted, WVS round 1981-1984","Most people can be trusted, WVS round 1989-1993","Most people can be trusted, WVS round 1994-1998","Most people can be trusted, WVS round 1999-2004","Most people can be trusted, WVS round 2005-2009","Most people can be trusted, WVS round 2010-2014",Continent
0,Afghanistan,2008,3.72359,7.16869,0.450662,50.799999,0.718114,0.177889,0.881686,0.517637,...,,,,,,,,,,Asia
1,Afghanistan,2009,4.401778,7.33379,0.552308,51.200001,0.678896,0.200178,0.850035,0.583926,...,,0.441906,0.286315,,,,,,,Asia


Students will also need to import a second file with the 2020 data, and then merge the two datasets together. 

In [8]:
df2 = pd.read_csv("https://www.dropbox.com/s/4s5kg4qse26y8s8/2020.csv?dl=1")

df2.head(n=2)

Unnamed: 0,Country name,Regional indicator,Ladder score,Standard error of ladder score,upperwhisker,lowerwhisker,Logged GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption,Ladder score in Dystopia,Explained by: Log GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption,Dystopia + residual
0,Finland,Western Europe,7.8087,0.031156,7.869766,7.747634,10.639267,0.95433,71.900825,0.949172,-0.059482,0.195445,1.972317,1.28519,1.499526,0.961271,0.662317,0.15967,0.477857,2.762835
1,Denmark,Western Europe,7.6456,0.033492,7.711245,7.579955,10.774001,0.955991,72.402504,0.951444,0.066202,0.168489,1.972317,1.326949,1.503449,0.979333,0.66504,0.242793,0.49526,2.432741


# 2. Inspect the data 
Before doing any analysis, you need to get to know the data you're using. Can you find some interesting statistics here to feed back during your presentations?

* Count the number of unique countries 
* How many years the dataset includes 
* Check for null values
* Happiest country, in the most recent year 
* Unhappiest country, in the most recent year  
* Most equal country 
* Most unequal country 
* Average life ladder score, globally 
* Average life ladder score, by continent 
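
As a starting point, several of the checks in the list above can be sketched with a few pandas calls. This is only a minimal sketch: the small dataframe below is an invented stand-in for `df1` (the country names and scores are made up), and it assumes the column names `Country name`, `Year`, `Life Ladder` and `Continent` shown in the `df1.head()` output.

```python
import pandas as pd

# Invented toy data standing in for df1; the values are NOT real survey results.
df = pd.DataFrame({
    "Country name": ["Country A", "Country A", "Country B", "Country B", "Country C"],
    "Year": [2017, 2018, 2017, 2018, 2018],
    "Life Ladder": [7.8, 7.9, 4.4, 4.0, 6.4],
    "Continent": ["Europe", "Europe", "Africa", "Africa", "Asia"],
})

n_countries = df["Country name"].nunique()   # number of unique countries
n_years = df["Year"].nunique()               # number of distinct years
null_counts = df.isnull().sum()              # missing values per column

# Keep only the most recent year before ranking countries.
latest = df[df["Year"] == df["Year"].max()]
happiest = latest.loc[latest["Life Ladder"].idxmax(), "Country name"]
unhappiest = latest.loc[latest["Life Ladder"].idxmin(), "Country name"]

# Global average and average by continent.
global_mean = df["Life Ladder"].mean()
mean_by_continent = df.groupby("Continent")["Life Ladder"].mean()

print(n_countries, n_years, happiest, unhappiest)
print(mean_by_continent)
```

To run the same checks on the workshop data, replace the toy `df` with `df1`. The inequality questions could be approached the same way using one of the GINI columns, although those columns contain many missing values.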


In [10]:
df1.shape

(1704, 27)

In [11]:
df2.shape

(153, 20)

Clearly the two datasets don't have the same dimensions, so students may have trouble here. Likewise the 'continent' definitions are different, so how would the students overcome this? They should be able to create a 

In [9]:
df1.nunique()

Country name                                                 165
Year                                                          14
Life Ladder                                                 1704
Log GDP per capita                                          1676
Social support                                              1691
Healthy life expectancy at birth                             825
Freedom to make life choices                                1674
Generosity                                                  1622
Perceptions of corruption                                   1608
Positive affect                                             1685
Negative affect                                             1691
Confidence in national government                           1530
Democratic Quality                                          1558
Delivery Quality                                            1559
Standard deviation of ladder by country-year                1704
Standard deviation/Mean o

In [13]:
df2.nunique()

Country name                                  153
Regional indicator                             10
Ladder score                                  153
Standard error of ladder score                153
upperwhisker                                  153
lowerwhisker                                  153
Logged GDP per capita                         152
Social support                                153
Healthy life expectancy                       152
Freedom to make life choices                  153
Generosity                                    153
Perceptions of corruption                     153
Ladder score in Dystopia                        1
Explained by: Log GDP per capita              152
Explained by: Social support                  153
Explained by: Healthy life expectancy         152
Explained by: Freedom to make life choices    153
Explained by: Generosity                      153
Explained by: Perceptions of corruption       153
Dystopia + residual                           153


# 3. Merge the datasets 
You need to merge the 2008-2019 dataframe with the new 2020 dataframe. You now know the dimensions of each dataframe and the names of all the columns (variables). Make sure that the variables you'd like to concatenate have the same names in both dataframes. 

To concatenate dataframes we use: pd.concat()
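
One possible way to line the two dataframes up before concatenating is sketched below. The toy dataframes and their values are invented for illustration. The mapping from `Ladder score` to `Life Ladder` and from `Logged GDP per capita` to `Log GDP per capita` is an assumption based on the column names; check the report's data documentation before relying on it.

```python
import pandas as pd

# Invented toy stand-ins for df1 (earlier years) and df2 (2020).
df1 = pd.DataFrame({
    "Country name": ["Country A", "Country B"],
    "Year": [2019, 2019],
    "Life Ladder": [7.0, 4.0],
    "Log GDP per capita": [10.5, 7.5],
})
df2 = pd.DataFrame({
    "Country name": ["Country A", "Country B"],
    "Ladder score": [7.1, 4.2],
    "Logged GDP per capita": [10.6, 7.6],
})

# Assumed correspondence between the 2020 column names and the earlier ones.
rename_map = {
    "Ladder score": "Life Ladder",
    "Logged GDP per capita": "Log GDP per capita",
}
df2 = df2.rename(columns=rename_map)

# The 2020 file has no Year column, so add one.
df2["Year"] = 2020

# Keep only the columns both dataframes share, then stack the rows.
shared = ["Country name", "Year", "Life Ladder", "Log GDP per capita"]
combined = pd.concat([df1[shared], df2[shared]], ignore_index=True)

print(combined)
```

Selecting the shared columns first avoids ending up with columns that are filled with missing values for one of the two sources.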

# 4. Data visualisation

Consider creating the following charts: 
* Bar chart of 5 countries by life ladder (highest, lowest and 3 interesting)
* Line chart for UK and NZ life ladder over time 
* Scatter plot of life ladder against GDP per capita for all countries. Can you change the colour by continent? 
* Scatter plot for Europe, economic freedom and life ladder 
* Scatter plot with a line of best fit.
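
As an illustration of the line chart idea, the sketch below plots the life ladder over time for two countries. The data are invented, and `Country A` and `Country B` are placeholders. With the workshop data you would filter on the names used in the `Country name` column for the UK and New Zealand, so check how they are spelled in the file.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented toy data; replace with the real dataframe in the workshop.
df = pd.DataFrame({
    "Country name": ["Country A"] * 3 + ["Country B"] * 3,
    "Year": [2016, 2017, 2018] * 2,
    "Life Ladder": [6.8, 7.1, 7.2, 7.2, 7.3, 7.4],
})

# Filter to the countries of interest.
countries = ["Country A", "Country B"]
subset = df[df["Country name"].isin(countries)]

# Reshape so each country becomes its own column, indexed by year.
wide = subset.pivot(index="Year", columns="Country name", values="Life Ladder")

# One line per country; call plt.show() when running as a script.
fig, ax = plt.subplots()
wide.plot(ax=ax, marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Life Ladder")
ax.set_title("Life Ladder over time")
```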

Line of best fit: y=mx+b 

This tutorial might be helpful: https://pythonprogramming.net/how-to-program-best-fit-line-machine-learning-tutorial/
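
For the line of best fit y = mx + b, one common option (not the only one) is `np.polyfit` with degree 1, which returns the slope m and the intercept b. The sketch below uses invented points that lie exactly on a known line, so the fitted values are easy to check by eye.

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented points lying exactly on y = 0.5x + 1 (not real data).
x = np.array([7.0, 8.0, 9.0, 10.0, 11.0])
y = 0.5 * x + 1.0

# A degree-1 polynomial fit returns the slope first, then the intercept.
m, b = np.polyfit(x, y, deg=1)

# Scatter the points and draw the fitted line over them.
fig, ax = plt.subplots()
ax.scatter(x, y, label="data")
ax.plot(x, m * x + b, color="red", label="best fit")
ax.set_xlabel("Log GDP per capita")
ax.set_ylabel("Life Ladder")
ax.legend()
```

With the workshop data, `x` and `y` would be the GDP and life ladder columns. Drop rows with missing values first, since `np.polyfit` does not handle them.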



# 5. Estimating the impact of happiness 
You could also consider running a regression to estimate the impact of GDP on happiness. 

In order to do this you will need to use the statsmodels package. 

Now that you have run this simple regression, there may be other variables that might influence happiness that you may want to control for such as economic freedom.
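
A minimal sketch of both regressions using the statsmodels formula API is shown below. The data are invented. Using `Freedom to make life choices` as the economic freedom control is an assumption about which column is intended, and the short names `life_ladder`, `log_gdp` and `freedom` are introduced here only because formula strings are easier to write without spaces in the variable names.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented toy data using column names from the workshop file.
df = pd.DataFrame({
    "Life Ladder": [4.0, 4.6, 5.1, 5.4, 6.1, 6.4, 7.0, 7.3],
    "Log GDP per capita": [7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5],
    "Freedom to make life choices": [0.60, 0.55, 0.70, 0.65, 0.80, 0.75, 0.90, 0.85],
})

# Hypothetical short names, so the formulas need no quoting.
df = df.rename(columns={
    "Life Ladder": "life_ladder",
    "Log GDP per capita": "log_gdp",
    "Freedom to make life choices": "freedom",
})

# Simple regression: life ladder on log GDP per capita.
simple = smf.ols("life_ladder ~ log_gdp", data=df).fit()

# Same regression with an additional control variable.
controlled = smf.ols("life_ladder ~ log_gdp + freedom", data=df).fit()

print(simple.params)
print(controlled.params)
```

Calling `simple.summary()` prints the full regression table. Comparing the `log_gdp` coefficient across the two models shows how much the estimate changes once the control is included.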