# Data Visualisation - Happiness

Conor Fallon and Tassilo Henninger



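```r
# Notebook display settings
options(width = 200)
knitr::opts_chunk$set(echo = TRUE)

# Packages (original loading order kept to preserve function masking)
library(knitr)       # tables
library(kableExtra)  # table styling, re-exports %>%
library(gridExtra)   # arranging several plots
library(plotly)      # interactive plots
library(scales)
library(GGally)
library(reshape2)    # reshaping data for plotting
library(car)         # variance inflation factors
library(rgl)         # 3D plots
library(ggplot2)
library(glmnet)      # regularised regression
library(corrplot)    # correlation matrix plots
library(viridis)     # colour scales
library(ggbiplot)    # PCA biplots
library(ggforce)
library(kohonen)     # self-organising maps
library(shinyLP)     # iframe() for embedding HTML figures
library(IRdisplay)   # display_html() in Jupyter

# Colours used throughout the plots
cred <- "#b30000"
cgreen <- "#097969"
```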

"Package 'shinyLP' was built under R version 4.1.3"


# Overview

- Opening Questions
- Datasets and pre-processing
- Preliminary analyses
- Specific Factors on Happiness
- Happiness over Time


# Datasets and Pre-Processing

## Over Time Dataset
The World Happiness Report is a landmark survey of the state of global happiness. The happiness scores and rankings use data from the Gallup World Poll (GWP). The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0, and to rate their own current lives on that scale.

In addition, the report includes six factors (GDP per capita, life expectancy, generosity, social support, freedom, and corruption) that estimate how much each factor contributes to making life evaluations (the happiness score) higher in each country than in Dystopia. The raw data underlying these estimates come from other organisations (e.g. the WHO) or from GWP responses. Dystopia, in this context, is a hypothetical country whose values equal the world's lowest national averages for each of the six factors' raw values. Establishing Dystopia provides a benchmark against which every country compares favourably (no country performs worse than Dystopia) on each of the six key variables. Since life would be very unpleasant in a country with the world's lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom, and least social support, it is referred to as "Dystopia", in contrast to Utopia.

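The happiness score can then be decomposed as

$$\text{Happiness} = \sum_{i=1}^{6} \text{factor}_i + \text{Dystopia} + \text{residual}$$

where $\text{factor}_i$ is the estimated contribution of factor $i$, Dystopia is the happiness score of the hypothetical benchmark country, and the residual is the part of the score not explained by the six factors.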




```r
# Display the first three rows of the raw 2015 World Happiness Report file
load_data_explained_2015 <- function() {
    h_sample_2015 <- read.csv(file = './data/happy/2015.csv')
    kable(h_sample_2015[1:3, ], "html") %>%
        kable_styling("striped") %>%
        scroll_box(width = "100%") %>%
        as.character() %>%
        display_html()
}
```

```r
load_data_explained_2015()
```

| Country | Region | Happiness.Rank | Happiness.Score | Standard.Error | Economy..GDP.per.Capita. | Family | Health..Life.Expectancy. | Freedom | Trust..Government.Corruption. | Generosity | Dystopia.Residual |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 |
| Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.4363 | 2.70201 |
| Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 |
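As a quick check of the decomposition formula, the three 2015 rows in the sample output can be summed directly. This is a minimal sketch that copies the values by hand rather than reading `./data/happy/2015.csv`; it assumes the `Dystopia.Residual` column holds the Dystopia baseline and the residual together, which is consistent with the row sums.

```r
# Three 2015 rows copied from the sample output of 2015.csv
h2015 <- data.frame(
  Country = c("Switzerland", "Iceland", "Denmark"),
  Happiness.Score = c(7.587, 7.561, 7.527),
  Economy..GDP.per.Capita. = c(1.39651, 1.30232, 1.32548),
  Family = c(1.34951, 1.40223, 1.36058),
  Health..Life.Expectancy. = c(0.94143, 0.94784, 0.87464),
  Freedom = c(0.66557, 0.62877, 0.64938),
  Trust..Government.Corruption. = c(0.41978, 0.14145, 0.48357),
  Generosity = c(0.29678, 0.4363, 0.34139),
  Dystopia.Residual = c(2.51738, 2.70201, 2.49204)
)

# The six estimated factor contributions
factor_cols <- c("Economy..GDP.per.Capita.", "Family", "Health..Life.Expectancy.",
                 "Freedom", "Trust..Government.Corruption.", "Generosity")

# Sum of the six factors plus the combined Dystopia + residual term
reconstructed <- rowSums(h2015[, factor_cols]) + h2015$Dystopia.Residual

# Compare with the reported score (given to three decimals in the data)
print(data.frame(Country = h2015$Country,
                 reported = h2015$Happiness.Score,
                 reconstructed = round(reconstructed, 3)))
```

Each reconstructed score agrees with the reported `Happiness.Score` to within 0.0001.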


In [32]:
```r
# Display the first three rows of the preprocessed multi-year dataset
load_data_explained_preprocessed <- function() {
    h_sample <- read.csv(file = './data/preprocessed_data_happy_incl_region_no_nan.csv', sep = ";")
    # Show columns 2 to 12 (column 1 is skipped)
    kable(h_sample[1:3, 2:12], "html") %>%
        kable_styling("striped") %>%
        scroll_box(width = "100%") %>%
        as.character() %>%
        display_html()
}
```

```r
load_data_explained_preprocessed()
# The preprocessed data ranges from 2015 to 2022
```

| Country | Happiness.Rank | Happiness | Economy | Family | Health | Freedom | Trust | Generosity | Year | Region |
|---|---|---|---|---|---|---|---|---|---|---|
| Switzerland | 1 | 7.587 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2015 | Western Europe |
| Iceland | 2 | 7.561 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.4363 | 2015 | Western Europe |
| Denmark | 3 | 7.527 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2015 | Western Europe |
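Comparing the raw 2015 columns with the preprocessed file suggests that pre-processing shortened the column names, dropped `Standard.Error` and `Dystopia.Residual`, and added a `Year` column. The helper `harmonise_2015` below is a hypothetical sketch of that step for a single year, not the code that produced `preprocessed_data_happy_incl_region_no_nan.csv`; the raw files for later years likely use other column names and would need their own mappings.

```r
# Hypothetical helper (not the authors' code): bring a raw 2015 table into
# the column layout of the preprocessed multi-year file.
harmonise_2015 <- function(df, year = 2015) {
  # Raw 2015 column name -> short name used in the preprocessed file
  rename_map <- c(
    Happiness.Score               = "Happiness",
    Economy..GDP.per.Capita.      = "Economy",
    Health..Life.Expectancy.      = "Health",
    Trust..Government.Corruption. = "Trust"
  )
  hit <- names(df) %in% names(rename_map)
  names(df)[hit] <- unname(rename_map[names(df)[hit]])
  df$Year <- year
  # Column order of the preprocessed file; Standard.Error and
  # Dystopia.Residual are not kept
  keep <- c("Country", "Happiness.Rank", "Happiness", "Economy", "Family",
            "Health", "Freedom", "Trust", "Generosity", "Year", "Region")
  df[, keep]
}

# Usage on the first 2015 row from the sample output
raw_2015 <- data.frame(
  Country = "Switzerland", Region = "Western Europe",
  Happiness.Rank = 1, Happiness.Score = 7.587, Standard.Error = 0.03411,
  Economy..GDP.per.Capita. = 1.39651, Family = 1.34951,
  Health..Life.Expectancy. = 0.94143, Freedom = 0.66557,
  Trust..Government.Corruption. = 0.41978, Generosity = 0.29678,
  Dystopia.Residual = 2.51738
)
print(harmonise_2015(raw_2015))
```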


## Influential Factors Dataset

An additional version of the happiness dataset includes the actual raw values rather than the estimated factor contributions, so we can use it to analyse variable importance and for dimensionality reduction. We combine it with three external datasets:

* [smoking dataset](https://ourworldindata.org/smoking) 
* [alcohol dataset](https://www.kaggle.com/datasets/pralabhpoudel/alcohol-consumption-by-country?resource=download)
* [internet dataset](https://data.worldbank.org/indicator/IT.NET.USER.ZS)

The combined dataset contains the following variables:

* Country
* Year
* Happiness: happiness score (Cantril ladder, 0 to 10)
* Economy: log GDP per capita
* Social: social support, the national average of the binary responses (0 or 1) to the GWP question "If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?"
* Health: healthy life expectancy at birth, from the WHO
* Freedom: freedom to make life choices, the national average of responses to the GWP question "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?"
* Generosity: the residual of regressing the national average of responses to the GWP question "Have you donated money to a charity in the past month?" on GDP per capita
* Corruption: perceptions of corruption, the national average of the binary responses (0 or 1) to two GWP questions: "Is corruption widespread throughout the government or not?" and "Is corruption widespread within businesses or not?"
* Positive: positive affect, the average of three positive affect measures in the GWP: happiness, laughter and enjoyment
* Negative: negative affect, the average of three negative affect measures in the GWP: worry, sadness and anger
* Government: confidence in national government
* Code: country code
* Alcohol: total alcohol consumption per capita (litres of pure alcohol, projected estimates, ages 15+)
* Population: population (historical estimates)
* Tobacco: prevalence of current tobacco use (% of adults)
* Internet: individuals using the Internet (% of population)

![missing values full data](./figs/full_data_missing_values.png)

![missing values 2018](./figs/2018_missing_values.png)
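The file names ending in `_no_nan` suggest that rows with missing values were removed after inspecting the patterns in these figures. The exact rule is not stated, so the two base-R helpers below (hypothetical names) are only a sketch: one reports the share of missing values per column, the other keeps only complete rows. The example data frame is made up.

```r
# Share of missing values per column, largest first
missing_share <- function(df) {
  sort(colMeans(is.na(df)), decreasing = TRUE)
}

# Keep only rows without any missing value (one possible way to obtain
# a "no_nan" file; the rule actually used is an assumption)
drop_incomplete <- function(df) {
  df[complete.cases(df), , drop = FALSE]
}

# Small made-up example
toy <- data.frame(
  Country   = c("A", "B", "C", "D"),
  Happiness = c(5.0, NA, 6.1, 4.8),
  Tobacco   = c(29.2, 21.8, NA, NA)
)
print(missing_share(toy))
print(drop_incomplete(toy))
```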

```r
# Display the first three rows of the merged 2018 dataset
load_data_raw_preprocessed <- function() {
    data_2018 <- read.csv(file = './data/preprocessed_raw_2018_no_nan.csv')
    # Shorten three long column names by position
    colnames(data_2018)[6] <- "Social"
    colnames(data_2018)[11] <- "Positive"
    colnames(data_2018)[12] <- "Negative"
    #data_2018$Region <- as.factor(data_2018$Region)
    kable(data_2018[1:3, ], "html") %>%
        kable_styling("striped") %>%
        scroll_box(width = "100%") %>%
        as.character() %>%
        display_html()
}
```

```r
load_data_raw_preprocessed()
```

| Country | Region | Year | Happiness | Economy | Social | Health | Freedom | Generosity | Corruption | Positive | Negative | Government | Code | Alcohol | Population | Tobacco | Internet |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Albania | Central and Eastern Europe | 2018 | 5.004403 | 9.412399 | 0.6835917 | 68.7 | 0.8242123 | 0.005385 | 0.8991294 | 0.7132996 | 0.3189967 | 0.435338 | ALB | 7.17 | 2882735 | 29.2 | 65.4 |
| Argentina | Latin America and Caribbean | 2018 | 5.792797 | 9.809972 | 0.8999116 | 68.8 | 0.8458947 | -0.2069366 | 0.8552552 | 0.8203097 | 0.3205021 | 0.2613523 | ARG | 9.65 | 44361150 | 21.8 | 77.7 |
| Armenia | Commonwealth of Independent States | 2018 | 5.062449 | 9.119424 | 0.814449 | 66.9 | 0.8076437 | -0.1491087 | 0.6768264 | 0.5814877 | 0.4548403 | 0.6708276 | ARM | 5.55 | 2951741 | 26.7 | 68.24505 |


```r
# Load the merged 2018 data and shorten three long column names
data_2018 <- read.csv(file = './data/preprocessed_raw_2018_no_nan.csv')
colnames(data_2018)[6] <- "Social"
colnames(data_2018)[11] <- "Positive"
colnames(data_2018)[12] <- "Negative"
# Numeric variables selected for the analyses below
correlation_categories <- c("Happiness", "Economy", "Social", "Health", "Freedom", "Corruption", "Generosity",
                            "Positive", "Negative", "Government", "Alcohol", "Population", "Tobacco", "Internet")
# Unscaled copy and a standardised copy (mean 0, standard deviation 1 per column)
not_scaled_data_factors <- data_2018[, correlation_categories]
scaled_data_factors <- data.frame(scale(data_2018[, correlation_categories]))
```

![Correlation Matrix](./figs/correlation_matrix.svg)
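The correlation matrix was most likely drawn with the loaded `corrplot` package, though the exact call is not shown. The sketch below uses base R's `cor()` to rank variables by the absolute strength of their Pearson correlation with `Happiness`. It runs on a small synthetic data frame, and `top_correlations` is a hypothetical helper name.

```r
# Rank variables by the absolute Pearson correlation with a target column
top_correlations <- function(df, target = "Happiness", n = 5) {
  cm <- cor(df, use = "pairwise.complete.obs")    # full correlation matrix
  r <- cm[target, setdiff(colnames(cm), target)]  # correlations with the target
  r[order(abs(r), decreasing = TRUE)][seq_len(min(n, length(r)))]
}

# Synthetic example: Economy is related to Happiness by construction
set.seed(1)
toy <- data.frame(Happiness = rnorm(50))
toy$Economy <- toy$Happiness + rnorm(50, sd = 0.1)
toy$Tobacco <- rnorm(50)                           # unrelated by construction
print(top_correlations(toy, n = 2))

# With the real data, the full matrix could be drawn with, e.g.,
# corrplot::corrplot(cor(scaled_data_factors), method = "color")
```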

![VIF Full Model](./figs/VIF_full_model.svg)

![VIF Small Model](./figs/VIF_small_model.svg)
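The variance inflation factor of predictor $j$ is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing that predictor on all the other predictors. The `car` package loaded above provides `vif()` for fitted `lm` models; to stay self-contained, the sketch below computes the same quantity in base R. Which predictors made up the full and small models is not stated here, so the example uses synthetic data, and `vif_manual` is a hypothetical name.

```r
# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest
vif_manual <- function(predictors) {
  sapply(names(predictors), function(v) {
    others <- setdiff(names(predictors), v)
    fit <- lm(reformulate(others, response = v), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Synthetic example with one near-duplicate pair of predictors
set.seed(42)
x1 <- rnorm(100)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(100, sd = 0.1),  # nearly a copy of x1 -> high VIF
  x3 = rnorm(100)                  # independent -> VIF close to 1
)
print(round(vif_manual(toy), 2))
```

Values above roughly 5 to 10 are commonly read as a sign of problematic collinearity, which is one reason to move from a full model to a smaller one.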

![Significant Factors](./figs/significant_factors.svg)
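How the significant factors were selected is not stated; `glmnet` is loaded, so a lasso may have been involved. As a simpler, swapped-in technique, the sketch below fits an ordinary least-squares model with `lm()` and keeps the predictors whose coefficient p-value falls below a chosen level. The helper name `significant_factors` and the synthetic data are assumptions for illustration.

```r
# Fit OLS of the response on all other columns and keep predictors with p < alpha
significant_factors <- function(df, response = "Happiness", alpha = 0.05) {
  predictors <- setdiff(names(df), response)
  fit <- lm(reformulate(predictors, response = response), data = df)
  coefs <- summary(fit)$coefficients
  coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
  coefs[coefs[, "Pr(>|t|)"] < alpha, c("Estimate", "Pr(>|t|)"), drop = FALSE]
}

# Synthetic example: only Economy drives Happiness by construction
set.seed(7)
toy <- data.frame(Economy = rnorm(80), Tobacco = rnorm(80))
toy$Happiness <- 0.8 * toy$Economy + rnorm(80, sd = 0.5)
print(significant_factors(toy))
```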

![PCA](./figs/pcaplot.svg)

![Biplot](./figs/biplot.svg)
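The PCA and biplot figures are consistent with a principal component analysis of the standardised factors (for example `scaled_data_factors`), although the exact call is not shown. Below is a minimal base-R sketch with `prcomp()` that returns the share of variance per component and the loadings a biplot draws. It runs on synthetic data, and `pca_variance` is a hypothetical name.

```r
# PCA on standardised columns, plus the share of variance per component
pca_variance <- function(df) {
  pca <- prcomp(df, center = TRUE, scale. = TRUE)
  var_explained <- pca$sdev^2 / sum(pca$sdev^2)
  names(var_explained) <- colnames(pca$rotation)  # "PC1", "PC2", ...
  list(pca = pca, var_explained = var_explained)
}

# Synthetic example: a and b are correlated, c is independent
set.seed(3)
a <- rnorm(60)
toy <- data.frame(a = a, b = a + rnorm(60, sd = 0.3), c = rnorm(60))
res <- pca_variance(toy)
print(round(res$var_explained, 3))
print(round(res$pca$rotation[, 1:2], 3))  # loadings shown as arrows in a biplot

# A biplot could then be drawn with, e.g., ggbiplot::ggbiplot(res$pca)
```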

![SOM](./figs/SOM_svg.svg)

```r
iframe(url_link = './figs/happiness_region.html', height = 900, width = 1600)
```

```r
iframe(url_link = './figs/map.html', height = 900, width = 1600)
```

```r
iframe(url_link = './figs/happy.gif', height = 900, width = 1600)
```

```r
iframe(url_link = './figs/tobacco.html', height = 900, width = 1600)
```

# Future Work

- Further research into tobacco consumption and its relation to 'happiness'
