# In Search Of Happiness: What If Health?

Let's continue our happiness challenge! Previously, we identified the happiest places on the planet; you can read about that and much more [here](https://nbviewer.jupyter.org/github/chupstee/data.sugar/blob/master/00002_world_happiness/world_happiness.map.ipynb).  
And now we would like to know whether the level of happiness is associated with health.  
We'll take [the same report from Kaggle](https://www.kaggle.com/unsdsn/world-happiness?select=2019.csv), which ranks 156 countries by their level of happiness on a 10-point scale.


## The World Happiness Report

To recall, Kaggle describes the report as follows:

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

You can read more [here](https://www.kaggle.com/unsdsn/world-happiness).

We are most interested in the following columns:

- `Country or region` - country name
- `Overall rank` - country's place in the rating
- `Score` - happiness score


## Introduction

So, what is happiness? Or, better yet, are there any patterns in people's happiness?

[The World Factbook](https://www.cia.gov/library/publications/the-world-factbook) by the CIA provides information on the history, people and society, government, economy, energy, geography, communications, transportation, military, and transnational issues for 267 world entities. We'll take the following indicators from it:

- `Life expectancy at birth` - the average number of years to be lived by a group of people born in the same year, if mortality at each age remains constant in the future. Life expectancy at birth is also a measure of overall quality of life in a country and summarizes the mortality at all ages.
- `Median age` - the age that divides a population into two numerically equal groups; that is, half the people are younger than this age and half are older. It is a single index that summarizes the age distribution of a population. Currently, the median age ranges from a low of about 15 in Niger and Uganda to 40 or more in several European countries and Japan. See the Factbook entry for "Age structure" for the importance of a young versus an older age structure and, by implication, a low versus a higher median age.
- `Population growth rate` - the average annual percent change in populations, resulting from a surplus (or deficit) of births over deaths and the balance of migrants entering and leaving a country. The rate may be positive or negative.
- `Death rate` - the average annual number of deaths during a year per 1,000 population at midyear; also known as crude death rate.
- `Birth rate` - the average annual number of births during a year per 1,000 persons in the population at midyear; also known as crude birth rate.
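
The definitions above imply a simple link between three of these indicators: the crude birth and death rates are given per 1,000 people, so their difference divided by 10 is the natural increase in percent. The sketch below is only an illustration of that arithmetic; the helper name `natural_increase_pct` is our own, and it ignores net migration, so it will not match `Population growth rate` exactly.

```python
def natural_increase_pct(birth_rate: float, death_rate: float) -> float:
    """Natural increase in percent, from crude rates per 1,000 population.

    Net migration is ignored, so this is only one part of the
    population growth rate.
    """
    return (birth_rate - death_rate) / 10.0


# Angola's birth and death rates as they appear in the CIA table
# later in this notebook
angola = natural_increase_pct(43.7, 9.0)
print(round(angola, 2))
```

For Angola this gives about 3.47%, close to the reported population growth rate of 3.49%; the remainder would come from migration and rounding.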

We will compare the happiness scores with the CIA rates for 2018.  

In the search for an answer to the question ***what*** *is happiness?*, let's try to establish *the relationship between happiness and age indicators*.


## Reading The Data

In [1]:
# Import libs
from glob import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Happiness Reports

As we mentioned above, we'll read the happiness report, published in 2018.  
We'll also rename the columns according to the snake_case format.

In [2]:
# Set the width to display
pd.set_option('display.width', 120)
# Increase the number of rows to display
pd.set_option('display.max_rows', 300) 

# Get the data
happiness = pd.read_csv('data/happiness_2018.csv')

# Column map to rename
cols_dict = {'Country':'country',
             'Country or region':'country',
             'Region':'region',
             'Happiness Rank':'rank',
             'Happiness.Rank':'rank',
             'Overall rank':'rank',
             'Happiness Score':'score',
             'Happiness.Score':'score',
             'Score':'score',
             'Economy (GDP per Capita)':'gdp_per_capita',
             'Economy..GDP.per.Capita.':'gdp_per_capita',
             'GDP per capita':'gdp_per_capita',
             'Family':'family',
             'Freedom':'freedom',
             'Freedom to make life choices':'freedom',
             'Generosity':'generosity',
             'Health (Life Expectancy)':'life_expectancy',
             'Health..Life.Expectancy.':'life_expectancy',
             'Healthy life expectancy':'life_expectancy',
             'Perceptions of corruption':'trust_corruption',
             'Trust (Government Corruption)':'trust_corruption',
             'Trust..Government.Corruption.':'trust_corruption',
             'Social support':'social_support',
             'Dystopia Residual':'dystopia_residual',
             'Dystopia.Residual':'dystopia_residual',
             'Standard Error':'standard_error',
             'Upper Confidence Interval':'whisker_high',
             'Whisker.high':'whisker_high',
             'Lower Confidence Interval':'whisker_low',
             'Whisker.low':'whisker_low'
            }

# Rename the columns
happiness.rename(columns=cols_dict, inplace=True)

print(happiness.columns) # check the new column names
happiness.head() # check the values

Index(['rank', 'country', 'score', 'gdp_per_capita', 'social_support', 'life_expectancy', 'freedom', 'generosity',
       'trust_corruption'],
      dtype='object')


| | rank | country | score | gdp_per_capita | social_support | life_expectancy | freedom | generosity | trust_corruption |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Finland | 7.632 | 1.305 | 1.592 | 0.874 | 0.681 | 0.202 | 0.393 |
| 1 | 2 | Norway | 7.594 | 1.456 | 1.582 | 0.861 | 0.686 | 0.286 | 0.34 |
| 2 | 3 | Denmark | 7.555 | 1.351 | 1.59 | 0.868 | 0.683 | 0.284 | 0.408 |
| 3 | 4 | Iceland | 7.495 | 1.343 | 1.644 | 0.914 | 0.677 | 0.353 | 0.138 |
| 4 | 5 | Switzerland | 7.487 | 1.42 | 1.549 | 0.927 | 0.66 | 0.256 | 0.357 |
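
The column map lists several spellings of the same field because the report files differ between years. That works because `DataFrame.rename` skips keys that are not present in the frame. A small sketch with toy frames (the frames and the `leftover` check are illustrative, not part of the original notebook):

```python
import pandas as pd

# A subset of the column map: two spellings for each target name
cols = {'Country': 'country', 'Country or region': 'country',
        'Happiness Score': 'score', 'Score': 'score'}

# Toy frames imitating an older and a newer header style
old_style = pd.DataFrame({'Country': ['X'], 'Happiness Score': [7.0]})
new_style = pd.DataFrame({'Country or region': ['X'], 'Score': [7.0]})

# Keys absent from a frame are skipped, so one map serves both frames
old_renamed = old_style.rename(columns=cols)
new_renamed = new_style.rename(columns=cols)

# Columns the map does not cover keep their original names; list any
# that still contain upper-case letters or spaces
leftover = [c for c in new_renamed.columns if c != c.lower() or ' ' in c]
print(list(old_renamed.columns), list(new_renamed.columns), leftover)
```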


In [3]:
happiness.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 156 entries, 0 to 155
Data columns (total 9 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   rank              156 non-null    int64  
 1   country           156 non-null    object 
 2   score             156 non-null    float64
 3   gdp_per_capita    156 non-null    float64
 4   social_support    156 non-null    float64
 5   life_expectancy   156 non-null    float64
 6   freedom           156 non-null    float64
 7   generosity        156 non-null    float64
 8   trust_corruption  155 non-null    float64
dtypes: float64(7), int64(1), object(1)
memory usage: 11.1+ KB


The 2018 report contains 156 countries. The `country`, `rank`, and `score` columns have no missing values; only `trust_corruption` has one.
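
The `info()` output reports 155 non-null values in `trust_corruption`. One way to find the affected row is sketched below on a toy frame with made-up values; on the real data, `toy` would be replaced by `happiness`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `happiness`; the values are made up
toy = pd.DataFrame({
    'country': ['A', 'B', 'C'],
    'score': [7.1, 6.4, 5.9],
    'trust_corruption': [0.39, np.nan, 0.14],
})

# Count missing values per column
missing_per_column = toy.isna().sum()

# Select the rows that contain at least one missing value
rows_with_missing = toy[toy.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```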

Let's check for duplicates.

In [4]:
# Duplicated
print('Duplicated: {}'.format(happiness.duplicated(subset='country').sum()))

Duplicated: 0


There are no duplicates. Let's get the CIA data.

### CIA Reports

We have downloaded The World Factbook archive for different years and saved the data that was collected in 2018.
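
The saved files are read with `sep=r'\s{3,}'`, which splits on runs of three or more whitespace characters, so country names containing a single space stay in one field. The sketch below shows the idea on an in-memory sample. The sample layout (a header line, then rank, country, value, and year) is an assumption about the files, and "Sample Land" and its value are made up.

```python
import io

import pandas as pd

# Made-up sample imitating the assumed layout of a saved Factbook file
sample = (
    "Rank   Country   Value   Date\n"
    "1      Angola      43.70      2018 est.\n"
    "2      Sample Land      12.50      2018 est.\n"
)

parsed = pd.read_csv(
    io.StringIO(sample),
    engine='python',       # regex separators need the Python engine
    sep=r'\s{3,}',         # three or more whitespace characters
    header=None,
    names=['rank', 'country_cia', 'birth', 'data_year'],
    skiprows=1,            # skip the header line
    thousands=',',
    dtype={'birth': 'float64'},
)
print(parsed)
```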

In [5]:
cia_files = sorted(glob('data/cia.age.*.2018.txt')) # sorted for a stable merge order
cia = pd.DataFrame()

for file in cia_files:
    # The indicator name is the third dot-separated part of the file name,
    # e.g. 'data/cia.age.birth.2018.txt' -> 'birth'
    indicator = file.split('.')[2]
    c = pd.read_csv(file,
                    engine='python', sep=r'\s{3,}', header=None,
                    names=['country_cia', indicator, 'data_year'],
                    skiprows=1, index_col=0,
                    thousands=',', dtype={indicator: 'float64'}
                   )[['country_cia', indicator]] # read the file
    if cia.empty:
        cia = c # DataFrame.append was removed in pandas 2.0
        print('Initialize: {}'.format(cia.shape[0])) # for the first file
    else:
        cia = cia.merge(c, on='country_cia', how='outer')
        print('Merge {}: {}'.format(indicator, cia.shape[0]))

cia = cia.reset_index(drop=True) # reset_index returns a new frame, so assign it

cia.info()
cia

Initialize: 226
Merge death: 226
Merge life_expectancy_at_birth: 226
Merge median_age: 229
Merge population_growth: 237
<class 'pandas.core.frame.DataFrame'>
Int64Index: 237 entries, 0 to 236
Data columns (total 6 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   country_cia               237 non-null    object 
 1   birth                     226 non-null    float64
 2   death                     226 non-null    float64
 3   life_expectancy_at_birth  223 non-null    float64
 4   median_age                228 non-null    float64
 5   population_growth         234 non-null    float64
dtypes: float64(5), object(1)
memory usage: 13.0+ KB


| | country_cia | birth | death | life_expectancy_at_birth | median_age | population_growth |
|---|---|---|---|---|---|---|
| 0 | Angola | 43.7 | 9.0 | 60.6 | 15.9 | 3.49 |
| 1 | Niger | 43.6 | 11.5 | 56.3 | 15.5 | 3.16 |
| 2 | Mali | 43.2 | 9.6 | 60.8 | 15.8 | 2.98 |
| 3 | Chad | 43.0 | 10.5 | 57.5 | 15.8 | 3.23 |
| 4 | Uganda | 42.4 | 9.9 | 56.3 | 15.9 | 3.18 |
| 5 | Zambia | 41.1 | 12.0 | 53.0 | 16.8 | 2.91 |
| 6 | Burundi | 40.9 | 8.6 | 61.4 | 17.1 | 3.23 |
| 7 | Malawi | 40.7 | 7.7 | 62.2 | 16.6 | 3.31 |
| 8 | Somalia | 39.3 | 12.8 | 53.2 | 18.2 | 2.08 |
| 9 | Liberia | 37.9 | 7.4 | 63.8 | 17.8 | 2.59 |
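
The row count grows from 226 to 237 during the outer merges, so some names appear in only some of the files. A possible way to inspect such rows is the `indicator=True` option of `merge`, sketched here on toy frames. The names and values are invented, and a spelling mismatch is only one possible cause; an entity may also simply lack a given indicator.

```python
import pandas as pd

# Toy frames; the last name is spelled differently in each one
left = pd.DataFrame({'country_cia': ['Alpha', 'Beta', 'Gamma Republic'],
                     'birth': [10.0, 20.0, 30.0]})
right = pd.DataFrame({'country_cia': ['Alpha', 'Beta', 'Republic of Gamma'],
                      'median_age': [40.0, 30.0, 20.0]})

# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'
merged = left.merge(right, on='country_cia', how='outer', indicator=True)

# Rows found in only one of the two frames
unmatched = merged[merged['_merge'] != 'both']
print(unmatched[['country_cia', '_merge']])
```

Three names on each side turn into four rows here, the same effect as the growth in the row counts printed above.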
