In [186]:
import pandas as pd
import numpy as np
import altair as alt
import statsmodels.api as sm

## Changes Over Time
_Between 2008 and 2022, how have global well-being and overall happiness changed?_

In [187]:
happy = pd.read_csv("./data/whr-2023.csv")
happy.head(15)

Unnamed: 0,Country name,year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect
0,Afghanistan,2008,3.724,7.35,0.451,50.5,0.718,0.168,0.882,0.414,0.258
1,Afghanistan,2009,4.402,7.509,0.552,50.8,0.679,0.191,0.85,0.481,0.237
2,Afghanistan,2010,4.758,7.614,0.539,51.1,0.6,0.121,0.707,0.517,0.275
3,Afghanistan,2011,3.832,7.581,0.521,51.4,0.496,0.164,0.731,0.48,0.267
4,Afghanistan,2012,3.783,7.661,0.521,51.7,0.531,0.238,0.776,0.614,0.268
5,Afghanistan,2013,3.572,7.68,0.484,52.0,0.578,0.063,0.823,0.547,0.273
6,Afghanistan,2014,3.131,7.671,0.526,52.3,0.509,0.106,0.871,0.492,0.375
7,Afghanistan,2015,3.983,7.654,0.529,52.6,0.389,0.082,0.881,0.491,0.339
8,Afghanistan,2016,4.22,7.65,0.559,52.925,0.523,0.044,0.793,0.501,0.348
9,Afghanistan,2017,2.662,7.648,0.491,53.25,0.427,-0.119,0.954,0.435,0.371


In [188]:
# Split the dataset into one DataFrame per year, keyed by year
year_dict = {year: df_year for year, df_year in happy.groupby('year')}
year_dict[2008]

Unnamed: 0,Country name,year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect
0,Afghanistan,2008,3.724,7.350,0.451,50.50,0.718,0.168,0.882,0.414,0.258
45,Argentina,2008,5.961,10.043,0.892,66.06,0.678,-0.135,0.865,0.720,0.318
62,Armenia,2008,4.652,9.230,0.709,64.32,0.462,-0.216,0.876,0.486,0.385
78,Australia,2008,7.254,10.709,0.947,70.04,0.916,0.302,0.431,0.729,0.218
93,Austria,2008,7.181,10.881,0.935,69.70,0.879,0.287,0.614,0.716,0.173
...,...,...,...,...,...,...,...,...,...,...,...
2106,Uzbekistan,2008,5.311,8.402,0.894,61.82,0.831,-0.030,,0.647,0.187
2123,Venezuela,2008,6.258,9.719,0.922,65.38,0.678,-0.230,0.776,0.818,0.224
2140,Vietnam,2008,5.480,8.658,0.805,64.34,0.889,0.182,0.789,0.624,0.218
2169,Zambia,2008,4.730,7.918,0.624,48.08,0.717,0.054,0.890,0.707,0.206


In [189]:
# Compute the cross-country mean of `column` for each year from 2008 to 2022,
# plus the change in that mean from the previous year.
# (Replaces the earlier applymap-based unwrapping; DataFrame.applymap is deprecated in recent pandas.)
def create_new_data_frame_means(column):
    all_years = list(range(2008, 2023))
    means = [year_dict[year][column].mean() for year in all_years]
    means_data_frame = pd.DataFrame({"Years": all_years, "Means": means})
    means_data_frame["Diffs of Means"] = means_data_frame["Means"].diff()
    return means_data_frame
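
The same yearly means can also be computed without building a dictionary first, using a single `groupby`. The sketch below is an alternative, not the notebook's method: `yearly_means` is a hypothetical helper, and the small DataFrame stands in for the WHR table with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for the WHR table: same column names, made-up values.
happy = pd.DataFrame({
    "Country name": ["A", "B", "A", "B", "A", "B"],
    "year": [2008, 2008, 2009, 2009, 2010, 2010],
    "Life Ladder": [5.0, 6.0, 5.5, 6.5, 4.5, 6.5],
})

def yearly_means(df, column, start=2008, end=2022):
    """Cross-country mean of `column` per year, plus the year-over-year change."""
    in_range = df[df["year"].between(start, end)]
    means = in_range.groupby("year")[column].mean()
    out = pd.DataFrame({"Years": means.index, "Means": means.to_numpy()})
    out["Diffs of Means"] = out["Means"].diff()
    return out

result = yearly_means(happy, "Life Ladder")
print(result)
```

The output has the same `Years`, `Means`, and `Diffs of Means` columns as the notebook's helper, so it could feed the Altair charts unchanged.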


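### Change in Life Ladder over the years

The Life Ladder value is each respondent's rating of their own life on a scale from 0 (worst possible life) to 10 (best possible life), averaged by country and year. Because it is a direct self-assessment of life satisfaction, it is the most direct measure of overall happiness in this dataset.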

In [190]:
# Compute the yearly means once and reuse them for both charts
life_ladder_means = create_new_data_frame_means("Life Ladder")
graph_life_ladder = alt.Chart(life_ladder_means).mark_bar().encode(
    x=alt.X('Years:O'),
    y=alt.Y("Means:Q", title = "Mean Satisfaction Score")
).properties(
    title = "Mean Satisfaction Score Across the World over the Years",
)
graph_life_ladder_diffs = alt.Chart(life_ladder_means).mark_bar().encode(
    x=alt.X('Years:O'),
    y=alt.Y("Diffs of Means:Q", title = "Differences in Mean Satisfaction Score")
).properties(
    title = "Year-over-Year Change in Mean Satisfaction Score Across the World",
)
graph_life_ladder | graph_life_ladder_diffs


The mean Life Ladder score has stayed between 5 and 6 every year since 2008. It fluctuates slightly but never rises or falls substantially. Notably, the highest mean score came in 2020, even though the COVID-19 pandemic began that year. The second graph shows the change in the mean from each year to the next, and it has no consistent direction. The steepest drop comes in 2021, which may reflect satisfaction falling as the pandemic continued. One caveat applies to both observations: the set of countries surveyed differs from year to year, so part of any change in the mean may come from which countries were included rather than from changes within countries.
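One way to check whether a change in the yearly mean reflects changes within countries, or just which countries were surveyed, is to recompute the means on a balanced panel that keeps only countries observed in every year. A minimal sketch on hypothetical data, where country "C" appears only in 2020:

```python
import pandas as pd

# Hypothetical panel: country "C" is only surveyed in 2020.
happy = pd.DataFrame({
    "Country name": ["A", "B", "A", "B", "C"],
    "year": [2019, 2019, 2020, 2020, 2020],
    "Life Ladder": [5.0, 6.0, 5.0, 6.0, 8.0],
})

# Unbalanced mean: every country that happens to be surveyed that year.
raw_means = happy.groupby("year")["Life Ladder"].mean()

# Balanced panel: keep only countries observed in every year of the window.
n_years = happy["year"].nunique()
years_per_country = happy.groupby("Country name")["year"].nunique()
always_present = years_per_country[years_per_country == n_years].index
balanced = happy[happy["Country name"].isin(always_present)]
balanced_means = balanced.groupby("year")["Life Ladder"].mean()

print(raw_means.to_dict())       # 2020 looks higher only because C was added
print(balanced_means.to_dict())  # no change for the countries present in both years
```

On the real data the balanced panel may be considerably smaller than the full set of countries, so the two sets of means are best read side by side.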

### Positive and Negative Affects


Positive affect measures how often people experience positive emotions. It is the average share of survey respondents who reported positive experiences, such as laughter and enjoyment, on the previous day.
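Assuming this construction (a national share of yes/no answers about the previous day, averaged over several positive-emotion questions), a single country's score can be sketched as follows. The answers are hypothetical, and real survey estimates may also apply sampling weights, which this sketch omits.

```python
import numpy as np

# Hypothetical survey answers: rows are respondents in one country,
# columns are three yes/no (1/0) positive-emotion questions.
answers = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
])

# Share of "yes" answers for each question (column means)...
item_shares = answers.mean(axis=0)
# ...and the country's score is the unweighted average of those shares.
positive_affect = float(item_shares.mean())
print(item_shares, round(positive_affect, 3))
```

Because each item is a share between 0 and 1, the resulting score also lies between 0 and 1, which matches the range of the `Positive affect` column shown earlier.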

In [191]:
# Compute the yearly means once and reuse them for both charts
positive_means = create_new_data_frame_means("Positive affect")
graph_pos = alt.Chart(positive_means).mark_bar().encode(
    x=alt.X('Years:O'),
    y=alt.Y("Means:Q", title = "Mean Positivity Score")
).properties(
    title = "Mean Positivity Score Across the World over the Years",
)
graph_pos_diff = alt.Chart(positive_means).mark_bar().encode(
    x=alt.X('Years:O'),
    y=alt.Y("Diffs of Means:Q", title = "Differences in Mean Positivity Score")
).properties(
    title = "Differences in Mean Positivity Score Across the World over the Years",
)
graph_pos | graph_pos_diff

The mean positivity score has remained very consistent over the years. The graph of differences shows no consistent trend in either direction. This implies that average positivity around the world has changed little, even if it has changed within individual countries.

The Negative affect statistic is built the same way as Positive affect, but from negative emotions such as worry, sadness, and anger. Here are the graphs for the Negative affect score:

In [192]:
# Compute the yearly means once and reuse them for both charts
negative_means = create_new_data_frame_means("Negative affect")
graph_neg = alt.Chart(negative_means).mark_bar().encode(
    x=alt.X('Years:O'),
    y=alt.Y("Means:Q", title = "Mean Negativity Score")
).properties(
    title = "Mean Negativity Score Across the World over the Years",
)
graph_neg_diff = alt.Chart(negative_means).mark_bar().encode(
    x=alt.X('Years:O'),
    y=alt.Y("Diffs of Means:Q", title = "Differences in Mean Negativity Score")
).properties(
    title = "Differences in Mean Negativity Score Across the World over the Years",
)
graph_neg | graph_neg_diff

Negativity has clearly increased since 2008: the mean rose in ten of the fourteen year-over-year changes. The largest increase lines up approximately with the period in which the COVID-19 lockdowns began. Interestingly, positivity and negativity sometimes rise together and sometimes fall together. This is possible because the two are built from separate survey questions, so a rise in one does not force a fall in the other. One possible explanation is that respondents interpret the questions differently from year to year: these are highly subjective measures, and different people may report different values for what is essentially the same feeling.
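The counts in this paragraph can be read directly off the year-over-year differences. A sketch on hypothetical yearly means (the column names `Positive` and `Negative` are illustrative; in the notebook they would come from the `Means` columns of the two affect data frames):

```python
import pandas as pd

# Hypothetical yearly means for the two affect measures.
affect = pd.DataFrame({
    "Years": [2008, 2009, 2010, 2011, 2012],
    "Positive": [0.65, 0.66, 0.64, 0.65, 0.64],
    "Negative": [0.24, 0.25, 0.24, 0.26, 0.27],
})

pos_diff = affect["Positive"].diff().dropna()
neg_diff = affect["Negative"].diff().dropna()

# How many year-over-year changes in negativity were increases?
n_neg_increases = int((neg_diff > 0).sum())

# In how many years did both measures move in the same direction?
same_direction = int(((pos_diff > 0) == (neg_diff > 0)).sum())
print(n_neg_increases, "of", len(neg_diff), "changes were increases")
print(same_direction, "years where both measures moved the same way")
```

A change of exactly zero counts as "not an increase" here; with means computed from real survey data that case is unlikely but worth keeping in mind.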

### Relationships between Life Ladder and Positive/Negative affect over the years



In [193]:
# Correlate Life Ladder with each affect measure within each year,
# starting in 2008 to match the earlier charts (avoids a hard-coded year list)
corr_years = [year for year in sorted(year_dict) if year >= 2008]
pos_corrs = [year_dict[year]["Life Ladder"].corr(year_dict[year]["Positive affect"])
             for year in corr_years]
neg_corrs = [year_dict[year]["Life Ladder"].corr(year_dict[year]["Negative affect"])
             for year in corr_years]
df_corrs = pd.DataFrame({"Years": corr_years,
              "Positive Correlations": pos_corrs,
              "Negative Correlations": neg_corrs})

graph_pos_corrs = alt.Chart(df_corrs).mark_bar().encode(
    x=alt.X('Years:O'),
    y=alt.Y("Positive Correlations:Q")
).properties(
    title = "Correlation between Positive affect and Life Ladder",
)
graph_neg_corrs = alt.Chart(df_corrs).mark_bar().encode(
    x=alt.X('Years:O'),
    y=alt.Y("Negative Correlations:Q")
).properties(
    title = "Correlation between Negative affect and Life Ladder",
)
graph_pos_corrs | graph_neg_corrs

Life Ladder is perhaps the most telling statistic of an individual's overall happiness. As expected, its correlation with Positive affect is positive in every year. The correlation has peaked in recent years but generally sits around 0.5 to 0.6: a moderate rather than strong relationship, but a clear one. The correlation between Life Ladder and Negative affect is negative in almost every year and has grown stronger over much of the period. The exception is 2009, when the correlation is slightly positive, although very weak. This suggests that negative emotion has become more closely tied to overall life satisfaction over time, though a correlation alone cannot show that negative emotion affects happiness.
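Whether the weak positive correlation in 2009 differs meaningfully from zero depends on how many countries were surveyed that year. A standard way to judge this is an approximate confidence interval from the Fisher z-transform. In the sketch below, `pearson_ci` is a hypothetical helper, and the correlation and sample size are illustrative values, not the 2009 figures.

```python
import numpy as np

def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transform."""
    z = np.arctanh(r)              # map r to an approximately normal scale
    se = 1.0 / np.sqrt(n - 3)      # standard error on that scale
    return float(np.tanh(z - z_crit * se)), float(np.tanh(z + z_crit * se))

# Hypothetical values: a weak positive correlation across 100 countries.
low, high = pearson_ci(0.05, 100)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

With the real data, `n` would be the number of countries with both values present, for example `len(year_dict[2009][["Life Ladder", "Negative affect"]].dropna())`. An interval that straddles zero means the sign of that year's correlation is not well determined.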

### Conclusion

Across the world over the last 15 years (2008 to 2022), the average values of major happiness measures such as Positive affect and Life Ladder have not changed much. Negativity, however, has increased, and its association with overall happiness has become much stronger. The data is also incomplete: not every country has data for every year from 2008 to 2022, and with complete data it is possible that a very different story would emerge.
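The incompleteness can be quantified by counting how many countries appear in each year and listing the countries missing from at least one year. A minimal sketch on a hypothetical panel (with the real data, `happy` would be the loaded WHR table restricted to 2008 to 2022):

```python
import pandas as pd

# Hypothetical panel with uneven coverage across years.
happy = pd.DataFrame({
    "Country name": ["A", "B", "C", "A", "B", "A", "B", "C"],
    "year": [2008, 2008, 2008, 2009, 2009, 2010, 2010, 2010],
})

# Number of countries surveyed each year.
countries_per_year = happy.groupby("year")["Country name"].nunique()

# Country-by-year table of observation counts; a 0 marks a missing year.
presence = pd.crosstab(happy["Country name"], happy["year"])
incomplete = presence[(presence == 0).any(axis=1)].index.tolist()
print(countries_per_year.to_dict())
print(incomplete)
```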