# Solution: Exploring Data

Hi,

The marketing team would like to share the five happiest countries of the 2010s on social media.
I’ve attached a notebook with happiness data that another data scientist started. I would recommend:
* Creating a list of each country’s highest happiness score, and then sorting it from happiest to least happy country
* Creating a list of each country’s average happiness score, and then sorting it from happiest to least happy country

Are there any differences between the two lists?

Thanks!\
Anna

## Read in the data

In [1]:
# Import pandas and read in the happiness survey data

import pandas as pd

df = pd.read_csv('../Data/happiness_survey_data.csv')
df.head()

,country_name,year,happiness_score,social_support,freedom_to_make_life_choices,healthy_life_expectancy_at_birth
0,Afghanistan,2008,3.72359,0.450662,0.718114,50.5
1,Afghanistan,2009,4.401778,0.552308,0.678896,50.799999
2,Afghanistan,2010,4.758381,0.539075,0.600127,51.099998
3,Afghanistan,2011,3.831719,0.521104,0.495901,51.400002
4,Afghanistan,2012,3.782938,0.520637,0.530935,51.700001


In [2]:
# there are 2089 rows and 6 columns of data
df.shape

(2089, 6)

In [3]:
# The years range from 2005 to 2021
# Happiness scores range from about 2.18 to 8.02
df.describe()

,year,happiness_score,social_support,freedom_to_make_life_choices,healthy_life_expectancy_at_birth
count,2089.0,2089.0,2076.0,2057.0,2031.0
mean,2013.727621,5.473747,0.811542,0.745462,63.180326
std,4.455614,1.115567,0.118935,0.140751,6.948546
min,2005.0,2.178809,0.290184,0.257534,6.72
25%,2010.0,4.651972,0.747664,0.651689,58.965
50%,2014.0,5.405246,0.83477,0.767357,64.980003
75%,2017.0,6.294282,0.904682,0.857677,68.362499
max,2021.0,8.018934,0.987343,0.985178,74.349998
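
The `count` row above is lower for three columns than the 2,089 rows reported by `df.shape`, which suggests missing values in those columns. A minimal sketch of that arithmetic, using only the counts shown in the `describe()` output (the variable names are illustrative and not part of the original notebook):

```python
# Non-null counts copied from the `count` row of the describe() output above.
total_rows = 2089
non_null_counts = {
    "year": 2089,
    "happiness_score": 2089,
    "social_support": 2076,
    "freedom_to_make_life_choices": 2057,
    "healthy_life_expectancy_at_birth": 2031,
}

# Missing values per column = total rows minus the non-null count.
missing_counts = {col: total_rows - n for col, n in non_null_counts.items()}

for col, n_missing in missing_counts.items():
    print(f"{col}: {n_missing} missing")
```

Because `happiness_score` has no gaps, the maximum and average calculations on that column are not affected by missing values. On the full DataFrame, `df.isna().sum()` should report the same per-column numbers.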


## Explore the data

In [4]:
# Keep only rows from 2010 through 2019 (the 2010s)
filtered_df = df[(df.year >= 2010) & (df.year < 2020)]
filtered_df

,country_name,year,happiness_score,social_support,freedom_to_make_life_choices,healthy_life_expectancy_at_birth
2,Afghanistan,2010,4.758381,0.539075,0.600127,51.099998
3,Afghanistan,2011,3.831719,0.521104,0.495901,51.400002
4,Afghanistan,2012,3.782938,0.520637,0.530935,51.700001
5,Afghanistan,2013,3.572100,0.483552,0.577955,52.000000
6,Afghanistan,2014,3.130896,0.525568,0.508514,52.299999
...,...,...,...,...,...,...
2082,Zimbabwe,2015,3.703191,0.735800,0.667193,51.200001
2083,Zimbabwe,2016,3.735400,0.768425,0.732971,51.674999
2084,Zimbabwe,2017,3.638300,0.754147,0.752826,52.150002
2085,Zimbabwe,2018,3.616480,0.775388,0.762675,52.625000
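
The year filter can also be written with `Series.between`, which includes both endpoints by default and so makes the 2010 to 2019 range explicit. The sketch below is an alternative to the notebook's boolean mask, not the original approach; `sample_df` is a hypothetical five-row sample built from the Afghanistan rows shown in `df.head()`:

```python
import pandas as pd

# Hypothetical sample: the five Afghanistan rows shown in df.head() above.
sample_df = pd.DataFrame({
    "country_name": ["Afghanistan"] * 5,
    "year": [2008, 2009, 2010, 2011, 2012],
    "happiness_score": [3.72359, 4.401778, 4.758381, 3.831719, 3.782938],
})

# Boolean-mask filter, written the same way as in the notebook cell.
mask_filtered = sample_df[(sample_df.year >= 2010) & (sample_df.year < 2020)]

# Equivalent filter: between() keeps years 2010 through 2019 inclusive.
between_filtered = sample_df[sample_df["year"].between(2010, 2019)]

print(between_filtered)
```

On this sample both filters keep the 2010, 2011 and 2012 rows and drop 2008 and 2009.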


In [5]:
# Group the data by country and calculate the maximum happiness score for each one
filtered_df.groupby('country_name')['happiness_score'].max()

country_name
Afghanistan    4.758381
Albania        5.867422
Algeria        6.354898
Angola         5.589001
Argentina      6.775805
                 ...   
Venezuela      7.478455
Vietnam        5.767344
Yemen          4.350313
Zambia         5.243996
Zimbabwe       4.955101
Name: happiness_score, Length: 164, dtype: float64

In [6]:
# Sort the countries by maximum happiness score and return the top five
filtered_df.groupby('country_name')['happiness_score'].max().sort_values(ascending=False).head()

country_name
Finland        7.858107
Denmark        7.788232
Switzerland    7.776209
Norway         7.678277
Canada         7.650346
Name: happiness_score, dtype: float64
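
As an aside, `sort_values(ascending=False).head()` can be replaced by `Series.nlargest(5)`, which states the intent more directly. A minimal sketch, assuming a partial sample of the maximum scores copied from the outputs above (the real Series has 164 countries):

```python
import pandas as pd

# Partial sample of per-country maximum scores, copied from the outputs above.
max_scores = pd.Series(
    {
        "Afghanistan": 4.758381,
        "Albania": 5.867422,
        "Canada": 7.650346,
        "Denmark": 7.788232,
        "Finland": 7.858107,
        "Norway": 7.678277,
        "Switzerland": 7.776209,
        "Zimbabwe": 4.955101,
    },
    name="happiness_score",
)

# The notebook's approach: sort descending, then take the first five rows.
top_five_sorted = max_scores.sort_values(ascending=False).head()

# Alternative: ask for the five largest values directly.
top_five_nlargest = max_scores.nlargest(5)

print(top_five_nlargest)
```

Both calls return the same five countries in the same order on this sample.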

In [7]:
# Group the data by country and calculate the average happiness score for each one
filtered_df.groupby('country_name')['happiness_score'].mean()

country_name
Afghanistan    3.501017
Albania        4.976791
Algeria        5.389717
Angola         4.420299
Argentina      6.398067
                 ...   
Venezuela      5.858085
Vietnam        5.278218
Yemen          3.765910
Zambia         4.452677
Zimbabwe       4.074384
Name: happiness_score, Length: 164, dtype: float64

In [8]:
# Sort the countries by average happiness score and return the top five
filtered_df.groupby('country_name')['happiness_score'].mean().sort_values(ascending=False).head()

country_name
Denmark        7.618252
Switzerland    7.568010
Finland        7.553138
Norway         7.541094
Iceland        7.518146
Name: happiness_score, dtype: float64

## Compare the two lists

### Happiest countries for social media
    
**Countries with the highest peak (maximum) happiness scores**
1. Finland: 7.858107
2. Denmark: 7.788232
3. Switzerland: 7.776209
4. Norway: 7.678277
5. Canada: 7.650346

**Countries with the highest average happiness scores**
1. Denmark: 7.618252
2. Switzerland: 7.568010
3. Finland: 7.553138
4. Norway: 7.541094
5. Iceland: 7.518146
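
To answer the question of whether the two lists differ, they can be compared directly. The following is a small sketch in plain Python, with the country names copied from the two lists above; the variable names are illustrative:

```python
# Top-five lists copied from the two rankings above, in rank order.
top_by_max = ["Finland", "Denmark", "Switzerland", "Norway", "Canada"]
top_by_mean = ["Denmark", "Switzerland", "Finland", "Norway", "Iceland"]

# Countries that appear in both lists, kept in max-score order.
in_both = [country for country in top_by_max if country in top_by_mean]

# Countries that appear in only one of the lists.
only_in_max = sorted(set(top_by_max) - set(top_by_mean))
only_in_mean = sorted(set(top_by_mean) - set(top_by_max))

# Rank change from the max list to the average list:
# positive means the country ranks higher by average than by maximum.
rank_change = {c: top_by_max.index(c) - top_by_mean.index(c) for c in in_both}

print("In both lists:", in_both)
print("Only in the max list:", only_in_max)
print("Only in the average list:", only_in_mean)
print("Rank change (max -> average):", rank_change)
```

Four countries appear on both lists. Canada makes the top five only by maximum score, Iceland only by average score, and Finland drops from first place by maximum to third place by average.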

In [9]:
# Look at top few countries in more detail
filtered_df[filtered_df.country_name.isin(['Finland', 'Denmark', 'Switzerland'])]

,country_name,year,happiness_score,social_support,freedom_to_make_life_choices,healthy_life_expectancy_at_birth
481,Denmark,2010,7.770515,0.974977,0.943631,69.400002
482,Denmark,2011,7.788232,0.961736,0.93476,69.620003
483,Denmark,2012,7.519909,0.951437,0.932628,69.839996
484,Denmark,2013,7.588607,0.964708,0.920255,70.059998
485,Denmark,2014,7.507559,0.956344,0.941572,70.279999
486,Denmark,2015,7.514425,0.959701,0.941436,70.5
487,Denmark,2016,7.557783,0.954452,0.948231,70.625
488,Denmark,2017,7.593702,0.9521,0.955416,70.75
489,Denmark,2018,7.648786,0.958219,0.935438,70.875
490,Denmark,2019,7.693003,0.957706,0.963318,71.0
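
Only the Denmark rows are visible in the output above, but they are enough to cross-check Denmark's entries in both lists. A minimal sketch, assuming a hypothetical `denmark` DataFrame rebuilt from the ten rows shown:

```python
import pandas as pd

# Denmark's 2010-2019 happiness scores, copied from the rows above.
denmark = pd.DataFrame({
    "year": list(range(2010, 2020)),
    "happiness_score": [
        7.770515, 7.788232, 7.519909, 7.588607, 7.507559,
        7.514425, 7.557783, 7.593702, 7.648786, 7.693003,
    ],
})

# Maximum, mean and minimum of the decade in one call.
summary = denmark["happiness_score"].agg(["max", "mean", "min"])

# Year in which the maximum score occurred.
peak_year = denmark.loc[denmark["happiness_score"].idxmax(), "year"]

print(summary.round(6))
print("Peak year:", peak_year)
```

The maximum (7.788232, reached in 2011) and the mean (about 7.618252) match Denmark's values in the two lists.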
