<a href="https://colab.research.google.com/github/TsamayaDesigns/codeDivision-data-with-python/blob/main/How_happy_is_the_world.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How happy is the world?

![HappinessImage-Benjamin Scott](https://drive.google.com/uc?id=1wWdXTclLAjPpZUiKkR8XGg0Sl7iD1u8T)  

(Image by Benjamin Scott, [source](https://www.natureindex.com/news-blog/data-visualization-these-are-the-happiest-countries-world-happiness-report-twenty-nineteen))   

The Sustainable Development Solutions Network (SDSN) compiles worldwide data relating to happiness and uses it to rank countries by happiness score.

This is not an exact science, but it offers food for thought about which factors might have the most impact on a nation's happiness levels.

The data is taken from the Gallup World Poll, so it is not collected directly by SDSN.  
Countries are grouped by region.  

### The factors included are:
* Economy (measured in GDP per Capita)
* Family (support systems)
* Health (measured by Life Expectancy)
* Freedom (sense of)
* Trust (Government Corruption)
* Generosity (charitable inclinations)
* Dystopia Residual
  * Dystopia is a hypothetical country, the unhappiest possible, with the lowest levels in all six of the factors above
  * The Residual measure is calculated as the average of the six distances from the lowest
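
As a quick sanity check on how these columns relate to the overall score, the sketch below takes the first three rows of the 2015 table (values copied by hand from the output shown in this notebook) and compares the sum of the six factors plus the Dystopia Residual with the Happiness Score. The `sample` dataframe and the `factor_cols`, `Factor Sum` and `Difference` names are illustrative only and are not part of the data set.

```python
import pandas as pd

# First three rows of the 2015 data, copied by hand for illustration
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark"],
    "Happiness Score": [7.587, 7.561, 7.527],
    "Economy (GDP per Capita)": [1.39651, 1.30232, 1.32548],
    "Family": [1.34951, 1.40223, 1.36058],
    "Health (Life Expectancy)": [0.94143, 0.94784, 0.87464],
    "Freedom": [0.66557, 0.62877, 0.64938],
    "Trust (Government Corruption)": [0.41978, 0.14145, 0.48357],
    "Generosity": [0.29678, 0.43630, 0.34139],
    "Dystopia Residual": [2.51738, 2.70201, 2.49204],
})

# Every column except the country name and the overall score
factor_cols = [c for c in sample.columns if c not in ("Country", "Happiness Score")]

# Add up the six factors and the Dystopia Residual for each country
sample["Factor Sum"] = sample[factor_cols].sum(axis=1)
sample["Difference"] = (sample["Factor Sum"] - sample["Happiness Score"]).abs()

print(sample[["Country", "Happiness Score", "Factor Sum", "Difference"]])
```

For these three rows the difference is below 0.001, which suggests the score is approximately the sum of the seven columns. Whether that holds for every country is worth checking on the full data.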

Let's take a look at the data.


---
### Open a data set

Open the data set, an Excel file with only one sheet (so `sheet_name` is not necessary), from here: https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true

Interrogate the data (head, tail, iloc) to get to know what it contains.
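
If `head`, `tail` and `iloc` are new to you, here is a minimal sketch on a tiny stand-in table, so it runs without downloading anything. The `demo` dataframe holds five rows copied by hand from the 2015 data, and the variable names are just examples.

```python
import pandas as pd

# Small stand-in for the happiness data (values copied from the 2015 table)
demo = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Denmark", "Norway", "Canada"],
    "Region": ["Western Europe"] * 4 + ["North America"],
    "Happiness Rank": [1, 2, 3, 4, 5],
    "Happiness Score": [7.587, 7.561, 7.527, 7.522, 7.427],
})

first_two = demo.head(2)        # first 2 rows
last_two = demo.tail(2)         # last 2 rows
middle = demo.iloc[1:4]         # rows at positions 1, 2 and 3 (the end is excluded)
one_cell = demo.iloc[0, 0]      # single value at row 0, column 0
corner = demo.iloc[:3, [0, 3]]  # first 3 rows, columns at positions 0 and 3

print(first_two, last_two, middle, one_cell, corner, sep="\n\n")
```

`iloc` always works by position, so `demo.iloc[:3, [0, 3]]` picks columns by their order in the table rather than by heading.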


In [52]:
import pandas as pd
pd.set_option('display.width', 240)

def get_excel_data():
  url = "https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true"
  df = pd.read_excel(url)
  return df

data = get_excel_data()

# Interrogate data
info = data.iloc[:]
shape = data.shape
columns = data.columns

print(f"\nInfo (iloc): \n{info}\n\nShape: \n{shape}\n\nColumns: \n{columns}\n")



Info (iloc): 
         Country                           Region  Happiness Rank  Happiness Score  Standard Error  Economy (GDP per Capita)   Family  Health (Life Expectancy)  Freedom  Trust (Government Corruption)  Generosity  Dystopia Residual
0    Switzerland                   Western Europe               1            7.587         0.03411                   1.39651  1.34951                   0.94143  0.66557                        0.41978     0.29678            2.51738
1        Iceland                   Western Europe               2            7.561         0.04884                   1.30232  1.40223                   0.94784  0.62877                        0.14145     0.43630            2.70201
2        Denmark                   Western Europe               3            7.527         0.03328                   1.32548  1.36058                   0.87464  0.64938                        0.48357     0.34139            2.49204
3         Norway                   Western Europe            

---
### Sort the data in different ways

The data is currently sorted by rank.  To re-sort the table, run the code below; the column to sort on is named inside the brackets.

Then **try sorting on other columns**. *Note: you must type the column heading in quotes, exactly as it appears in the table (including capitalisation).*  To sort on multiple columns, enter a list of column headings in the brackets (e.g. `.sort_values(['Region','Freedom'])`).
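
To see how a multi-column sort behaves, here is a small sketch on a hand-made table (five rows copied from the 2015 data; the `demo` and `by_region_freedom` names are illustrative). It sorts regions alphabetically and, within each region, puts the highest Freedom value first by passing one `ascending` flag per column.

```python
import pandas as pd

demo = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Canada", "Rwanda", "Benin"],
    "Region": ["Western Europe", "Western Europe", "North America",
               "Sub-Saharan Africa", "Sub-Saharan Africa"],
    "Freedom": [0.66557, 0.62877, 0.63297, 0.59201, 0.48450],
})

# Region A to Z, then highest Freedom first within each region
by_region_freedom = demo.sort_values(["Region", "Freedom"], ascending=[True, False])

print(by_region_freedom)
```

`sort_values` returns a new dataframe and leaves `demo` in its original order. The second column only matters where the first column has ties, which is why sorting on a unique column such as Happiness Rank first makes any later columns irrelevant.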



In [53]:
sorted_table = data.sort_values(
    ["Happiness Rank", "Economy (GDP per Capita)", "Generosity", "Freedom"],
    ascending=[True, True, True, True],
)
sorted_table  # output the table below

| | Country | Region | Happiness Rank | Happiness Score | Standard Error | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Switzerland | Western Europe | 1 | 7.587 | 0.03411 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 |
| 1 | Iceland | Western Europe | 2 | 7.561 | 0.04884 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 |
| 2 | Denmark | Western Europe | 3 | 7.527 | 0.03328 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 |
| 3 | Norway | Western Europe | 4 | 7.522 | 0.03880 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 |
| 4 | Canada | North America | 5 | 7.427 | 0.03553 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 153 | Rwanda | Sub-Saharan Africa | 154 | 3.465 | 0.03464 | 0.22208 | 0.77370 | 0.42864 | 0.59201 | 0.55191 | 0.22628 | 0.67042 |
| 154 | Benin | Sub-Saharan Africa | 155 | 3.340 | 0.03656 | 0.28665 | 0.35386 | 0.31910 | 0.48450 | 0.08010 | 0.18260 | 1.63328 |
| 155 | Syria | Middle East and Northern Africa | 156 | 3.006 | 0.05015 | 0.66320 | 0.47489 | 0.72193 | 0.15684 | 0.18906 | 0.47179 | 0.32858 |
| 156 | Burundi | Sub-Saharan Africa | 157 | 2.905 | 0.08658 | 0.01530 | 0.41587 | 0.22396 | 0.11850 | 0.10062 | 0.19727 | 1.83302 |


---
### Summarising the data

Look at the happiness dataframe.  Create new dataframes from it, for example by selecting a range of rows or columns, or by calculating summary statistics.

For each dataframe, add a text cell to explain what it is showing.
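
As a starting point, the sketch below builds three small summaries from a hand-made table (five rows copied from the 2015 data): a slice of rows and columns, the `describe()` statistics for one column, and a per-region summary with a calculated range. All variable names, and the added `range` column, are illustrative choices rather than anything required by the exercise.

```python
import pandas as pd

demo = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Canada", "Rwanda", "Benin"],
    "Region": ["Western Europe", "Western Europe", "North America",
               "Sub-Saharan Africa", "Sub-Saharan Africa"],
    "Happiness Score": [7.587, 7.561, 7.427, 3.465, 3.340],
})

# A range of rows (positions 0 to 2) and a choice of columns
top_three = demo.iloc[:3][["Country", "Happiness Score"]]

# Count, mean, standard deviation, min, quartiles and max for one column
score_stats = demo["Happiness Score"].describe()

# One row per region with the lowest, highest and mean score
by_region = demo.groupby("Region")["Happiness Score"].agg(["min", "max", "mean"])
by_region["range"] = by_region["max"] - by_region["min"]

print(top_three, score_stats, by_region, sep="\n\n")
```

Each result is itself a dataframe or series, so it can be sorted, sliced or summarised again in the same way.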

In [12]:
import pandas as pd
pd.set_option('display.width', 240)

def get_excel_data():
  url = "https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/2015.xlsx?raw=true"
  df = pd.read_excel(url)
  return df

data = get_excel_data()

# Interrogate data
def investigate_data(data):
  # 1. Create a new dataframe from selected columns, sorted ascending (Country first)
  cols = ["Country", "Region", "Happiness Rank", "Happiness Score", "Economy (GDP per Capita)", "Freedom", "Trust (Government Corruption)", "Generosity"]
  hap_col_select_sort = data[cols].sort_values(cols)

  # 2. Mean Happiness Score per region, happiest region first
  hap_by_reg = hap_col_select_sort.groupby(["Region"])[["Happiness Score"]].mean().sort_values("Happiness Score", ascending=False)

  # 3. Lowest and highest Happiness Score per region
  #    (the string names give the same result as the builtins min/max
  #    and avoid the FutureWarning that recent pandas versions emit for them)
  hap_min_max_by_reg = hap_col_select_sort.groupby(["Region"])[["Happiness Score"]].agg(["min", "max"])

  # 4. Range (max - min) per region, smallest range first
  hap_range_by_reg = hap_min_max_by_reg[("Happiness Score", "max")] - hap_min_max_by_reg[("Happiness Score", "min")]
  hap_range_by_reg_sort = hap_range_by_reg.sort_values(ascending=True)

  # 5. Select the rows for the two European regions
  europe_reg = ["Central and Eastern Europe", "Western Europe"]
  europe_rows = hap_col_select_sort[hap_col_select_sort["Region"].isin(europe_reg)]

  # 6. Europe: Happiness Score indexed by Region & Country
  #    (each country has one row, so the mean is simply its score)
  eur_sort_reg = europe_rows.groupby(["Region", "Country"])["Happiness Score"].mean()

  # 7. Define the G7 countries, select their rows, then list their
  #    Happiness Scores by Region & Country, highest first
  g7_sel = ["Canada", "France", "Germany", "Italy", "Japan", "United Kingdom", "United States"]
  g7_rows = hap_col_select_sort[hap_col_select_sort["Country"].isin(g7_sel)]
  g7_grouped = g7_rows.groupby(["Region", "Country"])["Happiness Score"].mean()
  g7_sorted = g7_grouped.sort_values(ascending=False)

  return {
    "1. New dataframe with selected columns (sorted ascending)": hap_col_select_sort,
    "2. Happiest Region (mean)": hap_by_reg,
    "3. Happiest Region with (min/max) scores": hap_min_max_by_reg,
    "4. Show the Happiness Score range per region (range) (sorted ascending)": hap_range_by_reg_sort,
    "5. Select only Europe region, showing (head)": europe_rows.head(),
    "6. Display Europe region, by Country & Happiness Score": eur_sort_reg,
    "7. Display G7 countries, by Country & Happiness Score": g7_sorted,
  }

statistics = investigate_data(data)

print("\nWorld Happiness Investigation\n")
for key, value in statistics.items():
  print(f"{key}:\n{value}\n")



World Happiness Investigation

1. New dataframe with selected columns (sorted ascending):
         Country                           Region  Happiness Rank  Happiness Score  Economy (GDP per Capita)  Freedom  Trust (Government Corruption)  Generosity
152  Afghanistan                    Southern Asia             153            3.575                   0.31982  0.23414                        0.09719     0.36510
94       Albania       Central and Eastern Europe              95            4.959                   0.87867  0.35733                        0.06413     0.14272
67       Algeria  Middle East and Northern Africa              68            5.605                   0.93929  0.28579                        0.17383     0.07822
136       Angola               Sub-Saharan Africa             137            4.033                   0.75778  0.10384                        0.07122     0.12344
29     Argentina      Latin America and Caribbean              30            6.574                   1.0

---
### Next steps

Data sets are available for the years 2015 to 2019.  To try another year, change `2015` to the required year in the URL in the first code cell and leave the rest exactly as it is.  

Other years may use different column headings, so code that refers to columns by name may need adjusting, and there will be different data to explore.
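
One possible way to make switching years less error-prone is sketched below. It assumes, as described above, that each year's file sits at the same URL with only the year changed. The `year_url` and `missing_columns` helpers and the `year` parameter on `get_excel_data` are hypothetical additions, not part of the original notebook.

```python
import pandas as pd

# Same URL as the first code cell, with the year left as a placeholder
BASE_URL = "https://github.com/futureCodersSE/working-with-data/blob/main/Happiness-Data/{year}.xlsx?raw=true"

def year_url(year):
    """Build the download URL for one year (2015 to 2019)."""
    if year not in range(2015, 2020):
        raise ValueError(f"No data set for {year}; expected 2015 to 2019")
    return BASE_URL.format(year=year)

def get_excel_data(year=2015):
    """Download one year's data set (needs network access)."""
    return pd.read_excel(year_url(year))

def missing_columns(df, expected):
    """Return the expected column headings that df does not have."""
    return [col for col in expected if col not in df.columns]
```

For example, `missing_columns(get_excel_data(2017), ["Country", "Region", "Happiness Score"])` would list any of those headings that the 2017 file does not use, which shows what needs renaming before the earlier cells will run on that year.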