### SI 370 - Homework #3: Applied Statistics

## Background

This homework assignment focuses on data from the [World Happiness Report](https://www.kaggle.com/unsdsn/world-happiness).

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth (the 2016 Update) in 2016. The World Happiness Report 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating the International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations, and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields (economics, psychology, survey analysis, national statistics, health, public policy, and more) describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

We will be using data from the 2016 report.

Your main task in this assignment is to explore the data *using the data manipulation, analysis, visualization, and applied statistical methods we covered in class*, as well as those in the assigned readings.

**You should also feel free to ask questions on the [class Slack channel](https://si370fa2018.slack.com/messages/CCQLTNS65/team/UCFLDB049/)!**

A total of 30 points is available in this homework assignment.

Questions 1-6 are worth 5 points each.  Points will be allocated according to the following rubric:

- 5 points: Question is correctly and completely answered.  Answer consists of well-written code that conforms to [PEP 8](https://www.python.org/dev/peps/pep-0008/) guidelines and is accompanied by a written interpretation in a Markdown block.  Written interpretation does not contain spelling, grammar, or stylistic errors (see https://faculty.washington.edu/heagerty/Courses/b572/public/StrunkWhite.pdf for detailed specifications).
- 4 points: Answer is mostly complete and correct; two or fewer noticeable omissions or errors.  Minor stylistic flaws, either in code or in written interpretation.
- 3 points: Answer has significant omissions or errors; noticeable departure from PEP 8 guidelines and/or moderate spelling, grammar, or style issues in written interpretations.
- 2 points: Question is perfunctorily attempted.  Substantial parts are missing or incorrect.
- 0 points: Question not attempted.

A bonus question worth up to 5 points is also available.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

```python
happiness = pd.read_csv('data/world-happiness-2016.csv')
```

```python
happiness.shape
```

```
(157, 13)
```

```python
happiness.head(20)
```

|   | Country | Region | Happiness Rank | Happiness Score | Lower Confidence Interval | Upper Confidence Interval | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Denmark | Western Europe | 1 | 7.526 | 7.46 | 7.592 | 1.44178 | 1.16374 | 0.79504 | 0.57941 | 0.44453 | 0.36171 | 2.73939 |
| 1 | Switzerland | Western Europe | 2 | 7.509 | 7.428 | 7.59 | 1.52733 | 1.14524 | 0.86303 | 0.58557 | 0.41203 | 0.28083 | 2.69463 |
| 2 | Iceland | Western Europe | 3 | 7.501 | 7.333 | 7.669 | 1.42666 | 1.18326 | 0.86733 | 0.56624 | 0.14975 | 0.47678 | 2.83137 |
| 3 | Norway | Western Europe | 4 | 7.498 | 7.421 | 7.575 | 1.57744 | 1.1269 | 0.79579 | 0.59609 | 0.35776 | 0.37895 | 2.66465 |
| 4 | Finland | Western Europe | 5 | 7.413 | 7.351 | 7.475 | 1.40598 | 1.13464 | 0.81091 | 0.57104 | 0.41004 | 0.25492 | 2.82596 |
| 5 | Canada | North America | 6 | 7.404 | 7.335 | 7.473 | 1.44015 | 1.0961 | 0.8276 | 0.5737 | 0.31329 | 0.44834 | 2.70485 |
| 6 | Netherlands | Western Europe | 7 | 7.339 | 7.284 | 7.394 | 1.46468 | 1.02912 | 0.81231 | 0.55211 | 0.29927 | 0.47416 | 2.70749 |
| 7 | New Zealand | Australia and New Zealand | 8 | 7.334 | 7.264 | 7.404 | 1.36066 | 1.17278 | 0.83096 | 0.58147 | 0.41904 | 0.49401 | 2.47553 |
| 8 | Australia | Australia and New Zealand | 9 | 7.313 | 7.241 | 7.385 | 1.44443 | 1.10476 | 0.8512 | 0.56837 | 0.32331 | 0.47407 | 2.5465 |
| 9 | Sweden | Western Europe | 10 | 7.291 | 7.227 | 7.355 | 1.45181 | 1.08764 | 0.83121 | 0.58218 | 0.40867 | 0.38254 | 2.54734 |


## Answer the questions below. 
For each question, you should:
1. Write code that helps you answer the question, and
2. Explain your answers in plain English. You should use complete sentences that would be understood by an educated professional who is not necessarily a data scientist (like a product manager).

### <font color="magenta"> Q1: What are the top 5 correlation coefficients among "Happiness Score", Economy, Family, Health, Freedom, Trust, and Generosity? Provide a visualization as well as a written statement of your findings. (5 points) </font>
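
A correlation matrix lists every pair twice and has 1s on its diagonal, so a "top 5" list should be read from one triangle of the matrix only. The sketch below shows one way to do that with pandas and NumPy. `top_correlations` is a hypothetical helper, and it ranks pairs by absolute value, which is one reasonable reading of "top". The demonstration uses synthetic columns, not the happiness data.

```python
import numpy as np
import pandas as pd


def top_correlations(df, k=5):
    """Return the k column pairs with the largest absolute Pearson correlation.

    Hypothetical helper. Each unordered pair appears once, and each column's
    correlation with itself (always 1) is left out.
    """
    corr = df.corr()
    cols = corr.columns
    # Indices of the strict upper triangle: every pair (i, j) with i < j.
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.Series(corr.to_numpy()[i, j],
                      index=pd.MultiIndex.from_arrays([cols[i], cols[j]]))
    # Rank by magnitude so strong negative correlations are not missed.
    return pairs.sort_values(key=np.abs, ascending=False).head(k)


# Synthetic data for illustration only; not the World Happiness values.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.1, size=200),
    "c": -base + rng.normal(scale=0.5, size=200),
    "d": rng.normal(size=200),
})
print(top_correlations(toy, k=3))
```

In the notebook, the call might look like `top_correlations(happiness[["Happiness Score", "Economy (GDP per Capita)", "Family", "Health (Life Expectancy)", "Freedom", "Trust (Government Corruption)", "Generosity"]])`. For the visualization, `sns.heatmap(corr, annot=True)` draws a full correlation matrix `corr` with each coefficient printed in its cell.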

```python
# put your code here
```

(Use this space to explain your answers)

### <font color='magenta'> Q2: Describe, using plots, the relationships between the following variables: (5 points) </font>
1. Happiness vs. Family
2. Happiness vs. Economy
3. Happiness vs. Health
4. Happiness vs. Freedom
5. Happiness vs. Trust
6. Happiness vs. Generosity

You have, at this point, a wide variety of visualizations that you know how to generate.  Choose wisely!
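
With six relationships to compare, one scatter plot per predictor in a shared grid makes their shapes easy to compare side by side. The sketch below shows one possible layout using matplotlib only. `scatter_grid` is a hypothetical helper, and the demonstration runs on synthetic data rather than the happiness file. Each panel adds a least-squares line and Pearson's r as a reading aid. A straight line can hide curvature, so the raw points still matter most.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_grid(df, target, predictors, ncols=3):
    """Scatter `target` against each predictor, one panel per predictor.

    Each panel also shows an ordinary least-squares line and Pearson's r.
    """
    nrows = -(-len(predictors) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3.5 * nrows),
                             sharey=True, squeeze=False)
    flat = axes.ravel()
    for ax, col in zip(flat, predictors):
        x, y = df[col], df[target]
        ax.scatter(x, y, s=12, alpha=0.6)
        slope, intercept = np.polyfit(x, y, 1)
        xs = np.linspace(x.min(), x.max(), 50)
        ax.plot(xs, intercept + slope * xs, color="red")
        ax.set_xlabel(col)
        ax.set_title(f"r = {x.corr(y):.2f}")
    for ax in flat[len(predictors):]:
        ax.set_visible(False)  # hide panels left over when the grid is not full
    for ax in axes[:, 0]:
        ax.set_ylabel(target)
    fig.tight_layout()
    return fig


# Synthetic data for illustration only; not the World Happiness values.
rng = np.random.default_rng(1)
predictors = [f"x{i}" for i in range(1, 6)]
toy = pd.DataFrame(rng.normal(size=(100, 5)), columns=predictors)
toy["score"] = 0.8 * toy["x1"] + rng.normal(scale=0.5, size=100)
fig = scatter_grid(toy, "score", predictors)
```

With the real data, the call might be `scatter_grid(happiness, "Happiness Score", ["Family", "Economy (GDP per Capita)", "Health (Life Expectancy)", "Freedom", "Trust (Government Corruption)", "Generosity"])`. seaborn's `sns.regplot` or `sns.pairplot` are alternatives if you prefer them.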

```python
# put your code here
```

(Use this space to explain your answers)

### <font color="magenta"> Q3: Does there appear to be an association between region and happiness quartile? (5 points) </font>

Create a new variable that represents the quartile each country's happiness score falls in.  For example, the first 39 or 40 countries in the happiness ranking (the happiest ones) are in the 4th happiness quartile; it's up to you to decide exactly how to divide the countries into happiness quartiles.

(Hint: contingency tables, mosaic plots, and chi-square tests may be useful here.)
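
The quartile variable and the chi-square test of independence can be sketched as follows, assuming pandas and NumPy. Ranking the scores with `method="first"` before calling `pd.qcut` breaks ties, so with 157 countries the groups come out at 39 or 40 each. For illustration, the Pearson chi-square statistic is computed directly from its definition. For tables larger than 2×2, `scipy.stats.chi2_contingency` returns the same statistic together with a p-value. `add_quartile` and `chi_square_statistic` are hypothetical helpers, and the data are synthetic.

```python
import numpy as np
import pandas as pd


def add_quartile(df, score_col, new_col="quartile"):
    """Return a copy of df with a 1-4 quartile label (4 = highest scores).

    Ranking with method="first" breaks ties, so the four groups are as close
    to equal in size as the number of rows allows.
    """
    out = df.copy()
    ranks = out[score_col].rank(method="first")
    out[new_col] = pd.qcut(ranks, 4, labels=[1, 2, 3, 4])
    return out


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    observed = table.to_numpy(dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Expected counts if the row and column variables were independent.
    expected = row_totals @ col_totals / observed.sum()
    stat = ((observed - expected) ** 2 / expected).sum()
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return stat, dof


# Synthetic data for illustration only: 40 "North" rows score higher
# than 40 "South" rows.
toy = pd.DataFrame({
    "region": ["North"] * 40 + ["South"] * 40,
    "score": np.concatenate([np.linspace(5, 8, 40), np.linspace(3, 6, 40)]),
})
toy = add_quartile(toy, "score")
table = pd.crosstab(toy["region"], toy["quartile"])
stat, dof = chi_square_statistic(table)
print(table)
print(f"chi-square = {stat:.1f} with {dof} degrees of freedom")
```

Some regions in the real data contain only a handful of countries, so several expected counts will be small. The usual rule of thumb for the chi-square approximation asks for expected counts of at least 5, so keep this in mind when interpreting the p-value. For the mosaic plot, `statsmodels.graphics.mosaicplot.mosaic` is one option.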

```python
# put your code here
```

(Use this space to explain your answers)

### <font color="magenta">Q4: Use linear regression to model the relationship between Happiness Score and Family.  (5 points)</font>

What does this tell you about the relationship?

You may wish to include a visualization.
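
A simple linear regression fits Happiness Score ≈ intercept + slope × Family by least squares. The slope is the expected change in Happiness Score per one-unit increase in Family. R² is the share of the variance in Happiness Score that the line explains. The sketch below uses `np.polyfit` and checks itself on synthetic points generated from y = 2 + 3x. `fit_line` is a hypothetical helper. `scipy.stats.linregress` or statsmodels' OLS would also report a p-value and standard error for the slope.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    ss_res = np.sum((y - fitted) ** 2)       # variation the line leaves unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic check: points generated from y = 2 + 3x plus small noise.
rng = np.random.default_rng(42)
x = rng.uniform(0, 1.5, size=100)
y = 2 + 3 * x + rng.normal(scale=0.1, size=100)
slope, intercept, r2 = fit_line(x, y)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.3f}")
```

In the notebook, the call would be `fit_line(happiness["Family"], happiness["Happiness Score"])`.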

```python
# put your code here
```

(Use this space to explain your answers)

### <font color="magenta">Q5: Do happiness scores vary significantly between regions?  Which region has the highest mean happiness score? (5 points)</font>

You may wish to include a visualization.
(Hint: ANOVA might help here.)
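
One-way ANOVA compares the spread of the group means (between-group variation) to the spread of values inside each group (within-group variation). The F statistic is the ratio of those two mean squares, with k − 1 and n − k degrees of freedom. The sketch below computes F from that definition with pandas. `scipy.stats.f_oneway`, applied to one array per region, returns the same F plus a p-value. `one_way_anova` is a hypothetical helper, and the data are synthetic.

```python
import pandas as pd


def one_way_anova(df, group_col, value_col):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    groups = df.groupby(group_col)[value_col]
    grand_mean = df[value_col].mean()
    n, k = len(df), groups.ngroups
    # Between groups: distance of each group mean from the grand mean, weighted by size.
    ss_between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    # Within groups: distance of each value from its own group's mean.
    ss_within = ((df[value_col] - groups.transform("mean")) ** 2).sum()
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, k - 1, n - k


# Synthetic data for illustration only; not the World Happiness values.
toy = pd.DataFrame({
    "region": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "score": [5, 6, 5, 6, 7, 8, 7, 8, 4, 5, 4, 5],
})
f_stat, df_between, df_within = one_way_anova(toy, "region", "score")
means = toy.groupby("region")["score"].mean()
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")  # F(2, 9) = 28.00
print("highest mean:", means.idxmax())  # B
```

A significant F indicates only that at least one region's mean differs from the others. It does not say which regions differ; a post-hoc comparison such as Tukey's HSD (`statsmodels.stats.multicomp.pairwise_tukeyhsd`) addresses that question.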

```python
# put your code here
```

(Use this space to explain your answers)

### <font color="magenta">Q6: Which Eastern Asian country has the lowest happiness score?  Comment on the influence that this country has on the relationship between Happiness and Economy. (5 points)</font>

Hint: you may want to look at the residuals of a regression between Happiness and Economy.
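
A sketch of both steps follows, assuming pandas and NumPy. First, find the lowest-scoring row within one group. Then fit the full regression, read off that row's residual, and refit without the row to see how far the slope moves. This leave-one-out slope change is a simple influence measure chosen for illustration. It is not a formal diagnostic; statsmodels' `get_influence()` offers ones such as Cook's distance. Both helpers are hypothetical, and the data are synthetic.

```python
import numpy as np
import pandas as pd


def lowest_in_group(df, group_col, group, value_col, label_col):
    """Label of the row with the smallest value_col among rows in one group."""
    sub = df[df[group_col] == group]
    return sub.loc[sub[value_col].idxmin(), label_col]


def residual_and_slope_change(df, x_col, y_col, label_col, label):
    """Residual of one labelled row, and how much the slope moves if it is dropped.

    Returns (residual, slope_with_row - slope_without_row): a simple
    leave-one-out influence measure, not a formal diagnostic.
    """
    slope, intercept = np.polyfit(df[x_col], df[y_col], 1)
    residuals = df[y_col] - (intercept + slope * df[x_col])
    is_target = df[label_col] == label
    rest = df[~is_target]
    slope_without, _ = np.polyfit(rest[x_col], rest[y_col], 1)
    return residuals[is_target].iloc[0], slope - slope_without


# Synthetic data for illustration only: y = x, except row "j" sits far below the line.
x = np.arange(1, 11, dtype=float)
y = x.copy()
y[9] = 2.0
toy = pd.DataFrame({
    "label": list("abcdefghij"),
    "group": ["G1"] * 5 + ["G2"] * 5,
    "x": x,
    "y": y,
})
lowest = lowest_in_group(toy, "group", "G2", "y", "label")
residual, slope_change = residual_and_slope_change(toy, "x", "y", "label", lowest)
print(lowest, round(residual, 2), round(slope_change, 2))
```

With the real data, and assuming the region is labelled "Eastern Asia" in the `Region` column, the first call would be `lowest_in_group(happiness, "Region", "Eastern Asia", "Happiness Score", "Country")`. Its result would then be passed to `residual_and_slope_change` with `"Economy (GDP per Capita)"` as `x_col` and `"Happiness Score"` as `y_col`.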

```python
# put your code here
```

(Use this space to explain your answers)

## <font color='green'> Please submit your completed notebook in .HTML format via Canvas </font>