<img src="https://allthatsinteresting.com/wordpress/wp-content/uploads/2016/03/giphy-4.gif" width="700px">



### **Introduction**: 

* In this project, you'll build a model to determine which factors in day-to-day life matter most for a person's happiness. We've provided some of the code and left most of it for you to write. After you've submitted this project, feel free to explore the data and the model further.
* The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness based on respondents' ratings of their own lives, which the report also correlates with various life factors.

* The rankings of national happiness are based on a Cantril ladder survey. Nationally representative samples of respondents are asked to think of a ladder, with the best possible life for them being a 10, and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale. The report correlates the results with various life factors.

* In the reports, experts in fields including economics, psychology, survey analysis, and national statistics, describe how measurements of well-being can be used effectively to assess the progress of nations, and other topics. Each report is organized by chapters that delve deeper into issues relating to happiness, including mental illness, the objective benefits of happiness, the importance of ethics, policy implications, and links with the Organisation for Economic Co-operation and Development's (OECD) approach to measuring subjective well-being and other international and national efforts.






#### -------------------------------------------------------------------------------------------------------------------------------------------------------------------

## Is it GDP per capita that makes you happy?

<img src="https://i.pinimg.com/originals/35/da/23/35da236b480636ec8ffee367281fe1b1.gif" 
width="700" height="300" />

## Is it the perception of government corruption that makes you sad?


<img src="https://media.tenor.com/images/50c6b91a0384dcc0c715abe9326789cd/tenor.gif" 
width="700" height="400" />



## Is it freedom to make life choices that makes you happy?

<img src="https://media0.giphy.com/media/OmAdpbVnAAWJO/giphy.gif"
width="700" height="400" />



## Let us explore the factors of happiness.


<img src="https://media1.giphy.com/media/1rKFURpStAa8VOiBLg/giphy.gif" 
width="700" height="400" />



<a id='description'></a>


# Description 

In this project you are going to explore and explain the <span style="color: blue;">**relationship**</span> between the <span style="color: blue;">**happiness score**</span> and other variables such as <span style="color: green;">**GDP per Capita**</span>, <span style="color: green;">**Life Expectancy**</span>, <span style="color: green;">**Freedom**</span> etc.

### Dataset

The dataset you are going to use is the World Happiness Report data.

#### Context

The World Happiness Report is a landmark survey of the state of global happiness. The first report was published in 2012, the second in 2013, the third in 2015, and the fourth in the 2016 Update. The World Happiness 2017, which ranks 155 countries by their happiness levels, was released at the United Nations at an event celebrating International Day of Happiness on March 20th. The report continues to gain global recognition as governments, organizations and civil society increasingly use happiness indicators to inform their policy-making decisions. Leading experts across fields – economics, psychology, survey analysis, national statistics, health, public policy and more – describe how measurements of well-being can be used effectively to assess the progress of nations. The reports review the state of happiness in the world today and show how the new science of happiness explains personal and national variations in happiness.

#### Content

The happiness scores and rankings use data from the Gallup World Poll. The scores are based on answers to the main life evaluation question asked in the poll. This question, known as the Cantril ladder, asks respondents to think of a ladder with the best possible life for them being a 10 and the worst possible life being a 0 and to rate their own current lives on that scale. The scores are from nationally representative samples for the years 2013-2016 and use the Gallup weights to make the estimates representative. The columns following the happiness score estimate the extent to which each of six factors – economic production, social support, life expectancy, freedom, absence of corruption, and generosity – contribute to making life evaluations higher in each country than they are in Dystopia, a hypothetical country that has values equal to the world’s lowest national averages for each of the six factors. They have no impact on the total score reported for each country, but they do explain why some countries rank higher than others.

#### Inspiration

What countries or regions rank the highest in overall happiness and each of the six factors contributing to happiness? How did country ranks or scores change between the 2015 and 2016 as well as the 2016 and 2017 reports? Did any country experience a significant increase or decrease in happiness?

#### What is Dystopia?

Dystopia is an imaginary country that has the world’s least-happy people. The purpose in establishing Dystopia is to have a benchmark against which all countries can be favorably compared (no country performs more poorly than Dystopia) in terms of each of the six key variables, thus allowing each sub-bar to be of positive width. The lowest scores observed for the six key variables, therefore, characterize Dystopia. Since life would be very unpleasant in a country with the world’s lowest incomes, lowest life expectancy, lowest generosity, most corruption, least freedom and least social support, it is referred to as “Dystopia,” in contrast to Utopia.


#### What do the columns succeeding the Happiness Score (like Family, Generosity, etc.) describe?

The following columns: GDP per Capita, Family, Life Expectancy, Freedom, Generosity, and Trust Government Corruption describe the extent to which each factor contributes to the happiness evaluation in each country. The Dystopia Residual metric is the Dystopia Happiness Score (1.85) plus the residual, or unexplained, value for each country, as stated in the previous answer.

Because adding all of these columns together gives the happiness score, a model that uses them to predict the Happiness Score may be unreliable.
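The additive relationship described above can be checked directly. The sketch below is a minimal illustration, not the report's own method: `score_gap` is a hypothetical helper, and the toy column names and values are made up for the example rather than taken from the dataset.

```python
import pandas as pd


def score_gap(df, factor_cols, score_col):
    """Return, per row, the score minus the sum of its factor columns."""
    return df[score_col] - df[factor_cols].sum(axis=1)


# Made-up values for illustration only (not real report data).
toy = pd.DataFrame({
    "score": [7.0, 5.5],
    "gdp": [1.5, 1.0],
    "family": [1.5, 1.0],
    "dystopia_residual": [4.0, 3.5],
})

gap = score_gap(toy, ["gdp", "family", "dystopia_residual"], "score")

# A gap of (nearly) zero in every row means the target is just the sum of
# the inputs, so a model fitted on them only re-learns that sum.
print(gap.abs().max())
```

On the real data you would pass the factor columns plus the Dystopia Residual column; a small non-zero gap from rounding is to be expected.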



## Analyzing Happiness around the Globe.

### Let's Start


## Importing some useful libraries



In [None]:
# for some basic operations
import numpy as np 
import pandas as pd

# for visualizations
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')


# for interactive visualizations
import plotly.offline as py
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
init_notebook_mode(connected = True)
from bubbly.bubbly import bubbleplot

# for model interpretations
import eli5
from eli5.sklearn import PermutationImportance
from lightgbm import LGBMRegressor


## Reading the dataset

In this project you are going to use data for 2015, 2016 and 2017 only, and you will use the three years interchangeably to avoid repeating some parts. After working through this notebook, you can explore the data for 2018 and 2019 yourself.



In [None]:
# set the path to each year's CSV file

data_path_2015 = ''
data_path_2016 = ''
data_path_2017 = ''

# pass the path variables themselves, not their names as string literals
data_2015 = pd.read_csv(data_path_2015)
data_2016 = pd.read_csv(data_path_2016)
data_2017 = pd.read_csv(data_path_2017)


## Understanding the data

A critical step in working with machine learning models is understanding and preparing the data correctly. Variables on different scales can make it difficult for a model to learn efficiently. Below, we preview the first rows and the summary statistics of each year's data.



In [None]:
data_2015.head()

In [None]:
data_2016.head()

In [None]:
data_2017.head()

In [None]:
data_2015.describe()

In [None]:
data_2016.describe()

In [None]:
data_2017.describe()

Now that you have seen that the data does not change considerably from year to year, you will use all three years of data interchangeably throughout this notebook.
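One practical wrinkle when swapping years: later cells refer to the same factor as `Economy (GDP per Capita)` in the 2015 data and as `Economy..GDP.per.Capita.` in the 2017 data. The sketch below shows one possible way to reconcile such names. `normalize_column` and `normalize_columns` are hypothetical helpers, and the two tiny frames are made-up stand-ins for the real files.

```python
import re

import pandas as pd


def normalize_column(name):
    """Collapse every run of non-alphanumeric characters to '_' and lowercase."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()


def normalize_columns(df):
    """Return a copy of df whose column names are normalized."""
    return df.rename(columns=normalize_column)


# Toy frames that mimic the two naming styles (values are made up).
older_style = pd.DataFrame({"Country": ["A"], "Economy (GDP per Capita)": [1.0]})
newer_style = pd.DataFrame({"Country": ["A"], "Economy..GDP.per.Capita.": [1.0]})

print(list(normalize_columns(older_style).columns))
print(list(normalize_columns(newer_style).columns))
```

After normalization both frames expose the same column names, so the same plotting code can be pointed at any year.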



## Exploratory Data Analysis

### 1. Bi-variate data analysis

* Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the relationship between two variables: whether an association exists and how strong it is, or whether the two variables differ and how significant the difference is. There are three types of bivariate analysis.

>Numerical & Numerical- It is performed when the variables to be analyzed are both numerical.

>Categorical & Categorical- It is performed when the variables to be analyzed are both categorical.

>Numerical & Categorical- It is performed when one of the variables to be analyzed is numerical and other is categorical.
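As a rough illustration of the three cases above, the sketch below applies one common summary to each pairing: a Pearson correlation, a contingency table, and a per-group mean. The toy frame and its column names are made up for the example and are not part of the happiness dataset.

```python
import pandas as pd

# Made-up toy data for illustration only.
toy = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "landlocked": ["yes", "no", "no", "no"],
    "gdp": [1.0, 2.0, 3.0, 4.0],
    "score": [2.0, 4.0, 6.0, 8.0],
})

# Numerical & numerical: Pearson correlation between two numeric columns.
num_num = toy["gdp"].corr(toy["score"])

# Categorical & categorical: contingency table of co-occurrence counts.
cat_cat = pd.crosstab(toy["region"], toy["landlocked"])

# Numerical & categorical: summarize the numeric column within each category.
num_cat = toy.groupby("region")["score"].mean()

print(num_num)
print(cat_cat)
print(num_cat)
```

A violin plot is a graphical version of the third case: it shows the whole distribution of the numeric variable within each category rather than just the mean.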

Now let's make a violin plot of "Happiness Score" against "Region" to see how people's happiness varies across different parts of the world.



In [None]:
# Make a violin plot for any year data
# happiness score vs continents

## Start code



## end code


### Multi-Variate Analysis

* Multivariate analysis (MVA) is based on the principles of multivariate statistics, which involves observation and analysis of more than one statistical outcome variable at a time. Typically, MVA is used to address the situations where multiple measurements are made on each experimental unit and the relations among these measurements and their structures are important.

* Essentially, multivariate analysis is a tool to find patterns and relationships between several variables simultaneously. It lets us predict the effect a change in one variable will have on other variables. This gives multivariate analysis a decisive advantage over other forms of analysis.


### Correlations Between the Data

**What is a heatmap?**
* A heatmap is a two-dimensional graphical representation of data in which the individual values contained in a matrix are represented as colors. The seaborn Python package can create annotated heatmaps, which can be tweaked with Matplotlib tools as required.

* The image below shows a heatmap.

<img src="https://d1rwhvwstyk9gu.cloudfront.net/2017/07/seaburn-2.png"
     width="800px">

Let's find the most strongly correlated features of this dataset by making a heatmap with the Seaborn library.



In [None]:
# Make a heatmap for any year data

## start data



## end data

> In the heatmap above you can see that Happiness Score is very highly correlated with Economy, Health, and Family Satisfaction, somewhat correlated with Freedom, and only weakly correlated with Trust in Government on average.
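One way to check an observation like this numerically is to rank the numeric columns by the absolute value of their correlation with the target. This is a minimal sketch: `rank_correlations` is a hypothetical helper and the toy values are made up, so it only shows the mechanics, not the real ranking.

```python
import pandas as pd


def rank_correlations(df, target):
    """Rank numeric columns by absolute Pearson correlation with `target`."""
    corr = df.select_dtypes(include="number").corr()
    return corr[target].drop(target).sort_values(key=abs, ascending=False)


# Made-up toy data: 'economy' tracks 'score' closely, 'trust' only loosely.
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E"],
    "score": [1.0, 2.0, 3.0, 4.0, 5.0],
    "economy": [1.0, 2.0, 3.0, 4.0, 6.0],
    "trust": [3.0, 1.0, 4.0, 1.0, 5.0],
})

ranked = rank_correlations(toy, "score")
print(ranked)
```

Sorting by absolute value keeps strong negative correlations near the top, which matters for the region-wise heatmaps where some factors correlate negatively.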



Let's analyze these heatmaps region by region to get better insight.
The code for one of the heatmaps is provided below; the others are left for you to write.

## 1. Correlations for Western Europe



In [None]:
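plt.rcParams['figure.figsize'] = (20, 15)

# keep only the rows for Western Europe
d = data_2016.loc[data_2016['Region'] == 'Western Europe']
# correlate numeric columns only; text columns such as Country cannot be correlated
sns.heatmap(d.select_dtypes(include = 'number').corr(), cmap = 'Wistia', annot = True)

plt.show()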



## 2. Correlations for Eastern Asia



In [None]:
# Make a heatmap for 2016 year data for Eastern Asia region only

## start data



## end data



> You may notice that the picture changes here: the correlation with happiness is negative for many important factors such as Economy, Health, and Trust in Government. Happiness has positive correlations only with Freedom, Generosity and Family Satisfaction.



## 3. Correlations for North America



In [None]:
# Make a heatmap for 2016 year data for Northern America region only

## start data



## end data

## 4. Middle East and Northern Africa



In [None]:
# Make a heatmap for 2016 year data for Middle East and Northern Africa region only

## start data



## end data


## 5. Sub-Saharan Africa



In [None]:

# Make a heatmap for 2016 year data for Sub-Saharan Africa region only

## start data



## end data




## Bubble Charts

 * A bubble chart (aka bubble plot) is an extension of the scatter plot used to look at relationships between three numeric variables. Each dot in a bubble chart corresponds with a single data point, and the variables' values for each point are indicated by horizontal position, vertical position, and dot size.
 
 * Like the scatter plot, a bubble chart is primarily used to depict and show relationships between numeric variables. However, the addition of marker size as a dimension allows for the comparison between three variables rather than just two.
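To make the "third variable as marker size" idea concrete, here is a small sketch that uses plain Matplotlib as a stand-in for the interactive bubble charts used in this notebook. `bubble_areas` is a hypothetical helper and the three data points are made up.

```python
import matplotlib.pyplot as plt
import numpy as np


def bubble_areas(values, max_area=600.0):
    """Scale values linearly so the largest one gets `max_area` (points^2)."""
    values = np.asarray(values, dtype=float)
    return max_area * values / values.max()


# Made-up points: x and y set the position, size_var sets the bubble size.
x = [1.0, 2.0, 3.0]
y = [2.0, 1.0, 3.0]
size_var = [10.0, 20.0, 40.0]

fig, ax = plt.subplots()
ax.scatter(x, y, s=bubble_areas(size_var), alpha=0.5)
ax.set_xlabel("x variable")
ax.set_ylabel("y variable")
ax.set_title("Bubble size encodes a third variable")
```

Matplotlib's `s` argument is a marker area, so scaling the values linearly keeps bubble area proportional to the third variable.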

Let's make some bubble charts to analyze the relationships among various features of the dataset.
The code for one bubble chart is provided below as an example; the others are left for you to write.



In [None]:
# Happiness vs Generosity vs Economy

import warnings
warnings.filterwarnings('ignore')

figure = bubbleplot(dataset = data_2015, x_column = 'Happiness Score', y_column = 'Generosity', 
    bubble_column = 'Country', size_column = 'Economy (GDP per Capita)', color_column = 'Region', 
    x_title = "Happiness Score", y_title = "Generosity", title = 'Happiness vs Generosity vs Economy',
    x_logscale = False, scale_bubble = 1, height = 650)

# plotly's config key is camelCase: scrollZoom
py.iplot(figure, config={'scrollZoom': True})

In [None]:
# Make a bubble chart using 2015 year data
# Happiness vs Trust vs Economy

## start code





## end code


In [None]:
# Make a bubble chart using 2016 year data
# Happiness vs Health vs Economy

## start code





## end code


In [None]:
# Make a bubble chart using 2015 year data
# Happiness vs Family vs Economy

## start code





## end code


## Bullet Chart

* A bullet graph is a variation of a bar graph developed to replace dashboard gauges and meters. A bullet graph is useful for comparing the performance of a primary measure to one or more other measures.

* Bullet charts came into existence to overcome the drawbacks of gauge charts. We can refer to them as linear gauge charts.

Let's make a bullet chart to represent the range of some of the most important attributes in the data.



In [None]:
import plotly.figure_factory as ff

data = (
  {"label": "Happiness", "sublabel": "score",
   "range": [5, 6, 8], "performance": [5.5, 6.5], "point": [7]},
  {"label": "Economy", "sublabel": "score", "range": [0, 1, 2],
   "performance": [1, 1.5], "point": [1.5]},
  {"label": "Family", "sublabel": "score", "range": [0, 1, 2],
   "performance": [1, 1.5], "point": [1.3]},
  {"label": "Freedom", "sublabel": "score", "range": [0, 0.3, 0.6],
   "performance": [0.3, 0.4], "point": [0.5]},
  {"label": "Trust", "sublabel": "score", "range": [0, 0.2, 0.5],
   "performance": [0.3, 0.4], "point": [0.4]}
)



fig = ff.create_bullet(
    data, titles='label', subtitles='sublabel', markers='point',
    measures='performance', ranges='range', orientation='v',
)
py.iplot(fig, filename='bullet chart from dict')

## Pie Chart



In [None]:
# Make a pie chart that depicts the number of countries in each region

## start code








## end code


## Choropleth Maps

* A choropleth map is a map composed of colored polygons. It is used to represent spatial variations of a quantity. Plotly can build outline choropleth maps, as used here, and also choropleth tile maps through its Mapbox trace types.
* Making choropleth maps requires two main types of input:

1. Geometry information:

>>This can either be a supplied GeoJSON file where each feature has either an id field or some identifying value in properties; or

>>one of the built-in geometries within plotly: US states and world countries 

2. A list of values indexed by feature identifier.

>>The GeoJSON data is passed to the `geojson` argument, and the data is passed into the `color` argument of `px.choropleth` (`z` if using `graph_objects`), in the same order as the IDs are passed into the `locations` argument.

* Note the geojson attribute can also be the URL to a GeoJSON file, which can speed up map rendering in certain cases.

Let's make some choropleth maps to get better insight into the relationship between "Country" and other features.
The code for one choropleth map is provided below as an example; the others are left for you to write.



### Country vs Generosity



In [None]:
trace1 = [go.Choropleth(
               colorscale = 'Earth',
               locationmode = 'country names',
               locations = data_2017['Country'],
               text = data_2017['Country'], 
               z = data_2017['Generosity'],
               )]

layout = dict(title = 'Generosity',
                  geo = dict(
                      showframe = True,
                      showocean = True,
                      showlakes = True,
                      showcoastlines = True,
                      projection = dict(
                          type = 'hammer'
        )))


projections = [ "equirectangular", "mercator", "orthographic", "natural earth","kavrayskiy7", 
               "miller", "robinson", "eckert4", "azimuthal equal area","azimuthal equidistant", 
               "conic equal area", "conic conformal", "conic equidistant", "gnomonic", "stereographic", 
               "mollweide", "hammer", "transverse mercator", "albers usa", "winkel tripel" ]

buttons = [dict(args = ['geo.projection.type', y],
           label = y, method = 'relayout') for y in projections]
annot = list([ dict( x=0.1, y=0.8, text='Projection', yanchor='bottom', 
                    xref='paper', xanchor='right', showarrow=False )])


# Update Layout Object

layout[ 'updatemenus' ] = list([ dict( x=0.1, y=0.8, buttons=buttons, yanchor='top' )])
layout[ 'annotations' ] = annot


fig = go.Figure(data = trace1, layout = layout)
py.iplot(fig)



### Top 10 Most Generous Countries



In [None]:
data_2017[['Country', 'Generosity']].sort_values(by = 'Generosity',
                                                ascending = False).head(10)

### Country vs Trust in Government (Corruption)



In [None]:
# Make a choropleth map depicting Country vs Trust in Government (Corruption) for 2017 year data

## start code 





















## end code

### Top 10 Countries with Trust in Government



In [None]:
data_2017[['Country', 'Trust..Government.Corruption.']].sort_values(by = 'Trust..Government.Corruption.',
                                                                     ascending = False).head(10)

### Country vs Family Satisfaction Index

In [None]:
# Make a choropleth map depicting Country vs Family Satisfaction Index for 2017 year data

## start code 





















## end code

### Top 10 Countries in Family Satisfaction



In [None]:
data_2017[['Country', 'Family']].sort_values(by = 'Family', ascending = False).head(10)


### Country vs Economy (GDP per Capita)



In [None]:
# Make a choropleth map depicting Country vs Economy (GDP per Capita) for 2017 year data

## start code 





















## end code

### Top 10 Countries with Best Economy



In [None]:

data_2017[['Country', 'Economy..GDP.per.Capita.']].sort_values(by = 'Economy..GDP.per.Capita.',
            ascending = False).head(10)


### Country vs Freedom



In [None]:
# Make a choropleth map depicting Country vs Freedom for 2017 year data

## start code 





















## end code

### Top 10 Most Freedom Oriented Countries



In [None]:

data_2017[['Country', 'Freedom']].sort_values(by = 'Freedom', ascending = False).head(10)


## Model Building

* For this project your main task is to determine which factors are most important for people's happiness.

* So you will not build a predictive model as such; instead, you will fit an LGBMRegressor and apply permutation importance to it to determine the most important factors for happiness.
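To show what permutation importance measures, here is a simplified NumPy-only sketch of the idea: shuffle one feature at a time and record how much the model's R² drops. It is not eli5's implementation. A least-squares linear fit stands in for the LGBMRegressor, the data is synthetic, and `r2_score` and `permutation_importance` are hypothetical helper names.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = r2_score(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


# Synthetic data: y depends strongly on column 0, weakly on 1, not on 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Stand-in model: ordinary least squares with an intercept.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(M):
    return np.column_stack([M, np.ones(len(M))]) @ coef


importances = permutation_importance(predict, X, y)
print(importances)
```

A feature whose shuffling barely changes the score contributes little to the model's predictions, which is how the weights table in the next cell should be read.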



In [None]:
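lgbm = LGBMRegressor(n_estimators=5000)
# features: the factor columns from Economy through Generosity
indData = data_2016.loc[:, "Economy (GDP per Capita)":"Generosity"]
# target: select the column instead of pop(), so data_2016 keeps it and the cell can be re-run
depData = data_2016["Happiness Score"]
lgbm.fit(indData, depData)
columns = indData.columns.to_list()
# permutation importance: shuffle each feature and measure how much the model's score drops
perm = PermutationImportance(lgbm, random_state=10).fit(indData, depData)
eli5.show_weights(perm, feature_names = columns)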

### Finally, Country vs Happiness Rank



In [None]:
trace1 = [go.Choropleth(
               colorscale = 'Electric',
               locationmode = 'country names',
               locations = data_2015['Country'],
               text = data_2015['Country'], 
               z = data_2015['Happiness Rank'],
               )]

layout = dict(title = 'Happiness Rank',
                  geo = dict(
                      showframe = True,
                      showocean = True,
                      showlakes = True,
                      showcoastlines = True,
                      projection = dict(
                          type = 'hammer'
        )))


projections = [ "equirectangular", "mercator", "orthographic", "natural earth","kavrayskiy7", 
               "miller", "robinson", "eckert4", "azimuthal equal area","azimuthal equidistant", 
               "conic equal area", "conic conformal", "conic equidistant", "gnomonic", "stereographic", 
               "mollweide", "hammer", "transverse mercator", "albers usa", "winkel tripel" ]

buttons = [dict(args = ['geo.projection.type', y],
           label = y, method = 'relayout') for y in projections]

annot = list([ dict( x=0.1, y=0.8, text='Projection', yanchor='bottom', 
                    xref='paper', xanchor='right', showarrow=False )])


# Update Layout Object

layout[ 'updatemenus' ] = list([ dict( x=0.1, y=0.8, buttons=buttons, yanchor='top' )])
layout[ 'annotations' ] = annot


fig = go.Figure(data = trace1, layout = layout)
py.iplot(fig)


### Top 10 Happiest Countries



In [None]:
data_2016[['Country', 'Happiness Rank']].sort_values(by = 'Happiness Rank', ascending = True).head(10)


## Conclusions



In [None]:
## write your conclusions here about which factors you found most important for happiness.

<img src="https://media.giphy.com/media/g3bKgbTctP1kI/giphy.gif" width="400px">