- Recently I got a lot of feedback from friends who have just changed, or are about to change, their careers towards Data Analysis / Data Science and Machine Learning. They pointed out the lack of material between the beginning of the analysis journey and the advanced techniques.

- They are looking for Exploratory Data Analysis examples that are detailed but still beginner friendly and not too complicated (without various regression or normalization techniques, etc.). The examples should show them how to start and, most importantly, how to read the descriptive statistics and graphs.

- After getting this feedback, I decided to make a series of EDAs on different datasets, without making things too complicated for people at the first steps of their DS/ML journey.



**This notebook is part of the 9 Beginner Friendly EDAs. If these EDAs are helpful to anyone, I will be more than happy.**



#### **INTRO**

This Exploratory Data Analysis (EDA) will discuss:

- Descriptive statistics of the 2021 dataset
- Correlation among the variables
- Happiness Score distribution over the regions
- Descriptive statistics of the 2021 dataset for the Central and Eastern Europe region
- The Happiness Score trend in 2021 and the previous years in Central and Eastern Europe


- First things first, let's import the related libraries for further analysis.

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt 
import seaborn as sns 
import matplotlib as mpl



import plotly 
import plotly.express as px
import plotly.graph_objs as go
import plotly.offline as py
from plotly.offline import iplot
from plotly.subplots import make_subplots
import plotly.figure_factory as ff

The World Happiness Report 2021 and the World Happiness Report (which includes data from before 2021) will be used in this Exploratory Data Analysis (EDA).

In [None]:
df_2021 = pd.read_csv('../input/world-happiness-report-2021/world-happiness-report-2021.csv')
df_past = pd.read_csv("../input/world-happiness-report-2021/world-happiness-report.csv")

In [None]:
df_2021.head()

In [None]:
df_2021.shape

In [None]:
df_past.head()

In [None]:
df_past.shape

- For the sake of the further analysis, we will rename some columns in the older dataset so that they match the column names of the 2021 dataset.

In [None]:
df_past = df_past.rename(columns={'Life Ladder':'Ladder score', 'Log GDP per capita':'Logged GDP per capita','Healthy life expectancy at birth':'Healthy life expectancy'})

In [None]:
df_past.sample(1)

In this EDA, areas of interest will be 
- 'Country name',
- 'Regional indicator',
- 'Ladder score',
- 'Logged GDP per capita',
- 'Social support',
- 'Healthy life expectancy',
- 'Freedom to make life choices',
- 'Generosity',
- 'Perceptions of corruption' 

For that reason we will make further adjustments.

In [None]:
df1_2021= df_2021[['Country name','Regional indicator','Ladder score','Logged GDP per capita','Social support','Healthy life expectancy','Freedom to make life choices','Generosity','Perceptions of corruption']].copy() 


In [None]:
df1_2021.head()

- Let's look at the general information about the data and see whether there are any missing values to deal with.

In [None]:
df1_2021.info()

- So far so good
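
Besides `info()`, the missing values can also be counted directly. The sketch below uses a small made-up frame (the values are illustrative, not taken from the report) so that it runs on its own; in this notebook the same call would presumably be applied to `df1_2021`.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; it only stands in for df1_2021.
toy = pd.DataFrame({
    'Country name': ['A', 'B', 'C'],
    'Ladder score': [7.1, np.nan, 5.3],
    'Generosity': [0.10, -0.05, 0.02],
})

# isna() marks the missing cells, sum() counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

A column with a count above zero would need attention (for example dropping or filling the rows) before computing statistics on it.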

- General statistical information on the 2021 world happiness report

In [None]:
df1_2021.describe()

- Let's see the correlation among the numerical variables in the dataset

In [None]:
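# Keep only the numeric columns; recent pandas versions raise an error on text columns
df1_2021.select_dtypes(include='number').corr()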

How high a correlation has to be before it is called strong depends on the domain: in some fields a coefficient of 0.8 (80%) or above is the sign of a strong correlation, while in others around 0.6 (60%) is enough.

- With that in mind, the Happiness Score (Ladder score) has a strong correlation with GDP, Social support and Healthy life expectancy.

- Freedom to make life choices and the happiness score have a mid-level correlation.

- Perceptions of corruption and the happiness score have a weak negative correlation.
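
One way to make such a reading explicit is to attach a verbal label to each coefficient. In the sketch below the helper name `label_strength` and the cut-offs (0.7 and 0.4) are assumptions chosen for illustration, not a standard, and the coefficients are made up rather than taken from the dataset.

```python
import pandas as pd

def label_strength(r, strong=0.7, moderate=0.4):
    """Return a rough verbal label for a correlation coefficient r."""
    size = abs(r)
    if size >= strong:
        band = 'strong'
    elif size >= moderate:
        band = 'moderate'
    else:
        band = 'weak'
    direction = 'negative' if r < 0 else 'positive'
    return f'{band} {direction}'

# Made-up coefficients, only to show how the labelling works.
toy_corr = pd.Series({'var_a': 0.79, 'var_b': 0.61, 'var_c': -0.42, 'var_d': -0.02})
labels = toy_corr.apply(label_strength)
print(labels)
```

Since the thresholds depend on the domain, they are kept as parameters and can be adjusted.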

To see these relationships in a different visualization, a heatmap will be used next.

In [None]:
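# Compute the correlation matrix once, on the numeric columns only
corr_2021 = df1_2021.select_dtypes(include='number').corr()
fig = go.Figure(go.Heatmap(z=corr_2021.values, x=corr_2021.columns.tolist(), y=corr_2021.index.tolist(), colorscale='agsunset'))
fig.show()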

Before moving on to Central and Eastern Europe, let's see the happiness score at the regional level.

In [None]:
df1_2021.groupby('Regional indicator')['Ladder score'].describe()

Based on the above information, it is easily seen that, 
- Western Europe has the highest level happiness score, 

on the other hand;

- South Asia and Sub-Saharan Africa have the lowest level happiness score amongst the regions.

- Let's look at the boxplot to see the overall distribution of the happiness score across the regions.


In [None]:
fig = px.box(df1_2021, x='Ladder score', y='Regional indicator', hover_data=['Regional indicator', 'Country name'])
fig.update_traces(quartilemethod="inclusive")
fig.show()

In the boxplot of the happiness score distributions by region, several outliers can be seen in:

- Latin America and Caribbean
- Central and Eastern Europe

- From this point on, let's look in detail at the Central and Eastern Europe part of the dataset.

In [None]:
center_east_europe = df1_2021[df1_2021['Regional indicator']=='Central and Eastern Europe']

center_east_europe

- Let's see how correlated the variables are in Central and Eastern Europe.

In [None]:
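# Keep only the numeric columns; recent pandas versions raise an error on text columns
center_east_europe.select_dtypes(include='number').corr()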

Even though some similarities with the correlation matrix of the whole 2021 dataset can be found, the happiness score's correlations with the other variables in Central and Eastern Europe are considerably lower than in the whole dataset.
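
Such a comparison is easier to read when the two sets of coefficients sit side by side. The sketch below does this on a made-up frame; the helper name `ladder_correlations`, the region names and all values are assumptions for illustration only.

```python
import pandas as pd

# Made-up data with two regions; it only stands in for df1_2021.
toy = pd.DataFrame({
    'Regional indicator': ['R1', 'R1', 'R1', 'R1', 'R2', 'R2', 'R2', 'R2'],
    'Ladder score': [5.0, 5.5, 6.0, 6.5, 3.0, 3.5, 4.0, 4.5],
    'Social support': [0.80, 0.90, 0.85, 0.95, 0.50, 0.60, 0.55, 0.65],
})

def ladder_correlations(frame):
    """Correlation of every numeric column with 'Ladder score'."""
    numeric = frame.select_dtypes(include='number')
    return numeric.corr()['Ladder score'].drop('Ladder score')

comparison = pd.DataFrame({
    'all regions': ladder_correlations(toy),
    'R1 only': ladder_correlations(toy[toy['Regional indicator'] == 'R1']),
})
comparison['difference'] = comparison['R1 only'] - comparison['all regions']
print(comparison)
```

In the toy data the correlation inside `R1` is lower than over both regions together. This often happens when a single region covers a narrower range of values than the full dataset.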

In [None]:
# Compute the regional correlation matrix once, on the numeric columns only
corr_cee = center_east_europe.select_dtypes(include='number').corr()
fig = go.Figure(go.Heatmap(z=corr_cee.values, x=corr_cee.columns.tolist(), y=corr_cee.index.tolist(), colorscale='agsunset'))
fig.show()

In [None]:
center_east_europe.describe()

Based on the descriptive statistics, in particular the mean-median differences and the IQRs (interquartile ranges), possible outliers can be seen in:

- Happiness Score
- Social Support
- Healthy life expectancy
- Generosity
- Perceptions of corruption

- For the sake of simplicity, in this EDA we will focus on the Happiness Score.
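
The mean-median gap and the IQR can be read off the `describe()` output, but computing them explicitly makes the comparison easier. The following is a minimal sketch on made-up values (not the regional data); the added column names `mean_minus_median` and `iqr` are chosen here for illustration.

```python
import pandas as pd

# Made-up values; 'Ladder score' contains one unusually high value.
toy = pd.DataFrame({
    'Ladder score': [5.0, 5.1, 5.2, 5.3, 5.4, 8.0],
    'Generosity': [0.00, 0.01, 0.02, 0.03, 0.04, 0.05],
})

summary = toy.describe().T  # one row per variable
summary['mean_minus_median'] = summary['mean'] - summary['50%']
summary['iqr'] = summary['75%'] - summary['25%']
print(summary[['mean', '50%', 'mean_minus_median', 'iqr']])
```

A mean that sits well above the median, as for the toy `Ladder score`, hints at a high-side outlier or a right-skewed distribution, while a gap close to zero suggests a roughly symmetric one.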

For further detail, let's make a boxplot, barplot and use Plotly's interactive environment.

### Happiness in Central and Eastern Europe

In [None]:
fig = px.box(center_east_europe, x='Ladder score', hover_data=['Country name'])
fig.update_traces(quartilemethod="inclusive")
fig.show()


In Central and Eastern Europe, based on the happiness score distribution, 3 possible outliers can be seen.
One outlier is on the high side of the happiness score and the other two are on the low side.

In [None]:
fig = px.bar(center_east_europe, x='Ladder score', y='Country name')
fig.show()

As seen in the barplot, 
- **Czech Republic** has the highest happiness score in Central and Eastern Europe.
- **Albania** and **North Macedonia** have the lowest happiness scores in Central and Eastern Europe.
- These three countries are possible outliers based on the happiness score distribution in Central and Eastern Europe.

Using Plotly's interactive environment, we can see the differences in Happiness Score among the countries in Central and Eastern Europe.

In [None]:
fig=go.Figure()
fig.add_trace(go.Scatter(
    x=center_east_europe['Country name'],
    y=center_east_europe['Ladder score'],
    name='Happiness Score',
    mode='markers+text',
    marker_color='blue',
    marker_size=10,
    text=center_east_europe['Ladder score'],
    textposition='top center',
    line=dict(color='red',dash='dash'),
))
fig.update_layout(
    title= "<b>Central and Eastern Europe Happiness Score in 2021</b>",
    xaxis_title="<b>Country</b>",
    yaxis_title="<b>Happiness Score</b>",
    template='plotly_white',
    font=dict(
        size=12,
        color="Black",
        family="Oswald, sans-serif"
        ),
    xaxis=dict(showgrid=True),
    yaxis=dict(showgrid=True),
    yaxis2=dict(showgrid=True,overlaying='y',side='right',title='<b>Happiness Score</b>'),
)
fig.show()

- After seeing the 2021 happiness scores, it is a good idea to see how the happiness score trended in the past years.
- Let's look at the previous years' happiness scores for all the countries in the region.

In [None]:
css3_colors = ['#add8e6', '#f08080','#e0ffff','#fafad2','#d3d3d3','#90ee90','#ffb6c1','#ffa07a','#20b2aa','#87cefa','#778899','#b0c4de','#32cd32','#ff00ff','#66cdaa','#ba55d3', '#7b68ee','#48d1cc', '#f5fffa' ]
# Pair each country in the region with one color from the list
css3_dict = dict(zip(center_east_europe['Country name'], css3_colors))

In [None]:
center_east_europe_past=df_past[df_past['Country name'].isin(center_east_europe['Country name'].to_list())].loc[:,'Country name':'Ladder score']


In [None]:
fig=go.Figure()
for name in center_east_europe['Country name']:
    # Past records of this country only
    country_past = center_east_europe_past[center_east_europe_past['Country name'] == name]
    fig.add_trace(go.Scatter(
        x=country_past['year'],
        y=country_past['Ladder score'],
        name=name,
        mode='markers+text+lines',
        marker_color='black',
        marker_size=3,
        line=dict(color=css3_dict[name]),
        yaxis='y1'))
    
fig.update_layout(
    title="Happiness Score Trend in Central and Eastern Europe ",
    xaxis_title="Year",
    yaxis_title='Happiness Score',
    template='plotly_white',
    font=dict(
        size=14,
        color="Blue",
        family="Oswald, sans-serif"
    ),
    xaxis=dict(showgrid=True),
    yaxis=dict(showgrid=True)
)
fig.show()
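
To read the trend numerically as well, the long table (one row per country and year) can be pivoted into a wide table with one column per country. The sketch below uses a made-up frame with the same column names as the past dataset after renaming; the countries and values are illustrative only.

```python
import pandas as pd

# Made-up long-format data; it only stands in for center_east_europe_past.
toy_past = pd.DataFrame({
    'Country name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'year': [2018, 2019, 2020, 2018, 2019, 2020],
    'Ladder score': [5.0, 5.2, 5.5, 6.0, 5.9, 5.7],
})

# One row per year, one column per country.
wide = toy_past.pivot(index='year', columns='Country name', values='Ladder score')

# Change between the first and the last year in the table.
change = wide.iloc[-1] - wide.iloc[0]
print(wide)
print(change.round(2))
```

With real data some countries may lack a score in the first or last year, in which case the change comes out as `NaN` and those countries would need separate handling.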

- From this point on, let's look at how the Happiness Score relates to the other variables in the region.

## **Happiness Score & GDP** 

In [None]:
trace = go.Scatter(x = center_east_europe['Ladder score'],y=center_east_europe['Logged GDP per capita'],text = center_east_europe['Country name'],mode='markers',marker={'color':'blue', 'size':10})
df=[trace]
layout = go.Layout(title='Happiness Score & Logged GDP per capita in Central Eastern Europe',xaxis=dict(title='Ladder Score'),yaxis=dict(title='Logged GDP per capita'),hovermode='closest')
figure = go.Figure(data=df,layout=layout)
figure.update_layout(template='plotly_white',
                  font=dict(family="Oswald, sans-serif"))
figure.show()

## **Happiness Score & Social Support** 

In [None]:
trace = go.Scatter(x = center_east_europe['Ladder score'],y=center_east_europe['Social support'],text = center_east_europe['Country name'],mode='markers',marker={'color':'blue', 'size':10})
df=[trace]
layout = go.Layout(title='Happiness Score & Social Support in Central Eastern Europe',xaxis=dict(title='Ladder score'),yaxis=dict(title='Social Support'),hovermode='closest')
figure = go.Figure(data=df,layout=layout)
figure.update_layout(template='plotly_white',
                  font=dict(family="Oswald, sans-serif"))
figure.show()

## **Happiness Score & Perceptions of Corruption** 

In [None]:
trace = go.Scatter(x = center_east_europe['Ladder score'],y=center_east_europe['Perceptions of corruption'],text = center_east_europe['Country name'],mode='markers',marker={'color':'blue', 'size':10})
df=[trace]
layout = go.Layout(title='Happiness Score & Perceptions of Corruption in Central Eastern Europe',xaxis=dict(title='Ladder score'),yaxis=dict(title='Perceptions of corruption'),hovermode='closest')
figure = go.Figure(data=df,layout=layout)
figure.update_layout(template='plotly_white',
                  font=dict(family="Oswald, sans-serif"))
figure.show()

## **Happiness Score & Healthy Life Expectancy** 

In [None]:
trace = go.Scatter(x = center_east_europe['Ladder score'],y=center_east_europe['Healthy life expectancy'],text = center_east_europe['Country name'],mode='markers',marker={'color':'blue', 'size':10})
df=[trace]
layout = go.Layout(title='Happiness Score & Healthy Life Expectancy in Central Eastern Europe',xaxis=dict(title='Ladder score'),yaxis=dict(title='Healthy life expectancy'),hovermode='closest')
figure = go.Figure(data=df,layout=layout)
figure.update_layout(template='plotly_white',
                  font=dict(family="Oswald, sans-serif"))
figure.show()

## **Happiness Score & Freedom to Make Life Choices** 

In [None]:
trace = go.Scatter(x = center_east_europe['Ladder score'],y=center_east_europe['Freedom to make life choices'],text = center_east_europe['Country name'],mode='markers',marker={'color':'blue', 'size':10})
df=[trace]
layout = go.Layout(title='Happiness Score & Freedom to Make Life Choices in Central Eastern Europe',xaxis=dict(title='Ladder score'),yaxis=dict(title='Freedom to make life choices'),hovermode='closest')
figure = go.Figure(data=df,layout=layout)
figure.update_layout(template='plotly_white',
                  font=dict(family="Oswald, sans-serif"))
figure.show()

## This notebook is a part of the 9 Beginner Friendly EDAs
## If you like this one, you can also check out other notebooks in the Beginner Friendly EDAs series!

* [Data Analyst Jobs - EDA](https://www.kaggle.com/kaanboke/plotly-data-analyst-jobs)
* [Top Games on Google Play Store](https://www.kaggle.com/kaanboke/plotly-beginner-friendly-top-games)
* [Hollywood Top Movies- EDA](https://www.kaggle.com/kaanboke/plotly-beginner-friendly-top-movies)
* [UDEMY Courses EDA](https://www.kaggle.com/kaanboke/plotly-beginner-friendly-udemy)
* [Countries Life Expectancy](https://www.kaggle.com/kaanboke/plotly-beginner-friendly)
* [Netflix Movies- EDA](https://www.kaggle.com/kaanboke/plotly-beginner-friendly-netflix)
* [Amazon Top 50 Bestselling Books EDA](https://www.kaggle.com/kaanboke/plotly-beginner-friendly-amazon)
* [London bike Sharing EDA](https://www.kaggle.com/kaanboke/plotly-beginner-friendly-london-bike)




- Thanks to the dataset contributor for this data. I really enjoyed working on it.

- It was a real pleasure to share this detailed, beginner-friendly EDA with you. Thanks for your time.

- All the best