# **Overview**

In this assignment, you will perform a set of well-defined tasks that strengthen your understanding of data visualization. We will use the World Happiness Report 2021 dataset. For context: “The World Happiness Report 2021 focuses on the effects of COVID-19 and how people all over the world have fared. The objective of the report was two-fold, first to focus on the effects of COVID-19 on the structure and quality of people’s lives, and second to describe and evaluate how governments all over the world have dealt with the pandemic.”

As part of the assignment, you will complete the tasks below.


**Author:** Chintoo Kumar
**Contributor:** Chanukya Patnaik

# **Dataset**
Dataset Link: https://raw.githubusercontent.com/dphi-official/Datasets/master/world-happiness-report-2021.csv




**About the dataset:**

The dataset covers 149 countries and comprises 20 attributes describing each country's World Happiness Report scores for the year 2021. The happiness study ranks the nations of the world based on responses to questions from the Gallup World Poll (GWP). The results are then related to other factors, such as GDP and social support.

**Data description:**

* 'Country name': Name of the country.
* 'Regional indicator': Region the country belongs to.
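* 'Ladder score': Happiness score, i.e. the national average of subjective well-being (life evaluation) responses.
* 'Standard error of ladder score': Standard error of the ladder score, clustered at the country level.
* 'upperwhisker': Upper bound of the 95% confidence interval for the ladder score.
* 'lowerwhisker': Lower bound of the 95% confidence interval for the ladder score.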
* 'Logged GDP per capita': Economic production of a country, given as the natural logarithm of GDP per capita.
* 'Social support': Having someone to count on in times of trouble. It is the national average of the binary responses (either 0 or 1) to the GWP question “If you were in trouble, do you have relatives or friends you can count on to help you whenever you need them, or not?”
* 'Healthy life expectancy': Healthy life expectancy at birth, in years.
* 'Freedom to make life choices': National average of responses to the GWP question on satisfaction with the freedom to choose what to do with one's life.
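* 'Generosity': The residual of regressing the national average of responses to the GWP question “Have you donated money to a charity in the past month?” on GDP per capita.
* 'Perceptions of corruption': National average of responses to the GWP questions on whether corruption is widespread in government and in business; higher values mean more perceived corruption.
* 'Ladder score in Dystopia': Ladder score of Dystopia, a hypothetical country with the world's lowest national averages for each of the six factors. It serves as the benchmark for the 'Explained by' columns.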
* 'Explained by: Log GDP per capita': The extent to which logged GDP per capita contributes to the ladder score.
* 'Explained by: Social support': The extent to which social support contributes to the ladder score.
* 'Explained by: Healthy life expectancy': The extent to which healthy life expectancy contributes to the ladder score.
* 'Explained by: Freedom to make life choices': The extent to which perceived freedom to make life choices contributes to the ladder score.
* 'Explained by: Generosity': The extent to which generosity contributes to the ladder score.
* 'Explained by: Perceptions of corruption': The extent to which perceptions of corruption contribute to the ladder score.
* 'Dystopia + residual': The Dystopia ladder score plus the residual, i.e. the part of the ladder score that the six factors do not explain.
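
The six 'Explained by' columns and 'Dystopia + residual' are designed to add up to 'Ladder score'. Below is a minimal sketch of how that could be checked. The helper name `decomposition_gap` and the two rows in `toy` are assumptions made for illustration; the numbers are made up and are not taken from the dataset.

```python
import pandas as pd

# Columns that, per the report's decomposition, should sum to 'Ladder score'.
COMPONENTS = [
    'Explained by: Log GDP per capita',
    'Explained by: Social support',
    'Explained by: Healthy life expectancy',
    'Explained by: Freedom to make life choices',
    'Explained by: Generosity',
    'Explained by: Perceptions of corruption',
    'Dystopia + residual',
]

def decomposition_gap(df):
    """Absolute difference between 'Ladder score' and the sum of its components, per row.

    Hypothetical helper, not part of the assignment.
    """
    return (df['Ladder score'] - df[COMPONENTS].sum(axis=1)).abs()

# Made-up values for illustration only (NOT real dataset rows).
toy = pd.DataFrame({
    'Country name': ['A', 'B'],
    'Ladder score': [6.0, 4.5],
    'Explained by: Log GDP per capita': [1.2, 0.8],
    'Explained by: Social support': [1.0, 0.7],
    'Explained by: Healthy life expectancy': [0.7, 0.5],
    'Explained by: Freedom to make life choices': [0.6, 0.4],
    'Explained by: Generosity': [0.1, 0.2],
    'Explained by: Perceptions of corruption': [0.3, 0.1],
    'Dystopia + residual': [2.1, 1.8],
})

toy_gap = decomposition_gap(toy)
print(toy_gap)
```

On the real data, `decomposition_gap(df1).max()` would be expected to be small but not necessarily zero, since the published figures are rounded.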

# **Task 1: Data loading and Data Analysis**

* Load the data file and name it df1
* Display the first 5 rows of the world-happiness-report-2021
* Display the last 10 observations of the world-happiness-report-2021
* Display a concise summary of the provided data and list 2 observations/inferences that you draw from the result. You can use the info() method for this.
* Display the descriptive statistics of the world-happiness-report-2021
* Check whether there are any missing values in each column of the provided dataset
* How many unique countries are there in Western Europe?
* Display all the unique countries of Western Europe
* Filter and display the world happiness report score for the country 'India' in the year 2021
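
The missing-value, unique-count, and filtering steps above can be tried out on a tiny frame before running them on the real file. This is a sketch under stated assumptions: `toy` and its values are made up for illustration, and only the column names match the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only (NOT real dataset values).
toy = pd.DataFrame({
    'Country name': ['A', 'B', 'C', 'D'],
    'Regional indicator': ['Western Europe', 'Western Europe',
                           'South Asia', 'South Asia'],
    'Ladder score': [7.1, np.nan, 4.2, 5.0],
})

# Per-column count of missing values (answers "in each column").
missing_per_column = toy.isnull().sum()

# A single True/False for the whole frame; this does not say which column.
any_missing = toy.isnull().values.any()

# Boolean mask reused for both the count and the list of countries.
in_region = toy['Regional indicator'] == 'Western Europe'
n_countries = toy.loc[in_region, 'Country name'].nunique()
countries = toy.loc[in_region, 'Country name'].unique()

# Score of a single country, returned as a NumPy array.
score_c = toy.loc[toy['Country name'] == 'C', 'Ladder score'].values

print(missing_per_column)
print(any_missing, n_countries, countries, score_c)
```

Note the difference between `isnull().sum()`, which reports one count per column, and `isnull().values.any()`, which collapses the whole frame into one boolean.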




In [1]:
import pandas as pd
from IPython.display import display

#Load the data file and name it df1
df1 = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/world-happiness-report-2021.csv')

#Display the first 5 rows of the world-happiness-report-2021
#(display() is used because a notebook cell only echoes its last expression)
display(df1.head(5))

#Display the last 10 observations of the world-happiness-report-2021
display(df1.tail(10))

#Display a concise summary of the provided data (info() prints its own output)
df1.info()

#Display the descriptive statistics of the world-happiness-report-2021
display(df1.describe())

#Count the missing values in each column of the provided dataset
display(df1.isnull().sum())

#How many unique countries are there in Western Europe
western_europe = df1.loc[df1['Regional indicator'] == 'Western Europe', 'Country name']
display(western_europe.nunique())

#Display all the unique countries of Western Europe
display(western_europe.unique())

#Filter and display the world happiness report score for the country 'India' in year 2021
df1.loc[df1['Country name'] == 'India', 'Ladder score'].values

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 149 entries, 0 to 148
Data columns (total 20 columns):
 #   Column                                      Non-Null Count  Dtype  
---  ------                                      --------------  -----  
 0   Country name                                149 non-null    object 
 1   Regional indicator                          149 non-null    object 
 2   Ladder score                                149 non-null    float64
 3   Standard error of ladder score              149 non-null    float64
 4   upperwhisker                                149 non-null    float64
 5   lowerwhisker                                149 non-null    float64
 6   Logged GDP per capita                       149 non-null    float64
 7   Social support                              149 non-null    float64
 8   Healthy life expectancy                     149 non-null    float64
 9   Freedom to make life choices                149 non-null    float64
 10  Generosity    

array([3.819])

# **Task 2: Visualization of results using the Matplotlib library**

* Display the data of the top 5 East Asian countries based on 'Generosity'
* Build a plot (line plot) that shows the variation of 'Ladder score' among the above 5 East Asian countries
* Build a plot that shows the variation of 'Ladder score' among 5 Southeast Asian countries
* Create a dataframe object df_2021 with the following countries: 'China', 'Nepal', 'Bangladesh', 'Pakistan', 'Myanmar', 'India', 'Afghanistan'. Then build a scatter plot that shows these countries against their 'Logged GDP per capita'
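
The "top N in a region, then line plot" pattern from the first two bullets can be sketched on made-up data as follows. The helper names `top_n_in_region` and `plot_ladder` are hypothetical, and the values in `toy` are illustrative only; `nlargest` is used here as one possible way to pick the top rows.

```python
import pandas as pd
import matplotlib.pyplot as plt

def top_n_in_region(df, region, by, n=5):
    """Rows of `region` with the n largest values of column `by` (hypothetical helper)."""
    rows = df[df['Regional indicator'] == region]
    return rows.nlargest(n, by)

def plot_ladder(rows, title):
    """Line plot of 'Ladder score' per country; returns the Axes (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.plot(rows['Country name'], rows['Ladder score'], '-bo')
    ax.set_xlabel('Country name')
    ax.set_ylabel('Ladder score')
    ax.set_title(title)
    ax.tick_params(axis='x', labelrotation=45)
    return ax

# Made-up rows for illustration only (NOT real dataset values).
toy = pd.DataFrame({
    'Country name': ['P', 'Q', 'R', 'S'],
    'Regional indicator': ['East Asia', 'East Asia', 'East Asia', 'South Asia'],
    'Generosity': [0.1, 0.3, -0.1, 0.5],
    'Ladder score': [6.0, 5.5, 5.0, 4.0],
})

top2 = top_n_in_region(toy, 'East Asia', 'Generosity', n=2)
ax = plot_ladder(top2, 'Top 2 East Asia rows by Generosity')
```

In a notebook the figure renders when the cell finishes; in a plain script, calling `plt.show()` afterwards displays it.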

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display

#Load the data file and name it df
df = pd.read_csv('https://raw.githubusercontent.com/dphi-official/Datasets/master/world-happiness-report-2021.csv')

#Display the data of top 5 East Asian countries based on 'Generosity'
df2 = df[df['Regional indicator'] == 'East Asia'].sort_values(by='Generosity', ascending=False).head(5)
display(df2)

#Build a plot (line plot) that shows the variation of 'Ladder score' among the above 5 East Asian countries
plt.plot(df2['Country name'], df2['Ladder score'], '-bo')
plt.xticks(rotation=45)
plt.ylabel('Ladder score')
plt.title('Top 5 East Asian countries by Generosity: Ladder score')
plt.show()

#Build a plot that shows the variation of 'Ladder score' among 5 Southeast Asian countries
#(the first 5 Southeast Asia rows in file order)
df_sea = df[df['Regional indicator'] == 'Southeast Asia'].head(5)
plt.plot(df_sea['Country name'], df_sea['Ladder score'], '-bo')
plt.xticks(rotation=45)
plt.ylabel('Ladder score')
plt.title('5 Southeast Asian countries: Ladder score')
plt.show()

#Create a dataframe object df_2021 with the listed countries, then build a scatter plot of these countries vs their 'Logged GDP per capita'
countries = ['China', 'Nepal', 'Bangladesh', 'Pakistan', 'Myanmar', 'India', 'Afghanistan']
df_2021 = df[df['Country name'].isin(countries)].sort_values(by='Logged GDP per capita')
plt.scatter(df_2021['Country name'], df_2021['Logged GDP per capita'])
plt.xticks(rotation=15)
plt.ylabel('Logged GDP per capita')
plt.show()
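
Filtering by country name silently drops any name that is misspelled or absent from the data, so the scatter plot can end up with fewer points than expected. A minimal sketch of a check for that follows. The helper `select_countries` is hypothetical, and the rows and values in `toy` are made up for illustration.

```python
import pandas as pd

def select_countries(df, names):
    """Return the rows for the requested countries and the names that were not found.

    Hypothetical helper, not part of the assignment.
    """
    selected = df[df['Country name'].isin(names)]
    found = set(selected['Country name'])
    missing = [name for name in names if name not in found]
    return selected, missing

# Made-up rows for illustration only (NOT real dataset values).
toy = pd.DataFrame({
    'Country name': ['India', 'Nepal', 'China'],
    'Logged GDP per capita': [8.8, 8.1, 9.7],
})

# 'Burma' is deliberately a name that does not occur in the toy frame.
selected, missing = select_countries(toy, ['India', 'Nepal', 'Burma'])
print(selected)
print('Not found:', missing)
```

An empty `missing` list means every requested country has a row to plot.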

# **Scores**

Each question in both tasks carries 1 mark, and scores will be assigned after the notebook review.