# World Happiness Report Analysis using SQL
We have datasets from different years containing each country's happiness score along with the factors that contributed to it. We also have a dataset showing what share of each country's population is depressed or anxious, and we analyse the two together to see which factors contribute the most.

## Connecting to Database

In [5]:
import pandas as pd
import pyodbc
import mysql.connector as connector
import numpy as np

In [6]:
# Specify the path to your CSV file
csv_file_path = 'C:\\Users\\Arvind Kumar\\Downloads\\archive\\2022.csv'

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(csv_file_path)

df.head()

Unnamed: 0,RANK,Country,Happiness score,Whisker-high,Whisker-low,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,1,Finland,7821,7886,7756,2518,1892,1258,775,736,109,534
1,2,Denmark,7636,7710,7563,2226,1953,1243,777,719,188,532
2,3,Iceland,7557,7651,7464,2320,1936,1320,803,718,270,191
3,4,Switzerland,7512,7586,7437,2153,2026,1226,822,677,147,461
4,5,Netherlands,7415,7471,7359,2137,1945,1206,787,651,271,419


The values above should be decimals (for example, 7.821 rather than 7821): the raw file uses commas instead of decimal points, so we need to fix that first.

### 1. Fixing the Dataset
---


In [4]:
# Specify the path to the input CSV file
input_file = 'C:\\Users\\Arvind Kumar\\Downloads\\data_2022.csv'

# Specify the path to the output CSV file
output_file = 'C:\\Users\\Arvind Kumar\\Downloads\\data_2022.csv'

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(input_file)

# Process the DataFrame by replacing commas with decimals
df = df.replace(',', '.', regex=True)

# Write the processed DataFrame to the output CSV file
df.to_csv(output_file, index=False)

# Print a message indicating the process is complete
print("Commas replaced with decimals in the CSV file.")


Commas replaced with decimals in the CSV file.
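As an alternative to the string replacement above, pandas can interpret the comma as a decimal mark while reading the file. The snippet below is a minimal sketch: the two sample rows mirror the first rows of the 2022 table, and the quoting of the numeric fields is an assumption about how the raw CSV is laid out, so check the actual file before relying on it.

```python
import io

import pandas as pd

# Two rows shaped like the raw file: quoted fields with a comma as decimal mark.
# (The quoting is an assumption about the raw CSV, not taken from the notebook.)
raw = io.StringIO(
    'RANK,Country,Happiness score\n'
    '1,Finland,"7,821"\n'
    '2,Denmark,"7,636"\n'
)

# decimal=',' tells the parser to treat the comma as the decimal separator,
# so the column is read as floats rather than strings.
df_fixed = pd.read_csv(raw, decimal=',')

print(df_fixed.dtypes)
print(df_fixed)
```

Reading the values as floats at load time avoids touching text columns such as `Country`, which a blanket string replacement would also modify if a name contained a comma.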


In [5]:
df.head()

Unnamed: 0,RANK,Country,Happiness score,Whisker-high,Whisker-low,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,1,Finland,7.821,7.886,7.756,2.518,1.892,1.258,0.775,0.736,0.109,0.534
1,2,Denmark,7.636,7.71,7.563,2.226,1.953,1.243,0.777,0.719,0.188,0.532
2,3,Iceland,7.557,7.651,7.464,2.32,1.936,1.32,0.803,0.718,0.27,0.191
3,4,Switzerland,7.512,7.586,7.437,2.153,2.026,1.226,0.822,0.677,0.147,0.461
4,5,Netherlands,7.415,7.471,7.359,2.137,1.945,1.206,0.787,0.651,0.271,0.419


The dataset is fixed now and we can proceed with our connection to the server.

### 2. Establishing Connection
---

In [7]:

# Establish a connection to the database
conn_str = 'DRIVER={SQL Server};SERVER=server_name;DATABASE=database_name;UID=username;PWD=password'

conn = pyodbc.connect('Driver={SQL Server};'
                     'Server=DESKTOP-CPIGF08;'
                     'Database=world_happiness;'
                     'Trusted_Connection=yes;')

print(conn)

# Create a cursor to execute queries
cursor = conn.cursor()

<pyodbc.Connection object at 0x0000014F8F4CBB80>


In [8]:
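def run_query(query):
    """Run a SQL query on the open cursor and return the result as a DataFrame."""
    cursor.execute(query)

    rows = cursor.fetchall()
    column_names = [column[0] for column in cursor.description]

    # Convert each pyodbc Row to a tuple so pandas can infer a dtype per column
    # (np.array(rows) would coerce mixed text/number rows to a single dtype)
    df = pd.DataFrame.from_records([tuple(row) for row in rows], columns=column_names)

    return df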
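For readers without a SQL Server instance, the same helper pattern can be tried against an in-memory SQLite database. This is only a sketch under stated assumptions: `demo_conn`, `run_query_sql` and the three-row table are hypothetical stand-ins, and SQLite uses `LIMIT` where the queries in this notebook use `TOP`.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the SQL Server instance (assumption).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE data_2022 (Country TEXT, Happiness_score REAL)")
demo_conn.executemany(
    "INSERT INTO data_2022 VALUES (?, ?)",
    [("Finland", 7.82), ("Denmark", 7.64), ("Iceland", 7.56)],
)


def run_query_sql(query, connection):
    """Run a query and return the result as a DataFrame."""
    return pd.read_sql_query(query, connection)


# SQLite syntax: LIMIT 2 plays the role of SELECT TOP 2 in SQL Server.
top2 = run_query_sql(
    "SELECT Country, Happiness_score FROM data_2022 "
    "ORDER BY Happiness_score DESC LIMIT 2",
    demo_conn,
)
print(top2)

demo_conn.close()
```

`pd.read_sql_query` builds the DataFrame and column names in one call. pandas documents SQLAlchemy connectables and `sqlite3` connections as its supported inputs, so passing a raw `pyodbc` connection may produce a warning.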


## Understanding the Data
---
First, we look at how the data is structured and which factors are involved.

In [38]:
df.head()

Unnamed: 0,RANK,Country,Happiness score,Whisker-high,Whisker-low,Dystopia (1.83) + residual,Explained by: GDP per capita,Explained by: Social support,Explained by: Healthy life expectancy,Explained by: Freedom to make life choices,Explained by: Generosity,Explained by: Perceptions of corruption
0,1,Finland,7.82,7.89,7.76,2.52,1.89,1.26,0.78,0.74,0.11,0.53
1,2,Denmark,7.64,7.71,7.56,2.23,1.95,1.24,0.78,0.72,0.19,0.53
2,3,Iceland,7.56,7.65,7.46,2.32,1.94,1.32,0.8,0.72,0.27,0.19
3,4,Switzerland,7.51,7.59,7.44,2.15,2.03,1.23,0.82,0.68,0.15,0.46
4,5,Netherlands,7.42,7.47,7.36,2.14,1.94,1.21,0.79,0.65,0.27,0.42


The table above comprises **12 columns** with intuitive names:
>- **Rank**: Rank of the country by happiness score.
>- **Country**: Name of the country.
>- **Happiness score**: Score on a scale of 0-10, obtained by combining the factors below.
>- **Whisker-high**: Upper bound of the whisker, which represents the 95% confidence interval of the happiness score.
>- **Whisker-low**: Lower bound of the whisker, which represents the 95% confidence interval of the happiness score.
>- **Dystopia (1.83) + residual**: Dystopia is a hypothetical country with the worst values for every factor, which gives a baseline score of 1.83. This column holds that baseline plus the residual, the part of the happiness score not explained by the six factors.
>- **Explained by: GDP per capita**: Score contributing to happiness from GDP per capita.
>- **Explained by: Social support**: Score contributing to happiness from social support.
>- **Explained by: Healthy life expectancy**: Score contributing to happiness from healthy life expectancy.
>- **Explained by: Freedom to make life choices**: Score contributing to happiness from freedom to make life choices.
>- **Explained by: Generosity**: Score contributing to happiness from generosity.
>- **Explained by: Perceptions of corruption**: Score contributing to happiness from perceptions of corruption.

## Exploring the Dataset
---
We have already seen the five happiest countries above. Now we look at the ten least happy countries and compare their scores with their respective Dystopia levels.

In [28]:
run_query("SELECT TOP 10 Country, Happiness_score, Dystopia_1_83_residual AS Dystopia FROM data_2022 WHERE Happiness_score IS NOT NULL ORDER BY Happiness_score ")

Unnamed: 0,Country,Happiness_score,Dystopia
0,Afghanistan,2.4,1.26
1,Lebanon,2.95,0.22
2,Zimbabwe,2.99,0.55
3,Rwanda*,3.27,0.54
4,Botswana*,3.47,0.19
5,Lesotho*,3.51,1.31
6,Sierra Leone,3.57,1.56
7,Tanzania,3.7,0.74
8,Malawi,3.75,1.66
9,Zambia,3.76,1.13


>- Afghanistan is the least happy country in the dataset: its score of 2.4 sits very close to the Dystopia baseline of 1.83. <br>

A plausible explanation is that the country has been under Taliban control and its people lack basic human rights, which is consistent with its rank.

Next, we explore which countries gain the most happiness from their GDP, that is, where money contributes the most to quality of life.

In [14]:
run_query("SELECT TOP 10 Country, Explained_by_GDP_per_capita AS GDP FROM data_2022 ORDER BY Explained_by_GDP_per_capita DESC")

Unnamed: 0,Country,GDP
0,Luxembourg*,2.21
1,Singapore,2.15
2,Ireland,2.13
3,Switzerland,2.03
4,Norway,2.0
5,United Arab Emirates,2.0
6,United States,1.98
7,Hong Kong S.A.R. of China,1.96
8,Denmark,1.95
9,Netherlands,1.95


### A1. Does the perception of generosity depend on GDP per capita?

In [13]:
run_query("SELECT TOP 10 Country,Explained_by_Generosity AS Generosity, Explained_by_GDP_per_capita AS GDP FROM data_2022 ORDER BY Explained_by_Generosity DESC")

Unnamed: 0,Country,Generosity,GDP
0,Indonesia,0.47,1.38
1,Myanmar,0.45,1.04
2,Gambia*,0.39,0.79
3,Kenya,0.32,1.03
4,Thailand,0.32,1.53
5,Kosovo,0.31,1.36
6,Turkmenistan*,0.31,1.48
7,United Kingdom,0.29,1.87
8,Uzbekistan,0.28,1.22
9,Iceland,0.27,1.94


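GDP appears to be linked with generosity: several of the most generous countries also have fairly high GDP scores (for example, the United Kingdom at 1.87 and Iceland at 1.94). The relationship is not uniform, however (Gambia scores only 0.79 on GDP), so GDP is unlikely to be the only factor; social support may also contribute to higher generosity.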
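A top-10 list only shows one end of the distribution, so a correlation coefficient is a more direct way to check how two factors move together. The sketch below uses only the ten rows returned by the generosity query above; `factor_correlation` is a hypothetical helper, and the result describes these ten rows, not the full table.

```python
import pandas as pd

# The ten rows returned by the generosity query above.
top_generosity = pd.DataFrame({
    "Country": ["Indonesia", "Myanmar", "Gambia*", "Kenya", "Thailand",
                "Kosovo", "Turkmenistan*", "United Kingdom", "Uzbekistan", "Iceland"],
    "Generosity": [0.47, 0.45, 0.39, 0.32, 0.32, 0.31, 0.31, 0.29, 0.28, 0.27],
    "GDP": [1.38, 1.04, 0.79, 1.03, 1.53, 1.36, 1.48, 1.87, 1.22, 1.94],
})


def factor_correlation(frame, x, y):
    """Pearson correlation between two numeric columns of a DataFrame."""
    return frame[x].corr(frame[y])


r = factor_correlation(top_generosity, "Generosity", "GDP")
print(f"Pearson r (top-10 generosity rows only): {r:.2f}")
```

To test the relationship properly, the same helper could be applied to the result of `run_query` on the whole `data_2022` table without `TOP 10`, after making sure both columns are numeric (for example with `astype(float)`).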

### A2. Or is it dependent on social support?

In [12]:
run_query("SELECT TOP 10 Country,Explained_by_Generosity AS Generosity ,Explained_by_Social_support AS Social_support FROM data_2022 ORDER BY Explained_by_Social_support DESC")

Unnamed: 0,Country,Generosity,Social_support
0,Iceland,0.27,1.32
1,Turkmenistan*,0.31,1.32
2,Czechia,0.16,1.26
3,Finland,0.11,1.26
4,Slovenia,0.12,1.25
5,Denmark,0.19,1.24
6,New Zealand,0.25,1.24
7,Norway,0.22,1.24
8,Estonia,0.12,1.23
9,Hungary,0.08,1.23


According to the above query, social support contributes to generosity, but GDP appears to be the bigger contributor, since the countries with the highest social support are not necessarily the most generous (Hungary, for example, scores 1.23 on social support but only 0.08 on generosity).

### B. Which countries have the worst social support and how is happiness affected?

In [9]:
run_query("SELECT TOP 10 Country, Explained_by_Social_support AS Social_support, Happiness_score FROM data_2022 WHERE Happiness_score IS NOT NULL ORDER BY 2,3 DESC")

Unnamed: 0,Country,Social_support,Happiness_score
0,Afghanistan,0.0,2.4
1,Benin,0.06,4.62
2,Rwanda*,0.13,3.27
3,Morocco,0.27,5.06
4,Malawi,0.28,3.75
5,Togo,0.32,4.11
6,India,0.38,3.78
7,Congo,0.41,5.07
8,Pakistan,0.41,4.52
9,Sierra Leone,0.42,3.57


Social support appears to be an important factor: most countries with the lowest social support scores also have low happiness scores. Morocco (5.06) and Congo (5.07) are odd cases where happiness is moderate despite weak social support; GDP or other factors may be helping their case.

### C. Does GDP influence people's freedom to make life choices?

In [18]:
run_query("SELECT TOP 10 Country, Explained_by_Freedom_to_make_life_choices AS Freedom_to_make_life_choices, Explained_by_GDP_per_capita AS GDP FROM data_2022 WHERE Explained_by_Freedom_to_make_life_choices IS NOT NULL ORDER BY 2 DESC")

Unnamed: 0,Country,Freedom_to_make_life_choices,GDP
0,Cambodia,0.74,1.02
1,Finland,0.74,1.89
2,Norway,0.73,2.0
3,Sweden,0.72,1.92
4,Uzbekistan,0.72,1.22
5,Iceland,0.72,1.94
6,Denmark,0.72,1.95
7,Vietnam,0.71,1.25
8,United Arab Emirates,0.7,2.0
9,Luxembourg*,0.7,2.21


#### The table above shows the top ten countries by freedom to make life choices, along with their respective GDP scores.
#### The table below shows the ten countries with the lowest GDP scores, along with their respective freedom to make life choices.

In [17]:
run_query("SELECT TOP 10 Country, Explained_by_Freedom_to_make_life_choices AS Freedom_to_make_life_choices, Explained_by_GDP_per_capita AS GDP FROM data_2022 WHERE Explained_by_Freedom_to_make_life_choices IS NOT NULL ORDER BY 3")

Unnamed: 0,Country,Freedom_to_make_life_choices,GDP
0,Venezuela,0.28,0.0
1,Niger*,0.57,0.57
2,Mozambique,0.59,0.58
3,Liberia*,0.41,0.64
4,Malawi,0.48,0.65
5,Chad*,0.18,0.66
6,Madagascar*,0.2,0.67
7,Yemen*,0.33,0.69
8,Sierra Leone,0.39,0.69
9,Afghanistan,0.0,0.76


GDP does seem to affect people's freedom to make life choices: the ten countries with the lowest GDP scores have freedom scores between 0.0 and 0.59, well below the 0.70-0.74 range of the top ten. Other factors could be involved, but the two tables suggest a correlation between GDP and freedom to make life choices.
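One way to make this comparison explicit is to split countries at the median GDP score and average the freedom score in each half. The sketch below is illustrative only: it reuses the twenty rows shown in the two tables above, which were selected from opposite ends of the data, and `mean_by_gdp_half` is a hypothetical helper name.

```python
import pandas as pd

# Rows copied from the two query outputs above (top freedom, then lowest GDP).
rows = [
    ("Cambodia", 0.74, 1.02), ("Finland", 0.74, 1.89), ("Norway", 0.73, 2.0),
    ("Sweden", 0.72, 1.92), ("Uzbekistan", 0.72, 1.22), ("Iceland", 0.72, 1.94),
    ("Denmark", 0.72, 1.95), ("Vietnam", 0.71, 1.25),
    ("United Arab Emirates", 0.7, 2.0), ("Luxembourg*", 0.7, 2.21),
    ("Venezuela", 0.28, 0.0), ("Niger*", 0.57, 0.57), ("Mozambique", 0.59, 0.58),
    ("Liberia*", 0.41, 0.64), ("Malawi", 0.48, 0.65), ("Chad*", 0.18, 0.66),
    ("Madagascar*", 0.2, 0.67), ("Yemen*", 0.33, 0.69),
    ("Sierra Leone", 0.39, 0.69), ("Afghanistan", 0.0, 0.76),
]
sample = pd.DataFrame(rows, columns=["Country", "Freedom", "GDP"])


def mean_by_gdp_half(frame, value_col="Freedom", gdp_col="GDP"):
    """Split rows at the median GDP score and average value_col in each half."""
    halves = pd.qcut(frame[gdp_col], q=2, labels=["low_gdp", "high_gdp"])
    return frame.groupby(halves, observed=True)[value_col].mean()


freedom_by_gdp = mean_by_gdp_half(sample)
print(freedom_by_gdp.round(2))
```

Because these twenty rows were picked by the queries rather than sampled at random, the gap they show is exaggerated; running the same function on the full table would give a fairer picture.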

## Comparison with 2020, Year of Covid-19
---
We now look at 2020 to see how COVID-19 affected people's happiness and how things have changed since then:

### A. How has happiness changed since 2020?

In [45]:
run_query("SELECT TOP 10 Country, Happiness_score AS Happiness2022, Ladder_score AS Happiness2020, ROUND(((Happiness_score-Ladder_score)/Ladder_score)*100,2) AS Percent_Change FROM data_2022 JOIN data_2020 ON Country_name= Country ORDER BY 2 DESC")

Unnamed: 0,Country,Happiness2022,Happiness2020,Percent_Change
0,Finland,7.82,7.81,0.13
1,Denmark,7.64,7.65,-0.13
2,Iceland,7.56,7.5,0.8
3,Switzerland,7.51,7.56,-0.66
4,Netherlands,7.41,7.45,-0.54
5,Sweden,7.38,7.35,0.41
6,Israel,7.36,7.13,3.23
7,Norway,7.36,7.49,-1.74
8,New Zealand,7.2,7.3,-1.37
9,Austria,7.16,7.29,-1.78


Among the ten happiest countries, happiness seems to be mostly in decline: six scored lower in 2022 than in 2020 and only four scored higher. This could be due to the after-effects of COVID, which hurt many economies, and to the recession that many countries are currently going through.
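The percent-change calculation in the query above can also be reproduced in pandas, which makes it easy to normalise country names before joining. The sketch below uses three rows from the output above; `happiness_change` is a hypothetical helper, and stripping the trailing `*` assumes that the 2020 table stores country names without that marker.

```python
import pandas as pd

# Three rows taken from the query output above.
d2022 = pd.DataFrame({
    "Country": ["Finland", "Denmark", "Iceland"],
    "Happiness_score": [7.82, 7.64, 7.56],
})
d2020 = pd.DataFrame({
    "Country_name": ["Finland", "Denmark", "Iceland"],
    "Ladder_score": [7.81, 7.65, 7.50],
})


def happiness_change(df_2022, df_2020):
    """Join the two years on country name and compute the percent change."""
    left = df_2022.copy()
    # Names such as "Rwanda*" carry a trailing marker in the 2022 table.
    # Assumption: the 2020 table uses the plain name, so strip the marker first.
    left["Country"] = left["Country"].str.rstrip("*")
    merged = left.merge(df_2020, left_on="Country", right_on="Country_name", how="inner")
    # Same formula as the SQL query: (new - old) / old * 100, rounded to 2 places.
    merged["Percent_Change"] = (
        (merged["Happiness_score"] - merged["Ladder_score"]) / merged["Ladder_score"] * 100
    ).round(2)
    return merged[["Country", "Happiness_score", "Ladder_score", "Percent_Change"]]


change = happiness_change(d2022, d2020)
print(change)
```

An inner join silently drops countries whose names do not match in both tables, so comparing the row count of the result with the row counts of the inputs is a quick check for lost countries.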

### B. Which countries have recovered the best and improved their GDP since then?

In [46]:
run_query("SELECT TOP 10 Country, Explained_by_GDP_per_capita AS GDP2022, Explained_by_Log_GDP_per_capita AS GDP2020, ROUND(((Explained_by_GDP_per_capita-Explained_by_Log_GDP_per_capita)/Explained_by_Log_GDP_per_capita)*100,2) AS Percent_Change FROM data_2022 JOIN data_2020 ON Country_name= Country ORDER BY 4 DESC")

Unnamed: 0,Country,GDP2022,GDP2020,Percent_Change
0,Malawi,0.65,0.18,261.11
1,Mozambique,0.58,0.18,222.22
2,Sierra Leone,0.69,0.24,187.5
3,Togo,0.77,0.27,185.19
4,Burkina Faso,0.78,0.3,160.0
5,Afghanistan,0.76,0.3,153.33
6,Uganda,0.78,0.31,151.61
7,Benin,0.93,0.37,151.35
8,Ethiopia,0.79,0.32,146.88
9,Mali,0.79,0.35,125.71


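The table above shows the countries whose GDP score has improved the most since 2020. Most of them are small economies; the large economies are absent from the list, which suggests they have not bounced back to the same extent. Note that these percentages compare the "Explained by" GDP scores of two different report years rather than actual GDP figures, so they should be read with caution.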

## Depression effect on happiness
---

### Which countries have the most depression and how is happiness affected by it?

In [33]:
run_query("SELECT TOP 10 Country_name AS Country, Social_support, ROUND(Share,2) AS Depression_share, Ladder_score AS Happiness FROM data_2020 JOIN depression ON Country_name = Entity ORDER BY 3 DESC, 4 DESC")

Unnamed: 0,Country,Social_support,Depression_share,Happiness
0,Peru,0.83,49.3,5.8
1,Ecuador,0.84,42.7,5.93
2,Zambia,0.7,40.4,3.76
3,Bolivia,0.8,38.4,5.75
4,Dominican Republic,0.88,38.4,5.69
5,Venezuela,0.89,38.2,5.05
6,El Salvador,0.81,38.1,6.35
7,Chile,0.88,37.3,6.23
8,Cameroon,0.7,36.3,5.08
9,Nicaragua,0.86,35.3,6.14


Countries with the most depression have social support values between 0.70 and 0.89, and their happiness scores appear to have taken a hit. Note that the `Social_support` column used here is a different measure from the `Explained_by_Social_support` scores seen earlier (where 1.2 and 1.3 were high values), so the two should not be compared directly. Zambia and Ecuador have a similar share of depression, yet their happiness scores differ widely (3.76 versus 5.93). We can look into which other factors are contributing to Zambia's lower happiness.


In [34]:
run_query("SELECT Country_name AS Country ,Explained_by_Log_GDP_per_capita AS GDP, Explained_by_Social_support, Ladder_score AS Happiness, Explained_by_Generosity, Explained_by_Freedom_to_make_life_choices, ROUND(Share,2) AS Depression_Share FROM data_2020 JOIN depression ON Country_name = Entity WHERE Entity IN ('Ecuador' , 'Zambia') ORDER BY 4 DESC")

Unnamed: 0,Country,GDP,Explained_by_Social_support,Happiness,Explained_by_Generosity,Explained_by_Freedom_to_make_life_choices,Depression_Share
0,Ecuador,0.85,1.22,5.93,0.12,0.56,42.7
1,Zambia,0.54,0.9,3.76,0.25,0.49,40.4


GDP has a larger effect on happiness, and Ecuador's GDP score (0.85) is considerably higher than Zambia's (0.54), as is its social support score (1.22 versus 0.9). This explains why their happiness scores are so different even though their depression levels are similar.

In [49]:
cursor.close()
conn.close()

## Conclusion
---

Throughout this project, we used the 2020 and 2022 World Happiness Report data to examine how happy countries around the world are and which factors contribute to their happiness. Are those factors related or independent? How has COVID affected the world since 2020, and have people recovered since then? How does depression affect people's overall happiness?
Our analysis is based on quantitative indicators like happiness score and all the factors that contribute to making the score.

>- Perception of generosity depends more on GDP per capita than it does on the social support of the country. <br>
>- GDP does affect people's freedom to make life choices. <br>
>- Happiness seems to be mostly in decline, with very few countries being happier since 2020. <br>
>- Countries with the most depression are smaller ones that have an average amount of social support, and their happiness scores have suffered because of that.
  
## Limitation
---
  
- The data has different variables across years, which makes the comparison difficult. For example, the 2020 data has a Log GDP score, while the 2022 data only has "Explained by: GDP per capita", which is the portion of the happiness score attributed to GDP rather than the actual score calculated from GDP.