# 🌍 World Happiness Decoded by AI - Data Analysis

**Author:** Haidar Dagham

**Year:** 2025

**Skills, Tools & Technologies Used:**
- Python Libraries: `pandas`, `numpy`, `seaborn`, `plotly`
- Notebook Environments: Jupyter Notebook, Google Colab
- Cloud Platforms: IBM Watsonx.ai
- BI & Visualization Tools: IBM Cognos Analytics, PowerPoint
- AI Tools: ChatGPT, DeepSeek
- Version Control: Git & GitHub



This notebook uses generative AI to conduct a comprehensive data analysis on the World Happiness Report 2016 dataset. Tasks include data cleaning, exploratory data analysis (EDA), visual storytelling, and dashboard creation to uncover patterns affecting global happiness.



## 📌 Project Objectives
- Investigate whether economic, demographic, or regional factors contribute to happiness.
- Use generative AI prompts to build and test data analysis and visualization code.
- Deliver insights via a dashboard and presentation.

## ✅ Key Visualizations in This Project
- Bar chart of GDP per capita and life expectancy for the top 10 countries by GDP per capita
- Correlation heatmap of happiness-related factors
- Scatter plot of GDP vs Happiness Score per region
- Pie chart of Happiness Score by region
- Choropleth map showing GDP per capita with life expectancy tooltip
- Interactive dashboard with multiple visualizations

## Prepare and Explore the Dataset


## 📥 1. Data Loading

Install the required libraries.


In [None]:
%pip install pandas numpy seaborn plotly



In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.io as pio

## Load the dataset


Use the pandas function `read_csv()` to load the data into a DataFrame.


In [None]:
file_path = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-AI0272EN-SkillsNetwork/labs/dataset/2016.csv"
!wget {file_path}

df = pd.read_csv('2016.csv')
# Set pandas option to display all columns
pd.set_option('display.max_columns', None)

--2025-05-20 07:03:04--  https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-AI0272EN-SkillsNetwork/labs/dataset/2016.csv
Resolving cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)... 169.45.118.108
Connecting to cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud (cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud)|169.45.118.108|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17062 (17K) [text/csv]
Saving to: ‘2016.csv’


2025-05-20 07:03:04 (3.46 MB/s) - ‘2016.csv’ saved [17062/17062]



## Explore the dataset


Display the first 5 rows of the dataset.


In [None]:
df.head()

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Lower Confidence Interval,Upper Confidence Interval,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
0,Denmark,Western Europe,1,7.526,7.46,7.592,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,2.73939
1,Switzerland,Western Europe,2,7.509,7.428,7.59,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,2.69463
2,Iceland,Western Europe,3,7.501,7.333,7.669,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,2.83137
3,Norway,Western Europe,4,7.498,7.421,7.575,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895,2.66465
4,Finland,Western Europe,5,7.413,7.351,7.475,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492,2.82596


# Data Analysis Using Generative AI

In [None]:
# Path to the CSV file downloaded above
file_path = '2016.csv'

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(file_path)

# Print the first 5 rows of the DataFrame
df.head()

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Lower Confidence Interval,Upper Confidence Interval,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
0,Denmark,Western Europe,1,7.526,7.46,7.592,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,2.73939
1,Switzerland,Western Europe,2,7.509,7.428,7.59,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,2.69463
2,Iceland,Western Europe,3,7.501,7.333,7.669,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,2.83137
3,Norway,Western Europe,4,7.498,7.421,7.575,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895,2.66465
4,Finland,Western Europe,5,7.413,7.351,7.475,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492,2.82596


In [None]:
df.dtypes

Unnamed: 0,0
Country,object
Region,object
Happiness Rank,int64
Happiness Score,float64
Lower Confidence Interval,float64
Upper Confidence Interval,object
Economy (GDP per Capita),object
Family,float64
Health (Life Expectancy),object
Freedom,object


In [None]:
# Numeric columns that pandas read as object dtype
text_columns = ['Upper Confidence Interval', 'Economy (GDP per Capita)',
                'Health (Life Expectancy)', 'Freedom']

for column in text_columns:
    # Remove leading and trailing whitespace from the values
    df[column] = df[column].str.strip()

    # Replace the empty strings left behind with NaN
    df[column] = df[column].replace('', np.nan)
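
As an alternative to stripping and replacing empty strings by hand, `pd.to_numeric` with `errors="coerce"` can clean and convert a column in one step. The sketch below is illustrative only: the `raw` frame and its whitespace-padded values are made up for the example and are not taken from `2016.csv`.

```python
import pandas as pd

# Hypothetical sample: numbers stored as text, one cell holding only whitespace
raw = pd.DataFrame({"Freedom": ["0.57941", " 0.58557 ", " ", "0.59609"]})

# Strip whitespace, then coerce anything non-numeric (including "") to NaN
clean = pd.to_numeric(raw["Freedom"].str.strip(), errors="coerce")

print(clean.dtype)         # float64
print(clean.isna().sum())  # 1
```

With this approach the column is already `float64` afterwards, so a later `astype('float64')` would not be needed for it.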

In [None]:
null_counts = df.isnull().sum()

print(null_counts)

Country                          0
Region                           0
Happiness Rank                   0
Happiness Score                  0
Lower Confidence Interval        4
Upper Confidence Interval        3
Economy (GDP per Capita)         2
Family                           0
Health (Life Expectancy)         3
Freedom                          1
Trust (Government Corruption)    0
Generosity                       0
Dystopia Residual                0
dtype: int64


In [None]:
# Identify columns with missing values
missing_value_columns = df.columns[df.isnull().any()]


# Replace missing values with the mean of the column
for column in missing_value_columns:
    # Convert the column to float64
    df[column] = df[column].astype('float64')

    # Calculate the mean of the column
    mean_value = df[column].mean()

    # Replace missing values with the mean
    df[column] = df[column].fillna(mean_value)
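
The loop above can also be written as a single `fillna` call, since `fillna` accepts a Series of per-column fill values. The sketch below uses a small made-up frame (`sample` is hypothetical, not the happiness data) and also shows one side effect worth keeping in mind: mean imputation preserves the column mean but reduces its spread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value per column
sample = pd.DataFrame({
    "Freedom": [0.5, np.nan, 0.7],
    "Family": [1.0, 1.2, np.nan],
})

# fillna accepts a Series of per-column fill values, here the column means
filled = sample.fillna(sample.mean())

# Mean imputation leaves each column mean unchanged but shrinks its spread
std_before = sample["Freedom"].std()
std_after = filled["Freedom"].std()

print(filled)
print(std_before, std_after)
```

With only a handful of missing values per column, as in this dataset, the effect on the spread is likely small, but it can matter for correlation estimates when many values are imputed.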


In [None]:
null_counts = df.isnull().sum()

print(null_counts)

Country                          0
Region                           0
Happiness Rank                   0
Happiness Score                  0
Lower Confidence Interval        0
Upper Confidence Interval        0
Economy (GDP per Capita)         0
Family                           0
Health (Life Expectancy)         0
Freedom                          0
Trust (Government Corruption)    0
Generosity                       0
Dystopia Residual                0
dtype: int64


Using AI: Write Python code that does the following with the latest pandas API:

Convert every column to the most suitable type for our analysis.

In [None]:
df = df.convert_dtypes()
df.dtypes

Unnamed: 0,0
Country,string[python]
Region,string[python]
Happiness Rank,Int64
Happiness Score,Float64
Lower Confidence Interval,Float64
Upper Confidence Interval,Float64
Economy (GDP per Capita),Float64
Family,Float64
Health (Life Expectancy),Float64
Freedom,Float64
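
To make the effect of `convert_dtypes()` easier to see, here is a minimal sketch on a three-row frame built from values in the `df.head()` output above (the `sample` name is an assumption for the example). It switches NumPy dtypes to pandas' nullable extension dtypes, which use `pd.NA` for missing values.

```python
import pandas as pd

sample = pd.DataFrame({
    "Happiness Rank": [1, 2, 3],
    "Happiness Score": [7.526, 7.509, 7.501],
})

# int64 becomes nullable Int64, float64 becomes nullable Float64
converted = sample.convert_dtypes()

print(sample.dtypes)
print(converted.dtypes)
```

Note that `convert_dtypes()` does not parse numbers stored as text; such columns become string columns, which is why the numeric conversion had to happen in the cleaning step before this call.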


In [None]:
# Sort the DataFrame by GDP per capita to get the top 10 countries
top_10_countries = df.nlargest(10, 'Economy (GDP per Capita)')

# Create a bar chart using Plotly
fig1 = px.bar(top_10_countries,
              x='Country',
              y=['Economy (GDP per Capita)', 'Health (Life Expectancy)'],
              barmode='group',
              title='GDP per Capita and Healthy Life Expectancy of Top 10 Countries')

# Show the figure
fig1.show()
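
As a quick check of what `nlargest` selects, the sketch below applies it to the five rows shown by `df.head()` and compares it with the equivalent sort-then-head form. The `head` frame is rebuilt by hand here for illustration.

```python
import pandas as pd

# First five rows of the dataset, as shown by df.head() above
head = pd.DataFrame({
    "Country": ["Denmark", "Switzerland", "Iceland", "Norway", "Finland"],
    "Economy (GDP per Capita)": [1.44178, 1.52733, 1.42666, 1.57744, 1.40598],
})

# nlargest(n, column) keeps the n rows with the highest values in that column
top3 = head.nlargest(3, "Economy (GDP per Capita)")

# Equivalent form: sort descending, then take the first n rows
same = head.sort_values("Economy (GDP per Capita)", ascending=False).head(3)

print(top3["Country"].tolist())  # ['Norway', 'Switzerland', 'Denmark']
```

The selection is by GDP per capita, not by `Happiness Rank`, so the countries in the bar chart are not necessarily the ten happiest ones.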

In [None]:
# Create a sub-dataset with the specified attributes
sub_dataset = df[['Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)',
                  'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Happiness Score']]

# Display the first few rows of the sub-dataset
sub_dataset.head()

Unnamed: 0,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Happiness Score
0,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,7.526
1,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,7.509
2,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,7.501
3,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895,7.498
4,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492,7.413


Using AI: Write Python code that performs the following actions:

Compute the correlation between the attributes in the sub-dataset and show it as a Plotly heatmap named fig2 with width 800 and height 600.

In [None]:
# Calculate the correlation matrix
correlation_matrix = sub_dataset.corr()

# Create a heatmap using Plotly
fig2 = px.imshow(correlation_matrix,
                 labels=dict(x="Attributes", y="Attributes", color="Correlation"),
                 x=correlation_matrix.columns,
                 y=correlation_matrix.columns,
                 color_continuous_scale='Viridis',
                 title='Correlation Heatmap of sub-dataset Attributes')

# Update layout for the heatmap
fig2.update_layout(width=800, height=600)

# Show the figure
fig2.show()
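
Each cell of the heatmap is a Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. As a rough illustration, the sketch below computes one coefficient by hand for the five rows shown by `df.head()` and compares it with the pandas result. Five rows are far too few to say anything about the full dataset; the point is only to show the calculation.

```python
import numpy as np
import pandas as pd

# Two columns from the first five rows of the dataset
head = pd.DataFrame({
    "Economy (GDP per Capita)": [1.44178, 1.52733, 1.42666, 1.57744, 1.40598],
    "Happiness Score": [7.526, 7.509, 7.501, 7.498, 7.413],
})

x = head["Economy (GDP per Capita)"].to_numpy()
y = head["Happiness Score"].to_numpy()

# Pearson r = covariance(x, y) / (std(x) * std(y))
dx, dy = x - x.mean(), y - y.mean()
manual_r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same value taken from the pandas correlation matrix
pandas_r = head.corr().loc["Economy (GDP per Capita)", "Happiness Score"]

print(manual_r, pandas_r)
```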

In [None]:
# Create a scatter plot using Plotly
fig3 = px.scatter(df,
                  x='Economy (GDP per Capita)',
                  y='Happiness Score',
                  color='Region',
                  title='Scatter Plot of Happiness Score vs GDP per Capita',
                  labels={'Economy (GDP per Capita)': 'GDP per Capita', 'Happiness Score': 'Happiness Score'})

# Show the figure
fig3.show()

In [None]:
# Aggregate the Happiness Score by Region
region_happiness = df.groupby('Region')['Happiness Score'].sum().reset_index()

# Create a pie chart using Plotly
fig4 = px.pie(region_happiness,
              names='Region',
              values='Happiness Score',
              title='Happiness Score by Region')

# Show the figure
fig4.show()
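
Because the pie chart uses the sum of scores per region, the number of countries in a region affects the slice size. The toy example below (entirely made-up regions and scores, not values from the dataset) shows how a region with more countries can have the larger sum despite a lower mean.

```python
import pandas as pd

# Hypothetical toy data: Region A has more countries but lower scores
toy = pd.DataFrame({
    "Region": ["Region A"] * 4 + ["Region B"] * 2,
    "Happiness Score": [4.0, 4.0, 4.0, 4.0, 7.0, 7.0],
})

# Compare the aggregate used for the pie chart (sum) with the mean and count
summary = toy.groupby("Region")["Happiness Score"].agg(["sum", "mean", "count"])

print(summary)
```

Here Region A sums to 16.0 against 14.0 for Region B, even though its mean is 4.0 against 7.0. If the goal is to compare how happy regions are, `mean()` may be the more suitable aggregate, though it would fit a bar chart better than a pie chart.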

In [None]:
# Create a choropleth map using Plotly
fig5 = px.choropleth(df,
                     locations='Country',
                     locationmode='country names',
                     color='Economy (GDP per Capita)',
                     hover_name='Country',
                     hover_data={'Economy (GDP per Capita)': True, 'Health (Life Expectancy)': True},
                     title='GDP per Capita and Healthy Life Expectancy by Country',
                     color_continuous_scale='Viridis')

# Show the figure
fig5.show()

In [None]:
# Create a heatmap using Plotly
fig2 = px.imshow(correlation_matrix,
                 labels=dict(x="Attributes", y="Attributes", color="Correlation"),
                 x=correlation_matrix.columns,
                 y=correlation_matrix.columns,
                 color_continuous_scale='Viridis',
                 title='Correlation Heatmap of sub-dataset Attributes')
fig2.update_layout(width=800, height=600)

fig3 = px.scatter(df,
                  x='Economy (GDP per Capita)',
                  y='Happiness Score',
                  color='Region',
                  title='Scatter Plot of Happiness Score vs GDP per Capita',
                  labels={'Economy (GDP per Capita)': 'GDP per Capita', 'Happiness Score': 'Happiness Score'})

# Create a pie chart using Plotly
fig4 = px.pie(region_happiness,
              names='Region',
              values='Happiness Score',
              title='Happiness Score by Region')

# Create a choropleth map using Plotly
fig5 = px.choropleth(df,
                     locations='Country',
                     locationmode='country names',
                     color='Economy (GDP per Capita)',
                     hover_name='Country',
                     hover_data={'Economy (GDP per Capita)': True, 'Health (Life Expectancy)': True},
                     title='GDP per Capita and Healthy Life Expectancy by Country',
                     color_continuous_scale='Viridis')

# Convert each figure to an HTML fragment
html_figures = [pio.to_html(fig, full_html=False) for fig in [fig2, fig3, fig4, fig5]]

# Combine the fragments into a single HTML document
full_html = "<html><head></head><body>" + "".join(html_figures) + "</body></html>"

# Write the combined HTML to a file; UTF-8 avoids platform-dependent encoding errors
with open('dashboard.html', 'w', encoding='utf-8') as f:
    f.write(full_html)

# Optionally, open the file in the default web browser
# (returns False when no browser could be launched, e.g. on a hosted notebook)
import webbrowser
webbrowser.open('dashboard.html')

False

# Narrative for World Happiness Report Dashboard


# Introduction

This dashboard provides an in-depth exploration of the World Happiness Report 2016 data, offering visual insights into the multifaceted aspects that influence happiness at a global, regional, and country-specific level. The dashboard is divided into four main sections, each featuring distinct visualizations that collectively tell a comprehensive story about the state of global happiness.


# 1. Heatmap of Correlations
Visualization: Correlation Heatmap

Purpose: The heatmap serves as a foundation, providing a visual summary of the linear relationships among key factors affecting happiness—Economy (GDP per Capita), Family, Health (Life Expectancy), Freedom, Trust (Government Corruption), Generosity, and Happiness Score.


### Key Findings:

- With the Viridis color scale used here, brighter (yellow) cells mark higher correlation values and darker (purple) cells mark lower ones. Negative values indicate inverse relationships.
- The heatmap shows that `Economy (GDP per Capita)` has the strongest positive correlation with `Happiness Score`, suggesting a significant economic contribution to overall well-being.
- `Health (Life Expectancy)` and `Freedom` also emerge as strongly correlated factors, underlining the importance of health and freedom for happiness. Correlation alone does not establish that these factors cause higher happiness.



# 2. Scatter Plot - GDP per Capita vs Happiness Score by Region
Visualization: Scatter Plot with Regional Segmentation

Purpose: This plot explores the relationship between a country's GDP per capita and its Happiness Score, with points colored by the dataset's `Region` column (for example, Western Europe).


### Key Findings:

- The scatter plot reveals clusters and outliers, including countries where high GDP does not translate into a high happiness score (e.g., certain countries in Asia) and countries that report relatively high happiness despite modest GDP. The Nordic countries at the top of the ranking combine high happiness with GDP scores of roughly 1.4 to 1.6.
- A clear distinction is visible between developed and developing economies, yet the exceptions show that the relationship depends on more than economic factors.


# 3. Pie Chart - Happiness Score Distribution by Region
Visualization: Pie Chart

Purpose: This chart offers a simplified, comparative view of the aggregated Happiness Scores across different regions, normalized to percentages.


### Key Findings:

- The chart shows each region's share of the summed happiness scores, pointing to differences between regions.
- Because each slice is the sum of the scores of all countries in a region, slice size reflects the number of countries in the region as well as their scores; a large slice does not by itself mean a high average score.
- Read with that caveat, the chart gives a geographic overview of where the scores are concentrated.


# 4. Interactive Map - GDP per Capita with Healthy Life Expectancy Tooltip
Visualization: Interactive Map with Tooltips

Purpose: The map visualizes the GDP per Capita of each country and, upon hovering, displays Healthy Life Expectancy.


### Key Findings:

- The geographical distribution of GDP per capita underscores the economic divide across the globe.
- The Healthy Life Expectancy tooltip draws attention to each country's health profile and reveals where wealth and health diverge.
- The interactive map lets users explore and compare countries on both economic and health metrics, supporting a deeper understanding of the multidimensional nature of happiness.


# Conclusion

Collectively, these visualizations from the World Happiness Report 2016 dataset offer a nuanced exploration of what drives happiness globally. They highlight the multifaceted nature of happiness