Write a Python code that can perform the following tasks:
1. Read the CSV file, located on a given file path, into a pandas data frame, assuming that the first row of the file can be used as the headers for the data.
2. Print the first 5 rows of the dataframe to verify correct loading.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMSkillsNetwork-AI0272EN-SkillsNetwork/labs/dataset/2016.csv"

In [3]:
df = pd.read_csv(URL)
df.head()

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Lower Confidence Interval,Upper Confidence Interval,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),Generosity,Dystopia Residual
0,Denmark,Western Europe,1,7.526,7.46,7.592,1.44178,1.16374,0.79504,0.57941,0.44453,0.36171,2.73939
1,Switzerland,Western Europe,2,7.509,7.428,7.59,1.52733,1.14524,0.86303,0.58557,0.41203,0.28083,2.69463
2,Iceland,Western Europe,3,7.501,7.333,7.669,1.42666,1.18326,0.86733,0.56624,0.14975,0.47678,2.83137
3,Norway,Western Europe,4,7.498,7.421,7.575,1.57744,1.1269,0.79579,0.59609,0.35776,0.37895,2.66465
4,Finland,Western Europe,5,7.413,7.351,7.475,1.40598,1.13464,0.81091,0.57104,0.41004,0.25492,2.82596


Write Python code that performs the following task:
1. Check the data types of the columns and verify that they are correct.

In [4]:
# Check data types of all columns
print("Data types of columns:")
print(df.dtypes)
print("\n" + "="*50 + "\n")

# Get more detailed information about the dataframe
print("Detailed information about the dataframe:")
df.info()
print("\n" + "="*50 + "\n")

# Check for any potential data type issues by examining sample values
print("Sample values for columns that should be numeric but are showing as 'object':")
object_columns = df.select_dtypes(include=['object']).columns
for col in object_columns:
    if col not in ['Country', 'Region']:  # These should legitimately be object/string type
        print(f"\n{col}:")
        print(f"Sample values: {df[col].head().tolist()}")
        print(f"Unique values count: {df[col].nunique()}")
        # A value counts as non-numeric if it is present but cannot be parsed as a number
        unparsable = pd.to_numeric(df[col].astype(str).str.strip(), errors='coerce').isna() & df[col].notna()
        print(f"Any non-numeric values: {unparsable.any()}")

Data types of columns:
Country                           object
Region                            object
Happiness Rank                     int64
Happiness Score                  float64
Lower Confidence Interval        float64
Upper Confidence Interval         object
Economy (GDP per Capita)          object
Family                           float64
Health (Life Expectancy)          object
Freedom                           object
Trust (Government Corruption)    float64
Generosity                       float64
Dystopia Residual                float64
dtype: object


Detailed information about the dataframe:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 157 entries, 0 to 156
Data columns (total 13 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Country                        157 non-null    object 
 1   Region                         157 non-null    object 
 2   Happiness Rank                 157 n
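
One way to flag text columns that are really numeric is to attempt the conversion and count what fails. The sketch below runs on a small made-up frame; the column values are illustrative rather than taken from the dataset, and `numeric_like_report` is a hypothetical helper name.

```python
import pandas as pd

def numeric_like_report(frame: pd.DataFrame) -> pd.DataFrame:
    """For each non-numeric column, count non-blank values that fail to parse as numbers."""
    rows = []
    for col in frame.columns:
        if pd.api.types.is_numeric_dtype(frame[col]):
            continue  # already numeric, nothing to check
        # Drop missing values, trim whitespace, and ignore blank strings
        values = frame[col].dropna().astype(str).str.strip()
        values = values[values != ""]
        parsed = pd.to_numeric(values, errors="coerce")
        rows.append({
            "column": col,
            "non_blank": len(values),
            "not_numeric": int(parsed.isna().sum()),
        })
    return pd.DataFrame(rows)

# Illustrative data: "Score" is numeric text with padding and one blank entry
demo = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Score": ["1.5", " 2.0 ", ""],
    "Rank": [1, 2, 3],
})
report = numeric_like_report(demo)
print(report)
```

A column whose `not_numeric` count is zero is a reasonable candidate for `pd.to_numeric`; a column where every value fails, such as `Country` here, is genuinely text.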

Write a python code to do the following tasks as per latest pandas:
1. Remove leading and trailing whitespaces from the values in a column.
2. Clean a column in a DataFrame by replacing empty strings with NaN values.
3. Change the data type of the columns to appropriate type as per the latest version of pandas.

In [5]:
# Task 1: Remove leading and trailing whitespace from all text (object) columns
object_columns = df.select_dtypes(include=['object']).columns
for col in object_columns:
    # .str.strip() keeps missing values as NaN instead of turning them into the string 'nan'
    df[col] = df[col].str.strip()

# Task 2: Replace empty strings with NaN values
df = df.replace('', np.nan)

# Task 3: Convert columns to appropriate data types
# Convert numeric columns that are currently object type to float
numeric_columns = ['Upper Confidence Interval', 'Economy (GDP per Capita)', 
                  'Health (Life Expectancy)', 'Freedom']

for col in numeric_columns:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Verify the changes
print("Updated data types:")
print(df.dtypes)
print("\nInfo after cleaning:")
df.info()

Updated data types:
Country                           object
Region                            object
Happiness Rank                     int64
Happiness Score                  float64
Lower Confidence Interval        float64
Upper Confidence Interval        float64
Economy (GDP per Capita)         float64
Family                           float64
Health (Life Expectancy)         float64
Freedom                          float64
Trust (Government Corruption)    float64
Generosity                       float64
Dystopia Residual                float64
dtype: object

Info after cleaning:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 157 entries, 0 to 156
Data columns (total 13 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Country                        157 non-null    object 
 1   Region                         157 non-null    object 
 2   Happiness Rank                 157 non-null    int64  
 3   H

Write a python code that performs the following tasks as per latest pandas:
1. Identify the columns of a data frame with missing values.
2. Replace the missing values thus identified with mean values of the column.

In [6]:
# Task 1: Identify columns with missing values
columns_with_missing = df.columns[df.isnull().any()].tolist()
print("Columns with missing values:")
print(columns_with_missing)
print(f"\nNumber of missing values in each column:")
print(df.isnull().sum()[df.isnull().sum() > 0])

# Task 2: Replace missing values with mean values for numeric columns only
numeric_df = df.select_dtypes(include=[np.number])
numeric_cols_with_missing = numeric_df.columns[numeric_df.isnull().any()]

for col in numeric_cols_with_missing:
    mean_value = df[col].mean()
    # Assign the result back; chained df[col].fillna(..., inplace=True) is deprecated
    df[col] = df[col].fillna(mean_value)
    print(f"Replaced missing values in '{col}' with mean: {mean_value:.5f}")

# Verify that missing values have been replaced
print("\nMissing values after replacement:")
print(df.isnull().sum()[df.isnull().sum() > 0])

Columns with missing values:
['Lower Confidence Interval', 'Upper Confidence Interval', 'Economy (GDP per Capita)', 'Health (Life Expectancy)', 'Freedom']

Number of missing values in each column:
Lower Confidence Interval    4
Upper Confidence Interval    3
Economy (GDP per Capita)     2
Health (Life Expectancy)     3
Freedom                      1
dtype: int64
Replaced missing values in 'Lower Confidence Interval' with mean: 5.26864
Replaced missing values in 'Upper Confidence Interval' with mean: 5.47275
Replaced missing values in 'Economy (GDP per Capita)' with mean: 0.95177
Replaced missing values in 'Health (Life Expectancy)' with mean: 0.55334
Replaced missing values in 'Freedom' with mean: 0.37100

Missing values after replacement:
Series([], dtype: int64)
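
Mean imputation can also be written without a loop or `inplace=True`, by passing a Series of column means to `DataFrame.fillna`. The following is a minimal sketch on made-up data, not the notebook's dataset; the frame `demo` and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with one gap in each numeric column
demo = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Freedom": [0.5, np.nan, 0.3, 0.4],
    "Generosity": [0.2, 0.4, np.nan, 0.6],
})

# Column means over numeric columns only; NaN values are skipped by default
column_means = demo.mean(numeric_only=True)

# fillna with a Series fills each column using the value at the matching label;
# columns without an entry (here "Country") are left untouched
filled = demo.fillna(column_means)
print(filled)
```

Because the result is assigned to a new name, the original frame stays unchanged, which makes it easy to compare before and after.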


Write Python code that identifies the GDP per capita and Healthy Life Expectancy of the top 10 countries and creates a bar chart named fig1 showing both values for these countries using Plotly.

In [9]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Get top 10 countries based on Happiness Rank (rank 1-10)
top_10_countries = df.nsmallest(10, 'Happiness Rank')

# Create subplot with secondary y-axis
fig1 = make_subplots(
    rows=1, cols=1,
    specs=[[{"secondary_y": True}]],
    subplot_titles=("GDP per Capita and Health Life Expectancy - Top 10 Countries",)
)

# Add GDP per capita bar chart
# (offsetgroup keeps bars that sit on different y-axes side by side instead of overlapping)
fig1.add_trace(
    go.Bar(
        x=top_10_countries['Country'],
        y=top_10_countries['Economy (GDP per Capita)'],
        name='GDP per Capita',
        marker_color='lightblue',
        opacity=0.7,
        offsetgroup='gdp'
    ),
    secondary_y=False
)

# Add Health Life Expectancy bar chart
fig1.add_trace(
    go.Bar(
        x=top_10_countries['Country'],
        y=top_10_countries['Health (Life Expectancy)'],
        name='Health Life Expectancy',
        marker_color='lightcoral',
        opacity=0.7,
        offsetgroup='health'
    ),
    secondary_y=True
)

# Update layout
fig1.update_layout(
    title='GDP per Capita and Health Life Expectancy - Top 10 Happiest Countries',
    xaxis_title='Country',
    barmode='group',
    height=600,
    width=1000
)

# Update y-axes titles
fig1.update_yaxes(title_text="GDP per Capita", secondary_y=False)
fig1.update_yaxes(title_text="Health Life Expectancy", secondary_y=True)

# Rotate x-axis labels for better readability
fig1.update_xaxes(tickangle=45)

# Show the plot
fig1.show()

# Display the data for verification
print("Top 10 Countries with their GDP per Capita and Health Life Expectancy:")
print(top_10_countries[['Country', 'Happiness Rank', 'Economy (GDP per Capita)', 'Health (Life Expectancy)']].to_string(index=False))

Top 10 Countries with their GDP per Capita and Health Life Expectancy:
    Country  Happiness Rank  Economy (GDP per Capita)  Health (Life Expectancy)
    Denmark               1                   1.44178                   0.79504
Switzerland               2                   1.52733                   0.86303
    Iceland               3                   1.42666                   0.86733
     Norway               4                   1.57744                   0.79579
    Finland               5                   1.40598                   0.81091
     Canada               6                   1.44015                   0.82760
Netherlands               7                   1.46468                   0.81231
New Zealand               8                   1.36066                   0.83096
  Australia               9                   1.44443                   0.85120
     Sweden              10                   1.45181                   0.83121


Write a python code that performs the following actions:
    1. Create a sub-dataset including Economy (GDP per Capita), Family, Health (Life Expectancy), Freedom, Trust (Government Corruption), Generosity, and Happiness Score attributes from the dataframe (df).
    2. Find the correlation between the attributes in the subdataset as a heatmap named fig2 using Plotly of width 800 and height 600.

In [10]:
import plotly.express as px

# Task 1: Create a sub-dataset with specified attributes
sub_dataset = df[['Economy (GDP per Capita)', 'Family', 'Health (Life Expectancy)', 
                  'Freedom', 'Trust (Government Corruption)', 'Generosity', 'Happiness Score']]

# Task 2: Calculate correlation matrix and create heatmap
correlation_matrix = sub_dataset.corr()

# Create heatmap using Plotly
fig2 = px.imshow(correlation_matrix,
                 text_auto=True,
                 aspect="auto",
                 color_continuous_scale='RdBu_r',
                 title='Correlation Heatmap of Happiness Factors',
                 width=800,
                 height=600)

# Update layout for better visualization
fig2.update_layout(
    title_x=0.5,
    xaxis_title="Attributes",
    yaxis_title="Attributes"
)

# Show the plot
fig2.show()

# Display correlation matrix for verification
print("Correlation Matrix:")
print(correlation_matrix.round(3))

Correlation Matrix:
                               Economy (GDP per Capita)  Family  \
Economy (GDP per Capita)                          1.000   0.669   
Family                                            0.669   1.000   
Health (Life Expectancy)                          0.826   0.586   
Freedom                                           0.361   0.450   
Trust (Government Corruption)                     0.286   0.214   
Generosity                                       -0.022   0.090   
Happiness Score                                   0.790   0.739   

                               Health (Life Expectancy)  Freedom  \
Economy (GDP per Capita)                          0.826    0.361   
Family                                            0.586    0.450   
Health (Life Expectancy)                          1.000    0.348   
Freedom                                           0.348    1.000   
Trust (Government Corruption)                     0.262    0.502   
Generosity                         
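
Wide correlation matrices wrap or get cut off when printed, so it can help to pull out only the column for the target variable and rank it. This is a minimal sketch on made-up numbers; the frame `demo` and its shortened column names are assumptions, not the notebook's data.

```python
import pandas as pd

# Illustrative data only
demo = pd.DataFrame({
    "Happiness Score": [7.5, 6.8, 5.9, 4.7, 3.9],
    "Economy": [1.5, 1.3, 1.0, 0.6, 0.3],
    "Generosity": [0.3, 0.1, 0.4, 0.2, 0.3],
})

target = "Happiness Score"

# Take the target's column from the correlation matrix, drop its
# self-correlation, and sort by absolute strength
ranked = (
    demo.corr()[target]
    .drop(target)
    .sort_values(key=abs, ascending=False)
)
print(ranked.round(3))
```

Sorting by absolute value keeps strong negative correlations near the top instead of pushing them to the bottom of the list.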

Write a code that creates a scatter plot named fig3 between Happiness Score and GDP per Capita attributes of a dataframe using Plotly. Use Region to color the data points on the scatter plot.

In [11]:

# Create scatter plot between Happiness Score and GDP per Capita colored by Region
fig3 = px.scatter(df, 
                  x='Economy (GDP per Capita)', 
                  y='Happiness Score',
                  color='Region',
                  title='Scatter Plot: Happiness Score vs GDP per Capita by Region',
                  labels={
                      'Economy (GDP per Capita)': 'GDP per Capita',
                      'Happiness Score': 'Happiness Score'
                  },
                  hover_data=['Country'])

# Update layout for better visualization
fig3.update_layout(
    title_x=0.5,
    width=800,
    height=600
)

# Show the plot
fig3.show()

Write a Plotly code that creates a pie chart named fig4 to present Happiness Score by Region attributes of dataframe df.

In [14]:
# Calculate total happiness score by region
happiness_by_region = df.groupby('Region')['Happiness Score'].sum().reset_index()

# Create pie chart
fig4 = px.pie(happiness_by_region, 
              values='Happiness Score', 
              names='Region',
              title='Total Happiness Score by Region')

# Update layout for better visualization
fig4.update_layout(
    title_x=0.5,
    width=800,
    height=600
)

# Show the plot
fig4.show()

# Display the data for verification
print("Total Happiness Score by Region:")
print(happiness_by_region.sort_values('Happiness Score', ascending=False))

Total Happiness Score by Region:
                            Region  Happiness Score
8               Sub-Saharan Africa          157.184
1       Central and Eastern Europe          155.750
3      Latin America and Caribbean          146.442
9                   Western Europe          140.399
4  Middle East and Northern Africa          102.335
6                Southeastern Asia           48.050
2                     Eastern Asia           33.745
7                    Southern Asia           31.943
0        Australia and New Zealand           14.647
5                    North America           14.508
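
A regional total grows with the number of countries in the region, so it is worth reading the sum next to the country count and the average. The sketch below uses made-up regions and scores to show how the two rankings can disagree; none of the values come from the dataset.

```python
import pandas as pd

# Illustrative data: "South" has more countries but lower scores
demo = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "South", "South"],
    "Happiness Score": [7.0, 7.4, 4.0, 4.5, 3.5, 4.0],
})

# Named aggregation: total, number of countries, and average score per region
summary = (
    demo.groupby("Region")["Happiness Score"]
    .agg(total="sum", countries="count", average="mean")
    .sort_values("average", ascending=False)
)
print(summary)
```

Here the region with the larger total is the one with the lower average, which is the same effect that puts large regions at the top of a pie chart of summed scores.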


Write a Plotly code that creates a map named fig5 to display GDP per capita of countries and include Healthy Life Expectancy to be shown as a tooltip.

In [15]:
# Create map to display GDP per capita with Health Life Expectancy as tooltip
fig5 = px.choropleth(df,
                     locations='Country',
                     locationmode='country names',
                     color='Economy (GDP per Capita)',
                     hover_name='Country',
                     hover_data={'Health (Life Expectancy)': ':.3f',
                                'Economy (GDP per Capita)': ':.3f',
                                'Region': True},
                     color_continuous_scale='Viridis',
                     title='GDP per Capita by Country (Health Life Expectancy in Tooltip)')

# Update layout for better visualization
fig5.update_layout(
    title_x=0.5,
    width=1000,
    height=600
)

# Show the plot
fig5.show()

Write Python code to write any four of the Plotly figures (fig1, fig2, fig3, fig4, fig5) to a single HTML file named "dashboard.html".

In [16]:
from plotly.subplots import make_subplots

import plotly.offline as pyo

# Create a subplot layout to combine 4 figures in a 2x2 grid
combined_fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=('GDP per Capita and Health Life Expectancy - Top 10 Countries',
                    'Correlation Heatmap of Happiness Factors',
                    'Happiness Score vs GDP per Capita by Region',
                    'Total Happiness Score by Region'),
    specs=[[{"secondary_y": True}, {"type": "heatmap"}],
           [{"type": "scatter"}, {"type": "domain"}]],
    vertical_spacing=0.08,
    horizontal_spacing=0.1
)

# Add traces from fig1 (bar chart with dual y-axis)
for trace in fig1.data:
    if trace.name == 'GDP per Capita':
        combined_fig.add_trace(trace, row=1, col=1, secondary_y=False)
    else:
        combined_fig.add_trace(trace, row=1, col=1, secondary_y=True)

# Add trace from fig2 (heatmap)
combined_fig.add_trace(fig2.data[0], row=1, col=2)

# Add traces from fig3 (scatter plot)
for trace in fig3.data:
    combined_fig.add_trace(trace, row=2, col=1)

# Add trace from fig4 (pie chart); the "domain" subplot spec positions it in the grid cell,
# so there is no need to set the trace's domain by hand (which would also modify fig4)
combined_fig.add_trace(fig4.data[0], row=2, col=2)

# Update layout
combined_fig.update_layout(
    title_text="Happiness Data Analysis Dashboard",
    title_x=0.5,
    height=800,
    width=1200,
    showlegend=True
)

# Update y-axis titles for the first subplot
combined_fig.update_yaxes(title_text="GDP per Capita", row=1, col=1, secondary_y=False)
combined_fig.update_yaxes(title_text="Health Life Expectancy", row=1, col=1, secondary_y=True)

# Update axis titles for scatter plot
combined_fig.update_xaxes(title_text="GDP per Capita", row=2, col=1)
combined_fig.update_yaxes(title_text="Happiness Score", row=2, col=1)

# Save to HTML file
pyo.plot(combined_fig, filename='dashboard.html', auto_open=False)

print("Dashboard successfully saved as 'dashboard.html'")
print("The dashboard includes:")
print("- Top-left: GDP per Capita and Health Life Expectancy for Top 10 Countries")
print("- Top-right: Correlation Heatmap of Happiness Factors")
print("- Bottom-left: Scatter Plot of Happiness Score vs GDP per Capita by Region")
print("- Bottom-right: Pie Chart of Total Happiness Score by Region")

Dashboard successfully saved as 'dashboard.html'
The dashboard includes:
- Top-left: GDP per Capita and Health Life Expectancy for Top 10 Countries
- Top-right: Correlation Heatmap of Happiness Factors
- Bottom-left: Scatter Plot of Happiness Score vs GDP per Capita by Region
- Bottom-right: Pie Chart of Total Happiness Score by Region
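
An alternative to rebuilding every chart inside one subplot grid is to render each figure to an HTML fragment and concatenate the fragments into one page, which keeps each figure's own layout, color scale, and legend. The sketch below shows only the page assembly using the standard library; `build_dashboard_html` and `write_dashboard` are hypothetical helper names, and the fragments are assumed to be ready-made HTML strings.

```python
from html import escape
from pathlib import Path

def build_dashboard_html(fragments, title="Dashboard"):
    """Wrap pre-rendered HTML fragments (one per chart) in a single page."""
    sections = "\n".join(
        f'<section class="chart">{fragment}</section>' for fragment in fragments
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body><h1>{escape(title)}</h1>\n{sections}\n</body></html>\n"
    )

def write_dashboard(fragments, path, title="Dashboard"):
    """Write the assembled page to disk and return the target path."""
    target = Path(path)
    target.write_text(build_dashboard_html(fragments, title), encoding="utf-8")
    return target

# Placeholder fragments stand in for rendered charts
page = build_dashboard_html(["<div>chart one</div>", "<div>chart two</div>"],
                            title="Happiness & GDP")
```

With Plotly, each fragment could plausibly come from `fig.to_html(full_html=False, include_plotlyjs="cdn")` for the first figure and `include_plotlyjs=False` for the remaining ones, so the library is loaded only once; this pairing is a suggestion rather than what the cell above does.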


Generate a narrative to present the dashboard on the World Happiness Report with the following charts:
    1. A heatmap showing correlation
    2. A scatter plot to identify the effect of GDP per Capita on Happiness Score in various Regions
    3. A pie chart to present Happiness score by Regions
    4. A map to display `GDP per capita` of `countries` and include `Healthy Life Expectancy` to be shown as a tooltip


# World Happiness Report 2016: A Comprehensive Analysis Dashboard

## Executive Summary

The World Happiness Report Dashboard presents a multi-dimensional analysis of global happiness patterns in 2016, examining the intricate relationships between economic prosperity, social factors, and overall well-being across 157 countries and 10 distinct regions. Through four key visualizations, this dashboard reveals compelling insights into what drives happiness on a global scale.

## Key Findings

### 1. Correlation Analysis: The Happiness Formula Revealed

The correlation heatmap demonstrates that happiness is not driven by a single factor but rather by a complex interplay of multiple dimensions:

- **Economic Foundation**: GDP per capita shows the strongest correlation with happiness (0.79), marking economic prosperity as the factor most closely associated with well-being in this dataset.
- **Social Connections Matter**: Family support correlates strongly with happiness (0.74), highlighting the importance of social bonds.
- **Health is Wealth**: Healthy life expectancy correlates strongly with happiness (0.76), so longer, healthier lives tend to go together with higher scores.
- **Freedom and Trust**: Personal freedom (0.57) and trust in government (0.40) show moderate correlations, indicating their role in creating environments conducive to happiness.
- **Generosity Paradox**: Generosity shows the weakest correlation (0.16), suggesting that its link to happiness is more complex and may be culturally dependent.

### 2. Regional Happiness Patterns: Economic Prosperity vs. Geographic Distribution

The scatter plot analysis reveals distinct regional clusters and happiness-GDP relationships:

- **Western Europe**: Dominates the high-happiness, high-GDP quadrant, with countries like Denmark, Switzerland, and Iceland leading both metrics.
- **North America**: Shows high GDP and happiness levels but with fewer countries in the top tier.
- **Sub-Saharan Africa**: Clusters in the lower-left quadrant, indicating systemic challenges in both economic development and happiness.
- **Regional Outliers**: Some countries defy the GDP-happiness trend, suggesting that cultural, political, and social factors can significantly influence well-being beyond economic measures.

### 3. Global Happiness Distribution: Regional Dominance and Disparities

The pie chart reveals striking regional disparities in total happiness scores:

- **Sub-Saharan Africa** leads in total happiness (157.2) primarily due to having the largest number of countries (38), not individual country performance.
- **Central and Eastern Europe** follows (155.8); as with Sub-Saharan Africa, a regional total depends heavily on how many countries are summed, so it says little about individual country scores.
- **Latin America and Caribbean** (146.4) and **Western Europe** (140.4) show substantial total happiness despite fewer countries, indicating higher individual country scores.
- **Smaller regions** like Australia/New Zealand (14.6) and North America (14.5) have lower totals due to fewer constituent countries.

### 4. Global Economic Landscape: GDP Distribution and Health Insights

The choropleth map provides a comprehensive view of global economic distribution:

- **Economic Powerhouses**: Countries like Luxembourg, Qatar, and oil-rich nations show the highest GDP per capita (the brightest yellow on the Viridis scale used for the map).
- **Development Spectrum**: The map clearly illustrates the global economic divide, with developed nations in North America, Western Europe, and parts of Asia showing higher GDP levels.
- **Health-Wealth Connection**: The tooltip data shows that countries with higher GDP generally have a higher healthy life expectancy, reinforcing the correlation findings.

## Strategic Implications

### For Policymakers:
1. **Invest in Economic Development**: The strong GDP-happiness correlation suggests that sustainable economic growth remains crucial for national well-being.
2. **Strengthen Social Infrastructure**: Given the importance of family and social connections, policies supporting community building and social cohesion are vital.
3. **Prioritize Healthcare Systems**: The health-happiness link emphasizes the need for accessible, quality healthcare infrastructure.

### For International Development:
1. **Holistic Approach**: Development programs should address multiple happiness factors simultaneously rather than focusing solely on economic indicators.
2. **Regional Strategies**: Different regions require tailored approaches based on their unique challenges and cultural contexts.
3. **Beyond GDP**: While economic development is important, the relatively weak correlation with generosity suggests that cultural and social factors require separate attention.

## Conclusion

The 2016 World Happiness Report Dashboard reveals that happiness is a multifaceted phenomenon influenced by economic, social, health, and governance factors. While economic prosperity provides a strong foundation for happiness, the most successful countries excel across multiple dimensions. The data suggests that sustainable happiness requires a balanced approach to development that addresses material needs while fostering social connections, ensuring health and longevity, and maintaining political freedoms and trust.

The regional analysis highlights both the challenges facing developing regions and the opportunities for targeted interventions. As we move forward, understanding these complex relationships will be crucial for creating policies and programs that truly enhance human well-being on a global scale.
