<a href="https://colab.research.google.com/github/Heather-Marsh/move/blob/main/Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Final Project, World Happiness Report 2005 - 2024


I will be looking at the World Happiness Report from 2005 to 2024. The full-scale report combines data from over 140 countries with high-quality analysis by world-leading researchers from a wide range of academic disciplines. The dataset I use here is a smaller version that includes only 10 of those countries. It covers factors that affect a country's happiness score, such as GDP per capita, education index, life satisfaction score, mental health index, climate index, and crime rate, to name a few. Analyzing this data can show what problems countries are facing, how their populations are affected, and how they compare to other countries.




**Data Source:**
*  Name: World Happiness Report
*  URL: https://www.kaggle.com/datasets/khushikyad001/world-happiness-report/data
*  Description: This dataset contains 4,000 entries with 24 columns related to happiness, economic, social, and political indicators for different countries across multiple years.





# Question 1: What are the key factors contributing to and detracting from happiness across different countries?
Different factors affect different countries to different degrees. A country such as the USA will most likely have a higher freedom score than China, but China could have a higher life expectancy. Seeing which factors are the key contributors, and how high the happiness scores were for those countries, could show countries where they need to invest more resources.
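
Because the factors are recorded on very different scales (GDP per capita versus index scores, for example), the bar chart in the next cell rescales each factor with min-max normalization before plotting. The sketch below illustrates that rescaling on a small made-up table. The values, the `Constant_Factor` column, and the helper name `min_max_normalize` are assumptions for illustration only and are not part of the dataset.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the dataset)
toy = pd.DataFrame({
    "GDP_per_Capita": [10000.0, 30000.0, 50000.0],
    "Freedom": [0.2, 0.5, 0.8],
    "Constant_Factor": [1.0, 1.0, 1.0],  # hypothetical column with no spread
})

def min_max_normalize(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to [0, 1]; constant columns map to 0.5."""
    span = frame.max() - frame.min()
    # Replace a zero span with NaN so the division never divides by zero
    scaled = (frame - frame.min()) / span.where(span != 0)
    # Columns with no spread come out as NaN; use the midpoint instead
    return scaled.fillna(0.5)

normalized = min_max_normalize(toy)
print(normalized)
```

After rescaling, every factor sits between 0 and 1, so a tall bar means "high relative to the range seen in the data" rather than "large in raw units".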


In [None]:
# Run this cell before any of the graphs: the interactive widgets need an up-to-date plotly
!pip install -U plotly

In [None]:
import plotly.graph_objects as go
import pandas as pd
import ipywidgets as widgets
from IPython.display import display
import plotly.io as pio
import plotly.express as px
import plotly.colors

pio.renderers.default = 'colab'

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/odu-cs625-datavis/Spring25-asv-Heather-Marsh/refs/heads/main/Project/world_happiness_report.csv?token=GHSAT0AAAAAADATX7R2DES73IL3SINDUKFI2ARJNEA")

def create_interactive_graph():
  # Function to update graph based on selected country
  def update_graph(country):
      country_data = df[df['Country'] == country]
      factors = ['GDP_per_Capita',
                 'Social_Support',
                 'Freedom',
                 'Generosity',
                 'Corruption_Perception',
                 'Unemployment_Rate',
                 'Education_Index',
                 'Urbanization_Rate',
                 'Life_Satisfaction',
                 'Public_Trust',
                 'Mental_Health_Index',
                 'Income_Inequality',
                 'Public_Health_Expenditure',
                 'Climate_Index',
                 'Work_Life_Balance',
                 'Internet_Access',
                 'Crime_Rate',
                 'Political_Stability',
                 'Employment_Rate']

      if not country_data.empty:
          latest_year = country_data['Year'].max()
          latest_data = country_data[country_data['Year'] == latest_year].iloc[0]

          # Normalize values safely (avoid division by zero)
          normalized_values = []
          for factor in factors:
              if factor in df.columns:
                  min_val = df[factor].min()
                  max_val = df[factor].max()
                  val = latest_data[factor]
                  normalized = (val - min_val) / (max_val - min_val) if max_val != min_val else 0.5
                  normalized_values.append(normalized)
              else:
                  normalized_values.append(0.5)  # Default if column not found

          # Create a bar chart with a gradient color scale
          fig = px.bar(x=factors, y=normalized_values,
                      labels={'x': 'Factors', 'y': 'Normalized Value'},
                      title=f"Happiness Contributing and Detracting Factors in {country} ({latest_year})",
                      color=normalized_values,  # Map normalized values to color
                      color_continuous_scale='sunsetdark'  # Choose a gradient color scale
                      )

          # Update x-axis to display full names of factors
          fig.update_xaxes(tickangle=-90,
                          tickfont=dict(size=12, family='Arial', color='black'),
                          autorange="reversed"
                          )

          # Set the size of the graph
          fig.update_layout(width=1200, height=600)  # Adjust width and height as needed


          fig.show()

  # Dropdown widget that redraws the chart for the selected country
  widgets.interact(update_graph, country=widgets.Dropdown(
      options=sorted(df['Country'].unique()),
      description='Country:',
      layout=widgets.Layout(width='50%')
  ))

create_interactive_graph()


In [None]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/odu-cs625-datavis/Spring25-asv-Heather-Marsh/refs/heads/main/Project/world_happiness_report.csv?token=GHSAT0AAAAAADATX7R2DES73IL3SINDUKFI2ARJNEA")

# Select relevant columns
columns_of_interest = [
    'GDP_per_Capita',
    'Social_Support',
    'Freedom',
    'Generosity',
    'Corruption_Perception',
    'Unemployment_Rate',
    'Education_Index',
    'Urbanization_Rate',
    'Life_Satisfaction',
    'Public_Trust',
    'Mental_Health_Index',
    'Income_Inequality',
    'Public_Health_Expenditure',
    'Climate_Index',
    'Work_Life_Balance',
    'Internet_Access',
    'Crime_Rate',
    'Political_Stability',
    'Employment_Rate'
]

# Filter and clean
df = df.dropna(subset=columns_of_interest + ["Country"])

# Prepare a dictionary of country-specific correlation matrices
country_corrs = {}
for country in df["Country"].unique():
    country_df = df[df["Country"] == country][columns_of_interest]
    if len(country_df) >= 3:  # Needs enough data for correlation
        corr_matrix = country_df.corr().round(2)
        country_corrs[country] = corr_matrix

# Use the column list as the axis order for every heatmap
variables = list(columns_of_interest)

# The first country is the one shown when the figure loads
first_country = list(country_corrs.keys())[0]

fig = go.Figure()

# Add heatmap for each country (one trace per country; only the first starts visible)
for i, (country, corr_matrix) in enumerate(country_corrs.items()):
    visible = (i == 0)
    fig.add_trace(go.Heatmap(
        z=corr_matrix.values,
        x=variables,
        y=variables,
        zmin=-1,
        zmax=1,
        colorscale="Icefire",
        text=corr_matrix.round(2).astype(str),
        texttemplate="%{text}",
        visible=visible,
        colorbar=dict(title="Correlation", len=0.8) if i == 0 else None,
        name=country
    ))

# Add dropdown menu
dropdown_buttons = [
    dict(label=country,
         method="update",
         args=[{"visible": [j == i for j in range(len(country_corrs))]},
               {"title": f"Correlation Heatmap - {country}"}]) for i, country in enumerate(country_corrs.keys())
]


fig.update_layout(
    updatemenus=[dict(
        buttons=dropdown_buttons,
        direction="down",
        showactive=True,
        x=0.01,
        xanchor="left",
        y=1.1,
        yanchor="top"
    )],
    title_text=f"Correlation Heatmap - {first_country}",
    margin=dict(l=150, r=60, t=150, b=150),
    width=1200,
    height=800
)


fig.show()


# No separate title update is needed: each dropdown button sets the title itself.


A strong negative correlation indicates an inverse relationship between two variables: as one increases, the other tends to decrease. A strong positive correlation means the two variables tend to move in the same direction. In the heatmap, values near -1 or 1 indicate strong relationships, and values near 0 indicate little or no linear relationship.
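
To make the two cases concrete, here is a minimal sketch that uses the same `DataFrame.corr()` call as the heatmap cell, but on made-up numbers. The column names and values are hypothetical and chosen only so that one pair moves together and one pair moves in opposite directions.

```python
import pandas as pd

# Toy data (hypothetical values chosen for illustration only)
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "rises_with_x": [2, 4, 6, 8, 10],   # increases whenever x increases
    "falls_with_x": [10, 8, 6, 4, 2],   # decreases whenever x increases
    "noisy": [3, 1, 4, 1, 5],           # no clean pattern against x
})

# Pearson correlation between every pair of columns, rounded like the heatmap
corr = toy.corr().round(2)
print(corr)
```

The `x` / `rises_with_x` entry is a perfect positive correlation and the `x` / `falls_with_x` entry is a perfect negative one. The `noisy` column lands somewhere in between, which is what most cells of a real heatmap look like.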

# Question 2: How does happiness change around the world from year to year?


In [None]:
import plotly.express as px
import pandas as pd

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/odu-cs625-datavis/Spring25-asv-Heather-Marsh/refs/heads/main/Project/world_happiness_report.csv?token=GHSAT0AAAAAADATX7R2DES73IL3SINDUKFI2ARJNEA")

# Clean and prepare data
df["Year"] = pd.to_numeric(df["Year"], errors="coerce")
df['Happiness_Score'] = pd.to_numeric(df['Happiness_Score'], errors="coerce")

df = df.dropna(subset=["Year", "GDP_per_Capita", "Happiness_Score", "Country"])
df["Year"] = df["Year"].astype(int)
df = df.sort_values(by="Year")
df["Year"] = df["Year"].astype(str)


# Calculate the overall minimum and maximum happiness scores for the color scale
min_happiness = df["Happiness_Score"].min()
max_happiness = df["Happiness_Score"].max()

# Build the animated choropleth
fig = px.choropleth(
    df,
    locations="Country",
    locationmode="country names",
    color="Happiness_Score",
    hover_name="Country",
    animation_frame="Year",
    color_continuous_scale="sunsetdark",
    range_color=(min_happiness, max_happiness),
    title="Global Happiness Score Over Time"
)

# Customize layout
fig.update_layout(
    geo=dict(showframe=True, showcoastlines=True),
    coloraxis_colorbar=dict(title="Happiness Score"),
    margin={"r":0,"t":50,"l":0,"b":0}
)

fig.show()


In [None]:
import pandas as pd
import plotly.express as px

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/odu-cs625-datavis/Spring25-asv-Heather-Marsh/refs/heads/main/Project/world_happiness_report.csv?token=GHSAT0AAAAAADATX7R2DES73IL3SINDUKFI2ARJNEA")

# Clean and prepare data
df["Year"] = pd.to_numeric(df["Year"], errors="coerce")
df = df.dropna(subset=["Year", "GDP_per_Capita", "Happiness_Score", "Country"])
df["Year"] = df["Year"].astype(int)
df = df.sort_values(by="Year")
df["Year"] = df["Year"].astype(str)

# Create interactive scatterplot
fig = px.scatter(
    df,
    x="GDP_per_Capita",
    y="Happiness_Score",
    color="GDP_per_Capita",
    hover_name="Country",
    animation_frame="Year",
    title="Do Richer Countries Report Higher Happiness?",
    labels={
        "GDP_per_Capita": "GDP per Capita (USD)",
        "Happiness_Score": "Happiness Score"
    },
    color_continuous_scale="sunsetdark"
)

# Optional: customize markers
fig.update_traces(marker=dict(size=8, opacity=0.7))

# Slower animation settings (1 second per year)
fig.update_layout(
    legend_title_text="Country",
    updatemenus=[{
        "buttons": [
            {
                "args": [None, {
                    "frame": {"duration": 1000, "redraw": True},
                    "transition": {"duration": 500, "easing": "cubic-in-out"}
                }],
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {
                    "frame": {"duration": 0, "redraw": True},
                    "mode": "immediate",
                    "transition": {"duration": 0}
                }],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "type": "buttons",
        "showactive": False
    }]
)

# Show the figure
fig.show()

In [None]:
import pandas as pd
import plotly.express as px

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/odu-cs625-datavis/Spring25-asv-Heather-Marsh/refs/heads/main/Project/world_happiness_report.csv?token=GHSAT0AAAAAADATX7R2DES73IL3SINDUKFI2ARJNEA")

# 1. Get highest, lowest, and median GDP per capita per Country and Year
df_max = df.groupby(['Country', 'Year'])['GDP_per_Capita'].max().reset_index()
df_min = df.groupby(['Country', 'Year'])['GDP_per_Capita'].min().reset_index()
df_med = df.groupby(['Country', 'Year'])['GDP_per_Capita'].median().reset_index()  # Get median

# 2. Merge with original DataFrame to get other columns
df_max = pd.merge(df_max, df, on=['Country', 'Year', 'GDP_per_Capita'], how='left')
df_min = pd.merge(df_min, df, on=['Country', 'Year', 'GDP_per_Capita'], how='left')
df_med = pd.merge(df_med, df, on=['Country', 'Year', 'GDP_per_Capita'], how='left')  # Merge for median

# 3. Concatenate the DataFrames for highest, lowest, and median scores
df_final = pd.concat([df_max, df_min, df_med])  # Include df_med

# 4. Convert Year to string for animation
df_final['Year'] = df_final['Year'].astype(str)

# Create scatterplot
fig = px.scatter(
    df_final,
    x="GDP_per_Capita",
    y="Happiness_Score",
    color="Country",
    size = "GDP_per_Capita",
    hover_name="Country",
    animation_frame="Year",
    title="Highest, Lowest, and Median GDP per Capita by Country Over Time",  # Update title
    labels={
        "GDP_per_Capita": "GDP per Capita (USD)",
        "Happiness_Score": "Happiness Score"
    },
    template="plotly_white",
    height=800,

)


fig.update_xaxes(range=[0, df_final['GDP_per_Capita'].max()])

fig.update_layout(margin=dict(t=150))

# Optional: customize markers
fig.update_traces(marker=dict(size=10, opacity=0.7))

# Slower animation settings (1 second per year)
fig.update_layout(
    legend_title_text="Country",
    updatemenus=[{
        "buttons": [
            {
                "args": [None, {
                    "frame": {"duration": 1000, "redraw": True},
                    "transition": {"duration": 500, "easing": "cubic-in-out"}
                }],
                "label": "Play",
                "method": "animate"
            },
            {
                "args": [[None], {
                    "frame": {"duration": 0, "redraw": True},
                    "mode": "immediate",
                    "transition": {"duration": 0}
                }],
                "label": "Pause",
                "method": "animate"
            }
        ],
        "type": "buttons",
        "showactive": False
    }]
)

# Show the figure
fig.show()


In [None]:
import plotly.graph_objects as go
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/odu-cs625-datavis/Spring25-asv-Heather-Marsh/refs/heads/main/Project/world_happiness_report.csv?token=GHSAT0AAAAAADATX7R2DES73IL3SINDUKFI2ARJNEA")

# 1. Get average Happiness Score per Country and Year
df_avg = df.groupby(['Country', 'Year'])['Happiness_Score'].mean().reset_index()

# 2. Rename Happiness_Score column for clarity
df_avg = df_avg.rename(columns={'Happiness_Score': 'Avg_Happiness_Score'})

# 3. Convert Year to string so the x-axis treats each year as a discrete label
df_avg['Year'] = df_avg['Year'].astype(str)

def create_interactive_linechart():
    def update_linechart(country):
        country_data = df_avg[df_avg['Country'] == country]
        if not country_data.empty:
            fig = go.Figure()
            fig.add_trace(go.Scatter(x=country_data['Year'],
                                     y=country_data['Avg_Happiness_Score'],
                                     mode='lines+markers',
                                     name=country,
                                     line=dict(color='#620042')
                                     ))

            fig.update_yaxes(range=[4, 7])

            fig.update_layout(title=f'Average Happiness Score Over Time for {country}',
                              xaxis_title='Year',
                              yaxis_title='Average Happiness Score',
                              width=1200,
                              height=800 )

            fig.show()

    # Interact widget for country selection
    widgets.interact(update_linechart, country=widgets.Dropdown(
        options=sorted(df_avg['Country'].unique()),
        description='Country:'
    ))


create_interactive_linechart()


*   Reference 1, https://raw.githubusercontent.com/odu-cs625-datavis/Spring25-asv-Heather-Marsh/refs/heads/main/Project/world_happiness_report.csv?token=GHSAT0AAAAAADATX7R2HOOFLWYQXIZQVDP62ARKJCQ
*   Reference 2, https://github.com/adam-p/markdown-here/wiki/Markdown-Here-Cheatsheet
*   Reference 3, https://www.kaggle.com/datasets/khushikyad001/world-happiness-report/data
*   Reference 4, https://plotly.com/python/
*   Reference 5, https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20List.html
*   Reference 6, https://ipython.readthedocs.io/en/stable/index.html
*   Reference 7, https://plotly.com/python/builtin-colorscales/
*   Reference 8, https://plotly.com/python/discrete-color
*   Reference 9, https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Basics.html





