# International Debt Analysis

## Introduction & Overview

This comprehensive analysis examines international debt patterns and their relationships with various economic indicators. Using data from multiple countries over several decades, we explore:

- **Government debt levels** as a percentage of GDP
- **Interest payment burdens** and their correlation with debt levels
- **Impact of major economic crises** (2008 Financial Crisis, COVID-19 pandemic)
- **Private sector debt patterns** (corporate vs household)
- **Macroeconomic relationships** between debt, inflation, and unemployment

### Key Metrics Analyzed
| Metric | Description | Unit |
|--------|-------------|------|
| `Govt_Debt_GDP` | General Government Debt | % of GDP |
| `Interest_Paid_GDP` | Interest Payments on Government Debt | % of GDP |
| `Household_Debt_GDP` | Private Household Debt | % of GDP |
| `Corporate_Debt_GDP` | Non-Financial Corporate Debt | % of GDP |
| `Inflation` | Consumer Price Index Change | % Year-over-Year |
| `Unemployment` | Unemployment Rate | % of Labor Force |
| `Population_mn` | Total Population | Millions |


## Contents
1. [Question 1: Which countries have the highest or lowest General Government Debt (% of GDP) over time?](#q1)  
2. [Question 2: Do countries with higher public debt have higher interest payments (% of GDP)?](#q2)   
3. [Question 3: What happens to debt levels and interest payments around major global crises (e.g., 2008, 2020)?](#q3)
4. [Question 4: How does corporate debt compare with household debt in different economies?](#q4)
5. [Question 5: Do countries with higher public debt also have higher inflation?](#q5) 
6. [Question 6: How does debt burden relate to economic stress across different countries and time periods?](#q6) 



## Required Libraries
Install the necessary libraries before running this notebook:
```bash
pip install pandas numpy matplotlib seaborn plotly
```

In [None]:
# Import required libraries for data analysis and visualization
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## Data Metrics

Our dataset follows a **long format** structure where:
- Each row represents a single observation (Country-Year-Metric combination)
- The `Metric` column indicates the type of measurement
- The `Value` column contains the actual numerical data
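
To make the long layout concrete, here is a minimal sketch on a hypothetical four-row frame (`toy_long` and its values are made up for illustration and are not taken from `Processed_Data.csv`). It shows how one row per Country-Year-Metric combination can be pivoted into one column per metric, which is the reshaping used in several questions below.

```python
import pandas as pd

# Hypothetical rows in the same long layout (values are invented)
toy_long = pd.DataFrame({
    "Country": ["A", "A", "B", "B"],
    "Year": [2020, 2020, 2020, 2020],
    "Metric": ["Govt_Debt_GDP", "Inflation", "Govt_Debt_GDP", "Inflation"],
    "Value": [60.0, 2.0, 120.0, 0.5],
})

# Long -> wide: one row per Country-Year, one column per metric
toy_wide = (
    toy_long.pivot_table(index=["Country", "Year"], columns="Metric", values="Value")
    .reset_index()
)
print(toy_wide)
```

Note that `pivot_table` averages duplicate Country-Year-Metric rows by default, so it does not raise an error if the long data contains repeats.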

In [None]:
# Load the processed long-format dataset and print a quick overview
df = pd.read_csv("Processed_Data.csv")
print(f"Dataset shape: {df.shape[0]:,} rows × {df.shape[1]} columns")
print(df.head())
print(f"Unique countries: {df['Country'].nunique()}")
print(f"Unique metrics: {df['Metric'].nunique()}")
print(f"Year range: {df['Year'].min()} - {df['Year'].max()}")


## **Analysis**

<a id="q1"></a>
## Question 1: Which countries have the highest or lowest General Government Debt (% of GDP) over time?


### Expected Insights

- Countries with persistent high debt burdens may face fiscal sustainability challenges
- Low-debt countries might have more fiscal flexibility during crises
- Debt trajectories can reveal policy effectiveness and economic structural differences

In [None]:
df_debt = df[df["Metric"] == "Govt_Debt_GDP"].copy()

# Identify top 5 countries with highest average debt
top_countries = (
    df_debt.groupby("Country")["Value"]
    .mean()
    .sort_values(ascending=False)
    .head(5)
    .index
)
print("Countries with the highest average debt over time: " + ", ".join(top_countries))

# Identify bottom 5 countries with lowest average debt (lowest first)
low_countries = (
    df_debt.groupby("Country")["Value"]
    .mean()
    .sort_values(ascending=True)
    .head(5)
    .index
)
print("Countries with the lowest average debt over time: " + ", ".join(low_countries))

In [None]:
# Create visualization for top 5 highest debt countries
df_top = df_debt[df_debt["Country"].isin(top_countries)]
plt.figure(figsize=(12, 6))
# Line plot with one color per country; all lines share the same octagon marker
sns.lineplot(data=df_top, x="Year", y="Value", hue="Country", palette="magma", marker="8")
plt.title("Top 5 Countries by Avg General Gov Debt (% of GDP)")
plt.ylabel("Debt to GDP (%)")
plt.xlabel("Year")
plt.grid(True)
plt.tight_layout()
plt.show()


<a id="q2"></a>
## Question 2: Do countries with higher public debt have higher interest payments (% of GDP)?
We create a scatter plot with debt levels on the x-axis and interest payments on the y-axis.

### Expected Insights
- Higher debt should generally correlate with higher interest payments
- Debt maturity structure and central bank policies influence interest burdens
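
Because maturity structure and monetary policy differ by country, a single pooled correlation can hide the relationship inside each country. The sketch below uses a hypothetical frame (`toy`, with invented values) in which interest rises with debt in country A and falls with debt in country B, so the pooled figure and the within-country figures disagree. It is an illustration of the idea, not a result from the dataset.

```python
import pandas as pd

# Hypothetical observations (values are invented for illustration)
toy = pd.DataFrame({
    "Country": ["A"] * 4 + ["B"] * 4,
    "Govt_Debt_GDP": [40, 50, 60, 70, 150, 160, 170, 180],
    "Interest_Paid_GDP": [2.0, 2.5, 3.0, 3.5, 1.6, 1.4, 1.2, 1.0],
})

# Pooled Pearson correlation across all rows
pooled = toy["Govt_Debt_GDP"].corr(toy["Interest_Paid_GDP"])

# Pearson correlation computed separately inside each country
within = {
    country: group["Govt_Debt_GDP"].corr(group["Interest_Paid_GDP"])
    for country, group in toy.groupby("Country")
}

print(f"Pooled: {pooled:.3f}")
for country, value in within.items():
    print(f"Within {country}: {value:.3f}")
```

Comparing the two views on the real data can indicate whether the pooled coefficient is driven by differences between countries or by changes over time within them.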

In [None]:
# Pivot to wide format temporarily: each metric becomes a column
df_wide = df.pivot_table(index=["Country", "Year"], columns="Metric", values="Value").reset_index()
# Filter for observations with both debt and interest payment data
df_plot = df_wide.dropna(subset=["Govt_Debt_GDP", "Interest_Paid_GDP"])

# Report sample coverage, then the pooled Pearson correlation coefficient
print(f"Countries: {df_plot['Country'].nunique()}")
correlation = df_plot["Govt_Debt_GDP"].corr(df_plot["Interest_Paid_GDP"])
print(f"Correlation coefficient: {correlation:.3f}")

# Create scatter plot with year-based color coding
plt.figure(figsize=(10, 6))
sns.scatterplot(
    data=df_plot,
    x="Govt_Debt_GDP",
    y="Interest_Paid_GDP",
    hue="Year",  
    palette="viridis",
    alpha=0.7
)
plt.title("Public Debt vs Interest Paid (% of GDP)")
plt.xlabel("General Government Debt (% of GDP)")
plt.ylabel("Interest Payments (% of GDP)")
plt.grid(True,alpha=0.5)
plt.tight_layout()
plt.show()


### Analyze Outliers

In [None]:

# Identify and inspect outliers using fixed thresholds
# High debt (> 80% of GDP) with low interest (< 3% of GDP): may reflect low borrowing costs
high_debt_low_interest = df_plot[(df_plot["Govt_Debt_GDP"] > 80) & (df_plot["Interest_Paid_GDP"] < 3)].nlargest(3, "Govt_Debt_GDP")
print("Countries with high debt but low interest payments:") 
for _, row in high_debt_low_interest.iterrows():
    print(f"  • {row['Country']} ({row['Year']}): {row['Govt_Debt_GDP']:.1f}% debt, {row['Interest_Paid_GDP']:.1f}% interest") 

# Low debt (< 40% of GDP) with high interest (> 4% of GDP): may reflect high borrowing costs
low_debt_high_interest = df_plot[(df_plot["Govt_Debt_GDP"] < 40) &  (df_plot["Interest_Paid_GDP"] > 4)].nlargest(3, "Interest_Paid_GDP")
print("Countries with low debt but high interest payments:")
for _, row in low_debt_high_interest.iterrows():
    print(f"  • {row['Country']} ({row['Year']}): {row['Govt_Debt_GDP']:.1f}% debt, {row['Interest_Paid_GDP']:.1f}% interest") 

<a id="q3"></a>
## Question 3: What happens to debt levels and interest payments around major global crises (e.g., 2008, 2020)?
We focus on the period 2005-2023 to capture both major crises and use a dual y-axis to show both metrics simultaneously.

### Expected Insights
- Debt levels typically spike during crises due to fiscal stimulus
- Interest payments may decrease initially (due to low rates) then increase
- Recovery patterns differ between crises based on policy responses
- Central bank policies significantly influence interest payment trends
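
One simple way to quantify a crisis "spike" is the percentage-point change in the average debt ratio between the year before a crisis window and the year it ends. The sketch below assumes a hypothetical Year-indexed series `avg_debt` with invented values, and a hypothetical helper `change_over_window`; neither is part of the notebook's own code.

```python
import pandas as pd

# Hypothetical yearly global averages of debt (% of GDP); values are invented
avg_debt = pd.Series(
    [55.0, 56.0, 62.0, 68.0, 70.0],
    index=pd.Index([2006, 2007, 2008, 2009, 2010], name="Year"),
)

def change_over_window(series, start, end):
    """Percentage-point change between two years of a Year-indexed series."""
    return series.loc[end] - series.loc[start]

# Change from the last pre-crisis year (2007) to the end of the window (2009)
gfc_change = change_over_window(avg_debt, 2007, 2009)
print(f"2007 -> 2009 change: {gfc_change:+.1f} pp")
```

The same helper could be applied to the interest-payment averages to compare how the two metrics respond to each crisis.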

In [None]:

# Filter data for crisis analysis
metrics = ["Govt_Debt_GDP", "Interest_Paid_GDP"]

df_crisis = df[df["Metric"].isin(metrics)].copy()
# Restrict to the 2005-2023 window, which covers both crises
df_crisis = df_crisis[(df_crisis["Year"] >= 2005) & (df_crisis["Year"] <= 2023)]
# Calculate global averages by year for each metric
df_avg = df_crisis.groupby(["Metric", "Year"])["Value"].mean().reset_index()

# Separate metrics for dual-axis plotting
df_debt = df_avg[df_avg["Metric"] == "Govt_Debt_GDP"]
df_interest = df_avg[df_avg["Metric"] == "Interest_Paid_GDP"]

fig, ax1 = plt.subplots(figsize=(12, 6))

# Primary y-axis for Government Debt
ax1.plot(df_debt["Year"], df_debt["Value"], label="Govt Debt (% GDP)", color="#069AF3", marker="8")
ax1.set_ylabel("Govt Debt (% of GDP)", color="tab:blue")
ax1.tick_params(axis='y', labelcolor='tab:blue')
ax1.set_xlabel("Year")

# Crisis highlights
ax1.axvspan(2007.5, 2009.5, color= "#FFA500", alpha=0.2, label="2008 Crisis")
ax1.axvspan(2019.5, 2021.5, color="tomato", alpha=0.2, label="COVID-19")

# Secondary y-axis for Interest
ax2 = ax1.twinx()
ax2.plot(df_interest["Year"], df_interest["Value"], label="Interest Paid (% GDP)", color="#90EE90", marker="d")
ax2.set_ylabel("Interest Paid (% of GDP)", color="tab:green")
ax2.tick_params(axis='y', labelcolor='tab:green')

# Legends
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc="upper left")

xticks = list(range(df_debt["Year"].min(), df_debt["Year"].max()+1, 3))
ax1.set_xticks(xticks)

plt.title("Global Averages: Public Debt & Interest Payments Around Crises")
plt.grid(True)
plt.tight_layout()
plt.show()


<a id="q4"></a>
## Question 4: How does corporate debt compare with household debt in different economies?
This analysis examines the relationship between corporate and household debt levels across countries at four snapshot years, one per decade (1990, 2000, 2010, 2020). Plotting the two metrics as a scatter plot for each year lets us observe:
- Correlation patterns: Do countries with high household debt also have high corporate debt?
- Temporal evolution: How has this relationship changed over time?
- Country positioning: Which countries are outliers in terms of debt composition?
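
Debt composition can also be summarized numerically. As a rough sketch, the ratio of corporate to household debt flags countries whose private debt is unusually tilted toward firms. The frame `toy` and the column name `Corp_to_HH` are hypothetical and the values are invented.

```python
import pandas as pd

# Hypothetical single-year snapshot in wide format (values are invented)
toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Household_Debt_GDP": [50.0, 100.0, 40.0],
    "Corporate_Debt_GDP": [100.0, 80.0, 160.0],
})

# Ratio > 1 means corporate debt exceeds household debt
toy["Corp_to_HH"] = toy["Corporate_Debt_GDP"] / toy["Household_Debt_GDP"]

# Country with the most corporate-heavy private debt mix
most_corporate = toy.loc[toy["Corp_to_HH"].idxmax(), "Country"]
print(toy.sort_values("Corp_to_HH", ascending=False))
print(f"Most corporate-heavy: {most_corporate}")
```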

In [None]:
metrics = ["Household_Debt_GDP", "Corporate_Debt_GDP"]
years = [1990, 2000, 2010, 2020]

# Filter only needed metrics
df_debt = df[df["Metric"].isin(metrics)].copy()

# Create subplot grid: 2 rows x 2 columns
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=[f"{year}" for year in years],
    shared_xaxes=False,
    shared_yaxes=True,
    vertical_spacing = 0.18
)

# Helper to map each year's index to its subplot (row, col) position
positions = {0: (1, 1), 1: (1, 2), 2: (2, 1), 3: (2, 2)}

# Generate scatter plots for each year
for i, year in enumerate(years):
    # Filter data for the current year
    df_year = df_debt[df_debt["Year"] == year]
    # Pivot the data to get household and corporate debt as separate columns
    # This transformation makes it easier to create scatter plots
    df_pivot = df_year.pivot(index="Country", columns="Metric", values="Value").dropna().reset_index()

    # Create the main scatter plot
    scatter = go.Scatter(
        x=df_pivot["Household_Debt_GDP"],
        y=df_pivot["Corporate_Debt_GDP"],
        mode='markers+text',
        text=df_pivot["Country"],
        textposition='top center',
        marker=dict(
            size=10,
            color=px.colors.qualitative.Plotly * 5,  # Cycle through colors
            line=dict(width=0.5, color='DarkSlateGrey')
        ),
        name=str(year),
        showlegend=False
    )
    
    # Calculate mean values for reference
    # This helps identify the "center" of the distribution
    mean_x = df_pivot["Household_Debt_GDP"].mean()
    mean_y = df_pivot["Corporate_Debt_GDP"].mean()

    # Add a distinctive marker for the mean values
    # This provides a reference point for comparing countries
    mean_marker = go.Scatter(
        x=[mean_x],
        y=[mean_y],
        mode='markers',
        marker=dict(
            size=14,
            color='dimgray',
            symbol='x',  # named form of Plotly's numeric symbol code 4
            line=dict(width=2, color='white')
        ),
        name='Mean',
        showlegend=False
    )

    row, col = positions[i]
    fig.add_trace(scatter, row=row, col=col)
    fig.add_trace(mean_marker, row=row, col=col)

# Configure axis labels and ranges for consistency
fig.update_xaxes(title_text="Household Debt (% of GDP)", range=[0, 150])
fig.update_yaxes(title_text="Corporate Debt (% of GDP)", range=[0, 350])

# Set overall layout properties
fig.update_layout(
    title_text="Corporate vs Household Debt (% of GDP) — Per Decade",
    title_x=0.5,
    height=800,
    width=1000,
    plot_bgcolor="white"
)

fig.show()

<a id="q5"></a>
## Question 5: Do countries with higher public debt also have higher inflation?

This analysis investigates the relationship between government debt levels and inflation rates across selected countries. We use z-score standardization to:
- Normalize different scales: Debt percentages and inflation rates have different ranges
- Enable comparison: Standardized values allow direct comparison of relative changes
- Identify patterns: Reveal whether debt and inflation move together over time
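
As a small worked example of within-country standardization, the sketch below uses a hypothetical frame (`toy`, invented values) with two countries at very different debt levels. After standardizing inside each country, both series land on the same scale. Note that pandas' `Series.std()` uses the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical debt series for two countries (values are invented)
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Govt_Debt_GDP": [50.0, 60.0, 70.0, 200.0, 220.0, 240.0],
})

def z_score(series):
    # (value - mean) / sample standard deviation
    return (series - series.mean()) / series.std()

# Standardize within each country so baselines do not dominate
toy["Z_Debt"] = toy.groupby("Country")["Govt_Debt_GDP"].transform(z_score)
print(toy)
```

Country A (mean 60, std 10) and country B (mean 220, std 20) both map to z-scores of -1, 0, and 1, which is what makes their trajectories comparable on one axis.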


In [None]:
# Filter to the two metrics
metrics = ["Govt_Debt_GDP", "Inflation"]
df_subset = df[df["Metric"].isin(metrics)].copy()

# This selection includes high-debt countries and USA for comparison
selected_countries = ["Belgium", "Canada", "Greece", "Italy", "Japan","United States"]


# Filter for selected countries
df_selected = df_subset[df_subset["Country"].isin(selected_countries)]

# Transform data from long to wide format
# This creates separate columns for each metric, enabling easier calculations
df_wide = df_selected.pivot_table(
    index=["Country", "Year"],
    columns="Metric",
    values="Value"
).reset_index()

# Define z-score standardization function
# Z-score = (value - mean) / standard deviation
# This normalizes data to have mean=0 and std=1
def z_score(series):
    return (series - series.mean()) / series.std()

# Apply z-score standardization within each country
# This accounts for country-specific baseline levels and focuses on relative changes
df_wide["Z_Debt"] = df_wide.groupby("Country")["Govt_Debt_GDP"].transform(z_score)
df_wide["Z_Inflation"] = df_wide.groupby("Country")["Inflation"].transform(z_score)


# Transform back to long format for plotting
# This structure is required for Plotly's faceted line charts
df_long = df_wide.melt(
    id_vars=["Country", "Year"],
    value_vars=["Z_Debt", "Z_Inflation"],
    var_name="Metric",
    value_name="Value"
)
# Clean up metric names for better display
df_long["Metric"] = df_long["Metric"].replace({
    "Z_Debt": "Standardized Debt",
    "Z_Inflation": "Standardized Inflation"
})

In [None]:
# Create faceted line chart for time series comparison
fig = px.line(
    df_long,
    x="Year",
    y="Value",
    color="Metric",
    color_discrete_map={
        "Standardized Debt": "#A5DB6E",   # Green for debt
        "Standardized Inflation": "#D0A9E6" # Purple for inflation
    },
    facet_col="Country",
    facet_col_wrap=2,   # Two columns per row for better layout
    title="Z-Score Standardized Public Debt vs Inflation Over Time",
    labels={"Value": "Z-Score"},
    height=800,
    width=1200
)

fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))  # Clean facet titles

# Enhance layout for better readability
fig.update_layout(
    showlegend=True,
    legend=dict(
        orientation="h",  # Horizontal legend
        yanchor="bottom",
        y=1.02,
        xanchor="right",
        x=1
    ),
    plot_bgcolor="white",
    font=dict(size=11)
)

# Add horizontal reference line at y=0 (mean level)
fig.add_hline(y=0, line_dash="dash", line_color="gray", opacity=0.5)
fig.show()

### Interpretation Guidelines
**Understanding Z-Scores:**
- **Z-score = 0**: Value equals the country's historical mean
- **Z-score > 0**: Value is above the country's historical mean
- **Z-score < 0**: Value is below the country's historical mean
- **|Z-score| > 2**: Value is more than 2 standard deviations from mean (unusual)

**Pattern Analysis:**
- **Positive correlation**: Lines move in the same direction
- **Negative correlation**: Lines move in opposite directions
- **No correlation**: Lines move independently
- **Lagged effects**: One metric leads changes in the other
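
Lagged effects are hard to judge by eye, so one option is to correlate one series with a shifted copy of the other. The sketch below is an illustration under stated assumptions: the series are invented, `inflation` is constructed to follow `debt` by exactly two years, and `lagged_corr` is a hypothetical helper rather than part of the notebook.

```python
import pandas as pd

# Hypothetical standardized debt series (values are invented)
years = range(2000, 2010)
debt = pd.Series([0.0, 1.0, 0.5, -1.0, -0.5, 1.5, 0.0, -1.5, 1.0, 0.5], index=years)

# Construct inflation so that it repeats debt two years later
inflation = debt.shift(2)

def lagged_corr(leader, follower, lag):
    """Correlation between follower and leader shifted forward by `lag` years."""
    return leader.shift(lag).corr(follower)

# Scan a few candidate lags and keep the one with the highest correlation
corr_by_lag = {lag: lagged_corr(debt, inflation, lag) for lag in range(0, 4)}
best_lag = max(corr_by_lag, key=corr_by_lag.get)
print(corr_by_lag)
print(f"Best lag: {best_lag} years")
```

With short annual series, correlations at different lags are noisy, so a scan like this is best treated as exploratory rather than as evidence of causation.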

### Expected Findings
This analysis may reveal:
1. **Country-specific patterns**: Different relationships in different economies
2. **Temporal variations**: Changing relationships over time
3. **Crisis periods**: Unusual patterns during economic stress
4. **Policy implications**: Evidence for or against debt-inflation relationships



<a id="q6"></a>
## Question 6: How does debt burden relate to economic stress across different countries and time periods?
This analysis explores the relationship between government debt levels and economic stress indicators across countries over time. We create a composite "Economic Stress" metric that combines inflation and unemployment rates to capture overall economic pressure, then examine how this relates to public debt levels.

**Research Questions:**
- Do countries with higher debt burdens also experience higher economic stress?
- How does this relationship evolve over time?
- Are larger economies (by population) more resilient to debt-stress combinations?
- What are the distinct patterns of fiscal vs. macroeconomic stability?

### Methodology 

**Economic Stress = Inflation Rate + Unemployment Rate**

This sum is commonly known as the misery index. It captures two fundamental sources of economic distress:
- **Inflation**: Erodes purchasing power and creates uncertainty
- **Unemployment**: Reduces economic output and increases social costs

The visualization creates four distinct economic profiles:

| Quadrant | Debt Level | Economic Stress | Interpretation |
|----------|------------|-----------------|----------------|
| **Top-Right** | High | High | 🔺 **High Risk**: Fiscal pressure + macroeconomic instability |
| **Bottom-Left** | Low | Low | 🟢 **Stability**: Fiscal space + macroeconomic health |
| **Bottom-Right** | High | Low | 🟡 **Fiscal Pressure**: Debt concerns but stable macro environment |
| **Top-Left** | Low | High | 🟠 **Macro Instability**: Economic stress despite fiscal prudence |
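
The quadrant table can be expressed compactly in code. The sketch below is one possible formulation using `numpy.select` on a hypothetical four-country frame (`toy`, invented values), with the sample medians as cut-offs. The labels follow the table above; the cut-off choice is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical snapshot (values are invented)
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Govt_Debt_GDP": [30.0, 120.0, 110.0, 25.0],
    "Inflation": [2.0, 1.0, 9.0, 12.0],
    "Unemployment": [4.0, 3.0, 11.0, 9.0],
})
toy["EconomicStress"] = toy["Inflation"] + toy["Unemployment"]

# Medians act as the dividing lines between "high" and "low"
high_debt = toy["Govt_Debt_GDP"] > toy["Govt_Debt_GDP"].median()
high_stress = toy["EconomicStress"] > toy["EconomicStress"].median()

# First matching condition wins; anything left over is low debt + low stress
toy["Quadrant"] = np.select(
    [high_debt & high_stress, high_debt & ~high_stress, ~high_debt & high_stress],
    ["High Risk", "Fiscal Pressure", "Macro Instability"],
    default="Stability",
)
print(toy[["Country", "Govt_Debt_GDP", "EconomicStress", "Quadrant"]])
```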

### Data Preparation and Analysis

In [None]:
metrics = ["Govt_Debt_GDP", "Inflation", "Unemployment", "Population_mn"]
df_subset = df[df["Metric"].isin(metrics)].copy()

# Transform data from long to wide format for easier calculations
# This creates separate columns for each metric, enabling composite index creation
df_wide = df_subset.pivot_table(
    index=["Country", "Year"],
    columns="Metric",
    values="Value"
).reset_index()

# Create the composite Economic Stress Index
# This simple additive approach gives equal weight to inflation and unemployment
# Alternative approaches could include weighted averages or more complex formulations
df_wide["EconomicStress"] = df_wide["Inflation"] + df_wide["Unemployment"]

# Remove observations with missing data
df_clean = df_wide.dropna(subset=["Govt_Debt_GDP", "EconomicStress", "Population_mn"]).copy()

# Create a visually-adjusted version of Economic Stress for better plotting
# Extreme values can compress the visualization and hide important patterns:
# - cap extremely high stress values (hyperinflation periods) at 65
# - floor negative values (deflation + low unemployment) at -10
df_clean["EconomicStress_Visual"] = df_clean["EconomicStress"].clip(lower=-10, upper=65)

# Calculate key statistics for reference lines
debt_median = df_clean["Govt_Debt_GDP"].median()
stress_median = df_clean["EconomicStress"].median()
stress_visual_median = df_clean["EconomicStress_Visual"].median()

print("Summary:")
print(f"- Countries analyzed: {df_clean['Country'].nunique()}")
print(f"- Median debt level: {debt_median:.1f}% of GDP")
print(f"- Median economic stress: {stress_median:.1f}%")
print(f"- Observations with extreme stress (>65%): {(df_clean['EconomicStress'] > 65).sum()}")

In [None]:
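# Create the main animated scatter plot
# Sort by Year first: animation frames follow the order in which years first appear in the data
fig = px.scatter(
    df_clean.sort_values("Year"),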
    x="Govt_Debt_GDP",
    y="EconomicStress_Visual",
    size="Population_mn",       # Bubble size represents country size
    animation_frame="Year",     # Time animation slider
    animation_group="Country",  # Maintain country identity across frames
    color="Country",            # Color-code by country
    hover_name="Country",       # Country name on hover
    hover_data={
        "Inflation": True,
        "Unemployment": True,
        "Population_mn": True,
        "EconomicStress": True,         # real value shown in hover
        "EconomicStress_Visual": False  # Hide the capped visual value
    },
    title="Debt Burden vs Economic Stress Over Time (Bubble Size = Population)",
    labels={
        "Govt_Debt_GDP": "Public Debt (% of GDP)",
        "EconomicStress": "Inflation + Unemployment (%)"
    },
    size_max=50,      # Maximum bubble size
    width=1000,
    height=700
)

fig.update_layout(
    xaxis=dict(title="Public Debt (% of GDP)", range=[-20, df_clean["Govt_Debt_GDP"].max()]),
    yaxis=dict(title="Inflation + Unemployment (%)", range=[df_clean["EconomicStress_Visual"].min()-5, 70]))

# Add median reference lines to create quadrants
# These lines help viewers quickly identify which quadrant each country falls into

# Horizontal line: separates high vs low economic stress (overall median)
fig.add_shape(
    type="line",
    x0=df_clean["Govt_Debt_GDP"].min(),
    x1=df_clean["Govt_Debt_GDP"].max(),
    y0=stress_median,
    y1=stress_median,
    line=dict(dash="dash", color="gray")
)
# Vertical line: separates high vs low debt burden (overall median)
# Span the clipped stress range so the line matches the values actually plotted
fig.add_shape(
    type="line",
    x0=debt_median,
    x1=debt_median,
    y0=df_clean["EconomicStress_Visual"].min(),
    y1=df_clean["EconomicStress_Visual"].max(),
    line=dict(dash="dash", color="gray")
)
fig.show()

### Recent Data: Statistical Insights

In [None]:
# Calculate the correlation between debt and economic stress, pooled across all countries and years
correlation = df_clean["Govt_Debt_GDP"].corr(df_clean["EconomicStress"])

# Identify countries in each quadrant (using most recent year)
recent_year = df_clean["Year"].max()
recent_data = df_clean[df_clean["Year"] == recent_year].copy()

# Classify countries into quadrants
recent_data["Quadrant"] = ""
recent_data.loc[
    (recent_data["Govt_Debt_GDP"] > debt_median) & 
    (recent_data["EconomicStress"] > stress_median), 
    "Quadrant"
] = "High Risk"

recent_data.loc[
    (recent_data["Govt_Debt_GDP"] <= debt_median) & 
    (recent_data["EconomicStress"] <= stress_median), 
    "Quadrant"
] = "Stability"

recent_data.loc[
    (recent_data["Govt_Debt_GDP"] > debt_median) & 
    (recent_data["EconomicStress"] <= stress_median), 
    "Quadrant"
] = "Fiscal Pressure"

recent_data.loc[
    (recent_data["Govt_Debt_GDP"] <= debt_median) & 
    (recent_data["EconomicStress"] > stress_median), 
    "Quadrant"
] = "Macro Instability"

# Display summary statistics
print(f"Overall correlation (Debt vs Economic Stress): {correlation:.3f}")
print(f"\nCountries by Quadrant ({recent_year}):")
for quadrant in ["Stability", "Fiscal Pressure", "Macro Instability", "High Risk"]:
    countries = recent_data[recent_data["Quadrant"] == quadrant]["Country"].tolist()
    print(f"  {quadrant}: {len(countries)} countries")
    if countries:
        print(f"    Examples: {', '.join(countries[:5])}")