# Notebook 2: Analysis of Structural Differences
**Goal:** Statistically compare GDP growth and its stability between China and three Western economies (Switzerland, Germany, United States).

**Methods:** Kruskal-Wallis test (rank-based, non-parametric alternative to one-way ANOVA), Dunn's post-hoc test (Bonferroni-corrected), confidence intervals.


In [5]:
import pandas as pd
import numpy as np
from scipy import stats
import scikit_posthocs as sp

# Load the cleaned data set produced by Notebook 01
try:
    df_analysis = pd.read_pickle("imf_data_clean.pkl")
    print("Data loaded. Shape:", df_analysis.shape)
except FileNotFoundError:
    print("Error: Run Notebook 01 first!")
    raise  # stop here: every cell below needs df_analysis

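countries_to_analyze = ['Switzerland', 'Germany', 'United States', 'China']
# Keep only the GDP growth series of the countries under study
df_gdp_growth = df_analysis[
    (df_analysis['Subject Descriptor'] == 'GDP Growth Rate (YoY)')
    & (df_analysis['Country'].isin(countries_to_analyze))
]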

Data loaded. Shape: (548, 12)


## 1. Hypothesis 1: Growth Differences (Level)
**$H_0$:** The distribution of GDP growth rates is the same across all 4 countries.

We use the **Kruskal-Wallis** test because the EDA showed outliers and skewed distributions, which violate the normality assumption of a classical one-way ANOVA.
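
To make the test less of a black box, here is a minimal sketch of the H statistic on made-up numbers (not the IMF data). It assumes there are no tied values, so the tie correction that `scipy.stats.kruskal` applies is left out; the helper name `kruskal_h` and the sample values are illustrative only.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H statistic for groups without tied values."""
    # Pool all observations and rank them jointly (rank 1 = smallest value)
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}
    n_total = len(pooled)
    # Sum over groups of (rank sum)^2 / group size
    weighted = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical growth rates (%) for three groups, all values distinct
toy_groups = [
    [1.2, 2.5, 1.9, 0.4],
    [2.1, 3.0, 2.8, 1.5],
    [8.9, 9.4, 7.7, 10.1],
]
h_toy = kruskal_h(toy_groups)
print(f"H = {h_toy:.4f}")
```

Under $H_0$, H approximately follows a chi-squared distribution with $k - 1$ degrees of freedom ($k$ = number of groups), which is where the p-value comes from.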

In [6]:
print("--- Hypothesis Test: GDP Growth Differences (Kruskal-Wallis) ---")

# Collect one array of growth rates per country
groups = [
    df_gdp_growth.loc[df_gdp_growth['Country'] == country, 'Value'].values
    for country in countries_to_analyze
]

stat, p_value = stats.kruskal(*groups)

print(f"Kruskal-Wallis H-statistic: {stat:.4f}")
print(f"p-value: {p_value:.4e}")

if p_value < 0.05:
    print("Result: Reject H0. Significant differences found.")
else:
    print("Result: Fail to reject H0.")

--- Hypothesis Test: GDP Growth Differences (Kruskal-Wallis) ---
Kruskal-Wallis H-statistic: 100.3439
p-value: 1.3108e-21
Result: Reject H0. Significant differences found.


## 2. Stability Analysis (Quality of Growth)
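We compute **95% confidence intervals** (Lecture 6) for the mean growth rate of each country. The interval width measures how precisely the mean is estimated; because it scales with the sample standard deviation, we also read it as a rough proxy for volatility.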
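
For reference, the interval computed in the next cell corresponds to the standard t-based confidence interval for a mean, where $\bar{x}$ is the sample mean, $s$ the sample standard deviation (so that $s/\sqrt{n}$ is what `stats.sem` returns) and $n$ the number of observations:

$$
\bar{x} \pm t_{0.975,\,n-1} \cdot \frac{s}{\sqrt{n}}, \qquad \text{width} = 2 \, t_{0.975,\,n-1} \cdot \frac{s}{\sqrt{n}}
$$

Note that the width also shrinks with the sample size $n$, so it is comparable across countries only if the series have similar length.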


In [7]:
print("--- Confidence Intervals (95%) for GDP Growth ---")
print(f"{'Country':<15} | {'Mean Growth':<12} | {'95% CI Range':<20} | {'Stability (Width)'}")
print("-" * 80)

for country in countries_to_analyze:
    data = df_gdp_growth[df_gdp_growth['Country'] == country]['Value']

    mean = np.mean(data)
    sem = stats.sem(data)
    margin = sem * stats.t.ppf((1 + 0.95) / 2., len(data)-1)

    lower = mean - margin
    upper = mean + margin
    width = upper - lower

    # Interpretation: heuristic cut-off of 2.0 percentage points for the CI width
    stability = "Very Stable" if width < 2.0 else "High Variance"

    print(f"{country:<15} | {mean:.2f}%        | [{lower:.2f}%, {upper:.2f}%]       | {width:.2f} ({stability})")

--- Confidence Intervals (95%) for GDP Growth ---
Country         | Mean Growth  | 95% CI Range         | Stability (Width)
--------------------------------------------------------------------------------
Switzerland     | 1.72%        | [1.16%, 2.27%]       | 1.11 (Very Stable)
Germany         | 1.60%        | [0.95%, 2.24%]       | 1.29 (Very Stable)
United States   | 2.46%        | [1.87%, 3.05%]       | 1.18 (Very Stable)
China           | 8.92%        | [8.04%, 9.80%]       | 1.76 (Very Stable)


### Interim Conclusion: Growth & Stability

**1. Statistical Significance (Kruskal-Wallis)**
We reject the null hypothesis ($H_0$) because the p-value is far below the significance level ($H \approx 100.34$, $p \approx 1.3 \times 10^{-21} < 0.05$). At least one country's growth distribution therefore differs significantly from the others; the test does not yet tell us which.

**2. Stability Analysis (Confidence Intervals)**
The 95% confidence intervals (calculated according to Lecture 6) show a difference between the two groups:
* **Western Economies (CH, DE, US):** Narrow intervals (width $\approx 1.1$ to $1.3$ percentage points), indicating high **stability** and predictability of economic performance.
* **China:** The widest interval (width $1.76$ percentage points), although still below the 2.0 cut-off used in the code above. Relative to the mean growth of **8.92%**, this spread is small, which points to an **exceptionally robust** and sustained expansion. The growth is not "volatile" in a negative sense, but consistently high.
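
The claim that China's interval is wide in absolute terms but narrow relative to its mean can be checked with simple arithmetic on the table above. The sketch below divides each interval width by the mean growth; this "relative width" ratio is an illustrative measure introduced here, not part of the original analysis.

```python
# Mean growth (%) and 95% CI width, copied from the output table above
ci_table = {
    "Switzerland": (1.72, 1.11),
    "Germany": (1.60, 1.29),
    "United States": (2.46, 1.18),
    "China": (8.92, 1.76),
}

# Relative width = CI width / mean growth (smaller = tighter relative to the level)
relative_width = {c: width / mean for c, (mean, width) in ci_table.items()}

for country, ratio in sorted(relative_width.items(), key=lambda kv: kv[1]):
    print(f"{country:<15} {ratio:.2f}")
```

On this measure China has the tightest interval (about 0.20), against roughly 0.48 to 0.81 for the Western economies.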

**Next Step: Pairwise Comparisons**
Since the Kruskal-Wallis test is an *omnibus test* (it indicates *that* differences exist, but not *where*), we will now perform **Dunn's Post-hoc Test** with Bonferroni correction to test whether China forms a distinct statistical cluster compared to the Western economies.

> **$H_0$ (Pairwise):** There is no significant difference in the growth distribution between Country A and Country B.

### Post-hoc Analysis
We identify exactly *which* countries differ using **Dunn's Test** with Bonferroni correction (Lecture 9).
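
With four countries there are six pairwise comparisons, so under the Bonferroni correction each raw p-value is multiplied by six and capped at 1. A minimal sketch of that bookkeeping, using a hypothetical helper `bonferroni` and made-up raw p-values:

```python
from itertools import combinations

countries = ["Switzerland", "Germany", "United States", "China"]
pairs = list(combinations(countries, 2))  # every unordered pair exactly once
n_tests = len(pairs)                      # 4 choose 2 = 6


def bonferroni(p_raw, m):
    """Bonferroni-adjusted p-value: multiply by the number of tests, cap at 1."""
    return min(1.0, p_raw * m)


# Made-up raw p-values, for illustration only
for p_raw in (0.004, 0.05, 0.30):
    print(f"raw p = {p_raw:.3f} -> adjusted p = {bonferroni(p_raw, n_tests):.3f}")
```

The cap explains why entries of a Bonferroni-adjusted p-value matrix can be exactly 1.0.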

In [8]:
print("--- Post-hoc Test: Dunn's Test (Pairwise Comparisons) ---")

p_values_matrix = sp.posthoc_dunn(
    df_gdp_growth,
    val_col='Value',
    group_col='Country',
    p_adjust='bonferroni'
)

print(p_values_matrix.round(5))

--- Post-hoc Test: Dunn's Test (Pairwise Comparisons) ---
               China  Germany  Switzerland  United States
China            1.0  0.00000      0.00000        0.00000
Germany          0.0  1.00000      1.00000        0.24886
Switzerland      0.0  1.00000      1.00000        0.37597
United States    0.0  0.24886      0.37597        1.00000


### Conclusion: GDP Growth Clusters

**1. Statistical Decision**

We reject the null hypothesis ($H_0$) for all pairs involving China, as $p_{China, x} \ll 0.05$ for $x \in \{\text{Germany, Switzerland, United States}\}$ (the adjusted p-values print as $0.00000$ after rounding to five decimals). Conversely, we find no significant differences between the Western economies (adjusted $p$ between $0.25$ and $1.00$, all $> 0.05$).
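
The decision rule can be spelled out by going through each pair of the adjusted p-value matrix once. The sketch below hard-codes the rounded values from the output above instead of reusing `p_values_matrix`, so that it runs on its own; `alpha` is the 5% level used throughout.

```python
from itertools import combinations

alpha = 0.05
# Bonferroni-adjusted p-values, copied (rounded) from the Dunn's test output
p_adj = {
    ("China", "Germany"): 0.0,
    ("China", "Switzerland"): 0.0,
    ("China", "United States"): 0.0,
    ("Germany", "Switzerland"): 1.0,
    ("Germany", "United States"): 0.24886,
    ("Switzerland", "United States"): 0.37597,
}

countries = ["China", "Germany", "Switzerland", "United States"]
for a, b in combinations(countries, 2):
    verdict = "reject H0" if p_adj[(a, b)] < alpha else "fail to reject H0"
    print(f"{a} vs {b}: p = {p_adj[(a, b)]:.5f} -> {verdict}")

# Pairs whose adjusted p-value falls below the significance level
significant = [pair for pair, p in p_adj.items() if p < alpha]
```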

**2. Economic Interpretation**

We statistically identify two distinct economic clusters:
* **The 'Emerging Market' (China):** Characterized by **consistently higher growth**. While its confidence interval is the widest in absolute terms, the growth trajectory is robust, indicating sustained expansion rather than instability.
* **The 'Developed Economies' (USA, DE, CH):** A cluster showing moderate growth with no statistically significant pairwise differences. Failing to reject $H_0$ does not prove that the three distributions are identical, only that the data show no detectable difference.