# Real-World Case Study: Economic Growth Analysis

This notebook presents a **complete, publication-ready analysis** using synthetic data modeled on real economic data, demonstrating PanelBox's full capabilities in a research context.

## Research Question

**Does trade openness affect economic growth?**

This classic question in development economics allows us to demonstrate:

- Data preparation and cleaning
- Exploratory data analysis
- Multiple model specifications
- Robustness checks
- GMM for dynamics and endogeneity
- Complete validation
- Publication-ready reporting

## Table of Contents

1. [Introduction & Literature](#introduction)
2. [Data Collection & Preparation](#data)
3. [Exploratory Data Analysis](#eda)
4. [Baseline Models](#baseline)
5. [Addressing Endogeneity](#endogeneity)
6. [Robustness Checks](#robustness)
7. [Complete Validation](#validation)
8. [Results & Interpretation](#results)
9. [Publication-Ready Output](#publication)

---

## 1. Introduction & Literature {#introduction}

### Research Question

**Does trade openness lead to higher economic growth?**

### Theoretical Background

**Arguments FOR trade openness:**

- Specialization gains (Ricardo 1817)
- Technology spillovers
- Competition effects
- Scale economies

**Arguments AGAINST:**

- Infant industry protection
- Terms of trade volatility
- Dutch disease
- Unequal distribution

### Empirical Challenge

**Endogeneity**: Trade openness is not exogenous!

- Reverse causality: Growth → Trade
- Omitted variables: Institutions, geography
- Measurement error

**Solution**: Dynamic panel GMM (Arellano-Bond, Blundell-Bond)

### Literature

Key papers:

- **Sachs & Warner (1995)**: Trade liberalization and growth
- **Rodriguez & Rodrik (2001)**: Critique of cross-country regressions
- **Wacziarg & Welch (2008)**: Trade liberalization episodes
- **Feyrer (2009)**: IV approach using distance and air transport

### Our Contribution

We use:

- Panel data (exploits within-country variation)
- GMM (addresses endogeneity)
- Multiple robustness checks
- Modern econometric methods (PanelBox)

Let's begin!

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

import panelbox as pb

# Configuration
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 4)
np.random.seed(42)
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")
plt.rcParams['figure.dpi'] = 100

print(f"PanelBox version: {pb.__version__}")
print("Analysis environment ready!")

---

## 2. Data Collection & Preparation {#data}

### Data Source

We use a synthetic dataset inspired by the Penn World Table (PWT) with:

- **Countries**: 50 countries
- **Time period**: 1990-2020 (31 years)
- **Variables**:
  - GDP per capita growth (%)
  - Trade openness (% GDP)
  - Investment rate (% GDP)
  - Population growth (%)
  - Education (years)
  - Institutions (index)

**Note**: In a real paper, you would use actual PWT data. This is a demonstration.

In [None]:
# Create synthetic but realistic panel data
def create_realistic_growth_data(n_countries=50, n_years=31, seed=42):
    """
    Create synthetic economic growth data.

    Data Generating Process inspired by Solow-Swan + Trade literature:
    growth = β0 + β1*trade + β2*investment + β3*pop_growth +
             β4*education + β5*institutions + α_i + ε_it

    With endogeneity: trade correlated with unobserved factors
    """
    np.random.seed(seed)

    # Country IDs
    countries = [f"Country_{i:02d}" for i in range(1, n_countries + 1)]
    years = list(range(1990, 1990 + n_years))

    # Create panel structure
    data = []

    for country in countries:
        # Country-specific characteristics (unobserved heterogeneity)
        alpha_i = np.random.normal(0, 2)  # Country fixed effect
        quality_institutions = np.random.uniform(0, 1)  # Institutional quality
        geography = np.random.normal(0, 1)  # Geography advantage

        # Time-invariant base characteristics
        base_education = np.random.uniform(4, 12)
        base_institutions = quality_institutions * 10

        for t, year in enumerate(years):
            # Time trends
            time_trend = t * 0.1

            # Exogenous variables (with some persistence)
            if t == 0:
                investment = np.random.uniform(15, 35)
                pop_growth = np.random.uniform(-0.5, 3)
                education = base_education
                institutions = base_institutions
            else:
                # AR(1) processes. Investment and population growth revert to
                # long-run means of 25 and 1; without centering, the original
                # recursion drifted toward 25/0.3 ≈ 83 and 1/0.2 = 5.
                investment = 25 + 0.7 * (prev_investment - 25) + np.random.normal(0, 5)
                pop_growth = 1 + 0.8 * (prev_pop_growth - 1) + np.random.normal(0, 0.5)
                education = prev_education + np.random.normal(0.1, 0.05)
                institutions = 0.9 * prev_institutions + np.random.normal(0, 0.5)

            # Trade openness (endogenous!)
            # Depends on: institutions, geography, and unobserved factors
            trade = (30 +
                     0.5 * institutions +
                     10 * geography +
                     0.3 * investment +
                     alpha_i * 2 +  # Correlated with fixed effect!
                     np.random.normal(0, 10))
            trade = np.clip(trade, 10, 150)

            # GDP growth (true relationship)
            # True coefficients: trade=0.05, investment=0.15, pop=-0.8, educ=0.3, inst=0.2
            growth = (2 +
                      0.05 * trade +  # TRUE EFFECT of trade
                      0.15 * investment +
                      -0.8 * pop_growth +
                      0.3 * education +
                      0.2 * institutions +
                      alpha_i +  # Country fixed effect
                      time_trend * 0.05 +
                      np.random.normal(0, 2))  # Shock

            data.append({
                'country': country,
                'year': year,
                'growth': growth,
                'trade': trade,
                'investment': investment,
                'pop_growth': pop_growth,
                'education': education,
                'institutions': institutions
            })

            # Store for next period
            prev_investment = investment
            prev_pop_growth = pop_growth
            prev_education = education
            prev_institutions = institutions

    df = pd.DataFrame(data)
    return df

# Generate data
print("Generating realistic economic growth panel data...")
data = create_realistic_growth_data(n_countries=50, n_years=31, seed=42)

print("\nData generated successfully!")
print(f"Shape: {data.shape}")
print(f"Countries: {data['country'].nunique()}")
print(f"Years: {data['year'].min()} - {data['year'].max()}")
print("\nFirst observations:")
data.head(10)

### Data Description

In [None]:
# Descriptive statistics
print("DESCRIPTIVE STATISTICS")
print("="*70)
print(data[['growth', 'trade', 'investment', 'pop_growth', 'education', 'institutions']].describe())

# Check for missing values
print("\nMissing values:")
print(data.isnull().sum())

# Panel structure
print("\nPanel Structure:")
print(f"  Balanced: {len(data) == data['country'].nunique() * data['year'].nunique()}")
print(f"  Observations: {len(data)}")
print(f"  Expected if balanced: {data['country'].nunique() * data['year'].nunique()}")

---

## 3. Exploratory Data Analysis {#eda}

### 3.1 Distributions

In [None]:
# Distribution plots
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

variables = ['growth', 'trade', 'investment', 'pop_growth', 'education', 'institutions']
titles = ['GDP Growth (%)', 'Trade Openness (% GDP)', 'Investment Rate (%)',
          'Population Growth (%)', 'Education (years)', 'Institutions (index)']

for ax, var, title in zip(axes.flat, variables, titles):
    data[var].hist(bins=30, ax=ax, alpha=0.7, edgecolor='black')
    ax.set_xlabel(title, fontsize=10)
    ax.set_ylabel('Frequency', fontsize=10)
    ax.set_title(f'Distribution: {title}', fontsize=11, fontweight='bold')
    ax.axvline(data[var].mean(), color='red', linestyle='--', linewidth=2, label='Mean')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Key observations:")
print("- Growth rates vary substantially")
print("- Trade openness shows wide variation (10-150% GDP)")
print("- Investment rates cluster around 25%")

### 3.2 Time Trends

In [None]:
# Time trends
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Average growth over time
yearly_growth = data.groupby('year')['growth'].mean()
axes[0,0].plot(yearly_growth.index, yearly_growth.values, marker='o', linewidth=2)
axes[0,0].set_xlabel('Year', fontsize=11)
axes[0,0].set_ylabel('Average Growth (%)', fontsize=11)
axes[0,0].set_title('Average GDP Growth Over Time', fontsize=12, fontweight='bold')
axes[0,0].grid(True, alpha=0.3)

# Average trade over time
yearly_trade = data.groupby('year')['trade'].mean()
axes[0,1].plot(yearly_trade.index, yearly_trade.values, marker='o', linewidth=2, color='green')
axes[0,1].set_xlabel('Year', fontsize=11)
axes[0,1].set_ylabel('Average Trade (% GDP)', fontsize=11)
axes[0,1].set_title('Average Trade Openness Over Time', fontsize=12, fontweight='bold')
axes[0,1].grid(True, alpha=0.3)

# Sample of countries - growth
sample_countries = data['country'].unique()[:5]
for country in sample_countries:
    country_data = data[data['country'] == country]
    axes[1,0].plot(country_data['year'], country_data['growth'],
                   marker='o', label=country, alpha=0.7, linewidth=1.5)
axes[1,0].set_xlabel('Year', fontsize=11)
axes[1,0].set_ylabel('Growth (%)', fontsize=11)
axes[1,0].set_title('Growth Trajectories (Sample Countries)', fontsize=12, fontweight='bold')
axes[1,0].legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
axes[1,0].grid(True, alpha=0.3)

# Sample of countries - trade
for country in sample_countries:
    country_data = data[data['country'] == country]
    axes[1,1].plot(country_data['year'], country_data['trade'],
                   marker='o', label=country, alpha=0.7, linewidth=1.5)
axes[1,1].set_xlabel('Year', fontsize=11)
axes[1,1].set_ylabel('Trade (% GDP)', fontsize=11)
axes[1,1].set_title('Trade Trajectories (Sample Countries)', fontsize=12, fontweight='bold')
axes[1,1].legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Observations:")
print("- Considerable heterogeneity across countries")
print("- Some common time trends")
print("- Both between and within variation present")
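The statement that both between and within variation are present can be checked numerically rather than read off the plots. The sketch below is a minimal illustration on a hypothetical toy panel; the helper `between_within_variance` and the `toy` frame are assumptions made for this example and are not part of PanelBox. It splits the total (population) variance of a variable into the variance of the country means and the average variance around those means.

```python
import numpy as np
import pandas as pd

def between_within_variance(df, entity_col, value_col):
    """Split the population variance of value_col into between- and within-entity parts."""
    overall_mean = df[value_col].mean()
    # Entity mean broadcast back to every row of that entity
    entity_means = df.groupby(entity_col)[value_col].transform("mean")
    between = ((entity_means - overall_mean) ** 2).mean()
    within = ((df[value_col] - entity_means) ** 2).mean()
    total = ((df[value_col] - overall_mean) ** 2).mean()
    return between, within, total

# Hypothetical toy panel: 20 countries observed over 10 years
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "country": np.repeat(np.arange(20), 10),
    "year": np.tile(np.arange(2000, 2010), 20),
})
country_effect = rng.normal(0, 5, size=20)
toy["trade"] = (60
                + country_effect[toy["country"].to_numpy()]  # between-country differences
                + rng.normal(0, 3, size=len(toy)))           # within-country fluctuations

between, within, total = between_within_variance(toy, "country", "trade")
print(f"Between share: {between / total:.2%}")
print(f"Within share:  {within / total:.2%}")
```

Applied to the notebook's own frame, the call would be `between_within_variance(data, "country", "trade")`. A very small within share would be a warning sign for Fixed Effects, which identifies coefficients from within-country variation only.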

### 3.3 Correlations

In [None]:
# Correlation matrix
corr_matrix = data[variables].corr()

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, fmt='.3f', cmap='RdBu_r', center=0,
            square=True, linewidths=1, cbar_kws={"shrink": 0.8}, ax=ax,
            vmin=-1, vmax=1)
ax.set_title('Correlation Matrix', fontsize=14, fontweight='bold', pad=20)
plt.tight_layout()
plt.show()

print("\nKey correlations:")
print(f"Growth vs Trade: {corr_matrix.loc['growth', 'trade']:.3f}")
print(f"Growth vs Investment: {corr_matrix.loc['growth', 'investment']:.3f}")
print(f"Growth vs Education: {corr_matrix.loc['growth', 'education']:.3f}")
print("\nNote: Positive correlation between growth and trade, but is it causal?")

### 3.4 Scatter Plots

In [None]:
# Scatter plots
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Growth vs Trade
axes[0].scatter(data['trade'], data['growth'], alpha=0.3, s=20)
z = np.polyfit(data['trade'], data['growth'], 1)
p = np.poly1d(z)
axes[0].plot(data['trade'].sort_values(), p(data['trade'].sort_values()),
             "r--", linewidth=2, label=f'OLS: slope={z[0]:.3f}')
axes[0].set_xlabel('Trade Openness (% GDP)', fontsize=11)
axes[0].set_ylabel('GDP Growth (%)', fontsize=11)
axes[0].set_title('Growth vs Trade Openness', fontsize=12, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Growth vs Investment
axes[1].scatter(data['investment'], data['growth'], alpha=0.3, s=20, color='green')
z2 = np.polyfit(data['investment'], data['growth'], 1)
p2 = np.poly1d(z2)
axes[1].plot(data['investment'].sort_values(), p2(data['investment'].sort_values()),
             "r--", linewidth=2, label=f'OLS: slope={z2[0]:.3f}')
axes[1].set_xlabel('Investment Rate (% GDP)', fontsize=11)
axes[1].set_ylabel('GDP Growth (%)', fontsize=11)
axes[1].set_title('Growth vs Investment', fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Positive relationships visible, but:")
print("- Omitted variable bias?")
print("- Reverse causality?")
print("- Need panel methods!")
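The omitted-variable concern can be stated precisely for a stripped-down version of the model. The derivation below is a sketch under simplifying assumptions that are not the notebook's full specification: a single regressor, and an idiosyncratic error uncorrelated with trade.

```latex
% Simplified model: one regressor plus an unobserved country effect
\text{growth}_{it} = \beta_0 + \beta\,\text{trade}_{it} + \alpha_i + \varepsilon_{it},
\qquad \operatorname{Cov}(\text{trade}_{it}, \varepsilon_{it}) = 0

% A pooled regression of growth on trade absorbs \alpha_i into the error term, so
\operatorname{plim}\,\hat{\beta}_{\text{pooled}}
  = \beta + \frac{\operatorname{Cov}(\text{trade}_{it}, \alpha_i)}{\operatorname{Var}(\text{trade}_{it})}

% Subtracting country means removes \alpha_i (the within transformation)
\text{growth}_{it} - \overline{\text{growth}}_i
  = \beta\,(\text{trade}_{it} - \overline{\text{trade}}_i)
  + (\varepsilon_{it} - \bar{\varepsilon}_i)
```

The slope in a pooled scatter plot is therefore biased upward whenever countries with a high unobserved effect also trade more, which is how the synthetic data were constructed. Reverse causality is a separate problem that demeaning alone does not remove.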

---

## 4. Baseline Models {#baseline}

We estimate three baseline specifications:

1. Pooled OLS (ignores panel structure)
2. Fixed Effects (controls for country heterogeneity)
3. Random Effects (GLS estimation)

### 4.1 Pooled OLS

In [None]:
# Pooled OLS
pooled = pb.PooledOLS(
    formula="growth ~ trade + investment + pop_growth + education + institutions",
    data=data,
    entity_col="country",
    time_col="year"
)

pooled_results = pooled.fit()

print("MODEL 1: POOLED OLS")
print("="*70)
print(pooled_results.summary())

# Store for comparison
trade_coef_pooled = pooled_results.params['trade']
print(f"\nTrade coefficient: {trade_coef_pooled:.4f}")
print(f"P-value: {pooled_results.pvalues['trade']:.4f}")

**Problem with Pooled OLS**: Ignores country-specific factors (α_i) that may be correlated with trade.
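The size of this problem can be illustrated with a small NumPy-only simulation. This is a sketch, not the notebook's data or PanelBox's estimator: it assumes a one-regressor model whose numbers loosely mirror the synthetic data (true slope 0.05, a country effect that enters the regressor with weight 2), and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_periods, true_beta = 200, 10, 0.05

# Unobserved entity effect, repeated for every period of the entity
alpha = np.repeat(rng.normal(0, 2, size=n_entities), n_periods)

# Regressor is correlated with the entity effect (the source of the bias)
x = 30 + 2 * alpha + rng.normal(0, 10, size=alpha.size)
y = 2 + true_beta * x + alpha + rng.normal(0, 2, size=alpha.size)

def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def demean_by_entity(v):
    """Subtract each entity's time average (rows are ordered entity by entity)."""
    panel = v.reshape(n_entities, n_periods)
    return (panel - panel.mean(axis=1, keepdims=True)).ravel()

pooled_slope = ols_slope(x, y)                                      # ignores alpha
within_slope = ols_slope(demean_by_entity(x), demean_by_entity(y))  # removes alpha

print(f"True slope:   {true_beta:.3f}")
print(f"Pooled slope: {pooled_slope:.3f}")
print(f"Within slope: {within_slope:.3f}")
```

Under these assumptions the pooled slope absorbs part of the country effect and overstates the true coefficient, while the demeaned regression recovers a value close to it.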

### 4.2 Fixed Effects

Control for unobserved country heterogeneity:

In [None]:
# Fixed Effects
fe = pb.FixedEffects(
    formula="growth ~ trade + investment + pop_growth + education + institutions",
    data=data,
    entity_col="country",
    time_col="year"
)

fe_results = fe.fit(cov_type='clustered', cluster_entity=True)

print("\nMODEL 2: FIXED EFFECTS (Clustered SE)")
print("="*70)
print(fe_results.summary())

trade_coef_fe = fe_results.params['trade']
print(f"\nTrade coefficient: {trade_coef_fe:.4f}")
print(f"Compare to Pooled OLS: {trade_coef_pooled:.4f}")

**Note**: The FE coefficient will likely differ from the Pooled OLS estimate because FE controls for α_i.
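What "controlling for α_i" means mechanically can be sketched without any panel library. The example below uses a hypothetical simulated panel and is not a description of how PanelBox implements Fixed Effects internally. It shows that including one dummy per country (LSDV) and demeaning every variable by country give the same slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n_entities, n_periods = 30, 8
entity = np.repeat(np.arange(n_entities), n_periods)  # entity index for each row

alpha = rng.normal(0, 2, size=n_entities)[entity]     # unobserved entity effect
x = 1.5 * alpha + rng.normal(size=entity.size)        # regressor correlated with it
y = 0.5 * x + alpha + rng.normal(size=entity.size)

# (a) LSDV: regress y on x plus one dummy column per entity (no separate intercept)
dummies = (entity[:, None] == np.arange(n_entities)[None, :]).astype(float)
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)
beta_lsdv = coef_lsdv[0]

# (b) Within estimator: subtract entity means, then run a one-variable regression
def demean_by_entity(v):
    means = np.bincount(entity, weights=v) / np.bincount(entity)
    return v - means[entity]

x_w, y_w = demean_by_entity(x), demean_by_entity(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

print(f"LSDV slope:   {beta_lsdv:.6f}")
print(f"Within slope: {beta_within:.6f}")
```

The two numbers agree up to floating-point error, which is why Fixed Effects can be estimated by demeaning instead of building a potentially large dummy matrix.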