© 2025 KR-Labs. All rights reserved.  
SPDX-License-Identifier: Apache-2.0

# Tutorial 1: Economic Forecasting with Time Series Models

**Author:** KRL Model Zoo Team  
**Affiliation:** KR-Labs  
**Version:** v1.0  
**Date:** 2025-10-25  
**License:** Apache 2.0  
**Tier:** 1-3 (Foundational to Intermediate)

---

## Tutorial Overview

This tutorial demonstrates how to use KRL Model Zoo's econometric models for forecasting economic indicators using time series analysis techniques.

**Models Covered:**
- **ARIMA** - Autoregressive Integrated Moving Average
- **SARIMA** - Seasonal ARIMA for periodic patterns
- **VAR** - Vector Autoregression (multivariate forecasting)

**Dataset:** Synthetic quarterly GDP data with trend, seasonality, and business cycles

**Learning Objectives:**
1. Understand univariate vs multivariate time series forecasting
2. Implement ARIMA and SARIMA models for economic data
3. Compare model performance using statistical metrics
4. Interpret forecast results and confidence intervals

**Prerequisites:**
- Basic understanding of time series concepts
- Familiarity with Python and pandas
- Understanding of economic indicators (GDP, employment)

**Estimated Time:** 30-45 minutes

---

## Business Applications

1. **GDP Forecasting:** Predict economic growth for budget planning
2. **Policy Analysis:** Assess impact of interventions on economic indicators
3. **Risk Management:** Identify potential economic downturns
4. **Resource Allocation:** Plan based on economic trajectory forecasts

---

## Data Provenance

**Source:** Synthetic data generated for demonstration  
**Characteristics:**
- Quarterly frequency (60 observations)
- Trend component (~2% growth per quarter)
- Seasonal pattern (quarterly cycles)
- Business cycle component (~8 year period)
- Random noise component
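
As a rough illustration, a series with the characteristics listed above could be generated along the following lines. This is a sketch only: the base level, amplitudes, noise scale, and seed are assumed values chosen for illustration, not the ones used to build `gdp_sample.csv`, and `make_synthetic_gdp` is a hypothetical helper name.

```python
import numpy as np

def make_synthetic_gdp(n_quarters=60, seed=0):
    """Build an illustrative quarterly series: trend + seasonality + cycle + noise."""
    rng = np.random.default_rng(seed)               # fixed seed for reproducibility
    t = np.arange(n_quarters)

    trend = 1000.0 * 1.02 ** t                      # ~2% growth per quarter (assumed base of 1000)
    seasonal = 15.0 * np.sin(2 * np.pi * t / 4)     # pattern repeating every 4 quarters
    cycle = 40.0 * np.sin(2 * np.pi * t / 32)       # ~8-year business cycle = 32 quarters
    noise = rng.normal(0.0, 5.0, size=n_quarters)   # random shocks

    return trend + seasonal + cycle + noise

synthetic_gdp = make_synthetic_gdp()
print(synthetic_gdp.shape)
```

Keeping the four components as separate arrays makes it easy to switch one off and see how each model reacts.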

**Real-World Equivalents:**
- U.S. Bureau of Economic Analysis (BEA) GDP data
- FRED Economic Data (Federal Reserve Bank of St. Louis)
- Census Bureau economic indicators

## Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

# Import KRL Model Zoo models
from krl_models.econometric import ARIMAModel, SARIMAModel, VARModel
from krl_core import ModelMeta

# Set display options
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

## Load Data

In [None]:
# Load GDP sample data
df = pd.read_csv('../data/gdp_sample.csv')
df['date'] = pd.to_datetime(df['date'])

print(f"Dataset shape: {df.shape}")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
df.head()

In [None]:
# Visualize the data
fig, axes = plt.subplots(2, 1, figsize=(12, 8))

axes[0].plot(df['date'], df['gdp'])
axes[0].set_title('Quarterly GDP', fontsize=14, fontweight='bold')
axes[0].set_ylabel('GDP (Billions)')
axes[0].grid(True, alpha=0.3)

axes[1].plot(df['date'], df['gdp_growth'])
axes[1].set_title('Quarterly GDP Growth Rate', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Growth Rate (%)')
axes[1].axhline(y=0, color='r', linestyle='--', alpha=0.5)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Model 1: ARIMA - Basic Time Series Forecasting

ARIMA (AutoRegressive Integrated Moving Average) is a classic model for univariate time series forecasting. It combines:
- AR (p): Autoregression - uses past values
- I (d): Integration - differencing to achieve stationarity
- MA (q): Moving Average - uses past forecast errors
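
To make the AR and I components concrete, here is a minimal NumPy sketch of an ARIMA(1,1,0)-style forecast: difference once, fit an AR(1) to the differences by least squares, then cumulate the predicted differences back to levels. It omits the MA term, and it is not how `ARIMAModel` works internally. The helper names `fit_ar1` and `forecast_arima_110` are hypothetical.

```python
import numpy as np

def fit_ar1(x):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + e[t]."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    c, phi = np.linalg.lstsq(X, x[1:], rcond=None)[0]
    return c, phi

def forecast_arima_110(y, steps):
    """Forecast levels with one round of differencing and an AR(1) on the differences."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                  # I (d=1): work with period-to-period changes
    c, phi = fit_ar1(dy)             # AR (p=1): regress each change on the previous one
    preds, last_dy, level = [], dy[-1], y[-1]
    for _ in range(steps):
        last_dy = c + phi * last_dy  # predicted next change
        level = level + last_dy      # undo the differencing
        preds.append(level)
    return np.array(preds)

# A pure linear trend: each change is 2, so the forecast keeps adding 2
linear = 100.0 + 2.0 * np.arange(10)
arima_sketch = forecast_arima_110(linear, steps=3)
print(arima_sketch)
```

On the linear series the differences are constant, so the forecast simply extends the trend. That is the behaviour differencing is meant to capture.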

In [None]:
# Split data into train and test
train_size = int(len(df) * 0.8)
train_data = df[:train_size].copy()
test_data = df[train_size:].copy()

print(f"Training observations: {len(train_data)}")
print(f"Test observations: {len(test_data)}")

In [None]:
# Configure ARIMA model
arima_params = {
    'time_col': 'date',
    'value_col': 'gdp',
    'p': 2,  # AR order
    'd': 1,  # Differencing order
    'q': 2   # MA order
}

meta = ModelMeta(
    name="GDP_ARIMA",
    version="1.0",
    author="Tutorial",
    description="ARIMA model for quarterly GDP forecasting"
)

# Fit model
arima = ARIMAModel(train_data, arima_params, meta)
result = arima.fit()

print(f"\nModel fitted successfully!")
print(f"AIC: {result.payload['aic']:.2f}")
print(f"BIC: {result.payload['bic']:.2f}")

In [None]:
# Make predictions
forecast = arima.predict(train_data, steps=len(test_data))

# Extract forecast values
forecast_values = forecast.payload['forecast']
lower_bound = forecast.payload.get('forecast_lower', None)
upper_bound = forecast.payload.get('forecast_upper', None)

print(f"Forecast shape: {len(forecast_values)} periods")

In [None]:
# Visualize ARIMA forecast
plt.figure(figsize=(14, 6))

# Plot training data
plt.plot(train_data['date'], train_data['gdp'], label='Training Data', color='blue', linewidth=2)

# Plot test data (actual)
plt.plot(test_data['date'], test_data['gdp'], label='Actual', color='green', linewidth=2)

# Plot forecast
plt.plot(test_data['date'], forecast_values, label='ARIMA Forecast', color='red', linewidth=2, linestyle='--')

# Plot confidence intervals if available
if lower_bound is not None and upper_bound is not None:
    plt.fill_between(test_data['date'], lower_bound, upper_bound, alpha=0.2, color='red', label='95% Confidence Interval')

plt.title('ARIMA(2,1,2) GDP Forecast', fontsize=16, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('GDP (Billions)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate forecast error
mape = np.mean(np.abs((test_data['gdp'].values - forecast_values) / test_data['gdp'].values)) * 100
rmse = np.sqrt(np.mean((test_data['gdp'].values - forecast_values) ** 2))

print(f"\nForecast Accuracy:")
print(f"  MAPE: {mape:.2f}%")
print(f"  RMSE: {rmse:.2f}")
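
The two accuracy metrics computed above can be wrapped in small helpers so they are easy to reuse across models. This is a plain-Python sketch; `compute_mape` and `compute_rmse` are hypothetical names, not part of KRL Model Zoo.

```python
import math

def compute_mape(actual, predicted):
    """Mean absolute percentage error, in percent. Undefined if any actual value is 0."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

def compute_rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Each forecast is off by 10% of the actual value, so MAPE is about 10
example_mape = compute_mape([100.0, 200.0], [110.0, 180.0])
# Errors of 10 and 20 give sqrt((100 + 400) / 2) = sqrt(250)
example_rmse = compute_rmse([100.0, 200.0], [110.0, 180.0])
print(round(example_mape, 2), round(example_rmse, 2))
```

MAPE is scale-free but breaks down when actual values are near zero. RMSE keeps the data's units and penalizes large misses more heavily.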

## Model 2: SARIMA - Seasonal ARIMA

SARIMA extends ARIMA to handle seasonal patterns by adding seasonal AR, I, and MA components.
Format: SARIMA(p,d,q)(P,D,Q,s) where s is the seasonal period.
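
The role of the seasonal period `s` can be seen with seasonal differencing, the operation controlled by `D`. The sketch below subtracts the value from `s` periods earlier, which removes any pattern that repeats exactly every `s` observations. The configuration in this tutorial uses `D=0` and relies on seasonal AR and MA terms at lag 4 instead, so this is only an illustration of what the seasonal lag means. `seasonal_difference` is a hypothetical helper.

```python
import numpy as np

def seasonal_difference(y, s=4):
    """Return y[t] - y[t-s] for t >= s."""
    y = np.asarray(y, dtype=float)
    return y[s:] - y[:-s]

# Three years of a fixed quarterly pattern around a constant level
quarterly_pattern = np.array([10.0, -5.0, 3.0, -8.0])
seasonal_series = 100.0 + np.tile(quarterly_pattern, 3)

seasonal_diff = seasonal_difference(seasonal_series, s=4)
print(seasonal_diff)  # every entry is 0.0: the quarterly pattern cancels out
```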

In [None]:
# Configure SARIMA model with quarterly seasonality
sarima_params = {
    'time_col': 'date',
    'value_col': 'gdp',
    'order': (1, 1, 1),        # (p,d,q)
    'seasonal_order': (1, 0, 1, 4)  # (P,D,Q,s) - quarterly seasonality
}

meta_sarima = ModelMeta(
    name="GDP_SARIMA",
    version="1.0",
    author="Tutorial",
    description="SARIMA model with quarterly seasonality"
)

# Fit SARIMA model
sarima = SARIMAModel(train_data, sarima_params, meta_sarima)
sarima_result = sarima.fit()

print(f"SARIMA Model fitted!")
print(f"AIC: {sarima_result.payload['aic']:.2f}")
print(f"BIC: {sarima_result.payload['bic']:.2f}")

In [None]:
# Make SARIMA predictions
sarima_forecast = sarima.predict(train_data, steps=len(test_data))
sarima_values = sarima_forecast.payload['forecast']

# Visualize SARIMA forecast
plt.figure(figsize=(14, 6))

plt.plot(train_data['date'], train_data['gdp'], label='Training Data', color='blue', linewidth=2)
plt.plot(test_data['date'], test_data['gdp'], label='Actual', color='green', linewidth=2)
plt.plot(test_data['date'], sarima_values, label='SARIMA Forecast', color='purple', linewidth=2, linestyle='--')

plt.title('SARIMA(1,1,1)(1,0,1,4) GDP Forecast', fontsize=16, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('GDP (Billions)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate SARIMA forecast error
sarima_mape = np.mean(np.abs((test_data['gdp'].values - sarima_values) / test_data['gdp'].values)) * 100
sarima_rmse = np.sqrt(np.mean((test_data['gdp'].values - sarima_values) ** 2))

print(f"\nSARIMA Forecast Accuracy:")
print(f"  MAPE: {sarima_mape:.2f}%")
print(f"  RMSE: {sarima_rmse:.2f}")

## Model 3: VAR - Multivariate Forecasting

Vector Autoregression (VAR) models multiple time series simultaneously, capturing interdependencies.
Let's use employment data to demonstrate multivariate forecasting.
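
Before fitting `VARModel`, it may help to see the idea in its simplest form. In a VAR(1), each series is regressed on the previous value of every series, so the whole system can be estimated with one least-squares solve. The sketch below is a NumPy illustration on a noise-free toy system with assumed coefficients. It is not the `VARModel` implementation, and `fit_var1` and `forecast_var1` are hypothetical names.

```python
import numpy as np

def fit_var1(Y):
    """Least-squares fit of Y[t] = c + A @ Y[t-1] + e[t]; Y has shape (T, k)."""
    X = np.column_stack([np.ones(len(Y) - 1), Y[:-1]])
    B, *_ = np.linalg.lstsq(X, Y[1:], rcond=None)
    return B[0], B[1:].T             # intercept vector c, coefficient matrix A

def forecast_var1(c, A, last_obs, steps):
    """Iterate the fitted system forward from the last observation."""
    preds, y = [], np.asarray(last_obs, dtype=float)
    for _ in range(steps):
        y = c + A @ y
        preds.append(y)
    return np.array(preds)

# Toy two-variable system (assumed values): off-diagonal entries link the series
A_true = np.array([[0.5, 0.2], [0.1, 0.4]])
c_true = np.array([1.0, 2.0])
Y = np.zeros((15, 2))
for t in range(1, 15):
    Y[t] = c_true + A_true @ Y[t - 1]

c_hat, A_hat = fit_var1(Y)
var_sketch = forecast_var1(c_hat, A_hat, Y[-1], steps=3)
print(np.round(A_hat, 3))
```

The off-diagonal entries of the coefficient matrix are what a univariate model cannot represent: they describe how last period's value of one series feeds into another.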

In [None]:
# Load employment data
employment_df = pd.read_csv('../data/employment_sample.csv')
employment_df['date'] = pd.to_datetime(employment_df['date'])

# Select multiple series for VAR
var_cols = ['manufacturing', 'services', 'technology']
var_data = employment_df[['date'] + var_cols].copy()

print(f"VAR data shape: {var_data.shape}")
var_data.head()

In [None]:
# Split VAR data
var_train_size = int(len(var_data) * 0.8)
var_train = var_data[:var_train_size].copy()
var_test = var_data[var_train_size:].copy()

# Configure VAR model
var_params = {
    'time_col': 'date',
    'value_cols': var_cols,
    'max_lags': 4,  # Automatic lag selection up to 4
    'ic': 'aic'     # Information criterion for lag selection
}

meta_var = ModelMeta(
    name="Employment_VAR",
    version="1.0",
    author="Tutorial",
    description="VAR model for multi-industry employment"
)

# Fit VAR model
var_model = VARModel(var_train, var_params, meta_var)
var_result = var_model.fit()

print(f"\nVAR Model fitted!")
print(f"Selected lag order: {var_result.payload.get('selected_lag', 'N/A')}")
print(f"AIC: {var_result.payload['aic']:.2f}")

In [None]:
# Make VAR predictions
var_forecast = var_model.predict(var_train, steps=len(var_test))
var_forecasts = var_forecast.payload['forecasts']  # Dictionary with forecasts per variable

# Visualize VAR forecasts
fig, axes = plt.subplots(3, 1, figsize=(14, 12))

for idx, col in enumerate(var_cols):
    axes[idx].plot(var_train['date'], var_train[col], label='Training', color='blue', linewidth=2)
    axes[idx].plot(var_test['date'], var_test[col], label='Actual', color='green', linewidth=2)
    axes[idx].plot(var_test['date'], var_forecasts[col], label='VAR Forecast', color='red', linewidth=2, linestyle='--')
    
    axes[idx].set_title(f'{col.capitalize()} Employment', fontsize=14, fontweight='bold')
    axes[idx].set_ylabel('Employment')
    axes[idx].legend()
    axes[idx].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate VAR forecast accuracy for each series
# (use a separate variable so the ARIMA `mape` needed by the comparison table is not overwritten)
print("\nVAR Forecast Accuracy by Series:")
for col in var_cols:
    var_mape = np.mean(np.abs((var_test[col].values - var_forecasts[col]) / var_test[col].values)) * 100
    print(f"  {col.capitalize()}: MAPE = {var_mape:.2f}%")

## Model Comparison

Compare ARIMA and SARIMA performance on GDP forecasting.

In [None]:
# Create comparison table
comparison = pd.DataFrame({
    'Model': ['ARIMA(2,1,2)', 'SARIMA(1,1,1)(1,0,1,4)'],
    'MAPE (%)': [mape, sarima_mape],
    'RMSE': [rmse, sarima_rmse],
    'AIC': [result.payload['aic'], sarima_result.payload['aic']],
    'BIC': [result.payload['bic'], sarima_result.payload['bic']]
})

print("\nModel Comparison:")
print(comparison.to_string(index=False))

# Plot side-by-side comparison
plt.figure(figsize=(14, 6))

plt.plot(test_data['date'], test_data['gdp'], label='Actual', color='black', linewidth=2.5)
plt.plot(test_data['date'], forecast_values, label='ARIMA', color='red', linewidth=2, linestyle='--', alpha=0.7)
plt.plot(test_data['date'], sarima_values, label='SARIMA', color='purple', linewidth=2, linestyle=':', alpha=0.7)

plt.title('GDP Forecast Comparison: ARIMA vs SARIMA', fontsize=16, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('GDP (Billions)')
plt.legend(fontsize=12)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## Key Takeaways

1. **ARIMA** is suitable for non-seasonal univariate time series with trend and autocorrelation
2. **SARIMA** extends ARIMA to handle seasonal patterns (quarterly, monthly, etc.)
3. **VAR** captures interdependencies between multiple related time series
4. **Model Selection:** Compare AIC/BIC for in-sample fit (only between models fitted to the same series with the same differencing) and MAPE/RMSE for out-of-sample forecast accuracy
5. **Seasonality Matters:** SARIMA typically outperforms ARIMA when seasonal patterns exist
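
For reference, AIC and BIC both start from the model's maximized log-likelihood and add a penalty for the number of estimated parameters. BIC's penalty grows with the sample size. The sketch below uses the standard definitions. The log-likelihood values are made up for illustration, and `compute_aic` and `compute_bic` are hypothetical helper names.

```python
import math

def compute_aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

def compute_bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical fits on 48 observations: the larger model fits slightly better
small_model = {"loglik": -100.0, "k": 3}
large_model = {"loglik": -99.0, "k": 6}

for name, m in [("small", small_model), ("large", large_model)]:
    print(name,
          round(compute_aic(m["loglik"], m["k"]), 2),
          round(compute_bic(m["loglik"], m["k"], 48), 2))
```

In this made-up case the larger model gains one unit of log-likelihood for three extra parameters, so both criteria favor the smaller model.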

## Next Steps

- Try different ARIMA orders (p,d,q) to optimize performance
- Use `auto_arima` functionality if available
- Experiment with VAR lag selection criteria
- Explore cointegration analysis for long-run relationships
- Compare with Prophet for business forecasting scenarios
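
One simple way to try different orders is a small grid search. The sketch below is generic: `score_fn` is a hypothetical callable you would write yourself, for example one that fits a model for a given `(p, d, q)` and returns its AIC or holdout RMSE. A toy scoring function stands in for a real fit here so the example runs on its own.

```python
from itertools import product

def select_order(score_fn, p_values, d_values, q_values):
    """Return the (p, d, q) order with the lowest score, plus that score."""
    best_order, best_score = None, float("inf")
    for order in product(p_values, d_values, q_values):
        try:
            score = score_fn(order)
        except Exception:
            continue  # some orders may fail to fit; skip them
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score

# Toy scoring function standing in for a real model fit; minimized at (2, 1, 1)
def toy_score(order):
    p, d, q = order
    return (p - 2) ** 2 + (d - 1) ** 2 + (q - 1) ** 2

best_order, best_score = select_order(toy_score, range(4), range(2), range(4))
print(best_order, best_score)  # (2, 1, 1) 0
```

If the score is an information criterion, keep `d` fixed across candidates, since AIC and BIC are not comparable between differently differenced series.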

---

<div align="center">

**KR-Labs** | Data-Driven Economic Analysis

[kr-labs.com](https://kr-labs.com) | [info@kr-labs.com](mailto:info@kr-labs.com)

**License:** Apache 2.0 | **Repository:** [github.com/KR-Labs/krl-model-zoo](https://github.com/KR-Labs/krl-model-zoo)

</div>

## References

1. **Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M.** (2015). *Time Series Analysis: Forecasting and Control* (5th ed.). Wiley.

2. **Hyndman, R. J., & Athanasopoulos, G.** (2021). *Forecasting: Principles and Practice* (3rd ed.). OTexts. https://otexts.com/fpp3/

3. **Hamilton, J. D.** (1994). *Time Series Analysis*. Princeton University Press.

4. **Lütkepohl, H.** (2005). *New Introduction to Multiple Time Series Analysis*. Springer.

5. **Shumway, R. H., & Stoffer, D. S.** (2017). *Time Series Analysis and Its Applications: With R Examples* (4th ed.). Springer.

6. **Federal Reserve Bank of St. Louis.** (2024). *FRED Economic Data*. https://fred.stlouisfed.org

7. **U.S. Bureau of Economic Analysis.** (2024). *Gross Domestic Product*. https://www.bea.gov/data/gdp

---

## Citation

To cite this tutorial:

```bibtex
@misc{krl_economic_forecasting_2025,
  title = {Tutorial 1: Economic Forecasting with Time Series Models},
  author = {KRL Model Zoo Team},
  year = {2025},
  publisher = {KR-Labs},
  url = {https://github.com/KR-Labs/krl-model-zoo},
  note = {Tutorial from KRL Model Zoo v1.0.0}
}
```

To cite KRL Model Zoo:

```bibtex
@software{krl_model_zoo_2025,
  title = {KRL Model Zoo: Production-Grade Models for Socioeconomic Analysis},
  author = {KR-Labs},
  year = {2025},
  url = {https://github.com/KR-Labs/krl-model-zoo},
  version = {1.0.0},
  license = {Apache-2.0}
}
```

In [None]:
from datetime import datetime
from pathlib import Path
import json

# Create output directory
output_dir = Path('../outputs') / f'economic_forecasting_{datetime.now().strftime("%Y%m%d_%H%M%S")}'
output_dir.mkdir(parents=True, exist_ok=True)

print(f"Exporting results to: {output_dir}\n")

# 1. Export data
train_data.to_csv(output_dir / 'train_data.csv', index=False)
test_data.to_csv(output_dir / 'test_data.csv', index=False)
print("✓ Exported training and test data")

# 2. Export forecast results
forecast_df = pd.DataFrame({
    'date': test_data['date'],
    'actual_gdp': test_data['gdp'].values,
    'arima_forecast': forecast_values,
    'sarima_forecast': sarima_values
})
forecast_df.to_csv(output_dir / 'forecast_results.csv', index=False)
print("✓ Exported forecast results")

# 3. Export model comparison
comparison.to_csv(output_dir / 'model_comparison.csv', index=False)
print("✓ Exported model comparison")

# 4. Export metadata
metadata = {
    "tutorial": "01_economic_forecasting.ipynb",
    "version": "v1.0",
    "execution_date": datetime.now().isoformat(),
    "models": ["ARIMA(2,1,2)", "SARIMA(1,1,1)(1,0,1,4)", "VAR"],
    "dataset": {
        "name": "gdp_sample.csv",
        "records": len(df),
        "train_size": len(train_data),
        "test_size": len(test_data)
    },
    "performance": {
        "arima": {
            "mape": float(mape),
            "rmse": float(rmse),
            "aic": float(result.payload['aic']),
            "bic": float(result.payload['bic'])
        },
        "sarima": {
            "mape": float(sarima_mape),
            "rmse": float(sarima_rmse),
            "aic": float(sarima_result.payload['aic']),
            "bic": float(sarima_result.payload['bic'])
        }
    },
    "reproducibility": {
        "random_seed": None,
        "python_version": "3.9+",
        "required_packages": ["krl-model-zoo", "pandas", "numpy", "matplotlib"]
    }
}

with open(output_dir / 'execution_metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)
print("✓ Exported execution metadata")

print(f"\n{'='*60}")
print("EXPORT COMPLETE")
print(f"{'='*60}")
print(f"\nAll results saved to: {output_dir}")
print("\nExported files:")
print("  - train_data.csv")
print("  - test_data.csv")
print("  - forecast_results.csv")
print("  - model_comparison.csv")
print("  - execution_metadata.json")

## Export Results & Reproducibility

The preceding cell exports the train/test splits, forecast results, model comparison table, and execution metadata for reproducibility.

## Responsible Use & Limitations

### Ethical Considerations

1. **Data Privacy:**
   - This analysis uses aggregated synthetic data
   - Real applications should use publicly available aggregate data
   - Avoid using models for individual-level predictions without proper validation

2. **Bias & Fairness:**
   - Models reflect patterns in training data
   - Historical data may encode past inequities
   - Consider socioeconomic context when interpreting results

3. **Limitations:**
   - Synthetic data for demonstration purposes only
   - Real forecasts require domain expertise and validation
   - Models assume stationary patterns, which may not hold during structural breaks
   - Forecast accuracy decreases with prediction horizon
   - External shocks (pandemics, policy changes) not captured

4. **Recommended Use Cases:**
   - Educational purposes and learning
   - Trend analysis and pattern identification
   - Comparative model evaluation
   - Policy scenario planning
   - Not recommended: High-stakes automated decisions without validation
   - Not recommended: Individual financial planning without expert review
   - Not recommended: Regulatory compliance without proper documentation

5. **Model Assumptions:**
   - ARIMA assumes linear relationships and stationarity
   - SARIMA requires consistent seasonal patterns
   - VAR assumes stable relationships between variables
   - All models sensitive to structural breaks and outliers

### Best Practices

- Always validate forecasts against holdout test data
- Use multiple models for comparison
- Document assumptions and limitations
- Consult domain experts for interpretation
- Update models regularly with new data
- Monitor forecast performance continuously
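
One way to go beyond a single holdout split is rolling-origin evaluation: refit on an expanding window and score each forecast on the observations that follow. The sketch below shows only the splitting logic, with a naive last-value forecast standing in for a real model. The series values are made up and `rolling_origin_splits` is a hypothetical helper.

```python
def rolling_origin_splits(n_obs, initial_train, horizon):
    """Yield (train_indices, test_indices) with an expanding training window."""
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += 1

series = [100, 102, 101, 105, 107, 106, 110, 112, 111, 115]

errors = []
for train_idx, test_idx in rolling_origin_splits(len(series), initial_train=6, horizon=2):
    naive_forecast = series[train_idx[-1]]  # placeholder model: repeat the last training value
    errors.extend(abs(series[i] - naive_forecast) for i in test_idx)

mean_abs_error = sum(errors) / len(errors)
print(len(errors), round(mean_abs_error, 2))
```

Replacing the naive forecast with a fitted model's prediction gives several out-of-sample error estimates instead of one.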

For questions about responsible use: info@krlabs.dev

---

<div style="text-align: center; padding: 20px 0; border-top: 2px solid #333; margin-top: 40px;">
  <p style="font-size: 14px; font-weight: bold; margin-bottom: 5px;">
    KR-Labs | Data-Driven Economic Analysis
  </p>
  <p style="font-size: 12px; color: #666; margin: 5px 0;">
    Contact: <a href="mailto:info@krlabs.dev">info@krlabs.dev</a>
  </p>
  <p style="font-size: 11px; color: #666; margin: 5px 0;">
    © 2025 KR-Labs. All rights reserved.<br>
    <strong>KR-Labs™</strong> is a trademark of Quipu Research Labs, LLC, a subsidiary of Sudiata Giddasira, Inc.
  </p>
  <p style="font-size: 11px; color: #666; margin: 5px 0;">
    <a href="https://www.apache.org/licenses/LICENSE-2.0" target="_blank">Apache 2.0 License</a> | 
    <a href="https://github.com/KR-Labs/krl-model-zoo" target="_blank">GitHub Repository</a>
  </p>
</div>