# RILA 1Y10B Time Series Forecasting - Refactored

**Product**: FlexGuard 1Y10B (1-year term, 10% buffer)

**Architecture**: Uses `UnifiedNotebookInterface` for methodology-aware forecasting

**Created**: 2026-01-26

---

## Purpose

Generate time series forecasts for RILA sales using the refactored API:
- **Data Source**: Processed data from `00_data_pipeline.ipynb`
- **Forecasting**: Naive, moving-average, and ARIMA models on the sales series (exogenous economic drivers are listed under Next Steps)
- **Validation**: Uses RILA methodology for constraint checking

**Key Benefits**:
1. Product-agnostic forecasting framework
2. Methodology-aware validation
3. Environment-independent
4. Cleaner separation of concerns

---

## Section 1: Setup

In [None]:
# Setup: Add project root to path
import sys
import os
from pathlib import Path

# Auto-detect project root
project_root = Path().resolve()
while not (project_root / "src").exists() and project_root != project_root.parent:
    project_root = project_root.parent

if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(f"Project root: {project_root}")
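The setup cell walks up from the current directory until it finds a folder containing `src/`, and stops at the filesystem root if none is found. Here is a minimal standalone sketch of the same logic as a reusable function. The name `find_project_root` and the `marker` parameter are illustrative and not part of the project:

```python
from pathlib import Path
import tempfile


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found.

    Mirrors the notebook cell: if no marker is found, the loop stops at the
    filesystem root and that root is returned.
    """
    root = start.resolve()
    while not (root / marker).exists() and root != root.parent:
        root = root.parent
    return root


# Usage: build a throwaway tree <tmp>/proj/src and <tmp>/proj/notebooks/rila_1y10b
with tempfile.TemporaryDirectory() as tmp:
    proj = Path(tmp) / "proj"
    (proj / "src").mkdir(parents=True)
    nb_dir = proj / "notebooks" / "rila_1y10b"
    nb_dir.mkdir(parents=True)
    print(find_project_root(nb_dir) == proj.resolve())
```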

In [None]:
# Standard imports
import pandas as pd
import numpy as np
import warnings
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

warnings.filterwarnings('ignore')
sns.set_theme(style="whitegrid", palette="deep")

# Our refactored interface
from src.notebooks import create_interface

# Forecasting utilities
try:
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    STATSMODELS_AVAILABLE = True
except ImportError:
    STATSMODELS_AVAILABLE = False
    print("Warning: statsmodels not available. Install with: pip install statsmodels")

print("Dependencies loaded successfully")
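The `try`/`except ImportError` block sets `STATSMODELS_AVAILABLE` so that later cells can fall back to simpler models. Below is an alternative sketch, not what the notebook uses, that checks availability with the standard library's `importlib.util.find_spec`. This locates a module without importing it:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found on the current sys.path.

    find_spec only locates the module and does not execute it, so a package that
    is installed but fails on import would still report True here.
    """
    return importlib.util.find_spec(name) is not None


STATSMODELS_AVAILABLE = module_available("statsmodels")
print(f"statsmodels available: {STATSMODELS_AVAILABLE}")
```

The `try`/`except` form in the notebook is stricter, because it also catches packages that are present but broken.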

## Section 2: Create Interface and Load Data

In [None]:
# Create interface for 1Y10B product
interface = create_interface(
    product_code="1Y10B",
    environment="local",
    adapter_kwargs={"data_dir": project_root / "notebooks/rila_1y10b/outputs/datasets_1y10b"}
)

# Verify configuration
print("Product Configuration:")
print(f"  Code: {interface.product.product_code}")
print(f"  Name: {interface.product.name}")
print(f"  Type: {interface.product.product_type}")
print(f"  Buffer: {interface.product.buffer_level * 100:.0f}%")
print(f"  Term: {interface.product.term_years} year(s)")
print()
print(f"Methodology: {interface.methodology}")
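The verification cell only reads five attributes from `interface.product`. For readers without the `src.notebooks` package, here is a hypothetical stand-in that mirrors those attributes and reproduces the printed summary. `ProductConfig` and `describe` are illustrative and are not the project's classes. The sketch assumes `buffer_level` is stored as a fraction (0.10 for a 10% buffer), as the `* 100` formatting implies. The example values, including `product_type`, are also illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductConfig:
    """Hypothetical stand-in for interface.product (illustration only)."""
    product_code: str
    name: str
    product_type: str
    buffer_level: float  # fraction, e.g. 0.10 for a 10% buffer
    term_years: int


def describe(product: ProductConfig) -> list[str]:
    """Build the same summary lines the notebook prints."""
    return [
        f"  Code: {product.product_code}",
        f"  Name: {product.name}",
        f"  Type: {product.product_type}",
        f"  Buffer: {product.buffer_level * 100:.0f}%",
        f"  Term: {product.term_years} year(s)",
    ]


example = ProductConfig("1Y10B", "FlexGuard 1Y10B", "RILA", 0.10, 1)
print("\n".join(describe(example)))
```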

In [None]:
# Load processed data from data pipeline notebook
data_path = project_root / "notebooks/rila_1y10b/outputs/datasets_1y10b/final_dataset.parquet"

if not data_path.exists():
    raise FileNotFoundError(
        f"Data file not found: {data_path}\n"
        "Please run notebooks/rila_1y10b/00_data_pipeline.ipynb first."
    )

df = pd.read_parquet(data_path)

print(f"Loaded dataset: {df.shape[0]:,} rows × {df.shape[1]:,} columns")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
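Before forecasting, it can help to confirm three things about the loaded frame. It should have the columns later cells rely on (`date`, `sales`), its dates should parse, and each date should appear only once. Here is a minimal check that assumes one row per date. The helper name `validate_sales_frame` is hypothetical:

```python
import pandas as pd


def validate_sales_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a parsed `date` column, raising if basic expectations fail."""
    missing = {"date", "sales"} - set(df.columns)
    if missing:
        raise KeyError(f"Missing required columns: {sorted(missing)}")
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])  # raises on unparseable dates
    if out["date"].duplicated().any():
        raise ValueError("Duplicate dates found; expected one row per period")
    if out["sales"].isna().any():
        raise ValueError("Missing sales values found")
    return out


# Usage with a small synthetic frame (values are made up)
demo = pd.DataFrame({"date": ["2025-01-05", "2025-01-12"], "sales": [1_000.0, 1_200.0]})
checked = validate_sales_frame(demo)
print(checked.dtypes)
```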

## Section 3: Time Series Preparation

In [None]:
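# Prepare time series data
df_ts = df[['date', 'sales']].copy()
df_ts['date'] = pd.to_datetime(df_ts['date'])  # ensure a DatetimeIndex for date arithmetic later
df_ts = df_ts.set_index('date')
df_ts = df_ts.sort_index()

# Drop zero-sales records: MAPE is undefined when actual sales are zero.
# Note that this can leave gaps in the weekly index.
df_ts = df_ts[df_ts['sales'] > 0]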

print(f"Time series prepared: {len(df_ts)} observations")
print(f"Date range: {df_ts.index.min()} to {df_ts.index.max()}")
print(f"\nBasic statistics:")
print(df_ts['sales'].describe())
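Dropping zero-sales rows, along with any weeks missing upstream, can leave holes in what is assumed to be a weekly series. This matters because ARIMA treats consecutive rows as equally spaced. The sketch below lists the missing dates. It assumes the data is weekly and stamped on the same weekday. `find_weekly_gaps` is a hypothetical helper:

```python
import pandas as pd


def find_weekly_gaps(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return the weekly dates absent from `index` between its first and last entries."""
    index = pd.DatetimeIndex(index).sort_values()
    expected = pd.date_range(index[0], index[-1], freq="7D")
    return expected.difference(index)


# Usage: a synthetic weekly index with one week removed
weeks = pd.date_range("2025-01-05", periods=5, freq="7D")
observed = weeks.delete(2)  # drop 2025-01-19
print(find_weekly_gaps(observed))
```

If gaps turn up, `df_ts.asfreq('7D')` would make them explicit as NaN rows. How to fill those rows is a modelling decision.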

## Section 4: Exploratory Visualization

In [None]:
# Plot sales time series
fig, axes = plt.subplots(2, 1, figsize=(15, 10))

# Sales level
axes[0].plot(df_ts.index, df_ts['sales'], linewidth=2)
axes[0].set_title('1Y10B Sales Time Series', fontsize=14, fontweight='bold')
axes[0].set_ylabel('Sales ($)', fontsize=12)
axes[0].grid(True, alpha=0.3)

# Sales growth rate (week-over-week)
growth_rate = df_ts['sales'].pct_change() * 100
axes[1].plot(df_ts.index[1:], growth_rate[1:], linewidth=2, color='orange')
axes[1].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[1].set_title('Week-over-Week Growth Rate', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Growth (%)', fontsize=12)
axes[1].set_xlabel('Date', fontsize=12)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Mean sales: ${df_ts['sales'].mean():,.0f}")
print(f"Std dev: ${df_ts['sales'].std():,.0f}")
print(f"CV (coefficient of variation): {df_ts['sales'].std() / df_ts['sales'].mean():.2%}")
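The growth panel and the CV summary rest on two simple formulas. `pct_change` computes `y_t / y_{t-1} - 1`, and the coefficient of variation is the standard deviation divided by the mean. Here is a tiny worked example on made-up values:

```python
import pandas as pd

sales = pd.Series([100.0, 110.0, 99.0])

# Period-over-period growth in percent: (y_t / y_{t-1} - 1) * 100; the first value is NaN
growth = sales.pct_change() * 100

# Coefficient of variation; pandas .std() uses the sample standard deviation (ddof=1)
cv = sales.std() / sales.mean()

print(growth.round(2).tolist())
print(f"CV: {cv:.2%}")
```

The leading NaN explains why the plotting cell slices with `[1:]`.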

## Section 5: Baseline Forecasting Models

In [None]:
# Split data into train/test chronologically (first 80% train, last 20% test)
train_size = int(len(df_ts) * 0.8)
train = df_ts.iloc[:train_size]
test = df_ts.iloc[train_size:]

print(f"Train set: {len(train)} observations ({train.index.min()} to {train.index.max()})")
print(f"Test set: {len(test)} observations ({test.index.min()} to {test.index.max()})")
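The split is chronological rather than random. The test set is therefore strictly later than the training set, and no future information leaks into fitting. Here is a small sketch of the same 80/20 rule as a function. `chronological_split` is illustrative:

```python
import pandas as pd


def chronological_split(ts: pd.DataFrame, train_frac: float = 0.8):
    """Split a time-sorted frame into leading (train) and trailing (test) parts by position."""
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be between 0 and 1")
    cut = int(len(ts) * train_frac)
    return ts.iloc[:cut], ts.iloc[cut:]


demo = pd.DataFrame({"sales": range(10)}, index=pd.date_range("2025-01-05", periods=10, freq="7D"))
train_demo, test_demo = chronological_split(demo)
print(len(train_demo), len(test_demo))
```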

In [None]:
# Naive forecast (persistence model)
naive_forecast = pd.Series(
    [train['sales'].iloc[-1]] * len(test),
    index=test.index,
    name='naive_forecast'
)

# Moving average forecast
window = 4  # 4-week moving average
ma_forecast = pd.Series(
    [train['sales'].tail(window).mean()] * len(test),
    index=test.index,
    name='ma_forecast'
)

print(f"Naive forecast (last value): ${naive_forecast.iloc[0]:,.0f}")
print(f"MA({window}) forecast: ${ma_forecast.iloc[0]:,.0f}")
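Both baselines produce a flat forecast. The naive model repeats the last training value, and the MA model repeats the mean of the last `window` values. Here is a compact sketch with a worked example. The function name is illustrative:

```python
import pandas as pd


def flat_baselines(train_sales: pd.Series, horizon: int, window: int = 4) -> pd.DataFrame:
    """Return naive (last value) and moving-average flat forecasts for `horizon` steps."""
    naive_value = train_sales.iloc[-1]
    ma_value = train_sales.tail(window).mean()
    return pd.DataFrame({
        "naive": [naive_value] * horizon,
        f"ma_{window}": [ma_value] * horizon,
    })


history = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])
print(flat_baselines(history, horizon=3))
```

With this history, the naive forecast is 50 and the MA(4) forecast is the mean of 20, 30, 40 and 50.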

## Section 6: ARIMA Forecasting (Optional)

Fits a univariate ARIMA(1,1,1) model to the training series. The cell is skipped if statsmodels is not installed.

In [None]:
if STATSMODELS_AVAILABLE:
    print("Fitting ARIMA model...")
    
    # Fit ARIMA(1,1,1) model
    model = ARIMA(train['sales'], order=(1, 1, 1))
    model_fit = model.fit()
    
    # Forecast
    arima_forecast = model_fit.forecast(steps=len(test))
    arima_forecast.index = test.index
    
    print("ARIMA model fitted successfully")
    print(f"\nModel summary:")
    print(model_fit.summary().tables[1])
else:
    print("Skipping ARIMA forecasting (statsmodels not available)")
    arima_forecast = None
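ARIMA(1,1,1) works on the first difference `Δy_t = y_t - y_{t-1}`, which it models as ARMA(1,1): `Δy_t = φ·Δy_{t-1} + ε_t + θ·ε_{t-1}`. statsmodels' `ARIMA` includes no constant by default when d = 1. To forecast, the model predicts the differences, which decay geometrically after the first step, and then sums them cumulatively onto the last observed level.

The numpy sketch below uses made-up values for φ, θ and the last residual. With a fitted model, these would come from `model_fit.params` (typically named `ar.L1` and `ma.L1`) and `model_fit.resid`. Because statsmodels estimates via a Kalman filter, its forecasts should approximately, not exactly, match this recursion:

```python
import numpy as np


def arima_111_point_forecast(y, phi: float, theta: float, last_resid: float, steps: int) -> np.ndarray:
    """Point forecasts for ARIMA(1,1,1) without a constant, given parameter values.

    Differences: d_hat[1] = phi * dy_T + theta * eps_T, then d_hat[h] = phi * d_hat[h-1].
    Levels: last observed level plus the cumulative sum of forecast differences.
    """
    y = np.asarray(y, dtype=float)
    last_diff = y[-1] - y[-2]
    diffs = np.empty(steps)
    diffs[0] = phi * last_diff + theta * last_resid
    for h in range(1, steps):
        diffs[h] = phi * diffs[h - 1]  # MA term drops out beyond one step ahead
    return y[-1] + np.cumsum(diffs)


# Made-up inputs for illustration
print(arima_111_point_forecast([10.0, 12.0, 15.0], phi=0.5, theta=0.2, last_resid=1.0, steps=3))
```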

## Section 7: Forecast Evaluation

In [None]:
# Calculate forecast errors
def calculate_metrics(actual, predicted):
    """Calculate forecast accuracy metrics."""
    mae = np.mean(np.abs(actual - predicted))
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return {'MAE': mae, 'MAPE': mape, 'RMSE': rmse}

# Evaluate all models
results_df = pd.DataFrame({
    'Naive': calculate_metrics(test['sales'], naive_forecast),
    'MA(4)': calculate_metrics(test['sales'], ma_forecast),
})

if arima_forecast is not None:
    # Wrap in a Series so the values align to the metric index (MAE, MAPE, RMSE) explicitly
    results_df['ARIMA(1,1,1)'] = pd.Series(calculate_metrics(test['sales'], arima_forecast))

print("Forecast Accuracy Comparison:")
print("=" * 60)
print(results_df.T.to_string(float_format=lambda x: f'{x:,.2f}'))
print()
print("Lower values indicate better forecasts")
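Below is a worked check of the metric formulas on two made-up points, plus one way to assemble the comparison table from per-model dictionaries. Wrapping each dictionary in `pd.Series` makes the metric names the row index explicitly. This version converts inputs to arrays, so it compares by position rather than by index label, unlike the notebook's Series-based cell:

```python
import numpy as np
import pandas as pd


def calculate_metrics(actual, predicted):
    """Same formulas as the notebook cell: MAE, MAPE (%), RMSE."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mae = np.mean(np.abs(actual - predicted))
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    rmse = np.sqrt(np.mean((actual - predicted) ** 2))
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse}


actual = [100.0, 200.0]
model_a = [110.0, 180.0]  # errors (actual - predicted) of -10 and +20
model_b = [100.0, 210.0]  # errors of 0 and -10

table = pd.DataFrame({
    "Model A": pd.Series(calculate_metrics(actual, model_a)),
    "Model B": pd.Series(calculate_metrics(actual, model_b)),
})
print(table.T)
```

RMSE penalizes the larger miss more heavily than MAE does. This is why Model A's RMSE (about 15.8) exceeds its MAE (15).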

## Section 8: Forecast Visualization

In [None]:
# Plot actual vs forecasts
fig, ax = plt.subplots(figsize=(15, 6))

# Plot training data
ax.plot(train.index, train['sales'], label='Training data', linewidth=2, color='blue')

# Plot test data
ax.plot(test.index, test['sales'], label='Actual (test)', linewidth=2, color='black', linestyle='--')

# Plot forecasts
ax.plot(test.index, naive_forecast, label='Naive forecast', linewidth=2, alpha=0.7)
ax.plot(test.index, ma_forecast, label='MA(4) forecast', linewidth=2, alpha=0.7)

if arima_forecast is not None:
    ax.plot(test.index, arima_forecast, label='ARIMA(1,1,1) forecast', linewidth=2, alpha=0.7)

ax.axvline(x=train.index[-1], color='red', linestyle=':', linewidth=2, label='Train/Test split')

ax.set_title('1Y10B Sales Forecasts Comparison', fontsize=14, fontweight='bold')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Sales ($)', fontsize=12)
ax.legend(loc='best', fontsize=10)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Section 9: Future Forecasts

In [None]:
# Generate forecasts for next 12 weeks
forecast_horizon = 12
last_date = df_ts.index[-1]
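# Step forward in fixed 7-day increments so forecast dates keep the weekday of the last observation
# (freq='W' would snap every date to a Sunday)
future_dates = pd.date_range(start=last_date + timedelta(weeks=1), periods=forecast_horizon, freq='7D')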

# Refit models on full dataset
if STATSMODELS_AVAILABLE:
    print("Generating future forecasts...")
    model_full = ARIMA(df_ts['sales'], order=(1, 1, 1))
    model_full_fit = model_full.fit()
    future_forecast = model_full_fit.forecast(steps=forecast_horizon)
    future_forecast.index = future_dates
    
    # Get confidence intervals
    forecast_result = model_full_fit.get_forecast(steps=forecast_horizon)
    conf_int = forecast_result.conf_int(alpha=0.05)
    conf_int.index = future_dates
    
    print(f"\nFuture forecasts (next {forecast_horizon} weeks):")
    forecast_df = pd.DataFrame({
        'Date': future_dates,
        'Forecast': future_forecast.values,
        'Lower 95% CI': conf_int.iloc[:, 0].values,
        'Upper 95% CI': conf_int.iloc[:, 1].values
    })
    print(forecast_df.to_string(index=False, float_format=lambda x: f'{x:,.0f}'))
else:
    # Simple persistence forecast
    future_forecast = pd.Series([df_ts['sales'].iloc[-1]] * forecast_horizon, index=future_dates)
    print(f"\nSimple persistence forecast: ${future_forecast.iloc[0]:,.0f} for next {forecast_horizon} weeks")
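In pandas, `freq='W'` is an alias for weekly periods anchored on Sunday (`W-SUN`). As a result, `pd.date_range` rolls a non-Sunday start forward to the next Sunday. If the sales data is stamped on another weekday, the forecast dates would not line up with the history. A fixed 7-day step keeps the weekday of the last observation. The dates below are arbitrary:

```python
import pandas as pd

last_date = pd.Timestamp("2025-01-01")  # a Wednesday
horizon = 3

sunday_anchored = pd.date_range(start=last_date + pd.Timedelta(weeks=1), periods=horizon, freq="W")
same_weekday = pd.date_range(start=last_date + pd.Timedelta(weeks=1), periods=horizon, freq="7D")

print(sunday_anchored.day_name().tolist())
print(same_weekday.tolist())
```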

In [None]:
# Visualize future forecasts
fig, ax = plt.subplots(figsize=(15, 6))

# Historical data
ax.plot(df_ts.index, df_ts['sales'], label='Historical sales', linewidth=2, color='blue')

# Future forecast
ax.plot(future_dates, future_forecast, label='Forecast', linewidth=2, color='red', linestyle='--')

# Confidence interval (if available)
if STATSMODELS_AVAILABLE:
    ax.fill_between(future_dates, conf_int.iloc[:, 0], conf_int.iloc[:, 1], 
                     alpha=0.2, color='red', label='95% Confidence interval')

ax.axvline(x=df_ts.index[-1], color='green', linestyle=':', linewidth=2, label='Forecast start')

ax.set_title(f'1Y10B Sales: Historical + {forecast_horizon}-Week Forecast', fontsize=14, fontweight='bold')
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Sales ($)', fontsize=12)
ax.legend(loc='best', fontsize=10)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
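The shaded band comes from `conf_int(alpha=0.05)`. ARIMA intervals assume Gaussian forecast errors, so each bound is the point forecast plus or minus `z·se`. Here, z ≈ 1.96 is the 97.5th percentile of the standard normal, and se is the forecast standard error, which the `get_forecast` result exposes as `se_mean` in statsmodels. Here is a stdlib-only sketch with made-up numbers:

```python
from statistics import NormalDist


def normal_interval(mean: float, se: float, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided (1 - alpha) interval assuming normally distributed forecast errors."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return mean - z * se, mean + z * se


lower, upper = normal_interval(mean=50_000.0, se=5_000.0)
print(f"95% interval: {lower:,.0f} to {upper:,.0f}")
```

For an ARIMA model with d = 1, se grows with the horizon. This is why the band widens further into the future.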

## Section 10: Export Results

In [None]:
# Create output directory
output_dir = Path("outputs/forecasting_1y10b")
output_dir.mkdir(parents=True, exist_ok=True)

# Save forecast results
if STATSMODELS_AVAILABLE:
    forecast_df.to_csv(output_dir / "future_forecasts_1y10b.csv", index=False)
    print(f"Forecast results saved to: {output_dir / 'future_forecasts_1y10b.csv'}")

# Save evaluation metrics
results_df.T.to_csv(output_dir / "model_evaluation_1y10b.csv")
print(f"Model evaluation saved to: {output_dir / 'model_evaluation_1y10b.csv'}")
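A quick way to confirm that the exported CSVs are usable downstream is to read them back. The sketch writes a small made-up forecast table, reusing the notebook's file name and column names, to a temporary directory. It then checks that the columns, row count and values survive the round trip:

```python
import tempfile
from pathlib import Path

import pandas as pd

demo_forecast = pd.DataFrame({
    "Date": pd.date_range("2025-01-08", periods=3, freq="7D"),
    "Forecast": [100.0, 101.0, 102.0],
    "Lower 95% CI": [90.0, 89.0, 88.0],
    "Upper 95% CI": [110.0, 113.0, 116.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "future_forecasts_1y10b.csv"
    demo_forecast.to_csv(path, index=False)
    reloaded = pd.read_csv(path, parse_dates=["Date"])

print(list(reloaded.columns))
```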

## Summary

### What This Notebook Does

1. **Creates interface** for 1Y10B with RILA methodology
2. **Loads processed data** from data pipeline notebook
3. **Prepares time series** for forecasting
4. **Fits baseline models** (naive, moving average)
5. **Fits ARIMA model** (if statsmodels available)
6. **Evaluates forecasts** on test set
7. **Generates future forecasts** with 95% confidence intervals (when statsmodels is available)
8. **Exports results** for business planning

### Architecture Benefits

| Feature | Legacy Approach | Refactored Approach |
|---------|----------------|---------------------|
| Product configuration | Hardcoded values | Interface configuration |
| Data loading | Manual paths | Adapter-based |
| Methodology validation | Not applied | Built-in |
| Product switching | Edit code | Change product_code |
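The "Change product_code" row assumes that per-product artifacts follow a predictable layout. Judging only from the 1Y10B paths in this notebook, the convention appears to be `notebooks/rila_<code>/outputs/datasets_<code>/final_dataset.parquet`, with the code lower-cased. The hypothetical helper below builds that path, so switching products touches one string. That other products follow the same layout is an assumption:

```python
from pathlib import Path


def product_dataset_path(project_root: Path, product_code: str) -> Path:
    """Build the processed-dataset path for a product, ASSUMING the 1Y10B layout generalizes."""
    code = product_code.lower()
    return (
        project_root / "notebooks" / f"rila_{code}" / "outputs"
        / f"datasets_{code}" / "final_dataset.parquet"
    )


print(product_dataset_path(Path("/repo"), "1Y10B").as_posix())
```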

### Next Steps

1. Compare forecast accuracy with legacy notebook
2. Tune ARIMA parameters for better performance
3. Consider adding exogenous variables (rates, economic indicators)
4. Validate business impact of forecast errors
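A single 80/20 split gives only one noisy accuracy estimate, which is a weak basis for comparing against the legacy notebook or tuning ARIMA orders. A rolling-origin (expanding-window) backtest instead refits at each cutoff and averages the errors across several origins. Below is a minimal sketch that uses the naive model as the forecaster. `rolling_origin_mae` and its parameters are illustrative. Any function that maps a training array and a horizon to a forecast array could be plugged in:

```python
from typing import Callable

import numpy as np


def rolling_origin_mae(
    y,
    forecaster: Callable[[np.ndarray, int], np.ndarray],
    initial: int,
    horizon: int = 1,
) -> float:
    """Mean absolute error over expanding-window origins.

    At each origin t (starting at `initial`), fit on y[:t] and forecast the next
    `horizon` points; origins stop once a full horizon no longer fits.
    """
    y = np.asarray(y, dtype=float)
    errors = []
    for t in range(initial, len(y) - horizon + 1):
        pred = forecaster(y[:t], horizon)
        errors.extend(np.abs(y[t:t + horizon] - pred))
    return float(np.mean(errors))


def naive(history: np.ndarray, horizon: int) -> np.ndarray:
    """Repeat the last observed value."""
    return np.repeat(history[-1], horizon)


print(rolling_origin_mae([1, 2, 3, 4, 5], naive, initial=3, horizon=1))
```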