A comprehensive Python library for time series forecasting that compares baseline, statistical, and machine learning models with proper backtesting and evaluation.
This project demonstrates best practices in time series forecasting by implementing and comparing multiple forecasting approaches:
- Baseline Models: Naive, Mean, and Drift forecasts
- Statistical Models: ARIMA and Prophet
- Machine Learning Models: XGBoost and LSTM neural networks
- Multiple Model Types: Compare classical statistical methods with modern ML approaches
- Robust Backtesting: Time series cross-validation with expanding and rolling windows
- Comprehensive Evaluation: 9 different metrics including MAE, RMSE, MAPE, MASE, and R-squared
- Synthetic Data Generation: Create realistic time series with trend, seasonality, noise, and changepoints
- Easy-to-Use API: Scikit-learn style interface (fit/predict) for all models
- Visualization: Built-in plotting functions for forecasts and comparisons
```
forecasting-system/
├── forecasting/
│   ├── __init__.py             # Package initialization
│   ├── baseline.py             # Naive, Mean, Drift forecasters
│   ├── statistical.py          # ARIMA, Prophet forecasters
│   ├── ml_forecaster.py        # XGBoost, LSTM forecasters
│   ├── evaluation.py           # Evaluation metrics
│   ├── backtesting.py          # Cross-validation framework
│   ├── data_generator.py       # Synthetic data generation
│   └── config.py               # Configuration
├── tests/
│   ├── __init__.py
│   └── test_statistical.py     # Test suite
├── data/
│   ├── create_sample.py        # Sample data generator
│   └── sample_datasets.py      # Embedded sample data
├── notebooks/
│   └── forecasting_demo.ipynb  # Jupyter notebook demo
├── main.py                     # Main pipeline script
├── requirements.txt            # Dependencies
├── pytest.ini                  # Pytest configuration
├── setup.py                    # Package setup
└── README.md
```
- Python 3.8+
- pip or conda
```bash
# Create virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

For LSTM support, install TensorFlow (quote the specifier so the shell does not treat `>=` as a redirect):

```bash
pip install "tensorflow>=2.8.0"
```

```python
import pandas as pd
import numpy as np

from forecasting import (
    NaiveForecaster,
    ARIMAForecaster,
    XGBoostForecaster,
    ForecastEvaluator,
)
from forecasting.data_generator import generate_sample_data

# Generate sample data
y = generate_sample_data(n_samples=365, random_seed=42)

# Split into train/test
y_train = y.iloc[:-30]
y_test = y.iloc[-30:]

# Fit and predict with different models
models = {
    "Naive": NaiveForecaster(),
    "ARIMA": ARIMAForecaster(order=(1, 1, 1)),
    "XGBoost": XGBoostForecaster(lags=12, n_estimators=100),
}

predictions = {}
for name, model in models.items():
    model.fit(y_train)
    predictions[name] = model.predict(30)

# Evaluate
evaluator = ForecastEvaluator()
comparison = evaluator.compare_models(y_test.values, predictions)
print(comparison)
```

```bash
python main.py --test-size 30 --n-splits 5 --horizon 14
```

| Option | Description | Default |
|---|---|---|
| `--data` | Data source (`'sample'` or CSV path) | `'sample'` |
| `--test-size` | Number of test observations | 30 |
| `--n-splits` | Number of CV splits | 5 |
| `--horizon` | Forecast horizon | 14 |
| `--output-dir` | Output directory | `'output'` |
| `--no-plot` | Disable plotting | False |
Uses the last observed value (or seasonal value) for all future predictions.

```python
from forecasting import NaiveForecaster

# Random walk forecast
model = NaiveForecaster(strategy="last")

# Seasonal naive (e.g., use same day last week)
model = NaiveForecaster(strategy="seasonal", seasonality=7)
```

Predicts the historical mean (or rolling window mean).
```python
from forecasting import MeanForecaster

# Full history mean
model = MeanForecaster()

# Rolling window mean
model = MeanForecaster(window=30)
```

Extends the last value with the average historical trend.
```python
from forecasting import DriftForecaster

# Forecast h steps ahead as y_T + h * (y_T - y_1) / (T - 1)
model = DriftForecaster()
```

AutoRegressive Integrated Moving Average model.
```python
from forecasting import ARIMAForecaster

# Simple ARIMA
model = ARIMAForecaster(order=(1, 1, 1))

# Seasonal ARIMA (SARIMA)
model = ARIMAForecaster(
    order=(1, 1, 1),
    seasonal_order=(1, 1, 1, 12)  # Monthly seasonality
)

# Get forecasts with confidence intervals
mean, lower, upper = model.predict_with_ci(30)
```

Facebook's Prophet for time series with trends and seasonality.
```python
from forecasting import ProphetForecaster

model = ProphetForecaster(
    yearly_seasonality=True,
    weekly_seasonality=True,
    changepoint_prior_scale=0.05
)

# Prophet requires a DatetimeIndex
model.fit(y)  # y must be a pd.Series with a DatetimeIndex
```

Gradient boosting with lagged features.
```python
from forecasting import XGBoostForecaster

model = XGBoostForecaster(
    lags=12,            # Number of lagged observations
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    include_trend=True  # Add time index feature
)

# Get feature importance
importance = model.get_feature_importance()
```

Long Short-Term Memory neural network.
```python
from forecasting import LSTMForecaster

model = LSTMForecaster(
    lags=12,
    hidden_units=[50, 25],  # Stacked LSTM layers
    epochs=100,
    batch_size=32,
    dropout=0.2
)
```

The system provides comprehensive evaluation metrics:
| Metric | Description | Interpretation |
|---|---|---|
| MAE | Mean Absolute Error | Average magnitude of errors |
| MSE | Mean Squared Error | Penalizes larger errors |
| RMSE | Root Mean Squared Error | In same units as data |
| MAPE | Mean Absolute Percentage Error | Error as percentage |
| sMAPE | Symmetric MAPE | Handles zero values better |
| MASE | Mean Absolute Scaled Error | Scaled by naive forecast |
| R² | Coefficient of Determination | Variance explained |
| Theil's U | Relative to naive | <1 better than naive |
| Bias | Mean Bias Error | Over/under-forecasting |
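As a cross-check of the definitions above, the core error metrics can be computed by hand with NumPy. This is a standalone sketch with made-up numbers, not the library's implementation in `evaluation.py`:

```python
import numpy as np

y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0])  # history
y_true = np.array([14.0, 13.0])                     # held-out actuals
y_pred = np.array([13.0, 15.0])                     # model forecasts

errors = y_true - y_pred                        # [1.0, -2.0]
mae = np.mean(np.abs(errors))                   # 1.5
rmse = np.sqrt(np.mean(errors ** 2))            # sqrt(2.5) ~ 1.58
mape = np.mean(np.abs(errors / y_true)) * 100   # mean(1/14, 2/13) * 100

# MASE scales MAE by the in-sample MAE of a one-step naive forecast,
# i.e. the mean absolute first difference of the training series
naive_mae = np.mean(np.abs(np.diff(y_train)))   # 1.5
mase = mae / naive_mae                          # 1.0 -> no better than naive
```

A MASE of exactly 1.0, as here, means the forecast errors match the naive benchmark on average.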
```python
from forecasting.evaluation import ForecastEvaluator, evaluate

# Single evaluation
metrics = evaluate(y_true, y_pred, metrics=["mae", "rmse", "mape"])

# Compare multiple models
evaluator = ForecastEvaluator(metrics=["mae", "rmse", "mase"])
comparison = evaluator.compare_models(
    y_test,
    {"Naive": naive_pred, "ARIMA": arima_pred},
    y_train=y_train  # For MASE calculation
)
```

Time series cross-validation ensures robust model evaluation.
Training set grows with each fold:

```python
from forecasting.backtesting import TimeSeriesCrossValidator

cv = TimeSeriesCrossValidator(
    n_splits=5,
    horizon=10,
    gap=0,
    expanding=True
)

for train_idx, test_idx in cv.split(y):
    y_train = y.iloc[train_idx]
    y_test = y.iloc[test_idx]
    # ... fit and evaluate
```

Fixed training set size:
```python
cv = TimeSeriesCrossValidator(
    n_splits=5,
    horizon=10,
    min_train_size=100,
    expanding=False  # Rolling window
)
```

```python
from forecasting.backtesting import run_backtest

results = run_backtest(
    model,
    y,
    n_splits=5,
    horizon=14,
    metrics=["mae", "rmse"]
)

print(f"Mean RMSE: {results['mean_metrics']['rmse']:.4f}")
```

Create synthetic time series for testing:
```python
from forecasting.data_generator import TimeSeriesGenerator

gen = TimeSeriesGenerator(
    n_samples=365,
    start_date="2023-01-01",
    freq="D",
    random_seed=42
)

y = gen.generate(
    trend="linear",            # 'linear', 'exponential', 'polynomial'
    trend_strength=0.5,
    seasonality=[7, 365],      # Weekly and yearly
    seasonality_strength=0.3,
    noise=True,
    noise_std=0.02,
    outliers=True,             # Add some outliers
    changepoints=True,         # Add trend changepoints
)
```

```bash
# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=forecasting --cov-report=html

# Run specific test file
pytest tests/test_statistical.py -v

# Skip slow tests
pytest tests/ -v -m "not slow"
```

```python
from forecasting import (
    NaiveForecaster, MeanForecaster, DriftForecaster,
    ARIMAForecaster, XGBoostForecaster
)
from forecasting.evaluation import ForecastEvaluator
from forecasting.data_generator import generate_sample_data

# Generate data
y = generate_sample_data(n_samples=500, random_seed=42)
y_train, y_test = y.iloc[:-50], y.iloc[-50:]

# Define models
models = {
    "Naive": NaiveForecaster(),
    "Mean": MeanForecaster(),
    "Drift": DriftForecaster(),
    "ARIMA": ARIMAForecaster(order=(1, 1, 1)),
    "XGBoost": XGBoostForecaster(lags=12),
}

# Fit, predict, and compare
predictions = {}
for name, model in models.items():
    model.fit(y_train)
    predictions[name] = model.predict(50)

evaluator = ForecastEvaluator()
comparison = evaluator.compare_models(y_test.values, predictions)
print(comparison.round(4))
```

```python
from forecasting import NaiveForecaster, MeanForecaster
from forecasting.backtesting import BacktestEngine, TimeSeriesCrossValidator

# Setup
cv = TimeSeriesCrossValidator(n_splits=10, horizon=7)
engine = BacktestEngine(cv, metrics=["mae", "rmse", "mase"])

# Compare models
models = {
    "Naive": NaiveForecaster(),
    "SNaive": NaiveForecaster(strategy="seasonal", seasonality=7),
    "Mean": MeanForecaster(window=30),
}

results = engine.compare_models(models, y)
print(results)
```

| Data Pattern | Recommended Models |
|---|---|
| Stationary | Mean, ARIMA |
| Trend | Drift, Prophet, XGBoost with trend |
| Seasonal | Prophet, Seasonal Naive, SARIMA |
| Complex patterns | XGBoost, LSTM |
| Limited history | Naive, Mean |
| Many series | Naive (as benchmark), XGBoost |
- Always use baselines: Compare sophisticated models against naive/mean forecasts
- Use cross-validation: Single train/test splits can be misleading
- Check MASE: Values > 1 indicate the model is worse than naive
- Consider bias: Positive bias = under-forecasting, negative = over-forecasting
- Respect temporal order: Never use future data in training
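The bias rule of thumb above depends on the sign convention. Assuming bias is computed as `mean(y_true - y_pred)` (an assumption here; check `evaluation.py` for the actual definition), a consistently low forecast yields a positive bias:

```python
import numpy as np

y_true = np.array([10.0, 12.0, 14.0])
y_pred = np.array([9.0, 11.0, 13.0])  # every forecast is 1 unit too low

# Under this convention, positive bias means the model under-forecasts
bias = np.mean(y_true - y_pred)
print(bias)  # 1.0
```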
- Required: numpy, pandas, scikit-learn
- Statistical Models: statsmodels, prophet
- ML Models: xgboost, tensorflow (optional)
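Based on the list above, `requirements.txt` presumably contains something like the following (package names taken from this README; exact version pins are not specified here):

```text
numpy
pandas
scikit-learn
statsmodels
prophet
xgboost
# Optional, for LSTMForecaster:
# tensorflow>=2.8.0
```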
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License
- Hyndman, R.J., & Athanasopoulos, G. (2021). *Forecasting: Principles and Practice*, 3rd edition.
- Taylor, S.J., & Letham, B. (2018). "Forecasting at Scale". *The American Statistician*.
- Box, G.E.P., & Jenkins, G.M. (1976). *Time Series Analysis: Forecasting and Control*.