Releases: Parisi-Labs/forecastops
v0.2.1
Patch release adding the agent-facing diagnosis layer and fixing benchmark scoring.
Added
- fops.diagnose(run_id) (and fops diagnose <run_id>): a compact, machine-readable run diagnosis covering overall metrics, skill vs. benchmark, worst horizons / series / regimes, validation, and artifact URIs. Worst-series is aggregated server-side over the full forecast artifact.
- Richer capture traces: the root forecast.run span now carries the run's semantic context (adapter, group, cutoff/target ranges, horizon, series/points counts, validation status, artifact URIs).
- fops backtest CLI command, and a group option on fops capture.
Fixed
- compare() no longer computes benchmark-side pinball from the model's quantile columns. sMAPE now earns a skill score vs. the benchmark.
- The local UI reports the installed package version instead of a hardcoded 0.1.0.
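To make the pinball fix concrete, here is a minimal sketch of the idea, not the library's code: each side's pinball loss is computed from its own quantile forecasts, and skill is the relative improvement over the benchmark. The function names `pinball_loss` and `skill_score`, the sample data, and the `1 - model / benchmark` skill definition are assumptions for illustration; ForecastOps may define skill differently.

```python
import numpy as np


def pinball_loss(y_true, y_quantile, level):
    """Mean pinball (quantile) loss for one quantile level in (0, 1)."""
    y_true = np.asarray(y_true, dtype=float)
    y_quantile = np.asarray(y_quantile, dtype=float)
    diff = y_true - y_quantile
    # Under-forecasts are weighted by `level`, over-forecasts by `1 - level`.
    return float(np.mean(np.maximum(level * diff, (level - 1.0) * diff)))


def skill_score(model_loss, benchmark_loss):
    """Assumed definition: 1 - model / benchmark (positive = model is better)."""
    if benchmark_loss == 0:
        return float("nan")
    return 1.0 - model_loss / benchmark_loss


# Hypothetical data: actuals plus each side's own 90% quantile forecast.
y = [10.0, 12.0, 11.0]
model_p90 = [12.0, 13.0, 12.0]
bench_p90 = [15.0, 15.0, 15.0]

model_loss = pinball_loss(y, model_p90, 0.9)
# The benchmark loss uses the benchmark's quantiles, never the model's.
bench_loss = pinball_loss(y, bench_p90, 0.9)
skill = skill_score(model_loss, bench_loss)
```

Scoring the benchmark against the model's quantile columns would give both sides the same loss and collapse the skill to zero, which is the kind of error the fix rules out.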
See CHANGELOG.md for details.
⚠️ Pre-release: still on the 0.x line; APIs may change.
v0.2.0
Second release of ForecastOps — the experiment loop and diagnostics layer on top of the 0.1.0 capture/evaluate foundation.
Added
- fops.backtest(...): evaluate a rolling-origin forecast panel as one grouped backtest, with per-cutoff and aggregate (mean/std) metrics.
- Experiment groups: capture(group="...") tags related runs into a named group, with a Groups view and a group detail page showing per-metric mean ± std and stability across runs.
- New metrics: sMAPE (scale-free ratio) and pinball/quantile loss (averaged over yhat_p<level> columns).
- Regime slicing: metrics sliced by any categorical columns kept via a schema's extra_columns (region, holiday_flag, event_type, …), so error-by-regime breakdowns appear automatically.
- Diagnostics cockpit on the run detail page: residual distribution, error by horizon, per-series worst offenders, and per-regime breakdowns.
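The new metrics and regime slicing can be illustrated with plain pandas. This is a rough sketch under stated assumptions, not the ForecastOps implementation: the sMAPE variant (a 0 to 2 ratio), the integer-percent reading of `yhat_p<level>` (so `yhat_p90` is the 0.9 quantile), the `y` / `yhat` column names, and the helper names `smape` and `mean_pinball` are all hypothetical.

```python
import re

import numpy as np
import pandas as pd

# Assumed convention: yhat_p10 is the 10% quantile, yhat_p90 the 90% quantile.
QUANTILE_COL = re.compile(r"^yhat_p(\d+)$")


def smape(y, yhat):
    """Symmetric MAPE as a ratio in [0, 2]; a 0/0 term counts as 0."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    denom = np.abs(y) + np.abs(yhat)
    safe = np.where(denom == 0, 1.0, denom)
    ratio = np.where(denom == 0, 0.0, 2.0 * np.abs(y - yhat) / safe)
    return float(ratio.mean())


def mean_pinball(df, actual_col="y"):
    """Pinball loss averaged over every yhat_p<level> column in df."""
    actual = df[actual_col].to_numpy(dtype=float)
    losses = []
    for col in df.columns:
        match = QUANTILE_COL.match(col)
        if match is None:
            continue
        level = int(match.group(1)) / 100.0
        diff = actual - df[col].to_numpy(dtype=float)
        losses.append(np.mean(np.maximum(level * diff, (level - 1.0) * diff)))
    return float(np.mean(losses)) if losses else float("nan")


# Hypothetical panel with one categorical regime column.
df = pd.DataFrame({
    "region": ["north", "south"],
    "y": [100.0, 200.0],
    "yhat": [110.0, 180.0],
    "yhat_p10": [90.0, 170.0],
    "yhat_p90": [120.0, 220.0],
})

overall_smape = smape(df["y"], df["yhat"])
overall_pinball = mean_pinball(df)
# Regime slicing: the same metric, computed once per category value.
smape_by_region = {
    region: smape(group["y"], group["yhat"])
    for region, group in df.groupby("region")
}
```

Because sMAPE is a ratio, the per-region values are comparable even when the series sit at very different scales.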
Changed
- The Nixtla adapter parses <model>-lo-<level> / <model>-hi-<level> prediction-interval columns into interval bounds and per-level quantile columns, so coverage, interval width, and pinball loss work for statsforecast/neuralforecast outputs.
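As an illustration of the column parsing described above, the sketch below maps a `<model>-lo-<level>` / `<model>-hi-<level>` name to its interval side and the quantile it implies. It is not the adapter's actual code: the helper name `parse_interval_column`, the returned dictionary shape, and the `yhat_p<quantile>` naming of the derived column are assumptions. The arithmetic relies only on the standard reading of a central interval, where a 90% interval spans the 5% and 95% quantiles.

```python
import re

# <model>-lo-<level> or <model>-hi-<level>; the model name may contain hyphens.
INTERVAL_COL = re.compile(
    r"^(?P<model>.+)-(?P<side>lo|hi)-(?P<level>\d+(?:\.\d+)?)$"
)


def parse_interval_column(name):
    """Return interval metadata for a Nixtla-style column, or None."""
    match = INTERVAL_COL.match(name)
    if match is None:
        return None
    level = float(match.group("level"))
    side = match.group("side")
    # A central interval of `level` percent leaves (100 - level) / 2 per tail.
    tail = (100.0 - level) / 2.0
    quantile_pct = tail if side == "lo" else 100.0 - tail
    return {
        "model": match.group("model"),
        "side": side,
        "level": level,
        # Hypothetical output name; the real adapter's naming may differ.
        "quantile_col": f"yhat_p{quantile_pct:g}",
    }


lo = parse_interval_column("AutoARIMA-lo-90")
hi = parse_interval_column("AutoARIMA-hi-90")
point = parse_interval_column("AutoARIMA")  # point forecast, not an interval
```

Here `lo` maps to the 5% quantile and `hi` to the 95% quantile, which is what makes pinball loss computable from interval-only outputs.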
Existing local stores are migrated in place to add the group columns. See CHANGELOG.md for details.
⚠️ Pre-release: still on the 0.x line; APIs may change.
v0.1.0
Initial public pre-release of ForecastOps — a local-first observability and evaluation layer for production forecasts.
Highlights
- fops.capture() normalizes forecasts from existing pipelines into local Parquet artifacts with a DuckDB run index; evaluate, compare, and diff compute horizon-aware metrics, benchmark skill, and run-to-run deltas.
- Local read-only UI (fops ui) with Runs, run detail (per-series forecast inspector), Projects (error trends across captures), and Compare views.
- Static HTML reports (fops report) and an fops CLI for capture, lint, evaluate, diff, and report workflows.
- Optional OpenTelemetry export of aggregate metrics and capture traces: off by default, never includes raw forecast points.
Privacy defaults
- UI binds to 127.0.0.1 and refuses other hosts unless --allow-remote is passed
- No outbound network calls; raw forecast points never leave the local store
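The loopback-only default can be pictured as a small guard in front of the server start. The sketch below is an assumption about the general shape of such a check, not the code behind fops ui: the function name `check_bind_host` and its `allow_remote` parameter are hypothetical stand-ins for whatever the --allow-remote flag toggles.

```python
import ipaddress


def check_bind_host(host, allow_remote=False):
    """Return host if it may be bound; raise ValueError otherwise."""
    if allow_remote:
        return host  # the operator explicitly opted in
    if host == "localhost":
        return host
    try:
        is_loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): treat it as non-local.
        is_loopback = False
    if not is_loopback:
        raise ValueError(
            f"refusing to bind to {host!r}; pass --allow-remote to override"
        )
    return host


default_host = check_bind_host("127.0.0.1")
try:
    check_bind_host("0.0.0.0")
    refused = False
except ValueError:
    refused = True
```

Failing closed on anything that is not clearly loopback keeps an accidental `0.0.0.0` from exposing the store on a shared network.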
See CHANGELOG.md for the full release notes, including the pre-release hardening pass (merge safety, DuckDB lock handling, query pushdown, real OTel export).
⚠️ Pre-release: APIs may change before a stable 0.x line is established.