v0.4.0

github-actions released this 10 May 16:10
2972389

Added

  • Backend run groups now persist grouped Run executions, instantiate selected templates per target, launch child runs concurrently, expose /run-groups create/read/cancel endpoints, and isolate per-target failures.
  • Results now has a run-backed /results-view/query API and /results-view/runs/:runId detail API for the merged Dashboard/History experience, including filter metadata, scorecards, chart series, recent runs, dense history rows, and drawer data.
  • Evaluation detail is now available at GET /evaluations/:evaluationId so leaderboard rows can open a detail drawer for the representative evaluation.
  • Inference parameter presets are now persisted through /inference-param-presets CRUD endpoints and exposed in the shared frontend context bar.
  • Evaluate now has a queue API backed by completed test_results, with source-linked scoring and skip persistence, while preserving the five existing leaderboard score fields (each scored 1-5).
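
The run-group behaviour described above (concurrent child runs with per-target failure isolation) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `ChildRunResult`, `runGroup`, and the `runOne` callback are hypothetical names, and the real persistence and /run-groups endpoints are not shown.

```typescript
// Hypothetical result shape; the release notes do not describe the real schema.
type ChildRunResult =
  | { target: string; status: "completed"; output: string }
  | { target: string; status: "failed"; error: string };

// Launch one child run per target concurrently. Each child catches its own
// error, so one failing target is recorded as "failed" without aborting
// or rejecting the runs for the other targets.
async function runGroup(
  targets: string[],
  runOne: (target: string) => Promise<string>, // assumed per-target executor
): Promise<ChildRunResult[]> {
  return Promise.all(
    targets.map(async (target): Promise<ChildRunResult> => {
      try {
        const output = await runOne(target);
        return { target, status: "completed", output };
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        return { target, status: "failed", error };
      }
    }),
  );
}
```

Because every child promise resolves to a result object rather than rejecting, the outer `Promise.all` only settles once all targets have finished, and the caller receives one entry per target in input order.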

Changed

  • CI, release, and local Node version guidance now target Node.js 25 while declaring the supported runtime range as >=22.19 <26, matching Undici 8 requirements without claiming Node 26 support before native SQLite dependencies allow it.
  • better-sqlite3 is now pinned to the latest verified 12.9 release line for the current Node runtime window.
  • Frontend styling now loads the new design-system foundation tokens, vendored IBM Plex fonts, and shared component primitives for cards, buttons, inputs, health pills, metrics, and architecture-tree surfaces.
  • The frontend shell now uses React Router with a 220px always-expanded five-item sidebar, URL-backed Catalog/Results sub-tabs, legacy route redirects, and sidebar health/count status instead of the former global metric-card header.
  • Catalog now replaces the legacy Inference Servers and Models bodies with a merged Servers/Models funnel, URL-backed server/model filters, server health view, slide-over add/edit drawer, card grids, and a full-width model inspector layout.
  • Run now uses a unified workflow for 1 to 8 models, with query-backed model chips, shared template/options controls, single-target detail rendering, multi-target comparison columns, and summary aggregation.
  • Results now uses a single merged Dashboard/Leaderboard/History page with a shared 240px filter rail, URL-owned tab/filter/sort/pagination/detail state, export/share/reset actions, run detail drawers for Dashboard and History, and evaluation detail drawers for Leaderboard.
  • Package 06 polish adds shared reg-lights, a persistent inference context bar on Run/Templates/Results/Evaluate, a two-pane Templates layout, and a manual Evaluate scoring queue.
  • Run, Templates, Results, and Evaluate now share merged page headers with the inference context bar aligned directly below the page header.
  • Results now uses a full-width staged funnel with relationship-aware Servers -> Models -> Tests/range filtering, a full-width empty dashboard state, and downstream pruning when upstream selections change.
  • Results and Catalog Models funnels now share numbered stages, aligned Clear/Collapse controls, Catalog-style collapsible rail treatment, and persisted collapse state.
  • Results Tests/range and Catalog Models filter rails now use scoped Clear actions that preserve upstream selections while clearing only the filters owned by that rail.
  • Leaderboard remains backed by evaluations while accepting server, model, score range, sort, and group query parameters, including grouping by server and inference_config.quantization_level.
  • Inference server authentication can now use stored raw bearer/custom-header tokens for backend probes and runs while preserving the existing token_env fallback.
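
As an illustration of the supported runtime range stated above (>=22.19 <26), the following sketch checks a Node.js version string against that window. The helper names `parseMajorMinor` and `isSupportedNode` are hypothetical and not taken from the project, which may rely on the `engines` field in package.json instead.

```typescript
// Parse "v25.1.0" or "22.19.0" into [major, minor]; returns null if malformed.
function parseMajorMinor(version: string): [number, number] | null {
  const match = /^v?(\d+)\.(\d+)/.exec(version);
  if (match === null) return null;
  return [Number(match[1]), Number(match[2])];
}

// True when the version satisfies >=22.19 <26.
function isSupportedNode(version: string): boolean {
  const parsed = parseMajorMinor(version);
  if (parsed === null) return false;
  const [major, minor] = parsed;
  if (major < 22 || major >= 26) return false; // below the floor or at/after Node 26
  return major > 22 || minor >= 19;            // on 22.x, require at least 22.19
}
```

In a Node.js process this could be called as `isSupportedNode(process.version)` at startup to fail early on an unsupported runtime.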

Fixed

  • Backend Vitest runs now ignore production SQLite database defaults, use a dedicated backend-test.sqlite by default, and fail fast if a backend test tries to open the production DB.
  • Backend proxy support now sends plain HTTP outbound requests to the configured proxy in absolute-form while retaining CONNECT tunneling for HTTPS targets. Backend outbound fetches are routed directly through the configured Undici dispatcher, and process-level NO_PROXY no longer bypasses backend proxy routing unless AITESTBENCH_INFERENCE_NO_PROXY is set.
  • Inference server API responses now mask stored raw auth tokens and expose only token presence metadata.
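
The token-masking fix above can be pictured with a small sketch: the stored raw token is dropped from the response and replaced by a presence flag. The record shape is an assumption made for illustration. Only `token_env` is named in these notes, while `auth_token`, `has_auth_token`, and `toServerResponse` are hypothetical names.

```typescript
// Hypothetical stored record; only token_env is named in the release notes.
interface StoredServer {
  id: string;
  name: string;
  token_env: string | null;
  auth_token: string | null; // hypothetical field holding the stored raw token
}

// Hypothetical API response shape: no raw token, only presence metadata.
interface ServerResponse {
  id: string;
  name: string;
  token_env: string | null;
  has_auth_token: boolean;
}

// Strip the raw token and report only whether one is stored.
function toServerResponse(server: StoredServer): ServerResponse {
  const { auth_token, ...rest } = server;
  return { ...rest, has_auth_token: auth_token !== null && auth_token !== "" };
}
```

Destructuring the secret out of the record, rather than overwriting it with a placeholder string, keeps the raw value out of the serialized response entirely.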