Build
Drop path override on rig-memvid dev-dep so release-plz can resolve crates.io fallback
CI
Add release-plz workflow and config
Added
Bootstrap confidence intervals on MetricReport::mean: new MetricCi
type plus MetricReport::bootstrap_ci(iterations, level, seed) and
builder MetricReport::with_bootstrap_ci(...). Uses a deterministic
SplitMix64 stream (no rand dependency) so the same seed reproduces
the same interval. The MetricCi value is stored as an optional field annotated with #[serde(default, skip_serializing_if = "Option::is_none")], so existing JSON reports
stay schema-compatible. MetricDelta carries current_ci / baseline_ci alongside the existing mean delta.
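For readers unfamiliar with the technique, a seeded percentile bootstrap over a mean can be sketched as below. This is an illustration under stated assumptions, not the crate's implementation: the free function bootstrap_mean_ci, its return type, and the percentile method are hypothetical, and only the (iterations, level, seed) parameter names and the use of SplitMix64 come from the notes above.

```rust
/// SplitMix64: a small deterministic PRNG, so no `rand` dependency is needed.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..n (requires n > 0). Modulo bias is negligible for small n.
    fn next_index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical helper: percentile bootstrap interval for the mean of `scores`.
/// Returns None for empty input, zero iterations, or a level outside (0, 1).
fn bootstrap_mean_ci(
    scores: &[f64],
    iterations: usize,
    level: f64,
    seed: u64,
) -> Option<(f64, f64)> {
    if scores.is_empty() || iterations == 0 || !(level > 0.0 && level < 1.0) {
        return None;
    }
    let mut rng = SplitMix64::new(seed);
    let n = scores.len();
    // Resample with replacement `iterations` times and record each resample's mean.
    let mut means: Vec<f64> = (0..iterations)
        .map(|_| {
            let sum: f64 = (0..n).map(|_| scores[rng.next_index(n)]).sum();
            sum / n as f64
        })
        .collect();
    means.sort_by(|a, b| a.total_cmp(b));
    // Take the lower and upper percentiles of the sorted resample means.
    let alpha = (1.0 - level) / 2.0;
    let last = (iterations - 1) as f64;
    let lo = means[(alpha * last).floor() as usize];
    let hi = means[((1.0 - alpha) * last).ceil() as usize];
    Some((lo, hi))
}
```

Because the generator state is derived only from the seed, calling the helper twice with the same inputs yields the same interval, which is the reproducibility property the entry describes.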
Non-zero CI exit support on ReportDiff: is_clean(&gate) -> bool
and exit_code(&gate) -> i32 (0 = pass, 1 = regression). Lets eval
binaries gate CI with std::process::exit(diff.exit_code(&gate))
without rebuilding the regression-walking logic at every call site.
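The gating pattern can be illustrated with stand-in types. Everything below is a hypothetical sketch: Gate, Delta, Diff, the max_drop threshold, and regressed_metrics are invented for illustration and do not reflect the real fields of RegressionGate or ReportDiff. Only the is_clean / exit_code shape and the 0 = pass, 1 = regression convention come from the entry above.

```rust
/// Hypothetical stand-in for a regression gate.
struct Gate {
    /// Largest tolerated drop in a metric's mean before it counts as a regression.
    max_drop: f64,
}

/// Hypothetical stand-in for one metric's baseline-vs-current comparison.
struct Delta {
    metric: String,
    baseline_mean: f64,
    current_mean: f64,
}

/// Hypothetical stand-in for a report diff.
struct Diff {
    deltas: Vec<Delta>,
}

impl Diff {
    /// Names of metrics whose mean dropped by more than the gate allows.
    fn regressed_metrics(&self, gate: &Gate) -> Vec<&str> {
        self.deltas
            .iter()
            .filter(|d| d.baseline_mean - d.current_mean > gate.max_drop)
            .map(|d| d.metric.as_str())
            .collect()
    }

    /// True when no metric regressed beyond the gate.
    fn is_clean(&self, gate: &Gate) -> bool {
        self.regressed_metrics(gate).is_empty()
    }

    /// 0 = pass, 1 = regression, suitable for std::process::exit.
    fn exit_code(&self, gate: &Gate) -> i32 {
        if self.is_clean(gate) {
            0
        } else {
            1
        }
    }
}
```

An eval binary would then end its main function with std::process::exit(diff.exit_code(&gate)), so the CI job fails whenever any gated metric regresses.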
observe feature: emit MultiReport and ReportDiff as rig-tap-compatible JSON envelopes on the rig_tap tracing target
without depending on rig-tap. Introduces EvalEnvelope + EvalKind
(eval.retrieval_report, eval.regression_diff), report_envelopes, diff_envelopes, and tracing helpers emit_report / emit_diff. Each event carries the full JSON envelope
plus stable scalar rig_tap.kind / rig_tap.metric / rig_tap.regressed / rig_tap.conversation_id attributes so existing
OpenTelemetry collectors route eval reports the same way they route
prompt and tool events.
FreshnessReport and FreshnessQueryRollup for rolling StalenessReport / ConflictReport outputs into MultiReport. MultiReport::with_freshness attaches the dataset/per-query rollup, while with_freshness_metrics also appends score-like freshness.stale_free_rate@k and freshness.conflict_free_rate@k metric
rows so freshness regressions trip the existing RegressionGate / diff
path.
Opt-in MinHash-style chunk near-duplicate linting via NearDuplicateLintConfig, NearDuplicatePair, ChunkStats::near_duplicate_pairs, and ChunkLintWarning::NearDuplicateChunks. The default remains disabled so
existing ingestion gates keep their current warning set until hosts opt in.
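As background on the technique, MinHash estimates the Jaccard similarity of two shingle sets by comparing per-seed minimum hashes. The sketch below is a generic illustration and makes several assumptions: the function names, the 3-word shingles, the 64-hash signature, and the use of std's DefaultHasher are all choices made here, and none of them is taken from NearDuplicateLintConfig or the crate's actual lint.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Word shingles of width `w`; short texts become a single shingle.
fn shingles(text: &str, w: usize) -> HashSet<String> {
    let w = w.max(1);
    let words: Vec<&str> = text.split_whitespace().collect();
    if words.len() <= w {
        return std::iter::once(words.join(" ")).collect();
    }
    words.windows(w).map(|win| win.join(" ")).collect()
}

/// One minimum hash per seed; the seed is mixed in to simulate independent hash functions.
fn minhash_signature(text: &str, w: usize, num_hashes: usize) -> Vec<u64> {
    let set = shingles(text, w);
    (0..num_hashes as u64)
        .map(|seed| {
            set.iter()
                .map(|s| {
                    let mut h = DefaultHasher::new();
                    seed.hash(&mut h);
                    s.hash(&mut h);
                    h.finish()
                })
                .min()
                .unwrap_or(u64::MAX)
        })
        .collect()
}

/// Fraction of signature positions that agree, an estimate of Jaccard similarity.
fn estimated_jaccard(a: &[u64], b: &[u64]) -> f64 {
    if a.is_empty() || a.len() != b.len() {
        return 0.0;
    }
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}

/// Index pairs (i, j), i < j, whose estimated similarity meets the threshold.
fn near_duplicate_pairs(chunks: &[&str], threshold: f64) -> Vec<(usize, usize)> {
    let sigs: Vec<Vec<u64>> = chunks
        .iter()
        .map(|c| minhash_signature(c, 3, 64))
        .collect();
    let mut pairs = Vec::new();
    for i in 0..sigs.len() {
        for j in (i + 1)..sigs.len() {
            if estimated_jaccard(&sigs[i], &sigs[j]) >= threshold {
                pairs.push((i, j));
            }
        }
    }
    pairs
}
```

The all-pairs comparison is quadratic in the number of chunks, which is one plausible reason to keep such a lint opt-in; larger corpora would typically add locality-sensitive hashing bands on top of the signatures.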