benchflow 0.6.4

Released by @github-actions on 27 Jun at 22:40 (commit 2252a5d)

What's Changed

  • Refresh install guidance to track latest BenchFlow by @bingran-you in #798
  • fix(agents): route OpenCode-family + pi-acp agents through the LLM usage proxy by @Yiminnn in #797
  • feat(eval): bind the environment (S) and config (C) axes at the CLI by @xdotli in #790
  • feat(cli): move continue under bench eval continue (keep deprecated top-level alias) by @xdotli in #800
  • fix(agents): resolve bare model ids to their provider so harnesses route correctly by @xdotli in #805
  • ci(integration): Add tiered L0-L3 integration gates with scope planner and codex review by @Yiminnn in #802
  • fix(integration): repair the full L0–L3 workflow + ready-to-merge codex auto-trigger by @Yiminnn in #806
  • fix(integration): add uv.lock to plan-job sparse-checkout (setup-uv cache) by @Yiminnn in #807
  • fix(integration): install codex CLI + isolate codex auth from the deepseek judge by @Yiminnn in #808
  • fix(integration): pin codex model + demote false pinned-baseline parity blocker by @Yiminnn in #809
  • fix(integration): codex reviewer on gpt-5.5 (xhigh) + evidence serialization + R-OUTCOME demote by @Yiminnn in #810
  • feat(integration): codex reviewer on DeepSeek-v4-pro via Moon Bridge by @Yiminnn in #811
  • fix(integration): pin moon-bridge + injection-safe key + clean fail-closed on bridge absence by @Yiminnn in #812
  • fix(integration): calibrate L3 gate — slot matching, V-TAMPER false-positive, codex robustness by @Yiminnn in #814
  • Add MLE-bench adapter by @ZhengShenghan in #792
  • Add adapter skill by @ZhengShenghan in #793
  • fix(integration): clear residual greptile findings on the L3 gate by @Yiminnn in #817
  • fix(eval): bind resolved S-axis env + C-axis overlay on the sharded and run-config paths by @xdotli in #804
  • Make LiteLLM proxy mandatory for routable agents (never bypass; always capture usage/cost/trajectory) by @bingran-you in #820
  • Strengthen experiment review trajectory gate by @bingran-you in #821
  • fix(eval): correct verifier-error resume log by @bingran-you in #819
  • fix(eval): expose context-root on eval run by @bingran-you in #816
  • Preserve pi-acp model metadata through LiteLLM proxy by @bingran-you in #803
  • fix(integration): avoid file-editor judge false positives by @bingran-you in #823
  • fix(integration): audit summaryless result roots by @bingran-you in #824
  • fix(eval): reject .git and file --source-path with a clean error (#548) by @bingran-you in #822
  • fix(pi): avoid context-window retry storms by @bingran-you in #831
  • fix(eval): surface provider failure cause and harden trajectory redaction by @Yiminnn in #834
  • feat(agents): all-paths decouple core — manifest loader (8 ACP) + omnigent session-factory seam (additive, gated) by @Yiminnn in #825
  • fix(eval): keep failure traceback consistent with surfaced error.message by @Yiminnn in #835
  • test(loader): lock 3-path (acp/ ai-sdk/ omnigent/) manifest discovery by @Yiminnn in #836
  • test(agents): lock core<->manifest byte-identical parity + CI gate by @Yiminnn in #837
  • fix(eval): emit llm_trajectory.jsonl for streaming claude-agent-acp rollouts by @Yiminnn in #839
  • Add train convert Prime SFT export by @bingran-you in #828
  • chore: release v0.6.4 by @xdotli in #801

New Contributors

Full Changelog: v0.6.3...v0.6.4