Release benchflow 0.6.4 · benchflow-ai/benchflow

What's Changed

Refresh install guidance to track latest BenchFlow by @bingran-you in #798
fix(agents): route OpenCode-family + pi-acp agents through the LLM usage proxy by @Yiminnn in #797
feat(eval): bind the environment (S) and config (C) axes at the CLI by @xdotli in #790
feat(cli): move continue under bench eval continue (keep deprecated top-level alias) by @xdotli in #800
fix(agents): resolve bare model ids to their provider so harnesses route correctly by @xdotli in #805
ci(integration): Add tiered L0-L3 integration gates with scope planner and codex review by @Yiminnn in #802
fix(integration): repair the full L0–L3 workflow + ready-to-merge codex auto-trigger by @Yiminnn in #806
fix(integration): add uv.lock to plan-job sparse-checkout (setup-uv cache) by @Yiminnn in #807
fix(integration): install codex CLI + isolate codex auth from the deepseek judge by @Yiminnn in #808
fix(integration): pin codex model + demote false pinned-baseline parity blocker by @Yiminnn in #809
fix(integration): codex reviewer on gpt-5.5 (xhigh) + evidence serialization + R-OUTCOME demote by @Yiminnn in #810
feat(integration): codex reviewer on DeepSeek-v4-pro via Moon Bridge by @Yiminnn in #811
fix(integration): pin moon-bridge + injection-safe key + clean fail-closed on bridge absence by @Yiminnn in #812
fix(integration): calibrate L3 gate — slot matching, V-TAMPER false-positive, codex robustness by @Yiminnn in #814
Add MLE-bench adapter by @ZhengShenghan in #792
Add adapter skill by @ZhengShenghan in #793
fix(integration): clear residual greptile findings on the L3 gate by @Yiminnn in #817
fix(eval): bind resolved S-axis env + C-axis overlay on the sharded and run-config paths by @xdotli in #804
Make LiteLLM proxy mandatory for routable agents (never bypass; always capture usage/cost/trajectory) by @bingran-you in #820
Strengthen experiment review trajectory gate by @bingran-you in #821
fix(eval): correct verifier-error resume log by @bingran-you in #819
fix(eval): expose context-root on eval run by @bingran-you in #816
Preserve pi-acp model metadata through LiteLLM proxy by @bingran-you in #803
fix(integration): avoid file-editor judge false positives by @bingran-you in #823
fix(integration): audit summaryless result roots by @bingran-you in #824
fix(eval): reject .git and file --source-path with a clean error (#548) by @bingran-you in #822
fix(pi): avoid context-window retry storms by @bingran-you in #831
fix(eval): surface provider failure cause and harden trajectory redaction by @Yiminnn in #834
feat(agents): all-paths decouple core — manifest loader (8 ACP) + omnigent session-factory seam (additive, gated) by @Yiminnn in #825
fix(eval): keep failure traceback consistent with surfaced error.message by @Yiminnn in #835
test(loader): lock 3-path (acp/ ai-sdk/ omnigent/) manifest discovery by @Yiminnn in #836
test(agents): lock core<->manifest byte-identical parity + CI gate by @Yiminnn in #837
fix(eval): emit llm_trajectory.jsonl for streaming claude-agent-acp rollouts by @Yiminnn in #839
Add train convert Prime SFT export by @bingran-you in #828
chore: release v0.6.4 by @xdotli in #801

New Contributors

@ZhengShenghan made their first contribution in #792

Full Changelog: v0.6.3...v0.6.4
