CCF v1.5.0
[1.5.0] — 2026-06-17
Benchmark slash commands + fair comparison design.
Added
/fusion-benchmarkslash command: runs SOLO (orchestrator answers directly)
vs FUSION (panel → orchestrator judges → synthesizes) on 5 coding tasks.
Sequential only — no parallel panelist calls, respects rate limits. Outputs
markdown files per task tobenchmark/results/./fusion-benchmark-reportslash command: generatesREPORT.mdfrom
benchmark results. Grades both arms on the rubric (correctness/completeness/
blind spots/code quality), produces blind-spot analysis table, cost comparison,
and honest verdict with caveats about orchestrator/judge bias.benchmark/run-benchmark.sh: standalone sequential data collector. Calls
each enabled panelist one at a time with identical prompts, saves raw responses.
Use without Claude Code for CI or batch collection.- Fair comparison design: SOLO = orchestrator answers alone (no panelists
called). FUSION = orchestrator calls panelists → judges. This isolates the
panel's value-add. Documents the bias warning: if orchestrator = same model
family as a panelist, the test is not fair.
[1.4.0] — 2026-06-17
Five new features: Codex/GPT-5.5 panelist (#10), Tavily web search (#11), fusion verification
fixes (#12), analytics dashboard (#13), benchmark suite (#14).
Added
- Codex/GPT-5.5 panelist (#10): new
codex-responsestransport. Auto-reads~/.codex/auth.json,
extracts ChatGPT-Account-ID from JWT, injects Cloudflare headers (User-Agent: codex_cli_rs/0.0.0,
originator,ChatGPT-Account-ID). Requires SSE streaming (stream:true),store:false, and
instructionsfield. Does NOT supportmax_output_tokens. Disabled by default — enable in panel.json. - Tavily web search (#11):
--searchflag onfusion-callenables multi-turn tool calling.
Panelists request searches via standard function calling → CCF calls Tavily locally → results fed
back. Max 3 rounds (override withMAX_SEARCH_TURNS). Zero change without--search. - Analytics dashboard (#13):
/fusion-analyticsslash command. Pure bash + jq text dashboard
readingfusion.log. Shows total runs, success rate, per-panelist latency + ok%, cost saved vs
OpenRouter Fusion ($0.50/call), recent errors. - Benchmark suite (#14): 5 real coding tasks (bug fix, security, refactor, architecture,
concurrency) with 0-100 grading rubric. Two new slash commands:/fusion-benchmarkruns
SOLO (orchestrator answers directly) vs FUSION (panel → orchestrator judges) sequentially;
/fusion-benchmark-reportgenerates comparison REPORT.md with blind-spot analysis. Standalone
benchmark/run-benchmark.shcollects raw panelist responses without AI judgment. Sequential
only — no parallel calls, respects rate limits.
Changed
- Fusion verification (#12): Judge JSON schema now matches OpenRouter (
partial_coverage,
unique_insights).FUSION CONFIRMED: N/Nline is mandatory.failed_modelstracked. Judge
instructed to verify claims via tools. Warning when <2 panelists respond.