Proposal: ensemble + judge-refinement loop for early pipeline tasks #395

Merged
neoneye merged 1 commit into PlanExeOrg:main from VoynichLabs:proposal/ensemble-judge-refinement on Mar 28, 2026
@82deutschmark
Collaborator

Three-stage ensemble + judge-refinement loop for early pipeline tasks.

  1. N non-reasoning models run in parallel (candidates)
  2. Reasoning model scores all N + identifies gaps (short output)
  3. If best score below threshold: retry candidates with gap hints
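
The three stages above can be sketched roughly as follows. This is a minimal illustration, not the proposal's implementation: `Candidate`, `Judge`, `Verdict`, `ensemble_with_judge`, the `threshold` default, and `max_quality_retries` are all hypothetical names and values chosen for the example, and scores are assumed to lie in [0, 1].

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Sequence

# All names below are hypothetical; none are taken from the PlanExe codebase.
# A candidate takes (prompt, gap_hints) and returns its output text.
Candidate = Callable[[str, Optional[str]], Awaitable[str]]


@dataclass
class Verdict:
    best_index: int    # index of the highest-scoring candidate output
    best_score: float  # score of that output, assumed to lie in [0, 1]
    gap_hints: str     # short description of what the outputs are missing


# A judge takes (prompt, outputs) and returns a Verdict.
Judge = Callable[[str, Sequence[str]], Awaitable[Verdict]]


async def ensemble_with_judge(
    prompt: str,
    candidates: Sequence[Candidate],
    judge: Judge,
    threshold: float = 0.7,        # assumed value, for illustration only
    max_quality_retries: int = 1,  # assumed value, for illustration only
) -> str:
    hints: Optional[str] = None
    best_output, best_score = "", float("-inf")
    for _ in range(max_quality_retries + 1):
        # Stage 1: run all N candidates concurrently.
        outputs = await asyncio.gather(*(c(prompt, hints) for c in candidates))
        # Stage 2: the judge scores every output and reports gaps.
        verdict = await judge(prompt, outputs)
        if verdict.best_score > best_score:
            best_output = outputs[verdict.best_index]
            best_score = verdict.best_score
        if best_score >= threshold:
            break
        # Stage 3: below threshold, so the next round gets the gap hints.
        hints = verdict.gap_hints
    return best_output
```

The sketch keeps the best output seen across rounds, so a retry that scores worse than the first attempt does not replace it. Whether the real design does the same is not stated in this description.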

Targets PremiseAttackTask and RedlineGateTask first, since output quality in these tasks propagates furthest downstream. Extends the existing LLMExecutor.max_validation_retries pattern to quality-based retries. Backward-compatible with local/Ollama setups: candidates run sequentially, and the judge step is skipped when no judge model is configured.
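
One way to picture the backward-compatible path is a small configuration object that decides which stages run. This is only a sketch under assumptions: `EnsembleConfig`, its fields, `plan_stages`, and the stage labels are hypothetical and do not come from the proposal or from `LLMExecutor`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EnsembleConfig:
    # Hypothetical fields, named for illustration only.
    candidate_models: List[str]
    judge_model: Optional[str] = None  # None means no judge is configured
    parallel: bool = True              # False for local/Ollama-style setups


def plan_stages(config: EnsembleConfig) -> List[str]:
    """Return the stages that would run for this configuration."""
    mode = "parallel" if config.parallel else "sequential"
    stages = [f"candidates:{mode}"]
    # Without a judge model there is nothing to score the candidates,
    # so both the judge step and the quality-based retry are skipped.
    if config.judge_model is not None:
        stages += ["judge", "conditional_retry"]
    return stages
```

With `parallel=False` and no `judge_model`, only the sequential candidate stage remains, which matches the local fallback described above.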

See docs/proposals/ensemble-judge-refinement.md for full design + open questions.

Follows PR #393 (parallel racing proposal).

…tasks

Three-stage design per Simon's direction:
1. N non-reasoning candidates run in parallel
2. Reasoning model judges all N, scores + gap hints (short output)
3. Conditional retry with gap hints if below threshold

Highest leverage is on early tasks (PremiseAttack, RedlineGate,
ProjectPlan), where quality propagates downstream.

Extends existing LLMExecutor.max_validation_retries pattern to
quality-based retries. Backward-compatible with local/Ollama setups.
@neoneye neoneye merged commit 093bc1d into PlanExeOrg:main Mar 28, 2026
3 checks passed
@neoneye neoneye deleted the proposal/ensemble-judge-refinement branch March 28, 2026 20:23

3 participants