Proposal: ensemble + judge-refinement loop for early pipeline tasks by 82deutschmark · Pull Request #395 · PlanExeOrg/PlanExe

82deutschmark · 2026-03-27T20:49:14Z

Three-stage ensemble + judge-refinement loop for early pipeline tasks.

1. N non-reasoning models run in parallel (candidates).
2. A reasoning model scores all N candidates and identifies gaps (short output).
3. If the best score is below the threshold, the candidates are retried with the gap hints.
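
The three stages above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the design in the proposal document: `ensemble_with_judge`, `Verdict`, `threshold`, and `max_quality_retries` are hypothetical names, the candidates and the judge are plain callables standing in for LLM calls, and the judge's score is assumed to lie in [0, 1].

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# All names here are hypothetical; none are taken from the PlanExe codebase.
Candidate = Callable[[str, Optional[str]], str]  # (prompt, gap_hints) -> answer


@dataclass
class Verdict:
    best_index: int    # index of the highest-scoring candidate answer
    best_score: float  # judge's score for that answer, assumed to be in [0, 1]
    gap_hints: str     # short note on what the answers are still missing


Judge = Callable[[str, Sequence[str]], Verdict]


def ensemble_with_judge(
    prompt: str,
    candidates: Sequence[Candidate],
    judge: Judge,
    threshold: float = 0.7,
    max_quality_retries: int = 1,
) -> Tuple[str, float]:
    """Return the best answer seen and its score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    hints: Optional[str] = None
    best_answer, best_score = "", float("-inf")
    for _ in range(max_quality_retries + 1):
        # Stage 1: run all candidates in parallel (threads suit I/O-bound calls).
        with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
            answers = list(pool.map(lambda c: c(prompt, hints), candidates))
        # Stage 2: a single judge call scores the answers and reports gaps.
        verdict = judge(prompt, answers)
        if verdict.best_score > best_score:
            best_answer = answers[verdict.best_index]
            best_score = verdict.best_score
        # Stage 3: stop if good enough, otherwise retry with the gap hints.
        if best_score >= threshold:
            break
        hints = verdict.gap_hints
    return best_answer, best_score


# Toy stand-ins for real model calls, for demonstration only.
def make_candidate(name: str) -> Candidate:
    def run(prompt: str, hints: Optional[str]) -> str:
        suffix = f" [addressing: {hints}]" if hints else ""
        return f"{name}: answer to {prompt!r}{suffix}"
    return run


def toy_judge(prompt: str, answers: Sequence[str]) -> Verdict:
    improved = all("addressing" in a for a in answers)
    return Verdict(0, 0.9 if improved else 0.4, "missing cost risks")


answer, score = ensemble_with_judge(
    "Attack the premise", [make_candidate("a"), make_candidate("b")], toy_judge
)
```

In this toy run the first round scores below the threshold, so the candidates are retried once with the judge's gap hints and the second round is accepted. Keeping the best answer across rounds means a retry can never make the result worse.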

Targets PremiseAttackTask and RedlineGateTask first, where quality propagates most downstream. Extends the existing LLMExecutor.max_validation_retries pattern to quality-based retries. Backward-compatible with local/Ollama setups: candidates run sequentially, and the judge step is skipped if no judge model is configured.
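
One way the backward-compatible path might look is sketched below. The PR text does not say which answer is used when there is no judge, so returning the first candidate that succeeds is an assumption made here for illustration; `run_task` and the toy models are likewise hypothetical and not part of PlanExe.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical types; not taken from the PlanExe codebase.
Candidate = Callable[[str], str]
Judge = Callable[[str, Sequence[str]], int]  # returns the index of the best answer


def run_task(
    prompt: str,
    candidates: Sequence[Candidate],
    judge: Optional[Judge] = None,
) -> Tuple[str, str]:
    """Return (answer, mode), where mode is 'sequential' or 'judged'."""
    if judge is None:
        # Fallback (assumed behavior): try candidates one at a time and
        # return the first answer that does not raise.
        last_error: Optional[Exception] = None
        for candidate in candidates:
            try:
                return candidate(prompt), "sequential"
            except Exception as exc:
                last_error = exc
        raise RuntimeError("all candidates failed") from last_error
    # Judge configured: collect every answer, then let the judge pick one.
    # (Sequential here for brevity; this is where parallelism would go.)
    answers = [candidate(prompt) for candidate in candidates]
    return answers[judge(prompt, answers)], "judged"


# Toy stand-ins for model calls, for demonstration only.
def flaky(prompt: str) -> str:
    raise TimeoutError("model unavailable")


def local_model(prompt: str) -> str:
    return f"local:{prompt}"


def other_model(prompt: str) -> str:
    return f"other:{prompt}"


fallback_result = run_task("p", [flaky, local_model])
judged_result = run_task("p", [local_model, other_model], judge=lambda p, a: 1)
```

With no judge, the failing model is skipped and the local model's answer is returned. With a judge, both answers are collected and the judge's choice wins.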

See docs/proposals/ensemble-judge-refinement.md for full design + open questions.

Follows PR #393 (parallel racing proposal).

…tasks

Three-stage design per Simon's direction:
1. N non-reasoning candidates run in parallel
2. Reasoning model judges all N, scores + gap hints (short output)
3. Conditional retry with gap hints if below threshold

Applies highest-leverage to early tasks (PremiseAttack, RedlineGate, ProjectPlan) where quality propagates downstream. Extends existing LLMExecutor.max_validation_retries pattern to quality-based retries. Backward-compatible with local/Ollama setups.

neoneye merged commit 093bc1d into PlanExeOrg:main Mar 28, 2026
3 checks passed

neoneye deleted the proposal/ensemble-judge-refinement branch March 28, 2026 20:23

neoneye merged 1 commit into PlanExeOrg:main from VoynichLabs:proposal/ensemble-judge-refinement

3 participants

Conversation

82deutschmark commented Mar 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants