Port response-quality scoring and consensus fan-out to TypeScript#66

Merged
BillJr99 merged 3 commits into master from claude/pr-changes-pi-extension-wwEZP
May 24, 2026

@BillJr99
Owner

Summary

This PR ports the Python harness's response-quality scoring and stochastic consensus logic into the Pi extension as real, enforceable tools. Previously, orchestration policy lived only in the chief's prompt; now it's enforced in code so the chief cannot accidentally skip critical steps like parallel sampling, quality validation, or Ralph branch creation.

Key Changes

  • Response-quality scorer (src/quality.ts): TypeScript port of clk_harness/orchestration/response_quality.py. Detects empty responses, refusals, malformed blocks, low confidence, and missing declared outputs. Generates repair hints for recoverable failures so re-rolls fix specific issues rather than re-rolling at random.

  • Consensus and quality-dispatch primitives (src/consensus.ts):

    • dispatchWithQuality() — wraps a single subagent dispatch in an automatic quality re-dispatch loop (up to maxRetries attempts, each prefixed with a repair preamble)
    • runConsensus() — fans out N parallel tmux subagent samples, scores each, returns the highest-scoring winner plus all candidates for traceability
  • New orchestration tools (src/tools.ts):

    • clk_consensus — fans out N parallel samples (default 3, clamped to 1..6) for high-stakes decisions
    • clk_subagent_quality — single subagent with quality gate and automatic repair re-rolls
    • clk_autoresearch — bounded researcher + critic alternation (default 2 iterations)
    • clk_ralph — one-call Ralph iteration: creates branch, fans out consensus, returns winner (branch creation and fan-out happen in one step and cannot be skipped)
  • Git push integration (src/git.ts):

    • hasRemote() — check if a remote exists
    • commitsAhead() — count local commits not yet on upstream
    • pushBestEffort() — best-effort git push that never throws, returns success/failure with reason
    • pushIfEnabled() helper in tools.ts auto-pushes when CLK_GITHUB_PUSH_ON_COMMIT=true and surfaces an ↑N ahead count in the status bar
  • Updated chief primer (src/prompts.ts): New dispatch tool quick reference explaining when to use each tool (subagent vs. quality vs. consensus vs. autoresearch vs. ralph).

  • Comprehensive test coverage:

    • tests/quality.test.ts — unit tests for scoring logic (mirrors Python harness tests)
    • tests/consensus.test.ts — tests for quality-dispatch loop and consensus fan-out with injectable spawn function
    • tests/git.test.ts — tests for remote/push/ahead helpers
    • Updated tests/index.test.ts and tests/prompts.test.ts to verify new tools are registered
  • Documentation (pi-extension/README.md): Expanded with tool reference section, quality-scoring rules, and clarification that orchestration policy is now enforced in code, not just in the prompt.
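
The "never throws, returns success/failure with reason" contract of the git helpers listed above can be sketched as follows. This is a minimal illustration, not the code in src/git.ts: the `GitRunner` injection, the `PushResult` shape, and the reason strings are all assumptions made so the sketch runs offline. Only the function names and the best-effort behaviour come from the PR description.

```typescript
// Hypothetical result shape; the PR only says "success/failure with reason".
interface PushResult {
  ok: boolean;
  reason: string;
}

// Hypothetical injected runner: executes `git <args>` and returns stdout,
// or throws when git exits non-zero.
type GitRunner = (args: string[]) => string;

// True when `git remote` lists at least one remote.
function hasRemote(run: GitRunner): boolean {
  try {
    return run(["remote"]).trim().length > 0;
  } catch {
    return false;
  }
}

// Number of local commits not yet on the upstream branch; 0 on any failure.
function commitsAhead(run: GitRunner): number {
  try {
    const n = parseInt(run(["rev-list", "--count", "@{u}..HEAD"]).trim(), 10);
    return isNaN(n) ? 0 : n;
  } catch {
    return 0;
  }
}

// Best-effort push: every failure is folded into the result, never thrown.
function pushBestEffort(run: GitRunner): PushResult {
  if (!hasRemote(run)) {
    return { ok: false, reason: "no remote configured" };
  }
  try {
    run(["push"]);
    return { ok: true, reason: "pushed" };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, reason };
  }
}
```

Because failures come back as data, a caller such as pushIfEnabled() can show the ahead count without wrapping the call in its own try/catch.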

Notable Implementation Details

  • Quality scorer uses pure regex/string operations (no I/O) so it's fast and testable without mocking.
  • Repair hints quote every failure reason back to the worker so it understands what to fix.
  • Consensus fan-out uses a configurable maxParallel to limit concurrent tmux sessions (default min(4, samples)).
  • All push operations are best-effort and never block tool results on git bookkeeping failures.
  • The chief can no longer accidentally skip Ralph branch creation or consensus fan-out by misreading the prompt — the tools enforce the shape.
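
The fan-out described above (N samples, at most maxParallel in flight, highest score wins, all candidates returned) can be sketched as a small worker pool. This is a simplified illustration under stated assumptions: the `Spawn` and `Scorer` signatures, the `Candidate` shape, and scoring failed samples as 0 are hypothetical. The defaults (3 samples, clamped to 1..6, maxParallel of min(4, samples)) are the ones the PR states.

```typescript
// Hypothetical shapes for illustration; not the real consensus.ts types.
interface Candidate {
  index: number;
  text: string;
  score: number;
  error?: string;
}
type Spawn = (task: string, index: number) => Promise<string>;
type Scorer = (text: string) => number;

// Default 3, clamped to 1..6 as stated for clk_consensus.
function clampSamples(n?: number): number {
  const v = n === undefined || !isFinite(n) ? 3 : Math.floor(n);
  return Math.min(6, Math.max(1, v));
}

async function runConsensus(
  task: string,
  spawn: Spawn,
  score: Scorer,
  opts: { samples?: number; maxParallel?: number } = {},
): Promise<{ winner: Candidate; candidates: Candidate[] }> {
  const samples = clampSamples(opts.samples);
  const maxParallel = Math.max(1, opts.maxParallel ?? Math.min(4, samples));
  const candidates = new Array<Candidate>(samples);
  let next = 0;

  // Each worker pulls the next unclaimed sample index until none remain,
  // so at most `maxParallel` spawns are in flight at any time.
  async function worker(): Promise<void> {
    while (next < samples) {
      const i = next++;
      try {
        const text = await spawn(task, i);
        candidates[i] = { index: i, text, score: score(text) };
      } catch (err) {
        // Assumption: a failed sample is kept for traceability with score 0.
        candidates[i] = { index: i, text: "", score: 0, error: String(err) };
      }
    }
  }

  const workers = Array.from({ length: Math.min(maxParallel, samples) }, () => worker());
  await Promise.all(workers);

  // Highest score wins; ties go to the earliest sample.
  const winner = candidates.reduce((best, c) => (c.score > best.score ? c : best));
  return { winner, candidates };
}
```

Injecting `spawn` is what lets a test drive the pool without tmux: a fake that counts in-flight calls can assert the concurrency cap directly.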

https://claude.ai/code/session_012nKhcka2fhuazbVbhQpRm1

claude added 3 commits May 24, 2026 16:19
…arness

Two of the recent CLK harness PRs have a direct parallel in pi-extension:

* push-on-commit + ahead counter (756723c). pi-extension already commits
  every clk_checkpoint / clk_merge call, but never pushes — a
  remote-backed Pi workspace silently accumulated local commits.
    - src/git.ts: hasRemote, commitsAhead, pushBestEffort (best-effort,
      never throws; mirrors clk_harness/git_ops.py).
    - src/tools.ts: pushIfEnabled helper called after clk_checkpoint and
      clk_merge. Gated on CLK_GITHUB_PUSH_ON_COMMIT=true to match the
      Python TUI; surfaces an ↑N ahead count on push failure or when
      auto-push is disabled but commits exist.
    - src/index.ts: /clk-doctor now reports the ahead count and warns
      when local commits haven't reached origin.

* multi-line objective truncation (24f379b). idea.slice(0, 60) was being
  done before splitting on newlines, so a multi-line idea could leak a
  fragment of line 2 into the status bar.
    - src/index.ts: new firstLineShort helper, used at every
      ctx.ui.setStatus("clk-idea", …) site and in /clk-doctor.

Tests: tests/git.test.ts covers no-remote/sync/unreachable cases for
pushBestEffort and commitsAhead. tests/index.test.ts asserts
firstLineShort returns single-line, capped output for multi-line input.
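
The truncation fix above comes down to isolating the first line before capping its length. A minimal sketch, assuming a helper that takes the text and a cap defaulting to 60 (the value from the old idea.slice(0, 60) call); the exact signature and the whitespace trimming are assumptions, not taken from src/index.ts:

```typescript
// Return the first line of `text`, trimmed and capped at `max` characters.
// Splitting happens before slicing, so nothing from line 2 can leak through.
function firstLineShort(text: string, max: number = 60): string {
  const firstLine = text.split(/\r?\n/, 1)[0] ?? "";
  return firstLine.trim().slice(0, max);
}

// A bare slice keeps the newline and part of line 2 when line 1 is short.
const idea = "Build a CLI\nwith a plugin system";
const leaky = idea.slice(0, 60); // still contains "\nwith a plugin system"
const safe = firstLineShort(idea); // "Build a CLI"
```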
… Ralph

Ports the Python harness's orchestration loops into the TypeScript
extension so the chief drives real, code-enforced fan-out instead of
emitting parallel clk_subagent calls itself, with nothing to verify
that it followed the prompt.

src/quality.ts (new)
  Port of clk_harness/orchestration/response_quality.py. Pure regex /
  string scorer — no I/O, no provider calls. Detects empty bodies,
  refusal phrases, malformed ACTION / POST blocks, missing declared
  POST PRODUCES keys, low CONFIDENCE: <n> values, and NEEDS_REVIEW:
  true. Exposes scoreResponse, repairHint, isRecoverable, summarise.
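
A pure regex / string scorer of this kind can be sketched in a few lines. The version below covers only four of the listed checks (empty body, refusal phrases, low CONFIDENCE: <n>, NEEDS_REVIEW: true) and omits the ACTION / POST block checks. The flag names, the refusal phrases, the 0..1 confidence scale, the 0.5 threshold, and the hint wording are all assumptions for illustration; only the function names and the idea that refusals are non-recoverable come from this PR.

```typescript
// Hypothetical result shape; the real quality.ts may differ.
interface QualityResult {
  ok: boolean;
  flags: string[];
}

// Assumed refusal phrases; matches "I can't help with" / "I cannot help with".
const REFUSAL_PATTERNS: RegExp[] = [
  /\bI can(?:'|no)t help with\b/i,
  /\bI(?:'m| am) unable to\b/i,
];
const CONFIDENCE_RE = /^CONFIDENCE:\s*([0-9]*\.?[0-9]+)\s*$/m;
const NEEDS_REVIEW_RE = /^NEEDS_REVIEW:\s*true\s*$/im;

// Pure function: no I/O, so it can be tested without mocks.
function scoreResponse(body: string, minConfidence: number = 0.5): QualityResult {
  const flags: string[] = [];
  if (body.trim().length === 0) flags.push("empty");
  if (REFUSAL_PATTERNS.some((re) => re.test(body))) flags.push("refusal");
  const m = CONFIDENCE_RE.exec(body);
  if (m !== null && Number(m[1]) < minConfidence) flags.push("low_confidence");
  if (NEEDS_REVIEW_RE.test(body)) flags.push("needs_review");
  return { ok: flags.length === 0, flags };
}

// A failed response is worth re-rolling unless the worker refused outright.
function isRecoverable(result: QualityResult): boolean {
  return !result.ok && !result.flags.some((f) => f === "refusal");
}

// Quote every failure reason back to the worker.
function repairHint(result: QualityResult): string {
  if (result.ok) return "";
  const lines = result.flags.map((f) => `- ${f}`);
  return ["Your previous response was rejected for:", ...lines].join("\n");
}
```

None of the patterns use the global flag, so repeated `test` calls carry no state between responses.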

src/consensus.ts (new)
  Two primitives, both with an injectable spawn function so tests can
  drive them without tmux / pi installed:
    * dispatchWithQuality — wraps a single spawnSubagent in the
      quality re-dispatch loop. Re-runs with a repair-preamble
      preface on every recoverable failure up to maxRetries.
    * runConsensus — fan-out N parallel tmux samples for the same
      task, score each, return all + the winner. Pool runner caps
      concurrent in-flight sessions via maxParallel.
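
The re-dispatch loop behind dispatchWithQuality can be sketched as below. It is an illustration under assumptions, not the code in src/consensus.ts: the `Spawn`, `Score`, and `Verdict` types are hypothetical, the preamble wording is invented, and maxRetries is assumed to count re-dispatches after the first attempt (the PR text does not say whether the first attempt is included).

```typescript
// Hypothetical types for illustration.
interface Verdict {
  ok: boolean;
  recoverable: boolean;
  reasons: string[];
}
type Spawn = (prompt: string) => Promise<string>;
type Score = (text: string) => Verdict;

interface DispatchResult {
  text: string;
  attempts: number;
  verdict: Verdict;
}

async function dispatchWithQuality(
  task: string,
  spawn: Spawn,
  score: Score,
  maxRetries: number = 2,
): Promise<DispatchResult> {
  let prompt = task;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    const text = await spawn(prompt);
    const verdict = score(text);
    const retriesUsed = attempts - 1;
    // Stop on success, on a non-recoverable failure, or when retries run out.
    if (verdict.ok || !verdict.recoverable || retriesUsed >= maxRetries) {
      return { text, attempts, verdict };
    }
    // Prefix the original task with a preamble quoting each failure reason,
    // so the re-roll targets specific problems.
    const reasons = verdict.reasons.map((r) => `- ${r}`).join("\n");
    prompt = `Your previous response was rejected for:\n${reasons}\n\n${task}`;
  }
}
```

The preamble is rebuilt from the latest verdict on each pass rather than accumulated, which keeps the prompt from growing with every retry.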

src/subagent.ts
  Exposes spawnSubagent + SpawnOptions so consensus.ts can call them.
  Behaviour unchanged.

src/tools.ts (+428 LOC)
  Four new tools registered alongside the existing roster:
    * clk_subagent_quality — one subagent + quality re-rolls.
    * clk_consensus       — N samples, scored, winner returned.
    * clk_autoresearch    — researcher + critic alternation
                            (iterations are recorded on progress.md).
    * clk_ralph           — branch + consensus fan-out in one call;
                            the chief then calls clk_merge or
                            clk_revert based on validation.
  Each tool surfaces a structured details payload so the chief sees
  scores, attempts, and flags rather than just the winning text.

src/prompts.ts
  Updated chief primer to direct the chief through the new tools
  (Dispatch tool quick reference, restated rules 3, 4, 5A). The old
  "emit 3-5 clk_subagent calls in the same message" guidance is
  replaced by "call clk_consensus" so fan-out is enforced in code,
  not by chief compliance.

src/index.ts
  /clk-help lists every orchestration tool and notes the
  CLK_GITHUB_PUSH_ON_COMMIT auto-push behaviour that landed in the
  prior commit.

Tests: 24 new tests across quality.test.ts (happy paths, every failure
mode, repairHint / isRecoverable / summarise) and consensus.test.ts
(injected spawn covers ok / retry / max-retries / non-recoverable
refusal / fan-out winner picking / sample clamping / error capture /
maxParallel concurrency). index.test.ts and prompts.test.ts updated to
assert the new tools are registered and named in the chief primer.
All 94 tests pass, typecheck clean.
…, doctor

Updates both READMEs to reflect the orchestration work that just landed
in pi-extension and the recent main-line PRs (push-on-commit, doctor /
diag CLI, multi-line truncation fix) that already shipped to master but
weren't fully cross-referenced.

pi-extension/README.md (full rewrite, +293 net lines)
  * Replaces the "8 small tools" narrative with a proper Tool Reference
    that groups roster / dispatch / iterative-refinement and explains
    when to pick clk_subagent vs clk_subagent_quality vs clk_consensus
    vs clk_autoresearch vs clk_ralph.
  * New "Response-quality scoring" section listing every flag the
    detector raises and how the repair-preamble loop quotes them back
    to the worker. Cross-references the Python harness's
    response_quality.py so behaviour drift between the two
    implementations is one diff away from being noticed.
  * New "Auto-push (opt-in)" section covering CLK_GITHUB_PUSH_ON_COMMIT,
    the ↑N ahead counter, and the pre-push secret-scanner interaction.
  * Commands table extended with /clk-help, /clk-doctor, /clk-undo
    (these existed in the code but the README only listed /clk and
    /clk-abort).
  * "What you keep / what changes" tables rewritten: stochastic
    consensus, quality re-dispatch, and Ralph refinement are now
    described as code-enforced (not chief-compliance dependent), and
    the comparison row about robustness loops names the new tools as
    the per-call equivalents of the Python harness's
    clk.config.json::robustness.* knobs.
  * Repository layout updated with src/quality.ts, src/consensus.ts,
    the new test files, and explicit per-file purposes.
  * "Testing" section reflects the real 96-test count and notes the
    suite runs entirely offline (consensus tests inject a fake spawn).

README.md (main) — targeted updates
  * Pi extension section: brief but accurate rundown of the new
    orchestration tools, a Commands table that matches /clk-help, the
    CLK_GITHUB_PUSH_ON_COMMIT env var, and an updated example
    transcript that uses clk_consensus / clk_autoresearch / clk_ralph
    by name rather than the "fans out to 3 subagents" abstraction.
  * Layout section: pi-extension/ subtree expanded to show every src/
    file with a one-line purpose, including the new quality.ts and
    consensus.ts.
  * Testing section: pi-extension test count corrected from 53 to 96
    (~1s → ~2s), and the per-suite description rewritten to name the
    new modules (quality / consensus / git auto-push helpers /
    firstLineShort) so a contributor browsing the README knows what
    is and isn't covered.
@BillJr99 BillJr99 merged commit 66d8ee6 into master May 24, 2026
6 checks passed

2 participants