Multi-turn + rich-iteration agent_input for Clone MCP by Turtle-Hwan · Pull Request #2 · cloneisyou/clone-loop

Turtle-Hwan · 2026-05-12T06:54:28Z

Summary

Three things land in this PR, each measured against the live Clone MCP endpoint:

- Multi-turn agent_input: the original prompt plus every prior Clone-injected user turn (predictions + auto-answered AskUserQuestion Q/A) reconstructed from clone-loop.history.local.jsonl. User-turn window capped at 20.
- Rich current-iteration footer: what Claude did this iteration is now sent as a chronological timeline of text + [tool_use] <Name>: <args> + [tool_result <Name>]: ... blocks (long tool results summarized head + tail) instead of a single concatenated text blob.
- Prior-iteration timelines: under each prior ### user (clone-prediction): marker, Clone now also sees the per-iter assistant timeline that produced it. Combined prior-iter payload is char-capped; oldest iters drop first when over budget.
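
As a rough illustration of the rich footer described above, the sketch below renders a list of blocks into the `[tool_use]` / `[tool_result]` timeline format and trims long tool output to its head and tail. The block shape (`type`, `name`, `input`, `content`), the function names, and the `HEAD_LINES` / `TAIL_LINES` values are assumptions made for this example; the PR only states that head/tail constants exist, not their values.

```javascript
// Hypothetical values: the PR mentions HEAD/TAIL constants but not their sizes.
const HEAD_LINES = 3;
const TAIL_LINES = 2;

// Keep the first and last few lines of a long tool result and note the gap.
function summarizeHeadTail(text, head = HEAD_LINES, tail = TAIL_LINES) {
  const lines = text.split("\n");
  if (lines.length <= head + tail) return text; // short output passes through
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `... (${omitted} lines omitted) ...`,
    ...lines.slice(lines.length - tail),
  ].join("\n");
}

// Render one iteration's blocks in the order they occurred.
// Assumed block shape: { type, text?, name?, input?, content? }.
function renderTimeline(blocks) {
  return blocks
    .map((b) => {
      if (b.type === "text") return b.text;
      if (b.type === "tool_use") {
        return `[tool_use] ${b.name}: ${JSON.stringify(b.input)}`;
      }
      if (b.type === "tool_result") {
        return `[tool_result ${b.name}]: ${summarizeHeadTail(b.content)}`;
      }
      return ""; // unknown block types are skipped
    })
    .filter(Boolean)
    .join("\n");
}
```

Keeping both ends of a long result is a common compromise: the head usually shows what was read or run, and the tail usually carries the exit status or final lines.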

Why it matters (measured)

Live 5x A/B/C against https://api.clone.is/mcp with the demo token, identical scenario:

| variant | input chars | confidence mean | confidence median | auto (>=0.8) hit | hallucinated next-step |
|---|---|---|---|---|---|
| A (text-only baseline) | 830 | 0.7385 | 0.7275 | 0/5 | 60% (suggested PUT when only PATCH existed) |
| B (rich, this iter only) | 1623 | 0.7647 | 0.7825 | 1/5 | 60% (same PUT hallucination; this-iter Read alone wasn't enough) |
| C (rich + prior iters) | 3215 | 0.7757 | 0.7825 | 1/5 | 0% |

Prior-iter context erased the PUT hallucination entirely; every C run predicted PATCH/DELETE coverage correctly. Predictions also got longer and more specific (avg 71 chars vs 56).

Test plan

pnpm test -> 23/23 passing, including new rich-iteration cases.
Live A/B/C smoke via node scripts/manual-e2e-compare3.mjs -> documented above.
Reviewer: confirm PRIOR_ITER_TOTAL_CHAR_CAP = 12000 is the right budget. Combined with the 20-turn user window and 1200-char per-block cap, worst case is roughly 14-18K chars (~4-5K tokens) per prediction.
Reviewer: verify the README rewrite still answers the basics for new users.
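
For the reviewer weighing the PRIOR_ITER_TOTAL_CHAR_CAP budget, here is a minimal sketch of one way "oldest iters drop first" could work. Only the constant name and its value of 12000 come from this PR; the `capPriorTimelines` name and the `{ iteration, text }` shape are hypothetical.

```javascript
const PRIOR_ITER_TOTAL_CHAR_CAP = 12000; // value named in this PR

// timelines: array ordered oldest first; assumed shape { iteration, text }.
// Returns the newest timelines whose combined length fits within the cap.
function capPriorTimelines(timelines, cap = PRIOR_ITER_TOTAL_CHAR_CAP) {
  const kept = [];
  let total = 0;
  // Walk newest to oldest so the most recent iterations survive.
  for (let i = timelines.length - 1; i >= 0; i--) {
    const size = timelines[i].text.length;
    if (total + size > cap) break; // drop this iteration and everything older
    kept.unshift(timelines[i]);
    total += size;
  }
  return kept;
}
```

Dropping an iteration whole, rather than truncating it mid-timeline, avoids sending a tool_use without its matching tool_result.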

What changed

scripts/conversation-context.mjs — new helpers (iterationBlocksThisIteration, loadIterationBoundaries, iterationTimelinesByBoundary, formatIterationBlocks) and extended formatConversationHistory({iterationBlocks, priorIterTimelines, ...}).
hooks/stop-hook.mjs, hooks/ask-user-question-hook.mjs — wired to feed transcript-derived timelines.
tests/stop-hook-v2.test.mjs — 2 new cases for tool_use/tool_result and head/tail summarization.
scripts/manual-e2e-compare.mjs, scripts/manual-e2e-compare3.mjs — live A/B and A/B/C probes.
scripts/manual-e2e-multiturn.mjs — updated to use the rich builder.
README.md — full rewrite for clarity and adoption (single canonical install block, 5-step How It Works, 3-bullet "What Clone actually sees").

🤖 Generated with Claude Code

Stop hook and AskUserQuestion hook now build agent_input from:
- the original 1-turn user prompt (always preserved)
- chronological Clone-injected user turns (predictions + auto-answers) reconstructed from clone-loop.history.local.jsonl
- all assistant text blocks emitted during the current iteration, extracted from the transcript via timestamp filtering

Capped at HISTORY_WINDOW_TURNS = 20 to prevent token blow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
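
The windowing in this commit might look something like the sketch below: the original prompt is always kept, and only the most recent Clone-injected turns from the JSONL history are included. The entry fields (`kind`, `text`) and both function names are assumptions for illustration; only HISTORY_WINDOW_TURNS = 20 comes from the commit message. The sketch takes the file contents as a string and assumes the history is append-only, so file order is chronological order.

```javascript
const HISTORY_WINDOW_TURNS = 20; // value named in the commit message

// Parse JSONL text, skipping blank or malformed lines.
function parseJsonl(text) {
  const entries = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // A partially written line should not break the hook.
    }
  }
  return entries;
}

// Build the user-turn list: original prompt first, then the last N injected turns.
// Assumed entry shape: { kind: "clone-prediction" | "auto-answer", text: string }.
function buildUserTurns(originalPrompt, historyText, windowSize = HISTORY_WINDOW_TURNS) {
  const injected = parseJsonl(historyText).filter((e) => typeof e.text === "string");
  // slice(-0) would return the whole array, so guard the zero case explicitly.
  const windowed = windowSize > 0 ? injected.slice(-windowSize) : [];
  return [
    { source: "original", text: originalPrompt },
    ...windowed.map((e) => ({ source: e.kind || "clone-prediction", text: e.text })),
  ];
}
```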

The previous formatConversationHistory rendered every current-iteration assistant text twice: once inside the windowed history list and once in the dedicated "assistant (current iter N)" footer. That doubled the token cost and risked confusing the predictor.

Window only the Clone-injected user turns; render this-iteration assistant text exactly once in the footer block.

Verified against the live Clone MCP endpoint: prediction reasoning now cites prior turns correctly without the duplicate context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Removed test cases that asserted the contents of static config files (plugin.json, hooks.json, command markdown frontmatter) without exercising real code paths. Removed redundant token-precedence cases that ran in three different files, keeping the canonical pair in stop-hook-v2 and api-key-manager. Removed the live-MCP smoke (tests/remote-mcp-connect.test.mjs) since the e2e suite already exercises the same RPC surface, and removed the e2e invalid-args case since malformed-input handling is the MCP server's concern, not the plugin's.

Added scripts/manual-e2e-multiturn.mjs as a manual debugging tool that exercises the real conversation-context builder against the live Clone MCP endpoint.

pnpm test: 43 -> 21 cases, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

conversation-context.mjs gains:
- iterationBlocksThisIteration: pulls assistant text + tool_use + tool_result blocks for the current iteration window with summarized multi-line tool output (HEAD/TAIL constants).
- loadIterationBoundaries + iterationTimelinesByBoundary: split the transcript into per-iteration timelines using loop-history JSONL ts boundaries.
- formatConversationHistory: accept iterationBlocks for the current-iter footer and priorIterTimelines for per-iter assistant blocks rendered under each user (clone-prediction) marker, with a total-char cap that drops oldest iters whole when exceeded.

Live A/B/C against Clone MCP showed prior-iter timelines eliminate hallucinated next-step suggestions (e.g. "PUT /todos" when the codebase only has PATCH) and lift mean confidence from 0.74 -> 0.78.

Stop-hook rich-iteration cases added to tests/stop-hook-v2.test.mjs:
- includes tool_use and tool_result blocks
- summarizes long tool_result content with head and tail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
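
To make the boundary split in this commit concrete, here is a hedged sketch of bucketing transcript entries by timestamp boundaries. It is not the code in conversation-context.mjs: the `splitByBoundaries` name is hypothetical, and it assumes every `ts` is a number (for example epoch milliseconds) and that boundaries are sorted ascending, with each one marking the end of an iteration.

```javascript
// entries: transcript entries with a numeric `ts` (assumption).
// boundaries: ascending numeric timestamps, one per completed iteration (assumption).
// Entries at or before boundary i (and after boundary i-1) belong to iteration i;
// anything after the last boundary belongs to the current iteration.
function splitByBoundaries(entries, boundaries) {
  const prior = boundaries.map(() => []);
  const current = [];
  for (const entry of entries) {
    const idx = boundaries.findIndex((b) => entry.ts <= b);
    if (idx === -1) {
      current.push(entry);
    } else {
      prior[idx].push(entry);
    }
  }
  return { prior, current };
}
```

With `boundaries = [100, 200]`, an entry at `ts = 150` lands in the second prior iteration and an entry at `ts = 250` lands in the current one.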

manual-e2e-compare.mjs runs baseline (text-only) vs rich (this-iter text + tool_use + tool_result) against the live Clone MCP endpoint once each and prints both payloads + predictions side by side.

manual-e2e-compare3.mjs extends that to three variants (adds "richer-all-iters" which inserts prior-iteration timelines under each clone-prediction marker and widens tool_result summaries) and aggregates results across COMPARE_RUNS calls per variant. Used to validate that prior-iter timelines lift mean confidence and remove hallucinated next-step suggestions.

Neither script is part of pnpm test; both call the live MCP endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
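
The per-variant aggregation could be as simple as the sketch below, which computes the mean, median, and count of runs at or above the auto threshold. The `aggregate` function is a hypothetical stand-in, not the script's actual code, and the confidences in the usage line are made-up placeholders rather than measured results; the 0.8 threshold is taken from the "auto (>=0.8)" column in the PR description.

```javascript
const AUTO_THRESHOLD = 0.8; // from the "auto (>=0.8) hit" column

// Summarize the confidence scores collected for one variant.
function aggregate(confidences) {
  if (confidences.length === 0) {
    throw new Error("need at least one run to aggregate");
  }
  const sorted = [...confidences].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const mean = sorted.reduce((sum, c) => sum + c, 0) / sorted.length;
  const autoHits = sorted.filter((c) => c >= AUTO_THRESHOLD).length;
  return { runs: sorted.length, mean, median, autoHits };
}

// Placeholder scores for illustration only.
console.log(aggregate([0.7, 0.75, 0.8, 0.72, 0.78]));
```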

Trim the wall-of-text in How It Works to 5 numbered steps. Replace the two redundant Conversation context window paragraphs with a single 3-bullet "What Clone actually sees" section that names the new rich-iteration timeline (text + tool_use + tool_result) and prior-iter context contribution without burying the reader in constants.

Open with a one-line value proposition and a "Why people use it" block so the README sells the loop before describing internals.

Consolidate the API key, installation, and update sections so each only appears once. Drop the duplicated install snippets between Quick Install, Installation Commands, macOS/Linux, and Windows PowerShell; there's now one canonical install block at the top and an Installing & updating section near the bottom for the longer flows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Turtle-Hwan and others added 6 commits May 12, 2026 15:33

Turtle-Hwan changed the title ~~Send multi-turn conversation context to Clone MCP~~ Multi-turn + rich-iteration agent_input for Clone MCP May 12, 2026

Turtle-Hwan merged commit a5f9013 into main May 12, 2026

Turtle-Hwan mentioned this pull request May 12, 2026

Hotfix: unbreak release workflow + tighten README + restore Ralph vs Clone #3

Merged

2 tasks

Turtle-Hwan deleted the feat/multi-turn-mcp-context branch May 12, 2026 16:46

Turtle-Hwan merged 6 commits into main from feat/multi-turn-mcp-context
