
Multi-turn + rich-iteration agent_input for Clone MCP #2

Merged
Turtle-Hwan merged 6 commits into main from feat/multi-turn-mcp-context on May 12, 2026
@Turtle-Hwan Turtle-Hwan commented May 12, 2026

Summary

Three things land in this PR, each measured against the live Clone MCP endpoint:

  1. Multi-turn agent_input — the original prompt plus every prior Clone-injected user turn (predictions + auto-answered AskUserQuestion Q/A) reconstructed from clone-loop.history.local.jsonl. User-turn window capped at 20.
  2. Rich current-iteration footer — what Claude did this iteration is now sent as a chronological timeline of text + [tool_use] <Name>: <args> + [tool_result <Name>]: ... blocks (long tool results summarized head + tail) instead of a single concatenated text blob.
  3. Prior-iteration timelines — under each prior ### user (clone-prediction): marker, Clone now also sees the per-iter assistant timeline that produced it. Combined prior-iter payload is char-capped; oldest iters drop first when over budget.
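To make the footer format in item 2 concrete, here is a minimal sketch of how such a timeline could be rendered. The function names (`summarizeHeadTail`, `renderTimeline`), the block shape, and the head/tail line counts are assumptions for illustration; this is not the PR's `formatIterationBlocks` implementation, and it assumes tool results arrive as plain strings.

```javascript
// Hypothetical limits: the PR mentions HEAD/TAIL constants but not their values.
const HEAD_LINES = 3;
const TAIL_LINES = 2;

// Keep the first and last few lines of a long tool result, noting what was cut.
function summarizeHeadTail(text, head = HEAD_LINES, tail = TAIL_LINES) {
  const lines = text.split("\n");
  if (lines.length <= head + tail) return text;
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `... (${omitted} lines omitted) ...`,
    ...lines.slice(lines.length - tail),
  ].join("\n");
}

// Render blocks in the order they occurred (assumed shape: { type, name, input, content, text }).
function renderTimeline(blocks) {
  return blocks
    .map((b) => {
      if (b.type === "tool_use") {
        return `[tool_use] ${b.name}: ${JSON.stringify(b.input)}`;
      }
      if (b.type === "tool_result") {
        return `[tool_result ${b.name}]: ${summarizeHeadTail(b.content)}`;
      }
      return b.text; // plain assistant text
    })
    .join("\n");
}
```

Keeping both the head and the tail of a long result preserves the start of a file listing and the final status line of a command, which is often where the outcome is reported.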

Why it matters (measured)

Five live runs per variant (A/B/C) against https://api.clone.is/mcp with the demo token, using an identical scenario:

| Variant | Input chars | Confidence mean | Confidence median | Auto (>=0.8) hit | Hallucinated next-step |
|---|---|---|---|---|---|
| A (text-only baseline) | 830 | 0.7385 | 0.7275 | 0/5 | 60% (suggested PUT when only PATCH existed) |
| B (rich, this iter only) | 1623 | 0.7647 | 0.7825 | 1/5 | 60% (same PUT hallucination; this-iter Read alone wasn't enough) |
| C (rich + prior iters) | 3215 | 0.7757 | 0.7825 | 1/5 | 0% |

Prior-iter context erased the PUT hallucination entirely; every C run predicted PATCH/DELETE coverage correctly. Predictions also got longer and more specific (avg 71 chars vs 56).
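For readers who want to reproduce the table's columns, the following is a rough sketch of how per-variant metrics could be aggregated from individual runs. The run shape (`{ confidence, hallucinated }`) and the function names are hypothetical; the 0.8 threshold comes from the "Auto (>=0.8) hit" column. It is not taken from `manual-e2e-compare3.mjs`.

```javascript
// Threshold taken from the table header "auto (>=0.8) hit".
const AUTO_THRESHOLD = 0.8;

function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function median(xs) {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// runs: [{ confidence: number, hallucinated: boolean }] (assumed shape)
function aggregateRuns(runs) {
  const conf = runs.map((r) => r.confidence);
  return {
    mean: mean(conf),
    median: median(conf),
    autoHits: conf.filter((c) => c >= AUTO_THRESHOLD).length,
    hallucinationRate: runs.filter((r) => r.hallucinated).length / runs.length,
  };
}
```

With only five runs per variant, a single run moves the hit rate by 20 points, so the mean and median are worth reading together.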

Test plan

  • pnpm test -> 23/23 passing, including new rich-iteration cases.
  • Live A/B/C smoke via node scripts/manual-e2e-compare3.mjs -> documented above.
  • Reviewer: confirm PRIOR_ITER_TOTAL_CHAR_CAP = 12000 is the right budget. Combined with the 20-turn user window and 1200-char per-block cap, worst case is roughly 14-18K chars (~4-5K tokens) per prediction.
  • Reviewer: verify the README rewrite still answers the basics for new users.
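As an aid for reviewing the budget question, here is a minimal sketch of a total-char cap that drops the oldest iterations whole. The function name `capPriorIterTimelines` and the representation of each timeline as a single string are assumptions; only the constant's name and value come from this PR.

```javascript
const PRIOR_ITER_TOTAL_CHAR_CAP = 12000; // value quoted in the test plan

// timelines: array of strings, oldest first.
// Returns the newest contiguous run of timelines whose combined length fits the cap.
function capPriorIterTimelines(timelines, cap = PRIOR_ITER_TOTAL_CHAR_CAP) {
  const kept = [];
  let total = 0;
  // Walk newest to oldest and stop at the first timeline that would overflow,
  // so an iteration is either kept whole or dropped whole.
  for (let i = timelines.length - 1; i >= 0; i--) {
    const len = timelines[i].length;
    if (total + len > cap) break;
    kept.unshift(timelines[i]);
    total += len;
  }
  return kept;
}
```

One consequence of this design is that a single newest timeline longer than the cap yields an empty result; whether the real code truncates instead is not stated in the PR.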

What changed

  • scripts/conversation-context.mjs — new helpers (iterationBlocksThisIteration, loadIterationBoundaries, iterationTimelinesByBoundary, formatIterationBlocks) and extended formatConversationHistory({iterationBlocks, priorIterTimelines, ...}).
  • hooks/stop-hook.mjs, hooks/ask-user-question-hook.mjs — wired to feed transcript-derived timelines.
  • tests/stop-hook-v2.test.mjs — 2 new cases for tool_use/tool_result and head/tail summarization.
  • scripts/manual-e2e-compare.mjs, scripts/manual-e2e-compare3.mjs — live A/B and A/B/C probes.
  • scripts/manual-e2e-multiturn.mjs — updated to use the rich builder.
  • README.md — full rewrite for clarity and adoption (single canonical install block, 5-step How It Works, 3-bullet "What Clone actually sees").

🤖 Generated with Claude Code

Turtle-Hwan and others added 6 commits May 12, 2026 15:33
Stop hook and AskUserQuestion hook now build agent_input from:
- the original 1-turn user prompt (always preserved)
- chronological Clone-injected user turns (predictions + auto-answers)
  reconstructed from clone-loop.history.local.jsonl
- all assistant text blocks emitted during the current iteration,
  extracted from the transcript via timestamp filtering
Capped at HISTORY_WINDOW_TURNS = 20 to prevent token blow-up.
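The windowing described above can be sketched roughly as follows. The history entry shape (`{ kind, text }`), the helper names, and the assumption that the JSONL file is append-only and already chronological are all hypothetical; only `HISTORY_WINDOW_TURNS = 20` and the rule that the original prompt is always preserved come from this commit message.

```javascript
const HISTORY_WINDOW_TURNS = 20; // cap named in the commit message

// Parse JSONL text, skipping blank or malformed lines (e.g. a partial write).
function parseJsonl(text) {
  const out = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      out.push(JSON.parse(line));
    } catch {
      // ignore lines that are not valid JSON
    }
  }
  return out;
}

// Build the user-turn list: original prompt first, then the most recent
// Clone-injected turns, limited to windowSize entries.
function buildUserTurns(originalPrompt, historyText, windowSize = HISTORY_WINDOW_TURNS) {
  const injected = parseJsonl(historyText)
    .filter((e) => typeof e.text === "string")
    .slice(-windowSize);
  return [
    { role: "user", source: "original", text: originalPrompt },
    ...injected.map((e) => ({ role: "user", source: e.kind, text: e.text })),
  ];
}
```

The original prompt sits outside the window, so it survives no matter how many injected turns accumulate.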

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous formatConversationHistory rendered every current-iteration
assistant text twice: once inside the windowed history list and once in
the dedicated "assistant (current iter N)" footer. That doubled the
token cost and risked confusing the predictor.

Window only the Clone-injected user turns; render this-iteration
assistant text exactly once in the footer block. Verified against the
live Clone MCP endpoint: prediction reasoning now cites prior turns
correctly without the duplicate context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Removed test cases that asserted the contents of static config files
(plugin.json, hooks.json, command markdown frontmatter) without
exercising real code paths. Removed redundant token-precedence cases
that ran in three different files, keeping the canonical pair in
stop-hook-v2 and api-key-manager. Removed the live-MCP smoke
(tests/remote-mcp-connect.test.mjs) since the e2e suite already
exercises the same RPC surface, and removed the e2e invalid-args case
since malformed-input handling is the MCP server's concern, not the
plugin's.

Added scripts/manual-e2e-multiturn.mjs as a manual debugging tool that
exercises the real conversation-context builder against the live Clone
MCP endpoint.

pnpm test: 43 -> 21 cases, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

conversation-context.mjs gains:
- iterationBlocksThisIteration: pulls assistant text + tool_use +
  tool_result blocks for the current iteration window with summarized
  multi-line tool output (HEAD/TAIL constants).
- loadIterationBoundaries + iterationTimelinesByBoundary: split the
  transcript into per-iteration timelines using loop-history JSONL
  ts boundaries.
- formatConversationHistory: accept iterationBlocks for the current-iter
  footer and priorIterTimelines for per-iter assistant blocks rendered
  under each user (clone-prediction) marker, with a total-char cap that
  drops oldest iters whole when exceeded.
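The boundary-based split can be pictured with a small sketch. It assumes numeric millisecond timestamps and treats each boundary as the inclusive end of an iteration; the function name `splitByBoundaries` and both of those choices are illustrative, since the actual semantics of `loadIterationBoundaries` and `iterationTimelinesByBoundary` are not shown here.

```javascript
// boundaries: ascending timestamps (ms), one per completed iteration,
//             e.g. when each prediction was written to the loop history.
// entries:    transcript items shaped like { ts, ... } (assumed shape).
function splitByBoundaries(entries, boundaries) {
  const prior = boundaries.map(() => []);
  const current = [];
  for (const e of entries) {
    // First boundary at or after the entry decides which iteration owns it.
    const i = boundaries.findIndex((b) => e.ts <= b);
    if (i === -1) current.push(e); // newer than every boundary: current iteration
    else prior[i].push(e);
  }
  return { prior, current };
}
```

Entries newer than the last boundary fall into `current`, which corresponds to the current-iter footer, while each `prior[i]` would be rendered under its matching clone-prediction marker.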

Live A/B/C against Clone MCP showed prior-iter timelines eliminate
hallucinated next-step suggestions (e.g. "PUT /todos" when the codebase
only has PATCH) and lift mean confidence from 0.74 -> 0.78.

Stop-hook rich-iteration cases added to tests/stop-hook-v2.test.mjs:
- includes tool_use and tool_result blocks
- summarizes long tool_result content with head and tail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

manual-e2e-compare.mjs runs baseline (text-only) vs rich (this-iter
text + tool_use + tool_result) against the live Clone MCP endpoint
once each and prints both payloads + predictions side by side.

manual-e2e-compare3.mjs extends that to three variants (adds
"richer-all-iters" which inserts prior-iteration timelines under each
clone-prediction marker and widens tool_result summaries) and aggregates
results across COMPARE_RUNS calls per variant. Used to validate that
prior-iter timelines lift mean confidence and remove hallucinated
next-step suggestions.

Neither script is part of pnpm test; both call the live MCP endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Trim the wall-of-text in How It Works to 5 numbered steps. Replace the
two redundant Conversation context window paragraphs with a single
3-bullet "What Clone actually sees" section that names the new
rich-iteration timeline (text + tool_use + tool_result) and prior-iter
context contribution without burying the reader in constants.

Open with a one-line value proposition and a "Why people use it" block
so the README sells the loop before describing internals. Consolidate
the API key, installation, and update sections so each only appears
once. Drop the duplicated install snippets between Quick Install,
Installation Commands, macOS/Linux, and Windows PowerShell — there's
now one canonical install block at the top and an Installing & updating
section near the bottom for the longer flows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Turtle-Hwan Turtle-Hwan changed the title from "Send multi-turn conversation context to Clone MCP" to "Multi-turn + rich-iteration agent_input for Clone MCP" on May 12, 2026
@Turtle-Hwan Turtle-Hwan merged commit a5f9013 into main May 12, 2026
@Turtle-Hwan Turtle-Hwan deleted the feat/multi-turn-mcp-context branch May 12, 2026 16:46