Multi-turn + rich-iteration agent_input for Clone MCP #2
Merged
Stop hook and AskUserQuestion hook now build agent_input from:
- the original 1-turn user prompt (always preserved)
- chronological Clone-injected user turns (predictions + auto-answers) reconstructed from clone-loop.history.local.jsonl
- all assistant text blocks emitted during the current iteration, extracted from the transcript via timestamp filtering

Capped at HISTORY_WINDOW_TURNS = 20 to prevent token blow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
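A minimal sketch of the three inputs listed in this commit, for orientation only. The record shapes (`ts`, `role`, `content`, `type`, `text`), numeric timestamps, and every function name here are assumptions made for illustration; only `HISTORY_WINDOW_TURNS = 20` and the history file name come from the commit message.

```javascript
// Hypothetical sketch: field names and helpers are assumptions, not the plugin's real schema.
const HISTORY_WINDOW_TURNS = 20; // cap named in the commit message

// Parse JSONL text (e.g. the contents of clone-loop.history.local.jsonl),
// skipping blank or malformed lines.
function parseJsonl(text) {
  const records = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      records.push(JSON.parse(line));
    } catch {
      // Ignore a partially written or corrupt line.
    }
  }
  return records;
}

// Keep only the most recent Clone-injected user turns, in chronological order.
function windowCloneTurns(historyRecords, limit = HISTORY_WINDOW_TURNS) {
  const sorted = [...historyRecords].sort((a, b) => a.ts - b.ts);
  return sorted.slice(-limit);
}

// Collect assistant text blocks whose timestamp falls at or after the iteration start.
function assistantTextSince(transcriptEntries, iterationStartTs) {
  return transcriptEntries
    .filter((entry) => entry.role === "assistant" && entry.ts >= iterationStartTs)
    .flatMap((entry) => entry.content)
    .filter((block) => block.type === "text")
    .map((block) => block.text);
}

function buildAgentInput({ originalPrompt, historyText, transcriptEntries, iterationStartTs }) {
  return {
    originalPrompt, // always preserved, never windowed out
    cloneTurns: windowCloneTurns(parseJsonl(historyText)),
    currentIterationText: assistantTextSince(transcriptEntries, iterationStartTs),
  };
}
```

Keeping the original prompt outside the window is what lets the cap bound token growth without losing the task statement.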
The previous formatConversationHistory rendered every current-iteration assistant text twice: once inside the windowed history list and once in the dedicated "assistant (current iter N)" footer. That doubled the token cost and risked confusing the predictor.

Window only the Clone-injected user turns; render this-iteration assistant text exactly once in the footer block.

Verified against the live Clone MCP endpoint: prediction reasoning now cites prior turns correctly without the duplicate context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
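The fix described in this commit can be sketched roughly as follows. This is a simplified, hypothetical stand-in for formatConversationHistory, not the real function: the parameter names and the "### user (original):" heading are assumptions, while the clone-prediction marker and the "assistant (current iter N)" footer label follow the wording used in this PR.

```javascript
// Hypothetical simplification: only user turns are windowed; assistant text appears once.
function formatHistory({ originalPrompt, cloneTurns, currentIterText, iteration, windowTurns = 20 }) {
  // "### user (original):" is an assumed heading for illustration.
  const sections = [`### user (original):\n${originalPrompt}`];

  // Only Clone-injected user turns pass through the window.
  for (const turn of cloneTurns.slice(-windowTurns)) {
    sections.push(`### user (clone-prediction):\n${turn.text}`);
  }

  // Current-iteration assistant text is rendered exactly once, in the footer.
  if (currentIterText.length > 0) {
    sections.push(`### assistant (current iter ${iteration}):\n${currentIterText.join("\n\n")}`);
  }
  return sections.join("\n\n");
}
```

Because the assistant text never enters the windowed list, it cannot be emitted a second time regardless of the window size.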
Removed test cases that asserted the contents of static config files (plugin.json, hooks.json, command markdown frontmatter) without exercising real code paths. Removed redundant token-precedence cases that ran in three different files, keeping the canonical pair in stop-hook-v2 and api-key-manager.

Removed the live-MCP smoke (tests/remote-mcp-connect.test.mjs) since the e2e suite already exercises the same RPC surface, and removed the e2e invalid-args case since malformed-input handling is the MCP server's concern, not the plugin's.

Added scripts/manual-e2e-multiturn.mjs as a manual debugging tool that exercises the real conversation-context builder against the live Clone MCP endpoint.

pnpm test: 43 -> 21 cases, all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
conversation-context.mjs gains:
- iterationBlocksThisIteration: pulls assistant text + tool_use + tool_result blocks for the current iteration window with summarized multi-line tool output (HEAD/TAIL constants).
- loadIterationBoundaries + iterationTimelinesByBoundary: split the transcript into per-iteration timelines using loop-history JSONL ts boundaries.
- formatConversationHistory: accept iterationBlocks for the current-iter footer and priorIterTimelines for per-iter assistant blocks rendered under each user (clone-prediction) marker, with a total-char cap that drops oldest iters whole when exceeded.

Live A/B/C against Clone MCP showed prior-iter timelines eliminate hallucinated next-step suggestions (e.g. "PUT /todos" when the codebase only has PATCH) and lift mean confidence from 0.74 -> 0.78.

Stop-hook rich-iteration cases added to tests/stop-hook-v2.test.mjs:
- includes tool_use and tool_result blocks
- summarizes long tool_result content with head and tail

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
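Two of the mechanisms named in this commit, head/tail summarization of long tool output and a total-char cap that drops the oldest iterations whole, might look roughly like the sketch below. The function names, the line-based summarization, the omission marker text, and the HEAD/TAIL values are all assumptions; the commit only says that HEAD/TAIL constants exist and that oldest iterations are dropped whole. The 12000 default mirrors the PRIOR_ITER_TOTAL_CHAR_CAP value mentioned in the PR description.

```javascript
// Assumed values: the source names HEAD/TAIL constants but does not state them.
const HEAD_LINES = 3;
const TAIL_LINES = 2;

// Keep the first `head` and last `tail` lines of a long tool output.
function summarizeToolOutput(text, head = HEAD_LINES, tail = TAIL_LINES) {
  const lines = text.split("\n");
  if (lines.length <= head + tail) return text; // short output passes through unchanged
  const omitted = lines.length - head - tail;
  const tailLines = tail > 0 ? lines.slice(-tail) : [];
  return [...lines.slice(0, head), `... (${omitted} lines omitted) ...`, ...tailLines].join("\n");
}

// Drop the oldest iteration timelines whole until the combined length fits the cap.
// `timelines` is an array of already-rendered strings, oldest first.
function capPriorTimelines(timelines, totalCharCap = 12000) {
  const kept = [...timelines];
  let total = kept.reduce((sum, timeline) => sum + timeline.length, 0);
  while (kept.length > 0 && total > totalCharCap) {
    total -= kept.shift().length; // never truncate an iteration mid-way
  }
  return kept;
}
```

Dropping whole iterations rather than truncating keeps every surviving timeline internally complete, which is presumably why the commit specifies it.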
manual-e2e-compare.mjs runs baseline (text-only) vs rich (this-iter text + tool_use + tool_result) against the live Clone MCP endpoint once each and prints both payloads + predictions side by side.

manual-e2e-compare3.mjs extends that to three variants (adds "richer-all-iters" which inserts prior-iteration timelines under each clone-prediction marker and widens tool_result summaries) and aggregates results across COMPARE_RUNS calls per variant. Used to validate that prior-iter timelines lift mean confidence and remove hallucinated next-step suggestions.

Neither script is part of pnpm test; both call the live MCP endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
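The per-variant aggregation mentioned above could be done along these lines. This is an illustrative sketch only: the result shape (`variant`, `confidence`, `prediction`) and the function name are assumptions, and no call to the live endpoint is made here.

```javascript
// Hypothetical aggregation over already-collected run results.
// Each result is assumed to look like { variant, confidence, prediction }.
function aggregateRuns(results) {
  const byVariant = new Map();
  for (const { variant, confidence, prediction } of results) {
    const acc = byVariant.get(variant) ?? { runs: 0, confidenceSum: 0, charSum: 0 };
    acc.runs += 1;
    acc.confidenceSum += confidence;
    acc.charSum += prediction.length;
    byVariant.set(variant, acc);
  }
  // One summary row per variant, in first-seen order.
  return [...byVariant].map(([variant, acc]) => ({
    variant,
    runs: acc.runs,
    meanConfidence: acc.confidenceSum / acc.runs,
    meanPredictionChars: acc.charSum / acc.runs,
  }));
}
```

Averaging over several runs per variant matters because a single call to a live predictor is too noisy to compare payload formats.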
Trim the wall-of-text in How It Works to 5 numbered steps. Replace the two redundant Conversation context window paragraphs with a single 3-bullet "What Clone actually sees" section that names the new rich-iteration timeline (text + tool_use + tool_result) and prior-iter context contribution without burying the reader in constants.

Open with a one-line value proposition and a "Why people use it" block so the README sells the loop before describing internals. Consolidate the API key, installation, and update sections so each only appears once.

Drop the duplicated install snippets between Quick Install, Installation Commands, macOS/Linux, and Windows PowerShell; there's now one canonical install block at the top and an Installing & updating section near the bottom for the longer flows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Three things land in this PR, each measured against the live Clone MCP endpoint:
- Multi-turn agent_input: Clone-injected user turns are reconstructed from `clone-loop.history.local.jsonl`. User-turn window capped at 20.
- Rich current-iteration timeline: the current iteration is sent as `text` + `[tool_use] <Name>: <args>` + `[tool_result <Name>]: ...` blocks (long tool results summarized head + tail) instead of a single concatenated text blob.
- Prior-iteration context: under each `### user (clone-prediction):` marker, Clone now also sees the per-iter assistant timeline that produced it. Combined prior-iter payload is char-capped; oldest iters drop first when over budget.

Why it matters (measured)
Live 5x A/B/C against https://api.clone.is/mcp with the demo token, identical scenario.

Prior-iter context erased the PUT hallucination entirely; every C run predicted PATCH/DELETE coverage correctly. Predictions also got longer and more specific (avg 71 chars vs 56).
Test plan
- `pnpm test` -> 23/23 passing, including new rich-iteration cases.
- `node scripts/manual-e2e-compare3.mjs` -> documented above.
- `PRIOR_ITER_TOTAL_CHAR_CAP = 12000` is the right budget. Combined with the 20-turn user window and 1200-char per-block cap, worst case is roughly 14-18K chars (~4-5K tokens) per prediction.

What changed
- `scripts/conversation-context.mjs`: new helpers (`iterationBlocksThisIteration`, `loadIterationBoundaries`, `iterationTimelinesByBoundary`, `formatIterationBlocks`) and extended `formatConversationHistory({iterationBlocks, priorIterTimelines, ...})`.
- `hooks/stop-hook.mjs`, `hooks/ask-user-question-hook.mjs`: wired to feed transcript-derived timelines.
- `tests/stop-hook-v2.test.mjs`: 2 new cases for tool_use/tool_result and head/tail summarization.
- `scripts/manual-e2e-compare.mjs`, `scripts/manual-e2e-compare3.mjs`: live A/B and A/B/C probes.
- `scripts/manual-e2e-multiturn.mjs`: updated to use the rich builder.
- `README.md`: full rewrite for clarity and adoption (single canonical install block, 5-step How It Works, 3-bullet "What Clone actually sees").

🤖 Generated with Claude Code