Problem
When running agentv eval with a Copilot target, the stream log files are hard to read because each agent_message_chunk ACP event is written as a separate log line. A single assistant response can produce dozens of fragmented lines:
```
[+00:02] [agent_message_chunk] Let me
[+00:02] [agent_message_chunk] analyze
[+00:02] [agent_message_chunk] this code
[+00:02] [agent_message_chunk] and fix the bug
```
Root Cause
CopilotStreamLogger.handleEvent() in copilot-utils.ts emits one log line per ACP event. The summarizeAcpEvent() function in copilot-cli.ts limits chunks to 200 chars each, but does nothing to consolidate them.
Best Practice (from code-insights research)
code-insights uses a flush-on-turn-boundary pattern:
- Accumulate assistant.message_delta / chunk events into a text buffer
- Flush the buffer as a single complete entry only at turn boundaries (tool_call, session.idle, session.shutdown, or close())
- Result: [+00:05] [assistant_message] Let me analyze this code and fix the bug...
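The pattern can be sketched as a pure function over a list of events. This is only an illustration, not code-insights' actual code: the StreamEvent shape, the CHUNK_TYPES set, and the consolidate() name are assumptions made for the example, and it treats every non-chunk event as a turn boundary, which is a superset of the boundaries listed above.

```typescript
// Hypothetical event shape; the real event types are not shown in this issue.
interface StreamEvent {
  type: string; // e.g. "agent_message_chunk", "tool_call", "session.idle"
  text?: string; // chunk text, expected on chunk events only
}

// Assumed names of the chunk-style events that should be buffered.
const CHUNK_TYPES = new Set(["agent_message_chunk", "assistant.message_delta"]);

function consolidate(events: StreamEvent[]): string[] {
  const lines: string[] = [];
  let buffer = "";

  // Emit the buffered text as one entry, then reset the buffer.
  const flush = (): void => {
    if (buffer.length > 0) {
      lines.push(`[assistant_message] ${buffer}`);
      buffer = "";
    }
  };

  for (const event of events) {
    if (CHUNK_TYPES.has(event.type)) {
      buffer += event.text ?? ""; // accumulate, do not log yet
    } else {
      flush(); // any non-chunk event is treated as a turn boundary
      lines.push(`[${event.type}]`);
    }
  }

  flush(); // end of stream, the equivalent of close()
  return lines;
}

const consolidated = consolidate([
  { type: "agent_message_chunk", text: "Let me " },
  { type: "agent_message_chunk", text: "analyze " },
  { type: "agent_message_chunk", text: "this code" },
  { type: "tool_call" },
  { type: "agent_message_chunk", text: "Done." },
]);
console.log(consolidated.join("\n"));
// [assistant_message] Let me analyze this code
// [tool_call]
// [assistant_message] Done.
```

The final flush() matters: without it, a response that ends the stream with no trailing boundary event would never be written.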
Proposed Fix
Modify CopilotStreamLogger in copilot-utils.ts:
- Add a pendingText buffer for chunk accumulation
- In handleEvent(): buffer chunk events instead of writing immediately; flush on non-chunk events
- In close(): flush any remaining buffered text before closing
This is purely a log formatting improvement; eval execution behavior does not change.
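A minimal sketch of what the buffered logger could look like. The real CopilotStreamLogger is not shown in this issue, so the class below is a stand-in: BufferedStreamLogger, the AcpEvent shape, the injected write and now functions, and the choice to stamp the consolidated line with the flush time are all assumptions for illustration.

```typescript
// Hypothetical minimal event shape; the real ACP types are not shown here.
interface AcpEvent {
  type: string;
  text?: string;
}

// Stand-in for CopilotStreamLogger; the sink and clock are injected so the
// sketch can run without a file stream.
class BufferedStreamLogger {
  private pendingText = "";
  private readonly startedAt: number;

  constructor(
    private readonly write: (line: string) => void,
    private readonly now: () => number = () => Date.now(),
  ) {
    this.startedAt = this.now();
  }

  handleEvent(event: AcpEvent): void {
    if (event.type === "agent_message_chunk") {
      // Buffer chunk text instead of writing one line per chunk.
      this.pendingText += event.text ?? "";
      return;
    }
    // A non-chunk event ends the current message: flush first, then log it.
    this.flush();
    this.write(`${this.elapsed()} [${event.type}]`);
  }

  close(): void {
    // Write any text still buffered when the stream ends.
    this.flush();
  }

  private flush(): void {
    if (this.pendingText === "") return;
    this.write(`${this.elapsed()} [assistant_message] ${this.pendingText}`);
    this.pendingText = "";
  }

  // Elapsed time since construction, formatted as [+MM:SS].
  private elapsed(): string {
    const totalSeconds = Math.floor((this.now() - this.startedAt) / 1000);
    const mm = String(Math.floor(totalSeconds / 60)).padStart(2, "0");
    const ss = String(totalSeconds % 60).padStart(2, "0");
    return `[+${mm}:${ss}]`;
  }
}

// Usage with a fake clock (milliseconds) and an in-memory sink.
const logged: string[] = [];
let fakeClock = 0;
const logger = new BufferedStreamLogger((line) => logged.push(line), () => fakeClock);

fakeClock = 2000;
logger.handleEvent({ type: "agent_message_chunk", text: "Let me " });
logger.handleEvent({ type: "agent_message_chunk", text: "analyze this code" });
fakeClock = 5000;
logger.handleEvent({ type: "tool_call" });
logger.handleEvent({ type: "agent_message_chunk", text: "Fixed." });
logger.close();

console.log(logged.join("\n"));
// [+00:05] [assistant_message] Let me analyze this code
// [+00:05] [tool_call]
// [+00:05] [assistant_message] Fixed.
```

Stamping at flush time means the line shows when the message completed rather than when it started; recording the timestamp of the first buffered chunk would be an equally reasonable choice.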
Acceptance Criteria
- An assistant response is written as a single [assistant_message] log line, not dozens of [agent_message_chunk] lines
- json format mode is unaffected (keep per-event for full fidelity)
- copilot-cli and copilot-sdk providers benefit (shared CopilotStreamLogger)