Instead of the compact_summary tool, we can just take all assistant messages in the stream after the compact request and concatenate them into the final summary. Beyond less code, there are two key UX advantages:
- Live feedback: we can show the compaction artifact as it streams
- Context efficiency: with a tool call, the model still generates an assistant message alongside it. At best it's a brief snippet of the compaction summary; at worst it's the whole summary repeated.
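
A minimal sketch of the idea, not the actual implementation: the `StreamEvent` shape, the `is_compact_request` flag, and the `on_delta` callback are all hypothetical names made up for illustration, since the real stream types aren't shown here. It assumes assistant text arrives as ordered chunks that can be joined directly.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class StreamEvent:
    """Hypothetical event shape; the real stream type may differ."""
    role: str  # e.g. "user", "assistant", "tool"
    text: str
    is_compact_request: bool = False  # assumed marker for the compact request


def collect_compaction_summary(
    events: Iterable[StreamEvent],
    on_delta: Optional[Callable[[str], None]] = None,
) -> str:
    """Concatenate all assistant text that follows the compact request.

    `on_delta` (hypothetical) is called with each chunk as it arrives,
    which is what would drive the live preview in the UI.
    """
    parts: List[str] = []
    seen_request = False
    for event in events:
        if event.is_compact_request:
            seen_request = True
            continue
        if seen_request and event.role == "assistant":
            parts.append(event.text)
            if on_delta is not None:
                on_delta(event.text)
    return "".join(parts)


# Usage: assistant text before the compact request is ignored.
events = [
    StreamEvent("assistant", "Earlier reply."),
    StreamEvent("user", "Please compact.", is_compact_request=True),
    StreamEvent("assistant", "Summary: "),
    StreamEvent("assistant", "part one."),
]
shown: List[str] = []
summary = collect_compaction_summary(events, on_delta=shown.append)
```

Because the summary is just the assistant output, there is no separate tool result to duplicate it, and the same chunks that feed `on_delta` make up the final text.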