refactor(anthropic): fix N+1 thinking message storage issue#7958
refactor(anthropic): fix N+1 thinking message storage issue#7958DOsinga merged 1 commit intoblock:mainfrom
Conversation
ae84c8a to
c95de17
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c95de17474
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
The thinking delta handling was proposed in this PR initially, but block#7626 merged first. This adds certain optimizations proposed earlier in this PR. As thinking_delta is yielded and stored individually, then the complete block re-emitted at content_block_stop — N+1 entries per thinking block end up in both database and session memory. Streaming partial thinking deltas serves no purpose: without a signature they cannot be sent back to the API (format_messages drops them), and the UI renders thinking as a complete block rather than incrementally. Emit once at content_block_stop. Replace JSON delta comparisons with typed ContentDelta enum. Add streaming tests. Change-Id: Id42262e30f1eea8d24f1ddb87381395714a91821 Signed-off-by: Rabi Mishra <mishra.rabi@gmail.com>
c95de17 to
fc16a5e
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc16a5e473
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
DOsinga
left a comment
There was a problem hiding this comment.
The fix is correct — buffering thinking deltas and emitting a single complete message at content_block_stop is the right approach. The N+1 storage issue is real and this resolves it cleanly. The ContentDelta enum is a nice improvement over stringly-typed JSON checks. Tests follow the existing run_streaming_test pattern and test real behaviour. Good work!
Signed-off-by: Rabi Mishra <mishra.rabi@gmail.com> Signed-off-by: esnyder <elijah.snyder1@gmail.com>
Signed-off-by: Rabi Mishra <mishra.rabi@gmail.com> Signed-off-by: esnyder <elijah.snyder1@gmail.com>
Summary
Thinking delta handling was initially proposed in this PR, but #7626 merged afterwards seems to have added it now. This adds certain optimizations proposed earlier in this PR.
As thinking_delta is yielded and stored individually, then the complete block re-emitted at content_block_stop, N+1 entries per thinking block end up in both database and session memory though they're filtered in format_messages when replaying.
Partial thinking deltas serves no purpose without a signature and the UI renders thinking as a complete block rather than incrementally. Emit once at content_block_stop.
Replace JSON delta comparisons with typed ContentDelta enum. Add streaming tests.
Testing
Locally tested with anthropic provider.