fix: pass reasoning_content back in thinking mode to avoid HTTP 400 #324
Conversation
For thinking mode models (e.g., Kimi-thinking-preview, QwQ), the API requires the `reasoning_content` to be passed back in subsequent requests. Without this, the API returns `invalid_request_error: invalid_request_error`.

Changes:
- Add `ReasoningContent` field to `SerializedChatMessage` for snapshot persistence
- Capture `reasoning_content` during streaming updates
- Restore `reasoning_content` when deserializing history for API calls
- Use reflection to access OpenAI SDK internal properties

This ensures multi-turn conversations with thinking mode models work correctly.
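The shape of the change can be sketched as follows. This is a minimal sketch: `SerializedChatMessage` and the helper name `GetStreamingReasoningContent` come from this PR, while the surrounding members and the commented streaming loop are illustrative.

```csharp
using System.Text;

// Snapshot model: the new optional field preserves thinking-mode reasoning
// so it can be echoed back to the provider on later requests.
public class SerializedChatMessage {
    public string Role { get; set; } = "";
    public string? Content { get; set; }
    public string? ReasoningContent { get; set; }  // new in this PR
}

// Capture during streaming (illustrative; GetStreamingReasoningContent is
// the PR's reflection helper and returns null when the SDK update type
// exposes no reasoning property):
// var reasoningContentBuilder = new StringBuilder();
// await foreach (var update in chatClient.CompleteChatStreamingAsync(messages)) {
//     var chunk = GetStreamingReasoningContent(update);
//     if (chunk != null) reasoningContentBuilder.Append(chunk);
// }
```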
📝 Walkthrough

The changes extend the serialized chat message model with a new optional field to store LLM thinking-mode reasoning content. The OpenAI service is updated to capture reasoning from streaming responses and propagate it through serialization and deserialization layers using reflection helpers.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 2
Caution: Some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs (2)
1371-1463: ⚠️ Potential issue | 🔴 Critical

Reflection-based round-trip won't fix the HTTP 400 — OpenAI SDK 2.10.0 does not expose these properties.

The official OpenAI .NET SDK (version 2.10.0, as pinned in this repo) does not define public `Reasoning` or `ReasoningContentUpdate` properties on `AssistantChatMessage` or `StreamingChatCompletionUpdate`. As a result, this entire reflection-based approach is non-functional:

- `GetStreamingReasoningContent` will always return `null` since the properties don't exist, so `reasoningContentBuilder` never accumulates anything.
- `SetAssistantReasoningContent` silently fails when the property check returns false. Even if a property of that name were found, the SDK's request serializer won't emit `reasoning_content` in outgoing JSON (it's not part of the Chat Completions API request schema).

The HTTP 400 in thinking-mode multi-turn calls remains unfixed. For providers requiring `reasoning_content` to be echoed back, use one of:

- A custom `PipelinePolicy` on `OpenAIClientOptions.Transport` to mutate outgoing request JSON and inject `reasoning_content` onto assistant messages before send.
- Calling the endpoint directly with `HttpClient` for these models instead of using `ChatClient`.

Verify end-to-end (not just "build passes") on a thinking-mode model that:

- Streaming reasoning is captured (`reasoningContent` non-empty at Line 1016).
- The next request includes `reasoning_content` in the assistant message JSON.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1371 - 1463, The reflection-based getters/setters (GetAssistantReasoningContent, GetStreamingReasoningContent, SetAssistantReasoningContent and the DeserializeProviderHistory usage) won’t produce or serialize reasoning_content with OpenAI .NET SDK v2.10.0; replace this approach by mutating outgoing request JSON (injecting reasoning_content into assistant messages) via a custom PipelinePolicy on OpenAIClientOptions.Transport or by sending requests directly with HttpClient to the thinking-mode endpoint, and remove/stop relying on the reflection helpers — then verify end-to-end that streaming reasoning is captured and the subsequent request JSON includes reasoning_content in assistant messages.
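A sketch of the `PipelinePolicy` approach suggested above, assuming System.ClientModel's pipeline primitives. `LookupReasoning` is a hypothetical callback mapping an assistant message's content to its stored reasoning, and the exact `BinaryContent`/`AddPolicy` signatures should be verified against the SDK version pinned in the repo:

```csharp
using System;
using System.ClientModel;
using System.ClientModel.Primitives;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json.Nodes;
using System.Threading;
using System.Threading.Tasks;

// Rewrites outgoing chat.completions bodies so each assistant message
// carries its stored reasoning_content before the request is sent.
public sealed class ReasoningContentInjectionPolicy : PipelinePolicy {
    private readonly Func<string, string?> _lookupReasoning; // hypothetical: content -> reasoning

    public ReasoningContentInjectionPolicy(Func<string, string?> lookupReasoning)
        => _lookupReasoning = lookupReasoning;

    public override void Process(PipelineMessage message,
        IReadOnlyList<PipelinePolicy> pipeline, int currentIndex) {
        InjectReasoning(message);
        ProcessNext(message, pipeline, currentIndex);
    }

    public override async ValueTask ProcessAsync(PipelineMessage message,
        IReadOnlyList<PipelinePolicy> pipeline, int currentIndex) {
        InjectReasoning(message);
        await ProcessNextAsync(message, pipeline, currentIndex).ConfigureAwait(false);
    }

    private void InjectReasoning(PipelineMessage message) {
        if (message.Request?.Content is null) return;
        if (message.Request.Uri?.AbsolutePath.EndsWith("/chat/completions") != true) return;

        using var ms = new MemoryStream();
        message.Request.Content.WriteTo(ms, CancellationToken.None);
        var root = JsonNode.Parse(Encoding.UTF8.GetString(ms.ToArray()));
        if (root?["messages"] is not JsonArray messages) return;

        foreach (var node in messages) {
            if (node is not JsonObject msg || (string?)msg["role"] != "assistant") continue;
            var reasoning = _lookupReasoning((string?)msg["content"] ?? "");
            if (reasoning != null) msg["reasoning_content"] = reasoning;
        }
        message.Request.Content = BinaryContent.Create(BinaryData.FromString(root!.ToJsonString()));
    }
}

// Registration (OpenAIClientOptions derives from ClientPipelineOptions):
// var options = new OpenAIClientOptions();
// options.AddPolicy(new ReasoningContentInjectionPolicy(lookup), PipelinePosition.PerCall);
```

Because the policy runs per call, it covers both the first request and every subsequent multi-turn request without touching the typed `ChatMessage` model.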
1244: ⚠️ Potential issue | 🟠 Major

`ResumeFromSnapshotAsync` doesn't restore tool-call assistant messages from snapshot.

`DeserializeProviderHistory` only reconstructs plain text assistant messages using `new AssistantChatMessage(msg.Content ?? "")`. Snapshots taken mid tool-cycle contain `AssistantChatMessage(chatToolCalls)` entries whose tool-call structure is lost on resume. Tool result messages (`ToolChatMessage` with the "tool" role) are also not serialized or restored. This re-introduces HTTP 400 errors once a snapshot is resumed against a thinking-mode provider that expects the full tool-calling context. `ReasoningContent` is properly handled via `SetAssistantReasoningContent()`, so it survives snapshot restore.

If snapshot resume should support thinking-mode + tool-calling flows, `SerializedChatMessage` needs fields for tool-call structure (tool IDs, names, arguments) and support for "tool" role messages, not just `Role`/`Content`/`ReasoningContent`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` at line 1244, ResumeFromSnapshotAsync currently calls DeserializeProviderHistory which recreates only plain AssistantChatMessage instances and drops tool-call structure and ToolChatMessage ("tool" role) entries from snapshots; update SerializedChatMessage to include tool-call fields (tool id/name/args and any serialized tool response) and extend DeserializeProviderHistory to reconstruct AssistantChatMessage instances with tool-call metadata and ToolChatMessage entries so thinking-mode providers get full context; ensure ResumeFromSnapshotAsync uses the new deserialization and preserve ReasoningContent via SetAssistantReasoningContent as before.
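The extended schema described above could look roughly like this. It is a sketch: `SerializedToolCall` and its property names are illustrative, and the `ChatToolCall.CreateFunctionToolCall` / `ToolChatMessage` signatures should be checked against the pinned OpenAI .NET SDK version:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using OpenAI.Chat;

// Illustrative tool-call snapshot record.
public class SerializedToolCall {
    public string Id { get; set; } = "";
    public string FunctionName { get; set; } = "";
    public string FunctionArguments { get; set; } = "";  // raw JSON args
}

public class SerializedChatMessage {
    public string Role { get; set; } = "";               // "user" | "assistant" | "tool"
    public string? Content { get; set; }
    public string? ReasoningContent { get; set; }
    public List<SerializedToolCall>? ToolCalls { get; set; }  // assistant tool-call turns
    public string? ToolCallId { get; set; }                   // "tool" result messages
}

public static class SnapshotRestore {
    // Restore side: rebuild tool-call assistant turns and tool results,
    // not just plain-text assistant messages.
    public static ChatMessage Restore(SerializedChatMessage msg) => msg.Role switch {
        "assistant" when msg.ToolCalls is { Count: > 0 } =>
            new AssistantChatMessage(msg.ToolCalls.Select(tc =>
                ChatToolCall.CreateFunctionToolCall(
                    tc.Id, tc.FunctionName, BinaryData.FromString(tc.FunctionArguments)))),
        "assistant" => new AssistantChatMessage(msg.Content ?? ""),
        "tool" => new ToolChatMessage(msg.ToolCallId!, msg.Content ?? ""),
        _ => new UserChatMessage(msg.Content ?? "")
    };
}
```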
🧹 Nitpick comments (2)
TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs (2)
1375-1419: Cache `PropertyInfo` lookups; called per streaming chunk.

`GetStreamingReasoningContent` is invoked for every `StreamingChatCompletionUpdate` (potentially hundreds of times per response), and each call does two `Type.GetProperty` lookups. Cache the resolved `PropertyInfo` (or a typed `Func<,>`) in `static readonly` fields keyed by `Type` to avoid the repeated reflection cost on the hot streaming path. The same applies to the get/set helpers on `AssistantChatMessage`.

Also applies to: 1454-1463
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1375 - 1419, GetStreamingReasoningContent and GetAssistantReasoningContent are doing Type.GetProperty on every call (hot path) — cache the resolved PropertyInfo or compiled accessors to avoid repeated reflection. Add static readonly ConcurrentDictionary<Type, PropertyInfo?> (or ConcurrentDictionary<Type, Func<object, string?>> for typed getters) for the ReasoningContentUpdate and Reasoning properties and use those caches in GetStreamingReasoningContent and GetAssistantReasoningContent (look up by update.GetType() or assistantMsg.GetType(), retrieve cached PropertyInfo/Func, and invoke it if present). Apply the same caching pattern to the corresponding helper code referenced around the other block (the get/set helpers at the 1454–1463 region) so reflection is resolved once per Type instead of per streaming chunk.
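The suggested caching can be sketched like this. The wrapper class is illustrative; the property names (`ReasoningContentUpdate`, `Reasoning`) are the ones the existing helpers already probe for:

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

public static class ReasoningReflectionCache {
    // Resolve each (type, property) pair once instead of on every
    // streaming chunk; GetProperty returning null is cached too.
    private static readonly ConcurrentDictionary<(Type, string), PropertyInfo?> Props = new();

    public static string? GetReasoning(object instance, string propertyName) {
        var prop = Props.GetOrAdd((instance.GetType(), propertyName),
            key => key.Item1.GetProperty(key.Item2));
        try {
            return prop?.GetValue(instance) as string;
        } catch (Exception ex) {
            System.Diagnostics.Debug.WriteLine($"Reflection read of {propertyName} failed: {ex}");
            return null;
        }
    }
}

// Usage in the streaming hot path (illustrative):
// var chunk = ReasoningReflectionCache.GetReasoning(update, "ReasoningContentUpdate");
```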
1342-1369: Tool-call assistant messages serialize as empty `Content`.

For an `AssistantChatMessage` constructed from `chatToolCalls` (Line 1032), `assistantMsg.Content` is typically empty, so this serializer writes `Content = ""` and drops the `ChatToolCall` list entirely. Combined with the resume-path issue above, snapshots of mid tool-cycle states are lossy: on restore the model sees a no-op assistant turn followed by orphan `tool` messages, which most providers reject with 400 (a `tool` message must follow an assistant message containing `tool_calls`).

This isn't introduced by this PR, but the new `ReasoningContent` plumbing makes the gap more visible (reasoning is preserved, tool calls aren't). Worth scoping a follow-up to extend `SerializedChatMessage` with `ToolCalls`/`ToolCallId` so resume actually round-trips.

Want me to draft the extended `SerializedChatMessage` schema plus matching serialize/deserialize logic and open a follow-up issue?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1342 - 1369, SerializeProviderHistory currently builds SerializedChatMessage.Content from AssistantChatMessage.Content, but AssistantChatMessage instances created from chatToolCalls have empty Content and instead carry tool call data, so tool calls are lost on serialization; update SerializedChatMessage to include tool call metadata (e.g., ToolCalls and/or ToolCallId) and modify SerializeProviderHistory to extract and populate those fields from AssistantChatMessage (and preserve existing ReasoningContent via GetAssistantReasoningContent) so mid-tool-cycle assistant turns round-trip correctly; ensure the new fields are set when msg is AssistantChatMessage (and include any ChatToolCall list from assistantMsg or related properties) and adjust deserialization to restore assistant messages plus their tool_calls.
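On the serialize side, the fix amounts to capturing the tool-call list alongside `Content`. A sketch, assuming a hypothetical `SerializedToolCall` record and a `ToolCalls` field added to `SerializedChatMessage`; the SDK property names (`ToolCalls`, `FunctionName`, `FunctionArguments`) should be verified against the pinned version:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using OpenAI.Chat;

// Hypothetical snapshot record (names illustrative).
public class SerializedToolCall {
    public string Id { get; set; } = "";
    public string FunctionName { get; set; } = "";
    public string FunctionArguments { get; set; } = "";
}

public static class SnapshotSerialize {
    // Capture tool-call structure instead of writing an empty Content for
    // tool-call assistant turns; SerializedChatMessage is assumed to have
    // gained a ToolCalls field as suggested above.
    public static SerializedChatMessage Serialize(AssistantChatMessage assistantMsg) => new() {
        Role = "assistant",
        Content = assistantMsg.Content.Count > 0 ? assistantMsg.Content[0].Text : "",
        // In the PR this would be GetAssistantReasoningContent(assistantMsg).
        ReasoningContent = null,
        ToolCalls = assistantMsg.ToolCalls?.Count > 0
            ? assistantMsg.ToolCalls.Select(tc => new SerializedToolCall {
                Id = tc.Id,
                FunctionName = tc.FunctionName,
                FunctionArguments = tc.FunctionArguments.ToString()
            }).ToList()
            : null
    };
}
```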
ℹ️ Review info

⚙️ Run configuration
- Configuration used: defaults
- Review profile: CHILL
- Plan: Pro
- Run ID: faa699ed-b425-41fa-8700-a054d6e82283

📒 Files selected for processing (2)
- TelegramSearchBot.Common/Model/AI/LlmContinuationSnapshot.cs
- TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs
```csharp
string responseText = contentBuilder.ToString().Trim();
string reasoningContent = reasoningContentBuilder.ToString().Trim();
```
🧩 Analysis chain
🌐 Web query:
Does Kimi-thinking-preview or QwQ require the reasoning_content echoed back in subsequent chat.completions requests to match exactly (including leading/trailing whitespace) the streamed value, or is whitespace-trimmed content accepted?
💡 Result:
Kimi-thinking-preview (likely referring to Moonshot AI's Kimi K2 Thinking models like kimi-k2-thinking or kimi-k2.6 with thinking enabled) and QwQ (likely QwQ-32B-Preview, a reasoning model comparable to Kimi K2 Thinking) require the reasoning_content echoed back in subsequent chat.completions requests to match exactly the streamed value, including leading/trailing whitespace. Official Moonshot AI/Kimi docs explicitly state to "keep the reasoning_content from every historical assistant message in messages as-is. The simplest way is to append the assistant message returned from the previous API call directly back into messages." No sources indicate whitespace trimming is accepted; "as-is" implies exact preservation is necessary for proper preserved thinking and chain-of-thought continuity in multi-turn conversations. This applies to both official API and compatible implementations like vLLM, where field handling bugs highlight the need for precise input matching.
Citations:
- 1: https://platform.kimi.ai/docs/guide/use-kimi-k2-thinking-model
- 2: https://platform.moonshot.ai/docs/api/chat
- 3: https://platform.kimi.ai/docs/api/overview
- 4: https://anotherwrapper.com/tools/llm-pricing/kimi-k2-thinking/qwq-32b-preview
- 5: https://artificialanalysis.ai/models/comparisons/kimi-k2-5-vs-QwQ-32B-Preview
Remove `Trim()` on `reasoningContent` to preserve provider compatibility.
Kimi-thinking-preview and QwQ require the `reasoning_content` field to be echoed back in subsequent chat.completions requests with exact preservation, including leading/trailing whitespace. Official Moonshot AI documentation explicitly states to "keep the reasoning_content from every historical assistant message in messages as-is." The current `Trim()` operation violates this requirement and may cause validation or continuity issues in multi-turn conversations.
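A minimal sketch of the suggested change, applied to the two lines quoted above:

```csharp
string responseText = contentBuilder.ToString().Trim();
// reasoning_content must be echoed back exactly as streamed; do not trim it.
string reasoningContent = reasoningContentBuilder.ToString();
```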
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1015 -
1016, The code trims reasoningContent which breaks provider compatibility;
change the assignment that uses reasoningContentBuilder.ToString().Trim() so it
uses the raw string (reasoningContentBuilder.ToString()) without calling Trim(),
leaving responseText (responseText = contentBuilder.ToString().Trim())
unchanged; ensure the variable reasoningContent (and any downstream use in
OpenAIService.cs / methods that send chat.completions) preserves
leading/trailing whitespace exactly as produced by reasoningContentBuilder.
```csharp
} catch {
    // Reflection failed, return null
}
return null;
```
Don't silently swallow reflection failures — at least log at Debug.
All three reflection helpers swallow every exception with an empty `catch { }`. Combined with the concern above, that means if the SDK changes property names, or the property exists but isn't writable, or the cast fails, you'll get zero diagnostics — the feature will simply appear not to work in production. Capture the exception and log it at Debug/Trace level (and ideally only once, via a static flag) so this is actually observable.
🛡️ Suggested change (illustrative, applied to `SetAssistantReasoningContent`)

```diff
-private static void SetAssistantReasoningContent(AssistantChatMessage msg, string reasoningContent) {
-    try {
-        var prop = msg.GetType().GetProperty("Reasoning");
-        if (prop != null && prop.CanWrite) {
-            prop.SetValue(msg, reasoningContent);
-        }
-    } catch {
-        // Reflection failed, ignore
-    }
-}
+private static int _reasoningReflectionWarned;
+private static void SetAssistantReasoningContent(AssistantChatMessage msg, string reasoningContent) {
+    try {
+        var prop = msg.GetType().GetProperty("Reasoning");
+        if (prop != null && prop.CanWrite) {
+            prop.SetValue(msg, reasoningContent);
+        } else if (System.Threading.Interlocked.Exchange(ref _reasoningReflectionWarned, 1) == 0) {
+            System.Diagnostics.Debug.WriteLine(
+                "AssistantChatMessage has no writable 'Reasoning' property; reasoning_content round-trip is a no-op.");
+        }
+    } catch (Exception ex) {
+        System.Diagnostics.Debug.WriteLine($"SetAssistantReasoningContent failed: {ex}");
+    }
+}
```

(The same pattern applies to `GetAssistantReasoningContent` and `GetStreamingReasoningContent`. An injected `ILogger` would be even better, but these methods are static.)
Also applies to: 1415-1418, 1460-1462
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs` around lines 1387 -
1390, The three static reflection helpers (SetAssistantReasoningContent,
GetAssistantReasoningContent, GetStreamingReasoningContent) currently swallow
all exceptions; change each empty catch to catch Exception ex and emit a
Debug/Trace-level log containing the exception message/stack (e.g.,
Debug.WriteLine or Trace.TraceInformation/TraceEvent) and make the logging occur
only once by guarding with a static bool flag per method so you don’t flood
logs; keep the methods static (no DI) and ensure the original behavior still
returns null/false after logging.
🔍 PR Check Report

📋 Check overview
🧪 Test results
📊 Code quality
📁 Test artifacts
🔗 Related links

This report was generated automatically by GitHub Actions.