fix(provider/google): robust Gemini 3 support with ReAct parsing, reasoning persistence, and token tracking #144
Merged
duguwanglong merged 2 commits into AgentFlocks:main on Apr 24, 2026
Conversation
xiami762 (Contributor) reviewed on Apr 21, 2026:
Thanks for your contribution. We reviewed the code and found the following issues. Please check and fix them.
Findings
- High: `chat_stream()` now emits `reasoning` in the same `StreamChunk` as `delta` and/or `tool_calls`, but the current consumer treats any chunk with `reasoning` as reasoning-only and immediately `continue`s. That means text output or tool calls in the same chunk can be dropped. In practice, Gemini responses may show only thinking text, lose answer text, or miss tool calls entirely.
```python
event_type = getattr(chunk, 'event_type', None)
if event_type == 'reasoning' or (hasattr(chunk, 'reasoning') and chunk.reasoning):
    reasoning_text = chunk.reasoning if hasattr(chunk, 'reasoning') else chunk.delta
    if reasoning_text:
        # ...
        await processor.process_event(ReasoningDeltaEvent(
            id=self._current_reasoning_id,
            text=reasoning_text,
        ))
    continue
```
- High: this PR no longer forwards `thinkingConfig` or caller-provided `max_tokens` to Gemini. Instead, it hard-codes `max_output_tokens = 8192`. The main flow already builds Gemini-specific `thinkingConfig` for Gemini 2.5 / Gemini 3, so this change effectively disables that behavior and ignores model/config token limits.
```python
elif provider_id == "google":
    if "gemini" in model_lower:
        if "2.5" in model_lower:
            options["thinkingConfig"] = {
                "includeThoughts": True,
                "thinkingBudget": thinking_budget,
            }
        elif "gemini-3" in model_lower:
            options["thinkingConfig"] = {
                "includeThoughts": True,
                "thinkingLevel": "high",
            }
```
- Medium: the "database-backed parts for perfect reasoning retrieval" branch in `_convert_messages()` is effectively unreachable in the normal path. It depends on `messages[0].sessionID`, but `ChatMessage` in this PR only adds `reasoning` and `custom_settings`, not `sessionID`, and the runner still constructs `ChatMessage` objects with only `role`/`content`/`tool_calls`. So the advertised DB-backed reconstruction path does not actually activate in normal execution.
```python
class ChatMessage(BaseModel):
    """Chat message"""
    role: str = Field(..., description="Message role (user, assistant, system, tool)")
    content: Union[str, List[Dict[str, Any]]] = Field("", description="Message content")
    # Reasoning/Thinking content (optional)
    reasoning: Optional[str] = Field(None, description="Reasoning or thinking content")
    # OpenAI function-calling fields (optional)
    tool_calls: Optional[List[Dict[str, Any]]] = Field(None, description="Tool calls for assistant messages")
    tool_call_id: Optional[str] = Field(None, description="Tool call ID for tool-result messages")
    name: Optional[str] = Field(None, description="Tool name for tool-result messages")
    custom_settings: Dict[str, Any] = Field(default_factory=dict)
```
```python
# Add assistant message
if assistant_content_parts or structured_tool_calls:
    chat_messages.append(ChatMessage(
        role="assistant",
        content="\n\n".join(assistant_content_parts) if assistant_content_parts else "",
        tool_calls=structured_tool_calls if structured_tool_calls else None,
    ))
# Append tool-result messages immediately after the assistant message
chat_messages.extend(pending_tool_results)
```

…xed chunks, forward thinking config, activate session-id DB path

Three issues raised in the PR AgentFlocks#144 review are addressed end-to-end:

1. (High) `chat_stream()` previously emitted reasoning together with `delta` / `tool_calls` in the same `StreamChunk`, while SessionRunner's consumer loop treated any reasoning-bearing chunk as reasoning-only and `continue`d, silently dropping text and tool calls. The Google provider now emits reasoning, text + tool_calls, and usage as separate chunks; the runner loop is rewritten to process all three pieces of a chunk sequentially without an early `continue`, so future mixed chunks from any provider also survive.
2. (High) `chat()` and `chat_stream()` no longer hard-code `max_output_tokens = 8192`. A new `_build_generate_config()` helper honours caller-provided `max_tokens` and `thinkingConfig` (built per-model by `build_provider_options` for Gemini 2.5 `thinkingBudget` / Gemini 3 `thinkingLevel`) and only falls back to 8192 when no value is provided.
3. (Medium) The DB-backed reasoning replay branch in `_convert_messages` was unreachable because `ChatMessage` carries no `sessionID`. SessionRunner now forwards `session_id=self.session.id` via kwargs (verified safe across all 27 providers, which all accept `**kwargs` and selectively extract keys). `_convert_messages` accepts an explicit `session_id` argument; legacy per-message attributes are still honoured for forward compatibility.

Self-review hardening:

4. `_convert_messages` now uses a two-phase commit so that a mid-loop exception, a DB snapshot containing only system rows, or an empty cache cannot leave `system_msg` / `raw_gemini_messages` partially populated and then let the fallback path append the in-memory messages on top. Without this guard, the bug fix in (3) would itself produce a duplicated system prompt and a duplicated conversation history.
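The no-early-`continue` loop described in (1) can be sketched as below. This is a minimal illustration, not the project's actual runner code: `StreamChunk` and `consume` are simplified stand-ins for the real `StreamChunk` type and SessionRunner consumer loop, and the real code dispatches events asynchronously rather than collecting lists.

```python
from dataclasses import dataclass, field
from typing import Optional

# Simplified stand-in for the provider's StreamChunk type.
@dataclass
class StreamChunk:
    reasoning: Optional[str] = None
    delta: Optional[str] = None
    tool_calls: list = field(default_factory=list)

def consume(chunks):
    """Process every part of each chunk sequentially.

    A chunk carrying reasoning may ALSO carry answer text and tool calls,
    so there is no early `continue` that would drop the later parts.
    """
    reasoning_out, text_out, calls_out = [], [], []
    for chunk in chunks:
        if chunk.reasoning:          # 1) reasoning, if present
            reasoning_out.append(chunk.reasoning)
        if chunk.delta:              # 2) answer text, if present
            text_out.append(chunk.delta)
        if chunk.tool_calls:         # 3) tool calls, if present
            calls_out.extend(chunk.tool_calls)
    return "".join(reasoning_out), "".join(text_out), calls_out

# A mixed chunk no longer loses its text or tool calls:
mixed = [StreamChunk(reasoning="think...", delta="Hello",
                     tool_calls=[{"name": "search"}])]
```

The same loop also handles the separated chunks the Google provider now emits, since each `if` simply finds nothing to do for the absent parts.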
Tests added (18 cases, all green):

- tests/provider/test_google_gemini_fixes.py – chunk separation, thinkingConfig / max_tokens forwarding, session_id propagation, and DB-cache state-pollution edge cases.
- tests/session/test_runner_chunk_handling.py – consumer loop preserves text and tool_calls in reasoning-bearing chunks, end-to-end transitions between reasoning/text/tool blocks, and usage-only chunks.

Made-with: Cursor
duguwanglong approved these changes on Apr 24, 2026