
fix(provider/google): robust Gemini 3 support with ReAct parsing, reasoning persistence, and token tracking #144

Merged

duguwanglong merged 2 commits into AgentFlocks:main from dll-create:fix/gemini-3-support on Apr 24, 2026
Conversation

@dll-create (Contributor) commented:

fix(provider/google): robust Gemini 3 support with ReAct parsing, reasoning persistence, and token tracking

@xiami762 self-requested a review on April 21, 2026 07:56

@xiami762 (Contributor) left a comment:

Thanks for your contribution. We reviewed the code and found the following issues. Please check and fix them.

Findings

  • High: chat_stream() now emits reasoning in the same StreamChunk as delta and/or tool_calls, but the current consumer treats any chunk with reasoning as reasoning-only and immediately continues. That means text output or tool calls in the same chunk can be dropped. In practice, Gemini responses may show only thinking text, lose answer text, or miss tool calls entirely.
event_type = getattr(chunk, 'event_type', None)

if event_type == 'reasoning' or (hasattr(chunk, 'reasoning') and chunk.reasoning):
    reasoning_text = chunk.reasoning if hasattr(chunk, 'reasoning') else chunk.delta
    if reasoning_text:
        # ...
        await processor.process_event(ReasoningDeltaEvent(
            id=self._current_reasoning_id,
            text=reasoning_text,
        ))
    continue  # <-- any delta or tool_calls carried in this same chunk are dropped here
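A consumer loop along the lines the review asks for could process every field of a chunk before moving on. This is a minimal sketch, assuming a simplified chunk shape; the `StreamChunk` dataclass and `consume` helper here are illustrative stand-ins, not the project's real classes:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamChunk:
    """Simplified stand-in for the provider's stream chunk."""
    reasoning: Optional[str] = None
    delta: Optional[str] = None
    tool_calls: Optional[List[dict]] = None


def consume(chunks):
    """Handle every field of each chunk; no early `continue`,
    so a chunk carrying reasoning AND delta loses nothing."""
    reasoning_out, text_out, calls = [], "", []
    for chunk in chunks:
        if chunk.reasoning:
            reasoning_out.append(chunk.reasoning)
        if chunk.delta:
            text_out += chunk.delta
        if chunk.tool_calls:
            calls.extend(chunk.tool_calls)
    return reasoning_out, text_out, calls
```

With this shape, a mixed chunk containing both reasoning and answer text contributes to both outputs instead of being treated as reasoning-only.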
  • High: this PR no longer forwards thinkingConfig or caller-provided max_tokens to Gemini. Instead, it hard-codes max_output_tokens = 8192. The main flow already builds Gemini-specific thinkingConfig for Gemini 2.5 / Gemini 3, so this change effectively disables that behavior and ignores model/config token limits.
elif provider_id == "google":
    if "gemini" in model_lower:
        if "2.5" in model_lower:
            options["thinkingConfig"] = {
                "includeThoughts": True,
                "thinkingBudget": thinking_budget,
            }
        elif "gemini-3" in model_lower:
            options["thinkingConfig"] = {
                "includeThoughts": True,
                "thinkingLevel": "high",
            }
  • Medium: the “database-backed parts for perfect reasoning retrieval” branch in _convert_messages() is effectively unreachable in the normal path. It depends on messages[0].sessionID, but ChatMessage in this PR only adds reasoning and custom_settings, not sessionID, and the runner still constructs ChatMessage objects with only role / content / tool_calls. So the advertised DB-backed reconstruction path does not actually activate in normal execution.
class ChatMessage(BaseModel):
    """Chat message"""
    role: str = Field(..., description="Message role (user, assistant, system, tool)")
    content: Union[str, List[Dict[str, Any]]] = Field("", description="Message content")
    # Reasoning/Thinking content (optional)
    reasoning: Optional[str] = Field(None, description="Reasoning or thinking content")
    # OpenAI function-calling fields (optional)
    tool_calls: Optional[List[Dict[str, Any]]] = Field(None, description="Tool calls for assistant messages")
    tool_call_id: Optional[str] = Field(None, description="Tool call ID for tool-result messages")
    name: Optional[str] = Field(None, description="Tool name for tool-result messages")
    custom_settings: Dict[str, Any] = Field(default_factory=dict)

# Add assistant message
if assistant_content_parts or structured_tool_calls:
    chat_messages.append(ChatMessage(
        role="assistant",
        content="\n\n".join(assistant_content_parts) if assistant_content_parts else "",
        tool_calls=structured_tool_calls if structured_tool_calls else None,
    ))
    # Append tool-result messages immediately after the assistant message
    chat_messages.extend(pending_tool_results)
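One way to make the DB path reachable is to pass the session id explicitly rather than reading a `sessionID` attribute that `ChatMessage` never carries. A minimal sketch of that direction, with `SESSION_STORE` as a hypothetical in-memory stand-in for the real parts database:

```python
# Illustrative stand-in for the DB-backed message-parts store.
SESSION_STORE = {"s1": [{"role": "user", "content": "from-db"}]}


def convert_messages(messages, session_id=None):
    """Prefer the DB snapshot when a session_id is supplied;
    otherwise fall back to the in-memory messages."""
    if session_id is not None:
        rows = SESSION_STORE.get(session_id)
        if rows:
            return rows
    # Fallback: convert the in-memory messages as before.
    return [{"role": m["role"], "content": m["content"]} for m in messages]
```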

…xed chunks, forward thinking config, activate session-id DB path

Three issues raised in the PR AgentFlocks#144 review are addressed end-to-end:

1. (High) chat_stream() previously emitted reasoning together with delta /
   tool_calls in the same StreamChunk, while SessionRunner's consumer loop
   treated any reasoning-bearing chunk as reasoning-only and `continue`d,
   silently dropping text and tool calls.  The Google provider now emits
   reasoning, text+tool_calls, and usage as separate chunks; the runner
   loop is rewritten to process all three pieces of a chunk sequentially
   without early `continue`, so future mixed chunks from any provider also
   survive.

2. (High) chat() and chat_stream() no longer hard-code max_output_tokens=
   8192.  A new _build_generate_config() helper honours caller-provided
   max_tokens and thinkingConfig (built per-model by build_provider_options
   for Gemini 2.5 thinkingBudget / Gemini 3 thinkingLevel) and only falls
   back to 8192 when no value is provided.

3. (Medium) The DB-backed reasoning replay branch in _convert_messages was
   unreachable because ChatMessage carries no sessionID.  SessionRunner now
   forwards session_id=self.session.id via kwargs (verified safe across all
   27 providers, which all accept **kwargs and selectively extract keys).
   _convert_messages accepts an explicit session_id argument; legacy
   per-message attributes are still honoured for forward-compat.

Self-review hardening:

4. _convert_messages now uses a two-phase commit so a mid-loop exception,
   a DB snapshot containing only system rows, or an empty cache cannot
   leave system_msg / raw_gemini_messages partially populated and then
   let the fallback path append the in-memory messages on top.  Without
   this guard the bug-fix in (3) would itself produce a duplicated system
   prompt and a duplicated conversation history.
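The two-phase idea can be sketched as: stage everything in locals, validate, and only then return (commit) the results, so no partial state survives a failure. A hypothetical simplification of what `_convert_messages` guards against:

```python
def convert_from_db(rows):
    """Phase 1: stage into locals. Phase 2: commit atomically.
    On any failure or an unusable snapshot, commit nothing, so the
    fallback path never stacks in-memory messages on partial state."""
    staged_system, staged_messages = None, []
    try:
        for row in rows:
            if row["role"] == "system":
                staged_system = row["content"]
            else:
                staged_messages.append(row)
    except (KeyError, TypeError):
        return None, []  # abort: nothing was committed
    if not staged_messages:  # snapshot empty or system-only
        return None, []
    return staged_system, staged_messages  # commit both together
```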

Tests added (18 cases, all green):

- tests/provider/test_google_gemini_fixes.py – chunk separation,
  thinkingConfig / max_tokens forwarding, session_id propagation, and
  DB-cache state-pollution edge cases.
- tests/session/test_runner_chunk_handling.py – consumer loop preserves
  text and tool_calls in reasoning-bearing chunks, end-to-end transitions
  between reasoning/text/tool blocks, and usage-only chunks.

Made-with: Cursor
@duguwanglong duguwanglong merged commit 473aced into AgentFlocks:main Apr 24, 2026
2 checks passed