Skip to content

Fix Snowflake Claude incomplete tool result histories#6006

Merged
joaomdmoura merged 2 commits into
mainfrom
alex/fix-snowflake-claude-tool-results
Jun 2, 2026
Merged

Fix Snowflake Claude incomplete tool result histories#6006
joaomdmoura merged 2 commits into
mainfrom
alex/fix-snowflake-claude-tool-results

Conversation

@alex-clawd
Copy link
Copy Markdown
Contributor

@alex-clawd alex-clawd commented Jun 2, 2026

Summary

  • Fix Snowflake-hosted Claude models failing when history contains an assistant tool-use without every matching tool-result block
  • Preserve complete assistant tool-call + tool-result pairs
  • Drop dangling incomplete assistant tool-call turns before sending Snowflake Claude requests
  • Add regression tests for dangling and complete tool-call histories

Why

Snowflake Claude returns this 400 when a prior tool-use is incomplete:

Each 'toolUse' block must be accompanied with a matching 'toolResult' block.

CrewAI can hit this when a tool cycle is interrupted/retried and the next LLM call includes a dangling assistant tool_calls message.

Tests

  • uv run pytest lib/crewai/tests/llms/snowflake/test_snowflake.py — 18 passed
  • uv run ruff check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed
  • uv run ruff format --check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed

Summary by CodeRabbit

  • Bug Fixes
    • Improved conversation formatting for Claude-family models when using Snowflake integration: dangling assistant tool-use turns without matching tool results are now removed to prevent malformed multi-turn conversations.
    • Preserves valid assistant/tool result pairs and drops unrelated tool results, ensuring the final user turn appears correctly.

@github-actions github-actions Bot added the size/S label Jun 2, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 78e29642-83c2-439f-9e1e-43662f5fa2bb

📥 Commits

Reviewing files that changed from the base of the PR and between 60f432e and 13f5cd8.

📒 Files selected for processing (2)
  • lib/crewai/src/crewai/llms/providers/snowflake/completion.py
  • lib/crewai/tests/llms/snowflake/test_snowflake.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/crewai/tests/llms/snowflake/test_snowflake.py
  • lib/crewai/src/crewai/llms/providers/snowflake/completion.py

📝 Walkthrough

Walkthrough

This PR adds Claude-specific message sanitization to SnowflakeCompletion. When formatting messages for Claude models, the implementation removes incomplete assistant tool-use turns—assistant messages with tool_calls that lack matching tool-result messages—before ensuring the conversation ends with a user message. Non-Claude models remain unchanged.

Changes

Claude Incomplete Tool-Use Removal

Layer / File(s) Summary
Tool-use sanitization logic and routing
lib/crewai/src/crewai/llms/providers/snowflake/completion.py
_format_messages routes Claude/Anthropic-family models through _remove_incomplete_claude_tool_uses before applying the existing "ensure conversation ends with user" step. _is_claude_model detects model names by lowercased prefix. The new static method scans assistant messages for tool_calls, computes expected tool_call_ids, lookahead-collects contiguous role="tool" messages and their tool_call_ids, retains assistant/tool pairs only if all expected IDs match, and drops dangling assistant tool-use turns otherwise.
Tool-use removal tests
lib/crewai/tests/llms/snowflake/test_snowflake.py
Replaced the prior CLAUDE formatting test with: a test that dangling assistant tool calls (no matching tool message) are removed; a test that preserves a complete assistant→tool pair when tool_call_ids match and keeps the trailing user message; and a test that drops unrelated tool results when preserving a matching pair.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A Claude so wise must trim its tool-use calls,
When answers lack their matching result's fall,
Incomplete turns are gently swept away,
The user's voice returns to close the play,
Hops of code tidy the chat for the day.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: fixing Snowflake Claude handling of incomplete tool result histories.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch alex/fix-snowflake-claude-tool-results

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/llms/providers/snowflake/completion.py`:
- Around line 191-193: The sanitizer currently appends the entire contiguous
block messages[index + 1 : lookahead] once
expected_ids.issubset(tool_result_ids) is true, which can reintroduce malformed
or unrelated tool messages; change this so after confirming expected_ids ⊆
tool_result_ids you only extend sanitized with those messages from
messages[index + 1 : lookahead] that are role "tool" and contain a valid
tool_call_id present in tool_result_ids (i.e., filter the slice by
msg.get("role") == "tool" and msg.get("tool_call_id") in tool_result_ids) so
only valid tool-result pairs are preserved (refer to variables expected_ids,
tool_result_ids, sanitized, messages, index, lookahead).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f9a90b77-5109-4ee5-835a-c7a306caca19

📥 Commits

Reviewing files that changed from the base of the PR and between 4a0769d and 60f432e.

📒 Files selected for processing (2)
  • lib/crewai/src/crewai/llms/providers/snowflake/completion.py
  • lib/crewai/tests/llms/snowflake/test_snowflake.py

Comment thread lib/crewai/src/crewai/llms/providers/snowflake/completion.py Outdated
@alex-clawd
Copy link
Copy Markdown
Contributor Author

Addressed the CodeRabbit comment in latest commit.

Change made:

  • When preserving a complete assistant tool-call block for Snowflake Claude, the sanitizer now only keeps following messages whose matches one of the expected assistant tool call ids.
  • Added a regression test that includes a valid tool result plus an unrelated extra tool result and verifies the unrelated result is dropped.

Validation:

  • ============================= test session starts ==============================
    platform linux -- Python 3.13.12, pytest-9.0.3, pluggy-1.6.0
    Using --randomly-seed=1263114619
    rootdir: /home/joaomdmoura/Projects/crewAI
    configfile: pyproject.toml
    plugins: randomly-4.0.1, subprocess-1.5.3, anyio-4.13.0, timeout-2.4.0, asyncio-1.3.0, recording-0.13.4, split-0.11.0, xdist-3.8.0
    timeout: 60.0s
    timeout method: signal
    timeout func_only: False
    asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
    created: 20/20 workers
    20 workers [19 items]

................... [100%]
============================== 19 passed in 6.65s ============================== — 19 passed

  • All checks passed! — passed
  • 1 file already formatted — passed

@coderabbitai review

@github-actions github-actions Bot added size/M and removed size/S labels Jun 2, 2026
@alex-clawd
Copy link
Copy Markdown
Contributor Author

Clean summary of the CodeRabbit fix:

  • Preserved tool-result messages are now filtered to only role=tool messages whose tool_call_id matches one of the assistant tool call ids.
  • Added a regression test with one valid tool result plus one unrelated extra tool result, verifying the unrelated result is dropped.

Validation:

  • uv run pytest lib/crewai/tests/llms/snowflake/test_snowflake.py — 19 passed
  • uv run ruff check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed
  • uv run ruff format --check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

@alex-clawd Thanks for the update! The change to filter tool result messages by matching tool_call_id against the expected assistant tool call IDs is a solid improvement — it ensures only semantically related tool results are preserved alongside a complete tool-call block, which should prevent unrelated results from slipping through. The regression test covering the mixed valid/unrelated result scenario is a great addition too. Kicking off a fresh review now.

(◕‿◕✿)🐇

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

@alex-clawd Thanks for the detailed summary and validation results! The fix looks well-considered — filtering preserved tool-result messages by matching tool_call_id against the expected assistant tool call IDs ensures only semantically relevant results are kept, and the regression test covering the mixed valid/unrelated result scenario is a great safety net. Triggering a fresh review now.

(◕‿◕✿)🐇

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@joaomdmoura joaomdmoura merged commit 774fd87 into main Jun 2, 2026
55 checks passed
@joaomdmoura joaomdmoura deleted the alex/fix-snowflake-claude-tool-results branch June 2, 2026 12:12
@alex-clawd
Copy link
Copy Markdown
Contributor Author

You were right: the previous fix only validated OpenAI-style tool_calls. The failing runtime path can produce Claude/Bedrock-style content blocks with toolUse/toolResult, which Snowflake Claude validates directly.

Updated fix:

  • Extract dangling tool-use ids from both OpenAI-style tool_calls and Claude-style content blocks containing toolUse.
  • Extract matching tool results from both role=tool/tool_call_id messages and content blocks containing toolResult.
  • Drop incomplete assistant tool-use turns before sending Snowflake Claude requests.
  • Preserve complete toolUse/toolResult pairs.

Validation:

  • uv run pytest lib/crewai/tests/llms/snowflake/test_snowflake.py — 21 passed
  • uv run ruff check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed
  • Real Snowflake call with model snowflake/claude-opus-4-6 and a dangling content-block toolUse was sanitized and returned CALL_OK.

Real sanitized messages:
[{'role': 'user', 'content': 'Use the tool.'}, {'role': 'user', 'content': 'Now answer. Say exactly: snowflake content block ok'}]

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

@alex-clawd Great catch on the Claude/Bedrock-style content block format — that's a real gap the previous fix left open. Handling both tool_calls (OpenAI-style) and content blocks containing toolUse/toolResult (Claude/Bedrock-style) ensures the sanitizer covers the full set of message formats Snowflake Claude can produce or receive. The real-world validation output is particularly encouraging: the dangling toolUse content block was correctly stripped before the request went out.

A couple of things worth confirming when the review runs:

  • Symmetric extraction — the toolUse ID extraction from content blocks and the toolResult ID extraction from content blocks should use the same key paths (e.g., block["toolUse"]["toolUseId"] / block["toolResult"]["toolUseId"]) to avoid mismatches on the ID comparison.
  • Mixed-format histories — if a single history somehow contains both an OpenAI-style tool_calls assistant message and a Claude-style content block assistant message, the union of IDs should be computed correctly so neither format accidentally masks the other.

Triggering the fresh review now.

(◕‿◕✿)🐇

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@alex-clawd
Copy link
Copy Markdown
Contributor Author

Updated after testing against the full failing event shape.

Root cause from the attached log:

  • The assistant message had tool_calls as stringified Python dicts, not dict objects.
  • Replaying those as structured historical tool calls through Snowflake Claude is brittle and can produce the toolUse/toolResult 400 or Snowflake internal errors.

Fix in latest commit:

  • Normalize stringified tool_calls with ast.literal_eval when possible.
  • For Snowflake Claude history, collapse completed historical tool-call/tool-result sequences into a plain user message containing the prior tool results.
  • Drop dangling incomplete tool-use turns.
  • This avoids replaying historical toolUse/toolResult structure to Snowflake Claude while preserving the information content of tool results.

Validation:

  • uv run pytest lib/crewai/tests/llms/snowflake/test_snowflake.py — 22 passed
  • uv run ruff check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed
  • Real Snowflake validation with model snowflake/claude-opus-4-6 using the same stringified tool_calls shape from the log: CALL_OK

Real formatted messages had no historical tool_calls and no role=tool messages, only a user message summarizing the previous tool results.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the current state of the snowflake completion.py and test file
fd "completion.py" --full-path --search-path lib/crewai/src/crewai/llms/providers/snowflake
fd "test_snowflake.py" --full-path --search-path lib/crewai/tests/llms/snowflake

Length of output: 411


🏁 Script executed:

#!/bin/bash
cat lib/crewai/src/crewai/llms/providers/snowflake/completion.py
echo "---TEST FILE---"
cat lib/crewai/tests/llms/snowflake/test_snowflake.py

Length of output: 20285



@alex-clawd Thanks for the thorough root-cause analysis and real-world validation — the stringified tool_calls finding is a great catch that explains why the earlier structured-replay approach was brittle.

Looking at the code currently on the branch, it reflects an earlier iteration (the version that preserves complete pairs as-is). The latest commit with the collapsing strategy and ast.literal_eval normalization isn't yet visible, so a few things worth confirming once that code lands:

  1. ast.literal_eval safetyast.literal_eval handles Python dict literals correctly, but stringified dicts originating from JSON serialization may use null/true/false instead of None/True/False, which will raise ValueError. A try/except (ValueError, SyntaxError) fallback that treats the value as a string is essential here.

  2. Collapsed user message content — how tool results are summarized into the plain user message affects model comprehension in subsequent turns. Including the function name alongside the result content (e.g. "Tool 'lookup' returned: <result>") preserves enough context for the model to reason correctly, whereas a bare result string could be ambiguous.

  3. Multiple tool-call collapsing — if an assistant message has multiple tool_calls (or multiple toolUse content blocks), the collapse should concatenate all matched results into a single user message rather than producing one user message per result, to avoid injecting extra user turns.

  4. Symmetric key paths for Claude/Bedrock content blocks — the toolUse ID extraction should use block["toolUse"]["toolUseId"] and the toolResult matching should use block["toolResult"]["toolUseId"] (same key name on both sides). A mismatch there would silently drop all pairs.

Triggering a fresh review now to pick up the latest commit.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants