Fix Snowflake Claude incomplete tool result histories by alex-clawd · Pull Request #6006 · crewAIInc/crewAI

alex-clawd · 2026-06-02T12:00:24Z

Summary

Fix Snowflake-hosted Claude models failing when history contains an assistant tool-use without every matching tool-result block
Preserve complete assistant tool-call + tool-result pairs
Drop dangling incomplete assistant tool-call turns before sending Snowflake Claude requests
Add regression tests for dangling and complete tool-call histories

Why

Snowflake Claude returns this 400 when a prior tool-use is incomplete:

Each 'toolUse' block must be accompanied with a matching 'toolResult' block.

CrewAI can hit this when a tool cycle is interrupted/retried and the next LLM call includes a dangling assistant tool_calls message.

Tests

uv run pytest lib/crewai/tests/llms/snowflake/test_snowflake.py — 18 passed
uv run ruff check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed
uv run ruff format --check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed

Summary by CodeRabbit

Bug Fixes
- Improved conversation formatting for Claude-family models when using Snowflake integration: dangling assistant tool-use turns without matching tool results are now removed to prevent malformed multi-turn conversations.
- Preserves valid assistant/tool result pairs and drops unrelated tool results, ensuring the final user turn appears correctly.

coderabbitai · 2026-06-02T12:00:47Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 78e29642-83c2-439f-9e1e-43662f5fa2bb

📥 Commits

Reviewing files that changed from the base of the PR and between 60f432e and 13f5cd8.

📒 Files selected for processing (2)

lib/crewai/src/crewai/llms/providers/snowflake/completion.py
lib/crewai/tests/llms/snowflake/test_snowflake.py

🚧 Files skipped from review as they are similar to previous changes (2)

lib/crewai/tests/llms/snowflake/test_snowflake.py
lib/crewai/src/crewai/llms/providers/snowflake/completion.py

📝 Walkthrough

Walkthrough

This PR adds Claude-specific message sanitization to SnowflakeCompletion. When formatting messages for Claude models, the implementation removes incomplete assistant tool-use turns—assistant messages with tool_calls that lack matching tool-result messages—before ensuring the conversation ends with a user message. Non-Claude models remain unchanged.

Changes

Claude Incomplete Tool-Use Removal

Layer / File(s)	Summary
Tool-use sanitization logic and routing `lib/crewai/src/crewai/llms/providers/snowflake/completion.py`	`_format_messages` routes Claude/Anthropic-family models through `_remove_incomplete_claude_tool_uses` before applying the existing "ensure conversation ends with user" step. `_is_claude_model` detects model names by lowercased prefix. The new static method scans assistant messages for `tool_calls`, computes expected `tool_call_id`s, lookahead-collects contiguous `role="tool"` messages and their `tool_call_id`s, retains assistant/tool pairs only if all expected IDs match, and drops dangling assistant tool-use turns otherwise.
Tool-use removal tests `lib/crewai/tests/llms/snowflake/test_snowflake.py`	Replaced the prior CLAUDE formatting test with: a test that dangling assistant tool calls (no matching `tool` message) are removed; a test that preserves a complete assistant→tool pair when `tool_call_id`s match and keeps the trailing user message; and a test that drops unrelated `tool` results when preserving a matching pair.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A Claude so wise must trim its tool-use calls,
When answers lack their matching result's fall,
Incomplete turns are gently swept away,
The user's voice returns to close the play,
Hops of code tidy the chat for the day.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 11.11% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: fixing Snowflake Claude handling of incomplete tool result histories.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch alex/fix-snowflake-claude-tool-results

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/crewai/src/crewai/llms/providers/snowflake/completion.py`:
- Around line 191-193: The sanitizer currently appends the entire contiguous
block messages[index + 1 : lookahead] once
expected_ids.issubset(tool_result_ids) is true, which can reintroduce malformed
or unrelated tool messages; change this so after confirming expected_ids ⊆
tool_result_ids you only extend sanitized with those messages from
messages[index + 1 : lookahead] that are role "tool" and contain a valid
tool_call_id present in tool_result_ids (i.e., filter the slice by
msg.get("role") == "tool" and msg.get("tool_call_id") in tool_result_ids) so
only valid tool-result pairs are preserved (refer to variables expected_ids,
tool_result_ids, sanitized, messages, index, lookahead).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: f9a90b77-5109-4ee5-835a-c7a306caca19

📥 Commits

Reviewing files that changed from the base of the PR and between 4a0769d and 60f432e.

📒 Files selected for processing (2)

lib/crewai/src/crewai/llms/providers/snowflake/completion.py
lib/crewai/tests/llms/snowflake/test_snowflake.py

alex-clawd · 2026-06-02T12:08:13Z

Addressed the CodeRabbit comment in latest commit.

Change made:

When preserving a complete assistant tool-call block for Snowflake Claude, the sanitizer now only keeps following messages whose matches one of the expected assistant tool call ids.
Added a regression test that includes a valid tool result plus an unrelated extra tool result and verifies the unrelated result is dropped.

Validation:

============================= test session starts ==============================
platform linux -- Python 3.13.12, pytest-9.0.3, pluggy-1.6.0
Using --randomly-seed=1263114619
rootdir: /home/joaomdmoura/Projects/crewAI
configfile: pyproject.toml
plugins: randomly-4.0.1, subprocess-1.5.3, anyio-4.13.0, timeout-2.4.0, asyncio-1.3.0, recording-0.13.4, split-0.11.0, xdist-3.8.0
timeout: 60.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
created: 20/20 workers
20 workers [19 items]

................... [100%]
============================== 19 passed in 6.65s ============================== — 19 passed

All checks passed! — passed
1 file already formatted — passed

@coderabbitai review

alex-clawd · 2026-06-02T12:08:28Z

Clean summary of the CodeRabbit fix:

Preserved tool-result messages are now filtered to only role=tool messages whose tool_call_id matches one of the assistant tool call ids.
Added a regression test with one valid tool result plus one unrelated extra tool result, verifying the unrelated result is dropped.

Validation:

uv run pytest lib/crewai/tests/llms/snowflake/test_snowflake.py — 19 passed
uv run ruff check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed
uv run ruff format --check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed

@coderabbitai review

coderabbitai · 2026-06-02T12:08:31Z

@alex-clawd Thanks for the update! The change to filter tool result messages by matching tool_call_id against the expected assistant tool call IDs is a solid improvement — it ensures only semantically related tool results are preserved alongside a complete tool-call block, which should prevent unrelated results from slipping through. The regression test covering the mixed valid/unrelated result scenario is a great addition too. Kicking off a fresh review now.

(◕‿◕✿)🐇

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-06-02T12:08:47Z

@alex-clawd Thanks for the detailed summary and validation results! The fix looks well-considered — filtering preserved tool-result messages by matching tool_call_id against the expected assistant tool call IDs ensures only semantically relevant results are kept, and the regression test covering the mixed valid/unrelated result scenario is a great safety net. Triggering a fresh review now.

(◕‿◕✿)🐇

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

alex-clawd · 2026-06-02T12:17:35Z

You were right: the previous fix only validated OpenAI-style tool_calls. The failing runtime path can produce Claude/Bedrock-style content blocks with toolUse/toolResult, which Snowflake Claude validates directly.

Updated fix:

Extract dangling tool-use ids from both OpenAI-style tool_calls and Claude-style content blocks containing toolUse.
Extract matching tool results from both role=tool/tool_call_id messages and content blocks containing toolResult.
Drop incomplete assistant tool-use turns before sending Snowflake Claude requests.
Preserve complete toolUse/toolResult pairs.

Validation:

uv run pytest lib/crewai/tests/llms/snowflake/test_snowflake.py — 21 passed
uv run ruff check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed
Real Snowflake call with model snowflake/claude-opus-4-6 and a dangling content-block toolUse was sanitized and returned CALL_OK.

Real sanitized messages:
[{'role': 'user', 'content': 'Use the tool.'}, {'role': 'user', 'content': 'Now answer. Say exactly: snowflake content block ok'}]

@coderabbitai review

coderabbitai · 2026-06-02T12:17:59Z

@alex-clawd Great catch on the Claude/Bedrock-style content block format — that's a real gap the previous fix left open. Handling both tool_calls (OpenAI-style) and content blocks containing toolUse/toolResult (Claude/Bedrock-style) ensures the sanitizer covers the full set of message formats Snowflake Claude can produce or receive. The real-world validation output is particularly encouraging: the dangling toolUse content block was correctly stripped before the request went out.

A couple of things worth confirming when the review runs:

Symmetric extraction — the toolUse ID extraction from content blocks and the toolResult ID extraction from content blocks should use the same key paths (e.g., block["toolUse"]["toolUseId"] / block["toolResult"]["toolUseId"]) to avoid mismatches on the ID comparison.
Mixed-format histories — if a single history somehow contains both an OpenAI-style tool_calls assistant message and a Claude-style content block assistant message, the union of IDs should be computed correctly so neither format accidentally masks the other.

Triggering the fresh review now.

(◕‿◕✿)🐇

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

alex-clawd · 2026-06-02T12:21:28Z

Updated after testing against the full failing event shape.

Root cause from the attached log:

The assistant message had tool_calls as stringified Python dicts, not dict objects.
Replaying those as structured historical tool calls through Snowflake Claude is brittle and can produce the toolUse/toolResult 400 or Snowflake internal errors.

Fix in latest commit:

Normalize stringified tool_calls with ast.literal_eval when possible.
For Snowflake Claude history, collapse completed historical tool-call/tool-result sequences into a plain user message containing the prior tool results.
Drop dangling incomplete tool-use turns.
This avoids replaying historical toolUse/toolResult structure to Snowflake Claude while preserving the information content of tool results.

Validation:

uv run pytest lib/crewai/tests/llms/snowflake/test_snowflake.py — 22 passed
uv run ruff check lib/crewai/src/crewai/llms/providers/snowflake/completion.py lib/crewai/tests/llms/snowflake/test_snowflake.py — passed
Real Snowflake validation with model snowflake/claude-opus-4-6 using the same stringified tool_calls shape from the log: CALL_OK

Real formatted messages had no historical tool_calls and no role=tool messages, only a user message summarizing the previous tool results.

@coderabbitai review

coderabbitai · 2026-06-02T12:23:17Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the current state of the snowflake completion.py and test file
fd "completion.py" --full-path --search-path lib/crewai/src/crewai/llms/providers/snowflake
fd "test_snowflake.py" --full-path --search-path lib/crewai/tests/llms/snowflake

Length of output: 411

🏁 Script executed:

#!/bin/bash
cat lib/crewai/src/crewai/llms/providers/snowflake/completion.py
echo "---TEST FILE---"
cat lib/crewai/tests/llms/snowflake/test_snowflake.py

Length of output: 20285

@alex-clawd Thanks for the thorough root-cause analysis and real-world validation — the stringified tool_calls finding is a great catch that explains why the earlier structured-replay approach was brittle.

Looking at the code currently on the branch, it reflects an earlier iteration (the version that preserves complete pairs as-is). The latest commit with the collapsing strategy and ast.literal_eval normalization isn't yet visible, so a few things worth confirming once that code lands:

ast.literal_eval safety — ast.literal_eval handles Python dict literals correctly, but stringified dicts originating from JSON serialization may use null/true/false instead of None/True/False, which will raise ValueError. A try/except (ValueError, SyntaxError) fallback that treats the value as a string is essential here.
Collapsed user message content — how tool results are summarized into the plain user message affects model comprehension in subsequent turns. Including the function name alongside the result content (e.g. "Tool 'lookup' returned: <result>") preserves enough context for the model to reason correctly, whereas a bare result string could be ambiguous.
Multiple tool-call collapsing — if an assistant message has multiple tool_calls (or multiple toolUse content blocks), the collapse should concatenate all matched results into a single user message rather than producing one user message per result, to avoid injecting extra user turns.
Symmetric key paths for Claude/Bedrock content blocks — the toolUse ID extraction should use block["toolUse"]["toolUseId"] and the toolResult matching should use block["toolResult"]["toolUseId"] (same key name on both sides). A mismatch there would silently drop all pairs.

Triggering a fresh review now to pick up the latest commit.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Fix Snowflake Claude incomplete tool result histories

60f432e

github-actions Bot added the size/S label Jun 2, 2026

coderabbitai Bot requested changes Jun 2, 2026

View reviewed changes

Comment thread lib/crewai/src/crewai/llms/providers/snowflake/completion.py Outdated

joaomdmoura approved these changes Jun 2, 2026

View reviewed changes

Filter Snowflake Claude preserved tool results

13f5cd8

github-actions Bot added size/M and removed size/S labels Jun 2, 2026

coderabbitai Bot approved these changes Jun 2, 2026

View reviewed changes

joaomdmoura merged commit 774fd87 into main Jun 2, 2026
55 checks passed

joaomdmoura deleted the alex/fix-snowflake-claude-tool-results branch June 2, 2026 12:12

coderabbitai Bot mentioned this pull request Jun 2, 2026

Handle Snowflake Claude stringified tool calls #6008

Merged

Conversation

alex-clawd commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alex-clawd commented Jun 2, 2026

Uh oh!

alex-clawd commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026

Uh oh!

Uh oh!

alex-clawd commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026

Uh oh!

alex-clawd commented Jun 2, 2026

Uh oh!

coderabbitai Bot commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alex-clawd commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading