Fix #5878: Preserve reasoning_content from DeepSeek thinking mode in conversation history #5880
Extract reasoning_content from the litellm response and store it on the LLM instance so that executors can propagate it into conversation history, as required by the DeepSeek API.

Changes:
- LLM._handle_non_streaming_response: extract reasoning_content from the response message and store it as self.reasoning_content
- LLM.call: reset reasoning_content at the start of each call
- format_message_for_llm: accept optional reasoning_content param; include it in assistant messages only
- LLMMessage TypedDict: add reasoning_content field
- CrewAgentExecutor: pass reasoning_content through _append_message for both sync and async loops (ReAct + native tools)
- AgentExecutor (experimental): same propagation in _append_message_to_state for native tools path

Tests: 13 new tests covering LLM extraction, format_message_for_llm, and executor integration.

Co-Authored-By: João <joao@crewai.com>
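The assistant-only rule described for format_message_for_llm can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `SketchMessage` and `format_message_sketch` are hypothetical stand-ins for `LLMMessage` and `format_message_for_llm`, and the real signatures may differ.

```python
from __future__ import annotations

from typing import TypedDict


class SketchMessage(TypedDict, total=False):
    """Hypothetical stand-in for LLMMessage; not the real crewai type."""

    role: str
    content: str | None
    reasoning_content: str | None


def format_message_sketch(
    text: str,
    role: str = "user",
    reasoning_content: str | None = None,
) -> SketchMessage:
    """Build a chat message, attaching reasoning_content to assistant turns only."""
    message: SketchMessage = {"role": role, "content": text}
    # User and system messages never carry reasoning_content, even if one is passed in.
    if role == "assistant" and reasoning_content is not None:
        message["reasoning_content"] = reasoning_content
    return message
```

Omitting the key entirely when no reasoning is available, rather than storing `None`, keeps messages for other providers unchanged.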
📝 Walkthrough
This PR adds end-to-end support for capturing and propagating LLM
Changes
Reasoning Content Propagation
Could you provide an example of working Python code?
from typing import Any
from unittest.mock import MagicMock, patch

import pytest
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
lib/crewai/src/crewai/agents/crew_agent_executor.py (1)
531-557: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Preserve reasoning_content for native tool-call turns too.
reasoning_content is only propagated for terminal assistant responses in these branches. When the model returns tool calls, the assistant tool-call message path still omits reasoning_content, so that turn's reasoning is lost before the next request.
Suggested fix
@@
-    def _append_assistant_tool_calls_message(
-        self,
-        parsed_calls: list[tuple[str, str, str | dict[str, Any]]],
-    ) -> None:
+    def _append_assistant_tool_calls_message(
+        self,
+        parsed_calls: list[tuple[str, str, str | dict[str, Any]]],
+        reasoning_content: str | None = None,
+    ) -> None:
@@
         assistant_message: LLMMessage = {
             "role": "assistant",
             "content": None,
@@
         }
+        if reasoning_content is not None:
+            assistant_message["reasoning_content"] = reasoning_content
         self.messages.append(assistant_message)
@@
-        self._append_assistant_tool_calls_message(
+        self._append_assistant_tool_calls_message(
             [
                 (call_id, func_name, func_args)
                 for call_id, func_name, func_args, _ in execution_plan
-            ]
+            ],
+            reasoning_content=self._get_llm_reasoning_content(),
         )
@@
-        self._append_assistant_tool_calls_message([(call_id, func_name, func_args)])
+        self._append_assistant_tool_calls_message(
+            [(call_id, func_name, func_args)],
+            reasoning_content=self._get_llm_reasoning_content(),
+        )
Also applies to: 1348-1375
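As a standalone illustration of the suggestion above, here is a hedged sketch of an assistant tool-call message that optionally carries reasoning_content. `build_tool_call_message` is a hypothetical helper, and the OpenAI-style `tool_calls` shape is an assumption; the executor's real message layout may differ.

```python
from __future__ import annotations

import json
from typing import Any


def build_tool_call_message(
    parsed_calls: list[tuple[str, str, str | dict[str, Any]]],
    reasoning_content: str | None = None,
) -> dict[str, Any]:
    """Build an assistant tool-call message (hypothetical helper, assumed shape)."""
    message: dict[str, Any] = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": call_id,
                "type": "function",
                "function": {
                    "name": func_name,
                    # Dict arguments are serialized to a JSON string; strings pass through.
                    "arguments": func_args
                    if isinstance(func_args, str)
                    else json.dumps(func_args),
                },
            }
            for call_id, func_name, func_args in parsed_calls
        ],
    }
    # Attach the reasoning for this turn only when the model actually returned some.
    if reasoning_content is not None:
        message["reasoning_content"] = reasoning_content
    return message
```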
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/src/crewai/agents/crew_agent_executor.py` around lines 531 - 557, The code currently only attaches reasoning_content for terminal assistant responses; update the branches that create assistant/tool-call turns so they also capture reasoning = self._get_llm_reasoning_content() and pass reasoning_content=reasoning into _append_message (and keep calling _invoke_step_callback(formatted_answer) and _show_logs as before) for all paths that construct formatted_answer (including the BaseModel branch and native tool-call branches), ensuring the same pattern is applied where assistant/tool messages are emitted (see functions/methods: _get_llm_reasoning_content, _invoke_step_callback, _append_message, _show_logs and classes/values: AgentFinish, BaseModel; also apply the same change to the other occurrence around the 1348-1375 region).
lib/crewai/tests/llms/litellm/test_reasoning_content.py (1)
1-268: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Formatting check is failing in CI.
`uv run ruff format --check lib/` is currently failing for this PR. Please run the formatter and commit the resulting changes so lint passes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/tests/llms/litellm/test_reasoning_content.py` around lines 1 - 268, Formatting check failing in CI because the file lib/crewai/tests/llms/litellm/test_reasoning_content.py (and possibly other files under lib/) is not formatted; run the project's formatter and commit the results: execute the formatter command used by CI (e.g., `uv run ruff format --check lib/` to verify, then `uv run ruff format lib/` or the repo's preferred formatting command), stage and commit the updated files (including test_reasoning_content.py) so the lint/format check passes; no code logic changes required—only apply and commit the formatter's edits.
🧹 Nitpick comments (1)
lib/crewai/tests/llms/litellm/test_reasoning_content.py (1)
236-243: ⚡ Quick win
Make invoke-loop assertions order-independent.
Both tests currently assert on assistant_msgs[0], which can become flaky if assistant message ordering changes while behavior remains correct. Assert with any(...)/all(...) over assistant messages instead.
Proposed diff
@@
-        assert len(assistant_msgs) >= 1
-        assert (
-            assistant_msgs[0].get("reasoning_content")
-            == "Let me reason step by step..."
-        )
+        assert assistant_msgs
+        assert any(
+            m.get("reasoning_content") == "Let me reason step by step..."
+            for m in assistant_msgs
+        )
@@
-        assert len(assistant_msgs) >= 1
-        assert "reasoning_content" not in assistant_msgs[0]
+        assert assistant_msgs
+        assert all("reasoning_content" not in m for m in assistant_msgs)
Also applies to: 263-267
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/tests/llms/litellm/test_reasoning_content.py` around lines 236 - 243, The assertions that check assistant_msgs[0].get("reasoning_content") are order-dependent and flaky; update the tests to assert order-independently by replacing checks on assistant_msgs[0] with a membership-style assertion using any(...) (e.g., assert any(m.get("reasoning_content") == "Let me reason step by step..." for m in assistant_msgs)) and do the same for the other occurrence (the block around lines 263-267) so both tests verify that at least one assistant message contains the expected reasoning_content instead of assuming it is the first message.
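The order-independent assertion style proposed in this nitpick can be tried in isolation. The sketch below uses plain dict messages; `assistant_messages`, `has_reasoning`, and `lacks_reasoning` are hypothetical helper names that do not appear in the test file.

```python
from __future__ import annotations

from typing import Any


def assistant_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return only the assistant turns from a conversation history."""
    return [m for m in messages if m.get("role") == "assistant"]


def has_reasoning(messages: list[dict[str, Any]], expected: str) -> bool:
    """True if at least one assistant message carries the expected reasoning_content."""
    msgs = assistant_messages(messages)
    return bool(msgs) and any(m.get("reasoning_content") == expected for m in msgs)


def lacks_reasoning(messages: list[dict[str, Any]]) -> bool:
    """True if assistant messages exist and none of them has a reasoning_content key."""
    msgs = assistant_messages(messages)
    return bool(msgs) and all("reasoning_content" not in m for m in msgs)
```

Both helpers require at least one assistant message, because `all(...)` over an empty list is vacuously true and would let a test pass without checking anything.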
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/agents/crew_agent_executor.py`:
- Around line 1492-1509: The file fails the project's formatter; run the
configured formatter (ruff format / black as configured) or apply the project's
formatting rules to this function (_append_message) and nearby calls to
format_message_for_llm so the file matches ruff/black expectations; specifically
reformat the signature, docstring and the self.messages.append(...) call to
match the repo's style, then re-run ruff format --check to verify the CI issue
is resolved.
In `@lib/crewai/src/crewai/experimental/agent_executor.py`:
- Around line 1333-1344: Final-answer branch only sets reasoning_content;
tool-call branches skip it causing loss of reasoning. Update the branches that
handle LLM tool calls so that when creating and appending the assistant
tool-call message and when setting state for tool-invocation you also call
_get_llm_reasoning_content() and pass that reasoning_content into
_append_message_to_state (and any AgentAction/assistant tool-call constructors)
just like the AgentFinish path does; look for usages in methods around
_get_llm_reasoning_content, _append_message_to_state, AgentFinish and
AgentAction/assistant tool-call creation and ensure every path that adds an
assistant/tool message forwards the reasoning_content parameter.
- Around line 2824-2841: The file fails ruff formatting; run the project
formatter on the affected function _append_message_to_state (and surrounding
block) so it matches the repository's style (e.g., run ruff format or the
configured formatter/formatter config) and ensure the call to
format_message_for_llm stays properly indented and wrapped to satisfy ruff/black
rules; after formatting, re-run ruff format --check to confirm the file
(including _append_message_to_state and its use of format_message_for_llm) no
longer reports changes.
In `@lib/crewai/src/crewai/llm.py`:
- Around line 1236-1243: The async code path doesn't reset or extract
reasoning_content like the sync path does, causing stale values; update acall()
and _ahandle_non_streaming_response() to mirror the sync logic used in call() by
resetting self.reasoning_content before the call and extracting
reasoning_content from response_message (using getattr(response_message,
"reasoning_content", None) or response_message.get("reasoning_content") if it
has get) after receiving the non-streaming response so async flows receive the
same reasoning_content handling as sync flows.
In `@lib/crewai/src/crewai/utilities/types.py`:
- Line 30: The file contains a ruff formatting violation around the type
declaration reasoning_content: NotRequired[str | None]; run the ruff formatter
(uv run ruff format) to reformat the code and commit the resulting changes so CI
passes, ensuring the declaration for reasoning_content and surrounding
imports/whitespace follow the project's ruff style rules.
📒 Files selected for processing (6)
- lib/crewai/src/crewai/agents/crew_agent_executor.py
- lib/crewai/src/crewai/experimental/agent_executor.py
- lib/crewai/src/crewai/llm.py
- lib/crewai/src/crewai/utilities/agent_utils.py
- lib/crewai/src/crewai/utilities/types.py
- lib/crewai/tests/llms/litellm/test_reasoning_content.py
         reasoning = self._get_llm_reasoning_content()

         if isinstance(answer, BaseModel):
             self.state.current_answer = AgentFinish(
                 thought="",
                 output=answer,
                 text=answer.model_dump_json(),
             )
             self._invoke_step_callback(self.state.current_answer)
-            self._append_message_to_state(answer.model_dump_json())
+            self._append_message_to_state(
+                answer.model_dump_json(), reasoning_content=reasoning
+            )
Tool-call responses still lose reasoning_content in native flow.
This propagation only covers final-answer branches. If the LLM returns tool calls, routing exits before using reasoning_content, and the assistant tool-call message later added to state does not include it.
Suggested fix
@@
-        # Check if the response is a list of tool calls
+        reasoning = self._get_llm_reasoning_content()
+
+        # Check if the response is a list of tool calls
         if isinstance(answer, list) and answer and self._is_tool_call_list(answer):
             # Store tool calls for sequential processing
             self.state.pending_tool_calls = list(answer)
             return "native_tool_calls"
-
-        reasoning = self._get_llm_reasoning_content()
@@
         if tool_calls_to_report:
             assistant_message: LLMMessage = {
                 "role": "assistant",
                 "content": None,
                 "tool_calls": tool_calls_to_report,
             }
+            reasoning = self._get_llm_reasoning_content()
+            if reasoning is not None:
+                assistant_message["reasoning_content"] = reasoning
             if all(type(tc).__qualname__ == "Part" for tc in pending_tool_calls):
                 assistant_message["raw_tool_call_parts"] = list(pending_tool_calls)
             self.state.messages.append(assistant_message)
Also applies to: 1355-1367
        # Store reasoning_content for models that return it (e.g. DeepSeek thinking mode)
        self.reasoning_content = getattr(
            response_message, "reasoning_content", None
        ) or (
            response_message.get("reasoning_content")
            if hasattr(response_message, "get")
            else None
        )
Async path is missing reasoning_content reset/extraction parity.
Line 1754 resets state only in call(), and Lines 1236-1243 extract only in sync non-streaming. acall() / _ahandle_non_streaming_response() currently don’t mirror this, so async flows can miss new reasoning_content or carry stale values across turns.
Suggested parity patch
@@
     async def _ahandle_non_streaming_response(
@@
         response_message = cast(Choices, cast(ModelResponse, response).choices)[
             0
         ].message
         text_response = response_message.content or ""
+
+        # Store reasoning_content for models that return it (e.g. DeepSeek thinking mode)
+        self.reasoning_content = getattr(
+            response_message, "reasoning_content", None
+        ) or (
+            response_message.get("reasoning_content")
+            if hasattr(response_message, "get")
+            else None
+        )
@@
     async def acall(
@@
-        with llm_call_context() as call_id:
+        self.reasoning_content: str | None = None
+        with llm_call_context() as call_id:
Also applies to: 1754-1754
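To make the parity concern concrete, the sketch below shows a toy class whose sync and async entry points both reset and then store reasoning_content. `SketchLLM` and its canned responses are hypothetical; the real `LLM.call()` / `LLM.acall()` take arguments and talk to a provider, which is omitted here.

```python
from __future__ import annotations

import asyncio
from types import SimpleNamespace
from typing import Any


class SketchLLM:
    """Toy stand-in for an LLM wrapper; replays canned response messages."""

    def __init__(self, responses: list[Any]) -> None:
        self._responses = list(responses)
        self.reasoning_content: str | None = None

    def _store_reasoning(self, response_message: Any) -> None:
        # Shared by both paths so the sync and async behavior cannot drift apart.
        self.reasoning_content = getattr(response_message, "reasoning_content", None)

    def call(self) -> str:
        self.reasoning_content = None  # reset so a previous turn cannot leak through
        message = self._responses.pop(0)
        self._store_reasoning(message)
        return message.content or ""

    async def acall(self) -> str:
        self.reasoning_content = None  # same reset as the sync path
        await asyncio.sleep(0)  # placeholder for awaiting the provider
        message = self._responses.pop(0)
        self._store_reasoning(message)
        return message.content or ""
```

Routing both paths through one helper is a design choice of this sketch; the suggested patch instead duplicates the extraction in the async handler, which achieves the same effect.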
    name: NotRequired[str]
    tool_calls: NotRequired[list[dict[str, Any]]]
    raw_tool_call_parts: NotRequired[list[Any]]
    reasoning_content: NotRequired[str | None]
Fix ruff formatting before merge (CI is failing).
Pipeline is failing on uv run ruff format --check lib/. Please run formatter and commit the result to unblock merge.
Summary
Fixes #5878. DeepSeek V4 models return `reasoning_content` alongside `content` in thinking mode. The DeepSeek API requires this `reasoning_content` to be passed back in subsequent API calls for multi-turn conversations. Previously, CrewAI discarded `reasoning_content` from LLM responses, causing DeepSeek API errors in multi-turn conversations.

This PR:
- Extracts `reasoning_content` from the litellm response in `LLM._handle_non_streaming_response()` and stores it on the LLM instance
- Resets `reasoning_content` at the start of each `LLM.call()` to avoid stale values
- Propagates `reasoning_content` into assistant messages via `format_message_for_llm()`, `_append_message()`, and `_append_message_to_state()`
- Adds `reasoning_content` as an optional field on the `LLMMessage` TypedDict

Files changed:
- `lib/crewai/src/crewai/llm.py`: extract + store reasoning_content
- `lib/crewai/src/crewai/utilities/types.py`: add field to LLMMessage
- `lib/crewai/src/crewai/utilities/agent_utils.py`: format_message_for_llm now accepts reasoning_content
- `lib/crewai/src/crewai/agents/crew_agent_executor.py`: propagate in all sync/async loops
- `lib/crewai/src/crewai/experimental/agent_executor.py`: propagate in native tools path
- `lib/crewai/tests/llms/litellm/test_reasoning_content.py`: 13 new tests

Review & Testing Checklist for Human
- `reasoning_content` is correctly extracted from a real DeepSeek API response (the mock tests verify the extraction logic, but a manual test with `deepseek/deepseek-reasoner` would confirm end-to-end)
- `reasoning_content` should be `None` and assistant messages should not contain the field
- `reasoning_content` field in `LLMMessage` TypedDict doesn't break any downstream serialization or message processing

Notes
- Extraction tries both access styles (`getattr` + `.get()`) to handle both attribute-style and dict-style access patterns from litellm's Message object
- `reasoning_content` is only added to assistant messages (not user/system) per the DeepSeek API specification
- `CrewAgentExecutor` is deprecated but still receives the fix for backward compatibility; the experimental `AgentExecutor` is also updated

Link to Devin session: https://app.devin.ai/sessions/584ca5a82022414490972068929d4d53
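The attribute-then-dict extraction mentioned in the notes can be sketched as a small standalone function. `extract_reasoning_content` is a hypothetical name, and `SimpleNamespace` and a plain dict stand in for litellm's Message object, whose exact behavior is assumed rather than verified here.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


def extract_reasoning_content(response_message: Any) -> str | None:
    """Read reasoning_content via attribute access first, then dict-style access."""
    value = getattr(response_message, "reasoning_content", None)
    if value:
        return value
    # Fall back to mapping-style access for objects that expose .get().
    if hasattr(response_message, "get"):
        return response_message.get("reasoning_content")
    return None


# Example inputs: an attribute-style object and a dict-style message.
attr_style = SimpleNamespace(content="4", reasoning_content="2 + 2 is 4")
dict_style = {"content": "4", "reasoning_content": "2 + 2 is 4"}
```

Messages from models that do not return reasoning yield `None`, so callers can skip adding the field.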