Fix ReAct agent parsing failures with reasoning models (<think> tags) #1667

Merged
rapids-bot[bot] merged 3 commits into NVIDIA:release/1.5 from yczhang-nv:yuchen-fix-5928823
Feb 26, 2026

Conversation

@yczhang-nv
Contributor

@yczhang-nv yczhang-nv commented Feb 26, 2026

Description

  • Reasoning models (e.g., nemotron-3-nano-30b-a3b) wrap their chain-of-thought in <think>...</think> tags, which breaks the ReAct output parser. This PR adds multi-layered handling to recover usable content from reasoning model outputs.
  • Preserve reasoning_content from streamed LLM chunks in _stream_llm, which previously discarded additional_kwargs when constructing the final AIMessage
  • Strip <think> tags from LLM output before parsing, with fallbacks: extract content from inside <think> tags, then fall back to reasoning_content from additional_kwargs
  • Accept direct (non-ReAct-formatted) answers from reasoning models instead of failing with ReActAgentParsingFailedError
  • Improve retry hints when content is empty, guiding the model toward producing a Final Answer:

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Bug Fixes

    • Streaming responses from reasoning models now capture and combine main content and separate reasoning streams reliably.
    • Agents more reliably recognize direct final answers even when not in strict ReAct format, reducing missed responses.
    • Improved fallback behavior that adds clear guidance prompts when parsing fails.
  • Improvements

    • More consistent normalization of agent outputs across initial and subsequent interaction flows for steadier results.

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
@yczhang-nv yczhang-nv self-assigned this Feb 26, 2026
@yczhang-nv yczhang-nv requested a review from a team as a code owner February 26, 2026 06:06
@yczhang-nv yczhang-nv added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Feb 26, 2026
@coderabbitai

coderabbitai bot commented Feb 26, 2026

Walkthrough

The base agent now collects streamed LLM content and reasoning content separately and returns both (reasoning via additional_kwargs). The ReAct agent normalizes outputs by stripping R1 think tags, falls back to extracting the content inside <think>...</think> or to reasoning_content, treats certain non-ReAct outputs as direct final answers, and adds fallback prompts on parse failures.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Base Streaming Handler: packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py | Reworked _stream_llm to collect content_parts and reasoning_parts from streaming events, return concatenated content, and attach reasoning_content in AIMessage.additional_kwargs when present. |
| ReAct Agent Logic: packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py | Added remove_r1_think_tags normalization and a fallback chain: normalized content → extract content inside <think>...</think> → use reasoning_content from additional_kwargs. Handles non-ReAct reasoning-model outputs as final answers when appropriate and appends fallback guidance prompts on parsing failures. |
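
The _stream_llm change summarized above can be illustrated with a minimal, self-contained sketch. This is not the code from base.py: the `Chunk` and `Message` classes are hypothetical stand-ins for streamed LLM chunks and the final AIMessage, and `collect_stream` is an illustrative name. It only assumes that each chunk may carry text in `content` and reasoning text under `additional_kwargs["reasoning_content"]`.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed LLM chunk."""
    content: str = ""
    additional_kwargs: dict[str, Any] = field(default_factory=dict)


@dataclass
class Message:
    """Hypothetical stand-in for the final AIMessage."""
    content: str
    additional_kwargs: dict[str, Any] = field(default_factory=dict)


def collect_stream(chunks: Iterable[Chunk]) -> Message:
    """Accumulate answer text and reasoning text separately."""
    content_parts: list[str] = []
    reasoning_parts: list[str] = []
    for chunk in chunks:
        if isinstance(chunk.content, str) and chunk.content:
            content_parts.append(chunk.content)
        reasoning = chunk.additional_kwargs.get("reasoning_content")
        if isinstance(reasoning, str) and reasoning:
            reasoning_parts.append(reasoning)

    # Attach reasoning_content only when some reasoning was streamed.
    extra: dict[str, Any] = {}
    if reasoning_parts:
        extra["reasoning_content"] = "".join(reasoning_parts)
    return Message(content="".join(content_parts), additional_kwargs=extra)


chunks = [
    Chunk(additional_kwargs={"reasoning_content": "The user wants "}),
    Chunk(additional_kwargs={"reasoning_content": "a sum."}),
    Chunk(content="Final Answer: "),
    Chunk(content="4"),
]
message = collect_stream(chunks)
print(message.content)
print(message.additional_kwargs)
```

Keeping the two streams in separate lists means a model that emits only reasoning chunks still yields a message whose reasoning can be recovered later, instead of an empty message with nothing to fall back on.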

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title is fully related to the main changes in the PR, which focus on fixing ReAct agent parsing failures with reasoning models that use <think> tags. |


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py (1)

183-193: Extract duplicated content normalization logic into a helper method.

This normalization logic is duplicated at lines 217-227. Consider extracting it into a private method to improve maintainability.

♻️ Proposed refactor

Add a helper method to the class:

def _normalize_reasoning_model_output(self, output_message: AIMessage) -> None:
    """Normalize output from reasoning models by stripping think tags and extracting content."""
    if not isinstance(output_message.content, str):
        return
    raw_content = output_message.content
    output_message.content = remove_r1_think_tags(raw_content)
    if not output_message.content.strip():
        think_match = re.search(r'<think>(.*?)</think>', raw_content, re.DOTALL)
        if think_match:
            output_message.content = think_match.group(1).strip()
    if not output_message.content.strip():
        reasoning = output_message.additional_kwargs.get('reasoning_content', '')
        if reasoning:
            output_message.content = reasoning

Then replace both occurrences with:

-                    if isinstance(output_message.content, str):
-                        raw_content = output_message.content
-                        output_message.content = remove_r1_think_tags(raw_content)
-                        if not output_message.content.strip():
-                            think_match = re.search(r'<think>(.*?)</think>', raw_content, re.DOTALL)
-                            if think_match:
-                                output_message.content = think_match.group(1).strip()
-                        if not output_message.content.strip():
-                            reasoning = output_message.additional_kwargs.get('reasoning_content', '')
-                            if reasoning:
-                                output_message.content = reasoning
+                    self._normalize_reasoning_model_output(output_message)
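
To make the three-step fallback order concrete, here is a self-contained sketch that can be run outside the toolkit. It rests on assumptions: `Message` is a hypothetical stand-in for AIMessage, and `strip_think_tags` is an assumed approximation of what remove_r1_think_tags does (its real implementation is not shown in this PR conversation).

```python
import re
from dataclasses import dataclass, field
from typing import Any

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


@dataclass
class Message:
    """Hypothetical stand-in for AIMessage."""
    content: Any
    additional_kwargs: dict[str, Any] = field(default_factory=dict)


def strip_think_tags(text: str) -> str:
    """Assumed behavior: drop every <think>...</think> block."""
    return THINK_BLOCK.sub("", text).strip()


def normalize(message: Message) -> None:
    """Apply the fallback chain in place."""
    if not isinstance(message.content, str):
        return
    raw = message.content
    # Step 1: keep whatever remains outside the think tags.
    message.content = strip_think_tags(raw)
    # Step 2: nothing outside the tags, so use the text inside them.
    if not message.content.strip():
        match = THINK_BLOCK.search(raw)
        if match:
            message.content = match.group(1).strip()
    # Step 3: still empty, so use the separately streamed reasoning.
    if not message.content.strip():
        reasoning = message.additional_kwargs.get("reasoning_content", "")
        if isinstance(reasoning, str) and reasoning.strip():
            message.content = reasoning.strip()


outside = Message("<think>plan</think>Final Answer: 4")
inside = Message("<think>Final Answer: 4</think>")
kwargs_only = Message("", {"reasoning_content": " Final Answer: 4 "})
non_string = Message(["part"])
for msg in (outside, inside, kwargs_only, non_string):
    normalize(msg)
    print(repr(msg.content))
```

Each of the first three messages ends up with the same content through a different step of the chain, while the non-string content is left untouched.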
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py`
around lines 183 - 193, Extract the duplicated content-normalization logic into
a private helper on the class (e.g., def _normalize_reasoning_model_output(self,
output_message: AIMessage) -> None) that returns early if output_message.content
is not a str, calls remove_r1_think_tags(raw_content), falls back to
re.search(r'<think>(.*?)</think>', raw_content, re.DOTALL) to extract inner text
if the stripped content is empty, and finally uses
output_message.additional_kwargs.get('reasoning_content', '') as the last
fallback; then replace both inline blocks (the one around output_message
handling at lines shown and the duplicate at 217-227) with calls to
self._normalize_reasoning_model_output(output_message) and add a short docstring
to the helper for clarity.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3892d75 and 063612d.

📒 Files selected for processing (2)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py (1)

182-192: Extract duplicated normalization/fallback logic into one helper.

The same block appears twice; centralizing it avoids drift and keeps first/subsequent cycle behavior consistent.

Refactor sketch
+    @staticmethod
+    def _normalize_agent_output_content(output_message: AIMessage) -> None:
+        if not isinstance(output_message.content, str):
+            return
+        raw_content = output_message.content
+        output_message.content = remove_r1_think_tags(raw_content)
+        if not output_message.content.strip():
+            think_match = re.search(r"<think>(.*?)</think>", raw_content, re.DOTALL)
+            if think_match:
+                output_message.content = think_match.group(1).strip()
+        if not output_message.content.strip():
+            reasoning = output_message.additional_kwargs.get("reasoning_content", "")
+            if isinstance(reasoning, str) and reasoning.strip():
+                output_message.content = reasoning.strip()
-                    if isinstance(output_message.content, str):
-                        ...
+                    self._normalize_agent_output_content(output_message)

Also applies to: 216-226

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py`
around lines 182 - 192, Extract the duplicated normalization/fallback logic that
processes output_message.content (calls remove_r1_think_tags, falls back to
<think> tag content, then to additional_kwargs['reasoning_content']) into a
single helper function (e.g., normalize_output_message_content(output_message)
or similar) and replace both code blocks (the one around output_message handling
at the current location and the other at the later occurrence) with calls to
that helper; ensure the helper accepts an OutputMessage-like object, performs
the remove_r1_think_tags -> check empty -> extract <think> -> check empty -> use
reasoning_content flow, and preserves trimming/strip behavior so first and
subsequent cycle behavior stays identical.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py`:
- Around line 182-192: The code assigns output_message.content = reasoning from
output_message.additional_kwargs['reasoning_content'] without validating its
type; guard that value in the blocks inside the remove_r1_think_tags flow and
the similar block later (the code around output_message, additional_kwargs and
reasoning_content) by checking isinstance(reasoning, str) before assigning, and
if it is not a str either skip the assignment or coerce safely (e.g., convert to
str(reasoning) only if that behavior is acceptable), ensuring downstream text
parsing always receives a string.
- Around line 281-290: The fallback that accepts parse-failed outputs as a final
answer (the block using ex.missing_action, content_str, AGENT_LOG_PREFIX,
state.messages, and state.final_answer) is too permissive because it only
rejects content that starts with "Thought:"; update the gating so you only
accept content as a final answer when it contains no ReAct markers anywhere.
Specifically, modify the condition around ex.missing_action to scan content_str
(case-insensitive) for any occurrences of ReAct tokens such as "thought:",
"action:", "observation:", "action input:" (anchored at line starts or
standalone tokens) and only proceed to log, append AIMessage, set
state.final_answer and return state when none of those markers are present;
otherwise continue allowing tools to run or raise the parse error path. Ensure
you use the existing variables (content_str, ex.missing_action, state,
AGENT_LOG_PREFIX) so the change is localized to that block.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 063612d and 5a62c65.

📒 Files selected for processing (1)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py

Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py (2)

189-192: ⚠️ Potential issue | 🟡 Minor

Guard reasoning_content type before assignment.

The additional_kwargs dictionary is provider-defined and may contain non-string values. Add a type check to prevent downstream parsing failures.

Suggested fix
                         if not output_message.content.strip():
                             reasoning = output_message.additional_kwargs.get('reasoning_content', '')
-                            if reasoning:
+                            if isinstance(reasoning, str) and reasoning.strip():
                                 output_message.content = reasoning
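
The difference between the plain truthiness check and the suggested guard is easiest to see on a few sample values. The sketch below is illustrative only: `truthy_only` and `guarded` are hypothetical helper names, and the sample dictionaries are invented to stand in for provider-defined additional_kwargs.

```python
from typing import Any, Optional


def truthy_only(additional_kwargs: dict[str, Any]) -> Any:
    """Original check: any truthy value is passed through."""
    reasoning = additional_kwargs.get("reasoning_content", "")
    return reasoning if reasoning else None


def guarded(additional_kwargs: dict[str, Any]) -> Optional[str]:
    """Suggested check: only a non-blank string is passed through."""
    reasoning = additional_kwargs.get("reasoning_content")
    if isinstance(reasoning, str) and reasoning.strip():
        return reasoning
    return None


samples = [
    {"reasoning_content": "Final Answer: 4"},       # usable string
    {"reasoning_content": "   "},                   # whitespace only
    {"reasoning_content": ["step 1", "step 2"]},    # non-string value
    {},                                             # key missing
]
for kwargs in samples:
    print(repr(truthy_only(kwargs)), repr(guarded(kwargs)))
```

With the truthiness check, the whitespace-only string and the list both become message content; a later call such as `content.strip()` would then raise AttributeError for the list. The guarded version returns None in both cases, so the content is left unchanged.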
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py`
around lines 189 - 192, The code assigns output_message.content = reasoning from
output_message.additional_kwargs['reasoning_content'] without validating type;
update the branch in the block handling output_message so you first fetch
reasoning = output_message.additional_kwargs.get('reasoning_content', None) and
only assign it if it is a non-empty string (e.g., isinstance(reasoning, str) and
reasoning.strip()), otherwise leave output_message.content unchanged (or coerce
safely to string only after explicit validation) to avoid downstream parsing
errors.

278-291: ⚠️ Potential issue | 🟡 Minor

Consider stricter validation for direct-answer fallback.

The current check only rejects content starting with Thought:, Question:, or Previous conversation. Content with Action: or Observation: markers on later lines could be incorrectly accepted as a final answer.

Since ex.missing_action is already true (meaning no valid action was parsed), this may be acceptable. However, if you want stricter validation to avoid accepting malformed ReAct outputs:

Suggested stricter check
                     content_str = str(output_message.content).strip()
-                    if (ex.missing_action and content_str and not re.match(
-                            r'\s*(thought\s*:|question\s*:|previous\s+conversation)', content_str, re.IGNORECASE)):
+                    react_marker_pattern = re.compile(
+                        r'^\s*(thought|action|action\s*input|observation)\s*:',
+                        re.IGNORECASE | re.MULTILINE,
+                    )
+                    if ex.missing_action and content_str and not react_marker_pattern.search(content_str):
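
The two gating conditions can be compared side by side with a small sketch. The regular expressions are copied from the diff above; the function names and sample strings are hypothetical, and ex.missing_action is assumed to be true throughout.

```python
import re

# Pattern from the existing check, applied with re.match (start of text only).
START_ONLY = re.compile(
    r"\s*(thought\s*:|question\s*:|previous\s+conversation)", re.IGNORECASE
)
# Pattern from the suggested stricter check, applied to every line.
ANY_LINE = re.compile(
    r"^\s*(thought|action|action\s*input|observation)\s*:",
    re.IGNORECASE | re.MULTILINE,
)


def accepts_start_only(content: str) -> bool:
    """True when the existing check would treat content as a final answer."""
    text = content.strip()
    return bool(text) and START_ONLY.match(text) is None


def accepts_any_line(content: str) -> bool:
    """True when the stricter check would treat content as a final answer."""
    text = content.strip()
    return bool(text) and ANY_LINE.search(text) is None


samples = {
    "plain": "The answer is 4.",
    "thought_first": "Thought: I should search",
    "action_later": "I will look this up.\nAction: search\nAction Input: cats",
    "question_first": "Question: what is 2 + 2?",
}
for name, text in samples.items():
    print(name, accepts_start_only(text), accepts_any_line(text))
```

The `action_later` sample shows the gap the reviewer describes: the start-only check accepts it, while the per-line search rejects it. The `question_first` sample shows a side effect worth checking before adopting the suggestion as written: the stricter pattern no longer lists `question` or `previous conversation`, so it accepts text that the existing check rejects.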
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py`
around lines 278 - 291, The direct-answer fallback currently accepts any
non-empty content when ex.missing_action is true as long as it doesn't start
with Thought:/Question:/Previous..., which can still let ReAct markers like
"Action:" or "Observation:" appear later and be accepted; update the validation
in the block that builds content_str (used with ex.missing_action,
AGENT_LOG_PREFIX, state.messages, AIMessage, state.final_answer) to perform a
stricter check—either reject when any line (not just the first) matches common
ReAct markers (e.g., Action:, Observation:, Input:, Answer:) using a regex
search (re.search) or require the first non-empty line to be free of ReAct
markers—so malformed ReAct outputs are not treated as final answers.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a62c65 and 222e57e.

📒 Files selected for processing (2)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py

@yczhang-nv
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 7bb10aa into NVIDIA:release/1.5 Feb 26, 2026
15 checks passed
@yczhang-nv yczhang-nv deleted the yuchen-fix-5928823 branch February 26, 2026 19:58
Charlie-Yi-2002 pushed a commit to Charlie-Yi-2002/NeMo-Agent-Toolkit that referenced this pull request Mar 5, 2026
…s) (NVIDIA#1667)

Authors:
  - Yuchen Zhang (https://github.com/yczhang-nv)

Approvers:
  - Will Killian (https://github.com/willkill07)

URL: NVIDIA#1667