
Fix thought matching issue in ReAct agent with the Llama-3.1-Nemotron-Nano-4B-v1.1 model#1675

Merged
rapids-bot[bot] merged 6 commits into NVIDIA:release/1.5 from dagardner-nv:david-local-llm-recursion
Feb 27, 2026

Conversation


@dagardner-nv dagardner-nv commented Feb 26, 2026

Description

  • The model's first response to the prompt from the `docs/source/build-workflows/llms/using-local-llms.md` example is (I have no idea why the `<think>` tags aren't balanced):

```
<think>
Okay, the user is asking about LangSmith now. Let me check the conversation history. In the previous question, they asked, "What is LangSmith?" and I need to continue answering based on that context.

Since the user hasn't provided any new information that requires a tool, I should try to answer based on existing knowledge. However, LangSmith might be a specific tool or project that isn't widely known, so maybe I should look it up. But the only tool available is webpage_query. I need to structure the query correctly.

The user might be referring to a software tool, a language processing system, or something else. To use the webpage_query tool, I need to frame a proper question. If LangSmith is a project, maybe the query should be about its documentation, features, or purpose. Let me try a general query first.

So, the query would be to search for LangSmith on the web. The proper JSON format for the query would be {'query': 'What is LangSmith?'}. That should get relevant results. Since there's no specific information given to use the current datetime, I don't need that tool here.

Wait, maybe the user is referring to the new AI assistant LangSmith. If that's the case, the main response should explain that LangSmith is a hypothetical or upcoming tool. But since I might not have up-to-date info, using the webpage_query is safer. Let me proceed with that.
</think>

Thought
</think>

Thought
```

The agent looks for `Thought:` to indicate that the LLM isn't done yet; however, this LLM responds without the `:`.

This PR relaxes that a little bit.

Also, FWIW, this same model hosted on build.nvidia.com doesn't exhibit this behavior; my only guess is that this is the result of running locally on a GPU with constrained resources.
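The relaxed check can be sketched roughly as follows. This is a simplified illustration, not the exact pattern from `agent.py`; the optional colon after "thought" is the part this PR changes:

```python
import re

# Simplified sketch of the agent's prompt-echo check. The colon after
# "thought" is now optional, so a bare "Thought" line (as emitted by this
# model) is still treated as an incomplete, non-final answer.
ECHO_PATTERN = re.compile(
    r'\s*(thought\s*:?|question\s*:|previous\s+conversation)',
    re.IGNORECASE,
)

def looks_like_prompt_echo(content_str: str) -> bool:
    return bool(ECHO_PATTERN.match(content_str))
```

With the optional colon, both `Thought:` and a bare `Thought` keep the agent iterating instead of returning the echo as a final answer.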

  • Fix model spelling error
  • Misc drive-by documentation improvements.

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation

    • Clarified local LLM guide to assume two GPUs, added GPU-enabled container runtime guidance, expanded vLLM version-specific notes, and updated embedding-serving instructions and multi-GPU setup explanation.
  • Bug Fixes

    • Improved agent final-answer detection to better handle varied output formats.
  • Configuration

    • Updated the referenced model identifier for local deployment.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…olkit into david-local-llm-recursion

Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv self-assigned this Feb 26, 2026
@dagardner-nv dagardner-nv requested a review from a team as a code owner February 26, 2026 21:16
@dagardner-nv dagardner-nv added bug Something isn't working non-breaking Non-breaking change labels Feb 26, 2026

coderabbitai bot commented Feb 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f031f0b and e5da0ee.

📒 Files selected for processing (1)
  • docs/source/build-workflows/llms/using-local-llms.md

Walkthrough

Documentation and a deployment example were updated to require NVIDIA GPU runtime flags and clarify a two‑GPU assumption; a NIM model name was corrected. The ReAct agent regex was relaxed to accept an optional colon after "thought", affecting final-answer detection.

Changes

  • Documentation (docs/source/build-workflows/llms/using-local-llms.md): Added `--runtime=nvidia` to Docker run commands, clarified that NIM assumes two GPUs (one per model) and that commands may be adjusted per setup, added a vLLM v0.16.0-specific note, and updated the vLLM embedding serve flags to `--port 8001 --runner pooling --convert embed --pooler-config`.
  • Example config (examples/documentation_guides/locally_hosted_llms/nim_config.yml): Corrected nim_llm.model_name from `nvidia/llama3.1-nemotron-nano-4b-v1.1` to `nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1`.
  • ReAct Agent Logic (packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py): Relaxed the regex used to detect ReAct prompt echoes to allow `thought` with an optional trailing colon, changing detection of non-ReAct final answers.
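The updated embedding-serving invocation from the documentation change can be sketched as a shell command. The model identifier below is a placeholder, and the `--pooler-config` value is taken from the review discussion; check the guide for the exact invocation:

```shell
# Illustrative vLLM (v0.16.0) embedding serve command; the model ID is a
# placeholder, and the flags mirror the ones named in the doc change.
vllm serve <embedding-model-id> \
  --port 8001 \
  --runner pooling \
  --convert embed \
  --pooler-config '{"pooling_type": "MEAN"}'
```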

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

  • Title check (❌ Error): The title accurately describes the main change (fixing a thought matching issue in the ReAct agent) but, at 92 characters, exceeds the recommended 72-character limit, violating the style requirement. Resolution: shorten the title to approximately 72 characters by removing the model name or using a more concise description, such as "Fix thought matching issue in ReAct agent" or "Fix ReAct agent thought pattern matching."
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py (1)

325-325: Tighten the thought regex with a word boundary.

Line 325 currently allows prefix matches like `Thoughtful...`, which can over-classify direct answers as prompt echoes. Consider anchoring `thought` as a word.

Proposed tweak:

```diff
-                            r'\s*(thought\s*:?|question\s*:|previous\s+conversation)', content_str, re.IGNORECASE)):
+                            r'\s*(thought\b\s*:?|question\b\s*:|previous\s+conversation\b)', content_str, re.IGNORECASE)):
```
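The effect of the suggested word boundary can be demonstrated in isolation (simplified patterns, not the full expressions from `agent.py`):

```python
import re

# Without \b, a reply starting with "Thoughtful ..." matches as if it
# were a "Thought" prompt echo; with \b the match is restricted to the
# whole word "thought" (with or without a trailing colon).
loose = re.compile(r'\s*thought\s*:?', re.IGNORECASE)
strict = re.compile(r'\s*thought\b\s*:?', re.IGNORECASE)

print(bool(loose.match("Thoughtful analysis of the question.")))   # True
print(bool(strict.match("Thoughtful analysis of the question.")))  # False
print(bool(strict.match("Thought")))                               # True
```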
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py`
at line 325, The regex that checks for the "thought" token is too permissive and
matches prefixes like "Thoughtful"; update the pattern used where content_str is
tested (the re.search call in the React agent logic) to use word boundaries for
"thought" — e.g. replace the current
r'\s*(thought\s*:?|question\s*:|previous\s+conversation)' with a pattern that
anchors "thought" as a whole word such as
r'\s*(\bthought\b\s*:?|question\s*:|previous\s+conversation)' while keeping
re.IGNORECASE and the same variable (content_str) and surrounding logic intact.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a00fdeb and 03358f0.

📒 Files selected for processing (3)
  • docs/source/build-workflows/llms/using-local-llms.md
  • examples/documentation_guides/locally_hosted_llms/nim_config.yml
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py

Signed-off-by: David Gardner <dagardner@nvidia.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/build-workflows/llms/using-local-llms.md`:
- Line 176: Update the documentation note to reference the correct vLLM flag
name by replacing any occurrences of the deprecated flag
--override-pooler-config with the current --pooler-config; ensure the example
command shown (vllm serve ... --pooler-config '{"pooling_type": "MEAN"}' ...)
and the explanatory note both use --pooler-config so they match (verify any
mention in the surrounding text likewise references --pooler-config instead of
--override-pooler-config).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03358f0 and f031f0b.

📒 Files selected for processing (1)
  • docs/source/build-workflows/llms/using-local-llms.md

… specific link

Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 3d91186 into NVIDIA:release/1.5 Feb 27, 2026
17 checks passed
@dagardner-nv dagardner-nv deleted the david-local-llm-recursion branch February 27, 2026 02:40
Charlie-Yi-2002 pushed a commit to Charlie-Yi-2002/NeMo-Agent-Toolkit that referenced this pull request Mar 5, 2026
…ron-Nano-4B-v1.1` model (NVIDIA#1675)

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Will Killian (https://github.com/willkill07)
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#1675
