
Fix thought matching issue in ReAct agent with the Llama-3.1-Nemotron-Nano-4B-v1.1 model#1675

Merged
rapids-bot[bot] merged 6 commits into NVIDIA:release/1.5 from dagardner-nv:david-local-llm-recursion
Feb 27, 2026

Conversation


@dagardner-nv dagardner-nv commented Feb 26, 2026

Description

  • The model's first response to the prompt from the `docs/source/build-workflows/llms/using-local-llms.md` example is (I have no idea why the `<think>` tags aren't balanced):

```
<think>
Okay, the user is asking about LangSmith now. Let me check the conversation history. In the previous question, they asked, "What is LangSmith?" and I need to continue answering based on that context.

Since the user hasn't provided any new information that requires a tool, I should try to answer based on existing knowledge. However, LangSmith might be a specific tool or project that isn't widely known, so maybe I should look it up. But the only tool available is webpage_query. I need to structure the query correctly.

The user might be referring to a software tool, a language processing system, or something else. To use the webpage_query tool, I need to frame a proper question. If LangSmith is a project, maybe the query should be about its documentation, features, or purpose. Let me try a general query first.

So, the query would be to search for LangSmith on the web. The proper JSON format for the query would be {'query': 'What is LangSmith?'}. That should get relevant results. Since there's no specific information given to use the current datetime, I don't need that tool here.

Wait, maybe the user is referring to the new AI assistant LangSmith. If that's the case, the main response should explain that LangSmith is a hypothetical or upcoming tool. But since I might not have up-to-date info, using the webpage_query is safer. Let me proceed with that.
</think>

Thought
</think>

Thought
```

The agent looks for `Thought:` to indicate that the LLM isn't done yet; however, this LLM responds without the `:`.

This PR relaxes that a little bit.

Also, FWIW, this same model hosted on build.nvidia.com doesn't exhibit this behavior; my only guess is that this is the result of running locally on a GPU with constrained resources.
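The relaxed check can be sketched roughly as follows. This is a simplified illustration, not the exact pattern from `agent.py`; the optional colon after "thought" is the part this PR changes:

```python
import re

# Simplified sketch of the agent's prompt-echo check. The colon after
# "thought" is now optional, so a bare "Thought" line (as emitted by this
# model) is still treated as an incomplete, non-final answer.
ECHO_PATTERN = re.compile(
    r'\s*(thought\s*:?|question\s*:|previous\s+conversation)',
    re.IGNORECASE,
)

def looks_like_prompt_echo(content_str: str) -> bool:
    return bool(ECHO_PATTERN.match(content_str))
```

With the optional colon, both `Thought:` and a bare `Thought` keep the agent iterating instead of returning the echo as a final answer.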

  • Fix model spelling error
  • Misc drive-by documentation improvements.

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation

    • Clarified local LLM guide to assume two GPUs, added GPU-enabled container runtime guidance, expanded vLLM version-specific notes, and updated embedding-serving instructions and multi-GPU setup explanation.
  • Bug Fixes

    • Improved agent final-answer detection to better handle varied output formats.
  • Configuration

    • Updated the referenced model identifier for local deployment.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…olkit into david-local-llm-recursion

Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv self-assigned this Feb 26, 2026
@dagardner-nv dagardner-nv requested a review from a team as a code owner February 26, 2026 21:16
@dagardner-nv dagardner-nv added bug Something isn't working non-breaking Non-breaking change labels Feb 26, 2026

coderabbitai bot commented Feb 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f031f0b and e5da0ee.

📒 Files selected for processing (1)
  • docs/source/build-workflows/llms/using-local-llms.md

Walkthrough

Documentation and a deployment example were updated to require NVIDIA GPU runtime flags and clarify a two‑GPU assumption; a NIM model name was corrected. The ReAct agent regex was relaxed to accept an optional colon after "thought", affecting final-answer detection.

Changes

  • Documentation (docs/source/build-workflows/llms/using-local-llms.md): Added `--runtime=nvidia` to Docker run commands, clarified that NIM assumes two GPUs (one per model) and that commands may be adjusted per setup, added a vLLM v0.16.0-specific note, and updated the vLLM embedding serve flags to `--port 8001 --runner pooling --convert embed --pooler-config`.
  • Example config (examples/documentation_guides/locally_hosted_llms/nim_config.yml): Corrected nim_llm.model_name from `nvidia/llama3.1-nemotron-nano-4b-v1.1` to `nvidia/Llama-3.1-Nemotron-Nano-4B-v1.1`.
  • ReAct Agent Logic (packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py): Relaxed the regex used to detect ReAct prompt echoes to allow `thought` with an optional trailing colon, changing detection of non-ReAct final answers.
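The updated embedding-serving invocation from the documentation change can be sketched as a shell command. The model identifier below is a placeholder, and the `--pooler-config` value is taken from the review discussion; check the guide for the exact invocation:

```shell
# Illustrative vLLM (v0.16.0) embedding serve command; the model ID is a
# placeholder, and the flags mirror the ones named in the doc change.
vllm serve <embedding-model-id> \
  --port 8001 \
  --runner pooling \
  --convert embed \
  --pooler-config '{"pooling_type": "MEAN"}'
```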

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

  • Title check (❌ Error): The title accurately describes the main change (fixing a thought matching issue in the ReAct agent) but, at 92 characters, exceeds the recommended 72-character limit, violating the style requirement. Resolution: shorten the title to approximately 72 characters by removing the model name or using a more concise description, such as "Fix thought matching issue in ReAct agent" or "Fix ReAct agent thought pattern matching."
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py (1)

325-325: Tighten the thought regex with a word boundary.

Line 325 currently allows prefix matches like `Thoughtful...`, which can over-classify direct answers as prompt echoes. Consider anchoring `thought` as a word.

Proposed tweak:

```diff
-                            r'\s*(thought\s*:?|question\s*:|previous\s+conversation)', content_str, re.IGNORECASE)):
+                            r'\s*(thought\b\s*:?|question\b\s*:|previous\s+conversation\b)', content_str, re.IGNORECASE)):
```
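The effect of the suggested word boundary can be demonstrated in isolation (simplified patterns, not the full expressions from `agent.py`):

```python
import re

# Without \b, a reply starting with "Thoughtful ..." matches as if it
# were a "Thought" prompt echo; with \b the match is restricted to the
# whole word "thought" (with or without a trailing colon).
loose = re.compile(r'\s*thought\s*:?', re.IGNORECASE)
strict = re.compile(r'\s*thought\b\s*:?', re.IGNORECASE)

print(bool(loose.match("Thoughtful analysis of the question.")))   # True
print(bool(strict.match("Thoughtful analysis of the question.")))  # False
print(bool(strict.match("Thought")))                               # True
```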
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py`
at line 325, The regex that checks for the "thought" token is too permissive and
matches prefixes like "Thoughtful"; update the pattern used where content_str is
tested (the re.search call in the React agent logic) to use word boundaries for
"thought" — e.g. replace the current
r'\s*(thought\s*:?|question\s*:|previous\s+conversation)' with a pattern that
anchors "thought" as a whole word such as
r'\s*(\bthought\b\s*:?|question\s*:|previous\s+conversation)' while keeping
re.IGNORECASE and the same variable (content_str) and surrounding logic intact.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a00fdeb and 03358f0.

📒 Files selected for processing (3)
  • docs/source/build-workflows/llms/using-local-llms.md
  • examples/documentation_guides/locally_hosted_llms/nim_config.yml
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py

Signed-off-by: David Gardner <dagardner@nvidia.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/build-workflows/llms/using-local-llms.md`:
- Line 176: Update the documentation note to reference the correct vLLM flag
name by replacing any occurrences of the deprecated flag
--override-pooler-config with the current --pooler-config; ensure the example
command shown (vllm serve ... --pooler-config '{"pooling_type": "MEAN"}' ...)
and the explanatory note both use --pooler-config so they match (verify any
mention in the surrounding text likewise references --pooler-config instead of
--override-pooler-config).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03358f0 and f031f0b.

📒 Files selected for processing (1)
  • docs/source/build-workflows/llms/using-local-llms.md

… specific link

Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 3d91186 into NVIDIA:release/1.5 Feb 27, 2026
17 checks passed
@dagardner-nv dagardner-nv deleted the david-local-llm-recursion branch February 27, 2026 02:40
Charlie-Yi-2002 pushed a commit to Charlie-Yi-2002/NeMo-Agent-Toolkit that referenced this pull request Mar 5, 2026
…ron-Nano-4B-v1.1` model (NVIDIA#1675)

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Will Killian (https://github.com/willkill07)
  - Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#1675
