Fix thought matching issue in ReAct agent with the Llama-3.1-Nemotron-Nano-4B-v1.1 model#1675
Conversation
Signed-off-by: David Gardner <dagardner@nvidia.com>
…olkit into david-local-llm-recursion Signed-off-by: David Gardner <dagardner@nvidia.com>
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Walkthrough

Documentation and a deployment example were updated to require NVIDIA GPU runtime flags and clarify a two-GPU assumption; a NIM model name was corrected. The ReAct agent regex was relaxed to accept an optional colon after "thought", affecting final-answer detection.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes

Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)
✅ Passed checks (1 passed)
🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py (1)
325-325: Tighten the `thought` regex with a word boundary.

Line 325 currently allows prefix matches like `Thoughtful...`, which can over-classify direct answers as prompt echoes. Consider anchoring `thought` as a word.

Proposed tweak:

```diff
- r'\s*(thought\s*:?|question\s*:|previous\s+conversation)', content_str, re.IGNORECASE)):
+ r'\s*(thought\b\s*:?|question\b\s*:|previous\s+conversation\b)', content_str, re.IGNORECASE)):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py` at line 325, The regex that checks for the "thought" token is too permissive and matches prefixes like "Thoughtful"; update the pattern used where content_str is tested (the re.search call in the React agent logic) to use word boundaries for "thought" — e.g. replace the current r'\s*(thought\s*:?|question\s*:|previous\s+conversation)' with a pattern that anchors "thought" as a whole word such as r'\s*(\bthought\b\s*:?|question\s*:|previous\s+conversation)' while keeping re.IGNORECASE and the same variable (content_str) and surrounding logic intact.
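The over-matching the review describes is easy to check in isolation. The following is a minimal sketch, not the agent code itself: the function name and wiring are hypothetical, and only the regex (with the suggested word boundaries) comes from the review.

```python
import re

# Relaxed pattern from this PR plus the word boundary suggested in the
# review: "thought" with or without a colon counts as a prompt echo,
# but prefixes like "Thoughtful" no longer match.
PROMPT_ECHO_RE = re.compile(
    r'\s*(thought\b\s*:?|question\b\s*:|previous\s+conversation\b)',
    re.IGNORECASE)

def looks_like_prompt_echo(content_str: str) -> bool:
    # Hypothetical stand-in for the re.search call at line 325.
    return bool(PROMPT_ECHO_RE.search(content_str))

print(looks_like_prompt_echo("Thought: I should call a tool"))  # True
print(looks_like_prompt_echo("Thought"))                        # True
print(looks_like_prompt_echo("Thoughtful analysis follows"))    # False
```

The `\b` after `thought` is what rejects `Thoughtful...`: there is no word boundary between the `t` and the following `f`, so that alternative fails, while a bare `Thought` (end of string) or `Thought:` still matches.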
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/source/build-workflows/llms/using-local-llms.md
examples/documentation_guides/locally_hosted_llms/nim_config.yml
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/build-workflows/llms/using-local-llms.md`:
- Line 176: Update the documentation note to reference the correct vLLM flag
name by replacing any occurrences of the deprecated flag
--override-pooler-config with the current --pooler-config; ensure the example
command shown (vllm serve ... --pooler-config '{"pooling_type": "MEAN"}' ...)
and the explanatory note both use --pooler-config so they match (verify any
mention in the surrounding text likewise references --pooler-config instead of
--override-pooler-config).
… specific link Signed-off-by: David Gardner <dagardner@nvidia.com>
/merge
…ron-Nano-4B-v1.1` model (NVIDIA#1675)

* The model's first response to the prompt from the `docs/source/build-workflows/llms/using-local-llms.md` example is (I have no idea why the `<think>` tags aren't balanced):

```
<think>
Okay, the user is asking about LangSmith now. Let me check the conversation history. In the previous question, they asked, "What is LangSmith?" and I need to continue answering based on that context.

Since the user hasn't provided any new information that requires a tool, I should try to answer based on existing knowledge. However, LangSmith might be a specific tool or project that isn't widely known, so maybe I should look it up. But the only tool available is webpage_query. I need to structure the query correctly.

The user might be referring to a software tool, a language processing system, or something else. To use the webpage_query tool, I need to frame a proper question. If LangSmith is a project, maybe the query should be about its documentation, features, or purpose. Let me try a general query first.

So, the query would be to search for LangSmith on the web. The proper JSON format for the query would be {'query': 'What is LangSmith?'}. That should get relevant results. Since there's no specific information given to use the current datetime, I don't need that tool here.

Wait, maybe the user is referring to the new AI assistant LangSmith. If that's the case, the main response should explain that LangSmith is a hypothetical or upcoming tool. But since I might not have up-to-date info, using the webpage_query is safer. Let me proceed with that.
</think>

Thought </think>

Thought
```

  The agent is looking for `Thought:` to indicate that the LLM isn't done yet; however, this LLM responds without the `:`. This PR relaxes that a little bit.

  Also, FWIW, this same model hosted on build.nvidia.com doesn't exhibit this behavior; my only guess is that this is the result of running locally on a GPU with constrained resources.

* Fix model spelling error
* Misc drive-by documentation improvement.

## By Submitting this PR I confirm:

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit

* **Documentation**
  * Clarified local LLM guide to assume two GPUs, added GPU-enabled container runtime guidance, expanded vLLM version-specific notes, and updated embedding-serving instructions and multi-GPU setup explanation.
* **Bug Fixes**
  * Improved agent final-answer detection to better handle varied output formats.
* **Configuration**
  * Updated the referenced model identifier for local deployment.

Authors:
- David Gardner (https://github.com/dagardner-nv)

Approvers:
- Will Killian (https://github.com/willkill07)
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

URL: NVIDIA#1675
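The failure mode described in the commit message can be reproduced in isolation. This is a simplified sketch — the patterns below illustrate the before/after behavior and are not the exact line from `agent.py`:

```python
import re

# The locally hosted model ends its reply with a bare "Thought" marker,
# no colon, as in the transcript above.
reply = "</think>\nThought"

# Before this PR: the check requires a colon after "thought", so the
# bare marker is missed and the reply is treated as a final answer.
strict = re.search(r'thought\s*:', reply, re.IGNORECASE)

# After this PR: the colon is optional, so the marker is caught and the
# agent keeps iterating instead of returning the prompt echo as its answer.
relaxed = re.search(r'thought\s*:?', reply, re.IGNORECASE)

print(strict is None)       # True: strict pattern misses the bare marker
print(relaxed is not None)  # True: relaxed pattern catches it
```

Note that `</think>` does not trigger either pattern, since `think` is not `thought`; only the trailing bare `Thought` distinguishes the two checks.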