fix: apply coderabbit suggestions from forward merge#1089

Merged
rapids-bot[bot] merged 1 commit into NVIDIA:release/1.3 from
willkill07:wkk_fix-coderabbit-suggestions-after-forward-merge
Oct 23, 2025

Conversation

@willkill07
Member

@willkill07 willkill07 commented Oct 23, 2025

Description

This PR contains documentation and example fixes that were flagged in the forward-merge PR (post-1.3.0-rc6 -> develop).

Closes

By submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation

    • Updated evaluator metric references throughout documentation
    • Improved formatting with reusable documentation blocks
    • Updated configuration examples to reflect naming changes
  • Chores

    • Standardized evaluator key naming across configuration files
    • Updated configuration file structure
  • Bug Fixes

    • Fixed typo in test assertion message

@willkill07 willkill07 requested a review from a team as a code owner October 23, 2025 00:43
@willkill07 willkill07 added doc Improvements or additions to documentation non-breaking Non-breaking change labels Oct 23, 2025
@coderabbitai

coderabbitai bot commented Oct 23, 2025

Warning

Rate limit exceeded

@willkill07 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 52 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 32c5a17 and e9bc4a8.

📒 Files selected for processing (9)
  • docs/source/reference/evaluate.md (1 hunks)
  • docs/source/workflows/evaluate.md (2 hunks)
  • docs/source/workflows/mcp/mcp-client.md (1 hunks)
  • docs/source/workflows/profiler.md (1 hunks)
  • examples/agents/react/configs/config.yml (1 hunks)
  • examples/agents/rewoo/configs/config.yml (1 hunks)
  • examples/agents/tool_calling/configs/config.yml (1 hunks)
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1 hunks)
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py (1 hunks)

Walkthrough

Refactors evaluator metric names across documentation and configuration files by removing RAG prefixes (rag_accuracy → accuracy, rag_groundedness → groundedness, rag_relevance → relevance) and reorders evaluator key mappings in example configuration files, with supporting documentation and reference updates.
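The rename described above can be sketched as a config fragment (the evaluator `_type` value here is an illustrative assumption, not the exact contents of the repository's config files):

```yaml
# Sketch of the evaluator key rename; "_type" values are hypothetical.
eval:
  evaluators:
    # formerly rag_accuracy; per-evaluator output now lands in accuracy_output.json
    accuracy:
      _type: ragas
    # formerly rag_groundedness
    groundedness:
      _type: ragas
    # formerly rag_relevance
    relevance:
      _type: ragas
```

Output files are named after the evaluator key, which is why the sample path in the documentation changed from rag_accuracy_output.json to accuracy_output.json.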

Changes

Cohort / File(s) Summary
Documentation: Reference & Output File Updates
docs/source/reference/evaluate.md, docs/source/workflows/evaluate.md, examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
Updated sample evaluator output references from rag_accuracy_output.json to accuracy_output.json; replaced a bold note with a reusable admonition block in the evaluate workflow documentation.
Metric Name Refactoring
docs/source/workflows/profiler.md
Renamed evaluator metric names from rag_accuracy, rag_groundedness, rag_relevance to accuracy, groundedness, relevance respectively.
Evaluator Key Mapping Swaps
examples/agents/react/configs/config.yml, examples/agents/rewoo/configs/config.yml, examples/agents/tool_calling/configs/config.yml
Swapped the groundedness and relevance evaluator keys under eval.general.evaluators.accuracy or eval.evaluators; metric bindings updated accordingly (relevance now maps to ContextRelevance, groundedness to ResponseGroundedness).
MCP Configuration
docs/source/workflows/mcp/mcp-client.md
Renamed YAML configuration key from workflows to workflow in MCP Client documentation example.
Test Fix
examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
Corrected typo in assertion message: "whould" → "should".

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "fix: apply coderabbit suggestions from forward merge" is concise at 52 characters (well under the 72-character limit), uses imperative mood with the verb "apply," and is related to the actual changes in the PR. The changeset includes documentation updates, configuration adjustments, metric name refactoring (removing "rag_" prefixes), file reference updates, and minor fixes. While the title is somewhat generic in that it doesn't specify the nature of these fixes (e.g., metric renaming, documentation updates), it is clearly connected to the PR's stated objective of applying fixes from a forward merge code review and conveys meaningful information about the changes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cad2a3 and 32c5a17.

📒 Files selected for processing (9)
  • docs/source/reference/evaluate.md (1 hunks)
  • docs/source/workflows/evaluate.md (2 hunks)
  • docs/source/workflows/mcp/mcp-client.md (1 hunks)
  • docs/source/workflows/profiler.md (1 hunks)
  • examples/agents/react/configs/config.yml (1 hunks)
  • examples/agents/rewoo/configs/config.yml (1 hunks)
  • examples/agents/tool_calling/configs/config.yml (1 hunks)
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1 hunks)
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
{docs/source/**/*.md,**/README.@(md|ipynb)}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{docs/source/**/*.md,**/README.@(md|ipynb)}: Use the full name “NVIDIA NeMo Agent toolkit” on first use in documentation, then “NeMo Agent toolkit”; in headings use “NeMo Agent Toolkit” (capital T)
Do not use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq) in documentation unless explicitly referring to deprecated names
Never use “NAT”/“nat” abbreviations in documentation
Documentation must be clear/comprehensive; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling is correct

Files:

  • docs/source/workflows/profiler.md
  • docs/source/reference/evaluate.md
  • docs/source/workflows/mcp/mcp-client.md
  • docs/source/workflows/evaluate.md
docs/source/**/*.md

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.md: Documentation sources are Markdown files under docs/source
Surround code entities with backticks in docs to avoid Vale false positives
Do not use words listed in ci/vale/styles/config/vocabularies/nat/reject.txt; accepted words in accept.txt are allowed

Files:

  • docs/source/workflows/profiler.md
  • docs/source/reference/evaluate.md
  • docs/source/workflows/mcp/mcp-client.md
  • docs/source/workflows/evaluate.md
{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{**/*.py,**/*.sh,**/*.md,**/*.toml,**/*.y?(a)ml,**/*.json,**/*.txt,**/*.ini,**/*.cfg,**/*.ipynb}: Every file must start with the standard SPDX Apache-2.0 header
Confirm copyright years are up to date when a file is changed
All source files must include the SPDX Apache-2.0 header template (copy from an existing file)

Files:

  • docs/source/workflows/profiler.md
  • docs/source/reference/evaluate.md
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
  • docs/source/workflows/mcp/mcp-client.md
  • docs/source/workflows/evaluate.md
  • examples/agents/react/configs/config.yml
  • examples/agents/rewoo/configs/config.yml
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
  • examples/agents/tool_calling/configs/config.yml
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs, or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • docs/source/workflows/profiler.md
  • docs/source/reference/evaluate.md
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
  • docs/source/workflows/mcp/mcp-client.md
  • docs/source/workflows/evaluate.md
  • examples/agents/react/configs/config.yml
  • examples/agents/rewoo/configs/config.yml
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
  • examples/agents/tool_calling/configs/config.yml
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/workflows/profiler.md
  • docs/source/reference/evaluate.md
  • docs/source/workflows/mcp/mcp-client.md
  • docs/source/workflows/evaluate.md
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: In code comments use the abbreviations: nat (API namespace/CLI), nvidia-nat (package), NAT (env var prefixes); never use these abbreviations in documentation
Follow PEP 20 and PEP 8 for Python style
Run yapf with column_limit=120; yapf is used for formatting (run second)
Indent with 4 spaces (no tabs) and end each file with a single trailing newline
Use ruff (ruff check --fix) as a linter (not formatter) per pyproject.toml; fix warnings unless explicitly ignored
Respect Python naming schemes: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
When re-raising exceptions, use bare raise to preserve stack trace; log with logger.error(), not logger.exception()
When catching and logging without re-raising, use logger.exception() to capture full stack trace
Provide Google-style docstrings for every public module, class, function, and CLI command
Docstring first line must be a concise description ending with a period
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file I/O)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorized operations when beneficial and feasible

Files:

  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: This directory contains example code and usage scenarios for the toolkit; at a minimum, an example should contain a README.md or README.ipynb file.

  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/ and should be checked into git-lfs.

Files:

  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
  • examples/agents/react/configs/config.yml
  • examples/agents/rewoo/configs/config.yml
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
  • examples/agents/tool_calling/configs/config.yml
**/configs/**

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Configuration files consumed by code should be stored next to that code in a configs/ folder

Files:

  • examples/agents/react/configs/config.yml
  • examples/agents/rewoo/configs/config.yml
  • examples/agents/tool_calling/configs/config.yml
🔇 Additional comments (9)
examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1)

139-139: Typo fix in assertion message is appropriate.

Correcting "whould" to "should" improves clarity in test failure messages.

docs/source/reference/evaluate.md (1)

443-443: Evaluator metric naming clarification.

Updating the sample output label from rag_accuracy to accuracy aligns with the PR's standardization of evaluator metric names. The underlying JSON structure remains unchanged, so this is a documentation-only update for clarity.

docs/source/workflows/mcp/mcp-client.md (1)

59-59: Verify YAML configuration key name matches schema.

The example changes workflows: to workflow: on line 59. Please confirm this aligns with the actual configuration schema used by the MCP client component. If the schema was recently changed in source code, ensure all documentation examples are consistently updated.
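A minimal before/after sketch of that key rename (the workflow value shown is a placeholder assumption, not the actual example from mcp-client.md):

```yaml
# Before (illustrative):
#   workflows:
#     ...
# After -- singular key, matching the updated documentation example:
workflow:
  _type: example_workflow   # hypothetical workflow name
```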

examples/agents/rewoo/configs/config.yml (1)

57-64: Evaluator key reordering aligns with metric naming updates.

The evaluator configuration is reordered so that relevance binds to ContextRelevance and groundedness binds to ResponseGroundedness. This appears intentional and consistent with the PR's metric naming standardization, assuming output file names align (e.g., relevance_output.json).
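The binding described above can be sketched as follows (the metric names are taken from this comment; the evaluator `_type` is an assumption for illustration):

```yaml
eval:
  evaluators:
    relevance:
      _type: ragas              # assumed evaluator plugin
      metric: ContextRelevance
    groundedness:
      _type: ragas
      metric: ResponseGroundedness
```

With this layout, each evaluator writes its results to a file named after its key (e.g. relevance_output.json), which is why key names and output paths must stay in sync.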

examples/agents/react/configs/config.yml (1)

55-62: Evaluator naming consistent with other agent configurations.

The reordering mirrors changes in other agent configs (rewoo, tool_calling), maintaining consistency across the example suite.

docs/source/workflows/evaluate.md (2)

264-266: Documentation formatting improvement.

Converting the cleanup behavior explanation to a reusable :::{note} admonition block improves consistency with other admonitions in the documentation and enhances readability.


295-295: Evaluator output filename updated to match metric naming convention.

Changing the sample path from rag_accuracy_output.json to accuracy_output.json aligns with the standardized metric naming across the documentation suite.

examples/agents/tool_calling/configs/config.yml (1)

55-62: Evaluator naming consistent across agent configurations.

This change mirrors the evaluator key reordering applied to react and rewoo agents, maintaining consistency across the example suite.

docs/source/workflows/profiler.md (1)

272-274: Metric renaming correctly applied and consistent with configuration examples.

The changes correctly update the evaluator metric descriptions from rag-prefixed names (rag_accuracy, rag_groundedness, rag_relevance) to their non-prefixed equivalents (accuracy, groundedness, relevance). This aligns with the configuration examples shown earlier in the file (lines 156-168) and the PR objectives. Code entities are properly wrapped in backticks per documentation guidelines.

Signed-off-by: Will Killian <wkillian@nvidia.com>
@willkill07 willkill07 force-pushed the wkk_fix-coderabbit-suggestions-after-forward-merge branch from 32c5a17 to e9bc4a8 on October 23, 2025 01:02
@willkill07 willkill07 added bug Something isn't working and removed doc Improvements or additions to documentation labels Oct 23, 2025
@willkill07
Member Author

/merge

@rapids-bot rapids-bot bot merged commit 8fc8348 into NVIDIA:release/1.3 Oct 23, 2025
17 checks passed
@willkill07 willkill07 deleted the wkk_fix-coderabbit-suggestions-after-forward-merge branch October 23, 2025 18:15

Labels

bug Something isn't working non-breaking Non-breaking change
