dynamo llm integration with examples, analysis, and custom predictive routers by bbednarski9 · Pull Request #1242 · NVIDIA/NeMo-Agent-Toolkit

bbednarski9 · 2025-12-12T06:20:11Z

Description

This is an initial integration between the NVIDIA NAT and Dynamo SDKs. The scope of this PR is around establishing key integrations and a robust evaluation harness, that will allow the NAT team to explore agentic inference acceleration using Dynamo servers. There are a few key components for review:

[dynamo components] (./external/dynamo): startup and shutdown scripts for three variates of dyanmo servers - unified server with KV aware routing, unified server with predictive KV aware routing (thompson sampling), and disaggregated serving with KV aware routing
[dynamo LLM provider] (./src/nat/llm/dynamo_llm.py): a new llm provider that enabled custom request prefix headers which can be optimized for the predictive kv aware cache router (thompson). Framework integration in place for langchain and google ADK
[dynamo integration example agents] (./examples/dynamo_integration) - example project for running nat workflows (run, eval, optimize, profile) and compare performance between difference inference endpoints. Analysis and visualization scripts included.

Musings

tool calling for the Galileo-V2 dataset used here is not parameterized by the agent in order to help normalize the OSL.
post-processing of experiment data is not integrated with profiler yet. There are currently two sets of plotting tools with histograms being the newer.
After internal discussion, we opted to build a new dynamo llm provider (dynamo_llm.py), rather than implementing this as a mixin. It keeps development isolated to the dynamo components so that we can be more agile moving forward with development.
Features in this PR are largely experimental and non-breaking to the core library
There is an additional evaluator defined in action_completion_evaluator.py, its not used in current evaluations but im going to leave it here for completeness and possible extension.

Closes

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

New Features
- Dynamo LLM provider and end-to-end Dynamo backend integration with prefix-hint support and streaming/non-streaming API.
New Agents & Tools
- React benchmark agent, decision-only tool stubs, banking tools group, self-evaluating agent with feedback/retry loop, and intent-buffer utilities.
New Evaluators
- Tool Selection Quality (TSQ) and Action Completion (AC) evaluators.
Documentation
- Comprehensive Dynamo integration guides, architecture docs, and READMEs.
Tools & Scripts
- Deployment, start/stop, monitoring, benchmarking, plotting, dataset utilities, and env templates.
Tests
- Extensive unit and integration tests across LLMs, agents, evaluators, and analysis tools.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

copy-pr-bot · 2025-12-12T06:20:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-12-12T06:20:21Z

Walkthrough

Adds a Dynamo backend and example React Benchmark Agent: new Dynamo LLM provider with prefix‑header hooks, ADK/LangChain adapters, FastAPI frontend/processor/router services, decision‑only tool stubs, self‑evaluating agent and evaluators, many configs, deployment/monitoring scripts, analysis/plotting utilities, docs, and tests.

Changes

Cohort / File(s)	Summary
Env templates & top-level docs `examples/dynamo_integration/.env.example`, `external/dynamo/.env.example`, `examples/dynamo_integration/README.md`, `examples/dynamo_integration/ARCHITECTURE.md`, `external/dynamo/README.md`	Added environment templates and comprehensive Dynamo integration documentation and architecture guides.
React benchmark agent package `examples/dynamo_integration/react_benchmark_agent/pyproject.toml`, `examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py`, `.../register.py`, `.../react_benchmark_agent.py`, `.../banking_tools.py`	New package metadata and modules: package init/export, register side‑effect imports, React benchmark function (decision‑only and normal modes), and banking tools FunctionGroup registration.
Tool intent stubs `examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py`	Decision‑only tool stub system with per‑scenario/global registries, ToolIntentBuffer, permissive input schema, stub creation, and mock response generation.
Self‑evaluating agent wrapper `examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py`	Self‑evaluating agent wrapper (legacy and feedback variants) implementing deterministic scenario IDs, retry/feedback loop, evaluator LLM calls, and intent isolation.
Evaluators `examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py`, `.../tsq_evaluator.py`, `.../action_completion_evaluator.py`	New TSQ and Action Completion evaluators: configs, trajectory extraction/normalization, scoring logic, and registration.
Configs & profiles `examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/*.yml`	Many YAML configs for Dynamo e2e, prefix/header tests, rethinking/no‑rethinking evals, profiling, optimization, and related workflows.
Agent docs & READMEs `examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md`, `examples/dynamo_integration/react_benchmark_agent/README.md`, other READMEs	New user and implementation guides describing components, workflows, evaluation/optimization patterns, and file layout.
Data prep & dataset tooling `examples/dynamo_integration/scripts/create_test_subset.py`, `.../download_agent_leaderboard_v2.py`	Utilities to create test subsets and download/transform Agent Leaderboard v2 into NAT format.
Throughput analysis & plotting `examples/dynamo_integration/scripts/throughput_analysis.py`, `.../plot_throughput_histograms_per_request.py`, `.../plot_throughput_vs_tsq_per_request.py`	Scripts to compute tokens/sec/ITL, aggregate per‑request/per‑LLM‑call stats, and produce throughput vs TSQ visualizations.
Benchmark orchestration `examples/dynamo_integration/scripts/run_concurrency_benchmark.sh`	Shell script to run concurrency benchmarks and produce aggregated reports.
Deployment & monitoring scripts `external/dynamo/start_dynamo_unified.sh`, `.../start_dynamo_unified_thompson_hints.sh`, `.../start_dynamo_disagg.sh`, `external/dynamo/stop_dynamo.sh`, `external/dynamo/monitor_dynamo.sh`, `external/dynamo/test_dynamo_integration.sh`	Startup/shutdown/monitor/test scripts for unified/disaggregated/Thompson modes with health checks and orchestration logic.
Dynamo generalized backend `external/dynamo/generalized/frontend.py`, `.../processor.py`, `.../router.py`	New FastAPI frontend (OpenAI‑compatible + prefix hints), streaming processor with router integration and metrics, and workload‑aware router with bandit‑based worker selection and feedback endpoints.
Dynamo provider core `src/nat/llm/dynamo_llm.py`, `src/nat/llm/register.py`	New `DynamoModelConfig`, `DynamoPrefixContext`, httpx client factory injecting Dynamo prefix headers, and provider registration/import wiring.
Framework adapters `packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py`, `packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py`	ADK and LangChain adapters for `DynamoModelConfig` with conditional prefix/header injection and httpx hook wiring (LangChain wrapper appears duplicated in diff).
ADK / LangChain tests `packages/nvidia_nat_adk/tests/test_adk_llm.py`, `packages/nvidia_nat_langchain/tests/test_llm_langchain.py`	Tests for Dynamo ADK/LangChain wrapper behavior, header injection, RESPONSES API branching, and argument filtering.
Dynamo LLM tests `tests/nat/llm/test_dynamo_llm.py`	Unit tests for `DynamoModelConfig` validation, `DynamoPrefixContext` lifecycle, request hook header injection, and httpx client creation.
Agent tests `examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py`, `.../tests/test_tsq_formula.py`, `.../tests/test_tool_intent_buffer.py`	Tests for self‑evaluation parsing, decision/feedback logic, TSQ formulas/normalization/deduplication, trajectory extraction, and ToolIntentBuffer behavior.
Scripts cluster `examples/dynamo_integration/scripts/*`	Additional utilities for plotting, throughput/TSQ correlation, subset creation, and concurrency orchestration.
Misc scaffolding & placeholders `examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs` (placeholder), small READMEs	Minor scaffolding and documentation placeholders.
Project examples mapping `pyproject.toml`	Added example workspace/source entry for the new nat_react_benchmark_agent example.
Vale vocabulary `ci/vale/styles/config/vocabularies/nat/accept.txt`	Added acceptance regex for capitalized "Leaderboard".

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client as NAT Client
participant Frontend as Dynamo Frontend
participant Router as Workload Router
participant Processor as Request Processor
participant Engine as LLM Engine

Client->>Frontend: POST /v1/chat/completions (may include prefix headers)
Frontend->>Frontend: Extract/attach prefix headers, validate/tokenize
Frontend->>Router: find_worker(token_ids, prefix_id, reuse_budget)
Router-->>Frontend: decision_id, worker_id
Frontend->>Processor: Forward request to selected worker (process/generate)
Processor->>Engine: Stream generation (engine.generate)
Engine-->>Processor: Stream tokens/deltas
Processor-->>Frontend: Relay deltas/usage/finish
Frontend-->>Client: Streamed response (SSE) or aggregated JSON
Frontend->>Router: feedback(decision_id, latency, tokens, success)
Router->>Router: Update bandit models / metrics

mermaid
sequenceDiagram
participant Agent as React Agent
participant Tools as Tool Stubs
participant Buffer as ToolIntentBuffer
participant Evaluator as Evaluator LLM

Note over Agent,Tools: Decision‑only tool invocation path
Agent->>Tools: Call stub (records intent)
Tools->>Buffer: Record intent (tool_name, params)
Tools-->>Agent: Return canned/mock response

Agent->>Evaluator: Submit formatted tool sequence + question for evaluation
Evaluator-->>Agent: Return evaluation JSON (is_sufficient, confidence, reasoning)

alt Accept
    Agent-->>Agent: Accept result and finish
else Retry
    Agent->>Agent: Generate feedback from evaluation
    Agent->>Buffer: Clear/reset intents for scenario
    Agent->>Tools: Re‑invoke tools with feedback/history
end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Areas needing focused review:

examples/dynamo_integration/react_benchmark_agent/self_evaluating_agent_with_feedback.py — async control flow, retry/feedback injection, scenario/isolation semantics.
external/dynamo/generalized/router.py — bandit math, reward/baseline computations, pending‑decision lifecycle, concurrency safety.
external/dynamo/generalized/processor.py — streaming orchestration, tokenizer handling, worker selection, metrics logging.
external/dynamo/generalized/frontend.py — streaming vs non‑streaming behavior, header extraction, TPS logging.
src/nat/llm/dynamo_llm.py and framework adapters — contextvar prefix lifecycle, httpx request hook correctness, header formats, concurrency semantics.
react_benchmark_agent evaluators (tsq_evaluator.py, action_completion_evaluator.py) — trajectory extraction, normalization, and metric correctness.
Deployment scripts (start/stop/monitor/test) — container orchestration, environment propagation, GPU/model handling, safety checks.
Tests and duplicated LangChain adapter entry — confirm intended duplication and test fixture expectations.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 78.77% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is fully related to the changeset, accurately summarizing the main addition of Dynamo LLM integration with supporting examples, analysis tools, and custom routers.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2025-12-14T00:51:27Z

From @willkill07:

Example README:

This file also needs a prerequisites section which includes:
minimum hardware requirements
any system dependencies which are different from other examples (e.g. docker)
other things I can't think of right now :)

Complete ✅

bbednarski9 · 2025-12-14T00:51:56Z

From @willkill07:

Needs to be a relative symlink rather than absolute

Complete ✅

bbednarski9 · 2025-12-14T00:54:00Z

From @willkill07: (re configs)
For the workflow configs that rely on function groups, you can just list the function group name and all of the tools will be added

Complete ✅

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai

Actionable comments posted: 6

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (15)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml-94-118 (1)
94-118: Use function group reference instead of enumerating all tools.

The configuration defines the banking_tools function group (lines 31-56), but the workflow explicitly lists all individual tools. Per PR reviewer feedback, workflow configs relying on function groups should list the function group name; all tools in that group will be added automatically.

Based on PR comments, replace the explicit tool enumeration with a function group reference:
 workflow:
   _type: react_agent
   llm_name: dynamo_llm
-  tool_names: [
-    banking_tools.get_account_balance,
-    banking_tools.get_transaction_history,
-    banking_tools.transfer_funds,
-    banking_tools.get_loan_information,
-    banking_tools.get_credit_card_information,
-    banking_tools.get_mortgage_details,
-    banking_tools.get_savings_account_products,
-    banking_tools.schedule_appointment,
-    banking_tools.check_loan_application_status,
-    banking_tools.find_nearby_locations,
-    banking_tools.get_investment_products,
-    banking_tools.report_lost_stolen_card,
-    banking_tools.update_contact_information,
-    banking_tools.setup_automatic_bill_pay,
-    banking_tools.initiate_transaction_dispute,
-    banking_tools.get_exchange_rates,
-    banking_tools.calculate_loan_payment,
-    banking_tools.manage_account_alerts,
-    banking_tools.check_wire_transfer_status,
-    banking_tools.get_cd_products
-  ]
+  tool_names: [banking_tools]
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml-97-122 (1)
97-122: Use function group reference instead of enumerating all tools.

The configuration defines the banking_tools function group (lines 35-60), but the workflow explicitly lists all individual tools. Per PR reviewer feedback, workflow configs relying on function groups should list the function group name; all tools in that group will be added automatically.

Based on PR comments, replace the explicit tool enumeration with a function group reference:
 workflow:
   _type: react_agent
   llm_name: dynamo_llm
-  tool_names: [
-    banking_tools.get_account_balance,
-    banking_tools.get_transaction_history,
-    banking_tools.transfer_funds,
-    banking_tools.get_loan_information,
-    banking_tools.get_credit_card_information,
-    banking_tools.get_mortgage_details,
-    banking_tools.get_savings_account_products,
-    banking_tools.schedule_appointment,
-    banking_tools.check_loan_application_status,
-    banking_tools.find_nearby_locations,
-    banking_tools.get_investment_products,
-    banking_tools.report_lost_stolen_card,
-    banking_tools.update_contact_information,
-    banking_tools.setup_automatic_bill_pay,
-    banking_tools.initiate_transaction_dispute,
-    banking_tools.get_exchange_rates,
-    banking_tools.calculate_loan_payment,
-    banking_tools.manage_account_alerts,
-    banking_tools.check_wire_transfer_status,
-    banking_tools.get_cd_products
-  ]
+  tool_names: [banking_tools]
external/dynamo/start_dynamo_disagg.sh-434-436 (1)
434-436: Remove hardcoded user-specific paths from example commands.

The example NAT integration test contains hardcoded user-specific paths that won't work for other users.

Apply this diff to use placeholders:
-    echo "cd /raid/bbednarski/NeMo-Agent-Toolkit"
-    echo "source /home/nfs/bbednarski/.venvs/nat_dynamo_eval2/bin/activate"
+    echo "cd /path/to/NeMo-Agent-Toolkit"
+    echo "source .venv/bin/activate"
     echo "export HF_HOME=~/.cache/huggingface"
external/dynamo/start_dynamo_disagg.sh-36-36 (1)
36-36: Remove hardcoded user-specific path.

The hardcoded path /raid/bbednarski/models/... contains a user-specific directory that won't work for other users.

Make this configurable or use environment variables:
 # Local paths
-LOCAL_MODEL_DIR="/raid/bbednarski/models/Llama-3.3-70B-Instruct"
+LOCAL_MODEL_DIR="${LOCAL_MODEL_DIR:-$HOME/models/Llama-3.3-70B-Instruct}"
This allows users to override via environment variable while providing a sensible default.
external/dynamo/start_dynamo_unified.sh-35-35 (1)
35-35: Remove hardcoded user-specific path.

The hardcoded path /raid/bbednarski/models/... contains a user-specific directory that won't work for other users.

Make this configurable or use environment variables:
 # Local paths
-LOCAL_MODEL_DIR="/raid/bbednarski/models/Llama-3.3-70B-Instruct"
+LOCAL_MODEL_DIR="${LOCAL_MODEL_DIR:-$HOME/models/Llama-3.3-70B-Instruct}"
This allows users to override via environment variable while providing a sensible default.
examples/dynamo_integration/scripts/throughput_analysis.py-70-78 (1)
70-78: Add explicit return type hint.

Per coding guidelines, all public APIs require Python 3.11+ type hints on parameters and return values.

Apply this diff:
-def calculate_tokens_per_second(csv_path: str):
+def calculate_tokens_per_second(csv_path: str) -> tuple[pd.DataFrame | None, np.ndarray | None, dict | None]:
     """Calculate tokens/sec and inter-token latency for each LLM call from NEW_TOKEN events.
     
     Returns:
         tuple: (results_df, all_itls_array, aggregate_stats)
             - results_df: DataFrame with per-call metrics
             - all_itls_array: Array of all inter-token latencies
             - aggregate_stats: Dict with aggregate throughput metrics
     """
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py-245-252 (1)
245-252: JSON string normalization may corrupt valid string values.

Replacing all single quotes with double quotes (line 248) can corrupt JSON string values that legitimately contain single quotes (e.g., {"name": "O'Brien"}). This would incorrectly become {"name": "O"Brien"} and fail to parse.

Consider a more robust approach:
         if isinstance(v, str):
             try:
-                # Handle both single and double quotes in JSON strings
-                normalized = v.replace("'", '"')
-                return json.loads(normalized)
+                return json.loads(v)
             except json.JSONDecodeError:
-                logger.warning("Failed to parse input_params string as JSON: %s", v[:100])
-                return {}
+                # Try replacing single quotes only if standard parsing fails
+                # This is a fallback for non-standard JSON-like strings
+                try:
+                    normalized = v.replace("'", '"')
+                    return json.loads(normalized)
+                except json.JSONDecodeError:
+                    logger.warning("Failed to parse input_params string as JSON: %s", v[:100])
+                    return {}
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/register.py-1-17 (1)
1-17: Add Apache SPDX header to register.py

This Python file is missing the standard Apache-2.0 SPDX header required across the repo.

Add the usual header above the flake8: noqa line, e.g.:
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 # flake8: noqa
The side-effect import pattern for registrations otherwise looks consistent with the rest of NAT.
external/dynamo/test_dynamo_integration.sh-1-41 (1)
1-41: Add SPDX header and fix example path / install instructions

Two concrete issues here:
Missing Apache SPDX header

Shell scripts are required to carry the Apache-2.0 SPDX header. Add it immediately after the shebang:
#!/bin/bash
SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Wrong example directory for NAT installation and docs

EXAMPLE_DIR is currently set to $SCRIPT_DIR (i.e., external/dynamo), but the example package and README live under examples/dynamo_integration. This makes the “activate/install this example” guidance misleading.

You can derive the example and workflow dirs from REPO_ROOT:
-# Get script location and derive paths
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-EXAMPLE_DIR="$SCRIPT_DIR"
-REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
-CONFIG_DIR="$REPO_ROOT/examples/dynamo_integration/react_benchmark_agent/configs"
+# Get script location and derive paths
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+EXAMPLE_DIR="$REPO_ROOT/examples/dynamo_integration"
+WORKFLOW_DIR="$EXAMPLE_DIR/react_benchmark_agent"
+CONFIG_DIR="$WORKFLOW_DIR/configs"
@@
echo "✗ NAT command not found"
echo " Please activate your NAT environment first:"
echo " cd $REPO_ROOT"
echo " source .venv/bin/activate"
echo ""
echo " Then install this example:"

echo " cd $EXAMPLE_DIR"

echo " cd $WORKFLOW_DIR"
echo " uv pip install -e ."
@@

echo "For detailed setup instructions, see:"

echo " $EXAMPLE_DIR/README.md"

echo "For detailed setup instructions, see:"

echo " $EXAMPLE_DIR/README.md"
After this change, the script will point users at the correct example package for `uv pip install -e .` and at the top-level Dynamo integration README for troubleshooting.  



Also applies to: 57-69, 174-187

</blockquote></details>
<details>
<summary>packages/nvidia_nat_adk/tests/test_adk_llm.py-185-191 (1)</summary><blockquote>

`185-191`: **Fixture does not disable prefix headers as intended.**

The docstring says "no header injection," but `DynamoModelConfig` defaults `prefix_template` to `"nat-dynamo-{uuid}"`. Without explicitly setting `prefix_template=None`, prefix headers will still be injected, causing `test_basic_creation_without_prefix` to fail.



```diff
  @pytest.fixture
  def dynamo_cfg_no_prefix(self):
      """Dynamo config without prefix template (no header injection)."""
      return DynamoModelConfig(
          model_name="test-model",
          base_url="http://localhost:8000/v1",
+            prefix_template=None,
      )
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py-1-12 (1)
1-12: Missing Apache 2.0 license header.

Per coding guidelines, all Python source files must include the SPDX Apache-2.0 header template. Add the standard license header at the top of the file.

Apply this diff:
+# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import logging
 from typing import Any
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py-62-65 (1)
62-65: Remove unused variable eval_builder.

Ruff flagged this as unused. The variable is assigned but never referenced.
-    # Store reference to builder for accessing intent buffer
-    eval_builder = builder
external/dynamo/start_dynamo_unified_thompson_hints.sh-36-39 (1)
36-39: Hardcoded absolute paths make the script non-portable.

These paths are specific to a developer's machine and will not work for other users. Per PR comments, paths should be relative. Consider parameterizing these via environment variables or command-line arguments with sensible defaults.

Apply this diff to use environment variables with defaults:
 # Local paths
-LOCAL_MODEL_DIR="/raid/bbednarski/models/Llama-3.3-70B-Instruct"
-LOCAL_REPO_DIR="/raid/bbednarski/NeMo-Agent-Toolkit"
-CUSTOM_DYNAMO_DIR="$LOCAL_REPO_DIR/external/dynamo/generalized"
+LOCAL_MODEL_DIR="${NAT_MODEL_DIR:-${HOME}/models/Llama-3.3-70B-Instruct}"
+LOCAL_REPO_DIR="${NAT_REPO_DIR:-$(cd "$(dirname "${BASH_SOURCE[0]}")/../.." && pwd)}"
+CUSTOM_DYNAMO_DIR="${LOCAL_REPO_DIR}/external/dynamo/generalized"
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py-1-31 (1)
1-31: Missing required SPDX Apache-2.0 license header.

The script starts with a shebang and docstring but is missing the required license header.

Apply this diff:
 #!/usr/bin/env python3
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 """
 Unified plotting script for throughput metrics vs TSQ (Tool Selection Quality) scores.
external/dynamo/start_dynamo_unified_thompson_hints.sh-1-2 (1)
1-2: Missing required SPDX Apache-2.0 license header.

Per coding guidelines, all shell scripts must include the standard SPDX Apache-2.0 header at the top of the file.

Apply this diff to add the required license header:
 #!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

 # Dynamo SGLang FULL STACK with Thompson Sampling Router & Prefix Hints

🟡 Minor comments (23)

external/dynamo/README.md-208-208 (1)
208-208: Fix redundant "minutes seconds" phrase.

The phrase "~5 minutes seconds" contains a typo; it should read "~5 minutes".
-**Startup time**: ~5 minutes seconds for 70B model
+**Startup time**: ~5 minutes for 70B model
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py-257-259 (1)
257-259: Redundant exception object in logging.exception call.

Per Ruff TRY401 and coding guidelines, logger.exception() automatically includes the exception information. Remove the redundant exception object from the format arguments.
         except Exception as e:
-            logger.exception("Failed to load domain %s: %s", domain, e)
+            logger.exception("Failed to load domain %s", domain)
             continue
examples/dynamo_integration/.env.example-2-5 (1)
2-5: Fix copy command to use the correct filename

The example copy command should reference .env.example, not env.example, otherwise users will get a “No such file” error.
-#   cp env.example .env
+#   cp .env.example .env
Everything else in this template looks reasonable for an example (placeholders, no real secrets).
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/config_dynamo_prefix_e2e_test.yml-13-18 (1)

13-18: Align “Key Features” comment with actual config fields

The bullet enable_dynamic_prefix: true is mentioned in comments but there is no corresponding field in the dynamo_llm config, while later comments say prefix_template alone enables dynamic prefixes. To avoid confusion, either:

add the actual enable_dynamic_prefix field if it exists in the API, or

remove/adjust that bullet so it only refers to prefix_template and the prefix_* hints.

Config structure and 2-space indentation look consistent with existing YAML style.

Based on learnings, 2-space indentation is correct for this repo.
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py-6-9 (1)
6-9: Sort __all__ entries to satisfy Ruff

Ruff flags __all__ as unsorted; reordering keeps behavior the same and fixes lint.
-from .action_completion_evaluator import action_completion_evaluator_function
-from .tsq_evaluator import tsq_evaluator_function
-
-__all__ = ["tsq_evaluator_function", "action_completion_evaluator_function"]
+from .action_completion_evaluator import action_completion_evaluator_function
+from .tsq_evaluator import tsq_evaluator_function
+
+__all__ = ["action_completion_evaluator_function", "tsq_evaluator_function"]
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py-4-7 (1)
4-7: Remove unused noqa directive to satisfy Ruff

Ruff reports RUF100 for the # noqa: F401 here. Since F401 isn’t enabled (and you’re already exporting register via __all__), the directive is unnecessary.
-# Import register module to trigger function registrations
-from . import register  # noqa: F401
+ # Import register module to trigger function registrations (side-effect registration)
+from . import register
examples/dynamo_integration/README.md-7-20 (1)
7-20: Fix typos, align naming, and update function-group example

A few documentation nits worth fixing:
Typos / wording

Line 9: duplicated word.

Line 17: “preditve” spelling.
-... with respect to conversation length, system prompts, and tool-calling. We aim to codesign our inference servers to provide better performance in a repeatable, mock, decision-only evaluation harness. The harness uses ...
+... for conversation length, system prompts, and tool-calling. We aim to co-design our inference servers to provide better performance in a repeatable, mock, decision-only evaluation harness. The harness uses ...

-Most of these examples could be tested ... However, the intended analysis analysis would require ...
+Most of these examples could be tested ... However, the intended analysis would require ...
@@
-- Dynamo Backend: Fast LLM inference with KV cache optimization (default Dynamo method) and a preditive Thompson sampling router (new implementation)
+- Dynamo Backend: Fast LLM inference with KV cache optimization (default Dynamo method) and a predictive Thompson sampling router (new implementation)
2. **NAT naming consistency**

Per repo guidance, in body text prefer “NVIDIA NeMo Agent toolkit” on first mention and “NeMo Agent toolkit” thereafter. You can keep the heading as-is (it already matches the heading rule), but adjust the first bold sentence accordingly:

```diff
-**This set of example agents and evaluations demonstrate the capability to integrate NVIDIA NeMo Agent Toolkit (NAT) agents...
+**This set of example agents and evaluations demonstrate the capability to integrate NVIDIA NeMo Agent toolkit (NAT) agents...
Function-group example

The config snippet under “Basic Evaluation (No Self-Evaluation)” still suggests listing individual tools from a function group:
tool_names: [banking_tools.get_account_balance, ...]
For function groups, list the group name; all tools in the group are added automatically
tool_names: [banking_tools]
This keeps the README aligned with both the style guidelines and the reviewer’s feedback about function-group usage.

Also applies to: 80-90, 185-204, 209-223
examples/dynamo_integration/scripts/run_concurrency_benchmark.sh-82-82 (1)
82-82: Inconsistent concurrency level count.

Line 97 displays "Run X/8" but CONCURRENCY_LEVELS only contains 2 values (16, 32). Either expand the array to 8 levels or fix the progress indicator.
-CONCURRENCY_LEVELS=(16 32)
+# Add more levels if you want 8 runs, or fix the progress indicator below
+CONCURRENCY_LEVELS=(1 2 4 8 16 32 48 64)
Or fix the progress indicator:
-    echo -e "${BLUE}[Run $((${#JOB_IDS[@]} + 1))/8]${NC} Running eval with max_concurrency = ${YELLOW}$CONCURRENCY${NC}"
+    echo -e "${BLUE}[Run $((${#JOB_IDS[@]} + 1))/${#CONCURRENCY_LEVELS[@]}]${NC} Running eval with max_concurrency = ${YELLOW}$CONCURRENCY${NC}"
Also applies to: 97-97
examples/dynamo_integration/scripts/run_concurrency_benchmark.sh-176-182 (1)
176-182: Markdown report contains incorrect description.

Line 178 says "1-8" concurrency levels but the script only tests 2 levels (16, 32). Update to match actual configuration.
 ## Summary
 
-This benchmark evaluates Dynamo performance across different concurrency levels (1-8).
+This benchmark evaluates Dynamo performance across different concurrency levels.
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_predictive_prefix_headers.yml-268-283 (1)
268-283: Weight comments don't match actual values.

The inline comments describe weights as 50%/30%/20% but the actual values are 0.7/0.2/0.1 (70%/20%/10%).
     # Primary: Minimize average LLM latency (most impactful)
     llm_latency:
       evaluator_name: avg_llm_latency
       direction: minimize
-      weight: 0.7  # 50% weight - primary objective
+      weight: 0.7  # 70% weight - primary objective
     
     # Secondary: Minimize total workflow runtime
     workflow_runtime:
       evaluator_name: avg_workflow_runtime
       direction: minimize
-      weight: 0.2  # 30% weight
+      weight: 0.2  # 20% weight
     
     # Tertiary: Minimize number of LLM calls (efficiency)
     num_calls:
       evaluator_name: avg_num_llm_calls
       direction: minimize
-      weight: 0.1  # 20% weight
+      weight: 0.1  # 10% weight
examples/dynamo_integration/ARCHITECTURE.md-1-3 (1)
1-3: Missing SPDX license header.

Per coding guidelines, all source files including Markdown must include the SPDX Apache-2.0 header.

Add the license header at the top of the file:
+
+
+
 # System Architecture
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md-464-467 (1)

464-467: Address or remove dead code reference.

Line 466 documents prefix_utils.py with status "To be removed - imports from non-existent module". If this file doesn't exist or shouldn't be included, remove this entry from the documentation to avoid confusion.
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md-5-16 (1)
5-16: Fix broken internal link in Table of Contents.

The TOC entry at line 16 links to #deprecation-analysis--recommendations, but this section doesn't exist in the document. Either add the section or remove the TOC entry.

Apply this diff to remove the broken link:
 4. [Source Code Reference](#source-code-reference)
 5. [Evaluators](#evaluators)
-6. [Deprecation Analysis & Recommendations](#deprecation-analysis--recommendations)
Or add the missing section before the end of the document.
examples/dynamo_integration/react_benchmark_agent/README.md-28-28 (1)
28-28: Fix spelling error: "explainations" → "explanations"
-> **Note:** For a more abbreviated way to kick off experimentation, see the [Quick Start](../README.md#quick-start) section in the parent README. This document provides a more detailed explainations of the different test patterns and configurations available.
+> **Note:** For a more abbreviated way to kick off experimentation, see the [Quick Start](../README.md#quick-start) section in the parent README. This document provides a more detailed explanations of the different test patterns and configurations available.
examples/dynamo_integration/react_benchmark_agent/README.md-5-5 (1)
5-5: Fix spelling error: "exculsivly" → "exclusively"

Static analysis flagged this typo.
-Currently this agent supports evaluation exculsivly for the [Galileo Agent Leaderboard v2](https://huggingface.co/datasets/galileo-ai/agent-leaderboard-v2). However, we plan to extend the set of evaluation tool sets and benchmarks and will update this document accordingly.
+Currently this agent supports evaluation exclusively for the [Galileo Agent Leaderboard v2](https://huggingface.co/datasets/galileo-ai/agent-leaderboard-v2). However, we plan to extend the set of evaluation tool sets and benchmarks and will update this document accordingly.
external/dynamo/generalized/router.py-964-964 (1)
964-964: Typo in comment.

Line 964 has "EMAThe" which appears to be a typo for "EMA".
-    # Latency EMAThe 
+    # Latency EMA
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py-145-155 (1)
145-155: Remove unused variable expected_tsq.

The variable expected_tsq is assigned but never used in the test assertion.
     def test_scenario_000_score(self):
         """Verify the score from banking_scenario_000."""
         tool_accuracy = 4 / 7  # 0.5714285714285715
         param_accuracy = 0.0
         
-        expected_tsq = 1.0 * tool_accuracy + 0.0 * param_accuracy
-        # = 1.0 * 0.5714... + 0.0 * 0
-        # = 0.5714...
+        # expected: 1.0 * tool_accuracy + 0.0 * param_accuracy = 0.5714...
         
         tsq = calculate_tsq_score(tool_accuracy, param_accuracy)
         assert tsq == pytest.approx(4 / 7)
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py-227-230 (1)
227-230: Simplify and fix the None handling test.

The assertion on line 230 is convoluted and will always pass due to the or True fallback. This doesn't actually test None handling.
     def test_empty_tool_name(self):
         """Verify empty tool names are handled."""
         assert normalize_tool_name("") == ""
-        assert normalize_tool_name(None) == "" if normalize_tool_name(None) is not None else True
+        # After fixing normalize_tool_name to handle None:
+        assert normalize_tool_name(None) == ""
external/dynamo/generalized/router.py-1-29 (1)
1-29: Missing SPDX license header and duplicate import.

The file has an abbreviated license header and a duplicate asyncio import on line 7.

Apply this diff:
 # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA
 # SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.

 import asyncio
 import argparse
 import uvloop
-import asyncio
 import logging
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py-1-3 (1)
1-3: Missing SPDX license header.

Per coding guidelines, all source files must include the SPDX Apache-2.0 header template.

Apply this diff to add the required license header:
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 #!/usr/bin/env python3
 """
 Histogram plotting script for throughput metrics distribution analysis.
external/dynamo/generalized/frontend.py-367-371 (1)
367-371: Use logging.exception and chain the exception properly.

Per coding guidelines: use logger.exception() to capture full stack trace, and chain exceptions with raise ... from err.
             except HTTPException:
                 raise
             except Exception as e:
-                logging.error("Error in chat completions: %s", e)
-                raise HTTPException(status_code=500, detail=str(e))
+                logging.exception("Error in chat completions")
+                raise HTTPException(status_code=500, detail=str(e)) from e
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_predictive_prefix_headers.yml-228-230 (1)
228-230: Fix inconsistent comment: spike_threshold value and description do not match.

Line 230 has a comment stating # Alert when concurrent functions >= 5, but the actual spike_threshold value is 24. This mismatch is misleading and makes the intention unclear.

Correct the comment to match the threshold value:
      concurrency_spike_analysis:
        enable: true
-       spike_threshold: 24  # Alert when concurrent functions >= 5
+       spike_threshold: 24  # Alert when concurrent functions >= 24
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_predictive_prefix_headers.yml-143-147 (1)
143-147: Remove or clarify inline comments showing alternative parameter values.

Lines 145–147 contain inline comments with different values (e.g., 10 # 6, MEDIUM # HIGH, MEDIUM # LOW). These are confusing and suggest uncertainty about the intended configuration. Either remove these comments or document their purpose explicitly.

Apply this diff to remove misleading comments:
    # Note: Setting prefix_template enables dynamic prefix headers
    prefix_template: "react-benchmark-{uuid}"
-   prefix_total_requests: 10 # 6
-   prefix_osl: MEDIUM # HIGH
-   prefix_iat: MEDIUM # LOW
+   prefix_total_requests: 10
+   prefix_osl: MEDIUM
+   prefix_iat: MEDIUM
    # Enable parameter optimization for this LLM
Committable suggestion skipped: line range outside the PR's diff.

examples/dynamo_integration/react_benchmark_agent/configs

examples/dynamo_integration/scripts/throughput_analysis.py

external/dynamo/monitor_dynamo.sh

external/dynamo/start_dynamo_disagg.sh

external/dynamo/start_dynamo_unified.sh

external/dynamo/stop_dynamo.sh

willkill07

Adding review for nvidia_nat_adk, nvidia_nat_langchain, and src/nat/llm/dynamo_llm.py

src/nat/llm/dynamo_llm.py

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py

src/nat/llm/dynamo_llm.py

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

external/dynamo/start_dynamo_unified_thompson_hints.sh (1)
1-5: Add required SPDX Apache-2.0 license header.

Per coding guidelines, all source files including shell scripts must include the SPDX Apache-2.0 header.

Apply this diff:
 #!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 
 # Dynamo SGLang FULL STACK with Thompson Sampling Router & Prefix Hints
external/dynamo/start_dynamo_unified.sh (1)
1-5: Add required SPDX Apache-2.0 license header.

Per coding guidelines, all source files including shell scripts must include the SPDX Apache-2.0 header.

Apply this diff:
 #!/bin/bash
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 
 # Dynamo SGLang FULL STACK with Unified Worker

🧹 Nitpick comments (18)

external/dynamo/.env.example (1)

1-72: Well-organized environment template with clear structure.

The .env.example file is properly organized with five logical sections (GPU, Network, Model, Disaggregated, Repository), consistent DYNAMO_ variable naming, and well-documented defaults. The structure aligns with the Dynamo integration's support for both unified and disaggregated serving modes.

A minor suggestion: Consider whether this template should also support standard dotenv format (without export keywords) for compatibility with Python-based dotenv loaders used elsewhere in the codebase. If the startup scripts use shell sourcing, the current format is appropriate.
external/dynamo/README.md (3)
3-3: Use full toolkit name on first mention.

Per coding guidelines, the first reference should be "NVIDIA NeMo Agent Toolkit", with "NeMo Agent Toolkit" used subsequently.
-This guide covers setting up, running, and configuring the NVIDIA Dynamo backend for the React Benchmark Agent evaluations.
+This guide covers setting up, running, and configuring the NVIDIA Dynamo backend for the NeMo Agent Toolkit's React Benchmark Agent evaluations.
Or update line 3 to:
-> ⚠️ **EXPERIMENTAL**: The NeMo Agent Toolkit and Dynamo integration is experimental and under active development. APIs, configurations, and features may change without notice.
+> ⚠️ **EXPERIMENTAL**: The NVIDIA NeMo Agent Toolkit and Dynamo integration is experimental and under active development. APIs, configurations, and features may change without notice.
33-150: Specify language identifiers for code blocks.

Multiple code blocks are missing language specifications (MD040). Add appropriate language identifiers to improve syntax highlighting and readability.

Examples of fixes:
-```
+```
 ┌──────────────────────────────────────────────────────────────────────────────┐
-```
+```output
 +-----------------------------------------------------------------------------------------+
 | NVIDIA-SMI 580.65.06              Driver Version: 580.65.06      CUDA Version: 13.0     |
-```
+```json
 {"choices": [...],
Use appropriate language identifiers:

bash or shell for shell commands/output

json for JSON examples

yaml for YAML configurations

text or output for command output (nvidia-smi, test results, etc.)

Leave empty for ASCII diagrams if no highlighting is applicable

Also applies to: 236-261, 410-445, 466-483, 513-523, 703-708, 906-922

447-447: Consider more formal wording for verb choice.

Static analysis suggests using a more formal verb than "provides" in this context.
-If any tests fail, the script provides guidance on how to fix the issue.
+If any tests fail, the script offers guidance on how to fix the issue.
Or alternatively: "the script supplies", "the script includes", or "you'll find".
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml (1)

35-63: Consider using the function group reference instead of the explicit tool list to reduce verbosity.

The react_workflow lists all 19 banking tools individually (lines 38–59) instead of referencing the banking_tools function group (lines 104–130). The workflow system supports both patterns, but using the group name directly is simpler and maintains consistency with the function group design pattern. This would change tool_names from a 20-line list to a single line: tool_names: [banking_tools].
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py (13)
125-129: Avoid catching broad Exception; catch specific exceptions instead.

Catching bare Exception can mask unexpected errors. Narrow the exception type to pd.errors.ParserError or pd.errors.EmptyDataError for CSV parsing issues.
     try:
         df = pd.read_csv(csv_path)
-    except Exception as e:
+    except (pd.errors.ParserError, pd.errors.EmptyDataError, FileNotFoundError) as e:
         print(f"    Warning: Error reading CSV {csv_path}: {e}")
         return None, None
144-144: Rename unused loop variable idx to _.

The loop variable is not used within the loop body. Per static analysis and convention, rename it to _.
-        for idx, row in example_df.iterrows():
+        for _, row in example_df.iterrows():
216-227: Fix implicit Optional type hint.

PEP 484 prohibits implicit Optional. Use explicit union type for the parameter default.
-def collect_job_data_from_dir(jobs_dir: Path, experiment_label: str = None) -> tuple[list[dict], list[dict]]:
+def collect_job_data_from_dir(jobs_dir: Path, experiment_label: str | None = None) -> tuple[list[dict], list[dict]]:
352-363: Add strict=True to zip() for safety.

Adding strict=True ensures the iterables have equal length and prevents silent data loss.
-    for label, key in zip(stat_labels, stat_keys):
+    for label, key in zip(stat_labels, stat_keys, strict=True):
368-371: Remove assignment to unused variable.

The text_obj variable is assigned but never used. Either remove the assignment or prefix with _ if kept for debugging.
-    text_obj = ax.text(0.98, 0.98, table_text, transform=ax.transAxes, fontsize=8,
+    ax.text(0.98, 0.98, table_text, transform=ax.transAxes, fontsize=8,
404-415: Add strict=True to zip() for safety.

Same as above - adding strict=True ensures the iterables have equal length.
-    for label, key in zip(stat_labels, stat_keys):
+    for label, key in zip(stat_labels, stat_keys, strict=True):
433-449: Add return type hint.

Per coding guidelines, all public APIs require type hints on return values. Since this function returns None, add the explicit type hint.
 def create_histogram_plots(df: pd.DataFrame, output_dir: Path,
-                           llm_call_df: pd.DataFrame = None):
+                           llm_call_df: pd.DataFrame | None = None) -> None:
493-493: Prefix unused fig variable with underscore.

The fig variable from plt.subplots() is not used. Rename to _ to indicate intentional non-use.
-        fig, ax = plt.subplots(figsize=(10, 7))
+        _, ax = plt.subplots(figsize=(10, 7))
604-618: Add return type hint and fix implicit Optional.

Per coding guidelines, add explicit return type and use | None instead of implicit Optional.
 def create_summary_histogram_plot(df: pd.DataFrame, output_dir: Path,
-                                   llm_call_df: pd.DataFrame = None):
+                                   llm_call_df: pd.DataFrame | None = None) -> None:
620-621: Prefix unused fig variable with underscore.
-    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
+    _, axes = plt.subplots(2, 3, figsize=(15, 10))
664-667: Add strict=True to zip() for safety.

The axes array has 6 elements and all_metrics has 6 items, so they should match. Adding strict=True ensures this invariant is enforced.
-    for ax, (metric_col, label, data_df, num_bins) in zip(axes, all_metrics):
+    for ax, (metric_col, label, data_df, num_bins) in zip(axes, all_metrics, strict=True):
767-786: Add return type hint for main().

Per coding guidelines, add explicit return type hint.
-def main():
+def main() -> None:
823-823: Remove extraneous f prefix from string without placeholders.

This f-string has no interpolation; use a regular string instead.
-    print(f"Histogram bins: TTFT/ITL/TPS=100, Tokens=50, Calls=25, Duration=25")
+    print("Histogram bins: TTFT/ITL/TPS=100, Tokens=50, Calls=25, Duration=25")

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5d7081 and d3a86a2.

📒 Files selected for processing (8)

examples/dynamo_integration/react_benchmark_agent/configs (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml (1 hunks)
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py (1 hunks)
external/dynamo/.env.example (1 hunks)
external/dynamo/README.md (1 hunks)
external/dynamo/start_dynamo_disagg.sh (1 hunks)
external/dynamo/start_dynamo_unified.sh (1 hunks)
external/dynamo/start_dynamo_unified_thompson_hints.sh (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

examples/dynamo_integration/react_benchmark_agent/configs
external/dynamo/start_dynamo_disagg.sh

🧰 Additional context used

📓 Path-based instructions (11)

**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All source files must include the SPDX Apache-2.0 header template

Files:

external/dynamo/start_dynamo_unified.sh
external/dynamo/start_dynamo_unified_thompson_hints.sh
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

external/dynamo/start_dynamo_unified.sh
external/dynamo/README.md
external/dynamo/start_dynamo_unified_thompson_hints.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

external/dynamo/README.md
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings

Files:

external/dynamo/README.md

**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

external/dynamo/README.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

external/dynamo/README.md
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.

If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible

Files:

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

**/*.{py,js,ts,java,cpp,c,go,rb,php}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Every file must start with the standard SPDX Apache-2.0 header

Files:

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

🧠 Learnings (4)

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php} : Every file must start with the standard SPDX Apache-2.0 header

Applied to files:

external/dynamo/start_dynamo_unified.sh

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php,sh} : All source files must include the SPDX Apache-2.0 header template

Applied to files:

external/dynamo/start_dynamo_unified.sh

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: CI verifies SPDX headers via ci/scripts/github/checks.sh; do not bypass this check

Applied to files:

external/dynamo/start_dynamo_unified.sh

📚 Learning: 2025-12-12T20:49:44.305Z

Learnt from: zterek
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1243
File: examples/risk_and_security/retail_agent/src/nat_retail_agent/configs/red-teaming.yml:1-98
Timestamp: 2025-12-12T20:49:44.305Z
Learning: In the NVIDIA/NeMo-Agent-Toolkit repository, YAML files generally use 2-space indentation. When reviewing YAML, prefer 2-space indentation to match the existing style over a 4-space guideline until a repo-wide standardization is performed. This applies to YAML configuration files (e.g., red-teaming.yml) and, more broadly, all *.yml files in the project.

Applied to files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml

🪛 dotenv-linter (4.0.0)

external/dynamo/.env.example

[warning] 2-2: [QuoteCharacter] The value has quote characters (', ")

(QuoteCharacter)

[warning] 4-4: [QuoteCharacter] The value has quote characters (', ")

(QuoteCharacter)

🪛 LanguageTool

external/dynamo/README.md

[style] ~447-~447: Consider using a different verb for a more formal wording.
Context: ... the script provides guidance on how to fix the issue. --- ## Stopping Dynamo A ...

(FIX_RESOLVE)

🪛 markdownlint-cli2 (0.18.1)

external/dynamo/README.md

33-33: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

236-236: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

410-410: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

466-466: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

513-513: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

703-703: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

906-906: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.8)

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

11-11: Docstring contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF002)

95-95: Consider moving this statement to an else block

(TRY300)

127-127: Do not catch blind exception: Exception

(BLE001)

144-144: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

216-216: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

329-329: Docstring contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF002)

349-349: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

352-352: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

368-368: Local variable text_obj is assigned to but never used

Remove assignment to unused variable text_obj

(F841)

401-401: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

404-404: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

493-493: Unpacked variable fig is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

570-570: Prefer next(iter(job_stats.values())) over single element slice

Replace with next(iter(job_stats.values()))

(RUF015)

620-620: Unpacked variable fig is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

664-664: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

734-734: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

741-741: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

823-823: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (17)

external/dynamo/.env.example (1)

2-4: Static analysis warnings are false positives in shell context.

The dotenv-linter tool flags QuoteCharacter warnings on lines 2-4, but these are false positives because the file uses shell export syntax (not standard dotenv format). Quotes around paths are appropriate and recommended in shell scripts to handle potential whitespace or special characters. No action needed.
external/dynamo/README.md (2)
275-330: Approve comprehensive Dynamo setup documentation.

The README effectively addresses the PR objectives by providing:

Clear prerequisites section (hardware and software requirements)

Three deployment modes with explicit advantages

Detailed architecture diagrams

Step-by-step setup instructions

Health checks and validation tests

Troubleshooting guidance

Configuration reference with environment variables

The documentation is well-structured, user-friendly, and should enable practitioners to set up and validate the Dynamo integration effectively.

1-10: Add required SPDX Apache-2.0 license header.

Per coding guidelines, all source files including markdown documentation must include the SPDX Apache-2.0 header.

Apply this diff:
+# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+
 # Dynamo Backend Setup Guide
⛔ Skipped due to learnings
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{md,rst,py} : Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{md,rst} : Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings
Learnt from: zterek
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1243
File: examples/risk_and_security/retail_agent/src/nat_retail_agent/configs/red-teaming.yml:1-98
Timestamp: 2025-12-12T20:49:51.370Z
Learning: In the NVIDIA/NeMo-Agent-Toolkit repository, most YAML configuration files use 2-space indentation rather than 4-space indentation. When reviewing YAML files, prefer consistency with the existing codebase style (2-space indentation) over the stated coding guideline (4-space indentation) until a repository-wide standardization is performed.
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php} : Every file must start with the standard SPDX Apache-2.0 header
external/dynamo/start_dynamo_unified_thompson_hints.sh (5)

79-93: Approve custom component validation logic.

The verification of custom components (frontend.py, router.py, processor.py) before container startup is a solid defensive practice that prevents silent failures later.

166-193: Approve pragmatic HuggingFace token handling.

The fallback logic for HF_TOKEN is well-designed:

Checks for cached local model first

Uses dummy token if model exists locally

Prompts user for token if needed

Provides clear guidance on skipping

This approach balances security, user experience, and failure recovery.

264-294: Approve robust worker initialization wait function.

The wait_for_worker function demonstrates excellent defensive coding:

Verifies process is still running

Provides progress updates

Has appropriate timeout (120s max)

Assumes initialization after 60s for long-loading models like 70B

Reports clear error messages on failure

322-356: Approve comprehensive startup sequence and diagnostics.

The multi-stage startup (ETCD → NATS → Worker → Router → Processor → Frontend) with detailed logging and health checks demonstrates excellent orchestration:

Each component outputs its PID for debugging

Clear step numbering aids problem diagnosis

Metrics CSV paths are documented

Request flow diagram helps users understand data paths

424-444: Approve process monitoring loop.

The monitoring loop that detects failures in critical processes (Frontend, Processor, Router, Worker) and exits immediately is appropriate. This prevents the container from running in a degraded state.

external/dynamo/start_dynamo_unified.sh (4)

29-32: Approve flexible port configuration.

Non-default port selections for ETCD (2379) and NATS (4222) with explicit environment variable overrides prevent conflicts with other services and documentation makes this clear.

78-99: Approve ETCD peer configuration.

The addition of ETCD_PEER_PORT (line 31) and peer URL configuration (lines 80-82) enables future clustering without breaking backward compatibility.

230-260: Approve robust infrastructure verification.

The in-container verification of ETCD and NATS accessibility with clear error messages ensures the container detects configuration issues early rather than failing mysteriously during worker startup.

405-459: Approve NAT integration documentation.

The inclusion of NAT integration test commands (lines 405-414) in the startup output helps users transition from basic validation to end-to-end workflow evaluation.

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml (1)

1-261: Confirm file ends with a single newline.

Per coding guidelines, all files should end with a single newline. Please verify that this file has a trailing newline after line 261.

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py (4)

42-44: LGTM!

Simple utility function with proper type hints and docstring.

47-49: LGTM!

Simple utility function with proper type hints and docstring.

52-99: LGTM!

Function correctly parses TSQ scores with proper error handling and type hints.

293-321: LGTM!

Function correctly aggregates data from multiple directories with proper type hints.

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai

Actionable comments posted: 19

🧹 Nitpick comments (4)

external/dynamo/.env.example (1)
1-1: Optional: Add brief usage instruction at the top.

Consider adding a one-line comment explaining how to use this file (e.g., # Source this file or copy values to your shell environment). This will help users unfamiliar with .env.example patterns understand the intended usage quickly.

Example:
+# Usage: Source this file or copy variable definitions to your shell environment
+#   source .env.example
+#   or manually export the variables below
+
 # Required: Set your model directory path
examples/dynamo_integration/README.md (1)

9-9: Consider rewording "with respect to" for clarity.

Line 9 uses "LLM requests have predictable patterns with respect to conversation length..." — this phrasing is somewhat verbose. Consider: "Agentic LLM requests have predictable patterns in conversation length, system prompts, and tool-calling." This is more direct and reads more smoothly.

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml (1)

126-128: Clarify commented configuration alternatives.

Lines 126–128 contain commented-out values (e.g., prefix_total_requests: 10 # 6, prefix_osl: MEDIUM # HIGH). If these are intentional for benchmarking/testing purposes, consider adding a brief comment explaining why alternatives are documented. If they should be removed, clean them up.

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml (1)

79-86: Address the commented optimizer fields and NAT type resolution bug reference.

Line 79 references a "NAT type resolution bug" preventing optimizer parameters. This should either:

Link to a tracked issue/GitHub issue number for future resolution

Include a temporal note (e.g., "FIXME: Re-enable after NAT issue #XYZ is resolved")

Be updated if the bug has been fixed

Consider replacing the generic "bug" reference with traceable issue tracking information.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3a86a2 and 565663b.

📒 Files selected for processing (10)

examples/dynamo_integration/ARCHITECTURE.md (1 hunks)
examples/dynamo_integration/README.md (1 hunks)
examples/dynamo_integration/react_benchmark_agent/README.md (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml (1 hunks)
external/dynamo/.env.example (1 hunks)

🧰 Additional context used

📓 Path-based instructions (8)

**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml

**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/README.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/README.md
examples/dynamo_integration/ARCHITECTURE.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/README.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/README.md
examples/dynamo_integration/ARCHITECTURE.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.

If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/README.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/README.md
examples/dynamo_integration/ARCHITECTURE.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

examples/dynamo_integration/react_benchmark_agent/README.md
examples/dynamo_integration/README.md
examples/dynamo_integration/ARCHITECTURE.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings

Files:

examples/dynamo_integration/react_benchmark_agent/README.md
examples/dynamo_integration/README.md
examples/dynamo_integration/ARCHITECTURE.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

examples/dynamo_integration/react_benchmark_agent/README.md
examples/dynamo_integration/README.md
examples/dynamo_integration/ARCHITECTURE.md
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

🧠 Learnings (2)

📚 Learning: 2025-12-12T20:49:44.305Z

Learnt from: zterek
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1243
File: examples/risk_and_security/retail_agent/src/nat_retail_agent/configs/red-teaming.yml:1-98
Timestamp: 2025-12-12T20:49:44.305Z
Learning: In the NVIDIA/NeMo-Agent-Toolkit repository, YAML files generally use 2-space indentation. When reviewing YAML, prefer 2-space indentation to match the existing style over a 4-space guideline until a repo-wide standardization is performed. This applies to YAML configuration files (e.g., red-teaming.yml) and, more broadly, all *.yml files in the project.

Applied to files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{md,rst,py} : Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Applied to files:

examples/dynamo_integration/README.md

🪛 dotenv-linter (4.0.0)

external/dynamo/.env.example

[warning] 2-2: [QuoteCharacter] The value has quote characters (', ")

(QuoteCharacter)

[warning] 4-4: [QuoteCharacter] The value has quote characters (', ")

(QuoteCharacter)

🪛 LanguageTool

examples/dynamo_integration/react_benchmark_agent/README.md

[grammar] ~5-~5: Ensure spelling is correct
Context: ...urrently this agent supports evaluation exculsivly for the [Galileo Agent Leaderboard v2](...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~28-~28: Ensure spelling is correct
Context: ... This document provides a more detailed explainations of the different test patterns and conf...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

examples/dynamo_integration/README.md

[style] ~9-~9: ‘with respect to’ might be wordy. Consider a shorter alternative.
Context: ... LLM requests have predictable patterns with respect to conversation length, system prompts, an...

(EN_WORDINESS_PREMIUM_WITH_RESPECT_TO)

[grammar] ~17-~17: Ensure spelling is correct
Context: ...imization (default Dynamo method) and a preditive Thompson sampling router (new implement...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)

examples/dynamo_integration/react_benchmark_agent/README.md

330-330: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

346-346: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

427-427: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

522-522: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

570-570: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

645-645: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

examples/dynamo_integration/README.md

103-103: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

examples/dynamo_integration/ARCHITECTURE.md

19-19: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

255-255: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

333-333: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

464-464: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

552-552: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

16-16: Link fragments should be valid

(MD051, link-fragments)

22-22: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

119-119: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

217-217: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

369-369: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

431-431: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (20)

external/dynamo/.env.example (2)

1-4: QuoteCharacter warnings are false positives—quotes are correct in bash export context.

The dotenv-linter warnings on lines 2 and 4 flag quotes in path values. However, since this file uses bash export statements (not standard .env format), the quotes are necessary and correct. They prevent word splitting and properly handle paths with spaces. These can be safely ignored.

1-68: Well-organized configuration template with clear sections and sensible defaults.

The file is logically structured with:

Required variables prominently listed

Optional variables grouped by concern (GPU, Network, Model, Disaggregated Mode)

Descriptive comments, default values, and example usage for each variable

This provides good guidance for users setting up the Dynamo integration.

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md (1)

465-467: Clarify the deprecation status of prefix_utils.py.

The note states this file "imports from non-existent module" and is "To be removed". Please clarify:

Is this file currently in the codebase or already removed?

If still present, should it be removed as part of this PR?

If it remains, what is the migration path for users currently relying on it?

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml (2)

29-220: YAML indentation follows repo convention.

The file uses 2-space indentation throughout, which aligns with the established pattern in this repository (per learnings from PR 1243) rather than the generic 4-space guideline. This is correct and consistent with existing configurations.

1-220: The file already ends with a single newline as required by the coding guidelines. No action needed.

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml (8)

1-2: File header and license formatting look correct.

Apache 2.0 header is present with current copyright year (2024-2025). Per coding guidelines, this is correctly placed.

4-35: Excellent comprehensive documentation and parameter reference.

The header documentation is well-structured with clear explanations of Dynamo prefix parameters (prefix_osl, prefix_iat, prefix_total_requests) and router behavior. The ASCII table and bullet points make it easy to understand optimization targets. This is exemplary documentation.

113-165: Dynamo LLM configuration with optimizable parameters is well-structured.

The dynamo_llm section properly defines:

Model configuration (model_name, base_url, temperature, max_tokens)

Prefix template and parameters (prefix_template, prefix_total_requests, prefix_osl, prefix_iat)

Optimizable parameters list and search space definitions (lines 142-156)

The search_space definitions are clear and reasonable. The eval_llm secondary LLM is appropriately minimal for evaluation purposes.

167-174: Workflow configuration using self-evaluation pattern is clear.

The workflow wraps react_workflow with self_evaluating_agent_with_feedback, properly configuring evaluator_llm, max_retries, and confidence threshold. The pass_feedback_to_agent flag enables iterative improvement. Configuration is sound.

232-279: Optimizer configuration with weighted metrics is well-designed.

The optimizer section properly defines:

reps_per_param_set=1 (appropriate for testing; noted in comment)

eval_metrics with weights summing to 1.0 (0.7 + 0.2 + 0.1 = 1.0) ✓

Numeric optimization with grid sampler (note: comment correctly identifies that 180 combinations may be excessive)

Prompt optimization intentionally disabled (line 279) with clear rationale

The configuration is logically sound for runtime performance tuning.

1-280: File correctly ends with a single newline. The file complies with the coding guideline requirement that all YAML files must end with a single newline character.

229-229: Evaluator type avg_tokens_per_llm_end is correctly registered.

The type name is properly registered in src/nat/eval/runtime_evaluator/register.py as AverageTokensPerLLMEndConfig(EvaluatorBaseConfig, name="avg_tokens_per_llm_end") and is used consistently across the codebase in other example configurations.

88-88: The relative path is correct for the intended workflow. The README explicitly documents in Step 3 of the Quick Start section that the dataset must be downloaded first using python scripts/download_agent_leaderboard_v2.py --domains banking. No changes needed.

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml (4)

107-136: System prompt is well-structured and comprehensive.

The custom system prompt clearly instructs the agent on tool selection evaluation, single-tool-per-turn constraint, and placeholder value handling. This is appropriately documented with CRITICAL RULES section.

1-3: Apache License header and copyright years are correct.

The file properly includes the SPDX license identifier and up-to-date copyright years.

59-83: Dynamo LLM parameters are properly supported and validated.

The Dynamo LLM provider implementation in src/nat/llm/dynamo_llm.py properly recognizes, validates, and documents all configured parameters (prefix_template, prefix_total_requests, prefix_osl, prefix_iat). Each parameter is defined with Pydantic field validation, detailed docstring documentation, and comprehensive test coverage. Graceful error handling includes fallback behavior when prefix_template is None and bounds validation for prefix_total_requests (1-50 range). The configuration is correct.

33-33: No action needed. The paths in the configuration files are correct.

The data files referenced in the config (./examples/dynamo_integration/data/raw/banking/tools.json and ./examples/dynamo_integration/data/agent_leaderboard_v2_banking.json) are meant to be executed from the project root using nat eval, as documented in the Quick Start guide. The data directory does not yet exist because these files must first be downloaded using the download script: python scripts/download_agent_leaderboard_v2.py --domains banking. The paths are properly designed for this documented execution model and should not be changed to relative paths like ../../data/....

Likely an incorrect or invalid review comment.

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml (3)

98-108: Inconsistent workflow configuration between full and minimal test variants.

File 1 (eval_config_no_rethinking_full_test.yml) includes max_history: 1000 in the workflow section (line 103), but this file omits it. Clarify whether this omission is intentional or if it should be added for consistency. If intentional, document the reason in a comment.

142-148: Profiler configuration differs significantly from full test variant.

This minimal test variant has most profiler features enabled (lines 159-183), while the full test disables them. Document the rationale for this difference if it's intentional. If the full test should also have these features enabled, consider enabling them there as well for consistency.

1-3: Apache License header and copyright years are correct.

The file properly includes the SPDX license identifier and up-to-date copyright years.

examples/dynamo_integration/ARCHITECTURE.md

examples/dynamo_integration/react_benchmark_agent/README.md

...react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

examples/dynamo_integration/README.md

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai

Actionable comments posted: 16

♻️ Duplicate comments (2)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml (1)
211-211: Fix comment mismatch on spike_threshold.

The comment states the threshold alerts when concurrent functions >= 5, but the value is set to 24. This discrepancy could mislead developers about the actual threshold behavior.

Clarify the intent and update either the comment or the value. If the threshold should alert at 5 concurrent functions, apply this diff:
-        spike_threshold: 24  # Alert when concurrent functions >= 5
+        spike_threshold: 5  # Alert when concurrent functions >= 5
If 24 is the intended threshold, update the comment to reflect the actual behavior:
-        spike_threshold: 24  # Alert when concurrent functions >= 5
+        spike_threshold: 24  # Alert when concurrent functions >= 24
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml (1)
88-89: Ensure referenced tools/dataset JSON files exist at the configured paths.

The config depends on:

tools_json_path: ./examples/dynamo_integration/data/raw/banking/tools.json

file_path: ./examples/dynamo_integration/data/agent_leaderboard_v2_banking.json

If these files are absent or moved, the banking tool group and evaluation dataset will fail to load at runtime. Please confirm they are present at these exact paths or update the config accordingly.
#!/bin/bash
set -euo pipefail
for f in \
  "examples/dynamo_integration/data/raw/banking/tools.json" \
  "examples/dynamo_integration/data/agent_leaderboard_v2_banking.json"
do
  if [[ -f "$f" ]]; then
    echo "✓ Found: $f"
  else
    echo "✗ Missing: $f"
  fi
done
Also applies to: 182-184

🧹 Nitpick comments (20)

examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py (1)
205-205: Consider the default domain selection.

The default is set to only ["banking"] with a comment "for testing". For a production script, consider either using all available domains by default or documenting why banking-only is the intended default behavior.

If all domains should be the default:
     if domains is None:
-        domains = ["banking"]  # Start with just one for testing
-        logger.info("No domains specified, using default: %s", domains)
+        domains = available_domains
+        logger.info("No domains specified, using all available domains: %s", domains)
external/dynamo/start_dynamo_unified_thompson_hints.sh (1)
215-225: Quote host paths and container name in docker run for robustness.

$LOCAL_MODEL_DIR, $CUSTOM_DYNAMO_DIR, and $CONTAINER_NAME are currently unquoted. If users set paths with spaces, the docker run command will break. Using quoted interpolations is safer:
docker run -d \
  --name "$CONTAINER_NAME" \
  -v "${LOCAL_MODEL_DIR}:${MODEL}:ro" \
  -v "${CUSTOM_DYNAMO_DIR}:/workspace/custom_dynamo:ro" \
  ...
external/dynamo/.env.example (1)
4-7: Consider unquoting example values to satisfy dotenv linters.

dotenv-linter flags the quoted values on Lines 5 and 7. Since these example paths contain no spaces, you can drop the quotes to quiet the warning:
export DYNAMO_MODEL_DIR=/path/to/your/models/Llama-3.3-70B-Instruct
export DYNAMO_REPO_DIR=/path/to/NeMo-Agent-Toolkit
Users can re-add quotes if they actually need spaces.
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py (2)
42-53: Add an explicit return type for the public evaluator function.

Per guidelines, public APIs should have return type hints. action_completion_evaluator_function is an async generator yielding EvaluatorInfo; consider:
from collections.abc import AsyncIterator

@register_evaluator(config_type=ActionCompletionEvaluatorConfig)
async def action_completion_evaluator_function(
    config: ActionCompletionEvaluatorConfig, builder: EvalBuilder
) -> AsyncIterator[EvaluatorInfo]:
    ...
72-77: Avoid bare except Exception or at least log at debug level.

Catching a broad Exception and silently continue-ing makes trajectory parsing failures invisible and harder to debug. Either narrow the exception type (e.g., to TypeError/ValueError) or log at debug:
except Exception:
    logger.debug("Failed to model_dump trajectory step of type %s", type(step))
    continue
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py (2)
241-257: Consider logging parse failures with stack traces if they indicate bugs.

Right now parse_string_to_dict swallows json.JSONDecodeError and logs a warning without a stack trace. If malformed JSON here is unexpected (rather than just noisy input), switching to logger.exception (without re-raising) would give better observability:
except json.JSONDecodeError:
    logger.exception("Failed to parse input_params string as JSON: %s", v[:100])
    return {}
If malformed JSON is expected, leaving this as warning is fine and the Ruff suggestion can be ignored.

275-279: Remove unused local variables to satisfy Ruff and keep the function tight.

properties and required_params are assigned but never used in create_tool_stub_function. You can safely drop them:
tool_name = tool_schema.get("title", "unknown_tool")
tool_description = tool_schema.get("description", "")
# properties = tool_schema.get("properties", {})
# required_params = tool_schema.get("required", [])
Similarly, in _generate_mock_response, prop_desc on Line 335 is unused and can be removed.
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py (2)
38-52: Add a return type hint for the public function generator.

To match the project’s typing guidelines, annotate the return type of react_benchmark_agent_function, e.g.:
from collections.abc import AsyncIterator

@register_function(...)
async def react_benchmark_agent_function(
    config: ReactBenchmarkAgentFunctionConfig, builder: Builder
) -> AsyncIterator[FunctionInfo]:
    ...
68-80: Mark the unused query parameter to satisfy Ruff and intent.

_decision_only_info doesn’t actually use its query argument, which is fine for a status endpoint but triggers ARG001. Renaming to _query (and adjusting the docstring) makes the intent clear and avoids the warning:
async def _decision_only_info(_query: str) -> str:
    ...
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py (3)
43-47: Non-cryptographic use of md5 is OK, but consider clarifying or switching hash.

_get_scenario_id_for_question uses hashlib.md5 purely to derive a short scenario identifier, not for security. To avoid future confusion (and S324 warnings), either:

Switch to sha256 for a “safe by default” choice, or

Add a brief comment that this hash is non-cryptographic and used only for ID generation.

Example:
# Non-cryptographic hash for short scenario IDs
question_hash = hashlib.sha256(question.encode()).hexdigest()[:12]
304-305: Drop the explicit exception argument in logger.exception calls.

logger.exception already logs the current exception and its stack trace; passing %s with e is redundant and slightly noisy. You can simplify to:
except Exception:
    logger.exception("💥 Error executing wrapped agent on attempt %d", attempt + 1)
    attempt += 1
    continue
Same applies to the similar calls at Lines 514 and 678 (and to the JSON parsing branch if you decide to switch to logger.exception there).

617-685: Optional: make JSON parsing more robust and explicit.

The current _evaluate_tool_sequence implementation heuristically slices the first {...} block from the LLM response. That’s reasonable for examples but brittle if the model ever includes braces in prose. If you see flaky parsing, consider:

Instructing the evaluator LLM to reply with JSON only and using response_text.strip() directly, or

Using a small helper that scans for the first valid JSON object instead of find/rfind.

Given this is example code, the existing behavior is acceptable; keep as-is if it’s stable in your workloads.
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py (1)
4-7: Optional: remove unused noqa if F401 isn’t enabled.

Ruff reports the # noqa: F401 as unused. If F401 isn’t part of your active rule set, you can safely drop the directive:
from . import register
If you expect F401 to be enabled later, keeping the pragma is also reasonable.
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py (1)
6-9: Sort __all__ to satisfy Ruff and keep exports predictable.

Ruff is flagging __all__ as unsorted. Sorting also makes the public surface easier to scan.
-from .action_completion_evaluator import action_completion_evaluator_function
-from .tsq_evaluator import tsq_evaluator_function
-
-__all__ = ["tsq_evaluator_function", "action_completion_evaluator_function"]
+from .action_completion_evaluator import action_completion_evaluator_function
+from .tsq_evaluator import tsq_evaluator_function
+
+__all__ = ["action_completion_evaluator_function", "tsq_evaluator_function"]
external/dynamo/start_dynamo_disagg.sh (1)
171-199: Quote model directory in Docker volume mount for robustness.

To be safe if DYNAMO_MODEL_DIR ever contains spaces or shell metacharacters, it’s better to quote the bind‑mount paths:
-  -v $LOCAL_MODEL_DIR:$MODEL:ro \
+  -v "$LOCAL_MODEL_DIR":"$MODEL":ro \
Everything else in the startup and health‑check flow looks solid.
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py (1)
30-47: Add a return type for banking_tools_group_function (and optionally simplify the FileNotFound path).

Per the project’s typing guidelines, this async generator should have an explicit return type:
from collections.abc import AsyncIterator

@register_function_group(config_type=BankingToolsGroupConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN])
async def banking_tools_group_function(
    config: BankingToolsGroupConfig, builder: Builder
) -> AsyncIterator[FunctionGroup]:
    ...
Optionally, to address the TRY003 hint and avoid duplicating error text between the log and exception, you could rely on the exception message alone:
-    if not tools_path.exists():
-        logger.error("tools.json not found at %s", tools_path)
-        raise FileNotFoundError(f"Banking tools file not found: {tools_path}")
+    if not tools_path.exists():
+        raise FileNotFoundError(f"Banking tools file not found: {tools_path}")
The overall intent‑buffer wiring and stub registration logic otherwise looks good.

Also applies to: 49-93
external/dynamo/start_dynamo_unified.sh (1)
141-165: Avoid echoing HF token back to the terminal.

read -p echoes the Hugging Face token in cleartext, which is undesirable even in local scripts.

Use silent input (and -r) instead:
-        read -p "Please enter your HuggingFace token (or press Enter to skip): " HF_TOKEN
+        read -r -s -p "Please enter your HuggingFace token (or press Enter to skip): " HF_TOKEN
+        echo
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py (2)
145-155: Remove unused expected_tsq local in test_scenario_000_score.

expected_tsq is computed but never used; the test directly asserts against 4/7. This triggers Ruff F841 and adds noise.

You can either drop the variable or use it in the assertion, e.g.:
-        expected_tsq = 4 / 7  # 0.5714285714285715
-        ...
-        accuracy = calculate_tool_accuracy(actual, expected)
-        assert accuracy == pytest.approx(4/7)
+        expected_tsq = 4 / 7  # 0.5714285714285715
+        ...
+        accuracy = calculate_tool_accuracy(actual, expected)
+        assert accuracy == pytest.approx(expected_tsq)
227-231: Tighten test_empty_tool_name assertion and avoid redundant calls.

The current assertion:
assert normalize_tool_name(None) == "" if normalize_tool_name(None) is not None else True
is hard to read and calls normalize_tool_name(None) twice. If the intention is to accept None and normalize to the empty string, this can be simplified to:
assert normalize_tool_name(None) == ""
which better documents expected behavior.
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py (1)

89-96: Remove unused # noqa: F401 suppressions on import tests.

The imports in test_module_imports and test_module_with_feedback_imports are intentionally unused, but Ruff is not enforcing F401 here, so the # noqa: F401 markers show up as unused (RUF100).

You can safely drop the # noqa comments on these two import lines; the tests remain clear and lint‑clean.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 565663b and 590f727.

📒 Files selected for processing (33)

examples/dynamo_integration/.env.example (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/config_dynamo_adk_e2e_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/config_dynamo_e2e_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/config_dynamo_prefix_e2e_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/register.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py (1 hunks)
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py (1 hunks)
examples/dynamo_integration/scripts/create_test_subset.py (1 hunks)
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py (1 hunks)
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py (1 hunks)
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py (1 hunks)
examples/dynamo_integration/scripts/run_concurrency_benchmark.sh (1 hunks)
examples/dynamo_integration/scripts/throughput_analysis.py (1 hunks)
external/dynamo/.env.example (1 hunks)
external/dynamo/monitor_dynamo.sh (1 hunks)
external/dynamo/start_dynamo_disagg.sh (1 hunks)
external/dynamo/start_dynamo_unified.sh (1 hunks)
external/dynamo/start_dynamo_unified_thompson_hints.sh (1 hunks)
external/dynamo/stop_dynamo.sh (1 hunks)
external/dynamo/test_dynamo_integration.sh (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (9)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/config_dynamo_prefix_e2e_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/config_dynamo_adk_e2e_test.yml
external/dynamo/stop_dynamo.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/config_dynamo_e2e_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/register.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/optimize_rethinking_full_test.yml
external/dynamo/test_dynamo_integration.sh
examples/dynamo_integration/scripts/create_test_subset.py

🧰 Additional context used

📓 Path-based instructions (11)

**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All source files must include the SPDX Apache-2.0 header template

Files:

external/dynamo/monitor_dynamo.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
external/dynamo/start_dynamo_unified.sh
external/dynamo/start_dynamo_unified_thompson_hints.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
external/dynamo/start_dynamo_disagg.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py
examples/dynamo_integration/scripts/run_concurrency_benchmark.sh

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

external/dynamo/monitor_dynamo.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
external/dynamo/start_dynamo_unified.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
external/dynamo/start_dynamo_unified_thompson_hints.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
external/dynamo/start_dynamo_disagg.sh
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py
examples/dynamo_integration/scripts/run_concurrency_benchmark.sh

**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py

**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py

**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py

**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py

**/*.{py,js,ts,java,cpp,c,go,rb,php}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Every file must start with the standard SPDX Apache-2.0 header

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py

**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py

examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.

If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.

Files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py
examples/dynamo_integration/scripts/throughput_analysis.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py
examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py
examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py
examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py
examples/dynamo_integration/scripts/run_concurrency_benchmark.sh

**/test_*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/test_*.py: Use pytest with pytest-asyncio for asynchronous code testing
Test functions should be named using the test_ prefix, using snake_case
Extract frequently repeated code into pytest fixtures, which should be named using the fixture_ prefix and define the name argument in the decorator
Mock external services with pytest_httpserver or unittest.mock instead of hitting live endpoints
Mark slow tests with @pytest.mark.slow so they can be skipped in the default test suite
Mark integration tests requiring external services with @pytest.mark.integration so they can be skipped in the default test suite

Files:

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py

🧠 Learnings (7)

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php,sh} : All source files must include the SPDX Apache-2.0 header template

Applied to files:

external/dynamo/monitor_dynamo.sh
external/dynamo/start_dynamo_unified.sh
examples/dynamo_integration/scripts/throughput_analysis.py
external/dynamo/start_dynamo_disagg.sh

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,js,ts,java,cpp,c,go,rb,php} : Every file must start with the standard SPDX Apache-2.0 header

Applied to files:

external/dynamo/monitor_dynamo.sh
external/dynamo/start_dynamo_unified.sh
examples/dynamo_integration/scripts/throughput_analysis.py
external/dynamo/start_dynamo_disagg.sh

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: CI verifies SPDX headers via ci/scripts/github/checks.sh; do not bypass this check

Applied to files:

external/dynamo/monitor_dynamo.sh
external/dynamo/start_dynamo_unified.sh
examples/dynamo_integration/scripts/throughput_analysis.py
external/dynamo/start_dynamo_disagg.sh

📚 Learning: 2025-12-12T20:49:44.305Z

Learnt from: zterek
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1243
File: examples/risk_and_security/retail_agent/src/nat_retail_agent/configs/red-teaming.yml:1-98
Timestamp: 2025-12-12T20:49:44.305Z
Learning: In the NVIDIA/NeMo-Agent-Toolkit repository, YAML files generally use 2-space indentation. When reviewing YAML, prefer 2-space indentation to match the existing style over a 4-space guideline until a repo-wide standardization is performed. This applies to YAML configuration files (e.g., red-teaming.yml) and, more broadly, all *.yml files in the project.

Applied to files:

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_rethinking_full_test.yml

📚 Learning: 2025-12-03T18:42:23.494Z

Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1147
File: packages/nvidia_nat_a2a/pyproject.toml:1-10
Timestamp: 2025-12-03T18:42:23.494Z
Learning: In the packages/ directory, pyproject.toml files typically do not include SPDX license headers. Out of 34 packages, only nvidia_nat_strands is an exception. This pattern differs from the requirement for SPDX headers in source code files (.py, .js, .ts, etc.).

Applied to files:

examples/dynamo_integration/scripts/throughput_analysis.py

📚 Learning: 2025-11-05T11:45:35.119Z

Learnt from: thepatrickchin
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1152
File: examples/config_inheritance/pyproject.toml:1-25
Timestamp: 2025-11-05T11:45:35.119Z
Learning: In the examples/ directory, pyproject.toml files typically do not include SPDX license headers, with only one exception (adk_demo). This is an established pattern that differs from the general guideline requiring SPDX headers in all .toml files.

Applied to files:

examples/dynamo_integration/scripts/throughput_analysis.py

📚 Learning: 2025-11-14T20:33:53.944Z

Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1181
File: packages/nvidia_nat_test/tests/test_test_llm.py:419-484
Timestamp: 2025-11-14T20:33:53.944Z
Learning: The NVIDIA NeMo-Agent-Toolkit project uses pytest-asyncio in strict mode (the default), which requires pytest.mark.asyncio decorator on all async test functions. All async tests in packages/nvidia_nat_test/tests/test_test_llm.py consistently follow this pattern.

Applied to files:

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py

🧬 Code graph analysis (9)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py (2)

src/nat/data_models/component_ref.py (1)

LLMRef (116-124)

src/nat/eval/evaluator/evaluator_model.py (4)

EvalInput (46-47)

EvalInputItem (23-43)

EvalOutput (56-58)

EvalOutputItem (50-53)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py (2)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py (1)

action_completion_evaluator_function (43-245)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py (1)

tsq_evaluator_function (52-328)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py (1)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py (2)

ToolIntentBuffer (80-137)

create_tool_stub_function (260-317)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py (1)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py (4)

get_global_intents (140-152)

set_current_scenario_id (37-56)

clear (132-137)

get_intents (123-130)

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py (1)

examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py (4)

get_job_label (47-49)

get_experiment_label (52-54)

extract_per_request_tsq_scores (146-193)

calculate_per_request_throughput_metrics (196-307)

examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py (1)

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py (6)

get_job_label (45-47)

get_experiment_label (50-52)

extract_per_request_tsq_scores (55-102)

calculate_per_request_throughput_metrics (105-216)

collect_job_data_from_dir (219-293)

collect_job_data (296-324)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py (3)

src/nat/builder/framework_enum.py (1)

LLMFrameworkEnum (19-26)

src/nat/builder/function_info.py (2)

FunctionInfo (290-625)

from_fn (552-625)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py (2)

ToolIntentBuffer (80-137)

get_intents (123-130)

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py (1)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py (2)

SelfEvaluatingAgentConfig (53-126)

SelfEvaluatingAgentWithFeedbackConfig (132-201)

examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py (1)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py (3)

normalize_tool_name (134-156)

calculate_tool_accuracy (158-177)

extract_tool_calls_from_trajectory (66-132)

🪛 dotenv-linter (4.0.0)

external/dynamo/.env.example

[warning] 5-5: [QuoteCharacter] The value has quote characters (', ")

(QuoteCharacter)

[warning] 7-7: [QuoteCharacter] The value has quote characters (', ")

(QuoteCharacter)

🪛 Ruff (0.14.8)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py

75-76: try-except-continue detected, consider logging the exception

(S112)

75-75: Do not catch blind exception: Exception

(BLE001)

148-148: Consider moving this statement to an else block

(TRY300)

149-149: Do not catch blind exception: Exception

(BLE001)

210-210: Redundant exception object included in logging.exception call

(TRY401)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/__init__.py

9-9: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py

64-64: Local variable eval_builder is assigned to but never used

Remove assignment to unused variable eval_builder

(F841)

88-88: Do not catch blind exception: Exception

(BLE001)

264-264: Do not catch blind exception: Exception

(BLE001)

293-293: Redundant exception object included in logging.exception call

(TRY401)

examples/dynamo_integration/scripts/throughput_analysis.py

99-99: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

176-176: f-string without any placeholders

Remove extraneous f prefix

(F541)

178-178: f-string without any placeholders

Remove extraneous f prefix

(F541)

208-208: f-string without any placeholders

Remove extraneous f prefix

(F541)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/banking_tools.py

91-91: Avoid specifying long messages outside the exception class

(TRY003)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/self_evaluating_agent_with_feedback.py

46-46: Probable use of insecure hash functions in hashlib: md5

(S324)

304-304: Redundant exception object included in logging.exception call

(TRY401)

514-514: Redundant exception object included in logging.exception call

(TRY401)

666-666: Consider moving this statement to an else block

(TRY300)

678-678: Redundant exception object included in logging.exception call

(TRY401)

682-682: Use explicit conversion flag

Replace with conversion flag

(RUF010)

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py

14-14: Docstring contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF002)

98-98: Consider moving this statement to an else block

(TRY300)

130-130: Do not catch blind exception: Exception

(BLE001)

147-147: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

219-219: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

332-332: Docstring contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF002)

352-352: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

355-355: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

371-371: Local variable text_obj is assigned to but never used

Remove assignment to unused variable text_obj

(F841)

404-404: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

407-407: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

496-496: Unpacked variable fig is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

573-573: Prefer next(iter(job_stats.values())) over single element slice

Replace with next(iter(job_stats.values()))

(RUF015)

623-623: Unpacked variable fig is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

667-667: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

737-737: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

744-744: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

826-826: f-string without any placeholders

Remove extraneous f prefix

(F541)

examples/dynamo_integration/scripts/plot_throughput_vs_tsq_per_request.py

13-13: Docstring contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF002)

88-88: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

93-93: Consider moving this statement to an else block

(TRY300)

95-95: Do not catch blind exception: Exception

(BLE001)

189-189: Consider moving this statement to an else block

(TRY300)

221-221: Do not catch blind exception: Exception

(BLE001)

238-238: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

310-310: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

447-447: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

499-499: Unpacked variable fig is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

557-557: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

620-620: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

674-674: Unpacked variable p_value is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

674-674: Unpacked variable std_err is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

710-710: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

716-716: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

777-777: String contains ambiguous σ (GREEK SMALL LETTER SIGMA). Did you mean o (LATIN SMALL LETTER O)?

(RUF001)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/__init__.py

5-5: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/react_benchmark_agent.py

68-68: Unused function argument: query

(ARG001)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py

277-277: Local variable properties is assigned to but never used

Remove assignment to unused variable properties

(F841)

278-278: Local variable required_params is assigned to but never used

Remove assignment to unused variable required_params

(F841)

335-335: Local variable prop_desc is assigned to but never used

Remove assignment to unused variable prop_desc

(F841)

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py

91-91: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

95-95: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

examples/dynamo_integration/react_benchmark_agent/tests/test_tsq_formula.py

150-150: Local variable expected_tsq is assigned to but never used

Remove assignment to unused variable expected_tsq

(F841)

examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py

1-1: The file is executable but no shebang is present

(EXE002)

258-258: Redundant exception object included in logging.exception call

(TRY401)

🪛 Shellcheck (0.11.0)

examples/dynamo_integration/scripts/run_concurrency_benchmark.sh

[warning] 186-186: JOB_ID appears unused. Verify use (or export if used externally).

(SC2034)

🔇 Additional comments (15)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/profile_rethinking_full_test.yml (1)

1-220: The file already ends with a single newline as required. No action needed.

examples/dynamo_integration/scripts/download_agent_leaderboard_v2.py (5)

1-7: LGTM!

The license header and module docstring follow the project's standards.

20-32: LGTM!

The function correctly converts JSON strings to dictionaries with proper defensive copying.

35-127: LGTM!

The heuristic approach for deriving expected tool calls is well-documented and handles various parameter types appropriately. The deduplication logic ensures unique tool names while preserving order.

130-186: LGTM!

The transformation function properly converts scenarios to NAT format with comprehensive metadata and graceful handling of missing personas.

271-296: LGTM!

The CLI setup properly handles argument parsing and sets a sensible default for HF_HOME with appropriate logging.

external/dynamo/monitor_dynamo.sh (1)

1-170: Monitoring script is clear and functionally sound.

The menu flow, Docker checks, and health/inference probes are well-structured and defensive enough for this dev‑tool script. I don’t see issues that need changes.

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml (8)

1-3: Apache License header present and correctly formatted.

The copyright year (2025-2026) and license identifier are properly included.

4-25: Excellent documentation and prerequisites section.

The header provides clear purpose, use cases, prerequisites with reproducible setup steps, usage instructions, and expected outcomes. This significantly improves discoverability and maintainability.

27-61: Functions and function_groups configuration looks sound.

The react_benchmark_agent function and banking_tools_group are properly declared with tool inclusions specified clearly. The decision_only flags and templates align with the agent architecture.

62-96: LLM configuration is well-structured with appropriate comments.

Both dynamo_llm and eval_llm are properly configured. The stop tokens (line 72), prefix template settings (lines 75-78), and commented-out optimizer fields (lines 79-86) are appropriately documented. Note that the api_key: dummy placeholder is acceptable for a test configuration.

98-138: Workflow and system prompt configuration is comprehensive.

The ReACT agent workflow is well-configured with verbose logging, elevated limits (max_tool_calls: 20, recursion_limit: 50), and a detailed system prompt optimized for tool selection evaluation. The prompt instructions are clear and appropriate for the intended use case.

140-193: Eval and evaluator configuration is well-structured.

The evaluation section includes comprehensive profiler settings (LLM metrics, token forecasting, prompt caching analysis, bottleneck analysis), and the tool_selection_quality evaluator is properly configured with TSQ-based evaluation.

194-230: Optimizer configuration provides clear parameter exploration setup.

The optimizer section with grid search sampling (5 trials), metric direction (maximize TSQ), and disabled prompt optimization provides a solid foundation for parameter exploration. The configuration is well-documented with explanatory comments.

37-37: Resolve the missing data files and paths used in config.

The configuration references data files that do not exist in the repository:

./examples/dynamo_integration/data/raw/banking/tools.json (NOT FOUND)

./examples/dynamo_integration/data/agent_leaderboard_v2_test_subset.json (NOT FOUND)

While the YAML comments document a prerequisite to generate the test subset file, the core issue is that the source data files are not available. Per the coding guidelines, sample data should be checked into git-lfs. Either ensure these files are present and properly configured in git-lfs, or provide clear documentation on how to obtain/generate them before the config can be used. The absolute-style paths (e.g., ./examples/...) also assume execution from the repository root; consider using relative paths or documenting the required working directory.

examples/dynamo_integration/.env.example

...benchmark_agent/src/react_benchmark_agent/configs/eval_config_no_rethinking_minimal_test.yml

...on/react_benchmark_agent/src/react_benchmark_agent/evaluators/action_completion_evaluator.py

...namo_integration/react_benchmark_agent/src/react_benchmark_agent/evaluators/tsq_evaluator.py

...ples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/tool_intent_stubs.py

examples/dynamo_integration/scripts/run_concurrency_benchmark.sh

examples/dynamo_integration/scripts/throughput_analysis.py

external/dynamo/start_dynamo_unified_thompson_hints.sh

external/dynamo/start_dynamo_unified.sh

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (1)

src/nat/llm/dynamo_llm.py (1)
355-355: Add return type hint to function signature.

Per coding guidelines, all public APIs require type hints on parameters and return values. This was also noted in past review comments.

Apply this diff to add the return type hint:
+from collections.abc import AsyncIterator
+
 @register_llm_provider(config_type=DynamoModelConfig)
-async def dynamo_llm(config: DynamoModelConfig, _builder: Builder):
+async def dynamo_llm(config: DynamoModelConfig, _builder: Builder) -> AsyncIterator[LLMProviderInfo]:
     """Register the Dynamo LLM provider."""

🧹 Nitpick comments (2)

tests/nat/llm/test_dynamo_llm.py (1)
308-308: Consider using next(iter(...)) for readability.

Static analysis suggests preferring next(iter(prefix_ids)) over single element slice for better semantics when extracting a single element from a set.

Apply this diff:
-        assert list(prefix_ids)[0].startswith("session-")
+        assert next(iter(prefix_ids)).startswith("session-")
src/nat/llm/dynamo_llm.py (1)
287-310: Consider adding type hint to request parameter.

While this is an internal function, adding a type hint improves code clarity and helps with IDE support.

Apply this diff:
-    async def on_request(request):
+    async def on_request(request: "httpx.Request") -> None:
         """Inject Dynamo prefix headers before each request."""

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 590f727 and cf08736.

📒 Files selected for processing (5)

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py (1 hunks)
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py (2 hunks)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (2 hunks)
src/nat/llm/dynamo_llm.py (1 hunks)
tests/nat/llm/test_dynamo_llm.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (14)

**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

**/*.{py,js,ts,java,cpp,c,go,rb,php}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Every file must start with the standard SPDX Apache-2.0 header

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All source files must include the SPDX Apache-2.0 header template

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py
examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/ directory at the same level as the pyproject.toml file.

Files:

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py
packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py

src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

src/nat/llm/dynamo_llm.py

**/test_*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/test_*.py: Use pytest with pytest-asyncio for asynchronous code testing
Test functions should be named using the test_ prefix, using snake_case
Extract frequently repeated code into pytest fixtures, which should be named using the fixture_ prefix and define the name argument in the decorator
Mock external services with pytest_httpserver or unittest.mock instead of hitting live endpoints
Mark slow tests with @pytest.mark.slow so they can be skipped in the default test suite
Mark integration tests requiring external services with @pytest.mark.integration so they can be skipped in the default test suite

Files:

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.

If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.

Files:

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py

tests/**/*.py

⚙️ CodeRabbit configuration file

tests/**/*.py: - Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code. - Test functions should be named using the test_ prefix, using snake_case. - Any frequently repeated code should be extracted into pytest fixtures. - Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture
function being decorated should be named using the fixture_ prefix, using snake_case. Example:
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
pass

Files:

tests/nat/llm/test_dynamo_llm.py

🧠 Learnings (2)

📚 Learning: 2025-09-15T21:26:29.430Z

Learnt from: saglave
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 726
File: examples/frameworks/adk_demo/src/nat_adk_demo/weather_update_tool.py:0-0
Timestamp: 2025-09-15T21:26:29.430Z
Learning: In NAT's ADK integration, the docstring of registered functions serves as the tool description sent to the LLM, not standard Python function documentation. The docstring should describe the logical tool interface (parameters the LLM will provide) rather than the NAT framework wrapper parameters like tool_config and builder.

Applied to files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py

📚 Learning: 2025-11-14T20:33:53.944Z

Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1181
File: packages/nvidia_nat_test/tests/test_test_llm.py:419-484
Timestamp: 2025-11-14T20:33:53.944Z
Learning: The NVIDIA NeMo-Agent-Toolkit project uses pytest-asyncio in strict mode (the default), which requires pytest.mark.asyncio decorator on all async test functions. All async tests in packages/nvidia_nat_test/tests/test_test_llm.py consistently follow this pattern.

Applied to files:

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py
tests/nat/llm/test_dynamo_llm.py

🧬 Code graph analysis (3)

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (1)

src/nat/llm/dynamo_llm.py (4)

dynamo_llm (355-360)

create_httpx_client_with_dynamo_hooks (314-344)

DynamoModelConfig (153-243)

get_dynamo_field_names (221-243)

src/nat/llm/dynamo_llm.py (2)

src/nat/data_models/optimizable.py (2)

OptimizableField (157-196)

SearchSpace (35-154)

src/nat/llm/openai_llm.py (2)

openai_llm (57-59)

OpenAIModelConfig (32-53)

tests/nat/llm/test_dynamo_llm.py (2)

src/nat/llm/dynamo_llm.py (9)

dynamo_llm (355-360)

create_httpx_client_with_dynamo_hooks (314-344)

DynamoModelConfig (153-243)

DynamoPrefixContext (71-145)

_create_dynamo_request_hook (251-311)

get_dynamo_field_names (221-243)

clear (111-114)

get (117-119)

scope (123-145)

src/nat/data_models/common.py (2)

static_type (157-158)

static_full_type (161-162)

🪛 Ruff (0.14.8)

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py

91-91: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

95-95: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tests/nat/llm/test_dynamo_llm.py

205-205: Avoid specifying long messages outside the exception class

(TRY003)

308-308: Prefer next(iter(prefix_ids)) over single element slice

Replace with next(iter(prefix_ids))

(RUF015)

🔇 Additional comments (11)

tests/nat/llm/test_dynamo_llm.py (3)

34-144: LGTM! Comprehensive configuration validation tests.

The test coverage for DynamoModelConfig is thorough, including defaults, custom values, boundary conditions, validation logic, inheritance, and utility methods.

151-223: LGTM! Thorough context management tests.

The tests cover all aspects of DynamoPrefixContext including basic operations, overwriting, context manager usage, exception safety, and nested scope behavior.

230-445: LGTM! Comprehensive hook and client creation tests.

The test suite thoroughly validates:

Header injection mechanics

Context variable override behavior

Prefix ID consistency across requests

Hook isolation between clients

Default template fallback

Case normalization

HTTPX client configuration

Provider registration

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py (1)

200-248: LGTM! Dynamo LangChain integration follows established patterns.

The implementation correctly:

Excludes Dynamo-specific fields using get_dynamo_field_names() (as per past review feedback)

Creates httpx client with hooks only when prefix headers are enabled

Handles RESPONSES API consistently with openai_langchain

Applies retry and thinking patches via _patch_llm_based_on_config

Logs Dynamo configuration for debugging

src/nat/llm/dynamo_llm.py (3)

71-145: LGTM! Well-designed context management class.

The DynamoPrefixContext class properly:

Uses ContextVar for async-safe context management

Provides clear set/get/clear methods with proper type hints

Implements a context manager for scoped usage with automatic cleanup

Includes comprehensive docstrings with usage examples

Logs debug information for troubleshooting

153-243: LGTM! Well-structured configuration model.

The DynamoModelConfig class properly:

Extends OpenAIModelConfig for compatibility

Uses OptimizableField with SearchSpace for tunable parameters

Provides clear field descriptions with optimization hints

Includes validation constraints for bounds checking

Implements get_dynamo_field_names() utility as suggested in past review feedback

Documents the purpose and default behavior (headers enabled by default)

314-344: LGTM! HTTPX client factory is well-implemented.

The create_httpx_client_with_dynamo_hooks function properly:

Creates an httpx client with event hooks for header injection

Configures timeout appropriately

Has proper type hints and documentation

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py (4)

1-12: LGTM! Proper licensing and documentation.

The SPDX Apache-2.0 header and copyright years are correct, and the module docstring clearly describes the test coverage.

195-326: LGTM! Comprehensive decision logic and feedback generation tests.

The test coverage is thorough, including boundary conditions (exact threshold matches, exhausted retries, edge cases). The helper methods appropriately mimic production logic for isolated testing.

328-442: LGTM! Well-structured integration tests with proper markers.

The integration tests correctly:

Use @pytest.mark.slow and @pytest.mark.integration markers for optional execution

Apply @pytest.mark.asyncio decorator on all async test methods (consistent with project requirements)

Follow fixture naming conventions with the fixture_ prefix and name argument

Use NIM backend for realistic integration testing

The nvidia_api_key fixture parameter (line 339) is appropriately marked with noqa: ARG002 as it's a pytest fixture dependency ensuring the API key is available.

Based on learnings, the pytest-asyncio decorator usage is consistent with the NVIDIA NeMo Agent toolkit project's strict mode configuration.

23-84: LGTM! Thorough configuration validation tests.

The config validation tests appropriately verify:

File existence for both eval and profile configs

Correct workflow type specification

Required parameters presence

Valid ranges for max_retries (0-10) and confidence thresholds (0.0-1.0)

examples/dynamo_integration/react_benchmark_agent/tests/test_self_evaluation.py

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py

src/nat/llm/dynamo_llm.py

…/1242 Merging release/1.4 commits since creating this feature branch for dynamo

bbednarski9 · 2025-12-16T18:36:04Z

/ok to test 0773d81

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2025-12-16T19:22:22Z

/ok to test 3fb6d64

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/source/build-workflows/llms/index.md (1)
23-29: Add NVIDIA Dynamo to the supported LLM providers table.

The provider table is missing an entry for Dynamo, although the section below documents it fully. This inconsistency may confuse users about which providers are available.

Add Dynamo to the provider table:
 | Provider | Type | Description |
 |----------|------|-------------|
 | [NVIDIA NIM](https://build.nvidia.com) | `nim` | NVIDIA Inference Microservice (NIM) |
 | [OpenAI](https://openai.com) | `openai` | OpenAI API |
 | [AWS Bedrock](https://aws.amazon.com/bedrock/) | `aws_bedrock` | AWS Bedrock API |
 | [Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/quickstart) | `azure_openai` | Azure OpenAI API |
 | [LiteLLM](https://github.com/BerriAI/litellm) | `litellm` | LiteLLM API |
+| [NVIDIA Dynamo](https://github.com/ai-dynamo/dynamo) | `dynamo` | NVIDIA Dynamo inference optimization (experimental) |

♻️ Duplicate comments (1)

examples/dynamo_integration/react_benchmark_agent/README.md (1)
692-699: Add language identifier to fenced code block (MD040).

The file structure example at line 692 is missing a language identifier, causing markdownlint MD040 violation. Add text to the opening fence:
-```
+```text
 react_benchmark_agent/outputs/benchmarks/<name>_<timestamp>/
 ├── benchmark_results.csv          # Machine-readable CSV
 ├── benchmark_report.md            # Human-readable markdown
 ├── analysis_16.txt                 # Detailed analysis for concurrency=16
 ├── analysis_32.txt                 # Detailed analysis for concurrency=32
 └── ...

🧹 Nitpick comments (5)

docs/source/build-workflows/llms/index.md (2)
36-54: Consider adding a YAML configuration example for Dynamo.

All documented providers include a YAML configuration example for reference (as shown in lines 36–54), but the new Dynamo section (lines 150–168) does not. Adding a minimal example would improve consistency and help users get started quickly.

Proposed example to add after line 168:
  dynamo_llm:
    _type: dynamo
    model_name: meta/llama-3.1-70b-instruct
    prefix_template: "<prefix_id>"
    prefix_osl: 256
156-168: Consider adding a gated-fields note for temperature and top_p.

Providers with temperature and top_p fields (NIM, OpenAI, Azure) include a note that these are model-gated and may not be supported by all models (see lines 73–75, 95–97, 131–133). For consistency and user clarity, consider adding a similar note to the Dynamo section.

Proposed addition after line 168:
:::{note}
`temperature` and `top_p` are model-gated fields and may not be supported by all models. If unsupported and explicitly set, validation will fail. See [Gated Fields](../../extend/custom-components/gated-fields.md) for details.
:::
examples/dynamo_integration/README.md (1)
26-26: Optional: Consider rephrasing to reduce wordiness.

Line 26 uses "with respect to" which static analysis suggests could be wordy. Optional improvements:
-Agentic LLM requests have predictable patterns with respect to conversation length, system prompts, and tool-calling.
+Agentic LLM requests have predictable patterns in conversation length, system prompts, and tool-calling.
Or restructure as: "Agentic LLM requests show predictable patterns: conversation length, system prompts, and tool-calling."
examples/dynamo_integration/ARCHITECTURE.md (1)
17-19: Minor note: Consider using full toolkit name on first mention.

While this is an architecture document and the experimental note contextualizes usage, adding the NVIDIA prefix in line 18 would align with documentation standards:
-> ⚠️ **EXPERIMENTAL**: This integration between NeMo Agent toolkit and Dynamo is experimental and under active development. APIs, configurations, and features may change without notice.
+> ⚠️ **EXPERIMENTAL**: This integration between NVIDIA NeMo Agent toolkit and Dynamo is experimental and under active development. APIs, configurations, and features may change without notice.
examples/dynamo_integration/react_benchmark_agent/README.md (1)
18-19: Minor: Use full toolkit name on first mention.

Line 18 references "NeMo Agent toolkit" in the experimental note. Per documentation guidelines, use the full vendor-prefixed name on first mention:
-> ⚠️ **EXPERIMENTAL**: This integration between NVIDIA NeMo Agent toolkit and Dynamo is experimental and under active development. APIs, configurations, and features may change without notice.
+> ⚠️ **EXPERIMENTAL**: This integration between NVIDIA NeMo Agent toolkit and Dynamo is experimental and under active development. APIs, configurations, and features may change without notice.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0773d81 and 3fb6d64.

📒 Files selected for processing (6)

docs/source/build-workflows/llms/index.md (1 hunks)
examples/dynamo_integration/ARCHITECTURE.md (1 hunks)
examples/dynamo_integration/README.md (1 hunks)
examples/dynamo_integration/react_benchmark_agent/README.md (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md (1 hunks)
external/dynamo/README.md (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

🧰 Additional context used

📓 Path-based instructions (9)

**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

examples/dynamo_integration/ARCHITECTURE.md
docs/source/build-workflows/llms/index.md
examples/dynamo_integration/README.md
examples/dynamo_integration/react_benchmark_agent/README.md
external/dynamo/README.md

**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings

Files:

examples/dynamo_integration/ARCHITECTURE.md
docs/source/build-workflows/llms/index.md
examples/dynamo_integration/README.md
examples/dynamo_integration/react_benchmark_agent/README.md
external/dynamo/README.md

**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

examples/dynamo_integration/ARCHITECTURE.md
docs/source/build-workflows/llms/index.md
examples/dynamo_integration/README.md
examples/dynamo_integration/react_benchmark_agent/README.md
external/dynamo/README.md

**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

examples/dynamo_integration/ARCHITECTURE.md
docs/source/build-workflows/llms/index.md
examples/dynamo_integration/README.md
examples/dynamo_integration/react_benchmark_agent/README.md
external/dynamo/README.md

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

examples/dynamo_integration/ARCHITECTURE.md
docs/source/build-workflows/llms/index.md
examples/dynamo_integration/README.md
examples/dynamo_integration/react_benchmark_agent/README.md
external/dynamo/README.md

examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.

If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.

Files:

examples/dynamo_integration/ARCHITECTURE.md
examples/dynamo_integration/README.md
examples/dynamo_integration/react_benchmark_agent/README.md

docs/**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never use deprecated names: Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq in documentation (unless intentionally referring to deprecated versions)

Files:

docs/source/build-workflows/llms/index.md

docs/source/**/*.{md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.{md,rst}: Documentation must be clear and comprehensive, without TODOs, FIXMEs, or placeholder text
Ensure documentation is free of offensive or outdated terms
Ensure documentation is free of spelling mistakes and do not use words listed in ci/vale/styles/config/vocabularies/nat/reject.txt

Files:

docs/source/build-workflows/llms/index.md

docs/source/**/*

⚙️ CodeRabbit configuration file

docs/source/**/*: This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Documentation Categories

Ensure documentation is placed in the correct category:

get-started/: Introductory documentation for new users

get-started/tutorials/: Step-by-step learning guides

build-workflows/: Workflow creation, configuration, adding remote MCP tools or A2A agents - run-workflows/: Execution, observability, serving workflows via MCP and A2A protocols - improve-workflows/: Evaluation and optimization guides - components/: Specific component implementations (agents, tools, connectors) - extend/: Custom component development and testing (not core library contributions) - reference/: Python and REST API documentation only - resources/: Project information (licensing, FAQs)

resources/contributing/: Development environment and contribution guides

Placement rules:

Component implementations always belong in components/, not build-workflows/ 2. API documentation belongs only in reference/ 3. Using remote MCP tools or A2A agents should be placed in build-workflows/ 4. Serving workflows via MCP/A2A should be placed in run-workflows/

Files:

docs/source/build-workflows/llms/index.md

🧠 Learnings (6)

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{md,rst,py} : Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Applied to files:

examples/dynamo_integration/README.md
external/dynamo/README.md

📚 Learning: 2025-08-25T15:24:13.143Z

Learnt from: willkill07
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 710
File: .coderabbit.yaml:33-33
Timestamp: 2025-08-25T15:24:13.143Z
Learning: User feedback indicates that CodeRabbit's chat commands, particularly `coderabbitai review` for manual incremental reviews, need better documentation and discoverability within repositories, especially when auto_incremental_review is disabled.

Applied to files:

examples/dynamo_integration/README.md
external/dynamo/README.md

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{md,rst} : Use 'NeMo Agent Toolkit' (capitalize 'T') when the name appears in headings

Applied to files:

external/dynamo/README.md

📚 Learning: 2025-12-12T20:49:51.370Z

Learnt from: zterek
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1243
File: examples/risk_and_security/retail_agent/src/nat_retail_agent/configs/red-teaming.yml:1-98
Timestamp: 2025-12-12T20:49:51.370Z
Learning: In the NVIDIA/NeMo-Agent-Toolkit repository, most YAML configuration files use 2-space indentation rather than 4-space indentation. When reviewing YAML files, prefer consistency with the existing codebase style (2-space indentation) over the stated coding guideline (4-space indentation) until a repository-wide standardization is performed.

Applied to files:

external/dynamo/README.md

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to docs/**/*.{md,rst} : Never use deprecated names: Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq in documentation (unless intentionally referring to deprecated versions)

Applied to files:

external/dynamo/README.md

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,toml,yaml,yml} : Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Applied to files:

external/dynamo/README.md

🪛 LanguageTool

docs/source/build-workflows/llms/index.md

[grammar] ~152-~152: Use a hyphen to join words.
Context: ...rimental) Dynamo is an inference engine agnostic LLM provider designed to optimi...

(QB_NEW_EN_HYPHEN)

[grammar] ~164-~164: Ensure spelling is correct
Context: ...sation prefix IDs. Setting to null will distable use of prefix_template, prefix_total_re...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

examples/dynamo_integration/README.md

[style] ~26-~26: ‘with respect to’ might be wordy. Consider a shorter alternative.
Context: ... LLM requests have predictable patterns with respect to conversation length, system prompts, an...

(EN_WORDINESS_PREMIUM_WITH_RESPECT_TO)

external/dynamo/README.md

[style] ~465-~465: Consider using a different verb for a more formal wording.
Context: ... the script provides guidance on how to fix the issue. --- ## Stopping Dynamo A ...

(FIX_RESOLVE)

🪛 markdownlint-cli2 (0.18.1)

examples/dynamo_integration/react_benchmark_agent/README.md

692-692: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (1)

external/dynamo/README.md (1)
21-21: Use "NVIDIA NeMo Agent toolkit" on first mention.

Line 21 references the toolkit by abbreviated name. Per coding guidelines, the full name with vendor prefix should be used on first mention in the file.
-> ⚠️ **EXPERIMENTAL**: This integration between NVIDIA NeMo Agent toolkit and Dynamo is experimental and under active development. APIs, configurations, and features may change without notice.
+> ⚠️ **EXPERIMENTAL**: This integration between NVIDIA NeMo Agent toolkit and Dynamo is experimental and under active development. APIs, configurations, and features may change without notice.
Based on coding guidelines for **/*.{md,rst,py}.

docs/source/build-workflows/llms/index.md

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2025-12-16T19:27:43Z

/ok to test 6d4ff39

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2025-12-16T20:07:03Z

/ok to test ce72671

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

…/1242

bbednarski9 · 2025-12-16T20:40:10Z

/ok to test 072a7f7

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

src/nat/llm/dynamo_llm.py (1)
251-312: Correct header injection logic with minor type hint opportunity.

The hook creation pattern is well-designed:

Generates a default prefix ID once at hook creation for consistency across requests

Properly checks DynamoPrefixContext first for per-request overrides (useful in batch evaluation scenarios)

Injects all required Dynamo headers with correct names and value formatting

Includes helpful debug logging at each decision point

The outer function has comprehensive type hints. The inner on_request function (line 287) could optionally benefit from explicit type hints for better IDE support and clarity, though this is not strictly required for private functions.

If you'd like to add type hints to the inner function for consistency:
-    async def on_request(request):
+    async def on_request(request: "httpx.Request") -> None:
         """Inject Dynamo prefix headers before each request."""

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce72671 and 3eb804f.

📒 Files selected for processing (2)

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py (2 hunks)
src/nat/llm/dynamo_llm.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (11)

**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

**/*.{py,js,ts,java,cpp,c,go,rb,php}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Every file must start with the standard SPDX Apache-2.0 header

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All source files must include the SPDX Apache-2.0 header template

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py
src/nat/llm/dynamo_llm.py

packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/ directory at the same level as the pyproject.toml file.

When adding a new package, that new package name (as defined in the pyproject.toml file) should
be added as a dependency to the nvidia-nat-all package in packages/nvidia_nat_all/pyproject.toml

Files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py

src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

src/nat/llm/dynamo_llm.py

🧠 Learnings (2)

📚 Learning: 2025-09-15T21:26:29.430Z

Learnt from: saglave
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 726
File: examples/frameworks/adk_demo/src/nat_adk_demo/weather_update_tool.py:0-0
Timestamp: 2025-09-15T21:26:29.430Z
Learning: In NAT's ADK integration, the docstring of registered functions serves as the tool description sent to the LLM, not standard Python function documentation. The docstring should describe the logical tool interface (parameters the LLM will provide) rather than the NAT framework wrapper parameters like tool_config and builder.

Applied to files:

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Confirm that copyright years are up-to-date whenever a file is changed

Applied to files:

src/nat/llm/dynamo_llm.py

🧬 Code graph analysis (2)

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py (3)

src/nat/llm/dynamo_llm.py (3)

dynamo_llm (356-361)

DynamoModelConfig (164-243)

get_dynamo_field_names (221-243)

src/nat/builder/framework_enum.py (1)

LLMFrameworkEnum (19-26)

src/nat/utils/responses_api.py (1)

validate_no_responses_api (21-26)

src/nat/llm/dynamo_llm.py (1)

src/nat/data_models/optimizable.py (2)

OptimizableField (157-196)

SearchSpace (35-154)

🔇 Additional comments (6)

packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py (1)

130-205: LGTM! Well-structured Dynamo ADK client implementation.

The dynamo_adk function correctly follows the established pattern for ADK client registrations in this file. The implementation properly:

Validates API compatibility with validate_no_responses_api

Excludes Dynamo-specific fields using DynamoModelConfig.get_dynamo_field_names()

Constructs prefix headers matching the Dynamo spec (x-prefix-id, x-prefix-total-requests, x-prefix-osl, x-prefix-iat)

Documents the static header limitation and recommends LangChain for dynamic prefix IDs

The docstring clearly explains the ADK-specific constraint that LiteLLM sets headers at initialization time rather than per-request, which is helpful context for users choosing between frameworks.

src/nat/llm/dynamo_llm.py (5)

1-75: Excellent module documentation and setup.

The copyright headers, license information, and module-level documentation are comprehensive and correct. The docstring clearly explains:

The module's purpose (OpenAI-compatible LLM with Dynamo prefix headers)

How httpx event hooks enable framework-agnostic header injection

Detailed explanations of each Dynamo prefix parameter (OSL, IAT, total_requests)

The imports are well-organized and appropriate for the functionality provided.

82-156: Well-implemented context management pattern.

The DynamoPrefixContext class provides a clean, thread-safe way to manage prefix IDs across LLM calls using ContextVar. Key strengths:

Thread-safe context variable storage appropriate for async environments

Context manager (scope) properly ensures cleanup via finally block, even on exceptions

Clear docstrings with usage examples for both manual and context manager approaches

Proper type hints on all public methods

This design enables per-evaluation-question prefix IDs while maintaining safety across concurrent operations.

164-243: Solid configuration model with optimizer integration.

The DynamoModelConfig class is well-designed with proper field constraints and optimizer integration:

Inherits from OpenAIModelConfig for OpenAI API compatibility

Uses OptimizableField with appropriate SearchSpace definitions for the three tunable parameters

Field-level validation constraints (e.g., ge=1, le=50) are appropriately broader than optimizer search spaces, allowing manual overrides outside the search range

get_dynamo_field_names() utility method provides clean field exclusion for framework adapters

Comprehensive docstrings explain the purpose and impact of each Dynamo parameter

The default values (prefix_template enabled, MEDIUM for OSL/IAT, 10 total requests) provide sensible starting points for optimization.

315-345: Clean httpx client factory with proper configuration.

The function provides a straightforward way to create a configured httpx.AsyncClient:

Proper type hints on all parameters and return value

Correctly configures event hooks with the Dynamo request hook

Sets appropriate timeout using httpx.Timeout wrapper

Local import of httpx is appropriate for optional dependency handling

Clear docstring explains the framework-agnostic approach

This factory function enables easy integration with any framework that accepts custom httpx clients (OpenAI SDK, LangChain, etc.).

355-361: Provider registration correctly implemented.

The dynamo_llm function properly registers the Dynamo LLM provider:

Correctly decorated with @register_llm_provider using the DynamoModelConfig type

Yields LLMProviderInfo with appropriate config and description

Description clearly communicates the provider's purpose (Dynamo-optimized with automatic prefix headers)

The registration pattern is consistent with other LLM providers in the codebase and enables automatic discovery by the NAT framework.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2025-12-16T21:05:15Z

/ok to test 54aeb06

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

examples/dynamo_integration/react_benchmark_agent/pyproject.toml (1)
18-18: Consider using lowercase "toolkit" for consistency.

The root pyproject.toml uses "NVIDIA NeMo Agent toolkit" with lowercase "toolkit". For consistency, consider updating the description to match this convention.
-description = "Custom NeMo Agent Toolkit Workflow"
+description = "Custom NeMo Agent toolkit Workflow"
Alternatively, you could make it more specific to this example:
-description = "Custom NeMo Agent Toolkit Workflow"
+description = "React Benchmark Agent for Dynamo integration"

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3eb804f and 54aeb06.

📒 Files selected for processing (3)

examples/dynamo_integration/react_benchmark_agent/pyproject.toml (1 hunks)
examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md (1 hunks)
pyproject.toml (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

examples/dynamo_integration/react_benchmark_agent/src/react_benchmark_agent/README.md

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml
pyproject.toml

**/pyproject.toml

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/pyproject.toml: Dependencies should use ~= format with two-digit versions (e.g., ~=1.0)
New dependencies must be added to both pyproject.toml (alphabetically) and uv.lock via uv pip install --sync

Files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml
pyproject.toml

**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml
pyproject.toml

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml
pyproject.toml

examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: - This directory contains example code and usage scenarios for the toolkit, at a minimum an example should
contain a README.md or file README.ipynb.

If an example contains Python code, it should be placed in a subdirectory named src/ and should
contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.

If an example contains YAML files, they should be placed in a subdirectory named configs/. - If an example contains sample data files, they should be placed in a subdirectory named data/, and should
be checked into git-lfs.

Files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml

🧠 Learnings (7)

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to packages/*/pyproject.toml : The pyproject.toml should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-

Applied to files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml
pyproject.toml

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Each package should contain a pyproject.toml file

Applied to files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml

📚 Learning: 2025-11-10T21:26:35.059Z

Learnt from: jiaxiangr
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 974
File: packages/nvidia_nat_all/pyproject.toml:39-39
Timestamp: 2025-11-10T21:26:35.059Z
Learning: In packages/nvidia_nat_all/pyproject.toml, workspace dependencies (nvidia-nat-* plugin packages) should NOT have version constraints because they are managed as workspace dependencies. Version constraints are only applied to the base nvidia-nat package and external dependencies, not to internal workspace packages.

Applied to files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml
pyproject.toml

📚 Learning: 2025-12-03T18:42:23.494Z

Learnt from: AnuradhaKaruppiah
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 1147
File: packages/nvidia_nat_a2a/pyproject.toml:1-10
Timestamp: 2025-12-03T18:42:23.494Z
Learning: In the packages/ directory, pyproject.toml files typically do not include SPDX license headers. Out of 34 packages, only nvidia_nat_strands is an exception. This pattern differs from the requirement for SPDX headers in source code files (.py, .js, .ts, etc.).

Applied to files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{py,toml,yaml,yml} : Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Applied to files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml
pyproject.toml

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Applies to **/*.{md,rst,py} : Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Applied to files:

examples/dynamo_integration/react_benchmark_agent/pyproject.toml

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Python code in examples should be placed in a src/ subdirectory with a pyproject.toml file

Applied to files:

pyproject.toml

🔇 Additional comments (2)

pyproject.toml (1)

122-122: LGTM! Example registration looks correct.

The new nat_react_benchmark_agent example is properly registered in both the examples list and workspace sources with correct alphabetical ordering and path configuration.

Also applies to: 228-228

examples/dynamo_integration/react_benchmark_agent/pyproject.toml (1)

1-1: SPDX headers are not required for pyproject.toml files in the examples/ directory.

The coding guidelines specify SPDX headers only for source code files: **/*.{py,js,ts,java,cpp,c,go,rb,php,sh}. Configuration files like pyproject.toml are explicitly excluded from this requirement. Additionally, license information for Python packages is declared in the [project] table of pyproject.toml as packaging metadata, not as file-level SPDX headers. This pattern is consistent throughout both the examples/ and packages/ directories in the repository.

Likely an incorrect or invalid review comment.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2025-12-16T22:04:34Z

/ok to test 0205fa3

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2025-12-16T22:15:59Z

/ok to test d33c2b7

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

packages/nvidia_nat_langchain/tests/test_llm_langchain.py (2)
171-208: Well-structured test fixtures with comprehensive coverage.

The fixtures cover the key test scenarios:

No prefix template (default behavior)

With prefix template (header injection with all optimization params)

RESPONSES API type (special branch logic)

Optional: Consider file-wide fixture naming refactor.

The coding guidelines specify that fixtures should use the fixture_ prefix and define a name argument in the decorator. For example:
@pytest.fixture(name="dynamo_cfg_no_prefix")
def fixture_dynamo_cfg_no_prefix(self):
    """Dynamo config without prefix template (no header injection)."""
    return DynamoModelConfig(...)
However, this pattern is not used anywhere in this file (all existing fixtures use descriptive names without the fixture_ prefix). Updating this would be a file-wide refactoring effort that could be deferred to maintain consistency with the existing code.

276-313: Strong test with excellent documentation, but could be more maintainable.

The test has a detailed docstring explaining the rationale for field exclusion, and the assertions are clear. However, the field names are hardcoded rather than derived from get_dynamo_field_names().

Consider making this test more maintainable by programmatically checking against the fields returned by get_dynamo_field_names():
+        # Get the Dynamo-specific fields that should be excluded
+        excluded_fields = dynamo_cfg_with_prefix.get_dynamo_field_names()
+        
         # These Dynamo-specific fields should NOT be passed to ChatOpenAI
-        assert "prefix_template" not in kwargs
-        assert "prefix_total_requests" not in kwargs
-        assert "prefix_osl" not in kwargs
-        assert "prefix_iat" not in kwargs
-        assert "request_timeout" not in kwargs
+        for field_name in excluded_fields:
+            assert field_name not in kwargs, f"Dynamo-specific field '{field_name}' leaked to ChatOpenAI kwargs"
This ensures the test stays in sync if get_dynamo_field_names() is updated in the future.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0205fa3 and d33c2b7.

📒 Files selected for processing (1)

packages/nvidia_nat_langchain/tests/test_llm_langchain.py (4 hunks)

🧰 Additional context used

📓 Path-based instructions (11)

**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/test_*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/test_*.py: Use pytest with pytest-asyncio for asynchronous code testing
Test functions should be named using the test_ prefix, using snake_case
Extract frequently repeated code into pytest fixtures, which should be named using the fixture_ prefix and define the name argument in the decorator
Mock external services with pytest_httpserver or unittest.mock instead of hitting live endpoints
Mark slow tests with @pytest.mark.slow so they can be skipped in the default test suite
Mark integration tests requiring external services with @pytest.mark.integration so they can be skipped in the default test suite

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/*.{py,js,ts,java,cpp,c,go,rb,php}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Every file must start with the standard SPDX Apache-2.0 header

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All source files must include the SPDX Apache-2.0 header template

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*: - This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file. - The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting
with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two
digit version (ex: ~=1.0).

Not all packages contain Python code, if they do they should also contain their own set of tests, in a
tests/ directory at the same level as the pyproject.toml file.

When adding a new package, that new package name (as defined in the pyproject.toml file) should
be added as a dependency to the nvidia-nat-all package in packages/nvidia_nat_all/pyproject.toml

Files:

packages/nvidia_nat_langchain/tests/test_llm_langchain.py

🧬 Code graph analysis (1)

packages/nvidia_nat_langchain/tests/test_llm_langchain.py (1)

src/nat/llm/dynamo_llm.py (2)

dynamo_llm (356-361)

DynamoModelConfig (164-243)

🔇 Additional comments (3)

packages/nvidia_nat_langchain/tests/test_llm_langchain.py (3)

18-18: LGTM! Imports are appropriate.

The new imports (AsyncMock, DynamoModelConfig, dynamo_langchain) are necessary for the Dynamo integration tests and follow the existing import organization pattern.

Also applies to: 28-28, 32-32

210-275: Excellent test coverage with proper async handling.

The test methods effectively verify:

No custom httpx client when prefix template is absent

Custom httpx client creation with correct parameters when prefix template is present

Proper async cleanup (aclose awaited)

RESPONSES API special flags

The AsyncMock usage for the aclose method is correct and the tests properly verify cleanup occurs.

327-337: LGTM! Registry mapping correctly updated.

The DynamoModelConfig mapping is properly added to the mock registry, following the same pattern as the existing NIM, OpenAI, and Bedrock mappings. The assertion correctly verifies the registration.

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 · 2025-12-16T22:47:27Z

/ok to test da05743

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/nat/llm/dynamo_llm.py (1)
305-306: Consider removing redundant .upper() calls.

The osl and iat parameters are already typed as PrefixLevel = Literal["LOW", "MEDIUM", "HIGH"], so calling .upper() is redundant. While harmless, removing it would make the code cleaner.

Apply this diff if you'd like to simplify:
-        request.headers["x-prefix-osl"] = osl.upper()
-        request.headers["x-prefix-iat"] = iat.upper()
+        request.headers["x-prefix-osl"] = osl
+        request.headers["x-prefix-iat"] = iat

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d33c2b7 and da05743.

📒 Files selected for processing (1)

src/nat/llm/dynamo_llm.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (10)

**/*.{md,rst,py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use 'NVIDIA NeMo Agent toolkit' on first use, then 'NeMo Agent toolkit' for subsequent references

Files:

src/nat/llm/dynamo_llm.py

**/*.{py,toml,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Use abbreviations: 'nat' for API namespace and CLI tool, 'nvidia-nat' for package name, 'NAT' for environment variable prefixes and informal comments

Files:

src/nat/llm/dynamo_llm.py

**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf second (PEP 8 base, column_limit = 120) for Python formatting
Use ruff check --fix (via pre-commit) for linting using configuration embedded in pyproject.toml, fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer collections.abc / typing abstractions (Sequence over list) for type hints
Use typing.Annotated for units or extra metadata when useful
Treat pyright warnings (configured in pyproject.toml) as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare raise statements and logger.error() when re-raising
When catching and logging exceptions without re-raising, always use logger.exception() to capture the full stack trace information
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period (Vale checks this)
Surround code entities with backticks in docstrings to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with functools.lru_cache or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible

Files:

src/nat/llm/dynamo_llm.py

**/*.{py,js,ts,yaml,yml,json,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Indent with 4 spaces, never tabs, and ensure every file ends with a single newline

Files:

src/nat/llm/dynamo_llm.py

**/*.{py,env,toml,yaml,yml,json}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Never commit API keys, credentials or personal data; use environment variables or .env files excluded from Git

Files:

src/nat/llm/dynamo_llm.py

**/*.{py,js,ts,java,cpp,c,go,rb,php}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Every file must start with the standard SPDX Apache-2.0 header

Files:

src/nat/llm/dynamo_llm.py

**/*.{py,md,rst}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Version numbers are derived automatically by setuptools-scm; never hard-code them in code or docs

Files:

src/nat/llm/dynamo_llm.py

**/*.{py,js,ts,java,cpp,c,go,rb,php,sh}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All source files must include the SPDX Apache-2.0 header template

Files:

src/nat/llm/dynamo_llm.py

**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions
Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.
Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of None,
in that situation no return type hint is needed).
Example:
def my_function(param1: int, param2: str) -> bool:
    pass
For Python exception handling, ensure proper stack trace preservation:

When re-raising exceptions: use bare raise statements to maintain the original stack trace,
and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.

When catching and logging exceptions without re-raising: always use logger.exception()
to capture the full stack trace information.
Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any

words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be
spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

and should contain an Apache License 2.0 header comment at the top of each file.

Confirm that copyright years are up-to date whenever a file is changed.

Files:

src/nat/llm/dynamo_llm.py

src/nat/**/*

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

src/nat/llm/dynamo_llm.py

🧠 Learnings (1)

📚 Learning: 2025-11-24T18:56:53.109Z

Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-11-24T18:56:53.109Z
Learning: Confirm that copyright years are up-to-date whenever a file is changed

Applied to files:

src/nat/llm/dynamo_llm.py

🧬 Code graph analysis (1)

src/nat/llm/dynamo_llm.py (2)

src/nat/data_models/optimizable.py (2)

OptimizableField (157-196)

SearchSpace (35-154)

src/nat/llm/openai_llm.py (2)

openai_llm (57-59)

OpenAIModelConfig (32-53)

🔇 Additional comments (7)

src/nat/llm/dynamo_llm.py (7)

1-14: LGTM!

License header is correct with up-to-date copyright year (2025) and complete Apache 2.0 license text.

15-47: LGTM!

Excellent module-level documentation. The docstring clearly explains the purpose, provides context for Dynamo prefix parameters (OSL, IAT, total_requests), and follows Google-style conventions.

49-75: LGTM!

Imports are well-organized and follow best practices. Good use of TYPE_CHECKING to defer httpx import and collections.abc for abstract type hints per coding guidelines.

82-157: LGTM!

Well-implemented context management using ContextVar for thread-safe prefix ID tracking. The scope() context manager provides clean resource management, and all methods include proper type hints and comprehensive docstrings.

165-245: LGTM!

Excellent Pydantic configuration model. Proper use of OptimizableField with SearchSpace for hyperparameter tuning, comprehensive field descriptions, and appropriate validation constraints. The get_dynamo_field_names() utility method is well-documented and useful for framework integrations.

317-347: LGTM with one caveat.

Well-structured function that creates a properly configured httpx client. The local import, timeout configuration, and event hook setup are all correct.

Note: This function inherits the prefix_template=None behavior issue flagged in the previous comment. Ensure callers check for None before invoking this function, or add logic here to handle None appropriately.

357-363: LGTM!

Provider registration follows the framework's established pattern correctly. The async generator properly yields LLMProviderInfo with appropriate configuration and description.

src/nat/llm/dynamo_llm.py

bbednarski9 · 2025-12-16T23:41:28Z

/merge

… routers (NVIDIA#1242) This is an initial integration between the NVIDIA NAT and [Dynamo](https://github.com/ai-dynamo/dynamo) SDKs. The scope of this PR is around establishing key integrations and a robust evaluation harness, that will allow the NAT team to explore agentic inference acceleration using Dynamo servers. There are a few key components for review: - [dynamo components] (./external/dynamo): startup and shutdown scripts for three variates of dyanmo servers - unified server with KV aware routing, unified server with predictive KV aware routing (thompson sampling), and disaggregated serving with KV aware routing - [dynamo LLM provider] (./src/nat/llm/dynamo_llm.py): a new llm provider that enabled custom request prefix headers which can be optimized for the predictive kv aware cache router (thompson). Framework integration in place for langchain and google ADK - [dynamo integration example agents] (./examples/dynamo_integration) - example project for running nat workflows (run, eval, optimize, profile) and compare performance between difference inference endpoints. Analysis and visualization scripts included. ### Musings - tool calling for the Galileo-V2 dataset used here is not parameterized by the agent in order to help normalize the OSL. - post-processing of experiment data is not integrated with profiler yet. There are currently two sets of plotting tools with _histograms_ being the newer. - After internal discussion, we opted to build a new dynamo llm provider (dynamo_llm.py), rather than implementing this as a mixin. It keeps development isolated to the dynamo components so that we can be more agile moving forward with development. - Features in this PR are largely experimental and non-breaking to the core library - There is an additional evaluator defined in `action_completion_evaluator.py`, its not used in current evaluations but im going to leave it here for completeness and possible extension. Closes ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Dynamo LLM provider and end-to-end Dynamo backend integration with prefix-hint support and streaming/non-streaming API. * **New Agents & Tools** * React benchmark agent, decision-only tool stubs, banking tools group, self-evaluating agent with feedback/retry loop, and intent-buffer utilities. * **New Evaluators** * Tool Selection Quality (TSQ) and Action Completion (AC) evaluators. * **Documentation** * Comprehensive Dynamo integration guides, architecture docs, and READMEs. * **Tools & Scripts** * Deployment, start/stop, monitoring, benchmarking, plotting, dataset utilities, and env templates. * **Tests** * Extensive unit and integration tests across LLMs, agents, evaluators, and analysis tools. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> Authors: - Bryan Bednarski (https://github.com/bbednarski9) Approvers: - Dhruv Nandakumar (https://github.com/dnandakumar-nv) URL: NVIDIA#1242 Signed-off-by: Yuchen Zhang <yuchenz@nvidia.com>

zcs9ybtjyg-bit · 2026-01-17T08:35:45Z

This integration raises an interesting question around authorization
once inference becomes agentic and routed asynchronously.

In setups like this, where requests are dynamically routed and
execution may occur without a synchronous checkpoint, how are
human approvals preserved in a way that’s reviewable after the fact?

I’m less concerned with runtime control than with whether there’s a
clear artifact trail showing:
• who approved an action
• when it occurred
• and what context was visible at the time

Curious if this has been discussed elsewhere in the project, or if
there are plans to address auditability alongside routing efficiency?

bbednarski9 added non-breaking Non-breaking change ! - Release Related to a Major Release labels Dec 12, 2025

bbednarski9 added the feature request New feature or request label Dec 12, 2025

willkill07 removed the ! - Release Related to a Major Release label Dec 12, 2025

bbednarski9 added 3 commits December 12, 2025 17:04

migrate from gitlab

4783b44

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

dynamo llm support for langchain and adk

48b927a

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

readme and test flow cleanup

750926f

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 force-pushed the bb/dynamo-R14-integration branch from 20147d7 to 750926f Compare December 12, 2025 17:06

update router, frontend, processor to latest

d5d7081

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

improve hardware prereqs and setup flow

0090cd8

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 marked this pull request as ready for review December 15, 2025 16:39

bbednarski9 requested a review from a team as a code owner December 15, 2025 16:39

bbednarski9 requested review from dnandakumar-nv, willkill07 and zhongxuanwang-nv December 15, 2025 16:40

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

willkill07 reviewed Dec 15, 2025

View reviewed changes

src/nat/llm/dynamo_llm.py Outdated Show resolved Hide resolved

packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py Outdated Show resolved Hide resolved

src/nat/llm/dynamo_llm.py Outdated Show resolved Hide resolved

src/nat/llm/dynamo_llm.py Outdated Show resolved Hide resolved

abs to rel symlink

d3a86a2

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

examples/dynamo_integration/scripts/plot_throughput_histograms_per_request.py Show resolved Hide resolved

function group reuse, renaming configs

565663b

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

add copyright headers

590f727

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

singleton for DynamoLLMConfig, tests

cf08736

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai bot reviewed Dec 15, 2025

View reviewed changes

Merge remote-tracking branch 'origin/release/1.4' into pr/bbednarski9…

0773d81

…/1242 Merging release/1.4 commits since creating this feature branch for dynamo

update docs

3fb6d64

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai bot reviewed Dec 16, 2025

View reviewed changes

docs/source/build-workflows/llms/index.md Show resolved Hide resolved

docs/source/build-workflows/llms/index.md Outdated Show resolved Hide resolved

spelling

6d4ff39

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

vale :)

ce72671

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

bbednarski9 added 2 commits December 16, 2025 12:36

CI checks

3eb804f

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

Merge remote-tracking branch 'origin/release/1.4' into pr/bbednarski9…

072a7f7

…/1242

coderabbitai bot reviewed Dec 16, 2025

View reviewed changes

import pyproject at root

54aeb06

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai bot reviewed Dec 16, 2025

View reviewed changes

unit test cleanup

0205fa3

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

yapf

d33c2b7

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

dnandakumar-nv approved these changes Dec 16, 2025

View reviewed changes

coderabbitai bot reviewed Dec 16, 2025

View reviewed changes

Sphinx formatting

da05743

Signed-off-by: Bryan Bednarski <bbednarski@nvidia.com>

coderabbitai bot reviewed Dec 16, 2025

View reviewed changes

src/nat/llm/dynamo_llm.py Show resolved Hide resolved

src/nat/llm/dynamo_llm.py Show resolved Hide resolved

rapids-bot bot merged commit 3126f75 into NVIDIA:release/1.4 Dec 16, 2025
17 checks passed

Conversation

bbednarski9 commented Dec 12, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Musings

By Submitting this PR I confirm:

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Dec 12, 2025

Uh oh!

coderabbitai bot commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

bbednarski9 commented Dec 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bbednarski9 commented Dec 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bbednarski9 commented Dec 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

SPDX-License-Identifier: Apache-2.0

For function groups, list the group name; all tools in the group are added automatically

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

willkill07 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Misc. - All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0,

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

bbednarski9 commented Dec 12, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 12, 2025 •

edited

Loading

bbednarski9 commented Dec 14, 2025 •

edited

Loading

bbednarski9 commented Dec 14, 2025 •

edited

Loading

bbednarski9 commented Dec 14, 2025 •

edited

Loading