Add NAT Agent Hyperparameter Optimizer #650
Conversation
Introduced new modules and configurations for parameter optimization in the AIQ system. This includes support for hyperparameter optimization using Optuna, updated data models, search space management, and utility functions. Additional updates include CLI commands and configuration enhancements to enable numeric and prompt-based optimization functionality. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Introduce `reps_per_param_set` to allow multiple repetitions of optimization runs for improved metric stability. Updated evaluation logic to calculate averaged metric scores across repetitions, ensuring more reliable optimization outcomes. Additionally, refactored evaluation code for clarity and concurrency. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
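The averaging described above can be sketched roughly as follows; `_evaluate_once` is a hypothetical stand-in for a single evaluation run, not the actual NAT evaluation API:

```python
import asyncio
from statistics import mean


async def _evaluate_once(params: dict, rep: int) -> float:
    # Hypothetical single evaluation; the real evaluator runs a workflow
    # against the eval dataset and returns a metric score.
    return 0.5 + 0.01 * rep


async def averaged_score(params: dict, reps_per_param_set: int) -> float:
    # Run all repetitions concurrently, then average the metric scores
    # so each parameter set is judged on a more stable signal.
    scores = await asyncio.gather(*(_evaluate_once(params, r) for r in range(reps_per_param_set)))
    return mean(scores)


result = asyncio.run(averaged_score({}, 3))
```

The key point is that the optimizer sees one averaged value per parameter set rather than a single noisy run.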
Introduce `OptimizableMixin` to standardize handling of optimizable fields across models. Replaced manual introspection with Pydantic-based attributes and added field allow-lists. Updated LLM configurations to integrate `OptimizableMixin` and define searchable hyperparameters using `OptimizableField`. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Reorganized the parameter optimization codebase for better modularity and clarity. Extracted helper functions (`walk_optimizables`, `apply_suggestions`, `nest_updates`), and separated prompt optimization (`prompt_optimizer.py`) and numeric optimization (`parameter_optimizer.py`) into distinct modules. Updated `optimize_config` to invoke modularized logic, improving maintainability and separation of concerns. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Replaced relative imports with explicit absolute paths in optimizer runtime, parameter optimizer, and prompt optimizer modules. This improves code clarity, ensures compatibility across different execution contexts, and aligns with best practices. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
The import path for PromptOptimizerInputSchema was corrected to reflect its new location in the module. This change ensures proper functionality and resolves potential import errors. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Added `trial_idx` argument to `_single_eval` for better tracking of individual evaluations. Also implemented saving of trial configuration files with unique names to facilitate debugging and reproducibility. Adjusted function calls accordingly to pass trial indices. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Introduced a comprehensive guide for using the AIQ Optimizer, detailing configuration, usage, and output analysis. Added a new `ParetoVisualizer` module to support advanced Pareto front visualizations for multi-objective optimization, with 2D scatter plots, parallel coordinates, and pairwise metrics comparison. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Log a warning and skip optimization when no optimizable parameters are found in the configuration. This prevents unnecessary processing and ensures clearer debugging information. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Refactor parameter optimization utilities to the experimental module to better categorize development-stage features. Updated all relevant imports and documentation references accordingly. Minor adjustments made to file output logic for trial configurations. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
This refactor reorganizes and renames the `parameter_optimization` module to `optimizer` for consistency and simplicity. All related imports and references have been updated to reflect the new module name. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Applied the `@aiq_experimental` decorator to `optimize_config` to signal its experimental status. This ensures better visibility to users when using potentially unstable features. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Ensure `output_path` and `eval_metrics` are validated in `parameter_optimizer.py` to prevent misconfigurations. Refactor evaluation tasks into a dedicated async function for improved clarity and reuse across both optimizers. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Simplifies optimizer handling by removing unused annotations, checks, and validators. Integrates prompt optimization into the ReAct agent workflow and evaluation configs, enabling tuning of parameters like `temperature` and `additional_instructions`. Enhances compatibility and modularity for prompt and numeric optimization workflows. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
This reverts commit 8e26d72.
…pace"" This reverts commit 6fdcfa6.
This reverts commit 96f207a.
This reverts commit 741869f.
This reverts commit 8703867.
This reverts commit 70adfa7.
…ule" This reverts commit ef2413d.
Eliminated legacy validation and annotations related to OptimizerConfig along with unnecessary Optuna availability checks. Simplified imports and removed redundant code to enhance maintainability and reduce clutter.
Added validation to check that `output_path` and `eval_metrics` in `optimizer_config` are not None to prevent runtime errors. Refactored async evaluation logic to improve clarity and maintainability by encapsulating task creation in `_run_all_evals` functions for both parameter and prompt optimizers.
Moved trial config saving outside evaluation loop for efficiency. Extended evaluation config with prompt optimization and numeric optimization settings, while removing outdated profiler configurations for cleaner workflows.
Replaced "prompt_evaluation_function" with "prompt_optimization_function" to align with new functionality. Introduced a `CustomTrajectoryOutputParser` to parse ReAct agent trajectories and normalize scores. Updated the Pareto visualizer for improved formatting and added optimizable fields to configuration files for parameter tuning.
Updated feedback storage to associate trajectory feedback with all prompt parameters used in each trial. This change ensures more accurate tracking and distribution of feedback per parameter, improving optimization reliability.
Persist raw per-repetition evaluation scores in the `rep_scores` user attribute for trials. Update the export logic to include these scores in the `trials_dataframe` CSVs, ensuring better traceability and convenience by flattening `rep_scores` into its own column. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
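Assuming a pandas export in the usual Optuna `trials_dataframe` shape (the column names here are illustrative, not the project's actual schema), flattening `rep_scores` into its own CSV-friendly column could look like:

```python
import pandas as pd

# Illustrative trials export: each trial carries its raw per-repetition
# scores in a user attribute; we flatten the list into a single string column.
df = pd.DataFrame({
    "number": [0, 1],
    "value": [0.51, 0.48],
    "user_attrs_rep_scores": [[0.50, 0.51, 0.52], [0.47, 0.48, 0.49]],
})
df["rep_scores"] = df["user_attrs_rep_scores"].apply(
    lambda scores: ";".join(f"{s:.2f}" for s in scores))
df = df.drop(columns=["user_attrs_rep_scores"])
```

This keeps the per-repetition detail traceable in the CSV without nesting lists inside cells.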
# Conflicts:
#   src/nat/agent/react_agent/register.py
#   src/nat/cli/commands/optimize.py
#   src/nat/cli/type_registry.py
#   src/nat/data_models/config.py
#   src/nat/data_models/optimizable.py
#   src/nat/data_models/optimizer.py
#   src/nat/eval/trajectory_evaluator/evaluate.py
#   src/nat/eval/trajectory_evaluator/output_parser.py
#   src/nat/llm/aws_bedrock_llm.py
#   src/nat/llm/nim_llm.py
#   src/nat/llm/openai_llm.py
#   uv.lock
Replaced Optuna-based prompt optimization with a genetic algorithm for greater flexibility in multi-objective scenarios. Added support for advanced genetic operations, population diversity management, and scalable parallel evaluations. Updated related configurations and dependencies accordingly.
Actionable comments posted: 0
🧹 Nitpick comments (16)
tests/nat/profiler/test_parameter_selection_extra.py (3)
22-27: Add type hints and remove unused noqa (RUF100). Type hints are required; drop the noqa and type the helper. Also import `Study` and `Sequence`.
Apply this diff:
```diff
+from typing import Sequence
 import optuna
-from optuna.study import StudyDirection
+from optuna.study import Study, StudyDirection
 from nat.profiler.parameter_optimization.parameter_selection import pick_trial

-def _make_study_with_trials(values_list):  # noqa: ANN001
+def _make_study_with_trials(values_list: Sequence[Sequence[float]]) -> Study:
     study = optuna.create_study(directions=[StudyDirection.MINIMIZE, StudyDirection.MINIMIZE])
     for vals in values_list:
         t = optuna.trial.create_trial(values=list(vals), params={}, distributions={})
         study.add_trial(t)
     return study
```

Also applies to: 16-20
48-52: Replace `assert False` with `pytest.raises` (B011) and import pytest.

`assert False` can be stripped by -O; use `pytest.raises` for negative tests. Apply this diff:

```diff
+import pytest
@@
-    try:
-        pick_trial(study, mode="sum", weights=[1.0])
-        assert False, "Expected ValueError for weights length"
-    except ValueError:
-        pass
+    with pytest.raises(ValueError):
+        pick_trial(study, mode="sum", weights=[1.0])
@@
-    try:
-        pick_trial(study, mode="unknown_mode")
-        assert False, "Expected ValueError for unknown mode"
-    except ValueError:
-        pass
+    with pytest.raises(ValueError):
+        pick_trial(study, mode="unknown_mode")
@@
-    try:
-        pick_trial(study, mode="sum")
-        assert False, "Expected ValueError for empty Pareto front"
-    except ValueError:
-        pass
+    with pytest.raises(ValueError):
+        pick_trial(study, mode="sum")
```

Also applies to: 59-63, 68-72, 16-20
73-73: Add quick coverage for `harmonic` and `hypervolume` modes. Low-effort tests to guard behavior and future refactors.
Apply this diff:
```diff
+
+def test_pick_trial_harmonic_selects_center_point():
+    vals = [(0.1, 0.9), (0.2, 0.2), (0.9, 0.1)]
+    study = _make_study_with_trials(vals)
+    trial = pick_trial(study, mode="harmonic")
+    assert tuple(trial.values) == (0.2, 0.2)
+
+
+def test_pick_trial_hypervolume_single_point_returns_that_point():
+    vals = [(0.4, 0.6)]
+    study = _make_study_with_trials(vals)
+    trial = pick_trial(study, mode="hypervolume")
+    assert tuple(trial.values) == (0.4, 0.6)
```

tests/nat/eval/utils/test_tqdm_position_registry_extra.py (2)
25-27: Comment doesn't match assertion; either assert reuse or soften the claim. You state "claim the same position again," but don't assert it. Prefer softening the comment to avoid coupling to allocation policy.

```diff
-    # after release, we should be able to claim the same position again quickly
+    # after release, we should be able to claim a position again
```
30-36: Avoid relying on private internals in tests. Directly touching `_max_positions` and `_positions` makes the test brittle. Consider adding a small public test helper (e.g., `TqdmPositionRegistry.reset(max_positions=...)`) and using that here.
Would you like me to draft a minimal reset API and update the tests accordingly?
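A minimal sketch of such a reset seam (the `claim`/`release` names and the default capacity are assumptions for illustration, not the registry's actual implementation):

```python
class TqdmPositionRegistry:
    """Sketch of a tqdm position registry with a public reset hook for tests."""

    _max_positions: int = 10
    _positions: set[int] = set()

    @classmethod
    def claim(cls) -> int:
        # Hand out the lowest free position
        for i in range(cls._max_positions):
            if i not in cls._positions:
                cls._positions.add(i)
                return i
        raise RuntimeError("no free tqdm positions")

    @classmethod
    def release(cls, pos: int) -> None:
        cls._positions.discard(pos)

    @classmethod
    def reset(cls, max_positions: int = 10) -> None:
        # Public test seam: restores a known state without poking privates
        cls._max_positions = max_positions
        cls._positions.clear()


TqdmPositionRegistry.reset(max_positions=2)
first = TqdmPositionRegistry.claim()
second = TqdmPositionRegistry.claim()
TqdmPositionRegistry.release(first)
```

Tests would then call `reset()` in a fixture instead of mutating `_positions` directly.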
tests/nat/profiler/test_pareto_visualizer_extra.py (4)
24-29: Add return type hints for helper; keep tests pyright-clean. Annotate the factory with a precise return type.

```diff
-def _make_two_obj_study():
+def _make_two_obj_study() -> optuna.Study:
```
16-22: Close figures to avoid resource leaks in CI. Import pyplot so tests can close figures after assertions.

```diff
 from pathlib import Path
 import optuna
 import pandas as pd
+import matplotlib.pyplot as plt
 from nat.profiler.parameter_optimization.pareto_visualizer import create_pareto_visualization
```
32-45: Strengthen assertions and verify all artifacts for 2-metric case. Also close figures to keep the test runner lean.

```diff
     )
     # Should include 2D scatter and other plots when 2 metrics
     assert "2d_scatter" in figs
     assert (tmp_path / "pareto_front_2d.png").exists()
+    assert "parallel_coordinates" in figs
+    assert "pairwise_matrix" in figs
+    assert (tmp_path / "pareto_parallel_coordinates.png").exists()
+    assert (tmp_path / "pareto_pairwise_matrix.png").exists()
+    for fig in figs.values():
+        plt.close(fig)
```
47-60: Broaden checks for CSV path source and close figures. Ensure all expected plots are returned even when not saving to disk; close figures afterward.

```diff
     )
-    assert isinstance(figs, dict)
+    assert isinstance(figs, dict)
+    assert "2d_scatter" in figs
+    assert "parallel_coordinates" in figs
+    assert "pairwise_matrix" in figs
+    for fig in figs.values():
+        plt.close(fig)
```

tests/nat/utils/test_url_utils.py (1)
16-21: Add edge-case coverage (slashes, empty parts).
Increase resilience by testing normalization and empty segments. Apply this patch to extend tests:

```diff
@@
-from nat.utils.url_utils import url_join
+from nat.utils.url_utils import url_join
+import pytest
@@
 def test_url_join_basic():
     result = url_join("http://example.com", "api", "v1")
     assert result == "http://example.com/api/v1"
+
+
+@pytest.mark.parametrize(
+    ("parts", "expected"),
+    [
+        (("http://example.com/", "api", "/v1/"), "http://example.com/api/v1/"),
+        (("http://example.com", "", "api", "", "v1"), "http://example.com/api/v1"),
+        (("http://example.com/api", "v1"), "http://example.com/api/v1"),
+        (("http://example.com/",), "http://example.com/"),
+    ],
+)
+def test_url_join_normalization(parts, expected):
+    assert url_join(*parts) == expected
```

tests/nat/utils/test_string_utils.py (2)
18-26: Remove unused import and dead code (_M).
These trigger lint noise without adding value. Apply:

```diff
-from pydantic import BaseModel
-
-from nat.utils.string_utils import convert_to_str
-
-
-class _M(BaseModel):
-    a: int
-    b: str | None = None
+from nat.utils.string_utils import convert_to_str
```
31-32: Make dict assertion order-independent and stricter.
Ensure both entries are present, not just a prefix.

```diff
-    s = convert_to_str({"k": 1, "z": 2})
-    assert (s.startswith("k: 1") or s.startswith("z: 2"))
+    s = convert_to_str({"k": 1, "z": 2})
+    assert "k: 1" in s and "z: 2" in s
```

tests/nat/utils/test_optional_imports.py (1)
45-56: Avoid leaking global tracer state across tests.
Wrap provider mutation in try/finally to restore original.

```diff
-def test_dummy_tracer_stack():
-    tracer = DummyTracerProvider.get_tracer()
-    span = tracer.start_span("op")
-    assert isinstance(span, DummySpan)
-    span.set_attribute("k", "v")
-    span.end()
-    DummyBatchSpanProcessor().shutdown()
-    DummySpanExporter.export()
-    DummySpanExporter.shutdown()
-    assert DummyTrace.get_tracer_provider() is not None
-    DummyTrace.set_tracer_provider(None)
-    assert DummyTrace.get_tracer("name") is not None
+def test_dummy_tracer_stack():
+    old_provider = DummyTrace.get_tracer_provider()
+    try:
+        tracer = DummyTracerProvider.get_tracer()
+        span = tracer.start_span("op")
+        assert isinstance(span, DummySpan)
+        span.set_attribute("k", "v")
+        span.end()
+        DummyBatchSpanProcessor().shutdown()
+        DummySpanExporter.export()
+        DummySpanExporter.shutdown()
+        assert DummyTrace.get_tracer_provider() is not None
+        DummyTrace.set_tracer_provider(None)
+        assert DummyTrace.get_tracer("name") is not None
+    finally:
+        DummyTrace.set_tracer_provider(old_provider)
```

tests/nat/profiler/test_optimizer_runtime_extra.py (3)
69-72: Remove unused noqa directives (RUF100).
Not needed; ruff flags them as unused.

```diff
-    def _fake_optimize_parameters(**kwargs):  # noqa: ANN001, ARG001
+    def _fake_optimize_parameters(**kwargs):
         del kwargs
         calls["numeric"] += 1
         return cfg
```
74-77: Remove unused noqa directives (RUF100).
Same here.

```diff
-    async def _fake_optimize_prompts(**kwargs):  # noqa: ANN001, ARG001
+    async def _fake_optimize_prompts(**kwargs):
         del kwargs
         calls["prompt"] += 1
```
31-38: Optional: use default_factory for nested models to avoid shared defaults.
Prevents accidental instance sharing if fields are mutated later.

```diff
-from pydantic import BaseModel
+from pydantic import BaseModel, Field
@@
 class _DummyOptimizer(BaseModel):
-    numeric: _DummyInner = _DummyInner()
-    prompt: _DummyPrompt = _DummyPrompt()
+    numeric: _DummyInner = Field(default_factory=_DummyInner)
+    prompt: _DummyPrompt = Field(default_factory=_DummyPrompt)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- ci/vale/styles/config/vocabularies/nat/accept.txt (1 hunks)
- docs/source/workflows/observe/observe-workflow-with-galileo.md (1 hunks)
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py (1 hunks)
- tests/nat/profiler/test_optimizer_runtime_extra.py (1 hunks)
- tests/nat/profiler/test_parameter_selection_extra.py (1 hunks)
- tests/nat/profiler/test_pareto_visualizer_extra.py (1 hunks)
- tests/nat/runtime/test_user_metadata.py (1 hunks)
- tests/nat/utils/test_optional_imports.py (1 hunks)
- tests/nat/utils/test_string_utils.py (1 hunks)
- tests/nat/utils/test_url_utils.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/source/workflows/observe/observe-workflow-with-galileo.md
🚧 Files skipped from review as they are similar to previous changes (1)
- ci/vale/styles/config/vocabularies/nat/accept.txt
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Unit tests must live under tests/ and use configured markers (e2e, integration, etc.)
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
⚙️ CodeRabbit configuration file
`tests/**/*.py`:
- Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
- Test functions should be named using the `test_` prefix, using snake_case.
- Any frequently repeated code should be extracted into pytest fixtures.
- Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the `fixture_` prefix, using snake_case. Example:

```python
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass
```
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bare `raise` and log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/tests/**/*.py: Test functions must use the test_ prefix and snake_case
Extract repeated test code into pytest fixtures; fixtures should set name=... in @pytest.fixture and functions named with fixture_ prefix
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration
Use pytest with pytest-asyncio for async code; mock external services with pytest_httpserver or unittest.mock
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*.{py,md}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*
⚙️ CodeRabbit configuration file
`**/*`: Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs, or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file; words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
🧬 Code graph analysis (3)
tests/nat/profiler/test_pareto_visualizer_extra.py (1)
src/nat/profiler/parameter_optimization/pareto_visualizer.py (1)
`create_pareto_visualization` (317-380)
tests/nat/profiler/test_parameter_selection_extra.py (1)
src/nat/profiler/parameter_optimization/parameter_selection.py (1)
`pick_trial` (40-108)
tests/nat/profiler/test_optimizer_runtime_extra.py (2)
src/nat/data_models/optimizer.py (1)
`OptimizerRunConfig` (138-149)
src/nat/profiler/parameter_optimization/optimizer_runtime.py (1)
`optimize_config` (31-67)
🪛 Ruff (0.12.2)
tests/nat/profiler/test_parameter_selection_extra.py
22-22: Unused noqa directive (non-enabled: ANN001)
Remove unused noqa directive
(RUF100)
50-50: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
61-61: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
70-70: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
tests/nat/profiler/test_optimizer_runtime_extra.py
69-69: Unused noqa directive (unused: ARG001; non-enabled: ANN001)
Remove unused noqa directive
(RUF100)
74-74: Unused noqa directive (unused: ARG001; non-enabled: ANN001)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (10)
tests/nat/profiler/test_parameter_selection_extra.py (2)
1-15: License header: good to go. SPDX and year look correct.
30-41: Balanced-point assertions for sum/chebyshev look correct. Matches the scalarizations after normalization.
tests/nat/eval/utils/test_tqdm_position_registry_extra.py (1)
1-15: License header: good to go. SPDX header and year look correct.
tests/nat/profiler/test_pareto_visualizer_extra.py (1)
1-15: License header: good to go. SPDX header and year look correct.
tests/nat/utils/test_url_utils.py (1)
19-21: LGTM on basic happy-path test.
Covers a simple join correctly.

tests/nat/runtime/test_user_metadata.py (1)
19-30: LGTM: defaults contract validated.
Solid sanity check for None defaults.

tests/nat/utils/test_string_utils.py (1)
35-44: LGTM on object str handling.
Good coverage of custom stringification.

tests/nat/utils/test_optional_imports.py (1)
29-35: LGTM on optional import success/failure checks.
Clear signal on expected behaviors.

tests/nat/profiler/test_optimizer_runtime_extra.py (2)
40-53: LGTM: no-space branch behavior verified.
Good use of monkeypatch and direct BaseModel pass-through.
55-85: LGTM: orchestration path exercised.
Asserts both numeric and prompt hooks invoked exactly once.
...evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
…nat_email_phishing_analyzer/register.py Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Introduced a `model_validator` to ensure either `values` or both `low` and `high` are provided for search space definitions. Updated error messages for clarity and adjusted related documentation to reflect the changes in parameter requirements. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Actionable comments posted: 1
♻️ Duplicate comments (7)
docs/source/reference/optimizer.md (4)
273-289: Formatting issues: incorrect key name and backticks. The configuration example uses a dot prefix and inconsistent backtick formatting.

```diff
-This can be enabled using the .optimizable_params` field of your configuration file.
+This can be enabled using the `optimizable_params` field of your configuration file.
 llms:
   nim_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
     temperature: 0.0
-    optimizable_params:
+    optimizable_params:
       - temperature
       - top_p
       - max_tokens
-**NOTE:** Ensure your configuration object inherits from `OptimizableMixin` to enable the .optimizable_params` field.
+**NOTE:** Ensure your configuration object inherits from `OptimizableMixin` to enable the `optimizable_params` field.
```
1-409: Add this page to the documentation TOC. This reference page needs to be linked from docs/source/index.md.

```bash
#!/bin/bash
# Check if optimizer.md is referenced in the documentation index
if [ -f "docs/source/index.md" ]; then
  echo "Checking for optimizer reference in index.md..."
  grep -n "optimizer" docs/source/index.md || echo "No reference to optimizer found in index.md"
fi
```
328-328: SearchSpace mismatch: nim.max_tokens high boundary. The code defines high=2176 but the docs table shows high=2048.
The `max_tokens` search space has inconsistent values:

- Code (src/nat/llm/nim_llm.py): `SearchSpace(high=2176, low=128, step=512)`
- Docs table: shows `high=2048`

Align both to the same value based on the intended maximum.
219-219: Remove invalid class declaration syntax. Python doesn't support `name=` in class inheritance.

```diff
-class SomeImageAgentConfig(FunctionBaseConfig, OptimizableMixin, name="some_image_agent_config"):
+class SomeImageAgentConfig(FunctionBaseConfig, OptimizableMixin):
```

src/nat/data_models/optimizable.py (2)
31-53: Add docstrings to public classes and methods. Project guidelines require Google-style docstrings for all public APIs.

```diff
 class SearchSpace(BaseModel, Generic[T]):
+    """Declarative search space for an optimizable field.
+
+    Attributes:
+        values: Discrete choices for categorical parameters.
+        low: Lower bound for numeric parameters.
+        high: Upper bound for numeric parameters.
+        log: Whether to use logarithmic scale for numeric parameters.
+        step: Step size for discrete numeric parameters.
+        is_prompt: Whether this is a prompt to be optimized.
+        prompt: Base prompt text to optimize.
+        prompt_purpose: Description of the prompt's purpose for the optimizer.
+    """
     values: Sequence[T] | None = None
 ...
     def suggest(self, trial: Any, name: str):
+        """Generate a parameter suggestion using an Optuna-like trial.
+
+        Args:
+            trial: Optuna trial object for parameter suggestion.
+            name: Name of the parameter to suggest.
+
+        Returns:
+            Suggested parameter value.
+
+        Raises:
+            ValueError: If prompt optimization is attempted or invalid categorical space.
+        """
 ...
 def OptimizableField(
     ...
 ):
+    """Create a Pydantic field with optimization metadata.
+
+    Args:
+        default: Default value for the field.
+        space: Optional search space configuration.
+        merge_conflict: How to handle conflicts in json_schema_extra ("overwrite", "keep", "error").
+        **fld_kw: Additional keyword arguments for pydantic.Field.
+
+    Returns:
+        Configured pydantic Field with optimization metadata.
+
+    Raises:
+        TypeError: If json_schema_extra is not a dict.
+        ValueError: If merge conflicts or prompt requirements are violated.
+    """
 ...
 class OptimizableMixin(BaseModel):
+    """Mixin to enable optimization support for configuration models.
+
+    Attributes:
+        optimizable_params: List of field names that can be optimized.
+        search_space: Optional search space overrides for optimizable parameters.
+    """
```

Also applies to: 55-94, 97-102
21-21: Remove Optuna type dependency and harden `suggest` method. The hard dependency on `optuna.Trial` should be avoided in type hints. Also prevent bool/int confusion and validate categorical choices.

```diff
-from optuna import Trial
 from pydantic import BaseModel
+from pydantic import ConfigDict
 from pydantic import Field
 ...
-    def suggest(self, trial: Trial, name: str):
+    def suggest(self, trial: Any, name: str):
         if self.is_prompt:
-            raise ValueError("Prompt optimization not currently supported using Optuna."
-                             " Use the genetic algorithm implementation instead.")
+            raise ValueError("Prompt optimization not supported via Optuna; use GA.")
         if self.values is not None:
+            if isinstance(self.values, (str, bytes)):
+                raise ValueError("Categorical space requires a non-string sequence.")
             return trial.suggest_categorical(name, self.values)
-        if isinstance(self.low, int):
-            return trial.suggest_int(name, self.low, self.high, log=self.log, step=self.step)
+        if type(self.low) is int:  # avoid bool being treated as int
+            step_int = int(self.step) if self.step is not None else None
+            return trial.suggest_int(name, int(self.low), int(self.high), log=self.log, step=step_int)
         return trial.suggest_float(name, float(self.low), float(self.high), log=self.log, step=self.step)
```

Also applies to: 44-52
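The `type(self.low) is int` guard matters because `bool` is a subclass of `int` in Python, so an `isinstance` check would route a boolean bound to `suggest_int`. A standalone illustration, independent of the toolkit:

```python
def is_plain_int(value: object) -> bool:
    """Return True only for genuine ints, rejecting bools."""
    return type(value) is int

# bool passes isinstance() because bool subclasses int...
assert isinstance(True, int)
# ...but an exact type check tells them apart.
assert not is_plain_int(True)
assert is_plain_int(3)
print("bool/int distinction holds")
```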
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py (1)
36-47: Enable optimizable fields with an allow-list. The `OptimizableMixin` default empty `optimizable_params` causes the `llm` and `prompt` fields to be ignored by the optimizer. Add an explicit allow-list.

```diff
 class EmailPhishingAnalyzerConfig(FunctionBaseConfig, OptimizableMixin, name="email_phishing_analyzer"):
     _type: str = "email_phishing_analyzer"
+    # Allow-list for optimizer discovery
+    optimizable_params: list[str] = ["llm", "prompt"]
     llm: LLMRef = OptimizableField(description="The LLM to use for email phishing analysis.",
                                    default="llama_3_405",
                                    space=SearchSpace(values=["llama_3_405", "llama_3_70"]))
```
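The gating effect of an empty allow-list can be shown with a small stand-alone sketch, using plain dicts in place of Pydantic field metadata (all names illustrative, not the toolkit's code):

```python
import warnings

def discover_optimizables(fields: dict, allow_list: list[str]) -> dict:
    """Return only fields that are both marked optimizable and allow-listed."""
    marked = {name: meta for name, meta in fields.items() if meta.get("optimizable")}
    if marked and not allow_list:
        warnings.warn("optimizable fields exist but optimizable_params is empty")
    return {name: meta for name, meta in marked.items() if name in allow_list}

fields = {
    "llm": {"optimizable": True, "search_space": {"values": ["a", "b"]}},
    "prompt": {"optimizable": True, "search_space": {"is_prompt": True}},
    "_type": {},
}
print(discover_optimizables(fields, []))                          # {} -- everything ignored
print(sorted(discover_optimizables(fields, ["llm", "prompt"])))   # ['llm', 'prompt']
```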
🧹 Nitpick comments (5)
docs/source/reference/optimizer.md (2)
38-38: Add language specifier to code block. The diagram should have a language identifier for proper rendering.

````diff
-```
+```text
 ┌─────────────────┐
 │ Start │
````
264-264: Fix incorrect list indentation. Unordered list item has 4 spaces instead of 2.

```diff
-    - `temperature` shows how to mark a field as optimizable without specifying a search space in code; the search space must then be provided in the workflow configuration.
+  - `temperature` shows how to mark a field as optimizable without specifying a search space in code; the search space must then be provided in the workflow configuration.
```

examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py (2)
66-71: Use `logger.exception` for proper stack trace logging. When catching an exception without re-raising, use `logger.exception()` to capture the full stack trace.

```diff
 try:
     # Get response from LLM
     response = await llm.apredict(config.prompt.replace("{body}", text))
 except Exception as e:
-    logger.error(f"Error during LLM prediction: {e}")
+    logger.exception("Error during LLM prediction")
     return f"Error: LLM prediction failed {e}"
```
68-68: String replacement works but consider `format()` for consistency. The change from `.format()` to `.replace()` is functional but less idiomatic for template substitution.

```diff
-response = await llm.apredict(config.prompt.replace("{body}", text))
+response = await llm.apredict(config.prompt.format(body=text))
```

This maintains consistency with Python's string formatting conventions and would handle edge cases better (e.g., if the text contains `{body}` literally).

src/nat/data_models/optimizable.py (1)
46-47: Consider shorter error messages to satisfy linter. Long error messages trigger TRY003. While these are informative, consider extracting them to constants or custom exceptions.

```diff
+_ERR_PROMPT_OPTUNA = "Prompt optimization not supported via Optuna; use GA."
+_ERR_PROMPT_BASE = "Prompt-optimized fields require a base prompt"
+_ERR_RESERVED_KEYS = "json_schema_extra contains reserved key(s)"
+
 ...
     if self.is_prompt:
-        raise ValueError("Prompt optimization not currently supported using Optuna."
-                         " Use the genetic algorithm implementation instead.")
+        raise ValueError(_ERR_PROMPT_OPTUNA)
 ...
     if default is None:
-        raise ValueError("Prompt-optimized fields require a base prompt: provide a non-None field default "
-                         "or set space.prompt.")
+        raise ValueError(f"{_ERR_PROMPT_BASE}: provide a non-None field default or set space.prompt.")
 ...
-raise ValueError("`json_schema_extra` already contains reserved key(s): "
-                 f"{', '.join(intersect)}")
+raise ValueError(f"{_ERR_RESERVED_KEYS}: {', '.join(intersect)}")
```

Also applies to: 71-72, 85-86
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `docs/source/reference/optimizer.md` (1 hunks)
- `examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py` (2 hunks)
- `src/nat/data_models/optimizable.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (10)
src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
src/**/*.py: All importable Python code must live under src/
All public APIs in src/ require Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise with a period; surround code entities with backticks
Files:
src/nat/data_models/optimizable.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Core functionality under src/nat should prioritize backward compatibility when changed
Files:
src/nat/data_models/optimizable.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/data_models/optimizable.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bare `raise` and log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
Files:
`src/nat/data_models/optimizable.py`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks
Files:
`src/nat/data_models/optimizable.py`
`docs/source/reference/optimizer.md`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
**/*.{py,md}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm
Files:
`src/nat/data_models/optimizable.py`
`docs/source/reference/optimizer.md`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
`src/nat/data_models/optimizable.py`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file. Words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.
Files:
`src/nat/data_models/optimizable.py`
`docs/source/reference/optimizer.md`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
docs/source/**/*.md
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
docs/source/**/*.md: Use the official naming: first use “NVIDIA NeMo Agent toolkit”; subsequent uses “NeMo Agent toolkit”; never use deprecated names in documentation
Documentation sources must be Markdown under docs/source; keep docs in sync and fix Sphinx errors/broken links
Documentation must be clear, comprehensive, free of TODO/FIXME/placeholders/offensive/outdated terms; fix spelling; adhere to Vale vocab allow/reject lists
Files:
docs/source/reference/optimizer.md
docs/source/**/*
⚙️ CodeRabbit configuration file
This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the
`docs/source/_static` directory.
Files:
docs/source/reference/optimizer.md
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or a README.ipynb file.
- If an example contains Python code, it should be placed in a subdirectory named `src/` and should contain a `pyproject.toml` file. Optionally, it might also contain scripts in a `scripts/` directory.
- If an example contains YAML files, they should be placed in a subdirectory named `configs/`.
- If an example contains sample data files, they should be placed in a subdirectory named `data/`, and should be checked into git-lfs.
Files:
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py
🧠 Learnings (2)
📚 Learning: 2025-08-28T23:22:41.742Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-08-28T23:22:41.742Z
Learning: Applies to docs/source/**/*.md : Use the official naming: first use “NVIDIA NeMo Agent toolkit”; subsequent uses “NeMo Agent toolkit”; never use deprecated names in documentation
Applied to files:
docs/source/reference/optimizer.md
📚 Learning: 2025-08-28T23:22:41.742Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-08-28T23:22:41.742Z
Learning: Applies to **/README.{md,ipynb} : Each documentation README must follow the NeMo Agent toolkit naming rules and must not use deprecated names
Applied to files:
docs/source/reference/optimizer.md
🧬 Code graph analysis (2)
src/nat/data_models/optimizable.py (1)
tests/nat/profiler/test_parameter_optimizer.py (3)
`suggest_categorical` (36-37), `suggest_int` (39-47), `suggest_float` (49-57)
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py (1)
src/nat/data_models/optimizable.py (3)
`OptimizableField` (55-94), `OptimizableMixin` (97-102), `SearchSpace` (31-52)
🪛 Ruff (0.12.2)
src/nat/data_models/optimizable.py
29-29: Comment contains ambiguous ‑ (NON-BREAKING HYPHEN). Did you mean - (HYPHEN-MINUS)?
(RUF003)
41-41: Undefined name ConfigDict
(F821)
46-47: Avoid specifying long messages outside the exception class
(TRY003)
62-62: Comment contains ambiguous ‑ (NON-BREAKING HYPHEN). Did you mean - (HYPHEN-MINUS)?
(RUF003)
65-65: Avoid specifying long messages outside the exception class
(TRY003)
71-72: Avoid specifying long messages outside the exception class
(TRY003)
85-86: Avoid specifying long messages outside the exception class
(TRY003)
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py
69-69: Do not catch blind exception: Exception
(BLE001)
70-70: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🪛 markdownlint-cli2 (0.17.2)
docs/source/reference/optimizer.md
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
264-264: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🔇 Additional comments (4)
docs/source/reference/optimizer.md (4)
17-17: LGTM! Official product naming correctly applied. The first mention properly uses "NVIDIA NeMo Agent toolkit" with correct lowercase "toolkit".
163-167: Config flags correctly documented. The documentation accurately describes the `numeric.enabled` and `prompt.enabled` fields with proper defaults.
402-406: Output files correctly documented. The output files now accurately match the implementation: `optimized_config.yml`, `trials_dataframe_params.csv`, and plots under the `plots/` directory.
330-330: Well-written guidance on using defaults. Clear explanation of how to leverage default optimizable parameters and override them when needed.
willkill07
left a comment
One last round of nits, then LGTM
Replaced the ASCII-style flowchart in the optimizer docs with a referenced image for improved clarity and readability. Added the corresponding diagram file to the `_static` directory. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
/ok to test dd0c78f
Replaced `low` with `values` in the `arch` SearchSpace initialization for consistency with parameter usage. This ensures the code aligns with the expected `SearchSpace` argument naming conventions. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
/ok to test aacbb27
/merge
Description
End-to-end Parameter Optimization for NAT: Optimizable Fields, Numeric (Optuna) Search, GA-based Prompt Optimization, and Evaluation Integration
Summary
This PR introduces a complete optimization subsystem for the NeMo Agent toolkit (NAT) that can tune both numeric/enumerated hyperparameters and prompts. It provides:
Scope
Key Components
Optimizable fields
`src/nat/data_models/optimizable.py`
- `SearchSpace[T]`: distribution/range metadata for int/float/categorical or prompt fields.
- `OptimizableField(...)`: drop-in replacement for `pydantic.Field(...)` that attaches `json_schema_extra = {"optimizable": True, "search_space": SearchSpace}`, with merge policies for any existing `json_schema_extra` (overwrite/keep/error).
- `OptimizableMixin`: base class adding `optimizable_params: list[str]` to gate discovery.

`src/nat/profiler/parameter_optimization/optimizable_utils.py`
- `walk_optimizables(model) -> dict[flattened.path, SearchSpace]`
- Honors `optimizable_params` and emits a warning if optimizable fields exist but no allowlist is set.

Parameter optimization (numeric/enumerated)
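The weighted harmonic/sum/chebyshev combination used to collapse multiple metric scores into one scalar can be sketched with stdlib math. This is my own illustrative version, not the toolkit's implementation, and it assumes all scores are already oriented so higher is better:

```python
def combine(scores: list[float], weights: list[float], mode: str = "sum") -> float:
    """Collapse per-metric scores into one scalar for trial ranking."""
    total_w = sum(weights)
    if mode == "sum":                       # weighted arithmetic mean
        return sum(w * s for w, s in zip(weights, scores)) / total_w
    if mode == "harmonic":                  # weighted harmonic mean
        return total_w / sum(w / s for w, s in zip(weights, scores))
    if mode == "chebyshev":                 # worst weighted component
        return min(w * s for w, s in zip(weights, scores))
    raise ValueError(f"unknown mode: {mode}")

scores, weights = [0.8, 0.4], [1.0, 1.0]
print(combine(scores, weights, "sum"))        # 0.6
print(combine(scores, weights, "harmonic"))   # 2 / 3.75 ~= 0.533
print(combine(scores, weights, "chebyshev"))  # 0.4
```

The harmonic mean penalizes a single poor metric more than the arithmetic mean, while the chebyshev variant ranks a trial by its weakest metric.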
`src/nat/profiler/parameter_optimization/parameter_optimizer.py`
- `optimize_parameters(...) -> Config`
- Skips fields marked `SearchSpace.is_prompt`; uses Optuna to search numeric/enum params.
- Respects `OptimizerMetric.direction`, with the final trial picked via weighted harmonic/sum/chebyshev.
- Stores `rep_scores` in trial user attributes.
- Writes `config_numeric_trial_{trial_id}.yml`, `optimized_config.yml`, `rep_scores` in `trials_dataframe_params.csv`, and `plots/`.

Prompt optimization (genetic algorithm)
`src/nat/profiler/parameter_optimization/prompt_optimizer.py`
- `optimize_prompts(...) -> None`
- Collects `SearchSpace.is_prompt` fields with purposes, then evolves a population of prompt sets.
- Uses `WorkflowBuilder` and registered functions to:
  - seed the population (`optimizer.prompt.prompt_population_init_function`)
  - recombine prompts (`optimizer.prompt.prompt_recombination_function`)
- Writes `optimized_prompts_gen{n}.json`, `optimized_prompts.json`, and `ga_history_prompts.csv`.

Optimizer configuration models
`src/nat/data_models/optimizer.py`
- `OptimizerMetric`: `evaluator_name`, `direction` ("maximize"|"minimize"), `weight`.
- `OptimizerConfig`: `output_path`, `eval_metrics`, `reps_per_param_set`, `multi_objective_combination_mode`, and nested `numeric`/`prompt` configs.
- `OptimizerRunConfig`: `config_file` (or model), dataset, endpoint, timeout, overrides.

Update helpers
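The deep-merge behavior called out for `nest_updates` — dotted paths with shared prefixes must preserve each other's siblings — can be sketched in plain Python (an illustrative stand-in, not the toolkit source):

```python
def nest_updates(flat: dict) -> dict:
    """Turn {"a.b.c": 1} into {"a": {"b": {"c": 1}}}, deep-merging dotted
    paths that share prefixes so sibling keys survive."""
    nested: dict = {}
    for path, value in flat.items():
        node = nested
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})  # reuse any existing subtree
        node[leaf] = value
    return nested

updates = {
    "llm.temperature": 0.2,
    "llm.top_p": 0.9,            # shares the "llm" prefix; must be preserved
    "workflow.max_retries": 3,
}
print(nest_updates(updates))
# {'llm': {'temperature': 0.2, 'top_p': 0.9}, 'workflow': {'max_retries': 3}}
```

A naive implementation that assigned a fresh dict per dotted path would let `llm.top_p` clobber `llm.temperature`; `setdefault` is what keeps the shared `llm` subtree intact.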
`src/nat/profiler/parameter_optimization/update_helpers.py`
- `nest_updates(...)`: now deep-merges nested dicts so dotted paths with shared prefixes preserve siblings.
- `apply_suggestions(cfg, updates) -> BaseModel`: returns a new model with dotted-path updates applied (non-mutating).

User-facing Usage
1) Declare optimizable fields in your models (one-time)
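A stdlib-only mock of the declaration pattern — the real `OptimizableField`, `SearchSpace`, and `OptimizableMixin` are Pydantic-based, so this only mirrors their shape, and all field names and values are illustrative:

```python
def optimizable_field(default, search_space=None):
    """Mimic `OptimizableField`: bundle a default with optimization metadata."""
    return {
        "default": default,
        "json_schema_extra": {"optimizable": True, "search_space": search_space},
    }

class SomeAgentConfig:
    """Mimic a config model using OptimizableMixin-style gating."""
    # Allow-list: only these fields are discovered by the optimizer.
    optimizable_params = ["temperature", "model_name"]

    temperature = optimizable_field(0.7, {"low": 0.0, "high": 1.0, "log": False})
    model_name = optimizable_field("llama_3_405",
                                   {"values": ["llama_3_405", "llama_3_70"]})

meta = SomeAgentConfig.temperature["json_schema_extra"]
print(meta["optimizable"], meta["search_space"]["high"])  # True 1.0
```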
Notes:
- `SearchSpace.low` can be a sequence for categorical choices; `high=None` implies categorical.
- Prompt fields are marked with `is_prompt=True`. Numeric optimization ignores prompts; GA prompt optimization uses them.

2) Enable fields and configure the optimizer in your config file
Minimal example showing where to mark fields as optimizable and how to set optimizer behavior:
If you use GA prompt optimization, declare the supporting functions in the `functions` section so the CLI can find them:

3) Run the optimizer from the CLI
```shell
nat optimize --config_file path/to/config.yaml \
  --dataset path/to/dataset.jsonl \
  --result_json_path '$' \
  --endpoint http://localhost:8000/your-endpoint \
  --endpoint_timeout 300
```

The optimizer will execute numeric and/or prompt optimization based on your `optimizer.numeric.enabled` and `optimizer.prompt.enabled` flags.

Artifacts and Observability
Numeric search:
- `optimized_config.yml`
- `trials_dataframe_params.csv` (flattens user_attrs → `rep_scores`)
- `config_numeric_trial_{trial_id}.yml`
- `plots/` (Pareto fronts)

GA prompt optimization:
- `optimized_prompts_gen{n}.json` (per-generation checkpoints)
- `optimized_prompts.json` (final best prompts with purposes)
- `ga_history_prompts.csv` (per-individual metrics/scalar fitness over generations)

Performance Considerations
- Numeric search cost scales with `n_trials` and `reps_per_param_set`.
- GA evaluations can run concurrently via `ga_parallel_evaluations`. Population and generation sizes control runtime.
- Evaluation goes through `EvaluationRun`, so overall throughput depends on evaluator cost and any configured endpoint.

Documentation
- `docs/source/reference/optimizer_guide.md` (how to declare fields, configure optimizers, run optimization, interpret artifacts).

Test Coverage
- `tests/nat/data_models/test_optimizable.py`, `tests/nat/profiler/test_optimizable_utils.py`, `tests/nat/profiler/test_update_helpers.py`
- `tests/nat/profiler/test_parameter_optimizer.py`
- (`WorkflowBuilder`): `tests/nat/profiler/test_prompt_optimizer.py`

Checklist
- `EvaluationRun` for objective computation
- `nest_updates`

By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Documentation
Examples
Runtime Evaluators
Visualization
Tests
Chores