Add NAT Agent Hyperparameter Optimizer #650
Conversation
Introduced new modules and configurations for parameter optimization in the AIQ system. This includes support for hyperparameter optimization using Optuna, updated data models, search space management, and utility functions. Additional updates include CLI commands and configuration enhancements to enable numeric and prompt-based optimization functionality. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Introduce `reps_per_param_set` to allow multiple repetitions of optimization runs for improved metric stability. Updated evaluation logic to calculate averaged metric scores across repetitions, ensuring more reliable optimization outcomes. Additionally, refactored evaluation code for clarity and concurrency. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
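The averaging described above can be sketched roughly as follows; `_evaluate_once` is a hypothetical stand-in for a single evaluation run, not the actual NAT evaluation API:

```python
import asyncio
from statistics import mean


async def _evaluate_once(params: dict, rep: int) -> float:
    # Hypothetical single evaluation; the real evaluator runs a workflow
    # against the eval dataset and returns a metric score.
    return 0.5 + 0.01 * rep


async def averaged_score(params: dict, reps_per_param_set: int) -> float:
    # Run all repetitions concurrently, then average the metric scores
    # so each parameter set is judged on a more stable signal.
    scores = await asyncio.gather(*(_evaluate_once(params, r) for r in range(reps_per_param_set)))
    return mean(scores)


result = asyncio.run(averaged_score({}, 3))
```

The key point is that the optimizer sees one averaged value per parameter set rather than a single noisy run.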
Introduce `OptimizableMixin` to standardize handling of optimizable fields across models. Replaced manual introspection with Pydantic-based attributes and added field allow-lists. Updated LLM configurations to integrate `OptimizableMixin` and define searchable hyperparameters using `OptimizableField`. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Reorganized the parameter optimization codebase for better modularity and clarity. Extracted helper functions (`walk_optimizables`, `apply_suggestions`, `nest_updates`), and separated prompt optimization (`prompt_optimizer.py`) and numeric optimization (`parameter_optimizer.py`) into distinct modules. Updated `optimize_config` to invoke modularized logic, improving maintainability and separation of concerns. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Replaced relative imports with explicit absolute paths in optimizer runtime, parameter optimizer, and prompt optimizer modules. This improves code clarity, ensures compatibility across different execution contexts, and aligns with best practices. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
The import path for PromptOptimizerInputSchema was corrected to reflect its new location in the module. This change ensures proper functionality and resolves potential import errors. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Added `trial_idx` argument to `_single_eval` for better tracking of individual evaluations. Also implemented saving of trial configuration files with unique names to facilitate debugging and reproducibility. Adjusted function calls accordingly to pass trial indices. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Introduced a comprehensive guide for using the AIQ Optimizer, detailing configuration, usage, and output analysis. Added a new `ParetoVisualizer` module to support advanced Pareto front visualizations for multi-objective optimization, with 2D scatter plots, parallel coordinates, and pairwise metrics comparison. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Log a warning and skip optimization when no optimizable parameters are found in the configuration. This prevents unnecessary processing and ensures clearer debugging information. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Refactor parameter optimization utilities to the experimental module to better categorize development-stage features. Updated all relevant imports and documentation references accordingly. Minor adjustments made to file output logic for trial configurations. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
This refactor reorganizes and renames the `parameter_optimization` module to `optimizer` for consistency and simplicity. All related imports and references have been updated to reflect the new module name. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Applied the `@aiq_experimental` decorator to `optimize_config` to signal its experimental status. This ensures better visibility to users when using potentially unstable features. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Ensure `output_path` and `eval_metrics` are validated in `parameter_optimizer.py` to prevent misconfigurations. Refactor evaluation tasks into a dedicated async function for improved clarity and reuse across both optimizers. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
Simplifies optimizer handling by removing unused annotations, checks, and validators. Integrates prompt optimization into the ReAct agent workflow and evaluation configs, enabling tuning of parameters like `temperature` and `additional_instructions`. Enhances compatibility and modularity for prompt and numeric optimization workflows. Signed-off-by: dnandakumar-nv <168006707+dnandakumar-nv@users.noreply.github.com>
This reverts commit 8e26d72.
…pace"" This reverts commit 6fdcfa6.
This reverts commit 96f207a.
This reverts commit 741869f.
This reverts commit 8703867.
This reverts commit 70adfa7.
…ule" This reverts commit ef2413d.
Eliminated legacy validation and annotations related to OptimizerConfig along with unnecessary Optuna availability checks. Simplified imports and removed redundant code to enhance maintainability and reduce clutter.
Added validation to check that `output_path` and `eval_metrics` in `optimizer_config` are not None to prevent runtime errors. Refactored async evaluation logic to improve clarity and maintainability by encapsulating task creation in `_run_all_evals` functions for both parameter and prompt optimizers.
Moved trial config saving outside evaluation loop for efficiency. Extended evaluation config with prompt optimization and numeric optimization settings, while removing outdated profiler configurations for cleaner workflows.
Replaced "prompt_evaluation_function" with "prompt_optimization_function" to align with new functionality. Introduced a `CustomTrajectoryOutputParser` to parse ReAct agent trajectories and normalize scores. Updated the Pareto visualizer for improved formatting and added optimizable fields to configuration files for parameter tuning.
Updated feedback storage to associate trajectory feedback with all prompt parameters used in each trial. This change ensures more accurate tracking and distribution of feedback per parameter, improving optimization reliability.
Persist raw per-repetition evaluation scores in the `rep_scores` user attribute for trials. Update the export logic to include these scores in the `trials_dataframe` CSVs, ensuring better traceability and convenience by flattening `rep_scores` into its own column. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
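Assuming a pandas export in the usual Optuna `trials_dataframe` shape (the column names here are illustrative, not the project's actual schema), flattening `rep_scores` into its own CSV-friendly column could look like:

```python
import pandas as pd

# Illustrative trials export: each trial carries its raw per-repetition
# scores in a user attribute; we flatten the list into a single string column.
df = pd.DataFrame({
    "number": [0, 1],
    "value": [0.51, 0.48],
    "user_attrs_rep_scores": [[0.50, 0.51, 0.52], [0.47, 0.48, 0.49]],
})
df["rep_scores"] = df["user_attrs_rep_scores"].apply(
    lambda scores: ";".join(f"{s:.2f}" for s in scores))
df = df.drop(columns=["user_attrs_rep_scores"])
```

This keeps the per-repetition detail traceable in the CSV without nesting lists inside cells.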
# Conflicts:
#   src/nat/agent/react_agent/register.py
#   src/nat/cli/commands/optimize.py
#   src/nat/cli/type_registry.py
#   src/nat/data_models/config.py
#   src/nat/data_models/optimizable.py
#   src/nat/data_models/optimizer.py
#   src/nat/eval/trajectory_evaluator/evaluate.py
#   src/nat/eval/trajectory_evaluator/output_parser.py
#   src/nat/llm/aws_bedrock_llm.py
#   src/nat/llm/nim_llm.py
#   src/nat/llm/openai_llm.py
#   uv.lock
Replaced Optuna-based prompt optimization with a genetic algorithm for greater flexibility in multi-objective scenarios. Added support for advanced genetic operations, population diversity management, and scalable parallel evaluations. Updated related configurations and dependencies accordingly.
Actionable comments posted: 0
🧹 Nitpick comments (16)
tests/nat/profiler/test_parameter_selection_extra.py (3)
22-27: Add type hints and remove unused noqa (RUF100). Type hints are required; drop the noqa and type the helper. Also import `Study` and `Sequence`.
Apply this diff:
```diff
+from typing import Sequence
 import optuna
-from optuna.study import StudyDirection
+from optuna.study import Study, StudyDirection
 from nat.profiler.parameter_optimization.parameter_selection import pick_trial

-def _make_study_with_trials(values_list):  # noqa: ANN001
+def _make_study_with_trials(values_list: Sequence[Sequence[float]]) -> Study:
     study = optuna.create_study(directions=[StudyDirection.MINIMIZE, StudyDirection.MINIMIZE])
     for vals in values_list:
         t = optuna.trial.create_trial(values=list(vals), params={}, distributions={})
         study.add_trial(t)
     return study
```

Also applies to: 16-20
48-52: Replace `assert False` with `pytest.raises` (B011) and import pytest.

`assert False` can be stripped by -O; use `pytest.raises` for negative tests. Apply this diff:

```diff
+import pytest
@@
-    try:
-        pick_trial(study, mode="sum", weights=[1.0])
-        assert False, "Expected ValueError for weights length"
-    except ValueError:
-        pass
+    with pytest.raises(ValueError):
+        pick_trial(study, mode="sum", weights=[1.0])
@@
-    try:
-        pick_trial(study, mode="unknown_mode")
-        assert False, "Expected ValueError for unknown mode"
-    except ValueError:
-        pass
+    with pytest.raises(ValueError):
+        pick_trial(study, mode="unknown_mode")
@@
-    try:
-        pick_trial(study, mode="sum")
-        assert False, "Expected ValueError for empty Pareto front"
-    except ValueError:
-        pass
+    with pytest.raises(ValueError):
+        pick_trial(study, mode="sum")
```

Also applies to: 59-63, 68-72, 16-20
73-73: Add quick coverage for `harmonic` and `hypervolume` modes. Low-effort tests to guard behavior and future refactors.
Apply this diff:
```diff
+
+def test_pick_trial_harmonic_selects_center_point():
+    vals = [(0.1, 0.9), (0.2, 0.2), (0.9, 0.1)]
+    study = _make_study_with_trials(vals)
+    trial = pick_trial(study, mode="harmonic")
+    assert tuple(trial.values) == (0.2, 0.2)
+
+
+def test_pick_trial_hypervolume_single_point_returns_that_point():
+    vals = [(0.4, 0.6)]
+    study = _make_study_with_trials(vals)
+    trial = pick_trial(study, mode="hypervolume")
+    assert tuple(trial.values) == (0.4, 0.6)
```

tests/nat/eval/utils/test_tqdm_position_registry_extra.py (2)
25-27: Comment doesn't match assertion; either assert reuse or soften the claim. You state "claim the same position again," but don't assert it. Prefer softening the comment to avoid coupling to allocation policy.

```diff
-    # after release, we should be able to claim the same position again quickly
+    # after release, we should be able to claim a position again
```
30-36: Avoid relying on private internals in tests. Directly touching `_max_positions` and `_positions` makes the test brittle. Consider adding a small public test helper (e.g., `TqdmPositionRegistry.reset(max_positions=...)`) and using that here.
Would you like me to draft a minimal reset API and update the tests accordingly?
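A minimal sketch of such a reset seam (the `claim`/`release` names and the default capacity are assumptions for illustration, not the registry's actual implementation):

```python
class TqdmPositionRegistry:
    """Sketch of a tqdm position registry with a public reset hook for tests."""

    _max_positions: int = 10
    _positions: set[int] = set()

    @classmethod
    def claim(cls) -> int:
        # Hand out the lowest free position
        for i in range(cls._max_positions):
            if i not in cls._positions:
                cls._positions.add(i)
                return i
        raise RuntimeError("no free tqdm positions")

    @classmethod
    def release(cls, pos: int) -> None:
        cls._positions.discard(pos)

    @classmethod
    def reset(cls, max_positions: int = 10) -> None:
        # Public test seam: restores a known state without poking privates
        cls._max_positions = max_positions
        cls._positions.clear()


TqdmPositionRegistry.reset(max_positions=2)
first = TqdmPositionRegistry.claim()
second = TqdmPositionRegistry.claim()
TqdmPositionRegistry.release(first)
```

Tests would then call `reset()` in a fixture instead of mutating `_positions` directly.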
tests/nat/profiler/test_pareto_visualizer_extra.py (4)
24-29: Add return type hints for helper; keep tests pyright-clean. Annotate the factory with a precise return type.

```diff
-def _make_two_obj_study():
+def _make_two_obj_study() -> optuna.Study:
```
16-22: Close figures to avoid resource leaks in CI. Import pyplot so tests can close figures after assertions.

```diff
 from pathlib import Path
 import optuna
 import pandas as pd
+import matplotlib.pyplot as plt
 from nat.profiler.parameter_optimization.pareto_visualizer import create_pareto_visualization
```
32-45: Strengthen assertions and verify all artifacts for 2-metric case. Also close figures to keep the test runner lean.

```diff
     )
     # Should include 2D scatter and other plots when 2 metrics
     assert "2d_scatter" in figs
     assert (tmp_path / "pareto_front_2d.png").exists()
+    assert "parallel_coordinates" in figs
+    assert "pairwise_matrix" in figs
+    assert (tmp_path / "pareto_parallel_coordinates.png").exists()
+    assert (tmp_path / "pareto_pairwise_matrix.png").exists()
+    for fig in figs.values():
+        plt.close(fig)
```
47-60: Broaden checks for CSV path source and close figures. Ensure all expected plots are returned even when not saving to disk; close figures afterward.

```diff
     )
-    assert isinstance(figs, dict)
+    assert isinstance(figs, dict)
+    assert "2d_scatter" in figs
+    assert "parallel_coordinates" in figs
+    assert "pairwise_matrix" in figs
+    for fig in figs.values():
+        plt.close(fig)
```

tests/nat/utils/test_url_utils.py (1)
16-21: Add edge-case coverage (slashes, empty parts).
Increase resilience by testing normalization and empty segments. Apply this patch to extend tests:

```diff
@@
-from nat.utils.url_utils import url_join
+from nat.utils.url_utils import url_join
+import pytest
@@
 def test_url_join_basic():
     result = url_join("http://example.com", "api", "v1")
     assert result == "http://example.com/api/v1"
+
+
+@pytest.mark.parametrize(
+    ("parts", "expected"),
+    [
+        (("http://example.com/", "api", "/v1/"), "http://example.com/api/v1/"),
+        (("http://example.com", "", "api", "", "v1"), "http://example.com/api/v1"),
+        (("http://example.com/api", "v1"), "http://example.com/api/v1"),
+        (("http://example.com/",), "http://example.com/"),
+    ],
+)
+def test_url_join_normalization(parts, expected):
+    assert url_join(*parts) == expected
```

tests/nat/utils/test_string_utils.py (2)
18-26: Remove unused import and dead code (_M).
These trigger lint noise without adding value. Apply:

```diff
-from pydantic import BaseModel
-
-from nat.utils.string_utils import convert_to_str
-
-
-class _M(BaseModel):
-    a: int
-    b: str | None = None
+from nat.utils.string_utils import convert_to_str
```
31-32: Make dict assertion order-independent and stricter.
Ensure both entries are present, not just a prefix.

```diff
-    s = convert_to_str({"k": 1, "z": 2})
-    assert (s.startswith("k: 1") or s.startswith("z: 2"))
+    s = convert_to_str({"k": 1, "z": 2})
+    assert "k: 1" in s and "z: 2" in s
```

tests/nat/utils/test_optional_imports.py (1)
45-56: Avoid leaking global tracer state across tests.
Wrap provider mutation in try/finally to restore original.

```diff
-def test_dummy_tracer_stack():
-    tracer = DummyTracerProvider.get_tracer()
-    span = tracer.start_span("op")
-    assert isinstance(span, DummySpan)
-    span.set_attribute("k", "v")
-    span.end()
-    DummyBatchSpanProcessor().shutdown()
-    DummySpanExporter.export()
-    DummySpanExporter.shutdown()
-    assert DummyTrace.get_tracer_provider() is not None
-    DummyTrace.set_tracer_provider(None)
-    assert DummyTrace.get_tracer("name") is not None
+def test_dummy_tracer_stack():
+    old_provider = DummyTrace.get_tracer_provider()
+    try:
+        tracer = DummyTracerProvider.get_tracer()
+        span = tracer.start_span("op")
+        assert isinstance(span, DummySpan)
+        span.set_attribute("k", "v")
+        span.end()
+        DummyBatchSpanProcessor().shutdown()
+        DummySpanExporter.export()
+        DummySpanExporter.shutdown()
+        assert DummyTrace.get_tracer_provider() is not None
+        DummyTrace.set_tracer_provider(None)
+        assert DummyTrace.get_tracer("name") is not None
+    finally:
+        DummyTrace.set_tracer_provider(old_provider)
```

tests/nat/profiler/test_optimizer_runtime_extra.py (3)
69-72: Remove unused noqa directives (RUF100).
Not needed; ruff flags them as unused.

```diff
-    def _fake_optimize_parameters(**kwargs):  # noqa: ANN001, ARG001
+    def _fake_optimize_parameters(**kwargs):
         del kwargs
         calls["numeric"] += 1
         return cfg
```
74-77: Remove unused noqa directives (RUF100).
Same here.

```diff
-    async def _fake_optimize_prompts(**kwargs):  # noqa: ANN001, ARG001
+    async def _fake_optimize_prompts(**kwargs):
         del kwargs
         calls["prompt"] += 1
```
31-38: Optional: use default_factory for nested models to avoid shared defaults.
Prevents accidental instance sharing if fields are mutated later.

```diff
-from pydantic import BaseModel
+from pydantic import BaseModel, Field
@@
 class _DummyOptimizer(BaseModel):
-    numeric: _DummyInner = _DummyInner()
-    prompt: _DummyPrompt = _DummyPrompt()
+    numeric: _DummyInner = Field(default_factory=_DummyInner)
+    prompt: _DummyPrompt = Field(default_factory=_DummyPrompt)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
- ci/vale/styles/config/vocabularies/nat/accept.txt (1 hunks)
- docs/source/workflows/observe/observe-workflow-with-galileo.md (1 hunks)
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py (1 hunks)
- tests/nat/profiler/test_optimizer_runtime_extra.py (1 hunks)
- tests/nat/profiler/test_parameter_selection_extra.py (1 hunks)
- tests/nat/profiler/test_pareto_visualizer_extra.py (1 hunks)
- tests/nat/runtime/test_user_metadata.py (1 hunks)
- tests/nat/utils/test_optional_imports.py (1 hunks)
- tests/nat/utils/test_string_utils.py (1 hunks)
- tests/nat/utils/test_url_utils.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/source/workflows/observe/observe-workflow-with-galileo.md
🚧 Files skipped from review as they are similar to previous changes (1)
- ci/vale/styles/config/vocabularies/nat/accept.txt
🧰 Additional context used
📓 Path-based instructions (7)
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Unit tests must live under tests/ and use configured markers (e2e, integration, etc.)
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
⚙️ CodeRabbit configuration file
`tests/**/*.py`:
- Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
- Test functions should be named using the `test_` prefix, using snake_case.
- Any frequently repeated code should be extracted into pytest fixtures.
- Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the `fixture_` prefix, using snake_case. Example:

```python
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass
```
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bare `raise` and log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/tests/**/*.py: Test functions must use the test_ prefix and snake_case
Extract repeated test code into pytest fixtures; fixtures should set name=... in @pytest.fixture and functions named with fixture_ prefix
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration
Use pytest with pytest-asyncio for async code; mock external services with pytest_httpserver or unittest.mock
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*.{py,md}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
**/*
⚙️ CodeRabbit configuration file
`**/*`: Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs, or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file; words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.
Files:
- tests/nat/profiler/test_pareto_visualizer_extra.py
- tests/nat/runtime/test_user_metadata.py
- tests/nat/profiler/test_parameter_selection_extra.py
- tests/nat/utils/test_optional_imports.py
- tests/nat/profiler/test_optimizer_runtime_extra.py
- tests/nat/eval/utils/test_tqdm_position_registry_extra.py
- tests/nat/utils/test_url_utils.py
- tests/nat/utils/test_string_utils.py
🧬 Code graph analysis (3)
tests/nat/profiler/test_pareto_visualizer_extra.py (1)
src/nat/profiler/parameter_optimization/pareto_visualizer.py (1)
`create_pareto_visualization` (317-380)
tests/nat/profiler/test_parameter_selection_extra.py (1)
src/nat/profiler/parameter_optimization/parameter_selection.py (1)
`pick_trial` (40-108)
tests/nat/profiler/test_optimizer_runtime_extra.py (2)
src/nat/data_models/optimizer.py (1)
`OptimizerRunConfig` (138-149)
src/nat/profiler/parameter_optimization/optimizer_runtime.py (1)
`optimize_config` (31-67)
🪛 Ruff (0.12.2)
tests/nat/profiler/test_parameter_selection_extra.py
22-22: Unused noqa directive (non-enabled: ANN001)
Remove unused noqa directive
(RUF100)
50-50: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
61-61: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
70-70: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
tests/nat/profiler/test_optimizer_runtime_extra.py
69-69: Unused noqa directive (unused: ARG001; non-enabled: ANN001)
Remove unused noqa directive
(RUF100)
74-74: Unused noqa directive (unused: ARG001; non-enabled: ANN001)
Remove unused noqa directive
(RUF100)
🔇 Additional comments (10)
tests/nat/profiler/test_parameter_selection_extra.py (2)
1-15: License header: good to go. SPDX and year look correct.
30-41: Balanced-point assertions for sum/chebyshev look correct. Matches the scalarizations after normalization.
tests/nat/eval/utils/test_tqdm_position_registry_extra.py (1)
1-15: License header: good to go. SPDX header and year look correct.
tests/nat/profiler/test_pareto_visualizer_extra.py (1)
1-15: License header: good to go. SPDX header and year look correct.
tests/nat/utils/test_url_utils.py (1)
19-21: LGTM on basic happy-path test.
Covers a simple join correctly.

tests/nat/runtime/test_user_metadata.py (1)
19-30: LGTM: defaults contract validated.
Solid sanity check for None defaults.

tests/nat/utils/test_string_utils.py (1)
35-44: LGTM on object str handling.
Good coverage of custom stringification.

tests/nat/utils/test_optional_imports.py (1)
29-35: LGTM on optional import success/failure checks.
Clear signal on expected behaviors.

tests/nat/profiler/test_optimizer_runtime_extra.py (2)
40-53: LGTM: no-space branch behavior verified.
Good use of monkeypatch and direct BaseModel pass-through.
55-85: LGTM: orchestration path exercised.
Asserts both numeric and prompt hooks invoked exactly once.
...evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
…nat_email_phishing_analyzer/register.py Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Introduced a `model_validator` to ensure either `values` or both `low` and `high` are provided for search space definitions. Updated error messages for clarity and adjusted related documentation to reflect the changes in parameter requirements. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Actionable comments posted: 1
♻️ Duplicate comments (7)
docs/source/reference/optimizer.md (4)
273-289: Formatting issues: incorrect key name and backticks. The configuration example uses a dot prefix and inconsistent backtick formatting.

```diff
-This can be enabled using the .optimizable_params` field of your configuration file.
+This can be enabled using the `optimizable_params` field of your configuration file.
 llms:
   nim_llm:
     _type: nim
     model_name: meta/llama-3.1-70b-instruct
     temperature: 0.0
-    optimizable_params:
+    optimizable_params:
       - temperature
       - top_p
       - max_tokens
-**NOTE:** Ensure your configuration object inherits from `OptimizableMixin` to enable the .optimizable_params` field.
+**NOTE:** Ensure your configuration object inherits from `OptimizableMixin` to enable the `optimizable_params` field.
```
1-409: Add this page to the documentation TOC. This reference page needs to be linked from docs/source/index.md.

```bash
#!/bin/bash
# Check if optimizer.md is referenced in the documentation index
if [ -f "docs/source/index.md" ]; then
  echo "Checking for optimizer reference in index.md..."
  grep -n "optimizer" docs/source/index.md || echo "No reference to optimizer found in index.md"
fi
```
328-328: SearchSpace mismatch: nim.max_tokens high boundary. The code defines high=2176 but the docs table shows high=2048.
The `max_tokens` search space has inconsistent values:

- Code (src/nat/llm/nim_llm.py): `SearchSpace(high=2176, low=128, step=512)`
- Docs table: shows `high=2048`

Align both to the same value based on the intended maximum.
219-219: Remove invalid class declaration syntax. Python doesn't support `name=` in class inheritance.

```diff
-class SomeImageAgentConfig(FunctionBaseConfig, OptimizableMixin, name="some_image_agent_config"):
+class SomeImageAgentConfig(FunctionBaseConfig, OptimizableMixin):
```

src/nat/data_models/optimizable.py (2)
31-53: Add docstrings to public classes and methods. Project guidelines require Google-style docstrings for all public APIs.

```diff
 class SearchSpace(BaseModel, Generic[T]):
+    """Declarative search space for an optimizable field.
+
+    Attributes:
+        values: Discrete choices for categorical parameters.
+        low: Lower bound for numeric parameters.
+        high: Upper bound for numeric parameters.
+        log: Whether to use logarithmic scale for numeric parameters.
+        step: Step size for discrete numeric parameters.
+        is_prompt: Whether this is a prompt to be optimized.
+        prompt: Base prompt text to optimize.
+        prompt_purpose: Description of the prompt's purpose for the optimizer.
+    """
     values: Sequence[T] | None = None
 ...
     def suggest(self, trial: Any, name: str):
+        """Generate a parameter suggestion using an Optuna-like trial.
+
+        Args:
+            trial: Optuna trial object for parameter suggestion.
+            name: Name of the parameter to suggest.
+
+        Returns:
+            Suggested parameter value.
+
+        Raises:
+            ValueError: If prompt optimization is attempted or invalid categorical space.
+        """
 ...
 def OptimizableField(
     ...
 ):
+    """Create a Pydantic field with optimization metadata.
+
+    Args:
+        default: Default value for the field.
+        space: Optional search space configuration.
+        merge_conflict: How to handle conflicts in json_schema_extra ("overwrite", "keep", "error").
+        **fld_kw: Additional keyword arguments for pydantic.Field.
+
+    Returns:
+        Configured pydantic Field with optimization metadata.
+
+    Raises:
+        TypeError: If json_schema_extra is not a dict.
+        ValueError: If merge conflicts or prompt requirements are violated.
+    """
 ...
 class OptimizableMixin(BaseModel):
+    """Mixin to enable optimization support for configuration models.
+
+    Attributes:
+        optimizable_params: List of field names that can be optimized.
+        search_space: Optional search space overrides for optimizable parameters.
+    """
```

Also applies to: 55-94, 97-102
21-21: Remove Optuna type dependency and harden `suggest` method. The hard dependency on `optuna.Trial` should be avoided in type hints. Also prevent bool/int confusion and validate categorical choices.

```diff
-from optuna import Trial
 from pydantic import BaseModel
+from pydantic import ConfigDict
 from pydantic import Field
 ...
-    def suggest(self, trial: Trial, name: str):
+    def suggest(self, trial: Any, name: str):
         if self.is_prompt:
-            raise ValueError("Prompt optimization not currently supported using Optuna."
-                             " Use the genetic algorithm implementation instead.")
+            raise ValueError("Prompt optimization not supported via Optuna; use GA.")
         if self.values is not None:
+            if isinstance(self.values, (str, bytes)):
+                raise ValueError("Categorical space requires a non-string sequence.")
             return trial.suggest_categorical(name, self.values)
-        if isinstance(self.low, int):
-            return trial.suggest_int(name, self.low, self.high, log=self.log, step=self.step)
+        if type(self.low) is int:  # avoid bool being treated as int
+            step_int = int(self.step) if self.step is not None else None
+            return trial.suggest_int(name, int(self.low), int(self.high), log=self.log, step=step_int)
         return trial.suggest_float(name, float(self.low), float(self.high), log=self.log, step=self.step)
```

Also applies to: 44-52
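The `type(self.low) is int` guard matters because `bool` is a subclass of `int` in Python, so an `isinstance` check would route a boolean bound to `suggest_int`. A standalone illustration, independent of the toolkit:

```python
def is_plain_int(value: object) -> bool:
    """Return True only for genuine ints, rejecting bools."""
    return type(value) is int

# bool passes isinstance() because bool subclasses int...
assert isinstance(True, int)
# ...but an exact type check tells them apart.
assert not is_plain_int(True)
assert is_plain_int(3)
print("bool/int distinction holds")
```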
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py (1)
36-47: Enable optimizable fields with an allow-list. The `OptimizableMixin` default empty `optimizable_params` causes the `llm` and `prompt` fields to be ignored by the optimizer. Add an explicit allow-list.

```diff
 class EmailPhishingAnalyzerConfig(FunctionBaseConfig, OptimizableMixin, name="email_phishing_analyzer"):
     _type: str = "email_phishing_analyzer"
+    # Allow-list for optimizer discovery
+    optimizable_params: list[str] = ["llm", "prompt"]
     llm: LLMRef = OptimizableField(description="The LLM to use for email phishing analysis.",
                                    default="llama_3_405",
                                    space=SearchSpace(values=["llama_3_405", "llama_3_70"]))
```
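The gating effect of an empty allow-list can be shown with a small stand-alone sketch, using plain dicts in place of Pydantic field metadata (all names illustrative, not the toolkit's code):

```python
import warnings

def discover_optimizables(fields: dict, allow_list: list[str]) -> dict:
    """Return only fields that are both marked optimizable and allow-listed."""
    marked = {name: meta for name, meta in fields.items() if meta.get("optimizable")}
    if marked and not allow_list:
        warnings.warn("optimizable fields exist but optimizable_params is empty")
    return {name: meta for name, meta in marked.items() if name in allow_list}

fields = {
    "llm": {"optimizable": True, "search_space": {"values": ["a", "b"]}},
    "prompt": {"optimizable": True, "search_space": {"is_prompt": True}},
    "_type": {},
}
print(discover_optimizables(fields, []))                          # {} -- everything ignored
print(sorted(discover_optimizables(fields, ["llm", "prompt"])))   # ['llm', 'prompt']
```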
🧹 Nitpick comments (5)
docs/source/reference/optimizer.md (2)
38-38: Add language specifier to code block. The diagram should have a language identifier for proper rendering.

````diff
-```
+```text
 ┌─────────────────┐
 │ Start │
````
264-264: Fix incorrect list indentation. Unordered list item has 4 spaces instead of 2.

```diff
-    - `temperature` shows how to mark a field as optimizable without specifying a search space in code; the search space must then be provided in the workflow configuration.
+  - `temperature` shows how to mark a field as optimizable without specifying a search space in code; the search space must then be provided in the workflow configuration.
```

examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py (2)
66-71: Use `logger.exception` for proper stack trace logging. When catching an exception without re-raising, use `logger.exception()` to capture the full stack trace.

```diff
 try:
     # Get response from LLM
     response = await llm.apredict(config.prompt.replace("{body}", text))
 except Exception as e:
-    logger.error(f"Error during LLM prediction: {e}")
+    logger.exception("Error during LLM prediction")
     return f"Error: LLM prediction failed {e}"
```
68-68: String replacement works but consider `format()` for consistency. The change from `.format()` to `.replace()` is functional but less idiomatic for template substitution.

```diff
-response = await llm.apredict(config.prompt.replace("{body}", text))
+response = await llm.apredict(config.prompt.format(body=text))
```

This maintains consistency with Python's string formatting conventions and would handle edge cases better (e.g., if the text contains `{body}` literally).

src/nat/data_models/optimizable.py (1)
46-47: Consider shorter error messages to satisfy linter. Long error messages trigger TRY003. While these are informative, consider extracting them to constants or custom exceptions.

```diff
+_ERR_PROMPT_OPTUNA = "Prompt optimization not supported via Optuna; use GA."
+_ERR_PROMPT_BASE = "Prompt-optimized fields require a base prompt"
+_ERR_RESERVED_KEYS = "json_schema_extra contains reserved key(s)"
+
 ...
     if self.is_prompt:
-        raise ValueError("Prompt optimization not currently supported using Optuna."
-                         " Use the genetic algorithm implementation instead.")
+        raise ValueError(_ERR_PROMPT_OPTUNA)
 ...
     if default is None:
-        raise ValueError("Prompt-optimized fields require a base prompt: provide a non-None field default "
-                         "or set space.prompt.")
+        raise ValueError(f"{_ERR_PROMPT_BASE}: provide a non-None field default or set space.prompt.")
 ...
-raise ValueError("`json_schema_extra` already contains reserved key(s): "
-                 f"{', '.join(intersect)}")
+raise ValueError(f"{_ERR_RESERVED_KEYS}: {', '.join(intersect)}")
```

Also applies to: 71-72, 85-86
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- `docs/source/reference/optimizer.md` (1 hunks)
- `examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py` (2 hunks)
- `src/nat/data_models/optimizable.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (10)
src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
src/**/*.py: All importable Python code must live under src/
All public APIs in src/ require Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise with a period; surround code entities with backticks
Files:
src/nat/data_models/optimizable.py
src/nat/**/*
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Core functionality under src/nat should prioritize backward compatibility when changed
Files:
src/nat/data_models/optimizable.py
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/data_models/optimizable.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 8/20 style; format with yapf (column_limit=120) and use 4-space indentation; end files with a single newline
Run ruff (ruff check --fix) per pyproject.toml; fix warnings unless explicitly ignored; ruff is linter-only
Use snake_case for functions/variables, PascalCase for classes, and UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: preserve stack traces and avoid duplicate logging
When re-raising exceptions, use bare `raise` and log with logger.error(), not logger.exception()
When catching and not re-raising, log with logger.exception() to capture stack trace
Validate and sanitize all user input; prefer httpx with SSL verification and follow OWASP Top‑10
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile/mprof; cache with functools.lru_cache or external cache; leverage NumPy vectorization when beneficial
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
Files:
`src/nat/data_models/optimizable.py`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,sh,md,yml,yaml,toml,ini,json,ipynb,txt,rst}: Every file must start with the standard SPDX Apache-2.0 header; keep copyright years up‑to‑date
All source files must include the SPDX Apache‑2.0 header; do not bypass CI header checks
Files:
`src/nat/data_models/optimizable.py`
`docs/source/reference/optimizer.md`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
**/*.{py,md}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Never hard‑code version numbers in code or docs; versions are derived by setuptools‑scm
Files:
`src/nat/data_models/optimizable.py`
`docs/source/reference/optimizer.md`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
`src/nat/data_models/optimizable.py`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file. Words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.
Files:
`src/nat/data_models/optimizable.py`
`docs/source/reference/optimizer.md`
`examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py`
docs/source/**/*.md
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
docs/source/**/*.md: Use the official naming: first use “NVIDIA NeMo Agent toolkit”; subsequent uses “NeMo Agent toolkit”; never use deprecated names in documentation
Documentation sources must be Markdown under docs/source; keep docs in sync and fix Sphinx errors/broken links
Documentation must be clear, comprehensive, free of TODO/FIXME/placeholders/offensive/outdated terms; fix spelling; adhere to Vale vocab allow/reject lists
Files:
docs/source/reference/optimizer.md
docs/source/**/*
⚙️ CodeRabbit configuration file
This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the
`docs/source/_static` directory.
Files:
docs/source/reference/optimizer.md
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or a README.ipynb file.
- If an example contains Python code, it should be placed in a subdirectory named `src/` and should contain a `pyproject.toml` file. Optionally, it might also contain scripts in a `scripts/` directory.
- If an example contains YAML files, they should be placed in a subdirectory named `configs/`.
- If an example contains sample data files, they should be placed in a subdirectory named `data/`, and should be checked into git-lfs.
Files:
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py
🧠 Learnings (2)
📚 Learning: 2025-08-28T23:22:41.742Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-08-28T23:22:41.742Z
Learning: Applies to docs/source/**/*.md : Use the official naming: first use “NVIDIA NeMo Agent toolkit”; subsequent uses “NeMo Agent toolkit”; never use deprecated names in documentation
Applied to files:
docs/source/reference/optimizer.md
📚 Learning: 2025-08-28T23:22:41.742Z
Learnt from: CR
PR: NVIDIA/NeMo-Agent-Toolkit#0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2025-08-28T23:22:41.742Z
Learning: Applies to **/README.{md,ipynb} : Each documentation README must follow the NeMo Agent toolkit naming rules and must not use deprecated names
Applied to files:
docs/source/reference/optimizer.md
🧬 Code graph analysis (2)
src/nat/data_models/optimizable.py (1)
tests/nat/profiler/test_parameter_optimizer.py (3)
`suggest_categorical` (36-37), `suggest_int` (39-47), `suggest_float` (49-57)
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py (1)
src/nat/data_models/optimizable.py (3)
`OptimizableField` (55-94), `OptimizableMixin` (97-102), `SearchSpace` (31-52)
🪛 Ruff (0.12.2)
src/nat/data_models/optimizable.py
29-29: Comment contains ambiguous ‑ (NON-BREAKING HYPHEN). Did you mean - (HYPHEN-MINUS)?
(RUF003)
41-41: Undefined name ConfigDict
(F821)
46-47: Avoid specifying long messages outside the exception class
(TRY003)
62-62: Comment contains ambiguous ‑ (NON-BREAKING HYPHEN). Did you mean - (HYPHEN-MINUS)?
(RUF003)
65-65: Avoid specifying long messages outside the exception class
(TRY003)
71-72: Avoid specifying long messages outside the exception class
(TRY003)
85-86: Avoid specifying long messages outside the exception class
(TRY003)
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py
69-69: Do not catch blind exception: Exception
(BLE001)
70-70: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🪛 markdownlint-cli2 (0.17.2)
docs/source/reference/optimizer.md
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
264-264: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
🔇 Additional comments (4)
docs/source/reference/optimizer.md (4)
17-17: LGTM! Official product naming correctly applied. The first mention properly uses "NVIDIA NeMo Agent toolkit" with correct lowercase "toolkit".
163-167: Config flags correctly documented. The documentation accurately describes the `numeric.enabled` and `prompt.enabled` fields with proper defaults.
402-406: Output files correctly documented. The output files now accurately match the implementation: `optimized_config.yml`, `trials_dataframe_params.csv`, and plots under the `plots/` directory.
330-330: Well-written guidance on using defaults. Clear explanation of how to leverage default optimizable parameters and override them when needed.
willkill07
left a comment
One last round of nits, then LGTM
Replaced the ASCII-style flowchart in the optimizer docs with a referenced image for improved clarity and readability. Added the corresponding diagram file to the `_static` directory. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Dhruv Nandakumar <168006707+dnandakumar-nv@users.noreply.github.com>
/ok to test dd0c78f
Replaced `low` with `values` in the `arch` SearchSpace initialization for consistency with parameter usage. This ensures the code aligns with the expected `SearchSpace` argument naming conventions. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
/ok to test aacbb27
/merge
Description
End-to-end Parameter Optimization for NAT: Optimizable Fields, Numeric (Optuna) Search, GA-based Prompt Optimization, and Evaluation Integration
Summary
This PR introduces a complete optimization subsystem for the NeMo Agent toolkit (NAT) that can tune both numeric/enumerated hyperparameters and prompts. It provides:
Scope
Key Components
Optimizable fields
`src/nat/data_models/optimizable.py`
- `SearchSpace[T]`: distribution/range metadata for int/float/categorical or prompt fields.
- `OptimizableField(...)`: drop-in replacement for `pydantic.Field(...)` that attaches `json_schema_extra = {"optimizable": True, "search_space": SearchSpace}`, with merge policies for any existing `json_schema_extra` (overwrite/keep/error).
- `OptimizableMixin`: base class adding `optimizable_params: list[str]` to gate discovery.

`src/nat/profiler/parameter_optimization/optimizable_utils.py`
- `walk_optimizables(model) -> dict[flattened.path, SearchSpace]`
- Honors `optimizable_params` and emits a warning if optimizable fields exist but no allowlist is set.

Parameter optimization (numeric/enumerated)
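The weighted harmonic/sum/chebyshev combination used to collapse multiple metric scores into one scalar can be sketched with stdlib math. This is my own illustrative version, not the toolkit's implementation, and it assumes all scores are already oriented so higher is better:

```python
def combine(scores: list[float], weights: list[float], mode: str = "sum") -> float:
    """Collapse per-metric scores into one scalar for trial ranking."""
    total_w = sum(weights)
    if mode == "sum":                       # weighted arithmetic mean
        return sum(w * s for w, s in zip(weights, scores)) / total_w
    if mode == "harmonic":                  # weighted harmonic mean
        return total_w / sum(w / s for w, s in zip(weights, scores))
    if mode == "chebyshev":                 # worst weighted component
        return min(w * s for w, s in zip(weights, scores))
    raise ValueError(f"unknown mode: {mode}")

scores, weights = [0.8, 0.4], [1.0, 1.0]
print(combine(scores, weights, "sum"))        # 0.6
print(combine(scores, weights, "harmonic"))   # 2 / 3.75 ~= 0.533
print(combine(scores, weights, "chebyshev"))  # 0.4
```

The harmonic mean penalizes a single poor metric more than the arithmetic mean, while the chebyshev variant ranks a trial by its weakest metric.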
`src/nat/profiler/parameter_optimization/parameter_optimizer.py`
- `optimize_parameters(...) -> Config`
- Skips fields marked `SearchSpace.is_prompt`; uses Optuna to search numeric/enum params.
- Respects `OptimizerMetric.direction`, with the final trial picked via weighted harmonic/sum/chebyshev.
- Stores `rep_scores` in trial user attributes.
- Writes `config_numeric_trial_{trial_id}.yml`, `optimized_config.yml`, `rep_scores` in `trials_dataframe_params.csv`, and `plots/`.

Prompt optimization (genetic algorithm)
`src/nat/profiler/parameter_optimization/prompt_optimizer.py`
- `optimize_prompts(...) -> None`
- Collects `SearchSpace.is_prompt` fields with purposes, then evolves a population of prompt sets.
- Uses `WorkflowBuilder` and registered functions to:
  - seed the population (`optimizer.prompt.prompt_population_init_function`)
  - recombine prompts (`optimizer.prompt.prompt_recombination_function`)
- Writes `optimized_prompts_gen{n}.json`, `optimized_prompts.json`, and `ga_history_prompts.csv`.

Optimizer configuration models
`src/nat/data_models/optimizer.py`
- `OptimizerMetric`: `evaluator_name`, `direction` ("maximize"|"minimize"), `weight`.
- `OptimizerConfig`: `output_path`, `eval_metrics`, `reps_per_param_set`, `multi_objective_combination_mode`, and nested `numeric`/`prompt` configs.
- `OptimizerRunConfig`: `config_file` (or model), dataset, endpoint, timeout, overrides.

Update helpers
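The deep-merge behavior called out for `nest_updates` — dotted paths with shared prefixes must preserve each other's siblings — can be sketched in plain Python (an illustrative stand-in, not the toolkit source):

```python
def nest_updates(flat: dict) -> dict:
    """Turn {"a.b.c": 1} into {"a": {"b": {"c": 1}}}, deep-merging dotted
    paths that share prefixes so sibling keys survive."""
    nested: dict = {}
    for path, value in flat.items():
        node = nested
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})  # reuse any existing subtree
        node[leaf] = value
    return nested

updates = {
    "llm.temperature": 0.2,
    "llm.top_p": 0.9,            # shares the "llm" prefix; must be preserved
    "workflow.max_retries": 3,
}
print(nest_updates(updates))
# {'llm': {'temperature': 0.2, 'top_p': 0.9}, 'workflow': {'max_retries': 3}}
```

A naive implementation that assigned a fresh dict per dotted path would let `llm.top_p` clobber `llm.temperature`; `setdefault` is what keeps the shared `llm` subtree intact.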
`src/nat/profiler/parameter_optimization/update_helpers.py`
- `nest_updates(...)`: now deep-merges nested dicts so dotted paths with shared prefixes preserve siblings.
- `apply_suggestions(cfg, updates) -> BaseModel`: returns a new model with dotted-path updates applied (non-mutating).

User-facing Usage
1) Declare optimizable fields in your models (one-time)
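A stdlib-only mock of the declaration pattern — the real `OptimizableField`, `SearchSpace`, and `OptimizableMixin` are Pydantic-based, so this only mirrors their shape, and all field names and values are illustrative:

```python
def optimizable_field(default, search_space=None):
    """Mimic `OptimizableField`: bundle a default with optimization metadata."""
    return {
        "default": default,
        "json_schema_extra": {"optimizable": True, "search_space": search_space},
    }

class SomeAgentConfig:
    """Mimic a config model using OptimizableMixin-style gating."""
    # Allow-list: only these fields are discovered by the optimizer.
    optimizable_params = ["temperature", "model_name"]

    temperature = optimizable_field(0.7, {"low": 0.0, "high": 1.0, "log": False})
    model_name = optimizable_field("llama_3_405",
                                   {"values": ["llama_3_405", "llama_3_70"]})

meta = SomeAgentConfig.temperature["json_schema_extra"]
print(meta["optimizable"], meta["search_space"]["high"])  # True 1.0
```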
Notes:
- `SearchSpace.low` can be a sequence for categorical choices; `high=None` implies categorical.
- Prompt fields are marked with `is_prompt=True`. Numeric optimization ignores prompts; GA prompt optimization uses them.

2) Enable fields and configure the optimizer in your config file
Minimal example showing where to mark fields as optimizable and how to set optimizer behavior:
If you use GA prompt optimization, declare the supporting functions in the `functions` section so the CLI can find them:

3) Run the optimizer from the CLI
```shell
nat optimize --config_file path/to/config.yaml \
  --dataset path/to/dataset.jsonl \
  --result_json_path '$' \
  --endpoint http://localhost:8000/your-endpoint \
  --endpoint_timeout 300
```

The optimizer will execute numeric and/or prompt optimization based on your `optimizer.numeric.enabled` and `optimizer.prompt.enabled` flags.

Artifacts and Observability
Numeric search:
- `optimized_config.yml`
- `trials_dataframe_params.csv` (flattens user_attrs → `rep_scores`)
- `config_numeric_trial_{trial_id}.yml`
- `plots/` (Pareto fronts)

GA prompt optimization:
- `optimized_prompts_gen{n}.json` (per-generation checkpoints)
- `optimized_prompts.json` (final best prompts with purposes)
- `ga_history_prompts.csv` (per-individual metrics/scalar fitness over generations)

Performance Considerations
- Numeric search cost scales with `n_trials` and `reps_per_param_set`.
- GA evaluations can run concurrently via `ga_parallel_evaluations`. Population and generation sizes control runtime.
- Evaluation goes through `EvaluationRun`, so overall throughput depends on evaluator cost and any configured endpoint.

Documentation
- `docs/source/reference/optimizer_guide.md` (how to declare fields, configure optimizers, run optimization, interpret artifacts).

Test Coverage
- `tests/nat/data_models/test_optimizable.py`, `tests/nat/profiler/test_optimizable_utils.py`, `tests/nat/profiler/test_update_helpers.py`
- `tests/nat/profiler/test_parameter_optimizer.py`
- (`WorkflowBuilder`): `tests/nat/profiler/test_prompt_optimizer.py`

Checklist
- `EvaluationRun` for objective computation
- `nest_updates`

By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Documentation
Examples
Runtime Evaluators
Visualization
Tests
Chores