
Add additional E2E tests for examples #986

Merged
rapids-bot[bot] merged 20 commits into NVIDIA:release/1.3 from dagardner-nv:david-more-example-e2e-tests-b2
Oct 13, 2025

Conversation

@dagardner-nv
Contributor

@dagardner-nv dagardner-nv commented Oct 13, 2025

Description

  • Add E2E test for text file ingest documentation example
  • Add E2E test for Email Phishing Analyzer
  • Add E2E test for Simple Calc Eval
  • Minor grammatical fix in docs/source/tutorials/create-a-new-workflow.md
  • Remove unused LLMs and eval section from examples/documentation_guides/workflows/text_file_ingest/configs/config.yml
  • Update the run_workflow test helper method to require keyword arguments
  • Update location of configs and data dirs for Email Phishing Analyzer example to match the layout of other examples
  • Move the validate_workflow_output test helper method to be shared by other tests

By submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation

    • Corrected tutorial phrasing and added guidance for handling evaluation rate limits; updated example evaluation config paths and removed obsolete evaluation entries.
  • Tests

    • Added integration tests for text-file ingest, email-phishing analysis, and simple calculator evaluation.
    • Standardized test calls to use keyword arguments, added workflow output validation utility, and added a fixture to enable nested event-loop support.
  • Chores

    • Broadened CI allowlist for config path checks.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…documentation example
…2e for the optimization run, remove unused imports
…s, add test for simple calculator eval
@dagardner-nv dagardner-nv self-assigned this Oct 13, 2025
@dagardner-nv dagardner-nv added improvement Improvement to existing functionality non-breaking Non-breaking change labels Oct 13, 2025
@coderabbitai

coderabbitai bot commented Oct 13, 2025

Walkthrough

Broadened CI allowlist for email_phishing_analyzer paths, updated doc text, made test utility run_workflow keyword-only and added validate_workflow_output, added require_nest_asyncio fixture, multiple tests updated or added (integration tests for text_file_ingest, email_phishing_analyzer, simple_calculator_eval), and pruned/updated example configs and paths.

Changes

Cohort / File(s) Summary
CI allowlist pattern
ci/scripts/path_checks.py
Expanded allowlist regex for email_phishing_analyzer configs to match any subpath before /configs (e.g., .*/email_phishing_analyzer/.*/configs).
Docs phrasing fixes
docs/source/tutorials/create-a-new-workflow.md
Corrected two example input strings to "What does DOCA GPUNetIO do ...".
Test utilities & fixture additions
packages/nvidia_nat_test/src/nat/test/utils.py, packages/nvidia_nat_test/src/nat/test/plugin.py
Made run_workflow keyword-only (*, config=None, config_file=None, question, expected_answer, ...), added validate_workflow_output(workflow_output_file: Path) (JSON structure checks), and added session fixture require_nest_asyncio that calls nest_asyncio.apply().
Refactor tests to keyword-only API
examples/agents/tests/test_agents.py, examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py, examples/documentation_guides/tests/test_custom_workflow.py, examples/frameworks/haystack_deep_research_agent/tests/test_haystack_deep_research_agent.py
Updated calls to run_workflow to use keyword args (config_file=, question=, expected_answer=).
New integration test: text_file_ingest
examples/documentation_guides/tests/test_text_file_ingest.py
Added fixtures (text_file_ingest_dir, src_dir, add_src_dir_to_path) and integration test test_text_file_ingest_full_workflow that imports local src, runs the workflow, and asserts expected output.
Config pruning: text_file_ingest
examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/configs/config.yml
Removed multiple LLM eval entries and the entire eval block; retained primary nim_llm and embedders.
Email phishing analyzer additions
examples/evaluation_and_profiling/email_phishing_analyzer/configs, .../data, examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
Added src/nat_email_phishing_analyzer/configs and .../data paths and a new integration test module with two tests (one async run, one skipped optimize test).
Eval docs note
examples/evaluation_and_profiling/simple_calculator_eval/README.md
Added note suggesting eval.general.max_concurrency: 1 to handle 429 rate limits.
Eval config path updates
examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
Updated eval.general.output_dir and eval.general.dataset.file_path to point to simple_calculator example paths.
New integration test: simple_calculator_eval
examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
Added async test that runs EvaluationRun with override eval.general.max_concurrency='1', validates workflow output and presence of tuneable eval outputs.
Test cleanup: simple_web_query_eval
examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
Replaced local validate_workflow_output with import from nat.test.utils; added inline import of module before locating config.
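The CI allowlist broadening described above can be sketched with a quick check. This is a reconstruction based only on the `.*/email_phishing_analyzer/.*/configs` shape quoted in the summary; the actual pattern and surrounding logic in `ci/scripts/path_checks.py` may differ.

```python
import re

# Reconstructed allowlist shape: any subpath may now sit between the
# example directory and its configs/ folder (e.g. src/<package>/configs).
ALLOWLIST = re.compile(r".*/email_phishing_analyzer/.*/configs")

# The new src-based layout for the example now passes the check:
new_layout = ("examples/evaluation_and_profiling/email_phishing_analyzer/"
              "src/nat_email_phishing_analyzer/configs")
matches = bool(ALLOWLIST.match(new_layout))
```

This matches the PR's move of the example's `configs` and `data` directories under `src/nat_email_phishing_analyzer/`.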

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester
  participant Pytest as Test Runner
  participant Plugin as nat.test.plugin
  participant Utils as nat.test.utils
  participant Loader as runtime.loader
  participant Workflow as Workflow Engine
  participant FS as Filesystem

  Tester->>Pytest: run integration test
  Pytest->>Plugin: (optional) require_nest_asyncio fixture -> nest_asyncio.apply()
  Pytest->>Utils: run_workflow(config_file=..., question=..., expected_answer=...)
  Utils->>Loader: load config (from `config` or `config_file`)
  Loader-->>Utils: return Config object
  Utils->>Workflow: execute workflow with question
  Workflow-->>Utils: produce answer + outputs
  Utils->>FS: write workflow_output.json
  Utils-->>Pytest: return answer
  note right of Utils: Tests call validate_workflow_output(path) for structure checks
  Pytest->>Utils: validate_workflow_output(path)
  Utils->>FS: read + parse JSON, validate keys/types
  Utils-->>Pytest: validation OK / raises
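As a rough illustration of the utilities in the diagram, here is a stdlib-only sketch of the keyword-only `run_workflow` shape and the kind of JSON structure check `validate_workflow_output` performs. The real helpers in `nat.test.utils` load configs and execute workflows; the stub bodies and the exact structure checked here are simplified guesses.

```python
import json
from pathlib import Path


def run_workflow(*, config=None, config_file=None, question, expected_answer):
    """Stand-in mirroring the keyword-only signature; positional calls raise TypeError."""
    # The real helper loads `config` or `config_file`, runs the workflow with
    # `question`, and asserts `expected_answer` appears in the result.
    return expected_answer


def validate_workflow_output(workflow_output_file: Path) -> None:
    """Parse the output file and check it is a non-empty JSON list of objects."""
    data = json.loads(workflow_output_file.read_text())
    assert isinstance(data, list) and data, "expected a non-empty JSON list"
    for entry in data:
        assert isinstance(entry, dict), "each entry should be a JSON object"
```

With this shape, a positional call like `run_workflow("cfg.yml", "q", "a")` raises `TypeError`, which is the point of the keyword-only change.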

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

breaking

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title is concise, descriptive, uses the imperative mood, and clearly summarizes the primary change of adding end-to-end tests to example workflows while remaining under the 72-character limit.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6e18b34 and ef46373.

📒 Files selected for processing (1)
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
    • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.
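The two logging patterns in the exception-handling guidance above can be illustrated with a minimal, self-contained example (function names are made up for illustration):

```python
import logging

logger = logging.getLogger(__name__)


def parse_port(text: str) -> int:
    return int(text)  # raises ValueError on bad input


def require_port(text: str) -> int:
    try:
        return parse_port(text)
    except ValueError:
        # Re-raising: bare `raise` keeps the original traceback;
        # logger.error() avoids logging the stack trace twice.
        logger.error("Invalid port value: %r", text)
        raise


def port_or_default(text: str, default: int = 8080) -> int:
    try:
        return parse_port(text)
    except ValueError:
        # Not re-raising: logger.exception() records the full stack trace.
        logger.exception("Invalid port value %r, using default", text)
        return default
```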

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes. Ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file; words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:
  • This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or README.ipynb file.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/, and should be checked into git-lfs.

Files:

  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
🧬 Code graph analysis (1)
examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (3)
src/nat/eval/evaluate.py (1)
  • EvaluationRun (45-565)
src/nat/eval/config.py (1)
  • EvaluationRunConfig (26-48)
packages/nvidia_nat_test/src/nat/test/utils.py (2)
  • locate_example_config (57-68)
  • validate_workflow_output (132-154)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (2)
examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (2)

32-51: LGTM!

The config setup and evaluation run follow the established pattern correctly. The use of locate_example_config for dynamic config resolution and explicit parameter specification in EvaluationRunConfig align with best practices.


53-70: LGTM! Spelling issue resolved.

The output validation logic is sound and follows best practices. The spelling error "atleast" from previous reviews has been correctly fixed to "at least" on line 69. The validation steps appropriately check that the workflow ran successfully, all output files exist, and the expected tuneable_eval_output file was produced.
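The test's override-then-validate flow can be sketched without the toolkit itself. `apply_override` below is a hypothetical stand-in for how a dotted override such as `eval.general.max_concurrency='1'` lands in a nested config, and the output-file names are illustrative only:

```python
def apply_override(config: dict, dotted_key: str, value: str) -> dict:
    """Set a leaf in a nested dict addressed by a dotted path (CLI-override style)."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return config


config = {"eval": {"general": {"max_concurrency": 8}}}
apply_override(config, "eval.general.max_concurrency", "1")

# After the run, the test scans evaluator output files for the tuneable output:
output_files = ["rag_eval_output.json", "tuneable_eval_output.json"]  # hypothetical names
tuneable = [f for f in output_files if "tuneable_eval_output" in f]
```

Lowering `max_concurrency` to 1 serializes evaluation calls, which is what the README note recommends for avoiding 429 rate-limit responses.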



@dagardner-nv dagardner-nv marked this pull request as ready for review October 13, 2025 19:02
@dagardner-nv dagardner-nv requested a review from a team as a code owner October 13, 2025 19:02
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 13, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
examples/documentation_guides/tests/test_text_file_ingest.py (1)

44-57: Refine the type hint for the generator.

The fixture's type hint should be Generator[str, None, None] to properly specify all three generic parameters (yield type, send type, return type).

Apply this diff:

-def add_src_dir_to_path_fixture(src_dir: Path) -> Generator[str]:
+def add_src_dir_to_path_fixture(src_dir: Path) -> Generator[str, None, None]:
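For context, the full three-parameter annotation reads as yield type, send type, return type. A self-contained sketch of this kind of `sys.path` fixture (pytest decorator machinery omitted; the body is illustrative, not the example's actual code):

```python
import sys
from collections.abc import Generator
from pathlib import Path


def add_src_dir_to_path(src_dir: Path) -> Generator[str, None, None]:
    """Yields str, accepts no sends, returns nothing: setup before `yield`,
    teardown after it, mirroring a pytest fixture's lifecycle."""
    path_str = str(src_dir)
    sys.path.insert(0, path_str)
    try:
        yield path_str
    finally:
        sys.path.remove(path_str)
```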
examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (1)

68-72: Consider simplifying the redundant existence check.

Line 69 checks output_file.exists() for every file in output.evaluator_output_files. Since these files should exist if they're in the list, this check is redundant. Consider removing it for cleaner code.

Apply this diff:

     for output_file in output.evaluator_output_files:
-        assert output_file.exists()
         output_file_str = str(output_file)
         if "tuneable_eval_output" in output_file_str:
             tuneable_eval_output = output_file
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff471c8 and 77d2944.

⛔ Files ignored due to path filters (1)
  • examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/data/smaller_test.csv is excluded by !**/*.csv
📒 Files selected for processing (16)
  • ci/scripts/path_checks.py (1 hunks)
  • docs/source/tutorials/create-a-new-workflow.md (1 hunks)
  • examples/agents/tests/test_agents.py (2 hunks)
  • examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py (1 hunks)
  • examples/documentation_guides/tests/test_custom_workflow.py (2 hunks)
  • examples/documentation_guides/tests/test_text_file_ingest.py (1 hunks)
  • examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/configs/config.yml (0 hunks)
  • examples/evaluation_and_profiling/email_phishing_analyzer/configs (1 hunks)
  • examples/evaluation_and_profiling/email_phishing_analyzer/data (1 hunks)
  • examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py (1 hunks)
  • examples/evaluation_and_profiling/simple_calculator_eval/README.md (1 hunks)
  • examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml (1 hunks)
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (1 hunks)
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (2 hunks)
  • packages/nvidia_nat_test/src/nat/test/plugin.py (1 hunks)
  • packages/nvidia_nat_test/src/nat/test/utils.py (3 hunks)
💤 Files with no reviewable changes (1)
  • examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/configs/config.yml
🧰 Additional context used
📓 Path-based instructions (15)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py
  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
  • examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
  • examples/agents/tests/test_agents.py
  • ci/scripts/path_checks.py
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
  • examples/documentation_guides/tests/test_custom_workflow.py
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
  • packages/nvidia_nat_test/src/nat/test/utils.py
  • examples/documentation_guides/tests/test_text_file_ingest.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py
  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
  • examples/agents/tests/test_agents.py
  • ci/scripts/path_checks.py
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
  • examples/documentation_guides/tests/test_custom_workflow.py
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
  • packages/nvidia_nat_test/src/nat/test/utils.py
  • examples/documentation_guides/tests/test_text_file_ingest.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
    • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values. Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace, and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception() to capture the full stack trace information.

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes. Ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file; words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py
  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
  • examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
  • examples/agents/tests/test_agents.py
  • examples/evaluation_and_profiling/email_phishing_analyzer/configs
  • ci/scripts/path_checks.py
  • examples/evaluation_and_profiling/email_phishing_analyzer/data
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
  • examples/documentation_guides/tests/test_custom_workflow.py
  • docs/source/tutorials/create-a-new-workflow.md
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
  • examples/evaluation_and_profiling/simple_calculator_eval/README.md
  • packages/nvidia_nat_test/src/nat/test/utils.py
  • examples/documentation_guides/tests/test_text_file_ingest.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:
  • This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or README.ipynb file.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/, and should be checked into git-lfs.

Files:

  • examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py
  • examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
  • examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
  • examples/agents/tests/test_agents.py
  • examples/evaluation_and_profiling/email_phishing_analyzer/configs
  • examples/evaluation_and_profiling/email_phishing_analyzer/data
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
  • examples/documentation_guides/tests/test_custom_workflow.py
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
  • examples/evaluation_and_profiling/simple_calculator_eval/README.md
  • examples/documentation_guides/tests/test_text_file_ingest.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Importable Python code inside packages must live under packages//src/

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • packages/nvidia_nat_test/src/nat/test/utils.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • packages/nvidia_nat_test/src/nat/test/utils.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:
  • This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two-digit version (ex: ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • packages/nvidia_nat_test/src/nat/test/utils.py
**/*.{yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

In workflow/config YAML, set llms.._type: nat_test_llm to stub responses.

Files:

  • examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
**/configs/**

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Configuration files consumed by code must be stored next to that code in a configs/ folder

Files:

  • examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
examples/*/tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Example tests live under examples//tests/

Files:

  • examples/agents/tests/test_agents.py
  • examples/documentation_guides/tests/test_custom_workflow.py
  • examples/documentation_guides/tests/test_text_file_ingest.py
{tests/**/*.py,examples/*/tests/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration

Files:

  • examples/agents/tests/test_agents.py
  • examples/documentation_guides/tests/test_custom_workflow.py
  • examples/documentation_guides/tests/test_text_file_ingest.py
{scripts/**,ci/scripts/**}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Shell or utility scripts belong in scripts/ or ci/scripts/ and must not be mixed with library code

Files:

  • ci/scripts/path_checks.py
docs/source/**/*.md

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

docs/source/**/*.md: Use the official naming throughout documentation: first use “NVIDIA NeMo Agent toolkit”, subsequent “NeMo Agent toolkit”; never use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq)
Documentation sources are Markdown files under docs/source; images belong in docs/source/_static
Keep docs in sync with code; documentation pipeline must pass Sphinx and link checks; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling correctness
Do not use words listed in ci/vale/styles/config/vocabularies/nat/reject.txt; accepted terms in accept.txt are allowed

Files:

  • docs/source/tutorials/create-a-new-workflow.md
docs/source/**/*

⚙️ CodeRabbit configuration file

This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the docs/source/_static directory.

Files:

  • docs/source/tutorials/create-a-new-workflow.md
**/README.@(md|ipynb)

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Ensure READMEs follow the naming convention; avoid deprecated names; use “NeMo Agent Toolkit” (capital T) in headings

Files:

  • examples/evaluation_and_profiling/simple_calculator_eval/README.md
🧬 Code graph analysis (8)
examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py (1)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
  • run_workflow (71-95)
examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py (4)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
  • locate_example_config (57-68)
examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py (1)
  • EmailPhishingAnalyzerConfig (36-46)
src/nat/data_models/optimizer.py (1)
  • OptimizerRunConfig (138-149)
src/nat/profiler/parameter_optimization/optimizer_runtime.py (1)
  • optimize_config (31-67)
examples/agents/tests/test_agents.py (1)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
  • run_workflow (71-95)
examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
  • validate_workflow_output (132-154)
examples/documentation_guides/tests/test_custom_workflow.py (1)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
  • run_workflow (71-95)
examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (4)
src/nat/eval/evaluate.py (1)
  • EvaluationRun (45-565)
src/nat/eval/config.py (1)
  • EvaluationRunConfig (26-48)
packages/nvidia_nat_test/src/nat/test/utils.py (2)
  • locate_example_config (57-68)
  • validate_workflow_output (132-154)
examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1)
  • test_eval (80-138)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py (1)
  • validate_workflow_output (30-52)
examples/documentation_guides/tests/test_text_file_ingest.py (2)
packages/nvidia_nat_test/src/nat/test/utils.py (2)
  • locate_example_config (57-68)
  • run_workflow (71-95)
examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/text_file_ingest_function.py (1)
  • TextFileIngestFunctionConfig (29-33)
🪛 Ruff (0.14.0)
packages/nvidia_nat_test/src/nat/test/utils.py

145-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


145-145: Avoid specifying long messages outside the exception class

(TRY003)
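The B904 finding above concerns raising a new exception inside an `except` clause. A minimal, hypothetical sketch of the `raise ... from err` fix (the file and line Ruff flags are real, but this snippet is a standalone illustration, not the actual `utils.py` code):

```python
import json


def parse_output(text: str) -> list:
    """Parse workflow output text, chaining the original decode error."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # `from err` attaches the original error as __cause__, so the
        # traceback distinguishes this failure from a bug in the handler.
        raise RuntimeError("Failed to parse workflow output as valid JSON") from err


print(parse_output('[{"id": 1}]'))  # parses normally
try:
    parse_output("not json")
except RuntimeError as exc:
    print(type(exc.__cause__).__name__)  # the JSONDecodeError is preserved
```

Without `from err`, Python reports the original error as merely "another exception occurred during handling", which is what B904 guards against.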

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (9)
packages/nvidia_nat_test/src/nat/test/utils.py (1)

71-78: LGTM on the keyword-only API change.

The transition to keyword-only arguments improves API clarity and prevents mistakes with positional arguments. All parameters have proper type hints.
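For illustration, keyword-only parameters are declared with a bare `*` in the signature. This is a hypothetical stand-in, not the real `run_workflow` from `nat.test.utils`, which takes additional parameters and drives an actual workflow:

```python
# Hypothetical stand-in illustrating the keyword-only pattern.
def run_workflow(*, config_file: str, question: str, expected_answer: str) -> str:
    """Everything after the bare `*` must be passed by keyword."""
    return f"{config_file}: {question!r} -> {expected_answer!r}"


# A keyword call works:
run_workflow(config_file="config.yml", question="What is 1 + 1?", expected_answer="2")

# A positional call is rejected at call time with a TypeError:
try:
    run_workflow("config.yml", "What is 1 + 1?", "2")
except TypeError as exc:
    print(exc)  # takes 0 positional arguments but 3 were given
```

Rejecting positional calls is what prevents silently swapping, say, the question and expected answer.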

examples/documentation_guides/tests/test_custom_workflow.py (1)

47-47: LGTM! Keyword-only API correctly adopted.

The changes correctly align with the updated run_workflow signature that enforces keyword-only arguments and uses expected_answer instead of answer.

Also applies to: 56-56

packages/nvidia_nat_test/src/nat/test/plugin.py (1)

348-355: LGTM! Well-documented fixture for nested event loops.

The fixture correctly applies nest_asyncio to enable nested event loops. The docstring appropriately notes that multiple calls are safe.

examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (2)

25-25: LGTM! Correctly imports shared validation utility.

The import aligns with the refactoring to use a shared validate_workflow_output function from nat.test.utils.


88-89: LGTM! Inline import ensures module availability.

The inline import pattern ensures the module is loaded before dynamic config resolution, consistent with other tests in this PR.

examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py (2)

27-43: LGTM! Well-structured integration test.

The test correctly:

  • Uses the integration marker and nvidia_api_key fixture
  • Loads config dynamically via locate_example_config
  • Calls run_workflow with keyword arguments
  • Tests with a representative phishing email example

46-63: LGTM! Appropriately skipped optimizer test.

The test structure is sound and the skip reason with issue reference (#842) is documented. The test validates the optimizer workflow when rate limits permit.

examples/documentation_guides/tests/test_text_file_ingest.py (1)

60-67: LGTM! Integration test follows established patterns.

The test correctly:

  • Uses appropriate markers and fixtures
  • Imports the config class dynamically after sys.path setup
  • Locates config via locate_example_config
  • Invokes run_workflow with keyword arguments
examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (1)

29-79: LGTM! Well-structured evaluation test.

The test correctly:

  • Uses appropriate markers and fixtures
  • Dynamically locates the config
  • Creates and runs the evaluation with proper overrides
  • Validates workflow output using the shared utility
  • Asserts expected output files exist

@dagardner-nv dagardner-nv removed the breaking Breaking change label Oct 13, 2025
dagardner-nv and others added 2 commits October 13, 2025 12:10
Signed-off-by: David Gardner <dagardner@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 13, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/nvidia_nat_test/src/nat/test/utils.py (1)

132-154: Improve docstring and consider refining the key presence check.

The function validates a specific output format but the docstring doesn't clearly explain the expected format or mention that different workflow types may need different validators.

Additionally, line 154 checks for truthy values (item.get(key)), which will fail if a key exists but has a falsy value (empty string, 0, False, empty list). If the intent is to check for key presence rather than non-empty values, use key in item instead.

Apply this diff to improve the docstring and key check:

 def validate_workflow_output(workflow_output_file: Path) -> None:
     """
-    Validate the contents of the workflow output file.
-    WIP: output format should be published as a schema and this validation should be done against that schema.
+    Validate the contents of the workflow output file for standard evaluation workflows.
+
+    Expects a JSON list of dictionaries with required keys: id, question, answer,
+    generated_answer, intermediate_steps.
+
+    Note: This validator is specific to standard evaluation workflow output format.
+    Different workflow types (e.g., SWE-bench) may require different validation.
+
+    Args:
+        workflow_output_file: Path to the workflow_output.json file.
+
+    Raises:
+        AssertionError: If validation fails.
+        RuntimeError: If the file cannot be parsed as JSON.
+
+    Todo:
+        Output format should be published as a schema and validation done against that schema.
     """
     # Ensure the workflow_output.json file was created
     assert workflow_output_file.exists(), "The workflow_output.json file was not created"
 
     # Read and validate the workflow_output.json file
     try:
         with open(workflow_output_file, encoding="utf-8") as f:
             result_json = json.load(f)
     except json.JSONDecodeError as err:
         raise RuntimeError("Failed to parse workflow_output.json as valid JSON") from err
 
     assert isinstance(result_json, list), "The workflow_output.json file is not a list"
     assert len(result_json) > 0, "The workflow_output.json file is empty"
     assert isinstance(result_json[0], dict), "The workflow_output.json file is not a list of dictionaries"
 
     # Ensure required keys exist
     required_keys = ["id", "question", "answer", "generated_answer", "intermediate_steps"]
     for key in required_keys:
-        assert all(item.get(key) for item in result_json), f"The '{key}' key is missing in workflow_output.json"
+        assert all(key in item for item in result_json), f"The '{key}' key is missing in workflow_output.json"

Optional refinement: Static analysis suggests moving the exception message to the exception class (TRY003), but the current approach is clear and acceptable.
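The truthy-versus-presence distinction discussed in this comment can be seen with a plain dictionary:

```python
item = {"id": 0, "question": "What is 2 + 2?", "generated_answer": ""}

# Truthy check: item.get(key) is falsy when the value is 0, "", [], False,
# or None, even though the key exists.
assert not item.get("id")
assert not item.get("generated_answer")

# Presence check: `key in item` only asks whether the key exists.
assert "id" in item
assert "generated_answer" in item
assert "intermediate_steps" not in item  # a genuinely missing key
```

So `all(item.get(key) for item in result_json)` would reject a legitimate record whose `id` is `0`, while `all(key in item for item in result_json)` accepts it.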

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 77d2944 and 7f0feb1.

📒 Files selected for processing (2)
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (1 hunks)
  • packages/nvidia_nat_test/src/nat/test/utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • packages/nvidia_nat_test/src/nat/test/utils.py
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
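The rule above says `response_seq` values cycle per call and `[]` yields an empty string. A small sketch of that cycling behavior (not the actual `nat_test_llm` implementation; `make_responder` is a hypothetical helper):

```python
from collections.abc import Iterator
from itertools import cycle


def make_responder(response_seq: list[str]) -> Iterator[str]:
    """Return an iterator cycling through response_seq; [] yields "" forever."""
    return cycle(response_seq or [""])


responder = make_responder(["yes", "no"])
assert [next(responder) for _ in range(5)] == ["yes", "no", "yes", "no", "yes"]

empty = make_responder([])
assert next(empty) == ""  # an empty response_seq yields empty strings
```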
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("&lt;name&gt;", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.&lt;framework&gt;) and call the framework's method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_test/src/nat/test/utils.py
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
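The exception-handling guideline above distinguishes two cases: re-raising (bare `raise` plus `logger.error()`) and catching without re-raising (`logger.exception()`). An illustrative sketch under those rules; `load_config` and `load_config_or_default` are hypothetical names, not project code:

```python
import logging

logger = logging.getLogger(__name__)


def load_config(path: str) -> dict:
    try:
        with open(path, encoding="utf-8") as f:
            return {"raw": f.read()}
    except FileNotFoundError:
        # Re-raising: logger.error() plus a bare `raise` keeps the original
        # stack trace and avoids logging it twice.
        logger.error("Config file not found: %s", path)
        raise


def load_config_or_default(path: str) -> dict:
    try:
        return load_config(path)
    except FileNotFoundError:
        # Catching without re-raising: logger.exception() records the full
        # stack trace, since nothing downstream will see the error.
        logger.exception("Falling back to an empty config")
        return {}
```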
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Importable Python code inside packages must live under packages/&lt;package_name&gt;/src/

Files:

  • packages/nvidia_nat_test/src/nat/test/utils.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful

Files:

  • packages/nvidia_nat_test/src/nat/test/utils.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs, or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0 and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • packages/nvidia_nat_test/src/nat/test/utils.py
  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=&lt;version&gt;, and the version should be a two-digit version (ex: ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_test/src/nat/test/utils.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:

  • This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or README.ipynb file.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/, and should be checked into git-lfs.

Files:

  • examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
🧬 Code graph analysis (2)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py (1)
  • validate_workflow_output (30-52)
examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (4)
src/nat/eval/evaluate.py (1)
  • EvaluationRun (45-565)
src/nat/eval/config.py (1)
  • EvaluationRunConfig (26-48)
packages/nvidia_nat_test/src/nat/test/utils.py (2)
  • locate_example_config (57-68)
  • validate_workflow_output (132-154)
examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1)
  • test_eval (80-138)
🪛 Ruff (0.14.0)
packages/nvidia_nat_test/src/nat/test/utils.py

145-145: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (1)
packages/nvidia_nat_test/src/nat/test/utils.py (1)

71-78: LGTM! Good API improvement.

Making parameters keyword-only prevents positional argument mistakes and makes the API more explicit. The addition of the optional config parameter provides more flexibility for test code.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…dagardner-nv/AIQtoolkit into david-more-example-e2e-tests-b2

Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv removed the breaking Breaking change label Oct 13, 2025
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 13, 2025
@dagardner-nv dagardner-nv removed the breaking Breaking change label Oct 13, 2025
…/test_simple_calculator_eval.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
@coderabbitai coderabbitai bot added the breaking Breaking change label Oct 13, 2025
@dagardner-nv dagardner-nv removed the breaking Breaking change label Oct 13, 2025
@dagardner-nv
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 40b4d00 into NVIDIA:release/1.3 Oct 13, 2025
17 checks passed
@dagardner-nv dagardner-nv deleted the david-more-example-e2e-tests-b2 branch October 13, 2025 21:40

Labels

improvement Improvement to existing functionality non-breaking Non-breaking change
