Add additional E2E tests for examples #986

rapids-bot[bot] merged 20 commits into NVIDIA:release/1.3 from dagardner-nv:david-more-example-e2e-tests-b2
Conversation
Signed-off-by: David Gardner <dagardner@nvidia.com>

…documentation example Signed-off-by: David Gardner <dagardner@nvidia.com>

…2e for the optimization run, remove unused imports Signed-off-by: David Gardner <dagardner@nvidia.com>

…s, add test for simple calculator eval Signed-off-by: David Gardner <dagardner@nvidia.com>
Walkthrough
Broadened the CI allowlist for `email_phishing_analyzer` paths, updated doc text, and made test utility changes.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Tester
    participant Pytest as Test Runner
    participant Plugin as nat.test.plugin
    participant Utils as nat.test.utils
    participant Loader as runtime.loader
    participant Workflow as Workflow Engine
    participant FS as Filesystem
    Tester->>Pytest: run integration test
    Pytest->>Plugin: (optional) require_nest_asyncio fixture -> nest_asyncio.apply()
    Pytest->>Utils: run_workflow(config_file=..., question=..., expected_answer=...)
    Utils->>Loader: load config (from `config` or `config_file`)
    Loader-->>Utils: return Config object
    Utils->>Workflow: execute workflow with question
    Workflow-->>Utils: produce answer + outputs
    Utils->>FS: write workflow_output.json
    Utils-->>Pytest: return answer
    note right of Utils: Tests call validate_workflow_output(path) for structure checks
    Pytest->>Utils: validate_workflow_output(path)
    Utils->>FS: read + parse JSON, validate keys/types
    Utils-->>Pytest: validation OK / raises
```
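The run-then-validate flow in the diagram can be sketched with a minimal, self-contained stand-in for the test utilities. The names `run_workflow` and `validate_workflow_output` mirror `nat.test.utils`, but the bodies here are illustrative only, not the actual implementations:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def run_workflow(*, config_file: Path, question: str, expected_answer: str) -> str:
    """Illustrative stand-in: pretend to run a workflow and persist its output."""
    answer = expected_answer  # a real run would load the config and execute the workflow
    record = {
        "id": 1,
        "question": question,
        "answer": expected_answer,
        "generated_answer": answer,
        "intermediate_steps": ["step-1"],
    }
    out_path = config_file.parent / "workflow_output.json"
    out_path.write_text(json.dumps([record]), encoding="utf-8")
    return answer


def validate_workflow_output(workflow_output_file: Path) -> None:
    """Structural checks mirroring the diagram's validation step."""
    assert workflow_output_file.exists()
    result = json.loads(workflow_output_file.read_text(encoding="utf-8"))
    assert isinstance(result, list) and result
    for key in ("id", "question", "answer", "generated_answer", "intermediate_steps"):
        assert all(key in item for item in result)


with TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.yml"
    cfg.write_text("workflow: {}\n", encoding="utf-8")
    answer = run_workflow(config_file=cfg, question="What is 2+2?", expected_answer="4")
    validate_workflow_output(cfg.parent / "workflow_output.json")
    print(answer)  # -> 4
```

The point of the sketch is the contract between the two helpers: the runner writes `workflow_output.json` next to the config, and the validator only checks structure, never answer quality.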
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
examples/documentation_guides/tests/test_text_file_ingest.py (1)

44-57: Refine the type hint for the generator. The fixture's type hint should be `Generator[str, None, None]` to properly specify all three generic parameters (yield type, send type, return type). Apply this diff:

```diff
-def add_src_dir_to_path_fixture(src_dir: Path) -> Generator[str]:
+def add_src_dir_to_path_fixture(src_dir: Path) -> Generator[str, None, None]:
```

examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (1)

68-72: Consider simplifying the redundant existence check. Line 69 checks `output_file.exists()` for every file in `output.evaluator_output_files`. Since these files should exist if they're in the list, this check is redundant; consider removing it for cleaner code. Apply this diff:

```diff
 for output_file in output.evaluator_output_files:
-    assert output_file.exists()
     output_file_str = str(output_file)
     if "tuneable_eval_output" in output_file_str:
         tuneable_eval_output = output_file
```
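The `Generator[str, None, None]` nitpick above can be seen in a standalone sketch. The fixture in the PR manipulates `sys.path`; this reduced version (directory path and function name are hypothetical, and on Python 3.13+ the trailing `None, None` parameters gained defaults) shows why all three type parameters exist:

```python
import sys
from collections.abc import Generator
from pathlib import Path


def add_src_dir_to_path(src_dir: Path) -> Generator[str, None, None]:
    """Yield the inserted path entry, then remove it on cleanup.

    `Generator[str, None, None]` spells out yield type `str`,
    send type `None`, and return type `None` explicitly.
    """
    entry = str(src_dir)
    sys.path.insert(0, entry)
    try:
        yield entry
    finally:
        # Runs on generator close (pytest fixture teardown in the real test).
        sys.path.remove(entry)


gen = add_src_dir_to_path(Path("/tmp/example_src"))  # hypothetical directory
entry = next(gen)            # setup: path inserted
assert entry in sys.path
gen.close()                  # teardown: GeneratorExit runs the finally block
assert entry not in sys.path
```

Wrapped in `@pytest.fixture(name=...)`, the same generator gives pytest its setup/teardown semantics; the annotation only documents the protocol.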
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/data/smaller_test.csv` is excluded by `!**/*.csv`

📒 Files selected for processing (16)
- `ci/scripts/path_checks.py` (1 hunks)
- `docs/source/tutorials/create-a-new-workflow.md` (1 hunks)
- `examples/agents/tests/test_agents.py` (2 hunks)
- `examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py` (1 hunks)
- `examples/documentation_guides/tests/test_custom_workflow.py` (2 hunks)
- `examples/documentation_guides/tests/test_text_file_ingest.py` (1 hunks)
- `examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/configs/config.yml` (0 hunks)
- `examples/evaluation_and_profiling/email_phishing_analyzer/configs` (1 hunks)
- `examples/evaluation_and_profiling/email_phishing_analyzer/data` (1 hunks)
- `examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py` (1 hunks)
- `examples/evaluation_and_profiling/simple_calculator_eval/README.md` (1 hunks)
- `examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml` (1 hunks)
- `examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py` (1 hunks)
- `examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py` (2 hunks)
- `packages/nvidia_nat_test/src/nat/test/plugin.py` (1 hunks)
- `packages/nvidia_nat_test/src/nat/test/utils.py` (3 hunks)
💤 Files with no reviewable changes (1)
- examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/configs/config.yml
🧰 Additional context used
📓 Path-based instructions (15)
**/*.{py,yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
Files:
- examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py
- packages/nvidia_nat_test/src/nat/test/plugin.py
- examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
- examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
- examples/agents/tests/test_agents.py
- ci/scripts/path_checks.py
- examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
- examples/documentation_guides/tests/test_custom_workflow.py
- examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
- packages/nvidia_nat_test/src/nat/test/utils.py
- examples/documentation_guides/tests/test_text_file_ingest.py
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).
**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial
Files:
- examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py
- packages/nvidia_nat_test/src/nat/test/plugin.py
- examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
- examples/agents/tests/test_agents.py
- ci/scripts/path_checks.py
- examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
- examples/documentation_guides/tests/test_custom_workflow.py
- examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
- packages/nvidia_nat_test/src/nat/test/utils.py
- examples/documentation_guides/tests/test_text_file_ingest.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

# Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file; words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

# Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.

Files:
- examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py
- packages/nvidia_nat_test/src/nat/test/plugin.py
- examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
- examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
- examples/agents/tests/test_agents.py
- examples/evaluation_and_profiling/email_phishing_analyzer/configs
- ci/scripts/path_checks.py
- examples/evaluation_and_profiling/email_phishing_analyzer/data
- examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
- examples/documentation_guides/tests/test_custom_workflow.py
- docs/source/tutorials/create-a-new-workflow.md
- examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
- examples/evaluation_and_profiling/simple_calculator_eval/README.md
- packages/nvidia_nat_test/src/nat/test/utils.py
- examples/documentation_guides/tests/test_text_file_ingest.py
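The exception-handling rule in these instructions (bare `raise` plus `logger.error()` when re-raising, `logger.exception()` when swallowing) can be illustrated with a small stdlib-only sketch; the function names here are hypothetical:

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_config(text: str) -> dict:
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Re-raising: log the message only. The bare `raise` preserves the
        # original stack trace, so logger.exception() here would print it twice.
        logger.error("Failed to parse config")
        raise


def parse_config_or_default(text: str) -> dict:
    try:
        return parse_config(text)
    except ValueError:  # JSONDecodeError subclasses ValueError
        # Catching without re-raising: logger.exception() records the full
        # stack trace exactly once, since nothing downstream will see it.
        logger.exception("Using default config")
        return {}


print(parse_config_or_default("not json"))  # -> {}
```

The split keeps each traceback in the logs exactly once regardless of which layer ultimately handles the error.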
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or README.ipynb file.
- If an example contains Python code, it should be placed in a subdirectory named `src/` and should contain a `pyproject.toml` file. Optionally, it might also contain scripts in a `scripts/` directory.
- If an example contains YAML files, they should be placed in a subdirectory named `configs/`.
- If an example contains sample data files, they should be placed in a subdirectory named `data/`, and should be checked into git-lfs.

Files:
- examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py
- examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
- examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py
- examples/agents/tests/test_agents.py
- examples/evaluation_and_profiling/email_phishing_analyzer/configs
- examples/evaluation_and_profiling/email_phishing_analyzer/data
- examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
- examples/documentation_guides/tests/test_custom_workflow.py
- examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py
- examples/evaluation_and_profiling/simple_calculator_eval/README.md
- examples/documentation_guides/tests/test_text_file_ingest.py
packages/*/src/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Importable Python code inside packages must live under packages//src/
Files:
- packages/nvidia_nat_test/src/nat/test/plugin.py
- packages/nvidia_nat_test/src/nat/test/utils.py
{src/**/*.py,packages/*/src/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful
Files:
- packages/nvidia_nat_test/src/nat/test/plugin.py
- packages/nvidia_nat_test/src/nat/test/utils.py
packages/**/*
⚙️ CodeRabbit configuration file
packages/**/*:
- This directory contains optional plugin packages for the toolkit; each should contain a `pyproject.toml` file.
- The `pyproject.toml` file should declare a dependency on `nvidia-nat` or another package with a name starting with `nvidia-nat-`. This dependency should be declared using `~=<version>`, and the version should be a two digit version (ex: `~=1.0`).
- Not all packages contain Python code; if they do, they should also contain their own set of tests, in a `tests/` directory at the same level as the `pyproject.toml` file.

Files:
- packages/nvidia_nat_test/src/nat/test/plugin.py
- packages/nvidia_nat_test/src/nat/test/utils.py
**/*.{yaml,yml}
📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)
In workflow/config YAML, set llms.._type: nat_test_llm to stub responses.
Files:
examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
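The `_type: nat_test_llm` rule above stubs LLM responses in a workflow config. A hedged sketch of such a fragment — the LLM name and response strings are hypothetical; the `response_seq` and `delay_ms` fields are taken from the guideline text above:

```yaml
llms:
  stub_llm:
    _type: nat_test_llm
    response_seq:          # values cycle per call; [] would yield an empty string
      - "The answer is 4."
      - "The answer is 8."
    delay_ms: 5            # artificial per-call latency in milliseconds
```

Pointing a workflow's `llm_name` at such an entry keeps E2E tests deterministic and free of live API calls.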
**/configs/**
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Configuration files consumed by code must be stored next to that code in a configs/ folder
Files:
examples/evaluation_and_profiling/simple_calculator_eval/src/nat_simple_calculator_eval/configs/config-tunable-rag-eval.yml
examples/*/tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Example tests live under examples//tests/
Files:
- examples/agents/tests/test_agents.py
- examples/documentation_guides/tests/test_custom_workflow.py
- examples/documentation_guides/tests/test_text_file_ingest.py
{tests/**/*.py,examples/*/tests/**/*.py}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
{tests/**/*.py,examples/*/tests/**/*.py}: Use pytest (with pytest-asyncio for async); name test files test_*.py; test functions start with test_; extract repeated code into fixtures; fixtures must set name in decorator and be named with fixture_ prefix
Mock external services with pytest_httpserver or unittest.mock; do not hit live endpoints
Mark expensive tests with @pytest.mark.slow or @pytest.mark.integration
Files:
- examples/agents/tests/test_agents.py
- examples/documentation_guides/tests/test_custom_workflow.py
- examples/documentation_guides/tests/test_text_file_ingest.py
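The "mock external services; do not hit live endpoints" rule above can also be satisfied with stdlib `unittest.mock` when spinning up `pytest_httpserver` is overkill. In this sketch the `fetch_answer` helper and the URL are hypothetical:

```python
from unittest import mock


def fetch_answer(client, url: str) -> str:
    """Hypothetical helper that queries a remote workflow endpoint."""
    response = client.get(url)
    return response.json()["answer"]


# No live endpoint is hit: the client is a Mock configured with a canned payload.
fake_client = mock.Mock()
fake_client.get.return_value.json.return_value = {"answer": "4"}

assert fetch_answer(fake_client, "https://example.invalid/generate") == "4"
fake_client.get.assert_called_once_with("https://example.invalid/generate")
```

`pytest_httpserver` is the better fit when the code under test builds real HTTP requests; a `Mock` suffices when only the client interface matters.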
{scripts/**,ci/scripts/**}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Shell or utility scripts belong in scripts/ or ci/scripts/ and must not be mixed with library code
Files:
ci/scripts/path_checks.py
docs/source/**/*.md
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
docs/source/**/*.md: Use the official naming throughout documentation: first use “NVIDIA NeMo Agent toolkit”, subsequent “NeMo Agent toolkit”; never use deprecated names (Agent Intelligence toolkit, aiqtoolkit, AgentIQ, AIQ/aiq)
Documentation sources are Markdown files under docs/source; images belong in docs/source/_static
Keep docs in sync with code; documentation pipeline must pass Sphinx and link checks; avoid TODOs/FIXMEs/placeholders; avoid offensive/outdated terms; ensure spelling correctness
Do not use words listed in ci/vale/styles/config/vocabularies/nat/reject.txt; accepted terms in accept.txt are allowed
Files:
docs/source/tutorials/create-a-new-workflow.md
docs/source/**/*
⚙️ CodeRabbit configuration file
This directory contains the source code for the documentation. All documentation should be written in Markdown format. Any image files should be placed in the `docs/source/_static` directory.
Files:
docs/source/tutorials/create-a-new-workflow.md
**/README.@(md|ipynb)
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Ensure READMEs follow the naming convention; avoid deprecated names; use “NeMo Agent Toolkit” (capital T) in headings
Files:
examples/evaluation_and_profiling/simple_calculator_eval/README.md
🧬 Code graph analysis (8)

examples/custom_functions/automated_description_generation/tests/test_auto_desc_generation.py (1)
- packages/nvidia_nat_test/src/nat/test/utils.py (1): run_workflow (71-95)

examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py (4)
- packages/nvidia_nat_test/src/nat/test/utils.py (1): locate_example_config (57-68)
- examples/evaluation_and_profiling/email_phishing_analyzer/src/nat_email_phishing_analyzer/register.py (1): EmailPhishingAnalyzerConfig (36-46)
- src/nat/data_models/optimizer.py (1): OptimizerRunConfig (138-149)
- src/nat/profiler/parameter_optimization/optimizer_runtime.py (1): optimize_config (31-67)

examples/agents/tests/test_agents.py (1)
- packages/nvidia_nat_test/src/nat/test/utils.py (1): run_workflow (71-95)

examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1)
- packages/nvidia_nat_test/src/nat/test/utils.py (1): validate_workflow_output (132-154)

examples/documentation_guides/tests/test_custom_workflow.py (1)
- packages/nvidia_nat_test/src/nat/test/utils.py (1): run_workflow (71-95)

examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (4)
- src/nat/eval/evaluate.py (1): EvaluationRun (45-565)
- src/nat/eval/config.py (1): EvaluationRunConfig (26-48)
- packages/nvidia_nat_test/src/nat/test/utils.py (2): locate_example_config (57-68), validate_workflow_output (132-154)
- examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1): test_eval (80-138)

packages/nvidia_nat_test/src/nat/test/utils.py (1)
- examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py (1): validate_workflow_output (30-52)

examples/documentation_guides/tests/test_text_file_ingest.py (2)
- packages/nvidia_nat_test/src/nat/test/utils.py (2): locate_example_config (57-68), run_workflow (71-95)
- examples/documentation_guides/workflows/text_file_ingest/src/text_file_ingest/text_file_ingest_function.py (1): TextFileIngestFunctionConfig (29-33)
🪛 Ruff (0.14.0)
packages/nvidia_nat_test/src/nat/test/utils.py
145-145: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
145-145: Avoid specifying long messages outside the exception class
(TRY003)
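The B904 finding above asks for explicit exception chaining. `raise ... from err` records the original error as `__cause__`, distinguishing a deliberate translation from an incidental error raised during handling; a stdlib sketch (function name hypothetical):

```python
import json


def load_output(text: str) -> list:
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # Explicit chaining: the JSONDecodeError becomes __cause__ of the
        # RuntimeError instead of an implicit "during handling" __context__.
        raise RuntimeError("Failed to parse workflow output") from err


try:
    load_output("not json")
except RuntimeError as exc:
    assert isinstance(exc.__cause__, json.JSONDecodeError)
```

`from None` is the other option B904 accepts, for when the original error should be suppressed entirely.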
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (9)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
71-78: LGTM on the keyword-only API change. The transition to keyword-only arguments improves API clarity and prevents mistakes with positional arguments. All parameters have proper type hints.
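The keyword-only pattern praised here looks like this in general. The signature below is a hypothetical reduction of `run_workflow`, not the actual implementation:

```python
def run_workflow(*, config_file: str, question: str, expected_answer: str) -> str:
    """All parameters after the bare `*` must be passed by keyword."""
    return expected_answer  # placeholder body for illustration


# Keyword call works; a positional call fails loudly at the call site.
assert run_workflow(config_file="cfg.yml", question="2+2?", expected_answer="4") == "4"
try:
    run_workflow("cfg.yml", "2+2?", "4")  # type: ignore[misc]
except TypeError:
    pass  # positional arguments are rejected, as intended
else:
    raise AssertionError("positional call should have failed")
```

With three same-typed string parameters, this turns a silent argument-order mistake into an immediate `TypeError`.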
examples/documentation_guides/tests/test_custom_workflow.py (1)
47-47: LGTM! Keyword-only API correctly adopted. The changes correctly align with the updated `run_workflow` signature that enforces keyword-only arguments and uses `expected_answer` instead of `answer`. Also applies to: 56-56.
packages/nvidia_nat_test/src/nat/test/plugin.py (1)
348-355: LGTM! Well-documented fixture for nested event loops. The fixture correctly applies `nest_asyncio` to enable nested event loops. The docstring appropriately notes that multiple calls are safe.

examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (2)
25-25: LGTM! Correctly imports shared validation utility. The import aligns with the refactoring to use a shared `validate_workflow_output` function from `nat.test.utils`.

88-89: LGTM! Inline import ensures module availability. The inline import pattern ensures the module is loaded before dynamic config resolution, consistent with other tests in this PR.
examples/evaluation_and_profiling/email_phishing_analyzer/tests/test_email_phishing_analyzer.py (2)
27-43: LGTM! Well-structured integration test. The test correctly:
- Uses the integration marker and nvidia_api_key fixture
- Loads config dynamically via `locate_example_config`
- Calls `run_workflow` with keyword arguments
- Tests with a representative phishing email example

46-63: LGTM! Appropriately skipped optimizer test. The test structure is sound and the skip reason with issue reference (#842) is documented. The test validates the optimizer workflow when rate limits permit.
examples/documentation_guides/tests/test_text_file_ingest.py (1)
60-67: LGTM! Integration test follows established patterns. The test correctly:
- Uses appropriate markers and fixtures
- Imports the config class dynamically after sys.path setup
- Locates config via `locate_example_config`
- Invokes `run_workflow` with keyword arguments

examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (1)

29-79: LGTM! Well-structured evaluation test. The test correctly:
- Uses appropriate markers and fixtures
- Dynamically locates the config
- Creates and runs the evaluation with proper overrides
- Validates workflow output using the shared utility
- Asserts expected output files exist
Signed-off-by: David Gardner <dagardner@nvidia.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
Actionable comments posted: 2
🧹 Nitpick comments (1)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
132-154: Improve docstring and consider refining the key presence check. The function validates a specific output format, but the docstring doesn't clearly explain the expected format or mention that different workflow types may need different validators.

Additionally, line 154 checks for truthy values (`item.get(key)`), which will fail if a key exists but has a falsy value (empty string, 0, False, empty list). If the intent is to check for key presence rather than non-empty values, use `key in item` instead. Apply this diff to improve the docstring and key check:

```diff
 def validate_workflow_output(workflow_output_file: Path) -> None:
     """
-    Validate the contents of the workflow output file.
-    WIP: output format should be published as a schema and this validation should be done against that schema.
+    Validate the contents of the workflow output file for standard evaluation workflows.
+
+    Expects a JSON list of dictionaries with required keys: id, question, answer,
+    generated_answer, intermediate_steps.
+
+    Note: This validator is specific to standard evaluation workflow output format.
+    Different workflow types (e.g., SWE-bench) may require different validation.
+
+    Args:
+        workflow_output_file: Path to the workflow_output.json file.
+
+    Raises:
+        AssertionError: If validation fails.
+        RuntimeError: If the file cannot be parsed as JSON.
+
+    Todo:
+        Output format should be published as a schema and validation done against that schema.
     """
     # Ensure the workflow_output.json file was created
     assert workflow_output_file.exists(), "The workflow_output.json file was not created"

     # Read and validate the workflow_output.json file
     try:
         with open(workflow_output_file, encoding="utf-8") as f:
             result_json = json.load(f)
     except json.JSONDecodeError as err:
         raise RuntimeError("Failed to parse workflow_output.json as valid JSON") from err

     assert isinstance(result_json, list), "The workflow_output.json file is not a list"
     assert len(result_json) > 0, "The workflow_output.json file is empty"
     assert isinstance(result_json[0], dict), "The workflow_output.json file is not a list of dictionaries"

     # Ensure required keys exist
     required_keys = ["id", "question", "answer", "generated_answer", "intermediate_steps"]
     for key in required_keys:
-        assert all(item.get(key) for item in result_json), f"The '{key}' key is missing in workflow_output.json"
+        assert all(key in item for item in result_json), f"The '{key}' key is missing in workflow_output.json"
```

Optional refinement: Static analysis suggests moving the exception message to the exception class (TRY003), but the current approach is clear and acceptable.
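The `item.get(key)` vs. `key in item` distinction above matters whenever a required key may legitimately hold a falsy value. A small self-contained demonstration (the sample row is hypothetical):

```python
rows = [{"id": 0, "question": "", "answer": "4"}]

# Truthiness check: fails even though both keys are present,
# because 0 and "" are falsy.
assert not all(row.get("id") for row in rows)
assert not all(row.get("question") for row in rows)

# Presence check: passes, which is what a structural validator wants.
assert all("id" in row for row in rows)
assert all("question" in row for row in rows)

# A genuinely missing key is still caught.
assert not all("generated_answer" in row for row in rows)
```

An `id` of `0` or an empty `question` string would make the truthiness version report a false "missing key", which is exactly the bug the review comment flags.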
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py` (1 hunks)
- `packages/nvidia_nat_test/src/nat/test/utils.py` (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)

packages/nvidia_nat_test/src/nat/test/utils.py (1)
- examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py (1): validate_workflow_output (30-52)

examples/evaluation_and_profiling/simple_calculator_eval/tests/test_simple_calculator_eval.py (4)
- src/nat/eval/evaluate.py (1): EvaluationRun (45-565)
- src/nat/eval/config.py (1): EvaluationRunConfig (26-48)
- packages/nvidia_nat_test/src/nat/test/utils.py (2): locate_example_config (57-68), validate_workflow_output (132-154)
- examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1): test_eval (80-138)
🪛 Ruff (0.14.0)
packages/nvidia_nat_test/src/nat/test/utils.py
145-145: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (1)
packages/nvidia_nat_test/src/nat/test/utils.py (1)
71-78: LGTM! Good API improvement. Making parameters keyword-only prevents positional argument mistakes and makes the API more explicit. The addition of the optional `config` parameter provides more flexibility for test code.
…dagardner-nv/AIQtoolkit into david-more-example-e2e-tests-b2 Signed-off-by: David Gardner <dagardner@nvidia.com>
…/test_simple_calculator_eval.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
/merge
Description
- `docs/source/tutorials/create-a-new-workflow.md`
- `examples/documentation_guides/workflows/text_file_ingest/configs/config.yml`
- `run_workflow` test helper method to require keyword arguments
- `validate_workflow_output` test helper method to be shared by other tests

By Submitting this PR I confirm:
Summary by CodeRabbit
- Documentation
- Tests
- Chores