
Fix tests under examples/, remove all pytest skip markers#846

Merged
rapids-bot[bot] merged 26 commits intoNVIDIA:developfrom
dagardner-nv:david-fix-example-tests
Sep 24, 2025

@dagardner-nv
Contributor

@dagardner-nv dagardner-nv commented Sep 24, 2025

Description

  • Tests now declare exactly the API key fixtures they need.
  • test_alert_triage_agent_workflow.py now processes only the first prompt in the dataset, reducing its runtime from 4 minutes to 40 seconds.
  • Revert an unintended formatting change to test_spans.csv that caused parsing errors in the profiler_agent tests.
  • Document that the simple web query eval example may need to be run with max_concurrency=1 as a workaround for #842 (Implement retry/sleep/backoff logic when we receive 429 errors).
  • Fix a Python 3.13-specific issue (#845, swe_bench evaluation example broken in Python 3.13) that caused the swe_bench example to fail with a type error (thanks to @willkill07 for this one).
  • Fix the early-out check in src/nat/utils/type_converter.py for the case where the source and destination types are the same. This avoids spurious warnings such as: WARNING - Indirect type conversion used to convert <class 'str'> to <class 'str'>, which may lead to unintended conversions. Consider adding a direct converter from <class 'str'> to <class 'str'> to ensure correctness.
  • Add a Phoenix service instance for end-to-end (e2e) testing.

Closes #845

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Grouped “user report” tooling exposing all report operations via a single function group.
  • Refactor

    • Migrated user report API to grouped access and switched object-store paths to relative semantics.
  • Documentation

    • Added guidance to handle rate limits by setting eval.general.max_concurrency=1; markdown link checks now ignore arize.com.
  • Tests

    • Converted several skipped tests to fixture-based integration tests, added a Docker-required fixture, introduced data fixtures, reduced timeouts, and enforced a concurrency override.
  • Chores

    • CI updated to add a Phoenix service and adjust service ordering.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…is test from 4 minutes down to 40 seconds

Signed-off-by: David Gardner <dagardner@nvidia.com>
…avid-fix-example-tests

Signed-off-by: David Gardner <dagardner@nvidia.com>
This reverts commit 77985f9.

Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv self-assigned this Sep 24, 2025
@dagardner-nv dagardner-nv added bug Something isn't working non-breaking Non-breaking change labels Sep 24, 2025
@coderabbitai

coderabbitai bot commented Sep 24, 2025

Walkthrough

Adds a Phoenix CI service, converts multiple tests from skip markers to fixture-based runs (adding Docker and NVIDIA fixtures), tightens timeouts/concurrency for evaluations, migrates user-report functions to a grouped API, and refactors core type-instance and conversion checks.

Changes

Cohort / File(s) Summary
CI services
.gitlab-ci.yml
Reordered services in test:python_tests; added arizephoenix/phoenix:latest service with alias phoenix; preserved MySQL service.
Alert triage agent tests
examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py
Replaced skip with integration/test decorators and fixture; read files as UTF-8; run a single JSON input entry; simplified assertions and deterministic checks.
Profiler agent tests
examples/advanced_agents/profiler_agent/tests/test_profiler_agent.py
Added df_path pytest fixture (df_path_fixture); updated test_flow_chart_tool and test_token_usage_tool signatures to accept df_path; removed skip placeholders.
Simple web query eval docs
examples/evaluation_and_profiling/simple_web_query_eval/README.md
Added note advising eval.general.max_concurrency: 1 (YAML override or CLI) for rate limiting.
Simple web query eval tests
examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
Added usefixtures("nvidia_api_key"); reduced endpoint_timeout from 300→30; added override=(('eval.general.max_concurrency','1'),).
SWE-bench eval tests
examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
Changed import to nat_swe_bench.config; removed skip decorator; added @pytest.mark.usefixtures("require_docker").
Object store user report tests (grouped API)
examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
Migrated tests from per-function registration to group-based UserReportConfig and add_function_group/get_function_group; ObjectStoreRef(name=...) → ObjectStoreRef(value=...); object paths made relative ("reports/..."); functions accessed via group getters; removed KeyAlreadyExistsError imports.
Test plugin (Docker fixture)
packages/nvidia_nat_test/src/nat/test/plugin.py
Added session-scoped require_docker fixture that attempts to construct a DockerClient, yields it on success, and skips or raises with a reason when Docker is unavailable.
Core type handling
src/nat/builder/component_utils.py, src/nat/utils/type_converter.py, src/nat/utils/type_utils.py
Changed union and instance checks to use DecomposedType(...).is_instance(...) and get_base_type() where appropriate; _convert uses decomposed.is_instance(data) and captures src_type for warnings, altering how type matches and conversion warnings are determined.
Markdown link checks
ci/markdown-link-check-config.json
Added an ignore pattern for https://arize.com to markdown link checks.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Test as Test
  participant Builder as WorkflowBuilder
  participant Group as FunctionGroup(user_report)
  participant Store as ObjectStore

  Note over Test,Builder: Register grouped user-report functions
  Test->>Builder: add_function_group("user_report", UserReportConfig(...))
  Builder-->>Test: OK

  Note over Test,Group: Retrieve and invoke operations via group
  Test->>Builder: get_function_group("user_report")
  Builder-->>Test: Group
  Test->>Group: get("user_report.put")
  Group-->>Test: put_fn
  Test->>put_fn: put("reports/abc.json", data)
  put_fn->>Store: write "reports/abc.json"
  Store-->>put_fn: success
  put_fn-->>Test: result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and concisely describes the main changes to example tests by removing pytest skip markers and fixing test execution, is written in imperative mood, and meets the length requirement.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

…avid-fix-example-tests

Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv marked this pull request as ready for review September 24, 2025 18:31
@dagardner-nv dagardner-nv requested a review from a team as a code owner September 24, 2025 18:31
@coderabbitai coderabbitai bot added the breaking Breaking change label Sep 24, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/nat/utils/type_converter.py (1)

221-243: isinstance with typing/generic targets can raise TypeError — use DecomposedType(...).is_instance.

isinstance(data, to_type) and isinstance(next_data, to_type) raise TypeError when to_type is Annotated or a parametrized generic (list[str]), including unions that contain such types. Use the base-type-aware check for consistency with the rest of the module.

Apply:

-        # 1) If data is already correct type
-        if isinstance(data, to_type):
+        # 1) If data is already correct type (base-type aware)
+        if DecomposedType(to_type).is_instance(data):
             return data
@@
-                        if isinstance(next_data, to_type):
+                        if DecomposedType(to_type).is_instance(next_data):
                             return next_data
🧹 Nitpick comments (14)
.gitlab-ci.yml (1)

82-82: Make MySQL service alias explicit (the mysql:9.3 tag exists)

  • Replace service entry:
-    - mysql:9.3
+    - name: mysql:9.3
+      alias: mysql
  • (Optional) In your test job, wait for MySQL before running tests:
-    - echo "Running tests"
+    - echo "Waiting for MySQL..."
+    - until (</dev/tcp/mysql/3306) >/dev/null 2>&1; do sleep 1; done
+    - echo "Running tests"
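The /dev/tcp loop above relies on a Bash feature. If a Python-side check is preferred (for example, from a conftest.py), the same readiness wait can be sketched with the standard library. The function name wait_for_port and its default timeout and interval are illustrative assumptions, not part of the repository.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the service is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unresolvable, or timed out: wait and retry.
            time.sleep(interval)
    return False
```

In CI this would be called with the service alias, e.g. wait_for_port("mysql", 3306), before the tests start.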
src/nat/utils/type_converter.py (2)

92-96: Avoid constructing DecomposedType when to_type is None (early-return first).

Minor robustness/clarity: check to_type is None before creating DecomposedType(to_type).

Apply:

-        decomposed = DecomposedType(to_type)
-
-        # 1) If data is already correct type, return it
-        if to_type is None or decomposed.is_instance(data):
-            return data
+        # 1) If to_type is None or data is already correct type, return it
+        if to_type is None:
+            return data
+        decomposed = DecomposedType(to_type)
+        if decomposed.is_instance(data):
+            return data

159-165: Log full stack on conversion failure when not re-raising.

Per guidelines, prefer logger.exception() when swallowing the error to aid diagnosis.

-        except ValueError:
-            logger.warning("Type conversion failed, using original value. From %s to %s", type(data), to_type)
+        except ValueError:
+            logger.exception("Type conversion failed, using original value. From %s to %s", type(data), to_type)
             # Return original data, let downstream code handle it
             return data
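A small, self-contained sketch of the behavior this suggestion relies on: logger.exception() attaches the active exception's traceback to the log record, which logger.warning() does not do unless exc_info=True is passed. Note that logger.exception() logs at ERROR level, so the change also raises the message's severity. convert_or_passthrough is a hypothetical function that only mirrors the shape of the code above.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("type_converter_sketch")


def convert_or_passthrough(data: Any, convert: Callable[[Any], Any]) -> Any:
    """Try `convert(data)`; on ValueError, log with traceback and return `data` unchanged."""
    try:
        return convert(data)
    except ValueError:
        # The error is swallowed, so record the full stack trace (ERROR level, exc_info attached).
        logger.exception("Type conversion failed, using original value. From %s via %s", type(data), convert)
        return data
```

For example, convert_or_passthrough("42", int) returns 42, while convert_or_passthrough("abc", int) returns "abc" and emits one ERROR record carrying the ValueError traceback.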
packages/nvidia_nat_test/src/nat/test/plugin.py (1)

210-223: Ensure Docker is actually reachable; close the client.

Instantiate the client via from_env(), ping the daemon to verify availability, and close the client at the end of the session. The current DockerClient() call neither validates connectivity nor closes the client.

-@pytest.fixture(name="require_docker", scope='session')
-def require_docker_fixture(fail_missing: bool) -> "DockerClient":
+@pytest.fixture(name="require_docker", scope='session')
+def require_docker_fixture(fail_missing: bool) -> "DockerClient":
     """
     Use for integration tests that require Docker to be running.
     """
-    try:
-        from docker.client import DockerClient
-        yield DockerClient()
-    except Exception as e:
+    try:
+        from docker import from_env
+        client = from_env()
+        # Validate connectivity
+        client.ping()
+        try:
+            yield client
+        finally:
+            # Close client at end of session
+            client.close()
+    except Exception as e:
         reason = f"Unable to connect to Docker daemon: {e}"
         if fail_missing:
             raise RuntimeError(reason) from e
         pytest.skip(reason=reason)
examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (2)

50-54: Fix fixture docstring (returns dict of functions, not a function).

Minor clarity improvement.

-async def group(builder):
-    """Pytest fixture to get a function from the builder."""
+async def group(builder):
+    """Pytest fixture to get accessible functions from the user_report group."""

1-1: Typo in filename: consider renaming to ‘test_object_store_example_user_report_tool.py’.

Improves discoverability and avoids confusion.

Would you like me to open a follow-up to rename this file and update any references?

examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py (5)

41-49: Avoid brittle ../../../ jumps; resolve dataset path robustly and assert existence.

Using importlib.resources with upward traversal is fragile. Prefer Path-based resolution and check existence.

Apply:

-    with open(config_file, "r", encoding="utf-8") as file:
+    with open(config_file, "r", encoding="utf-8") as file:
         config = yaml.safe_load(file)
         input_filepath = config["eval"]["general"]["dataset"]["file_path"]

-    input_filepath_abs = importlib.resources.files(package_name).joinpath("../../../../../", input_filepath).absolute()
+    input_filepath_abs = Path(input_filepath)
+    if not input_filepath_abs.is_absolute():
+        input_filepath_abs = (Path.cwd() / input_filepath_abs).resolve()
+    assert input_filepath_abs.exists(), f"Dataset not found: {input_filepath_abs}"

51-52: Guard against empty or malformed datasets before indexing.

Add a precondition to avoid IndexError if the JSON isn’t a non-empty list.

-    input_data = input_data[0]  # Limit to first row for testing
+    assert isinstance(input_data, list) and input_data, "Dataset is empty or wrong format"
+    input_data = input_data[0]  # Limit to first row for testing
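Taken together, the two suggestions above amount to: resolve the dataset path, fail early with a clear message if it is missing, and check that the JSON is a non-empty list before taking the first row. A minimal sketch follows. load_first_entry and its explicit base_dir parameter are hypothetical; resolving against an explicit base (such as the repository root) avoids depending on the process working directory, which Path.cwd() does.

```python
import json
from pathlib import Path
from typing import Any


def load_first_entry(file_path: str, base_dir: Path) -> Any:
    """Resolve `file_path` against `base_dir` (if relative) and return the dataset's first entry."""
    path = Path(file_path)
    if not path.is_absolute():
        path = (base_dir / path).resolve()
    assert path.exists(), f"Dataset not found: {path}"

    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Guard against an empty or non-list dataset before indexing.
    assert isinstance(data, list) and data, "Dataset is empty or wrong format"
    return data[0]  # Limit to the first row for testing
```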

32-36: Add return type annotation to the test function.

Keep type hints consistent per repo standards.

-async def test_full_workflow():
+async def test_full_workflow() -> None:

59-59: Strengthen emptiness check to ignore whitespace-only results.

-    assert len(result) > 0, "Result is empty"
+    assert result and result.strip(), "Result is empty or whitespace"

62-62: Make label match case-insensitive to reduce flakiness.

-    assert input_data['label'] in result
+    assert input_data['label'].lower() in result.lower()
examples/advanced_agents/profiler_agent/tests/test_profiler_agent.py (3)

69-69: Annotate test return type.

Align tests with type-hint policy.

-async def test_flow_chart_tool(df_path: Path):
+async def test_flow_chart_tool(df_path: Path) -> None:

80-80: Annotate test return type.

-async def test_token_usage_tool(df_path: Path):
+async def test_token_usage_tool(df_path: Path) -> None:

51-61: Prefer httpx and tighter exception handling for the Phoenix probe.

Per guidelines, use httpx and catch HTTP-specific errors. Also annotate fixture return type.

-    import requests
+    import httpx
     try:
-        response = requests.get("http://localhost:6006/v1/traces", timeout=5)
-        if response.status_code != 200:
+        with httpx.Client(timeout=5.0) as client:
+            response = client.get("http://localhost:6006/v1/traces")
+        if response.status_code != 200:
             raise ConnectionError(f"Unexpected status code: {response.status_code}")
-    except Exception as e:
+    except (httpx.HTTPError, ConnectionError) as e:
         reason = f"Unable to connect to Phoenix server at http://localhost:6006/v1/traces: {e}"
         if fail_missing:
             raise RuntimeError(reason)
         pytest.skip(reason=reason)
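If adding an httpx dependency to the test fixture is undesirable, the same probe can be sketched with the standard library. This is an illustration under stated assumptions, not the fixture in the PR: probe_http is a hypothetical helper, proxies are disabled so a localhost check is not routed through an http_proxy setting, and non-2xx responses (which urllib raises as HTTPError) are treated the same as connection failures.

```python
import http.client
import urllib.request


def probe_http(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (True, "") if GET `url` answers 200, else (False, reason)."""
    # ProxyHandler({}) disables environment proxies for this opener.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            status = response.status
    except (OSError, http.client.HTTPException) as e:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False, f"Unable to connect to {url}: {e}"
    if status != 200:
        return False, f"Unexpected status code: {status}"
    return True, ""
```

A fixture would call ok, reason = probe_http("http://localhost:6006/v1/traces") and then raise or skip based on fail_missing, keeping the network details out of the fixture body.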
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3a964c and 17bbe8e.

⛔ Files ignored due to path filters (1)
  • examples/advanced_agents/profiler_agent/tests/test_spans.csv is excluded by !**/*.csv
📒 Files selected for processing (11)
  • .gitlab-ci.yml (1 hunks)
  • examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py (1 hunks)
  • examples/advanced_agents/profiler_agent/tests/test_profiler_agent.py (1 hunks)
  • examples/evaluation_and_profiling/simple_web_query_eval/README.md (1 hunks)
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (2 hunks)
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py (3 hunks)
  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (2 hunks)
  • packages/nvidia_nat_test/src/nat/test/plugin.py (2 hunks)
  • src/nat/builder/component_utils.py (1 hunks)
  • src/nat/utils/type_converter.py (2 hunks)
  • src/nat/utils/type_utils.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py
  • src/nat/builder/component_utils.py
  • src/nat/utils/type_utils.py
  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
  • src/nat/utils/type_converter.py
  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/advanced_agents/profiler_agent/tests/test_profiler_agent.py
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py
  • src/nat/builder/component_utils.py
  • src/nat/utils/type_utils.py
  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
  • src/nat/utils/type_converter.py
  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/advanced_agents/profiler_agent/tests/test_profiler_agent.py
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs, or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes, and ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0 and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py
  • src/nat/builder/component_utils.py
  • src/nat/utils/type_utils.py
  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
  • src/nat/utils/type_converter.py
  • examples/evaluation_and_profiling/simple_web_query_eval/README.md
  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/advanced_agents/profiler_agent/tests/test_profiler_agent.py
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:
  • This directory contains example code and usage scenarios for the toolkit. At a minimum, an example should contain a README.md or README.ipynb file.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/ and should be checked into git-lfs.

Files:

  • examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py
  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
  • examples/evaluation_and_profiling/simple_web_query_eval/README.md
  • examples/advanced_agents/profiler_agent/tests/test_profiler_agent.py
  • examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py
  • examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py
src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All importable Python code must live under src/ (or packages//src/)

Files:

  • src/nat/builder/component_utils.py
  • src/nat/utils/type_utils.py
  • src/nat/utils/type_converter.py
src/nat/**/*

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Changes in src/nat should prioritize backward compatibility

Files:

  • src/nat/builder/component_utils.py
  • src/nat/utils/type_utils.py
  • src/nat/utils/type_converter.py

⚙️ CodeRabbit configuration file

This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.

Files:

  • src/nat/builder/component_utils.py
  • src/nat/utils/type_utils.py
  • src/nat/utils/type_converter.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful

Files:

  • src/nat/builder/component_utils.py
  • src/nat/utils/type_utils.py
  • src/nat/utils/type_converter.py
  • packages/nvidia_nat_test/src/nat/test/plugin.py
**/README.@(md|ipynb)

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Ensure READMEs follow the naming convention; avoid deprecated names; use “NeMo Agent Toolkit” (capital T) in headings

Files:

  • examples/evaluation_and_profiling/simple_web_query_eval/README.md
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Importable Python code inside packages must live under packages//src/

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:
  • This directory contains optional plugin packages for the toolkit; each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should have two components (e.g., ~=1.0).
  • Not all packages contain Python code; if they do, they should also contain their own set of tests in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
🧬 Code graph analysis (5)
examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py (1)
examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/register.py (1)
  • AlertTriageAgentWorkflowConfig (46-65)
src/nat/builder/component_utils.py (1)
src/nat/utils/type_utils.py (2)
  • DecomposedType (60-488)
  • is_instance (341-356)
examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (3)
examples/object_store/user_report/src/nat_user_report/user_report_tools.py (1)
  • UserReportConfig (31-43)
src/nat/builder/workflow_builder.py (4)
  • add_function_group (440-463)
  • add_function_group (1134-1135)
  • get_function_group (475-481)
  • get_function_group (1147-1153)
src/nat/builder/function.py (4)
  • get_accessible_functions (485-521)
  • ainvoke (116-117)
  • ainvoke (120-121)
  • ainvoke (124-163)
src/nat/utils/type_converter.py (1)
src/nat/utils/type_utils.py (1)
  • is_instance (341-356)
examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (1)
src/nat/utils/type_utils.py (1)
  • override (56-57)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (16)
.gitlab-ci.yml (1)

83-85: Pin Phoenix image and add readiness wait

  • Replace arizephoenix/phoenix:latest with ${PHOENIX_IMAGE} and pin it to the latest stable tag (v11.36.0):
    variables:
      # Pin to a reproducible Phoenix release
-     PHOENIX_IMAGE: "arizephoenix/phoenix:<PINNED_TAG_OR_DIGEST>"
+     PHOENIX_IMAGE: "arizephoenix/phoenix:version-11.36.0"
  • In your test script, wait for Phoenix to be ready before running tests:
    - echo "Waiting for Phoenix..."
    - until curl -fsS "http://phoenix:${NAT_CI_PHOENIX_PORT:-6006}/" >/dev/null; do sleep 1; done
    - echo "Running tests"

examples/evaluation_and_profiling/simple_web_query_eval/README.md (1)

89-90: Concurrency note addition looks good.

Clear, actionable guidance; matches the CLI usage pattern shown in tests.

src/nat/utils/type_converter.py (1)

201-212: Good: preserve original source type for indirect-conversion warnings.

Capturing src_type = type(data) and reusing it in warnings avoids spurious type re-evaluation.

src/nat/utils/type_utils.py (1)

356-356: LGTM: is_instance now respects base type.

Using get_base_type().root fixes checks for annotated/async-generator-wrapped types.

src/nat/builder/component_utils.py (1)

161-162: LGTM: union-branch check now base-type aware.

Switching to DecomposedType(arg).is_instance(value) aligns with the new type utilities and avoids false negatives for annotated types.

examples/evaluation_and_profiling/swe_bench/tests/test_swe_bench_eval.py (2)

26-26: LGTM: import path update.

Using nat_swe_bench.config is appropriate for config-only access.


82-83: LGTM: require_docker fixture usage.

Makes the Docker dependency explicit and skip/fail behavior configurable via plugin options.

packages/nvidia_nat_test/src/nat/test/plugin.py (1)

21-23: TYPE_CHECKING import is appropriate.

Keeps runtime deps optional while preserving types.

examples/evaluation_and_profiling/simple_web_query_eval/tests/test_simple_web_query_eval.py (2)

105-106: LGTM: API key requirement expressed via fixture.

Aligns with plugin-provided nvidia_api_key and removes skip markers.


127-130: No change needed: EvaluationRunConfig includes an override: tuple[tuple[str, str], ...] field.
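For readers unfamiliar with the override field: each entry is a (dotted path, value) pair applied on top of the loaded YAML config, which is how the test forces eval.general.max_concurrency to 1 without editing the config file. The helper below is a hypothetical sketch of that mechanism on a plain dict, not NAT's implementation; in particular, the value stays a string here, whereas a real config loader would still need to validate and coerce it to the field's type.

```python
import copy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: tuple[tuple[str, str], ...]) -> dict[str, Any]:
    """Return a copy of `config` with each (dotted.path, value) override applied."""
    result = copy.deepcopy(config)
    for dotted_path, value in overrides:
        *parents, leaf = dotted_path.split(".")
        node = result
        for key in parents:
            # Create intermediate mappings that do not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result
```

Usage mirroring the test: apply_overrides(cfg, (("eval.general.max_concurrency", "1"),)) leaves cfg untouched and returns a copy with only that key replaced.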

examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (1)

33-39: LGTM: function group config and includes.

Group-based registration with include is coherent and uses new ObjectStoreRef(value=...).

examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py (3)

55-56: LGTM: single-run workflow and typed result.

Using to_type=str is clear and avoids downstream type surprises.


65-65: LGTM: semantic assertion.

Checking for “root cause category” in the output is a meaningful invariant.


32-33: LGTM: integration mark and nvidia_api_key fixture usage validated.

examples/advanced_agents/profiler_agent/tests/test_profiler_agent.py (2)

68-89: Async test support confirmed: pytest-asyncio is installed and configured (asyncio_mode = "auto", pytest-asyncio==0.24.*).


63-66: Approve: fixture provides a stable path to test data.

@willkill07 willkill07 removed the breaking Breaking change label Sep 24, 2025
…ple_user_report_tool.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Will Killian <2007799+willkill07@users.noreply.github.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (1)

1-1: Fix filename typo: “objext” → “object”.

Rename the file to test_object_store_example_user_report_tool.py for clarity and professionalism.

🧹 Nitpick comments (3)
examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (3)

50-53: Rename fixture for clarity.

The group fixture actually returns a dict of functions, not the group instance. Consider renaming it to user_report_functions (or similar) to reduce confusion.


28-40: Add return type hints on fixtures (pyright treats warnings as errors).

Annotate fixture return types to satisfy the project’s typing standards.

-async def builder():
+async def builder() -> WorkflowBuilder:
@@
-async def group(builder):
+async def group(builder) -> dict[str, Function]:

Add the missing import at the top of the file:

from nat.builder.function import Function

Also applies to: 50-53


60-71: Compare JSON structures, not serialized strings.

Avoid brittleness from string serialization differences.

-        assert result == json.dumps(test_report)
+        assert json.loads(result) == test_report
@@
-        assert result == json.dumps(test_report)
+        assert json.loads(result) == test_report

Also applies to: 81-85
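The brittleness can be shown with the standard library alone. The `result` string below is a hypothetical tool output that differs from `json.dumps(test_report)` only in key order and whitespace:

```python
import json

test_report = {"user": "alice", "status": "ok"}

# Same data, different serialization: key order and separators differ.
result = json.dumps({"status": "ok", "user": "alice"}, separators=(",", ":"))

string_equal = result == json.dumps(test_report)     # brittle: compares formatting
structure_equal = json.loads(result) == test_report  # robust: compares parsed data
```

Here `string_equal` is False while `structure_equal` is True, which is why the suggested assertion parses `result` first.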

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17bbe8e and eff57c1.

📒 Files selected for processing (1)
  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.

Files:

  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
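The exception-handling rule above can be illustrated with the standard `logging` module. The function names are hypothetical:

```python
import logging

logger = logging.getLogger("example")


def reraise(value: str) -> int:
    try:
        return int(value)
    except ValueError:
        # Re-raising: log without a traceback; the caller will see the original one.
        logger.error("Could not parse %r", value)
        raise  # bare raise preserves the original stack trace


def swallow(value: str) -> int:
    try:
        return int(value)
    except ValueError:
        # Not re-raising: logger.exception() records the full traceback exactly once.
        logger.exception("Could not parse %r; defaulting to 0", value)
        return 0
```

`logger.exception()` is `logger.error()` with `exc_info=True`. Using it in both places would print the same traceback twice once the caller also logs or crashes.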
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
  • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
  • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions
  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes and don't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file. Words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.
  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0 and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up to date whenever a file is changed.

Files:

  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*: This directory contains example code and usage scenarios for the toolkit. At a minimum, an example should contain a README.md or README.ipynb file.

  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/ and should be checked into git-lfs.

Files:

  • examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py
🧬 Code graph analysis (1)
examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (4)
examples/object_store/user_report/src/nat_user_report/user_report_tools.py (1)
  • UserReportConfig (31-43)
src/nat/builder/workflow_builder.py (4)
  • add_function_group (440-463)
  • add_function_group (1134-1135)
  • get_function_group (475-481)
  • get_function_group (1147-1153)
src/nat/builder/function.py (4)
  • get_accessible_functions (485-521)
  • ainvoke (116-117)
  • ainvoke (120-121)
  • ainvoke (124-163)
src/nat/object_store/models.py (1)
  • ObjectStoreItem (21-38)
🔇 Additional comments (5)
examples/object_store/user_report/tests/test_objext_store_example_user_report_tool.py (5)

33-39: LGTM: Correct migration to grouped function config.

Using ObjectStoreRef(value="test_object_store"), include list, and per-op descriptions aligns with the new API.


116-127: LGTM: Idempotent put behavior matches new API (string status instead of exception).

The “already exists” response aligns with the grouped-tooling behavior change.


179-181: Consistent object keys (no leading slash) — resolved.

The earlier inconsistency was fixed; keys now use "reports/..." everywhere, including deletion checks.

Also applies to: 195-196


206-241: LGTM: Solid end-to-end CRUD workflow test.

Covers put/get/update/get/delete/get(not found) with clear assertions.
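The sequence this test walks through can be sketched against a hypothetical in-memory store. `InMemoryReportStore`, its method names, the status strings and the example key are all illustrative assumptions, not the example's tool API. The sketch also mirrors the idempotent put behavior noted above by returning a status string rather than raising when the key already exists:

```python
class InMemoryReportStore:
    """Hypothetical dict-backed store mirroring put/get/update/delete semantics."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def put(self, key: str, data: str) -> str:
        if key in self._items:
            return f"Report {key} already exists"  # status string, no exception
        self._items[key] = data
        return f"Report {key} stored"

    def update(self, key: str, data: str) -> str:
        self._items[key] = data  # upsert semantics assumed
        return f"Report {key} updated"

    def get(self, key: str) -> str:
        return self._items.get(key, f"Report {key} not found")

    def delete(self, key: str) -> str:
        if self._items.pop(key, None) is None:
            return f"Report {key} not found"
        return f"Report {key} deleted"


store = InMemoryReportStore()
key = "reports/alice/latest.json"  # hypothetical key; no leading slash, as in the test
steps = [
    store.put(key, '{"v": 1}'),
    store.put(key, '{"v": 1}'),  # second put reports a status instead of raising
    store.get(key),
    store.update(key, '{"v": 2}'),
    store.get(key),
    store.delete(key),
    store.get(key),
]
```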


59-61: Async tests correctly configured via pytest-asyncio.
pytest-asyncio==0.24.* is installed, and asyncio_mode = "auto" in pyproject.toml enables async def tests to run without explicit markers.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…toolkit into david-fix-example-tests

Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>

@coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
ci/markdown-link-check-config.json (1)

19-20: Broaden the arize.com ignore pattern to cover subdomains
Many links use hosts such as https://www.arize.com or https://phoenix.arize.com. Because the current pattern is anchored only at the start, subpaths like /docs/phoenix already match; update the pattern to also allow an optional subdomain:

-            "pattern": "^https://arize\\.com"
+            "pattern": "^https://([^.]+\\.)?arize\\.com"

Run the existing JSON validation and link-scan scripts to confirm coverage.
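Coverage of the proposed pattern can also be spot-checked in Python before running the real scripts. The sketch assumes Python's `re` and the JavaScript `RegExp` used by markdown-link-check agree on this simple pattern, which is worth confirming for more complex ones. The config fragment and sample URLs are illustrative, not the real file contents:

```python
import json
import re

# Fragment in the markdown-link-check "ignorePatterns" format, using the proposed pattern.
# The JSON escape "\\." decodes to the regex "\." after json.loads.
config = json.loads(r'{"ignorePatterns": [{"pattern": "^https://([^.]+\\.)?arize\\.com"}]}')
patterns = [re.compile(entry["pattern"]) for entry in config["ignorePatterns"]]


def is_ignored(url: str) -> bool:
    """True if any ignore pattern matches the URL."""
    return any(p.search(url) for p in patterns)
```

Two limits remain. `[^.]+` covers exactly one subdomain label, so a.b.arize.com is not ignored. The pattern is anchored only at the start, so a host such as arize.company would also be ignored.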

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eff57c1 and 2c29ac6.

📒 Files selected for processing (1)
  • ci/markdown-link-check-config.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (1)
ci/markdown-link-check-config.json (1)

17-18: Fix JSON syntax: add missing comma between objects

Without a comma after the MySQL pattern object, this JSON is invalid and will break the link checker in CI.

Apply this diff:

-        }
+        },

Likely an incorrect or invalid review comment.

@dagardner-nv
Contributor Author

/merge

@rapids-bot bot merged commit 424bc38 into NVIDIA:develop on Sep 24, 2025
17 checks passed
@dagardner-nv deleted the david-fix-example-tests branch on September 24, 2025 20:31

Labels

bug (Something isn't working), non-breaking (Non-breaking change)

Development

Successfully merging this pull request may close these issues.

swe_bench evaluation example broken in Python 3.13
