
Add E2E tests for Simple Calculator Observability example #1019

Merged
rapids-bot[bot] merged 7 commits into NVIDIA:release/1.3 from dagardner-nv:david-observe-simple-calc-e2e
Oct 16, 2025

Conversation

@dagardner-nv
Contributor

@dagardner-nv dagardner-nv commented Oct 16, 2025

Description

  • This example contains 8 distinct workflows, each requiring different keys/services. This PR adds E2E tests for the following workflows:

    • Weave
    • Phoenix
    • Otel file
  • Remove unused LLM from configs

  • Miscellaneous documentation improvements

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • Documentation

    • Broadened observability docs with clearer headings, added platform sections and explicit integration links.
  • Changes

    • Example configs updated to use NIM as the sole LLM provider (OpenAI entries removed).
  • Tests

    • Added integration tests covering Weave, Phoenix, and OpenTelemetry observability backends and supporting test fixtures for external service setup.

Signed-off-by: David Gardner <dagardner@nvidia.com>
…to david-observe-simple-calc-e2e

Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
…roduction Monitoring Platforms' heading, implying that Phoenix is only for local dev

Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv dagardner-nv self-assigned this Oct 16, 2025
@dagardner-nv dagardner-nv requested a review from a team as a code owner October 16, 2025 16:32
@dagardner-nv dagardner-nv added the improvement (Improvement to existing functionality) and non-breaking (Non-breaking change) labels Oct 16, 2025
@coderabbitai

coderabbitai bot commented Oct 16, 2025

Walkthrough

Removed OpenAI LLM entries from observability example configs, expanded README platform headings/links, and added integration tests and fixtures for Weave, Phoenix, and OpenTelemetry observability backends plus test-support fixtures for WANDB/weave.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Documentation: examples/observability/simple_calculator_observability/README.md | Broadened platform sections and headings, removed specific “(Local Development)” qualifiers, converted Langfuse mention to an explicit hyperlink, and adjusted tracing/platform setup phrasing. |
| Config Files – OpenAI LLM Removal: examples/observability/simple_calculator_observability/configs/config-{catalyst,galileo,langfuse,langsmith,patronus,phoenix,weave}.yml | Removed openai_llm block from llms (previously _type: openai, model_name: gpt-3.5-turbo, max_tokens: 2000); nim_llm remains as the sole LLM. |
| Integration Tests: examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py | Added new test module with fixtures (config_dir, nvidia_api_key, question, expected_answer, weave_project_name) and three integration tests (test_weave_full_workflow, test_phoenix_full_workflow, test_otel_full_workflow) validating workflow execution and trace emission. |
| Test Infrastructure: packages/nvidia_nat_test/src/nat/test/plugin.py | Added session-scoped fixtures: wandb_api_key (ensures WANDB_API_KEY) and weave (attempts to import weave, skipping or failing based on configuration); added types import for fixture typing. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Test as Test Suite
    participant Config as YAML Config Loader
    participant Workflow as Calculator Workflow
    participant Tracer as Observability Backend

    Test->>Config: load config (weave/phoenix/otel)
    Config-->>Test: config ready
    Test->>Tracer: init tracer/project (if needed)
    Test->>Workflow: run_workflow(question)
    activate Workflow
    Workflow->>Workflow: execute steps / call LLM (nim_llm)
    Workflow-->>Test: return result
    deactivate Workflow
    Workflow->>Tracer: emit traces/events
    Test->>Tracer: fetch/validate traces
    Tracer-->>Test: traces validated
    Test->>Tracer: cleanup (project/trace artifacts)
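
The flow in the diagram can be sketched in a self-contained form. This is only an illustration under stated assumptions: `FileTracer`, the toy `run_workflow`, and `run_e2e_check` are hypothetical stand-ins written for this sketch, not the toolkit's APIs or the tests added in this PR.

```python
import json
import tempfile
from pathlib import Path


class FileTracer:
    """Hypothetical stand-in for an observability backend; appends spans to a file."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def emit(self, name: str) -> None:
        # One JSON object per line, so traces can be read back incrementally.
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"span": name}) + "\n")

    def fetch(self) -> list[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


def run_workflow(question: str, tracer: FileTracer) -> str:
    """Toy workflow (assumption): multiplies the two integers found in the question."""
    a, b = (int(tok) for tok in question.split() if tok.isdigit())
    tracer.emit("calculator_multiply")
    return str(a * b)


def run_e2e_check(question: str, expected: str) -> bool:
    # The temporary directory is removed on exit, which plays the role of cleanup.
    with tempfile.TemporaryDirectory() as tmp:
        tracer = FileTracer(Path(tmp) / "traces.jsonl")  # init tracer
        result = run_workflow(question, tracer)          # run the workflow
        spans = tracer.fetch()                           # fetch traces
        traced = any(s["span"] == "calculator_multiply" for s in spans)
        return expected in result and traced             # validate result and traces
```

Checking both the returned answer and the emitted trace mirrors the two validations the diagram shows: the workflow result and the tracer contents.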

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "Add E2E tests for Simple Calculator Observability example" is concise at 57 characters, well within the ~72 character limit, and uses proper imperative mood with the verb "Add." The title is fully related to the main objective of the changeset, which is introducing end-to-end tests for three observability workflows (Weave, Phoenix, and OpenTelemetry file exporter). While the PR also includes secondary changes such as removing unused OpenAI LLM configurations and documentation improvements, these are supporting modifications to the primary goal of adding comprehensive test coverage for the example. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17615ba and 3e40a50.

📒 Files selected for processing (1)
  • examples/observability/simple_calculator_observability/README.md (9 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/README.@(md|ipynb)

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Ensure READMEs follow the naming convention; avoid deprecated names; use “NeMo Agent Toolkit” (capital T) in headings

Files:

  • examples/observability/simple_calculator_observability/README.md
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
    • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
    • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.
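
The two exception-handling rules above can be illustrated with a minimal sketch. The function names `parse_port` and `parse_port_or_default` are hypothetical and chosen only for this example.

```python
import logging

logger = logging.getLogger(__name__)


def parse_port(raw: str) -> int:
    """Re-raise case: log with logger.error() and use a bare raise."""
    try:
        return int(raw)
    except ValueError:
        # logger.error() records only the message. The traceback is reported once,
        # by whichever caller finally handles the re-raised exception.
        logger.error("Invalid port value: %r", raw)
        raise


def parse_port_or_default(raw: str, default: int = 8080) -> int:
    """Swallow case: log with logger.exception() so the stack trace is kept."""
    try:
        return int(raw)
    except ValueError:
        # The exception stops here, so this is the only place the traceback
        # can be captured.
        logger.exception("Invalid port value %r, falling back to %d", raw, default)
        return default
```

A bare `raise` keeps the original traceback attached to the exception, whereas `raise e` would add the re-raise site to it.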

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • examples/observability/simple_calculator_observability/README.md
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:

  • This directory contains example code and usage scenarios for the toolkit, at a minimum an example should contain a README.md or file README.ipynb.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/, and should be checked into git-lfs.

Files:

  • examples/observability/simple_calculator_observability/README.md
🪛 LanguageTool
examples/observability/simple_calculator_observability/README.md

[grammar] ~216-~216: There might be a mistake here.
Context: ....yml| Phoenix | Tracing with Phoenix | |config-otel-file.yml` | File Export |...

(QB_NEW_EN)


[grammar] ~217-~217: There might be a mistake here.
Context: ... tracing for development and debugging | | config-langfuse.yml | Langfuse | Lan...

(QB_NEW_EN)


[grammar] ~218-~218: There might be a mistake here.
Context: ...se | Langfuse monitoring and analytics | | config-langsmith.yml | LangSmith | L...

(QB_NEW_EN)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
examples/observability/simple_calculator_observability/README.md (1)

20-20: Use the correct product name capitalization

Prefer “NeMo Agent Toolkit” (capital T), not “NeMo Agent toolkit.”

Apply this diff:

-This example demonstrates how to implement **observability and tracing capabilities** using the NVIDIA NeMo Agent toolkit. You'll learn to monitor, trace, and analyze your AI agent's behavior in real-time using the Simple Calculator workflow.
+This example demonstrates how to implement **observability and tracing capabilities** using the NVIDIA NeMo Agent Toolkit. You'll learn to monitor, trace, and analyze your AI agent's behavior in real-time using the Simple Calculator workflow.

As per coding guidelines

packages/nvidia_nat_test/src/nat/test/plugin.py (1)

238-250: Move return statement to else block.

The return weave statement at line 245 should be in an else block to clarify that it only executes when the import succeeds, improving code structure and addressing the static analysis hint.

Apply this diff:

 @pytest.fixture(name="weave", scope='session')
 def require_weave_fixture(fail_missing: bool) -> types.ModuleType:
     """
     Use for integration tests that require Weave to be running.
     """
     try:
         import weave
-        return weave
     except Exception as e:
         reason = "Weave must be installed to run weave based tests"
         if fail_missing:
             raise RuntimeError(reason) from e
         pytest.skip(reason=reason)
+    else:
+        return weave
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9cafe4 and 17615ba.

📒 Files selected for processing (10)
  • examples/observability/simple_calculator_observability/README.md (3 hunks)
  • examples/observability/simple_calculator_observability/configs/config-catalyst.yml (0 hunks)
  • examples/observability/simple_calculator_observability/configs/config-galileo.yml (0 hunks)
  • examples/observability/simple_calculator_observability/configs/config-langfuse.yml (0 hunks)
  • examples/observability/simple_calculator_observability/configs/config-langsmith.yml (0 hunks)
  • examples/observability/simple_calculator_observability/configs/config-patronus.yml (0 hunks)
  • examples/observability/simple_calculator_observability/configs/config-phoenix.yml (1 hunks)
  • examples/observability/simple_calculator_observability/configs/config-weave.yml (1 hunks)
  • examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (1 hunks)
  • packages/nvidia_nat_test/src/nat/test/plugin.py (2 hunks)
💤 Files with no reviewable changes (5)
  • examples/observability/simple_calculator_observability/configs/config-catalyst.yml
  • examples/observability/simple_calculator_observability/configs/config-langsmith.yml
  • examples/observability/simple_calculator_observability/configs/config-langfuse.yml
  • examples/observability/simple_calculator_observability/configs/config-galileo.yml
  • examples/observability/simple_calculator_observability/configs/config-patronus.yml
🧰 Additional context used
📓 Path-based instructions (10)
**/*.{yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

In workflow/config YAML, set llms.._type: nat_test_llm to stub responses.

Files:

  • examples/observability/simple_calculator_observability/configs/config-phoenix.yml
  • examples/observability/simple_calculator_observability/configs/config-weave.yml
**/*.{py,yaml,yml}

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.{py,yaml,yml}: Configure response_seq as a list of strings; values cycle per call, and [] yields an empty string.
Configure delay_ms to inject per-call artificial latency in milliseconds for nat_test_llm.
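
The semantics described for `response_seq` and `delay_ms` can be illustrated with a small stand-in. This is not the toolkit's `nat_test_llm` implementation; `CyclingStubLLM` and its `invoke` method are hypothetical and only mirror the behavior stated in the rule.

```python
import itertools
import time


class CyclingStubLLM:
    """Hypothetical stub that cycles through canned responses, one per call."""

    def __init__(self, response_seq: list[str], delay_ms: int = 0) -> None:
        # An empty sequence has nothing to cycle, so it is handled separately.
        self._responses = itertools.cycle(list(response_seq)) if response_seq else None
        self._delay_s = delay_ms / 1000.0

    def invoke(self, prompt: str) -> str:
        if self._delay_s > 0:
            time.sleep(self._delay_s)  # artificial per-call latency
        if self._responses is None:
            return ""  # [] yields an empty string
        return next(self._responses)
```

With `response_seq=["a", "b"]`, three consecutive calls return `"a"`, `"b"`, and then `"a"` again, regardless of the prompt.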

Files:

  • examples/observability/simple_calculator_observability/configs/config-phoenix.yml
  • examples/observability/simple_calculator_observability/configs/config-weave.yml
  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py
**/configs/**

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Configuration files consumed by code must be stored next to that code in a configs/ folder

Files:

  • examples/observability/simple_calculator_observability/configs/config-phoenix.yml
  • examples/observability/simple_calculator_observability/configs/config-weave.yml
**/*

⚙️ CodeRabbit configuration file

**/*: # Code Review Instructions

  • Ensure the code follows best practices and coding standards.
    • For Python code, follow PEP 20 and PEP 8 for style guidelines.
  • Check for security vulnerabilities and potential issues.
    • Python methods should use type hints for all parameters and return values.
    Example:
    def my_function(param1: int, param2: str) -> bool:
        pass
  • For Python exception handling, ensure proper stack trace preservation:
    • When re-raising exceptions: use bare raise statements to maintain the original stack trace,
      and use logger.error() (not logger.exception()) to avoid duplicate stack trace output.
    • When catching and logging exceptions without re-raising: always use logger.exception()
      to capture the full stack trace information.

Documentation Review Instructions

  • Verify that documentation and comments are clear and comprehensive.
  • Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
  • Verify that the documentation doesn't contain any offensive or outdated terms.
  • Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any words listed in the ci/vale/styles/config/vocabularies/nat/reject.txt file, words that might appear to be spelling mistakes but are listed in the ci/vale/styles/config/vocabularies/nat/accept.txt file are OK.

Misc.

  • All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
  • Confirm that copyright years are up-to date whenever a file is changed.

Files:

  • examples/observability/simple_calculator_observability/configs/config-phoenix.yml
  • examples/observability/simple_calculator_observability/configs/config-weave.yml
  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/observability/simple_calculator_observability/README.md
  • examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py
examples/**/*

⚙️ CodeRabbit configuration file

examples/**/*:

  • This directory contains example code and usage scenarios for the toolkit, at a minimum an example should contain a README.md or file README.ipynb.
  • If an example contains Python code, it should be placed in a subdirectory named src/ and should contain a pyproject.toml file. Optionally, it might also contain scripts in a scripts/ directory.
  • If an example contains YAML files, they should be placed in a subdirectory named configs/.
  • If an example contains sample data files, they should be placed in a subdirectory named data/, and should be checked into git-lfs.

Files:

  • examples/observability/simple_calculator_observability/configs/config-phoenix.yml
  • examples/observability/simple_calculator_observability/configs/config-weave.yml
  • examples/observability/simple_calculator_observability/README.md
  • examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/nat-test-llm.mdc)

**/*.py: Programmatic use: create TestLLMConfig(response_seq=[...], delay_ms=...), add with builder.add_llm("", cfg).
When retrieving the test LLM wrapper, use builder.get_llm(name, wrapper_type=LLMFrameworkEnum.) and call the framework’s method (e.g., ainvoke, achat, call).

**/*.py: In code comments/identifiers use NAT abbreviations as specified: nat for API namespace/CLI, nvidia-nat for package name, NAT for env var prefixes; do not use these abbreviations in documentation
Follow PEP 20 and PEP 8; run yapf with column_limit=120; use 4-space indentation; end files with a single trailing newline
Run ruff check --fix as linter (not formatter) using pyproject.toml config; fix warnings unless explicitly ignored
Respect naming: snake_case for functions/variables, PascalCase for classes, UPPER_CASE for constants
Treat pyright warnings as errors during development
Exception handling: use bare raise to re-raise; log with logger.error() when re-raising to avoid duplicate stack traces; use logger.exception() when catching without re-raising
Provide Google-style docstrings for every public module, class, function, and CLI command; first line concise and ending with a period; surround code entities with backticks
Validate and sanitize all user input, especially in web or CLI interfaces
Prefer httpx with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use async/await for I/O-bound work; profile CPU-heavy paths with cProfile or mprof before optimizing; cache expensive computations with functools.lru_cache or external cache; leverage NumPy vectorized operations when beneficial

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
  • examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py
packages/*/src/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Importable Python code inside packages must live under packages//src/

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
{src/**/*.py,packages/*/src/**/*.py}

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

All public APIs must have Python 3.11+ type hints on parameters and return values; prefer typing/collections.abc abstractions; use typing.Annotated when useful

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
packages/**/*

⚙️ CodeRabbit configuration file

packages/**/*:

  • This directory contains optional plugin packages for the toolkit, each should contain a pyproject.toml file.
  • The pyproject.toml file should declare a dependency on nvidia-nat or another package with a name starting with nvidia-nat-. This dependency should be declared using ~=<version>, and the version should be a two digit version (ex: ~=1.0).
  • Not all packages contain Python code, if they do they should also contain their own set of tests, in a tests/ directory at the same level as the pyproject.toml file.

Files:

  • packages/nvidia_nat_test/src/nat/test/plugin.py
**/README.@(md|ipynb)

📄 CodeRabbit inference engine (.cursor/rules/general.mdc)

Ensure READMEs follow the naming convention; avoid deprecated names; use “NeMo Agent Toolkit” (capital T) in headings

Files:

  • examples/observability/simple_calculator_observability/README.md
🪛 Ruff (0.14.0)
packages/nvidia_nat_test/src/nat/test/plugin.py

245-245: Consider moving this statement to an else block

(TRY300)

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py

48-48: Unused function argument: wandb_api_key

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: CI Pipeline / Check
🔇 Additional comments (12)
examples/observability/simple_calculator_observability/README.md (4)

56-77: Phoenix section rename looks good

Clear, concise, and consistent with the rest of the doc.


79-100: File-based tracing header update is good

Instructions are straightforward; no issues spotted.


103-103: Langfuse hyperlink update LGTM

Improves readability and navigability.


212-224: All referenced config files exist in examples/observability/simple_calculator_observability/configs

Verified that the directory contains exactly the eight YAML files listed in the table.

examples/observability/simple_calculator_observability/configs/config-weave.yml (1)

42-42: LGTM! Removal of unused OpenAI LLM configuration.

The removal of the openai_llm configuration aligns with the PR objectives. The workflow correctly references nim_llm (line 52), which remains configured.

packages/nvidia_nat_test/src/nat/test/plugin.py (1)

227-235: LGTM! Follows established fixture pattern.

The wandb_api_key fixture correctly follows the pattern established by other API key fixtures in this file, using require_env_variables to handle missing environment variables appropriately.

examples/observability/simple_calculator_observability/configs/config-phoenix.yml (1)

61-61: LGTM! Consistent removal of unused OpenAI LLM.

This change mirrors the cleanup in config-weave.yml. The workflow correctly uses nim_llm (line 71), which remains properly configured.

examples/observability/simple_calculator_observability/tests/test_simple_calc_observability.py (5)

16-24: LGTM! Standard test imports.

The imports are appropriate for the test scenarios and follow best practices.


27-44: LGTM! Well-structured test fixtures.

The fixtures for configuration directory, API key, question, and expected answer are appropriately scoped and follow pytest best practices.


64-71: LGTM! Well-structured Weave integration test.

The test correctly loads the config, overrides the project name with the test fixture, and validates the workflow. The use of the @pytest.mark.integration and @pytest.mark.usefixtures("wandb_api_key") decorators is appropriate.


74-80: LGTM! Phoenix integration test follows correct pattern.

The test appropriately loads the Phoenix configuration, overrides the trace endpoint URL with the fixture value, and validates the workflow execution.


84-107: LGTM! Comprehensive OTEL validation.

The test correctly:

  • Creates a temporary output file for OTEL traces
  • Loads and configures the OTEL file exporter
  • Validates that traces were generated
  • Verifies that the expected calculator_multiply function appears in the trace ancestry

The validation logic is thorough and appropriate for an E2E test.
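
A trace-file check of the kind described above might look like the following sketch. It assumes the exporter writes one JSON object per line, which this conversation does not confirm, and it searches every key and string value rather than relying on a particular span schema. `contains_value` and `trace_file_mentions` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any


def contains_value(node: Any, needle: str) -> bool:
    """Return True if `needle` occurs in any key or string value of parsed JSON."""
    if isinstance(node, str):
        return needle in node
    if isinstance(node, dict):
        return any(contains_value(k, needle) or contains_value(v, needle) for k, v in node.items())
    if isinstance(node, list):
        return any(contains_value(item, needle) for item in node)
    return False  # numbers, booleans, and null cannot contain the name


def trace_file_mentions(path: Path, needle: str) -> bool:
    """Check that the trace file is non-empty and mentions `needle` in some record."""
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    if not records:
        return False  # no traces were written at all
    return any(contains_value(record, needle) for record in records)
```

A test could call `trace_file_mentions(output_file, "calculator_multiply")` after the workflow run. A schema-aware check would be stricter if the exact field holding the function ancestry is known.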

Member

@willkill07 willkill07 left a comment


Approving, but changes need to be made to the README

Signed-off-by: David Gardner <dagardner@nvidia.com>
@dagardner-nv
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 9536112 into NVIDIA:release/1.3 Oct 16, 2025
17 checks passed
@dagardner-nv dagardner-nv deleted the david-observe-simple-calc-e2e branch October 16, 2025 17:20

Labels

improvement (Improvement to existing functionality), non-breaking (Non-breaking change)

2 participants