Fix/agno flaky test release 1.4#1383
The test was expecting get_running_loop() to be called exactly once, but on arm64 it gets called 3 times (vs 1 on amd64). This is due to pydantic-core's C extensions behaving differently across architectures. Changed the assertion to check for known values [1, 3] instead of assert_called_once(). Added comments explaining the difference so future maintainers know this is expected behavior.
Fixes: CI Pipeline / Test (arm64, 3.12) failure in PR #1349
Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
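The assertion change described above can be sketched roughly as follows. This is a minimal illustration, not the actual test in the repository: `call_wrapped_tool` is a hypothetical stand-in for the code under test, and patching `asyncio.get_running_loop` directly is an assumption about how the mock is set up.

```python
import asyncio
from unittest.mock import MagicMock, patch


def call_wrapped_tool(times: int) -> None:
    """Hypothetical stand-in for the code under test: looks up the running loop `times` times."""
    for _ in range(times):
        asyncio.get_running_loop()


def check_call_count(times: int) -> int:
    """Patch `asyncio.get_running_loop` and accept any of the known call counts."""
    with patch("asyncio.get_running_loop", return_value=MagicMock()) as mock_get_loop:
        call_wrapped_tool(times)
        # assert_called_once() would fail when the count is 3, so both known values are accepted.
        assert mock_get_loop.call_count in [1, 3]
        return mock_get_loop.call_count
```

A membership check like this tolerates the observed variation but still fails on any count outside the known set, which is why later revisions of the PR revisit it.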
Walkthrough
Updated a test to perform architecture- and Python-version-aware validation.
…ions Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## release/1.4 #1383 +/- ##
============================================
Coverage 74.98% 74.98%
============================================
Files 552 552
Lines 38769 38769
============================================
Hits 29072 29072
Misses 9697 9697
The test was failing on arm64 (aarch64) Python 3.11 because it expected 3 calls to get_running_loop() but only 1 call was made. This is due to pydantic-core C extension differences across architectures and Python versions.
Changes:
- Use platform.machine() and platform.python_version_tuple() for detection
- Define architecture + Python version specific expected call counts
- Confirmed: aarch64 Py3.11 → 1 call (CI job 60115041673)
- Confirmed: x86_64 Py3.11/3.13 → 1 call (CI jobs 60115041674, 60115041704)
- Accept [1, 3] for unconfirmed combinations to prevent future flakiness
- Add clear documentation with CI job references for traceability
This addresses willkill07's review feedback to use platform detection for more precise test assertions.
Fixes: CI Pipeline / Test (arm64, 3.11) failure in PR #1349
Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
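The platform-aware lookup listed in the changes above might look something like the sketch below. The confirmed combinations come from the commit message; the table name `EXPECTED_CALL_COUNTS`, the function `expected_call_counts`, and its optional parameters are illustrative assumptions, not the code in the PR.

```python
import platform
from typing import Optional

# Confirmed combinations are taken from the commit message; the table layout is an assumption.
EXPECTED_CALL_COUNTS: dict[tuple[str, str, str], list[int]] = {
    ("aarch64", "3", "11"): [1],
    ("x86_64", "3", "11"): [1],
    ("x86_64", "3", "13"): [1],
}
# Unconfirmed combinations accept either known value.
FALLBACK_CALL_COUNTS = [1, 3]


def expected_call_counts(machine: Optional[str] = None,
                         version: Optional[tuple[str, str, str]] = None) -> list[int]:
    """Return the acceptable call counts for a machine and Python version (defaults to the host)."""
    # Normalize the machine name so that "AARCH64" and "aarch64" match the same entry.
    machine = (machine or platform.machine()).lower()
    # python_version_tuple() returns strings, so the keys above are strings as well.
    py_major, py_minor, _ = version or platform.python_version_tuple()
    return EXPECTED_CALL_COUNTS.get((machine, py_major, py_minor), FALLBACK_CALL_COUNTS)
```

A test would then assert `mock.call_count in expected_call_counts()`. Passing the machine and version explicitly is only there to make the lookup easy to exercise without changing hosts.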
Actionable comments posted: 0
🧹 Nitpick comments (1)
packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1)
189-230: Well-structured architecture-aware test logic.
The implementation correctly addresses the flaky test issue with:
- Proper normalization of machine name to lowercase
- Consistent use of string types for version tuple elements
- Defensive fallback for unknown platform/version combinations
- Actionable error message with guidance for maintainers
The CI job references in comments provide good traceability.
Minor simplification: Consider tuple unpacking for the version extraction.
♻️ Optional simplification
- py_version_tuple = platform.python_version_tuple()
- py_major, py_minor = py_version_tuple[0], py_version_tuple[1]
+ py_major, py_minor, _ = platform.python_version_tuple()
🔇 Additional comments (1)
packages/nvidia_nat_agno/tests/test_tool_wrapper.py (1)
17-17: LGTM! The `platform` import is correctly positioned alphabetically within the stdlib imports section.
## Description
fix: simplify event loop test to assert_called

The test was checking exact call counts of get_running_loop() which varies by architecture, Python version, and pydantic-core version. This caused repeated CI failures (#1383, #1396). Simplified to assert_called() since:
- Exact call count is an implementation detail
- Other tests already cover the event loop code path
- The previous approach was not sustainable

Fixes: #1396

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit
* **Tests**
  * Simplified event-loop verification by replacing architecture-dependent assertions with a basic check that the running event loop is accessed.
  * Renamed and clarified the related test to reflect that only access is validated, improving cross-environment reliability and maintainability.

Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>
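To illustrate the simplification this commit message describes, the sketch below checks only that the running loop was accessed, regardless of how many times. `touch_event_loop` is a hypothetical stand-in for the code under test, and the patch target is an assumption; the renamed test in the repository may be structured differently.

```python
import asyncio
from unittest.mock import MagicMock, patch


def touch_event_loop(times: int) -> None:
    """Hypothetical stand-in for the code under test: looks up the running loop `times` times."""
    for _ in range(times):
        asyncio.get_running_loop()


def loop_was_accessed(times: int) -> bool:
    """Return True when the patched `get_running_loop` was called at least once."""
    with patch("asyncio.get_running_loop", return_value=MagicMock()) as mock_get_loop:
        touch_event_loop(times)
        try:
            # assert_called() passes for any call count >= 1, so 1 and 3 are both accepted.
            mock_get_loop.assert_called()
        except AssertionError:
            return False
        return True
```

Because the exact count is treated as an implementation detail, this check does not need a per-platform table and should not need updating when pydantic-core or the Python version changes.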
## Description
Fixes the flaky `test_get_event_loop_called` test that's failing on arm64 architectures.

The test was expecting `get_running_loop()` to be called exactly once, but on arm64 it gets called 3 times (vs 1 on amd64). This is due to pydantic-core's C extensions behaving differently across architectures during function introspection.

Changed the assertion to check for known values `[1, 3]` instead of `assert_called_once()`. Added inline comments explaining the architecture differences so future maintainers understand this is expected behavior.

**Fixes:** CI Pipeline / Test (arm64, 3.12) failure in NVIDIA#1349

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit
* **Tests**
  * Made test validations architecture- and Python-version-aware to reduce flaky failures across ARM64 and x86_64 hosts.
  * Replaced a rigid single-call assertion with configurable call-count checks per environment and a safer fallback for unknown combos.
  * Improved failure messages to clearly explain mismatches and guide updates when CI behavior changes.

Signed-off-by: mnajafian-nv <mnajafian@nvidia.com>