Async endpoint improvements#1374
Conversation
…ry point Signed-off-by: David Gardner <dagardner@nvidia.com>
…ssion.remove() is always called Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
This reverts commit 6a1031e. Signed-off-by: David Gardner <dagardner@nvidia.com>
…sion_manager Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
…to david-rel-14-cve-mem-leak Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
This reverts commit 01a94fb. Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
WalkthroughCentralizes logging into a utils helper, adds async Dask task helpers for job execution and periodic cleanup, integrates those helpers into the FastAPI front-end and worker (with env-driven thread/log flags), and hardens JobStore cleanup to count and safely remove expired jobs. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant FastAPI_Plugin
participant Dask_Client
participant Dask_Worker
participant async_job
participant JobStore
participant Database
Client->>FastAPI_Plugin: POST /enqueue (payload)
FastAPI_Plugin->>Dask_Client: submit run_generation(configure_logging, log_level, scheduler_addr, db_url, config_file, job_id, payload)
Dask_Client->>Dask_Worker: schedule task
Dask_Worker->>async_job: invoke run_generation()
async_job->>async_job: optionally call setup_logging(log_level)
async_job->>async_job: load workflow & compute result
async_job->>JobStore: update job status (SUCCESS/FAILURE) with result/error
JobStore->>Database: persist status/result
async_job->>Dask_Worker: cleanup resources and finish
sequenceDiagram
participant FastAPI_Plugin
participant Dask_Client
participant Dask_Worker
participant async_job
participant JobStore
participant Database
loop every sleep_time_sec
FastAPI_Plugin->>Dask_Client: submit periodic_cleanup(scheduler_addr, db_url, sleep_time_sec, configure_logging, log_level)
Dask_Client->>Dask_Worker: schedule periodic task
Dask_Worker->>async_job: invoke periodic_cleanup()
async_job->>async_job: optionally call setup_logging(log_level)
async_job->>async_job: sleep interval
async_job->>JobStore: cleanup_expired_jobs() -> returns num_expired
JobStore->>Database: query & remove expired records, attempt cancellation via Dask Variable
async_job->>FastAPI_Plugin: log cleanup result (num_expired)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…to david-rel-14-cve-mem-leak Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
…ging in sub-processes Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
This reverts commit 06bd4b2. Signed-off-by: David Gardner <dagardner@nvidia.com>
This reverts commit 71ab041. Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
…to david-rel-14-cve-mem-leak Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/nat/cli/entrypoint.py (1)
16-24: Remove duplicate and conflicting license header.The file has two license headers: Apache-2.0 (lines 1-14) and a proprietary NVIDIA license (lines 16-24). Per the coding guidelines, all code should be licensed under Apache License 2.0 only. The second header block should be removed.
🔧 Proposed fix
# limitations under the License. -# SPDX-FileCopyrightText: Copyright (c) 2024-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. -# SPDX-License-Identifier: LicenseRef-NvidiaProprietary -# -# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual -# property and proprietary rights in and to this material, related -# documentation and any modifications thereto. Any use, reproduction, -# disclosure or distribution of this material and related documentation -# without an express license agreement from NVIDIA CORPORATION or -# its affiliates is strictly prohibited. - import logging
🤖 Fix all issues with AI agents
In @src/nat/front_ends/fastapi/async_job.py:
- Line 1: Update the file header copyright year range from "2026" to "2025-2026"
so it matches other files in the PR; locate the SPDX/ copyright comment at the
top of src/nat/front_ends/fastapi/async_job.py and change the year token in that
comment to "2025-2026".
🧹 Nitpick comments (1)
src/nat/utils/log_utils.py (1)
43-49: Consider addingforce=Truefor repeatedbasicConfigcalls.
logging.basicConfig()is a no-op if the root logger already has handlers configured. When Dask workers call this function in subprocess mode, it should work fine. However, in thread mode, subsequent calls may not apply the configuration if handlers were already set.Consider using
force=True(Python 3.8+) to ensure configuration is always applied:♻️ Suggested improvement
def setup_logging(log_level: int): """Configure logging with the specified level""" logging.basicConfig( level=log_level, format=LOG_FORMAT, datefmt=LOG_DATE_FORMAT, + force=True, )
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/nat/cli/entrypoint.pysrc/nat/front_ends/fastapi/async_job.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.pysrc/nat/front_ends/fastapi/job_store.pysrc/nat/utils/log_utils.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf with PEP 8 base and 'column_limit = 120' for code formatting
Use 'ruff check --fix' for linting with configuration from 'pyproject.toml', fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer 'collections.abc' / 'typing' abstractions (e.g., 'Sequence' over 'list') for type hints
Use 'typing.Annotated' for units or extra metadata when useful
Treat 'pyright' warnings (configured in 'pyproject.toml') as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare 'raise' statements when re-raising, and use 'logger.error()' for logging (not 'logger.exception()') to avoid duplicate stack trace output
When catching and logging exceptions without re-raising, always use 'logger.exception()' (equivalent to 'logger.error(exc_info=True)') to capture full stack trace information
Pydantic models using 'SecretStr', 'SerializableSecretStr', or 'OptionalSecretStr' should use 'default=None' for optional fields and 'default_factory=lambda: SerializableSecretStr("")' for non-optional fields to avoid initialization bugs
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period
Surround code entities in docstrings with backticks to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer 'httpx' with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with 'functools.lru_cache' or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
src/nat/cli/entrypoint.pysrc/nat/front_ends/fastapi/async_job.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.pysrc/nat/utils/log_utils.pysrc/nat/front_ends/fastapi/job_store.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin.py
**/*.{py,yaml,yml,json,toml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces (never tabs) and ensure every file ends with a single newline
Files:
src/nat/cli/entrypoint.pysrc/nat/front_ends/fastapi/async_job.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.pysrc/nat/utils/log_utils.pysrc/nat/front_ends/fastapi/job_store.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin.py
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}: Every file must start with the standard SPDX Apache-2.0 header
Confirm that copyright years are up-to-date whenever a file is changed
All source files must include the SPDX Apache-2.0 header template
Files:
src/nat/cli/entrypoint.pysrc/nat/front_ends/fastapi/async_job.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.pysrc/nat/utils/log_utils.pysrc/nat/front_ends/fastapi/job_store.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin.py
**/*.{py,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by 'setuptools-scm'; never hard-code them in code or docs
Files:
src/nat/cli/entrypoint.pysrc/nat/front_ends/fastapi/async_job.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.pysrc/nat/utils/log_utils.pysrc/nat/front_ends/fastapi/job_store.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of
None,
in that situation no return type hint is needed).
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.
- Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects
(ex: "the system's performance" should be "the performance of the system").- Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit.
The exception to this rule is when referring to package names or code identifiers that contain "nat", th...
Files:
src/nat/cli/entrypoint.pysrc/nat/front_ends/fastapi/async_job.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.pysrc/nat/utils/log_utils.pysrc/nat/front_ends/fastapi/job_store.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin.py
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/cli/entrypoint.pysrc/nat/front_ends/fastapi/async_job.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.pysrc/nat/utils/log_utils.pysrc/nat/front_ends/fastapi/job_store.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin.py
🧬 Code graph analysis (5)
src/nat/cli/entrypoint.py (1)
src/nat/utils/log_utils.py (1)
setup_logging(43-49)
src/nat/front_ends/fastapi/async_job.py (6)
src/nat/utils/log_utils.py (1)
setup_logging(43-49)src/nat/cli/entrypoint.py (1)
setup_logging(46-50)src/nat/front_ends/fastapi/job_store.py (1)
JobStatus(58-82)src/nat/front_ends/fastapi/response_helpers.py (1)
generate_single_response(108-118)src/nat/runtime/loader.py (1)
load_workflow(97-124)src/nat/builder/context.py (1)
output(60-61)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (3)
src/nat/front_ends/fastapi/async_job.py (1)
run_generation(32-82)src/nat/utils/log_utils.py (1)
setup_logging(43-49)src/nat/cli/entrypoint.py (1)
setup_logging(46-50)
src/nat/utils/log_utils.py (1)
src/nat/cli/entrypoint.py (1)
setup_logging(46-50)
src/nat/front_ends/fastapi/fastapi_front_end_plugin.py (2)
src/nat/front_ends/fastapi/async_job.py (1)
periodic_cleanup(85-122)src/nat/front_ends/fastapi/dask_client_mixin.py (2)
DaskClientMixin(27-65)client(30-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (19)
src/nat/utils/log_utils.py (1)
18-19: LGTM!The logging format constants are well-defined and follow Python conventions for module-level constants (UPPER_CASE). The format string includes useful debugging information (timestamp, level, module, line number, message).
src/nat/cli/entrypoint.py (1)
35-35: LGTM!The refactoring to delegate logging setup to the centralized
log_utils.setup_loggingis clean. The function maintains its existing signature (accepting a string log level) and correctly converts it to a numeric level before calling the utility function.Also applies to: 46-50
src/nat/front_ends/fastapi/job_store.py (4)
211-218: LGTM - Good fix for session memory leak.Wrapping the session context in
try/finallyensures_session.remove()is always called, preventing the async_scoped_session from accumulating stale task keys. This directly addresses the memory leak concern mentioned in the PR objectives.
506-513: LGTM!Good improvements: the return type annotation
-> intdocuments the new behavior, and the startup log provides visibility into cleanup operations.
545-566: LGTM - Robust variable cleanup with proper error handling.The
finallyblock ensuresvar.delete()is called regardless of whether the future cancellation succeeds. The exception handling correctly useslogger.exception()for logging without re-raising, which captures the full stack trace as per coding guidelines. The explicitdel varafter deletion is a reasonable defensive measure for memory cleanup.
571-571: LGTM!Returning
num_expiredenables callers to log/monitor cleanup activity, which aligns with the PR goal of increased verbosity for Dask-related logs.src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (3)
58-58: LGTM!Good refactoring - importing
run_generationfrom the dedicatedasync_jobmodule improves code organization by keeping Dask task functions self-contained, as mentioned in the PR objectives.Also applies to: 79-79
111-113: LGTM!Environment-driven configuration for Dask threading mode and log level is well-implemented. Calling
setup_loggingduring initialization ensures consistent logging configuration in the worker process.
929-937: LGTM - Correct configure_logging logic.Passing
not self._use_dask_threadsas theconfigure_loggingparameter is correct: when using subprocess workers, logging needs to be configured in each subprocess; when using threads, the parent process's logging configuration is shared, so reconfiguration is unnecessary.src/nat/front_ends/fastapi/async_job.py (4)
15-17: LGTM!Clear module docstring explaining the design rationale for self-contained Dask task functions.
24-29: LGTM!Good design - the deferred import of
setup_logginginside the function ensures proper isolation when this module is serialized and sent to Dask workers.
32-82: LGTM - Well-structured Dask task with proper cleanup.The function is self-contained with deferred imports, includes comprehensive docstrings, and implements proper resource cleanup:
- Explicit
delstatements for session objects to reduce memory footprint- Uses
logger.exception()correctly when catching without re-raising- Updates job status appropriately on success/failure
85-122: LGTM - Appropriate bare except for long-running cleanup task.The bare
exceptclause (with# noqa: E722) is acceptable here because:
- This is a long-running Dask task that should never terminate due to transient errors
- All exceptions are properly logged with
logger.exception()for debugging- The loop continues to provide cleanup service
The function is well-documented with a complete docstring.
src/nat/front_ends/fastapi/fastapi_front_end_plugin.py (6)
16-16: LGTM!Good imports:
copyfor safe uvicorn config modification,periodic_cleanupfrom the new dedicated module, andLOG_DATE_FORMATfor consistent timestamp formatting.Also applies to: 24-24, 32-32
49-49: LGTM!Initializing
_use_dask_threadstoFalseis a safe default that will be updated based on configuration when the cluster is set up.
66-74: LGTM!Good changes:
- Using
logger.infoinstead oflogger.debugfor cleanup task submission increases visibility- Delegating to imported
periodic_cleanupimproves modularityconfigure_logging=not self._use_dask_threadscorrectly configures logging only in subprocess mode
108-121: LGTM!Correct implementation:
processes=not self._use_dask_threadscreates thread workers whendask_workers == 'threads'- Worker setup (
_setup_worker) is correctly skipped for thread mode and Windows, asos.setsid()only applies to Unix subprocess workers
148-153: LGTM!Setting environment variables
NAT_USE_DASK_THREADSandNAT_LOG_LEVELensures worker subprocesses can read the configuration. Usingstr(int(...))for the boolean conversion ensures consistent "0"/"1" string representation.
184-199: LGTM - Safe uvicorn logging customization.Using
copy.deepcopy(uvicorn.config.LOGGING_CONFIG)before modifying ensures the global uvicorn configuration remains untouched. The timestamp prefix and date format customization aligns logging output with the rest of the application.
willkill07
left a comment
There was a problem hiding this comment.
LGTM except for the exception-flow driven cleanup of job_store
Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
…into david-rel-14-cve-mem-leak Signed-off-by: David Gardner <dagardner@nvidia.com>
Signed-off-by: David Gardner <dagardner@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @src/nat/front_ends/fastapi/async_job.py:
- Around line 82-119: The loop in periodic_cleanup currently sleeps before
running cleanup and uses a bare except that swallows asyncio.CancelledError;
change the loop to run cleanup_expired_jobs() first and then await
asyncio.sleep(sleep_time_sec), and replace the bare except with two handlers:
except asyncio.CancelledError: raise to allow graceful cancellation, and except
Exception as e: logger.exception("Error during job cleanup: %s", e) to log other
errors; keep references to periodic_cleanup and JobStore.cleanup_expired_jobs
when making the edits.
🧹 Nitpick comments (1)
src/nat/front_ends/fastapi/async_job.py (1)
24-30: Consider making logging setup explicitly idempotent (or intentionally “force”).
logging.basicConfig(...)is a no-op if handlers already exist; if the intent is “configure logging in subprocess tasks”, this is probably fine, but please confirm you don’t needforce=True(Python 3.8+) or a “no handlers present” guard to avoid surprises when a worker already configured logging.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/nat/front_ends/fastapi/async_job.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf with PEP 8 base and 'column_limit = 120' for code formatting
Use 'ruff check --fix' for linting with configuration from 'pyproject.toml', fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer 'collections.abc' / 'typing' abstractions (e.g., 'Sequence' over 'list') for type hints
Use 'typing.Annotated' for units or extra metadata when useful
Treat 'pyright' warnings (configured in 'pyproject.toml') as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare 'raise' statements when re-raising, and use 'logger.error()' for logging (not 'logger.exception()') to avoid duplicate stack trace output
When catching and logging exceptions without re-raising, always use 'logger.exception()' (equivalent to 'logger.error(exc_info=True)') to capture full stack trace information
Pydantic models using 'SecretStr', 'SerializableSecretStr', or 'OptionalSecretStr' should use 'default=None' for optional fields and 'default_factory=lambda: SerializableSecretStr("")' for non-optional fields to avoid initialization bugs
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period
Surround code entities in docstrings with backticks to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer 'httpx' with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with 'functools.lru_cache' or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
src/nat/front_ends/fastapi/async_job.py
**/*.{py,yaml,yml,json,toml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces (never tabs) and ensure every file ends with a single newline
Files:
src/nat/front_ends/fastapi/async_job.py
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}: Every file must start with the standard SPDX Apache-2.0 header
Confirm that copyright years are up-to-date whenever a file is changed
All source files must include the SPDX Apache-2.0 header template
Files:
src/nat/front_ends/fastapi/async_job.py
**/*.{py,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by 'setuptools-scm'; never hard-code them in code or docs
Files:
src/nat/front_ends/fastapi/async_job.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of
None,
in that situation no return type hint is needed).
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.
- Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects
(ex: "the system's performance" should be "the performance of the system").- Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit.
The exception to this rule is when referring to package names or code identifiers that contain "nat", th...
Files:
src/nat/front_ends/fastapi/async_job.py
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/front_ends/fastapi/async_job.py
🧠 Learnings (1)
📚 Learning: 2026-01-05T15:46:49.677Z
Learnt from: CR
Repo: NVIDIA/NeMo-Agent-Toolkit PR: 0
File: .cursor/rules/general.mdc:0-0
Timestamp: 2026-01-05T15:46:49.677Z
Learning: Applies to **/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst} : Confirm that copyright years are up-to-date whenever a file is changed
Applied to files:
src/nat/front_ends/fastapi/async_job.py
🧬 Code graph analysis (1)
src/nat/front_ends/fastapi/async_job.py (4)
src/nat/utils/log_utils.py (1)
setup_logging(43-49)src/nat/front_ends/fastapi/job_store.py (4)
JobStatus(58-82)JobStore(138-571)update_status(328-378)cleanup_expired_jobs(506-571)src/nat/front_ends/fastapi/response_helpers.py (1)
generate_single_response(108-118)src/nat/runtime/loader.py (1)
load_workflow(97-124)
🔇 Additional comments (1)
src/nat/front_ends/fastapi/async_job.py (1)
1-17: SPDX header + 2026 copyright year look correct for a new file.Based on learnings, thanks for keeping the copyright year current.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/nat/front_ends/fastapi/job_store.py (1)
506-512: Docstring missingReturnssection.The method signature now returns
int, but the docstring doesn't document what the return value represents. Per the coding guidelines requiring Google-style docstrings, please add aReturnssection.📝 Suggested docstring update
async def cleanup_expired_jobs(self) -> int: """ Cleanup expired jobs, keeping the most recent one. Updated_at is used instead of created_at to determine the most recent job. This is because jobs may not be processed in the order they are created. + + Returns + ------- + int + The number of expired jobs found during this cleanup cycle. """
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/nat/front_ends/fastapi/job_store.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf with PEP 8 base and 'column_limit = 120' for code formatting
Use 'ruff check --fix' for linting with configuration from 'pyproject.toml', fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer 'collections.abc' / 'typing' abstractions (e.g., 'Sequence' over 'list') for type hints
Use 'typing.Annotated' for units or extra metadata when useful
Treat 'pyright' warnings (configured in 'pyproject.toml') as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare 'raise' statements when re-raising, and use 'logger.error()' for logging (not 'logger.exception()') to avoid duplicate stack trace output
When catching and logging exceptions without re-raising, always use 'logger.exception()' (equivalent to 'logger.error(exc_info=True)') to capture full stack trace information
Pydantic models using 'SecretStr', 'SerializableSecretStr', or 'OptionalSecretStr' should use 'default=None' for optional fields and 'default_factory=lambda: SerializableSecretStr("")' for non-optional fields to avoid initialization bugs
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period
Surround code entities in docstrings with backticks to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer 'httpx' with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with 'functools.lru_cache' or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
src/nat/front_ends/fastapi/job_store.py
**/*.{py,yaml,yml,json,toml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces (never tabs) and ensure every file ends with a single newline
Files:
src/nat/front_ends/fastapi/job_store.py
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}: Every file must start with the standard SPDX Apache-2.0 header
Confirm that copyright years are up-to-date whenever a file is changed
All source files must include the SPDX Apache-2.0 header template
Files:
src/nat/front_ends/fastapi/job_store.py
**/*.{py,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by 'setuptools-scm'; never hard-code them in code or docs
Files:
src/nat/front_ends/fastapi/job_store.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of
None,
in that situation no return type hint is needed).
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.
- Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects
(ex: "the system's performance" should be "the performance of the system").- Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit.
The exception to this rule is when referring to package names or code identifiers that contain "nat", th...
Files:
src/nat/front_ends/fastapi/job_store.py
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/front_ends/fastapi/job_store.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (3)
src/nat/front_ends/fastapi/job_store.py (3)
211-218: LGTM! Proper session cleanup pattern.The try/finally ensures
self._session.remove()is called even when exceptions occur during session usage, preventing memory leaks in the scoped session registry.
545-545: Good defensive cleanup pattern.Initializing
var = Nonebefore the try block and checking it in the finally block ensures the Dask Variable is deleted even when cancellation fails. Usinglogger.exception()for caught exceptions is correct per the coding guidelines.Also applies to: 560-566
541-542: Verify return value semantics:num_expiredvssuccessfully_expiredcount.The method returns
num_expired(total expired jobs found), but only jobs insuccessfully_expiredare actually marked as expired in the database. If some jobs fail to cancel (caught at line 557-558), the return value will be higher than the number of jobs actually cleaned up.Consider whether the caller expects the count of jobs found expired or jobs successfully cleaned up. If the latter, return
len(successfully_expired)instead.Also applies to: 570-570
Signed-off-by: David Gardner <dagardner@nvidia.com>
…ame name Signed-off-by: David Gardner <dagardner@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py:
- Around line 111-113: Validate and handle parsing errors when reading
NAT_FASTAPI_LOG_LEVEL: wrap the int conversion for self._log_level in a
try/except (catch ValueError/TypeError), fall back to logging.INFO (or a
configured default) on failure, and emit a warning via the logger (or print)
that includes the invalid env value and the exception; update the code around
self._log_level and setup_logging(self._log_level) to use the safe fallback so a
bad env value cannot crash the process.
🧹 Nitpick comments (1)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)
929-937: Consider adding a clarifying comment for the inverted logic.Line 930 passes
not self._use_dask_threadsas theconfigure_loggingparameter. This inverted logic is correct (threads inherit logging from the parent process, while subprocesses need their own configuration), but it may not be immediately obvious to future maintainers.📝 Suggested comment
(_, job) = await self._job_store.submit_job( job_id=job_id, expiry_seconds=request.expiry_seconds, job_fn=run_generation, sync_timeout=request.sync_timeout, job_args=[ + # Configure logging only for subprocess workers (not threads) not self._use_dask_threads, self._log_level, self._scheduler_address,
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/nat/front_ends/fastapi/fastapi_front_end_plugin.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
🧰 Additional context used
📓 Path-based instructions (6)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.py: Follow PEP 20 and PEP 8 for Python style guidelines
Run yapf with PEP 8 base and 'column_limit = 120' for code formatting
Use 'ruff check --fix' for linting with configuration from 'pyproject.toml', fix warnings unless explicitly ignored
Use snake_case for functions and variables, PascalCase for classes, UPPER_CASE for constants
All public APIs require Python 3.11+ type hints on parameters and return values
Prefer 'collections.abc' / 'typing' abstractions (e.g., 'Sequence' over 'list') for type hints
Use 'typing.Annotated' for units or extra metadata when useful
Treat 'pyright' warnings (configured in 'pyproject.toml') as errors during development
Preserve stack traces and prevent duplicate logging when handling exceptions; use bare 'raise' statements when re-raising, and use 'logger.error()' for logging (not 'logger.exception()') to avoid duplicate stack trace output
When catching and logging exceptions without re-raising, always use 'logger.exception()' (equivalent to 'logger.error(exc_info=True)') to capture full stack trace information
Pydantic models using 'SecretStr', 'SerializableSecretStr', or 'OptionalSecretStr' should use 'default=None' for optional fields and 'default_factory=lambda: SerializableSecretStr("")' for non-optional fields to avoid initialization bugs
Provide Google-style docstrings for every public module, class, function and CLI command
The first line of docstrings must be a concise description ending with a period
Surround code entities in docstrings with backticks to avoid Vale false-positives
Validate and sanitise all user input, especially in web or CLI interfaces
Prefer 'httpx' with SSL verification enabled by default and follow OWASP Top-10 recommendations
Use 'async'/'await' for I/O-bound work (HTTP, DB, file reads)
Cache expensive computations with 'functools.lru_cache' or an external cache when appropriate
Leverage NumPy vectorised operations whenever beneficial and feasible
Files:
src/nat/front_ends/fastapi/fastapi_front_end_plugin.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
**/*.{py,yaml,yml,json,toml}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Indent with 4 spaces (never tabs) and ensure every file ends with a single newline
Files:
src/nat/front_ends/fastapi/fastapi_front_end_plugin.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
**/*.{py,js,ts,tsx,jsx,sh,yaml,yml,json,toml,md,mdx,rst}: Every file must start with the standard SPDX Apache-2.0 header
Confirm that copyright years are up-to-date whenever a file is changed
All source files must include the SPDX Apache-2.0 header template
Files:
src/nat/front_ends/fastapi/fastapi_front_end_plugin.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
**/*.{py,md,mdx,rst}
📄 CodeRabbit inference engine (.cursor/rules/general.mdc)
Version numbers are derived automatically by 'setuptools-scm'; never hard-code them in code or docs
Files:
src/nat/front_ends/fastapi/fastapi_front_end_plugin.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards. - For Python code, follow
PEP 20 and
PEP 8 for style guidelines.- Check for security vulnerabilities and potential issues. - Python methods should use type hints for all parameters and return values (except for return values of
None,
in that situation no return type hint is needed).
Example:def my_function(param1: int, param2: str) -> bool: pass- For Python exception handling, ensure proper stack trace preservation:
- When re-raising exceptions: use bare
raisestatements to maintain the original stack trace,
and uselogger.error()(notlogger.exception()) to avoid duplicate stack trace output.- When catching and logging exceptions without re-raising: always use
logger.exception()
to capture the full stack trace information.Documentation Review Instructions - Verify that documentation and comments are clear and comprehensive. - Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum". - Verify that the documentation doesn't contain any offensive or outdated terms. - Verify that documentation and comments are free of spelling mistakes, ensure the documentation doesn't contain any
words listed in the
ci/vale/styles/config/vocabularies/nat/reject.txtfile, words that might appear to be
spelling mistakes but are listed in theci/vale/styles/config/vocabularies/nat/accept.txtfile are OK.
- Documentation in Markdown files should not contain usage of a possessive 's with inanimate objects
(ex: "the system's performance" should be "the performance of the system").- Documentation in Markdown files should not use NAT as an acronym, always spell out NeMo Agent Toolkit.
The exception to this rule is when referring to package names or code identifiers that contain "nat", th...
Files:
src/nat/front_ends/fastapi/fastapi_front_end_plugin.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/front_ends/fastapi/fastapi_front_end_plugin.pysrc/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
🧬 Code graph analysis (1)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (3)
src/nat/front_ends/fastapi/async_job.py (1)
run_generation(32-84)src/nat/utils/log_utils.py (1)
setup_logging(43-49)src/nat/cli/entrypoint.py (1)
setup_logging(46-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (6)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)
58-58: LGTM! Imports align with the refactoring objectives.The imports correctly reference the newly centralized logging utilities and async job helpers.
Also applies to: 79-79
src/nat/front_ends/fastapi/fastapi_front_end_plugin.py (5)
16-16: LGTM! Imports are appropriate.The new imports support the logging configuration enhancements and async cleanup functionality.
Also applies to: 24-24, 32-32
66-74: LGTM! Cleanup task properly delegates to the async job helper.The increased log verbosity (info vs debug) and the use of the centralized
periodic_cleanupfunction align with the PR objectives.
86-86: LGTM! Log level capture and environment variable setup are correct.Capturing the effective log level early and propagating it via environment variables ensures consistent logging across main and worker processes.
Also applies to: 151-152
184-199: LGTM! Uvicorn logging configuration properly aligns with NeMo Agent Toolkit standards.The deep copy approach safely customizes the uvicorn logging format to include timestamps and match the centralized log format, ensuring consistent log output across all components.
49-49: LGTM! Dask threading control logic is correctly implemented.The
_use_dask_threadsflag properly controls whether Dask uses threads or processes, with appropriate conditional logic for process-specific setup (likesetsid()).Also applies to: 108-108, 111-111, 118-118
|
/merge |
* Move Dask task functions to their own module and update them to be self-contained. * Be a bit aggressive about deleting variables in the `run_generation` method. * Ensure that SQLAlchemy's `Session.remove()` is called in the event of an error * Move `nat.cli.entrypoint.setup_logging` to `nat.utils.log_utils` to be used outside of the CLI. * Configure logging in Dask tasks when using subprocesses * Configure uvicorn logging to match NAT * Log Dask details a bit more verbosely ## By Submitting this PR I confirm: - I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing/index.md). - We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license. - Any contribution which contains commits that are not Signed-Off will not be accepted. - When the PR is ready for review, new or existing tests cover these changes. - When the PR is ready for review, the documentation is up to date with these changes. ## Summary by CodeRabbit * **New Features** * Added asynchronous job generation and a periodic cleanup task runnable via the distributed execution layer. * **Improvements** * Centralized logging setup with consistent timestamp format and propagated log level to server and workers. * Exposed configurable worker threading mode (thread vs process) via runtime environment. * Server startup now applies effective log settings and enhanced log formatting. * **Bug Fixes** * More robust job lifecycle cleanup with guaranteed resource handling and explicit reporting of expired-job counts. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Will Killian (https://github.com/willkill07) URL: NVIDIA#1374
Description
run_generationmethod.Session.remove()is called in the event of an errornat.cli.entrypoint.setup_loggingtonat.utils.log_utilsto be used outside of the CLI.By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.