Add a simple evaluate_item endpoint #1138
Conversation
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Walkthrough

Adds single-item evaluation: two example scripts (full and simple), new FastAPI request/response models, a POST /evaluate/item endpoint, evaluator initialization/cleanup and route wiring in the plugin worker, and tests covering success, not-found, error, and validation cases.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant FastAPI
    participant PluginWorker
    participant Evaluator
    Client->>FastAPI: POST /evaluate/item (EvaluateItemRequest)
    FastAPI->>PluginWorker: dispatch to evaluate_item handler
    PluginWorker->>PluginWorker: lookup evaluator in _evaluators
    alt evaluator found
        PluginWorker->>Evaluator: await evaluate_fn(item)
        Evaluator-->>PluginWorker: evaluation result (EvalOutputItem)
        PluginWorker-->>FastAPI: EvaluateItemResponse(success=true, result)
        FastAPI-->>Client: 200 OK
    else evaluator not found
        PluginWorker-->>FastAPI: 404 Not Found (error)
        FastAPI-->>Client: 404
    else evaluator raised
        PluginWorker-->>FastAPI: EvaluateItemResponse(success=false, error=...)
        FastAPI-->>Client: 200 OK (error payload)
    end
```
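The dispatch logic in the diagram can be sketched in plain Python. This is an illustrative stand-in, not the plugin worker's actual handler: the function name, the `(status, payload)` return shape, and the response field names are assumptions loosely modeled on the `EvaluateItemResponse` fields mentioned in this review.

```python
# Hedged sketch of the /evaluate/item dispatch shown in the diagram:
# look up the evaluator, return 404 if missing, and map evaluator
# exceptions to a success=False payload. Names are illustrative.
def evaluate_item(evaluators: dict, evaluator_name: str, item: dict) -> tuple:
    evaluate_fn = evaluators.get(evaluator_name)
    if evaluate_fn is None:
        # "else evaluator not found" branch
        return 404, {"error": f"Evaluator '{evaluator_name}' not found"}
    try:
        result = evaluate_fn(item)
    except Exception as exc:
        # "else evaluator raised" branch: 200 OK with an error payload
        return 200, {"success": False, "result": None, "error": f"Evaluation failed: {exc}"}
    # "alt evaluator found" happy path
    return 200, {"success": True, "result": result, "error": None}


evaluators = {"exact_match": lambda item: {"score": 1.0 if item["output"] == item["expected"] else 0.0}}
status, body = evaluate_item(evaluators, "exact_match", {"output": "42", "expected": "42"})
print(status, body["success"])                       # 200 True
print(evaluate_item(evaluators, "missing", {})[0])   # 404
```

Whether a raised evaluator error should map to 200-with-error or a 5xx response is debated later in this thread.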
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (3)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py (1)
121-127: Simplify `logger.exception` calls and drop redundant exception arguments

In the JSON parsing and `aiohttp.ClientError` handlers you're passing the exception object into `logger.exception` while also getting the stack trace from `exc_info`, which Ruff flags (TRY401) and is unnecessary. For example:

```python
except json.JSONDecodeError as e:
    logger.exception("Failed to parse response: %s", e)
```

and similar patterns at Lines 129–130 and 199–200.

You can simplify to:

```diff
- except json.JSONDecodeError as e:
-     logger.exception("Failed to parse response: %s", e)
+ except json.JSONDecodeError:
+     logger.exception("Failed to parse response")

- except aiohttp.ClientError as e:
-     logger.exception("Request failed: %s", e)
+ except aiohttp.ClientError:
+     logger.exception("Request failed")

- except aiohttp.ClientError as e:
-     logger.exception("Evaluation request failed: %s", e)
+ except aiohttp.ClientError:
+     logger.exception("Evaluation request failed")
```

This matches the exception-handling guideline and removes redundant arguments.
Also applies to: 129-135, 199-202
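The TRY401 point is easy to verify with the standard library alone: `logger.exception` appends the traceback on its own, so the message does not need the exception object. A minimal, self-contained demonstration:

```python
import io
import json
import logging

# Capture log output in a string buffer to show that logger.exception()
# already includes the traceback without passing the exception object.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("try401_demo")
logger.addHandler(handler)

try:
    json.loads("not json")
except json.JSONDecodeError:
    logger.exception("Failed to parse response")  # no ": %s", e needed

output = stream.getvalue()
print("Traceback" in output)  # True: the stack trace is appended automatically
```

Passing `e` as a format argument would repeat the error text that already appears on the traceback's last line.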
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py (1)
160-165: Tighten `logger.exception` usage to avoid redundant exception arguments

Similar to the simple script, several exception handlers pass the exception object into `logger.exception`, which already logs the stack trace and doesn't need the extra argument. Ruff flags these as TRY401. Suggested edits:

```diff
- except json.JSONDecodeError as e:
-     logger.exception("Failed to parse generate response chunk: %s", e)
+ except json.JSONDecodeError:
+     logger.exception("Failed to parse generate response chunk")

- except (json.JSONDecodeError, ValidationError) as e:
-     logger.exception("Failed to parse intermediate step: %s", e)
+ except (json.JSONDecodeError, ValidationError):
+     logger.exception("Failed to parse intermediate step")

- except aiohttp.ClientError as e:
-     logger.exception("Request failed: %s", e)
+ except aiohttp.ClientError:
+     logger.exception("Request failed")

- except aiohttp.ClientError as e:
-     logger.exception("Evaluation request failed: %s", e)
+ except aiohttp.ClientError:
+     logger.exception("Evaluation request failed")
```

This keeps the full stack trace while simplifying the logging calls.
Also applies to: 181-183, 185-191, 261-263
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)
239-270: Use `logger.exception` in eval-init/cleanup error paths and add return type hints

In `initialize_evaluators` and `cleanup_evaluators`, exceptions are caught and logged with `logger.error`, and the methods are unannotated:

```python
async def initialize_evaluators(self, config: Config):
    ...
    except Exception as e:
        logger.error(f"Failed to initialize evaluators: {e}")
        self._evaluators = {}

async def cleanup_evaluators(self):
    ...
    except Exception as e:
        logger.error(f"Error cleaning up evaluator builder: {e}")
```

Given these blocks swallow the exception and don't re-raise, the logging guideline suggests `logger.exception` to capture the stack trace. Adding explicit return types would also align with the project's typing guidance. Suggested changes:

```diff
- async def initialize_evaluators(self, config: Config):
+ async def initialize_evaluators(self, config: Config) -> None:
@@
-     except Exception as e:
-         logger.error(f"Failed to initialize evaluators: {e}")
+     except Exception:
+         logger.exception("Failed to initialize evaluators")
          # Don't fail startup, just log the error
          self._evaluators = {}
@@
- async def cleanup_evaluators(self):
+ async def cleanup_evaluators(self) -> None:
@@
-     except Exception as e:
-         logger.error(f"Error cleaning up evaluator builder: {e}")
+     except Exception:
+         logger.exception("Error cleaning up evaluator builder")
```

You may also optionally add `-> None` to the new `configure` and `add_evaluate_item_route` methods for consistency.

Also applies to: 271-282
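The suggested shape, reduced to a standalone sketch. Everything here is illustrative: the class name, the `dict`-based config, and the body of `initialize_evaluators` stand in for the worker's real `Config`/`WorkflowEvalBuilder` machinery; only the annotated signatures and the swallow-and-`logger.exception` pattern are the point.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class WorkerSketch:
    """Illustrative stand-in for the plugin worker's evaluator lifecycle."""

    def __init__(self) -> None:
        self._evaluators: dict[str, object] = {}

    async def initialize_evaluators(self, config: dict) -> None:
        try:
            # Stand-in for populating evaluators via the eval builder.
            self._evaluators = {name: object() for name in config["evaluators"]}
        except Exception:
            # Swallowed, not re-raised -> logger.exception preserves the trace.
            logger.exception("Failed to initialize evaluators")
            self._evaluators = {}  # don't fail startup, just log the error

    async def cleanup_evaluators(self) -> None:
        try:
            self._evaluators.clear()
        except Exception:
            logger.exception("Error cleaning up evaluator builder")


worker = WorkerSketch()
asyncio.run(worker.initialize_evaluators({"evaluators": ["accuracy", "rag"]}))
print(sorted(worker._evaluators))  # ['accuracy', 'rag']
asyncio.run(worker.cleanup_evaluators())
print(worker._evaluators)  # {}
```

A bad config (here, a missing `"evaluators"` key) is logged with its traceback and startup continues with an empty evaluator map.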
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py (1 hunks)
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py (1 hunks)
- src/nat/front_ends/fastapi/fastapi_front_end_config.py (3 hunks)
- src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (5 hunks)
- tests/nat/front_ends/fastapi/test_evaluate_endpoints.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*
⚙️ CodeRabbit configuration file
**/*: # Code Review Instructions
- Ensure the code follows best practices and coding standards.
- For Python code, follow PEP 20 and PEP 8 for style guidelines.
- Check for security vulnerabilities and potential issues.
- Python methods should use type hints for all parameters and return values. Example: `def my_function(param1: int, param2: str) -> bool: pass`
- For Python exception handling, ensure proper stack trace preservation:
  - When re-raising exceptions: use bare `raise` statements to maintain the original stack trace, and use `logger.error()` (not `logger.exception()`) to avoid duplicate stack trace output.
  - When catching and logging exceptions without re-raising: always use `logger.exception()` to capture the full stack trace information.

# Documentation Review Instructions
- Verify that documentation and comments are clear and comprehensive.
- Verify that the documentation doesn't contain any TODOs, FIXMEs or placeholder text like "lorem ipsum".
- Verify that the documentation doesn't contain any offensive or outdated terms.
- Verify that documentation and comments are free of spelling mistakes; ensure the documentation doesn't contain any words listed in the `ci/vale/styles/config/vocabularies/nat/reject.txt` file. Words that might appear to be spelling mistakes but are listed in the `ci/vale/styles/config/vocabularies/nat/accept.txt` file are OK.

# Misc.
- All code (except .mdc files that contain Cursor rules) should be licensed under the Apache License 2.0, and should contain an Apache License 2.0 header comment at the top of each file.
- Confirm that copyright years are up-to-date whenever a file is changed.
Files:
- tests/nat/front_ends/fastapi/test_evaluate_endpoints.py
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py
- src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py
- src/nat/front_ends/fastapi/fastapi_front_end_config.py
tests/**/*.py
⚙️ CodeRabbit configuration file
tests/**/*.py:
- Ensure that tests are comprehensive, cover edge cases, and validate the functionality of the code.
- Test functions should be named using the `test_` prefix, using snake_case.
- Any frequently repeated code should be extracted into pytest fixtures.
- Pytest fixtures should define the name argument when applying the pytest.fixture decorator. The fixture function being decorated should be named using the `fixture_` prefix, using snake_case. Example:

```python
@pytest.fixture(name="my_fixture")
def fixture_my_fixture():
    pass
```
Files:
tests/nat/front_ends/fastapi/test_evaluate_endpoints.py
examples/**/*
⚙️ CodeRabbit configuration file
examples/**/*:
- This directory contains example code and usage scenarios for the toolkit; at a minimum an example should contain a README.md or README.ipynb file.
- If an example contains Python code, it should be placed in a subdirectory named `src/` and should contain a `pyproject.toml` file. Optionally, it might also contain scripts in a `scripts/` directory.
- If an example contains YAML files, they should be placed in a subdirectory named `configs/`.
- If an example contains sample data files, they should be placed in a subdirectory named `data/`, and should be checked into git-lfs.
Files:
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
- src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
- src/nat/front_ends/fastapi/fastapi_front_end_config.py
🧬 Code graph analysis (5)
tests/nat/front_ends/fastapi/test_evaluate_endpoints.py (3)
src/nat/eval/evaluator/evaluator_model.py (3)
- EvalInput (46-47)
- EvalOutput (56-58)
- EvalOutputItem (50-53)

src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (2)
- config (127-128)
- add_evaluate_item_route (500-561)

src/nat/front_ends/fastapi/fastapi_front_end_config.py (1)
- EndpointBase (156-179)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py (1)
src/nat/data_models/api_server.py (1)
- ResponseIntermediateStep (482-494)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (4)
src/nat/builder/eval_builder.py (3)
- WorkflowEvalBuilder (43-166)
- populate_builder (135-158)
- get_evaluator (69-74)

src/nat/eval/evaluator/evaluator_model.py (1)
- EvalInput (46-47)

src/nat/front_ends/fastapi/fastapi_front_end_config.py (2)
- EvaluateItemRequest (138-141)
- EvaluateItemResponse (144-148)

src/nat/runtime/session.py (3)
- config (88-89)
- SessionManager (47-226)
- session (100-135)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py (1)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py (1)
- main (267-294)
src/nat/front_ends/fastapi/fastapi_front_end_config.py (1)
src/nat/eval/evaluator/evaluator_model.py (2)
- EvalInputItem (23-43)
- EvalOutputItem (50-53)
🪛 Ruff (0.14.4)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py
1-1: The file is executable but no shebang is present
(EXE002)
164-164: Redundant exception object included in logging.exception call
(TRY401)
182-182: Redundant exception object included in logging.exception call
(TRY401)
186-186: Redundant exception object included in logging.exception call
(TRY401)
187-187: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
188-188: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
189-191: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
261-261: Redundant exception object included in logging.exception call
(TRY401)
262-262: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
263-263: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
266-266: Do not catch blind exception: Exception
(BLE001)
267-267: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
277-277: Do not catch blind exception: Exception
(BLE001)
278-278: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
530-530: Use explicit conversion flag
Replace with conversion flag
(RUF010)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py
126-126: Redundant exception object included in logging.exception call
(TRY401)
130-130: Redundant exception object included in logging.exception call
(TRY401)
131-131: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
132-132: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
133-135: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
200-200: Redundant exception object included in logging.exception call
(TRY401)
201-201: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (3)
tests/nat/front_ends/fastapi/test_evaluate_endpoints.py (1)
266-404: Evaluate-item test coverage looks solid and aligned with the API contract

The new fixtures and tests comprehensively exercise /evaluate/item:
- Success path with score and reasoning.
- 404 for unknown evaluator.
- Evaluator exception mapped to `success == False` with an error message.
- 422 for invalid payload.

Using a dedicated `evaluate_item_client` and `evaluate_item_client_with_error` keeps concerns nicely isolated, and asserting on both HTTP status codes and body fields matches the intended endpoint semantics. No issues from a correctness or style standpoint.

src/nat/front_ends/fastapi/fastapi_front_end_config.py (1)
30-31: Evaluate-item models and endpoint wiring are consistent and type-safe

`EvaluateItemRequest`/`EvaluateItemResponse` correctly reuse `EvalInputItem` and `EvalOutputItem`, and the `evaluate_item` endpoint definition matches the route path/method used by the worker and tests. The field descriptions are clear and will generate sensible OpenAPI docs. No changes needed here.

Also applies to: 138-149, 257-262

src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)

300-303: Raise HTTPException(500) on evaluator errors; fix lazy logging formatting

The route declares 500 in its OpenAPI responses but never raises it. On exception, change lines 529–530 to:

```diff
- except Exception as e:
-     logger.exception(f"Error evaluating item with {request.evaluator_name}")
-     return EvaluateItemResponse(success=False, result=None, error=f"Evaluation failed: {str(e)}")
+ except Exception as e:
+     logger.exception("Error evaluating item with %s", request.evaluator_name)
+     raise HTTPException(status_code=500, detail=f"Evaluation failed: {str(e)}") from e
```

This aligns with FastAPI best practices and the documented response schema. The 200 + `success=False` pattern is an anti-pattern for single-item endpoints; use proper HTTP semantics (5xx for server errors) so clients, intermediaries, and observability tools respond correctly.
Likely an incorrect or invalid review comment.
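Whatever the verdict on 200-vs-500, the `raise ... from e` chaining in the diff above is worth illustrating with the standard library alone. The `HTTPError` class below is a hypothetical stand-in for `fastapi.HTTPException`; the point is that `from e` keeps the root cause on `__cause__`, so the 500 response and the original evaluator error stay linked.

```python
class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def run_evaluator() -> None:
    raise ValueError("bad score")  # simulated evaluator failure


try:
    try:
        run_evaluator()
    except Exception as e:
        # `from e` preserves the original exception on __cause__
        raise HTTPError(500, f"Evaluation failed: {e}") from e
except HTTPError as err:
    caught = err

print(caught.status_code, type(caught.__cause__).__name__)  # 500 ValueError
```

Without `from e`, Python would still record the original error on `__context__`, but explicit chaining signals that the 500 is a deliberate translation of the root cause.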
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)
235-281: Tighten evaluator lifecycle logging and type hints

The evaluator lifecycle wiring is sound, but there are a couple of small cleanups worth doing:
- In `initialize_evaluators` and `cleanup_evaluators` you catch `Exception` and don't re-raise; per the project's exception-handling guidelines and Ruff hints, use `logger.exception(...)` instead of `logger.error(...)` so the stack trace is preserved.
- Consider adding explicit return type hints (`-> None`) to `initialize_evaluators` and `cleanup_evaluators` to match the "all methods typed" guideline.

Example diff:

```diff
- async def initialize_evaluators(self, config: Config):
+ async def initialize_evaluators(self, config: Config) -> None:
@@
-     except Exception as e:
-         logger.error(f"Failed to initialize evaluators: {e}")
+     except Exception:
+         logger.exception("Failed to initialize evaluators")
          # Don't fail startup, just log the error
          self._evaluators = {}
@@
- async def cleanup_evaluators(self):
+ async def cleanup_evaluators(self) -> None:
@@
-     except Exception as e:
-         logger.error(f"Error cleaning up evaluator builder: {e}")
+     except Exception:
+         logger.exception("Error cleaning up evaluator builder")
```

If you intend to keep the broad `except Exception` here to avoid failing startup/shutdown, a brief comment or `# noqa: BLE001` with rationale would also silence Ruff without changing behavior.
🧹 Nitpick comments (1)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (1)
305-305: Minor note: extra `builder.build()` for the new route

Adding `await self.add_evaluate_item_route(app, SessionManager(await builder.build()))` follows the existing pattern used for the other routes; it does mean one more `builder.build()` call at startup. If workflow construction becomes expensive, consider sharing a single `SessionManager` instance across related routes in a future cleanup, but this is fine for now.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*
⚙️ CodeRabbit configuration file
Files:
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
src/nat/**/*
⚙️ CodeRabbit configuration file
This directory contains the core functionality of the toolkit. Changes should prioritize backward compatibility.
Files:
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
🧬 Code graph analysis (1)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (5)
src/nat/builder/eval_builder.py (3)
- WorkflowEvalBuilder (43-166)
- populate_builder (135-158)
- get_evaluator (69-74)

src/nat/eval/evaluator/evaluator_model.py (1)
- EvalInput (46-47)

src/nat/front_ends/fastapi/fastapi_front_end_config.py (2)
- EvaluateItemRequest (138-141)
- EvaluateItemResponse (144-148)

src/nat/runtime/session.py (3)
- config (88-89)
- SessionManager (47-226)
- session (100-135)

src/nat/eval/rag_evaluator/register.py (1)
- evaluate_fn (110-117)
🪛 Ruff (0.14.4)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py
266-266: Do not catch blind exception: Exception
(BLE001)
267-267: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
277-277: Do not catch blind exception: Exception
(BLE001)
278-278: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
533-533: Use explicit conversion flag
Replace with conversion flag
(RUF010)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
🔇 Additional comments (2)
src/nat/front_ends/fastapi/fastapi_front_end_plugin_worker.py (2)
42-43: New eval-related imports look correct and consistent

The added imports (`WorkflowEvalBuilder`, `EvaluatorInfo`, `EvalInput`, `EvaluateItemRequest`/`Response`) are all used in the new evaluator lifecycle and `/evaluate/item` endpoint and fit the existing import style. No changes needed here.

Also applies to: 56-56, 62-63
292-297: Evaluator init + shutdown handler wiring looks good

Initializing evaluators in `configure` and registering `cleanup_evaluators` on the FastAPI `shutdown` event closes the loop on the async context for `WorkflowEvalBuilder` and avoids long-lived resource leaks. This also addresses the previous review's concern about missing cleanup of the eval builder.
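The wiring can be pictured with a minimal stand-in for FastAPI's event registry. `MiniApp` below is illustrative only; the real code registers the handler on the actual FastAPI app via `add_event_handler("shutdown", ...)`.

```python
import asyncio


class MiniApp:
    """Tiny stand-in for FastAPI's event-handler registry."""

    def __init__(self) -> None:
        self._shutdown_handlers: list = []

    def add_event_handler(self, event: str, handler) -> None:
        if event == "shutdown":
            self._shutdown_handlers.append(handler)

    async def shutdown(self) -> None:
        # FastAPI awaits each registered async shutdown handler in turn
        for handler in self._shutdown_handlers:
            await handler()


cleaned: list = []


async def cleanup_evaluators() -> None:
    # Stand-in for closing the eval builder's async context
    cleaned.append("eval_builder_closed")


app = MiniApp()
app.add_event_handler("shutdown", cleanup_evaluators)
asyncio.run(app.shutdown())
print(cleaned)  # ['eval_builder_closed']
```

Registering the cleanup on shutdown rather than relying on garbage collection guarantees the builder's async resources are released deterministically when the server stops.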
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py (1)
129-129: Remove redundant exception objects from `logger.exception()` calls.

The `logger.exception()` method automatically includes the full exception traceback in the log output, so passing the exception object as a format argument is redundant and can result in the exception appearing twice in the logs. Apply this diff:

```diff
- logger.exception("Failed to parse response: %s", e)
+ logger.exception("Failed to parse response")

- logger.exception("Request failed: %s", e)
+ logger.exception("Request failed")

- logger.exception("Evaluation request failed: %s", e)
+ logger.exception("Evaluation request failed")
```

Also applies to: 133-133, 203-203
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py (1)
164-164: Remove redundant exception objects from `logger.exception()` calls.

The `logger.exception()` method automatically includes the full exception traceback, so passing the exception object as a format argument is redundant and can result in duplicate exception information in the logs. Apply this diff:

```diff
- logger.exception("Failed to parse generate response chunk: %s", e)
+ logger.exception("Failed to parse generate response chunk")

- logger.exception("Failed to parse intermediate step: %s", e)
+ logger.exception("Failed to parse intermediate step")

- logger.exception("Request failed: %s", e)
+ logger.exception("Request failed")

- logger.exception("Evaluation request failed: %s", e)
+ logger.exception("Evaluation request failed")
```

Also applies to: 182-182, 186-186, 261-261
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py (1 hunks)
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*
⚙️ CodeRabbit configuration file
Files:
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py
examples/**/*
⚙️ CodeRabbit configuration file
Files:
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py
- examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py
🧬 Code graph analysis (2)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py (3)
src/nat/data_models/api_server.py (1)
- ResponseIntermediateStep (482-494)

src/nat/data_models/intermediate_step.py (3)
- IntermediateStep (235-310)
- IntermediateStepPayload (130-232)
- UUID (301-302)

examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py (1)
- main (208-233)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py (1)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py (1)
- main (267-294)
🪛 Ruff (0.14.4)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item.py
1-1: The file is executable but no shebang is present
(EXE002)
164-164: Redundant exception object included in logging.exception call
(TRY401)
182-182: Redundant exception object included in logging.exception call
(TRY401)
186-186: Redundant exception object included in logging.exception call
(TRY401)
187-187: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
188-188: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
189-191: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
261-261: Redundant exception object included in logging.exception call
(TRY401)
262-262: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
263-263: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
examples/evaluation_and_profiling/simple_web_query_eval/src/nat_simple_web_query_eval/scripts/evaluate_single_item_simple.py
129-129: Redundant exception object included in logging.exception call
(TRY401)
133-133: Redundant exception object included in logging.exception call
(TRY401)
134-134: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
135-135: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
136-138: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
203-203: Redundant exception object included in logging.exception call
(TRY401)
204-204: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: CI Pipeline / Check
/merge
This PR adds a new /evaluate/item endpoint for synchronous single-item evaluation, enabling quick testing and debugging of evaluators without running full dataset evaluations.

**Changes**

1. API Endpoint (/evaluate/item)
   - Method: POST
   - Purpose: Evaluate a single item with a specified evaluator (synchronous response)
   - Use cases: Interactive testing, debugging, real-time evaluation
   - No Dask required: Works without async job infrastructure
2. Route Structure (Nested)
   - POST /evaluate/item → Single item evaluation (sync, immediate response)
3. Implementation
   - Added add_evaluate_item_route() in fastapi_front_end_plugin_worker.py
   - Evaluator initialization via WorkflowEvalBuilder on server startup
   - Request/response models: EvaluateItemRequest, EvaluateItemResponse
4. Example Scripts
   - evaluate_single_item.py - Full version with trajectory processing
   - evaluate_single_item_simple.py - Simplified version without trajectory
5. Tests (test_evaluate_endpoints.py)
6. Docs deferred (evaluate_api.md will be going through more changes before the next release)

## By Submitting this PR I confirm:

- I am familiar with the [Contributing Guidelines](https://github.com/NVIDIA/NeMo-Agent-Toolkit/blob/develop/docs/source/resources/contributing.md).
- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

## Summary by CodeRabbit

- **New Features**
  - Added a single-item evaluation endpoint (/evaluate/item) to evaluate an individual item with a named evaluator.
  - Added two example scripts showcasing end-to-end and minimal single-item evaluation workflows against the server, with streaming handling and evaluation reporting.
- **Tests**
  - Added tests covering success, evaluator-not-found (404), evaluator runtime errors, and invalid-payload validation for the single-item evaluation endpoint.

Authors:
- Anuradha Karuppiah (https://github.com/AnuradhaKaruppiah)

Approvers:
- David Gardner (https://github.com/dagardner-nv)
- Michael Demoret (https://github.com/mdemoret-nv)

URL: NVIDIA#1138

Signed-off-by: Sangharsh Aglave <aglave@synopsys.com>