Enable per-user workflow support in nat eval by ericevans-nv · Pull Request #1503 · NVIDIA/NeMo-Agent-Toolkit

ericevans-nv · 2026-01-28T16:43:09Z

Description

Fixes an issue where stateful tools retain state across eval items, so state from one item contaminates later items during nat eval runs.

Changes

- New per_input_user_id config option: eval.general.per_input_user_id (default: True) generates a unique user_id for each eval item ({user_id}_{item_id}_{uuid}). For per-user workflows, this creates a fresh workflow instance per eval item, resetting all stateful tools to their initial state. Set to False to disable this behavior.
- Eval data model improvements: Updated eval config classes (EvalCustomScriptConfig, JobManagementConfig, EvalOutputConfig, EvalGeneralConfig) to use Pydantic Field with descriptions instead of inline comments.
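
The {user_id}_{item_id}_{uuid} format described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name per_item_user_id is hypothetical, and rendering the uuid4 value in its default string form is an assumption.

```python
import uuid


def per_item_user_id(base_user_id: str, item_id: str) -> str:
    """Build a unique user_id for one eval item: {user_id}_{item_id}_{uuid}.

    Hypothetical helper for illustration; the uuid formatting is assumed.
    """
    return f"{base_user_id}_{item_id}_{uuid.uuid4()}"


# Two calls for the same item still yield different ids, because a new
# random uuid4 is generated on every call.
first = per_item_user_id("eval_user", "7")
second = per_item_user_id("eval_user", "7")
print(first != second)
```

Because the uuid part is random, re-running the same eval item never reuses an earlier user_id, which is what prevents an existing per-user workflow instance from being picked up again.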

Usage

To reset stateful tools during evaluation, use a per-user workflow (decorated with @register_per_user_function). When per_input_user_id is enabled (default), each eval item will get its own workflow instance with fresh state.
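
To see why a unique user_id per eval item resets tool state, the behavior can be simulated without the toolkit. Everything in this sketch is a plain-Python stand-in: CounterTool, PerUserWorkflows, and run_eval are hypothetical names, and the real session and per-user workflow machinery is assumed to behave like a cache keyed by user_id.

```python
import uuid
from typing import Callable, Dict, List


class CounterTool:
    """A stateful tool: remembers how many times it has been called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        return self.calls


class PerUserWorkflows:
    """Hypothetical stand-in for a per-user workflow cache keyed by user_id."""

    def __init__(self, factory: Callable[[], CounterTool]) -> None:
        self._factory = factory
        self._instances: Dict[str, CounterTool] = {}

    def get(self, user_id: str) -> CounterTool:
        # A user_id seen for the first time gets a freshly built instance.
        if user_id not in self._instances:
            self._instances[user_id] = self._factory()
        return self._instances[user_id]


def run_eval(item_ids: List[str], per_input_user_id: bool,
             base_user_id: str = "eval_user") -> List[int]:
    workflows = PerUserWorkflows(CounterTool)
    results = []
    for item_id in item_ids:
        if per_input_user_id:
            user_id = f"{base_user_id}_{item_id}_{uuid.uuid4()}"
        else:
            user_id = base_user_id
        results.append(workflows.get(user_id)())
    return results


isolated = run_eval(["a", "b", "c"], per_input_user_id=True)
shared = run_eval(["a", "b", "c"], per_input_user_id=False)
print(isolated)  # [1, 1, 1]: every item saw a fresh tool
print(shared)    # [1, 2, 3]: state leaked from item to item
```

With the option disabled, all items share one user_id and therefore one tool instance, which reproduces the contamination this PR addresses.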

By Submitting this PR I confirm:

I am familiar with the Contributing Guidelines.
We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
- Any contribution which contains commits that are not Signed-Off will not be accepted.
When the PR is ready for review, new or existing tests cover these changes.
When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

New Features
- Added workflow alias for display in the evaluation UI
- Added per-item user ID generation to isolate state for each evaluation input
Improvements
- Validates LLM endpoints before runs to catch deployment issues early
- Evaluation configuration fields now include descriptive metadata and safer defaults
- Renamed isolate_workflow_state to per_input_user_id and adjusted default behavior

coderabbitai · 2026-01-28T16:43:34Z

Walkthrough

Updated Eval data models to use Pydantic Field metadata, added/renamed fields (notably per_input_user_id), and changed eval execution to optionally generate a per-item unique user_id for session isolation; removed passing runtime_type to session.run.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Data model updates `src/nat/data_models/evaluate.py` | Converted public attributes to `Field(...)` with `default`/`default_factory` and descriptions across `EvalCustomScriptConfig`, `JobManagementConfig`, `EvalOutputConfig`, `EvalGeneralConfig`, and `EvalConfig`. Replaced mutable defaults with `default_factory`. Added `workflow_alias`. Renamed `isolate_workflow_state` → `per_input_user_id` (default semantics adjusted). |
| Eval execution logic `src/nat/eval/evaluate.py` | When `eval_config.general.per_input_user_id` is enabled, derive a per-item `user_id` (base `user_id` + `item.id` + `uuid4`) and open the session with it in `run_one` and `run_workflow_local`. Removed `runtime_type` import/argument and no longer pass `runtime_type` to `session.run`. |
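
The data model change (descriptions attached to fields, mutable defaults replaced with `default_factory`) can be illustrated with a dependency-free sketch. Note the substitution: this uses the standard library's `dataclasses.field` as a stand-in so it runs anywhere, whereas the PR itself uses Pydantic's `Field(default_factory=..., description=...)`, which follows the same pattern. The class name `EvalGeneralSketch`, the `tags` field, the description strings, and the `None` default for `workflow_alias` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class EvalGeneralSketch:
    """Illustrative stand-in only; not the real EvalGeneralConfig."""

    per_input_user_id: bool = field(
        default=True,
        metadata={"description": "Generate a unique user_id for each eval item."},
    )
    workflow_alias: Optional[str] = field(
        default=None,  # assumed default, not confirmed by the PR text
        metadata={"description": "Workflow alias for display in the evaluation UI."},
    )
    # Hypothetical field, present only to show a mutable default.
    tags: List[str] = field(
        default_factory=list,
        metadata={"description": "Hypothetical list-valued option."},
    )


def describe(cls) -> Dict[str, str]:
    """Collect the per-field descriptions, as a schema or docs generator might."""
    return {f.name: f.metadata["description"] for f in fields(cls)}


first, second = EvalGeneralSketch(), EvalGeneralSketch()
first.tags.append("smoke")
print(second.tags)  # [] because each instance gets its own list
print(describe(EvalGeneralSketch)["per_input_user_id"])
```

The factory is called once per instance, so mutating one config object's list cannot leak into another, and the descriptions become machine-readable instead of living in inline comments.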

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Eval Runner
    participant SessionMgr as SessionManager
    participant Session as Session
    participant Workflow as Workflow/Tools
    Runner->>SessionMgr: open/create session (user_id or per-item user_id)
    SessionMgr-->>Session: returns session instance
    Runner->>Session: session.run(workflow, input)  -- (no runtime_type)
    Session->>Workflow: execute workflow (tools, LLMs)
    Workflow-->>Session: results
    Session-->>Runner: execution result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.37% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | The PR implements all coding requirements from `#1311`: adds per-input user_id mechanism to reset stateful tools per eval item and supports optional disabling of isolation. |
| Out of Scope Changes check | ✅ Passed | Field metadata updates to eval config classes improve documentation and align with code quality standards without introducing out-of-scope functionality changes. |
| Title check | ✅ Passed | The title 'Enable per-user workflow support in nat eval' directly and clearly describes the main purpose of the PR: enabling per-user workflow support for the nat eval functionality. |

src/nat/builder/eval_per_user_builder.py

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>

willkill07

Thanks for adding Pydantic model information :)

Minor nit on generating a unique ID. Otherwise, let's rename this PR appropriately?

src/nat/eval/evaluate.py

Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Eric Evans II <194135482+ericevans-nv@users.noreply.github.com>

ericevans-nv · 2026-01-29T00:52:37Z

/merge

AnuradhaKaruppiah

LGTM. thx for the changes @ericevans-nv

ericevans-nv self-assigned this Jan 28, 2026

ericevans-nv requested a review from a team as a code owner January 28, 2026 16:43

ericevans-nv added the improvement (Improvement to existing functionality) and non-breaking (Non-breaking change) labels Jan 28, 2026

ericevans-nv changed the title ~~Reset workflow state for functions and function groups during eval runs~~ Fix stateful tools not resetting during eval runs Jan 28, 2026

AnuradhaKaruppiah reviewed Jan 28, 2026

src/nat/builder/eval_per_user_builder.py (outdated, resolved)

ericevans-nv force-pushed the feature/per-user-eval-functions branch 2 times, most recently from f139c05 to 0e39532 Compare January 28, 2026 22:13

Enable per-user workflows during eval

62f3a29

Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>

ericevans-nv force-pushed the feature/per-user-eval-functions branch from 0e39532 to 62f3a29 Compare January 28, 2026 22:25

ericevans-nv mentioned this pull request Jan 28, 2026

re-initialize stateful tools during nat eval ... #1311

Closed

2 tasks

ericevans-nv added 2 commits January 28, 2026 16:42

Merge branch 'develop' of github.com:NVIDIA/NeMo-Agent-Toolkit into feature/per-user-eval-functions

7b62bfb

Merge branch 'develop' of github.com:NVIDIA/NeMo-Agent-Toolkit into feature/per-user-eval-functions

1200321

willkill07 approved these changes Jan 29, 2026

src/nat/eval/evaluate.py (outdated, resolved)

Update src/nat/eval/evaluate.py

6bed27a

Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com> Signed-off-by: Eric Evans II <194135482+ericevans-nv@users.noreply.github.com>

ericevans-nv changed the title ~~Fix stateful tools not resetting during eval runs~~ Enable per-user workflow support in nat eval Jan 29, 2026

rapids-bot bot merged commit 270a517 into NVIDIA:develop Jan 29, 2026
16 of 17 checks passed

AnuradhaKaruppiah reviewed Feb 2, 2026

rapids-bot[bot] merged 4 commits into NVIDIA:develop from ericevans-nv:feature/per-user-eval-functions

3 participants