Enable per-user workflow support in nat eval #1503

Merged
rapids-bot[bot] merged 4 commits into NVIDIA:develop from ericevans-nv:feature/per-user-eval-functions
Jan 29, 2026

@ericevans-nv (Contributor) commented Jan 28, 2026

Description

Closes #1311

Fixes an issue where stateful tools retain state across eval items, contaminating later items with state from earlier ones during nat eval runs.

Changes

  • New per_input_user_id config option: eval.general.per_input_user_id (default: True) generates a unique user_id for each eval item ({user_id}_{item_id}_{uuid}). For per-user workflows, this creates a fresh workflow instance per eval item, resetting all stateful tools to their initial state. Set to False to disable this behavior.
  • Eval data model improvements: Updated eval config classes (EvalCustomScriptConfig, JobManagementConfig, EvalOutputConfig, EvalGeneralConfig) to use Pydantic Field with descriptions instead of inline comments.
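
The {user_id}_{item_id}_{uuid} format described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the helper name make_item_user_id is hypothetical, and whether the UUID is rendered as hex or in its dashed form is an assumption.

```python
import uuid


def make_item_user_id(base_user_id: str, item_id: object) -> str:
    """Build a per-item user_id shaped like {user_id}_{item_id}_{uuid}.

    Hypothetical helper: the name and the hex rendering of the UUID
    are assumptions made for this sketch.
    """
    return f"{base_user_id}_{item_id}_{uuid.uuid4().hex}"


# Two calls for the same eval item still yield distinct IDs,
# because a new random UUID is drawn on every call.
first = make_item_user_id("eval_user", 7)
second = make_item_user_id("eval_user", 7)
```

The random suffix means that even repeated runs over the same item would not collide on a user_id, which is what lets each run start from fresh state.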

Usage

To reset stateful tools during evaluation, use a per-user workflow (decorated with @register_per_user_function). When per_input_user_id is enabled (default), each eval item will get its own workflow instance with fresh state.
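
Why a unique user_id resets state can be seen in a toy model. The sketch below does not use the NAT API: CounterTool and PerUserRegistry are hypothetical stand-ins, and it simply assumes that a per-user workflow is instantiated once per distinct user_id and then reused.

```python
from typing import Callable, Dict


class CounterTool:
    """A stateful tool that remembers how many times it has been called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        return self.calls


class PerUserRegistry:
    """Hypothetical stand-in for per-user instantiation: one tool per user_id."""

    def __init__(self, factory: Callable[[], CounterTool]) -> None:
        self._factory = factory
        self._instances: Dict[str, CounterTool] = {}

    def get(self, user_id: str) -> CounterTool:
        # Create the instance on first use, then reuse it for that user_id.
        if user_id not in self._instances:
            self._instances[user_id] = self._factory()
        return self._instances[user_id]


registry = PerUserRegistry(CounterTool)

# Same user_id for every item: the call count leaks from item to item.
shared = [registry.get("eval_user")() for _ in range(3)]

# Distinct user_id per item: every item sees a freshly created tool.
isolated = [registry.get(f"eval_user_{item_id}")() for item_id in range(3)]
```

Here shared grows with each item while isolated stays at the initial value for every item, which mirrors the contamination this PR addresses.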

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Added workflow alias for display in the evaluation UI
    • Added per-item user ID generation to isolate state for each evaluation input
  • Improvements

    • Validates LLM endpoints before runs to catch deployment issues early
    • Evaluation configuration fields now include descriptive metadata and safer defaults
    • Renamed isolate_workflow_state to per_input_user_id and adjusted default behavior


@ericevans-nv self-assigned this Jan 28, 2026
@ericevans-nv requested a review from a team as a code owner January 28, 2026 16:43
@ericevans-nv added the labels "improvement" (Improvement to existing functionality) and "non-breaking" (Non-breaking change) Jan 28, 2026
@ericevans-nv changed the title from "Reset workflow state for functions and function groups during eval runs" to "Fix stateful tools not resetting during eval runs" Jan 28, 2026
@coderabbitai (bot) commented Jan 28, 2026

Walkthrough

Updated Eval data models to use Pydantic Field metadata, added/renamed fields (notably per_input_user_id), and changed eval execution to optionally generate a per-item unique user_id for session isolation; removed passing runtime_type to session.run.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Data model updates: src/nat/data_models/evaluate.py | Converted public attributes to Field(...) with default/default_factory and descriptions across EvalCustomScriptConfig, JobManagementConfig, EvalOutputConfig, EvalGeneralConfig, and EvalConfig. Replaced mutable defaults with default_factory. Added workflow_alias. Renamed isolate_workflow_state → per_input_user_id (default semantics adjusted). |
| Eval execution logic: src/nat/eval/evaluate.py | When eval_config.general.per_input_user_id is enabled, derive a per-item user_id (base user_id + item.id + uuid4) and open the session with it in run_one and run_workflow_local. Removed runtime_type import/argument and no longer pass runtime_type to session.run. |
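
The Field(...) pattern summarized above might look like the following. This is a minimal sketch rather than the actual contents of src/nat/data_models/evaluate.py: the class name GeneralConfigSketch and the extra_settings field are hypothetical, and the None default for workflow_alias is an assumption.

```python
from typing import Dict, Optional

from pydantic import BaseModel, Field


class GeneralConfigSketch(BaseModel):
    """Hypothetical model illustrating Field metadata and default_factory."""

    per_input_user_id: bool = Field(
        default=True,
        description="Generate a unique user_id for each eval item.",
    )
    # Assumption: the real default for workflow_alias is not stated in the PR text.
    workflow_alias: Optional[str] = Field(
        default=None,
        description="Alias shown for the workflow in the evaluation UI.",
    )
    # Hypothetical field: default_factory builds a new dict for each instance.
    extra_settings: Dict[str, str] = Field(
        default_factory=dict,
        description="Illustrative mutable field.",
    )


a = GeneralConfigSketch()
b = GeneralConfigSketch()
a.extra_settings["key"] = "value"  # mutating one instance leaves the other untouched
```

Keeping the description on the field itself, instead of in an inline comment, makes it available to schema generation and other tooling that reads the model.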

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Eval Runner
    participant SessionMgr as SessionManager
    participant Session as Session
    participant Workflow as Workflow/Tools
    Runner->>SessionMgr: open/create session (user_id or per-item user_id)
    SessionMgr-->>Session: returns session instance
    Runner->>Session: session.run(workflow, input)  -- (no runtime_type)
    Session->>Workflow: execute workflow (tools, LLMs)
    Workflow-->>Session: results
    Session-->>Runner: execution result
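
The flow in the diagram can be approximated with a small asyncio sketch. Every class here is a hypothetical stand-in, not the NAT SessionManager or Session; the sketch only assumes that a session is opened per user_id and that state is keyed by that user_id.

```python
import asyncio
import uuid
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict, List


class Session:
    """Hypothetical session bound to the state of a single user_id."""

    def __init__(self, history: List[str]) -> None:
        self._history = history

    async def run(self, item_input: str) -> int:
        # Stand-in for executing the workflow: record the input and
        # report how many inputs this user_id has seen so far.
        self._history.append(item_input)
        return len(self._history)


class SessionManager:
    """Hypothetical manager that keeps one state list per user_id."""

    def __init__(self) -> None:
        self._state: Dict[str, List[str]] = {}

    @asynccontextmanager
    async def session(self, user_id: str) -> AsyncIterator[Session]:
        yield Session(self._state.setdefault(user_id, []))


async def run_one(manager: SessionManager, base_user_id: str, item_id: int,
                  item_input: str, per_input_user_id: bool = True) -> int:
    user_id = base_user_id
    if per_input_user_id:
        # Base user_id + item id + uuid4, as described in the walkthrough.
        user_id = f"{base_user_id}_{item_id}_{uuid.uuid4()}"
    async with manager.session(user_id=user_id) as session:
        return await session.run(item_input)


async def main() -> List[int]:
    manager = SessionManager()
    isolated = [await run_one(manager, "eval_user", i, f"q{i}") for i in range(2)]
    shared = [await run_one(manager, "eval_user", i, f"q{i}", per_input_user_id=False)
              for i in range(2)]
    return isolated + shared


results = asyncio.run(main())
```

With the flag enabled each item reports a single input seen; with it disabled the second item already sees the first item's input.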

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 70.37% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | The PR implements all coding requirements from #1311: adds per-input user_id mechanism to reset stateful tools per eval item and supports optional disabling of isolation. |
| Out of Scope Changes check | ✅ Passed | Field metadata updates to eval config classes improve documentation and align with code quality standards without introducing out-of-scope functionality changes. |
| Title check | ✅ Passed | The title 'Enable per-user workflow support in nat eval' directly and clearly describes the main purpose of the PR: enabling per-user workflow support for the nat eval functionality. |


@ericevans-nv force-pushed the feature/per-user-eval-functions branch 2 times, most recently from f139c05 to 0e39532 January 28, 2026 22:13
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
@willkill07 (Member) left a comment

Thanks for adding Pydantic model information :)

Minor nit on generating a unique ID. Otherwise, let's rename this PR appropriately?

Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com>
Signed-off-by: Eric Evans II <194135482+ericevans-nv@users.noreply.github.com>
@ericevans-nv changed the title from "Fix stateful tools not resetting during eval runs" to "Enable per-user workflow support in nat eval" Jan 29, 2026
@ericevans-nv (Contributor, Author) commented

/merge

@rapids-bot (bot) merged commit 270a517 into NVIDIA:develop Jan 29, 2026
16 of 17 checks passed
@AnuradhaKaruppiah (Contributor) left a comment

LGTM. thx for the changes @ericevans-nv

Development

Successfully merging this pull request may close these issues.

re-initialize stateful tools during nat eval ...

3 participants