Enable per-user workflow support in nat eval #1503
rapids-bot[bot] merged 4 commits into NVIDIA:develop
Walkthrough
Updated Eval data models to use Pydantic Field metadata, added/renamed fields (notably …
Sequence Diagram(s)
sequenceDiagram
participant Runner as Eval Runner
participant SessionMgr as SessionManager
participant Session as Session
participant Workflow as Workflow/Tools
Runner->>SessionMgr: open/create session (user_id or per-item user_id)
SessionMgr-->>Session: returns session instance
Runner->>Session: session.run(workflow, input) -- (no runtime_type)
Session->>Workflow: execute workflow (tools, LLMs)
Workflow-->>Session: results
Session-->>Runner: execution result
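The call sequence in the diagram can be sketched in plain Python. This is only an illustration under stated assumptions: the `Session` and `SessionManager` classes, the `echo_workflow` function, and the `run_eval` helper below are hypothetical stand-ins written for this example, not the toolkit's actual API, and the per-item ID is simplified to the base ID plus the item ID.

```python
import asyncio
from contextlib import asynccontextmanager


class Session:
    """Hypothetical stand-in for a session bound to one user_id."""

    def __init__(self, user_id: str) -> None:
        self.user_id = user_id

    async def run(self, workflow, item_input: str) -> str:
        # The session executes the workflow and hands the result back.
        return await workflow(self.user_id, item_input)


class SessionManager:
    """Hypothetical manager that opens one session per user_id."""

    @asynccontextmanager
    async def session(self, user_id: str):
        yield Session(user_id)


async def echo_workflow(user_id: str, item_input: str) -> str:
    # Placeholder for the tool and LLM calls a real workflow would make.
    return f"[{user_id}] {item_input}"


async def run_eval(items: dict, base_user_id: str, per_item: bool) -> list:
    manager = SessionManager()
    results = []
    for item_id, item_input in items.items():
        # Either one shared user_id, or a distinct user_id per eval item.
        user_id = f"{base_user_id}_{item_id}" if per_item else base_user_id
        async with manager.session(user_id) as session:
            results.append(await session.run(echo_workflow, item_input))
    return results


outputs = asyncio.run(run_eval({"q1": "hello", "q2": "world"}, "eval", per_item=True))
```

With `per_item=True`, each item is executed under its own user ID, which corresponds to the "per-item user_id" branch of the first arrow in the diagram.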
Force-pushed from f139c05 to 0e39532
Signed-off-by: Eric Evans <194135482+ericevans-nv@users.noreply.github.com>
Force-pushed from 0e39532 to 62f3a29
…eature/per-user-eval-functions
willkill07 left a comment
Thanks for adding Pydantic model information :)
Minor nit on generating a unique ID. Otherwise, let's rename this PR appropriately?
Co-authored-by: Will Killian <2007799+willkill07@users.noreply.github.com>
Signed-off-by: Eric Evans II <194135482+ericevans-nv@users.noreply.github.com>
/merge
AnuradhaKaruppiah left a comment
LGTM. thx for the changes @ericevans-nv
Description
Closes #1311
Fixes an issue where stateful tools retain state across eval items, causing state contamination during `nat eval` runs.

Changes
- `per_input_user_id` config option: `eval.general.per_input_user_id` (default: `True`) generates a unique user_id for each eval item (`{user_id}_{item_id}_{uuid}`). For per-user workflows, this creates a fresh workflow instance per eval item, resetting all stateful tools to their initial state. Set to `False` to disable this behavior.
- Eval config models (`EvalCustomScriptConfig`, `JobManagementConfig`, `EvalOutputConfig`, `EvalGeneralConfig`) updated to use Pydantic `Field` with descriptions instead of inline comments.

Usage
To reset stateful tools during evaluation, use a per-user workflow (decorated with `@register_per_user_function`). When `per_input_user_id` is enabled (default), each eval item gets its own workflow instance with fresh state.
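To make the state-isolation effect concrete, here is a minimal, self-contained sketch. It assumes a toy stateful tool and a toy per-user cache; `CounterTool`, `PerUserWorkflows`, `make_item_user_id`, and `run_items` are hypothetical names invented for this example and are not part of the toolkit. Only the `{user_id}_{item_id}_{uuid}` pattern comes from the description, and the use of `uuid4().hex` for the last component is an assumption.

```python
import uuid


def make_item_user_id(user_id: str, item_id: str) -> str:
    # Follows the {user_id}_{item_id}_{uuid} pattern from the description.
    # Using uuid4().hex for the uuid part is an assumption of this sketch.
    return f"{user_id}_{item_id}_{uuid.uuid4().hex}"


class CounterTool:
    """Toy stateful tool: remembers how many times it has been called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        return self.calls


class PerUserWorkflows:
    """Hypothetical cache holding one tool instance per user_id."""

    def __init__(self) -> None:
        self._instances = {}

    def get(self, user_id: str) -> CounterTool:
        # A new user_id gets a fresh instance; a known one reuses its state.
        if user_id not in self._instances:
            self._instances[user_id] = CounterTool()
        return self._instances[user_id]


def run_items(item_ids, per_input_user_id: bool, base_user_id: str = "eval_user"):
    workflows = PerUserWorkflows()
    counts = []
    for item_id in item_ids:
        if per_input_user_id:
            user_id = make_item_user_id(base_user_id, item_id)
        else:
            user_id = base_user_id
        counts.append(workflows.get(user_id)())
    return counts


isolated = run_items(["a", "b", "c"], per_input_user_id=True)
shared = run_items(["a", "b", "c"], per_input_user_id=False)
```

In this sketch, `isolated` shows every item starting from a fresh counter, while `shared` shows the counter carrying over between items, which is the contamination the PR describes.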