Add latent-objective recognition eval to multi-objective skill #1442
Signed-off-by: cafzal <cameron.afzal@gmail.com>
…002/004 house style) Signed-off-by: cafzal <cameron.afzal@gmail.com>
Walkthrough: a single new evaluation case (Changes: Latent Objective Eval Case).
Estimated code review effort: 1 (Trivial), ~3 minutes
@ramakrishnap-nv small skill eval – adds the latent-objective case (the under-trigger boundary the other four miss) to cuopt-multi-objective-exploration, mirroring the existing four. Ready when you/Miles have a cycle.

/ok to test 8275bf7

/merge
Remove one blank line after frontmatter to trigger NVSkills-Eval re-run with the latent-objective eval added in #1442.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
Removes one blank line after frontmatter to trigger NVSkills-Eval re-run with the latent-objective eval added in #1442.

---------

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
Description
Adds a fifth eval to the `cuopt-multi-objective-exploration` skill, `multiobj-explore-eval-005-latent-objective`, covering the boundary the existing four don't: a problem stated with a single objective while a second objective sits latent in the data, unstated.

The current evals all hand the agent both objectives (001 interpret, 002 explore, 004 dual-as-slope) or are explicitly single-objective (003 decoy). None test recognizing a latent objective. This one grades whether the skill makes the agent surface the latent cost objective and trace the supply-vs-cost frontier, rather than optimizing the stated objective alone or silently folding cost into a self-chosen weighted blend (`maximize supply − λ·cost`). It brackets the skill's activation boundary opposite the 003 decoy.

Behavioral eval (`expected_script: null`, LLM-graded on the behavior list), same house style as 001/002/004; `validate_skills.sh` picks up the new array entry and the signature / `BENCHMARK.md` / skill-card regenerate via NVSkills-Eval. The latent-objective shape is the max-supply supply-vs-cost case validated on cuOpt (Tesla T4) in NVIDIA/cuopt-examples#157.

Checklist
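
For reviewers who want a concrete picture of the behavior this eval grades, here is a minimal sketch of the difference between tracing a supply-vs-cost frontier and collapsing it into one weighted blend. Everything in it is an assumption for illustration: the supplier data is made up, the helper names (`max_supply_within_budget`, `trace_frontier`, `weighted_blend`) are hypothetical, and it uses a plain-Python greedy fill instead of cuOpt. It is not the eval's fixture or the cuopt-examples notebook.

```python
# Toy data, purely illustrative (not taken from the eval or cuopt-examples):
# each supplier has a capacity in units and a cost per unit.
SUPPLIERS = [
    {"name": "A", "capacity": 40.0, "unit_cost": 1.0},
    {"name": "B", "capacity": 30.0, "unit_cost": 2.0},
    {"name": "C", "capacity": 30.0, "unit_cost": 4.0},
]


def max_supply_within_budget(suppliers, budget):
    """Maximize total supply subject to total cost <= budget.

    With a single budget constraint and per-supplier capacity bounds,
    filling the cheapest suppliers first solves this small LP exactly.
    """
    remaining = budget
    supply = 0.0
    for s in sorted(suppliers, key=lambda s: s["unit_cost"]):
        units = min(s["capacity"], remaining / s["unit_cost"])
        supply += units
        remaining -= units * s["unit_cost"]
    return supply


def trace_frontier(suppliers, budgets):
    """Sweep the cost budget (an epsilon-constraint sweep) to get frontier points."""
    return [(b, max_supply_within_budget(suppliers, b)) for b in budgets]


def weighted_blend(suppliers, lam):
    """Solve maximize supply - lam * cost for one self-chosen lam.

    A supplier is used at full capacity only if its net value per unit,
    1 - lam * unit_cost, is positive. The result is a single (cost, supply) point.
    """
    chosen = [s for s in suppliers if 1.0 - lam * s["unit_cost"] > 0]
    supply = sum(s["capacity"] for s in chosen)
    cost = sum(s["capacity"] * s["unit_cost"] for s in chosen)
    return cost, supply


frontier = trace_frontier(SUPPLIERS, [0, 40, 100, 220])
blend_point = weighted_blend(SUPPLIERS, lam=0.3)

for budget, supply in frontier:
    print(f"budget={budget:>3}  max supply={supply}")
print(f"weighted blend (lam=0.3): cost={blend_point[0]}, supply={blend_point[1]}")
```

On this toy data the sweep exposes the whole trade-off (each extra unit of supply gets more expensive as cheaper suppliers fill up), while the blend with `lam=0.3` lands on just one of those points and hides the choice of `lam` from the user.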