Add latent-objective recognition eval to multi-objective skill #1442

Merged
rapids-bot[bot] merged 2 commits into NVIDIA:main from cafzal:multiobj-latent-objective-eval on Jun 24, 2026

Conversation

@cafzal

@cafzal cafzal commented Jun 18, 2026

Contributor

Description

Adds a fifth eval to the cuopt-multi-objective-exploration skill — multiobj-explore-eval-005-latent-objective — covering the boundary the existing four don't: a problem stated with a single objective while a second objective sits latent in the data, unstated.

The current evals all hand the agent both objectives (001 interpret, 002 explore, 004 dual-as-slope) or are explicitly single-objective (003 decoy). None test recognizing a latent objective. This one grades whether the skill makes the agent surface the latent cost objective and trace the supply-vs-cost frontier — rather than optimizing the stated objective alone or silently folding cost into a self-chosen weighted blend (maximize supply − λ·cost). It brackets the skill's activation boundary opposite the 003 decoy.
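To make the distinction concrete, here is a minimal Python sketch on hypothetical toy data. The run names, supplies, and costs are invented for illustration and are not the eval's data, and the brute-force enumeration stands in for a real solver; no cuOpt API is used. It contrasts optimizing the stated objective alone with an epsilon-constraint sweep over a cost budget, which is one common way to trace such a frontier.

```python
from itertools import combinations

# Hypothetical toy data: (name, supply units, cost) for each optional production run.
RUNS = [("A", 40, 10), ("B", 30, 12), ("C", 20, 15), ("D", 10, 20)]


def best_plan(budget=None):
    """Maximize total supply; if budget is given, require total cost <= budget.

    Ties on supply are broken toward the cheaper plan.
    """
    # (supply, cost, chosen run names); the empty plan is feasible for any budget >= 0.
    best = (0, 0, ())
    for r in range(1, len(RUNS) + 1):
        for combo in combinations(RUNS, r):
            supply = sum(s for _, s, _ in combo)
            cost = sum(c for _, _, c in combo)
            if budget is not None and cost > budget:
                continue
            if (supply, -cost) > (best[0], -best[1]):
                best = (supply, cost, tuple(n for n, _, _ in combo))
    return best


def trace_frontier(budgets):
    """Epsilon-constraint sweep: one solve per cost budget, keeping only new points."""
    frontier = []
    for eps in sorted(budgets):
        point = best_plan(eps)
        if not frontier or point[0] > frontier[-1][0]:
            frontier.append(point)
    return frontier


# Stated objective alone: a single plan, with the cost tradeoff hidden.
single_plan = best_plan()
# Latent cost objective surfaced: the supply-vs-cost frontier.
frontier = trace_frontier(range(0, 61, 5))

if __name__ == "__main__":
    print("supply-only plan:", single_plan)
    for supply, cost, plan in frontier:
        print(f"supply={supply:3d} cost={cost:3d} plan={plan}")
```

In this toy instance the supply-only solve returns just the most expensive plan, while the sweep exposes cheaper plans that give up comparatively little supply. A fixed weighted blend would likewise return one point, at a weight the user never chose.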

This is a behavioral eval (expected_script: null, LLM-graded on the behavior list), in the same house style as 001/002/004. validate_skills.sh picks up the new array entry, and the signature, BENCHMARK.md, and skill card regenerate via NVSkills-Eval. The latent-objective shape is the max-supply supply-vs-cost case validated on cuOpt (Tesla T4) in NVIDIA/cuopt-examples#157.

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Signed-off-by: cafzal <cameron.afzal@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 18, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

…002/004 house style)

Signed-off-by: cafzal <cameron.afzal@gmail.com>
@cafzal cafzal marked this pull request as ready for review June 18, 2026 17:26
@cafzal cafzal requested a review from a team as a code owner June 18, 2026 17:26
@cafzal cafzal requested a review from tmckayus June 18, 2026 17:26
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Walkthrough

A single new evaluation case, multiobj-explore-eval-005-latent-objective, is appended to the evals array in skills/cuopt-multi-objective-exploration/evals/evals.json. The case defines a multi-period production planning problem with unstated cost objectives and specifies expected skill behavior for latent-objective discovery via epsilon-constraint Pareto tracing.

Changes

Latent Objective Eval Case

| Layer / File(s) | Summary |
|---|---|
| New latent-objective evaluation case: skills/cuopt-multi-objective-exploration/evals/evals.json | Adds multiobj-explore-eval-005-latent-objective with a question describing a production planning scenario where cost data is present but not in the stated objective. Specifies expected outputs: epsilon-constraint Pareto frontier tracing, exchange-rate estimation via adjacent-point differencing (MILP/no duals), interpretable operating points with knee flagging, and exclusion of single-plan collapse or self-chosen weighted-sum behaviors. |
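
The differencing and knee-flagging steps named in the summary can be sketched in plain Python. The frontier points below are hypothetical, and the knee rule used here (largest jump in marginal cost between adjacent segments) is one simple heuristic chosen for illustration, not necessarily the rule the skill or the eval grader applies.

```python
# Hypothetical frontier points as (supply, cost), sorted by increasing supply.
FRONTIER = [(0, 0), (40, 10), (70, 22), (90, 37), (100, 57)]


def exchange_rates(points):
    """Marginal cost per extra unit of supply between adjacent frontier points.

    With a MILP there are no duals to read a slope from, so the rate is
    estimated as a finite difference: delta cost / delta supply.
    """
    rates = []
    for (s0, c0), (s1, c1) in zip(points, points[1:]):
        rates.append((c1 - c0) / (s1 - s0))
    return rates


def knee_index(points):
    """Index of the interior point where the marginal cost jumps the most.

    Assumed heuristic: compare the slope leaving each interior point with the
    slope arriving at it. Requires at least three frontier points.
    """
    rates = exchange_rates(points)
    jumps = [rates[i] - rates[i - 1] for i in range(1, len(rates))]
    return 1 + max(range(len(jumps)), key=jumps.__getitem__)


rates = exchange_rates(FRONTIER)
knee = FRONTIER[knee_index(FRONTIER)]

if __name__ == "__main__":
    for (s0, _), (s1, _), rate in zip(FRONTIER, FRONTIER[1:], rates):
        print(f"supply {s0:3d} -> {s1:3d}: {rate:.2f} cost per unit")
    print("knee candidate:", knee)
```

On these points the marginal cost rises from segment to segment, and the flagged knee is the last point before the steepest increase, which is the kind of operating point a reader would want called out.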

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • NVIDIA/cuopt#1406: Updates cuopt-multi-objective-exploration evaluation expectations and documentation on exchange-rate derivation (dual when available, otherwise differencing), directly matching the method specified in the new eval case.

Suggested labels

non-breaking, improvement

Suggested reviewers

  • mlubin
  • rgsl888prabhu
  • tmckayus
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically identifies the main change: adding a new evaluation case for latent-objective recognition to the multi-objective exploration skill. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description clearly explains the new evaluation case, its purpose, and how it complements existing evaluations by testing latent objective recognition. |

@cafzal

cafzal commented Jun 18, 2026

Contributor Author

@ramakrishnap-nv small skill eval – adds the latent-objective case (the under-trigger boundary the other four miss) to cuopt-multi-objective-exploration, mirroring the existing four. Ready when you/Miles have a cycle.

@ramakrishnap-nv added the non-breaking, improvement, and Agentic labels on Jun 23, 2026
@ramakrishnap-nv

Collaborator

/ok to test 8275bf7

@ramakrishnap-nv

Collaborator

/merge

@rapids-bot rapids-bot Bot merged commit b874238 into NVIDIA:main Jun 24, 2026
46 checks passed
ramakrishnap-nv added a commit that referenced this pull request Jun 24, 2026
Remove one blank line after frontmatter to trigger NVSkills-Eval
re-run with the latent-objective eval added in #1442.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
ramakrishnap-nv added a commit that referenced this pull request Jun 25, 2026
Removes one blank line after frontmatter to trigger NVSkills-Eval re-run
with the latent-objective eval added in #1442.

---------

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
Signed-off-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: nvskills-svc-account <svc-nvskills-signing@nvidia.com>

Labels

  • Agentic: This label is used to track agentic and skill related issues
  • improvement: Improves an existing functionality
  • non-breaking: Introduces a non-breaking change

2 participants