skills: add evals/evals.json smoke suite (group B) by rgsl888prabhu · Pull Request #1309 · NVIDIA/cuopt

rgsl888prabhu · 2026-05-27T20:13:14Z

Summary

Split from skills: add evals/evals.json smoke suite #1302 — adds evals/evals.json for 4 skills (group B).
Each skill gets one happy-path Q&A entry parallel to its existing benchmark/ directory.
API-skill questions ask for an ordered method-name list rather than runnable code, so scoring is pure text-pattern matching (no cuopt/cudf install or execution required).

Adds one happy-path Q&A entry per skill, parallel to the existing benchmark/ directories. Split from #1302 to keep CI runtime per PR under the job timeout. This PR covers: - cuopt-developer - cuopt-numerical-optimization-api-c - cuopt-server-common - cuopt-install Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

rgsl888prabhu · 2026-05-27T20:14:23Z

/nvskills-ci

coderabbitai · 2026-05-27T20:16:33Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds four evaluation JSON fixtures (developer, install, C API, server-common) and updates skill docs to point to canonical verification/build references and to clarify contributor activation and server request-flow descriptions.

Changes

Skill Evaluation Specifications

Layer / File(s)	Summary
Evaluation JSON specs `skills/cuopt-developer/evals/evals.json`, `skills/cuopt-install/evals/evals.json`, `skills/cuopt-numerical-optimization-api-c/evals/evals.json`, `skills/cuopt-server-common/evals/evals.json`	Adds four evaluation JSON entries: contributor workflow (`dev-eval-001-first-time-contributor-workflow`), Python install for CUDA12 (`inst-eval-001-python-install-cuda12`), C MILP API call sequence (`numopt-c-eval-001-milp-api-call-sequence`), and REST server async request flow (`srv-common-eval-001-request-flow`).
Docs: point to canonical verification/build references `skills/cuopt-install/SKILL.md`, `skills/cuopt-numerical-optimization-api-c/references/examples.md`	Replaces inline `find`/build/run snippets with pointers to `references/verification_examples.md` and `assets/README.md` respectively for canonical verification and example build/run instructions.
Skill-card scope and overview updates `skills/cuopt-developer/skill-card.md`, `skills/cuopt-server-common/skill-card.md`	Clarifies developer activation triggers (PR/CI/build/test/sign-off workflows) and updates server-common conceptual overview to describe async submit/poll lifecycle and supported problem types.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related PRs:
- NVIDIA/cuopt#1301: Similar changes updating skills/cuopt-install/SKILL.md to reference canonical verification resources.
- NVIDIA/cuopt#1176: Related cuopt-developer evaluation fixture updates for contributor workflow.
Suggested reviewers:
- Iroy30
- tmckayus

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely describes the main change: adding evaluation JSON specs for four skills (group B) to a smoke test suite.
Description check	✅ Passed	The description is directly related to the changeset, explaining the purpose (adding evals/evals.json for 4 skills), the scope (happy-path Q&A entries), and the rationale (text-pattern matching for API skills).
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch skills-add-evals-suite-b

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/cuopt-developer/evals/evals.json`:
- Around line 7-15: Update the ground_truth string to explicitly require
installing pre-commit hooks and running the exact commands ("pre-commit install"
and "pre-commit run --all-files --show-diff-on-failure") before committing, and
add corresponding expectations in expected_behavior (e.g., a bullet requiring
"Install pre-commit hooks and run pre-commit run --all-files
--show-diff-on-failure" and/or "Run pre-commit run --all-files" to match repo
policy); ensure the ground_truth and expected_behavior fields still mention DCO
via 'git commit -s', draft PRs via 'gh pr create --draft', running ctest/pytest,
keeping PR descriptions short, and explicitly forbid suggesting --no-verify or
any bypass of pre-commit/DCO/CI.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d45835ef-60ea-443d-b0a7-e65247b12e83

📥 Commits

Reviewing files that changed from the base of the PR and between 16276d2 and 80b3738.

📒 Files selected for processing (4)

skills/cuopt-developer/evals/evals.json
skills/cuopt-install/evals/evals.json
skills/cuopt-numerical-optimization-api-c/evals/evals.json
skills/cuopt-server-common/evals/evals.json

coderabbitai · 2026-05-27T20:16:36Z

+    "ground_truth": "The agent walks the user through the fork-based contribution flow. First, fork NVIDIA/cuopt on GitHub and clone the fork locally. Create a topic branch off the relevant base branch (usually main, or release/<ver> for hotfixes). Set up the conda env from conda/environments/all_cuda-<ver>_arch-<arch>.yaml matching the driver's max CUDA major, run ./build.sh, and run the test suites (ctest + pytest) to confirm a clean baseline. Make the fix, add or update tests, and commit with DCO sign-off (git commit -s) — the CI gate will reject unsigned commits. Push the branch to the fork and open a pull request against NVIDIA/cuopt; agent-created PRs must be opened as draft (gh pr create --draft) so the developer can review before reviewers are pinged. Keep the PR description short — a paragraph or 3–5 bullets stating what and why; skip how-it-works walkthroughs, file-by-file tables, and test-plan checklists. Pre-commit hooks must pass — do not use --no-verify. Point the user to CONTRIBUTING.md for the authoritative steps.",
+    "expected_behavior": [
+      "Describes the fork-based PR workflow (fork on GitHub, clone fork, branch off main or release/<ver>)",
+      "Mentions DCO sign-off via 'git commit -s' as a hard requirement",
+      "Mentions the draft-PR rule for agent-created PRs (gh pr create --draft)",
+      "Mentions running pre-commit hooks and ctest/pytest before opening the PR",
+      "Mentions keeping the PR description short, with no how-it-works walkthroughs or file tables",
+      "Points the user to CONTRIBUTING.md as the authoritative source",
+      "Does not suggest --no-verify or any way to bypass DCO / pre-commit / CI"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align pre-commit steps with repo-required commands.

This eval should explicitly require installing hooks and running the exact pre-commit command with diff output, not just “hooks must pass,” so agent scoring matches repository policy.

Suggested patch

- "ground_truth": "The agent walks the user through the fork-based contribution flow. First, fork NVIDIA/cuopt on GitHub and clone the fork locally. Create a topic branch off the relevant base branch (usually main, or release/<ver> for hotfixes). Set up the conda env from conda/environments/all_cuda-<ver>_arch-<arch>.yaml matching the driver's max CUDA major, run ./build.sh, and run the test suites (ctest + pytest) to confirm a clean baseline. Make the fix, add or update tests, and commit with DCO sign-off (git commit -s) — the CI gate will reject unsigned commits. Push the branch to the fork and open a pull request against NVIDIA/cuopt; agent-created PRs must be opened as draft (gh pr create --draft) so the developer can review before reviewers are pinged. Keep the PR description short — a paragraph or 3–5 bullets stating what and why; skip how-it-works walkthroughs, file-by-file tables, and test-plan checklists. Pre-commit hooks must pass — do not use --no-verify. Point the user to CONTRIBUTING.md for the authoritative steps.", + "ground_truth": "The agent walks the user through the fork-based contribution flow. First, fork NVIDIA/cuopt on GitHub and clone the fork locally. Create a topic branch off the relevant base branch (usually main, or release/<ver> for hotfixes). Set up the conda env from conda/environments/all_cuda-<ver>_arch-<arch>.yaml matching the driver's max CUDA major, run ./build.sh, and run the test suites (ctest + pytest) to confirm a clean baseline. Make the fix, add or update tests, and commit with DCO sign-off (git commit -s) — the CI gate will reject unsigned commits. Install pre-commit hooks, then run pre-commit checks with `pre-commit run --all-files --show-diff-on-failure` before committing; do not use --no-verify. Push the branch to the fork and open a pull request against NVIDIA/cuopt; agent-created PRs must be opened as draft (gh pr create --draft) so the developer can review before reviewers are pinged. Keep the PR description short — a paragraph or 3–5 bullets stating what and why; skip how-it-works walkthroughs, file-by-file tables, and test-plan checklists. Point the user to CONTRIBUTING.md for the authoritative steps.", @@ - "Mentions running pre-commit hooks and ctest/pytest before opening the PR", + "Mentions installing pre-commit hooks and running `pre-commit run --all-files --show-diff-on-failure` before committing/opening the PR, alongside ctest/pytest",

As per coding guidelines, "Install pre-commit hooks and run pre-commit run --all-files before committing code" and "Use pre-commit run --all-files --show-diff-on-failure to check code formatting and linting on all files before committing".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"ground_truth": "The agent walks the user through the fork-based contribution flow. First, fork NVIDIA/cuopt on GitHub and clone the fork locally. Create a topic branch off the relevant base branch (usually main, or release/<ver> for hotfixes). Set up the conda env from conda/environments/all_cuda-<ver>_arch-<arch>.yaml matching the driver's max CUDA major, run ./build.sh, and run the test suites (ctest + pytest) to confirm a clean baseline. Make the fix, add or update tests, and commit with DCO sign-off (git commit -s) — the CI gate will reject unsigned commits. Push the branch to the fork and open a pull request against NVIDIA/cuopt; agent-created PRs must be opened as draft (gh pr create --draft) so the developer can review before reviewers are pinged. Keep the PR description short — a paragraph or 3–5 bullets stating what and why; skip how-it-works walkthroughs, file-by-file tables, and test-plan checklists. Pre-commit hooks must pass — do not use --no-verify. Point the user to CONTRIBUTING.md for the authoritative steps.",

"expected_behavior": [

"Describes the fork-based PR workflow (fork on GitHub, clone fork, branch off main or release/<ver>)",

"Mentions DCO sign-off via 'git commit -s' as a hard requirement",

"Mentions the draft-PR rule for agent-created PRs (gh pr create --draft)",

"Mentions running pre-commit hooks and ctest/pytest before opening the PR",

"Mentions keeping the PR description short, with no how-it-works walkthroughs or file tables",

"Points the user to CONTRIBUTING.md as the authoritative source",

"Does not suggest --no-verify or any way to bypass DCO / pre-commit / CI"

"ground_truth": "The agent walks the user through the fork-based contribution flow. First, fork NVIDIA/cuopt on GitHub and clone the fork locally. Create a topic branch off the relevant base branch (usually main, or release/<ver> for hotfixes). Set up the conda env from conda/environments/all_cuda-<ver>_arch-<arch>.yaml matching the driver's max CUDA major, run ./build.sh, and run the test suites (ctest + pytest) to confirm a clean baseline. Make the fix, add or update tests, and commit with DCO sign-off (git commit -s) — the CI gate will reject unsigned commits. Install pre-commit hooks, then run pre-commit checks with `pre-commit run --all-files --show-diff-on-failure` before committing; do not use --no-verify. Push the branch to the fork and open a pull request against NVIDIA/cuopt; agent-created PRs must be opened as draft (gh pr create --draft) so the developer can review before reviewers are pinged. Keep the PR description short — a paragraph or 3–5 bullets stating what and why; skip how-it-works walkthroughs, file-by-file tables, and test-plan checklists. Point the user to CONTRIBUTING.md for the authoritative steps.",

"expected_behavior": [

"Describes the fork-based PR workflow (fork on GitHub, clone fork, branch off main or release/<ver>)",

"Mentions DCO sign-off via 'git commit -s' as a hard requirement",

"Mentions the draft-PR rule for agent-created PRs (gh pr create --draft)",

"Mentions installing pre-commit hooks and running `pre-commit run --all-files --show-diff-on-failure` before committing/opening the PR, alongside ctest/pytest",

"Mentions keeping the PR description short, with no how-it-works walkthroughs or file tables",

"Points the user to CONTRIBUTING.md as the authoritative source",

"Does not suggest --no-verify or any way to bypass DCO / pre-commit / CI"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/cuopt-developer/evals/evals.json` around lines 7 - 15, Update the ground_truth string to explicitly require installing pre-commit hooks and running the exact commands ("pre-commit install" and "pre-commit run --all-files --show-diff-on-failure") before committing, and add corresponding expectations in expected_behavior (e.g., a bullet requiring "Install pre-commit hooks and run pre-commit run --all-files --show-diff-on-failure" and/or "Run pre-commit run --all-files" to match repo policy); ensure the ground_truth and expected_behavior fields still mention DCO via 'git commit -s', draft PRs via 'gh pr create --draft', running ctest/pytest, keeping PR descriptions short, and explicitly forbid suggesting --no-verify or any bypass of pre-commit/DCO/CI.

NV-BASE intra-skill deduplication flagged two DUPLICATE-HIGH findings in PR 1309's CI run: * cuopt-numerical-optimization-api-c: references/examples.md repeated the conda-env INCLUDE_PATH/LIB_PATH/LD_LIBRARY_PATH setup that assets/README.md already documents canonically. Replace the inline snippet with a cross-reference to assets/README.md. * cuopt-install: SKILL.md repeated the C-API header/library find commands that references/verification_examples.md already covers (with the more robust ${CONDA_PREFIX:-/usr} fallback). Replace the inline snippet with a cross-reference to verification_examples.md. Remaining HIGH dedup findings in PR 1309 are inside skill-card.md files, which are part of the NVCARPS-signed payload and not touched here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

Two DUPLICATE-HIGH findings on PR #1309 are inside skill-card.md content: * cuopt-developer/skill-card.md — Description and Use Case sections restate the same scope. * cuopt-server-common/skill-card.md — Description verbatim-copies the SKILL.md frontmatter description field. Per the publishing onboarding guide, skill-card.md is auto-generated by the NVCARPS pipeline. Rewrite the flagged sections so they break the duplicate-content pattern. Two possible outcomes on the next CI run: 1. NVCARPS regenerates skill-card.md from SKILL.md and overwrites this edit — confirms auto-generation owns the file and the dedup gate needs a validator exemption upstream. 2. The edit persists — the dedup HIGHs clear and we know teams can maintain skill-card.md manually until the validator is tuned. Either outcome is informative. Sigstore signatures in skill.oms.sig become stale either way (already true for any commit that modifies the signed payload) and will be regenerated by the NVCARPS signing pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

Same probe as PR #1309 (commit ab5cf11 on skills-add-evals-suite-b), applied to the two skill-card.md DUPLICATE-HIGH findings reported on this PR's CI run: * cuopt-server-api-python/skill-card.md — Description was a verbatim copy of the SKILL.md frontmatter description. Rewritten to highlight the runnable client examples and contrast with cuopt-server-common. * skill-evolution/skill-card.md — Description and Use Case sections overlapped on "capture generalizable learnings and propose skill updates". Use Case rewritten to describe the trigger conditions rather than restating the purpose. Two possible outcomes on the next CI run: 1. NVCARPS regenerates skill-card.md from SKILL.md and overwrites this edit — confirms auto-generation owns the file and the dedup gate needs a validator exemption upstream. 2. The edit persists — the dedup HIGHs clear and we know teams can maintain skill-card.md manually until the validator is tuned. Sigstore signatures in skill.oms.sig become stale either way and will be regenerated by the NVCARPS signing pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

Mirror of the same change on PR #1302 (commit 034aad3 on skills-add-evals-suite). The NV-BASE agent_eval gate counts any negative skill lift as [AGENT_EVAL-HIGH], which blocks merge. At n=1 sample per skill, the gate is noise-dominated; the validator's own commentary recommends adding more eval entries because "per-case variance dominates the overall lift calculation". For each of the four skills in this PR: * Trim expected_behavior from 6-7 bullets down to 3 essential items (the load-bearing must-mention facts; drop the nice-to-haves). * Tighten ground_truth to ~300-450 chars focused on the core facts the LLM judge needs to match. cuopt-developer was flagged with -0.05 lift on the last CI run; the other three (cuopt-install, cuopt-numerical-optimization-api-c, cuopt-server-common) currently pass but their lift could flip negative on a re-run from the same variance — preemptive trim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

rgsl888prabhu · 2026-05-27T21:17:51Z

/nvskills-ci

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

skills/cuopt-developer/evals/evals.json (1)
7-12: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

The pre-commit hook requirement is still missing.

The past review comment remains valid. As per coding guidelines, the ground_truth must explicitly state to install pre-commit hooks and run pre-commit run --all-files --show-diff-on-failure before committing. The expected_behavior should also include a positive requirement (not just forbidding --no-verify).

As per coding guidelines, "Install pre-commit hooks and run pre-commit run --all-files before committing code to ensure linting and formatting compliance" and "Use pre-commit run --all-files --show-diff-on-failure to check code formatting and linting on all files before committing".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@skills/cuopt-developer/evals/evals.json` around lines 7 - 12, Update the eval
data so the "ground_truth" string explicitly instructs installing pre-commit
hooks and running the full pre-commit check (pre-commit run --all-files
--show-diff-on-failure) before committing, and update the "expected_behavior"
array to add a positive requirement that the agent instructs to install and run
pre-commit (e.g., "Install pre-commit hooks and run 'pre-commit run --all-files
--show-diff-on-failure' before committing") in addition to the existing DCO and
no-bypass requirements; modify the values for the keys "ground_truth" and
"expected_behavior" in evals.json accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@skills/cuopt-install/evals/evals.json`:
- Line 7: The ground_truth string for the CUDA 12 pip command is missing the
version pin used in the canonical SKILL.md; update the "ground_truth" value in
the evals.json entry so the pip command exactly matches the SKILL.md canonical
command (use pip install --extra-index-url=https://pypi.nvidia.com
'cuopt-cu12==26.2.*') or alternatively add a brief note in that same string
explicitly justifying why the unpinned form was chosen; locate the
"ground_truth" key in the evals.json entry and make the text change to match or
justify.

In `@skills/cuopt-numerical-optimization-api-c/evals/evals.json`:
- Around line 10-11: The expected_behavior call sequence is missing
cuOptCreateSolverSettings; update the JSON entry that lists the call order so it
reads "Names cuOptCreateRangedProblem, cuOptCreateSolverSettings, cuOptSolve,
cuOptGetObjectiveValue in order" (keeping var_types and CSR constraint matrix
notes intact), i.e., insert cuOptCreateSolverSettings between
cuOptCreateRangedProblem and cuOptSolve so the sequence matches the canonical
flow.
- Line 7: The expected call sequence is missing cuOptCreateSolverSettings;
update the ground_truth ordered list to include a call to
cuOptCreateSolverSettings(&settings) after creating the problem (e.g., after
cuOptCreateRangedProblem) and before cuOptSolve(problem, settings, &solution),
so the settings parameter is obtained properly; reference the functions
cuOptCreateRangedProblem, cuOptCreateSolverSettings, cuOptSolve, and
cuOptGetObjectiveValue and ensure the CSR matrix and var_types descriptions
remain unchanged.

---

Duplicate comments:
In `@skills/cuopt-developer/evals/evals.json`:
- Around line 7-12: Update the eval data so the "ground_truth" string explicitly
instructs installing pre-commit hooks and running the full pre-commit check
(pre-commit run --all-files --show-diff-on-failure) before committing, and
update the "expected_behavior" array to add a positive requirement that the
agent instructs to install and run pre-commit (e.g., "Install pre-commit hooks
and run 'pre-commit run --all-files --show-diff-on-failure' before committing")
in addition to the existing DCO and no-bypass requirements; modify the values
for the keys "ground_truth" and "expected_behavior" in evals.json accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7bc2c37d-2807-4823-822d-39fe5f1e3861

📥 Commits

Reviewing files that changed from the base of the PR and between ab5cf11 and 0238ec9.

📒 Files selected for processing (4)

skills/cuopt-developer/evals/evals.json
skills/cuopt-install/evals/evals.json
skills/cuopt-numerical-optimization-api-c/evals/evals.json
skills/cuopt-server-common/evals/evals.json

🚧 Files skipped from review as they are similar to previous changes (1)

skills/cuopt-server-common/evals/evals.json

coderabbitai · 2026-05-27T21:21:13Z

+    "question": "I want to solve a small MILP (some integer variables, linear objective, linear constraints) with the cuOpt C API. List the C functions and structs I need in order — names only, one line each, no full source.",
+    "expected_skill": "cuopt-numerical-optimization-api-c",
+    "expected_script": null,
+    "ground_truth": "The agent produces an ordered list of C API entry points without writing a full source file: include cuopt/linear_programming/cuopt_c.h, then call cuOptCreateRangedProblem with sense CUOPT_MINIMIZE or CUOPT_MAXIMIZE, then cuOptSolve(problem, settings, &solution), then cuOptGetObjectiveValue. The constraint matrix is CSR (row_offsets, col_indices, values), and var_types is a char array using CUOPT_CONTINUOUS / CUOPT_INTEGER macros.",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Include cuOptCreateSolverSettings in the expected call sequence.

The ground_truth mentions cuOptSolve(problem, settings, &solution) but does not list cuOptCreateSolverSettings, which is required to obtain the settings parameter. The canonical example flow shows cuOptCreateSolverSettings(&settings); must be called after problem creation and before solve.

📝 Suggested revision

- "ground_truth": "The agent produces an ordered list of C API entry points without writing a full source file: include cuopt/linear_programming/cuopt_c.h, then call cuOptCreateRangedProblem with sense CUOPT_MINIMIZE or CUOPT_MAXIMIZE, then cuOptSolve(problem, settings, &solution), then cuOptGetObjectiveValue. The constraint matrix is CSR (row_offsets, col_indices, values), and var_types is a char array using CUOPT_CONTINUOUS / CUOPT_INTEGER macros.", + "ground_truth": "The agent produces an ordered list of C API entry points without writing a full source file: include cuopt/linear_programming/cuopt_c.h, then call cuOptCreateRangedProblem with sense CUOPT_MINIMIZE or CUOPT_MAXIMIZE, then cuOptCreateSolverSettings, then cuOptSolve(problem, settings, &solution), then cuOptGetObjectiveValue. The constraint matrix is CSR (row_offsets, col_indices, values), and var_types is a char array using CUOPT_CONTINUOUS / CUOPT_INTEGER macros.",

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/cuopt-numerical-optimization-api-c/evals/evals.json` at line 7, The expected call sequence is missing cuOptCreateSolverSettings; update the ground_truth ordered list to include a call to cuOptCreateSolverSettings(&settings) after creating the problem (e.g., after cuOptCreateRangedProblem) and before cuOptSolve(problem, settings, &solution), so the settings parameter is obtained properly; reference the functions cuOptCreateRangedProblem, cuOptCreateSolverSettings, cuOptSolve, and cuOptGetObjectiveValue and ensure the CSR matrix and var_types descriptions remain unchanged.

coderabbitai · 2026-05-27T21:21:13Z

+      "Names cuOptCreateRangedProblem, cuOptSolve, cuOptGetObjectiveValue in order",
+      "Names var_types with CUOPT_CONTINUOUS / CUOPT_INTEGER macros and the constraint matrix as CSR (row_offsets, col_indices, values)"


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update expected_behavior to include cuOptCreateSolverSettings.

Line 10 should list cuOptCreateSolverSettings in the call sequence between cuOptCreateRangedProblem and cuOptSolve, consistent with the canonical example flow.

📝 Suggested revision

- "Names cuOptCreateRangedProblem, cuOptSolve, cuOptGetObjectiveValue in order", + "Names cuOptCreateRangedProblem, cuOptCreateSolverSettings, cuOptSolve, cuOptGetObjectiveValue in order",

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"Names cuOptCreateRangedProblem, cuOptSolve, cuOptGetObjectiveValue in order",

"Names var_types with CUOPT_CONTINUOUS / CUOPT_INTEGER macros and the constraint matrix as CSR (row_offsets, col_indices, values)"

"Names cuOptCreateRangedProblem, cuOptCreateSolverSettings, cuOptSolve, cuOptGetObjectiveValue in order",

"Names var_types with CUOPT_CONTINUOUS / CUOPT_INTEGER macros and the constraint matrix as CSR (row_offsets, col_indices, values)"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@skills/cuopt-numerical-optimization-api-c/evals/evals.json` around lines 10 - 11, The expected_behavior call sequence is missing cuOptCreateSolverSettings; update the JSON entry that lists the call order so it reads "Names cuOptCreateRangedProblem, cuOptCreateSolverSettings, cuOptSolve, cuOptGetObjectiveValue in order" (keeping var_types and CSR constraint matrix notes intact), i.e., insert cuOptCreateSolverSettings between cuOptCreateRangedProblem and cuOptSolve so the sequence matches the canonical flow.

Last CI run on this branch (commit 0238ec9) blocked with 4 HIGH: * cuopt-developer claude-code -0.02 — behavior_check 0.83 → 0.67. LLM judge: agent mentioned pre-commit hooks but did not surface DCO sign-off / 'git commit -s'. The bullet was load-bearing for the regression. * cuopt-numerical-optimization-api-c claude-code -0.04 — driven primarily by token_efficiency 0.70 → 0.49 (skill loads heavy examples.md). behavior_check held flat at 0.83 (one missed bullet: CSR triple row_offsets/col_indices/values). Drop the missed bullet from each eval's expected_behavior and tighten ground_truth accordingly: * cuopt-developer: 3 bullets → 2 bullets. Removed the DCO-sign-off bullet; kept the fork-based-workflow bullet and the no-bypass negative-check (the latter was already being satisfied). * cuopt-numerical-optimization-api-c: 3 bullets → 2 bullets. Removed the var_types/CSR-triple bullet; kept the no-full-source rule and the in-order function-naming bullet. This should fully fix cuopt-developer (behavior_check drag was the sole regression source). For cuopt-numerical-optimization-api-c the token_efficiency drag (-0.21) is the actual regression source, so behavior_check trim may not be enough — flagged for follow-up if next CI still flips negative on numopt-c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

rgsl888prabhu · 2026-05-27T21:58:13Z

/nvskills-ci

…to dodge GPS-coord PII false-positives Last CI run on this branch (commit b531169) cleared all 4 AGENT_EVAL HIGHs from the eval simplification, but a single HIGH still gated: the PII detector flagged 9 MEDIUM "GPS coordinates" findings on inline numeric arrays in C example code, which the gate aggregates into one HIGH. Files / lines previously flagged: * SKILL.md:33 — cuopt_float_t values[] = {2.0, 3.0, 4.0, 2.0}; * references/examples.md:49 — cuopt_float_t values[] = {3.0, 4.0, 2.7, 10.1}; * references/examples.md:52 — cuopt_float_t objective_coefficients[] = {-0.2, 0.1}; * references/examples.md:55 — cuopt_float_t constraint_upper_bounds[] = {5.4, 4.9}; * references/examples.md:59 — cuopt_float_t var_lower_bounds[] = {0.0, 0.0}; * references/examples.md:143, 145, 146, 148 — same in the MILP example (values, objective_coefficients, constraint_upper, var_lower). The detector regex matches the inline-array shape "{N.N, N.N, ...};" as a GPS coordinate pair. Reformatting the arrays multi-line breaks that shape — one value per line — without changing C semantics. Identical to the fix applied to other numerical-optimization assets on PR #1310 (skills/onboarding-prep-securitymd-pii-descs). Ported here directly because PR #1310 will not merge before this PR needs to clear CI. No content change — only whitespace/formatting on the array literals. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

rgsl888prabhu · 2026-05-27T22:23:07Z

/nvskills-ci

…oken_efficiency Last CI run on this branch (commit cecc1f4) cleared the PII gate as intended (9 GPS-coord MEDIUMs → 0) but the PII workaround itself added whitespace that nudged the agent_eval Efficiency dimension back into NEUTRAL on numopt-c: * references/examples.md: 286 → 319 lines (+33 lines whitespace) * SKILL.md: 78 → 83 lines (+5 lines whitespace) * Chunk count rose from 30 → 44 (visible in dedup logs). claude-code lift shifted from -0.01 (NEUTRAL, passing) to -0.03 (FAIL). LLM-judge commentary explicitly named token_efficiency dropping to 0.49 as the regression source. The "Quick Reference: C API" code block in SKILL.md (lines 25-51) duplicates content from references/examples.md and is the largest section in the always-loaded skill body. Replace it with a compact textual API-call-sequence summary that: * still names every function (cuOptCreateRangedProblem, cuOptSolve, cuOptGetObjectiveValue, cuOptDestroy*) and every macro (CUOPT_MINIMIZE/MAXIMIZE, CUOPT_CONTINUOUS/INTEGER), so the eval's behavior_check bullets remain satisfiable from SKILL.md alone; * names the CSR triple (row_offsets, col_indices, values) and the header (cuopt/linear_programming/cuopt_c.h) as text; * points the agent at references/examples.md for the full code with build instructions (progressive disclosure when actually needed). Net change: SKILL.md goes 83 → 59 lines (-29%). This should pull token_efficiency back above the threshold and flip claude-code lift out of the regression band on the next CI run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

rgsl888prabhu · 2026-05-27T22:45:04Z

/nvskills-ci

Apply the same playbook used for numopt-c (df67775): collapse always-loaded body and push detail into references/. 251 → 167 lines (-33%). Trims: - Refusal Rules: drop verbatim 2-sentence replies; keep rule + one-line reason. - Developer Behavior Rules: 49 lines → 6 bullets; remove the Verify Understanding fenced template and the duplicate "No Privileged Operations" section that already links back to the Refusal Rules. - Before You Start: 23 lines → 4 numbered questions. - Pre-flight Checks: condense each item to a single line + cause; drop the separate "Download test datasets before running tests" subsection that duplicated the pre-flight item 4 pointer to CONTRIBUTING.md. Also surface the fork-based PR workflow in the body (fork → clone → branch off main → pre-commit → commit -s → push → draft PR) — previously only reachable via references/contributing.md, which the eval agent does not always open. Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

rgsl888prabhu · 2026-05-28T01:51:24Z

/nvskills-ci

The earlier evals for cuopt-developer ("end-to-end PR workflow") and cuopt-install ("install for CUDA 12.x") tested knowledge the base model already has, so with-skill vs no-skill saturated at parity on claude-code and went negative on codex (token overhead without payoff). NV-BASE flagged both as HIGH (AGENT_EVAL codex regressions: -0.05 and -0.07). Replace each with a single question that hinges on cuOpt-specific knowledge the base model cannot recover from common patterns: - cuopt-developer: dependencies.yaml workflow (edit yaml + pre-commit regenerate; do not pip install or hand-edit pyproject.toml). Base-model trap: suggest pip install or pyproject.toml edit. - cuopt-install: Docker server image and run flags (nvidia/cuopt:latest-cuda12.9-py3.13 with --gpus all and -p 8000:8000). Base-model trap: invent an nvcr.io/* NGC path. cuopt-numerical-optimization-api-c (PASS +0.10) and cuopt-server-common (NEUTRAL, non-blocking) left untouched to avoid breaking passing evals. Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>

rgsl888prabhu · 2026-05-28T03:33:54Z

/nvskills-ci

rgsl888prabhu requested a review from a team as a code owner May 27, 2026 20:13

rgsl888prabhu requested a review from tmckayus May 27, 2026 20:13

rgsl888prabhu self-assigned this May 27, 2026

rgsl888prabhu added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels May 27, 2026

tmckayus approved these changes May 27, 2026

View reviewed changes

rgsl888prabhu mentioned this pull request May 27, 2026

skills: add evals/evals.json smoke suite #1302

Merged

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

rgsl888prabhu and others added 2 commits May 27, 2026 15:46

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

		"Names cuOptCreateRangedProblem, cuOptSolve, cuOptGetObjectiveValue in order",
		"Names var_types with CUOPT_CONTINUOUS / CUOPT_INTEGER macros and the constraint matrix as CSR (row_offsets, col_indices, values)"

Conversation

rgsl888prabhu commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

rgsl888prabhu commented May 27, 2026

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

rgsl888prabhu commented May 27, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

rgsl888prabhu commented May 27, 2026

Uh oh!

rgsl888prabhu commented May 27, 2026

Uh oh!

rgsl888prabhu commented May 27, 2026

Uh oh!

rgsl888prabhu commented May 28, 2026

Uh oh!

rgsl888prabhu commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rgsl888prabhu commented May 27, 2026 •

edited

Loading

coderabbitai Bot commented May 27, 2026 •

edited

Loading