Add skill evaluation dataset for cuopt-lp-milp-api-python #1172
rapids-bot[bot] merged 5 commits into NVIDIA:main
Conversation
Initial skill evaluation dataset for cuopt-lp-milp-api-python at skills/cuopt-lp-milp-api-python/evals/evals.json. 10 entries adapted from the microsoft/OptiGuide IndustryOR corpus (MIT license, attribution in evals/SOURCES.md):

- 5 LP-style problems (production planning, profit maximization, transportation, diet, blending with tiered pricing)
- 5 MILP-style problems (assignment, knapsack, lot-sizing, set multi-cover / shift scheduling, bin packing / car parking)

Each entry uses the standard schema with one extra `source` field for provenance.

Per the user's review:

- `ground_truth` is the numeric optimal value only (exact match, no tolerance), so the LLM judge has a deterministic check
- `expected_behavior` is generic and problem-agnostic: it does not pre-categorize a problem as LP vs. MILP, since that is the agent's job to infer from the problem text, and the cuopt-lp-milp-api-python skill covers both

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
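For reference, a single entry under this schema might look like the following sketch. The `id` string, `prompt` key name, and numbers are illustrative assumptions; `ground_truth`, `expected_behavior`, and the extra `source` field follow the description above.

```python
import json

# Hypothetical evals.json entry; field names other than ground_truth,
# expected_behavior, and source are assumptions for illustration.
entry = {
    "id": "lpmilp-001-production-planning",  # assumed ID convention
    "prompt": "A factory makes two products ...",  # abbreviated problem text
    "expected_behavior": (
        "Reports an optimal objective value that exactly matches the "
        "ground_truth to the precision shown (no rounding tolerance is allowed)"
    ),
    "ground_truth": "4600.0",  # numeric optimal value only, exact match
    "source": "microsoft/OptiGuide IndustryOR",  # provenance
}

print(json.dumps(entry, indent=2))
```

The exact-match `ground_truth` keeps the judge's check deterministic: a string comparison against the shown precision, with no tolerance band to argue about.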
The previous expected_behavior bullet ('Reports the optimal objective
value as part of the response') did not state that the value must
match the ground_truth exactly to the shown precision, leaving room
for the LLM judge to accept rounded answers. The bullet is replaced with
'Reports an optimal objective value that exactly matches the
ground_truth to the precision shown (no rounding tolerance is
allowed)' so the requirement is unambiguous.
Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
The earlier seed of 10 entries was a leftover 'start small' habit that doesn't really apply to dataset-derived evals: once the rubric and schema are validated, including the rest of the corpus is a near-zero-cost transcription.

Regenerated evals.json from scratch using the same generic rubric and the exact-precision ground_truth requirement, so all 99 entries are internally consistent. IDs are stable: lpmilp-NNN-<class-slug>, where NNN is the source row index + 1 and the class slug is derived from the first problem_class tag. This makes problem-level traceability easy without breaking the source-row mapping.

The corpus is overwhelmingly LP/MILP. The one row tagged PortfolioOptimization is included because its own text says 'Formulate this as a linear programming problem': it is an LP, not an actual QP, so it is in scope for cuopt-lp-milp-api-python.

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
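The stated ID scheme (`lpmilp-NNN-<class-slug>`) can be sketched as below. The exact slugification rule is an assumption, since only "derived from the first problem_class tag" is specified.

```python
def make_eval_id(row_index: int, problem_class_tags: list) -> str:
    """Build a stable ID: lpmilp-NNN-<class-slug>, NNN = source row index + 1."""
    # Assumed slug rule: lowercase the first tag and hyphenate separators.
    slug = problem_class_tags[0].strip().lower().replace("_", "-").replace(" ", "-")
    return f"lpmilp-{row_index + 1:03d}-{slug}"

print(make_eval_id(0, ["Production Planning"]))  # lpmilp-001-production-planning
```

Deriving the number from the source row index means an entry's ID never changes when unrelated rows are added or removed, which is what preserves the source-row mapping.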
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 62cfb1c7-800f-4655-93ba-9e0ebb4348f9
📒 Files selected for processing (2)
- skills/cuopt-lp-milp-api-python/evals/SOURCES.md
- skills/cuopt-lp-milp-api-python/evals/evals.json
SOURCES.md context at the commented line: "The MIT license under which the source dataset is distributed:" followed by an opening ``` fence with no language tag.
Add a language tag to the fenced license block to satisfy markdownlint.
Line 18 opens a fenced block without a language, which triggers MD040.
Proposed fix:

````diff
-```
+```text
 MIT License
@@
 SOFTWARE
````
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 18-18: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
The other six bullets (decision variables, constraints, objective sense, cuOpt API usage, clarification, no solver substitution) were identical across all 99 entries and largely implied by the exact-precision objective-match check: an agent that gets the right answer to the shown precision must have formulated the problem correctly. This reduces duplication and file size without losing the load-bearing signal.

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
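The trim described above could be applied mechanically; this is a hedged sketch, with the surviving bullet text taken from the rubric quoted earlier and the entry structure assumed.

```python
# The single load-bearing rubric bullet retained across all entries.
EXACT_MATCH_BULLET = (
    "Reports an optimal objective value that exactly matches the "
    "ground_truth to the precision shown (no rounding tolerance is allowed)"
)

def trim_expected_behavior(entry: dict) -> dict:
    """Return a copy of an eval entry whose expected_behavior is reduced to
    the one bullet that the exact-precision check does not already imply."""
    trimmed = dict(entry)  # shallow copy; top-level fields only
    trimmed["expected_behavior"] = EXACT_MATCH_BULLET
    return trimmed

slim = trim_expected_behavior({"id": "lpmilp-001-x", "expected_behavior": "..."})
print(slim["expected_behavior"])
```

Since the bullet string is now identical everywhere, the per-entry rubric carries no entry-specific information, which is what makes the file-size reduction essentially free.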
Iroy30 left a comment:
LGTM. Just curious: is it easier to wget it each time or to have it in the repo?
If fetching from GitHub fails, that becomes a headache for CI, and even for local runs. Also, the evals need the data in JSON format, which is not the same as the format in the GitHub repo.
/merge
Summary
Initial skill evaluation dataset for cuopt-lp-milp-api-python at skills/cuopt-lp-milp-api-python/evals/evals.json. 99 entries adapted from the microsoft/OptiGuide IndustryOR corpus (MIT, attribution in evals/SOURCES.md).

- `ground_truth` is the numeric optimal value; the rubric requires an exact match to the precision shown (no tolerance)
- `expected_behavior` is generic across all entries and does not pre-categorize a problem as LP vs. MILP
- a `source` field references the dataset row for traceability

A QP eval set is out of scope (the corpus has no genuine QP problems) and will follow in a separate PR.