
Add skill evaluation dataset for cuopt-lp-milp-api-python#1172

Merged
rapids-bot[bot] merged 5 commits into NVIDIA:main from rgsl888prabhu:add-lp-milp-api-evals
May 5, 2026

Conversation

@rgsl888prabhu
Collaborator

@rgsl888prabhu rgsl888prabhu commented May 1, 2026

Summary

Initial skill evaluation dataset for cuopt-lp-milp-api-python at skills/cuopt-lp-milp-api-python/evals/evals.json. 99 entries adapted from the microsoft/OptiGuide IndustryOR corpus (MIT, attribution in evals/SOURCES.md).

  • ground_truth is the numeric optimal value; rubric requires exact match to the precision shown (no tolerance)
  • expected_behavior is generic across all entries — does not pre-categorize as LP vs MILP
  • Each entry has a source field referencing the dataset row for traceability

QP eval set is out of scope (the corpus has no genuine QP problems) and will follow in a separate PR.
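To make the exact-match rubric concrete, here is a minimal sketch of what one entry and the judge's deterministic check might look like. This is illustrative only: field names other than `ground_truth`, `expected_behavior`, and `source` (e.g. `id`, `prompt`) are assumptions, not the actual evals.json schema, and `judge_exact` is a hypothetical helper, not part of the skill.

```python
# Hypothetical sketch of one evals.json entry and the exact-precision check.
# Field names beyond ground_truth / expected_behavior / source are assumptions.
entry = {
    "id": "lpmilp-001-production-planning",  # illustrative ID
    "prompt": "A factory produces two products ...",
    "expected_behavior": (
        "Reports an optimal objective value that exactly matches the "
        "ground_truth to the precision shown (no rounding tolerance is allowed)"
    ),
    "ground_truth": "1870.5",
    "source": "microsoft/OptiGuide IndustryOR row 0",
}


def judge_exact(reported: str, ground_truth: str) -> bool:
    """Exact string match to the precision shown -- no numeric tolerance."""
    return reported.strip() == ground_truth.strip()


print(judge_exact("1870.5", entry["ground_truth"]))   # True
print(judge_exact("1870.50", entry["ground_truth"]))  # False: precision differs
```

Because the check is a plain string comparison, "1870.50" or "1870" would fail against "1870.5", which is exactly the no-rounding-tolerance behavior the rubric asks the LLM judge to enforce.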

Initial skill evaluation dataset for cuopt-lp-milp-api-python at
skills/cuopt-lp-milp-api-python/evals/evals.json. 10 entries adapted
from the microsoft/OptiGuide IndustryOR corpus (MIT license, attribution
in evals/SOURCES.md):

- 5 LP-style problems (production planning, profit max, transportation,
  diet, blending with tiered pricing)
- 5 MILP-style problems (assignment, knapsack, lot-sizing, set
  multi-cover / shift scheduling, bin packing / car parking)

Each entry uses the standard schema with one extra `source` field for
provenance. Per the user's review:

- ground_truth is the numeric optimal value only (exact match, no
  tolerance) so the LLM judge has a deterministic check
- expected_behavior is generic and problem-agnostic — does not
  pre-categorize a problem as LP vs MILP since that is the agent's
  job to figure out from the problem text and the
  cuopt-lp-milp-api-python skill covers both

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
The previous expected_behavior bullet ('Reports the optimal objective
value as part of the response') did not state that the value must
match the ground_truth exactly to the shown precision, leaving room
for the LLM judge to accept rounded answers. Replacing the bullet with
'Reports an optimal objective value that exactly matches the
ground_truth to the precision shown (no rounding tolerance is
allowed)' so the requirement is unambiguous.

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 1, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

The earlier seed of 10 entries was a leftover 'start small' habit that
doesn't really apply for dataset-derived evals — once the rubric and
schema are validated, including the rest of the corpus is a near-zero-
cost transcription.

Regenerated evals.json from scratch using the same generic rubric and
the exact-precision ground_truth requirement so all 99 entries are
internally consistent. IDs are stable: lpmilp-NNN-<class-slug> where
NNN is the source row index + 1 and the class slug is derived from the
first problem_class tag, which makes problem-level traceability easy
without breaking the source-row mapping.
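The ID scheme above (lpmilp-NNN-&lt;class-slug&gt;, NNN = source row index + 1, slug from the first problem_class tag) could be generated roughly as follows. This is a hedged sketch: the tag format (CamelCase) and the slugification rule are assumptions, and `make_eval_id` is a hypothetical helper, not code from this PR.

```python
# Sketch of the stable ID scheme: lpmilp-NNN-<class-slug>.
# Assumes problem_class tags are CamelCase strings like "PortfolioOptimization".
import re


def make_eval_id(row_index: int, problem_class_tags: list[str]) -> str:
    first_tag = problem_class_tags[0]
    # Split CamelCase on lower->upper boundaries, then lowercase:
    # "PortfolioOptimization" -> "portfolio-optimization"
    slug = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", first_tag).lower()
    # NNN is the zero-padded source row index + 1
    return f"lpmilp-{row_index + 1:03d}-{slug}"


print(make_eval_id(0, ["ProductionPlanning"]))      # lpmilp-001-production-planning
print(make_eval_id(98, ["PortfolioOptimization"]))  # lpmilp-099-portfolio-optimization
```

Deriving the ID purely from the source row index and tag keeps IDs stable across regenerations, which is what preserves the problem-level traceability without breaking the source-row mapping.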

The corpus is overwhelmingly LP/MILP. The one row tagged
PortfolioOptimization is included because its own text says
'Formulate this as a linear programming problem' — it is an LP, not
an actual QP, so it is in scope for cuopt-lp-milp-api-python.

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
@rgsl888prabhu rgsl888prabhu self-assigned this May 1, 2026
@rgsl888prabhu rgsl888prabhu added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels May 1, 2026
@rgsl888prabhu rgsl888prabhu marked this pull request as ready for review May 1, 2026 19:16
@rgsl888prabhu rgsl888prabhu requested a review from a team as a code owner May 1, 2026 19:16
@rgsl888prabhu rgsl888prabhu requested a review from Iroy30 May 1, 2026 19:16
@coderabbitai

coderabbitai Bot commented May 1, 2026

Important

Review skipped

Review was skipped as selected files did not have any reviewable changes.

💤 Files selected but had no reviewable changes (1)
  • skills/cuopt-lp-milp-api-python/evals/evals.json
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e49e61ab-54bb-45d6-b78d-54928df4b0da

📥 Commits

Reviewing files that changed from the base of the PR and between d28ae61 and 9933e43.

📒 Files selected for processing (1)
  • skills/cuopt-lp-milp-api-python/evals/evals.json

📝 Walkthrough

A new SOURCES.md documentation file is added to the cuopt-lp-milp-api-python skill's evaluation directory. The file records the provenance of evaluation prompts, specifying the source dataset repository, CSV file references, mapping of eval entries to original rows, and includes the MIT license governing the source dataset.

Changes

  • Evaluation Provenance Documentation (skills/cuopt-lp-milp-api-python/evals/SOURCES.md): New documentation file recording the source and provenance of evaluation prompts, including dataset repository references, row index mappings, and complete MIT license text.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Title check ✅ Passed: The PR title clearly and specifically describes the main change: adding a skill evaluation dataset for the cuopt-lp-milp-api-python skill, which aligns with the changeset.
  • Description check ✅ Passed: The PR description is directly related to the changeset, providing detailed context about the evaluation dataset including source attribution, entry structure, and scope clarifications.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/cuopt-lp-milp-api-python/evals/SOURCES.md`:
- Line 18: The fenced license block in SOURCES.md is missing a language tag
(MD040); update the opening fence used for the MIT license block from ``` to
```text so the block becomes a labeled text code fence (e.g., change the license
block's opening fence in the MIT License section to ```text).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 62cfb1c7-800f-4655-93ba-9e0ebb4348f9

📥 Commits

Reviewing files that changed from the base of the PR and between 746f46b and d28ae61.

📒 Files selected for processing (2)
  • skills/cuopt-lp-milp-api-python/evals/SOURCES.md
  • skills/cuopt-lp-milp-api-python/evals/evals.json


File context (skills/cuopt-lp-milp-api-python/evals/SOURCES.md, line 18): the line "The MIT license under which the source dataset is distributed:" is followed by an unlabeled opening fence.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced license block to satisfy markdownlint.

Line 18 opens a fenced block without a language, which triggers MD040.

Proposed fix:

````diff
-```
+```text
 MIT License
@@
 SOFTWARE
````

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 18-18: Fenced code blocks should have a language specified (MD040, fenced-code-language)


rgsl888prabhu and others added 2 commits May 1, 2026 15:01
The other six bullets (decision variables, constraints, objective
sense, cuOpt API usage, clarification, no solver substitution) were
identical across all 99 entries and largely implied by the
exact-precision objective-match check: an agent that gets the right
answer to the shown precision must have formulated the problem correctly.
Reducing duplication and file size without losing the load-bearing
signal.

Signed-off-by: Ramakrishna Prabhu <ramakrishnap@nvidia.com>
Member

@Iroy30 Iroy30 left a comment


LGTM. Just curious- Is it easier to wget each time or have it in the repo?

@rgsl888prabhu
Collaborator Author

> LGTM. Just curious- Is it easier to wget each time or have it in the repo?

If pulling from GitHub fails, it becomes a headache for CI and even local runs. Also, the evals need to be in JSON format, which is not the same as the format in the GitHub repo.
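The vendor-into-the-repo approach implies a one-time conversion from the upstream corpus into the evals.json schema. A hedged sketch of such a conversion is below; the upstream CSV column names ("question", "answer") are assumptions about the IndustryOR layout, and the fixed slug in the ID is a placeholder for the real problem_class-derived slug.

```python
# Hypothetical one-time conversion: adapt upstream CSV rows into the
# evals.json schema so the repo copy needs no network access at CI time.
# Column names "question" and "answer" are assumed, not verified upstream.
import csv
import json


def csv_to_evals(csv_path: str, json_path: str) -> None:
    entries = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            entries.append({
                # Real IDs would use a slug derived from the problem_class tag
                "id": f"lpmilp-{i + 1:03d}-example-slug",
                "prompt": row["question"],
                "ground_truth": row["answer"],
                "source": f"microsoft/OptiGuide IndustryOR row {i}",
            })
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
```

Running the conversion once and committing the output is what avoids both failure modes mentioned above: no network fetch in CI, and the committed file is already in the JSON shape the eval harness expects.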

@rgsl888prabhu
Collaborator Author

/merge

@rapids-bot rapids-bot Bot merged commit 733a459 into NVIDIA:main May 5, 2026
43 checks passed