
Replace flaky determinism test #1141

Open
aliceb-nv wants to merge 2 commits into NVIDIA:main from aliceb-nv:flaky-determinism-fix


@aliceb-nv
Contributor

@aliceb-nv aliceb-nv commented Apr 24, 2026

This PR fixes the flaky B&B determinism test by replacing seymour1.mps with pk1.mps, an instance that completes faster.
It also fixes run_mip.cpp so that it builds again after the recent CCCL MR changes.

Closes #1134

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@aliceb-nv aliceb-nv added this to the 26.06 milestone Apr 24, 2026
@aliceb-nv aliceb-nv requested review from a team as code owners April 24, 2026 10:07
@aliceb-nv aliceb-nv added the bug (Something isn't working) and non-breaking (Introduces a non-breaking change) labels Apr 24, 2026
@copy-pr-bot

copy-pr-bot Bot commented Apr 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@aliceb-nv
Contributor Author

/ok to test e59c9cd

Contributor

@nguidotti nguidotti left a comment


Looks good to me. Thanks for the fix, Alice.

@coderabbitai

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough


Two files are modified. The first refactors RMM async memory resource handling: the shared-pointer wrapper is removed, and call sites now pass the resource object directly instead of calling .get(). The second updates the test parameters, replacing one MPS test file with another.
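The shape of the first change can be sketched in isolation. This is only an illustration, not the code from run_mip.cpp: `fake_async_resource`, `fake_adaptor`, `make_async_shared`, and `make_async_value` are hypothetical stand-ins, and no RMM types are used.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for an async device memory resource (not an RMM type).
struct fake_async_resource {
  std::size_t allocations = 0;
  void* allocate(std::size_t /*bytes*/)
  {
    ++allocations;
    return nullptr;  // no real memory is handed out in this sketch
  }
};

// Hypothetical adaptor that keeps a non-owning pointer to its upstream.
struct fake_adaptor {
  explicit fake_adaptor(fake_async_resource& upstream) : upstream_(&upstream) {}
  void* allocate(std::size_t bytes) { return upstream_->allocate(bytes); }
  fake_async_resource* upstream_;
};

// "Before" shape: the factory returns shared ownership.
inline std::shared_ptr<fake_async_resource> make_async_shared()
{
  return std::make_shared<fake_async_resource>();
}

// "After" shape: the factory returns the object by value.
inline fake_async_resource make_async_value() { return fake_async_resource{}; }

inline std::size_t count_with_shared()
{
  auto mr = make_async_shared();
  fake_adaptor adaptor(*mr.get());  // call site has to unwrap the pointer
  adaptor.allocate(64);
  return mr->allocations;
}

inline std::size_t count_with_value()
{
  auto mr = make_async_value();
  fake_adaptor adaptor(mr);  // call site passes the object directly
  adaptor.allocate(64);
  return mr.allocations;
}
```

Both helpers record exactly one allocation, which matches the walkthrough's point that only the plumbing changes, not the control flow.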

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| RMM Memory Resource Refactoring: benchmarks/linear_programming/cuopt/run_mip.cpp | Removes owning_wrapper header and refactors async memory resource plumbing to eliminate std::shared_ptr usage. make_async() now returns object directly; all call sites updated to pass object instead of .get(). Adaptor construction and set_current_device_resource calls adjusted accordingly. Control flow unchanged. |
| Test Parameter Update: cpp/tests/mip/determinism_test.cu | Replaces test dataset seymour1.mps (16 threads, 120s timeout) with pk1.mps (16 threads, 60s timeout) in parameterized determinism test suite. All other test logic and configuration remain unchanged. |
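For readers unfamiliar with this kind of test, the idea behind a parameterized determinism check can be sketched as follows. This is an assumption-laden illustration, not the contents of determinism_test.cu: `determinism_case`, its field names, and `is_deterministic` are hypothetical, and the solver is represented by an arbitrary callable.

```cpp
#include <cassert>
#include <string>

// Hypothetical parameter record; the field names are assumptions.
struct determinism_case {
  std::string instance;  // e.g. "pk1.mps"
  int num_threads;       // e.g. 16
  double time_limit_s;   // e.g. 60.0
};

// Runs `solve` several times on the same case and reports whether every
// run produced the same result as the first one.
template <typename Solve>
bool is_deterministic(const determinism_case& c, Solve solve, int repeats = 3)
{
  const auto reference = solve(c);
  for (int i = 1; i < repeats; ++i) {
    if (solve(c) != reference) { return false; }
  }
  return true;
}
```

A shorter-running instance helps such a check because every repeat has to finish within the time limit. If a run is cut off by the timeout instead, the compared results can depend on wall-clock timing rather than on the solver's logic.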

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Replace flaky determinism test' directly addresses one of the two main changes (replacing a flaky test instance), but does not mention the build fix to run_mip.cpp, which is equally significant. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The pull request description accurately describes the two main changes: replacing a flaky test instance and fixing run_mip.cpp for recent CCCL changes. |



@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
benchmarks/linear_programming/cuopt/run_mip.cpp (1)

535-560: ⚠️ Potential issue | 🟠 Major

Move run_single_file() into the adaptor scope.

When --memory-limit or --track-allocations is enabled, limiting_adaptor / tracking_adaptor is destroyed at the end of its branch, but run_single_file() executes afterward. Since set_current_device_resource() stores a non-owning pointer, the solver will run against a dangling memory resource in those modes, causing undefined behavior.

Move the solve call into each branch to keep the adaptor alive throughout execution.
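The lifetime hazard described above can be reproduced without RMM. The sketch below uses hypothetical stand-ins (`resource`, `base_resource`, `limiting_adaptor`, `current_resource`, `set_current`, `run_workload`); it only illustrates the pattern of keeping the adaptor alive while the workload runs and makes no claim about the real RMM API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; none of these are RMM types or functions.
struct resource {
  virtual ~resource() = default;
  virtual std::size_t limit() const = 0;
};

struct base_resource : resource {
  std::size_t limit() const override { return 0; }  // 0 means "no limit"
};

struct limiting_adaptor : resource {
  limiting_adaptor(resource& upstream, std::size_t bytes) : upstream_(&upstream), bytes_(bytes) {}
  std::size_t limit() const override { return bytes_; }
  resource* upstream_;  // non-owning
  std::size_t bytes_;
};

// Global non-owning pointer, playing the role of "the current device resource".
inline resource*& current_resource()
{
  static resource* ptr = nullptr;
  return ptr;
}
inline void set_current(resource& r) { current_resource() = &r; }

// The workload dereferences whatever the global pointer refers to.
inline std::size_t run_workload() { return current_resource()->limit(); }

inline std::size_t run_with_optional_limit(std::size_t limit_bytes)
{
  base_resource base;
  std::size_t seen = 0;
  if (limit_bytes > 0) {
    limiting_adaptor adaptor(base, limit_bytes);
    set_current(adaptor);
    seen = run_workload();  // adaptor is still alive here
  } else {
    set_current(base);
    seen = run_workload();
  }
  // Calling run_workload() at this point after the first branch would
  // dereference a pointer to a destroyed adaptor (undefined behavior).
  current_resource() = nullptr;  // do not leave a pointer to a dead local
  return seen;
}
```

An alternative with the same effect would be to declare the adaptors in the enclosing scope (for example as optionals) so that a single workload call can stay after the branches.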

Suggested fix
   auto memory_resource = make_async();
+  auto run = [&] {
+    run_single_file(path,
+                    0,
+                    0,
+                    n_gpus,
+                    out_dir,
+                    initial_solution_file,
+                    heuristics_only,
+                    num_cpu_threads,
+                    write_log_file,
+                    log_to_console,
+                    reliability_branching,
+                    time_limit,
+                    work_limit,
+                    deterministic);
+  };
   if (memory_limit > 0) {
     auto limiting_adaptor =
       rmm::mr::limiting_resource_adaptor(memory_resource, memory_limit * 1024ULL * 1024ULL);
     rmm::mr::set_current_device_resource(limiting_adaptor);
+    run();
   } else if (track_allocations) {
     rmm::mr::tracking_resource_adaptor tracking_adaptor(memory_resource,
                                                         /*capture_stacks=*/true);
     rmm::mr::set_current_device_resource(tracking_adaptor);
+    run();
   } else {
     rmm::mr::set_current_device_resource(memory_resource);
+    run();
   }
-  run_single_file(path,
-                  0,
-                  0,
-                  n_gpus,
-                  out_dir,
-                  initial_solution_file,
-                  heuristics_only,
-                  num_cpu_threads,
-                  write_log_file,
-                  log_to_console,
-                  reliability_branching,
-                  time_limit,
-                  work_limit,
-                  deterministic);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@benchmarks/linear_programming/cuopt/run_mip.cpp` around lines 535 - 560, The
call to run_single_file() uses the current device resource after
limiting_adaptor or tracking_adaptor have been destroyed; keep the adaptor alive
by moving the run_single_file(...) invocation into each branch that creates an
adaptor (the branch that builds limiting_adaptor and the branch that builds
tracking_adaptor) and also call it in the else branch where you
set_current_device_resource(memory_resource), so that the lifetime of
limiting_adaptor/tracking_adaptor encloses the call; reference memory_resource,
limiting_adaptor, tracking_adaptor, rmm::mr::set_current_device_resource, and
run_single_file when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b330eefd-5406-4df4-8df0-0b7c58596057

📥 Commits

Reviewing files that changed from the base of the PR and between d221982 and e59c9cd.

📒 Files selected for processing (2)
  • benchmarks/linear_programming/cuopt/run_mip.cpp
  • cpp/tests/mip/determinism_test.cu



Development

Successfully merging this pull request may close these issues.

[BUG] Flaky Determinstic B&B test

3 participants