[WIP] Unified threading model in MIP solver #1099

Draft
nguidotti wants to merge 19 commits into NVIDIA:main from nguidotti:omp-migration

Contributor

@nguidotti nguidotti commented Apr 14, 2026

In this PR, we migrate almost all parts of the MIP solver from std::thread to OpenMP (in particular, to the OpenMP tasking model). The only exceptions are the Papilo presolver, which uses Intel TBB, and the LP solver.

More specifically, this PR

  • Solves the CPU oversubscription problem. The solver now respects the number of threads set by the user, with the exception of Papilo and threads created by the CUDA runtime.
  • Removes the overhead of creating and destroying std::thread objects.
  • Migrates RINS from std::thread to omp task. As in the previous logic, only one instance of RINS can run at a time.
  • Migrates CPU FJ from std::thread to omp task. There are a few limitations:
    • scratch_cpu_fj_on_lp_opt and scratch_cpu_fj run for the entire program. This essentially allocates two dedicated threads to these functions, while the other routines need to share the remaining CPU resources. This may hurt performance on CPUs with a low core count.
    • Since there is a small delay between task creation and task start (the threads may be busy), the GPU FJ may finish before the CPU FJ even starts when racing.
  • Migrates early FJ to omp task.
  • Creates only a single parallel region, at the beginning of the solver, so that it can be shared across the MIP solver.
  • Eliminates cpu_worker_thread and other redundant code.
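
A minimal sketch of the "single parallel region" idea from the list above, not the PR's actual code. The names `run_solver_sketch`, `heuristic_sketch` and `branch_and_bound_sketch` are hypothetical placeholders. One thread generates tasks inside `omp single`, and the team created by the one `omp parallel` executes them, so no component needs to spawn threads of its own.

```cpp
#include <atomic>

// Hypothetical stand-ins for solver components; the names are illustrative only.
inline void heuristic_sketch(std::atomic<int>& work_done) { work_done.fetch_add(1); }
inline void branch_and_bound_sketch(std::atomic<int>& work_done) { work_done.fetch_add(1); }

// Runs both components as tasks on one shared OpenMP team and returns how many finished.
// Without OpenMP enabled the pragmas are ignored and the calls simply run in order.
inline int run_solver_sketch(int num_threads)
{
  std::atomic<int> work_done{0};

#pragma omp parallel num_threads(num_threads) shared(work_done)
  {
#pragma omp single
    {
#pragma omp taskgroup
      {
#pragma omp task shared(work_done)
        heuristic_sketch(work_done);

#pragma omp task shared(work_done)
        branch_and_bound_sketch(work_done);
      }  // the generating thread waits here until both tasks have finished
    }
  }
  return work_done.load();
}
```

Calling `run_solver_sketch(4)` returns 2 whether the tasks run on four threads or serially, since the thread count only bounds how many tasks can make progress at once.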

MIPLIB2017:
GH200, 10min time limit, cbs-cta excluded (see #978)

================================================================================
 main-190326-2 (1) vs unified-parallel-model (2)
================================================================================

------------------------------------------------------------------------------------------------------------------------------
|                                        |       Run 1        |       Run 2        |     Abs. Diff.     |   Rel. Diff. (%)   |
------------------------------------------------------------------------------------------------------------------------------
| Feasible                                                 226                  227                   +1                 --- |
| Optimal                                                   70                   74                   +4                 --- |
| Solutions with <0.1% primal gap                          121                  124                   +3                 --- |
| Nodes explored (mean)                           4283972.9121         4684918.2469         +400945.3347              +8.558 |
| Nodes explored (shifted geomean)                   6202.3471            7545.9821           +1343.6350             +17.806 |
| Relative MIP gap (mean)                               0.3382               0.3279              -0.0103              -3.037 |
| Relative MIP gap (shifted geomean)                    0.1193               0.1146              -0.0047              -3.919 |
| Solve time (mean)                                   450.2347             449.6831              -0.5517              -0.123 |
| Solve time (shifted geomean)                        221.4772             239.4322             +17.9549              +7.499 |
| Primal gap (mean)                                    11.4459              11.3227              -0.1232              -1.076 |
| Primal gap (shifted geomean)                          0.6591               0.6122              -0.0469              -7.109 |
| Primal integral (mean)                               49.9109              54.1269              +4.2160              +7.789 |
| Primal integral (shifted geomean)                    11.5672              13.7114              +2.1442             +15.638 |
------------------------------------------------------------------------------------------------------------------------------

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
… shares the same thread pool.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti nguidotti changed the title from "[WIP] Unified threading model in MIP solver#820" to "[WIP] Unified threading model in MIP solver" Apr 14, 2026
@copy-pr-bot

copy-pr-bot bot commented Apr 14, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

@CodeRabbit review

@coderabbitai

coderabbitai bot commented Apr 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Refactors thread-based concurrency to OpenMP task/taskgroup-based execution across branch-and-bound, heuristics (RINS, FJ, local search), presolve, and top-level solver entry points; removes CRTP CPU worker thread base and replaces thread lifecycles with task dependencies and omp_atomic_t state.

Changes

Cohort / File(s) Summary
Branch-and-bound core
cpp/src/branch_and_bound/branch_and_bound.cpp, cpp/src/branch_and_bound/pseudo_costs.cpp
Replaced async/future and nested parallel/single patterns with OpenMP tasks/taskloop and taskwait; tightened OpenMP clauses (default(none), firstprivate/shared); added NVTX scopes; synchronized dual-simplex/root-relaxation via task dependencies.
RINS thread replacement
cpp/src/mip_heuristics/diversity/lns/rins.cu, cpp/src/mip_heuristics/diversity/lns/rins.cuh
Removed rins_thread_t and stop_rins(); switched to task-based launches guarded by an atomic launch_new_task and scope_guard re-arming; replaced mutex/atomic members with omp_atomic_t and launched CPUFJ via OpenMP tasks.
Feasibility-jump (FJ) CPU refactor
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu, cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh, cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cu, cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cuh, cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
Removed cpu_fj_thread_t wrapper and related thread lifecycle; introduced cpufj_solve(fj_cpu_climber_t*, time_limit) and call sites using OpenMP tasks with depend clauses; adjusted APIs and member types from thread wrapper to fj_cpu_climber_t; removed cpu_solve declaration.
Local search CPUFJ integration
cpp/src/mip_heuristics/local_search/local_search.cu, cpp/src/mip_heuristics/local_search/local_search.cuh
Replaced per-thread worker threads with vectors of fj_cpu_climber_t and OpenMP task launches (cpufj_solve); removed file-scope static state; added omp_atomic_t shared best-objective and local population pointer; changed start/stop to task-based depend/wait.
Presolve taskification
cpp/src/mip_heuristics/presolve/bounds_presolve.cuh, cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu, cpp/src/mip_heuristics/presolve/probing_cache.cu
Switched concurrency config from num_threads to num_tasks; replaced parallel for with #pragma omp taskloop and per-task buffers indexed by task id; moved step-consolidation outside single regions and adjusted loop index types.
Diversity & top-level solver orchestration
cpp/src/mip_heuristics/diversity/diversity_manager.cu, cpp/src/mip_heuristics/solve.cu, cpp/src/mip_heuristics/solver.cu
Removed explicit stop_rins() calls; added OpenMP wrapper and master region for top-level solve_mip; renamed internal helper/driver functions; replaced std::async run_bb with taskgroup and in-task B&B/heuristic launches; simplified B&B thread-count derivation.
CPU worker base removal
cpp/src/mip_heuristics/utilities/cpu_worker_thread.cuh
Deleted the entire cpu_worker_thread_base_t<Derived> CRTP thread infrastructure, including lifecycle methods, synchronization primitives, and related public APIs.
GPU/early FJ task conversion
cpp/src/mip_heuristics/feasibility_jump/early_gpufj.cu, cpp/src/mip_heuristics/feasibility_jump/early_gpufj.cuh
Removed dedicated GPU worker thread and run_worker(); start/stop now use OpenMP tasks and taskwait depend on fj_ptr_.
Diversity manager minor control changes
cpp/src/mip_heuristics/diversity/diversity_manager.cu
Removed several unconditional rins.stop_rins() calls and gated rins.enable() on thread count (omp_get_num_threads() > 4).
Probing & caching worker indexing
cpp/src/mip_heuristics/presolve/probing_cache.cu
Resized per-worker pools to num_tasks, replaced thread-num indexing with task-id indexing and taskloop partitioning per step.
Misc. header/member updates
cpp/src/mip_heuristics/local_search/local_search.cuh, cpp/src/mip_heuristics/presolve/bounds_presolve.cuh, cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
Adjusted private member types (thread wrappers → climber pointers), added omp_atomic_t members, and replaced num_threads field with num_tasks in settings struct; removed thread-related includes.
Bench & Python metadata
benchmarks/linear_programming/cuopt/run_mip.cpp, python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
Increased benchmark default min CPU threads from 1→2; updated SolverConfig.num_cpu_threads description string (no API change).
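
The presolve rows above mention per-task buffers indexed by a task id rather than by thread number. A rough sketch of that pattern follows, assuming each task owns one contiguous chunk of a window `[step_start, step_end)`. The helper `sum_window_by_task` and its parameters are hypothetical and only illustrate the indexing; they are not taken from the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Sums values[step_start, step_end) using num_tasks chunks, one private buffer slot per task.
// Hypothetical helper: it only illustrates chunking with absolute indices.
inline long sum_window_by_task(const std::vector<int>& values,
                               std::size_t step_start,
                               std::size_t step_end,
                               std::size_t num_tasks)
{
  if (num_tasks == 0 || step_start >= step_end || step_end > values.size()) { return 0; }

  std::vector<long> per_task(num_tasks, 0);  // slot t is written only by task t
  const std::size_t window = step_end - step_start;
  const std::size_t chunk  = (window + num_tasks - 1) / num_tasks;

  // One loop iteration per task id; grainsize(1) makes each iteration its own task.
#pragma omp taskloop grainsize(1) shared(values, per_task) \
  firstprivate(step_start, step_end, chunk, num_tasks)
  for (std::size_t task_id = 0; task_id < num_tasks; ++task_id) {
    // begin/end are absolute positions, so the window offset is never dropped.
    const std::size_t begin = std::min(step_start + task_id * chunk, step_end);
    const std::size_t end   = std::min(begin + chunk, step_end);
    for (std::size_t i = begin; i < end; ++i) {
      per_task[task_id] += values[i];
    }
  }  // a taskloop without nogroup waits for all of its tasks here

  return std::accumulate(per_task.begin(), per_task.end(), 0L);
}
```

Because a buffer slot belongs to a task rather than to whichever thread happens to run it, the result does not depend on how the runtime schedules the tasks.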

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title '[WIP] Unified threading model in MIP solver' accurately describes the main objective: migrating the MIP solver from std::thread to OpenMP tasking across most components.
Description check ✅ Passed The PR description is comprehensive and directly related to the changeset, explaining the threading migration from std::thread to OpenMP, addressing CPU oversubscription, and documenting trade-offs and performance implications.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/mip_heuristics/diversity/lns/rins.cu (1)

102-105: ⚠️ Potential issue | 🟠 Major

Reset launch_new_task with a guard, not manual scattered returns.

The early return at Line 104 leaves launch_new_task == false, so one transient !dm.population.is_feasible() result can permanently disable all future RINS launches.

💡 Suggested fix
 void rins_t<i_t, f_t>::run_rins()
 {
   raft::common::nvtx::range fun_scope("Running RINS");
+
+  struct launch_flag_guard_t {
+    rins_t* self;
+    ~launch_flag_guard_t() { self->launch_new_task = true; }
+  } launch_flag_guard{this};
 
   RAFT_CUDA_TRY(cudaSetDevice(context.handle_ptr->get_device()));
@@
   {
     std::lock_guard<std::recursive_mutex> lock(dm.population.write_mutex);
     if (!dm.population.is_feasible()) return;
@@
-  if (!best_sol.get_feasible()) {
-    launch_new_task = true;
-    return;
-  }
+  if (!best_sol.get_feasible()) { return; }
@@
   if (fractional_ratio < settings.min_fractional_ratio) {
     CUOPT_LOG_TRACE("RINS fractional ratio too low, aborting");
-    launch_new_task = true;
     return;
   }
@@
   if (n_to_fix == 0) {
     CUOPT_LOG_DEBUG("RINS no variables to fix");
-    launch_new_task = true;
     return;
   }
@@
-  launch_new_task = true;
 }

Also applies to: 118-121, 143-146, 168-171, 343-343
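
The suggested fix relies on a destructor running on every return path. A self-contained sketch of that idea follows, assuming a plain `std::atomic<bool>` in place of the PR's `omp_atomic_t`. The names `flag_rearm_guard_t`, `run_once` and `try_launch` are hypothetical.

```cpp
#include <atomic>

// Re-arms a launch flag when the enclosing scope exits, whichever return path is taken.
// Hypothetical type, written out here only to keep the sketch self-contained.
class flag_rearm_guard_t {
 public:
  explicit flag_rearm_guard_t(std::atomic<bool>& flag) : flag_(flag) {}
  ~flag_rearm_guard_t() { flag_.store(true); }
  flag_rearm_guard_t(const flag_rearm_guard_t&)            = delete;
  flag_rearm_guard_t& operator=(const flag_rearm_guard_t&) = delete;

 private:
  std::atomic<bool>& flag_;
};

// Stand-in for a heuristic run with an early exit. Returns true if the full run happened.
inline bool run_once(std::atomic<bool>& launch_new_task, bool population_is_feasible)
{
  flag_rearm_guard_t guard(launch_new_task);
  if (!population_is_feasible) { return false; }  // early exit still re-arms the flag
  // ... the actual heuristic work would go here ...
  return true;
}

// Launcher side: claim the flag, then run. exchange(false) lets only one caller through.
inline bool try_launch(std::atomic<bool>& launch_new_task, bool population_is_feasible)
{
  if (!launch_new_task.exchange(false)) { return false; }  // another run is in flight
  return run_once(launch_new_task, population_is_feasible);
}
```

With the guard in place, a transient infeasible population skips one run but leaves the flag armed for the next attempt.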

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/lns/rins.cu` around lines 102 - 105, The
code currently does an early return when dm.population.is_feasible() is false,
which leaves the launch_new_task flag permanently false; add a RAII-style guard
(or a scope-exit helper) at the top of run_rins() that sets launch_new_task back
to true on scope exit, so that every return path re-arms the flag, including the
one that returns while holding std::lock_guard<std::recursive_mutex>
lock(dm.population.write_mutex); with the guard in place, drop the manual
launch_new_task = true assignments before the other early returns and at the end
of the function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 1054-1139: The OpenMP task/taskloop regions in strong_branching()
(invoking batch_pdlp_strong_branching_task and strong_branch_helper) assume an
active parallel team; wrap the task-emitting block so it always executes inside
a parallel single region when not already in a parallel region: check
omp_in_parallel() and, if false, create a local `#pragma` omp parallel { `#pragma`
omp single nowait { ... } } around the task/taskloop code (preserving
firstprivate/shared clauses and concurrent_halt, pc, sb_view, etc.); otherwise
leave the existing task emission unchanged. Ensure the same guard or local
parallel/single scaffold is applied to reliable_variable_selection()'s analogous
task regions so tasks don’t silently run serially.

In `@cpp/src/mip_heuristics/diversity/lns/rins.cu`:
- Around line 84-87: The cudaSetDevice call is incorrectly guarded by if
(total_calls == 0) so threads other than the first may not set the correct
device; remove that guard and call
RAFT_CUDA_TRY(cudaSetDevice(context.handle_ptr->get_device())) unconditionally
at the start of the RINS execution (near fun_scope) so every invocation/thread
sets the proper CUDA device (refer to total_calls and the
RAFT_CUDA_TRY(cudaSetDevice(...)) call).

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 1423-1436: The code builds time_limit by casting in_time_limit to
int then to milliseconds before checking for infinity, which can overflow when
in_time_limit == infinity(); modify cpufj_solve() to test
std::isfinite(in_time_limit) (or in_time_limit < infinity()) first and only
construct time_limit when finite (e.g., assign
std::optional<std::chrono::milliseconds> time_limit or a boolean
has_time_limit), then in the loop replace the current comparison with a guarded
check using has_time_limit and time_limit against now - loop_time_start; update
references to time_limit, loop_time_start and the in_time_limit conversion
accordingly to avoid the int cast on infinity.

In `@cpp/src/mip_heuristics/local_search/local_search.cu`:
- Around line 247-252: The OpenMP taskloop uses num_tasks(ls_cpu_fj.size())
which is invalid when ls_cpu_fj.size()==0 (e.g., num_cpufj_threads == 0); guard
the pragma by checking that ls_cpu_fj is non-empty (or num_cpufj_threads > 0)
before emitting the `#pragma` omp taskgroup / `#pragma` omp taskloop block so the
num_tasks clause is only evaluated for a positive task count; locate the block
around ls_cpu_fj and cpufj_solve and wrap the pragmas and for-loop in an if
(ls_cpu_fj.empty() == false) (or equivalent) conditional to skip the pragmas
when there are zero CPUFJ threads.

In `@cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu`:
- Around line 249-251: The OpenMP taskloop with default(none) uses
problem.n_constraints in the loop bound but never declares it in the
data-sharing clause; hoist the bound into a local int (e.g., int n_constraints =
problem.n_constraints;) and change the for-loop to use n_constraints, then add
that local to the pragma as firstprivate(n_constraints) so the loop bound is
explicitly available to each task (update the pragma on the taskloop that
currently names cnstr_pair and shared(offsets, variables, reverse_offsets,
reverse_constraints, constraint_pairs_h)).

In `@cpp/src/mip_heuristics/presolve/probing_cache.cu`:
- Around line 913-925: The loop computes chunk offsets with begin/end relative
to the window [step_start, step_end) but then indexes priority_indices using i
directly and logs using an undefined id; fix by adding the step_start offset
when computing the index (use priority_indices[step_start + i] or adjust
begin/end to be absolute) and replace the undefined log identifier in the
CUOPT_LOG_TRACE call with the correct thread/task identifier (use task_id or
another existing variable such as multi_probe_presolve_pool index) so the code
compiles and processes the correct slice without reprocessing the prefix.

In `@cpp/src/mip_heuristics/solve.cu`:
- Around line 577-586: The code currently forces any explicit
settings_const.num_cpu_threads value below 4 up to 4 and logs an error; instead
preserve the user's explicit cap: when settings_const.num_cpu_threads >= 0 set
num_threads = settings_const.num_cpu_threads (or at minimum 1 if you want to
guard against zero/negative inputs), but do not silently raise 1..3 to 4; if you
still want to inform users about potentially low thread counts emit a warning
via CUOPT_LOG_ERROR or CUOPT_LOG_WARN referencing settings_const.num_cpu_threads
and num_threads, and leave the omp_get_max_threads() branch unchanged so
omp_get_max_threads() continues to control threads when the setting is negative.
- Around line 588-590: The call to omp_set_max_active_levels(2) in solve_mip()
changes global OpenMP state and must be restored before returning; save the
previous value (e.g., int prev = omp_get_max_active_levels()) immediately before
calling omp_set_max_active_levels(2) and restore it after the nested parallel
region or at function exit (ensure restoration on all return paths and
exceptions), or encapsulate this logic in a small RAII-style helper so
solve_mip() leaves OpenMP nesting levels unchanged for callers.

In `@cpp/src/mip_heuristics/solver.cu`:
- Around line 452-459: The omp task calling branch_and_bound->solve(...) can
throw and will cause std::terminate if the exception escapes the task; capture
exceptions inside the task into a std::exception_ptr (e.g., named bb_exception)
instead of letting them propagate, set branch_and_bound_status and
branch_and_bound_solution as before, then after the `#pragma` omp taskgroup
completes check if bb_exception is non-null and rethrow it
(std::rethrow_exception) so outer solve_mip() handlers can catch it; reference
the omp task that assigns branch_and_bound_status and uses
branch_and_bound_solution and branch_and_bound->solve to add this
capture-and-rethrow behavior.

---

Outside diff comments:
In `@cpp/src/mip_heuristics/diversity/lns/rins.cu`:
- Around line 102-105: The code currently does an early return when
dm.population.is_feasible() is false, which leaves the launch_new_task flag
permanently false; add a RAII-style guard (or a scope-exit helper) at the top of
run_rins() that sets launch_new_task back to true on scope exit, so that every
return path re-arms the flag, including the one that returns while holding
std::lock_guard<std::recursive_mutex> lock(dm.population.write_mutex); with the
guard in place, drop the manual launch_new_task = true assignments before the
other early returns and at the end of the function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: fca14163-5249-40cb-87eb-077397db18d0

📥 Commits

Reviewing files that changed from the base of the PR and between b497a5b and 1bb5d10.

📒 Files selected for processing (18)
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/diversity/lns/rins.cu
  • cpp/src/mip_heuristics/diversity/lns/rins.cuh
  • cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cu
  • cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cuh
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cuh
  • cpp/src/mip_heuristics/local_search/local_search.cu
  • cpp/src/mip_heuristics/local_search/local_search.cuh
  • cpp/src/mip_heuristics/presolve/bounds_presolve.cuh
  • cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu
  • cpp/src/mip_heuristics/presolve/probing_cache.cu
  • cpp/src/mip_heuristics/solve.cu
  • cpp/src/mip_heuristics/solver.cu
  • cpp/src/mip_heuristics/utilities/cpu_worker_thread.cuh
💤 Files with no reviewable changes (3)
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh
  • cpp/src/mip_heuristics/utilities/cpu_worker_thread.cuh

Comment thread cpp/src/branch_and_bound/pseudo_costs.cpp
Comment thread cpp/src/mip_heuristics/diversity/lns/rins.cu Outdated
Comment thread cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu Outdated
Comment thread cpp/src/mip_heuristics/local_search/local_search.cu
Comment thread cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu Outdated
Comment thread cpp/src/mip_heuristics/presolve/probing_cache.cu Outdated
Comment thread cpp/src/mip_heuristics/solve.cu
Comment thread cpp/src/mip_heuristics/solve.cu
Comment thread cpp/src/mip_heuristics/solver.cu
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Contributor

@akifcorduk akifcorduk left a comment

I think we should run a baseline that includes the probing cache changes, since they fix performance bugs. That way we can measure the performance impact of the threading changes on their own. Otherwise, we are testing multiple things together.

p.s. I didn't review cp_fj changes thoroughly.

Comment thread cpp/src/branch_and_bound/branch_and_bound.cpp Outdated
Comment thread cpp/src/branch_and_bound/pseudo_costs.cpp Outdated
Comment thread cpp/src/mip_heuristics/presolve/probing_cache.cu Outdated
Comment thread cpp/src/mip_heuristics/presolve/probing_cache.cu Outdated
Comment thread cpp/src/mip_heuristics/solve.cu Outdated
num_threads = omp_get_max_threads();
} else {
if (settings_const.num_cpu_threads < 4) {
CUOPT_LOG_ERROR(
Contributor

Instead of ERROR, this should be a WARNING.

} catch (...) {
// We cannot throw inside an OpenMP parallel region. So we need to catch and then
// re-throw later.
exception = std::current_exception();
Contributor

What if there are multiple exceptions in parallel?

Contributor Author

@nguidotti nguidotti Apr 15, 2026

Considering that exceptions cannot cross thread boundaries, this will only catch the exceptions thrown by the master thread.

Contributor Author

However, if we want to catch the exception from all threads, then we need a mechanism for collecting and re-throwing in the master thread.
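
One possible shape for such a mechanism is sketched below. It assumes a hypothetical `exception_collector_t` guarded by a `std::mutex` and a toy task body that throws on odd task ids. It is an illustration of the collect-then-rethrow idea, not what the PR implements.

```cpp
#include <cstddef>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <vector>

// Collects exceptions thrown inside tasks so they can be rethrown after the tasks are joined.
// Hypothetical helper; a std::mutex is used here for brevity.
class exception_collector_t {
 public:
  void capture(std::exception_ptr e)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    errors_.push_back(e);
  }

  std::size_t size() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return errors_.size();
  }

  // Rethrows the first captured exception, if any; the others stay in the collector.
  void rethrow_first() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!errors_.empty()) { std::rethrow_exception(errors_.front()); }
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::exception_ptr> errors_;
};

// Runs num_tasks tasks; every odd-numbered one throws. Returns the number of captured errors.
inline std::size_t run_tasks_collecting_errors(int num_tasks, exception_collector_t& collector)
{
#pragma omp parallel shared(collector) firstprivate(num_tasks)
#pragma omp single
  {
#pragma omp taskgroup
    {
      for (int t = 0; t < num_tasks; ++t) {
#pragma omp task shared(collector) firstprivate(t)
        {
          try {
            if (t % 2 == 1) { throw std::runtime_error("task failed"); }
          } catch (...) {
            // An exception must not leave the task region, so store it instead.
            collector.capture(std::current_exception());
          }
        }
      }
    }  // all tasks have finished once the taskgroup ends
  }
  return collector.size();
}
```

After the parallel region, the caller can invoke `rethrow_first()` from ordinary serial code, where the existing `try`/`catch` handlers apply. How to report the remaining exceptions is a separate design decision.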

template <typename i_t, typename f_t>
solution_t<i_t, f_t> mip_solver_t<i_t, f_t>::run_solver()
{
solution_t<i_t, f_t> sol(*context.problem_ptr);
Contributor

I think the problem might be copied or modified in the following calls. I would double-check that the solution constructors removed below were indeed using the unmodified problem, i.e. the same one as the original.

…e for root relaxation.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test b4efd67

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test 80b14ab

@nguidotti nguidotti self-assigned this Apr 15, 2026
@nguidotti nguidotti added the non-breaking, do not merge, improvement, and mip labels Apr 15, 2026
@nguidotti nguidotti added this to the 26.06 milestone Apr 15, 2026
@nguidotti nguidotti linked an issue Apr 15, 2026 that may be closed by this pull request
Contributor

@aliceb-nv aliceb-nv left a comment

Thanks a lot for that work Nicolas :)
Let me know if you need help with porting the B&B determinism code.

Comment thread cpp/src/mip_heuristics/diversity/lns/rins.cu Outdated
Comment thread cpp/src/mip_heuristics/local_search/local_search.cu Outdated
for (auto& cpu_fj_ptr : ls_cpu_fj) {
cpu_fj_ptr->start_cpu_solver();
}
#pragma omp taskgroup
Contributor

Just to make sure I understand - there's an implicit taskwait at the end of this taskgroup right? Is that how the CPUFJ tasks are joined after the halt preemption is requested?

Contributor Author

Exactly, at the end of the taskgroup there is an implicit barrier that waits for all tasks created inside the group to finish.

Contributor Author

Added comments to each implicit barrier
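
The join-on-halt pattern discussed in this thread can be sketched as follows, under stated assumptions. The names `climber_sketch` and `run_climbers` are hypothetical, and each worker has an iteration budget so that the sketch terminates even when tasks run serially. The real CPUFJ tasks are described as running until a halt is requested.

```cpp
#include <atomic>

// Each worker iterates until a halt is requested or its iteration budget runs out.
// The budget keeps the sketch terminating even when tasks run undeferred or serially.
inline void climber_sketch(const std::atomic<bool>& halt,
                           int max_iters,
                           std::atomic<int>& finished)
{
  for (int it = 0; it < max_iters && !halt.load(); ++it) {
    // ... one improvement step would go here ...
  }
  finished.fetch_add(1);
}

// Launches num_climbers tasks, requests a halt, and returns how many tasks completed.
inline int run_climbers(int num_climbers, int max_iters)
{
  std::atomic<bool> halt{false};
  std::atomic<int> finished{0};

#pragma omp parallel shared(halt, finished) firstprivate(num_climbers, max_iters)
#pragma omp single
  {
#pragma omp taskgroup
    {
      for (int c = 0; c < num_climbers; ++c) {
#pragma omp task shared(halt, finished) firstprivate(max_iters)
        climber_sketch(halt, max_iters, finished);
      }
      halt.store(true);  // request preemption; tasks observe it on their next check
    }  // end of taskgroup: waits for every task created inside it
  }
  return finished.load();
}
```

The wait at the end of the taskgroup applies to the thread that generated the tasks. The other threads in the team remain free to execute tasks in the meantime.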

Comment thread cpp/src/mip_heuristics/solve.cu Outdated
Comment on lines +584 to +588
CUOPT_LOG_ERROR("The MIP solver requires at least 4 CPU threads!");
return mip_solution_t<i_t, f_t>{
cuopt::logic_error("The number of CPU threads is below than expected.",
cuopt::error_type_t::RuntimeError),
op_problem.get_handle_ptr()->get_stream()};
Contributor

Is that true? Don't we support running on a single thread?

Contributor Author

With a low thread count, some vital tasks may not execute. If you have just one thread, how are we supposed to execute the heuristics and the B&B at the same time?

Contributor Author

We could support a low thread count if one task or the other could be interrupted or made to yield. But I think this is too much effort for a rare case (it is pretty common to have CPUs with more than 8 threads nowadays).
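
The thread-count check debated in this thread can be expressed as a small pure function. This is a sketch only: `resolve_thread_count` and its parameters are hypothetical, and the minimum is passed in as an argument rather than hard-coded.

```cpp
#include <optional>

// Picks the thread count for the solver's parallel region.
//   requested    < 0 means "use whatever the runtime offers" (available).
//   min_required is the smallest team assumed able to run B&B and heuristics concurrently.
// Returns std::nullopt when the resulting count is too small, so the caller can report it.
inline std::optional<int> resolve_thread_count(int requested, int available, int min_required)
{
  const int num_threads = (requested < 0) ? available : requested;
  if (num_threads < min_required) { return std::nullopt; }
  return num_threads;
}
```

Returning an empty optional leaves the policy to the caller, which can either emit a warning and continue or return an error solution.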

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
…nc in CPU FJ. use scope_guard in RINS.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test d7d3dad

…A 12.9

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti
Contributor Author

/ok to test 6ac7882

@nguidotti
Contributor Author

/ok to test e152241

…ed the thread requirement to 2.

Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
Signed-off-by: Nicolas L. Guidotti <nguidotti@nvidia.com>
@nguidotti nguidotti marked this pull request as ready for review April 17, 2026 10:03
@nguidotti nguidotti requested review from a team as code owners April 17, 2026 10:03
@nguidotti nguidotti requested a review from tmckayus April 17, 2026 10:03
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

♻️ Duplicate comments (2)
cpp/src/mip_heuristics/presolve/probing_cache.cu (1)

914-914: ⚠️ Potential issue | 🟡 Minor

Format specifier mismatch and stale wording in trace log.

task_id is size_t but is printed with %d, which is undefined behavior under varargs (it will typically "work" on LP64 but breaks on 32-bit/Windows and under fortify/sanitizer builds). The label also still reads "on thread" after the migration away from threads.

Proposed fix
-        CUOPT_LOG_TRACE("Computing probing cache for var %d on thread %d", var_idx, task_id);
+        CUOPT_LOG_TRACE("Computing probing cache for var %d on task %zu", var_idx, task_id);
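
A small standalone illustration of the specifier fix, using `std::snprintf` in place of the project's logging macro. The helper `format_probe_message` is hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats a trace message with an int variable index and a size_t task id.
// %zu is the printf conversion for size_t; %d would mismatch the argument type.
inline std::string format_probe_message(int var_idx, std::size_t task_id)
{
  char buffer[96];
  std::snprintf(buffer,
                sizeof(buffer),
                "Computing probing cache for var %d on task %zu",
                var_idx,
                task_id);
  return std::string(buffer);
}
```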
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/presolve/probing_cache.cu` at line 914, The
CUOPT_LOG_TRACE call logging "Computing probing cache for var %d on thread %d"
uses the wrong format for task_id (task_id is size_t) and has stale wording;
update the CUOPT_LOG_TRACE invocation in probing_cache.cu (the call that
references var_idx and task_id) to use the correct size_t specifier (e.g., %zu)
or cast task_id to a fixed-width/unsigned type and use %lu/%" PRIuPTR as your
project prefers, and change the message text from "on thread" to something like
"for task" so it reads "Computing probing cache for var %d for task %zu" (or the
equivalent with a cast and %lu).
cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu (1)

1422-1436: ⚠️ Potential issue | 🔴 Critical

Guard the infinite time-limit case before building std::chrono::milliseconds.

cpufj_solve() still converts in_time_limit to an integer millisecond count before checking whether the input is finite. When the caller uses the default unlimited run, that path can overflow before the later infinity() guard is reached.
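
A sketch of the guard described above, assuming the limit arrives as a `double` in seconds with infinity meaning "no limit". The helpers `to_time_limit` and `time_limit_reached` are hypothetical, and the cap on very large finite values is an added assumption to keep the cast in range.

```cpp
#include <chrono>
#include <cmath>
#include <limits>
#include <optional>

// Converts a time limit in seconds to milliseconds, or returns std::nullopt for "no limit".
// Checking finiteness first avoids casting infinity (or NaN) to an integer type.
inline std::optional<std::chrono::milliseconds> to_time_limit(double seconds)
{
  // (assumption) finite values above this cap are also treated as "no limit".
  constexpr double max_seconds = 1.0e9;
  if (!std::isfinite(seconds) || seconds > max_seconds) { return std::nullopt; }
  const auto ms = static_cast<std::chrono::milliseconds::rep>(std::floor(seconds * 1000.0));
  return std::chrono::milliseconds(ms);
}

// True only when a limit exists and the elapsed time has exceeded it.
inline bool time_limit_reached(const std::optional<std::chrono::milliseconds>& limit,
                               std::chrono::steady_clock::time_point start,
                               std::chrono::steady_clock::time_point now)
{
  return limit.has_value() && (now - start) > *limit;
}
```

With an empty optional, the loop's time check reduces to a single `has_value()` test, so the unlimited case never touches the millisecond conversion.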

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu` around lines 1422 - 1436,
The code converts in_time_limit to std::chrono::milliseconds before checking for
infinity, which can overflow for an "unlimited" run; modify cpufj_solve (the
loop where in_time_limit, time_limit, loop_start, loop_time_start are used) to
first test if in_time_limit is finite and only then build time_limit =
milliseconds(floor(in_time_limit*1000.0)); otherwise set a sentinel (e.g. no
time_limit / optional / boolean has_time_limit) and use that sentinel in the
later check (now - loop_time_start > time_limit) so the infinite case never
triggers the millisecond conversion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cu`:
- Around line 33-34: The thread-count guard in early_cpufj_t::start() is
inverted: it currently returns when omp_get_num_threads() > 3, skipping the CPU
heuristic on higher-core runs; change the condition to return when
omp_get_num_threads() < 4 (i.e., use the same low-thread cutoff as the GPU path)
so the check becomes something like if (fj_cpu_ || omp_get_num_threads() < 4) {
return; }, keeping the fj_cpu_ short-circuit as-is.
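
The corrected condition can be written as a small predicate, shown here only as a sketch. The function name and the constant are hypothetical; the cutoff of 4 is taken from the "< 4" in the comment above, and the thread count is passed in as a plain int rather than read from OpenMP.

```cpp
// Assumed cutoff, taken from the "< 4" suggested in the review comment.
constexpr int kMinThreadsForEarlyCpuFj = 4;

// Hypothetical predicate: returns true when early CPU FJ should NOT start,
// either because it is already running or because too few threads are available.
bool should_skip_early_cpufj(bool already_started, int num_threads)
{
  return already_started || num_threads < kMinThreadsForEarlyCpuFj;
}
```

With this form, a run with 8 threads starts the heuristic and a run with 2 threads skips it, which is the opposite of what the "> 3" check does.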

In `@cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu`:
- Around line 359-362: You removed the nnz_per_move variable but it is still
used in log_regression_features(), causing a compile error; restore nnz_per_move
(computed from fj_cpu.nnz_processed_window divided by total_moves) and keep its
declaration (e.g., double nnz_per_move = 0.0;) before computing total_moves, or
guard its computation with a conditional so nnz_per_move is always defined when
calling log_regression_features(), referencing the nnz_per_move variable,
total_moves, and fj_cpu members (nnz_processed_window, n_lift_moves_window,
n_mtm_viol_moves_window, n_mtm_sat_moves_window) to locate where to fix.
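
A guarded computation could look roughly like the sketch below. It assumes total_moves is the sum of the three move counters named above, which this comment does not confirm, and the free function compute_nnz_per_move is a hypothetical stand-in for the code inside fj_cpu.cu.

```cpp
// Hypothetical stand-in for the logic in fj_cpu.cu. Assumption: total_moves is
// the sum of the lift, MTM-violated and MTM-satisfied move counters.
double compute_nnz_per_move(long long nnz_processed_window,
                            long long n_lift_moves_window,
                            long long n_mtm_viol_moves_window,
                            long long n_mtm_sat_moves_window)
{
  const long long total_moves =
    n_lift_moves_window + n_mtm_viol_moves_window + n_mtm_sat_moves_window;

  // Declared up front so the value is always defined for the logging call.
  double nnz_per_move = 0.0;
  if (total_moves > 0) {
    nnz_per_move =
      static_cast<double>(nnz_processed_window) / static_cast<double>(total_moves);
  }
  return nnz_per_move;
}
```

Defaulting to 0.0 when no moves were made avoids a division by zero while keeping the variable available to log_regression_features().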

In `@cpp/src/mip_heuristics/presolve/probing_cache.cu`:
- Around line 864-866: The log call in CUOPT_LOG_TRACE is using the wrong printf
specifier and a misleading label: change the format from "%d" to "%zu" for
task_id (or cast task_id to size_t) and update the message text from "thread" to
"task" (e.g. CUOPT_LOG_TRACE("Computing probing cache for var %d on task %zu",
var_idx, task_id)); locate and fix this in the CUOPT_LOG_TRACE invocation that
references var_idx and task_id; leave the sentinel check using
bound_presolve.settings.num_tasks < 0 as-is since the original setting is a
signed type.

In `@cpp/src/mip_heuristics/solver.cu`:
- Around line 452-476: The code captures sol from dm.run_solver() before the
taskgroup barrier so it may miss later incumbents published by
branch_and_bound_solution_helper_t::solution_callback(); after the implicit
barrier (i.e., immediately before using branch_and_bound_solution and
branch_and_bound_status) re-read the final incumbent from the diversity manager
/ solution helper and overwrite sol (e.g., call dm.get_best_solution() or the
helper's getter to fetch the latest incumbent), then proceed to update
context.stats and handle branch_and_bound_status (e.g., call
sol.set_problem_fully_reduced() if INFEASIBLE) so the returned solver result
reflects any incumbents found while the taskgroup ran.
- Around line 452-464: When OpenMP has only one thread the current logic can
deadlock because dm.run_solver() blocks waiting for
context.preempt_heuristic_solver_ while the B&B task cannot run; detect this by
calling omp_get_num_threads() and if it returns 1 and heuristics are enabled,
run the B&B solve inline instead of queuing it (i.e., call
branch_and_bound->solve(branch_and_bound_solution) before calling
dm.run_solver()), or alternatively yield by running dm.run_solver() in a
separate std::thread; update references to branch_and_bound_status,
branch_and_bound->solve(), dm.run_solver(), omp_get_num_threads(), and
context.preempt_heuristic_solver_ accordingly so the serial fallback executes
B&B immediately when single-threaded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c4bb788c-8d9f-469c-ac30-669c292ed88f

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb5d10 and 8a77a6c.

📒 Files selected for processing (15)
  • benchmarks/linear_programming/cuopt/run_mip.cpp
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/diversity/lns/rins.cu
  • cpp/src/mip_heuristics/feasibility_jump/early_cpufj.cu
  • cpp/src/mip_heuristics/feasibility_jump/early_gpufj.cu
  • cpp/src/mip_heuristics/feasibility_jump/early_gpufj.cuh
  • cpp/src/mip_heuristics/feasibility_jump/fj_cpu.cu
  • cpp/src/mip_heuristics/local_search/local_search.cu
  • cpp/src/mip_heuristics/presolve/conditional_bound_strengthening.cu
  • cpp/src/mip_heuristics/presolve/probing_cache.cu
  • cpp/src/mip_heuristics/solve.cu
  • cpp/src/mip_heuristics/solver.cu
  • python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
💤 Files with no reviewable changes (1)
  • cpp/src/mip_heuristics/feasibility_jump/early_gpufj.cuh
✅ Files skipped from review due to trivial changes (2)
  • python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
  • benchmarks/linear_programming/cuopt/run_mip.cpp
🚧 Files skipped from review as they are similar to previous changes (3)
  • cpp/src/mip_heuristics/solve.cu
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp

@nguidotti marked this pull request as draft on April 17, 2026, 13:54
Labels

  • do not merge: Do not merge if this flag is set
  • improvement: Improves an existing functionality
  • mip
  • non-breaking: Introduces a non-breaking change

Development

Successfully merging this pull request may close these issues.

[FEA] Respect CPU thread limits

4 participants