Summary
c_api/TimeLimitTestFixture.time_limit/2 (the MIP parametric case using /mip/supportcase22.mps, time_limit=15s, CUOPT_METHOD_DUAL_SIMPLEX) fails intermittently on CI when the reported solve_time exceeds target_solve_time + excess_allowed_time (15s + 3s = 18s).
Failing assertion
cpp/tests/linear_programming/c_api_tests/c_api_tests.cpp:64
EXPECT_NEAR(solve_time, target_solve_time, excess_allowed_time); // excess = 3.0
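For reference, the check that EXPECT_NEAR performs is two-sided: it passes only when |solve_time - target_solve_time| <= excess_allowed_time. The sketch below restates that predicate in plain C++ so the arithmetic of the failure can be checked by hand. The helper names (within_tolerance, overshoot, check_reported_run) are hypothetical and do not exist in the test file; the 18.398 value is the one from the CI run described in this report, and the 11.9 value is made up to show the lower edge.

```cpp
#include <cassert>
#include <cmath>

// Same predicate that EXPECT_NEAR(value, target, abs_error) evaluates:
// the check passes only when |value - target| <= abs_error (two-sided).
inline bool within_tolerance(double value, double target, double abs_error)
{
  return std::fabs(value - target) <= abs_error;
}

// Signed distance past the upper edge of the window.
// A positive result means the run finished too late.
inline double overshoot(double value, double target, double abs_error)
{
  return value - (target + abs_error);
}

// Self-check using the values quoted in this report.
inline void check_reported_run()
{
  const double target_solve_time   = 15.0;
  const double excess_allowed_time = 3.0;

  // The CI run: 18.398s is outside [12s, 18s].
  assert(!within_tolerance(18.398, target_solve_time, excess_allowed_time));

  // Hypothetical early finish: because the check is two-sided,
  // a solve that returns well before the limit would also fail.
  assert(!within_tolerance(11.9, target_solve_time, excess_allowed_time));
}
```

Because the window is symmetric, widening excess_allowed_time also widens the lower bound, which is worth keeping in mind when evaluating the tolerance-based fixes.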
Observed CI failure
solve_time = 18.398s, target 15s, tolerance 3.0s → over by ~0.4s.
- The solver itself respected the limit: the log shows "Explored 0 nodes in 15.01s." and termination status "Time limit (5)".
- CI environment per log: AMD EPYC 9554, only 12/12 threads visible (cgroup-limited on a 64-core host), 43 GiB RAM, NVIDIA H100 NVL, CUDA 13.1.
Root cause
Test reads cuOptGetSolveTime, which for MIP returns mip_solution_t::get_total_solve_time() → stats_.total_solve_time. That field is recorded at cpp/src/mip_heuristics/solver.cu:489 (and similar check_time_limit guards) using a timer_t that was started in cpp/src/mip_heuristics/solve.cu:306 — before Papilo presolve.
Therefore solve_time includes:
- Papilo presolve (OpenMP, ~2.59s in this run)
- cuOpt presolve (~1.59s)
- run_mip (LP root relaxation, primal heuristics, B&B)
- Post-B&B serial wind-down inside mip_solver_t::solve(): branch_and_bound_status_future.get(), sol.compute_feasibility(), sol.test_variable_bounds(...) + device sync
It does NOT include Papilo postsolve, the full_sol rebuild, or C-API wrapping (those run after total_solve_time is finalized).
The 3.4s gap between B&B's internal stop (15.01s) and total_solve_time (18.40s) is the post-join serial wind-down. On CPU-thread-constrained runners:
- Papilo presolve takes a larger share of the 15s budget (it is OpenMP-parallel; see cpp/src/mip_heuristics/solve.cu:412, which passes settings.num_cpu_threads).
- B&B uses omp_get_max_threads() - 1 worker threads (cpp/src/mip_heuristics/solver.cu:360).
- The post-B&B compute_feasibility over the ~260K-row presolved problem becomes proportionally more expensive relative to the budget.
Memory is not a factor at this problem size (~260K rows / 2.2M nz, peak well under 1 GiB).
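The accounting described in this section can be written down as a small model: four phases fall inside the timer's span and two fall after total_solve_time is finalized. This is only a sketch of the bookkeeping, not cuOpt code; the struct and function names are hypothetical, and any durations plugged into it are the caller's own measurements.

```cpp
#include <cassert>

// Phase durations in seconds. Field names are illustrative,
// not identifiers from the cuOpt sources.
struct mip_phase_times {
  double papilo_presolve;
  double cuopt_presolve;
  double run_mip;            // LP root relaxation, primal heuristics, B&B
  double post_bb_wind_down;  // future.get(), feasibility checks, device sync
  double papilo_postsolve;   // runs after total_solve_time is finalized
  double rebuild_and_wrap;   // full_sol rebuild and C-API wrapping, also after
};

// What the test sees through cuOptGetSolveTime under this model:
// everything from timer start (before Papilo presolve) to finalization.
inline double reported_solve_time(const mip_phase_times& t)
{
  return t.papilo_presolve + t.cuopt_presolve + t.run_mip + t.post_bb_wind_down;
}

// Wall-clock work the reported value does not cover.
inline double uncounted_time(const mip_phase_times& t)
{
  return t.papilo_postsolve + t.rebuild_and_wrap;
}
```

With made-up durations of 2s, 1s, 12s and 3s for the counted phases, reported_solve_time returns 18s even though the part a search limit can actually cut short (run_mip) is only 12s of that.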
Reproduction
- CI: failure observed on H100 NVL CI runner with cgroup-limited 12-thread visibility.
- Local (RTX 8000, compute 7.5): not reproducible; the full c_api/TimeLimitTestFixture.time_limit/* set passes.
Suggested fixes (ranked)
1. Measure solve_time from inside the time-limited region only, so the field reflects what time_limit actually controls. This excludes presolve/post-solve serial work that the user's time_limit setting cannot bound.
2. Use a per-case excess_allowed_time: keep 3s for LP cases, raise it (e.g. 7s) for the MIP case to absorb host-side wind-down variance.
3. Increase target_solve_time for the MIP case so post-solve overhead becomes a smaller % of the budget.
(1) is the principled fix; (2)/(3) are pragmatic unblocks.
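One possible shape for option (1) is a timer that accumulates only while the time-limited region is active, kept separate from the wall-clock timer that starts before presolve. The sketch below is an assumption about how that could look, not the existing cuOpt timer_t: region_timer and manual_clock are hypothetical names, and manual_clock exists only so the behaviour can be demonstrated deterministically.

```cpp
#include <cassert>
#include <chrono>

// Accumulates elapsed time only between start() and stop() calls.
// Hypothetical helper; not part of cuOpt.
template <typename Clock = std::chrono::steady_clock>
class region_timer {
 public:
  void start()
  {
    if (!running_) {
      begin_   = Clock::now();
      running_ = true;
    }
  }

  void stop()
  {
    if (running_) {
      accumulated_ += Clock::now() - begin_;
      running_ = false;
    }
  }

  // Total time spent inside started regions, in seconds.
  double seconds() const { return std::chrono::duration<double>(accumulated_).count(); }

 private:
  typename Clock::duration accumulated_{};
  typename Clock::time_point begin_{};
  bool running_{false};
};

// Manually advanced clock so the example does not depend on real time.
struct manual_clock {
  using rep        = double;
  using period     = std::ratio<1>;
  using duration   = std::chrono::duration<rep, period>;
  using time_point = std::chrono::time_point<manual_clock, duration>;
  static constexpr bool is_steady = true;

  static time_point& current()
  {
    static time_point t{};
    return t;
  }
  static time_point now() { return current(); }
  static void advance(double s) { current() += duration(s); }
};
```

Usage would be to call start() on entry to the region that time_limit governs and stop() on exit, then report seconds() alongside (or instead of) the full-span value. With illustrative durations of 4s before the region, 11s inside it and 3s after it, a full-span timer reads 18s while the region timer reads 11s. Which boundary counts as "the time-limited region" is a design decision for the maintainers.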
Mitigation in progress
Disabling the /2 case in a follow-up PR while this is investigated; the LP /0 and /1 cases remain enabled.
Related
There is a stale TODO at cpp/src/dual_simplex/presolve.cpp:1517 mentioning this test name, but it is about a different (numerical assertion) failure mode, not this timing one.