Extend batch pdlp support#1152

Merged
rapids-bot[bot] merged 32 commits into main from extend_batch_pdlp_support
May 11, 2026
Conversation

@Kh4ster
Contributor

@Kh4ster Kh4ster commented Apr 28, 2026

This PR greatly extends the capabilities of batch PDLP. Previously, batch PDLP only supported a single variable-bound difference per climber. It now supports:

  • Different constraint lower and upper bounds per climber
  • Different objective coefficients per climber
  • A different objective offset per climber
  • More than one variable-bound difference per climber

This PR also adds support for per-climber residuals and first primal feasible to the Stable3 PDLP solver mode and its batch version. This makes it possible to solve a batch of problems and stop once one or all of the climbers have reached primal feasibility.

All of these options can be combined, so a single call can:

Solve a batch of LPs that all differ in their constraint lower and upper bounds, objective coefficients, objective offset, and variable bounds, using the per-constraint residual instead of the L2 norm, and stopping once one, or all, climbers have reached primal feasibility.
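
As an illustration of the two early-stop rules described above, here is a minimal host-side sketch. It is not the PR's implementation: `climber_status_t`, `stop_settings_t`, and `should_stop_early` are hypothetical names, and treating an Optimal climber as also primal feasible is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-climber status, loosely mirroring the states named in this PR.
enum class climber_status_t { NoTermination, PrimalFeasible, Optimal };

// Hypothetical settings holder; the two flags are assumed to be mutually exclusive.
struct stop_settings_t {
  bool first_primal_feasible = false;
  bool all_primal_feasible   = false;
};

// Assumption: an Optimal climber also counts as primal feasible.
inline bool is_primal_feasible_or_optimal(climber_status_t s)
{
  return s == climber_status_t::PrimalFeasible || s == climber_status_t::Optimal;
}

// Returns true when the batch solve may stop early under the selected rule.
inline bool should_stop_early(const std::vector<climber_status_t>& statuses,
                              const stop_settings_t& settings)
{
  // The two early-stop modes cannot be requested together.
  assert(!(settings.first_primal_feasible && settings.all_primal_feasible));
  if (statuses.empty()) { return false; }
  if (settings.first_primal_feasible) {
    return std::any_of(statuses.begin(), statuses.end(), is_primal_feasible_or_optimal);
  }
  if (settings.all_primal_feasible) {
    return std::all_of(statuses.begin(), statuses.end(), is_primal_feasible_or_optimal);
  }
  return false;  // Neither flag set: run until the usual termination criteria.
}
```

With one feasible climber out of two, the "first" rule stops the batch while the "all" rule keeps iterating.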

@Kh4ster Kh4ster requested review from a team as code owners April 28, 2026 08:48
@copy-pr-bot

copy-pr-bot Bot commented Apr 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Kh4ster Kh4ster self-assigned this Apr 28, 2026
@Kh4ster Kh4ster added the feature request (New feature or request), non-breaking (Introduces a non-breaking change), and pdlp labels Apr 28, 2026
@Kh4ster
Contributor Author

Kh4ster commented Apr 28, 2026

/ok to test

@copy-pr-bot

copy-pr-bot Bot commented Apr 28, 2026

/ok to test

@Kh4ster, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@coderabbitai

coderabbitai Bot commented Apr 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds fixed and auto batch PDLP execution paths with per-climber objective offsets, per-climber rescaling and convergence statistics, new solver settings (first/all_primal_feasible, fixed_batch_size), batch-safe swap/resize/transpose, many GPU stream synchronizations, and expanded tests for batch behaviors and validation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Batch Objective Offsets & Problem API: cpp/include/cuopt/linear_programming/optimization_problem.hpp, cpp/src/pdlp/optimization_problem.cu | Add set_batch_objective_offsets() and const/non-const getters; store batch_objective_offsets_; include in copy/equality/precision-conversion logic. |
| Solver Settings & Proto: cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp, cpp/src/grpc/cuopt_remote.proto | Add first_primal_feasible and all_primal_feasible, rename sub_batch_size → fixed_batch_size; expose all_primal_feasible in proto. |
| Batch Entry Points & Batch-size API: cpp/src/pdlp/solve.cuh, cpp/src/pdlp/solve.cu | Add run_batch_pdlp and compute_optimal_batch_size; implement fixed-path (pre-expanded per-climber fields) vs splitting/strong-branching path; centralize memory estimation and batch-size selection; validate fixed-batch sizing and batch_objective_offsets. |
| PDLP Core: lifecycle, transpose, swap/resize: cpp/src/pdlp/pdlp.cu, cpp/src/pdlp/pdlp.cuh, cpp/src/pdlp/pdhg.cu | Introduce original_batch_size_, batch snapshot/finalize helpers, transpose_problem_fields(to_row), extend swap/resize to handle per-climber objective/constraint arrays, and adapt PDHG kernels/indexing for per-climber layouts. |
| Convergence, Restart & Termination: cpp/src/pdlp/termination_strategy/*.cu, cpp/src/pdlp/termination_strategy/*.hpp, cpp/src/pdlp/restart_strategy/* | Convert convergence stats from scalars → per-climber device_uvector/span (L2/L∞, offsets); add getters; update restart views; add accept_primal_feasible gating, any_primal_feasible_or_optimal(), and per-climber GPU stats snapshot/finalize paths. |
| Initial Scaling & Rescaling Contexts: cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu, cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cuh | Switch bound/objective rescaling from scalars → per-batch vectors; add original_batch_size ctor param; add get_*_rescaling_vector(), swap_context(), resize_context(), and device/host swap kernels. |
| Problem Validation & Helpers: cpp/src/mip_heuristics/problem/problem.cu, cpp/src/mip_heuristics/problem/problem_helpers.cuh, cpp/src/pdlp/utils.cuh | Relax sizing checks to allow batch-expanded vectors (divisibility), fix variable_types allocation to use n_variables, refactor combine_constraint_bounds/compute_sum_bounds to avoid scalar readbacks and support per-climber reductions. |
| PDLP Numerical Kernels & Utils: cpp/src/pdlp/*.cu, cpp/src/pdlp/*.cuh (multiple files) | Update many kernels/functors to accept flags for per-climber objectives/constraints and index accordingly; replace CUB segmented reduces with segmented_sum_handler; add stream synchronizations after device casts/sets. |
| GPU Syncs & Sort Utilities: cpp/src/mip_heuristics/utilities/sort_csr.cuh, cpp/src/mip_heuristics/problem/problem.cu, various PDLP files | Add explicit stream synchronizations after key device copies/casts/sorts to ensure ordering when interleaving device operations. |
| Warm-start / Tolerances: cpp/src/branch_and_bound/pseudo_costs.cpp | Relax PDLP warm-start tolerance from 1e-5 → 1e-4 used for warm-start tolerances. |
| Tests & Test Utilities: cpp/tests/linear_programming/pdlp_test.cu, cpp/tests/linear_programming/utilities/pdlp_test_utilities.cuh | Add extensive batch/fixed-batch and early-termination tests, new helpers to run fixed/pre-expanded batch solves, adjust tolerances and residual accessors, and add negative validation tests for inconsistent per-climber expansion. |
| Misc. API/Instantiations: cpp/src/pdlp/solve.cu (instantiations), cpp/src/pdlp/solve.cuh | Template-instantiation and forward-declaration updates for new batch helpers; minor signature changes to support per-climber data views and batch helpers. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 3.42% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Extend batch pdlp support' clearly and concisely summarizes the main change: extending the capabilities of batch PDLP solver functionality. |
| Description check | ✅ Passed | The description is detailed and directly related to the changeset, explaining multiple new per-climber capabilities including different constraints, objectives, offsets, variable bounds, residual support, and stopping behaviors. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu (1)

116-157: ⚠️ Potential issue | 🟠 Major

Don't derive batch-wide rescaling from only the first climber.

bound_rescaling_ and objective_rescaling_ are still single scalars, but this change now computes them from only n_constraints / n_variables starting at offset 0. In batch mode with per-climber bounds or objective coefficients, later climbers are ignored here and then rescaled anyway in Lines 515-559, so the scaling becomes batch-order-dependent and can badly mis-scale heterogeneous batches.

Please either compute these scalars from all climbers (for example, a max over per-climber norms/sums) or make the rescaling itself per-climber. As per coding guidelines, flag algorithm correctness errors including incorrect constraint/objective computation, and numerical stability issues.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu` around lines 116 -
157, The batch-aware scaling is wrong because bound_rescaling_ and
objective_rescaling_ are computed only over the first climber's
n_constraints/n_variables (using TransformReduce and detail::my_l2_weighted_norm
on op_problem_scaled_ starting at offset 0), making scaling batch-order
dependent; fix by aggregating across all climbers instead of just the first:
compute per-climber rescalings (e.g., run the TransformReduce /
my_l2_weighted_norm for each climber or compute per-climber norms and take a
batch-wise aggregator such as max across climbers) and then either store
per-climber rescaling factors or reduce them into a single safe scalar (e.g.,
max of per-climber values) before calling bound_rescaling_.set_value_async and
setting objective_rescaling_; update uses of bound_rescaling_ and
objective_rescaling_ to apply per-climber values if you choose per-climber
rescaling.
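
To make the suggested alternatives concrete, the following host-side sketch computes one norm per climber and then reduces them to a single batch-wide scalar with a max. It is only an illustration under stated assumptions: `per_climber_l2_norms` and `batch_max_norm` are hypothetical helpers, the climber-major layout is assumed, and the real code works on device buffers with its own weighted norm.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: L2 norm of each climber's slice of a batch-expanded vector.
// Assumes a climber-major layout: climber c owns values[c * n, (c + 1) * n).
inline std::vector<double> per_climber_l2_norms(const std::vector<double>& values, std::size_t n)
{
  assert(n > 0 && values.size() % n == 0);
  const std::size_t batch_size = values.size() / n;
  std::vector<double> norms(batch_size, 0.0);
  for (std::size_t c = 0; c < batch_size; ++c) {
    double sum_sq = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
      const double v = values[c * n + j];
      sum_sq += v * v;
    }
    norms[c] = std::sqrt(sum_sq);
  }
  return norms;
}

// One possible order-independent aggregate: the max over per-climber norms.
inline double batch_max_norm(const std::vector<double>& norms)
{
  assert(!norms.empty());
  return *std::max_element(norms.begin(), norms.end());
}
```

Keeping the whole `norms` vector corresponds to per-climber rescaling; calling `batch_max_norm` corresponds to the single-scalar option. Either way the result no longer depends on which climber happens to come first.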
🧹 Nitpick comments (2)
run_obbt.sh (1)

2-2: Update line 2 to match repository convention: set -euo pipefail instead of set -e.

The repository standardizes on set -euo pipefail across 35+ shell scripts. This script should be updated to include both -u (to catch unset variables) and -o pipefail (to propagate pipe failures).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@run_obbt.sh` at line 2, Update the shell options to match repo convention by
replacing the current "set -e" invocation with the stricter form that enables
unset-variable checks and pipe-failure propagation; specifically change the
shell options used in run_obbt.sh (the existing set -e line) to use set -euo
pipefail so the script fails on errors, missing variables, and failed pipeline
components.
cpp/src/pdlp/pdlp.cu (1)

2113-2148: Avoid repeated device allocations in transpose_problem_fields hot path.

At Line 2115, each field transpose allocates a new rmm::device_uvector. This function is called repeatedly in the main solve loop, so this adds avoidable allocation churn.

Refactor sketch (single scratch buffer per call)
 void pdlp_solver_t<i_t, f_t>::transpose_problem_fields(bool to_row)
 {
+  const size_t max_field_size = std::max(
+    {op_problem_scaled_.objective_coefficients.size(),
+     op_problem_scaled_.constraint_lower_bounds.size(),
+     op_problem_scaled_.constraint_upper_bounds.size()});
+  rmm::device_uvector<f_t> transposed(max_field_size, stream_view_);
+
   auto transpose_field = [&](rmm::device_uvector<f_t>& field, i_t rows) {
     if (field.size() <= static_cast<size_t>(rows)) return;
-    rmm::device_uvector<f_t> transposed(field.size(), stream_view_);
     if (to_row) {
       ...
-      CUBLAS_CHECK(cublasGeam<f_t>(..., transposed.data(), ...));
+      CUBLAS_CHECK(cublasGeam<f_t>(..., transposed.data(), ...));
     } else {
       ...
-      CUBLAS_CHECK(cublasGeam<f_t>(..., transposed.data(), ...));
+      CUBLAS_CHECK(cublasGeam<f_t>(..., transposed.data(), ...));
     }
     raft::copy(field.data(), transposed.data(), field.size(), stream_view_);
   };

As per coding guidelines, "Flag excessive allocations in hot paths; prefer pooled RMM resources."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 2113 - 2148, The lambda transpose_field
currently allocates a fresh rmm::device_uvector<f_t> named transposed on every
call, causing hot-path allocation churn; move the allocation out of the lambda
in transpose_problem_fields and provide a single scratch
rmm::device_uvector<f_t> that is reused for each field (resize or reserve it if
needed), then have transpose_field capture or accept a reference to that
preallocated buffer and write into it before calling raft::copy; keep existing
cublasGeam calls and scalar buffers (reusable_device_scalar_value_1_,
reusable_device_scalar_value_0_) but remove per-call new allocations to use the
pooled scratch buffer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/grpc/cuopt_remote.proto`:
- Line 125: The new protobuf field all_primal_feasible must be wired through the
gRPC settings mapper: update cpp/src/grpc/grpc_settings_mapper.cpp to map the
proto field to the corresponding solver setting when translating incoming
requests (alongside the existing first_primal_feasible mapping) and include it
in the reverse mapping for responses; ensure you reference the proto name
all_primal_feasible and the internal setting used in solver_settings.hpp so the
mutual-exclusion validation path that currently checks first_primal_feasible is
extended/kept in sync to treat all_primal_feasible as mutually exclusive with
first_primal_feasible during validation.

In `@cpp/src/mip_heuristics/problem/problem.cu`:
- Around line 571-572: The modulo assertions currently divide by zero for empty
problems; update the checks so the modulus is only computed when n_variables > 0
(e.g., wrap the objective_coefficients.size() % n_variables assertion in a guard
that first asserts or branches on n_variables > 0), remove the redundant modulo
check against variable_bounds.size() once you already require
variable_bounds.size() == n_variables, and ensure that an empty
objective_coefficients vector cannot silently pass validation for a non-empty
problem by explicitly asserting consistency between
objective_coefficients.empty() and n_variables == 0 (or, when n_variables > 0,
that objective_coefficients.size() is a multiple of n_variables). Reference the
symbols objective_coefficients, variable_bounds, n_variables, and the
cuopt_assert calls to locate and fix the checks.

In `@cpp/src/pdlp/pdlp.cu`:
- Around line 878-885: The check for first_primal_feasible is using
current_termination_strategy_.any_done(/*accept_primal_feasible=*/true) which
treats non-primal-feasible terminal states (PrimalInfeasible, DualInfeasible,
ConcurrentLimit) as done; change the condition to explicitly detect a
PrimalFeasible termination only (e.g., iterate
current_termination_strategy_.get_terminations_status() and test for status ==
TerminationStatus::PrimalFeasible or add/use a helper like
any_primal_feasible()) so snapshot_climber_into_return(i) and
finalize_batch_return() only run when a climber reached PrimalFeasible, not any
terminal state.
- Around line 813-816: convert_gpu_terms_stats_to_host currently copies GPU
termination arrays using the loop index i as the source, but
fill_gpu_terms_stats_kernel wrote into GPU buffers at positions given by
original_indices (original_index_); update convert_gpu_terms_stats_to_host to
remap reads by using original_index_[i] as the source index when copying each
GPU stat array into the host AdditionalTerminationInformations (the same
destination used in
batch_solution_to_return_.get_additional_termination_informations()), so host
fields receive values from the GPU positions written by
fill_gpu_terms_stats_kernel.

In `@cpp/src/pdlp/solve.cu`:
- Around line 1149-1153: The call to solve_lp for the sub-batch is incorrectly
passing is_batch_mode=false which bypasses batch-only guards in run_pdlp; update
the invocation in solve.cu (the solve_lp(...) call) to pass
/*is_batch_mode=*/true when batch_settings.new_bounds contains multiple climbers
(i.e., when dispatching a batched PDLP split), so the batch-only validation and
execution paths in run_pdlp() are exercised for float/single/mixed-precision
checks.
- Around line 997-1008: run_batch_pdlp_fixed currently bypasses problem checking
and can underflow device reads if per-climber arrays were partially expanded;
before calling solve_lp(...) add explicit validation that any caller-expanded
per-climber buffers (e.g., objective_coefficients, constraint bound arrays in
the problem) have lengths >= fixed_batch_size * n (compute n from
problem.dimensions) and match expected per-climber sizing derived from
pdlp_solver_settings_t and apply_batch_settings_overrides; on violation call
cuopt_expects(..., error_type_t::ValidationError, "...") to fail-fast. Ensure
these checks live in run_batch_pdlp_fixed just after
apply_batch_settings_overrides and before the solve_lp call so the fixed path
never skips validation when buffers are not fully expanded.

In `@cpp/src/pdlp/termination_strategy/convergence_information.cu`:
- Around line 346-391: The setters set_relative_dual_tolerance_factor and
set_relative_primal_tolerance_factor incorrectly overwrite the cached L2 norm
vectors (l2_norm_primal_linear_objective_ and l2_norm_primal_right_hand_side_)
which are later used as denominators in the relative residual computations;
instead add and store separate tolerance-factor members (e.g.
relative_dual_tolerance_factor_ and relative_primal_tolerance_factor_, as host
scalars or device scalars consistent with existing usage) and update the setters
to write those members, leaving l2_norm_primal_linear_objective_ and
l2_norm_primal_right_hand_side_ unchanged; adjust any getters or
relative-residual computation code (referencing get_relative_l2_*_residual_value
or other callers) to read the new tolerance-factor members when applying
scaling.
- Around line 470-489: The relative_residual_t functor used in the
segmented_reduce_helper is constructed without the absolute_primal_tolerance and
the nb_violated_constraints_ counter, so when save_best_primal_so_far is enabled
the counter remains zero for every iteration; update the construction of
relative_residual_t in this block (the transform_iter passed to
segmented_reduce_helper) to pass settings.tolerances.absolute_primal_tolerance
and nb_violated_constraints_.data() (or a device pointer/wrapper) so the functor
can increment the violated-row counter, ensuring to_primal_quality_adapter sees
correct violated-constraint counts; keep the existing zeroing of
nb_violated_constraints_ when save_best_primal_so_far is true.

In `@run_obbt.sh`:
- Around line 31-36: The failure counter increment uses "((failed++))" which can
return a nonzero exit status under set -e and abort the script; replace that
with a safe increment such as "failed=$((failed+1))" (or "failed=$((failed +
1))") inside the for loop that iterates "for pid in \"${pids[@]}\"; do" so the
script continues waiting on remaining PIDs and correctly reports the aggregate
failure count.
- Around line 4-8: The cleanup() function currently kills processes via kill --
-$$ which assumes the shell PID equals the process-group ID; replace this with
signaling the tracked child PIDs (the pids array used when launching batches) or
ensure each batch is started in its own process group/session and record those
PGIDs, then iterate over "${pids[@]}" (or recorded PGIDs) to send termination
signals and wait; update cleanup() to use the pids variable (or recorded PGIDs)
instead of kill -- -$$ and keep the wait 2>/dev/null behavior for cleanup
confirmation.
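
The sizing concern raised for problem.cu in the list above (a modulo by zero on empty problems, and an empty coefficient vector passing for a non-empty problem) can be sketched as a guarded divisibility check. The helper names below are hypothetical and do not come from the codebase; the sketch only shows the order of the checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: is field_size a valid batch-expanded size for n_per_climber entries?
inline bool is_valid_batch_field_size(std::size_t field_size, std::size_t n_per_climber)
{
  // Empty problem: only an empty field is consistent, and x % 0 is never evaluated.
  if (n_per_climber == 0) { return field_size == 0; }
  // Non-empty problem: the field must be non-empty and a whole number of climber slices.
  return field_size != 0 && field_size % n_per_climber == 0;
}

// Hypothetical helper: number of climbers encoded in a validated field.
inline std::size_t climber_count(std::size_t field_size, std::size_t n_per_climber)
{
  assert(is_valid_batch_field_size(field_size, n_per_climber));
  return n_per_climber == 0 ? 0 : field_size / n_per_climber;
}
```

For example, a field of 6 entries with 3 variables per climber encodes 2 climbers, while 7 entries would be rejected.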


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6dd9d46b-dba3-4234-a599-4f7dbf9b77cb

📥 Commits

Reviewing files that changed from the base of the PR and between fd40c9a and 72d7664.

📒 Files selected for processing (23)
  • cpp/include/cuopt/linear_programming/optimization_problem.hpp
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/grpc/cuopt_remote.proto
  • cpp/src/mip_heuristics/problem/problem.cu
  • cpp/src/mip_heuristics/problem/problem_helpers.cuh
  • cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/pdlp/optimization_problem.cu
  • cpp/src/pdlp/pdhg.cu
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/pdlp.cuh
  • cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cuh
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solve.cuh
  • cpp/src/pdlp/termination_strategy/convergence_information.cu
  • cpp/src/pdlp/termination_strategy/convergence_information.hpp
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.hpp
  • cpp/src/pdlp/utils.cuh
  • cpp/tests/linear_programming/pdlp_test.cu
  • cpp/tests/linear_programming/utilities/pdlp_test_utilities.cuh
  • run_obbt.sh

Comment thread cpp/src/grpc/cuopt_remote.proto Outdated
bool save_best_primal_so_far = 28;
bool first_primal_feasible = 29;
int32 pdlp_precision = 30;
bool all_primal_feasible = 31;

@coderabbitai coderabbitai Bot Apr 28, 2026


⚠️ Potential issue | 🟠 Major

Wire the new proto field through the gRPC settings mapper.

Adding the field here is not enough yet: the provided snippets for cpp/src/grpc/grpc_settings_mapper.cpp only map first_primal_feasible, so all_primal_feasible will be ignored on requests and omitted on responses.

Suggested follow-up in cpp/src/grpc/grpc_settings_mapper.cpp
 pb_settings->set_save_best_primal_so_far(settings.save_best_primal_so_far);
 pb_settings->set_first_primal_feasible(settings.first_primal_feasible);
+pb_settings->set_all_primal_feasible(settings.all_primal_feasible);
 settings.save_best_primal_so_far = pb_settings.save_best_primal_so_far();
 settings.first_primal_feasible   = pb_settings.first_primal_feasible();
+settings.all_primal_feasible     = pb_settings.all_primal_feasible();

Given solver_settings.hpp also marks this as mutually exclusive with first_primal_feasible, keep the existing validation path in sync too.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/grpc/cuopt_remote.proto` at line 125, The new protobuf field
all_primal_feasible must be wired through the gRPC settings mapper: update
cpp/src/grpc/grpc_settings_mapper.cpp to map the proto field to the
corresponding solver setting when translating incoming requests (alongside the
existing first_primal_feasible mapping) and include it in the reverse mapping
for responses; ensure you reference the proto name all_primal_feasible and the
internal setting used in solver_settings.hpp so the mutual-exclusion validation
path that currently checks first_primal_feasible is extended/kept in sync to
treat all_primal_feasible as mutually exclusive with first_primal_feasible
during validation.

Contributor Author

@tmckayus I would like your opinion here please


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Contributor

Thanks @Kh4ster. I took a closer look: we currently do not support batch PDLP over gRPC, only single problems. If and when we decide to support batch PDLP over gRPC, we'll add it all together (there are other settings we aren't passing, and we need a message scheme to handle the multiple problems and results, etc.).

So, I removed the new proto mapping and added a comment.

Comment thread cpp/src/mip_heuristics/problem/problem.cu Outdated
Comment thread cpp/src/pdlp/pdlp.cu
Comment thread cpp/src/pdlp/pdlp.cu
Comment thread cpp/src/pdlp/solve.cu
Comment thread cpp/src/pdlp/solve.cu Outdated
Comment on lines 346 to 391
 void convergence_information_t<i_t, f_t>::set_relative_dual_tolerance_factor(
   f_t dual_tolerance_factor)
 {
-  l2_norm_primal_linear_objective_.set_value_async(dual_tolerance_factor, stream_view_);
+  cub::DeviceTransform::Transform(thrust::make_constant_iterator(dual_tolerance_factor),
+                                  l2_norm_primal_linear_objective_.data(),
+                                  l2_norm_primal_linear_objective_.size(),
+                                  cuda::std::identity{},
+                                  stream_view_);
 }

 template <typename i_t, typename f_t>
 void convergence_information_t<i_t, f_t>::set_relative_primal_tolerance_factor(
   f_t primal_tolerance_factor)
 {
-  l2_norm_primal_right_hand_side_.set_value_async(primal_tolerance_factor, stream_view_);
+  cub::DeviceTransform::Transform(thrust::make_constant_iterator(primal_tolerance_factor),
+                                  l2_norm_primal_right_hand_side_.data(),
+                                  l2_norm_primal_right_hand_side_.size(),
+                                  cuda::std::identity{},
+                                  stream_view_);
 }

 template <typename i_t, typename f_t>
 f_t convergence_information_t<i_t, f_t>::get_relative_dual_tolerance_factor() const
 {
-  return l2_norm_primal_linear_objective_.value(stream_view_);
+  return l2_norm_primal_linear_objective_.element(0, stream_view_);
 }

 template <typename i_t, typename f_t>
 f_t convergence_information_t<i_t, f_t>::get_relative_primal_tolerance_factor() const
 {
-  return l2_norm_primal_right_hand_side_.value(stream_view_);
+  return l2_norm_primal_right_hand_side_.element(0, stream_view_);
 }

 template <typename i_t, typename f_t>
 const rmm::device_uvector<f_t>&
 convergence_information_t<i_t, f_t>::get_l2_norm_primal_linear_objective() const
 {
   return l2_norm_primal_linear_objective_;
 }

 template <typename i_t, typename f_t>
 const rmm::device_uvector<f_t>&
 convergence_information_t<i_t, f_t>::get_l2_norm_primal_right_hand_side() const
 {
   return l2_norm_primal_right_hand_side_;
 }

⚠️ Potential issue | 🟠 Major

Keep tolerance factors separate from the cached L2 norms.

set_relative_dual_tolerance_factor() and set_relative_primal_tolerance_factor() now overwrite l2_norm_primal_linear_objective_ and l2_norm_primal_right_hand_side_. Those same vectors are later used as the denominators in get_relative_l2_*_residual_value(), so any caller of these setters silently changes convergence scaling for every climber instead of just storing a factor.

As per coding guidelines, flag algorithm correctness errors including incorrect constraint/objective computation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/termination_strategy/convergence_information.cu` around lines
346 - 391, The setters set_relative_dual_tolerance_factor and
set_relative_primal_tolerance_factor incorrectly overwrite the cached L2 norm
vectors (l2_norm_primal_linear_objective_ and l2_norm_primal_right_hand_side_)
which are later used as denominators in the relative residual computations;
instead add and store separate tolerance-factor members (e.g.
relative_dual_tolerance_factor_ and relative_primal_tolerance_factor_, as host
scalars or device scalars consistent with existing usage) and update the setters
to write those members, leaving l2_norm_primal_linear_objective_ and
l2_norm_primal_right_hand_side_ unchanged; adjust any getters or
relative-residual computation code (referencing get_relative_l2_*_residual_value
or other callers) to read the new tolerance-factor members when applying
scaling.
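
The separation this comment asks for can be pictured with a small host-side stand-in. This is a sketch of the reviewer's suggestion only, not the merged code: `convergence_sketch_t` is a hypothetical class, `std::vector` replaces the device buffers, and how the factor would enter the convergence test is deliberately left out because the thread does not specify it.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical host-side stand-in for the per-climber convergence data.
class convergence_sketch_t {
 public:
  explicit convergence_sketch_t(std::vector<double> l2_norm_rhs)
    : l2_norm_primal_right_hand_side_(std::move(l2_norm_rhs))
  {
  }

  // The setter writes only the dedicated member; the cached norms are untouched.
  void set_relative_primal_tolerance_factor(double factor)
  {
    assert(factor >= 0.0);
    relative_primal_tolerance_factor_ = factor;
  }

  double get_relative_primal_tolerance_factor() const { return relative_primal_tolerance_factor_; }

  // Cached per-climber norms, still available as denominators for relative residuals.
  const std::vector<double>& get_l2_norm_primal_right_hand_side() const
  {
    return l2_norm_primal_right_hand_side_;
  }

 private:
  std::vector<double> l2_norm_primal_right_hand_side_;
  double relative_primal_tolerance_factor_ = 1.0;  // Assumed neutral default.
};
```

With this split, calling the setter cannot change the denominators used for any climber, which is the behavior the comment is concerned about.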

Comment thread cpp/src/pdlp/termination_strategy/convergence_information.cu
Comment thread run_obbt.sh Outdated
Comment thread run_obbt.sh Outdated
@Kh4ster
Contributor Author

Kh4ster commented Apr 28, 2026

/ok to test

@copy-pr-bot

copy-pr-bot Bot commented Apr 28, 2026

/ok to test

@Kh4ster, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

@Kh4ster
Contributor Author

Kh4ster commented Apr 28, 2026

/ok to test a7944c8

Contributor

@akifcorduk akifcorduk left a comment

Thanks for the great work Nicolas!

In your tests, could you add more cases with randomized bounds? There is a lot of index indirection, and we might miss a bug if we only ever check, for example, the first batch entry's bounds. It would be nice to have a test with randomized bounds, objectives, etc., which checks at the end that the result for each batch entry matches a single run of PDLP.

Comment thread cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu Outdated
termination_status[idx] = (i_t)pdlp_termination_status_t::PrimalFeasible;
return;
} else {
termination_status[idx] = (i_t)pdlp_termination_status_t::NoTermination;

Contributor

Why is this now necessary?

Contributor Author

If you are using per_constraint_residual and neither first_primal_feasible nor all_primal_feasible, an instance may become PrimalFeasible and then lose feasibility on a later iteration, so we have to set its status back to NoTermination. This is not needed for Optimal, because a climber that reaches Optimal is always removed from the batch (ending in the non-batch case).
I do the same in the non-per_constraint_residual path.
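
A minimal host-side sketch of that reset, under stated assumptions: `status_sketch_t` is a hypothetical two-value stand-in for `pdlp_termination_status_t`, and feasibility is reduced to comparing one violation value per climber against a tolerance. The real kernel runs on the device and tests more than this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the subset of termination statuses discussed here.
enum class status_sketch_t { NoTermination, PrimalFeasible };

// Re-evaluate every climber at each convergence check. The status is written
// unconditionally, so a climber that was PrimalFeasible earlier but violates
// the tolerance now is set back to NoTermination.
inline void update_primal_feasible_status(const std::vector<double>& max_violation,
                                          double tolerance,
                                          std::vector<status_sketch_t>& status)
{
  assert(max_violation.size() == status.size());
  for (std::size_t idx = 0; idx < status.size(); ++idx) {
    status[idx] = (max_violation[idx] <= tolerance) ? status_sketch_t::PrimalFeasible
                                                    : status_sketch_t::NoTermination;
  }
}
```

Writing only the feasible branch would leave a stale PrimalFeasible flag on a climber that has since become infeasible.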

Comment thread cpp/src/pdlp/pdhg.cu
// Swap per-climber scaled problem fields (objectives, constraint bounds) — all in COL-major
// during the convergence block when swap_context is invoked.
if (problem_ptr->objective_coefficients.size() > static_cast<size_t>(primal_size_h_)) {
matrix_swap(problem_ptr->objective_coefficients, primal_size_h_, swap_pairs);

Contributor

I am not sure I understand the matrix_swap logic.

Contributor Author

When a climber is removed from the batch, all of its associated data needs to be removed so that the rest of the solve can continue. Many of those fields are a list of vectors stacked next to each other, i.e. a dense matrix. In that case I need to identify all the slices that have to move within the matrix and swap them accordingly.
We can now have a matrix of objective coefficients, so if the associated climber is removed, we need to remove its slice. That said, we can be in batch mode without having a matrix of objective_coefficients, hence the if.
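
The slice swap described above can be sketched on the host as follows. This is an illustration and not the actual `matrix_swap`: the function name, the `std::vector` storage, and the layout (column-major, one contiguous slice of `rows` entries per climber) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side analogue of a per-climber slice swap.
// `matrix` stores one contiguous slice of `rows` values per climber.
// Each pair (a, b) exchanges the slices of climbers a and b, so that removed
// climbers end up at the back and can be dropped by shrinking the matrix.
inline void swap_climber_slices(std::vector<double>& matrix,
                                std::size_t rows,
                                const std::vector<std::pair<std::size_t, std::size_t>>& swap_pairs)
{
  assert(rows > 0 && matrix.size() % rows == 0);
  for (const auto& p : swap_pairs) {
    const std::size_t a = p.first;
    const std::size_t b = p.second;
    assert((a + 1) * rows <= matrix.size() && (b + 1) * rows <= matrix.size());
    if (a == b) continue;  // nothing to move
    const auto first_a = matrix.begin() + static_cast<std::ptrdiff_t>(a * rows);
    const auto first_b = matrix.begin() + static_cast<std::ptrdiff_t>(b * rows);
    std::swap_ranges(first_a, first_a + static_cast<std::ptrdiff_t>(rows), first_b);
  }
}
```

A size guard in the spirit of the `if` in the quoted hunk would skip this call when the field holds a single shared vector, since there is then no per-climber slice to move.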

Comment thread cpp/src/pdlp/pdhg.cu
cuda::std::min<f_t>(tmp, -constraint_lower_bounds[bound_idx]));
const f_t next_dual = (tmp - tmp_proj) * step_size;

potential_next_dual[idx] = next_dual;

Contributor

Why is the dual not indexed the same way as the constraint bounds? Also, we can just use idx for constraint_upper_bounds. For a single climber, it is equal to constraint_idx, which equals idx anyway. In the batched case, it is also equal to idx all the time. So there is no need for the extra bound_idx.

Contributor Author

We can have a batch where all climbers share the same constraint_lower_bounds. In that case we have to use constraint_idx, not the global idx.

Contributor

Oh I see, thanks!
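
The indexing point in this exchange can be sketched in a few lines. The flattened layout is an assumption made for the example (`idx = climber * n_constraints + constraint_idx`); the thread does not show how `constraint_idx` is actually derived, and `bound_index` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: dual values are stored per climber, so the global index is
// idx = climber * n_constraints + constraint_idx.
// Bounds are either shared by all climbers (n_constraints entries) or stored
// per climber (same size as the dual), which decides which index to use.
inline std::size_t bound_index(std::size_t idx,
                               std::size_t n_constraints,
                               bool per_climber_constraints)
{
  assert(n_constraints > 0);
  const std::size_t constraint_idx = idx % n_constraints;
  return per_climber_constraints ? idx : constraint_idx;
}
```

With shared bounds, using the global `idx` for any climber other than the first would read past the end of the bounds array, which is why the extra `bound_idx` is kept.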

Comment thread cpp/src/pdlp/pdhg.cu
cuopt_assert(step_size > f_t(0.0), "dual_step_size must be > 0");

const f_t tmp = current_dual / step_size - dual_gradient[idx];
const int bound_idx = per_climber_constraints ? idx : constraint_idx;

Contributor

Same here

Contributor Author

Same remark as above

Comment thread cpp/src/pdlp/pdlp.cu Outdated
auto transpose_field = [&](rmm::device_uvector<f_t>& field, i_t rows) {
if (field.size() <= static_cast<size_t>(rows)) return;
rmm::device_uvector<f_t> transposed(field.size(), stream_view_);
if (to_row) {

Contributor

Instead of writing two branches, you can write a single transpose with just a pointer swap between climber_strategies_ and rows

Contributor Author

You are right, changed
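
The single-branch idea can be illustrated with a host-side sketch. It is not the code that landed in the PR: the function names and `std::vector` storage are hypothetical, and the example swaps the two dimensions where the review mentions a pointer swap. It shows one transpose routine serving both directions.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Transpose a column-major in_rows x in_cols matrix into a column-major
// in_cols x in_rows matrix.
inline std::vector<double> transpose_col_major(const std::vector<double>& in,
                                               std::size_t in_rows,
                                               std::size_t in_cols)
{
  assert(in.size() == in_rows * in_cols);
  std::vector<double> out(in.size());
  for (std::size_t c = 0; c < in_cols; ++c) {
    for (std::size_t r = 0; r < in_rows; ++r) {
      out[c + r * in_cols] = in[r + c * in_rows];
    }
  }
  return out;
}

// One call site for both directions: only the dimensions are swapped.
// to_row == true  : column-major (rows x n_climbers) -> row-major
// to_row == false : row-major (rows x n_climbers)    -> column-major
inline std::vector<double> convert_layout(const std::vector<double>& field,
                                          std::size_t rows,
                                          std::size_t n_climbers,
                                          bool to_row)
{
  std::size_t in_rows = rows;
  std::size_t in_cols = n_climbers;
  if (!to_row) std::swap(in_rows, in_cols);
  return transpose_col_major(field, in_rows, in_cols);
}
```

This works because a row-major `rows x n_climbers` buffer has the same memory layout as a column-major `n_climbers x rows` buffer, so the reverse conversion is the same transpose with the dimensions exchanged.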

Comment thread cpp/src/pdlp/solve.cu
{
// Hyper parameter than can be changed, I have put what I believe to be the best
constexpr int batch_iteration_limit = 100000;
constexpr f_t pdlp_tolerance = 1e-4;

Contributor

Do we have a fixed precision (1e-4) for batched PDLP?

Contributor Author

Yes, and it's intentional: it gives a good warm start and fast heuristics for strong branching. But you are right, I should have a clean API to handle all of that properly.

Contributor

We need 1e-6 abs and 1e-12 relative tolerance in MIP.
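
For readers unfamiliar with how an absolute and a relative tolerance combine, here is a minimal sketch. It assumes the common PDLP-style criterion `residual <= abs_tol + rel_tol * reference_norm`; the thread does not state which formula the solver applies, and `within_tolerance` is a hypothetical helper.

```cpp
#include <cassert>

// Assumed PDLP-style termination test: the residual norm must not exceed an
// absolute tolerance plus a relative tolerance scaled by a reference norm
// (for example the norm of the right-hand side).
inline bool within_tolerance(double l2_residual,
                             double l2_reference_norm,
                             double abs_tol,
                             double rel_tol)
{
  assert(abs_tol >= 0.0 && rel_tol >= 0.0);
  return l2_residual <= abs_tol + rel_tol * l2_reference_norm;
}
```

Under this assumption, a relative tolerance of 1e-12 contributes almost nothing unless the reference norm is very large, so the check is effectively governed by the 1e-6 absolute tolerance.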

Comment thread cpp/src/pdlp/solve.cu
}
// Step size doesn't change anyways, just to save the compute
if (original_settings.get_initial_step_size().has_value()) {
batch_settings.set_initial_step_size(original_settings.get_initial_step_size().value());

Contributor

Can't we set different initial step sizes and weights for different climbers?

Contributor Author

That's a research question. For now we use the same warm-start information across climbers.

Contributor

For MIP use cases, it might make sense to pass different warm starts, but let's keep it for another PR.

Comment thread cpp/src/pdlp/solve.cu
// Since we decerement iteratively, we don't want to use std::numeric_limits<size_t>::max()
// Even if 20K fits in memory it will never be an optimal batch size, it's just to have a
// reasonable upper bound
constexpr size_t max_batch_size = 20000;

Contributor

Maybe we should put that into a global config or some user-facing API?

Contributor Author

Yes, I have to think about that

settings.dual_postsolve = false;
for (auto [presolver, epsilon] :
- {std::pair{presolver_t::Papilo, 1e-1}, std::pair{presolver_t::None, 1e-6}}) {
+ {std::pair{presolver_t::Papilo, 1e-1}, std::pair{presolver_t::None, 1e-4}}) {

Contributor

Why the tolerance change?

… and nothing else. Correctly rejects save_best_primal_so_far and batch mode, check sizes in the expanded problem case.
@Kh4ster
Contributor Author

Kh4ster commented Apr 29, 2026

/ok to test b0a39d7

@Kh4ster
Contributor Author

Kh4ster commented Apr 29, 2026

/ok to test a077f43

@Kh4ster Kh4ster requested a review from a team as a code owner April 29, 2026 16:24
@Kh4ster Kh4ster requested a review from AyodeAwe April 29, 2026 16:24
@Kh4ster
Contributor Author

Kh4ster commented Apr 29, 2026

/ok to test af4f5fb

@Kh4ster
Contributor Author

Kh4ster commented Apr 29, 2026

/ok to test b747ebe

@Kh4ster
Contributor Author

Kh4ster commented Apr 29, 2026

/ok to test 31deda0

@Kh4ster
Contributor Author

Kh4ster commented Apr 29, 2026

/ok to test efa4b18

If/when we add batch PDLP to grpc, we'll include all of the
relevant parameters including this one. For now, grpc handles
only single LP/MIP problems, there is no batch capability.
Added a note to the .proto file explaining why batch parameters
are missing.
@tmckayus
Contributor

/ok to test 86d1c18

@anandhkb anandhkb added this to the 26.06 milestone May 5, 2026
@Kh4ster
Contributor Author

Kh4ster commented May 11, 2026

/ok to test 35c3c61

@Kh4ster
Contributor Author

Kh4ster commented May 11, 2026

/ok to test 6b1bb48

@Kh4ster
Contributor Author

Kh4ster commented May 11, 2026

/ok to test 9bfa8c9

@Kh4ster
Contributor Author

Kh4ster commented May 11, 2026

/merge

@rapids-bot rapids-bot Bot merged commit 322740a into main May 11, 2026
307 of 312 checks passed
@Kh4ster Kh4ster deleted the extend_batch_pdlp_support branch May 11, 2026 14:25
chris-maes pushed a commit to chris-maes/cuopt that referenced this pull request May 18, 2026
This PR greatly extends the capabilities of batch PDLP. Previously, batch PDLP only supported a single variable bound differing per climber. It now supports:
- Different constraint lower and upper bounds per climber
- Different objective coefficients per climber
- Different objective offset per climber
- More than one variable bound difference per climber

This PR also adds support for per climber residual and first primal feasible to the Stable3 PDLP solver mode and its batch version. This makes it possible to solve a batch of problems and stop once one or all of the climbers have reached primal feasibility.

All of these options can be combined, for example:

Solve a batch of LPs that all have different constraint lower and upper bounds, objective coefficients, objective offsets, and variable bounds, using the per constraint residual instead of the L2 norm and stopping once one, or all, climbers have reached primal feasibility.

Authors:
  - Nicolas Blin (https://github.com/Kh4ster)
  - Trevor McKay (https://github.com/tmckayus)

Approvers:
  - Akif ÇÖRDÜK (https://github.com/akifcorduk)
  - Trevor McKay (https://github.com/tmckayus)

URL: NVIDIA#1152
Labels

feature request New feature or request non-breaking Introduces a non-breaking change pdlp

4 participants