Extend batch pdlp support by Kh4ster · Pull Request #1152 · NVIDIA/cuopt

Kh4ster · 2026-04-28T08:48:05Z

This PR greatly extends the capabilities of batch PDLP. Previously, batch PDLP only supported a single variable bound differing per climber. It now supports:

Different constraint lower and upper bounds per climber
Different objective coefficients per climber
Different objective offset per climber
More than one variable bound difference per climber

This PR also adds support for per-climber residuals and first-primal-feasible termination to the Stable3 PDLP solver mode and its batch version. This makes it possible to solve a batch of problems and stop once one or all of the climbers have reached primal feasibility.

All of these options can be combined, for example:

Solve a batch of LPs that all differ in constraint lower and upper bounds, objective coefficients, objective offset, and variable bounds, using the per-constraint residual instead of the L2 norm, and stopping once one, or all, climbers have reached primal feasibility.
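As a rough illustration of the two early-stop modes described above, the host-side sketch below decides whether a batch may stop from one feasibility flag per climber. The struct and function names are hypothetical and not part of the cuOpt API; only the setting names `first_primal_feasible` and `all_primal_feasible`, and the fact that they are mutually exclusive, come from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the two new solver settings.
struct early_stop_settings {
  bool first_primal_feasible = false;
  bool all_primal_feasible   = false;
};

// Returns true when the batch may stop early, given one flag per climber.
inline bool should_stop_early(const early_stop_settings& settings,
                              const std::vector<bool>& climber_is_primal_feasible)
{
  // The two modes are described as mutually exclusive.
  assert(!(settings.first_primal_feasible && settings.all_primal_feasible));
  const auto is_set = [](bool flag) { return flag; };
  if (settings.first_primal_feasible) {
    return std::any_of(
      climber_is_primal_feasible.begin(), climber_is_primal_feasible.end(), is_set);
  }
  if (settings.all_primal_feasible) {
    return !climber_is_primal_feasible.empty() &&
           std::all_of(
             climber_is_primal_feasible.begin(), climber_is_primal_feasible.end(), is_set);
  }
  return false;  // neither mode enabled: keep iterating until the usual criteria
}
```

With neither flag set, the helper never requests an early stop, which matches the default behaviour of running until the regular termination criteria are met.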

…ve offset, new api to support first querying the size before expanding the problem

…primal feasible, in both stable3 mode and batch mode. it can work together along with potential problem modifications

copy-pr-bot · 2026-04-28T08:48:09Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Kh4ster · 2026-04-28T08:53:20Z

/ok to test

copy-pr-bot · 2026-04-28T08:53:23Z

/ok to test

@Kh4ster, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

coderabbitai · 2026-04-28T09:00:26Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds fixed and auto batch PDLP execution paths with per-climber objective offsets, per-climber rescaling and convergence statistics, new solver settings (first/all_primal_feasible, fixed_batch_size), batch-safe swap/resize/transpose, many GPU stream synchronizations, and expanded tests for batch behaviors and validation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Batch Objective Offsets & Problem API `cpp/include/cuopt/linear_programming/optimization_problem.hpp`, `cpp/src/pdlp/optimization_problem.cu` | Add `set_batch_objective_offsets()` and const/non-const getters; store `batch_objective_offsets_`; include in copy/equality/precision-conversion logic. |
| Solver Settings & Proto `cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp`, `cpp/src/grpc/cuopt_remote.proto` | Add `first_primal_feasible` and `all_primal_feasible`, rename `sub_batch_size` → `fixed_batch_size`; expose `all_primal_feasible` in proto. |
| Batch Entry Points & Batch-size API `cpp/src/pdlp/solve.cuh`, `cpp/src/pdlp/solve.cu` | Add `run_batch_pdlp` and `compute_optimal_batch_size`; implement fixed-path (pre-expanded per-climber fields) vs splitting/strong-branching path; centralize memory estimation and batch-size selection; validate fixed-batch sizing and `batch_objective_offsets`. |
| PDLP Core: lifecycle, transpose, swap/resize `cpp/src/pdlp/pdlp.cu`, `cpp/src/pdlp/pdlp.cuh`, `cpp/src/pdlp/pdhg.cu` | Introduce `original_batch_size_`, batch snapshot/finalize helpers, `transpose_problem_fields(to_row)`, extend swap/resize to handle per-climber objective/constraint arrays, and adapt PDHG kernels/indexing for per-climber layouts. |
| Convergence, Restart & Termination `cpp/src/pdlp/termination_strategy/*.cu`, `cpp/src/pdlp/termination_strategy/*.hpp`, `cpp/src/pdlp/restart_strategy/*` | Convert convergence stats from scalars → per-climber device_uvector/span (L2/L∞, offsets); add getters; update restart views; add `accept_primal_feasible` gating, `any_primal_feasible_or_optimal()`, and per-climber GPU stats snapshot/finalize paths. |
| Initial Scaling & Rescaling Contexts `cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu`, `cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cuh` | Switch bound/objective rescaling from scalars → per-batch vectors; add `original_batch_size` ctor param; add `get_*_rescaling_vector()`, `swap_context()`, `resize_context()`, and device/host swap kernels. |
| Problem Validation & Helpers `cpp/src/mip_heuristics/problem/problem.cu`, `cpp/src/mip_heuristics/problem/problem_helpers.cuh`, `cpp/src/pdlp/utils.cuh` | Relax sizing checks to allow batch-expanded vectors (divisibility), fix `variable_types` allocation to use `n_variables`, refactor `combine_constraint_bounds`/`compute_sum_bounds` to avoid scalar readbacks and support per-climber reductions. |
| PDLP Numerical Kernels & Utils `cpp/src/pdlp/*.cu`, `cpp/src/pdlp/*.cuh` (multiple files) | Update many kernels/functors to accept flags for per-climber objectives/constraints and index accordingly; replace CUB segmented reduces with segmented_sum_handler; add stream synchronizations after device casts/sets. |
| GPU Syncs & Sort Utilities `cpp/src/mip_heuristics/utilities/sort_csr.cuh`, `cpp/src/mip_heuristics/problem/problem.cu`, various PDLP files | Add explicit stream synchronizations after key device copies/casts/sorts to ensure ordering when interleaving device operations. |
| Warm-start / Tolerances `cpp/src/branch_and_bound/pseudo_costs.cpp` | Relax the PDLP warm-start tolerance from 1e-5 → 1e-4. |
| Tests & Test Utilities `cpp/tests/linear_programming/pdlp_test.cu`, `cpp/tests/linear_programming/utilities/pdlp_test_utilities.cuh` | Add extensive batch/fixed-batch and early-termination tests, new helpers to run fixed/pre-expanded batch solves, adjust tolerances and residual accessors, and add negative validation tests for inconsistent per-climber expansion. |
| Misc. API/Instantiations `cpp/src/pdlp/solve.cu` (instantiations), `cpp/src/pdlp/solve.cuh` | Template-instantiation and forward-declaration updates for new batch helpers; minor signature changes to support per-climber data views and batch helpers. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 3.42% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Extend batch pdlp support' clearly and concisely summarizes the main change: extending the capabilities of batch PDLP solver functionality. |
| Description check | ✅ Passed | The description is detailed and directly related to the changeset, explaining multiple new per-climber capabilities including different constraints, objectives, offsets, variable bounds, residual support, and stopping behaviors. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu (1)
116-157: ⚠️ Potential issue | 🟠 Major

Don't derive batch-wide rescaling from only the first climber.

bound_rescaling_ and objective_rescaling_ are still single scalars, but this change now computes them from only n_constraints / n_variables starting at offset 0. In batch mode with per-climber bounds or objective coefficients, later climbers are ignored here and then rescaled anyway in Lines 515-559, so the scaling becomes batch-order-dependent and can badly mis-scale heterogeneous batches.

Please either compute these scalars from all climbers (for example, a max over per-climber norms/sums) or make the rescaling itself per-climber. As per coding guidelines, flag algorithm correctness errors including incorrect constraint/objective computation, and numerical stability issues.
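The first option suggested above (one scalar derived from all climbers) could look roughly like the host-side sketch below. It assumes a climber-contiguous layout and uses a plain L2 norm instead of the weighted norm in the real code; `per_climber_l2_norms` and `batch_max_norm` are hypothetical helper names, not cuOpt functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed layout: `values` stacks one block of `per_climber_size` entries per climber.
inline std::vector<double> per_climber_l2_norms(const std::vector<double>& values,
                                                std::size_t per_climber_size)
{
  assert(per_climber_size > 0 && values.size() % per_climber_size == 0);
  const std::size_t n_climbers = values.size() / per_climber_size;
  std::vector<double> norms(n_climbers, 0.0);
  for (std::size_t c = 0; c < n_climbers; ++c) {
    double sum_sq = 0.0;
    for (std::size_t j = 0; j < per_climber_size; ++j) {
      const double v = values[c * per_climber_size + j];
      sum_sq += v * v;
    }
    norms[c] = std::sqrt(sum_sq);
  }
  return norms;
}

// Reduce the per-climber norms to a single scalar that does not depend on batch order.
inline double batch_max_norm(const std::vector<double>& norms)
{
  assert(!norms.empty());
  return *std::max_element(norms.begin(), norms.end());
}
```

Keeping the per-climber vector around instead of reducing it corresponds to the second option, per-climber rescaling.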
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu` around lines 116 -
157, The batch-aware scaling is wrong because bound_rescaling_ and
objective_rescaling_ are computed only over the first climber's
n_constraints/n_variables (using TransformReduce and detail::my_l2_weighted_norm
on op_problem_scaled_ starting at offset 0), making scaling batch-order
dependent; fix by aggregating across all climbers instead of just the first:
compute per-climber rescalings (e.g., run the TransformReduce /
my_l2_weighted_norm for each climber or compute per-climber norms and take a
batch-wise aggregator such as max across climbers) and then either store
per-climber rescaling factors or reduce them into a single safe scalar (e.g.,
max of per-climber values) before calling bound_rescaling_.set_value_async and
setting objective_rescaling_; update uses of bound_rescaling_ and
objective_rescaling_ to apply per-climber values if you choose per-climber
rescaling.

🧹 Nitpick comments (2)

run_obbt.sh (1)

2-2: Update line 2 to match repository convention: set -euo pipefail instead of set -e.

The repository standardizes on set -euo pipefail across 35+ shell scripts. This script should be updated to include both -u (to catch unset variables) and -o pipefail (to propagate pipe failures).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@run_obbt.sh` at line 2, Update the shell options to match repo convention by
replacing the current "set -e" invocation with the stricter form that enables
unset-variable checks and pipe-failure propagation; specifically change the
shell options used in run_obbt.sh (the existing set -e line) to use set -euo
pipefail so the script fails on errors, missing variables, and failed pipeline
components.

cpp/src/pdlp/pdlp.cu (1)

2113-2148: Avoid repeated device allocations in transpose_problem_fields hot path.

At Line 2115, each field transpose allocates a new rmm::device_uvector. This function is called repeatedly in the main solve loop, so this adds avoidable allocation churn.

Refactor sketch (single scratch buffer per call)

 void pdlp_solver_t<i_t, f_t>::transpose_problem_fields(bool to_row)
 {
+  const size_t max_field_size = std::max(
+    {op_problem_scaled_.objective_coefficients.size(),
+     op_problem_scaled_.constraint_lower_bounds.size(),
+     op_problem_scaled_.constraint_upper_bounds.size()});
+  rmm::device_uvector<f_t> transposed(max_field_size, stream_view_);
+
   auto transpose_field = [&](rmm::device_uvector<f_t>& field, i_t rows) {
     if (field.size() <= static_cast<size_t>(rows)) return;
-    rmm::device_uvector<f_t> transposed(field.size(), stream_view_);
     if (to_row) {
       ...
-      CUBLAS_CHECK(cublasGeam<f_t>(..., transposed.data(), ...));
+      CUBLAS_CHECK(cublasGeam<f_t>(..., transposed.data(), ...));
     } else {
       ...
-      CUBLAS_CHECK(cublasGeam<f_t>(..., transposed.data(), ...));
+      CUBLAS_CHECK(cublasGeam<f_t>(..., transposed.data(), ...));
     }
     raft::copy(field.data(), transposed.data(), field.size(), stream_view_);
   };

As per coding guidelines, "Flag excessive allocations in hot paths; prefer pooled RMM resources."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 2113 - 2148, The lambda transpose_field
currently allocates a fresh rmm::device_uvector<f_t> named transposed on every
call, causing hot-path allocation churn; move the allocation out of the lambda
in transpose_problem_fields and provide a single scratch
rmm::device_uvector<f_t> that is reused for each field (resize or reserve it if
needed), then have transpose_field capture or accept a reference to that
preallocated buffer and write into it before calling raft::copy; keep existing
cublasGeam calls and scalar buffers (reusable_device_scalar_value_1_,
reusable_device_scalar_value_0_) but remove per-call new allocations to use the
pooled scratch buffer.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/grpc/cuopt_remote.proto`:
- Line 125: The new protobuf field all_primal_feasible must be wired through the
gRPC settings mapper: update cpp/src/grpc/grpc_settings_mapper.cpp to map the
proto field to the corresponding solver setting when translating incoming
requests (alongside the existing first_primal_feasible mapping) and include it
in the reverse mapping for responses; ensure you reference the proto name
all_primal_feasible and the internal setting used in solver_settings.hpp so the
mutual-exclusion validation path that currently checks first_primal_feasible is
extended/kept in sync to treat all_primal_feasible as mutually exclusive with
first_primal_feasible during validation.

In `@cpp/src/mip_heuristics/problem/problem.cu`:
- Around line 571-572: The modulo assertions currently divide by zero for empty
problems; update the checks so the modulus is only computed when n_variables > 0
(e.g., wrap the objective_coefficients.size() % n_variables assertion in a guard
that first asserts or branches on n_variables > 0), remove the redundant modulo
check against variable_bounds.size() once you already require
variable_bounds.size() == n_variables, and ensure that an empty
objective_coefficients vector cannot silently pass validation for a non-empty
problem by explicitly asserting consistency between
objective_coefficients.empty() and n_variables == 0 (or, when n_variables > 0,
that objective_coefficients.size() is a multiple of n_variables). Reference the
symbols objective_coefficients, variable_bounds, n_variables, and the
cuopt_assert calls to locate and fix the checks.

In `@cpp/src/pdlp/pdlp.cu`:
- Around line 878-885: The check for first_primal_feasible is using
current_termination_strategy_.any_done(/*accept_primal_feasible=*/true) which
treats non-primal-feasible terminal states (PrimalInfeasible, DualInfeasible,
ConcurrentLimit) as done; change the condition to explicitly detect a
PrimalFeasible termination only (e.g., iterate
current_termination_strategy_.get_terminations_status() and test for status ==
TerminationStatus::PrimalFeasible or add/use a helper like
any_primal_feasible()) so snapshot_climber_into_return(i) and
finalize_batch_return() only run when a climber reached PrimalFeasible, not any
terminal state.
- Around line 813-816: convert_gpu_terms_stats_to_host currently copies GPU
termination arrays using the loop index i as the source, but
fill_gpu_terms_stats_kernel wrote into GPU buffers at positions given by
original_indices (original_index_); update convert_gpu_terms_stats_to_host to
remap reads by using original_index_[i] as the source index when copying each
GPU stat array into the host AdditionalTerminationInformations (the same
destination used in
batch_solution_to_return_.get_additional_termination_informations()), so host
fields receive values from the GPU positions written by
fill_gpu_terms_stats_kernel.

In `@cpp/src/pdlp/solve.cu`:
- Around line 1149-1153: The call to solve_lp for the sub-batch is incorrectly
passing is_batch_mode=false which bypasses batch-only guards in run_pdlp; update
the invocation in solve.cu (the solve_lp(...) call) to pass
/*is_batch_mode=*/true when batch_settings.new_bounds contains multiple climbers
(i.e., when dispatching a batched PDLP split), so the batch-only validation and
execution paths in run_pdlp() are exercised for float/single/mixed-precision
checks.
- Around line 997-1008: run_batch_pdlp_fixed currently bypasses problem checking
and can underflow device reads if per-climber arrays were partially expanded;
before calling solve_lp(...) add explicit validation that any caller-expanded
per-climber buffers (e.g., objective_coefficients, constraint bound arrays in
the problem) have lengths >= fixed_batch_size * n (compute n from
problem.dimensions) and match expected per-climber sizing derived from
pdlp_solver_settings_t and apply_batch_settings_overrides; on violation call
cuopt_expects(..., error_type_t::ValidationError, "...") to fail-fast. Ensure
these checks live in run_batch_pdlp_fixed just after
apply_batch_settings_overrides and before the solve_lp call so the fixed path
never skips validation when buffers are not fully expanded.

In `@cpp/src/pdlp/termination_strategy/convergence_information.cu`:
- Around line 346-391: The setters set_relative_dual_tolerance_factor and
set_relative_primal_tolerance_factor incorrectly overwrite the cached L2 norm
vectors (l2_norm_primal_linear_objective_ and l2_norm_primal_right_hand_side_)
which are later used as denominators in the relative residual computations;
instead add and store separate tolerance-factor members (e.g.
relative_dual_tolerance_factor_ and relative_primal_tolerance_factor_, as host
scalars or device scalars consistent with existing usage) and update the setters
to write those members, leaving l2_norm_primal_linear_objective_ and
l2_norm_primal_right_hand_side_ unchanged; adjust any getters or
relative-residual computation code (referencing get_relative_l2_*_residual_value
or other callers) to read the new tolerance-factor members when applying
scaling.
- Around line 470-489: The relative_residual_t functor used in the
segmented_reduce_helper is constructed without the absolute_primal_tolerance and
the nb_violated_constraints_ counter, so when save_best_primal_so_far is enabled
the counter remains zero for every iteration; update the construction of
relative_residual_t in this block (the transform_iter passed to
segmented_reduce_helper) to pass settings.tolerances.absolute_primal_tolerance
and nb_violated_constraints_.data() (or a device pointer/wrapper) so the functor
can increment the violated-row counter, ensuring to_primal_quality_adapter sees
correct violated-constraint counts; keep the existing zeroing of
nb_violated_constraints_ when save_best_primal_so_far is true.

In `@run_obbt.sh`:
- Around line 31-36: The failure counter increment uses "((failed++))" which can
return a nonzero exit status under set -e and abort the script; replace that
with a safe increment such as "failed=$((failed+1))" (or "failed=$((failed +
1))") inside the for loop that iterates "for pid in \"${pids[@]}\"; do" so the
script continues waiting on remaining PIDs and correctly reports the aggregate
failure count.
- Around line 4-8: The cleanup() function currently kills processes via kill --
-$$ which assumes the shell PID equals the process-group ID; replace this with
signaling the tracked child PIDs (the pids array used when launching batches) or
ensure each batch is started in its own process group/session and record those
PGIDs, then iterate over "${pids[@]}" (or recorded PGIDs) to send termination
signals and wait; update cleanup() to use the pids variable (or recorded PGIDs)
instead of kill -- -$$ and keep the wait 2>/dev/null behavior for cleanup
confirmation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6dd9d46b-dba3-4234-a599-4f7dbf9b77cb

📥 Commits

Reviewing files that changed from the base of the PR and between fd40c9a and 72d7664.

📒 Files selected for processing (23)

cpp/include/cuopt/linear_programming/optimization_problem.hpp
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
cpp/src/branch_and_bound/pseudo_costs.cpp
cpp/src/grpc/cuopt_remote.proto
cpp/src/mip_heuristics/problem/problem.cu
cpp/src/mip_heuristics/problem/problem_helpers.cuh
cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
cpp/src/pdlp/optimization_problem.cu
cpp/src/pdlp/pdhg.cu
cpp/src/pdlp/pdlp.cu
cpp/src/pdlp/pdlp.cuh
cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cuh
cpp/src/pdlp/solve.cu
cpp/src/pdlp/solve.cuh
cpp/src/pdlp/termination_strategy/convergence_information.cu
cpp/src/pdlp/termination_strategy/convergence_information.hpp
cpp/src/pdlp/termination_strategy/termination_strategy.cu
cpp/src/pdlp/termination_strategy/termination_strategy.hpp
cpp/src/pdlp/utils.cuh
cpp/tests/linear_programming/pdlp_test.cu
cpp/tests/linear_programming/utilities/pdlp_test_utilities.cuh
run_obbt.sh

coderabbitai · 2026-04-28T09:00:29Z

  bool save_best_primal_so_far = 28;
  bool first_primal_feasible = 29;
  int32 pdlp_precision = 30;
+  bool all_primal_feasible = 31;


⚠️ Potential issue | 🟠 Major

Wire the new proto field through the gRPC settings mapper.

Adding the field here is not enough yet: the provided snippets for cpp/src/grpc/grpc_settings_mapper.cpp only map first_primal_feasible, so all_primal_feasible will be ignored on requests and omitted on responses.

Suggested follow-up in cpp/src/grpc/grpc_settings_mapper.cpp

  pb_settings->set_save_best_primal_so_far(settings.save_best_primal_so_far);
  pb_settings->set_first_primal_feasible(settings.first_primal_feasible);
+ pb_settings->set_all_primal_feasible(settings.all_primal_feasible);

  settings.save_best_primal_so_far = pb_settings.save_best_primal_so_far();
  settings.first_primal_feasible = pb_settings.first_primal_feasible();
+ settings.all_primal_feasible = pb_settings.all_primal_feasible();

Given solver_settings.hpp also marks this as mutually exclusive with first_primal_feasible, keep the existing validation path in sync too.

@tmckayus I would like your opinion here please

Thanks @Kh4ster. I took a closer look, and currently we do not support batch PDLP over gRPC, only single problems. If/when we decide to support batch PDLP over gRPC, we'll add it all together (there are other settings we aren't passing, and we need a message scheme to handle the multiple problems and results, etc.).

So, I removed the new proto mapping and added a comment.

coderabbitai · 2026-04-28T09:00:30Z

 void convergence_information_t<i_t, f_t>::set_relative_dual_tolerance_factor(
  f_t dual_tolerance_factor)
 {
-  l2_norm_primal_linear_objective_.set_value_async(dual_tolerance_factor, stream_view_);
+  cub::DeviceTransform::Transform(thrust::make_constant_iterator(dual_tolerance_factor),
+                                  l2_norm_primal_linear_objective_.data(),
+                                  l2_norm_primal_linear_objective_.size(),
+                                  cuda::std::identity{},
+                                  stream_view_);
 }

 template <typename i_t, typename f_t>
 void convergence_information_t<i_t, f_t>::set_relative_primal_tolerance_factor(
  f_t primal_tolerance_factor)
 {
-  l2_norm_primal_right_hand_side_.set_value_async(primal_tolerance_factor, stream_view_);
+  cub::DeviceTransform::Transform(thrust::make_constant_iterator(primal_tolerance_factor),
+                                  l2_norm_primal_right_hand_side_.data(),
+                                  l2_norm_primal_right_hand_side_.size(),
+                                  cuda::std::identity{},
+                                  stream_view_);
 }

 template <typename i_t, typename f_t>
 f_t convergence_information_t<i_t, f_t>::get_relative_dual_tolerance_factor() const
 {
-  return l2_norm_primal_linear_objective_.value(stream_view_);
+  return l2_norm_primal_linear_objective_.element(0, stream_view_);
 }

 template <typename i_t, typename f_t>
 f_t convergence_information_t<i_t, f_t>::get_relative_primal_tolerance_factor() const
 {
-  return l2_norm_primal_right_hand_side_.value(stream_view_);
+  return l2_norm_primal_right_hand_side_.element(0, stream_view_);
+}
+
+template <typename i_t, typename f_t>
+const rmm::device_uvector<f_t>&
+convergence_information_t<i_t, f_t>::get_l2_norm_primal_linear_objective() const
+{
+  return l2_norm_primal_linear_objective_;
+}
+
+template <typename i_t, typename f_t>
+const rmm::device_uvector<f_t>&
+convergence_information_t<i_t, f_t>::get_l2_norm_primal_right_hand_side() const
+{
+  return l2_norm_primal_right_hand_side_;
 }


⚠️ Potential issue | 🟠 Major

Keep tolerance factors separate from the cached L2 norms.

set_relative_dual_tolerance_factor() and set_relative_primal_tolerance_factor() now overwrite l2_norm_primal_linear_objective_ and l2_norm_primal_right_hand_side_. Those same vectors are later used as the denominators in get_relative_l2_*_residual_value(), so any caller of these setters silently changes convergence scaling for every climber instead of just storing a factor.

As per coding guidelines, flag algorithm correctness errors including incorrect constraint/objective computation.
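One possible shape for the separation requested above is sketched below in host-only C++. The class and its members are hypothetical, and the relative residual is simplified to a plain division by the cached per-climber norm; the point is only that the setter writes its own member and leaves the cached norms untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: tolerance factor stored apart from the cached per-climber norms.
class convergence_info_sketch {
 public:
  explicit convergence_info_sketch(std::vector<double> l2_norm_rhs_per_climber)
    : l2_norm_primal_right_hand_side_(std::move(l2_norm_rhs_per_climber))
  {
  }

  // Stores the factor only; the cached norms are not modified.
  void set_relative_primal_tolerance_factor(double factor)
  {
    assert(factor > 0.0);
    relative_primal_tolerance_factor_ = factor;
  }

  double get_relative_primal_tolerance_factor() const
  {
    return relative_primal_tolerance_factor_;
  }

  // Simplified relative residual: absolute residual divided by the climber's cached norm.
  double relative_primal_residual(std::size_t climber, double absolute_residual) const
  {
    assert(climber < l2_norm_primal_right_hand_side_.size());
    assert(l2_norm_primal_right_hand_side_[climber] > 0.0);
    return absolute_residual / l2_norm_primal_right_hand_side_[climber];
  }

 private:
  std::vector<double> l2_norm_primal_right_hand_side_;
  double relative_primal_tolerance_factor_ = 1.0;
};
```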

Kh4ster · 2026-04-28T09:06:51Z

/ok to test

copy-pr-bot · 2026-04-28T09:06:56Z

/ok to test

@Kh4ster, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

Kh4ster · 2026-04-28T09:07:30Z

/ok to test a7944c8

akifcorduk

Thanks for the great work Nicolas!

In your tests, can you add more tests with randomized bounds? You do a lot of index indirection, and we might miss a bug if we always check, for example, only the first batch's bounds. It would be nice to have a test with randomized bounds, objectives, etc., where at the end the result of each batch is the same as a single run of PDLP.
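The pattern being requested can be sketched with a toy problem in place of PDLP: generate random per-climber data, run a batched routine that uses flat indexing, and compare every climber against an independent single run. Everything here is hypothetical; the "solver" is a trivial box-constrained LP (minimize c·x subject to l ≤ x ≤ u), chosen only so the comparison is exact.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Toy single solve: each variable goes to the bound that minimizes its cost term.
inline std::vector<double> solve_single_toy(const std::vector<double>& c,
                                            const std::vector<double>& l,
                                            const std::vector<double>& u)
{
  assert(c.size() == l.size() && c.size() == u.size());
  std::vector<double> x(c.size());
  for (std::size_t j = 0; j < c.size(); ++j) { x[j] = (c[j] >= 0.0) ? l[j] : u[j]; }
  return x;
}

// Toy batched solve over stacked, climber-contiguous arrays using one flat index.
inline std::vector<double> solve_batch_toy(const std::vector<double>& c,
                                           const std::vector<double>& l,
                                           const std::vector<double>& u,
                                           std::size_t n_vars)
{
  assert(n_vars > 0 && c.size() % n_vars == 0);
  assert(c.size() == l.size() && c.size() == u.size());
  std::vector<double> x(c.size());
  for (std::size_t idx = 0; idx < c.size(); ++idx) { x[idx] = (c[idx] >= 0.0) ? l[idx] : u[idx]; }
  return x;
}

// Randomized differential check: every climber of the batch must match its single run.
inline bool batch_matches_single_runs(unsigned seed, std::size_t n_climbers, std::size_t n_vars)
{
  std::mt19937 rng(seed);
  std::uniform_real_distribution<double> coeff(-1.0, 1.0), lower(-10.0, 0.0), width(0.0, 10.0);
  const std::size_t total = n_climbers * n_vars;
  std::vector<double> c(total), l(total), u(total);
  for (std::size_t i = 0; i < total; ++i) {
    c[i] = coeff(rng);
    l[i] = lower(rng);
    u[i] = l[i] + width(rng);
  }
  const std::vector<double> batch = solve_batch_toy(c, l, u, n_vars);
  for (std::size_t k = 0; k < n_climbers; ++k) {
    const auto first = static_cast<std::ptrdiff_t>(k * n_vars);
    const auto last  = static_cast<std::ptrdiff_t>((k + 1) * n_vars);
    const std::vector<double> ck(c.begin() + first, c.begin() + last);
    const std::vector<double> lk(l.begin() + first, l.begin() + last);
    const std::vector<double> uk(u.begin() + first, u.begin() + last);
    const std::vector<double> single = solve_single_toy(ck, lk, uk);
    for (std::size_t j = 0; j < n_vars; ++j) {
      if (single[j] != batch[k * n_vars + j]) { return false; }
    }
  }
  return true;
}
```

In a real test the toy routines would be replaced by the batch and single PDLP entry points, and the exact comparison by a tolerance-based one.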

akifcorduk · 2026-04-28T11:12:51Z

      termination_status[idx] = (i_t)pdlp_termination_status_t::PrimalFeasible;
      return;
+    } else {
+      termination_status[idx] = (i_t)pdlp_termination_status_t::NoTermination;


Why is this now necessary?

If you are using per_constraint_residual and neither first_primal_feasible nor all_primal_feasible, an instance may become PrimalFeasible and then lose feasibility, so we have to set its status back to NoTermination. This is not the case for Optimal, because a climber is always removed from the batch (or the solve ends, in the non-batch case) once it is found optimal.
I do the same in the non-per_constraint_residual path.
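The behaviour described in this reply can be mimicked with a small host-side sketch. The enum and function are hypothetical stand-ins for the kernel logic: a climber's status follows its current feasibility, so a climber that was PrimalFeasible on an earlier check is reset to NoTermination if it no longer is, while Optimal is never overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical subset of the termination statuses mentioned in the thread.
enum class status_sketch { NoTermination, PrimalFeasible, Optimal };

// Refreshes per-climber statuses from the current feasibility check.
inline void refresh_statuses(std::vector<status_sketch>& statuses,
                             const std::vector<bool>& feasible_now)
{
  assert(statuses.size() == feasible_now.size());
  for (std::size_t i = 0; i < statuses.size(); ++i) {
    // Optimal climbers leave the batch, so their status is left alone here.
    if (statuses[i] == status_sketch::Optimal) { continue; }
    statuses[i] = feasible_now[i] ? status_sketch::PrimalFeasible
                                  : status_sketch::NoTermination;
  }
}
```

Without the reset branch, a stale PrimalFeasible from an earlier iteration would survive even after the iterate drifted out of feasibility.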

akifcorduk · 2026-04-28T11:14:50Z

+  // Swap per-climber scaled problem fields (objectives, constraint bounds) — all in COL-major
+  // during the convergence block when swap_context is invoked.
+  if (problem_ptr->objective_coefficients.size() > static_cast<size_t>(primal_size_h_)) {
+    matrix_swap(problem_ptr->objective_coefficients, primal_size_h_, swap_pairs);


I am not sure I understand the matrix_swap logic.

When a climber is removed from the batch, all its associated data needs to be removed so that the rest of the solve can continue. Many of those fields are lists of vectors stacked next to each other, i.e. a dense matrix. In that case I need to identify all the slices to move around in the matrix and swap them accordingly.
We can now have a matrix of objective offsets, so if the associated climber is removed, we need to remove its slice. That being said, we can be in batch mode without having a matrix of objective_coefficients, hence the if.
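
A minimal host-side sketch of that slice swap, under stated assumptions: the name swap_slices is hypothetical, and each climber is assumed to own one contiguous slice of slice_size elements. The real matrix_swap operates on device memory and its layout may differ. The size guard mirrors the if in the diff: a field shared by all climbers holds a single slice and is left alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed layout: climber c owns elements [c * slice_size, (c + 1) * slice_size).
inline void swap_slices(std::vector<double>& field,
                        std::size_t slice_size,
                        const std::vector<std::pair<std::size_t, std::size_t>>& swap_pairs)
{
  // A field shared by all climbers holds a single slice: nothing to move.
  if (field.size() <= slice_size) { return; }
  for (const auto& [a, b] : swap_pairs) {
    if (a == b) { continue; }
    assert((a + 1) * slice_size <= field.size());
    assert((b + 1) * slice_size <= field.size());
    // Exchange the two climbers' slices element by element.
    std::swap_ranges(field.begin() + a * slice_size,
                     field.begin() + (a + 1) * slice_size,
                     field.begin() + b * slice_size);
  }
}
```

Once the removed climber's slice has been swapped to the end, the caller can shrink the buffer by slice_size to drop it.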

akifcorduk · 2026-04-28T11:20:31Z

+                          cuda::std::min<f_t>(tmp, -constraint_lower_bounds[bound_idx]));
    const f_t next_dual = (tmp - tmp_proj) * step_size;

    potential_next_dual[idx] = next_dual;


Why is the dual not indexed the same way as the constraint bounds? Also, we can just use idx in constraint_upper_bounds. For a single climber, it will be equal to constraint_idx anyway, which is equal to idx. For the batched case, it is also equal to idx all the time. So there is no need for an extra bound_idx.

We can have a batch case where all climbers have the same constraint_lower_bounds. Then we have to use constraint_idx and not global idx.

Oh I see, thanks!
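
The indexing rule from this exchange can be sketched as below. The flat layout idx = climber * n_constraints + constraint_idx is an assumption made for illustration; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Picks the index into the constraint-bound arrays for a flat dual index.
// Assumed layout: idx = climber * n_constraints + constraint_idx.
inline std::size_t bound_index(std::size_t idx,
                               std::size_t n_constraints,
                               bool per_climber_constraints)
{
  assert(n_constraints > 0);
  const std::size_t constraint_idx = idx % n_constraints;
  // Shared bounds: one array of n_constraints entries reused by every climber.
  // Per-climber bounds: one entry per (climber, constraint), so use the flat idx.
  return per_climber_constraints ? idx : constraint_idx;
}
```

With 3 constraints, flat index 5 (climber 1, constraint 2) reads entry 2 when bounds are shared and entry 5 when each climber has its own bounds.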

akifcorduk · 2026-04-28T11:21:45Z

    cuopt_assert(step_size > f_t(0.0), "dual_step_size must be > 0");

-    const f_t tmp = current_dual / step_size - dual_gradient[idx];
+    const int bound_idx = per_climber_constraints ? idx : constraint_idx;


Same remark as above

akifcorduk · 2026-04-28T11:32:15Z

+  auto transpose_field = [&](rmm::device_uvector<f_t>& field, i_t rows) {
+    if (field.size() <= static_cast<size_t>(rows)) return;
+    rmm::device_uvector<f_t> transposed(field.size(), stream_view_);
+    if (to_row) {


Instead of writing two branches, you can write a single transpose with just a pointer swap between climber_strategies_ and rows

You are right, changed
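
One way to read the suggestion, as a host-side sketch: keep a single transpose routine and let the direction flag decide only which dimension is passed as the row count. Function names and the row-major convention are assumptions for illustration, not the code that was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Transposes a row-major rows x cols matrix into a row-major cols x rows one.
inline std::vector<double> transpose(const std::vector<double>& in,
                                     std::size_t rows,
                                     std::size_t cols)
{
  assert(in.size() == rows * cols);
  std::vector<double> out(in.size());
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      out[c * rows + r] = in[r * cols + c];
    }
  }
  return out;
}

// Single code path for both directions: only the two dimensions are exchanged.
inline std::vector<double> transpose_field(const std::vector<double>& field,
                                           std::size_t n_climbers,
                                           std::size_t field_size,
                                           bool to_row)
{
  // A field shared by all climbers is a single vector: nothing to transpose.
  if (field.size() <= field_size) { return field; }
  std::size_t rows = n_climbers;
  std::size_t cols = field_size;
  if (to_row) { std::swap(rows, cols); }
  return transpose(field, rows, cols);
}
```

Calling transpose_field once with to_row false and once with to_row true returns the original buffer.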

akifcorduk · 2026-04-28T11:34:35Z

 {
-  // Hyper parameter than can be changed, I have put what I believe to be the best
+  constexpr int batch_iteration_limit = 100000;
+  constexpr f_t pdlp_tolerance        = 1e-4;


Do we have a fixed precision (1e-4) for batched PDLP?

Yes, and it's intended: the goal is a good warm start and fast heuristics for strong branching. But you are right, I should have a clean API to handle all of that properly.

We need 1e-6 abs and 1e-12 relative tolerance in MIP.
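
To make the distinction between the two tolerances concrete, here is a small sketch of a combined absolute/relative test of the usual form residual <= abs + rel * reference norm. The struct, its defaults (both set to 1e-4 here) and the function name are assumptions for illustration, not the cuOpt settings API.

```cpp
#include <cassert>

// Hypothetical settings holder; the 1e-4 defaults are an assumption.
struct tolerances_t {
  double absolute = 1e-4;
  double relative = 1e-4;
};

// Returns true when the residual is within abs + rel * reference_norm.
inline bool within_tolerance(double residual_norm,
                             double reference_norm,
                             const tolerances_t& tol)
{
  assert(residual_norm >= 0.0 && reference_norm >= 0.0);
  return residual_norm <= tol.absolute + tol.relative * reference_norm;
}
```

A residual of 5e-5 passes with the 1e-4 defaults but fails with tolerances_t{1e-6, 1e-12}, which is why a configurable API matters for the MIP use case.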

akifcorduk · 2026-04-28T11:37:51Z

+  }
+  // Step size doesn't change anyways, just to save the compute
+  if (original_settings.get_initial_step_size().has_value()) {
+    batch_settings.set_initial_step_size(original_settings.get_initial_step_size().value());


Can't we set different initial step sizes and weights for different climbers?

That's a research question. For now we use the same warm-start information across climbers.

For MIP use cases, it might make sense to pass different warm starts, but let's keep it for another PR.

akifcorduk · 2026-04-28T11:41:49Z

+  // Since we decerement iteratively, we don't want to use std::numeric_limits<size_t>::max()
+  // Even if 20K fits in memory it will never be an optimal batch size,  it's just to have a
+  // reasonable upper bound
+  constexpr size_t max_batch_size    = 20000;


Maybe we should put that into a global config or some user facing API ?

Yes, I have to think about that

akifcorduk · 2026-04-28T11:47:53Z

      settings.dual_postsolve   = false;
      for (auto [presolver, epsilon] :
-           {std::pair{presolver_t::Papilo, 1e-1}, std::pair{presolver_t::None, 1e-6}}) {
+           {std::pair{presolver_t::Papilo, 1e-1}, std::pair{presolver_t::None, 1e-4}}) {


Why tolerance change?

Kh4ster · 2026-04-29T13:04:08Z

/ok to test b0a39d7

Kh4ster · 2026-04-29T15:38:45Z

/ok to test a077f43

Kh4ster · 2026-04-29T16:25:18Z

/ok to test af4f5fb

Kh4ster · 2026-04-29T17:48:55Z

/ok to test b747ebe

Kh4ster · 2026-04-29T17:51:32Z

/ok to test 31deda0

Kh4ster · 2026-04-29T19:10:18Z

/ok to test efa4b18

If/when we add batch PDLP to grpc, we'll include all of the relevant parameters including this one. For now, grpc handles only single LP/MIP problems, there is no batch capability. Added a note to the .proto file explaining why batch parameters are missing.

tmckayus · 2026-04-29T19:56:26Z

/ok to test 86d1c18

Kh4ster · 2026-05-11T08:32:13Z

/ok to test 35c3c61

Kh4ster · 2026-05-11T08:45:22Z

/ok to test 6b1bb48

Kh4ster · 2026-05-11T12:03:39Z

/ok to test 9bfa8c9

Kh4ster · 2026-05-11T14:25:13Z

/merge

This PR greatly extends the capabilities of batch PDLP. Former batch PDLP only supported having a single variable bounds being different per climber. It now supports:

- Different constraints lower and upper bounds per climber
- Different objective coefficients per climber
- Different objective offset per climber
- More than one variable bound difference per climber

This PR also adds the support of per climber residual and first primal feasible to the Stable3 PDLP solver mode and its batch version. It allows to solve a batch of problems and stop once one or all the climbers have reached primal feasibility.

All those combinations can be put together, resulting in a potential:

Solve a batch of LPs all having different: constraints lower and upper bounds, objective coefficients, objective offset, variable bounds, using per constraint residual instead of the L2 norm and stopping once one, or all, climbers have reached primal feasibility.

Authors:
- Nicolas Blin (https://github.com/Kh4ster)
- Trevor McKay (https://github.com/tmckayus)

Approvers:
- Akif ÇÖRDÜK (https://github.com/akifcorduk)
- Trevor McKay (https://github.com/tmckayus)

URL: NVIDIA#1152

Kh4ster added 7 commits April 7, 2026 13:30

experiement files

ebd69c7

working per climber constraint bounds, objective coefficient, objecti…

3182c3a

…ve offset, new api to support first querying the size before expanding the problem

fix initial scaling when objective coefficient are larger

f5205d6

add support for per constraints residual, first primal feasible, all …

3ca17f5

…primal feasible, in both stable3 mode and batch mode. it can work together along with potential problem modifications

Merge branch 'main' into extend_batch_pdlp_support

689ffe8

style

efbebe3

remove old file

72d7664

Kh4ster requested review from a team as code owners April 28, 2026 08:48

Kh4ster requested review from akifcorduk, rgsl888prabhu and yuwenchen95 April 28, 2026 08:48

Kh4ster removed request for rgsl888prabhu and yuwenchen95 April 28, 2026 08:49

Kh4ster self-assigned this Apr 28, 2026

Kh4ster added the feature request, non-breaking, and pdlp labels Apr 28, 2026

coderabbitai Bot reviewed Apr 28, 2026

remove old bench file

a7944c8

calrify with comments

94804e5

akifcorduk reviewed Apr 28, 2026

fix: correctly stops if and only if any is optimal or primal feasible…

c802123

… and nothing else. Correctly rejects save_best_primal_so_far and batch mode, check sizes in the expanded problem case.

Kh4ster and others added 3 commits April 29, 2026 15:34

add support for more than one variable bound per climber

34ffce5

style

ece2a0f

Merge branch 'main' into extend_batch_pdlp_support

a077f43

increase max size

af4f5fb

Kh4ster requested a review from a team as a code owner April 29, 2026 16:24

Kh4ster requested a review from AyodeAwe April 29, 2026 16:24

fix sub mip

b747ebe

Merge branch 'main' into extend_batch_pdlp_support

31deda0

Merge branch 'main' into extend_batch_pdlp_support

efa4b18

tmckayus approved these changes Apr 29, 2026

anandhkb added this to the 26.06 milestone May 5, 2026

Merge branch 'main' into extend_batch_pdlp_support

35c3c61

fix tests following span change

6b1bb48

Merge branch 'main' into extend_batch_pdlp_support

9bfa8c9

rapids-bot Bot merged commit 322740a into main May 11, 2026
307 of 312 checks passed

Kh4ster deleted the extend_batch_pdlp_support branch May 11, 2026 14:25

klamike mentioned this pull request May 11, 2026

Add Batched{S} set jump-dev/MathOptInterface.jl#2904

Open
