Batch-add 12 Backlog Models #1067

Open
isPANN wants to merge 18 commits into main from batch-add-models

Conversation

@isPANN (Collaborator) commented May 25, 2026

Summary

Serial batch-add of 12 Model issues from Backlog to one branch (batch-add-models), driven by the new /auto-pipeline orchestrator. Each model: full Rust impl + unit tests + schema-driven CLI + Typst problem-def entry + Crossref-verified BibTeX. 1 of the originally targeted 13 issues is parked on OnHold — see below.

Foundation (commit 694ef0c)

  • New skill /auto-pipeline: orchestrator that drives one Backlog issue from quality gate to Final review via fresh-context subagents (check-issue, fix-issue, run-pipeline, review-pipeline). Substantive issue-body problems route to codex xhigh; fundamental-flaw-without-reference issues park on OnHold.
  • check-issue: new Rule Check 5 (Completeness, fail label Incomplete) — mandatory literature + codebase + hand-trace on ≥2 corner cases for every [Rule] issue.
  • review-structural: new Step 4b (Round-trip Execution, mandatory for Rule reviews) — reviewer must cargo test --exact <name>, paste the test result: ok line, and confirm the test exercises all four phases of a real round-trip.
  • review-quality: promoted "closed-loop without round-trip verification" to Critical with explicit red flags.

Models added (12)

| # | Issue | Model | Category | Notes |
|---|-------|-------|----------|-------|
| 1 | #994 | MinimumDiscretePlanarInverseKinematics | misc | Robotics IK; non-binary per-link dims; new (f64,f64) + Vec<Vec<(usize,usize)>> CLI parsers |
| 2 | #1015 | MaximumCoKPlex | graph | k-plex variant; G/W/K params (KN only); generalizes MaximumIndependentSet at k=1 |
| 3 | #1018 | MaximumCommonEdgeSubgraph | graph | MCES on local LabelledDigraph; alias MCES; corrected Bahiense/Soulé author lists |
| 4 | #1020 | MaximumEdgeWeightedKClique | graph | Exact-k edge-weighted clique with negative weights; (SimpleGraph,i32) + (SimpleGraph,f64) |
| 5 | #1022 | HighlyConnectedDeletion | graph | Edge-variable Min model with λ(H)> |
| 6 | #1024 | EulerianPath | graph | Or-typed satisfaction on DirectedGraph; m^m brute-force; correct umlauts for Bang-Jensen/Ebert |
| 7 | #1026 | PrizeCollectingSteinerForest | graph | n+m bit dims; node-prize/edge-cost/component-penalty objective |
| 8 | #1029 | MinimumCostMaximumFlow | graph | Lex objective via single-scalar M*(max-flow)+cost encoding; integral-flow carve-out (mirrors MECF) |
| 9 | #1030 | MinimumCostCirculation | graph | Signed costs, single Min objective; integral-flow carve-out; stronger two-cycle discriminator example |
| 10 | #1032 | ClosestString | misc | Hamming consensus; q^m brute-force |
| 11 | #1033 | ClosestSubstring | misc | ClosestString + window selection; concatenated dims [q;ℓ]++[W_i] |
| 12 | #1043 | MaximumContactMapOverlap | graph | Order-preserving partial injection; aliases CMO, MaxCMO |

Each model commit closes its issue via Closes #<n>. Companion [Rule] X → ILP/SteinerTree/... issues are explicitly deferred to follow-up PRs.

OnHold (1)

  • [Model] MaximumAcyclicAgreementForest #1046: substantive errors in the issue body that need human re-derivation:
    • Wrong worked-example outcome: issue claims MAAF size 4 / hybridization 3 for the quartet swap T_1=((a,b),(c,d)) vs T_2=((a,c),(b,d)), but the rooted SPR distance is 1 so MAAF should be smaller. Hand-checking the obvious 2-block partitions failed the edge-disjoint requirement, so the precise correct block structure isn't trivially derivable.
    • Wrong cited complexity: O(3.18^k · n) attributed to Whidden-Beiko-Zeh 2013, but that paper proves O(3^k · n).
    • codex xhigh rescue path failed in the sandbox (PATH permission error in the rescue subagent), so the substantive-rewrite loop couldn't complete.
    • Diagnostic comment posted at [Model] MaximumAcyclicAgreementForest #1046 (comment). Project board moved Backlog → OnHold.

Final-review fixes (commit e346156)

  • [Model] MinimumDiscretePlanarInverseKinematics #994: simplified declare_variants! complexity from num_links * total_configurations to total_configurations so it matches the issue's literal O(prod_{j=1}^n m_j) baseline.
  • [Model] MinimumCostCirculation #1030: added test_minimum_cost_circulation_issue_example_1030 which constructs the issue's verbatim 2-vertex example (arcs 0→1 cap=2 cost=3 and 1→0 cap=1 cost=-5, optimum -2). Existing richer 3-vertex canonical kept as the primary discriminator.

Test plan

  • make fmt clean
  • make clippy clean
  • make test green — ~5200 lib + 150 doc + integration suites all pass on the final commit
  • Re-run make check on CI after push — CI run 26412628216 green (5m50s) on commit e346156
  • make paper builds the Typst PDF cleanly — verified locally on commit e346156 (no warnings, 202 schemas exported)
  • Spot-check pred create --example <Model> | pred solve --solver brute-force for each of the 12 new models — 11/12 work via bare name; MaximumCoKPlex requires explicit /i32 disambiguation (same pattern as existing MaximumCliqueOne default but example only registered for i32; not a regression)
  • Review the integral-flow carve-out in [Model] MinimumCostMaximumFlow #1029 / [Model] MinimumCostCirculation #1030 (Problem trait is discrete; we sacrifice the issue's "continuous" wording but match MECF precedent) — accepted; documented explicitly in the model files and paper, consistent with sibling MinimumEdgeCostFlow
  • Review the lex-objective encoding in [Model] MinimumCostMaximumFlow #1029 (single-scalar M*(max_flow - value) + cost; alternative: tuple Value type — could be revisited later) — accepted; M > Σ cost_e·c_e strictly upper-bounds any feasible cost; integrates cleanly with Min aggregate
  • Review BibTeX corrections vs Crossref:

🤖 Generated with Claude Code

isPANN and others added 15 commits May 25, 2026 17:10
- New skill .claude/skills/auto-pipeline: orchestrator that drives one
  Backlog issue from quality gate to Final review via fresh-context
  subagents (check-issue, fix-issue, run-pipeline, review-pipeline).
  Substantive issue-body problems are routed to codex xhigh; fundamental
  flaws with no public reference park the issue on OnHold.
- check-issue: add Rule Check 5 (Completeness, fail label "Incomplete").
  Mandatory literature research + codebase corner-case enumeration +
  hand-tracing on >= 2 non-canonical instances for every [Rule] issue.
- review-structural: add Step 4b (Round-trip Execution, mandatory for
  Rule reviews). Reviewer must run cargo test by name, paste the
  "test result: ok" line, and confirm the test exercises the four
  phases of a real round-trip.
- review-quality: promote "closed-loop without round-trip verification"
  from a Minor flag to Critical, with explicit red flags
  (is_some-only, target-side-only asserts, unique-optimum instances).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Robotics inverse-kinematics problem: given link lengths l_j, target
g in R^2, per-link sampled orientations Phi_j, and consecutive-pair
admissibility sets A_j, pick indices a_j in {0..m_j-1} with
(a_{j-1}, a_j) in A_j minimizing the squared end-effector distance
||sum_j l_j (cos phi_{j,a_j}, sin phi_{j,a_j}) - g||^2.
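
A minimal sketch of how such an objective could be evaluated, assuming
a plain struct with illustrative field names (PlanarIk and its fields
are hypothetical here, not the crate's actual API). An inadmissible
consecutive pair yields None, mirroring the Min(None) behaviour
described in this commit:

```rust
/// Hypothetical stand-in for the model; names are illustrative only.
pub struct PlanarIk {
    pub link_lengths: Vec<f64>,
    pub target: (f64, f64),
    /// orientation_samples[j][a] is the angle phi_{j,a} in radians.
    pub orientation_samples: Vec<Vec<f64>>,
    /// allowed_pairs[j] lists admissible (a_{j-1}, a_j) for j >= 1;
    /// entry 0 is unused (an assumption of this sketch).
    pub allowed_pairs: Vec<Vec<(usize, usize)>>,
}

impl PlanarIk {
    /// Squared end-effector distance, or None for an inadmissible config.
    pub fn evaluate(&self, config: &[usize]) -> Option<f64> {
        if config.len() != self.link_lengths.len() {
            return None;
        }
        let (mut x, mut y) = (0.0_f64, 0.0_f64);
        for (j, &a) in config.iter().enumerate() {
            // Out-of-range sample index: treat as infeasible.
            let phi = *self.orientation_samples[j].get(a)?;
            if j > 0 && !self.allowed_pairs[j].contains(&(config[j - 1], a)) {
                return None;
            }
            x += self.link_lengths[j] * phi.cos();
            y += self.link_lengths[j] * phi.sin();
        }
        let (dx, dy) = (x - self.target.0, y - self.target.1);
        Some(dx * dx + dy * dy)
    }
}
```

With two unit links, samples {0, pi/2} on each link, target (1, 1) and
only the pair (0, 1) allowed, the config [0, 1] reaches the target up
to rounding, while [0, 0] is rejected.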

- src/models/misc/minimum_discrete_planar_inverse_kinematics.rs:
  per-link dims (non-binary), Min<f64> objective, A_j feasibility
  returns Min(None), declare_variants! default entry, ProblemSchemaEntry
  + ProblemSizeFieldEntry, canonical example_db spec via inventory.
- src/unit_tests/models/misc/...: creation, evaluate (feasible/
  infeasible), brute-force solver, serialization roundtrip.
- problemreductions-cli/: new (f64,f64) and Vec<Vec<(usize,usize)>>
  schema parsers; --link-lengths/--target-point/--orientation-samples/
  --allowed-pairs flags via the schema-driven create path.
- docs/paper: problem-def block + display-name + worked example;
  references.bib entries for Salloum2025 and DaiIzattTedrake2019.

Reference: Salloum et al., "Quantum annealing for inverse kinematics
in robotics", Scientific Reports 2025, doi:10.1038/s41598-025-34346-z.

Closes #994

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The brute-force search space is prod_{j=1}^n m_j, not 2^n — per-link
sample counts m_j are arbitrary. Add a `total_configurations()` getter
that returns the product, and rewrite the declare_variants! complexity
as `num_links * total_configurations` (the O(n) cost of evaluating one
configuration over n links, times the size of the iteration space).
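
For illustration, the product could be computed as below. This is a
hypothetical free function rather than the getter added in this commit,
and the saturating multiplication is an assumption of the sketch:

```rust
/// Product of the per-link sample counts m_j (size of the brute-force
/// search space). Saturates at usize::MAX instead of overflowing.
pub fn total_configurations(samples_per_link: &[usize]) -> usize {
    samples_per_link
        .iter()
        .fold(1usize, |acc, &m| acc.saturating_mul(m))
}
```

For sample counts (3, 4, 5) this gives 60, whereas 2^n with n = 3
would give 8.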

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trivial single-line rewrite to match rustfmt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-k-plex problem: given graph G=(V,E), vertex weights w, and integer
k>=1, find max-weight subset S subseteq V such that the induced
subgraph G[S] has maximum degree at most k-1 (i.e. every selected
vertex has at most k-1 selected neighbours). Generalizes
MaximumIndependentSet (the k=1 case) and is the complement-graph view
of maximum k-plex from the clique-relaxation literature.
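
The feasibility rule can be sketched on a plain edge list as follows.
The function name and signature are hypothetical (the real model is
generic over graph and weight types); only the induced-degree check and
the weight sum are taken from the description above:

```rust
/// Total weight of the selected set, or None if some selected vertex
/// has more than k-1 selected neighbours. config[v] == 1 selects v.
pub fn co_k_plex_value(
    num_vertices: usize,
    edges: &[(usize, usize)],
    weights: &[i32],
    k: usize,
    config: &[usize],
) -> Option<i32> {
    if k == 0 || config.len() != num_vertices || weights.len() != num_vertices {
        return None;
    }
    // Degree of each vertex inside the induced subgraph G[S].
    let mut induced_degree = vec![0usize; num_vertices];
    for &(u, v) in edges {
        if config[u] == 1 && config[v] == 1 {
            induced_degree[u] += 1;
            induced_degree[v] += 1;
        }
    }
    let mut total = 0;
    for v in 0..num_vertices {
        if config[v] == 1 {
            if induced_degree[v] > k - 1 {
                return None;
            }
            total += weights[v];
        }
    }
    Some(total)
}
```

On the 5-cycle with weights (5,1,4,1,3) and k=2, the set {0,2,4} is
feasible with value 12; with k=1 the same set is rejected because
vertices 0 and 4 are adjacent.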

- src/models/graph/maximum_co_k_plex.rs: MaximumCoKPlex<G,W,K>
  parameterized by graph type, weight type, and K-multiplier. Only the
  KN (runtime-k) variant registered initially per the issue's
  "initially KN, K1/K2/... later" plan. Max<W::Sum> objective,
  induced-degree feasibility, declare_variants! default + i32 variant,
  canonical example via inventory (5-cycle weights (5,1,4,1,3) k=2,
  optimum {0,2,4} value 12).
- src/unit_tests/models/graph/maximum_co_k_plex.rs: creation,
  evaluate-feasible (issue optimum + smaller feasible), evaluate-
  infeasible (degree-2 violation), brute-force solver, serialization.
- problemreductions-cli/src/commands/create/: schema-driven CLI maps
  schema field bound_k to existing --k flag with semantic validation.
- docs/paper: problem-def block with C_5 worked example and k=1 ->
  MaximumIndependentSet equivalence note; references.bib gains
  Hernandez2016MolecularSimilarity and HosseinianButenko2022KDependent.

References: arXiv:1601.06693 (Hernandez et al., 2016) for the
molecular-similarity framing; doi:10.1016/j.dam.2021.10.015
(Hosseinian & Butenko, 2022) for the maximum k-dependent set view.

Closes #1015

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MCES: given two directed edge-labelled graphs G1, G2, find a partial
injective map f: U1 ⊆ V1 → V2 maximizing the number of preserved
labelled arcs (u, λ, v) ∈ E1 with f(u), f(v) defined and
(f(u), λ, f(v)) ∈ E2. Edge labels must match exactly; set semantics
(no multiplicities); disconnected common subgraphs allowed; no
secondary tie-break.
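
A rough sketch of the evaluation, using a tuple alias in place of the
commit's LabelledArc struct. Which slot encodes "unmatched" is an
assumption here (slot 0, with slot j+1 meaning vertex j of G2); arc
endpoints are assumed to be valid indices into config:

```rust
use std::collections::HashSet;

/// (source, label, target); hypothetical stand-in for LabelledArc.
pub type LabelledArc = (usize, u32, usize);

/// Number of preserved labelled arcs, or None if the map is not injective.
pub fn preserved_arcs(
    arcs_1: &[LabelledArc],
    arcs_2: &[LabelledArc],
    num_vertices_2: usize,
    config: &[usize],
) -> Option<i64> {
    // Injectivity on the matched (non-zero) slots.
    let mut used = HashSet::new();
    for &slot in config {
        if slot > num_vertices_2 || (slot != 0 && !used.insert(slot)) {
            return None;
        }
    }
    // Set semantics: duplicates in either arc list count once.
    let source: HashSet<LabelledArc> = arcs_1.iter().copied().collect();
    let target: HashSet<LabelledArc> = arcs_2.iter().copied().collect();
    let mut count = 0;
    for &(u, label, v) in &source {
        let (fu, fv) = (config[u], config[v]);
        if fu != 0 && fv != 0 && target.contains(&(fu - 1, label, fv - 1)) {
            count += 1;
        }
    }
    Some(count)
}
```

An arc whose image exists in G2 with a different label is not counted,
which is the "labels must match exactly" rule.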

- src/models/graph/maximum_common_edge_subgraph.rs:
  local LabelledArc + LabelledDigraph structs (does not extend the
  existing Graph trait hierarchy in this PR). dims = vec![|V2|+1; |V1|]
  with the +1 slot encoding ⊥. Max<i64> objective with injectivity
  feasibility on the matched slots. ProblemSchemaEntry +
  ProblemSizeFieldEntry for num_vertices_1/_2 and num_arcs_1/_2,
  declare_variants! default with complexity (num_vertices_2+1)^num_vertices_1.
  Canonical example via inventory from the issue's 5-vs-4-vertex
  instance with optimum value 5.
- src/unit_tests/models/graph/maximum_common_edge_subgraph.rs:
  12 tests covering creation, evaluate-feasible (optimum 5),
  evaluate-injectivity-violated, evaluate-fewer-preserved, brute-force
  solver, serialization.
- problemreductions-cli/: new --graph-1 / --graph-2 flags with a
  LabelledDigraph parser; alias MCES.
- docs/paper: problem-def block, display-name, MCES worked example.
- docs/paper/references.bib: corrected per Crossref against the
  check-issue warning — Bahiense2012 first names (Laura/Gordana/Breno),
  Soule2021 author list (Soule/Reinharz/Sarrazin-Gendron/Denise/
  Waldispuhl) and venue, Bokhari1981 volume (C-30).

References: doi:10.1109/TC.1981.1675756 (Bokhari 1981),
doi:10.1016/j.dam.2012.01.026 (Bahiense et al. 2012, polyhedral
investigation), doi:10.1371/journal.pcbi.1008990 (Soule et al. 2021,
RNA networks application).

The direct `MaximumCommonEdgeSubgraph -> ILP` rule (#1019) is out of
scope for this PR and will follow separately.

Closes #1018

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Exact-cardinality edge-weighted clique: given a simple undirected
graph G=(V,E), edge weights w: E→R, and an integer k with 0≤k≤|V|,
find a vertex subset S with |S|=k forming a clique that maximizes
the sum of weights of edges induced by S. Edge weights may be
negative; k=0 and k=1 are admitted with objective value 0.
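
One way to evaluate a configuration, sketched with a hypothetical
function over an edge list whose weights are aligned by index. The
clique test relies on the graph being simple: a k-subset is a clique
exactly when it induces k(k-1)/2 edges.

```rust
/// Sum of induced edge weights if config selects a clique of size
/// exactly k, otherwise None. weights[i] belongs to edges[i].
pub fn k_clique_value(
    num_vertices: usize,
    edges: &[(usize, usize)],
    weights: &[i32],
    k: usize,
    config: &[usize],
) -> Option<i32> {
    if config.len() != num_vertices || edges.len() != weights.len() {
        return None;
    }
    if config.iter().filter(|&&x| x == 1).count() != k {
        return None;
    }
    let mut induced = 0usize;
    let mut total = 0;
    for (&(u, v), &w) in edges.iter().zip(weights) {
        if config[u] == 1 && config[v] == 1 {
            induced += 1;
            total += w;
        }
    }
    // For k = 0 or k = 1 the required edge count is 0, so the value is 0.
    if induced != k * k.saturating_sub(1) / 2 {
        return None;
    }
    Some(total)
}
```

The weights used to exercise this sketch are illustrative and are not
claimed to be the issue's instance.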

Distinct from the existing MaximumClique (vertex-weighted, no
exact-k) and KClique (decision problem with threshold |S|>=k).

- src/models/graph/maximum_edge_weighted_k_clique.rs:
  MaximumEdgeWeightedKClique<W: WeightElement> with SimpleGraph fixed;
  edge_weights vector aligned to graph.edges() order, runtime k field.
  dims = vec![2; |V|]. Max<W::Sum> objective; infeasible when |S|≠k
  or S is not a clique. declare_variants! default (SimpleGraph,i32)
  plus (SimpleGraph,f64). Canonical example via inventory from the
  issue's 4-vertex instance with negative weight (clique {0,1,2}
  value 8 beats {0,1,3} value 6).
- src/unit_tests/models/graph/maximum_edge_weighted_k_clique.rs:
  12 tests covering creation, evaluate-feasible (both optima),
  evaluate-infeasible-wrong-size, evaluate-infeasible-not-clique,
  brute-force solver, edge cases k=0 and k=1 (value 0), f64 variant,
  serialization roundtrip, panic guards.
- docs/paper: problem-def block with worked example highlighting that
  the optimum includes a negative edge; display-name entry; cites
  Gouveia & Martins 2015 and Hunting/Faigle/Kern 2001.
- docs/paper/references.bib: Crossref-verified Gouveia2015MEWC
  (author corrected to Pedro Martins, not Paulo as the issue body
  said) and HuntingFaigleKern2001EWC.

References: doi:10.1007/s13675-014-0028-1 (Gouveia & Martins 2015,
sparse-graph compact formulations); doi:10.1016/S0377-2217(99)00449-X
(Hunting, Faigle & Kern 2001, Lagrangian relaxation).

The direct `MaximumEdgeWeightedKClique -> ILP` rule (#1021) is out
of scope for this PR and will follow separately.

Closes #1020

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Given a simple undirected graph G=(V,E), find a minimum-cardinality
edge set F ⊆ E such that every connected component of G - F is either
an isolated vertex or a highly connected graph on ≥3 vertices
(edge connectivity λ(H) > |V(H)|/2, strict). Components of size 2
are explicitly invalid. Weaker than clique-deletion: every K_k for
k≥3 is highly connected, but not every highly connected graph is a
clique.
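
The feasibility test can be sketched for tiny instances as below. Note
that this sketch computes λ by brute force over edge subsets rather
than by the max-flow helper described in this commit, and all function
names are hypothetical. It only works for components with fewer than
32 edges.

```rust
/// Label every vertex with the smallest vertex index in its component.
fn component_labels(n: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut label: Vec<usize> = (0..n).collect();
    let mut changed = true;
    while changed {
        changed = false;
        for &(u, v) in edges {
            let low = label[u].min(label[v]);
            if label[u] != low || label[v] != low {
                label[u] = low;
                label[v] = low;
                changed = true;
            }
        }
    }
    label
}

/// Fewest edges of `inner` whose removal disconnects the component
/// whose smallest vertex is `root` (brute force over all subsets).
fn edge_connectivity(n: usize, inner: &[(usize, usize)], root: usize, size: usize) -> usize {
    let mut best = inner.len();
    for mask in 0u32..(1u32 << inner.len()) {
        let mut rest = Vec::new();
        for (i, &e) in inner.iter().enumerate() {
            if (mask >> i) & 1 == 0 {
                rest.push(e);
            }
        }
        let reached = component_labels(n, &rest).iter().filter(|&&l| l == root).count();
        if reached < size {
            best = best.min(mask.count_ones() as usize);
        }
    }
    best
}

/// Number of deleted edges (config[e] == 1), or None if some component
/// of G - F is neither a single vertex nor highly connected.
pub fn deletion_cost(n: usize, edges: &[(usize, usize)], config: &[usize]) -> Option<usize> {
    if config.len() != edges.len() {
        return None;
    }
    let mut kept = Vec::new();
    for (&e, &x) in edges.iter().zip(config) {
        if x == 0 {
            kept.push(e);
        }
    }
    let label = component_labels(n, &kept);
    for root in 0..n {
        let size = label.iter().filter(|&&l| l == root).count();
        if size <= 1 {
            continue; // not a component root, or an isolated vertex
        }
        let inner: Vec<(usize, usize)> =
            kept.iter().copied().filter(|&(u, _)| label[u] == root).collect();
        // Strict inequality: require 2 * lambda > |V(H)|.
        if size == 2 || 2 * edge_connectivity(n, &inner, root, size) <= size {
            return None;
        }
    }
    Some(config.iter().filter(|&&x| x == 1).count())
}
```

On K3 with a leaf attached to vertex 2, deleting only the leaf edge
leaves a triangle (λ = 2 > 3/2) plus an isolated vertex, so the cost
is 1; deleting nothing is infeasible because λ of the whole graph is 1.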

- src/models/graph/highly_connected_deletion.rs: variables are EDGES
  (x_e=1 means delete edge e). Min<i64> objective counts deletions;
  infeasibility on any non-singleton component that is not highly
  connected (and any 2-vertex component). Private edge_connectivity
  helper computes λ via repeated max-flow with unit edge capacities
  (fine for small components in tests). ProblemSchemaEntry,
  ProblemSizeFieldEntry (num_vertices/num_edges), declare_variants!
  default with complexity 2^num_edges. Canonical example via
  inventory: K3 with leaf vertex 3 attached to 2 (4 vertices,
  4 edges) — optimum deletes only (2,3), value 1.
- src/unit_tests/models/graph/highly_connected_deletion.rs: 17 tests
  covering creation, evaluate-optimum, evaluate-zero-deletions-
  infeasible, evaluate-delete-all-feasible, evaluate-infeasible
  2-vertex-component and infeasible path-component, wrong-length
  config guard, brute-force on canonical + a "double triangle"
  discriminator instance (two K3's joined at a bridge — optimum 1)
  to address the check-issue warning about example discriminatory
  power, serialization, variant, plus edge_connectivity helper
  tests (single vertex=0, single edge=1, P3=1, K3=2, K4=3).
- docs/paper: problem-def block with the K3-with-leaf worked example,
  display-name entry; Crossref-verified BibTeX entries for Hüffner
  et al. 2014 (TCBB) and Hartuv & Shamir 2000 (IPL), with proper
  umlaut encoding H{\"u}ffner per repo convention.

References: doi:10.1109/TCBB.2013.177 (Hüffner et al. 2014, partitioning
biological networks); doi:10.1016/S0020-0190(00)00142-3 (Hartuv &
Shamir 2000, HCS clustering algorithm).

The direct `HighlyConnectedDeletion -> ILP` rule (#1023) is out of
scope for this PR and will follow separately.

Closes #1022

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Classical directed-multigraph satisfaction problem: given D=(V,A) with
parallel arcs and loops allowed, decide whether a directed trail
exists that uses every arc exactly once. No start or end vertex is
fixed by the input. Empty-arc instance is accepted with the empty
trail; isolated vertices are ignored.

Polynomial-time solvable (O(num_vertices + num_arcs)) by the standard
Eulerian criterion plus Hierholzer construction, so this widens the
catalog beyond NP-hard problems.
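
A minimal sketch of a witness check under the arc-order encoding (one
variable per trail step, each naming an arc index). The function name
is hypothetical and the instance in the usage note is illustrative,
not the issue's:

```rust
/// arcs[i] = (tail, head); config[t] is the index of the arc used at
/// step t. True iff config is a permutation of the arc indices and
/// consecutive arcs chain head-to-tail.
pub fn is_eulerian_trail(arcs: &[(usize, usize)], config: &[usize]) -> bool {
    let m = arcs.len();
    if config.len() != m {
        return false;
    }
    let mut seen = vec![false; m];
    for &a in config {
        if a >= m || seen[a] {
            return false; // out of range or arc used twice
        }
        seen[a] = true;
    }
    // With zero or one arc there are no consecutive pairs to check.
    config.windows(2).all(|w| arcs[w[0]].1 == arcs[w[1]].0)
}
```

For arcs [(0,1), (1,2), (1,0), (0,1)] (two parallel 0→1 arcs), the
order [0,2,3,1] is a valid trail 0→1→0→1→2, while [0,1,2,3] breaks
the chain after the second arc.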

- src/models/graph/eulerian_path.rs: EulerianPath { graph:
  DirectedGraph }. dims = vec![m; m] where m = num_arcs (variable t
  picks which arc occurrence is the t-th trail step); the brute-force
  search space is m^m but the registry complexity reflects the
  linear-time best-known algorithm. Or-typed feasibility: configuration
  must be a permutation of {0..m-1} and consecutive arcs must chain
  (end of arc t equals start of arc t+1). declare_variants! default
  + ProblemSchemaEntry + ProblemSizeFieldEntry. Canonical example via
  inventory from the issue's 3-vertex 4-arc instance with parallel
  arcs (yes-instance, witness config [0,2,3,1]).
- src/unit_tests/models/graph/eulerian_path.rs: 11 tests covering
  creation, evaluate-valid-witness, evaluate-not-permutation,
  evaluate-bad-trail, evaluate-out-of-range, evaluate-wrong-length,
  brute-force yes (canonical) + brute-force no (the issue's 2-vertex
  4-arc imbalanced counterexample), empty-arcs edge case (Or(true)
  with the empty witness), serialization roundtrip, variant + name.
- problemreductions-cli/src/commands/create/schema_support.rs: wire
  --graph (DirectedGraph) for EulerianPath via the existing parser.
- docs/paper: problem-def block with both the yes-instance and the
  no-instance from the issue; display-name entry; references.bib gains
  Crossref-verified BangJensenGutin2009Digraphs (J{\o}rgen Bang-Jensen,
  o-slash) and Ebert1988ComputingEulerianTrails (J{\"u}rgen Ebert,
  u-umlaut), corrected from the issue body's mojibake.

References: doi:10.1007/978-1-84800-998-1 (Bang-Jensen & Gutin 2009,
digraphs); doi:10.1016/0020-0190(88)90170-6 (Ebert 1988, computing
Eulerian trails).

The direct `EulerianPath -> ILP` rule (#1025) is out of scope for
this PR and will follow separately.

Closes #1024

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Biology-paper prize-collecting Steiner forest: given a network G with
nonnegative vertex prizes p(v), nonnegative edge costs c(e), and
tradeoff coefficients beta, omega, find a forest subgraph
F = (V_F, E_F) minimizing

  beta * sum_{v notin V_F} p(v) + sum_{e in E_F} c(e) + omega * kappa(F)

where kappa(F) is the number of tree components (singleton selected
vertices count). Generalizes prize-collecting Steiner tree from one
connected tree to a forest; the artificial-root trick is deliberately
kept out of the base model and will live in the companion reduction
rule.
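
The objective can be sketched with a small union-find that both
detects cycles and counts components. The function is hypothetical and
uses i64 throughout instead of the generic weight type. The prizes,
costs, beta and omega in the usage note are assumed values chosen so
that the results match the figures quoted in the test list below; the
issue's actual parameters are not restated here.

```rust
/// config = n vertex bits followed by m edge bits. Returns the
/// objective, or None for a dangling edge or a cycle.
pub fn pcsf_value(
    prizes: &[i64],
    edges: &[(usize, usize)],
    costs: &[i64],
    beta: i64,
    omega: i64,
    config: &[usize],
) -> Option<i64> {
    let n = prizes.len();
    if config.len() != n + edges.len() || costs.len() != edges.len() {
        return None;
    }
    let (vertex_bits, edge_bits) = config.split_at(n);
    fn find(parent: &mut [usize], mut x: usize) -> usize {
        while parent[x] != x {
            let grand = parent[parent[x]];
            parent[x] = grand; // path halving
            x = grand;
        }
        x
    }
    let mut parent: Vec<usize> = (0..n).collect();
    // Every selected vertex starts as its own tree component.
    let mut components = vertex_bits.iter().filter(|&&b| b == 1).count() as i64;
    let mut total = 0;
    for (i, &(u, v)) in edges.iter().enumerate() {
        if edge_bits[i] != 1 {
            continue;
        }
        if vertex_bits[u] != 1 || vertex_bits[v] != 1 {
            return None; // selected edge with an unselected endpoint
        }
        let (ru, rv) = (find(&mut parent, u), find(&mut parent, v));
        if ru == rv {
            return None; // selected edges contain a cycle
        }
        parent[ru] = rv;
        components -= 1;
        total += costs[i];
    }
    for v in 0..n {
        if vertex_bits[v] != 1 {
            total += beta * prizes[v]; // prize forfeited for omitted vertex
        }
    }
    Some(total + omega * components)
}
```

With assumed prizes (4,4,4), costs (1,6), beta = 1 and omega = 2 on
the path 0-1-2, the config [1,1,1, 1,0] evaluates to 1 + 2*2 = 5.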

- src/models/graph/prize_collecting_steiner_forest.rs:
  PrizeCollectingSteinerForest<G, W> with dims = vec![2; n+m] (vertex
  bits then edge bits), Min<W::Sum> objective. Feasibility checks
  edges-incident-to-selected-vertices and forest acyclicity; infeasible
  → Min(None). Canonical example via inventory from the issue's
  3-vertex path with optimum [1,1,1, 1,0] value 5 (cost 1 + omega*2
  components). declare_variants! default (SimpleGraph,i32) plus
  (SimpleGraph,f64). Complexity 2^(num_vertices+num_edges).
- src/unit_tests/models/graph/prize_collecting_steiner_forest.rs:
  13 tests — creation, evaluate-optimum, evaluate-full-path (value 9),
  evaluate-three-singletons (value 6), evaluate-empty-forest
  (value 12), evaluate-edge-without-endpoint-infeasible,
  evaluate-cycle-infeasible (triangle selected entirely),
  brute-force solver, serialization, f64 variant, panic guards.
- problemreductions-cli/: new --vertex-prizes / --edge-costs / --beta /
  --omega flags via schema-driven create; mapping and fixture updates.
- docs/paper: problem-def block with worked example breakdown
  (omitted-prize / edge-cost / component terms summing to 5);
  display-name; Crossref-verified BibTeX for both Tuncbag et al.
  papers (JCB 2013 and RECOMB 2012).

References: doi:10.1089/cmb.2012.0092 (Tuncbag et al. 2013, JCB);
doi:10.1007/978-3-642-29627-7_31 (Tuncbag et al. 2012, RECOMB).

The direct `PrizeCollectingSteinerForest -> SteinerTree` rule (#1027)
is out of scope for this PR and will follow separately.

Closes #1026

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lexicographic objective on a directed multigraph with source s, sink t,
arc capacities u_e, and arc costs c_e: first maximize the s-t flow
value |f|, then among maximum-value flows minimize total arc cost
sum_e c_e f_e. Captures the CellRouter (Lummertz da Rocha et al. 2018)
model directly.

Architectural carve-out: the repo's Problem trait is discrete, so the
implementation restricts to INTEGRAL flows, with one variable per arc e
ranging over {0..u_e} (dims[e] = u_e + 1, m entries). This mirrors the
existing MinimumEdgeCostFlow precedent and stays sound for any rational
instance by scaling. Documented in the module doc.

Lex encoding: Min<i64> with combined scalar
  score = M * (max_possible_flow - flow_value) + cost
where M = sum_e c_e * u_e + 1 dominates any feasible cost — so lower
scores always prefer higher flow value first, breaking ties by lower
cost. Infeasible (capacity / conservation violations) → Min(None).
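
The encoding can be illustrated with the sketch below. The function
name is hypothetical, and max_possible_flow is taken to be the total
capacity leaving the source, which is an assumption (the commit does
not say how that bound is computed). The bound M = sum_e c_e * u_e + 1
presumes nonnegative costs.

```rust
/// Combined lexicographic score for an integral flow, or None if a
/// capacity or conservation constraint is violated.
pub fn lex_score(
    num_vertices: usize,
    arcs: &[(usize, usize)],
    capacities: &[i64],
    costs: &[i64],
    source: usize,
    sink: usize,
    flow: &[i64],
) -> Option<i64> {
    if flow.len() != arcs.len() {
        return None;
    }
    let mut balance = vec![0i64; num_vertices]; // inflow minus outflow
    let mut cost = 0;
    for (i, &(u, v)) in arcs.iter().enumerate() {
        if flow[i] < 0 || flow[i] > capacities[i] {
            return None;
        }
        balance[u] -= flow[i];
        balance[v] += flow[i];
        cost += costs[i] * flow[i];
    }
    for v in 0..num_vertices {
        if v != source && v != sink && balance[v] != 0 {
            return None;
        }
    }
    let flow_value = balance[sink];
    // M strictly exceeds any feasible cost when all costs are >= 0.
    let big_m = costs.iter().zip(capacities).map(|(c, u)| c * u).sum::<i64>() + 1;
    // Assumed upper bound on the flow value: capacity leaving the source.
    let max_possible_flow: i64 = arcs
        .iter()
        .zip(capacities)
        .filter(|((u, _), _)| *u == source)
        .map(|(_, cap)| *cap)
        .sum();
    Some(big_m * (max_possible_flow - flow_value) + cost)
}
```

On the canonical instance listed below (capacities [2,1,1,1,2], costs
[1,0,0,1,2]), M = 8 and the flow [2,1,1,1,2] has value 3 and cost 7,
so its score is 8*0 + 7 = 7; the smaller flow [1,0,0,1,0] scores
8*2 + 2 = 18 and therefore loses.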

- src/models/graph/minimum_cost_maximum_flow.rs:
  MinimumCostMaximumFlow { graph: DirectedGraph, source, sink,
  capacities: Vec<i64>, costs: Vec<i64> }. Inherent helpers
  flow_value(config) and total_cost(config) for tests. Canonical
  example via inventory: V={0,1,2,3}, arcs [(0,1),(0,2),(1,2),
  (1,3),(2,3)], capacities [2,1,1,1,2], costs [1,0,0,1,2] — optimum
  config [2,1,1,1,2] with value 3 and cost 7. ProblemSchemaEntry +
  ProblemSizeFieldEntry (num_vertices, num_arcs). declare_variants!
  default with complexity (num_vertices+num_arcs)^6 (a conservative
  polynomial placeholder justified by the LP formulation).
- src/unit_tests/models/graph/minimum_cost_maximum_flow.rs: 9 tests
  covering creation, evaluate-optimum, evaluate-suboptimal-feasible,
  evaluate-capacity-exceeded (infeasible), evaluate-conservation-
  violated (infeasible), brute-force solver returning value 3 cost 7,
  serialization, and the lex-tiebreaker test on a 4-vertex bottleneck
  instance where two distinct max-value flows (value 1) exist with
  costs 1 and 5 — brute-force must pick the cheaper one. The
  tiebreaker test directly addresses the check-issue warning that
  the original example admits a unique max-flow.
- problemreductions-cli/: new --source / --sink (usize) flags wired
  via schema-driven create; --graph (DirectedGraph), --capacities,
  --costs reused from the MECF wiring.
- docs/paper: problem-def block explaining the lex objective and the
  integral-flow restriction, worked example with value/cost
  breakdown; display-name; Crossref-verified BibTeX for
  Lummertz da Rocha et al. 2018 (doi:10.1038/s41467-018-03214-y);
  MIT 6.854 min-cost-flow notes as a @misc entry with URL.

References: doi:10.1038/s41467-018-03214-y (CellRouter, Nature Comms
2018); MIT 6.854 scribe notes for the standard min-cost-flow
equivalence.

The direct `MinimumCostMaximumFlow -> MinimumCostCirculation` rule
(#1031) is out of scope for this PR and will follow separately.

Closes #1029

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Classical minimum-cost circulation on a directed multigraph with
finite arc capacities u_e ≥ 0 and signed arc costs a_e ∈ R: find
g: E→R_{≥0} satisfying capacity bounds (0 ≤ g_e ≤ u_e) and flow
conservation at every vertex, minimizing sum_e a_e g_e.

This is the exact companion target for `MinimumCostMaximumFlow`
(reduction #1031) — the standard MCMF → MCCirc reduction uses a
sufficiently negative return arc from sink to source, which is why
signed costs must be supported in the base model.

Architectural carve-out (same as MinimumCostMaximumFlow #1029 and
MinimumEdgeCostFlow): the discrete Problem trait restricts to
INTEGRAL circulation, with one variable per arc e ranging over
{0..u_e} (dims[e] = u_e + 1, m entries); sound for any rational
instance by scaling. Documented in the module doc.
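
As a rough illustration of the integral model, the sketch below
evaluates a circulation and enumerates the u_e + 1 choices per arc
with a mixed-radix counter. Both function names are hypothetical and
capacities are assumed nonnegative.

```rust
/// Total signed cost of an integral circulation, or None if a capacity
/// bound or conservation at some vertex is violated.
pub fn circulation_cost(
    num_vertices: usize,
    arcs: &[(usize, usize)],
    capacities: &[i64],
    costs: &[i64],
    flow: &[i64],
) -> Option<i64> {
    if flow.len() != arcs.len() {
        return None;
    }
    let mut balance = vec![0i64; num_vertices];
    let mut cost = 0;
    for (i, &(u, v)) in arcs.iter().enumerate() {
        if flow[i] < 0 || flow[i] > capacities[i] {
            return None;
        }
        balance[u] -= flow[i];
        balance[v] += flow[i];
        cost += costs[i] * flow[i];
    }
    if balance.iter().any(|&b| b != 0) {
        return None; // conservation must hold at every vertex
    }
    Some(cost)
}

/// Minimum cost over all integral circulations (exhaustive search).
pub fn brute_force_min(
    num_vertices: usize,
    arcs: &[(usize, usize)],
    capacities: &[i64],
    costs: &[i64],
) -> Option<i64> {
    let mut flow = vec![0i64; arcs.len()];
    let mut best: Option<i64> = None;
    loop {
        if let Some(c) = circulation_cost(num_vertices, arcs, capacities, costs, &flow) {
            best = Some(best.map_or(c, |b| b.min(c)));
        }
        // Advance the counter; digit e ranges over 0..=capacities[e].
        let mut e = 0;
        while e < flow.len() && flow[e] == capacities[e] {
            flow[e] = 0;
            e += 1;
        }
        if e == flow.len() {
            return best;
        }
        flow[e] += 1;
    }
}
```

For the 2-vertex example from the issue (arc 0→1 with cap 2 and cost
3, arc 1→0 with cap 1 and cost -5), conservation forces equal flow on
both arcs, so one unit around the cycle gives 3 - 5 = -2.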

- src/models/graph/minimum_cost_circulation.rs:
  MinimumCostCirculation { graph: DirectedGraph, capacities: Vec<i64>,
  costs: Vec<i64> } — no source/sink, conservation at every vertex.
  Min<i64> objective; capacity-or-conservation violations → Min(None).
  ProblemSchemaEntry + ProblemSizeFieldEntry (num_vertices, num_arcs).
  declare_variants! default with conservative polynomial placeholder
  (num_vertices+num_arcs)^6. Canonical example via inventory — a
  3-vertex two-cycle instance discriminating between four feasible
  alternatives (zero, cycle-A-only -2, cycle-B-only -3, both at
  capacity -5) so round-trip tests have real discriminatory power,
  addressing the check-issue warning about the issue's 2-vertex
  example being too small.
- src/unit_tests/models/graph/minimum_cost_circulation.rs: 11 tests
  covering creation, evaluate-optimum (config [2,2,1,1] → Min(-5)),
  evaluate-zero, evaluate-cycle-A-only (-2), evaluate-cycle-B-only
  (-3), evaluate-half-cycle-A (-4), evaluate-infeasible (capacity
  exceeded, conservation violated), brute-force solver, serialization,
  negative-cost-only-cycle smoke.
- problemreductions-cli/: --graph (DirectedGraph), --capacities,
  --costs reused from MECF/MCMF wiring; new schema mapping for MCCirc.
- docs/paper: problem-def block with the two-cycle worked example
  spelled out (per-unit costs, capacity bottlenecks), display-name
  entry; reuses the existing mit6854MinCostFlow @misc bib entry
  added with MCMF — no new references.bib changes.

References: MIT 6.854 (S2021) min-cost flow algorithms notes (shared
with #1029).

The direct `MinimumCostMaximumFlow -> MinimumCostCirculation` rule
(#1031) is out of scope for this PR and will follow separately.

Closes #1030

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consensus-string problem under Hamming distance: given an alphabet
Σ = {0,...,q-1} and n equal-length strings s_1,...,s_n ∈ Σ^m, find a
center string c ∈ Σ^m minimizing max_i d_H(c, s_i). NP-hard (Frances
& Litman 1997, Lanctot et al. 1999), with extensive FPT and PTAS
literature (Gramm & Niedermeier 2003, Ma & Sun 2009, Li/Ma/Wang 2002).

Distinct from ClosestSubstring — every input string has the same
length as the center, so there is no window-selection decision.
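
A minimal sketch of the radius objective and the q^m exhaustive
search, with hypothetical function names. The strings in the usage
note are illustrative and not claimed to be the issue's instance.

```rust
/// max_i d_H(center, s_i); 0 for an empty input.
pub fn radius(strings: &[Vec<usize>], center: &[usize]) -> i64 {
    strings
        .iter()
        .map(|s| s.iter().zip(center).filter(|(a, b)| a != b).count() as i64)
        .max()
        .unwrap_or(0)
}

/// First center (in counter order) of minimum radius among all q^m
/// candidates; None for empty input or an empty alphabet.
pub fn best_center(alphabet_size: usize, strings: &[Vec<usize>]) -> Option<(Vec<usize>, i64)> {
    if alphabet_size == 0 {
        return None;
    }
    let m = strings.first()?.len();
    let mut center = vec![0usize; m];
    let mut best: Option<(Vec<usize>, i64)> = None;
    loop {
        let r = radius(strings, &center);
        if best.as_ref().map_or(true, |(_, b)| r < *b) {
            best = Some((center.clone(), r));
        }
        // Advance the base-q counter over center positions.
        let mut i = 0;
        while i < m && center[i] + 1 == alphabet_size {
            center[i] = 0;
            i += 1;
        }
        if i == m {
            return best;
        }
        center[i] += 1;
    }
}
```

For the binary strings 011, 101, 110, 000 the center 000 has radius 2,
and no center does better: any center within distance 1 of both 000
and 011 is 001 or 010, and each of those is at distance 3 from one of
the remaining strings.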

- src/models/misc/closest_string.rs: ClosestString { alphabet_size,
  strings: Vec<Vec<usize>> }. Validating constructor panics on
  length mismatch or out-of-alphabet symbol. dims = vec![q; m].
  Min<i64> objective (always feasible — every config in the cube is
  a syntactically valid center). Inherent getters alphabet_size,
  num_strings, string_length, total_length. ProblemSchemaEntry +
  ProblemSizeFieldEntry with all four size fields. declare_variants!
  default with complexity alphabet_size^string_length. Canonical
  example via inventory from the issue's 4-string binary length-3
  instance (optimal center [0,0,0], radius 2).
- src/unit_tests/models/misc/closest_string.rs: 11 tests covering
  creation, evaluate at three different centers (c=000 → 2, c=100
  → 3, c=111 → 3), brute-force solver returning radius 2 over 8
  candidates, three panic guards (empty input, length mismatch,
  out-of-alphabet symbol), a q=3 length-2 ternary smoke test
  (radius 2 over 9 candidates), and serialization.
- problemreductions-cli/: schema-driven create wires --alphabet-size
  (usize) and --strings (Vec<Vec<usize>>) — reuses the existing
  Vec<Vec<usize>> parser added with #994.
- docs/paper: problem-def block with all four Hamming distances
  spelled out for c=000; display-name entry; Crossref-verified
  Li/Ma/Wang 2002 (JACM) bib entry.

Reference: doi:10.1145/506147.506150 (Li, Ma & Wang 2002, JACM).

The direct `ClosestString -> ILP` rule (#1034) is out of scope for
this PR and will follow separately.

Closes #1032

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Window-selection generalization of ClosestString (#1032): given an
alphabet Σ, n strings (NOT necessarily equal length), and a substring
length ℓ, find a center c ∈ Σ^ℓ and start positions p_i selecting
length-ℓ windows s_i[p_i .. p_i+ℓ) minimizing
  max_i d_H(c, s_i[p_i .. p_i+ℓ)).
Motif-discovery model — NP-hard, no PTAS in general (Li, Ma & Wang
2002 JACM; Marx 2008).

ClosestString is the special case where every input string has
length exactly ℓ (single window per string).
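
The concatenated configuration layout can be sketched as follows: ℓ
center symbols followed by one window start per string. Function names
are hypothetical, inputs are assumed already validated (ℓ no longer
than the shortest string), and the strings in the test are
illustrative.

```rust
/// Per-variable domain sizes: ℓ slots of size q, then W_i = |s_i| - ℓ + 1
/// for each string.
pub fn substring_dims(alphabet_size: usize, strings: &[Vec<usize>], ell: usize) -> Vec<usize> {
    let mut dims = vec![alphabet_size; ell];
    dims.extend(strings.iter().map(|s| s.len() - ell + 1));
    dims
}

/// max_i d_H(c, s_i[p_i .. p_i + ℓ)) for config = center ++ starts.
pub fn window_radius(strings: &[Vec<usize>], ell: usize, config: &[usize]) -> i64 {
    let (center, starts) = config.split_at(ell);
    strings
        .iter()
        .zip(starts)
        .map(|(s, &p)| {
            s[p..p + ell]
                .iter()
                .zip(center)
                .filter(|(a, b)| a != b)
                .count() as i64
        })
        .max()
        .unwrap_or(0)
}
```

For three binary strings of length 5 and ℓ = 3 the dims are
[2,2,2,3,3,3], i.e. 8 * 27 = 216 candidate configurations.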

- src/models/misc/closest_substring.rs: ClosestSubstring {
  alphabet_size, strings: Vec<Vec<usize>>, substring_length }.
  Validating constructor panics on empty input, substring_length >
  min |s_i|, or out-of-alphabet symbol. dims concatenates ℓ center
  slots (domain {0..q-1}) with n window-start slots (domain
  {0..W_i-1} where W_i = |s_i| - ℓ + 1). Min<i64> objective, always
  feasible since every config in the cube is syntactically valid.
  Inherent getters alphabet_size, num_strings, substring_length,
  total_length, total_num_windows, num_window_choice_product (with
  saturating multiplication). ProblemSchemaEntry +
  ProblemSizeFieldEntry. declare_variants! default with complexity
  alphabet_size^substring_length * num_window_choice_product.
  Canonical example via inventory from the issue's 3 binary strings
  with ℓ=3 — optimum center [0,1,0] with window picks (0,1,0),
  radius 1 over 216 candidate configs.
- src/unit_tests/models/misc/closest_substring.rs: 11 tests covering
  creation, evaluate at optimum (radius 1), evaluate at center
  [0,0,0] with all-zero windows (radius 2), evaluate at center
  [1,1,1] (radius >=1), brute-force solver, ClosestString reduction
  validation (substring_length = string_length → matches the #1032
  canonical's radius 2), three panic guards (empty input,
  substring_length > min |s_i|, out-of-alphabet symbol), and
  serialization roundtrip.
- problemreductions-cli/: schema-driven create wires
  --alphabet-size + --strings (reused from #1032) plus the new
  --substring-length (usize) flag.
- docs/paper: problem-def block with the worked example listing all
  three window picks and per-window Hamming distances; display-name
  entry. Reuses the existing Li/Ma/Wang 2002 JACM BibTeX entry
  added with #1032 — no references.bib changes.

Reference: doi:10.1145/506147.506150 (Li, Ma & Wang 2002, JACM)
shared with #1032.

The direct `ClosestSubstring -> ILP` rule (#1035) is out of scope
for this PR and will follow separately.

Closes #1033

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Classical protein-structure contact-map alignment: given two ordered
contact graphs G_1=(V_1,E_1) and G_2=(V_2,E_2), find an order-preserving
partial injective map f: V_1 → V_2 ∪ {unmatched} maximizing the number
of contacts {i,k} ∈ E_1 such that both i, k are matched and
{f(i), f(k)} ∈ E_2. Aliases: CMO, MaxCMO.

NP-hard with substantial literature on exact algorithms and integer
programming (Andonov, Malod-Dognin & Yanev 2011; Xie & Sahinidis 2007).
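
A minimal sketch of the evaluation under the encoding used in this
commit (value 0 is unmatched, value j+1 maps to vertex j of G_2). The
function name is hypothetical, and contacts are assumed to be
normalized to u < v. Requiring the matched values to be strictly
increasing gives injectivity and order preservation in one pass.

```rust
/// Number of preserved contacts, or None if the matched values are not
/// strictly increasing or a value is out of range.
pub fn contact_overlap(
    contacts_1: &[(usize, usize)],
    contacts_2: &[(usize, usize)],
    num_vertices_2: usize,
    config: &[usize],
) -> Option<i64> {
    let mut last = 0;
    for &slot in config {
        if slot > num_vertices_2 {
            return None;
        }
        if slot != 0 {
            if slot <= last {
                return None; // repeated or out-of-order image
            }
            last = slot;
        }
    }
    // f is increasing on matched vertices, so (f(i), f(k)) stays sorted.
    let count = contacts_1
        .iter()
        .filter(|&&(i, k)| {
            let (fi, fk) = (config[i], config[k]);
            fi != 0 && fk != 0 && contacts_2.contains(&(fi - 1, fk - 1))
        })
        .count();
    Some(count as i64)
}
```

On the canonical example below, [1,2,4,5] maps contact (0,2) to (0,3)
and (1,3) to (1,4), both present in G_2, for an overlap of 2; the
config [1,2,3,4] preserves only (0,2).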

- src/models/graph/maximum_contact_map_overlap.rs:
  MaximumContactMapOverlap { num_vertices_1, contacts_1, num_vertices_2,
  contacts_2 }. Validating constructor normalizes each pair to sorted
  form (u<v), rejects self-loops, duplicates, and out-of-range
  endpoints. dims = vec![num_vertices_2 + 1; num_vertices_1] (value 0
  encodes unmatched; value j+1 maps to vertex j of G_2). Max<i64>
  objective; non-injective or non-order-preserving matched values →
  Max(None). ProblemSchemaEntry + ProblemSizeFieldEntry; inherent
  getters num_vertices_1/_2 and num_contacts_1/_2. declare_variants!
  default with complexity (num_vertices_2+1)^num_vertices_1.
  Canonical example via inventory: G_1 with 4 vertices and contacts
  {(0,2),(1,3)}, G_2 with 5 vertices and contacts {(0,2),(0,3),(1,4)}
  — optimum [1,2,4,5] preserves both contacts → Max(Some(2)).
- src/unit_tests/models/graph/maximum_contact_map_overlap.rs: 17 tests
  covering creation, evaluate at optimum, all-unmatched, single-match,
  non-injective Max(None), non-order-preserving Max(None), suboptimal
  feasible (config [1,2,3,4] preserves 1 of 2 contacts), brute-force
  solver returning Max(Some(2)), wrong-length and out-of-range guards,
  serialization, alias resolution for CMO/MaxCMO, and three panic
  guards (self-loop, duplicate contact, endpoint out of range).
- problemreductions-cli/: schema-driven create wires --num-vertices-1
  / --num-vertices-2 / --contacts-1 / --contacts-2 (Vec<(usize,usize)>
  parser) via the existing CreateArgs + flag_map + tests fixture.
- docs/paper: problem-def block with the alignment table and the two
  preserved-contact bullets; display-name; Crossref-verified BibTeX
  for both Andonov-Malod-Dognin-Yanev 2011 and Xie-Sahinidis 2007
  JCB papers (with No{\"e}l encoded per repo umlaut convention).
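
The evaluation rule described above can be sketched as follows. This is a standalone illustration, not the code in maximum_contact_map_overlap.rs: the free function `evaluate` is hypothetical and returns a plain `Option<i64>` where the crate uses its `Max` wrapper. Returning `None` for out-of-range values and for configs too short to cover a contact endpoint is an assumption; the real model may panic there.

```rust
use std::collections::HashSet;

/// Overlap of a config under the encoding from the commit message:
/// value 0 means "unmatched", value j+1 maps to vertex j of G_2.
pub fn evaluate(
    config: &[usize],
    num_vertices_2: usize,
    contacts_1: &[(usize, usize)],
    contacts_2: &[(usize, usize)],
) -> Option<i64> {
    // Matched values must be strictly increasing; that gives both
    // order preservation and injectivity in a single pass.
    let mut last = 0;
    for &v in config {
        if v > num_vertices_2 {
            return None; // assumed handling of out-of-range values
        }
        if v != 0 {
            if v <= last {
                return None;
            }
            last = v;
        }
    }
    // Store G_2 contacts in sorted (u < v) form for lookup.
    let e2: HashSet<(usize, usize)> = contacts_2
        .iter()
        .map(|&(u, v)| (u.min(v), u.max(v)))
        .collect();
    let mut overlap = 0i64;
    for &(i, k) in contacts_1 {
        let a = *config.get(i)?;
        let b = *config.get(k)?;
        if a != 0 && b != 0 {
            let (x, y) = (a - 1, b - 1);
            if e2.contains(&(x.min(y), x.max(y))) {
                overlap += 1;
            }
        }
    }
    Some(overlap)
}
```

On the canonical instance quoted above, `[1,2,4,5]` gives `Some(2)` and `[1,2,3,4]` gives `Some(1)`, matching the optimum and the suboptimal-feasible test case.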

References: doi:10.1089/cmb.2009.0196 (Andonov, Malod-Dognin & Yanev
2011, JCB); doi:10.1089/cmb.2007.R007 (Xie & Sahinidis 2007, JCB).

The direct `MaximumContactMapOverlap -> ILP` rule (#1044) is out of
scope for this PR and will follow separately.

Closes #1043

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@isPANN isPANN changed the title Batch-add 12 Backlog Models (one branch, serial) Batch-add 12 Backlog Models May 25, 2026

codecov Bot commented May 25, 2026

Codecov Report

❌ Patch coverage is 98.12398% with 46 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.93%. Comparing base (8df8ac0) to head (e346156).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/models/graph/maximum_co_k_plex.rs | 90.32% | 9 Missing ⚠️ |
| src/models/graph/minimum_cost_maximum_flow.rs | 94.85% | 7 Missing ⚠️ |
| src/models/graph/maximum_common_edge_subgraph.rs | 95.77% | 6 Missing ⚠️ |
| src/models/graph/minimum_cost_circulation.rs | 93.02% | 6 Missing ⚠️ |
| ...rc/models/graph/prize_collecting_steiner_forest.rs | 97.29% | 5 Missing ⚠️ |
| src/models/graph/highly_connected_deletion.rs | 98.15% | 3 Missing ⚠️ |
| ...misc/minimum_discrete_planar_inverse_kinematics.rs | 97.58% | 3 Missing ⚠️ |
| src/models/misc/closest_substring.rs | 98.05% | 2 Missing ⚠️ |
| ...it_tests/models/graph/minimum_cost_maximum_flow.rs | 97.95% | 2 Missing ⚠️ |
| src/models/graph/maximum_edge_weighted_k_clique.rs | 98.80% | 1 Missing ⚠️ |

... and 2 more
Additional details and impacted files
@@            Coverage Diff            @@
##             main    #1067     +/-   ##
=========================================
  Coverage   97.93%   97.93%             
=========================================
  Files         966      990     +24     
  Lines      100154   102606   +2452     
=========================================
+ Hits        98083   100488   +2405     
- Misses       2071     2118     +47     

isPANN and others added 2 commits May 25, 2026 23:00
Net -83 lines across the four skill files touched by 694ef0c.

auto-pipeline (-79):
- Collapse Step 0a + 0b into a single picker block; the only
  difference was "filter by number" vs "sort and pick top" — now
  branched by whether the ISSUE env var is set.
- Extract the boilerplate that was duplicated across all five
  subagent prompts (output-only-JSON-block contract, universal don'ts,
  malformed-JSON retry policy, severity vocabulary) into a single
  "Subagent Contract" section near the top. Each prompt now states
  only its scope and JSON shape. Drop the trailing "Reporting
  Contract" section (merged into the new one).
- Trim the Board states table from 8 GraphQL IDs to the 3 columns
  the orchestrator actually touches: ready and on-hold, which it
  writes, and Backlog, which it reads in Step 0. The IDs for
  In Progress / Review pool / Under review / Final review live in
  run-pipeline / review-pipeline, where they are used.
- Drop three rows from Common Mistakes that just echoed the spec
  (codex retry cap, increment SUBSTANTIVE_RETRIES, re-check after
  auto-fix); keep only the non-obvious cross-cutting traps.

check-issue (-9 net):
- Rule Check 5a no longer re-lists the literature fallback chain;
  one-line reference to Check 3c suffices.
- Rule Check 5c verdict table drops the (severity: ...) annotations
  on Fail rows — severity classification is owned by auto-pipeline's
  Subagent Contract, not by check-issue itself.
- Rule Check 5c drops the "cited reference does not contain the
  reduction" row that explicitly admitted overlap with Check 3c;
  add a one-line note that 3c handles that case.

review-structural (-19):
- Step 4b-4 (pred --via spot-check) carried a "fall back to 4b-2"
  escape hatch that defeated its purpose. Rewrite it as a short,
  focused step that uses pred --via when wired and is otherwise
  skipped with a note.

review-quality (~2 net):
- Replace the 4-criterion expansion of the round-trip rule (which
  was copy-pasted from review-structural Step 4b-3) with a single
  pointer to that source-of-truth section, per the existing
  feedback_skill_no_duplication memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- MinimumDiscretePlanarInverseKinematics: simplify complexity
  string from "num_links * total_configurations" to
  "total_configurations" so it matches issue #994's stated
  O(prod_{j=1}^n m_j) baseline literally (the extra num_links
  factor was per-config feasibility-check work, not configs).

- MinimumCostCirculation: add test_minimum_cost_circulation_issue_example_1030
  that constructs issue #1030's verbatim 2-vertex example (arcs
  0->1 cap=2 cost=3 and 1->0 cap=1 cost=-5, optimum -2). The
  existing richer 3-vertex canonical instance is kept as the
  primary discriminator.
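
The optimum of -2 for the issue #1030 example can be checked by hand with a tiny brute-force enumeration. The sketch below is illustrative only and is not the test added in this commit: `min_cost_circulation` is a hypothetical free function, arcs are `(from, to, capacity, cost)` tuples, and it assumes integer flows with zero lower bounds.

```rust
/// Brute-force minimum-cost circulation over integer flows.
/// Each arc is (from, to, capacity, cost); lower bounds are assumed 0.
/// Panics if an arc endpoint is >= `num_vertices`.
pub fn min_cost_circulation(num_vertices: usize, arcs: &[(usize, usize, i64, i64)]) -> i64 {
    let mut flow = vec![0i64; arcs.len()];
    // The zero flow is always a feasible circulation, so the
    // result is never left at i64::MAX.
    let mut best = i64::MAX;
    loop {
        // Flow conservation: net balance must be zero at every vertex.
        let mut balance = vec![0i64; num_vertices];
        for (&(u, v, _, _), &f) in arcs.iter().zip(&flow) {
            balance[u] -= f;
            balance[v] += f;
        }
        if balance.iter().all(|&b| b == 0) {
            let cost: i64 = arcs.iter().zip(&flow).map(|(&(_, _, _, c), &f)| c * f).sum();
            best = best.min(cost);
        }
        // Advance the flow vector like an odometer, bounded by capacities.
        let mut i = 0;
        loop {
            if i == arcs.len() {
                return best;
            }
            if flow[i] < arcs[i].2 {
                flow[i] += 1;
                break;
            }
            flow[i] = 0;
            i += 1;
        }
    }
}
```

For the two arcs 0->1 (cap=2, cost=3) and 1->0 (cap=1, cost=-5), the only balanced flows are (0,0) with cost 0 and (1,1) with cost 3 - 5 = -2, so the enumeration returns -2.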
@isPANN isPANN marked this pull request as ready for review May 25, 2026 18:05