Cheaply-verifiable rule for a buggy reduction, generalized to optimization (no solving)

## A cheaply-verifiable rule for "buggy reduction", generalized to optimization

Builds on #14 ("identify a buggy rule *without solving the problem exactly*"). This proposes a concrete rule and the verifier changes it implies.

### Design invariant (the requirement from #14)

> **A bug certificate must be checkable with a constant number of `reduce`/`extract`/`embed`/`evaluate` calls on the *given* configurations. No `pred solve`.**

A bug is an existential object, so it *should* have a short certificate. We only fall back to solving when no short certificate can exist (see "irreducible case" below).

### Where the current verifier stands (`benchmark/verify.py`)

| Violation | Verdict | Problem |
|---|---|---|
| `unsound_extraction` (`verify.py:156`) | cheap, ~sound | Evaluates the AI's `claimed_source_solution` but **never runs `pred extract`** to confirm extraction actually returns it (lines 195–199). Contract is `extract(B,c_B)==c_A`; verifier should recompute, not trust. |
| `incomplete_reduction` (`verify.py:224`) | **solves the hard problem** | Brute-force-solves *both* source (line 229) and target (line 247). Direct violation of the invariant. |
| `suboptimal_extraction` (`verify.py:280`) | cheap but **unsound** | Declares a bug when any source config beats `extract(target_config)` (line 332), but **never checks `target_config` is target-optimal**. A correct reduction maps optimal→optimal and is free to map *suboptimal→suboptimal*. False positives. Making it sound in this framing requires proving `target_config` optimal ⇒ solving. |

Root cause (same as the original SAT rule's "`c_B` unsat ∧ `extract(c_B)` sat" disjunct): the reduction exposes only the **backward** map `extract`, so completeness/optimality — which are statements about the **forward** direction — get certified with data `extract` has no contract over.

### Proposed rule

Add a **forward solution map** alongside the existing backward one:

```
reduce : A ↦ B              (instance map — exists: `pred reduce`)
extract: (B, c_B) ↦ c_A     (backward solution map — exists: `pred extract`)
embed  : (A, c_A) ↦ c_B     (forward solution map — `pred embed`, if not already exposed)
```

**Decision problems.** Correct iff (B) every `c_B` satisfying `B` has `extract(c_B)` satisfying `A`, and (F) every `c_A` satisfying `A` has `embed(c_A)` satisfying `B`. Together ⇒ `YES_A ⟺ YES_B`. A bug is one witness violating either. (F) is the cheap, sound completeness certificate that replaces the broken disjunct.

**Optimization problems.** Each problem has an objective (`pred evaluate` already returns `Max(v)`/`Min(v)`/`…(None)`); the reduction declares an affine cost relation `(a, b)`, `a ≠ 0` (sign flips min↔max). Contract, as two inequalities (WLOG min):

```
forward : obj_B(embed(c_A))   ≤ a·obj_A(c_A) + b
backward: obj_A(extract(c_B)) ≤ (obj_B(c_B) − b) / a
```

These two inequalities pin `OPT_B = a·OPT_A + b` and force optima↦optima **without knowing OPT**:
take optimal `c_A*` ⇒ `embed(c_A*)` feasible at cost ≤ `a·OPT_A+b` ⇒ `OPT_B ≤ a·OPT_A+b`;
take optimal `c_B*` ⇒ `extract(c_B*)` feasible at cost ≤ `(OPT_B−b)/a` ⇒ `a·OPT_A+b ≤ OPT_B`. Equality, and `extract(c_B*)` attains `OPT_A`. ∎

### Four cheap, sound bug certificates (no solver)

| Certificate | Witness | Catches | Replaces |
|---|---|---|---|
| **B-infeasible** | `c_B` feasible, `extract(c_B)` infeasible | garbage extraction | today's `unsound_extraction` |
| **B-cost** | `c_B` feasible, `obj_A(extract(c_B)) > (obj_B(c_B)−b)/a` | extraction loses objective value | fixes `suboptimal_extraction` |
| **F-infeasible** | `c_A` feasible, `embed(c_A)` infeasible | reduction drops solutions | cheap `incomplete_reduction` |
| **F-cost** | `c_A` feasible, `obj_B(embed(c_A)) > a·obj_A(c_A)+b` | reduction's optimum unreachable | — |

Decision is the special case where "feasible" *is* the objective: the two `*-infeasible` rows survive, the cost rows vanish. One unified rule.

Key fix for suboptimality: **B-cost is per-solution, not per-optimum** — compare `extract(c_B)` against the value *predicted from `c_B`'s own score* via `(a,b)`, so `c_B` need not be optimal and nothing is solved. Contract-free variant if we don't want to trust a declared `(a,b)`: an **order-reversal pair** — feasible `c_B1, c_B2` with `obj_B(c_B1) < obj_B(c_B2)` but `obj_A(extract(c_B1)) > obj_A(extract(c_B2))` — proves "solve B then extract" doesn't solve A, no `(a,b)` and no solver.

### The one irreducible case

"A is feasible but B is not" (pure completeness) has **no short certificate without `embed`** — that's why `incomplete_reduction` solves today. `embed(c_A)` sidesteps it by *exhibiting* the target solution that should exist instead of proving its existence by search. Hence the fork:

- **Add `pred embed` upstream** (`CodingThrust/problem-reductions`) → all four certificates are cheap + sound; the framework becomes a clean optimum-preserving-reduction checker. *(Needs an upstream issue.)*
- **Don't add it** → keep `incomplete_reduction` as an explicit "expensive, last-resort" check gated behind an instance-size cap; still replace `suboptimal_extraction` with the cheap+sound **B-cost / order-reversal** check (no forward map needed). Removes one of the two solves and fixes the unsoundness.

### Is a forward `embed` generally available?

Yes for essentially every reduction this library targets — `embed` is the *constructive content of the completeness proof*. "YES_A ⟹ YES_B" (and its optimization analog `OPT_B = a·OPT_A+b`) is proven by **building** a target solution from a source one; that construction *is* `embed`. A correct reduction cannot lack one in principle. Caveats are about implementation/cost, not existence:

- **Usually unimplemented, not unavailable.** Using a reduction needs only `extract`, so `embed` is left in the proof, not the code. The upstream ask is "expose the forward map the proof already defines."
- **Gadget / spin-glass / QUBO reductions stay cheap.** Given the source assignment, each gadget's auxiliary variables are set by a *local, per-gadget* choice/ground state — polynomial, not a global solve. (Confirm per rule that no step hides a global optimization.)
- **Non-parsimonious reductions** (one source ↔ many target solutions) need only a **canonical choice**: `embed` returns one valid target config, not all.
- **`embed` need not invert `extract`.** The certificates only require each map to land feasible at the right cost; `extract(embed(c_A)) = c_A` is a bonus parsimony check, not a requirement.
- **Composes along `path`:** for A→C→B, `embed_total = embed_{C→B} ∘ embed_{A→C}`, so a chain has `embed` iff each atomic step does.

Consequence: the rule **degrades gracefully** — even where a rule can't expose a cheap `embed`, only the *completeness* certificate falls back to the gated solve; the backward certificates (B-infeasible, B-cost, order-reversal) stay solver-free.

### Proposed work (this repo)

- [ ] `unsound_extraction`: recompute `pred extract` on `target_config` and verify *its* output is infeasible (don't trust `claimed_source_solution`).
- [ ] Replace `suboptimal_extraction` with **B-cost** (declared `(a,b)`) and/or **order-reversal** (two target configs) — both solver-free.
- [ ] Add **F-infeasible** / **F-cost** certificates once `pred embed` exists; until then, gate `incomplete_reduction` behind a size cap and label it expensive.
- [ ] Update `results.schema.json` violation enum + required fields per certificate; add fixtures + calibration cases (one true-positive and one false-positive per type).

### Acceptance criteria

- Every accepted certificate type is verifiable with O(1) `pred` calls and **zero `pred solve`** (except the explicitly-gated completeness fallback, if `embed` is not added).
- A calibration fixture demonstrating the current `suboptimal_extraction` false positive (suboptimal target config extracting to a suboptimal source solution) is **rejected** by the new check.
- `--calibrate` passes with the new fixtures.

cc @isPANN (#14)


Violation	Verdict	Problem
`unsound_extraction` (`verify.py:156`)	cheap, ~sound	Evaluates the AI's `claimed_source_solution` but never runs `pred extract` to confirm extraction actually returns it (lines 195–199). Contract is `extract(B,c_B)==c_A`; verifier should recompute, not trust.
`incomplete_reduction` (`verify.py:224`)	solves the hard problem	Brute-force-solves both source (line 229) and target (line 247). Direct violation of the invariant.
`suboptimal_extraction` (`verify.py:280`)	cheap but unsound	Declares a bug when any source config beats `extract(target_config)` (line 332), but never checks `target_config` is target-optimal. A correct reduction maps optimal→optimal and is free to map suboptimal→suboptimal. False positives. Making it sound in this framing requires proving `target_config` optimal ⇒ solving.

Certificate	Witness	Catches	Replaces
B-infeasible	`c_B` feasible, `extract(c_B)` infeasible	garbage extraction	today's `unsound_extraction`
B-cost	`c_B` feasible, `obj_A(extract(c_B)) > (obj_B(c_B)−b)/a`	extraction loses objective value	fixes `suboptimal_extraction`
F-infeasible	`c_A` feasible, `embed(c_A)` infeasible	reduction drops solutions	cheap `incomplete_reduction`
F-cost	`c_A` feasible, `obj_B(embed(c_A)) > a·obj_A(c_A)+b`	reduction's optimum unreachable	—

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Cheaply-verifiable rule for a buggy reduction, generalized to optimization (no solving) #15

A cheaply-verifiable rule for "buggy reduction", generalized to optimization

Design invariant (the requirement from #14)

Where the current verifier stands (`benchmark/verify.py`)

Proposed rule

Four cheap, sound bug certificates (no solver)

The one irreducible case

Is a forward `embed` generally available?

Proposed work (this repo)

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Cheaply-verifiable rule for a buggy reduction, generalized to optimization (no solving) #15

Description

A cheaply-verifiable rule for "buggy reduction", generalized to optimization

Design invariant (the requirement from #14)

Where the current verifier stands (benchmark/verify.py)

Proposed rule

Four cheap, sound bug certificates (no solver)

The one irreducible case

Is a forward embed generally available?

Proposed work (this repo)

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Where the current verifier stands (`benchmark/verify.py`)

Is a forward `embed` generally available?