Skip to content

Cheaply-verifiable rule for a buggy reduction, generalized to optimization (no solving) #15

Description

@GiggleLiu

A cheaply-verifiable rule for "buggy reduction", generalized to optimization

Builds on #14 ("identify a buggy rule without solving the problem exactly"). This proposes a concrete rule and the verifier changes it implies.

Design invariant (the requirement from #14)

A bug certificate must be checkable with a constant number of reduce/extract/embed/evaluate calls on the given configurations. No pred solve.

A bug is an existential object, so it should have a short certificate. We only fall back to solving when no short certificate can exist (see "irreducible case" below).

Where the current verifier stands (benchmark/verify.py)

Violation Verdict Problem
unsound_extraction (verify.py:156) cheap, ~sound Evaluates the AI's claimed_source_solution but never runs pred extract to confirm extraction actually returns it (lines 195–199). Contract is extract(B,c_B)==c_A; verifier should recompute, not trust.
incomplete_reduction (verify.py:224) solves the hard problem Brute-force-solves both source (line 229) and target (line 247). Direct violation of the invariant.
suboptimal_extraction (verify.py:280) cheap but unsound Declares a bug when any source config beats extract(target_config) (line 332), but never checks target_config is target-optimal. A correct reduction maps optimal→optimal and is free to map suboptimal→suboptimal. False positives. Making it sound in this framing requires proving target_config optimal ⇒ solving.

Root cause (same as the original SAT rule's "c_B unsat ∧ extract(c_B) sat" disjunct): the reduction exposes only the backward map extract, so completeness/optimality — which are statements about the forward direction — get certified with data extract has no contract over.

Proposed rule

Add a forward solution map alongside the existing backward one:

reduce : A ↦ B              (instance map — exists: `pred reduce`)
extract: (B, c_B) ↦ c_A     (backward solution map — exists: `pred extract`)
embed  : (A, c_A) ↦ c_B     (forward solution map — `pred embed`, if not already exposed)

Decision problems. Correct iff (B) every c_B satisfying B has extract(c_B) satisfying A, and (F) every c_A satisfying A has embed(c_A) satisfying B. Together ⇒ YES_A ⟺ YES_B. A bug is one witness violating either. (F) is the cheap, sound completeness certificate that replaces the broken disjunct.

Optimization problems. Each problem has an objective (pred evaluate already returns Max(v)/Min(v)/…(None)); the reduction declares an affine cost relation (a, b), a ≠ 0 (sign flips min↔max). Contract, as two inequalities (WLOG min):

forward : obj_B(embed(c_A))   ≤ a·obj_A(c_A) + b
backward: obj_A(extract(c_B)) ≤ (obj_B(c_B) − b) / a

These two inequalities pin OPT_B = a·OPT_A + b and force optima↦optima without knowing OPT:
take optimal c_A*embed(c_A*) feasible at cost ≤ a·OPT_A+bOPT_B ≤ a·OPT_A+b;
take optimal c_B*extract(c_B*) feasible at cost ≤ (OPT_B−b)/aa·OPT_A+b ≤ OPT_B. Equality, and extract(c_B*) attains OPT_A. ∎

Four cheap, sound bug certificates (no solver)

Certificate Witness Catches Replaces
B-infeasible c_B feasible, extract(c_B) infeasible garbage extraction today's unsound_extraction
B-cost c_B feasible, obj_A(extract(c_B)) > (obj_B(c_B)−b)/a extraction loses objective value fixes suboptimal_extraction
F-infeasible c_A feasible, embed(c_A) infeasible reduction drops solutions cheap incomplete_reduction
F-cost c_A feasible, obj_B(embed(c_A)) > a·obj_A(c_A)+b reduction's optimum unreachable

Decision is the special case where "feasible" is the objective: the two *-infeasible rows survive, the cost rows vanish. One unified rule.

Key fix for suboptimality: B-cost is per-solution, not per-optimum — compare extract(c_B) against the value predicted from c_B's own score via (a,b), so c_B need not be optimal and nothing is solved. Contract-free variant if we don't want to trust a declared (a,b): an order-reversal pair — feasible c_B1, c_B2 with obj_B(c_B1) < obj_B(c_B2) but obj_A(extract(c_B1)) > obj_A(extract(c_B2)) — proves "solve B then extract" doesn't solve A, no (a,b) and no solver.

The one irreducible case

"A is feasible but B is not" (pure completeness) has no short certificate without embed — that's why incomplete_reduction solves today. embed(c_A) sidesteps it by exhibiting the target solution that should exist instead of proving its existence by search. Hence the fork:

  • Add pred embed upstream (CodingThrust/problem-reductions) → all four certificates are cheap + sound; the framework becomes a clean optimum-preserving-reduction checker. (Needs an upstream issue.)
  • Don't add it → keep incomplete_reduction as an explicit "expensive, last-resort" check gated behind an instance-size cap; still replace suboptimal_extraction with the cheap+sound B-cost / order-reversal check (no forward map needed). Removes one of the two solves and fixes the unsoundness.

Is a forward embed generally available?

Yes for essentially every reduction this library targets — embed is the constructive content of the completeness proof. "YES_A ⟹ YES_B" (and its optimization analog OPT_B = a·OPT_A+b) is proven by building a target solution from a source one; that construction is embed. A correct reduction cannot lack one in principle. Caveats are about implementation/cost, not existence:

  • Usually unimplemented, not unavailable. Using a reduction needs only extract, so embed is left in the proof, not the code. The upstream ask is "expose the forward map the proof already defines."
  • Gadget / spin-glass / QUBO reductions stay cheap. Given the source assignment, each gadget's auxiliary variables are set by a local, per-gadget choice/ground state — polynomial, not a global solve. (Confirm per rule that no step hides a global optimization.)
  • Non-parsimonious reductions (one source ↔ many target solutions) need only a canonical choice: embed returns one valid target config, not all.
  • embed need not invert extract. The certificates only require each map to land feasible at the right cost; extract(embed(c_A)) = c_A is a bonus parsimony check, not a requirement.
  • Composes along path: for A→C→B, embed_total = embed_{C→B} ∘ embed_{A→C}, so a chain has embed iff each atomic step does.

Consequence: the rule degrades gracefully — even where a rule can't expose a cheap embed, only the completeness certificate falls back to the gated solve; the backward certificates (B-infeasible, B-cost, order-reversal) stay solver-free.

Proposed work (this repo)

  • unsound_extraction: recompute pred extract on target_config and verify its output is infeasible (don't trust claimed_source_solution).
  • Replace suboptimal_extraction with B-cost (declared (a,b)) and/or order-reversal (two target configs) — both solver-free.
  • Add F-infeasible / F-cost certificates once pred embed exists; until then, gate incomplete_reduction behind a size cap and label it expensive.
  • Update results.schema.json violation enum + required fields per certificate; add fixtures + calibration cases (one true-positive and one false-positive per type).

Acceptance criteria

  • Every accepted certificate type is verifiable with O(1) pred calls and zero pred solve (except the explicitly-gated completeness fallback, if embed is not added).
  • A calibration fixture demonstrating the current suboptimal_extraction false positive (suboptimal target config extracting to a suboptimal source solution) is rejected by the new check.
  • --calibrate passes with the new fixtures.

cc @isPANN (#14)

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions