Skip to content

napkin-math: threshold-pairing rule for extract-parameters-from-digest#738

Closed
neoneye wants to merge 1 commit into
mainfrom
napkin-math/phase2-extract-threshold-pairing
Closed

napkin-math: threshold-pairing rule for extract-parameters-from-digest#738
neoneye wants to merge 1 commit into
mainfrom
napkin-math/phase2-extract-threshold-pairing

Conversation

@neoneye
Copy link
Copy Markdown
Member

@neoneye neoneye commented May 20, 2026

Scope

Single-file change: adds a "Threshold pairing rule" to system-prompt.txt for the extract-parameters-from-digest skill, slotted between "Coverage and capacity gate rule" and "Combined viability gate preservation". No code changes; no compress changes.

Split out of PR #737 because the threshold-pairing rule lives in the extract stage, not the compress stage, so it does not fit that PR's "Phase 1 compress-prompt cleanup" scope.

What the rule says

When the extract emits a key_value whose role is a numeric threshold — a floor, cap, ceiling, minimum, maximum, target volume, target share, or target deadline — it must also emit a paired margin/surplus calculation comparing the realised quantity against the threshold. The pairing has three parts:

  1. The threshold goes in key_values.
  2. The realised quantity goes in missing_values_to_estimate if the source does not name it.
  3. The margin calculation goes in recommended_first_calculations or derived_questions, with realised - threshold (floor: positive = pass) or threshold - realised (cap: positive = pass), using the _margin / _surplus suffix.

Under cap pressure, the rule says to drop a less-load-bearing key_value or move a less-critical calc to derived_questions — never skip the pairing.

Why

Existing rules in the same file (No orphan formula rule, Coverage and capacity gate rule, Dead-end variable prevention) cover the principle abstractly, but extractor runs were leaving threshold key_values unpaired in practice. The new rule operationalises the pairing as a concrete three-part check.

Corpus-agnostic by construction

The rule names only structural categories (floor, cap, ceiling, target volume, target share, target deadline) and the existing _margin / _surplus naming convention. No corpus literals, no plan names, no domain-specific acronyms, no expected output ids.

Regression probes (not acceptance criteria)

Baseline plans are used to detect that the rule moves the right behaviour, not to define what the rule should target. Probes run against the gitignored output/v50/ digests:

  • A reverse-logistics campaign plan in the baseline previously had two unpaired threshold key_values (a volume target and a charitable-donation floor). The new rule produces paired *_margin calculations for both.
  • A disaster-response plan in the baseline previously had a generator-fuel priority floor unpaired and a geological-observation trigger mis-classified as a simulatable key_value. The new rule produces the missing pairing for the fuel floor and re-routes the geological trigger to unmodelled_gates.

The probes also surface that the existing 5-cap on missing_values_to_estimate interacts with the threshold-pairing rule in ways that can force tradeoffs (a less-critical existing pairing dropped to make room). This is a known limitation, not a fault of the new rule, and is left for a structural followup.

What this PR does NOT do

  • It does not add prompt content that targets any specific baseline plan.
  • It does not raise the 5-cap on missing_values_to_estimate or modify any other hard limit.
  • It does not address the unrelated compress-LLM run-to-run variance that drops other tripwires from the digest before the extract sees them (that issue is independent of this rule and remains open).

Test plan

  • CI green (no code changes; just a prompt-text edit)
  • Hand-spot the new rule paragraph reads corpus-agnostically (no plan names, no literals)
  • Re-run extract-parameters-from-digest against the napkin_math baseline; the rule should move only structural patterns, not target any single plan

…gest

Adds a new "Threshold pairing rule" to the extract skill's system prompt, slotted between "Coverage and capacity gate rule" and "Combined viability gate preservation".

Why: extraction runs were leaving threshold key_values (floors, caps, targets, deadlines) unpaired with their realised-vs-threshold margin calcs. The 'no dead-end variables' rule already forbids this in principle, but in practice the abstract directive was being followed unevenly. The new rule makes the pairing explicit and operational: every extracted threshold gets a paired margin calculation in recommended_first_calculations or derived_questions, with the realised quantity declared in missing_values_to_estimate when the source does not name it.

The rule is corpus-agnostic — it uses only structural categories (floor, cap, ceiling, target volume, target share, target deadline) and the existing _margin / _surplus naming convention. No corpus literals.

Tested by re-running the skill on two v50 baselines that had unpaired thresholds:

- crate_recovery_campaign — previously had target_recovered_crates (108k volume target) and minimum_donation_threshold_dkk (500k donation floor) as unpaired key_values. Re-extraction emits q_volume_target_margin and q_donation_minimum_margin as derived_questions per the new rule. Also restructured average_effective_incentive_per_crate_dkk from a missing-value to an explicit recommended_first_calculation so that incentive_per_crate_dkk and pilot_incentive_decel_threshold_crates appear in real depends_on rather than narrative-only suggested_estimation_method prose.

- yellowstone_evacuation — previously had hospital_fuel_priority_share (75% generator-fuel floor) unpaired and vei7_uplift_trigger_cm_per_hour mis-classified as a simulatable key_value. Re-extraction emits hospital_fuel_priority_margin and moves vei7 to unmodelled_gates (geological observation, not deterministically simulatable). Dropped zone_zero_evacuation_target_people (population denominator, not a viability threshold) and the weakest existing pair (public_compliance_threshold_zone_one) under cap pressure, since hospital generator-fuel priority is more directly life-safety than downstream traffic congestion.

Both v50 parameters.json files now audit clean against the no-dead-end-variables rule (every key_value appears in at least one calc's depends_on).
@neoneye
Copy link
Copy Markdown
Member Author

neoneye commented May 20, 2026

Consolidating into PR #737 per request — single PR makes it easier to verify the combined compress + extract prompt changes.

@neoneye neoneye closed this May 20, 2026
@neoneye neoneye deleted the napkin-math/phase2-extract-threshold-pairing branch May 20, 2026 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant