napkin-math: threshold-pairing rule for extract-parameters-from-digest #738
Closed
neoneye wants to merge 1 commit into
…gest

Adds a new "Threshold pairing rule" to the extract skill's system prompt, slotted between "Coverage and capacity gate rule" and "Combined viability gate preservation".

Why: extraction runs were leaving threshold key_values (floors, caps, targets, deadlines) unpaired with their realised-vs-threshold margin calcs. The 'no dead-end variables' rule already forbids this in principle, but in practice the abstract directive was being followed unevenly. The new rule makes the pairing explicit and operational: every extracted threshold gets a paired margin calculation in recommended_first_calculations or derived_questions, with the realised quantity declared in missing_values_to_estimate when the source does not name it.

The rule is corpus-agnostic: it uses only structural categories (floor, cap, ceiling, target volume, target share, target deadline) and the existing _margin / _surplus naming convention. No corpus literals.

Tested by re-running the skill on two v50 baselines that had unpaired thresholds:

- crate_recovery_campaign: previously had target_recovered_crates (108k volume target) and minimum_donation_threshold_dkk (500k donation floor) as unpaired key_values. Re-extraction emits q_volume_target_margin and q_donation_minimum_margin as derived_questions per the new rule. Also restructured average_effective_incentive_per_crate_dkk from a missing-value to an explicit recommended_first_calculation so that incentive_per_crate_dkk and pilot_incentive_decel_threshold_crates appear in real depends_on rather than narrative-only suggested_estimation_method prose.
- yellowstone_evacuation: previously had hospital_fuel_priority_share (75% generator-fuel floor) unpaired and vei7_uplift_trigger_cm_per_hour mis-classified as a simulatable key_value. Re-extraction emits hospital_fuel_priority_margin and moves vei7 to unmodelled_gates (geological observation, not deterministically simulatable). Dropped zone_zero_evacuation_target_people (population denominator, not a viability threshold) and the weakest existing pair (public_compliance_threshold_zone_one) under cap pressure, since hospital generator-fuel priority is more directly life-safety than downstream traffic congestion.

Both v50 parameters.json files now audit clean against the no-dead-end-variables rule (every key_value appears in at least one calc's depends_on).
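The sign convention and the audit described above can be sketched in a few lines of Python. This is only an illustration under assumed structure: the real parameters.json schema is not shown in this PR, so the "id" field, the shape of the entries, the helper names, and the toy ids (minimum_units, units_sold, unit_price) are all hypothetical. Only the section names, depends_on, and the _margin / _surplus suffixes come from the text.

```python
# Hypothetical sketch: the entry shape ({"id": ..., "depends_on": [...]}) is an
# assumption for illustration, not the actual parameters.json schema.

def margin(kind, realised, threshold):
    """Return a margin where a positive value means the threshold is met."""
    if kind == "floor":
        return realised - threshold   # floor: realised must reach the threshold
    if kind == "cap":
        return threshold - realised   # cap: realised must stay under the threshold
    raise ValueError(f"unknown threshold kind: {kind}")


def all_calcs(params):
    """Collect calculations from both sections the rule allows for pairing."""
    return (list(params.get("recommended_first_calculations", []))
            + list(params.get("derived_questions", [])))


def dead_end_key_values(params):
    """key_values that appear in no calculation's depends_on."""
    used = {dep for calc in all_calcs(params)
            for dep in calc.get("depends_on", [])}
    return [kv["id"] for kv in params.get("key_values", [])
            if kv["id"] not in used]


def unpaired_thresholds(params, threshold_ids):
    """Thresholds that no *_margin / *_surplus calculation depends on."""
    paired = {dep for calc in all_calcs(params)
              if calc["id"].endswith(("_margin", "_surplus"))
              for dep in calc.get("depends_on", [])}
    return [t for t in threshold_ids if t not in paired]


# Toy data with made-up ids; minimum_units plays the role of a floor.
params = {
    "key_values": [{"id": "minimum_units"}, {"id": "unit_price"}],
    "recommended_first_calculations": [
        {"id": "revenue", "depends_on": ["unit_price", "units_sold"]},
    ],
    "derived_questions": [],
}

before = (dead_end_key_values(params),
          unpaired_thresholds(params, ["minimum_units"]))

# Adding the paired margin calculation resolves both findings.
params["derived_questions"].append(
    {"id": "units_margin", "depends_on": ["units_sold", "minimum_units"]}
)
after = (dead_end_key_values(params),
         unpaired_thresholds(params, ["minimum_units"]))
```

In this toy case the floor is flagged both as a dead-end variable and as an unpaired threshold until the margin calculation is added, which mirrors why the pairing rule also helps the dead-end audit pass.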
Consolidating into PR #737 per request: a single PR makes it easier to verify the combined compress + extract prompt changes.
Scope
Single-file change: adds a "Threshold pairing rule" to system-prompt.txt for the extract-parameters-from-digest skill, slotted between "Coverage and capacity gate rule" and "Combined viability gate preservation". No code changes; no compress changes.

Split out of PR #737 because the threshold-pairing rule lives in the extract stage, not the compress stage, so it does not fit that PR's "Phase 1 compress-prompt cleanup" scope.
What the rule says
When the extract emits a key_value whose role is a numeric threshold (a floor, cap, ceiling, minimum, maximum, target volume, target share, or target deadline), it must also emit a paired margin/surplus calculation comparing the realised quantity against the threshold. The pairing has three parts:

- The threshold itself, in key_values.
- The realised quantity, declared in missing_values_to_estimate if the source does not name it.
- The margin calculation, in recommended_first_calculations or derived_questions, with realised - threshold (floor: positive = pass) or threshold - realised (cap: positive = pass), using the _margin / _surplus suffix.

Under cap pressure, the rule says to drop a less-load-bearing key_value or move a less-critical calc to derived_questions; the pairing itself is never skipped.

Why
Existing rules in the same file (No orphan formula rule, Coverage and capacity gate rule, Dead-end variable prevention) cover the principle abstractly, but extractor runs were leaving threshold key_values unpaired in practice. The new rule operationalises the pairing as a concrete three-part check.

Corpus-agnostic by construction
The rule names only structural categories (floor, cap, ceiling, target volume, target share, target deadline) and the existing _margin / _surplus naming convention. No corpus literals, no plan names, no domain-specific acronyms, no expected output ids.

Regression probes (not acceptance criteria)
Baseline plans are used to detect that the rule moves the right behaviour, not to define what the rule should target. Probes run against the gitignored output/v50/ digests:

- crate_recovery_campaign: target_recovered_crates and minimum_donation_threshold_dkk were previously unpaired; re-extraction emits *_margin calculations for both.
- yellowstone_evacuation: hospital_fuel_priority_share now has a paired margin, and vei7_uplift_trigger_cm_per_hour moves to unmodelled_gates.

The probes also surface that the existing 5-cap on missing_values_to_estimate interacts with the threshold-pairing rule in ways that can force tradeoffs (a less-critical existing pairing dropped to make room). This is a known limitation, not a fault of the new rule, and is left for a structural followup.

What this PR does NOT do
- Change the 5-cap on missing_values_to_estimate or modify any other hard limit.

Test plan