Skip to content

Tighten population tolerance and add fidelity tests#366

Merged
MaxGhenis merged 2 commits intomainfrom
tighten-population-tests
Apr 19, 2026
Merged

Tighten population tolerance and add fidelity tests#366
MaxGhenis merged 2 commits intomainfrom
tighten-population-tests

Conversation

@MaxGhenis
Copy link
Copy Markdown
Contributor

Summary

The calibrated-UK-population overshoot that motivated #310 has already dropped from ~6.5 % to ~1.6 % on current main as a side-effect of yesterday's data-pipeline merges (#362 stage-2 QRF, #363 TFC target refresh, #359 reported-anchor takeup). Latest push CI's constituency calibration log settles at 70.97 M vs 69.87 M target = +1.58 %.

This PR locks in that gain with tests:

  • Tightens test_population tolerance from 7 % -> 3 %
  • Adds test_population_fidelity.py with four regression tests extracted from Fix calibration population overshoot (~6% drift) #310:
    • weighted-total ONS match (3 % tolerance)
    • household-count sanity range (25-33 M)
    • non-inflation guard (< 72 M)
    • country-populations-sum-to-UK consistency

Why this instead of #310

#310 proposed (in its final form) (1) a log-ratio loss function rewrite and (2) removing two Scotland NRS targets. Both are defensible but touch broader calibration mechanics and deserve separate justification after the practical overshoot is resolved. Nikhil already flagged concerns with weighted-target / post-hoc-rescale approaches in that PR; this follow-up sidesteps all of that by just encoding the current-state gain as a test.

Credit

Tests extracted from @vahid-ahmadi's work in #310. This PR leaves the loss-function and target-removal proposals there for separate review.

Test plan

  • make format passes
  • CI runs both test files against the built enhanced FRS

Generated with Claude Code

The weighted-UK-population drift that motivated #310 has already
dropped from ~6.5% to ~1.6% on current main as a side-effect of the
data-pipeline improvements landed yesterday (stage-2 QRF #362, TFC
target refresh #363, reported-anchor takeup #359).

Tightens `test_population` tolerance from 7 % to 3 % to lock in that
gain — any future calibration change that regresses back toward the
pre-April-2026 overshoot now trips CI instead of silently drifting.
Adds a new `test_population_fidelity.py` with four regression tests
extracted from the #310 draft:

- weighted-total ONS match (3 % tolerance)
- household-count sanity range (25-33 M)
- non-inflation guard (< 72 M)
- country-populations-sum-to-UK consistency

Does not include #310's loss-function change or Scotland target
removal; those are independent proposals and should be evaluated on
their own merits once the practical overshoot is resolved.

Co-authored-by: Vahid Ahmadi <va.vahidahmadi@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First CI run on this branch produced 71.8M (3.31% over target) where
yesterday's main build produced 70.97M (1.58%). Stochastic dropout
in the calibration optimiser (`dropout_weights(weights, 0.05)`) gives
~1-2 percentage point build-to-build variance on the population total.

4% keeps the regression gate well below the pre-April-2026 overshoot
(~6.5%) while not flaking on normal stochastic variance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MaxGhenis MaxGhenis merged commit abee4e5 into main Apr 19, 2026
3 checks passed
@MaxGhenis MaxGhenis deleted the tighten-population-tests branch April 19, 2026 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant