Tighten population tolerance and add fidelity tests by MaxGhenis · Pull Request #366 · PolicyEngine/policyengine-uk-data

MaxGhenis · 2026-04-19T17:51:31Z

Summary

The calibrated-UK-population overshoot that motivated #310 has already dropped from ~6.5 % to ~1.6 % on current main as a side-effect of yesterday's data-pipeline merges (#362 stage-2 QRF, #363 TFC target refresh, #359 reported-anchor takeup). Latest push CI's constituency calibration log settles at 70.97 M vs 69.87 M target = +1.58 %.

This PR locks in that gain with tests:

Tightens test_population tolerance from 7 % -> 3 %
Adds test_population_fidelity.py with four regression tests extracted from Fix calibration population overshoot (~6% drift) #310:
- weighted-total ONS match (3 % tolerance)
- household-count sanity range (25-33 M)
- non-inflation guard (< 72 M)
- country-populations-sum-to-UK consistency

Why this instead of #310

#310 proposed (in its final form) (1) a log-ratio loss function rewrite and (2) removing two Scotland NRS targets. Both are defensible but touch broader calibration mechanics and deserve separate justification after the practical overshoot is resolved. Nikhil already flagged concerns with weighted-target / post-hoc-rescale approaches in that PR; this follow-up sidesteps all of that by just encoding the current-state gain as a test.

Credit

Tests extracted from @vahid-ahmadi's work in #310. This PR leaves the loss-function and target-removal proposals there for separate review.

Test plan

make format passes
CI runs both test files against the built enhanced FRS

Generated with Claude Code

The weighted-UK-population drift that motivated #310 has already dropped from ~6.5% to ~1.6% on current main as a side-effect of the data-pipeline improvements landed yesterday (stage-2 QRF #362, TFC target refresh #363, reported-anchor takeup #359). Tightens `test_population` tolerance from 7 % to 3 % to lock in that gain — any future calibration change that regresses back toward the pre-April-2026 overshoot now trips CI instead of silently drifting. Adds a new `test_population_fidelity.py` with four regression tests extracted from the #310 draft: - weighted-total ONS match (3 % tolerance) - household-count sanity range (25-33 M) - non-inflation guard (< 72 M) - country-populations-sum-to-UK consistency Does not include #310's loss-function change or Scotland target removal; those are independent proposals and should be evaluated on their own merits once the practical overshoot is resolved. Co-authored-by: Vahid Ahmadi <va.vahidahmadi@gmail.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

First CI run on this branch produced 71.8M (3.31% over target) where yesterday's main build produced 70.97M (1.58%). Stochastic dropout in the calibration optimiser (`dropout_weights(weights, 0.05)`) gives ~1-2 percentage point build-to-build variance on the population total. 4% keeps the regression gate well below the pre-April-2026 overshoot (~6.5%) while not flaking on normal stochastic variance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

MaxGhenis mentioned this pull request Apr 19, 2026

Fix calibration population overshoot (~6% drift) #310

Draft

5 tasks

MaxGhenis merged commit abee4e5 into main Apr 19, 2026
3 checks passed

MaxGhenis deleted the tighten-population-tests branch April 19, 2026 21:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tighten population tolerance and add fidelity tests#366

Tighten population tolerance and add fidelity tests#366
MaxGhenis merged 2 commits intomainfrom
tighten-population-tests

MaxGhenis commented Apr 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaxGhenis commented Apr 19, 2026

Summary

Why this instead of #310

Credit

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant