chore: scrap geocoding-based NUTS resolution direction (#45) #70
Merged
The geocoding direction explored under #45 (using Nominatim/Zippopotam to geocode a postal code to coordinates and then GISCO find-nuts.py to look up NUTS3) is abandoned.

First and only run of the daily validation workflow showed Nominatim/Zippopotam coverage at ~6-7% of the low-confidence pool (Nominatim is biased against rural / small-locality postcodes, exactly the postcodes that need a fallback), and the disagreement rate among the geocoded slice was dominated by Nominatim mis-geocodes and GISCO border-pixel artefacts rather than real errors in our estimates. Net signal is not worth the operational cost.

Removes:
- .github/workflows/validate-estimates.yml (daily 03:00 UTC validation)
- scripts/validate_estimates.py (the script the workflow ran)
- the #45 happyGISCO bullet from docs/performance.md (no longer a future re-baseline scenario)

Closes #45 (parent investigation; all three sub-features die together: validation, live fallback, monitor estimation).
Closes #69 (rolling tracking issue, moot once the workflow is gone).

If postal-code coverage gaps need addressing again, the agreed alternative direction is GeoNames postcode bulk data (#54/#56/#57), not geocoding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Removes the geocoding-based validation workflow and script introduced in e8948b7, and abandons the broader direction explored under #45 (Nominatim/Zippopotam → coordinates → GISCO find-nuts.py for low-confidence postal codes).
Why
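First and only run of the daily validation workflow (2026-05-01 report on #69) showed:
- Nominatim/Zippopotam coverage at ~6-7% of the low-confidence pool (Nominatim is biased against rural / small-locality postcodes, exactly the postcodes that need a fallback).
- A disagreement rate among the geocoded slice dominated by Nominatim mis-geocodes and GISCO border-pixel artefacts rather than real errors in our estimates.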
Net signal is not worth the operational cost (daily noise email per run, plus a future hot-path latency hit if the live fallback ever shipped). All three #45 sub-features (validation, live fallback for unknown postcodes, better monitor NUTS estimation) die together — they share the same geocoding mechanism.
If postal-code coverage gaps need addressing again, the agreed alternative is GeoNames postcode bulk data (#54/#56/#57), not geocoding.
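For readers unfamiliar with the two figures the validation run reported, coverage and disagreement rate can be computed roughly as follows. This is a minimal sketch, not the removed scripts/validate_estimates.py: the `Sample` fields, the `summarize` function, and the idea that a failed geocode is recorded as `None` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """One low-confidence postal code (hypothetical record layout)."""
    postal_code: str
    estimated_nuts3: str            # our own low-confidence estimate
    geocoded_nuts3: Optional[str]   # None when the geocoder returned nothing


def summarize(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Return (coverage, disagreement_rate) for a validation run.

    coverage          = geocoded samples / all samples
    disagreement_rate = geocoded samples whose NUTS3 differs from our
                        estimate / geocoded samples
    """
    total = len(samples)
    geocoded = [s for s in samples if s.geocoded_nuts3 is not None]
    disagreeing = [s for s in geocoded if s.geocoded_nuts3 != s.estimated_nuts3]

    coverage = len(geocoded) / total if total else 0.0
    disagreement_rate = len(disagreeing) / len(geocoded) if geocoded else 0.0
    return coverage, disagreement_rate
```

Note that the disagreement rate is defined only over the geocoded slice, so at low coverage it rests on a small and, given the bias described above, unrepresentative sample.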
Changes
Removed:
- .github/workflows/validate-estimates.yml
- scripts/validate_estimates.py
- #45 (happyGISCO outbound geocoding) re-baseline bullet from docs/performance.md

.github/data/validation_state.json was never created (the workflow only ran once via the public output and didn't push state). Repo's existing GISCO TERCET references in app/data_loader.py, app/main.py, README, and CHANGELOG are about the authoritative TERCET flat files and the NUTS region names CSV; those are unrelated to the geocoding direction and are kept.
Closes #45.
Closes #69.
Test plan
- .github/workflows/validate-estimates.yml no longer appears in the Actions tab once merged
- docs/performance.md reads cleanly without the dropped bullet

🤖 Generated with Claude Code