Build the 117th->119th CD vintage crosswalk from Census sources#288
Draft
MaxGhenis wants to merge 3 commits into
The CD geography-vintage translation (#207/#208/#209) consumed a crosswalk passed as an external CLI path with no versioned, reproducible, or cited artifact in the repo. This adds that artifact and its generator, and makes it the packaged default.

The crosswalk is built by a single-vintage block overlay, so no 2010<->2020 block bridge is needed:

- old (117th) district of each 2020 block: the 2020 Block Assignment File CD layer (BlockAssign_ST{fips}_{usps}_CD.txt), which carries the 116th-Congress plan (identical district geography to the 117th) on 2020 tabulation blocks;
- current (119th) district of each 2020 block: the 119th BEF (NationalCD119.txt), the same source the block ladder already uses;
- weight: 2020 P.L. 94-171 POP100 per block (the block ladder's parse_pl_geo_blocks convention).

Both district assignments are read on the same 2020 blocks weighted by the same 2020 block populations, so each old district's population is redistributed across the current districts it overlaps and never invented. Population is the correct default basis (apportionment and equal-population redistricting are population operations); ACS income/tax proxy weights for fiscal targets are a documented future refinement.

The committed national crosswalk covers all 436 current 119th-Congress districts from 436 source districts, with exact per-state population conservation over all 331,449,281 people in the 50 states + DC (zero unmatched or cross-state). The apportionment-shrunk districts (CA-53, IL-18, MI-14, NY-27, OH-16, PA-18, WV-03) appear only as sources; the new districts (CO-08, FL-28, MT-02, NC-14, OR-06, TX-37, TX-38) appear as populated targets. Montana's at-large district splits ~50/50 into MT-01/MT-02, as equal-population districts require.

The derived crosswalk is a regenerable build artifact, not a Ledger fact -- the fact-vs-computed boundary of PolicyEngine/ledger#71; the same declared-consumer-side-transform pattern applies to the Belgian NIS-code vintage work in PolicyEngine/ledger#69.

Files:

- congressional_district_vintage_crosswalk.py: pure parsers (BAF CD layer, CD BEF) and the population-weighted join with conservation diagnostics.
- tools/build_us_congressional_district_vintage_crosswalk.py: download + cache + provenance orchestration, mirroring build_us_block_ladder_artifact.
- us/congressional_district_vintage_crosswalk.csv (+ .provenance.json + .md): the committed artifact, per-source SHA-256s, and the data-source doc.
- congressional_district_vintage.py: packaged-default loader helpers.
- build_us_fiscal_refresh_release.py: default to the packaged crosswalk when CD targets are requested and no path is passed.

Closes #205.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
CI (wheels job) surfaced two issues:

1. The `us/` country package is a spec-only directory: the governance tests (test_spec_only_country_packages) and country_spec.py require it to contain only .json resources, all declared in country_package.json. Move the crosswalk CSV, its provenance JSON, and the data-source doc to a new us_runtime/data/ package (us_runtime is exempt from the spec-only rule), and re-anchor the packaged-default loader at populace.build.us_runtime.data.
2. test_cd_targets_require_vintage_crosswalk asserted the release builder errors when CD targets are requested without a crosswalk; the builder now defaults to the packaged crosswalk, so rename/rewrite the test to assert the default is applied and an explicit path still overrides it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
What
The congressional-district (CD) geography-vintage translation merged in #207 /
#208 / #209 consumed a crosswalk passed as an external CLI path — there was
no versioned, reproducible, or Census-cited crosswalk artifact in the repo, and
the only file that existed (an ad-hoc validation CSV) had demonstrably wrong
weights (Montana split 39/61 rather than the equal-population ~50/50, and DC
keyed as `…1198` rather than the repo-wide at-large/delegate `…1100`).

This PR ships the missing piece: a versioned, reproducible, population-weighted
117th → 119th CD crosswalk built from primary Census sources, its generator, and
its provenance, and makes it the packaged default. It closes #205.
The mechanics (`translate_congressional_district_facts_to_current_vintage`, the
state-total proxy for at-large states, the support-provenance guard) already
landed in #207–#209 and are unchanged here; the hole was the data artifact and
its lineage.
Method
A single-vintage block overlay, so no 2010↔2020 block relationship file is
needed:
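- old (117th) district of each 2020 block: the 2020 Block Assignment File `CD` layer (`BlockAssign_ST{fips}_{usps}_CD.txt`), which carries the 116th-Congress plan (identical district geography to the 117th) on 2020 blocks; fields `BLOCKID|DISTRICT`
- current (119th) district of each 2020 block: the 119th BEF (`NationalCD119.txt`), the same source the block ladder (#277) already uses; fields `GEOID,CDFP`
- weight: 2020 P.L. 94-171 `POP100` per block (`parse_pl_geo_blocks` convention)

Both district assignments are read on the same 2020 blocks and weighted by the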
same 2020 block populations, so each old district's population is redistributed
across the current districts it overlaps and never invented. Population is the
correct default basis (apportionment and equal-population redistricting are
population operations); ACS income/tax proxy weights for fiscal targets are the
documented next refinement noted in #205.
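The overlay described above can be sketched in a few lines. This is a minimal illustration, not the PR's implementation: the function name `build_crosswalk`, the row keys, and the toy block IDs, district codes, and populations are all made up for the example, and the real parsers and join live in the module listed under Files.

```python
from collections import defaultdict


def build_crosswalk(old_cd_by_block, new_cd_by_block, pop_by_block):
    """Population-weighted old->new district weights from one block vintage.

    All three inputs are dicts keyed by the same block IDs (hypothetical shape).
    Returns (rows, unmatched_population).
    """
    overlap = defaultdict(int)       # (old district, new district) -> population
    source_total = defaultdict(int)  # old district -> population
    unmatched = 0
    for block, pop in pop_by_block.items():
        old_cd = old_cd_by_block.get(block)
        new_cd = new_cd_by_block.get(block)
        if old_cd is None or new_cd is None:
            # Population on a block missing from either assignment file.
            unmatched += pop
            continue
        overlap[(old_cd, new_cd)] += pop
        source_total[old_cd] += pop
    rows = [
        {
            "source": old,
            "target": new,
            "population": pop,
            # Share of the old district's population that lands in the new one.
            "weight": pop / source_total[old],
        }
        for (old, new), pop in sorted(overlap.items())
        if pop > 0
    ]
    return rows, unmatched


# Toy data: one old district whose blocks fall into two new districts.
old_cd = {"b1": "3000", "b2": "3000", "b3": "3000"}
new_cd = {"b1": "3001", "b2": "3001", "b3": "3002"}
population = {"b1": 30, "b2": 20, "b3": 50}

rows, unmatched = build_crosswalk(old_cd, new_cd, population)
for row in rows:
    print(row)
```

Because both assignments are read on the same blocks, the weights for each source district sum to 1 by construction, which is why population is redistributed rather than created.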
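The committed national crosswalk
- Covers all 436 current 119th-Congress districts from 436 source districts
  (including one-district states and DC).
- Conserves population exactly per state across the 50 states + DC:
  331,449,281 people, zero unmatched or cross-state population.
- Apportionment-shrunk districts (CA-53, IL-18, MI-14, NY-27, OH-16, PA-18,
  WV-03) appear only as sources; new districts (CO-08, FL-28, MT-02, NC-14,
  OR-06, TX-37, TX-38) appear as populated targets.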
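The properties listed above are the kind of thing a conservation diagnostic can assert. The sketch below is illustrative only: `check_crosswalk`, the row keys, and the fictitious state `99` are assumptions, and it assumes district IDs start with a two-digit state code. It is not the diagnostics shipped in this PR.

```python
import math
from collections import defaultdict


def check_crosswalk(rows, state_population):
    """Summarize conservation properties of crosswalk rows.

    rows: dicts with "source", "target", "population", "weight" (hypothetical keys).
    state_population: state code -> expected total population.
    """
    by_state = defaultdict(int)
    weight_sum = defaultdict(float)
    cross_state = 0
    for row in rows:
        source_state = row["source"][:2]
        by_state[source_state] += row["population"]
        weight_sum[row["source"]] += row["weight"]
        if source_state != row["target"][:2]:
            cross_state += row["population"]
    sources = {row["source"] for row in rows}
    targets = {row["target"] for row in rows}
    return {
        # Expected minus covered population, per state; 0 means exact conservation.
        "population_gap": {
            state: total - by_state.get(state, 0)
            for state, total in state_population.items()
        },
        # Source districts whose weights do not sum to 1.
        "bad_weight_sources": sorted(
            s for s, w in weight_sum.items() if not math.isclose(w, 1.0)
        ),
        "cross_state_population": cross_state,
        "sources_only": sorted(sources - targets),
        "targets_only": sorted(targets - sources),
    }


# Toy state "99" that goes from three districts to two.
toy_rows = [
    {"source": "9901", "target": "9901", "population": 100, "weight": 1.0},
    {"source": "9902", "target": "9901", "population": 20, "weight": 0.2},
    {"source": "9902", "target": "9902", "population": 80, "weight": 0.8},
    {"source": "9903", "target": "9901", "population": 40, "weight": 0.4},
    {"source": "9903", "target": "9902", "population": 60, "weight": 0.6},
]
report = check_crosswalk(toy_rows, {"99": 300})
print(report)
```

In the toy data the dropped district shows up only under `sources_only`, mirroring how the apportionment-shrunk districts appear only as sources.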
Every source file's URL + SHA-256, the crosswalk SHA-256, the source/target
vintages, and the full per-state conservation table are recorded in
`congressional_district_vintage_crosswalk.csv.provenance.json`; the human-readable
recipe and citations are in `CONGRESSIONAL_DISTRICT_VINTAGE_CROSSWALK.md`.

Fit with the Ledger schema direction
The translated CD targets are derived build artifacts, never Ledger facts —
the fact-vs-computed boundary of PolicyEngine/ledger#71. The crosswalk is a
declared consumer-side transform over facts with its own lineage, exactly the
pattern #71 prescribes and that #280 mirrors for period aging. The same mechanism
is called out for the Belgian NIS-code vintage crosswalk in PolicyEngine/ledger#69.
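The lineage that makes the crosswalk a declared transform (each source file's URL and SHA-256, the artifact's own SHA-256, and the source/target vintages) could be assembled roughly as follows. This is a sketch under stated assumptions: the JSON field names, the `build_provenance` helper, and the `example.invalid` URL are hypothetical and do not reflect the actual provenance schema in this PR.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks so large inputs stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(artifact_path, sources, source_vintage, target_vintage):
    """sources is a list of (url, local_path) pairs; field names are illustrative."""
    return {
        "artifact": {
            "name": Path(artifact_path).name,
            "sha256": sha256_of(artifact_path),
        },
        "source_vintage": source_vintage,
        "target_vintage": target_vintage,
        "sources": [
            {"url": url, "sha256": sha256_of(local_path)}
            for url, local_path in sources
        ],
    }


# Demo with throwaway files standing in for a cached source and the artifact.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_bytes(b"BLOCKID|DISTRICT\n")
    artifact = Path(tmp) / "crosswalk.csv"
    artifact.write_bytes(b"source,target,weight\n")
    record = build_provenance(
        artifact,
        [("https://example.invalid/source.txt", source)],
        source_vintage="117",
        target_vintage="119",
    )
    print(json.dumps(record, indent=2))
```

Recording hashes for both inputs and output lets a rebuild confirm it used the same inputs and regenerated the same artifact.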
Files
- `packages/populace-build/src/populace/build/us_runtime/congressional_district_vintage_crosswalk.py`: pure, tested parsers (BAF `CD` layer, CD BEF) and the population-weighted join with conservation diagnostics.
- `tools/build_us_congressional_district_vintage_crosswalk.py`: download + cache + provenance orchestration, mirroring `build_us_block_ladder_artifact.py`.
- `packages/populace-build/src/populace/build/us/congressional_district_vintage_crosswalk.csv` (+ `.provenance.json`, + `.md`): the committed artifact, per-source SHA-256s, and the data-source doc.
- `packages/populace-build/src/populace/build/us_runtime/congressional_district_vintage.py`: packaged-default loader helpers (`load_default_…`, `default_…_path`).
- `tools/build_us_fiscal_refresh_release.py`: default to the packaged crosswalk when CD targets are requested and no path is passed (explicit paths still override).

Verification
- `pytest test_us_congressional_district_vintage_crosswalk.py` → 14 passed (parsers, at-large/delegate normalization, conservation math, uncovered-population reporting, and integration tests that load the real packaged crosswalk: 436/436, MT ~50/50, extra-only-as-source / new-only-as-target, exact fact-value conservation).
- `pytest test_us_congressional_district_vintage.py test_us_congressional_district_geography.py test_us_block_ladder_sources.py test_us_fiscal_targets.py` → all green (no regressions).
- `pytest --extra us test_us_fiscal_refresh_builder.py` → all green (provenance-guard path).
- `ruff check`, `ruff format --check`, and `git diff --check` all clean.

Closes #205. Refs PolicyEngine/ledger#69, PolicyEngine/ledger#71.
🤖 Generated with Claude Code