# devlog 2024-05-03

_Author: Tyler Coles_

Testing the `us_census` functions that load the canonical sets of GEOIDs for each Census granularity, from state down to block group, across all supported TIGER years (2000 and 2009-2023).

Since this is our source of truth for these delineations, we want to make sure we're getting complete data. One thing we can test is that, at each level of granularity above block group, every node contains at least one child node. That is, every state should contain at least one county, every county at least one tract, and every tract at least one block group. If any node has no children, we know something is missing.

(This may seem like a trivial test, but it revealed that my original assumptions about how TIGER provides the data were invalid, and it has already saved us from bugs!)
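The parent/child check described above can be sketched without epymorph at all. This is a minimal illustration, assuming only that GEOIDs nest by prefix (state is 2 characters, county 5, tract 11, block group 12). The helpers `group_by_prefix` and `childless` are hypothetical names for this sketch, and the real `grouped` methods used in the test below may work differently.

```python
from collections import defaultdict


def group_by_prefix(geoids: list[str], prefix_len: int) -> dict[str, list[str]]:
    """Group GEOIDs by their first `prefix_len` characters."""
    groups: dict[str, list[str]] = defaultdict(list)
    for g in geoids:
        groups[g[:prefix_len]].append(g)
    return dict(groups)


def childless(parents: list[str], children: list[str]) -> list[str]:
    """Return the parent GEOIDs that are not a prefix of any child GEOID."""
    if not parents:
        return []
    # All parents at one granularity share the same GEOID length.
    by_parent = group_by_prefix(children, len(parents[0]))
    return [p for p in parents if p not in by_parent]


# Toy data for illustration only, not actual TIGER output.
states = ["04", "35"]
counties = ["04001", "04003"]
print(childless(states, counties))  # ['35']
```

An empty result means every parent has at least one child; anything else lists exactly the nodes whose children are missing.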

WARNING: this will take a very long time if you don't have the TIGER files cached.

In [1]:
import epymorph.geography.us_census as c
import epymorph.geography.us_tiger as t


class Fail(Exception):
    pass


def test_year(year: int) -> None:
    # 1. test that we have 52 states
    #    (state-equivalents; presumably the 50 states plus DC and Puerto Rico)
    states = c.get_us_states(year).geoid

    if len(states) != 52:
        raise Fail("There weren't 52 states!")

    # 2. test that each state contains at least one county
    counties = c.get_us_counties(year).geoid
    counties_by_state = c.STATE.grouped(counties)

    exs = [
        Fail(f"State {x} does not have at least one county.")
        for x in states
        if x not in counties_by_state or len(counties_by_state[x]) == 0
    ]
    if len(exs) > 0:
        raise ExceptionGroup("Failed checking counties.", exs)

    # 3. test that each county contains at least one tract
    tracts = c.get_us_tracts(year).geoid
    tracts_by_county = c.COUNTY.grouped(tracts)

    exs = []
    for x in counties:
        if x not in tracts_by_county or len(tracts_by_county[x]) == 0:
            exs.append(Fail(f"County {x} does not have at least one tract."))
    if len(exs) > 0:
        raise ExceptionGroup("Failed checking tracts.", exs)

    # 4. test that each tract contains at least one block group
    cbgs = c.get_us_block_groups(year).geoid
    cbgs_by_tract = c.TRACT.grouped(cbgs)

    exs = []
    for x in tracts:
        if x not in cbgs_by_tract or len(cbgs_by_tract[x]) == 0:
            exs.append(Fail(f"Tract {x} does not have at least one block group."))
    if len(exs) > 0:
        raise ExceptionGroup("Failed checking block groups.", exs)

    print(f"Census year {year} passed!")

In [2]:
for year in t.TIGER_YEARS:
    test_year(year)

Census year 2000 passed!
Census year 2009 passed!
Census year 2010 passed!
Census year 2011 passed!
Census year 2012 passed!
Census year 2013 passed!
Census year 2014 passed!
Census year 2015 passed!
Census year 2016 passed!
Census year 2017 passed!
Census year 2018 passed!
Census year 2019 passed!
Census year 2020 passed!
Census year 2021 passed!
Census year 2022 passed!
Census year 2023 passed!
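The loop above stops at the first year that raises. If a year ever does fail, it may be more useful to keep going and see every failing year at once. A minimal sketch of that idea follows; `run_all` and `fake_check` are hypothetical names, and `fake_check` is only a stand-in for `test_year` so the example runs without any TIGER files.

```python
from typing import Callable


def run_all(years: list[int], check: Callable[[int], None]) -> dict[int, Exception]:
    """Run `check` for every year; return the exceptions raised, keyed by year."""
    failures: dict[int, Exception] = {}
    for year in years:
        try:
            check(year)
        except Exception as e:
            # Record the failure and keep going so later years still get checked.
            failures[year] = e
    return failures


# Stand-in for test_year, purely for illustration: pretends 2009 is broken.
def fake_check(year: int) -> None:
    if year == 2009:
        raise ValueError(f"{year}: missing counties")


failures = run_all([2000, 2009, 2010], fake_check)
for year, err in failures.items():
    print(f"Census year {year} failed: {err}")
```

In the notebook this would presumably be called as `run_all(list(t.TIGER_YEARS), test_year)`, assuming `t.TIGER_YEARS` is an iterable of ints as the loop above suggests.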
