Add popsafe-fips to geomapper #1787

Merged
merged 20 commits into from
Mar 14, 2023

Conversation

nmdefries

Description

Create a new level in the geomapper to support changes to CHNG data. Background and design considerations.

CHNG would like to stop publishing county-level information for counties with low population (<20k). They have proposed a list of ~400 county groups, each of which bundles low-population counties together until the group's combined population exceeds the 20k threshold. The groups are reasonably contiguous, and combining data from low-population areas is a common technique to alleviate privacy concerns.

To allow this, add a popsafe-fips level to the geomapper that codes high-population (>20k) counties with their official FIPS codes and low-population counties with a code of the format <two digit state FIPS>g<two digit group number within the state>, e.g. "01g05". Add mappings for fips -> popsafe-fips and popsafe-fips -> state.
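The code format described above can be sketched roughly as follows. This is only an illustration of the naming scheme, not the geomapper's implementation: the helper names (`make_group_code`, `state_fips_of`, `to_popsafe`) and the sample mapping are hypothetical.

```python
import re

# Two-digit state FIPS, a literal "g", then a two-digit group number.
GROUP_CODE = re.compile(r"(\d{2})g(\d{2})")
COUNTY_FIPS = re.compile(r"\d{5}")


def make_group_code(state_fips: str, group_number: int) -> str:
    """Build a group code such as "01g05" (hypothetical helper)."""
    if not re.fullmatch(r"\d{2}", state_fips):
        raise ValueError(f"expected a two-digit state FIPS, got {state_fips!r}")
    if not 0 <= group_number <= 99:
        raise ValueError(f"group number out of range: {group_number}")
    return f"{state_fips}g{group_number:02d}"


def state_fips_of(code: str) -> str:
    """Return the two-digit state prefix of a county FIPS or a group code."""
    if GROUP_CODE.fullmatch(code) or COUNTY_FIPS.fullmatch(code):
        return code[:2]
    raise ValueError(f"not a county FIPS or group code: {code!r}")


def to_popsafe(fips: str, groups: dict) -> str:
    """Map a county FIPS to its group code if grouped, else to itself."""
    return groups.get(fips, fips)


# Made-up sample: two counties bundled into group 5 of state 01.
sample_groups = {"01011": "01g05", "01013": "01g05"}
print(to_popsafe("01011", sample_groups))  # 01g05
print(to_popsafe("01001", sample_groups))  # 01001
print(state_fips_of("01g05"))              # 01
```

Because both code styles start with the state FIPS, the popsafe-fips -> state direction only needs the first two characters in this sketch.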

Changelog

  • Original county groups file, data_proc/geomap/lowpop_county_groups.csv
  • geo_data_proc.py - functions to generate popsafe-fips map tables from input county groups file
  • 2019 and 2020 files mapping fips -> popsafe-fips and popsafe-fips -> state
  • geomap.py
  • test_geomap.py

@nmdefries nmdefries marked this pull request as ready for review February 21, 2023 18:51
@nmdefries

The provided county groupings don't seem to meet the stated requirements. Using our population data from 2020, there are many ungrouped counties and county groups with populations less than 20k (data file generated with script).

Some counties and county groups are just barely below the cutoff and would likely meet the 20k threshold using updated population data, but there are many counties that are far (>50%) below the population requirement.

CHNG may have deliberately left counties they don't report ungrouped. However, all but 46 of the ungrouped counties included in the CSV above were reported by CHNG in the last three weeks. (The 46 unreported counties are mostly territories and low-population counties in Alaska.)
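A check of this kind might look like the sketch below. It is not the script referenced above: the function names and all numbers are made up for illustration, and ungrouped counties are treated as their own reporting unit.

```python
from collections import defaultdict

THRESHOLD = 20_000  # population cutoff stated in the PR description


def group_populations(county_pop, county_to_group):
    """Sum county populations per reporting unit.

    Ungrouped counties count as their own unit, keyed by FIPS.
    """
    totals = defaultdict(int)
    for fips, pop in county_pop.items():
        totals[county_to_group.get(fips, fips)] += pop
    return dict(totals)


def below_threshold(totals, threshold=THRESHOLD):
    """Return {unit: shortfall fraction} for units under the threshold."""
    return {
        unit: 1 - pop / threshold
        for unit, pop in totals.items()
        if pop < threshold
    }


# Made-up numbers for illustration only.
county_pop = {"01001": 58_000, "01011": 10_000, "01013": 9_500, "02013": 3_400}
county_to_group = {"01011": "01g05", "01013": "01g05"}

totals = group_populations(county_pop, county_to_group)
for unit, shortfall in sorted(below_threshold(totals).items()):
    # "far below" mirrors the >50% shortfall mentioned in the comment.
    label = "far below" if shortfall > 0.5 else "just below"
    print(unit, totals[unit], label)
```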

@krivard

krivard commented Feb 23, 2023

I'm checking the population results with Mina and will hold my final review until that gets resolved. Thank you for doing the initial analysis!

@nmdefries

The data provider wants to keep the county groups as-is. We should rename the new level to make it clear that it is neither generally applicable nor "population safe". Maybe chng_fips (along the same lines as the indicator-specific geo level jhu_uid)?


@dshemetov dshemetov left a comment

Looks good. Thanks for adding good tests too! Just one (non-blocking) question: does CHNG have any prop signals? Might be good to add some population methods if so.

elif contained_geocode_type == "county" and container_geocode_type == "state":
    crosswalk = self._crosswalks["fips"]["state"]
    return set(crosswalk[crosswalk["state_id"] == container_geocode]["fips"])
elif (contained_geocode_type in ("county", "fips", "popsafe-fips") and

praise: thanks for enforcing consistency!

@krivard

krivard commented Mar 1, 2023

@nmdefries
I'd accept any of the following:

  • chng-fips
  • chng-county

(use dashes so it's easier to parse out of the CSV filenames)
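To illustrate why dashes help, here is a minimal sketch of parsing the two geo levels out of a mapping-table filename. The `<from>_<to>_table.csv` naming pattern and the function name `parse_table_filename` are assumptions made for this example, not something confirmed in this thread.

```python
def parse_table_filename(filename: str):
    """Split "<from>_<to>_table.csv" into (from_geo, to_geo).

    The naming pattern is an assumption made for this sketch.
    """
    suffix = "_table.csv"
    if not filename.endswith(suffix):
        raise ValueError(f"unexpected filename: {filename!r}")
    parts = filename[: -len(suffix)].split("_")
    if len(parts) != 2:
        raise ValueError(f"cannot split geo levels unambiguously: {filename!r}")
    return parts[0], parts[1]


print(parse_table_filename("fips_chng-fips_table.csv"))  # ('fips', 'chng-fips')
```

With an underscore in the level name (e.g. `fips_chng_fips_table.csv`), the same split yields three pieces and the boundary between the two levels is ambiguous, so this sketch raises a `ValueError`.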


@krivard krivard left a comment

Modulo the name change, looks great!

  "pop": int,
- "weight": float
+ "weight": float,
+ **{geo: str for geo in self._geos - set("nation")}

praise: nicely crystallized!

@nmdefries

Will go with chng-fips for the new name.

@nmdefries

> does CHNG have any prop signals? Might be good to add some population methods if so.

CHNG doesn't currently report any prop signals or directly use any population data, so we don't need to do this.

@nmdefries nmdefries requested a review from krivard March 3, 2023 17:51

@krivard krivard left a comment

(best remove all implied relationships between chng groups and population)

@nmdefries

@krivard I won't get to splitting the fips_list field today. This PR works as-is so we could merge if someone will be working on getting changehc to use this in the next week while I'm gone. Otherwise, I'll pick up this and #1803 when I get back.


@krivard krivard left a comment

CHNG probably needs another couple of weeks to fix their end, but I'm approving this provisionally anyhow.

We can't rely on the component counties of a group being listed
individually, so reconstruct those fields ourselves based on the
concatenated `fips_list` field. Split by the separator (pipe `|`) and
save each result to a new column.
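The splitting step described above could be sketched as follows. This uses only the standard library rather than whatever the PR actually uses; apart from `fips_list` and the pipe separator, every name here (the `group` column, the `fips1`, `fips2`, ... output columns, the function name) is hypothetical, and the sample rows are made up.

```python
import csv
import io

SEPARATOR = "|"  # separator named in the comment above


def split_fips_list(rows, field="fips_list", prefix="fips"):
    """Add one column per component county: fips1, fips2, ...

    Rows with fewer counties than the widest group get empty strings.
    """
    split = [row[field].split(SEPARATOR) for row in rows]
    width = max((len(parts) for parts in split), default=0)
    out = []
    for row, parts in zip(rows, split):
        new_row = dict(row)  # copy so the input rows are left untouched
        for i in range(width):
            new_row[f"{prefix}{i + 1}"] = parts[i] if i < len(parts) else ""
        out.append(new_row)
    return out


# Made-up input for illustration.
raw = "group,fips_list\n01g05,01011|01013\n02g01,02013|02016|02060\n"
rows = list(csv.DictReader(io.StringIO(raw)))
for row in split_fips_list(rows):
    print(row)
```

Padding shorter groups with empty strings keeps every row the same width, which makes the result easy to write back out as a rectangular CSV.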
@nmdefries

Splitting the fips_list field has been added in #1803.

@nmdefries

@korlaxxalrok This is ready to merge.

…-map-generation

generate local county mapping from CHNG spreadsheet
@korlaxxalrok korlaxxalrok merged commit 3943bd4 into main Mar 14, 2023
@korlaxxalrok korlaxxalrok deleted the ndefries/geomapper/popsafe-county-level branch March 14, 2023 16:13

4 participants