City Boundaries #874

Merged
merged 31 commits into from
Apr 20, 2022

Conversation

afontani
Contributor

@afontani afontani commented Apr 4, 2022

Pull Request Description

Companion PR: resstock-estimation (#196)

Purpose: Adding City boundaries as a housing characteristic in ResStock. This way users can aggregate directly by a given City.

Inclusion of City boundaries

What was included

The term "City Boundaries" refers to the census geography "Place". A Place is defined by the U.S. Census Bureau: here. Places are defined by census blocks and are not contained in tracts or counties (see the census geography hierarchy below). Not all Places are included in the new characteristic. There are approximately 29,000 Places in the U.S. To reduce the number of options tagged by ResStock, a dwelling unit count threshold was added: Places with 15,000 dwelling units or more are included in "Cities.tsv". This threshold allows each State in the continental U.S. to have at least one Place tagged in ResStock. The limiting factor was Burlington, Vermont, with about 16,000 dwelling units.

[Figure: census geography hierarchy chart]

Implementation

  • The dwelling units at the census tract level from the 2016 5-yr American Community Survey were apportioned to census blocks using the dwelling unit counts at the block level from the 2020 Decennial Redistricting Data.
    • The 2020 census block unit counts were mapped to 2010 census blocks using the NHGIS crosswalk file.
    • The occupied unit counts and vacant unit counts from the redistricting data were used to create a weight to apportion the census tract unit counts in ACS to census blocks.
    • The ACS tables used are B25001 and B25004.
    • There are some small differences in the unit counts from ACS, as the tables seem to have changed since they were last downloaded.
  • The 2010 census blocks contained in each Place were downloaded here.
  • Some modifications to the counties and census tracts were made to correct errors in the 2010 Census or to reflect name changes since the 2010 Census. The geographies should reflect definitions as of July 1, 2015.
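
The apportionment step above can be sketched as follows. This is a minimal illustration and not the code used in resstock-estimation: the function name, the dictionary layout, and the fallback for tracts with no 2020 units are assumptions, and the block and tract IDs are invented. The same function would presumably be applied separately to occupied and vacant counts.

```python
from collections import defaultdict


def apportion_tract_units(tract_units_acs, block_units_2020, block_to_tract):
    """Split ACS tract-level unit counts across census blocks.

    Each block's weight is its share of the tract's 2020 redistricting units.
    """
    # Sum the 2020 block counts within each tract.
    tract_totals_2020 = defaultdict(float)
    for block, units in block_units_2020.items():
        tract_totals_2020[block_to_tract[block]] += units

    block_units_acs = {}
    for block, units in block_units_2020.items():
        tract = block_to_tract[block]
        total = tract_totals_2020[tract]
        # Assumption: a tract with no 2020 units gives every block a weight of 0.
        weight = units / total if total > 0 else 0.0
        block_units_acs[block] = tract_units_acs[tract] * weight
    return block_units_acs


# Hypothetical IDs and counts, for illustration only.
block_to_tract = {"b1": "t1", "b2": "t1", "b3": "t2"}
block_units_2020 = {"b1": 30, "b2": 10, "b3": 5}
tract_units_acs = {"t1": 100, "t2": 8}

result = apportion_tract_units(tract_units_acs, block_units_2020, block_to_tract)
print(result)
```

Because the weights within a tract sum to 1 whenever the tract has any 2020 units, the block-level values add back up to the ACS tract total.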

Sample size considerations

Simply adding a tag to the metadata of the results does not guarantee that enough samples have been allocated to these cities to ensure precise results. Sampling from ResStock is deterministic but is also random. At this time, a minimum of 1,000 samples is generally recommended for timeseries and annual results. Use standard errors or confidence intervals to judge how much confidence to place in the results.
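
As a rough illustration of the standard error check suggested above, the sketch below computes a mean, its standard error, and an approximate 95% confidence interval for the samples tagged with one city. It assumes unweighted samples and a normal approximation (z = 1.96); the function name and the sample values are hypothetical.

```python
import math
import statistics


def mean_with_confidence_interval(values, z=1.96):
    """Return (mean, standard_error, (lower, upper)) for a list of samples."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate a standard error")
    mean = statistics.fmean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    std_err = statistics.stdev(values) / math.sqrt(n)
    return mean, std_err, (mean - z * std_err, mean + z * std_err)


# Hypothetical annual energy use (kWh) for samples tagged with one city.
city_samples = [9000.0, 11000.0, 10000.0, 12000.0, 8000.0]
mean, std_err, (lower, upper) = mean_with_confidence_interval(city_samples)
print(f"mean={mean:.0f} kWh, SE={std_err:.0f}, 95% CI=({lower:.0f}, {upper:.0f})")
```

The standard error shrinks with the square root of the sample count, which is why a city with few allocated samples can show a wide interval even when the national run is large.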

Updated spatial lookup tables

As part of this work, three new spatial lookup tables were created. The new tables include Alaska, Hawaii, and Puerto Rico.

  1. Census block level
  2. Census tract level
  3. County level

These spatial lookup tables are stored on s3 (s3://resstock-estimation/various_datasets/spatial_data_v2/) on the resbldg sandbox account and are part of the resstock-estimation repository.

The spatial lookup tables connect census geographies and other geographic tags and housing units. The spatial entities contained in these files are:

  • nhgis_2010_block_gisjoin: The NHGIS GISJOIN block identifier
  • nhgis_2010_block_group_gisjoin: The NHGIS GISJOIN block group identifier
  • nhgis_2010_tract_gisjoin: The NHGIS GISJOIN tract identifier
  • nhgis_2010_puma_gisjoin: The NHGIS GISJOIN Public Use Microdata Area (PUMA) identifier
  • nhgis_2010_county_gisjoin: The NHGIS GISJOIN county identifier
  • housing_units_2020_redistricting: The number of total dwelling units from the 2020 decennial redistricting data.
  • occupied_units_2020_redistricting: The number of occupied dwelling units from the 2020 decennial redistricting data.
  • vacant_units_2020_redistricting: The number of vacant dwelling units from the 2020 decennial redistricting data.
  • housing_units_tract_to_block_weight: The weight that maps the total dwelling units at the tract level to census blocks. Used in mapping dwelling units from ACS to census blocks based on distributions in the 2020 decennial redistricting data.
  • occupied_units_tract_to_block_weight: The weight that maps the occupied dwelling units at the tract level to census blocks. Used in mapping dwelling units from ACS to census blocks based on distributions in the 2020 decennial redistricting data.
  • vacant_units_tract_to_block_weight: The weight that maps the vacant dwelling units at the tract level to census blocks. Used in mapping dwelling units from ACS to census blocks based on distributions in the 2020 decennial redistricting data.
  • place: Census place name and identifier from the 2010 census.
  • county_name: County names at July 1, 2015.
  • state_abbreviation: State abbreviations for the State.tsv.
  • state_name: State names.
  • census_region: Census region names.
  • census_division: Census division names.
  • building_america_climate_zone: Building America climate zones defined by county.
  • iecc_2012_climate_zone: 2012 IECC climate and moisture zones. This is also the same as the 2004 ASHRAE 169 climate and moisture zones.
  • iso_zone: Independent system operator regions defined by county from EIA Form 861.
  • custom_region: Geography based on a collection of states in RECS 2009.
  • puma_name: PUMA name from the 2010 Census.
  • puma_tsv: PUMA naming convention for the PUMA.tsv (state_abbreviation, 5 digit PUMA)
  • county_and_puma: A column that describes all combinations of PUMA and Counties in the U.S. PUMAs and Counties are both a collection of census tracts, but often overlap. This characteristic is the finest geographic granularity in ResStock. The characteristic is formatted by nhgis_2010_county_gisjoin, nhgis_2010_puma_gisjoin.
  • ahs_region_2013: American Housing Survey (AHS) regions. 13 of the largest Core Based Statistical Areas (CBSAs) and Census divisions. CBSAs are defined at the county level. These data are taken from 2013 census definitions to match the 2013 AHS survey.
  • census_division_recs: Census divisions defined in RECS 2015. The census divisions are the same as the U.S. Census except the "Mountain" census division is split into North and South.
  • reeds_balancing_areas: ReEDS balancing areas that are county based and loosely represent grid balancing authorities.
  • generation_emissions_assessment_regions: Generation and emission regions used for carbon calculations. These regions are aggregations of ReEDS balancing areas.
  • weather_file_2012: 2012 Annual Meteorological Year (AMY) weather files used for a given county.
  • weather_file_2015: 2015 AMY weather files used for a given county.
  • weather_file_2016: 2016 AMY weather files used for a given county.
  • weather_file_2017: 2017 AMY weather files used for a given county.
  • weather_file_2018: 2018 AMY weather files used for a given county.
  • weather_file_2019: 2019 AMY weather files used for a given county.
  • weather_file_TMY3: Typical Meteorological Year (TMY) weather files used for a given county.
  • housing_units: Total dwelling units from ACS mapped to census blocks using the 2020 decennial redistricting data.
  • vacant_units: Vacant dwelling units from ACS mapped to census blocks using the 2020 decennial redistricting data.
  • occupied_units: Occupied dwelling units from ACS mapped to census blocks using the 2020 decennial redistricting data.
  • cities: Census Places with 15,000 or more dwelling units.

If a given column does not exist in either the census tract or county lookup table, it is because that particular geography does not map cleanly to tracts or counties. Here are a couple of examples:

  • PUMAs do not map cleanly to Counties, so columns corresponding to PUMAs are not in the county spatial lookup table.
  • Places are defined at the census block level and do not map cleanly to census tracts. As a result, the place and cities columns do not exist in the census tract or county lookup tables.
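
To show how the county_and_puma column relates to the block-level table, here is a small sketch that builds the combined key from the two GISJOIN columns and sums housing_units per combination. The column names come from the list above; the ", " separator, the function names, and the placeholder IDs are assumptions made for illustration.

```python
from collections import defaultdict

# Assumption: the two identifiers are joined with ", ".
SEPARATOR = ", "


def county_and_puma_key(row):
    """Combine the county and PUMA GISJOIN identifiers into one key."""
    return SEPARATOR.join(
        [row["nhgis_2010_county_gisjoin"], row["nhgis_2010_puma_gisjoin"]]
    )


def sum_units_by_county_and_puma(block_rows):
    """Aggregate block-level housing_units to each County and PUMA combination."""
    totals = defaultdict(float)
    for row in block_rows:
        totals[county_and_puma_key(row)] += row["housing_units"]
    return dict(totals)


# Placeholder identifiers and counts, not real GISJOIN codes.
block_rows = [
    {"nhgis_2010_county_gisjoin": "COUNTY_A", "nhgis_2010_puma_gisjoin": "PUMA_1", "housing_units": 40.0},
    {"nhgis_2010_county_gisjoin": "COUNTY_A", "nhgis_2010_puma_gisjoin": "PUMA_2", "housing_units": 25.0},
    {"nhgis_2010_county_gisjoin": "COUNTY_B", "nhgis_2010_puma_gisjoin": "PUMA_2", "housing_units": 10.0},
    {"nhgis_2010_county_gisjoin": "COUNTY_A", "nhgis_2010_puma_gisjoin": "PUMA_1", "housing_units": 5.0},
]
units_by_county_and_puma = sum_units_by_county_and_puma(block_rows)
print(units_by_county_and_puma)
```

In this toy data COUNTY_A spans two PUMAs and PUMA_2 spans two counties, which is the overlap that keeps PUMA columns out of the county table.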

Housing Characteristics Changes

graphs_before_after

New Characteristics

  • County and PUMA.tsv - Counties and PUMAs have a many-to-many relationship. This characteristic defines the finest granularity of ResStock results. The characteristic makes it convenient to define city boundaries, as there are ~4,500 geographic entities in "County and PUMA.tsv", whereas using "County" and "PUMA" as dependencies would result in >7,000,000 rows.
  • Cities.tsv: This characteristic is constructed by the definition of the census "Place" which is defined by census blocks. There are about 29,000 Places in the 2010 Census. Of these Places, the largest places are selected based on the conditions and information below. The dwelling unit threshold was determined based on the condition that every State has 1 City represented. The limiting case is Vermont with Burlington having about 16,000 dwelling units.
    • 1,099 Places explicitly tagged as an option.
    • Threshold is 15,000 dwelling units
    • All States and D.C. in the continental U.S. have at least 1 "City" tagged
    • 744/2,336 PUMAs do not have a "City" tagged
    • 2,535/3,108 Counties do not have a "City" tagged
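
The threshold rule for Cities.tsv can be sketched as below. This is a hedged illustration rather than the actual resstock-estimation logic: the function name and the place labels are hypothetical, the Burlington count is the approximate figure quoted above, and the fallback option name follows the assumption comment quoted later in this review thread.

```python
DWELLING_UNIT_THRESHOLD = 15_000
# Option for units in Places below the threshold.
OTHER_PLACE_OPTION = "In another census Place"


def city_option(place_label, place_units, threshold=DWELLING_UNIT_THRESHOLD):
    """Return the Cities.tsv option for a Place given its dwelling unit count."""
    if place_units >= threshold:
        return place_label
    return OTHER_PLACE_OPTION


# Hypothetical labels; the small town and its count are invented.
place_units = {
    "Burlington, VT": 16_000,
    "Hypothetical Small Town": 4_000,
}
options = {label: city_option(label, units) for label, units in place_units.items()}
print(options)
```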

Updated Characteristics

  • County.tsv - County was previously dependent on "ASHRAE/IECC Climate Zone 2004.tsv" and is now dependent on "County and PUMA.tsv"
  • PUMA.tsv - PUMA was previously dependent on "County.tsv" and "ASHRAE/IECC Climate Zone.tsv" and is now dependent on "County and PUMA.tsv"
  • CEC Climate Zone.tsv - CEC Climate Zone was previously dependent on "PUMA" and is now dependent on "County and PUMA.tsv"

Options Lookup Changes

  • Options added for "County and PUMA.tsv"
  • Options added for "Cities.tsv"
  • Names of options changed for PUMA.tsv. Options changed from PUMA GISJOIN IDs to PUMA names.

Checklist

Not all may apply:

  • Tests (and test files) have been updated
  • Documentation has been updated
  • Changelog has been updated
  • openstudio tasks.rb update_measures has been run
  • No unexpected integration/regression test changes

@afontani afontani force-pushed the feature/city_boundaries branch 6 times, most recently from ab44585 to e1ebf30 Compare April 4, 2022 19:02
@shorowit
Contributor

shorowit commented Apr 4, 2022

I don't love the idea of requiring LFS. It strikes me that the reason these TSVs are so large is because 6 decimal places are required for every value. Aside from the fact that it has always seemed excessively precise to me given the uncertainty on the values, these two large TSVs are essentially just 1s and 0s. Maybe it's time to remove the 6 decimal place requirement? For what it's worth, I changed from 6 decimal places to 0 decimal places on County.tsv and it dropped from 124 MB to 28 MB.

@afontani
Contributor Author

afontani commented Apr 4, 2022

@shorowit: Thanks. I think it is time. I can try to relax this requirement in the integrity checks.

@shorowit
Contributor

shorowit commented Apr 4, 2022

@afontani Another option would be to create a separate file that defines the required number of decimal places on a per-TSV basis. So these new TSVs could use 0 while others could remain at 6.

@joseph-robertson
Contributor

That approach might also be helpful for tracking which TSVs have actual distributions and which just provide mappings.

@afontani
Contributor Author

afontani commented Apr 5, 2022

@joseph-robertson, @shorowit: I think I will add a column to the source_report in resstock-estimation that states if the characteristic is a "mapping" characteristic or a "distribution" characteristic.

@afontani afontani self-assigned this Apr 6, 2022
@afontani afontani marked this pull request as ready for review April 13, 2022 17:48
Contributor

@ejhw ejhw left a comment

The limiting factor was Burlington Vermont with about 16,000 dwelling units.
What does "limiting factor" mean?

CHANGELOG.md Outdated
Comment on lines 10 to 11
- Cities are added as a geographic characteristic ([#874](https://github.com/NREL/resstock/pull/874))

Contributor

What was the threshold and why was that threshold selected?

Contributor Author

@ejhw: Thanks. Addressed 76337ee

@afontani afontani requested a review from ekpresent April 13, 2022 19:28
@afontani afontani requested a review from ejhw April 13, 2022 19:32
Contributor

@ejhw ejhw left a comment

Thanks @afontani. A few comments/questions:

  • Cities.tsv --> City.tsv to be consistent with naming of other TSV files
  • Why not use human-readable strings for city/place, e.g., "Birmingham, AL" instead of "al_birmingham"? I guess so it is in alphabetical order? I think "Birmingham, AL" would make things easier because it would be readily interpreted by Tableau.
  • @tony, we could make "County and PUMA" the string version, and leave PUMA and County as geo IDs? That would not mess up Sightglass but would let us have one field with human-useful information. I retracted this suggestion; @afontani and I decided to revert the PUMA names to use PUMA ids instead.
  • "# Assumption: The value 'In another census Place' designates the fraction of dwelling units in a Census Place with less total dwelling units than the threshold." "less" --> "fewer"
  • What does "U.S. Census data as of July 1, 2015" mean if the census only happens every ten years?

@ejhw ejhw self-requested a review April 20, 2022 16:42
@afontani afontani merged commit 65ffece into develop Apr 20, 2022
@afontani afontani deleted the feature/city_boundaries branch April 20, 2022 19:30
@joseph-robertson joseph-robertson added this to the ResStock v2.6.0 Beta milestone May 5, 2022