City Boundaries #874
I don't love the idea of requiring LFS. It strikes me that the reason these TSVs are so large is that 6 decimal places are required for every value. Aside from the fact that this has always seemed excessively precise to me given the uncertainty in the values, these two large TSVs are essentially just 1s and 0s. Maybe it's time to remove the 6-decimal-place requirement? For what it's worth, I changed County.tsv from 6 decimal places to 0 decimal places and it dropped from 124 MB to 28 MB.
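A minimal sketch of the rewrite described in this comment, assuming a hypothetical TSV layout in which the leading columns are dependency/label columns and every remaining column is a numeric probability. The helper name `round_probabilities` and the toy data are illustrative only, not part of ResStock.

```python
import csv
import io


def round_probabilities(tsv_text: str, decimals: int, first_prob_col: int) -> str:
    """Rewrite every probability column of a TSV with a fixed number of decimals.

    Hypothetical helper: columns before `first_prob_col` are treated as
    dependency/label columns and copied unchanged; the rest are parsed as floats.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(next(reader))  # header row is copied as-is
    for row in reader:
        labels, probs = row[:first_prob_col], row[first_prob_col:]
        writer.writerow(labels + [f"{float(p):.{decimals}f}" for p in probs])
    return out.getvalue()


# Toy two-option TSV (made-up layout, not the real County.tsv).
six = "Dependency=State\tOption=A\tOption=B\nCO\t1.000000\t0.000000\n"
zero = round_probabilities(six, decimals=0, first_prob_col=1)
print(len(six), len(zero))
```

Each "1.000000" or "0.000000" shrinks to a single character, which is where the size reduction comes from. Rounding to 0 decimals is only lossless when every value is already exactly 0 or 1, as in the mapping-style TSVs described here; TSVs that hold real distributions would still need more precision.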
@shorowit: Thanks. I think it is time. I can try to relax this requirement in the integrity checks.
@afontani Another option would be to create a separate file that defines the required number of decimal places on a per-TSV basis. So these new TSVs could use 0 while others could remain at 6. |
That approach might also be helpful for tracking which TSVs have actual distributions and which just provide mappings. |
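A minimal sketch of the per-TSV precision idea proposed in this comment, assuming a hypothetical mapping from TSV file name to required decimal places with a default of 6. The mapping, its location, and the helper names are assumptions for illustration; they are not an existing ResStock integrity check.

```python
from decimal import Decimal

# Hypothetical per-TSV settings; in practice these might live in a separate
# file as proposed in the comment. Names and values here are illustrative only.
DEFAULT_DECIMALS = 6
REQUIRED_DECIMALS = {
    "County.tsv": 0,
    "Cities.tsv": 0,
}


def decimals_in(value: str) -> int:
    """Count the digits written after the decimal point, e.g. '0.250000' -> 6."""
    exponent = Decimal(value).as_tuple().exponent
    return max(0, -exponent)


def check_precision(tsv_name: str, values: list[str]) -> list[str]:
    """Return the values whose written precision does not match the TSV's rule."""
    required = REQUIRED_DECIMALS.get(tsv_name, DEFAULT_DECIMALS)
    return [v for v in values if decimals_in(v) != required]
```

Using `Decimal` on the raw string preserves how many digits were actually written, which a `float` conversion would lose. The same mapping could carry an extra flag marking which TSVs are pure mappings rather than distributions.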
@joseph-robertson, @shorowit: I think I will add a column to the
The limiting factor was Burlington, Vermont, with about 16,000 dwelling units.
What does "limiting factor" mean?
CHANGELOG.md
- Cities are added as a geographic characteristic ([#874](https://github.com/NREL/resstock/pull/874))
What was the threshold and why was that threshold selected?
Thanks @afontani. A few comments/questions:
- Cities.tsv --> City.tsv, to be consistent with the naming of other TSV files.
- Why not use human-readable strings for city/place, e.g., "Birmingham, AL" instead of "al_birmingham"? I guess this is to keep them in alphabetical order? I think "Birmingham, AL" would make things easier because Tableau would interpret it readily.
- @tony, we could make "County and PUMA" the string version and leave PUMA and County as geo IDs? That would not mess up Sightglass but would give us one field with human-useful information. I retracted this suggestion; @afontani and I decided to revert the PUMA names to use PUMA IDs instead.
- "# Assumption: The value 'In another census Place' designates the fraction of dwelling units in a Census Place with less total dwelling units than the threshold." "less" --> "fewer"
- What does "U.S. Census data as of July 1, 2015" mean if the census only happens every ten years?
Pull Request Description
Companion PR: resstock-estimation (#196)
Purpose: Add city boundaries as a housing characteristic in ResStock so that users can aggregate results directly by city.
Inclusion of City boundaries
What was included
The term "City Boundaries" refers to the census geography "Place". The U.S. Census Bureau's definition of a place is here. Places are built from census blocks and do not nest within tracts or counties (see census geography hierarchy below). Not all places are included in the new characteristic. There are approximately 29,000 places in the U.S. To limit the number of options tagged by ResStock, a dwelling unit count threshold was added: places with 15,000 or more dwelling units are included in "Cities.tsv". This threshold ensures that each state in the continental U.S. has at least one place tagged in ResStock. The limiting factor was Burlington, Vermont, with about 16,000 dwelling units: as the largest place in Vermont, it caps how high the threshold can go while still tagging a place in every continental state.
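The threshold rule and the "at least one place per state" check can be sketched as below, assuming a hypothetical list of `(state, place_name, dwelling_units)` tuples. Places under the threshold are grouped under the "In another census Place" label quoted in the review; the function names are illustrative, not taken from resstock-estimation.

```python
from collections import defaultdict

THRESHOLD = 15_000  # dwelling units; places at or above this get their own option
OTHER_PLACE = "In another census Place"


def tag_places(places, threshold=THRESHOLD):
    """Map each (state, place) to its own option or to OTHER_PLACE.

    `places` is an iterable of (state, place_name, dwelling_units) tuples.
    """
    return {
        (state, name): name if units >= threshold else OTHER_PLACE
        for state, name, units in places
    }


def states_without_tagged_place(places, threshold=THRESHOLD):
    """States whose largest place falls below the threshold (sorted)."""
    largest = defaultdict(int)
    for state, _name, units in places:
        largest[state] = max(largest[state], units)
    return sorted(s for s, units in largest.items() if units < threshold)
```

Raising `threshold` until `states_without_tagged_place` first returns a non-empty list shows which state is the limiting factor; per the description, that would be Vermont once the threshold exceeds Burlington's roughly 16,000 dwelling units.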
Implementation
Sample size considerations
Simply adding a tag to the metadata of the results does not guarantee that enough samples are allocated to these cities to produce precise results. Although sampling in ResStock is deterministic, it is still random. At this time, it is generally recommended to use a minimum of 1,000 samples for timeseries and annual results. Use standard errors or confidence intervals to judge how much trust to place in the results.
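A rough sketch of the sample-size arithmetic and the uncertainty check, under two labeled assumptions: each sample represents the same number of dwelling units, so a city receives samples roughly in proportion to its share of units, and the 1,000-sample guidance is applied to the samples that land in a single city. The confidence interval uses a standard normal approximation; the helper names are hypothetical.

```python
import math
import statistics


def expected_samples(total_samples: int, share_of_units: float) -> float:
    """Expected samples in a place holding `share_of_units` of all dwelling units.

    Assumes each sample represents the same number of dwelling units, so a place
    receives samples roughly in proportion to its share of units.
    """
    return total_samples * share_of_units


def samples_needed(target: int, share_of_units: float) -> int:
    """Smallest run size whose expected allocation to the place reaches `target`."""
    return math.ceil(target / share_of_units)


def mean_with_ci(values: list[float], z: float = 1.96):
    """Mean, standard error, and normal-approximation 95% confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)
```

For a hypothetical city holding 0.1% of all dwelling units, `samples_needed(1000, 0.001)` points to a run on the order of one million samples, which is why a small city tag alone does not guarantee precise city-level results.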
Updated spatial lookup tables
As part of this work, new spatial lookup tables were created. There are three spatial lookup tables, and they include Alaska, Hawaii, and Puerto Rico.
These spatial lookup tables are stored on S3 (`s3://resstock-estimation/various_datasets/spatial_data_v2/`) in the resbldg sandbox account and are part of the resstock-estimation repository. The spatial lookup tables connect census geographies and other geographic tags to housing units. The spatial entities contained in these files are:
- State.tsv
- `state_abbreviation`
- 5-digit PUMA
- `nhgis_2010_county_gisjoin`, `nhgis_2010_puma_gisjoin`
If a given column does not exist in either the census tract or county lookup table, it is because that particular geography does not map cleanly to tracts or counties. Here are a couple of examples:
Housing Characteristics Changes
New Characteristics
Updated Characteristics
Options Lookup Changes
Checklist
Not all may apply:
- `openstudio tasks.rb update_measures` has been run