Consolidate static state, territory, and province data in one place #1958

Closed
10 tasks done
zaneselvans opened this issue Sep 28, 2022 · 0 comments · Fixed by #1966
Labels
data-cleaning Tasks related to cleaning & regularizing data during ETL.

zaneselvans commented Sep 28, 2022

We have accumulated several dictionaries of information about states, territories, and provinces in various modules. With the integration of the bulk EIA electricity data, it finally seems worth consolidating them to avoid duplication and to ensure that there's a single source of truth with a uniform method of access.

Existing constants to replace

  • pudl.metadata.enums.US_STATES
  • pudl.metadata.enums.US_TERRITORIES
  • pudl.metadata.enums.US_STATES_TERRITORIES
  • pudl.metadata.enums.CANADA_PROVINCES_TERRITORIES (mapping of abbreviations to names)
  • pudl.metadata.enums.EPACEMS_STATES (bespoke subset of state & territories applicable only to EPA CEMS data)
  • pudl.transform.eia.APPROXIMATE_TIMEZONES (state/territory/province to timezone mapping)
  • pudl.metadata.dfs.STATES (state and census region codes required for bulk EIA data aggregations)
  • pudl.analysis.state_demand.STATES (state FIPS codes)
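
To illustrate how one table could stand in for the constants above, here is a minimal sketch assuming pandas and the columns proposed under "New table structure" (with alpha-3 country codes, one of the two options still open). Only a handful of rows are included, and the helper names (`subdivision_names`, `approximate_timezones`, `epacems_states`, `state_fips`) are hypothetical, not existing PUDL functions. Each one returns roughly the information carried by one of the old constants, not necessarily in the same shape.

```python
import pandas as pd

# A few illustrative rows using the proposed columns; the real table would
# list every US state/territory and Canadian province/territory.
# Alpha-3 country codes are an assumption (the issue leaves this open).
POLITICAL_SUBDIVISIONS = pd.DataFrame(
    {
        "country_code": ["USA", "USA", "USA", "CAN", "CAN"],
        "subdivision_category": ["state", "state", "state", "province", "province"],
        "subdivision_code": ["CA", "NY", "TX", "ON", "BC"],
        "subdivision_name": [
            "California", "New York", "Texas", "Ontario", "British Columbia",
        ],
        "state_id_fips": ["06", "36", "48", None, None],
        "timezone": [
            "America/Los_Angeles", "America/New_York", "America/Chicago",
            "America/Toronto", "America/Vancouver",
        ],
        "is_epacems": [True, True, True, False, False],
    }
).astype({"state_id_fips": "string"})


def subdivision_names(df, country_code, categories=None):
    """Code -> name mapping (hypothetical stand-in for US_STATES etc.)."""
    mask = df["country_code"] == country_code
    if categories is not None:
        # Optionally restrict to e.g. ["state"] or ["territory"].
        mask &= df["subdivision_category"].isin(categories)
    rows = df[mask]
    return dict(zip(rows["subdivision_code"], rows["subdivision_name"]))


def approximate_timezones(df):
    """Code -> IANA timezone (hypothetical stand-in for APPROXIMATE_TIMEZONES)."""
    return dict(zip(df["subdivision_code"], df["timezone"]))


def epacems_states(df):
    """Sorted codes flagged as present in EPA CEMS (stand-in for EPACEMS_STATES)."""
    return sorted(df.loc[df["is_epacems"], "subdivision_code"])


def state_fips(df):
    """US state code -> 2-character FIPS code (stand-in for state_demand.STATES)."""
    us = df[df["country_code"] == "USA"]
    return dict(zip(us["subdivision_code"], us["state_id_fips"]))
```

For example, `subdivision_names(POLITICAL_SUBDIVISIONS, "CAN")` plays the role of `CANADA_PROVINCES_TERRITORIES`, while `epacems_states` becomes a filter on a column instead of a separately maintained list.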

Other tasks

  • Define political_subdivisions fields & resource metadata
  • Integrate political_subdivisions into the static table ETL
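
One way to approach the field and resource metadata is a declarative schema that the static-table ETL validates against. The sketch below is hypothetical: its dict layout loosely follows the Frictionless Table Schema convention (`fields` plus `primaryKey`), which is not necessarily how PUDL stores its metadata, and it assumes a composite primary key of `country_code` and `subdivision_code`. `apply_schema` is an illustrative helper, not a PUDL function.

```python
import pandas as pd

# Hypothetical resource metadata, loosely following Frictionless Table Schema.
POLITICAL_SUBDIVISIONS_SCHEMA = {
    "name": "political_subdivisions",
    "fields": [
        {"name": "country_code", "type": "string"},
        {"name": "country_name", "type": "string"},
        {"name": "subdivision_category", "type": "string"},
        {"name": "subdivision_code", "type": "string"},
        {"name": "subdivision_name", "type": "string"},
        {"name": "state_id_fips", "type": "string"},
        {"name": "region_code_us_census", "type": "string"},
        {"name": "timezone", "type": "string"},
        {"name": "is_epacems", "type": "boolean"},
    ],
    "primaryKey": ["country_code", "subdivision_code"],  # assumed composite key
}

# Map schema types onto nullable pandas dtypes.
PANDAS_DTYPES = {"string": "string", "boolean": "boolean"}


def apply_schema(df, schema):
    """Select and order columns, cast dtypes, and enforce the primary key."""
    names = [f["name"] for f in schema["fields"]]
    missing = set(names) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    dtypes = {f["name"]: PANDAS_DTYPES[f["type"]] for f in schema["fields"]}
    out = df[names].astype(dtypes)
    pk = schema["primaryKey"]
    if out[pk].isna().any().any():
        raise ValueError("null values in primary key columns")
    if out.duplicated(subset=pk).any():
        raise ValueError("duplicate primary key values")
    return out
```

Keeping the schema declarative means the same definition can drive both dtype casting and key checks, so a duplicated or missing subdivision fails loudly during the ETL rather than downstream.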

New table structure

This isn't going to be a well-normalized table. It's just a way to look up state-associated attributes. Columns should include:

  • country_code (string, primary key): ISO-3166-1 country code (alpha-2 or alpha-3?)
  • country_name (string): ISO-3166-1 country name
  • subdivision_category (string): ISO-3166-2 subdivision category (e.g. state, district, outlying area, territory, province)
  • subdivision_code (string, primary key): 2-letter subdivision abbreviation (ISO-3166-2 political subdivisions)
  • subdivision_name (string): full political subdivision name (ISO-3166)
  • state_id_fips (string): 2-character numerical state FIPS code (or NA if not in the US)
  • region_code_us_census (string): US Census region abbreviation (or NA if not in the US)
  • timezone (string): Approximate canonical timezone (from IANA, e.g. America/New_York) associated with the political subdivision.
  • is_epacems (bool): Whether this subdivision is present in the EPA CEMS dataset.
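
The `timezone` column takes over the role of `APPROXIMATE_TIMEZONES`. As a hedged sketch of one plausible use, assuming pandas: fill a missing plant timezone from the plant's state, then convert a naive local timestamp to UTC. The `plants` records and the helpers `fill_timezones` and `local_to_utc` are hypothetical.

```python
import pandas as pd

# Hypothetical excerpt of the proposed table: only the columns needed here.
subdivisions = pd.DataFrame({
    "country_code": ["USA", "USA", "CAN"],
    "subdivision_code": ["NY", "CA", "ON"],
    "timezone": ["America/New_York", "America/Los_Angeles", "America/Toronto"],
})

# Hypothetical plant records; plant 2 is missing its timezone.
plants = pd.DataFrame({
    "plant_id": [1, 2],
    "state": ["NY", "CA"],
    "timezone": ["America/New_York", None],
})


def fill_timezones(plants, subdivisions, country_code="USA"):
    """Fill missing plant timezones using the plant's state (approximate)."""
    lookup = (
        subdivisions.loc[subdivisions["country_code"] == country_code]
        .set_index("subdivision_code")["timezone"]
    )
    out = plants.copy()
    # Existing timezones are kept; only missing ones come from the state.
    out["timezone"] = out["timezone"].fillna(out["state"].map(lookup))
    return out


def local_to_utc(naive_ts, tz):
    """Interpret a naive timestamp in `tz` and convert it to UTC."""
    return pd.Timestamp(naive_ts).tz_localize(tz).tz_convert("UTC")


filled = fill_timezones(plants, subdivisions)
print(filled["timezone"].tolist())
print(local_to_utc("2022-07-01 12:00", "America/Los_Angeles"))
```

Because a single timezone per subdivision is only approximate (Texas, for instance, spans two), this is a fallback for missing data rather than a replacement for plant-level timezones. Note also that `tz_localize` raises on ambiguous or nonexistent local times around DST transitions unless told how to handle them.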
@zaneselvans zaneselvans added the data-cleaning Tasks related to cleaning & regularizing data during ETL. label Sep 28, 2022
@zaneselvans zaneselvans self-assigned this Sep 28, 2022
@zaneselvans zaneselvans changed the title Consolidates static state, territory, and province data in one place Consolidate static state, territory, and province data in one place Sep 28, 2022
zaneselvans added a commit that referenced this issue Oct 3, 2022
Consolidates several dictionaries and enumerations that we had scattered
across the codebase into a single static table describing states,
territories, provinces, etc., including their membership in various
geographic aggregations, FIPS codes, and so on.

In the process, also added a new `ownership_country` column to the
`ownership_eia860` table to clearly differentiate between country and
political subdivision information, which were previously commingled in
the state column.

Closes #1958
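
Separating country from subdivision information can be sketched as follows, assuming pandas and that the old state column held a mix of subdivision codes and alpha-3 country codes. That is an assumption about the raw data, not a description of what the PR actually does; the hard-coded lookups and the `split_country_and_state` helper are hypothetical stand-ins for values that would come from the static table.

```python
import pandas as pd

# Hypothetical lookups; in practice both would be read from the static
# political subdivisions table rather than hard-coded.
SUBDIVISION_COUNTRY = {"NY": "USA", "TX": "USA", "ON": "CAN", "QC": "CAN"}
COUNTRY_CODES = {"USA", "CAN", "GBR"}


def split_country_and_state(df, state_col="state", country_col="ownership_country"):
    """Split a column mixing subdivision and country codes into two columns."""
    raw = df[state_col]
    # Subdivision codes are checked first, so a value that is a known
    # subdivision code is never treated as a country code.
    is_subdivision = raw.isin(list(SUBDIVISION_COUNTRY))
    is_country = raw.isin(list(COUNTRY_CODES)) & ~is_subdivision
    out = df.copy()
    # A subdivision implies its country; bare country codes pass through.
    out[country_col] = raw.map(SUBDIVISION_COUNTRY).where(
        is_subdivision, raw.where(is_country)
    )
    # Only genuine subdivision codes remain in the state column.
    out[state_col] = raw.where(is_subdivision)
    return out


owners = pd.DataFrame({"state": ["NY", "ON", "GBR", None]})
print(split_country_and_state(owners))
```

Values that match neither lookup end up null in both columns, which makes unrecognized codes easy to find. With alpha-2 country codes the order of the checks would matter more, since "CA" is both California and Canada.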
@zaneselvans zaneselvans linked a pull request Oct 3, 2022 that will close this issue