CEMS: Repartition extraction process and parquet files. #2973

Closed
14 tasks done
Tracked by #2902
e-belfer opened this issue Oct 23, 2023 · 0 comments · Fixed by #3096

e-belfer commented Oct 23, 2023

Once this PR is merged, the CEMS archive will contain one file per quarter rather than one file per state and year. Year and state should both still work as filters when extracting data from the parquet files. To process the new data format in PUDL and integrate more recently downloaded data, we need to do the following:

  • Run the CEMS production archive.
  • Add the new Zenodo archive DOI values to pudl/workspace/datastore.py.
  • Run the datastore script to download the new year of data.
  • Decide how etl_full and etl_fast should specify year and quarter: should we specify only years and include all available quarters by default?
  • Add the new year/quarters to etl_full.yml and etl_fast.yml.
  • Add the new year/quarters to the working_partitions in pudl/metadata/sources.py
  • Update the extractor to ingest year and quarter partitions rather than year and state
  • Update pudl.transform.epacems
  • Update the CEMS DOI that is hard-coded in the unit tests.
  • Launch dagit and refresh the code location.
  • Reduce Dagster concurrency on the epacems yearly partitions to prevent memory issues.
  • Materialize epacems asset with new data and remap any plants missing from data
  • Remove state partitions from dagster launchpad (either by updating pudl.output.epacems or by removing it from the launchpad)
  • Update the validation tests if needed
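
One possible answer to the year/quarter question in the task list is to keep listing only years in the settings and expand them to quarters at load time. The following is a minimal sketch of that idea, not existing PUDL code: the function name `expand_year_quarters`, the `available` argument, and the `2023q1` key format are all assumptions made for illustration.

```python
from __future__ import annotations

from collections.abc import Iterable

# Hypothetical helper: none of these names come from the PUDL codebase.
QUARTERS = (1, 2, 3, 4)


def expand_year_quarters(
    years: Iterable[int],
    available: set[str] | None = None,
) -> list[str]:
    """Expand years into year-quarter keys such as "2023q1".

    If ``available`` is given, only keys present in that set are kept,
    so a partially published year yields just its released quarters.
    """
    keys = [
        f"{year}q{quarter}"
        for year in sorted(set(years))
        for quarter in QUARTERS
    ]
    if available is not None:
        keys = [key for key in keys if key in available]
    return keys


if __name__ == "__main__":
    # A full year expands to all four quarters.
    print(expand_year_quarters([2022]))
    # A partial year is limited to the quarters that exist in the archive.
    print(expand_year_quarters([2023], available={"2023q1", "2023q2"}))
```

With something like this, etl_full.yml and etl_fast.yml could keep specifying years only, and the set of quarters actually present in the archive would decide what gets extracted for the most recent year.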