Integrate EIA 923 2022 final release data and most recent 923 monthly data #3009

Closed
15 tasks done
Tracked by #2699
aesharpe opened this issue Nov 2, 2023 · 2 comments · Fixed by #3073
Labels: eia923 (Anything having to do with EIA Form 923), new-data (Requests for integration of new data)

Comments

@aesharpe
Member

aesharpe commented Nov 2, 2023

Annual Updates Docs: https://catalystcoop-pudl.readthedocs.io/en/dev/dev/annual_updates.html

  • Add the new Zenodo archive DOI values to pudl/workspace/datastore.py.
  • Run the datastore script to download the new data: pudl_datastore --dataset eia923. The new raw data will appear in pudl_input/eia923/<ZENODO_DOI>/...
  • Update the information in pudl/package_data/eia923 if necessary:
    • file maps (remove _Early_Release suffix!)
    • column maps (probably the same)
    • page maps (probably the same)
    • skip footer (probably the same)
    • skip rows (the early release data has an extra row that we skip. The final release doesn't have that row, so subtract 1 from the skip rows value for that year. The result should probably match the rest of the years. If it's already 0, leave it as 0.)
  • Launch dagit and refresh the code location: run dagster-webserver -m pudl.etl in your terminal, then open http://127.0.0.1:3000/locations/pudl.etl/jobs/etl_full in a browser.
  • Materialize the raw_eia923 asset group. Look out for warnings in the logs about missing or extra columns. If they appear, check and update the package_data accordingly.
  • Materialize the _core_eia923 asset group. Look out for warnings and fix accordingly.
  • Materialize the norm_eia and then denorm_eia asset groups. You'll probably see some errors related to encoding. Take a look at which column it's talking about and look at metadata/resources/eia.py to see which encoder in CODE_METADATA to tweak.
  • Update the validation test test_minmax_rows in test/validate/eia_test.py. It sometimes helps to just run the test (pytest test/validate/eia_test.py::test_minmax_rows) in the terminal: it prints how many rows it found vs. how many it expected, and you can copy the found counts into the code so they become the expected counts. Make sure no table has fewer rows than before, and that none of the row-count changes are unexpectedly large.
  • Test table outputs in a notebook to make sure expected dates appear
  • Run tox and troubleshoot what else might be broken! Might include things like:
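
Separately from the tox step, here is a rough sketch of the sanity check described in the test_minmax_rows item above: compare the old expected row counts with the newly found ones, and flag any table that lost rows or grew by more than some threshold. The function name compare_row_counts, the table names, the counts, and the 25% threshold are all made up for illustration. None of this is PUDL's actual validation code.

```python
def compare_row_counts(expected, found, max_rel_change=0.25):
    """Return {table_name: problem} for suspicious row-count changes.

    expected / found map table names to row counts. Both are hypothetical
    inputs; the real numbers live in test/validate/eia_test.py.
    """
    problems = {}
    for table, n_expected in expected.items():
        n_found = found.get(table)
        if n_found is None:
            problems[table] = "missing from found counts"
        elif n_found < n_expected:
            # New data should only add rows, never remove them.
            problems[table] = f"lost rows: {n_expected} -> {n_found}"
        elif n_expected > 0 and (n_found - n_expected) / n_expected > max_rel_change:
            problems[table] = f"unexpectedly large increase: {n_expected} -> {n_found}"
    return problems


# Made-up table names and counts, purely for illustration.
expected = {"table_a": 1000, "table_b": 500, "table_c": 200}
found = {"table_a": 1100, "table_b": 480, "table_c": 400}
problems = compare_row_counts(expected, found)
for table, problem in problems.items():
    print(f"{table}: {problem}")
```

With these example numbers, table_b gets flagged for losing rows and table_c for doubling in size, while table_a's 10% growth passes. What counts as "unexpectedly large" is a judgment call, so treat the threshold as a placeholder.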
@aesharpe added the eia923 and new-data labels Nov 2, 2023
@aesharpe changed the title from "Integrate EIA 923 2022 final release data" to "Integrate EIA 923 2022 final release data and most recent 923 monthly data" Nov 2, 2023
@aesharpe
Member Author

aesharpe commented Nov 3, 2023

Looping in @robertozanchi

@cmgosnell cmgosnell assigned cmgosnell and unassigned cmgosnell Nov 6, 2023
@aesharpe aesharpe linked a pull request Nov 22, 2023 that will close this issue
@aesharpe aesharpe self-assigned this Nov 22, 2023
@aesharpe
Member Author

aesharpe commented Nov 30, 2023

Last fix here is related to #2448 - Updated the file map to say Final instead of Early Release and actually extracted this raw table (it had been blocked due to issues with the 2018 archive). See #3100
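
For anyone repeating this next year, the file map change is basically a string swap on the file names for the affected year. A minimal sketch, assuming the file map can be treated as a page name to file name mapping. The real file map lives in pudl/package_data/eia923 and may be laid out differently, and the helper finalize_file_map, the page names, and the file names below are all hypothetical.

```python
def finalize_file_map(file_map, old="_Early_Release", new="_Final"):
    """Return a copy of file_map with the early-release marker swapped out.

    file_map maps page names to file names (an assumed layout). Pass new=""
    to drop the marker entirely instead of replacing it.
    """
    return {page: name.replace(old, new) for page, name in file_map.items()}


# Hypothetical page and file names, not the real EIA 923 ones.
early = {
    "example_page_a": "example_a_2022_Early_Release.xlsx",
    "example_page_b": "example_b_2022_Early_Release.xlsx",
}
final = finalize_file_map(early)
print(final)
```

Always check the result against the actual file names in the new Zenodo archive, since the final release files are not guaranteed to follow the same naming pattern as the early release ones.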

@aesharpe aesharpe closed this as completed Dec 1, 2023

2 participants