update the 860m doi #3189

Merged
merged 2 commits into from
Dec 26, 2023

Conversation

Member

@cmgosnell cmgosnell commented Dec 22, 2023

Overview

Closes #3186.

What problem does this address?
We had to make a new Zenodo archive for eia860m in which a year's worth of monthly files are zipped together into one resource, because of file upload limits. So this PR updates the DOI to point at that new archive.

What did you change?
Basically nothing except the DOI, because @e-belfer added a little logic to the datastore to work with partitions that are lists of partitions. The Excel extractor and the datastore combined already know how to grab a file out of a zipped archive, because so many of our one-partition resources contain many files. The main place where this happens is load_excel_file.
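
A rough sketch of what matching a requested partition against list-valued partitions could look like. This is not the actual datastore code: `matches_partition`, the resource dictionaries, and the file names are all hypothetical and only illustrate the idea.

```python
def matches_partition(parts: dict, **filters) -> bool:
    """Return True if every filter equals, or is contained in, the partition value.

    Hypothetical helper: `parts` maps a partition name to either a single
    value or a list of values (one bundled resource covering many months).
    """
    for key, wanted in filters.items():
        have = parts.get(key)
        if isinstance(have, list):
            if wanted not in have:
                return False
        elif have != wanted:
            return False
    return True


# Made-up resources: each yearly zip lists the months it contains.
resources = [
    {"name": "eia860m-2022.zip", "parts": {"year_month": ["2022-11", "2022-12"]}},
    {"name": "eia860m-2023.zip", "parts": {"year_month": ["2023-01", "2023-02"]}},
]

selected = [
    r["name"] for r in resources if matches_partition(r["parts"], year_month="2023-02")
]
```

With this approach a request for a single month resolves to the yearly bundle that contains it, and resources with scalar partitions keep working unchanged.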

We could remove the first try in load_excel_file, because the old eia860m archive made of individual files was actually the edge case.
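
For illustration only, here is a minimal sketch of the "single file or file inside a zip" fallback described above. It is not the body of load_excel_file; `read_member_bytes` and the file names are assumptions made up for this example, and it uses only the standard library.

```python
import io
import zipfile


def read_member_bytes(resource: bytes, filename: str) -> bytes:
    """Return the bytes of `filename` from a resource that may or may not be a bundle."""
    buffer = io.BytesIO(resource)
    # An .xlsx file is itself a zip container, so also check that the
    # requested member is actually inside before treating it as a bundle.
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            if filename in archive.namelist():
                return archive.read(filename)
    # Otherwise the resource is the file itself.
    return resource


# Build a small in-memory bundle to exercise both paths.
bundle = io.BytesIO()
with zipfile.ZipFile(bundle, "w") as archive:
    archive.writestr("eia860m-2023-01.xlsx", b"january")
    archive.writestr("eia860m-2023-02.xlsx", b"february")

from_bundle = read_member_bytes(bundle.getvalue(), "eia860m-2023-02.xlsx")
from_single = read_member_bytes(b"standalone", "eia860m-2023-02.xlsx")
```

If the individual-file case really is the edge case, the final `return resource` branch is the part that could eventually be dropped.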

Testing

How did you make sure this worked? How can a reviewer verify this?
I ran the fast ETL locally. But first I thought I was going to have to muck with the Excel extractor, so I set up a little notebook testing situation, and the simplest setup gave me the eia860m outputs:

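import pudl.metadata.sources  # also binds the top-level `pudl` name used below
import pudl.workspace.setup
from pudl.extract.eia860m import Extractor
from pudl.workspace.datastore import Datastore

# Point the datastore at the local PUDL input cache.
ds = Datastore(local_cache_path=pudl.workspace.setup.PudlPaths().pudl_input)
self = Extractor(ds=ds)
# Extract every working year_month partition for eia860m.
raw_eia860m_dfs = self.extract(year_month=pudl.metadata.sources.SOURCES["eia860m"]["working_partitions"]["year_month"])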

To-do list

  1. Make sure full ETL runs & make pytest-integration-full passes locally
  2. For major data coverage & analysis changes, run data validation tests
  3. If updating analyses or data processing functions: make sure to update or write data validation tests
  4. Update the release notes: reference the PR and related issues.
  5. Review the PR yourself and call out any questions or issues you have

it seems to all just work which is tres fun but makes sense after looking at it
@cmgosnell cmgosnell linked an issue Dec 22, 2023 that may be closed by this pull request
Member

@e-belfer e-belfer left a comment

Looks good! I'll do the final validation in the CEMS branch.

@e-belfer e-belfer marked this pull request as ready for review December 26, 2023 15:58
@zaneselvans
Member

The change to allow lists in the partitions has broken the docs build script, which reads those partitions to generate the dataset docs.

@zaneselvans zaneselvans self-requested a review December 26, 2023 16:28
@e-belfer
Member

The change to allow lists in the partitions has broken the docs build script, which reads those partitions to generate the dataset docs.

Already on it!

Member

@zaneselvans zaneselvans left a comment

The documentation build script isn't expecting to find lists in the partitions, and so is failing when it attempts to build the data source specific docs pages using the Jinja templates. It needs to be updated to accommodate the new metadata structure associated with the newly bundled raw archives.
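
One possible shape for that fix, sketched under assumptions: normalize every partition value to a flat, printable string before handing it to the template, so scalars, lists, and nested lists all render the same way. `flatten` and `describe_partitions` are hypothetical names, not functions from the docs build script, and the sample metadata is invented.

```python
def flatten(value):
    """Yield scalar items from a value that may be a scalar or a nested list."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def describe_partitions(partitions: dict) -> dict:
    """Map each partition name to a comma-separated string safe for a template."""
    return {
        key: ", ".join(str(item) for item in flatten(value))
        for key, value in partitions.items()
    }


# Invented example mixing a scalar partition with a list-valued one.
summary = describe_partitions(
    {"year": 2023, "year_month": ["2023-01", "2023-02"]}
)
```

A template that only ever receives strings does not need to know whether the underlying archive bundles one partition or many.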

@e-belfer e-belfer merged commit 542ce85 into cems-extraction Dec 26, 2023
12 of 13 checks passed
@e-belfer e-belfer deleted the eia860m-extraction branch December 26, 2023 17:08
@zaneselvans zaneselvans added eia860 Anything having to do with EIA Form 860 zenodo Issues having to do with Zenodo data archiving and retrieval. excel Issues involving data in Microsoft Excel spreadsheets labels Feb 3, 2024
Development

Successfully merging this pull request may close these issues.

EIA 860M: Retool extraction to handle listed partitions
3 participants