Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update archiver for EIA 176, write new archivers for Forms 191, 757A #267

Merged
merged 9 commits into from
Feb 1, 2024

Conversation

e-belfer
Copy link
Member

@e-belfer e-belfer commented Jan 29, 2024

Closes #262. Switch to scraping NGQV data, rather than relying on the bundled zipfile which requires manual line mapping. This PR produces one zipped CSV per year per form. This PR also reorganizes all EIA archivers into an eia subfolder to make subclassing the natural gas archiver logic more straightforward and to mirror the ferc format.

Testing

See sandbox archives for Form 176, Form 191 and Form 757A.

@e-belfer e-belfer added the eia176 EIA Form 176 natural gas sources and dispositions label Jan 29, 2024
@e-belfer e-belfer requested a review from zschira January 29, 2024 20:16
@e-belfer e-belfer self-assigned this Jan 29, 2024
@e-belfer e-belfer linked an issue Jan 29, 2024 that may be closed by this pull request
Copy link
Member

@zschira zschira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, sorry for the slow review! I think the biggest thing is just abstracting out some of the shared natural gas logic/structures, which I left some suggestions for, but also would be happy to talk through in more detail if that'd be helpful.

src/pudl_archiver/archivers/classes.py Outdated Show resolved Hide resolved
src/pudl_archiver/archivers/eia176.py Outdated Show resolved Hide resolved
src/pudl_archiver/archivers/eia191.py Outdated Show resolved Hide resolved
@e-belfer e-belfer requested a review from zschira January 31, 2024 21:42
@@ -27,7 +29,18 @@ def all_archivers():
for module in module_names:
# AbstractDatasetArchiver won't know about the subclasses unless they are imported
__import__(module)
return AbstractDatasetArchiver.__subclasses__()

def all_subclasses(cls):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zschira This is the only part of the transition to subclassing the NGQV archiver that felt a bit questionable, but otherwise we wind up having a non-functional NGQV archiver class in the list and not including 191 or 757A. If you have suggestions on how to do this better I'm all ears.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah it's a bit annoying, but I might just leave it. We could pull out the get_datasets and get_year_resource methods to be standalone functions that get called by the archivers, but I'm not sure if it's actually worth it

Copy link
Member

@zschira zschira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! One small typo, and as for the weird subclassing thing, I think it might be fine for now. I think there's probably better ways to make the top-level process aware of all the archivers, but that's probably a project for a future day.

return datasets

async def get_resources(self) -> ArchiveAwaitable:
"""Download EIA 191 resources."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably want to make this docstring generic

@@ -27,7 +29,18 @@ def all_archivers():
for module in module_names:
# AbstractDatasetArchiver won't know about the subclasses unless they are imported
__import__(module)
return AbstractDatasetArchiver.__subclasses__()

def all_subclasses(cls):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah it's a bit annoying, but I might just leave it. We could pull out the get_datasets and get_year_resource methods to be standalone functions that get called by the archivers, but I'm not sure if it's actually worth it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
eia176 EIA Form 176 natural gas sources and dispositions
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Update EIA 176 archiver to grab CSVs, rather than bundled data
2 participants