-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update archiver for EIA 176, write new archivers for Forms 191, 757A #267
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, sorry for the slow review! I think the biggest thing is just abstracting out some of the shared natural gas logic/structures, which I left some suggestions for, but also would be happy to talk through in more detail if that'd be helpful.
@@ -27,7 +29,18 @@ def all_archivers(): | |||
for module in module_names: | |||
# AbstractDatasetArchiver won't know about the subclasses unless they are imported | |||
__import__(module) | |||
return AbstractDatasetArchiver.__subclasses__() | |||
|
|||
def all_subclasses(cls): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zschira This is the only part of the transition to subclassing the NGQV archiver that felt a bit questionable, but otherwise we wind up having a non-functional NGQV archiver class in the list and not including 191 or 757A. If you have suggestions on how to do this better I'm all ears.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah it's a bit annoying, but I might just leave it. We could pull out the get_datasets
and get_year_resource
methods to be standalone functions that get called by the archivers, but I'm not sure if it's actually worth it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! One small typo, and as for the weird subclassing thing, I think it might be fine for now. I think there's probably better ways to make the top-level process aware of all the archivers, but that's probably a project for a future day.
return datasets | ||
|
||
async def get_resources(self) -> ArchiveAwaitable: | ||
"""Download EIA 191 resources.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably want to make this docstring generic
@@ -27,7 +29,18 @@ def all_archivers(): | |||
for module in module_names: | |||
# AbstractDatasetArchiver won't know about the subclasses unless they are imported | |||
__import__(module) | |||
return AbstractDatasetArchiver.__subclasses__() | |||
|
|||
def all_subclasses(cls): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah it's a bit annoying, but I might just leave it. We could pull out the get_datasets
and get_year_resource
methods to be standalone functions that get called by the archivers, but I'm not sure if it's actually worth it
Closes #262. Switch to scraping NGQV data, rather than relying on the bundled zipfile which requires manual line mapping. This PR produces one zipped CSV per year per form. This PR also reorganizes all EIA archivers into an
eia
subfolder to make subclassing the natural gas archiver logic more straightforward and to mirror theferc
format.Testing
See sandbox archives for Form 176, Form 191 and Form 757A.