Add `data` and `other` to EIA 176 raw data extraction #3264
Responding to this question in #2603 since I think it's more broadly related to our integration of EIA's natural gas data:
@asset(required_resource_keys={"datastore"})
def raw_eia176__company(context):
    """Extract raw EIA company data from CSV sheets into dataframes.
def eia_176_asset_factory(
I think we should stick with the dataset ID `eia176` and never introduce an `_` in here. It's nice to be able to search for these IDs reliably and use the same idioms for all the datasets like `eia860`, `eia861`, `eia923` etc.
You mean in the function names? Simple enough fix, that makes sense.
Function names, variable names, tables, classes, anywhere we're naming something to refer to the dataset.
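A minimal sketch of how this naming convention could be checked mechanically. The helper names `has_split_dataset_id` and `fix_dataset_id` are hypothetical and not part of PUDL; the pattern only looks for `eia_` followed directly by a digit, which is an assumption about how the split ID tends to show up.

```python
import re

# Hypothetical pattern: "eia_" immediately followed by a digit, e.g. "eia_176".
# Names like "raw_eia176__company" do not match, because the digits follow
# "eia" directly.
SPLIT_ID = re.compile(r"eia_(?=\d)")


def has_split_dataset_id(text: str) -> bool:
    """Return True if the text contains a dataset ID split by an underscore."""
    return SPLIT_ID.search(text) is not None


def fix_dataset_id(name: str) -> str:
    """Rewrite e.g. 'eia_176_asset_factory' as 'eia176_asset_factory'."""
    return SPLIT_ID.sub("eia", name)
```

Something like this could run over function, variable, table, and class names, so that a search for `eia176` reliably finds every reference to the dataset.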
Not sure what we'll want to do about the mish-mash of different forms that were hiding inside this zipfile.
Closing this PR so that the changes to the underlying data format, and the extraction code changes they require, can be addressed in a separate issue.
@e-belfer can we delete the
Overview
This PR builds on #3227 and adds `data` and `other` 176 data to the extraction process designed by @davidmudrauskas, transforming the extractor function into a dagster asset factory to enable simple creation of a dagster asset for each table.

What problem does this address?
Extracts raw EIA 176 "data" and "other" tables.
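The factory idea from the overview can be illustrated with a rough, standard-library-only sketch. This is not the code in `pudl.extract.eia176`: the function `raw_eia176_extractor_factory`, the `RAW_TABLES` tuple, and the `raw_eia176__data` / `raw_eia176__other` names are assumptions modelled on the existing `raw_eia176__company` asset, and the dagster `@asset` decorator is left out so the example runs without dagster installed.

```python
import csv
import io
from typing import Callable, Dict, List

# Assumed table names, taken from the tables mentioned in this PR.
RAW_TABLES = ("company", "data", "other")

Rows = List[Dict[str, str]]


def raw_eia176_extractor_factory(table: str) -> Callable[[Dict[str, str]], Rows]:
    """Build an extraction function for a single raw EIA 176 table."""

    def extract(csv_texts: Dict[str, str]) -> Rows:
        # csv_texts maps a table name to the raw CSV text for that table.
        reader = csv.DictReader(io.StringIO(csv_texts[table]))
        return list(reader)

    # In a dagster pipeline the function would presumably be wrapped with
    # @asset(name=f"raw_eia176__{table}", ...) instead of being renamed here.
    extract.__name__ = f"raw_eia176__{table}"
    return extract


# One extractor per table, keyed by its asset-style name.
extractors = {
    fn.__name__: fn for fn in map(raw_eia176_extractor_factory, RAW_TABLES)
}
```

Adding another table then only means adding its name to the tuple, rather than writing a new extraction function by hand.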
What did you change?
Converted the extractor in `pudl.extract.eia176` to a factory, and added entries to `table_file_map.csv`. Updated the unit test to handle multiple files, rather than just one.

Testing
How did you make sure this worked? How can a reviewer verify this?
Generate the raw assets in dagster and open them to review.
Remaining questions
`table_file_map.csv` to account for the fact that there are multiple `data` CSV files, one per form.

To-do list
`make pytest-integration-full` passes locally