Add data and other to EIA 176 raw data extraction #3264

Closed
wants to merge 3 commits into from

Conversation

e-belfer
Member

@e-belfer e-belfer commented Jan 19, 2024

Overview

This PR builds on #3227 and adds the EIA 176 "data" and "other" tables to the extraction process designed by @davidmudrauskas. It turns the extractor function into a Dagster asset factory so that a Dagster asset can be created for each table with minimal code.

What problem does this address?

Extracts raw EIA 176 "data" and "other" tables.

What did you change?

Converted the extractor in pudl.extract.eia176 to a factory, and added the new tables to table_file_map.csv. Updated the unit test to handle multiple files rather than just one.
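The factory idea described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in this PR: the filenames in TABLE_FILE_MAP and the name raw_eia176_extractor_factory are assumptions, and the functions it returns are plain callables standing in for Dagster assets so that the sketch runs without Dagster installed.

```python
import csv
import io
import zipfile
from typing import Callable

# Hypothetical mapping from table name to the CSV file inside the zip archive.
# The real table_file_map.csv in PUDL may use different names and columns.
TABLE_FILE_MAP = {
    "company": "all_company_176.csv",
    "data": "all_data_176.csv",
    "other": "all_other_176.csv",
}

Extractor = Callable[[zipfile.ZipFile], list[dict[str, str]]]


def raw_eia176_extractor_factory(table: str, filename: str) -> Extractor:
    """Build one extractor per table (a stand-in for one Dagster asset per table)."""

    def extract(archive: zipfile.ZipFile) -> list[dict[str, str]]:
        # Read this table's CSV out of the archive as a list of row dicts.
        with archive.open(filename) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))

    # Name each function after its table, following the raw_eia176__<table> idiom.
    extract.__name__ = f"raw_eia176__{table}"
    return extract


# One extractor per row of the map, so adding a table means adding a map entry.
raw_eia176_extractors = [
    raw_eia176_extractor_factory(table, filename)
    for table, filename in TABLE_FILE_MAP.items()
]
```

In a Dagster setting, the inner function would presumably be wrapped with the asset decorator and given its name there; the closure over `filename` is the part this sketch is meant to show.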

Testing

How did you make sure this worked? How can a reviewer verify this?

Generate the raw assets in dagster and open them to review.

Remaining questions

  • EIA 176 zipfiles also bundle a few other forms: 191 and 757. Where do we want to extract and process these datasets? As separate modules, or as part of the EIA 176 extraction? If they live in this module, we'll need to adapt table_file_map.csv to account for the fact that there are multiple data CSV files, one per form.
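One way the map could account for one data file per form is to key it by form as well as by table. The sketch below is only an illustration of that option: the column names, form IDs, and filenames are all hypothetical and are not taken from PUDL's actual table_file_map.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout with an extra "form" column; every value here is illustrative.
TABLE_FILE_MAP_CSV = """\
form,table,filename
eia176,data,all_data_176.csv
eia176,other,all_other_176.csv
eia191,data,all_data_191.csv
eia757,data,all_data_757.csv
"""


def load_table_file_map(text: str) -> dict[str, dict[str, str]]:
    """Return {form: {table: filename}} so each form can have its own data file."""
    mapping: dict[str, dict[str, str]] = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        mapping[row["form"]][row["table"]] = row["filename"]
    return dict(mapping)
```

With this shape, a lookup such as `load_table_file_map(TABLE_FILE_MAP_CSV)["eia191"]["data"]` resolves the "data" table separately for each form, which avoids the collision between same-named tables.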

To-do list

  1. Make sure full ETL runs & make pytest-integration-full passes locally
  2. If updating analyses or data processing functions: make sure to update or write data validation tests
  3. Update the release notes: reference the PR and related issues.
  4. Review the PR yourself and call out any questions or issues you have

@e-belfer e-belfer self-assigned this Jan 19, 2024
@e-belfer e-belfer added eia176 Issues related to the EIA Form 176 natural gas supply and disposition dataset. new-data Requests for integration of new data. labels Jan 19, 2024
@e-belfer e-belfer linked an issue Jan 19, 2024 that may be closed by this pull request
@zaneselvans
Member

zaneselvans commented Jan 19, 2024

Responding to this question in #2603 since I think it's more broadly related to our integration of EIA's natural gas data:

EIA 176 zipfiles also bundle a few other forms - 191 and 757. Where do we want to extract and process these datasets? As separate modules, or as part of the EIA 176 extraction?

See this comment

@asset(required_resource_keys={"datastore"})
def raw_eia176__company(context):
"""Extract raw EIA company data from CSV sheets into dataframes.
def eia_176_asset_factory(
Member


I think we should stick with the dataset ID eia176 and never introduce an _ in here. It's nice to be able to search for these IDs reliably and use the same idioms for all the datasets like eia860, eia861, eia923 etc.

Member Author


You mean in the function names? Simple enough fix, that makes sense.

Member


Function names, variable names, tables, classes, anywhere we're naming something to refer to the dataset.
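A small check along these lines could flag names that break the convention. It is only a sketch: the helper names are hypothetical and nothing like it is part of this PR.

```python
import re

# Matches "eia_176"-style IDs, where an underscore separates "eia" from the form number.
UNDERSCORED_ID = re.compile(r"eia_(\d+)")


def find_underscored_dataset_ids(names):
    """Return the names that contain an underscored dataset ID such as eia_176."""
    return [name for name in names if UNDERSCORED_ID.search(name)]


def normalize_dataset_id(name):
    """Rewrite eia_176 to eia176 wherever it occurs in a name."""
    return UNDERSCORED_ID.sub(r"eia\1", name)
```

Applied to the two names in the diff above, `eia_176_asset_factory` would be flagged and rewritten to `eia176_asset_factory`, while `raw_eia176__company` would pass unchanged.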

Member


Not sure what we'll want to do about the mish-mash of different forms that were hiding inside this zipfile.

@e-belfer
Member Author

e-belfer commented Feb 2, 2024

Closing this PR so that the changes to the underlying data format, and the extraction code changes they require, can be addressed in a separate issue.

@e-belfer e-belfer closed this Feb 2, 2024
@zaneselvans
Member

@e-belfer can we delete the eia176_extraction branch?

@e-belfer e-belfer deleted the eia176_extraction branch February 2, 2024 19:47
Labels
eia176 Issues related to the EIA Form 176 natural gas supply and disposition dataset. new-data Requests for integration of new data.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Integrate EIA 176, 191 and 757A into PUDL
2 participants