Update CsvExtractor to handle new 176/191/757A data format #3339

Closed
Tracked by #2603
e-belfer opened this issue Feb 2, 2024 · 0 comments · Fixed by #3402
Labels
community eia176 Issues related to the EIA Form 176 natural gas supply and disposition dataset.

e-belfer commented Feb 2, 2024

EIA 176 data was originally pulled from the bundled data, after investigation of the bulk API data showed that it was not suitable. In #2603 (comment) we discussed updating the data source to query NGQV directly. This has now been implemented in catalyst-cooperative/pudl-archiver#267, which produces three new archives (176, 191, and 757A), each with one zipped CSV per year. The advantage of this format is that we:

  • don't have to hand-map line numbers to variables
  • can easily extract a subset of years, as we do with other EIA data

However, the CsvExtractor written by @davidmudrauskas will need a small update to handle this new data format. The extractor should take a subset of years from the ETL settings, passed to it through dataset_settings, and extract and concatenate the relevant zipped CSVs into one dagster asset. There should be three extractors with essentially identical implementations, one per archive: 176, 191, and 757A.
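
For illustration only, here is a rough sketch of the behaviour described above: select a subset of years, open one zipped CSV per year, and concatenate the rows. It is not the existing CsvExtractor. Every name in it (`YearlyCsvSettings`, `YearlyZippedCsvExtractor`, `make_zip`, the CSV file names and columns) is hypothetical, the archives are in-memory bytes rather than datastore resources, and it uses only the standard library where the real extractor would presumably return a pandas DataFrame wrapped in a dagster asset.

```python
import csv
import io
import zipfile
from dataclasses import dataclass


@dataclass(frozen=True)
class YearlyCsvSettings:
    """Hypothetical stand-in for the per-dataset ETL settings."""

    years: tuple[int, ...]


class YearlyZippedCsvExtractor:
    """Hypothetical extractor for archives holding one zipped CSV per year."""

    dataset = ""  # set by each subclass

    def __init__(self, archives: dict[int, bytes]):
        # Maps year -> raw bytes of that year's zip file. A real extractor
        # would fetch these from the datastore instead.
        self.archives = archives

    def extract(self, settings: YearlyCsvSettings) -> list[dict[str, str]]:
        """Read the requested years in ascending order and concatenate rows."""
        rows: list[dict[str, str]] = []
        for year in sorted(settings.years):
            if year not in self.archives:
                raise KeyError(f"{self.dataset}: no archive for year {year}")
            with zipfile.ZipFile(io.BytesIO(self.archives[year])) as zf:
                for name in zf.namelist():
                    if not name.lower().endswith(".csv"):
                        continue  # ignore any non-CSV members
                    with zf.open(name) as raw:
                        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                        rows.extend(csv.DictReader(text))
        return rows


# Three extractors that share one implementation and differ only by name.
class Eia176Extractor(YearlyZippedCsvExtractor):
    dataset = "eia176"


class Eia191Extractor(YearlyZippedCsvExtractor):
    dataset = "eia191"


class Eia757aExtractor(YearlyZippedCsvExtractor):
    dataset = "eia757a"


def make_zip(filename: str, csv_text: str) -> bytes:
    """Build an in-memory zip with a single CSV member (demo helper only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(filename, csv_text)
    return buf.getvalue()


# Made-up data: three yearly archives, of which only two are requested.
archives = {
    2020: make_zip("eia176_2020.csv", "company,value\nA,1\n"),
    2021: make_zip("eia176_2021.csv", "company,value\nB,2\n"),
    2022: make_zip("eia176_2022.csv", "company,value\nC,3\n"),
}
extractor = Eia176Extractor(archives)
rows = extractor.extract(YearlyCsvSettings(years=(2022, 2021)))
```

Keeping the shared logic in one base class means the 176, 191, and 757A extractors only need to declare which dataset they read, which matches the "essentially identical implementations" goal.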

e-belfer added the eia176 label Feb 2, 2024
e-belfer linked a pull request Feb 20, 2024 that will close this issue