Add EIA 176 to sources.py #2258

Merged
merged 6 commits into from
Feb 8, 2023


e-belfer
Member

@e-belfer e-belfer commented Feb 1, 2023

This is a small PR and a necessary dependency for catalyst-cooperative/pudl-archiver#50, an EIA 176 archiver PR in the pudl-archiver repo. It updates sources.py to include EIA 176. A sample Zenodo upload based on this metadata can be found here. The reviewer should review both PRs concurrently.

A question for the reviewer: do raw archive additions need to go in the release notes?

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures.
  • Make sure you've included good docstrings.
  • [n/a] For major data coverage & analysis changes, run data validation tests
  • [n/a] Include unit tests for new functions and classes.
  • [n/a] Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.


@e-belfer e-belfer changed the base branch from main to dev February 1, 2023 21:36
@codecov

codecov bot commented Feb 1, 2023

Codecov Report

Base: 85.7% // Head: 85.7% // No change to project coverage 👍

Coverage data is based on head (3b8c2e3) compared to base (f37671e).
Patch has no changes to coverable lines.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #2258   +/-   ##
=====================================
  Coverage   85.7%   85.7%           
=====================================
  Files         73      73           
  Lines       8997    8997           
=====================================
  Hits        7713    7713           
  Misses      1284    1284           
Impacted Files Coverage Δ
src/pudl/metadata/sources.py 100.0% <ø> (ø)


@zaneselvans zaneselvans added metadata Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages. eia176 Issues related to the EIA Form 176 natural gas supply and disposition dataset. labels Feb 2, 2023
@zaneselvans
Member

I thought that this data was all CSV, so I was surprised to see the type listed as XLS/XLSX in this metadata. I downloaded a copy, opened it up, and found a bit of a mess.

Using the file command on an old copy of the bulk eia176 data I had lying around, I found the following:

all_company_176.csv:      CSV text
all_data_176.csv:         ASCII text, with CRLF line terminators
all_data_191.csv:         CSV text
all_data_757.csv:         Unicode text, UTF-8 (with BOM) text, with CRLF line terminators
all_other_176.csv:        ASCII text, with CRLF line terminators

But in the new data, there's a mix of file types, which in some cases do not match the filename extensions:

all_company_176.xlsx: Microsoft Excel 2007+
all_data_176.csv:     Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.1, Code page: 1252, Author:             , Last Saved By:             , Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Mar 19 16:38:19 2013, Security: 0
all_data_191.csv:     Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.1, Code page: 1252, Author:             , Last Saved By:             , Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Mar 19 16:38:19 2013, Security: 0
all_data_757.csv:     Unicode text, UTF-8 (with BOM) text, with CRLF line terminators
all_other_176.csv:    Composite Document File V2 Document, Little Endian, Os: Windows, Version 6.1, Code page: 1252, Author:             , Last Saved By:             , Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Mar 19 16:38:19 2013, Security: 0

The ones reported as "Composite Document File..." appear to actually be Excel files. That signature is the OLE2 container used by the legacy binary .xls format rather than XLSX, but if I rename the files to .xlsx then LibreOffice opens them correctly. In some cases they contain more than one worksheet.
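The same kind of check that the file command performs can be sketched in Python by looking at each file's leading bytes. This is only an illustrative sketch, not part of PUDL or the archiver: the function names `sniff_format` and `find_mismatches` are hypothetical, and only three well-known signatures are recognized (the OLE2 compound document header, the ZIP local file header used by .xlsx, and the UTF-8 byte order mark).

```python
from pathlib import Path

# Well-known leading byte signatures ("magic numbers").
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 container, e.g. legacy .xls
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header, e.g. .xlsx
UTF8_BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def sniff_format(header: bytes) -> str:
    """Classify a file from its first few bytes (hypothetical helper)."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(UTF8_BOM):
        return "text-utf8-bom"
    # No recognized signature: most likely plain text, but not guaranteed.
    return "text-or-unknown"


def find_mismatches(directory: Path) -> list[tuple[str, str]]:
    """List *.csv files whose content looks like a binary spreadsheet."""
    mismatches = []
    for path in sorted(directory.glob("*.csv")):
        with path.open("rb") as f:
            kind = sniff_format(f.read(8))
        if kind in ("ole2", "zip"):
            mismatches.append((path.name, kind))
    return mismatches
```

Running `find_mismatches(Path("some_extracted_dir"))` on an extracted copy of the archive (the directory name is a placeholder) would list any .csv files that are really spreadsheet containers. A ZIP signature only suggests XLSX; confirming it would require inspecting the archive's contents.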

I've emailed Michael.Kopalek@eia.gov (who is listed as the eia176 data contact) about this change / discrepancy.

@zaneselvans
Member

I never did get a response from EIA on this, but it looks like they did fix the problem. I just downloaded a new copy of the data and the files are all truly CSVs now.

@@ -21,6 +21,45 @@
"license_raw": LICENSES["us-govt"],
"license_pudl": LICENSES["cc-by-4.0"],
},
"eia176": {
"title": "EIA Form 176",
"path": "https://www.eia.gov/naturalgas/ngqs/all_ng_data.zip",
Member


IIRC, and judging by the other sources, the path is meant as a human-readable reference rather than the actual data URL, so I think we probably want to use https://www.eia.gov/naturalgas/ngqs/ here.

Member Author


Great point. Done & done.
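For illustration, the convention discussed in this thread (a "path" that points to a landing page rather than a downloadable file) could be checked with something like the sketch below. The `SOURCES` dict here is a trimmed-down, hypothetical stand-in for the real entry in sources.py, which has more fields, and `looks_like_download` and `direct_download_paths` are made-up helper names, not PUDL functions.

```python
from urllib.parse import urlparse

# Hypothetical, trimmed-down stand-in for the entry in sources.py.
# Only the two fields visible in the diff above are included.
SOURCES = {
    "eia176": {
        "title": "EIA Form 176",
        "path": "https://www.eia.gov/naturalgas/ngqs/",
    },
}

# File extensions that suggest a direct data download (an assumption).
DOWNLOAD_SUFFIXES = (".zip", ".csv", ".xls", ".xlsx")


def looks_like_download(url: str) -> bool:
    """Return True if the URL path ends in a data file extension."""
    return urlparse(url).path.lower().endswith(DOWNLOAD_SUFFIXES)


def direct_download_paths(sources: dict) -> list[str]:
    """Names of sources whose "path" points at a file, not a landing page."""
    return [
        name for name, meta in sources.items()
        if looks_like_download(meta["path"])
    ]
```

With the landing-page URL suggested above, `direct_download_paths(SOURCES)` returns an empty list, whereas the originally proposed `all_ng_data.zip` URL would be flagged.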

@e-belfer e-belfer merged commit 70a7aef into dev Feb 8, 2023
@e-belfer e-belfer deleted the eia176-sources branch February 8, 2023 22:42
Labels
eia176 Issues related to the EIA Form 176 natural gas supply and disposition dataset. inframundo metadata Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages.