Add EIA 176 to sources.py #2258
Codecov Report
Base: 85.7% // Head: 85.7% // No change to project coverage 👍

@@          Coverage Diff          @@
##              dev   #2258   +/-   ##
=====================================
  Coverage     85.7%   85.7%
=====================================
  Files           73      73
  Lines         8997    8997
=====================================
  Hits          7713    7713
  Misses        1284    1284
I thought that this data was all CSV, and so was surprised to see the type listed as XLS/XLSX in this metadata, so I downloaded a copy and opened it up and found a bit of a mess. Using the
But in the new data, there's a mix of file types, which in some cases do not match the filename extensions:
The ones with "Composite Document File..." appear to actually be XLSX files: if I rename them to .xlsx, LibreOffice opens them correctly. In some cases they contain more than one worksheet. I've emailed Michael.Kopalek@eia.gov (who is listed as the eia176 data contact) about this change / discrepancy.
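Since the extensions can't be trusted, one way to triage the files is to look at their leading bytes, which is roughly what the `file` utility does. A minimal sketch using only the Python standard library, under these assumptions: OLE2 compound files (what `file` reports as "Composite Document File", the container behind legacy binary Office formats such as .xls) start with the signature `D0 CF 11 E0 A1 B1 1A E1`; ZIP-based files such as .xlsx start with `PK\x03\x04`; and a genuine CSV contains no NUL bytes. The helper names `sniff_format` and `mismatched_csvs` are hypothetical, not part of PUDL.

```python
from pathlib import Path

# Leading-byte signatures for the container formats discussed above.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 "Composite Document File"
ZIP_MAGIC = b"PK\x03\x04"  # ZIP container; .xlsx workbooks are ZIP archives


def sniff_format(head: bytes) -> str:
    """Classify a file from its first bytes rather than its extension."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Heuristic: plain-text files such as CSVs should contain no NUL bytes.
    if b"\x00" in head:
        return "binary"
    return "text"


def mismatched_csvs(directory: str) -> list[tuple[str, str]]:
    """Return (filename, sniffed format) for each *.csv file that is not plain text."""
    results = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix.lower() != ".csv":
            continue
        with path.open("rb") as f:
            head = f.read(4096)  # the start of the file is enough to sniff
        kind = sniff_format(head)
        if kind != "text":
            results.append((path.name, kind))
    return results
```

Called on a directory holding the extracted archive (the directory name is up to you), this lists every `.csv` whose contents are really a spreadsheet container or some other binary format. Checking content rather than extension is what surfaces mismatches like the ones described above.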
I never did get a response from EIA on this, but it looks like they did fix the problem. I just downloaded a new copy of the data and it's all truly CSVs.
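To re-check a fresh download, each member of the archive can be inspected in place without extracting it. A minimal sketch, assuming a local copy of `all_ng_data.zip` and assuming every member should be a comma-delimited CSV whose records all have the same number of fields as the header. The `check_archive` name, the UTF-8 decoding with undecodable bytes replaced, and the equal-width rule are assumptions for illustration, not something EIA documents.

```python
import csv
import io
import zipfile

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 "Composite Document File"
ZIP_MAGIC = b"PK\x03\x04"  # ZIP container (e.g. .xlsx)


def check_archive(zip_path: str) -> dict[str, str]:
    """Map each archive member that is not a clean CSV to a short problem description."""
    problems = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # First pass: reject spreadsheet containers by their leading bytes.
            with archive.open(info) as raw:
                head = raw.read(8)
            if head.startswith(OLE2_MAGIC) or head.startswith(ZIP_MAGIC):
                problems[info.filename] = "binary spreadsheet container"
                continue
            # Second pass: parse as CSV, replacing undecodable bytes instead of failing.
            with archive.open(info) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace", newline="")
                reader = csv.reader(text)
                header = next(reader, None)
                if header is None:
                    problems[info.filename] = "empty file"
                    continue
                for record_no, row in enumerate(reader, start=2):
                    # Skip blank lines; flag records wider or narrower than the header.
                    if row and len(row) != len(header):
                        problems[info.filename] = (
                            f"record {record_no} has {len(row)} fields, header has {len(header)}"
                        )
                        break
            if not info.filename.lower().endswith(".csv"):
                problems.setdefault(info.filename, "extension is not .csv")
    return problems
```

An empty result means every member parsed as a rectangular CSV with a `.csv` extension. The width check is deliberately strict; if some EIA files legitimately have ragged rows, it would need to be relaxed.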
src/pudl/metadata/sources.py
@@ -21,6 +21,45 @@
        "license_raw": LICENSES["us-govt"],
        "license_pudl": LICENSES["cc-by-4.0"],
    },
    "eia176": {
        "title": "EIA Form 176",
        "path": "https://www.eia.gov/naturalgas/ngqs/all_ng_data.zip",
IIRC, and looking at the other sources, it seems like the path is meant as a human reference rather than the actual data URL, so I think we probably want to use https://www.eia.gov/naturalgas/ngqs/ here.
Great point. Done & done.
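If the convention is that `path` points at a human-facing landing page rather than a download, it could be checked mechanically. A hypothetical sketch: the `SOURCES` dict below contains only the two fields visible in this diff (`title`, `path`) with the revised landing-page URL, and both the `DOWNLOAD_SUFFIXES` list and the `direct_download_paths` helper are assumptions for illustration, not existing PUDL code.

```python
from urllib.parse import urlparse

# Hypothetical excerpt: only the fields shown in this PR's diff, with the revised path.
SOURCES = {
    "eia176": {
        "title": "EIA Form 176",
        "path": "https://www.eia.gov/naturalgas/ngqs/",
    },
}

# Assumed list of extensions that indicate a direct file download.
DOWNLOAD_SUFFIXES = (".zip", ".csv", ".xls", ".xlsx")


def direct_download_paths(sources: dict) -> list[str]:
    """Return source keys whose 'path' looks like a data file rather than a landing page."""
    flagged = []
    for key, meta in sources.items():
        # Compare only the URL path component, ignoring query strings and case.
        url_path = urlparse(meta["path"]).path.lower()
        if url_path.endswith(DOWNLOAD_SUFFIXES):
            flagged.append(key)
    return flagged
```

With the original value (`.../ngqs/all_ng_data.zip`) the `eia176` entry would be flagged; with the landing page it passes. Checking the parsed URL path rather than the raw string keeps query strings from hiding a file extension.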
This is a small PR, which is a necessary dependency for catalyst-cooperative/pudl-archiver#50, an EIA 176 archiver PR in the pudl-archiver repo. It updates sources.py to include EIA 176. A sample Zenodo upload based on this metadata can be found here. The reviewer should review both PRs concurrently.
A question for the reviewer: do raw archive additions need to go in the release notes?