Validate EIA Bulk data vs original API source #1896

Closed
Tracked by #1708
TrentonBush opened this issue Sep 1, 2022 · 3 comments
@TrentonBush
Member

Does the new data source cover the expected areas at the expected granularity? If it is different, is it still workable?

@TrentonBush TrentonBush self-assigned this Sep 1, 2022
@zaneselvans
Member

Did this get done? Is this applicable to the current (state-fuel only) version of the aggregated bulk fuel price data? How serious is the per-row vs. total aggregate MMBTU per unit issue that you mentioned in comments on #1765?

@TrentonBush
Member Author

The API data has additional aggregates not present in the bulk data and slightly different coverage. The advantages of the API are likely either small or would require a large amount of additional work to exploit.

The additional aggregates are of two types: 1) finer-grained fuel type aggregates (such as breaking "petroleum liquids" into DFO, RFO, waste oil, etc.) and 2) alternative groupings (such as "all fossil fuels", "natural gas plus other gas", or "Electric power non-CHP").

  1. The precision advantage of the fine-grained fuel aggregates is small, because many of these smaller categories don't exist in the fuel receipts costs data. Only DFO, RFO, and waste coal contribute any meaningful MMBTU of fuel receipts, and even they account for only 0.9% of MMBTU combined since 2013.
  2. The alternative groupings (like "all fossil fuels" or "nat gas plus other gas") could be useful for error checking or possibly for deducing more precise aggregates for redacted items. But that would probably be an involved process of setting up a large linear system, debugging it, and managing tradeoffs between tractable solvers and noisy data.
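
As a rough illustration of the second point, the simplest case of that deduction needs no solver: if an aggregate and all but one of its components are published, the redacted component is just the remainder. The sketch below repeats that substitution until nothing changes. It is not the full linear-algebra approach described above. The category names, the numbers, and the `fill_redacted` helper are all hypothetical, and it assumes each aggregate is an exact sum of its components in MMBTU, which noisy real data would likely not satisfy.

```python
def fill_redacted(values, aggregates):
    """Fill redacted (None) entries that a single sum constraint pins down.

    values: dict mapping category -> MMBTU, or None where redacted.
    aggregates: dict mapping aggregate category -> list of component categories.
    """
    filled = dict(values)
    changed = True
    while changed:
        changed = False
        for total, parts in aggregates.items():
            names = [total] + list(parts)
            unknown = [n for n in names if filled.get(n) is None]
            if len(unknown) != 1:
                continue  # already complete, or underdetermined
            missing = unknown[0]
            if missing == total:
                filled[total] = sum(filled[p] for p in parts)
            else:
                others = sum(filled[p] for p in parts if p != missing)
                filled[missing] = filled[total] - others
            changed = True
    return filled


# Hypothetical values and groupings, not real EIA data.
values = {
    "all_fossil_fuels": 100.0,
    "coal": 60.0,
    "petroleum_liquids": None,  # redacted
    "natural_gas_plus_other_gas": 38.0,
    "natural_gas": 37.0,
    "other_gas": None,  # redacted
}
aggregates = {
    "all_fossil_fuels": ["coal", "petroleum_liquids", "natural_gas_plus_other_gas"],
    "natural_gas_plus_other_gas": ["natural_gas", "other_gas"],
}
result = fill_redacted(values, aggregates)
```

Anything with two or more redacted values sharing a constraint stays `None` here; resolving those cases is where a real solver and the noise tradeoffs would come in.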

A few other notes:

  • The API offers data for Puerto Rico and Pacific territories, though much of the data lacks price information. The bulk data does not offer anything from these places. The fuel receipts costs table does not cover PR or territories.
  • The API offers an additional sectoral aggregate: "Electric power non-CHP".
  • The API and bulk data offer the same temporal resolutions: monthly, quarterly, annual.
  • When restricting the API to the same categories as the bulk data, the API has around 20% more records, but closer inspection reveals the additional data to be entirely zeros (at least for a spot check of 2015 Q1).
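
The record-count check in the last bullet amounts to an anti-join followed by a test for zeros. A minimal sketch, assuming each source has already been loaded into a dict keyed by (state, fuel, period); the keys, values, and the `extra_records` helper are made up for illustration and do not reflect the real API or bulk schemas.

```python
def extra_records(api, bulk):
    """Return records present in the API data but absent from the bulk data."""
    return {key: value for key, value in api.items() if key not in bulk}


# Hypothetical records: (state, fuel, period) -> reported value
bulk = {
    ("CO", "coal", "2015Q1"): 12.5,
    ("CO", "natural_gas", "2015Q1"): 8.0,
}
api = {
    ("CO", "coal", "2015Q1"): 12.5,
    ("CO", "natural_gas", "2015Q1"): 8.0,
    ("VT", "coal", "2015Q1"): 0.0,  # only in the API, and zero
}

extras = extra_records(api, bulk)
# True when every API-only record carries no information
all_zero = all(value == 0 for value in extras.values())
```

A spot check like this only covers the periods it is run on, so a fuller validation would loop over all periods rather than a single quarter.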

@zaneselvans
Member

We're well beyond the EIA API at this point, so this validation will not happen. Closing.
