Validate EIA Bulk data vs original API source #1896

TrentonBush · 2022-09-01T22:48:21Z

Does the new data source cover the expected areas at the expected granularity? If it is different, is it still workable?

zaneselvans · 2022-10-18T21:04:13Z

Did this get done? Is this applicable to the current (state-fuel only) version of the aggregated bulk fuel price data? How serious is the per-row vs. total aggregate MMBTU per unit issue that you mentioned in comments on #1765?

TrentonBush · 2022-12-28T02:56:41Z

The API data has additional aggregates not present in the bulk data and has slightly different coverage. The API's advantages are likely either small or would take a large amount of additional work to exploit.

The additional aggregates are of two types: 1) finer-grained fuel type aggregates (such as breaking "petroleum liquids" into DFO, RFO, waste oil, etc.) and 2) alternative groupings (such as "all fossil fuels", "natural gas plus other gas", or "Electric power non-CHP").

The precision gained from the finer-grained fuel aggregates is small, because many of these smaller categories don't exist in the fuel receipts costs data. Only DFO, RFO, and waste coal contribute any meaningful MMBTU of fuel receipts, and even those account for only 0.9% of MMBTU combined since 2013.
The alternative groupings (like "all fossil fuels" or "natural gas plus other gas") could be useful in error checking or possibly for deducing more precise aggregates for redacted items. But that would probably be an involved process of setting up a big linear algebra system, debugging it, and managing tradeoffs between tractable solvers and noisy data.
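To make the linear algebra idea concrete, here is a minimal sketch of how overlapping aggregates could pin down a redacted component. It is a toy, not something implemented for this issue: the function name, the membership matrix, and all of the numbers are made up for illustration. It assumes the quantities being solved for are additive totals (total cost or total MMBTU), since prices are weighted averages and do not sum across categories.

```python
import numpy as np

# Hypothetical components: coal, natural gas, other gas, petroleum.
# Each row of MEMBERSHIP marks which components a published aggregate contains.
MEMBERSHIP = np.array(
    [
        [1.0, 1.0, 1.0, 1.0],  # "all fossil fuels"
        [0.0, 1.0, 1.0, 0.0],  # "natural gas plus other gas"
        [1.0, 0.0, 0.0, 0.0],  # coal, published on its own
        [0.0, 1.0, 0.0, 0.0],  # natural gas, published on its own
    ]
)
# Made-up published totals for those aggregates (e.g. total fuel cost).
TOTALS = np.array([100.0, 50.0, 30.0, 45.0])


def solve_components(membership, totals):
    """Least-squares estimate of per-component totals from aggregate totals.

    Returns the estimates plus a flag saying whether the aggregates fully
    determine every component (the membership matrix has full column rank).
    """
    estimates, _residuals, rank, _sv = np.linalg.lstsq(membership, totals, rcond=None)
    determined = bool(rank == membership.shape[1])
    return estimates, determined


estimates, determined = solve_components(MEMBERSHIP, TOTALS)
print(dict(zip(["coal", "natural_gas", "other_gas", "petroleum"], estimates.round(6))))
print("fully determined:", determined)
```

With real data the system would be far larger, the published totals would be rounded or noisy, and many components would not be fully determined, which is where the solver tradeoffs mentioned above come in.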

A few other notes:

The API offers data for Puerto Rico and the Pacific territories, though much of it lacks price information. The bulk data covers none of these places, and the fuel receipts costs table does not cover Puerto Rico or the territories either.
The API offers an additional sectoral aggregate: "Electric power non-CHP".
The API and bulk data offer the same temporal resolutions: monthly, quarterly, and annual.
When restricting the API to the same categories as the bulk data, the API has around 20% more records, but in a spot check of 2015 Q1 the additional records were entirely zeros.
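For anyone wanting to repeat the record-count spot check, a rough pandas sketch follows. It is not the code used for the check above; the column names (`state`, `fuel`, `period`, `receipts_mmbtu`, `cost`) and the toy rows are assumptions standing in for whatever the API and bulk extracts actually look like.

```python
import pandas as pd

# Hypothetical key and value columns; adjust to the real extracts.
KEYS = ["state", "fuel", "period"]
VALUE_COLS = ["receipts_mmbtu", "cost"]


def extra_api_records(api, bulk, keys=KEYS):
    """Return the API rows whose key combination never appears in the bulk data."""
    merged = api.merge(
        bulk[keys].drop_duplicates(), on=keys, how="left", indicator=True
    )
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


def all_zero(records, value_cols=VALUE_COLS):
    """True only if every value column is exactly zero in every row (NaN counts as non-zero)."""
    return bool((records[value_cols] == 0).all().all())


# Toy data standing in for one quarter of each source.
api = pd.DataFrame(
    {
        "state": ["TX", "TX", "VT"],
        "fuel": ["coal", "natural_gas", "coal"],
        "period": ["2015Q1"] * 3,
        "receipts_mmbtu": [10.0, 20.0, 0.0],
        "cost": [5.0, 8.0, 0.0],
    }
)
bulk = pd.DataFrame(
    {
        "state": ["TX", "TX"],
        "fuel": ["coal", "natural_gas"],
        "period": ["2015Q1"] * 2,
        "receipts_mmbtu": [10.0, 20.0],
        "cost": [5.0, 8.0],
    }
)

extra = extra_api_records(api, bulk)
print(len(extra), "extra API rows; all zeros:", all_zero(extra))
```

Running the same two functions over every period, rather than a single quarter, would show whether the all-zeros finding holds beyond the spot check.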

zaneselvans · 2023-04-14T20:23:11Z

We're well beyond the EIA API at this point, so this validation will not happen. Closing.

TrentonBush self-assigned this Sep 1, 2022

zaneselvans mentioned this issue Oct 18, 2022 in Estimate redacted EIA 923 fuel prices #1708 (Open)

zaneselvans closed this as completed Apr 14, 2023
