Estimate redacted EIA 923 fuel prices #1708
Labels
data-cleaning: Tasks related to cleaning & regularizing data during ETL.
data-repair: Interpolating or extrapolating data that we don't actually have.
eia923: Anything having to do with EIA Form 923.
epic: Any issue whose primary purpose is to organize other issues into a group.
inframundo
Motivation
This project began with our desire to remove our external dependency on the EIA API (see epic #1491 and its issue #1343), but expanded into this Epic after we realized we could improve substantially on EIA's methodology.
Why impute? About a third of the fuel cost data is missing. This data can be, and has been, used by advocates to identify plants with high fuel costs (particularly coal plants) to target in early retirement campaigns. The more complete and accurate this data is, the more opportunities for such action we can support.
Issues:
Systematic Biases
47% of plants have complete data, 47% have no data, and only 6% have partial data. The data is therefore not missing at random.
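As a rough illustration of how a complete / none / partial split like the one above could be computed, here is a minimal sketch in plain Python. The record layout and the field names `plant_id_eia` and `fuel_cost_per_mmbtu` are assumptions made for this example, and the toy records are invented purely to exercise the code; they are not EIA data and do not reproduce the percentages quoted above.

```python
from collections import defaultdict


def classify_plant_completeness(records):
    """Label each plant 'complete', 'none', or 'partial' by fuel cost coverage.

    `records` is an iterable of dicts, one per fuel receipt. The keys
    `plant_id_eia` and `fuel_cost_per_mmbtu` are assumed names for this sketch.
    A missing (redacted) price is represented as None.
    """
    total = defaultdict(int)     # receipts per plant
    reported = defaultdict(int)  # receipts with a non-missing price per plant
    for rec in records:
        plant = rec["plant_id_eia"]
        total[plant] += 1
        if rec["fuel_cost_per_mmbtu"] is not None:
            reported[plant] += 1

    labels = {}
    for plant, n_total in total.items():
        n_reported = reported[plant]
        if n_reported == n_total:
            labels[plant] = "complete"
        elif n_reported == 0:
            labels[plant] = "none"
        else:
            labels[plant] = "partial"
    return labels


def completeness_shares(labels):
    """Return the fraction of plants in each completeness category."""
    categories = ("complete", "none", "partial")
    if not labels:
        return {category: 0.0 for category in categories}
    counts = defaultdict(int)
    for label in labels.values():
        counts[label] += 1
    return {category: counts[category] / len(labels) for category in categories}


# Invented toy records, for illustration only.
records = [
    {"plant_id_eia": 1, "fuel_cost_per_mmbtu": 2.1},
    {"plant_id_eia": 1, "fuel_cost_per_mmbtu": 2.3},
    {"plant_id_eia": 2, "fuel_cost_per_mmbtu": None},
    {"plant_id_eia": 2, "fuel_cost_per_mmbtu": None},
    {"plant_id_eia": 3, "fuel_cost_per_mmbtu": 1.9},
    {"plant_id_eia": 3, "fuel_cost_per_mmbtu": None},
]

labels = classify_plant_completeness(records)
shares = completeness_shares(labels)
```

If missingness were random at the receipt level, most plants with more than a handful of receipts would be expected to land in the partial category. A split dominated by the complete and none categories points to missingness that is determined at the plant level.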
Scope
A major organizational concern is not whether we should do this project but how much effort to devote to it. Improving model accuracy is a potentially endless spiral of diminishing returns. Is there an accuracy threshold we can call good enough? Is there a minimum rate of accuracy improvement per unit of time spent that we can use as a stopping point? We need to define this before getting sucked into the endless labyrinth of interesting technical problems.
Requirements
fuel_receipts_costs_eia923 table.
Phase 1: Replace EIA API w/ Bulk Data
Phase 2: Impute Missing Values