Estimate redacted EIA 923 fuel prices #1708

Open
12 of 15 tasks
TrentonBush opened this issue Jun 22, 2022 · 0 comments
Assignees
Labels
data-cleaning Tasks related to cleaning & regularizing data during ETL. data-repair Interpolating or extrapolating data that we don't actually have. eia923 Anything having to do with EIA Form 923 epic Any issue whose primary purpose is to organize other issues into a group. inframundo

TrentonBush commented Jun 22, 2022

Motivation

This project began with our desire to remove our external dependency on the EIA API (see epic #1491 and its issue #1343), but expanded into this Epic after we realized we could improve substantially on EIA's methodology.

Why impute? About a third of the fuel cost data are missing. Advocates can use, and have used, this data to identify plants with high fuel costs (particularly coal plants) to target for early retirement campaigns. The more complete and accurate the data are, the more opportunities for such action we can support.

Issues:

Systematic Biases

47% of plants have complete data, 47% have no data, and only 6% have partial data. The data are not missing at random:

  • In general IPPs (merchant generators) redact all their fuel prices, and these generators are concentrated in competitive wholesale markets, especially the Northeastern US, where there are essentially no reported fuel prices.
    • In addition, the Northeast has a unique seasonality in its natural gas prices, which would be impossible to infer by sampling data elsewhere in the country.
    • This means we have to use the aggregate data from the EIA API to accurately estimate prices nationwide.
  • Major discontinuities in data collection and processing methodology. See details in this comment.
    • The most severe (and most recent) was in 2013, when the temporal resolution changed from annual to monthly. Prior to 2013, monthly data was collected only for a sample of plants, and values for the rest were calculated via a regression model. The reporting threshold for oil- and gas-fueled plants also changed from 50 MW to 200 MW.
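The complete / none / partial split above can be reproduced with a small coverage check. This is a minimal sketch, not the analysis actually used for this issue: it assumes a deliveries table with `plant_id_eia` and `fuel_cost_per_mmbtu` columns, the helper name `classify_plant_coverage` is hypothetical, and the toy values are made up.

```python
import pandas as pd


def classify_plant_coverage(frc: pd.DataFrame) -> pd.Series:
    """Label each plant 'complete', 'none', or 'partial' by price coverage.

    Assumes (for illustration) one row per fuel delivery, with columns
    plant_id_eia and fuel_cost_per_mmbtu (null where the price is redacted).
    """
    # Fraction of each plant's deliveries that have a reported price.
    frac_reported = frc.groupby("plant_id_eia")["fuel_cost_per_mmbtu"].apply(
        lambda prices: prices.notna().mean()
    )
    coverage = pd.Series("partial", index=frac_reported.index, name="coverage")
    coverage[frac_reported == 1.0] = "complete"
    coverage[frac_reported == 0.0] = "none"
    return coverage


# Toy data for illustration only; not real EIA values.
frc = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2, 2, 3, 3],
        "fuel_cost_per_mmbtu": [2.5, 3.0, None, None, 4.0, None],
    }
)
coverage = classify_plant_coverage(frc)
# Share of plants in each coverage class.
shares = coverage.value_counts(normalize=True)
```

Grouping the resulting labels by region or by operator type would be one way to check the claim that redaction is concentrated among IPPs in the Northeast.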

Scope

A major organizational concern is not necessarily whether we should do this project but rather how much effort to devote to it. Improving model accuracy is a potentially endless spiral of diminishing returns. Is there an accuracy threshold we can call good enough? Is there a minimum accuracy improvement per unit of time invested, below which we stop? We need to define this before getting sucked into the endless labyrinth of interesting technical problems.

Requirements

  • Produce a fuel price estimate for every delivery in the fuel_receipts_costs_eia923 table.
  • Do not rely on the EIA API in the pipeline, due to reliability issues in CI / testing and user setup difficulties.
  • Estimates should be at least as accurate as the coarse aggregations that we've used historically.
  • Estimates should be consistent with the spatial and temporal variation we see in the aggregated data (e.g. the seasonal variability that's observed in the aggregated natural gas prices from the Northeastern US).
  • The model should be performant enough to run as part of our nightly builds (roughly under 10 minutes of run time).
  • We should avoid new software dependencies if possible.
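As a reference point for the "at least as accurate as the coarse aggregations" requirement, a coarse aggregate fill might look like the sketch below. It is an assumed baseline for illustration, not the historical PUDL implementation: the function name `fill_with_group_average`, the `state` and `fuel_received_mmbtu` columns, and the toy values are all hypothetical.

```python
import pandas as pd


def fill_with_group_average(
    frc: pd.DataFrame,
    group_cols: list[str],
    price_col: str = "fuel_cost_per_mmbtu",
    weight_col: str = "fuel_received_mmbtu",  # assumed column: heat content delivered
) -> pd.Series:
    """Fill missing prices with the heat-weighted average price of each group."""
    reported = frc.dropna(subset=[price_col]).copy()
    reported["_cost"] = reported[price_col] * reported[weight_col]
    sums = reported.groupby(group_cols)[["_cost", weight_col]].sum()
    group_avg = (sums["_cost"] / sums[weight_col]).rename("_group_avg").reset_index()
    # A left merge keeps one row per delivery, in the original order.
    merged = frc[group_cols].merge(group_avg, how="left", on=group_cols)
    merged.index = frc.index
    # Groups with no reported prices at all stay null.
    return frc[price_col].fillna(merged["_group_avg"])


# Toy data for illustration only; not real EIA values.
deliveries = pd.DataFrame(
    {
        "state": ["NY", "NY", "NY", "TX", "TX"],
        "energy_source_code": ["NG", "NG", "NG", "NG", "NG"],
        "fuel_cost_per_mmbtu": [4.0, 6.0, None, 2.0, None],
        "fuel_received_mmbtu": [100.0, 300.0, 50.0, 10.0, 20.0],
    }
)
filled = fill_with_group_average(deliveries, ["state", "energy_source_code"])
```

Masking a subset of reported prices, filling them this way, and comparing the result against the true values gives an error figure that any imputation model would need to beat.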

Phase 1: Replace EIA API w/ Bulk Data

  1. data-repair eia923
    TrentonBush
  2. data-repair eia923
  3. data-repair eia923
    zaneselvans
  4. data-repair eia923
    TrentonBush
  5. data-repair eia923
    zaneselvans
  6. eia860
    zaneselvans
  7. analysis data-repair eia923
    zaneselvans
  8. eia923 new-data
    TrentonBush
  9. datastore eia923 new-data zenodo
    zaneselvans
  10. eia923 new-data
    TrentonBush

Phase 2: Impute Missing Values

  1. analysis data-repair eia923 inframundo
    TrentonBush
  2. TrentonBush
  3. 2 of 5
    analysis data-repair eia923
    TrentonBush
  4. analysis data-repair eia923 inframundo
    TrentonBush
  5. analysis data-repair eia923 inframundo
    TrentonBush
@TrentonBush TrentonBush added epic Any issue whose primary purpose is to organize other issues into a group. data-cleaning Tasks related to cleaning & regularizing data during ETL. data-repair Interpolating or extrapolating data that we don't actually have. eia923 Anything having to do with EIA Form 923 labels Jun 22, 2022
@zaneselvans zaneselvans changed the title EIA 923 Fuel Price Imputation Estimate redacted EIA 923 fuel prices Jul 18, 2022
@zaneselvans zaneselvans added this to the 2023Q2 milestone Apr 4, 2023
Projects
Status: Backlog

3 participants