Brainstorm Fuel Price Imputation Experiments #1710

Closed
Tracked by #1708
TrentonBush opened this issue Jun 22, 2022 · 2 comments
Labels
data-repair Interpolating or extrapolating data that we don't actually have. eia923 Anything having to do with EIA Form 923

Comments

@TrentonBush
Member

Come up with a few research directions and take wild guesses about which might be most fruitful.

Some ideas already floated:

  • modified target variables
    • does it make more sense to estimate the price per unit or the price per MMBTU?
    • combine records into energy-weighted plant-wise timeseries?
  • How should we identify outlier fuel prices that should be replaced? Some are outrageous.
    • use model residuals in an iterative process
    • analyze coarser distributions, e.g. prices grouped by fuel and month
    • use robust metrics, or is variance stabilization sufficient?
  • feature engineering
    • time based features
    • mine info
    • pipeline info
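
One way to make the "robust metrics on fuel-month distributions" idea concrete is a modified z-score: compare each log price to the median of its fuel-month group and measure the distance in median absolute deviations (MAD). This is only a sketch under stated assumptions, not a settled method. The record keys (`fuel`, `month`, `price_per_mmbtu`), the sample prices, and the 3.5 cutoff (a commonly used default for modified z-scores) are all illustrative, and every record is assumed to have a reported price.

```python
import math
from collections import defaultdict
from statistics import median


def flag_price_outliers(records, threshold=3.5):
    """Return one bool per record: True if the price looks like an outlier.

    Prices are compared on a log scale to the median of their (fuel, month)
    group, with distance measured in median absolute deviations (MAD).
    """
    # Collect log prices per (fuel, month); non-positive prices cannot be logged.
    log_prices = defaultdict(list)
    for rec in records:
        if rec["price_per_mmbtu"] > 0:
            key = (rec["fuel"], rec["month"])
            log_prices[key].append(math.log(rec["price_per_mmbtu"]))

    # Robust center and spread for each group.
    group_stats = {}
    for key, values in log_prices.items():
        center = median(values)
        group_stats[key] = (center, median(abs(v - center) for v in values))

    flags = []
    for rec in records:
        price = rec["price_per_mmbtu"]
        if price <= 0:
            flags.append(True)  # zero or negative prices are treated as bad data
            continue
        center, mad = group_stats[(rec["fuel"], rec["month"])]
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        # for normally distributed data; a zero MAD gives no basis to flag.
        score = 0.6745 * abs(math.log(price) - center) / mad if mad > 0 else 0.0
        flags.append(score > threshold)
    return flags


# Made-up gas deliveries for one month; the last price is a likely unit error.
deliveries = [
    {"fuel": "gas", "month": "2020-01", "price_per_mmbtu": p}
    for p in (2.0, 2.1, 1.9, 2.05, 200.0)
]
flags = flag_price_outliers(deliveries)
```

Taking logs first is a simple form of variance stabilization, so this combines both of the options in the last sub-bullet rather than choosing between them.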
@zaneselvans
Member

Feature Engineering

More Numerical Features

  • Days until contract expiration
  • Distance from plant lat/lon to the centroid of the coalmine county if we've got a FIPS ID (further => more expensive?)
  • Coal sulfur or ash content (dirtier => cheaper?)
  • Total heat content of the delivery (it seems like smaller shipments are more likely to be expensive?)
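
The plant-to-mine distance feature could be computed with the haversine formula, roughly as sketched below. The centroid lookup table, its key, and its coordinates are placeholders rather than real data, and great-circle distance is only a proxy for the actual rail, barge, or truck route.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mine_distance_km(plant_lat, plant_lon, county_fips, centroids):
    """Distance from a plant to the centroid of the mine's county.

    Returns None when the FIPS ID is missing or not in the lookup table, so
    the model sees an explicit missing value instead of a made-up distance.
    """
    centroid = centroids.get(county_fips)
    if centroid is None:
        return None
    return haversine_km(plant_lat, plant_lon, *centroid)


# Placeholder lookup: county FIPS ID -> (lat, lon) of the county centroid.
county_centroids = {"99999": (38.0, -82.0)}
distance = mine_distance_km(39.0, -82.0, "99999", county_centroids)
missing = mine_distance_km(39.0, -82.0, None, county_centroids)
```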

More Categorical Features

  • contract type (spot more expensive than contract?)
  • rto_iso or balancing_authority IDs (different markets have different prices?)
  • Coalmine state
  • Coalmine county FIPS ID
  • Coalmine name (messy freeform strings)
  • MSHA Mine ID (unique and clean, but often missing)
  • Name of fuel supplier (messy freeform strings)
  • Fuel transportation mode (pipeline, slurry, rail, truck, barge, conveyor...)
  • Mine type (surface vs. underground)

Time based issues:

  • Fuel prices / costs are reported in nominal dollars. Should we adjust for inflation?
  • The strong seasonal signal in natural gas prices can be captured by a monthly categorical feature.
  • There was a secular reduction in gas prices as the fracking boom took off, but generally fossil fuel prices are pretty random.
  • The number of days elapsed since the beginning of the time series seems like it would let the decision tree chunk the timeline into pieces.
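
The three time-based ideas above might be turned into features along these lines. The function name, the output keys, and the price index values are all made up for illustration; a real version would need an actual index series such as CPI.

```python
from datetime import date


def time_features(report_date, series_start, price_index, base_year):
    """Build simple time features for one fuel delivery record.

    price_index maps year -> index level (placeholder values in the example).
    """
    return {
        # Monotonic counter a tree can split on to chunk the timeline.
        "elapsed_days": (report_date - series_start).days,
        # Month of year, to be treated as a categorical for seasonality.
        "report_month": report_date.month,
        # Multiply a nominal price by this to express it in base_year dollars.
        "deflator": price_index[base_year] / price_index[report_date.year],
    }


placeholder_index = {2008: 100.0, 2010: 110.0}  # NOT real index values
features = time_features(date(2010, 3, 1), date(2008, 1, 1), placeholder_index, base_year=2008)
real_price = 5.5 * features["deflator"]  # nominal 5.5 expressed in 2008 dollars
```

Whether deflating helps is an open question: a tree with access to `elapsed_days` may be able to absorb a smooth inflation trend without it.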

Spatial correlations

  • We know the lat/lon of almost all the plants. Rather than looking at jurisdictional proximity (in the same state or census region, or in adjacent states) we could calculate an average fuel price within a given distance of each plant for each fuel in each month.
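
A brute-force sketch of the distance-based average follows. The record keys, the 200 km radius, and the sample values are assumptions for illustration. The target plant's own deliveries are excluded so that the feature does not leak the price it is meant to help predict.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def neighborhood_mean_price(target, records, radius_km=200.0):
    """Mean price of same-fuel, same-month deliveries to other nearby plants.

    Returns None if no other plant within radius_km reported a price.
    """
    prices = [
        r["price_per_mmbtu"]
        for r in records
        if r["plant_id"] != target["plant_id"]
        and r["fuel"] == target["fuel"]
        and r["month"] == target["month"]
        and r["price_per_mmbtu"] is not None
        and haversine_km(target["lat"], target["lon"], r["lat"], r["lon"]) <= radius_km
    ]
    return sum(prices) / len(prices) if prices else None


# Made-up records: plant 1 has a missing price, plants 2 and 3 are nearby,
# plant 4 is about 1100 km away, and plant 5 burns a different fuel.
records = [
    {"plant_id": 1, "fuel": "gas", "month": "2020-01", "lat": 40.0, "lon": -100.0, "price_per_mmbtu": None},
    {"plant_id": 2, "fuel": "gas", "month": "2020-01", "lat": 40.5, "lon": -100.0, "price_per_mmbtu": 3.0},
    {"plant_id": 3, "fuel": "gas", "month": "2020-01", "lat": 41.0, "lon": -100.0, "price_per_mmbtu": 5.0},
    {"plant_id": 4, "fuel": "gas", "month": "2020-01", "lat": 50.0, "lon": -100.0, "price_per_mmbtu": 100.0},
    {"plant_id": 5, "fuel": "coal", "month": "2020-01", "lat": 40.1, "lon": -100.0, "price_per_mmbtu": 2.0},
]
nearby_price = neighborhood_mean_price(records[0], records)
```

This scans every record for every target, which is quadratic in the number of records. On the full dataset a spatial index would likely be needed.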

Target variables

  • Fuel price per unit will have a much higher variance than per MMBTU (especially for coal) since different tons contain different amounts of heat.
  • The value of the fuel is overwhelmingly the heat content, so it seems like that's the variable that the market would be optimizing around primarily.
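
To illustrate the per-unit versus per-MMBTU point, and the energy-weighted aggregation floated earlier in the thread, here is a small sketch. The field names and all numbers are invented for the example.

```python
def price_per_mmbtu(price_per_unit, mmbtu_per_unit):
    """Convert a price per physical unit (ton, mcf, barrel) to a price per MMBTU."""
    return price_per_unit / mmbtu_per_unit


def energy_weighted_price(deliveries):
    """Average price per MMBTU across deliveries, weighted by heat delivered."""
    total_mmbtu = sum(d["units"] * d["mmbtu_per_unit"] for d in deliveries)
    total_cost = sum(
        d["units"] * d["mmbtu_per_unit"] * d["price_per_mmbtu"] for d in deliveries
    )
    return total_cost / total_mmbtu


# Two made-up coal deliveries with very different per-ton prices
# that turn out to be identical per unit of heat.
low_heat = price_per_mmbtu(price_per_unit=34.0, mmbtu_per_unit=17.0)
high_heat = price_per_mmbtu(price_per_unit=50.0, mmbtu_per_unit=25.0)

# Collapsing one plant-month of deliveries into a single price.
plant_month = [
    {"units": 100.0, "mmbtu_per_unit": 20.0, "price_per_mmbtu": 2.0},
    {"units": 50.0, "mmbtu_per_unit": 20.0, "price_per_mmbtu": 5.0},
]
monthly_price = energy_weighted_price(plant_month)
```

Weighting by heat content keeps a small, expensive shipment from dominating the plant-month price the way it would in an unweighted mean.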

@zaneselvans added the eia923 and data-repair labels on Jun 23, 2022
@zaneselvans
Member

See #1766 for continuation of this work.
