
Sanity check EIA fuel price predictions visually #1720

Closed · 3 of 4 tasks · Tracked by #1708
zaneselvans opened this issue Jun 27, 2022 · 12 comments
Labels: analysis (Data analysis tasks that involve actually using PUDL to figure things out, like calculating MCOE.) · data-repair (Interpolating or extrapolating data that we don't actually have.) · eia923 (Anything having to do with EIA Form 923)

zaneselvans (Member) commented Jun 27, 2022:

The model error metric gives an extremely condensed (zero-dimensional) view of whether we're predicting fuel prices correctly. Develop some additional visualizations that let us see whether things look right, or at least good enough (a starter plotting sketch follows the lists below).

Compare by state & fuel type:

  • Reported data
  • HistGBR predictions of reported data
  • Previously implemented groupby-weighted-median aggregations
  • Aggregated fuel prices obtained from the EIA API

Visualizations

  • 1-D histograms
  • 2-D histograms (to show dispersion between models)
  • Break down error metrics by fuel
  • Monthly time series
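
A minimal sketch of the 1-D histogram comparison, assuming a dataframe `frc` with hypothetical column names `fuel_cost_per_mmbtu` (reported), `fuel_cost_per_mmbtu_predicted` (HistGBR), `fuel_cost_per_mmbtu_wm` (weighted median), `state`, and `fuel_group_eiaepm`; this is an illustration, not the actual notebook code:

```python
import matplotlib.pyplot as plt

def compare_price_distributions(frc, state, fuel, max_price=20.0, bins=50):
    """Overlay reported vs. modeled fuel price histograms for one state & fuel."""
    df = frc.loc[(frc.state == state) & (frc.fuel_group_eiaepm == fuel)]
    fig, ax = plt.subplots()
    for col, label in [
        ("fuel_cost_per_mmbtu", "reported"),
        ("fuel_cost_per_mmbtu_predicted", "HistGBR"),
        ("fuel_cost_per_mmbtu_wm", "weighted median"),
    ]:
        # Step histograms make it easy to see where the distributions diverge.
        ax.hist(df[col].dropna(), bins=bins, range=(0, max_price),
                histtype="step", label=label)
    ax.set_xlabel("fuel cost [$/MMBTU]")
    ax.set_ylabel("count")
    ax.set_title(f"{state} / {fuel}")
    ax.legend()
    return fig
```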
zaneselvans added the eia923, analysis, and data-repair labels and self-assigned this on Jun 27, 2022
zaneselvans (Member, Author) commented:

For states with high fuel prices, the model often seems to be influenced pretty strongly by other markets with lower prices, and ends up systematically off. It also predicts high-price excursions, even in states where they don't seem to happen:

[two images: per-state fuel price distributions, reported vs. predicted]
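
One way to quantify that "systematically off" pattern is an MMBTU-weighted mean prediction error per state; a sketch, with the same assumed column names as above:

```python
import numpy as np

def weighted_bias_by_state(frc, fuel="natural_gas"):
    """MMBTU-weighted mean (predicted - reported) price error per state."""
    df = frc.loc[
        (frc.fuel_group_eiaepm == fuel)
        & frc.fuel_cost_per_mmbtu.notna()
        & frc.fuel_cost_per_mmbtu_predicted.notna()
        & frc.fuel_received_mmbtu.notna()
    ].copy()
    df["err"] = df.fuel_cost_per_mmbtu_predicted - df.fuel_cost_per_mmbtu
    # Negative values mean the model is biased low in that state.
    return (
        df.groupby("state")
        .apply(lambda g: np.average(g["err"], weights=g["fuel_received_mmbtu"]))
        .sort_values()
    )
```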

zaneselvans (Member, Author) commented:

Looking at some 2-d histograms, even when the price distribution looks pretty similar overall, there's not necessarily great 1-to-1 correlation between individual reported and predicted prices.

[image: 2-D histogram of individual reported vs. predicted prices]
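
The 2-D histograms here can be reproduced with something like this sketch (column names again assumed; weighting by MMBTU keeps big deliveries from being drowned out):

```python
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

def reported_vs_predicted(frc, max_price=20.0, bins=100):
    """MMBTU-weighted 2-D histogram of reported vs. predicted prices."""
    df = frc.dropna(subset=[
        "fuel_cost_per_mmbtu", "fuel_cost_per_mmbtu_predicted", "fuel_received_mmbtu",
    ])
    fig, ax = plt.subplots()
    ax.hist2d(
        df["fuel_cost_per_mmbtu"],
        df["fuel_cost_per_mmbtu_predicted"],
        bins=bins,
        range=[(0, max_price), (0, max_price)],
        weights=df["fuel_received_mmbtu"],
        norm=LogNorm(),  # log color scale, since bin densities span orders of magnitude
    )
    ax.plot([0, max_price], [0, max_price], ls="--", color="w", lw=1)  # 1:1 reference
    ax.set_xlabel("reported fuel cost [$/MMBTU]")
    ax.set_ylabel("predicted fuel cost [$/MMBTU]")
    return fig
```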

zaneselvans (Member, Author) commented:

I looked at all of the per-state natural_gas price distributions and saw that the model predictions were biased low across the board. Some of them had the right range, many were low, but they were basically never high, which seemed weird.

But then I looked at the coal prices, and of course there the opposite was true -- all the predictions are biased high. How can we make the model differentiate more strongly between the fuel types? Would it make sense to train separate models for each fuel type? (See the sketch below.) The long tail of high price excursions, which I think comes entirely from gas and petroleum, also shows up in the coal price distributions.

[four images: per-state natural gas and coal price distributions]
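
A minimal sketch of that "one model per fuel type" idea, fitting an independent HistGradientBoostingRegressor per fuel group so coal pricing can't leak into gas or petroleum predictions. It assumes the features are already numeric or encoded, and the column names follow the rest of this thread; it is just the idea, not what's in the notebook:

```python
from sklearn.ensemble import HistGradientBoostingRegressor

def fit_per_fuel_models(frc, feature_cols):
    """Fit one HistGBR per fuel_group_eiaepm, weighted by heat content delivered."""
    models = {}
    for fuel, group in frc.groupby("fuel_group_eiaepm", observed=True):
        train = group.dropna(
            subset=feature_cols + ["fuel_cost_per_mmbtu", "fuel_received_mmbtu"]
        )
        model = HistGradientBoostingRegressor(loss="absolute_error")
        model.fit(
            train[feature_cols],
            train["fuel_cost_per_mmbtu"],
            sample_weight=train["fuel_received_mmbtu"],
        )
        models[fuel] = model
    return models
```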

zaneselvans (Member, Author) commented Jun 27, 2022:

One thing that seems to be going on is that it predicts coal- and natural-gas-like price distributions for petroleum, and seems to bring petroleum-like high-price outliers into the coal+gas distributions. There are way fewer records / MMBTU representing petroleum, so it'll be hard for the model to integrate petroleum-specific pricing information unless it knows it only applies to the other petroleum records.

[three images: petroleum, coal, and gas price distributions]

zaneselvans (Member, Author) commented:

It seems to have this same issue with or without the weighting by received MMBTU, and it persists even when I narrow the features down to just fuel type, state, report month, and elapsed days.

I feel like I must be doing something wrong. Isn't the idea with this kind of regression that it can identify different regimes in one variable (like fuel type) that indicate different ranges of desirable predictions?

Would it help to have an anchoring value that can be modulated by other variables? E.g. a state/regional/national average price for the type of fuel each record contains?
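
A sketch of that anchoring idea: attach a state/fuel/month median price to every record as a feature the trees can then modulate. Column names are assumptions consistent with this thread, and in real use the median would need to come from training data only, to avoid leaking the answer:

```python
def add_anchor_price(frc):
    """Add a per-(state, fuel, month) median price column as a model feature."""
    frc = frc.copy()
    frc["state_fuel_month_median_price"] = (
        frc.groupby(["state", "fuel_group_eiaepm", "report_month"], observed=True)
        ["fuel_cost_per_mmbtu"]
        .transform("median")  # NB: includes each record's own price; split train/test first in practice
    )
    return frc
```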

zaneselvans (Member, Author) commented:

Restricting the model to just NG records (with very few features), the predictions get a bit less blobby and more diagonal:

[image: 2-D histogram for the NG-only model]

TrentonBush (Member) commented:

That model certainly looks bad! What's your model score?

Also, I'm confused by the y-axis on these histograms. The dataset only has 500k points, but the y-axes are scaled from 1e6 to 1e9? I tried to reproduce the Arkansas coal histogram but get something quite different (with whatever model I had in a notebook from last week):
[image: Arkansas coal histogram reproduction]

And the correlation is much more linear:
[image: reported vs. predicted scatter plot]

zaneselvans (Member, Author) commented Jun 27, 2022:

The numbers are larger than the number of samples because they're weighted by MMBTU. It seems crazy to me that a unit train (~10,000 tons of coal) would get the same prominence as 1 mcf of natural gas or 1 bbl of diesel fuel oil, which is what the unweighted distribution would show, right?
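
For concreteness, that weighting is just histogram weights, so the y-axis measures MMBTU delivered rather than record count (a sketch, assuming a dataframe `frc` as elsewhere in this thread):

```python
import numpy as np

mask = frc["fuel_cost_per_mmbtu"].notna() & frc["fuel_received_mmbtu"].notna()
counts, edges = np.histogram(
    frc.loc[mask, "fuel_cost_per_mmbtu"],
    bins=50,
    range=(0, 20),
    weights=frc.loc[mask, "fuel_received_mmbtu"],  # y-axis is MMBTU, not deliveries
)
```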

Your scatter plot looks way way better! Let me re-run with a single model in GridSearchCV and see what the score looks like now. I think the last one I saw was -0.48. What kind of scores are you getting?

TrentonBush (Member) commented:

The one I have from last week uses these features:

```python
["fuel_group_eiaepm", "state", "report_month", "plant_id_eia", "elapsed_days", "fuel_mmbtu_per_unit"]
```

and these hyperparams:

```python
param_grid = {
    "hist_gbr__max_depth": [7],
    "hist_gbr__max_leaf_nodes": [2**7],
    "hist_gbr__learning_rate": [0.1],
    "hist_gbr__min_samples_leaf": [25],
}
```

to get a neg_median_absolute_error of -0.4155
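
For anyone reproducing this: the "hist_gbr__" prefixes imply an sklearn Pipeline with a step named hist_gbr. A sketch of what such a setup might look like (the ordinal encoding of the categoricals is an assumption, not necessarily what the actual notebook does):

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

categorical = ["fuel_group_eiaepm", "state"]
numeric = ["report_month", "plant_id_eia", "elapsed_days", "fuel_mmbtu_per_unit"]

pipe = Pipeline([
    # Trees don't need scaling, but the categorical columns must become numeric codes.
    ("encode", ColumnTransformer(
        [("cats",
          OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
          categorical)],
        remainder="passthrough",
    )),
    ("hist_gbr", HistGradientBoostingRegressor()),
])

search = GridSearchCV(pipe, param_grid, scoring="neg_median_absolute_error", cv=5)
# search.fit(frc[categorical + numeric], frc["fuel_cost_per_mmbtu"])
```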

zaneselvans (Member, Author) commented:

Well it turns out my problem was a mixup between pandas and numpy indexing, and when the indexes are aligned, everything is fine. 🙄

[seven images: corrected per-state price distributions and reported vs. predicted correlation plots]

zaneselvans (Member, Author) commented Jun 28, 2022:

I was curious how the quality of predictions varies across fuels, and it looks like we do well on all the major ones: all types of coal and natural gas, plus DFO/RFO. Not so great on petcoke. But overall this seems great!

```python
import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

def fuel_price_wmape(df, predict_col):
    """MMBTU-weighted mean absolute percentage error for one group of records."""
    return mean_absolute_percentage_error(
        df["fuel_cost_per_mmbtu"],
        df[predict_col],
        sample_weight=df["fuel_received_mmbtu"],
    )

def err_by_fuel(frc):
    """Compare weighted-median vs. HistGBR error, grouped by fuel code and fuel group."""
    fuel_cols = ["energy_source_code", "fuel_group_eiaepm"]
    out_df = pd.DataFrame()
    for col in fuel_cols:
        valid_rows = (
            frc["fuel_cost_per_mmbtu"].notna()
            & frc["fuel_received_mmbtu"].notna()
            & frc[col].notna()
        )
        gb = frc.loc[valid_rows].groupby(col, observed=True)
        wm_err = gb.apply(fuel_price_wmape, predict_col="fuel_cost_per_mmbtu_wm").to_frame(name="wm_wmape")
        hgbr_err = gb.apply(fuel_price_wmape, predict_col="fuel_cost_per_mmbtu_predicted").to_frame(name="hgbr_wmape")
        total_mmbtu = gb["fuel_received_mmbtu"].sum().to_frame(name="total_mmbtu")
        out_df = pd.concat([out_df, pd.concat([wm_err, hgbr_err, total_mmbtu], axis="columns")])
    return out_df.sort_values("total_mmbtu", ascending=False)

err_by_fuel(frc_predicted)
```
```
                 wm_wmape     hgbr_wmape   total_mmbtu
coal             0.129684     0.0563751    1.45131e+11
SUB              0.122974     0.0533036    7.05948e+10
BIT              0.135227     0.0596738    6.90624e+10
NG               0.0940054    0.0999395    5.65777e+10
natural_gas      0.0940054    0.0999395    5.65777e+10
LIG              0.146808     0.0535479    5.39209e+09
petroleum        0.0358797    0.085129     1.40623e+09
PC               0.0931735    0.207044     1.18256e+09
petroleum_coke   0.0931735    0.207044     1.18256e+09
RFO              0.0342987    0.0830832    9.01874e+08
DFO              0.0369909    0.0847551    4.85247e+08
WC               0.123332     0.109256     7.31621e+07
other_gas        0.000248288  0.233354     2.10886e+07
JF               0.000102664  0.0707134    1.52616e+07
SGP              0            0.132342     1.48044e+07
SC               0.00262676   0.1042       8.2274e+06
OG               0.000371954  0.455647     5.69508e+06
WO               0.553216     0.798909     2.83575e+06
KER              0.00184038   0.30512      1.01084e+06
PG               0.00529242   0.622841     589095
NULL             0            0.736635     84378.6
NULL             0            0.736635     84378.6
```

(Fuels appear once per grouping column, which is why NG/natural_gas, PC/petroleum_coke, and NULL show up twice with identical values.)

Looking at just the fuels with more than 1e8 MMBTU reported (SUB through DFO):

```python
big_fuels = ["SUB", "BIT", "NG", "LIG", "PC", "RFO", "DFO"]
valid_rows = (
    frc_predicted["fuel_cost_per_mmbtu"].notna()
    & frc_predicted["fuel_received_mmbtu"].notna()
    & frc_predicted["fuel_cost_per_mmbtu_predicted"].notna()
    & frc_predicted["energy_source_code"].isin(big_fuels)  # one combined mask keeps indexes aligned
)
fuel_price_wmape(frc_predicted.loc[valid_rows], predict_col="fuel_cost_per_mmbtu_predicted")
# Result: 0.06948276509771906
```

Within the coal category, we seem to do equally well on all the coal types, and better than the weighted medians. With petroleum and some of the other fuels, the HGBR has higher relative error. In the weighted median approach, a number of fuel deliveries have prices that are exactly identical to the weighted median (any time there's only a single delivery of a given fuel in a given state and month) -- hence the perfect 1:1 lines in the middle of these plots...

Weighted Medians vs. Hist Gradient Boosted Regression (SUB)

[two images: SUB weighted-median vs. HistGBR comparison]

Weighted Medians vs. Hist Gradient Boosted Regression (DFO)

[two images: DFO weighted-median vs. HistGBR comparison]

zaneselvans (Member, Author) commented:

See continued work in #1767
