FERC regressions #204

Closed
alanawlsn opened this issue Sep 28, 2018 · 8 comments
Labels
analysis Data analysis tasks that involve actually using PUDL to figure things out, like calculating MCOE. ferc1 Anything having to do with FERC Form 1

Comments

@alanawlsn
Contributor

Implement regressions on the FERC O&M dataset that will allow us to attribute costs to fixed vs. variable components at the (FERC) plant level.

@alanawlsn self-assigned this Sep 28, 2018
@zaneselvans added the ferc1 and analysis labels Sep 28, 2018
@zaneselvans added this to the PUDL v0.1 alpha release milestone Sep 28, 2018
@zaneselvans
Member

Also, the scikit-learn Python library has a nice page describing its collection of linear models, with some background on each one.

The end goal of looking at these costs is really to build some kind of model of the cost per MWh on a per-plant basis that depends on... what? Like, if we wanted to predict the marginal cost of electricity for a given plant, what would we need? I think the relevant inputs (independent variables, X) end up being something like:

  • plant capacity (MW)
  • plant heat rate (mmBTU/MWh)
  • capacity factor (unitless)
  • fuel heat content (mmBTU/unit)
  • fuel price ($/mmBTU or $/unit)
  • primary fuel type (categorical -- coal or gas)
  • plant technology (categorical)
  • year of construction (numeric? or categorical?)

And the output we're trying to obtain is just the cost of a marginal unit of electricity production ($/MWh).
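One way to turn that list into a row of the regression matrix X is sketched below. The categorical inputs become 0/1 indicator columns, with the first category in each list dropped as the baseline so the dummies aren't collinear with the intercept. Treating year of construction as a categorical decade is just one answer to the open question above. All field names and category lists are hypothetical, and fuel price is assumed to already be in $/mmBTU, so heat content doesn't appear separately.

```python
# Hypothetical category lists; the first entry in each is the baseline.
FUEL_TYPES = ["coal", "gas"]
TECHNOLOGIES = ["steam", "combustion_turbine", "combined_cycle"]
DECADES = ["1950s", "1960s", "1970s", "1980s", "1990s", "2000s", "2010s"]


def one_hot(value, categories):
    """Indicator columns for `value`, dropping the baseline category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories[1:]]


def feature_row(plant):
    """One row of the design matrix X for a plant-year (hypothetical keys)."""
    decade = f"{plant['year_built'] // 10 * 10}s"
    numeric = [
        1.0,  # intercept
        plant["capacity_mw"],
        plant["heat_rate_mmbtu_mwh"],
        plant["capacity_factor"],
        plant["fuel_price_per_mmbtu"],
    ]
    return (numeric
            + one_hot(plant["fuel_type"], FUEL_TYPES)
            + one_hot(plant["technology"], TECHNOLOGIES)
            + one_hot(decade, DECADES))


row = feature_row({
    "capacity_mw": 500.0, "heat_rate_mmbtu_mwh": 7.0,
    "capacity_factor": 0.55, "fuel_price_per_mmbtu": 3.0,
    "fuel_type": "gas", "technology": "combined_cycle", "year_built": 2003,
})
```

Stacking one such row per plant-year, with the observed $/MWh as the target, gives the inputs for any of the scikit-learn linear models.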

Especially given how little information we have about what goes into all of those different FERC cost categories, and whether or not the utilities are reporting those cost categories in a really standard way, I wonder if we might have better luck just using these simpler inputs / output in the regression?

These inputs would allow terms that are a function of plant capacity (i.e. $/MW installed) and of capacity factor, as well as true fixed costs. The regression wouldn't be confounded by fuel cost volatility, which is probably one of the larger sources of variance, since it would have all the information required to get the fuel cost component right (heat rate, fuel price, fuel heat content). We could even keep the different fuels separate, each with its own price and heat content, if we wanted to (since some coal plants use a non-trivial fraction of gas or oil).
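The fuel cost component really is fully determined by those inputs: convert each fuel's price to $/mmBTU using its heat content, weight by that fuel's share of total heat input, and multiply by the plant heat rate. A minimal sketch, with a hypothetical function name and made-up example numbers:

```python
def fuel_cost_per_mwh(heat_rate_mmbtu_mwh, fuels):
    """Fuel cost ($/MWh) for a plant burning one or more fuels.

    `fuels` maps fuel name -> (share_of_heat_input, price_per_unit,
    mmbtu_per_unit); shares are fractions of total mmBTU and sum to 1.
    """
    total_share = sum(share for share, _, _ in fuels.values())
    if abs(total_share - 1.0) > 1e-6:
        raise ValueError("heat-input shares must sum to 1")
    # $/unit divided by mmBTU/unit gives $/mmBTU; weight by heat-input share.
    blended_price = sum(
        share * price_per_unit / mmbtu_per_unit
        for share, price_per_unit, mmbtu_per_unit in fuels.values()
    )
    return heat_rate_mmbtu_mwh * blended_price


# Hypothetical coal plant: 95% of heat from coal at $40/ton and 20 mmBTU/ton,
# 5% from gas at $3/unit and 1.0 mmBTU/unit, with a 10 mmBTU/MWh heat rate.
cost = fuel_cost_per_mwh(10.0, {"coal": (0.95, 40.0, 20.0),
                                "gas": (0.05, 3.0, 1.0)})
```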

Does that seem sensible?

@zaneselvans
Member

Thinking about this some more... don't we really just need to figure out a model for the non-fuel costs, since we know pretty much exactly how the fuel costs contribute to the overall cost per MWh? Then we just need to fit the remaining non-fuel costs per MWh to a function of plant capacity, capacity factor, plant & fuel type, and year/decade of construction. We can leave out fuel price, heat rate, heat content, etc., since we can assemble the fuel cost function from scratch, and (I think) we have little reason to believe that those variables have much influence on the other plant costs. Would we want to include an interaction term to find a dependence on (capacity * capacity factor), which is proportional to net generation, in addition to capacity and capacity factor independently? Or are we interested in just the capacity & net generation terms?
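A quick derivation suggests the functional form matters here. Assume, purely for illustration, that non-fuel costs are exactly a fixed charge $f$ per MW of capacity per year plus a variable charge $v$ per MWh, and write capacity as $C$, capacity factor as $CF$, hours in the year as $H$, and net generation as $E$:

$$
E = C \cdot CF \cdot H, \qquad
\text{OpEx}_{\text{nonfuel}} = f\,C + v\,E
\quad\Longrightarrow\quad
\frac{\text{OpEx}_{\text{nonfuel}}}{E} = v + \frac{f}{CF \cdot H}
$$

Under that assumption the per-MWh non-fuel cost is linear in $1/CF$ rather than in $CF$, and doesn't depend on $C$ at all. So a regression on per-MWh costs might do better with a $1/CF$ term, while a regression on total non-fuel costs would use $C$ and $E$ (the capacity times capacity factor interaction, up to the factor $H$) directly.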

@michaelpburt

Hi Zane, this may not be what you are looking for, but many ISOs publish guidance on what the VOM (variable operations and maintenance) costs are on a per-technology basis (e.g. supercritical coal, subcritical coal, CT, CC, hydro, etc.). This guidance is widely used as an input to marginal cost models. See page 22 of this PJM manual: https://www.pjm.com/-/media/documents/manuals/archive/m15/m15v28-cost-development-guidelines-10-18-2016.ashx

In my experience, VOM is usually sufficient to encompass all costs beyond fuel and carbon- and MATS-compliance-related costs. Those costs include things like fly ash, urea, chlorine, or other inputs into scrubbers and such. I am not sure what the magnitude of those costs is, but I bet there is some pretty good documentation out there. Off the top of my head, I think they are around $1-$3 per MWh for big nasty coal plants.
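In a marginal cost model the per-technology VOM simply adds to the fuel cost, with any compliance costs tracked separately on top. A minimal sketch, assuming the VOM values come in as a lookup table keyed by technology; the function name and every number in the example are placeholders, not values from the PJM manual.

```python
def marginal_cost_per_mwh(technology, heat_rate_mmbtu_mwh,
                          fuel_price_per_mmbtu, vom_table,
                          compliance_per_mwh=0.0):
    """Estimated marginal cost ($/MWh): fuel + technology VOM + any
    separately tracked carbon/MATS compliance cost.

    `vom_table` maps a technology label to a VOM value in $/MWh, e.g.
    taken from an ISO's cost development guidelines.
    """
    if technology not in vom_table:
        raise ValueError(f"no VOM value for technology {technology!r}")
    fuel = heat_rate_mmbtu_mwh * fuel_price_per_mmbtu
    return fuel + vom_table[technology] + compliance_per_mwh


# Placeholder numbers for illustration only, not real ISO guidance values.
vom = {"subcritical_coal": 4.0, "combined_cycle": 2.5}
mc = marginal_cost_per_mwh("subcritical_coal", 10.5, 2.0, vom,
                           compliance_per_mwh=2.0)
```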

@zaneselvans
Member

@michaelpburt I definitely don't completely understand the calculations that PJM is describing in that document, but the sense I got was that there's an acceptable VOM number that generators can include in their prices based on the technology of the generator, and that that number may be different from the actual variable expenses they've experienced? Is that right? Is that to compensate for typical expenses that just haven't been experienced by a generator yet? Like how you know the cost of maintenance on a new car isn't $0/mi even if it might look that way for the first few years of operation? Are the expenses small enough (relative to fuel) and/or uniform enough across different plants of a given technology that it's not really worth trying to extract the particular per-plant expenses? Is the effect of expected but as-yet-unrealized O&M large enough that these categorical estimates are more useful than real per-plant expenditures?

@gschivley
Contributor

@alanawlsn and @zaneselvans Let me know if there's anything I can do to help with the cost calculations.

@zaneselvans
Member

Okay, I've merged together annualized records from FERC and EIA on the basis of their report_year, plant_id_pudl and primary fuel type, and plotted some of the more interesting values which are available in both datasets (capacity, fuel cost, total heat content of fuel consumed, net generation), as well as some derived values (heat rate, fuel cost per MWh and mmBTU, capacity factor) against each other, separated out for the coal and gas portions of each of the power plants. The results are below.
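The merge and the derived values described in this comment can be sketched in plain Python. This is an illustration, not the code used for the plots: the key column `primary_fuel`, the function names, and the example records are hypothetical, and it assumes each key appears at most once in the EIA data. An inner join keeps only plant-years present in both datasets, which is one reason the shared sample can be much smaller than either input. Capacity factor uses the actual number of hours in the report year.

```python
import calendar

KEYS = ("report_year", "plant_id_pudl", "primary_fuel")  # hypothetical names


def inner_join(ferc_rows, eia_rows, keys=KEYS):
    """Pair FERC and EIA records sharing all key values; drop the rest."""
    eia_index = {tuple(r[k] for k in keys): r for r in eia_rows}
    pairs = []
    for f in ferc_rows:
        e = eia_index.get(tuple(f[k] for k in keys))
        if e is not None:
            pairs.append((f, e))
    return pairs


def derived_metrics(report_year, capacity_mw, net_generation_mwh,
                    total_mmbtu, fuel_cost):
    """Capacity factor, heat rate, and fuel cost per MWh and per mmBTU."""
    hours = 8784 if calendar.isleap(report_year) else 8760
    return {
        "capacity_factor": net_generation_mwh / (capacity_mw * hours),
        "heat_rate_mmbtu_mwh": total_mmbtu / net_generation_mwh,
        "fuel_cost_per_mwh": fuel_cost / net_generation_mwh,
        "fuel_cost_per_mmbtu": fuel_cost / total_mmbtu,
    }


ferc = [{"report_year": 2016, "plant_id_pudl": 1, "primary_fuel": "coal",
         "fuel_cost": 8_784_000.0},
        {"report_year": 2016, "plant_id_pudl": 2, "primary_fuel": "gas",
         "fuel_cost": 1.0}]
eia = [{"report_year": 2016, "plant_id_pudl": 1, "primary_fuel": "coal",
        "capacity_mw": 100.0, "net_generation_mwh": 439_200.0,
        "total_mmbtu": 4_392_000.0}]
pairs = inner_join(ferc, eia)
f, e = pairs[0]
metrics = derived_metrics(f["report_year"], e["capacity_mw"],
                          e["net_generation_mwh"], e["total_mmbtu"],
                          f["fuel_cost"])
```

For the comparison plots the same derived values would be computed separately from each dataset's own columns; mixing them here just keeps the example short.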

One thing that I noted: there were only about 1450 records shared between the two datasets, which seems kind of small (this data is for 2009-2017, the years which we have for both of them). Now in retrospect I realize this is probably (yet again) an artifact of the NA values that are common in the EIA data wiping out a bunch of the aggregated values.

Generally it looks better than I expected it would. Thoughts? @alanawlsn @gschivley @cmgosnell

[Plots of EIA vs. FERC values: capacity_mw, net_generation_mwh, total_mmbtu, opex_fuel, capacity_factor, fuel_cost_per_mmbtu, fuel_cost_per_mwh, heat_rate_mmbtu_mwh]

@gschivley
Contributor

I'm not as familiar with FERC data - who is required to report to them? The plots do show a nice agreement.

zaneselvans added a commit that referenced this issue Mar 29, 2019
Each record in the FERC Form 1 corresponds to a particular type of fuel.
Many plants -- especially coal plants -- use more than one fuel, with
gas and/or diesel serving as startup fuels. In order to be able to
classify the type of plant based on relative proportions of fuel
consumed or fuel costs it is useful to aggregate these per-fuel records
into a single record for each plant.

Fuel cost (in nominal dollars) and fuel heat content (in mmBTU) are
calculated for each fuel based on the cost and heat content per unit,
and the number of units consumed, and then summed by fuel type (there
can be more than one record for a given type of fuel in each plant
because we are simplifying the fuel categories). The per-fuel records
are then pivoted to create one column per fuel type. The total is summed
and stored separately, and the individual fuel costs & heat contents are
divided by that total, to yield fuel proportions.  Based on those
proportions and a minimum threshold that's passed in, a "primary" fuel
type is then assigned to the plant-year record and given a string label.

Also required for FERC non-fuel OpEx regressions in #204
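The aggregation described in this commit message can be sketched in plain Python. This is not the code from the commit: the record keys, the function name, the choice of heat-content fractions (rather than cost fractions) for picking the primary fuel, and the "mixed" label when no fuel reaches the threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_fuel_records(records, threshold=0.5):
    """Collapse per-fuel records for one plant-year into a single record.

    Each record is a dict with (hypothetical) keys fuel_type,
    units_burned, cost_per_unit and mmbtu_per_unit.
    """
    cost = defaultdict(float)
    mmbtu = defaultdict(float)
    for r in records:
        # Several records can share one simplified fuel_type: sum them.
        cost[r["fuel_type"]] += r["units_burned"] * r["cost_per_unit"]
        mmbtu[r["fuel_type"]] += r["units_burned"] * r["mmbtu_per_unit"]

    total_cost = sum(cost.values())
    total_mmbtu = sum(mmbtu.values())
    if total_cost <= 0 or total_mmbtu <= 0:
        raise ValueError("need positive total fuel cost and heat content")

    # "Pivot": one cost fraction and one heat-content fraction per fuel.
    cost_frac = {f: c / total_cost for f, c in cost.items()}
    mmbtu_frac = {f: m / total_mmbtu for f, m in mmbtu.items()}

    # Primary fuel from heat-content fractions; "mixed" if none qualifies.
    top = max(mmbtu_frac, key=mmbtu_frac.get)
    primary = top if mmbtu_frac[top] >= threshold else "mixed"

    return {
        "total_fuel_cost": total_cost,
        "total_mmbtu": total_mmbtu,
        "cost_fraction": cost_frac,
        "mmbtu_fraction": mmbtu_frac,
        "primary_fuel": primary,
    }


# Hypothetical plant-year: one coal record and two gas records.
records = [
    {"fuel_type": "coal", "units_burned": 1000, "cost_per_unit": 40.0,
     "mmbtu_per_unit": 20.0},
    {"fuel_type": "gas", "units_burned": 2000, "cost_per_unit": 3.0,
     "mmbtu_per_unit": 1.0},
    {"fuel_type": "gas", "units_burned": 1000, "cost_per_unit": 3.0,
     "mmbtu_per_unit": 1.0},
]
plant = aggregate_fuel_records(records)
```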
@zaneselvans modified the milestones: 0.1.0, 0.2.0 Jun 28, 2019
@zaneselvans modified the milestones: 0.3.0, future_release Sep 23, 2019
@cmgosnell removed this from the future_release milestone Oct 4, 2019
@cmgosnell
Member

Closing because this is no longer relevant. We've generally learned that it is difficult or impossible to categorize the specific O&M lines in FERC as fixed or variable O&M. We have been employing NEMS' breakdown of fixed and variable O&M instead. See example here.
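With a fixed/variable breakdown taken from outside the FERC data, a plant's annual O&M estimate follows directly from its capacity and generation. A minimal sketch, assuming fixed O&M is quoted in $/kW-yr and variable O&M in $/MWh (the units EIA uses for its NEMS cost assumptions); the function name and all example values are hypothetical.

```python
def annual_om_cost(capacity_mw, net_generation_mwh,
                   fixed_om_per_kw_yr, variable_om_per_mwh):
    """Return (fixed, variable) annual O&M in dollars.

    Fixed O&M scales with installed capacity ($/kW-yr); variable O&M
    scales with generation ($/MWh).
    """
    fixed = fixed_om_per_kw_yr * capacity_mw * 1000.0  # MW -> kW
    variable = variable_om_per_mwh * net_generation_mwh
    return fixed, variable


# Hypothetical plant and cost values, for illustration only.
fixed, variable = annual_om_cost(500.0, 2_000_000.0, 40.0, 4.0)
```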


5 participants