# Generate Inputs for MCOE Calculations
One of the first things we're building PUDL to do is enable the calculation of a marginal cost of electricity (MCOE) for the individual generators within plants. To do this, we need to bring together information from both FERC Form 1 and EIA Form 923.

Given a target utility, we want to extract and summarize the following information, which is required to calculate the MCOE on a per-plant and per-generating-unit basis:

* Generator capacities.
* Annual historical generation (in MWh) by each generator.
* Annual historical capacity factor for each generator.
* Annual historical heat rate (mmBTU/MWh) for each generator.
* Annual historical fuel costs per unit generation (\$/MWh) for each generator.
* Annual historical non-fuel production costs (\$/MWh) for each generator.
* Annual historical variable O&M costs (\$/MWh) for each generator.
* Annual historical fixed O&M costs by generator.

Some of these can be extracted directly from the tables we have, some will need to be calculated, and some will be our best estimates or inferences.
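
The calculated quantities in the list above follow from a few simple ratios. The sketch below is a minimal illustration on made-up numbers, not PUDL data; the column names (`capacity_mw`, `fuel_consumed_mmbtu`, `fuel_cost_usd`) are assumptions chosen for the example, and a year is taken to be 8760 hours.

```python
import pandas as pd

# Toy numbers for illustration only; these are not real plant data.
gens = pd.DataFrame({
    "generator": ["A", "B"],
    "capacity_mw": [100.0, 50.0],
    "net_generation_mwh": [438000.0, 109500.0],
    "fuel_consumed_mmbtu": [4380000.0, 1204500.0],
    "fuel_cost_usd": [8760000.0, 3613500.0],
})

HOURS_PER_YEAR = 8760  # non-leap year

# Capacity factor: actual generation / maximum possible generation
gens["capacity_factor"] = gens["net_generation_mwh"] / (
    gens["capacity_mw"] * HOURS_PER_YEAR
)
# Heat rate: fuel energy consumed per unit of electricity generated (mmBTU/MWh)
gens["heat_rate_mmbtu_per_mwh"] = (
    gens["fuel_consumed_mmbtu"] / gens["net_generation_mwh"]
)
# Fuel cost per unit of generation ($/MWh)
gens["fuel_cost_per_mwh"] = gens["fuel_cost_usd"] / gens["net_generation_mwh"]

print(gens[["generator", "capacity_factor",
            "heat_rate_mmbtu_per_mwh", "fuel_cost_per_mwh"]])
```

With these inputs, generator A has a capacity factor of 0.5, a heat rate of 10 mmBTU/MWh, and a fuel cost of \$20/MWh.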

In [4]:
import sys
import os
sys.path.append(os.path.abspath(os.path.join('..','..')))
from pudl import pudl, ferc1, eia923, settings, constants
from pudl import models, models_ferc1, models_eia923
from pudl import clean_eia923, clean_ferc1, clean_pudl
from pudl import analysis
import numpy as np
import pandas as pd
import sqlalchemy as sa
import matplotlib.pyplot as plt
%matplotlib inline
%load_ext autoreload
%autoreload
pudl_engine = pudl.db_connect_pudl()

# Where does this data live?
PUDL DB tables that hold the information we're looking for:

### Cost Data:
* Cost of Fuel Delivered: `fuel_receipts_costs_eia923` and `fuel_ferc1`
* Non-Fuel Production Costs: `plants_steam_ferc1`
* Variable O&M Costs: `plants_steam_ferc1`
* Fixed O&M Costs: `plants_steam_ferc1`

### Generation Data:
* Annual Generation by Plant: `plants_steam_ferc1` and `generation_eia923`
* Monthly Generation by Unit: `generation_eia923`

### Fuel Use Data:
* Monthly Fuel Consumed by Unit: `generation_fuel_eia923`
* Annual Fuel Consumed by Plant: `fuel_ferc1`

### Capacity Data:
* Nameplate Capacity by Plant: `plants_steam_ferc1` and `plant_info_eia923`

# Identifying Plants of Interest

Given a utility ID (either a FERC `respondent_id`, an EIA `operator_id`, or a PUDL `utility_id`), generate a list of plants that are associated with that utility, whose data we will pull from the database.

In [5]:
# These IDs can be generated from each other based on the glue tables. But for now, let's assume that we know them.
operator_id_eia923 = 15466
respondent_id_ferc1 = 145
utility_id_pudl = 272

# Extracting FERC Form 1 Fuel & Large Plant Data
We're going to pull information about a respondent, their large steam plants, and the fuel usage at those plants into a single dataframe for analysis.

In [267]:
pudl_tables = models.PUDLBase.metadata.tables
respondents_ferc1_tbl = pudl_tables['utilities_ferc1']
plants_ferc1_tbl = pudl_tables['plants_ferc1']
fuel_ferc1_tbl = pudl_tables['fuel_ferc1']
plants_steam_ferc1_tbl = pudl_tables['plants_steam_ferc1']

# We need to pull the fuel information separately, because it has several entries for
# each plant for each year -- we'll groupby() plant before merging it with the steam plant info
fuel_ferc1_select = sa.sql.select([
    fuel_ferc1_tbl.c.report_year,
    respondents_ferc1_tbl.c.respondent_id,
    respondents_ferc1_tbl.c.util_id_pudl,
    respondents_ferc1_tbl.c.respondent_name,
    plants_ferc1_tbl.c.plant_id_pudl,
    fuel_ferc1_tbl.c.plant_name,
    fuel_ferc1_tbl.c.fuel,
    fuel_ferc1_tbl.c.fuel_qty_burned,
    fuel_ferc1_tbl.c.fuel_avg_mmbtu_per_unit,
    fuel_ferc1_tbl.c.fuel_cost_per_unit_burned,
    fuel_ferc1_tbl.c.fuel_cost_per_unit_delivered,
    fuel_ferc1_tbl.c.fuel_cost_per_mmbtu,
    fuel_ferc1_tbl.c.fuel_cost_per_mwh,
    fuel_ferc1_tbl.c.fuel_mmbtu_per_mwh]).\
    where(sa.sql.and_(respondents_ferc1_tbl.c.respondent_id == respondent_id_ferc1,
                      fuel_ferc1_tbl.c.respondent_id == respondent_id_ferc1,
                      plants_ferc1_tbl.c.respondent_id == respondent_id_ferc1,
                      plants_ferc1_tbl.c.plant_name == fuel_ferc1_tbl.c.plant_name))
    
fuel_df = pd.read_sql(fuel_ferc1_select, pudl_engine)

# Add some columns with totals so we can sum things up...
fuel_df['fuel_burned_mmbtu_total'] = fuel_df['fuel_qty_burned']*fuel_df['fuel_avg_mmbtu_per_unit']
fuel_df['fuel_burned_cost_total'] = fuel_df['fuel_qty_burned']*fuel_df['fuel_cost_per_unit_burned']

In [268]:
fuel_merge = fuel_df[['report_year','plant_id_pudl','plant_name']]
fuel_merge = fuel_merge.drop_duplicates(subset=['report_year','plant_id_pudl'])

gb_plant_yr = fuel_df.groupby(['plant_id_pudl','report_year'])

mmbtu_sum = pd.DataFrame(gb_plant_yr['fuel_burned_mmbtu_total'].sum())
cost_sum = pd.DataFrame(gb_plant_yr['fuel_burned_cost_total'].sum())

fuel_merge = fuel_merge.merge(mmbtu_sum, left_on=['plant_id_pudl','report_year'], right_index=True)
fuel_merge = fuel_merge.merge(cost_sum, left_on=['plant_id_pudl','report_year'], right_index=True)
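
The per-plant, per-year fuel totals built above with `drop_duplicates()`, `groupby()`, and two `merge()` calls can also be produced in a single step. The following is a sketch of that alternative on a small made-up `fuel_df`; it relies on pandas named aggregation (pandas 0.25 or later) and assumes that taking the first `plant_name` in each group is acceptable, which mirrors what `drop_duplicates()` keeps.

```python
import pandas as pd

# Made-up fuel records: several fuels per plant per year, as in fuel_ferc1.
fuel_df = pd.DataFrame({
    "report_year": [2007, 2007, 2007, 2008],
    "plant_id_pudl": [1, 1, 2, 1],
    "plant_name": ["Alpha", "Alpha", "Beta", "Alpha"],
    "fuel": ["coal", "gas", "coal", "coal"],
    "fuel_qty_burned": [100.0, 50.0, 200.0, 120.0],
    "fuel_avg_mmbtu_per_unit": [20.0, 1.0, 18.0, 20.0],
    "fuel_cost_per_unit_burned": [30.0, 4.0, 25.0, 32.0],
})
fuel_df["fuel_burned_mmbtu_total"] = (
    fuel_df["fuel_qty_burned"] * fuel_df["fuel_avg_mmbtu_per_unit"]
)
fuel_df["fuel_burned_cost_total"] = (
    fuel_df["fuel_qty_burned"] * fuel_df["fuel_cost_per_unit_burned"]
)

# One row per (plant, year), with heat content and cost summed across fuels.
fuel_by_plant_year = (
    fuel_df
    .groupby(["plant_id_pudl", "report_year"], as_index=False)
    .agg(
        plant_name=("plant_name", "first"),
        fuel_burned_mmbtu_total=("fuel_burned_mmbtu_total", "sum"),
        fuel_burned_cost_total=("fuel_burned_cost_total", "sum"),
    )
)
print(fuel_by_plant_year)
```

Summing totals rather than averaging the per-unit columns matters here: different fuels are reported in different units, so only the mmBTU and dollar totals can be added across fuels.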

In [270]:
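def ferc1_expns_corr(capacity_factor, pudl_engine):
    """Correlate each FERC Form 1 steam plant expense with net generation.

    Only plant records whose capacity factor exceeds ``capacity_factor`` are
    used. Returns a dict mapping each expns_* column name to its correlation
    coefficient with net_generation_mwh.
    """
    steam_df = pd.read_sql('SELECT * FROM plants_steam_ferc1', pudl_engine)
    # Capacity factor = actual generation / (nameplate capacity * hours per year)
    steam_df['capacity_factor'] = \
        steam_df['net_generation_mwh'] / (8760 * steam_df['total_capacity_mw'])

    # Limit plants by capacity factor
    steam_df = steam_df[steam_df['capacity_factor'] > capacity_factor]
    production_expns = {}
    for expns in steam_df.filter(regex='expns').columns.tolist():
        # Ignore records that report zero for this expense
        mwh_plants = steam_df.net_generation_mwh[steam_df[expns] != 0]
        expns_plants = steam_df[expns][steam_df[expns] != 0]
        production_expns[expns] = np.corrcoef(mwh_plants, expns_plants)[0, 1]

    return production_expns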
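
To make the idea behind `ferc1_expns_corr()` concrete, here is a small self-contained sketch of the same correlation test on made-up data, without a database. The two expense columns and their values are invented for illustration, and the 0.5 cutoff is the working threshold this notebook uses, not an established rule.

```python
import numpy as np
import pandas as pd

# Made-up plant records: one expense that scales with generation, one that doesn't.
steam = pd.DataFrame({
    "net_generation_mwh": [100.0, 200.0, 300.0, 400.0],
    "expns_fuel": [10.0, 20.0, 30.0, 40.0],   # proportional to generation
    "expns_rents": [5.0, 5.5, 4.5, 5.0],      # roughly flat
})

corr = {}
for col in steam.filter(regex="expns").columns:
    nonzero = steam[col] != 0  # skip records reporting zero for this expense
    corr[col] = np.corrcoef(
        steam.loc[nonzero, "net_generation_mwh"], steam.loc[nonzero, col]
    )[0, 1]

THRESHOLD = 0.5  # assumed cutoff between "production" and "non-production"
production = [c for c, r in corr.items() if r >= THRESHOLD]
nonproduction = [c for c, r in corr.items() if r < THRESHOLD]
print(corr, production, nonproduction)
```

One caveat of this approach: if an expense column is constant or has fewer than two non-zero records, `np.corrcoef` yields `NaN`, and that column then lands in neither list because both comparisons are false.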

In [269]:
steam_ferc1_select = sa.sql.select([
    plants_steam_ferc1_tbl.c.report_year,
    respondents_ferc1_tbl.c.respondent_id,
    respondents_ferc1_tbl.c.util_id_pudl,
    respondents_ferc1_tbl.c.respondent_name,
    plants_ferc1_tbl.c.plant_id_pudl,
    plants_steam_ferc1_tbl.c.plant_name,
    plants_steam_ferc1_tbl.c.total_capacity_mw,
    plants_steam_ferc1_tbl.c.net_generation_mwh,
    plants_steam_ferc1_tbl.c.expns_operations,
    plants_steam_ferc1_tbl.c.expns_fuel,
    plants_steam_ferc1_tbl.c.expns_coolants,
    plants_steam_ferc1_tbl.c.expns_steam,
    plants_steam_ferc1_tbl.c.expns_steam_other,
    plants_steam_ferc1_tbl.c.expns_transfer,
    plants_steam_ferc1_tbl.c.expns_electric,
    plants_steam_ferc1_tbl.c.expns_misc_power,
    plants_steam_ferc1_tbl.c.expns_rents,
    plants_steam_ferc1_tbl.c.expns_allowances,
    plants_steam_ferc1_tbl.c.expns_engineering,
    plants_steam_ferc1_tbl.c.expns_structures,
    plants_steam_ferc1_tbl.c.expns_boiler,
    plants_steam_ferc1_tbl.c.expns_plants,
    plants_steam_ferc1_tbl.c.expns_misc_steam,
    plants_steam_ferc1_tbl.c.expns_production_total,
    plants_steam_ferc1_tbl.c.expns_per_mwh]).\
    where(sa.sql.and_(respondents_ferc1_tbl.c.respondent_id == respondent_id_ferc1,
                      plants_steam_ferc1_tbl.c.respondent_id == respondent_id_ferc1,
                      plants_ferc1_tbl.c.respondent_id == respondent_id_ferc1,
                      plants_ferc1_tbl.c.plant_name == plants_steam_ferc1_tbl.c.plant_name))

steam_df = pd.read_sql(steam_ferc1_select, pudl_engine)

# What additional columns do we need for aggregation purposes?
# net_generation_mwh

# total_fuel

In [348]:
expns_corr = ferc1_expns_corr(0.6, pudl_engine)

steam_common_cols = ['report_year','plant_id_pudl','plant_name','total_capacity_mw']
steam_bad_cols = ['expns_per_mwh','expns_production_total']

# Remove the expns_* columns that we don't want
for key in steam_bad_cols:
    expns_corr.pop(key, None)

# For now, treat a correlation with net generation of at least 0.5 as indicating
# a "production" expense, and anything lower as "non-production"
production_expns = [k for k in expns_corr.keys() if expns_corr[k] >= 0.5]
nonproduction_expns = [k for k in expns_corr.keys() if expns_corr[k] < 0.5]

steam_expns = steam_df[steam_common_cols].copy()
steam_expns['expns_total_production'] = steam_df[production_expns].copy().sum(axis=1)
steam_expns['expns_total_nonproduction'] = steam_df[nonproduction_expns].copy().sum(axis=1)

steam_prod_gb = steam_expns.groupby(['plant_id_pudl','report_year'])

#steam_prod_gb['net_generation_mwh'].sum()
#expns_fuel_sum = pd.DataFrame(gb_plant_yr['expns_fuel'].sum())

#steam_merge = steam_merge.merge(net_generation_mwh_sum, left_on=['plant_id_pudl','report_year'], right_index=True)
#steam_merge = steam_merge.merge(expns_fuel_sum, left_on=['plant_id_pudl','report_year'], right_index=True)
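
The commented-out lines above point toward summing generation and expenses by plant and year. One possible way to finish that step is sketched below on made-up data. It assumes `net_generation_mwh` is carried along with the expense totals (it is not among `steam_common_cols` as written), and the output column name `production_expns_per_mwh` is hypothetical.

```python
import pandas as pd

# Made-up records: plant 1 has two entries for 2007, plant 2 has one.
steam_expns = pd.DataFrame({
    "report_year": [2007, 2007, 2007],
    "plant_id_pudl": [1, 1, 2],
    "net_generation_mwh": [1000.0, 3000.0, 500.0],
    "expns_total_production": [20000.0, 60000.0, 15000.0],
    "expns_total_nonproduction": [4000.0, 4000.0, 5000.0],
})

sum_cols = ["net_generation_mwh",
            "expns_total_production",
            "expns_total_nonproduction"]

# Sum generation and expenses for each (plant, year) pair.
plant_year = steam_expns.groupby(
    ["plant_id_pudl", "report_year"], as_index=False
)[sum_cols].sum()

# Production expenses per unit of generation ($/MWh)
plant_year["production_expns_per_mwh"] = (
    plant_year["expns_total_production"] / plant_year["net_generation_mwh"]
)
print(plant_year)
```

Dividing the summed expenses by the summed generation gives a generation-weighted cost per MWh, which differs from averaging per-record ratios when records vary in size.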

In [349]:
steam_expns

|   | report_year | plant_id_pudl | plant_name | total_capacity_mw | expns_total_production | expns_total_nonproduction |
|---|---|---|---|---|---|---|
| 0 | 2007 | 20 | Arapahoe | 144.00 | 25568229.0 | 7040047.0 |
| 1 | 2007 | 1078 | Cameo | 66.00 | 13148381.0 | 3548908.0 |
| 2 | 2007 | 105 | Cherokee | 710.00 | 103117204.0 | 14856031.0 |
| 3 | 2007 | 122 | Comanche | 700.00 | 67108063.0 | 8652836.0 |
| 4 | 2007 | 136 | Craig | 86.90 | 10778040.0 | 1318182.0 |
| 5 | 2007 | 272 | Hayden | 447.00 | 40279865.0 | 4350909.0 |
| 6 | 2007 | 440 | Pawnee | 547.00 | 49898785.0 | 8634141.0 |
| 7 | 2007 | 626 | Valmont 5 | 166.25 | 31168324.0 | 3960241.0 |
| 8 | 2007 | 661 | Zuni | 101.00 | 2983902.0 | 591003.0 |
| 9 | 2007 | 13 | Alamosa | 33.20 | 188556.0 | 97736.0 |


In [227]:
df = pd.DataFrame(columns=[['prod','prod','non_prod','non_prod'],
                           ['fuel','boilers','rents','staff']])

In [6]:
Session = sa.orm.sessionmaker()
Session.configure(bind = pudl_engine)
session = Session()

plant_pudl_id=122

|   | plant_id | plant_name | plant_id_pudl |
|---|---|---|---|
| 0 | 470 | Comanche (CO) | 122 |


In [13]:
mine_by_plant = pd.read_sql('''SELECT frc.plant_id,
                                      frc.report_date,
                                      plants_eia923.plant_name,
                                      frc.fuel_quantity,
                                      frc.average_heat_content,
                                      frc.supplier,
                                      frc.fuel_cost,
                                      cmi.coalmine_name,
                                      cmi.coalmine_type,
                                      cmi.coalmine_state,
                                      cmi.coalmine_county,
                                      cmi.coalmine_msha_id
                                FROM fuel_receipts_costs_eia923 AS frc, coalmine_info_eia923 AS cmi, plants_eia923
                                WHERE frc.coalmine_id=cmi.id
                                  AND plants_eia923.plant_id=frc.plant_id''', pudl_engine)

In [14]:
mine_by_plant

|   | plant_id | report_date | plant_name | fuel_quantity | average_heat_content | supplier | fuel_cost | coalmine_name | coalmine_type | coalmine_state | coalmine_county | coalmine_msha_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
