### Energy Innovation MCOE Compilation

- <a href=#setup>Setup</a>
- <a href=#data_out>Data Outputs</a>
    * <a href=#part1>Part 1: Plant & Unit Level Data</a>
    * <a href=#part2>Part 2: Cost Data</a>
    * Part 3: Emissions & Public Health Data

-------------

## <a id='setup'>Setup</a>

In [63]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [224]:
import pandas as pd  # used by later cells; do not rely on the star import for it
import pudl
import sqlalchemy as sa
from ei_mcoe import *
import sys
#import logging

In [225]:
# # basic setup for logging
# logger = logging.getLogger()
# logger.setLevel(logging.INFO)
# handler = logging.StreamHandler(stream=sys.stdout)
# formatter = logging.Formatter('%(message)s')
# handler.setFormatter(formatter)
# logger.handlers = [handler]
# pd.options.display.max_columns = None

In [66]:
pudl_settings = pudl.workspace.setup.get_defaults()
pudl_engine = sa.create_engine(pudl_settings["pudl_db"])

In [68]:
pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine, freq='AS', rolling=True)

-----------

## <a id='data_out'>Data Outputs</a>

### <a id='part1'>Part 1: Plant & Unit Level Data</a>
EIA-860 and EIA-923 generator-level data, aggregated by either plant or unit and subdivided by broad fuel type (coal, gas, oil, waste). Generator age is a capacity-weighted average, and heat rate is a net-generation-weighted average. Capacity and net generation are sums of the generator-level data. For purely qualitative information (just plant name and location), add [`drop_calcs=True`] to the parameters.
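
The aggregation described above can be sketched in pandas as follows. This is a minimal illustration on made-up generator records, not the code inside `part1_main`; the helper `weighted_average` and the sample values are assumptions introduced here, and only the column names are taken from the output tables in this notebook.

```python
import pandas as pd

# Hypothetical generator-level records; the values are illustrative only.
gens = pd.DataFrame({
    "plant_id_pudl": [1, 1, 1],
    "fuel_type_code_pudl": ["coal", "coal", "gas"],
    "capacity_mw": [100.0, 300.0, 50.0],
    "net_generation_mwh": [400_000.0, 1_600_000.0, 100_000.0],
    "generator_age_years": [40.0, 20.0, 10.0],
    "heat_rate_mmbtu_mwh": [11.0, 10.0, 8.0],
})


def weighted_average(df, value_col, weight_col, by):
    """Weighted mean of value_col within each group (hypothetical helper)."""
    tmp = df.assign(_wx=df[value_col] * df[weight_col])
    sums = tmp.groupby(by)[["_wx", weight_col]].sum()
    return sums["_wx"] / sums[weight_col]


by = ["plant_id_pudl", "fuel_type_code_pudl"]

# Capacity and net generation are plain sums of the generator-level data.
summary = gens.groupby(by)[["capacity_mw", "net_generation_mwh"]].sum()

# Age is weighted by capacity, heat rate by net generation.
summary["weighted_ave_generator_age_years"] = weighted_average(
    gens, "generator_age_years", "capacity_mw", by
)
summary["weighted_ave_heat_rate_mmbtu_mwh"] = weighted_average(
    gens, "heat_rate_mmbtu_mwh", "net_generation_mwh", by
)
summary = summary.reset_index()
print(summary)
```

Weighting by capacity keeps a small, old generator from dominating a plant's reported age, and weighting heat rate by net generation reflects how the fuel was actually burned over the year.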

**Plant Level Table**

In [95]:
#plant_df = ei_mcoe.part1_main(pudl_out, 'plant', drop_calcs=True)
#test_segment(plant_df)

**Unit Level Table**

In [227]:
unit_df = ei_mcoe.part1_main(pudl_out, 'unit')
test_segment(unit_df)

beginning date conversion
calculating generator age
regrouping data
calculating weighted average for generator_age_years
calculating weighted average for heat_rate_mmbtu_mwh
Finished Part 1 unit level compilation


Unnamed: 0,plant_id_pudl,plant_id_eia,unit_id_pudl,fuel_type_code_pudl,report_year,total_fuel_cost,net_generation_mwh,capacity_mw,weighted_ave_generator_age_years,weighted_ave_heat_rate_mmbtu_mwh,count,state,city,latitude,longitude
342,32,3,7.0,gas,2017,97585150.0,4217873.0,535.4,17.0,6.917677,3,AL,Bucks,31.0069,-88.0103
278,32,3,1.0,gas,2017,657705.2,7221.0,153.1,63.0,27.23353,1,AL,Bucks,31.0069,-88.0103
315,32,3,6.0,gas,2017,98502010.0,4199100.0,535.4,17.0,7.013889,3,AL,Bucks,31.0069,-88.0103
282,32,3,2.0,gas,2017,620610.0,7498.0,153.1,63.0,24.748185,1,AL,Bucks,31.0069,-88.0103
296,32,3,5.0,coal,2017,77744900.0,2710308.0,788.8,46.0,9.881649,1,AL,Bucks,31.0069,-88.0103
291,32,3,4.0,coal,2017,22790380.0,722554.0,403.7,48.0,10.865694,1,AL,Bucks,31.0069,-88.0103
290,32,3,4.0,coal,2016,39501030.0,1122258.0,403.7,47.0,10.544656,1,AL,Bucks,31.0069,-88.0103
333,32,3,7.0,gas,2016,85211260.0,4267917.0,535.4,16.0,6.934648,3,AL,Bucks,31.0069,-88.0103
306,32,3,6.0,gas,2016,83751270.0,4133443.0,535.4,16.0,7.037572,3,AL,Bucks,31.0069,-88.0103
295,32,3,5.0,coal,2016,106445800.0,3235623.0,788.8,45.0,9.855693,1,AL,Bucks,31.0069,-88.0103


### <a id='part2'>Part 2: Cost Data</a>

Cost and generation data from EIA-860, EIA-923, and FERC Form 1, subdivided by plant and broad fuel type. FERC Form 1 reports at the plant level, so the fuel-type breakdown for each FERC Form 1 plant is taken from the EIA fuel breakdown for the plant with the same PUDL ID. MCOE is calculated using data from the following sources:

##### MCOE Variable Origins (as named in original database)
- Fuel cost = **EIA-923**: [`total_fuel_cost`]
- MW Capacity = **EIA-860**: [`capacity_mw`]
- Net MWh Generated = **EIA-923**: [`net_generation_mwh`]
- Variable O&M = **FERC Form 1**: [`opex_production_total`] - [`opex_fuel`]
- Fixed O&M = **FERC Form 1**: [`capex_total`]
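
As a rough sketch of how these fields might combine, the example below derives variable O&M from the two FERC Form 1 fields listed above and splits it across fuel types using EIA shares. It assumes the split is proportional to each fuel type's share of EIA net generation; that weighting, the sample values, and the derived column names (`fuel_share`, `variable_om`, `variable_om_mwh`, `fuel_cost_mwh`) are illustrative assumptions, not the confirmed logic of `part2_main`.

```python
import pandas as pd

# Hypothetical FERC Form 1 record: one row per plant and year.
ferc = pd.DataFrame({
    "plant_id_pudl": [1],
    "report_year": [2017],
    "opex_production_total": [50_000_000.0],
    "opex_fuel": [40_000_000.0],
})

# Hypothetical EIA records, already aggregated by plant, year and fuel type.
eia = pd.DataFrame({
    "plant_id_pudl": [1, 1],
    "report_year": [2017, 2017],
    "fuel_type_code_pudl": ["coal", "gas"],
    "net_generation_mwh": [3_000_000.0, 1_000_000.0],
    "total_fuel_cost": [90_000_000.0, 25_000_000.0],
})

keys = ["plant_id_pudl", "report_year"]

# Assumed weighting: each fuel type's share of the plant's EIA net generation.
plant_generation = eia.groupby(keys)["net_generation_mwh"].transform("sum")
eia["fuel_share"] = eia["net_generation_mwh"] / plant_generation

# Variable O&M as listed above: total production opex minus fuel opex.
ferc["variable_om"] = ferc["opex_production_total"] - ferc["opex_fuel"]

# Attach the plant-level FERC value to each EIA fuel-type row and split it.
merged = eia.merge(ferc[keys + ["variable_om"]], on=keys, how="left")
merged["variable_om_by_fuel"] = merged["variable_om"] * merged["fuel_share"]

# Express costs per MWh of net generation.
merged["variable_om_mwh"] = merged["variable_om_by_fuel"] / merged["net_generation_mwh"]
merged["fuel_cost_mwh"] = merged["total_fuel_cost"] / merged["net_generation_mwh"]
print(merged[["fuel_type_code_pudl", "fuel_share", "variable_om_mwh", "fuel_cost_mwh"]])
```

Under this assumed weighting, the per-MWh variable O&M comes out identical for every fuel type at a plant in a given year, because the generation share and the per-MWh divisor cancel.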

##### Data Flags
[`sig_hr`] - a flag indicating whether any unit within a plant's fuel type accounts for more than an even share of that fuel type's heat rate at the plant. For example, if the coal portion of a plant has 4 units, [`sig_hr`] is [`True`] when the heat rate of any one of those units is more than 1/4 of the total for coal units at that plant.
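
The even-share rule can be illustrated with a short pandas sketch. It applies the definition above literally to made-up unit records; the intermediate column `exceeds_even_share` and the sample values are assumptions, and the actual implementation in `ei_mcoe` may differ.

```python
import pandas as pd

# Hypothetical unit-level heat rates; the values are illustrative only.
units = pd.DataFrame({
    "plant_id_pudl": [1, 1, 1, 1, 2, 2],
    "fuel_type_code_pudl": ["coal"] * 4 + ["gas"] * 2,
    "unit_id_pudl": [1, 2, 3, 4, 1, 2],
    "heat_rate_mmbtu_mwh": [10.0, 10.0, 10.0, 14.0, 7.0, 7.0],
})

by = ["plant_id_pudl", "fuel_type_code_pudl"]
grouped = units.groupby(by)["heat_rate_mmbtu_mwh"]

# Each unit's share of the summed heat rate for its plant and fuel type.
share = units["heat_rate_mmbtu_mwh"] / grouped.transform("sum")

# An even share is 1/n, where n is the number of units in the group.
even_share = 1.0 / grouped.transform("count")

# Flag the group if any single unit exceeds its even share.
units["exceeds_even_share"] = share > even_share
sig_hr = (
    units.groupby(by)["exceeds_even_share"]
    .any()
    .rename("sig_hr")
    .reset_index()
)
print(sig_hr)
```

With a strict comparison like this one, any group whose units do not all share the same heat rate is flagged, so a tolerance above the even share may be worth considering if the flag is meant to catch only large outliers.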

**MCOE Table**

In [226]:
mcoe_df = part2_main(pudl_out)
test_segment(mcoe_df)

beginning date conversion
calculating generator age
readying eia fuel pct data to merge with ferc
building eia table broken down by plant and fuel type
regrouping data
regrouping data
calculating eia fuel type percentages
turning eia fuel percent values for net_generation_mwh into columns
turning eia fuel percent values for capacity_mw into columns
building FERC table broken down by plant
regrouping data
merging FERC data with EIA pct data
building FERC table broken down by plant and fuel type
melting FERC pct data back to row values
melting FERC pct data back to row values
building eia table broken down by plant and fuel type
regrouping data
merging FERC and EIA data on plant and fuel type
beginning date conversion
calculating generator age
regrouping data
calculating weighted average for generator_age_years
calculating weighted average for heat_rate_mmbtu_mwh
Finished Part 1 unit level compilation
comparing heat rates internally
regrouping data
checking df length compatability
Finish

Unnamed: 0,plant_id_pudl,fuel_type_code_pudl,report_year,fuel_cost_mwh_eia923,variable_om_mwh_ferc1,fixed_om_mwh_ferc1,mcoe,sig_hr
97,32,coal,2017,29.286141,9.207508,276.119975,329311.564142,True
100,32,gas,2017,23.407577,9.207508,129.812044,178783.800261,True
96,32,coal,2016,33.490314,7.350057,235.277926,280609.766978,True
99,32,gas,2016,20.111331,7.350057,109.586468,117372.651362,True
95,32,coal,2015,33.210894,8.756509,221.199598,323988.778939,True
98,32,gas,2015,22.276422,8.756509,125.497311,153627.191411,True
94,32,coal,2013,47.650233,41.811372,1022.052053,412691.875258,False
93,32,coal,2012,48.186988,13.276031,321.889506,570031.210571,True
92,32,coal,2011,44.667848,18.222452,360.95903,639213.044735,True


#### Data Validation FERC vs. EIA

In [77]:
# Snatched from ferc1-eia923-comparison notebook
# FERC1 data merge 

fuel_ferc1 = pudl_out.fuel_ferc1()#[[
    #'report_year',
    #'plant_id_pudl',
    #'fuel_type_code_pudl',
    #'fuel_consumed_mmbtu',
    #'fuel_consumed_total_cost',
    #'fuel_cost_per_mmbtu'
#]]
steam_ferc1 = pudl_out.plants_steam_ferc1()#[[
    #'report_year',
    #'plant_id_pudl',
    #'capacity_mw',
    #'net_generation_mwh'
#]]

# Summarize the FERC1 fuel records per plant and year, reusing the table loaded above
nf = pudl.transform.ferc1.fuel_by_plant_ferc1(fuel_ferc1)

key_cols = [
    'report_year',
    'utility_id_ferc1',
    'plant_name_ferc1',
]
id_cols = ['utility_id_pudl', 'utility_name_ferc1', 'plant_id_pudl', 'plant_id_ferc1']
ferc1_plants = (
    pd.merge(nf, steam_ferc1, on=key_cols, how='inner')
    # heat rate = fuel energy consumed per MWh of net generation
    .assign(heat_rate_mmbtu_mwh=lambda x: x.fuel_mmbtu / x.net_generation_mwh)
    .merge(steam_ferc1[key_cols + id_cols])
    #.query(f'report_year >= {start_year}')
)

In [109]:
ferc_small = ferc1_plants[[
    'report_year',
    'utility_id_ferc1',
    'plant_name_ferc1',
    'primary_fuel_by_mmbtu',
    'plant_id_pudl',
    'capacity_mw',
    'net_generation_mwh',
    'opex_fuel',
    'fuel_cost']]

In [87]:
#ferc1_plants.columns.to_list()

In [110]:
# example of difficult data

ferc_small.loc[(ferc_small['plant_id_pudl']==123) & (ferc_small['report_year']==2016)]

Unnamed: 0,report_year,utility_id_ferc1,plant_name_ferc1,primary_fuel_by_mmbtu,plant_id_pudl,capacity_mw,net_generation_mwh,opex_fuel,fuel_cost
8488,2016,89,columbia 1,coal,123,112.6,463964.0,12648598.0,12363170.0
8513,2016,89,columbia 2,coal,123,112.4,624504.0,16205510.0,15919290.0
8538,2016,89,columbia total,coal,123,225.0,1088468.0,28854108.0,28282270.0
18056,2016,194,columbia 1 (all),coal,123,556.0,2221726.967,59148521.0,59149470.0
18059,2016,194,columbia 1 (wpl),coal,123,256.9,1069401.614,28674248.0,28672640.0
18062,2016,194,columbia 2 (all),coal,123,556.0,2755172.549,70857110.0,70854110.0
18065,2016,194,columbia 2 (wpl),coal,123,256.9,1264637.996,32529438.0,32528110.0
18750,2016,195,columbia 1 & 2,coal,123,335.2,1577770.0,42492965.0,42496010.0


In [None]:
#ferc1_steam_count = ferc1_steam.groupby(
#    ['plant_id_pudl','report_year']).size().reset_index(name='count')

#ferc = ferc_small.groupby(['plant_id_pudl','primary_fuel_by_mmbtu','report_year']).size().reset_index(name='count')
#ferc.sort_values('count',ascending=False)
#123

In [62]:
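# `ferc_fuel` and `ferc_steam` are not defined in this notebook; this assumes they
# refer to the `fuel_ferc1` and `steam_ferc1` tables loaded above.
ferc1_merge = pd.merge(fuel_ferc1, steam_ferc1, on=['plant_id_pudl', 'report_year'], how='outer')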

In [105]:
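# `mcoe` is not defined earlier in this notebook; this assumes it is the
# generator-level MCOE table from PUDL.
mcoe = pudl_out.mcoe()
eia_subset = mcoe[[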
    'plant_id_pudl',
    'unit_id_pudl',
    'generator_id',
    'fuel_type_code_pudl',
    'report_date',
    'total_mmbtu',
    'capacity_mw',
    'net_generation_mwh',
    'heat_rate_mmbtu_mwh',
]].drop_duplicates()

#eia_by_plant = eia_subset.groupby(['plant_id_pudl','report_year'])
eia_subset = eia_subset.assign(report_year=lambda x: x.report_date.dt.year)

In [108]:
eia_subset.loc[(eia_subset['plant_id_pudl']==123)&(eia_subset['report_year']==2015)]

Unnamed: 0,plant_id_pudl,unit_id_pudl,generator_id,fuel_type_code_pudl,report_date,total_mmbtu,capacity_mw,net_generation_mwh,heat_rate_mmbtu_mwh,report_year
104278,123,1.0,1,coal,2015-01-01,27465940.0,556.0,2528128.0,10.864143,2015
104279,123,2.0,2,coal,2015-01-01,24833700.0,556.0,2331530.0,10.651244,2015
