# Add Overrides to Train the FERC-EIA Connector

The FERC-EIA record linkage process requires training data to work properly. Training matches also serve as overrides of the model's predicted matches. This notebook helps you check whether the machine learning algorithm did a good job of matching FERC and EIA records. If you find a good match (or correct a bad one), this process turns it into training data.
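The idea that training matches double as overrides can be sketched in pandas. This is a minimal illustration, not the `pudl_rmi` implementation: `apply_training_overrides` is a hypothetical helper, and it assumes both tables carry `record_id_ferc1` and `record_id_eia` columns (names taken from the connection output at the end of this notebook) and that a training match always replaces the model's prediction for the same FERC record.

```python
import pandas as pd


def apply_training_overrides(predicted: pd.DataFrame, training: pd.DataFrame) -> pd.DataFrame:
    """Replace predicted FERC-EIA matches with training matches where they exist.

    Hypothetical helper for illustration only; both frames need
    `record_id_ferc1` and `record_id_eia` columns.
    """
    # Index both tables by the FERC record so they can be compared directly.
    pred = predicted.set_index("record_id_ferc1")
    train = training.set_index("record_id_ferc1")
    # Keep the model's prediction only for FERC records with no training match...
    untouched = pred.loc[~pred.index.isin(train.index), ["record_id_eia"]]
    untouched = untouched.assign(match_type="prediction")
    # ...and take every training match as-is, overriding the model.
    overrides = train[["record_id_eia"]].assign(match_type="training")
    return pd.concat([untouched, overrides]).sort_index().reset_index()
```

Training records the model never predicted are still included, which mirrors the `no prediction; training` rows visible in the output table further down.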

This notebook has two purposes: 

1) [**Output override tools to verify connection between EIA and FERC1**](#verify-tools)
2) [**Upload changes to training data**](#upload-overrides)

## Settings

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pudl_rmi
# The wildcard import supplies generate_override_tools() and
# validate_and_add_to_training(), which are called below.
from pudl_rmi.create_override_spreadsheets import *

import pudl
import sqlalchemy as sa
import logging
import sys

import warnings
warnings.filterwarnings('ignore')

# Send all log messages (DEBUG and up) to the notebook output.
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]

pudl_settings = pudl.workspace.setup.get_defaults()
pudl_engine = sa.create_engine(pudl_settings["pudl_db"])
# freq='AS' aggregates the PUDL outputs annually.
pudl_out = pudl.output.pudltabl.PudlTabl(
    pudl_engine,
    freq='AS',
    fill_fuel_cost=True,
    roll_fuel_cost=True,
    fill_net_gen=True,
)
rmi_out = pudl_rmi.coordinate.Output(pudl_out)

## Specify Utilities & Years

In [3]:
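# Old format, kept for reference (superseded by the next cell):
# each utility maps to its PUDL and EIA utility IDs.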

specified_utilities = {
    # 'Dominion': {'utility_id_pudl': [292, 293, 349],
    #              'utility_id_eia': [17539, 17554, 19876]},
    # 'Evergy': {'utility_id_pudl': [159, 160, 161, 1270, 13243],
    #            'utility_id_eia': [10000, 10005, 56211, 3702, 55329]}, # pudl/eia 359/22500 --> 13243/55329, 1270/3702 --> BAD
    # 'IDACORP': {'utility_id_pudl': [140],
    #             'utility_id_eia': [9191]},
    # 'Duke': {'utility_id_pudl': [90, 91, 92, 93, 96, 97],
    #          'utility_id_eia': [5416, 6455, 15470, 55729, 3542, 3046]},
    'BHE': {'utility_id_pudl': [185, 246, 204, 287],
            'utility_id_eia': [12341, 14354, 13407, 17166]},
    'Southern': {'utility_id_pudl': [123, 18, 190, 11830],
                 'utility_id_eia': [7140, 195, 12686, 17622]},
    # 'NextEra': {'utility_id_pudl': [121, 130],
    #             'utility_id_eia': [6452, 7801]},
    # 'AEP': {'utility_id_pudl': [29, 301, 144, 275, 162, 361, 7],
    #         'utility_id_eia': [733, 17698, 9324, 15474, 22053, 20521, 343]},
    # 'Entergy': {'utility_id_pudl': [107, 106, 311, 113, 110],
    #             'utility_id_eia': [11241, 814, 12465, 55937, 13478]},
    # 'Xcel': {'utility_id_pudl': [224, 302, 272, 11297],
    #          'utility_id_eia': [13781, 13780, 17718, 15466]}
}

In [3]:
specified_utilities = {
    'BHE': [12341, 14354, 13407, 17166],
    #'Southern':[7140, 195, 12686, 17622]
}

specified_years = [
    2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 
    2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020
] 
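The older cell keyed each utility to both PUDL and EIA utility IDs, while the current cell maps each utility name directly to a list of EIA IDs. If you have an old-style dictionary on hand, a conversion like the one below produces the new shape. `to_eia_id_map` is a hypothetical helper, not part of `pudl_rmi`; the sketch also builds the same year list with `range` instead of typing each year.

```python
def to_eia_id_map(old_style: dict) -> dict:
    """Convert {'NAME': {'utility_id_pudl': [...], 'utility_id_eia': [...]}}
    into the newer {'NAME': [eia_ids]} format used in this notebook.

    Hypothetical helper for illustration only.
    """
    return {name: list(ids["utility_id_eia"]) for name, ids in old_style.items()}


old_style = {
    "BHE": {"utility_id_pudl": [185, 246, 204, 287],
            "utility_id_eia": [12341, 14354, 13407, 17166]},
}
specified_utilities = to_eia_id_map(old_style)

# range() excludes its stop value, so this yields 2005 through 2020.
specified_years = list(range(2005, 2021))
```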

<a id='verify-tools'></a>
## 1) Output Override Tools
Run the following function to generate Excel files named `<UTILITY>_fix_FERC-EIA_overrides.xlsx` in the `outputs/overrides` directory, based on the utilities and years specified above. Read the [Override Instructions](https://docs.google.com/document/d/1nJfmUtbSN-RT5U2Z3rJKfOIhWsRFUPNxs9NKTes0SRA/edit#) to learn how to begin fixing and verifying the FERC-EIA connections.
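Once the spreadsheets exist, you can list them from Python. The helper below, `find_override_files`, is hypothetical; it assumes the `<UTILITY>_fix_FERC-EIA_overrides*.xlsx` naming shown in this notebook, with a trailing wildcard because a processed file later in the notebook carries a year suffix (`BHE_fix_FERC-EIA_overrides_2020.xlsx`). Where `outputs/overrides` lives relative to your working directory is also an assumption.

```python
from pathlib import Path


def find_override_files(overrides_dir: Path, utilities) -> dict:
    """Map each utility name to the override spreadsheets found for it.

    Hypothetical helper; `utilities` can be the specified_utilities dict,
    since iterating a dict yields its keys (the utility names).
    """
    found = {}
    for utility in utilities:
        # The wildcard also matches variants such as a "_2020" suffix.
        pattern = f"{utility}_fix_FERC-EIA_overrides*.xlsx"
        found[utility] = sorted(overrides_dir.glob(pattern))
    return found
```

For example, `find_override_files(Path("outputs/overrides"), specified_utilities)` would return an empty list for any utility whose spreadsheet was not generated.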

In [7]:
generate_override_tools(pudl_out, rmi_out, specified_utilities, specified_years)

Generating inputs
Reading the FERC to EIA connection from /Users/aesharpe/Desktop/Work/Catalyst_Coop/Repos/rmi-ferc1-eia/outputs/ferc1_eia.pkl.gz
Prepping FERC-EIA table
Reading the EIA plant-parts from /Users/aesharpe/Desktop/Work/Catalyst_Coop/Repos/rmi-ferc1-eia/outputs/plant_parts_eia.pkl.gz
Prepping Plant Parts Table
Grabbing depreciation study output from /Users/aesharpe/Desktop/Work/Catalyst_Coop/Repos/rmi-ferc1-eia/outputs/deprish.pkl.gz
Prepping Deprish Data
Developing outputs for BHE
Getting utility-year subset for ferc_eia
Getting utility-year subset for ppl
Getting utility-year subset for deprish
Outputing table subsets to tabs



<a id='upload-overrides'></a>
## 2) Upload changes to training data
When you've finished editing `<UTILITY>_fix_FERC-EIA_overrides.xlsx` and want to add your changes to the official override CSV, move the file into the directory called `add_to_training` and then run the following function.
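Moving the finished spreadsheet can also be done from Python. This is a minimal sketch and not the `validate_and_add_to_training` call in the cell below: `stage_for_training` is a hypothetical helper, and the location of `add_to_training` relative to your working directory is an assumption you should check against the repository layout.

```python
import shutil
from pathlib import Path


def stage_for_training(spreadsheet: Path, add_to_training_dir: Path) -> Path:
    """Move a finished override spreadsheet into the add_to_training folder.

    Hypothetical helper; returns the spreadsheet's new path.
    """
    add_to_training_dir.mkdir(parents=True, exist_ok=True)
    destination = add_to_training_dir / spreadsheet.name
    if destination.exists():
        # Refuse to silently replace a spreadsheet that is already staged.
        raise FileExistsError(destination)
    return Path(shutil.move(str(spreadsheet), str(destination)))
```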

**Note:** If you have changed, or marked as TRUE, any records that were already overridden and included in the training data, set `expect_override_overrides = True`. Otherwise (with `expect_override_overrides = False`), the function checks whether you have accidentally changed values that were already matched.
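The kind of check that `expect_override_overrides` controls can be pictured as follows. This is a hypothetical re-creation of the idea, not the code that `validate_and_add_to_training` runs: it assumes both the edited overrides and the existing training data are keyed on `record_id_ferc1` with a `record_id_eia` column, and it flags any FERC record whose EIA match differs between the two.

```python
import pandas as pd


def check_override_overrides(new: pd.DataFrame, training: pd.DataFrame,
                             expect_override_overrides: bool = False) -> pd.DataFrame:
    """Return already-trained FERC records whose EIA match was changed.

    Hypothetical helper. Raises ValueError if any changes are found and
    `expect_override_overrides` is False.
    """
    # Inner merge: only FERC records present in both tables are compared.
    merged = new.merge(training, on="record_id_ferc1", suffixes=("_new", "_old"))
    old_id, new_id = merged["record_id_eia_old"], merged["record_id_eia_new"]
    # Two nulls count as unchanged; a plain != would treat NaN != NaN as a change.
    changed = merged[(old_id != new_id) & ~(old_id.isna() & new_id.isna())]
    if not changed.empty and not expect_override_overrides:
        raise ValueError(
            f"{len(changed)} already-trained FERC records were re-matched: "
            f"{changed['record_id_ferc1'].tolist()}"
        )
    return changed[["record_id_ferc1", "record_id_eia_old", "record_id_eia_new"]]
```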

For now, the module points to a COPY of the training data so that the official version isn't overwritten. To update the official training data, you'll need to change that path.

In [3]:
validate_and_add_to_training(
    pudl_out, rmi_out, expect_override_overrides=True
)

Reading the FERC to EIA connection from /Users/aesharpe/Desktop/Work/Catalyst_Coop/Repos/rmi-ferc1-eia/outputs/ferc1_eia.pkl.gz
Reading the EIA plant-parts from /Users/aesharpe/Desktop/Work/Catalyst_Coop/Repos/rmi-ferc1-eia/outputs/plant_parts_eia.pkl.gz
Processing fixes in BHE_fix_FERC-EIA_overrides_2020.xlsx
Validating overrides
Checking eia record id consistency for values that don't exist
Checking ferc record id consistency for values that don't exist
Checking for duplicate override ids
Checking for mismatched utility ids
Checking that year in override id matches report year
Combining all new overrides with existing training data
Found 119 new overrides
Adding record_id_ferc1 values with no EIA match to null_overrides csv
Found 2 new null matches
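Two of the validation steps logged above, the duplicate-ID check and the check that the year in each override ID matches `report_year`, can be sketched as below. This is an illustration under assumptions, not the module's code: `find_bad_overrides` is hypothetical, and the year-extraction patterns are inferred from the record IDs printed in this notebook (for example `f1_steam_2019_12_191_0_2` and `1240_gt_2019_plant_prime_mover_total_10005`).

```python
import re

import pandas as pd

# Patterns inferred from record IDs shown in this notebook (an assumption):
# FERC IDs put the year right after the table name; EIA IDs put it just
# before the plant-part label, which starts with "plant".
FERC_YEAR = re.compile(r"_(\d{4})_")
EIA_YEAR = re.compile(r"_(\d{4})_plant")


def _year(pattern, record_id):
    """Pull the 4-digit year out of a record ID, or None if absent."""
    match = pattern.search(record_id) if isinstance(record_id, str) else None
    return int(match.group(1)) if match else None


def find_bad_overrides(overrides: pd.DataFrame) -> pd.DataFrame:
    """Flag duplicated FERC IDs and IDs whose embedded year != report_year."""
    ferc_year = overrides["record_id_ferc1"].map(lambda r: _year(FERC_YEAR, r))
    eia_year = overrides["record_id_eia"].map(lambda r: _year(EIA_YEAR, r))
    flags = pd.DataFrame({
        "duplicate_ferc_id": overrides["record_id_ferc1"].duplicated(keep=False),
        "ferc_year_mismatch": ferc_year != overrides["report_year"],
        # A missing EIA ID (a null match) is not treated as a year mismatch.
        "eia_year_mismatch": eia_year.notna() & (eia_year != overrides["report_year"]),
    })
    # Keep only rows with at least one problem, alongside the flags.
    return overrides[flags.any(axis=1)].join(flags)
```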


In [3]:
rmi_out.ferc1_to_eia(clobber=True)

FERC to EIA granular connection not found at /Users/aesharpe/Desktop/Work/Catalyst_Coop/Repos/rmi-ferc1-eia/outputs/ferc1_eia.pkl.gz... Generating a new output.
Reading the EIA plant-parts from /Users/aesharpe/Desktop/Work/Catalyst_Coop/Repos/rmi-ferc1-eia/outputs/plant_parts_eia.pkl.gz
Preparing the FERC1 tables.
loading steam table
loading small gens table
loading hydro table
loading pumped storage table
prepping steam table
prepping hydro tables
combining all tables
Generated 168541 all candidate features.
Generated 3754 training candidate features.
We are about to test hyper parameters of the model while doing k-fold cross validation. This takes a few minutes....
train: newton-cg: c-1, cw-balanced, p-l2, l1-None
train: newton-cg: c-1, cw-balanced, p-l2, l1-None
train: newton-cg: c-1, cw-balanced, p-l2, l1-None
train: newton-cg: c-1, cw-balanced, p-none, l1-None
train: newton-cg: c-1, cw-balanced, p-none, l1-None
train: newton-cg: c-1, cw-balanced, p-none, l1-None
train: newton-cg: 

Unnamed: 0,record_id_ferc1,record_id_eia,match_type,plant_name_new,plant_part,report_year,report_date,ownership,plant_name_eia,plant_id_eia,generator_id,unit_id_pudl,prime_mover_code,energy_source_code_1,technology_description,ferc_acct_name,utility_id_eia,utility_id_pudl,true_gran,appro_part_label,appro_record_id_eia,record_count,fraction_owned,ownership_dupe,operational_status,operational_status_pudl,plant_id_pudl,total_fuel_cost_eia,fuel_cost_per_mmbtu_eia,net_generation_mwh_eia,capacity_mw_eia,capacity_factor_eia,total_mmbtu_eia,heat_rate_mmbtu_mwh_eia,fuel_type_code_pudl_eia,installation_year_eia,plant_part_id_eia,utility_id_ferc1,utility_name_ferc1,plant_id_ferc1,plant_name_ferc1,asset_retirement_cost,avg_num_employees,capacity_factor_ferc1,capacity_mw_ferc1,capex_equipment,capex_land,capex_per_mw,capex_structures,capex_total,construction_type,construction_year,installation_year_ferc1,net_generation_mwh_ferc1,not_water_limited_capacity_mw,opex_allowances,opex_boiler,opex_coolants,opex_electric,opex_engineering,opex_fuel,fuel_cost_per_mwh,opex_misc_power,opex_misc_steam,opex_nonfuel,opex_nonfuel_per_mwh,opex_operations,opex_per_mwh,opex_plant,opex_production_total,opex_rents,opex_steam,opex_steam_other,opex_structures,opex_transfer,peak_demand_mw,plant_capability_mw,plant_hours_connected_while_generating,plant_type,water_limited_capacity_mw,ferc_license_id,fuel_cost_per_mmbtu_ferc1,fuel_type,opex_maintenance,opex_total,plant_name_clean,total_cost_of_plant,capex_facilities,capex_roads,net_capacity_adverse_conditions_mw,net_capacity_favorable_conditions_mw,opex_dams,opex_generation_misc,opex_hydraulic,opex_misc_plant,opex_water_for_power,capex_equipment_electric,capex_equipment_misc,capex_wheels_turbines_generators,energy_used_for_pumping_mwh,net_load_mwh,opex_production_before_pumping,opex_pumped_storage,opex_pumping,total_fuel_cost_ferc1,total_mmbtu_ferc1,fuel_type_code_pudl_ferc1,plant_id_report_year,plant_id_report_year_util_id,heat_rate_mmbtu_mwh_ferc1,capex_wo_retirement_total,capex_total_shifted,capex_annual_addition,capex_annual_addition_rolling,capex_annual_per_mwh,capex_annual_per_mw,capex_annual_per_kw,capex_annual_per_mwh_rolling,capex_annual_per_mw_rolling,capex_annual_addition_gen_std,capex_annual_addition_gen_mean,capex_annual_addition_diff_mean
0,f1_steam_1994_12_1_0_1,,,,,1994,1994-01-01,,,,,,,,,,,7,,,,,,,,,527,,,,,,,,,,,1,AEP Generating Company,1051,rockport unit 1,,,0.819843,650.0,490684127.0,6395551.0,894688.3,84467746.0,581547424.0,conventional,1984,1984.0,4668184.0,650.0,,3185935.0,,353599.0,427906.0,51694529.0,11.073799,1040610.0,781181.0,8300498.0,1.7781,1032559.0,12.9,631598.0,59995027.0,7559.0,442763.0,,396788.0,,650.0,,,steam,,,,,,,,,,,,,,,,,,,,,,,,,,,,,527_1994,527_1994_7,,581547424.0,,,,,,,,,,,
1,f1_steam_1995_12_1_0_1,,,,,1995,1995-01-01,,,,,,,,,,,7,,,,,,,,,527,,,,,,,,,,,1,AEP Generating Company,1051,rockport u1 aeg,,,0.755040,650.0,490674427.0,6395551.0,894797.6,84548492.0,581618470.0,conventional,1984,1984.0,4299195.0,650.0,,4320386.0,,368314.0,542942.0,47711517.0,11.097779,1037095.0,721517.0,10016176.0,2.329779,1230673.0,13.4,1023977.0,57727693.0,5712.0,427845.0,,337715.0,,650.0,,,steam,,,,,,,,,,,,,,,,,,,,,,,,,,,,,527_1995,527_1995_7,,581618470.0,581547424.0,71046.0,,0.016525,109.301538,0.109302,,,,71046.0,0.0
2,f1_steam_1996_12_1_0_1,,,,,1996,1996-01-01,,,,,,,,,,,7,,,,,,,,,527,,,,,,,,,,,1,AEP Generating Company,1051,rockport unit 1 aeg,,,0.776630,650.0,490043406.0,6472089.0,893923.2,84534613.0,581050108.0,conventional,1984,1984.0,4422134.0,650.0,,3127777.0,,300238.0,1499113.0,48800291.0,11.035462,1054245.0,538077.0,9075434.0,2.052275,1025399.0,13.0,568940.0,57875725.0,4105.0,567317.0,,390223.0,,650.0,,,steam,,,,,,,,,,,,,,,,,,,,,,,,,,,,,527_1996,527_1996_7,,581050108.0,581618470.0,-568362.0,-313666.333333,-0.128527,-874.403077,-0.874403,-0.070931,-482.563590,,-568362.0,0.0
3,f1_steam_1997_12_1_0_1,,,,,1997,1997-01-01,,,,,,,,,,,7,,,,,,,,,527,,,,,,,,,,,1,AEP Generating Company,1051,rockport unit 1 aeg,,,0.731598,650.0,489577814.0,6472089.0,893240.6,84556522.0,580606425.0,conventional,1984,1984.0,4165721.0,650.0,,4389279.0,,351896.0,1968381.0,47929720.0,11.505744,1114758.0,353970.0,10948475.0,2.628231,1165503.0,14.1,735897.0,58878195.0,25.0,551801.0,,316965.0,,650.0,,,steam,,,,,,,,,,,,,,,,,,,,,,,,,,,,,527_1997,527_1997_7,,580606425.0,581050108.0,-443683.0,-495501.000000,-0.106508,-682.589231,-0.682589,-0.118947,-762.309231,,-443683.0,0.0
4,f1_steam_1998_12_1_0_1,,,,,1998,1998-01-01,,,,,,,,,,,7,,,,,,,,,527,,,,,,,,,,,1,AEP Generating Company,1051,rockport unit 1 aeg,,,0.792891,650.0,489104691.0,6472089.0,892510.7,84555187.0,580131967.0,conventional,1984,1984.0,4514723.0,650.0,,2521396.0,,371030.0,1616482.0,51178213.0,11.335848,1125021.0,454483.0,8333946.0,1.845948,998231.0,13.2,417617.0,59512159.0,,491066.0,,338620.0,,650.0,,,steam,,,,,,,,,,,,,,,,,,,,,,,,,,,,,527_1998,527_1998_7,,580131967.0,580606425.0,-474458.0,-156827.000000,-0.105091,-729.935385,-0.729935,-0.034737,-241.272308,,-474458.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51541,f1_steam_2019_12_191_0_2,1240_gt_2019_plant_prime_mover_total_10005,no prediction; training,Gordon Evans GT,plant_prime_mover,2019,2019-01-01,total,Gordon Evans,1240,,,GT,NG,Natural Gas Fired Combustion Turbine,Other,10005,160,True,plant_prime_mover,1240_gt_2019_plant_prime_mover_total_10005,2.0,1.0,False,existing,operating,244,,,303314.0,375.1,0.092308,,,gas,,1240_GT_plant_prime_mover_total_10005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,
51542,f1_gnrt_plant_2019_12_182_0_2,6074_pv_2019_plant_prime_mover_total_56211,no prediction; training,Greenwood PV,plant_prime_mover,2019,2019-01-01,total,Greenwood,6074,5,,PV,SUN,Solar Photovoltaic,Other,56211,161,True,plant_prime_mover,6074_pv_2019_plant_prime_mover_total_56211,2.0,1.0,False,existing,operating,238,,,4545.0,3.0,0.172945,,,solar,,6074_PV_plant_prime_mover_total_56211,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,
51543,f1_steam_2019_12_182_1_3,2098_1_2019_plant_unit_total_56211,no prediction; training,,,2019,2019-01-01,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,
51544,f1_steam_2019_12_191_2_1,60689_2019_plant_total_22500,no prediction; training,Western Plains Wind Farm,plant,2019,2019-01-01,total,Western Plains Wind Farm,60689,1,,WT,WND,Onshore Wind Turbine,Other,22500,359,True,plant,60689_2019_plant_total_22500,1.0,1.0,False,existing,operating,10479,,,1130116.0,280.6,0.459760,,,wind,,60689_plant_total_22500,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,,,,,,,,,,,
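The training log above (`train: newton-cg: c-1, cw-balanced, p-l2, l1-None`) reads like a hyperparameter sweep over scikit-learn `LogisticRegression` settings (solver, `C`, `class_weight`, penalty) with k-fold cross validation. Reading the log that way is an assumption; the sketch below shows that general technique with scikit-learn's `GridSearchCV` on synthetic data, not the real candidate features or the module's actual grid.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the candidate-feature matrix: most
# candidate FERC-EIA pairs are non-matches, so positives are rare.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.9, 0.1],
                           random_state=0)

# Grid over settings of the kind named in the log: solver newton-cg with the
# default L2 penalty, several values of C, and balanced class weights.
param_grid = {"C": [0.1, 1.0, 10.0], "class_weight": ["balanced"]}

search = GridSearchCV(
    LogisticRegression(solver="newton-cg", max_iter=1000),
    param_grid,
    # Stratified folds keep the rare positive class in every split.
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="f1",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

`class_weight="balanced"` reweights classes inversely to their frequency, which matters here because true matches are a small fraction of all candidate pairs (3,754 training candidates versus 168,541 total candidates in the log).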
