# Manually Overriding FERC-EIA Record Linkage

The FERC-EIA record linkage process requires training data in order to work properly. Training matches also serve as overrides. This notebook helps you check whether the machine learning algorithm did a good job of matching FERC and EIA records. If you confirm a good match (or correct a bad one), this process turns it into training data.

This notebook has three steps:

- [**Step 1: Output Override Tools:**](#verify-tools) Where you create and output the spreadsheets used to conduct the manual overrides.
- [**Step 2: Validate New Training Data:**](#validate) Where you check that the overrides we made are sound.
- [**Step 3: Upload Changes to Training Data:**](#upload-overrides) Where you integrate the overrides into the training data.
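
To make the idea that training matches also serve as overrides concrete, here is a minimal sketch of one way a training match could take precedence over a model prediction for the same FERC record. The toy tables and the `source` column are hypothetical, and this is not necessarily how `pudl_rmi` combines the two internally.

```python
import pandas as pd

# Hypothetical toy data; the real tables have many more columns.
predicted = pd.DataFrame(
    {
        "record_id_ferc1": ["f1_a", "f1_b", "f1_c"],
        "record_id_eia": ["eia_1", "eia_2", "eia_3"],
    }
)
training = pd.DataFrame(
    {
        "record_id_ferc1": ["f1_b"],
        "record_id_eia": ["eia_9"],
    }
)

# Put the training rows first so they win when duplicates are dropped.
combined = (
    pd.concat(
        [training.assign(source="training"), predicted.assign(source="model")]
    )
    .drop_duplicates(subset="record_id_ferc1", keep="first")
    .sort_values("record_id_ferc1")
    .reset_index(drop=True)
)
print(combined)
```

In this sketch the FERC record `f1_b` keeps the EIA record from the training data, while the other two keep the model's prediction.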

## Settings

In [775]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [776]:
import pudl_rmi
from pudl_rmi.create_override_spreadsheets import *

import pudl
import sqlalchemy as sa
import logging
import os  # used below to list the override files
import sys
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]

pudl_settings = pudl.workspace.setup.get_defaults()
pudl_engine = sa.create_engine(pudl_settings["pudl_db"])
pudl_out = pudl.output.pudltabl.PudlTabl(
    pudl_engine,
    freq="AS",
    fill_fuel_cost=True,
    roll_fuel_cost=True,
    fill_net_gen=True,
)
rmi_out = pudl_rmi.coordinate.Output(pudl_out)

In [None]:
# old

specified_utilities = {
    # 'Dominion': {'utility_id_pudl': [292, 293, 349],
    #              'utility_id_eia': [17539, 17554, 19876]},
    # 'Evergy': {'utility_id_pudl': [159, 160, 161, 1270, 13243],
    #            'utility_id_eia': [10000, 10005, 56211, 25000]},
    # 'IDACORP': {'utility_id_pudl': [140],
    #             'utility_id_eia': [9191]},
    # 'Duke': {'utility_id_pudl': [90, 91, 92, 93, 96, 97],
    #          'utility_id_eia': [5416, 6455, 15470, 55729, 3542, 3046]},
    'BHE': {'utility_id_pudl': [185, 246, 204, 287],
            'utility_id_eia': [12341, 14354, 13407, 17166]},
    'Southern': {'utility_id_pudl': [123, 18, 190, 11830],
                 'utility_id_eia': [7140, 195, 12686, 17622]},
    # 'NextEra': {'utility_id_pudl': [121, 130],
    #             'utility_id_eia': [6452, 7801]},
    # 'AEP': {'utility_id_pudl': [29, 301, 144, 275, 162, 361, 7],
    #         'utility_id_eia': [733, 17698, 9324, 15474, 22053, 20521, 343]},
    # 'Entergy': {'utility_id_pudl': [107, 106, 311, 113, 110],
    #             'utility_id_eia': [11241, 814, 12465, 55937, 13478]},
    # 'Xcel': {'utility_id_pudl': [224, 302, 272, 11297],
    #          'utility_id_eia': [13781, 13780, 17718, 15466]}
}

<a id='verify-tools'></a>
## Step 1: Output Override Tools

In [607]:
specified_utilities = {
    #'BHE': [12341, 14354, 13407, 17166], 
    #'Southern':[7140, 195, 12686, 17622]
    #'Dominion': [17539, 17554, 19876, 5248] # 5248...
    #'Entergy': [11241, 814, 12465, 55937, 13478],
    #'Xcel': [13781, 13780, 17718, 15466],
    #'NextEra': [6452, 7801]
    #'IDACORP': [9191]
    #'Evergy': [10000, 10005, 56211, 22500]
    'Duke': [3046, 3542, 5416, 6455, 15470, 55729]
}

specified_years = [2020
    # 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 
    # 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020
]

Run the following function to create Excel files named `<UTILITY>_fix_FERC-EIA_overrides.xlsx` in the `outputs/overrides` directory, based on the utility and year inputs you specified above. Read the [Override Instructions](https://docs.google.com/document/d/1nJfmUtbSN-RT5U2Z3rJKfOIhWsRFUPNxs9NKTes0SRA/edit#) to learn how to begin fixing and verifying the FERC-EIA connections.
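
As a quick sanity check after generating the spreadsheets, the sketch below builds the file paths you would expect from the naming pattern above and reports whether each one exists. The helper name `expected_override_paths` and the `repo_root` argument are hypothetical; adjust the root to wherever your `outputs/overrides` directory lives.

```python
from pathlib import Path


def expected_override_paths(repo_root, utilities):
    """Return the path where each utility's override workbook should appear.

    Assumes the `outputs/overrides` layout and the
    `<UTILITY>_fix_FERC-EIA_overrides.xlsx` naming pattern described above.
    """
    out_dir = Path(repo_root) / "outputs" / "overrides"
    return {
        name: out_dir / f"{name}_fix_FERC-EIA_overrides.xlsx"
        for name in utilities
    }


# Iterating over a dict like `specified_utilities` yields the utility names.
paths = expected_override_paths(".", {"Duke": [3046, 3542]})
for name, path in paths.items():
    print(name, path, "exists" if path.exists() else "missing")
```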

In [608]:
generate_all_override_spreadsheets(pudl_out, rmi_out, specified_utilities, specified_years)

Generating inputs
Reading the FERC to EIA connection from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/ferc1_eia.pkl.gz
Prepping FERC-EIA table
Adding pct diff col: net_generation_mwh_pct_diff
Adding pct diff col: capacity_mw_pct_diff
Adding pct diff col: capacity_factor_pct_diff
Adding pct diff col: total_fuel_cost_pct_diff
Adding pct diff col: total_mmbtu_pct_diff
Adding pct diff col: fuel_cost_per_mmbtu_pct_diff
Reading the EIA plant-parts from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/plant_parts_eia.pkl.gz
Prepping Plant Parts Table
Grabbing depreciation study output from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/deprish.pkl.gz
Prepping Deprish Data
Developing outputs for Duke
Getting utility-year subset for ferc_eia
Getting utility-year subset for ppl
Getting utility-year subset for deprish
Outputing table subsets to tabs



<a id='validate'></a>
## Step 2: Validate New Training Data

Once you've finished checking the maps, make sure everything you want to validate is set to `verified=TRUE`. Then move the file into the `add_to_training` folder and run the following cells:
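
Before moving a file, it can help to preview which rows are actually marked as verified. The following is a minimal sketch, assuming the spreadsheet has a `verified` column that may come back from Excel as booleans or as strings such as `"TRUE"`. The helper `select_verified` and the toy data are hypothetical and not part of `pudl_rmi`.

```python
import pandas as pd


def select_verified(overrides: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose `verified` value reads as TRUE.

    Values are normalized to upper-case strings first, so both the boolean
    True and the text "TRUE" count as verified.
    """
    flags = overrides["verified"].astype(str).str.strip().str.upper()
    return overrides[flags == "TRUE"]


# Hypothetical toy data standing in for a loaded override spreadsheet.
toy = pd.DataFrame(
    {
        "record_id_ferc1": ["f1_a", "f1_b", "f1_c", "f1_d"],
        "verified": [True, "TRUE", None, False],
    }
)
print(select_verified(toy))
```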

In [897]:
# Define function inputs
ferc1_eia_df = rmi_out.ferc1_to_eia()
ppl_df = rmi_out.plant_parts_eia().reset_index()
utils_df = pudl_out.utils_eia860()
training_df = pd.read_csv(pudl_rmi.TRAIN_FERC1_EIA_CSV)
path_to_overrides = pudl_rmi.INPUTS_DIR / "add_to_training" 

override_files = os.listdir(path_to_overrides)
override_files = [file for file in override_files if file.endswith(".xlsx")]

Reading the FERC to EIA connection from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/ferc1_eia.pkl.gz
Reading the EIA plant-parts from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/plant_parts_eia.pkl.gz
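
The listing step above keeps only `.xlsx` files, and the validation loop further down skips names that start with `~$`, which are the temporary lock files Excel creates while a workbook is open. The sketch below combines both filters in one place using `pathlib`. The helper name `list_override_workbooks` is hypothetical.

```python
from pathlib import Path


def list_override_workbooks(directory):
    """Return sorted .xlsx paths in `directory`, skipping Excel lock files."""
    return sorted(
        path
        for path in Path(directory).glob("*.xlsx")
        if not path.name.startswith("~$")
    )
```

Calling `list_override_workbooks(path_to_overrides)` returns full paths, so each one can be passed directly to `pd.read_excel`.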


In [915]:
training_df

Unnamed: 0,record_id_eia,record_id_ferc1,signature_1,signature_2,notes
0,2707_hy_2018_plant_prime_mover_total_3046,f1_hydro_2018_12_17_0_1,OIU,,
1,2707_hy_2019_plant_prime_mover_total_3046,f1_hydro_2019_12_17_0_1,OIU,,
2,3266_2019_plant_total_5416_retired,f1_hydro_2019_12_45_2_1,OIU,,
3,6113_2018_plant_owned_15470,f1_steam_2018_12_144_0_3,OIU,,
4,1001_gt_2018_plant_prime_mover_total_15470,f1_steam_2018_12_144_0_4,OIU,,
...,...,...,...,...,...
4299,822_2019_plant_total_9191,f1_hydro_2019_12_70_2_1,JER,SW,upper salmon in FERC = upper salmon A + upper ...
4300,821_2020_plant_total_9191,f1_hydro_2020_12_70_1_5,AS,,
4301,818_2020_plant_total_9191,f1_hydro_2020_12_70_2_2,AS,,
4302,816_2020_plant_total_9191,f1_hydro_2020_12_70_2_4,AS,,


In [892]:
logger.setLevel(logging.INFO)

for file in override_files:
    if not file.startswith("~$"):
        print(f"VALIDATING {file} ************** ")
        file_df = pd.read_excel(path_to_overrides / file)

        validate_override_fixes(
            validated_connections=file_df,
            utils_eia860=utils_df,
            ppl=ppl_df,
            ferc1_eia=ferc1_eia_df,
            training_data=training_df,
            expect_override_overrides=True,
            expect_utility_missmatch=True
        )
    print(" ")

VALIDATING Evergy_fix_FERC-EIA_overrides - sbw 07172022 (1).xlsx ************** 
Checking record_id_eia_override_1 consistency for values that don't exist
Checking record_id_ferc1 consistency for values that don't exist
Checking for duplicate override ids
Checking for mismatched utility ids
Found the following utility missmatches. Make sure you approve them all! 
                                                 self  other
record_id_ferc1               plant_name_ferc1              
f1_gnrt_plant_2005_12_182_0_1 westar wind       161.0   6824
f1_gnrt_plant_2006_12_182_0_1 westar wind       161.0   6824
f1_gnrt_plant_2007_12_182_0_1 westar wind       161.0  11098
f1_steam_2005_12_191_1_4      gordon evans ctf  359.0  10743
f1_steam_2005_12_182_1_5      pueblo            161.0   6824
...                                               ...    ...
f1_steam_2015_12_191_0_2      gordon evans ctf  359.0    160
f1_steam_2016_12_191_0_2      gordon evans ctf  359.0    160
f1_steam_2017_12_191_0_2

In [None]:
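logger.debug("Checking for mismatched utility ids")
# NOTE: `evergy` is not defined anywhere in this notebook. It is presumably the
# Evergy overrides spreadsheet loaded into a DataFrame in an earlier session.
compare_evergy_utils = evergy.merge(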
    ppl_df[["record_id_eia", "utility_id_eia"]].drop_duplicates(),
    left_on="record_id_eia_override_1",
    right_on="record_id_eia",
    how="left",
    suffixes=("", "_ppl"),
)

# Now merge the utility_id_pudl from EIA in so that you can compare it with the
# utility_id_pudl from FERC that's already in the overrides
compare_evergy_utils = compare_evergy_utils.merge(
    utils_df[["utility_id_eia", "utility_id_pudl"]].drop_duplicates(),
    left_on="utility_id_eia_ppl",
    right_on="utility_id_eia",
    how="left",
    suffixes=("", "_utils"),
)

compare_evergy_utils.set_index(["record_id_ferc1", "plant_name_ferc1"])["utility_id_pudl"].compare(
    compare_evergy_utils.set_index(["record_id_ferc1", "plant_name_ferc1"])["utility_id_pudl_utils"]
)

Unnamed: 0_level_0,Unnamed: 1_level_0,self,other
record_id_ferc1,plant_name_ferc1,Unnamed: 2_level_1,Unnamed: 3_level_1
f1_gnrt_plant_2005_12_182_0_1,westar wind,161.0,6824
f1_gnrt_plant_2006_12_182_0_1,westar wind,161.0,6824
f1_gnrt_plant_2007_12_182_0_1,westar wind,161.0,11098
f1_steam_2005_12_191_1_4,gordon evans ctf,359.0,10743
f1_steam_2005_12_182_1_5,pueblo,161.0,6824
...,...,...,...
f1_steam_2015_12_191_0_2,gordon evans ctf,359.0,160
f1_steam_2016_12_191_0_2,gordon evans ctf,359.0,160
f1_steam_2017_12_191_0_2,gordon evans ctf,359.0,160
f1_steam_2018_12_191_0_2,gordon evans ctf,359.0,160
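
The two merges above can be hard to follow on the full tables, so here is a self-contained sketch of the same pattern on toy data: look up the EIA utility for each override record, translate it to a PUDL utility ID, and keep the rows where that ID disagrees with the one already in the overrides. All IDs and record names below are hypothetical.

```python
import pandas as pd

# Hypothetical toy stand-ins for the overrides, the PPL, and the utilities table.
overrides = pd.DataFrame(
    {
        "record_id_ferc1": ["f1_a", "f1_b"],
        "record_id_eia_override_1": ["eia_1", "eia_2"],
        "utility_id_pudl": [161, 359],
    }
)
ppl = pd.DataFrame(
    {"record_id_eia": ["eia_1", "eia_2"], "utility_id_eia": [100, 200]}
)
utils = pd.DataFrame(
    {"utility_id_eia": [100, 200], "utility_id_pudl": [161, 999]}
)

# Step 1: attach the EIA utility that owns each override record.
merged = overrides.merge(
    ppl,
    left_on="record_id_eia_override_1",
    right_on="record_id_eia",
    how="left",
)
# Step 2: translate that EIA utility into a PUDL utility ID. The right-hand
# `utility_id_pudl` column gets the `_utils` suffix because the name clashes.
merged = merged.merge(
    utils, on="utility_id_eia", how="left", suffixes=("", "_utils")
)
# Step 3: keep rows where the FERC-side and EIA-side PUDL IDs disagree.
mismatches = merged[merged["utility_id_pudl"] != merged["utility_id_pudl_utils"]]
print(mismatches[["record_id_ferc1", "utility_id_pudl", "utility_id_pudl_utils"]])
```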


In [239]:
compare_evergy_utils[compare_evergy_utils["utility_id_pudl"]!=compare_evergy_utils["utility_id_pudl_utils"]][
    ["record_id_eia_override_1",
     "record_id_eia_override_2",
     "utility_name_ferc1",
     "utility_id_pudl",
     "utility_id_pudl_utils",
     "notes"]].drop_duplicates(subset=["utility_id_pudl", "utility_id_pudl_utils", "notes"])

Unnamed: 0,record_id_eia_override_1,record_id_eia_override_2,utility_name_ferc1,utility_id_pudl,utility_id_pudl_utils,notes
1,56219_2005_plant_owned_770,,KCP&L Greater Missouri Operations Company,161,6824,"wrong utility, right plant"
4,56219_2006_plant_owned_770,,KCP&L Greater Missouri Operations Company,161,6824,"wrong utility, right plant; ferc net gen wrong..."
7,56219_2007_plant_owned_12695,,KCP&L Greater Missouri Operations Company,161,11098,"wrong utility, right plant"
28,1240_2005_plant_owned_10015,,"Westar Energy, Inc.",359,10743,"wrong utility, right plant"
50,6516_2008_plant_total_56146,,KCP&L Greater Missouri Operations Company,161,43,"wrong utility, right plant"
53,1230_gt_2007_plant_prime_mover_total_56032,,KCP&L Greater Missouri Operations Company,161,13455,"wrong utility, right plant"
71,460_ic_2008_plant_prime_mover_total_56146,,KCP&L Greater Missouri Operations Company,161,43,"wrong utility, right plant, net gen off"
181,2098_gt_2005_plant_prime_mover_total_770,,KCP&L Greater Missouri Operations Company,161,6824,"wrong utility, right plant part"
196,6065_2005_plant_owned_17881,,KCP&L Greater Missouri Operations Company,161,7308,"wrong utility, right plant part"
202,6065_2007_plant_owned_761,,KCP&L Greater Missouri Operations Company,161,7232,"wrong utility, right plant part"


## Step 2.1: Examine Overrides More Closely

This lets you inspect your overrides file in pandas rather than Excel. It runs the best match column calculations on the file so you can scrutinize your overrides.

In [774]:
check_overrides_dict = {}
for file in override_files:
    if not file.startswith("~$"):
        file_df = pd.read_excel(path_to_overrides / file)
        logger.info(f"Creating a closer look at {file}")
        logger.info(" ")
        check_overrides_dict[file.split("_")[0]] = compare_override_matches(file_df, ppl_df)
        logger.info(" ")

FileNotFoundError: [Errno 2] No such file or directory: '/Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/inputs/add_to_training/all_years_DUKE_fix_FERC-EIA_overrides.xlsx'

In [345]:
check_overrides_dict.keys()
nex = check_overrides_dict["NextEra"]

## Step 2.2: Check PPL for Matches

If you're comfortable with code, it is sometimes easier to search the PPL (the EIA plant-parts list) directly for the records you're looking for rather than using the spreadsheet. This is especially true when a record may have fallen through the cracks and been assigned to a different utility, or when the PPL has been updated since you made the spreadsheet.

In [610]:
useful_cols = [
     "true_gran",
     "ownership_dupe",
     "record_id_eia", 
     "plant_id_eia", 
     "utility_id_eia", 
     "report_year", 
     "generator_id", 
     "plant_name_new", 
     "capacity_mw", 
     "net_generation_mwh",
     "installation_year",
     "technology_description",
]

In [773]:
ppl_df[
    #(ppl_df["record_id_eia"]=="3283_2006_plant_total_17539")
    #(ppl_df["plant_id_eia"]==6043)
    (ppl_df["plant_id_pudl"]==145)
    #ppl_df["plant_name_new"].str.contains("Cape Fear")
    & (ppl_df["report_date"].dt.year.isin([2005]))
    #& (ppl_df["utility_id_eia"]==3542)
    #&(ppl_df["capacity_mw"]==75)
    #& (ppl_df["net_generation_mwh"] > 1900)
    #& (ppl_df["net_generation_mwh"] < 2000)
    #& (ppl_df["capacity_mw"]> 500)
    #& (ppl_df["capacity_mw"]>1000)
    #& (ppl_df["technology_description"].str.contains("Solar"))
    & (ppl_df["true_gran"])
    & (ppl_df["ownership_dupe"]==False)
].sort_values(["report_year", "capacity_mw"])[useful_cols].head(60).reset_index(drop=True)#[13:14].record_id_eia.item()

Unnamed: 0,true_gran,ownership_dupe,record_id_eia,plant_id_eia,utility_id_eia,report_year,generator_id,plant_name_new,capacity_mw,net_generation_mwh,installation_year,technology_description
0,True,False,628_2005_plant_owned_14610,628,14610,2005,,Crystal River,14.2464,102443.2,1984,
1,True,False,628_2005_plant_owned_21554,628,21554,2005,,Crystal River,15.1368,108846.0,1984,
2,True,False,628_2005_plant_owned_99996,628,99996,2005,,Crystal River,43.80768,315013.0,1984,
3,True,False,628_1966_2005_plant_operating_year_total_6455,628,6455,2005,1,Crystal River 1966,440.5,2864798.0,1966,Conventional Steam Coal
4,True,False,628_1969_2005_plant_operating_year_total_6455,628,6455,2005,2,Crystal River 1969,523.8,3406541.0,1969,Conventional Steam Coal
5,True,False,628_1982_2005_plant_operating_year_total_6455,628,6455,2005,ST4,Crystal River 1982,739.2,4807398.0,1982,Conventional Steam Coal
6,True,False,628_1984_2005_plant_operating_year_total_6455,628,6455,2005,5,Crystal River 1984,739.2,4807398.0,1984,Conventional Steam Coal
7,True,False,628_nuclear_2005_plant_technology_owned_6455,628,6455,2005,3,Crystal River Nuclear,817.20912,5876401.0,1977,Nuclear
8,True,False,628_nuclear_2005_plant_technology_total_6455,628,6455,2005,3,Crystal River Nuclear,890.4,6402703.0,1977,Nuclear
9,True,False,628_conventional_steam_coal_2005_plant_technol...,628,6455,2005,,Crystal River Conventional Steam Coal,2442.7,15886130.0,1984,Conventional Steam Coal
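
Rather than commenting filters in and out as in the cell above, the same search can be wrapped in a small function with optional arguments. This is only a sketch: the helper `search_ppl` is hypothetical, and it assumes the `plant_name_new`, `report_year`, `utility_id_eia`, and `capacity_mw` columns shown in the output above.

```python
import pandas as pd


def search_ppl(ppl, plant_name=None, year=None, utility_id_eia=None):
    """Filter a plant-parts table by any combination of optional criteria."""
    mask = pd.Series(True, index=ppl.index)
    if plant_name is not None:
        # Case-insensitive literal substring match; missing names never match.
        mask &= ppl["plant_name_new"].str.contains(
            plant_name, case=False, na=False, regex=False
        )
    if year is not None:
        mask &= ppl["report_year"] == year
    if utility_id_eia is not None:
        mask &= ppl["utility_id_eia"] == utility_id_eia
    return ppl[mask].sort_values(["report_year", "capacity_mw"])


# Hypothetical toy rows loosely modeled on the output above.
toy_ppl = pd.DataFrame(
    {
        "plant_name_new": ["Crystal River", "Crystal River 1966", "C J Strike"],
        "report_year": [2005, 2005, 2020],
        "utility_id_eia": [14610, 6455, 9191],
        "capacity_mw": [14.2, 440.5, 82.8],
    }
)
print(search_ppl(toy_ppl, plant_name="crystal", year=2005))
```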


In [893]:
ppl_df[ppl_df["record_id_eia"]=="812_2020_plant_total_9191"].sort_values(["report_year", "capacity_mw"])[useful_cols]

Unnamed: 0,true_gran,ownership_dupe,record_id_eia,plant_id_eia,utility_id_eia,report_year,generator_id,plant_name_new,capacity_mw,net_generation_mwh,installation_year,technology_description
264609,True,False,812_2020_plant_total_9191,812,9191,2020,,C J Strike,82.8,447516.0,1952,Conventional Hydroelectric


In [546]:
ppl_df[ppl_df["record_id_eia"].str.contains("612_natural_gas_fired_combined_cycle_2005_plan")].sort_values(["report_year", "capacity_mw"])[useful_cols][1:2].record_id_eia.item()

'612_natural_gas_fired_combined_cycle_2005_plant_technology_total_6452'

In [894]:
utils = pudl_out.utils_eia860()
utils[utils["utility_id_eia"]==9191]

Unnamed: 0,report_date,utility_id_eia,utility_id_pudl,utility_name_eia,address_2,attention_line,city,contact_firstname,contact_firstname_2,contact_lastname,contact_lastname_2,contact_title,contact_title_2,entity_type,phone_extension,phone_extension_2,phone_number,phone_number_2,plants_reported_asset_manager,plants_reported_operator,plants_reported_other_relationship,plants_reported_owner,state,street_address,zip_code,zip_code_4
27733,2021-01-01,9191,140,Idaho Power Co,,,,,,,,,,,,,,,,,,,,,,
27734,2020-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,I,,,,,,,,True,ID,1221 W. Idaho Street,83702.0,
27735,2019-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,I,,,,,,,,True,ID,1221 W. Idaho Street,83702.0,
27736,2018-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,I,,,,,,,,True,ID,1221 W. Idaho Street,83702.0,
27737,2017-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,I,,,,,,,,True,ID,1221 W. Idaho Street,83702.0,
27738,2016-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,I,,,,,,,,True,ID,1221 W. Idaho Street,83702.0,
27739,2015-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,I,,,,,,True,,True,ID,1221 W. Idaho Street,83702.0,
27740,2014-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,I,,,,,,True,,True,ID,1221 W. Idaho Street,83702.0,
27741,2013-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,I,,,,,True,True,True,True,ID,1221 W. Idaho Street,83702.0,
27742,2012-01-01,9191,140,Idaho Power Co,,,Boise,,,,,,,,,,,,,,,,ID,,83707.0,


In [196]:
steam[steam["utility_id_pudl"]==161]

Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,plant_id_pudl,plant_id_ferc1,plant_name_ferc1,asset_retirement_cost,avg_num_employees,capacity_factor,capacity_mw,capex_equipment,capex_land,capex_per_mw,capex_structures,capex_total,construction_type,construction_year,installation_year,net_generation_mwh,not_water_limited_capacity_mw,opex_allowances,opex_boiler,opex_coolants,opex_electric,opex_engineering,opex_fuel,opex_fuel_per_mwh,opex_misc_power,opex_misc_steam,opex_nonfuel_per_mwh,opex_operations,opex_per_mwh,opex_plants,opex_production_total,opex_rents,opex_steam,opex_steam_other,opex_structures,opex_total_nonfuel,opex_transfer,peak_demand_mw,plant_capability_mw,plant_hours_connected_while_generating,plant_type,record_id,water_limited_capacity_mw
893,1994,182,161,KCP&L Greater Missouri Operations Company,532,680,sibley,,140.0,0.580232,523.50,169881208.0,396706.0,386900.5,32264484.0,202542398.0,outdoor,1960.0,1969.0,2660861.100,496.0,,5047698.0,,249577.0,496017.0,30488226.0,11.458030,1006699.0,911958.0,4.089237,853839.0,15.5,1422129.0,41369118.0,,892975.0,,,10880892.0,,454.0,,8760.0,steam,f1_steam_1994_12_182_0_1,493.0
894,1994,182,161,KCP&L Greater Missouri Operations Company,476,681,ralph green,,3.0,0.028642,94.75,10524308.0,5817.0,117334.6,587325.0,11117450.0,,1981.0,1981.0,23772.935,80.0,,,,72680.0,,532712.0,22.408340,,,28.671512,,51.1,51576.0,1214318.0,508380.0,,,48970.0,681606.0,,76.0,,530.0,combustion_turbine,f1_steam_1994_12_182_0_2,65.0
895,1994,182,161,KCP&L Greater Missouri Operations Company,307,682,jec,,,0.681944,174.00,77816810.0,280057.0,551883.2,17930818.0,96027685.0,semioutdoor,1978.0,1983.0,1039447.000,,,933638.0,,181418.0,403186.0,12737562.0,12.254172,343064.0,52802.0,2.845509,371270.0,15.1,194526.0,15695318.0,-29.0,387869.0,,90012.0,2957756.0,,,,,steam,f1_steam_1994_12_182_0_3,
896,1994,182,161,KCP&L Greater Missouri Operations Company,238,683,greenwood,,8.0,0.000003,287566.00,5767322.0,233662.0,21.9,288106.0,6289090.0,,1975.0,1979.0,7652.260,244.0,,,,108678.0,104704.0,587417.0,76.763858,,,343.262252,50831.0,420.0,154029.0,3214149.0,2207471.0,,,1019.0,2626732.0,,171.0,,147.0,combustion_turbine,f1_steam_1994_12_182_0_5,212.0
897,1994,182,161,KCP&L Greater Missouri Operations Company,414,684,nevada,,,-0.000296,22.22,378783.0,59905.0,20151.6,9121.0,447809.0,,1974.0,1974.0,-57.680,25.0,,,,6849.0,,5258.0,-91.158114,,,,,-4115.1,21517.0,237359.0,196951.0,,,6784.0,232101.0,,19.0,,6.0,combustion_turbine,f1_steam_1994_12_182_1_3,19.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28341,2020,182,161,KCP&L Greater Missouri Operations Company,543,1528,south harper,,4.0,0.017982,351.00,109128886.0,1034875.0,349096.1,12368963.0,122532724.0,,2005.0,2005.0,55289.000,,,,,70531.0,111644.0,4668493.0,84.438008,300894.0,5943.0,13.908300,9826.0,98.3,248587.0,5437469.0,,,,21551.0,768976.0,,349.0,,357.0,combustion_turbine,f1_steam_2020_12_182_1_2,
28342,2020,182,161,KCP&L Greater Missouri Operations Company,315,960,lake road,3698953.0,58.0,-0.016190,150.50,147412799.0,50370.0,1192171.0,28259621.0,179421743.0,outdoor,1951.0,1990.0,-21344.000,,,1638142.0,,977581.0,391573.0,8426491.0,-394.794368,1365830.0,66441.0,,414839.0,-786.5,862660.0,16786932.0,43.0,2079396.0,,563936.0,8360441.0,,95.0,,142.0,steam,f1_steam_2020_12_182_1_3,
28343,2020,182,161,KCP&L Greater Missouri Operations Company,315,1428,lake road,12058.0,,0.002239,127.56,24219273.0,,202496.6,1599138.0,25830469.0,,1951.0,1990.0,2502.000,,,1.0,,543639.0,911.0,481041.0,192.262590,11349.0,16263.0,344.793765,2690.0,537.1,257921.0,1343715.0,,338.0,,29562.0,862674.0,,57.0,,472.0,combustion_turbine,f1_steam_2020_12_182_1_4,
28344,2020,182,161,KCP&L Greater Missouri Operations Company,295,1439,iatan 1 (18%),9928439.0,159.0,0.313063,135.88,197065110.0,254287.0,1657905.7,18028394.0,225276230.0,outdoor,1980.0,1980.0,372642.000,,-5.0,952008.0,,288947.0,148840.0,3122016.0,8.378057,293386.0,3517.0,7.847379,14985.0,16.2,70262.0,6046279.0,7430.0,743556.0,,401337.0,2924263.0,,127.0,,3925.0,steam,f1_steam_2020_12_182_1_5,


<a id='upload-overrides'></a>
## Step 3: Upload Changes to Training Data

When you've finished editing `<UTILITY>_fix_FERC-EIA_overrides.xlsx` and want to add your changes to the official override CSV, move your file to the `add_to_training` directory and then run the following function.

**Note:** If you have changed or marked TRUE any records that have already been overridden and included in the training data, you will want to set `expect_override_overrides = True`. Otherwise, the function will check to see if you have accidentally tampered with values that have already been matched.
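
To decide whether you need `expect_override_overrides=True`, you can preview which rows in your file touch FERC records that are already in the training data. The sketch below is one way to do that and is not the check that `validate_and_add_to_training` itself performs. The helper `already_in_training` and the toy tables are hypothetical.

```python
import pandas as pd


def already_in_training(overrides, training):
    """Return override rows whose FERC record already appears in the training data."""
    in_training = overrides["record_id_ferc1"].isin(training["record_id_ferc1"])
    return overrides[in_training]


# Hypothetical toy data standing in for an override file and the training CSV.
toy_overrides = pd.DataFrame(
    {
        "record_id_ferc1": ["f1_a", "f1_b", "f1_c"],
        "record_id_eia_override_1": ["eia_1", "eia_2", "eia_3"],
    }
)
toy_training = pd.DataFrame(
    {"record_id_ferc1": ["f1_b"], "record_id_eia": ["eia_9"]}
)
print(already_in_training(toy_overrides, toy_training))
```

If this returns any rows, those are the records where an existing training match would be replaced.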

Right now, the module points to a COPY of the training data so that it doesn't overwrite the official version. You'll need to change that later if you want to update the official version.

In [891]:
logger.setLevel(logging.DEBUG)

validate_and_add_to_training(
    pudl_out, rmi_out, expect_override_overrides=True, expect_utility_missmatch=True
)

Reading the FERC to EIA connection from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/ferc1_eia.pkl.gz
Reading the EIA plant-parts from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/plant_parts_eia.pkl.gz
Processing fixes in Evergy_fix_FERC-EIA_overrides - sbw 07172022 (1).xlsx
Checking record_id_eia_override_1 consistency for values that don't exist
Checking record_id_ferc1 consistency for values that don't exist
Checking for duplicate override ids
Checking for mismatched utility ids
Found the following utility missmatches. Make sure you approve them all! 
                                                 self  other
record_id_ferc1               plant_name_ferc1              
f1_gnrt_plant_2005_12_182_0_1 westar wind       161.0   6824
f1_gnrt_plant_2006_12_182_0_1 westar wind       161.0   6824
f1_gnrt_plant_2007_12_182_0_1 westar wind       161.0  11098
f1_steam_2005_12_191_1_4      gordon evans ctf  359.0  10743
f1_steam_2005_12_182_1_5      pueblo            16

ValueError: Excel file format cannot be determined, you must specify an engine manually.

In [None]:
rmi_out.ferc1_to_eia(clobber=True)