# Manually Overriding FERC-EIA Record Linkage

The FERC-EIA record linkage process requires training data in order to work properly. Training matches also serve as overrides. This notebook helps you check whether the machine learning algorithm did a good job of matching FERC and EIA records. If you confirm a good match (or correct a bad one), this process turns it into training data.

This notebook has three purposes: 

- [**Step 1: Output Override Tools:**](#verify-tools) Where you create and output the spreadsheets used to conduct the manual overrides.
- [**Step 2: Validate New Training Data:**](#validate) Where you check that the overrides we made are sound.
- [**Step 3: Upload Changes to Training Data:**](#upload-overrides) Where you integrate the overrides into the training data.

## Settings

In [397]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [398]:
import pudl_rmi
from pudl_rmi.create_override_spreadsheets import *
                                           
import pudl
import sqlalchemy as sa
import logging
import sys
import numpy as np
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]

pudl_settings = pudl.workspace.setup.get_defaults()
pudl_engine = sa.create_engine(pudl_settings["pudl_db"])
pudl_out = pudl.output.pudltabl.PudlTabl(
    pudl_engine,
    freq='AS',
    fill_fuel_cost=True,
    roll_fuel_cost=True,
    fill_net_gen=True,
)
rmi_out = pudl_rmi.coordinate.Output(pudl_out)

In [None]:
# old

specified_utilities = {
    # 'Dominion': {'utility_id_pudl': [292, 293, 349],
    #              'utility_id_eia': [17539, 17554, 19876]},
    # 'Evergy': {'utility_id_pudl': [159, 160, 161, 1270, 13243],
    #            'utility_id_eia': [10000, 10005, 56211, 25000]},
    # 'IDACORP': {'utility_id_pudl': [140],
    #             'utility_id_eia': [9191]},
    # 'Duke': {'utility_id_pudl': [90, 91, 92, 93, 96, 97],
    #          'utility_id_eia': [5416, 6455, 15470, 55729, 3542, 3046]},
    'BHE': {'utility_id_pudl': [185, 246, 204, 287],
            'utility_id_eia': [12341, 14354, 13407, 17166]},
    'Southern': {'utility_id_pudl': [123, 18, 190, 11830],
                 'utility_id_eia': [7140, 195, 12686, 17622]},
    # 'NextEra': {'utility_id_pudl': [121, 130],
    #             'utility_id_eia': [6452, 7801]},
    # 'AEP': {'utility_id_pudl': [29, 301, 144, 275, 162, 361, 7],
    #         'utility_id_eia': [733, 17698, 9324, 15474, 22053, 20521, 343]},
    # 'Entergy': {'utility_id_pudl': [107, 106, 311, 113, 110],
    #             'utility_id_eia': [11241, 814, 12465, 55937, 13478]},
    # 'Xcel': {'utility_id_pudl': [224, 302, 272, 11297],
    #          'utility_id_eia': [13781, 13780, 17718, 15466]}
}

<a id='verify-tools'></a>
## Step 1: Output Override Tools

In [80]:
specified_utilities = {
    #'BHE': [12341, 14354, 13407, 17166], 
    #'Southern':[7140, 195, 12686, 17622]
    'Dominion': [17539, 17554, 19876, 5248] # 5248...
    #'Entergy': [11241, 814, 12465, 55937, 13478],
    #'Xcel': [13781, 13780, 17718, 15466],
    #'NextEra': [6452, 7801]
    #'IDACORP': [9191]
    #'Evergy': [10000, 10005, 56211, 22500]
}

specified_years = [2020
    # 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 
    # 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020
]

Run the following function to create Excel files named `<UTILITY>_fix_FERC-EIA_overrides.xlsx` in the `outputs/overrides` directory, based on the utility and year inputs you specified above. Read the [Override Instructions](https://docs.google.com/document/d/1nJfmUtbSN-RT5U2Z3rJKfOIhWsRFUPNxs9NKTes0SRA/edit#) to learn how to begin fixing and verifying the FERC-EIA connections.

In [81]:
generate_all_override_spreadsheets(pudl_out, rmi_out, specified_utilities, specified_years)

Generating inputs
Reading the FERC to EIA connection from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/ferc1_eia.pkl.gz
Prepping FERC-EIA table
Reading the EIA plant-parts from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/plant_parts_eia.pkl.gz
Prepping Plant Parts Table
Grabbing depreciation study output from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/deprish.pkl.gz
Prepping Deprish Data
Developing outputs for Dominion
Getting utility-year subset for ferc_eia
Getting utility-year subset for ppl
Getting utility-year subset for deprish
Outputing table subsets to tabs
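
To confirm which spreadsheets are present, a small helper along these lines can list them. This is a minimal sketch, not part of `pudl_rmi`: the function name `find_override_workbooks` is hypothetical, and it assumes only the filename pattern described above. It skips the `~$` lock files that Excel creates while a workbook is open, the same check the validation loop in Step 2 performs.

```python
from pathlib import Path
import tempfile


def find_override_workbooks(directory):
    """Map utility name -> workbook path, skipping Excel lock files (~$...).

    Hypothetical helper: assumes files are named
    <UTILITY>_fix_FERC-EIA_overrides.xlsx, as described in this notebook.
    """
    workbooks = {}
    for path in sorted(Path(directory).glob("*_fix_FERC-EIA_overrides.xlsx")):
        if path.name.startswith("~$"):
            continue  # temporary lock file written by Excel
        utility = path.name.split("_")[0]
        workbooks[utility] = path
    return workbooks


# Demo on a throwaway directory rather than the real outputs/overrides folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in [
        "Dominion_fix_FERC-EIA_overrides.xlsx",
        "~$Dominion_fix_FERC-EIA_overrides.xlsx",
    ]:
        (Path(tmp) / name).touch()
    found = find_override_workbooks(tmp)
    found_names = {utility: path.name for utility, path in found.items()}
    print(found_names)
```

In the notebook itself you would pass the real `outputs/overrides` path instead of the temporary directory.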



<a id='validate'></a>
## Step 2: Validate New Training Data

Once you've finished checking the matches, make sure every record you want to validate is set to `verified=TRUE`. Then move the file into the `add_to_training` folder and run the following cells:

In [443]:
# Define function inputs
import os  # imported explicitly rather than relying on the wildcard import above

ferc1_eia_df = rmi_out.ferc1_to_eia()
ppl_df = rmi_out.plant_parts_eia().reset_index()
utils_df = pudl_out.utils_eia860()
training_df = pd.read_csv(pudl_rmi.TRAIN_FERC1_EIA_CSV)
path_to_overrides = pudl_rmi.INPUTS_DIR / "add_to_training"

# Keep only the Excel workbooks found in the add_to_training directory
override_files = os.listdir(path_to_overrides)
override_files = [file for file in override_files if file.endswith(".xlsx")]

Reading the FERC to EIA connection from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/ferc1_eia.pkl.gz
Reading the EIA plant-parts from /Users/austensharpe/Desktop/Repos/rmi-ferc1-eia/outputs/plant_parts_eia.pkl.gz
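
Before validating, it can help to see which rows are actually marked as verified. The sketch below is an illustration only: `split_by_verified` is a hypothetical helper, the record IDs are made up, and it assumes the spreadsheet has a boolean `verified` column in which unchecked cells are read as missing values.

```python
import numpy as np
import pandas as pd


def split_by_verified(overrides):
    """Split an override table into verified and unverified rows.

    Assumes a `verified` column; blank cells (NaN) count as unverified.
    """
    is_verified = overrides["verified"].eq(True)
    return overrides[is_verified], overrides[~is_verified]


# Made-up rows for illustration; real data would come from pd.read_excel.
demo = pd.DataFrame(
    {
        "record_id_ferc1": ["f1_steam_a", "f1_steam_b", "f1_steam_c"],
        "verified": [True, np.nan, False],
    }
)
verified, unverified = split_by_verified(demo)
print(f"{len(verified)} verified, {len(unverified)} unverified")
```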


In [444]:
logger.setLevel(logging.DEBUG)

for file in override_files:
    if not file.startswith("~$"):
        print(f"VALIDATING {file} ************** ")
        file_df = pd.read_excel(path_to_overrides / file)

        validate_override_fixes(
            validated_connections=file_df,
            utils_eia860=utils_df,
            ppl=ppl_df,
            ferc1_eia=ferc1_eia_df,
            training_data=training_df,
            expect_override_overrides=True,
            expect_utility_missmatch=True
        )
    print(" ")

 
VALIDATING Dominion_fix_FERC-EIA_overrides.xlsx ************** 
Checking record_id_eia_override_1 consistency for values that don't exist
Checking record_id_ferc1 consistency for values that don't exist
Checking for duplicate override ids
Checking for mismatched utility ids
Found the following utility missmatches. Make sure you approve them all! 
                                                self  other
record_id_ferc1           plant_name_ferc1                 
f1_steam_2019_12_186_10_2 gutenberg solar      349.0   6484
f1_steam_2019_12_186_10_1 gloucester solar     349.0   6484
f1_steam_2019_12_186_9_2  puller solar         349.0   1498
f1_steam_2018_12_186_9_2  puller solar         349.0   1498
f1_steam_2019_12_186_9_3  pecan solar          349.0   1498
f1_steam_2018_12_186_9_3  pecan solar          349.0   1498
f1_hydro_2008_12_159_0_4  columbia hydro       292.0  13492
f1_hydro_2007_12_159_0_4  columbia hydro       292.0  13492
f1_steam_2019_12_159_5_1  williams combined    29

## Step 2.1: Examine Overrides More Closely

In [486]:
check_overrides_dict = {}
for file in override_files:
    if not file.startswith("~$"):
        file_df = pd.read_excel(path_to_overrides / file)
        logger.info(f"Creating a closer look at {file}")
        logger.info(" ")
        check_overrides_dict[file.split("_")[0]] = compare_override_matches(file_df, ppl_df)
        logger.info(" ")

Creating a closer look at Dominion_fix_FERC-EIA_overrides.xlsx
 
Breaking validated overrides into 1:1 and 1:many
Merging 1:1 matches with PPL data
Merging 1:m matches with PPL data
Recombining 1:1 and 1:m matches
Adding pct diff col: capacity_mw
Adding pct diff col: net_generation_mwh
 


In [487]:
check_overrides_dict.keys()
dom = check_overrides_dict["Dominion"]

In [489]:
dom

record_id_ferc1,plant_name_ferc1,record_id_eia_override_1,record_id_eia_override_2,record_id_eia_override_3,best_match,notes,capacity_mw_ferc1,capacity_mw_eia,net_generation_mwh_ferc1,net_generation_mwh_eia,installation_year_ferc1,installation_year_eia,installation_year_eia_multi,capacity_mw_pct_diff,net_generation_mwh_pct_diff,installation_year_diff,used_match_record,signature_1,signature_2
f1_hydro_2016_12_186_0_3,0,,,,,Record appears to be blank,0.00,,,,,,,,,,True,SW,AS
f1_steam_2019_12_186_10_2,gutenberg solar,63076_2019_plant_total_5248,,,cap,Net gen more than twice off. Listed under a di...,79.90,79.9,37889.6,17981.000000,2019.0,,,0.00,52.54,,False,SW,AS
f1_steam_2019_12_186_10_1,gloucester solar,63031_2019_plant_total_5248,,,cap_net-gen_inst_year,Found under different subsidiary,19.80,19.9,30968.6,35520.000000,2019.0,2019,,-0.51,-14.70,0.0,False,SW,AS
f1_steam_2019_12_186_10_3,colonial trail west,,,,,"Found the plant, but it's under a different ut...",161.28,,2031.5,,2019.0,,,,,,True,SW,AS
f1_steam_2019_12_186_9_2,puller solar,62140_2019_plant_total_58468,,,cap_net-gen_inst_year,Found plant listed under a different utility: ...,15.00,15.0,29008.7,28962.000000,2018.0,2018,,0.00,0.16,0.0,False,SW,AS
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
f1_steam_2007_12_159_1_2,parr #1 & 2,3291_gt1_2007_plant_gen_total_17539,3291_gt2_2007_plant_gen_total_17539,,cap_net-gen,,35.00,35.0,5077.0,5402.770270,1970.0,,"[1970, 1970]",0.00,-6.42,,False,CG,SW
f1_steam_2006_12_159_1_4,parr #3 & 4,3291_gt3_2006_plant_gen_total_17539,3291_gt4_2006_plant_gen_total_17539,,cap,,39.00,39.0,1675.0,1536.283784,1971.0,,"[1971, 1971]",0.00,8.28,,False,CG,SW
f1_steam_2006_12_159_1_3,parr #1 & 2,3291_gt1_2006_plant_gen_total_17539,3291_gt2_2006_plant_gen_total_17539,,cap_net-gen,,35.00,35.0,1240.0,1378.716216,1970.0,,"[1970, 1970]",0.00,-11.19,,False,CG,SW
f1_steam_2005_12_159_1_4,parr #3 & 4,3291_gt3_2005_plant_gen_total_17539,3291_gt4_2005_plant_gen_total_17539,,cap,,39.00,39.0,1109.0,959.189189,1971.0,,"[1971, 1971]",0.00,13.51,,False,CG,SW
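
The `*_pct_diff` values in the table above are consistent with `(ferc1 - eia) / ferc1 * 100`, rounded to two decimals. The sketch below reproduces that arithmetic for two of the rows shown. It is inferred from the displayed numbers rather than taken from the `compare_override_matches` source, so treat the formula, the hypothetical `add_pct_diff` helper, and the choice to treat zeros as missing as assumptions.

```python
import numpy as np
import pandas as pd


def add_pct_diff(df, col):
    """Add `<col>_pct_diff` = (FERC1 - EIA) / FERC1 * 100, rounded to 2 places.

    Assumption: zeros are treated as missing so they yield NaN instead of
    a misleading 100% or an infinite value.
    """
    ferc = df[f"{col}_ferc1"].replace(0, np.nan)
    eia = df[f"{col}_eia"].replace(0, np.nan)
    out = df.copy()
    out[f"{col}_pct_diff"] = ((ferc - eia) / ferc * 100).round(2)
    return out


# Values copied from the gutenberg solar and gloucester solar rows above.
demo = pd.DataFrame(
    {
        "capacity_mw_ferc1": [79.90, 19.80],
        "capacity_mw_eia": [79.9, 19.9],
        "net_generation_mwh_ferc1": [37889.6, 30968.6],
        "net_generation_mwh_eia": [17981.0, 35520.0],
    }
)
result = add_pct_diff(add_pct_diff(demo, "capacity_mw"), "net_generation_mwh")
print(result[["capacity_mw_pct_diff", "net_generation_mwh_pct_diff"]])
```

A large percent difference, such as the net generation gap for gutenberg solar, is a cue to re-read the override and its notes before accepting it.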


## Step 2.2: Check PPL for Matches

If you're comfortable working in code, it is sometimes easier to search the PPL directly for the records you're looking for than to use the spreadsheet. This is especially true when a record may have fallen through the cracks because it is assigned to a different utility, or when the PPL has been updated since you made the spreadsheet.

In [391]:
useful_cols = [
     "true_gran",
     "ownership_dupe",
     "record_id_eia", 
     "plant_id_eia", 
     "utility_id_eia", 
     "report_year", 
     "generator_id", 
     "plant_name_new", 
     "capacity_mw", 
     "net_generation_mwh",
     "installation_year",
     "technology_description",
]

In [485]:
ppl_df[
    #(ppl_df["record_id_eia"]=="3283_2006_plant_total_17539")
    #(ppl_df["plant_id_eia"]==3289)
    (ppl_df["plant_id_pudl"]==638)
    #ppl_df["plant_name_new"].str.contains("2315")
    & (ppl_df["report_date"].dt.year.isin([2020]))
     #(ppl_df["utility_id_eia"]==19876)
    #&(ppl_df["capacity_mw"]==2)
    #& (ppl_df["net_generation_mwh"] > 1900)
    #& (ppl_df["net_generation_mwh"] < 2000)
    #& (ppl_df["capacity_mw"]> 200)
    #& (ppl_df["capacity_mw"]<2)
    #& (ppl_df["technology_description"].str.contains("Solar"))
    #& (ppl_df["true_gran"])
    #& (ppl_df["ownership_dupe"]==False)
].sort_values(["report_year", "capacity_mw"])[useful_cols]

Unnamed: 0,true_gran,ownership_dupe,record_id_eia,plant_id_eia,utility_id_eia,report_year,generator_id,plant_name_new,capacity_mw,net_generation_mwh,installation_year,technology_description
454050,True,True,3298_1_2020_plant_gen_owned_17554,3298,17554,2020,1,Williams 1,26.9,68.5,1972,Natural Gas Fired Combustion Turbine
454051,True,True,3298_2_2020_plant_gen_owned_17554,3298,17554,2020,2,Williams 2,26.9,68.5,1972,Natural Gas Fired Combustion Turbine
485819,True,False,3298_1_2020_plant_gen_total_17554,3298,17554,2020,1,Williams 1,26.9,68.5,1972,Natural Gas Fired Combustion Turbine
485820,True,False,3298_2_2020_plant_gen_total_17554,3298,17554,2020,2,Williams 2,26.9,68.5,1972,Natural Gas Fired Combustion Turbine
286225,True,True,3298_gt_2020_plant_prime_mover_owned_17554,3298,17554,2020,,Williams GT,53.8,137.0,1972,Natural Gas Fired Combustion Turbine
302594,True,False,3298_gt_2020_plant_prime_mover_total_17554,3298,17554,2020,,Williams GT,53.8,137.0,1972,Natural Gas Fired Combustion Turbine
318985,False,True,3298_natural_gas_fired_combustion_turbine_2020...,3298,17554,2020,,Williams Natural Gas Fired Combustion Turbine,53.8,137.0,1972,Natural Gas Fired Combustion Turbine
334845,False,False,3298_natural_gas_fired_combustion_turbine_2020...,3298,17554,2020,,Williams Natural Gas Fired Combustion Turbine,53.8,137.0,1972,Natural Gas Fired Combustion Turbine
350605,False,True,3298_ng_2020_plant_prime_fuel_owned_17554,3298,17554,2020,,Williams NG,53.8,137.0,1972,Natural Gas Fired Combustion Turbine
366247,False,False,3298_ng_2020_plant_prime_fuel_total_17554,3298,17554,2020,,Williams NG,53.8,137.0,1972,Natural Gas Fired Combustion Turbine


In [482]:
ppl_df[ppl_df["record_id_eia"]=="3298_1_2020_plant_unit_owned_17554"].sort_values(["report_year", "capacity_mw"])[useful_cols]

Unnamed: 0,true_gran,ownership_dupe,record_id_eia,plant_id_eia,utility_id_eia,report_year,generator_id,plant_name_new,capacity_mw,net_generation_mwh,installation_year,technology_description
279360,True,True,3298_1_2020_plant_unit_owned_17554,3298,17554,2020,ST1,Williams 1,659.7,2681399.993,1973,Conventional Steam Coal


In [481]:
utils = pudl_out.utils_eia860()
utils[utils["utility_id_eia"]==17554]

Unnamed: 0,report_date,utility_id_eia,utility_id_pudl,utility_name_eia,address_2,attention_line,city,contact_firstname,contact_firstname_2,contact_lastname,contact_lastname_2,contact_title,contact_title_2,entity_type,phone_extension,phone_extension_2,phone_number,phone_number_2,plants_reported_asset_manager,plants_reported_operator,plants_reported_other_relationship,plants_reported_owner,state,street_address,zip_code,zip_code_4
54372,2021-01-01,17554,293,South Carolina Genertg Co Inc,,,,,,,,,,,,,,,,,,,,,,
54373,2020-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,I,,,,,,,,True,SC,220 Operation Way; MC A221,29033.0,
54374,2019-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,I,,,,,,,,True,SC,220 Operation Way; MC A221,29033.0,
54375,2018-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,I,,,,,,,,True,SC,220 Operation Way; MC A221,29033.0,
54376,2017-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,I,,,,,,,,True,SC,220 Operation Way; MC A221,29033.0,
54377,2016-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,I,,,,,,,,True,SC,220 Operation Way; MC A221,29033.0,
54378,2015-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,I,,,,,,,,True,SC,220 Operation Way; MC A221,29033.0,
54379,2014-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,I,,,,,,,,True,SC,220 Operation Way; MC A221,29033.0,
54380,2013-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,I,,,,,True,True,True,True,SC,220 Operation Way; MC A221,29033.0,
54381,2012-01-01,17554,293,South Carolina Genertg Co Inc,,,Cayce,,,,,,,,,,,,,,,,SC,220 Operation Way; MC A221,29033.0,


In [478]:
steam = pudl_out.plants_steam_ferc1()
steam[steam["record_id"]=="f1_steam_2020_12_160_0_1"]

# small = pudl_out.plants_small_ferc1()
# small[small["record_id"]=="f1_gnrt_plant_2019_12_159_0_3"]

Unnamed: 0,report_year,utility_id_ferc1,utility_id_pudl,utility_name_ferc1,plant_id_pudl,plant_id_ferc1,plant_name_ferc1,asset_retirement_cost,avg_num_employees,capacity_factor,capacity_mw,capex_equipment,capex_land,capex_per_mw,capex_structures,capex_total,construction_type,construction_year,installation_year,net_generation_mwh,not_water_limited_capacity_mw,opex_allowances,opex_boiler,opex_coolants,opex_electric,opex_engineering,opex_fuel,opex_fuel_per_mwh,opex_misc_power,opex_misc_steam,opex_nonfuel_per_mwh,opex_operations,opex_per_mwh,opex_plants,opex_production_total,opex_rents,opex_steam,opex_steam_other,opex_structures,opex_total_nonfuel,opex_transfer,peak_demand_mw,plant_capability_mw,plant_hours_connected_while_generating,plant_type,record_id,water_limited_capacity_mw
27991,2020,160,293,"South Carolina Generating Company, Inc.",638,926,williams,651117.0,74.0,0.463993,659.7,633618909.0,2141277.0,1116948.9,100439866.0,736851169.0,outdoor,1973.0,1973.0,2681400.0,610.0,,2313242.0,,24834.0,429655.0,105522935.0,39.353672,2198525.0,732490.0,3.702849,837582.0,43.1,502637.0,115451754.0,,2203265.0,,686589.0,9928819.0,,596.0,,6947.0,steam,f1_steam_2020_12_160_0_1,605.0


In [335]:
southern_overrides = pd.read_excel(path_to_overrides / override_files[0])

In [241]:
multi_match = southern_overrides[southern_overrides["record_id_eia_override_2"].notna()][[
        "used_match_record",
        "signature_1",
        "signature_2",
        "notes",
        "record_id_eia_override_1",
        "record_id_eia_override_2",
        "record_id_eia_override_3",
        "capacity_mw_ferc1",
        "net_generation_mwh_ferc1",
        "installation_year_ferc1"
]]
multi_match

Unnamed: 0,used_match_record,signature_1,signature_2,notes,record_id_eia_override_1,record_id_eia_override_2,record_id_eia_override_3,capacity_mw_ferc1,net_generation_mwh_ferc1,installation_year_ferc1
380,1.0,CO,GT,,709_2014_plant_total_7140,709_2014_plant_total_7140_retired,,1746.0,2537860.0,1969.0
381,0.0,CO,GT,,709_2015_plant_total_7140,709_2015_plant_total_7140_retired,,1746.0,770932.0,1969.0
1005,0.0,CO,GT,,649_1_2019_plant_gen_owned_7140,649_2_2019_plant_gen_owned_7140,,1110.0,9215442.0,1987.0


In [344]:
logger.setLevel(logging.DEBUG)
compare_override_matches(southern_overrides, ppl_df)

Breaking validated overrides into 1:1 and 1:many
Merging 1:1 matches with PPL data
Merging 1:m matches with PPL data
Recombining 1:1 and 1:m matches
Adding pct diff col: capacity_mw
Adding pct diff col: net_generation_mwh


record_id_ferc1,record_id_eia_override_1,record_id_eia_override_2,record_id_eia_override_3,best_match,notes,capacity_mw_ferc1,capacity_mw_eia,net_generation_mwh_ferc1,net_generation_mwh_eia,installation_year_ferc1,installation_year_eia,installation_year_eia_multi,capacity_mw_pct_diff,net_generation_mwh_pct_diff,installation_year_diff,used_match_record,signature_1,signature_2
f1_gnrt_plant_2017_12_2_0_2,60680_2017_plant_total_195,,,cap,,7.38,7.40,7668439.0,7668.000,,2017,,-0.27,99.90,,0.0,CO,GT
f1_gnrt_plant_2018_12_2_0_2,60680_2018_plant_total_195,,,cap,,7.38,7.40,,17168.000,,2017,,-0.27,,,0.0,CO,GT
f1_gnrt_plant_2019_12_2_0_2,60680_2019_plant_total_195,,,cap,,7.38,7.40,8869899.0,8870.000,,2017,,-0.27,99.90,,0.0,CO,GT
f1_hydro_2005_12_2_2_2,2_2005_plant_total_195,,,cap_net-gen_inst_year,,45.10,45.00,184952.0,184952.001,1963.0,1963,,0.22,-0.00,0.0,1.0,CO,GT
f1_hydro_2006_12_2_2_2,2_2006_plant_total_195,,,cap_net-gen_inst_year,,45.10,45.00,134716.0,134716.000,1963.0,1963,,0.22,0.00,0.0,1.0,CO,GT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
f1_steam_2020_12_99_1_5,2047_2020_plant_total_12686,,,cap_net-gen_inst_year,,170.47,170.50,1021593.0,1021583.000,1994.0,1994,,-0.02,0.00,0.0,0.0,AS,
f1_steam_2020_12_99_0_2,6073_natural_gas_fired_combined_cycle_2020_pla...,,,cap_net-gen_inst_year,,1132.03,1132.40,8502834.0,8502834.000,2001.0,2001,,-0.03,0.00,0.0,0.0,AS,
f1_steam_2014_12_57_1_4,709_2014_plant_total_7140,709_2014_plant_total_7140_retired,,cap_net-gen,,1746.00,1746.20,2537860.0,2537860.001,1969.0,,"[1969, 1967]",-0.01,-0.00,,1.0,CO,GT
f1_steam_2015_12_57_1_4,709_2015_plant_total_7140,709_2015_plant_total_7140_retired,,cap,,1746.00,1746.20,770932.0,0.000,1969.0,,"[1969, 1967]",-0.01,,,0.0,CO,GT


<a id='upload-overrides'></a>
## Step 3: Upload Changes to Training Data

When you've finished editing the `<UTILITY>_fix_FERC-EIA_overrides.xlsx` file and want to add your changes to the official override CSV, move your file to the `add_to_training` directory and then run the following function.

**Note:** If you have changed or marked TRUE any records that have already been overridden and included in the training data, set `expect_override_overrides = True`. Otherwise, the function checks whether you have accidentally changed values that have already been matched.

Right now, the module points to a COPY of the training data so that it doesn't overwrite the official version. You'll need to change that later if you want to update the official version.

In [None]:
validate_and_add_to_training(
    pudl_out, rmi_out, expect_override_overrides=True
)
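
To decide whether `expect_override_overrides` should be `True`, you can check whether any of your override rows target FERC records that are already in the training data. The following is a rough sketch, not the check that `validate_and_add_to_training` performs: `find_already_trained` is a hypothetical helper, and it assumes both tables share a `record_id_ferc1` column. The training rows in the demo are invented for illustration.

```python
import pandas as pd


def find_already_trained(overrides, training):
    """Return override rows whose FERC record already appears in training data.

    Assumes both tables have a `record_id_ferc1` column.
    """
    already = overrides["record_id_ferc1"].isin(training["record_id_ferc1"])
    return overrides[already]


# Illustrative tables; in the notebook these would be file_df and training_df.
overrides_demo = pd.DataFrame(
    {
        "record_id_ferc1": [
            "f1_steam_2019_12_186_10_1",
            "f1_steam_2019_12_186_10_2",
        ]
    }
)
training_demo = pd.DataFrame(
    {"record_id_ferc1": ["f1_steam_2019_12_186_10_2"]}
)
overlap = find_already_trained(overrides_demo, training_demo)
print(f"{len(overlap)} override(s) would replace existing training rows")
```

If the overlap is non-empty and intentional, passing `expect_override_overrides=True` is the appropriate setting.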

In [None]:
rmi_out.ferc1_to_eia(clobber=True)