# PUDL ID Mapping Help

This notebook supports the manual mapping of FERC plant IDs to EIA plant IDs. See the [PUDL ID mapping](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/pudl_id_mapping.html) documentation for more information.

In [None]:
import pandas as pd
import pudl
import pudl.logging_helpers
from pudl.etl import defs

logger = pudl.logging_helpers.get_logger(__name__)

In [None]:
plants_eia = defs.load_asset_value("out_eia__yearly_plants")
plants_pudl = defs.load_asset_value("core_pudl__entity_plants_pudl")
plants_ferc = defs.load_asset_value("out_ferc1__yearly_all_plants")

In [None]:
# Columns to keep from the EIA and FERC plant tables.
cols_eia = [
    "plant_id_pudl", "plant_id_eia", "plant_name_eia", "utility_name_eia",
    "city", "county", "latitude", "longitude", "state",
]
cols_ferc = [
    "plant_id_pudl", "plant_id_ferc1", "plant_name_ferc1",
    "utility_name_ferc1", "capacity_mw", "record_id",
]
# Attach the EIA and FERC plant records to each PUDL plant ID.
plants = pd.merge(
    plants_pudl,
    plants_eia[cols_eia].drop_duplicates(),
    how="outer",
    on=["plant_id_pudl"],
    validate="1:m",
).merge(
    # Ignore record_id when deduplicating so otherwise-identical records collapse.
    plants_ferc[cols_ferc].drop_duplicates(
        subset=[col for col in cols_ferc if col != "record_id"]
    ),
    how="outer",
    on=["plant_id_pudl"],
    suffixes=("_eia", "_ferc"),
)
# Lowercase the EIA plant names so lowercase search strings match them.
plants.plant_name_eia = plants.plant_name_eia.str.lower()
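
To make the shape of the merged table easier to picture, here is a minimal sketch of the same two-step outer merge on made-up data. The `toy_*` frames and all of their values are illustrative assumptions, not PUDL data; only the merge pattern mirrors the cell above.

```python
import pandas as pd

# Toy stand-ins for the PUDL tables; every value here is made up.
toy_pudl = pd.DataFrame(
    {"plant_id_pudl": [1, 2, 3], "plant_name_pudl": ["alpha", "beta", "gamma"]}
)
toy_eia = pd.DataFrame({"plant_id_pudl": [1, 1, 2], "plant_id_eia": [10, 11, 20]})
toy_ferc = pd.DataFrame({"plant_id_pudl": [2, 3], "plant_id_ferc1": [200, 300]})

# validate="1:m" checks that plant_id_pudl is unique in the left-hand table
# and raises pandas.errors.MergeError if it is not.
toy_plants = pd.merge(
    toy_pudl, toy_eia, how="outer", on=["plant_id_pudl"], validate="1:m"
).merge(toy_ferc, how="outer", on=["plant_id_pudl"])

# The outer joins keep PUDL IDs that lack an EIA or FERC counterpart,
# leaving NaN in the columns from the missing side.
missing_eia = toy_plants[toy_plants.plant_id_eia.isna()]
missing_ferc = toy_plants[toy_plants.plant_id_ferc1.isna()]
```

Because both merges are outer joins, a PUDL plant with several EIA records appears once per record, and plants found on only one side still show up with empty columns for the other.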

Use the snippet of code below to speed up searching for plant matches. Update the matching ID value in the spreadsheet by linking it to the cell, _not_ by hard-coding the value!

In [None]:
name_bit = "richmond"
# Restrict the search to one state when a name has too many matches.
# Rows with no state are always kept. To search every state, comment out
# the state condition below.
state = "VT"
# Match names case-insensitively and treat missing names as non-matches.
plants[
    (
        plants.plant_name_eia.str.contains(name_bit, case=False, na=False)
        | plants.plant_name_pudl.str.contains(name_bit, case=False, na=False)
        | plants.plant_name_ferc1.str.contains(name_bit, case=False, na=False)
    )
    & ((plants.state == state) | plants.state.isnull())
].sort_values(["latitude"])
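
If you run this search many times, it may be convenient to wrap it in a function. The sketch below is one way to do that; `find_plants` is a hypothetical helper, not part of PUDL, and its case-insensitive matching and optional `state` argument are assumptions rather than the notebook's established behavior. The demonstration frame is made up.

```python
import pandas as pd


def find_plants(plants, name_bit, state=None):
    """Return rows whose EIA, PUDL, or FERC plant name contains name_bit.

    Hypothetical helper, not part of PUDL. Matching is case-insensitive and
    missing names never match. If state is given, rows from other states are
    dropped, but rows with no state are kept.
    """
    name_cols = ["plant_name_eia", "plant_name_pudl", "plant_name_ferc1"]
    name_match = pd.Series(False, index=plants.index)
    for col in name_cols:
        name_match |= plants[col].str.contains(name_bit, case=False, na=False)
    if state is not None:
        name_match &= (plants.state == state) | plants.state.isna()
    return plants[name_match].sort_values(["latitude"])


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "plant_name_eia": ["richmond hydro", None, "other plant"],
        "plant_name_pudl": ["Richmond Hydro", "richmond station", "Other"],
        "plant_name_ferc1": [None, "richmond", None],
        "state": ["VT", None, "VA"],
        "latitude": [44.4, None, 37.5],
    }
)
matches = find_plants(toy, "richmond", state="VT")
```

With the real table, a call such as `find_plants(plants, "richmond", state="VT")` would stand in for the cell above, and omitting `state` searches every state.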

In [None]:
plants_entity = defs.load_asset_value("out_eia__entity_plants")