# Working with the FERC Form 1 Extract / Transform
This notebook steps through PUDL's extract and transform steps for FERC Form 1, making it easier to test and add new years of data or new tables from the DBF and XBRL sources that haven't been integrated yet.

## Setup

In [1]:
%load_ext autoreload
%autoreload 3
import pudl
import logging
import sys
from pathlib import Path
import pandas as pd
pd.options.display.max_columns = None

pudl_settings is being deprecated in favor of environment variables PUDL_OUTPUT and PUDL_INPUT. For more info see: https://catalystcoop-pudl.readthedocs.io/en/dev/dev/dev_setup.html
sqlite and parquet directories are no longer being used. Make sure there is a single directory named 'output' at the root of your workspace. For more info see: https://catalystcoop-pudl.readthedocs.io/en/dev/dev/dev_setup.html

In [2]:
logger = logging.getLogger()
logger.setLevel(logging.INFO)
handler = logging.StreamHandler(stream=sys.stdout)
formatter = logging.Formatter('%(message)s')
handler.setFormatter(formatter)
logger.handlers = [handler]
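
Replacing the root logger's handlers, as in the cell above, affects every logger whose records propagate to the root. If a library has also attached a handler to its own named logger, each record can be printed twice: once by the library's handler and once by the root handler. The sketch below reproduces that with a hypothetical logger name (`demo.library`, not a real PUDL logger) and shows that setting `propagate = False` is one way to suppress the second copy. Whether that is appropriate for PUDL's loggers is an assumption to check against its own logging setup.

```python
import io
import logging


def count_emitted_lines(propagate: bool) -> int:
    """Log one record and count how many lines the two handlers write in total."""
    stream = io.StringIO()

    # Root logger with a single handler, mirroring the notebook cell above.
    root = logging.getLogger()
    old_handlers, old_level = root.handlers, root.level
    root.handlers = [logging.StreamHandler(stream)]
    root.setLevel(logging.INFO)

    # Hypothetical library logger that brings its own handler.
    lib_logger = logging.getLogger("demo.library")
    lib_logger.handlers = [logging.StreamHandler(stream)]
    lib_logger.propagate = propagate

    lib_logger.info("Processing XBRL metadata.")

    # Restore the root logger so the sketch has no lasting side effects.
    root.handlers, root.level = old_handlers, old_level
    return len(stream.getvalue().splitlines())


print(count_emitted_lines(propagate=True))   # handler on the library logger + root handler
print(count_emitted_lines(propagate=False))  # library handler only
```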

## Extract DBF and XBRL Data

In [3]:
import pandas as pd

from dagster import build_init_resource_context, build_input_context, build_output_context, AssetKey
from typing import Literal

from pudl.etl import defs, default_assets, load_dataset_settings_from_file
from pudl.io_managers import ferc1_xbrl_sqlite_io_manager, ferc1_dbf_sqlite_io_manager
from pudl.resources import dataset_settings
from pudl.extract.ferc1 import create_raw_ferc1_assets
from pudl.extract.ferc1 import TABLE_NAME_MAP_FERC1


years = [2020, 2021] # add desired years here

configured_dataset_settings = {'ferc1': {'years': years}}

dataset_init_context = build_init_resource_context(config=configured_dataset_settings)
configured_dataset_settings = dataset_settings(dataset_init_context)

In [4]:
import pandas as pd

from typing import Literal

from pudl.extract.ferc1 import TABLE_NAME_MAP_FERC1
from pudl.settings import DatasetsSettings



def extract_dbf(dataset_settings: DatasetsSettings) -> dict[str, pd.DataFrame]:
    """
    Coordinates the extraction of all FERC Form 1 tables into PUDL.
    
    Args:
        dataset_settings: object containing desired years to extract.
    
    Returns:
        A dictionary of DataFrames, with the names of PUDL database tables as the keys.
        These are the raw unprocessed dataframes, reflecting the data as it is in the
        FERC Form 1 DB, for passing off to the data tidying and cleaning functions found
        in the :mod:`pudl.transform.ferc1` module.
    """
    
    ferc1_dbf_raw_dfs = {}
    
    io_manager_init_context = build_init_resource_context(resources={"dataset_settings": dataset_settings})
    io_manager = ferc1_dbf_sqlite_io_manager(io_manager_init_context)
    
    for table_name, raw_table_mapping in TABLE_NAME_MAP_FERC1.items():
        dbf_table_or_tables = raw_table_mapping["dbf"]
        if not isinstance(dbf_table_or_tables, list):
            dbf_tables = [dbf_table_or_tables]
        else:
            dbf_tables = dbf_table_or_tables

        tables = []
        for dbf_table in dbf_tables:
            
            context = build_input_context(
                asset_key=AssetKey(dbf_table),
                upstream_output=None,
                resources={"dataset_settings": dataset_settings}
            )
            tables.append(io_manager.load_input(context))
        ferc1_dbf_raw_dfs[table_name] = pd.concat(tables)
    return ferc1_dbf_raw_dfs



def extract_xbrl(dataset_settings: DatasetsSettings) -> dict[str, dict[Literal["duration", "instant"], pd.DataFrame]]:
    """Coordinates the extraction of all FERC Form 1 tables into PUDL from XBRL data.
    
    Args:
        dataset_settings: object containing desired years to extract.
        
    Returns:
        A dictionary where keys are the names of the PUDL database tables, values are
        dictionaries of DataFrames corresponding to the instant and duration tables from
        the XBRL derived FERC 1 database.
        
    """
    ferc1_xbrl_raw_dfs = {}
    
    io_manager_init_context = build_init_resource_context(resources={"dataset_settings": dataset_settings})
    io_manager = ferc1_xbrl_sqlite_io_manager(io_manager_init_context)
    
    for table_name, raw_table_mapping in TABLE_NAME_MAP_FERC1.items():
        xbrl_table_or_tables = raw_table_mapping["xbrl"]
        if not isinstance(xbrl_table_or_tables, list):
            xbrl_tables = [xbrl_table_or_tables]
        else:
            xbrl_tables = xbrl_table_or_tables

        ferc1_xbrl_raw_dfs[table_name] = {}
        
        for period in ("duration", "instant"):
            tables = []
            for xbrl_table in xbrl_tables:
                full_xbrl_table_name = f"{xbrl_table}_{period}"
                context = build_input_context(
                    asset_key=AssetKey(full_xbrl_table_name),
                    upstream_output=None,
                    resources={"dataset_settings": dataset_settings}
                )
                tables.append(io_manager.load_input(context))
            ferc1_xbrl_raw_dfs[table_name][period] = pd.concat(tables)
    return ferc1_xbrl_raw_dfs
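
Both extract functions above share two small patterns: a mapping value that may be either a single raw table name or a list of them is normalized to a list, and XBRL table names get a `_duration` or `_instant` suffix. A minimal, dependency-free sketch of those two steps follows. The mapping `EXAMPLE_TABLE_MAP` and its table names are made up for illustration, and the helper names `as_list` and `xbrl_asset_names` are hypothetical, not part of PUDL.

```python
from typing import Union

# Hypothetical stand-in for TABLE_NAME_MAP_FERC1: values may be a str or a list of str.
EXAMPLE_TABLE_MAP: dict[str, dict[str, Union[str, list[str]]]] = {
    "table_a": {"dbf": "raw_a", "xbrl": "raw_a_xbrl"},
    "table_b": {"dbf": ["raw_b1", "raw_b2"], "xbrl": ["raw_b1_xbrl", "raw_b2_xbrl"]},
}


def as_list(value: Union[str, list[str]]) -> list[str]:
    """Wrap a single table name in a list; return lists unchanged."""
    return value if isinstance(value, list) else [value]


def xbrl_asset_names(table_map: dict) -> dict[str, dict[str, list[str]]]:
    """Build the per-period raw table names for every table in the map."""
    return {
        table_name: {
            period: [f"{raw}_{period}" for raw in as_list(mapping["xbrl"])]
            for period in ("duration", "instant")
        }
        for table_name, mapping in table_map.items()
    }


names = xbrl_asset_names(EXAMPLE_TABLE_MAP)
print(names["table_b"]["instant"])
```

Normalizing to a list up front keeps the loop body identical for single-table and multi-table mappings, which is the same reason the functions above do it before loading anything.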

In [5]:
ferc1_dbf_raw_dfs = extract_dbf(configured_dataset_settings)
ferc1_xbrl_raw_dfs = extract_xbrl(configured_dataset_settings)



steam_electric_generating_plant_statistics_large_plants_fuel_statistics_402_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
purchased_power_326_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
electric_energy_account_401a_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
electric_energy_account_401a_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
summary_of_utility_plant_and_accumulated_provisions_for_depreciation_amortization_and_depletion_200_duration not found in database metadata. Dtypes of returned DataFrame might be incorrect.
transmission_line_statistics_422_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
electric_operations_and_maintenance_expenses_320_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
comparative_balance_sheet_liabilities_and_other_credits_110_duration not found in database metadata. Dtypes of returned DataFrame might be incorrect.
comparative_balance_sheet_assets_and_other_debits_110_duration not found in database metadata. Dtypes of returned DataFrame might be incorrect.
statement_of_income_114_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
retained_earnings_appropriations_118_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
summary_of_depreciation_and_amortization_charges_section_a_336_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
accumulated_provision_for_depreciation_of_electric_utility_plant_functional_classification_section_b_219_duration not found in database metadata. Dtypes of returned DataFrame might be incorrect.
electric_operating_revenues_300_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_4491_provision_for_rate_refunds_304_duration not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_440_residential_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_442_commercial_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_442_industrial_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_444_public_street_and_highway_lighting_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_445_other_sales_to_public_authorities_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_446_sales_to_railroads_and_railways_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_448_interdepartmental_sales_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_4491_provision_for_rate_refunds_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
sales_of_electricity_by_rate_schedules_account_totals_304_instant not found in database metadata. Dtypes of returned DataFrame might be incorrect.
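
The warnings above report tables that were not found in the database metadata. If you want to check which tables a SQLite file actually contains before loading from it, one option is to query `sqlite_master`. The sketch below does that against an in-memory database with made-up table names. The helper `missing_tables` is hypothetical, and pointing it at the actual FERC 1 XBRL SQLite file is left as an assumption about your local setup.

```python
import sqlite3


def missing_tables(conn: sqlite3.Connection, expected: list[str]) -> list[str]:
    """Return the expected table names that do not exist in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    existing = {name for (name,) in rows}
    return [name for name in expected if name not in existing]


# In-memory database with a single, made-up table for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table_duration (entity_id TEXT)")

expected = ["example_table_duration", "example_table_instant"]
print(missing_tables(conn, expected))
conn.close()
```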


In [6]:
from pudl.extract.ferc1 import xbrl_metadata_json
from dagster import build_op_context

context = build_op_context()
xbrl_metadata_json_dict = xbrl_metadata_json(context)

## Transform FERC 1 Tables

### Build Transformers

In [7]:
# Get table class information
import inspect
from pudl.transform.ferc1 import *
from pudl.transform.params import *

def get_table_classes(module):
    """Return the concrete FERC 1 table transformer classes found in ``module``."""
    classes = [member[1] for member in inspect.getmembers(module, inspect.isclass)]
    table_classes = [x for x in classes if x.__name__.endswith("Ferc1TableTransformer")]
    return [x for x in table_classes if x.__name__ != "AbstractFerc1TableTransformer"]

classes = get_table_classes(pudl.transform.ferc1)
# Map each table ID to the transformer class that handles it
table_id_dict = {cls.table_id.value: cls for cls in classes}

# Loop over selected tables to build the transformers
transformers = {}
for table in TABLE_NAME_MAP_FERC1.keys():
    # this table is in the name map but doesn't have a transform class
    if table == "retained_earnings_appropriations_ferc1":
        continue
    transformers[table] = (
        table_id_dict[table](
            xbrl_metadata_json=xbrl_metadata_json_dict[table],
            cache_dfs=True,
            clear_cached_dfs=False
        )
    )

2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 fuel_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 plants_steam_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 plants_small_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 plants_hydro_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 plants_pumped_storage_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 plant_in_service_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 purchased_power_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 electric_energy_sources_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 electric_energy_dispositions_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 utility_plant_summary_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 transmission_statistics_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 electric_operating_expenses_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 balance_sheet_liabilities_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 balance_sheet_assets_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 income_statement_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 retained_earnings_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 depreciation_amortization_summary_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 electric_plant_depreciation_changes_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 electric_plant_depreciation_functional_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 electric_operating_revenues_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 cash_flow_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 electricity_sales_by_rate_schedule_ferc1: Processing XBRL metadata.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:993 other_regulatory_liabilities_ferc1: Processing XBRL metadata.
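
`get_table_classes` in the cell above relies on `inspect.getmembers` with a name-suffix filter, and the resulting classes are then keyed by their `table_id`. The following self-contained sketch shows the same discovery pattern on a throwaway module built with `types.ModuleType`. Every class and table name in it is invented for illustration, and `table_id` is a plain string here rather than the enum-like attribute used above.

```python
import inspect
import types

# Build a throwaway module holding a few made-up classes.
demo = types.ModuleType("demo_transformers")


class AbstractDemoTableTransformer:
    table_id = None


class FuelDemoTableTransformer(AbstractDemoTableTransformer):
    table_id = "fuel_demo"


class PlantsDemoTableTransformer(AbstractDemoTableTransformer):
    table_id = "plants_demo"


class Unrelated:
    pass


for cls in (
    AbstractDemoTableTransformer,
    FuelDemoTableTransformer,
    PlantsDemoTableTransformer,
    Unrelated,
):
    setattr(demo, cls.__name__, cls)


def get_concrete_transformers(module, suffix="DemoTableTransformer"):
    """Map table_id -> class for every non-abstract class whose name ends with suffix."""
    classes = [obj for _, obj in inspect.getmembers(module, inspect.isclass)]
    return {
        cls.table_id: cls
        for cls in classes
        if cls.__name__.endswith(suffix) and not cls.__name__.startswith("Abstract")
    }


print(sorted(get_concrete_transformers(demo)))
```

Discovering classes by naming convention means a new transformer is picked up without editing a registry, at the cost of silently skipping any class that does not follow the convention.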


### Transform Individual Tables

In [8]:
from pprint import pprint

# List the available tables, then pick one to transform in the next cell
pprint(list(transformers.keys()))

['fuel_ferc1',
 'plants_steam_ferc1',
 'plants_small_ferc1',
 'plants_hydro_ferc1',
 'plants_pumped_storage_ferc1',
 'plant_in_service_ferc1',
 'purchased_power_ferc1',
 'electric_energy_sources_ferc1',
 'electric_energy_dispositions_ferc1',
 'utility_plant_summary_ferc1',
 'transmission_statistics_ferc1',
 'electric_operating_expenses_ferc1',
 'balance_sheet_liabilities_ferc1',
 'balance_sheet_assets_ferc1',
 'income_statement_ferc1',
 'retained_earnings_ferc1',
 'depreciation_amortization_summary_ferc1',
 'electric_plant_depreciation_changes_ferc1',
 'electric_plant_depreciation_functional_ferc1',
 'electric_operating_revenues_ferc1',
 'cash_flow_ferc1',
 'electricity_sales_by_rate_schedule_ferc1',
 'other_regulatory_liabilities_ferc1']


In [9]:
table_name = "other_regulatory_liabilities_ferc1"  # choose a table here
TRANSFORMER = transformers[table_name]

#### Test each step of the transform process

In [10]:
xbrl = TRANSFORMER.process_xbrl(
    raw_xbrl_instant=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["instant"],
    raw_xbrl_duration=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["duration"]
)

2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1100 other_regulatory_liabilities_ferc1: Processing XBRL data pre-concatenation.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 0 columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1154 other_regulatory_liabilities_ferc1: Unstacking balances to the report years.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 0 columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1381 other_regulatory_liabilities_ferc1: After selection of dates based on the report year, we have 100.0% of the original table.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1285 other_regulatory_liabilities_ferc1: Both XBRL instant & duration tables found.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1304 other_regulatory_liabilities_ferc1: Combining XBRL instant & duration tables using RIGHT-MERGE.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 9 columns.


In [11]:
dbf = TRANSFORMER.process_dbf(
    raw_dbf=ferc1_dbf_raw_dfs[TRANSFORMER.table_id.value]
)

2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1079 other_regulatory_liabilities_ferc1: Processing DBF data pre-concatenation.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1142 other_regulatory_liabilities_ferc1: After selection only annual records, we have 26.7% of the original table.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 13 columns.


In [12]:
start = TRANSFORMER.transform_start(
    raw_dbf=ferc1_dbf_raw_dfs[TRANSFORMER.table_id.value],
    raw_xbrl_instant=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["instant"],
    raw_xbrl_duration=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["duration"]
)

2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1079 other_regulatory_liabilities_ferc1: Processing DBF data pre-concatenation.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1142 other_regulatory_liabilities_ferc1: After selection only annual records, we have 26.7% of the original table.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 13 columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1100 other_regulatory_liabilities_ferc1: Processing XBRL data pre-concatenation.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 0 columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1154 other_regulatory_liabilities_ferc1: Unstacking balances to the report years.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 0 columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1381 other_regulatory_liabilities_ferc1: After selection of dates based on the report year, we have 100.0% of the original table.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1285 other_regulatory_liabilities_ferc1: Both XBRL instant & duration tables found.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1304 other_regulatory_liabilities_ferc1: Combining XBRL instant & duration tables using RIGHT-MERGE.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 9 columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:916 other_regulatory_liabilities_ferc1: Concatenating DBF + XBRL dataframes.


In [13]:
main = TRANSFORMER.transform_main(
    start
)

2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1336 other_regulatory_liabilities_ferc1: Spot fixing missing values.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1234 other_regulatory_liabilities_ferc1: Normalizing freeform string columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1258 other_regulatory_liabilities_ferc1: Categorizing string columns using a controlled vocabulary.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1283 other_regulatory_liabilities_ferc1: Converting units and renaming columns accordingly.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1245 other_regulatory_liabilities_ferc1: Stripping non-numeric values from [].
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1272 other_regulatory_liabilities_ferc1: Nullifying outlying values.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1325 other_regulatory_liabilities_ferc1: Replacing specified values with NA.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1312 other_regulatory_liabilities_ferc1: Dropping remaining invalid rows.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:823 29.8% of records (2113 rows) contain only {0, '', <NA>, nan} values in required columns. Dropped these 💩💩💩 records.
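
The last log message above reports rows dropped because their required columns hold only values such as `0`, `''`, `<NA>` or `nan`. A simplified, pure-Python sketch of that idea is shown here. It is not PUDL's implementation: the function name `drop_invalid_rows`, the column names, and the use of `None` to stand in for `<NA>` are all assumptions made for the example.

```python
import math


def is_invalid(value) -> bool:
    """Treat 0, empty strings, None and NaN as carrying no information."""
    if value is None or value == "" or value == 0:
        return True
    return isinstance(value, float) and math.isnan(value)


def drop_invalid_rows(rows: list[dict], required: list[str]) -> list[dict]:
    """Keep rows where at least one required column holds a valid value."""
    return [
        row for row in rows
        if not all(is_invalid(row.get(col)) for col in required)
    ]


# Made-up records: rows 2 and 3 have nothing usable in the required columns.
rows = [
    {"utility_id": 1, "description": "deferred tax", "balance": 125.0},
    {"utility_id": 2, "description": "", "balance": 0},
    {"utility_id": 3, "description": "", "balance": float("nan")},
    {"utility_id": 4, "description": "refund", "balance": 0},
]
kept = drop_invalid_rows(rows, required=["description", "balance"])
print([row["utility_id"] for row in kept])
```

Note that a row is dropped only when every required column is uninformative, so row 4 survives on the strength of its description alone.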


In [14]:
end = TRANSFORMER.transform_end(
    main
)

2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1343 other_regulatory_liabilities_ferc1: Enforcing database schema on dataframe.


#### Test all steps together

In [15]:
full = TRANSFORMER.transform(
    raw_dbf=ferc1_dbf_raw_dfs[TRANSFORMER.table_id.value],
    raw_xbrl_instant=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["instant"],
    raw_xbrl_duration=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["duration"]
)

2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1079 other_regulatory_liabilities_ferc1: Processing DBF data pre-concatenation.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1142 other_regulatory_liabilities_ferc1: After selection only annual records, we have 26.7% of the original table.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 13 columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1100 other_regulatory_liabilities_ferc1: Processing XBRL data pre-concatenation.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 0 columns.
2023-03-15 16:51:49 [    INFO] catalystcoop.pudl.transform.ferc1:1154 other_regulatory_liabilities_ferc1: Unstacking balances to the report years.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 0 columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1381 other_regulatory_liabilities_ferc1: After selection of dates based on the report year, we have 100.0% of the original table.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1285 other_regulatory_liabilities_ferc1: Both XBRL instant & duration tables found.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1304 other_regulatory_liabilities_ferc1: Combining XBRL instant & duration tables using RIGHT-MERGE.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1210 other_regulatory_liabilities_ferc1: Attempting to rename 9 columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:916 other_regulatory_liabilities_ferc1: Concatenating DBF + XBRL dataframes.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1336 other_regulatory_liabilities_ferc1: Spot fixing missing values.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1234 other_regulatory_liabilities_ferc1: Normalizing freeform string columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1258 other_regulatory_liabilities_ferc1: Categorizing string columns using a controlled vocabulary.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1283 other_regulatory_liabilities_ferc1: Converting units and renaming columns accordingly.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1245 other_regulatory_liabilities_ferc1: Stripping non-numeric values from [].
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1272 other_regulatory_liabilities_ferc1: Nullifying outlying values.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1325 other_regulatory_liabilities_ferc1: Replacing specified values with NA.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1312 other_regulatory_liabilities_ferc1: Dropping remaining invalid rows.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:823 29.8% of records (2113 rows) contain only {0, '', <NA>, nan} values in required columns. Dropped these 💩💩💩 records.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1343 other_regulatory_liabilities_ferc1: Enforcing database schema on dataframe.


### Transform All Tables
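
A plain loop over every transformer, like the one in the cell that follows, stops at the first table that raises. When adding a new year of data, it can be handy to record failures and keep going so that all problem tables surface in one pass. The sketch below shows one way to do that with stand-in objects. `FakeTransformer` and `transform_all` are hypothetical names, and the real transformers take the raw DBF and XBRL dataframes rather than no arguments.

```python
class FakeTransformer:
    """Stand-in for a table transformer; fails on request."""

    def __init__(self, table_id: str, should_fail: bool = False):
        self.table_id = table_id
        self.should_fail = should_fail

    def transform(self) -> str:
        if self.should_fail:
            raise ValueError(f"{self.table_id}: unexpected column")
        return f"{self.table_id} transformed"


def transform_all(transformers: dict) -> tuple[dict, dict]:
    """Run every transformer, returning (results, errors) keyed by table name."""
    results, errors = {}, {}
    for name, transformer in transformers.items():
        try:
            results[name] = transformer.transform()
        except Exception as err:  # broad on purpose: record the failure, keep looping
            errors[name] = err
    return results, errors


demo = {
    "table_a": FakeTransformer("table_a"),
    "table_b": FakeTransformer("table_b", should_fail=True),
    "table_c": FakeTransformer("table_c"),
}
results, errors = transform_all(demo)
print(sorted(results), sorted(errors))
```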

In [16]:
transformed_tables = {}
for table_name, transformer in transformers.items():
    transformed_tables[transformer.table_id.value] = transformer.transform(
        raw_dbf=ferc1_dbf_raw_dfs[transformer.table_id.value],
        raw_xbrl_instant=ferc1_xbrl_raw_dfs[transformer.table_id.value]["instant"],
        raw_xbrl_duration=ferc1_xbrl_raw_dfs[transformer.table_id.value]["duration"]
    )

2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1079 fuel_ferc1: Processing DBF data pre-concatenation.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1210 fuel_ferc1: Attempting to rename 17 columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1283 fuel_ferc1: Converting units and renaming columns accordingly.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1234 fuel_ferc1: Normalizing freeform string columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1258 fuel_ferc1: Categorizing string columns using a controlled vocabulary.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1210 fuel_ferc1: Attempting to rename 0 columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1154 fuel_ferc1: Unstacking balances to the report years.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1210 fuel_ferc1: Attempting to rename 0 columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1381 fuel_ferc1: After selection of dates based on the report year, we have 100.0% of the original table.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1279 fuel_ferc1: No XBRL instant table found.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1210 fuel_ferc1: Attempting to rename 15 columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1283 fuel_ferc1: Converting units and renaming columns accordingly.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1234 fuel_ferc1: Normalizing freeform string columns.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1258 fuel_ferc1: Categorizing string columns using a controlled vocabulary.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1783 fuel_ferc1: Aggregating 30 rows with duplicate primary keys out of 1318 total rows.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1787 fuel_ferc1: Dropping 98 records with inconsistent fuel units preventing aggregation out of 1318 total rows.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:916 fuel_ferc1: Concatenating DBF + XBRL dataframes.
2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1336 fuel_ferc1: Spot fixing missing values.


fuel_ferc1: Spot fixing missing values.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1312 fuel_ferc1: Dropping remaining invalid rows.


fuel_ferc1: Dropping remaining invalid rows.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:823 66.2% of records (4715 rows) contain only {0, '', <NA>, nan} values in required columns. Dropped these 💩💩💩 records.


66.2% of records (4715 rows) contain only {0, '', <NA>, nan} values in required columns. Dropped these 💩💩💩 records.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:823 0.5% of records (12 rows) contain only {'', '-', 'ant1-3', '0', 'elk 1-3', 'must 456', 'must 123', nan, 'not applicable', <NA>} values in required columns. Dropped these 💩💩💩 records.


0.5% of records (12 rows) contain only {'', '-', 'ant1-3', '0', 'elk 1-3', 'must 456', 'must 123', nan, 'not applicable', <NA>} values in required columns. Dropped these 💩💩💩 records.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.ferc1:1857 fuel_ferc1: Dropping 0/2400rows representing plant-level all-fuel totals.


fuel_ferc1: Dropping 0/2400rows representing plant-level all-fuel totals.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1299 fuel_ferc1: Correcting inferred non-standard column units.


fuel_ferc1: Correcting inferred non-standard column units.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:698 Correcting units of fuel_mmbtu_per_unit where fuel_type_code_pudl==coal.


Correcting units of fuel_mmbtu_per_unit where fuel_type_code_pudl==coal.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:719 34/445 (7.64%) of records could not be corrected and were set to NA.


34/445 (7.64%) of records could not be corrected and were set to NA.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:698 Correcting units of fuel_mmbtu_per_unit where fuel_type_code_pudl==gas.


Correcting units of fuel_mmbtu_per_unit where fuel_type_code_pudl==gas.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:719 92/1041 (8.84%) of records could not be corrected and were set to NA.


92/1041 (8.84%) of records could not be corrected and were set to NA.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:698 Correcting units of fuel_mmbtu_per_unit where fuel_type_code_pudl==oil.


Correcting units of fuel_mmbtu_per_unit where fuel_type_code_pudl==oil.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:719 90/749 (12.02%) of records could not be corrected and were set to NA.


90/749 (12.02%) of records could not be corrected and were set to NA.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:698 Correcting units of fuel_cost_per_mmbtu where fuel_type_code_pudl==coal.


Correcting units of fuel_cost_per_mmbtu where fuel_type_code_pudl==coal.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:719 57/445 (12.81%) of records could not be corrected and were set to NA.


57/445 (12.81%) of records could not be corrected and were set to NA.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:698 Correcting units of fuel_cost_per_mmbtu where fuel_type_code_pudl==gas.


Correcting units of fuel_cost_per_mmbtu where fuel_type_code_pudl==gas.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:719 87/1041 (8.36%) of records could not be corrected and were set to NA.


87/1041 (8.36%) of records could not be corrected and were set to NA.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:698 Correcting units of fuel_cost_per_mmbtu where fuel_type_code_pudl==oil.


Correcting units of fuel_cost_per_mmbtu where fuel_type_code_pudl==oil.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:719 167/749 (22.30%) of records could not be corrected and were set to NA.


167/749 (22.30%) of records could not be corrected and were set to NA.


2023-03-15 16:51:50 [    INFO] catalystcoop.pudl.transform.classes:1343 fuel_ferc1: Enforcing database schema on dataframe.


fuel_ferc1: Enforcing database schema on dataframe.


TypeError: PlantsSteamFerc1TableTransformer.transform() missing 1 required positional argument: 'transformed_fuel'
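
The `TypeError` above indicates that `PlantsSteamFerc1TableTransformer.transform()` expects a `transformed_fuel` argument in addition to the three raw inputs, so it cannot be called the same way as the other transformers. One possible way to handle this is a two-pass loop: transform every table that needs only raw inputs first, then transform the dependent table using the already transformed fuel table. The following is a minimal sketch of that pattern, not PUDL's own implementation. The classes `GenericTransformer` and `NeedsFuelTransformer`, the function `transform_all`, and the demo inputs are hypothetical stand-ins, and the table ID `plants_steam_ferc1` is an assumption inferred from the class name in the error message.

```python
class GenericTransformer:
    """Hypothetical stand-in: transform() takes only the three raw inputs."""

    def __init__(self, table_id):
        self.table_id = table_id

    def transform(self, raw_dbf, raw_xbrl_instant, raw_xbrl_duration):
        return f"{self.table_id}: transformed"


class NeedsFuelTransformer(GenericTransformer):
    """Hypothetical stand-in: transform() also requires transformed_fuel."""

    def transform(self, raw_dbf, raw_xbrl_instant, raw_xbrl_duration, transformed_fuel):
        return f"{self.table_id}: transformed using [{transformed_fuel}]"


def transform_all(
    transformers,
    raw_dbf,
    raw_xbrl,
    needs_fuel=("plants_steam_ferc1",),  # assumed table ID, see lead-in
    fuel_table="fuel_ferc1",
):
    """Transform all tables, deferring those that depend on the fuel table."""
    transformed = {}

    def raw_inputs(table_id):
        # Collect the three raw inputs every transformer needs.
        return {
            "raw_dbf": raw_dbf[table_id],
            "raw_xbrl_instant": raw_xbrl[table_id]["instant"],
            "raw_xbrl_duration": raw_xbrl[table_id]["duration"],
        }

    # First pass: tables that only need their own raw inputs.
    for table_id, transformer in transformers.items():
        if table_id in needs_fuel:
            continue
        transformed[table_id] = transformer.transform(**raw_inputs(table_id))

    # Second pass: tables that also need the transformed fuel table.
    for table_id in needs_fuel:
        if table_id in transformers:
            transformed[table_id] = transformers[table_id].transform(
                **raw_inputs(table_id),
                transformed_fuel=transformed[fuel_table],
            )
    return transformed


# Demo with placeholder inputs. The dependent table is listed first on purpose
# to show that dictionary order does not matter.
demo_transformers = {
    "plants_steam_ferc1": NeedsFuelTransformer("plants_steam_ferc1"),
    "fuel_ferc1": GenericTransformer("fuel_ferc1"),
}
demo_dbf = {table_id: None for table_id in demo_transformers}
demo_xbrl = {
    table_id: {"instant": None, "duration": None} for table_id in demo_transformers
}
demo_result = transform_all(demo_transformers, demo_dbf, demo_xbrl)
print(demo_result["plants_steam_ferc1"])
```

In the notebook, the same idea would mean skipping the steam plants transformer inside the loop and calling it afterwards with `transformed_fuel=transformed_tables["fuel_ferc1"]`, assuming the fuel table has been transformed successfully by then.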