# Working with the FERC Form 1 Extract / Transform
This notebook steps through PUDL's extract and transform steps for FERC Form 1, making it easier to test and add new years of data or new tables that haven't been integrated yet.

This notebook deviates from other devtool debug notebooks in that it doesn't use the most recently created dagster asset values. Instead, the extraction and transform steps are rerun within the notebook so we can inspect the outputs of lower-level transform functions that don't have their own assets, like `process_xbrl` and `transform_start`.

**Make sure you've created the raw FERC databases using one of the `ferc_to_sqlite` jobs!**

## Setup

In [1]:
%load_ext autoreload
%autoreload 3
import logging
import sys
from pathlib import Path

import pandas as pd

import pudl

pd.options.display.max_columns = None

## Extract DBF and XBRL Data:

In [2]:
from dagster import build_init_resource_context

from pudl.resources import dataset_settings

years = [2020, 2021]  # add desired years here
configured_dataset_settings = {"ferc1": {"years": years}}

dataset_init_context = build_init_resource_context(config=configured_dataset_settings)
configured_dataset_settings = dataset_settings(dataset_init_context)

In [3]:
from pudl.extract.ferc1 import extract_dbf, extract_xbrl

ferc1_dbf_raw_dfs = extract_dbf(configured_dataset_settings)
ferc1_xbrl_raw_dfs = extract_xbrl(configured_dataset_settings)

In [4]:
ferc1_xbrl_raw_dfs["core_ferc1__yearly_steam_plants_fuel_sched402"]["duration"].report_year

0       2021
1       2021
2       2021
3       2021
4       2021
        ... 
1322    2021
1323    2021
1324    2021
1325    2021
1326    2021
Name: report_year, Length: 1327, dtype: int64

In [5]:
from dagster import build_op_context

from pudl.extract.ferc1 import raw_xbrl_metadata_json
from pudl.transform.ferc1 import clean_xbrl_metadata_json

context = build_op_context()
xbrl_metadata_json_dict = clean_xbrl_metadata_json(raw_xbrl_metadata_json(context))

## Transform FERC 1 Tables:

### Build Transformers

In [6]:
# Get table class information
import inspect

from pudl.transform.ferc1 import *
from pudl.transform.params import *


def get_table_classes(module):
    """Return the concrete ``*TableTransformer`` classes found in ``module``."""
    abstract_names = ("AbstractTableTransformer", "Ferc1AbstractTableTransformer")
    classes = [member[1] for member in inspect.getmembers(module, inspect.isclass)]
    return [
        cls
        for cls in classes
        if cls.__name__.endswith("TableTransformer")
        and cls.__name__ not in abstract_names
    ]


classes = get_table_classes(pudl.transform.ferc1)
# Map each table ID to the transformer class that handles it
table_id_dict = {cls.table_id.value: cls for cls in classes}

# Loop over selected tables to build the transformers
transformers = {}
for table in TABLE_NAME_MAP_FERC1:
    # This table is in the name map but doesn't have a transform class
    if table == "retained_earnings_appropriations_ferc1":
        continue
    transformers[table] = table_id_dict[table](
        xbrl_metadata_json=xbrl_metadata_json_dict[table],
        cache_dfs=True,
        clear_cached_dfs=False,
    )

2024-01-09 18:04:12 [    INFO] catalystcoop.pudl.transform.ferc1:1895 core_ferc1__yearly_steam_plants_fuel_sched402: Processing XBRL metadata.
  pd.concat([tbl_meta, correction_meta])
2024-01-09 18:04:12 [    INFO] catalystcoop.pudl.transform.ferc1:1895 core_ferc1__yearly_steam_plants_sched402: Processing XBRL metadata.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:1895 core_ferc1__yearly_small_plants_sched410: Processing XBRL metadata.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:1895 core_ferc1__yearly_hydroelectric_plants_sched406: Processing XBRL metadata.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:1895 core_ferc1__yearly_pumped_storage_plants_sched408: Processing XBRL metadata.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:1895 core_ferc1__yearly_plant_in_service_sched204: Processing XBRL metadata.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:1895 core_ferc1__yearly_purchased_power_
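
The class-discovery step above relies on `inspect.getmembers` plus a naming convention. As a minimal, self-contained sketch of that pattern (the module, class names, and `find_transformers` helper below are hypothetical stand-ins, not PUDL code), the same filtering can be tried on a throwaway module:

```python
import inspect
import types

# Build a throwaway module holding hypothetical transformer classes.
toy_module = types.ModuleType("toy_transformers")


class AbstractTableTransformer:
    """Stand-in for an abstract base class that should be skipped."""


class FuelTableTransformer(AbstractTableTransformer):
    table_id = "fuel"


class PlantsTableTransformer(AbstractTableTransformer):
    table_id = "plants"


class Helper:
    """Not a transformer, so it should be filtered out by its name."""


for cls in (
    AbstractTableTransformer,
    FuelTableTransformer,
    PlantsTableTransformer,
    Helper,
):
    setattr(toy_module, cls.__name__, cls)


def find_transformers(module, skip=("AbstractTableTransformer",)):
    """Return classes in ``module`` named ``*TableTransformer``, minus ``skip``."""
    return [
        cls
        for name, cls in inspect.getmembers(module, inspect.isclass)
        if name.endswith("TableTransformer") and name not in skip
    ]


# Key the discovered classes by their table ID, as the notebook cell does.
by_table_id = {cls.table_id: cls for cls in find_transformers(toy_module)}
print(sorted(by_table_id))
```

Because the filter is purely name-based, any class that follows the naming convention is picked up, so abstract bases have to be excluded explicitly.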

### Transform Individual Tables

In [7]:
from pprint import pprint

# List the available tables; pick one of these to transform in the next cell
pprint(list(transformers.keys()))

['core_ferc1__yearly_steam_plants_fuel_sched402',
 'core_ferc1__yearly_steam_plants_sched402',
 'core_ferc1__yearly_small_plants_sched410',
 'core_ferc1__yearly_hydroelectric_plants_sched406',
 'core_ferc1__yearly_pumped_storage_plants_sched408',
 'core_ferc1__yearly_plant_in_service_sched204',
 'core_ferc1__yearly_purchased_power_and_exchanges_sched326',
 'core_ferc1__yearly_energy_sources_sched401',
 'core_ferc1__yearly_energy_dispositions_sched401',
 'core_ferc1__yearly_utility_plant_summary_sched200',
 'core_ferc1__yearly_transmission_lines_sched422',
 'core_ferc1__yearly_operating_expenses_sched320',
 'core_ferc1__yearly_balance_sheet_liabilities_sched110',
 'core_ferc1__yearly_balance_sheet_assets_sched110',
 'core_ferc1__yearly_income_statements_sched114',
 'core_ferc1__yearly_retained_earnings_sched118',
 'core_ferc1__yearly_depreciation_summary_sched336',
 'core_ferc1__yearly_depreciation_changes_sched219',
 'core_ferc1__yearly_depreciation_by_function_sched219',
 'core_ferc1_

In [8]:
table_name = "core_ferc1__yearly_other_regulatory_liabilities_sched278"  # choose a table here
TRANSFORMER = transformers[table_name]

#### Test each step of the transform process:

In [9]:
xbrl = TRANSFORMER.process_xbrl(
    raw_xbrl_instant=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["instant"],
    raw_xbrl_duration=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["duration"],
)

2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:2391 core_ferc1__yearly_other_regulatory_liabilities_sched278: Processing XBRL data pre-concatenation.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_other_regulatory_liabilities_sched278: Attempting to rename 0 columns.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:2445 core_ferc1__yearly_other_regulatory_liabilities_sched278: Unstacking balances to the report years.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_other_regulatory_liabilities_sched278: Attempting to rename 0 columns.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:2670 core_ferc1__yearly_other_regulatory_liabilities_sched278: After selection of dates based on the report year, we have 100.0% of the original table.
2024-01-09 18:04:13 [    INFO] catalystcoop.pudl.transform.ferc1:2573 core_ferc1__yearly_other_regulatory_liabilities_sche

In [10]:
dbf = TRANSFORMER.process_dbf(raw_dbf=ferc1_dbf_raw_dfs[TRANSFORMER.table_id.value])

2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2370 core_ferc1__yearly_other_regulatory_liabilities_sched278: Processing DBF data pre-concatenation.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2433 core_ferc1__yearly_other_regulatory_liabilities_sched278: After selection of only annual records, we have 26.7% of the original table.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_other_regulatory_liabilities_sched278: Attempting to rename 13 columns.


In [11]:
start = TRANSFORMER.transform_start(
    raw_dbf=ferc1_dbf_raw_dfs[TRANSFORMER.table_id.value],
    raw_xbrl_instant=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["instant"],
    raw_xbrl_duration=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["duration"],
)

2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2370 core_ferc1__yearly_other_regulatory_liabilities_sched278: Processing DBF data pre-concatenation.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2433 core_ferc1__yearly_other_regulatory_liabilities_sched278: After selection of only annual records, we have 26.7% of the original table.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_other_regulatory_liabilities_sched278: Attempting to rename 13 columns.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2391 core_ferc1__yearly_other_regulatory_liabilities_sched278: Processing XBRL data pre-concatenation.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_other_regulatory_liabilities_sched278: Attempting to rename 0 columns.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2445 core_ferc1__yearly_other_regulatory_liabilities_sched278: Unstack

In [12]:
main = TRANSFORMER.transform_main(start)

2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1353 core_ferc1__yearly_other_regulatory_liabilities_sched278: Spot fixing missing values.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1251 core_ferc1__yearly_other_regulatory_liabilities_sched278: Normalizing freeform string columns.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1275 core_ferc1__yearly_other_regulatory_liabilities_sched278: Categorizing string columns using a controlled vocabulary.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1300 core_ferc1__yearly_other_regulatory_liabilities_sched278: Converting units and renaming columns accordingly.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1262 core_ferc1__yearly_other_regulatory_liabilities_sched278: Stripping non-numeric values from [].
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1289 core_ferc1__yearly_other_regulatory_liabilities_sched278: Nullify

In [13]:
end = TRANSFORMER.transform_end(main)

2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1360 core_ferc1__yearly_other_regulatory_liabilities_sched278: Enforcing database schema on dataframe.


#### Test all steps together

In [14]:
full = TRANSFORMER.transform(
    raw_dbf=ferc1_dbf_raw_dfs[TRANSFORMER.table_id.value],
    raw_xbrl_instant=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["instant"],
    raw_xbrl_duration=ferc1_xbrl_raw_dfs[TRANSFORMER.table_id.value]["duration"],
)

2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2370 core_ferc1__yearly_other_regulatory_liabilities_sched278: Processing DBF data pre-concatenation.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2433 core_ferc1__yearly_other_regulatory_liabilities_sched278: After selection of only annual records, we have 26.7% of the original table.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_other_regulatory_liabilities_sched278: Attempting to rename 13 columns.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2391 core_ferc1__yearly_other_regulatory_liabilities_sched278: Processing XBRL data pre-concatenation.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_other_regulatory_liabilities_sched278: Attempting to rename 0 columns.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2445 core_ferc1__yearly_other_regulatory_liabilities_sched278: Unstack
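
The cells above run the stages one at a time and then all at once. The repeated log lines suggest that the one-shot call chains the same stages in order. A toy sketch of that staged pattern follows; the `ToyTransformer` class and its row format are hypothetical and are not PUDL's implementation:

```python
class ToyTransformer:
    """Hypothetical three-stage transformer; not PUDL's implementation."""

    def transform_start(self, raw_rows):
        # Stage 1: drop records that lack a report year.
        return [row for row in raw_rows if row.get("report_year") is not None]

    def transform_main(self, rows):
        # Stage 2: normalize free-form strings.
        return [{**row, "name": row["name"].strip().lower()} for row in rows]

    def transform_end(self, rows):
        # Stage 3: keep only the expected columns, in a fixed order.
        return [{key: row[key] for key in ("report_year", "name")} for row in rows]

    def transform(self, raw_rows):
        # Run all three stages in order.
        return self.transform_end(self.transform_main(self.transform_start(raw_rows)))


raw_rows = [
    {"name": "  Plant A ", "report_year": 2021, "extra": 1},
    {"name": "Plant B", "report_year": None, "extra": 2},
]
toy_transformer = ToyTransformer()

# Step through the stages by hand, keeping each intermediate result.
toy_start = toy_transformer.transform_start(raw_rows)
toy_main = toy_transformer.transform_main(toy_start)
toy_end = toy_transformer.transform_end(toy_main)

# The stepwise result should match the one-shot call.
assert toy_end == toy_transformer.transform(raw_rows)
print(toy_end)
```

In this notebook the analogous check would compare `end` with `full`, for example with `pd.testing.assert_frame_equal(end, full)`, assuming both are DataFrames.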

### Transform All Tables

In [15]:
transformed_tables = {}
for table_name, transformer in transformers.items():
    if table_name == "core_ferc1__yearly_steam_plants_sched402":
        # Special case: this table depends on the transformed
        # core_ferc1__yearly_steam_plants_fuel_sched402 table, so it is handled separately.
        continue
    transformed_tables[transformer.table_id.value] = transformer.transform(
        raw_dbf=ferc1_dbf_raw_dfs[transformer.table_id.value],
        raw_xbrl_instant=ferc1_xbrl_raw_dfs[transformer.table_id.value]["instant"],
        raw_xbrl_duration=ferc1_xbrl_raw_dfs[transformer.table_id.value]["duration"],
    )

2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.ferc1:2370 core_ferc1__yearly_steam_plants_fuel_sched402: Processing DBF data pre-concatenation.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_steam_plants_fuel_sched402: Attempting to rename 17 columns.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1300 core_ferc1__yearly_steam_plants_fuel_sched402: Converting units and renaming columns accordingly.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1251 core_ferc1__yearly_steam_plants_fuel_sched402: Normalizing freeform string columns.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1275 core_ferc1__yearly_steam_plants_fuel_sched402: Categorizing string columns using a controlled vocabulary.
2024-01-09 18:04:14 [    INFO] catalystcoop.pudl.transform.classes:1227 core_ferc1__yearly_steam_plants_fuel_sched402: Attempting to rename 0 columns.
2024-01-09 18:04:14 [    INFO] cataly

AssertionError: Found errors while running tests on the calculations:
                                                                                                                    error_frequency  tolerance_error_frequency  is_error_error_frequency  relative_error_magnitude  tolerance_relative_error_magnitude  is_error_relative_error_magnitude  null_calculated_value_frequency  tolerance_null_calculated_value_frequency is_error_null_calculated_value_frequency  null_reported_value_frequency  tolerance_null_reported_value_frequency  is_error_null_reported_value_frequency
group            table_name                                        group_value                                                                                                                                                                                                                                                                                                                                                                                                                                                
ungrouped        ungrouped                                         ungrouped                                               0.010353                     0.0092                      True                  0.049893                               0.039                               True                         0.333503                                        0.7                                    False                       0.406416                                      1.0                                   False
xbrl_factoid     core_ferc1__yearly_utility_plant_summary_sched200 utility_plant_and_construction_work_in_progress         0.178309                     0.1600                      True                  0.237453                               0.200                               True                         0.015625                                        1.0                                    False                       0.000000                                      1.0                                   False
utility_id_ferc1 core_ferc1__yearly_utility_plant_summary_sched200 376                                                     0.062500                     0.2100                     False                  0.105940                               0.074                               True                         0.250000                                        1.0                                    False                       0.187500                                      1.0                                   False
                                                                   377                                                     0.014085                     0.2100                     False                  0.122823                               0.074                               True                         0.333333                                        1.0                                    False                       0.535211                                      1.0                                   False
                                                                   444                                                     0.062500                     0.2100                     False                  0.120875                               0.074                               True                         0.333333                                        1.0                                    False                       0.125000                                      1.0                                   False
                                                                   447                                                     0.166667                     0.2100                     False                  0.454168                               0.074                               True                         0.333333                                        1.0                                    False                       0.333333                                      1.0                                   False
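
Each metric in the error table above appears next to a `tolerance_*` column and an `is_error_*` flag. In every row shown, the flag is `True` exactly when the metric exceeds its tolerance. The sketch below reproduces the flags for the `ungrouped` row under that assumption; the comparison rule is inferred from the output, not taken from PUDL's code:

```python
# (metric value, tolerance) pairs copied from the "ungrouped" row above.
ungrouped = {
    "error_frequency": (0.010353, 0.0092),
    "relative_error_magnitude": (0.049893, 0.039),
    "null_calculated_value_frequency": (0.333503, 0.7),
    "null_reported_value_frequency": (0.406416, 1.0),
}

# Assumption: a metric is flagged when its value exceeds its tolerance.
is_error = {
    metric: value > tolerance for metric, (value, tolerance) in ungrouped.items()
}

for metric, flagged in is_error.items():
    print(f"{metric}: {'FAIL' if flagged else 'ok'}")
```

Read this way, the ungrouped check fails on `error_frequency` and `relative_error_magnitude`, while both null-frequency metrics are within tolerance.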

In [16]:
# Handle special case for "core_ferc1__yearly_steam_plants_sched402"
transformer = transformers["core_ferc1__yearly_steam_plants_sched402"]
transformed_tables[transformer.table_id.value] = transformer.transform(
    raw_dbf=ferc1_dbf_raw_dfs[transformer.table_id.value],
    raw_xbrl_instant=ferc1_xbrl_raw_dfs[transformer.table_id.value]["instant"],
    raw_xbrl_duration=ferc1_xbrl_raw_dfs[transformer.table_id.value]["duration"],
    transformed_fuel=transformed_tables["core_ferc1__yearly_steam_plants_fuel_sched402"],
)

TypeError: Ferc1AbstractTableTransformer.transform_start() got an unexpected keyword argument 'transformed_fuel'
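
The `TypeError` above reports that `transform_start()` does not accept a `transformed_fuel` keyword. One way to check which keywords a method accepts before calling it is `inspect.signature`. The sketch below uses a hypothetical `BaseToy` class and `accepts_keyword` helper rather than the real transformer, so it shows the inspection technique only, not the fix:

```python
import inspect


class BaseToy:
    """Hypothetical class whose transform_start takes only the raw inputs."""

    def transform_start(self, raw_dbf, raw_xbrl_instant, raw_xbrl_duration):
        return (raw_dbf, raw_xbrl_instant, raw_xbrl_duration)


def accepts_keyword(func, name):
    """Return True if ``func`` can be called with keyword argument ``name``."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


base_toy = BaseToy()
print(accepts_keyword(base_toy.transform_start, "raw_dbf"))
print(accepts_keyword(base_toy.transform_start, "transformed_fuel"))
```

In the notebook, printing `inspect.signature(transformer.transform)` and `inspect.signature(transformer.transform_start)` would show which arguments the installed PUDL version expects.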