Use all dimensions in `XbrlCalculationForestFerc1` and exploded tables #2763

zaneselvans · 2023-07-31T13:47:02Z

PR Overview

This PR updates the MetadataExploder, Exploder, and XbrlCalculationForestFerc1 classes to use the additional dimensions required to uniquely identify every reported fact, so that facts can be independently annotated and used to filter the output data.
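As a minimal sketch of why the extra dimensions matter, assume two facts share a table name and XBRL factoid but differ in `utility_type`. The `FactKey` tuple and the sample values below are hypothetical illustrations, not PUDL's actual classes or data; only the field names mirror the dimension columns discussed later in this PR.

```python
from collections import Counter
from typing import NamedTuple, Optional


class FactKey(NamedTuple):
    """Hypothetical identifier for one reported fact (illustration only)."""

    table_name: str
    xbrl_factoid: str
    utility_type: Optional[str] = None
    plant_status: Optional[str] = None
    plant_function: Optional[str] = None


# Made-up facts: same table and factoid, different utility_type.
facts = [
    FactKey("utility_plant_summary_ferc1", "utility_plant_in_service", "electric"),
    FactKey("utility_plant_summary_ferc1", "utility_plant_in_service", "gas"),
]

# Keyed only on (table_name, xbrl_factoid), the two facts collide.
short_keys = Counter((f.table_name, f.xbrl_factoid) for f in facts)
collisions = [key for key, n in short_keys.items() if n > 1]

# Keyed on all five fields, each fact is unique.
full_keys_unique = len(set(facts)) == len(facts)

print(collisions)
print(full_keys_unique)
```

With only the two-column key there is no way to annotate or filter the electric and gas facts separately, which is the problem the additional dimensions are meant to solve.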

PR Checklist

Merge the most recent version of the branch you are merging into (probably dev).
All CI checks are passing. Run tests locally to debug failures.
Make sure you've included good docstrings.
For major data coverage & analysis changes, run data validation tests
Include unit tests for new functions and classes.
Defensive data quality/sanity checks in analyses & data processing functions.
Update the release notes and reference the PR and related issues.
Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

src/pudl/output/ferc1.py

zaneselvans · 2023-07-31T17:50:15Z

I've added the drop_duplicates() and weirdly it seems like now I have more records than before (1044 total).

I tried selecting all of the "leaf" nodes (which have parent columns, but all NA calc columns) and it looks like all of the leaves also currently lack any additional dimensions in their parents, which doesn't seem like what we would expect.

new_calcs = MetadataExploder(
    table_names=table_names,
    clean_xbrl_metadata_json=clean_xbrl_metadata_json,
    calculation_components_xbrl_ferc1=calculation_components_xbrl_ferc1,
).calculations

new_calcs[new_calcs[calc_cols].isna().all(axis="columns")][parent_cols + calc_cols + ["weight"]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 239 entries, 15 to 782
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   table_name_parent      239 non-null    string
 1   xbrl_factoid_parent    239 non-null    string
 2   utility_type_parent    0 non-null      string
 3   plant_status_parent    0 non-null      string
 4   plant_function_parent  0 non-null      string
 5   table_name             0 non-null      string
 6   xbrl_factoid           0 non-null      string
 7   utility_type           0 non-null      string
 8   plant_status           0 non-null      string
 9   plant_function         0 non-null      string
 10  weight                 0 non-null      Int64 
dtypes: Int64(1), string(10)
memory usage: 22.6 KB
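
The same leaf selection can be tried on a toy frame to see what the mask does. This is a minimal sketch: the three rows are invented, and only one dimension column is included on each side to keep it short. It assumes, as in the comment above, that a row counts as a "leaf" when every child (calc) column is null.

```python
import pandas as pd

parent_cols = ["table_name_parent", "xbrl_factoid_parent", "utility_type_parent"]
calc_cols = ["table_name", "xbrl_factoid", "utility_type"]

# Invented rows: one parent with a calculation component, then two leaves.
toy_calcs = pd.DataFrame(
    {
        "table_name_parent": ["t1", "t1", "t2"],
        "xbrl_factoid_parent": ["total", "part_a", "part_b"],
        "utility_type_parent": ["electric", pd.NA, pd.NA],
        "table_name": ["t1", pd.NA, pd.NA],
        "xbrl_factoid": ["part_a", pd.NA, pd.NA],
        "utility_type": ["electric", pd.NA, pd.NA],
    }
).astype("string")

# A leaf has a parent but no calculation components: all calc columns are NA.
is_leaf = toy_calcs[calc_cols].isna().all(axis="columns")
leaves = toy_calcs[is_leaf]

# Count the leaves whose parent carries no additional dimension.
n_leaves_without_dims = int(leaves["utility_type_parent"].isna().sum())
print(len(leaves), n_leaves_without_dims)
```

If every leaf shows up in the second count, as in the `info()` output above, the parent dimension columns were never populated for leaf records.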

zaneselvans · 2023-08-11T05:03:51Z

Oooookay, I've got the whole explosion + filtering based on the calculation forest working again, but there are lots of issues involving the new dimensions that we'll need to untangle. Hopefully many of them can be fixed systematically rather than manually.
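
To make the idea of filtering on the calculation forest concrete, here is a minimal sketch that uses plain dictionaries rather than the real XbrlCalculationForestFerc1 class. The `Node` tuple, the `children` mapping, and all factoid names are hypothetical. The assumption is that the forest is the set of nodes reachable from the seed nodes, that its leaves are the nodes with no calculation components, and that everything else in the graph is set aside.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    """Hypothetical stand-in for a dimensioned node ID (illustration only)."""

    table_name: str
    xbrl_factoid: str
    utility_type: Optional[str] = None


def reachable(seeds, children):
    """Return every node reachable from the seeds via parent -> child edges."""
    seen = set()
    stack = list(seeds)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen


# Invented calculations: a total split by utility_type, plus a stray pair.
total = Node("income", "net_income", "total")
electric = Node("income", "net_income", "electric")
gas = Node("income", "net_income", "gas")
stray_parent = Node("other", "unrelated_total")
stray_child = Node("other", "unrelated_part")
children = {total: [electric, gas], stray_parent: [stray_child]}

all_nodes = set(children) | {c for kids in children.values() for c in kids}
forest_nodes = reachable([total], children)
leaves = {n for n in forest_nodes if not children.get(n)}
not_in_forest = all_nodes - forest_nodes

print(sorted(leaves))
print(sorted(not_in_forest))
```

Under these assumptions, the exploded data would be filtered down to rows whose identifying columns, dimensions included, match one of the leaves.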

Not everything in this PR is pretty. I'm sure it can be improved. But do we want to get this stuff merged in and then work on fixing the calculations with all the dimensions separately from these changes?

The setup I'm using right now to identify issues looks like this...

Setup Inputs

import importlib.resources

import pandas as pd
from dagster import AssetKey

from pudl.etl import defs
from pudl.output.ferc1 import (
    Exploder,
    MetadataExploder,
    NodeId,
    XbrlCalculationForestFerc1,
)

tags_csv = (
    importlib.resources.files("pudl.package_data.ferc1")
    / "xbrl_factoid_rate_base_tags.csv"
)
tags_df = (
    pd.read_csv(tags_csv, usecols=["table_name", "xbrl_factoid", "in_rate_base"])
    .drop_duplicates()
    .dropna(subset=["table_name", "xbrl_factoid"], how="any")
    .astype(pd.StringDtype())
)

clean_xbrl_metadata_json = defs.load_asset_value(AssetKey("clean_xbrl_metadata_json"))
metadata_xbrl_ferc1 = defs.load_asset_value(AssetKey("metadata_xbrl_ferc1"))
calculation_components_xbrl_ferc1 = defs.load_asset_value(
    AssetKey("calculation_components_xbrl_ferc1")
)

explosion_args = {
    "income_statement_ferc1": {
        "root_table": "income_statement_ferc1",
        "table_names_to_explode": [
            "income_statement_ferc1",
            "depreciation_amortization_summary_ferc1",
            "electric_operating_expenses_ferc1",
            "electric_operating_revenues_ferc1",
        ],
        "calculation_tolerance": 0.27,
        "seeds": [
            NodeId(
                table_name="income_statement_ferc1",
                xbrl_factoid="net_income_loss",
                utility_type="total",
                plant_status=pd.NA,
                plant_function=pd.NA,
            ),
        ],
        "tags": tags_df,
    },
    "balance_sheet_assets_ferc1": {
        "root_table": "balance_sheet_assets_ferc1",
        "table_names_to_explode": [
            "balance_sheet_assets_ferc1",
            "utility_plant_summary_ferc1",
            "plant_in_service_ferc1",
            "electric_plant_depreciation_functional_ferc1",
        ],
        "calculation_tolerance": 0.81,
        "seeds": [
            NodeId(
                table_name="balance_sheet_assets_ferc1",
                xbrl_factoid="assets_and_other_debits",
                utility_type=pd.NA,
                plant_status=pd.NA,
                plant_function=pd.NA,
            )
        ],
        "tags": tags_df,
    },
    "balance_sheet_liabilities_ferc1": {
        "root_table": "balance_sheet_liabilities_ferc1",
        "table_names_to_explode": [
            "balance_sheet_liabilities_ferc1",
            "retained_earnings_ferc1",
        ],
        "calculation_tolerance": 0.075,
        "seeds": [
            NodeId(
                table_name="balance_sheet_liabilities_ferc1",
                xbrl_factoid="liabilities_and_other_credits",
                utility_type=pd.NA,
                plant_status=pd.NA,
                plant_function=pd.NA,
            )
        ],
        "tags": tags_df,
    },
}

Coordinating Function

def exploded_table(
    root_table: str,
    table_names_to_explode: list[str],
    calculation_tolerance: float,
    seeds: list[NodeId],
    tags: pd.DataFrame,
):
    metadata_xbrl_ferc1 = defs.load_asset_value(
        AssetKey("metadata_xbrl_ferc1")
    )
    calculation_components_xbrl_ferc1 = defs.load_asset_value(
        AssetKey("calculation_components_xbrl_ferc1")
    )

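    # NOTE: pudl_engine is not defined in this snippet. It is assumed to be a
    # SQLAlchemy engine connected to the PUDL database, created earlier.
    dfs_to_explode = {
        table: pd.read_sql(table, pudl_engine) for table in table_names_to_explode
    }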

    exploder = Exploder(
        root_table=root_table,
        table_names=table_names_to_explode,
        metadata_xbrl_ferc1=metadata_xbrl_ferc1,
        calculation_components_xbrl_ferc1=calculation_components_xbrl_ferc1,
        seed_nodes=seeds,
        tags=tags,
    )
    return {
        "exploder": exploder,
        "exploded_meta": exploder.metadata_exploded,
        "exploded_calcs": exploder.calculations_exploded,
        "forest": exploder.calculation_forest,
        "leafy_meta": exploder.calculation_forest.leafy_meta,
        "root_calcs": exploder.calculation_forest.root_calculations,
        "exploded_data": exploder.boom(
            tables_to_explode=dfs_to_explode,
            calculation_tolerance=calculation_tolerance,
        ),
    }

Run the Explosions

test_explode = {}
for root_table in explosion_args:
    print(f"Exploding: {root_table}")
    test_explode[root_table] = exploded_table(**explosion_args[root_table])

Display Results

# display() is available by default in Jupyter/IPython sessions.
for root_table in test_explode:
    test_explode[root_table]["forest"].plot("full_digraph")
    test_explode[root_table]["forest"].plot("forest")

    print("\n ======== ORPHAN NODES: ========\n")
    display(pd.DataFrame(test_explode[root_table]["forest"].orphans))

    print("\n ======== PRUNED NODES: ========\n")
    display(pd.DataFrame(test_explode[root_table]["forest"].pruned))

Outputs

Income Statement

Balance Sheet Assets

Balance Sheet Liabilities

WIP update to MetadataExploder.calculations

d0bc1cf

zaneselvans added the ferc1 (Anything having to do with FERC Form 1), metadata (Anything having to do with the content, formatting, or storage of metadata. Mostly datapackages.), and xbrl (Related to the FERC XBRL transition) labels Jul 31, 2023

zaneselvans requested a review from cmgosnell July 31, 2023 13:47

zaneselvans linked an issue Jul 31, 2023 that may be closed by this pull request

Add dimensions and tabular calculations to XBRL Calculation Forests #2736

Closed

zaneselvans mentioned this pull request Jul 31, 2023

Add dimensions and tabular calculations to XBRL Calculation Forests #2736

Closed

zaneselvans commented Jul 31, 2023

zaneselvans marked this pull request as draft July 31, 2023 14:02

cmgosnell reviewed Jul 31, 2023

cmgosnell reviewed Jul 31, 2023

cmgosnell reviewed Jul 31, 2023

zaneselvans and others added 17 commits August 1, 2023 08:47

Drop duplicate parents after nulling child columns.

7451403

add in the parental dimensions

b605a67

Merge branch 'dim-trees' of github.com:catalyst-cooperative/pudl into dim-trees

28f612a

Merge branch 'explode_ferc1' into dim-trees

45a494e

add in parent-only _correction records into calc component tabler

19c258e

Merge branch 'dim-trees' of github.com:catalyst-cooperative/pudl into dim-trees

d55d25d

USE THE THING YOU JUST MADE DUH

7cb0e81

rename calc components based on the source table's transformers

5a27b1e

Add tables to DBF-only calculation fixes. Fix incorrect factoid name.

f1ac024

Add a dictionary of tables to only extract XBRL metadata from.

25ce224

remove plant_function specificity in calc fixes

89c9f00

Merge branch 'dim-trees' of github.com:catalyst-cooperative/pudl into dim-trees

120c559

fully convert the factoids from other tables to be their natively renamed names

e112cf1

make all of the dimensions so so explict omigosh

7b0f10f

Calculation and metadata fixes for the balance_sheet_assets_ferc1 table.

9daa77a

omigosh do not reorder the lists of value types

86f86a6

fix the unit test re is_within_table_calc falg

6d7b207

zaneselvans added 6 commits August 10, 2023 15:27

Make orphans work with all dimensions.

06dd321

Add other dimensions to leafy_meta

90d6dcb

Update exploded table asset parameters.

44ec91c

Make leafy_meta() work with additional dimensions.

1b6f588

Attempt to use leafy metadata to filter exploded dataframe.

fdadde8

Implement explosion filtering using all dimensions.

963f407

zaneselvans marked this pull request as ready for review August 11, 2023 04:55

zaneselvans requested a review from e-belfer August 11, 2023 05:11

zaneselvans self-assigned this Aug 11, 2023

Merge branch 'explode_ferc1' into dim-trees

6e6dd71

zaneselvans changed the title ~~WIP update to MetadataExploder.calculations~~ Update XbrlCalculationForestFerc1 and exploded tables to use all dimensions Aug 11, 2023

zaneselvans changed the title ~~Update XbrlCalculationForestFerc1 and exploded tables to use all dimensions~~ Use all dimensions in XbrlCalculationForestFerc1 and exploded tables Aug 11, 2023

zaneselvans and others added 15 commits August 14, 2023 09:22

Merge branch 'explode_ferc1' into dim-trees

41183b3

Merge MCOE tables in; reset alembic.

6b1dc9c

Merge branch 'explode_ferc1' into dim-trees

356b840

transition to use the metadata table to make the total->subdim calcs

450152e

Merge branch 'dim-trees' of github.com:catalyst-cooperative/pudl into dim-trees

78a0130

only make forest add attributes to tree

30c3db4

Allow plotting of an arbitrary list of nodes.

90a3b4f

Remove pure-stepparent total nodes when constructing the forest.

3f3792c

Merge branch 'explode_ferc1' into dim-trees

421df6e

graft on pruned nodes

1c86d26

Merge branch 'explode_ferc1' into dim-trees

28dda95

Merge branch 'explode_ferc1' into dim-trees

18a22ce

Reset Alembic migrations after merging in dev.

e2bf5e3

Merge branch 'explode_ferc1' into dim-trees

5f619ab

Merge branch 'explode_ferc1' into dim-trees

84a2c4d

zaneselvans merged commit 7f7909e into explode_ferc1 Aug 22, 2023
7 of 8 checks passed

zaneselvans deleted the dim-trees branch August 22, 2023 22:04
