Use all dimensions in XbrlCalculationForestFerc1 and exploded tables #2763
I've added the [...]. I tried selecting all of the "leaf" nodes (which have parent columns, but all-NA calc columns), and it looks like all of the leaves also currently lack any additional dimensions in their parents, which doesn't seem like what we would expect.

new_calcs = MetadataExploder(
    table_names=table_names,
    clean_xbrl_metadata_json=clean_xbrl_metadata_json,
    calculation_components_xbrl_ferc1=calculation_components_xbrl_ferc1,
).calculations
# Leaves are rows whose calculation-component columns are all NA.
leaf_mask = new_calcs[calc_cols].isna().all(axis="columns")
new_calcs.loc[leaf_mask, parent_cols + calc_cols + ["weight"]].info()
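To make the leaf check above concrete, here is a minimal plain-Python stand-in for the pandas expression. It is only a sketch: the column names, the toy records, and the helpers `is_leaf` and `parent_lacks_dimensions` are all hypothetical, and `None` plays the role of NA.

```python
# Hypothetical column names; the real parent_cols / calc_cols are not shown in this thread.
parent_cols = ["table_name_parent", "xbrl_factoid_parent", "utility_type_parent"]
calc_cols = ["table_name", "xbrl_factoid", "utility_type"]
parent_dim_cols = ["utility_type_parent"]  # the "additional dimension" columns

# Toy calculation records. None stands in for a pandas NA value.
records = [
    # A parent that has a calculation component, so it is not a leaf.
    {"table_name_parent": "t1", "xbrl_factoid_parent": "total",
     "utility_type_parent": "electric",
     "table_name": "t1", "xbrl_factoid": "part_a", "utility_type": "electric",
     "weight": 1.0},
    # A leaf whose parent carries a dimension value.
    {"table_name_parent": "t1", "xbrl_factoid_parent": "part_a",
     "utility_type_parent": "electric",
     "table_name": None, "xbrl_factoid": None, "utility_type": None,
     "weight": None},
    # A leaf whose parent has no dimension value at all.
    {"table_name_parent": "t1", "xbrl_factoid_parent": "part_b",
     "utility_type_parent": None,
     "table_name": None, "xbrl_factoid": None, "utility_type": None,
     "weight": None},
]


def is_leaf(record):
    """A leaf has a parent but every calculation-component column is missing."""
    return all(record[col] is None for col in calc_cols)


def parent_lacks_dimensions(record):
    """True when none of the parent's dimension columns are filled in."""
    return all(record[col] is None for col in parent_dim_cols)


leaves = [r for r in records if is_leaf(r)]
undimensioned_leaves = [r for r in leaves if parent_lacks_dimensions(r)]
print(len(leaves), len(undimensioned_leaves))
```

If every leaf ends up in `undimensioned_leaves`, that matches the symptom described above, where none of the leaves carry additional dimensions in their parents.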
Oooookay, I've got the whole explosion + filtering based on the calculation forest working again, but there are lots of issues involving the new dimensions that we'll need to untangle. Hopefully many of them can be fixed systematically rather than manually.

Not everything in this PR is pretty, and I'm sure it can be improved. But do we want to get this stuff merged in and work on the fixes to the calculations with all the dimensions separately from these changes?

The setup I'm using right now to identify issues looks like this:

Setup Inputs

import importlib.resources

import pandas as pd
from dagster import AssetKey
from pudl.etl import defs
from pudl.output.ferc1 import (
    Exploder,
    MetadataExploder,
    NodeId,
    XbrlCalculationForestFerc1,
)

# Rate base tags, keyed by table name and XBRL factoid.
tags_csv = (
    importlib.resources.files("pudl.package_data.ferc1")
    / "xbrl_factoid_rate_base_tags.csv"
)
tags_df = (
    pd.read_csv(tags_csv, usecols=["table_name", "xbrl_factoid", "in_rate_base"])
    .drop_duplicates()
    .dropna(subset=["table_name", "xbrl_factoid"], how="any")
    .astype(pd.StringDtype())
)

# Load the upstream assets from Dagster.
clean_xbrl_metadata_json = defs.load_asset_value(AssetKey("clean_xbrl_metadata_json"))
metadata_xbrl_ferc1 = defs.load_asset_value(AssetKey("metadata_xbrl_ferc1"))
calculation_components_xbrl_ferc1 = defs.load_asset_value(
    AssetKey("calculation_components_xbrl_ferc1")
)

# One set of explosion arguments per root table.
explosion_args = {
    "income_statement_ferc1": {
        "root_table": "income_statement_ferc1",
        "table_names_to_explode": [
            "income_statement_ferc1",
            "depreciation_amortization_summary_ferc1",
            "electric_operating_expenses_ferc1",
            "electric_operating_revenues_ferc1",
        ],
        "calculation_tolerance": 0.27,
        "seeds": [
            NodeId(
                table_name="income_statement_ferc1",
                xbrl_factoid="net_income_loss",
                utility_type="total",
                plant_status=pd.NA,
                plant_function=pd.NA,
            ),
        ],
        "tags": tags_df,
    },
    "balance_sheet_assets_ferc1": {
        "root_table": "balance_sheet_assets_ferc1",
        "table_names_to_explode": [
            "balance_sheet_assets_ferc1",
            "utility_plant_summary_ferc1",
            "plant_in_service_ferc1",
            "electric_plant_depreciation_functional_ferc1",
        ],
        "calculation_tolerance": 0.81,
        "seeds": [
            NodeId(
                table_name="balance_sheet_assets_ferc1",
                xbrl_factoid="assets_and_other_debits",
                utility_type=pd.NA,
                plant_status=pd.NA,
                plant_function=pd.NA,
            )
        ],
        "tags": tags_df,
    },
    "balance_sheet_liabilities_ferc1": {
        "root_table": "balance_sheet_liabilities_ferc1",
        "table_names_to_explode": [
            "balance_sheet_liabilities_ferc1",
            "retained_earnings_ferc1",
        ],
        "calculation_tolerance": 0.075,
        "seeds": [
            NodeId(
                table_name="balance_sheet_liabilities_ferc1",
                xbrl_factoid="liabilities_and_other_credits",
                utility_type=pd.NA,
                plant_status=pd.NA,
                plant_function=pd.NA,
            )
        ],
        "tags": tags_df,
    },
}

Coordinating Function

def exploded_table(
    root_table: str,
    table_names_to_explode: list[str],
    calculation_tolerance: float,
    seeds: list[NodeId],
    tags: pd.DataFrame,
):
    metadata_xbrl_ferc1 = defs.load_asset_value(
        AssetKey("metadata_xbrl_ferc1")
    )
    calculation_components_xbrl_ferc1 = defs.load_asset_value(
        AssetKey("calculation_components_xbrl_ferc1")
    )
    # NOTE: pudl_engine is not defined in this snippet. It is assumed to be a
    # database connection to the PUDL DB that pd.read_sql can use.
    dfs_to_explode = {
        table: pd.read_sql(table, pudl_engine) for table in table_names_to_explode
    }
    exploder = Exploder(
        root_table=root_table,
        table_names=table_names_to_explode,
        metadata_xbrl_ferc1=metadata_xbrl_ferc1,
        calculation_components_xbrl_ferc1=calculation_components_xbrl_ferc1,
        seed_nodes=seeds,
        tags=tags,
    )
    # Collect the intermediate objects alongside the exploded data for inspection.
    return {
        "exploder": exploder,
        "exploded_meta": exploder.metadata_exploded,
        "exploded_calcs": exploder.calculations_exploded,
        "forest": exploder.calculation_forest,
        "leafy_meta": exploder.calculation_forest.leafy_meta,
        "root_calcs": exploder.calculation_forest.root_calculations,
        "exploded_data": exploder.boom(
            tables_to_explode=dfs_to_explode,
            calculation_tolerance=calculation_tolerance,
        ),
    }

Run the Explosions

test_explode = {}
for root_table in explosion_args:
    print(f"Exploding: {root_table}")
    test_explode[root_table] = exploded_table(**explosion_args[root_table])

Display Results

# display() is the IPython/Jupyter display function, available in a notebook.
for root_table in test_explode:
    test_explode[root_table]["forest"].plot("full_digraph")
    test_explode[root_table]["forest"].plot("forest")
    print("\n ======== ORPHAN NODES: ========\n")
    display(pd.DataFrame(test_explode[root_table]["forest"].orphans))
    print("\n ======== PRUNED NODES: ========\n")
    display(pd.DataFrame(test_explode[root_table]["forest"].pruned))

Outputs

Income Statement

Balance Sheet Assets

Balance Sheet Liabilities
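As a rough mental model for the pruned-node listings above, the sketch below assumes that "pruned" means nodes that appear in the full calculation graph but cannot be reached from any seed node. That reading is an assumption and is not confirmed anywhere in this thread. The node names and the helper `reachable_from` are hypothetical, and plain strings stand in for `NodeId` values.

```python
from collections import deque

# Hypothetical parent -> children edges of a small calculation graph.
edges = {
    "net_income": ["revenues", "expenses"],
    "expenses": ["fuel", "maintenance"],
    "other_total": ["other_part"],  # not connected to the seed below
}


def reachable_from(seeds, edges):
    """Return every node reachable from the seeds, including the seeds themselves."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Every node that appears as a parent or as a child.
all_nodes = set(edges) | {child for children in edges.values() for child in children}

kept = reachable_from(["net_income"], edges)
pruned = all_nodes - kept
print(sorted(kept))
print(sorted(pruned))
```

Under that assumption, a long pruned list would point at calculations that never connect back to the chosen seeds, which is one place the new dimensions could be breaking links.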
PR Overview
This PR updates the MetadataExploder, Exploder, and XbrlCalculationForestFerc1 classes to use the additional dimensions that are required to uniquely identify all reported facts, so that those facts can be independently annotated and used to filter the output data.
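A small illustration of why the extra dimensions matter for uniqueness, as a sketch only. `SketchNodeId` is a hypothetical stand-in that copies the field names used with `NodeId` in this thread, `example_factoid` is a made-up factoid name, and the tag values are invented for the example.

```python
from typing import NamedTuple, Optional


class SketchNodeId(NamedTuple):
    """Hypothetical stand-in for NodeId, with the same five fields."""

    table_name: str
    xbrl_factoid: str
    utility_type: Optional[str]
    plant_status: Optional[str]
    plant_function: Optional[str]


# Two reported facts that share a table and factoid but differ in one dimension.
facts = [
    SketchNodeId("plant_in_service_ferc1", "example_factoid", "electric", None, None),
    SketchNodeId("plant_in_service_ferc1", "example_factoid", "gas", None, None),
]

# Keyed by table and factoid only, the two facts collapse into one identifier.
by_table_and_factoid = {(f.table_name, f.xbrl_factoid) for f in facts}

# Keyed by all dimensions, each fact keeps its own identifier.
by_all_dimensions = set(facts)

# With unique identifiers, each fact can carry its own annotation.
in_rate_base = {facts[0]: True, facts[1]: False}

print(len(by_table_and_factoid), len(by_all_dimensions))
```

With only the table name and factoid as the key, a single tag would have to apply to both facts. Including every dimension lets them be tagged and filtered separately.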