Clean-up XBRL calculation fixes #2728
Conversation
Codecov Report
Patch coverage has no change and project coverage change: -0.1%

Additional details and impacted files

@@           Coverage Diff            @@
##        explode_ferc1   #2728   +/- ##
=========================================
- Coverage        88.4%   88.4%   -0.1%
=========================================
  Files              89      88      -1
  Lines           10711   10668     -43
=========================================
- Hits             9478    9434     -44
- Misses           1233    1234      +1

☔ View full report in Codecov by Sentry.
this is just here for posterity bc i'm not planning on checking in the code i am using to jump back and forth between the big calculated_fields_to_fix dict:

import pandas as pd

calc_fix_idx = ["table_name", "xbrl_factoid", "xbrl_factoid_calc"]

# Collect the calculation components being added by each fix.
add_me = []
for table, calcs in calculated_fields_to_fix.items():
    for factoid, fixes in calcs.items():
        for fix in fixes:
            add_me.append(
                pd.json_normalize(fix["calc_component_new"]).assign(
                    xbrl_factoid=factoid, table_name=table
                )
            )
add = (
    pd.concat(add_me)
    .explode("source_tables")
    .rename(
        columns={
            "name": "xbrl_factoid_calc",
            "source_tables": "table_name_calc",
        }
    )
    .dropna(subset=calc_fix_idx)
    .set_index(calc_fix_idx)
)

# Collect the calculation components being replaced/deleted by each fix.
delete_me = []
for table, calcs in calculated_fields_to_fix.items():
    for factoid, fixes in calcs.items():
        for fix in fixes:
            delete_me.append(
                pd.json_normalize(fix["calc_component_to_replace"]).assign(
                    xbrl_factoid=factoid, table_name=table
                )
            )
delete = (
    pd.concat(delete_me)
    .rename(columns={"name": "xbrl_factoid_calc"})[calc_fix_idx]
    .dropna()
    .set_index(calc_fix_idx)
)

# Keep the deletions that aren't also re-added, plus all of the additions.
fixes = pd.concat([delete.loc[delete.index.difference(add.index)], add]).sort_index()
assert not fixes.index.duplicated().any()
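A possible follow-up (not something in this PR): because the delete frame carries only the index columns, rows that came solely from delete end up with no table_name_calc after the concat, so pure deletions could be flagged like this, assuming every added component has a source table:

# Hypothetical: NaN table_name_calc marks a record that is only being deleted.
fixes["is_deletion"] = fixes["table_name_calc"].isna()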
Inject missing dbf-only factoids into XBRL metadata
…dl into calc_fix_cleanup
calculation_components.intra_table_calc_flag
& calculation_components.xbrl_factoid.notnull()  # no nulls bc we have all parents
]
# !!! Add dimensions into the calculation components!!!
I ended up adding the implied dimensions into the calculation component table in here because it made the calculation checking simpler and more in line with how we are checking calcs over in output land. I think this really belongs over in process_xbrl_metadata_calculations.
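In case it helps future readers, the idea is roughly the following. This is a loose sketch, not the actual implementation: data stands in for the processed data table, and the dimension column names are only examples.

# Dimension values actually observed for each (table, factoid) in the processed data.
dimensions = ["utility_type", "plant_status", "plant_function"]
observed_dims = data[["table_name", "xbrl_factoid"] + dimensions].drop_duplicates()

# Broadcast the observed dimensions onto the child side of the calculation
# components so they line up with the dimensioned data records when checking calcs.
calc_components_w_dims = calculation_components.merge(
    observed_dims, on=["table_name", "xbrl_factoid"], how="left"
)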
yes the answer is yes
Agree!
Hm, we have a problem: adding in the dimensions requires having the data, because the implied dimensions are gleaned from the processed data. Typically we've done all of the metadata and calculation processing before, and fully independent of, the data processing.
Because of this interdependency, I'd like to keep this as is.
I don't know if this would make sense, but in the context of the assets being written into the database, we could have per-table calculation component tables that do depend on the data (taking the data tables as inputs).
But maybe this is duplicative with the all-tables calculation components table, which will have a more complete knowledge of all the dimensional values that are observed?
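If that route were ever explored, a per-table asset might look roughly like the sketch below. Everything here is illustrative: the asset names, the upstream calc components asset, and the single dimension column are made up, not part of this PR.

from dagster import asset
import pandas as pd

@asset
def calc_components_balance_sheet_liabilities_ferc1(
    balance_sheet_liabilities_ferc1: pd.DataFrame,
    calculation_components_xbrl_ferc1: pd.DataFrame,
) -> pd.DataFrame:
    """Hypothetical per-table calc components asset that takes the data table as input."""
    dims = ["utility_type"]  # illustrative dimension column
    observed = balance_sheet_liabilities_ferc1[["xbrl_factoid"] + dims].drop_duplicates()
    components = calculation_components_xbrl_ferc1.query(
        "table_name == 'balance_sheet_liabilities_ferc1'"
    )
    return components.merge(observed, on="xbrl_factoid", how="left")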
But anyway, not a blocker for merging!
src/pudl/transform/ferc1.py
Outdated
gby_parent = [
    f"{col}_parent" if col in ["table_name", "xbrl_factoid"] else col
    for col in data_idx
]
Sorry if we talked about this yesterday and I forgot, but why don't we need to group by all of the parent key columns, including the dimensions? Why is it only using table name and factoid?
Ah, I think maybe I was thrown off by this conditional comprehension. I don't understand what you expect to have in here now. It seems like a mix of _parent and non-parent columns. Why is that appropriate? Is there a way we can make this more readable?
Definitely not understanding the nature of the initial merge + groupby.
I think the part that's confusing me is why we wouldn't want all of the gby_parent values that show up in calc_idx to have the _parent suffix.
The only columns that have the _parent suffix are "table_name" and "xbrl_factoid".
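For concreteness, here is what that comprehension produces with a hypothetical data_idx (the dimension columns are just examples):

# Only the parent key columns get the _parent suffix; dimensions pass through as-is.
data_idx = ["table_name", "xbrl_factoid", "utility_type", "plant_status"]
gby_parent = [
    f"{col}_parent" if col in ["table_name", "xbrl_factoid"] else col
    for col in data_idx
]
# gby_parent == ["table_name_parent", "xbrl_factoid_parent", "utility_type", "plant_status"]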
We didn't talk about this yesterday, but we did talk about this for a while in my last PR #2753!
If it would be more clear, I could check for any _parent-suffixed columns in calculation_components and replace the ["table_name", "xbrl_factoid"] list with that.
I don't really understand the merge + groupby for the validation of the calculations but I guess that's just how it's going to be.
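For anyone else reading along, the merge + groupby being discussed is roughly the pattern below. This is a simplified sketch, not the real calculate_values_from_components: the key columns and the weight/value names are assumptions.

# Merge each calculation component onto the child data record it points at,
# weight the child values, then sum them up to the parent keys. Comparing that
# sum against the parent's reported value is the calculation check.
calc_idx = ["table_name", "xbrl_factoid", "utility_type", "plant_status"]
gby_parent = [
    f"{col}_parent" if col in ["table_name", "xbrl_factoid"] else col
    for col in calc_idx
]
calculated = (
    calculation_components.merge(data, on=calc_idx, how="left")
    .assign(weighted_value=lambda df: df["weight"] * df["value"])
    .groupby(gby_parent, as_index=False)["weighted_value"]
    .sum()
)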
src/pudl/transform/ferc1.py
Outdated
source = files(pudl.package_data.ferc1).joinpath("dbf_to_xbrl.csv")
with as_file(source) as file:

source = importlib.resources.files("pudl.package_data.ferc1").joinpath(
    "dbf_to_xbrl.csv"
)
with importlib.resources.as_file(source) as file:
I did a little poking and I think the most concise / readable way to use the new API for our purposes is probably something like:
mapped_rows = (
    pd.read_csv(
        importlib.resources.files("pudl.package_data.ferc1") / "table_file_map.csv"
    )
    .set_index(idx_cols)
    .drop(columns=["row_literal"])
)
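For reference, a self-contained version of that files() / as_file() pattern, reusing the package and CSV names from the diff above (nothing else is assumed):

import importlib.resources

import pandas as pd

# files() returns a Traversable; "/" joins it into a path-like object that
# pandas can read directly when the package lives in a normal directory.
csv_path = importlib.resources.files("pudl.package_data.ferc1") / "dbf_to_xbrl.csv"
dbf_to_xbrl_map = pd.read_csv(csv_path)

# as_file() is the safer route if the package could be imported from a zip:
# it guarantees a concrete filesystem path for the duration of the block.
with importlib.resources.as_file(csv_path) as file:
    dbf_to_xbrl_map = pd.read_csv(file)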
PR Overview
#2605
currently working through all of the transforms:
Tasks
- balance_sheet_liabilities_ferc1: failing bc of dupes in "long_term_portion_of_derivative_instrument_liabilities" & "long_term_portion_of_derivative_instrument_liabilities_hedges"
- electric_plant_depreciation_changes_ferc1: instant["name"] = instant["name"] + ["_starting_balance", "_ending_balance"] (see comment)
- electric_operating_expenses_ferc1: pandas.errors.MergeError: Merge keys are not unique in left dataset; not a one-to-many merge during reconcile_table_calculations -> calculate_values_from_components. This last one looks like a calc fix problem (a quick duplicate-key check is sketched below).
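Not part of the PR, but a hedged way to surface whichever records make the merge keys non-unique before that merge blows up (calc_idx here is an assumed key set, using the parent/child naming from elsewhere in this PR):

calc_idx = ["table_name_parent", "xbrl_factoid_parent", "table_name", "xbrl_factoid"]
dupes = calculation_components[
    calculation_components.duplicated(subset=calc_idx, keep=False)
]
print(dupes.sort_values(calc_idx))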
renaming tasks:
- intra_table_calc_flag ... it is necessary/useful in the calc table even in the reconciliation step
- calculate_values_from_components over in the explosion land

PR Checklist