
Improve calculation error checking #2915

Merged: zaneselvans merged 58 commits into dev from better-calc-checks on Oct 31, 2023
Conversation

@zaneselvans (Member) commented Oct 3, 2023

PR Overview

  • Split the calculation checks into 3 steps: applying the calculations, checking the calculations, and adding correction records to the data.
  • Switch to using a CalculationTolerance object to pass around a standardized set of expected error levels.
  • Add a collection of error-checking functions that can run on whole dataframes or via GroupBy.apply(), plus helper functions that use them to calculate a matrix of error metrics across different groupings, to be run in check_calculation_metrics(). (A minimal sketch of this pattern follows this list.)
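A minimal sketch of that pattern (not this PR's actual API: the column names reported_value and calculated_value come from the discussion below, but the function name and signature are illustrative):

```python
import numpy as np
import pandas as pd


def error_frequency(
    df: pd.DataFrame, rtol: float = 1e-5, atol: float = 1e-8
) -> float:
    """Fraction of records whose reported & calculated values disagree."""
    matches = np.isclose(
        df["reported_value"], df["calculated_value"], rtol=rtol, atol=atol
    )
    return 1.0 - matches.mean()


# Run it on a whole dataframe...
# overall = error_frequency(calc_df)
# ...or per group via GroupBy.apply() to build a matrix of error metrics:
# by_year = calc_df.groupby("report_year").apply(error_frequency)
```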

Review Questions

  • We're using np.isclose() to determine whether reported & calculated values match. Depending on the scale of the values and the values of rtol and atol, this means some values that aren't exactly the same may still count as "matching". Do we want to correct all values even if isclose() says they're the same? Right now we're using the default atol=1e-5, which I think will only ever catch floating point math differences, rather than values that are off by, say, $1.00 or $0.001. Is that the intention? (See the example after this list.)
  • Should the CalculationTolerance and ReconcileTableCalculations classes be consolidated into a single parameter? It seems like the ReconcileTableCalculations class contains parameters that only really apply to the intra-table calculation case.
  • Note that the across-dimension calculation checking should be refactored to use these parameters too. See #2688 (Standardize corrections and treatment of sub-totals into the ferc1 table transforms) and #2886 (standardize the calc checks for the total to subtotal calcs).
  • reconcile_table_calculations() has a bunch of prep work happening before it gets to the part where it actually does the calculations. It might be better if that prep were made into its own function that can be run in a modular way. The same goes for the part that runs after it checks & corrects the intra-table calculations, which deals with the dimension-to-total calculations, but I think @cmgosnell is already working on that.
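To illustrate the np.isclose() question in the first bullet: with a small atol, matching is dominated by the relative tolerance, so large dollar values can differ by thousands of dollars and still count as matching:

```python
import numpy as np

# np.isclose() matches when |a - b| <= atol + rtol * |b|.
np.isclose(1.0e9, 1.0e9 + 5_000.0, rtol=1e-5, atol=1e-5)  # True: off by 0.0005%
np.isclose(100.0, 101.0, rtol=1e-5, atol=1e-5)            # False: off by 1%
```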

Error Exploration

  • There are many cases where the reported value we're looking at is non-null, but the calculated value is NA. This arrangement seems more common than having a value for both! In balance_sheet_assets, about 85% of records in 1994-2004 and 2021 have a non-null reported value but a null calculated value. (One way to tally these cases is sketched after this list.)
  • Even weirder, in balance_sheet_assets for 2005-2020, 100% of the reported versions of the calculated values are showing up as NA in the calculation checks, but there are still a bunch of non-null corresponding calculated values.
  • There are also 5 years where the number of null calculated values jumps up dramatically from 5-6k to 12-13k.
  • There are instances where a single calculation has an error that is thousands of times larger than the reported value, which can have a significant impact on overall aggregations, even to the point of being several percent of the value reported by all utilities in a year.
    • balance_sheet_liabilities where (utility_id_ferc1=165, report_year=1995): error is $10B, which is 2.5% of all value reported by all utilities in that year.
    • balance_sheet_assets where (utility_id_ferc1=172, report_year=1996): error is 83% of reported value.
    • balance_sheet_assets where (utility_id_ferc1=292, report_year=2004): error is 32% of reported value.
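A sketch of how the null-mismatch cases above could be tallied, again assuming a dataframe of calculation checks with reported_value and calculated_value columns (the helper name is hypothetical):

```python
import pandas as pd


def null_calc_fraction(df: pd.DataFrame) -> pd.Series:
    """Per-year fraction of records with a reported but no calculated value."""
    mask = df["reported_value"].notna() & df["calculated_value"].isna()
    return mask.groupby(df["report_year"]).mean()
```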

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures.
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests.
  • Include unit tests for new functions and classes.
  • Add defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@zaneselvans zaneselvans linked an issue Oct 3, 2023 that may be closed by this pull request
@zaneselvans zaneselvans added ferc1 Anything having to do with FERC Form 1 testing Writing tests, creating test data, automating testing, etc. xbrl Related to the FERC XBRL transition labels Oct 3, 2023
@zaneselvans zaneselvans changed the title Split calculation checks into 3 steps; make calculation_tolerance Tra… WIP: Improve calculation error checking Oct 3, 2023
@zaneselvans (Member, Author) left a comment:

Mostly naming and documentation suggestions, but also I think I may have made a mistake in a couple of the error metrics initially, and we need to use some absolute values.

[10 inline review threads on src/pudl/transform/ferc1.py, since resolved]
@zaneselvans (Member, Author) commented Oct 25, 2023

Playing with the results in a notebook using those snippets you sent, I want to drill down and identify which combinations of groupby columns identify the most egregious errors, but I don't think that's possible with just the summary output.

For example, the relative error magnitude has a huge spike in 2006, and I'd like to know what combinations of table, fact, and utility IDs are responsible for it. Is it just a single entry that's off by a huge amount? Or is it a handful of utility filings that are super wrong in a single table? (Maybe a table that changed its line number meanings in 2006?)

Can we imagine an all-tables concatenated output that allows this kind of dynamic slicing and dicing of the data for diagnostic purposes? Would it just be all of the tables with the standard names that get passed into the error checking infrastructure (with reported_value, calculated_value, abs_diff, rel_diff, etc.), including all of their rows and all of the intact groupby columns (report_year, xbrl_factoid, table_name, utility_id_ferc1)?

With such a table, is there a straightforward way to manually apply the different error metrics with multiple groupby columns and selections, so we can answer questions like "Looking just at 2006, what values of utility_id_ferc1, xbrl_factoid, and table_name are responsible for the biggest errors?" or "Given that utility_id_ferc1==152 has a big relative error magnitude overall, is that error coming from a single year? A single xbrl_factoid? Or a range of years? Or a whole table of facts?"

It seems like we could do this by manually applying an ErrorMetric.metric() method, bypassing the built-in groupby:

absolute_error_magnitude = AbsoluteErrorMagnitude()
absolute_error_magnitude_by_utility_year = (
    all_calculated_errors
    .groupby(["utility_id_ferc1", "report_year"])
    .apply(absolute_error_magnitude.metric)
)
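Drilling into a single year would then just be a filter plus a different groupby, e.g. (the names here are again illustrative, not a real API):

```python
# Which (table, factoid, utility) combinations drive the 2006 spike?
errors_2006 = (
    all_calculated_errors.query("report_year == 2006")
    .groupby(["table_name", "xbrl_factoid", "utility_id_ferc1"])
    .apply(absolute_error_magnitude.metric)
    .sort_values(ascending=False)
)
```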

@cmgosnell cmgosnell marked this pull request as ready for review October 27, 2023 19:55
@cmgosnell cmgosnell changed the title WIP: Improve calculation error checking Improve calculation error checking Oct 27, 2023
@zaneselvans (Member, Author) left a comment:

I'm still fuzzy on that one docstring, but I left suggested language that reflects my understanding of what it's supposed to be saying.

I left another larger question in the comments on the PR, about how we can make it easy to interactively explore errors in more than one dimension to narrow down the exact source of the problems, which I think may require another concatenated asset, but that can be done in a separate PR.

[Inline review threads on src/pudl/transform/ferc1.py and src/pudl/output/ferc1.py, since resolved]
Comment on lines +833 to +848
# @root_validator
# def grouped_tol_ge_ungrouped_tol(cls, values):
#     """Grouped tolerance should always be greater than or equal to ungrouped."""
#     group_metric_tolerances = values["group_metric_tolerances"]
#     groups_to_check = values["groups_to_check"]
#     for group in groups_to_check:
#         metric_tolerances = group_metric_tolerances.dict().get(group)
#         for metric_name, tolerance in metric_tolerances.items():
#             ungrouped_tolerance = group_metric_tolerances.dict()["ungrouped"].get(
#                 metric_name
#             )
#             if tolerance < ungrouped_tolerance:
#                 raise AssertionError(
#                     f"In {group=}, {tolerance=} for {metric_name} should be greater than {ungrouped_tolerance=}."
#                 )
#     return values
@zaneselvans (Member, Author):

Did this end up having other problems that weren't simple?

Reply (Member):

The mechanics of the check are okay imo, but the substance of the check itself is a pain because of the various ways to set these tolerances. I think it would have been simpler if we could have removed one layer of defaults, but as we discussed, that was less simple.

@zaneselvans (Member, Author) commented:

Looks like the ungrouped error_frequency tolerance for the balance_sheet_assets_ferc1 table was a bit too low (at least for the fast ETL), so I bumped it from 0.00013 to 0.0002.
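For reference, a hypothetical sketch of what such a bump might look like, assuming the tolerance object nests per-group metric tolerances as suggested by the validator snippet quoted above (the class and field names here are guesses, not this PR's verified API):

```python
# Hypothetical: raise the ungrouped error_frequency tolerance for one table.
balance_sheet_assets_tolerance = CalculationTolerance(
    group_metric_tolerances=GroupMetricTolerances(
        ungrouped=MetricTolerances(error_frequency=0.0002),  # was 0.00013
    ),
)
```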

@codecov bot commented Oct 30, 2023

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison: base (eb3b07e) 88.6% vs. head (ab71e2d) 88.6%.
Report is 1 commit behind head on dev.

Additional details and impacted files
@@          Coverage Diff           @@
##             dev   #2915    +/-   ##
======================================
  Coverage   88.6%   88.6%            
======================================
  Files         91      91            
  Lines      10854   10991   +137     
======================================
+ Hits        9618    9749   +131     
- Misses      1236    1242     +6     
Files Coverage Δ
src/pudl/transform/params/ferc1.py 100.0% <ø> (ø)
src/pudl/output/ferc1.py 88.2% <69.2%> (-0.5%) ⬇️
src/pudl/transform/ferc1.py 96.7% <94.8%> (+<0.1%) ⬆️


@zaneselvans zaneselvans merged commit bbd82ba into dev Oct 31, 2023
11 checks passed
@cmgosnell cmgosnell deleted the better-calc-checks branch October 31, 2023 12:35
Merging this pull request may close: Define XBRL explosion success metrics and measure them