
Improve calculation error checking #2915

Merged: zaneselvans merged 58 commits into dev from better-calc-checks on Oct 31, 2023
Conversation

@zaneselvans (Member) commented Oct 3, 2023

PR Overview

  • Split the calculation checks into 3 steps: applying the calculations, checking the calculations, and adding correction records to the data.
  • Switch to using a CalculationTolerance object to pass around a standardized set of expected error levels.
  • Add a collection of error-checking functions that can run on whole dataframes or via GroupBy.apply(), plus helper functions that use them to calculate a matrix of error metrics across different groupings, to be run in check_calculation_metrics(). (A minimal sketch of this pattern follows this list.)
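A minimal sketch of that pattern (not this PR's actual API: the column names reported_value and calculated_value come from the discussion below, but the function name and signature are illustrative):

```python
import numpy as np
import pandas as pd


def error_frequency(
    df: pd.DataFrame, rtol: float = 1e-5, atol: float = 1e-8
) -> float:
    """Fraction of records whose reported & calculated values disagree."""
    matches = np.isclose(
        df["reported_value"], df["calculated_value"], rtol=rtol, atol=atol
    )
    return 1.0 - matches.mean()


# Run it on a whole dataframe...
# overall = error_frequency(calc_df)
# ...or per group via GroupBy.apply() to build a matrix of error metrics:
# by_year = calc_df.groupby("report_year").apply(error_frequency)
```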

Review Questions

  • We're using np.isclose() to determine whether reported & calculated values match. Depending on the scale of the values and the values of rtol and atol, this means some values that aren't exactly the same may still count as "matching". Do we want to correct all values even if isclose() says they're the same? Right now we're using the default atol=1e-5, which I think will only ever catch floating point math differences, rather than values that are off by, say, $1.00 or $0.001. Is that the intention? (See the example after this list.)
  • Should the CalculationTolerance and ReconcileTableCalculations classes be consolidated into a single parameter? It seems like the ReconcileTableCalculations class contains parameters that only really apply to the intra-table calculation case.
  • Note that the across-dimension calculation checking should be refactored to use these parameters too. See #2688 (Standardize corrections and treatment of sub-totals into the ferc1 table transforms) and #2886 (standardize the calc checks for the total to subtotal calcs).
  • reconcile_table_calculations() has a bunch of prep work happening before it gets to the part where it actually does the calculations. It might be better if that prep were made into its own function that can be run in a modular way. The same goes for the part that runs after it checks & corrects the intra-table calculations, which deals with the dimension-to-total calculations, but I think @cmgosnell is already working on that.
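To illustrate the np.isclose() question in the first bullet: with a small atol, matching is dominated by the relative tolerance, so large dollar values can differ by thousands of dollars and still count as matching:

```python
import numpy as np

# np.isclose() matches when |a - b| <= atol + rtol * |b|.
np.isclose(1.0e9, 1.0e9 + 5_000.0, rtol=1e-5, atol=1e-5)  # True: off by 0.0005%
np.isclose(100.0, 101.0, rtol=1e-5, atol=1e-5)            # False: off by 1%
```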

Error Exploration

  • There are many cases where the reported value we're looking at is non-null, but the calculated value is NA. This arrangement seems more common than having a value for both! In balance_sheet_assets, about 85% of records in 1994-2004 and 2021 have a non-null reported value but a null calculated value. (One way to tally these cases is sketched after this list.)
  • Even weirder, in balance_sheet_assets for 2005-2020, 100% of the reported versions of the calculated values are showing up as NA in the calculation checks, but there are still a bunch of non-null corresponding calculated values.
  • There are also 5 years where the number of null calculated values jumps up dramatically from 5-6k to 12-13k.
  • There are instances where a single calculation has an error that is thousands of times larger than the reported value, which can have a significant impact on overall aggregations, even to the point of being several percent of the value reported by all utilities in a year.
    • balance_sheet_liabilities where (utility_id_ferc1=165, report_year=1995): error is $10B, which is 2.5% of all value reported by all utilities in that year.
    • balance_sheet_assets where (utility_id_ferc1=172, report_year=1996): error is 83% of reported value.
    • balance_sheet_assets where (utility_id_ferc1=292, report_year=2004): error is 32% of reported value.
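A sketch of how the null-mismatch cases above could be tallied, again assuming a dataframe of calculation checks with reported_value and calculated_value columns (the helper name is hypothetical):

```python
import pandas as pd


def null_calc_fraction(df: pd.DataFrame) -> pd.Series:
    """Per-year fraction of records with a reported but no calculated value."""
    mask = df["reported_value"].notna() & df["calculated_value"].isna()
    return mask.groupby(df["report_year"]).mean()
```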

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures.
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests.
  • Include unit tests for new functions and classes.
  • Add defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@zaneselvans zaneselvans linked an issue Oct 3, 2023 that may be closed by this pull request
@zaneselvans zaneselvans added ferc1 Anything having to do with FERC Form 1 testing Writing tests, creating test data, automating testing, etc. xbrl Related to the FERC XBRL transition labels Oct 3, 2023
@zaneselvans zaneselvans changed the title Split calculation checks into 3 steps; make calculation_tolerance Tra… WIP: Improve calculation error checking Oct 3, 2023
@zaneselvans (Member, Author) left a comment:

Mostly naming and documentation suggestions, but also I think I may have made a mistake in a couple of the error metrics initially, and we need to use some absolute values.

[10 inline review threads on src/pudl/transform/ferc1.py, since resolved]
@zaneselvans (Member, Author) commented Oct 25, 2023

Playing with the results in a notebook using those snippets you sent, I want to drill down and identify which combinations of groupby columns identify the most egregious errors, but I don't think that's possible with just the summary output.

For example, the relative error magnitude has a huge spike in 2006, and I'd like to know what combinations of table, fact, and utility IDs are responsible for it. Is it just a single entry that's off by a huge amount? Or is it a handful of utility filings that are super wrong in a single table? (Maybe a table that changed its line number meanings in 2006?)

Can we imagine an all-tables concatenated output that allows this kind of dynamic slicing and dicing of the data for diagnostic purposes? Would it just be all of the tables with the standard names that get passed into the error checking infrastructure (with reported_value, calculated_value, abs_diff, rel_diff, etc.), including all of their rows and all of the intact groupby columns (report_year, xbrl_factoid, table_name, utility_id_ferc1)?

With such a table, is there a straightforward way to manually apply the different error metrics with multiple groupby columns and selections, so we can answer questions like "Looking just at 2006, what values of utility_id_ferc1, xbrl_factoid, and table_name are responsible for the biggest errors?" or "Given that utility_id_ferc1==152 has a big relative error magnitude overall, is that error coming from a single year? A single xbrl_factoid? Or a range of years? Or a whole table of facts?"

It seems like we could do this by manually applying an ErrorMetric.metric() method, bypassing the built-in groupby:

absolute_error_magnitude = AbsoluteErrorMagnitude()
absolute_error_magnitude_by_utility_year = (
    all_calculated_errors
    .groupby(["utility_id_ferc1", "report_year"])
    .apply(absolute_error_magnitude.metric)
)
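Drilling into a single year would then just be a filter plus a different groupby, e.g. (the names here are again illustrative, not a real API):

```python
# Which (table, factoid, utility) combinations drive the 2006 spike?
errors_2006 = (
    all_calculated_errors.query("report_year == 2006")
    .groupby(["table_name", "xbrl_factoid", "utility_id_ferc1"])
    .apply(absolute_error_magnitude.metric)
    .sort_values(ascending=False)
)
```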

@cmgosnell cmgosnell marked this pull request as ready for review October 27, 2023 19:55
@cmgosnell cmgosnell changed the title WIP: Improve calculation error checking Improve calculation error checking Oct 27, 2023
@zaneselvans (Member, Author) left a comment:

I'm still fuzzy on that one docstring, but I left suggested language that reflects my understanding of what it's supposed to be saying.

I left another larger question in the comments on the PR, about how we can make it easy to interactively explore errors in more than one dimension to narrow down the exact source of the problems, which I think may require another concatenated asset, but that can be done in a separate PR.

[Inline review threads on src/pudl/transform/ferc1.py and src/pudl/output/ferc1.py, since resolved]
Comment on lines +833 to +848
# @root_validator
# def grouped_tol_ge_ungrouped_tol(cls, values):
#     """Grouped tolerance should always be greater than or equal to ungrouped."""
#     group_metric_tolerances = values["group_metric_tolerances"]
#     groups_to_check = values["groups_to_check"]
#     for group in groups_to_check:
#         metric_tolerances = group_metric_tolerances.dict().get(group)
#         for metric_name, tolerance in metric_tolerances.items():
#             ungrouped_tolerance = group_metric_tolerances.dict()["ungrouped"].get(
#                 metric_name
#             )
#             if tolerance < ungrouped_tolerance:
#                 raise AssertionError(
#                     f"In {group=}, {tolerance=} for {metric_name} should be greater than {ungrouped_tolerance=}."
#                 )
#     return values
@zaneselvans (Member, Author):

Did this end up having other problems that weren't simple?

Reply (Member):

The mechanics of the check are okay imo, but the substance of the check itself is a pain because of the various ways to set these tolerances. I think it would have been simpler if we could have removed one layer of defaults, but as we discussed, that was less simple.

@zaneselvans (Member, Author) commented:

Looks like the ungrouped error_frequency tolerance for the balance_sheet_assets_ferc1 table was a bit too low (at least for the fast ETL), so I bumped it from 0.00013 to 0.0002.
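For reference, a hypothetical sketch of what such a bump might look like, assuming the tolerance object nests per-group metric tolerances as suggested by the validator snippet quoted above (the class and field names here are guesses, not this PR's verified API):

```python
# Hypothetical: raise the ungrouped error_frequency tolerance for one table.
balance_sheet_assets_tolerance = CalculationTolerance(
    group_metric_tolerances=GroupMetricTolerances(
        ungrouped=MetricTolerances(error_frequency=0.0002),  # was 0.00013
    ),
)
```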

@codecov bot commented Oct 30, 2023

Codecov Report

Attention: 13 lines in your changes are missing coverage. Please review.

Comparison: base (eb3b07e) 88.6% vs. head (ab71e2d) 88.6%.
Report is 1 commit behind head on dev.

Additional details and impacted files
@@          Coverage Diff           @@
##             dev   #2915    +/-   ##
======================================
  Coverage   88.6%   88.6%            
======================================
  Files         91      91            
  Lines      10854   10991   +137     
======================================
+ Hits        9618    9749   +131     
- Misses      1236    1242     +6     
Files Coverage Δ
src/pudl/transform/params/ferc1.py 100.0% <ø> (ø)
src/pudl/output/ferc1.py 88.2% <69.2%> (-0.5%) ⬇️
src/pudl/transform/ferc1.py 96.7% <94.8%> (+<0.1%) ⬆️


@zaneselvans zaneselvans merged commit bbd82ba into dev Oct 31, 2023
11 checks passed
@cmgosnell cmgosnell deleted the better-calc-checks branch October 31, 2023 12:35
Merging this pull request may close: Define XBRL explosion success metrics and measure them