Improve calculation error checking #2915
Conversation
Mostly naming and documentation suggestions, but also I think I may have made a mistake in a couple of the error metrics initially, and we need to use some absolute values.
Playing with the results in a notebook using those snippets you sent, I want to drill down and identify which combinations of groupby columns identify the most egregious errors, but I don't think that's possible with just the summary output. For example, the relative error magnitude has a huge spike in 2006, and I'd like to know what combinations of table, fact, and utility IDs are responsible for that. Is it just a single entry that's off by a huge amount? Or is it a handful of utility filings that are super wrong in a single table? (Maybe a table that changed its line number meanings in 2006?)

Can we imagine an all-tables concatenated output that allows this kind of dynamic slicing and dicing of the data for diagnostic purposes? Would it just be all of the tables, with the standard names that get passed into the error checking infrastructure? With such a table, is there a straightforward way to manually apply the different error metrics with multiple groupby columns and selections, so we can answer questions like "Looking just at 2006, what values of the other groupby columns are responsible?"

It seems like we could do this by manually applying an error metric:

```python
absolute_error_magnitude = AbsoluteErrorMagnitude()
absolute_error_magnitude_by_utility_year = (
    all_calculated_errors
    .groupby(["utility_id_ferc1", "report_year"])
    .apply(absolute_error_magnitude.metric())
)
```
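As a minimal, self-contained sketch of that idea (the toy data and the simplified metric function here are assumptions, standing in for the real `AbsoluteErrorMagnitude().metric()` and the imagined concatenated table):

```python
import pandas as pd

# Toy stand-in for the imagined all-tables concatenated error output.
# Column names follow the PR (utility_id_ferc1, report_year).
all_calculated_errors = pd.DataFrame({
    "utility_id_ferc1": [165, 165, 172, 172],
    "report_year": [1995, 1995, 1996, 1996],
    "reported_value": [100.0, 50.0, 10.0, 20.0],
    "calculated_value": [90.0, 50.0, 12.0, 20.0],
})

def absolute_error_magnitude(df: pd.DataFrame) -> float:
    """Sum of absolute reported-vs-calculated differences for one group."""
    return (df["reported_value"] - df["calculated_value"]).abs().sum()

by_utility_year = (
    all_calculated_errors
    .groupby(["utility_id_ferc1", "report_year"])
    [["reported_value", "calculated_value"]]
    .apply(absolute_error_magnitude)
)
print(by_utility_year)
```

With a concatenated table like this, swapping in different groupby columns or pre-filtering with `.query()` would let us slice the errors along any dimension.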
Because the `is_not_close` inputs have changed, lots of the metrics are failing right now... still need to track all of that down.
I'm still fuzzy on that one docstring, but I left suggested language that reflects my understanding of what it's supposed to be saying.
I left another larger question in the comments on the PR, about how we can make it easy to interactively explore errors in more than one dimension to narrow down the exact source of the problems, which I think may require another concatenated asset, but that can be done in a separate PR.
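The kind of multi-dimensional interactive exploration described above could look something like this sketch (the concatenated `errors` table, its column names, and the `abs_diff` metric column are all assumptions for illustration):

```python
import pandas as pd

# Hypothetical concatenated calculation-error table spanning all output tables.
errors = pd.DataFrame({
    "table_name": (["balance_sheet_assets"] * 3
                   + ["balance_sheet_liabilities"] * 3),
    "report_year": [2005, 2006, 2006, 2006, 2006, 2007],
    "utility_id_ferc1": [165, 165, 172, 165, 172, 172],
    "abs_diff": [1.0, 900.0, 5.0, 2.0, 80.0, 3.0],
})

# Which (table, utility) combinations drive the spike in a given year?
spike = (
    errors.query("report_year == 2006")
    .groupby(["table_name", "utility_id_ferc1"])["abs_diff"]
    .sum()
    .sort_values(ascending=False)
)
print(spike.head())
```

The same pattern generalizes: filter to the suspicious slice, then group by the remaining dimensions to see where the error mass is concentrated.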
```python
# @root_validator
# def grouped_tol_ge_ungrouped_tol(cls, values):
#     """Grouped tolerance should always be greater than or equal to ungrouped."""
#     group_metric_tolerances = values["group_metric_tolerances"]
#     groups_to_check = values["groups_to_check"]
#     for group in groups_to_check:
#         metric_tolerances = group_metric_tolerances.dict().get(group)
#         for metric_name, tolerance in metric_tolerances.items():
#             ungrouped_tolerance = group_metric_tolerances.dict()["ungrouped"].get(
#                 metric_name
#             )
#             if tolerance < ungrouped_tolerance:
#                 raise AssertionError(
#                     f"In {group=}, {tolerance=} for {metric_name} should be greater than {ungrouped_tolerance=}."
#                 )
#     return values
```
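Stripped of the pydantic machinery, the check this commented-out validator is trying to express can be sketched with plain dicts (the shape of `group_metric_tolerances` here is an assumption, not the PR's actual model):

```python
def check_grouped_tolerances(group_metric_tolerances: dict,
                             groups_to_check: list) -> None:
    """Raise if any grouped tolerance is tighter than the ungrouped baseline."""
    ungrouped = group_metric_tolerances["ungrouped"]
    for group in groups_to_check:
        for metric_name, tolerance in group_metric_tolerances[group].items():
            if tolerance < ungrouped[metric_name]:
                raise AssertionError(
                    f"In {group=}, {tolerance=} for {metric_name} should be "
                    f"greater than or equal to {ungrouped[metric_name]}."
                )

tolerances = {
    "ungrouped": {"error_frequency": 0.01},
    "report_year": {"error_frequency": 0.05},  # looser than ungrouped: OK
}
check_grouped_tolerances(tolerances, ["report_year"])  # no error raised
```

The complexity in the real version comes from the layered defaults discussed below, not from this core comparison.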
Did this end up having other problems that weren't simple?
The mechanics of the check are okay imo, but the substance of the check itself is a pain because of the various ways to set these tolerances. I think it would have been simpler if we could have removed one layer of defaults but as we discussed that was less simple.
Looks like the …
Codecov Report

Attention:

```
@@           Coverage Diff           @@
##              dev   #2915   +/-  ##
=====================================
  Coverage    88.6%   88.6%
=====================================
  Files          91      91
  Lines       10854   10991   +137
=====================================
+ Hits         9618    9749   +131
- Misses       1236    1242     +6
```

☔ View full report in Codecov by Sentry.
PR Overview
- Adds a `CalculationTolerance` object to pass around a standardized set of expected error levels.
- Adds error metrics compatible with `GroupBy.apply()` and some helper functions that use them to calculate a matrix of different error metrics across different groupings, to be run in `check_calculation_metrics()`.
Review Questions
- We use `np.isclose()` to determine whether reported & calculated values match. Depending on the scale of the values and the values of `rtol` and `atol`, this means we may have some values that aren't exactly the same, but that still count as "matching". Do we want to correct all values even if `isclose()` says they're the same? Right now we're using the default `atol=1e-5`, which I think will only ever catch floating point math differences, rather than values that are off by say $1.00, or $0.001. Is that the intention?
- Should the `CalculationTolerance` and `ReconcileTableCalculations` classes be consolidated into a single parameter? Seems like the `ReconcileTableCalculations` class contains parameters that only really apply to the intra-table calculation case.
- `reconcile_table_calculations()` has a bunch of prep work happening before it gets to the part where it does the calculations. It might be better if this was made into its own function that can be run in a modular way. Similarly the part after it checks & corrects the intra-table calculations, dealing with the dimension-to-total calculations, but I think @cmgosnell is already working on that.

Error Exploration
- In `balance_sheet_assets`, about 85% of records in 1994-2004 and 2021 have a non-null reported value but a null calculated value.
- In `balance_sheet_assets` for 2005-2020, 100% of the reported versions of the calculated values are showing up as NA in the calculation checks, but there are still a bunch of non-null corresponding calculated values.
- `balance_sheet_liabilities` where (`utility_id_ferc1=165`, `report_year=1995`): error is $10B, which is 2.5% of all value reported by all utilities in that year.
- `balance_sheet_assets` where (`utility_id_ferc1=172`, `report_year=1996`): error is 83% of reported value.
- `balance_sheet_assets` where (`utility_id_ferc1=292`, `report_year=2004`): error is 32% of reported value.

PR Checklist
- Merge the most recent version of the branch you are merging into (probably `dev`).
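To make the `np.isclose()` review question above concrete, here is a small demonstration using NumPy's own defaults (`rtol=1e-5`, `atol=1e-8`): because the comparison is mostly relative, large dollar values that differ by a full $1.00 can still count as "close", while the same discrepancy in a small value does not.

```python
import numpy as np

# np.isclose tests |a - b| <= atol + rtol * |b|.
# With rtol=1e-5, a $1.00 discrepancy in a ~$1M value passes the check...
print(np.isclose(1_000_000.00, 1_000_001.00))  # True
# ...while the same $1.00 discrepancy in a ~$10 value does not.
print(np.isclose(10.00, 11.00))  # False
```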