migrate calculation checks into the ferc1 table transformers #2618
Overall this looks good; I mostly have spelling changes and a few questions to suggest. A few more major issues:
When I run this, I get a `ZeroDivisionError` for `utility_plant_summary_ferc1` from the relative difference calculation:
`rel_diff=lambda x: abs(x.abs_diff / x[params.column_to_check])`
When the reported value (`x[params.column_to_check]`) is 0, we should add an exception here to avoid dividing by zero!
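One way to guard the relative difference is sketched below. It is a minimal sketch, assuming the check builds `abs_diff` and `rel_diff` with `DataFrame.assign` as in the lambda quoted above; the helper `add_diffs` and the column names `reported`/`calculated` are hypothetical, and mapping a zero reported value to `NaN` (rather than, say, flagging the row separately) is just one possible choice.

```python
import pandas as pd


def add_diffs(df: pd.DataFrame, column_to_check: str, calc_column: str) -> pd.DataFrame:
    """Add absolute and relative differences without dividing by zero.

    Hypothetical helper: ``column_to_check`` holds the reported value and
    ``calc_column`` holds the value calculated from its components.
    """
    return df.assign(
        abs_diff=lambda x: (x[calc_column] - x[column_to_check]).abs(),
        # Mask zero denominators as NaN so rel_diff is NaN for those rows
        # instead of inf or an exception.
        rel_diff=lambda x: (
            x.abs_diff / x[column_to_check].abs().where(x[column_to_check] != 0)
        ),
    )


# Made-up example: the second row has a reported value of zero.
df = pd.DataFrame({"reported": [100.0, 0.0], "calculated": [95.0, 5.0]})
out = add_diffs(df, column_to_check="reported", calc_column="calculated")
```

Rows with a `NaN` relative difference can then be counted or reported separately instead of crashing the whole check.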
I also get `ValueError: No objects to concatenate` on the `retained_earnings_ferc1` table. If this table isn't integrated well enough to work yet, perhaps we shouldn't run it through the explosion for now.
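`pd.concat` raises exactly this error when it is handed an empty sequence, so one plausible (unconfirmed) cause is that no per-calculation frames were produced for `retained_earnings_ferc1`. If so, a guard like the hypothetical `concat_or_empty` below would avoid the crash, although leaving the table out of the explosion, as suggested, may be the better fix until it is integrated.

```python
import pandas as pd


def concat_or_empty(frames: list[pd.DataFrame], columns: list[str]) -> pd.DataFrame:
    """Concatenate ``frames``, returning an empty frame if there are none.

    Hypothetical guard: pd.concat([]) raises
    ``ValueError: No objects to concatenate``.
    """
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)


# Made-up example: no calculation pieces were produced for a table.
empty = concat_or_empty([], columns=["xbrl_factoid", "value"])
```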
The fraction of off calculations for `balance_sheet_assets_ferc1` is >20%, which is far higher than I remember it being in previous iterations. Have we diagnosed why that is?
src/pudl/transform/ferc1.py
`calculation_tolerance: float = 0.05`
`"""The tolerance ratio of the off calculations and the possible calculations."""`
It's common for these kinds of tolerance checks to include both "absolute" (difference) and "relative" (fractional) tolerances (e.g. see the `atol` and `rtol` parameters to `np.isclose()`). Even if we aren't using both types of tolerance here, I think it would be helpful if the parameter name indicated which type we're talking about (fractional in this case).
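For reference, `np.isclose(a, b)` passes when `|a - b| <= atol + rtol * |b|`, with defaults `rtol=1e-05` and `atol=1e-08`. The arrays below are made-up values, not PUDL data; they show that `rtol` drives the comparison for large values while `atol` dominates near zero.

```python
import numpy as np

reported = np.array([100.0, 100.0, 1e-9])
calculated = np.array([100.0005, 101.0, 0.0])

# Defaults: rtol=1e-05 (fractional) and atol=1e-08 (absolute).
# 100.0005 vs 100 passes via rtol; 0.0 vs 1e-9 passes via atol alone.
default_match = np.isclose(calculated, reported)

# Loosening only the relative tolerance to 1% also accepts 101 vs 100.
loose_match = np.isclose(calculated, reported, rtol=0.01)
```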
I hear you, but this is not the tolerance of the calculated value vs. the reported value. Rather, it's the ratio of the number of calculations that are `~np.isclose` (with the default `atol` and `rtol`) to the total number of calculated values.
That said, suggestions for another name are welcome!
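A minimal sketch of the check as described in this reply, under stated assumptions: the helper `off_calculation_ratio` and the column names are hypothetical, and dropping rows where either value is missing is an assumption of the sketch, not necessarily what the PR does. Each row is compared with `np.isclose` at its default tolerances, and the fraction of mismatches is compared against `calculation_tolerance`.

```python
import numpy as np
import pandas as pd


def off_calculation_ratio(df: pd.DataFrame, reported_col: str, calculated_col: str) -> float:
    """Return the fraction of calculated values not close to the reported values.

    Assumption for this sketch: rows missing either value are dropped before
    comparing, and np.isclose uses its default atol and rtol.
    """
    both = df[[reported_col, calculated_col]].dropna()
    if both.empty:
        return 0.0
    off = ~np.isclose(both[calculated_col].to_numpy(), both[reported_col].to_numpy())
    return float(off.sum() / len(both))


# Made-up data: one of four calculations is off.
df = pd.DataFrame(
    {"reported": [100.0, 200.0, 300.0, 400.0], "calculated": [100.0, 200.0, 300.0, 450.0]}
)
calculation_tolerance = 0.05  # max allowed fraction of off calculations
ratio = off_calculation_ratio(df, "reported", "calculated")
passes = ratio <= calculation_tolerance
```

Because the threshold applies to a count of mismatches rather than to the size of any single difference, a name that says "fraction of calculations" may describe it better than "tolerance".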
I'm still conceptually confused about the m:1 validation, but otherwise this branch now runs as expected for me, with all calc checks passing. Duplicated components are now removed, and the division-by-zero issue has been fixed. Presuming it passes CI, it should be ready to merge into the main reshape branch.
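Assuming "m:1 validation" refers to pandas' `merge(..., validate="m:1")` (the thread does not say so explicitly), the check asserts that the merge keys are unique in the right-hand frame, so many left rows may match one right row but never several. The frames below are made up; only `xbrl_factoid` is a name taken from the PR.

```python
import pandas as pd
from pandas.errors import MergeError

# Many calculation components (left) can point at the same total (right).
components = pd.DataFrame(
    {"xbrl_factoid": ["total", "total", "other_total"], "component": ["a", "b", "c"]}
)
totals = pd.DataFrame({"xbrl_factoid": ["total", "other_total"], "value": [10.0, 5.0]})

# validate="m:1" checks that "xbrl_factoid" is unique in `totals`.
merged = components.merge(totals, on="xbrl_factoid", how="left", validate="m:1")

# Duplicating a key on the right-hand side makes the same merge raise MergeError.
dup_totals = pd.concat([totals, totals.iloc[[0]]], ignore_index=True)
try:
    components.merge(dup_totals, on="xbrl_factoid", how="left", validate="m:1")
    raised = False
except MergeError:
    raised = True
```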
Compare sub-total calculations to total calculations for XBRL explosion
Codecov Report

| Coverage Diff | xbrl_meta_reshape | #2618 | +/- |
|---|---|---|---|
| Coverage | 86.9% | 87.0% | +0.1% |
| Files | 84 | 84 | |
| Lines | 9608 | 9847 | +239 |
| Hits | 8356 | 8576 | +220 |
| Misses | 1252 | 1271 | +19 |
PR Overview
Closes #2605, #2604
This PR migrates `check_table_calculations` into the table transformers and does some cleaning/standardization that came up in that process. It became clear that, despite the dollar column name standardization, it wasn't obvious how we could extract the `dollar_value` or `ending_balance` column from the parameters the way we now pull the `xbrl_factoid` out of the params. So I made a new param + transform function + transform wrapper for this stage.

I added a `calc_contains_components_from_other_tables` column to the metadata table so it's easy to identify downstream whether a calc is an inter- or intra-table calc. I'd love some workshopping on that name; it's long.

The one weird thing (IMO) to note is that this transform step really needs to know about more than just the `df` and `params`. I used the table transformer method to access the bits that the table transformer is aware of and pass them in as args to the transform function.

I also made some `@property`s on `Ferc1TableTransformParams` to provide a simpler access point to some of the info we now pull out of the params during the metadata processing and/or the calc checking.

Finally, I removed some of the cruft in `process_xbrl_metadata` that is no longer needed because of the standard `rename_column_xbrl_to_pudl`.
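A minimal sketch of how a flag like `calc_contains_components_from_other_tables` could be derived, assuming each calculation component records the table it is reported in. The input layout (`table_name`, `xbrl_factoid`, `source_table`), the helper `flag_inter_table_calcs`, and the example values are hypothetical, not the PR's actual metadata schema: a calc is flagged if any of its components comes from a table other than its own.

```python
import pandas as pd


def flag_inter_table_calcs(calc_components: pd.DataFrame) -> pd.DataFrame:
    """Flag calculations that pull any component from another table.

    Hypothetical input: one row per calculation component, where
    ``table_name`` is the table the calculated total lives in and
    ``source_table`` is the table the component is reported in.
    """
    return (
        calc_components.assign(
            is_external=lambda x: x["source_table"] != x["table_name"]
        )
        .groupby(["table_name", "xbrl_factoid"])
        .agg(calc_contains_components_from_other_tables=("is_external", "any"))
        .reset_index()
    )


# Made-up example: total_1 sums one component from table_a and one from table_b.
calc_components = pd.DataFrame(
    {
        "table_name": ["table_a", "table_a", "table_a", "table_b"],
        "xbrl_factoid": ["total_1", "total_1", "total_2", "total_3"],
        "source_table": ["table_a", "table_b", "table_a", "table_b"],
    }
)
flags = flag_inter_table_calcs(calc_components)
```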