Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate calcuation checks into the ferc1 table transformers #2618

Merged
merged 27 commits into from
Jun 2, 2023

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented May 31, 2023

PR Overview

Closes #2605, #2604

This PR migrates the check_table_calculations into the table transformers and does some cleaning/standardization that came up in that process. It became clear that despite the dollar column name standardization, it wasn't crystal clear how we could extract the dollar_value or ending_balance column from the parameters like we are now pulling out the xbrl_factoid from the params. So I made a new param + transform function + transform wrapper for this stage.

I added a calc_contains_components_from_other_tables column into the table of metadata so its easy to identify downstream if a calc is inter- or intra-table calc. I'd love some work-shopping on that name. it long.

The one weird thing (imo) to note is that this transform step really cares about/needs to know about more than just the df and params. I used the table transformer method to access those bits that the table transformer is aware of and pass them in as args to the transform function.

I also made some @property's of the Ferc1TableTransformParams to have a simpler access point to some of the info we're now pulling out of the params to access during the metadata processing and/or the calc checking.

I also also removed some of the cruft in process_xbrl_metadata that is no longer needed bc of the standard rename_column_xbrl_to_pudl

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests
  • Include unit tests for new functions and classes.
  • Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@e-belfer e-belfer self-requested a review May 31, 2023 18:37
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, mostly have spelling changes and a few questions to suggest. A few more major issues:

When I run this, I get a ZeroDivisionError for utility_plant_summary_ferc1 from the relative difference calculation:
rel_diff=lambda x: abs(x.abs_diff / x[params.column_to_check])
When the reported calculated value is 0, we should add an exception here to avoid dividing by zero!

I also get ValueError: No objects to concatenate on the retained_earnings_ferc1 table - if we haven't integrated this yet enough for it to work, perhaps we shouldn't run it through the explosion just yet.

The values calculated for the balance_sheet_assets_ferc1 is >20%, which is far higher than I remember it being in previous iterations. Have we diagnosed why that is?

src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Show resolved Hide resolved
src/pudl/transform/ferc1.py Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
Comment on lines 705 to 706
calculation_tolerance: float = 0.05
"""The tolerance ratio of the off calcuations and the possible calcuations."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's common for these kinds of tolerance checks to include both "absolute" (difference) and "relative" (fractional) tolerances (e.g. see the atol and rtol parameters to np.isclose(). Even if we aren't using both types of tolerance here I think it would be helpful if the parameter name indicated which type we're talking about (fractional in this case)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i hear you but this is not the tolerance of the calced-value vs the reported-value, but rather the # of calculations that are ~np.isclose w/ default atol and rtol vs the total number of calculated values.

although another name for this is encouraged!

@e-belfer e-belfer self-requested a review June 1, 2023 19:37
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still conceptually confused about the m:1 validation, but otherwise this branch now runs as expected for me, with all calc checks passing. Duplicated components are now removed, and the div by 0 issue has been fixed. Presuming it passes CI should be ready to merge into the main reshape branch.

@codecov
Copy link

codecov bot commented Jun 2, 2023

Codecov Report

Patch coverage: 93.7% and project coverage change: +0.1 🎉

Comparison is base (b547f4f) 86.9% compared to head (738b9a1) 87.0%.

❗ Current head 738b9a1 differs from pull request most recent head d19d8b6. Consider uploading reports for the commit d19d8b6 to get more accurate results

Additional details and impacted files
@@                 Coverage Diff                 @@
##           xbrl_meta_reshape   #2618     +/-   ##
===================================================
+ Coverage               86.9%   87.0%   +0.1%     
===================================================
  Files                     84      84             
  Lines                   9608    9847    +239     
===================================================
+ Hits                    8356    8576    +220     
- Misses                  1252    1271     +19     
Impacted Files Coverage Δ
src/pudl/etl/epacems_assets.py 100.0% <ø> (ø)
src/pudl/metadata/codes.py 100.0% <ø> (ø)
src/pudl/metadata/fields.py 100.0% <ø> (ø)
src/pudl/metadata/resources/eia860.py 100.0% <ø> (ø)
src/pudl/metadata/resources/ferc1.py 100.0% <ø> (ø)
src/pudl/metadata/resources/glue.py 100.0% <ø> (ø)
src/pudl/output/pudltabl.py 95.0% <ø> (ø)
src/pudl/transform/eia.py 97.2% <ø> (ø)
src/pudl/transform/params/ferc1.py 100.0% <ø> (ø)
src/pudl/validate.py 49.4% <ø> (ø)
... and 9 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@cmgosnell cmgosnell merged commit fe4fc9e into xbrl_meta_reshape Jun 2, 2023
8 checks passed
@cmgosnell cmgosnell deleted the check_calcs_in_transformers branch June 2, 2023 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

xbrl metadata cleaning and management for 💥 tables
3 participants