💥 FERC feature branch 💥 : FERC tables post-calculation validation by concat-ing & deduping #2633
Codecov Report
Patch coverage:
Additional details and impacted files
@@          Coverage Diff          @@
##            dev   #2633    +/-   ##
======================================
  Coverage  88.5%   88.6%
======================================
  Files        90      90
  Lines     10139   10832   +693
======================================
+ Hits       8982    9599   +617
- Misses     1157    1233    +76
src/pudl/output/ferc1.py
Outdated
def get_table_level(table_name: str, top_table: str) -> int:
    """Get a table level."""
    # we may be able to infer this nesting from the metadata
    table_nesting = {
        "balance_sheet_assets_ferc1": {
            "utility_plant_summary_ferc1": {
                "plant_in_service_ferc1": None,
                "electric_plant_depreciation_changes_ferc1": None,
            },
        },
        "balance_sheet_liabilities_ferc1": {"retained_earnings_ferc1": None},
        "income_statement_ferc1": {
            "depreciation_amortization_summary_ferc1": None,
            "electric_operating_expenses_ferc1": None,
            "electric_operating_revenues_ferc1": None,
        },
    }
    if table_name == top_table:
        level = 1
    elif table_name in table_nesting[top_table].keys():
        level = 2
    elif table_name in pudl.helpers.dedupe_n_flatten_list_of_lists(
        [values.keys() for values in table_nesting[top_table].values()]
    ):
        level = 3
    else:
        raise AssertionError(
            f"AH we didn't find yer table name {table_name} in the nested group of "
            "tables. Be sure all the tables you are trying to explode are related."
        )
    return level
I expect this whole level-getting setup will be nullified if we get #2625 working. I just needed something that works for now.
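Until #2625 lands, one way to avoid hard-coding exactly three levels is to walk the nesting recursively. This is only a sketch, assuming the nesting stays a dict of dicts with `None` at the leaves. `find_table_level` and the trimmed `TABLE_NESTING` constant are hypothetical names for illustration, not existing PUDL code:

```python
from __future__ import annotations

# Trimmed copy of the nesting from the diff above; None marks a leaf table.
TABLE_NESTING = {
    "balance_sheet_assets_ferc1": {
        "utility_plant_summary_ferc1": {
            "plant_in_service_ferc1": None,
            "electric_plant_depreciation_changes_ferc1": None,
        },
    },
    "balance_sheet_liabilities_ferc1": {"retained_earnings_ferc1": None},
}


def find_table_level(
    table_name: str, nesting: dict | None, level: int = 1
) -> int | None:
    """Return the 1-indexed depth of table_name within nesting, or None if absent."""
    if not nesting:
        return None
    for name, children in nesting.items():
        if name == table_name:
            return level
        # Search the child tables one level further down.
        found = find_table_level(table_name, children, level + 1)
        if found is not None:
            return found
    return None


def get_table_level(table_name: str, top_table: str) -> int:
    """Look up a table's level beneath top_table, raising if it is unrelated."""
    level = find_table_level(table_name, {top_table: TABLE_NESTING[top_table]})
    if level is None:
        raise ValueError(f"{table_name} is not nested under {top_table}.")
    return level
```

Because the search recurses, adding a fourth level of tables would not require another `elif` branch.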
Many of the new functions in the Explode section of
This seems weird, since I am fixing the typo and making some names more consistent and pushing to see if that gets these functions run / assets materialized in the CI. We might want to use
src/pudl/output/ferc1.py
Outdated
    return value_col


def remove_factoids_from_mutliple_tables(
Currently assumes factoids are the same if the original names are the same. After #2623 is done we'll need to revisit this assumption, since we may be comparing the transformed name. We'll need a different way to compare those factoids by name, whether by modifying this method or by writing a different one.
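To make the current assumption concrete, here is a rough pandas sketch of deduplicating on the original factoid name. The function name, the `table_name` and `xbrl_factoid_original` columns, and the table-priority rule are all assumptions for illustration. The real method may key on more columns (for example report year or utility ID), which is why `key_cols` is a parameter:

```python
from __future__ import annotations

import pandas as pd


def drop_repeated_factoids(
    exploded: pd.DataFrame,
    table_priority: list[str],
    key_cols: tuple[str, ...] = ("xbrl_factoid_original",),
) -> pd.DataFrame:
    """Keep one record per key, preferring tables listed earlier in table_priority."""
    rank = {table: i for i, table in enumerate(table_priority)}
    # Tables missing from table_priority get a NaN rank and therefore sort last.
    ranked = exploded.assign(_rank=exploded["table_name"].map(rank))
    deduped = (
        ranked.sort_values("_rank", kind="stable")
        .drop_duplicates(subset=list(key_cols), keep="first")
        .drop(columns="_rank")
    )
    return deduped.reset_index(drop=True)
```

If #2623 changes which name is compared, only `key_cols` would need to change in a design like this.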
src/pudl/output/ferc1.py
Outdated
    return pd.DataFrame(table_levels)


def remove_totals_from_other_dimensions(
Other dimensions here refers to the sub-total columns.
Currently this method drops "totals" for any table that actively has sub-dimensions.
We currently aren't correcting for sub-total v. total differences, which we should probably do if we're going to drop these.
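A rough sketch of both halves of this comment: dropping the "total" rows where sub-dimensions exist, and measuring the total vs. sub-total gap before dropping. The column names (`utility_type`, `utility_id`, `value`) and the `"total"` label are assumptions for illustration, not necessarily what the method in this PR uses:

```python
from __future__ import annotations

import pandas as pd


def drop_dimension_totals(
    df: pd.DataFrame, dimension_cols: list[str], total_label: str = "total"
) -> pd.DataFrame:
    """Drop total rows for every dimension column that also has sub-dimension values."""
    keep = pd.Series(True, index=df.index)
    for col in dimension_cols:
        has_subdims = df[col].notna() & (df[col] != total_label)
        if has_subdims.any():
            keep &= df[col] != total_label
    return df[keep]


def total_vs_subtotal_diff(
    df: pd.DataFrame,
    dimension_col: str,
    id_cols: list[str],
    value_col: str = "value",
    total_label: str = "total",
) -> pd.Series:
    """Reported total minus the sum of sub-dimension values, per id_cols group."""
    is_total = df[dimension_col] == total_label
    reported = df[is_total].groupby(id_cols)[value_col].sum()
    summed = df[~is_total].groupby(id_cols)[value_col].sum()
    return (reported - summed).rename("total_minus_subtotals")
```

Running the second function before the first would show how much value is lost or double counted when the totals are dropped.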
Reviewing exploded_balance_sheet_assets_ferc1.pkl sent by @e-belfer on 6/16/2023! It currently only includes 2020-2021. Is your plan to test just on 2020-2021 and then expand? Eventually we'd like to have 2005-2021 (and then through 2022 once that's looking good). The only table_names I see are
row_type_xbrl is NA for
How will users add a plant_function (steam, nuclear, hydro, other) label? I see it included as a column in exploded_income_statement_ferc1.pkl, and I think that'll be a common enough use that it should either be included the same way or be addable with one mapping step.
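For reference, the "one step of mapping" could look roughly like the sketch below. The factoid names in the dictionary and the `xbrl_factoid` column name are made up for illustration; the real mapping would have to come from the exploded metadata:

```python
import pandas as pd

# Hypothetical factoid names, for illustration only.
PLANT_FUNCTION_BY_FACTOID = {
    "steam_production_plant": "steam",
    "nuclear_production_plant": "nuclear",
    "hydraulic_production_plant": "hydro",
    "other_production_plant": "other",
}


def add_plant_function(exploded: pd.DataFrame, mapping=None) -> pd.DataFrame:
    """Return a copy with a plant_function column; unmapped factoids get NA."""
    if mapping is None:
        mapping = PLANT_FUNCTION_BY_FACTOID
    return exploded.assign(plant_function=exploded["xbrl_factoid"].map(mapping))
```

Rows whose factoid is not in the mapping end up with a missing `plant_function`, which makes gaps in the mapping easy to spot.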
src/pudl/output/ferc1.py
Outdated
            explosion_tables.append(tbl)
        metadata_exploded = self.meta_exploder.boom(clean_xbrl_metadata_json)
        exploded = pd.concat(explosion_tables)
        # drop any metadata columns coming from the tbls bc we may have edited the
Does clean_xbrl_metadata_json contain the calculation corrections added to each table?
* Rename "seeds" argument to "seed_nodes"
* Validate some aspects of the exploded metadata as soon as it is created, and give useful error messages if we find problems.
* Make the calculation_forest a property of the Exploder class, rather than automatically building it upon instantiation, to aid in debugging.
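The third item in the commit message above (build on first access rather than in `__init__`) can be sketched with `functools.cached_property`. The `LazyExploder` class, its `forest_builds` counter, and the dictionary standing in for the forest are all hypothetical. This is not the PUDL `Exploder` itself, only an illustration of the pattern:

```python
from __future__ import annotations

from functools import cached_property


class LazyExploder:
    """Toy stand-in showing validation up front and a lazily built forest."""

    def __init__(self, seed_nodes: list[str]):
        # Validate cheap things immediately so errors surface early.
        if not seed_nodes:
            raise ValueError("seed_nodes must contain at least one node.")
        self.seed_nodes = seed_nodes
        self.forest_builds = 0  # counts how often the forest is actually built

    @cached_property
    def calculation_forest(self) -> dict[str, list[str]]:
        """Built on first access only, so instantiation stays cheap to debug."""
        self.forest_builds += 1
        # Placeholder structure: each seed node maps to an empty list of children.
        return {node: [] for node in self.seed_nodes}
```

With this shape, an object can be instantiated and inspected in a debugger even when building the forest itself would fail.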
…ion_and_depletion_of_plant_utility as in rate base
…ations Link electric OpEx to income statement table
…hrough calcs, clean up comments and calcs
… checks note: the tests don't pass! these guys are too off which is to say something is probably wrong with the code.
* Handle multi-dim totals
* Do not treat NA value as total value.
* Runtime check to avoid extra components in total calcs.
* Add some calculation metadata to the inferred total calculations for downstream consumers; get existing tree annotation code working. There will be new, shiny, better tree annotation code in explode_tree_fixes, but this works for now.
---------
Co-authored-by: Zane Selvans <zane.selvans@catalyst.coop>
Make leafy balance sheet assets & liabilities data
PR Overview
First pass at #2624
To dos:
Questions
Explode class which does some of the upfront metadata compilation and validation for a given set of tables to explode.
PR Checklist