Rename and test XBRL metadata calculations #2563
Codecov Report
Patch coverage:
Additional details and impacted files
@@           Coverage Diff           @@
##             dev   #2563    +/-   ##
=======================================
+ Coverage   86.9%   87.1%   +0.2%
=======================================
  Files         84      84
  Lines       9720    9912    +192
=======================================
+ Hits        8447    8636    +189
- Misses      1273    1276      +3
this will be migrated away bc the "missing" names mostly seem to be references to other tables
Okay I pulled out the metadata work into its own
income_statement_tables = [
    "income_statement_ferc1",
    "depreciation_amortization_summary_ferc1",
    "electric_operating_expenses_ferc1",
    "electric_operating_revenues_ferc1",
]
income_table_dollar_cols = {
    "income_statement_ferc1": "income",
    "depreciation_amortization_summary_ferc1": "depreciation_amortization_value",
    "electric_operating_expenses_ferc1": "expense",
    "electric_operating_revenues_ferc1": "revenue",
}
# tables = {tbl: defs.load_asset_value(AssetKey(tbl)) for tbl in income_statement_tables}
tables = {tbl: pd.read_sql(tbl, pudl_engine) for tbl in income_statement_tables}
meta_converted = ExplodeMeta(xbrl_meta).convert_metadata(income_statement_tables)
calc_dfs = {}
for table_name in income_statement_tables:
    calculated_values = meta_converted[table_name]
    dollar_value_col = income_table_dollar_cols[table_name]
    table_df = tables[table_name]
    calc_dfs[table_name] = check_table_calcs(table_name, table_df, dollar_value_col, calculated_values)
Results:
2023-05-04 17:46:19 [ INFO] catalystcoop.pudl.transform.ferc1:4092 income_statement_ferc1: has #7097 / 2.23% records that don't calculate exactly
2023-05-04 17:46:19 [ INFO] catalystcoop.pudl.transform.ferc1:4092 depreciation_amortization_summary_ferc1: has #16 / 0.01% records that don't calculate exactly
2023-05-04 17:46:20 [ INFO] catalystcoop.pudl.transform.ferc1:4092 electric_operating_expenses_ferc1: has #7707 / 1.44% records that don't calculate exactly
2023-05-04 17:46:21 [ INFO] catalystcoop.pudl.transform.ferc1:4092 electric_operating_revenues_ferc1: has #197 / 0.28% records that don't calculate exactly
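For anyone skimming the thread, here is a rough sketch of what "records that don't calculate exactly" could mean, assuming each calculated fact is a weighted sum of its component facts. This is not the actual `check_table_calcs` implementation: the helper names `calculated_value` and `count_off_records`, the dict-based record layout, the fact names, and the toy numbers are all hypothetical.

```python
import math


def calculated_value(components, values):
    """Weighted sum of the component values, or None if a component is missing."""
    total = 0.0
    for comp in components:
        if comp["name"] not in values:
            return None
        total += comp["weight"] * values[comp["name"]]
    return total


def count_off_records(records, calculations, tolerance=1e-6):
    """Return (off_count, calculated_count) over all records and calculated facts."""
    off = 0
    calculated = 0
    for rec in records:
        for fact, components in calculations.items():
            # Skip facts that are not reported or have no calculation defined.
            if fact not in rec or not components:
                continue
            calc = calculated_value(components, rec)
            if calc is None:
                continue
            calculated += 1
            if not math.isclose(calc, rec[fact], abs_tol=tolerance):
                off += 1
    return off, calculated


# Hypothetical metadata: one fact defined as revenue minus expense.
calculations = {
    "net_income": [
        {"name": "revenue", "weight": 1.0},
        {"name": "expense", "weight": -1.0},
    ]
}
# Hypothetical records: the second one does not add up, the third lacks a component.
records = [
    {"revenue": 100.0, "expense": 60.0, "net_income": 40.0},
    {"revenue": 80.0, "expense": 50.0, "net_income": 35.0},
    {"revenue": 10.0, "net_income": 10.0},
]
off, calculated = count_off_records(records, calculations)
```

In this toy case one of the two records that can be calculated is off, and the record with a missing component is left out of both counts.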
…tatements Fix income_statement_ferc1 utility_type categorization bug
…udl into xbrl_meta_reshape
Wow there is a lot in here even if we ignore the DB migrations (which unfortunately make it impossible to tell how much there is actually in here).
src/pudl/transform/ferc1.py
for table_name, table_meta in raw_xbrl_metadata_json.items():
    for list_of_facts in table_meta.values():
        for xbrl_fact in list_of_facts:
            # all facts have ``calculations``, but they are empty lists when null
            for calc_component in xbrl_fact["calculations"]:
                # does the calc component show up in the table? if not, add a label
                if calc_component["name"] not in tables_to_fields[table_name]:
                    calc_component = label_source_table(
                        calc_component, tables_to_fields
                    )
I think we should probably define Pydantic classes for the objects that are represented in the JSON metadata, with methods on those classes that know how to perform these operations internally. Otherwise it's all very dependent on the implicit structure of the nested dicts/lists, and as those contents evolve it'll be time-consuming to chase down all the places where things break and fix them.
What are the component data structures we need to represent in here?
- table metadata (a list of fact metadata objects? Can be converted to a dataframe?)
- fact metadata (what's in here besides calculations? Can be converted to a row in a dataframe?)
- calculation (a list of calculation components, needs context of what table it's embedded within?)
- calculation component (name + weight, plus we need to qualify the name with a source table?)
I think we're already partly there since the sub-functions defined above would become methods of one of these classes.
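To make the proposed hierarchy concrete, here is a minimal sketch of the four structures listed above. It uses standard-library dataclasses as a stand-in for Pydantic so it runs without extra dependencies. The class names, the `qualify_components` method, the field names, and the rule of picking the first other table that contains a field are all assumptions for illustration, not the existing code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CalculationComponent:
    """One term of a calculation: a fact name, a weight, and its source table."""

    name: str
    weight: float
    source_table: Optional[str] = None


@dataclass
class FactMetadata:
    """Metadata for one fact; ``calculations`` is empty when there is none."""

    name: str
    calculations: List[CalculationComponent] = field(default_factory=list)


@dataclass
class TableMetadata:
    """All fact metadata for one table."""

    table_name: str
    facts: List[FactMetadata]

    def qualify_components(self, tables_to_fields: Dict[str, Set[str]]) -> None:
        """Label every calculation component with the table its field lives in."""
        own_fields = tables_to_fields[self.table_name]
        for fact in self.facts:
            for comp in fact.calculations:
                if comp.name in own_fields:
                    comp.source_table = self.table_name
                else:
                    # Assumption: take the first other table that has the field,
                    # or leave the label as None if no table does.
                    comp.source_table = next(
                        (
                            tbl
                            for tbl, fields in tables_to_fields.items()
                            if tbl != self.table_name and comp.name in fields
                        ),
                        None,
                    )


# Hypothetical field names, for illustration only.
tables_to_fields = {
    "income_statement_ferc1": {"net_income", "operating_revenues"},
    "electric_operating_expenses_ferc1": {"operation_expense"},
}
table = TableMetadata(
    table_name="income_statement_ferc1",
    facts=[
        FactMetadata(
            name="net_income",
            calculations=[
                CalculationComponent("operating_revenues", 1.0),
                CalculationComponent("operation_expense", -1.0),
            ],
        ),
        FactMetadata(name="operating_revenues"),
    ],
)
table.qualify_components(tables_to_fields)
sources = [c.source_table for c in table.facts[0].calculations]
```

With this shape the nested loops over dicts and lists collapse into one method call per table, and a change in the JSON structure only has to be handled where the objects are constructed.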
I left some additional comments in our existing threads and maybe added another one regarding naming / my ongoing confusions.
…udl into xbrl_meta_reshape
…omponent-cleanup add minor inter table calc fixes
PR Overview
Working on sub-task of #2016. Desire here is to:
Tasks
Latest income-statement component results
Note: I was previously calculating the % as off_records/total_records and recently switched to off_records/calculated_records. The denominator is now smaller, so the % went up while the # stayed the same.
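A quick worked example of the denominator change, using made-up counts rather than numbers from the actual tables:

```python
def pct_off(off_records: int, denominator: int) -> float:
    """Share of off records as a percentage of the chosen denominator."""
    return 100.0 * off_records / denominator


# Hypothetical counts, for illustration only.
off_records = 50
total_records = 10_000       # every record in the table
calculated_records = 2_000   # only records that have a calculation to check

old_pct = pct_off(off_records, total_records)       # previous definition
new_pct = pct_off(off_records, calculated_records)  # current definition
```

Here the same 50 off records read as 0.5% under the old definition and 2.5% under the new one.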