Spot fix ferc exploder #2647
…ding_balance values for the utility_plant_summary_ferc1 table
@jrea-rmi wondering what you think about this... 1.) Fix certain negative values in
| | report_year | utility_id_ferc1 | utility_type | utility_plant_asset_type | ending_balance | calculated_amount |
|---|---|---|---|---|---|---|
| 93866 | 2009 | 211 | electric | accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility | 2.46878e+09 | -2.46878e+09 |
| 93868 | 2009 | 211 | electric | amortization_of_other_utility_plant_utility_plant_in_service | -4.14332e+07 | nan |
| 93871 | 2009 | 211 | electric | depreciation_utility_plant_in_service | -2.42735e+09 | nan |
In total, 53 records are spot fixed (this is higher than 29 b/c we are spot fixing the components that add up to the calculated value, not the calculated value itself). This represents just 0.03% of all records!
The Solution
This PR spot-fixes the subcomponents so that they are positive instead of negative. This causes our `calculated_amount` values to match the values in `ending_balance` for these calculated values.
I employ a spot fix rather than a programmatic fix here. This is because the way to programmatically identify errors is to use the outputs from the `reconcile_table_calculations` function. This function is also what we use to check that the reported calculations are correct. To fix the errors programmatically, I would have to run `reconcile_table_calculations`, identify the errors, fix the errors, and then run `reconcile_table_calculations` again to get rid of any adjustments made by running it the first time with the erroneous records (the function assumes that the programmatically calculated values are correct and will add an adjustment record to make the values add up).
By flipping the sign of the subcomponents used in our calculation of the `calculated_amount`, we can run `reconcile_table_calculations` and get a calculation that matches the value in `ending_balance`.
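For illustration, here is a minimal pandas sketch of the sign flip described above. It is not the PUDL spot fixer: `flip_signs`, `calculated_total`, and the `PK` column list are hypothetical names, the frame holds only the three example rows from this comment, and it assumes the two components shown are the only inputs to the total (the real calculation may include more).

```python
import pandas as pd

TOTAL = "accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility"
COMPONENTS = [
    "amortization_of_other_utility_plant_utility_plant_in_service",
    "depreciation_utility_plant_in_service",
]
# Assumed primary key for this toy frame (hypothetical, for illustration only).
PK = ["report_year", "utility_id_ferc1", "utility_type", "utility_plant_asset_type"]

# The three example rows from the table above, before the spot fix.
df = pd.DataFrame(
    {
        "report_year": [2009] * 3,
        "utility_id_ferc1": [211] * 3,
        "utility_type": ["electric"] * 3,
        "utility_plant_asset_type": [TOTAL] + COMPONENTS,
        "ending_balance": [2.46878e9, -4.14332e7, -2.42735e9],
    }
)


def flip_signs(df, spot_fix_pks):
    """Return a copy with ending_balance negated for rows whose PK is listed."""
    out = df.copy()
    mask = out.set_index(PK).index.isin(spot_fix_pks)
    out.loc[mask, "ending_balance"] *= -1
    return out


def calculated_total(df):
    """Sum the component rows (toy stand-in for the calculated_amount)."""
    is_component = df["utility_plant_asset_type"].isin(COMPONENTS)
    return df.loc[is_component, "ending_balance"].sum()


# Spot fix the components, not the reported total.
spot_fix_pks = [(2009, 211, "electric", c) for c in COMPONENTS]
fixed = flip_signs(df, spot_fix_pks)
print(calculated_total(df), calculated_total(fixed))
```

Before the flip the component sum is negative while the reported total is positive. After the flip the two agree to within the rounding of the displayed values.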
| | report_year | utility_id_ferc1 | utility_type | utility_plant_asset_type | ending_balance | calculated_amount |
|---|---|---|---|---|---|---|
| 93866 | 2009 | 211 | electric | accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility | 2.46878e+09 | 2.46878e+09 |
| 93868 | 2009 | 211 | electric | amortization_of_other_utility_plant_utility_plant_in_service | 4.14332e+07 | nan |
| 93871 | 2009 | 211 | electric | depreciation_utility_plant_in_service | 2.42735e+09 | nan |
The Problem with the Problem
The 29 problem utility-years all have another field called `accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail` (the same as the calculated field above but with the suffix `_detail`). This is a relic of the DBF form. In all 29 cases, the `_detail` field is negative! (i.e., the opposite of the non-detail field). According to the form (below), these values should be the same.
(this is just 3 of the 29)
| | report_year | utility_id_ferc1 | utility_type | utility_plant_asset_type_y | ending_balance_y | calculated_amount_y |
|---|---|---|---|---|---|---|
| 0 | 2002 | 170 | other1 | accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility | -8732 | 8732 |
| 1 | 2002 | 170 | other1 | accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail | 8732 | nan |
| 6 | 2006 | 211 | total | accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility | 2.0784e+09 | -2.0784e+09 |
| 7 | 2006 | 211 | total | accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail | -2.0784e+09 | nan |
| 20 | 2006 | 211 | electric | accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility | 2.0784e+09 | -2.0784e+09 |
| 21 | 2006 | 211 | electric | accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail | -2.0784e+09 | nan |
We decided to spot fix these values because we felt that if the calculated value was reported as a positive value then the components were probably misreported as negative values. However, now there are two reported calculated values (the latter isn't listed as a calculated value because it comes from DBF, which predates the XBRL metadata that flags calculated values), one of which is negative and one of which is positive...grr. This erodes my confidence that these values should be spot fixed at all, especially because the reported subcomponents are also negative.
The reason why I've kept the spot fixes (for now) is that values for depreciation, amortization, and depletion (DAD) are almost always positive. It seems especially rare/bad for a utility to report repeated negative values for DAD, which is the case for `utility_id_ferc1` 211 that is part of the spot fix.
In this form:
- line no. 14 = `accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility`
- line no. 33 = `accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail`
Ideally I'd like to get rid of the `accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail` record because it's supposed to be a duplicate and it stops getting reported in XBRL.
Before doing so, it's worth considering that 4.3% of the year-utility-utility_type combos in this table have different values reported for `accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility` and `accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail`. I have no idea why. The vast majority of those records are below 3% different from one another (or have a flipped sign).
At this point, I'm ok getting rid of the `_detail` record. However, I'm still not 100% sure which record carries more weight for those with a flipped sign!
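A rough sketch of how the comparison between the two fields could be checked with pandas. This is an assumption about the approach, not the code behind the 4.3% figure: the frame uses the three example pairs from the table above plus one made-up pair that agrees, so the resulting share is only a property of this toy data.

```python
import numpy as np
import pandas as pd

TOTAL = "accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility"
DETAIL = TOTAL + "_detail"
IDX = ["report_year", "utility_id_ferc1", "utility_type"]

rows = [
    (2002, 170, "other1", TOTAL, -8732.0),
    (2002, 170, "other1", DETAIL, 8732.0),
    (2006, 211, "total", TOTAL, 2.0784e9),
    (2006, 211, "total", DETAIL, -2.0784e9),
    (2006, 211, "electric", TOTAL, 2.0784e9),
    (2006, 211, "electric", DETAIL, -2.0784e9),
    # Hypothetical pair (not from this PR) where the two fields agree.
    (2010, 999, "electric", TOTAL, 100.0),
    (2010, 999, "electric", DETAIL, 100.0),
]
df = pd.DataFrame(rows, columns=IDX + ["utility_plant_asset_type", "ending_balance"])

# One row per year-utility-utility_type combo, one column per field.
wide = df.set_index(IDX + ["utility_plant_asset_type"])["ending_balance"].unstack()

# Combos where the two fields disagree, and the subset that is a pure sign flip.
differs = ~np.isclose(wide[TOTAL], wide[DETAIL])
flipped = differs & np.isclose(wide[TOTAL], -wide[DETAIL])

share_differs = differs.mean()
print(f"{share_differs:.1%} of combos differ; {flipped.sum()} are sign flips")
```

Separating pure sign flips from other mismatches makes it easier to see which differences a sign-only spot fix could plausibly explain.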
Spot fixes to change component values to positive sounds like the right move to me. Negative depreciation means the asset gains value over time, but I don't think that should happen for power plants - as they get used, they deteriorate. But there could be exceptions... to determine which record carries more weight, we want this to match the value of UtilityPlantNet in Another thought: does |
It appears that See |
Great idea. A preliminary check shows that the positive value ( According to the FERC taxonomy, 5.09278e+09 - 2.46878e+09 = 2.624e+09
After all this investigation, I would feel pretty comfortable removing the |
What's the purpose of removing, and for what output? My understanding for exploded tables is that detail gets concatenated and then totals are removed; so both total lines would get removed at that step. |
Because it's technically a duplicate value with no new information, and it isn't flagged as a total or calculated value. That isn't great because we don't want folks to accidentally concatenate it. I would propose removing it from the |
I figured it would have been flagged as a calculated value! Would it still be useful for doing checks that ensure signs and values of components are correct? If not, then I'm in agreement with removing it pre-explosion. |
Codecov Report
Patch coverage:
Additional details and impacted files

| Coverage Diff | dev | #2647 | +/- |
|---|---|---|---|
| Coverage | 87.2% | 87.2% | |
| Files | 87 | 87 | |
| Lines | 10130 | 10155 | +25 |
| Hits | 8839 | 8864 | +25 |
| Misses | 1291 | 1291 | |

View full report in Codecov by Sentry.
Previously we mapped FERC row 14, Accum Prov for Depr, Amort, & Depl, to the XBRL row accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility and FERC row 33, Total Accum Prov (equals 14) (22,26,30,31,32), to a unique row accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail (note the '_detail' suffix). The former (sans '_detail') maps to an XBRL calculated value while the latter does not. Technically, row 33 should be the calculated value, as it notes all the rows that should sum up to it.

This commit maps row 33 to accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility and row 14 to accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_reported. I added the '_reported' suffix instead of the '_detail' suffix because it's more informative.

This commit also replaces all the spot-fix rows that reference these values with the correct value name, and adds one more value to the spot fixer for utility_plant_summary_ferc1
…lues for electric_plant_sold values where the overall electric_plant_in_service calculation is wrong
…transform_main function for the plant_in_service table (rather than to the dataframe itself) so we can use the calculations to find the index values for the rows to change in the original dataframe without running the calculations on it (and adding new rows for calculations fixes etc.) This change makes the reconcile_table_calculations function work properly in the transform_end function! Previously it was not calculating the right values after some of the spot fixes.
Updates: We did not get rid of the We then turned the previous value for |
2.) Fix certain negatives in the
@aesharpe How often is this the case? Doing an across the board flip seems more justifiable to me and neater. |
One minor change and a bigger question - is there a reason why you aren't using the spot fixer to make the changes for the `utility_plant_summary_ferc1` table? It's already unit tested and is set up to take PKs and values to change, so it seems like a good fit for this issue unless there is something I'm not seeing?
I just checked, and idk what I saw before, but you're right. Which is a bummer because that code was kind of labor intensive. But yay to things being simpler than I thought. |
Simplify the plant_in_service function so that it just flips all electric_plant_sold values to positive values.

Update the accommodations for the fast test so that they only happen in one place (years are truncated from the final list of spot_fix_pks instead of each of the individual lists before they are concatenated).
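A hedged sketch of the two changes this commit message describes. It assumes a toy frame in which `electric_plant_sold` is a plain numeric column (as the PR description calls it) and that the first element of each spot-fix PK is the report year. The function names, PK tuples, and values are hypothetical and the real table layout may differ.

```python
import pandas as pd


def flip_electric_plant_sold(df):
    """Return a copy where every electric_plant_sold value is non-negative."""
    out = df.copy()
    out["electric_plant_sold"] = out["electric_plant_sold"].abs()
    return out


def truncate_pks_to_years(pk_lists, df):
    """Concatenate the PK lists first, then drop PKs for years absent from df."""
    all_pks = [pk for pks in pk_lists for pk in pks]
    years = set(df["report_year"].unique().tolist())
    return [pk for pk in all_pks if pk[0] in years]


# Made-up data: a "fast test" frame that only contains two years.
df = pd.DataFrame(
    {
        "report_year": [2020, 2020, 2021],
        "utility_id_ferc1": [1, 2, 1],
        "electric_plant_sold": [-10.0, 5.0, None],
    }
)

fixed = flip_electric_plant_sold(df)

# Two hypothetical PK lists; the 1999 entry is filtered out in one place.
pk_lists = [[(2020, 1), (1999, 3)], [(2021, 1)]]
spot_fix_pks = truncate_pks_to_years(pk_lists, df)
print(fixed["electric_plant_sold"].tolist(), spot_fix_pks)
```

Filtering once after concatenation keeps the fast-test accommodation in a single place, which is the simplification the commit message describes.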
Oh no! I have one tiny suggestion on the wording of the logger statement, and a bigger question about the `utility_plant_summary` spot fix.
With this spot fix:
- `plant_in_service_ferc1` goes from 156 to 96 records with incorrect calculations. Nice!
- `utility_plant_summary_ferc1` goes from 179 to 258 records with incorrect calculations. Uh oh! This spot fix should probably not get applied if it is in aggregate making things worse.
Update: the change in numbers found above has to do with the change in the `_detail` column renaming. This more accurately reflects what field should be summed to, but results in a higher number of inaccurate calculations.
# Pare down spot fixes to account for fast tests where not all years are used
df_years = df.report_year.unique().tolist()
spot_fix_pks = [x for x in spot_fix_pks if x[0] in df_years]
logger.info(f"{self.table_id.value}: Spotfixing {len(spot_fix_pks)} records.")
I'd just make this "Spotfixing {} records with incorrectly signed values" or something like this so it's a bit more descriptive. Otherwise works like a charm.
…to reflect the fact that now there should only be negative values swapped to positive values
@aesharpe One last thought here other than the alembic rebase - can you add a note to the release notes explaining the |
Looking good!
This PR addresses the spot fixes outlined in #2599

1.) Fix certain negative values in `utility_plant_summary_ferc1`: #2647 (comment)

2.) Fix negatives in the `plant_in_service_ferc1` table, `electric_plant_sold` column: #2647 (comment)