Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spot fix ferc exploder #2647

Merged
merged 20 commits into from Jun 21, 2023
Merged

Spot fix ferc exploder #2647

merged 20 commits into from Jun 21, 2023

Conversation

aesharpe
Copy link
Member

@aesharpe aesharpe commented Jun 8, 2023

This PR addresses the spot fixes outlined in #2599

1.) Fix certain negative values in utility_plant_summary_ferc1: #2647 (comment)
2.) Fix negatives in the plant_in_service_ferc1 table, electric_plant_sold column: #2647 (comment)

@aesharpe
Copy link
Member Author

aesharpe commented Jun 8, 2023

@jrea-rmi wondering what you think about this...

1.) Fix certain negative values in utility_plant_summary_ferc1

The Problem

There are some records in the utility_plant_summary_ferc1 table that appear to contain obvious errors.

Specifically, there are records where the value reported for accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility, a calculated value, is a positive value and the records that comprise the subcomponents are all reported as negative values. This causes a discrepancy between our calculated values (calculated_amount) and the reported calculated values (ending_balance). In this case, the reported calculated values are positive, but our calculated values are negative.

This is one example of 29 (see #2599 (comment)):

report_year utility_id_ferc1 utility_type utility_plant_asset_type ending_balance calculated_amount
93866 2009 211 electric accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility 2.46878e+09 -2.46878e+09
93868 2009 211 electric amortization_of_other_utility_plant_utility_plant_in_service -4.14332e+07 nan
93871 2009 211 electric depreciation_utility_plant_in_service -2.42735e+09 nan

In total, 53 records are spot fixed (this is higher than 29 b/c we are spot fixing the components that add up to the calculated value, not the calculated value itself). This represents just 0.03% of all records!

The Solution

This PR spot-fixes the subcomponents so that they are positive instead of negative. This causes our calculated_amount values to match the values in ending_balance for these calculated values.

I employ a spot fix rather than a programmatic fix here. This is because the way to programmatically identify errors is to use the outputs from the reconcile_table_calculations function. This function is also what we use to check that the reported calculations are correct. To fix the errors programmatically, I would have to run reconcile_table_calculations, identify the errors, fix the errors, and then run reconcile_table_calculations again to get rid of any adjustments made by running it the first time with the erroneous records (the function assumes that the programmatically calculated values are correct and will add an adjustment record to make the values add up).

By flipping the sign value of the subcomponents used in our calculation of the calculated_amount, we can run reconcile_table_calculations and get a calculation that matches the value in ending_balance.

report_year utility_id_ferc1 utility_type utility_plant_asset_type ending_balance calculated_amount
93866 2009 211 electric accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility 2.46878e+09 2.46878e+09
93868 2009 211 electric amortization_of_other_utility_plant_utility_plant_in_service 4.14332e+07 nan
93871 2009 211 electric depreciation_utility_plant_in_service 2.42735e+09 nan

The Problem with the Problem

The 29 problem utility-years all have another field called accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail (the same as the calculated field above but with the suffix _detail. This is a relic of the DBF form. In all 29 cases, the detail field is negative! (i.e., the opposite of the non-detail field). According to the form (below), these values should be the same.

(this is just 3 of the 29)

report_year utility_id_ferc1 utility_type utility_plant_asset_type_y ending_balance_y calculated_amount_y
0 2002 170 other1 accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility -8732 8732
1 2002 170 other1 accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail 8732 nan
6 2006 211 total accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility 2.0784e+09 -2.0784e+09
7 2006 211 total accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail -2.0784e+09 nan
20 2006 211 electric accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility 2.0784e+09 -2.0784e+09
21 2006 211 electric accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail -2.0784e+09 nan

We decided to spot fix these values because we felt that if the calculated value was reported as a positive value then the components were probably miss-reported as negative values. However, now there are two reported calculated values (the latter isn't listed as a calculated value because it comes from DBF, pre-XBRL metadata including calculated values etc.), one of which is negative and one of which is positive...grr. This erodes my confidence that these values should be spot fixed at all, especially because the reported subcomponents are also negative.

The reason why I've kept the spot fixes (for now) is because negative values for depreciation, amortization, and depletion (DAD) are almost always positive. It seems especially rare/bad for a utility report repeated negative values for DAD which is the case for utility_id_ferc1 211 that is part of the spot fix.

In this form:

  • line no. 14 = accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility
  • line no. 33 = accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail
Screen Shot 2023-06-08 at 5 52 02 PM

Ideally I'd like to get rid of the accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail record because it's supposed to be a duplicate and it stops getting reported in XBRL.

Before doing so, it's worth considering that 4.3% of the year-utility-utility_type combos in this table have different values reported for accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility and accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail. I have no idea why. The vast majority of those records are below 3% different from one another (or have a flipped sign).

At this point, I'm ok getting rid of the _detail record. However, I'm still not 100% sure which record carries more weight for those with a flipped sign!

@jrea-rmi
Copy link
Collaborator

jrea-rmi commented Jun 8, 2023

Spot fixes to change component values to positive sounds like the right move to me. Negative depreciation means the asset gains value over time, but I don't think that should happen for power plants - as they get used, they deteriorate. But there could be exceptions...

to determine which record carries more weight, we want this to match the value of UtilityPlantNet in balance_sheet_assets_ferc1. Checking the FERC taxonomy, I got turned around a couple times but it seemed like both the _detail and non-_detail sections were referenced...can you look into that also @aesharpe?

Another thought: does accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail start getting reported to a separate table, as opposed to "stops getting reported in XBRL"? I wonder if that's true, then accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail being different than accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility would help identify sign errors in that new detail table? And it would also make sense to me that the non-detail table would include calculations that reference the detail table, because the bottom section of the form should be detail that adds up to values in the top section of the form.

@aesharpe
Copy link
Member Author

aesharpe commented Jun 8, 2023

@jrea-rmi

does accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail start getting reported to a separate table, as opposed to "stops getting reported in XBRL"?

It appears that accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail gets subsumed into accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility. The XBRL metadata for the accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility column points to rows 14 and 33 in the Form which are the same rows highlighted above.

See package_data/settings/ferc1_xbrl_taxonomy_metadata.json and search for accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility

@aesharpe
Copy link
Member Author

aesharpe commented Jun 8, 2023

@jrea-rmi

to determine which record carries more weight, we want this to match the value of UtilityPlantNet in balance_sheet_assets_ferc1. Checking the FERC taxonomy, I got turned around a couple times but it seemed like both the _detail and non-_detail sections were referenced...can you look into that also @aesharpe?

Great idea. A preliminary check shows that the positive value (accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility) is correct! This serves as validation for our spot fixing.

According to the FERC taxonomy, utility_plant_net = utility_plant_and_construction_work_in_progress - accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility (_detail??). As you can see below, the positive value 2.46878e+09 is the correct value for this equation!

5.09278e+09 - 2.46878e+09 = 2.624e+09

report_year utility_id_ferc1 utility_type utility_plant_asset_type ending_balance calculated_amount
93866 2009 211 electric accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility 2.46878e+09 -2.46878e+09
93867 2009 211 electric accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail -2.46878e+09 nan
93872 2009 211 electric utility_plant_and_construction_work_in_progress 5.09278e+09 nan
93878 2009 211 electric utility_plant_net 2.624e+09 2.624e+09

@aesharpe
Copy link
Member Author

aesharpe commented Jun 8, 2023

After all this investigation, I would feel pretty comfortable removing the accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail records. What do @jrea-rmi, @cmgosnell think? I could do a little more investigation into the records that don't match up perfectly. I could also just ignore them and keep it all in there b/c it's such a small amount of problem records that differ.

@jrea-rmi
Copy link
Collaborator

jrea-rmi commented Jun 9, 2023

What's the purpose of removing, and for what output?

My understanding for exploded tables is that detail gets concatenated and then totals are removed; so both total lines would get removed at that step.

@aesharpe
Copy link
Member Author

aesharpe commented Jun 9, 2023

What's the purpose of removing, and for what output?

Because it's technically a duplicate value with no new information that also isn't flagged as a total or calculated value which isn't great because we don't want folks to accidentally concatenate it. I would propose removing it from the utility_plant_summary_ferc1 table pre-explosion so it's gone from the table if anyone wants to use just that table and also doesn't junk up the explosion.

@jrea-rmi
Copy link
Collaborator

jrea-rmi commented Jun 9, 2023

I figured it would have been flagged as a calculated value!

Would it still be useful for doing checks that ensure signs and values of components are correct? If not, then I'm in agreement with removing it pre-explosion.

@codecov
Copy link

codecov bot commented Jun 13, 2023

Codecov Report

Patch coverage: 100.0% and no project coverage change.

Comparison is base (581c6da) 87.2% compared to head (f3c1e22) 87.2%.

❗ Current head f3c1e22 differs from pull request most recent head 8daa787. Consider uploading reports for the commit 8daa787 to get more accurate results

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #2647   +/-   ##
=====================================
  Coverage   87.2%   87.2%           
=====================================
  Files         87      87           
  Lines      10130   10155   +25     
=====================================
+ Hits        8839    8864   +25     
  Misses      1291    1291           
Impacted Files Coverage Δ
src/pudl/transform/params/ferc1.py 100.0% <ø> (ø)
src/pudl/transform/ferc1.py 96.7% <100.0%> (+<0.1%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Previously we mapped FERC row 14, Accum Prov for Depr, Amort, & Depl, to the XBRL row accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility and FERC row 33, Total Accum Prov (equals 14) (22,26,30,31,32), to a unique row accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_detail (note the '_detail' suffix). The former (sans '_detail') maps to an XBRL calculated value while the latter does not. Technically, row 33 should be the calculated value as it notes all the rows that should sum up to it).

This commit maps row 33 to accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility and row 14 to accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility_reported. I added the '_reported' suffix instead of the '_detail' suffix because it's more informative.This commit also replaces all the spot-fix rows that reference these values with the correct value name.

This commit also adds one more value to the spot fixer for utility_plant_summary_ferc1
@e-belfer e-belfer linked an issue Jun 14, 2023 that may be closed by this pull request
…lues for electric_plant_sold values where the overall electric_plant_in_service calculation is wrong
…transform_main function for the plant_in_service table (rather than to the dataframe itself) so we can use the calculations to find the index values for the rows to change in the original dataframe without running the calculations on it (and adding new rows for calculations fixes etc.) This change makes the reconcile_table_calculations function work properly in the transform_end function! Previously it was not calculating the right values after some of the spot fixes.
@aesharpe
Copy link
Member Author

Updates:

We did not get rid of the _detail row (for now). Instead, we changed the PUDL column mapping reference to point row 33 (see screen grab from #2647 (comment)) to accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility (was previously the same but with the _detail suffix). Because this is the value that should be marked as calculated_value within the taxonomy.

We then turned the previous value for accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_utility (row 14) into accumulated_provision_for_depreciation_amortization_and_depletion_of_plant_reported.

@aesharpe
Copy link
Member Author

aesharpe commented Jun 14, 2023

2.) Fix certain negatives in the plant_in_service_ferc1 table, electric_plant_sold column

The Problem

There are some calculation values within the plant_in_service_ferc1 table for ferc_account_label == plant_in_service that are wrong. One reason that they are wrong is a negative sign that's incorrectly added to the electric_plant_sold values. This crops up because the calculation for plant_in_service subtracts the value for electric_plant_sold so it's often reported with a negative sign in front of it when it shouldn't be. This causes the calculations to add electric_plant_sold instead of subtract (think 10 - -5 = 15 instead of 10 - 5 = 5). Thus, the calculated_field will not match the value in ending_balance.

The Solution

Create a spot fix by adding to the transform_main function for this table class:

  • Identify plant_in_service calculated_amount records that don't match their ending_balance values.
  • Use the report_year and utility_id_ferc1 from those records to identify electric_plant_sold records that are the subcomponents of the faulty calculations.
  • Replace the starting_balance and ending_balance values for these records with their absolute value.

I did not flip all negatives to positives because sometimes the negative value in electric_plant_sold leads to the correct calculation!

This flips about 78 records so that the calculation matches the ending balance!

@e-belfer
Copy link
Member

I did not flip all negatives to positives because sometimes the negative value in electric_plant_sold leads to the correct calculation!

@aesharpe How often is this the case? Doing an across the board flip seems more justifiable to me and neater.

Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change and a bigger question - is there a reason why you aren't using the spot fixer to make the changes for the utility_plant_summary_ferc1 table? It's already unit tested and is set up to take PKs and values to change, so it seems like a good fit for this issue unless there is something I'm not seeing?

src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
@aesharpe
Copy link
Member Author

I did not flip all negatives to positives because sometimes the negative value in electric_plant_sold leads to the correct calculation!

@aesharpe How often is this the case? Doing an across the board flip seems more justifiable to me and neater.

I just checked, and idk what I saw before, but you're right. Which is a bummer because that code was kind of labor intensive. But yay to things being simpler than I thought.

dquote>
dquote> Simplify the plant_in_service function so that it just flips all electric_plant_sold values to positive values.
dquote>
dquote> Update the accomodations for the fast test so that they only happen in one place (years are trunkated from the final list of spot_fix_pks instead of each of the individual lists before the are concatinated).
@e-belfer e-belfer self-requested a review June 15, 2023 17:19
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no! I have one tiny suggestion on the wording of the logger statement, and a bigger question about the utility_plant_summary spot fix.

With this spot fix:

  • plant_in_service_ferc1 goes from 156 to 96 records with incorrect calculations. Nice!
  • utility_plant_summary_ferc1 goes from 179 to 258 records with incorrect calculations. Uh oh! This spot fix should probably not get applied if it is in aggregate making things worse.

Update: the change in numbers found above has to do with the change in the _detail column renaming. This more accurately reflects what field should be summed to, but results in a higher number of inaccurate calculations.

# Par down spot fixes to account for fast tests where not all years are used
df_years = df.report_year.unique().tolist()
spot_fix_pks = [x for x in spot_fix_pks if x[0] in df_years]
logger.info(f"{self.table_id.value}: Spotfixing {len(spot_fix_pks)} records.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just make this "Spotfixing {} records with incorrectly signed values" or something like this so it's a bit more descriptive. Otherwise works like a charm.

src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
@e-belfer
Copy link
Member

@aesharpe One last thought here other than the alembic rebase - can you add a note to the release notes explaining the detail column rename with reference to the PR?

@e-belfer e-belfer self-requested a review June 21, 2023 15:14
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

@aesharpe aesharpe merged commit 8f1e23e into dev Jun 21, 2023
5 of 6 checks passed
@aesharpe aesharpe deleted the spot-fix-ferc-exploder branch June 21, 2023 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Integrate minor cleaning fixes from spot-checks of XBRL explosion
3 participants