Integrate 2023 FERC1 Update #3701

cmgosnell · 2024-06-27T14:43:58Z

Overview

Working on #3700.
Most of the transform changes are documented in scoping issue #3698.

2024/07/12 Update: Right now there is one integration test failure coming from ferc_xbrl_updates, which @zschira is tackling. Locally I've run the full and fast ETL, and with a full DB I've run the full validation tests. I haven't been able to run build-deploy-pudl because the integration test fails, so I'm xfail-ing the one test locally to test everything in one go.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

If updating analyses or data processing functions: make sure to update or write data validation tests (e.g., test_minmax_rows())
Update the release notes: reference the PR and related issues.
Ensure docs build, unit & integration tests, and test coverage pass locally with make pytest-coverage (otherwise the merge queue may reject your PR)
Review the PR yourself and call out any questions or issues you have
For minor ETL changes or data additions, once make pytest-coverage passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and run relevant data validation tests using pytest and --live-dbs.
For significant ETL, data coverage or analysis changes, once make pytest-coverage passes, ensure the full ETL runs locally and run data validation tests using make pytest-validate (a ~10 hour run). If you can't run this locally, run the build-deploy-pudl GitHub Action (or ask someone with permissions to). Then, check the logs on the #pudl-deployments Slack channel or gs://builds.catalyst.coop.

this was done with a manual hand-off of the xbrl sqlite db

cmgosnell · 2024-06-27T14:54:51Z

src/pudl/io_managers.py

-            threshold_pct = 0.3
+            threshold_pct = 0.32


This is a hack! We need to investigate this to see if it's really a problem.

After some investigation, I do not think this is a problem. For the table that caused this ValueError in the first place (electric_plant_in_service_204_duration), there are 33 excess non-null records out of a possible 10472, which means that if there were 3 fewer non-null records this test would pass with no problem.
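To make the arithmetic concrete, here is a minimal sketch. It assumes the check simply compares excess non-null records as a percentage of the possible records against threshold_pct; the helper name is hypothetical and is not the function in io_managers.py.

```python
def excess_non_null_pct(excess: int, possible: int) -> float:
    """Excess non-null records as a percentage of the possible records.

    Hypothetical helper for illustration only; the real check may use a
    different denominator.
    """
    return 100 * excess / possible


# Numbers quoted in this comment for electric_plant_in_service_204_duration.
pct = excess_non_null_pct(33, 10472)
print(round(pct, 3))  # 0.315

# Over the old 0.3 threshold, under the new 0.32 threshold.
print(0.3 < pct < 0.32)  # True

# With 3 fewer excess records the old threshold would have been satisfied.
print(excess_non_null_pct(30, 10472) < 0.3)  # True
```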

After reviewing some of the 2023 excess non-nulls with the apply_diffs methodology, it's pretty ambiguous which records were supposed to update a previously non-null value to null versus cases where the filer just didn't report anything while updating other values. Honestly, the fluctuation in the non-null values from submission to submission was slightly alarming (not all of the values changed by any means, but some were just super different from submission to submission).

I'm bumping this up again for the fast tests. This is the same number of excess non-nulls, but with one fewer year involved.

ValueError: Found more than expected excess non-null values using the currently implemented apply_diffs methodology (#7285) as compared to the best_snapshot methodology (#7254). We expected the apply_diffs methodology to result in no more than 100.32% non-null records but found 100.43%.
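Why the same count trips the threshold with fewer years: the denominator shrinks while the excess stays fixed. A rough sketch, assuming the percentage is excess divided by possible records and that all 33 excess records survive in the fast ETL; the fast-test record count isn't stated in this thread, so this only computes the break-even point.

```python
import math

excess = 33           # excess non-null records, per the comment above
threshold_pct = 0.32  # the value set in the diff

# Smallest number of possible records for which `excess` still fits under
# the threshold, i.e. 100 * excess / possible <= threshold_pct.
break_even = math.ceil(100 * excess / threshold_pct)
print(break_even)  # 10313

# The full ETL's 10472 possible records clear the bar; anything below
# the break-even point does not.
print(10472 >= break_even)  # True
print(100 * excess / (break_even - 1) > threshold_pct)  # True
```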

cmgosnell · 2024-06-27T14:55:37Z

src/pudl/package_data/ferc1/xbrl_calculation_component_fixes.csv

@@ -80,7 +80,6 @@ core_ferc1__yearly_operating_revenues_sched300,other_operating_revenues,core_fer
 core_ferc1__yearly_operating_revenues_sched300,other_operating_revenues,core_ferc1__yearly_operating_revenues_sched300,sales_of_water_and_water_power,1.0,,,
 core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_operating_revenues_sched300,large_or_industrial,1.0,,,
 core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_operating_revenues_sched300,small_or_commercial,1.0,,,
-core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_sales_by_rate_schedules_sched304,commercial_and_industrial,,,,


We were removing this calculation component from this parent factoid because it didn't belong here. The 2023 metadata was updated and they removed this calculation component. Wahoo, they fixed a thing.

Right now we are entirely using the 2023 XBRL metadata. This may need to be updated in a world in which we are using different metadata for different years of XBRL (re: #3713). Presumably we'd need to add report_year to this file to have year-specific calculation component fixes. We could do this the way we currently handle the dimension columns: keep the dimensions null in the manually compiled CSVs if the calc fix is not dimension-specific, only add dimensions when the fix is dimension-specific, and then fill in any null dimensions with the values found in the data. This allows us to minimally specify calc fixes but gives us the flexibility to add dimension-specific fixes. BUT this is out of scope for this PR and is all for #3713.
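A minimal sketch of what that could look like for report_year, assuming a null report_year means "applies to every year found in the data". The records, field names, and helper below are made up for illustration and are not the actual CSV schema or PUDL code.

```python
# Hypothetical, simplified fix records. report_year=None means the fix is
# not year-specific.
fixes = [
    {"parent": "parent_a", "child": "child_x", "report_year": None},
    {"parent": "parent_a", "child": "child_y", "report_year": 2023},
]

# Years observed in the data (assumed for this example).
years_in_data = [2021, 2022, 2023]


def expand_null_years(fixes: list[dict], years: list[int]) -> list[dict]:
    """Fill null report_year values by repeating the fix for every year."""
    expanded = []
    for fix in fixes:
        if fix["report_year"] is None:
            expanded.extend({**fix, "report_year": year} for year in years)
        else:
            expanded.append(dict(fix))
    return expanded


expanded = expand_null_years(fixes, years_in_data)
for fix in expanded:
    print(fix["parent"], fix["child"], fix["report_year"])
```

The year-agnostic fix gets one row per year, while the year-specific fix passes through unchanged, which mirrors how the null dimensions are described as being handled.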

cmgosnell · 2024-07-12T13:57:41Z

src/pudl/io_managers.py

see this thread for what's going on in here

zaneselvans

It sounds like the missing new-table metadata issue appears in the branch this is merging into so this looks good to go. I asked a couple of non-blocking questions.

zaneselvans · 2024-07-12T15:32:33Z

src/pudl/package_data/settings/etl_fast.yml

@@ -35,7 +35,7 @@ description: >
 version: 0.1.0
 datasets:
  ferc1:
-    years: [2020, 2021, 2022]
+    years: [2020, 2021, 2023]


Do we expect having a gap in the test years here to be totally fine?

src/pudl/transform/ferc1.py

* Update xbrl extraction to use new version
* Add 2023 to xbrl extraction years and update how settings are loaded
* Add 2023 to default Ferc714XbrlToSqlite settings class
* Add 2023 to FERC 6 & 60 data source working partitions.
* Integrate 2023 FERC1 Update (#3701)
* very draft first pass of transforming the 2023 ferc1 data this was done with a manual hand-off of the xbrl sqlite db
* update the filter_for_freshest_data thredhold
* Update XBRL settings & metadata to extract 2023 XBRL data
* Disable experiment tracking by default
* map new ferc1 plants
* update min/max rows with new year of data
* add release notes
* bump the apply_diffs threshhold for the fast tests
* light docs updates for project_num fix
---------
Co-authored-by: Zane Selvans <zane.selvans@catalyst.coop>
Co-authored-by: zschira <zach.schira@catalyst.coop>
* Update water limited capacity ratio validation
* Require ferc_xbrl_extractor>=1.5.1 since 1.5.0 has a bug.
* Update release notes for new FERC stuff.
---------
Co-authored-by: Christina Gosnell <cgosnell@catalyst.coop>
Co-authored-by: Zane Selvans <zane.selvans@catalyst.coop>

cmgosnell added 2 commits June 27, 2024 10:40

very draft first pass of transforming the 2023 ferc1 data

c0870b5

this was done with a manual hand-off of the xbrl sqlite db

Merge branch 'main' into 2023-ferc1

4ee41ab

cmgosnell commented Jun 27, 2024

cmgosnell changed the base branch from main to ferc_xbrl_updates July 8, 2024 15:56

cmgosnell changed the base branch from ferc_xbrl_updates to main July 8, 2024 15:56

Merge branch 'main' into 2023-ferc1

6a6f4f1

cmgosnell changed the base branch from main to ferc_xbrl_updates July 8, 2024 15:57

Merge branch 'ferc_xbrl_updates' into 2023-ferc1

5166293

cmgosnell linked an issue Jul 9, 2024 that may be closed by this pull request

Integrate 2023 FERC1 Data #3700

Closed

cmgosnell self-assigned this Jul 9, 2024

cmgosnell added labels data-update (When fresh data is integrated into PUDL from quarterly or annual updates) and ferc1 (Anything having to do with FERC Form 1) Jul 9, 2024

cmgosnell and others added 8 commits July 9, 2024 15:50

update the filter_for_freshest_data thredhold

0741330

Update XBRL settings & metadata to extract 2023 XBRL data

538aeba

Disable experiment tracking by default

ebc8d1b

map new ferc1 plants

61ea18f

Merge branch 'ferc_xbrl_updates' into 2023-ferc1

3959f63

update min/max rows with new year of data

2adad6e

add release notes

d7d76d3

bump the apply_diffs threshhold for the fast tests

5def47b

cmgosnell changed the title ~~very draft first pass of transforming the 2023 ferc1 data~~ Integrate 2023 FERC1 Update Jul 12, 2024

light docs updates for project_num fix

4435c2e

cmgosnell marked this pull request as ready for review July 12, 2024 14:30

zaneselvans added 3 commits July 12, 2024 08:52

Merge branch 'ferc_xbrl_updates' into 2023-ferc1

b66c7b9

Merge branch 'ferc_xbrl_updates' into 2023-ferc1

907025d

Merge branch 'ferc_xbrl_updates' into 2023-ferc1

f28f549

zaneselvans self-requested a review July 12, 2024 15:44

zaneselvans approved these changes Jul 12, 2024

cmgosnell requested a review from zaneselvans July 12, 2024 17:43

zaneselvans approved these changes Jul 12, 2024

zaneselvans merged commit 1c83595 into ferc_xbrl_updates Jul 13, 2024
12 checks passed

zaneselvans deleted the 2023-ferc1 branch July 13, 2024 01:54
