Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate 2023 FERC1 Update #3701

Merged
merged 16 commits into from
Jul 13, 2024
Merged

Integrate 2023 FERC1 Update #3701

merged 16 commits into from
Jul 13, 2024

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Jun 27, 2024

Overview

Working on #3700.
Most of the transform changes are documented over in this scoping issue #3698

2024/07/12 Update: Right now there is one integration test failure coming from ferc_xbrl_updates, which @zschira is tackling. Locally I've run the full and fast ETL. With a full db I've run the full validation tests. (I haven't been able to run the build-deploy-pudl because the integration test fails.. I'm xfail-ing the one test locally to test everything in one go).

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

threshold_pct = 0.3
threshold_pct = 0.32
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a hack! we need to investigate this to see if its really a problem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some investigation, I do not think this is a problem. For the table that caused this value error in the first place (electric_plant_in_service_204_duration), there are 33 excess non-null records out of a possible 10472. Which means that if there were 3 fewer non-null records this test would pass fine np.

After reviewing some of the 2023 excess non-nulls with the apply_diffs methodology, its pretty ambiguous what was supposed to be a record that was updating a previously non-null record into a null vs just that they didn't report anything when they were updating other values. Honestly the fluxxuation in the non-null values from submission to submission was slightly alarming (not all of the values changed by any means! but some were just super different from submission to submission...).

Copy link
Member Author

@cmgosnell cmgosnell Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm bumping this up again for the fast tests. This is the same # of excess non-nulls, but with one less year involved.

ValueError: Found more than expected excess non-null values using the currently  implemented 
apply_diffs methodology (#7285) as compared to the best_snapshot methodology (#7254). We expected 
the apply_diffs methodology to result in no more than 100.32% non-null records but found 100.43%.

@@ -80,7 +80,6 @@ core_ferc1__yearly_operating_revenues_sched300,other_operating_revenues,core_fer
core_ferc1__yearly_operating_revenues_sched300,other_operating_revenues,core_ferc1__yearly_operating_revenues_sched300,sales_of_water_and_water_power,1.0,,,
core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_operating_revenues_sched300,large_or_industrial,1.0,,,
core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_operating_revenues_sched300,small_or_commercial,1.0,,,
core_ferc1__yearly_operating_revenues_sched300,sales_to_ultimate_consumers,core_ferc1__yearly_sales_by_rate_schedules_sched304,commercial_and_industrial,,,,
Copy link
Member Author

@cmgosnell cmgosnell Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were removing this calculation component from this parent factoid because it didn't belong here. The 2023 metadata was updated and they removed this calculation component. wahoo they fixed a thing.

RN we are entirely using the 2023 xbrl metadata. This may need to updated in a world in which we are using different metadata for different years of xbrl re: #3713. Presumably we'd need to add report_year in this file to have year-specific calculation component fixes. We could do this like we are currently doing the dimension columns (keep the dimensions null in the manually compiled csv's if the calc fix is not dimension specific & only add dimensions when the fix is dimension specific -> fill in any null dimensions found in the data). This allows us to minimally specify calc fixes but gives us the flexibility to add dimension specific fixes. BUT this is OOS for this PR and is all for #3713

@cmgosnell cmgosnell changed the base branch from main to ferc_xbrl_updates July 8, 2024 15:56
@cmgosnell cmgosnell changed the base branch from ferc_xbrl_updates to main July 8, 2024 15:56
@cmgosnell cmgosnell changed the base branch from main to ferc_xbrl_updates July 8, 2024 15:57
@cmgosnell cmgosnell linked an issue Jul 9, 2024 that may be closed by this pull request
@cmgosnell cmgosnell self-assigned this Jul 9, 2024
@cmgosnell cmgosnell added data-update When fresh data is integrated into PUDL from quarterly or annual updates ferc1 Anything having to do with FERC Form 1 labels Jul 9, 2024
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see this thread for what's going on in here

@cmgosnell cmgosnell changed the title very draft first pass of transforming the 2023 ferc1 data Integrate 2023 FERC1 Update Jul 12, 2024
@cmgosnell cmgosnell marked this pull request as ready for review July 12, 2024 14:30
@zaneselvans zaneselvans self-requested a review July 12, 2024 15:44
Copy link
Member

@zaneselvans zaneselvans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like the missing new-table metadata issue appears in the branch this is merging into so this looks good to go. I asked a couple of non-blocking questions.

@@ -35,7 +35,7 @@ description: >
version: 0.1.0
datasets:
ferc1:
years: [2020, 2021, 2022]
years: [2020, 2021, 2023]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect having a gap in the test years here to be totally fine?

src/pudl/transform/ferc1.py Show resolved Hide resolved
@cmgosnell cmgosnell requested a review from zaneselvans July 12, 2024 17:43
@zaneselvans zaneselvans merged commit 1c83595 into ferc_xbrl_updates Jul 13, 2024
12 checks passed
@zaneselvans zaneselvans deleted the 2023-ferc1 branch July 13, 2024 01:54
github-merge-queue bot pushed a commit that referenced this pull request Jul 18, 2024
* Update xbrl extraction to use new version

* Add 2023 to xbrl extraction years and update how settings are loaded

* Add 2023 to default Ferc714XbrlToSqlite settings class

* Add 2023 to FERC 6 & 60 data source working partitions.

* Integrate 2023 FERC1 Update (#3701)

* very draft first pass of transforming the 2023 ferc1 data

this was done with a manual hand-off of the xbrl sqlite db

* update the filter_for_freshest_data thredhold

* Update XBRL settings & metadata to extract 2023 XBRL data

* Disable experiment tracking by default

* map new ferc1 plants

* update min/max rows with new year of data

* add release notes

* bump the apply_diffs threshhold for the fast tests

* light docs updates for project_num fix

---------

Co-authored-by: Zane Selvans <zane.selvans@catalyst.coop>
Co-authored-by: zschira <zach.schira@catalyst.coop>

* Update water limited capacity ratio validation

* Require ferc_xbrl_extractor>=1.5.1 since 1.5.0 has a bug.

* Update release notes for new FERC stuff.

---------

Co-authored-by: Christina Gosnell <cgosnell@catalyst.coop>
Co-authored-by: Zane Selvans <zane.selvans@catalyst.coop>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data-update When fresh data is integrated into PUDL from quarterly or annual updates ferc1 Anything having to do with FERC Form 1
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Integrate 2023 FERC1 Data
3 participants