Transform f1_elctrc_oper_rev #2192
Codecov Report
Base: 85.6% // Head: 85.6% // Increases project coverage by
Additional details and impacted files
@@ Coverage Diff @@
##            dev   #2192   +/-   ##
=====================================
  Coverage   85.6%   85.6%
=====================================
  Files         73      73
  Lines       8958    8969    +11
=====================================
+ Hits        7674    7686    +12
+ Misses      1284    1283     -1
It seems like the weirdest part of this was the duration rename. It's very encouraging that the main work here looks like it was adding params and the DBF/XBRL row maps!
So this table has a top chunk of electricity sold with revenue, MWh sold, and # of customers, and then a second chunk with just revenues? I think it would be nice to put that content into the resource description, so users (and we!) know why a chunk of the MWh and # of customers fields are null.
Or are the ones being dropped in the wide_to_tidy step all of the columns in the "Other Operating Revenue" section? In these tables that have structured and unstructured portions coming from DBF, we've typically pulled all of the columns that are in the XBRL structured table, even if they seem like they should be the total value of the unstructured bit of the table.
src/pudl/transform/params/ferc1.py (outdated)
        "mwh_sold",
        "avg_customers_per_month",
    ],
    "expected_drop_cols": 14,
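To make the drop-count question concrete, here is a minimal sketch of what a wide-to-tidy reshape with an `expected_drop_cols` guard could look like. It assumes, hypothetically, that wide column names follow a `{category}_{value_type}` pattern with value types such as `revenues`, `mwh_sold`, and `avg_customers_per_month`. The function name, the `revenue_type` output column, and the matching rule are illustrative only and are not PUDL's actual implementation.

```python
from typing import Any

Record = dict[str, Any]


def wide_to_tidy(
    rows: list[Record],
    idx_cols: list[str],
    value_types: list[str],
    expected_drop_cols: int,
) -> list[Record]:
    """Reshape wide ``{category}_{value_type}`` columns to one row per category.

    Hypothetical sketch: any column that is neither an index column nor ends
    in a known value type is dropped, and the drop count is checked.
    """
    all_cols = {col for row in rows for col in row}
    value_cols = all_cols - set(idx_cols)
    # Map each wide column to (category, value_type) when its suffix matches.
    parsed: dict[str, tuple[str, str]] = {}
    for col in value_cols:
        for vt in value_types:
            suffix = f"_{vt}"
            if col.endswith(suffix) and len(col) > len(suffix):
                parsed[col] = (col[: -len(suffix)], vt)
                break
    dropped = sorted(value_cols - set(parsed))
    if len(dropped) != expected_drop_cols:
        raise ValueError(
            f"expected to drop {expected_drop_cols} columns, "
            f"dropped {len(dropped)}: {dropped}"
        )
    tidy: list[Record] = []
    for row in rows:
        by_category: dict[str, Record] = {}
        for col, (category, vt) in parsed.items():
            if col not in row:
                continue
            # Value types a category lacks (e.g. revenue-only rows) stay None,
            # which shows up as null MWh / customer fields in the tidy output.
            out = by_category.setdefault(
                category,
                {
                    **{c: row[c] for c in idx_cols},
                    "revenue_type": category,
                    **dict.fromkeys(value_types),
                },
            )
            out[vt] = row[col]
        tidy.extend(by_category[cat] for cat in sorted(by_category))
    return tidy
```

With this design, a revenue-only category (say a hypothetical `forfeited_discounts_revenues` column) still produces a tidy row, just with null `mwh_sold` and `avg_customers_per_month`, and any unexpected change in the number of unmatched columns fails loudly instead of silently losing data.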
Are these the unstructured guys being dropped??
These were actually columns that didn't have MWh sold and average customers, which I was leaving out while deciding how to handle them and then forgot to add back in. I've added these columns back, and had to add a small bit of code to drop some records with duplicate primary keys. The records in question were all from one utility in 2011, and every one of them had a revenue of 3.33e8 or 3.333e9 with all other fields null. These seemed pretty meaningless, so I dropped them all.
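One way the duplicate-primary-key cleanup described above might be written is sketched below. The exact rule in the PR is not shown here, so this assumes, hypothetically, that a record is dropped when its primary key appears more than once and every value column other than revenue is null; the function name and the sample records are made up for illustration.

```python
from collections import Counter
from typing import Any

Record = dict[str, Any]


def drop_revenue_only_pk_dupes(
    records: list[Record],
    pk_cols: list[str],
    value_cols: list[str],
    revenue_col: str,
) -> list[Record]:
    """Drop duplicated-primary-key records that carry nothing but a revenue.

    Hypothetical rule: a record is dropped if its primary key appears more
    than once and every value column other than ``revenue_col`` is null.
    """

    def key(rec: Record) -> tuple:
        return tuple(rec.get(c) for c in pk_cols)

    counts = Counter(key(r) for r in records)
    kept = [
        r
        for r in records
        if counts[key(r)] == 1
        or any(r.get(c) is not None for c in value_cols if c != revenue_col)
    ]
    # Fail loudly if the rule did not fully resolve the duplicates.
    remaining = Counter(key(r) for r in kept)
    dupes = [k for k, n in remaining.items() if n > 1]
    if dupes:
        raise ValueError(f"primary key duplicates remain: {dupes}")
    return kept
```

Checking for leftover duplicates after the filter keeps the rule narrow: if some future year has duplicates that do carry MWh or customer data, the transform errors out rather than quietly dropping real values.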
Yeah, that's correct. I've added a note in the description about the null fields on many records, and added notes in
This looks great! Glad you slurped the revenue-only bit back in, and I think your description in the resources is sufficient for sure!! Approved (pending you merging dev in and the tests passing, of course!)
PR Overview
This PR covers the transformation of the f1_elctrc_oper_rev table. Most of the work for the structured portion of this table was just reshaping and mapping to electric_operating_revenues_300_duration. There's no instant table in the XBRL data, so no merging is necessary. Rows 22-25 in the DBF table contain unstructured data that the XBRL data silos into the electric_operating_revenues_other_300_duration table. This unstructured data is not handled by this PR.
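The DBF-to-XBRL row mapping described in the overview can be sketched as follows. This is a minimal illustration assuming DBF records carry a `row_number` field; the `ROW_MAP` entries, the `revenue_type` output column, and the function name are hypothetical placeholders rather than PUDL's real row map (real row maps may also differ by report year). Only the rows 22-25 split comes from the PR description.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical row map: DBF row_number -> XBRL-style category label.
# These row numbers and labels are placeholders, not PUDL's actual mapping.
ROW_MAP: dict[int, str] = {
    2: "residential_sales",
    4: "commercial_sales",
    5: "industrial_sales",
}
# Per the PR description, DBF rows 22-25 hold the unstructured revenues.
UNSTRUCTURED_ROWS = range(22, 26)


def map_dbf_rows(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Split DBF records into mapped structured rows and unstructured rows.

    Rows that are neither in ROW_MAP nor in the unstructured block are ignored.
    """
    structured: list[Record] = []
    unstructured: list[Record] = []
    for rec in records:
        row = rec["row_number"]
        if row in UNSTRUCTURED_ROWS:
            # Set aside; these belong with the separate "other" revenues table.
            unstructured.append(rec)
        elif row in ROW_MAP:
            out = {k: v for k, v in rec.items() if k != "row_number"}
            out["revenue_type"] = ROW_MAP[row]
            structured.append(out)
    return structured, unstructured
```

Returning the unstructured rows separately, instead of discarding them, keeps them available for a later transform of the "other" revenues table.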