Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform f1_elctrc_oper_rev #2192

Merged
merged 15 commits into from
Jan 12, 2023
Merged

Transform f1_elctrc_oper_rev #2192

merged 15 commits into from
Jan 12, 2023

Conversation

zschira
Copy link
Member

@zschira zschira commented Jan 10, 2023

PR Overview

This PR covers the transformation of the f1_elctrc_oper_rev table. Most of the work for the structured portion of this table was just reshaping and mapping to electric_operating_revenues_300_duration. There's no instant table in the XBRL data, so no merging is necessary. Row #'s 22-25 in the DBF table contain unstructured data that is siloed in the XBRL data to the electric_operating_revenues_other_300_duration. This unstructured data is not handled by this PR.

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures
  • Make sure you've included good docstrings.
  • Include unit tests for new functions and classes.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@codecov
Copy link

codecov bot commented Jan 10, 2023

Codecov Report

Base: 85.6% // Head: 85.6% // Increases project coverage by +0.0% 🎉

Coverage data is based on head (7b06d52) compared to base (dc39bdd).
Patch coverage: 100.0% of modified lines in pull request are covered.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #2192   +/-   ##
=====================================
  Coverage   85.6%   85.6%           
=====================================
  Files         73      73           
  Lines       8958    8969   +11     
=====================================
+ Hits        7674    7686   +12     
+ Misses      1284    1283    -1     
Impacted Files Coverage Δ
src/pudl/extract/ferc1.py 86.0% <ø> (ø)
src/pudl/metadata/fields.py 100.0% <ø> (ø)
src/pudl/metadata/resources/ferc1.py 100.0% <ø> (ø)
src/pudl/transform/params/ferc1.py 100.0% <ø> (ø)
src/pudl/transform/ferc1.py 95.9% <100.0%> (+0.1%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the most weird part of this was the duration rename. Its very encouraging that the main work here looks like it was adding params and the dbf/xbrl row maps!

So this table has a top chunk of electricity sold w/ revenue plus MWh sold and # of customers and then a second chunk with just revenues? I think it would be nice to put that content into the resource description. So users (and us!) knows why a chunk of the MWhs & # of customer fields are null.

Or are the ones that are being dropped in the wide_to_tidy all of these columns in the "Other Operating Revenue" section? In these tables that have structured and unstructured portions coming from DBF we've typically pulled all of the columns that are in the XBRL structured table even if they seem like they should be the total value of the unstructured bit of the table.

src/pudl/package_data/settings/etl_fast.yml Outdated Show resolved Hide resolved
src/pudl/transform/ferc1.py Outdated Show resolved Hide resolved
"mwh_sold",
"avg_customers_per_month",
],
"expected_drop_cols": 14,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these the unstructured guys being dropped??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were actually columns that didn't have the mwh sold and average customers, that I was leaving out while deciding how to handle them, and forgot to add them back in. I've added these columns back, and had to add a small bit of code to drop some records with duplicate primary keys. The records in question were all from one utility in 2011 and every one of them had a revenue of 3.33e8 or 3.333e9 and all other fields were Null. These seemed pretty meaningless, so I dropped them all.

src/pudl/metadata/fields.py Outdated Show resolved Hide resolved
src/pudl/metadata/resources/ferc1.py Outdated Show resolved Hide resolved
@zschira
Copy link
Member Author

zschira commented Jan 12, 2023

So this table has a top chunk of electricity sold w/ revenue plus MWh sold and # of customers and then a second chunk with just revenues? I think it would be nice to put that content into the resource description. So users (and us!) knows why a chunk of the MWhs & # of customer fields are null

Yeah that's correct. I've added a note in the description about the NULL fields on many records, and added notes in dbf_to_xbrl.csv indicating which revenue_type's this applies to. Does this seem sufficient?

Copy link
Member

@cmgosnell cmgosnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great! glad you slurped the revenue-only bit back in and I think your description in the resources is sufficient for sure!! approved (pending you merge dev in and the tests pass of course!)

@zschira zschira merged commit bf389ac into dev Jan 12, 2023
@zschira zschira deleted the transform_f1_elctrc_oper_rev branch January 12, 2023 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants