Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breakdown total utility_type and partial in_rate_base in rate base table #3532

Merged
merged 24 commits into from
May 14, 2024

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Apr 3, 2024

Overview

Closes #3530.

Also closes #3534:

  • added more in_rate_base and rate_base_category tags
  • began propagating the rate_base_category tag

Extra:

  • there were lots of warnings and error messages in the logs that are now no longer popping up bc of simple reordering of things or silencing things when its not really a problem any more.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Edit tasklist title
Beta Give feedback Tasklist To-do list, more options

Delete tasklist

Delete tasklist block?
Are you sure? All relationships in this tasklist will be removed.
  1. Review the PR yourself and call out any questions or issues you have
    Options
  2. Ensure docs build, unit & integration tests, and test coverage pass locally with make pytest-coverage (otherwise the merge queue may reject your PR)
    Options
  3. Update the release notes: reference the PR and related issues.
    Options

@cmgosnell cmgosnell added ferc1 Anything having to do with FERC Form 1 output Exporting data from PUDL into other platforms or interchange formats. rmi labels Apr 3, 2024
@cmgosnell cmgosnell self-assigned this Apr 3, 2024
@cmgosnell cmgosnell marked this pull request as ready for review April 26, 2024 19:17
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
@cmgosnell cmgosnell requested a review from aesharpe April 29, 2024 19:03
Copy link
Member

@aesharpe aesharpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For right now it's mostly comments on the concept / explanation vs. the code itself. I can go through and walk through each of the code steps if that's helpful.

High level thoughts:

  • use _df suffix
  • use break down instead of breakdown unless I'm understanding it wrong

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
@cmgosnell cmgosnell requested a review from aesharpe May 7, 2024 21:12
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
Copy link
Member

@aesharpe aesharpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read through the relevant parts ofpudl/output/ferc1.py to wrap my head around the rate base table, and I have some additional comments that don't pertain directly to code edited in this PR. Here are those thoughts:

1. Tags

Is there a place where the term “tag” is more explicitly defined? It doesn't fit into the “tree” metaphor so it’s harder to discern what it is. In theory, tag = a label for something, but that seems pretty vague. It would help to clarify the meaning of "tag" somewhere in the docs. Maybe in _out_ferc1__detailed_tags or _get_tags

It seems like sometimes you use the term “label” and sometimes you use the term “tag” and I can’t tell if they are supposed to be interchangeable or not.

2. The Tree Metaphor

(Starting line 2051)

Several individual values from the higher level
tables will appear as root nodes at the top of each hierarchy, and the leaves in
the underlying tree structure are the individually reported non-calculated values
that make them up.

The way this passage uses the tree metaphor is confusing. It refers to roots “at the top” and leaves as “underlying structure” — is that an accidental flip-flop? Roots are at the bottom and leaves are at the top. Are “high-level” tables leafy tables? Or are “high-level” tables root tables? The language here is playing tricks on my brain. It would be ideal to use terms that relate to or evoke the part of the tree that each table is referring to. The word “granular” feels like it should be low to the ground like “roots” but I think it's used in reference to the leaves (because it’s the least aggregated data?). But maybe I’m getting that wrong. The whole pipeline seems like a backwards tree…lots of roots that lead to one big chunky tree trunk? Or maybe it’s a few core roots leading to a zillion leaves? The roots-to-leaves metaphor is hard because there are often many of both of them. Is that intended? More clarification would be helpful.

I’m also having trouble grasping the “seed” and “forest” references. What is the difference between a seed and a root? It feels a little too abstracted.

3. Unit Tests

Do you want/plan to write any unit tests for this table?

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Show resolved Hide resolved
@cmgosnell cmgosnell enabled auto-merge May 14, 2024 21:20
Comment on lines 2059 to 2060
a hierarchical tree structure. Several individual values from the less granular
tables will appear as root nodes at the top of each hierarchy, and the leaves in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe say "at the base of each hierarchy" instead of "top" so it's more "rooty"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The word "several" and "appear" here are confusing. Which roots we use is intentional, right?

Several individual values from the less granular
tables will appear as root nodes at the top of each hierarchy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea i went back and forth on this one. i think "top of hierarchy" is more analogous to "root of tree" than "base of hierarchy". bc base of a hierarchy implies that its a wide (i.e. granular) base. and the top of a hierarchy is usually one of more single points (see) from which the rest of the hierarchy flows down (towards leaves)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you keep reading below here you'll see that a list of seed nodes can be given to this forest builder, which is the way we intentionally say what root(s) to start building from

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's why I was confused about the word "appear" in the section I highlighted. The language portrays something that happens coincidentally when in reality it is planned. But this is too nuanced spend time on rly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they do appear as roots until we "prune" all of the nodes that are not connected to the seed node (this stuff is a little weedsy and is documented in the many class methods

Comment on lines 2058 to 2064
calculations encoded in the XBRL Metadata, and that these relationships should have
a hierarchical tree structure. Several individual values from the higher level
a hierarchical tree structure. Several individual values from the less granular
tables will appear as root nodes at the top of each hierarchy, and the leaves in
the underlying tree structure are the individually reported non-calculated values
that make them up. Because the top-level tables have several distinct values in
them, composed of disjunct sets of reported values, we have a forest (a group of
several trees) rather than a single tree.
that make them up (i.e. the most granular values). Because the less granular tables
have several distinct values in them, composed of disjunct sets of reported values,
we have a forest (a group of several trees) rather than a single tree.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This phrasing is still a little confusing. Are you saying that values from composite "less granular" tables can be both roots and leaves?

The word "underlying" still sounds too "rooty" for a "leaf" could take it out entirely and say something like: "Composite values (root nodes) from less granular tables form the base of the hierarchy from which the most granular calculation components (leaves) arise. Because the less granular tables have several distinct values in them and are composed of a disjunct sets of reported values, our result is a forest (a group of several trees) rather than a single tree."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed that underlying is not a great word here. I don't think your suggestion here is totally correct. A ton of the nodes on the tree are composite values - everything except the leaves are composite notes. and as i mentioned in the other thread i don't think "base of hierarchy" is accurate.

also, a deep re-write of this doc is something we just said on the phone was oos for this pr. my edits rn on this are simply to remove "underlying"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could say "form the base of the tree" instead of the hierarchy. I see why it's weird to say "base of the hierarchy"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this necessarily negates the fact that non-leaf values can be calculated values. But you're right, there's no need to get stuck here. You can keep it as is if you're more comfortable with that!!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we did discuss the importance of having a sentence that conveys the relationship between seed/root/leaves/tree/forest relationship (in addition to linking to the networkx docs). Def don't want to re-write all the tree language, just clarify it once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay im going to remove the hierarchy bit altogether. i don't think it adds anything and any other way besides "root" and "less granular" is complicating.

src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
src/pudl/output/ferc1.py Outdated Show resolved Hide resolved
@cmgosnell cmgosnell requested a review from aesharpe May 14, 2024 22:12
@cmgosnell cmgosnell added this pull request to the merge queue May 14, 2024
Merged via the queue into main with commit b3d3ec3 May 14, 2024
12 checks passed
@cmgosnell cmgosnell deleted the ferc1-rate-base-total-and-partial-breakdown branch May 14, 2024 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ferc1 Anything having to do with FERC Form 1 output Exporting data from PUDL into other platforms or interchange formats. rmi
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Second Review of Rate Base break down of utility_type=="total" and null/partial in_rate_base records
2 participants