-
-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Breakdown total utility_type
and partial in_rate_base
in rate base table
#3532
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For right now it's mostly comments on the concept / explanation vs. the code itself. I can go through and walk through each of the code steps if that's helpful.
High level thoughts:
- use
_df
suffix - use
break down
instead ofbreakdown
unless I'm understanding it wrong
For more information, see https://pre-commit.ci
src/pudl/package_data/ferc1/xbrl_factoid_rate_base_category_tags.csv
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read through the relevant parts ofpudl/output/ferc1.py
to wrap my head around the rate base table, and I have some additional comments that don't pertain directly to code edited in this PR. Here are those thoughts:
1. Tags
Is there a place where the term “tag” is more explicitly defined? It doesn't fit into the “tree” metaphor so it’s harder to discern what it is. In theory, tag = a label for something, but that seems pretty vague. It would help to clarify the meaning of "tag" somewhere in the docs. Maybe in _out_ferc1__detailed_tags
or _get_tags
It seems like sometimes you use the term “label” and sometimes you use the term “tag” and I can’t tell if they are supposed to be interchangeable or not.
2. The Tree Metaphor
(Starting line 2051)
Several individual values from the higher level
tables will appear as root nodes at the top of each hierarchy, and the leaves in
the underlying tree structure are the individually reported non-calculated values
that make them up.
The way this passage uses the tree metaphor is confusing. It refers to roots “at the top” and leaves as “underlying structure” — is that an accidental flip-flop? Roots are at the bottom and leaves are at the top. Are “high-level” tables leafy tables? Or are “high-level” tables root tables? The language here is playing tricks on my brain. It would be ideal to use terms that relate to or evoke the part of the tree that each table is referring to. The word “granular” feels like it should be low to the ground like “roots” but I think it's used in reference to the leaves (because it’s the least aggregated data?). But maybe I’m getting that wrong. The whole pipeline seems like a backwards tree…lots of roots that lead to one big chunky tree trunk? Or maybe it’s a few core roots leading to a zillion leaves? The roots-to-leaves metaphor is hard because there are often many of both of them. Is that intended? More clarification would be helpful.
I’m also having trouble grasping the “seed” and “forest” references. What is the difference between a seed and a root? It feels a little too abstracted.
3. Unit Tests
Do you want/plan to write any unit tests for this table?
src/pudl/output/ferc1.py
Outdated
a hierarchical tree structure. Several individual values from the less granular | ||
tables will appear as root nodes at the top of each hierarchy, and the leaves in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe say "at the base of each hierarchy" instead of "top" so it's more "rooty"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "several" and "appear" here are confusing. Which roots we use is intentional, right?
Several individual values from the less granular
tables will appear as root nodes at the top of each hierarchy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea i went back and forth on this one. i think "top of hierarchy" is more analogous to "root of tree" than "base of hierarchy". bc base of a hierarchy implies that its a wide (i.e. granular) base. and the top of a hierarchy is usually one of more single points (see) from which the rest of the hierarchy flows down (towards leaves)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you keep reading below here you'll see that a list of seed nodes can be given to this forest builder, which is the way we intentionally say what root(s) to start building from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's why I was confused about the word "appear" in the section I highlighted. The language portrays something that happens coincidentally when in reality it is planned. But this is too nuanced spend time on rly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they do appear as roots until we "prune" all of the nodes that are not connected to the seed node (this stuff is a little weedsy and is documented in the many class methods
src/pudl/output/ferc1.py
Outdated
calculations encoded in the XBRL Metadata, and that these relationships should have | ||
a hierarchical tree structure. Several individual values from the higher level | ||
a hierarchical tree structure. Several individual values from the less granular | ||
tables will appear as root nodes at the top of each hierarchy, and the leaves in | ||
the underlying tree structure are the individually reported non-calculated values | ||
that make them up. Because the top-level tables have several distinct values in | ||
them, composed of disjunct sets of reported values, we have a forest (a group of | ||
several trees) rather than a single tree. | ||
that make them up (i.e. the most granular values). Because the less granular tables | ||
have several distinct values in them, composed of disjunct sets of reported values, | ||
we have a forest (a group of several trees) rather than a single tree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This phrasing is still a little confusing. Are you saying that values from composite "less granular" tables can be both roots and leaves?
The word "underlying" still sounds too "rooty" for a "leaf" could take it out entirely and say something like: "Composite values (root nodes) from less granular tables form the base of the hierarchy from which the most granular calculation components (leaves) arise. Because the less granular tables have several distinct values in them and are composed of a disjunct sets of reported values, our result is a forest (a group of several trees) rather than a single tree."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed that underlying is not a great word here. I don't think your suggestion here is totally correct. A ton of the nodes on the tree are composite values - everything except the leaves are composite notes. and as i mentioned in the other thread i don't think "base of hierarchy" is accurate.
also, a deep re-write of this doc is something we just said on the phone was oos for this pr. my edits rn on this are simply to remove "underlying"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could say "form the base of the tree" instead of the hierarchy. I see why it's weird to say "base of the hierarchy"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this necessarily negates the fact that non-leaf values can be calculated values. But you're right, there's no need to get stuck here. You can keep it as is if you're more comfortable with that!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we did discuss the importance of having a sentence that conveys the relationship between seed/root/leaves/tree/forest relationship (in addition to linking to the networkx docs). Def don't want to re-write all the tree language, just clarify it once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay im going to remove the hierarchy bit altogether. i don't think it adds anything and any other way besides "root" and "less granular" is complicating.
Overview
Closes #3530.
Also closes #3534:
in_rate_base
andrate_base_category
tagsrate_base_category
tagExtra:
Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list
make pytest-coverage
(otherwise the merge queue may reject your PR)