-
-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PHMSA gas extract step for transmission part k #3258
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part K: materialized the table successfully and it has all the rows I'd expect.
One blocking issue re what tables we're dropping columns for - should be transmission only.
One additional non-blocking suggestion: you've helpfully set up the column names to get parsed and split into categoricals, and maybe you want to use nonsteel
instead of non_steel
to make this even simpler?
src/pudl/extract/phmsagas.py
Outdated
) | ||
df = df.drop(columns=to_drop, errors="ignore") | ||
return df | ||
if int(partition["year"]) < 2010: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also only true for transmission and not distribution data, so we want to filter by form
as well. We should update the docstring to clarify. Otherwise this is super helpful and catches some columns I've missed in earlier mappings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a great suggestion! will do for sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@e-belfer is it only page != "yearly_distribution"
that we should avoid for this?
i was going to add and "_transmission_" in page
here but yearly_inspections_and_assessments
is a part of the transmission zip but it just doesn't look like it is included in old years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could also look it up in the page->form map. that feels more accurate if not a little more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that table should probably get renamed to include transmission in it, honestly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the way you've done it is more robust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for cleaning things up!
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## phmsa-transmission-f-g #3258 +/- ##
========================================================
- Coverage 92.7% 92.6% -0.0%
========================================================
Files 144 144
Lines 13087 13091 +4
========================================================
Hits 12128 12128
- Misses 959 963 +4 ☔ View full report in Codecov by Sentry. |
Overview
Closes #3248.
What problem does this address?
we didn't have the mapping files for Part K!
What did you change?
Tasks
data_label
bc that was not being used anywhere and was probably resulting in a million warnings about a missing column 😬Testing
How did you make sure this worked? How can a reviewer verify this?
To-do list
make pytest-integration-full
passes locally