
PHMSA gas extract step for transmission part k #3258

Merged
8 commits merged on Jan 19, 2024


@cmgosnell (Member) commented Jan 19, 2024

Overview

Closes #3248.

What problem does this address?
We didn't have the mapping files for Part K!

What did you change?

Tasks

  1. Add all the mapping files for Part K, 2010-present.
  2. I also removed `data_label` because it was not being used anywhere and was probably triggering a million warnings about a missing column 😬
  3. Add the old years (through 2009). There is no old-years Part K, so that was easy.
  4. Add a notebook extractor to the devtools notebook.
  5. Make the extractor warnings quiet and appropriate for the old years that get split up into the new format.
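A minimal sketch of how a per-page column map might be applied, assuming a layout with one row per standardized column and one column per year holding that year's raw header. The helper `rename_raw_columns`, the table `COLUMN_MAP`, and every column name below are hypothetical, not PUDL's actual API. The sketch also shows why a mapped column that never appears in the raw data, like the unused `data_label`, produces a warning on every extraction.

```python
import warnings

import pandas as pd

# Hypothetical column map for one page: one row per standardized column,
# one column per year holding that year's raw spreadsheet header.
COLUMN_MAP = pd.DataFrame(
    {
        "2010": ["OPERATOR_ID", "REPORT_YEAR"],
        "2011": ["OPID", "REPORT_YEAR"],
    },
    index=pd.Index(["operator_id", "report_year"], name="column_name"),
)


def rename_raw_columns(raw: pd.DataFrame, year: int, column_map: pd.DataFrame) -> pd.DataFrame:
    """Rename one year's raw columns to standardized names (hypothetical helper)."""
    # Standardized name -> raw header for this year; NaN means "not reported".
    year_map = column_map[str(year)].dropna()
    # A mapped column missing from the raw data triggers a warning on every call.
    missing = [std for std, raw_name in year_map.items() if raw_name not in raw.columns]
    if missing:
        warnings.warn(f"{year}: mapped columns missing from raw data: {missing}")
    renamed = raw.rename(columns={raw_name: std for std, raw_name in year_map.items()})
    # Keep only mapped columns, in map order; unmapped raw columns are dropped.
    return renamed[[c for c in year_map.index if c in renamed.columns]]
```

Removing an unused row from the map, as item 2 does, is what silences that kind of warning.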

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

  1. Make sure the full ETL runs and `make pytest-integration-full` passes locally.
  2. Update the release notes: reference the PR and related issues.
  3. Review the PR yourself and call out any questions or issues you have.


@cmgosnell cmgosnell marked this pull request as ready for review January 19, 2024 17:27
@e-belfer e-belfer linked an issue Jan 19, 2024 that may be closed by this pull request
@e-belfer (Member) left a comment

Part K: materialized the table successfully and it has all the rows I'd expect.

One blocking issue regarding which tables we drop columns for: this should apply to transmission tables only.

One additional non-blocking suggestion: you've helpfully set up the column names to be parsed and split into categoricals, so maybe you want to use `nonsteel` instead of `non_steel` to make this even simpler?
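To illustrate the suggestion, here is a minimal sketch assuming wide columns named `{prefix}_{material}_{location}` that get melted to long format and split on `_`. The helper `melt_and_split` and all column names are hypothetical, not the PR's code. A name like `non_steel` contributes an extra underscore, so the split yields one piece too many; `nonsteel` keeps every name at the same number of parts.

```python
import pandas as pd


def melt_and_split(df: pd.DataFrame, id_cols: list[str], prefix: str, parts: list[str]) -> pd.DataFrame:
    """Melt '{prefix}_...' columns to long format and split the suffix into categoricals."""
    value_cols = [c for c in df.columns if c.startswith(prefix + "_")]
    long = df.melt(id_vars=id_cols, value_vars=value_cols, var_name="variable", value_name=prefix)
    # Strip "{prefix}_", then split what's left on "_" into one piece per part.
    pieces = long["variable"].str[len(prefix) + 1 :].str.split("_", expand=True)
    if pieces.shape[1] != len(parts):
        raise ValueError(
            f"Expected {len(parts)} parts per column name, got {pieces.shape[1]}; "
            "check for extra underscores (e.g. non_steel)."
        )
    for i, part in enumerate(parts):
        long[part] = pieces[i].astype("category")
    return long.drop(columns="variable")


wide = pd.DataFrame(
    {"report_year": [2010], "miles_steel_onshore": [1.0], "miles_nonsteel_onshore": [2.0]}
)
long = melt_and_split(wide, ["report_year"], "miles", ["material", "location"])
```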

```python
)
df = df.drop(columns=to_drop, errors="ignore")
return df
if int(partition["year"]) < 2010:
```
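The pattern in this excerpt, sketched as a standalone helper: gate on the partition year, then drop with `errors="ignore"` so names that aren't present in a given table are skipped instead of raising a `KeyError`. The name `drop_for_old_years` and its signature are hypothetical; the real function isn't shown here.

```python
import pandas as pd


def drop_for_old_years(df: pd.DataFrame, partition: dict, to_drop: list[str]) -> pd.DataFrame:
    """Hypothetical helper: drop to_drop columns only for pre-2010 partitions."""
    if int(partition["year"]) < 2010:
        # errors="ignore" skips any name in to_drop that isn't a column
        # of this table, rather than raising a KeyError.
        df = df.drop(columns=to_drop, errors="ignore")
    return df


frame = pd.DataFrame({"a": [1], "b": [2]})
old = drop_for_old_years(frame, {"year": "2009"}, ["b", "not_a_column"])
new = drop_for_old_years(frame, {"year": "2010"}, ["b"])
```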
Member:

This is also only true for transmission data, not distribution data, so we want to filter by form as well, and we should update the docstring to clarify. Otherwise this is super helpful and catches some columns I missed in earlier mappings.

Member Author:

This is a great suggestion! Will do for sure.

Member Author:

@e-belfer, is `"yearly_distribution"` the only page we should skip here (i.e. the condition `page != "yearly_distribution"`)?

I was going to add `and "_transmission_" in page` here, but `yearly_inspections_and_assessments` is part of the transmission zip; it just doesn't look like it is included in the old years.

@cmgosnell (Member Author), Jan 19, 2024:

We could also look it up in the page-to-form map. That feels more accurate, if a little more complicated.
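A sketch contrasting the two options, assuming a plain dict stands in for the page-to-form map. The dict, its form labels, and the page `yearly_transmission_example` are hypothetical; the other two page names come from this thread. The substring check misclassifies `yearly_inspections_and_assessments`, while the lookup classifies it correctly and raises `KeyError` for a page it doesn't know instead of silently guessing.

```python
# Hypothetical stand-in for the page -> form map.
PAGE_TO_FORM = {
    "yearly_distribution": "distribution",
    "yearly_inspections_and_assessments": "transmission",
    "yearly_transmission_example": "transmission",  # hypothetical page name
}


def is_transmission_by_name(page: str) -> bool:
    """Substring heuristic: only works if the page name mentions transmission."""
    return "_transmission_" in page


def is_transmission_by_map(page: str) -> bool:
    """Map lookup: raises KeyError for a page missing from the map."""
    return PAGE_TO_FORM[page] == "transmission"


def should_drop_old_columns(page: str, year: int) -> bool:
    """Only pre-2010 transmission pages get the old-year column drop."""
    return year < 2010 and is_transmission_by_map(page)
```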

Member:

Honestly, I think that table should probably be renamed to include "transmission".

Member:

But the way you've done it is more robust.

@e-belfer (Member) left a comment

Looks great, thanks for cleaning things up!

codecov bot commented Jan 19, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (b3b1a26) 92.7% compared to head (7ba0d0d) 92.6%.
Report is 5 commits behind head on phmsa-transmission-f-g.

| Files | Patch % | Lines |
|---|---|---|
| src/pudl/extract/phmsagas.py | 44.4% | 5 Missing ⚠️ |
| src/pudl/extract/excel.py | 50.0% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@                   Coverage Diff                    @@
##           phmsa-transmission-f-g   #3258     +/-   ##
========================================================
- Coverage                    92.7%   92.6%   -0.0%     
========================================================
  Files                         144     144             
  Lines                       13087   13091      +4     
========================================================
  Hits                        12128   12128             
- Misses                        959     963      +4     
```
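The headline numbers can be checked from the diff, assuming line coverage is hits / (hits + misses) since the report lists no partial lines. The patch figures are consistent with 4 of 9 changed lines covered in `phmsagas.py` and 1 of 2 in `excel.py`; those counts are inferred from the percentages, not stated in the report.

```python
def coverage_pct(hits: int, misses: int) -> float:
    """Line coverage in percent, assuming no partial lines: hits / (hits + misses)."""
    return 100 * hits / (hits + misses)


base = coverage_pct(12128, 959)  # 13087 lines on phmsa-transmission-f-g (b3b1a26)
head = coverage_pct(12128, 963)  # 13091 lines on this PR's head (7ba0d0d)
summary = f"{base:.1f}% -> {head:.1f}% ({head - base:+.1f}%)"
print(summary)  # 92.7% -> 92.6% (-0.0%)

# Patch coverage: 44.4% with 5 missing fits 4 covered of 9 (inferred),
# and 50.0% with 1 missing fits 1 covered of 2.
print(f"{coverage_pct(4, 5):.1f}% {coverage_pct(1, 1):.1f}%")  # 44.4% 50.0%
```

The `-0.0%` is a real drop of about 0.03 percentage points that rounds to zero at one decimal place.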


Base automatically changed from phmsa-transmission-f-g to main January 19, 2024 21:05
@cmgosnell cmgosnell merged commit a996d30 into main Jan 19, 2024
13 checks passed
@cmgosnell cmgosnell deleted the phmsa-transmission-k branch January 19, 2024 23:36
Labels: None yet
Projects: Archived in project
Development: successfully merging this pull request may close these issues:
Map columns for transmission K
Participants: 2