Map PHMSA Natural Gas Transmission Part L columns #3254

Merged 11 commits on Jan 23, 2024

Conversation

@e-belfer commented Jan 18, 2024

Overview

Closes #3253.

What problem does this address?
Extracts Part L of the PHMSA transmission data, in raw form, into a single table, raw_phmsagas__yearly_miles_of_pipe_by_class_location. The table covers 2001-2022.

What did you change?
Mapped the columns for this table and updated the remaining CSVs in src/pudl/package_data/phmsagas.

Testing

How did you make sure this worked? How can a reviewer verify this?
Materialize the raw asset in Dagster. For additional peace of mind, also run the table through the notebook and verify that the columns are correctly mapped.
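
The column check can also be approximated outside the notebook. This is a minimal sketch, not PUDL's own API: it assumes a column-map CSV where the first field is the standardized column name and each following field is the raw column name for one year, in order. The function names, the year labels (y0, y1, y2) and the abbreviated sample rows are all hypothetical.

```python
import csv
import io


def load_column_map(csv_text, years):
    """Return {year: {raw_name: standardized_name}} from a column-map CSV.

    Assumed layout (hypothetical): field 0 is the standardized name and
    field i + 1 is the raw name used in years[i]; an empty field means the
    column was not reported that year.
    """
    mapping = {year: {} for year in years}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        standardized, raw_names = row[0], row[1:]
        for year, raw in zip(years, raw_names):
            if raw:
                mapping[year][raw] = standardized
    return mapping


def unmapped_columns(raw_header, year_map):
    """List raw columns that the map for one year does not cover."""
    return [col for col in raw_header if col not in year_map]


# Abbreviated, illustrative rows; not the real file contents.
sample = (
    "report_state,,stop,state_name\n"
    "onshore_transmission_pipe_class_1_miles,,b4ton_1,partltonc1\n"
)
column_map = load_column_map(sample, years=["y0", "y1", "y2"])

# "b4ton_2" is a made-up raw column used to show an unmapped name.
print(unmapped_columns(["stop", "b4ton_1", "b4ton_2"], column_map["y1"]))
```

Any name the check returns is a raw column that would be dropped or left unmapped for that year, which is the condition the notebook review is meant to catch.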

To-do list

  1. Make sure full ETL runs & make pytest-integration-full passes locally
  2. Update the release notes: reference the PR and related issues.
  3. Review the PR yourself and call out any questions or issues you have

@e-belfer added the new-data (Requests for integration of new data.) and phmsa (Data from the Pipeline and Hazardous Material Safety Administration) labels Jan 18, 2024
@e-belfer self-assigned this Jan 18, 2024

codecov bot commented Jan 18, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (b3b1a26) 92.7% compared to head (848cfd4) 92.7%.
Report is 41 commits behind head on main.

Files                          Patch %   Lines
src/pudl/extract/phmsagas.py   44.4%     5 Missing ⚠️
src/pudl/extract/excel.py      50.0%     1 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #3254   +/-   ##
=====================================
  Coverage   92.7%   92.7%           
=====================================
  Files        144     144           
  Lines      13087   13091    +4     
=====================================
+ Hits       12128   12134    +6     
+ Misses       959     957    -2     


@e-belfer marked this pull request as draft January 19, 2024 15:55
@e-belfer linked an issue Jan 19, 2024 that may be closed by this pull request
@e-belfer marked this pull request as ready for review January 19, 2024 15:57
Base automatically changed from phmsa-transmission-f-g to main January 19, 2024 21:05
@jdangerx left a comment

One non-blocking question about standardizing our column name orders.

commodity_group,,,,,,,,,,,,,,,,,,,,,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity,parta5commodity
interstate_or_intrastate,,,,,,,,,,,,,,,,,,,,,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra,inter_intra
report_state,,,,,,,,,,,,stop,stop,stop,stop,stop,stop,stop,stop,stop,state_name,state_name,state_name,state_name,state_name,state_name,state_name,state_name,state_name,state_name,state_name,state_name,state_name
onshore_transmission_pipe_class_1_miles,,,,,,,,,,,,b4ton_1,b4ton_1,b4ton_1,b4ton_1,b4ton_1,b4ton_1,b4ton_1,b4ton_1,b4ton_1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1,partltonc1

I think we are a little inconsistent about the ordering of onshore/offshore and transmission/gathering:

> ag ^onshore_ -c 
yearly_miles_of_pipe_by_class_location.csv:25
yearly_miles_of_transmission_pipe_by_nps.csv:35
yearly_miles_of_gathering_pipe_by_nps.csv:95
> ag ^transmission_onshore -c
yearly_transmission_gathering_summary_by_commodity.csv:1
> ag ^gathering_onshore -c 
yearly_transmission_gathering_summary_by_commodity.csv:4

It appears that "shoreyness" comes before "transmissioniness" in most cases. Should we standardize that everywhere?
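
For anyone without ag installed, the same prefix counts can be sketched in Python. This is a rough equivalent under stated assumptions, not a project tool: the function names are hypothetical, and the file names and rows below are made up for illustration, so the counts will not match the real CSVs.

```python
def count_prefix(lines, prefix):
    """Count lines that start with prefix, like `ag ^prefix -c` on one file."""
    return sum(1 for line in lines if line.startswith(prefix))


def prefix_report(lines_by_file, prefixes):
    """Return {prefix: {filename: count}}, skipping files with no matches."""
    report = {}
    for prefix in prefixes:
        per_file = {}
        for filename, lines in lines_by_file.items():
            n = count_prefix(lines, prefix)
            if n:
                per_file[filename] = n
        report[prefix] = per_file
    return report


# Made-up rows for illustration only; the real CSVs have many more lines.
lines_by_file = {
    "class_location.csv": [
        "onshore_transmission_pipe_class_1_miles,,b4ton_1",
        "report_state,,stop",
    ],
    "summary_by_commodity.csv": [
        "transmission_onshore_miles,,x",
        "gathering_onshore_miles,,y",
    ],
}
report = prefix_report(
    lines_by_file, ["onshore_", "transmission_onshore", "gathering_onshore"]
)
for prefix, counts in report.items():
    print(prefix, counts)
```

To run it against the real files, the in-memory dictionary could be replaced by reading each CSV's lines from src/pudl/package_data/phmsagas.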

I think in the transmission_pipe_by_nps and gathering_pipe_by_nps tables, transmission/gathering is implied by the table name, so it's not spelled out in the column names. I can flip the summary_by_commodity usage to match the one in class_location, though.
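
The flip could be scripted rather than done by hand. This is a minimal sketch, assuming the affected names begin with transmission_ or gathering_ followed by onshore or offshore; the helper name and the example column names are hypothetical, and names that already put the shore term first are left unchanged.

```python
import re

# Matches a leading "<transmission|gathering>_<onshore|offshore>" pair that is
# followed by an underscore or the end of the name.
_SWAP = re.compile(r"^(transmission|gathering)_(onshore|offshore)(?=_|$)")


def shore_first(name):
    """Reorder a leading pair so the onshore/offshore term comes first."""
    return _SWAP.sub(r"\2_\1", name)


# Hypothetical column names, for illustration only.
for name in [
    "transmission_onshore_miles",
    "gathering_offshore_miles",
    "onshore_transmission_pipe_class_1_miles",
]:
    print(name, "->", shore_first(name))
```

Because the pattern is anchored at the start of the name, rerunning it on already-flipped names is a no-op.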

@e-belfer marked this pull request as draft January 23, 2024 14:05
@e-belfer changed the base branch from main to 3243-phmsa-tx-j January 23, 2024 14:52
@e-belfer marked this pull request as ready for review January 23, 2024 14:57
Base automatically changed from 3243-phmsa-tx-j to main January 23, 2024 16:27
@e-belfer merged commit dfb8800 into main Jan 23, 2024
13 checks passed
@e-belfer deleted the phmsa-transmission-l branch January 23, 2024 18:26