Extract raw tables for PHMSA transmission data Part F & G #3242

Merged
merged 13 commits into main from phmsa-transmission-f-g on Jan 19, 2024

Conversation

@e-belfer e-belfer commented Jan 16, 2024

Overview

Closes #3241.

What problem does this address?

Extracts Parts F and G in a raw format for PHMSA transmission data into one table, raw_phmsagas__yearly_inspections_and_assessments. This table includes data from 2010-2022.

What did you change?

Mapped columns for the relevant table, and updated the remaining CSVs in src/pudl/package_data/phmsagas.
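
For illustration, a minimal sketch of how a per-year column map like the CSVs in this PR can be used to rename raw columns. The inline CSV, the `year_index` header, and the `rename_dict` helper are hypothetical and shortened to three years; this is not PUDL's actual extractor code.

```python
import io

import pandas as pd

# Hypothetical, shortened column map: one row per standardized column name,
# one column per year, each cell holding the raw column name for that year.
COLUMN_MAP_CSV = """year_index,2020,2021,2022
conditions_repaired_one_year_conditions_hca_segment_in_line,partf2c2,partf2c2,partf2c2
conditions_repaired_192_710_segment_in_line,,partf2d,partf2d
"""

column_map = pd.read_csv(io.StringIO(COLUMN_MAP_CSV), index_col=0)


def rename_dict(year: int) -> dict:
    """Map raw column names to standardized names for one year.

    Empty cells (columns not reported that year) are skipped.
    """
    raw_names = column_map[str(year)].dropna()
    return {raw: std for std, raw in raw_names.items()}


# Rename a toy raw 2021 table using the map.
raw_2021 = pd.DataFrame({"partf2c2": [3], "partf2d": [7]})
renamed_2021 = raw_2021.rename(columns=rename_dict(2021))
print(list(renamed_2021.columns))
```

An empty cell means the standardized column has no raw counterpart in that year, so it is simply left out of the rename mapping.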

Testing

How did you make sure this worked? How can a reviewer verify this?

Materialize the raw asset in dagster. Also run the table through the notebook and verify columns are correctly mapped, for additional peace of mind.

To-do list

  1. Make sure full ETL runs & make pytest-integration-full passes locally
  2. Update the release notes: reference the PR and related issues.
  3. Review the PR yourself and call out any questions or issues you have

@e-belfer e-belfer added new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration labels Jan 16, 2024
@e-belfer e-belfer self-assigned this Jan 16, 2024
@e-belfer e-belfer linked an issue Jan 16, 2024 that may be closed by this pull request
@e-belfer e-belfer changed the base branch from main to phmsa-extractor January 16, 2024 16:00
@@ -3,3 +3,4 @@ yearly_distribution,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
yearly_transmission_gathering_summary_by_commodity,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
yearly_miles_of_transmission_pipe_by_nps,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,2,2,2,2,2,2,2,2,2,2,2
yearly_miles_of_gathering_pipe_by_nps,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,3
yearly_inspections_and_assessments,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1
Member Author

Should this be 0 or -1?

Member

IIRC -1 is a sentinel value that means that page doesn't exist in that year. Like NA. I'm not sure what 0 would mean.

Member Author

Ok yes in looking back at other datasets I'm remembering that's why I did this! Hopefully that answers your question @jdangerx
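
For illustration, a minimal sketch of how the -1 sentinel discussed above could be handled when looking up a page for a given year. The inline CSV, the `page` header, and the `sheet_index` helper are hypothetical and shortened to three years; this is not PUDL's actual extractor code.

```python
import io
from typing import Optional

import pandas as pd

# Hypothetical, shortened page map: one row per page, one column per year.
# A value of -1 means the page does not exist in that year.
PAGE_MAP_CSV = """page,2009,2010,2011
yearly_miles_of_transmission_pipe_by_nps,0,2,2
yearly_inspections_and_assessments,-1,1,1
"""

page_map = pd.read_csv(io.StringIO(PAGE_MAP_CSV), index_col="page")


def sheet_index(page: str, year: int) -> Optional[int]:
    """Return the sheet index for a page in a year, or None if the page is absent."""
    idx = int(page_map.loc[page, str(year)])
    return None if idx == -1 else idx


print(sheet_index("yearly_inspections_and_assessments", 2009))  # None
print(sheet_index("yearly_inspections_and_assessments", 2010))  # 1
```

Using -1 rather than 0 keeps "page absent" distinct from "page is the first sheet", since 0 is a valid sheet index.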

codecov bot commented Jan 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (27dab3e) 0.0% compared to head (b3b1a26) 92.7%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##           main   #3242      +/-   ##
=======================================
+ Coverage      0   92.7%   +92.7%     
=======================================
  Files         0     144     +144     
  Lines         0   13087   +13087     
=======================================
+ Hits          0   12128   +12128     
- Misses        0     959     +959     


@e-belfer e-belfer marked this pull request as draft January 17, 2024 18:40
@e-belfer e-belfer removed the request for review from jdangerx January 18, 2024 14:23
@e-belfer e-belfer marked this pull request as ready for review January 18, 2024 16:15
@e-belfer e-belfer marked this pull request as draft January 19, 2024 15:41
@e-belfer e-belfer marked this pull request as ready for review January 19, 2024 15:50
@e-belfer e-belfer marked this pull request as draft January 19, 2024 15:51
@e-belfer e-belfer marked this pull request as ready for review January 19, 2024 15:55
Base automatically changed from phmsa-extractor to main January 19, 2024 17:18

@cmgosnell cmgosnell left a comment

we are almost certainly going to want to rename some of the columns in the process of actually transforming the data, but these look like good enough column names to me!

the only real request for change in here is to remove the . in the column name.

conditions_repaired_one_year_conditions_hca_segment_in_line,,,,,,,,,,,,,,,,,,,,,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2
conditions_repaired_monitored_conditions_hca_segment_in_line,,,,,,,,,,,,,,,,,,,,,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3
conditions_repaired_other_scheduled_conditions_hca_segment_in_line,,,,,,,,,,,,,,,,,,,,,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4
conditions_repaired_192.710_segment_in_line,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,partf2d,partf2d
Member

I'd recommend conditions_repaired_192_710_segment_in_line instead of the 192.710 bc the . could junk up the df.column.name

Member Author

Ah I meant to do this and clearly missed it, will do.
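
For illustration, a minimal sketch of why the dot is a problem and one way to normalize such names. The `clean_column_name` helper is hypothetical and not part of PUDL; it only shows that the dotted name is not a valid Python identifier, so it cannot be used with attribute-style column access.

```python
import re


def clean_column_name(name: str) -> str:
    """Replace every character that is not a letter, digit, or underscore with "_"."""
    return re.sub(r"[^0-9A-Za-z_]", "_", name)


dotted = "conditions_repaired_192.710_segment_in_line"
cleaned = clean_column_name(dotted)

# A name containing "." is not a valid Python identifier, so writing
# df.<column_name> with it would be parsed as chained attribute access.
print(dotted.isidentifier())   # False
print(cleaned.isidentifier())  # True
print(cleaned)                 # conditions_repaired_192_710_segment_in_line
```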

@e-belfer e-belfer marked this pull request as draft January 19, 2024 17:57
@e-belfer e-belfer marked this pull request as ready for review January 19, 2024 18:34
@e-belfer e-belfer merged commit 72ce726 into main Jan 19, 2024
13 checks passed
@e-belfer e-belfer deleted the phmsa-transmission-f-g branch January 19, 2024 21:05
Labels
new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Extract raw PHMSA transmission data for Tables F-G
3 participants