Extract raw tables for PHMSA transmission data Part F & G #3242
@@ -3,3 +3,4 @@ yearly_distribution,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
yearly_transmission_gathering_summary_by_commodity,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
yearly_miles_of_transmission_pipe_by_nps,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,2,2,2,2,2,2,2,2,2,2,2
yearly_miles_of_gathering_pipe_by_nps,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,3,3,3,3,3,3,3,3,3,3,3
yearly_inspections_and_assessments,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1
Should this be 0 or -1?
IIRC, -1 is a sentinel value meaning that the page doesn't exist in that year, like NA. I'm not sure what 0 would mean.
OK, yes: looking back at other datasets, I remember that's why I did this! Hopefully that answers your question, @jdangerx.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #3242      +/-   ##
=======================================
+ Coverage        0    92.7%   +92.7%
=======================================
  Files           0      144     +144
  Lines           0    13087   +13087
=======================================
+ Hits            0    12128   +12128
- Misses          0      959     +959
…, clarify docstrings" This reverts commit bb5b3e6.
We are almost certainly going to want to rename some of the columns when we actually transform the data, but these column names look good enough to me!
The only real change request here is to remove the `.` from the column name.
src/pudl/package_data/phmsagas/column_maps/yearly_inspections_and_assessments.csv
conditions_repaired_one_year_conditions_hca_segment_in_line,,,,,,,,,,,,,,,,,,,,,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2,partf2c2
conditions_repaired_monitored_conditions_hca_segment_in_line,,,,,,,,,,,,,,,,,,,,,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3,partf2c3
conditions_repaired_other_scheduled_conditions_hca_segment_in_line,,,,,,,,,,,,,,,,,,,,,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4,partf2c4
conditions_repaired_192.710_segment_in_line,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,partf2d,partf2d
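Each column-map row pairs a standardized column name with the raw spreadsheet column used in each year, and a blank cell means that column is absent that year. The sketch below turns such rows into a per-year rename mapping. It is a minimal example with several assumptions: the header row (`column`, then one year per cell) is hypothetical, the excerpt is trimmed to three years for brevity, and `rename_map_for_year` is an illustrative helper rather than PUDL code.

```python
import csv
import io

# Illustrative excerpt in the column-map layout: the first cell is the standardized
# column name, and the remaining cells hold the raw spreadsheet column for each year.
# The header row and the trimmed 2020-2022 range are assumptions for this sketch.
COLUMN_MAP_CSV = """\
column,2020,2021,2022
conditions_repaired_one_year_conditions_hca_segment_in_line,partf2c2,partf2c2,partf2c2
conditions_repaired_192_710_segment_in_line,,partf2d,partf2d
"""


def rename_map_for_year(column_map_csv: str, year: int) -> dict[str, str]:
    """Return {raw_column: standardized_column} for one year (hypothetical helper).

    Blank cells mean the column does not exist in that year's data, so they are skipped.
    """
    reader = csv.DictReader(io.StringIO(column_map_csv))
    renames = {}
    for row in reader:
        raw_name = row[str(year)].strip()
        if raw_name:
            renames[raw_name] = row["column"]
    return renames


print(rename_map_for_year(COLUMN_MAP_CSV, 2020))
print(rename_map_for_year(COLUMN_MAP_CSV, 2022))
```

A mapping of this shape can be passed to pandas' `DataFrame.rename(columns=...)` to standardize one year's raw frame before concatenating years.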
I'd recommend `conditions_repaired_192_710_segment_in_line` instead of `conditions_repaired_192.710_segment_in_line`, because the `.` could junk up attribute access like `df.column_name`.
Ah, I meant to do this and clearly missed it. Will do.
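The reason for the rename is that a name containing `.` is not a valid Python identifier. As a result, pandas attribute access such as `df.some_column` cannot reach it, and only `df["..."]` works. Below is a minimal sketch of normalizing such names. `to_snake_identifier` is a hypothetical helper, not part of PUDL.

```python
import re


def to_snake_identifier(name: str) -> str:
    """Replace runs of non-word characters (such as ".") with a single underscore.

    Hypothetical helper. Note: a name starting with a digit would still not be a
    valid identifier; that case is not handled here.
    """
    return re.sub(r"\W+", "_", name)


old = "conditions_repaired_192.710_segment_in_line"
new = to_snake_identifier(old)
print(old.isidentifier(), new.isidentifier())  # False True
print(new)  # conditions_repaired_192_710_segment_in_line
```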
Overview
Closes #3241.
What problem does this address?
Extracts Parts F and G of the PHMSA transmission data, in raw form, into a single table, `raw_phmsagas__yearly_inspections_and_assessments`. This table includes data from 2010-2022.
What did you change?
Mapped columns for the relevant table and updated the remaining CSVs in `src/pudl/package_data/phmsagas`.
Testing
How did you make sure this worked? How can a reviewer verify this?
Materialize the raw asset in Dagster. For additional peace of mind, also run the table through the notebook and verify that the columns are mapped correctly.
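One way to make the "columns are mapped correctly" check repeatable is to compare each year's raw spreadsheet header against that year's rename mapping. The sketch below is a minimal example under stated assumptions. `check_mapping` is a hypothetical helper, not part of PUDL or this PR. The inputs are illustrative placeholders, since real headers would come from the PHMSA spreadsheets.

```python
from collections import Counter


def check_mapping(raw_columns: list[str], renames: dict[str, str]) -> dict[str, set[str]]:
    """Sanity-check one year's {raw_column: standardized_column} mapping.

    Hypothetical helper. Reports raw columns the map expects but the spreadsheet
    lacks, raw columns the map ignores, and standardized names that more than one
    raw column would be renamed to.
    """
    raw = set(raw_columns)
    mapped = set(renames)
    counts = Counter(renames.values())
    return {
        "missing_from_raw": mapped - raw,
        "unmapped_raw": raw - mapped,
        "duplicate_targets": {name for name, n in counts.items() if n > 1},
    }


# Illustrative inputs only; real headers come from the PHMSA spreadsheets.
raw_header = ["partf2c2", "partf2c3", "partf2c4", "partf2d"]
renames = {
    "partf2c2": "conditions_repaired_one_year_conditions_hca_segment_in_line",
    "partf2c3": "conditions_repaired_monitored_conditions_hca_segment_in_line",
    "partf2d": "conditions_repaired_192_710_segment_in_line",
}
report = check_mapping(raw_header, renames)
print(report)
```

Here the report would flag `partf2c4` as present in the raw header but unmapped, which is the kind of gap a manual notebook check is meant to catch.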
To-do list
`make pytest-integration-full` passes locally