Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map PHMSA Natural Gas Transmission Part M columns #3270

Merged
merged 8 commits into from Jan 24, 2024
Merged

Conversation

cmgosnell
Copy link
Member

@cmgosnell cmgosnell commented Jan 19, 2024

Overview

Closes #3263.

What problem does this address?

What did you change?

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Edit tasklist title
Beta Give feedback Tasklist To-do list, more options

Delete tasklist

Delete tasklist block?
Are you sure? All relationships in this tasklist will be removed.
  1. Make sure full ETL runs & make pytest-integration-full passes locally
    Options
  2. Review the PR yourself and call out any questions or issues you have
    Options
  3. For major data coverage & analysis changes, run data validation tests
    Options
  4. If updating analyses or data processing functions: make sure to update or write data validation tests
    Options
  5. Update the release notes: reference the PR and related issues.
    Options

@e-belfer e-belfer changed the title Map PHMSA Natural Gas Transmission Part T columns Map PHMSA Natural Gas Transmission Part M columns Jan 19, 2024
@cmgosnell cmgosnell linked an issue Jan 19, 2024 that may be closed by this pull request
@cmgosnell cmgosnell marked this pull request as ready for review January 19, 2024 22:02
Base automatically changed from phmsa-transmission-k to main January 19, 2024 23:36
Copy link

codecov bot commented Jan 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4528907) 92.7% compared to head (d3a6f45) 92.7%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@          Coverage Diff          @@
##            main   #3270   +/-   ##
=====================================
  Coverage   92.7%   92.7%           
=====================================
  Files        144     144           
  Lines      13091   13091           
=====================================
  Hits       12134   12134           
  Misses       957     957           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@e-belfer e-belfer added new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration labels Jan 22, 2024
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do we want to call the main table yearly_transmission_failures_leaks_repairs?
  • Columns in this table definitely go back all the way to 1990. See Part C/D/E in the 2005 form and the 1988 form as well.
  • Just a note that I've been using either other or class_3_4_only to refer to nonhca_nonmca sections. See the advantages of both but we should pick one to use consistently. Otherwise column names look good.

@cmgosnell
Copy link
Member Author

cmgosnell commented Jan 23, 2024

@e-belfer do you know that class 3 & 4 are always non-mca/non-hca? Or is it a square and rectangle thing? I tried to read about it and couldn't find a clear a == b although they are clearly related - in that they are about how important/regulated the land is where the infrastructure is located. But this part M has class 1 & 2/non-mca/non-hca.

image

@e-belfer
Copy link
Member

e-belfer commented Jan 23, 2024

@e-belfer do you know that class 3 & 4 are always non-mca/non-hca? Or is it a square and rectangle thing? I tried to read about it and couldn't find a clear a == b although they are clearly related - in that they are about how important/regulated the land is where the infrastructure is located. But this part M has class 1 & 2/non-mca/non-hca.

This has almost certainly changed overtime, but more recently it seems like operators can either 1) blanket define all Class 3&4 locations as HCAs, or use a formula provided by PHMSA (which would explain the existence of non-MCA/HCA Class 3&4 locations. Class 1 & 2 locations could also be HCAs or MCAs if they meet the criteria (which I think essentially boils down to "many human lives but slightly further away from the pipeline than would the criteria for a Class 3/4". (Source is p.8 here: https://www.phmsa.dot.gov/sites/phmsa.dot.gov/files/docs/technical-resources/pipeline/gas-transmission-integrity-management/62271/faqsgas-transmission-integrity-management201904.pdf)

@cmgosnell
Copy link
Member Author

it seems like there is mostly overlap but I am starting on part Q now and it has all classes 1-4 associated with in HCA, in MCA or not in HCA or MCA. So I don't think it is a good idea to assume there is a perfect overlap.

@cmgosnell
Copy link
Member Author

or @e-belfer is your suggestion that we use other in place of my vv long nonhca_nonmca? that feels good and fine to me. conflating the classes and high/mid consequence areas feels less clear to me.

@e-belfer
Copy link
Member

or @e-belfer is your suggestion that we use other in place of my vv long nonhca_nonmca? that feels good and fine to me. conflating the classes and high/mid consequence areas feels less clear to me.

Yes using "other" instead of "nonhca_nonmca" is my suggestion, I would definitely not conflate the classes and MCA/HCA.

Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One column mapping I would like you to fix, a question re doe, and a response to your question. Otherwise thank you for handling this giant table, the asset generates in dagster as expected.

@e-belfer e-belfer self-requested a review January 24, 2024 15:36
Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! One question I'll punt to #3277

report_year,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,yr,report_year,report_year,report_year,report_year,report_year,report_year,report_year,report_year,report_year,report_year,report_year,report_year,report_year
filing_date,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,doe,,,,,,,,,,,,,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't keep filing_date in the other tables other than A-D, I believe. But I also see an argument for keeping it. Let's merge this for now and take this up in #3277, as it'll affect all the tables we've integrated up until now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm yea good point! but agreed let's punt it. I'll add a quick comment over there about all of the date fields.

@cmgosnell cmgosnell merged commit 08c757e into main Jan 24, 2024
12 checks passed
@cmgosnell cmgosnell deleted the phmsa-transmission-m branch January 24, 2024 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Map columns for PHMSA transmission part M
2 participants