Extract PHMSA distribution and transmission data from 1970-1989 #3290

Open
Tracked by #2848
e-belfer opened this issue Jan 24, 2024 · 0 comments
Labels
new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration

Comments

@e-belfer
Member

e-belfer commented Jan 24, 2024

From 1970 to 1989, PHMSA distribution and transmission data is reported in one .xls spreadsheet per dataset, with each tab covering multiple years of data. This data was not included in the original round of raw data extraction because of quirks in its format. These will need to be addressed before the data can be extracted:

  • Multi-row column headers, with repeated names in the final header row. This will make column mapping substantially more work, even though the available columns are roughly similar to those in the 1990 data, with slightly fewer fields.
  • Tabs correspond to years rather than to form pages, with no breakdown between table parts, and each tab uses a different version of the form. This will probably require some adaptation of our existing extraction infrastructure, which expects one spreadsheet per year with tabs organized by form section.
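For the multi-row headers, one option is to flatten the stacked header rows into a single name per column before any column mapping happens. The sketch below is a minimal, hypothetical helper (flatten_multirow_header is not part of the existing extraction code). It assumes each tab is read without a header, for example with pandas.read_excel(path, sheet_name=None, header=None), which returns a dict of DataFrames keyed by tab name, that the first few rows of each tab hold the header, and that merged cells in the upper header rows come through as empty (None or NaN).

```python
import math
from collections import Counter


def _label(cell: object) -> str:
    """Return a header cell as stripped text, treating None/NaN as empty."""
    if cell is None or (isinstance(cell, float) and math.isnan(cell)):
        return ""
    return str(cell).strip()


def _snake(text: str) -> str:
    """Lowercase a label and join its words with underscores."""
    return "_".join(text.lower().split())


def flatten_multirow_header(header_rows: list[list[object]]) -> list[str]:
    """Combine stacked header rows into one unique name per column (hypothetical helper).

    Upper rows are forward-filled to the right, on the assumption that merged
    cells are read as empty; the final row is used as-is. Names that are
    still duplicated after joining get a numeric suffix (_2, _3, ...).
    """
    n_cols = max(len(row) for row in header_rows)
    levels: list[list[str]] = []
    for depth, row in enumerate(header_rows):
        texts = [_label(cell) for cell in row] + [""] * (n_cols - len(row))
        if depth < len(header_rows) - 1:
            # Carry each upper-row label across the empty cells of its merged span.
            last = ""
            for i, text in enumerate(texts):
                last = text or last
                texts[i] = last
        levels.append(texts)

    names: list[str] = []
    seen: Counter = Counter()
    for col in range(n_cols):
        # Join the non-empty labels from every header level for this column.
        parts = [_snake(level[col]) for level in levels]
        base = "_".join(part for part in parts if part) or f"unnamed_{col}"
        seen[base] += 1
        names.append(base if seen[base] == 1 else f"{base}_{seen[base]}")
    return names
```

For example, a top row of ["Miles of main", None, "Number of services", None] over a final row of ["Steel", "Plastic", "Steel", "Plastic"] yields miles_of_main_steel, miles_of_main_plastic, number_of_services_steel and number_of_services_plastic. The numeric suffixes are only a fallback, so the per-form column mapping would still need to assign meaningful names.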

We may want to extract this data into separate raw tables (raw_phmsagas__transmission_1970_1979, raw_phmsagas__transmission_1980_1981, etc.) and then split them and concatenate them with the other tables during processing.
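The split-and-concatenate step might look roughly like the following sketch. It assumes, hypothetically, that each tab is named after its year range (e.g. "1970-1979") and that a hand-written column map for each raw table translates its column names to the shared names used by the later tables. raw_table_name, concat_with_mapping and the map contents are illustrative, not existing code. Rows are plain dicts to keep the sketch self-contained; with pandas, the same step would be DataFrame.rename(columns=...) on each table followed by pd.concat.

```python
def raw_table_name(dataset: str, tab_name: str) -> str:
    """Build a raw table name such as raw_phmsagas__transmission_1970_1979.

    Hypothetical convention: tabs are named by year range ("1970-1979")
    or by a single year ("1984"), which becomes "1984_1984".
    """
    years = [part.strip() for part in tab_name.split("-")]
    if len(years) > 2 or not all(y.isdigit() and len(y) == 4 for y in years):
        raise ValueError(f"Unexpected tab name: {tab_name!r}")
    return f"raw_phmsagas__{dataset}_{years[0]}_{years[-1]}"


def concat_with_mapping(
    tables: dict[str, list[dict[str, object]]],
    column_maps: dict[str, dict[str, str]],
    target_columns: list[str],
) -> list[dict[str, object]]:
    """Rename each raw table's columns to shared names and stack the rows.

    Source columns missing from a table's map are dropped; target columns
    that a given form version never reported are filled with None.
    """
    combined: list[dict[str, object]] = []
    for table_name, rows in tables.items():
        mapping = column_maps[table_name]
        for row in rows:
            renamed = {mapping[col]: value for col, value in row.items() if col in mapping}
            combined.append({col: renamed.get(col) for col in target_columns})
    return combined
```

Keeping one raw table per form version means each column map only has to describe a single layout, and fields the older forms lack simply show up as missing values after concatenation.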

@e-belfer e-belfer added new-data Requests for integration of new data. phmsa Data from the Pipeline and Hazardous Material Safety Administration labels Jan 24, 2024
@e-belfer e-belfer changed the title Extract PHMSA data from 1970-1989 to Extract PHMSA distribution and transmission data from 1970-1989 Jan 24, 2024
Projects
Status: Icebox
Development

No branches or pull requests

1 participant