-
-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate EIA-923 Annual Environmental Information (Schedule 8C) spreadsheet maps #2447
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## dev #2447 +/- ##
=====================================
Coverage 86.7% 86.7%
=====================================
Files 81 81
Lines 9438 9438
=====================================
Hits 8183 8183
Misses 1255 1255
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
@zaneselvans as you work on integrating emission control data, one thing to be aware of is that the raw EIA-923 data contains incomplete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm running into the same 2018 file error that you were seeing earlier when materializing the raw assets in dagster:
The above exception occurred during handling of the following exception:
KeyError: "No resources found for eia923: {'name': 'EIA923_Schedule_8_Annual_Environmental_Information_2018_Final.xlsx'}"
For me, oddly, the 2018 file dagster just grabbed from 10.5281-zenodo.7236677 is 3.2mb, similar in size to the other years. But I get an error when I try to open it locally, suggesting some kind of file error?
The rest is just suggestions about naming conventions.
src/pudl/metadata/fields.py
Outdated
@@ -1762,6 +1766,10 @@ | |||
"type": "boolean", | |||
"description": "Is the reporting entity an owner of power plants reported on Schedule 2 of the form?", | |||
}, | |||
"pm_control_id_eia": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about using 'particulate' instead of 'pm' in the 860 PRs to avoid prime mover confusion. I think we should be consistent and name this particulate_control_id_eia
, which is what is currently used in the 860 extract tables (but not yet present in fields.py
as we haven't transformed them yet).
report_year,year,year,year,year,year,year,year,year,year,year | ||
plant_id_eia,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id | ||
environmental_equipment_name,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type | ||
pm_control_id_eia,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above re: naming.
hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service | ||
annual_nox_emission_rate_lb_per_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu | ||
ozone_season_nox_emission_rate_lb_per_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu | ||
pm_emission_rate_lb_per_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same goes here and for all the records below
so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date | ||
fgd_sorbent_consumption_1000_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons | ||
fgd_electricity_consumption_mwh,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours | ||
hg_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have consistency and use either mercury or hg in our codes. So far, we've been using mercury (e.g. compliance_year_mercury
).
Ah yeah you're right I should update all the names. I just took the ones that were here already. Weirdly, my issue with the 2018 data magically disappeared! I thought it must have been something screwy with my local setup, but if you're getting it now too then I don't know what's going on! |
One more weirdness while I'm here! I think the 923 archives aren't a part of the Catalyst organization repo on Zenodo. If I trace the doi for the record I find them here: But they don't show up in this list: https://zenodo.org/communities/catalyst-cooperative/search?page=1&size=20# Is there a reason for this, or is this just an oversight? If so, I can make a separate issue. |
No, there's no reason it's not in the community except that adding it to the community has to be done manually, rather than happening automatically when we create a new archive. See catalyst-cooperative/pudl-archiver#76 I went ahead and added it! Also updated the naming to match what we did in EIA-860 EnviroEquip. |
Re the 2018 data: In the issue you mentioned that using the most recent archive would require more retooling. Is this because we'd need to either update the data source for PUDL across the board (and probably break some other stuff) or write in a one-off exception for the source for this archive? I'd assumed that PUDL grabs the most recent archive by default, but maybe that's not the case? I'm a bit hesitant to merge in an extraction function with the extraction function commented out. Deleting the raw data and redownloading from Zenodo didn't fix the issue. Did you change anything else in your local environment to fix the issue? Otherwise running this for 2012-2017 on my local worked great. |
Which archive is used is hard-coded in One thing that I have been messing around with locally is using Python 3.11. So maybe this problem is magically fixed in 3.11?! Seems unlikely. I think it would be okay (though definitely not ideal) to merge this in now with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the rollback of the commenting out for now, this looks good to merge. I'd suggest leaving a comment in-line both in the assets and in the extraction so nobody's confused about why we have an empty asset, though!
I've commented out the Where are the 2 places (assets + extraction) that you're thinking of here? Maybe I need to add |
Adding |
Ah okay great! Thank you! I'll merge once the CI passes. Such a weird issue. |
PR Overview
A draft of spreadhseet mapping metadata for the EIA-923 Schedule 8C, which is about emissions control equipment. Based on @grgmiller's PR #1950.
See notes on issue #2448
PR Checklist
dev
).