Integrate EIA-923 Annual Environmental Information (Schedule 8C) spreadsheet maps #2447

Merged
merged 13 commits into from
Mar 28, 2023

Conversation

zaneselvans
Member

@zaneselvans zaneselvans commented Mar 23, 2023

PR Overview

A draft of spreadsheet mapping metadata for the EIA-923 Schedule 8C, which covers emissions control equipment. Based on @grgmiller's PR #1950.

See notes on issue #2448

PR Checklist

@zaneselvans zaneselvans marked this pull request as draft March 23, 2023 19:28
@codecov

codecov bot commented Mar 23, 2023

Codecov Report

Patch coverage: 100.0% and no project coverage change.

Comparison is base (76a986f) 86.7% compared to head (abab544) 86.7%.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #2447   +/-   ##
=====================================
  Coverage   86.7%   86.7%           
=====================================
  Files         81      81           
  Lines       9438    9438           
=====================================
  Hits        8183    8183           
  Misses      1255    1255           
Impacted Files Coverage Δ
src/pudl/metadata/fields.py 100.0% <ø> (ø)
src/pudl/extract/eia923.py 100.0% <100.0%> (ø)

@zaneselvans zaneselvans changed the title Eia923 schedule8 Integrate EIA-923 Schedule 8 spreadsheet maps Mar 24, 2023
@zaneselvans zaneselvans added the eia923 Anything having to do with EIA Form 923 label Mar 24, 2023
@zaneselvans zaneselvans linked an issue Mar 24, 2023 that may be closed by this pull request
@zaneselvans zaneselvans changed the title Integrate EIA-923 Schedule 8 spreadsheet maps Integrate EIA-923 Annual Environmental Information (Schedule 8C) spreadsheet maps Mar 24, 2023
@zaneselvans zaneselvans marked this pull request as ready for review March 24, 2023 00:39
@zaneselvans zaneselvans self-assigned this Mar 24, 2023
@zaneselvans zaneselvans added this to the 2023Q1 milestone Mar 24, 2023
@grgmiller
Collaborator

@zaneselvans as you work on integrating emission control data, one thing to be aware of is that the raw EIA-923 data contains incomplete _control_id mapping. For example, some data rows that contain non-missing so2 control efficiencies have a missing so2_control_id. In OGE, we address this by filling in the missing control id with the control id for another pollutant (see: https://github.com/singularity-energy/open-grid-emissions/blob/bb0c0329c54b25cfc0e0c550ebf242bb7930f733/src/emissions.py#L1058) but there may be a better way to address this on the pudl side.
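
The fallback described above could look roughly like the following. This is a minimal sketch in plain Python, not OGE's or PUDL's actual code. The function name `fill_missing_control_ids` and every column name other than `so2_control_id` are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical column names; only so2_control_id is named in this thread.
CONTROL_ID_COLS = [
    "so2_control_id",
    "nox_control_id",
    "pm_control_id",
    "hg_control_id",
]


def fill_missing_control_ids(
    rows: list[dict], target: str = "so2_control_id"
) -> list[dict]:
    """Fill a missing control ID with the first non-missing ID of another pollutant."""
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        if new_row.get(target) is None:
            # Look through the other pollutants' ID columns for a donor value.
            donor: Optional[str] = next(
                (
                    new_row[col]
                    for col in CONTROL_ID_COLS
                    if col != target and new_row.get(col) is not None
                ),
                None,
            )
            new_row[target] = donor
        filled.append(new_row)
    return filled
```

Rows with no usable donor keep a missing ID. Whether borrowing another pollutant's ID is appropriate depends on how the equipment is actually shared, which is why a different approach may suit PUDL better.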

@zaneselvans zaneselvans requested review from e-belfer and removed request for cmgosnell March 24, 2023 16:18
@zaneselvans zaneselvans added the new-data Requests for integration of new data. label Mar 24, 2023
Member

@e-belfer e-belfer left a comment

I'm running into the same 2018 file error that you were seeing earlier when materializing the raw assets in dagster:

The above exception occurred during handling of the following exception:
KeyError: "No resources found for eia923: {'name': 'EIA923_Schedule_8_Annual_Environmental_Information_2018_Final.xlsx'}"

For me, oddly, the 2018 file that dagster just grabbed from 10.5281-zenodo.7236677 is 3.2 MB, similar in size to the other years. But I get an error when I try to open it locally, which suggests the file itself is damaged somehow?
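
One quick way to test the "damaged file" theory: an `.xlsx` file is a ZIP container, so the standard library can check whether the download is at least a readable archive. This is a minimal sketch, and the helper name `looks_like_valid_xlsx` is hypothetical. It only validates the ZIP container, not the workbook contents.

```python
import zipfile
from pathlib import Path


def looks_like_valid_xlsx(path: Path) -> bool:
    """Return True if the file is a readable ZIP archive with no corrupt members."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```

A file that passes this check can still fail to open in a spreadsheet reader, but a file that fails it is almost certainly a truncated or corrupted download.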

The rest is just suggestions about naming conventions.

@@ -1762,6 +1766,10 @@
"type": "boolean",
"description": "Is the reporting entity an owner of power plants reported on Schedule 2 of the form?",
},
"pm_control_id_eia": {
Member

We talked about using 'particulate' instead of 'pm' in the 860 PRs to avoid prime mover confusion. I think we should be consistent and name this particulate_control_id_eia, which is what is currently used in the 860 extract tables (but not yet present in fields.py as we haven't transformed them yet).

report_year,year,year,year,year,year,year,year,year,year,year
plant_id_eia,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id,plant_id
environmental_equipment_name,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type,name_of_environmental_equipment_or_technology_type
pm_control_id_eia,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id,pm_control_id
Member

See comment above re: naming.

hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service,hours_in_service
annual_nox_emission_rate_lb_per_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu,nox_emission_rate_entire_year_lbs_mmbtu
ozone_season_nox_emission_rate_lb_per_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu,nox_emission_rate_may_through_september_lbs_mmbtu
pm_emission_rate_lb_per_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu,pm_emissions_rate_lbs_mmbtu
Member

Same goes here and for all the records below

so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date,so2_test_date
fgd_sorbent_consumption_1000_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons,fgd_sorbent_quantity_thousand_tons
fgd_electricity_consumption_mwh,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours,fgd_electricity_consumption_megawatthours
hg_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency,mercury_removal_efficiency
Member

Would be nice to have consistency and use either mercury or hg in our codes. So far, we've been using mercury (e.g. compliance_year_mercury).
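
The naming suggestions in this review (spell out `particulate` instead of `pm`, and `mercury` instead of `hg`) amount to a prefix substitution on the PUDL column names. A minimal sketch, assuming the abbreviation only appears as a leading prefix; the function name is hypothetical, and the names finally adopted in the PR may differ from what this produces.

```python
# Prefix substitutions suggested in this review.
PREFIX_RENAMES = {"pm_": "particulate_", "hg_": "mercury_"}


def standardize_pollutant_prefix(column: str) -> str:
    """Replace an abbreviated pollutant prefix with its spelled-out form."""
    for old, new in PREFIX_RENAMES.items():
        if column.startswith(old):
            return new + column[len(old):]
    # Columns without a known abbreviated prefix are returned unchanged.
    return column
```

For example, `standardize_pollutant_prefix("pm_control_id_eia")` gives `particulate_control_id_eia`, which matches the name proposed above.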

@zaneselvans
Member Author

Ah yeah you're right I should update all the names. I just took the ones that were here already.

Weirdly, my issue with the 2018 data magically disappeared! I thought it must have been something screwy with my local setup, but if you're getting it now too then I don't know what's going on!

@e-belfer
Member

One more weirdness while I'm here! I think the 923 archives aren't a part of the Catalyst organization repo on Zenodo. If I trace the doi for the record I find them here:
https://zenodo.org/record/7682417

But they don't show up in this list: https://zenodo.org/communities/catalyst-cooperative/search?page=1&size=20#

Is there a reason for this, or is this just an oversight? If so, I can make a separate issue.

@zaneselvans
Member Author

No, there's no reason it's not in the community except that adding it to the community has to be done manually, rather than happening automatically when we create a new archive. See catalyst-cooperative/pudl-archiver#76

I went ahead and added it!

Also updated the naming to match what we did in EIA-860 EnviroEquip.

@e-belfer
Member

e-belfer commented Mar 27, 2023

Re the 2018 data: In the issue you mentioned that using the most recent archive would require more retooling. Is this because we'd need to either update the data source for PUDL across the board (and probably break some other stuff) or write in a one-off exception for the source for this archive? I'd assumed that PUDL grabs the most recent archive by default, but maybe that's not the case? I'm a bit hesitant to merge in an extraction function with the extraction itself commented out.

Deleting the raw data and redownloading from Zenodo didn't fix the issue. Did you change anything else in your local environment to fix the issue?

Otherwise running this for 2012-2017 on my local worked great.

@zaneselvans
Member Author

Which archive is used is hard-coded in pudl.workspace.datastore. Basically every update for every data source breaks the ETL because FERC & EIA are not even attempting to be consistent, so we can only update to new archives by hand. For instance, when I tried to use the more recent EIA-923 archive, I discovered that they've dropped the FIPS ID column for coalmines in the Fuel Receipts and Costs table (just between the "final" release of 2021 data and the revisions to that "final" data that have been made since October). There is no way to include data from more than one archive.
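
The kind of breakage described here (a column silently disappearing between two archive versions) can be surfaced before an update with a simple column comparison. This is a sketch under stated assumptions, not part of PUDL: the function `diff_columns` and the table and column names in the usage note are hypothetical.

```python
def diff_columns(
    old: dict[str, set[str]], new: dict[str, set[str]]
) -> dict[str, dict[str, set[str]]]:
    """Report columns dropped from or added to each table between two archives."""
    report = {}
    for table in old.keys() | new.keys():
        dropped = old.get(table, set()) - new.get(table, set())
        added = new.get(table, set()) - old.get(table, set())
        # Only tables whose columns changed appear in the report.
        if dropped or added:
            report[table] = {"dropped": dropped, "added": added}
    return report
```

Given an old archive whose `fuel_receipts_costs` table has a `coalmine_county_id_fips` column and a new archive that lacks it, the report would list that column under `dropped` for that table and omit unchanged tables.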

One thing that I have been messing around with locally is using Python 3.11. So maybe this problem is magically fixed in 3.11?! Seems unlikely.

I think it would be okay (though definitely not ideal) to merge this in now with the emissions_control page commented out in the extraction just to get the metadata into the system and not have it be dangling in a limbo-PR. I suspect that the next time we update the eia923 DOI it'll work just fine with no weirdness.

Member

@e-belfer e-belfer left a comment

With the rollback of the commenting out for now, this looks good to merge. I'd suggest leaving a comment in-line both in the assets and in the extraction so nobody's confused about why we have an empty asset, though!

@zaneselvans
Member Author

I've commented out the emissions_control asset, but I'm not sure why this would prevent it from attempting to extract the emissions_control page. But I also can't make the page fail locally any more so it's hard for me to test what's going on.

Where are the 2 places (assets + extraction) that you're thinking of here?

Maybe I need to add emissions_control to the blacklisted_pages in the EIA923 excel extractor?

Base automatically changed from dagster-eia861 to dev March 27, 2023 19:14
@e-belfer
Member

Adding emissions_control to the blacklist did the trick. Should be good to go!
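
For readers unfamiliar with the pattern, a page blacklist simply tells the extractor to skip named pages instead of trying to read them. The toy classes below illustrate the idea only; they are not PUDL's actual extractor classes, and the `generation_fuel` page name in the usage note is a placeholder.

```python
class PageExtractor:
    """Toy stand-in for an Excel extractor; not PUDL's actual class."""

    # Pages listed here are skipped during extraction.
    BLACKLISTED_PAGES: list[str] = []

    def __init__(self, pages: dict[str, list[dict]]):
        self.pages = pages

    def extract(self) -> dict[str, list[dict]]:
        """Return every page except the blacklisted ones."""
        return {
            name: rows
            for name, rows in self.pages.items()
            if name not in self.BLACKLISTED_PAGES
        }


class Eia923Extractor(PageExtractor):
    """Skips emissions_control until the source file problem is resolved."""

    BLACKLISTED_PAGES = ["emissions_control"]
```

With this setup, `Eia923Extractor({"emissions_control": [...], "generation_fuel": [...]}).extract()` returns only the `generation_fuel` page, so the spreadsheet map metadata can live in the codebase while the problematic page is never read.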

@zaneselvans
Member Author

Ah okay great! Thank you! I'll merge once the CI passes. Such a weird issue.

@zaneselvans zaneselvans merged commit b14a8c1 into dev Mar 28, 2023
@zaneselvans zaneselvans deleted the eia923-schedule8 branch March 28, 2023 00:36
Labels
eia923 Anything having to do with EIA Form 923 new-data Requests for integration of new data.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Integrate EIA-923 Annual Environmental Information (Schedule 8)
3 participants