
2020 Harvest and load #1277

Merged: 3 commits, Oct 10, 2021


bendnorman
Member

I made a couple of tweaks to the metadata to get the 2020 data loaded into SQLite:

  • Added 2020 to the row maps in /ferc1_row_maps. I'm not entirely sure how the row maps work, so I'd love a second pair of eyes on them.
  • Added respondent_frequency to the metadata. This is a new column in the EIA-923 data. I added it to the schemas of boiler_fuel_eia923, generators_eia860 and generation_fuel_eia923.
  • Added an empty string to the fuel_unit enum, because a few records in fuel_ferc1.fuel_unit contain empty strings. Should empty strings be replaced with pd.NA instead?
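If empty strings do get replaced with pd.NA, the fuel_unit enum would no longer need an empty-string value. The following is a minimal pandas sketch of that replacement. It assumes fuel_unit can be handled as a nullable string column. The helper name empty_strings_to_na is hypothetical and is not an existing PUDL function.

```python
import pandas as pd


def empty_strings_to_na(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Return a copy of df where empty or whitespace-only strings in col become pd.NA.

    Hypothetical helper, not part of PUDL.
    """
    out = df.copy()
    # The nullable "string" dtype represents missing values as pd.NA.
    s = out[col].astype("string")
    # strip() == "" is <NA> for values that are already missing, so treat those as False.
    is_blank = s.str.strip().eq("").fillna(False)
    out[col] = s.mask(is_blank, pd.NA)
    return out


fuel = pd.DataFrame({"fuel_unit": ["ton", "", "  ", None, "mcf"]})
print(empty_strings_to_na(fuel, "fuel_unit")["fuel_unit"].isna().tolist())
# [False, True, True, True, False]
```

Whitespace-only values are treated as blank here too. Drop the `.str.strip()` call if only exact empty strings should be nulled.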

The branch does not pass the validation or integration tests. Most of the validation failures are in the minmax_row tests.


@zaneselvans left a comment


How did you go about generating the new row numbers for all the FERC 1 row maps? Did you look at the f1_row_lit_tbl to see whether any rows had changed this year, or whether new rows had been added to those tables?
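The check suggested above can be sketched in pandas as a year-over-year comparison of row literals. This is only a sketch. It assumes the row-literal data has been loaded into a DataFrame with the columns sched_table_name, report_year, row_number and row_literal. Those columns are modeled loosely on f1_row_lit_tbl, so verify them against the real table. The function name changed_row_literals is hypothetical.

```python
import pandas as pd


def changed_row_literals(
    row_lit: pd.DataFrame, table: str, old_year: int, new_year: int
) -> pd.DataFrame:
    """List rows of one FERC Form 1 table whose literal changed between two years.

    Hypothetical helper. Assumes columns sched_table_name, report_year,
    row_number and row_literal. Returns rows whose literal differs, plus rows
    that exist in only one of the two years.
    """
    subset = row_lit[row_lit["sched_table_name"] == table]
    old = subset.loc[subset["report_year"] == old_year, ["row_number", "row_literal"]]
    new = subset.loc[subset["report_year"] == new_year, ["row_number", "row_literal"]]
    # An outer join keeps rows that were dropped (left_only) or added (right_only).
    merged = old.merge(
        new, on="row_number", how="outer", suffixes=("_old", "_new"), indicator=True
    )
    changed = (merged["_merge"] != "both") | (
        merged["row_literal_old"] != merged["row_literal_new"]
    )
    return merged[changed].sort_values("row_number").reset_index(drop=True)


rows = pd.DataFrame({
    "sched_table_name": ["f1_example"] * 6,
    "report_year": [2019, 2019, 2019, 2020, 2020, 2020],
    "row_number": [1, 2, 3, 1, 2, 4],
    "row_literal": ["Intangible Plant", "Organization", "Franchises",
                    "Intangible Plant", "Organization Costs", "Miscellaneous"],
})
print(changed_row_literals(rows, "f1_example", 2019, 2020)["row_number"].tolist())
# [2, 3, 4]
```

An empty result for every mapped table would suggest that reusing the previous year's row numbers is safe. Any changed row would need a closer look.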

src/pudl/metadata/fields.py (resolved)
src/pudl/metadata/resources.py (outdated, resolved)
src/pudl/transform/eia860.py (resolved)
So far as I can tell, whether a plant has to respond annually or
monthly depends on the plant's capacity, and we know that capacity can
change over time, so I've put the `respondent_frequency` field into the
annual plants entity table (`plants_eia860`).
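The argument above can be checked directly in the data. If any plant's respondent_frequency takes more than one value across years, the field cannot be stored as a single static plant attribute. This is a minimal pandas sketch under assumptions: the input is one row per plant per year, with columns plant_id_eia, report_date and respondent_frequency. The "A"/"M" codes for annual and monthly respondents are assumed, and the function name is hypothetical.

```python
import pandas as pd


def plants_with_changing_frequency(df: pd.DataFrame) -> list:
    """Return the IDs of plants whose respondent_frequency differs across years.

    Hypothetical helper. Assumes columns plant_id_eia, report_date and
    respondent_frequency, with one row per plant per year.
    """
    # Count the distinct non-null reporting frequencies seen for each plant.
    n_values = df.groupby("plant_id_eia")["respondent_frequency"].nunique()
    # More than one value means the field varies over time for that plant.
    return sorted(n_values[n_values > 1].index.tolist())


plants = pd.DataFrame({
    "plant_id_eia": [1, 1, 2, 2],
    "report_date": pd.to_datetime(["2019-01-01", "2020-01-01", "2019-01-01", "2020-01-01"]),
    "respondent_frequency": ["A", "M", "A", "A"],  # assumed codes: A = annual, M = monthly
})
print(plants_with_changing_frequency(plants))
# [1]
```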

Note that some of these changes will need to be reconciled with the
`dedupe-metadata` branch when it gets merged in, since where the
metadata is being stored has changed over there.

It looks like we're now running into the fact that some plants and
utilities haven't been mapped for 2020, because the FERC 1 tables are
failing with foreign key constraint errors.
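Before loading, one way to list the records that would trip these foreign key errors is an anti-join between the FERC 1 plant records and the mapping. This is a sketch only. The two DataFrames and the default key columns (utility_id_ferc1, plant_name_ferc1) are assumptions about how the records and the mapping would be laid out. The function name unmapped_plants is hypothetical.

```python
import pandas as pd


def unmapped_plants(
    plants: pd.DataFrame,
    mapping: pd.DataFrame,
    keys=("utility_id_ferc1", "plant_name_ferc1"),
) -> pd.DataFrame:
    """Return distinct plant keys in plants that have no entry in mapping.

    Hypothetical helper. Both frames are assumed to contain the key columns.
    """
    keys = list(keys)
    # Left-join each distinct plant key onto the mapping. indicator=True adds a
    # _merge column that records which side each row came from.
    merged = plants[keys].drop_duplicates().merge(
        mapping[keys].drop_duplicates(), on=keys, how="left", indicator=True
    )
    # left_only rows have no mapping entry and would violate the foreign key on load.
    return merged.loc[merged["_merge"] == "left_only", keys].reset_index(drop=True)


plants = pd.DataFrame({"utility_id_ferc1": [1, 1, 2, 1], "plant_name_ferc1": ["a", "b", "c", "a"]})
mapping = pd.DataFrame({"utility_id_ferc1": [1, 2], "plant_name_ferc1": ["a", "c"]})
print(unmapped_plants(plants, mapping))
```

Here only utility 1's plant "b" is reported, since it has no mapping entry.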
@zaneselvans
Member

With these changes, the 2020 branch now gets through harvesting and fails when attempting to load as-yet unmapped FERC plants. I've given the IDs to @swinter2011 to map on his train ride home from CO (see #1069), so I'm gonna merge this in.

@zaneselvans zaneselvans merged commit 9e711eb into 2020 Oct 10, 2021
@zaneselvans zaneselvans deleted the 2020-eia-transform branch October 10, 2021 14:58
@bendnorman
Member Author

I used 2019's row numbers for 2020.
