Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge EIA861 short-form transform #3660

Merged
merged 45 commits into from
Jun 15, 2024
Merged

Merge EIA861 short-form transform #3660

merged 45 commits into from
Jun 15, 2024

Conversation

zaneselvans
Copy link
Member

Overview

Closes #3540 and replaces @Nancy9ice PR #3565 to get all the CI to run because permissions are hard.

Testing

How did you make sure this worked? How can a reviewer verify this?

To-do list

Edit tasklist title
Beta Give feedback Tasklist To-do list, more options

Delete tasklist

Delete tasklist block?
Are you sure? All relationships in this tasklist will be removed.
  1. If updating analyses or data processing functions: make sure to update or write data validation tests (e.g., test_minmax_rows())
    Options
  2. Update the release notes: reference the PR and related issues.
    Options
  3. Ensure docs build, unit & integration tests, and test coverage pass locally with make pytest-coverage (otherwise the merge queue may reject your PR)
    Options
  4. Review the PR yourself and call out any questions or issues you have
    Options
  5. For minor ETL changes or data additions, once make pytest-coverage passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and run relevant data validation tests using pytest and --live-dbs.
    Options
  6. For significant ETL, data coverage or analysis changes, once make pytest-coverage passes, ensure the full ETL runs locally and run data validation tests using make pytest-validate (a ~10 hour run). If you can't run this locally, run the build-deploy-pudl GitHub Action (or ask someone with permissions to). Then, check the logs on the #pudl-deployments Slack channel or gs://builds.catalyst.coop.
    Options
Loading

Nancy9ice and others added 30 commits April 13, 2024 19:18
This reverts commit 574dbbc.

remove index files
@zaneselvans zaneselvans added this pull request to the merge queue Jun 3, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jun 3, 2024
@zaneselvans zaneselvans added this pull request to the merge queue Jun 3, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 3, 2024
@zaneselvans
Copy link
Member Author

zaneselvans commented Jun 3, 2024

Okay @aesharpe @Nancy9ice it looks like we've got some real integration test failures here:

FAILED test/integration/etl_test.py::test_pudl_engine - pudl.etl.check_foreign_keys.ForeignKeyErrors: Foreign key error for table: core_eia861__yearly_short_form -- core_eia__codes_balancing_authorities (code) -- on rows [    1     2     3 ... 11343 11982 12515]

Foreign key error for table: core_eia861__yearly_short_form -- core_eia__entity_utilities (utility_id_eia) -- on rows [    2     3     4 ... 13013 13014 13015]
FAILED test/integration/output_test.py::test_outputs_by_table_suffix[eia861] - ValueError: Found null column: short_form_eia861.num_water_heaters

@Nancy9ice
Copy link
Contributor

Okay @aesharpe @Nancy9ice it looks like we've got some real integration test failures here:

FAILED test/integration/etl_test.py::test_pudl_engine - pudl.etl.check_foreign_keys.ForeignKeyErrors: Foreign key error for table: core_eia861__yearly_short_form -- core_eia__codes_balancing_authorities (code) -- on rows [    1     2     3 ... 11343 11982 12515]

Foreign key error for table: core_eia861__yearly_short_form -- core_eia__entity_utilities (utility_id_eia) -- on rows [    2     3     4 ... 13013 13014 13015]
FAILED test/integration/output_test.py::test_outputs_by_table_suffix[eia861] - ValueError: Found null column: short_form_eia861.num_water_heaters

Okay, I'll look at it today. Is it the usual make pytest coverage command that I'll run to see if my changes solves this error?

@jdangerx
Copy link
Member

jdangerx commented Jun 7, 2024

Hi Nancy! We've been meeting in person for the last couple of days, thanks for your patience. Yes, make pytest-coverage should reflect what gets run in CI.

@Nancy9ice
Copy link
Contributor

Hi Nancy! We've been meeting in person for the last couple of days, thanks for your patience. Yes, make pytest-coverage should reflect what gets run in CI.

I understand. I've looked at the errors and it's because the values in the utility_id_eia and the balancing_authority_code_eia don't match values in the tables that also have these columns which is a bit odd.

Also, there are null values in the num_water_heater columns. For this one, should I change the NULL values to zero instead?

@zaneselvans
Copy link
Member Author

With the all NULL values for num_water_heaters... is it because that column doesn't exist or contain data in the years of data that are processed in the integration tests? Or is it accidentally getting nulled in the transform somehow? If you run the full ETL locally, is it also entirely NULL? It could also be a column naming error somewhere such that the real data is put into a column that's dropped when the database schema is enforced.

For the FK constraints, we might need to add some additional exceptions to the rules that are used to generate the constraints, since none of the EIA-861 tables are harvested -- i.e. they aren't used as the raw materials to construct the EIA Utility Entity table. I'm not sure where the balancing authority code FK is pointing though, since we don't have an entity table for the BAs. It might be that that column needs to be encoded / cleaned up in the short form table if it's got some bad / non-standard values in it.

I think @cmgosnell is going to pop in and take a look.

@Nancy9ice
Copy link
Contributor

With the all NULL values for num_water_heaters... is it because that column doesn't exist or contain data in the years of data that are processed in the integration tests? Or is it accidentally getting nulled in the transform somehow? If you run the full ETL locally, is it also entirely NULL? It could also be a column naming error somewhere such that the real data is put into a column that's dropped when the database schema is enforced.

For the FK constraints, we might need to add some additional exceptions to the rules that are used to generate the constraints, since none of the EIA-861 tables are harvested -- i.e. they aren't used as the raw materials to construct the EIA Utility Entity table. I'm not sure where the balancing authority code FK is pointing though, since we don't have an entity table for the BAs. It might be that that column needs to be encoded / cleaned up in the short form table if it's got some bad / non-standard values in it.

I think @cmgosnell is going to pop in and take a look.

For the num_water_heaters, if you check the raw data files, they are actually NULL. For example, all the water_heater rows in the 2022 short form table are NULL

@cmgosnell cmgosnell self-assigned this Jun 13, 2024
@cmgosnell
Copy link
Member

hey @Nancy9ice ! thanks for this insight! I went ahead and ran the CI tests locally and got the same results. I also generated this table with all of the years and it looks like this column is always null! So I'm going to add a defensive check in the transform to check if its always null and then remove it from the table.

And the FK problem is very expected with the 861 tables. This is an example of a quirk that don't have great documentation for ;-)

I'll push some changes momentarily and hopefully we'll get this merged in.

@cmgosnell cmgosnell added this pull request to the merge queue Jun 14, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 14, 2024
@zaneselvans zaneselvans added this pull request to the merge queue Jun 15, 2024
Merged via the queue into main with commit baa47ea Jun 15, 2024
12 checks passed
@zaneselvans zaneselvans deleted the dude-wtf branch June 15, 2024 06:09
@zaneselvans zaneselvans mentioned this pull request Jun 16, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
eia861 Anything having to do with EIA Form 861 new-data Requests for integration of new data.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Transform the EIA-861 short form table
5 participants