Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add business_model, service_type to sales_eia861 PK #2637

Merged
merged 4 commits into from
Jun 8, 2023

Conversation

zaneselvans
Copy link
Member

@zaneselvans zaneselvans commented Jun 6, 2023

PR Overview

We were using an incomplete set of columns as the primary key in the sales_eia861 table, but didn't end up with any duplicates because those additional values in those columns were being lost in a reshaping operation. This problem was reported in #2636 by @christiantfong.

I've added business_model and service_type to the PK for this table, resulting in nearly 20,000 additional rows in the table.

We should add some kind of check into the pudl.transform.eia861._tidy_class_dfs() function to catch this type of error, since it could be present in other EIA-861 tables, and this function is used 20 times... See #2638

I clobbered all the alembic migrations because for some reason just changing the PK fields was not resulting in a change to the primary key definition for the table -- instead all it did was make the new PK fields non-nullable -- so I kept getting duplicate primary key errors (since the new PK fields weren't being used).

Closes #2636

PR Checklist

  • Merge the most recent version of the branch you are merging into (probably dev).
  • All CI checks are passing. Run tests locally to debug failures
  • Make sure you've included good docstrings.
  • For major data coverage & analysis changes, run data validation tests
  • Include unit tests for new functions and classes.
  • Defensive data quality/sanity checks in analyses & data processing functions.
  • Update the release notes and reference reference the PR and related issues.
  • Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

@zaneselvans zaneselvans added the eia861 Anything having to do with EIA Form 861 label Jun 6, 2023
@zaneselvans zaneselvans linked an issue Jun 6, 2023 that may be closed by this pull request
@zaneselvans zaneselvans added this to the 2023 Spring milestone Jun 6, 2023
@codecov
Copy link

codecov bot commented Jun 7, 2023

Codecov Report

Patch coverage: 100.0% and no project coverage change.

Comparison is base (e2f29aa) 86.9% compared to head (fe395c9) 86.9%.

Additional details and impacted files
@@          Coverage Diff          @@
##             dev   #2637   +/-   ##
=====================================
  Coverage   86.9%   86.9%           
=====================================
  Files         84      84           
  Lines       9720    9720           
=====================================
  Hits        8447    8447           
  Misses      1273    1273           
Impacted Files Coverage Δ
src/pudl/metadata/resources/eia861.py 100.0% <ø> (ø)
src/pudl/transform/eia861.py 97.3% <100.0%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@e-belfer e-belfer self-requested a review June 7, 2023 15:43
@e-belfer
Copy link
Member

e-belfer commented Jun 7, 2023

@jdangerx Are there any bad consequences to clobbering the migrations that we should consider, other than other active PRs needing to reset their alembic history?

@jdangerx
Copy link
Member

jdangerx commented Jun 8, 2023

I think this will be OK! We're using the migrations primarily to speed up development, not because we have a production database that needs in-place schema changes - so the worst case scenario is that people have to re-generate their databases from scratch again.

Copy link
Member

@e-belfer e-belfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-ran the fast ETL from scratch with the new migration. All looks as expected and the PK is unique. Along with the PK assertion discussed in #2638, would probably be good to add the EIA 861 expected rows into our validations at some point to make it easier to catch changes in the outputs here. This is definitely out of scope of this issue, though!

@zaneselvans zaneselvans merged commit d8311ac into dev Jun 8, 2023
10 checks passed
zaneselvans added a commit that referenced this pull request Jun 8, 2023
The `sales_eia861` and `demand_response_eia861` tables each have a
handful of duplicate primary keys due to NA values in the
`balancing_authority_code_eia` column. Quantify and log the extent of
this problem, and consolidate the data in the duplicated records if they
constitute less than 0.5% of all records in the table.

This check would also have caught the incorrect primary key columns
reported in #2636 and fixed in #2637.

Because there were so few duplicate records, I decided to just
consolidate them all (with a hard limit on the fraction of records that
could be consolidated) rather that requiring that the only duplication
be due to the BA Code column.

Closes #2638
@zaneselvans zaneselvans deleted the fix-sales-eia861-pk branch September 12, 2023 06:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
eia861 Anything having to do with EIA Form 861
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Missing delivery utilities in sales_eia861
3 participants