Add business_model, service_type to sales_eia861 PK #2637
Codecov Report
Patch coverage:
@@ Coverage Diff @@
## dev #2637 +/- ##
=====================================
Coverage 86.9% 86.9%
=====================================
Files 84 84
Lines 9720 9720
=====================================
Hits 8447 8447
Misses 1273 1273
@jdangerx Are there any bad consequences to clobbering the migrations that we should consider, beyond other active PRs needing to reset their alembic history?
I think this will be OK! We're using the migrations primarily to speed up development, not because we have a production database that needs in-place schema changes, so the worst-case scenario is that people have to regenerate their databases from scratch.
Re-ran the fast ETL from scratch with the new migration. Everything looks as expected and the PK is unique. Along with the PK assertion discussed in #2638, it would probably be good to add the EIA 861 expected row counts to our validations at some point, to make it easier to catch changes in the outputs here. This is definitely out of scope for this issue, though!
The `sales_eia861` and `demand_response_eia861` tables each have a handful of duplicate primary keys due to NA values in the `balancing_authority_code_eia` column. Quantify and log the extent of this problem, and consolidate the data in the duplicated records if they constitute less than 0.5% of all records in the table. This check would also have caught the incorrect primary key columns reported in #2636 and fixed in #2637. Because there were so few duplicate records, I decided to just consolidate them all (with a hard limit on the fraction of records that could be consolidated) rather than requiring that the only duplication be due to the BA Code column. Closes #2638
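The check described above could look roughly like the following. This is a minimal, dependency-free sketch rather than the code in the PR: the function name `consolidate_duplicate_pks`, the `max_dupe_frac` parameter, and the choice to keep the first non-null value per column when merging are all assumptions made for illustration, and it works on plain lists of dicts instead of dataframes.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)


def consolidate_duplicate_pks(records, pk_cols, max_dupe_frac=0.005):
    """Merge records sharing a primary key, if such records are rare enough.

    Hypothetical helper: ``records`` is a list of dicts, ``pk_cols`` names the
    primary key columns, and ``max_dupe_frac`` is the hard limit on the
    fraction of records that may be consolidated (0.5% here).
    """
    # Group records by their primary key values. None (NA) is a valid key part.
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[col] for col in pk_cols)].append(rec)

    # Quantify and log how many records share a key with another record.
    n_dupes = sum(len(group) for group in groups.values() if len(group) > 1)
    dupe_frac = n_dupes / len(records) if records else 0.0
    logger.info(
        "%d of %d records (%.2f%%) share a primary key",
        n_dupes, len(records), 100 * dupe_frac,
    )
    if dupe_frac >= max_dupe_frac:
        raise ValueError(
            f"{dupe_frac:.2%} of records have duplicate primary keys; "
            f"the limit is {max_dupe_frac:.2%}"
        )

    # Assumed merge rule: keep the first non-null value seen for each column.
    consolidated = []
    for group in groups.values():
        merged = {}
        for rec in group:
            for col, val in rec.items():
                if merged.get(col) is None:
                    merged[col] = val
        consolidated.append(merged)
    return consolidated
```

Raising when the limit is reached, instead of silently merging, means a wrong primary key (which would produce many duplicates) fails loudly rather than being papered over.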
PR Overview
We were using an incomplete set of columns as the primary key in the `sales_eia861` table, but didn't end up with any duplicates because the additional values in those columns were being lost in a reshaping operation. This problem was reported in #2636 by @christiantfong.
I've added `business_model` and `service_type` to the PK for this table, resulting in nearly 20,000 additional rows in the table.
We should add some kind of check into the `pudl.transform.eia861._tidy_class_dfs()` function to catch this type of error, since it could be present in other EIA-861 tables, and this function is used 20 times... See #2638
I clobbered all the alembic migrations because for some reason just changing the PK fields was not resulting in a change to the primary key definition for the table -- instead all it did was make the new PK fields non-nullable -- so I kept getting duplicate primary key errors (since the new PK fields weren't being used).
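To make the failure mode concrete, here is a small sketch of how a reshape keyed on an incomplete set of columns can drop rows without ever producing a duplicate key, and how a simple check on the input would flag it. The helper `find_duplicate_keys`, the abbreviated key column lists, and the sample values are hypothetical; this is not the logic inside `_tidy_class_dfs()`.

```python
from collections import Counter


def find_duplicate_keys(records, key_cols):
    """Return {key: count} for every key that appears more than once."""
    counts = Counter(tuple(rec[col] for col in key_cols) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}


# Two made-up rows that differ only in service_type.
rows = [
    {"utility_id_eia": 1, "report_date": "2020-01-01",
     "business_model": "retail", "service_type": "bundled", "sales_mwh": 10.0},
    {"utility_id_eia": 1, "report_date": "2020-01-01",
     "business_model": "retail", "service_type": "delivery", "sales_mwh": 4.0},
]

# Abbreviated, illustrative key definitions (the real PK has more columns).
old_pk = ["utility_id_eia", "report_date"]
new_pk = old_pk + ["business_model", "service_type"]

# A reshape keyed on the incomplete key keeps only the last row per key,
# so the output has unique keys even though data was lost.
reshaped = {tuple(row[col] for col in old_pk): row for row in rows}
lost_rows = len(rows) - len(reshaped)

# Checking the input against each candidate key exposes the problem.
dupes_under_old_pk = find_duplicate_keys(rows, old_pk)
dupes_under_new_pk = find_duplicate_keys(rows, new_pk)
```

Running a check like this on the data before it is reshaped, rather than on the output, is what would catch the error, since the output of a lossy reshape always looks unique.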
Closes #2636