Validate v0.5.0 #1345
* Updated the expected row counts for the EIA, FERC, and MCOE output tables.
* Added `gfn_eia923` (nuclear generation fuel) to the outputs, with expected row counts.
* Added `all_plants_ferc1` to the outputs, with expected row counts.
* Added `bf_eia923`, `gf_eia923`, and `gfn_eia923` to the outputs expected to have unique primary keys (based on the work we've done to clean those tables up).
* Changed the monthly/annual aggregation of `gfn_eia923` so that it includes the `nuclear_unit_id` field in the groupby.
* Alphabetized all of the lists of tables so that when we change them it's clearer what has been changed.
* Updated the Datasette DB descriptions to include the old and new years that have been added.
* Un-hid some previously hidden tables in the Datasette DB description, since they are more informative now with the coding setup we've got.
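As a rough illustration of why `nuclear_unit_id` belongs in the groupby, here is a minimal pandas sketch of an annual aggregation. The toy data, the `aggregate_gfn_annual` helper, and the assumption that the table carries `plant_id_eia`, `report_date`, and `net_generation_mwh` columns are all hypothetical; this is not the PUDL implementation.

```python
import pandas as pd

# Hypothetical monthly nuclear generation-fuel records: one plant, two units.
gfn = pd.DataFrame({
    "plant_id_eia": [1, 1, 1, 1],
    "nuclear_unit_id": ["1", "1", "2", "2"],
    "report_date": pd.to_datetime(
        ["2020-01-01", "2020-02-01", "2020-01-01", "2020-02-01"]
    ),
    "net_generation_mwh": [100.0, 110.0, 90.0, 95.0],
})


def aggregate_gfn_annual(df: pd.DataFrame) -> pd.DataFrame:
    """Sum generation per plant, unit, and year (hypothetical helper).

    Keeping nuclear_unit_id in the groupby preserves one row per unit, so
    (plant_id_eia, nuclear_unit_id, report_year) stays unique.
    """
    return (
        df.assign(report_year=df["report_date"].dt.year)
        .groupby(["plant_id_eia", "nuclear_unit_id", "report_year"], as_index=False)[
            "net_generation_mwh"
        ]
        .sum()
    )


annual = aggregate_gfn_annual(gfn)
print(annual)
```

Dropping `nuclear_unit_id` from the groupby would collapse both units into a single plant-year row, losing the unit-level detail.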
Codecov Report
@@ Coverage Diff @@
## dev #1345 +/- ##
==========================================
+ Coverage 83.23% 83.29% +0.06%
==========================================
Files 62 62
Lines 6731 6737 +6
==========================================
+ Hits 5602 5611 +9
+ Misses 1129 1126 -3
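The coverage percentages above can be reproduced from the hit and line counts. This sketch assumes coverage is simply hits divided by tracked lines, i.e. that there are no partially covered lines; that matches this report, where hits plus misses equals lines in both columns.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Percent of tracked lines that were hit, rounded like the report."""
    return round(100 * hits / lines, 2)


base = coverage_pct(5602, 6731)  # dev
head = coverage_pct(5611, 6737)  # #1345
delta = round(head - base, 2)

# Misses are the tracked lines that were not hit.
misses_base = 6731 - 5602
misses_head = 6737 - 5611

print(base, head, delta, misses_base, misses_head)
```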
Outside of the number of rows, it looks like you alphabetized all of the output tables in the tests, added the nuclear gf tables, and added the `if nuclear` check in the gf output functions. It is hard to tell which tables you changed the PKs for, but I looked through all of the ones you have here now and they all look reasonable. I left a comment that is potentially a suggestion for the future, but it all looks reasonable enough for now.
("plants_eia860", "all"),
("utils_eia860", "all"),
("pu_eia860", "all"),
("bf_eia923", "all"),
Are these just reordered so that whatever failure you were getting shows up early, or are they all now alphabetized?
I alphabetized these lists, and also added new output tables to them, like gfn_eia923 and all_plants_ferc1.
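One way to keep these lists alphabetized as tables are added is a small guard test. The `OUTPUT_TABLES` name and its contents below are a hypothetical excerpt, not the actual test module; sorting the `(table, frequency)` tuples orders by table name first, then by frequency.

```python
# Hypothetical excerpt of a parametrized (table, frequency) list.
OUTPUT_TABLES = [
    ("bf_eia923", "all"),
    ("gfn_eia923", "all"),
    ("plants_eia860", "all"),
    ("pu_eia860", "all"),
    ("utils_eia860", "all"),
]


def test_output_tables_are_alphabetized():
    # Fails as soon as someone appends a table out of order.
    assert OUTPUT_TABLES == sorted(OUTPUT_TABLES)


test_output_tables_are_alphabetized()
```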
"report_year",
"utility_id_ferc1",
"plant_name_ferc1",
"capacity_mw" # Why does having capacity here make sense???
I assume there are duplicates based on "report_year", "utility_id_ferc1", and "plant_name_ferc1", and the capacity distinguishes them. This makes me wonder why we are compiling all of these lists of columns that are, in effect, the PKs of the tables. Why wouldn't we just pull them directly from the metadata, and adjust them when necessary, since obviously these are not all analogous to DB tables?
The intention of this comment is to maybe lead to an issue for a future enhancement, not to lead to changes in this PR/release.
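A sketch of what pulling the keys from the metadata might look like, with per-table overrides for outputs that aren't 1:1 with a DB table. `METADATA_PKS`, `key_columns`, and `duplicate_key_rows` are hypothetical names, and the key columns listed are illustrative rather than copied from the real metadata. Tables with no known key (like the keyless FERC 1 tables) are skipped rather than given a made-up key.

```python
import pandas as pd

# Hypothetical stand-in for primary keys read from the metadata.
METADATA_PKS = {
    "gfn_eia923": ["plant_id_eia", "nuclear_unit_id", "report_date"],
}


def key_columns(table, overrides=None):
    """Return the columns to check, or None if the table has no natural key.

    An explicit override wins; otherwise fall back to the metadata.
    """
    overrides = overrides or {}
    if table in overrides:
        return overrides[table]
    return METADATA_PKS.get(table)


def duplicate_key_rows(df, table, overrides=None):
    """Return every row whose key values appear more than once."""
    cols = key_columns(table, overrides)
    if cols is None:
        return df.iloc[0:0]  # keyless table: nothing to check
    return df[df.duplicated(subset=cols, keep=False)]


gfn = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "nuclear_unit_id": ["1", "1", "2"],
    "report_date": pd.to_datetime(["2020-01-01"] * 3),
})
dupes = duplicate_key_rows(gfn, "gfn_eia923")
print(dupes)
```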
Previously we didn't have many of the primary keys actually encoded. Yes, now that we've got them available in the metadata, I think this can be done automatically (for the EIA stuff, not for FERC 1 which is still keyless).
In these comments I'm questioning whether checking for uniqueness even makes sense: these tables don't have natural primary keys, and I don't know exactly what the argument is for using capacity to fake one.
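A toy example of the concern: two hypothetical FERC 1 records share the year, utility, and plant name, so the three-column key is not unique, but adding `capacity_mw` makes the check pass. Capacity is a measured value rather than an identifier, so the check would break again if two such records happened to report the same capacity.

```python
import pandas as pd

# Hypothetical records: one utility reports the same plant name twice in a year.
plants = pd.DataFrame({
    "report_year": [2020, 2020],
    "utility_id_ferc1": [7, 7],
    "plant_name_ferc1": ["riverside", "riverside"],
    "capacity_mw": [50.0, 120.0],
})


def is_unique(df, cols):
    """True if no two rows share the same values in cols."""
    return not df.duplicated(subset=cols).any()


base_key = ["report_year", "utility_id_ferc1", "plant_name_ferc1"]
print(is_unique(plants, base_key))                    # False
print(is_unique(plants, base_key + ["capacity_mw"]))  # True
```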
Looks good to me. Thank you for catching the missing code for the gfn table.
There are several changes here beyond the number of rows we expect to see in each table that I would like some other eyeballs on quickly, including the addition of new output tables and the way we check for primary keys that uniquely identify records in the aggregated output tables.