Validate v0.5.0 #1345

zaneselvans · 2021-11-11T18:04:43Z

There are several changes here beyond the number of rows that we expect to see in each table which I would like to get some other eyeballs on quickly, including addition of new output tables and the way that we are looking for primary keys that uniquely identify records in the aggregated output tables.

* Updated the expected row counts for EIA, FERC, and MCOE output tables. * Added gfn_eia923 (nuclear generation fuel) to outputs with expected row counts * Added all_plants_ferc1 to outputs with expected row counts * Added bf_eia923, gf_eia923, and gfn_eia923 to outputs expected to have unique primary keys (based on work we've done to clean those tables up) * Changed aggregation of gfn_eia923 (monthly/annual) such that it includes the `nuclear_unit_id` field in the groupby. * Alphabetized all of the lists of tables so that when we change them it's clearer which thing has been changed. * Updated Datasette DB descriptions to include the old and new years that have been added. * Un-hid some hidden tables from Datasette DB description since they are more informative now, with the coding setup we've got.

codecov · 2021-11-11T18:22:49Z

Codecov Report

Merging #1345 (0e305b8) into dev (5d91dd4) will increase coverage by 0.06%.
The diff coverage is 62.50%.

❗ Current head 0e305b8 differs from pull request most recent head 2d0bc91. Consider uploading reports for the commit 2d0bc91 to get more accurate results

@@            Coverage Diff             @@
##              dev    #1345      +/-   ##
==========================================
+ Coverage   83.23%   83.29%   +0.06%     
==========================================
  Files          62       62              
  Lines        6731     6737       +6     
==========================================
+ Hits         5602     5611       +9     
+ Misses       1129     1126       -3

Impacted Files	Coverage Δ
src/pudl/output/eia923.py	`95.60% <62.50%> (-1.55%)`	⬇️
src/pudl/analysis/timeseries_cleaning.py	`88.84% <0.00%> (+0.44%)`	⬆️
src/pudl/workspace/datastore.py	`70.45% <0.00%> (+1.62%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5d91dd4...2d0bc91. Read the comment docs.

cmgosnell

o/s of the # of rows, it looks like you alphabetized all of the output tables in the tests, added the nuclear gf tables and added the if nuclear in the gf output functions... It is hard to tell which tables you changed the PKs for but I looked through all of the ones you have here now and they all look reasonable. I made a comment/potentially suggestion for the future but it all looks reasonable enough for now.

cmgosnell · 2021-11-11T18:08:17Z

test/validate/eia_test.py

-        ("plants_eia860", "all"),
-        ("utils_eia860", "all"),
-        ("pu_eia860", "all"),
+        ("bf_eia923", "all"),


are these just reorganized to get whatever failure you were getting to show up early?? or are they all now just alphabetized?

I alphabetized these lists, and also added new output tables to them, like gfn_eia923 and all_plants_ferc1

cmgosnell · 2021-11-11T18:16:29Z

test/validate/ferc1_test.py

+            "report_year",
+            "utility_id_ferc1",
+            "plant_name_ferc1",
+            "capacity_mw"  # Why does having capacity here make sense???


I assume there are duplicates based on the "report_year", "utility_id_ferc1", "plant_name_ferc1" and the cpacity distinguishes them... this makes me wonder why we are compiling all of these lists of columns that are in effect the PKs of the tables. Why wouldn't we just pull these directly from the metadata. And adjust when necessary bc obviously these are not all analogous to db tables.

The intention of this comment is to maybe lead to an issue for a future enhancement - not to lead to changes in this PR/release.

Previously we didn't have many of the primary keys actually encoded. Yes, now that we've got them available in the metadata, I think this can be done automatically (for the EIA stuff, not for FERC 1 which is still keyless).

In these comments I'm questioning whether checking for uniqueness even makes sense -- these tables don't have natural primary keys, and I don't know what exactly the argument is for using capacity to fake one.

src/pudl/output/eia923.py

bendnorman

Look good to me. Thank you for catching the missing code for the gfn table.

src/pudl/output/eia923.py

zaneselvans added 2 commits November 11, 2021 11:13

Set data type for nuclear_unit_id in outputs

0e305b8

zaneselvans requested a review from cmgosnell November 11, 2021 18:04

cmgosnell approved these changes Nov 11, 2021

View reviewed changes

bendnorman approved these changes Nov 11, 2021

View reviewed changes

cmgosnell reviewed Nov 11, 2021

View reviewed changes

src/pudl/output/eia923.py Show resolved Hide resolved

zaneselvans added 2 commits November 11, 2021 12:55

Add generation_fuel_nuclear_eia923 to fast output / frequency tests.

2d0bc91

Added a missing comma, omfg.

fc81078

zaneselvans merged commit 7bb74e9 into dev Nov 11, 2021

zaneselvans deleted the validate-v0.5.0 branch November 11, 2021 19:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate v0.5.0 #1345

Validate v0.5.0 #1345

zaneselvans commented Nov 11, 2021

codecov bot commented Nov 11, 2021 •

edited

cmgosnell left a comment

cmgosnell Nov 11, 2021

zaneselvans Nov 11, 2021

cmgosnell Nov 11, 2021

zaneselvans Nov 11, 2021

bendnorman left a comment

Validate v0.5.0 #1345

Validate v0.5.0 #1345

Conversation

zaneselvans commented Nov 11, 2021

codecov bot commented Nov 11, 2021 • edited

Codecov Report

cmgosnell left a comment

Choose a reason for hiding this comment

cmgosnell Nov 11, 2021

Choose a reason for hiding this comment

zaneselvans Nov 11, 2021

Choose a reason for hiding this comment

cmgosnell Nov 11, 2021

Choose a reason for hiding this comment

zaneselvans Nov 11, 2021

Choose a reason for hiding this comment

bendnorman left a comment

Choose a reason for hiding this comment

codecov bot commented Nov 11, 2021 •

edited