Release v0.4.0 #1087
Conversation
* Flesh out the release notes with changes since the last release and known issues.
* Update data validation criteria to accommodate the new record counts in the many tables that now contain more data.
* Update and comment the databeta.sh script for making quick-and-dirty data releases.
* Add the censusdp1tract SQLite DB to the list of DBs that are accessed directly, rather than regenerated, when you run pytest --live-dbs.
* Change the integration tests to skip epacems_to_parquet when running with --live-dbs.
* Reduce the margin for checking expected row counts to zero, so any change in the number of output rows will now cause validation to fail. Seeing that the results have changed is informative even when the change is smaller than an additional year's worth of data; e.g., fixing the leading zeros on generator IDs changed the number of rows by a few here and there. Once we're running the validations automatically, it'll be good to see these changes appear as a result of code changes, so we know what we're affecting.
- Removed the replaces that were dealing with null values for string types, and added a check to ensure all string types in pc.column_dtypes are the nullable string types.
- Inserted .locs in the zip code zero padding.
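Roughly, the two changes above amount to the following sketch. The `column_dtypes` dict here is a hypothetical stand-in for the real mapping in `pc.column_dtypes`, which is much larger:

```python
import pandas as pd

# Stand-in for pc.column_dtypes; the real mapping covers every PUDL column.
column_dtypes = {
    "plant_id_eia": "Int64",
    "plant_name_eia": "string",
    "zip_code": "string",
}

# Every string-ish column should use pandas' nullable StringDtype ("string"),
# not the legacy "object" dtype, so nulls stay pd.NA and no .replace() calls
# are needed to clean them up afterwards:
assert not any(dtype == "object" for dtype in column_dtypes.values())

# Zero-pad ZIP codes using .loc so the assignment hits the original frame:
df = pd.DataFrame({"zip_code": ["2138", "60606"]}).astype({"zip_code": "string"})
df.loc[:, "zip_code"] = df["zip_code"].str.zfill(5)
```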
Codecov Report

```diff
@@           Coverage Diff            @@
##             dev    #1087     +/-   ##
==========================================
- Coverage   81.43%   81.41%   -0.02%
==========================================
  Files          97       49      -48
  Lines       12109     6089    -6020
==========================================
- Hits         9860     4957    -4903
+ Misses       2249     1132    -1117
```
Added a parametrized test that checks the date frequency of output dataframes, relative to the gens_eia860 dataframe, to ensure that we have annual / monthly dataframes as expected. This test is currently expected to fail because of the bug referenced in issue #1088. Also consolidated some of the "fast output tests" using parametrization, and improved coverage to include all of the potential ferc1, eia860, eia923, and mcoe output tables.
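A rough sketch of what such a parametrized frequency check could look like. The table names match real PUDL outputs, but the data, the `TABLES` dict, and the `report_freq` helper are stand-ins invented for illustration, not the actual test:

```python
import pandas as pd
import pytest

# Hypothetical stand-in data; the real test pulls these from a PudlTabl object.
TABLES = {
    "gens_eia860": pd.DataFrame(
        {"report_date": pd.to_datetime(["2018-01-01", "2019-01-01", "2020-01-01"])}
    ),
    "gen_eia923": pd.DataFrame(
        {"report_date": pd.date_range("2018-01-01", periods=12, freq="MS")}
    ),
}


def report_freq(df):
    """Classify a table as annual ("AS") or monthly ("MS") from its report dates."""
    dates = pd.DatetimeIndex(df["report_date"].unique()).sort_values()
    step = dates.to_series().diff().median()
    return "AS" if step >= pd.Timedelta(days=360) else "MS"


@pytest.mark.parametrize(
    "df_name,expected_freq",
    [("gens_eia860", "AS"), ("gen_eia923", "MS")],
)
def test_report_date_freq(df_name, expected_freq):
    assert report_freq(TABLES[df_name]) == expected_freq
```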
I've got a few questions, but nothing major.
```diff
  .pipe(pv.check_max_rows, expected_rows=expected_rows,
-       margin=0.05, df_name=df_name)
+       margin=0.0, df_name=df_name)
```
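For context, a minimal sketch of what a row-count check with a margin looks like. The real `pudl.validate.check_max_rows` may differ internally; only the `expected_rows` / `margin` / `df_name` parameters are taken from the diff itself:

```python
def check_max_rows(df, expected_rows, margin=0.0, df_name=""):
    """Fail if df has more rows than expected_rows, plus an optional margin.

    With margin=0.0 any growth in row count fails; margin=0.05 tolerates
    up to 5% more rows than expected.
    """
    max_rows = expected_rows * (1.0 + margin)
    if len(df) > max_rows:
        raise ValueError(
            f"{df_name}: found {len(df)} rows, more than the allowed "
            f"{max_rows:.0f} ({expected_rows} expected, margin={margin})"
        )
    return df  # return the dataframe so the check can sit in a .pipe() chain
```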
why turn the margin to 0? that seems extreme given that we know there will be some fluctuation
I think we may often want to know about these consequences though. Like fixing the leading-zeroes in generator IDs does change the number of records in some cases since the aggregations lead to fewer groupings. In this case I also wanted to do it so that I could update the expected number of rows to reflect the actual expected number of rows. It's more a warning that "Hey, something changed! Did you expect it to change? If not, maybe you should look into why it changed."
yeah that makes sense to me. especially in the world in which we have the validation tests running every night.
In the before-times, prior to the nightly validation tests (i.e., now), I think this behavior is going to be hard to interact with. We will tweak things over time, not realize when, and have to play forensic investigator for issues that are probably mostly non-issues.
Yeah this is definitely with an 👁️ toward the aftertimes. I guess I was also thinking that since we run the validation tests so rarely now it's not going to be a common annoyance. And we can look at the numbers and be like "Meh, close enough." if we want to.
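The leading-zero example from this thread is easy to demonstrate. This is a toy illustration, not PUDL code: once `"1"` and `"01"` are normalized to the same generator ID, the aggregation produces fewer groups, so the output row count changes even though no data was added:

```python
import pandas as pd

df = pd.DataFrame({
    "generator_id": ["1", "01", "2"],
    "net_generation_mwh": [10.0, 5.0, 7.0],
})

before = df.groupby("generator_id").sum()  # 3 groups: "01", "1", "2"

# Fix the leading zeros, as described in the discussion above:
df["generator_id"] = df["generator_id"].str.zfill(2)
after = df.groupby("generator_id").sum()   # 2 groups: "01", "02"

assert (len(before), len(after)) == (3, 2)
```

With a zero margin, this kind of deliberate 3-to-2 shift fails the row-count validation and prompts the "did you expect this?" conversation.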
```
This addresses a bunch of unsatisfied foreign key constraints in the original
databases published by FERC.
* We're doing much more software testing and data validation, and so hopefully
  we're catching more issues early on.
```
should we add the (experimental!) net generation allocation?
Please feel free to add a blurb explaining it!
Okay, I added a section. I mostly stole content from the main module docs; feel free to edit or condense.
… into release-v0.4.0
Documentation and data validation tweaks in preparation for releasing 0.4.0.
Please suggest content for the Release Notes!