
Simplify the organization of our test suites #942

Closed
9 tasks done
zaneselvans opened this issue Mar 5, 2021 · 1 comment


zaneselvans commented Mar 5, 2021

While updating the development documentation (#940) on how to run our tests, it became obvious that our current testing setup is more complex than it needs to be, and that the complexity has discouraged people from running the tests, since almost nobody can remember how. There have also been several cases in which someone went to the trouble of writing new tests that never get run automatically, because they weren't put somewhere our CI setup would pick them up.

Rather than documenting a complicated system, I'd like to simplify it a bit, and then document a simple system. Changes I'd like to make include:

  • Add a brief description to each Tox test environment; these can be displayed with `tox -av`.
  • Organize the tests broadly into three categories: software unit tests, software integration tests, and data validation.
  • Make running the "fast" (single-year) tests the default, since that's what we almost always do. Other behavior can be elicited when needed by editing the test settings files.
  • Remove the option of using a pre-existing ferc1 database, since generating a new one for a single year only takes a minute.
  • Use the existing datastore by default; if the --tmp-data flag is set, download fresh data to a temporary directory instead.
  • Read the ferc1_to_sqlite settings out of the test settings file instead of generating them dynamically, which is confusing.
  • Verify that --live_pudl_db still behaves as expected, e.g. by running the output tests against it.
  • Review all of the tests for multi-year tests we need to preserve and run in some other context, e.g. the test that verifies that the automatically generated ferc1 database schemas for all years are mutually compatible. (Note that even if such tests exist, they aren't really being run that way now.)
  • Create an all-years, all-tables test settings file (for ferc1 and eia, but not epacems) and a Tox testenv that uses it, to make it easy to run the whole ETL against everything, and then run the tests against that setup too.
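The first and third bullets above could be sketched in `tox.ini` along these lines (the environment names, paths, and commands here are illustrative assumptions, not the project's actual configuration):

```ini
# Hypothetical tox.ini fragment -- env names, paths, and commands are
# illustrative, not PUDL's real configuration.
[tox]
# Make the fast CI environment the default, so a bare `tox` does the
# thing we almost always want:
envlist = ci

[testenv:unit]
description = Run the software unit tests (no raw data required)
commands = pytest test/unit

[testenv:integration]
description = Run the fast (single-year) software integration tests
commands = pytest test/integration

[testenv:ci]
description = Everything that runs in continuous integration
commands =
    pytest test/unit
    pytest test/integration
```

With `description` set on every environment, `tox -av` lists each one alongside what it does, so nobody has to remember the details.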

See also issue #941 about separating the data validation process from the Tox/pytest setup, since it requires a complete database, and is really a different kind of thing altogether.

@zaneselvans zaneselvans added the testing Writing tests, creating test data, automating testing, etc. label Mar 5, 2021
@zaneselvans zaneselvans self-assigned this Mar 5, 2021
@zaneselvans
Member Author

This all seems to be working well now! Minor updates to the new docs are needed to reflect the current setup, which is much simpler.

There are some tests that do occasionally need to be run against all years of data, and we haven't had a systematic way of ensuring that happens. I will create an all-years settings file as part of the test settings, and a Tox test environment that uses it to check everything together.

zaneselvans added a commit that referenced this issue Mar 9, 2021
In order to reliably run some multi-year tests (e.g. checking that the
database schemas we generate from all of the different years of FERC Form 1
data are mutually compatible) we do sometimes need to run the tests against
the *full* set of all years and tables of data. So I've added a settings
file specifying the full complement of data, which can be used via pytest
with the --etl-settings flag, or by running `tox -e full_integration`.
To check just the FERC 1 db schemas, use `tox -e ferc1_schema` -- this does
require you to have all the FERC 1 raw input data (or it'll get downloaded
for you).

Because we can tell the tests to run whatever ETL we want with --etl-settings
I removed the standalone test_ferc1_solo_etl test, and instead specified the
data that it should attempt to load in a ferc1-solo-test.yml file under
test/settings. This is run prior to the main integration tests by Tox.
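A settings file like the ferc1-solo one described above might look roughly like this sketch (the key names and table names are hypothetical illustrations, not PUDL's actual settings schema):

```yaml
# Hypothetical sketch of a minimal ETL settings file for the tests.
# Key names and table names are illustrative only.
name: ferc1-solo-test
title: Load only FERC Form 1 data, for a single year
datasets:
  ferc1:
    years: [2019]
    tables:
      - fuel_ferc1
      - plants_steam_ferc1
```

Passing a file like this via the `--etl-settings` flag lets the same test code exercise whatever subset of the data we choose.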

Added a couple of basic tests to the ferc1_etl and pudl_engine tests, which
had just been `pass` statements. Now they at least check to see that the
fixture is of type `sa.engine.Engine` and check that a couple of tables which
should always be present appear in the engine.table_names() list.
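Those minimal fixture checks might look something like the following sketch (the fixture names and required tables are assumptions; note also that `engine.table_names()` is deprecated in newer SQLAlchemy, where `sa.inspect()` does the same job):

```python
import sqlalchemy as sa

def check_engine(engine, required_tables):
    """Minimal sanity checks to replace a bare `pass` test."""
    # The fixture should hand us a real SQLAlchemy Engine...
    assert isinstance(engine, sa.engine.Engine)
    # ...and the tables we always expect should actually be present.
    present = set(sa.inspect(engine).get_table_names())
    missing = set(required_tables) - present
    assert not missing, f"expected tables missing: {missing}"

# Demonstration against a throwaway in-memory SQLite database:
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE fuel_ferc1 (record_id TEXT)"))
check_engine(engine, ["fuel_ferc1"])
```

In the real test suite these assertions would live in test functions receiving engine fixtures (e.g. a `pudl_engine` fixture) from conftest.py.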

Made some adjustments to which files are being included in coverage to more
accurately reflect how well we're doing.

Made the `ci` testenv the default -- if you just run `tox` that's what will
get run.  You can run `tox -av` to see the list of all the available test
environments with short descriptions of what they do.

Still need to finish documenting these changes via #940

Closes #942