
Simplify the organization of our test suites #942

Closed
9 tasks done
zaneselvans opened this issue Mar 5, 2021 · 1 comment


zaneselvans commented Mar 5, 2021

While updating the development documentation (#940) on how to run our tests, it became obvious that our current testing setup is more complex than it needs to be, and that the complexity has discouraged people from running the tests, since almost nobody can remember how. There have also been several cases in which someone went to the trouble of writing new tests that never get run automatically, because they weren't put somewhere our CI setup would pick them up.

Rather than documenting a complicated system, I'd like to simplify it a bit, and then document a simple system. Changes I'd like to make include:

  • Add a brief description to each Tox test environment; these can be displayed with `tox -av`.
  • Organize the tests broadly into three categories: software unit tests, software integration tests, and data validation.
  • Make running the "fast" (single-year) tests the default, since that's what we almost always do. Other behavior can be elicited when needed by editing the test settings files.
  • Remove the option of using a pre-existing ferc1 database, since generating a new one for a single year only takes a minute.
  • Use the existing datastore by default; if the --tmp-data flag is set, download fresh data to a temporary directory instead.
  • Read the ferc1_to_sqlite settings out of the test settings file instead of generating them dynamically, which is confusing.
  • Verify that --live_pudl_db still behaves as expected, e.g. by running the output tests against it.
  • Review all of the tests for multi-year tests we need to preserve and run in some other context, e.g. the test that verifies that the automatically generated ferc1 database schemas for all years are mutually compatible. (Note that even if such tests exist, they aren't really being run that way now.)
  • Create an all-years, all-tables test settings file (for ferc1 and eia, but not epacems) and a Tox testenv that uses it, to make it easy to run the whole ETL against everything, and then run the tests against that setup too.
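The first and third bullets above could be sketched in `tox.ini` along these lines (the environment names, paths, and commands here are illustrative assumptions, not the project's actual configuration):

```ini
# Hypothetical tox.ini fragment -- env names, paths, and commands are
# illustrative, not PUDL's real configuration.
[tox]
# Make the fast CI environment the default, so a bare `tox` does the
# thing we almost always want:
envlist = ci

[testenv:unit]
description = Run the software unit tests (no raw data required)
commands = pytest test/unit

[testenv:integration]
description = Run the fast (single-year) software integration tests
commands = pytest test/integration

[testenv:ci]
description = Everything that runs in continuous integration
commands =
    pytest test/unit
    pytest test/integration
```

With `description` set on every environment, `tox -av` lists each one alongside what it does, so nobody has to remember the details.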

See also issue #941 about separating the data validation process from the Tox/pytest setup, since it requires a complete database, and is really a different kind of thing altogether.

@zaneselvans zaneselvans added the testing Writing tests, creating test data, automating testing, etc. label Mar 5, 2021
@zaneselvans zaneselvans self-assigned this Mar 5, 2021
@zaneselvans
Member Author

This all seems to be working well now! Minor updates to the new docs are needed to reflect the current setup, which is much simpler.

There are some tests that do occasionally need to be run against all years of data, and we haven't had a systematic way of ensuring that happens. I will create an all-years settings file as part of the test settings, and a Tox test environment that uses it to check everything together.

zaneselvans added a commit that referenced this issue Mar 9, 2021
In order to reliably run some multi-year tests (e.g. checking that the
database schemas we generate from all of the different years of FERC Form 1
data are mutually compatible) we do sometimes need to run the tests against
the *full* set of all years and tables of data. So I've added a settings
file specifying the full complement of data, which can be used via pytest
with the --etl-settings flag, or by running `tox -e full_integration`.
To check just the FERC 1 db schemas, use `tox -e ferc1_schema` -- this does
require you to have all the FERC 1 raw input data (or it'll get downloaded
for you).

Because we can tell the tests to run whatever ETL we want with --etl-settings
I removed the standalone test_ferc1_solo_etl test, and instead specified the
data that it should attempt to load in a ferc1-solo-test.yml file under
test/settings. This is run prior to the main integration tests by Tox.
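A settings file like the ferc1-solo one described above might look roughly like this sketch (the key names and table names are hypothetical illustrations, not PUDL's actual settings schema):

```yaml
# Hypothetical sketch of a minimal ETL settings file for the tests.
# Key names and table names are illustrative only.
name: ferc1-solo-test
title: Load only FERC Form 1 data, for a single year
datasets:
  ferc1:
    years: [2019]
    tables:
      - fuel_ferc1
      - plants_steam_ferc1
```

Passing a file like this via the `--etl-settings` flag lets the same test code exercise whatever subset of the data we choose.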

Added a couple of basic tests to the ferc1_etl and pudl_engine tests, which
had just been `pass` statements. Now they at least check to see that the
fixture is of type `sa.engine.Engine` and check that a couple of tables which
should always be present appear in the engine.table_names() list.
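Those minimal fixture checks might look something like the following sketch (the fixture names and required tables are assumptions; note also that `engine.table_names()` is deprecated in newer SQLAlchemy, where `sa.inspect()` does the same job):

```python
import sqlalchemy as sa

def check_engine(engine, required_tables):
    """Minimal sanity checks to replace a bare `pass` test."""
    # The fixture should hand us a real SQLAlchemy Engine...
    assert isinstance(engine, sa.engine.Engine)
    # ...and the tables we always expect should actually be present.
    present = set(sa.inspect(engine).get_table_names())
    missing = set(required_tables) - present
    assert not missing, f"expected tables missing: {missing}"

# Demonstration against a throwaway in-memory SQLite database:
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE fuel_ferc1 (record_id TEXT)"))
check_engine(engine, ["fuel_ferc1"])
```

In the real test suite these assertions would live in test functions receiving engine fixtures (e.g. a `pudl_engine` fixture) from conftest.py.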

Made some adjustments to which files are being included in coverage to more
accurately reflect how well we're doing.

Made the `ci` testenv the default -- if you just run `tox` that's what will
get run.  You can run `tox -av` to see the list of all the available test
environments with short descriptions of what they do.

Still need to finish documenting these changes via #940

Closes #942