
Test restructuring #2237

Merged: @mhauru merged 30 commits into master from mhauru/test-restructuring on Jun 5, 2024

Conversation

@mhauru (Collaborator) commented May 28, 2024

This PR makes two orthogonal changes to how tests are run and structured.

  1. All test files included in runtests.jl are made into modules, and imports are moved into those modules. runtests.jl now has minimal imports. This cleans up the namespaces in which the individual tests are run, and makes it clearer where various references in tests point to.
  2. Test files can now be included selectively using keyword arguments. If one runs e.g. Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"]), then only test files that have the substring "optim" or "hmc" in their path will be run, unless they also have one of the substrings "ext" or "nelder" in their path, in which case they are skipped. By default all tests are run. Substring comparisons are case-insensitive. (See the sketch right after this list.)
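
To illustrate the filtering described in point 2, here is a minimal sketch of the idea (not the exact code in the PR; the helper name and example paths are made up):

# Sketch: decide whether a test file should run, given include/skip substrings.
function should_run(path, args)
    skip_idx = findfirst(==("--skip"), args)
    included = skip_idx === nothing ? args : args[1:skip_idx-1]
    excluded = skip_idx === nothing ? String[] : args[skip_idx+1:end]
    p = lowercase(path)
    # With no include patterns every file runs; a match on any exclude pattern skips it.
    wanted = isempty(included) || any(occursin(lowercase(s), p) for s in included)
    skipped = any(occursin(lowercase(s), p) for s in excluded)
    return wanted && !skipped
end

should_run("test/optimisation/Optimisation.jl", ["optim", "--skip", "ext"])  # true
should_run("test/ext/OptimInterface.jl", ["optim", "--skip", "ext"])         # false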

Point 2. makes the @turing_testset and @numerical_testset macros unnecessary, and they will be removed.

The selective running of tests is useful both for local development and for splitting our CI so that tests for various parts of Turing.jl run in parallel (#2179). I'll try implementing the latter next to check that it goes smoothly.

This is an early draft; I'm opening it to get comments (@yebai, @torfjelde). I've only converted a couple of files to the submodule format, and the selective running lacks documentation. If this is deemed a good design I'll go and finish the implementation.

@mhauru (Collaborator, Author) commented May 28, 2024

For the record, ideally I would have liked the selective test running to be based on testset names rather than file paths. I tried a couple of designs for that, but ran into various limitations and complexities in how AbstractTestSet and nesting macros work. This could be revisited later.

@yebai (Member) commented May 28, 2024

cc @willtebbutt @devmotion who might have thoughts.

@willtebbutt (Collaborator) commented:

I'm in favour of these kinds of changes -- it's good to clean up the namespaces, and to remove these macros in favour of using standard functionality.

Re parallelisation of tests: I've generally gone about this in a very slightly different way (see e.g. Tapir's tests and the associated action), but it's not obvious to me that either approach is necessarily better than the other.

@mhauru (Collaborator, Author) commented May 29, 2024

There's a file called test/essential/container.jl that's not included in runtests.jl, or anywhere else. Does anyone know whether it should be deleted or included in the test suite?

@yebai (Member) commented May 29, 2024

It should be included.

@mhauru marked this pull request as ready for review May 30, 2024 09:34

@mhauru (Collaborator, Author) commented May 30, 2024

I finished the restructuring described in the opening message. I also removed the code coverage part of CI, because it seems to have been broken for over a year. If we want to reintroduce it, I would propose running it as a separate job outside the current test matrix, since it requires running all the tests in one go on a single platform.

More work could be done harmonising the way we do imports and qualified references, but I think that could be a separate PR. In this one I avoided touching anything in the actual test code and only modified the "header" of each file.

The longest test CI job now takes around 25 minutes, so more could perhaps be done to make tests faster, either by making them more lightweight or by splitting some of the heavy test files into several files. (EDIT: I did some further splitting, so now it's more like 20 minutes.)

@willtebbutt (Collaborator) left a review:

I have no particular comments on the functionality, just a couple of comments on style! I've commented on representative examples, rather than going through and commenting on each thing manually.

Inline review comments on test/essential/ad.jl (outdated, resolved).
@devmotion (Member) commented:

I appreciate the effort, it's good to clean up and improve the test structure a bit 🙂

My feeling is that we should neither implement nor use parsing of test args (and hence not support something like Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])). In my experience it's much easier to use ENV variables for test GROUPs (similar to what @willtebbutt uses in Tapir, it seems), since it requires (almost) no additional code and is easier to use (you can just update the ENV variable in the Julia REPL and (re)run ] test). It's also the approach taken throughout the SciML ecosystem and in e.g. Bijectors (https://github.com/TuringLang/Bijectors.jl/blob/1070b7349f14626808765e32621db82e64969f29/test/runtests.jl#L32-L61), DynamicPPL (https://github.com/TuringLang/DynamicPPL.jl/blob/master/test/runtests.jl), and DistributionsAD.
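
For readers unfamiliar with that pattern, a GROUP-based runtests.jl typically looks roughly like the sketch below (modelled loosely on those packages; the group names and file paths here are only illustrative, not Turing's actual layout):

# Sketch of the common GROUP pattern: pick a pre-defined group of test files via an env var.
const GROUP = get(ENV, "GROUP", "All")

if GROUP == "All" || GROUP == "Essential"
    include("essential/ad.jl")
end
if GROUP == "All" || GROUP == "Inference"
    include("mcmc/hmc.jl")  # illustrative path
end

Selecting a group then amounts to setting the variable, e.g. GROUP=Essential julia -e 'using Pkg; Pkg.test("Turing")', or updating ENV["GROUP"] in the REPL before re-running ] test.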

@mhauru (Collaborator, Author) commented May 30, 2024

Thanks for the comment @devmotion, happy to discuss this. Here are reasons why I went with ARGS rather than env vars, not in any particular order:

  1. I wanted to have the option to include and exclude multiple test files. This requires some parsing of the input, and doing that for command line arguments is more natural and more standard practice than for env vars, which are usually single values.
  2. Pkg.test has features explicitly for this, which I think is a strong hint that this can be considered the standard way of doing test selection in Julia. Pkg.test docs mention exactly this as an intended use case of test_args.
  3. I find command line arguments handier when running things manually: I'm more used to passing args than setting env vars when I run things from a terminal. For CI I think both work equally well.

It's also the approach taken throughout the SciML ecosystem and in e.g. Bijectors, DynamicPPL, and DistributionsAD.

That's a fair point that I hadn't considered. At the same time, the ARGS approach is what Julia itself does (I took the name of --skip from the tests of stdlib).

it's much easier to use ENV variables for test GROUPs (similar to what @willtebbutt uses in Tapir, it seems) since it requires (almost) no additional code

I would argue that using ARGS does not require any more code in runtests.jl than env vars would. The only reason there's any substantial extra code here (still <30 lines + docstrings) is the parsing of include/exclude patterns, which I would like to have regardless of whether we use env vars or ARGS.

you can just update the ENV variable in the julia REPL and (re)run ] test

You can have a similar user experience with ARGS, by doing ARGS .= ["my", "new", "--skip", "args"]; ] test.

@devmotion (Member) commented:

I can see these points and I'm sure tests could be structured with ARGS as well - but I think it would be good to follow common practices in the ecosystem and use environment variables.

IMO one of the main problems with the current Turing test setup is that it does not follow common practices in the ecosystem and defines its own macros for structuring the tests. My impression is that this makes it unnecessarily complicated for new contributors to read, modify, and add tests and adds additional maintenance burden on our side. My worry is that with an uncommon practice such as ARGS (I haven't encountered a single Julia package yet that uses it to structure tests) we will still make it more complicated than necessary for new contributors (and the custom parsing requires additional maintenance on our side).

@yebai (Member) commented May 31, 2024

I like the ARGS approach and was not aware of it previously. One issue with standard practices is that they tend to be heavily influenced by early design decisions in the community and then evolve slowly. If we put a one-line testing instruction in the README, maybe people can start learning about this ARGS option.

@devmotion (Member) commented:

I'm still not convinced (but also don't care too much if everyone else has a different opinion) - I think it's good to adopt improvements even if they are not common practice yet, but I don't see why/how ARGS is superior to ENV, and therefore I would go with the more established and better-known approach.

@mhauru (Collaborator, Author) commented Jun 3, 2024

One suggestion for CI setup is that failures of particular CI tasks should not cancel other CI tasks so we can independently identify multiple CI failures in parallel.

Good point, should now be fixed.

Assuming CI passes (the previous error was random, see #2234), I have no other changes in mind that I would like to make. I also don't have merge rights, so someone else needs to make the call on when this is final.

@torfjelde (Member) commented:

I'd also add my vote for sticking with ENV over ARGS, as I've voiced before this PR. My argument is the same as @devmotion's: this is what's done elsewhere within the ecosystem we generally work with. We (the developers) would now have to be aware of which package uses ARGS and which uses ENV variables when running tests, which just seems like an unnecessary headache without any added benefit.

Also, IMO, when "trying out" stuff like this, it's better to do it in some of the "simpler" packages and then eventually propagate to Turing.jl, rather than first adding it to Turing.jl.

The rest of the PR I think is a very, very good idea @mhauru :) Very keen to have the CI become much less of a pain 🎉

@mhauru (Collaborator, Author) commented Jun 3, 2024

For those who would prefer env vars, how would you like to implement being able to include and exclude multiple test files in a given run? In other words, what would be the env var equivalent of doing Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])?

without any added benefit

I do see some benefits, the most important ones being

  1. Respecting Pkg.test's intended way for selecting tests to run.
  2. Being more in line with the wider software ecosystem, e.g. how testing frameworks in other languages work.

I commented on 1. more extensively above, but to expand on 2., giving command line arguments to include/exclude particular tests is the standard way to do things in, for instance, pytest and the standard unit testing frameworks of both Go and Rust. I haven't used all that many testing frameworks, but I haven't seen SciML's env var approach elsewhere. I presume this consistency with other languages and frameworks is also why Pkg.test has the test_args mechanism, and why Julia's stdlib uses it.

Using environment variables for test selection also goes against my idea of what env vars are meant for. I view env vars as global flags that set some state about, well, the environment in which the process operates. Kinda like defining a global constant at the top of your source code file that affects everything in that package/module. Selectively running tests is more like passing arguments to a function as you call it, where you might for instance call it back to back with different arguments, and that's exactly what command line arguments are for.

@torfjelde (Member) commented:

In other words, what would be the env var equivalent of doing Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])?

This is a really good argument 👍 You could of course implement the same parsing of an ENV variable, but I very much agree that it's much, much more natural to do so using ARGS.

Respecting Pkg.test's intended way for selecting tests to run.

I see that the docs mention that this is a use case for args, but do they specifically recommend that over ENV variables?

I am genuinely quite keen on giving this pattern a go, but I'm also somewhat worried about doing this in Turing.jl 😕

Has there been a similar discussion elsewhere? I.e. what's the reasoning of the rest of the ecosystem going with ENV variables?

@mhauru (Collaborator, Author) commented Jun 3, 2024

This is a really good argument 👍 You could of course implement the same parsing of an ENV variable, but I very much agree that it's much, much more natural to do so using ARGS.

Yeah, I think parsing complicated env vars gets awkward. You could define two env vars though, INCLUDE_TESTS and EXCLUDE_TESTS, both of which could be space-separated lists. I don't know whether that would satisfy the desire for consistency with SciML practices any more, though.
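
For what it's worth, a tiny sketch of what that two-variable parsing might look like (both variable names are hypothetical; nothing like this exists in this PR):

# Hypothetical: whitespace-separated include/exclude patterns from two env vars.
included = filter(!isempty, String.(split(get(ENV, "INCLUDE_TESTS", ""))))
excluded = filter(!isempty, String.(split(get(ENV, "EXCLUDE_TESTS", ""))))
# An empty `included` list would mean "run everything", mirroring the ARGS default.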

I see that the docs mention that this is a use case for args, but do they specifically recommend that over ENV variables?

No, I don't think it says anything about environment variables.

@mhauru (Collaborator, Author) commented Jun 3, 2024

The test failures seem unrelated to this PR. They seem rather to be about the recent ADTypes compat change interacting in funky ways with Optimization.jl version bounds when using Julia 1.7. I haven't gotten to the bottom of it yet.

@devmotion (Member) commented:

In other words, what would be the env var equivalent of doing

Probably something similar to JULIA_LOAD_PATH etc., where a single environment variable is split into a vector of paths/arguments based on a delimiter such as : (using e.g. split(ENV["..."], ':')).

Generally, I haven't encountered the need for such complex inclusion/exclusion of tests yet, so I'm a bit skeptical about whether the additional complexity (both for users and developers) is worth it. Having the possibility to run all tests (typically the default, but not used in CI) and pre-defined groups of tests has been sufficient in all packages I have worked with. For running even smaller chunks of tests or subsets of test files in local development, in my experience the most convenient approach is typically to use TestEnv anyway.

@yebai (Member) commented Jun 4, 2024

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations? This way, we support both, adhering to the TestEnv workflow and allowing users to use additional options only available in ARGS. This should be a relatively simple block of code, so I don't expect it will increase developer mental overhead too much.

@mhauru (Collaborator, Author) commented Jun 4, 2024

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations?

Yep, we could definitely have both available. Have e.g. the ARGS parsing implemented as in the current version of this PR, and append the value of an env var called TURING_SELECT_TEST to the list of included tests (roughly as in the sketch below). Would everyone be happy with that?
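
To make that concrete, here is a rough, purely illustrative sketch; the variable name TURING_SELECT_TEST is hypothetical and this is not code from the PR:

# Hypothetical: prepend patterns from TURING_SELECT_TEST to whatever came in via test_args.
# Prepending keeps them in the "include" part, i.e. before any "--skip" marker in ARGS.
selection = if haskey(ENV, "TURING_SELECT_TEST")
    vcat(filter(!isempty, String.(split(ENV["TURING_SELECT_TEST"]))), ARGS)
else
    copy(ARGS)
end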

Regardless of ARGS/env vars choice, I would still like to stick to tests being selected by file path rather than by manually grouping them. Benefits are

  • No code needed for introducing the groups. File folders should form natural groups anyway.
  • Can't accidentally have a test that's not within a group, or a group that isn't included in CI.
  • More granularity and flexibility in test selection.

Consequently calling the env var GROUP doesn't feel right, even though that seems to be what other SciML packages do.

@torfjelde (Member) commented:

For running even smaller chunks of tests or subsets of test files in local development in my experience typically the most convenient approach is to use TestEnv anyway.

So I've also done the same, but having had experience with testing in other languages, I also do sometimes miss the ability to simply filter tests based on their name or whatever. It's particularly nice to do when you're working on bugfixes, since then there's a particular sub-test-suite you want to run, which then usually requires copy-pasting to another file, using TestEnv.jl, etc.

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations?

Supporting both also seems non-ideal, no? 😕 I'd rather have to do that (push onto ARGS) manually than have us support both.

Commits added to the branch:

  • Restore compat with ADTypes v0.2. Make AutoTapir export conditional.
  • Fix AutoTapir export in Essential.jl

@yebai (Member) commented Jun 5, 2024

Both also seems non-ideal, no? 😕 I'd rather have to do that (push onto ARGS) manually than us having to support both.

I'm happy to support only the ARGS approach from my side: I think the ARGS approach has fairly sensible mental overhead, given its similarity to standard terminal commands.

@torfjelde (Member) commented:

I'm also slowly coming to terms with the ARGS approach. In hindsight, I'd much prefer to have tried this out in another package, e.g. Bijectors or something, which is "less mission critical". But given the fantastic work @mhauru has done here, it might be worth just giving it a go in Turing.jl 🤔

@yebai (Member) commented Jun 5, 2024

@mhauru, feel free to merge this PR when you are ready.

@mhauru (Collaborator, Author) commented Jun 5, 2024

Given @yebai and @torfjelde have come around to args and @devmotion said earlier that he's not too bothered if others are happy with this, I'll go and merge. Everyone seems to agree that this is regardless an improvement over the current situation, and if people miss having a SciML-style GROUP env var we can easily add it by building on the mechanisms introduced here.

@mhauru merged commit 03d0e78 into master on Jun 5, 2024 (56 checks passed)
@mhauru deleted the mhauru/test-restructuring branch on June 5, 2024 at 15:29
@torfjelde (Member) commented Jun 6, 2024

For other people who are lazy: you might find it useful to add a file called, say, test/testrunner.jl which contains the following:

using Pkg
Pkg.test("Turing", test_args=ARGS)

then you can do

julia --project test/testrunner.jl a b c --skip d e

@torfjelde (Member) commented:

Even better, in general you can do:

using Pkg
Pkg.test(Pkg.project().name, test_args=ARGS)

though Pkg.project is apparently an experimental feature.

@yebai (Member) commented Jun 6, 2024

@torfjelde can you add this tip to the following docs page: https://turinglang.org/docs/tutorials/docs-01-contributing-guide/#tests

@torfjelde (Member) commented:

Question is whether we should just have a small script test/testrunner.jl in the actual repo that does this? Almost seems worth it, no?

@mhauru (Collaborator, Author) commented Jun 6, 2024

This seems like a common problem for many Julia packages; is there no standard solution to it? It feels frustrating that julia --project test/runtests.jl -- a b c --skip d e so nearly works. The only reason it doesn't is that it doesn't create a TestEnv.
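
One workaround in the meantime, sketched here under the assumption that TestEnv.jl is installed in your default environment (the selection strings are just examples, not a real invocation from this thread):

using TestEnv
TestEnv.activate("Turing")                 # set up the test environment manually
empty!(ARGS)
append!(ARGS, ["optim", "--skip", "ext"])  # example selection only
include("test/runtests.jl")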

@devmotion (Member) commented:

I'd suggest running

julia --project=. -e 'import Pkg; Pkg.test(; test_args=ARGS)' -- a b c --skip d e

if you don't work in a Julia REPL and want to execute the tests.

@yebai (Member) commented Jun 7, 2024

@devmotion, it looks very neat -- I added it to the docs: https://turinglang.org/docs/tutorials/docs-01-contributing-guide/#tests
