
Test restructuring #2237

Merged: @mhauru merged 30 commits into master from mhauru/test-restructuring on Jun 5, 2024

Conversation

@mhauru (Collaborator) commented May 28, 2024

This PR makes two orthogonal changes to how tests are run and structured.

  1. All test files included in runtests.jl are made into modules, and imports are moved into those modules. runtests.jl now has minimal imports. This cleans up the namespaces in which the individual tests are run, and makes it clearer where various references in tests point to.
  2. Test files can now be included selectively using keyword arguments. If one runs e.g. Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"]), then only test files that have the substring "optim" or "hmc" in their path will be run, unless they also have one of the substrings "ext" or "nelder" in their path, in which case they are skipped. By default all tests are run. Substring comparisons are case-insensitive. (See the sketch right after this list.)
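
To illustrate the filtering described in point 2, here is a minimal sketch of the idea (not the exact code in the PR; the helper name and example paths are made up):

# Sketch: decide whether a test file should run, given include/skip substrings.
function should_run(path, args)
    skip_idx = findfirst(==("--skip"), args)
    included = skip_idx === nothing ? args : args[1:skip_idx-1]
    excluded = skip_idx === nothing ? String[] : args[skip_idx+1:end]
    p = lowercase(path)
    # With no include patterns every file runs; a match on any exclude pattern skips it.
    wanted = isempty(included) || any(occursin(lowercase(s), p) for s in included)
    skipped = any(occursin(lowercase(s), p) for s in excluded)
    return wanted && !skipped
end

should_run("test/optimisation/Optimisation.jl", ["optim", "--skip", "ext"])  # true
should_run("test/ext/OptimInterface.jl", ["optim", "--skip", "ext"])         # false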

Point 2. makes the @turing_testset and @numerical_testset macros unnecessary, and they will be removed.

The selective running of tests is useful both for local development and for splitting our CI so that tests for various parts of Turing.jl run in parallel (#2179). I'll try implementing the latter next to check that it goes smoothly.

This is an early draft; I'm opening it to get comments (@yebai, @torfjelde). I've only converted a couple of files to the submodule format, and the selective running lacks documentation. If this is deemed a good design I'll go and finish the implementation.

@mhauru (Collaborator, Author) commented May 28, 2024

For the record, ideally I would have liked the selective test running to be based on testset names rather than file paths. I tried a couple of designs for that, but ran into various limitations and complexities in how AbstractTestSet and nesting macros work. This could be revisited later.

@yebai (Member) commented May 28, 2024

cc @willtebbutt @devmotion who might have thoughts.

@willtebbutt (Collaborator) commented:

I'm in favour of these kinds of changes -- it's good to clean up the namespaces, and to remove these macros in favour of using standard functionality.

Re parallelisation of tests: I've generally gone about this in a very slightly different way (see e.g. Tapir's tests and the associated action), but it's not obvious to me that either approach is necessarily better than the other.

@mhauru (Collaborator, Author) commented May 29, 2024

There's a file called test/essential/container.jl that's not included in runtests.jl, or anywhere else. Does anyone know whether it should be deleted or included in the test suite?

@yebai (Member) commented May 29, 2024

It should be included.

@mhauru marked this pull request as ready for review May 30, 2024 09:34

@mhauru (Collaborator, Author) commented May 30, 2024

I finished the restructuring described in the opening message. I also removed the code coverage part of CI, because it seems to have been broken for over a year. If we want to reintroduce it, I would propose running it as a separate job outside the current test matrix, since it requires running all the tests in one go on a single platform.

More work could be done harmonising the way we do imports and qualified references, but I think that could be a separate PR. In this one I avoided touching anything in the actual test code and only modified the "header" of each file.

The longest test CI job now takes around 25 minutes, so more could perhaps be done to make tests faster, either by making them more lightweight or by splitting some of the heavy test files into several files. (EDIT: I did some further splitting, so now it's more like 20 minutes.)

@willtebbutt (Collaborator) left a review:

I have no particular comments on the functionality, just a couple of comments on style! I've commented on representative examples, rather than going through and commenting on each thing manually.

Inline review comments on test/essential/ad.jl (outdated, resolved).
@devmotion (Member) commented:

I appreciate the effort, it's good to clean up and improve the test structure a bit 🙂

My feeling is that we should neither implement nor use parsing of test args (and hence not support something like Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])). In my experience it's much easier to use ENV variables for test GROUPs (similar to what @willtebbutt uses in Tapir, it seems), since it requires (almost) no additional code and is easier to use (you can just update the ENV variable in the Julia REPL and (re)run ] test). It's also the approach taken throughout the SciML ecosystem and in e.g. Bijectors (https://github.com/TuringLang/Bijectors.jl/blob/1070b7349f14626808765e32621db82e64969f29/test/runtests.jl#L32-L61), DynamicPPL (https://github.com/TuringLang/DynamicPPL.jl/blob/master/test/runtests.jl), and DistributionsAD.
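
For readers unfamiliar with that pattern, a GROUP-based runtests.jl typically looks roughly like the sketch below (modelled loosely on those packages; the group names and file paths here are only illustrative, not Turing's actual layout):

# Sketch of the common GROUP pattern: pick a pre-defined group of test files via an env var.
const GROUP = get(ENV, "GROUP", "All")

if GROUP == "All" || GROUP == "Essential"
    include("essential/ad.jl")
end
if GROUP == "All" || GROUP == "Inference"
    include("mcmc/hmc.jl")  # illustrative path
end

Selecting a group then amounts to setting the variable, e.g. GROUP=Essential julia -e 'using Pkg; Pkg.test("Turing")', or updating ENV["GROUP"] in the REPL before re-running ] test.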

@mhauru (Collaborator, Author) commented May 30, 2024

Thanks for the comment @devmotion, happy to discuss this. Here are reasons why I went with ARGS rather than env vars, not in any particular order:

  1. I wanted to have the option to include and exclude multiple test files. This requires some parsing of the input, and doing that for command line arguments is more natural and more standard practice than for env vars, which are usually single values.
  2. Pkg.test has features explicitly for this, which I think is a strong hint that this can be considered the standard way of doing test selection in Julia. Pkg.test docs mention exactly this as an intended use case of test_args.
  3. I find command line arguments handier when running things manually: I'm more used to passing args than setting env vars when I run things from a terminal. For CI I think both work equally well.

It's also the approach taken throughout the SciML ecosystem and in e.g. Bijectors, DynamicPPL, and DistributionsAD.

That's a fair point that I hadn't considered. At the same time, the ARGS approach is what Julia itself does (I took the name of --skip from the tests of stdlib).

it's much easier to use ENV variables for test GROUPs (similar to what @willtebbutt uses in Tapir, it seems) since it requires (almost) no additional code

I would argue that using ARGS does not require any more code in runtests.jl than env vars would. The only reason there's any substantial extra code here (still <30 lines + docstrings) is the parsing of include/exclude patterns, which I would like to have regardless of whether we use env vars or ARGS.

you can just update the ENV variable in the julia REPL and (re)run ] test

You can have a similar user experience with ARGS, by doing ARGS .= ["my", "new", "--skip", "args"]; ] test.

@devmotion (Member) commented:

I can see these points and I'm sure tests could be structured with ARGS as well - but I think it would be good to follow common practices in the ecosystem and use environment variables.

IMO one of the main problems with the current Turing test setup is that it does not follow common practices in the ecosystem and defines its own macros for structuring the tests. My impression is that this makes it unnecessarily complicated for new contributors to read, modify, and add tests and adds additional maintenance burden on our side. My worry is that with an uncommon practice such as ARGS (I haven't encountered a single Julia package yet that uses it to structure tests) we will still make it more complicated than necessary for new contributors (and the custom parsing requires additional maintenance on our side).

@yebai (Member) commented May 31, 2024

I like the ARGS approach and was not aware of it previously. One issue with standard practices is that they tend to be heavily influenced by early design decisions in the community and then evolve slowly. If we put a one-line testing instruction in the README, maybe people can start learning about this ARGS option.

@devmotion (Member) commented:

I'm still not convinced (but also don't care too much if everyone else has a different opinion) - I think it's good to adopt improvements even if they are not common practice yet, but I don't see why/how ARGS is superior to ENV, and therefore I would go with the more established and better-known approach.

@mhauru (Collaborator, Author) commented Jun 3, 2024

One suggestion for CI setup is that failures of particular CI tasks should not cancel other CI tasks so we can independently identify multiple CI failures in parallel.

Good point, should now be fixed.

Assuming CI passes (the previous error was random, see #2234), I have no other changes in mind that I would like to make. I also don't have merge rights, so someone else needs to make the call on when this is final.

@torfjelde (Member) commented:

I'd also add my vote for sticking with ENV over ARGS, as I've voiced before this PR. My argument is the same as @devmotion's: this is what's done elsewhere within the ecosystem we generally work with. We (the developers) would now have to be aware of which package uses ARGS and which uses ENV variables when running tests, which just seems like an unnecessary headache without any added benefit.

Also, IMO, when "trying out" stuff like this, it's better to do it in some of the "simpler" packages and then eventually propagate to Turing.jl, rather than first adding it to Turing.jl.

The rest of the PR I think is a very, very good idea @mhauru :) Very keen to have the CI become much less of a pain 🎉

@mhauru (Collaborator, Author) commented Jun 3, 2024

For those who would prefer env vars, how would you like to implement being able to include and exclude multiple test files in a given run? In other words, what would be the env var equivalent of doing Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])?

without any added benefit

I do see some benefits, the most important ones being

  1. Respecting Pkg.test's intended way for selecting tests to run.
  2. Being more in line with the wider software ecosystem, e.g. how testing frameworks in other languages work.

I commented on 1. more extensively above, but to expand on 2., giving command line arguments to include/exclude particular tests is the standard way to do things in, for instance, pytest and the standard unit testing frameworks of both Go and Rust. I haven't used all that many testing frameworks, but I haven't seen SciML's env var approach elsewhere. I presume this consistency with other languages and frameworks is also why Pkg.test has the test_args mechanism, and why Julia's stdlib uses it.

Using environment variables for test selection also goes against my idea of what env vars are meant for. I view env vars as global flags that set some state about, well, the environment in which the process operates. Kinda like defining a global constant at the top of your source code file that affects everything in that package/module. Selectively running tests is more like passing arguments to a function as you call it, where you might for instance call it back to back with different arguments, and that's exactly what command line arguments are for.

@torfjelde (Member) commented:

In other words, what would be the env var equivalent of doing Pkg.test("Turing", test_args=["optim", "hmc", "--skip", "ext", "nelder"])?

This is a really good argument 👍 You could of course implement the same parsing of an ENV variable, but I very much agree that it's much, much more natural to do so using ARGS.

Respecting Pkg.test's intended way for selecting tests to run.

I see that the docs mention that this is a use case for args, but do they specifically recommend that over ENV variables?

I am genuinely quite keen on giving this pattern a go, but I'm also somewhat worried about doing this in Turing.jl 😕

Has there been a similar discussion elsewhere? I.e. what's the reasoning of the rest of the ecosystem going with ENV variables?

@mhauru (Collaborator, Author) commented Jun 3, 2024

This is a really good argument 👍 You could of course implement the same parsing of an ENV variable, but I very much agree that it's much, much more natural to do so using ARGS.

Yeah, I think parsing complicated env vars gets awkward. You could define two env vars though, INCLUDE_TESTS and EXCLUDE_TESTS, both of which could be space-separated lists. I don't know whether that would satisfy the desire for consistency with SciML practices any more, though.
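
For what it's worth, a tiny sketch of what that two-variable parsing might look like (both variable names are hypothetical; nothing like this exists in this PR):

# Hypothetical: whitespace-separated include/exclude patterns from two env vars.
included = filter(!isempty, String.(split(get(ENV, "INCLUDE_TESTS", ""))))
excluded = filter(!isempty, String.(split(get(ENV, "EXCLUDE_TESTS", ""))))
# An empty `included` list would mean "run everything", mirroring the ARGS default.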

I see that the docs mention that this is a use case for args, but do they specifically recommend that over ENV variables?

No, I don't think it says anything about environment variables.

@mhauru (Collaborator, Author) commented Jun 3, 2024

The test failures seem unrelated to this PR. They seem rather to be about the recent ADTypes compat change interacting in funky ways with Optimization.jl version bounds when using Julia 1.7. I haven't gotten to the bottom of it yet.

@devmotion (Member) commented:

In other words, what would be the env var equivalent of doing

Probably something similar to JULIA_LOAD_PATH etc., where a single environment variable is split into a vector of paths/arguments based on a delimiter such as : (using e.g. split(ENV["..."], ':')).

Generally, I haven't encountered the need for such complex inclusion/exclusion of tests yet, so I'm a bit skeptical about whether the additional complexity (both for users and developers) is worth it. Having the possibility to run all tests (typically the default, but not used in CI) and pre-defined groups of tests has been sufficient in all packages I have worked with. For running even smaller chunks of tests or subsets of test files in local development, in my experience the most convenient approach is typically to use TestEnv anyway.

@yebai (Member) commented Jun 4, 2024

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations? This way, we support both, adhering to the TestEnv workflow and allowing users to use additional options only available in ARGS. This should be a relatively simple block of code, so I don't expect it will increase developer mental overhead too much.

@mhauru (Collaborator, Author) commented Jun 4, 2024

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations?

Yep, we could definitely have both available. Have e.g. the ARGS parsing implemented as in the current version of this PR, and append the value of an env var called TURING_SELECT_TEST to the list of included tests (roughly as in the sketch below). Would everyone be happy with that?
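
To make that concrete, here is a rough, purely illustrative sketch; the variable name TURING_SELECT_TEST is hypothetical and this is not code from the PR:

# Hypothetical: prepend patterns from TURING_SELECT_TEST to whatever came in via test_args.
# Prepending keeps them in the "include" part, i.e. before any "--skip" marker in ARGS.
selection = if haskey(ENV, "TURING_SELECT_TEST")
    vcat(filter(!isempty, String.(split(ENV["TURING_SELECT_TEST"]))), ARGS)
else
    copy(ARGS)
end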

Regardless of ARGS/env vars choice, I would still like to stick to tests being selected by file path rather than by manually grouping them. Benefits are

  • No code needed for introducing the groups. File folders should form natural groups anyway.
  • Can't accidentally have a test that's not within a group, or a group that isn't included in CI.
  • More granularity and flexibility in test selection.

Consequently calling the env var GROUP doesn't feel right, even though that seems to be what other SciML packages do.

@torfjelde (Member) commented:

For running even smaller chunks of tests or subsets of test files in local development in my experience typically the most convenient approach is to use TestEnv anyway.

So I've also done the same, but having had experience with testing in other languages, I also do sometimes miss the ability to simply filter tests based on their name or whatever. It's particularly nice to do when you're working on bugfixes, since then there's a particular sub-test-suite you want to run, which then usually requires copy-pasting to another file, using TestEnv.jl, etc.

Can we parse/extract test options in both TestEnv and ARGS at the start of runtests.jl and push them into an array of test configurations?

Supporting both also seems non-ideal, no? 😕 I'd rather have to do that (push onto ARGS) manually than have us support both.

Commits added to the branch:

  • Restore compat with ADTypes v0.2. Make AutoTapir export conditional.
  • Fix AutoTapir export in Essential.jl

@yebai (Member) commented Jun 5, 2024

Both also seems non-ideal, no? 😕 I'd rather have to do that (push onto ARGS) manually than us having to support both.

I'm happy to support only the ARGS approach from my side: I think the ARGS approach has fairly sensible mental overhead, given its similarity to standard terminal commands.

@torfjelde (Member) commented:

I'm also slowly coming to terms with the ARGS approach. In hindsight, I'd much prefer to have tried this out in another package, e.g. Bijectors or something, which is "less mission critical". But given the fantastic work @mhauru has done here, it might be worth just giving it a go in Turing.jl 🤔

@yebai (Member) commented Jun 5, 2024

@mhauru, feel free to merge this PR when you are ready.

@mhauru (Collaborator, Author) commented Jun 5, 2024

Given @yebai and @torfjelde have come around to args and @devmotion said earlier that he's not too bothered if others are happy with this, I'll go and merge. Everyone seems to agree that this is regardless an improvement over the current situation, and if people miss having a SciML-style GROUP env var we can easily add it by building on the mechanisms introduced here.

@mhauru merged commit 03d0e78 into master on Jun 5, 2024 (56 checks passed)
@mhauru deleted the mhauru/test-restructuring branch on June 5, 2024 at 15:29
@torfjelde (Member) commented Jun 6, 2024

For other people who are lazy: you might find it useful to add a file called, say, test/testrunner.jl which contains the following:

using Pkg
Pkg.test("Turing", test_args=ARGS)

then you can do

julia --project test/testrunner.jl a b c --skip d e

@torfjelde (Member) commented:

Even better, in general you can do:

using Pkg
Pkg.test(Pkg.project().name, test_args=ARGS)

though Pkg.project is apparently an experimental feature.

@yebai (Member) commented Jun 6, 2024

@torfjelde can you add this tip to the following docs page: https://turinglang.org/docs/tutorials/docs-01-contributing-guide/#tests

@torfjelde (Member) commented:

Question is whether we should just have a small script test/testrunner.jl in the actual repo that does this? Almost seems worth it, no?

@mhauru (Collaborator, Author) commented Jun 6, 2024

This seems like a common problem for many Julia packages; is there no standard solution to it? It feels frustrating that julia --project test/runtests.jl -- a b c --skip d e so nearly works. The only reason it doesn't is that it doesn't create a TestEnv.
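
One workaround in the meantime, sketched here under the assumption that TestEnv.jl is installed in your default environment (the selection strings are just examples, not a real invocation from this thread):

using TestEnv
TestEnv.activate("Turing")                 # set up the test environment manually
empty!(ARGS)
append!(ARGS, ["optim", "--skip", "ext"])  # example selection only
include("test/runtests.jl")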

@devmotion (Member) commented:

I'd suggest running

julia --project=. -e 'import Pkg; Pkg.test(; test_args=ARGS)' -- a b c --skip d e

if you don't work in a Julia REPL and want to execute the tests.

@yebai (Member) commented Jun 7, 2024

@devmotion, it looks very neat -- I added it to the docs: https://turinglang.org/docs/tutorials/docs-01-contributing-guide/#tests
