
Testing: Split test string #2789

Closed
Lestropie opened this issue Jan 28, 2024 · 6 comments

Comments

@Lestropie
Member

Currently, there are quite a few tests where a single test string actually contains:

  • One or more commands that generate intermediate data prior to running the test
    (partly to reduce storage space of the test data, partly to make the nature of the test data self-explanatory)
  • The command invocation itself
  • Evaluation of the outputs from the command, typically by comparing to a pre-generated output from an earlier version

Questions primarily for @daljit46:

  1. Can CTest "classify" lines based on which of the three categories each belongs to?
  2. Is there an "industry accepted" way of arranging these?
@daljit46
Member

  1. I think the answer here is no. CTest is just a test launcher; it's not aware of what those tests are doing.
  2. I don't know of any big software projects that implement tests the way we do, although there may be projects like terminal shells that use a similar strategy for part of their testing. Searching for "approval testing", I came across this project: https://github.com/DannyBen/approvals.bash.
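
For context, approval testing essentially means capturing a command's output and comparing it against a previously approved snapshot. A minimal hand-rolled sketch of that pattern, purely illustrative (the approve helper here is hypothetical, not the approvals.bash API):

# Minimal sketch of the approval-testing pattern; the "approve" helper
# is hypothetical and not part of any existing framework.
approve() {
  local name="$1"; shift
  local approved="approvals/${name}.approved.txt"
  local received="approvals/${name}.received.txt"
  mkdir -p approvals
  "$@" > "${received}" 2>&1
  if diff -u "${approved}" "${received}"; then
    rm -f "${received}"    # output matches the approved snapshot
  else
    echo "Approval test '${name}' failed; inspect ${received}" >&2
    return 1
  fi
}

# Example usage: compare mrinfo's text output against a stored snapshot.
approve mrinfo_dwi mrinfo dwi.mif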

@Lestropie
Member Author

Given:

I don't know of any big software projects that implement tests the way we do

is the complexity I'm pointing out a problem that arises from that non-conformity, such that adopting a more conventional approach would naturally resolve it? Or does it appear to you that our "unconventional" testing approach exists out of necessity?

Alternatively, I'm wondering whether your comment is more an observation about our extensive use of regression testing rather than unit testing?

Just picking your brain to see whether there's a logical and efficient way to change our tests across the board.

@daljit46
Member

daljit46 commented Jan 31, 2024

For our use case, I think our tests are fairly suitable, especially since they emulate how an end user would use the tools that we develop.

I think the "peculiarity" of our testing framework is mainly due to the use of bash scripts (which is sort of made necessary due to our commands workflow). I was just pointing out that because of this, I don't know about the existence of guidelines and best practices that are commonly used in this scenario.

Alternatively, I'm wondering whether your comment is more an observation about our extensive use of regression testing rather than unit testing?

On this point, I don't think that unit testing and integration/approval testing are mutually exclusive. In an ideal scenario, we would have both. For example, an obvious limitation of our test suite is that we don't test any GUI code. This is because there is no obvious way of running mrview and verifying an "output". While GUI testing is extremely hard, unit testing would at least allow you to test part of the UI logic without running the entire executable.

Furthermore, unit tests may allow you to test certain edge cases more easily than integration tests (e.g. suppose you want to test a square root calculator implemented using add, sub, mul and div functions; an integration/approval test would not easily be able to exercise a division by zero in div, since you may not know in the general case which input would trigger that path). Of course, more testing requires considerable effort upfront, but it may also lead to less effort in the long term (fewer bugs). Introducing unit tests to the codebase would be fairly easy, and note that we don't have an obligation to write tests for everything: just writing tests for (parts of) new code would be an improvement over the current situation.

Just picking your brain to see whether there's a logical and efficient way to change our tests across the board.

I thought about this when I was shifting the build system to CMake. For example, I think we could delineate between the "types" of operation in a bash command by delegating the running of commands to designated bash functions (similar to the GitHub project I linked above). This would allow for better error messages and a more structured way of writing tests. The downside is that our bash tests would (slightly) distance themselves from how users use MRtrix in practice, so I'm not sure this would be worth it.
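
As a rough sketch of what that delegation might look like (the helper names below are hypothetical, not existing MRtrix testing infrastructure), each of the three categories of operation from the OP could be routed through its own function:

# Hypothetical helpers; names are illustrative only.
prepare() {   # commands that generate intermediate data prior to the test
  "$@" || { echo "test setup failed: $*" >&2; exit 2; }
}
run() {       # the command invocation actually under test
  "$@" || { echo "command under test failed: $*" >&2; exit 1; }
}
check() {     # evaluation of outputs against pre-generated reference data
  "$@" || { echo "output check failed: $*" >&2; exit 1; }
}

# An existing test rewritten against these helpers
# (a setup step, if needed, would be wrapped in "prepare" the same way):
run   dwiextract dwi.mif -bzero tmp.mif
check testing_diff_image tmp.mif dwiextract/out2.mif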

However, I do think there are a few low-hanging fruits that we can pick. One of them is that our tests don't have any description or even a name, so it's very difficult to understand what exactly a test is doing and why a certain output is expected. I think we should at least provide comments for each test (or some other mechanism, such as a custom bash function) so that a reader doesn't need to know the exact mechanics of each command to understand what's going on.
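
Something as small as a hypothetical describe helper would be one way to surface that intent in the CTest log; a sketch only:

# Hypothetical helper: state what the test is checking, so that a failure
# reported in the CTest log is self-explanatory.
describe() {
  echo "### $*" >&2
}

describe "dwiextract -bzero should select only the b=0 volumes"
dwiextract dwi.mif -bzero tmp.mif
testing_diff_image tmp.mif dwiextract/out2.mif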

@Lestropie
Member Author

Yes, the absence of labels for the tests is something I have been thinking about in this context but didn't state in the OP. I've tried adding comments in some of the test bash script files, but particularly now with CTest, that then requires cross-referencing. If, for each executable, we were to create a directory, and put each test of that executable into its own file, then the names of those files could convey the purpose of each test. It would also mean that the three steps bullet-pointed in the OP could be split across multiple lines for each individual test. That seems like a pretty sensible change to me.

The catch is that quite often there will be e.g. an mrconvert call at the commencement of the first test, the output of which many subsequent tests then use. But having such order dependence within a set of tests is bad practice anyway; it was purely about efficiency back when we were growing our set of tests from zero.

So to be clear, what I'm proposing is to change from:

$ cat testing/binaries/tests/dwiextract

dwiextract dwi.mif - | testing_diff_image - dwiextract/out1.mif
dwiextract dwi.mif -bzero - | testing_diff_image - dwiextract/out2.mif

to something like:

$ cat testing/binaries/tests/dwiextract/default

dwiextract dwi.mif tmp.mif
testing_diff_image tmp.mif dwiextract/out1.mif

$ cat testing/binaries/tests/dwiextract/bzero

dwiextract dwi.mif -bzero tmp.mif
testing_diff_image tmp.mif dwiextract/out2.mif

@Lestropie
Member Author

One other factor here is that when CI tests were first introduced, we struggled to get them running within the time frame afforded by the CI service. As such, there are a number of tests where a conversion step is done inline as part of one of the initial tests, and its result is re-used by later tests. Since then, not only have CI resources typically become beefier, but ccache should also reduce the Actions job time required for compilation. Therefore I would want to make it so that no test depends on data from any other test, and also that each test cleans up after itself rather than relying on *tmp* globs.

@Lestropie
Member Author

Rather than each test having to manually specify its own cleanup, we could instead have CTest run each test with a trap that deletes any temporary files regardless of test success or failure.
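
As a sketch of that idea (assuming, purely for illustration, that temporary files follow a tmp* naming convention and that CTest launches each test script through a wrapper such as this):

#!/bin/bash
# Hypothetical wrapper that CTest could use to launch each test script.
# The trap removes any tmp* files on exit, whether the test passes or fails.
cleanup() {
  rm -rf tmp*
}
trap cleanup EXIT

bash "$1"        # run the actual test script
exit $?          # propagate its exit status to CTest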
