TestStrategy

Someone posted this interview question in a forum recently: "please describe a good testing strategy and the different kinds of tests and what each of them are good for". I think it's an interesting question, and possibly even a good one.

There's a lot to unpack in this question. Several of the words used are vague; I'll take this as a cue to provide my own definitions.

testing

Testing is the practice of running a piece of software in a more or less controlled environment (i.e. with known inputs and initial state) to find out what it does. Testing can be manual or automated, formal or informal. The information gained from a test can be about the software's Behavior (in which case we call it FunctionalTesting), or it can be about a non-functional attribute of the software like speed, memory usage, or throughput.
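As a minimal concrete example, here's roughly what an automated functional test looks like: a known input, a known initial state, and an assertion about the observable behavior. The parse_price function below is made up for illustration.

```python
# A hypothetical, minimal functional test: known input, known initial state,
# and an assertion about the observable behavior.

def parse_price(text):
    """Parse a string like "$3.50" into an integer number of cents."""
    return int(round(float(text.lstrip("$")) * 100))

def test_parse_price_converts_dollars_to_cents():
    assert parse_price("$3.50") == 350
```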

Non-functional testing is an area that I'm not very familiar with. My impression is that reliable tests for performance are quite difficult to write (TODO: link to article about benchmarks that don't measure what they purport to, due to compiler optimizations), and that trying to formalize performance requirements leads to headaches (inevitably, the performance requirements get relaxed when the software starts to push against them and the tests fail). The teams I've worked on mostly didn't write formal tests for performance; instead we preferred to monitor performance in production, and to subjectively assess performance by using the software (a type of informal manual testing). If we discovered a performance problem (something taking longer than we thought it should), we profiled to find the slow parts and then sped them up. In general, our goal was not to hit specific throughput or latency targets, but to make the software "fast enough" to have acceptable performance for most users.

Because I'm not that familiar with non-functional testing, most of this post will assume that functional testing is what we care about.

strategy

Since the goal of testing is learning ("finding out what the software does" in the definition above), a testing strategy must be a set of high-level principles for choosing testing tactics—techniques for learning what we want to know about the software.

This forces us to dig into the epistemology of testing. What, actually, can be known about software by testing it? If the software is treated as a black box that could have anything at all going on inside, the answer is "almost nothing"—as commentators from Dijkstra and Coplien to Moseley and Marks have pointed out. Accordingly, all software testing is carried out with either knowledge (in the case of white-box testing) or assumptions (in the case of black-box testing) of the software's internal structure. Our knowledge or assumptions about the software's internals let us assess the risk that test B might fail even if test A passes. If we think B is very unlikely to fail given that A is passing, we probably won't bother to perform test B.

ROI

Whenever we write a test, we should ask: "what bug is this test protecting us from?", "how bad would that bug be?", "how easy is it to make the mistake that would introduce the bug?", and "what's the cost to run and maintain this test?" While it's hard to get numeric answers to any of these questions, answering them qualitatively can provide a sense of a test's return on investment (ROI).

I've seen many tests that seemed to have negative ROI. A couple of standout examples:

  • Heavily mocked unit tests for tiny IdeaFragments of code (see the sketch after this list)
    • Taught us nothing about the software—the tests just restated the AST of the production code, but more verbosely.
    • Had to be painstakingly updated whenever we added a feature
    • Didn't fail when the system had a bug
  • Elaborate GoldenMasterTesting system tests
    • Took a long time to run
    • No one understood what they were actually testing (i.e. what bugs they protected us from) because the inputs and outputs were thousands of lines long.
    • Often failed when we added new features
    • When they failed, the procedure for updating them was to just overwrite the old "gold" output—usually without understanding what the output actually meant, or assessing whether it was what we expected.
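To make the first of those failure modes concrete, here's a sketch (in Python, with invented names) of the kind of heavily mocked unit test I'm talking about. It restates the structure of the production code instead of checking its behavior:

```python
from unittest.mock import Mock

# Hypothetical production code: a thin wrapper whose "logic" is just wiring.
class OrderService:
    def __init__(self, repo, mailer):
        self.repo = repo
        self.mailer = mailer

    def place(self, order):
        self.repo.save(order)
        self.mailer.send_confirmation(order)

# This test mirrors the implementation line by line. It still passes if
# save() or send_confirmation() is broken, and it has to be rewritten
# whenever the wiring changes, so it catches almost nothing while costing
# maintenance on every refactor.
def test_place_saves_and_sends_confirmation():
    repo, mailer = Mock(), Mock()
    order = object()

    OrderService(repo, mailer).place(order)

    repo.save.assert_called_once_with(order)
    mailer.send_confirmation.assert_called_once_with(order)
```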

Code coverage

Code coverage metrics were an element of the testing strategy at several of the companies I've worked for. While I find code coverage tools useful for finding untested code, I think treating code coverage as a target actually works against the goals of testing.

This is because the code coverage tools I see in actual use just count the lines that are executed by a test. They tell you nothing about whether a test assertion will fail if a given line is changed. Indeed, the easiest way to get 100% code coverage is by writing tests that have no assertions at all.
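A sketch of what I mean (the function and test are hypothetical):

```python
# Hypothetical code under test.
def apply_discount(price, percent):
    if percent > 100:
        raise ValueError("discount cannot exceed 100%")
    return price - price * percent / 100

# This "test" executes every line of apply_discount, so a line-coverage tool
# reports 100%, but it asserts nothing, and it would keep passing even if the
# arithmetic were completely wrong.
def test_apply_discount_runs():
    apply_discount(100, 10)
    try:
        apply_discount(100, 150)
    except ValueError:
        pass
```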

The other problem with code coverage targets is that they tend to force programmers to write tests with negative ROI—e.g. the heavily-mocked tests I described earlier.

The way to do code coverage right—MutationTesting—is very expensive in terms of computer time, and there are few tools available. Perhaps in the future more abundant computing resources will make mutation testing feasible for large codebases. Until that happens, my preference is to treat code coverage as a source of information (albeit a not-very-reliable one), but never as a target.
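For anyone who hasn't seen it, the core idea of mutation testing is: make a small change (a "mutant") to the production code, re-run the tests, and check that at least one test fails. A surviving mutant points at behavior the test suite doesn't actually constrain. Here's a toy sketch of the concept in Python—not a real tool, just an illustration of why it's expensive: every mutant means another full test run.

```python
import ast
import copy

# Toy illustration of mutation testing: flip one operator in the source,
# re-run the "test suite", and see whether any test kills the mutant.
# Real tools (e.g. mutmut, Stryker, PIT) do this at scale.

SOURCE = """
def add(a, b):
    return a + b
"""

def run_tests(namespace):
    # Stand-in test suite. Returns True if every test passes.
    try:
        assert namespace["add"](2, 2) == 4  # weak test: 2 * 2 is also 4
        return True
    except AssertionError:
        return False

def mutate_add_to_mul(tree):
    # Produce a mutant of the parsed source with + replaced by *.
    mutant = copy.deepcopy(tree)
    for node in ast.walk(mutant):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            node.op = ast.Mult()
    return ast.fix_missing_locations(mutant)

tree = ast.parse(SOURCE)
for label, candidate in [("original", tree), ("mutant (+ -> *)", mutate_add_to_mul(tree))]:
    namespace = {}
    exec(compile(candidate, "<string>", "exec"), namespace)
    verdict = "passes" if run_tests(namespace) else "fails"
    print(f"{label}: test suite {verdict}")

# The mutant survives (the suite still passes), revealing that the tests
# don't really constrain add()'s behavior despite 100% line coverage.
```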

good

What does "good" mean in this context? What might we want from a testing strategy?

  • the tests should reveal problems in the software.
  • the tests should reveal problems as early as possible—certainly before the software ships to production, and ideally as soon as the faulty code is written.
  • test failures should help the programmers figure out what's wrong—i.e. the failure messages should be specific and easy to interpret.
  • the tests should not fail when there is no problem.
  • the tests should be easy to adjust when the software's behavior is changed deliberately.
  • the tests should be isolated and reproducible. Given a snapshot of the codebase, each test should consistently pass or consistently fail, no matter whose machine it is running on, when it is run, or whether other tests have run before or after it. This ensures that the test results are not muddied by variables other than the one of interest—the behavior of the system under test.
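One practical way to get that last property is to inject the sources of nondeterminism (the clock, randomness, the network) instead of reaching for them globally. A sketch with made-up names:

```python
from datetime import datetime, timezone

# Hypothetical code under test: it takes a clock as a parameter rather than
# calling datetime.now() itself, so tests can pin the time.
def greeting(clock):
    return "Good morning" if clock().hour < 12 else "Good afternoon"

def test_greeting_before_noon():
    fixed_clock = lambda: datetime(2022, 9, 3, 9, 0, tzinfo=timezone.utc)
    assert greeting(fixed_clock) == "Good morning"

def test_greeting_after_noon():
    fixed_clock = lambda: datetime(2022, 9, 3, 15, 0, tzinfo=timezone.utc)
    assert greeting(fixed_clock) == "Good afternoon"
```

These tests pass or fail the same way no matter when or where they run, and neither depends on the other.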

What is a good testing strategy?

What are the kinds of tests?
