# Automated testing techniques

## Arrange, Act, Assert

In the absence of a compelling reason to do otherwise (which is rare), each test case should test exactly one thing. Each test should clearly express what claim about the code under test it is testing, and test that claim and that claim only. Often it is feasible to name the test case with a name that will be read by humans as making that claim (e.g., `test_widgets_are_nontoxic`).

A test case should:

1. **Arrange.** Zero or more actions that set up for what is being tested. Some or all of these actions may be done in test *fixtures* rather than in the test cases themselves, but they are done in preparation for an individual test case.

2. **Act.** An action that exercises the code under test (whose effects will be checked in the subsequent *assert* step).

3. **Assert.** An assertion about the effect of the action.

Arrange-Act-Assert is sometimes regarded as incomplete. Arguably a fourth kind of test logic deserves to be distinguished explicitly:

4. **Cleanup.** Zero or more actions that tear something down that was used for testing&mdash;freeing resources or restoring an invariant. Some or all of these actions may be done in test fixtures rather than in the test cases themselves, but they are done to clean up after an individual test case.

These steps, where present, should always be clearly identifiable in your test code. When these steps appear within a test case, they should not be written out of order. That is, a lower-numbered step in the above list should not be written above a higher-numbered one. (In *rare* situations, it may be necessary to break this rule.)

In addition to not writing them out of order, you should always strongly consider separating all these steps so that each statement does only one of those things. (You may still often need to have multiple statements that do one of them, of course.) For example, if you are going to call a function and assert something about the value it returns, you should strongly consider assigning the result to a variable, then using that variable in an assertion on a subsequent line.
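As a minimal sketch of this separation, the test below arranges, acts, and asserts in three distinct statements, in that order. The function `total_with_tax` is a hypothetical stand-in for code under test, not something from this project.

```python
import unittest


def total_with_tax(prices, tax_percent):
    """Hypothetical code under test: sum prices, then add a percentage tax."""
    return sum(prices) * (100 + tax_percent) / 100


class TestTotalWithTax(unittest.TestCase):
    def test_tax_is_applied_to_sum_of_prices(self):
        prices = [10, 20]

        result = total_with_tax(prices, tax_percent=10)

        self.assertEqual(result, 33)
```

Assigning to `result` before asserting costs one extra line, but it keeps the act step visibly separate from the assert step.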

Controversial claim: The steps should not appear out of order, but in some situations, particularly very simple ones with a large amount of repetition, it may be defensible to combine the steps so that, for example, a statement is both acting and asserting, or both arranging and acting.

- If you do this, you should always strongly consider taking the more orthodox approach of strictly separating them, understand why you are deciding not to do that, and be able to defend your decision to a hypothetical interlocutor.

- Some people say this should never be done.

- Do not do it if it makes tests even slightly: less clear, harder to read and understand, more complicated, or harder to verify for correctness.

It should be readily apparent to anyone reading a test what logic in that test is arrangement, what logic is acting, and what logic is asserting. It is only defensible to choose to do more than one of these kinds of things in the same statement if you are confident that doing so does not make this distinction any less clear to anyone reading the code.

As an opposite approach, some people write `# Arrange`, `# Act`, and `# Assert` comments identifying separate sections of their tests. This is not wrong. But I recommend against doing it, or at least against doing it *habitually*, because that information should always be readily apparent in the *code* of the test. It is possible to write complicated or unclear tests where the distinction between arrangement, acting, and asserting is not clear even if the code is separated out. If you find you're doing that, you should redesign the test, if possible. If you do separate the sections and comment them, you should make sure the distinctions would still be fully clear even without the comments.

**A test case should almost always have exactly one assertion.** In particular, if you have more than one assertion, it is a strong sign you should be writing more than one test. One solution can be to parameterize the test.

Subtests are a form of parameterization and, if separate assertions are done in separate subtests, then the rule to have exactly one assertion is not violated (though you should still make sure you understand why you have decided to use subtests rather than separately written tests or parameterization by some means other than subtests).
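A minimal sketch of subtests in `unittest`, assuming a hypothetical function `is_even` as the code under test. Each iteration runs in its own subtest, so a failure for one value is reported separately and does not stop the others from being checked.

```python
import unittest


def is_even(n):
    """Hypothetical code under test."""
    return n % 2 == 0


class TestIsEven(unittest.TestCase):
    def test_even_numbers_are_even(self):
        for n in (-2, 0, 2, 10):
            with self.subTest(n=n):
                result = is_even(n)
                self.assertTrue(result)
```

The keyword argument passed to `subTest` appears in the failure report, which identifies the value that failed.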

Sometimes it is not feasible to avoid having multiple assertions, and not even feasible to place them in separate subtests. For example, it may be that two claims must be asserted separately for the test code to clearly express what those claims are, but that those claims are very closely tied, such that it is impossible or very misleading to test the second one unless the first one has been found to hold. In such a situation, you may need to write a test with multiple assertions. But make sure:

1. *That the first claim really is part of what you are trying to test and make an assertion about.* If instead the second claim is the one the test is really about, and the first claim is just to verify that the preconditions necessary to test the second claim have been established, then you should check for the first claim and cause the test to *error out* (rather than merely failing). The best way to do this may vary by testing framework, but often you can do it by checking for the precondition with an `if` statement and raising an exception directly. This is one of the rare situations where it can be reasonable to raise `Exception` (rather than a more derived exception type). Make sure your exception message clearly states the reason for the error.

2. *That it would actually be wrong, or at least clearly undesirable, to test the second claim if the first has failed.* If it would always be acceptable to test both, then even if you cannot reasonably write separate tests or parameterize the whole test case, you can likely use subtests (assuming your testing framework supports subtests or you have extended its functionality to do so, such as with a plugin).
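The precondition check described in the first item might look like the following sketch. The `Cache` class is a hypothetical stand-in for code under test. The test is about `clear`, but it would pass vacuously if the cache were never populated, so the arrangement is verified with an `if` statement and a directly raised `Exception` rather than with a test assertion.

```python
import unittest


class Cache:
    """Hypothetical code under test."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def clear(self):
        self._items.clear()

    def __len__(self):
        return len(self._items)


class TestCacheClear(unittest.TestCase):
    def test_clear_empties_a_populated_cache(self):
        cache = Cache()
        cache.put("a", 1)
        if len(cache) == 0:
            # Not an assertion: a failed arrangement is an error, not a failure.
            raise Exception("Arrangement failed: cache is empty after put()")

        cache.clear()

        self.assertEqual(len(cache), 0)
```

Because `Exception` is not `AssertionError`, `unittest` reports a failed precondition as an error rather than a failure, which keeps the two outcomes distinguishable.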

## Testing overlapping functionality

There are three (or four) main ways of testing classes/functions with substantial overlapping functionality. In summary:

0. Only test the functionality of one of them (usually bad, *sometimes* okay in inheritance).

1. Parameterize the test code, typically at the level of a class that collects test-case methods.

2. Use inheritance, either between test classes or from a common ABC, to share test methods (with an attribute/property for the implementation).

3. Manually reproduce the shared testing logic (often excessively cumbersome; this may worsen or improve readability).

Detailed explanations follow, including examples of where each of these four approaches has been used in this project.

### 0. Only test the functionality of one of them.

This is usually bad. It is *sometimes* okay in inheritance.

When you have an abstract class that represents an interface but does not facilitate code reuse (that is, it has no concrete methods), then it is often reasonable not to write tests for it. An abstract class without any concrete methods is effectively making a *claim* about what it entails to be a kind of thing. It would be hard for the tests to do anything except restate those claims more obscurely. There may occasionally be particularly important information about an interface-representing abstract class that deserves testing, though.

**Other than that common situation, we've done this in `enumerations.py`**, where `BitsetEnum` does not have tests, but its derived class `Guests` does. The `Guests` tests exercise all the functionality that `Guests` inherits from `BitsetEnum`.

Sometimes it is done the other way: a test is written for a base class but not for the derived class. This is much riskier because the derived class is usually in some way functionally different from the base class, and bugs in code that implements the specialized logic would not be found by testing the base class.

### 1. Parameterize the test code, typically at the class level.

Often the best and most straightforward way to test that claims hold true of multiple entities in code under test is to parameterize tests at the level of a class that collects test-case methods.

Depending what testing framework is used and other factors, this may entail adding parameterization to an existing test class, or collecting otherwise separate methods into a class to parameterize it.

**We have done this in `test_simple`** for `make_squarer` and the `MulSquarer` and `PowSquarer` subclasses of `AbstractSquarer`, demonstrating that:

- Multiple test classes can be parameterized separately when some tests apply to more entities than others.

- The decision of whether to inherit can be made separately in code and test. `MulSquarer` and `PowSquarer` share a base class `AbstractSquarer`, but we did not organize our tests using inheritance.
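As a rough, standard-library-only sketch of class-level parameterization, a factory function can build one test class per implementation. This is not how `test_simple` does it. The functions `square_by_mul` and `square_by_pow` and the factory `make_squarer_test_class` are hypothetical names chosen for illustration. Testing frameworks and plugins often provide a decorator that achieves the same effect with less ceremony.

```python
import unittest


def square_by_mul(x):
    """Hypothetical code under test."""
    return x * x


def square_by_pow(x):
    """Hypothetical code under test."""
    return x ** 2


def make_squarer_test_class(name, impl):
    """Build a test class whose test cases all exercise impl."""

    class SquarerTests(unittest.TestCase):
        def test_square_of_three_is_nine(self):
            result = impl(3)
            self.assertEqual(result, 9)

        def test_square_of_negative_is_positive(self):
            result = impl(-4)
            self.assertEqual(result, 16)

    # Rename so test reports identify which implementation was tested.
    SquarerTests.__name__ = SquarerTests.__qualname__ = name
    return SquarerTests


TestSquareByMul = make_squarer_test_class("TestSquareByMul", square_by_mul)
TestSquareByPow = make_squarer_test_class("TestSquareByPow", square_by_pow)
```

The two generated classes share no base class other than `unittest.TestCase`; the parameter reaches the test methods through the closure.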

### 2. Inherit shared test code.

Another approach is to use inheritance, either between test classes or from a common ABC, to share test methods.

One of the purposes of inheritance is code reuse, and this can be used for sharing test cases across multiple test classes, where each derived test class tests some distinct entity in the code under test.

In this technique, an attribute or property, specialized in each derived class, specifies the function or class that the test should exercise. Test cases intended to be inherited then use that attribute/property rather than explicitly writing the name of the entity to test.

**We have done this in `test_queues.py`** for all tests, and **in `test_simple.py`** for the *toggle* tests.

When using a base class that exists only to supply test cases to derived classes, make it an ABC if you can, so if a test runner attempts to instantiate it, the failure is early and clear. Abstract or not, you must ensure test runners will not collect and attempt to run tests directly from it. How to do this varies by framework. This is a bit trickier in `unittest` than in some other frameworks like `pytest`. See "Hiding classes from `unittest`" below.

Sometimes no abstract class is needed. If you want to test some claims about `X`, and test all those claims plus some other claims about `Y`, then `TestX` and `TestY` could both be concrete, with `TestY` inheriting from `TestX`.

### 3. Manually reproduce the shared test logic.

This is often excessively cumbersome. But it can be valuable in some situations.

Manually reproducing the test logic may worsen or improve readability. The situations where it is reasonable to consider include:

- There is very little testing logic and it is already written, so combining shared logic is not worthwhile.

- Abstracting out the differences creates confusing tests. (Though sometimes this means the assumption that the code under test has overlapping functionality was not really correct.)

- Abstracting out the differences creates tests that reproduce logic from the code under test. For example, separately implemented tests of `repr` are often able to simply state what the result should be, while sharing them may end up building the `repr` in a manner analogous to the code under test.

   It is bad for tests to reproduce the logic under test. When they do, the same bugs will often appear both places and go undetected, and tests won't clarify or document the code.

- Separating the logic facilitates writing, formatting, or commenting the test code in a way specific to the code being tested, illuminating something about the claims being made.

- Manually duplicating the tests allows them to be given *names* or *docstrings* that differ in a way that clarifies something important.

- It is not known to be the case that, if one of the duplicated tests must be changed, then the other(s) would need to be changed in a corresponding way. Or maybe it is known that they cannot change separately, but that knowledge is not part of what you are testing and is not related to code correctness. $1 + 2 + 3 = 6$ and $1 \times 2 \times 3 = 6$, but this should not be verified by a test case parameterized by operator.

   Unlike the other situations listed here, this one often justifies keeping duplication *outside* tests, too. If, by coincidence, `DEFAULT_BRIGHTNESS = 76` and `TAX_ID_SUFFIX = 76`, you should still not write `DEFAULT_BRIGHTNESS = TAX_ID_SUFFIX = 76` in the code under test, and they shouldn't be tested together either.
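To illustrate the earlier point in this list about `repr`, the sketch below contrasts a test that simply states the expected text with one that rebuilds it. The `Point` class is a hypothetical stand-in for code under test.

```python
import unittest


class Point:
    """Hypothetical code under test."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"{type(self).__name__}({self.x!r}, {self.y!r})"


class TestPointRepr(unittest.TestCase):
    def test_repr_states_expected_text(self):
        point = Point(1, 2)

        result = repr(point)

        self.assertEqual(result, "Point(1, 2)")

    def test_repr_rebuilt_by_test(self):
        # Weaker: the expected value is built the same way the code under
        # test builds it, so a formatting bug could hide in both places.
        point = Point(1, 2)
        expected = f"{type(point).__name__}({point.x!r}, {point.y!r})"

        result = repr(point)

        self.assertEqual(result, expected)
```

Both tests pass here, but only the first documents what the `repr` actually looks like.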

There may also be situations where it would be preferable to use some other technique, but practical considerations or organizational limitations forbid using the libraries that would be needed to use those techniques well, or forbid using some of the techniques due to the need for the tests to be understandable to novices or to engineers who primarily work in some other language.

**`test_bobcats.py` duplicates shared test logic** for testing the `Bobcat` and `FierceBobcat` classes.

## Hiding classes from `unittest`

```python
import unittest
```

Three ways to keep a `unittest` test runner from collecting base test classes:

#### 1. Delete the names (variables) that refer to the base classes, afterwards.

```python
class _BaseClassForSomeTests(unittest.TestCase):
    ...  # Blah blah, shared tests.

class TestWidgets(_BaseClassForSomeTests):
    ...  # Anything else we need for widget tests.

class TestGadgets(_BaseClassForSomeTests):
    ...  # Anything else we need for gadget tests.

del _BaseClassForSomeTests
# TestWidgets and TestGadgets are still derived classes of _BaseClassForSomeTests.
```

#### 2. Nest the base classes inside another class (created for that purpose).

```python
class _Bases:
    class SharedWidgetGadgetTests(unittest.TestCase):
        ...  # Blah blah, shared tests.

class TestWidgets(_Bases.SharedWidgetGadgetTests):
    ...  # Anything else we need for widget tests.

class TestGadgets(_Bases.SharedWidgetGadgetTests):
    ...  # Anything else we need for gadget tests.
```

#### 3. Put the base classes in a separate module the test runner won't find.

We might put the base classes in a file named `base_test_classes.py`, so that the module name doesn't start with `test`. (To be safe, also avoid a name that ends with `test`, since some test runners, in some configurations, look for that too.) In that module:

```python
import unittest

class BaseClassForSomeTests(unittest.TestCase):
    ...  # Blah blah, shared tests.
```

Then one or more other actual test modules (`test_whatever.py`) can import that module or import classes from it:

```python
import base_test_classes

class TestWidgets(base_test_classes.BaseClassForSomeTests):
    ...  # Anything else we need for widget tests.
    
class TestGadgets(base_test_classes.BaseClassForSomeTests):
    ...  # Anything else we need for gadget tests.
```

This approach is the least common of the three. It is mostly reasonable only when two or more separate test modules would benefit from using the base classes, or when it would otherwise be desirable to have the shared tests in a separate module (even if only one other module is going to use them).
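One way to check that a hiding technique works is to ask a `unittest` loader what it collects. The sketch below does this for the nested-class technique. It builds a throwaway module object (the name `test_fake` and the class names are hypothetical) rather than importing a real test module, and it assumes the loader's default behavior of collecting `TestCase` subclasses found as module attributes.

```python
import types
import unittest


class _Bases:
    class SharedTests(unittest.TestCase):
        def test_shared(self):
            self.assertTrue(True)


class TestWidgets(_Bases.SharedTests):
    pass


class TestGadgets(_Bases.SharedTests):
    pass


# Hypothetical stand-in for a real test module.
fake_module = types.ModuleType("test_fake")
fake_module._Bases = _Bases
fake_module.TestWidgets = TestWidgets
fake_module.TestGadgets = TestGadgets

suite = unittest.TestLoader().loadTestsFromModule(fake_module)

# Each element of the suite is a per-class suite of test case instances.
collected = sorted(
    type(test).__name__ for class_suite in suite for test in class_suite
)
print(collected)
```

If the base class were collected, a third entry named `SharedTests` would appear in `collected`.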

## Learning Check / Discussion Questions

### Question 1

What is a test fixture? Which kinds of test logic do test fixtures typically contain? How are test fixtures typically written in the `unittest` test framework? What are three places where test fixtures are used in this project, and how are they used in each? (Make sure your examples cover more than one kind of test logic achieved by fixtures.)

### Question 2

Recall that sometimes you anticipate a test may fail due to a particular exception being raised, and you want to make sure the test communicates it is checking for that specific failure, with a descriptive message. Furthermore, if a test fails due to an unhandled exception, that is by default reported as the test not merely *failing*, but as having an *error*. But if the test specifically exists to anticipate and report a particular exception being raised, you may not want that considered an *error*. You've encountered and dealt with this situation. Where did you do that? What did you do?

The writeup above does not cover that situation. It does, however, cover the *opposite* situation: when, to be meaningful, a test depends on something having been successfully arranged, but attempting to arrange it can fail silently, so the test checks to ensure it has succeeded. Such a check should not be done with a test assertion, because if the check fails, that represents the test having an *error*, not the test merely *failing*. (Furthermore, it should not be done with an `assert` statement, which raises `AssertionError`, as do test assertions, so those effects can be confused.) What is the technique this writeup recommends for those situations?

Where is that technique used already in this project? You may not yet have written code that uses it, but as of this writing, the technique appears 13 times in the project. Make sure you find all those occurrences. (You should use tooling to find them, rather than manually reading through every test module and hoping to notice them.) Describe at least two of these checks in detail, picking two that substantially differ from one another.

### Question 3

Sometimes code under test has overlapping functionality; then the tests should usually have overlapping functionality, too, in order to make the same or similar assertions about the behavior of the code under test. But overlapping functionality in tests may or may not be achieved the same way as in the code under test. Even if they both use techniques to reduce or eliminate code duplication (i.e., to achieve code reuse), they may or may not be using the same techniques.

Where do we have code reuse by inheritance in code under test, but test-code reuse by parameterization? Where do we have overlapping functionality in code under test that is *not* achieved by inheritance, but test-code reuse by inheritance? (In the latter case, the code under test may not use inheritance at all, or it may use inheritance but not to achieve code reuse. Recall, and make sure you understand, the other benefit inheritance may offer, besides code reuse.)

### Question 4

Are there ways the bobcats tests benefit from manually reproduced test logic? If so, what are some of them? If not, or if so but you consider the benefits not to justify the reproduction, then what technique might you use to reduce or eliminate the duplicated test logic?

### Question 5

When using the nested base class technique to prevent `unittest` from collecting test cases, should the class that contains the base classes be named starting with an underscore? Why or why not? Should the base classes it contains be named starting with an underscore? Why or why not?