# Mini-unit #5: Testing your code

You may recall [name] from the first mini-unit, whose code error led to a paper retraction. He is not the only one to experience this: there have been numerous examples of code errors affecting published studies [insert a few refs]. Based on his experience, [name] now implements "software-engineer level testing" of research code. This mini-unit will teach you what that means and how to do it.

In this mini-unit, we will focus on testing to detect actual errors in the code you write. You should also be testing to uncover issues in how you design your processing, analysis, etc., and in how you use other people's code; check out Unit [], which covers how to do this!

A lot of this mini-unit is inspired by Patrick Mineault's excellent chapter on testing in The Good Research Handbook: https://goodresearch.dev/testing.html. Check this out for further reading on the topic!

## Video

This video will include handwritten notes and live coding, so slides aren't prepared.

Script:

Errors in code happen - they're difficult to avoid! So, you need to be checking and testing your code to catch errors and ensure it is correct, otherwise your research conclusions could be erroneous! The previous mini-units have already set you up well to test your code - it is much easier to test a suite of small functions than one long script.

We will use a concept from software engineering called a *unit test* to give us a useful framework for systematically testing our code. While any way of checking your code is worthwhile, using this framework can be less stressful and less work, and it makes refactoring your code easier.

So what is a unit test? It is a test of a small piece of code, like a function, to make sure it is doing what it should.

Let's back up and talk a bit about informal testing of your code.

Let's jump in with an example: a function, taken from Patrick Mineault's chapter on testing, that computes numbers in the Fibonacci sequence. The sequence is defined as:

$$F(x) = F(x-1) + F(x-2) \quad \text{for } x \geq 2, \qquad F(0) = 0, \qquad F(1) = 1$$

The first few numbers in this sequence are 0, 1, 1, 2, 3...
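The function itself isn't reproduced in this text. As a stand-in, here is a minimal iterative sketch consistent with the definition above. It is our own illustrative version, not necessarily the code in Mineault's chapter, and having it raise `ValueError` for negative input is our assumption.

```python
def fib(x):
    """Return the x-th Fibonacci number, with fib(0) = 0 and fib(1) = 1.

    Illustrative sketch: iterative, so fib(40) returns instantly
    (a naive recursive version would make hundreds of millions of calls).
    Raising ValueError for negative x is an assumption for this example.
    """
    if x < 0:
        raise ValueError("x must be a non-negative integer")
    previous, current = 0, 1  # F(0), F(1)
    for _ in range(x):
        # Slide the pair one step along the sequence: (F(k), F(k+1)) -> (F(k+1), F(k+2))
        previous, current = current, previous + current
    return previous
```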

So, how would you go about checking that you correctly wrote this function? You would probably want to try out some inputs and make sure you get the correct output.

```
>>> fib(0)
0
>>> fib(1)
1
>>> fib(2)
1
>>> fib(6)
8
>>> fib(40)
102334155

```



This is a great way to check outputs against the inputs, and it is essentially the idea behind unit tests: you are testing that the function does what it should for a range of different inputs.

Testing your function in this way is good, but it has some downsides. If you're just calling the function from the command line or a notebook, you may not be saving the expected outputs or any of your hard work on designing good tests. Later, if you change anything about the function and want to test it again, you might need to do it all over again!

**Assert testing**: An easy step up for testing is to use `assert` statements. An `assert` statement in Python raises an error (an `AssertionError`) if the expression following `assert` evaluates to `False`. Let's rewrite all of our earlier informal tests as assert statements.

```
assert fib(0) == 0
assert fib(1) == 1
assert fib(2) == 1
assert fib(6) == 8
assert fib(40) == 102334155
```
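When an assertion fails, Python raises an `AssertionError` at that line and stops. You can optionally add a message after a comma to make the failure easier to diagnose. The sketch below uses a deliberately broken, hypothetical `fib_buggy` function; the `try`/`except` is there only so the example prints the message instead of halting.

```python
def fib_buggy(x):
    """Hypothetical broken Fibonacci: wrong starting values, so fib_buggy(0) is 1."""
    a, b = 1, 1  # bug: should start from 0, 1
    for _ in range(x):
        a, b = b, a + b
    return a


try:
    # The text after the comma is shown as part of the AssertionError
    assert fib_buggy(0) == 0, f"fib_buggy(0) returned {fib_buggy(0)}, expected 0"
except AssertionError as err:
    print("Test failed:", err)
    # prints: Test failed: fib_buggy(0) returned 1, expected 0
```

Note that Python skips `assert` statements entirely when run with the `-O` (optimize) flag, so use them for tests rather than for validating inputs in code other people will run.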

When working with .py files, we can include these assert statements in the same file where the function lives, inside an `if __name__ == '__main__':` block, so that running the file directly runs the tests. This way, the tests sit right next to the functions and document what they should do. You don't have to remember the expected output of each test: an error will be raised if one of the tests fails. Now you can refactor the function and rerun all the same tests just by running the file.
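Here is a minimal sketch of what such a file might look like, assuming a hypothetical file named `fibonacci.py` and an illustrative iterative `fib`. The tests run when you execute `python fibonacci.py`, but not when another script does `from fibonacci import fib`.

```python
# fibonacci.py (hypothetical file name)

def fib(x):
    """Return the x-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    previous, current = 0, 1
    for _ in range(x):
        previous, current = current, previous + current
    return previous


if __name__ == "__main__":
    # These only run when the file is executed directly,
    # not when fib is imported from another module.
    assert fib(0) == 0
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(6) == 8
    assert fib(40) == 102334155
    print("All tests passed!")
```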

**Testing suite**: Python has several frameworks for unit tests that let you group all your tests together and run them in a more automated way; you don't even have to remember to run each file directly to trigger the assert tests! I'll cover one such framework, `pytest`, in the advanced optional add-on. The idea is very similar to the assert statements: you wrap them in functions, and then you can run all the tests in your whole code base at once. It also gives more informative messages than the assert strategy when tests fail, and makes it easier to test more kinds of behavior. For example, it lets you check that a function raises an error for an invalid input. All in all, I highly recommend checking out the video on this!
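As a rough preview (the add-on covers the details): by default, pytest looks for files named like `test_*.py` and runs every function in them whose name starts with `test`. The sketch below assumes a hypothetical `test_fibonacci.py`; `fib` is defined inline only to keep it self-contained (in a real project you would import it), and its `ValueError` for negative input is our own design choice.

```python
# test_fibonacci.py (hypothetical): pytest runs every function named test_*

def fib(x):
    """Illustrative Fibonacci; raising ValueError for x < 0 is our assumption."""
    if x < 0:
        raise ValueError("x must be non-negative")
    previous, current = 0, 1
    for _ in range(x):
        previous, current = current, previous + current
    return previous


def test_base_cases():
    assert fib(0) == 0
    assert fib(1) == 1


def test_typical_values():
    assert fib(2) == 1
    assert fib(6) == 8
    assert fib(40) == 102334155


def test_negative_input_raises():
    # With pytest imported, the idiomatic form is:
    #     with pytest.raises(ValueError):
    #         fib(-1)
    # A plain try/except is used here so the snippet runs without pytest installed.
    try:
        fib(-1)
    except ValueError:
        return  # the expected error was raised
    raise AssertionError("fib(-1) should raise ValueError")
```

Running `pytest` from the project folder would discover these three tests and report each pass or failure separately.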


**What to test**: Whether you are using the simple assert approach or a full testing suite, what matters most is how you design the tests. You need to be very explicit about the output you expect and compare against it. In the Fibonacci function, the inputs and outputs are quite simple, but when you are dealing with more complex functions that do analysis, you may need to spend some time computing by hand what the output should be.

What sorts of things should you test?
- Correctness for typical inputs
- Edge cases (weird inputs: often very small or large numbers)
- Errors for bad inputs (what happens if you input -1?)
- Shape of inputs and outputs
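To make these categories concrete, here is a sketch of tests for a hypothetical analysis helper, `rescale_to_unit`, which linearly rescales values to the range [0, 1]. Its behavior for empty or constant input (raising `ValueError`) is an assumption made for this example. `np.allclose` compares floats with a small tolerance, which is usually safer than `==` for floating-point results.

```python
import numpy as np


def rescale_to_unit(values):
    """Hypothetical helper: linearly rescale values to the range [0, 1].

    Raises ValueError for empty input or when all values are equal
    (the range would be zero). These choices are assumptions for the example.
    """
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        raise ValueError("input is empty")
    value_range = values.max() - values.min()
    if value_range == 0:
        raise ValueError("all values are equal; cannot rescale")
    return (values - values.min()) / value_range


# Correctness for a typical input, with the expected output worked out by hand:
# (0, 5, 10) -> (0/10, 5/10, 10/10) = (0, 0.5, 1)
assert np.allclose(rescale_to_unit(np.array([0, 5, 10])), [0.0, 0.5, 1.0])

# Edge case: negative and very large numbers
assert np.allclose(rescale_to_unit(np.array([-1e12, 0, 1e12])), [0.0, 0.5, 1.0])

# Shape: a 2-d input should give an output of the same shape
data_2d = np.array([[1, 2, 3], [4, 5, 6]])
assert rescale_to_unit(data_2d).shape == data_2d.shape

# Errors for bad inputs: empty or constant arrays should raise ValueError
for bad_input in (np.array([]), np.array([3, 3, 3])):
    try:
        rescale_to_unit(bad_input)
    except ValueError:
        pass  # expected
    else:
        raise AssertionError(f"no error raised for {bad_input}")
```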

We are focusing on testing pure functions without side effects, as that's what we recommend writing. If you have more complicated functions or classes that depend on some state (like a random seed) or do something other than simply returning outputs (like saving data to disk), your tests need to be a little more complex.
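For functions that use randomness, one common pattern is to pass the random number generator in as an argument, so a test can fix the seed and check that results are reproducible. Below is a minimal sketch with a hypothetical `bootstrap_means` function, using NumPy's `np.random.default_rng` and `Generator.choice`.

```python
import numpy as np


def bootstrap_means(data, n_resamples, rng):
    """Hypothetical helper: means of n_resamples bootstrap resamples of data.

    rng is a NumPy Generator, e.g. np.random.default_rng(seed), passed in
    so that callers (and tests) control the randomness.
    """
    data = np.asarray(data)
    # Each row is one resample of the data, drawn with replacement
    resamples = rng.choice(data, size=(n_resamples, data.size), replace=True)
    return resamples.mean(axis=1)


data = np.array([1.0, 2.0, 3.0, 4.0])

# Same seed -> identical results, so the test is reproducible
first = bootstrap_means(data, 100, np.random.default_rng(42))
second = bootstrap_means(data, 100, np.random.default_rng(42))
assert np.array_equal(first, second)

# Shape: one mean per resample
assert first.shape == (100,)

# Every bootstrap mean must lie between the smallest and largest data value
assert np.all((first >= data.min()) & (first <= data.max()))

# A constant input has only one possible bootstrap mean, whatever the seed
assert np.allclose(bootstrap_means(np.full(5, 7.0), 10, np.random.default_rng(0)), 7.0)
```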

If you thoroughly test each function in `_src`, you can be much more confident that your scientific code is accurate and that your results are valid! You can also save yourself time, since you don't need to recheck everything by hand when refactoring your code. As a bonus, you'll often find that thinking through the tests gives you more insight into what the function should be doing, especially for odd cases. In fact, some coders find this so useful that they practice *test-driven development*: they write the tests for a function before writing the function itself.
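As a toy illustration of test-driven development, you might write the tests for a hypothetical `moving_average` function first, use them to decide what it should do in awkward cases, and only then write the function to make them pass. The function name and its behavior are assumptions for this sketch.

```python
import numpy as np


# Step 1: write the tests first. Writing them forces decisions, e.g. what
# should happen when the window is longer than the data?
def test_moving_average():
    # Expected values worked out by hand: (1+2)/2, (2+3)/2, (3+4)/2
    assert np.allclose(moving_average(np.array([1, 2, 3, 4]), 2), [1.5, 2.5, 3.5])
    assert np.allclose(moving_average(np.array([5, 5, 5]), 3), [5.0])
    try:
        moving_average(np.array([1, 2]), 3)
    except ValueError:
        pass  # decided: a window longer than the data is an error
    else:
        raise AssertionError("expected ValueError for window > len(data)")


# Step 2: write the simplest function that passes those tests.
def moving_average(data, window):
    """Mean of each run of `window` consecutive values in 1-d data."""
    if window < 1 or window > len(data):
        raise ValueError("window must be between 1 and len(data)")
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely
    return np.convolve(data, kernel, mode="valid")


test_moving_average()
```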

# Coding Exercise: Catching Errors

For each example below, try writing multiple unit tests using the `assert` statement strategy. In this notebook, put each `assert` statement in a separate code cell; otherwise, if an earlier one fails, the following ones won't even run.

Try to find any errors the function makes and, if there are any, figure out what causes them. If you have time, try fixing the function so it passes your tests; it'll be good practice!

**A)**



```
def is_prime(number):
    """Checks if a number is prime.

    Args:
      number (int): The number to be checked.

    Returns:
      bool: True if the number is prime, False otherwise.
    """

    for possible_factor in range(2, number):
        if (number % possible_factor) == 0:
            return False

    return True
```

```
# Write your assert statements to test is_prime here (remember to have one assert statement per code cell)
```




Did you find any errors in the function?

<textarea name="comments" id="comments">
...
</textarea><br />
<input type="submit" value="Submit" />

```
# Our solution

# The original is_prime fails the tests for 0, 1 and negative numbers:
# range(2, number) is empty for number <= 2, so for these inputs the
# loop never runs and the function returns True.
assert is_prime(0) == False
assert is_prime(1) == False
assert is_prime(4) == False
assert is_prime(7) == True
assert is_prime(797) == True
assert is_prime(14161729) == True
assert is_prime(-7) == False
assert is_prime(-4) == False
# Ideally we'd also check that weird inputs like a string raise an
# exception, but that is clunky with plain assert statements; a testing
# suite like pytest makes it easy.
```

```
# Our fix: 0, 1 and negative numbers are not prime
def is_prime(number):
    """Checks if a number is prime.

    Args:
      number (int): The number to be checked.

    Returns:
      bool: True if the number is prime, False otherwise.
    """

    if number <= 1:
        return False

    for possible_factor in range(2, number):
        if (number % possible_factor) == 0:
            return False

    return True
```

**B)**


The function `find_nth_from_ends` should return the nth largest and nth smallest unique values in the input NumPy array. For example, if n = 2 and there are multiple entries of 100 tied for the maximum, the second largest value is not 100; it is the next highest value present in the array.

Brief reminder: you can create a 1-d array in numpy with:

 `my_array = np.array([1, 5, 0])`

and a 2-d array with:

 `my_array = np.array([[4, 1, 2], [5, 5, 0]])`
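The function below relies on `np.unique`, which returns the sorted unique values of an array and, for a multi-dimensional array, flattens it first:

```python
import numpy as np

my_array = np.array([[4, 1, 2], [5, 5, 0]])
# Flattened, de-duplicated and sorted: the repeated 5 appears once
print(np.unique(my_array))  # [0 1 2 4 5]
```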

```
import numpy as np

def find_nth_from_ends(array, n):
    """Returns the nth largest and smallest values in the array

    Args:

        array (ndarray): the array in question

        n (int): how many numbers lower than max/higher than min to return


    Returns:

        float/int, float/int: nth largest value, nth smallest value (data type matches array)
    """

    # Get unique values
    unique_values = np.unique(array)

    # Sort array from smallest to biggest
    sorted_array = np.sort(unique_values, axis = None)

    return sorted_array[-n], sorted_array[n - 1]
```

```
# Write your assert statements (remember to have one assert statement per code cell)
```



Did you find any errors in the function?

<textarea name="comments" id="comments">
...
</textarea><br />
<input type="submit" value="Submit" />

```
# Our solution

# One-dimensional array
a = np.array([4, 6, 8])
outs = find_nth_from_ends(a, 2)
assert outs == (6, 6)

# Two-dimensional array
a = np.array([[4, 6, 8], [2, 1, 0]])
outs = find_nth_from_ends(a, 2)
assert outs == (6, 1)

# All the same value: there is only one unique value,
# so the original function raises IndexError here
a = np.zeros((6, 10, 3))
outs = find_nth_from_ends(a, 2)
assert outs == (0, 0)

# Negative n makes no sense, so we expect a ValueError
# (the original function silently returns (1, 6))
a = np.array([[4, 6, 8], [2, 1, 0]])
try:
    find_nth_from_ends(a, -1)
    raised_error = False
except ValueError:
    raised_error = True
assert raised_error

# Larger n than the number of unique values: the original raises
# IndexError; our fix falls back to the smallest and largest values
a = np.array([[4, 6, 8], [2, 1, 0]])
outs = find_nth_from_ends(a, 10)
assert outs == (0, 8)
```

```
# Our fix
def find_nth_from_ends(array, n):
    """Returns the nth largest and smallest values in the array

    Args:

        array (ndarray): the array in question

        n (int): how many numbers lower than max/higher than min to return


    Returns:

        float/int, float/int: nth largest value, nth smallest value (data type matches array)
    """

    # n must be a positive integer
    if n < 1:
        raise ValueError('n must be at least 1')

    # Get unique values
    unique_values = np.unique(array)

    # Sort array from smallest to biggest
    sorted_array = np.sort(unique_values, axis=None)

    # Check if n is larger than the number of unique values
    # (not array.size, which also counts repeated values)
    if n > sorted_array.size:
        print('n is bigger than the number of unique values in the array')
        return sorted_array[0], sorted_array[-1]
    else:
        return sorted_array[-n], sorted_array[n - 1]
```


Video: live-coding the solution above and talking through it.

# Advanced (optional) add-on: Using pytest in VSCode