# Introduction to PyTesting

When writing machine learning code in Python, there are several testing frameworks available that can be used to ensure the quality, behaviour and correctness of your code. Using a testing framework for your ML code promotes code quality, reliability, and maintainability. It allows you to catch bugs early, reduce errors, and have confidence in the performance of your machine learning models.

## Benefits of testing code

Validation: Testing frameworks enable you to validate the correctness of your machine learning models and algorithms. By writing tests, you can verify that your code behaves as expected and produces the desired results.

Regression Testing: Machine learning code often evolves over time, and changes made to the codebase can introduce new bugs or break existing functionality. Testing frameworks help in implementing regression testing, allowing you to detect and fix issues when modifying your ML code.

Documentation: Writing tests alongside your code provides executable documentation that demonstrates how your code should be used and the expected outputs. This helps other developers understand and utilize your ML code more effectively.

Continuous Integration: Testing frameworks are commonly used in continuous integration (CI) pipelines to automatically run tests on code changes. CI ensures that your ML code remains functional and reliable as you develop new features or make modifications.

Code Maintainability: Tests act as a safety net, making it easier to refactor or modify your code with confidence. They ensure that the changes you make do not introduce unexpected errors or regressions.

Collaboration: Testing frameworks make it easier for multiple developers to collaborate on ML projects. By running tests, everyone can quickly verify that their changes have not broken existing functionality.





## Popular frameworks available for Python

Some popular testing frameworks in Python include:

__unittest:__ This is a built-in testing framework in Python's standard library. It provides a set of tools for constructing and running tests. unittest is widely used and offers a comprehensive testing solution.

__pytest:__ It is a third-party testing framework that provides a more concise and flexible approach to writing tests compared to unittest. pytest supports advanced features such as fixtures, parameterized testing, and test discovery, making it popular among developers.

__doctest:__ This framework allows you to write tests within the documentation strings (docstrings) of your functions, making it easier to keep tests and code documentation in sync. doctest is lightweight and useful for simple test cases.

__nose:__ It is a test discovery and execution framework that extends unittest. nose automatically discovers test cases and provides additional plugins and features for testing Python code. Note that nose is no longer actively maintained; nose2 is its successor.

In this session we will be using __pytest__ to write and run our tests.
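
To make the difference in style concrete, here is a minimal sketch of the same check written once for unittest and once for pytest. The `add` function is a hypothetical stand-in and is not part of this session's files.

```python
import unittest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


# unittest style: tests are methods on a TestCase subclass
class TestAdd(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)


# pytest style: a plain function and a bare assert statement
def test_add():
    assert add(2, 3) == 5
```

pytest can also collect and run `unittest.TestCase` classes, so the two styles can live side by side in one project.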

## Testing Drawbacks

**Sometimes we don't know the answer...**

Unsupervised methods can give us unexpected results, and the same input may produce different results on every run.
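
One common way to make such code testable is to fix the random seed, so that repeated runs give the same answer. The sketch below assumes NumPy's `default_rng`; `pick_initial_centroids` is a hypothetical helper standing in for the random step of a clustering method.

```python
import numpy as np


def pick_initial_centroids(data, k, seed=None):
    """Hypothetical helper: choose k distinct rows of data at random."""
    # seed=None gives a different random stream on each run
    rng = np.random.default_rng(seed)
    row_indices = rng.choice(len(data), size=k, replace=False)
    return data[row_indices]


def test_same_seed_gives_same_centroids():
    data = np.arange(20.0).reshape(10, 2)
    first = pick_initial_centroids(data, k=3, seed=42)
    second = pick_initial_centroids(data, k=3, seed=42)
    # With a fixed seed the random choice is reproducible
    assert np.array_equal(first, second)
    assert first.shape == (3, 2)
```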

**Sometimes the outcomes aren't equal**

`np.nan` does not equal `np.nan`: NaN is a special floating-point value that compares unequal to every value, including itself.
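
A short sketch of what this means in practice, along with two NumPy helpers that treat NaN values in matching positions as equal:

```python
import numpy as np

# NaN never compares equal, not even to itself
assert np.nan != np.nan
assert not (np.nan == np.nan)

# Use np.isnan() to test for NaN explicitly
assert np.isnan(np.nan)

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0])

# Element-wise equality fails because of the NaN in position 1
assert not (a == b).all()

# These comparisons accept NaN in the same position on both sides
assert np.allclose(a, b, equal_nan=True)
np.testing.assert_array_equal(a, b)
```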

**How many tests should you write?**

It can be time consuming to produce tests, but what is the cost of your code being wrong?

You might think you'll only need to test something once, but you'll be thanking your past self when a bug fix turns out to have a knock-on effect on your previously working pipeline.



## Types of tests

Test | Description
-|-
Unit Testing | Tests small, isolated parts of the code, such as a single function.
Regression Testing | Checks that a given input still produces the expected output after the code changes.
Functional Testing | Tests for a specific behaviour.
Fuzz Testing | Tests the code with random or unexpected data.
Stress Testing | Attempts to overwhelm or flood the system to check for stability.


# Let's begin!

As we are using Google Colab, we will be using ```!``` at the start of our statements to run commands as you would in a terminal.

We can execute our tests using: ```!python -m pytest python_file_name.py```

By default, pytest collects tests from files named `test_*.py` or `*_test.py`.



In [None]:
# Import and install the necessary packages
!pip install hypothesis


1. Let's mount our Google Drive to this notebook to access the accompanying Python files.

*Make sure this notebook is contained within 'MyDrive/Colab Notebooks' before running the code below.*

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Switch to the google drive folder
%cd drive/MyDrive/Pytesting

/content/drive/MyDrive/Pytesting


In [3]:
# Check what other files are available in this folder
!ls

calc_mean.py  pytesting_answers.ipynb  test_calc_mean.py
__pycache__   test_calc_mean_1.py


2. We've written a function inside of 'calc_mean.py' to calculate the mean of a list of numbers:

In [4]:
# If we want to display the contents of a Python file we can run
!cat calc_mean.py

def calculate_mean(numbers):
    if len(numbers) == 0:
        return None
    else:
        return sum(numbers) / len(numbers)

3. We want to check that it works properly, so we've created some simple tests using the pytest package and the `assert` statement.

The `assert` statement checks whether a given expression or condition evaluates to `True` or `False`. If the condition is `False`, the `assert` statement raises an `AssertionError` exception, indicating that the test has failed. Below we use `assert` to compare the answer calculated by the function with our hand-calculated answer and check that the results match.

In [15]:
# If we want to display the contents of a Python file we can run
!cat test_calc_mean.py


import pytest
import numpy as np
from calc_mean import calculate_mean

def test_calculate_mean():
    numbers = [1, 2, 3, 4, 5]
    assert calculate_mean(numbers) == 3.0

def test_calculate_mean_empty_list():
    numbers = []
    assert calculate_mean(numbers) is None

def test_calculate_mean_single_number():
    numbers = [10]
    assert calculate_mean(numbers) == 10.0

def test_calculate_mean_negative_numbers():
    numbers = [-1, -2, -3, -4, -5]
    # you can also use np.allclose() to assert whether the answers are close
    assert np.allclose(calculate_mean(numbers), -3.0)
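
The comment in the last test matters more than it might seem: floating-point arithmetic is not exact, so `==` can fail on results that are mathematically equal. A minimal sketch using the standard library's `math.isclose` (pytest offers `pytest.approx` for the same purpose):

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of floating-point rounding
assert total != 0.3

# Comparing within a tolerance succeeds
assert math.isclose(total, 0.3)
```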


In [17]:
# To run the test
# The -v flag shows the tests as they are being processed
!python -m pytest -v test_calc_mean.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /content/drive/MyDrive/Pytesting
plugins: anyio-3.6.2
collecting ... collected 4 items

test_calc_mean.py::test_calculate_mean PASSED                            [ 25%]
test_calc_mean.py::test_calculate_mean_empty_list PASSED                 [ 50%]
test_calc_mean.py::test_calculate_mean_single_number PASSED              [ 75%]
test_calc_mean.py::test_calculate_mean_negative_numbers PASSED           [100%]



**BRILLIANT!** All the tests passed! Our code must be functioning correctly... Right?

Yes, the code passes these tests, but what if the person executing the code passes in the wrong data type? For example, say they provide a set instead of a list. What happens?

## Exercise 1: Add a test to check the argument type

Add a test function to the code below that checks how `calculate_mean` handles a different argument type. When your test function is ready, uncomment the first line so that running the cell writes a new test file.

In [7]:
#%%writefile test_calc_mean_1.py

import pytest
from calc_mean import calculate_mean

def test_calculate_mean():
    numbers = [1, 2, 3, 4, 5]
    assert calculate_mean(numbers) == 3.0

def test_calculate_mean_empty_list():
    numbers = []
    assert calculate_mean(numbers) is None

def test_calculate_mean_single_number():
    numbers = [10]
    assert calculate_mean(numbers) == 10.0

def test_calculate_mean_negative_numbers():
    numbers = [-1, -2, -3, -4, -5]
    assert calculate_mean(numbers) == -3.0

def test_calculate_list_of_numbers():
    numbers = {1, 2, 3, 4, 5}
    assert calculate_mean(numbers) == 3.0

Overwriting test_calc_mean_1.py


Run the cell below to execute your new test!

In [8]:
!python -m pytest test_calc_mean_1.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0
rootdir: /content/drive/MyDrive/Pytesting
plugins: anyio-3.6.2
collecting ... collected 5 items

test_calc_mean_1.py .....                                                [100%]



Your test should still pass: `sum()` and `len()` both work on a set, so the function handles it without error.
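
One caveat is worth knowing: a set silently drops duplicate values, so the mean of a set can differ from the mean of the list it was built from. A small sketch, with `calculate_mean` copied from `calc_mean.py` so the block runs on its own:

```python
def calculate_mean(numbers):
    # Copied from calc_mean.py so this example is self-contained
    if len(numbers) == 0:
        return None
    else:
        return sum(numbers) / len(numbers)


as_list = [1, 1, 4]
as_set = set(as_list)  # duplicates are removed, leaving {1, 4}

assert calculate_mean(as_list) == 2.0  # (1 + 1 + 4) / 3
assert calculate_mean(as_set) == 2.5   # (1 + 4) / 2
```

A passing test on a set therefore does not mean a set is a safe input for this function.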

🌷 Tip: When writing functions such as ```calculate_mean()```, it's vital to include a docstring that tells users which data types they can pass in and the structure required. It could look something like this:



In [9]:
def calculate_mean(numbers):
    """
    Calculate the mean of a list of numbers.

    Args:
        numbers (list): A list of numbers.

    Returns:
        float or None: The mean of the numbers in the list.
                       Returns None if the list is empty.

    Example:
        >>> numbers = [1, 2, 3, 4, 5]
        >>> calculate_mean(numbers)
        3.0
    """
    if len(numbers) == 0:
        return None
    else:
        return sum(numbers) / len(numbers)

Additionally, 'error handling' is a good way to manage and respond to errors or exceptions that occur during program execution. It is useful for preventing crashes, improving user experience, and enabling effective debugging and troubleshooting.

Python has many built-in exception types that catch common errors. Normally these are a blessing and prevent hours of bug searching 🐛, but sometimes they can throw us off even more.

We can implement our own `try`/`except` blocks to catch and handle human error for us.

Say, for example, we try to pass a dictionary as an argument to our function. We could implement the `try`/`except` block as below:

In [12]:
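nums = {'a': 0, 'b': 1, 'c': 2}

try:
    # isinstance() is the idiomatic way to check a value's type
    assert isinstance(nums, list)
    print("Assertions complete, nums is a list.")
except AssertionError:
    # Reached when the assertion above fails
    print("Error: data type is incorrect, nums should be a list.")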

Error: data type is incorrect, nums should be a list.
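
Inside a function, a common alternative to `assert` is to raise a specific exception, which a test can then check for with `pytest.raises`. This is a minimal sketch rather than the session's reference solution; `calculate_mean_strict` is a hypothetical variant of `calculate_mean`.

```python
import pytest


def calculate_mean_strict(numbers):
    """Hypothetical variant of calculate_mean that rejects non-list input."""
    if not isinstance(numbers, list):
        raise TypeError(
            f"numbers should be a list, got {type(numbers).__name__}"
        )
    if len(numbers) == 0:
        return None
    return sum(numbers) / len(numbers)


def test_rejects_dict():
    # The test passes only if a TypeError is raised inside the block
    with pytest.raises(TypeError):
        calculate_mean_strict({"a": 0, "b": 1, "c": 2})


def test_accepts_list():
    assert calculate_mean_strict([1, 2, 3]) == 2.0
```

Unlike `assert`, an explicit `raise` still runs when Python is started with the `-O` flag, which strips assert statements.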


## Exercise 2

You don't know what you don't know...

It's hard to come up with tests for every scenario. But we can use the [Hypothesis Package](https://hypothesis.readthedocs.io/en/latest/) to help us come up with edge cases.

Say we have a sigmoid activation function in the `activations.py` file as shown below:

In [24]:
!cat activations.py

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))


In [31]:
%%writefile test_activations_hypothesis.py
import activations
from hypothesis import given
from hypothesis.strategies import floats


@given(floats())
def test_sigmoid(x):
    result = activations.sigmoid(x)
    assert 0 <= result <= 1

Overwriting test_activations_hypothesis.py


The cell above creates a test using Hypothesis. The `floats()` strategy generates float inputs for us, including edge cases such as NaN and infinity, and `@given` runs the test against them.

In [32]:
!cat test_activations_hypothesis.py

import activations
from hypothesis import given
from hypothesis.strategies import floats


@given(floats())
def test_sigmoid(x):
    result = activations.sigmoid(x)
    assert 0 <= result <= 1


In [33]:
!python -m pytest -v test_activations_hypothesis.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/drive/MyDrive/Pytesting/.hypothesis/examples')
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 1 item

test_activations_hypothesis.py::test_sigmoid FAILED                      [100%]

_________________________________ test_sigmoid _________________________________

    @given(floats())
>   def test_sigmoid(x):

test_activations_hypothesis.py:7:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = nan

    @given(floats())
    def test_sigmoid(x):
        result

The test failed: for a NaN input, `sigmoid` returns NaN, and NaN is not between 0 and 1. Perhaps we didn't think about the implications of NaN, so Hypothesis has kindly pointed this out. We can change our test so that it only checks the result when the input is not NaN:

In [36]:
%%writefile test_activations_hypothesis_1.py
import activations
import numpy as np
from hypothesis import given
from hypothesis.strategies import floats


@given(floats())
def test_sigmoid(x):
    result = activations.sigmoid(x)
    if not np.isnan(x):
        assert 0 <= result <= 1

Overwriting test_activations_hypothesis_1.py


In [37]:
!python -m pytest -v test_activations_hypothesis_1.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/drive/MyDrive/Pytesting/.hypothesis/examples')
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 1 item

test_activations_hypothesis_1.py::test_sigmoid FAILED                    [100%]

_________________________________ test_sigmoid _________________________________

    @given(floats())
>   def test_sigmoid(x):

test_activations_hypothesis_1.py:8:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_activations_hypothesis_1.py:9: in test_sigmoid
    result = activations.sigmoid(x)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Oh no! The test is still failing! This time the problem is overflow: `math.exp(-x)` raises an `OverflowError` when `-x` is too large. It seems we have to restrict our float input to prevent this.

In [38]:
%%writefile test_activations_hypothesis_2.py
import activations
import numpy as np
from hypothesis import given
from hypothesis.strategies import floats


@given(floats(min_value=-100, max_value=100))
def test_sigmoid(x):
    result = activations.sigmoid(x)
    if not np.isnan(x):
        assert 0 <= result <= 1

Writing test_activations_hypothesis_2.py


In [39]:
!python -m pytest -v test_activations_hypothesis_2.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/drive/MyDrive/Pytesting/.hypothesis/examples')
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 1 item

test_activations_hypothesis_2.py::test_sigmoid PASSED                    [100%]



Phew, the test passed! Hopefully this has demonstrated how useful Hypothesis can be for writing your tests.
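
Restricting the input range makes the test pass, but `sigmoid` itself still raises an error for very negative inputs. As a possible follow-up (not part of the original `activations.py`), here is a sketch of a numerically stable variant, under the hypothetical name `stable_sigmoid`, which never calls `math.exp` on a large positive number:

```python
import math


def sigmoid(x):
    # Original version from activations.py
    return 1 / (1 + math.exp(-x))


def stable_sigmoid(x):
    """Hypothetical variant that avoids overflow in math.exp."""
    if x >= 0:
        # -x <= 0 here, so exp(-x) is at most 1
        return 1 / (1 + math.exp(-x))
    # x < 0 here (or NaN, which simply propagates), so exp(x) stays below 1
    z = math.exp(x)
    return z / (1 + z)


def test_stable_sigmoid_extremes():
    assert stable_sigmoid(0) == 0.5
    assert 0 <= stable_sigmoid(-1000) <= 1
    assert 0 <= stable_sigmoid(1000) <= 1


def test_original_sigmoid_overflows():
    try:
        sigmoid(-1000)
    except OverflowError:
        pass  # expected: math.exp(1000) is out of range
    else:
        raise AssertionError("expected OverflowError")
```

With a function like this, the Hypothesis test would no longer need `min_value` and `max_value`, although NaN inputs would still need to be handled or excluded.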



## Exercise 3


Now that we've gone through some small examples, let's try to write some tests for a machine learning model...

Ideas:

* Choose a small dataset (https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset) and write short code to train a model to predict outcomes.

* Check that a batching function works, or test that the data loading and preprocessing functions are working correctly by checking that the data has the expected shapes and types and that the expected preprocessing transformations are applied.

* Then get groups to try creating their own tests for the model.
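
As a starting point for the batching idea, here is a minimal sketch. `make_batches` is a hypothetical helper, not something provided with this notebook; the tests check the batch sizes and that no items are lost or reordered.

```python
def make_batches(items, batch_size):
    """Hypothetical helper: split items into consecutive batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def test_make_batches_sizes():
    batches = make_batches(list(range(10)), batch_size=4)
    # The last batch holds the remainder
    assert [len(batch) for batch in batches] == [4, 4, 2]


def test_make_batches_keeps_all_items_in_order():
    data = list(range(10))
    batches = make_batches(data, batch_size=3)
    flattened = [item for batch in batches for item in batch]
    assert flattened == data


def test_make_batches_empty_input():
    assert make_batches([], batch_size=4) == []
```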



## A problem shared is a problem halved!

If you have any ideas for pytest tests that could help ML/AI pipelines, have some tests you'd like to share with others, or have coding issues you need some assistance with, check out the collaborative [hackmd notebook](https://hackmd.io/@3UbYXkLuSRWoUkulK3ihvw/pytesting).

Well done for completing this notebook!

There are many other areas we have not covered which you may wish to delve into further, for example:

* **Continuous integration:** test that your code runs after every change. You can also add a `Tests` badge to your GitHub repository, using [GitHub Actions](https://github.com/dwyl/repo-badges), to show that your codebase is passing all the tests.

