# Introduction to PyTesting

When writing machine learning code in Python, there are several testing frameworks available that can be used to ensure the quality, behaviour and correctness of your code. Using a testing framework for your ML code promotes code quality, reliability, and maintainability. It allows you to catch bugs early, reduce errors, and have confidence in the performance of your machine learning models.

## Benefits of testing code

Validation: Testing frameworks enable you to validate the correctness of your machine learning models and algorithms. By writing tests, you can verify that your code behaves as expected and produces the desired results.

Regression Testing: Machine learning code often evolves over time, and changes made to the codebase can introduce new bugs or break existing functionality. Testing frameworks help in implementing regression testing, allowing you to detect and fix issues when modifying your ML code.

Documentation: Writing tests alongside your code provides executable documentation that demonstrates how your code should be used and the expected outputs. This helps other developers understand and utilize your ML code more effectively.

Continuous Integration: Testing frameworks are commonly used in continuous integration (CI) pipelines to automatically run tests on code changes. CI ensures that your ML code remains functional and reliable as you develop new features or make modifications.

Code Maintainability: Tests act as a safety net, making it easier to refactor or modify your code with confidence. They ensure that the changes you make do not introduce unexpected errors or regressions.

Collaboration: Testing frameworks make it easier for multiple developers to collaborate on ML projects. By running tests, everyone can quickly verify that their changes have not broken existing functionality.





## Popular frameworks available for Python

Some popular testing frameworks in Python include:

__unittest:__ This is a built-in testing framework in Python's standard library. It provides a set of tools for constructing and running tests. unittest is widely used and offers a comprehensive testing solution.

__pytest:__ It is a third-party testing framework that provides a more concise and flexible approach to writing tests compared to unittest. pytest supports advanced features such as fixtures, parameterized testing, and test discovery, making it popular among developers.

__doctest:__ This framework allows you to write tests within the documentation strings (docstrings) of your functions, making it easier to keep tests and code documentation in sync. doctest is lightweight and useful for simple test cases.

__nose:__ It is a test discovery and execution framework that extends unittest. nose automatically discovers test cases and provides additional plugins and features for testing Python code. Note that nose is no longer actively maintained; nose2 is its successor.

In this session we will be using __PyTest__ to write and run our tests.
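To make the difference in style concrete, here is a minimal sketch of the same check written once for unittest and once for pytest. The `double` function is a hypothetical stand-in and is not one of this session's files.

```python
import unittest


def double(x):
    """Hypothetical function under test: returns twice its input."""
    return 2 * x


# unittest style: tests are methods on a TestCase subclass
class TestDouble(unittest.TestCase):
    def test_double(self):
        self.assertEqual(double(4), 8)


# pytest style: a plain function and a bare assert
def test_double():
    assert double(4) == 8
```

pytest can collect and run both styles, which is one reason it is often adopted on top of an existing unittest suite.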

## Testing Drawbacks

**Sometimes we don't know the answer...**

Unsupervised methods can give us unexpected results, and the same input may produce
different results on every run.
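One common way to make this kind of code testable is to fix the random seed so that a run can be repeated exactly. The sketch below uses only the standard library; `noisy_estimate` is a hypothetical stand-in for a stochastic method, not something from this session's files.

```python
import random


def noisy_estimate(data, seed=None):
    """Hypothetical stochastic method: mean of a random half of the data."""
    rng = random.Random(seed)  # a private generator, unaffected by global state
    sample = rng.sample(data, k=len(data) // 2)
    return sum(sample) / len(sample)


data = list(range(100))

# With the same seed, two runs give exactly the same answer
assert noisy_estimate(data, seed=42) == noisy_estimate(data, seed=42)

# Without a fixed seed we can still test properties that must always hold
estimate = noisy_estimate(data)
assert min(data) <= estimate <= max(data)
```

When an exact answer is not known in advance, testing a property like the bound above is often the more useful of the two checks.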

**Sometimes the outcomes aren't equal**

`np.nan` does not equal `np.nan`: NaN is a special floating-point value that compares unequal to every value, including itself.
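A short sketch of what this means in practice, using plain Python floats (`np.nan` is the same kind of float, and NumPy offers `np.isnan` for arrays). It also shows the related problem that floating-point results are often only approximately equal.

```python
import math

nan = float("nan")  # the same kind of value that np.nan holds

# NaN compares unequal to everything, including itself
assert nan != nan
assert not (nan == nan)

# To test for NaN, ask explicitly
assert math.isnan(nan)

# Rounding error means "equal" floats are often only approximately equal
assert 0.1 + 0.2 != 0.3
assert math.isclose(0.1 + 0.2, 0.3)
```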

**How many tests should you write?**

It can be time consuming to produce tests, but what is the cost of your code being wrong?

You might think you'll only need to test something once, but you'll be thanking your past self when you do a bug fix and realise that it had a knock-on effect on your previously working pipeline.



## Types of tests

Test | Description
-|-
Unit Testing | Tests individual parts of the code (e.g. a single function) in isolation.
Regression Testing | Looks for a specific output given a certain input. Used after changing code, e.g. adding a new feature.
Functional Testing | Tests for a specific behaviour.
Fuzz Testing | Feeds random or unexpected data to the code to look for crashes and edge cases.
Stress Testing | Attempting to overwhelm/flood the system to check for stability.
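As a rough illustration of the fuzz testing row in the table above, the sketch below throws many random lists at a function and checks a property that must always hold. It uses `statistics.mean` from the standard library as the function under test; the seed, list sizes and value ranges are arbitrary choices made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so that a failure can be reproduced

for _ in range(1000):
    size = rng.randint(1, 50)
    numbers = [rng.randint(-1000, 1000) for _ in range(size)]
    mean = statistics.mean(numbers)
    # Whatever the input, the mean must lie between the smallest and largest value
    assert min(numbers) <= mean <= max(numbers), numbers
```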


# Let's begin!

As we are using Google Colab, we will put ```!``` at the start of our statements to run commands as you would in a terminal.

We can execute our tests using: ```!python -m pytest python_file_name.py```

By default, PyTest collects tests from files named `test_*.py` or `*_test.py`.



In [1]:
# Import and install the necessary packages
!pip install hypothesis

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


1. Let's mount our google drive to this notebook to access the accompanying Python files.

*Make sure this notebook is contained within 'MyDrive/Pytesting' before running the code below.*

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
# Switch to the google drive folder
%cd drive/MyDrive/Pytesting

/content/drive/MyDrive/Pytesting


In [4]:
# Check what other files are available in this folder
!ls

activations.py		 test_activations_hypothesis_1.py
calc_mean.py		 test_activations_hypothesis_2.py
__pycache__		 test_activations_hypothesis.py
pytesting_answers.ipynb  test_calc_mean_1.py
stroke_data.csv		 test_calc_mean.py
stroke_model.py		 test_model.py


2. We've written a function inside of 'calc_mean.py' to calculate the mean of a list of numbers:

In [5]:
# If we want to display the contents of a Python file we can run
!cat calc_mean.py

def calculate_mean(numbers):
    if len(numbers) == 0:
        return None
    else:
        return sum(numbers) / len(numbers)

3. We want to check it works properly so we've created some simple tests using the Pytest package and the `assert` statement.

The `assert` statement is used to check whether a given expression or condition evaluates to `True` or `False`. If the condition is False, the assert statement raises an AssertionError exception, indicating that the test has failed. Below we use `assert` with the answer calculated by the function and then compare it to our hand calculated answer to check the results match.
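As a quick standalone illustration of how `assert` behaves, here is a minimal sketch. The `check_positive` helper is hypothetical and exists only for this example.

```python
def check_positive(value):
    """Hypothetical helper used only to illustrate assert."""
    # The optional message after the comma is attached to the AssertionError
    assert value > 0, f"expected a positive number, got {value}"
    return value


check_positive(3)  # the condition is True, so nothing happens

try:
    check_positive(-1)
except AssertionError as err:
    message = str(err)
    print(f"Test would fail with: {message}")
```

In a test file you would not catch the AssertionError yourself: pytest catches it and reports the test as failed.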

In [6]:
# If we want to display the contents of a Python file we can run
!cat test_calc_mean.py


import pytest
import numpy as np
from calc_mean import calculate_mean

def test_calculate_mean():
    numbers = [1, 2, 3, 4, 5]
    assert calculate_mean(numbers) == 3.0

def test_calculate_mean_empty_list():
    numbers = []
    assert calculate_mean(numbers) is None

def test_calculate_mean_single_number():
    numbers = [10]
    assert calculate_mean(numbers) == 10.0

def test_calculate_mean_negative_numbers():
    numbers = [-1, -2, -3, -4, -5]
    # you can also use np.allclose() to assert whether the answers are close
    assert np.allclose(calculate_mean(numbers), -3.0)


In [7]:
# To run the test
# The -v flag shows the tests as they are being processed
!python -m pytest -v test_calc_mean.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/drive/MyDrive/Pytesting/.hypothesis/examples')
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 4 items

test_calc_mean.py::test_calculate_mean PASSED                            [ 25%]
test_calc_mean.py::test_calculate_mean_empty_list PASSED                 [ 50%]
test_calc_mean.py::test_calculate_mean_single_number PASSED              [ 75%]
test_calc_mean.py::test_calculate_mean_negative_numbers PASSED           [100%]



**BRILLIANT!** All the tests were passed! Our code must be functioning correctly... Right?

Yes, the code passes these tests, but what if the person executing the code inputs the wrong datatype? For example, say they provide a set instead of a list: what happens?

# Exercise 1: Add a test to check the argument type

Add a test function that checks the argument type to the code below. When your test is ready, uncomment the first line so that the cell writes a new test file.

In [8]:
#%%writefile test_calc_mean_1.py

import pytest
from calc_mean import calculate_mean

def test_calculate_mean():
    numbers = [1, 2, 3, 4, 5]
    assert calculate_mean(numbers) == 3.0

def test_calculate_mean_empty_list():
    numbers = []
    assert calculate_mean(numbers) is None

def test_calculate_mean_single_number():
    numbers = [10]
    assert calculate_mean(numbers) == 10.0

def test_calculate_mean_negative_numbers():
    numbers = [-1, -2, -3, -4, -5]
    assert calculate_mean(numbers) == -3.0

def test_calculate_list_of_numbers():
    numbers = {1, 2, 3, 4, 5}
    assert calculate_mean(numbers) == 3.0

Run the cell below to execute your new test!

In [9]:
!python -m pytest test_calc_mean_1.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collecting ... collected 5 items

test_calc_mean_1.py .....                                                [100%]



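Your test should still pass, because `sum()` and `len()` both work on sets. Bear in mind, though, that a set discards duplicates, so the mean of `{1, 1, 4}` is not the mean of `[1, 1, 4]`.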

🌷 Tip: When writing functions such as ```calculate_mean()```, it's vital to include a docstring that tells users which data types they can pass in and the structure required. It should look something like this:



In [10]:
def calculate_mean(numbers):
    """
    Calculate the mean of a list of numbers.

    Args:
        numbers (list): A list of numbers.

    Returns:
        float or None: The mean of the numbers in the list.
                       Returns None if the list is empty.

    Example:
        >>> numbers = [1, 2, 3, 4, 5]
        >>> calculate_mean(numbers)
        3.0
    """
    if len(numbers) == 0:
        return None
    else:
        return sum(numbers) / len(numbers)
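The `>>>` lines in a docstring like this one can also be executed as a test with the built-in doctest module mentioned earlier. The sketch below runs the docstring example programmatically. Driving `DocTestParser` and `DocTestRunner` by hand is just one way to do it, chosen here because it works inside a notebook cell, and the docstring is shortened to the example only.

```python
import doctest


def calculate_mean(numbers):
    """
    Calculate the mean of a list of numbers.

    Example:
        >>> numbers = [1, 2, 3, 4, 5]
        >>> calculate_mean(numbers)
        3.0
    """
    if len(numbers) == 0:
        return None
    return sum(numbers) / len(numbers)


# Collect the >>> examples from the docstring and run them
parser = doctest.DocTestParser()
test = parser.get_doctest(
    calculate_mean.__doc__,              # text containing the examples
    {"calculate_mean": calculate_mean},  # names available to the examples
    "calculate_mean",                    # label used in failure reports
    None,                                # filename (unknown here)
    0,                                   # line number offset
)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)
```

For a regular `.py` file, `python -m doctest your_file.py` or pytest's `--doctest-modules` option runs the same checks without any extra code.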

Additionally 'error handling' is a good way to manage and respond to errors or exceptions that occur during program execution. It is useful for preventing crashes, improving user experience, and enabling effective debugging and troubleshooting.

There are many different types of exceptions that can occur when using Python. These are built into the language and catch some common errors. Normally they are a blessing and prevent hours of bug searches 🐛, but sometimes they can throw us off even more.

We can implement our own Try-Except blocks to catch and handle human error for us.

Say for example we try to input a dictionary as an argument in our function, we could implement the Try-Except block as below:

In [11]:
nums = {'a': 0, 'b': 1, 'c': 2}

try:
    # isinstance is the idiomatic way to check an object's type
    assert isinstance(nums, list)
    print("Assertions complete, nums is a list.")
except AssertionError:
    print("Error: data type is incorrect, nums should be a list.")

Error: data type is incorrect, nums should be a list.
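One caveat with the block above: `assert` statements are skipped when Python is run with the `-O` flag, so they are best kept for tests rather than for validating user input. A hedged alternative is to check the type inside the function and raise a `TypeError`. The name `calculate_mean_checked` and the decision to accept only lists and tuples are assumptions made for this sketch.

```python
def calculate_mean_checked(numbers):
    """Variant of calculate_mean that rejects unsupported argument types."""
    if not isinstance(numbers, (list, tuple)):
        raise TypeError(
            f"numbers should be a list or tuple, got {type(numbers).__name__}"
        )
    if len(numbers) == 0:
        return None
    return sum(numbers) / len(numbers)


nums = {'a': 0, 'b': 1, 'c': 2}

try:
    calculate_mean_checked(nums)
except TypeError as err:
    print(f"Error: {err}")
```

In a pytest file, the usual way to assert that this error is raised is a `with pytest.raises(TypeError):` block around the call.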


## Exercise 2

You don't know what you don't know...

It's hard to come up with tests for every scenario. But we can use the [Hypothesis Package](https://hypothesis.readthedocs.io/en/latest/) to help us come up with edge cases.

Say we have a sigmoid activation function in the activations.py file as shown below:

In [12]:
!cat activations.py

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))


We can create a test using Hypothesis.
The `floats` strategy generates floating-point inputs for us (including awkward ones such as NaN and infinity), and the `@given` decorator runs our test on them.

We can also attach a message to the `assert`, which is shown with the AssertionError when the condition is False. Below we use "Value not between 0 & 1" if 0 <= result <= 1 is not True.

In [13]:
%%writefile test_activations_hypothesis.py
import activations
from hypothesis import given
from hypothesis.strategies import floats


@given(floats())
def test_sigmoid(x):
    result = activations.sigmoid(x)
    assert 0 <= result <= 1, "Value not between 0 & 1"

Overwriting test_activations_hypothesis.py


In [14]:
!cat test_activations_hypothesis.py

import activations
from hypothesis import given
from hypothesis.strategies import floats


@given(floats())
def test_sigmoid(x):
    result = activations.sigmoid(x)
    assert 0 <= result <= 1, "Value not between 0 & 1"


In [15]:
!python -m pytest -v test_activations_hypothesis.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/drive/MyDrive/Pytesting/.hypothesis/examples')
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 1 item

test_activations_hypothesis.py::test_sigmoid FAILED                      [100%]

_________________________________ test_sigmoid _________________________________

    @given(floats())
>   def test_sigmoid(x):

test_activations_hypothesis.py:7: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_activations_hypothesis.py:8: in test_sigmoid
    result = activations.sigmoid(x)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

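The test failed because Hypothesis tried NaN as an input: `sigmoid(NaN)` returns NaN, and NaN is not between 0 and 1 (every comparison with NaN is False). Perhaps we didn't think about the implications of NaN, and Hypothesis has kindly pointed this out. We can change our test so that the assertion only runs when the input is not NaN: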

In [16]:
%%writefile test_activations_hypothesis_1.py
import activations
import numpy as np
from hypothesis import given
from hypothesis.strategies import floats


@given(floats())
def test_sigmoid(x):
    result = activations.sigmoid(x)
    if not np.isnan(x):
        assert 0 <= result <= 1, "Value not between 0 & 1"

Overwriting test_activations_hypothesis_1.py


In [17]:
!python -m pytest -v test_activations_hypothesis_1.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/drive/MyDrive/Pytesting/.hypothesis/examples')
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 1 item

test_activations_hypothesis_1.py::test_sigmoid FAILED                    [100%]

_________________________________ test_sigmoid _________________________________

    @given(floats())
>   def test_sigmoid(x):

test_activations_hypothesis_1.py:8: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_activations_hypothesis_1.py:9: in test_sigmoid
    result = activations.sigmoid(x)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

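Oh no! The test is still failing! For large negative inputs, `math.exp(-x)` is too large to represent as a float and raises an `OverflowError`, so it seems we have to restrict our float input to prevent overflowing.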

In [18]:
%%writefile test_activations_hypothesis_2.py
import activations
import numpy as np
from hypothesis import given
from hypothesis.strategies import floats


@given(floats(min_value=-100, max_value=100))
def test_sigmoid(x):
    result = activations.sigmoid(x)
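    # With min_value/max_value set, floats() does not generate NaN by default,
    # so this guard is kept only for consistency with the previous test.
    if not np.isnan(x):
        assert 0 <= result <= 1, "Value not between 0 & 1"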

Overwriting test_activations_hypothesis_2.py


In [19]:
!python -m pytest -v test_activations_hypothesis_2.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/drive/MyDrive/Pytesting/.hypothesis/examples')
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 1 item

test_activations_hypothesis_2.py::test_sigmoid PASSED                    [100%]



Phew, the test passed! Hopefully this has demonstrated how useful Hypothesis can be for writing your tests.
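Restricting the input range makes the test pass, but `sigmoid` itself will still raise an `OverflowError` for large negative inputs. An alternative is to fix the function rather than the test. The sketch below is one common numerically stable formulation. The name `stable_sigmoid` is an assumption for illustration and is not part of activations.py.

```python
import math


def stable_sigmoid(x):
    """Sigmoid that never calls math.exp on a large positive number."""
    if x >= 0:
        # exp(-x) is at most 1 here, so it cannot overflow
        return 1 / (1 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x))
    z = math.exp(x)
    return z / (1 + z)


# The original formula overflows for large negative inputs
try:
    1 / (1 + math.exp(1000))  # what sigmoid(-1000) computes
except OverflowError as err:
    print(f"OverflowError: {err}")

assert stable_sigmoid(0) == 0.5
assert stable_sigmoid(-1000) == 0.0
assert stable_sigmoid(1000) == 1.0
```

With a function like this, the Hypothesis test should in principle no longer need the `min_value` and `max_value` bounds, only the exclusion of NaN (for example with `floats(allow_nan=False)`).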



## Exercise 3

Now that we've gone through some small examples let's try to write some tests for a Machine Learning model...

We're going to use a Dataset from [Kaggle](https://www.kaggle.com/datasets/fedesoriano/stroke-prediction-dataset) to predict stroke.

This dataset contains 12 variables for 5,110 patients:

* id: unique identifier

* gender: "Male", "Female" or "Other"

* age: age of the patient

* hypertension: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension

* heart_disease: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease

* ever_married: "No" or "Yes"

* work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"

* Residence_type: "Rural" or "Urban"

* avg_glucose_level: average glucose level in blood

* bmi: body mass index

* smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"

* stroke: 1 if the patient had a stroke or 0 if not

First, we'll create a Python file that loads the dataset and sets up a simple prediction model using TensorFlow. We'll then demonstrate some ways to test it, and you can try to come up with further tests.

In [20]:
# Here's what the data looks like
import pandas as pd
data=pd.read_csv("stroke_data.csv")
data.head()

Unnamed: 0 | id | gender | age | hypertension | heart_disease | ever_married | work_type | Residence_type | avg_glucose_level | bmi | smoking_status | stroke
-|-|-|-|-|-|-|-|-|-|-|-|-
0 | 9046 | Male | 67.0 | 0 | 1 | Yes | Private | Urban | 228.69 | 36.6 | formerly smoked | 1
1 | 51676 | Female | 61.0 | 0 | 0 | Yes | Self-employed | Rural | 202.21 | NaN | never smoked | 1
2 | 31112 | Male | 80.0 | 0 | 1 | Yes | Private | Rural | 105.92 | 32.5 | never smoked | 1
3 | 60182 | Female | 49.0 | 0 | 0 | Yes | Private | Urban | 171.23 | 34.4 | smokes | 1
4 | 1665 | Female | 79.0 | 1 | 0 | Yes | Self-employed | Rural | 174.12 | 24.0 | never smoked | 1


### Model

In [21]:
%%writefile stroke_model.py
# The cell magic above must be the first line; it writes this cell to stroke_model.py
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import tensorflow
from tensorflow import keras
from keras import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix, accuracy_score
import seaborn as sns
import matplotlib.pyplot as plt

data=pd.read_csv("stroke_data.csv")

# Only the BMI column contains NaNs, so let's replace these with 0
data['bmi']=data['bmi'].fillna(0)

print(f"Class balance: \nNo stroke = 0, stroke = 1 \n{data['stroke'].value_counts()}")


le = LabelEncoder()
d_list = data.select_dtypes(include = ['object']).columns.tolist()
for i in d_list:
    le.fit(data[i])
    data[i] = le.transform(data[i])

x=data.drop('stroke',axis=1)
y=data['stroke']

# Split the data into train and test sets
def train_test_sets(x, y, test_size=0.2):
  x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=test_size, random_state=1)
  return x_train, x_test, y_train, y_test

x_train, x_test, y_train, y_test = train_test_sets(x, y, test_size=0.2)

model=Sequential()

model.add(Dense(7,activation="relu",input_dim=11))
model.add(Dense(7,activation="relu"))
model.add(Dense(14,activation="relu"))
model.add(Dense(28,activation="relu"))
model.add(Dense(1,activation="sigmoid"))

model.summary()

model.compile(loss="binary_crossentropy",optimizer="Adam")

model.fit(x_train, y_train, epochs=10, validation_split=0.2)

pred_prob = model.predict(x_test)
pred = np.where(pred_prob > 0.5, 1, 0)

cn=confusion_matrix(y_test,pred)

sns.heatmap(cn,annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix")
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.show()

def accuracy(real, pred):
  sklearn_acc = accuracy_score(real, pred)
  return sklearn_acc

print(f"Accuracy of model: {accuracy(y_test, pred)}")

Overwriting stroke_model.py


### Test

Below we've created a test with pytest to check that the data is split properly: a test function uses assertions to verify the sizes and properties of the training and test sets. We've also used Hypothesis to check that sklearn's accuracy score, as used in our model, calculates what we expect.

In [22]:
%%writefile test_model.py

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import hypothesis.strategies as st
from hypothesis import given, settings
import pytest
import stroke_model


def test_data_split():
    # Generate sample data for testing
    np.random.seed(0)
    x = np.random.rand(100, 10)
    y = np.random.randint(0, 2, 100)

    # Split the data into train and test sets
    x_train, x_test, y_train, y_test = stroke_model.train_test_sets(x, y, test_size=0.2)

    # Assertions to check the sizes and properties of the train and test sets
    assert len(x_train) > 0, "Empty training set"
    assert len(x_test) > 0, "Empty test set"
    assert len(x_train) == len(y_train), "Size mismatch between x_train and y_train"
    assert len(x_test) == len(y_test), "Size mismatch between x_test and y_test"
    assert set(np.unique(y_train)) == set(np.unique(y_test)), "Different class labels in train and test sets"


@given(y_true=st.lists(st.integers(min_value=0, max_value=1), min_size=1))
@settings(max_examples=10)
def test_accuracy_score(y_true):
    # Generate random predictions of the same length as y_true
    y_pred = np.random.randint(0, 2, len(y_true))

    # Use accuracy_score from sklearn's accuracy_score
    sklearn_acc = stroke_model.accuracy(y_true, y_pred)

    # Calculate accuracy manually
    manual_acc = np.sum(y_true == y_pred) / len(y_true)

    # Assert that the accuracy scores match
    assert np.isclose(sklearn_acc, manual_acc)


Overwriting test_model.py


In [23]:
!python -m pytest -v test_model.py

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/content/drive/MyDrive/Pytesting/.hypothesis/examples')
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 2 items

test_model.py::test_data_split PASSED                                    [ 50%]
test_model.py::test_accuracy_score PASSED                                [100%]
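Note that `import stroke_model` in the test file runs every top-level statement in stroke_model.py, so the model is retrained and the plot is drawn each time the tests are collected. A common way to avoid this is to keep reusable functions at module level and move the script steps behind an `if __name__ == "__main__":` guard. The sketch below shows the pattern only: `main` is a placeholder, and the pure-Python `accuracy` stands in for the sklearn-based function.

```python
def accuracy(real, pred):
    """Fraction of predictions that match the true labels (plain Python)."""
    matches = sum(1 for r, p in zip(real, pred) if r == p)
    return matches / len(real)


def main():
    """Placeholder for the load, train, and plot steps of the script."""
    print(f"Accuracy of model: {accuracy([0, 1, 1, 0], [0, 1, 0, 0])}")


# Runs only when the file is executed directly, not when it is imported
if __name__ == "__main__":
    main()
```

With this layout a test can import `accuracy` without triggering any training.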



To run all the test files in this 'Pytesting' folder (files of type '.py' whose names start with 'test'), run the cell below:

In [24]:
!pytest

platform linux -- Python 3.10.12, pytest-7.2.2, pluggy-1.0.0
rootdir: /content/drive/MyDrive/Pytesting
plugins: hypothesis-6.79.1, anyio-3.6.2
collected 14 items

test_activations_hypothesis.py F                                         [  7%]
test_activations_hypothesis_1.py F                                       [ 14%]
test_activations_hypothesis_2.py .                                       [ 21%]
test_calc_mean.py ....                                                   [ 50%]
test_calc_mean_1.py .....                                                [ 85%]
test_model.py ..                                                         [100%]

_________________________________ test_sigmoid _________________________________

    @given

# Over to you! Can you think of any other ways we can test the models above?

If you have ideas for tests that could help ML/AI pipelines (or need some), have tests you'd like to share with others, or have coding issues you need assistance with, check out the collaborative [hackmd notebook](https://hackmd.io/@3UbYXkLuSRWoUkulK3ihvw/pytesting).


Well done for completing this notebook! 🥇

There are many other areas we have not covered which you may wish to delve into further, for example:

* **Continuous integration** Test that your code runs after every change.

* **Test badges** You can add a `Tests` badge to your GitHub repository, using [GitHub actions](https://github.com/dwyl/repo-badges), to show that your codebase is passing all its tests.

* **Performance Regression Testing** Test that your code works within your speed and memory requirements. [psutil](https://psutil.readthedocs.io/en/latest/) is a package that can be used for this.

* **Testing within Jupyter Notebooks** You can create temporary paths and files, or you can use `jupyter nbconvert --to script your_notebook.ipynb` to convert your notebook to a Python file for testing. Alternatively, you can explore the `testbook` package, which allows you to test code within your notebook.
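As a starting point for the performance regression idea in the list above, a speed check can be written with nothing more than the standard library's `time.perf_counter`. This is a rough sketch: the one-second budget and the input size are arbitrary assumptions, and timing-based tests can be flaky on shared machines.

```python
import time


def calculate_mean(numbers):
    if len(numbers) == 0:
        return None
    return sum(numbers) / len(numbers)


def test_calculate_mean_is_fast_enough():
    numbers = list(range(100_000))
    budget_seconds = 1.0  # hypothetical budget; tune to your own requirements

    start = time.perf_counter()
    result = calculate_mean(numbers)
    elapsed = time.perf_counter() - start

    assert result == 49999.5
    assert elapsed < budget_seconds, f"took {elapsed:.3f}s"
```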

