Imagine you just built a new robot (the model) that sorts flowers. Before you sell it to a customer, you need to run it through a series of tests to make sure it works correctly, doesn't explode when given weird instructions, and handles mistakes gracefully.

    Pytest/Ipytest: The clipboard and checklist the inspector uses.

    The Model: The flower-sorting robot.

    The Tests: The specific challenges you give the robot.

In [3]:
# 1. Install using %pip (This ensures it goes to the current kernel)
%pip install ipytest pytest

# 2. Verify installation
import ipytest
print("✅ ipytest is successfully installed!")


Collecting ipytest
  Downloading ipytest-0.14.2-py3-none-any.whl.metadata (17 kB)
Collecting pytest
  Downloading pytest-9.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting iniconfig>=1.0.1 (from pytest)
  Downloading iniconfig-2.3.0-py3-none-any.whl.metadata (2.5 kB)
Collecting pluggy<2,>=1.5 (from pytest)
  Downloading pluggy-1.6.0-py3-none-any.whl.metadata (4.8 kB)
Downloading ipytest-0.14.2-py3-none-any.whl (18 kB)
Downloading pytest-9.0.1-py3-none-any.whl (373 kB)
Downloading pluggy-1.6.0-py3-none-any.whl (20 kB)
Downloading iniconfig-2.3.0-py3-none-any.whl (7.5 kB)
Installing collected packages: pluggy, iniconfig, pytest, ipytest
Successfully installed iniconfig-2.3.0 ipytest-0.14.2 pluggy-1.6.0 pytest-9.0.1
Note: you may need to restart the kernel to use updated packages.
✅ ipytest is successfully installed!


In [4]:
import ipytest
import pytest
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import numpy as np

# Load the dataset
iris = load_iris()
X, y = iris.data, iris.target

# Set up the model
model = DecisionTreeClassifier()
model.fit(X, y)


Parameter                  Value
criterion                  'gini'
splitter                   'best'
max_depth                  None
min_samples_split          2
min_samples_leaf           1
min_weight_fraction_leaf   0.0
max_features               None
random_state               None
max_leaf_nodes             None
min_impurity_decrease      0.0


The Analogy: The standard test drive. You give the robot a perfectly normal flower. You know this flower is a Setosa (0).

    model.predict: The robot looks at the flower and guesses.

    assert result == expected_output: This is the inspector's pass/fail criteria.

        If the robot guesses 0, the test passes (silence).

        If the robot guesses 1 or 2, the test fails and screams the error message.

In [5]:
def test_typical_case():
    input_data = np.array([[4.5, 2.3, 1.3, 0.3]])  # Normal flower
    expected_output = 0  # We expect "Setosa" (Class 0)
    result = model.predict(input_data)[0]
    
    # The Check
    assert result == expected_output, f"Expected {expected_output}, but got {result}"
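To see the pass/fail behaviour of `assert` on its own, outside pytest, here is a minimal sketch in plain Python. The helper name `check` is hypothetical and only mirrors the assertion used in the test above.

```python
def check(result, expected_output):
    # Same shape as the assertion in test_typical_case.
    assert result == expected_output, f"Expected {expected_output}, but got {result}"


# A correct guess: the assert is silent and execution simply continues.
check(0, 0)

# A wrong guess: the assert raises AssertionError carrying the message.
try:
    check(2, 0)
except AssertionError as error:
    message = str(error)

print(message)  # Expected 0, but got 2
```

pytest catches that AssertionError for you and reports the message next to the failing test.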

The Analogy: Driving the car into a volcano. You feed the robot measurements for a flower that is 1,000 cm wide (impossible/extreme).

    The Goal: You want to see if the robot has safety rails. The code expects the robot to say, "Error! This input is crazy!" (ValueError).

    try...except: This checks if the error happens.

        If the robot raises a ValueError, the test Passes (assert True).

        Note: In reality, standard Decision Trees are often "too nice." They will probably just guess a class for the giant flower without crashing. If that happens, the code goes to else and Fails (assert False), telling you that your model needs better safety guards.

In [6]:
def test_edge_case_extreme_values():
    input_data = np.array([[1000, 1000, 1000, 1000]])  # Giant mutant flower
    try:
        model.predict(input_data)
    except ValueError:
        assert True  # Pass if it crashes correctly
    else:
        assert False, "Expected ValueError for extreme values, but no error was raised"

Edge-case tests like this one check that the model neither breaks nor quietly returns a meaningless answer when it meets rare inputs.

The Analogy: Feeding the robot invisible air. You give the robot None (empty/missing data) instead of numbers.

    Machine learning models need math, and you can't do math on None.

    The test expects the model to scream "I need numbers!" (ValueError).

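    Many scikit-learn models do reject missing values with a ValueError, so this test may pass. Whether it does depends on the estimator and the library version: decision trees in scikit-learn 1.3 and later can accept NaN at prediction time, in which case the test fails.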

In [7]:
def test_error_handling_missing_values():
    input_data = np.array([[None, None, None, None]])  # Invisible flower
    try:
        model.predict(input_data)
    except ValueError:
        assert True  # Pass if it complains about missing data
    else:
        assert False, "Expected ValueError for missing values..."
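Both error tests above use the same try/except/else shape. pytest also offers `pytest.raises`, a context manager that expresses the same check in two lines. The sketch below uses a hypothetical stand-in function, `strict_predict`, instead of the real model so that it runs on its own.

```python
import pytest


def strict_predict(row):
    """Hypothetical stand-in for a model that rejects missing values."""
    if any(value is None for value in row):
        raise ValueError("Input contains missing values")
    return 0


def test_missing_values_raise():
    # The block passes only if a ValueError is raised inside it;
    # if nothing is raised, pytest fails the test with "DID NOT RAISE".
    with pytest.raises(ValueError):
        strict_predict([None, None, None, None])
```

The trade-off is that you lose the custom failure message, but the intent ("this call must raise") is easier to read.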

After defining all of our test cases, we use pytest (through ipytest, which runs it inside the notebook) to automate the testing process. Every time the model is updated, we can rerun the suite and quickly verify that all tests still pass and no new issues were introduced.

In [8]:

# Run tests using ipytest
ipytest.run('-v')

platform linux -- Python 3.10.19, pytest-9.0.1, pluggy-1.6.0 -- /anaconda/envs/bert-py310-cpu/bin/python
cachedir: .pytest_cache
rootdir: /mnt/batch/tasks/shared/LS_root/mounts/clusters/tesr1/code/Users/new.restart7
plugins: anyio-4.12.0
collecting ... collected 3 items

t_972ce9a1296a44b58f83b82565515be9.py::test_typical_case PASSED                              [ 33%]
t_972ce9a1296a44b58f83b82565515be9.py::test_edge_case_extreme_values FAILED                  [ 66%]
t_972ce9a1296a44b58f83b82565515be9.py::test_error_handling_missing_values FAILED             [100%]

__________________________________ test_edge_case_extreme_values ___________________________________

    def test_edge_case_extreme_values():
        input_data = np.array([[1000, 1000, 1000, 1000]])  # Giant mutan

<ExitCode.TESTS_FAILED: 1>

This is a fantastic result! You have successfully used testing to uncover hidden behaviors in your model.

In software testing, a failing test is useful news: it means you found a gap between what you expected the robot to do and what it actually did.

Here is the breakdown of why your tests failed and how to fix them.
1. Why test_edge_case_extreme_values Failed

The Expectation: You thought the robot would scream "Error!" if you gave it a 1,000 cm flower.

The Reality: The robot simply said, "Wow, that's a big flower! That's definitely a Virginica."

    The Math: Machine Learning models (especially Decision Trees) don't know "common sense." They just know rules like "If petal > 5cm, it is Class 2."

    The Logic: Since 1,000 is greater than 5, the model successfully ran the math and gave an answer. It didn't crash, so the test failed (because the test demanded a crash).
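A toy version of such rules makes the point concrete. The function below is a hand-written sketch, not the tree that scikit-learn actually learned; the name `toy_tree_predict` and both cut-off values are illustrative assumptions.

```python
def toy_tree_predict(petal_length_cm, petal_width_cm):
    """Hand-written threshold rules; the cut-offs are illustrative only."""
    if petal_length_cm <= 2.5:
        return 0  # Setosa
    if petal_width_cm <= 1.7:
        return 1  # Versicolor
    return 2      # Virginica


print(toy_tree_predict(1.3, 0.3))    # 0: a normal Setosa-sized petal
print(toy_tree_predict(1000, 1000))  # 2: the giant flower still gets a label
```

Nothing in a chain of comparisons can notice that 1,000 cm is absurd, so there is no place for an error to come from.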

2. Why test_error_handling_missing_values Failed

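The Expectation: You thought the robot would crash if you gave it blank data (None).

The Reality: NumPy stored the None values in an object array, and scikit-learn converted them to NaN (Not a Number) when it cast the input to floats. Depending on your library versions, the tree then accepts NaN at prediction time (decision trees in scikit-learn 1.3 and later do), still routes the flower to a leaf, and returns a class instead of raising an error.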

Because the robot managed to produce a guess instead of crashing, your test (which demanded a crash) failed.
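The None-to-NaN step can be reproduced with NumPy alone. This is a small sketch of the conversion only; that scikit-learn performs an equivalent cast during input validation is stated here as an assumption, and the details vary by version.

```python
import numpy as np

# Without an explicit dtype, NumPy keeps the None values as Python objects.
raw = np.array([[None, None, None, None]])
print(raw.dtype)  # object

# Casting to float turns every None into NaN. scikit-learn is assumed to
# perform a similar cast when it validates the input (version-dependent).
as_float = np.asarray(raw, dtype=float)
print(np.isnan(as_float).all())  # True
```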


How to Fix It: Add "Guardrails"

Since the raw model is "too nice" and accepts garbage data, we need to wrap it in a Guardrail Function that enforces the safety rules you wanted.

Step 1: Create the Safe Prediction Function

Run this code to create a smarter interface for your model.

In [9]:
def safe_predict(model, input_data):
    """
    Acts as a security guard for the model.
    Checks inputs BEFORE letting the model see them.
    """
    # Convert to a float array first. NumPy maps None to NaN here, so a
    # single NaN check covers both kinds of missing value, and np.isnan
    # never has to run on an object array (where it raises TypeError).
    data = np.asarray(input_data, dtype=float)

    # Guardrail 1: Check for Missing Values (None or NaN)
    if np.isnan(data).any():
        raise ValueError("Input contains missing or null values!")

    # Guardrail 2: Check for Extreme Values (e.g., flowers shouldn't be > 100cm)
    if np.max(data) > 100:
        raise ValueError("Input values are suspiciously large!")

    # If safe, let the model predict
    return model.predict(data)
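The guardrail above caps only the maximum value. A fuller check might also reject zero or negative measurements. The sketch below shows one way to do that in plain Python; the name `validate_row` and the bounds `low` and `high` are assumptions chosen for illustration, not values taken from the Iris data.

```python
import math


def validate_row(row, low=0.0, high=100.0):
    """Raise ValueError unless every value is a number in (low, high]."""
    for value in row:
        # Missing data: either a literal None or a float NaN.
        if value is None or math.isnan(value):
            raise ValueError("Input contains missing or null values!")
        # Range check: also rejects infinities and non-positive sizes.
        if not low < value <= high:
            raise ValueError(f"Value {value} is outside ({low}, {high}] cm")
    return row


validate_row([4.5, 2.3, 1.3, 0.3])  # a normal flower passes silently
```

A function like this could be called at the top of `safe_predict` on each input row before the model sees the data.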

Step 2: Update the Tests to Use the Guardrails

Now, update your tests to call safe_predict instead of model.predict.

In [11]:


def test_edge_case_extreme_values():
    input_data = np.array([[1000, 1000, 1000, 1000]])
    try:
        # Use the wrapper function
        safe_predict(model, input_data)
    except ValueError:
        assert True  # It crashes correctly now!
    else:
        assert False, "The Guardrail failed to catch the extreme value!"

def test_error_handling_missing_values():
    # Force the array to allow 'None' (object type) for this test
    input_data = np.array([[None, None, None, None]], dtype=object)
    try:
        # Use the wrapper function
        safe_predict(model, input_data)
    except ValueError:
        assert True  # It crashes correctly now!
    else:
        assert False, "The Guardrail failed to catch the missing value!"
        
def test_typical_case():
    # This one should still pass easily
    input_data = np.array([[4.5, 2.3, 1.3, 0.3]])
    result = safe_predict(model, input_data)[0]
    assert result == 0

In [12]:
ipytest.run('-v')

platform linux -- Python 3.10.19, pytest-9.0.1, pluggy-1.6.0 -- /anaconda/envs/bert-py310-cpu/bin/python
cachedir: .pytest_cache
rootdir: /mnt/batch/tasks/shared/LS_root/mounts/clusters/tesr1/code/Users/new.restart7
plugins: anyio-4.12.0
collecting ... collected 3 items

t_972ce9a1296a44b58f83b82565515be9.py::test_typical_case PASSED                              [ 33%]
t_972ce9a1296a44b58f83b82565515be9.py::test_edge_case_extreme_values PASSED                  [ 66%]
t_972ce9a1296a44b58f83b82565515be9.py::test_error_handling_missing_values PASSED             [100%]



<ExitCode.OK: 0>