# Part 2: Testing ML Models

In this tutorial notebook, we'll examine some examples of why we may want to test models, and how this can be done.

In this session, the focus will mostly be on code design and analysis, and on discussing the benefits and limits of testing models before using them for "real" predictions.

Let's start with some imports. We will use `getsource` from the standard-library `inspect` module to print imported code in-line; the imported modules, `classic_tests` and `physicality_tests`, contain custom ML model tests written by your lab's research team.

**Q: If you didn't want to print everything out in the notebook, where would you look for this code? What are the tradeoffs of inspecting code in different environments or IDEs?**

```python
from inspect import getsource

import classic_tests
import physicality_tests
```
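`getsource` is not notebook-specific: it returns the source text of any module, class, or function whose `.py` file is available, and `inspect.getsourcefile` and `inspect.getsourcelines` report which file that is and where the definition starts. A minimal sketch, using the standard-library function `json.dumps` purely as a stand-in (with the tutorial modules you could pass `classic_tests.test_prediction_bounds` instead):

```python
import inspect
import json

# Source text of the function, exactly as written in its .py file.
source = inspect.getsource(json.dumps)

# Which file the function was loaded from, and the line where its definition starts.
path = inspect.getsourcefile(json.dumps)
_, start_line = inspect.getsourcelines(json.dumps)

print(f"{path}:{start_line}")
print("\n".join(source.splitlines()[:3]))  # first few lines only
```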

**Q: What does this following test do? Why might we write it and what intuition are we capturing?**

**Q: What does the `@pytest.mark.parametrize` decorator do?**

**Q: What would you think if a colleague wrote some code and caused this test to fail?**

```python
print(getsource(classic_tests.test_prediction_bounds))
```

```python
@pytest.mark.parametrize(
    "material",
    [
        "Li7La3(SnO6)2",
        "Li3(WO3)8",
        "Li4Mn2P4H3O16",
        "Li2Ni(PO3)5",
        "MgV2O4",
    ]
)
def test_prediction_bounds(material:str, model:SynthesizabilityModel) -> None:
    predicted_value = model.predict_single(material)
    assert 0 <= predicted_value <= 1
```
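Neither `SynthesizabilityModel` nor the `model` argument's source (presumably a pytest fixture defined elsewhere, e.g. in a `conftest.py`) is shown here. As a rough, hypothetical illustration, the sketch below defines a random stand-in model (`RandomSynthesizabilityModel`, our invention, loosely inspired by the "random model" mentioned in the `xfail` reasons further down) and a plain-Python loop (`run_parametrized`, also hypothetical) that approximates what `parametrize` does: the same test body runs once per listed material, and each case gets its own pass/fail result instead of the whole test stopping at the first failure.

```python
import random


class RandomSynthesizabilityModel:
    """Hypothetical stand-in for SynthesizabilityModel: scores are random, not learned."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)

    def predict_single(self, material):
        # Mirror the contract checked by test_invalid_material: only strings are accepted.
        if not isinstance(material, str):
            raise TypeError(f"expected a formula string, got {type(material).__name__}")
        return self._rng.random()  # uniform in [0.0, 1.0)


def check_prediction_bounds(material, model):
    """Same body as test_prediction_bounds, without pytest."""
    predicted_value = model.predict_single(material)
    assert 0 <= predicted_value <= 1


MATERIALS = ["Li7La3(SnO6)2", "Li3(WO3)8", "Li4Mn2P4H3O16", "Li2Ni(PO3)5", "MgV2O4"]


def run_parametrized(test_fn, params, model):
    """Rough analogue of parametrize: one independent result per parameter."""
    results = {}
    for param in params:
        try:
            test_fn(param, model)
            results[param] = "passed"
        except Exception:  # like pytest, treat any exception as a failed case
            results[param] = "failed"
    return results


results = run_parametrized(check_prediction_bounds, MATERIALS, RandomSynthesizabilityModel(seed=0))
for material, outcome in results.items():
    print(f"{material}: {outcome}")
```

In pytest, each parameter likewise becomes a separately collected and reported test case, which is why `pytest classic_tests.py` below collects 9 items from only two test functions (5 + 4 parameters).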



**Q: Let's discuss the same points, looking at the next test.**

**Q: Is this test hinting at a different intuition? What would you think if some new code caused this test to fail?**

```python
print(getsource(classic_tests.test_invalid_material))
```

```python
@pytest.mark.parametrize(
    "material",
    [
        1234,
        dict({"a":"b"}),
        ["Fe2O3"],
        ["Fe2", "O3"],
    ]
)
def test_invalid_material(material:str, model:SynthesizabilityModel) -> None:
    with pytest.raises(TypeError):
        model.predict_single(material)
```



**Q: Let's switch to a less standard type of testing: can we use software tests to understand the behaviours of our ML models?**

**Q: What intuition is this test capturing? Could anyone write this test, or do you need special domain knowledge? Why?**

```python
print(getsource(physicality_tests.test_atomic_order))
```

```python
@pytest.mark.parametrize(
    "input_series",
    [
        [
            "MgV2O4",
            "O4MgV2",
            "V2MgO4",
        ],
        [
            "Li3(WO3)8",
            "(WO3)8Li3",
        ]
    ]
)
@pytest.mark.xfail(reason="Too hard for a random model!")
def test_atomic_order(input_series:str) -> None:
    predicted_values = [
        synthesizability_model.predict_single(input_val)
        for input_val in input_series
    ]

    assert max(predicted_values) - min(predicted_values) < 0.001
```
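The domain knowledge behind `test_atomic_order` is that a formula string is only a notation for a composition: "MgV2O4", "O4MgV2" and "V2MgO4" all describe one Mg, two V and four O atoms. The sketch below makes that explicit with a hypothetical helper, `parse_formula` (not part of the tutorial code), which handles element symbols, integer counts and nested parentheses, but not hydrates, charges or fractional counts. A deterministic model whose features are computed from such a parsed composition would pass this test by construction; a model that reads the raw characters in order has no such guarantee.

```python
import re
from collections import Counter

# Tokens: element symbol with optional count, "(", or ")" with optional multiplier.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")


def parse_formula(formula):
    """Hypothetical helper: expand a formula like "Li3(WO3)8" into element counts."""
    stack = [Counter()]  # one Counter per open parenthesis level
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        element, count, open_paren, close_paren, multiplier = match.groups()
        if element:
            stack[-1][element] += int(count or 1)
        elif open_paren:
            stack.append(Counter())
        else:
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' in {formula!r}")
            group = stack.pop()
            for elem, n in group.items():
                stack[-1][elem] += n * int(multiplier or 1)
        pos = match.end()
    if len(stack) != 1:
        raise ValueError(f"unmatched '(' in {formula!r}")
    return dict(stack[0])


series = ["MgV2O4", "O4MgV2", "V2MgO4"]
compositions = [parse_formula(f) for f in series]
print(compositions[0])  # {'Mg': 1, 'V': 2, 'O': 4}
print(all(c == compositions[0] for c in compositions))  # True
```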



**Q: What about this next test? Is the intuition that is tested reasonable? Why or why not?**

```python
print(getsource(physicality_tests.test_stoichiometric_monotonicity))
```

```python
@pytest.mark.parametrize(
    "input_series",
    [
        [
            "MgV2O4",
            "MgV8O4",
            "MgV24O4",
        ],
        [
            "Li3(WO3)8",
            "Li3(W5O3)8",
            "Li3(W20O3)8",
        ]
    ]
)
@pytest.mark.xfail(reason="Too hard for a random model!")
def test_stoichiometric_monotonicity(input_series:str) -> None:
    predicted_values = [
        synthesizability_model.predict_single(input_val)
        for input_val in input_series
    ]

    assert predicted_values == sorted(predicted_values, reverse=True)
```
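The `xfail` reason can be made quantitative. Assuming the model returns independent, continuous random scores (an assumption about the model behind these tests), all 3! = 6 orderings of three predictions are equally likely, so the descending-order check passes by chance about 1 time in 6 per input series. The atomic-order check is far stricter: for independent uniform scores, the range of n values falls below a tolerance d with probability n·d^(n−1) − (n−1)·d^n, roughly 0.2% for a pair and about 3 in a million for a triple at d = 0.001. A Monte Carlo sketch, with `random.random()` playing the hypothetical random model and `pass_rates` a name chosen for illustration:

```python
import random


def pass_rates(n_trials=100_000, series_length=3, tolerance=0.001, seed=0):
    """Estimate how often a random model would pass each physicality check by luck."""
    rng = random.Random(seed)
    monotone_passes = 0
    order_passes = 0
    for _ in range(n_trials):
        # Hypothetical random model: one independent uniform score per formula in the series.
        scores = [rng.random() for _ in range(series_length)]
        # Same condition as test_stoichiometric_monotonicity.
        if scores == sorted(scores, reverse=True):
            monotone_passes += 1
        # Same condition as test_atomic_order.
        if max(scores) - min(scores) < tolerance:
            order_passes += 1
    return monotone_passes / n_trials, order_passes / n_trials


monotone_rate, order_rate = pass_rates()
print(f"monotonicity check passes by chance: {monotone_rate:.3f} (theory: 1/6 = {1 / 6:.3f})")
print(f"atomic-order check passes by chance: {order_rate:.6f}")
```

One consequence: under a random model, the monotonicity test can occasionally pass by luck, and with pytest's default (non-strict) `xfail` such a run is reported as XPASS rather than as a failure.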



**Q: To illustrate the point about automation and saving time: looking at the test runs below, how much time do you think tests like these could save? Is writing tests worth the tradeoff in all cases?**

```
!pytest classic_tests.py
```

```
platform darwin -- Python 3.7.6, pytest-7.1.0, pluggy-0.13.1
rootdir: /Users/eddie/Documents/GitHub/mrs-s22-ds04-tutorial/02_testing_models
plugins: hypothesis-5.5.4, arraydiff-0.3, remotedata-0.3.2, openfiles-0.4.0, doctestplus-0.5.0, astropy-header-0.1.2
collected 9 items

classic_tests.py .........                                               [100%]
```



```
!pytest physicality_tests.py
```

```
platform darwin -- Python 3.7.6, pytest-7.1.0, pluggy-0.13.1
rootdir: /Users/eddie/Documents/GitHub/mrs-s22-ds04-tutorial/02_testing_models
plugins: hypothesis-5.5.4, arraydiff-0.3, remotedata-0.3.2, openfiles-0.4.0, doctestplus-0.5.0, astropy-header-0.1.2
collected 4 items

physicality_tests.py xxxx                                                [100%]
```

