# Chapter 4: Property-Based Testing

In Chapters 1–3, we moved from a quick, monolithic solver to a modular design with unit tests.
Unit tests gave us confidence, but only for cases that we handpicked.

Now, we’ll take the next step by introducing property-based testing via the Hypothesis
library:

 - Instead of manually writing individual test cases, we’ll state general properties such as conservation, symmetry, the maximum principle, or linearity.
 - Hypothesis will generate many random inputs automatically, exploring cases we might never think to test.
 - When a property fails, Hypothesis will shrink the failing case to the simplest possible counterexample, the testing equivalent of a controlled scientific experiment.

## What is a "property"?

A property is a statement about your program that you expect to hold for all valid inputs.
It’s a general rule, not just a single example.

For our 1-D heat equation solver, some key properties might include:

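- **Conservation:** total heat changes only by the net flux through the boundaries, so it stays constant when the boundaries are insulated.
- **Telescoping property:** the sum of the divergence over all cells, weighted by the cell size, equals the net flux through the boundaries.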
- **Symmetry:** symmetric initial states stay symmetric under symmetric BCs.
- **Monotonicity:** non-decreasing initial states remain non-decreasing.

## Unit tests vs. property-based tests
 - Unit tests: "for this input, expect this output"
 - Property tests: "for all inputs satisfying these preconditions, this relation should hold"

Property tests complement unit tests:
 - Unit tests are concrete and focused.
 - Property tests are broader and can reveal edge cases you never anticipated.
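
As a minimal sketch of the difference, consider a hypothetical helper `mirror` (not part of our solver). The property check below draws its inputs with the standard `random` module only; it illustrates the idea of "for all inputs" and is not how Hypothesis works internally.

```python
import random

def mirror(xs):
    """Hypothetical helper: return the list in reverse order."""
    return xs[::-1]

def test_mirror_unit():
    # Unit test: one handpicked input, one expected output.
    assert mirror([1, 2, 3]) == [3, 2, 1]

def test_mirror_property(trials=200, seed=0):
    # Property test: mirroring twice gives back the original list,
    # checked on many randomly drawn lists (including the empty list).
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 10))]
        assert mirror(mirror(xs)) == xs

test_mirror_unit()
test_mirror_property()
```

The unit test pins down one concrete behavior, while the property constrains every input the generator can reach.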

## How this relates to Chapter 2

In Chapter 2, we introduced preconditions, postconditions, and invariants as lightweight specifications for our solver.

Property-based testing is a natural extension of these ideas:

 - The precondition {P} now defines the input space that Hypothesis will explore.
 - The postcondition {Q} or invariant becomes the property being checked.
 - When a test fails, Hypothesis provides a counterexample, helping you determine whether:
   - The precondition was too weak (i.e., it allowed invalid inputs).
   - The postcondition was too strong (i.e., it ruled out valid outputs).
   - There’s a bug in the implementation.

Think of this as turning your Chapter 2 contracts into scientific hypotheses:

 - "If the precondition holds, the property should always be true."

Hypothesis plays the role of the experimenter, probing your code with many randomized “experiments” to try and refute your claim.

## The `Hypothesis` library

Hypothesis is a Python library for property-based testing.
Instead of handpicking test cases, you describe the space of valid inputs, and it generates random examples within that space.

If a failure is found, Hypothesis automatically shrinks the input to the simplest version that still causes the failure. This helps you debug by giving a clear, minimal failing example.

Let's revisit our `div` function from the previous chapters.

In [1]:
def div(x, y):
    assert y != 0           # P    (precondition)
    res = x / y             # code (implementation)
    assert res * y == x     # Q    (postcondition)
    return res

In Chapter 3, we tested this manually:

In [2]:
def test_division():
    div(7, 25)

test_division()

AssertionError: 

Running this fails because 7 / 25 cannot be represented exactly in floating point.
But notice: we only found this case because we happened to test exactly this input.
If we had tested other pairs like the ones listed below, the test
might have passed, which would leave us unaware of the issue.

In [3]:
def test_more_divisions():
    div(7, 26)
    div(7, 24)
    div(7, 23)
    div(6, 1)
    div(2, 4)
    div(0, 1)
    div(46, 7)
    div(7657, 26)
    div(1, 3)
    div(23, 23424)
    div(1000, 3)
    div(123, 456)
    print("No assertion violations encountered!")

test_more_divisions()

No assertion violations encountered!


### Hypothesis to the rescue

Let's use Hypothesis to generate a wide range of test cases for our division function.

In [4]:
from hypothesis import assume, given, strategies as st

@given(st.integers(), st.integers())
def test_div_property(x, y):
    assume(y != 0)
    div(x, y)

The above test definition highlights the key components of property-based testing with Hypothesis:

1. The `@given` decorator
    - This is the main entry point for defining a property-based test.
    - It takes one or more strategies as arguments, which describe how Hypothesis should generate input data for the test.

2. Strategies
    - Strategies define the space of possible inputs.
    - In our example, we used `st.integers()` to generate random integers for both x and y.

We also used the `assume` function to enforce our precondition {P}, which here is `y != 0` to avoid division by zero.
When the condition passed to `assume` is false, Hypothesis discards that example and generates another, so the test body only ever sees valid inputs.
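
A precondition can also be built into the strategy itself. The sketch below uses the strategy method `.filter()` to exclude zero from the generated denominators; the names `nonzero_ints` and `test_denominator_is_never_zero` are illustrative and do not appear elsewhere in this tutorial.

```python
from hypothesis import given, strategies as st

# Integers with zero filtered out: the precondition y != 0 lives in the strategy.
nonzero_ints = st.integers().filter(lambda y: y != 0)

@given(st.integers(), nonzero_ints)
def test_denominator_is_never_zero(x, y):
    # No assume() needed: the strategy never produces y == 0.
    assert y != 0

test_denominator_is_never_zero()
```

Filtering at the strategy level keeps the test body free of `assume` calls; both approaches reject the same inputs.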

With these pieces in place, we can now run the property-based test and let Hypothesis explore a wide range of inputs automatically:

In [5]:
test_div_property()

AssertionError: 

Running the above property-based test should reveal a pair of integers (x, y) that violates the postcondition of `div`. Because the inputs are generated randomly, you might see a different failing case each time you run the test.

### Shrinking

Hypothesis automatically *shrinks* the input, step by step, until it finds the simplest failing 
example. This makes debugging much easier, since you don’t need to wade through massive random inputs.

### Combining strategies

In the above example, we only tested integer division. However, we can also test floating-point division by using the `st.floats()` strategy from Hypothesis. Better yet, we can combine both strategies to test the `div` function with a wider range of inputs:

In [6]:
from hypothesis import assume, given, strategies as st

# Combine integers and floats into a single strategy:
numbers = st.one_of(st.integers(), st.floats(allow_nan=False, allow_infinity=False))

@given(numbers, numbers)
def test_div_property(x, y):
    assume(y != 0)
    div(x, y)

In [7]:
test_div_property()

AssertionError: 

### Controlling how many examples are generated

By default, Hypothesis runs up to 100 examples per test. You can control this with the `@settings` decorator:

In [8]:
from hypothesis import given, settings, strategies as st

ctr = 0

@settings(max_examples=10)
@given(st.integers())
def test_random(x):
    global ctr; ctr += 1
    print(f"Test no. {ctr} with random input {x}")

test_random()

Test no. 1 with random input 0
Test no. 2 with random input -5980
Test no. 3 with random input 101
Test no. 4 with random input 28979
Test no. 5 with random input 15464
Test no. 6 with random input 23
Test no. 7 with random input 31039
Test no. 8 with random input -304902678
Test no. 9 with random input 4378
Test no. 10 with random input 46


In the above cell, we reduced `max_examples` from the default 100 to 10. In practice, consider
increasing `max_examples` for complex properties, particularly when a test passes: finding no counterexample after only a few examples is weak evidence that the property holds.
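
When several tests should share the same budget, settings can be grouped into named profiles. The sketch below registers two profiles with `settings.register_profile`; the profile names `quick` and `thorough` and the test name are assumptions made for this example.

```python
from hypothesis import given, settings, strategies as st

# Named bundles of settings (the names are our own choice).
settings.register_profile("quick", max_examples=10)
settings.register_profile("thorough", max_examples=1000)

# settings.load_profile("thorough")  # would make it the default for later tests

runs = 0

@settings(settings.get_profile("quick"))  # use the "quick" profile for this test only
@given(st.integers())
def test_runs_with_quick_profile(x):
    global runs
    runs += 1

test_runs_with_quick_profile()
print(f"Test body executed {runs} times")
```

A small profile keeps interactive runs fast, while a larger one can be reserved for longer, less frequent test runs.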

## Back to the Heat Equation Solver

Now let's use Hypothesis to test our heat equation solver. Run the cell below, which contains the entire source code of our solver, to make it available for property-based testing.

In [9]:
# %load heat1d.py

from dataclasses import dataclass
from pytest import approx

vec = list[float]

@dataclass
class Mesh:
    """Uniform 1-D mesh."""

    dx: float  # cell size
    N: int     # number of cells

    def cell_field(self) -> vec:
        return [0.0] * self.N

    def face_field(self) -> vec:
        return [0.0] * (self.N + 1)

def apply_bc(f_out: vec, bc: vec) -> None:
    """Apply BCs by overriding first and last face quantities (f_out)."""
    assert len(f_out) > 1, "face field size too small"
    assert len(bc) == 2, "bc must be of size 2"
    f_out[0], f_out[-1] = bc[0], bc[1]

def diffusive_flux(f_out: vec, c: vec, kappa: float, dx: float) -> None:
    """Given a cell field (c), compute the diffusive flux (f_out)."""
    assert len(f_out) == len(c) + 1, "Size mismatch"
    assert dx > 0 and kappa > 0, "Non-positive dx or kappa"
    for i in range(1, len(f_out) - 1):
        f_out[i] = -kappa * (c[i] - c[i-1]) / dx

def divergence(c_out: vec, f: vec, dx: float) -> None:
    """Compute the divergence of face quantities (f) and store in (c_out)."""
    assert len(c_out) == len(f) - 1, "Size mismatch"
    assert dx > 0, "Non-positive dx"
    for i in range(len(c_out)):
        c_out[i] = (f[i] - f[i+1]) / dx

def step_heat_eqn(u_inout: vec, kappa: float, dt: float, mesh: Mesh, bc: vec):
    """Advance cell field u by one time step using explicit Euler method."""
    assert dt > 0, "Non-positive dt"
    assert mesh.N == len(u_inout), "Size mismatch"

    F = mesh.face_field()
    divF = mesh.cell_field()

    apply_bc(F, bc)
    diffusive_flux(F, u_inout, kappa, mesh.dx)
    divergence(divF, F, mesh.dx)

    for i in range(mesh.N):
        u_inout[i] += dt * divF[i]

def solve_heat_eqn(u0: vec, kappa: float, dt: float, nt: int, dx: float, bc: vec) -> vec:
    """Orchestrate nt steps over cell field u."""

    assert nt > 0, "Number of time steps must be positive"
    assert dt <= (dx ** 2) / (2 * kappa), "Stability condition not met"

    mesh = Mesh(dx, N=len(u0))
    u = u0.copy()
    for _ in range(nt):
        step_heat_eqn(u, kappa, dt, mesh, bc)

    return u



### Scaffolding

Before encoding our properties, we need to set up some scaffolding to facilitate testing.

First, let's define some strategies for generating *sane* floating-point numbers for our tests:


In [10]:
floats_st = st.floats(min_value=-1e3, max_value=1e3, allow_nan=False, allow_infinity=False)
kappa_st  = st.floats(min_value=1e-4, max_value=10, allow_nan=False, allow_infinity=False)
dx_st     = st.floats(min_value=1e-3, max_value=10, allow_nan=False, allow_infinity=False)
dt_st     = st.floats(min_value=1e-6, max_value=10, allow_nan=False, allow_infinity=False)

Next, let's define a mesh size strategy to generate reasonable and manageable mesh sizes,
as well as a strategy for generating a tuple of boundary conditions.

In [11]:
# Mesh sizes
n_st = st.integers(min_value=3, max_value=40)

# Boundary faces (qL, qR)
bc_st = st.tuples(
    st.floats(min_value=-100.0, max_value=100.0, allow_nan=False, allow_infinity=False),
    st.floats(min_value=-100.0, max_value=100.0, allow_nan=False, allow_infinity=False),
)
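
Independent strategies for `kappa`, `dx`, and `dt` will often produce combinations that violate the stability condition asserted in `solve_heat_eqn`. One way to generate only stable combinations is a composite strategy that derives `dt` from the other two values. This is a sketch; the names `stable_params` and `cfl`, and the bounds chosen for `cfl`, are assumptions made for this example.

```python
from hypothesis import given, strategies as st

@st.composite
def stable_params(draw):
    """Draw (kappa, dx, dt) such that dt <= dx**2 / (2 * kappa)."""
    kappa = draw(st.floats(min_value=1e-4, max_value=10))
    dx = draw(st.floats(min_value=1e-3, max_value=10))
    # Fraction of the largest stable time step, kept safely below 1.
    cfl = draw(st.floats(min_value=0.01, max_value=0.9))
    dt = cfl * dx**2 / (2 * kappa)
    return kappa, dx, dt

@given(stable_params())
def test_generated_parameters_are_stable(params):
    kappa, dx, dt = params
    assert dt > 0
    assert kappa * dt / dx**2 <= 0.5

test_generated_parameters_are_stable()
```

Compared with rejecting unstable draws via `assume`, this construction wastes no examples, at the cost of tying the distribution of `dt` to `dx` and `kappa`.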

### Telescoping Property

Before testing overall conservation, let’s start one level deeper with the local flux balance
that makes conservation possible.

In finite-volume discretizations, fluxes between neighboring cells telescope:
the flux leaving one cell enters the next (in the absence of sources,
variable cell volumes or densities, and numerical errors).

Consequently, when we sum the discrete divergence over all cells, the interior
fluxes cancel pairwise, leaving only the boundary contributions.
In other words, the total divergence, weighted by the cell size, equals the net flux through the boundaries.
With the sign convention of our `divergence` function, which stores $(F_i - F_{i+1}) / \Delta x$ in cell $i$:

$\qquad
\sum_{i=0}^{N-1} (\nabla \cdot F)_i \, \Delta x = F_0 - F_N
\qquad$

Recall the encoding of this property:

In [12]:
def telescoping(c: vec, f: vec, dx: float) -> bool:
    """Check the finite volume telescoping property."""
    total_divergence = sum(c) * dx
    boundary_flux = f[0] - f[-1]
    return total_divergence == approx(boundary_flux)
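
A small worked example may make the cancellation concrete. The sketch below uses a compact, list-returning variant of the divergence (named `divergence_list` here to avoid clashing with the solver's in-place `divergence`) and face values chosen so that all arithmetic is exact in binary floating point.

```python
def divergence_list(f, dx):
    """Same formula as the solver's divergence, but returns a new list."""
    return [(f[i] - f[i + 1]) / dx for i in range(len(f) - 1)]

F = [2.0, -1.0, 4.0, 0.5]   # four faces, hence three cells
dx = 0.25

divF = divergence_list(F, dx)
print(divF)                 # [12.0, -20.0, 14.0]
print(sum(divF) * dx)       # 1.5
print(F[0] - F[-1])         # 1.5
```

The interior faces `-1.0` and `4.0` each appear once with a plus sign and once with a minus sign, so only `F[0]` and `F[-1]` survive in the sum.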

We are now ready to implement the telescoping property test using Hypothesis:

### Exercise: 

Below is the Hypothesis test for the telescoping property.
Insert the missing assumption(s) using the `assume` function so that the test below only runs on inputs that are valid for the telescoping property:

In [13]:
@given(
    divF = st.lists(floats_st, min_size=3, max_size=20),
    F     = st.lists(floats_st, min_size=4, max_size=21),
    dx    = dx_st
)
def test_telescoping_property(divF: vec, F: vec, dx: float):
    # todo: assume(...)
    divergence(divF, F, dx)
    assert telescoping(divF, F, dx), "Telescoping property violated"

In [14]:
#test_telescoping_property()


### Conservation

Recall the conservation property specification:

In [15]:
def heat_is_conserved(u_old: vec, u_new: vec, dt: float, dx: float, bc: vec) -> bool:
    """Check if heat is conserved."""
    lhs = sum(u_new) * dx
    rhs = sum(u_old) * dx + dt * (bc[0] - bc[1])
    return lhs == approx(rhs)

### Exercise:

Similar to how we wrote a test function for the telescoping property, we can now implement
a property-based test for conservation using Hypothesis. Write a property-based test function
named `test_conservation_property` that checks whether heat is conserved under arbitrary boundary conditions
over a single time step.

Follow-up: If this test passes, can we be certain that our solver conserves heat over multiple time steps? Why or why not?

In [16]:
# TODO: Implement test_conservation_property using Hypothesis

Answer:

In [91]:
@given(
    u_old = st.lists(floats_st, min_size=3, max_size=20),
    kappa = kappa_st,
    dt    = dt_st,
    dx    = dx_st,
    bc    = bc_st
)
def test_conservation_property(u_old: vec, kappa: float, dt: float, dx: float, bc: tuple[float, float]):

    u = u_old.copy()
    mesh = Mesh(dx, N=len(u))

    step_heat_eqn(
        u_inout=u,
        kappa=kappa,
        dt=dt,
        mesh=mesh,
        bc=bc
    )

    assert heat_is_conserved(u_old, u, dt, dx, bc), "Conservation property violated"

Passing the one-step conservation property confirms that, for a single update, the change in total
heat equals the net boundary flux times `dt`; with insulated boundaries, total heat is preserved.
In exact arithmetic, this one-step law would imply the same balance over all steps by induction,
because conservation is a ***loop invariant***:

> loop invariant: a property that remains true before and after every iteration of the timestepping loop.

In other words, the heat balance is an invariant of the update rule.
In real code, however, floating-point round-off can accumulate over many steps,
so the balance holds only up to a tolerance.
So, while the property shows the solver is formally conservative per step, long-term
conservation in practice depends on floating-point precision and numerical stability.
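
To get a feel for how much drift accumulates in practice, one can run many steps with insulated boundaries and compare total heat before and after. The following is a sketch with a compact stand-in for the solver step (`step_sketch`, which returns a new list instead of updating in place); the initial profile and parameters are arbitrary choices.

```python
import math

def step_sketch(u, kappa, dt, dx, bc):
    """One explicit Euler step; bc holds the left and right boundary fluxes."""
    n = len(u)
    F = [0.0] * (n + 1)
    F[0], F[-1] = bc
    for i in range(1, n):
        F[i] = -kappa * (u[i] - u[i - 1]) / dx
    return [u[i] + dt * (F[i] - F[i + 1]) / dx for i in range(n)]

u0 = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
kappa, dx = 1.0, 0.1
dt = 0.4 * dx**2 / kappa          # within the stability limit dx**2 / (2 * kappa)

u = u0
for _ in range(10_000):
    u = step_sketch(u, kappa, dt, dx, bc=(0.0, 0.0))  # insulated boundaries

# math.fsum avoids adding summation error on top of the drift being measured.
drift = abs(math.fsum(u) - math.fsum(u0)) * dx
print(f"total heat drift after 10,000 steps: {drift:.3e}")
```

The drift is not expected to be exactly zero, which is why conservation checks over many steps need a tolerance rather than exact equality.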


### Symmetry

We can test the symmetry property of the heat equation solver by checking that the solution remains
symmetric when the initial condition is mirror-symmetric and the boundary conditions are applied symmetrically.
To check this property, let's first define a helper function that generates symmetric initial conditions.

**Feature Spotlight:** The `st.builds()` method can be used to build complex data structures by
combining simpler strategies. Below, we use it to create symmetric arrays.

In [104]:
def symmetric_field(min_n=3, max_n=31):
    # Produce palindromic arrays of odd length (a single center cell):
    # build the left half, then mirror it around the center
    half = st.lists(floats_st, min_size=(min_n//2), max_size=(max_n//2))
    center = floats_st
    return st.builds(lambda h, c: h + [c] + list(reversed(h)), half, center)
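
`st.builds()` also accepts a class as its first argument and passes the drawn values to the constructor. As a sketch, the block below builds instances of a stand-in dataclass `MeshSketch` (a reduced copy of our `Mesh`, renamed so it does not shadow the real one); the strategy name `mesh_st` is likewise an assumption.

```python
from dataclasses import dataclass
from hypothesis import given, strategies as st

@dataclass
class MeshSketch:
    """Stand-in for the solver's Mesh: cell size and number of cells."""
    dx: float
    N: int

# Each keyword argument is a strategy for the constructor argument of the same name.
mesh_st = st.builds(
    MeshSketch,
    dx=st.floats(min_value=1e-3, max_value=10),
    N=st.integers(min_value=3, max_value=40),
)

@given(mesh_st)
def test_generated_meshes_are_valid(mesh):
    assert isinstance(mesh, MeshSketch)
    assert mesh.dx > 0 and 3 <= mesh.N <= 40

test_generated_meshes_are_valid()
```

Building domain objects directly in the strategy keeps test signatures short and puts the validity constraints in one place.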

In [105]:
@given(symmetric_field(), dx_st, kappa_st, dt_st, floats_st)
def test_symmetry_preservation(u0, dx, kappa, dt, q):
    n = len(u0)
    assume(n >= 3 and n % 2 == 1)  # keep a clear center

    mesh = Mesh(dx=dx, N=n)

    u = u0[:]  # copy to update
    step_heat_eqn(u, kappa, dt, mesh, bc=[q, -q])

    # palindromic within FP tolerance
    for i in range(n//2):
        assert u[i] == approx(u[-i-1])

In [106]:
test_symmetry_preservation()
print("Test completed")

Test completed


### Monotonicity

Diffusion smooths. One way to express this intuition discretely is through monotonicity preservation:
if a temperature profile is nondecreasing at the start of a step, it should remain so after the update.
In other words, diffusion should not create new inversions.

This property is a discrete analogue of the maximum principle: 
if a scheme never inverts local order, it can’t create new extrema either.

Below we encode the monotonicity property using Hypothesis.
We generate random nondecreasing initial profiles, advance one step, and check that ordering is preserved.

In [145]:
def is_monotone_nondecreasing(u: vec, atol=1e-10) -> bool:
    """Check if the list u is non-decreasing within a tolerance."""
    return all(u[i] <= u[i+1] + atol for i in range(len(u)-1))

def nondecreasing_lists(min_size=3, max_size=40):
    """Generate non-decreasing lists of floats strategy."""
    deltas = st.lists(
        st.floats(min_value=0.0, max_value=10.0),
        min_size=min_size, max_size=max_size
    )
    base = floats_st
    return st.builds(lambda b, ds: [b + sum(ds[:i+1]) for i in range(len(ds))], base, deltas)

@given(
    u0  = nondecreasing_lists(),
    dt  = dt_st,
    dx  = dx_st,
    kappa = kappa_st
)
def test_monotonicity_preserved_one_step(u0, dt, dx, kappa):
    n = len(u0)
    assume(n >= 3)
    assume(kappa * dt / (dx ** 2) < 0.5)  # stability condition
    mesh = Mesh(dx=dx, N=n)
    bc = (0.0, 0.0)  # insulated

    u = u0.copy()
    step_heat_eqn(u, kappa, dt, mesh, bc)

    assert is_monotone_nondecreasing(u), "Monotonicity violated"


In [147]:
test_monotonicity_preserved_one_step()
print("Test completed")

Test completed
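
The stability assumption in the test above is essential: without it, the property is simply false. The sketch below shows a hand-built counterexample using a compact stand-in for the solver step with insulated boundaries (`step_sketch` and `is_nondecreasing` are names introduced for this example).

```python
def step_sketch(u, kappa, dt, dx):
    """One explicit Euler step with insulated (zero-flux) boundaries."""
    n = len(u)
    F = [0.0] * (n + 1)
    for i in range(1, n):
        F[i] = -kappa * (u[i] - u[i - 1]) / dx
    return [u[i] + dt * (F[i] - F[i + 1]) / dx for i in range(n)]

def is_nondecreasing(u):
    return all(a <= b for a, b in zip(u, u[1:]))

u0 = [0.0, 0.0, 1.0, 1.0]   # a nondecreasing step profile

stable = step_sketch(u0, kappa=1.0, dt=0.4, dx=1.0)    # kappa*dt/dx**2 = 0.4
unstable = step_sketch(u0, kappa=1.0, dt=0.9, dx=1.0)  # kappa*dt/dx**2 = 0.9

print(is_nondecreasing(stable))    # True
print(is_nondecreasing(unstable))  # False
```

With the larger time step, the two middle cells overshoot past each other, creating exactly the kind of inversion the property rules out.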


## What we just did

Property-based testing lets you codify computational, scientific, and mathematical properties as executable claims:

 - We did prep work: defined strategies for generating valid inputs.
 - We encoded properties: telescoping, conservation, symmetry, and monotonicity.
 - Hypothesis searched broad input spaces and shrank counterexamples when things went wrong.

This directly operationalizes the scientific method for software: state a hypothesis (property), attempt refutation (generation + shrinking), and refine the code or specs.

## Looking Ahead

In Chapter 5, we’ll climb to the next rung of the rigor ladder, theorem proving: automating reasoning over all inputs within a finite domain and turning some of these properties into machine-checked contracts.


---

R3Sw tutorial by Alper Altuntas (NSF NCAR). Guest lecture by **Deepak Cherian** (Earthmover). Sponsored by the BSSw Fellowship Program. © 2025.

Cite as: Alper Altuntas, Philip Zucker, Deepak Cherian, Adrianna Foster, Manish Venumuddula, and Helen Kershaw. (2025). *"Rigor and Reasoning in Research Software (R3Sw) Tutorial."* https://www.alperaltuntas.com/R3Sw