Currently, when there are precision-related issues, it is not obvious what random seed produced those failed unit/coverage tests. It would be nice to figure out a way to log this information in PyTest so that we can easily reproduce the failure. It'll require some trial-and-error and may not be easy/possible.
Based on some brief research, I've learned that we should stop setting np.random.seed since (according to this):
The problem comes in larger projects or projects with imports that could also set the seed. Using np.random.seed(number) sets what NumPy calls the global random seed, which affects all uses of the np.random.* module. Some imported packages or other scripts could reset the global random seed to another random seed with np.random.seed(another_number), which may lead to undesirable changes to your output and your results becoming unreproducible. For the most part, you will only need to ensure you use the same random numbers for specific parts of your code (like tests or functions).
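To make the quoted point concrete, here is a small illustration (self-contained, not from the issue): any intervening call to np.random.seed silently changes the legacy global stream, while a local np.random.default_rng Generator is unaffected by the global state.

```python
import numpy as np

# Legacy global seed: an intervening np.random.seed call (e.g., from an
# imported module) silently changes the stream we thought we had pinned.
np.random.seed(42)
first = np.random.rand(3)
np.random.seed(42)
np.random.seed(0)           # some other code resets the global seed
second = np.random.rand(3)  # no longer matches `first`

# Local generator: fully isolated from the global np.random state.
rng = np.random.default_rng(42)
a = rng.random(3)
np.random.seed(0)  # has no effect on `rng`'s own stream
b = np.random.default_rng(42).random(3)  # same seed -> same stream
```

Here `first != second` even though the same seed was set both times, while `a == b` because each Generator owns its stream.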
So, if we have multi-threading when running our tests, each thread could potentially set the global random seed at the same time and thereby change the arrays that are produced. Instead, within each test file, we should:
Generate a random integer, test_state (essentially a seed)
Create a (local) pseudo-random number generator, rng, based on the test_state
Use the random number generator, rng, everywhere within this test file where a random number or array is needed
Pass the test_state into every test function so that it will be written/recorded when a test fails
Untested Example
# In some test file
import numpy as np
import numpy.testing as npt
import pytest

test_state = np.random.randint(1_000_000)
rng = np.random.default_rng(test_state)
test_data = [
(
np.array([9, 8100, -60, 7], dtype=np.float64),
np.array([584, -11, 23, 79, 1001, 0, -19], dtype=np.float64),
),
(
rng.uniform(-1000, 1000, [8]).astype(np.float64),
rng.uniform(-1000, 1000, [64]).astype(np.float64),
),
]
@pytest.mark.parametrize("state", [test_state])
@pytest.mark.parametrize("Q, T", test_data)
def test_compute_mean_std_multidimensional(state, Q, T):
m = Q.shape[0]
Q = np.array([Q, rng.uniform(-1000, 1000, [Q.shape[0]])])
T = np.array([T, T, rng.uniform(-1000, 1000, [T.shape[0]])])
ref_μ_Q, ref_σ_Q = naive_compute_mean_std_multidimensional(Q, m)
ref_M_T, ref_Σ_T = naive_compute_mean_std_multidimensional(T, m)
comp_μ_Q, comp_σ_Q = core.compute_mean_std(Q, m)
comp_M_T, comp_Σ_T = core.compute_mean_std(T, m)
npt.assert_almost_equal(ref_μ_Q, comp_μ_Q)
npt.assert_almost_equal(ref_σ_Q, comp_σ_Q)
npt.assert_almost_equal(ref_M_T, comp_M_T)
npt.assert_almost_equal(ref_Σ_T, comp_Σ_T)
@pytest.mark.parametrize("state", [test_state])
@pytest.mark.parametrize("Q, T", test_data)
def test_njit_sliding_dot_product(state, Q, T):
ref_mp = naive_rolling_window_dot_product(Q, T)
comp_mp = core._sliding_dot_product(Q, T)
npt.assert_almost_equal(ref_mp, comp_mp)
Maybe something like this? Note that the random state is set once at the beginning of the file and the generator is then iterated upon to produce the necessary data, but the initial state itself is never changed. I believe/hypothesize that one would need to explicitly seed the generator with test_state in order to reproduce the failed test.
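To sketch that reproduction step (the TEST_STATE environment variable below is hypothetical, just one possible convention): since np.random.default_rng(seed) always starts the same stream, re-creating the generator from the recorded test_state should regenerate bit-identical test data.

```python
import os
import numpy as np

# Hypothetical: pin the seed via an env var when reproducing a logged
# failure; otherwise draw a fresh random state as the tests normally would.
test_state = int(os.environ.get("TEST_STATE", np.random.randint(1_000_000)))

# First run: generate test data from the (recorded) state.
rng = np.random.default_rng(test_state)
Q = rng.uniform(-1000, 1000, 8).astype(np.float64)

# Reproduction run: a fresh generator seeded with the same recorded state
# walks the identical stream, so the arrays match exactly.
rng_repro = np.random.default_rng(test_state)
Q_repro = rng_repro.uniform(-1000, 1000, 8).astype(np.float64)
```

Here Q and Q_repro are equal element for element, which is what would let a developer replay a precision failure once the state is logged.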