
Flaky tests caused by mishandling of global PRNG state #1266

Closed
bukzor opened this issue May 5, 2018 · 12 comments · Fixed by #1295
Labels: enhancement (it's not broken, but we want it to be better), flaky-tests (for when our tests only sometimes pass)

@bukzor
Contributor

bukzor commented May 5, 2018

I've been running into this problem: https://paste.pound-python.org/show/qchV5aFbMkvs0uoYfArd/

This particular test has a @flaky decorator but no @given decorator. Despite the decorator, the problem was happening 100% reproducibly for me, and upon debugging, I found that the stdlib random singleton was in a deterministic state. Further, when running the test alone (-k "half and endpoint"), the test passed, and no longer had determinism.

This is caused by cross-test pollution. The @given decorator results in random.seed(0) running, so a prior test leaves the next test deterministic even though it wouldn't be on its own.
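
A minimal, hypothetical sketch of the pattern (not the actual tests involved):

import random

def test_first():
    # Stands in for any test whose machinery calls random.seed(0),
    # as @given currently does.
    random.seed(0)

def test_second():
    # Stands in for the @flaky test from the paste above: its result is
    # deterministic when run after test_first, but varies when run alone.
    value = random.random()
    assert 0.0 <= value < 1.0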

I'm not sure of the best fix, but this patch fixes the problem for me, and obviates the @flaky decorator too:

--- a/hypothesis-python/tests/common/debug.py
+++ b/hypothesis-python/tests/common/debug.py
@@ -73,6 +73,10 @@
         definition, condition=None,
         settings=None, random=None
 ):
+    if random is None:
+        from random import seed
+        seed(0)
+
     settings = Settings(
         settings,
         max_examples=10000,
@Zac-HD added the flaky-tests label May 6, 2018
@Zac-HD
Member

Zac-HD commented May 6, 2018

I think the proper solution will avoid causing this reseeding pollution in the first place. For example, a new context manager that can be used in core.py:

import contextlib
import random

@contextlib.contextmanager
def deterministic_context(seed=0):
    # Save the global PRNG state, seed it deterministically for the duration
    # of the block, and always restore the previous state afterwards.
    state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(state)

And then ban direct use of random.seed to avoid similar problems in future.

Note: this is equivalent to building the random_module() strategy into every use of @given, which seems like a user-interface win to me.
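
For illustration, the call site inside the test runner could look something like this (the wrapper name is hypothetical, not actual core.py code):

def execute_example(test):
    # Any seeding done while the example runs is confined to this block;
    # the previous global PRNG state is restored on exit.
    with deterministic_context(seed=0):
        test()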

@bukzor
Contributor Author

bukzor commented May 6, 2018

@Zac-HD At that point, I think you'd be happier with an autouse pytest fixture. It's much more foolproof, and I feel sure you won't regret it.

In tests/conftest.py:

import pytest
import random

@pytest.fixture(autouse=True)
def deterministic_random():
    random.seed(0)

If you want to assert that seed isn't used, you can take it a bit farther:

import pytest
import random

@pytest.yield_fixture(autouse=True)
def deterministic_random():
    random.seed(0)
    seed = random.seed
    del random.seed
    yield
    random.seed = seed

@bukzor
Contributor Author

bukzor commented May 6, 2018

I think the proper solution will avoid causing this reseeding pollution in the first place.

To be clear, there's no reseeding involved in the problem above. The test does no seeding at all, but other tests in the module do, so it has differing behavior when run alone or along with all tests in the module.

@Zac-HD
Member

Zac-HD commented May 7, 2018

I mean that the other tests should not have the side-effect of changing the random state, which can be implemented via the decorator I suggested.

@Zac-HD
Member

Zac-HD commented May 9, 2018

Note: fixing this issue will remove the last reason to use the random_module strategy, which should therefore be turned into a no-op and deprecated.

Edit: that only holds if we integrate this into the internal runner mechanisms, which looks impractical.

@Zac-HD added the enhancement label May 10, 2018
@bukzor
Contributor Author

bukzor commented May 14, 2018

@Zac-HD Is there a ticket for that idea? How would I generate a Pareto-distributed random number in that world? Right now I can use st.randoms() and call paretovariate() on the instance it gives me.

@Zac-HD self-assigned this May 14, 2018
@Zac-HD
Member

Zac-HD commented May 14, 2018

This issue is the ticket! I've assigned myself to make that clearer 😄

In that world you can use st.randoms() and methods on the instance it gives you (as now), or just use the global random - Hypothesis will pick a seed for each example and take care of resetting the global random state after each example.
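
For example, a Pareto draw in that world might look like this (a sketch of the usage, not a committed API change):

from hypothesis import given, strategies as st

@given(st.randoms())
def test_pareto_sample(rnd):
    # rnd is a seeded random.Random-like instance provided by Hypothesis;
    # paretovariate() gives Pareto-distributed values with minimum 1.0.
    value = rnd.paretovariate(2.0)
    assert value >= 1.0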

@DRMacIver
Member

Note: fixing this issue will remove the last reason to use the random_module strategy, which should therefore be turned into a no-op and deprecated.

Wait, what? It might remove our last use of it, but the random_module strategy serves a perfectly valid (if somewhat niche) use case. Why would a change about how we test our code remove that?

@Zac-HD
Member

Zac-HD commented May 14, 2018

Sorry, looks like I need to actually write out my chain of logic here 😊

  • The failure is caused by the way that every test with @given runs random.seed(0) in core.py, but never resets the PRNG to its previous state. This massively degrades the quality of randomness in any test suite using Hypothesis, which is very bad. VERY VERY BAD. See e.g. #1174 ("test_uses_provided_seed passes without the @seed decorator") for an example where this has bitten us 😭. We should fix the general problem rather than patching the tests to avoid it, especially because this impacts users too.

  • The random_module() strategy is a special magic thing that doesn't give you a value (users: "what???"), but instead draws an integer seed and does the whole save/seed/resetting thing for you. Sadly, this only helps if you (a) know it exists, (b) know you need it for a particular test, and (c) remember to use it.

  • So we need to save and restore the PRNG state in @given (and IIRC find, and for state machines - see #702, "Stateful testing doesn't explicitly seed global random") to avoid polluting the randomness of the rest of the test suite. If we do this, the damage is contained to each test where random_module() should have been used, but wasn't. 🎉

Before doing some deeper investigation I had hoped it would be practical to integrate the random_module() functionality into internals, but adding seeding to the mix makes it a lot harder.

@DRMacIver
Member

I agree that we should reset the random module state after given rather than letting random.seed(0) pollute the global state, and I'd be happy for suggestions for a better API for random_module (I agree it's weird), but the random_module strategy is there not to clean up state (which it doesn't - the random.seed(0) is called beforehand), but to allow people who are testing code which depends on the global random number generator to vary the seed between examples.

@Zac-HD
Member

Zac-HD commented May 14, 2018

I'll open a pull to reset state in a few days when life calms down a little 😄

We agree on the actual purpose of random_module being to vary the PRNG state between examples. However, it does reset the state in cleanup... right back to the random.seed(0) state that we set up earlier, which causes the original problem!

class RandomModule(SearchStrategy):
    def do_draw(self, data):
        data.can_reproduce_example_from_repr = False
        seed = data.draw(integers())
        state = random.getstate()   # captures the seed(0) state set up by @given
        random.seed(seed)
        cleanup(lambda: random.setstate(state))  # restores that seed(0) state
        return RandomSeeder(seed)

@DRMacIver
Member

However, it does reset the state in cleanup...

Yes, sorry, that's what I meant by "the random.seed(0) is called beforehand". i.e. random_module effectively does exactly as much state cleanup as not using it at all (less, even! At least with plain @given the state of the system afterwards can vary a bit).

@Zac-HD changed the title from 'failure of "float nastiness: half bounded generates endpoint"' to 'Flaky tests caused by mishandling of global PRNG state' Jun 11, 2018