Strange HealthCheck filtering failure #3497

Closed
langfield opened this issue Oct 28, 2022 · 4 comments
Labels
performance ("go faster! use less memory!")

Comments

@langfield
See the following MWE.

"""MWE."""
from dataclasses import dataclass
from beartype import beartype
from beartype.typing import List

import hypothesis.strategies as st
from hypothesis import settings
from hypothesis.stateful import (
    RuleBasedStateMachine,
    Bundle,
    rule,
    initialize,
)

# pylint: disable=missing-class-docstring, missing-function-docstring


@beartype
@dataclass
class State:
    xs: List[str]


@st.composite
def elements(draw, states: Bundle) -> st.SearchStrategy[str]:
    state = draw(states)
    pith = "$|".join(state.xs)
    pattern: str = f"^(?!{pith}$)"
    return draw(st.from_regex(pattern))


class Machine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.state = State(xs=[])

    states = Bundle("states")

    @initialize(target=states)
    def init_state(self) -> State:
        return self.state

    @rule(x=elements(states=states))
    @beartype
    def add(self, x: str) -> None:
        self.state.xs.append(x)


Machine.TestCase.settings = settings(max_examples=100)
TestMachine = Machine.TestCase

This test fails with the following output.

(anki) user@computer:~$ pytest test.py
====================================================== test session starts =======================================================
platform linux -- Python 3.9.11, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/mal
plugins: timeout-2.1.0, hypothesis-6.56.3, mock-3.7.0
collected 1 item

test.py F                                                                                                                  [100%]

============================================================ FAILURES ============================================================
______________________________________________________ TestMachine.runTest _______________________________________________________

self = <hypothesis.stateful.Machine.TestCase testMethod=runTest>

    def runTest(self):
>       run_state_machine_as_test(cls)

conda/envs/anki/lib/python3.9/site-packages/hypothesis/stateful.py:400:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
conda/envs/anki/lib/python3.9/site-packages/hypothesis/stateful.py:222: in run_state_machine_as_test
    run_state_machine(state_machine_factory)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

factory = <class 'test.Machine'>

    @settings
>   @given(st.data())
E   hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check found 50 filtered examples but only 0 good ones. This will make your tests much slower, and also will probably distort the data generation quite a lot. You should adapt your strategy to filter less. This can also be caused by a low max_leaves parameter in recursive() calls
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.filter_too_much to the suppress_health_check settings for this test.

conda/envs/anki/lib/python3.9/site-packages/hypothesis/stateful.py:107: FailedHealthCheck
----------------------------------------------------------- Hypothesis -----------------------------------------------------------
You can add @seed(225781618033313213706118631490124347638) to this test or run pytest with --hypothesis-seed=225781618033313213706118631490124347638 to reproduce this failure.
==================================================== short test summary info =====================================================
FAILED test.py::TestMachine::runTest - hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot ...
======================================================= 1 failed in 0.21s ========================================================

However, when I comment out the Machine.TestCase.settings line, it passes. Since I believe 100 examples is the default anyway, what is going on here?
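(For reference, a sketch of the workaround suggested in the health-check message itself: suppressing just `HealthCheck.filter_too_much` while keeping an explicit example budget. This only silences the check; it does not fix the underlying filtering.)

```python
from hypothesis import HealthCheck, settings

# Workaround sketch: suppress only the filter_too_much health check,
# keeping the explicit max_examples budget from the MWE. The strategy
# will still discard many draws; Hypothesis just won't fail on it.
relaxed = settings(
    max_examples=100,
    suppress_health_check=[HealthCheck.filter_too_much],
)
```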

@langfield (Author)

Also, is there a way to display the reason why a test was marked with Status.INVALID? It seems like it's either that the strategy is empty, or the depth has exceeded some maximum. Can this be printed somehow so you could see which strategy is causing the problem?

@Zac-HD (Member)

Zac-HD commented Oct 29, 2022

Status.INVALID usually just means that you hit an assume(False) or too many retries on a .filter(), and unfortunately this is not introspectable without getting deep into private internal implementation details.
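(A minimal illustration of the first cause, not from the thread: each draw that fails an `assume(...)` is marked invalid and silently retried, just like a rejected `.filter()` predicate.)

```python
import hypothesis.strategies as st
from hypothesis import assume, given, settings

# Each odd draw fails the assume(...) and becomes an invalid example;
# Hypothesis retries with a fresh draw, so the test still passes.
@settings(max_examples=20)
@given(st.integers())
def even_only(n):
    assume(n % 2 == 0)
    assert n % 2 == 0

even_only()
```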

In this case, I think the problem is probably that your regex with negative lookahead isn't turning into a workable strategy?
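(To make the suspicion concrete, here is what the MWE's pattern does as a plain `re` matcher, with hypothetical `xs` values. The pattern is entirely zero-width: it accepts or rejects at position 0 without consuming anything, which gives `st.from_regex()` nothing structural to generate from.)

```python
import re

# The MWE builds this pattern for xs = ["foo", "bar"]:
xs = ["foo", "bar"]
pattern = "^(?!" + "$|".join(xs) + "$)"  # "^(?!foo$|bar$)"

# As a matcher the negative lookahead works: any string not in xs
# produces a zero-width match at position 0, and members of xs fail.
assert re.match(pattern, "baz") is not None
assert re.match(pattern, "foo") is None
```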

There might also be a performance issue where our stateful "swarm testing" randomly disables a subset of rules, and this doesn't correct for having fewer rules - but that doesn't explain why every test case is filtered out.

@Zac-HD Zac-HD added the performance go faster! use less memory! label Oct 29, 2022
@langfield (Author)

langfield commented Oct 30, 2022

@Zac-HD Thanks for the quick reply! I really appreciate it! Any ideas about why this issue disappears when the settings line is commented out?

The st.from_regex() call was indeed causing the issue. I replaced it with a st.text().filter(lambda s: s not in <set>) and it gave me a much better valid-to-invalid ratio. Is there a cleaner way to do this?
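(A sketch of that replacement as a self-contained strategy; the helper name and the excluded set are illustrative, not from the MWE.)

```python
import hypothesis.strategies as st
from hypothesis import given, settings

def excluding(seen: frozenset) -> st.SearchStrategy:
    """Arbitrary text minus the strings already recorded in state.xs."""
    return st.text().filter(lambda s: s not in seen)

# Since st.text() rarely generates any particular short string, the
# filter rejects almost nothing, so the valid-to-invalid ratio stays high.
@settings(max_examples=25)
@given(excluding(frozenset({"foo", "bar"})))
def never_yields_seen(s):
    assert s not in {"foo", "bar"}
```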

Status.INVALID usually just means that you hit an assume(False) or too many retries on a .filter(), and unfortunately this is not introspectable without getting deep into private internal implementation details.

What if I just print some stuff in here?

if strategy.is_empty:
    self.mark_invalid()
if self.depth >= MAX_DEPTH:
    self.mark_invalid()

@Zac-HD (Member)

Zac-HD commented Jun 4, 2023

I think the depth issue has been substantially improved in #3654, while the other part is basically the same problem as #434 and unfortunately we don't have any great ideas on how to solve it beyond maybe "lots of observability engineering for PBT" which is an ongoing research topic.

@Zac-HD Zac-HD closed this as completed Jun 4, 2023