Tests fail with StopTest (OVERRUN) when generating a random integer (strategies.randoms) #3999
hypothesis/hypothesis-python/src/hypothesis/strategies/_internal/random.py Lines 272 to 273 in 218d8a1
I think we get the |
## Description of changes

*Summarize the changes made by this PR.*

The primary intent of this PR is to remove the is_metadata_valid invariant, which was a workaround for our metadata strategy generating faulty metadata and then us special-casing all uses of the record set strategy to handle invalid generations. This PR patches the metadata generation to not generate invalid metadata.

- Adds modes in test_add to add a medium-sized record set. This was initially timing out in hypothesis's generation. Hypothesis bounds the buffer size of the bytes it uses to do random generation, so generating larger metadata was resulting in examples being marked as OVERRUN by conjecture (gleaned from issues like HypothesisWorks/hypothesis#3999 + reading hypothesis code + stepping through it). This PR adds the ability to generate N fixed metadata entries and uniformly distribute them over the record set, reducing the overall entropy.
- Fixes a bug where test_embeddings was not handling None as a possible metadata state, since this state was never generated. Added an explicit test for this.
- Fixes a bug in the reference filtering implementation in test_filtering that did not handle None metadata, since that state was never generated.

This PR is forced to touch types related to metadata, which are incorrect and cause typing errors. I ignored the errors to minimize the surface area of this change and defer those changes to the pass mentioned in #2292.

## Test plan

*How are these changes tested?*

These changes are covered by existing tests, and

- [x] Tests pass locally with `pytest` for python, `yarn test` for js, `cargo test` for rust

## Documentation Changes

No external changes required.
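The entropy argument in the PR description can be illustrated with a toy model. This is only a sketch: `BoundedSource`, `Overrun`, `BUDGET`, and both helper functions are hypothetical names invented for the illustration, and the budget value is an assumption rather than a statement about Hypothesis's internals. It compares drawing a fresh metadata entry per record against drawing a small fixed pool and then picking from it.

```python
import random

BUDGET = 8 * 1024  # illustrative per-example byte budget (an assumption for this sketch)


class Overrun(Exception):
    """Stand-in for an example being marked OVERRUN."""


class BoundedSource:
    """Hands out random bytes until a fixed budget is exhausted."""

    def __init__(self, budget, seed=0):
        self.budget = budget
        self.used = 0
        self._rng = random.Random(seed)

    def draw_bytes(self, n):
        if self.used + n > self.budget:
            raise Overrun(f"needed {self.used + n} bytes, budget is {self.budget}")
        self.used += n
        return bytes(self._rng.getrandbits(8) for _ in range(n))


def independent_metadata(source, n_records, entry_size=64):
    # Every record pays the full cost of a fresh metadata entry.
    return [source.draw_bytes(entry_size) for _ in range(n_records)]


def pooled_metadata(source, n_records, pool_size=8, entry_size=64):
    # Pay for pool_size entries once, then one byte per record to pick one.
    pool = [source.draw_bytes(entry_size) for _ in range(pool_size)]
    return [pool[source.draw_bytes(1)[0] % pool_size] for _ in range(n_records)]


try:
    independent_metadata(BoundedSource(BUDGET), n_records=1000)
except Overrun as exc:
    print("independent draws:", exc)

pooled_source = BoundedSource(BUDGET)
records = pooled_metadata(pooled_source, n_records=1000)
print("pooled draws used", pooled_source.used, "bytes for", len(records), "records")
```

In this model the per-record approach runs out of budget after 128 entries, while the pooled approach needs 1,512 bytes for 1,000 records.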
I think I was likely on an outdated master when I claimed I had a reproducer, because I can't reproduce on latest hypothesis anymore and my reproducer just traces back to the fix in #3991. And yet, the traceback in OP indicates this is a distinct issue (ie not caused by

@khardix are you able to share the offending strategy here? I can't immediately reproduce with synthetic overruns.
@tybug Let me share the project so you have the complete picture, and then I'll point out the things I think are relevant. https://git.sr.ht/~khardix/shoji, the offending tests live in Versions I'm able to trigger the bug on: hypothesis-6.108.10, shoji-0.3.1 (or current master branch: The random strategy is just

```python
from hypothesis import strategies as stg  # alias used throughout the snippet


@stg.composite
def textlike_data(
    draw: stg.DrawFn,
    line_terminator: str = "\n",
    min_line_len: int = 0,
    max_line_len: int | None = _io.MAX_LINE_LEN,  # _io is a project module, not shown
) -> bytes:
    """Generate binary data that look like UTF-8 text.

    Generates the data as a list of lines,
    which makes it easier to reason about whether there is *any* line terminator present.
    """
    alphabet = stg.characters(codec="utf-8", exclude_characters=set(line_terminator))
    line = stg.text(alphabet, min_size=min_line_len, max_size=max_line_len)
    text = stg.lists(line)
    final_terminator = stg.one_of(stg.just(""), stg.just(line_terminator))
    return bytes(
        line_terminator.join(draw(text)) + draw(final_terminator), encoding="utf-8"
    )
```

The

```python
from collections import abc
from random import Random


# RNG is a default generator defined elsewhere in the project (not shown here).
def random_chunks(
    data: bytes, min_size: int = 0, *, rng: Random = RNG
) -> abc.Iterator[bytes]:
    """Split data into random-sized chunks.

    Arguments:
        data: The data to split.
        min_size: Minimal size of a chunk.

            .. note:: The last chunk may be smaller than `min_size`, even empty!

    Keyword arguments:
        rng: Random Number Generator for determining the chunk sizes.

    Yields:
        Chunks of `data`.

    Raises:
        ValueError: When `min_size` is negative.
    """
    if min_size < 0:
        raise ValueError(f"Negative minimal size: {min_size}")

    if len(data) < min_size:
        yield data
        return

    start_idx = 0
    while start_idx < len(data):
        remaining_len = len(data) - start_idx
        # One rng.randint call per chunk decides where the chunk ends.
        end_idx = start_idx + rng.randint(min(min_size, remaining_len), remaining_len)
        yield data[start_idx:end_idx]
        start_idx = end_idx
```
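To get a feel for how many random draws the chunking loop makes, here is a small standard-library sketch. `CountingRandom` and `chunk_sizes` are hypothetical helpers written for this illustration; `chunk_sizes` mirrors only the main loop of `random_chunks` and returns chunk lengths instead of slices. If `rng` comes from `stg.randoms()`, as the issue title suggests, each `randint` call is presumably a separate draw against Hypothesis's per-example budget.

```python
from random import Random


class CountingRandom(Random):
    """A Random that counts randint calls (illustration only)."""

    randint_calls = 0  # becomes a per-instance counter on first increment

    def randint(self, a, b):
        self.randint_calls += 1
        return super().randint(a, b)


def chunk_sizes(total_len, min_size, rng):
    """Mirror the main loop of random_chunks, returning chunk lengths."""
    sizes = []
    start = 0
    while start < total_len:
        remaining = total_len - start
        size = rng.randint(min(min_size, remaining), remaining)
        sizes.append(size)
        start += size
    return sizes


rng = CountingRandom(1234)
sizes = chunk_sizes(10_000, min_size=1, rng=rng)
print(f"{len(sizes)} chunks, {rng.randint_calls} randint calls")
```

The loop makes exactly one `randint` call per chunk, so the number of draws depends on how the random sizes happen to fall rather than on a fixed bound.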
I suspect this is a bad interaction between hypothesis and pytest-trio:

```python
from hypothesis import given, strategies as stg


@given(stg.data())
async def test_bufreciever_receive_line_reads_all_data(data):
    # force an overrun
    for _ in range(5):
        data.draw(stg.integers(0, 1 << 25_000))
```

traceback

============================================================================ test session starts =============================================================================
platform darwin -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0
rootdir: /Users/tybug/Desktop/shoji
configfile: pyproject.toml
plugins: hypothesis-6.108.10, trio-0.8.0, anyio-4.4.0
collected 21 items / 20 deselected / 1 selected
tests/test__io.py F [100%]
================================================================================== FAILURES ==================================================================================
________________________________________________________________ test_bufreciever_receive_line_reads_all_data ________________________________________________________________
+ Exception Group Traceback (most recent call last):
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/runner.py", line 341, in from_call
| result: TResult | None = func()
| ^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/runner.py", line 242, in <lambda>
| lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_hooks.py", line 513, in __call__
| return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
| return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 182, in _multicall
| return outcome.get_result()
| ^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_result.py", line 100, in get_result
| raise exc.with_traceback(exc.__traceback__)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
| teardown.throw(outcome._exception)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/threadexception.py", line 92, in pytest_runtest_call
| yield from thread_exception_runtest_hook()
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/threadexception.py", line 68, in thread_exception_runtest_hook
| yield
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
| teardown.throw(outcome._exception)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/unraisableexception.py", line 95, in pytest_runtest_call
| yield from unraisable_exception_runtest_hook()
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/unraisableexception.py", line 70, in unraisable_exception_runtest_hook
| yield
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
| teardown.throw(outcome._exception)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/logging.py", line 848, in pytest_runtest_call
| yield from self._runtest_for(item, "call")
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/logging.py", line 831, in _runtest_for
| yield
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
| teardown.throw(outcome._exception)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/capture.py", line 879, in pytest_runtest_call
| return (yield)
| ^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 167, in _multicall
| teardown.throw(outcome._exception)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/skipping.py", line 257, in pytest_runtest_call
| return (yield)
| ^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 103, in _multicall
| res = hook_impl.function(*args)
| ^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/runner.py", line 174, in pytest_runtest_call
| item.runtest()
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/python.py", line 1627, in runtest
| self.ihook.pytest_pyfunc_call(pyfuncitem=self)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_hooks.py", line 513, in __call__
| return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_manager.py", line 120, in _hookexec
| return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 139, in _multicall
| raise exception.with_traceback(exception.__traceback__)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pluggy/_callers.py", line 103, in _multicall
| res = hook_impl.function(*args)
| ^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/_pytest/python.py", line 159, in pytest_pyfunc_call
| result = testfunction(**testargs)
| ^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/tests/test__io.py", line 67, in test_bufreciever_receive_line_reads_all_data
| async def test_bufreciever_receive_line_reads_all_data(data):
| ^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/core.py", line 1699, in wrapped_test
| raise the_error_hypothesis_found
| BaseExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pytest_trio/plugin.py", line 195, in _fixture_manager
| yield nursery_fixture
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/pytest_trio/plugin.py", line 250, in run
| await self._func(**resolved_kwargs)
| File "/Users/tybug/Desktop/shoji/tests/test__io.py", line 69, in test_bufreciever_receive_line_reads_all_data
| data.draw(stg.integers(0, 1 << 25_000))
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/strategies/_internal/core.py", line 2115, in draw
| result = self.conjecture_data.draw(strategy, observe_as=f"generate:{desc}")
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 2458, in draw
| return strategy.do_draw(self)
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/strategies/_internal/lazy.py", line 167, in do_draw
| return data.draw(self.wrapped_strategy)
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 2452, in draw
| return strategy.do_draw(self)
| ^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/strategies/_internal/numbers.py", line 85, in do_draw
| return data.draw_integer(
| ^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 2121, in draw_integer
| value = self.provider.draw_integer(
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 1481, in draw_integer
| return self._draw_bounded_integer(
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 1780, in _draw_bounded_integer
| probe = self._cd.draw_bits(
| ^^^^^^^^^^^^^^^^^^^
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 2612, in draw_bits
| self.__check_capacity(n_bytes)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 2654, in __check_capacity
| self.mark_overrun()
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 2679, in mark_overrun
| self.conclude_test(Status.OVERRUN)
| File "/Users/tybug/Desktop/shoji/myenv/lib/python3.12/site-packages/hypothesis/internal/conjecture/data.py", line 2666, in conclude_test
| raise StopTest(self.testcounter)
| hypothesis.errors.StopTest: 1
+------------------------------------
--------------------------------------------------------------------------------- Hypothesis ---------------------------------------------------------------------------------
You can add @seed(102826716451383249480663158101555025900) to this test or run pytest with --hypothesis-seed=102826716451383249480663158101555025900 to reproduce this failure.
========================================================================== short test summary info ===========================================================================
FAILED tests/test__io.py::test_bufreciever_receive_line_reads_all_data - BaseExceptionGroup: Exceptions from Trio nursery (1 sub-exception)
====================================================================== 1 failed, 20 deselected in 0.04s ======================================================================

pytest-trio 0.8.0, hypothesis 6.108.10. pytest-trio may be swallowing

@Zac-HD maybe you have ideas here? 😅 I will likely be quite slow to identify the correct resolution with trio involved.

(thanks a lot for the reproducer @khardix!)
ooooohhh, a In the long term I'd really like Hypothesis to "just work" with (Base)ExceptionGroups; in the short term we might end up just catching and crashing on this as a more-interpretable improvement on the status quo. We do already depend on the |
I originally thought this is the same issue as #3874, but even after upgrading to hypothesis-6.102.4, my tests sometimes fail as described in $title.

Run log from the last failure: