Stop shrink phase after timeout when progress is very slow #2340
Here's another py-spy top output after monitoring for a longer time:
Sounds like this may be the same problem as #2308? If you could check whether it happens on those earlier versions, that would be really helpful!
I've run it 150 times on 4.48.1 and not run into this problem at all (it's occasionally 3-4x slower than the median, but some examples are much slower to test than others so it's not unexpected), and it's also 2-3x faster than 5.4.1 (which may of course just be because it tests less, e.g. I saw that some later version made stateful testing more likely to use the maximum number of steps). So it may well be related to #2308. Let me know if it would help to bisect with any other versions or commits.
@bmerry - can you test again with Hypothesis >= 5.8.5? We've just merged fixes for a (rare) infinite loop and a near-infinite loop which you might have been triggering.
Thanks! Will give it a go. Since it's an intermittent problem it'll take me a while to confirm.
@Zac-HD Unfortunately that doesn't seem to fix it. Just tried with Hypothesis 5.9.1 and was still able to reproduce the issue after a number of tries. py-spy output is similar to before:
Is there any way I can help narrow this down? e.g. is there some way I can get it to print out the seed before each execution and then set that seed again so that it can be reproduced reliably?
https://hypothesis.readthedocs.io/en/latest/reproducing.html#reproducing-a-test-run-with-seed and just increment through 0, 1, 2, 3...? Unfortunately our tools tend to assume that things crash rather than hang on failure, so it's not as elegant as I'd like 😅
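For illustration, a minimal sketch of that approach, assuming an ordinary @given-style test rather than the state machine from this issue (the seed value and test body are placeholders):

```python
from hypothesis import given, seed, strategies as st

# Hypothetical illustration: pin the PRNG seed with @seed and bump the value
# by hand (0, 1, 2, 3, ...) between runs until a run hangs; that seed then
# reproduces the hang reliably.
SEED = 0  # increment manually between runs

@seed(SEED)
@given(st.integers())
def test_example(x):
    ...  # placeholder for the real test body
```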
Thanks, that seems to have done the trick. So here are steps to reproduce the issue.
Just in case the random generation is somehow affected by dependencies, here's my output from
I'm just looking into this now and I think there are two problems happening here.
The way to see that (2) is happening is to put prints in your initialisation around the line
@DRMacIver thanks for investigating. (1) might explain why I haven't been able to reproduce the issue with the latest version of fakeredis: I've presumably fixed whatever failure case hypothesis is finding, so it doesn't need to shrink. I've added the print statements you suggested for (2) and am leaving it to run to see if I hit it - but I doubt I hit this previously, because py-spy was still showing hypothesis functions consuming CPU time. One possibility is that some example got redis into a state where it either hung or was doing something slow/CPU-heavy; redis is single-threaded so CPU-intensive operations block all clients.
Were they still actively consuming CPU time, or was the main thread blocked waiting for redis so nothing was consuming CPU time, leaving the percentages unchanged? (I don't actually know that much about how py-spy works so this is just a guess.)
Tentatively, adding a socket timeout to your redis connection seems to make this problem go away, i.e.:
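A minimal sketch of that change with redis-py; the host, port, and timeout values here are illustrative assumptions, not values taken from this thread:

```python
import redis

# Illustrative only: bound every socket operation so a wedged server shows up
# as a timeout error instead of hanging the test run indefinitely.
client = redis.Redis(
    host="localhost",
    port=6379,
    socket_timeout=10,          # seconds a blocking read/write may take
    socket_connect_timeout=10,  # seconds allowed to establish the connection
)
```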
The columns showing seconds spent (OwnTime and TotalTime) were rising, so I don't think it was just blocked. Interestingly, seed 18 has finished for me after 25 minutes, and without hanging in redis. 18 is just the first seed that ran for more than a few minutes, so I'm guessing there are probably worse seeds. Apart from the redis hang, it sounds like the problem is now understood, and the question is how to solve it. Obviously a magically much faster shrinker would be nice, but smaller workarounds that would make hypothesis more valuable for me would be:
Yeah, I was thinking of adding a time limit to shrinking. This won't help if you're hitting an actual hang though. We used to have this as a more general feature which we dropped, but it's only problematic during generation so I think it would be reasonable to bring it back for shrinking. |
Would re-running your tests then resume shrinking, and potentially take another $timeout seconds to complete again? I think this is probably the best available option, but ouch... It would probably be helpful to print a short explanation of what happened when we time out of shrinking, and maybe a link to the docs. As a shrink-phase-timeout duration, I nominate:
Sorry, I'm not able to parse that. |
If my test function is super fast, I'd still like to spend a reasonable amount of time on shrinking. But I'm guessing users with fast test functions won't have bothered setting the deadline (I wasn't even aware of the setting until now), so that should work out quite nicely.
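For reference, the deadline setting mentioned here is Hypothesis's per-example time limit; a minimal illustrative example of disabling it for a known-slow test (the test itself is just a placeholder):

```python
from hypothesis import given, settings, strategies as st

# deadline is a per-example time limit; deadline=None disables it for tests
# that are known to be slow.
@settings(deadline=None)
@given(st.integers())
def test_slow_thing(x):
    ...  # placeholder body
```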
I would certainly care if it ended up being too high and led to Travis killing the job. Maybe provide a setting but default it to the formula you proposed?
That's the idea! 100 seconds should be enough for normal use-cases, and we can take as long as 500 for users who have disabled deadlines (who know their test is slow and don't seem to mind waiting). (gah, markdown issues with the formula above.)
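Read literally, that proposal amounts to something like the following. This is a speculative reconstruction, not a setting or formula quoted from the thread:

```python
# Speculative sketch only (not an actual Hypothesis setting): roughly 100
# seconds of shrinking for ordinary tests, rising to 500 seconds when the
# per-example deadline has been disabled, i.e. the test is known to be slow.
def proposed_shrink_timeout(deadline_ms):
    if deadline_ms is None:  # deadline disabled
        return 500.0
    return 100.0
```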
Yeah, I was thinking this would cause Hypothesis to emit a warning and request a bug report. |
I'm using hypothesis very successfully (thanks!) in fakeredis. However, I have a sporadic issue that maybe you can give me some insight into debugging. Mostly the test suite finishes in about a minute, but sometimes it seems to run forever (at least 30 minutes, but I've never actually seen it finish, although I can see forward progress). py-spy top shows that the CPU time is all being spent in hypothesis internals:
I could see this happening if I'd ended up trying to sample a strategy with a difficult filter, but the odd part is that things remain slow across many, many instantiations of my RuleBasedStateMachine. I would have assumed that even if one evolution of the state machine backed things into a corner where new rules were hard to generate, it would no longer be a problem once things were reset for the next iteration.
If you want to try reproducing this, check out fakeredis (URL above, I'm seeing this with revision dd10a73), install the dependencies from its requirements.txt, and run
pytest test_fakeredis_hypothesis.py
repeatedly. I can usually get the slowdown to occur within half an hour. Is there some way I can log the seed at the start, so that once I've triggered the bad case I can reliably reproduce it?
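For illustration, one way the repeated pytest runs might be automated; that pytest is on PATH and the fakeredis checkout is the working directory are assumptions here, and the loop count is arbitrary:

```python
import subprocess
import time

# Illustrative helper: rerun the suite and log how long each run takes, so
# that a pathologically slow run stands out.
for attempt in range(100):
    start = time.monotonic()
    subprocess.run(["pytest", "test_fakeredis_hypothesis.py"], check=False)
    print(f"run {attempt}: {time.monotonic() - start:.1f}s")
```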