Stop shrink phase after timeout when progress is very slow #2340

Closed · bmerry opened this issue Feb 4, 2020 · 19 comments · Fixed by #2421
Labels: performance (go faster! use less memory!)

bmerry commented Feb 4, 2020

I'm using hypothesis very successfully (thanks!) in fakeredis. However, I have a sporadic issue that maybe you can give me some insight into debugging. Usually the test suite finishes in about a minute, but sometimes it seems to run forever (at least 30 minutes; I've never actually seen it finish, although I can see forward progress). py-spy top shows that the CPU time is all being spent in hypothesis internals:

Total Samples 10500
GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                          
 12.00%  24.00%    8.51s    21.46s   start (hypothesis/internal/conjecture/data.py:144)
  6.00%  26.00%    6.14s    23.95s   try_shrinking_blocks (hypothesis/internal/conjecture/shrinker.py:869)
  9.00%  11.00%    5.25s     7.75s   __getitem__ (hypothesis/internal/conjecture/data.py:447)
  4.00%   4.00%    5.11s     5.11s   __getitem__ (hypothesis/internal/conjecture/junkdrawer.py:90)
  5.00%  28.00%    4.02s    24.64s   try_shrinking_blocks (hypothesis/internal/conjecture/shrinker.py:870)
  5.00%   5.00%    4.00s     4.00s   lazy_calculate (hypothesis/internal/conjecture/data.py:271)
  3.00%   4.00%    3.86s     5.38s   __getitem__ (hypothesis/internal/conjecture/data.py:442)
  5.00%   8.00%    3.66s     7.38s   starts (hypothesis/internal/conjecture/data.py:366)
  7.00%  13.00%    3.54s     8.60s   end (hypothesis/internal/conjecture/data.py:151)
  1.00%   1.00%    2.95s     2.95s   __getitem__ (hypothesis/internal/conjecture/data.py:441)
  1.00%   1.00%    2.69s     2.69s   __getitem__ (hypothesis/internal/conjecture/junkdrawer.py:92)
  3.00%   3.00%    2.03s     2.03s   __getitem__ (hypothesis/internal/conjecture/data.py:443)
  3.00%  16.00%    2.00s    10.59s   try_shrinking_blocks (hypothesis/internal/conjecture/shrinker.py:872)
  1.00%   2.00%    1.69s     2.78s   append (hypothesis/internal/conjecture/junkdrawer.py:117)
  0.00%   0.00%    1.62s     1.62s   append (hypothesis/internal/conjecture/junkdrawer.py:116)
  2.00%   2.00%    1.62s     1.62s   __init__ (hypothesis/internal/conjecture/data.py:107)

I could see this happening if I'd ended up trying to sample a strategy with a difficult filter, but the odd part is that things remain slow across many, many instantiations of my RuleBasedStateMachine. I would have assumed that even if one evolution of the state machine backed things into a corner where new rules were hard to generate, it would no longer be a problem once things were reset for the next iteration.

If you want to try reproducing this, check out fakeredis (URL above; I'm seeing this with revision dd10a73), install the dependencies from its requirements.txt, and run pytest test_fakeredis_hypothesis.py repeatedly. I can usually get the slowdown to occur within half an hour.

Is there some way I can log the seed at the start, so that once I've triggered the bad case I can reliably reproduce it?


bmerry commented Feb 4, 2020

Here's another py-spy top output after monitoring for a longer time:

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                                                                          
  1.00%   1.00%   202.6s    202.6s   __eq__ (hypothesis/internal/lazyformat.py:30)
  2.00%   2.00%   179.2s    179.2s   __eq__ (hypothesis/internal/lazyformat.py:31)
  0.00%   1.00%   153.1s    409.0s   __setitem__ (weakref.py:407)
  2.00%   7.00%   152.8s    406.1s   __getitem__ (weakref.py:394)
  8.00%   8.00%   128.5s    128.5s   recv_into (redis/_compat.py:74)
  3.00%   3.00%   122.1s    122.1s   __eq__ (hypothesis/internal/lazyformat.py:29)
  0.00%   0.00%   52.33s    55.98s   forced_value (hypothesis/strategies/_internal/strategies.py:92)
  1.00%   4.00%   29.87s    56.87s   append (hypothesis/internal/conjecture/junkdrawer.py:117)
  0.00%   0.00%   28.97s    31.83s   forced_value (hypothesis/strategies/_internal/strategies.py:94)
  3.00%   5.00%   28.76s    104.9s   sendall (redis/_compat.py:8)
  0.00%   0.00%   26.24s    26.24s   append (hypothesis/internal/conjecture/junkdrawer.py:116)
  2.00%   2.00%   24.61s    24.68s   parse (ast.py:35)
  0.00%   0.00%   22.22s    36.83s   append (hypothesis/internal/conjecture/junkdrawer.py:115)
  1.00%   4.00%   19.92s    82.21s   start (hypothesis/internal/conjecture/data.py:144)
  0.00%   0.00%   18.60s    18.60s   __len__ (hypothesis/internal/conjecture/junkdrawer.py:87)
  0.00%   0.00%   17.37s    17.37s   stop_example (hypothesis/internal/conjecture/data.py:915)


Zac-HD commented Feb 4, 2020

Sounds like this may be the same problem as #2308? If you could check whether it happens on those earlier versions that would be really helpful!

Zac-HD added the performance (go faster! use less memory!) label on Feb 4, 2020

bmerry commented Feb 4, 2020

I've run it 150 times on 4.48.1 and not run into this problem at all (it's occasionally 3-4x slower than the median, but some examples are much slower to test than others, so that's not unexpected). It's also 2-3x faster than 5.4.1, which may of course just be because it tests less; e.g. I saw that some later version made stateful testing more likely to use the maximum number of steps.

So it may well be related to #2308. Let me know if it will help to bisect with any other versions or commits.


Zac-HD commented Apr 15, 2020

@bmerry - can you test again with Hypothesis >= 5.8.5? We've just merged fixes for a (rarely) infinite loop and a near-infinite loop that you might have been triggering.


bmerry commented Apr 15, 2020

Thanks! Will give it a go. Since it's an intermittent problem it'll take me a while to confirm.


bmerry commented Apr 16, 2020

@Zac-HD Unfortunately that doesn't seem to fix it. Just tried with Hypothesis 5.9.1 and was still able to reproduce the issue after a number of tries. py-spy output is similar to before:

Total Samples 8000
GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)                                              
 10.00%  21.00%    7.27s    18.75s   start (hypothesis/internal/conjecture/data.py:144)
  3.00%   3.00%    4.74s     4.74s   __getitem__ (hypothesis/internal/conjecture/junkdrawer.py:90)
  4.00%  25.00%    4.45s    19.73s   try_shrinking_blocks (hypothesis/internal/conjecture/shrinker.py:869)
  3.00%   3.00%    4.39s     4.39s   lazy_calculate (hypothesis/internal/conjecture/data.py:271)
  8.00%  13.00%    4.24s    10.14s   end (hypothesis/internal/conjecture/data.py:151)
  2.00%   4.00%    4.08s     6.57s   __getitem__ (hypothesis/internal/conjecture/data.py:461)
  7.00%  27.00%    3.76s    22.08s   try_shrinking_blocks (hypothesis/internal/conjecture/shrinker.py:870)
  2.00%   6.00%    3.09s     6.47s   starts (hypothesis/internal/conjecture/data.py:366)
  5.00%   9.00%    2.82s     4.24s   __getitem__ (hypothesis/internal/conjecture/data.py:456)
  3.00%   3.00%    2.54s     2.54s   __getitem__ (hypothesis/internal/conjecture/junkdrawer.py:92)
  5.00%   5.00%    2.39s     2.39s   __getitem__ (hypothesis/internal/conjecture/data.py:455)
  1.00%  14.00%    2.24s    12.39s   try_shrinking_blocks (hypothesis/internal/conjecture/shrinker.py:872)
  3.00%   3.00%    1.90s     1.90s   __getitem__ (hypothesis/internal/conjecture/data.py:457)
  3.00%   3.00%    1.58s     3.35s   ends (hypothesis/internal/conjecture/data.py:370)
  1.00%   1.00%    1.48s     1.48s   __init__ (hypothesis/internal/conjecture/data.py:107)
  3.00%   3.00%    1.31s     1.31s   __len__ (hypothesis/internal/conjecture/data.py:452)

Is there any way I can help narrow this down? For example, is there some way I can get it to print out the seed before each execution, and then set that seed again so that it can be reproduced reliably?


Zac-HD commented Apr 17, 2020

https://hypothesis.readthedocs.io/en/latest/reproducing.html#reproducing-a-test-run-with-seed and just increment through 0, 1, 2, 3...? Unfortunately our tools tend to assume that things crash rather than hang on failure, so it's not as elegant as I'd like 😅
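
For reference, seeds can also be pinned in code with the seed decorator; a minimal sketch, where the seed value and the trivial test body are illustrative rather than anything from the fakeredis suite:

from hypothesis import given, seed, strategies as st

# Pin the PRNG seed so a slow or hanging run can be replayed exactly;
# both the seed value and the test body below are purely illustrative.
@seed(18)
@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)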


bmerry commented Apr 17, 2020

Thanks, that seems to have done the trick. So here are steps to reproduce the issue.

  1. Run a redis server on localhost. For Ubuntu that's just apt install redis-server. NB: the test will wipe out the data in the redis server, so if you already use a redis server which holds data, beware.
  2. Create and activate a fresh Python 3 virtualenv (I'm on Python 3.6).
  3. Run the following to install the necessary bits:
     pip install pytest hypothesis==5.9.1
     git clone https://github.com/jamesls/fakeredis
     cd fakeredis
     git checkout dd10a73
     pip install .
  4. To reproduce the hang, run pytest --hypothesis-seed=18 test_fakeredis_hypothesis.py.

Just in case the random generation is somehow affected by dependencies, here's my output from pip freeze (but note that it isn't actually a release version of fakeredis; it's the git commit above).

attrs==19.3.0
fakeredis==1.1.1
hypothesis==5.9.1
importlib-metadata==1.6.0
more-itertools==8.2.0
packaging==20.3
pkg-resources==0.0.0
pluggy==0.13.1
py==1.8.1
pyparsing==2.4.7
pytest==5.4.1
redis==3.4.1
six==1.14.0
sortedcontainers==2.1.0
wcwidth==0.1.9
zipp==3.1.0


DRMacIver commented Apr 17, 2020

I'm just looking into this now and I think there are two problems happening here.

  1. The test is actually failing, and shrinking the failure for it is taking a very long time (something of a known issue with stateful testing, unfortunately; we could definitely do better here, either by showing what's going on or ideally by shrinking better).
  2. There does seem to be a weird hang that's sometimes being triggered during shrinking, but that hang is actually in your test rather than Hypothesis.

The way to see that (2) is happening is to put prints in your initialisation around the line self.real.ping() - it gets up to that point and then hangs trying to ping redis for some reason. I'm not quite sure why this would be. My guess is that it's something about how your tests interact with redis when they've been run many times in the same process (hence this being triggered during shrinking rather than normal generation), but I don't know what that would be.
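
Concretely, the kind of instrumentation meant here is a couple of prints around the ping; a standalone sketch (host and port are assumptions, and flush=True keeps the output visible even if the process then hangs):

import redis

# Print before and after the ping so a hang can be pinned to the redis
# round-trip itself rather than to Hypothesis.
real = redis.StrictRedis('localhost', port=6379)
print("about to ping real redis", flush=True)
real.ping()
print("ping returned", flush=True)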


bmerry commented Apr 17, 2020

@DRMacIver thanks for investigating.

(1) might explain why I haven't been able to reproduce the issue with the latest version of fakeredis: I've presumably fixed whatever failure case hypothesis is finding, so it doesn't need to shrink.

I've added the print statements you suggested for (2) and am leaving it to run to see if I hit it - but I doubt I hit this previously, because py-spy was still showing hypothesis functions consuming CPU time. One possibility is that some example got redis into a state where it either hung or was doing something slow/CPU-heavy; redis is single-threaded so CPU-intensive operations block all clients.

DRMacIver commented:

because py-spy was still showing hypothesis functions consuming CPU time

Were they still actively consuming CPU time or was the main thread blocked waiting for redis so nothing was consuming CPU time, leaving the percentages unchanged? (I don't actually know that much about how py-spy works so this is just a guess)

DRMacIver commented:

Tentatively, adding a socket timeout to your redis connection seems to make this problem go away, i.e. self.real = redis.StrictRedis('localhost', port=6379, socket_timeout=1). It still takes ages to shrink, but no longer appears to hang while doing so.
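
For clarity, that suggestion as a self-contained snippet (host and port are taken from the comment above; whether one second is the right timeout for this suite is an assumption):

import redis

# With a socket timeout, a wedged or busy server raises
# redis.exceptions.TimeoutError instead of blocking the test run forever.
real = redis.StrictRedis('localhost', port=6379, socket_timeout=1)
real.ping()  # raises TimeoutError rather than hanging indefinitely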


bmerry commented Apr 17, 2020

The columns showing seconds spent (OwnTime and TotalTime) were rising, so I don't think it was just blocked.

Interestingly, seed 18 has finished for me after 25 minutes, and without hanging in redis. 18 is just the first seed that ran for more than a few minutes, so I'm guessing there are probably worse seeds.

Apart from the redis hang, it sounds like the problem is now understood, and the question is how to solve it. Obviously a magically much faster shrinker would be nice, but smaller workarounds that would make hypothesis more valuable for me would be:
(a) printing the seed at the start of the test when using pytest (if the pytest plugin interface allows for that - presumably it needs to happen when the plugin is loaded rather than when tests are running, to avoid messing up the output). That way if a test times out I can at least reproduce it.
(b) a setting to limit the amount of effort spent on shrinking. A poorly-shrunk example is still a lot more useful to me than a test that gets killed off by a CI framework.
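
As a stopgap in the spirit of (b), the existing phases setting can already skip shrinking entirely, at the cost of reporting an unshrunk failing example; a minimal sketch, with an illustrative (deliberately falsifiable) test body:

from hypothesis import Phase, given, settings, strategies as st

# Omitting Phase.shrink bounds the time spent after a failure is found,
# at the cost of a larger reported counterexample.
@settings(phases=[Phase.explicit, Phase.reuse, Phase.generate])
@given(st.lists(st.integers(), min_size=2))
def test_without_shrinking(xs):
    assert len(set(xs)) == len(xs)  # fails whenever a value repeats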

DRMacIver commented:

(b) a setting to limit the amount of effort spent on shrinking. A poorly-shrunk example is still a lot more useful to me than a test that gets killed off by a CI framework.

Yeah, I was thinking of adding a time limit to shrinking. This won't help if you're hitting an actual hang though. We used to have this as a more general feature which we dropped, but it's only problematic during generation so I think it would be reasonable to bring it back for shrinking.


Zac-HD commented Apr 17, 2020

bring [wall-clock timeouts] back for shrinking

Would re-running your tests then resume shrinking, and potentially take another $timeout seconds to complete again? I think this is probably the best available option, but ouch... it would probably be helpful to print a short explanation of what happened when we time out of shrinking, and maybe a link to the phases setting (and a request for a report?).

As a shrink-phase timeout duration, I nominate MAX_SHRINKS (= 500) * (settings.deadline or 1 second). That's 100 seconds on default settings; with deadline=None it's around eight minutes (a little less than e.g. the ten minutes Travis gives you before killing the job); and it adapts up and down with expected test duration. It doesn't require yet another setting, since I think it's unlikely that people really care about the exact value beyond 'there's a timeout of a few minutes so things don't hang'.
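
Spelled out as code, the proposed budget might look roughly like this (the function name is mine, not a Hypothesis internal):

from datetime import timedelta

MAX_SHRINKS = 500  # the 500 nominated in the comment above

def proposed_shrink_timeout(deadline):
    # Fall back to one second per shrink when no deadline is configured,
    # giving the ~100 s on default settings and the ~500 s (about eight
    # minutes) for deadline=None mentioned above.
    per_test = deadline if deadline is not None else timedelta(seconds=1)
    return MAX_SHRINKS * per_test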


bmerry commented Apr 17, 2020

Would re-running your tests then resume shrinking, and potentially take to complete again?

Sorry, I'm not able to parse that.


bmerry commented Apr 17, 2020

it adapts up and down with expected test duration

If my test function is super fast, I'd still like to spend a reasonable amount of time on shrinking. But I'm guessing users with fast test functions won't have bothered setting the deadline (I wasn't even aware of the setting until now), so that should work out quite nicely.

unlikely that people really care about the exact value beyond 'there's a timeout of a few minutes so things don't hang'.

I would certainly care if it ended up being too high and led to Travis killing the job. Maybe provide a setting but default it to the formula you proposed?


Zac-HD commented Apr 17, 2020

If my test function is super fast, I'd still like to spend a reasonable amount of time on shrinking. But I'm guessing users with fast test functions won't have bothered setting the deadline (I wasn't even aware of the setting until now), so that should work out quite nicely.

That's the idea! 100 seconds should be enough for normal use-cases, and we can take as long as 500 for users who have disabled deadlines (who know their test is slow and don't seem to mind waiting).

(gah, markdown issues with <>) The issue with replay is that if we find a failing example in the replay stage, we attempt to shrink it again. This is usually fast, but could be slow if we stopped attempting to shrink early because we were making very slow progress!

DRMacIver commented:

Would re-running your tests then resume shrinking, and potentially take another $timeout seconds to complete again? I think this is probably the best available option, but ouch... probably helpful to print a short explanation of what happened when we time out of shrinking, and maybe a link to the phases setting (and request for a report?).

Yeah, I was thinking this would cause Hypothesis to emit a warning and request a bug report.

Zac-HD changed the title from "Occasionally very slow" to "Stop shrink phase after timeout when progress is very slow" on Apr 25, 2020