
precondition with lambda function slows down RuleBasedStateMachine considerably #3963

Closed
pspacek opened this issue May 2, 2024 · 5 comments
Labels
performance go faster! use less memory!

Comments


pspacek commented May 2, 2024

Hello once again,

and a big thank you for Hypothesis! It's a great tool and has helped me uncover nasty and hard-to-find bugs.

Summary

The @precondition decorator with a trivial lambda function (one integer comparison plus an object dereference) causes a significant slowdown in RuleBasedStateMachine execution.

Versions

  • hypothesis 6.100.2
  • pytest 8.1.2
  • Python 3.12.3 x86_64 on Arch Linux (package version 3.12.3-1)

Measurement on a realistic test case

I'm going to show measurements for my complex (but realistic) state machine, which uses a complex custom strategy to generate inputs. Given all that complexity I would expect the precondition's toll on execution time to be negligible, but that's not the case.

pytest reported runtime:

A] Including five preconditions of the form len(self.model) > 0 and self.iter_generation == self.testcase.generation:

72.39s (0:01:12)

B] Without preconditions (just commented out; they were meant as an optimization to steer the state machine into more interesting states):

6.18 s

syscall profiling

Output of perf trace --summary:

$ sudo perf trace --summary --pid `pgrep pytest`

 Summary of events:

 pytest (64494), 3217578 events, 100.0%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   openat           2842952 2828160 7342084.708     0.000     2.583    73.245      0.03%
   newfstatat         14791      0  2173.974     0.033     0.147    59.981      3.38%
   write             105496      0   170.597     0.001     0.002     0.034      0.27%
   read               14767      0    36.604     0.001     0.002     0.713      1.96%
   close              14791      0    28.622     0.001     0.002     0.021      0.23%
   fstat              14798      0    25.282     0.001     0.002     0.022      0.33%
   ioctl              14794  14791    17.895     0.001     0.001     0.018      0.34%
   lseek              14711      0    15.242     0.001     0.001     0.023      0.49%

In other words, roughly 10% of the runtime was spent in the openat syscall. Huh?

Probable cause

Here's what strace shows during test execution:

openat(AT_FDCWD, "<unknown>", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "<unknown>", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "<unknown>", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "<unknown>", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "<unknown>", O_RDONLY) = -1 ENOENT (No such file or directory)
... and it goes on and on ...

A bit of light debugging with GDB and the python-gdb plugin indicates that these calls originate here:

Traceback (most recent call first):
  File "/usr/lib/python3.12/ast.py", line 52, in parse
    return compile(source, filename, mode, flags,
  File "/usr/lib/python3.12/site-packages/hypothesis/internal/reflection.py", line ?, in extract_lambda_source
    (failed to get frame line number)
  File "/usr/lib/python3.12/site-packages/hypothesis/internal/reflection.py", line 449, in get_pretty_function_description
    return extract_lambda_source(f)
  File "/usr/lib/python3.12/site-packages/hypothesis/stateful.py", line 1009, in is_valid
    where = f"{desc} precondition {get_pretty_function_description(pred)}"
  File "/usr/lib/python3.12/site-packages/hypothesis/stateful.py", line 977, in <genexpr>
    if not any(self.is_valid(rule) for rule in self.rules):
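The traceback shows that a fresh pretty description of the lambda is built on every is_valid call. A stdlib-only sketch of why that is expensive (describe_lambda below is a hypothetical simplification of Hypothesis's extract_lambda_source, not its actual code): each call goes back to the filesystem for the defining source and re-parses it, with nothing cached between calls.

```python
import ast
import inspect

def describe_lambda(fn):
    # Hypothetical stand-in for Hypothesis's lambda source extraction:
    # each call re-reads the defining file and re-parses the source.
    # Nothing is cached, so the cost repeats on every state-machine step.
    try:
        source = inspect.getsource(fn).strip()  # touches the filesystem
        ast.parse(source)                       # full parse, every time
        return source
    except (OSError, TypeError, SyntaxError):
        # Source unavailable (e.g. interactive session); repeated failed
        # file lookups are what show up as ENOENT openat calls above.
        return repr(fn)

pred = lambda self: True
# In the stateful engine this runs once per precondition per step, so the
# expense scales with the number of steps, not the number of lambdas.
description = describe_lambda(pred)
```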

Confirmation

Just to confirm we are on the right track: this cheapskate patch brings the runtimes of the variants with and without preconditions to the same values:

--- a/stateful.py	2024-05-02 13:15:06.751722997 +0200
+++ b/stateful.py	2024-05-02 13:15:51.041613884 +0200
@@ -1006,8 +1006,8 @@
         desc = f"{self.machine.__class__.__qualname__}, rule {rule.function.__name__},"
         for pred in rule.preconditions:
             meets_precond = pred(self.machine)
-            where = f"{desc} precondition {get_pretty_function_description(pred)}"
-            predicates[where]["satisfied" if meets_precond else "unsatisfied"] += 1
+            #where = f"{desc} precondition {get_pretty_function_description(pred)}"
+            #predicates[where]["satisfied" if meets_precond else "unsatisfied"] += 1
             if not meets_precond:
                 return False

Reproducer

Here's a smaller test case which does not actually test anything:
test_reproducer.py.gz

  • With the precondition commented out, runtime is about 1.6 seconds on my machine.
  • With the precondition active, runtime jumps to a ridiculous ~35 seconds.
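The attached reproducer is not inlined above, so here is a hypothetical minimal state machine in the same spirit (this is NOT the author's test_reproducer.py.gz): on affected versions (< 6.100.3), every step re-describes the lambda passed to @precondition.

```python
# Hypothetical minimal reproducer: a state machine whose only notable
# cost should be the trivial lambda precondition.
from hypothesis import settings
from hypothesis.stateful import RuleBasedStateMachine, precondition, rule

class Counter(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.n = 0

    @rule()
    def increment(self):
        self.n += 1

    # On affected versions, this lambda is re-described on every step,
    # triggering source extraction and ast.parse each time.
    @precondition(lambda self: self.n >= 0)
    @rule()
    def noop(self):
        pass

TestCounter = Counter.TestCase
TestCounter.settings = settings(max_examples=50, deadline=None)
```

Running this under pytest with and without the @precondition line should show the same runtime gap as in the attached file.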

pspacek commented May 2, 2024

Another related issue seems to be that ast.parse() is for some reason consuming a lot of memory and keeping it around while the test runs. Here's a Scalene profile produced on the minimal test case above:
profile.json

40 MB does not look like much, but on my realistic test case memory usage climbs to 1 GB within the first minute and keeps growing until the run is terminated by the max_examples limit.

tybug added the performance (go faster! use less memory!) label and removed the bug (something is clearly wrong here) label May 3, 2024

tybug commented May 3, 2024

Wow, thanks for the detailed report! #3966 will address the performance and likely subsequently the memory...though we may still have an underlying leak somewhere.


pspacek commented May 3, 2024

Thanks for the quick turnaround!


tybug commented May 5, 2024

This should be fixed in hypothesis >= 6.100.3, though let us know if performance or memory is still bad!

tybug closed this as completed May 5, 2024

pspacek commented May 6, 2024

I confirm that CPU-time performance is okay on hypothesis 6.100.4. Memory consumption is still suspicious, but I will open a separate issue about it, assuming I'm able to write a sensible bug report.
