Summary: using a simple (unweighted) mean to calculate drawtime_percent is not the natural interpretation of the statistic, and is actively misleading in the worst case.
I was investigating some of our slow tests, and the statistics implied that data generation was taking on the order of 5 minutes for the whole test:
- during generate phase (579.03 seconds):
- Typical runtimes: 6-18592 ms, ~ 50% in data generation
- 100 passing examples, 0 failing examples, 83 invalid examples
- Events:
* 7.65%, Retried draw from <hypothesis.strategies._internal.core.CompositeStrategy object at 0x7f3603c2b2b0>.filter(re.compile('s[0-9]{2}').fullmatch).filter(not_yet_in_unique_list) to satisfy filter
- Stopped because settings.max_examples=100
Replacing the test function body with a bare return yields something much more sensible:
- during generate phase (1.32 seconds):
- Typical runtimes: 1-8 ms, ~ 97% in data generation
- 90 passing examples, 0 failing examples, 84 invalid examples
- Events:
* 7.47%, Retried draw from <hypothesis.strategies._internal.core.CompositeStrategy object at 0x7f165d90d3a0>.filter(re.compile('s[0-9]{2}').fullmatch).filter(not_yet_in_unique_list) to satisfy filter
- Stopped because settings.max_examples=100
The calculation in statistics.py::L76-78 is an unweighted mean:
drawtime_percent = 100 * statistics.mean(
t["drawtime"] / t["runtime"] if t["runtime"] > 0 else 0 for t in cases
)
This isn't actually very informative when the time spent generating the input is mostly constant and the time spent in the rest of the test isn't. At worst it actually implies that I need to be optimizing the way I'm generating data when in fact I don't. Some alternatives would be:
- percentage of overall time spent in generation, e.g. 100 * sum(t["drawtime"] for t in cases) / sum(t["runtime"] for t in cases) [1]
- absolute time spent in generation, e.g. "1-8 ms in data generation"

[1] handling of the divide by zero elided :)
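A small worked example shows how far the two calculations can diverge. The timing data below is hypothetical, modeled on the report above: draw time is roughly constant (~5 ms per case) while one test case runs for ~18.6 s.

```python
import statistics

# Hypothetical per-case timings in seconds: 99 fast cases plus one very slow one.
cases = [{"drawtime": 0.005, "runtime": 0.010}] * 99
cases += [{"drawtime": 0.005, "runtime": 18.592}]

# Unweighted mean of per-case ratios, as in statistics.py::L76-78.
drawtime_percent = 100 * statistics.mean(
    t["drawtime"] / t["runtime"] if t["runtime"] > 0 else 0 for t in cases
)

# Aggregate alternative: share of total wall-clock time spent drawing.
total_runtime = sum(t["runtime"] for t in cases)
aggregate_percent = (
    100 * sum(t["drawtime"] for t in cases) / total_runtime if total_runtime > 0 else 0
)

print(f"mean of ratios:  ~{drawtime_percent:.0f}%")   # ~50%, dominated by the fast cases
print(f"aggregate ratio: ~{aggregate_percent:.1f}%")  # ~2.6% of actual wall-clock time
```

The unweighted mean reports ~50% "in data generation" even though drawing accounts for under 3% of the total time, because the one slow case gets the same weight as each fast one.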
andreareina changed the title from "timings in statistics not misleading wrt time spent generating data" to "timings in statistics misleading wrt time spent generating data" on Mar 17, 2023.
Thanks for raising this! I think a message like Typical runtimes: 6-18592 ms, of which x-y ms in data generation would be a big improvement over the status quo. Would you like to open a PR?