timings in statistics misleading wrt time spent generating data #3598

Closed
andreareina opened this issue Mar 17, 2023 · 2 comments · Fixed by #3605
Labels
legibility make errors helpful and Hypothesis grokable

Comments

andreareina commented Mar 17, 2023

Summary: the simple (unweighted) mean used to calculate drawtime_percent is not the natural interpretation of the statistic, and in the worst case it is actively misleading.

I was investigating some of our slow tests and saw output implying that data generation was taking on the order of 5 minutes across the whole test:

  - during generate phase (579.03 seconds):
    - Typical runtimes: 6-18592 ms, ~ 50% in data generation
    - 100 passing examples, 0 failing examples, 83 invalid examples
    - Events:
      * 7.65%, Retried draw from <hypothesis.strategies._internal.core.CompositeStrategy object at 0x7f3603c2b2b0>.filter(re.compile('s[0-9]{2}').fullmatch).filter(not_yet_in_unique_list) to satisfy filter

  - Stopped because settings.max_examples=100

Replacing the test function body with a bare return yields something much more sensible:

  - during generate phase (1.32 seconds):
    - Typical runtimes: 1-8 ms, ~ 97% in data generation
    - 90 passing examples, 0 failing examples, 84 invalid examples
    - Events:
      * 7.47%, Retried draw from <hypothesis.strategies._internal.core.CompositeStrategy object at 0x7f165d90d3a0>.filter(re.compile('s[0-9]{2}').fullmatch).filter(not_yet_in_unique_list) to satisfy filter

  - Stopped because settings.max_examples=100

The calculation in statistics.py::L76-78 is an unweighted mean:

        drawtime_percent = 100 * statistics.mean(
            t["drawtime"] / t["runtime"] if t["runtime"] > 0 else 0 for t in cases
        )

This isn't very informative when the time spent generating the input is mostly constant but the time spent in the rest of the test isn't. At worst it implies that I need to optimize the way I'm generating data when in fact I don't. Some alternatives (compared in the toy sketch after the footnote below) would be:

  • percentage of overall time spent in generation, e.g. 100 * sum(t["drawtime"] for t in cases) / sum(t["runtime"] for t in cases)[1]
  • absolute time spent in generation, e.g. "1-8 ms in data generation"

[1] handling of the divide by zero elided :)
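
To make the distortion concrete, here is a toy comparison (made-up timings, illustrative only) of the current unweighted mean against the weighted alternative from the first bullet:

    import statistics

    # Made-up per-example timings: draw time is roughly constant (~5 ms),
    # but one example hits a pathologically slow test body (~18.6 s).
    cases = [
        {"drawtime": 0.005, "runtime": 0.010},
        {"drawtime": 0.005, "runtime": 0.012},
        {"drawtime": 0.006, "runtime": 18.592},
    ]

    # Current behaviour: unweighted mean of per-example ratios, dominated
    # by the fast examples where drawing is ~50% of the runtime.
    unweighted = 100 * statistics.mean(
        t["drawtime"] / t["runtime"] if t["runtime"] > 0 else 0 for t in cases
    )

    # Alternative: share of *total* time spent in generation.
    total_runtime = sum(t["runtime"] for t in cases)
    weighted = (
        100 * sum(t["drawtime"] for t in cases) / total_runtime
        if total_runtime > 0
        else 0
    )

    print(f"unweighted: {unweighted:.1f}%")  # ~30.6% -- "half my time is drawing!"
    print(f"weighted:   {weighted:.2f}%")    # ~0.09% -- drawing is negligible

The unweighted mean reports that roughly a third of the time goes to data generation, even though generation accounts for well under a percent of the total wall-clock time.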

platform linux -- Python 3.8.11, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/andrea/omnistream/cpsat-pogger, configfile: pyproject.toml
plugins: anyio-3.6.1, hypothesis-6.70.0, cov-2.12.1, flaky-3.7.0, split-0.8.0
Zac-HD (Member) commented Mar 27, 2023

Thanks for raising this! I think a message like "Typical runtimes: 6-18592 ms, of which x-y ms in data generation" would be a big improvement over the status quo. Would you like to open a PR?
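
A rough sketch of how that message could be produced from per-example timings (illustrative only, reusing the made-up cases data from the sketch above; this is not the actual patch):

    cases = [
        {"drawtime": 0.005, "runtime": 0.010},
        {"drawtime": 0.005, "runtime": 0.012},
        {"drawtime": 0.006, "runtime": 18.592},
    ]

    # Report the min-max range of runtimes and draw times in milliseconds.
    runtimes_ms = sorted(1000 * t["runtime"] for t in cases)
    drawtimes_ms = sorted(1000 * t["drawtime"] for t in cases)
    print(
        f"Typical runtimes: {runtimes_ms[0]:.0f}-{runtimes_ms[-1]:.0f} ms, "
        f"of which {drawtimes_ms[0]:.0f}-{drawtimes_ms[-1]:.0f} ms in data generation"
    )
    # -> Typical runtimes: 10-18592 ms, of which 5-6 ms in data generation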

The code in question is the statistics.py snippet quoted above, and CONTRIBUTING.rst has instructions - in this case I'd open a PR, let CI point out which tests need changing, and then fix them 🙂

andreareina (Author) commented

Will do, thanks.
