Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmark statistics #493

Merged
merged 13 commits into from
Feb 18, 2017
Merged

Benchmark statistics #493

merged 13 commits into from
Feb 18, 2017

Conversation

pv
Copy link
Collaborator

@pv pv commented Dec 9, 2016

Record more benchmark samples, and compute, display, and use statistics based on them.

  • Change default settings to record more (but shorter) samples. Changes meaning of goal_time Makes determination of goal_time more accurate; however, no big changes to methodology --- there's room to improve here.
  • Do warmup before benchmark --- this seems to matter also on CPython (although possibly not due to CPython itself but some OS/CPU effects)
  • Display spread of measurements (median +/- half of interquartile range)
  • Estimate confidence interval and use that to decide what is not significant in asv compare.
  • Optionally, save samples to the json files.
  • Switch to gzipped files.

The statistical confidence estimates are a somewhat tricky point, because timing samples usually have strong autocorrelation (multimodality, stepwise changes in location, etc.), which makes simple stuff often misleading. There's some alleviation for this currently there in that it tries to regress the timing sample time series looking for steps, and adds those to the CI. Not rigorous, but probably better than nothing.

The problem is that there's low-frequency noise in the measurement, so measuring for a couple of second does not give a good idea of the full distribution.

todo:

  • more tests, e.g. printing and formatting is mostly untested
  • do we really need to gzip?
  • update documentation
  • tuning for pypy

@coveralls
Copy link

Coverage Status

Coverage increased (+0.7%) to 72.988% when pulling f953513 on pv:timing-estimator6 into cfe3d3e on spacetelescope:master.

@pv pv force-pushed the timing-estimator6 branch 2 times, most recently from 6500cb3 to bd4ddca Compare December 9, 2016 23:37
@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 72.971% when pulling bd4ddca on pv:timing-estimator6 into cfe3d3e on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 72.971% when pulling bd4ddca on pv:timing-estimator6 into cfe3d3e on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.6%) to 72.941% when pulling 364e0ed on pv:timing-estimator6 into cfe3d3e on spacetelescope:master.

@wrwrwr
Copy link
Contributor

wrwrwr commented Dec 12, 2016

Changing goal_time to mark a single repeat time is nice, but do I read it right that setting repeat = 5 will take 5 samples if number is fixed and 50 samples if it is left at zero? Why not make the number of samples equal to repeat in the adaptive case too (with a single setup per sample and a higher default)?

@pv
Copy link
Collaborator Author

pv commented Dec 12, 2016 via email

@wrwrwr
Copy link
Contributor

wrwrwr commented Dec 12, 2016

Interleaving benchmarks would surely be good, comparability across separate import benchmarks was something I was aiming for. The whole samples/repeats logic would probably need to be moved up to realize that though (with a "sample" rather than a "benchmark" script?).

@pv pv force-pushed the timing-estimator6 branch 2 times, most recently from 76012d6 to 8c41b73 Compare December 16, 2016 22:38
@coveralls
Copy link

Coverage Status

Coverage increased (+0.7%) to 73.009% when pulling 8c41b73 on pv:timing-estimator6 into 390145a on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.7%) to 72.991% when pulling 8c8fbe0 on pv:timing-estimator6 into 390145a on spacetelescope:master.

@pv pv changed the title WIP: Benchmark statistics Benchmark statistics Dec 17, 2016
@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) to 73.191% when pulling 2c5b78e on pv:timing-estimator6 into 390145a on spacetelescope:master.

@pv
Copy link
Collaborator Author

pv commented Dec 17, 2016

I think this is more or less finished now. Some fine-tuning may be useful to do later on.

The changes in benchmarking parameter default values may be annoying, but these should give better accuracy. The default runtime also becomes shorter. Big improvements on the accuracy probably will need stuff such as splitting the benchmark runs into several parts, to be run in an interleaved order to ensure long sampling time spans to capture the low-frequency noise properly. https://github.com/pv/asv/tree/many-proc

@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) to 73.203% when pulling 26a42ba on pv:timing-estimator6 into 390145a on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) to 73.187% when pulling fc2045d on pv:timing-estimator6 into 390145a on spacetelescope:master.

… methodology

Makes `goal_time` to be more accurate, and changes the default values
for `repeat` and `goal_time`.

Adds `warmup_time`.
Use a more uniform API format for parameterized and non-parameterized
benchmark data (non-parameterized ~ parameterized with 1 combination).

Move the API access to methods, to decouple the API from the file
format. Also, the accessors now do the work of compatible_results,
so that it doesn't have to be done explicitly.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) to 73.233% when pulling 347bb53 on pv:timing-estimator6 into e581104 on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) to 73.253% when pulling 50ddeb8 on pv:timing-estimator6 into e581104 on spacetelescope:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.9%) to 73.268% when pulling 10fdd49 on pv:timing-estimator6 into e581104 on spacetelescope:master.

@pv pv merged commit f3d7935 into airspeed-velocity:master Feb 18, 2017
@pv
Copy link
Collaborator Author

pv commented Feb 18, 2017

After some dogfooding, I think this is ready to go.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants