Benchmark statistics #493
Conversation
Yes, I've got something like that in the works. This is partly orthogonal to this PR, however.
The benchmark methodology may still need tuning. One issue faced here, which perf does not face, is that benchmarks that run later run under different CPU, system-load, and thermal conditions than benchmarks that run earlier. So the sampling should probably use multiple processes, and be done in A B C ... A' B' C' ... A'' B'' C'' ... order.
Interleaving benchmarks would surely be good; comparability across separate import benchmarks was something I was aiming for. The whole samples/repeats logic would probably need to be moved up to realize that, though (with a "sample" rather than a "benchmark" script?).
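The interleaved ordering discussed above can be sketched as a simple round-robin scheduler. This is a hypothetical illustration, not asv's actual code: `interleaved_schedule` is an invented name, and the point is only that every benchmark gets sampled once per round before any benchmark gets its second repeat, so slow drift in machine state affects all benchmarks roughly equally.

```python
def interleaved_schedule(benchmarks, repeats):
    """Yield (benchmark, repeat_index) pairs in round-robin order,
    i.e. A B C ... A' B' C' ... instead of A A' A'' B B' B'' ...
    (hypothetical sketch, not asv's scheduler)."""
    for r in range(repeats):
        for bench in benchmarks:
            yield bench, r

order = list(interleaved_schedule(["A", "B", "C"], 3))
# Each round touches every benchmark once before any repeats happen.
```

With back-to-back repeats, a thermal throttle kicking in halfway through the run penalizes only the later benchmarks; with this ordering it is spread over every benchmark's sample set.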
I think this is more or less finished now. Some fine-tuning may be useful later on. The changes in the benchmarking parameter default values may be annoying, but they should give better accuracy, and the default runtime also becomes shorter. Big improvements in accuracy will probably need things such as splitting the benchmark runs into several parts, run in an interleaved order, to ensure long sampling time spans that capture the low-frequency noise properly. https://github.com/pv/asv/tree/many-proc
… methodology Makes `goal_time` more accurate, and changes the default values for `repeat` and `goal_time`. Adds `warmup_time`.
Use a more uniform API format for parameterized and non-parameterized benchmark data (non-parameterized ~ parameterized with 1 combination). Move the API access to methods, to decouple the API from the file format. The accessors now also do the work of `compatible_results`, so it doesn't have to be done explicitly.
The number of repeats may be smaller if the `too_slow()` condition was encountered.
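The warm-up / goal-time / repeat interaction described above can be illustrated with a minimal timing loop. This is a sketch under assumptions, not asv's implementation: the parameter names echo asv's `warmup_time`, `goal_time`, and `repeat`, but `measure` and its "too slow" bail-out condition are invented for illustration.

```python
import time

def measure(func, warmup_time=0.1, goal_time=0.25, max_repeat=10):
    """Hypothetical sketch of a timing loop with a warm-up phase.
    Each sample calls `func` repeatedly until at least `goal_time`
    seconds have elapsed, then records the mean per-call time.
    Fewer than `max_repeat` samples may be taken if the benchmark
    turns out to be too slow."""
    # Warm-up: run without recording, to let caches etc. settle.
    start = time.perf_counter()
    while time.perf_counter() - start < warmup_time:
        func()

    samples = []
    for _ in range(max_repeat):
        t0 = time.perf_counter()
        n = 0
        while time.perf_counter() - t0 < goal_time:
            func()
            n += 1
        samples.append((time.perf_counter() - t0) / n)
        # "Too slow" bail-out (illustrative): a single call already
        # exceeds the per-sample budget, so stop sampling early.
        if samples[-1] > goal_time:
            break
    return samples
```

Shrinking the default `goal_time` while taking more samples is what shortens the default runtime yet still yields enough data points for the statistics.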
After some dogfooding, I think this is ready to go.
- Record more benchmark samples, and compute, display, and use statistics based on them.
- Changes the meaning of `goal_time`; makes the determination of `goal_time` more accurate. However, no big changes to methodology --- there's room to improve here.
- Changes to `asv compare`.
- Switch to gzipped files.

The statistical confidence estimates are a somewhat tricky point, because timing samples usually have strong autocorrelation (multimodality, stepwise changes in location, etc.), which often makes simple approaches misleading. There's some alleviation for this currently: it tries to regress the timing sample time series looking for steps, and adds those to the CI. Not rigorous, but probably better than nothing.
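To make the multimodality point concrete, here is a minimal sketch of a robust sample summary. This is not asv's algorithm (in particular it omits the step-detection regression mentioned above); `sample_stats` is an invented helper showing why a median plus a percentile interval is less misleading than mean ± std when the distribution is multimodal or drifts stepwise.

```python
import statistics

def sample_stats(samples, alpha=0.1):
    """Illustrative sketch (not asv's actual statistics): summarize
    timing samples with the median and a crude percentile interval.
    Order statistics are robust to multimodality and step changes,
    unlike mean +/- standard deviation."""
    samples = sorted(samples)
    n = len(samples)
    median = statistics.median(samples)
    lo = samples[int(alpha / 2 * (n - 1))]
    hi = samples[int((1 - alpha / 2) * (n - 1))]
    return median, (lo, hi)
```

Note this interval still understates the uncertainty when samples are autocorrelated, which is exactly why something like the step-regression correction is needed on top.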
The problem is that there's low-frequency noise in the measurement, so measuring for a couple of seconds does not give a good idea of the full distribution.
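The low-frequency-noise problem can be demonstrated with a toy simulation (illustrative only; the sinusoidal drift model is an assumption, not measured data): the spread seen inside a short contiguous window is far smaller than the spread over the whole run, so a brief sampling span systematically underestimates the true variability.

```python
import math
import statistics

# Toy model: per-call timings with a slow sinusoidal drift standing in
# for low-frequency machine noise (thermal state, background load).
n = 10_000
drift = [1.0 + 0.1 * math.sin(2 * math.pi * i / n) for i in range(n)]

short_window = statistics.pstdev(drift[:100])  # "a couple of seconds"
full_run = statistics.pstdev(drift)            # the whole time span
# The short window sees only a tiny slice of the slow oscillation,
# so its standard deviation is a small fraction of the full-run one.
```

This is the motivation for long, interleaved sampling spans: only a run long compared to the noise period samples the full distribution.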
todo: