When attempting to collect benchmarking data, it occasionally happens that system noise makes a few of the iterations take much longer than the others. If one iteration out of a hundred is ten times slower than the rest, it inflates the average by roughly 9% and plays havoc with the standard deviation. Even if only one iteration out of a thousand does this, the standard deviation becomes dominated by the outlier, even though the data is meaningful once these outliers are excluded.
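As a quick sanity check of the arithmetic, here is a sketch using Python's standard `statistics` module with hypothetical timings in arbitrary units: one 10x outlier per thousand iterations barely moves the mean, but blows up the standard deviation.

```python
import statistics

# 999 "clean" iterations at 1.0 time unit, plus one 10x outlier.
samples = [1.0] * 999 + [10.0]

mean = statistics.mean(samples)    # ~1.009: the average barely moves...
stdev = statistics.stdev(samples)  # ~0.28: ...but the deviation explodes
print(mean, stdev)
```

With no outlier the standard deviation here would be exactly zero, so a single noisy iteration accounts for essentially all of the reported spread.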
I would like the ability to instruct Criterion to omit such outliers from its calculations. Even something as simple as removing the best and worst 10% of samples would often be sufficient.
Of course this could be abused (e.g., the standard deviation doesn't mean much if you remove the best and worst 49% of samples), but for benchmarking things like CPU times of code that does no IO or external computation it can be very useful: benchmark numbers often need to reflect the performance of the code being benchmarked rather than whatever system noise happened to kick in at random.
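The proposed trimming is easy to sketch. The helper below (a hypothetical illustration, not part of Criterion) drops the lowest and highest 10% of samples before computing statistics:

```python
import statistics

def trimmed(samples, fraction=0.10):
    """Return the samples with the lowest and highest `fraction` removed."""
    s = sorted(samples)
    k = int(len(s) * fraction)
    return s[k:len(s) - k] if k else s

samples = [1.0] * 98 + [9.5, 10.0]        # two noisy outliers out of 100
print(statistics.mean(samples))           # 1.175 — inflated by the outliers
print(statistics.mean(trimmed(samples)))  # 1.0   — outliers trimmed away
```

Note the trade-off the paragraph above warns about: trimming 10% per side here also discards eighteen perfectly good samples along with the two outliers.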
Possible extensions of this idea include reporting the median and/or mode. Another possibility is to fit the sampling data to some sort of distribution that is flexible enough to account for system noise (e.g. a Poisson, bimodal(*), or mixture-based distribution) and then report the parameters of that distribution (e.g., the location of the peak or peaks of the bimodal distribution rather than just the mean).
(*) The second peak represents when the system noise kicks in.
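Fitting a full mixture model would need something like EM, but the median suggestion alone can be sketched in a few lines; with hypothetical bimodal timings (main peak at 1.0, noise peak at 10.0), the median sits on the main peak while the mean is pulled toward the noise:

```python
import statistics

# Hypothetical bimodal timings: 97 samples on the main peak at 1.0,
# 3 samples on the system-noise peak at 10.0.
samples = [1.0] * 97 + [10.0] * 3

print(statistics.median(samples))  # 1.0  — ignores the noise peak entirely
print(statistics.mode(samples))    # 1.0  — likewise
print(statistics.mean(samples))    # 1.27 — pulled toward the noise peak
```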
+1. I would have thought that criterion already does this; it seems a fairly standard technique.
Rejecting outliers from a sample is not statistically sound when the distribution being sampled is not understood. Enough of the machinery should be exposed that removing outliers yourself is easy, but this is not something I plan to add to criterion itself.