use legitimate non-iid hypothesis testing #74

jrevels · 2017-09-05T15:21:07Z

It's unlikely I'll get around to doing this in the foreseeable future, but I'm tired of digging through issues to find this comment when I want to link it in other discussions. Recreated from my comment here:

Robust hypothesis testing is quite tricky to do correctly in the realm of non-i.i.d. statistics, which is the world benchmark timings generally live in. If you do the "usual calculations" (the ones that assume i.i.d. samples), you'll end up with junk results a lot of the time.
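To make the "junk results" point concrete, here is a small illustrative sketch. It is not from the original discussion. It assumes a simulated AR(1) series as a hypothetical stand-in for autocorrelated timings, and it compares the i.i.d. standard error of the mean with a batch-means estimate, which tolerates short-range dependence. The helper names `ar1_series`, `naive_se` and `batch_means_se` are hypothetical.

```python
import math
import random
import statistics

def ar1_series(n, phi, seed=0):
    """Hypothetical stand-in for timings: stationary AR(1), x[t] = phi*x[t-1] + e[t]."""
    rng = random.Random(seed)
    # Draw the starting value from the stationary distribution, variance 1/(1 - phi^2).
    x = rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi * phi)
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def naive_se(xs):
    """Standard error of the mean under the i.i.d. assumption (wrong for dependent data)."""
    return statistics.stdev(xs) / math.sqrt(len(xs))

def batch_means_se(xs, batch):
    """Standard error of the mean from non-overlapping batch means.

    Roughly valid under short-range dependence, provided `batch` is much
    longer than the correlation length of the series.
    """
    k = len(xs) // batch
    means = [statistics.fmean(xs[i * batch:(i + 1) * batch]) for i in range(k)]
    return statistics.stdev(means) / math.sqrt(k)

if __name__ == "__main__":
    xs = ar1_series(50_000, phi=0.9)
    print("naive SE:      ", naive_se(xs))
    print("batch-means SE:", batch_means_se(xs, batch=500))
```

For an AR(1) process the true standard error exceeds the i.i.d. one by a factor of about sqrt((1 + phi) / (1 - phi)), roughly 4.4 for phi = 0.9. A p-value or confidence interval built on the naive figure is therefore far too optimistic.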

A while ago, I developed a working prototype of a subsampling method for calculating p-values (which could be modified to compute confidence intervals), but it relies on getting the correct normalization coefficient for the test statistic and timing distribution, which is unique to each benchmark. IIRC, it worked decently on my test benchmark data, but only if I manually tuned the normalization coefficient for each benchmark. There are methods out there for estimating this coefficient automatically, but I never got around to implementing them. For reference, see Politis and Romano's book "Subsampling" (specifically section 8: "Subsampling with Unknown Convergence Rate").
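The following is a minimal sketch of the general idea. It is not the prototype mentioned above. The sketch makes several assumptions:

- the test statistic is the median of the timings;
- the normalization is assumed to take the form tau_n = n**beta;
- subsamples are contiguous blocks of length `b`, so the serial dependence is preserved.

Under these assumptions, the subsampling p-value for H0: theta = theta0 is the fraction of normalized subsample roots b**beta * |theta_b,t - theta_n| that are at least as large as the observed n**beta * |theta_n - theta0|. In the spirit of the unknown-rate setting, `beta` is estimated by regressing the log interquartile range of the unnormalized subsample roots on log b, since that spread should shrink like b**(-beta). The function names, the choice of block sizes, and the `step` thinning parameter are all hypothetical.

```python
import math
import statistics

def subsample_roots(xs, b, stat=statistics.median, step=1):
    """Unnormalized roots: stat on each contiguous block of length b, minus stat on the full series."""
    theta_n = stat(xs)
    # Contiguous (overlapping) blocks keep the time-series dependence intact;
    # step > 1 uses only every step-th block to save time.
    return [stat(xs[t:t + b]) - theta_n for t in range(0, len(xs) - b + 1, step)]

def estimate_rate(xs, block_sizes, stat=statistics.median, step=1):
    """Estimate beta in tau_n = n**beta from how the root spread shrinks with b.

    Assumes log(IQR of roots at size b) ~ const - beta * log(b), and fits
    that line by ordinary least squares over the given block sizes.
    """
    log_b, log_r = [], []
    for b in block_sizes:
        q1, _, q3 = statistics.quantiles(subsample_roots(xs, b, stat, step), n=4)
        log_b.append(math.log(b))
        log_r.append(math.log(q3 - q1))
    mb, mr = statistics.fmean(log_b), statistics.fmean(log_r)
    slope = sum((u - mb) * (v - mr) for u, v in zip(log_b, log_r)) / sum(
        (u - mb) ** 2 for u in log_b
    )
    return -slope

def subsampling_pvalue(xs, theta0, b, beta, stat=statistics.median, step=1):
    """Two-sided subsampling p-value for H0: theta == theta0, with tau_n = n**beta."""
    n = len(xs)
    observed = n ** beta * abs(stat(xs) - theta0)
    roots = [b ** beta * abs(r) for r in subsample_roots(xs, b, stat, step)]
    # Fraction of normalized subsample roots at least as extreme as the observed statistic.
    return sum(r >= observed for r in roots) / len(roots)
```

For a series of timings `xs`, usage might look like `beta = estimate_rate(xs, [50, 100, 200, 400, 800], step=10)` followed by `subsampling_pvalue(xs, theta0, b=200, beta=beta, step=10)`. The block size `b` still has to be chosen, and the result can be sensitive to it, which is part of why this is hard to automate robustly.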

ararslan self-assigned this Dec 23, 2017

jrevels mentioned this issue Oct 11, 2020: Why do execution times get sorted? #179 (Closed)

gdalle added the enhancement label Jun 13, 2023
