Improve linear regression method #324
Maybe we could use quantile regression.
You are absolutely correct. I would not expect the time of repeated runs of a benchmark to follow a true normal distribution. For one thing, no matter how many times I time it, I suspect I will never get a negative runtime! For another, as you point out, I would expect a fairly heavy right tail from every time the benchmark got interrupted by the OS and other such outlier-inducing bias. There is an additional assumption for linear regression that I think does not hold for our use case. It is called "homoscedasticity": it means that the uncertainty in the measurement is not related to N. In our case I suspect that the timing we get from running the benchmark once is much noisier than the timing we get running the benchmark 100 times.
For now let's ignore the point from the first paragraph, and assume that the benchmark time is normally distributed.
So when criterion runs its N=1 case we expect it to take:

$$t_1 \sim \mathcal{N}(\mu, \sigma^2)$$

For the N=2 case we expect it to take:

$$t_2 \sim \mathcal{N}(2\mu,\, 2\sigma^2)$$

In general we have:

$$t_N \sim \mathcal{N}(N\mu,\, N\sigma^2)$$

Your concern is moderated by the code that throws out the extreme outliers. My concern is moderated by using the bootstrap instead of the estimated variance. Therefore I suspect that correcting for these limitations will not have a significant impact on criterion's results, but the only way to be sure is to model it correctly and see if it is different! I would start by seeing if there is a way to express what we need using a "Generalized linear model". If we can't find the right way to express our needs in that framework, then this should all be pretty straightforward using Bayesian methods.
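For what it's worth, here is the implication spelled out (a sketch under the same simplified model, ignoring any per-measurement overhead): since $\operatorname{Var}(t_N) = N\sigma^2$, the minimum-variance linear fit is weighted least squares with weights $1/N$, and it collapses to total time divided by total iterations:

$$\hat{\mu} = \operatorname*{arg\,min}_{\mu} \sum_{N} \frac{(t_N - N\mu)^2}{N} \quad\Longrightarrow\quad \sum_{N} (t_N - N\hat{\mu}) = 0 \quad\Longrightarrow\quad \hat{\mu} = \frac{\sum_N t_N}{\sum_N N}$$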
@Eh2406 As far as I know, criterion doesn't throw out any outliers. It will only warn you. I have a forked branch of criterion which deletes outliers. (It can only work with …) I have also developed another version of criterion to test quantile regression. It performs surprisingly well.
Thank you, I did not realize it included the outliers! That graph looks promising! How hard would it be to show both regression lines? I would find that a useful way to see if the regressions are practically different.
FWIW, this paper has some ideas about how to deal with non-normally distributed data: https://htor.inf.ethz.ch/publications/img/hoefler-scientific-benchmarking.pdf

Quantile regression would be great to have included with criterion.
Hey, thanks for the suggestion. I've been super busy IRL lately so it might be a while before I can get to this. I'll definitely have to dig into quantile regression, I haven't heard of it before.
@gz Thanks for sharing the paper! It inspired me a lot!

@bheisler I have a modified version of criterion.rs which uses quantile regression (I have also developed a crate, langbardo, to solve quantile regression by solving linear constraints). If you can design a customizable Analysis/Report process, I can integrate it with criterion right now. The biggest problem is that quantile regression is much slower to compute than least squares (at least with the simplest algorithm), so very large resample counts are not acceptable.
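For anyone unfamiliar with it, quantile regression replaces squared error with the "pinball" loss. Below is a minimal, dependency-free sketch for a through-origin line (my own illustration, far less general than the linear-constraint approach langbardo takes): the loss is convex in the slope, so a plain ternary search finds the minimizer.

```rust
/// Pinball (check) loss for residual `u` at quantile `tau` (0 < tau < 1).
fn pinball(u: f64, tau: f64) -> f64 {
    if u >= 0.0 { tau * u } else { (tau - 1.0) * u }
}

/// Total pinball loss of slope `b` over the data.
fn loss(data: &[(f64, f64)], b: f64, tau: f64) -> f64 {
    data.iter().map(|&(x, y)| pinball(y - b * x, tau)).sum()
}

/// Fit the `tau`-quantile slope of y ≈ b * x by ternary search over [lo, hi].
/// The loss is convex and piecewise linear in `b`, so this converges.
fn quantile_slope(data: &[(f64, f64)], tau: f64, mut lo: f64, mut hi: f64) -> f64 {
    for _ in 0..200 {
        let m1 = lo + (hi - lo) / 3.0;
        let m2 = hi - (hi - lo) / 3.0;
        if loss(data, m1, tau) < loss(data, m2, tau) {
            hi = m2;
        } else {
            lo = m1;
        }
    }
    0.5 * (lo + hi)
}

fn main() {
    // (iterations, total time) pairs; the last run hit a large outlier.
    let data = [(1.0, 1.1), (2.0, 2.0), (3.0, 3.2), (4.0, 3.9), (5.0, 9.0)];
    // The median (tau = 0.5) slope is barely moved by the outlier.
    println!("median slope ≈ {:.3}", quantile_slope(&data, 0.5, 0.0, 10.0));
}
```

The median fit ($\tau = 0.5$) is what provides the robustness here: the outlier at $x = 5$ barely moves the slope, where a least-squares fit would be dragged upward by it.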
I think that's a bit overstating it: the OLS regression is an unbiased, consistent estimator of the mean with asymptotically good variance, regardless of the distributions involved. I do agree, however, that it could be improved.
OLS regression of triangularly sampled benchmarks (time running once, then time running twice, then time running thrice, etc. to total a triangular number of runs) has 3 times the variance of doing the same number of runs in one batch and dividing that time by the number of runs. (This is in the limit; many practical sizes end up closer to a factor of 2 overhead.) That's certainly not good, and we can do better, but at least the noise overhead doesn't grow unboundedly with the number of samples.

Criterion's model is that there are two unknown probability distributions: $R$, the time to run one iteration, and $E$, the per-measurement overhead and noise of a single data point. A data point taken at iteration count $n$ measures $y_n = \sum_{i=1}^{n} R_i + E_n$, where the $R_i$ and $E_n$ are independent draws from $R$ and $E$, and the quantity to estimate is $\mathbb{E}[R]$.

**Proof**

A linear predictor is specified as a set of coefficients $(c_n)$, one per data point, and estimates $\mathbb{E}[R]$ as $\sum_n c_n y_n$.
We assume that the predictor acts correctly on collinear data, that is, for all $a$ and $b$, if $y_n = an + b$ for every $n$, then the predictor returns the slope: $\sum_n c_n (an + b) = a$.
However, $\mathbb{E}[y_n] = \mathbb{E}[R]\,n + \mathbb{E}[E]$, which is exactly of this collinear form with $a = \mathbb{E}[R]$ and $b = \mathbb{E}[E]$, so $\mathbb{E}\!\left[\sum_n c_n y_n\right] = \sum_n c_n \left(\mathbb{E}[R]\,n + \mathbb{E}[E]\right) = \mathbb{E}[R]$: every such predictor is unbiased.
Since all samples are taken independently, the variance of a linear predictor for a fixed number of data points can be broken down with no covariance terms to a linear combination of the variance of $R$ and the variance of $E$:

$$\operatorname{Var}\left(\sum_n c_n y_n\right) = \operatorname{Var}(R) \sum_n n\, c_n^2 + \operatorname{Var}(E) \sum_n c_n^2$$
**Derivation**

The variance of an arbitrary linear predictor goes as $\sum_n c_n^2 \left(n \operatorname{Var}(R) + \operatorname{Var}(E)\right)$.
We also have the constraint that for all $a$ and $b$, $\sum_n c_n (an + b) = a$, or split into coefficients of $a$ and $b$:

$$\sum_n n\, c_n = 1, \qquad \sum_n c_n = 0$$
To minimize variance, we find a stationary point of the Lagrangian

$$\mathcal{L} = \sum_n c_n^2 \left(n \operatorname{Var}(R) + \operatorname{Var}(E)\right) - \lambda \left(\sum_n n\, c_n - 1\right) - \kappa \sum_n c_n,$$

which requires $2 c_n \left(n \operatorname{Var}(R) + \operatorname{Var}(E)\right) = \lambda n + \kappa$ for every $n$.
From the first (parameterized) equation, we can write each $c_n$ in terms of the multipliers:

$$c_n = \frac{\lambda n + \kappa}{2\left(n \operatorname{Var}(R) + \operatorname{Var}(E)\right)}$$
Plugging that into the last constraint,

$$\sum_n \frac{\lambda n + \kappa}{2\left(n \operatorname{Var}(R) + \operatorname{Var}(E)\right)} = 0,$$

which together with $\sum_n n\, c_n = 1$ pins down $\lambda$ and $\kappa$.
We will label the number of data points as $d$, so all of the sums above run over $n = 1, \dots, d$; solving the two constraint equations then gives the optimal coefficients in closed form.
With the optimal solution in hand, we can fairly compare methods. In all of the following, variances are for estimators built from the same total number of iterations, expressed in terms of $\operatorname{Var}(R)$ and $\operatorname{Var}(E)$.
**Variance Calculations**
Note that none of this analysis made any assumptions about normality or any other property of the distributions. It also only considers linear predictors that simply take a linear combination of the data points; I wouldn't be surprised if a nonlinear predictor could exploit the nonnegative support assumption to get better results, though I wouldn't expect massive improvement. To be clear, none of this invalidates the interest in quantile regression, which is qualitatively different. I only considered estimation of the mean.

TL;DR: OLS regression isn't bad at finding the mean, and it could be improved up to a factor of 1.73 (and no more) in standard deviation while keeping minimal assumptions. The "additional statistics" that come from treating data points as samples (dividing by iteration count) are biased and converge particularly slowly. Measuring medians and quantiles with quantile regression could be more practically useful in most cases, but means are useful for e.g. throughput.
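To sanity-check the triangular-sampling overhead empirically, here is a small self-contained simulation (my own sketch, not Criterion's code; the exponential runtime distribution, the zero per-measurement overhead, and the dependency-free RNG are all arbitrary choices). For reasonably large $d$ it should print a variance ratio a bit above 3.

```rust
struct XorShift(u64);

impl XorShift {
    /// Uniform in [0, 1) via xorshift64.
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Exponential(mean = 1): a crude stand-in for a nonnegative,
    /// right-skewed per-iteration runtime distribution.
    fn next_runtime(&mut self) -> f64 {
        -(1.0 - self.next_f64()).ln()
    }
}

/// Ordinary least-squares slope of y on x (with intercept).
fn ols_slope(points: &[(f64, f64)]) -> f64 {
    let n = points.len() as f64;
    let (mut sx, mut sy, mut sxx, mut sxy) = (0.0, 0.0, 0.0, 0.0);
    for &(x, y) in points {
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }
    (n * sxy - sx * sy) / (n * sxx - sx * sx)
}

fn main() {
    let d: u64 = 50; // number of triangularly sampled data points
    let trials = 20_000;
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let (mut ols, mut batch) = (Vec::new(), Vec::new());
    for _ in 0..trials {
        // Triangular sampling: one data point at each of 1..=d iterations.
        let points: Vec<(f64, f64)> = (1..=d)
            .map(|n| (n as f64, (0..n).map(|_| rng.next_runtime()).sum()))
            .collect();
        ols.push(ols_slope(&points));
        // Batch estimator: the same total iterations in a single measurement.
        let total = d * (d + 1) / 2;
        let sum: f64 = (0..total).map(|_| rng.next_runtime()).sum();
        batch.push(sum / total as f64);
    }
    let var = |v: &[f64]| {
        let m = v.iter().sum::<f64>() / v.len() as f64;
        v.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (v.len() - 1) as f64
    };
    println!("var(OLS) / var(batch) = {:.2}", var(&ols) / var(&batch));
}
```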
Has someone verified the proof?

Is there another resource that describes the model of Criterion?
I did some searching of robust regression methods, and one in particular seems to fit Criterion's regression needs: M-estimation. It is much more vulnerable to outliers in the explanatory variable and thus has fallen out of favor as a general robust regression method, but such outliers are extremely unlikely under criterion's model of timing a predetermined number of iterations at a time.
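To make the suggestion concrete, here is a minimal sketch of M-estimation via iteratively reweighted least squares for a through-origin line (my own illustration, not Criterion code; $k = 1.345$ is the conventional Huber tuning constant, and the MAD scale estimate is one common choice).

```rust
/// Median of a slice (helper for the MAD scale estimate).
fn median(v: &mut [f64]) -> f64 {
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = v.len();
    if n % 2 == 1 {
        v[n / 2]
    } else {
        0.5 * (v[n / 2 - 1] + v[n / 2])
    }
}

/// Huber M-estimate of the slope of a through-origin line, via IRLS.
fn huber_slope(data: &[(f64, f64)], k: f64, iters: usize) -> f64 {
    // Start from the ordinary through-origin least-squares slope.
    let mut b = data.iter().map(|&(x, y)| x * y).sum::<f64>()
        / data.iter().map(|&(x, _)| x * x).sum::<f64>();
    for _ in 0..iters {
        // Robust scale: median absolute deviation of the residuals.
        let mut abs_res: Vec<f64> =
            data.iter().map(|&(x, y)| (y - b * x).abs()).collect();
        let s = median(&mut abs_res).max(1e-12);
        // Huber weights: full weight for small residuals, downweighted otherwise.
        let (mut num, mut den) = (0.0, 0.0);
        for &(x, y) in data {
            let r = (y - b * x).abs();
            let w = if r <= k * s { 1.0 } else { k * s / r };
            num += w * x * y;
            den += w * x * x;
        }
        b = num / den;
    }
    b
}

fn main() {
    // One gross outlier at x = 5; the Huber fit mostly ignores it.
    let data = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 12.0)];
    println!("Huber slope ≈ {:.3}", huber_slope(&data, 1.345, 25));
}
```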
@gereeter Thanks for your proof. I realized that even with OLS, the unbiasedness of the slope can also be derived from the i.i.d. assumption, which means that if all the bias terms have the same distribution (even if their expected value is not zero), the estimate of the slope is still unbiased. In your derivation of …
In criterion, least-squares fitting is used to perform linear regression. However, the "white noise" assumption cannot be made, as the expected value of the bias distribution is not zero.
In most benchmark cases, our function will use the CPU at full speed, and bias in this situation will lead to a slower speed but not a higher one. So the bias follows a skewed distribution, which means the method of linear regression could be improved.
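As a quick worked check of that claim (a sketch assuming the bias terms are i.i.d. across data points, as also pointed out in the comments above): a nonzero-mean error shifts only the intercept of the fitted line, not the slope.

$$y_n = \beta n + \varepsilon_n, \quad \mathbb{E}[\varepsilon_n] = \delta \neq 0 \;\Longrightarrow\; \mathbb{E}[y_n] = \beta n + \delta,$$

so an OLS fit that includes an intercept still satisfies $\mathbb{E}[\hat{\beta}] = \beta$; the skewness hurts the variance and outlier sensitivity of the estimate rather than its bias.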
But I haven't found any information or papers about this (though I think there should be some). I will do more research on this field over the coming days.