
Removal of Outliers #1310

Closed
luckerby opened this issue Nov 17, 2019 · 1 comment
@luckerby commented Nov 17, 2019

According to the "How it works" page here, the 'Result' entries in BDN's output will be the 'ActualWorkload' ones minus the computed overhead.
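For illustration only, here is a minimal sketch of that relationship. The numbers are made up, and whether BDN subtracts the mean or the median of the overhead iterations isn't stated in the quoted sentence, so the mean is assumed here:

```csharp
using System;
using System.Linq;

class OverheadSubtractionSketch
{
    static void Main()
    {
        // Hypothetical per-iteration measurements, in nanoseconds.
        double[] overhead       = { 2.1, 2.0, 2.2, 2.1 };
        double[] actualWorkload = { 102.3, 101.8, 103.0, 150.9 };

        // Assumption for this sketch: "Result" = ActualWorkload minus a single
        // overhead estimate (the mean of the overhead iterations is used here).
        double overheadEstimate = overhead.Average();
        double[] result = actualWorkload.Select(x => x - overheadEstimate).ToArray();

        Console.WriteLine(string.Join(", ", result.Select(r => r.ToString("F2"))));
    }
}
```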

Consider the following relevant output from a BDN run (full text output is here):
[screenshots of the BDN iteration output, with the 3 discarded results highlighted in red]

33 iterations are performed; of the results obtained from them, 3 are discarded (highlighted in red) and the remaining 30 are kept.

I don't really understand the criteria BDN uses to decide whether to keep or discard the result of an iteration (e.g., how does it decide to remove outliers?). The 3 values are clearly "big", but how far off must they be to be removed?

In the example given, values outside the confidence interval are kept as well, so that's not the criterion (besides, the confidence interval appears to be computed from the data that survived outlier removal, so it can't be the factor that removes the outliers in the first place).

I quickly went through Andrey's "Pro .NET Benchmarking" book again, thinking I might have missed this, but couldn't find the answer there either.

@AndreyAkinshin self-assigned this Dec 5, 2019
@AndreyAkinshin commented Dec 5, 2019

@luckerby Currently (BenchmarkDotNet v0.12.0), we use Tukey's fences to detect outliers; you can find the relevant source code here. Speaking of "Pro .NET Benchmarking", it's described in Chapter 4 "Statistics for Performance Engineers" ("Descriptive Statistics" -> "Outliers"). It's one of the most classic approaches with a very simple implementation, which is why I chose it in one of the first versions of BenchmarkDotNet.
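For readers who don't want to dig into the linked source, here is a minimal, self-contained sketch of Tukey's fences with the textbook multiplier k = 1.5; BDN's actual quartile estimator and constants may differ:

```csharp
using System;
using System.Linq;

static class TukeyFencesSketch
{
    // Tukey's fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers.
    // Usage: double[] kept = TukeyFencesSketch.RemoveOutliers(resultsNs);
    public static double[] RemoveOutliers(double[] values, double k = 1.5)
    {
        var sorted = values.OrderBy(x => x).ToArray();
        double q1 = Quantile(sorted, 0.25);
        double q3 = Quantile(sorted, 0.75);
        double iqr = q3 - q1;
        double lowerFence = q1 - k * iqr;
        double upperFence = q3 + k * iqr;
        return sorted.Where(x => x >= lowerFence && x <= upperFence).ToArray();
    }

    // Simple linear-interpolation quantile (one of several common definitions).
    private static double Quantile(double[] sorted, double p)
    {
        double pos = (sorted.Length - 1) * p;
        int lo = (int)Math.Floor(pos);
        int hi = (int)Math.Ceiling(pos);
        return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
    }
}
```

Anything below Q1 - k*IQR or above Q3 + k*IQR is dropped; the fences depend only on the quartiles of the measurements, not on the confidence interval, which is consistent with what you observed.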

Currently, I'm working on a better kind of summary table and an improved statistics engine. Among other things, I'm experimenting with different algorithms for outlier detection and will probably make some changes there. So, there is no guarantee that Tukey's fences will remain the default algorithm for outlier detection in future versions of BenchmarkDotNet.
