BenchmarkDotNet as a performance tests runner. #155
Comments
This is great, thanks so much for doing this. I think it'll be a great addition to BenchmarkDotNet. I need more time to look at it deeply, but on first glance it looks great! With regards to:
and
We already have this, if you enable the GC diagnoser. Or does this not meet your needs? |
@mattwarren, I'm not sure that the GC diagnoser will work if the benchmark is run in-process. If it does, or if it's easy to adapt the diagnoser for this (very specific, in my opinion) scenario, then it's definitely the way to go! Anyway, I want to wait until all the other parts are more or less stable, just to prevent myself from doing useless work :) |
@ig-sinicyn I was just wondering how this was going? Do you need any help from any of us, or is it just a case of finding the time to do it (like we all seem to be struggling with!!) |
@mattwarren As far as I can see, there's no production-grade perf-testing suite for .NET. So sometimes I had to reinvent the wheel, drawing on our experience in perf-testing. Try-fix-repeat all the way down :) Current results are quite promising. Most tests run in 5-7 seconds, they are accurate enough, and there's a lot of diagnostics to prevent some typical errors, e.g. environment validation and the ability to rerun the tests to detect 'on-the-edge' limits (cases when a test occasionally does not fit into its limits). For example, the output for the case "the perftest failed because it was run under x86, not x64" looks like this:
The output should be changed to something more readable, but at least it works :) |
Yeah that's probably true, the only one I've seen is NBench, but it's pretty new. Being able to do this with BenchmarkDotNet would be great!
BTW That output looks fantastic, just the sort of thing you'd want to see. |
@mattwarren
```csharp
[PerfBenchmark(Description = "Test to ensure that a minimal throughput test can be rapidly executed.",
    NumberOfIterations = 3, RunMode = RunMode.Throughput,
    RunTimeMilliseconds = 1000, TestMode = TestMode.Test)]
[CounterThroughputAssertion("TestCounter", MustBe.GreaterThan, 10000000.0d)]
[MemoryAssertion(MemoryMetric.TotalBytesAllocated, MustBe.LessThanOrEqualTo, ByteConstants.ThirtyTwoKb)]
[GcTotalAssertion(GcMetric.TotalCollections, GcGeneration.Gen2, MustBe.ExactlyEqualTo, 0.0d)]
public void Benchmark()
{
    _counter.Increment();
}
```
I'm pretty sure that attribute annotations can be made less verbose. At least I will try hard to do it :) |
@ig-sinicyn is there anything left for us to do to help you with this task? |
@adamsitnik Actually, no. Thanks for asking! :) The code for the first version is complete (by complete I mean ready-to-ship code quality); it has been running in dogfooding for the last two weeks without a problem. The last feature, app.config support, is ready but not yet pushed into the repo; that will be done tomorrow after code review. After that I'll create beta nuget packages (will post here), complete the docs and samples, and create a request-for-comments thread here and on the RSDN forum. Two main issues for now:
|
I'm glad to hear that! Can't wait to start using it as well! |
What is the status of this issue? Any chance of being able to use it soon, and if so, do you have an example setup to look at? This sounds almost exactly like what we are looking for. We'd like to integrate it with our TeamCity build server, which should work fine as long as the results can be accessed. |
Pretty much the same :( Good news: we have had it in dogfooding since July, with no major issues discovered yet. Bad news: until now I had no free time (literally) to finish it. Today my team finally shipped the major version of the product we were working on, and I hope I'll have more time for my pet projects. If you do not want to wait, feel free to grab the sources from here. Note that the code may not include some fixes & updates, and there'll be breaking changes after upgrading to Bench.Net 0.10. As an example, there are tests running on AppVeyor (search for |
@ig-sinicyn what is blocking you? Do you have some features unfinished, or is it something else? Maybe we could somehow help you? |
@adamsitnik no blockers, actually. The GitHub version produces slightly unstable results when run on AppVeyor / low-end notebooks, and I want to wait for the 0.10 release before backporting. |
Is there any update on this? |
@Vannevelj yep. It works and is (almost) stable. The sad part is, I'm very busy this year and there's a lot of work to do before making a public announcement. Most of the TODOs are related to documentation and samples, so I may release a public beta if you're interested in it. |
@ig-sinicyn Definitely interested so if you find the time to do it, that would be great. |
@Vannevelj Intro: https://github.com/rsdn/CodeJam/blob/master/PerfTests/docs/Intro.md
Small teaser. The code:
```csharp
// A perf test class.
[Category("PerfTests: NUnit examples")]
[CompetitionAnnotateSources] // Opt-in feature: source annotations.
[CompetitionBurstMode] // Use this for large-loop benchmarks.
public class SimplePerfTest
{
    private const int Count = CompetitionRunHelpers.BurstModeLoopCount;

    // Perf test runner method.
    [Test]
    public void RunSimplePerfTest() => Competition.Run(this);

    // Baseline competition member.
    // All relative metrics will be compared with metrics of the baseline method.
    [CompetitionBaseline]
    public void Baseline() => Thread.SpinWait(Count);

    // Competition member #1. Should take ~3x more time to run.
    [CompetitionBenchmark]
    public void SlowerX3() => Thread.SpinWait(3 * Count);

    // Competition member #2. Should take ~5x more time to run.
    [CompetitionBenchmark]
    public void SlowerX5() => Thread.SpinWait(5 * Count);

    // Competition member #3. Should take ~7x more time to run.
    [CompetitionBenchmark]
    public void SlowerX7() => Thread.SpinWait(7 * Count);
}
```
The first run (should take ~20 seconds) annotates the sources with the measured limits:
```csharp
// Baseline competition member.
// All relative metrics will be compared with metrics of the baseline method.
[CompetitionBaseline]
[GcAllocations(0)]
public void Baseline() => Thread.SpinWait(Count);

// Competition member #1. Should take ~3x more time to run.
[CompetitionBenchmark(2.76, 3.28)]
[GcAllocations(0)]
public void SlowerX3() => Thread.SpinWait(3 * Count);

// Competition member #2. Should take ~5x more time to run.
[CompetitionBenchmark(4.74, 5.73)]
[GcAllocations(0)]
public void SlowerX5() => Thread.SpinWait(5 * Count);

// Competition member #3. Should take ~7x more time to run.
[CompetitionBenchmark(6.72, 7.34)]
[GcAllocations(0)]
public void SlowerX7() => Thread.SpinWait(7 * Count);
```
Then, comment out the
P.S. Please note this is the first public beta and there may be glitches here and there. Feel free to file an issue or ask for help in our gitter chat if you catch one :) |
Any updates on this? |
@bonesoul, still working on this. I guess that the first version of the performance testing API will be available in March. |
Has there been progress on this since February? @bonesoul ? |
@AndreyAkinshin @ig-sinicyn I've not seen anything announcing a performance testing API for BenchmarkDotNet though you said one would be released (possibly in March 2018). Is there any progress on this or some place to help make this a reality? |
I would also like to use BenchmarkDotNet with NUnit, similar to what NBench provides. As NBench is not keeping up with NUnit progress, I was hoping to find that BenchmarkDotNet would be ;-) |
Hey everyone! I'm sorry that this feature is taking so much time. Unfortunately, it's not so easy to implement a reliable performance runner that works with different kinds of benchmarks. Such a system should have an extremely low false-positive rate (if we get too many false alarms, the performance tests become untrustworthy); a low false-negative rate (if we skip most of the performance degradations, the system becomes useless); execute as few iterations as possible (in the case of macrobenchmarks that take minutes, it doesn't make sense to execute 15-30 iterations each time); and work with different kinds of distributions (including multimodal distributions with huge variance and extremely high outliers). |
@AndreyAkinshin Which JB tool will this surface in? |
@AndreyAkinshin It'd make an awesome Xmas present ;). If there's anything we can do to help, let us know. I have a TeamCity configuration where I've (also) been unsuccessful at creating reliable perf unit tests (I'm now just charting the results: if they go up, it's bad; if they go down, it's good; but that's certainly not suitable as a performance unit test). I understand this is complex and a lot of work, but if you need help weeding out some bugs, maybe we can set up a feature branch and go from there for a while until it is considered stable enough to include in BDN? |
Hello folks, just doing a casual ping. {Insert some relaxing joke about Christmas presents} :) |
@abelbraaksma @ndrwrbgs sorry for the delay, the performance runner is still in progress. I take important steps forward every month, but it still does not work as well as I would like. |
@AndreyAkinshin, that's good to hear! Is there something we can do to help? Maybe run it against our own test sets? |
@abelbraaksma it will be much appreciated as soon as I finish the approach that works fine on my set of test cases. |
watching this also 👍 |
@AndreyAkinshin @ig-sinicyn just wondering how this has come along? I played with https://github.com/rsdn/CodeJam/tree/master/PerfTests%5BWIP%5D but unfortunately I need a .NET Core version. Well, actually something that also targets .NET Standard 2.0, as we are still on .NET Core 2.x |
@natiki currently, I'm working on some mathematical approaches that should help to make the future performance runner reliable. You can find some of my recent results in my blog: |
Last I touched this, I was willing to handle the statistical analysis myself; the blocker was that BenchmarkDotNet wouldn't let me run in a unit test context. It seems from your results that you've already removed that blocker and are trying to put shipping it behind a full-on out-of-the-box polish. Is it possible to expose the "run inside a unit test context" functionality as-is to get more hands into the kitchen, so to speak, of making a reliable statistical analysis of the results? |
@ndrwrbgs, sorry, but the reliable statistical analysis is still in progress ("reliable" is the hardest part). Meanwhile, I continue publishing blog posts related to the subject:
https://aakinshin.net/posts/weighted-quantiles/
https://aakinshin.net/posts/gumbel-mad/
https://aakinshin.net/posts/kde-bw/
https://aakinshin.net/posts/misleading-histograms/
https://aakinshin.net/posts/qrde-hd/
https://aakinshin.net/posts/lowland-multimodality-detection/
Most of the suggested approaches are already implemented in perfolizer (https://github.com/AndreyAkinshin/perfolizer), but it's not enough to provide reliable out-of-the-box performance checks. (This is not about polishing; I still have some research tasks that should be finished first.) |
I think you read my message too swiftly and missed the purpose; you seemed to reply to what I stated was the NON-purpose, so I'll repeat myself. Last I checked, the library physically could not run from a unit test context. It seems you're focusing on polishing the analysis of the OUTPUT of the library, which suggests you have local changes that would let it run and could unblock many of us; we could handle the outputs ourselves for our unique use cases as mathematicians and statisticians. Could you comment on this?
|
Is there any working sample for performance tests based on BenchmarkDotNet? |
@ndrwrbgs sorry for misreading your question, my bad. While we do not recommend executing benchmarks from a unit test context (because a unit test runner may introduce some performance side effects), you can definitely do that. If you have any problems, please file a separate issue with all the details (the unit test framework title, its version, how you run the tests, etc.). |
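For illustration, a minimal sketch of that setup with NUnit (MyBenchmarks is a placeholder for your own benchmark class; the reporting at the end is just an example of custom post-processing):
```csharp
using BenchmarkDotNet.Running;
using NUnit.Framework;

[TestFixture]
public class PerfTests
{
    [Test]
    public void RunBenchmarksFromUnitTestContext()
    {
        // Runs the benchmarks in the usual way; the unit test only hosts the run.
        var summary = BenchmarkRunner.Run<MyBenchmarks>();

        Assert.That(summary.HasCriticalValidationErrors, Is.False,
            "BenchmarkDotNet reported critical validation errors.");

        // Raw per-benchmark statistics are available for custom analysis.
        foreach (var report in summary.Reports)
            TestContext.WriteLine(
                $"{report.BenchmarkCase.DisplayInfo}: mean = {report.ResultStatistics?.Mean:F1} ns");
    }
}
```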
|
Has anyone tried this? |
I recommend reading "Measuring performance using BenchmarkDotNet - Part 3: Breaking Builds", which introduces this tool: https://github.com/NewDayTechnology/benchmarkdotnet.analyser It documents how to run benchmarks during CI and how to fail the build if perf has degraded below a certain threshold. |
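I haven't verified the analyser's exact inputs, but tools in this space typically consume BenchmarkDotNet's exported reports; a hedged sketch of enabling the full JSON exporter on a benchmark class (check the analyser's docs for what it actually expects):
```csharp
using BenchmarkDotNet.Attributes;

// Writes the full JSON report next to the other BenchmarkDotNet artifacts,
// which aggregation/analysis tools can then pick up during CI.
[JsonExporterAttribute.Full]
public class CiTrackedBenchmarks
{
    [Benchmark]
    public int SumOfFirstThousandIntegers()
    {
        int total = 0;
        for (int i = 0; i < 1000; i++)
            total += i;
        return total;
    }
}
```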
Hi!
As promised, a report on adopting BenchmarkDotNet to be used as a performance tests runner.
Bad part:
Good part: it finally works and covers almost all of our use cases :)
Let's start with a short intro describing what perftests are and what they are not.
First of all, benchmarks and perftests ARE NOT the same.
The difference is like the one between Olympic running shoes and hiking boots:
there are some similar parts, but the use cases are obviously different :)
Performance tests are not a tool for finding the sandbox-winner method. On the contrary, they're aimed at proving that in real-world conditions the code will not break the limits set in the test.
As with all other tests, perftests will be run on different machines, under different workloads, and they still have to produce repeatable results.
This means you cannot use absolute timings to set the limits for perftests.
There's no sense in comparing a 0.1 sec run on a tablet with 0.05 sec on a dedicated test server (run under the same conditions, the latter may well turn out to be 10x slower).
So, you have to include some reference (or baseline) method in the benchmark and compare all other benchmark methods using a relative-to-the-baseline execution time metric.
This approach is known as competition perf-testing, and it is used in all performance tests we do.
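For reference, plain BenchmarkDotNet already supports the relative-to-baseline part via `Baseline = true`; a minimal sketch (the SpinWait workloads are placeholders):
```csharp
using System.Threading;
using BenchmarkDotNet.Attributes;

public class RelativeToBaselineExample
{
    // All other benchmarks are reported relative to this method
    // (the Scaled/Ratio column, depending on the BenchmarkDotNet version).
    [Benchmark(Baseline = true)]
    public void Baseline() => Thread.SpinWait(10_000);

    // Expected to show up as roughly 3x the baseline, regardless of the machine.
    [Benchmark]
    public void ThreeTimesSlower() => Thread.SpinWait(30_000);
}
```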
Second, you cannot use averages to compare the results.
They produce overly optimistic estimates; percentiles to the rescue. To keep it short, some links:
http://www.goland.org/average_percentile_services/
https://msdn.microsoft.com/en-us/library/bb924370.aspx
http://apmblog.dynatrace.com/2012/11/14/why-averages-suck-and-percentiles-are-great/
Also, I'd recommend the Average vs. Percentiles section from the awesome "Writing High-Performance .NET Code" book. To be honest, the entire Chapter 1 is worth reading.
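For reference, BenchmarkDotNet ships percentile columns that can be added to a config; a sketch assuming a reasonably recent version of the config API:
```csharp
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Configs;

public static class PercentileConfig
{
    // Default config plus P90/P95 columns, so limits can be based on
    // percentiles of the measurements rather than on the mean.
    public static IConfig Create() =>
        ManualConfig.Create(DefaultConfig.Instance)
            .AddColumn(StatisticColumn.P90, StatisticColumn.P95);
}
```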
Third, you have to set BOTH upper and lower limits.
You DO want to detect situations like "Code B unexpectedly runs 1000x faster", believe me.
100 times out of 100, "Code B" was broken somehow.
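To illustrate the two-sided idea only, a minimal helper (the names are hypothetical, not part of any library; the mean values would come from whatever report your runner produces):
```csharp
using System;

public static class CompetitionAsserts
{
    // Two-sided check: the candidate/baseline ratio must stay inside [minRatio, maxRatio].
    // An unexpectedly fast result fails too, since it usually means the candidate is broken.
    public static void RatioWithin(double baselineMean, double candidateMean,
                                   double minRatio, double maxRatio)
    {
        double ratio = candidateMean / baselineMean;
        if (ratio < minRatio || ratio > maxRatio)
            throw new InvalidOperationException(
                $"Ratio {ratio:F2} is outside the expected range [{minRatio:F2}; {maxRatio:F2}].");
    }
}
```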
Fourth, you will have a LOT of perftests.
Our usual ratio is one perftest per 20 unit tests or so.
That's not a goal, of course, just statistics from our real-world projects.
Let's say you have a few hundred perftests. This means that they should be FAST. The usual time limit is 5-10 secs for large tests and 1-2 secs for smaller ones. No one will wait for an hour :)
Fifth, all perftests should be auto-annotated.
Yes, there should be an option (configurable via app.config) to collect the statistics and to update the sources with them.
Also, the benchmark should be rerun automatically with the new limits, and the limits should be loosened if they are too tight.
This allows you not to bother with the run / set limits / repeat loop and boosts productivity by an order of magnitude. As I've said above, there will be a lot of perftests.
And there should be a way to store the annotations as attributes in the code or in a separate XML file. The latter is mandatory in case the tests are auto-generated (yes, we had those).
And last but not least:
You should not monitor execution time only.
Memory allocations and GC counts should be checked too, as they have an impact on the performance of the entire app.
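For the measuring side of this, stock BenchmarkDotNet offers a memory diagnoser; a minimal sketch (asserting on the reported numbers is left to the surrounding test harness):
```csharp
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // adds allocated bytes and Gen0/Gen1/Gen2 collection counts to the summary
public class AllocationBenchmarks
{
    [Benchmark]
    public byte[] AllocateSmallBuffer() => new byte[128];
}
```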
Ok, looks like that's all :)
Oops, one more: the perftests SHOULD be compatible with any unit-testing framework.
Different projects use different testing libraries, and it'd be silly to require yet another one just to run the perftests.
And now, the great news:
our current perftest implementation covers almost all of the above requirements, and it's almost stable enough to be merged into BenchmarkDotNet.
If you're interested in it, of course :)
The code is in https://github.com/rsdn/CodeJam/tree/master/Main/tests-performance/BenchmarkDotNet
The example tests are in https://github.com/rsdn/CodeJam/tree/master/Main/tests-performance/CalibrationBenchmarks
aaand it's kinda working 🆒
The main show stopper
We need the ability to use a custom toolchain.
It looks like it will allow us to enable in-process test running much faster than waiting for #140 to be closed :)
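For reference, recent BenchmarkDotNet versions ship an in-process toolchain out of the box; a sketch of wiring it up with the current API (newer than the 0.9.x API discussed in this thread):
```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

public static class InProcessRunner
{
    public static void Run<TBenchmarks>()
    {
        // A job that executes benchmarks in the current process,
        // so no separate benchmark project is generated, built, or launched.
        var config = ManualConfig.Create(DefaultConfig.Instance)
            .AddJob(Job.Default.WithToolchain(InProcessEmitToolchain.Instance));

        BenchmarkRunner.Run(typeof(TBenchmarks), config);
    }
}
```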
Also, I've delayed the implementation of memory-related limits until we are sure all the other parts are working fine. We definitely need the ability to collect GC statistics directly from the benchmark process.
It'll allow us to use the same System.GC API we're using for monitoring in production.
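A sketch of the kind of System.GC-based counters meant here (runWorkload stands for the code under test; GC.GetAllocatedBytesForCurrentThread requires .NET Core 2.0+ or .NET Framework 4.8):
```csharp
using System;

public static class GcProbe
{
    // Measures Gen2 collections and bytes allocated on the current thread
    // around a workload, using the same System.GC counters as in production monitoring.
    public static (int Gen2Collections, long AllocatedBytes) Measure(Action runWorkload)
    {
        int gen2Before = GC.CollectionCount(2);
        long bytesBefore = GC.GetAllocatedBytesForCurrentThread();

        runWorkload();

        return (GC.CollectionCount(2) - gen2Before,
                GC.GetAllocatedBytesForCurrentThread() - bytesBefore);
    }
}
```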
When all of this is done, I'm going to start a discussion about merging the competition tests infrastructure into BenchmarkDotNet.
At the end
A list of things that are not critical but should definitely be included in the Bench.Net codebase:
- Percentile and scaled percentile columns.
- An API to group a Summary's benchmarks by the same conditions (same job and same parameters). Use case: we have a benchmark with different [Params()] values, and there's no sense in comparing results from Count = 100 with results from Count = 1000. You already have a similar check in the BaselineDiffColumn; I propose to extract it into a public API, something like
- An API to get a BenchmarkReport from a Summary and a Benchmark. There was Summary.Reports in 0.9.3, but in 0.9.5 its type was changed from a Dictionary<> to an array.
- The ability to report benchmark errors from the analysers. Use case: the unit test analyser should report an error if the perf test does not fit into its timing limits. Currently I just throw an exception, but it does not fit well into the design of BenchmarkDotNet.
Whoa! That's all for now
Any questions / suggestions are welcome:)