Changed diagnosers flow, reduced heap allocations in Engine to 0 #277
@adamsitnik, I like your changes; we are definitely on the right track. I will do a code review on the weekend.
@AndreyAkinshin cool! Now I am profiling the Engine to reduce the number of allocations. Currently, when I run a benchmark with an empty method (does nothing, 0 allocations) with TargetCount = 5000, the results are: 7x GC 0, 2.302,08 Bytes Allocated/Op. Let's see how far I can get ;)
Ok, a little update. I was able to get 0 GCs for an empty method; the overhead went from 2000 bytes to 20 bytes per target invocation. My current problem is that Visual Studio Profiler cannot tell me where the 20 bytes are ;) So I will have to try another tool. @AndreyAkinshin Most of the changes so far were not ugly. The only ugly thing was to pre-allocate The breaking change is to not display results during the run, but to display them all at the end of the benchmark run. But I hope it's not a problem.
Could we keep the old behavior? It's really nice to look at results one by one during benchmarking. Moreover, it lets us understand what's going on right now and check how long a single iteration takes (so it's possible to detect mistakes in the benchmark design at an early stage). P.S. Probably, we have some allocations because of the
I have one idea: there is always a separate run when any diagnoser is attached, so I could print results immediately by default and delay the printing only in the diagnoser run.
LGTM. |
Ok, I updated the code. My To do list:
@AndreyAkinshin Ok, I updated the code once again. I am very close to getting it done ;) I also tried to set InvocationCount to some value, but then realized it was not passed to the engine. Is this commit the correct fix? The results are getting more stable and accurate, but the 100KB limitation of the ETW allocation event is still a problem. Default Config + MemoryDiagnoser: Keep in mind that a 10-byte array = 10 bytes for the bytes, 8 + 8 bytes for extra object fields, and 8 bytes for the reference to the array.
Yeah that's always going to be a fundamental problem with ETW events, maybe it's time to look at using AppDomain.MonitoringTotalAllocatedMemorySize instead? |
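For illustration, reading that counter around a piece of code could look roughly like the sketch below. It is not the code from this PR: the class `AllocationProbe` and the method `MeasureAllocatedBytes` are hypothetical names, and the granularity of the counter depends on the runtime, so small deltas should be read with care.

```csharp
using System;

public static class AllocationProbe
{
    // Hypothetical helper: returns how many bytes the current AppDomain
    // reports as allocated while `action` runs.
    public static long MeasureAllocatedBytes(Action action)
    {
        // Monitoring must be switched on first; once enabled it cannot be disabled.
        AppDomain.MonitoringIsEnabled = true;

        long before = AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize;
        action();
        long after = AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize;

        return after - before;
    }
}
```

A call such as `AllocationProbe.MeasureAllocatedBytes(() => benchmark.TenBytes())` (where `benchmark` is an assumed instance) would then give a per-call estimate that does not rely on ETW allocation events.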
I like this idea, I will do some PoC. |
Ok, I am done. All possible heap allocations were eliminated from Memory profiling results of the auto-generated benchmark runner process for the benchmark:
[Benchmark(Description = "new byte[10]")]
public byte[] TenBytes() => new byte[10];
Next steps:
@adamsitnik, great! Could you also fix the failed test? I will do the full review in a few days.
Awesome pull request. It needs some minor changes and it could be merged into master.
@@ -5,6 +5,9 @@ namespace BenchmarkDotNet.Samples.JIT
{
    // See http://en.wikipedia.org/wiki/Inline_expansion
    // See http://aakinshin.net/en/blog/dotnet/inlining-and-starg/
#if !CORE
    [Diagnostics.Windows.Configs.InliningDiagnoserConfig]
{
    // TODO: use Accuracy.RemoveOutliers
    // TODO: check if resulted measurements are too small (like < 0.1ns)
    double overhead = Idle == null ? 0.0 : new Statistics(Idle.Select(m => m.Nanoseconds)).Median;
double variance = 0;
for (int i = 0; i < measurements.Count; i++)
    variance += Math.Pow(measurements[i].Nanoseconds - mean, 2) / (N - 1);
@@ -17,50 +19,87 @@ $AdditionalLogic$

namespace BenchmarkDotNet.Autogenerated
{
    public class Program : global::$TargetTypeName$
    public class Program
namespace BenchmarkDotNet.Diagnostics.Windows.Configs
{
    public class InliningDiagnoserConfigAttribute : Attribute, IConfigSource
namespace BenchmarkDotNet.Diagnostics.Windows.Configs
{
    public class MemoryDiagnoserConfigAttribute : Attribute, IConfigSource
// shouldn't be more than a few seconds. This increases the likelihood that
// all relevant events are processed by the collection thread by the time we
// are done with the benchmark.
Thread.Sleep(TimeSpan.FromSeconds(3));
logger.WriteLineInfo($"{benchmark.DisplayInfo}");
logger.WriteLineHeader(new string('-', 20));
Logger.WriteLine();
Logger.WriteLineHeader(new string('-', 20));
private void Blackhole<T>(T input) { }

[MethodImpl(MethodImplOptions.NoInlining)]
private unsafe void Blackhole(byte* input) { }
private IConfig CreateConfig(IDiagnoser diagnoser, int targetCount)
{
    return ManualConfig.CreateEmpty()
        .With(Job.Dry.WithLaunchCount(1).WithWarmupCount(1).WithTargetCount(targetCount).With(GcMode.Default.WithForce(false)))
@AndreyAkinshin Thanks! I updated the code. |
var InterquartileRange = Q3 - Q1;
var LowerFence = Q1 - 1.5 * InterquartileRange;
var UpperFence = Q3 + 1.5 * InterquartileRange;
@adamsitnik Great job, you've given the Diagnosers a long overdue and sorely needed fix/tidy-up. Thanks for doing that and the new Plus the work on reducing the BenchmarkDotNet memory allocations is really cool; I like this approach of forcing allocations to happen up-front. Just one small thing I noticed: we now have quite a few warning messages when you build the code (which don't seem to be there on master). Can some/all of these be removed or suppressed?
Slight tongue-in-cheek but also slightly serious, should we ban
That always makes me laugh as the people who designed/wrote LINQ are banned from using it (admittedly only in certain parts of their code) |
@mattwarren Thanks! It was a lot of fun for me to track down even the smallest allocations. Btw. the first call to list.Sort was allocating 1740 bytes!! That's insane. As for LINQ, I was wondering if we could somehow ban it at the unit-test level. Pseudocode: Anyway, I will most probably check for heap allocations in the engine before each BDN release.
Maybe we should dog-food our own product and use BenchmarkDotNet to automate this for us? I.e. a unit test that runs on each build and fails if we allocate too much or certain code paths? |
Good idea! I could cover that with integration tests |
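One possible shape for such a check is sketched below. It is only an assumption about how a test helper could look, not what was added to BenchmarkDotNet: `AllocationGuard` and its methods are hypothetical names, and `GC.GetAllocatedBytesForCurrentThread` is only available on newer runtimes (.NET Core 2.0+ and .NET Framework 4.8).

```csharp
using System;

public static class AllocationGuard
{
    // Hypothetical helper: bytes allocated on the current thread by one call of `action`.
    public static long AllocatedBytes(Action action)
    {
        action(); // warm-up call so that one-time costs (JIT, static init) are not counted

        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Throws when `action` allocates more than `maxBytes`, so a test calling it fails.
    public static void AssertAllocatesAtMost(Action action, long maxBytes)
    {
        long allocated = AllocatedBytes(action);
        if (allocated > maxBytes)
            throw new InvalidOperationException(
                $"Allocated {allocated} bytes, but the limit is {maxBytes} bytes.");
    }
}
```

An integration test could wrap a code path in `AssertAllocatesAtMost` with a small budget and fail the build when a change starts allocating on it.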
@AndreyAkinshin I have pushed the last commit with the rename, and I added and verified the possibility of using a custom Engine. Next week I should be able to implement a universal Memory Diagnoser, so if you could wait with 0.10.0 until then, it would be great!
@AndreyAkinshin I will do a separate PR with a universal Memory Diagnoser within the next week.
I wanted to implement a tail call diagnoser with my friends for #273, but we got stuck. We signed up for a JIT event but we were never getting it, most probably because the method was already jitted before we even started the ETW session. But I am not sure, it's a guess.
The diagnosers flow was quite complicated and not very flexible. So I have changed it a little.
How it works now (Auto-generated program.cs):
[Setup]
allocations GC Diagnoser should not include allocations done by Setup method #186.
@AndreyAkinshin @mattwarren could you guys take a look at the code? I created a PR to start the discussion.
If you like this solution then I would like to continue the work:
@AndreyAkinshin what do you think about the setup/cleanup changes I made?