
Changed diagnosers flow, reduced heap allocations in Engine to 0 #277

Merged
merged 11 commits into master on Oct 15, 2016

Conversation

@adamsitnik
Member

adamsitnik commented Oct 5, 2016

I wanted to implement a tail call diagnoser with my friends for #273, but we got stuck. We subscribed to a JIT event but never received it, most probably because the method had already been jitted before we even started the ETW session. But I am not sure; it's a guess.

The diagnosers flow was quite complicated and not very flexible. So I have changed it a little.

How it works now (Auto-generated program.cs):

  1. The first thing to do is to notify the master process that diagnosers can attach.
  2. The JIT-based diagnosers can attach at this moment, and we are sure that no jitting has been done yet.
  3. When the Engine is created and prepared, we run Setup once and notify the diagnosers that this is about to happen. The GC diagnoser can attach at this moment and avoid the problem of including [Setup] allocations #186.
  4. The Engine runs, but it does not call Setup and/or Cleanup anymore.
  5. We notify the diagnosers that we are going to call Cleanup, so for example the GC Diagnoser can detach at this moment and not include Cleanup allocations.
  6. Call Cleanup.
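The six steps above can be sketched as a simplified console program (the signal strings and stub methods here are hypothetical illustrations, not the real auto-generated Program.cs):

```csharp
using System;

public class Program
{
    public static void Main()
    {
        // 1-2: tell the master process that JIT-based diagnosers can attach;
        //      at this point nothing of interest has been jitted yet.
        Console.WriteLine("// BeforeAnythingElse");

        Setup();                                // 3: Setup runs exactly once, outside the Engine
        Console.WriteLine("// AfterSetup");     //    GC diagnoser attaches here, excluding [Setup] allocations (#186)

        RunEngine();                            // 4: measured iterations; the Engine no longer calls Setup/Cleanup

        Console.WriteLine("// BeforeCleanup");  // 5: diagnosers can detach here, excluding Cleanup allocations
        Cleanup();                              // 6
    }

    static void Setup() { }
    static void RunEngine() { }
    static void Cleanup() { }
}
```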

@AndreyAkinshin @mattwarren could you guys take a look at the code? I created a PR to start the discussion.

If you like this solution then I would like to continue the work:

  • reduce the number of heap allocations in Engine to 0 to give better accuracy for the Memory Diagnoser.
  • improve the allocated-bytes-per-operation calculation to get better accuracy, for example 100 bytes for the following benchmark:
[Benchmark]
public void New() => new byte[100];

@AndreyAkinshin what do you think about setup/cleanup changes I made?

@AndreyAkinshin
Member

AndreyAkinshin commented Oct 6, 2016

@adamsitnik, I like your changes, we are definitely on the right track. I will do a code review on the weekend.

@adamsitnik
Member Author

adamsitnik commented Oct 7, 2016

I like your changes, we are definitely on the right track. I will do a code review on the weekend.

@AndreyAkinshin cool!

Now I am profiling the Engine to reduce the number of allocations. Currently, when I run a benchmark with an empty method (does nothing, 0 allocations) with TargetCount = 5000, the results are: 7x GC 0, 2.302,08 Bytes Allocated/Op. Let's see how far I can get ;)

@adamsitnik
Member Author

adamsitnik commented Oct 7, 2016

Ok, a little update. I was able to get 0 GCs for an empty method; the overhead went from 2000 bytes to 20 bytes per target invocation. My current problem is that the Visual Studio Profiler cannot tell me where the 20 bytes are ;) So I will have to try another tool.

@AndreyAkinshin Most of the changes so far were not ugly. The only ugly thing was having to pre-allocate the List<Measurement> that contains the results.

The breaking change is that results are not displayed during the run, but all at the end of the benchmark run. But I hope it's not a problem.

@AndreyAkinshin
Member

AndreyAkinshin commented Oct 7, 2016

The breaking change is that results are not displayed during the run, but all at the end of the benchmark run. But I hope it's not a problem.

Could we keep the old behavior? It's really nice to look at the results one by one during benchmarking. Moreover, it makes it possible to understand what's going on right now and check how long a single iteration takes (so it's possible to detect mistakes in the benchmark design at an early stage).

P.S. Probably we have some allocations because of the Statistics instances. In this case, we could implement incremental logic.
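The incremental idea could look roughly like this — a minimal one-pass sketch based on Welford's algorithm, so no list of measurements has to be kept just for the mean/variance (type and member names here are illustrative, not taken from the BenchmarkDotNet code base):

```csharp
using System;

// One-pass (incremental) mean and sample variance: each measurement is
// folded into the running state, so nothing accumulates on the heap.
public class IncrementalStatistics
{
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    public void Add(double x)
    {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // note: uses the already-updated mean
    }

    public double Mean => mean;
    public double Variance => n > 1 ? m2 / (n - 1) : 0.0;
}
```

This is numerically stabler than the textbook "sum of squares minus squared sum" formula, which matters for nanosecond-scale timings.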

@adamsitnik
Member Author

adamsitnik commented Oct 7, 2016

I have one idea: there is always a separate run when any diagnoser is attached, so I could print results immediately by default and delay the printing in the diagnosers run.

@AndreyAkinshin
Member

AndreyAkinshin commented Oct 7, 2016

LGTM.

@adamsitnik
Member Author

adamsitnik commented Oct 7, 2016

Ok, I updated the code.

My to-do list:

  • find the missing 18-22 bytes for the simplest scenario: no warmup, no pilot, fixed target count
  • analyze and improve the scenario with warmup and pilot
@adamsitnik
Member Author

adamsitnik commented Oct 9, 2016

@AndreyAkinshin Ok, I updated the code once again. I am very close to getting it done ;)

I also tried to set InvocationCount to some value, but then realized it was not being passed to the engine. Is this commit the correct fix?

The results are getting more stable and accurate, but the 100KB limitation of the ETW allocation event is still a problem.

Default Config + MemoryDiagnoser:
[screenshot: benchmark results table]

Keep in mind that a 10-byte array = 10 bytes for the data, 8 + 8 bytes for the extra object fields, and 8 bytes for the reference to the array.

@mattwarren
Contributor

mattwarren commented Oct 10, 2016

The results are getting more stable and accurate, but the 100KB limitation of the ETW allocation event is still a problem.

Yeah, that's always going to be a fundamental problem with ETW events; maybe it's time to look at using AppDomain.MonitoringTotalAllocatedMemorySize instead?
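For reference, a minimal sketch of that API (available on classic .NET Framework 4.0+; the counter is an approximation updated around garbage collections, so it is coarse-grained, and monitoring can be switched on but never back off):

```csharp
using System;

class AllocationProbe
{
    static void Main()
    {
        // Must be enabled before reading the counters; this is a one-way switch.
        AppDomain.MonitoringIsEnabled = true;

        long before = AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize;

        var payload = new byte[100]; // the code under measurement
        GC.KeepAlive(payload);

        // Cumulative bytes allocated in this AppDomain since monitoring started.
        long allocated = AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize - before;
        Console.WriteLine($"Allocated roughly {allocated} bytes");
    }
}
```

Unlike the ETW allocation events, this has no 100KB sampling issue, but it counts per AppDomain rather than per thread, so other activity in the process would pollute the number.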

@adamsitnik
Member Author

adamsitnik commented Oct 10, 2016

Yeah, that's always going to be a fundamental problem with ETW events; maybe it's time to look at using AppDomain.MonitoringTotalAllocatedMemorySize instead?

I like this idea, I will do some PoC.

@AndreyAkinshin


Could we put Jitting in the method caption? The current caption looks a little bit confusing.

@adamsitnik
Member Author

adamsitnik commented Oct 11, 2016

Ok, I am done. All possible heap allocations have been eliminated from Engine.Run().

Memory profiling results of the auto-generated benchmark runner process for this benchmark:

[Benchmark(Description = "new byte[10]")]
public byte[] TenBytes() => new byte[10];

[screenshot: memory profiling results]

Next steps:

  1. @AndreyAkinshin could you do the code review?
  2. When the code passes code review, I will try to make the Memory Diagnostics a built-in part of BDN. For the classic toolchain I want to use AppDomain.MonitoringTotalAllocatedMemorySize as @mattwarren suggested; for Core I want to use GC.GetAllocatedBytesForCurrentThread, which got exposed 2 days ago.
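The .NET Core side could be sketched like this (assuming GC.GetAllocatedBytesForCurrentThread is publicly reachable in the targeted netcoreapp version; at the time of this PR it had only just been exposed):

```csharp
using System;

class CoreAllocationProbe
{
    static void Main()
    {
        // The counter is per-thread, so the before/after reads must happen
        // on the same thread as the measured code.
        long before = GC.GetAllocatedBytesForCurrentThread();

        var payload = new byte[100]; // the code under measurement
        GC.KeepAlive(payload);

        long allocated = GC.GetAllocatedBytesForCurrentThread() - before;
        Console.WriteLine($"Allocated {allocated} bytes on this thread");
    }
}
```

Being per-thread, it avoids both the ETW 100KB event limitation and the cross-thread noise of the AppDomain-wide counter.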
@AndreyAkinshin
Member

AndreyAkinshin commented Oct 11, 2016

@adamsitnik, great! Could you also fix the failed test? I will do the full review in a few days.

Member

AndreyAkinshin left a comment

Awesome pull request. After some minor changes it can be merged into master.

@@ -5,6 +5,9 @@ namespace BenchmarkDotNet.Samples.JIT
{
// See http://en.wikipedia.org/wiki/Inline_expansion
// See http://aakinshin.net/en/blog/dotnet/inlining-and-starg/
#if !CORE
[Diagnostics.Windows.Configs.InliningDiagnoserConfig]

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

Could we rename it to InliningDiagnoser (it's more consistent with the other attribute names)?
Also, it would be cool to add the ability to use this attribute without #if-#endif directives (it could show a warning if we are trying to use it for the core target). Is it possible?

@adamsitnik

adamsitnik Oct 11, 2016

Author Member

Yes and no ;)

In general these attributes have to depend on our Diagnosers, and the Diagnosers are in the BenchmarkDotNet.Diagnostics.Windows project, which supports only .NET 4.5+; there is no .NET Core support.

But we could work around that by adding a class with the same name, but without implementation, to our main project. Then all classic projects that depend on our Diagnostics package would use the real one, and the .NET Core ones would get the empty class. By using the Obsolete attribute we can give them a warning.

#if CORE
using System;
using BenchmarkDotNet.Configs;

namespace BenchmarkDotNet.Diagnostics.Windows.Configs
{
    [Obsolete("InliningDiagnoser is not supported for .NET Core yet")]
    public class InliningDiagnoserAttribute : Attribute, IConfigSource
    {
        public IConfig Config { get; }

        public InliningDiagnoserAttribute() { }
    }
}
#endif

What do you think? Personally I am afraid that this could cause some problems in the future, and the only thing we would gain is a simpler sample.

@AndreyAkinshin

AndreyAkinshin Oct 12, 2016

Member

Both options look bad. =( Ok, let's keep the current behavior without obsolete attributes.
By the way, is it possible to implement diagnosers for .NET Core?

@adamsitnik

adamsitnik Oct 12, 2016

Author Member

By the way, is it possible to implement diagnosers for .NET Core?

We are limited to the APIs that Microsoft and other software vendors expose. If MS releases a .NET Core ETW lib, then there is no problem. I am planning to implement a .NET Core Memory Diagnoser next week based on the API exposed by System.GC for netcoreapp1.1.

{
// TODO: use Accuracy.RemoveOutliers
// TODO: check if resulted measurements are too small (like < 0.1ns)
double overhead = Idle == null ? 0.0 : new Statistics(Idle.Select(m => m.Nanoseconds)).Median;

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

I think we should use Mean here.


double variance = 0;
for (int i = 0; i < measurements.Count; i++)
variance += Math.Pow(measurements[i].Nanoseconds - mean, 2) / (N - 1);

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

I think we should use the multiplication operator instead of Math.Pow here.
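The suggested change, sketched as a self-contained helper (names differ slightly from the snippet above; this is just to show the shape of the fix):

```csharp
using System;

static class VarianceHelper
{
    // Sample variance using plain multiplication: diff * diff avoids the
    // Math.Pow(diff, 2) call, which goes through general floating-point
    // exponentiation and is much slower than a multiply.
    public static double SampleVariance(double[] values, double mean)
    {
        int n = values.Length;
        double variance = 0;
        for (int i = 0; i < n; i++)
        {
            double diff = values[i] - mean;
            variance += diff * diff / (n - 1);
        }
        return variance;
    }
}
```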

@@ -17,50 +19,87 @@ $AdditionalLogic$

namespace BenchmarkDotNet.Autogenerated
{
public class Program : global::$TargetTypeName$
public class Program

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

I like the new approach!


namespace BenchmarkDotNet.Diagnostics.Windows.Configs
{
public class InliningDiagnoserConfigAttribute : Attribute, IConfigSource

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

Rename to InliningDiagnoser


namespace BenchmarkDotNet.Diagnostics.Windows.Configs
{
public class MemoryDiagnoserConfigAttribute : Attribute, IConfigSource

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

Rename to MemoryDiagnoser

// shouldn't be more than a few seconds. This increases the likelihood that
// all relevant events are processed by the collection thread by the time we
// are done with the benchmark.
Thread.Sleep(TimeSpan.FromSeconds(3));

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

Could we wrap this line in a separate method with a nice name (like WaitFor...)?

logger.WriteLineInfo($"{benchmark.DisplayInfo}");
logger.WriteLineHeader(new string('-', 20));
Logger.WriteLine();
Logger.WriteLineHeader(new string('-', 20));

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

Could we move new string('-', 20) to a static readonly field (e.g. LogSeparator)?

private void Blackhole<T>(T input) { }

[MethodImpl(MethodImplOptions.NoInlining)]
private unsafe void Blackhole(byte* input) { }

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

Could we rename it to Consume?

private IConfig CreateConfig(IDiagnoser diagnoser, int targetCount)
{
return ManualConfig.CreateEmpty()
.With(Job.Dry.WithLaunchCount(1).WithWarmupCount(1).WithTargetCount(targetCount).With(GcMode.Default.WithForce(false)))

@AndreyAkinshin

AndreyAkinshin Oct 11, 2016

Member

This line is too long. Could we introduce a variable for job here?
Also, you could use WithGcForce.

@adamsitnik
Member Author

adamsitnik commented Oct 11, 2016

@AndreyAkinshin Thanks! I updated the code.


var InterquartileRange = Q3 - Q1;
var LowerFence = Q1 - 1.5 * InterquartileRange;
var UpperFence = Q3 + 1.5 * InterquartileRange;

@AndreyAkinshin

AndreyAkinshin Oct 12, 2016

Member

Could you use a lowercase first letter (because we are using local variables here, not properties)?

@AndreyAkinshin

AndreyAkinshin Oct 14, 2016

Member

@adamsitnik, could you fix it?

@adamsitnik

adamsitnik Oct 14, 2016

Author Member

Yes, sure! I will do this today.

@mattwarren
Contributor

mattwarren commented Oct 12, 2016

@adamsitnik Great job, you've given the Diagnosers a long overdue and sorely needed fix/tidy-up; thanks for doing that, and the new IDiagnoser API looks really nice.

Plus, the work on reducing BenchmarkDotNet's memory allocations is really cool; I like this approach of forcing allocations to happen up-front.

Just one small thing I noticed: we now have quite a few warning messages when building the code (which don't seem to be there on master). Can some/all of these be removed or suppressed?

@mattwarren
Contributor

mattwarren commented Oct 12, 2016

Slightly tongue-in-cheek but also slightly serious: should we ban using System.Linq in certain parts of BenchmarkDotNet, like the Roslyn code guidelines do?

  • Avoid allocations in compiler hot paths:
    • Avoid LINQ.
    • Avoid using foreach over collections that do not have a struct enumerator.
    • Consider using an object pool. There are many usages of object pools in the compiler to see an example.

That always makes me laugh, as the people who designed/wrote LINQ are banned from using it (admittedly only in certain parts of their code).

@adamsitnik
Member Author

adamsitnik commented Oct 12, 2016

@mattwarren Thanks! It was a lot of fun for me to track down even the smallest allocations. Btw, the first call to list.Sort was allocating 1740 bytes!! That's insane.

As for LINQ, I was wondering if we could somehow ban it at the unit-test level.

Pseudocode: Assert.False(typeof(Engine).References.Any(reference => reference.Name == "System.Linq"))

Anyway, I will most probably check for heap allocations in the engine before each BDN release.
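A rough sketch of how that pseudocode could become a real check — note this is assembly-granular (on classic .NET the LINQ extension methods live in System.Core, so it only works if Engine's assembly has no other legitimate reason to reference it); a precise per-type ban would need IL inspection, e.g. with Mono.Cecil:

```csharp
using System;
using System.Linq; // using LINQ in the test itself is fine; only the Engine would be banned

static class AssemblyReferenceCheck
{
    // True if the assembly containing `type` lists an assembly named `name`
    // among its compile-time references.
    public static bool References(Type type, string name) =>
        type.Assembly.GetReferencedAssemblies().Any(r => r.Name == name);
}

// In a unit test (type and framework names hypothetical):
// Assert.False(AssemblyReferenceCheck.References(typeof(Engine), "System.Core"));
```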

@mattwarren
Contributor

mattwarren commented Oct 12, 2016

Anyway I will most probably check for heap allocations in engine before each BDN release.

Maybe we should dog-food our own product and use BenchmarkDotNet to automate this for us? I.e. a unit test that runs on each build and fails if we allocate too much on certain code paths?

@adamsitnik
Member Author

adamsitnik commented Oct 12, 2016

Maybe we should dog-food our own product and use BenchmarkDotNet to automate this for us? I.e. a unit test that runs on each build and fails if we allocate too much on certain code paths?

Good idea! I could cover that with integration tests.

@adamsitnik
Member Author

adamsitnik commented Oct 14, 2016

@AndreyAkinshin I have pushed the last commit with the rename, and I added and verified the possibility of using a custom Engine.

Next week I should be able to implement the universal Memory Diagnoser, so if you could hold off on 0.10.0 until then, that would be great!

@AndreyAkinshin
Member

AndreyAkinshin commented Oct 15, 2016

@adamsitnik,

  1. Could we merge this PR into master right now? Or should we wait for the universal Memory Diagnoser?
  2. I'm going to release v0.10.0 at the end of October.
@adamsitnik adamsitnik changed the title Changed diagnosers flow Changed diagnosers flow, reduced heap allocations in Engine to 0 Oct 15, 2016
@adamsitnik adamsitnik merged commit afa5865 into master Oct 15, 2016
2 checks passed
continuous-integration/appveyor/branch AppVeyor build succeeded
continuous-integration/appveyor/pr AppVeyor build succeeded
@adamsitnik adamsitnik deleted the diagnosersFlow branch Oct 15, 2016
@adamsitnik
Member Author

adamsitnik commented Oct 15, 2016

@AndreyAkinshin I will do separate PR with universal Memory Diagnoser within next week.
