Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NativeMemoryDiagnoser #1131

Merged
merged 23 commits into from Jul 22, 2019

Conversation

@WojciechNagorski
Copy link
Collaborator

WojciechNagorski commented Apr 12, 2019

Fix for #457. I am open to all suggestions and comments.

Information

I added new columns: Total native memory and Native memory leak. Example of output:

|                           Method |       Mean |         Error |      StdDev |   Gen 0 |  Gen 1 |  Gen 2 | Allocated | Total native memory | Native memory leak |
|--------------------------------- |-----------:|--------------:|------------:|--------:|-------:|-------:|----------:|--------------------:|-------------------:|
|                 AllocateIntArray | 145.592 us | 1,873.6901 us | 102.7033 us |  0.1526 | 0.1526 | 0.1526 |         - |              40,000 |             40,000 |
|    AllocateAndDeallocateIntArray |   5.772 us |     0.8384 us |   0.0460 us |       - |      - |      - |         - |              40,000 |                  - |
|              AllocateStructArray |  66.127 us |   265.2075 us |  14.5369 us |  0.1831 | 0.1831 | 0.1831 |         - |              80,000 |             80,000 |
| AllocateAndDeallocateStructArray |  13.895 us |     3.7949 us |   0.2080 us |       - |      - |      - |         - |              80,000 |                  - |
|    AllocateAndDeallocateManyType | 140.390 us |    87.7633 us |   4.8106 us |       - |      - |      - |         - |             240,000 |            120,000 |
|                ManagedAllocation |  14.924 us |     6.9775 us |   0.3825 us | 12.6495 |      - |      - |   40024 B |                   - |                  - |

NativeMemoryDiagnoser can track all native allocations and memory leaks. This diagnoser also prints native type names. This is console output:
image

I compared the result with PerfView and I got the same result.

Sources

Thanks to @kayle and his microsoft/perfview#857 I could get the type of native objects.

Code from NativeMemoryDiagnoser is inspired by https://github.com/Microsoft/perfview/blob/master/src/PerfView/PerfViewData.cs#L5719-L5944

How to run

I added a new project BenchmarkDotNet.IntegrationTests.NativeAllocation which contains all tests.

You can run this project in this way:

cd tests\BenchmarkDotNet.IntegrationTests.NativeAllocation
dotnet run -c Release -- --filter *IntroNativeMemory*

I didn't want to put this test into BenchmarkDotNet.Samples because this project has to be compiled for the x64 platform.

@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Apr 12, 2019

Builds are failed because of the new C++ project.

@adamsitnik

This comment has been minimized.

Copy link
Member

adamsitnik commented Apr 16, 2019

Hi @wojtpl2 ! big thanks for the PR! it looks great!

The only thing that worries me is the new C++ project.

Would it be possible to test the new diagnoser by using Marshal type to allocate unmanaged memory?

IntPtr unmanagedHandle = Marshal.AllocHGlobal(256);
Marshal.FreeHGlobal(unmanagedHandle);
@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Apr 16, 2019

I'll check. I want to clear and remove some code from perfview.

I want to also fix build but I need more time with it.

@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Apr 17, 2019

@adamsitnik Marshal.AllocHGlobal and Marshal.FreeHGlobal also work.

|                         Method |      Mean |      Error |    StdDev |   Gen 0 | Gen 1 | Gen 2 | Allocated | Total native memory | Native memory leak |
|------------------------------- |----------:|-----------:|----------:|--------:|------:|------:|----------:|--------------------:|-------------------:|
|                  AllocIntArray |  46.65 us | 161.566 us | 8.8560 us |       - |     - |     - |         - |              40,000 |             40,000 |
|           AllocAndFreeIntArray |  32.69 us |  24.865 us | 1.3630 us |       - |     - |     - |         - |              40,000 |                  - |
|               AllocStructArray | 127.63 us | 100.791 us | 5.5247 us |       - |     - |     - |         - |              80,000 |             80,000 |
|        AllocAndFreeStructArray |  88.10 us |  15.757 us | 0.8637 us |       - |     - |     - |         - |              80,000 |                  - |
|           MarshalAllocIntArray |  46.02 us |  31.281 us | 1.7146 us |       - |     - |     - |         - |              40,000 |             40,000 |
|    MarshalAllocAndFreeIntArray |  25.20 us |  22.858 us | 1.2529 us |       - |     - |     - |         - |              40,000 |                  - |
|        MarshalAllocStructArray | 110.16 us |  74.743 us | 4.0969 us |       - |     - |     - |         - |              80,000 |             80,000 |
| MarshalAllocAndFreeStructArray |  80.83 us |   9.791 us | 0.5367 us |       - |     - |     - |         - |              80,000 |                  - |
|           AllocAndFreeManyType | 274.48 us |  45.900 us | 2.5159 us |       - |     - |     - |         - |             240,000 |            120,000 |
|                   ManagedAlloc |  16.76 us |   1.835 us | 0.1006 us | 12.6343 |     - |     - |   40024 B |                   - |                  - |

Marshal.AllocHGlobal allocate only space without any type. Output from NativeMemoryDiagnoser for this memory looks:
image

For a memory that is allocated from C++, NativeMemoryDiagnoser prints the type names:
image

@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Apr 17, 2019

@adamsitnik https://github.com/Microsoft/perfview also have C++ projects:
image

I'll take care of it after the 24 of April.

@adamsitnik

This comment has been minimized.

Copy link
Member

adamsitnik commented Apr 17, 2019

@wojtpl2 thanks!

@kayle

This comment has been minimized.

Copy link
Contributor

kayle commented Apr 18, 2019

Thanks for following up on that PerfView commit. I had started thinking about how to track allocations in benchmarks, but haven't had time to submit a PR. Here's a few questions/considerations I had for this area:

  1. The docs indicate that heap profiling will likely cause severe slowdowns. Is it helpful/possible to force a single iteration with no warmup? Maybe hide the time metrics entirely or warn that they shouldn't be used?
  2. Would it be better to have separate managed/native diagnosers, or a single diagnoser that can optionally collect both types of information. In theory, the managed type allocations are easier to track. I had started a prototype here (kayle@7d4a237)
  3. Regarding adding a c++ project to this repository, I suspect there's a pinvoke available that would allocate on the native heap (or picking a class that uses SafeHandle internally since those need to free native resources).
  4. How much disk I/O happens when collecting these ETW events? Ideally, we could process ETW with real-time listeners by default, and optionally save the ETL to disk for manual inspection.
  5. I'm guessing a common case is to expect 0 allocations during a benchmark. It'd be nice to offer a simple pass/fail option for this case.
  6. Leak tracking seems like more of a correctness issue than performance. I know you already did the hard part of the work, but it might be simpler and less work for this repo to only track raw allocations. To diagnose leaks, users could load the ETL file in perfview.

I'm happy to help out here. Let me know if there's a specific task that I should look closer at.

@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Jun 21, 2019

I've finally found some time. I've had a hard time.

I moved IntroNativeMemory.cs into BenchmarkDotNet.Samples and I used Bitmap and Graphics from System.Drawing to make native allocation instead of using custom C++ dll.

I created two tests:

        [Benchmark]
        public void BitmapWithLeaks()
        {
            var flag = new Bitmap(200, 100);
            var graphics = Graphics.FromImage(flag);
            var blackPen = new Pen(Color.Black, 3);
            graphics.DrawLine(blackPen, 100, 100, 500, 100);
        }

        [Benchmark]
        public void Bitmap()
        {
            using (var flag = new Bitmap(200, 100))
            {
                using (var graphics = Graphics.FromImage(flag))
                {
                    using (var blackPen = new Pen(Color.Black, 3))
                    {
                        graphics.DrawLine(blackPen, 100, 100, 500, 100);
                    }
                }
            }
        }

Below is the result of these tests.
image

image

TODOs

  1. I have to compare this result with PerfView.

Additional information
Thanks @kayle for your suggestions. Bellow are my answers:

The docs indicate that heap profiling will likely cause severe slowdowns. Is it helpful/possible to force a single iteration with no warmup? Maybe hide the time metrics entirely or warn that they shouldn't be used?

  1. NativeMemoryDiagnoser is used only in the extra diagnostic run. I set performExtraBenchmarksRun=true in EtwProfilerConfig.

Would it be better to have separate managed/native diagnosers, or a single diagnoser that can optionally collect both types of information. In theory, the managed type allocations are easier to track. I had started a prototype here (kayle/BenchmarkDotNet@7d4a237)

  1. Currently managed memory diagnoser does not use ETW at all. If we create new ETW Managed memory diagnoser then BenchmarkDotNet will have two different diagnosers. I'm not sure if it will be accepted by @adamsitnik or @AndreyAkinshin.

Regarding adding a c++ project to this repository, I suspect there's a pinvoke available that would allocate on the native heap (or picking a class that uses SafeHandle internally since those need to free native resources).

  1. It is a good suggestion. I used the System.Drawing instead of custom C++ project. I'll try to find some example with native type.

How much disk I/O happens when collecting these ETW events? Ideally, we could process ETW with real-time listeners by default, and optionally save the ETL to disk for manual inspection.

  1. It is a good suggestion. First I was trying to do real-time listeners but I didn't have all the information. Now I think it will be easy to do. I'll try.

I'm guessing a common case is to expect 0 allocations during a benchmark. It'd be nice to offer a simple pass/fail option for this case.

  1. It is a good idea. I can do it but first I would like to finish current PR.

Leak tracking seems like more of a correctness issue than performance. I know you already did the hard part of the work, but it might be simpler and less work for this repo to only track raw allocations. To diagnose leaks, users could load the ETL file in perfview.

  1. I think that @adamsitnik or @AndreyAkinshin should comment on this point. I will not have a problem removing this feature. But for me, the best solution will be tracking memory leaks without information about type. In this case, I could remove a big part of the code.
@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Jun 21, 2019

I see that appveyor build on master does not pass. There are some failing tests:
image

Copy link
Member

adamsitnik left a comment

@wojtpl2 another amazing PR! I am really impressed and also suprised that it was possible to implement this diagnoser.

Could you please respond to my questions? The most important thing is to make the usage of heapSession optional. As soon as you fix it I am going to merge it. I think that it's going to be the key feature of our next release.

@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Jul 11, 2019

I will fix it today. Thanks for review.

fix
@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Jul 12, 2019

for netcoreapp2.1:
image
image

but I don't know yet why it doesn't work for net461:
dotnet run -c Release -f net461 -- --filter *Bitmap
image

@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Jul 12, 2019

If I comment line:
TraceEventSession.MergeInPlace(FilePath, TextWriter.Null);
where e.g.
FilePath = BenchmarkDotNet.Samples.IntroNativeMemory.Bitmap-20190712-115234.etl
and then I merge collected files using PerfView, everything works ok. It means that EtwProfiler collected data from heapSession correctly. I do not know what's happening yet.

Copy link
Member

adamsitnik left a comment

@wojtpl2 thank you for all the fixes! I have left some nit comments, please let me know if you have time for fixing them.

I do not know what's happening yet.

Were you able to solve the problem of merging the trace files?

@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Jul 16, 2019

I was thinking that it was a problem with merging. I checked it again and the problem was in kernel session.

To EnableWindowsHeapProvider function I send name with the path but I should have sent only name with extension.

internal override Session EnableProviders()
        {
-            var osHeapExe = Path.ChangeExtension(Details.Process.StartInfo.FileName, ".exe");
+            var osHeapExe = Path.GetFileName(Path.ChangeExtension(Details.Process.StartInfo.FileName, ".exe"));
            TraceEventSession.EnableWindowsHeapProvider(osHeapExe);
            return this;
        }
@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Jul 16, 2019

@adamsitnik It is ready to review. I have fixed all your suggestion and I have sent some optimization (please check #1131 (comment))

I will send the documentation in the next PR if this one is merged.

@adamsitnik

This comment has been minimized.

Copy link
Member

adamsitnik commented Jul 22, 2019

I've just tested it and it works great! :shipit:

@adamsitnik adamsitnik merged commit e92abf8 into dotnet:master Jul 22, 2019
4 checks passed
4 checks passed
WIP Ready for review
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla All CLA requirements met.
Details
@WojciechNagorski WojciechNagorski deleted the WojciechNagorski:native_memory branch Jul 22, 2019
@WojciechNagorski

This comment has been minimized.

Copy link
Collaborator Author

WojciechNagorski commented Jul 22, 2019

Thanks! I'm very happy. I've learned a lot!
Now I'm going to update the documentation in the next PR.

@adamsitnik

This comment has been minimized.

Copy link
Member

adamsitnik commented Jul 22, 2019

Thanks! I'm very happy. I've learned a lot!

Awesome! Once again big thanks.

Now I'm going to update the documentation in the next PR.

Could you please mention that if benchmark has a native memory leak it should be using Job.Short? (I've run into an issue with that when I was testing it)

Could you also consider implementing the IProfiler for it and adding the console argument support? Sth like:

--profiler nativememory
@AndreyAkinshin AndreyAkinshin added this to the v0.11.6 milestone Jul 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.