
For large benchmarks report exporting is very, very slow and has a massive Gen2 heap #119

Closed
nietras opened this issue Mar 16, 2016 · 23 comments

@nietras
Contributor

nietras commented Mar 16, 2016

It has been running at 100% CPU for hours now, with:

[image]

And the Gen2 heap can be seen here:

[image]

@adamsitnik
Member

For hours? Sounds interesting. I will take a look at it right away.

@adamsitnik adamsitnik self-assigned this Mar 16, 2016
@adamsitnik
Member

@nietras do you use the default Config?

@nietras
Contributor Author

nietras commented Mar 16, 2016

No. And I'm using a param with a lot of values.

You can see the benchmark here:
https://github.com/DotNetCross/Memory.Copies/blob/master/src/DotNetCross.Memory.Copies.Benchmarks/CopiesBenchmark.cs


@nietras
Contributor Author

nietras commented Mar 16, 2016

Still running, by the way ;P It looks a lot like an N^2 problem or something. It has printed the results to the console now, but it still consumes 100% CPU.

[image]

@adamsitnik
Member

I know that it might sound stupid, but could you take a look at the generated file list, then order it by modification/creation time and try to tell if any particular file type takes longer to create?

I have created a huge Summary to export with AutoFixture and will profile the default exporters now.

@nietras
Contributor Author

nietras commented Mar 16, 2016

Hope this is what you need. Let me know if you need more. It's still running so...

[image]

@mattwarren
Contributor

@nietras Could you post the ".log" and the "-report.txt" file somewhere, so we can take a look?

@mattwarren
Contributor

@nietras Also, if you swap the params to [Params(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)], does the issue go away? I.e. is this issue triggered by a large # of Params?
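
For reference, a minimal repro sketch of that swap (the class and parameter names here are illustrative, not the actual CopiesBenchmark.cs; assuming the usual BenchmarkDotNet attribute placement):

using BenchmarkDotNet.Attributes;

public class CopiesBenchmarkRepro
{
    [Params(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
    public int Bytes { get; set; }          // illustrative parameter name

    [Benchmark]
    public void ArrayCopy()
    {
        // copy 'Bytes' bytes here, as in the real benchmark
    }
}

If export time drops roughly with the square of the number of cases, that points at an O(n^2) step in the exporters.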

@nietras
Contributor Author

nietras commented Mar 17, 2016

@mattwarren I'm pretty sure this is related to the number of test cases, i.e. # methods * # runtimes * # param values, etc. You can easily try this yourself by cloning the GitHub repo and running it:

https://github.com/DotNetCross/Memory.Copies.git

@adamsitnik
Member

@mattwarren I created a new branch where I put all the optimizations: link.

I did some profiling yesterday, but I was not able to reproduce this behaviour (100% CPU for exporting). However, I have fixed some perf issues.

Later I will try to profile the exact case of @nietras. But this will take hours ;)

@mattwarren
Contributor

@mattwarren I created a new branch where I put all the optimizations: link.

Cool, I'll take a look when I get a chance.

Later I will try to profile the exact case of @nietras. But this will take hours ;)

I wonder if this code here is the culprit (just a guess, haven't profiled it). If report.AllMeasurements is large, all the Select, Distinct and Where could be expensive.

@nietras
Contributor Author

nietras commented Mar 17, 2016

this code here is the culprit (just a guess, haven't profiled it). If report.AllMeasurements is large, all the Select, Distinct and Where could be expensive.

Yeah, that would be bad, since Distinct, as far as I know, pretty much builds a full HashSet of the values and then yield-returns each value that isn't already in the set. Since most measurements are from the actual run (I presume), you end up copying a lot of memory, I think. LINQ just sucks from a perf perspective in my view. It's so damn easy to waste CPU cycles with it ;)
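
For context, a simplified sketch of roughly how Enumerable.Distinct behaves with the default comparer (not the actual BCL source): it buffers every value yielded so far in a HashSet, so re-running it inside a Where predicate over n items costs O(n) per call and O(n^2) overall.

using System.Collections.Generic;

static class LinqSketch
{
    // Only the first occurrence of each value is yielded; everything seen so far
    // stays alive in the HashSet until enumeration completes.
    public static IEnumerable<T> DistinctSketch<T>(this IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (var item in source)
            if (seen.Add(item))        // Add returns false for duplicates
                yield return item;
    }
}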

Later I will try to profile the exact case of @nietras. But this will take hours ;)

Hey, I would have thought you would have used BenchmarkDotNet to benchmark BenchmarkDotNet. Dogfooding at its best ;)

Anyway, I put the requested log files up in a gist: https://gist.github.com/nietras/41408f2e8f2b94478bf5

Also if you do run the copies benchmark I would love to see the actual results of that test if possible.

@adamsitnik
Member

@nietras sure! I will share the results when I have them.

Thanks for sharing the log file! There are 2394 benchmarks; I suspect we do something O(n^2) over them multiple times ;)

@nietras
Contributor Author

nietras commented Mar 17, 2016

Yep, that would be my guess. Great and thanks.

@adamsitnik
Member

Profiler's trap:

[image]

My first thought was to blame string.Join for the cost, but then I realized that the biggest part of it is probably the price of the LINQ query ;)

@mattwarren
Contributor

My first thought was to blame string.Join for the cost, but then I realized that the biggest part of it is probably the price of the LINQ query ;)

Yay deferred-execution!

@nietras
Contributor Author

nietras commented Mar 18, 2016

Yeah, the LINQ stuff looks pretty bad. Where + Select/Distinct => N^2. And all for a Count() > 1 check.

Enumerable.Range(0, n)
    .Where(i => ownProperties[i].Value != defaultJobProperties[i].Value &&
                allJobs.Select(j => j.GetAllProperties().ToArray()[i].Value).Distinct().Count() > 1)
    .Select(i => ownProperties[i]);

But good that you found the probable cause.
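
One possible shape of a fix, reusing the variables from the snippet above (a sketch only, not necessarily what the 'perf' branch does): materialize each job's property array once and precompute the per-index distinct check, so the inner Select/Distinct no longer reruns for every property index.

// Materialize the property arrays once instead of inside the Where predicate.
var allJobProperties = allJobs
    .Select(j => j.GetAllProperties().ToArray())
    .ToArray();

// One pass per property index to decide whether the jobs disagree on it.
var hasMultipleValues = new bool[n];
for (int i = 0; i < n; i++)
    hasMultipleValues[i] = allJobProperties
        .Select(props => props[i].Value)
        .Distinct()
        .Count() > 1;

var interestingProperties = Enumerable.Range(0, n)
    .Where(i => ownProperties[i].Value != defaultJobProperties[i].Value
                && hasMultipleValues[i])
    .Select(i => ownProperties[i]);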

@adamsitnik
Member

I have probably finished. For my test case I went down from 40s to 6s. Now logging is almost only about logging ;)

[image]

I just got two nice, identical VMs from a friend and will run the @nietras case on both: 0.9.3 vs my updated version of BDN. After I get this done I will post a PR.

@mattwarren
Contributor

Good job, thanks for doing that!

@nietras
Contributor Author

nietras commented Mar 18, 2016

Awesome! One thing, though: what about the memory usage? 1.6GB on Gen2 is a lot. I don't understand how it can even get close to that much RAM; given 2394 benchmarks with 5 launches (I hope I am using the right terminology), how does it end up with all that RAM? Also, compared to the file outputs, where the largest is 20MB, it seems very high.

I probably don't understand exactly how you do the measurements; I have been trying to find an explanation of exactly how, but couldn't find any, and then looked at the code, but I'm not sure I understand it fully. My guess is that each launch (i.e. MainTarget) measures the time it takes to do X iterations. This is done for Y launches for each test case, and the test cases are # methods * # params * # runtimes/archs, so in total it is # launches * # methods * # params * # runtimes/archs, right? (Of course there are pilot, idle, and warmup runs too.)
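
Back-of-the-envelope, with purely hypothetical counts (the real iteration settings may differ), just to show how the case and measurement counts multiply:

using System;

int methods = 10, paramValues = 120, runtimes = 2;       // hypothetical, not the real repo numbers
int benchmarkCases = methods * paramValues * runtimes;   // 2,400 cases, same ballpark as the 2394 in the log

int launches = 5, iterationsPerLaunch = 30;              // hypothetical pilot/warmup/target iterations per launch
long measurements = (long)benchmarkCases * launches * iterationsPerLaunch;

Console.WriteLine(measurements);                         // 360,000 measurements
// Even at a few hundred bytes per measurement that is only tens of MB, so a
// 1.6 GB Gen2 heap suggests the exporters' intermediate strings and collections
// rather than the raw measurements themselves.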

@adamsitnik
Member

@nietras Good news! I started running the same benchmarks on 0.9.3 and on my tuned version, on two equally powerful separate VMs (Processor=Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, ProcessorCount=8). The updated version lasted 3 hours and 9 minutes, and export took only 2 minutes. The unchanged version is still running.

Update: the 0.9.3 run just finished; export took 2 hours 46 minutes, and the total was 5 hours 51 minutes.

Results

Could you give me some comments on the results? I mean, is everything in place, displayed correctly, etc.?

I am familiar only with compilation, execution, and now logging. @AndreyAkinshin can give you some explanation of the calculations, but he is on a business trip now, so I'm not sure when he will be able to respond.

@nietras
Contributor Author

nietras commented Mar 19, 2016

@adamsitnik that sounds really great. A 166 min / 2 min ≈ 83× (8300%) improvement is not bad :) I did a quick WinMerge diff on the file you uploaded and, besides different times, everything looked OK. I guess you've added some new features, like bolding the first row for a given parameter set, plus other features and numeric ordering based on parameters.

I've run the following to enable HPET on my Windows machines; apparently it is disabled by default:

bcdedit /set useplatformclock true

You need to reboot afterwards.
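
To revert to the default clock source later, run bcdedit /deletevalue useplatformclock and reboot again.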

I have tried cloning and building the dev branch, but neither it nor the master branch works out of the box :( Is there anything special you need to do in order to compile? I just opened the .sln with VS2015 Update 1 and tried building, but package restore fails, for example.

@adamsitnik
Member

My changes are on the 'perf' branch. The building guide is here. You probably don't have the dnx toolchain installed. If you have any problems with the compilation, you can post them on the gitter.

You could probably work around this bug for now by disabling most of the exporters. By default 8 are enabled, and the CSV-based ones were the slowest. If you create a new class that implements IConfig, you can use just a single exporter, the one that you need.
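
A rough sketch of such a config (assuming the ManualConfig base class and the exporter/logger names available around 0.9.x; exact member names may differ in your version, and you may also need to add the default columns):

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Loggers;

public class SingleExporterConfig : ManualConfig
{
    public SingleExporterConfig()
    {
        Add(Job.Default);                // whichever jobs you actually run
        Add(ConsoleLogger.Default);      // keep console output
        Add(MarkdownExporter.Default);   // a single exporter instead of all eight
    }
}

// Usage: BenchmarkRunner.Run<CopiesBenchmark>(new SingleExporterConfig());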
