
For large benchmarks report exporting is very, very slow and has a massive Gen2 heap #119

Closed
nietras opened this issue Mar 16, 2016 · 23 comments

@nietras
Contributor

nietras commented Mar 16, 2016

It has been running at 100% CPU for hours now, with:

[image]

And the Gen2 heap can be seen here:

[image]

@adamsitnik
Member

For hours? Sounds interesting. I will take a look at it right away.

@adamsitnik adamsitnik self-assigned this Mar 16, 2016
@adamsitnik
Member

@nietras do you use the default Config?

@nietras
Contributor Author

nietras commented Mar 16, 2016

No. And I'm using a param with a lot of values.

You can see the benchmark here:
https://github.com/DotNetCross/Memory.Copies/blob/master/src/DotNetCross.Memory.Copies.Benchmarks/CopiesBenchmark.cs


@nietras
Contributor Author

nietras commented Mar 16, 2016

Still running, by the way ;P It looks a lot like an N^2 problem or something. It has printed the results to the console now, but it still consumes 100% CPU.

[image]

@adamsitnik
Member

I know that it might sound stupid, but could you take a look at the generated file list, then order it by modification/creation time and try to tell if any particular file type takes longer to create?

I have created a huge Summary to export with AutoFixture and will profile the default exporters now.

@nietras
Contributor Author

nietras commented Mar 16, 2016

Hope this is what you need. Let me know if you need more. It's still running so...

[image]

@mattwarren
Contributor

@nietras Could you post the ".log" and the "-report.txt" file somewhere, so we can take a look?

@mattwarren
Contributor

@nietras Also, if you swap the params to [Params(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)], does the issue go away? I.e. is this issue triggered by a large # of Params?
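
For reference, a minimal repro sketch of that swap (the class and parameter names here are illustrative, not the actual CopiesBenchmark.cs; assuming the usual BenchmarkDotNet attribute placement):

using BenchmarkDotNet.Attributes;

public class CopiesBenchmarkRepro
{
    [Params(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
    public int Bytes { get; set; }          // illustrative parameter name

    [Benchmark]
    public void ArrayCopy()
    {
        // copy 'Bytes' bytes here, as in the real benchmark
    }
}

If export time drops roughly with the square of the number of cases, that points at an O(n^2) step in the exporters.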

@nietras
Contributor Author

nietras commented Mar 17, 2016

@mattwarren I'm pretty sure this is related to the number of test cases, i.e. # methods * # runtimes * # param values, etc. You can easily try this yourself by cloning the GitHub repo and running it:

https://github.com/DotNetCross/Memory.Copies.git

@adamsitnik
Member

@mattwarren I created a new branch where I put all the optimizations: link.

I did some profiling yesterday, but I was not able to reproduce this behaviour (100% CPU for exporting). However, I have fixed some perf issues.

Later I will try to profile the exact case of @nietras. But this will take hours ;)

@mattwarren
Contributor

@mattwarren I created a new branch where I put all the optimizations: link.

Cool, I'll take a look when I get a chance.

Later I will try to profile the exact case of @nietras. But this will take hours ;)

I wonder if this code here is the culprit (just a guess, haven't profiled it). If report.AllMeasurements is large, all the Select, Distinct and Where could be expensive.

@nietras
Contributor Author

nietras commented Mar 17, 2016

this code here is the culprit (just a guess, haven't profiled it). If report.AllMeasurements is large, all the Select, Distinct and Where could be expensive.

Yeah, that would be bad, since Distinct, as far as I know, pretty much builds a full HashSet of the values and then yield-returns each value that isn't already in the set. Since most measurements are from the actual run (I presume), you end up copying a lot of memory, I think. LINQ just sucks from a perf perspective in my view. It's so damn easy to waste CPU cycles with it ;)
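
For context, a simplified sketch of roughly how Enumerable.Distinct behaves with the default comparer (not the actual BCL source): it buffers every value yielded so far in a HashSet, so re-running it inside a Where predicate over n items costs O(n) per call and O(n^2) overall.

using System.Collections.Generic;

static class LinqSketch
{
    // Only the first occurrence of each value is yielded; everything seen so far
    // stays alive in the HashSet until enumeration completes.
    public static IEnumerable<T> DistinctSketch<T>(this IEnumerable<T> source)
    {
        var seen = new HashSet<T>();
        foreach (var item in source)
            if (seen.Add(item))        // Add returns false for duplicates
                yield return item;
    }
}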

Later I will try to profile the exact case of @nietras. But this will take hours ;)

Hey, I would have thought you would have used BenchmarkDotNet to benchmark BenchmarkDotNet. Dogfooding at its best ;)

Anyway, I put the requested log files up in a gist: https://gist.github.com/nietras/41408f2e8f2b94478bf5

Also if you do run the copies benchmark I would love to see the actual results of that test if possible.

@adamsitnik
Member

@nietras sure! I will share the results when I have them.

Thanks for sharing the log file! There are 2394 benchmarks; I suspect we do something O(n^2) over them multiple times ;)

@nietras
Contributor Author

nietras commented Mar 17, 2016

Yep, that would be my guess. Great and thanks.

@adamsitnik
Member

Profiler's trap:

[image]

My first thought was to blame string.Join for the cost, but then I realized that the biggest part of it is probably the price of the LINQ query ;)

@mattwarren
Contributor

My first thought was to blame string.Join for the cost, but then I realized that the biggest part of it is probably the price of the LINQ query ;)

Yay deferred-execution!

@nietras
Contributor Author

nietras commented Mar 18, 2016

Yeah, the LINQ stuff looks pretty bad. Where + Select/Distinct => N^2. And all for a Count() > 1 check.

Enumerable.Range(0, n)
    .Where(i => ownProperties[i].Value != defaultJobProperties[i].Value &&
                allJobs.Select(j => j.GetAllProperties().ToArray()[i].Value).Distinct().Count() > 1)
    .Select(i => ownProperties[i]);

But good that you found the probable cause.
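
One possible shape of a fix, reusing the variables from the snippet above (a sketch only, not necessarily what the 'perf' branch does): materialize each job's property array once and precompute the per-index distinct check, so the inner Select/Distinct no longer reruns for every property index.

// Materialize the property arrays once instead of inside the Where predicate.
var allJobProperties = allJobs
    .Select(j => j.GetAllProperties().ToArray())
    .ToArray();

// One pass per property index to decide whether the jobs disagree on it.
var hasMultipleValues = new bool[n];
for (int i = 0; i < n; i++)
    hasMultipleValues[i] = allJobProperties
        .Select(props => props[i].Value)
        .Distinct()
        .Count() > 1;

var interestingProperties = Enumerable.Range(0, n)
    .Where(i => ownProperties[i].Value != defaultJobProperties[i].Value
                && hasMultipleValues[i])
    .Select(i => ownProperties[i]);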

@adamsitnik
Member

I have probably finished. For my test case I went down from 40s to 6s. Now logging is almost only about logging ;)

[image]

I just got two nice, identical VMs from a friend and will run the @nietras case on both: 0.9.3 vs my updated version of BDN. After I get this done I will post a PR.

@mattwarren
Contributor

Good job, thanks for doing that!

@nietras
Contributor Author

nietras commented Mar 18, 2016

Awesome! One thing, though: what about the memory usage? 1.6GB on Gen2 is a lot. I don't understand how it can even get close to that much RAM; given 2394 benchmarks with 5 launches (I hope I am using the right terminology), how does it end up with all that RAM? Also, compared to the file outputs, where the largest is 20MB, it seems very high.

I probably don't understand exactly how you do the measurements; I have been trying to find an explanation of exactly how, but couldn't find any, and then looked at the code, but I'm not sure I understand it fully. My guess is that each launch (i.e. MainTarget) measures the time it takes to do X iterations. This is done for Y launches for each test case, and the test cases are # methods * # params * # runtimes/archs, so in total it is # launches * # methods * # params * # runtimes/archs, right? (Of course there are pilot, idle, and warmup runs too.)
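
Back-of-the-envelope, with purely hypothetical counts (the real iteration settings may differ), just to show how the case and measurement counts multiply:

using System;

int methods = 10, paramValues = 120, runtimes = 2;       // hypothetical, not the real repo numbers
int benchmarkCases = methods * paramValues * runtimes;   // 2,400 cases, same ballpark as the 2394 in the log

int launches = 5, iterationsPerLaunch = 30;              // hypothetical pilot/warmup/target iterations per launch
long measurements = (long)benchmarkCases * launches * iterationsPerLaunch;

Console.WriteLine(measurements);                         // 360,000 measurements
// Even at a few hundred bytes per measurement that is only tens of MB, so a
// 1.6 GB Gen2 heap suggests the exporters' intermediate strings and collections
// rather than the raw measurements themselves.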

@adamsitnik
Member

@nietras Good news! I started running the same benchmarks on 0.9.3 and on my tuned version, on two equally powerful separate VMs (Processor=Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, ProcessorCount=8). The updated version lasted 3 hours and 9 minutes, and export took only 2 minutes. The unchanged version is still running.

Update: the 0.9.3 run just finished; export took 2 hours 46 minutes, and the total was 5 hours 51 minutes.

Results

Could you give me some comments on the results? I mean, is everything in place, displayed correctly, etc.?

I am familiar only with compilation, execution, and now logging. @AndreyAkinshin can give you some explanation of the calculations, but he is on a business trip now, so I'm not sure when he will be able to respond.

@nietras
Contributor Author

nietras commented Mar 19, 2016

@adamsitnik that sounds really great. A 166 min / 2 min ≈ 83× (8300%) improvement is not bad :) I did a quick WinMerge diff on the file you uploaded and, besides different times, everything looked OK. I guess you've added some new features, like bolding the first row for a given parameter set, plus other features and numeric ordering based on parameters.

I've run the following to enable HPET on my Windows machines; apparently it is disabled by default:

bcdedit /set useplatformclock true

You need to reboot afterwards.
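
To revert to the default clock source later, run bcdedit /deletevalue useplatformclock and reboot again.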

I have tried cloning and building the dev branch, but neither it nor the master branch works out of the box :( Is there anything special you need to do in order to compile? I just opened the .sln with VS2015 Update 1 and tried building, but package restore fails, for example.

@adamsitnik
Member

My changes are on the 'perf' branch. The building guide is here. You probably don't have the dnx toolchain installed. If you have any problems with the compilation, you can post them on the gitter.

You could probably work around this bug for now by disabling most of the exporters. By default 8 are enabled, and the CSV-based ones were the slowest. If you create a new class that implements IConfig, you can use just a single exporter, the one that you need.
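
A rough sketch of such a config (assuming the ManualConfig base class and the exporter/logger names available around 0.9.x; exact member names may differ in your version, and you may also need to add the default columns):

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Loggers;

public class SingleExporterConfig : ManualConfig
{
    public SingleExporterConfig()
    {
        Add(Job.Default);                // whichever jobs you actually run
        Add(ConsoleLogger.Default);      // keep console output
        Add(MarkdownExporter.Default);   // a single exporter instead of all eight
    }
}

// Usage: BenchmarkRunner.Run<CopiesBenchmark>(new SingleExporterConfig());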
