[ Question ] Reduce memory consumption of CoreCLR #7694
Comments
Don't allocate 😄
Hi Ruben, do you have any profiling results?
Hi SaeHie, yes, we have profiled several Xamarin GUI applications on Tizen Mobile. A typical profile of CoreCLR's memory consumption in these GUI applications is the following:
Compiler itself or generated code?
Yes, this is the memory for compilation itself, not including the size of the JIT-compiled code (the code's size is accounted for under "Code heap").
This memory should be transient: it is not needed once the JIT is done JITing. The JIT keeps some of it around to avoid asking the OS for it again and again. Is the 1.7 MB number the high watermark, or do you see it kept around permanently? The JIT should need less than 100 kB to JIT most methods. You may want to look at which (large?) methods take a large amount of memory to JIT, and do something about them.
This is not necessarily the right answer for optimizing the fixed footprint that this issue is about. The techniques that avoid allocations (generics, etc.) often make the fixed footprint worse than just writing simple code that allocates a bit of temporary garbage.
Excellent! It is always good to start performance investigation with a measurement.
We do have a prior art here: The server GC vs. workstation GC setting is exactly that. The server GC is higher performance, but it has higher memory consumption as well. We can discuss other similar switches like this.
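The server vs. workstation GC switch mentioned above is controlled by existing configuration knobs. A minimal sketch using the `COMPlus_*` environment variables of that era (later renamed to `DOTNET_*`); `MyApp.dll` is a placeholder for the application being measured:

```shell
# Workstation GC: lower memory consumption, the default for client apps.
export COMPlus_gcServer=0

# Server GC: higher throughput, but higher memory consumption.
# export COMPlus_gcServer=1

dotnet MyApp.dll   # MyApp.dll is a placeholder application
```

The same choice can also be made per application in the project or runtime configuration rather than via the environment.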
These two are obviously the buckets to focus on. For optimizing the footprint of mapped assembly images, you may take a look at https://github.com/mono/linker - @russellhadley and @erozenfeld are looking into using the mono linker for .NET Core.
Thanks for sharing the results!
Thank you very much for your comments. We clarified the measurements. We also need to add some comments about them:
@seanshpark, @jkotas, please see the clarified measurements below.
Do we understand correctly that the differences in memory distribution between ReadyToRun and Fragile mode are caused by storing pre-initialized data in the Fragile format? Could you please point us to some documentation or places in the code base that explain the difference?
I think so.
The pre-initialized data structures in the Fragile format have a lot of pointers that need to be updated; this is called "restoring" in the code (e.g. look for …). Creating the data structures at runtime on demand gives you dense packing for free: the private pages contain just the data structures that are needed. The pre-initialized data structures in the fragile images do not have this property (e.g. the program may need only a 100-byte data structure from a given page, but the whole 4 kB page becomes private memory).
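The page-granularity effect described above can be made concrete with a small sketch. The `private_bytes` helper below is hypothetical, written for illustration only, and assumes 4 kB pages:

```python
PAGE_SIZE = 4096  # typical page size on Linux x86-64


def private_bytes(regions, page_size=PAGE_SIZE):
    """Bytes of private memory charged for the touched (offset, size) regions.

    Writing to any byte of a page (e.g. to "restore" a pointer) makes the
    whole page private, so the cost is counted at page granularity rather
    than byte granularity.
    """
    pages = set()
    for offset, size in regions:
        first = offset // page_size
        last = (offset + size - 1) // page_size
        pages.update(range(first, last + 1))
    return len(pages) * page_size


# A 100-byte data structure in the middle of a page still costs a full 4 kB:
print(private_bytes([(1000, 100)]))  # 4096
```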
@jkotas, thank you for the information!
@ruben-ayrapetyan, as I read it this is answered now; please reopen if not.
We have performed an initial comparison of CoreCLR and CoreRT from the viewpoint of memory consumption, using benchmarks from http://benchmarksgame.alioth.debian.org. The initial measurements show that CoreCLR consumes approximately … Particularly, … As far as we currently see, the difference in memory consumption is mostly related to differences in GC heuristics. Do we see correctly that the main cause of the difference is related to GC? cc @lemmaa @egavrin @Dmitri-Botcharnikov @sergign60 @BredPet @gbalykov @kvochko
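For memory-consumption measurements like the above on Linux, one common approach is to sum the `Private_Dirty` fields of `/proc/<pid>/smaps`. A minimal sketch; the `sum_smaps_field` helper is hypothetical:

```python
import re


def sum_smaps_field(smaps_text, field="Private_Dirty"):
    """Sum a per-mapping field (reported in kB) from /proc/<pid>/smaps text."""
    total_kb = 0
    for line in smaps_text.splitlines():
        m = re.match(rf"{field}:\s+(\d+) kB", line)
        if m:
            total_kb += int(m.group(1))
    return total_kb


# Usage on Linux, for some process id `pid`:
# with open(f"/proc/{pid}/smaps") as f:
#     print(sum_smaps_field(f.read()), "kB private dirty")
```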
Unfortunately, it does not explain why we see performance improvements on memory-intensive benchmarks like … Launch time is better on CoreRT, obviously: ~45% faster with CoreRT.
GC PAL is incomplete in CoreRT - the performance-related parts are missing:
@jkotas, thank you very much for the advice. We checked CoreCLR with concurrent GC turned off. In this configuration, CoreCLR consumes …
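For reference, concurrent GC could be disabled at the time via an environment variable; a configuration sketch (`MyApp.dll` is a placeholder for the measured application):

```shell
# Disable concurrent (background) GC for the next dotnet process:
export COMPlus_gcConcurrent=0
dotnet MyApp.dll   # placeholder application
```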
You may be running into dotnet/corert#3784. These kinds of differences between CoreCLR and CoreRT are a point-in-time problem. The GC perf characteristics should be within noise between CoreCLR and CoreRT by the time we are done.
Hello.
I am wondering about possible ways to reduce memory consumption of CoreCLR.
Do you have any ideas about how it is possible to reduce the working set size?
Please share any related ideas, as well as general opinions about this direction of development.
By the way, is there any defined set of rules for choosing between higher performance and lower memory consumption?
Is it an accepted practice to add compile-time or runtime switches that allow choosing between the two options?