
[ Question ] Reduce memory consumption of CoreCLR #7694

Closed
ruben-ayrapetyan opened this issue Mar 22, 2017 · 17 comments
Labels
design-discussion Ongoing discussion about design without consensus

Comments

@ruben-ayrapetyan
Contributor

Hello.

I am wondering about possible ways to reduce the memory consumption of CoreCLR.
Do you have any ideas on how the working set size could be reduced?
Please share any related ideas, as well as general opinions about this direction of development.

By the way, is there a defined set of rules for choosing between higher performance and lower memory consumption?
Is it an accepted practice to add compile-time or runtime switches that allow choosing between the two?

@davidfowl
Member

Don't allocate 😄

@seanshpark
Contributor

Hi Ruben, do you have any profiling results?

@ruben-ayrapetyan
Contributor Author

ruben-ayrapetyan commented Mar 22, 2017

Hi SaeHie,

Yes, we have profiled several Xamarin GUI applications on Tizen Mobile.

A typical profile of CoreCLR's memory in these GUI applications is the following:

  1. Mapped assembly images - 4.2 megabytes (50%)
  2. JIT-compiler's memory - 1.7 megabytes (20%)
  3. Execution engine - about 1 megabyte (11%)
  4. Code heap - about 1 megabyte (11%)
  5. Type information - about 0.5 megabyte (6%)
  6. Objects heap - about 0.2 megabyte (2%)

@egavrin

egavrin commented Mar 22, 2017

JIT-compiler memory - 1.7 megabytes (20%)

Compiler itself or generated code?

@ruben-ayrapetyan
Contributor Author

ruben-ayrapetyan commented Mar 22, 2017

JIT-compiler memory - 1.7 megabytes (20%)

Compiler itself or generated code?

Yes, the memory for the compilation itself, not including the size of the JIT-compiled code (the code's size is accounted for under "Code heap").

@jkotas
Member

jkotas commented Mar 22, 2017

Yes, the memory for compilation itself

This memory should be transient. It is not needed once the JIT is done JITing. The JIT keeps some of it around to avoid asking the OS for it again and again. Is the 1.7MB number the high watermark, or do you see it kept around permanently?

The JIT should need less than 100kB to JIT most methods. You may want to look at which (large?) methods take a large amount of memory to JIT, and do something about them.

@jkotas
Member

jkotas commented Mar 22, 2017

Don't allocate :-)

This is not necessarily the right answer to optimize the fixed footprint that this issue is about. The techniques to avoid allocations (generics, etc.) often make the fixed footprint worse than just writing a simple code that allocates a bit of temporary garbage.

Typical profile of CoreCLR's memory on the GUI applications

Excellent! It is always good to start performance investigation with a measurement.

Higher performance and lower memory consumption? Is it an accepted practice to add compile-time or runtime switches, which allow to switch between the two options?

We do have prior art here: the server GC vs. workstation GC setting is exactly that. The server GC has higher performance, but higher memory consumption as well. We can discuss other similar switches like this.
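
For reference, this particular switch is exposed through the standard CoreCLR configuration environment variables; a minimal sketch (`app.dll` is a placeholder name):

```shell
# Workstation GC (the default): lower memory consumption.
COMPlus_gcServer=0 dotnet app.dll

# Server GC: higher throughput, but higher memory consumption.
COMPlus_gcServer=1 dotnet app.dll
```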

Mapped assembly images - 4.2 megabytes (50%)
JIT-compiler's memory - 1.7 megabytes (20%)

These two are obviously the buckets to focus on. For optimizing the footprint of mapped assembly images, you may want to take a look at https://github.com/mono/linker - @russellhadley and @erozenfeld are looking into using the mono linker for .NET Core.

@seanshpark
Contributor

Yes, we have profiled several Xamarin GUI applications on Tizen Mobile.

Thanks for sharing the results!

@ruben-ayrapetyan
Contributor Author

ruben-ayrapetyan commented Apr 3, 2017

@jkotas,

Thank you very much for your comments.

We clarified the measurements.

A few comments about them:

  • The measurements were performed with assemblies precompiled in the ReadyToRun format, which currently isn't the default in Tizen. When the Fragile format is used, the distribution of memory consumption looks quite different.
  • The measurements show the "Private" memory usage of the process, i.e. only the part that is not shared with other processes; the "Shared" part is not accounted for at all. Most of the "Mapped assembly images" bucket above is "Private_Clean" (unmodified) memory, which automatically becomes shared as soon as the same assembly is mapped into another process. So the actual per-application consumption of mapped assembly images is much lower in ReadyToRun mode. Please see the new measurements below.
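
For context, "Private" numbers of this kind can be collected from `/proc/<pid>/smaps` on Linux; a minimal sketch (the pid is a placeholder, and this sums all mappings rather than splitting them into the CoreCLR components above):

```shell
# Sum Private_Clean + Private_Dirty across all mappings of a process.
# Replace 1234 with the pid of the application under test.
pid=1234
awk '/^Private_(Clean|Dirty):/ { kb += $2 }
     END { print kb " kB private" }' "/proc/$pid/smaps"
```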

@seanshpark, @jkotas, please see the clarified measurements below.
The following measurements are for the Puzzle sample application (https://developer.tizen.org/sites/default/files/documentation/puzzle2.zip), started alongside another .NET application (so mapped files are mostly shared).

"ReadyToRun mode" means the Tizen-default set of precompiled assemblies is in ReadyToRun format.
"Fragile mode" means the Tizen-default set of precompiled assemblies is in Fragile format (the format currently used in Tizen).
The values in the cells represent the "Private" (per-application) memory consumption of CoreCLR.

| Component | ReadyToRun mode | Fragile mode |
| --- | --- | --- |
| Mapped assembly images | 1921 kilobytes (37%) | 5130 kilobytes (76%) |
| Execution engine | 1309 kilobytes (25.2%) | 795 kilobytes (11.8%) |
| Objects heap | 690 kilobytes (13.3%) | 506 kilobytes (7.5%) |
| Code heap | 549 kilobytes (10.5%) | 119 kilobytes (1.7%) |
| Type information | 654 kilobytes (12.6%) | 106 kilobytes (1.5%) |
| JIT-compiler's memory | 64 kilobytes (1.2%) | 64 kilobytes (0.9%) |
| Total | 5187 kilobytes (100%) | 6720 kilobytes (100%) |

Do we understand correctly that the differences in memory distribution between ReadyToRun and Fragile mode are caused by the pre-initialized data stored in the Fragile format? Could you please point us to documentation or places in the code base that explain the difference?

@jkotas
Member

jkotas commented Apr 3, 2017

differences in memory distribution between ReadyToRun and Fragile mode are caused by storing preinitialised data in the Fragile format?

I think so.

documentation or places in code base that could explain the difference?

The pre-initialized data structures in the Fragile format have a lot of pointers that need to be updated. This is called "restoring" in the code; e.g. look for MethodTable::Restore. Updating the pointers produces the private memory pages.

Creating the data structures at runtime on demand gives you dense packing for free: the private pages contain just the data structures needed. The pre-initialized data structures in fragile images do not have this property (e.g. the program may need only a 100-byte data structure from a given page, but the whole 4k page becomes private memory).

@ruben-ayrapetyan
Contributor Author

@jkotas, thank you for the information!

@danmoseley
Member

@ruben-ayrapetyan as I read it, this is answered now; please reopen if not.

@ruben-ayrapetyan
Contributor Author

ruben-ayrapetyan commented Jul 5, 2017

@jkotas,

We have performed an initial comparison of CoreCLR and CoreRT from the viewpoint of memory consumption, using benchmarks from http://benchmarksgame.alioth.debian.org.

The initial measurements show that, on average, CoreCLR consumes approximately 41% more memory than CoreRT and is approximately 4% slower (x64 release build).

In particular, the binary-trees benchmark (http://benchmarksgame.alioth.debian.org/u64q/program.php?test=binarytrees&lang=csharpcore&id=5) shows the following:

  • Peak RSS on CoreCLR: about 1.5 gigabytes
  • Peak RSS on CoreRT: about 1 gigabyte
  • Running time on CoreCLR: about 46.7 seconds
  • Running time on CoreRT: about 29.6 seconds
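
For what it's worth, peak-RSS figures like these can be captured with GNU time's verbose output; a sketch, assuming the benchmark has been published as `binarytrees.dll` (a placeholder name):

```shell
# "Maximum resident set size" in the GNU time report is the
# process's peak RSS, in kilobytes.
/usr/bin/time -v dotnet binarytrees.dll 21 2>&1 | grep 'Maximum resident set size'
```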

As far as we can currently see, the difference in memory consumption is mostly related to differences in GC heuristics.
In particular, we could reduce CoreCLR's memory consumption on binary-trees by roughly a factor of two by invoking the GC more frequently.
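
The comment doesn't say which knob was used to trigger more frequent collections; one way to get a similar effect is to cap the gen0 allocation budget via the `COMPlus_GCgen0size` setting (value in bytes, hexadecimal). This is a hypothetical configuration sketch, not necessarily what was measured:

```shell
# Cap the gen0 budget at 4 MB so gen0 collections happen more often,
# trading some throughput for a smaller peak heap.
# The benchmark name is a placeholder.
COMPlus_GCgen0size=0x400000 dotnet binarytrees.dll 21
```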

Do we understand correctly that the main cause of the difference is related to the GC?
Could you please clarify the differences in GC behavior between CoreRT and CoreCLR?

cc @lemmaa @egavrin @Dmitri-Botcharnikov @sergign60 @BredPet @gbalykov @kvochko

@egavrin

egavrin commented Jul 5, 2017

As far as we currently see, the difference in memory consumption is mostly related to differences in GC heuristics.

Unfortunately, that does not explain why we see performance improvements on memory-intensive benchmarks like binary-trees or spectral-norm.

Launch time is, obviously, better on CoreRT: ~45% faster.

@jkotas
Member

jkotas commented Jul 5, 2017

The GC PAL is incomplete in CoreRT; the performance-related parts are missing:

  • The concurrent/background GC is not enabled in CoreRT yet (it is the default in CoreCLR). You can try rerunning on CoreCLR with the concurrent GC disabled to see whether it is causing the difference.
  • The L1/L2 cache size detection is missing: https://github.com/dotnet/corert/blob/master/src/Native/gc/unix/gcenv.unix.cpp#L389. You can try hardcoding the number that CoreCLR uses on your machine to see whether it is causing the difference.
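
The first experiment suggested above can be run as, for example (the benchmark name is a placeholder):

```shell
# Disable the concurrent/background GC on CoreCLR to match CoreRT's
# current behavior.
COMPlus_gcConcurrent=0 dotnet binarytrees.dll 21
```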

@ruben-ayrapetyan
Contributor Author

@jkotas, Thank you very much for the advice.

We checked CoreCLR with the concurrent GC turned off.

In this configuration, CoreCLR consumes half the peak RSS, and is about 30% faster than CoreRT on the binary-trees benchmark.

@jkotas
Member

jkotas commented Jul 7, 2017

You may be running into dotnet/corert#3784.

These kinds of differences between CoreCLR and CoreRT are a point-in-time problem. The GC performance characteristics should be within noise between CoreCLR and CoreRT by the time we are done.
