
Proposal for docker limits #49

Merged — 5 commits merged into master on Mar 6, 2019

Conversation

@richlander (Member) commented Dec 13, 2018

.NET Core has support for control groups (cgroups), which is the basis of Docker limits. We found that the algorithm we use to honor cgroups works well for larger memory size limits (for example, >500MB), but that it is not possible to configure a .NET Core application to run indefinitely at lower memory levels. This document proposes an approach to support low memory size limits, <100MB.

Please give feedback on this proposal.

Update (2019.2.27) -- We tightened up the design a fair bit, although the base ideas remain constant:

  • Default policy for GC heap size: max (20mb, 75% of the memory limit on the container)
  • Explicit size can be set as an absolute number or percentage
  • Minimum reserved segment size per heap is 16mb, which will reduce the number of heaps created on machines with a large number of cores and small memory limits
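The default policy in the first bullet can be sketched as follows (Python used purely for illustration; the function and variable names are mine, not the runtime's):

```python
MB = 1024 * 1024

def default_gc_heap_limit(container_memory_limit_bytes: int) -> int:
    """Proposed default policy: max(20 MB, 75% of the container memory limit)."""
    return max(20 * MB, int(container_memory_limit_bytes * 0.75))

# A 120 MB container limit yields a 90 MB heap limit; a very small
# 16 MB limit is floored at the 20 MB minimum.
print(default_gc_heap_limit(120 * MB) // MB)  # -> 90
print(default_gc_heap_limit(16 * MB) // MB)   # -> 20
```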

We are committed to making .NET Core work well within Docker limits. We will continue to make improvements for this scenario as required.

Related:


## GC Configuration

We will expose the following configuration knobs (final naming TBD) to enable developers to define their own policies, with the following default values (final values TBD).
Reviewer comment:
How will they be exposed? I assume they could be set in the runtimeconfig.json but as a hosting provider I would want to give the knobs to the user without affecting their deployment bundle. A secondary option of environment variables could allow us to change the settings without affect the supplied bundle.

@richlander (Author) replied:

A cloud hoster might want to fix the memory allocated to an application to improve hosting density and profitability.

This is the scenario you are looking for. I didn't call out the exact way of achieving it, but yes, ENV sounds like a good plan.
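A sketch of the resolution order discussed here — an environment variable set by the hoster overriding the value in the user's runtimeconfig.json, which overrides the built-in default. The knob name `GCHeapHardLimitPercent` is used illustratively; final naming was still TBD at this point:

```python
import json
import os

def resolve_knob(name: str, runtimeconfig: dict, default):
    """Environment variable wins over runtimeconfig.json, which wins over the default.

    Hypothetical resolution logic, not the runtime's actual configuration code.
    """
    env_value = os.environ.get(name)
    if env_value is not None:
        return int(env_value)
    return runtimeconfig.get("configProperties", {}).get(name, default)

# The hoster can export GCHeapHardLimitPercent=50 without touching this bundle.
config = json.loads('{"configProperties": {"GCHeapHardLimitPercent": 75}}')
print(resolve_knob("GCHeapHardLimitPercent", config, 75))
```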

@tmds (Member) commented Dec 13, 2018

Making .NET Core work better in constrained environments 👍

Configuration knobs are for advanced scenarios, they push complexity towards the user.
When the user has set an amount of RAM/CPU on a container (2 parameters already), .NET Core should be smart about that by default.

'usage' in a cgroup (as considered by the OOM killer) is more than 'native' + 'GC heap size'. It includes 'other programs' in the cgroup and an amount of 'kernel cache'.

* Maximum RPS that can be maintained indefinitely with 64MB of memory allocated
* Minimum memory size required to indefinitely maintain 100 requests per second (RPS)

These experiences fix a single metric and expect the application to otherwise function indefinitely. They each offer a specific characteristic that we expect will align with a specific kind of workload.
Reviewer comment:
Is ASP.NET Core implicit in here? We aren't talking about a .NET Core (non-ASP.NET) application by itself, right?

@richlander (Author) replied:

Correct. That's what I meant.

@richlander (Member, Author) commented Dec 13, 2018

Configuration knobs are for advanced scenarios, they push complexity towards the user.
When the user has set an amount of RAM/CPU on a container (2 parameters already), .NET Core should be smart about that by default.

@tmds that aligns with the proposal, yes? It has defaults which I think are good. Please give me feedback on that.

'usage' in a cgroup (as considered by the OOM killer) is more than 'native' + 'GC heap size'. It includes 'other programs' in the cgroup and an amount of 'kernel cache'.

I was planning on modeling all of that as native memory. Do you think that's not workable? Any thoughts on an alternate approach?

@tmds (Member) commented Dec 14, 2018

My comment about usage was to make clear that the OOM killer considers more things to be 'used' than .NET Core does. This can lead to the .NET Core app getting killed.

@tmds (Member) commented Dec 14, 2018

How it works now (since 2.1.5):
.NET Core reads cgroup limit and usage. If the usage is high compared to the limit, it triggers GCs.

I was planning on modeling all of that as native memory.

The challenge is that parts of cgroup usage aren't in direct 'control' of the .NET Core app. Such as usage from other apps in the cgroup, or the cache part in the kernel.
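The mechanism described above — reading the cgroup limit and usage, and triggering GCs when usage runs high relative to the limit — can be sketched like this. The file paths are the standard cgroup v1 memory controller files; the 90% threshold is purely illustrative, not the runtime's actual tuning:

```python
def read_cgroup_memory(limit_path="/sys/fs/cgroup/memory/memory.limit_in_bytes",
                       usage_path="/sys/fs/cgroup/memory/memory.usage_in_bytes"):
    """Read the cgroup v1 memory limit and current usage, in bytes.

    Note: usage includes other processes in the cgroup and kernel cache,
    which is exactly the accounting gap discussed in this thread.
    """
    with open(limit_path) as f:
        limit = int(f.read())
    with open(usage_path) as f:
        usage = int(f.read())
    return limit, usage

def should_trigger_gc(usage: int, limit: int, threshold: float = 0.9) -> bool:
    """If usage is high compared to the limit, trigger a GC (threshold illustrative)."""
    return usage >= limit * threshold
```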

@tmds (Member) commented Dec 14, 2018

One suggestion that comes up a lot in issues about this: to use workstation gc instead of server gc. Perhaps the runtime can default to workstation GC when the environment is limited.

@richlander (Member, Author) replied to @tmds:

The challenge is that parts of cgroup usage aren't in direct 'control' of the .NET Core app. Such as usage from other apps in the cgroup, or the cache part in the kernel.

Agreed. We don't have a good suggestion on how to do that other than modeling it as native memory. We talked at length on this problem and couldn't come up with a model we liked. We thought we'd start with this simple but improved model and get feedback before getting more fancy.

One suggestion that comes up a lot in issues about this: to use workstation gc instead of server gc. Perhaps the runtime can default to workstation GC when the environment is limited.

This suggestion isn't part of the written proposal but is part of the plan. We still have to define a policy on that. My personal thinking was to provide a default value for workstation vs. server GC based on the memory size. Imagine <100MB is workstation, <200MB is up to 2 cores, <400MB is up to 4 cores, <1000MB is up to 10 cores, and >1000MB is unconstrained; maybe the initial heap size also has a scaling factor.
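The hypothetical tiering richlander imagines above — explicitly not yet policy — could be sketched as:

```python
MB = 1024 * 1024

def default_gc_flavor(memory_limit_bytes: int):
    """Return (gc_mode, max_heap_count) for a container memory limit.

    Purely a sketch of the thresholds floated in this comment;
    no such policy had been decided at the time.
    """
    if memory_limit_bytes < 100 * MB:
        return ("workstation", 1)
    if memory_limit_bytes < 200 * MB:
        return ("server", 2)
    if memory_limit_bytes < 400 * MB:
        return ("server", 4)
    if memory_limit_bytes < 1000 * MB:
        return ("server", 10)
    return ("server", None)  # unconstrained core count
```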

@tmds (Member) commented Dec 15, 2018

Agreed. We don't have a good suggestion on how to do that other than modeling it as native memory. We talked at length on this problem and couldn't come up with a model we liked. We thought we'd start with this simple but improved model and get feedback before getting more fancy.

I think this doesn't work.
Either a model predicts less than cgroup usage, and the app gets killed, or it predicts more than cgroup usage, causing more GCs and reduced performance.

One parameter that could be added to the current implementation is a usage-vs-limit threshold for triggering GCs. This is a way to control the 'safe' limit for the entire cgroup. It would allow apps to get closer to the limit without triggering GCs, which means less memory may need to be allocated to the container. The app will perform better (less GC), but it can get killed if it allocates too fast.

@tmds (Member) commented Dec 15, 2018

Maybe the cgroup notification api (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt) can be used to trigger GCs closer to the cgroup limit.
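For context, the cgroup v1 notification mechanism tmds points to works by registering an eventfd against `memory.usage_in_bytes` via `cgroup.event_control`: you write a line of the form `<event_fd> <usage_fd> <threshold>`, and the eventfd becomes readable when usage crosses the threshold. A sketch (the registration itself needs a real cgroup v1 filesystem, so it is only shown in comments):

```python
def event_control_line(event_fd: int, usage_fd: int, threshold_bytes: int) -> str:
    """Build the registration line written to cgroup.event_control."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"

# Registration would look roughly like (not run here; requires cgroup v1):
#   import os
#   efd = os.eventfd(0)  # available in Python 3.10+
#   ufd = os.open("/sys/fs/cgroup/memory/memory.usage_in_bytes", os.O_RDONLY)
#   with open("/sys/fs/cgroup/memory/cgroup.event_control", "w") as f:
#       f.write(event_control_line(efd, ufd, 90 * 1024 * 1024))
#   os.read(efd, 8)  # blocks until usage crosses 90 MB
```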

@jkotas (Member) commented Dec 15, 2018

cc @janvorli

@richlander (Member, Author):

That is a good suggestion @tmds. We need to test a prototype with that API and also validate if it is available in all the places we need it. That doc is apparently "hopelessly out of date". Do you know of a more "up to date" version we can use?

@omajid (Member) commented Dec 18, 2018

The cgroup v2 doc is here: https://www.kernel.org/doc/Documentation/cgroup-v2.txt

@janvorli (Member):

@richlander I would prefer to try to make .NET applications behave better in constrained memory scenarios before resorting to manual configuration. @Maoni0 is working on changes that should help scenarios where the container size limits are small. I've also recently been thinking about, and experimenting with, better accounting for committed memory that has not been touched yet. The GC is tuned for Windows job objects, which account for such memory differently than cgroups do: job objects count committed size right at commit time, while cgroups count only committed memory that has actually been touched. That means that when you try to commit more memory than is available, the commit succeeds under cgroups and the failure happens later, once you touch enough memory pages to cross the limit; with Windows job objects, the commit itself fails right away. I think this plays a significant role in the failures we were seeing.
My hope is that the GC and accounting changes together could get us to a state where hopefully just a small percentage of applications that have some special memory allocation patterns would require some manual settings.

The memory threshold notification is interesting, I've missed that when reading the cgroup docs in the past. I just knew about the OOM notification which would not be useful for our purposes.

@richlander (Member, Author) replied:

@janvorli I'm not looking for a manual configuration model. My proposal was about knobs with good defaults that typically don't need to be changed by users. That all said, any mechanism that provides the behavior we want is good.

It is important for the solution to be cgroup-oriented since Linux remains the dominant container OS.

@jkotas (Member) commented Jan 25, 2019

cc @marek-safar

@richlander (Member, Author):

I made significant updates to the proposal, updated in the PR. I also provided a summary in the PR description. Please give more feedback.


The GC will perform GCs more aggressively as the GC heap grows closer to the `GCHeapHardLimit`, with the goal of making more memory available so that the application can continue to function safely. The GC will avoid continuously performing full blocking GCs if they are not considered productive.

The GC will throw an `OutOfMemoryException` when the committed heap size exceeds the `GCHeapHardLimit` memory size after a full compacting GC.
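A sketch of the rule described above: an allocation that would push the committed heap past the hard limit first forces a full compacting GC, and only if the heap is still over the limit does it surface as an out-of-memory error. All names here are illustrative, not the runtime's actual code (Python's `MemoryError` stands in for `OutOfMemoryException`):

```python
def try_allocate(committed: int, request: int, hard_limit: int,
                 full_compacting_gc):
    """Return the new committed size, or raise MemoryError after a failed full GC.

    full_compacting_gc is a callback modeling how much committed memory
    a full compacting GC can reclaim.
    """
    if committed + request <= hard_limit:
        return committed + request
    committed = full_compacting_gc(committed)  # reclaim what we can first
    if committed + request > hard_limit:
        raise MemoryError("committed heap would exceed GCHeapHardLimit")
    return committed + request
```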
Reviewer comment:

Does it mean that the committed memory can be over GCHeapHardLimit before a full compacting GC? My numbers seem to reflect that, but I'd like to confirm it's expected.

@richlander (Author) replied:

Are you using Preview 3 builds and do you have memory limits set?

Reviewer comment:

The committed memory used by the GC heap should not be over GCHeapHardLimit if you specified it with one of the configs. The committed memory for the whole process, of course, can be more than that.

Reviewer comment:

Yes, it was for the whole process, so that's expected.

Reviewer comment:

yep, expected.

@richlander (Author) replied:

Right ... and that's what the text says, so I think we're good.

@jkotas (Member) commented Mar 4, 2019:

The text is confusing. It says that GC will throw OutOfMemoryException after a full compacting GC. The GC never throws exceptions right after finishing a GC. The GC throws OutOfMemoryException exceptions when you try to allocate. I think this should say:

The GC will throw an OutOfMemoryException for allocations that would cause the committed heap size to exceed the GCHeapHardLimit memory size, even after a full compacting GC.

Reviewer comment:

The GC never throws exceptions right after finishing a GC

Hmm? The GC DOES throw OOM right after a full compacting GC if there's still not enough space.

Reviewer comment:

The original text made it sound like GC.Collect may throw exceptions under certain circumstances. That is not the case. I hope my suggested edit made it clear.

The text is using "GC" to mean "garbage collector component" and "act of collecting a garbage" interchangeably. I think it is ok, but it may be sometimes confusing.

@richlander richlander merged commit 5c40d5e into master Mar 6, 2019
@richlander richlander deleted the cgroups branch March 6, 2019 18:23
@tmds (Member) commented Mar 12, 2019

I want to make sure I understand this.

  • Before: GCs were triggered by the cgroup used vs the cgroup limit.
    With this change: GCs are triggered by heap used vs heap limit. And heap limit defaults to 75% of cgroup limit.
  • Using server GC, for each core there should be at least 16 mb heap size provisioned. This number doesn't change based on the available memory/cpu.

@richlander @Maoni0 is this correct?

@Maoni0 (Member) commented Mar 12, 2019

Before: GCs were triggered by the cgroup used vs the cgroup limit.
With this change: GCs are triggered by heap used vs heap limit. And heap limit defaults to 75% of cgroup limit.

The intention was that it would be triggered by the limit specified on the container. The implementation just didn't work that well, especially when there are lots of heaps with a low memory limit, because the GC was not reacting fast enough to recognize that it was about to exceed the commit limit in certain code paths. It was still taking into consideration, for example, the correct memory pressure inside the container. With the change, it now checks against the commit limit closely during allocation paths, so we don't get premature OOMs there.

And yes, there was indeed a change about the 75%: we take 75% of the container memory limit, which means that if you specified 120mb before, the GC will now try to stay under 90mb instead of 120mb.

Using server GC, for each core there should be at least 16 mb heap size provisioned. This number doesn't change based on the available memory/cpu.

At least 16mb heap size reserved, not committed. So if by "provisioned" that's what you meant, then it's correct. And correct, this reserve size does not change.

@tmds (Member) commented Mar 14, 2019

Thank you for clarifying @Maoni0.
