
Proposal for docker limits #49

Merged — 5 commits merged into master on Mar 6, 2019

Conversation

@richlander (Member) commented Dec 13, 2018

.NET Core has support for control groups (cgroups), which is the basis of Docker limits. We found that the algorithm we use to honor cgroups works well for larger memory size limits (for example, >500MB), but that it is not possible to configure a .NET Core application to run indefinitely at lower memory levels. This document proposes an approach to support low memory size limits, <100MB.

Please give feedback on this proposal.

Update (2019.2.27) -- We tightened up the design a fair bit, although the base ideas remain constant:

  • Default policy for GC heap size: max (20mb, 75% of the memory limit on the container)
  • Explicit size can be set as an absolute number or percentage
  • Minimum reserved segment size per heap is 16mb, which will reduce the number of heaps created on machines with a large number of cores and small memory limits
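The default policy in the first bullet can be sketched as follows (Python used purely for illustration; the function and variable names are mine, not the runtime's):

```python
MB = 1024 * 1024

def default_gc_heap_limit(container_memory_limit_bytes: int) -> int:
    """Proposed default policy: max(20 MB, 75% of the container memory limit)."""
    return max(20 * MB, int(container_memory_limit_bytes * 0.75))

# A 120 MB container limit yields a 90 MB heap limit; a very small
# 16 MB limit is floored at the 20 MB minimum.
print(default_gc_heap_limit(120 * MB) // MB)  # -> 90
print(default_gc_heap_limit(16 * MB) // MB)   # -> 20
```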

We are committed to making .NET Core work well within Docker limits. We will continue to make improvements for this scenario as required.

Related:


## GC Configuration

We will expose the following configuration knobs (final naming TBD) to enable developers to define their own policies, with the following default values (final values TBD).
Reviewer comment:
How will they be exposed? I assume they could be set in the runtimeconfig.json but as a hosting provider I would want to give the knobs to the user without affecting their deployment bundle. A secondary option of environment variables could allow us to change the settings without affect the supplied bundle.

@richlander (Author) replied:

A cloud hoster might want to fix the memory allocated to an application to improve hosting density and profitability.

This is the scenario you are looking for. I didn't call out the exact way of achieving it, but yes, ENV sounds like a good plan.
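A sketch of the resolution order discussed here — an environment variable set by the hoster overriding the value in the user's runtimeconfig.json, which overrides the built-in default. The knob name `GCHeapHardLimitPercent` is used illustratively; final naming was still TBD at this point:

```python
import json
import os

def resolve_knob(name: str, runtimeconfig: dict, default):
    """Environment variable wins over runtimeconfig.json, which wins over the default.

    Hypothetical resolution logic, not the runtime's actual configuration code.
    """
    env_value = os.environ.get(name)
    if env_value is not None:
        return int(env_value)
    return runtimeconfig.get("configProperties", {}).get(name, default)

# The hoster can export GCHeapHardLimitPercent=50 without touching this bundle.
config = json.loads('{"configProperties": {"GCHeapHardLimitPercent": 75}}')
print(resolve_knob("GCHeapHardLimitPercent", config, 75))
```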

@tmds (Member) commented Dec 13, 2018

Making .NET Core work better in constrained environments 👍

Configuration knobs are for advanced scenarios, they push complexity towards the user.
When the user has set an amount of RAM/CPU on a container (2 parameters already), .NET Core should be smart about that by default.

'usage' in a cgroup (as considered by the OOM killer) is more than 'native' + 'GC heap size'. It includes 'other programs' in the cgroup and an amount of 'kernel cache'.

* Maximum RPS that can be maintained indefinitely with 64MB of memory allocated
* Minimum memory size required to indefinitely maintain 100 requests per second (RPS)

These experiences fix a single metric and expect the application to otherwise function indefinitely. They each offer a specific characteristic that we expect will align with a specific kind of workload.
Reviewer comment:
Is ASP.NET Core implicit in here? We aren't talking about a .NET Core (non-ASP.NET) application by itself, right?

@richlander (Author) replied:

Correct. That's what I meant.

@richlander (Member, Author) commented Dec 13, 2018

Configuration knobs are for advanced scenarios, they push complexity towards the user.
When the user has set an amount of RAM/CPU on a container (2 parameters already), .NET Core should be smart about that by default.

@tmds that aligns with the proposal, yes? It has defaults which I think are good. Please give me feedback on that.

'usage' in a cgroup (as considered by the OOM killer) is more than 'native' + 'GC heap size'. It includes 'other programs' in the cgroup and an amount of 'kernel cache'.

I was planning on modeling all of that as native memory. Do you think that's not workable? Any thoughts on an alternate approach?

@tmds (Member) commented Dec 14, 2018

My comment about usage was to make clear that the OOM killer considers more things to be 'used' than .NET Core does. This can lead to the .NET Core app getting killed.

@tmds (Member) commented Dec 14, 2018

How it works now (since 2.1.5):
.NET Core reads cgroup limit and usage. If the usage is high compared to the limit, it triggers GCs.

I was planning on modeling all of that as native memory.

The challenge is that parts of cgroup usage aren't in direct 'control' of the .NET Core app. Such as usage from other apps in the cgroup, or the cache part in the kernel.
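The mechanism described above — reading the cgroup limit and usage, and triggering GCs when usage runs high relative to the limit — can be sketched like this. The file paths are the standard cgroup v1 memory controller files; the 90% threshold is purely illustrative, not the runtime's actual tuning:

```python
def read_cgroup_memory(limit_path="/sys/fs/cgroup/memory/memory.limit_in_bytes",
                       usage_path="/sys/fs/cgroup/memory/memory.usage_in_bytes"):
    """Read the cgroup v1 memory limit and current usage, in bytes.

    Note: usage includes other processes in the cgroup and kernel cache,
    which is exactly the accounting gap discussed in this thread.
    """
    with open(limit_path) as f:
        limit = int(f.read())
    with open(usage_path) as f:
        usage = int(f.read())
    return limit, usage

def should_trigger_gc(usage: int, limit: int, threshold: float = 0.9) -> bool:
    """If usage is high compared to the limit, trigger a GC (threshold illustrative)."""
    return usage >= limit * threshold
```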

@tmds (Member) commented Dec 14, 2018

One suggestion that comes up a lot in issues about this: to use workstation gc instead of server gc. Perhaps the runtime can default to workstation GC when the environment is limited.

@richlander (Member, Author) replied to @tmds:

The challenge is that parts of cgroup usage aren't in direct 'control' of the .NET Core app. Such as usage from other apps in the cgroup, or the cache part in the kernel.

Agreed. We don't have a good suggestion on how to do that other than modeling it as native memory. We talked at length on this problem and couldn't come up with a model we liked. We thought we'd start with this simple but improved model and get feedback before getting more fancy.

One suggestion that comes up a lot in issues about this: to use workstation gc instead of server gc. Perhaps the runtime can default to workstation GC when the environment is limited.

This suggestion isn't part of the written proposal but is part of the plan. We still have to define a policy on that. My personal thinking was to provide a default value for workstation vs. server GC based on the memory size. Imagine <100MB is workstation, <200MB is up to 2 cores, <400MB is up to 4 cores, <1000MB is up to 10 cores, and >1000MB is unconstrained; maybe the initial heap size also has a scaling factor.
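The hypothetical tiering richlander imagines above — explicitly not yet policy — could be sketched as:

```python
MB = 1024 * 1024

def default_gc_flavor(memory_limit_bytes: int):
    """Return (gc_mode, max_heap_count) for a container memory limit.

    Purely a sketch of the thresholds floated in this comment;
    no such policy had been decided at the time.
    """
    if memory_limit_bytes < 100 * MB:
        return ("workstation", 1)
    if memory_limit_bytes < 200 * MB:
        return ("server", 2)
    if memory_limit_bytes < 400 * MB:
        return ("server", 4)
    if memory_limit_bytes < 1000 * MB:
        return ("server", 10)
    return ("server", None)  # unconstrained core count
```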

@tmds (Member) commented Dec 15, 2018

Agreed. We don't have a good suggestion on how to do that other than modeling it as native memory. We talked at length on this problem and couldn't come up with a model we liked. We thought we'd start with this simple but improved model and get feedback before getting more fancy.

I think this doesn't work.
Either a model predicts less than cgroup usage, and the app gets killed, or it predicts more than cgroup usage, causing more GCs and reduced performance.

One parameter that could be added to the current implementation is a usage-vs-limit threshold for triggering GCs. This is a way to control the 'safe' limit for the entire cgroup. It would allow apps to get closer to the limit without triggering GCs, which means less memory may need to be allocated to the container. The app will perform better (less GC), but it can get killed if it allocates too fast.

@tmds (Member) commented Dec 15, 2018

Maybe the cgroup notification api (https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt) can be used to trigger GCs closer to the cgroup limit.
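For context, the cgroup v1 notification mechanism tmds points to works by registering an eventfd against `memory.usage_in_bytes` via `cgroup.event_control`: you write a line of the form `<event_fd> <usage_fd> <threshold>`, and the eventfd becomes readable when usage crosses the threshold. A sketch (the registration itself needs a real cgroup v1 filesystem, so it is only shown in comments):

```python
def event_control_line(event_fd: int, usage_fd: int, threshold_bytes: int) -> str:
    """Build the registration line written to cgroup.event_control."""
    return f"{event_fd} {usage_fd} {threshold_bytes}"

# Registration would look roughly like (not run here; requires cgroup v1):
#   import os
#   efd = os.eventfd(0)  # available in Python 3.10+
#   ufd = os.open("/sys/fs/cgroup/memory/memory.usage_in_bytes", os.O_RDONLY)
#   with open("/sys/fs/cgroup/memory/cgroup.event_control", "w") as f:
#       f.write(event_control_line(efd, ufd, 90 * 1024 * 1024))
#   os.read(efd, 8)  # blocks until usage crosses 90 MB
```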

@jkotas (Member) commented Dec 15, 2018

cc @janvorli

@richlander (Member, Author):

That is a good suggestion @tmds. We need to test a prototype with that API and also validate if it is available in all the places we need it. That doc is apparently "hopelessly out of date". Do you know of a more "up to date" version we can use?

@omajid (Member) commented Dec 18, 2018

The cgroup v2 doc is here: https://www.kernel.org/doc/Documentation/cgroup-v2.txt

@janvorli (Member):

@richlander I would prefer to try to make .NET applications behave better in constrained memory scenarios before resorting to manual configuration. @Maoni0 is working on changes that should help scenarios where the container size limits are small. I've also recently been thinking about, and experimenting with, better accounting for committed memory that has not been touched yet. The GC is tuned for Windows job objects, which account for such memory differently than cgroups do: job objects count committed size right at commit time, while cgroups count only committed memory that has actually been touched. That means that when you try to commit more memory than is available, the commit succeeds under cgroups and the failure happens later, once you touch enough memory pages to cross the limit; with Windows job objects, the commit itself fails right away. I think this plays a significant role in the failures we were seeing.
My hope is that the GC and accounting changes together could get us to a state where hopefully just a small percentage of applications that have some special memory allocation patterns would require some manual settings.

The memory threshold notification is interesting, I've missed that when reading the cgroup docs in the past. I just knew about the OOM notification which would not be useful for our purposes.

@richlander (Member, Author) replied:

@janvorli I'm not looking for a manual configuration model. My proposal was about knobs with good defaults that typically don't need to be changed by users. That all said, any mechanism that provides the behavior we want is good.

It is important for the solution to be cgroup-oriented since Linux remains the dominant container OS.

@jkotas (Member) commented Jan 25, 2019

cc @marek-safar

@richlander (Member, Author):

I made significant updates to the proposal, updated in the PR. I also provided a summary in the PR description. Please give more feedback.


The GC will perform GCs more aggressively as the GC heap grows closer to the `GCHeapHardLimit`, with the goal of making more memory available so that the application can continue to function safely. The GC will avoid continuously performing full blocking GCs if they are not considered productive.

The GC will throw an `OutOfMemoryException` when the committed heap size exceeds the `GCHeapHardLimit` memory size after a full compacting GC.
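A sketch of the rule described above: an allocation that would push the committed heap past the hard limit first forces a full compacting GC, and only if the heap is still over the limit does it surface as an out-of-memory error. All names here are illustrative, not the runtime's actual code (Python's `MemoryError` stands in for `OutOfMemoryException`):

```python
def try_allocate(committed: int, request: int, hard_limit: int,
                 full_compacting_gc):
    """Return the new committed size, or raise MemoryError after a failed full GC.

    full_compacting_gc is a callback modeling how much committed memory
    a full compacting GC can reclaim.
    """
    if committed + request <= hard_limit:
        return committed + request
    committed = full_compacting_gc(committed)  # reclaim what we can first
    if committed + request > hard_limit:
        raise MemoryError("committed heap would exceed GCHeapHardLimit")
    return committed + request
```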
Reviewer comment:

Does it mean that the committed memory can be over GCHeapHardLimit before a full compacting GC? My numbers seem to reflect that, but I'd like to confirm it's expected.

@richlander (Author) replied:

Are you using Preview 3 builds and do you have memory limits set?

Reviewer comment:

The committed memory used by the GC heap should not be over GCHeapHardLimit if you specified it with one of the configs. The committed memory for the whole process, of course, can be more than that.

Reviewer comment:

Yes, it was for the whole process, so that's expected.

Reviewer comment:

yep, expected.

@richlander (Author) replied:

Right ... and that's what the text says, so I think we're good.

@jkotas (Member) commented Mar 4, 2019:

The text is confusing. It says that GC will throw OutOfMemoryException after a full compacting GC. The GC never throws exceptions right after finishing a GC. The GC throws OutOfMemoryException exceptions when you try to allocate. I think this should say:

The GC will throw an OutOfMemoryException for allocations that would cause the committed heap size to exceed the GCHeapHardLimit memory size, even after a full compacting GC.

Reviewer comment:

The GC never throws exceptions right after finishing a GC

Hmm? The GC DOES throw OOM right after a full compacting GC if there's still not enough space.

Reviewer comment:

The original text made it sound like GC.Collect may throw exceptions under certain circumstances. That is not the case. I hope my suggested edit made it clear.

The text is using "GC" to mean "garbage collector component" and "act of collecting a garbage" interchangeably. I think it is ok, but it may be sometimes confusing.

@richlander richlander merged commit 5c40d5e into master Mar 6, 2019
@richlander richlander deleted the cgroups branch March 6, 2019 18:23
@tmds (Member) commented Mar 12, 2019

I want to make sure I understand this.

  • Before: GCs were triggered by the cgroup used vs the cgroup limit.
    With this change: GCs are triggered by heap used vs heap limit. And heap limit defaults to 75% of cgroup limit.
  • Using server GC, for each core there should be at least 16 mb heap size provisioned. This number doesn't change based on the available memory/cpu.

@richlander @Maoni0 is this correct?

@Maoni0 (Member) commented Mar 12, 2019

Before: GCs were triggered by the cgroup used vs the cgroup limit.
With this change: GCs are triggered by heap used vs heap limit. And heap limit defaults to 75% of cgroup limit.

The intention was that it would be triggered by the limit specified on the container. The implementation just didn't work that well, especially when there are lots of heaps with a low memory limit, because the GC was not reacting fast enough to recognize that it was about to exceed the commit limit in certain code paths. It was still taking into consideration, for example, the correct memory pressure inside the container. With the change, it now checks against the commit limit closely during allocation paths, so we don't get premature OOMs there.

And yes, there was indeed a change about the 75%: we take 75% of the container memory limit, which means that if you specified 120mb before, the GC will now try to stay under 90mb instead of 120mb.

Using server GC, for each core there should be at least 16 mb heap size provisioned. This number doesn't change based on the available memory/cpu.

At least 16mb heap size reserved, not committed. So if by "provisioned" that's what you meant, then it's correct. And correct, this reserve size does not change.

@tmds (Member) commented Mar 14, 2019

Thank you for clarifying @Maoni0.
