
implementing hard limit for GC heap #22180

Merged
merged 1 commit into dotnet:master from Maoni0:oom on Jan 29, 2019

Conversation

@Maoni0 Maoni0 (Member) commented Jan 24, 2019

To support container scenarios, two HardLimit configs are added:

GCHeapHardLimit - specifies a hard limit for the GC heap
GCHeapHardLimitPercent - specifies a percentage of the physical memory this process is allowed to use

If both are specified, GCHeapHardLimit is checked first; GCHeapHardLimitPercent is consulted
only when GCHeapHardLimit is not specified.

If neither is specified but the process is running inside a container with a memory
limit specified, we will take this as the hard limit:

max (20mb, 75% of the memory limit on the container)

If one of the HardLimit configs is specified and the process is running inside a container
with a memory limit, the GC heap usage will not exceed the HardLimit, but the total available
memory is still the memory limit on the container, so the memory load is calculated based on
the container memory limit.

An example:

The process is running inside a container with a 200mb limit, and the user also specified
GCHeapHardLimit as 100mb.

If 50mb of the 100mb is used for the GC heap and 100mb is used for other things, the memory
load is (50 + 100)/200 = 75%.
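
To make the rules above concrete, here is a minimal C++ sketch of the selection and
accounting logic; the names (resolve_hard_limit, memory_load_percent) are illustrative
assumptions, not the actual coreclr implementation:

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    static const uint64_t MB = 1024 * 1024;

    // Pick the effective hard limit, in the order described above: the explicit
    // byte limit first, then the percent config, then (inside a container)
    // max(20mb, 75% of the container limit).
    uint64_t resolve_hard_limit(uint64_t limit_bytes,     // GCHeapHardLimit, 0 if unset
                                uint32_t limit_percent,   // GCHeapHardLimitPercent, 0 if unset
                                uint64_t total_memory,    // container limit, or physical memory
                                bool in_container)
    {
        if (limit_bytes != 0)
            return limit_bytes;
        if (limit_percent != 0)
            return total_memory * limit_percent / 100;
        if (in_container)
            return std::max(20 * MB, total_memory * 75 / 100);
        return 0; // no hard limit
    }

    // Memory load is computed against the container limit, not the GC hard limit.
    uint32_t memory_load_percent(uint64_t gc_use, uint64_t other_use, uint64_t container_limit)
    {
        return (uint32_t)((gc_use + other_use) * 100 / container_limit);
    }

    int main()
    {
        // The example above: 200mb container limit, GCHeapHardLimit = 100mb.
        uint64_t limit = resolve_hard_limit(100 * MB, 0, 200 * MB, true);
        uint32_t load = memory_load_percent(50 * MB, 100 * MB, 200 * MB);
        printf("hard limit = %llumb, memory load = %u%%\n",
               (unsigned long long)(limit / MB), load); // hard limit = 100mb, memory load = 75%
        return 0;
    }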

Some notes on these configs:

  • The limit is the commit size.

  • This is only supported on 64-bit.

  • For Server GC the minimum reserved segment size is 16mb per heap. This avoids the
    scenario where the hard limit is small but the process can use many procs and we would
    end up with tiny segments, which doesn't make sense. We then keep track of how much is
    committed on the segments so the total does not exceed the hard limit (a sketch follows
    these notes).
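
Below is a minimal C++ sketch of the per-heap sizing constraint in the last note; the name
server_segment_reserve and its parameters are illustrative assumptions, not the actual
coreclr code (which also aligns and rounds segment sizes):

    #include <algorithm>
    #include <cstdint>

    static const uint64_t MB = 1024 * 1024;

    // Reserve at least 16mb per Server GC heap, however small the hard limit is.
    // Only commit is constrained by the hard limit, so the total reservation may
    // legitimately exceed it.
    uint64_t server_segment_reserve(uint64_t hard_limit, unsigned n_heaps)
    {
        const uint64_t min_segment = 16 * MB;
        return std::max(hard_limit / n_heaps, min_segment);
    }

    // Example: a 64mb hard limit with 16 heaps gives 64mb / 16 = 4mb, which is
    // clamped up to a 16mb reserved segment per heap; the committed bytes on
    // those segments are then tracked so their total stays within 64mb.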

@Maoni0 Maoni0 force-pushed the Maoni0:oom branch 5 times, most recently from 92af082 to 052f5a2 Jan 24, 2019
@Maoni0 Maoni0 changed the title from "[WIP] implementing hard limit for GC heap" to "implementing hard limit for GC heap" Jan 25, 2019
@Maoni0 Maoni0 (Member, Author) commented Jan 25, 2019

FYI @andy-ms @jkotas @richlander @davidwrighton @janvorli

@andy-ms and I are still working on some further perf tuning on Linux, e.g., making the default number of heaps for Server GC better based on the limit.

@@ -1877,7 +1877,7 @@ size_t GetCacheSizePerLogicalCpu(BOOL bTrueSize)
}
}

-#if defined(_TARGET_AMD64_) || defined (_TARGET_X86_)
+#if defined (_TARGET_X86_)

@jkotas jkotas (Member) commented Jan 25, 2019

Is there a reason why it is ok to delete this fallback for AMD64, but we still need it for x86?

@Maoni0 Maoni0 (Author, Member) commented Jan 25, 2019

the only reason is that I have not tested on x86, so I am reluctant to remove it.

@@ -1877,7 +1877,7 @@ size_t GetCacheSizePerLogicalCpu(BOOL bTrueSize)
}
}

-#if defined(_TARGET_AMD64_) || defined (_TARGET_X86_)
+#if defined (_TARGET_X86_)
DefaultCatchFilterParam param;

@jkotas jkotas (Member) commented Jan 25, 2019

If you are removing _AMD64_ here, you can also remove #ifdef _WIN64 inside this block.

@Maoni0 Maoni0 (Author, Member) commented Jan 29, 2019

right... I haven't, because this is for preview and there is a chance we might still want this after our perf testing. If the perf testing shows there's no need for it, I'll do some code cleanup then.

@Maoni0 Maoni0 force-pushed the Maoni0:oom branch from 052f5a2 to fdbbfc9 Jan 26, 2019
@Maoni0 Maoni0 merged commit ed52a00 into dotnet:master Jan 29, 2019
25 checks passed:

  • CentOS7.1 x64 Checked Innerloop Build and Test
  • CentOS7.1 x64 Debug Innerloop Build
  • Linux-musl x64 Debug Build
  • OSX10.12 x64 Checked Innerloop Build and Test
  • Ubuntu arm Cross Checked crossgen_comparison Build and Test
  • Ubuntu arm Cross Release crossgen_comparison Build and Test
  • Ubuntu x64 Checked CoreFX Tests
  • Ubuntu x64 Checked Innerloop Build and Test
  • Ubuntu x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)
  • Ubuntu x64 Formatting
  • WIP Ready for review
  • Windows_NT x64 Checked CoreFX Tests
  • Windows_NT x64 Checked Innerloop Build and Test
  • Windows_NT x64 Checked Innerloop Build and Test (Jit - TieredCompilation=0)
  • Windows_NT x64 Formatting
  • Windows_NT x64 Release CoreFX Tests
  • Windows_NT x64 full_opt ryujit CoreCLR Perf Tests Correctness
  • Windows_NT x64 min_opt ryujit CoreCLR Perf Tests Correctness
  • Windows_NT x86 Checked Innerloop Build and Test
  • Windows_NT x86 Checked Innerloop Build and Test (Jit - TieredCompilation=0)
  • Windows_NT x86 Release Innerloop Build and Test
  • Windows_NT x86 full_opt ryujit CoreCLR Perf Tests Correctness
  • Windows_NT x86 min_opt ryujit CoreCLR Perf Tests Correctness
  • coreclr-ci Build #20190125.760 succeeded
  • license/cla: All CLA requirements met
andy-ms added a commit to andy-ms/coreclr that referenced this pull request Mar 20, 2019
This is based on a perf test with 100% survival in a container, before
and after dotnet#22180. GC pause times were greater after that commit.
Debugging showed that the reason was that after, we were always doing
compacting GC, and objects were staying in generation 1 and not making it
to generation 2. The reason was that in the "after" build,
`should_compact_loh()` was always returning true if heap_hard_limit was
set; currently if we do an LOH compaction, we compact all other
generations too. As the comment indicates, we should decide that
automatically, not just set it to true all the time.
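
As a schematic only, grounded in the commit message above: heap_hard_limit and
should_compact_loh are the names it mentions, but this body is a paraphrase of the
described behavior, not the actual coreclr source:

    #include <cstddef>

    static size_t heap_hard_limit; // nonzero when a GC heap hard limit is configured

    bool should_compact_loh()
    {
        if (heap_hard_limit != 0)
            return true; // the behavior being fixed: always compact the LOH (and,
                         // currently, the other generations with it) under a hard limit
        // ... otherwise decide based on fragmentation / cost heuristics ...
        return false;
    }
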
@Maoni0 Maoni0 deleted the Maoni0:oom branch Aug 30, 2019
@PSanetra PSanetra commented Jun 17, 2020

@Maoni0 why was 75% of the container memory limit chosen as the default limit? Is it expected that there is some other process running inside the same container?

@Maoni0 Maoni0 (Member, Author) commented Jun 17, 2020

yes, we do expect that normally you have some native memory usage or some other processes running.

@davidwrighton davidwrighton (Member) commented Jun 17, 2020

Be aware that this control is for the managed heap. Even in a managed process there are other consumers of memory, such as the runtime itself and the various native allocations performed by the OS to support file and socket I/O. 75% is a fairly conservative number, and especially with larger containers, a higher percentage is probably safely achievable.
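
For example, a larger container could likely run with a higher percent config, e.g. COMPlus_GCHeapHardLimitPercent=0x55 (0x55 hex = 85 decimal, i.e. 85% of the container limit); this example assumes the usual hex parsing of numeric COMPlus_* environment values.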

@richlander richlander (Member) commented Jun 18, 2020

Previously, the value (through lack of an algorithm here) was effectively 100%. People reported OOMs. We picked a conservative value and we no longer see those reports. It's likely that many apps could tolerate a higher value, but we thought this value would prevent the vast majority of apps from seeing OOMs, wouldn't require much validation on our part, and would let people configure higher or lower values depending on their needs. I'm still feeling good about this set of choices.
