Make runtime more friendly to Kubernetes and container systems in general #93030

Open
janvorli opened this issue Oct 4, 2023 · 4 comments
Labels
area-Meta discussion enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@janvorli
Member

janvorli commented Oct 4, 2023

  • Hierarchical limits
    • Some container orchestration systems use a cgroup hierarchy in which memory / CPU limits may be set in ancestor cgroups. However, .NET reads only the limits of the current process's cgroup, so the limits imposed at higher levels are not honored. We want to start honoring these limits. cgroups v1 allows us to simply read the hierarchical limit, but for cgroups v2, we need to walk the hierarchy programmatically and compute the final limit.
      PR: Enable cgroups hierarchical memory limits support #93611 [merged]
  • Memory soft limits
    • Cgroups support a soft limit for memory. Under memory contention, the kernel pushes the process memory usage back toward the soft limit by reclaiming the memory pages that can be reclaimed. We would like to start reading the soft limit, making the GC aware of it and trying to keep physical memory usage within the limit, e.g. by returning free memory to the OS more eagerly.
    • Kubernetes doesn't set the soft limit in cgroups v1 in any way. The request, which would be the equivalent of a soft limit, can only be read through the Kubernetes API, which is an HTTP API, so we cannot call it from the runtime. It seems that we could provide an API through which external components can give us details on the soft limit and other limit-related information, such as dynamic limit change notifications.
      Edit: With cgroups v2, the Kubernetes documentation states that it may set memory.low to the request size, which would let us extract it easily. In reality, though, current Kubernetes-based systems don't do that.
  • Virtual memory limit
    • Linux allows limiting the amount of virtual address space used by a process. There are some hosting offerings that use that to limit actual physical memory usage. .NET doesn't honor that limit, especially when creating the GC heap. We want to implement honoring that limit. There is an existing PR for such a change that needs to be finalized.
      PR: Honor virtual memory limit #80295
  • CPU limits
    • Cgroups v1 and v2 enable limiting CPU usage using a quota and a share. Although we already use the quota to get a "virtual" number of CPU cores and use it for making various runtime decisions, we don't use the share (weight) setting. The share setting controls the minimum share of CPU time the cgroup should get under contention. We would like to investigate whether we can augment runtime behavior based on this setting as well.
  • Memory and CPU pressure interface
    • Cgroups v2 optionally provides an interface for notifications on memory and CPU pressure. A process can register triggers on stalls, poll the related descriptor, and get notified when the stalls occur. We would like to investigate whether we can use it to make the runtime detect and respond to memory and CPU pressure better.
  • Memory events
    • cgroups v2 has a memory.events pseudo-file for each non-root cgroup that contains counts of how many times processes in the cgroup were throttled, had memory reclaimed, were OOM-killed, etc. A process can wait for a file change event on this file to get notified of changes and then use the counts to detect memory pressure. Maybe the .NET runtime can take advantage of that too.
  • Handling limits when there are multiple .NET processes in a container
    • When there are multiple .NET processes running in a container, we currently have no way to detect that or to adapt their GC heap sizes to the situation. So, the process that is started first assumes that all of the free memory within the limits is available for its own use. We would like to figure out if there is a reasonable way to handle such cases.
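To illustrate the "Hierarchical limits" item for cgroups v2, here is a minimal Python sketch of walking the hierarchy and taking the smallest memory.max found on the way up. It is not the runtime's implementation; the function names `read_limit` and `effective_memory_limit` are hypothetical, and the sketch assumes the caller already knows the cgroup v2 mount point and the process's cgroup path.

```python
import os
from typing import Optional


def read_limit(path: str) -> Optional[int]:
    """Return the byte limit stored in a memory.max-style file.

    Returns None when the file is missing (e.g. in the root cgroup),
    empty, or contains "max", which means "no limit".
    """
    try:
        with open(path) as f:
            text = f.read().strip()
    except OSError:
        return None
    if not text or text == "max":
        return None
    return int(text)


def effective_memory_limit(mount_root: str, cgroup_path: str) -> Optional[int]:
    """Walk from cgroup_path up to mount_root; return the smallest limit seen.

    Hypothetical helper: mount_root is the cgroup v2 mount point and
    cgroup_path is the cgroup's path relative to that mount point.
    """
    root = os.path.abspath(mount_root)
    current = os.path.abspath(os.path.join(root, cgroup_path.lstrip("/")))
    if os.path.commonpath([root, current]) != root:
        raise ValueError("cgroup_path escapes mount_root")

    limit: Optional[int] = None
    while True:
        value = read_limit(os.path.join(current, "memory.max"))
        if value is not None and (limit is None or value < limit):
            limit = value
        if current == root:
            break
        current = os.path.dirname(current)
    return limit
```

On a typical system the mount point is /sys/fs/cgroup and the relative path comes from the `0::` line of /proc/self/cgroup, so a call could look like `effective_memory_limit("/sys/fs/cgroup", "/kubepods/pod1/c1")` (the path here is made up). A result of `None` means no ancestor imposes a limit.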
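For the "CPU limits" item, the quota-to-cores calculation can be sketched as follows for cgroups v2, where cpu.max holds a quota and a period (or "max" for no quota). This is only an illustration under that assumption; `cpu_limit_from_cpu_max` is a hypothetical name, and how the runtime rounds or otherwise uses the value is not shown.

```python
from typing import Optional


def cpu_limit_from_cpu_max(text: str) -> Optional[float]:
    """Convert the contents of a cgroup v2 cpu.max file to a core count.

    The file holds "<quota> <period>" in microseconds, or "max <period>"
    when no quota is set. Returns None for "no quota" or unrecognized input.
    """
    fields = text.split()
    if len(fields) != 2 or fields[0] == "max":
        return None
    quota, period = int(fields[0]), int(fields[1])
    if period <= 0:
        return None
    # 200000 us of CPU time per 100000 us period is equivalent to 2 cores.
    return quota / period
```

The share (weight) setting mentioned above lives in a separate file (cpu.weight in cgroups v2) and is deliberately not modeled here, since the issue leaves open how the runtime might use it.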
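For the "Memory events" item, a sketch of how the counts could be interpreted: parse two snapshots of memory.events and report which counters grew in between. The helper names `parse_memory_events` and `new_events` are hypothetical, and waiting for the file change notification itself is left out.

```python
from typing import Dict


def parse_memory_events(text: str) -> Dict[str, int]:
    """Parse memory.events content ("<name> <count>" per line) into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def new_events(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters that increased, mapped to the increase."""
    return {
        name: count - previous.get(name, 0)
        for name, count in current.items()
        if count > previous.get(name, 0)
    }
```

A runtime could, for example, treat an increase in the `high` or `max` counters as a hint that the cgroup is under memory pressure; which counters matter and how to react is exactly what the issue proposes to investigate.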
@janvorli janvorli added enhancement Product code improvement that does NOT require public API changes/additions area-Meta discussion labels Oct 4, 2023
@janvorli janvorli added this to the 9.0.0 milestone Oct 4, 2023
@ghost

ghost commented Oct 4, 2023

Tagging subscribers to this area: @dotnet/area-meta
See info in area-owners.md if you want to be subscribed.


@omajid
Member

omajid commented Oct 6, 2023

A process can register triggers on stalls, use poll on the related descriptor and get notified on the stalls. We would like to investigate if we can use it to make the runtime detect / respond better to memory and CPU pressure.

Last I looked into this (around 3 years ago), this was restricted to root processes. However, a change made earlier this year, torvalds/linux@d82caa2 (which landed in 6.4), allows unprivileged processes to do this as well.
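For readers unfamiliar with the mechanism, registering a trigger and polling the descriptor looks roughly like the Python sketch below. It follows the pattern described in the kernel's pressure stall information documentation: write a trigger string to a pressure file, then poll for POLLPRI. The function names are hypothetical, and whether the call succeeds depends on kernel version, configuration, and privileges, as discussed above.

```python
import os
import select


def format_psi_trigger(kind: str, stall_us: int, window_us: int) -> bytes:
    """Build a trigger such as b"some 150000 1000000\\0".

    kind is "some" or "full"; stall_us is the stall threshold and window_us
    the tracking window, both in microseconds (window limited to 500ms..10s).
    """
    if kind not in ("some", "full"):
        raise ValueError("kind must be 'some' or 'full'")
    if not 500_000 <= window_us <= 10_000_000:
        raise ValueError("window must be between 500ms and 10s")
    if not 0 < stall_us <= window_us:
        raise ValueError("stall threshold must be positive and fit in the window")
    return f"{kind} {stall_us} {window_us}".encode("ascii") + b"\0"


def wait_for_pressure(pressure_file: str, trigger: bytes, timeout_ms: int) -> bool:
    """Register the trigger and wait once; True if it fired before the timeout.

    pressure_file is e.g. /proc/pressure/memory or a cgroup's memory.pressure.
    """
    fd = os.open(pressure_file, os.O_RDWR | os.O_NONBLOCK)
    try:
        os.write(fd, trigger)
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        for _fd, event in poller.poll(timeout_ms):
            if event & select.POLLERR:
                raise OSError("pressure source is no longer available")
            if event & select.POLLPRI:
                return True
        return False
    finally:
        os.close(fd)
```

A call such as `wait_for_pressure("/proc/pressure/memory", format_psi_trigger("some", 150_000, 2_000_000), 10_000)` would wait up to ten seconds for at least 150 ms of memory stall within a 2-second window; the thresholds are arbitrary example values.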

@tmds
Member

tmds commented Oct 9, 2023

There was a question here about improving the behavior when multiple .NET processes run in the same container: #84828.

It affects in particular building .NET applications in a container, because such builds have several .NET processes running.

@janvorli
Member Author

janvorli commented Oct 9, 2023

@tmds that's a good point, I'll include it in the list.

@mangod9 mangod9 added this to UserStories + Epics in Core-Runtime .net 9 Oct 23, 2023
@mangod9 mangod9 moved this from UserStories + Epics to Complete in Core-Runtime .net 9 May 6, 2024
@mangod9 mangod9 moved this from Complete to UserStories + Epics in Core-Runtime .net 9 May 6, 2024
@mangod9 mangod9 moved this from UserStories + Epics to Experiments in Core-Runtime .net 9 May 17, 2024
3 participants