This repository has been archived by the owner on Oct 24, 2023. It is now read-only.
While we set the `*-reserved-*` kubelet flags today, their effect is limited: they only reduce the node's allocatable resources. They do not set limits on cgroups, as you might expect. This means a user can either run something manually (via an SSH session) or schedule a pod that overconsumes the node's resources. Both scenarios lead to an unstable cluster.
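To make the "only to allocatable" point concrete, here is a minimal sketch of the arithmetic. The node size and reservation values are hypothetical and are not taken from any aks-engine default. The reservations shrink the figure the scheduler works with, but this calculation alone does not cap what system daemons or an SSH session can actually consume.

```python
def allocatable(capacity, kube_reserved, system_reserved, eviction_hard):
    """Node allocatable: capacity minus reservations and the hard eviction threshold."""
    return capacity - kube_reserved - system_reserved - eviction_hard


# Hypothetical 8 GiB node, all values in MiB (illustrative numbers only).
alloc_mib = allocatable(
    capacity=8192,
    kube_reserved=1024,
    system_reserved=512,
    eviction_hard=100,
)
print(alloc_mib)  # 6556
```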
This issue tracks the following modifications (note: they have wide implications, so I would appreciate careful consideration):
1- User: Limit user memory/CPU (this includes anything started by a user in an SSH session) to 1/2 core and 512Mi RAM. Note: this does not apply to scheduled pods.
2- System: Limit and reserve system daemons (udev daemon, sshd, Azure waagent, etc.) to 512Mi RAM and 1 core.
3- Node: Create a new cgroup named `runtime`. dockerd and kubelet will go under this cgroup with limits of 1 core and 1 GB RAM.
4- Master: Create a new cgroup named `runtime`. kubelet, etcd, and docker will go under this cgroup with reservations/limits of 50% of machine capacity (40% will be allocated to control plane pods such as the API server).
5- Document the limits somewhere visible across aks-e and aks.
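As a rough illustration of what the numbers in items 1-4 would mean on a given machine, the sketch below tabulates the proposed budgets. It is only arithmetic, not an implementation of the cgroup changes. The `Budget` type, the function names, and the 4-core / 16 GiB machine are hypothetical, and "1 GB" is treated as 1024 MiB for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    cpu_cores: float
    memory_mib: int


# Fixed limits from items 1-3 of the proposal (1 GB assumed to mean 1024 MiB).
USER = Budget(cpu_cores=0.5, memory_mib=512)
SYSTEM = Budget(cpu_cores=1.0, memory_mib=512)
NODE_RUNTIME = Budget(cpu_cores=1.0, memory_mib=1024)


def master_runtime(capacity: Budget) -> Budget:
    """Item 4: the master runtime cgroup gets 50% of machine capacity."""
    return Budget(capacity.cpu_cores * 0.5, capacity.memory_mib // 2)


def headroom(capacity: Budget, *limits: Budget) -> Budget:
    """What is left if every capped group consumes its full limit."""
    cpu = capacity.cpu_cores - sum(b.cpu_cores for b in limits)
    mem = capacity.memory_mib - sum(b.memory_mib for b in limits)
    return Budget(cpu, mem)


# Hypothetical 4-core, 16 GiB machine.
machine = Budget(cpu_cores=4.0, memory_mib=16384)

node_headroom = headroom(machine, USER, SYSTEM, NODE_RUNTIME)
print(node_headroom)            # Budget(cpu_cores=1.5, memory_mib=14336)
print(master_runtime(machine))  # Budget(cpu_cores=2.0, memory_mib=8192)
```

On small VM sizes the fixed limits take a large share of the machine (2.5 of 4 cores in this example), which is one reason the proposal calls for careful consideration.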
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@jackfrancis @seanmck