Less allocatable memory with v20191119 #387

Closed
pappnase99 opened this issue Dec 19, 2019 · 17 comments
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@pappnase99

We're in eu-central-1 using EKS 1.14 with t3.xlarge worker nodes. After updating to AMI v20191119, the allocatable memory went down from 16132820Ki to 13461148Ki. The allocatable CPU also went down, to 3920m, which is less important. Wondering what's going on here.
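
For reference, the numbers above can be read straight off the node object; a minimal check (the node name below is a placeholder, use one from kubectl get nodes):

# Print the Capacity and Allocatable blocks for a single node
kubectl describe node ip-10-0-1-23.eu-central-1.compute.internal | grep -A 8 -E '^(Capacity|Allocatable):'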

@mogren

mogren commented Dec 19, 2019

@pappnase99 This is because of #367, which added some reserved memory for the kubelet. The reason for that is to make sure the kubelet doesn't get killed first if memory pressure on the node gets too high. That said, reserving 2.5G sounds a bit high.

@vedatappsamurai

We're having a similar issue. We're using r5.large instances in our EKS cluster in eu-west-1, and after updating our nodes from EKS image 1.14.7 to 1.14.8 our applications keep getting evicted on these instances. After investigating, we've seen the same allocatable CPU and memory changes @pappnase99 mentioned. According to #367, the calculation for an m5.large instance is:

m5.large (2 vCPU, 8 GiB):
"kubeReserved": {
"cpu": "70m",
"ephemeral-storage": "1Gi",
"memory": "1741Mi"
}

@leakingtapan @natherz97 Isn't it too much to reserve 1741Mi on an instance that has 8Gi RAM? Is there a miscalculation here?

@harshal-shah

We are having a similar problem to the one described by @vedatappsamurai.
Before upgrading to the latest AMI, our allocatable node resources were as follows:

Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         4
 ephemeral-storage:           96625420948
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      7762648Ki
 pods:                        58

After upgrading to the latest AMI, the allocatable memory has dropped significantly:

Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         3920m
 ephemeral-storage:           95551679124
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      5980828Ki
 pods:                        58

This had a side effect on our production clusters: we had some cronjobs that requested 6GB of memory, which was fine with the earlier allocatable numbers, but with the new AMI the cronjob pods were stuck in the Pending state.

We are essentially reserving 1740Mi on an 8Gi node, which is a tad more than 20%. I don't think the formula mentioned here is a very good guideline. The numbers should be tuned further based on historical usage, etc. 255Mi per 4Gi of RAM should be good enough, IMHO.

@RTodorov

@mogren Could we get an update on this issue? Any plans on revisiting this formula? Depending on the size of the cluster, this will incur considerable increases in the bill.

@mogren added the bug, enhancement, and help wanted labels on Dec 27, 2019
@mogren

mogren commented Dec 27, 2019

@RTodorov I agree, it seems unnecessarily high. We should update the formula to work well for larger instances.

@harshal-shah

We were able to mitigate the issue for now by adding an extra kubelet flag, --kube-reserved memory=0.3Gi, to our cluster's Terraform spec.
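
For anyone not using Terraform, a minimal sketch of passing the same flag on a self-managed node through the EKS AMI's bootstrap script in the instance user data (the cluster name and the 0.3Gi value are placeholders, adjust to your setup):

#!/bin/bash
# Self-managed node user data: forward extra kubelet args via the EKS AMI bootstrap script.
# "my-cluster" and the 0.3Gi value are placeholders.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--kube-reserved=memory=0.3Gi'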

@vincentheet

This is blocking my teams from updating to a newer AMI. Can we get the calculation adjusted, @mogren @natherz97?

@nogara

nogara commented Feb 5, 2020

It seems as if the calculation is shared with GKE and AKS...

https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#node_allocatable

https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads#resource-reservations
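
For reference, a rough sketch of the tiered memory reservation described in the GKE doc above, which seems to be what produced the numbers earlier in this thread (an approximation only; the real input is the memory the kubelet detects on the node, which is a bit less than the nominal instance size):

# Sketch of the GKE-style tiered memory reservation, values in MiB.
# Tiers per the GKE doc: 25% of the first 4 GiB, 20% of the next 4 GiB,
# 10% of the next 8 GiB, 6% of the next 112 GiB, 2% of anything above 128 GiB.
reserve_mib() {
  local total=$1 reserved=0
  local tiers=(4096:25 4096:20 8192:10 114688:6)
  for t in "${tiers[@]}"; do
    local size=${t%%:*} pct=${t##*:}
    local chunk=$(( total < size ? total : size ))
    reserved=$(( reserved + chunk * pct / 100 ))
    total=$(( total - chunk ))
    (( total <= 0 )) && break
  done
  (( total > 0 )) && reserved=$(( reserved + total * 2 / 100 ))
  echo "$reserved"
}

reserve_mib 8192   # nominal 8 GiB -> 1843 MiB, the same ballpark as the 1741Mi reported above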

@natherz97

We're working on updating the formula to reserve a smaller percentage of resources available on each instance type, pending more tests. When the PR is published, I will link it here.

@vincentheet

@natherz97 Any updates on the progress? The PR seems to have been inactive for the last 2 weeks, after an issue was found by a user.

@natherz97

@vincentheet Apologies for the delay, we're working on getting more eyes on the PR from other team members.

@ade90036

ade90036 commented Apr 6, 2020

Has anyone managed to apply

--kube-reserved memory=0.3Gi

on the new managed EKS Node Groups, i.e. ones that have not been created via custom CloudFormation?

@avasquez614

@natherz97 Has this been fixed? I see there was a merged PR to reduce the reserved memory, but I just created a 1.14 EKS cluster with nodes that use AMI amazon-eks-node-1.14-v20200507, and for a c5.large I only have 2580Mi available, which means that around 1400Mi is being reserved. Is that the correct allocation? It still seems a bit excessive.

@natherz97

@avasquez614 I just launched a worker node using amazon-eks-node-1.14-v20200507 (ami-0486134a23d903f10) and confirmed that this commit, f1ae97b, was included in the release.

Using a t3.2xlarge instance:

$ ssh -i ~/.ssh/key.pem ec2-user@ec2-35-163-71-186.us-west-2.compute.amazonaws.com -- cat /etc/kubernetes/kubelet/kubelet-config.json
...
"kubeReserved": {
    "cpu": "90m",
    "ephemeral-storage": "1Gi",
    "memory": "893Mi"
  },
...
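
If the arithmetic helps, the updated reservation appears to be based on max pods rather than a percentage of total memory; a sketch under that assumption (the max-pods value comes from the AMI's eni-max-pods.txt):

# Assumed rule: memory_to_reserve = 255 MiB + 11 MiB per max pod.
# For a t3.2xlarge, max_pods = 58, which matches the 893Mi shown above.
max_pods=58
echo $(( 255 + 11 * max_pods ))   # -> 893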

@avasquez614

avasquez614 commented Jul 2, 2020

@natherz97 I created a ticket in the eksctl repo to see whether there's an issue with the tool and the kubeReserved memory. The issue still seems to be happening with clusters we create, as recently as yesterday. Any help or ideas would be greatly appreciated.

@mogren

mogren commented Aug 16, 2020

This seems to have been resolved in eksctl-io/eksctl#2443. @avasquez614 if you still have issues, see #507 as well.

@mogren closed this as completed on Aug 16, 2020
@ericlake

ericlake commented Jan 4, 2021

Where can one find the kubeReserved memory for all of the different instance sizes?
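
Not an authoritative answer, but one way to check the value for a specific node is to read the kubelet config off the instance, mirroring the command earlier in this thread (the hostname and key path are placeholders):

# Print the kubeReserved stanza from a running worker node
ssh -i ~/.ssh/key.pem ec2-user@<node-hostname> -- grep -A 4 kubeReserved /etc/kubernetes/kubelet/kubelet-config.json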
