Less allocatable memory with v20191119 #387

Closed
pappnase99 opened this issue Dec 19, 2019 · 17 comments
Labels
bug (Something isn't working), enhancement (New feature or request)

Comments

@pappnase99

We're in eu-central-1 using EKS 1.14 with t3.xlarge worker nodes. After updating to AMI v20191119, the allocatable memory went down from 16132820Ki to 13461148Ki. The allocatable CPU also went down, to 3920m, which is less important. Wondering what's going on here.
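
For reference, the numbers above can be read straight off the node object; a minimal check (the node name below is a placeholder, use one from kubectl get nodes):

# Print the Capacity and Allocatable blocks for a single node
kubectl describe node ip-10-0-1-23.eu-central-1.compute.internal | grep -A 8 -E '^(Capacity|Allocatable):'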

@mogren

mogren commented Dec 19, 2019

@pappnase99 This is because of #367, which added some reserved memory for the kubelet. The reason for that is to make sure the kubelet doesn't get killed first if memory pressure on the node gets too high. That said, reserving 2.5G sounds a bit high.

@vedatappsamurai

We're having a similar issue. We're using r5.large instances in our EKS cluster in eu-west-1, and after updating our nodes from EKS image 1.14.7 to 1.14.8 our applications keep getting evicted on these instances. After investigating, we've seen the same allocatable CPU and memory changes @pappnase99 mentioned. According to #367, the calculation for an m5.large instance is:

m5.large (2 vCPU, 8 GiB):
"kubeReserved": {
"cpu": "70m",
"ephemeral-storage": "1Gi",
"memory": "1741Mi"
}

@leakingtapan @natherz97 Isn't it too much to reserve 1741Mi on an instance that has 8Gi RAM? Is there a miscalculation here?

@harshal-shah

We are having a similar problem to the one described by @vedatappsamurai.
Before upgrading to the latest AMI, our allocatable node resources were as follows:

Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         4
 ephemeral-storage:           96625420948
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      7762648Ki
 pods:                        58

After upgrading to the latest AMI, the allocatable memory has dropped significantly:

Allocatable:
 attachable-volumes-aws-ebs:  25
 cpu:                         3920m
 ephemeral-storage:           95551679124
 hugepages-1Gi:               0
 hugepages-2Mi:               0
 memory:                      5980828Ki
 pods:                        58

This had a side effect on our production clusters: we had some cronjobs that requested 6GB of memory, which was fine with the earlier allocatable numbers, but with the new AMI the cronjob pods were stuck in the Pending state.

We are essentially reserving 1740Mi on an 8Gi node, which is a tad more than 20%. I don't think the formula mentioned here is a very good guideline. The numbers should be tuned further based on historical usage, etc. 255Mi per 4Gi of RAM should be good enough, IMHO.

@RTodorov

@mogren Could we get an update on this issue? Any plans on revisiting this formula? Depending on the size of the cluster, this will incur considerable increases in the bill.

@mogren added the bug, enhancement, and help wanted labels on Dec 27, 2019
@mogren

mogren commented Dec 27, 2019

@RTodorov I agree, it seems unnecessarily high. We should update the formula to work well for larger instances.

@harshal-shah

We were able to mitigate the issue for now by adding an extra kubelet flag, --kube-reserved memory=0.3Gi, to our cluster's Terraform spec.
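
For anyone not using Terraform, a minimal sketch of passing the same flag on a self-managed node through the EKS AMI's bootstrap script in the instance user data (the cluster name and the 0.3Gi value are placeholders, adjust to your setup):

#!/bin/bash
# Self-managed node user data: forward extra kubelet args via the EKS AMI bootstrap script.
# "my-cluster" and the 0.3Gi value are placeholders.
/etc/eks/bootstrap.sh my-cluster \
  --kubelet-extra-args '--kube-reserved=memory=0.3Gi'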

@vincentheet

This is blocking my teams from updating to a newer AMI. Can we get the calculation adjusted, @mogren @natherz97?

@nogara

nogara commented Feb 5, 2020

It seems as if the calculation is shared with GKE and AKS...

https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#node_allocatable

https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads#resource-reservations
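
For reference, a rough sketch of the tiered memory reservation described in the GKE doc above, which seems to be what produced the numbers earlier in this thread (an approximation only; the real input is the memory the kubelet detects on the node, which is a bit less than the nominal instance size):

# Sketch of the GKE-style tiered memory reservation, values in MiB.
# Tiers per the GKE doc: 25% of the first 4 GiB, 20% of the next 4 GiB,
# 10% of the next 8 GiB, 6% of the next 112 GiB, 2% of anything above 128 GiB.
reserve_mib() {
  local total=$1 reserved=0
  local tiers=(4096:25 4096:20 8192:10 114688:6)
  for t in "${tiers[@]}"; do
    local size=${t%%:*} pct=${t##*:}
    local chunk=$(( total < size ? total : size ))
    reserved=$(( reserved + chunk * pct / 100 ))
    total=$(( total - chunk ))
    (( total <= 0 )) && break
  done
  (( total > 0 )) && reserved=$(( reserved + total * 2 / 100 ))
  echo "$reserved"
}

reserve_mib 8192   # nominal 8 GiB -> 1843 MiB, the same ballpark as the 1741Mi reported above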

@natherz97

We're working on updating the formula to reserve a smaller percentage of resources available on each instance type, pending more tests. When the PR is published, I will link it here.

@vincentheet

@natherz97 Any updates on the progress? The PR seems to have been inactive for the last 2 weeks, after an issue was found by a user.

@natherz97

@vincentheet Apologies for the delay, we're working on getting more eyes on the PR from other team members.

@ade90036

ade90036 commented Apr 6, 2020

Has anyone managed to apply

--kube-reserved memory=0.3Gi

on the new managed EKS Node Groups, i.e. ones that have not been created via custom CloudFormation?

@avasquez614

@natherz97 Has this been fixed? I see there was a merged PR to reduce the reserved memory, but I just created a 1.14 EKS cluster with nodes that use AMI amazon-eks-node-1.14-v20200507, and for a c5.large I only have 2580Mi available, which means that around 1400Mi is being reserved. Is that the correct allocation? It still seems a bit excessive.

@natherz97

@avasquez614 I just launched a worker node using amazon-eks-node-1.14-v20200507 (ami-0486134a23d903f10) and confirmed that this commit, f1ae97b, was included in the release.

Using a t3.2xlarge instance:

$ ssh -i ~/.ssh/key.pem ec2-user@ec2-35-163-71-186.us-west-2.compute.amazonaws.com -- cat /etc/kubernetes/kubelet/kubelet-config.json
...
"kubeReserved": {
    "cpu": "90m",
    "ephemeral-storage": "1Gi",
    "memory": "893Mi"
  },
...
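
If the arithmetic helps, the updated reservation appears to be based on max pods rather than a percentage of total memory; a sketch under that assumption (the max-pods value comes from the AMI's eni-max-pods.txt):

# Assumed rule: memory_to_reserve = 255 MiB + 11 MiB per max pod.
# For a t3.2xlarge, max_pods = 58, which matches the 893Mi shown above.
max_pods=58
echo $(( 255 + 11 * max_pods ))   # -> 893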

@avasquez614

avasquez614 commented Jul 2, 2020

@natherz97 I created a ticket in the eksctl repo to see whether there's an issue with the tool and the kubeReserved memory. The issue still seems to be happening with clusters we create, as recently as yesterday. Any help or ideas would be greatly appreciated.

@mogren

mogren commented Aug 16, 2020

This seems to have been resolved in eksctl-io/eksctl#2443. @avasquez614 if you still have issues, see #507 as well.

@mogren closed this as completed on Aug 16, 2020
@ericlake

ericlake commented Jan 4, 2021

Where can one find the kubeReserved memory for all of the different instance sizes?
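
Not an authoritative answer, but one way to check the value for a specific node is to read the kubelet config off the instance, mirroring the command earlier in this thread (the hostname and key path are placeholders):

# Print the kubeReserved stanza from a running worker node
ssh -i ~/.ssh/key.pem ec2-user@<node-hostname> -- grep -A 4 kubeReserved /etc/kubernetes/kubelet/kubelet-config.json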
