[aws-appmesh-envoy] Too many open files error in version v1.26.4.0+ in EKS Fargate #489
Comments
Hey Zheng, thanks for the report. Can you confirm the environment details? Is this EKS or ECS, Fargate or EC2?
Hi @BennettJames, the customer has confirmed that the issue was reproduced in EKS 1.27/1.28 with a Fargate profile.
Our local testing suggests that this is a regression introduced in our
Just to provide an update: we've found that this only impacts the EKS + Fargate use case and is not a problem on the EC2 platform. Lately, there have been many reports like this from Envoy users on k8s. For example, there was a change made in the

Regarding the AppMesh image, it consists of an agent (see here) which forks Envoy as a child process and monitors its health. We see that the agent process has its soft limit equal to the hard limit, but the child isn't inheriting those limits and has a lower soft limit, the default of 1024. In my investigation, I'm exploring whether this has to do with the Linux capabilities not being preserved on the Envoy process when it is forked.
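To compare the limits of the agent and its Envoy child as described above, one option is to read `/proc/<pid>/limits` for each process. The Go sketch below is only an illustration and not part of the agent; the helper name `parseNoFile` is hypothetical, and it assumes the usual Linux layout of the "Max open files" row (soft value, hard value, units).

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// parseNoFile (hypothetical helper) extracts the soft and hard
// "Max open files" values from the text of a /proc/<pid>/limits file.
// Values are returned as strings because the kernel may print "unlimited".
func parseNoFile(limits string) (soft, hard string, err error) {
	const prefix = "Max open files"
	sc := bufio.NewScanner(strings.NewReader(limits))
	for sc.Scan() {
		line := sc.Text()
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		// The remainder of the row is: <soft> <hard> <units>
		fields := strings.Fields(strings.TrimPrefix(line, prefix))
		if len(fields) < 2 {
			return "", "", fmt.Errorf("unexpected limits line: %q", line)
		}
		return fields[0], fields[1], nil
	}
	return "", "", fmt.Errorf("%q not found", prefix)
}

func main() {
	// Each command-line argument is treated as a PID to inspect.
	for _, pid := range os.Args[1:] {
		data, err := os.ReadFile("/proc/" + pid + "/limits")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		soft, hard, err := parseNoFile(string(data))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Printf("pid %s: soft=%s hard=%s\n", pid, soft, hard)
	}
}
```

Running it with the PIDs of the agent and of Envoy inside the container prints one line per process, which makes a mismatch in the soft limit easy to spot.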
I've checked the syscalls made by the agent as described below. Before execve is executed, setrlimit is invoked with the value stored in rlimit.init, which in this case was 1024.
The "restore original NOFILE rlimit in child process" commit was introduced in Go 1.21. Go versions:
v1.27.3.0
@axot thanks for following up on this. That change would explain the behavior we're observing. We were still on
Hi @karanvasnani, thanks for confirming. I have checked the commit and found it has been backported to
Hi, I checked ECS Fargate, and it seems the soft limit there is 65535, not 1024.
I have built a custom image with the change being introduced in this PR (aws/amazon-ecs-service-connect-agent#73) for testing:
Closing the issue as v1.29.5.0 has been released.
Summary
A customer has reported that aws-appmesh-envoy 1.25.4 functions properly,
but upgrading to v1.27.2.0-prod results in a "Too many open files" error.
The issue has been confirmed in v1.27.2.0-prod and v1.27.3.0-prod.
Steps to Reproduce
Perform a load test on v1.27.2.0 or v1.27.3.0
Are you currently working around this issue?
N/A
Additional context
N/A
Attachments