Bug: nofile soft limit on EKS Fargate causes connection limits and crashes #71

Closed
visit1985 opened this issue Feb 16, 2024 · 3 comments · Fixed by #73
Labels
bug Something isn't working

Comments


visit1985 commented Feb 16, 2024

Summary

We have workloads running on EKS Fargate with an aws-appmesh-envoy sidecar injected by the AWS App Mesh Controller.
The appnet agent process (PID 1) has a nofile soft limit of 65535, while the envoy process it spawns has a nofile soft limit of only 1024 (the hard limit is 65535 for both).

kubectl exec -i -t -n default example-5ff7dbfc5d-strcr -c envoy -- sh
sh-4.2$ cat /proc/1/cmdline; echo
/usr/bin/agent
sh-4.2$ grep open /proc/1/limits
Max open files            65535                65535                files
sh-4.2$ cat /proc/31/cmdline; echo
/usr/bin/envoy-c/tmp/envoy-config-459706937.yaml-linfo--drain-time-s20
sh-4.2$ grep open /proc/31/limits
Max open files            1024                 65535                files
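To check every process in a container at once instead of one PID at a time, the per-process limits can be read from /proc in a loop. This is a minimal POSIX sh sketch. It assumes a Linux /proc filesystem and the "Max open files" row format shown above. The function name nofile_limits is hypothetical.

```shell
# Hypothetical helper: print PID, command name, and the soft/hard
# "Max open files" limits for every process visible under /proc.
nofile_limits() {
  for dir in /proc/[0-9]*; do
    # A process can exit between the glob and the read; skip it if so.
    line=$(grep '^Max open files' "$dir/limits" 2>/dev/null) || continue
    # Split "Max open files <soft> <hard> files" into positional fields.
    set -- $line
    comm=$(cat "$dir/comm" 2>/dev/null)
    printf '%s\t%s\tsoft=%s\thard=%s\n' "${dir#/proc/}" "$comm" "$4" "$5"
  done
}

nofile_limits
```

Run inside the envoy container, a row whose soft value is well below its hard value (as for envoy above) is the symptom described in this issue.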

This limits envoy to roughly 480 concurrent TCP connections at most, since a file descriptor is consumed for each ingress and each egress connection.
Reaching the limit causes the envoy process to crash and be restarted by the appnet agent (#181), which causes an outage.
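The ~480 figure follows from the descriptor budget. Each proxied connection holds one descriptor on the ingress side and one on the egress side, so at most about soft_limit / 2 connections fit. Whatever listeners, log files, and other descriptors envoy already has open reduce that number further. The sketch below estimates the remaining headroom for a running process. It is hypothetical: fd_headroom is an invented name, it assumes a Linux /proc filesystem, and it uses the same two-descriptors-per-connection simplification.

```shell
# Hypothetical helper: compare a process's open descriptors with its soft
# RLIMIT_NOFILE and estimate how many more proxied connections would fit,
# assuming two descriptors (ingress + egress) per connection.
fd_headroom() {
  pid=$1
  # Count entries in /proc/<pid>/fd; strip padding some wc builds emit.
  used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
  # Fourth field of the "Max open files" row is the soft limit.
  soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  if [ "$soft" = unlimited ]; then
    echo "pid=$pid used=$used soft=unlimited"
    return 0
  fi
  echo "pid=$pid used=$used soft=$soft spare_conns=$(( (soft - used) / 2 ))"
}
```

For example, fd_headroom 31 in the container above would report envoy's soft limit of 1024 and how close it is to exhausting it.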

Steps to Reproduce

Please refer to support case 170713370901828 for this.

Are you currently working around this issue?

We are unable to work around this issue because the appnet agent seems to be closed source.
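A child process inherits its parent's resource limits at fork, and those limits are preserved across execve. The soft limit envoy starts with is therefore whatever is in effect in the process that actually execs it. The transcript shows this differs from the agent's own limit in /proc/1/limits. Any launcher-side fix therefore has to raise the soft limit immediately before envoy is started. The shell function below is a hypothetical illustration of that principle using the ulimit builtin. It is not how the agent launches envoy, and raise_nofile_and_exec is an invented name.

```shell
# Hypothetical launcher sketch: raise the soft RLIMIT_NOFILE to the hard
# limit, then replace the current process with the given command. Limits
# survive exec, so the command starts with soft == hard.
raise_nofile_and_exec() {
  hard=$(ulimit -H -n)
  ulimit -S -n "$hard" || return 1
  exec "$@"
}
```

For example, raise_nofile_and_exec /usr/bin/envoy -c /path/to/config.yaml would start envoy with its soft limit equal to its hard limit (65535 in the transcript above).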

@visit1985 visit1985 added the bug Something isn't working label Feb 16, 2024
@karanvasnani karanvasnani transferred this issue from aws/aws-app-mesh-roadmap Feb 19, 2024
@karanvasnani (Contributor)

Thanks for your patience. We are continuing to track this investigation as part of aws/aws-app-mesh-roadmap#489.

axot added two commits to axot/amazon-ecs-service-connect-agent that referenced this issue Mar 22, 2024
@liubnu liubnu closed this as completed in #73 Apr 4, 2024
@karanvasnani karanvasnani reopened this May 23, 2024
@karanvasnani (Contributor)

Reopening this issue since the fix hasn't been released yet. We experienced delays in our release process and are currently working on a new release that will include this fix. We will share an update as soon as we have one.

liubnu (Contributor) commented Jun 26, 2024

Closing; see aws/aws-app-mesh-roadmap#492.

@liubnu liubnu closed this as completed Jun 26, 2024