ASP.NET Core hoarding memory #6803
Comments
Thanks for all these details. I notice you are not using 2.2. Would it be possible for you to upgrade your base image? A bug that caused many performance issues was fixed in that release. In the meantime I will set up a service similar to yours and let it run for a few days.
I will update it tomorrow, but we will have to wait at least a week to see something relevant 👍
I hope that this issue will be the quintessence of the whole struggle with the GC in ASP.NET Core.
We are running an ASP.NET Core 2.1.4 microservice deployed on Alpine Linux, hosted on AWS ECS (docker image […]). It's using server GC and has 1536MB available. Our microservice with websockets is not experiencing any memory growth. We do not explicitly set a […].

Also, I assume the reason your memory usage dropped is due to […]. You could verify that theory by wiring up some mechanism to run […].

Lastly, in our experience running dotnet core on linux (all with server GC, mind you), it has a tendency to keep and re-use memory instead of returning it to the OS, for performance gains. Note how even though the […], GC.GetTotalMemory (with […]) […].

There had been issues in the past (on 2.1.3 and earlier) with .NET growing too aggressively, not properly respecting the limits set by the container, and getting OOM-killed, but they seem to be fixed in 2.1.4.

I hope this helps a little bit. Good luck!
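The verification idea above (check whether the memory is actually reclaimable by forcing a collection and comparing `GC.GetTotalMemory` readings) could be sketched as a debug endpoint. This is a hypothetical illustration, not the commenter's code; the controller name and route are invented:

```csharp
using System;
using Microsoft.AspNetCore.Mvc;

// Hypothetical diagnostics endpoint: force a blocking full GC and report
// managed heap size before and after, to see how much memory the GC can
// actually reclaim (vs. memory the runtime is merely holding for re-use).
[ApiController]
[Route("debug/gc")]
public class GcDebugController : ControllerBase
{
    [HttpPost("collect")]
    public IActionResult Collect()
    {
        long before = GC.GetTotalMemory(forceFullCollection: false);
        GC.Collect();                      // full, blocking collection
        GC.WaitForPendingFinalizers();     // let finalizers run
        GC.Collect();                      // collect objects freed by finalizers
        long after = GC.GetTotalMemory(forceFullCollection: true);
        return Ok(new { before, after });
    }
}
```

Such an endpoint should only be exposed in a debug build, since forcing full collections on demand is itself a performance hazard.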
Testing the […]
I had the chance to run the application on a Windows machine, reproduce the conditions that lead to the memory leak, take a snapshot, and analyze it with dotMemory. The problem was a feature of RabbitMQ.Client named "AutorecoveryConnection", which keeps information about existing subscriptions in order to recover them in the event of a reconnection. Setting […].

Our application uses a channel per websocket connection, and it seems channels are not as lightweight as I thought. However, I find no reason to keep 11K+ […].

How come memory usage dropped when taking the memory snapshot? Well, probably RabbitMQ.Client disconnected by timeout (since the process freezes) and something happened internally in the component that released the held objects.

I hope to have some time in the future to investigate this further, but for now it is clear it is not an ASP.NET Core thing.
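The exact setting the comment refers to is elided above. In RabbitMQ.Client, the automatic connection recovery feature is controlled by `ConnectionFactory.AutomaticRecoveryEnabled` (with topology recovery under `TopologyRecoveryEnabled`); a minimal sketch of turning it off, with an illustrative host name:

```csharp
using RabbitMQ.Client;

// Sketch: disable RabbitMQ.Client's automatic connection recovery so the
// client stops recording channel/consumer state for later re-creation.
// (The exact setting used by the commenter is elided in the original.)
var factory = new ConnectionFactory
{
    HostName = "rabbitmq",               // illustrative host name
    AutomaticRecoveryEnabled = false,    // don't track state for recovery
    TopologyRecoveryEnabled = false      // don't re-declare queues/bindings
};
using IConnection connection = factory.CreateConnection();
```

The trade-off is that with recovery disabled, the application must handle reconnection and re-subscription itself.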
This issue is following up on #1976
The problem
We have a websocket server that hoards memory over days, to the point that Kubernetes eventually kills it. We monitor it using prometheus-net.
I can see in the graphs that the GC is collecting regularly in all generations.
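For context, a minimal prometheus-net setup on an ASP.NET Core 2.1-era host might look like the following. This is a sketch assuming the prometheus-net.AspNetCore package, not the service's actual code:

```csharp
using Microsoft.AspNetCore.Builder;
using Prometheus;

public partial class Startup
{
    public void Configure(IApplicationBuilder app)
    {
        // Expose process/GC metrics on /metrics for Prometheus to scrape.
        app.UseMetricServer();
        // Record per-request HTTP metrics (count, duration, status code).
        app.UseHttpMetrics();
        // ... rest of the pipeline (websockets, MVC, etc.)
    }
}
```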
GC Server is disabled using:
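The actual snippet is elided above; the usual way to disable server GC is via the project file (or equivalently the `System.GC.Server` setting in runtimeconfig.json):

```xml
<!-- Sketch of the elided setting: disable server GC in the .csproj. -->
<PropertyGroup>
  <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>
```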
Before disabling server GC, the service grew memory much faster. Now it takes two weeks to reach 512MB.
Other services using ASP.NET Core in a request/response fashion do not show this problem. This one uses websockets, where each connection usually lasts around 10 minutes... so I guess everything related to the connection easily survives into Gen 2.
The application
The application is a very simple ASP.NET Core application with two controllers: a simple one for readiness/liveness probes from Kubernetes, and another for establishing websocket connections.
We did some rough preliminary tests and verified we could handle 500 concurrent websockets per pod within 512MB. We ran for hours with 2 pods and 1000 concurrent connections with memory staying below 150MB. The deployed application, with 2 pods, has between 150 and 300 concurrent connections at any moment, and memory varies from less than 100MB during the first few days until it reaches 512MB in around 2 weeks. There seems to be no correlation between the number of connections and the memory used.
More than 70% of the connections last 10 minutes. Connections usually die abruptly due to the load balancer cutting them after 600 seconds (10 min).
We have a limit of 512Mb per pod set using Kubernetes limits:
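The manifest snippet is elided above; a Kubernetes memory limit of the size described would typically be declared like this (only the 512MB limit comes from the text, the rest is illustrative):

```yaml
# Sketch of the elided Kubernetes resource limit (512MB per pod, per the text).
resources:
  limits:
    memory: "512Mi"
```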
Message rate is very low. We have a keep-alive interval defined:
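The snippet is elided; in ASP.NET Core the websocket keep-alive ping interval is configured through `WebSocketOptions.KeepAliveInterval`. A sketch with an illustrative value (the service's actual interval is not given):

```csharp
// Sketch: configure the websocket keep-alive ping interval in Startup.Configure.
// The interval value here is illustrative, not the one the service uses.
app.UseWebSockets(new WebSocketOptions
{
    KeepAliveInterval = TimeSpan.FromSeconds(30)
});
```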
We use dotnet 2.1.6 on Linux using
microsoft/dotnet:2.1-aspnetcore-runtime
as base. This is a usual pattern in the application metrics:
The surprising thing
When we connect remotely and take a memory dump (using
createdump
), suddenly the memory drops... without the service stopping, restarting, or losing any connected user. See the green line in the picture. Note that there are two pods showing the same behaviour, and then one (the green) drops suddenly in memory usage due to taking the memory dump.
The pods did not restart during the taking of the memory dump:
![enter image description here](https://camo.githubusercontent.com/f682c846b6412088ba05c05e0c089c45e3049b07ad3043d6122fb32462fc3ce4/68747470733a2f2f692e737461636b2e696d6775722e636f6d2f556a7959332e706e67)
No connection was lost or restarted.
Memory dump data
I cannot share the dump for security reasons, but here is some data:
And the result of
dumpheap -stat
: https://pastebin.com/ERN7LZ0n
Heap:
Free objects:
Is there any explanation for this behaviour?