Artery memory leak #28390
Comments
@gakesson Thanks for reporting and describing how to reproduce. Will look into it.
By the way, is it with tcp or aeron-udp transport?
Thanks Patrik!
Do you see anything more that could be interesting in the logs? Anything about quarantine? Could you share logs? I haven't been able to reproduce yet. It stays at one instance of AssociationState. Tried on Mac, so using pfctl instead of iptables, but that should be the same.
It is not AssociationState instances that are leaking but the Promise$DefaultPromise member of the AssociationState. If you're unable to reproduce that too, I will try to create a stand-alone application for it, replicating the usage of our production application. Please let me know!
Ok, I was looking for the wrong thing then. I'll try some more tomorrow.
@gakesson It would be better if you could share the heap dump with the team.
@hepin1989 I agree, but I'm not allowed to do that :-|
@gakesson Yes, forgot that. A reproducer would be really helpful, I think.
I think I understand how this can happen. I haven't been able to reproduce it with a "real" scenario, so a stand-alone application would be very useful to ensure that we are fixing the right thing. I think the reason is that the handshake isn't completed, the stream is restarted, and that builds up callbacks on the uniqueRemoteAddressPromise, keeping references to the old incarnations of the streams. It looks like this: Does that match your heap dump if you drill down into the DefaultPromise?
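To make that mechanism concrete, here is a minimal, self-contained sketch (the class below is made up for illustration and is not Akka's code; only the uniqueRemoteAddressPromise name is taken from the discussion): every restarted stream registers another callback on the same never-completed promise, and each closure keeps its incarnation of the stream reachable.

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Promise

// Stand-in for the state a real outbound stream incarnation would close over.
final class OutboundStreamIncarnation(val id: Int) {
  val payload: Array[Byte] = new Array[Byte](1024 * 1024)
}

object PromiseLeakSketch extends App {
  // Plays the role of AssociationState's uniqueRemoteAddressPromise.
  val uniqueRemoteAddressPromise = Promise[String]()

  // Each stream "restart" registers a new callback that closes over a fresh incarnation.
  (1 to 1000).foreach { i =>
    val incarnation = new OutboundStreamIncarnation(i)
    uniqueRemoteAddressPromise.future.onComplete { _ =>
      println(s"handshake completed, seen by incarnation ${incarnation.id}")
    }
  }

  // The handshake never completes, so the promise never fires and its internal
  // listener list keeps all 1000 closures (and their ~1 GB of payloads) reachable.
}
```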
Good, I have an easy way to reproduce it now, so a stand-alone application might not be needed, but I'd like you to test with a snapshot when I have something ready.
Sure, sounds good, but I think the snapshot needs to be based on Akka 2.5.x (we just recently moved to 2.5.26, but 2.6.x will require us to do some code changes).
* If the handshake doesn't complete, the Promise in AssociationState is never completed, and each new restarted stream adds a future callback to it from the OutboundHandshake stage. Those references are kept in the promise, so the old OutboundHandshake stage (and probably the entire stream) couldn't be garbage collected.
* Use our own notification mechanism to have control over listener deregistration from postStop, instead of using Promise/Future (see the sketch after this list).
* Trying to create a new Promise after failure/restart would be difficult because the same AssociationState Promise is accessed from several outbound streams, all possibly restarted individually. Therefore it is easier to clean up from postStop.
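A rough sketch of that listener-based approach, with hypothetical names (this is not Akka's internal API): listeners are registered explicitly and removed again from the stage's postStop, so a stopped or restarted incarnation no longer leaves a callback behind on a shared Promise.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical replacement for the shared Promise: an explicit listener registry.
final class UniqueRemoteAddressNotifier[A] {
  private val listeners = ConcurrentHashMap.newKeySet[A => Unit]()
  @volatile private var value: Option[A] = None

  def addListener(listener: A => Unit): Unit = {
    listeners.add(listener)
    value.foreach(listener) // fire immediately if the value is already known
  }

  // Called from the stream stage's postStop, so old incarnations become collectable.
  def removeListener(listener: A => Unit): Unit = {
    listeners.remove(listener)
  }

  def complete(a: A): Unit = {
    value = Some(a)
    listeners.forEach(listener => listener(a))
  }
  // Note: the add/complete race is glossed over here; a real implementation also
  // has to deal with listeners registered concurrently with completion.
}
```

In a usage along those lines, a stream stage would call addListener when it starts and removeListener from its postStop, instead of attaching an onComplete callback it can never detach.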
@gakesson You can try the fix with the timestamped snapshot Akka version. It's on top of the latest 2.5.27. If you want to review the changes, you find them in PR #28407.
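For reference, a hypothetical build.sbt fragment for trying such a timestamped build; the resolver URL and the version string below are placeholders rather than values taken from this thread.

```scala
// build.sbt sketch: both the resolver URL and the version are placeholders.
resolvers += "Akka snapshots" at "https://example.org/akka-snapshots/"

libraryDependencies += "com.typesafe.akka" %% "akka-remote" % "2.5.27+<timestamp>"
```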
@patriknw Sorry for the delay, but I've been away on holiday.
@patriknw The test has been running for 24h (previously the issue materialized after ~8h) and there is no build-up in object retention. A heap dump also confirms that the memory leak is resolved. Thanks for the swift help with this issue! May I ask when the next 2.5.x release is planned (and whether this correction will be incorporated into that release)?
It will be included in the upcoming 2.5.28 and 2.6.2 within a few weeks. You should be able to use the timestamped version until then.
Thank you!
* Fix memory leak of restarting Artery outbound stream, #28390 (#28407): same fix as described above (cherry picked from commit 3b7e609)
* Move mima filter to 2.6.1.backwards.excludes
We have an application running on Akka 2.5.19, and we use Akka Artery (not cluster) to communicate between two remote actor systems.
We noticed a memory leak in the Artery Association when we had an incorrect network routing setup between the two actor systems (messages were dropped in one direction), which causes a Scala collection in AssociationState to pile up with entries that never get released. Eventually the application goes into continuous GC (CMS in our case) until restarted.
When trying to connect to the other actor system, the error log below is continuously printed:
We have no specific remote Artery configuration (except for security), i.e. we use the default settings.
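For context, a minimal sketch of the kind of setup described (system name, hostname, and port are made up; only Artery is switched on, everything else is left at the Akka 2.5.x defaults):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object RemoteNodeSketch extends App {
  // Enable Artery remoting and bind it; transport, buffer sizes, etc. stay at defaults.
  val config = ConfigFactory.parseString("""
    akka {
      actor.provider = remote
      remote.artery {
        enabled = on
        canonical.hostname = "10.0.0.1"   # made-up address of this node
        canonical.port = 25520
      }
    }
  """)

  val system = ActorSystem("systemA", config)
}
```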
This issue is reproducible every time by having a host trying to communicate with a remote actor system:
The issue becomes visible after a few hours on a 384 MB heap (260 MB old generation).
Attaching a snippet of the heap dump taken (3 different Association instances consuming >99% of the retained heap).