Stop unused Artery outbound streams #23967

Closed

patriknw opened this issue Nov 12, 2017 · 4 comments

patriknw (Member) commented Nov 12, 2017

As noticed by @mboogerd, it's an oversight that we don't stop outbound streams when they are not used. They are stopped when quarantined (including when a cluster member is removed), but it would be good to stop unused streams to reduce resource consumption.

If there are unacknowledged system messages, the control stream should not be stopped until after a long timeout (see the existing give-up-system-message-after config).

Slightly related to #21359
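
For reference, that timeout lives under akka.remote.artery.advanced. A minimal sketch of where such a setting is tuned (the value shown is illustrative, check reference.conf for the actual default):

```
akka.remote.artery.advanced {
  # Existing setting: how long unacknowledged system messages are retried
  # before the association is given up (and quarantined).
  give-up-system-message-after = 6 hours
}
```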

patriknw added the '1 - triaged' and 't:remoting:artery' labels on Nov 12, 2017

mboogerd commented

@patriknw I'm willing to help out, but I am facing some difficulties finding the right approach due to my unfamiliarity with Remoting/Artery.

  1. Testability. For now I'm a little unsure how to test this. We are fixing something that is inherently very internal to Artery (we can't detect from the outside that it keeps idling). I can't find existing tests that would serve as a good template for setting up Artery so that it allows internal-state (AssociationRegistry) inspection. Do you have suggestions for an approach? (Rough sketch after this list.)

  2. I'm torn between implementing an association-wide idle timer and a per-channel idle timer. The former could involve an idle flag per outbound stream; once their conjunction is true, all streams complete (sketched after this list). Alternatively, each channel could close individually when idle, but then it also needs to be dynamically reopened (within the same Association) when re-used. The Association can be removed from the AssociationRegistry once Association.streamsCompleted completes. Do you have preferences?

  3. A related issue is that the documentation of Association.streamsCompleted states that it is only reliable during shutdown. Also, if runOutboundOrdinaryMessagesStream is invoked with the configuration value "outbound-lanes" greater than 1, it looks like those streams won't have their materialized values registered in streamMatValues. What would be required to get a reliable check for shutdown?
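
To make (1) a bit more concrete, this is the kind of access I was picturing for a test, placed in the akka.remote.artery package so it can reach package-private internals. How to actually query the AssociationRegistry from there is the part I'm unsure about, so treat the lookup as an assumption rather than existing API:

```scala
package akka.remote.artery

import akka.actor.ActorSystem
import akka.remote.RARP

// Sketch only: obtain the Artery transport of a test ActorSystem so that
// association/registry state can be inspected. The exact way to look up an
// Association (e.g. by remote address) is an assumption about internal API.
object ArteryTestAccess {
  def transport(system: ActorSystem): ArteryTransport =
    RARP(system).provider.transport.asInstanceOf[ArteryTransport]
}
```

From there the idle check could be a polling assertion (e.g. TestKit awaitAssert) against whatever registry accessor turns out to exist.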
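
And for (2), roughly what I mean by the association-wide variant, as a self-contained sketch (all names here are made up for illustration, not existing Artery code): each outbound stream flips an activity flag, a periodic timer checks the conjunction, and only when every stream was idle for a whole interval would the association stop its streams.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Illustration of the "association-wide idle timer" idea: every outbound
// stream (control, ordinary, large) records activity; a periodic check stops
// the whole association only when all of them were idle for the interval.
final class IdleTracker(streamCount: Int) {
  private val activeSinceLastCheck = Array.fill(streamCount)(new AtomicBoolean(false))

  // Called on the stream's message path whenever an element passes through.
  def messageSent(streamId: Int): Unit =
    activeSinceLastCheck(streamId).set(true)

  // Called by the periodic timer: true if all streams were idle since the
  // previous check. Resets every flag so the next interval starts fresh.
  def allIdleAndReset(): Boolean =
    activeSinceLastCheck.map(_.getAndSet(false)).forall(wasActive => !wasActive)
}
```

The per-channel alternative would instead keep one such timer per stream and additionally needs the dynamic reopening within the same Association mentioned above.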

patriknw (Member, Author) commented

Thanks a lot for offering your help, @mboogerd. This goes very deep into Artery internals and I have to think hard myself about how to solve it, and by the time I have done that I will probably have implemented most of it (at least in my head). Therefore I suggest that I take this ticket and you help out with reviewing and testing with a real application. Do you have an urgent deadline?

mboogerd commented

That sounds good to me @patriknw. My 'real application' is very much WIP and even once released won't have 1000s of clients instantly, so no urgent deadlines. I'll be happy to review and test, just ping me when you're ready.

patriknw (Member, Author) commented

OK, good. I can't make any promises about when I can start on this, but it's something I want to fix independent of your request, to make Artery ready for prime time.

patriknw self-assigned this on Nov 14, 2017
patriknw added the '2 - pick next' and '3 - in progress' labels and removed the '1 - triaged' label on Nov 14, 2017
patriknw added a commit that referenced this issue Jan 22, 2018
hardening, and fix memory leak in SystemMessageDelivery
patriknw added a commit that referenced this issue Jan 30, 2018
* make sure compressions for quarantined are removed in case they
  are lingering around
* also means that advertise will not be done for quarantined
* remove tombstone in InboundCompressions
patriknw added a commit that referenced this issue Feb 16, 2018
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined are removed in case they are lingering around
  * also means that advertise will not be done for quarantined
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other previous changes
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still be lingering in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used when resolving actor refs).
    Those are small, since the queues have been removed, and the cache
    is bounded.
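
A sketch of the settings implied by the last bullet above, plus an idle-stop timeout for the behaviour this issue asks for; the key names and defaults here are assumptions, so check reference.conf of the released version:

```
# Sketch only: names and values are assumptions, not confirmed against the release.
akka.remote.artery.advanced {
  # Stop outbound streams of an association after it has been idle this long.
  stop-idle-outbound-after = 5 minutes
  # Remove a quarantined association completely after this delay.
  remove-quarantined-association-after = 1 hour
}
```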
patriknw added further commits that referenced this issue (with the same message) on Feb 16, Feb 20, and Feb 21, 2018
patriknw removed the '2 - pick next' and '3 - in progress' labels on Feb 21, 2018
patriknw added this to the 2.5.10 milestone on Feb 21, 2018
ktoso removed the '3 - in progress' label on Feb 23, 2018