New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop unused Artery outbound streams #23967

Closed
patriknw opened this Issue Nov 12, 2017 · 4 comments

Comments

Projects
None yet
3 participants
@patriknw
Member

patriknw commented Nov 12, 2017

As noticed by @mboogerd it's an oversight that we don't stop outbound streams when they are not used. They are stopped when quarantined (incl when cluster member is removed), but it would be good to stop unused streams to reduce resource consumption.

If there are unacknowledged system messages the control stream should not be stopped until after a long timeout (see existing give-up-system-message-after config).

Slightly related to #21359

@mboogerd

This comment has been minimized.

Show comment
Hide comment
@mboogerd

mboogerd Nov 14, 2017

@patriknw I'm willing to help out, but I am facing some difficulties finding the right approach due to my unfamiliarity with Remoting/Artery.

  1. Testability. For now I'm a little clueless how to test this. We are fixing something that is inherently very internal to Arterty (we can't detect from the outside that it is continuing idling). I can't find existing tests that would serve as a good template for setting up Artery such that it allows internal-state (AssociationRegistry) inspection. Do you have suggestions towards an approach?

  2. I'm divided between implementing an association-wide-idle-timer and a per-channel-idle-timer. The former could involve an idle-flag per outbound-stream; once their conjunction is true, all streams complete. Alternatively, each channel can individually close when idle, but then also needs to be dynamically reopened (within the same Association) when re-used. The Association can be removed from the AssociationRegistry once Association.streamsCompleted completes. Do you have preferences?

  3. A related issue is that the documentation of Association.streamsCompleted states that it's only reliable during shutdown. Also, if runOutboundOrdinaryMessagesStream is invoked with configuration value "outbound-lanes" exceeding 1, it looks like those streams won't have their materialized value registered in streamMatValues. What would be required to get a reliable check for shutdown?

mboogerd commented Nov 14, 2017

@patriknw I'm willing to help out, but I am facing some difficulties finding the right approach due to my unfamiliarity with Remoting/Artery.

  1. Testability. For now I'm a little clueless how to test this. We are fixing something that is inherently very internal to Arterty (we can't detect from the outside that it is continuing idling). I can't find existing tests that would serve as a good template for setting up Artery such that it allows internal-state (AssociationRegistry) inspection. Do you have suggestions towards an approach?

  2. I'm divided between implementing an association-wide-idle-timer and a per-channel-idle-timer. The former could involve an idle-flag per outbound-stream; once their conjunction is true, all streams complete. Alternatively, each channel can individually close when idle, but then also needs to be dynamically reopened (within the same Association) when re-used. The Association can be removed from the AssociationRegistry once Association.streamsCompleted completes. Do you have preferences?

  3. A related issue is that the documentation of Association.streamsCompleted states that it's only reliable during shutdown. Also, if runOutboundOrdinaryMessagesStream is invoked with configuration value "outbound-lanes" exceeding 1, it looks like those streams won't have their materialized value registered in streamMatValues. What would be required to get a reliable check for shutdown?

@patriknw

This comment has been minimized.

Show comment
Hide comment
@patriknw

patriknw Nov 14, 2017

Member

Thanks a lot for offering your help, @mboogerd . This is very deep internal into Artery and I have to think hard myself of how to solve this, and when I will have done that I have probably implemented most of it (at least in my head). Therefore I suggest that I will take this ticket and you can help out with reviewing and testing with real application. Do you have an urgent deadline?

Member

patriknw commented Nov 14, 2017

Thanks a lot for offering your help, @mboogerd . This is very deep internal into Artery and I have to think hard myself of how to solve this, and when I will have done that I have probably implemented most of it (at least in my head). Therefore I suggest that I will take this ticket and you can help out with reviewing and testing with real application. Do you have an urgent deadline?

@mboogerd

This comment has been minimized.

Show comment
Hide comment
@mboogerd

mboogerd Nov 14, 2017

That sounds good to me @patriknw. My 'real application' is very much WIP and even once released won't have 1000s of clients instantly, so no urgent deadlines. I'll be happy to review and test, just ping me when you're ready.

mboogerd commented Nov 14, 2017

That sounds good to me @patriknw. My 'real application' is very much WIP and even once released won't have 1000s of clients instantly, so no urgent deadlines. I'll be happy to review and test, just ping me when you're ready.

@patriknw

This comment has been minimized.

Show comment
Hide comment
@patriknw

patriknw Nov 14, 2017

Member

ok, good. I can't make any promises when I can start on this, but it's something I want to fix independent of your request to make Artery ready for prime time.

Member

patriknw commented Nov 14, 2017

ok, good. I can't make any promises when I can start on this, but it's something I want to fix independent of your request to make Artery ready for prime time.

@patriknw patriknw self-assigned this Nov 14, 2017

patriknw added a commit that referenced this issue Nov 20, 2017

@patriknw patriknw referenced this issue Nov 20, 2017

Merged

Stop unused Artery outbound streams, #23967 #24023

5 of 5 tasks complete

patriknw added a commit that referenced this issue Jan 22, 2018

Stop unused Artery outbound streams, #23967
hardening, and fix memory leak in SystemMessageDelivery

patriknw added a commit that referenced this issue Jan 22, 2018

patriknw added a commit that referenced this issue Jan 30, 2018

Stop unused Artery outbound streams, #23967
hardening, and fix memory leak in SystemMessageDelivery

patriknw added a commit that referenced this issue Jan 30, 2018

Close inbound compression when quarantined, #23967
* make sure compressions for quarantined are removed in case they
  are lingering around
* also means that advertise will not be done for quarantined
* remove tombstone in InboundCompressions

patriknw added a commit that referenced this issue Feb 14, 2018

Stop unused Artery outbound streams, #23967
hardening, and fix memory leak in SystemMessageDelivery

patriknw added a commit that referenced this issue Feb 14, 2018

Close inbound compression when quarantined, #23967
* make sure compressions for quarantined are removed in case they
  are lingering around
* also means that advertise will not be done for quarantined
* remove tombstone in InboundCompressions

patriknw added a commit that referenced this issue Feb 16, 2018

Stop unused Artery outbound streams, #23967
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined are removed in case they are lingering around
  * also means that advertise will not be done for quarantined
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other previous changes
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still lingering in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used by resolve actor ref).
    Those are small, since the queues have been removed, and the cache
    is bounded.

patriknw added a commit that referenced this issue Feb 16, 2018

Stop unused Artery outbound streams, #23967
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined are removed in case they are lingering around
  * also means that advertise will not be done for quarantined
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other previous changes
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still lingering in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used by resolve actor ref).
    Those are small, since the queues have been removed, and the cache
    is bounded.

patriknw added a commit that referenced this issue Feb 20, 2018

Stop unused Artery outbound streams, #23967
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined are removed in case they are lingering around
  * also means that advertise will not be done for quarantined
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other previous changes
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still lingering in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used by resolve actor ref).
    Those are small, since the queues have been removed, and the cache
    is bounded.

patriknw added a commit that referenced this issue Feb 21, 2018

Stop unused Artery outbound streams, #23967
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined are removed in case they are lingering around
  * also means that advertise will not be done for quarantined
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other previous changes
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still lingering in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used by resolve actor ref).
    Those are small, since the queues have been removed, and the cache
    is bounded.

patriknw added a commit that referenced this issue Feb 21, 2018

Merge pull request #24023 from akka/wip-23967-stop-idle-patriknw
Stop unused Artery outbound streams, #23967

@patriknw patriknw closed this Feb 21, 2018

@patriknw patriknw added this to the 2.5.10 milestone Feb 21, 2018

@ktoso ktoso removed the 3 - in progress label Feb 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment