Stop unused Artery outbound streams #23967

Closed

patriknw opened this issue Nov 12, 2017 · 4 comments

patriknw (Member) commented Nov 12, 2017

As noticed by @mboogerd, it's an oversight that we don't stop outbound streams when they are not used. They are stopped when quarantined (including when a cluster member is removed), but it would be good to stop unused streams to reduce resource consumption.

If there are unacknowledged system messages, the control stream should not be stopped until after a long timeout (see the existing give-up-system-message-after config).

Slightly related to #21359
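
For reference, that timeout lives under akka.remote.artery.advanced. A minimal sketch of where such a setting is tuned (the value shown is illustrative, check reference.conf for the actual default):

```
akka.remote.artery.advanced {
  # Existing setting: how long unacknowledged system messages are retried
  # before the association is given up (and quarantined).
  give-up-system-message-after = 6 hours
}
```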

patriknw added the '1 - triaged' and 't:remoting:artery' labels on Nov 12, 2017

mboogerd commented

@patriknw I'm willing to help out, but I am facing some difficulties finding the right approach due to my unfamiliarity with Remoting/Artery.

  1. Testability. For now I'm a little unsure how to test this. We are fixing something that is inherently very internal to Artery (we can't detect from the outside that it keeps idling). I can't find existing tests that would serve as a good template for setting up Artery so that it allows internal-state (AssociationRegistry) inspection. Do you have suggestions for an approach? (Rough sketch after this list.)

  2. I'm torn between implementing an association-wide idle timer and a per-channel idle timer. The former could involve an idle flag per outbound stream; once their conjunction is true, all streams complete (sketched after this list). Alternatively, each channel could close individually when idle, but then it also needs to be dynamically reopened (within the same Association) when re-used. The Association can be removed from the AssociationRegistry once Association.streamsCompleted completes. Do you have preferences?

  3. A related issue is that the documentation of Association.streamsCompleted states that it is only reliable during shutdown. Also, if runOutboundOrdinaryMessagesStream is invoked with the configuration value "outbound-lanes" greater than 1, it looks like those streams won't have their materialized values registered in streamMatValues. What would be required to get a reliable check for shutdown?
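
To make (1) a bit more concrete, this is the kind of access I was picturing for a test, placed in the akka.remote.artery package so it can reach package-private internals. How to actually query the AssociationRegistry from there is the part I'm unsure about, so treat the lookup as an assumption rather than existing API:

```scala
package akka.remote.artery

import akka.actor.ActorSystem
import akka.remote.RARP

// Sketch only: obtain the Artery transport of a test ActorSystem so that
// association/registry state can be inspected. The exact way to look up an
// Association (e.g. by remote address) is an assumption about internal API.
object ArteryTestAccess {
  def transport(system: ActorSystem): ArteryTransport =
    RARP(system).provider.transport.asInstanceOf[ArteryTransport]
}
```

From there the idle check could be a polling assertion (e.g. TestKit awaitAssert) against whatever registry accessor turns out to exist.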
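
And for (2), roughly what I mean by the association-wide variant, as a self-contained sketch (all names here are made up for illustration, not existing Artery code): each outbound stream flips an activity flag, a periodic timer checks the conjunction, and only when every stream was idle for a whole interval would the association stop its streams.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Illustration of the "association-wide idle timer" idea: every outbound
// stream (control, ordinary, large) records activity; a periodic check stops
// the whole association only when all of them were idle for the interval.
final class IdleTracker(streamCount: Int) {
  private val activeSinceLastCheck = Array.fill(streamCount)(new AtomicBoolean(false))

  // Called on the stream's message path whenever an element passes through.
  def messageSent(streamId: Int): Unit =
    activeSinceLastCheck(streamId).set(true)

  // Called by the periodic timer: true if all streams were idle since the
  // previous check. Resets every flag so the next interval starts fresh.
  def allIdleAndReset(): Boolean =
    activeSinceLastCheck.map(_.getAndSet(false)).forall(wasActive => !wasActive)
}
```

The per-channel alternative would instead keep one such timer per stream and additionally needs the dynamic reopening within the same Association mentioned above.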

patriknw (Member, Author) commented

Thanks a lot for offering your help, @mboogerd. This goes very deep into Artery internals and I have to think hard myself about how to solve it, and by the time I have done that I will probably have implemented most of it (at least in my head). Therefore I suggest that I take this ticket and you help out with reviewing and testing with a real application. Do you have an urgent deadline?

mboogerd commented

That sounds good to me @patriknw. My 'real application' is very much WIP and even once released won't have 1000s of clients instantly, so no urgent deadlines. I'll be happy to review and test, just ping me when you're ready.

patriknw (Member, Author) commented

OK, good. I can't make any promises about when I can start on this, but it's something I want to fix independent of your request, to make Artery ready for prime time.

patriknw self-assigned this on Nov 14, 2017
patriknw added the '2 - pick next' and '3 - in progress' labels and removed the '1 - triaged' label on Nov 14, 2017
patriknw added a commit that referenced this issue Jan 22, 2018
hardening, and fix memory leak in SystemMessageDelivery
patriknw added a commit that referenced this issue Jan 30, 2018
* make sure compressions for quarantined are removed in case they
  are lingering around
* also means that advertise will not be done for quarantined
* remove tombstone in InboundCompressions
patriknw added a commit that referenced this issue Feb 16, 2018
* fix memory leak in SystemMessageDelivery
* initial set of tests for idle outbound associations, credit to mboogerd
* close inbound compression when quarantined, #23967
  * make sure compressions for quarantined are removed in case they are lingering around
  * also means that advertise will not be done for quarantined
  * remove tombstone in InboundCompressions
* simplify async callbacks by using invokeWithFeedback
* compression for old incarnation, #24400
  * it was fixed by the other previous changes
  * also confirmed by running the SimpleClusterApp with TCP
    as described in the ticket
* test with tcp and tls-tcp transport
  * handle the stop signals differently for tcp transport because they
    are converted to StreamTcpException
* cancel timers on shutdown
* share the top-level FR for all Association instances
* use linked queue for control and large streams, less memory usage
* remove quarantined idle Association completely after a configured delay
  * note that shallow Association instances may still be lingering in the
    heap because of cached references from RemoteActorRef, which may
    be cached by LruBoundedCache (used when resolving actor refs).
    Those are small, since the queues have been removed, and the cache
    is bounded.
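
A sketch of the settings implied by the last bullet above, plus an idle-stop timeout for the behaviour this issue asks for; the key names and defaults here are assumptions, so check reference.conf of the released version:

```
# Sketch only: names and values are assumptions, not confirmed against the release.
akka.remote.artery.advanced {
  # Stop outbound streams of an association after it has been idle this long.
  stop-idle-outbound-after = 5 minutes
  # Remove a quarantined association completely after this delay.
  remove-quarantined-association-after = 1 hour
}
```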
patriknw added further commits that referenced this issue (with the same message) on Feb 16, Feb 20, and Feb 21, 2018
patriknw removed the '2 - pick next' and '3 - in progress' labels on Feb 21, 2018
patriknw added this to the 2.5.10 milestone on Feb 21, 2018
ktoso removed the '3 - in progress' label on Feb 23, 2018