allocator: Fix events watch leak #2215
Conversation
The allocator starts a watch on events and never shuts it down. If this node loses leadership, events will continue to pile up in that queue forever, leading to memory exhaustion.

Fix this by giving the watch a well-defined lifecycle. Return it along with a cancel function from an init() method to make it clear it must be cancelled, instead of sticking the cancel function value in a struct where there is no clear responsibility for calling it.

In the future, we might consider changing Watch to run a callback function, so shutting down the watch is mandatory (similar to what we do with store transactions).

cc @briantd @jakegdocker

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
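To make the shape of the fix concrete, here is a minimal, self-contained Go sketch of the lifecycle pattern the commit message describes. `Event`, `watchQueue`, and `Allocator` are stand-ins invented for this example, not the actual swarmkit types; only the lifecycle (init returning the watch together with its cancel function, and Run deferring the cancel) mirrors the change in this PR.

```go
package allocator

import (
	"context"
	"fmt"
)

type Event struct{ Kind string }

// watchQueue is a stand-in for the store's event broadcaster. Each call to
// Watch registers a new subscriber channel and returns it with a cancel
// function that unregisters it. If cancel is never called, the broadcaster
// keeps delivering events to that subscriber forever — the leak this PR
// fixes. (Not goroutine-safe; a real broadcaster would lock the map.)
type watchQueue struct {
	subs map[chan Event]struct{}
}

func newWatchQueue() *watchQueue {
	return &watchQueue{subs: make(map[chan Event]struct{})}
}

func (q *watchQueue) Watch() (<-chan Event, func()) {
	ch := make(chan Event, 16)
	q.subs[ch] = struct{}{}
	return ch, func() {
		delete(q.subs, ch)
		close(ch)
	}
}

func (q *watchQueue) Publish(ev Event) {
	for ch := range q.subs {
		ch <- ev // buffers (or blocks) until the subscriber drains it
	}
}

type Allocator struct{ queue *watchQueue }

// init starts the watch and returns it together with its cancel function,
// making shutdown the caller's explicit responsibility rather than stashing
// the cancel func in a struct field that nobody clearly owns.
func (a *Allocator) init() (<-chan Event, func()) {
	return a.queue.Watch()
}

// Run consumes events until ctx is cancelled (e.g. on losing leadership)
// and tears the watch down on the way out, so no events pile up afterwards.
func (a *Allocator) Run(ctx context.Context) {
	watch, cancelWatch := a.init()
	defer cancelWatch()

	for {
		select {
		case ev, ok := <-watch:
			if !ok {
				return
			}
			fmt.Println("allocating for event:", ev.Kind)
		case <-ctx.Done():
			return
		}
	}
}
```

The key point is that Run owns the watch for exactly its own lifetime; before this fix, the cancel function was never called, so the queue kept buffering events for a deposed leader.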
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #2215      +/-   ##
==========================================
+ Coverage   60.03%   60.05%   +0.01%
==========================================
  Files         124      124
  Lines       20115    20122       +7
==========================================
+ Hits        12077    12085       +8
+ Misses       6680     6674       -6
- Partials     1358     1363       +5
```
👍 great catch @briantd! Thanks for working on a patch for it @aaronlehmann!
LGTM.
LGTM, with the caveat that I don't know the allocator code very well :) So please excuse the dumb question: this looks like it's creating a watch per actor, rather than a single shared watch?
It didn't make sense to me that the watch used to be shared between all actors. That wouldn't have worked once we had multiple actors, because they would have competed for events, so I moved the watch call to be per-actor.
Ah ok, so it's not like a pool of workers where each grabs the first available event; it's more of a fanout to all actors?
Correct. Other actors would be handling things other than network allocation. It's a bit of a weird design and we would probably change it anyway if we introduced other kinds of allocators.
Ok, that makes sense, thanks for explaining! 👍
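To make the distinction discussed above concrete, here is a toy Go illustration using plain channels rather than swarmkit's actual watch machinery (all names here are invented for the example). With one shared channel, actors compete and each event is consumed by exactly one of them; when each actor opens its own watch, every actor observes the full event stream.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Worker pool: both actors read from the same channel, so each
	// event is grabbed by exactly one of them.
	shared := make(chan int)
	var wg sync.WaitGroup
	for _, name := range []string{"actor A", "actor B"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for ev := range shared {
				fmt.Printf("shared queue: %s grabbed event %d\n", name, ev)
			}
		}(name)
	}
	for i := 0; i < 4; i++ {
		shared <- i
	}
	close(shared)
	wg.Wait()

	// Fanout: each actor holds its own watch channel and the publisher
	// copies every event onto all of them, so both see the full stream.
	watchA := make(chan int, 4)
	watchB := make(chan int, 4)
	for i := 0; i < 4; i++ {
		for _, w := range []chan int{watchA, watchB} {
			w <- i
		}
	}
	close(watchA)
	close(watchB)
	for ev := range watchA {
		fmt.Println("fanout: actor A saw event", ev)
	}
	for ev := range watchB {
		fmt.Println("fanout: actor B saw event", ev)
	}
}
```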
LGTM |
This was something I tripped over in one of my PoC attempts to refactor some of this stuff for CNI networking on top of #1965; I can confirm that it does not work well at all...
- moby/swarmkit#2218 - moby/swarmkit#2215 - moby/swarmkit#2233 Signed-off-by: Ying <ying.li@docker.com>