allocator: Fix events watch leak #2215
Conversation
The allocator starts a watch on events and never shuts it down. If this node loses leadership, events will continue to pile up in that queue forever, leading to memory exhaustion.

Fix this by giving the watch a well-defined lifecycle. Return it along with a cancel function from an init() method to make it clear it must be cancelled, instead of sticking the cancel function value in a struct where there is no clear responsibility for calling it.

In the future, we might consider changing Watch to run a callback function, so shutting down the watch is mandatory (similar to what we do with store transactions).

cc @briantd @jakegdocker

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>
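To make the shape of the fix concrete, here is a minimal, self-contained Go sketch of the lifecycle pattern the commit message describes. `Event`, `watchQueue`, and `Allocator` are stand-ins invented for this example, not the actual swarmkit types; only the lifecycle (init returning the watch together with its cancel function, and Run deferring the cancel) mirrors the change in this PR.

```go
package allocator

import (
	"context"
	"fmt"
)

type Event struct{ Kind string }

// watchQueue is a stand-in for the store's event broadcaster. Each call to
// Watch registers a new subscriber channel and returns it with a cancel
// function that unregisters it. If cancel is never called, the broadcaster
// keeps delivering events to that subscriber forever — the leak this PR
// fixes. (Not goroutine-safe; a real broadcaster would lock the map.)
type watchQueue struct {
	subs map[chan Event]struct{}
}

func newWatchQueue() *watchQueue {
	return &watchQueue{subs: make(map[chan Event]struct{})}
}

func (q *watchQueue) Watch() (<-chan Event, func()) {
	ch := make(chan Event, 16)
	q.subs[ch] = struct{}{}
	return ch, func() {
		delete(q.subs, ch)
		close(ch)
	}
}

func (q *watchQueue) Publish(ev Event) {
	for ch := range q.subs {
		ch <- ev // buffers (or blocks) until the subscriber drains it
	}
}

type Allocator struct{ queue *watchQueue }

// init starts the watch and returns it together with its cancel function,
// making shutdown the caller's explicit responsibility rather than stashing
// the cancel func in a struct field that nobody clearly owns.
func (a *Allocator) init() (<-chan Event, func()) {
	return a.queue.Watch()
}

// Run consumes events until ctx is cancelled (e.g. on losing leadership)
// and tears the watch down on the way out, so no events pile up afterwards.
func (a *Allocator) Run(ctx context.Context) {
	watch, cancelWatch := a.init()
	defer cancelWatch()

	for {
		select {
		case ev, ok := <-watch:
			if !ok {
				return
			}
			fmt.Println("allocating for event:", ev.Kind)
		case <-ctx.Done():
			return
		}
	}
}
```

The key point is that Run owns the watch for exactly its own lifetime; before this fix, the cancel function was never called, so the queue kept buffering events for a deposed leader.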
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #2215      +/-   ##
==========================================
+ Coverage   60.03%   60.05%   +0.01%
==========================================
  Files         124      124
  Lines       20115    20122       +7
==========================================
+ Hits        12077    12085       +8
+ Misses       6680     6674       -6
- Partials     1358     1363       +5
```
👍 great catch @briantd! Thanks for working on a patch for it @aaronlehmann!
LGTM.
LGTM, with the caveat that I don't know the allocator code very well :) So please excuse the dumb question: this looks like it's creating a watch per actor, rather than a single shared watch?
It didn't make sense to me that the watch used to be shared between all actors. That wouldn't have worked once we had multiple actors, because they would have competed for events, so I moved the watch call to be per-actor.
Ah ok, so it's not like a pool of workers where each grabs the first available event; it's more of a fanout to all actors?
Correct. Other actors would be handling things other than network allocation. It's a bit of a weird design and we would probably change it anyway if we introduced other kinds of allocators.
Ok, that makes sense, thanks for explaining! 👍
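To make the distinction discussed above concrete, here is a toy Go illustration using plain channels rather than swarmkit's actual watch machinery (all names here are invented for the example). With one shared channel, actors compete and each event is consumed by exactly one of them; when each actor opens its own watch, every actor observes the full event stream.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	// Worker pool: both actors read from the same channel, so each
	// event is grabbed by exactly one of them.
	shared := make(chan int)
	var wg sync.WaitGroup
	for _, name := range []string{"actor A", "actor B"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for ev := range shared {
				fmt.Printf("shared queue: %s grabbed event %d\n", name, ev)
			}
		}(name)
	}
	for i := 0; i < 4; i++ {
		shared <- i
	}
	close(shared)
	wg.Wait()

	// Fanout: each actor holds its own watch channel and the publisher
	// copies every event onto all of them, so both see the full stream.
	watchA := make(chan int, 4)
	watchB := make(chan int, 4)
	for i := 0; i < 4; i++ {
		for _, w := range []chan int{watchA, watchB} {
			w <- i
		}
	}
	close(watchA)
	close(watchB)
	for ev := range watchA {
		fmt.Println("fanout: actor A saw event", ev)
	}
	for ev := range watchB {
		fmt.Println("fanout: actor B saw event", ev)
	}
}
```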
LGTM |
This was something I tripped over in one of my PoC attempts to refactor some of this stuff for CNI networking on top of #1965; I can confirm that it does not work well at all...
- moby/swarmkit#2218 - moby/swarmkit#2215 - moby/swarmkit#2233 Signed-off-by: Ying <ying.li@docker.com>