allocator: Less aggressive retry #2021

aaronlehmann · 2017-03-08T21:55:24Z

Instead of retrying unallocated tasks, services, and networks every time data changes in the store, limit these retries to every 5 minutes.

When a repeated attempt to allocate one of these objects fails, log it at the debug log level, to reduce noise in the logs.

cc @alexmavr @yongtang @aboch

codecov · 2017-03-08T22:06:08Z

Codecov Report

Merging #2021 into master will increase coverage by 0.05%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master    #2021      +/-   ##
==========================================
+ Coverage   53.66%   53.71%   +0.05%     
==========================================
  Files         109      109              
  Lines       18991    19008      +17     
==========================================
+ Hits        10191    10210      +19     
+ Misses       7578     7564      -14     
- Partials     1222     1234      +12

Last update e4762bc...513d028.

aboch · 2017-03-08T23:33:10Z

manager/allocator/network.go

@@ -401,12 +416,22 @@ func (a *Allocator) doNetworkAlloc(ctx context.Context, ev events.Event) {
 	case state.EventCreateNode, state.EventUpdateNode, state.EventDeleteNode:
 		a.doNodeAlloc(ctx, ev)
 	case state.EventCreateTask, state.EventUpdateTask, state.EventDeleteTask:
-		a.doTaskAlloc(ctx, ev)
+		a.doTaskAlloc(ctx, ev, nc.pendingTasks)


Couldn't doTaskAlloc(ctx, ev) retrieve pendingTasks on its own via ctx.nc.pendingTasks?

aboch · 2017-03-08T23:40:59Z

manager/allocator/network.go

-func (a *Allocator) procUnallocatedTasksNetwork(ctx context.Context) {
-	nc := a.netCtx
-	allocatedTasks := make([]*api.Task, 0, len(nc.unallocatedTasks))
+func (a *Allocator) procTasksNetwork(ctx context.Context, toAllocate map[string]*api.Task, quiet bool) {


If working on the nc retrieved from the context is equivalent, would it make sense to write this method as:

func (a *Allocator) procTasksNetwork(ctx context.Context, onRetryInterval bool) {
	nc := a.netCtx
	quiet := false
	toAllocate := nc.pendingTasks
	if onRetryInterval {
		toAllocate = nc.unallocatedTasks
		quiet = true
	}
	...

aboch · 2017-03-08T23:50:15Z

Logic looks good to me.
I just have a couple of comments about the function prototypes.

aaronlehmann · 2017-03-09T00:23:39Z

Updated, thanks

aboch · 2017-03-09T00:30:20Z

manager/allocator/network.go

-	allocatedTasks := make([]*api.Task, 0, len(nc.unallocatedTasks))
+	quiet := false
+	toAllocate := nc.pendingTasks
+	allocatedTasks := make([]*api.Task, 0, len(toAllocate))


This line should go below the if block, after which we know what toAllocate points to.
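
To make the ordering point concrete, here is a minimal, self-contained sketch in which the result slice is sized only after the if block has settled which map `toAllocate` refers to. The `task` and `networkContext` types and the `selectTasks` function are simplified stand-ins invented for this illustration; they are not the swarmkit definitions.

```go
package main

import "fmt"

// task and networkContext are simplified, hypothetical stand-ins for
// the real allocator types.
type task struct{ id string }

type networkContext struct {
	pendingTasks     map[string]*task
	unallocatedTasks map[string]*task
}

// selectTasks picks the set of tasks to work on and sizes the result
// slice only after that choice has been made.
func selectTasks(nc *networkContext, onRetryInterval bool) (map[string]*task, bool, []*task) {
	quiet := false
	toAllocate := nc.pendingTasks
	if onRetryInterval {
		toAllocate = nc.unallocatedTasks
		quiet = true
	}
	// Sized here, once toAllocate is final, so the capacity matches
	// the map that will actually be iterated.
	allocated := make([]*task, 0, len(toAllocate))
	return toAllocate, quiet, allocated
}

func main() {
	nc := &networkContext{
		pendingTasks:     map[string]*task{"a": &task{id: "a"}},
		unallocatedTasks: map[string]*task{"b": &task{id: "b"}, "c": &task{id: "c"}},
	}
	for _, retry := range []bool{false, true} {
		toAllocate, quiet, allocated := selectTasks(nc, retry)
		fmt.Printf("retry=%v tasks=%d quiet=%v cap=%d\n", retry, len(toAllocate), quiet, cap(allocated))
	}
}
```

If the `make` call sat above the if block, the capacity would always follow `pendingTasks`, even on a retry pass that iterates `unallocatedTasks`.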

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

aboch · 2017-03-09T01:37:26Z

Looks good to me

aluzzardi · 2017-03-09T19:25:09Z

Do we want to handle the potentially impossible case in which we don't get a commit?

E.g. we receive a commit (and it turns out that it freed up an IP address), but less than 5 minutes have passed since the last retry so we don't try, and no other commit comes after, so we don't allocate the task.

aaronlehmann · 2017-03-09T19:27:36Z

I think that's a very good point. I had considered this but didn't want to add too much complexity, especially because I think this should be backported. Do you think it's a good idea to add a timer that triggers after 5 minutes if no commits happen during that interval?

aluzzardi · 2017-03-09T21:10:22Z

I think it's such a rare case that we may not need to bother ...

I guess it depends if the fix would be extremely tiny?

Can this simply be another switch case with a time.After?

aluzzardi · 2017-03-09T21:10:52Z

Or a timer that we reset every time we receive a commit

aluzzardi · 2017-03-09T21:11:41Z

Or maybe we shouldn't bother :)

This is going to be so rare that the code to handle this case may be buggy and we'll never notice.

aaronlehmann · 2017-03-09T21:24:20Z

Yeah, let's not bother. I liked the suggestion of adding a time.After in the select, but then I realized this would start a timer every time we receive an event, and that timer wouldn't be reaped until it fired 5 minutes later. It could be a big waste of resources. The right way to do it would be to use an actual time.Timer and reset it whenever there's activity, but that's easier to mess up, and I think you're right that it's a very unusual case. Even without addressing that case, this PR is still making things a lot better without adding too much risk.

aluzzardi · 2017-03-09T21:25:07Z

LGTM

aaronlehmann added the process/cherry-pick label Mar 8, 2017

aaronlehmann added this to the 17.03.1 milestone Mar 8, 2017

aboch reviewed Mar 8, 2017

aaronlehmann force-pushed the allocator-aggressive-retry branch from 6e78fc2 to 456c2ec on March 9, 2017 00:23

aboch reviewed Mar 9, 2017

aaronlehmann force-pushed the allocator-aggressive-retry branch from 456c2ec to 513d028 on March 9, 2017 01:33

aluzzardi merged commit e928827 into moby:master Mar 9, 2017

aaronlehmann added process/cherry-picked and removed process/cherry-pick labels Mar 10, 2017

This was referenced Mar 10, 2017

[17.03.x] Vendor swarmkit f93948c moby/moby#31742 (Merged)

bump to 17.03.1-rc1 moby/moby#31754 (Merged)
