Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allocator: Avoid assigning duplicate IPs during initialization #2237

Merged
merged 1 commit into from Jun 12, 2017

Conversation

@aaronlehmann
Copy link
Collaborator

commented Jun 10, 2017

When the allocator starts up, there is a pass that "allocates" the
existing tasks/nodes/services in the store. In fact, existing tasks are
typically already allocated, and this is mostly populating local state
to reflect which IPs are taken. However, if there are any tasks in the
store which are brand new, or previously failed to allocated, these will
actually receive new allocations.

The problem is that allocation of new IPs is interspersed with updating
local state with existing IPs. If a task, node, or service that needs an
IP is processed before one that claims a specific IP, the IP claimed by
the latter task be assigned.

This change makes the allocator do two passes on initialization. First
it handles objects that claim a specific IP, then it handles all other
objects.

cc @abhinandanpb @mavenugo @sanimej @ijc

@aaronlehmann

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 10, 2017

One open question: Do we need to do something similar for networks? I notice we store a DriverState field on networks, but I'm not sure if this includes IP addresses.

allocator: Avoid assigning duplicate IPs during initialization
When the allocator starts up, there is a pass that "allocates" the
existing tasks/nodes/services in the store. In fact, existing tasks are
typically already allocated, and this is mostly populating local state
to reflect which IPs are taken. However, if there are any tasks in the
store which are brand new, or previously failed to allocated, these will
actually receive new allocations.

The problem is that allocation of new IPs is interspersed with updating
local state with existing IPs. If a task, node, or service that needs an
IP is processed before one that claims a specific IP, the IP claimed by
the latter task be assigned.

This change makes the allocator do two passes on initialization. First
it handles objects that claim a specific IP, then it handles all other
objects.

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

@aaronlehmann aaronlehmann force-pushed the aaronlehmann:duplicate-ips branch from 0064ad4 to 61483fb Jun 10, 2017

@aaronlehmann aaronlehmann referenced this pull request Jun 10, 2017
40 of 40 tasks complete
@aluzzardi

This comment has been minimized.

Copy link
Contributor

commented Jun 10, 2017

LGTM

Unrelated to this PR, but the complexity of this part is frightening, had a hard time understanding the implications :/

@@ -212,6 +213,7 @@ func TestAllocator(t *testing.T) {
go func() {
assert.NoError(t, a.Run(context.Background()))
}()
defer a.Stop()

This comment has been minimized.

Copy link
@ijc

ijc Jun 12, 2017

Contributor

This is an unrelated fix I think?

This comment has been minimized.

Copy link
@aaronlehmann

aaronlehmann Jun 12, 2017

Author Collaborator

Not a fix, just a cleanup to move a.Stop to a defer statement.

@ijc

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2017

I 100% agree WRT the complexity.

In this case although the diff looks pretty large and confusing the bulk of it is just factoring out allocateTasks and allocateServices and then arranging to call those and allocateNodes twice. IMO the refactoring is a good thing in its own right since that function is pretty big and tricky to follow.

So, LGTM. (I'm a little worried about the apparent non-determinism in the test case, implied by the looping 100 times, but I guess that is unavoidable for an issue like this).

@aaronlehmann

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 12, 2017

I didn't observe any non-determinism, but wanted to make sure the test case is aggressive enough to catch regressions. The problem triggers much more quickly when creating tasks in reverse numerical order, which is why the test does this. If I remember correctly, it only takes 3 tasks to trigger the problem. But in forward numerical order, it took about 52 tasks (since the allocator iterates over tasks in lexical order, which I suppose is close to numerical order). To avoid encoding assumptions about the iteration order, I decided to do 100 iterations. The test case runs instantly so I don't think there's any downside.

We could consider randomizing the task IDs, but I generally prefer to avoid nondeterminism in tests.

@mavenugo

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2017

Thanks @aaronlehmann @aluzzardi @ijc am not super familiar with this area of the code. But it does feel complicated.

BTW, should this be back ported to 17.03 as well ? I have seen multiple reports of duplicate IP usage in swarm-mode.

@aaronlehmann

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 12, 2017

BTW, should this be back ported to 17.03 as well ?

Yes, if there are plans for another 17.03 release, I think it should be. There are also other high-priority fixes that would make sense to backport. Please let me know if I should work on these backports.

@abhi

This comment has been minimized.

Copy link
Contributor

commented Jun 12, 2017

LGTM

@aaronlehmann aaronlehmann merged commit a4bf013 into docker:master Jun 12, 2017

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
dco-signed All commits are signed

@aaronlehmann aaronlehmann deleted the aaronlehmann:duplicate-ips branch Jun 12, 2017

@aaronlehmann aaronlehmann added this to the 17.06 milestone Jun 12, 2017

andrewhsu added a commit to andrewhsu/docker-ce that referenced this pull request Jun 12, 2017
revendor github.com/docker/swarmkit to ef3c57a
To get the changes:
* docker/swarmkit#2234
* docker/swarmkit#2237

Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
andrewhsu added a commit to andrewhsu/docker-ce that referenced this pull request Jun 13, 2017
revendor github.com/docker/swarmkit to 6083c76
To get the changes:
* docker/swarmkit#2234
* docker/swarmkit#2237
* docker/swarmkit#2238

Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.