allocator: Avoid assigning duplicate IPs during initialization #2237

Merged
aaronlehmann merged 1 commit into docker:master from aaronlehmann:duplicate-ips on Jun 12, 2017

Conversation

@aaronlehmann
Collaborator

aaronlehmann commented Jun 10, 2017

When the allocator starts up, there is a pass that "allocates" the
existing tasks/nodes/services in the store. In fact, existing tasks are
typically already allocated, and this is mostly populating local state
to reflect which IPs are taken. However, if there are any tasks in the
store which are brand new, or which previously failed to allocate, these will
actually receive new allocations.

The problem is that allocation of new IPs is interspersed with updating
local state with existing IPs. If a task, node, or service that needs an
IP is processed before one that claims a specific IP, the IP claimed by
the latter task may end up assigned to the former, producing a duplicate.

This change makes the allocator do two passes on initialization. First
it handles objects that claim a specific IP, then it handles all other
objects.
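
Roughly, the initialization ends up structured as in the sketch below. This is a minimal illustration of the two-pass idea, not the actual swarmkit code: the Allocator stub, the existingOnly flag, and the placeholder helper bodies are assumptions for clarity.

```go
package main

import (
	"context"
	"fmt"
)

type Allocator struct{}

// Each helper either records addresses that stored objects already claim
// (existingOnly == true) or hands out new ones (existingOnly == false).
// The bodies here are placeholders for illustration.
func (a *Allocator) allocateNodes(ctx context.Context, existingOnly bool) error {
	fmt.Println("nodes, existing only:", existingOnly)
	return nil
}

func (a *Allocator) allocateServices(ctx context.Context, existingOnly bool) error {
	fmt.Println("services, existing only:", existingOnly)
	return nil
}

func (a *Allocator) allocateTasks(ctx context.Context, existingOnly bool) error {
	fmt.Println("tasks, existing only:", existingOnly)
	return nil
}

func (a *Allocator) initNetworkState(ctx context.Context) error {
	// Pass 1: mark every IP that an existing object already claims as taken.
	// Pass 2: allocate fresh IPs for objects that still need them, so a new
	// allocation can never grab an address reserved in pass 1.
	for _, existingOnly := range []bool{true, false} {
		if err := a.allocateNodes(ctx, existingOnly); err != nil {
			return err
		}
		if err := a.allocateServices(ctx, existingOnly); err != nil {
			return err
		}
		if err := a.allocateTasks(ctx, existingOnly); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	if err := (&Allocator{}).initNetworkState(context.Background()); err != nil {
		fmt.Println("init failed:", err)
	}
}
```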

cc @abhinandanpb @mavenugo @sanimej @ijc

@aaronlehmann

Collaborator

aaronlehmann commented Jun 10, 2017

One open question: Do we need to do something similar for networks? I notice we store a DriverState field on networks, but I'm not sure if this includes IP addresses.

allocator: Avoid assigning duplicate IPs during initialization

Signed-off-by: Aaron Lehmann <aaron.lehmann@docker.com>

@aaronlehmann aaronlehmann referenced this pull request Jun 10, 2017

Closed

17.06.0 RC3 tracker #8

40 of 40 tasks complete
@aluzzardi

Contributor

aluzzardi commented Jun 10, 2017

LGTM

Unrelated to this PR, but the complexity of this part is frightening; I had a hard time understanding the implications :/

@@ -212,6 +213,7 @@ func TestAllocator(t *testing.T) {
	go func() {
		assert.NoError(t, a.Run(context.Background()))
	}()
	defer a.Stop()

@ijc

ijc Jun 12, 2017

Contributor

This is an unrelated fix I think?

@aaronlehmann

aaronlehmann Jun 12, 2017

Collaborator

Not a fix, just a cleanup to move a.Stop to a defer statement.
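
For illustration, the general pattern being applied here is sketched below with a generic, stand-in component type rather than the real allocator test: registering Stop via defer right after startup means the cleanup runs on every return path, even if a later assertion fails and returns early.

```go
package main

import "fmt"

type component struct{ running bool }

func (c *component) Run()  { c.running = true }
func (c *component) Stop() { c.running = false; fmt.Println("stopped") }

func run() error {
	c := &component{}
	c.Run()
	// The deferred Stop executes on every return path below, including
	// early returns caused by failed assertions or errors.
	defer c.Stop()

	// ... work or test assertions that may fail and return early ...
	return nil
}

func main() {
	if err := run(); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("done")
}
```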

@ijc

Contributor

ijc commented Jun 12, 2017

I 100% agree WRT the complexity.

In this case, although the diff looks pretty large and confusing, the bulk of it is just factoring out allocateTasks and allocateServices and then arranging to call those and allocateNodes twice. IMO the refactoring is a good thing in its own right, since that function is pretty big and tricky to follow.

So, LGTM. (I'm a little worried about the apparent non-determinism in the test case, implied by the looping 100 times, but I guess that is unavoidable for an issue like this).

@aaronlehmann

Collaborator

aaronlehmann commented Jun 12, 2017

I didn't observe any non-determinism, but wanted to make sure the test case is aggressive enough to catch regressions. The problem triggers much more quickly when creating tasks in reverse numerical order, which is why the test does this. If I remember correctly, it only takes 3 tasks to trigger the problem. But in forward numerical order, it took about 52 tasks (since the allocator iterates over tasks in lexical order, which I suppose is close to numerical order). To avoid encoding assumptions about the iteration order, I decided to do 100 iterations. The test case runs instantly so I don't think there's any downside.

We could consider randomizing the task IDs, but I generally prefer to avoid nondeterminism in tests.
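
For illustration, a rough, self-contained sketch of that test strategy is below. The task-ID scheme and allocateAll are stand-ins so the sketch runs on its own; the real test uses the allocator and store directly.

```go
package main

import "fmt"

// allocateAll stands in for running the allocator's initialization over the
// created tasks; here it just hands out unique placeholder addresses.
func allocateAll(ids []string) map[string]string {
	assigned := make(map[string]string)
	for i, id := range ids {
		assigned[id] = fmt.Sprintf("10.0.0.%d", i+1)
	}
	return assigned
}

func main() {
	const numTasks = 100

	// Create task IDs in reverse numerical order, which reproduced the bug
	// with far fewer tasks than forward order did.
	ids := make([]string, 0, numTasks)
	for i := numTasks - 1; i >= 0; i-- {
		ids = append(ids, fmt.Sprintf("task%d", i))
	}

	assigned := allocateAll(ids)

	// Verify that no two tasks were assigned the same IP.
	seen := make(map[string]string) // IP -> task ID
	for id, ip := range assigned {
		if other, dup := seen[ip]; dup {
			panic(fmt.Sprintf("duplicate IP %s assigned to %s and %s", ip, id, other))
		}
		seen[ip] = id
	}
	fmt.Println("no duplicate IPs among", numTasks, "tasks")
}
```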

@mavenugo

Contributor

mavenugo commented Jun 12, 2017

Thanks @aaronlehmann @aluzzardi @ijc. I'm not super familiar with this area of the code, but it does feel complicated.

BTW, should this be backported to 17.03 as well? I have seen multiple reports of duplicate IP usage in swarm mode.

@aaronlehmann

Collaborator

aaronlehmann commented Jun 12, 2017

BTW, should this be backported to 17.03 as well?

Yes, if there are plans for another 17.03 release, I think it should be. There are also other high-priority fixes that would make sense to backport. Please let me know if I should work on these backports.

@abhi

Member

abhi commented Jun 12, 2017

LGTM

@aaronlehmann aaronlehmann merged commit a4bf013 into docker:master Jun 12, 2017

2 checks passed

ci/circleci Your tests passed on CircleCI!
dco-signed All commits are signed

@aaronlehmann aaronlehmann deleted the aaronlehmann:duplicate-ips branch Jun 12, 2017

@aaronlehmann aaronlehmann added this to the 17.06 milestone Jun 12, 2017

andrewhsu added a commit to andrewhsu/docker-ce that referenced this pull request Jun 12, 2017

revendor github.com/docker/swarmkit to ef3c57a
To get the changes:
* docker/swarmkit#2234
* docker/swarmkit#2237

Signed-off-by: Andrew Hsu <andrewhsu@docker.com>

andrewhsu added a commit to andrewhsu/docker-ce that referenced this pull request Jun 13, 2017

revendor github.com/docker/swarmkit to 6083c76
To get the changes:
* docker/swarmkit#2234
* docker/swarmkit#2237
* docker/swarmkit#2238

Signed-off-by: Andrew Hsu <andrewhsu@docker.com>