
Fix nil pointer dereference in node allocation #2764

Merged (1 commit) on Oct 17, 2018

Conversation

@dperny (Collaborator) commented Oct 17, 2018

- What I did

Fix a nil pointer dereference in node allocation.

- How I did it

https://www.youtube.com/watch?v=bLHL75H_VEM

When the network allocator starts, it performs two passes of allocation. The first, with existingAddressesOnly set to true, simply re-allocates any already reserved addresses, which makes the local driver state consistent with the state in swarmkit's object store. The second pass then performs any new allocations left outstanding from when the allocator last stopped.
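A minimal sketch of that two-pass startup, using illustrative stand-in types rather than swarmkit's actual allocator API (the names `network` and `startAllocator` are hypothetical):

```go
package main

import "fmt"

// network is a stand-in for swarmkit's network object; only the fields
// needed for the illustration are included.
type network struct {
	ID        string
	Allocated bool // true if addresses were already reserved for it
}

// startAllocator mimics the two-pass startup: a restore pass with
// existingAddressesOnly=true, then a pass for outstanding allocations.
func startAllocator(networks []*network) {
	// Pass 1: only re-reserve addresses that were already allocated,
	// bringing local driver state in line with the object store.
	for _, n := range networks {
		if n.Allocated {
			fmt.Printf("restoring reserved addresses for %s\n", n.ID)
		}
	}
	// Pass 2: allocate anything left outstanding from before the
	// allocator last stopped.
	for _, n := range networks {
		if !n.Allocated {
			fmt.Printf("allocating new addresses for %s\n", n.ID)
			n.Allocated = true
		}
	}
}

func main() {
	startAllocator([]*network{
		{ID: "overlay-1", Allocated: true}, // restored in pass 1
		{ID: "overlay-2"},                  // allocated in pass 2
	})
}
```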

Since #2725, nodes only have attachments allocated for them if they have a task currently scheduled that requires those networks. This happens after the task is allocated and scheduled.

Before this change, if a Task was correctly allocated but the allocator stopped before the Node was also allocated, the restore phase could add an empty api.NetworkAttachment object to the Node's attachments. Then, in the new-allocations phase, when processing all attachments, we unconditionally dereferenced the NetworkAttachment object's Network field, which was nil. This caused a segfault and crashed swarmkit.

With this change, we no longer add these errant NetworkAttachment objects to nodes.
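A minimal sketch of the failure mode, again with hypothetical stand-in types rather than swarmkit's actual api package. The nil check below only marks where the panic occurred; the actual fix in this PR is to stop appending the empty attachment during the restore phase in the first place.

```go
package main

import "fmt"

// Illustrative stand-ins for swarmkit's api types.
type network struct{ ID string }

type networkAttachment struct {
	Network *network // nil on the errant attachments described above
}

type node struct {
	attachments []*networkAttachment
}

func processAttachments(n *node) {
	for _, a := range n.attachments {
		// Before the fix nothing guarded this dereference, so an empty
		// attachment appended during the restore phase caused a nil
		// pointer dereference on a.Network and crashed the process.
		if a.Network == nil {
			continue
		}
		fmt.Println("processing attachment for network", a.Network.ID)
	}
}

func main() {
	n := &node{attachments: []*networkAttachment{
		{},                              // errant empty attachment
		{Network: &network{ID: "net1"}}, // normal attachment
	}}
	processAttachments(n)
}
```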

- How to test it

Includes an automated test, which I have verified fails before the change and passes after.

- Description for the changelog

Fix nil pointer dereference that could crash swarmkit.

@codecov bot commented Oct 17, 2018

Codecov Report

No coverage was uploaded for the pull request base (master@7d5d33b).
The diff coverage is 100%.

@@           Coverage Diff            @@
##             master   #2764   +/-   ##
========================================
  Coverage          ?   61.8%           
========================================
  Files             ?     134           
  Lines             ?   21867           
  Branches          ?       0           
========================================
  Hits              ?   13514           
  Misses            ?    6901           
  Partials          ?    1452

@mavenugo (Contributor) commented

@anshulpundir @dperny we need this backported to the 18.09 branch (if that branch has not been created yet, I guess it must be created and this fix cherry-picked?).
