New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for Duplicate IP issues #2105

Merged
merged 3 commits into from Mar 9, 2018

Conversation

Projects
None yet
5 participants
@abhi
Member

abhi commented Mar 8, 2018

While investigating duplicate IP issue in IPAM library the following bugs were found:

  1. Race condition: IPAM library seems to assume the present of datastore to achieve atomicity. However IPAM library can be used without datastore , for eg swarmkit can use the ipam by setting the datastore to nil. The race condition causes corruption of the bitsequence which might lead to duplicate IPs by releasing non deallocated IP bits and also cause IP leaks due to certain bit being set even though it was not supposed to be allocated.
  2. Free bit logic: The logic in getAvailableBit checks for the presence of a free bit from a start bit in a block. However if the start bit is right after the last available bit, it will return the last bit in the block as the next available bit even though it is not free.
    For eg:
    Consider the block if curr:16 and sequence:0xfffeffff, count:10, next:{sequence:0x0fffffff, count:6} then the getAvailableBit(16) would return 31 as the next available bit and we will end up with duplicate IPs/vxlanids etc.
  3. Byte Offset Calculations
  4. Another Byte Offset Calculations (Regression): Similar to the same example in 2) we were moving on to the next block altogether instead of looking at the next instance in the same block if it is available.
  5. Concurrent Map Access

Tests are added to verify the above scenrios

Signed-off-by: Abhinandan Prativadi abhi@docker.com

@abhi abhi requested a review from fcrisciani Mar 8, 2018

@abhi abhi referenced this pull request Mar 8, 2018

Merged

IPAM tests #2104

@abhi abhi force-pushed the abhi:bitseq branch from 7d4b47e to 7a9c339 Mar 8, 2018

@abhi abhi requested a review from ddebroy Mar 8, 2018

abhi added some commits Mar 8, 2018

Fixing Duplicate IP issue in IPAM library
This commit contains fixes for duplicate IP with 3 issues addressed:
1) Race condition when datastore is not present in cases like swarmkit
2) Byte Offset calculation depending on where the start of the bit
   in the bitsequence is, the offset was adding more bytes to the offset
   when the start of the bit is in the middle of one of the instances in
   a block
3) Finding the available bit was returning the last bit in the curent instance in
   a block if the block is not full and the current bit is after the last
   available bit.

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>
Fixing concurrent map access
This commit fixes panic due to concurrent map access

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>

@abhi abhi force-pushed the abhi:bitseq branch from 7a9c339 to 4b12122 Mar 8, 2018

@codecov-io

This comment has been minimized.

codecov-io commented Mar 8, 2018

Codecov Report

❗️ No coverage uploaded for pull request base (master@3aca383). Click here to learn what that means.
The diff coverage is 84.84%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #2105   +/-   ##
=========================================
  Coverage          ?   40.43%           
=========================================
  Files             ?      139           
  Lines             ?    22376           
  Branches          ?        0           
=========================================
  Hits              ?     9047           
  Misses            ?    12000           
  Partials          ?     1329
Impacted Files Coverage Δ
ipam/allocator.go 70.84% <75%> (ø)
bitseq/sequence.go 83.33% <86.2%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3aca383...c6843ee. Read the comment docs.

@abhi abhi force-pushed the abhi:bitseq branch from 4b12122 to 233d58d Mar 8, 2018

@fcrisciani

This comment has been minimized.

Member

fcrisciani commented Mar 8, 2018

ping @aboch

@@ -498,24 +506,40 @@ func (h *Handle) UnmarshalJSON(data []byte) error {
func getFirstAvailable(head *sequence, start uint64) (uint64, uint64, error) {
// Find sequence which contains the start bit
byteStart, bitStart := ordinalToPos(start)

This comment has been minimized.

@fcrisciani

fcrisciani Mar 8, 2018

Member

as a reminder for myself, byteStart and bitStart here are relative to the first byte of the block where the ordinal start falls into

@@ -498,24 +506,40 @@ func (h *Handle) UnmarshalJSON(data []byte) error {
func getFirstAvailable(head *sequence, start uint64) (uint64, uint64, error) {
// Find sequence which contains the start bit
byteStart, bitStart := ordinalToPos(start)
current, _, _, inBlockBytePos := findSequence(head, byteStart)
current, _, precBlocks, inBlockBytePos := findSequence(head, byteStart)

This comment has been minimized.

@fcrisciani

fcrisciani Mar 8, 2018

Member

current is the pointer to the block where byteStart fall into
precBlocks, is the number of blocks within the same block where the byteStart fall into
example: head->0xFFFF00FF|10 and my byte start is 5 because my ordinal is 42, in this case precBlocks is 1, indicating that my ordinal falls in the second compressed block

goto next
}
if err != nil {
// There are some more instances of the same block, so add the offset

This comment has been minimized.

@fcrisciani

fcrisciani Mar 8, 2018

Member

this is only possible in the case of serial allocation, where the start can be with in the block itself

@fcrisciani

LGTM

@ddebroy

First set of comments. Still going through the actual RLE changes.

if store != nil {
h.Unlock() // The lock is acquired in the GetObject

This comment has been minimized.

@ddebroy

ddebroy Mar 9, 2018

Contributor

It appears store.GetObject will modify h (through the SetValue and SetIndex calls). If so, shouldn't we keep h locked while that is happening? Also, it appears the lock acquired in GetObject is a datastore lock rather than the object's lock. Is h.Unlock() needed and correct before the GetObject call invoked on h?

This comment has been minimized.

@abhi

abhi Mar 9, 2018

Member

The way the store logic works in libnetwork is , the in memory data structure gets overloaded by the store data. So there need not be lock going into the GetObjec

if store != nil {
h.Unlock() // The lock is acquired in the GetObject
if err := store.GetObject(datastore.Key(h.Key()...), h); err != nil && err != datastore.ErrKeyNotFound {

This comment has been minimized.

@ddebroy

ddebroy Mar 9, 2018

Contributor

h.unlock() will be needed at this point since we are returning. Potentially in a later patch, do you think the code can be restructured a little bit to just defer unlocks (instead of the unlock/lock sequences) after grabbing a lock on h before the for loop. I.e.

h.lock()
defer h.unlock()
for {
    .
    .
    .
}

This comment has been minimized.

@abhi

abhi Mar 9, 2018

Member

Yes I hate the sequence too. But because of the way we handle differently for swarmkit and native, its kind of needed like this. I will work on the refactor next.

if err := nh.writeToStore(); err != nil {
if _, ok := err.(types.RetryError); !ok {
return ret, fmt.Errorf("internal failure while setting the bit: %v", err)
if h.store != nil {

This comment has been minimized.

@ddebroy

ddebroy Mar 9, 2018

Contributor

Do we need to check h.store again at this point? It seems we already check earlier in the function. If we lock h outside the for early in the function, I think h.store should not become nil mid-way?

This comment has been minimized.

@abhi

abhi Mar 9, 2018

Member

This is to differentiate between a store logic and a non store logic codepath. I need to unlock before going to write store it will lead to deadlock.

This comment has been minimized.

@ddebroy

ddebroy Mar 9, 2018

Contributor

Got it regarding the logic to differentiate store/non-store codepaths.

For the locks, it looks like we are calling nh.writeToStore() below which will lock nh's mutex and not that of h, right?

This comment has been minimized.

@abhi

abhi Mar 9, 2018

Member

Yep thats right. All the more reason not to have a lock around the operation right ? I retained the logic needed for store code path the way it was.

IPAM and Bitseq test cases
This commit contains test cases to verify the changes and to
solidify the library.

Signed-off-by: Abhinandan Prativadi <abhi@docker.com>

@abhi abhi force-pushed the abhi:bitseq branch from 233d58d to c6843ee Mar 9, 2018

@abhi

This comment has been minimized.

@ddebroy

ddebroy approved these changes Mar 9, 2018

LGTM

@fcrisciani fcrisciani merged commit 834b391 into docker:master Mar 9, 2018

2 checks passed

ci/circleci Your tests passed on CircleCI!
Details
dco-signed All commits are signed
@alexafshar

This comment has been minimized.

alexafshar commented Apr 28, 2018

@abhi @fcrisciani
may the underlying cause of this dup ip issue (ipam race condition, free bit logic, bit offset calc ), also manifest itself in the form of "rouge ip" situation as well? i.e. a "rouge ip" gets assigned to a replica that is not routable. This ip might have been assigned before. cluster dns thinks its routable on a nslookup. However docker inspect on the service doesn't show it.

docker inspect $(docker ps | grep some-xyz-service | awk '{print $1}') | grep IPv4
IPv4Address: 10.1.107.85
IPv4Address: 10.3.37.251
IPv4Address: 10.1.107.74

then doing a nslookup from a running replica for some-xyz-service, shows an extra IP that does not show on a docker inspect of the service

docker exec -it b9c424e28055 nslookup some-xyz-service
;; Truncated, retrying in TCP mode.
Server:                   127.0.0.11
Address:                127.0.0.11#53

Non-authoritative answer:
Name:   some-xyz-service
Address: 10.1.107.85
Name:   some-xyz-service
**Address: 10.0.58.156**      <---------   not routable
Name:   some-xyz-service
Address: 10.3.37.251
Name:  some-xyz-service
Address: 10.1.107.74
@fcrisciani

This comment has been minimized.

Member

fcrisciani commented Apr 30, 2018

@alexafshar typically I would say no. But this fix will be effective on a network that is not already corrupted. so the suggestion is to check that the network does not show duplicates and if so then the best is to start with a new network.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment