Overlay fix for transient IP reuse #1935

Merged
merged 5 commits into docker:master from fcrisciani:overlay-setmatrix on Oct 3, 2017

fcrisciani commented Sep 7, 2017

For a limited period of time, the same IP and MAC address can be in use by more than one endpoint at once.
The previous logic could not handle this condition, which caused connectivity issues lasting up to 5 minutes.

Fixes: #1934

if err := d.checkEncryption(nid, nil, 0, true, false); err != nil {

fcrisciani Sep 7, 2017

Member

checkEncryption is done inside the peerDelete

@@ -1090,15 +1060,6 @@ func (n *network) contains(ip net.IP) bool {
return false
}
func (n *network) getSubnetforIPAddr(ip net.IP) *subnet {

fcrisciani Sep 7, 2017

Member

was unused

codecov-io commented Sep 7, 2017

Codecov Report

❗️ No coverage uploaded for pull request base (master@389e1e6).
The diff coverage is 0.58%.

@@            Coverage Diff            @@
##             master    #1935   +/-   ##
=========================================
  Coverage          ?   37.88%           
=========================================
  Files             ?      137           
  Lines             ?    27325           
  Branches          ?        0           
=========================================
  Hits              ?    10351           
  Misses            ?    15703           
  Partials          ?     1271

Impacted Files                   Coverage Δ
drivers/overlay/encryption.go    0% <ø> (ø)
drivers/overlay/joinleave.go     0% <0%> (ø)
drivers/overlay/peerdb.go        3.81% <0%> (ø)
drivers/overlay/ov_serf.go       30.86% <0%> (ø)
drivers/overlay/ov_network.go    0.26% <0%> (ø)
networkdb/networkdb.go           60% <0%> (ø)
osl/neigh_linux.go               0% <0%> (ø)
drivers/overlay/overlay.go       27.52% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 389e1e6...d93b9b0.

thaJeztah commented Sep 18, 2017

ping @mavenugo @abhinandanpb PTAL; this one should be fixing various stability issues

@thaJeztah

Left some nits, and there's some code churn between commits, but would like @mavenugo or @abhinandanpb to have a look as they are probably more familiar with the code 😄

@@ -68,7 +68,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
 	ep.ifName = containerIfName
-	if err := d.writeEndpointToStore(ep); err != nil {
+	if err = d.writeEndpointToStore(ep); err != nil {

thaJeztah Sep 20, 2017

Member

Is there a reason you changed these? Overall I think it's clearer to just create a new variable here, as the error is not used outside the if. (It makes the code a bit easier to "grasp", because you know the result is not used outside of the if 😄)

fcrisciani Sep 20, 2017

Member

I changed it only because one of the code checker tools raised a warning. I can change it back, but I also saw that in some other places the proper outer err was never set, so shadowing is risky.
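
To illustrate the shadowing risk mentioned above, here is a minimal sketch. It is not code from this PR: `writeToStore`, `joinShadowed` and `joinAssigned` are hypothetical names, and the deferred rollback check is an assumed pattern. It shows how `:=` inside an `if` creates a new `err` that the outer, named `err` never sees.

```go
package main

import (
	"errors"
	"fmt"
)

// writeToStore is a hypothetical stand-in for a store write that fails.
func writeToStore() error { return errors.New("store unavailable") }

// joinShadowed uses := inside the if, which declares a new err scoped to
// that statement. The named result err stays nil, so the deferred check
// never triggers the rollback.
func joinShadowed() (rolledBack bool, err error) {
	defer func() {
		if err != nil {
			rolledBack = true
		}
	}()
	if err := writeToStore(); err != nil {
		fmt.Println("joinShadowed: store write failed:", err)
	}
	return
}

// joinAssigned uses = and therefore assigns the named result err, so the
// deferred check sees the failure and the caller receives it.
func joinAssigned() (rolledBack bool, err error) {
	defer func() {
		if err != nil {
			rolledBack = true
		}
	}()
	if err = writeToStore(); err != nil {
		fmt.Println("joinAssigned: store write failed:", err)
	}
	return
}

func main() {
	rb, err := joinShadowed()
	fmt.Println("shadowed:", rb, err)
	rb, err = joinAssigned()
	fmt.Println("assigned:", rb, err)
}
```

When the error is returned explicitly inside the `if` (`return err`), the shadowed form is harmless, which is why both styles appear in practice; the risk is limited to code paths that fall through and rely on the outer variable.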

if i != 1 {
// Transient case, there is more than one endpoint that is using the same IP,MAC pair
s, _ := pMap.mp.String(pKey.String())
logrus.Warnf("peerDbAdd transient condition - Key:%s cardinality:%d db state:%s", pKey.String(), i, s)

thaJeztah Sep 20, 2017

Member

Can you add a space after the colons here (for readability)?

if i != 0 {
// Transient case, there is more than one endpoint that is using the same IP,MAC pair
s, _ := pMap.mp.String(pKey.String())
logrus.Warnf("peerDbDelete transient condition - Key:%s cardinality:%d db state:%s", pKey.String(), i, s)

thaJeztah Sep 20, 2017

Member

Can you add a space after the colons here (for readability)?

thaJeztah Sep 20, 2017

Member

(same for other errors)

@nishanttotla

Is this PR expected to be part of an upcoming release, or is there no timeline for it yet?

fcrisciani commented Sep 25, 2017

@nishanttotla I would like that, but it is still stuck in code review.
@mavenugo @abhinandanpb @vieux PTAL

abhi commented Sep 25, 2017

@fcrisciani will review this today

vieux commented Sep 26, 2017

@abhinandanpb ping :)

// If there is still an entry into the database and the deletion went through without errors means that there is now no
// configuration active in the kernel.
// Restore one configuration for the <ip,mac> directly from the database, note that is guaranteed that there is one

mavenugo Sep 27, 2017

Contributor

It's guaranteed to be one or more, correct?

fcrisciani Sep 27, 2017

Member

Yep, 4 lines above there is: if dbEntries == 0 { return nil }

@abhi

Overall LGTM. I have one concern about the scenario I have mentioned in the comment.

An alternate thought, not necessarily to be implemented: should we treat this problem as a classic networking MAC move scenario? If we detect a MAC move, send a unicast ARP for it?

// If there is still an entry into the database and the deletion went through without errors means that there is now no
// configuration active in the kernel.
// Restore one configuration for the <ip,mac> directly from the database, note that is guaranteed that there is one
peerKey, peerEntry, err := d.peerDbSearch(nid, peerIP)

abhi Sep 27, 2017

Member

Will it have more than one entry for the same IP if there are continuous container adds and deletes? From what I see, the internal map set implementation is a map of maps: https://github.com/deckarep/golang-set/blob/master/threadunsafe.go#L36. If that's the case, the order will not be honored. Would that be a problem? How is peerDbDelete going to handle that? For example, take a node n100:
C1 is on n1; the fdb on n100 will have C1 against 10.1.1.1, n1.
C1 appears on n2; the fdb will still have C1 against 10.1.1.1 on n1, and the peerdb mapset cache will hold this new entry until the previous entry is deleted.
Now C1 appears on n3; the fdb might still have the transient state against 10.1.1.1 on n1, so the peerdb cache will have 2 entries.
At this point we receive the delete for the entry against n1. Will the peerdb search now fetch one of them? For our scenario, let's say we pick the entry for C1 pointing to n3. If we then get a delete for the entry C1 pointing to n2, how would that proceed?

fcrisciani Sep 27, 2017

Member

Of course, we are considering a transient period that should converge "pretty quickly" (depending on how fast networkdb delivers notifications). The idea here is:

  1. push the first configuration into the kernel
  2. if any other notification arrives for the same IP, save it in the set
  3. once a delete notification arrives, if it deletes the configuration currently in the kernel, take the next one available and push it to the kernel (note that this may still not be the final one)
  4. the expected state at the end is to have only 1 entry in the set, and that is the one that is also configured in the kernel.

For your example, let's say that you receive the delete for the entry C1, n2: nothing happens, the entry is removed from the set and processing continues.
After that you receive the delete for C1, n1, which is the entry configured in the kernel: the kernel state is purged and the next entry is configured, in this case C1, n3.
The set now contains only 1 element, which is the one actually configured and the only real incarnation of the container C1, with the correct peer destination.
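
The four steps above can be sketched roughly as follows. This is an illustration under assumptions, not the setMatrix code from this PR: `peerEntry`, `peerTable`, `add` and `remove` are hypothetical names, and the "kernel" is modelled as a plain map holding the one active entry per IP.

```go
package main

import "fmt"

// peerEntry is a hypothetical record: an endpoint and the node hosting it.
type peerEntry struct {
	eid  string
	vtep string
}

// peerTable keeps every entry announced for an IP, plus the single entry
// that is currently "programmed" (a stand-in for the kernel state).
type peerTable struct {
	entries map[string]map[peerEntry]struct{}
	active  map[string]peerEntry
}

func newPeerTable() *peerTable {
	return &peerTable{
		entries: make(map[string]map[peerEntry]struct{}),
		active:  make(map[string]peerEntry),
	}
}

// add stores the entry; only the first entry seen for an IP is programmed.
func (t *peerTable) add(ip string, e peerEntry) {
	set, ok := t.entries[ip]
	if !ok {
		set = make(map[peerEntry]struct{})
		t.entries[ip] = set
	}
	set[e] = struct{}{}
	if len(set) == 1 {
		t.active[ip] = e
	}
}

// remove deletes the entry by value. If it was the programmed one, any
// remaining entry is promoted. Go map iteration order is unspecified, so
// the promoted entry may not be the final one while several remain.
func (t *peerTable) remove(ip string, e peerEntry) {
	set := t.entries[ip]
	delete(set, e)
	if len(set) == 0 {
		delete(t.entries, ip)
		delete(t.active, ip)
		return
	}
	if t.active[ip] != e {
		return // a non-programmed entry left: nothing to reconfigure
	}
	for next := range set {
		t.active[ip] = next
		break
	}
}

func main() {
	t := newPeerTable()
	ip := "10.1.1.1"
	for _, node := range []string{"n1", "n2", "n3"} {
		t.add(ip, peerEntry{eid: "C1", vtep: node})
	}
	t.remove(ip, peerEntry{eid: "C1", vtep: "n2"}) // not programmed: no-op
	t.remove(ip, peerEntry{eid: "C1", vtep: "n1"}) // programmed: promote
	fmt.Println(t.active[ip].vtep)                 // prints n3
}
```

Because deletes are matched by value rather than by position, the unordered set does not change the final state: whichever entry is promoted in between, once the stale entries are deleted only the live one remains and it is the one programmed.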

@mavenugo

@fcrisciani I am done with my review. Please address the comments before we can merge this PR.

fcrisciani added some commits Sep 5, 2017

Handle IP reuse in overlay
In case of IP reuse locally there was a race condition
that was leaving the overlay namespace with wrong configuration
causing connectivity issues.
This commit introduces the use of setMatrix to handle the transient
state and make sure that the proper configuration is maintained

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

flush peerdb entries on network delete
peerDB was never being flushed on network delete
leaving behind stale entries

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

log for miss notification
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

Addressing code review comments
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

Changed ipMask to string
Avoid error logs in case of local peer case, there is no need for deleteNeighbor
Avoid the network leave to readvertise already deleted entries to upper layer

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>

mavenugo commented Oct 3, 2017

@fcrisciani thanks for addressing the comments.

@mavenugo

LGTM

@mavenugo mavenugo merged commit 7447e54 into docker:master Oct 3, 2017

2 checks passed

ci/circleci Your tests passed on CircleCI!
dco-signed All commits are signed

@fcrisciani fcrisciani deleted the fcrisciani:overlay-setmatrix branch Oct 3, 2017

Nossnevs commented Oct 4, 2017

When will this fix be released?
@fcrisciani @mavenugo

kleptog commented Oct 9, 2017

And will it be backported to 17.03? We found 17.06 fairly flaky and 17.09 is pretty new.

thaJeztah commented Oct 9, 2017

@kleptog Docker CE (Community Edition) 17.03 has reached end of life (and Docker CE 17.06 will soon); Docker EE (Enterprise Edition) has a longer support duration (currently 12 months), but I don't know if this is planned to be backported. It's also possible that 17.03 does not have the same issue (it may show the same effect, but with a different cause).
