Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overlay fix for transient IP reuse #1935

Merged
merged 5 commits into from
Oct 3, 2017

Conversation

fcrisciani
Copy link

It is possible that for a limited period of time the same IP and Mac address are used at the same time.
The previous logic was not able to handle the condition creating connectivity issues that were lasting up to 5 minutes.

Fixes: #1934


if err := d.checkEncryption(nid, nil, 0, true, false); err != nil {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkEncryption is done inside the peerDelete

@@ -1090,15 +1060,6 @@ func (n *network) contains(ip net.IP) bool {
return false
}

func (n *network) getSubnetforIPAddr(ip net.IP) *subnet {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was unused

@codecov-io
Copy link

codecov-io commented Sep 7, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@389e1e6). Click here to learn what that means.
The diff coverage is 0.58%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1935   +/-   ##
=========================================
  Coverage          ?   37.88%           
=========================================
  Files             ?      137           
  Lines             ?    27325           
  Branches          ?        0           
=========================================
  Hits              ?    10351           
  Misses            ?    15703           
  Partials          ?     1271
Impacted Files Coverage Δ
drivers/overlay/encryption.go 0% <ø> (ø)
drivers/overlay/joinleave.go 0% <0%> (ø)
drivers/overlay/peerdb.go 3.81% <0%> (ø)
drivers/overlay/ov_serf.go 30.86% <0%> (ø)
drivers/overlay/ov_network.go 0.26% <0%> (ø)
networkdb/networkdb.go 60% <0%> (ø)
osl/neigh_linux.go 0% <0%> (ø)
drivers/overlay/overlay.go 27.52% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 389e1e6...d93b9b0. Read the comment docs.

@thaJeztah
Copy link
Member

ping @mavenugo @abhinandanpb PTAL; this one should be fixing various stability issues

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some nits, and there's some code churn between commits, but would like @mavenugo or @abhinandanpb to have a look as they are probably more familiar with the code 😄

if err := d.checkEncryption(nid, nil, 0, true, false); err != nil {
logrus.Warn(err)
}
// if err := d.checkEncryption(nid, nil, 0, true, false); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove this code instead of commenting it out in this commit?

@@ -232,13 +233,9 @@ func (d *driver) Leave(nid, eid string) error {
}
}

n.leaveSandbox()

// if err := d.checkEncryption(nid, nil, 0, true, false); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this code should probably be in the previous commit (see my other comment) 😄

@@ -68,7 +68,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,

ep.ifName = containerIfName

if err := d.writeEndpointToStore(ep); err != nil {
if err = d.writeEndpointToStore(ep); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you changed these? Overall I think it's clearer to just create a new variable here, as the error is not used outside the if. (It makes the code a bit easier to "grasp", because you know the result is not used outside of the if 😄)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed it as a result of a warning from one of the code checker tools, I can change it back but I also saw that in some other places the proper outer err was never set, so shadowing is risky

@@ -188,7 +188,7 @@ func (n *networkNamespace) AddNeighbor(dstIP net.IP, dstMac net.HardwareAddr, fo
n.Lock()
n.neighbors = append(n.neighbors, nh)
n.Unlock()
logrus.Debugf("Neighbor entry added for IP %v, mac %v", dstIP, dstMac)
logrus.Debugf("Neighbor entry added for IP %v, mac %v on ifc:%s", dstIP, dstMac, nh.linkName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a space after ifc:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I print field especially names or ids, I prefer to not put space between the : in order to make it clear if there is some unprintable or transparent char. IMO the delimiter to help readability is actually the ':'. Does it make sense?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would quoting work? %q? Point is that it's not done like that in many places, so if would still be ambiguous probably

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah realigning the whole project I think is an endless work, maybe I can update all the fields of this log to be consistent at least

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a huge issue, just noticed while looking :)

@@ -224,6 +224,7 @@ func (d *driver) Leave(nid, eid string) error {
return types.InternalMaskableErrorf("could not find endpoint with id %s", eid)
}

logrus.Errorf("The channel is valid:%t", d.notifyCh != nil)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message is a bit odd "the channel is valid"; should this be "invalid"? Also missing a space after :.

Consider using just Error instead of Errorf, because printing booleans doesn't need formatting;

logrus.Error("The channel is valid: ", d.notifyCh != nil)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry my bad, this is an error logs that I put for debug, I will remove it

// When we get a L3 miss check if its a valid peer and reprogram the neighbor
// entry again. Rate limit it to once attempt every 500ms, just in case a faulty
// container sends a flood of packets to invalid peers
if !l3Miss {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're removing this check in this commit, but adding it again in the last commit?

logrus.Warnf("miss notification for peer:%+v l3Miss:%t l2Miss:%t, if the problem persist check the gc_thresholds", neigh, l3Miss, l2Miss)
pKey, pEntry, err := n.driver.peerDbSearch(n.id, ip)
if !pEntry.isLocal {
logrus.Warnf("miss notification for peer:%+v l3Miss:%t l2Miss:%t, if the problem persist check the gc_thresh on the host pKey:%+v pEntry:%+v err:%v",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add spaces after the colons here as well? (peer: %+v etc)

if time.Since(t) > 500*time.Millisecond {
// The time limit here is to guarantee that the dbSearch is not
// done too frequently causing a stall of the peerDB operations.
if l3Miss && time.Since(t) > 500*time.Millisecond {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The l3Miss check was removed in an earlier commit, and already had a description for the 500ms; perhaps restore that check and description to keep the diff smaller

if i != 1 {
// Transient case, there is more than one endpoint that is using the same IP,MAC pair
s, _ := pMap.mp.String(pKey.String())
logrus.Warnf("peerDbAdd transient condition - Key:%s cardinality:%d db state:%s", pKey.String(), i, s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a space after the colons here (for readability)?

if i != 0 {
// Transient case, there is more than one endpoint that is using the same IP,MAC pair
s, _ := pMap.mp.String(pKey.String())
logrus.Warnf("peerDbDelete transient condition - Key:%s cardinality:%d db state:%s", pKey.String(), i, s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a space after the colons here (for readability)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same for other errors)

// Entries which are marked as permanent are never deleted by the garbage-collector.
// The time limit here is to guarantee that the dbSearch is not
// done too frequently causing a stall of the peerDB operations.
if l3Miss && time.Since(t) > 500*time.Millisecond {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend that we could combine the else and if into else if.
Tiny point. 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure will do

Copy link

@nishanttotla nishanttotla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this PR expected to be part of an upcoming release, or there's no timeline on it yet?

@fcrisciani
Copy link
Author

@nishanttotla I would like so, but still stuck on code review.
@mavenugo @abhinandanpb @vieux PTAL

@abhi
Copy link
Contributor

abhi commented Sep 25, 2017

@fcrisciani will review this today

@vieux
Copy link
Contributor

vieux commented Sep 26, 2017

@abhinandanpb ping :)

if time.Since(t) > 500*time.Millisecond {
t = time.Now()
n.programNeighbor(ip)
} else if l3Miss && time.Since(t) > 500*time.Millisecond {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is existing code, I think 500ms is aggressive. we can even double it to 1second. wdyt ?

Copy link
Contributor

@mavenugo mavenugo Sep 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this is just a warning message without any action, we can even increase this timer to avoid excessive logs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure can bump to 1sec


// If there is still an entry into the database and the deletion went through without errors means that there is now no
// configuration active in the kernel.
// Restore one configuration for the <ip,mac> directly from the database, note that is guaranteed that there is one
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its guaranteed to be one or more. Correct ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep 4 lines above the if dbEntries == 0 { return nil }

Copy link
Contributor

@abhi abhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I have one concern about the scenario I have mentioned in the comment.

Alternate thought not necessary to be implemented: Should we treat this problem as a classic networking mac move scenario. If we detect a mac move , unicast arp for it ?

// If there is still an entry into the database and the deletion went through without errors means that there is now no
// configuration active in the kernel.
// Restore one configuration for the <ip,mac> directly from the database, note that is guaranteed that there is one
peerKey, peerEntry, err := d.peerDbSearch(nid, peerIP)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it have more than 1 entry for the same IP if there is continuous container add deletes ? From what I see of the internal map set implementation its a map of maps. https://github.com/deckarep/golang-set/blob/master/threadunsafe.go#L36. If thats the case the order will not be honored. Would that be a problem ? How is the peerDbdelete going to handle that ? For eg lets have a node n100,
C1 is on n1 , fdb on n100 will have c1 against 10.1.1.1, n1
C1 appears on n2 , fdb will have c1 still against 10.1.1.1 on n1 , peerdb mapset cache will have this new entry until the previous entry is deleted
now C1 cppears on n3 , fdb might still have the transient state against 10.1.1.1 on n1 so the peerdb cache will have 2 entries.
At this point we will receive the delete for the entry against n1. Now the peerdb search will fetch one of them ? For our scenario lets say we pick the entry for C1 pointing to n3. Now if we get a delete for the entry C1 pointing to n2 - how would that proceed ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course we are considering a transient period that should converge "pretty quickly" (according to the timing of the networkdb in delivering notification). The idea here is:

  1. push in the kernel the first configuration
  2. if any other notification arrives for the same IP, save it in the Set
  3. once the delete notification arrives if it deletes the current configuration in the kernel, take the next one available and push it to the kernel (note this can still be not the final one)
  4. the expected state at the end is to have only 1 entry in the set and that is the one that is also configured in the kernel.

For your example let's say that you receive the delete for the entry C1, n2, then nothing happens, remove it from the set and contine.
After that you receive the delete for C1, n100, that is the entry configured in the kernel, then the kernel state is going to be purged and the next entry is configured, in this case C1, n3.
The set now contains only 1 element that is actually the one configured and the only real incarnation of the container C1 with the correct peer destination

eid string
vtep string
peerIPMaskOnes int
peerIPMaskBits int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is peerIPMask, a net.IPMask converted to couple of int, while the vtep net.IP converted to String ?
Isnt it easy to just marshal peerIPMask as a String and that makes it consistent and readable ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's only convenience, for the net.IPMask looks like the best way to initialize it back is using that decomposition. Ref: https://golang.org/pkg/net/#IPMask

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked again and looks like this can be converted to string and use the ParseCIDR that will return the IPNet that contains the IPMask object internally, will change

Copy link
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fcrisciani am done with my review. Pls address the comments before we can merge this PR.

Flavio Crisciani added 5 commits October 2, 2017 11:12
In case of IP reuse locally there was a race condition
that was leaving the overlay namespace with wrong configuration
causing connectivity issues.
This commit introduces the use of setMatrix to handle the transient
state and make sure that the proper configuration is maintained

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
peerDB was never being flushed on network delete
leaveing behind stale entries

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Avoid error logs in case of local peer case, there is no need for deleteNeighbor
Avoid the network leave to readvertise already deleted entries to upper layer

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
// was never been configured into the kernel (we allow only 1 configuration at the time per <ip,mac> mapping)
return nil
// Local peers do not have any local configuration to delete
if !localPeer {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body internally is only indented in the new if condition

@mavenugo
Copy link
Contributor

mavenugo commented Oct 3, 2017

@fcrisciani thanks for addressing the comments.

Copy link
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mavenugo mavenugo merged commit 7447e54 into moby:master Oct 3, 2017
@fcrisciani fcrisciani deleted the overlay-setmatrix branch October 3, 2017 17:22
@Nossnevs
Copy link
Contributor

Nossnevs commented Oct 4, 2017

When will this fix be released?
@fcrisciani @mavenugo

@kleptog
Copy link

kleptog commented Oct 9, 2017

And will it be backported to 17.03? We found 17.06 fairly flaky and 17.09 is pretty new.

@thaJeztah
Copy link
Member

@kleptog Docker CE (Community Edition) 17.03 has reached end of life (Docker CE 17.06 soon); Docker EE (Enterprise Edition) has a longer support duration (12 Months currently), but I don't know if this is planned to be backported. It's also possible that 17.03 does not have the same issue (may have the same effect, but could be a different cause)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants