Increase gRPC request timeout to 20 seconds for sending snapshots #2391

nishanttotla · 2017-10-02T18:17:24Z

Since #2375 was merged, we've been getting context deadline exceeded errors while sending large Raft snapshots. After investigation by @anshulpundir and me, it seems that the 2-second gRPC context timeout is too short for sending large snapshots. This PR increases the default value to 45 seconds. Testing from @davidwilliamson confirms that this fixes the issue.

cc @anshulpundir @aluzzardi @stevvooe @aaronlehmann @marcusmartins @mghazizadeh

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>

dperny · 2017-10-02T18:56:35Z

i'm pretty sure that's going to mean it takes 45 seconds to recognize that a raft member is down.

stevvooe · 2017-10-02T18:59:49Z

I think @dperny's concern is worth investigation.

nishanttotla · 2017-10-02T19:52:47Z

@dperny let me test this to make sure, but your concern is legitimate. This is only the worst case though right? Does changing this timeout mean we need to adjust other Raft settings?

@davidwilliamson in your tests, is it possible to check for @dperny's concern above?

codecov · 2017-10-02T20:38:45Z

Codecov Report

Merging #2391 into master will increase coverage by 0.18%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2391      +/-   ##
==========================================
+ Coverage    60.5%   60.68%   +0.18%     
==========================================
  Files         128      128              
  Lines       26319    26332      +13     
==========================================
+ Hits        15923    15980      +57     
+ Misses       9006     8958      -48     
- Partials     1390     1394       +4

nishanttotla · 2017-10-02T20:50:02Z

@dperny why do you expect that detecting down Raft members will take longer? Is the "heartbeat" done through a gRPC call that times out at 2 seconds?

Anyway, based on a discussion with @anshulpundir, I've updated this PR to only increase the time limit for when we send a snapshot. This is a little hacky perhaps, but should avoid other issues.

I would still suggest that @davidwilliamson test this change out before we consider merging it. I will create a build shortly.

anshulpundir

Spoke with @nishanttotla offline about making this more targeted, so that it only affects snapshot send messages.

anshulpundir · 2017-10-02T20:53:59Z

Oops, didn't see @nishanttotla's previous message, where he already mentioned this. Please ignore.

dperny · 2017-10-02T21:46:19Z

PTAL, this is less of a hack, but might be over-engineered. WDYT? @nishanttotla @anshulpundir

master...dperny:increase-snapshot-timeout

stevvooe · 2017-10-02T21:49:11Z

It would be good to see that this is tunable.

Do we have a PR for the real solution?

nishanttotla · 2017-10-02T21:50:56Z

@dperny looks alright to me, I think that is the right way to engineer it. The only qualm I have with doing this is that if we introduce this LargeSendTimeout option, then it just stays, even after we use streaming to send large snapshots.

What we also need to figure out is whether the core change (in peer.go) is the right one to begin with.

aluzzardi · 2017-10-02T22:23:40Z

Had the same concern as @dperny - I see it got addressed, looks good to me.

@stevvooe Agree with the tunable part generally speaking, however I'd hope we'd get rid of this workaround very soon and therefore the tunable option would become useless.

Can we do a round of manual testing to validate this? Specifically, raft down nodes with and without the patch as well as sending a large snapshot.

Not sure what to think about the current value of 45 seconds. Given the max snapshot size is 128MB, you'd need a transfer speed of at least 23 Mbit/s to make it happen within the timeout limit, which I think is a reasonable expectation?

There are a few edge cases but I don't think we can really address those. I believe the raft state machine counts any message for heartbeat purposes. If the peer goes down just as we're sending a snapshot, it would take 43 additional seconds to detect it's down. But that's really an edge case and we can't really work around that, can we?

davidwilliamson · 2017-10-02T22:36:17Z

@aluzzardi re: "Given the max snapshot size is 128MB, you'd need a transfer speed of at least 23 mbit/s to make it happen within the timeout limit"

What is the expected behavior if there are, say, five snapshots (current + four historical), i.e., docker swarm update --max-snapshots 4 ?

Is the 45 second timeout (and thus the bandwidth calculation) per snapshot, or for the entire history?

marcusmartins · 2017-10-02T22:41:38Z

@davidwilliamson --max-snapshots only controls how many snapshots to keep locally; they don't get synced over the wire, so they don't influence that calculation.

anshulpundir · 2017-10-02T23:01:12Z

What is the expected behavior if there are, say, five snapshots (current + four historical), i.e., docker swarm update --max-snapshots 4 ?

Only the latest snapshot is sent @davidwilliamson

anshulpundir · 2017-10-02T23:07:02Z

+1 on making this tunable, since the previous value of 2s for the timeout also seems arbitrary.

anshulpundir · 2017-10-02T23:13:24Z

manager/state/raft/transport/peer.go

+	// adjust timeout to be higher for when a snapshot is being sent. This
+	// is to accommodate for the fact that snapshots can be large.
+	if m.Type == raftpb.MsgSnap {
+		timeout = 45 * time.Second


define a constant for the default ?

nishanttotla · 2017-10-02T23:54:01Z

Added a SendTimeoutSnapshot field to NodeOptions so that the value can be made configurable in the future.

Also reduced the value of the large timeout to 20 seconds. Assuming a 100Mbps connection, about 240MB can be sent in a period of 20 seconds. This is well over our allowed limit.

anshulpundir · 2017-10-03T00:01:20Z

manager/state/raft/raft.go

-	SendTimeout    time.Duration
-	TLSCredentials credentials.TransportCredentials
-	KeyRotator     EncryptionKeyRotator
+	SendTimeout time.Duration


It would be nice to be able to modify this without a code change, for testing etc. Is that currently possible ?

That would have to be wired through the CLI and require more changes. I don't think that should be part of this PR.

I wasn't recommending that it be a part of this PR. But, I was curious if there is an easy way to add customizable options to swarmkit. Is the answer no ? Also, is CLI the only way to do it ?

It's certainly possible (and not hard) to make this configurable at the SwarmKit level, but assuming that it'll mostly be used through the Docker API means it's not very valuable unless it's done end-to-end. This is just my opinion.

anshulpundir · 2017-10-03T00:01:43Z

manager/state/raft/transport/mock_raft_test.go

-		Raft:              mr,
+		HeartbeatInterval:   3 * time.Second,
+		SendTimeout:         2 * time.Second,
+		SendTimeoutSnapshot: 20 * time.Second,


I'd still suggest making this (and others too) a constant and using it here and in raft.go

constants won't make this clearer, just harder to read.

Well, it's better to have it defined in one place than inline in many places. Agreed that the other place is test code, but still.

Let's do that in a follow-up if we think it's required.

This should be a simple change, what do you think ? Also, I don't think there is a rush to get this in since 17.03 RC2 isn't waiting for this.

stevvooe · 2017-10-03T00:42:20Z

manager/state/raft/raft.go

+	SendTimeout time.Duration
+	// SendTimeoutSnapshot is the timeout on the sending snapshots to other raft
+	// nodes. Leave this as 0 to get the default value.
+	SendTimeoutSnapshot time.Duration


This should be SendSnapshotTimeout.

aaronlehmann

This looks good to me after the outstanding comments are addressed.

The same problem might exist for AppendEntries. Snapshots are most likely to take a long time to send, but sending a lot of entries outside a snapshot could take a long time too.

I think it is safe to increase the timeout for the general case. It's true that it might take a leader longer to notice that a follower node is down, but this case isn't particularly important. It's more important when the leader goes down and some other node needs to start a leader election, but this is handled by the Raft state machine and I don't think the timeout has any influence on this. Of course, it would be a good idea to test any changes here carefully.

nishanttotla · 2017-10-03T19:23:01Z

@aaronlehmann for AppendEntries, are all entries required to be sent in a single gRPC request? It might be worth increasing the timeout for that too.

I understand that increasing the overall timeout may work out, but given that we're able to be specific here, I think we should only increase timeout where necessary.

anshulpundir · 2017-10-03T19:36:52Z

are all entries required to be sent in a single gRPC request? It might be worth increasing the timeout for that too.

Even if all the entries are sent together, does increasing the grpc message size have any effect on this ? @nishanttotla @aaronlehmann Basically I'm curious why you feel that the timeout for that also needs to be increased.

anshulpundir · 2017-10-03T23:25:40Z

The MaxSizePerMsg, which limits the number of entries in an append message, is set to 64K. Depending on the size of each entry, a larger grpc message size can lead to larger append messages.

Also, depending on the size of each entry, MaxSizePerMsg should probably be lowered.

stevvooe · 2017-10-03T23:35:30Z

LGTM

aaronlehmann · 2017-10-04T02:44:04Z

for AppendEntries, are all entries required to be sent in a single gRPC request? It might be worth increasing the timeout for that too.

I think it would be fine to split these across multiple gRPC requests if we detect that a lot of data would be sent. It would also be okay to increase the timeout. Whatever's easiest.
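To make the splitting option concrete, here is a small sketch of batching entries under a byte budget. It is only an illustration of the idea: `Entry` is a stand-in for a raft log entry, `splitBySize` is a hypothetical helper, and nothing here reflects how the transport is actually structured.

```go
package main

import "fmt"

// Entry is a stand-in for a raft log entry; only the payload size matters.
type Entry struct{ Data []byte }

// splitBySize groups entries, in order, into batches whose payloads total at
// most maxBytes. An entry larger than maxBytes ends up in a batch of its own.
func splitBySize(entries []Entry, maxBytes int) [][]Entry {
	var batches [][]Entry
	var current []Entry
	size := 0
	for _, e := range entries {
		// Close the current batch if adding this entry would exceed the budget.
		if len(current) > 0 && size+len(e.Data) > maxBytes {
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, e)
		size += len(e.Data)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	entries := []Entry{{make([]byte, 3)}, {make([]byte, 3)}, {make([]byte, 3)}}
	for i, b := range splitBySize(entries, 6) {
		fmt.Printf("request %d carries %d entries\n", i, len(b))
	}
}
```

Each batch would then go out as its own request with the regular short timeout, which keeps failure detection fast at the cost of more round trips.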

anshulpundir

We can increase the timeout specifically for append entries messages, if that's possible. Otherwise, reducing MaxSizePerMsg seems simpler.

nishanttotla · 2017-10-05T21:34:38Z

Based on a chat with @anshulpundir, we have decided to pursue the case of AppendEntries in a follow-up PR. We think more discussion is needed for that.

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>

nishanttotla · 2017-10-06T17:34:16Z

After more discussion, @anshulpundir and I have decided to just increase the timeout for AppendEntries, since it seems right, and we still have the upper limit of 128MB for the gRPC message. I've updated that. I think we can now merge this PR.

cc @aaronlehmann

nishanttotla · 2017-10-09T17:13:58Z

@thaJeztah @andrewhsu we must cherry pick this PR along with the gRPC limit increase. This must go into 17.09 as well.

nishanttotla added area/raft priority/P1 labels Oct 2, 2017

nishanttotla force-pushed the increase-grpc-timeout branch from 3dbe6ea to 86fccb3 Compare October 2, 2017 20:38

nishanttotla force-pushed the increase-grpc-timeout branch from 86fccb3 to 1815b5f Compare October 2, 2017 20:40

anshulpundir reviewed Oct 2, 2017

nishanttotla changed the title ~~Increase gRPC request timeout to 45 seconds~~ Increase gRPC request timeout to 20 seconds for sending snapshots Oct 2, 2017

nishanttotla force-pushed the increase-grpc-timeout branch from 1815b5f to c0586aa Compare October 2, 2017 23:51

anshulpundir reviewed Oct 3, 2017

stevvooe reviewed Oct 3, 2017

aaronlehmann reviewed Oct 3, 2017

nishanttotla force-pushed the increase-grpc-timeout branch from c0586aa to df42e10 Compare October 3, 2017 19:18

anshulpundir reviewed Oct 5, 2017

nishanttotla force-pushed the increase-grpc-timeout branch from df42e10 to 56f1d85 Compare October 5, 2017 20:41

Increase gRPC request timeout to 20 seconds when sending snapshots

e3e2821

Signed-off-by: Nishant Totla <nishanttotla@gmail.com>

nishanttotla force-pushed the increase-grpc-timeout branch from 56f1d85 to e3e2821 Compare October 6, 2017 17:32

aaronlehmann approved these changes Oct 7, 2017

anshulpundir approved these changes Oct 7, 2017

nishanttotla merged commit 1e80cfb into moby:master Oct 9, 2017

nishanttotla deleted the increase-grpc-timeout branch October 9, 2017 13:12

nishanttotla added the process/cherry-pick label Oct 9, 2017

simonferquel mentioned this pull request Oct 30, 2017

Added support for swarm service isolation mode moby/moby#34424

Merged

thaJeztah mentioned this pull request Oct 30, 2017

Revendored Swarmkit moby/moby#35326

Merged

nishanttotla mentioned this pull request Nov 3, 2017

[17.11] Increase gRPC request timeout to 20 seconds when sending snapshots #2425

Closed

dperny mentioned this pull request Feb 10, 2023

Large snapshot causes adding a new manager to fail #3113

Closed
