Set grpc max message size to 128MB. #2375

Merged

merged 1 commit into docker:master from anshulpundir:msg_size on Sep 19, 2017

Conversation

@anshulpundir
Contributor

anshulpundir commented Sep 15, 2017

Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
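
A minimal sketch of how a 128MB limit can be applied with grpc-go, assuming current grpc-go option names; the exact options and wiring in this PR's diff may differ:

package example

import "google.golang.org/grpc"

// maxMsgSize mirrors the 128MB limit discussed in this PR.
const maxMsgSize = 128 << 20

// newServer raises the receive and send limits on the server side.
func newServer() *grpc.Server {
	return grpc.NewServer(
		grpc.MaxRecvMsgSize(maxMsgSize),
		grpc.MaxSendMsgSize(maxMsgSize),
	)
}

// dial raises the client-side limits as well, since clients enforce their own.
func dial(addr string) (*grpc.ClientConn, error) {
	return grpc.Dial(addr,
		grpc.WithInsecure(), // illustrative; swarmkit dials with TLS credentials
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(maxMsgSize),
			grpc.MaxCallSendMsgSize(maxMsgSize),
		),
	)
}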

codecov bot commented Sep 15, 2017

Codecov Report

Merging #2375 into master will increase coverage by 0.09%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2375      +/-   ##
==========================================
+ Coverage   60.35%   60.45%   +0.09%     
==========================================
  Files         128      128              
  Lines       26267    26268       +1     
==========================================
+ Hits        15854    15880      +26     
+ Misses       9011     8993      -18     
+ Partials     1402     1395       -7
@nishanttotla

I'd like to know the full side effects of doing this. My hunch is that message encoding gets affected for larger message size limits. Also related: https://stackoverflow.com/questions/34128872/google-protobuf-maximum-size

Ping @stevvooe.

wsong commented Sep 15, 2017

I'd also recommend that we make this configurable. The default value for this parameter can stay at 128MB or whatever, but it'd be nice if we read this value out of /etc/docker/daemon.json and passed it into the swarmkit code in the moby/moby repo.

Contributor

anshulpundir commented Sep 15, 2017

I'd also recommend that we make this configurable. The default value for this parameter can stay at 128MB or whatever, but it'd be nice if we read this value out of /etc/docker/daemon.json and passed it into the swarmkit code in the moby/moby repo.

Agreed @wsong

I'd like to know the full side effects of doing this.

I'd like to also :)

My hunch is that message encoding gets affected for larger message size limits.

I think encoding and decoding depend on the actual message size and the structure of the data. For example, a deeply nested struct would be more costly to encode/decode. I don't think changing the limit affects encoding/decoding performance. @nishanttotla

Contributor

aluzzardi commented Sep 18, 2017

The default value for this parameter can stay at 128MB or whatever, but it'd be nice if we read this value out of /etc/docker/daemon.json and passed it into the swarmkit code in the moby/moby repo.

I'm not sure about that - when moving to the "proper" solution (streaming), the config option won't make sense anymore.

I don't know if it makes sense to introduce a new option in a patch release just so we can deprecate it immediately after.

(Outdated review thread on manager/manager.go)

Contributor

aluzzardi commented Sep 18, 2017

Overall looks good

What is going to happen when we reach 128M? Any way we could log a proper error message?

Contributor

nishanttotla commented Sep 18, 2017

I think some log messages should be included to bubble up errors from hitting the gRPC limit (either in this PR or as an immediate follow up).

If we increase the limit to 128MB, then is it worth the engineering effort to implement streaming in addition? If we do implement streaming later, then we'd want to bring the limit down again. This may cause backward compatibility issues.

I agree with @aluzzardi that if we introduce this option, it doesn't make sense to immediately deprecate it.
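
For context, grpc-go reports an over-limit message as a ResourceExhausted status error; a minimal sketch of how such a failure could be detected and logged (the wrapper and its name are illustrative, not part of this PR):

package example

import (
	"context"

	"github.com/sirupsen/logrus"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// callWithSizeCheck wraps a gRPC call and emits a clearer log line when the
// call fails because a message exceeded the configured maximum size.
func callWithSizeCheck(ctx context.Context, do func(context.Context) error) error {
	err := do(ctx)
	if status.Code(err) == codes.ResourceExhausted {
		// grpc-go rejects oversized messages with codes.ResourceExhausted.
		logrus.WithError(err).Error("gRPC message exceeded the maximum message size")
	}
	return err
}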

Contributor

anshulpundir commented Sep 18, 2017

We just met and agreed to not add the knob to adjust the max grpc message size.
I will manually test the positive test case today and investigate adding an integration test for positive and negative scenarios.

Contributor

stevvooe commented Sep 18, 2017

-1 on making this configurable.

If we increase the limit to 128MB, then is it worth the engineering effort to implement streaming in addition?

Yes, this is a horrible thing to have as a scaling limit and it needs to be addressed. Besides patching this as a workaround, the next focus should be actually fixing this to handle arbitrary snapshot sizes. Increasing the message size limit and calling it good is not an acceptable engineering solution. Increasing this limit exposes us to message floods on all services, not just the raft service.

Is there anything we can add to log messages that are over some threshold, so that we can have an idea of the level of impact this is having? Prom counters bucketed by size might be appropriate. Such large messages will have an effect on stability at around 10MB and up with protobuf. The allocations required for processing a 128MB message may cause stability issues. I'd really like to know if there are messages that are larger than 1MB transiting swarm GRPC endpoints, as those are likely causing "buffer bloat" and allocation spikes. We really need to have an easy way for users to understand when they are impacted by this problem.

wsong commented Sep 18, 2017

Yeah, after meeting offline, I agree that this should not be configurable.

Adding a metric to track large (i.e. > 1MB) GRPC messages is definitely a good idea, but it should probably go in a separate PR. Logging the length of every single message before sending it might have some unforeseen performance implications.

Contributor

stevvooe commented Sep 18, 2017

Logging the length of every single message before sending it might have some unforeseen performance implications.

Not really what I was suggesting:

if len(msg) > 1 << 20 {
 // log it.
}

Better would be to have bucketed counters (unrolled below):

if len(msg) >= 1 << 20 {
  gt1MB++
}

if len(msg) >= 4 << 20 {
  gt4MB++
}

if len(msg) >= 10 << 20 {
  gt10MB++
}

if len(msg) >= 16 << 20 {
  gt16MB++
}
...

Notice that this is a cumulative counter setup, rather than bucketed, which allows easy aggregates.

probably go in a separate PR.

Sure, but let's make sure it gets backported: right now, this condition is hard to diagnose and debug.
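
A Prometheus histogram gives this cumulative behavior out of the box; a minimal sketch, assuming plain prometheus/client_golang (the metric name and wiring are illustrative, not from this PR):

package example

import "github.com/prometheus/client_golang/prometheus"

// grpcMessageBytes records the size of gRPC messages sent. Histogram buckets
// are cumulative upper bounds, so "messages larger than N" falls out as the
// total count minus the bucket at N.
var grpcMessageBytes = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "swarm_grpc_message_bytes", // illustrative name
	Help:    "Size of gRPC messages sent, in bytes.",
	Buckets: []float64{1 << 20, 4 << 20, 10 << 20, 16 << 20, 64 << 20, 128 << 20},
})

func init() {
	prometheus.MustRegister(grpcMessageBytes)
}

// observeMessageSize would be called just before a message is sent.
func observeMessageSize(msg []byte) {
	grpcMessageBytes.Observe(float64(len(msg)))
}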

Contributor

anshulpundir commented Sep 19, 2017

Manually tested for now. Positive test case for around 120MiB. Negative for around 240MiB. Will explore the metrics and error reporting in a separate PR.

andrewhsu commented Sep 19, 2017

Hmm...looks like the linter changed a few hours ago: golang/lint#319

I'm seeing these errors from the circleci job:

🐳 lint
ca/keyreadwriter.go:190:2: redundant if ...; err != nil check, just return error instead.
manager/allocator/network.go:175:2: redundant if ...; err != nil check, just return error instead.
manager/controlapi/network.go:99:2: redundant if ...; err != nil check, just return error instead.
manager/controlapi/service.go:59:2: redundant if ...; err != nil check, just return error instead.
manager/controlapi/service.go:164:2: redundant if ...; err != nil check, just return error instead.
manager/controlapi/service.go:484:2: redundant if ...; err != nil check, just return error instead.
manager/dispatcher/dispatcher.go:857:3: redundant if ...; err != nil check, just return error instead.
manager/logbroker/broker_test.go:405:3: redundant if ...; err != nil check, just return error instead.
manager/logbroker/broker_test.go:527:3: redundant if ...; err != nil check, just return error instead.
manager/logbroker/broker_test.go:658:3: redundant if ...; err != nil check, just return error instead.
manager/orchestrator/update/updater.go:387:4: redundant if ...; err != nil check, just return error instead.
manager/scheduler/scheduler.go:95:2: redundant if ...; err != nil check, just return error instead.
manager/state/raft/raft.go:1286:2: redundant if ...; err != nil check, just return error instead.
manager/state/raft/raft.go:1851:2: redundant if ...; err != nil check, just return error instead.
manager/state/raft/storage/storage.go:229:2: redundant if ...; err != nil check, just return error instead.
manager/state/raft/transport/transport.go:238:2: redundant if ...; err != nil check, just return error instead.
make: *** [lint] Error 1
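
For reference, the golint finding above flags this shape of code; a minimal before/after sketch (the type and function names are illustrative, not taken from the files listed):

package example

import "errors"

type store struct{ dirty bool }

func (s *store) flush() error {
	if s.dirty {
		return errors.New("flush failed")
	}
	return nil
}

// Before: flagged as "redundant if ...; err != nil check, just return error instead."
func saveBefore(s *store) error {
	if err := s.flush(); err != nil {
		return err
	}
	return nil
}

// After: return the error from the call directly.
func saveAfter(s *store) error {
	return s.flush()
}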

Member

marcusmartins commented Sep 19, 2017

@anshulpundir @nishanttotla the CI is failing on lint. Do you know why?

Member

thaJeztah commented Sep 19, 2017

Can we pin the linter to a specific version to prevent that?

wsong commented Sep 19, 2017

Alternately, it may be faster to just open up a separate PR to fix those places in the code; it looks like there might not be too many.

Contributor

anshulpundir commented Sep 19, 2017

Alternately, it may be faster to just open up a separate PR to fix those places in the code; it looks like there might not be too many.

Doing this right now.

Set grpc max message size to 128MB.
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>

anshulpundir merged commit b1bcc05 into docker:master on Sep 19, 2017

3 checks passed

ci/circleci: Your tests passed on CircleCI!
codecov/project: 60.45% (target 0%)
dco-signed: All commits are signed

anshulpundir deleted the anshulpundir:msg_size branch on Sep 19, 2017

andrewhsu added a commit to andrewhsu/swarmkit that referenced this pull request Sep 19, 2017

Set grpc max message size to 128MB. (docker#2375)
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
(cherry picked from commit b1bcc05)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>

anshulpundir added a commit that referenced this pull request Sep 19, 2017

[17.06] backport fixes for max grpc message size and lint errors (#2378)
* Fix linter errors.

Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
(cherry picked from commit aa2c48b)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>

* Set grpc max message size to 128MB. (#2375)

Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
(cherry picked from commit b1bcc05)
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>

* run vndr again because it now deletes unused files

Signed-off-by: Andrew Hsu <andrewhsu@docker.com>

vieux added a commit that referenced this pull request Sep 19, 2017

Set grpc max message size to 128MB. (#2375)
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
(cherry picked from commit b1bcc05)
Signed-off-by: Victor Vieux <victorvieux@gmail.com>

anshulpundir added a commit that referenced this pull request Sep 22, 2017

Set grpc max message size to 128MB. (#2375)
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
(cherry picked from commit b1bcc05)
Signed-off-by: Anshul Pundir <anshul.pundir@docker.com>
